Re: Expunging Deletes

2014-09-30 Thread Upayavira
Or consider separating frequently changing data into a different core
from the slow-moving data, if you can, reducing the amount of data being
pushed around.
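
For example, with the legacy solr.xml format from that era, the split
could look something like this (a minimal sketch; the core names are
hypothetical):

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- frequently updated data: segments stay small and merge cheaply -->
    <core name="fast_moving" instanceDir="fast_moving" />
    <!-- rarely updated data: large segments are mostly left alone -->
    <core name="slow_moving" instanceDir="slow_moving" />
  </cores>
</solr>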

Upayavira



Re: Expunging Deletes

2014-09-29 Thread Shalin Shekhar Mangar
Yes, expungeDeletes=true will remove all deleted docs from the disk but it
also requires merging all segments that have any deleted docs which, in
your case, could mean a re-write of the entire index. So it'd be an
expensive operation. Usually deletes are removed in the normal course of
indexing as segments are merged together.
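
For reference, expungeDeletes can be requested as part of a commit in the
XML update message (a minimal sketch; POST it to the core's /update
handler):

<!-- asks this commit to merge away segments containing deleted docs -->
<commit expungeDeletes="true" waitSearcher="true"/>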





-- 
Regards,
Shalin Shekhar Mangar.


Re: Expunging Deletes

2014-09-29 Thread Eric Katherman
Thanks for replying!  Is there anything I could be doing to keep the 14GB 
collection from accumulating 700k deleted docs in the first place, so that it 
doesn't run out of memory when it finally tries to remove them?  Maybe just 
scheduled off-peak optimize calls with expungeDeletes?  Or is there some other 
config option I could be using to help manage that a little better?

Thanks!
Eric





Re: Expunging Deletes

2014-09-29 Thread Bryan Bende
You can try lowering the mergeFactor in solrconfig.xml to cause more merges
to happen during normal indexing, which should result in more deleted
documents being removed from the index, but there is a trade-off:

http://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor
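
For example, in the indexConfig section of solrconfig.xml (a sketch; 5 is
only an illustrative value, the default is 10):

<indexConfig>
  <!-- lower values merge more often, so deleted docs are reclaimed
       sooner, at the cost of extra merge I/O during indexing -->
  <mergeFactor>5</mergeFactor>
</indexConfig>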





Expunging Deletes

2014-09-27 Thread Eric Katherman
I'm running into memory issues and wondering if I should be using 
expungeDeletes on commits.  The server in question at the moment has 
450k documents in the collection and represents 15GB on disk.  There are 
also 700k+ Deleted Docs and I'm guessing that is part of the disk 
space consumption but I am not having any luck getting that cleared 
out.  I noticed the expungeDeletes=false in some of the log output 
related to commit but didn't try setting it to true yet. Will this clear 
those deleted documents and recover that space?  Or should something 
else already be managing that but maybe isn't configured correctly?


Our data is user-specific: each customer has their own database structure, 
so it varies with each user.  They also add/remove data fairly frequently 
in many cases.  For comparison, another collection of the same data type 
has 1M documents and about 120k deleted docs, but its disk space is only 
6.3GB.


Hoping someone can share some advice about how to manage this.

Thanks,
Eric


RE: expunging deletes

2013-07-12 Thread Petersen, Robert
OK, thanks Shawn.

I went with the settings below because 10 wasn't working for us, and it looks 
like my index is staying under 20 GB now, with numDocs: 16897524 and 
maxDoc: 19048053:

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">5</int>
  <int name="segmentsPerTier">5</int>
  <int name="maxMergeAtOnceExplicit">15</int>
  <double name="maxMergedSegmentMB">6144.0</double>
  <double name="reclaimDeletesWeight">6.0</double>
</mergePolicy>








expunging deletes

2013-07-10 Thread Petersen, Robert
Hi guys,

Using solr 3.6.1 and the following settings, I am trying to run without 
optimizes.  I used to optimize nightly, but sometimes the optimize took a very 
long time to complete and slowed down our indexing.  We are continuously 
indexing our new or changed data all day and night.  After a few days running 
without an optimize, the index size has nearly doubled and maxdocs is nearly 
twice the size of numdocs.  I understand deletes should be expunged on merges, 
but even after trying lots of different settings for our merge policy it seems 
this growth is somewhat unbounded.  I have tried sending an optimize with 
numSegments = 2, which is a lot lighter weight than a regular optimize, and 
that does bring the number down, but not by much.  Does anyone have any ideas 
for better settings for my merge policy that would help?  Here is my current 
index 
better settings for my merge policy that would help?  Here is my current index 
snapshot too:

Location: /var/LucidWorks/lucidworks/solr/1/data/index
Size: 25.05 GB  (when the index is optimized it is around 15.5 GB)
searcherName : Searcher@6c3a3517 main 
caching : true 
numDocs : 16852155 
maxDoc : 24512617 
reader : 
SolrIndexReader{this=6e3b4ec8,r=ReadOnlyDirectoryReader@6e3b4ec8,refCnt=1,segments=61}
 


<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">35</int>
  <int name="segmentsPerTier">35</int>
  <int name="maxMergeAtOnceExplicit">105</int>
  <double name="maxMergedSegmentMB">6144.0</double>
  <double name="reclaimDeletesWeight">8.0</double>
</mergePolicy>

<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxMergeCount">20</int>
  <int name="maxThreadCount">3</int>
</mergeScheduler>

Thanks,

Robert (Robi) Petersen
Senior Software Engineer
Search Department


(formerly Buy.com)
85 enterprise, suite 100
aliso viejo, ca 92656
tel 949.389.2000 x5465
fax 949.448.5415





Re: expunging deletes

2013-07-10 Thread Shawn Heisey

Your merge settings are the equivalent of the old mergeFactor set to 35,
and based on the fact that you have the Explicit set to 105, I'm
guessing your settings originally came from something I posted - these
are the numbers that I use.  These settings can result in a very large
number of segments on your disk.

Because you index a lot (and probably reindex existing documents often),
I can understand why you have high merge settings, but if you want to
eliminate optimizes, you'll need to go lower.  The default merge setting
of 10 (with an Explicit value of 30) is probably a good starting point,
but you might need to go even smaller.
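
Spelled out as an explicit TieredMergePolicy block, that starting point
would look something like this (a sketch mirroring your settings block,
just with the smaller values):

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
  <int name="maxMergeAtOnceExplicit">30</int>
</mergePolicy>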

On Solr 3.6, an optimize probably cannot take place at the same time as
index updates -- the optimize would probably delay updates until after
it's finished.  I remember running into problems on Solr 3.x, so I set
up my indexing program to stop updates while the index was optimizing.

Solr 4.x should lift any restriction where optimizes and updates can't
happen at the same time.

With an index size of 25GB, a six-drive RAID10 should be able to
optimize in 10-15 minutes, but if your I/O system is single disk, RAID1,
RAID5, or RAID6, the write performance may cause this to take longer.
If you went with SSD, optimizes would happen VERY fast.

Thanks,
Shawn



Expunging deletes from a very large index

2011-06-06 Thread Simon Wistow
Due to some emergency maintenance I needed to run a delete on a large 
number of documents in a 200GB index.

The problem is that it's taking an inordinately long amount of time (2+ 
hours so far and counting) and is steadily eating up disk space - 
presumably up to 2x index size which is getting awfully close to the 
wire on this machine.

Is that inevitable? Is there any way to speed up the process or use less 
space? Maybe do an optimize with a different number of maxSegments?

I suspect not but I thought it was worth asking.






Re: Expunging deletes from a very large index

2011-06-06 Thread Michael McCandless
You can drop your mergeFactor to 2 and then run expungeDeletes?

This will make the operation take longer but (assuming you have more than 3
segments in your index) should use less transient disk space.

You could also make a custom merge policy, that expunges one segment
at a time (even slower but even less transient disk space required).

optimize(maxNumSegments) may also help, though it's not guaranteed to
reclaim disk space due to deleted docs.
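
In XML update-message form, those two options would be something like the
following (a sketch; how much space each reclaims depends on your segment
layout):

<!-- merge away every segment that currently holds deleted docs -->
<commit expungeDeletes="true"/>

<!-- or: a partial optimize down to a target segment count -->
<optimize maxSegments="2"/>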

Mike McCandless

http://blog.mikemccandless.com
