Re: Expunging Deletes
Or consider separating frequently changing data into a different core from the slow-moving data, if you can, reducing the amount of data being pushed around.

Upayavira

On Mon, Sep 29, 2014, at 09:16 PM, Bryan Bende wrote:
> [snip]
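Splitting the volatile data out as Upayavira suggests can be done through the CoreAdmin API; a hypothetical sketch, assuming a non-SolrCloud setup where the new core's instanceDir already contains a conf/ directory (the core name and directory here are placeholders):

  curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=volatile_data&instanceDir=volatile_data'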
Re: Expunging Deletes
Yes, expungeDeletes=true will remove all deleted docs from the disk, but it also requires merging all segments that have any deleted docs, which, in your case, could mean a rewrite of the entire index. So it would be an expensive operation. Usually deletes are removed in the normal course of indexing as segments are merged together.

On Sat, Sep 27, 2014 at 8:42 PM, Eric Katherman <e...@knackhq.com> wrote:
> [snip]

--
Regards,
Shalin Shekhar Mangar.
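For reference, the expensive commit Shalin describes is triggered through the update handler; a minimal sketch, assuming a core named collection1 on the default port:

  curl 'http://localhost:8983/solr/collection1/update?commit=true&expungeDeletes=true'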
Re: Expunging Deletes
Thanks for replying! Is there anything I could be doing to keep the 14GB collection with 700k deleted docs from getting into that state in the first place, rather than running out of memory when it finally tries to remove them? Maybe just scheduled off-peak optimize calls with expungeDeletes? Or is there some other config option I could be using to help manage that a little better?

Thanks!
Eric

On Sep 29, 2014, at 9:35 AM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
> [snip]
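The scheduled off-peak optimize Eric floats could be as simple as a cron entry hitting the update handler; a sketch, with the core name and schedule as placeholders (a full optimize rewrites all segments and drops deleted docs on its own, so no separate expungeDeletes flag is needed here):

  # crontab entry: optimize nightly at 3 a.m., off-peak
  0 3 * * * curl -s 'http://localhost:8983/solr/collection1/update?optimize=true' > /dev/null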
Re: Expunging Deletes
You can try lowering the mergeFactor in solrconfig.xml to cause more merges to happen during normal indexing, which should result in more deleted documents being removed from the index, but there is a trade-off:

http://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor

On Mon, Sep 29, 2014 at 2:14 PM, Eric Katherman <e...@knackhq.com> wrote:
> [snip]
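In a Solr 4.x solrconfig.xml, that setting lives in the indexConfig section; a sketch, where 5 is only an illustrative value (the default is 10):

  <indexConfig>
    <!-- Lower mergeFactor: more merging during normal indexing, so deleted
         docs are reclaimed sooner, at some cost to indexing throughput -->
    <mergeFactor>5</mergeFactor>
  </indexConfig>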
Expunging Deletes
I'm running into memory issues and wondering if I should be using expungeDeletes on commits. The server in question at the moment has 450k documents in the collection and represents 15GB on disk. There are also 700k+ deleted docs, and I'm guessing that is part of the disk space consumption, but I am not having any luck getting that cleared out. I noticed expungeDeletes=false in some of the log output related to commits but didn't try setting it to true yet. Will this clear those deleted documents and recover that space? Or should something else already be managing that but maybe isn't configured correctly?

Our data is user-specific; each customer has their own database structure, so it varies with each user. They also add/remove data fairly frequently in many cases. To compare: another collection of the same data type has 1M documents and about 120k deleted docs, but disk space is only 6.3GB.

Hoping someone can share some advice about how to manage this.

Thanks,
Eric
RE: expunging deletes
OK, thanks Shawn. I went with this because 10 wasn't working for us, and it looks like my index is staying under 20GB now with numDocs: 16897524 and maxDoc: 19048053.

  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">5</int>
    <int name="segmentsPerTier">5</int>
    <int name="maxMergeAtOnceExplicit">15</int>
    <double name="maxMergedSegmentMB">6144.0</double>
    <double name="reclaimDeletesWeight">6.0</double>
  </mergePolicy>

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Wednesday, July 10, 2013 5:34 PM
To: solr-user@lucene.apache.org
Subject: Re: expunging deletes

> [snip]
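As an aside, the numDocs/maxDoc figures Robert quotes can be checked from the command line via the Luke request handler, assuming it is mounted at its default path:

  curl 'http://localhost:8983/solr/admin/luke?numTerms=0'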
expunging deletes
Hi guys,

Using Solr 3.6.1 and the following settings, I am trying to run without optimizes. I used to optimize nightly, but sometimes the optimize took a very long time to complete and slowed down our indexing. We are continuously indexing our new or changed data all day and night. After a few days running without an optimize, the index size has nearly doubled and maxDoc is nearly twice the size of numDocs. I understand deletes should be expunged on merges, but even after trying lots of different settings for our merge policy, it seems this growth is somewhat unbounded. I have tried sending an optimize with numSegments=2, which is a lot lighter weight than a regular optimize, and that does bring the number down, but not by too much. Does anyone have any ideas for better settings for my merge policy that would help? Here is my current index snapshot too:

  Location: /var/LucidWorks/lucidworks/solr/1/data/index
  Size: 25.05 GB (when the index is optimized it is around 15.5 GB)
  searcherName: Searcher@6c3a3517 main
  caching: true
  numDocs: 16852155
  maxDoc: 24512617
  reader: SolrIndexReader{this=6e3b4ec8,r=ReadOnlyDirectoryReader@6e3b4ec8,refCnt=1,segments=61}

  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">35</int>
    <int name="segmentsPerTier">35</int>
    <int name="maxMergeAtOnceExplicit">105</int>
    <double name="maxMergedSegmentMB">6144.0</double>
    <double name="reclaimDeletesWeight">8.0</double>
  </mergePolicy>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxMergeCount">20</int>
    <int name="maxThreadCount">3</int>
  </mergeScheduler>

Thanks,

Robert (Robi) Petersen
Senior Software Engineer
Search Department (formerly Buy.com)
85 Enterprise, Suite 100, Aliso Viejo, CA 92656
tel 949.389.2000 x5465
fax 949.448.5415
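The lighter-weight partial optimize Robert mentions can be sent as an XML update command with a maxSegments attribute; a sketch against a default single-core URL:

  curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' \
    --data-binary '<optimize maxSegments="2"/>'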
Re: expunging deletes
On 7/10/2013 5:58 PM, Petersen, Robert wrote:
> [snip]

Your merge settings are the equivalent of the old mergeFactor set to 35, and based on the fact that you have the Explicit value set to 105, I'm guessing your settings originally came from something I posted -- these are the numbers that I use. These settings can result in a very large number of segments on your disk.

Because you index a lot (and probably reindex existing documents often), I can understand why you have high merge settings, but if you want to eliminate optimizes, you'll need to go lower. The default merge setting of 10 (with an Explicit value of 30) is probably a good starting point, but you might need to go even smaller.

On Solr 3.6, an optimize probably cannot take place at the same time as index updates -- the optimize would probably delay updates until after it's finished. I remember running into problems on Solr 3.x, so I set up my indexing program to stop updates while the index was optimizing. Solr 4.x should lift any restriction where optimizes and updates can't happen at the same time.

With an index size of 25GB, a six-drive RAID10 should be able to optimize in 10-15 minutes, but if your I/O system is a single disk, RAID1, RAID5, or RAID6, the write performance may cause this to take longer. If you went with SSD, optimizes would happen VERY fast.

Thanks,
Shawn
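Expressed as TieredMergePolicy settings, the starting point Shawn suggests would look roughly like this (the values simply mirror the defaults he cites, not a tuned recommendation):

  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
    <int name="maxMergeAtOnceExplicit">30</int>
  </mergePolicy>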
Expunging deletes from a very large index
Due to some emergency maintenance, I needed to run a delete on a large number of documents in a 200GB index. The problem is that it's taking an inordinately long time (2+ hours so far and counting) and is steadily eating up disk space -- presumably up to 2x the index size, which is getting awfully close to the wire on this machine.

Is that inevitable? Is there any way to speed up the process or use less space? Maybe do an optimize with a different number of maxSegments? I suspect not, but I thought it was worth asking.
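For context, a bulk delete like the one Simon describes is usually issued as a delete-by-query through the update handler; a hypothetical example, with the query itself a placeholder:

  curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-Type: text/xml' \
    --data-binary '<delete><query>expired:true</query></delete>'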
Re: Expunging deletes from a very large index
You can drop your mergeFactor to 2 and then run expungeDeletes? This will make the operation take longer but (assuming you have > 3 segments in your index) should use less transient disk space. You could also make a custom merge policy that expunges one segment at a time (even slower, but even less transient disk space required).

optimize(maxNumSegments) may also help, though it's not guaranteed to reclaim disk space due to deleted docs.

Mike McCandless
http://blog.mikemccandless.com

On Mon, Jun 6, 2011 at 2:16 AM, Simon Wistow <si...@thegestalt.org> wrote:
> [snip]
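Mike's first suggestion pairs a temporarily lowered mergeFactor with an expunging commit; a sketch of the commit half in XML form (the mergeFactor change itself happens in solrconfig.xml):

  curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' \
    --data-binary '<commit expungeDeletes="true"/>'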