Re: optimize is taking too much time
With a mergeFactor set to anything greater than 1, you would never have only one segment unless you optimized. So Lucene will never naturally merge all the segments into one. Unless, I suppose, the mergeFactor was set to 1, but I've never tested that, and it's hard to picture how that would work. If I understand correctly, the same actions occur (deleted documents are removed, etc.), because an optimize is only a multiway merge down to one segment, whereas normal merging is triggered by the mergeFactor but does not have a target segment count to merge down to. -Jay

On Sun, Feb 21, 2010 at 11:20 AM, David Smiley @MITRE.org dsmi...@mitre.org wrote: I've always thought that these two events were effectively equivalent: the results of an optimize vs. the results of Lucene _naturally_ merging all segments together into one. If they don't have the same effect, then what is the difference?
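A toy model may help make Jay's distinction concrete. It is a sketch under simplifying assumptions, not Lucene's implementation: a segment is reduced to counts of live and deleted documents (`ToySegment` and `MergeModel` are hypothetical names), any merge copies only live documents, an optimize merges every segment into one, and a "natural" merge combines just the newest `mergeFactor` segments with no target segment count. Real Lucene chooses merge candidates by segment size, which this ignores.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy segment: just counts of live and deleted documents (hypothetical, not Lucene's class). */
final class ToySegment {
    final int liveDocs;
    final int deletedDocs;

    ToySegment(int liveDocs, int deletedDocs) {
        this.liveDocs = liveDocs;
        this.deletedDocs = deletedDocs;
    }
}

final class MergeModel {
    private MergeModel() {}

    /** Any merge copies only live documents, so deleted documents are dropped. */
    static ToySegment merge(List<ToySegment> segments) {
        int live = 0;
        for (ToySegment s : segments) {
            live += s.liveDocs;
        }
        return new ToySegment(live, 0);
    }

    /** Optimize: one multiway merge of every segment, with a target of exactly one. */
    static List<ToySegment> optimize(List<ToySegment> segments) {
        List<ToySegment> result = new ArrayList<>();
        if (!segments.isEmpty()) {
            result.add(merge(segments));
        }
        return result;
    }

    /**
     * Natural merge (simplified): once mergeFactor segments have accumulated, the newest
     * mergeFactor of them are combined. There is no target segment count.
     */
    static List<ToySegment> naturalMerge(List<ToySegment> segments, int mergeFactor) {
        if (segments.size() < mergeFactor) {
            return new ArrayList<>(segments); // below the trigger: nothing is merged
        }
        int split = segments.size() - mergeFactor;
        List<ToySegment> result = new ArrayList<>(segments.subList(0, split));
        result.add(merge(segments.subList(split, segments.size())));
        return result;
    }
}
```

In this model both paths drop deletions from whatever they merge; the difference is only that optimize always ends at one segment, while a natural merge leaves the untouched segments (and their deletions) in place.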
Re: optimize is taking too much time
Your response contradicts the wiki's description of mergeFactor: http://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor -- which clearly states that the indexes are merged into a single segment. It makes no mention of an optimize being needed to trigger this condition. If what you say is true, and we agree that the mergeFactor is the upper bound of the number of segments, then what is the lower bound of the number of segments seen for an index that has not been optimized? Always 2, or some function of mergeFactor? ~ David Smiley -- View this message in context: http://old.nabble.com/optimize-is-taking-too-much-time-tp27561570p27693177.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: optimize is taking too much time
On Sun, Feb 21, 2010 at 2:20 PM, David Smiley @MITRE.org dsmi...@mitre.org wrote: I've always thought that these two events were effectively equivalent: the results of an optimize vs. the results of Lucene _naturally_ merging all segments together into one.

Correct. Occasionally one hits a major merge, and as few as 1 segment is produced. Think about the odometer in your car (if you have one that spins). Each digit is the number of segments of that size, so the total number of segments in the index is the sum of the digits. You naturally get back to a single segment whenever it rolls over to the next power of 10 (mergeFactor=10). -Yonik http://www.lucidimagination.com
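Yonik's odometer analogy can be simulated directly. The sketch below is a simplified model, assuming every flush writes one segment of equal size and that merges cascade immediately; `OdometerIndex` is a hypothetical class, not Lucene code. Under those assumptions the per-level segment counts are exactly the base-`mergeFactor` digits of the number of flushes, so the segment count is their digit sum.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified "odometer" model of log-style merging (illustration only, not Lucene's code).
 * Each flush adds one level-0 segment; whenever mergeFactor segments share a level,
 * they are merged into one segment of the next level, possibly cascading upward.
 */
final class OdometerIndex {
    private final int mergeFactor;
    // counts.get(i) = number of segments at level i (each holding mergeFactor^i flushes)
    private final List<Integer> counts = new ArrayList<>();

    OdometerIndex(int mergeFactor) {
        if (mergeFactor < 2) {
            // mirrors the rule, mentioned in this thread, that a mergeFactor below 2 is invalid
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        this.mergeFactor = mergeFactor;
    }

    /** Flush one new small segment, then carry merges upward like odometer digits. */
    void flush() {
        int level = 0;
        while (true) {
            if (level == counts.size()) {
                counts.add(0);
            }
            int c = counts.get(level) + 1;
            if (c < mergeFactor) {
                counts.set(level, c);
                return;
            }
            counts.set(level, 0); // mergeFactor segments at this level are merged away...
            level++;              // ...into one segment at the next level
        }
    }

    /** Total number of segments = sum of the "digits". */
    int segmentCount() {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        return total;
    }
}
```

For example, with mergeFactor=10, after 1,234 flushes the model holds 1 + 2 + 3 + 4 = 10 segments, and after exactly 1,000 flushes it is back to a single segment.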
Re: optimize is taking too much time
Also, a mergeFactor of 1 is actually invalid; 2 is the lowest you can go. - Mark http://www.lucidimagination.com
Re: optimize is taking too much time
Thanks for clearing that up, guys; I misspoke slightly. It's just that, in a running system, it's probably very rare that there is only a single segment for any meaningful length of time. Unless that merge-down-to-one occurs right when indexing stops, there will almost always be a new (small) segment immediately after the merge. It would be interesting to observe, over a long time, how often and for how long everything is merged down to a single segment. Probably, with a very low mergeFactor (2 or 3?), merges-to-one might occur often enough to make optimizing unnecessary. But I'm guessing that the merge-to-one happens so infrequently in most situations that optimizing is more important. -Jay

On Mon, Feb 22, 2010 at 12:16 PM, Mark Miller markrmil...@gmail.com wrote: Also, a mergeFactor of 1 is actually invalid; 2 is the lowest you can go.
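Jay's guess about low mergeFactor values can be checked against the idealized odometer model Yonik describes (equal-sized flushes, immediate cascading merges; an assumption, not how a real index with varying flush sizes behaves). In that model the segment count after n flushes is the digit sum of n in base `mergeFactor`, so the index is a single segment exactly when n is a power of `mergeFactor`. `SingleSegmentFrequency` is a hypothetical helper that counts those states.

```java
/**
 * Counts single-segment states under an idealized odometer model of merging
 * (hypothetical helper; assumes one equal-sized segment per flush).
 */
final class SingleSegmentFrequency {
    private SingleSegmentFrequency() {}

    /** Segment count after the given number of flushes: digit sum in base mergeFactor. */
    static int segmentsAfter(long flushes, int mergeFactor) {
        int sum = 0;
        while (flushes > 0) {
            sum += (int) (flushes % mergeFactor);
            flushes /= mergeFactor;
        }
        return sum;
    }

    /** How many of the states after flush 1..totalFlushes consist of exactly one segment. */
    static int singleSegmentStates(long totalFlushes, int mergeFactor) {
        int hits = 0;
        for (long n = 1; n <= totalFlushes; n++) {
            if (segmentsAfter(n, mergeFactor) == 1) {
                hits++;
            }
        }
        return hits;
    }
}
```

Over the first 1,000 flushes this gives 10 single-segment states for mergeFactor=2, 7 for mergeFactor=3 and 4 for mergeFactor=10, and each one lasts only until the next flush. That is consistent with Jay's point that a lone segment rarely persists in a live system, even though lower mergeFactor values reach it more often (at the cost of more frequent merging).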
Re: optimize is taking too much time
On Mon, Feb 22, 2010 at 6:39 PM, Jay Hill jayallenh...@gmail.com wrote: It's just that, in a running system, it's probably very rare that there is only a single segment for any meaningful length of time. Right - but the performance impact of a huge merge can be non-trivial. People wishing to avoid the biggest of merges at unpredictable times should look at parameters such as maxMergeDocs that prevent merging segments over a certain size. -Yonik http://www.lucidimagination.com
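To illustrate Yonik's suggestion, the sketch below adds a document-count cap to a simple equal-size merge model. It assumes, following one reading of Lucene's LogMergePolicy, that a segment already holding maxMergeDocs or more documents is left out of further merges; `CappedMergeModel` is a hypothetical class, and real merge selection differs in detail.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified merge model with a maxMergeDocs cap (hypothetical class, not Lucene's code).
 * When the newest mergeFactor segments all have the same size they are merged,
 * unless any of them already holds maxMergeDocs or more documents.
 */
final class CappedMergeModel {
    private final int mergeFactor;
    private final long maxMergeDocs;
    private final List<Long> segments = new ArrayList<>(); // doc counts, oldest first
    private long largestMerge = 0; // most documents rewritten by any single merge

    CappedMergeModel(int mergeFactor, long maxMergeDocs) {
        this.mergeFactor = mergeFactor;
        this.maxMergeDocs = maxMergeDocs;
    }

    /** Add one flushed segment, then cascade any allowed merges. */
    void flush(long docs) {
        segments.add(docs);
        while (segments.size() >= mergeFactor) {
            int start = segments.size() - mergeFactor;
            long size = segments.get(segments.size() - 1);
            boolean sameSize = true;
            boolean tooLarge = false;
            long total = 0;
            for (int i = start; i < segments.size(); i++) {
                long s = segments.get(i);
                sameSize &= (s == size);
                tooLarge |= (s >= maxMergeDocs); // capped segments are never merged again
                total += s;
            }
            if (!sameSize || tooLarge) {
                return; // nothing (more) to merge
            }
            segments.subList(start, segments.size()).clear();
            segments.add(total);
            largestMerge = Math.max(largestMerge, total);
        }
    }

    int segmentCount() { return segments.size(); }

    long largestMerge() { return largestMerge; }
}
```

With mergeFactor=10, 1,000 documents per flush and maxMergeDocs=100,000, 1,000 flushes leave ten 100,000-document segments instead of one 1,000,000-document segment, so no single merge ever rewrites more than 100,000 documents. The trade-off is a permanently higher segment count.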
Re: optimize is taking too much time
I've always thought that these two events were effectively equivalent: the results of an optimize vs. the results of Lucene _naturally_ merging all segments together into one. If they don't have the same effect, then what is the difference? ~ David Smiley -- View this message in context: http://old.nabble.com/optimize-is-taking-too-much-time-tp27561570p27676881.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: optimize is taking too much time
Hello, Solr will never optimize the whole index without somebody explicitly asking for it. Lucene will merge index segments on the master as documents are indexed. How often it does that depends on mergeFactor. See: http://search-lucene.com/?q=mergeFactor+segment+mergefc_project=Lucenefc_project=Solrfc_type=mail+_hash_+user Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message From: mklprasad mklpra...@gmail.com To: solr-user@lucene.apache.org Sent: Fri, February 19, 2010 1:02:11 AM Subject: Re: optimize is taking too much time
Re: optimize is taking too much time
Hi, you should not optimize the index after each document insert. Instead, optimize it after inserting a good number of documents, because an optimize merges all segments into one, according to the settings of the Lucene index. Thanks, Jagdish

On Fri, Feb 12, 2010 at 4:01 PM, mklprasad mklpra...@gmail.com wrote: Hi, in my Solr index I have 14,245,223 records, about 50 GB. Now, when I load a new record and it tries to optimize the docs, it takes too much memory and time. Can anybody please tell me whether there is a property in Solr to get rid of this? Thanks in advance
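Jagdish's advice amounts to batching the load and optimizing, if at all, once at the end. A minimal sketch, assuming a hypothetical `IndexClient` interface that stands in for a real Solr client (in SolrJ the corresponding calls would be the server's `add`, `commit` and `optimize` methods) and plain strings in place of real documents:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a Solr client; not a real SolrJ type. */
interface IndexClient {
    void add(List<String> docs);
    void commit();
    void optimize();
}

/** Buffers documents, sends them in batches, and optimizes at most once per load. */
final class BatchLoader {
    private final IndexClient client;
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();

    BatchLoader(IndexClient client, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.client = client;
        this.batchSize = batchSize;
    }

    /** Queue one document; no commit or optimize happens here. */
    void addDocument(String doc) {
        pending.add(doc);
        if (pending.size() >= batchSize) {
            sendPending();
        }
    }

    /** Called once at the end of the whole load. */
    void finish(boolean optimizeAtEnd) {
        sendPending();
        client.commit();
        if (optimizeAtEnd) {
            client.optimize(); // a single optimize for the entire load, if wanted at all
        }
    }

    private void sendPending() {
        if (!pending.isEmpty()) {
            client.add(new ArrayList<>(pending)); // hand over a copy, then reuse the buffer
            pending.clear();
        }
    }
}
```

The design choice is that `addDocument` never optimizes; the only optimize, if requested, happens in `finish`, so a large index is rewritten at most once per load rather than once per record.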
Re: optimize is taking too much time
Hi, you can also make use of Solr's autoCommit feature. You have two options: commit based on the maximum number of uncommitted docs, or based on time. See the updateHandler section of your solrconfig.xml. Example:

    <autoCommit>
      <!-- <maxDocs>1</maxDocs> -->
      <!-- maximum time (in MS) after adding a doc before an autocommit is triggered -->
      <maxTime>60</maxTime>
    </autoCommit>

Once you're done adding, run a final optimize/commit. Regards, P.N.Raju

From: Jagdish Vasani jvasani1...@gmail.com To: solr-user@lucene.apache.org Sent: Thu, February 18, 2010 3:12:15 PM Subject: Re: optimize is taking too much time
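The two autoCommit triggers Raju describes can be modeled as follows. This is an illustrative sketch with a hypothetical `AutoCommitPolicy` class, not Solr's implementation; it assumes maxTime is measured from the first add that has not yet been committed, as the comment in the config excerpt suggests, and it takes the current time as a parameter so it can be exercised without a real clock.

```java
/**
 * Illustrative autoCommit-style policy (hypothetical class, not Solr's implementation).
 * Commits once maxDocs documents are pending, or once maxTimeMs has elapsed since the
 * first uncommitted add. A value <= 0 disables the corresponding trigger.
 */
final class AutoCommitPolicy {
    private final int maxDocs;
    private final long maxTimeMs;
    private int pendingDocs = 0;
    private long firstPendingAtMs = 0;
    private int commits = 0;

    AutoCommitPolicy(int maxDocs, long maxTimeMs) {
        this.maxDocs = maxDocs;
        this.maxTimeMs = maxTimeMs;
    }

    /** Records one added document at time nowMs; returns true if it triggered a commit. */
    boolean onAdd(long nowMs) {
        if (pendingDocs == 0) {
            firstPendingAtMs = nowMs; // the time trigger starts with the first uncommitted add
        }
        pendingDocs++;
        if (maxDocs > 0 && pendingDocs >= maxDocs) {
            commit();
            return true;
        }
        return onTimer(nowMs);
    }

    /** Called periodically (e.g. from a scheduler); commits if maxTimeMs has expired. */
    boolean onTimer(long nowMs) {
        if (pendingDocs > 0 && maxTimeMs > 0 && nowMs - firstPendingAtMs >= maxTimeMs) {
            commit();
            return true;
        }
        return false;
    }

    private void commit() {
        pendingDocs = 0;
        commits++;
    }

    int commits() { return commits; }

    int pendingDocs() { return pendingDocs; }
}
```

Note that neither trigger optimizes; in this model, as in Raju's advice, an optimize remains a separate, explicit step at the end of the load.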
Re: optimize is taking too much time
Yes, thanks for the reply. I have removed the optimize() call from my code, but I have some doubts:
1. Does mergeFactor internally do any optimization, or do we have to specify autoOptimize?
2. Even if Solr initiates the optimize, with a large index like 52 GB, will that take a huge amount of time?
Thanks, Prasad -- View this message in context: http://old.nabble.com/optimize-is-taking-too-much-time-tp27561570p27650028.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: optimize is taking too much time
: in my Solr index I have 14,245,223 records, about 50 GB.
: Now, when I load a new record and it tries to optimize the docs, it
: takes too much memory and time.
: Can anybody please tell me whether there is a property in Solr to get rid of this?

Solr isn't going to optimize the index unless you tell it to. How are you indexing your docs? Are you sure you don't have something programmed to send an optimize command? -Hoss
Re: optimize is taking too much time
Yes, in my code I call the server.optimize() method on every load (I am now planning to remove this from the code). At the config level I have mergeFactor=10. My question is: will mergeFactor only do a merge, or will it also perform the optimization? If not, do I need to call autoOptimize from my solrconfig? And in that case, will it take less time for my 50 GB index? Please clarify. Thanks in advance -- View this message in context: http://old.nabble.com/optimize-is-taking-too-much-time-tp27561570p27634994.html Sent from the Solr - User mailing list archive at Nabble.com.
optimize is taking too much time
Hi, in my Solr index I have 14,245,223 records, about 50 GB. Now, when I load a new record and it tries to optimize the docs, it takes too much memory and time. Can anybody please tell me whether there is a property in Solr to get rid of this? Thanks in advance -- View this message in context: http://old.nabble.com/optimize-is-taking-too-much-time-tp27561570p27561570.html Sent from the Solr - User mailing list archive at Nabble.com.