Re: optimize is taking too much time

2010-02-22 Thread Jay Hill
With a mergeFactor set to anything > 1 you would never have only one segment
- unless you optimized. So Lucene will never naturally merge all the
segments into one. Unless, I suppose, the mergeFactor was set to 1, but I've
never tested that. It's hard to picture how that would work.

If I understand correctly, the same actions occur (deleted documents are
removed, etc.) because an optimize is only a multiway merge down to one
segment, whereas normal merging is triggered by the mergeFactor, but does
not have a target segment count to merge down to.

-Jay

On Sun, Feb 21, 2010 at 11:20 AM, David Smiley @MITRE.org dsmi...@mitre.org wrote:

> I've always thought that these two events were effectively equivalent --
> the results of an optimize vs. the results of Lucene _naturally_ merging all
> segments together into one. If they don't have the same effect, then what is
> the difference?
>
> ~ David Smiley






Re: optimize is taking too much time

2010-02-22 Thread David Smiley @MITRE.org

Your response contradicts the wiki's description of mergeFactor:
http://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor
-- which clearly states that the indexes are merged into a single segment. 
It makes no reference to optimize to trigger this condition.  If what you
say is true, and we agree that the mergeFactor is the upper bound of the
number of segments, then what is the lower bound of the number of segments
seen for an index that has not been optimized?  Always 2, or some function
of mergeFactor?

~ David Smiley


Jay Hill wrote:
>
> With a mergeFactor set to anything > 1 you would never have only one
> segment - unless you optimized. So Lucene will never naturally merge all the
> segments into one. Unless, I suppose, the mergeFactor was set to 1, but
> I've never tested that. It's hard to picture how that would work.
>
> If I understand correctly, the same actions occur (deleted documents are
> removed, etc.) because an optimize is only a multiway merge down to one
> segment, whereas normal merging is triggered by the mergeFactor, but does
> not have a target segment count to merge down to.
>
> -Jay
> ...
 

-- 
View this message in context: 
http://old.nabble.com/optimize-is-taking-too-much-time-tp27561570p27693177.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: optimize is taking too much time

2010-02-22 Thread Yonik Seeley
On Sun, Feb 21, 2010 at 2:20 PM, David Smiley @MITRE.org
dsmi...@mitre.org wrote:
> I've always thought that these two events were effectively equivalent --
> the results of an optimize vs. the results of Lucene _naturally_ merging all
> segments together into one.

Correct.  Occasionally one hits a major merge and as few as 1
segment is produced.

Think about the odometer in your car (if you have one that spins).
Each digit is the number of segments of that size... so the total
number of segments in the index is the total of the digits.  You
naturally get back to a single segment whenever it rolls over to the
next highest power of 10 (mergeFactor=10).
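
The odometer picture can be sketched in a few lines of Python. This is only a toy model, not Lucene's actual merge policy: it assumes every flush produces a segment of the same size and that a merge fires as soon as mergeFactor segments of one size exist. The function names are made up for illustration.

```python
def segment_counts(flushes, merge_factor=10):
    """Odometer digits, lowest level first: entry i is the number of
    segments that each hold merge_factor**i flushes' worth of documents."""
    digits = []
    remaining = flushes
    while remaining > 0:
        remaining, digit = divmod(remaining, merge_factor)
        digits.append(digit)
    return digits


def total_segments(flushes, merge_factor=10):
    """Total number of segments = sum of the odometer digits."""
    return sum(segment_counts(flushes, merge_factor))


if __name__ == "__main__":
    for flushes in (9, 10, 99, 100, 101, 999, 1000):
        print(flushes, segment_counts(flushes), total_segments(flushes))
```

In this model 99 flushes leave 18 segments (9 + 9), the 100th flush rolls the odometer over to a single segment, and the 101st flush immediately brings the count back to 2.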

-Yonik
http://www.lucidimagination.com


Re: optimize is taking too much time

2010-02-22 Thread Mark Miller

Also, a mergeFactor of 1 is actually invalid - 2 is the lowest you can go.



--
- Mark

http://www.lucidimagination.com





Re: optimize is taking too much time

2010-02-22 Thread Jay Hill
Thanks for clearing that up guys, I misspoke slightly. It's just that, in a
running system, it's probably very rare that there is only a single segment
for any meaningful length of time. Unless that merge-down-to-one occurs
right when indexing stops there will almost always be a new (small) segment
following immediately after the merge. It would be interesting to observe,
over a long time, how often and for how long everything is merged down to a
single segment.

Probably with a very low mergeFactor (2 or 3?) merges-to-one might occur
often enough to make optimizing unnecessary. But I'm guessing that the
merge-to-one happens so infrequently in most situations that optimizing is
more important.
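
A rough way to put numbers on that guess, using the same simplified odometer model from earlier in the thread (equal-sized flushes, a merge as soon as mergeFactor segments of one size exist). This is a sketch under those assumptions, not a measurement of a real index, and the function names are hypothetical.

```python
def total_segments(flushes, merge_factor):
    """Sum of the base-merge_factor digits of the flush count."""
    total = 0
    while flushes > 0:
        flushes, digit = divmod(flushes, merge_factor)
        total += digit
    return total


def single_segment_states(max_flushes, merge_factor):
    """How many flush counts in 1..max_flushes leave exactly one segment."""
    return sum(
        1
        for n in range(1, max_flushes + 1)
        if total_segments(n, merge_factor) == 1
    )


if __name__ == "__main__":
    for merge_factor in (2, 3, 10):
        print(merge_factor, single_segment_states(100_000, merge_factor))
```

Over 100,000 flushes the model reaches a single segment 17 times with mergeFactor=2, 11 times with mergeFactor=3 and 6 times with mergeFactor=10 (once per power of the mergeFactor). So even a very low mergeFactor only occasionally lands on one segment.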

-Jay


On Mon, Feb 22, 2010 at 12:16 PM, Mark Miller markrmil...@gmail.com wrote:

> Also, a mergeFactor of 1 is actually invalid - 2 is the lowest you can go.
>
> --
> - Mark
>
> http://www.lucidimagination.com






Re: optimize is taking too much time

2010-02-22 Thread Yonik Seeley
On Mon, Feb 22, 2010 at 6:39 PM, Jay Hill jayallenh...@gmail.com wrote:
> It's just that, in a
> running system, it's probably very rare that there is only a single segment
> for any meaningful length of time.

Right - but the performance impact of a huge merge can be non-trivial.
People wishing to avoid the biggest of merges at unpredictable times
should look at parameters such as maxMergeDocs that prevent merging
segments over a certain size.
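
To illustrate the trade-off, here is a toy simulation in Python. It is a simplified model, not Lucene's merge policy: each flush adds a one-document segment, a merge fires when the last mergeFactor segments have the same size, and a segment holding more than max_merge_docs documents is never selected for another merge. The function and parameter names are assumptions for the sketch.

```python
def index_with_cap(flushes, merge_factor=10, max_merge_docs=None):
    """Return the list of segment sizes (in docs) after `flushes` flushes."""
    segments = []
    for _ in range(flushes):
        segments.append(1)  # each flush writes one small segment
        # Cascade merges while the newest merge_factor segments match.
        while len(segments) >= merge_factor:
            tail = segments[-merge_factor:]
            same_level = len(set(tail)) == 1
            too_big = max_merge_docs is not None and tail[0] > max_merge_docs
            if not same_level or too_big:
                break
            del segments[-merge_factor:]
            segments.append(sum(tail))
    return segments


if __name__ == "__main__":
    print(index_with_cap(10_000))                      # no cap
    print(index_with_cap(10_000, max_merge_docs=100))  # capped
```

Without a cap, the 10,000th flush triggers one merge that rewrites all 10,000 documents into a single segment. With max_merge_docs=100 the model never rewrites more than 1,000 documents in one merge, at the price of ending with ten 1,000-document segments instead of one.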

-Yonik
http://www.lucidimagination.com


Re: optimize is taking too much time

2010-02-21 Thread David Smiley @MITRE.org

I've always thought that these two events were effectively equivalent --
the results of an optimize vs. the results of Lucene _naturally_ merging all
segments together into one.  If they don't have the same effect, then what is
the difference?

~ David Smiley


Otis Gospodnetic wrote:
>
> Hello,
>
> Solr will never optimize the whole index without somebody explicitly
> asking for it.
> Lucene will merge index segments on the master as documents are indexed.
> How often it does that depends on mergeFactor.
>
> See:
> http://search-lucene.com/?q=mergeFactor+segment+mergefc_project=Lucenefc_project=Solrfc_type=mail+_hash_+user
>
> Otis
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/
 
 
 
 
 
 

-- 
View this message in context: 
http://old.nabble.com/optimize-is-taking-too-much-time-tp27561570p27676881.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: optimize is taking too much time

2010-02-19 Thread Otis Gospodnetic
Hello,

Solr will never optimize the whole index without somebody explicitly asking for 
it.
Lucene will merge index segments on the master as documents are indexed.  How 
often it does that depends on mergeFactor.

See:
http://search-lucene.com/?q=mergeFactor+segment+mergefc_project=Lucenefc_project=Solrfc_type=mail+_hash_+user


Otis 
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



----- Original Message ----
> From: mklprasad mklpra...@gmail.com
> To: solr-user@lucene.apache.org
> Sent: Fri, February 19, 2010 1:02:11 AM
> Subject: Re: optimize is taking too much time
 
 
 
 
 
 
  
  
 
> Yes,
> Thanks for the reply.
> I have removed the optimize() call from my code, but I have two questions:
> 1. Will mergeFactor internally do any optimization, or do we have to specify
> autoOptimize?
> 2. Even if Solr initiates the optimize, will it take a huge amount of time
> with a large index like 52GB?
>
> Thanks,
> Prasad
>
> --
> View this message in context:
> http://old.nabble.com/optimize-is-taking-too-much-time-tp27561570p27650028.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: optimize is taking too much time

2010-02-18 Thread Jagdish Vasani
Hi,

You should not optimize the index after each document insert. Instead,
optimize after inserting a good number of documents, because an optimize
merges all segments into one, according to the settings of the Lucene
index.

thanks,
Jagdish
On Fri, Feb 12, 2010 at 4:01 PM, mklprasad mklpra...@gmail.com wrote:

> Hi,
> My Solr index has 1,42,45,223 records, about 50GB.
> Now, when I load a new record and it tries to optimize the docs, it
> takes too much memory and time.
>
> Can anybody please tell me whether Solr has a property to get rid of
> this?
>
> Thanks in advance
>
> --
> View this message in context:
> http://old.nabble.com/optimize-is-taking-too-much-time-tp27561570p27561570.html
> Sent from the Solr - User mailing list archive at Nabble.com.




Re: optimize is taking too much time

2010-02-18 Thread NarasimhaRaju
Hi, 
You can also make use of Solr's autocommit feature.
It can be triggered in two ways: by the maximum number of uncommitted docs or
by elapsed time.
See the updateHandler section of your solrconfig.xml.

Example:

<autoCommit>
   <!--
   <maxDocs>1</maxDocs>
   -->

   <!-- maximum time (in MS) after adding a doc before an autocommit is triggered -->
   <maxTime>60</maxTime>
</autoCommit>


Once you're done adding documents, run a final optimize/commit.
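
As a rough sketch of that sequence, the Python below builds the XML update messages for a batch load: one add per document, then a single commit and, optionally, a single optimize at the very end. The helper names and the example documents are made up for illustration, and actually POSTing the messages to Solr's update handler is left out.

```python
from xml.sax.saxutils import escape


def add_message(doc):
    """Build an <add> message for one document given as a dict of fields."""
    fields = "".join(
        '<field name="%s">%s</field>'
        % (escape(str(name), {'"': "&quot;"}), escape(str(value)))
        for name, value in doc.items()
    )
    return "<add><doc>%s</doc></add>" % fields


def batch_update_messages(docs, optimize_at_end=False):
    """Adds for every doc, then one commit and at most one optimize."""
    messages = [add_message(doc) for doc in docs]
    messages.append("<commit/>")
    if optimize_at_end:
        messages.append("<optimize/>")
    return messages


if __name__ == "__main__":
    docs = [{"id": 1, "title": "first"}, {"id": 2, "title": "second"}]
    for message in batch_update_messages(docs, optimize_at_end=True):
        print(message)
```

The point of the design is that the optimize appears once per load, not once per document, so the expensive merge-down-to-one happens at a time you choose.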

Regards, 
P.N.Raju, 





From: Jagdish Vasani jvasani1...@gmail.com
To: solr-user@lucene.apache.org
Sent: Thu, February 18, 2010 3:12:15 PM
Subject: Re: optimize is taking too much time

Hi,

You should not optimize the index after each document insert. Instead,
optimize after inserting a good number of documents, because an optimize
merges all segments into one, according to the settings of the Lucene
index.

thanks,
Jagdish





  

Re: optimize is taking too much time

2010-02-18 Thread mklprasad



Jagdish Vasani-2 wrote:
>
> Hi,
>
> You should not optimize the index after each document insert. Instead,
> optimize after inserting a good number of documents, because an optimize
> merges all segments into one, according to the settings of the Lucene
> index.
>
> thanks,
> Jagdish


 
 

Yes,
Thanks for the reply.
I have removed the optimize() call from my code, but I have two questions:
1. Will mergeFactor internally do any optimization, or do we have to specify
autoOptimize?
2. Even if Solr initiates the optimize, will it take a huge amount of time
with a large index like 52GB?

Thanks,
Prasad



-- 
View this message in context: 
http://old.nabble.com/optimize-is-taking-too-much-time-tp27561570p27650028.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: optimize is taking too much time

2010-02-17 Thread Chris Hostetter

: My Solr index has 1,42,45,223 records, about 50GB.
: Now, when I load a new record and it tries to optimize the docs, it
: takes too much memory and time.

: Can anybody please tell me whether Solr has a property to get rid of this?

Solr isn't going to optimize the index unless you tell it to -- how are 
you indexing your docs? are you sure you don't have something programmed 
to send an optimize command?


-Hoss



Re: optimize is taking too much time

2010-02-17 Thread mklprasad



hossman wrote:
>
> : My Solr index has 1,42,45,223 records, about 50GB.
> : Now, when I load a new record and it tries to optimize the docs, it
> : takes too much memory and time.
>
> : Can anybody please tell me whether Solr has a property to get rid of this?
>
> Solr isn't going to optimize the index unless you tell it to -- how are
> you indexing your docs? are you sure you don't have something programmed
> to send an optimize command?
>
> -Hoss

Yes,
from my code, for every load I am calling the server.optimize() method
(I am now planning to remove this from the code).
At the config level I have mergeFactor=10.
My question: will mergeFactor only perform merges, or will it also
perform the optimization?
If not, do I need to enable autoOptimize in my solrconfig.xml, and in
that case will it take less time for my 50GB index?

Please clarify.
Thanks in advance
 
 
 
 

-- 
View this message in context: 
http://old.nabble.com/optimize-is-taking-too-much-time-tp27561570p27634994.html
Sent from the Solr - User mailing list archive at Nabble.com.



optimize is taking too much time

2010-02-12 Thread mklprasad

Hi,
My Solr index has 1,42,45,223 records, about 50GB.
Now, when I load a new record and it tries to optimize the docs, it
takes too much memory and time.

Can anybody please tell me whether Solr has a property to get rid of this?

Thanks in advance

-- 
View this message in context: 
http://old.nabble.com/optimize-is-taking-too-much-time-tp27561570p27561570.html
Sent from the Solr - User mailing list archive at Nabble.com.