Re: Merge Policy Recommendation for 3.6.1

2012-09-29 Thread Sujatha Arun
Thanks Shawn, that helps a lot. Our current OS limit is set to 300,000+, I
guess, which I heard is the maximum for the OS. I am not sure of the soft and
hard limits. Will check this.

Regards,
Sujatha




Merge Policy Recommendation for 3.6.1

2012-09-28 Thread Sujatha Arun
Hello,

In the case where there are 200+ cores on a single node, is it
recommended to go with Tiered MP with a segment size of 4? Our index sizes
vary from a few MB to 4 GB.

Will there be any issue with "Too many open files" given the number of
indexes, with respect to MP? At the moment we are thinking of going with
Tiered MP.

The OS file limit has been set to maximum.

Regards
Sujatha


Re: Merge Policy Recommendation for 3.6.1

2012-09-28 Thread Shawn Heisey

Whether or not to deviate from the standard TieredMergePolicy depends 
heavily on many factors which we do not know, but I can tell you that 
it's probably not a good idea.  That policy typically produces the best 
results in all scenarios.


http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

On the subject of open files:  With its default configuration, a Solr 
3.x index will have either 8 or 11 files per segment, depending on 
whether you are using termvectors.  I am completely unsure about 4.0, 
because I've never used it, but it is probably similar.  The following 
calculations are based on my experience with 3.x.
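
A rough way to check the files-per-segment count on an existing index is
sketched below. It assumes only that Lucene segment files share a name prefix
such as _0 or _1 (for example _0.fdt, _0_1.del); the function name and the
idea of pointing it at a core's index directory are illustrative, not part of
Solr.

```python
import os
from collections import Counter


def files_per_segment(index_dir):
    """Count files per Lucene segment, keyed by segment prefix (e.g. '_0')."""
    counts = Counter()
    for name in os.listdir(index_dir):
        # Segment files start with '_'; this skips segments_N, segments.gen
        # and lock files, which belong to the index as a whole.
        if not name.startswith("_"):
            continue
        stem = name.split(".", 1)[0]                 # '_0' or '_0_1'
        segment = "_" + stem[1:].split("_", 1)[0]    # '_0_1' -> '_0'
        counts[segment] += 1
    return dict(counts)
```

Calling files_per_segment on the index directory of one core returns a
mapping from segment name to file count, which can be compared against the 8
or 11 figure above.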


With a segment limit of 4, you might expect to have only six segments 
around at any one time: the four that are being merged, the new merged 
segment, and a segment where new data is being written.  If your system 
indexes data slowly enough for merges to complete before another new 
segment is created, this is indeed the most you will ever see.  If your 
system indexes data fast enough, you might actually have short-lived 
moments with 10 or 14 segments, and possibly more.


Some assumptions, which lead to using the 13 segment figure: 
simultaneous indexing to multiple cores at once, with termvectors turned 
on.  With these assumptions, a 200 core Solr installation using 4 
segments might potentially have nearly 37000 files open, but is more 
likely to have significantly fewer.  If you increase your merge policy 
segment limit, the numbers will go up from there.
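
The arithmetic behind this kind of estimate can be sketched as a simple
product. The function below is only an illustration: its name and parameters
are made up here, it counts index files only, and the inputs behind the
nearly 37000 figure are not fully spelled out above, so it does not try to
reproduce that number exactly.

```python
def estimate_open_files(cores, segments_per_core, files_per_segment):
    """Rough upper bound on open index files if every core is busy at once."""
    return cores * segments_per_core * files_per_segment


# Six segments per core (the quiet case), termvectors on (11 files per segment).
quiet = estimate_open_files(200, 6, 11)    # 13200

# Short-lived peak of 14 segments per core during fast indexing.
busy = estimate_open_files(200, 14, 11)    # 30800
```

Both results land in the tens of thousands, which is the same range as the
estimate above and well beyond a typical default per-process limit.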


I have configured my Linux servers with a soft file limit of 49152 and a 
hard limit of 65536.  My segment limit is set to 35, and each server has 
a maximum of four active cores, which means that during heavy indexing, 
I can see over 8000 open files.


What does maximum on the OS file limit actually mean?  Does your OS 
have a way to specify unlimited? My personal feeling is that it's a bad 
idea to run with no limits at all.  I would imagine that you need to go 
with a minimum soft limit of 65536.  Your segment limit of 4 is probably 
reasonable, unless you will be doing a lot of indexing in a very short 
amount of time.  If you are, you may want a larger limit, and a larger 
number of maximum open files.
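
One way to see the actual soft and hard limits on a Unix-like system is
Python's standard resource module, sketched below. It reports the limits of
the process running the script, so it would need to be run as the same user
and from the same kind of session that starts Solr. The has_headroom helper
and its 2x safety margin are assumptions for illustration only.

```python
import resource


def nofile_limits():
    """Return (soft, hard) open-file limits, with 'unlimited' where applicable."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    def show(value):
        return "unlimited" if value == resource.RLIM_INFINITY else value

    return show(soft), show(hard)


def has_headroom(expected_open_files, soft_limit, margin=2.0):
    """True if the soft limit covers the expected file count times a margin."""
    if soft_limit == "unlimited":
        return True
    return soft_limit >= expected_open_files * margin


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("soft limit:", soft)
    print("hard limit:", hard)
```

For example, has_headroom(30800, 65536) is true with the default margin,
while has_headroom(70000, 65536) is not.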


Thanks,
Shawn