Re: Solr 7.7.0 - Garbage Collection issue

2019-02-12 Thread Joe Obernberger
Reverted to 7.6.0 with the same settings, and now I do not encounter the 
high CPU usage.


-Joe




Re: Solr 7.7.0 - Garbage Collection issue

2019-02-12 Thread Joe Obernberger
Thank you, Shawn.  Yes, I used the settings from your site. I've 
restarted the cluster and the CPU usage is back up again. Looking at it 
now, it doesn't appear to be GC related.

Full log from one of the nodes that is pegging 13 CPU cores:

http://lovehorsepower.com/solr_gc.log.0.current

Thank you for the gceasy.io site - that is very slick!  I'll use it in 
the future.  I can try using the standard settings, but again - at this 
point it doesn't look GC-related to me.


-Joe




Re: Solr 7.7.0 - Garbage Collection issue

2019-02-12 Thread Shawn Heisey

On 2/12/2019 7:35 AM, Joe Obernberger wrote:
Yesterday, we upgraded our 40 node cluster from solr 7.6.0 to solr 
7.7.0.  This morning, all the nodes are using 1200+% of CPU. It looks 
like it's in garbage collection.  We did reduce our HDFS cache size from 
11G to 6G, but other than that, no other parameters were changed.


Your message included a small excerpt from the GC log.  That is not 
helpful.  We will need the entire GC log, possibly more than one log. 
The log or logs should fully cover the timeframe where the problem 
occurs.  Full disclosure: Once obtained, I would use this website to 
analyze GC log data:


http://gceasy.io


Parameters are:

GC_TUNE="-XX:+UseG1GC \
-XX:MaxDirectMemorySize=6g \
-XX:+PerfDisableSharedMem \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=16m \
-XX:MaxGCPauseMillis=300 \
-XX:InitiatingHeapOccupancyPercent=75 \
-XX:+UseLargePages \
-XX:ParallelGCThreads=16 \
-XX:-ResizePLAB \
-XX:+AggressiveOpts"


Looks like you've chosen to use G1 settings very similar to what I put 
on my wiki page:


https://wiki.apache.org/solr/ShawnHeisey#Current_experiments

Those settings are not intended to be a canonical resource that everyone 
can use.  Your heap size is different than what I was using when I 
worked on that, so you may need different settings.


Have you considered not using your own GC tuning, and letting Solr's 
start script handle that instead?
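
For reference, letting the start script pick the defaults is mostly a matter 
of not overriding GC_TUNE in solr.in.sh - a minimal sketch, assuming a 
typical Solr 7.x install layout:

# solr.in.sh: leave GC_TUNE commented out (or unset) so that bin/solr
# applies its built-in default GC settings instead of a custom set
#GC_TUNE="..."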


With the limited information available, my initial guess is that you 
need a larger heap, and that Java is spending all its time freeing up just 
enough memory to keep the program running.
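
If a larger heap does turn out to be the fix, Solr 7.x normally takes that 
from solr.in.sh as well - illustrative value only, not a recommendation 
from this thread:

# solr.in.sh: sets both -Xms and -Xmx for the Solr JVM
SOLR_HEAP="16g"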


Thanks,
Shawn


RE: Solr and Garbage Collection

2009-10-06 Thread Fuad Efendi
 I read pretty much all posts on this thread (before and after this one).
 Looks like the main suggestion from you and others is to keep max heap size
 (-Xmx) as small as possible (as long as you don't see an OOM exception).


I suggested the absolute opposite; please note also that "as small as possible"
has no meaning in the multiuser environment of Tomcat. It depends on
query types (10 documents per request? Or maybe 1???) AND it depends
on average server load (one concurrent request? Or maybe 200 threads
trying to deal with 2000 concurrent requests?) AND it depends on whether it
is a Master (used for updates - parsing tons of docs in a single file???) - and
it depends on unpredictable memory fragmentation - it all depends on the use
case too(!!!), in addition to schema / index size.
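
A back-of-envelope version of that point, with made-up numbers purely for 
illustration:

#  per-request transient garbage x peak concurrency = young-gen headroom
#  e.g. ~5 MB per request x 200 concurrent requests = ~1 GB
#  ...before index caches, cache warming, or fragmentation are even counted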


Please note also, such stuff depends on the JVM vendor too: what if it
precompiles everything into native CPU code (including memory deallocation
after each call)? Some do!

-Fuad
http://www.linkedin.com/in/liferay


...but 'core' constantly disagrees with me :)






RE: Solr and Garbage Collection

2009-10-06 Thread Fuad Efendi
Master-Slave replica: new caches will be warmed/prepopulated _before_ the
new IndexReader is made available for _new_ requests and _before_ the old one
is discarded - it means that the theoretical sizing for FieldCache (which is
defined by the number of docs in an index and the cardinality of a field)
should be doubled... of course we need to play with GC options too for
performance tuning (mostly) 
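
As a rough illustration of that doubling - assumed numbers, not figures 
from this thread:

#  one sorted field over 8M docs, ord array at ~4 bytes/doc:
#  8,000,000 x 4 bytes ~= 32 MB (the field values themselves are extra)
#  old + new searcher both live during warming => ~64 MB for that one field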



Re: Solr and Garbage Collection

2009-10-03 Thread Mark Miller
Another option of course, if you're using a recent version of Java 6:

try out the beta-ish, unsupported-unless-you-pay G1 garbage collector.
I've only recently started playing with it, but it's supposed to be much
better than CMS. It's supposedly got much better throughput, it's much
better at dealing with fragmentation issues (CMS is actually pretty bad
with fragmentation, come to find out), and overall it's just supposed to
be a very nice leap ahead in GC. Haven't had a chance to play with it
much myself, but it's supposed to be fantastic. A whole new approach to
generational collection for Sun, and much closer to the real-time GCs
available from some other vendors.
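
On the Java 6 builds being discussed, trying G1 meant unlocking it 
explicitly - a sketch using the flags documented for 6u14, not a tuned 
configuration:

# enable the experimental G1 collector on Java 6u14+:
JAVA_OPTS="$JAVA_OPTS -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC"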


Re: Solr and Garbage Collection

2009-10-03 Thread Bill Au
Sun has recently clarified the issue regarding "unsupported unless you pay"
for the G1 garbage collector. Here are the updated release notes for Java 6
update 14:
http://java.sun.com/javase/6/webnotes/6u14.html


G1 will be part of Java 7, fully supported at no cost.  The version
included in Java 6 update 14 is a beta release.  Since it is beta, Sun does
not recommend using it unless you have a support contract because, as with
any beta software, there will be bugs.  Non-paying customers may very well
have to wait for the official version in Java 7 for bug fixes.

Here is more info on the G1 garbage collector:

http://java.sun.com/javase/technologies/hotspot/gc/g1_intro.jsp


Bill


Re: Solr and Garbage Collection

2009-10-03 Thread Mark Miller
Ah, yes - thanks for the clarification. I didn't pay attention to how
ambiguously I was using "supported" there :)


RE: Solr and Garbage Collection

2009-10-02 Thread siping liu

Hi,

I read pretty much all posts on this thread (before and after this one). Looks 
like the main suggestion from you and others is to keep max heap size (-Xmx) as 
small as possible (as long as you don't see an OOM exception). This brings more 
questions than answers (for me at least. I'm new to Solr).

 

First, our environment and the problem encountered: Solr 1.4 (nightly build, 
downloaded about 2 months ago), Sun JDK 1.6, Tomcat 5.5, running on 
Solaris (multi-CPU/core). The cache settings are from the default solrconfig.xml 
(they look very small). At first we used minimal JAVA_OPTS and quickly ran into a 
problem similar to the one the original poster reported -- long pauses (seconds to 
minutes) under load test. jconsole showed that it pauses on GC. So more 
JAVA_OPTS got added: -XX:+UseConcMarkSweepGC -XX:+UseParNewGC 
-XX:ParallelGCThreads=8 -XX:SurvivorRatio=2 -XX:NewSize=128m 
-XX:MaxNewSize=512m -XX:MaxGCPauseMillis=200; the thinking is that with 
multiple CPUs/cores we can get GC over with as quickly as possible. With the new 
setup, it works fine until Tomcat reaches the heap size, then it blocks and takes 
minutes on a full GC to get more space from the tenured generation. We tried 
different Xmx (from very small to large), with no difference in the long GC time. 
We never ran into OOM.

 

Questions:

* In general various cachings are good for performance; we have more RAM to use 
and want to use more caching to boost performance, so isn't your suggestion (of 
lowering the heap limit) going against that?

* Looks like Solr caching made its way into the tenured generation on the heap; 
that's good. But why does it get GC'ed eventually?? I did a quick check of the Solr 
code (Solr 1.3, not 1.4), and saw a single instance of using WeakReference. Is that 
what is causing all this? This seems to suggest a design flaw in Solr's memory 
management strategy (or just my ignorance about Solr?). I mean, wouldn't this 
be the right way of doing it -- you allow the user to specify the cache size in 
solrconfig.xml, then the user can set the heap limit in JAVA_OPTS accordingly, 
with no need to use WeakReference (BTW, why not SoftReference)??

* Right now I have a single Tomcat hosting Solr and other applications. I guess 
now it's better to have Solr on its own Tomcat, given that it's tricky to 
adjust the Java options.

 

thanks.


 
 From: wun...@wunderwood.org
 To: solr-user@lucene.apache.org
 Subject: RE: Solr and Garbage Collection
 Date: Fri, 25 Sep 2009 09:51:29 -0700
 
 30ms is not better or worse than 1s until you look at the service
 requirements. For many applications, it is worth dedicating 10% of your
 processing time to GC if that makes the worst-case pause short.
 
 On the other hand, my experience with the IBM JVM was that the maximum query
 rate was 2-3X better with the concurrent generational GC compared to any of
 their other GC algorithms, so we got the best throughput along with the
 shortest pauses.
 
 Solr garbage generation (for queries) seems to have two major components:
 per-request garbage and cache evictions. With a generational collector,
 these two are handled by separate parts of the collector. Per-request
 garbage should completely fit in the short-term heap (nursery), so that it
 can be collected rapidly and returned to use for further requests. If the
 nursery is too small, the per-request allocations will be made in tenured
 space and sit there until the next major GC. Cache evictions are almost
 always in long-term storage (tenured space) because an LRU algorithm
 guarantees that the garbage will be old.
 
 Check the growth rate of tenured space (under constant load, of course)
 while increasing the size of the nursery. That rate should drop when the
 nursery gets big enough, then not drop much further as it is increased more.
 
 After that, reduce the size of tenured space until major GCs start happening
 too often (a judgment call). A bigger tenured space means longer major GCs
 and thus longer pauses, so you don't want it oversized by too much.
 
 Also check the hit rates of your caches. If the hit rate is low, say 20% or
 less, make that cache much bigger or set it to zero. Either one will reduce
 the number of cache evictions. If you have an HTTP cache in front of Solr,
 zero may be the right choice, since the HTTP cache is cherry-picking the
 easily cacheable requests.
 
 Note that a commit nearly doubles the memory required, because you have two
 live Searcher objects with all their caches. Make sure you have headroom for
 a commit.
 
 If you want to test the tenured space usage, you must test with real world
 queries. Those are the only way to get accurate cache eviction rates.
 
 wunder
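
One way to act on the nursery advice above - illustrative sizes only, to be 
adjusted against measured tenured-space growth:

# pin the young generation (nursery) size while measuring:
JAVA_OPTS="$JAVA_OPTS -XX:NewSize=512m -XX:MaxNewSize=512m"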
  

Re: Solr and Garbage Collection

2009-10-02 Thread Mark Miller
siping liu wrote:
 Hi,

 I read pretty much all posts on this thread (before and after this one). 
 Looks like the main suggestion from you and others is to keep max heap size 
 (-Xmx) as small as possible (as long as you don't see an OOM exception). This 
 brings more questions than answers (for me at least. I'm new to Solr).

  

 First, our environment and the problem encountered: Solr 1.4 (nightly build, 
 downloaded about 2 months ago), Sun JDK 1.6, Tomcat 5.5, running on 
 Solaris (multi-CPU/core). The cache settings are from the default 
 solrconfig.xml (they look very small). At first we used minimal JAVA_OPTS and 
 quickly ran into a problem similar to the one the original poster reported -- 
 long pauses (seconds to minutes) under load test. jconsole showed that it 
 pauses on GC. So more JAVA_OPTS got added: -XX:+UseConcMarkSweepGC 
 -XX:+UseParNewGC -XX:ParallelGCThreads=8 -XX:SurvivorRatio=2 -XX:NewSize=128m 
 -XX:MaxNewSize=512m -XX:MaxGCPauseMillis=200; the thinking is that with 
 multiple CPUs/cores we can get GC over with as quickly as possible. With the new 
 setup, it works fine until Tomcat reaches the heap size, then it blocks and takes 
 minutes on a full GC to get more space from the tenured generation. We tried 
 different Xmx (from very small to large), with no difference in the long GC time. 
 We never ran into OOM.
   
MaxGCPauseMillis doesn't work with UseConcMarkSweepGC - it's for use with
the Parallel collector. That also doesn't look like a good survivor ratio.
  

 Questions:

 * In general various cachings are good for performance; we have more RAM to 
 use and want to use more caching to boost performance, so isn't your suggestion 
 (of lowering the heap limit) going against that?
   
Leaving RAM for the FileSystem cache is also very important. But you
should also have enough RAM for your Solr caches of course.
 * Looks like Solr caching made its way into the tenured generation on the heap; 
 that's good. But why does it get GC'ed eventually?? I did a quick check of the 
 Solr code (Solr 1.3, not 1.4), and saw a single instance of using WeakReference. 
 Is that what is causing all this? This seems to suggest a design flaw in Solr's 
 memory management strategy (or just my ignorance about Solr?). I mean, 
 wouldn't this be the right way of doing it -- you allow the user to specify the 
 cache size in solrconfig.xml, then the user can set the heap limit in JAVA_OPTS 
 accordingly, with no need to use WeakReference (BTW, why not SoftReference)??
   
Do you see concurrent mode failure when looking at your GC logs? I.e.:

174.445: [GC 174.446: [ParNew: 66408K->66408K(66416K), 0.618
secs]174.446: [CMS (concurrent mode failure): 161928K->162118K(175104K),
4.0975124 secs] 228336K->162118K(241520K)

That means you are still getting major collections with CMS, and you
don't want that. You might try kicking GC off earlier with something
like: -XX:CMSInitiatingOccupancyFraction=50
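
Spelled out as a full option set - a sketch, since the right threshold is 
workload-dependent:

# start CMS cycles at 50% tenured occupancy, and honor only that threshold:
JAVA_OPTS="$JAVA_OPTS -XX:CMSInitiatingOccupancyFraction=50 \
  -XX:+UseCMSInitiatingOccupancyOnly"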

Re: Solr and Garbage Collection

2009-09-28 Thread Jonathan Ariel
Ok... good news! Upgrading to the newest version of JVM 6 (update 16) seems
to solve this ugly bug. With the upgraded JVM I could run the solr servers
for more than 12 hours in the production environment with the GC mentioned
in the previous e-mails. The results are really amazing. The time spent on
collecting memory dropped from 11% to 3.81%. Do you think there is more to
tune there?

Thanks!

Jonathan

On Sun, Sep 27, 2009 at 8:39 PM, Bill Au bill.w...@gmail.com wrote:

 You are running a very old version of Java 6 (update 6).  The latest is
 update 16.  You should definitely upgrade.  There is a bug in Java 6
 starting with update 4 that may result in a corrupted Lucene/Solr index:
 http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6707044
 https://issues.apache.org/jira/browse/LUCENE-1282

 The JVM crash occurred in the gc thread.  So it looks like a bug in the JVM
 itself.  Upgrading to the latest release might help.  Switching to a
 different garbage collector should help.

 Bill

 On Sat, Sep 26, 2009 at 4:31 PM, Mark Miller markrmil...@gmail.com
 wrote:

  Jonathan Ariel wrote:
   Ok. After the server ran for more than 12 hours, the time spent on GC
   decreased from 11% to 3.4%, but 5 hours later it crashed. This is the
   thread dump, maybe you can help identify what happened?
  
  Well, that's a tough one ;) My guess is it's a bug :)
 
  Your two survivor spaces are filled, so it was likely about to move
  objects into the tenured space, which still has plenty of room for them
  (barring horrible fragmentation). Any issues with that type of thing
  should generate an OOM anyway though. You can find people that have run
  into similar issues in the past, but a lot of times unreproducible.
  Usually, their bugs are closed and they are told to try a newer JVM.
 
  Your JVM appears to be quite a few versions back. There have been many
  garbage collection bugs fixed in the 7 or so updates since your version,
  a good handful of them related to CMS.
 
  If you can, my best suggestion at the moment is to upgrade to the latest
  and see how that fares.
 
  If not, you might see if going back to the throughput collector and
  turning on the parallel tenured space collector might meet your needs
  instead. You can work with other params to get that going better if you
  have to as well.
 
  Also, adjusting other settings with the low pause collector might
  trigger something to side step the bug. Not a great option there though
 ;)
 
  How many unique fields are you sorting/faceting on? It must be a lot if
  you need 10 gigs for 8 million documents. It's kind of rough to have to
  work at such a close limit to your total heap available as a minimum
  memory requirement.
 
  --
  - Mark
 
  http://www.lucidimagination.com
 
 
   #
   # An unexpected error has been detected by Java Runtime Environment:
   #
   #  SIGSEGV (0xb) at pc=0x2b4e0f69ea2a, pid=32224, tid=1103812928
   #
   # Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed mode
   linux-amd64)
   # Problematic frame:
   # V  [libjvm.so+0x265a2a]
   #
   # If you would like to submit a bug report, please visit:
   #   http://java.sun.com/webapps/bugreport/crash.jsp
   #
  
   ---  T H R E A D  ---
  
   Current thread (0x5be47400):  VMThread [stack:
   0x41bad000,0x41cae000] [id=32249]
  
   siginfo:si_signo=SIGSEGV: si_errno=0, si_code=128 (),
   si_addr=0x
  
   Registers:
   RAX=0x2aac929b4c70, RBX=0x0037c985003a095e, RCX=0x0006,
   RDX=0x005c49870037c996
   RSP=0x41cac550, RBP=0x41cac550, RSI=0x2aac929b4c70,
   RDI=0x0037c985003a095e
   R8 =0x2aadab201538, R9 =0x0005, R10=0x0001,
   R11=0x0010
   R12=0x2aac929b4c70, R13=0x2aac9289cf58, R14=0x2aac9289cf40,
   R15=0x2aadab2015ac
   RIP=0x2b4e0f69ea2a, EFL=0x00010206,
  CSGSFS=0x0033,
   ERR=0x
 TRAPNO=0x000d
  
   Top of Stack: (sp=0x41cac550)
   0x41cac550:   41cac580 2b4e0f903c5b
   0x41cac560:   41cac590 0003
   0x41cac570:   2aac9289cf50 2aadab2015a8
   0x41cac580:   41cac5c0 2b4e0f72e388
   0x41cac590:   41cac5c0 2aac9289cf40
   0x41cac5a0:   0005 2b4e0fc86330
   0x41cac5b0:    2b4e0fd8c740
   0x41cac5c0:   41cac5f0 2b4e0f903b7f
   0x41cac5d0:   41cac610 0003
   0x41cac5e0:   2aaccb1750f8 2aaccea41570
   0x41cac5f0:   41cac610 2b4e0f931548
   0x41cac600:   2b4e0fc861d8 2aadd4052ab0
   0x41cac610:   41cac640 2b4e0f903d1a
   0x41cac620:   41cac650 0003
   0x41cac630:   5bc7d6d0 2b4e0fd8c740
   0x41cac640:   41cac650 2b4e0f90411c
   

Re: Solr and Garbage Collection

2009-09-28 Thread Mark Miller
Do you have your GC logs? Are you still seeing major collections?

Where is the time spent?

Hard to say without some of that info.

The goal of the low pause collector is to finish collecting before the
tenured space is filled - if it doesn't, a standard major collection occurs.

The collector will use recent stats it records to try and pick a good
time to start - as a fail-safe, though, it will trigger no matter what at
a certain percentage. With Java 1.5, it was at 68% full that it triggered.
With 1.6, it's 92%.

If you're still getting major collections, you might want to see if
lowering that helps (-XX:CMSInitiatingOccupancyFraction=N). If not,
you might be near optimal settings.

There is likely not anything else you should mess with - unless using
the extra thread to collect while your app is running affects your app's
performance - in that case you might want to look into turning on
incremental mode. But you haven't mentioned that, so I doubt it.
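
If it ever comes to that, incremental mode is switched on like this - a 
sketch with the era-appropriate flags; it trades some throughput for 
shorter CMS bursts:

# CMS incremental mode (intended for machines with few CPUs):
JAVA_OPTS="$JAVA_OPTS -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing"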



-- 
- Mark

http://www.lucidimagination.com




Re: Solr and Garbage Collection

2009-09-28 Thread Jonathan Ariel
How do you track major collections? Even better, how do you log your GC
behavior with details? Right now I just log total time spent on collections,
but I don't really know on which collections. Regarding application performance
with the ConcMarkSweepGC, I think I didn't experience any impact for now.
Actually the CPU usage of the solr servers is almost insignificant (it was
like that before).
BTW, do you know a good way to track the N most expensive solr queries? I
would like to measure that on 2 different solr servers with different GC.


Re: Solr and Garbage Collection

2009-09-28 Thread Otis Gospodnetic
Jonathan,

Here is the JVM argument for logging GC activity:

-Xloggc:<file>    log GC status to a file with time stamps
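
Combined with the detail flags shown later in this thread, a typical
invocation might look like this (the log file name and the stock Jetty
start.jar are illustrative):

    java -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -jar start.jar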

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR




Re: Solr and Garbage Collection

2009-09-28 Thread Mark Miller
-verbose:gc

[GC 325407K->83000K(776768K), 0.2300771 secs]
[GC 325816K->83372K(776768K), 0.2454258 secs]
[Full GC 267628K->83769K(776768K), 1.8479984 secs]

Additional details with: -XX:+PrintGCDetails

[GC [DefNew: 64575K->959K(64576K), 0.0457646 secs] 196016K->133633K(261184K), 0.0459067 secs]

And timestamps with: -XX:+PrintGCTimeStamps

111.042: [GC 111.042: [DefNew: 8128K->8128K(8128K), 0.505 secs]111.042:
[Tenured: 18154K->2311K(24576K), 0.1290354 secs] 26282K->2311K(32704K), 0.1293306 secs]
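
Reading these: each entry shows heap occupancy before -> after the collection,
the total size of the region in parentheses, and the pause duration; the
[Full GC ...] lines are the major collections this thread is trying to avoid.
Annotated (the annotation is mine):

    [GC 325407K->83000K(776768K), 0.2300771 secs]
        used before -> used after (total heap), pause time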


Re: Solr and Garbage Collection

2009-09-28 Thread Mark Miller
Another good option.

Here is a comparison of the commands I replied with and this one:

http://docs.hp.com/en/5992-5899/ch06s02.html

Very similar.


Re: Solr and Garbage Collection

2009-09-28 Thread Bill Au
One way to track expensive queries is to look at the query time, QTime, in the
solr log.
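
For example, a rough way to pull the largest QTime values out of a log (the
file name is illustrative; this assumes the standard request-log lines that
end in QTime=<millis>):

    grep -o 'QTime=[0-9]*' solr.log | sort -t= -k2 -rn | head -20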
There are a couple of tools for analyzing gc logs:

http://www.tagtraum.com/gcviewer.html
https://h20392.www2.hp.com/portal/swdepot/displayProductInfo.do?productNumber=HPJMETER

They will give you frequency and duration of minor and major collections.

On a multi-processor/core system with CPU cycles to spare, using the
concurrent collector will reduce (and may even eliminate) major collections.  The
trade-off is that CPU utilization on the system will go up.  When I tried it
with one of my Java apps, the system utilization went up so much under heavy
load that it reduced the overall throughput of the app.  Your mileage may
vary.  You will have to measure it for your app to see for yourself.

Bill


Re: Solr and Garbage Collection

2009-09-27 Thread Jonathan Ariel
Yes, it seems like a bug. I will update my JVM, try again and let you
know the results :)


Re: Solr and Garbage Collection

2009-09-27 Thread Jonathan Ariel
Well.. it is strange that when I use the default GC I don't get any errors.
If I'm so close to running out of memory I should see those OOM exceptions as
well with the standard GC. BTW, I'm faceting on around 13 fields and my total
number of unique values is around 3.
One of the fields with the biggest amount of unique values has almost 16000
unique values.


On Sun, Sep 27, 2009 at 4:32 PM, Fuad Efendi f...@efendi.ca wrote:

 Mark,


 Nothing against orange-hat :)

 Nothing against GC tuning; but if SOLR needs application-specific settings
 it should be well-documented.

  GC-tuning: for instance, we need it for 'realtime' Online Trading
  applications. However, even Online Banking doesn't need it; the primary
  reason is that GC must happen 'outside of the current transaction', GC 'must
  be predictable', and (for instance) Oracle/BEA JRockit has a specific
  'realtime' version for that... Does SOLR need that?


  Having a load-stress simulator (multithreaded!!!) will definitely help to
  predict any possible bottleneck (see the sketch after this message)... it's
  even better to write it from scratch (it depends on the schema!), by sending
  random requests to SOLR in parallel... instead of waiting when FieldCache
  tries to add a new FieldImpl to the cache (unpredictable!)


  Tomcat is multithreaded; what if end-users need to load 1000s of large
  documents (in parallel! 1000s of concurrent users) - can you predict memory
  requirements and GC options without application-specific knowledge? What
  about new SOLR-Caches warming up?


 -Fuad
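
A minimal sketch of the multithreaded load-stress idea above, in Java (the
URL, query terms, facet field, request count, and thread count are all
illustrative assumptions, not taken from this thread):

import java.io.InputStream;
import java.net.URL;
import java.net.URLEncoder;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SolrLoadTest {
    // Illustrative query terms; a real test should sample terms from the actual index.
    private static final String[] TERMS = { "foo", "bar", "baz" };

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(100); // 100 concurrent clients
        final Random random = new Random();
        for (int i = 0; i < 10000; i++) {          // total number of requests to fire
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        String q = URLEncoder.encode(TERMS[random.nextInt(TERMS.length)], "UTF-8");
                        // Faceted query so the FieldCache actually gets exercised.
                        URL url = new URL("http://localhost:8983/solr/select?q=" + q
                                + "&facet=true&facet.field=category");
                        InputStream in = url.openStream();
                        byte[] buf = new byte[4096];
                        while (in.read(buf) != -1) { /* drain the response */ }
                        in.close();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();
    }
}

Sampling real terms from the index and watching the GC log while this runs
would make it a more honest simulation.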


  -Original Message-
  From: Mark Miller [mailto:markrmil...@gmail.com]
  Sent: September-27-09 2:46 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Solr and Garbage Collection
 
   If he needed double the RAM, he'd likely know by now :) The JVM likes to
   throw OOM exceptions when you need more RAM. Until it does - that's an
   odd path to focus on. There has been no indication he has ever seen an
   OOM with his over 10 GB heap.  It sounds like he has run Solr in his
   environment for quite a long time - after running for that long, until
   he gets an OOM, it's about as good as chasing ghosts to worry about it.
 
  I like to think of GC tuning as orange-hat. Mostly because I like the
  color orange.
 
  Fuad Efendi wrote:
   Ok. After the server ran for more than 12 hours, the time spent on GC
   decreased from 11% to 3,4%, but 5 hours later it crashed.
  
  
   All this 'black-hat' GC tuning and 'fast' object moving (especially
   objects being accessed by some thread during GC defragmentation)

   - try to use multithreaded load-stress tools (at least 100 requests
   in parallel) and see that you need at least double the memory if 12Gb is
   the threshold for your FieldCache (largest objects)
  
  
   Also, don't trust these counters:
  
   So I logged the Garbage Collection activity to check if it's because
 of
   that. It seems like 11% of the time the application runs, it is
 stopped
   because of GC.
  
  
  
   Stopped? Of course - locking/unlocking in order to move objects currently
   accessed in multiuser-multithreaded Tomcat... you can easily create a crash
   scenario proving that latest-greatest JVMs are buggy too.
  
  
  
   Don't forget: Tomcat is multithreaded, and if 'core' needs 10Gb in order to
   avoid OOM, you need to double it (in order to warm new cache instances on
   index replica / update).
  
  
   http://www.linkedin.com/in/liferay
  
  
  
 
 
  --
  - Mark
 
  http://www.lucidimagination.com
 
 






Re: Solr and Garbage Collection

2009-09-27 Thread Jonathan Ariel
Right... when I increased it to 12GB all OOM just disappear. And all the
tests are being run on the live environment and for several hours, so it is
real enough :)As soon as I update JVM and test again the GC I will let you
know. If you think I can run another test meanwhile just let me know.

On Sun, Sep 27, 2009 at 5:05 PM, Mark Miller markrmil...@gmail.com wrote:

 Jonathan Ariel wrote:
  Well.. it is strange that when I use the default GC I don't get any
 errors.

 Not so strange - it's different code. The bug is likely in the low pause
 collector and not the serial collector.
  If I'm so close to run out of memory I should see those OOM exceptions as
  well with the standard GC.
 Those? You're not seeing any that you mentioned unless you lower your heap?


 --
 - Mark

 http://www.lucidimagination.com






Re: Solr and Garbage Collection

2009-09-27 Thread Bill Au
You are running a very old version of Java 6 (update 6).  The latest is
update 16.  You should definitely upgrade.  There is a bug in Java 6
starting with update 4 that may result in a corrupted Lucene/Solr index:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6707044
https://issues.apache.org/jira/browse/LUCENE-1282
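
Checking which update a node is actually on is quick; the output below is
illustrative for update 6 (the HotSpot build matches the 10.0-b22 shown in
the crash dump):

$ java -version
java version "1.6.0_06"
Java(TM) SE Runtime Environment (build 1.6.0_06-b02)
Java HotSpot(TM) 64-Bit Server VM (build 10.0-b22, mixed mode)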

The JVM crash occurred in the gc thread.  So it looks like a bug in the JVM
itself.  Upgrading to the latest release might help.  Switching to a
different garbage collector should help.

Bill


Re: Solr and Garbage Collection

2009-09-26 Thread Mark Miller
Jonathan Ariel wrote:
 I have around 8M documents.
   
That's actually not so bad - I take it you are faceting/sorting on quite
a few unique fields?

 I set up my server to use a different collector and it seems like it
 decreased from 11% to 4%, of course I need to wait a bit more because it is
 just a 1 hour old log. But it seems like it is much better now.
 I will tell you on Monday the results :)
   
Are you still seeing major collections then? (e.g. the tenured space hits
its limit) You might be able to get even better.
 On Fri, Sep 25, 2009 at 6:07 PM, Mark Miller markrmil...@gmail.com wrote:

   
  That's a good point too - if you can reduce your need for such a large
  heap, by all means, do so.

 However, considering you already need at least 10GB or you get OOM, you
 have a long way to go with that approach. Good luck :)

  How many docs do you have? I'm guessing it's mostly FieldCache type
  stuff, and that's the type of thing you can't really side-step, unless
  you give up the functionality that's using it.

 Grant Ingersoll wrote:
 
 On Sep 25, 2009, at 9:30 AM, Jonathan Ariel wrote:

   
 Hi to all!
 Lately my solr servers seem to stop responding once in a while. I'm
 using
 solr 1.3.
 Of course I'm having more traffic on the servers.
 So I logged the Garbage Collection activity to check if it's because of
 that. It seems like 11% of the time the application runs, it is stopped
  because of GC. And sometimes the GC takes up to 10 seconds!
  Is this normal? My instances run on 16GB RAM, Dual Quad Core Intel Xeon
  servers. My index is around 10GB and I'm giving the instances 10GB of
 RAM.

  How can I check which GC is being used? If I'm right, JVM
  Ergonomics should use the Throughput GC, but I'm not 100% sure. Do you have
  any recommendation on this?
 
 As I said in Eteve's thread on JVM settings, some extra time spent on
 application design/debugging will save a whole lot of headache in
 Garbage Collection and trying to tune the gazillion different options
 available.  Ask yourself:  What is on the heap and does it need to be
 there?  For instance, do you, if you have them, really need sortable
 ints?   If your servers seem to come to a stop, I'm going to bet you
 have major collections going on.  Major collections in a production
 system are very bad.  They tend to happen right after commits in
 poorly tuned systems, but can also happen in other places if you let
 things build up due to really large heaps and/or things like really
 large cache settings.  I would pull up jConsole and have a look at
 what is happening when the pauses occur.  Is it a major collection?
 If so, then hook up a heap analyzer or a profiler and see what is on
 the heap around those times.  Then have a look at your schema/config,
 etc. and see if there are things that are memory intensive (sorting,
 faceting, excessively large filter caches).

 --
 Grant Ingersoll
 http://www.lucidimagination.com/

 Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
 using Solr/Lucene:
 http://www.lucidimagination.com/search

   
 --
 - Mark

 http://www.lucidimagination.com




 

   


-- 
- Mark

http://www.lucidimagination.com





Re: Solr and Garbage Collection

2009-09-26 Thread Jonathan Ariel
Ok. After the server ran for more than 12 hours, the time spent on GC
decreased from 11% to 3.4%, but 5 hours later it crashed. This is the thread
dump, maybe you can help identify what happened?
#
# An unexpected error has been detected by Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x2b4e0f69ea2a, pid=32224, tid=1103812928
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed mode
linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x265a2a]
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#

---  T H R E A D  ---

Current thread (0x5be47400):  VMThread [stack:
0x41bad000,0x41cae000] [id=32249]

siginfo:si_signo=SIGSEGV: si_errno=0, si_code=128 (),
si_addr=0x

Registers:
RAX=0x2aac929b4c70, RBX=0x0037c985003a095e, RCX=0x0006,
RDX=0x005c49870037c996
RSP=0x41cac550, RBP=0x41cac550, RSI=0x2aac929b4c70,
RDI=0x0037c985003a095e
R8 =0x2aadab201538, R9 =0x0005, R10=0x0001,
R11=0x0010
R12=0x2aac929b4c70, R13=0x2aac9289cf58, R14=0x2aac9289cf40,
R15=0x2aadab2015ac
RIP=0x2b4e0f69ea2a, EFL=0x00010206, CSGSFS=0x0033,
ERR=0x
  TRAPNO=0x000d

Top of Stack: (sp=0x41cac550)
0x41cac550:   41cac580 2b4e0f903c5b
0x41cac560:   41cac590 0003
0x41cac570:   2aac9289cf50 2aadab2015a8
0x41cac580:   41cac5c0 2b4e0f72e388
0x41cac590:   41cac5c0 2aac9289cf40
0x41cac5a0:   0005 2b4e0fc86330
0x41cac5b0:    2b4e0fd8c740
0x41cac5c0:   41cac5f0 2b4e0f903b7f
0x41cac5d0:   41cac610 0003
0x41cac5e0:   2aaccb1750f8 2aaccea41570
0x41cac5f0:   41cac610 2b4e0f931548
0x41cac600:   2b4e0fc861d8 2aadd4052ab0
0x41cac610:   41cac640 2b4e0f903d1a
0x41cac620:   41cac650 0003
0x41cac630:   5bc7d6d0 2b4e0fd8c740
0x41cac640:   41cac650 2b4e0f90411c
0x41cac650:   41cac680 2b4e0fa1d16e
0x41cac660:    5bc7d6d0
0x41cac670:   0002 2b4e0fd8c740
0x41cac680:   41cac6c0 2b4e0fa74640
0x41cac690:   41cac6b0 5bc7d6d0
0x41cac6a0:   0002 2b4e0fd8c740
0x41cac6b0:   0001 2b4e0fd8c740
0x41cac6c0:   41cac700 2b4e0f9a52da
0x41cac6d0:   bfc0 
0x41cac6e0:   2b4e0fd8c740 5bc7d6d0
0x41cac6f0:   2b4e0fd8c740 0001
0x41cac700:   41cac750 2b4e0f6feb80
0x41cac710:   449dae1d9ae42358 3ff0cccd
0x41cac720:   2aad289aa680 0001
0x41cac730:    41cac780
0x41cac740:   0001 5bc7d6d0

Instructions: (pc=0x2b4e0f69ea2a)
0x2b4e0f69ea1a:   89 e5 48 83 f9 05 74 38 48 8b 56 08 48 83 c2 10
0x2b4e0f69ea2a:   48 8b b2 a0 00 00 00 ba 01 00 00 00 83 e6 07 48

Stack: [0x41bad000,0x41cae000],  sp=0x41cac550,
 free space=1021k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native
code)
V  [libjvm.so+0x265a2a]
V  [libjvm.so+0x4cac5b]
V  [libjvm.so+0x2f5388]
V  [libjvm.so+0x4cab7f]
V  [libjvm.so+0x4f8548]
V  [libjvm.so+0x4cad1a]
V  [libjvm.so+0x4cb11c]
V  [libjvm.so+0x5e416e]
V  [libjvm.so+0x63b640]
V  [libjvm.so+0x56c2da]
V  [libjvm.so+0x2c5b80]
V  [libjvm.so+0x2c8866]
V  [libjvm.so+0x2c7f10]
V  [libjvm.so+0x2551ba]
V  [libjvm.so+0x254a6a]
V  [libjvm.so+0x254778]
V  [libjvm.so+0x2c579c]
V  [libjvm.so+0x23502a]
V  [libjvm.so+0x2c5b0e]
V  [libjvm.so+0x661a5e]
V  [libjvm.so+0x66e48a]
V  [libjvm.so+0x66da32]
V  [libjvm.so+0x66dcb4]
V  [libjvm.so+0x66d7ae]
V  [libjvm.so+0x50628a]

VM_Operation (0x4076bd20): GenCollectForAllocation, mode: safepoint,
requested by thread 0x5c42d800


---  P R O C E S S  ---

Java Threads: ( = current thread )
  0x5c466400 JavaThread btpool0-502 [_thread_blocked, id=4508,
stack(0x46332000,0x46433000)]
  0x5c2a2400 JavaThread btpool0-501 [_thread_blocked, id=4507,
stack(0x428f8000,0x429f9000)]
  0x5c0fec00 JavaThread btpool0-500 [_thread_blocked, id=4506,
stack(0x43e0d000,0x43f0e000)]
  0x5c2ce400 JavaThread btpool0-498 [_thread_blocked, id=4504,
stack(0x42dfd000,0x42efe000)]
  0x5be69000 JavaThread btpool0-497 [_thread_blocked, id=4503,
stack(0x45f2e000,0x4602f000)]
  0x5c30e000 JavaThread btpool0-496 [_thread_blocked, id=4251,

Re: Solr and Garbage Collection

2009-09-26 Thread Mark Miller
Jonathan Ariel wrote:
 Ok. After the server ran for more than 12 hours, the time spent on GC
 decreased from 11% to 3.4%, but 5 hours later it crashed. This is the thread
 dump, maybe you can help identify what happened?
   
Well that's a tough one ;) My guess is it's a bug :)

Your two survivor spaces are filled, so it was likely about to move
objects into the tenured space, which still has plenty of room for them
(barring horrible fragmentation). Any issues with that type of thing
should generate an OOM anyway though. You can find people that have run
into similar issues in the past, but a lot of times unreproducible.
Usually, their bugs are closed and they are told to try a newer JVM.

Your JVM appears to be quite a few versions back. There have been many
garbage collection bugs fixed in the 7 or so updates since your version,
a good handful of them related to CMS.

If you can, my best suggestion at the moment is to upgrade to the latest
and see how that fares.

If not, you might see if going back to the throughput collector and
turning on the parallel tenured space collector might meet your needs
instead. You can work with other params to get that going better if you
have to as well.
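
In flag terms, that combination would be roughly (a sketch using the standard
HotSpot names):

    -XX:+UseParallelGC -XX:+UseParallelOldGC

The first flag selects the throughput collector; the second parallelizes
collection of the tenured space as well.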

Also, adjusting other settings with the low pause collector might
trigger something to side step the bug. Not a great option there though ;)

How many unique fields are you sorting/faceting on? It must be a lot if
you need 10 gig for 8 million documents. It's kind of rough to have to
work at such a close limit to your total heap available as a min mem
requirement.

-- 
- Mark

http://www.lucidimagination.com



Re: Solr and Garbage Collection

2009-09-26 Thread Mark Miller
Also, in case the info might help track something down:

It's pretty darn odd that both your survivor spaces are full. I've never
seen that in one of these dumps. Always one is empty: when one is
filled, the live objects are copied to the other, then back, and forth, for
a certain number of times until they are moved into the tenured space (see
the flags below). Both being filled like that really seems like a bug to me -
I've looked over tons of dumps in the past (random ones online), and I have
never seen a dump where neither survivor space was empty.
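
The number of copies before promotion is the tenuring threshold; two standard
HotSpot flags for watching and capping it (the value is illustrative):

    -XX:+PrintTenuringDistribution
    -XX:MaxTenuringThreshold=4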



-- 
- Mark

http://www.lucidimagination.com





RE: Solr and Garbage Collection

2009-09-25 Thread cbennett
Hi,

Have you looked at tuning the garbage collection?

Take a look at the following articles:

http://www.lucidimagination.com/blog/2009/09/19/java-garbage-collection-boot-camp-draft/
http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html

Changing to the concurrent or throughput collector should help with the long
pauses.
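
In flag terms, that choice is roughly (a sketch using the standard HotSpot
names):

    -XX:+UseConcMarkSweepGC    (concurrent, low-pause collector)
    -XX:+UseParallelGC         (throughput collector)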


Colin.

-Original Message-
From: Jonathan Ariel [mailto:ionat...@gmail.com] 
Sent: Friday, September 25, 2009 11:37 AM
To: solr-user@lucene.apache.org; yo...@lucidimagination.com
Subject: Re: Solr and Garbage Collection

Right, now I'm giving it 12GB of heap memory.
If I give it less (10GB) it throws the following exception:

Sep 5, 2009 7:18:32 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:361)
        at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
        at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:352)
        at org.apache.solr.request.SimpleFacets.getFieldCacheCounts(SimpleFacets.java:267)
        at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:185)
        at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:207)
        at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:104)
        at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:70)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
        at org.mortbay.jetty.Server.handle(Server.java:285)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
        at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

On Fri, Sep 25, 2009 at 10:55 AM, Yonik Seeley yo...@lucidimagination.com wrote:

 On Fri, Sep 25, 2009 at 9:30 AM, Jonathan Ariel ionat...@gmail.com
 wrote:
  Hi to all!
  Lately my solr servers seem to stop responding once in a while. I'm
using
  solr 1.3.
  Of course I'm having more traffic on the servers.
  So I logged the Garbage Collection activity to check if it's because of
  that. It seems like 11% of the time the application runs, it is stopped
  because of GC. And some times the GC takes up to 10 seconds!
  Is it normal? My instances run on 16GB RAM, Dual Quad Core Intel Xeon
  servers. My index is around 10GB and I'm giving the instances 10GB of
  RAM.

 Bigger heaps lead to bigger GC pauses in general.
 Do you mean that you are giving the JVM a 10GB heap?  Were you getting
 OOM exceptions with a smaller heap?

 -Yonik
 http://www.lucidimagination.com






RE: Solr and Garbage Collection

2009-09-25 Thread Fuad Efendi
Give it even more memory.

Lucene's FieldCache stores (document ID -> field value) pairs for
non-tokenized, single-valued, non-boolean fields, and it is loaded in
full, for instance when sorting query results.

So if you have 100,000,000 documents and a heavily distributed field
(high cardinality, values around 100 bytes each!), you need
10,000,000,000 bytes for just this one instance of the FieldCache.

GC does not play any role here. The FieldCache won't be GC-collected.
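
As a back-of-envelope sketch of that arithmetic (the numbers are the
illustrative ones above, not measurements from any real index):

    // Rough FieldCache sizing for ONE sorted/faceted field.
    // Assumes one cached value per document, ~100 bytes each.
    public class FieldCacheEstimate {
        public static void main(String[] args) {
            long numDocs = 100000000L;  // 100M documents
            long bytesPerValue = 100L;  // average cached value size
            long totalBytes = numDocs * bytesPerValue;
            System.out.printf("~%.1f GB resident, and never collected%n",
                    totalBytes / (1024.0 * 1024.0 * 1024.0));
        }
    }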


-Fuad
http://www.linkedin.com/in/liferay







RE: Solr and Garbage Collection

2009-09-25 Thread Fuad Efendi
 You are saying that I should give more memory than 12GB?


Yes. Look at this:

  SEVERE: java.lang.OutOfMemoryError: Java heap space
  at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:361)

It can't find a few (!!!) contiguous bytes for .createValue(...)

It can't add a (field value, document ID) pair to an array.

GC tuning won't help in this specific case...

Maybe SOLR/Lucene core developers could warm the FieldCache at IndexReader
opening time, in the future... to get an early OOM...


Avoiding faceting (and sorting) on such a field will only postpone the OOM
to an unpredictable date/time...


-Fuad
http://www.linkedin.com/in/liferay





Re: Solr and Garbage Collection

2009-09-25 Thread Mark Miller
It won't really - it will just keep the JVM from wasting time resizing
the heap on you. Since you know you need that much RAM anyway, there's no
reason not to just pin it at what you need.
It's not going to help you much with GC though.

Jonathan Ariel wrote:
 BTW, why would making them equal lower the frequency of GC?

 On 9/25/09, Fuad Efendi f...@efendi.ca wrote:
   
 Bigger heaps lead to bigger GC pauses in general.
   
 Opposite viewpoint:
 1sec GC happening once an hour is MUCH BETTER than 30ms GC once-per-second.

 To lower frequency of GC: -Xms4096m -Xmx4096m (make it equal!)

 Use -server option.

 -server option of JVM is 'native CPU code', I remember WebLogic 7 console
 with SUN JVM 1.3 not showing any GC (just horizontal line).

 -Fuad
 http://www.linkedin.com/in/liferay




 


-- 
- Mark

http://www.lucidimagination.com





Re: Solr and Garbage Collection

2009-09-25 Thread Mark Miller
-server option of JVM is 'native CPU code', I remember WebLogic 7 console
with SUN JVM 1.3 not showing any GC (just horizontal line).

Not sure what that is all about either. -server and -client are just two
different versions of HotSpot. The -server version is optimized for
long-running applications - it starts slower, and over time it learns
about your app and makes good throughput optimizations.

The -client HotSpot version warms up more quickly and concentrates more
on response time than throughput. Better for desktop apps. -server is
better for long-lived server apps, generally.



-- 
- Mark

http://www.lucidimagination.com





RE: Solr and Garbage Collection

2009-09-25 Thread Walter Underwood
30ms is not better or worse than 1s until you look at the service
requirements. For many applications, it is worth dedicating 10% of your
processing time to GC if that makes the worst-case pause short.

On the other hand, my experience with the IBM JVM was that the maximum query
rate was 2-3X better with the concurrent generational GC compared to any of
their other GC algorithms, so we got the best throughput along with the
shortest pauses.

Solr garbage generation (for queries) seems to have two major components:
per-request garbage and cache evictions. With a generational collector,
these two are handled by separate parts of the collector. Per-request
garbage should completely fit in the short-term heap (nursery), so that it
can be collected rapidly and returned to use for further requests. If the
nursery is too small, the per-request allocations will be made in tenured
space and sit there until the next major GC. Cache evictions are almost
always in long-term storage (tenured space) because an LRU algorithm
guarantees that the garbage will be old.

Check the growth rate of tenured space (under constant load, of course)
while increasing the size of the nursery. That rate should drop when the
nursery gets big enough, then not drop much further as it is increased more.

After that, reduce the size of tenured space until major GCs start happening
too often (a judgment call). A bigger tenured space means longer major GCs
and thus longer pauses, so you don't want it oversized by too much.
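
For example, a sketch of Sun JVM flags for running that experiment (the
sizes are placeholders to adjust, not recommendations):

    java -server -Xms10g -Xmx10g -Xmn2g \
         -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
         -jar start.jar

-Xmn pins the nursery size; the -verbose:gc / -XX:+PrintGCDetails output
is what you would watch to see how fast tenured space grows as you vary
-Xmn.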

Also check the hit rates of your caches. If the hit rate is low, say 20% or
less, make that cache much bigger or set it to zero. Either one will reduce
the number of cache evictions. If you have an HTTP cache in front of Solr,
zero may be the right choice, since the HTTP cache is cherry-picking the
easily cacheable requests.

Note that a commit nearly doubles the memory required, because you have two
live Searcher objects with all their caches. Make sure you have headroom for
a commit.

If you want to test the tenured space usage, you must test with real world
queries. Those are the only way to get accurate cache eviction rates.

wunder









Re: Solr and Garbage Collection

2009-09-25 Thread Mark Miller
Walter Underwood wrote:
 30ms is not better or worse than 1s until you look at the service
 requirements. For many applications, it is worth dedicating 10% of your
 processing time to GC if that makes the worst-case pause short.

 On the other hand, my experience with the IBM JVM was that the maximum query
 rate was 2-3X better with the concurrent generational GC compared to any of
 their other GC algorithms, so we got the best throughput along with the
 shortest pauses.
   
With which collector? Since the very early JVMs, all GC is generational.
Most of the collectors (other than the Serial Collector) also work
concurrently. By default, they are concurrent on different generations,
but you can now add concurrency to the other generation with each too.
 Solr garbage generation (for queries) seems to have two major components:
 per-request garbage and cache evictions. With a generational collector,
 these two are handled by separate parts of the collector.
Different parts of the collector? It's a different collector depending on
the generation. The young generation is collected with a copy collector,
because almost all the objects in the young generation are likely dead
and a copy collector only needs to visit live objects, so it's very
efficient. The tenured generation uses something more along the lines of
mark-and-sweep or mark-and-compact.
  Per-request
 garbage should completely fit in the short-term heap (nursery), so that it
 can be collected rapidly and returned to use for further requests. If the
 nursery is too small, the per-request allocations will be made in tenured
 space and sit there until the next major GC. Cache evictions are almost
 always in long-term storage (tenured space) because an LRU algorithm
 guarantees that the garbage will be old.

 Check the growth rate of tenured space (under constant load, of course)
 while increasing the size of the nursery. That rate should drop when the
 nursery gets big enough, then not drop much further as it is increased more.

 After that, reduce the size of tenured space until major GCs start happening
 too often (a judgment call). A bigger tenured space means longer major GCs
 and thus longer pauses, so you don't want it oversized by too much.
   
With the concurrent low pause collector, the goal is to avoid major
collections by collecting *before* the tenured space is filled. If you
are getting major collections, you need to tune your settings - the whole
point of that collector is to avoid major collections, and to do almost
all of the work while your application is not paused. There are still 2
brief pauses during the collection, but they should not be significant at
all.
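(On the Sun JVM that collector is enabled with -XX:+UseConcMarkSweepGC.)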
 Also check the hit rates of your caches. If the hit rate is low, say 20% or
 less, make that cache much bigger or set it to zero. Either one will reduce
 the number of cache evictions. If you have an HTTP cache in front of Solr,
 zero may be the right choice, since the HTTP cache is cherry-picking the
 easily cacheable requests.

 Note that a commit nearly doubles the memory required, because you have two
 live Searcher objects with all their caches. Make sure you have headroom for
 a commit.

 If you want to test the tenured space usage, you must test with real world
 queries. Those are the only way to get accurate cache eviction rates.

 wunder



-- 
- Mark

http://www.lucidimagination.com





RE: Solr and Garbage Collection

2009-09-25 Thread Walter Underwood
As I said, I was using the IBM JVM, not the Sun JVM. The concurrent low
pause collector is only in the Sun JVM.

I just found this excellent article about the various IBM GC options for a
Lucene application with a 100GB heap:

http://www.nearinfinity.com/blogs/aaron_mccurry/tuning_the_ibm_jvm_for_large_h.html

wunder







Re: Solr and Garbage Collection

2009-09-25 Thread Jonathan Ariel
Ok. I will try with the concurrent low pause collector and let you know
the results.







Re: Solr and Garbage Collection

2009-09-25 Thread Mark Miller
My bad - later, it looks as if you're giving general advice, and that's
what I took issue with.

Any collector that is not doing generational collection is essentially
from the dark ages and shouldn't be used.

Any collector that doesn't have concurrent options - unless you're
running a tiny app (under 100MB of RAM) or only have a single CPU - is
also dark ages, and not fit for a server environment.

I haven't kept up with IBM's JVM, but it sounds like they are well behind
Sun in GC then.

- Mark


RE: Solr and Garbage Collection

2009-09-25 Thread Walter Underwood
For batch-oriented computing, like Hadoop, the most efficient GC is probably
a non-concurrent, non-generational GC. I doubt that there are many
batch-oriented applications of Solr, though.

The rest of the advice is intended to be general and it sounds like we agree
about sizing. If the nursery is not big enough, the tenured space will be
used for allocations that have a short lifetime and that will increase the
length and/or frequency of major collections.

Cache evictions are the interesting part, because they cause a constant rate
of tenured space garbage. In many servers, you can get a big enough
nursery that major collections are very rare. That won't happen in Solr,
because of cache evictions.

The IBM JVM is excellent. Their concurrent generational GC policy is
gencon.
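(On the command line that policy is selected with -Xgcpolicy:gencon, if I
remember correctly - check the IBM docs for your JVM version.)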

wunder


Re: Solr and Garbage Collection

2009-09-25 Thread Mark Miller
Walter Underwood wrote:
 For batch-oriented computing, like Hadoop, the most efficient GC is probably
 a non-concurrent, non-generational GC. 
Okay - for batch we somewhat agree, I guess - if you can stand any length
of pausing, non-concurrent can be nice, because you don't pay for thread
sync communication. Only with a small heap size though (less than 100MB
is what I've seen); you would pause the batch job while GC takes place.
But if you have 8 processors and you are pausing all of them to collect a
large heap using only 1 processor, that doesn't make much sense to me.
The thread communication pain will be far outweighed by using more
processors to do the collection faster, so the world isn't stopped for
your batch job as long. Stopping your application dead in its tracks, and
then only using one of the available processors to collect a large heap
while the rest sit idle, doesn't make much sense.

I also don't agree that it ever really makes sense not to do generational
collection. What is your argument here? Generational collection is
**way** more efficient for short-lived objects, which tend to be up to
98% of the objects in most applications. The only way I see that making
sense is if you have almost no short-lived objects (which occurs in what,
.0001% of apps, if at all?). The Sun JVM doesn't even offer a
non-generational approach anymore. It's just standard GC practice.
 I doubt that there are many
 batch-oriented applications of Solr, though.

 The rest of the advice is intended to be general and it sounds like we agree
 about sizing. If the nursery is not big enough, the tenured space will be
 used for allocations that have a short lifetime and that will increase the
 length and/or frequency of major collections.
   
Yes - I wasn't arguing with every point - I was picking and choosing :)
After the heap size, the size of the young generation is the most
important factor.
 Cache evictions are the interesting part, because they cause a constant rate
 of tenured space garbage. In most many servers, you can get a big enough
 nursery that major collections are very rare. That won't happen in Solr
 because of cache evictions.

 The IBM JVM is excellent. Their concurrent generational GC policy is
 gencon.
   
Yeah, I actually know very little about the IBM JVM, so I wasn't really
commenting. But from the info I gleaned here and in a couple of quick web
searches, I'm not too impressed by its GC.
 wunder


Re: Solr and Garbage Collection

2009-09-25 Thread Jonathan Ariel
Ok. I'll first change the GC and see if the time spent decreased. Then
I'll try increasing the heap as Fuad recommends.

On 9/25/09, Mark Miller markrmil...@gmail.com wrote:
 When we talk about Collectors, we are not just talking about
 collecting - whatever that means. There isn't really a collecting
 phase - the whole algorithm is garbage collecting - hence calling the
 different implementations collectors.

 Usually, fragmentation is dealt with using a mark-compact collector (or
 IBM has used a mark-sweep-compact collector).
 Copying collectors are not only super efficient at collecting young
 spaces, but they are also great for fragmentation - when you copy
 everything to the new space, you can remove any fragmentation. At the
 cost of double the space requirements though.

  So mark-compact is a compromise. First you mark what's reachable, then
  everything that's marked is copied/compacted to the bottom of the heap.
  It's all part of a collection though.

 Jonathan Ariel wrote:
  Maybe what's missing here is how I got the 11%. I just ran Solr with the
  following JVM params: -XX:+PrintGCApplicationConcurrentTime and
  -XX:+PrintGCApplicationStoppedTime. With those I can measure the amount
  of time the application runs between collection pauses and the length of
  the collection pauses, respectively.
  I think that in this case the 11% is just for memory collection and not
  defragmentation... but I'm not 100% sure.
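
Assuming those flags produce the standard HotSpot lines ("Application
time: N seconds" and "Total time for which application threads were
stopped: N seconds" - an assumption to verify against your own log), a
small sketch that computes the stopped percentage:

    import java.io.BufferedReader;
    import java.io.FileReader;

    // Sums the running and stopped intervals reported by
    // -XX:+PrintGCApplicationConcurrentTime and
    // -XX:+PrintGCApplicationStoppedTime, then prints the stopped share.
    public class GcStoppedTime {
        public static void main(String[] args) throws Exception {
            double running = 0, stopped = 0;
            BufferedReader in = new BufferedReader(new FileReader(args[0]));
            for (String line; (line = in.readLine()) != null; ) {
                if (line.startsWith("Application time:")) {
                    running += seconds(line);
                } else if (line.startsWith(
                        "Total time for which application threads were stopped:")) {
                    stopped += seconds(line);
                }
            }
            in.close();
            System.out.printf("stopped %.1f%% of wall time%n",
                    100.0 * stopped / (running + stopped));
        }

        // each matching line ends with "... <value> seconds"
        private static double seconds(String line) {
            String[] f = line.trim().split("\\s+");
            return Double.parseDouble(f[f.length - 2]);
        }
    }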

 On Fri, Sep 25, 2009 at 5:05 PM, Fuad Efendi f...@efendi.ca wrote:


  But again, GC is not just garbage collection as many in this thread
  think... it is also memory defragmentation, which is much more costly
  than collection because it has to move _live_ objects somewhere (and
  wait/lock until such objects are unlocked so they can be moved...) -
  obviously more memory helps...

 11% is extremely high.


 -Fuad
 http://www.linkedin.com/in/liferay



 -Original Message-
 From: Jonathan Ariel [mailto:ionat...@gmail.com]
 Sent: September-25-09 3:36 PM
 To: solr-user@lucene.apache.org
 Subject: Re: FW: Solr and Garbage Collection

  I'm not planning on lowering the heap. I just want to lower the time
  wasted on GC, which is 11% right now. So what I'll try is changing the
  GC to -XX:+UseConcMarkSweepGC

 On Fri, Sep 25, 2009 at 4:17 PM, Fuad Efendi f...@efendi.ca wrote:


  Mark,

  what if a piece of code needs 10 contiguous KB to load a document field?
  How are locked memory pieces optimized/moved (putting almost the whole
  application on hold)?
  Lowering the heap is a _bad_ idea; we will have extremely frequent GC
  (compaction of live objects!!!) even if RAM is (theoretically) enough.

 -Fuad



  Fuad, you didn't read the thread right.

  He is not having a problem with OOM. He got the OOM because he lowered
  the heap to try and help GC.

  He normally runs with a heap that can handle his FieldCache.

  Please re-read the thread. You are confusing the thread.

  - Mark



  GC will frequently happen even if RAM is more than enough, in case it
  is heavily sparse... so have even more RAM!
  -Fuad









 --
 - Mark

 http://www.lucidimagination.com






Re: Solr and Garbage Collection

2009-09-25 Thread Grant Ingersoll


On Sep 25, 2009, at 9:30 AM, Jonathan Ariel wrote:

 Hi to all!
 Lately my solr servers seem to stop responding once in a while. I'm using
 solr 1.3.
 Of course I'm having more traffic on the servers.
 So I logged the Garbage Collection activity to check if it's because of
 that. It seems like 11% of the time the application runs, it is stopped
 because of GC. And some times the GC takes up to 10 seconds!
 Is it normal? My instances run on 16GB RAM, Dual Quad Core Intel Xeon
 servers. My index is around 10GB and I'm giving the instances 10GB of RAM.

 How can I check which GC is being used? If I'm right, JVM Ergonomics
 should use the Throughput GC, but I'm not 100% sure. Do you have any
 recommendation on this?


As I said in Eteve's thread on JVM settings, some extra time spent on
application design/debugging will save a whole lot of headache in Garbage
Collection and trying to tune the gazillion different options available.
Ask yourself: What is on the heap, and does it need to be there? For
instance, do you, if you have them, really need sortable ints? If your
servers seem to come to a stop, I'm going to bet you have major
collections going on. Major collections in a production system are very
bad. They tend to happen right after commits in poorly tuned systems, but
can also happen in other places if you let things build up due to really
large heaps and/or things like really large cache settings. I would pull
up jConsole and have a look at what is happening when the pauses occur.
Is it a major collection? If so, then hook up a heap analyzer or a
profiler and see what is on the heap around those times. Then have a look
at your schema/config, etc. and see if there are things that are memory
intensive (sorting, faceting, excessively large filter caches).
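
For example (standard Sun JDK tools; the pid 12345 is hypothetical):

    jconsole 12345           # watch the generations and GC activity live
    jmap -histo:live 12345   # histogram of live objects on the heap

(Note that jmap -histo:live forces a full GC first, so use it sparingly on
a busy production node.)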


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: Solr and Garbage Collection

2009-09-25 Thread Mark Miller
Jonathan Ariel wrote:
 How can I check which GC is being used? If I'm right, JVM Ergonomics
 should use the Throughput GC, but I'm not 100% sure. Do you have any
 recommendation on this?

   
Just to straighten out this one too - Ergonomics doesn't use throughput
- throughput is the collector that allows Ergonomics ;)

And throughput is the default as long as your machine is detected as
server class.

But throughput is not great with large tenured spaces out of the box. It
only parallelizes the new space collection. You have to turn on an
option to get parallel tenured collection as well - which is essential
to scale to large heap sizes.
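(If I remember right, that option on the Sun JVM is -XX:+UseParallelOldGC.)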

-- 
- Mark

http://www.lucidimagination.com





Re: Solr and Garbage Collection

2009-09-25 Thread Mark Miller
hmm - I'm not being totally accurate there - ergonomics is what detects
server and so makes throughput the default collector for a server
machine. But much of the GC ergonomics support only works with the
throughput collector. Kind of chicken and egg :)

-- 
- Mark

http://www.lucidimagination.com





Re: Solr and Garbage Collection

2009-09-25 Thread Mark Miller
That's a good point too - if you can reduce your need for such a large
heap, by all means, do so.

However, considering you already need at least 10GB or you get an OOM, you
have a long way to go with that approach. Good luck :)

How many docs do you have? I'm guessing it's mostly FieldCache-type
stuff, and that's the type of thing you can't really side-step, unless
you give up the functionality that's using it.




-- 
- Mark

http://www.lucidimagination.com





Re: Solr and Garbage Collection

2009-09-25 Thread Mark Miller
One more point and I'll stop - I've hit my email quota for the day ;)

While it's a pain to have to juggle GC params and tune - when you require
a heap that's more than a gig or two, I personally believe it's essential
to do so for good performance. The default settings / ergonomics with
throughput just don't cut it. Sad fact of life :) Luckily, you don't
generally have to do that much to get things nice - the number of
options is not that staggering, and you don't usually need to get into
most of them. Choosing the right collector and tweaking a setting or
two can often be enough.

The most important thing to do with a large heap and the throughput
collector is to turn on parallel tenured collection. I've said it before,
but it really is key. At least if you have more than a processor or two -
which, for your sake, I hope you do :)

- Mark



-- 
- Mark

http://www.lucidimagination.com





Re: Solr and Garbage Collection

2009-09-25 Thread Jonathan Ariel
I have around 8M documents.
I set up my server to use a different collector, and it seems like the
time spent on GC decreased from 11% to 4%. Of course I need to wait a bit
more, because the log is just an hour old, but it seems much better now.
I will tell you the results on Monday :)







RE: Solr and Garbage Collection

2009-09-25 Thread Fuad Efendi
Sorry for the OFF-topic:
Create a dummy Hello, World! JSP, use Tomcat, execute load-stress
simulator(s) from separate machine(s), and measure... don't forget to
allocate the necessary thread pools in Tomcat (if you have to)...
Although such a JSP doesn't use any memory, you will see how easily one
can get to 5000 TPS (or 'virtually' 5 concurrent users) on modern
quad-cores by simply allocating more memory (...GB) and more Tomcat
threads. There is a threshold too... repeat it with HTTPD workers (and
threads), same result, although it doesn't use any GC. More memory - more
threads - more keep-alives per TCP connection...

However, 'theoretically' you need only 64MB for Hello World :)))