Re: Solr 7.7.0 - Garbage Collection issue
Reverted back to 7.6.0 - same settings, but now I do not encounter the large CPU usage.

-Joe

On 2/12/2019 12:37 PM, Joe Obernberger wrote:
Thank you Shawn. Yes, I used the settings off of your site. I've restarted the cluster and the CPU usage is back up again. Looking at it now, it doesn't appear to be GC related.

On 2/12/2019 11:35 AM, Shawn Heisey wrote:
Your message included a small excerpt from the GC log. That is not helpful. We will need the entire GC log, possibly more than one log.
Re: Solr 7.7.0 - Garbage Collection issue
Thank you Shawn. Yes, I used the settings off of your site. I've restarted the cluster and the CPU usage is back up again. Looking at it now, it doesn't appear to be GC related. Full log from one of the nodes that is pegging 13 CPU cores: http://lovehorsepower.com/solr_gc.log.0.current

Thank you for the gceasy.io site - that is very slick! I'll use it in the future. I can try using the standard settings, but again - at this point it doesn't look GC related to me.

-Joe

On 2/12/2019 11:35 AM, Shawn Heisey wrote:
Your message included a small excerpt from the GC log. That is not helpful. We will need the entire GC log, possibly more than one log. The log or logs should fully cover the timeframe where the problem occurs.
Re: Solr 7.7.0 - Garbage Collection issue
On 2/12/2019 7:35 AM, Joe Obernberger wrote:
Yesterday, we upgraded our 40-node cluster from Solr 7.6.0 to Solr 7.7.0. This morning, all the nodes are using 1200+% of CPU. It looks like it's in garbage collection. We did reduce our HDFS cache size from 11G to 6G, but other than that, no other parameters were changed.

Your message included a small excerpt from the GC log. That is not helpful. We will need the entire GC log, possibly more than one log. The log or logs should fully cover the timeframe where the problem occurs. Full disclosure: once obtained, I would use this website to analyze GC log data: http://gceasy.io

Parameters are:

GC_TUNE="-XX:+UseG1GC \
  -XX:MaxDirectMemorySize=6g \
  -XX:+PerfDisableSharedMem \
  -XX:+ParallelRefProcEnabled \
  -XX:G1HeapRegionSize=16m \
  -XX:MaxGCPauseMillis=300 \
  -XX:InitiatingHeapOccupancyPercent=75 \
  -XX:+UseLargePages \
  -XX:ParallelGCThreads=16 \
  -XX:-ResizePLAB \
  -XX:+AggressiveOpts"

Looks like you've chosen to use G1 settings very similar to what I put on my wiki page: https://wiki.apache.org/solr/ShawnHeisey#Current_experiments

Those settings are not intended to be a canonical resource that everyone can use. Your heap size is different than what I was using when I worked on that, so you may need different settings.

Have you considered not using your own GC tuning and letting Solr's start script handle that? With the limited information available, my initial guess is that you need a larger heap - that Java is spending all its time freeing up enough memory to keep the program running.

Thanks,
Shawn
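As a side note: before uploading a full log to a service like gceasy.io, a few lines of Python can do a first pass over it to flag the long pauses. This is only an illustrative sketch, not an official Solr or gceasy tool; the regex assumes pauses are printed as "<seconds> secs", as in the HotSpot log excerpts quoted later in this thread, and would need adjusting for other JVM log formats.

```python
import re

# Pauses printed as e.g. "4.0975124 secs" (HotSpot CMS/ParNew style logs).
PAUSE_RE = re.compile(r"(\d+\.\d+) secs")

def long_pauses(log_lines, threshold_secs=1.0):
    """Return (line_number, pause_seconds) for every pause at or above
    the threshold, in file order."""
    hits = []
    for lineno, line in enumerate(log_lines, start=1):
        for match in PAUSE_RE.finditer(line):
            pause = float(match.group(1))
            if pause >= threshold_secs:
                hits.append((lineno, pause))
    return hits
```

Feeding it the lines of a GC log quickly shows whether the CPU time lines up with a handful of multi-second collections or with something unrelated to GC.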
RE: Solr and Garbage Collection
I read pretty much all posts on this thread (before and after this one). Looks like the main suggestion from you and others is to keep max heap size (-Xmx) as small as possible (as long as you don't see an OOM exception). I suggested the absolute opposite. Please note also that "as small as possible" does not have any meaning in the multiuser environment of Tomcat. It depends on query types (10 documents per request? Or maybe 1?), AND it depends on average server load (one concurrent request? Or maybe 200 threads trying to deal with 2000 concurrent requests?), AND it depends on whether it is a Master (used for updates - parses tons of docs in a single file?), AND it depends on unpredictable memory fragmentation. It all depends on the use case too, in addition to schema / index size.

Please note also that such stuff depends on the JVM vendor: what if it precompiles everything into CPU native code (including memory dealloc after each call)? Some do!

-Fuad
http://www.linkedin.com/in/liferay

...but 'core' constantly disagrees with me :)
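The point above can be made concrete with back-of-the-envelope arithmetic: the per-request footprint multiplied by concurrency can dominate the heap requirement, which is why no single "as small as possible" figure exists. All numbers below are hypothetical placeholders, not measurements from any real Solr or Tomcat install.

```python
def rough_heap_mb(per_request_mb, concurrent_requests, cache_mb, headroom=1.5):
    """Very rough working-set estimate in MB; real sizing needs load
    testing, since fragmentation and update traffic are not modeled."""
    return (per_request_mb * concurrent_requests + cache_mb) * headroom

# Same index and caches, different concurrency:
light = rough_heap_mb(per_request_mb=2, concurrent_requests=5, cache_mb=512)    # -> 783.0
heavy = rough_heap_mb(per_request_mb=2, concurrent_requests=200, cache_mb=512)  # -> 1368.0
```

Even with these toy numbers, moving from 5 to 200 concurrent requests nearly doubles the estimate - the query mix and load, not just the index, drive the answer.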
RE: Solr and Garbage Collection
Master-Slave replica: new caches will be warmed/prepopulated _before_ making the new IndexReader available for _new_ requests and _before_ discarding the old one. It means that the theoretical sizing for FieldCache (which is defined by the number of docs in an index and the cardinality of a field) should be doubled... Of course we need to play with GC options too, for performance tuning (mostly).

-Fuad
http://www.linkedin.com/in/liferay
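The doubling rule above can be sketched as arithmetic. The per-entry constants here are illustrative guesses, not the actual Lucene FieldCache layout (which varies by field type and version); the point is only that whatever the single-searcher bound is, warm-up holds two searchers at once.

```python
def fieldcache_bytes(num_docs, distinct_terms, avg_term_bytes=16):
    """Rough single-searcher bound for a String FieldCache entry:
    one 32-bit ord per document plus the distinct term values.
    Constants are hypothetical, for illustration only."""
    ords = num_docs * 4
    terms = distinct_terms * avg_term_bytes
    return ords + terms

def warming_peak(num_docs, distinct_terms):
    """During warm-up the old searcher and the newly warmed searcher
    are both live, so the theoretical bound doubles."""
    return 2 * fieldcache_bytes(num_docs, distinct_terms)
```

So a cache that fits comfortably in steady state can still push the heap over the edge at every commit/replication cycle.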
Re: Solr and Garbage Collection
Another option of course, if you're using a recent version of Java 6: try out the beta-ish, unsupported-unless-you-pay G1 garbage collector. I've only recently started playing with it, but it's supposed to be much better than CMS. It supposedly has much better throughput, it's much better at dealing with fragmentation issues (CMS is actually pretty bad with fragmentation, come to find out), and overall it's just supposed to be a very nice leap ahead in GC. Haven't had a chance to play with it much myself, but it's supposed to be fantastic. A whole new approach to generational collection for Sun, and much closer to the real-time GCs available from some other vendors.

Mark Miller wrote:
siping liu wrote:
Hi,
I read pretty much all posts on this thread (before and after this one). Looks like the main suggestion from you and others is to keep max heap size (-Xmx) as small as possible (as long as you don't see an OOM exception). This brings more questions than answers (for me at least; I'm new to Solr).

First, our environment and problem encountered: Solr 1.4 (nightly build, downloaded about 2 months ago), Sun JDK 1.6, Tomcat 5.5, running on Solaris (multi-CPU/cores). The cache setting is from the default solrconfig.xml (looks very small). At first we used minimum JAVA_OPTS and quickly ran into a problem similar to the one the original poster reported - long pauses (seconds to minutes) under load test. jconsole showed that it pauses on GC. So more JAVA_OPTS got added:

-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ParallelGCThreads=8 -XX:SurvivorRatio=2 -XX:NewSize=128m -XX:MaxNewSize=512m -XX:MaxGCPauseMillis=200

The thinking is that with multiple CPUs/cores we can get over with GC as quickly as possible. With the new setup, it works fine until Tomcat reaches heap size, then it blocks and takes minutes on a full GC to get more space from the tenured generation. We tried different Xmx (from very small to large), no difference in long GC time. We never ran into OOM.

MaxGCPauseMillis doesn't work with UseConcMarkSweepGC - it's for use with the Parallel collector. That also doesn't look like a good SurvivorRatio.

Questions:
* In general various cachings are good for performance; we have more RAM to use and want to use more caching to boost performance. Isn't your suggestion (of lowering the heap limit) going against that?

Leaving RAM for the filesystem cache is also very important. But you should also have enough RAM for your Solr caches, of course.

* Looks like Solr caching made its way into the tenured generation on the heap; that's good. But why does it get GC'ed eventually? I did a quick check of Solr code (Solr 1.3, not 1.4), and see a single instance of using WeakReference. Is that what is causing all this? This seems to suggest a design flaw in Solr's memory management strategy (or just my ignorance about Solr?). I mean, wouldn't this be the right way of doing it: you allow the user to specify the cache size in solrconfig.xml, then the user can set up the heap limit in JAVA_OPTS accordingly, and there is no need to use WeakReference (BTW, why not SoftReference)?

Do you see "concurrent mode failure" when looking at your GC logs? i.e.:

174.445: [GC 174.446: [ParNew: 66408K->66408K(66416K), 0.618 secs]174.446: [CMS (concurrent mode failure): 161928K->162118K(175104K), 4.0975124 secs] 228336K->162118K(241520K)

That means you are still getting major collections with CMS, and you don't want that. You might try kicking GC off earlier with something like: -XX:CMSInitiatingOccupancyFraction=50

* Right now I have a single Tomcat hosting Solr and other applications. I guess now it's better to have Solr on its own Tomcat, given that it's tricky to adjust the Java options.

thanks.

From: wun...@wunderwood.org
To: solr-user@lucene.apache.org
Subject: RE: Solr and Garbage Collection
Date: Fri, 25 Sep 2009 09:51:29 -0700

30ms is not better or worse than 1s until you look at the service requirements. For many applications, it is worth dedicating 10% of your processing time to GC if that makes the worst-case pause short. On the other hand, my experience with the IBM JVM was that the maximum query rate was 2-3X better with the concurrent generational GC compared to any of their other GC algorithms, so we got the best throughput along with the shortest pauses.

Solr garbage generation (for queries) seems to have two major components: per-request garbage and cache evictions. With a generational collector, these two are handled by separate parts of the collector. Per-request garbage should completely fit in the short-term heap (nursery), so that it can be collected rapidly and returned to use for further requests. If the nursery is too small, the per-request allocations will be made in tenured space and sit there until the next major GC. Cache evictions are almost always in long-term storage (tenured space) because
Re: Solr and Garbage Collection
Sun has recently clarified the issue regarding "unsupported unless you pay" for the G1 garbage collector. Here is the updated release of Java 6 update 14:

http://java.sun.com/javase/6/webnotes/6u14.html

G1 will be part of Java 7, fully supported without pay. The version included in Java 6 update 14 is a beta release. Since it is beta, Sun does not recommend using it unless you have a support contract, because as with any beta software there will be bugs. Non-paying customers may very well have to wait for the official version in Java 7 for bug fixes. Here is more info on the G1 garbage collector:

http://java.sun.com/javase/technologies/hotspot/gc/g1_intro.jsp

Bill

On Sat, Oct 3, 2009 at 1:28 PM, Mark Miller markrmil...@gmail.com wrote:
Another option of course, if you're using a recent version of Java 6: try out the beta-ish, unsupported-unless-you-pay G1 garbage collector.
Re: Solr and Garbage Collection
Ah, yes - thanks for the clarification. Didn't pay attention to how ambiguously I was using "supported" there :)

Bill Au wrote:
Sun has recently clarified the issue regarding "unsupported unless you pay" for the G1 garbage collector. Here is the updated release of Java 6 update 14: http://java.sun.com/javase/6/webnotes/6u14.html
Re: Solr and Garbage Collection
From: wun...@wunderwood.org
To: solr-user@lucene.apache.org
Subject: RE: Solr and Garbage Collection
Date: Fri, 25 Sep 2009 09:51:29 -0700

30ms is not better or worse than 1s until you look at the service requirements. For many applications, it is worth dedicating 10% of your processing time to GC if that makes the worst-case pause short. On the other hand, my experience with the IBM JVM was that the maximum query rate was 2-3X better with the concurrent generational GC compared to any of their other GC algorithms, so we got the best throughput along with the shortest pauses.

Solr garbage generation (for queries) seems to have two major components: per-request garbage and cache evictions. With a generational collector, these two are handled by separate parts of the collector. Per-request garbage should completely fit in the short-term heap (nursery), so that it can be collected rapidly and returned to use for further requests. If the nursery is too small, the per-request allocations will be made in tenured space and sit there until the next major GC. Cache evictions are almost always in long-term storage (tenured space) because an LRU algorithm guarantees that the garbage will be old.

Check the growth rate of tenured space (under constant load, of course) while increasing the size of the nursery. That rate should drop when the nursery gets big enough, then not drop much further as it is increased more. After that, reduce the size of tenured space until major GCs start happening too often (a judgment call). A bigger tenured space means longer major GCs and thus longer pauses, so you don't want it oversized by too much.

Also check the hit rates of your caches. If the hit rate is low, say 20% or less, make that cache much bigger or set it to zero. Either one will reduce the number of cache evictions. If you have an HTTP cache in front of Solr, zero may be the right choice, since the HTTP cache is cherry-picking the easily cacheable requests.

Note that a commit nearly doubles the memory required, because you have two live Searcher objects with all their caches. Make sure you have headroom for a commit. If you want to test the tenured space usage, you must test with real-world queries. Those are the only way to get accurate cache eviction rates.

wunder

--
- Mark
http://www.lucidimagination.com
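The tuning loop described above can be sketched in a few lines. The sampling approach and all numbers are invented for illustration; in practice the tenured-occupancy figures would come from GC logs or a JMX tool such as jstat, and the hit counts from Solr's cache statistics.

```python
def tenured_growth_rate(samples):
    """samples: [(minutes, tenured_mb), ...] taken under constant load.
    Returns the overall growth rate in MB/minute. Grow the nursery
    until this rate stops dropping, then trim tenured space."""
    (t0, m0), (t1, m1) = samples[0], samples[-1]
    return (m1 - m0) / (t1 - t0)

def cache_advice(hits, lookups, low_watermark=0.20):
    """Rule of thumb from the message above: a hit rate at or below
    ~20% means the cache should be made much bigger - or set to zero
    if an HTTP cache in front of Solr already skims the cacheable
    requests. Either choice reduces evictions."""
    if lookups == 0:
        return "no data"
    return "resize or disable" if hits / lookups <= low_watermark else "keep"
```

For example, tenured occupancy going from 100 MB to 180 MB over ten minutes is an 8 MB/minute leak of per-request garbage into tenured space - a sign the nursery is still too small.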
RE: Solr and Garbage Collection
Hi, I read pretty much all posts on this thread (before and after this one). The main suggestion from you and others seems to be to keep the max heap size (-Xmx) as small as possible (as long as you don't see an OOM exception). This raises more questions than answers (for me at least; I'm new to Solr).

First, our environment and the problem we encountered: Solr 1.4 (nightly build, downloaded about 2 months ago), Sun JDK 1.6, Tomcat 5.5, running on Solaris (multi-CPU/core). The cache settings are from the default solrconfig.xml (they look very small). At first we used minimal JAVA_OPTS and quickly ran into a problem similar to the one the original poster reported: long pauses (seconds to minutes) under load test. jconsole showed that it pauses on GC. So more JAVA_OPTS were added: -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ParallelGCThreads=8 -XX:SurvivorRatio=2 -XX:NewSize=128m -XX:MaxNewSize=512m -XX:MaxGCPauseMillis=200, the thinking being that with multiple CPUs/cores we can get GC over with as quickly as possible. With the new setup it works fine until Tomcat reaches the heap size; then it blocks and takes minutes on a full GC to get more space from the tenured generation. We tried different Xmx values (from very small to large) with no difference in the long GC time. We never ran into an OOM.

Questions:
* In general the various caches are good for performance; we have more RAM to use and want to use more caching to boost performance. Isn't your suggestion (of lowering the heap limit) going against that?
* Looks like the Solr caches made their way into the tenured generation on the heap; that's good. But why do they get GC'ed eventually?? I did a quick check of the Solr code (Solr 1.3, not 1.4) and see a single use of WeakReference. Is that what is causing all this? This seems to suggest a design flaw in Solr's memory management strategy (or just my ignorance about Solr?). I mean, wouldn't this be the right way of doing it: you allow the user to specify the cache size in solrconfig.xml, then the user can set the heap limit in JAVA_OPTS accordingly, and there is no need to use WeakReference (BTW, why not SoftReference)?
* Right now I have a single Tomcat hosting Solr and other applications. I guess now it's better to have Solr on its own Tomcat, given that it's tricky to adjust the Java options.

thanks.

From: wun...@wunderwood.org
To: solr-user@lucene.apache.org
Subject: RE: Solr and Garbage Collection
Date: Fri, 25 Sep 2009 09:51:29 -0700

30ms is not better or worse than 1s until you look at the service requirements. For many applications, it is worth dedicating 10% of your processing time to GC if that makes the worst-case pause short. On the other hand, my experience with the IBM JVM was that the maximum query rate was 2-3X better with the concurrent generational GC compared to any of their other GC algorithms, so we got the best throughput along with the shortest pauses.

Solr garbage generation (for queries) seems to have two major components: per-request garbage and cache evictions. With a generational collector, these two are handled by separate parts of the collector. Per-request garbage should completely fit in the short-term heap (nursery), so that it can be collected rapidly and returned to use for further requests. If the nursery is too small, the per-request allocations will be made in tenured space and sit there until the next major GC. Cache evictions are almost always in long-term storage (tenured space) because an LRU algorithm guarantees that the garbage will be old.

Check the growth rate of tenured space (under constant load, of course) while increasing the size of the nursery. That rate should drop when the nursery gets big enough, then not drop much further as it is increased more. After that, reduce the size of tenured space until major GCs start happening too often (a judgment call). A bigger tenured space means longer major GCs and thus longer pauses, so you don't want it oversized by too much.

Also check the hit rates of your caches. If the hit rate is low, say 20% or less, make that cache much bigger or set it to zero. Either one will reduce the number of cache evictions. If you have an HTTP cache in front of Solr, zero may be the right choice, since the HTTP cache is cherry-picking the easily cacheable requests.

Note that a commit nearly doubles the memory required, because you have two live Searcher objects with all their caches. Make sure you have headroom for a commit. If you want to test the tenured space usage, you must test with real-world queries. Those are the only way to get accurate cache eviction rates.

wunder
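Walter's "check the growth rate of tenured space" step can be sampled from inside the JVM instead of eyeballed in jconsole. A minimal sketch using the standard java.lang.management API; the substring match on pool names is an assumption, since names differ per collector ("Tenured Gen", "CMS Old Gen", "PS Old Gen", "G1 Old Gen", ...):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class OccupancyCheck {
    // Returns old-generation occupancy as a percentage, or -1 if no pool matched.
    // Logged periodically under constant load, the trend of this number is the
    // tenured-space growth rate described above.
    static double oldGenOccupancyPercent() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName().toLowerCase();
            if (name.contains("old") || name.contains("tenured")) {
                MemoryUsage u = pool.getUsage();
                if (u.getMax() > 0) {
                    return 100.0 * u.getUsed() / u.getMax();
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.printf("old gen occupancy: %.1f%%%n", oldGenOccupancyPercent());
    }
}
```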
Re: Solr and Garbage Collection
siping liu wrote:
> With the new setup, it works fine until Tomcat reaches heap size, then it blocks and takes minutes on full GC to get more space from the tenured generation. We tried different Xmx (from very small to large), no difference in long GC time. We never ran into OOM.

MaxGCPauseMillis doesn't work with UseConcMarkSweepGC; it's for use with the Parallel collector. That also doesn't look like a good SurvivorRatio.

> * In general various cachings are good for performance, we have more RAM to use and want to use more caching to boost performance, isn't your suggestion (of lowering heap limit) going against that?

Leaving RAM for the filesystem cache is also very important. But you should also have enough RAM for your Solr caches, of course.

> * Looks like Solr caching made its way into the tenured generation on heap, that's good. But why do they get GC'ed eventually??

Do you see concurrent mode failure when looking at your GC logs? i.e.:

174.445: [GC 174.446: [ParNew: 66408K->66408K(66416K), 0.0000618 secs]174.446: [CMS (concurrent mode failure): 161928K->162118K(175104K), 4.0975124 secs] 228336K->162118K(241520K)

That means you are still getting major collections with CMS, and you don't want that. You might try kicking GC off earlier with something like: -XX:CMSInitiatingOccupancyFraction=50
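Mark's concurrent-mode-failure check can be done mechanically over a GC log file rather than by reading it. A small sketch (the sample line is abbreviated from the excerpt above, not a real log; the class and method names are invented for illustration):

```java
public class CmfCheck {
    // Counts GC log lines that report a CMS concurrent mode failure,
    // i.e. stop-the-world major collections that CMS was trying to avoid.
    static long concurrentModeFailures(String gcLog) {
        return gcLog.lines()
                .filter(line -> line.contains("concurrent mode failure"))
                .count();
    }

    public static void main(String[] args) {
        String sample = "174.445: [GC [ParNew: 66408K->66408K(66416K)]\n"
                + "174.446: [CMS (concurrent mode failure): 161928K->162118K(175104K), 4.0975124 secs]";
        System.out.println("concurrent mode failures: " + concurrentModeFailures(sample)); // prints 1
    }
}
```

A count above zero under steady load suggests lowering -XX:CMSInitiatingOccupancyFraction, as suggested above.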
Re: Solr and Garbage Collection
Ok... good news! Upgrading to the newest version of JVM 6 (from update 6) seems to solve this ugly bug. With the upgraded JVM I could run the Solr servers for more than 12 hours on the production environment with the GC settings mentioned in the previous e-mails. The results are really amazing. The time spent on collecting memory dropped from 11% to 3.81%. Do you think there is more to tune there?

Thanks!

Jonathan

On Sun, Sep 27, 2009 at 8:39 PM, Bill Au bill.w...@gmail.com wrote:
> You are running a very old version of Java 6 (update 6). The latest is update 16. You should definitely upgrade. There is a bug in Java 6 starting with update 4 that may result in a corrupted Lucene/Solr index:
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6707044
> https://issues.apache.org/jira/browse/LUCENE-1282
> The JVM crash occurred in the GC thread, so it looks like a bug in the JVM itself. Upgrading to the latest release might help. Switching to a different garbage collector should help.
> Bill

On Sat, Sep 26, 2009 at 4:31 PM, Mark Miller markrmil...@gmail.com wrote:
> Jonathan Ariel wrote:
>> Ok. After the server ran for more than 12 hours, the time spent on GC decreased from 11% to 3.4%, but 5 hours later it crashed. This is the thread dump, maybe you can help identify what happened?
>
> Well, that's a tough one ;) My guess is it's a bug :) Your two survivor spaces are filled, so it was likely about to move objects into the tenured space, which still has plenty of room for them (barring horrible fragmentation). Any issues with that type of thing should generate an OOM anyway, though. You can find people that have run into similar issues in the past, but a lot of the time they are unreproducible. Usually their bugs are closed and they are told to try a newer JVM. Your JVM appears to be quite a few versions back. There have been many garbage collection bugs fixed in the 7 or so updates since your version, a good handful of them related to CMS.
> If you can, my best suggestion at the moment is to upgrade to the latest and see how that fares. If not, you might see if going back to the throughput collector and turning on the parallel tenured-space collector might meet your needs instead. You can work with other params to get that going better if you have to as well. Also, adjusting other settings with the low pause collector might trigger something to side-step the bug. Not a great option there, though ;)
> How many unique fields are you sorting/faceting on? It must be a lot if you need 10 gig for 8 million documents. It's kind of rough to have to work at such a close limit to your total heap available as a min mem requirement.
> --
> - Mark
> http://www.lucidimagination.com

# An unexpected error has been detected by Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x2b4e0f69ea2a, pid=32224, tid=1103812928
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed mode linux-amd64)
# Problematic frame:
# V [libjvm.so+0x265a2a]
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp

--- T H R E A D ---
Current thread (0x5be47400): VMThread [stack: 0x41bad000,0x41cae000] [id=32249]
siginfo: si_signo=SIGSEGV, si_errno=0, si_code=128 (), si_addr=0x
Re: Solr and Garbage Collection
Do you have your GC logs? Are you still seeing major collections? Where is the time spent? Hard to say without some of that info.

The goal of the low pause collector is to finish collecting before the tenured space is filled; if it doesn't, a standard major collection occurs. The collector will use recent stats it records to try to pick a good time to start. As a fail-safe, though, it will trigger no matter what at a certain percentage full: with Java 1.5 that trigger was 68%; with 1.6, it's 92%. If you're still getting major collections, you might want to see if lowering that helps (-XX:CMSInitiatingOccupancyFraction=N). If not, you might be near optimal settings.

There is likely not anything else you should mess with, unless using the extra thread to collect while your app is running affects your app's performance; in that case you might want to look into turning on the incremental mode. But you haven't mentioned that, so I doubt it.

--
- Mark
http://www.lucidimagination.com

Jonathan Ariel wrote:
> Ok... good news! Upgrading to the newest version of JVM 6 seems to solve this ugly bug. [...] The time spent on collecting memory dropped from 11% to 3.81%. Do you think there is more to tune there?
Re: Solr and Garbage Collection
How do you track major collections? Even better, how do you log your GC behavior with details? Right now I just log total time spent on collections, but I don't really know on which collections.

Regarding application performance with ConcMarkSweepGC, I don't think I've experienced any impact so far. Actually the CPU usage of the Solr servers is almost insignificant (it was like that before).

BTW, do you know a good way to track the N most expensive Solr queries? I would like to measure that on 2 different Solr servers with different GC.

On Mon, Sep 28, 2009 at 4:42 PM, Mark Miller markrmil...@gmail.com wrote:
> Do you have your GC logs? Are you still seeing major collections? Where is the time spent? Hard to say without some of that info. [...]
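Jonathan's "how do you track major collections" question can also be answered from inside the JVM, without parsing logs. A sketch using the standard GarbageCollectorMXBean API; which bean represents major collections depends on the collector (the names in the comment are examples, not guaranteed):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class CollectionCounts {
    public static void main(String[] args) {
        // One bean per collector. Typically one covers the young generation
        // (minor collections, e.g. "ParNew" or "Copy") and one the old
        // generation (major collections, e.g. "ConcurrentMarkSweep" or
        // "MarkSweepCompact").
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Sampling this periodically and diffing the counts of the old-generation bean gives a running tally of major collections.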
Re: Solr and Garbage Collection
Jonathan,

Here is the JVM argument for logging GC activity:

-Xloggc:<file>    log GC status to a file with time stamps

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR

----- Original Message -----
From: Jonathan Ariel ionat...@gmail.com
To: solr-user@lucene.apache.org
Sent: Monday, September 28, 2009 4:49:03 PM
Subject: Re: Solr and Garbage Collection

> How do you track major collections? Even better, how do you log your GC behavior with details? [...]
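On Jonathan's side question earlier in the thread about the N most expensive Solr queries: Solr logs a QTime for every request, so one lightweight approach is to feed those timings into a bounded min-heap. A hypothetical sketch (SlowQueryTracker and its API are invented for illustration, not part of Solr):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class SlowQueryTracker {
    static final class Timed {
        final String query;
        final long millis;
        Timed(String query, long millis) { this.query = query; this.millis = millis; }
    }

    private final int capacity;
    // Min-heap on elapsed time: the root is the cheapest tracked query,
    // evicted whenever something slower arrives.
    private final PriorityQueue<Timed> heap =
            new PriorityQueue<>(Comparator.comparingLong(t -> t.millis));

    SlowQueryTracker(int capacity) { this.capacity = capacity; }

    synchronized void record(String query, long millis) {
        heap.offer(new Timed(query, millis));
        if (heap.size() > capacity) heap.poll(); // drop the fastest
    }

    synchronized List<String> slowestFirst() {
        List<Timed> snapshot = new ArrayList<>(heap);
        snapshot.sort(Comparator.comparingLong((Timed t) -> t.millis).reversed());
        List<String> out = new ArrayList<>();
        for (Timed t : snapshot) out.add(t.millis + "ms " + t.query);
        return out;
    }

    public static void main(String[] args) {
        SlowQueryTracker tracker = new SlowQueryTracker(2);
        tracker.record("q=*:*", 850);
        tracker.record("q=title:foo", 12);
        tracker.record("q=body:bar", 430);
        System.out.println(tracker.slowestFirst()); // prints [850ms q=*:*, 430ms q=body:bar]
    }
}
```

Running one of these per server makes it easy to compare the slow tail on two nodes with different GC settings.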
Re: Solr and Garbage Collection
|-verbose:gc | |[GC 325407K-83000K(776768K), 0.2300771 secs] [GC 325816K-83372K(776768K), 0.2454258 secs] [Full GC 267628K-83769K(776768K), 1.8479984 secs]| Additional details with: |-XX:+PrintGCDetails| |[GC [DefNew: 64575K-959K(64576K), 0.0457646 secs] 196016K-133633K(261184K), 0.0459067 secs] And timestamps with: ||-XX:+PrintGCTimeStamps| |111.042: [GC 111.042: [DefNew: 8128K-8128K(8128K), 0.505 secs]111.042: [Tenured: 18154K-2311K(24576K), 0.1290354 secs] 26282K-2311K(32704K), 0.1293306 secs] | Jonathan Ariel wrote: How do you track major collections? Even better, how do you log your GC behavior with details? Right now I just log total time spent on collections, but I don't really know on which collections.Regard application performance with the ConcMarkSweepGC, I think I didn't experience any impact for now. Actually the CPU usage of the solr servers is almost insignificant (it was like that before). BTW, do you know a good way to track the N most expensive solr queries? I would like to measure that on 2 different solr servers with different GC. On Mon, Sep 28, 2009 at 4:42 PM, Mark Miller markrmil...@gmail.com wrote: Do you have your GC logs? Are you still seeing major collections? Where is the time spent? Hard to say without some of that info. The goal of the low pause collector is to finish collecting before the tenured space is filled - if it doesn't, a standard major collection occurs. The collector will use recent stats it records to try and pick a good time to start - as a fail safe though, it will trigger no matter what at a certain percentage. With Java 1.5, it was 68% full that it triggered. With 1.6, its 92%. If your still getting major collections, you might want to see if lowering that helps (-XX:CMSInitiatingOccupancyFraction=N). If not, you might be near optimal settings. 
There is likely not anything else you should mess with - unless using the extra thread to collect while your app is running affects your app's performance - in that case you might want to look into turning on the incremental mode. But you haven't mentioned that, so I doubt it. -- - Mark http://www.lucidimagination.com Jonathan Ariel wrote: Ok... good news! Upgrading to the newest version of JVM 6 (update 6) seems to solve this ugly bug. With the upgraded JVM I could run the solr servers for more than 12 hours on the production environment with the GC mentioned in the previous e-mails. The results are really amazing. The time spent on collecting memory dropped from 11% to 3.81%. Do you think there is more to tune there? Thanks! Jonathan On Sun, Sep 27, 2009 at 8:39 PM, Bill Au bill.w...@gmail.com wrote: You are running a very old version of Java 6 (update 6). The latest is update 16. You should definitely upgrade. There is a bug in Java 6 starting with update 4 that may result in a corrupted Lucene/Solr index: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6707044 https://issues.apache.org/jira/browse/LUCENE-1282 The JVM crash occurred in the gc thread. So it looks like a bug in the JVM itself. Upgrading to the latest release might help. Switching to a different garbage collector should help. Bill On Sat, Sep 26, 2009 at 4:31 PM, Mark Miller markrmil...@gmail.com wrote: Jonathan Ariel wrote: Ok. After the server ran for more than 12 hours, the time spent on GC decreased from 11% to 3.4%, but 5 hours later it crashed. This is the thread dump, maybe you can help identify what happened? Well, that's a tough one ;) My guess is it's a bug :) Your two survivor spaces are filled, so it was likely about to move objects into the tenured space, which still has plenty of room for them (barring horrible fragmentation). Any issues with that type of thing should generate an OOM anyway though.
[...]
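The -verbose:gc output quoted above can be summarized mechanically to answer "where is the time spent?". A minimal sketch, assuming the classic single-line HotSpot log format shown in this thread (nested -XX:+PrintGCDetails lines would need a smarter parser):

```python
import re

# Matches pause durations in classic single-line HotSpot GC log entries such as:
#   [GC 325407K->83000K(776768K), 0.2300771 secs]
#   [Full GC 267628K->83769K(776768K), 1.8479984 secs]
PAUSE_RE = re.compile(r"\[(Full GC|GC)[^\]]*?([\d.]+) secs\]")

def gc_pause_summary(log_text):
    """Return (minor_secs, full_secs) pause totals from a -verbose:gc log."""
    minor = full = 0.0
    for kind, secs in PAUSE_RE.findall(log_text):
        if kind == "Full GC":
            full += float(secs)
        else:
            minor += float(secs)
    return minor, full

sample = """[GC 325407K->83000K(776768K), 0.2300771 secs]
[GC 325816K->83372K(776768K), 0.2454258 secs]
[Full GC 267628K->83769K(776768K), 1.8479984 secs]"""
minor, full = gc_pause_summary(sample)
```

Dividing the pause totals by the wall-clock span of the log gives the "11% of the time stopped in GC" style figure discussed in this thread.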
Re: Solr and Garbage Collection
Another good option. Here is a comparison of the commands I replied with and this one: http://docs.hp.com/en/5992-5899/ch06s02.html Very similar. Otis Gospodnetic wrote: Jonathan, Here is the JVM argument for logging GC activity: -Xloggc:file (log GC status to a file with time stamps) Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Jonathan Ariel ionat...@gmail.com To: solr-user@lucene.apache.org Sent: Monday, September 28, 2009 4:49:03 PM Subject: Re: Solr and Garbage Collection How do you track major collections? Even better, how do you log your GC behavior with details? Right now I just log total time spent on collections, but I don't really know on which collections. Regarding application performance with the ConcMarkSweepGC, I think I didn't experience any impact for now. Actually the CPU usage of the solr servers is almost insignificant (it was like that before). BTW, do you know a good way to track the N most expensive solr queries? I would like to measure that on 2 different solr servers with different GC. On Mon, Sep 28, 2009 at 4:42 PM, Mark Miller wrote: Do you have your GC logs? Are you still seeing major collections? Where is the time spent? Hard to say without some of that info. The goal of the low pause collector is to finish collecting before the tenured space is filled - if it doesn't, a standard major collection occurs. The collector will use recent stats it records to try and pick a good time to start - as a fail safe though, it will trigger no matter what at a certain percentage. With Java 1.5, it was 68% full that it triggered. With 1.6, it's 92%. If you're still getting major collections, you might want to see if lowering that helps (-XX:CMSInitiatingOccupancyFraction=N). If not, you might be near optimal settings.
[...]
Re: Solr and Garbage Collection
One way to track expensive queries is to look at the query time, QTime, in the solr log. There are a couple of tools for analyzing gc logs: http://www.tagtraum.com/gcviewer.html https://h20392.www2.hp.com/portal/swdepot/displayProductInfo.do?productNumber=HPJMETER They will give you the frequency and duration of minor and major collections. On a multi-processor/core system with CPU cycles to spare, using the concurrent collector will reduce (may even eliminate) major collections. The trade off is that CPU utilization on the system will go up. When I tried it with one of my Java apps, the system utilization went up so much under heavy load that it reduced the overall throughput of the app. Your mileage may vary. You will have to measure it for your app to see for yourself. Bill On Mon, Sep 28, 2009 at 4:49 PM, Jonathan Ariel ionat...@gmail.com wrote: How do you track major collections? Even better, how do you log your GC behavior with details? Right now I just log total time spent on collections, but I don't really know on which collections. Regarding application performance with the ConcMarkSweepGC, I think I didn't experience any impact for now. Actually the CPU usage of the solr servers is almost insignificant (it was like that before). BTW, do you know a good way to track the N most expensive solr queries? I would like to measure that on 2 different solr servers with different GC. On Mon, Sep 28, 2009 at 4:42 PM, Mark Miller markrmil...@gmail.com wrote: Do you have your GC logs? Are you still seeing major collections? Where is the time spent? Hard to say without some of that info. The goal of the low pause collector is to finish collecting before the tenured space is filled - if it doesn't, a standard major collection occurs. The collector will use recent stats it records to try and pick a good time to start - as a fail safe though, it will trigger no matter what at a certain percentage. With Java 1.5, it was 68% full that it triggered. With 1.6, it's 92%.
[...]
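Jonathan's question about finding the N most expensive queries can be scripted along the lines Bill describes: scan the solr request log for the QTime field. A rough sketch, assuming request lines carry a QTime=<millis> entry as in Solr's default request logging; the exact layout of your log lines may differ:

```python
import heapq
import re

QTIME_RE = re.compile(r"QTime=(\d+)")

def top_n_queries(log_lines, n=10):
    """Return the n slowest requests as (qtime_ms, line) pairs, slowest first.

    Assumes each request line contains Solr's usual QTime=<ms> field."""
    slowest = []
    for line in log_lines:
        m = QTIME_RE.search(line)
        if m:
            heapq.heappush(slowest, (int(m.group(1)), line))
            if len(slowest) > n:
                heapq.heappop(slowest)  # drop the fastest of the kept set
    return sorted(slowest, reverse=True)

lines = [
    "path=/select params={q=fast} hits=10 status=0 QTime=12",
    "path=/select params={q=slow} hits=9000 status=0 QTime=850",
    "path=/select params={q=mid} hits=120 status=0 QTime=90",
]
top = top_n_queries(lines, n=2)
```

Running the same script against logs from two servers with different collectors gives a crude per-collector comparison of worst-case query latency.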
Re: Solr and Garbage Collection
Yes, it seems like a bug. I will update my JVM, try again and let you know the results :) On 9/26/09, Mark Miller markrmil...@gmail.com wrote: Jonathan Ariel wrote: Ok. After the server ran for more than 12 hours, the time spent on GC decreased from 11% to 3.4%, but 5 hours later it crashed. This is the thread dump, maybe you can help identify what happened? Well, that's a tough one ;) My guess is it's a bug :) Your two survivor spaces are filled, so it was likely about to move objects into the tenured space, which still has plenty of room for them (barring horrible fragmentation). Any issues with that type of thing should generate an OOM anyway though. You can find people that have run into similar issues in the past, but a lot of times unreproducible. Usually, their bugs are closed and they are told to try a newer JVM. Your JVM appears to be quite a few versions back. There have been many garbage collection bugs fixed in the 7 or so updates since your version, a good handful of them related to CMS. If you can, my best suggestion at the moment is to upgrade to the latest and see how that fares. If not, you might see if going back to the throughput collector and turning on the parallel tenured space collector might meet your needs instead. You can work with other params to get that going better if you have to as well. Also, adjusting other settings with the low pause collector might trigger something to side step the bug. Not a great option there though ;) How many unique fields are you sorting/faceting on? It must be a lot if you need 10 gig for 8 million documents. It's kind of rough to have to work at such a close limit to your total heap available as a min mem requirement.
-- - Mark http://www.lucidimagination.com # # An unexpected error has been detected by Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x2b4e0f69ea2a, pid=32224, tid=1103812928 # # Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed mode linux-amd64) # Problematic frame: # V [libjvm.so+0x265a2a] # # If you would like to submit a bug report, please visit: # http://java.sun.com/webapps/bugreport/crash.jsp # --- T H R E A D --- Current thread (0x5be47400): VMThread [stack: 0x41bad000,0x41cae000] [id=32249] siginfo:si_signo=SIGSEGV: si_errno=0, si_code=128 (), si_addr=0x Registers: RAX=0x2aac929b4c70, RBX=0x0037c985003a095e, RCX=0x0006, RDX=0x005c49870037c996 RSP=0x41cac550, RBP=0x41cac550, RSI=0x2aac929b4c70, RDI=0x0037c985003a095e R8 =0x2aadab201538, R9 =0x0005, R10=0x0001, R11=0x0010 R12=0x2aac929b4c70, R13=0x2aac9289cf58, R14=0x2aac9289cf40, R15=0x2aadab2015ac RIP=0x2b4e0f69ea2a, EFL=0x00010206, CSGSFS=0x0033, ERR=0x TRAPNO=0x000d Top of Stack: (sp=0x41cac550) 0x41cac550: 41cac580 2b4e0f903c5b 0x41cac560: 41cac590 0003 0x41cac570: 2aac9289cf50 2aadab2015a8 0x41cac580: 41cac5c0 2b4e0f72e388 0x41cac590: 41cac5c0 2aac9289cf40 0x41cac5a0: 0005 2b4e0fc86330 0x41cac5b0: 2b4e0fd8c740 0x41cac5c0: 41cac5f0 2b4e0f903b7f 0x41cac5d0: 41cac610 0003 0x41cac5e0: 2aaccb1750f8 2aaccea41570 0x41cac5f0: 41cac610 2b4e0f931548 0x41cac600: 2b4e0fc861d8 2aadd4052ab0 0x41cac610: 41cac640 2b4e0f903d1a 0x41cac620: 41cac650 0003 0x41cac630: 5bc7d6d0 2b4e0fd8c740 0x41cac640: 41cac650 2b4e0f90411c 0x41cac650: 41cac680 2b4e0fa1d16e 0x41cac660: 5bc7d6d0 0x41cac670: 0002 2b4e0fd8c740 0x41cac680: 41cac6c0 2b4e0fa74640 0x41cac690: 41cac6b0 5bc7d6d0 0x41cac6a0: 0002 2b4e0fd8c740 0x41cac6b0: 0001 2b4e0fd8c740 0x41cac6c0: 41cac700 2b4e0f9a52da 0x41cac6d0: bfc0 0x41cac6e0: 2b4e0fd8c740 5bc7d6d0 0x41cac6f0: 2b4e0fd8c740 0001 0x41cac700: 41cac750 2b4e0f6feb80 0x41cac710: 449dae1d9ae42358 3ff0cccd 0x41cac720: 2aad289aa680 0001 0x41cac730: 41cac780 0x41cac740: 0001 5bc7d6d0 Instructions: 
(pc=0x2b4e0f69ea2a) 0x2b4e0f69ea1a: 89 e5 48 83 f9 05 74 38 48 8b 56 08 48 83 c2 10 0x2b4e0f69ea2a: 48 8b b2 a0 00 00 00 ba 01 00 00 00 83 e6 07 48
Re: Solr and Garbage Collection
Well.. it is strange that when I use the default GC I don't get any errors. If I'm so close to running out of memory I should see those OOM exceptions as well with the standard GC. BTW, I'm faceting on around 13 fields and my total number of unique values is around 3. One of the fields with the biggest amount of unique values has almost 16000 unique values. On Sun, Sep 27, 2009 at 4:32 PM, Fuad Efendi f...@efendi.ca wrote: Mark, Nothing against orange-hat :) Nothing against GC tuning; but if SOLR needs application-specific settings it should be well-documented. GC-tuning: for instance, we need it for 'realtime' Online Trading applications. However, even Online Banking doesn't need it; the primary reason - GC must happen 'outside of the current transaction', GC 'must be predictable', and (for instance) Oracle/BEA JRockit has a specific 'realtime' version for that... Does SOLR need that? Having a load-stress simulator (multithreaded!!!) will definitely help to predict any possible bottleneck... it's even better to write it from scratch (it depends on the schema!), by sending random requests to SOLR in parallel... instead of waiting while FieldCache tries to add a new FieldImpl to the cache (unpredictable!) Tomcat is multithreaded; what if end-users need to load 1000s of large documents (in parallel! 1000s of concurrent users), can you predict memory requirements and GC options without application-specific knowledge? What about new SOLR-Caches warming up? -Fuad -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: September-27-09 2:46 PM To: solr-user@lucene.apache.org Subject: Re: Solr and Garbage Collection If he needed double the RAM, he'd likely know by now :) The JVM likes to throw OOM exceptions when you need more RAM. Until it does - that's an odd path to focus on. There has been no indication he has ever seen an OOM with his over 10 GB heap.
It sounds like he has run Solr in his environment for quite a long time - after running for that long, until he gets an OOM, it's about as good as chasing ghosts to worry about it. I like to think of GC tuning as orange-hat. Mostly because I like the color orange. Fuad Efendi wrote: Ok. After the server ran for more than 12 hours, the time spent on GC decreased from 11% to 3.4%, but 5 hours later it crashed. All this 'black-hat' GC tuning and 'fast' object moving (especially objects accessed by some thread during GC-defragmentation) - try to use multithreaded load-stress tools (at least 100 requests in parallel) and see that you need at least double the memory if 12Gb is the threshold for your FieldCache (largest objects) Also, don't trust these counters: So I logged the Garbage Collection activity to check if it's because of that. It seems like 11% of the time the application runs, it is stopped because of GC. Stopped? Of course, locking/unlocking in order to move objects currently accessed in multiuser-multithreaded Tomcat... you can easily create a crash scenario proving that the latest-greatest JVMs are buggy too. Don't forget: Tomcat is multithreaded, and if the 'core' needs 10Gb in order to avoid OOM, you need to double it (in order to warm new cache instances on index replica / update). http://www.linkedin.com/in/liferay -- - Mark http://www.lucidimagination.com
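As a sanity check on the numbers in this thread (8M docs, 13 faceted fields, roughly 16000 unique values in the largest field), a back-of-envelope FieldCache estimate suggests the field caches alone are nowhere near 10GB; the per-string overhead and average term length below are illustrative guesses, not measurements:

```python
# Back-of-envelope FieldCache sizing for the figures discussed in this thread:
# ~8M docs, 13 faceted string fields, largest field ~16,000 unique values.
# Lucene's StringIndex-style FieldCache keeps one int ord per doc per field,
# plus the unique term strings. Java string overhead is a rough guess
# (~60 bytes fixed + 2 bytes per char); avg term length is assumed.
docs = 8_000_000
fields = 13
uniques_per_field = 16_000      # pessimistic: treat every field like the largest
avg_term_chars = 20             # assumed average term length

ord_bytes = docs * fields * 4   # one 32-bit ord per doc per field
term_bytes = fields * uniques_per_field * (60 + 2 * avg_term_chars)

total_mb = (ord_bytes + term_bytes) / (1024 * 1024)
```

Under these assumptions the total is a few hundred MB, which supports the point made later in the thread: a 10GB heap floor more likely comes from sorting, large caches, and holding two warmed copies across a commit than from faceting alone.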
Re: Solr and Garbage Collection
Right... when I increased it to 12GB all the OOM errors just disappeared. And all the tests are being run on the live environment and for several hours, so it is real enough :) As soon as I update the JVM and test the GC again I will let you know. If you think I can run another test meanwhile just let me know. On Sun, Sep 27, 2009 at 5:05 PM, Mark Miller markrmil...@gmail.com wrote: Jonathan Ariel wrote: Well.. it is strange that when I use the default GC I don't get any errors. Not so strange - it's different code. The bug is likely in the low pause collector and not the serial collector. If I'm so close to running out of memory I should see those OOM exceptions as well with the standard GC. Those? You're not seeing any that you mentioned unless you lower your heap? BTW, I'm faceting on around 13 fields and my total number of unique values is around 3. One of the fields with the biggest amount of unique values has almost 16000 unique values. On Sun, Sep 27, 2009 at 4:32 PM, Fuad Efendi f...@efendi.ca wrote: Mark, Nothing against orange-hat :) Nothing against GC tuning; but if SOLR needs application-specific settings it should be well-documented. GC-tuning: for instance, we need it for 'realtime' Online Trading applications. However, even Online Banking doesn't need it; the primary reason - GC must happen 'outside of the current transaction', GC 'must be predictable', and (for instance) Oracle/BEA JRockit has a specific 'realtime' version for that... Does SOLR need that? Having a load-stress simulator (multithreaded!!!) will definitely help to predict any possible bottleneck... it's even better to write it from scratch (it depends on the schema!), by sending random requests to SOLR in parallel... instead of waiting while FieldCache tries to add a new FieldImpl to the cache (unpredictable!) Tomcat is multithreaded; what if end-users need to load 1000s of large documents (in parallel! 1000s of concurrent users), can you predict memory requirements and GC options without application-specific knowledge?
What about new SOLR-Caches warming up? -Fuad [...] -- - Mark http://www.lucidimagination.com
Re: Solr and Garbage Collection
You are running a very old version of Java 6 (update 6). The latest is update 16. You should definitely upgrade. There is a bug in Java 6 starting with update 4 that may result in a corrupted Lucene/Solr index: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6707044 https://issues.apache.org/jira/browse/LUCENE-1282 The JVM crash occurred in the gc thread. So it looks like a bug in the JVM itself. Upgrading to the latest release might help. Switching to a different garbage collector should help. Bill On Sat, Sep 26, 2009 at 4:31 PM, Mark Miller markrmil...@gmail.com wrote: Jonathan Ariel wrote: Ok. After the server ran for more than 12 hours, the time spent on GC decreased from 11% to 3.4%, but 5 hours later it crashed. This is the thread dump, maybe you can help identify what happened? Well, that's a tough one ;) My guess is it's a bug :) Your two survivor spaces are filled, so it was likely about to move objects into the tenured space, which still has plenty of room for them (barring horrible fragmentation). Any issues with that type of thing should generate an OOM anyway though. You can find people that have run into similar issues in the past, but a lot of times unreproducible. Usually, their bugs are closed and they are told to try a newer JVM. Your JVM appears to be quite a few versions back. There have been many garbage collection bugs fixed in the 7 or so updates since your version, a good handful of them related to CMS. If you can, my best suggestion at the moment is to upgrade to the latest and see how that fares. If not, you might see if going back to the throughput collector and turning on the parallel tenured space collector might meet your needs instead. You can work with other params to get that going better if you have to as well. Also, adjusting other settings with the low pause collector might trigger something to side step the bug. Not a great option there though ;) How many unique fields are you sorting/faceting on?
It must be a lot if you need 10 gig for 8 million documents. It's kind of rough to have to work at such a close limit to your total heap available as a min mem requirement. -- - Mark http://www.lucidimagination.com [...]
Re: Solr and Garbage Collection
Jonathan Ariel wrote: I have around 8M documents. That's actually not so bad - I take it you are faceting/sorting on quite a few unique fields? I set up my server to use a different collector and it seems like it decreased from 11% to 4%, of course I need to wait a bit more because it is just a 1 hour old log. But it seems like it is much better now. I will tell you on Monday the results :) Are you still seeing major collections then? (eg the tenured space hits its limit) You might be able to get even better. On Fri, Sep 25, 2009 at 6:07 PM, Mark Miller markrmil...@gmail.com wrote: That's a good point too - if you can reduce your need for such a large heap, by all means, do so. However, considering you already need at least 10GB or you get OOM, you have a long way to go with that approach. Good luck :) How many docs do you have? I'm guessing it's mostly FieldCache type stuff, and that's the type of thing you can't really side step, unless you give up the functionality that's using it. Grant Ingersoll wrote: On Sep 25, 2009, at 9:30 AM, Jonathan Ariel wrote: Hi to all! Lately my solr servers seem to stop responding once in a while. I'm using solr 1.3. Of course I'm having more traffic on the servers. So I logged the Garbage Collection activity to check if it's because of that. It seems like 11% of the time the application runs, it is stopped because of GC. And some times the GC takes up to 10 seconds! Is this normal? My instances run on 16GB RAM, Dual Quad Core Intel Xeon servers. My index is around 10GB and I'm giving the instances 10GB of RAM. How can I check which GC is being used? If I'm right, JVM Ergonomics should use the Throughput GC, but I'm not 100% sure. Do you have any recommendation on this? As I said in Eteve's thread on JVM settings, some extra time spent on application design/debugging will save a whole lot of headache in Garbage Collection and trying to tune the gazillion different options available.
Ask yourself: What is on the heap and does it need to be there? For instance, do you, if you have them, really need sortable ints? If your servers seem to come to a stop, I'm going to bet you have major collections going on. Major collections in a production system are very bad. They tend to happen right after commits in poorly tuned systems, but can also happen in other places if you let things build up due to really large heaps and/or things like really large cache settings. I would pull up jConsole and have a look at what is happening when the pauses occur. Is it a major collection? If so, then hook up a heap analyzer or a profiler and see what is on the heap around those times. Then have a look at your schema/config, etc. and see if there are things that are memory intensive (sorting, faceting, excessively large filter caches). -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search -- - Mark http://www.lucidimagination.com -- - Mark http://www.lucidimagination.com
Re: Solr and Garbage Collection
Ok. After the server ran for more than 12 hours, the time spent on GC decreased from 11% to 3.4%, but 5 hours later it crashed. This is the thread dump, maybe you can help identify what happened?

#
# An unexpected error has been detected by Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x2b4e0f69ea2a, pid=32224, tid=1103812928
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed mode linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x265a2a]
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#

---  T H R E A D  ---

Current thread (0x5be47400): VMThread [stack: 0x41bad000,0x41cae000] [id=32249]

siginfo: si_signo=SIGSEGV, si_errno=0, si_code=128

[register contents, stack words, and instruction bytes elided]

Stack: [0x41bad000,0x41cae000], sp=0x41cac550, free space=1021k

Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x265a2a]
V  [libjvm.so+0x4cac5b]
V  [libjvm.so+0x2f5388]
V  [libjvm.so+0x4cab7f]
V  [libjvm.so+0x4f8548]
V  [libjvm.so+0x4cad1a]
V  [libjvm.so+0x4cb11c]
V  [libjvm.so+0x5e416e]
V  [libjvm.so+0x63b640]
V  [libjvm.so+0x56c2da]
V  [libjvm.so+0x2c5b80]
V  [libjvm.so+0x2c8866]
V  [libjvm.so+0x2c7f10]
V  [libjvm.so+0x2551ba]
V  [libjvm.so+0x254a6a]
V  [libjvm.so+0x254778]
V  [libjvm.so+0x2c579c]
V  [libjvm.so+0x23502a]
V  [libjvm.so+0x2c5b0e]
V  [libjvm.so+0x661a5e]
V  [libjvm.so+0x66e48a]
V  [libjvm.so+0x66da32]
V  [libjvm.so+0x66dcb4]
V  [libjvm.so+0x66d7ae]
V  [libjvm.so+0x50628a]

VM_Operation (0x4076bd20): GenCollectForAllocation, mode: safepoint, requested by thread 0x5c42d800

---  P R O C E S S  ---

Java Threads:
0x5c466400 JavaThread btpool0-502 [_thread_blocked, id=4508, stack(0x46332000,0x46433000)]
0x5c2a2400 JavaThread btpool0-501 [_thread_blocked, id=4507, stack(0x428f8000,0x429f9000)]
0x5c0fec00 JavaThread btpool0-500 [_thread_blocked, id=4506, stack(0x43e0d000,0x43f0e000)]
0x5c2ce400 JavaThread btpool0-498 [_thread_blocked, id=4504, stack(0x42dfd000,0x42efe000)]
0x5be69000 JavaThread btpool0-497 [_thread_blocked, id=4503, stack(0x45f2e000,0x4602f000)]
0x5c30e000 JavaThread btpool0-496 [_thread_blocked, id=4251,
Re: Solr and Garbage Collection
Jonathan Ariel wrote:
Ok. After the server ran for more than 12 hours, the time spent on GC decreased from 11% to 3.4%, but 5 hours later it crashed. This is the thread dump, maybe you can help identify what happened?

Well, that's a tough one ;) My guess is it's a bug :) Your two survivor spaces are filled, so it was likely about to move objects into the tenured space, which still has plenty of room for them (barring horrible fragmentation). Any issue of that kind should generate an OOM anyway, though. You can find people who have run into similar issues in the past, but a lot of times they were unreproducible. Usually their bugs are closed and they are told to try a newer JVM. Your JVM appears to be quite a few versions back. There have been many garbage collection bugs fixed in the 7 or so updates since your version, a good handful of them related to CMS. If you can, my best suggestion at the moment is to upgrade to the latest and see how that fares. If not, you might see whether going back to the throughput collector and turning on the parallel tenured-space collector meets your needs instead. You can work with other params to tune that further if you have to. Also, adjusting other settings with the low pause collector might side-step the bug. Not a great option there though ;)

How many unique fields are you sorting/faceting on? It must be a lot if you need 10 gigs for 8 million documents. It's kind of rough to have to run so close to your total available heap as a minimum memory requirement.
--
- Mark
http://www.lucidimagination.com
Re: Solr and Garbage Collection
Also, in case the info might help track something down: it's pretty darn odd that both your survivor spaces are full. I've never seen that in one of these dumps - always one is empty. When one is filled, live objects are moved to the other, then back, and forth, for a certain number of collections until they are moved into the tenured space. Both being filled like that really seems like a bug to me - I've looked over tons of these dumps in the past (random ones online), and I have never seen one where neither survivor space was empty.

--
- Mark
http://www.lucidimagination.com
RE: Solr and Garbage Collection
Hi,

Have you looked at tuning the garbage collection? Take a look at the following articles:

http://www.lucidimagination.com/blog/2009/09/19/java-garbage-collection-boot-camp-draft/
http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html

Changing to the concurrent or throughput collector should help with the long pauses.

Colin.

-Original Message-
From: Jonathan Ariel [mailto:ionat...@gmail.com]
Sent: Friday, September 25, 2009 11:37 AM
To: solr-user@lucene.apache.org; yo...@lucidimagination.com
Subject: Re: Solr and Garbage Collection

Right, now I'm giving it 12GB of heap memory. If I give it less (10GB) it throws the following exception:

Sep 5, 2009 7:18:32 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:361)
at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:352)
at org.apache.solr.request.SimpleFacets.getFieldCacheCounts(SimpleFacets.java:267)
at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:185)
at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:207)
at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:104)
at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:70)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

On Fri, Sep 25, 2009 at 10:55 AM, Yonik Seeley yo...@lucidimagination.com wrote:
On Fri, Sep 25, 2009 at 9:30 AM, Jonathan Ariel ionat...@gmail.com wrote:
Hi to all! Lately my solr servers seem to stop responding once in a while. I'm using solr 1.3. Of course I'm having more traffic on the servers. So I logged the Garbage Collection activity to check if it's because of that. It seems like 11% of the time the application runs, it is stopped because of GC. And sometimes the GC takes up to 10 seconds! Is this normal? My instances run on 16GB RAM, dual quad-core Intel Xeon servers. My index is around 10GB and I'm giving the instances 10GB of RAM.

Bigger heaps lead to bigger GC pauses in general. Do you mean that you are giving the JVM a 10GB heap? Were you getting OOM exceptions with a smaller heap?

-Yonik
http://www.lucidimagination.com
RE: Solr and Garbage Collection
Give it even more memory. The Lucene FieldCache is used to store non-tokenized, single-valued, non-boolean (document ID -> field value) pairs, and it is loaded in full - for instance, for sorting query results. So if you have 100,000,000 documents with a specific heavily distributed field (cardinality is high! average value size is 100 bytes!), you need 10,000,000,000 bytes for just this one instance of the FieldCache. GC does not play any role here - the FieldCache won't be GC-collected.

-Fuad
http://www.linkedin.com/in/liferay
RE: Solr and Garbage Collection
You are saying that I should give more memory than 12GB? Yes. Look at this:

SEVERE: java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:361)

It can't find enough contiguous bytes for createValue(...) - it can't add a (field value, document ID) pair to the array. GC tuning won't help in this specific case... Maybe SOLR/Lucene core developers could warm the FieldCache at IndexReader-opening time in the future, so the OOM happens early... Avoiding faceting (and sorting) on such a field will only postpone the OOM to an unpredictable date/time...

-Fuad
http://www.linkedin.com/in/liferay
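Fuad's arithmetic above can be reduced to a back-of-envelope estimate. This is an illustrative sketch, not Lucene's actual accounting; the 100-byte average value size is his example figure, and the class and method names are made up for illustration:

```java
// Back-of-envelope estimate of FieldCache memory for one sorted/faceted
// field, following Fuad's arithmetic: roughly one value per document.
public class FieldCacheEstimate {
    // Rough bytes for a (document ID -> field value) cache over the index.
    static long bytesNeeded(long numDocs, long avgValueBytes) {
        return numDocs * avgValueBytes;
    }

    public static void main(String[] args) {
        // Fuad's example: 100M docs x 100 bytes = 10 GB for one field.
        System.out.println(bytesNeeded(100_000_000L, 100L)); // 10000000000
        // Jonathan's index: 8M docs x 100 bytes = ~0.8 GB per such field.
        System.out.println(bytesNeeded(8_000_000L, 100L));
    }
}
```

Note this undercounts real usage (it ignores per-document ordinal arrays and object overhead), but it shows why a handful of heavily faceted fields can dominate a 10-12GB heap.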
Re: Solr and Garbage Collection
It won't really - it will just keep the JVM from wasting time resizing the heap on you. Since you know you need so much RAM anyway, there's no reason not to just pin it at what you need. It's not going to help you much with GC, though.

Jonathan Ariel wrote:
BTW, why would making them equal lower the frequency of GC?

On 9/25/09, Fuad Efendi f...@efendi.ca wrote:
Bigger heaps lead to bigger GC pauses in general.

Opposite viewpoint: a 1sec GC happening once an hour is MUCH BETTER than a 30ms GC once per second. To lower the frequency of GC: -Xms4096m -Xmx4096m (make them equal!) Use the -server option. The -server option of the JVM is 'native CPU code'; I remember the WebLogic 7 console with SUN JVM 1.3 not showing any GC (just a horizontal line).

-Fuad
http://www.linkedin.com/in/liferay

--
- Mark
http://www.lucidimagination.com
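Mark's point about pinning the heap can be checked at runtime with a quick probe of java.lang.Runtime - a minimal sketch, not Solr code:

```java
// Quick probe of heap sizing: with -Xms equal to -Xmx, the committed heap
// (totalMemory) should already be at the ceiling (maxMemory) at startup,
// so the JVM never pauses to grow or shrink the heap later.
public class HeapProbe {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long committed = rt.totalMemory(); // heap currently committed
        long max = rt.maxMemory();         // the -Xmx ceiling
        System.out.println("committed=" + committed + " max=" + max);
        System.out.println("pinned=" + (committed == max));
    }
}
```

(The exact equality can be off by a small amount on some JVMs, so treat this as a sanity check rather than a guarantee.)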
Re: Solr and Garbage Collection
The -server option of the JVM is 'native CPU code'; I remember the WebLogic 7 console with SUN JVM 1.3 not showing any GC (just a horizontal line).

Not sure what that is all about either. -server and -client are just two different versions of HotSpot. The -server version is optimized for long-running applications: it starts slower, and over time it learns about your app and makes good throughput optimizations. The -client version gets up to speed quicker and concentrates more on response time than throughput - better for desktop apps. -server is better for long-lived server apps, generally.

--
- Mark
http://www.lucidimagination.com
RE: Solr and Garbage Collection
30ms is not better or worse than 1s until you look at the service requirements. For many applications, it is worth dedicating 10% of your processing time to GC if that makes the worst-case pause short. On the other hand, my experience with the IBM JVM was that the maximum query rate was 2-3X better with the concurrent generational GC compared to any of their other GC algorithms, so we got the best throughput along with the shortest pauses. Solr garbage generation (for queries) seems to have two major components: per-request garbage and cache evictions. With a generational collector, these two are handled by separate parts of the collector. Per-request garbage should completely fit in the short-term heap (nursery), so that it can be collected rapidly and returned to use for further requests. If the nursery is too small, the per-request allocations will be made in tenured space and sit there until the next major GC. Cache evictions are almost always in long-term storage (tenured space) because an LRU algorithm guarantees that the garbage will be old. Check the growth rate of tenured space (under constant load, of course) while increasing the size of the nursery. That rate should drop when the nursery gets big enough, then not drop much further as it is increased more. After that, reduce the size of tenured space until major GCs start happening too often (a judgment call). A bigger tenured space means longer major GCs and thus longer pauses, so you don't want it oversized by too much. Also check the hit rates of your caches. If the hit rate is low, say 20% or less, make that cache much bigger or set it to zero. Either one will reduce the number of cache evictions. If you have an HTTP cache in front of Solr, zero may be the right choice, since the HTTP cache is cherry-picking the easily cacheable requests. Note that a commit nearly doubles the memory required, because you have two live Searcher objects with all their caches. Make sure you have headroom for a commit. 
If you want to test the tenured space usage, you must test with real-world queries. Those are the only way to get accurate cache eviction rates.

wunder
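Walter's point about hit rates and evictions can be illustrated with a toy LRU cache that counts both - an illustrative sketch built on java.util.LinkedHashMap, not Solr's actual cache implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// A small LRU map that counts hits, misses, and evictions, showing why a
// low hit rate turns nearly every insert into an eviction - i.e. into
// old (tenured-space) garbage, as described above.
public class LruStats<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;
    long hits, misses, evictions;

    public LruStats(int capacity) {
        super(16, 0.75f, true); // accessOrder=true -> LRU ordering
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        if (size() > capacity) { evictions++; return true; }
        return false;
    }

    public V lookup(K key) {
        V v = get(key);
        if (v == null) misses++; else hits++;
        return v;
    }

    public static void main(String[] args) {
        // Uniform random keys over a space 10x the capacity: the hit rate
        // stays low and almost every insert eventually gets evicted.
        LruStats<Integer, String> cache = new LruStats<>(100);
        Random rnd = new Random(42);
        for (int i = 0; i < 10_000; i++) {
            int k = rnd.nextInt(1000);
            if (cache.lookup(k) == null) cache.put(k, "v" + k);
        }
        System.out.println("hits=" + cache.hits + " evictions=" + cache.evictions);
    }
}
```

With a ~10% hit rate like this, Walter's advice applies: either grow the cache until the hit rate is worth the tenured-space churn, or set it to zero.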
Re: Solr and Garbage Collection
Walter Underwood wrote:
30ms is not better or worse than 1s until you look at the service requirements. For many applications, it is worth dedicating 10% of your processing time to GC if that makes the worst-case pause short. On the other hand, my experience with the IBM JVM was that the maximum query rate was 2-3X better with the concurrent generational GC compared to any of their other GC algorithms, so we got the best throughput along with the shortest pauses.

With which collector? Since the very early JVMs, all GC has been generational. Most of the collectors (other than the serial collector) also work concurrently. By default, they are concurrent on different generations, but you can now add concurrency to the other generation with each, too.

Solr garbage generation (for queries) seems to have two major components: per-request garbage and cache evictions. With a generational collector, these two are handled by separate parts of the collector.

Different parts of the collector? It's a different collector depending on the generation. The young generation is collected with a copy collector. This is because almost all the objects in the young generation are likely dead, and a copy collector only needs to visit live objects, so it's very efficient. The tenured generation uses something more along the lines of mark-and-sweep or mark-and-compact.

Per-request garbage should completely fit in the short-term heap (nursery), so that it can be collected rapidly and returned to use for further requests. If the nursery is too small, the per-request allocations will be made in tenured space and sit there until the next major GC. Cache evictions are almost always in long-term storage (tenured space) because an LRU algorithm guarantees that the garbage will be old. Check the growth rate of tenured space (under constant load, of course) while increasing the size of the nursery. That rate should drop when the nursery gets big enough, then not drop much further as it is increased more. After that, reduce the size of tenured space until major GCs start happening too often (a judgment call). A bigger tenured space means longer major GCs and thus longer pauses, so you don't want it oversized by too much.

With the concurrent low pause collector, the goal is to avoid major collections by collecting *before* the tenured space is filled. If you are getting major collections, you need to tune your settings - the whole point of that collector is to avoid major collections and do almost all of the work while your application is not paused. There are still 2 brief pauses during the collection, but they should not be significant at all.

--
- Mark
http://www.lucidimagination.com
RE: Solr and Garbage Collection
As I said, I was using the IBM JVM, not the Sun JVM. The concurrent low pause collector is only in the Sun JVM. I just found this excellent article about the various IBM GC options for a Lucene application with a 100GB heap:

http://www.nearinfinity.com/blogs/aaron_mccurry/tuning_the_ibm_jvm_for_large_h.html

wunder
Re: Solr and Garbage Collection
Ok. I will try with the concurrent low pause collector and let you know the results. On Fri, Sep 25, 2009 at 2:23 PM, Walter Underwood wun...@wunderwood.orgwrote: As I said, I was using the IBM JVM, not the Sun JVM. The concurrent low pause collector is only in the Sun JVM. I just found this excellent article about the various IBM GC options for a Lucene application with a 100GB heap: http://www.nearinfinity.com/blogs/aaron_mccurry/tuning_the_ibm_jvm_for_large _h.html wunder -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Friday, September 25, 2009 10:03 AM To: solr-user@lucene.apache.org Subject: Re: Solr and Garbage Collection Walter Underwood wrote: 30ms is not better or worse than 1s until you look at the service requirements. For many applications, it is worth dedicating 10% of your processing time to GC if that makes the worst-case pause short. On the other hand, my experience with the IBM JVM was that the maximum query rate was 2-3X better with the concurrent generational GC compared to any of their other GC algorithms, so we got the best throughput along with the shortest pauses. With which collector? Since the very early JVM's, all GC is generational. Most of the collectors (other than the Serial Collector) also work concurrently. By default, they are concurrent on different generations, but you can add concurrency to the other generation with each now too. Solr garbage generation (for queries) seems to have two major components: per-request garbage and cache evictions. With a generational collector, these two are handled by separate parts of the collector. Different parts of the collector? Its a different collector depending on the generation. The young generation is collected with a copy collector. This is because almost all the objects in the young generation are likely dead, and a copy collector only needs to visit live objects. So its very efficient. 
The tenured generation uses something more along the lines of mark and sweep or mark and compact. Per-request garbage should completely fit in the short-term heap (nursery), so that it can be collected rapidly and returned to use for further requests. If the nursery is too small, the per-request allocations will be made in tenured space and sit there until the next major GC. Cache evictions are almost always in long-term storage (tenured space) because an LRU algorithm guarantees that the garbage will be old. Check the growth rate of tenured space (under constant load, of course) while increasing the size of the nursery. That rate should drop when the nursery gets big enough, then not drop much further as it is increased more. After that, reduce the size of tenured space until major GCs start happening too often (a judgment call). A bigger tenured space means longer major GCs and thus longer pauses, so you don't want it oversized by too much. With the concurrent low pause collector, the goal is to avoid major collections, by collecting *before* the tenured space is filled. If you are getting major collections, you need to tune your settings - the whole point of that collector is to avoid major collections, and do almost all of the work while your application is not paused. There are still 2 brief pauses during the collection, but they should not be significant at all. Also check the hit rates of your caches. If the hit rate is low, say 20% or less, make that cache much bigger or set it to zero. Either one will reduce the number of cache evictions. If you have an HTTP cache in front of Solr, zero may be the right choice, since the HTTP cache is cherry-picking the easily cacheable requests. Note that a commit nearly doubles the memory required, because you have two live Searcher objects with all their caches. Make sure you have headroom for a commit. If you want to test the tenured space usage, you must test with real world queries.
Those are the only way to get accurate cache eviction rates. wunder
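Wunder's sizing walk-through above translates, on the Sun JVM of that era, into startup flags along these lines. This is only a sketch: the concrete values (the 10g heap, the 2g nursery, the occupancy fraction) are illustrative placeholders, not recommendations - derive your own from the tenured-growth-rate test he describes.

```shell
# Illustrative Sun/HotSpot flags for the tuning procedure above: fixed heap,
# an explicit nursery to absorb per-request garbage, and the concurrent
# low pause (CMS) collector starting *before* tenured space fills.
# All sizes here are placeholder values.
GC_FLAGS="-Xms10g -Xmx10g \
-Xmn2g \
-XX:+UseConcMarkSweepGC \
-XX:CMSInitiatingOccupancyFraction=75 \
-XX:+UseCMSInitiatingOccupancyOnly \
-verbose:gc -XX:+PrintGCDetails"
echo "$GC_FLAGS"
```

You would append these to the java command that launches your servlet container; the -verbose:gc logging is what lets you verify the nursery is actually absorbing the per-request garbage.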
Re: Solr and Garbage Collection
My bad - later, it looks as if you're giving general advice, and that's what I took issue with. Any collector that is not doing generational collection is essentially from the dark ages and shouldn't be used. Any collector that doesn't have concurrent options, unless possibly you're running a tiny app (under 100MB of RAM), or only have a single CPU, is also dark ages, and not fit for a server environment. I haven't kept up with IBM's JVM, but it sounds like they are well behind Sun in GC then. - Mark Walter Underwood wrote: As I said, I was using the IBM JVM, not the Sun JVM. The concurrent low pause collector is only in the Sun JVM. I just found this excellent article about the various IBM GC options for a Lucene application with a 100GB heap: http://www.nearinfinity.com/blogs/aaron_mccurry/tuning_the_ibm_jvm_for_large_h.html wunder
RE: Solr and Garbage Collection
For batch-oriented computing, like Hadoop, the most efficient GC is probably a non-concurrent, non-generational GC. I doubt that there are many batch-oriented applications of Solr, though. The rest of the advice is intended to be general and it sounds like we agree about sizing. If the nursery is not big enough, the tenured space will be used for allocations that have a short lifetime and that will increase the length and/or frequency of major collections. Cache evictions are the interesting part, because they cause a constant rate of tenured space garbage. In many servers, you can get a big enough nursery that major collections are very rare. That won't happen in Solr because of cache evictions. The IBM JVM is excellent. Their concurrent generational GC policy is gencon. wunder
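For reference, the IBM gencon policy wunder mentions is selected with a single flag. A hedged sketch of an equivalent IBM JVM command line (the sizes are illustrative, and -Xmn as a nursery-size option is my recollection of the IBM flag set - check your JVM's documentation):

```shell
# IBM JVM equivalent of the setup discussed in this thread:
# gencon is IBM's concurrent generational GC policy.
# Sizes are placeholder values, not tuned recommendations.
IBM_FLAGS="-Xms10g -Xmx10g -Xgcpolicy:gencon -Xmn2g -verbose:gc"
echo "$IBM_FLAGS"
```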
Re: Solr and Garbage Collection
Walter Underwood wrote: For batch-oriented computing, like Hadoop, the most efficient GC is probably a non-concurrent, non-generational GC. Okay - for batch we somewhat agree I guess - if you can stand any length of pausing, non-concurrent can be nice, because you don't pay for thread sync communication. Only with a small heap size though (less than 100MB is what I've seen). You would pause the batch job while GC takes place. If you have 8 processors, and you are pausing all of them to collect a large heap using only 1 processor, that doesn't make much sense to me. The thread communication pain will be far outweighed by using more processors to do the collection faster, and not stop the world for your batch job so long. Stopping your application dead in its tracks, and then only using one of the available processors to collect a large heap, while the rest sit idle, doesn't make much sense. I also don't agree it ever really makes sense not to do generational collection. What is your argument here? Generational collection is **way** more efficient for short lived objects, which tend to be up to 98% of the objects in most applications. The only way I see that making sense is if you have almost no short lived objects (which occurs in what, .0001% of apps if at all?). The Sun JVM doesn't even offer a non-generational approach anymore. It's just standard GC practice. I doubt that there are many batch-oriented applications of Solr, though. The rest of the advice is intended to be general and it sounds like we agree about sizing. If the nursery is not big enough, the tenured space will be used for allocations that have a short lifetime and that will increase the length and/or frequency of major collections. Yes - I wasn't arguing with every point - I was picking and choosing :) After the heap size, the size of the young generation is the most important factor. Cache evictions are the interesting part, because they cause a constant rate of tenured space garbage.
In many servers, you can get a big enough nursery that major collections are very rare. That won't happen in Solr because of cache evictions. The IBM JVM is excellent. Their concurrent generational GC policy is gencon. Yeah, I actually know very little about the IBM JVM, so I wasn't really commenting. But from the info I gleaned here and on a couple quick web searches, I'm not too impressed by its GC.
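Since the young-generation size keeps coming up as the second most important knob: on HotSpot you set it either directly with -Xmn or as a ratio with -XX:NewRatio=N, which splits the heap old:young = N:1. A quick sketch of the arithmetic for the 10 GB heap discussed in this thread (NewRatio=2 is an assumed example value, not a recommendation):

```shell
# -XX:NewRatio=N means old:young = N:1, so young = heap / (N + 1).
heap_mb=10240   # the 10 GB heap discussed in this thread
new_ratio=2     # assumed example value
young_mb=$((heap_mb / (new_ratio + 1)))
old_mb=$((heap_mb - young_mb))
echo "young ~${young_mb} MB, tenured ~${old_mb} MB"
```

With these example numbers the nursery comes out to roughly a third of the heap; per wunder's procedure, you would then watch the tenured growth rate to see whether that is big enough.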
Re: Solr and Garbage Collection
Ok. I'll first change the GC and see if the time spent decreased. Then I'll try increasing the heap as Fuad recommends. On 9/25/09, Mark Miller markrmil...@gmail.com wrote: When we talk about Collectors, we are not just talking about collecting - whatever that means. There isn't really a collecting phase - the whole algorithm is garbage collecting - hence calling the different implementations collectors. Usually, fragmentation is dealt with using a mark-compact collector (or IBM has used a mark-sweep-compact collector). Copying collectors are not only super efficient at collecting young spaces, but they are also great for fragmentation - when you copy everything to the new space, you can remove any fragmentation. At the cost of double the space requirements though. So mark-compact is a compromise. First you mark what's reachable, then everything that's marked is copied/compacted to the bottom of the heap. It's all part of a collection though. Jonathan Ariel wrote: Maybe what's missing here is how did I get the 11%. I just ran solr with the following JVM params: -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime with that I can measure the amount of time the application runs between collection pauses and the length of the collection pauses, respectively. I think that in this case the 11% is just for memory collection and not defragmentation... but I'm not 100% sure. On Fri, Sep 25, 2009 at 5:05 PM, Fuad Efendi f...@efendi.ca wrote: But again, GC is not just Garbage Collection as many in this thread think... it is also memory defragmentation which is much more costly than collection just because it needs to move _live_objects_ somewhere (and wait/lock till such objects get unlocked to be moved...) - obviously more memory helps... 11% is extremely high.
-Fuad http://www.linkedin.com/in/liferay -Original Message- From: Jonathan Ariel [mailto:ionat...@gmail.com] Sent: September-25-09 3:36 PM To: solr-user@lucene.apache.org Subject: Re: FW: Solr and Garbage Collection I'm not planning on lowering the heap. I just want to lower the time wasted on GC, which is 11% right now. So what I'll try is changing the GC to -XX:+UseConcMarkSweepGC On Fri, Sep 25, 2009 at 4:17 PM, Fuad Efendi f...@efendi.ca wrote: Mark, what if a piece of code needs 10 contiguous Kb to load a document field? How are locked memory pieces optimized/moved (putting on hold almost the whole application)? Lowering the heap is a _bad_ idea; we will have extremely frequent GC (optimize of live objects!!!) even if RAM is (theoretically) enough. -Fuad Fuad, you didn't read the thread right. He is not having a problem with OOM. He got the OOM because he lowered the heap to try and help GC. He normally runs with a heap that can handle his FC. Please re-read the thread. You are confusing the thread. - Mark GC will frequently happen even if RAM is more than enough: in case it is heavily sparse... so have even more RAM! -Fuad -- - Mark http://www.lucidimagination.com
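Jonathan's 11% figure can be reproduced directly from the -XX:+PrintGCApplicationConcurrentTime / -XX:+PrintGCApplicationStoppedTime output. A sketch with a tiny hypothetical log (the line shapes match what those flags printed on Sun JVMs of that era; the numbers are made up):

```shell
# Hypothetical sample of the output produced by
# -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime
cat > app_times.log <<'EOF'
Application time: 9.0000000 seconds
Total time for which application threads were stopped: 1.0000000 seconds
Application time: 8.5000000 seconds
Total time for which application threads were stopped: 0.5000000 seconds
EOF

# Fraction of wall time spent stopped = stopped / (running + stopped)
awk '
  /Application time/     { run  += $3 }
  /threads were stopped/ { stop += $9 }
  END { printf "%.1f%% stopped\n", 100 * stop / (run + stop) }
' app_times.log
# prints: 7.9% stopped
```

Run against a real log, anything in the double digits - like the 11% in this thread - is a sign the collector or generation sizing needs attention.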
Re: Solr and Garbage Collection
On Sep 25, 2009, at 9:30 AM, Jonathan Ariel wrote: Hi to all! Lately my solr servers seem to stop responding once in a while. I'm using solr 1.3. Of course I'm having more traffic on the servers. So I logged the Garbage Collection activity to check if it's because of that. It seems like 11% of the time the application runs, it is stopped because of GC. And sometimes the GC takes up to 10 seconds! Is it normal? My instances run on 16GB RAM, Dual Quad Core Intel Xeon servers. My index is around 10GB and I'm giving the instances 10GB of RAM. How can I check which GC is being used? If I'm right JVM Ergonomics should use the Throughput GC, but I'm not 100% sure. Do you have any recommendation on this? As I said in Eteve's thread on JVM settings, some extra time spent on application design/debugging will save a whole lot of headache in Garbage Collection and trying to tune the gazillion different options available. Ask yourself: What is on the heap and does it need to be there? For instance, do you, if you have them, really need sortable ints? If your servers seem to come to a stop, I'm going to bet you have major collections going on. Major collections in a production system are very bad. They tend to happen right after commits in poorly tuned systems, but can also happen in other places if you let things build up due to really large heaps and/or things like really large cache settings. I would pull up jConsole and have a look at what is happening when the pauses occur. Is it a major collection? If so, then hook up a heap analyzer or a profiler and see what is on the heap around those times. Then have a look at your schema/config, etc. and see if there are things that are memory intensive (sorting, faceting, excessively large filter caches). -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: Solr and Garbage Collection
Jonathan Ariel wrote: How can I check which is the GC that it is being used? If I'm right JVM Ergonomics should use the Throughput GC, but I'm not 100% sure. Do you have any recommendation on this? Just to straighten out this one too - Ergonomics doesn't use throughput - throughput is the collector that allows Ergonomics ;) And throughput is the default as long as your machine is detected as server class. But throughput is not great with large tenured spaces out of the box. It only parallelizes the new space collection. You have to turn on an option to get parallel tenured collection as well - which is essential to scale to large heap sizes. -- - Mark http://www.lucidimagination.com
Re: Solr and Garbage Collection
Mark Miller wrote: Jonathan Ariel wrote: How can I check which is the GC that it is being used? If I'm right JVM Ergonomics should use the Throughput GC, but I'm not 100% sure. Do you have any recommendation on this? Just to straighten out this one too - Ergonomics doesn't use throughput - throughput is the collector that allows Ergonomics ;) And throughput is the default as long as your machine is detected as server class. But throughput is not great with large tenured spaces out of the box. It only parallelizes the new space collection. You have to turn on an option to get parallel tenured collection as well - which is essential to scale to large heap sizes. hmm - I'm not being totally accurate there - ergonomics is what detects server and so makes throughput the default collector for a server machine. But much of the GC ergonomics support only works with the throughput collector. Kind of chicken and egg :) -- - Mark http://www.lucidimagination.com
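To answer Jonathan's "how can I check which GC is being used" question concretely: HotSpot will tell you which collector ergonomics selected if you ask it to print its command-line flags. A sketch (the fallback string is a made-up sample for machines without a JDK on the PATH):

```shell
# Ask HotSpot which flags ergonomics actually chose; a -XX:+Use...GC entry
# in the output identifies the collector (e.g. -XX:+UseParallelGC means the
# throughput collector was selected for a "server-class" machine).
if command -v java >/dev/null 2>&1; then
  flags=$(java -XX:+PrintCommandLineFlags -version 2>&1)
else
  # No JDK on PATH: show the shape of a typical answer instead (made up).
  flags="-XX:InitialHeapSize=268435456 -XX:+UseParallelGC"
fi
echo "$flags" | tr ' ' '\n' | grep 'GC' || echo "(no GC flag printed)"
```

jConsole (mentioned by Grant above) shows the same information interactively under the Memory tab, per memory pool.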
Re: Solr and Garbage Collection
That's a good point too - if you can reduce your need for such a large heap, by all means, do so. However, considering you already need at least 10GB or you get OOM, you have a long way to go with that approach. Good luck :) How many docs do you have? I'm guessing it's mostly FieldCache type stuff, and that's the type of thing you can't really side step, unless you give up the functionality that's using it. -- - Mark http://www.lucidimagination.com
Re: Solr and Garbage Collection
One more point and I'll stop - I've hit my email quota for the day ;) While it's a pain to have to juggle GC params and tune - when you require a heap that's more than a gig or two, I personally believe it's essential to do so for good performance. The (default settings / ergonomics with throughput) just don't cut it. Sad fact of life :) Luckily, you don't generally have to do that much to get things nice - the number of options is not that staggering, and you don't usually need to get into most of them. Choosing the right collector, and tweaking a setting or two can often be enough. The most important thing to do with a large heap and the throughput collector is to turn on parallel tenured collection. I've said it before, but it really is key. At least if you have more than a processor or two - which, for your sake, I hope you do :) - Mark http://www.lucidimagination.com
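For completeness, the "parallel tenured collection" setting Mark calls key is a single extra flag on top of the throughput collector. A sketch (heap size and thread count are illustrative placeholders; by default the JVM derives the thread count from the CPU count):

```shell
# Throughput collector with parallel *tenured* collection turned on --
# without -XX:+UseParallelOldGC, only the young generation is collected
# in parallel and major collections remain single-threaded.
GC_FLAGS="-Xms10g -Xmx10g \
-XX:+UseParallelGC \
-XX:+UseParallelOldGC \
-XX:ParallelGCThreads=8"
echo "$GC_FLAGS"
```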
Re: Solr and Garbage Collection
I have around 8M documents. I set up my server to use a different collector and it seems like it decreased from 11% to 4%. Of course I need to wait a bit more because it is just a 1 hour old log, but it seems like it is much better now. I will tell you on Monday the results :)
RE: Solr and Garbage Collection
Sorry for OFF-topic: Create a dummy Hello, World! JSP, use Tomcat, execute load-stress simulator(s) from separate machine(s), and measure... don't forget to allocate the necessary thread pools in Tomcat (if you have to)... Although such a JSP doesn't use any memory, you will see how easily one can go with 5000 TPS (or 'virtually' 5 concurrent users) on modern quad-cores by simply allocating more memory (...GB) and more Tomcat threads. There is a threshold too... repeat it with HTTPD Workers (and threads), same result, although it doesn't use any GC. More memory - more threads - more keep-alives per TCP... However, 'theoretically' you need only 64Mb for Hello World :)))
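The load-stress simulator Fuad describes can be as simple as ApacheBench, which ships with httpd. A hedged sketch of such an invocation - host, port, path, and request counts are all placeholders for whatever your Tomcat exposes:

```shell
# Hypothetical ApacheBench run against the dummy JSP described above.
# -k enables HTTP keep-alive, -c is the concurrency level; all values
# here are placeholders to adjust for your own setup.
AB_CMD="ab -k -n 50000 -c 100 http://localhost:8080/hello.jsp"
echo "$AB_CMD"
```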