Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter
On Thu, Aug 2, 2012 at 7:53 AM, roz dev rozde...@gmail.com wrote:
> Since we do not really need the Snowball analyzer for this field, we will
> not use it for now. [...] If you have already filed a JIRA for this then
> please let me know and I would like to follow it.

AFAIK Robert already created an issue here:
https://issues.apache.org/jira/browse/LUCENE-4279
and it seems fixed. Given the massive commit last night, it is already
committed and backported, so it will be in 4.0-BETA.

simon
Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter
Hi everyone,

Is there any chance to get this backported for a 3.6.2?

Regards,
Laurent

2012/8/2 Simon Willnauer simon.willna...@gmail.com
> AFAIK Robert already created an issue here:
> https://issues.apache.org/jira/browse/LUCENE-4279
> and it seems fixed. Given the massive commit last night, it is already
> committed and backported, so it will be in 4.0-BETA.
Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter
On Thu, Aug 2, 2012 at 3:13 AM, Laurent Vaills laurent.vai...@gmail.com wrote:
> Hi everyone,
> Is there any chance to get this backported for a 3.6.2?

Hello,

I personally have no problem with it, but it's really technically not a
bugfix, just an optimization. It also doesn't solve the actual problem: if
you have a Tomcat threadpool configuration recycling threads too fast,
there will be other performance problems.

-- 
lucidimagination.com
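The thread-recycling Robert mentions can be reined in on Tomcat 6 with a shared Executor, as eks dev suggests on LUCENE-3841. A minimal server.xml sketch; the pool name and sizes here are illustrative assumptions, not values from this thread:

```xml
<!-- Illustrative: a shared worker pool whose idle threads live for up to
     ten minutes (maxIdleTime is in milliseconds), so per-thread analyzer
     state is not thrown away and rebuilt on every burst of traffic. -->
<Executor name="tomcatThreadPool" namePrefix="catalina-exec-"
          maxThreads="200" minSpareThreads="25" maxIdleTime="600000"/>
<Connector port="8080" protocol="HTTP/1.1" executor="tomcatThreadPool"/>
```

The point of the Executor element is that all connectors share one pool with a predictable lifecycle, instead of each connector growing and shrinking its own threads.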
Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter
On Tue, Jul 31, 2012 at 2:34 PM, roz dev rozde...@gmail.com wrote:
> Hi All
> I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing
> that when we are indexing lots of data with 16 concurrent threads, heap
> grows continuously. It remains high and ultimately most of the stuff ends
> up being moved to Old Gen. Eventually, Old Gen also fills up and we start
> getting into excessive GC problems.

Hi:

I don't claim to know anything about how Tomcat manages threads, but you
really shouldn't have all these objects. In general snowball stemmers
should be reused per-thread-per-field. But if you have a lot of
fields * threads, especially if there really is high thread churn on
Tomcat, then this could be bad with snowball: see eks dev's comment on
https://issues.apache.org/jira/browse/LUCENE-3841

I think it would be useful to see if you can tune Tomcat's threadpool as
he describes.

Separately: Snowball stemmers are currently really RAM-expensive for
stupid reasons. Each one creates a ton of Among objects; e.g. an
EnglishStemmer today is about 8KB. I'll regenerate these and open a JIRA
issue, as the snowball code generator in their svn was improved recently
and each one now takes about 64 bytes instead (the Amongs are static and
reused). Still, this won't really solve your problem, because the analysis
chain could have other heavy parts in initialization, but it seems good
to fix.

As a workaround until then you can also just use the good old
PorterStemmer (PorterStemFilterFactory in Solr). It's not exactly the same
as using Snowball(English), but it's pretty close and also much faster.

-- 
lucidimagination.com
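The workaround amounts to swapping one filter factory in schema.xml. A sketch; the fieldType name and the rest of the chain are illustrative, not taken from the poster's schema:

```xml
<!-- Illustrative fieldType: replace the Snowball filter with the lighter
     PorterStemFilter, which is close to Snowball(English) and cheaper. -->
<fieldType name="text_en_porter" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- was: <filter class="solr.SnowballPorterFilterFactory" language="English"/> -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```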
Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter
Thanks Robert for these inputs. Since we do not really need the Snowball
analyzer for this field, we will not use it for now. If this still does not
address our issue, we will tweak the thread pool as per eks dev's
suggestion - I am a bit hesitant to make this change yet, as we would be
reducing the thread pool, which can adversely impact our throughput. If the
Snowball filter is being optimized for Solr 4 beta then it would be great
for us. If you have already filed a JIRA for this then please let me know
and I would like to follow it.

Thanks again
Saroj

On Wed, Aug 1, 2012 at 8:37 AM, Robert Muir rcm...@gmail.com wrote:
> In general snowball stemmers should be reused per-thread-per-field. [...]
> As a workaround until then you can also just use the good old
> PorterStemmer (PorterStemFilterFactory in Solr).
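Robert's per-thread-per-field reuse advice amounts to building the expensive stemmer once per thread and handing the same instance back on every call. A self-contained sketch of that pattern; CostlyStemmer is a stand-in defined here, not Lucene's actual Snowball class:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class PerThreadReuse {
    static final AtomicInteger constructed = new AtomicInteger();

    // Stand-in for an expensive-to-build stemmer such as an ~8KB EnglishStemmer.
    static class CostlyStemmer {
        CostlyStemmer() { constructed.incrementAndGet(); }
        String stem(String word) {
            // Toy rule only: strip a trailing "s".
            return word.endsWith("s") ? word.substring(0, word.length() - 1) : word;
        }
    }

    // One instance per thread, built lazily on first use and reused afterwards.
    static final ThreadLocal<CostlyStemmer> STEMMER =
            ThreadLocal.withInitial(CostlyStemmer::new);

    public static void main(String[] args) {
        String last = null;
        for (int i = 0; i < 1000; i++) {
            last = STEMMER.get().stem("threads");  // same instance every iteration
        }
        // 1000 calls from one thread construct exactly one stemmer.
        System.out.println(last + ", instances=" + constructed.get());
    }
}
```

This is also why high thread churn hurts: each freshly created worker thread pays the construction cost again, once per field that uses the stemmer.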
Memory leak?? with CloseableThreadLocal with use of Snowball Filter
Hi All,

I am using Solr 4 from trunk with Tomcat 6. I am noticing that when we are
indexing lots of data with 16 concurrent threads, the heap grows
continuously. It remains high and ultimately most of the stuff ends up
being moved to Old Gen. Eventually, Old Gen also fills up and we start
getting into excessive GC problems.

I took a heap dump and found that most of the memory is consumed by
CloseableThreadLocal, which is holding a WeakHashMap of threads and their
state. Most of the old gen is full, with the ThreadLocal eating up 3GB of
heap, and the heap dump shows that all such entries are using the Snowball
filter. I looked into LUCENE-3841 and verified that my version of Solr 4
has that code. So, I am wondering about the reason for this memory leak -
is it due to some other bug in Solr/Lucene?

Here is a brief snapshot of the heap dump showing the problem:

Class Name | Shallow Heap | Retained Heap
-----------------------------------------
org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer @ 0x300c3eb28 | 24 | 3,885,213,072
|- class class org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer @ 0x2f9753340 | 0 | 0
|- this$0 org.apache.solr.schema.IndexSchema @ 0x300bf4048 | 96 | 276,704
|- reuseStrategy org.apache.lucene.analysis.Analyzer$PerFieldReuseStrategy @ 0x300c3eb40 | 16 | 3,885,208,728
|  |- class class org.apache.lucene.analysis.Analyzer$PerFieldReuseStrategy @ 0x2f98368c0 | 0 | 0
|  |- storedValue org.apache.lucene.util.CloseableThreadLocal @ 0x300c3eb50 | 24 | 3,885,208,712
|  |  |- class class org.apache.lucene.util.CloseableThreadLocal @ 0x2f9788918 | 8 | 8
|  |  |- t java.lang.ThreadLocal @ 0x300c3eb68 | 16 | 16
|  |  |  '- class class java.lang.ThreadLocal @ 0x2f80f0868 System Class | 8 | 24
|  |  |- hardRefs java.util.WeakHashMap @ 0x300c3eb78 | 48 | 3,885,208,656
|  |  |  |- class class java.util.WeakHashMap @ 0x2f8476c00 System Class | 16 | 16
|  |  |  |- table java.util.WeakHashMap$Entry[16] @ 0x300c3eba8 | 80 | 2,200,016,960
|  |  |  |  |- class class java.util.WeakHashMap$Entry[] @ 0x2f84789e8 | 0 | 0
|  |  |  |  |- [7] java.util.WeakHashMap$Entry @ 0x306a24950 | 40 | 318,502,920
|  |  |  |  |  |- class class java.util.WeakHashMap$Entry @ 0x2f84786f8 System Class | 0 | 0
|  |  |  |  |  |- queue java.lang.ref.ReferenceQueue @ 0x300c3ebf8 | 32 | 48
|  |  |  |  |  |- referent java.lang.Thread @ 0x30678c2c0  web-23 | 112 | 160
|  |  |  |  |  |- value java.util.HashMap @ 0x30678cbb0 | 48 | 318,502,880
|  |  |  |  |  |  |- class class java.util.HashMap @ 0x2f80b9428 System Class | 24 | 24
|  |  |  |  |  |  |- table java.util.HashMap$Entry[32768] @ 0x3c07c6f58 | 131,088 | 318,502,832
|  |  |  |  |  |  |  |- class class java.util.HashMap$Entry[] @ 0x2f80bd9c8 | 0 | 0
|  |  |  |  |  |  |  |- [10457] java.util.HashMap$Entry @ 0x30678cbe0 | 32 | 40,864
|  |  |  |  |  |  |  |  |- class class java.util.HashMap$Entry @ 0x2f80bd400 System Class | 0 | 0
|  |  |  |  |  |  |  |  |- key java.lang.String @ 0x30678cc00  prod_desc_keywd_en_CA | 32 | 96
|  |  |  |  |  |  |  |  |- value org.apache.solr.analysis.TokenizerChain$SolrTokenStreamComponents @ 0x30678cc60 | 24 | 20,344
|  |  |  |  |  |  |  |  |- next java.util.HashMap$Entry @ 0x39a2c9100 | 32 | 20,392
|  |  |  |  |  |  |  |  |  |- class class java.util.HashMap$Entry @ 0x2f80bd400 System Class | 0 | 0
|  |  |  |  |  |  |  |  |  |- key java.lang.String @ 0x39a2c9120  3637994_fr_CA_cat_name_keywd | 32 | 104
|  |  |  |  |  |  |  |  |  |- value org.apache.solr.analysis.TokenizerChain$SolrTokenStreamComponents @ 0x39a2c9188 | 24 | 20,256
|  |  |  |  |  |  |  |  |  |  |- class class org.apache.solr.analysis.TokenizerChain$SolrTokenStreamComponents @ 0x2f97a69a0 | 0 | 0
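For readers following the dump: a simplified sketch, not Lucene's exact source, of what CloseableThreadLocal does, and why the hardRefs WeakHashMap above ends up holding one hard reference to per-thread analyzer state for every live worker thread:

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

public class SimpleCloseableThreadLocal<T> {
    // The ThreadLocal holds only a weak reference to the value...
    private ThreadLocal<WeakReference<T>> t = new ThreadLocal<>();
    // ...while this map, keyed weakly by Thread, holds the hard reference.
    // This is the "hardRefs" WeakHashMap visible in the heap dump.
    private Map<Thread, T> hardRefs = new WeakHashMap<>();

    public synchronized void set(T value) {
        t.set(new WeakReference<>(value));
        hardRefs.put(Thread.currentThread(), value);
    }

    public synchronized T get() {
        WeakReference<T> ref = t.get();
        return ref == null ? null : ref.get();
    }

    // Unlike a plain ThreadLocal, every per-thread value can be dropped at once.
    public synchronized void close() {
        hardRefs = null;
        t = null;
    }
}
```

An entry is only reclaimable once its thread dies, so a container that keeps many worker threads alive accumulates one map of TokenStreamComponents per thread per analyzer, which is consistent with the per-thread HashMap entries in the dump above.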
Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter
I wasn't in on tracking down the original issue, but I know at least one
client ran into a problem with weak hash references that was a bug in the
JVM here:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034

Here's the summary: Parallel CMS (i.e. with more than one CMS marking
thread) does not enqueue all the dead Reference objects in the old gen. If
you have very large numbers of dead Reference objects (in this case
Finalizable objects) then running CMS with 2 marking threads appears to
cause only a small fraction of Reference objects to be identified. The
unmarked objects build up in old gen and eventually a STW Full GC is
triggered, which enqueues the dead Reference objects. Running with
-XX:ConcGCThreads=1 also fixes the problem.

Best
Erick

On Tue, Jul 31, 2012 at 2:34 PM, roz dev rozde...@gmail.com wrote:
> I am using Solr 4 from trunk and using it with Tomcat 6. [...] I took a
> heap dump and found that most of the memory is consumed by
> CloseableThreadLocal which is holding a WeakHashMap of Threads and its
> state. [...] So, I am wondering the reason for this memory leak - is it
> due to some other bug with Solr/Lucene?
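If that JVM bug is in play, the -XX:ConcGCThreads=1 workaround Erick quotes can be handed to Tomcat through CATALINA_OPTS. A sketch, assuming CMS is the collector in use; the file location and the flag combination are illustrative:

```shell
# bin/setenv.sh (illustrative): run CMS with a single concurrent marking
# thread to sidestep the Reference-processing bug described above.
CATALINA_OPTS="$CATALINA_OPTS -XX:+UseConcMarkSweepGC -XX:ConcGCThreads=1"
export CATALINA_OPTS
```

Note this trades some concurrent-marking throughput for correct Reference processing; it is a workaround, not a fix for the underlying JVM bug.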