Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

2012-08-02 Thread Simon Willnauer
On Thu, Aug 2, 2012 at 7:53 AM, roz dev rozde...@gmail.com wrote:
 Thanks Robert for these inputs.

 Since we do not really need the Snowball analyzer for this field, we will
 not use it for now. If this still does not address our issue, we will tweak
 the thread pool as per eks dev's suggestion - I am a bit hesitant to make
 this change yet, as we would be reducing the thread pool, which can
 adversely impact our throughput.

 If the Snowball Filter is being optimized for the Solr 4 beta, that would
 be great for us. If you have already filed a JIRA issue for this, please
 let me know - I would like to follow it.

AFAIK Robert already created an issue here:
https://issues.apache.org/jira/browse/LUCENE-4279
and it seems fixed. Given the massive commit last night, it's already been
committed and backported, so it will be in 4.0-BETA.

simon

 Thanks again
 Saroj





 On Wed, Aug 1, 2012 at 8:37 AM, Robert Muir rcm...@gmail.com wrote:

 On Tue, Jul 31, 2012 at 2:34 PM, roz dev rozde...@gmail.com wrote:
  Hi All
 
  I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing
  that when we are indexing lots of data with 16 concurrent threads, the
  heap grows continuously. It remains high and ultimately most of the stuff
  ends up being moved to Old Gen. Eventually, Old Gen also fills up and we
  start getting into an excessive GC problem.

 Hi: I don't claim to know anything about how tomcat manages threads,
 but really you shouldn't have all these objects.

 In general snowball stemmers should be reused per-thread-per-field.
 But if you have a lot of fields*threads, especially if there really is
 high thread churn on tomcat, then this could be bad with snowball:
 see eks dev's comment on https://issues.apache.org/jira/browse/LUCENE-3841

 I think it would be useful to see if you can tune tomcat's threadpool
 as he describes.

 Separately: Snowball stemmers are currently really RAM-expensive for
 stupid reasons: each one creates a ton of Among objects, e.g. an
 EnglishStemmer today is about 8KB.

 I'll regenerate these and open a JIRA issue: the snowball code generator
 in their svn was improved recently, and each one now takes about 64 bytes
 instead (the Amongs are static and reused).

 Still this won't really solve your problem, because the analysis chain
 could have other heavy parts in initialization, but it seems good to fix.

 As a workaround until then you can also just use the good old
 PorterStemmer (PorterStemFilterFactory in Solr).
 It's not exactly the same as using Snowball(English) but it's pretty
 close and also much faster.

 --
 lucidimagination.com



Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

2012-08-02 Thread Laurent Vaills
Hi everyone,

Is there any chance of getting this backported for a 3.6.2?

Regards,
Laurent

2012/8/2 Simon Willnauer simon.willna...@gmail.com

 On Thu, Aug 2, 2012 at 7:53 AM, roz dev rozde...@gmail.com wrote:
  Thanks Robert for these inputs.
 
  Since we do not really need the Snowball analyzer for this field, we will
  not use it for now. If this still does not address our issue, we will
  tweak the thread pool as per eks dev's suggestion - I am a bit hesitant
  to make this change yet, as we would be reducing the thread pool, which
  can adversely impact our throughput.

  If the Snowball Filter is being optimized for the Solr 4 beta, that would
  be great for us. If you have already filed a JIRA issue for this, please
  let me know - I would like to follow it.

 AFAIK Robert already created an issue here:
 https://issues.apache.org/jira/browse/LUCENE-4279
 and it seems fixed. Given the massive commit last night, it's already been
 committed and backported, so it will be in 4.0-BETA.

 simon
 
  Thanks again
  Saroj
 
 
 
 
 
  On Wed, Aug 1, 2012 at 8:37 AM, Robert Muir rcm...@gmail.com wrote:
 
  On Tue, Jul 31, 2012 at 2:34 PM, roz dev rozde...@gmail.com wrote:
   Hi All
  
   I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing
   that when we are indexing lots of data with 16 concurrent threads, the
   heap grows continuously. It remains high and ultimately most of the
   stuff ends up being moved to Old Gen. Eventually, Old Gen also fills up
   and we start getting into an excessive GC problem.
 
  Hi: I don't claim to know anything about how tomcat manages threads,
  but really you shouldn't have all these objects.

  In general snowball stemmers should be reused per-thread-per-field.
  But if you have a lot of fields*threads, especially if there really is
  high thread churn on tomcat, then this could be bad with snowball:
  see eks dev's comment on https://issues.apache.org/jira/browse/LUCENE-3841

  I think it would be useful to see if you can tune tomcat's threadpool
  as he describes.

  Separately: Snowball stemmers are currently really RAM-expensive for
  stupid reasons: each one creates a ton of Among objects, e.g. an
  EnglishStemmer today is about 8KB.

  I'll regenerate these and open a JIRA issue: the snowball code generator
  in their svn was improved recently, and each one now takes about 64 bytes
  instead (the Amongs are static and reused).

  Still this won't really solve your problem, because the analysis chain
  could have other heavy parts in initialization, but it seems good to fix.

  As a workaround until then you can also just use the good old
  PorterStemmer (PorterStemFilterFactory in Solr).
  It's not exactly the same as using Snowball(English) but it's pretty
  close and also much faster.
 
  --
  lucidimagination.com
 



Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

2012-08-02 Thread Robert Muir
On Thu, Aug 2, 2012 at 3:13 AM, Laurent Vaills laurent.vai...@gmail.com wrote:
 Hi everyone,

 Is there any chance of getting this backported for a 3.6.2?


Hello, I personally have no problem with it: but it's really
technically not a bugfix, just an optimization.

It also doesn't solve the actual problem if you have a tomcat
threadpool configuration recycling threads too fast. There will be
other performance problems.

-- 
lucidimagination.com


Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

2012-08-01 Thread Robert Muir
On Tue, Jul 31, 2012 at 2:34 PM, roz dev rozde...@gmail.com wrote:
 Hi All

 I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing that
 when we are indexing lots of data with 16 concurrent threads, the heap grows
 continuously. It remains high and ultimately most of the stuff ends up
 being moved to Old Gen. Eventually, Old Gen also fills up and we start
 getting into an excessive GC problem.

Hi: I don't claim to know anything about how tomcat manages threads,
but really you shouldn't have all these objects.

In general snowball stemmers should be reused per-thread-per-field.
But if you have a lot of fields*threads, especially if there really is
high thread churn on tomcat, then this could be bad with snowball:
see eks dev's comment on https://issues.apache.org/jira/browse/LUCENE-3841
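
To sketch what I mean by per-thread-per-field reuse (a simplified,
made-up class, not the actual PerFieldReuseStrategy code in Lucene):
each live thread ends up holding its own field-name -> cached-components
map through the thread local, so retained memory scales with
threads * fields:

import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.util.CloseableThreadLocal;

// Simplified sketch only -- hypothetical class, not the real Lucene code.
public class PerFieldCacheSketch {

  // One map per live thread; each map holds one entry per field name.
  private final CloseableThreadLocal<Map<String, Object>> perThread =
      new CloseableThreadLocal<Map<String, Object>>();

  public Object get(String fieldName) {
    Map<String, Object> byField = perThread.get();
    return byField == null ? null : byField.get(fieldName);
  }

  public void put(String fieldName, Object components) {
    Map<String, Object> byField = perThread.get();
    if (byField == null) {
      byField = new HashMap<String, Object>();
      perThread.set(byField);
    }
    // With many fields and many (short-lived) threads, these cached
    // entries are what piles up in heap dumps like the one in this thread.
    byField.put(fieldName, components);
  }
}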

I think it would be useful to see if you can tune tomcat's threadpool
as he describes.
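
For reference, the kind of tuning he describes would look roughly like
this in Tomcat's server.xml -- a shared executor whose worker threads are
kept alive rather than constantly destroyed and recreated (the numbers are
only placeholders, adjust for your load):

  <Executor name="tomcatThreadPool" namePrefix="catalina-exec-"
            maxThreads="64" minSpareThreads="16"
            maxIdleTime="600000"/>

  <Connector port="8080" protocol="HTTP/1.1"
             executor="tomcatThreadPool"
             connectionTimeout="20000"/>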

Separately: Snowball stemmers are currently really RAM-expensive for
stupid reasons: each one creates a ton of Among objects, e.g. an
EnglishStemmer today is about 8KB.

I'll regenerate these and open a JIRA issue: the snowball code generator
in their svn was improved recently, and each one now takes about 64 bytes
instead (the Amongs are static and reused).
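
To make the before/after concrete (purely hypothetical names below --
the real generated code uses snowball Among tables, this only shows the
shape of the change):

// Hypothetical sketch, not the actual snowball-generated code.
class StemmerTableSketch {

  static final class Rule {
    final String suffix;
    Rule(String suffix) { this.suffix = suffix; }
  }

  // Old generator: a fresh rule table per stemmer instance, so
  // threads * fields instances each pay the full cost (~8KB each).
  private final Rule[] perInstanceRules = { new Rule("ational"), new Rule("iveness") };

  // New generator: one immutable table shared by every instance,
  // so each instance is only a few dozen bytes.
  private static final Rule[] SHARED_RULES = { new Rule("ational"), new Rule("iveness") };
}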

Still this won't really solve your problem, because the analysis chain
could have other heavy parts in initialization, but it seems good to fix.

As a workaround until then you can also just use the good old
PorterStemmer (PorterStemFilterFactory in Solr).
It's not exactly the same as using Snowball(English) but it's pretty
close and also much faster.
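
If it helps, the swap in your schema.xml would look something like this
(the surrounding field type and the rest of the analyzer chain are
placeholders from your own schema):

  <!-- before: -->
  <filter class="solr.SnowballPorterFilterFactory" language="English"/>

  <!-- after: -->
  <filter class="solr.PorterStemFilterFactory"/>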

-- 
lucidimagination.com


Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

2012-08-01 Thread roz dev
Thanks Robert for these inputs.

Since we do not really need the Snowball analyzer for this field, we will
not use it for now. If this still does not address our issue, we will tweak
the thread pool as per eks dev's suggestion - I am a bit hesitant to make
this change yet, as we would be reducing the thread pool, which can
adversely impact our throughput.

If the Snowball Filter is being optimized for the Solr 4 beta, that would
be great for us. If you have already filed a JIRA issue for this, please
let me know - I would like to follow it.

Thanks again
Saroj





On Wed, Aug 1, 2012 at 8:37 AM, Robert Muir rcm...@gmail.com wrote:

 On Tue, Jul 31, 2012 at 2:34 PM, roz dev rozde...@gmail.com wrote:
  Hi All
 
  I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing
  that when we are indexing lots of data with 16 concurrent threads, the
  heap grows continuously. It remains high and ultimately most of the
  stuff ends up being moved to Old Gen. Eventually, Old Gen also fills up
  and we start getting into an excessive GC problem.

 Hi: I don't claim to know anything about how tomcat manages threads,
 but really you shouldn't have all these objects.

 In general snowball stemmers should be reused per-thread-per-field.
 But if you have a lot of fields*threads, especially if there really is
 high thread churn on tomcat, then this could be bad with snowball:
 see eks dev's comment on https://issues.apache.org/jira/browse/LUCENE-3841

 I think it would be useful to see if you can tune tomcat's threadpool
 as he describes.

 Separately: Snowball stemmers are currently really RAM-expensive for
 stupid reasons: each one creates a ton of Among objects, e.g. an
 EnglishStemmer today is about 8KB.

 I'll regenerate these and open a JIRA issue: the snowball code generator
 in their svn was improved recently, and each one now takes about 64 bytes
 instead (the Amongs are static and reused).

 Still this won't really solve your problem, because the analysis chain
 could have other heavy parts in initialization, but it seems good to fix.

 As a workaround until then you can also just use the good old
 PorterStemmer (PorterStemFilterFactory in Solr).
 It's not exactly the same as using Snowball(English) but it's pretty
 close and also much faster.

 --
 lucidimagination.com



Memory leak?? with CloseableThreadLocal with use of Snowball Filter

2012-07-31 Thread roz dev
Hi All

I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing that
when we are indexing lots of data with 16 concurrent threads, the heap grows
continuously. It remains high and ultimately most of the stuff ends up
being moved to Old Gen. Eventually, Old Gen also fills up and we start
getting into an excessive GC problem.

I took a heap dump and found that most of the memory is consumed by
CloseableThreadLocal, which is holding a WeakHashMap of Threads and their
state.

Most of the old gen is filled with ThreadLocal entries eating up 3 GB of
heap, and the heap dump shows that all such entries are using the Snowball
Filter. I looked into LUCENE-3841 and verified that my version of Solr 4
has that code.

So, I am wondering about the reason for this memory leak - is it due to
some other bug in Solr/Lucene?

Here is a brief snapshot of the heap dump showing the problem:

Class Name | Shallow Heap | Retained Heap
------------------------------------------------------------------------
*org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer @ 0x300c3eb28 | 24 | 3,885,213,072*
|- class class org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer @ 0x2f9753340 | 0 | 0
|- this$0 org.apache.solr.schema.IndexSchema @ 0x300bf4048 | 96 | 276,704
*|- reuseStrategy org.apache.lucene.analysis.Analyzer$PerFieldReuseStrategy @ 0x300c3eb40 | 16 | 3,885,208,728*
|  |- class class org.apache.lucene.analysis.Analyzer$PerFieldReuseStrategy @ 0x2f98368c0 | 0 | 0
|  |- storedValue org.apache.lucene.util.CloseableThreadLocal @ 0x300c3eb50 | 24 | 3,885,208,712
|  |  |- class class org.apache.lucene.util.CloseableThreadLocal @ 0x2f9788918 | 8 | 8
|  |  |- t java.lang.ThreadLocal @ 0x300c3eb68 | 16 | 16
|  |  |  '- class class java.lang.ThreadLocal @ 0x2f80f0868 System Class | 8 | 24
*|  |  |- hardRefs java.util.WeakHashMap @ 0x300c3eb78 | 48 | 3,885,208,656*
|  |  |  |- class class java.util.WeakHashMap @ 0x2f8476c00 System Class | 16 | 16
|  |  |  |- table java.util.WeakHashMap$Entry[16] @ 0x300c3eba8 | 80 | 2,200,016,960
|  |  |  |  |- class class java.util.WeakHashMap$Entry[] @ 0x2f84789e8 | 0 | 0
*|  |  |  |  |- [7] java.util.WeakHashMap$Entry @ 0x306a24950 | 40 | 318,502,920*
|  |  |  |  |  |- class class java.util.WeakHashMap$Entry @ 0x2f84786f8 System Class | 0 | 0
|  |  |  |  |  |- queue java.lang.ref.ReferenceQueue @ 0x300c3ebf8 | 32 | 48
|  |  |  |  |  |- referent java.lang.Thread @ 0x30678c2c0  web-23 | 112 | 160
|  |  |  |  |  |- value java.util.HashMap @ 0x30678cbb0 | 48 | 318,502,880
|  |  |  |  |  |  |- class class java.util.HashMap @ 0x2f80b9428 System Class | 24 | 24
*|  |  |  |  |  |  |- table java.util.HashMap$Entry[32768] @ 0x3c07c6f58 | 131,088 | 318,502,832*
|  |  |  |  |  |  |  |- class class java.util.HashMap$Entry[] @ 0x2f80bd9c8 | 0 | 0
|  |  |  |  |  |  |  |- [10457] java.util.HashMap$Entry @ 0x30678cbe0 | 32 | 40,864
|  |  |  |  |  |  |  |  |- class class java.util.HashMap$Entry @ 0x2f80bd400 System Class | 0 | 0
|  |  |  |  |  |  |  |  |- key java.lang.String @ 0x30678cc00  prod_desc_keywd_en_CA | 32 | 96
|  |  |  |  |  |  |  |  |- value org.apache.solr.analysis.TokenizerChain$SolrTokenStreamComponents @ 0x30678cc60 | 24 | 20,344
|  |  |  |  |  |  |  |  |- next java.util.HashMap$Entry @ 0x39a2c9100 | 32 | 20,392
|  |  |  |  |  |  |  |  |  |- class class java.util.HashMap$Entry @ 0x2f80bd400 System Class | 0 | 0
|  |  |  |  |  |  |  |  |  |- key java.lang.String @ 0x39a2c9120  3637994_fr_CA_cat_name_keywd | 32 | 104
|  |  |  |  |  |  |  |  |  |- value org.apache.solr.analysis.TokenizerChain$SolrTokenStreamComponents @ 0x39a2c9188 | 24 | 20,256
|  |  |  |  |  |  |  |  |  |  |- class class org.apache.solr.analysis.TokenizerChain$SolrTokenStreamComponents @ 0x2f97a69a0 | 0 | 0
|  |  |  |  |  |  |  |  |  |  ...

Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

2012-07-31 Thread Erick Erickson
I wasn't in on tracking down the original issue, but I know
at least one client ran into a problem with weak hash
references that was a bug in the JVM here:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034

Here's the summary:

Parallel CMS (i.e. with more than one CMS marking thread) does not
enqueue all the dead Reference objects in the old gen.

If you have very large numbers of dead Reference objects (in this case
Finalizable objects) then running CMS with 2 marking threads appears
to cause only a small fraction of Reference objects to be identified.

The unmarked objects build up in the old gen and eventually a stop-the-world
Full GC is triggered, which enqueues the dead Reference objects. Running with
-XX:ConcGCThreads=1 also fixes the problem.
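
In practice that means something like the following in the JVM options
for your Tomcat instance (shown only as an example, and only relevant if
you are actually running CMS):

  JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC -XX:ConcGCThreads=1"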

Best
Erick

On Tue, Jul 31, 2012 at 2:34 PM, roz dev rozde...@gmail.com wrote:
 Hi All

 I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing that
 when we are indexing lots of data with 16 concurrent threads, the heap grows
 continuously. It remains high and ultimately most of the stuff ends up
 being moved to Old Gen. Eventually, Old Gen also fills up and we start
 getting into an excessive GC problem.

 I took a heap dump and found that most of the memory is consumed by
 CloseableThreadLocal, which is holding a WeakHashMap of Threads and their
 state.

 Most of the old gen is filled with ThreadLocal entries eating up 3 GB of
 heap, and the heap dump shows that all such entries are using the Snowball
 Filter. I looked into LUCENE-3841 and verified that my version of Solr 4
 has that code.

 So, I am wondering about the reason for this memory leak - is it due to
 some other bug in Solr/Lucene?

 Here is a brief snapshot of the heap dump showing the problem:

 [heap dump snipped - see the original message above]