RE: [jira] Updated: (LUCENE-529) TermInfosReader and other + instance ThreadLocal = transient/odd memory leaks = OutOfMemoryException

2006-03-24 Thread Andy Hind

Entries are only reclaimed after the holding object/class has been
garbage collected and you access thread locals on a given thread. Lets
say you give your app server 1G to play with. Then you create thread
locals like there is no tomorrow. GC has a bit of a sleep and before you
know it you do have a very large table to hold thread locals. It is the
interaction of garbage collection and thread local access that is used
to clear entries and determine a stable size.

Remember, there are lots of other things going on and it may not just be
lucene using thread locals. If something else pushes up the thread local
table size there is still going to be an issue.

I have had a quick play using a simple lucene example and got a thread
local table of 131072 by allowing the test 512m. This thread local table
size increases as you have more free memory and GC is less frequent. It
also tends to increase with the number of search iterations. I presume
you are more likely to hit a bad case. I will add the example to the bug
report. 

Ok, there are ~7000 actual entries in this table at the end - pretty
sparse.
When I access it is going to look at the next 12 entries. Which is 64%
hit rate. So maybe this is a stable size. I have used it enough. Maybe
it is a freak of use and gc.

In the actual case the thread local tables seem to grow steadily over
time.
This is going to be down to the number of threads, typical amount of
free memory, rate of thread local creation, thread local table size, gc
and the rate at which stale entries are cleared out, ... It is not going
to be an issue for everyone.

So with 100 threads I can see how we could end up holding on to maybe
700,000 objects. These all have a clone of an input stream each with at
least a 1k byte buffer from a quick look at a couple? Has someone got a
better size?

I will see if I can turn my example into a test case.

Regards

Andy


-Original Message-
From: Robert Engels [mailto:[EMAIL PROTECTED] 
Sent: 23 March 2006 18:50
To: java-dev@lucene.apache.org
Subject: RE: [jira] Updated: (LUCENE-529) TermInfosReader and other +
instance ThreadLocal = transient/odd memory leaks =
OutOfMemoryException

The testcase is still not correct - at least with regards to Lucene.

Review the ThreadLocal and ThreadLocalMap code again, you will see that
references to the ThreadLocal are kept using weak references, in a slot
in
an array, and entries are reclaimed() (i.e. the slot cleared)
PERIODICALLY
as new entries are ADDED or RETRIEVED. The array will never decrease in
size, but it is unlikely to grow to be very large.

See TheadLocalMap.cleanSomeSlots() and where/when it is called.

 The reason that your test fails is that each entry in the table
maintains a HARD reference to the value, and since the entries are only
reclaimed PERIODICALLY (for performance reasons - see cleanSomeSlots()),
you
do have the possibility to run out of memory (since the ThreadLocal
values
you are storing are EXTREMELY LARGE).

In Lucene, the values used in the ThreadLocal are 1/10 the size, and
thus should cause no problems.

-Original Message-
From: Andy Hind [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 23, 2006 12:17 PM
To: java-dev@lucene.apache.org; [EMAIL PROTECTED]
Subject: RE: [jira] Updated: (LUCENE-529) TermInfosReader and other +
instance ThreadLocal = transient/odd memory leaks =
OutOfMemoryException



Well, unfortunately, it is your test case that is not equivalent.

OK you make 10 times as many objects that are 1/10 the size.
But the thread local map is the same size and likely to end up holding
the same number of stale entries, so yes the memory footprint is 10
times smaller and it works.

If you increase the size of the thread local table by x10 it will fail
again. This will not be an exact memory size match due to the underlying
algorithm for cleaning stuff out looking at log2(current.no.of thread
locals) + ?*log2(table.size) and being dependent on GC, so I have no
idea what fraction of stale entries will remain for any size.

I suggest you look at the source code for ThreadLocal and how stale
entries are removed look for GC and you will see where the value
is set to null to help the GC.

I should have proposed using SoftReferences not WeakReferences.
Apologies. Soft References also work fine.

The finalize method is indeed pointless but makes no difference to the
result. The point I was making is that it is pointless in lucene.

Yes, creating threads is expensive - my point - somewhat tongue-in-cheek
is at least they would be clean :-)

You can add an assert to make sure the weak/soft reference exists after
the Object is created. It does. You *may* be unlucky and have GC take
place in between and it go for weak/soft references.

I used 1.5.0_04 and checked the source code for ThreadLocal is 1.33
which is the same version as in 1.5.0_06 and the issue was initially
found using 1.5.0_06.

The issue is for real.
You can blame ThreadLocal but it does what it says

RE: [jira] Updated: (LUCENE-529) TermInfosReader and other + instance ThreadLocal = transient/odd memory leaks = OutOfMemoryException

2006-03-24 Thread Robert Engels
Seems like something else is wrong in your environment, since you will only
get 2 ThreadLocals per segment - having 7000 entries in the ThreadLocal of a
thread seems like a lot. Even so, with the current finalize() method, the
buffer used by the ThreadLocal is reclaimed, since the ThreadLocal value is
set to null.

I would also be very suspect as too how you are determining this. If you are
using a profiler or debugger, the extra CPU processing required by the
environment along with the serious CPU nature of lucene will often cause the
garbage collector to not run as frequently. I would try reducing the memory
(maybe 128m) to cause the JVM to run out of memory and force the GC to run,
and see how many stale entries retained. If the GC runs more often, then
the call to cleanSomeSlots() will find more slots available to be reclaimed.

100 threads processing Lucene queries seems like WAY TOO many as well. Given
the CPU bound nature of MOST of Lucene, 100 threads is going to waste a lot
of cpu performing context switching. This is an IDEAL case to use a
ThreadPool of Lucene processing threads - I would no more than a few per
logical processor sounds reasonable.

Anyway, I suggest the JIRA issue be closed, there is no bug in Lucene, and
the common (and best) use of a ThreadPool in a high-use server environment
makes this a non-issue.


-Original Message-
From: Andy Hind [mailto:[EMAIL PROTECTED]
Sent: Friday, March 24, 2006 6:03 AM
To: java-dev@lucene.apache.org; [EMAIL PROTECTED]
Subject: RE: [jira] Updated: (LUCENE-529) TermInfosReader and other +
instance ThreadLocal = transient/odd memory leaks =
OutOfMemoryException



Entries are only reclaimed after the holding object/class has been
garbage collected and you access thread locals on a given thread. Lets
say you give your app server 1G to play with. Then you create thread
locals like there is no tomorrow. GC has a bit of a sleep and before you
know it you do have a very large table to hold thread locals. It is the
interaction of garbage collection and thread local access that is used
to clear entries and determine a stable size.

Remember, there are lots of other things going on and it may not just be
lucene using thread locals. If something else pushes up the thread local
table size there is still going to be an issue.

I have had a quick play using a simple lucene example and got a thread
local table of 131072 by allowing the test 512m. This thread local table
size increases as you have more free memory and GC is less frequent. It
also tends to increase with the number of search iterations. I presume
you are more likely to hit a bad case. I will add the example to the bug
report.

Ok, there are ~7000 actual entries in this table at the end - pretty
sparse.
When I access it is going to look at the next 12 entries. Which is 64%
hit rate. So maybe this is a stable size. I have used it enough. Maybe
it is a freak of use and gc.

In the actual case the thread local tables seem to grow steadily over
time.
This is going to be down to the number of threads, typical amount of
free memory, rate of thread local creation, thread local table size, gc
and the rate at which stale entries are cleared out, ... It is not going
to be an issue for everyone.

So with 100 threads I can see how we could end up holding on to maybe
700,000 objects. These all have a clone of an input stream each with at
least a 1k byte buffer from a quick look at a couple? Has someone got a
better size?

I will see if I can turn my example into a test case.

Regards

Andy


-Original Message-
From: Robert Engels [mailto:[EMAIL PROTECTED]
Sent: 23 March 2006 18:50
To: java-dev@lucene.apache.org
Subject: RE: [jira] Updated: (LUCENE-529) TermInfosReader and other +
instance ThreadLocal = transient/odd memory leaks =
OutOfMemoryException

The testcase is still not correct - at least with regards to Lucene.

Review the ThreadLocal and ThreadLocalMap code again, you will see that
references to the ThreadLocal are kept using weak references, in a slot
in
an array, and entries are reclaimed() (i.e. the slot cleared)
PERIODICALLY
as new entries are ADDED or RETRIEVED. The array will never decrease in
size, but it is unlikely to grow to be very large.

See TheadLocalMap.cleanSomeSlots() and where/when it is called.

 The reason that your test fails is that each entry in the table
maintains a HARD reference to the value, and since the entries are only
reclaimed PERIODICALLY (for performance reasons - see cleanSomeSlots()),
you
do have the possibility to run out of memory (since the ThreadLocal
values
you are storing are EXTREMELY LARGE).

In Lucene, the values used in the ThreadLocal are 1/10 the size, and
thus should cause no problems.

-Original Message-
From: Andy Hind [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 23, 2006 12:17 PM
To: java-dev@lucene.apache.org; [EMAIL PROTECTED]
Subject: RE: [jira] Updated: (LUCENE-529) TermInfosReader and other +
instance ThreadLocal

[jira] Updated: (LUCENE-529) TermInfosReader and other + instance ThreadLocal = transient/odd memory leaks = OutOfMemoryException

2006-03-23 Thread Andy Hind (JIRA)
 [ http://issues.apache.org/jira/browse/LUCENE-529?page=all ]

Andy Hind updated LUCENE-529:
-

Attachment: ThreadLocalTest.java

Attached is a test which you can use to see how ThreadLocals are left around.
Getting an out of memory exception depends on a number thingsit is set up 
to fail for 64M

Now I understand what is going on, there are a few alternatives:

1) set null on close
- fine for single thread use
- probably leaves (n-1)*segments*2things hanging around for n threaded use

2) Use a weak reference and leave it up to GC to get rid of the referent when 
it is not being used

3) Manage the things youself by object id and thread id - and clean up on 
object close() 

I would go with option 1) and 2) although it may mean things get GCed before a 
call to close() when not used.

The fix I initially suggested is in production, and has been stress tested with 
a couple of hundred users continually pounding the app,
 but not for multithreaded use of IndexReaders. Each time does a couple of 
simple searches with no clever reuse of index readers (which is on the todo 
list)

I do not see how setting the thread local to null on close() has any negative 
impact. You are not going to use the cached information again??

Before the fix: 10-100 threads - 1G JVM - OOM in a few hours 
After: 10-100 threads 256M JVM -  days with a flat memory footprint

I am not sure why the thread local table is so big for us, but that is not 
really the issue.
It could just be building lots of IndexReaders (with thread locals hanging - 
probably making 10/instance ) and gc not kicking in so this table grows and can 
hold a lot of stale entries.  I may get time to investigate further

 TermInfosReader and other + instance ThreadLocal = transient/odd memory 
 leaks =  OutOfMemoryException
 ---

  Key: LUCENE-529
  URL: http://issues.apache.org/jira/browse/LUCENE-529
  Project: Lucene - Java
 Type: Bug
   Components: Index
 Versions: 1.9
  Environment: Lucene 1.4.3 with 1.5.0_04 JVM or newer..will aplpy to 1.9 
 code 
 Reporter: Andy Hind
  Attachments: ThreadLocalTest.java

 TermInfosReader uses an instance level ThreadLocal for enumerators.
 This is a transient/odd memory leak in lucene 1.4.3-1.9 and applies to 
 current JVMs, 
 not just an old JVM issue as described in the finalizer of the 1.9 code.
 There is also an instance level thread local in SegmentReaderwhich will 
 have the same issue.
 There may be other uses which also need to be fixed.
 I don't understand the intended use for these variables.however
 Each ThreadLocal has its own hashcode used for look up, see the ThreadLocal 
 source code. Each instance of TermInfosReader will be creating an instance of 
 the thread local. All this does is create an instance variable on each thread 
 when it accesses the thread local. Setting it to null in the finaliser will 
 set it to null on one thread, the finalizer thread, where it has never been 
 created.  There is no point to this :-(
 I assume there is a good concurrency reason why an instance variable can not 
 be used...
 I have not used multi-threaded searching, but I have used a lot of threads 
 each making searchers and searching.
 1.4.3 has a clear memory leak caused by this thread local. This use case 
 above is definitely solved by setting the thread local to null in the 
 close(). This at least has a chance of being on the correct thread :-) 
 I know reusing Searchers would help but that is my choice and I will get to 
 that later  
 Now you wnat to know why
 Thread locals are stored in a table of entries. Each entry is *weak 
 reference* to the key (Here the TermInfosReader instance)  and a *simple 
 reference* to the thread local value. When the instance is GCed its key 
 becomes null. 
 This is now a stale entry in the table.
 Stale entries are cleared up in an ad hoc way and until they are cleared up 
 the value will not be garbage collected.
 Until the instance is GCed it is a valid key and its presence may cause the 
 table to expand.
 See the ThreadLocal code.
 So if you have lots of threads, all creating thread locals rapidly, you can 
 get each thread holding a large table of thread locals which all contain many 
 stale entries and preventing some objects from being garbage collected. 
 The limited GC of the thread local table is not enough to save you from 
 running out of memory.  
 Summary:
 
 - remove finalizer()
 - set the thread local to null in close() 
   - values will be available for gc 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira



RE: [jira] Updated: (LUCENE-529) TermInfosReader and other + instance ThreadLocal = transient/odd memory leaks = OutOfMemoryException

2006-03-23 Thread Robert Engels
Your testcase is invalid.

Reduce the size by 10, increase the repeat by 10, (SAME amount of memory use), 
and it works fine.

The reason it works in the one case is that you use new WeakReference(new 
Arrary()), - since the array cannot be referenced, it is immediately GC'd. You 
should have noticed since there were no finalization messages printed. You can 
demonstrate this clearly by adding an else to the if in the finalize() to print 
out that the object was indeed finalized.

ThreadLocal's work and are GC'd correctly.

There is something else wrong in your system.

I ran the test using 1.5.0._06.


-Original Message-
From: Andy Hind (JIRA) [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 23, 2006 7:07 AM
To: java-dev@lucene.apache.org
Subject: [jira] Updated: (LUCENE-529) TermInfosReader and other +
instance ThreadLocal = transient/odd memory leaks =
OutOfMemoryException


 [ http://issues.apache.org/jira/browse/LUCENE-529?page=all ]

Andy Hind updated LUCENE-529:
-

Attachment: ThreadLocalTest.java

Attached is a test which you can use to see how ThreadLocals are left around.
Getting an out of memory exception depends on a number thingsit is set up 
to fail for 64M

Now I understand what is going on, there are a few alternatives:

1) set null on close
- fine for single thread use
- probably leaves (n-1)*segments*2things hanging around for n threaded use

2) Use a weak reference and leave it up to GC to get rid of the referent when 
it is not being used

3) Manage the things youself by object id and thread id - and clean up on 
object close() 

I would go with option 1) and 2) although it may mean things get GCed before a 
call to close() when not used.

The fix I initially suggested is in production, and has been stress tested with 
a couple of hundred users continually pounding the app,
 but not for multithreaded use of IndexReaders. Each time does a couple of 
simple searches with no clever reuse of index readers (which is on the todo 
list)

I do not see how setting the thread local to null on close() has any negative 
impact. You are not going to use the cached information again??

Before the fix: 10-100 threads - 1G JVM - OOM in a few hours 
After: 10-100 threads 256M JVM -  days with a flat memory footprint

I am not sure why the thread local table is so big for us, but that is not 
really the issue.
It could just be building lots of IndexReaders (with thread locals hanging - 
probably making 10/instance ) and gc not kicking in so this table grows and can 
hold a lot of stale entries.  I may get time to investigate further

 TermInfosReader and other + instance ThreadLocal = transient/odd memory 
 leaks =  OutOfMemoryException
 ---

  Key: LUCENE-529
  URL: http://issues.apache.org/jira/browse/LUCENE-529
  Project: Lucene - Java
 Type: Bug
   Components: Index
 Versions: 1.9
  Environment: Lucene 1.4.3 with 1.5.0_04 JVM or newer..will aplpy to 1.9 
 code 
 Reporter: Andy Hind
  Attachments: ThreadLocalTest.java

 TermInfosReader uses an instance level ThreadLocal for enumerators.
 This is a transient/odd memory leak in lucene 1.4.3-1.9 and applies to 
 current JVMs, 
 not just an old JVM issue as described in the finalizer of the 1.9 code.
 There is also an instance level thread local in SegmentReaderwhich will 
 have the same issue.
 There may be other uses which also need to be fixed.
 I don't understand the intended use for these variables.however
 Each ThreadLocal has its own hashcode used for look up, see the ThreadLocal 
 source code. Each instance of TermInfosReader will be creating an instance of 
 the thread local. All this does is create an instance variable on each thread 
 when it accesses the thread local. Setting it to null in the finaliser will 
 set it to null on one thread, the finalizer thread, where it has never been 
 created.  There is no point to this :-(
 I assume there is a good concurrency reason why an instance variable can not 
 be used...
 I have not used multi-threaded searching, but I have used a lot of threads 
 each making searchers and searching.
 1.4.3 has a clear memory leak caused by this thread local. This use case 
 above is definitely solved by setting the thread local to null in the 
 close(). This at least has a chance of being on the correct thread :-) 
 I know reusing Searchers would help but that is my choice and I will get to 
 that later  
 Now you wnat to know why
 Thread locals are stored in a table of entries. Each entry is *weak 
 reference* to the key (Here the TermInfosReader instance)  and a *simple 
 reference* to the thread local value. When the instance is GCed its key 
 becomes null. 
 This is now a stale entry in the table.
 Stale entries are cleared up in an ad hoc way and until they are cleared up 
 the value will not be garbage

RE: [jira] Updated: (LUCENE-529) TermInfosReader and other + instance ThreadLocal = transient/odd memory leaks = OutOfMemoryException

2006-03-23 Thread Robert Engels
The only other thing that may be causing your problem is the use of finalize(). 
This can interfere with the GC ability to GC objects.

I am not sure why the finalize() is used in the Lucene ThreadLocal handling. It 
doesn't seem necessary to me.

-Original Message-
From: Robert Engels [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 23, 2006 10:00 AM
To: java-dev@lucene.apache.org
Subject: RE: [jira] Updated: (LUCENE-529) TermInfosReader and other +
instance ThreadLocal = transient/odd memory leaks =
OutOfMemoryException
Importance: High


Your testcase is invalid.

Reduce the size by 10, increase the repeat by 10, (SAME amount of memory use), 
and it works fine.

The reason it works in the one case is that you use new WeakReference(new 
Arrary()), - since the array cannot be referenced, it is immediately GC'd. You 
should have noticed since there were no finalization messages printed. You can 
demonstrate this clearly by adding an else to the if in the finalize() to print 
out that the object was indeed finalized.

ThreadLocal's work and are GC'd correctly.

There is something else wrong in your system.

I ran the test using 1.5.0._06.


-Original Message-
From: Andy Hind (JIRA) [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 23, 2006 7:07 AM
To: java-dev@lucene.apache.org
Subject: [jira] Updated: (LUCENE-529) TermInfosReader and other +
instance ThreadLocal = transient/odd memory leaks =
OutOfMemoryException


 [ http://issues.apache.org/jira/browse/LUCENE-529?page=all ]

Andy Hind updated LUCENE-529:
-

Attachment: ThreadLocalTest.java

Attached is a test which you can use to see how ThreadLocals are left around.
Getting an out of memory exception depends on a number thingsit is set up 
to fail for 64M

Now I understand what is going on, there are a few alternatives:

1) set null on close
- fine for single thread use
- probably leaves (n-1)*segments*2things hanging around for n threaded use

2) Use a weak reference and leave it up to GC to get rid of the referent when 
it is not being used

3) Manage the things youself by object id and thread id - and clean up on 
object close() 

I would go with option 1) and 2) although it may mean things get GCed before a 
call to close() when not used.

The fix I initially suggested is in production, and has been stress tested with 
a couple of hundred users continually pounding the app,
 but not for multithreaded use of IndexReaders. Each time does a couple of 
simple searches with no clever reuse of index readers (which is on the todo 
list)

I do not see how setting the thread local to null on close() has any negative 
impact. You are not going to use the cached information again??

Before the fix: 10-100 threads - 1G JVM - OOM in a few hours 
After: 10-100 threads 256M JVM -  days with a flat memory footprint

I am not sure why the thread local table is so big for us, but that is not 
really the issue.
It could just be building lots of IndexReaders (with thread locals hanging - 
probably making 10/instance ) and gc not kicking in so this table grows and can 
hold a lot of stale entries.  I may get time to investigate further

 TermInfosReader and other + instance ThreadLocal = transient/odd memory 
 leaks =  OutOfMemoryException
 ---

  Key: LUCENE-529
  URL: http://issues.apache.org/jira/browse/LUCENE-529
  Project: Lucene - Java
 Type: Bug
   Components: Index
 Versions: 1.9
  Environment: Lucene 1.4.3 with 1.5.0_04 JVM or newer..will aplpy to 1.9 
 code 
 Reporter: Andy Hind
  Attachments: ThreadLocalTest.java

 TermInfosReader uses an instance level ThreadLocal for enumerators.
 This is a transient/odd memory leak in lucene 1.4.3-1.9 and applies to 
 current JVMs, 
 not just an old JVM issue as described in the finalizer of the 1.9 code.
 There is also an instance level thread local in SegmentReaderwhich will 
 have the same issue.
 There may be other uses which also need to be fixed.
 I don't understand the intended use for these variables.however
 Each ThreadLocal has its own hashcode used for look up, see the ThreadLocal 
 source code. Each instance of TermInfosReader will be creating an instance of 
 the thread local. All this does is create an instance variable on each thread 
 when it accesses the thread local. Setting it to null in the finaliser will 
 set it to null on one thread, the finalizer thread, where it has never been 
 created.  There is no point to this :-(
 I assume there is a good concurrency reason why an instance variable can not 
 be used...
 I have not used multi-threaded searching, but I have used a lot of threads 
 each making searchers and searching.
 1.4.3 has a clear memory leak caused by this thread local. This use case 
 above is definitely solved by setting the thread local to null in the 
 close(). This at least has

RE: [jira] Updated: (LUCENE-529) TermInfosReader and other + instance ThreadLocal = transient/odd memory leaks = OutOfMemoryException

2006-03-23 Thread Andy Hind

Well, unfortunately, it is your test case that is not equivalent.

OK you make 10 times as many objects that are 1/10 the size.
But the thread local map is the same size and likely to end up holding
the same number of stale entries, so yes the memory footprint is 10
times smaller and it works.

If you increase the size of the thread local table by x10 it will fail
again. This will not be an exact memory size match due to the underlying
algorithm for cleaning stuff out looking at log2(current.no.of thread
locals) + ?*log2(table.size) and being dependent on GC, so I have no
idea what fraction of stale entries will remain for any size.

I suggest you look at the source code for ThreadLocal and how stale
entries are removed look for GC and you will see where the value
is set to null to help the GC. 

I should have proposed using SoftReferences not WeakReferences.
Apologies. Soft References also work fine.

The finalize method is indeed pointless but makes no difference to the
result. The point I was making is that it is pointless in lucene. 

Yes, creating threads is expensive - my point - somewhat tongue-in-cheek
is at least they would be clean :-) 

You can add an assert to make sure the weak/soft reference exists after
the Object is created. It does. You *may* be unlucky and have GC take
place in between and it go for weak/soft references.

I used 1.5.0_04 and checked the source code for ThreadLocal is 1.33
which is the same version as in 1.5.0_06 and the issue was initially
found using 1.5.0_06.

The issue is for real.
You can blame ThreadLocal but it does what it says on the tin.

Regards

Andy


-Original Message-
From: Robert Engels [mailto:[EMAIL PROTECTED] 
Sent: 23 March 2006 16:05
To: java-dev@lucene.apache.org; [EMAIL PROTECTED]
Subject: RE: [jira] Updated: (LUCENE-529) TermInfosReader and other +
instance ThreadLocal = transient/odd memory leaks =
OutOfMemoryException

The only other thing that may be causing your problem is the use of
finalize(). This can interfere with the GC ability to GC objects.

I am now sure why the finalize() is used in the Lucene ThreadLocal
handling. It doesn't seem necessary to me.

-Original Message-
From: Robert Engels [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 23, 2006 10:00 AM
To: java-dev@lucene.apache.org
Subject: RE: [jira] Updated: (LUCENE-529) TermInfosReader and other +
instance ThreadLocal = transient/odd memory leaks =
OutOfMemoryException
Importance: High


Your testcase is invalid.

Reduce the size by 10, increase the repeat by 10, (SAME amount of memory
use), and it works fine.

The reason it works in the one case is that you use new
WeakReference(new Arrary()), - since the array cannot be referenced, it
is immediately GC'd. You should have noticed since there were no
finalization messages printed. You can demonstrate this clearly by
adding an else to the if in the finalize() to print out that the object
was indeed finalized.

ThreadLocal's work and are GC'd correctly.

There is something else wrong in your system.

I ran the test using 1.5.0._06.


-Original Message-
From: Andy Hind (JIRA) [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 23, 2006 7:07 AM
To: java-dev@lucene.apache.org
Subject: [jira] Updated: (LUCENE-529) TermInfosReader and other +
instance ThreadLocal = transient/odd memory leaks =
OutOfMemoryException


 [ http://issues.apache.org/jira/browse/LUCENE-529?page=all ]

Andy Hind updated LUCENE-529:
-

Attachment: ThreadLocalTest.java

Attached is a test which you can use to see how ThreadLocals are left
around.
Getting an out of memory exception depends on a number thingsit is
set up to fail for 64M

Now I understand what is going on, there are a few alternatives:

1) set null on close
- fine for single thread use
- probably leaves (n-1)*segments*2things hanging around for n
threaded use

2) Use a weak reference and leave it up to GC to get rid of the referent
when it is not being used

3) Manage the things youself by object id and thread id - and clean up
on object close() 

I would go with option 1) and 2) although it may mean things get GCed
before a call to close() when not used.

The fix I initially suggested is in production, and has been stress
tested with a couple of hundred users continually pounding the app,
 but not for multithreaded use of IndexReaders. Each time does a couple
of simple searches with no clever reuse of index readers (which is on
the todo list)

I do not see how setting the thread local to null on close() has any
negative impact. You are not going to use the cached information again??

Before the fix: 10-100 threads - 1G JVM - OOM in a few hours 
After: 10-100 threads 256M JVM -  days with a flat memory footprint

I am not sure why the thread local table is so big for us, but that is
not really the issue.
It could just be building lots of IndexReaders (with thread locals
hanging - probably making 10/instance ) and gc not kicking in so

RE: [jira] Updated: (LUCENE-529) TermInfosReader and other + instance ThreadLocal = transient/odd memory leaks = OutOfMemoryException

2006-03-23 Thread Robert Engels
The testcase is still not correct - at least with regards to Lucene.

Review the ThreadLocal and ThreadLocalMap code again, you will see that
references to the ThreadLocal are kept using weak references, in a slot in
an array, and entries are reclaimed() (i.e. the slot cleared) PERIODICALLY
as new entries are ADDED or RETRIEVED. The array will never decrease in
size, but it is unlikely to grow to be very large.

See TheadLocalMap.cleanSomeSlots() and where/when it is called.

 The reason that your test fails is that each entry in the table
maintains a HARD reference to the value, and since the entries are only
reclaimed PERIODICALLY (for performance reasons - see cleanSomeSlots()), you
do have the possibility to run out of memory (since the ThreadLocal values
you are storing are EXTREMELY LARGE).

In Lucene, the values used in the ThreadLocal are 1/10 the size, and
thus should cause no problems.

-Original Message-
From: Andy Hind [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 23, 2006 12:17 PM
To: java-dev@lucene.apache.org; [EMAIL PROTECTED]
Subject: RE: [jira] Updated: (LUCENE-529) TermInfosReader and other +
instance ThreadLocal = transient/odd memory leaks =
OutOfMemoryException



Well, unfortunately, it is your test case that is not equivalent.

OK you make 10 times as many objects that are 1/10 the size.
But the thread local map is the same size and likely to end up holding
the same number of stale entries, so yes the memory footprint is 10
times smaller and it works.

If you increase the size of the thread local table by x10 it will fail
again. This will not be an exact memory size match due to the underlying
algorithm for cleaning stuff out looking at log2(current.no.of thread
locals) + ?*log2(table.size) and being dependent on GC, so I have no
idea what fraction of stale entries will remain for any size.

I suggest you look at the source code for ThreadLocal and how stale
entries are removed look for GC and you will see where the value
is set to null to help the GC.

I should have proposed using SoftReferences not WeakReferences.
Apologies. Soft References also work fine.

The finalize method is indeed pointless but makes no difference to the
result. The point I was making is that it is pointless in lucene.

Yes, creating threads is expensive - my point - somewhat tongue-in-cheek
is at least they would be clean :-)

You can add an assert to make sure the weak/soft reference exists after
the Object is created. It does. You *may* be unlucky and have GC take
place in between and it go for weak/soft references.

I used 1.5.0_04 and checked the source code for ThreadLocal is 1.33
which is the same version as in 1.5.0_06 and the issue was initially
found using 1.5.0_06.

The issue is for real.
You can blame ThreadLocal but it does what it says on the tin.

Regards

Andy


-Original Message-
From: Robert Engels [mailto:[EMAIL PROTECTED]
Sent: 23 March 2006 16:05
To: java-dev@lucene.apache.org; [EMAIL PROTECTED]
Subject: RE: [jira] Updated: (LUCENE-529) TermInfosReader and other +
instance ThreadLocal = transient/odd memory leaks =
OutOfMemoryException

The only other thing that may be causing your problem is the use of
finalize(). This can interfere with the GC ability to GC objects.

I am now sure why the finalize() is used in the Lucene ThreadLocal
handling. It doesn't seem necessary to me.

-Original Message-
From: Robert Engels [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 23, 2006 10:00 AM
To: java-dev@lucene.apache.org
Subject: RE: [jira] Updated: (LUCENE-529) TermInfosReader and other +
instance ThreadLocal = transient/odd memory leaks =
OutOfMemoryException
Importance: High


Your testcase is invalid.

Reduce the size by 10, increase the repeat by 10, (SAME amount of memory
use), and it works fine.

The reason it works in the one case is that you use new
WeakReference(new Arrary()), - since the array cannot be referenced, it
is immediately GC'd. You should have noticed since there were no
finalization messages printed. You can demonstrate this clearly by
adding an else to the if in the finalize() to print out that the object
was indeed finalized.

ThreadLocal's work and are GC'd correctly.

There is something else wrong in your system.

I ran the test using 1.5.0._06.


-Original Message-
From: Andy Hind (JIRA) [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 23, 2006 7:07 AM
To: java-dev@lucene.apache.org
Subject: [jira] Updated: (LUCENE-529) TermInfosReader and other +
instance ThreadLocal = transient/odd memory leaks =
OutOfMemoryException


 [ http://issues.apache.org/jira/browse/LUCENE-529?page=all ]

Andy Hind updated LUCENE-529:
-

Attachment: ThreadLocalTest.java

Attached is a test which you can use to see how ThreadLocals are left
around.
Getting an out of memory exception depends on a number thingsit is
set up to fail for 64M

Now I understand what is going on, there are a few alternatives:

1) set