Just confirmed the fix for this problem is ready in patch LUCENE-1383.
Thanks to Robert Engels for arguing with me and understanding the problem
quickly, and for contributing a CloseableThreadLocal class, even though the
problem itself was hard for him to reproduce, and thanks to Michael
McCandless for fixing the
Yeah I think that's the right approach.
I'll code it up.
Mike
robert engels wrote:
I think that would work, but I think you would be better off
encapsulating that in an extended ThreadLocal, e.g. WeakThreadLocal,
and use that everywhere. Add a method clear(), that clears the
ThreadLocals list (which will allow the values to be GC'd).
Technically, you need to sync on the set as well, since you need to
remove the old value, and add the new one to the list. Although Lucene
doesn't use the set, just the initial value set, so the overhead is
minimal.
On Sep 11, 2008, at 9:43 AM, Michael McCandless wrote:
OK so we compact the list (removing dead threads) every time we add a
new entry to the list.
I think that would work, but I think you would be better off
encapsulating that in an extended ThreadLocal, e.g. WeakThreadLocal,
and use that everywhere. Add a method clear(), that clears the
ThreadLocals list (which will allow the values to be GC'd).
On Sep 11, 2008, at 9:43 AM, Michael McCandless wrote:
OK so we compact the list (removing dead threads) every time we add a
new entry to the list. This way for a long lived SegmentReader but
short lived threads, the list keeps only live threads.
We do need sync access to the list, but that's only on binding a new
thread. Retrieving an exis
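A sketch of the scheme being converged on above: values held behind a
WeakReference in the ThreadLocal, a per-thread hard-reference map as the
fallback, purging of dead threads on each set(), and close() dropping all
hard references. This is illustrative only, with made-up names, not the
actual LUCENE-1383 patch; get() is deliberately left unsynchronized, as
discussed below.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

class CloseableThreadLocal<T> {
    // Weak reference so that, once close() drops the hard refs, GC can
    // reclaim values even while threads still hold a ThreadLocal slot.
    private final ThreadLocal<WeakReference<T>> local = new ThreadLocal<>();

    // Fallback hard references, keyed by thread; their only purpose is
    // to keep each value alive between calls until close().
    private Map<Thread, T> hardRefs = new HashMap<>();

    // No sync on get(): the ThreadLocal does the per-thread lookup.
    T get() {
        WeakReference<T> ref = local.get();
        return ref == null ? null : ref.get();
    }

    synchronized void set(T value) {
        local.set(new WeakReference<>(value));
        hardRefs.put(Thread.currentThread(), value);
        // Compact: drop entries for dead threads, so a long-lived reader
        // used by short-lived threads keeps only live threads.
        hardRefs.keySet().removeIf(t -> !t.isAlive());
    }

    // Live hard-reference count; exposed here only for illustration.
    synchronized int size() {
        return hardRefs == null ? 0 : hardRefs.size();
    }

    synchronized void close() {
        // Drop all hard references; GC is now free to reclaim the values
        // out from under each thread's ThreadLocal slot.
        hardRefs = null;
    }
}
```

Note the trade-off: only set() and close() synchronize, so the common
get() path stays lock-free, matching the "sync only on binding a new
thread" point above.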
You still need to sync access to the list, and how would it be
removed from the list prior to close? That is, you need one per
thread, but you can have the reader shared across all threads. So if
threads were created and destroyed without ever closing the reader,
the list would grow unbounded.
I don't need it by thread, because I would still use ThreadLocal to
retrieve the SegmentTermEnum. This avoids any sync during get.
The list is just a "fallback" to hold a hard reference to the
SegmentTermEnum to keep it alive. That's its only purpose. Then,
when SegmentReader is closed
But you need it by thread, so it can't be a list.
You could have a HashMap keyed by thread in FieldsReader, and
when SegmentReader is closed, FieldsReader is closed, which clears
the map, and not use thread locals at all. The difference being you
would need a sync'd map.
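A minimal sketch of this alternative: no ThreadLocal at all, just a
synchronized map keyed by thread inside the reader, cleared wholesale on
close(). The class and method names here are made up for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class PerThreadCache<V> {
    // Every access pays for synchronization - the trade-off mentioned
    // above versus the ThreadLocal approach.
    private final Map<Thread, V> byThread =
            Collections.synchronizedMap(new HashMap<>());

    V get() {
        return byThread.get(Thread.currentThread());
    }

    void set(V value) {
        byThread.put(Thread.currentThread(), value);
    }

    // Called when the reader is closed: releases every thread's value
    // at once, regardless of which thread calls close().
    void close() {
        byThread.clear();
    }
}
```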
On Sep 11, 2008, at 4:56 AM,
What if we wrap the value in a WeakReference, but secondarily hold a
hard reference to it in a "normal" list?
Then, when TermInfosReader is closed we clear that list of all its
hard references, at which point GC will be free to reclaim the object
out from under the ThreadLocal even before
You can't hold the ThreadLocal value in a WeakReference, because
there is no hard reference between enumeration calls (so it would be
cleared out from under you while enumerating).
All of this occurs because you have some objects (readers/segments
etc.) that are shared across all threads, b
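The trap can be shown with plain java.lang.ref: a value that is only
weakly reachable may be cleared at any GC, while a single hard reference
held elsewhere (the "fallback" list idea) pins it. Illustrative code
only, not Lucene's:

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

class WeakRefTrap {
    // Returns true if the weak reference survives a GC while a hard
    // reference to the value still exists elsewhere.
    static boolean survivesWithHardRef() {
        List<byte[]> hardRefs = new ArrayList<>();
        byte[] value = new byte[1024];
        WeakReference<byte[]> weak = new WeakReference<>(value);
        hardRefs.add(value);  // the fallback hard reference
        value = null;         // drop the local strong reference
        System.gc();          // still strongly reachable via hardRefs
        boolean alive = weak.get() != null;  // guaranteed: hard ref exists
        hardRefs.clear();     // now only weakly reachable; GC may clear
                              // it at any time ("out from under you")
        return alive;
    }
}
```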
When I look at the reference tree, that is the feeling I get. If you
held a WeakReference, it would get released.
|- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
|- input of org.apache.lucene.index.SegmentTermEnum
|- value of java.lang.ThreadLocal$
Well, the code is correct, because it can work if you avoid this trap. But it
fails to act as a good API.
I learned the inside details from you. I am not the only one that's trapped.
And more users will likely be trapped again, unless the javadoc describing the
close() function is changed. Actually,
Always your prerogative.
On Sep 10, 2008, at 1:15 PM, Chris Lu wrote:
Actually I am done with it by simply downgrading and not using
r659602 and later.
The old version is cleaner and more consistent with the API, and
close() does mean close, not something complicated and unknown to most
users
Actually I am done with it by simply downgrading and not using r659602 and
later. The old version is cleaner and more consistent with the API, and close()
does mean close, not something complicated and unknown to most users, which
almost feels like a trap. And later on, if no changes happened for this
Why not just use reopen() and be done with it???
On Sep 10, 2008, at 12:48 PM, Chris Lu wrote:
Yeah, the timing is different. But it's an unknown, undetermined,
and uncontrollable time...
We can not ask the user,
while(memory is low){
sleep(1000);
}
do_the_real_thing_an_hour_later
Yeah, the timing is different. But it's an unknown, undetermined, and
uncontrollable time...
We can not ask the user,
while(memory is low){
sleep(1000);
}
do_the_real_thing_an_hour_later
--
Chris Lu
SafeThreadLocal is very interesting. It'll be good not only for Lucene, but
also other projects.
Could you please post it?
--
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.ph
Not likely. Actually I made some changes to Lucene source code and I can see
the changes in the memory snapshot. So it is the latest Lucene version.
--
Chris Lu
Close() does work - it is just that the memory may not be freed until
much later...
When working with VERY LARGE objects, this can be a problem.
On Sep 10, 2008, at 12:36 PM, Chris Lu wrote:
Thanks for the analysis, really appreciate it, and I agree with it.
But...
This is really a normal J2EE use case.
Not holding searcher/reader. I did check that via memory snapshot.
--
Chris Lu
Thanks for the analysis, really appreciate it, and I agree with it. But...
This is really a normal J2EE use case. The threads seldom die.
Doesn't that mean closing the RAMDirectory doesn't work for J2EE
applications?
And only reopen() works?
And close() doesn't release the resources? duh...
I can
The other thing Lucene can do is create a SafeThreadLocal - it is
rather trivial - and have that integrate at a higher level, allowing
for manual clean-up across all threads.
It MIGHT be a bit slower than the JDK version (since that uses
heuristics to clear stale entries), and so doesn't al
My review of trunk shows a SegmentReader contains a TermInfosReader,
which contains a ThreadLocal of ThreadResources, which contains a
SegmentTermEnum.
So there should be a ThreadResources in the memory profiler for each
SegmentTermEnum instance - unless you have something goofy going on.
You do not need to create a new RAMDirectory - just write to the
existing one, and then reopen() the IndexReader using it.
This will prevent lots of big objects being created. This may be the
source of your problem.
Even if the Segment is closed, the ThreadLocal will no longer be
referenced
Good question.
As far as I can tell, nowhere in Lucene do we put a SegmentTermEnum
directly into ThreadLocal, after rev 659602.
Is it possible that output came from a run with Lucene before rev
659602?
Mike
Chris Lu wrote:
Is it possible that some other place is using SegmentTermEnum as a ThreadLocal?
Chris,
After you close your IndexSearcher/Reader, is it possible you're still
holding a reference to it?
Mike
Chris Lu wrote:
Frankly I don't know why TermInfosReader.ThreadResources is not
showing up in the memory snapshot.
Yes. It's been there for a long time. But let's see what's changed: an LRU
cache of termInfoCache is added.
Is it possible that some other place is using SegmentTermEnum as a
ThreadLocal? This may explain why TermInfosReader.ThreadResources is not in
the memory snapshot.
--
Chris Lu
I really want to find out what I am doing wrong, if that's the case.
Yes. I have made certain that I closed all Readers/Searchers, and verified
that through a memory profiler.
Yes. I am creating new RAMDirectories. But that's the problem. I need to
update the content. Sure, if no content update an
Actually, a single RAMDirectory would be sufficient (since it
supports writes). There should never be a reason to create a new
RAMDirectory (unless you have some specialized real-time search
occurring).
If you are creating new RAMDirectories, the statements below hold.
On Sep 10, 2008, at 1
It is basic Java. Threads are not guaranteed to run on any sort of
schedule. If you create lots of large objects in one thread,
releasing them in another, there is a good chance you will get an OOM
(since the releasing thread may not run before the OOM occurs)...
This is not Lucene specific
Frankly I don't know why TermInfosReader.ThreadResources is not showing up
in the memory snapshot.
Yes. It's been there for a long time. But let's see what's changed: an LRU
cache of termInfoCache is added.
I think SegmentTermEnum previously would be released, since it's a relatively
simple object.
But
I do not believe I am making any mistake. Actually I just got an email from
another user, complaining about the same thing. And I am having the same
usage pattern.
After the reader is opened, the RAMDirectory is shared by several objects.
There is one instance of RAMDirectory in the memory, and it
Does this make any difference? If I intentionally close the searcher and
reader and that fails to release the memory, I cannot rely on some magic of
the JVM to release it.
--
Chris Lu
Sorry, but I am fairly certain you are mistaken.
If you only have a single IndexReader, the RAMDirectory will be
shared in all cases.
The only memory growth is any buffer space allocated by an IndexInput
(used in many places and cached).
Normally the IndexInput created by a RAMDirectory d
Why do you need to keep a strong reference?
Why not a WeakReference?
--Noble
On Wed, Sep 10, 2008 at 12:27 AM, Chris Lu <[EMAIL PROTECTED]> wrote:
> The problem should be similar to what's talked about on this discussion.
> http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal
>
> Th
I still don't quite understand what's causing your memory growth.
SegmentTermEnum instances have been held in a ThreadLocal cache in
TermInfosReader for a very long time (at least since Lucene 1.4).
If indeed it's the RAMDir's contents being kept "alive" due to this,
then, you should have a
Actually, even if I only use one IndexReader, some resources are cached via
the ThreadLocal cache, and cannot be released unless all threads do the close
action.
SegmentTermEnum itself is small, but it holds the RAMDirectory along the path,
which is big.
--
Chris Lu
Yes. In the end, the IndexReader holds a large object via ThreadLocal.
On the one hand, I should pool IndexReaders because opening an IndexReader
costs a lot.
On the other hand, I should not pool IndexReaders because some resources are
cached via ThreadLocal, and unless all threads close the IndexReader
As a follow-up, the SegmentTermEnum does contain an IndexInput and
based on your configuration (buffer sizes, eg) this could be a large
object, so you do need to be careful!
On Sep 10, 2008, at 12:14 AM, robert engels wrote:
A searcher uses an IndexReader - the IndexReader is slow to open,
You do not need a pool of IndexReaders...
It does not matter what class it is, what matters is the class that
ultimately holds the reference.
If the IndexReader is never closed, the SegmentReader(s) is never
closed, so the thread local in TermInfosReader is not cleared
(because the thread
I have tried to create an IndexReader pool and dynamically create searchers.
But the memory leak is the same. It's not related to the Searcher class
specifically, but to the SegmentTermEnum in TermInfosReader.
--
Chris Lu
A searcher uses an IndexReader - the IndexReader is slow to open, not
a Searcher. And searchers can share an IndexReader.
You want to create a single shared (across all threads/users)
IndexReader (usually), and create a Searcher as needed and dispose of it.
It is VERY CHEAP to create the Searcher
In a J2EE environment, usually there is a searcher pool with several searchers
open. The speed of opening a large index for every user is not acceptable.
--
Chris Lu
You need to close the searcher within the thread that is using it, in
order to have it cleaned up quickly... usually right after you
display the page of results.
If you are keeping multiple searcher refs across multiple threads for
paging/whatever, you have not coded it correctly.
Imagine
Right, in a sense I cannot release it from another thread. But that's the
problem.
It's a J2EE environment; all threads are kind of equal. It's simply not
possible to iterate through all threads to close the searcher, thus
releasing the ThreadLocal cache.
Unless Lucene is not recommended for J2EE
Your code is not correct. You cannot release it on another thread -
the first thread may create hundreds/thousands of instances before
the other thread ever runs...
On Sep 9, 2008, at 10:10 PM, Chris Lu wrote:
If I release it on the thread that's creating the searcher, by
setting searcher=null, everything is fine, the memory is released very
cleanly.
If I release it on the thread that's creating the searcher, by setting
searcher=null, everything is fine, the memory is released very cleanly.
My load test was to repeatedly create a searcher on a RAMDirectory and
release it on another thread. The test will quickly go to OOM after several
runs. I s
Chris Lu wrote:
The problem should be similar to what's talked about on this
discussion.
http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal
The "rough" conclusion of that thread is that, technically, this isn't
a memory leak but rather a "delayed freeing" problem. Ie, it m
The problem should be similar to what's talked about on this discussion.
http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal
There is a memory leak in Lucene search from LUCENE-1195 (svn r659602,
May 23, 2008).
This patch brings in a ThreadLocal cache to TermInfosReader.
It's usually