Actually, even if I only use one IndexReader, some resources are cached via
the ThreadLocal cache, and cannot be released unless all threads perform the
close action.
SegmentTermEnum itself is small, but it holds a reference to the RAMDirectory
along the path, which is big.
--
Chris Lu
-
Instant S
Yes. In the end, the IndexReader holds a large object via ThreadLocal.
On the one hand, I should pool the IndexReader because opening an IndexReader
costs a lot.
On the other hand, I should not pool the IndexReader because some resources are
cached via ThreadLocal, and unless all threads close the IndexReader
As a follow-up, the SegmentTermEnum does contain an IndexInput and,
based on your configuration (buffer sizes, e.g.), this could be a large
object, so you do need to be careful!
On Sep 10, 2008, at 12:14 AM, robert engels wrote:
A searcher uses an IndexReader - the IndexReader is slow to open,
You do not need a pool of IndexReaders...
It does not matter what class it is, what matters is the class that
ultimately holds the reference.
If the IndexReader is never closed, the SegmentReader(s) is never
closed, so the thread local in TermInfosReader is not cleared
(because the thread
I have tried to create an IndexReader pool and create searchers dynamically.
But the memory leak is the same. It's not related to the Searcher class
specifically, but to the SegmentTermEnum in TermInfosReader.
--
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/App
A searcher uses an IndexReader - the IndexReader is slow to open, not
a Searcher. And searchers can share an IndexReader.
You want to create a single shared (across all threads/users)
IndexReader (usually), and create a Searcher as needed and dispose of it.
It is VERY CHEAP to create the Search
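The pattern described above (one shared reader, a throwaway searcher per request) might look roughly like the sketch below. This is only an illustration using plain JDK classes: FakeReader, FakeSearcher, handleRequest and serve are hypothetical stand-ins invented for this example, not Lucene's IndexReader or Searcher, and the real classes have different constructors and costs.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SharedReaderSketch {

    /** Hypothetical stand-in for an expensive-to-open reader; not the Lucene class. */
    static final class FakeReader {
        static final AtomicInteger OPENED = new AtomicInteger();

        FakeReader() {
            OPENED.incrementAndGet(); // imagine slow index loading here
        }

        int docCount() {
            return 3;
        }
    }

    /** Hypothetical cheap wrapper around a shared reader; not the Lucene class. */
    static final class FakeSearcher {
        private final FakeReader reader;

        FakeSearcher(FakeReader reader) {
            this.reader = reader;
        }

        int search(String query) {
            return reader.docCount(); // placeholder for real query evaluation
        }
    }

    // Opened once and shared by every request thread.
    static final FakeReader SHARED_READER = new FakeReader();

    /** Creates a throwaway searcher for one request, on the requesting thread. */
    static int handleRequest(String query) {
        FakeSearcher searcher = new FakeSearcher(SHARED_READER);
        return searcher.search(query);
    }

    /** Serves the given number of requests on separate threads; returns readers opened. */
    static int serve(int requests) {
        Thread[] threads = new Thread[requests];
        for (int i = 0; i < requests; i++) {
            threads[i] = new Thread(() -> handleRequest("hello"));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return FakeReader.OPENED.get();
    }

    public static void main(String[] args) {
        System.out.println("readers opened: " + serve(8));
    }
}
```

However many requests are served, the reader count stays at one, while each request gets its own short-lived searcher that becomes garbage as soon as handleRequest returns.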
In a J2EE environment, there is usually a searcher pool with several searchers
open. Opening a large index for every user is too slow to be acceptable.
--
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://sear
You need to close the searcher within the thread that is using it, in
order to have it cleaned up quickly... usually right after you
display the page of results.
If you are keeping multiple searcher refs across multiple threads for
paging/whatever, you have not coded it correctly.
Imagine
Right, in a sense I cannot release it from another thread. But that's the
problem.
It's a J2EE environment, all threads are kind of equal. It's simply not
possible to iterate through all threads to close the searcher, thus
releasing the ThreadLocal cache.
Unless Lucene is not recommended for J2EE
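The difficulty described here, that a value cached in a ThreadLocal can only be cleared by the thread that owns it, can be reproduced with the JDK alone. The sketch below is an assumption-laden model, not Lucene code: CACHE is a hypothetical stand-in for the per-thread cache under discussion, and runOn and demo are helper names made up for this example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalRetentionSketch {

    // Hypothetical stand-in for a per-thread cache; not the Lucene field.
    static final ThreadLocal<byte[]> CACHE = new ThreadLocal<>();

    /** Runs a task on the worker thread and waits for its result. */
    static <T> T runOn(ExecutorService worker, Callable<T> task) {
        try {
            return worker.submit(task).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    /**
     * Returns two flags: whether the worker still holds its cached value
     * after another thread called remove(), and whether it still holds it
     * after the worker itself called remove().
     */
    static boolean[] demo() {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            // The worker thread fills its own ThreadLocal slot.
            runOn(worker, () -> { CACHE.set(new byte[1024]); return null; });

            // remove() on the calling thread only clears the caller's slot.
            CACHE.remove();
            boolean heldAfterForeignRemove = runOn(worker, () -> CACHE.get() != null);

            // Only the owning thread can clear its own slot.
            runOn(worker, () -> { CACHE.remove(); return null; });
            boolean heldAfterOwnRemove = runOn(worker, () -> CACHE.get() != null);

            return new boolean[] { heldAfterForeignRemove, heldAfterOwnRemove };
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("held after remove() on another thread: " + result[0]);
        System.out.println("held after remove() on the owning thread: " + result[1]);
    }
}
```

In a container with a long-lived thread pool, every pooled thread that has touched the cache keeps its own copy reachable until that same thread clears it or exits, which matches the behaviour reported in this thread.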
Your code is not correct. You cannot release it on another thread -
the first thread may create hundreds/thousands of instances before
the other thread ever runs...
On Sep 9, 2008, at 10:10 PM, Chris Lu wrote:
If I release it on the thread that's creating the searcher, by
setting searche
If I release it on the thread that's creating the searcher, by setting
searcher=null, everything is fine, the memory is released very cleanly.
My load test was to repeatedly create a searcher on a RAMDirectory and
release it on another thread. The test will quickly go to OOM after several
runs. I s
[
https://issues.apache.org/jira/browse/LUCENE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629688#action_12629688
]
[EMAIL PROTECTED] edited comment on LUCENE-1378 at 9/9/08 8:06 PM:
-
[
https://issues.apache.org/jira/browse/LUCENE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark Miller updated LUCENE-1378:
Attachment: LUCENE-1378.patch
Here is a clean patch off trunk if you want to avoid the perl (have
Chris Lu wrote:
The problem should be similar to what's talked about on this
discussion.
http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal
The "rough" conclusion of that thread is that, technically, this isn't
a memory leak but rather a "delayed freeing" problem. Ie, it m
[
https://issues.apache.org/jira/browse/LUCENE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629631#action_12629631
]
Otis Gospodnetic commented on LUCENE-1378:
--
Eh, rusty perl
$ find . -name \*.jav
The problem should be similar to what's talked about on this discussion.
http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal
There is a memory leak in Lucene search from LUCENE-1195 (svn r659602,
May 23, 2008).
This patch brings in a ThreadLocal cache to TermInfosReader.
It's usually
>>> Even so,
>>> this may not be sufficient for some FS such as HDFS... Is it
>>> reasonable in this case to keep in memory everything including
>>> stored fields and term vectors?
>>
>> We could maybe do something like a proxy IndexInput/IndexOutput that
>> would allow updating the read buffer fro
On Tue, Sep 9, 2008 at 12:45 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
> Yonik Seeley wrote:
>> No, it would essentially be a change in the semantics that all
>> implementations would need to support.
>
> Right, which is you are allowed to open an IndexInput on a file when an
> IndexOutput
On Tue, Sep 9, 2008 at 12:41 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
> Yonik Seeley wrote:
>> OR, if all writes are append-only, perhaps we don't ever need to
>> invalidate the read buffer and would just need to remove the current
>> logic that caches the file length and then let the unde
Yonik Seeley wrote:
On Tue, Sep 9, 2008 at 11:42 AM, Ning Li <[EMAIL PROTECTED]> wrote:
On Tue, Sep 9, 2008 at 10:02 AM, Yonik Seeley <[EMAIL PROTECTED]>
wrote:
Yeah, I think the underlying RandomAccessFile might do the right
thing, but IndexInput isn't required to see any changes on the fly
Yonik Seeley wrote:
On Tue, Sep 9, 2008 at 5:28 AM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
Yonik Seeley wrote:
What about something like term freq? Would it need to count the
number of docs after the local maxDoc or is there a better way?
Good question...
I think we'd have to take
[
https://issues.apache.org/jira/browse/LUCENE-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Grant Ingersoll updated LUCENE-1354:
Attachment: LUCENE-1354.patch
Has CheckIndexStatus. Will commit shortly
> Provide Progra
[
https://issues.apache.org/jira/browse/LUCENE-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629515#action_12629515
]
gsingers edited comment on LUCENE-1354 at 9/9/08 9:15 AM:
-
[
https://issues.apache.org/jira/browse/LUCENE-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629515#action_12629515
]
Grant Ingersoll commented on LUCENE-1354:
-
Mike, I think you forgot to add the Che
[
https://issues.apache.org/jira/browse/LUCENE-1243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Grant Ingersoll resolved LUCENE-1243.
-
Resolution: Fixed
Lucene Fields: [Patch Available] (was: [Patch Available, New])
On Tue, Sep 9, 2008 at 11:42 AM, Ning Li <[EMAIL PROTECTED]> wrote:
> On Tue, Sep 9, 2008 at 10:02 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>> Yeah, I think the underlying RandomAccessFile might do the right
>> thing, but IndexInput isn't required to see any changes on the fly
>> (and current im
On Mon, Sep 8, 2008 at 4:23 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>> I thought an index reader which supports real-time search no longer
>> maintains a static view of an index?
>
> It seems advantageous to just make it really cheap to get a new view
> of the index (if you do it for every sear
On Tue, Sep 9, 2008 at 5:28 AM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
> Yonik Seeley wrote:
>> What about something like term freq? Would it need to count the
>> number of docs after the local maxDoc or is there a better way?
>
> Good question...
>
> I think we'd have to take a full copy o
[
https://issues.apache.org/jira/browse/LUCENE-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark Miller resolved LUCENE-1357.
-
Resolution: Fixed
> SpanScorer does not respect ConstantScoreRangeQuery setting
> --
[
https://issues.apache.org/jira/browse/LUCENE-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629432#action_12629432
]
Michael McCandless commented on LUCENE-914:
---
I really don't have a strong opinion
[
https://issues.apache.org/jira/browse/LUCENE-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-1379:
---
Description: I think we should fix this for 2.4 (now back to 10)?
Fix Version/s
OK we are gradually whittling down the list. It's down to 9 issues now.
I have 2 issues, Grant has 3, Otis has 2 and Mark and Karl have 1 each.
Can each of you try to finish your issues this week, or, take them off
your plate / move to future?
We are almost there!!
I can be the release ma
[
https://issues.apache.org/jira/browse/LUCENE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless reassigned LUCENE-1378:
--
Assignee: Otis Gospodnetic
Otis can you finish & commit this?
> Remove remain
[
https://issues.apache.org/jira/browse/LUCENE-1344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless reassigned LUCENE-1344:
--
Assignee: Michael McCandless
> Make the Lucene jar an OSGi bundle
> --
This would just tap into the live hashtable that DocumentsWriter*
maintain for the posting lists... except the docFreq will need to be
copied away on reopen, I think.
Mike
Jason Rutherglen wrote:
Term dictionary? I'm curious how that would be solved?
On Mon, Sep 8, 2008 at 3:04 PM, Mic
Yonik Seeley wrote:
On Mon, Sep 8, 2008 at 3:04 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
Right, getCurrentIndex would return a MultiReader that includes
SegmentReader for each segment in the index, plus a "RAMReader" that
searches the RAM buffer. That RAMReader is a tiny shell class
36 matches