Re: ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread Chris Lu
Actually, even if I only use one IndexReader, some resources are cached via the ThreadLocal cache and cannot be released unless every thread performs the close. SegmentTermEnum itself is small, but the reference path through it reaches the RAMDirectory, which is big. -- Chris Lu - Instant S

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread Chris Lu
Yes. In the end, the IndexReader holds a large object via ThreadLocal. On the one hand, I should pool IndexReaders because opening an IndexReader costs a lot. On the other hand, I should not pool IndexReaders because some resources are cached via ThreadLocal, and unless all threads close the IndexReader

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread robert engels
As a follow-up, the SegmentTermEnum does contain an IndexInput, and depending on your configuration (e.g., buffer sizes) this could be a large object, so you do need to be careful! On Sep 10, 2008, at 12:14 AM, robert engels wrote: A searcher uses an IndexReader - the IndexReader is slow to open,

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread robert engels
You do not need a pool of IndexReaders... It does not matter what class it is; what matters is the class that ultimately holds the reference. If the IndexReader is never closed, the SegmentReader(s) are never closed, so the thread local in TermInfosReader is not cleared (because the thread
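
The following is a minimal, self-contained Java sketch of the behaviour described here. It uses a hypothetical `CachingReader` that stands in for TermInfosReader and is not Lucene code. The point it shows is that `ThreadLocal.remove()` only clears the entry of the thread that calls it. A `close()` issued on one thread therefore leaves every other thread's cached object strongly reachable.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PerThreadCacheDemo {

    // Hypothetical stand-in for a reader that caches one object per thread
    // (in this thread's case, the SegmentTermEnum held by TermInfosReader).
    static final class CachingReader {
        // Each thread that calls cached() lazily gets its own 1 MB buffer.
        private final ThreadLocal<byte[]> perThread =
                ThreadLocal.withInitial(() -> new byte[1 << 20]);

        byte[] cached() {
            return perThread.get();
        }

        // ThreadLocal.remove() clears ONLY the calling thread's entry.
        void close() {
            perThread.remove();
        }
    }

    /** True if a worker thread still holds its buffer after close() ran on a different thread. */
    static boolean workerEntrySurvivesForeignClose() {
        CachingReader reader = new CachingReader();
        ExecutorService worker = Executors.newSingleThreadExecutor(); // one long-lived "request" thread
        try {
            byte[] before = worker.submit(reader::cached).get();
            reader.close(); // executes on the caller's thread, not on the worker
            byte[] after = worker.submit(reader::cached).get();
            return before == after; // same instance: the worker's entry was never cleared
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(workerEntrySurvivesForeignClose()); // prints true
    }
}
```

In an application server the request threads are pooled and rarely die. Per-thread entries like these can therefore persist for as long as the threads do.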

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread Chris Lu
I have tried creating an IndexReader pool and creating searchers dynamically, but the memory leak is the same. It's not related to the Searcher class specifically, but to the SegmentTermEnum in TermInfosReader. -- Chris Lu - Instant Scalable Full-Text Search On Any Database/App

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread robert engels
A searcher uses an IndexReader - it is the IndexReader that is slow to open, not the Searcher. And searchers can share an IndexReader. You usually want a single IndexReader shared across all threads/users, and to create a Searcher as needed and then dispose of it. It is VERY CHEAP to create the Search
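
Here is a sketch of the sharing pattern described above. It uses hypothetical `ExpensiveReader` and `CheapSearcher` classes rather than Lucene's actual `IndexReader`/`IndexSearcher` API. One reader is opened lazily and shared by all threads. Each request gets a throwaway searcher that is closed on the same thread that used it.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical classes illustrating the pattern; they are NOT Lucene's API.
class SharedReaderDemo {

    /** Expensive to open; a single instance is shared by every thread. */
    static final class ExpensiveReader {
        static final AtomicInteger OPEN_COUNT = new AtomicInteger();

        ExpensiveReader() {
            OPEN_COUNT.incrementAndGet(); // stands in for loading a large index
        }
    }

    /** Cheap per-request wrapper; closing it leaves the shared reader open. */
    static final class CheapSearcher implements AutoCloseable {
        private final ExpensiveReader reader;

        CheapSearcher(ExpensiveReader reader) {
            this.reader = reader;
        }

        String search(String query) {
            return "results for " + query;
        }

        @Override
        public void close() {
            // release per-request state only; the shared reader stays open
        }
    }

    // Initialization-on-demand holder: the reader is opened once, lazily and thread-safely.
    private static final class Holder {
        static final ExpensiveReader READER = new ExpensiveReader();
    }

    static String handleRequest(String query) {
        // Created, used and closed on the same request thread.
        try (CheapSearcher searcher = new CheapSearcher(Holder.READER)) {
            return searcher.search(query);
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            handleRequest("q" + i);
        }
        System.out.println(ExpensiveReader.OPEN_COUNT.get()); // prints 1
    }
}
```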

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread Chris Lu
In a J2EE environment, there is usually a searcher pool with several searchers open. Opening a large index for every user is unacceptably slow. -- Chris Lu - Instant Scalable Full-Text Search On Any Database/Application site: http://www.dbsight.net demo: http://sear

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread robert engels
You need to close the searcher within the thread that is using it, in order to have it cleaned up quickly... usually right after you display the page of results. If you are keeping multiple searcher refs across multiple threads for paging/whatever, you have not coded it correctly. Imagine

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread Chris Lu
Right, in a sense I cannot release it from another thread. But that's the problem. It's a J2EE environment; all threads are more or less equal. It's simply not possible to iterate through all threads to close the searcher and thereby release the ThreadLocal cache. Unless Lucene is not recommended for J2EE
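
One possible way around the "cannot reach every thread" problem is sketched below. It is an assumption, not anything Lucene shipped in this thread, and `ClosablePerThreadCache` is a hypothetical name. Each thread's `ThreadLocal` slot keeps only a `WeakReference`, while the strong references live in a map owned by the cache. That lets `close()` drop all of them from whichever thread calls it. After `close()` the large values become collectable; only the small `WeakReference` wrappers remain in each thread's map.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, NOT a Lucene class: a per-thread cache whose strong
// references can be dropped from any thread by calling close().
class ClosablePerThreadCache<T> {
    // Each thread's ThreadLocal entry holds only a WEAK reference to its value...
    private final ThreadLocal<WeakReference<T>> local = new ThreadLocal<>();
    // ...and the STRONG references live here, in a map the owner can clear.
    // (Strong Thread keys keep the sketch simple; entries for dead threads
    // are released only by close().)
    private final Map<Thread, T> hardRefs = new HashMap<>();

    /** This thread's value, or null if never set or already collected after close(). */
    T get() {
        WeakReference<T> ref = local.get();
        return ref == null ? null : ref.get();
    }

    void set(T value) {
        local.set(new WeakReference<>(value));
        synchronized (hardRefs) {
            hardRefs.put(Thread.currentThread(), value);
        }
    }

    int hardRefCount() {
        synchronized (hardRefs) {
            return hardRefs.size();
        }
    }

    /** Safe to call from ANY thread: afterwards the cached values are only weakly reachable. */
    void close() {
        synchronized (hardRefs) {
            hardRefs.clear();
        }
    }

    /** Two worker threads populate the cache; the calling thread closes it. */
    static int[] demo() {
        ClosablePerThreadCache<byte[]> cache = new ClosablePerThreadCache<>();
        Thread a = new Thread(() -> cache.set(new byte[1 << 20]));
        Thread b = new Thread(() -> cache.set(new byte[1 << 20]));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int before = cache.hardRefCount(); // one strong entry per worker thread
        cache.close();                     // runs on the caller, not on a or b
        int after = cache.hardRefCount();  // no strong references remain
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] counts = demo();
        System.out.println(counts[0] + " -> " + counts[1]); // prints 2 -> 0
    }
}
```

Callers of such a cache must be prepared for `get()` to return null after `close()` once the garbage collector has run, and recreate the value if they still need it.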

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread robert engels
Your code is not correct. You cannot release it on another thread - the first thread may be creating hundreds/thousands of instances before the other thread ever runs... On Sep 9, 2008, at 10:10 PM, Chris Lu wrote: If I release it on the thread that's creating the searcher, by setting searche

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread Chris Lu
If I release it on the thread that's creating the searcher, by setting searcher=null, everything is fine and the memory is released very cleanly. My load test repeatedly created a searcher on a RAMDirectory and released it on another thread. The test quickly runs out of memory (OOM) after several runs. I s

[jira] Issue Comment Edited: (LUCENE-1378) Remove remaining @author references

2008-09-09 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629688#action_12629688 ] [EMAIL PROTECTED] edited comment on LUCENE-1378 at 9/9/08 8:06 PM: -

[jira] Updated: (LUCENE-1378) Remove remaining @author references

2008-09-09 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1378: Attachment: LUCENE-1378.patch Here is a clean patch off trunk if you want to avoid the perl (have

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread Michael McCandless
Chris Lu wrote: The problem should be similar to what's talked about on this discussion. http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal The "rough" conclusion of that thread is that, technically, this isn't a memory leak but rather a "delayed freeing" problem. I.e., it m

[jira] Commented: (LUCENE-1378) Remove remaining @author references

2008-09-09 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629631#action_12629631 ] Otis Gospodnetic commented on LUCENE-1378: -- Eh, rusty perl $ find . -name \*.jav

ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread Chris Lu
The problem should be similar to what's talked about in this discussion. http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal There has been a memory leak in Lucene search since LUCENE-1195 (svn r659602, May 23, 2008). That patch brings a ThreadLocal cache into TermInfosReader. It's usually

Re: Realtime Search for Social Networks Collaboration

2008-09-09 Thread Ning Li
>>> Even so, >>> this may not be sufficient for some FS such as HDFS... Is it >>> reasonable in this case to keep in memory everything including >>> stored fields and term vectors? >> >> We could maybe do something like a proxy IndexInput/IndexOutput that >> would allow updating the read buffer fro

Re: Realtime Search for Social Networks Collaboration

2008-09-09 Thread Yonik Seeley
On Tue, Sep 9, 2008 at 12:45 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > Yonik Seeley wrote: >> No, it would essentially be a change in the semantics that all >> implementations would need to support. > > Right, which is you are allowed to open an IndexInput on a file when an > IndexOutput

Re: Realtime Search for Social Networks Collaboration

2008-09-09 Thread Yonik Seeley
On Tue, Sep 9, 2008 at 12:41 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > Yonik Seeley wrote: >> OR, if all writes are append-only, perhaps we don't ever need to >> invalidate the read buffer and would just need to remove the current >> logic that caches the file length and then let the unde
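
Below is a minimal sketch of the "don't cache the file length" idea. It uses plain `java.io.RandomAccessFile` rather than Lucene's `IndexInput`, and `LiveLengthReader` is hypothetical. The reader re-queries `length()` on each call, so bytes appended by a writer on the same file become visible without reopening. Because writes are append-only, bytes the reader has already seen never change, which is why no read-buffer invalidation is needed under that assumption.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

class AppendOnlyTailDemo {

    // Hypothetical reader (not Lucene's IndexInput): it never caches the file
    // length, so it can see bytes appended after it was opened.
    static final class LiveLengthReader implements AutoCloseable {
        private final RandomAccessFile raf;

        LiveLengthReader(File f) throws IOException {
            raf = new RandomAccessFile(f, "r");
        }

        long length() throws IOException {
            return raf.length(); // asks the OS every time instead of using a cached value
        }

        byte readByte(long pos) throws IOException {
            raf.seek(pos);
            return raf.readByte();
        }

        @Override
        public void close() throws IOException {
            raf.close();
        }
    }

    /** Returns {length seen after first write, length seen after append, value of last byte}. */
    static long[] demo() {
        try {
            File f = File.createTempFile("append-only", ".bin");
            f.deleteOnExit();
            try (RandomAccessFile writer = new RandomAccessFile(f, "rw");
                 LiveLengthReader reader = new LiveLengthReader(f)) {
                writer.write(new byte[] {1, 2, 3, 4});
                long first = reader.length();
                writer.write(new byte[] {5, 6}); // append while the reader stays open
                long second = reader.length();
                byte last = reader.readByte(second - 1);
                return new long[] {first, second, last};
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] r = demo();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints 4 6 6
    }
}
```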

Re: Realtime Search for Social Networks Collaboration

2008-09-09 Thread Michael McCandless
Yonik Seeley wrote: On Tue, Sep 9, 2008 at 11:42 AM, Ning Li <[EMAIL PROTECTED]> wrote: On Tue, Sep 9, 2008 at 10:02 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote: Yeah, I think the underlying RandomAccessFile might do the right thing, but IndexInput isn't required to see any changes on the fly

Re: Realtime Search for Social Networks Collaboration

2008-09-09 Thread Michael McCandless
Yonik Seeley wrote: On Tue, Sep 9, 2008 at 5:28 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: Yonik Seeley wrote: What about something like term freq? Would it need to count the number of docs after the local maxDoc or is there a better way? Good question... I think we'd have to take

[jira] Updated: (LUCENE-1354) Provide Programmatic Access to CheckIndex

2008-09-09 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-1354: Attachment: LUCENE-1354.patch Has CheckIndexStatus. Will commit shortly > Provide Progra

[jira] Issue Comment Edited: (LUCENE-1354) Provide Programmatic Access to CheckIndex

2008-09-09 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629515#action_12629515 ] gsingers edited comment on LUCENE-1354 at 9/9/08 9:15 AM: -

[jira] Commented: (LUCENE-1354) Provide Programmatic Access to CheckIndex

2008-09-09 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629515#action_12629515 ] Grant Ingersoll commented on LUCENE-1354: - Mike, I think you forgot to add the Che

[jira] Resolved: (LUCENE-1243) A few new benchmark tasks

2008-09-09 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved LUCENE-1243. - Resolution: Fixed Lucene Fields: [Patch Available] (was: [Patch Available, New])

Re: Realtime Search for Social Networks Collaboration

2008-09-09 Thread Yonik Seeley
On Tue, Sep 9, 2008 at 11:42 AM, Ning Li <[EMAIL PROTECTED]> wrote: > On Tue, Sep 9, 2008 at 10:02 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote: >> Yeah, I think the underlying RandomAccessFile might do the right >> thing, but IndexInput isn't required to see any changes on the fly >> (and current im

Re: Realtime Search for Social Networks Collaboration

2008-09-09 Thread Ning Li
On Mon, Sep 8, 2008 at 4:23 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote: >> I thought an index reader which supports real-time search no longer >> maintains a static view of an index? > > It seems advantageous to just make it really cheap to get a new view > of the index (if you do it for every sear

Re: Realtime Search for Social Networks Collaboration

2008-09-09 Thread Yonik Seeley
On Tue, Sep 9, 2008 at 5:28 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: > Yonik Seeley wrote: >> What about something like term freq? Would it need to count the >> number of docs after the local maxDoc or is there a better way? > > Good question... > > I think we'd have to take a full copy o

[jira] Resolved: (LUCENE-1357) SpanScorer does not respect ConstantScoreRangeQuery setting

2008-09-09 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved LUCENE-1357. - Resolution: Fixed > SpanScorer does not respect ConstantScoreRangeQuery setting > --

[jira] Commented: (LUCENE-914) Scorer.skipTo(current) remains on current for some scorers

2008-09-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629432#action_12629432 ] Michael McCandless commented on LUCENE-914: --- I really don't have a strong opinion

[jira] Updated: (LUCENE-1379) SpanScorer fails when sloppyFreq() returns 0

2008-09-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1379: --- Description: I think we should fix this for 2.4 (now back to 10)? Fix Version/s

2.4 status

2008-09-09 Thread Michael McCandless
OK we are gradually whittling down the list. It's down to 9 issues now. I have 2 issues, Grant has 3, Otis has 2 and Mark and Karl have 1 each. Can each of you try to finish your issues this week, or, take them off your plate / move to future? We are almost there!! I can be the release ma

[jira] Assigned: (LUCENE-1378) Remove remaining @author references

2008-09-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1378: -- Assignee: Otis Gospodnetic Otis can you finish & commit this? > Remove remain

[jira] Assigned: (LUCENE-1344) Make the Lucene jar an OSGi bundle

2008-09-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1344: -- Assignee: Michael McCandless > Make the Lucene jar an OSGi bundle > --

Re: Realtime Search for Social Networks Collaboration

2008-09-09 Thread Michael McCandless
This would just tap into the live hashtable that DocumentsWriter* maintain for the posting lists... except the docFreq will need to be copied away on reopen, I think. Mike Jason Rutherglen wrote: Term dictionary? I'm curious how that would be solved? On Mon, Sep 8, 2008 at 3:04 PM, Mic
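
The following hypothetical illustration shows what "copied away on reopen" could mean. `LiveWriter` keeps a live term-to-docFreq table as a stand-in for the postings hash that DocumentsWriter maintains, whose real structure is not modelled here. `reopenSnapshot()` copies that table so a reader keeps a stable view while indexing continues.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

class DocFreqSnapshotDemo {

    // Hypothetical writer: a live term -> docFreq table updated as documents arrive.
    static final class LiveWriter {
        private final Map<String, Integer> counts = new HashMap<>();

        synchronized void addDocument(String... terms) {
            // docFreq counts documents, so each distinct term counts once per document.
            for (String t : new HashSet<>(Arrays.asList(terms))) {
                counts.merge(t, 1, Integer::sum);
            }
        }

        synchronized int liveDocFreq(String term) {
            return counts.getOrDefault(term, 0);
        }

        // "Reopen": copy the counts so documents added later don't change this view.
        synchronized Map<String, Integer> reopenSnapshot() {
            return Collections.unmodifiableMap(new HashMap<>(counts));
        }
    }

    /** Returns {snapshot docFreq of "lucene", live docFreq of "lucene"}. */
    static int[] demo() {
        LiveWriter writer = new LiveWriter();
        writer.addDocument("lucene", "search");
        writer.addDocument("lucene", "lucene"); // a repeated term in one doc counts once
        Map<String, Integer> snapshot = writer.reopenSnapshot();
        writer.addDocument("lucene");           // indexed after the reopen
        return new int[] {snapshot.get("lucene"), writer.liveDocFreq("lucene")};
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " " + r[1]); // prints 2 3
    }
}
```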

Re: Realtime Search for Social Networks Collaboration

2008-09-09 Thread Michael McCandless
Yonik Seeley wrote: On Mon, Sep 8, 2008 at 3:04 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: Right, getCurrentIndex would return a MultiReader that includes SegmentReader for each segment in the index, plus a "RAMReader" that searches the RAM buffer. That RAMReader is a tiny shell class
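
Below is a sketch of how such a composite view can route a global document number to a sub-reader. It assumes a hypothetical `SubReader` interface with just `maxDoc()` and `docFreq()`, and it is not Lucene's `MultiReader` code. Each sub-reader gets a starting offset, and a binary search over the offsets finds which sub-reader owns a given doc. `docFreq` is summed across the sub-readers. The last sub-reader in the sample plays the role of the "RAMReader" over the RAM buffer.

```java
import java.util.Arrays;
import java.util.List;

class CompositeReaderDemo {

    // Hypothetical minimal reader interface (not Lucene's API).
    interface SubReader {
        int maxDoc();
        int docFreq(String term);
    }

    // Stand-in sub-reader with a fixed size that reports the same docFreq for every term.
    static SubReader fixed(int maxDoc, int docFreq) {
        return new SubReader() {
            @Override public int maxDoc() { return maxDoc; }
            @Override public int docFreq(String term) { return docFreq; }
        };
    }

    static final class Composite {
        private final List<SubReader> subs;
        private final int[] starts; // starts[i] = first global docID owned by subs.get(i)

        Composite(List<SubReader> subs) {
            this.subs = subs;
            this.starts = new int[subs.size() + 1];
            for (int i = 0; i < subs.size(); i++) {
                starts[i + 1] = starts[i] + subs.get(i).maxDoc();
            }
        }

        int maxDoc() {
            return starts[subs.size()];
        }

        // The composite docFreq is the sum over all sub-readers.
        int docFreq(String term) {
            int sum = 0;
            for (SubReader r : subs) sum += r.docFreq(term);
            return sum;
        }

        // Index of the sub-reader that owns global document number `doc`.
        int subIndex(int doc) {
            int i = Arrays.binarySearch(starts, 0, subs.size(), doc);
            if (i < 0) return -i - 2; // not a start offset: owner is the entry before the insertion point
            // Exact hit: skip empty sub-readers that share the same start offset.
            while (i + 1 < subs.size() && starts[i + 1] == doc) i++;
            return i;
        }

        // Document number local to the owning sub-reader.
        int localDoc(int doc) {
            return doc - starts[subIndex(doc)];
        }
    }

    // Two on-disk "segments" (10 and 5 docs) plus a 3-doc "RAM buffer" reader.
    static Composite sample() {
        return new Composite(Arrays.asList(fixed(10, 2), fixed(5, 1), fixed(3, 1)));
    }

    public static void main(String[] args) {
        Composite c = sample();
        System.out.println(c.maxDoc() + " " + c.subIndex(12) + " " + c.localDoc(12) + " " + c.docFreq("x"));
        // prints 18 1 2 4
    }
}
```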