[jira] Updated: (LUCENE-1381) Hanging while indexing/digesting on multiple threads

2008-09-10 Thread David Fertig (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Fertig updated LUCENE-1381: - Description: With several older lucene projects already running and "stable", I have recently w

[jira] Created: (LUCENE-1381) Hanging while indexing/digesting on multiple threads

2008-09-10 Thread David Fertig (JIRA)
Hanging while indexing/digesting on multiple threads Key: LUCENE-1381 URL: https://issues.apache.org/jira/browse/LUCENE-1381 Project: Lucene - Java Issue Type: Bug Components: An

[jira] Commented: (LUCENE-1195) Performance improvement for TermInfosReader

2008-09-10 Thread robert engels (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630091#action_12630091 ] robert engels commented on LUCENE-1195: --- Also, SafeThreadLocal can be trivially chan

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels
You can't hold the ThreadLocal value in a WeakReference, because there is no hard reference between enumeration calls (so it would be cleared out from under you while enumerating). All of this occurs because you have some objects (readers/segments etc.) that are shared across all threads, b

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
When I look at the reference tree, that is the feeling I get. If you held a WeakReference it would get released. |- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput |- input of org.apache.lucene.index.SegmentTermEnum |- value of java.lang.ThreadLocal$

[jira] Updated: (LUCENE-1195) Performance improvement for TermInfosReader

2008-09-10 Thread robert engels (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] robert engels updated LUCENE-1195: -- Attachment: SafeThreadLocal.java A "safe" ThreadLocal that can be used for more deterministic

[jira] Resolved: (LUCENE-1366) Rename Field.Index.UN_TOKENIZED/TOKENIZED/NO_NORMS

2008-09-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1366. Resolution: Fixed Committed revision 694004. > Rename Field.Index.UN_TOKENIZED/TO

Re: 2.4 status

2008-09-10 Thread John Wang
Looking forward to 2.4! -John On Tue, Sep 9, 2008 at 2:38 AM, Michael McCandless < [EMAIL PROTECTED]> wrote: > > OK we are gradually whittling down the list. It's down to 9 issues now. > > I have 2 issues, Grant has 3, Otis has 2 and Mark and Karl have 1 each. > > Can each of you try to finish

docid set compression and boolean docid set operations

2008-09-10 Thread John Wang
Hi guys: We have built this on top of the lucene 1.4. api/refactoring for docid sets and docIdIterater. We've implemented the p4Delta compression algorithm presented at www2008: http://www2008.org/papers/fp618.html We've been using this in production here at LinkedIn and would lov

Re: docid set compression and boolean docid set operations

2008-09-10 Thread John Wang
Sorry, I meant lucene 2.4 -John On Wed, Sep 10, 2008 at 2:08 PM, John Wang <[EMAIL PROTECTED]> wrote: > Hi guys: > > We have build this on top of the lucene 1.4. api/refactoring for docid > sets and docIdIterater. > > We've implemented the p4Delta compression algorithm presented at > w

[jira] Commented: (LUCENE-1344) Make the Lucene jar an OSGi bundle

2008-09-10 Thread JIRA
[ https://issues.apache.org/jira/browse/LUCENE-1344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629954#action_12629954 ] Nicolas Lalevée commented on LUCENE-1344: - About the missing header in the maven j

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Chris Lu
Well, the code is correct, because it can work by avoiding this trap. But it failed to act as a good API. I learned the inside details from you. I am not the only one that's trapped. And more users will likely be trapped again, unless javadoc to describe the close() function is changed. Actually,

Re: Realtime Search for Social Networks Collaboration

2008-09-10 Thread Jason Rutherglen
Hi Mike, There would be a new sorted list or something to replace the hashtable? Seems like an issue that is not solved. Jason On Tue, Sep 9, 2008 at 5:29 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > This would just tap into the live hashtable that DocumentsWriter* maintain > for the p

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels
Always your prerogative. On Sep 10, 2008, at 1:15 PM, Chris Lu wrote: Actually I am done with it by simply downgrading and not using r659602 and later. The old version is cleaner and consistent with the API and close() does mean close, not something complicated and unknown to most user

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Chris Lu
Actually I am done with it by simply downgrading and not using r659602 and later. The old version is cleaner and consistent with the API and close() does mean close, not something complicated and unknown to most users, which almost feels like a trap. And later on, if no changes happened for this

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels
Why not just use reopen() and be done with it??? On Sep 10, 2008, at 12:48 PM, Chris Lu wrote: Yeah, the timing is different. But it's an unknown, undetermined, and uncontrollable time... We can not ask the user, while(memory is low){ sleep(1000); } do_the_real_thing_an_hour_later -- Ch

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Chris Lu
Yeah, the timing is different. But it's an unknown, undetermined, and uncontrollable time... We can not ask the user, while(memory is low){ sleep(1000); } do_the_real_thing_an_hour_later -- Chris Lu - Instant Scalable Full-Text Search On Any Database/Application site:

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Chris Lu
SafeThreadLocal is very interesting. It'll be good not only for Lucene, but also other projects. Could you please post it? -- Chris Lu - Instant Scalable Full-Text Search On Any Database/Application site: http://www.dbsight.net demo: http://search.dbsight.com Lucene Datab

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Chris Lu
Not likely. Actually I made some changes to Lucene source code and I can see the changes in the memory snapshot. So it is the latest Lucene version. -- Chris Lu - Instant Scalable Full-Text Search On Any Database/Application site: http://www.dbsight.net demo: http://search.

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels
Close() does work - it is just that the memory may not be freed until much later... When working with VERY LARGE objects, this can be a problem. On Sep 10, 2008, at 12:36 PM, Chris Lu wrote: Thanks for the analysis, really appreciate it, and I agree with it. But... This is really a normal

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Chris Lu
Not holding searcher/reader. I did check that via memory snapshot. -- Chris Lu - Instant Scalable Full-Text Search On Any Database/Application site: http://www.dbsight.net demo: http://search.dbsight.com Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.ph

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Chris Lu
Thanks for the analysis, really appreciate it, and I agree with it. But... This is really a normal J2EE use case. The threads seldom die. Doesn't that mean closing the RAMDirectory doesn't work for J2EE applications? And only reopen() works? And close() doesn't release the resources? duh... I can

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels
The other thing Lucene can do is create a SafeThreadLocal - it is rather trivial, and have that integrate at a higher-level, allowing for manual clean-up across all threads. It MIGHT be a bit slower than the JDK version (since that uses heuristics to clear stale entries), and so doesn't al
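The idea described above, a thread-local-like holder whose owner can clean up the values of all threads manually, might be sketched as follows. This is only an illustration under stated assumptions: the class name SafeThreadLocalSketch and its methods are hypothetical, and it is not the SafeThreadLocal.java attached to LUCENE-1195, whose contents are not shown in this listing.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; NOT the SafeThreadLocal.java attached to LUCENE-1195.
final class SafeThreadLocalSketch<T> {
    // Values are keyed by thread and owned by this object rather than by each Thread,
    // so the owner can discard all of them deterministically.
    private final Map<Thread, T> values = new ConcurrentHashMap<>();

    /** Returns the calling thread's value, or null if none is set. */
    T get() {
        return values.get(Thread.currentThread());
    }

    /** Sets the calling thread's value; null removes it (ConcurrentHashMap rejects nulls). */
    void set(T value) {
        if (value == null) {
            values.remove(Thread.currentThread());
        } else {
            values.put(Thread.currentThread(), value);
        }
    }

    /** Drops the values of ALL threads at once, e.g. from the owner's close(). */
    void clear() {
        values.clear();
    }

    /** Number of threads that currently hold a value. */
    int size() {
        return values.size();
    }
}
```

One trade-off of this sketch: the map holds strong references to Thread objects, so entries for finished threads stay until clear() is called. It also pays for a concurrent map lookup on every access, which fits the remark that such a class might be somewhat slower than the JDK version.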

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels
My review of trunk shows that a SegmentReader contains a TermInfosReader, which contains a ThreadLocal of ThreadResources, which contains a SegmentTermEnum. So there should be a ThreadResources in the memory profiler for each SegmentTermEnum instance - unless you have something goofy going on.
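To make the containment chain above concrete, here is a small stand-alone analogy of why a per-thread cached object can outlive a close() call when threads are pooled, as in a J2EE container. It does not use Lucene: the Reader class, its perThread field, and valueSurvivesClose() are hypothetical stand-ins, and the byte array merely plays the role of a cached per-thread resource.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class PooledThreadLocalDemo {
    /** Hypothetical stand-in for a reader that caches one resource per thread. */
    static final class Reader {
        final ThreadLocal<byte[]> perThread = new ThreadLocal<>();

        void use() {
            if (perThread.get() == null) {
                perThread.set(new byte[1024]); // stand-in for a cached per-thread object
            }
        }

        // remove() only affects the calling thread's entry, not other threads' entries.
        void close() {
            perThread.remove();
        }
    }

    /** Returns true if the pool thread still sees its cached value after close() ran elsewhere. */
    static boolean valueSurvivesClose() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Reader reader = new Reader();
            pool.submit(() -> { reader.use(); }).get(); // pool thread fills its own entry
            reader.close();                             // runs on the caller's thread
            return pool.submit(() -> reader.perThread.get() != null).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

In this sketch the pool thread's entry is still present after close(), because close() was called from a different thread and the pool thread never exits. How the real TermInfosReader behaves on close is not established by this listing; the example only shows the general ThreadLocal behaviour the discussion revolves around.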

[jira] Commented: (LUCENE-1380) Patch for ShingleFilter.coterminalPositionIncrement

2008-09-10 Thread Michael Semb Wever (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629846#action_12629846 ] Michael Semb Wever commented on LUCENE-1380: i suspected such re the option na

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels
You do not need to create a new RAMDirectory - just write to the existing one, and then reopen() the IndexReader using it. This will prevent lots of big objects being created. This may be the source of your problem. Even if the Segment is closed, the ThreadLocal will no longer be referenc

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Michael McCandless
Good question. As far as I can tell, nowhere in Lucene do we put a SegmentTermEnum directly into ThreadLocal, after rev 659602. Is it possible that output came from a run with Lucene before rev 659602? Mike Chris Lu wrote: Is it possible that some other places that's using SegmentTermE

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Michael McCandless
Chris, After you close your IndexSearcher/Reader, is it possible you're still holding a reference to it? Mike Chris Lu wrote: Frankly I don't know why TermInfosReader.ThreadResources is not showing up in the memory snapshot. Yes. It's been there for a long time. But let's see what's ch

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Chris Lu
Is it possible that some other place is using SegmentTermEnum as a ThreadLocal? This may explain why TermInfosReader.ThreadResources is not in the memory snapshot. -- Chris Lu - Instant Scalable Full-Text Search On Any Database/Application site: http://www.dbsight.net de

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Chris Lu
I really want to find out what I am doing wrong, if that's the case. Yes. I have made certain that I closed all Readers/Searchers, and verified that through memory profiler. Yes. I am creating new RAMDirectory. But that's the problem. I need to update the content. Sure, if no content update an

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels
Actually, a single RAMDirectory would be sufficient (since it supports writes). There should never be a reason to create a new RAMDirectory (unless you have some specialized real-time search occurring). If you are creating new RAMDirectories, the statements below hold. On Sep 10, 2008, at 1

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels
It is basic Java. Threads are not guaranteed to run on any sort of schedule. If you create lots of large objects in one thread, releasing them in another, there is a good chance you will get an OOM (since the releasing thread may not run before the OOM occurs)... This is not Lucene specifi

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Chris Lu
Frankly I don't know why TermInfosReader.ThreadResources is not showing up in the memory snapshot. Yes. It's been there for a long time. But let's see what's changed: An LRU cache of termInfoCache is added. I SegmentTermEnum previously would be released, since it's relatively a simple object. But

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Chris Lu
I do not believe I am making any mistake. Actually I just got an email from another user, complaining about the same thing. And I am having the same usage pattern. After the reader is opened, the RAMDirectory is shared by several objects. There is one instance of RAMDirectory in the memory, and it

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Chris Lu
Does this make any difference? If I intentionally close the searcher and reader and they fail to release the memory, I cannot rely on some magic of the JVM to release it. -- Chris Lu - Instant Scalable Full-Text Search On Any Database/Application site: http://www.dbsight.net demo: ht

[jira] Commented: (LUCENE-1380) Patch for ShingleFilter.coterminalPositionIncrement

2008-09-10 Thread Steven Rowe (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629827#action_12629827 ] Steven Rowe commented on LUCENE-1380: - As I said in the thread on java-user that spawn

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels
Sorry, but I am fairly certain you are mistaken. If you only have a single IndexReader, the RAMDirectory will be shared in all cases. The only memory growth is any buffer space allocated by an IndexInput (used in many places and cached). Normally the IndexInput created by a RAMDirectory d

[jira] Updated: (LUCENE-1320) ShingleMatrixFilter, a three dimensional permutating shingle filter

2008-09-10 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-1320: Attachment: LUCENE-1320.patch Java 1.4 compatible. Give this a try > ShingleMatrixFilter

[jira] Commented: (LUCENE-1320) ShingleMatrixFilter, a three dimensional permutating shingle filter

2008-09-10 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629798#action_12629798 ] Grant Ingersoll commented on LUCENE-1320: - I'm almost done w/ a conversion. Regex

[jira] Commented: (LUCENE-1320) ShingleMatrixFilter, a three dimensional permutating shingle filter

2008-09-10 Thread Karl Wettin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629763#action_12629763 ] Karl Wettin commented on LUCENE-1320: - It really is quite a bit of work to downgrade t

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
Why do you need to keep a strong reference? Why not a WeakReference ? --Noble On Wed, Sep 10, 2008 at 12:27 AM, Chris Lu <[EMAIL PROTECTED]> wrote: > The problem should be similar to what's talked about on this discussion. > http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal > > Th

[jira] Updated: (LUCENE-1380) Patch for ShingleFilter.coterminalPositionIncrement

2008-09-10 Thread Michael Semb Wever (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Semb Wever updated LUCENE-1380: --- Attachment: LUCENE-1380.patch Addition to ShingleFilter for property coterminalPosit

[jira] Created: (LUCENE-1380) Patch for ShingleFilter.coterminalPositionIncrement

2008-09-10 Thread Michael Semb Wever (JIRA)
Patch for ShingleFilter.coterminalPositionIncrement --- Key: LUCENE-1380 URL: https://issues.apache.org/jira/browse/LUCENE-1380 Project: Lucene - Java Issue Type: Improvement Componen

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Michael McCandless
I still don't quite understand what's causing your memory growth. SegmentTermEnum instances have been held in a ThreadLocal cache in TermInfosReader for a very long time (at least since Lucene 1.4). If indeed it's the RAMDir's contents being kept "alive" due to this, then, you should have a