[jira] Updated: (LUCENE-545) Field Selection and Lazy Field Loading

2006-05-03 Thread Chuck Williams (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-545?page=all ] Chuck Williams updated LUCENE-545: -- Attachment: LazyFields.tar.gz Continuing the discussion from LUCENE-558, LazyFields.tar.gz extends this patch (LUCENE-545) with an additional optimization

Control over Lucene Index

2006-05-03 Thread Ralf Bierig
Hi, in the context of a distributed information retrieval project, we would like to use Lucene for its indexing capabilities but not for retrieval. In particular, we would like to populate a Lucene index with the tokens and statistics already computed by an external indexer, thereby bypassin

Re: Control over Lucene Index

2006-05-03 Thread Grant Ingersoll
You could write a "dummy" Analyzer that provides the tokens from your external process. As for statistics, what kind are you interested in? I suppose you can store them in a field along with the document, or you can set the boost values for the field/document, but that may be a bit simple for

Re: Control over Lucene Index

2006-05-03 Thread Doug Cutting
You could implement the IndexReader API, then use IndexMerger to write this in Lucene's format. Doug Ralf Bierig wrote: Hi, in the context of a distributed information retrieval project, we would like to use Lucene for its indexing capabilities but not for retrieval. In particular, we woul

[jira] Commented: (LUCENE-436) [PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception

2006-05-03 Thread Nicholaus Shupe (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-436?page=comments#action_12377604 ] Nicholaus Shupe commented on LUCENE-436: I agree with robert engels about the finalize method being taken out of the class. In fact, the finalize method is run by the

[jira] Commented: (LUCENE-436) [PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception

2006-05-03 Thread robert engels (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-436?page=comments#action_12377607 ] robert engels commented on LUCENE-436: -- fwiw, we have done EXTENSIVE memory profiling of the low-level Lucene (and JVM) routines. It is my opinion that there are no memo

Re: Returning a minimum number of clusters

2006-05-03 Thread Marvin Humphrey
On May 2, 2006, at 1:55 PM, [EMAIL PROTECTED] wrote: This is an issue of scaling the different dimensions. It is more expensive to calculate similarity based on the entire document's contents rather than just a snippet chosen by the Highlighter. However, it's presumably more accurate, and

[jira] Commented: (LUCENE-436) [PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception

2006-05-03 Thread Nicholaus Shupe (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-436?page=comments#action_12377623 ] Nicholaus Shupe commented on LUCENE-436: robert engels is *incorrect* in that there isn't a memory leak with Lucene. The test case made by kieran demonstrates the mem

[jira] Commented: (LUCENE-436) [PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception

2006-05-03 Thread robert engels (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-436?page=comments#action_12377628 ] robert engels commented on LUCENE-436: -- I agree that the test case does fail, so I was wrong. There is "something" broken in Lucene or the JDK. What is the JDK bug that

[jira] Commented: (LUCENE-436) [PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception

2006-05-03 Thread Nicholaus Shupe (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-436?page=comments#action_12377631 ] Nicholaus Shupe commented on LUCENE-436: The problem is apparent on my machine with JDK 1.5, so basically this bug shows up, regardless of JDK version. I've looked at

[jira] Commented: (LUCENE-436) [PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception

2006-05-03 Thread Nicholaus Shupe (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-436?page=comments#action_12377636 ] Nicholaus Shupe commented on LUCENE-436: This type of ThreadLocal (anti?)pattern also seems to be present in SegmentReader. > [PATCH] TermInfosReader, SegmentTermEnum

[jira] Commented: (LUCENE-436) [PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception

2006-05-03 Thread robert engels (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-436?page=comments#action_12377642 ] robert engels commented on LUCENE-436: -- It works fine for me under 1.5.0_06. I have determined the problem with the ThreadLocal code in 1.4.2. The entries in the ThreadL

Re: storing term text internally as byte array and bytecount as prefix, etc.

2006-05-03 Thread Marvin Humphrey
On May 1, 2006, at 7:33 PM, Chuck Williams wrote: > Could someone summarize succinctly why it is considered a > major issue that Lucene uses the Java modified UTF-8 > encoding within its index rather than the standard UTF-8 > encoding. Is the only concern compatibility with index > formats in ot
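
For readers unfamiliar with the distinction raised in the question above, here is a minimal sketch of where Java's modified UTF-8 and standard UTF-8 produce different bytes. It uses only the JDK (DataOutputStream.writeUTF and String.getBytes), not Lucene's own index-writing code, so it illustrates the encoding difference in general and is not a statement about what Lucene's writer does byte for byte. The class and method names (ModifiedUtf8Demo, standardLength, modifiedLength) are made up for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares encoded lengths under the two schemes.
public class ModifiedUtf8Demo {

    // Number of bytes the string occupies in standard UTF-8.
    static int standardLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the string occupies in Java's modified UTF-8,
    // as produced by DataOutputStream.writeUTF.
    static int modifiedLength(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(s);
            out.flush();
            // writeUTF prepends a 2-byte length; exclude it from the count.
            return buf.size() - 2;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String nul = "\u0000";            // U+0000
        String clef = "\uD834\uDD1E";     // U+1D11E, outside the BMP
        String ascii = "abc";

        // U+0000: one byte in standard UTF-8, two bytes (C0 80) in modified UTF-8.
        System.out.println(standardLength(nul) + " vs " + modifiedLength(nul));
        // Supplementary character: four bytes in standard UTF-8; modified UTF-8
        // encodes each surrogate separately as three bytes, six in total.
        System.out.println(standardLength(clef) + " vs " + modifiedLength(clef));
        // Plain ASCII is identical in both encodings.
        System.out.println(standardLength(ascii) + " vs " + modifiedLength(ascii));
    }
}
```

The two encodings agree for every character in the Basic Multilingual Plane except U+0000, so the incompatibility only becomes visible with null characters and supplementary characters.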

[jira] Updated: (LUCENE-436) [PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception

2006-05-03 Thread robert engels (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-436?page=all ] robert engels updated LUCENE-436: - Attachment: FixedThreadLocal.java ThreadLocal replacement that avoids memory leak > [PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception > --

[jira] Commented: (LUCENE-436) [PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception

2006-05-03 Thread robert engels (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-436?page=comments#action_12377688 ] robert engels commented on LUCENE-436: -- I posted FixedThreadLocal.java. Changed Lucene to use this class instead of ThreadLocal. Test case runs fine under 1.4. The perfo

[jira] Updated: (LUCENE-436) [PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception

2006-05-03 Thread robert engels (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-436?page=all ] robert engels updated LUCENE-436: - Attachment: ThreadLocalTest.java demonstrate thread local memory leak > [PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception > --

[jira] Commented: (LUCENE-436) [PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception

2006-05-03 Thread robert engels (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-436?page=comments#action_12377697 ] robert engels commented on LUCENE-436: -- I posted ThreadLocalTest.java that clearly demonstrates the "problem" with JDK 1.4. BUT, it also demonstrates the problem with th

[jira] Commented: (LUCENE-436) [PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception

2006-05-03 Thread robert engels (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-436?page=comments#action_12377701 ] robert engels commented on LUCENE-436: -- Just for my own "pat on the back", it goes back to what I said in the earliest comment. There is NOT a memory leak in Lucene or J