Re: Existing Parsers

2004-09-13 Thread Honey George
Hi Chris, I do not have stats, but I think the performance is reasonable. I use xpdf for PDF and wvWare for DOC. The size of my index is ~2GB (this is not limited to only pdf/doc). To avoid memory problems, I have set an upper bound on the size of the documents that can be indexed. For
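The upper bound George describes can be sketched in plain Java. The 10 MB limit and the helper name shouldIndex are illustrative assumptions; the original mail does not give a figure or code:

```java
import java.io.File;

public class SizeCappedIndexer {
    // Illustrative cap; the original mail does not state a number.
    static final long MAX_BYTES = 10L * 1024 * 1024;

    /** Returns true if the file is small enough to hand to the text extractor. */
    static boolean shouldIndex(File f) {
        return f.isFile() && f.length() <= MAX_BYTES;
    }

    public static void main(String[] args) throws Exception {
        File small = File.createTempFile("doc", ".pdf");
        System.out.println(shouldIndex(small));
        small.delete();
    }
}
```

Oversized files are simply skipped before xpdf/wvWare ever see them, which keeps extraction memory bounded.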

Re: Out of memory in lucene 1.4.1 when re-indexing large number of documents

2004-09-13 Thread Daniel Taurat
Hi Doug, you are absolutely right about the older version of the JDK: it is 1.3.1 (IBM). Unfortunately we cannot upgrade since we are bound to the IBM Portalserver 4 environment. Results: I patched Lucene 1.4.1: it has not improved much: after indexing 1897 objects the number of SegmentTermEnum

ANT +BUILD + LUCENE

2004-09-13 Thread Karthik N S
Hi Guys, apologies.. The task for me is to build the index folder using Lucene, with a simple Build.xml for ANT. The problem: the same 'Build.xml' should be used for different O/Ss [ Win / Linux ]. The glitch is that the respective jar files such as Lucene-1.4.jar and other jar files are not

RE: question on Hits.doc

2004-09-13 Thread Cocula Remi
Hi, I recently had the same kind of problem, but it was due to the way I was dealing with Hits. Obtaining a Hits object from a Query is very fast, but then I was looping over ALL the hits to retrieve information on the documents before displaying the result to the user. It was not necessary
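The fix Remi hints at (resolve document data only for the page being displayed, never for every hit) can be sketched in plain Java. The page helper and Integer-id result list are stand-ins for illustration, not Lucene's actual Hits API:

```java
import java.util.ArrayList;
import java.util.List;

public class HitPager {
    /** Return only the slice of hit ids belonging to one results page. */
    static List<Integer> page(List<Integer> allHitIds, int pageNo, int pageSize) {
        int from = pageNo * pageSize;
        int to = Math.min(from + pageSize, allHitIds.size());
        if (from >= to) return new ArrayList<Integer>();
        // Only these few ids get resolved to full documents;
        // the remaining hits are never touched.
        return allHitIds.subList(from, to);
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<Integer>();
        for (int i = 0; i < 95; i++) ids.add(i);
        System.out.println(page(ids, 0, 10));
    }
}
```

With 95 hits and a page size of 10, only ten documents are loaded per request instead of all 95.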

Re: OutOfMemory example

2004-09-13 Thread John Moylan
You should reuse your old index (e.g. as an application variable) unless it has changed - use getCurrentVersion to check the index for updates. This has come up before. John Ji Kuhn wrote: Hi, I think I can reproduce a memory leaking problem while reopening an index. Lucene version tested
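Moylan's advice (hold one reader application-wide and reopen only when the index version moves on) follows a cache pattern that can be sketched in plain Java. The Loader interface below is a stand-in for Lucene's cheap IndexReader.getCurrentVersion check and expensive IndexReader.open call, not the real API:

```java
public class VersionedCache<T> {
    interface Loader<T> {
        long currentVersion();   // cheap check, like IndexReader.getCurrentVersion
        T load();                // expensive open, like IndexReader.open
    }

    private final Loader<T> loader;
    private long cachedVersion = -1;
    private T cached;

    VersionedCache(Loader<T> loader) { this.loader = loader; }

    /** Reload only when the underlying version has changed. */
    synchronized T get() {
        long v = loader.currentVersion();
        if (cached == null || v != cachedVersion) {
            cached = loader.load();
            cachedVersion = v;
        }
        return cached;
    }
}
```

Repeated calls to get() return the same cached object until the version number changes, so the expensive open happens once per index update rather than once per request.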

Re: Out of memory in lucene 1.4.1 when re-indexing large number of documents

2004-09-13 Thread Daniel Taurat
Okay, the reference test is done: on JDK 1.4.2, Lucene 1.4.1 really seems to run fine: just a moderate number of SegmentTermEnums that is controlled by gc (about 500 for the 1900 test objects). Daniel Taurat wrote: Hi Doug, you are absolutely right about the older version of the JDK: it is 1.3.1

RE: OutOfMemory example

2004-09-13 Thread Ji Kuhn
I disagree, or I don't understand. I can change the code as it is shown below. Now I must reopen the index to see the changes, but the memory problem remains. I really don't know what I'm doing wrong, the code is so simple. Jiri. ... public static void main(String[] args) throws

Re: OutOfMemory example

2004-09-13 Thread John Moylan
http://issues.apache.org/bugzilla/show_bug.cgi?id=30628 you can close the index, but the Garbage Collector still needs to reclaim the memory and it may be taking longer than your loop to do so. John Ji Kuhn wrote: I disagree or I don't understand. I can change the code as it is shown below.

Re: OutOfMemory example

2004-09-13 Thread sergiu gordea
I have a few comments regarding your code ... 1. Why do you use RAMDirectory and not the hard disk? 2. As John said, you should reuse the index instead of creating it each time in the main function: if (!indexExists(indexFile)) IndexWriter writer = new IndexWriter(directory, new
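The indexExists check Sergiu gestures at can be sketched in plain Java. Lucene 1.x keeps a "segments" file at the index root, so its presence is a reasonable existence test; treat the file name and helper as an assumption rather than the exact code from the mail:

```java
import java.io.File;

public class IndexExists {
    /** Assumed check: a Lucene 1.x index directory contains a "segments" file. */
    static boolean indexExists(File indexDir) {
        return new File(indexDir, "segments").exists();
    }

    public static void main(String[] args) {
        // Decide between creating a fresh index and appending to an old one.
        File dir = new File("index");
        boolean create = !indexExists(dir);
        System.out.println("create new index: " + create);
    }
}
```

The result would typically feed the create flag of the IndexWriter constructor, so an existing index is reused instead of rebuilt on every run.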

RE: OutOfMemory example

2004-09-13 Thread Ji Kuhn
Thanks for the bug id, it seems like my problem, and I have stand-alone code with main(). What about the slow garbage collector? That looks to me like a wrong suggestion. Let's change the code once again: ... public static void main(String[] args) throws IOException, InterruptedException {

RE: OutOfMemory example

2004-09-13 Thread Ji Kuhn
You don't see the point of my post. I sent an application which everyone can run with only the lucene jar and which deterministically produces OutOfMemoryError. That's all. Jiri. -Original Message- From: sergiu gordea [mailto:[EMAIL PROTECTED] Sent: Monday, September 13, 2004 5:16 PM To:

force gc idiom - Re: OutOfMemory example

2004-09-13 Thread David Spencer
Ji Kuhn wrote: Thanks for the bug id, it seems like my problem and I have stand-alone code with main(). What about the slow garbage collector? This looks to me like a wrong suggestion. I've seen this written up before (JavaWorld?) as a way to more reliably force GC than just a System.gc() call. I
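The "force gc" idiom Spencer alludes to is usually written as repeated System.gc() calls with a weak-reference sentinel, looping until the sentinel is actually cleared. This is a sketch of the common pattern, not the exact code from the write-up he remembers:

```java
import java.lang.ref.WeakReference;

public class ForceGc {
    /** Best-effort GC: loop until a sentinel weak reference is cleared. */
    static boolean forceGc(int maxTries) {
        WeakReference<Object> sentinel = new WeakReference<Object>(new Object());
        for (int i = 0; i < maxTries && sentinel.get() != null; i++) {
            System.gc();
            System.runFinalization();
        }
        // true once the JVM has really collected the sentinel object
        return sentinel.get() == null;
    }

    public static void main(String[] args) {
        System.out.println("collected: " + forceGc(50));
    }
}
```

Even this only encourages collection; the JVM is free to ignore System.gc(), which is why a loop with an observable sentinel beats a single call.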

RE: force gc idiom - Re: OutOfMemory example

2004-09-13 Thread Ji Kuhn
This doesn't work either! Let's concentrate on the first version of my code. I believe that the code should run endlessly (I have said it before: in version 1.4 final it does). Jiri. -Original Message- From: David Spencer [mailto:[EMAIL PROTECTED] Sent: Monday, September 13, 2004 5:34 PM

Re: OutOfMemory example

2004-09-13 Thread sergiu gordea
Then it is probably my mistake ... I haven't read all the emails in the thread. So ... your goal is to produce errors ... I try to avoid them :)) All the best, Sergiu Ji Kuhn wrote: You don't see the point of my post. I sent an application which everyone can run with only the lucene jar and in

OptimizeIt -- Re: force gc idiom - Re: OutOfMemory example

2004-09-13 Thread David Spencer
Ji Kuhn wrote: This doesn't work either! You're right. I'm running under JDK 1.5 and trying larger values for -Xmx, and it still fails. Running under (Borland's) OptimizeIt shows the number of Terms and TermInfos (both in org.apache.lucene.index) increase every time through the loop, by several

FieldSortedHitQueue.Comparators -- Re: force gc idiom - Re: OutOfMemory example

2004-09-13 Thread David Spencer
Just noticed something else suspicious. FieldSortedHitQueue has a field called Comparators and it seems like things are never removed from it. Ji Kuhn wrote: This doesn't work either! Let's concentrate on the first version of my code. I believe that the code should run endlessly (I have said

Re: FieldSortedHitQueue.Comparators -- Re: force gc idiom - Re: OutOfMemory example

2004-09-13 Thread David Spencer
David Spencer wrote: Just noticed something else suspicious. FieldSortedHitQueue has a field called Comparators and it seems like things are never removed from it. Replying to my own post... this could be the problem. If I put in a print statement here in FieldSortedHitQueue, recompile, and
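If a static cache like the Comparators map holds entries keyed by reader and never evicts them, one common remedy is a WeakHashMap, whose entries become collectable once the key (here, the reader) is no longer strongly reachable. This is a sketch of that pattern under assumed names, not Lucene's actual fix:

```java
import java.util.Map;
import java.util.WeakHashMap;

public class ComparatorCache {
    // Keyed weakly: once a reader is closed and its last strong reference
    // dropped, the entry can be garbage-collected instead of accumulating
    // forever as in a plain static HashMap.
    private static final Map<Object, String> CACHE =
            new WeakHashMap<Object, String>();

    static synchronized String comparatorFor(Object reader) {
        String c = CACHE.get(reader);
        if (c == null) {
            c = "comparator-for-" + System.identityHashCode(reader);
            CACHE.put(reader, c);
        }
        return c;
    }

    static synchronized int size() { return CACHE.size(); }
}
```

The cache still avoids rebuilding comparators for a live reader, but abandoned readers no longer pin their entries in memory.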

SegmentReader - Re: FieldSortedHitQueue.Comparators -- Re: force gc idiom - Re: OutOfMemory example

2004-09-13 Thread David Spencer
Another clue: the SegmentReaders are piling up too, which may be why the Comparator map is increasing in size, because SegmentReaders are the keys to Comparator... though again, I don't know enough about the Lucene internals to know which refs to SegmentReaders are valid and which ones may be

Re: OutOfMemory example

2004-09-13 Thread Daniel Naber
On Monday 13 September 2004 15:06, Ji Kuhn wrote: I think I can reproduce a memory leaking problem while reopening an index. Lucene version tested is 1.4.1, version 1.4 final works OK. My JVM is: Could you try with the latest Lucene version from CVS? I cannot reproduce your problem with that

Re: OptimizeIt -- Re: force gc idiom - Re: OutOfMemory example

2004-09-13 Thread Kevin A. Burton
David Spencer wrote: Ji Kuhn wrote: This doesn't work either! You're right. I'm running under JDK 1.5 and trying larger values for -Xmx and it still fails. Running under (Borland's) OptimizeIt shows the number of Terms and TermInfos (both in org.apache.lucene.index) increase every time through the

Re: OutOfMemory example

2004-09-13 Thread Kevin A. Burton
Ji Kuhn wrote: Hi, I think I can reproduce a memory leaking problem while reopening an index. Lucene version tested is 1.4.1, version 1.4 final works OK. My JVM is: $ java -version java version 1.4.2_05 Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_05-b04) Java HotSpot(TM)

Re: Addition to contributions page

2004-09-13 Thread Daniel Naber
On Friday 10 September 2004 15:48, Chas Emerick wrote: PDFTextStream should be added to the 'Document Converters' section, with this URL http://snowtide.com , and perhaps this heading: 'PDFTextStream -- PDF text and metadata extraction'. The 'Author' field should probably be left blank,

Re: OutOfMemory example

2004-09-13 Thread David Spencer
Daniel Naber wrote: On Monday 13 September 2004 15:06, Ji Kuhn wrote: I think I can reproduce a memory leaking problem while reopening an index. Lucene version tested is 1.4.1, version 1.4 final works OK. My JVM is: Could you try with the latest Lucene version from CVS? I cannot reproduce

Similarity score computation documentation

2004-09-13 Thread Ken McCracken
Hi, I was looking through the score computation when running search, and think there may be a discrepancy between what is _documented_ in the org.apache.lucene.search.Similarity class overview Javadocs, and what actually occurs in the code. I believe the problem is only with the documentation.