Indexing XML with Lucene

2003-02-14 Thread Pierre Lacchini
Hello, I'm using Lucene, and I need to index an XML database (Tamino). How can I do that? Do I have to use an XML parser such as Digester? I'm kind of a noob with Lucene, and I really need help ;) Thx! Pierre Lacchini Consultant développement PeopleWare 12, rue du Cimetière L-8413 Steinfort Phone : +

RE: OutOfMemoryException while Indexing an XML file

2003-02-14 Thread Marcel Stor
-Original Message- From: Rob Outar [mailto:[EMAIL PROTECTED]] Sent: Friday, 14 February 2003 14:13 To: Lucene Users List Subject: OutOfMemoryException while Indexing an XML file Hi all, I was using the sample code provided, I believe, by Doug Cutting to index an XML

Re: OutOfMemoryException while Indexing an XML file

2003-02-14 Thread Otis Gospodnetic
Nothing in the code snippet you sent would cause that exception. If I were you I'd run it under a profiler to quickly see where the leak is. You can even use something free like JMP. Otis --- Rob Outar [EMAIL PROTECTED] wrote: Hi all, I was using the sample code provided I believe by

RE: OutOfMemoryException while Indexing an XML file

2003-02-14 Thread Rob Outar
So, to the best of your knowledge, the Lucene Document object should not cause the exception, even though the XML file is huge and thousands of fields are being added to the Lucene Document object? Thanks, Rob -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]] Sent: Friday,

RE: OutOfMemoryException while Indexing an XML file

2003-02-14 Thread Aaron Galea
I had this problem when using Xerces to parse XML documents. The problem, I think, lies in the Java garbage collector. The way I solved it was to create a shell script that invokes a Java program for each XML file; that program adds the file to the index. Hope this helps... Aaron -- Original Message
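
A minimal Java sketch of the one-process-per-file workaround described above, written as a launcher rather than a shell script. `IndexOneFile` is a hypothetical main class (not from the thread) that would add a single XML file to the index and exit; the class name, argument order, and directory layout are all assumptions for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class PerFileIndexLauncher {

    // Builds the command line for one child JVM. mainClass is assumed to be a
    // hypothetical indexer that takes <indexDir> <xmlFile> as its arguments.
    static List<String> buildCommand(String mainClass, String indexDir, File xmlFile) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));
        cmd.add(mainClass);
        cmd.add(indexDir);
        cmd.add(xmlFile.getPath());
        return cmd;
    }

    // Starts one child JVM and waits for it to finish, so the files are
    // processed one at a time and each child's memory is released on exit.
    static int runOne(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: PerFileIndexLauncher <indexDir> <xmlDir>");
            return;
        }
        File[] files = new File(args[1]).listFiles((dir, name) -> name.endsWith(".xml"));
        if (files == null) {
            System.err.println("not a readable directory: " + args[1]);
            return;
        }
        for (File file : files) {
            int exit = runOne(buildCommand("IndexOneFile", args[0], file));
            if (exit != 0) {
                System.err.println("indexing failed for " + file + " (exit " + exit + ")");
            }
        }
    }
}
```

Running each file in its own JVM sidesteps the memory growth without explaining it, so a profiler run is still worth doing to find the actual cause.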

RE: indexing xml files with relative links to dtds

2003-02-14 Thread icewind
OK, I used parser.setFeature to turn validation off. This helps, but it turns out there are entity references in the XML files as well. These are causing the same problems the DTD references were causing before... Any suggestions with regard to this? Thanks, --- Hui Ouyang [EMAIL
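
One possible approach to the entity problem, sketched with the standard JAXP/SAX API: besides turning validation off, install an `EntityResolver` that answers every external DTD or entity lookup with an empty stream, so unreachable relative paths are never opened. The class and method names below are hypothetical, and replacing external entities with nothing is an assumption that only suits cases where their content does not need to be indexed.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class LenientXmlText {

    // Parses the XML and returns its character data, ignoring external
    // DTDs and external entities instead of trying to load them.
    static String extractText(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        // Standard SAX feature: do not validate against the DTD.
        reader.setFeature("http://xml.org/sax/features/validation", false);

        // Every external lookup (DTD subset or entity) resolves to an empty
        // document, so missing or relative system IDs cannot cause I/O errors.
        reader.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));

        StringBuilder text = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });

        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE doc SYSTEM \"missing/doc.dtd\" [\n"
                + "  <!ENTITY ext SYSTEM \"missing/ext.xml\">\n"
                + "]>\n"
                + "<doc>hello &ext;world</doc>";
        System.out.println(extractText(xml));
    }
}
```

If the entities carry text that should be searchable, the resolver would instead need to map each system ID to the real file location.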

Phrase queries with wildcards - Do they work ?

2003-02-14 Thread Mailing Lists Account
With the 1.2 release, do phrase queries containing wildcards, such as "microsoft app*", actually work? I couldn't get them to work, even though a couple of posts on the mailing list reported success. I observed that the query parser recognizes the above query as a phrase query. Is there any work-around

Re: Phrase query and porter stemmer

2003-02-14 Thread Mailing Lists Account
Interesting. Thanks to Lucene and this list, I am learning a lot more about how search engines work. regards Ramesh - Original Message - From: Eric Isakson [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Thursday, February 13, 2003 9:10 PM Subject: RE: Phrase query and

Re: Quickest way to build a Document - (Keyword, Freq)* map

2003-02-14 Thread Ype Kingma
On Friday 14 February 2003 15:10, you wrote: Hi, I am using Lucene right now to index several semi-structured documents. I recently had to implement a method 'getFrequencyVector()' to simply return a mapping of keyword -> frequency from the information already in the Lucene index. I

Re: OutOfMemoryException while Indexing an XML file

2003-02-14 Thread Tatu Saloranta
On Friday 14 February 2003 07:27, Aaron Galea wrote: I had this problem when using xerces to parse xml documents. The problem I think lies in the Java garbage collector. The way I solved it was to create It's unlikely that GC is the culprit. Current ones are good at purging objects that are