RE: OutOfMemoryException while Indexing an XML file

2003-02-18 Thread Rob Outar
We are aware of DOM limitations/memory problems, but I am using SAX to parse the file and index elements and attributes in my content handler.

Thanks,
Rob

-----Original Message-----
From: Tatu Saloranta [mailto:[EMAIL PROTECTED]]
Sent: Friday, February 14, 2003 8:18 PM
To: Lucene Users List
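
For readers following the thread, the SAX approach Rob describes might look roughly like the sketch below. It is not Rob's code: the class name `SaxFieldCollector`, the `element@attribute` naming scheme, and the in-memory map are assumptions made for illustration. A real indexer would presumably hand each value to a Lucene document as it arrives rather than collect everything in a map, since accumulating the whole file in memory would defeat the point of streaming.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: gathers element text and attribute values as
// (field name -> values) pairs while the parser streams through the file.
class SaxFieldCollector extends DefaultHandler {
    private final Map<String, List<String>> fields = new LinkedHashMap<>();
    // Holds only the text of the element currently being read.
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // simplification: text preceding a child element is dropped
        for (int i = 0; i < attrs.getLength(); i++) {
            add(qName + "@" + attrs.getQName(i), attrs.getValue(i));
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so append rather than assign.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            add(qName, value);
        }
        text.setLength(0);
    }

    private void add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Parses the stream without building a DOM tree and returns the collected fields.
    static Map<String, List<String>> collect(InputStream in) throws Exception {
        SaxFieldCollector handler = new SaxFieldCollector();
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler.fields;
    }
}
```

Calling `SaxFieldCollector.collect(new FileInputStream("data.xml"))` (the file name is a placeholder) returns the fields in document order. The only buffer that grows with the input is the text of the current element plus whatever the handler chooses to keep.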

RE: OutOfMemoryException while Indexing an XML file/PdfParser

2003-02-18 Thread Pinky Iyer
I am having a similar problem, but while indexing PDF documents with the PDFBox parser (available at www.pdfbox.com). I get this exception:

Exception in thread "main" java.lang.OutOfMemoryError

Has anybody who has implemented the above code seen this? Any help is appreciated.

Thanks!
PI

Rob Outar [EMAIL PROTECTED]

RE: OutOfMemoryException while Indexing an XML file/PdfParser

2003-02-18 Thread Matt Tucker
Rob,

We ran into this problem too, and our solution was to use a native PDF text extractor (PDFBox just can't seem to handle large PDFs well). Basically, we try to parse with the native app first, and if that fails, we parse with PDFBox. We used:

http://www.foolabs.com/xpdf/

A code snippet for
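
Matt's own snippet is cut off in this archive view, so what follows is a separate sketch of the "native tool first, Java parser second" idea, not his code. The names `TextExtractor` and `FallbackExtractor` are hypothetical. The sketch also assumes that xpdf's `pdftotext` is on the PATH and that passing `-` as the output file sends the text to standard output. The PDFBox step is left as a caller-supplied extractor because its API is not shown in the thread.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

// Hypothetical abstraction over "something that turns a PDF into text".
interface TextExtractor {
    String extract(Path pdf) throws Exception;
}

// Tries each extractor in order and returns the first successful result.
class FallbackExtractor implements TextExtractor {
    private final List<TextExtractor> chain;

    FallbackExtractor(List<TextExtractor> chain) {
        this.chain = chain;
    }

    @Override
    public String extract(Path pdf) throws Exception {
        Exception last = new IllegalStateException("no extractors configured");
        for (TextExtractor extractor : chain) {
            try {
                return extractor.extract(pdf);
            } catch (Exception ex) {
                last = ex; // remember the failure and try the next extractor
            }
        }
        throw last;
    }

    // Assumption: the pdftotext binary is installed and writes to stdout when
    // the output file is "-". It runs as a separate process, so the memory it
    // needs for a large PDF is not taken from the JVM heap.
    static TextExtractor pdftotext() {
        return pdf -> {
            Process p = new ProcessBuilder("pdftotext", pdf.toString(), "-")
                    .redirectError(ProcessBuilder.Redirect.DISCARD)
                    .start();
            String out = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            if (p.waitFor() != 0) {
                throw new IOException("pdftotext failed for " + pdf);
            }
            return out;
        };
    }
}
```

A caller would build the chain as `new FallbackExtractor(List.of(FallbackExtractor.pdftotext(), pdfBoxExtractor))`, where `pdfBoxExtractor` is whatever wrapper the application already has around PDFBox. Note that the loop catches `Exception` only, so an `OutOfMemoryError` raised inside an in-process parser would still propagate to the caller.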

RE: OutOfMemoryException while Indexing an XML file/PdfParser

2003-02-18 Thread Ben Litchfield
I am aware of the issues with parsing certain PDF documents. I am currently working on refactoring PDFBox to deal with large documents. You will see this in the next release. I would like to thank people for their feedback and for sending problem documents.

Ben Litchfield
http://www.pdfbox.org

On