Re: how do you work with PDF

2004-11-16 Thread Chas Emerick
Document instance can also contain the PDF document's metadata attributes as Lucene Fields. Here's a longer tutorial that covers this and more: http://snowtide.com/home/PDFTextStream/techtips/easy_lucene_integration Chas Emerick [EMAIL PROTECTED] Snowtide Informatics Systems PDFTextStream: fast

Re: Google Desktop Could be Better

2004-10-21 Thread Chas Emerick
PDFTextStream's jar is 400K. This may not be relevant if Bill is only interested in open-source options, but I thought I'd put it out there anyway. Chas Emerick PDFTextStream: fast PDF text extraction for Java apps and Lucene http://snowtide.com/home/PDFTextStream/ On Oct 17, 2004, at 12:51

Addition to contributions page

2004-09-10 Thread Chas Emerick
doing. PDFTextStream should be added to the 'Document Converters' section, with this URL (http://snowtide.com), and perhaps this heading: 'PDFTextStream -- PDF text and metadata extraction'. The 'Author' field should probably be left blank, since there's no single creator. Thanks much, Chas

Re: pdf in Chinese

2004-09-08 Thread Chas Emerick
I'm not aware of any Java library that can reliably extract Chinese text from PDF documents. We're planning on supporting Chinese, Japanese, and Korean in version 2 of PDFTextStream, but there's no doubt that it's a huge challenge. Chas Emerick | [EMAIL PROTECTED] PDFTextStream: fast PDF

Re: PDF-Text Performance comparison

2004-09-08 Thread Chas Emerick
that you're not at all sore. Chas Emerick | [EMAIL PROTECTED] PDFTextStream: fast PDF text extraction for Java applications http://snowtide.com/home/PDFTextStream/ On Sep 8, 2004, at 10:41 AM, Ben Litchfield wrote: On Wed, 8 Sep 2004, Chas Emerick wrote: PDFTextStream: fast PDF text extraction