Document instance can also contain the PDF document's metadata
attributes as Lucene Fields.
Here's a longer tutorial that covers this and more:
http://snowtide.com/home/PDFTextStream/techtips/easy_lucene_integration
Chas Emerick
[EMAIL PROTECTED]
Snowtide Informatics Systems
PDFTextStream: fast
PDFTextStream's jar is 400K. This may not be relevant if Bill is
only interested in open-source options, but I thought I'd put it out
there anyway.
Chas Emerick
PDFTextStream: fast PDF text extraction for Java apps and Lucene
http://snowtide.com/home/PDFTextStream/
On Oct 17, 2004, at 12:51
doing.
PDFTextStream should be added to the 'Document Converters' section,
with the URL http://snowtide.com and perhaps this heading:
'PDFTextStream -- PDF text and metadata extraction'. The 'Author'
field should probably be left blank, since there's no single creator.
Thanks much,
Chas
I'm not aware of any Java library that can reliably extract Chinese
text from PDF documents. We're planning on supporting Chinese,
Japanese, and Korean in version 2 of PDFTextStream, but there's no
doubt that it's a huge challenge.
Chas Emerick | [EMAIL PROTECTED]
PDFTextStream: fast PDF
that you're not at
all sore.
Chas Emerick | [EMAIL PROTECTED]
PDFTextStream: fast PDF text extraction for Java applications
http://snowtide.com/home/PDFTextStream/
On Sep 8, 2004, at 10:41 AM, Ben Litchfield wrote:
On Wed, 8 Sep 2004, Chas Emerick wrote:
PDFTextStream: fast PDF text extraction