From: Sergey Beryozkin [mailto:sberyoz...@gmail.com]
Sent: Friday, September 9, 2016 10:06 AM
To: user@tika.apache.org
Subject: How to parse PDF files effectively with Tika
On Mon, 12 Sep 2016, Sergey Beryozkin wrote:
> By the way, I've found out AutoDetectParser may not work if the (pdf) stream
> is an attachment stream which may not support a mark.

Simplest would probably be just to wrap it in a TikaInputStream, which
would handle any buffering/marking as needed.
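
To illustrate why mark support matters here, below is a minimal JDK-only sketch (no Tika on the classpath). Type detection has to peek at the first bytes of a stream and then rewind, which needs mark/reset. The class and method names (MarkSupportSketch, withoutMark, ensureMarkSupported, looksLikePdf) are made up for this example, and BufferedInputStream is only a stand-in for the buffering a wrapper can add. In Tika itself the wrapping call would be TikaInputStream.get(stream).

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MarkSupportSketch {

    /** Hypothetical stand-in for an attachment stream: hides mark/reset support. */
    static InputStream withoutMark(InputStream in) {
        return new FilterInputStream(in) {
            @Override public boolean markSupported() { return false; }
            @Override public synchronized void mark(int readLimit) { /* ignored */ }
            @Override public synchronized void reset() throws IOException {
                throw new IOException("mark/reset not supported");
            }
        };
    }

    /** Adds buffering only when the stream cannot be marked already. */
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    /** Peeks at the leading bytes for the "%PDF-" signature, then rewinds. */
    static boolean looksLikePdf(InputStream in) throws IOException {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        byte[] head = new byte[magic.length];
        in.mark(magic.length);
        try {
            int off = 0;
            while (off < head.length) {
                int n = in.read(head, off, head.length - off);
                if (n < 0) {
                    break; // stream ended before the signature was complete
                }
                off += n;
            }
            return off == magic.length && Arrays.equals(head, magic);
        } finally {
            in.reset(); // throws IOException if the stream cannot rewind
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%PDF-1.4 demo".getBytes(StandardCharsets.US_ASCII);
        InputStream raw = withoutMark(new ByteArrayInputStream(data));
        InputStream wrapped = ensureMarkSupported(raw);
        System.out.println("raw supports mark: " + raw.markSupported());
        System.out.println("wrapped supports mark: " + wrapped.markSupported());
        System.out.println("looks like PDF: " + looksLikePdf(wrapped));
    }
}
```

Calling looksLikePdf on the unwrapped stream would fail at reset(), which is roughly the situation described above for attachment streams.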
... new DefaultHandler(), new Metadata(), context);
}
return wrapper.getMetadata();
Hi All,
While experimenting with simple demo code that creates a Tika PDFParser
(and a few other parsers) and provides a ToTextContentHandler for it to
return the content, I've realized I'm not really sure what the best
strategy is.
For example, Tim has mentioned that it is