Hi,
On Mon, Jul 19, 2010 at 4:27 PM, Sergiy Karpenko
<[email protected]> wrote:
> I want configure tika to use only PDFParser
The easiest way to achieve this is to directly use the PDFParser class
instead of working through the configuration.
> File file = getResourceAsFile("/test-documents/testPDF.pdf");
> TikaConfig myTC = new
> TikaConfig(getResourceAsFile("/test-documents/tika-config.xml"));
> String s1 = ParseUtils.getStringContent(file, myTC);
Use something like this instead:
    // Instantiate the PDF parser directly; no TikaConfig lookup is involved.
    Parser parser = new PDFParser();
    // Collects the text of the document body.
    ContentHandler handler = new BodyContentHandler();
    Metadata metadata = new Metadata();
    ParseContext context = new ParseContext();
    InputStream stream = TikaInputStream.get(new File("document.pdf"));
    try {
        parser.parse(stream, handler, metadata, context);
    } finally {
        stream.close();
    }
    // The extracted plain text.
    String content = handler.toString();
BR,
Jukka Zitting