Hello,
Can someone point me in the right direction for streaming the structured
XHTML output from a Tika Parser? The closest I have gotten is using a
BodyContentHandler, as below:
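// assumes tika is an org.apache.tika.Tika instance and f is the java.io.File to parse
Parser parser = tika.getParser();
ParseContext context = new ParseContext();
context.set(Locale.class, Locale.ENGLISH);
PrintStream printer = new PrintStream(System.out);
// BodyContentHandler(OutputStream) writes only the character data of the
// XHTML <body>, so the tags themselves never reach the output
ContentHandler handler = new BodyContentHandler(printer);
Metadata mtdt = new Metadata();
// try-with-resources closes the input file even if parsing fails
try (InputStream in = new FileInputStream(f)) {
    parser.parse(in, handler, mtdt, context);
}
// flush rather than close: closing printer would also close System.out
printer.flush();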
Is there a ContentHandler that can do this easily? I apologize; my
understanding of the SAX API is minimal at best.
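A minimal sketch of one possible approach, under stated assumptions: any SAX ContentHandler that serializes the events it receives back into XML should preserve the structure, and the JDK's own JAXP identity TransformerHandler is one such handler. The class name XhtmlStreamer and both helper methods are hypothetical. replay(...) uses the JDK's built-in SAX parser as a stand-in for a Tika Parser, so the example runs without Tika on the classpath; a Tika Parser would feed the same kind of SAX events to whatever handler is passed to parse(...).

```java
import java.io.OutputStream;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class XhtmlStreamer {

    /**
     * Hypothetical helper: returns a ContentHandler that writes every SAX event
     * it receives (elements, attributes and text) as XML to the given stream.
     * It uses the standard JAXP identity transform; it is not a Tika class.
     */
    public static ContentHandler newXhtmlHandler(OutputStream out) throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        // No stylesheet argument: an identity transform that copies events through
        TransformerHandler handler = factory.newTransformerHandler();
        Transformer serializer = handler.getTransformer();
        // Set output properties before setResult(); some JAXP implementations
        // read them at the moment the result is attached
        serializer.setOutputProperty(OutputKeys.METHOD, "xml");
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        handler.setResult(new StreamResult(out));
        return handler;
    }

    /**
     * Hypothetical stand-in for a Tika Parser: parses an XHTML string with the
     * JDK's SAX parser and forwards the events to the given handler.
     */
    public static void replay(String xhtml, ContentHandler handler) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // report namespaced (XHTML) element events
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        String sample = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>Demo</title></head>"
                + "<body><p>streamed <b>structured</b> text</p></body></html>";
        // Markup is preserved in the output, unlike with BodyContentHandler
        replay(sample, newXhtmlHandler(System.out));
        System.out.flush();
    }
}
```

With Tika, the handler would take the place of the BodyContentHandler in the snippet above, for example parser.parse(new FileInputStream(f), XhtmlStreamer.newXhtmlHandler(System.out), mtdt, context). Newer Tika releases also ship org.apache.tika.sax.ToXMLContentHandler, which serves a similar purpose and may be simpler if your version includes it.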
Thanks,
Don