Pardon the interruption: I did not have tika-parsers on my classpath!

Thank you,
Dmitry
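P.S. For anyone who hits the same symptom (Content-Type is detected correctly, but X-Parsed-By reports org.apache.tika.parser.EmptyParser), here is a rough diagnostic sketch that uses only the JDK. It rests on two assumptions about Tika 1.x: parser implementations are discovered through META-INF/services/org.apache.tika.parser.Parser files, and the HTML parser is org.apache.tika.parser.html.HtmlParser, shipped in the tika-parsers artifact rather than tika-core. The class name TikaClasspathCheck is hypothetical.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class TikaClasspathCheck {

    // Service file Tika reads to discover Parser implementations (assumed Tika 1.x layout).
    static final String PARSER_SERVICE_FILE = "META-INF/services/org.apache.tika.parser.Parser";

    // HTML parser class from the tika-parsers artifact (assumed Tika 1.x package name).
    static final String HTML_PARSER_CLASS = "org.apache.tika.parser.html.HtmlParser";

    /** Returns the URL of every parser service file visible to the given class loader. */
    static List<URL> parserServiceFiles(ClassLoader loader) throws IOException {
        return Collections.list(loader.getResources(PARSER_SERVICE_FILE));
    }

    /** Returns true if the named class can be loaded; the class is not initialized. */
    static boolean isOnClasspath(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        ClassLoader loader = TikaClasspathCheck.class.getClassLoader();

        // List every jar (or directory) that registers Tika parsers.
        List<URL> files = parserServiceFiles(loader);
        System.out.println("Parser service files found: " + files.size());
        for (URL u : files) {
            System.out.println("  " + u);
        }

        // Check specifically for the HTML parser.
        boolean html = isOnClasspath(HTML_PARSER_CLASS, loader);
        System.out.println("HtmlParser on classpath: " + html);

        if (files.isEmpty() || !html) {
            System.out.println("Tika will likely fall back to EmptyParser; check for tika-parsers.");
        }
    }
}
```

Run it with the same classpath as the application. If no service files are listed and HtmlParser is missing, only tika-core is present, which matches the behavior described in the quoted message below.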
On Mon, Mar 9, 2015 at 9:59 PM, Dmitry Minkovsky <[email protected]> wrote:
> I am trying to use the Tika facade. Here's my test code:
>
>     Tika tika = new Tika();
>     Metadata md = new Metadata();
>
>     try {
>         String content = tika.parseToString(src, md, 100000);
>
>         System.out.println("Content length: " + content.length());
>
>         for (String s : md.names()) {
>             System.out.println(s + ": " + md.get(s));
>         }
>     }
>     catch (TikaException e) { System.out.println(e); }
>
> Here's the output:
>
>     Content length: 0
>     X-Parsed-By: org.apache.tika.parser.EmptyParser
>     Content-Type: text/html
>
> So:
>
> * If Tika correctly identifies the input as text/html, why does it use the
>   EmptyParser?
> * If I'm supposed to pass a parser, which parser should I pass for best
>   results, assuming that autodetection is successful, as it seems to be above?
>
> Thank you,
> Dmitry
