I am trying to use the Tika facade. Here's my test code:
Tika tika = new Tika();
Metadata md = new Metadata();
try {
    String content = tika.parseToString(src, md, 100000);
    System.out.println("Content length: " + content.length());
    for (String s : md.names()) {
        System.out.println(s + ": " + md.get(s));
    }
} catch (TikaException e) {
    System.out.println(e);
}
Here's the output:
> Content length: 0
> X-Parsed-By: org.apache.tika.parser.EmptyParser
> Content-Type: text/html
So:
* If Tika correctly identifies the input as text/html, why does it use the
EmptyParser?
* If I'm supposed to pass a parser, which parser should I pass for best
results, assuming that autodetection succeeds, as it seems to above?
Thank you,
Dmitry