I am trying to use the Tika facade. Here's my test code:

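import java.io.IOException;
import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;

Tika tika = new Tika();
Metadata md = new Metadata();

try {
    // src is the InputStream holding the HTML document under test;
    // at most 100000 characters of text are extracted
    String content = tika.parseToString(src, md, 100000);

    System.out.println("Content length: " + content.length());

    // Print every metadata entry set during detection and parsing
    for (String s : md.names()) {
        System.out.println(s + ": " + md.get(s));
    }
}
catch (IOException | TikaException e) { System.out.println(e); }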


Here's the output:

> Content length: 0
> X-Parsed-By: org.apache.tika.parser.EmptyParser
> Content-Type: text/html
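To check whether the parser behind the facade claims to handle text/html at all, I could run something like the sketch below. It assumes Tika 1.x, where new Tika() wraps an AutoDetectParser built from the parsers found on the classpath; CheckHtmlSupport and supports are names I made up for the check.

```java
import org.apache.tika.Tika;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;

public class CheckHtmlSupport {
    // True if the given parser declares support for the media type, e.g. "text/html"
    static boolean supports(Parser parser, String type) {
        return parser.getSupportedTypes(new ParseContext())
                .contains(MediaType.parse(type));
    }

    public static void main(String[] args) {
        // Assumption: the facade's default parser is an AutoDetectParser whose
        // supported types are the union of all parsers loaded from the classpath
        Parser parser = new Tika().getParser();
        System.out.println("text/html supported: " + supports(parser, "text/html"));
    }
}
```

If this prints false, no parser for text/html was loaded, which would be consistent with the EmptyParser showing up in X-Parsed-By.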

So:

* If Tika correctly identifies the input as text/html, why does it use the
EmptyParser?
* If I'm supposed to pass a parser explicitly, which one should I pass for
best results, given that autodetection seems to succeed, as it does above?
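For comparison, passing a parser explicitly would, as far as I understand it, look roughly like the sketch below. It assumes the tika-parsers artifact (which provides org.apache.tika.parser.html.HtmlParser) is on the classpath; ExplicitHtmlParse and parseHtml are hypothetical names. One difference from parseToString: BodyContentHandler throws a SAXException once the write limit is hit, instead of truncating quietly.

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.html.HtmlParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

public class ExplicitHtmlParse {
    // Parses an HTML stream with HtmlParser directly, skipping autodetection
    static String parseHtml(InputStream src, Metadata md, int maxLength)
            throws IOException, SAXException, TikaException {
        Parser parser = new HtmlParser();
        // Collects body text; throws a SAXException if maxLength is exceeded
        BodyContentHandler handler = new BodyContentHandler(maxLength);
        parser.parse(src, handler, md, new ParseContext());
        return handler.toString();
    }
}
```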

Thank you,
Dmitry
