Nick,

I believe you answered a different question from the one I asked. My observation was specifically about the AutoDetectParser listing its supported media types, not about the HtmlParser. The code I used is similar to:

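        // Assumed setup: default-constructed parser and context
        AutoDetectParser autoDetectParser = new AutoDetectParser();
        ParseContext pctx = new ParseContext();
        for (MediaType mt : autoDetectParser.getSupportedTypes(pctx)) {
            System.out.println(mt.toString());
        }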

The MIME types you listed for the HtmlParser do not show up in this output.
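
To make the comparison less dependent on reading a 92-item list by eye, here is a minimal sketch of a check that reports which expected type names are absent from a reported list. It uses only the standard library; the class name SupportedTypeCheck and the method missing are hypothetical, and the "reported" values in main are placeholders, not real Tika output. The "expected" values are the four types from Nick's listing.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Tika: compares two lists of type names.
public class SupportedTypeCheck {

    /** Returns the expected type names that do not appear in the reported ones, sorted. */
    public static Set<String> missing(Collection<String> reported,
                                      Collection<String> expected) {
        Set<String> seen = new HashSet<>(reported);
        Set<String> result = new TreeSet<>();
        for (String type : expected) {
            if (!seen.contains(type)) {
                result.add(type);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder values standing in for the strings printed by the loop above.
        Collection<String> reported = Arrays.asList("application/pdf", "text/plain");

        // The types Nick listed for the HtmlParser.
        Collection<String> expected = Arrays.asList(
                "application/x-asp",
                "application/xhtml+xml",
                "application/vnd.wap.xhtml+xml",
                "text/html");

        for (String type : missing(reported, expected)) {
            System.out.println("not reported: " + type);
        }
    }
}
```

With Tika itself, the reported list would be built by collecting mt.toString() for each MediaType returned by getSupportedTypes(pctx), as in the loop above.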

Thanks,
Bill


On 04/16/2012 01:51 PM, Nick Burch wrote:
On Thu, 12 Apr 2012, William Hays wrote:
Using the API, I have extracted the supported media types for the AutoDetectParser in Tika 1.1 and I'm not seeing HTML or XHTML mimetypes in that list of 92 items, though it parses such files fine.

Hmm, HTML is showing up for me:

java -jar tika-app-1.1.jar --list-parser-details | grep -A 4 HtmlParser
    org.apache.tika.parser.html.HtmlParser
      application/x-asp
      application/xhtml+xml
      application/vnd.wap.xhtml+xml
      text/html

Nick

--
------------
William Hays
Software Development & Analysis
MIT Libraries E25-131
617.324.5682 (phone)
[email protected]

