On Thu, 12 Apr 2012, William Hays wrote:
> Using the API, I have extracted the supported media types for the AutoDetectParser in Tika 1.1 and I'm not seeing HTML or XHTML mimetypes in that list of 92 items, though it parses such files fine.

Hmm, HTML is showing up for me:

java -jar tika-app-1.1.jar --list-parser-details | grep -A 4 HtmlParser
    org.apache.tika.parser.html.HtmlParser
      application/x-asp
      application/xhtml+xml
      application/vnd.wap.xhtml+xml
      text/html
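
For checking a longer listing without grep, the output can be grouped by parser in a few lines of Java. This is only a rough sketch, not part of Tika: the class name ParserListing and its parse method are made up for illustration, and it assumes the listing looks like the output above, i.e. any line containing a '/' is a media type and any other non-blank line is a parser class name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Tika class): groups the media types in
// --list-parser-details style output under the parser that precedes them.
public class ParserListing {

    // Assumption: a line containing '/' is a media type; any other
    // non-blank line is treated as a parser class name.
    static Map<String, List<String>> parse(String listing) {
        Map<String, List<String>> byParser = new LinkedHashMap<>();
        String current = null;
        for (String raw : listing.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.contains("/")) {
                // Media type: attach it to the most recent parser, if any.
                if (current != null) {
                    byParser.get(current).add(line);
                }
            } else {
                // Parser class name: start a new group.
                current = line;
                byParser.putIfAbsent(current, new ArrayList<>());
            }
        }
        return byParser;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "    org.apache.tika.parser.html.HtmlParser",
            "      application/x-asp",
            "      application/xhtml+xml",
            "      application/vnd.wap.xhtml+xml",
            "      text/html");
        List<String> types = parse(sample).get("org.apache.tika.parser.html.HtmlParser");
        System.out.println("text/html listed: " + types.contains("text/html"));
    }
}
```

Feeding it the real command output (for example, read from standard input) instead of the hard-coded sample would show which parser, if any, claims a given type.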

Nick
