Nick,
I believe you answered a different question than the one I asked. My
observation was specifically about the AutoDetectParser listing its
supported media types, not about the HtmlParser. The code I used is
similar to:
    for (MediaType mt : autoDetectParser.getSupportedTypes(pctx)) {
        System.out.println(mt.toString());
    }
The MIME types you listed for the HtmlParser do not show up in that output.
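To make the comparison concrete, here is a minimal sketch of how the two lists could be cross-checked as plain strings. MediaTypeCheck and its missing() method are hypothetical names of my own, not part of Tika; the sketch assumes the input is the media type names printed by the loop above, and the expected list is simply the four types from your --list-parser-details output.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Tika: it only compares media type strings.
class MediaTypeCheck {

    // The four types shown for HtmlParser in the --list-parser-details output.
    static final List<String> HTML_TYPES = Arrays.asList(
            "application/x-asp",
            "application/xhtml+xml",
            "application/vnd.wap.xhtml+xml",
            "text/html");

    // Returns the expected types that are absent from the reported list,
    // ignoring case and surrounding whitespace.
    static Set<String> missing(Iterable<String> reported, Iterable<String> expected) {
        Set<String> seen = new LinkedHashSet<>();
        for (String type : reported) {
            seen.add(type.trim().toLowerCase(Locale.ROOT));
        }
        Set<String> absent = new LinkedHashSet<>();
        for (String type : expected) {
            if (!seen.contains(type.trim().toLowerCase(Locale.ROOT))) {
                absent.add(type);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        // Sample input only; substitute the strings printed by the loop above.
        List<String> reported = Arrays.asList("application/pdf", "text/plain");
        System.out.println("Missing: " + missing(reported, HTML_TYPES));
    }
}
```

Feeding it the 92 names I get from getSupportedTypes would show exactly which of the HtmlParser types are absent from the AutoDetectParser list.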
Thanks,
Bill
On 04/16/2012 01:51 PM, Nick Burch wrote:
> On Thu, 12 Apr 2012, William Hays wrote:
>> Using the API, I have extracted the supported media types for the
>> AutoDetectParser in Tika 1.1 and I'm not seeing HTML or XHTML
>> mimetypes in that list of 92 items, though it parses such files fine.
>
> Hmm, HTML is showing up for me:
>
>   java -jar tika-app-1.1.jar --list-parser-details | grep -A 4 HtmlParser
>   org.apache.tika.parser.html.HtmlParser
>     application/x-asp
>     application/xhtml+xml
>     application/vnd.wap.xhtml+xml
>     text/html
>
> Nick
--
------------
William Hays
Software Development & Analysis
MIT Libraries E25-131
617.324.5682 (phone)
[email protected]