Hi Jan,

confirmed: Nutch cannot parse CHM files, while Tika on its own (the same version used by Nutch) can. The CHM parsers are in tika-parser*.jar, which is included in the Nutch package.
Any ideas?

Sebastian

On 08/08/2012 12:03 PM, Jan Riewe wrote:
> Hey there,
>
> I am trying to parse CHM (Microsoft Help Files) with Nutch, but I get:
>
> Can't retrieve Tika parser for mime-type application/vnd.ms-htmlhelp
>
> I've tried Nutch 1.4 (Tika 0.10) and 1.5.1 (Tika 1.1), which
> should be able to parse those files:
> https://issues.apache.org/jira/browse/TIKA-245
>
> In tika-mimetypes.xml I do find an entry for
> application/vnd.ms-htmlhelp
>
> Has anyone run into the same issue and knows how to fix it?
>
> Bye
> Jan

