Hi Jan,

Confirmed: Nutch cannot parse CHM files, while Tika (the same version used
by Nutch) can. The CHM parsers are in tika-parser*.jar, which is contained
in the Nutch package.

Any ideas?

Sebastian

On 08/08/2012 12:03 PM, Jan Riewe wrote:
> Hey there,
> 
> I am trying to parse CHM (Microsoft Help Files) with Nutch, but I get:
> 
> Can't retrieve Tika parser for mime-type application/vnd.ms-htmlhelp
> 
> I've tried Nutch 1.4 (Tika 0.10) and Nutch 1.5.1 (Tika 1.1), both of which
> should be able to parse those files:
> https://issues.apache.org/jira/browse/TIKA-245
> 
> In tika-mimetypes.xml I do find an entry for
> application/vnd.ms-htmlhelp.
> 
> Has anyone run into the same issue and knows how to fix it?
> 
> Bye
> Jan
> 
