New JIRA?

On 9 August 2012 23:30, Markus Jelsma <[email protected]> wrote:
> Hmm, I'm not sure, but maybe we don't include all Tika parser deps in our
> build.xml?
>
>
> -----Original message-----
> > From: Sebastian Nagel <[email protected]>
> > Sent: Thu 09-Aug-2012 23:18
> > To: [email protected]
> > Subject: Re: CHM Files and Tika
> >
> > Hi Jan,
> >
> > Confirmed: Nutch cannot parse CHM files, while Tika (the same version used
> > by Nutch) can. The CHM parsers are in tika-parser*.jar, which is included
> > in the Nutch package.
> >
> > Any ideas?
> >
> > Sebastian
> >
> > On 08/08/2012 12:03 PM, Jan Riewe wrote:
> > > Hey there,
> > >
> > > I'm trying to parse CHM (Microsoft Help) files with Nutch, but I get:
> > >
> > > Can't retrieve Tika parser for mime-type application/vnd.ms-htmlhelp
> > >
> > > I've tried Nutch 1.4 (Tika 0.10) and Nutch 1.5.1 (Tika 1.1), both of
> > > which should be able to parse these files:
> > > https://issues.apache.org/jira/browse/TIKA-245
> > >
> > > In tika-mimetypes.xml I do find an entry for
> > > application/vnd.ms-htmlhelp.
> > >
> > > Has anyone run into the same issue and knows how to fix it?
> > >
> > > Bye
> > > Jan

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
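One way to test the missing-dependency theory in this thread is to check which Tika parser classes are actually registered on the classpath Nutch uses. Tika discovers its parsers through `META-INF/services/org.apache.tika.parser.Parser` files inside its jars, so this sketch (Java, standard library only; the class name `ParserServiceCheck` is made up for illustration) just reads those files and reports whether a CHM parser is listed. It assumes the CHM parser's package name contains `.chm.`, as `org.apache.tika.parser.chm.ChmParser` does in the Tika 1.x releases I know of. Nutch loads each plugin through its own class loader, so a result from a plain classpath only approximates what the parse-tika plugin sees.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Diagnostic sketch: lists the Tika parser classes registered via
 * META-INF/services on the classpath and reports whether a CHM parser
 * is among them. It only reads the service files; it never loads Tika.
 */
public class ParserServiceCheck {

    // Service file Tika reads to discover its Parser implementations.
    static final String SERVICE_FILE = "META-INF/services/org.apache.tika.parser.Parser";

    // Parses one service file: one class name per line, '#' starts a comment.
    public static List<String> parseEntries(String content) {
        List<String> entries = new ArrayList<>();
        for (String line : content.split("\\R")) {
            int hash = line.indexOf('#');
            String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!name.isEmpty()) {
                entries.add(name);
            }
        }
        return entries;
    }

    // Collects entries from every copy of the service file the loader can see
    // (each jar that registers parsers contributes its own copy).
    public static List<String> registeredParsers(ClassLoader loader) throws IOException {
        List<String> all = new ArrayList<>();
        for (URL url : Collections.list(loader.getResources(SERVICE_FILE))) {
            try (InputStream in = url.openStream()) {
                all.addAll(parseEntries(new String(in.readAllBytes(), StandardCharsets.UTF_8)));
            }
        }
        return all;
    }

    public static void main(String[] args) throws IOException {
        List<String> parsers = registeredParsers(ParserServiceCheck.class.getClassLoader());
        System.out.println(parsers.size() + " parser classes registered");
        // Assumption: the CHM parser lives in a package containing ".chm."
        boolean chm = parsers.stream().anyMatch(name -> name.contains(".chm."));
        System.out.println(chm ? "CHM parser is registered" : "No CHM parser registered");
    }
}
```

For example, `javac ParserServiceCheck.java && java -cp ".:plugins/parse-tika/*" ParserServiceCheck` (the plugin path is an assumed install layout; adjust it to yours). If the CHM parser shows up here but Nutch still logs the error, the problem is more likely in how the parser is looked up for the MIME type than in missing jars.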

