Re: CHM Files and Tika

2012-08-14 Thread Sebastian Nagel
... parser deps in our build.xml?
-Original message-
From: Sebastian Nagel wastl.na...@googlemail.com
Sent: Thu 09-Aug-2012 23:18
To: user@nutch.apache.org
Subject: Re: CHM Files and Tika
Hi Jan, confirmed: Nutch cannot parse, while Tika (same version used by Nutch) can parse chm

Re: CHM Files and Tika

2012-08-10 Thread Julien Nioche
@nutch.apache.org
Subject: Re: CHM Files and Tika
Hi Jan, confirmed: Nutch cannot parse, while Tika (same version used by Nutch) can parse chm. The chm parsers are in tika-parser*.jar which is contained in the Nutch package. Any ideas?
Sebastian
On 08/08/2012 12:03 PM, Jan Riewe

Re: CHM Files and Tika

2012-08-09 Thread Sebastian Nagel
Hi Jan,
confirmed: Nutch cannot parse, while Tika (same version used by Nutch) can parse chm. The chm parsers are in tika-parser*.jar which is contained in the Nutch package. Any ideas?
Sebastian
On 08/08/2012 12:03 PM, Jan Riewe wrote: Hey there, i try to parse CHM (Microsoft Help Files)
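For anyone who wants to double-check the observation that the CHM parser classes ship in the tika-parser*.jar bundled with Nutch, here is a minimal sketch using only the Java standard library. The class name `ChmParserCheck` and the method `chmEntries` are made up for this illustration, and the package path `org/apache/tika/parser/chm/` is an assumption about where Tika keeps its CHM parser classes. The check only shows whether the classes are present in the jar; it does not explain why Nutch fails to pick them up.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Nutch or Tika.
final class ChmParserCheck {

    // Assumption: Tika's CHM parser classes live under this package path.
    static final String CHM_PACKAGE = "org/apache/tika/parser/chm/";

    // Returns the sorted names of all .class entries under CHM_PACKAGE in the given jar.
    static List<String> chmEntries(Path jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            return jarFile.stream()
                    .map(JarEntry::getName)
                    .filter(name -> name.startsWith(CHM_PACKAGE) && name.endsWith(".class"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ChmParserCheck <path-to-tika-parsers-jar>");
            System.exit(2);
        }
        List<String> found = chmEntries(Paths.get(args[0]));
        if (found.isEmpty()) {
            System.out.println("no CHM parser classes found");
        } else {
            found.forEach(System.out::println);
        }
    }
}
```

Run it against the Tika parsers jar from the Nutch package; an empty result would point to a packaging problem, while a non-empty one suggests the issue is in how the parser gets looked up.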

RE: CHM Files and Tika

2012-08-09 Thread Markus Jelsma
Hmm, I'm not sure, but maybe we don't include all Tika parser deps in our build.xml?
-Original message-
From: Sebastian Nagel wastl.na...@googlemail.com
Sent: Thu 09-Aug-2012 23:18
To: user@nutch.apache.org
Subject: Re: CHM Files and Tika
Hi Jan, confirmed: Nutch cannot parse

CHM Files and Tika

2012-08-08 Thread Jan Riewe
Hey there, I'm trying to parse CHM (Microsoft Help Files) with Nutch, but I get:
Can't retrieve Tika parser for mime-type application/vnd.ms-htmlhelp
I've tried Nutch 1.4 (Tika 0.10) and Nutch 1.5.1 (Tika 1.1), which should be able to parse those files