Folks,

I'm having a very similar problem with the latest svn version of Nutch
(Revision: 279844).

The crawler returns this message: fetch okay, but can't parse [URL
scrubbed]/pub/Presentation.ppt, reason: failed(2,203): Content-Type not
text/html: application/vnd.ms-powerpoint

In this case the MIME type is correct, so the file should be passed to
the parse-mspowerpoint plugin, but it isn't. Now that the plugin has
been committed, how do we actually make it work (yes, I've read
http://issues.apache.org/jira/browse/NUTCH-88 )?

Thanks,
Renat
