Folks, I'm having a very similar problem with the latest SVN version of Nutch (revision 279844).
The crawler returns this message:

fetch okay, but can't parse [URL scrubbed]/pub/Presentation.ppt, reason: failed(2,203): Content-Type not text/html: application/vnd.ms-powerpoint

In this case the MIME type is correct, so the file should be passed to the parse-mspowerpoint plugin, but it isn't. Now that the plugin has been committed, how do we actually make it work? (Yes, I've read http://issues.apache.org/jira/browse/NUTCH-88 )

Thanks,
Renat
