Folks, I'm having a very similar problem with the latest svn version of Nutch (Revision: 279844).
The crawler returns this message:

    fetch okay, but can't parse [URL scrubbed]/pub/Presentation.ppt, reason: failed(2,203): Content-Type not text/html: application/vnd.ms-powerpoint

In this case the MIME type is correct, so the file should be passed to the parse-mspowerpoint plugin, but it isn't. Now that the plugin has been committed, how do we actually make it work? (Yes, I've read http://issues.apache.org/jira/browse/NUTCH-88.)

Thanks,
Renat

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
