I'm trying to index only HTML files on the local filesystem. I modified the crawl-urlfilter file so that it handles file links and accepts everything else, and I enabled the file plugin in nutch-site.xml. However, Nutch reports that it failed to fetch the file (for every HTML file) with FileError: 404. How can I get rid of this behavior?
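To make the intended filtering concrete, here is a minimal sketch in Python of how "+regex / -regex" rules of the kind found in a URL filter file are usually evaluated. It is an illustration under assumptions, not my actual configuration or Nutch's code: the rule list, the `accepts` helper and the example paths are all hypothetical, and the sketch assumes that the first matching rule decides and that a URL matching no rule is rejected.

```python
import re

# Hypothetical rules (not copied from any real crawl-urlfilter file).
# Each rule is a sign followed by a regular expression.
RULES = [
    r"+^file:.*\.html?$",  # accept file: URLs ending in .html or .htm
    r"-.",                 # reject everything else
]


def accepts(url, rules=RULES):
    """Return True if the first rule whose pattern matches starts with '+'.

    Assumption: the first match wins and an unmatched URL is rejected.
    """
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False


if __name__ == "__main__":
    for candidate in (
        "file:/home/user/docs/index.html",
        "file:/home/user/docs/notes.txt",
        "http://example.com/index.html",
    ):
        print(candidate, accepts(candidate))
```

Note that rules this strict would also reject directory URLs, so a crawl that has to walk directories to reach the HTML files would need an extra accept rule for them.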
Thanks
_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
