Hi Lewis,

I am using Nutch 1.5.1. I get no specific log output or errors.
I expect Nutch to crawl PDFs that have no file extension, e.g. /output/mypdffile. In practice, Nutch only crawls and parses PDFs whose URLs end in .pdf, e.g. /output/mypdffile.pdf.

readdb stats:

Statistics for CrawlDb: XYZ
TOTAL urls: 104
retry 0: 104
min score: 0.0
avg score: 0.037596155
max score: 1.01
status 2 (db_fetched): 104
CrawlDb statistics: done

Thanks
Patrick

