Re: Crawling PDFs

paddz Mon, 14 Jan 2013 02:43:24 -0800

Hi Lewis,

I am using Nutch 1.5.1.
I get no specific log output or errors.


I am expecting Nutch to crawl PDFs that have no file extension, e.g.
/output/mypdffile. In practice, Nutch only crawls/parses PDFs whose URLs
look like this: /output/mypdffile*.pdf*
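
To illustrate what I mean by a PDF without an extension: such a file can
still be recognized from its content, because PDF files begin with the
bytes "%PDF-". The following is only a minimal Python sketch of that idea,
not Nutch code; the function name looks_like_pdf is hypothetical and I am
not claiming this is how Nutch detects content types.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the given bytes start with the PDF header marker.

    This checks content only and ignores any file name or URL extension.
    """
    return data.startswith(b"%PDF-")


if __name__ == "__main__":
    # Hypothetical sample payloads, as a fetcher might receive them.
    pdf_body = b"%PDF-1.4\n1 0 obj\n"
    html_body = b"<html><body>not a pdf</body></html>"
    print(looks_like_pdf(pdf_body))
    print(looks_like_pdf(html_body))
```

With a check like this, /output/mypdffile and /output/mypdffile.pdf would be
treated the same as long as the fetched bytes are a PDF.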

readdb stats:
Statistics for CrawlDb: XYZ
TOTAL urls:     104
retry 0:        104
min score:      0.0
avg score:      0.037596155
max score:      1.01
status 2 (db_fetched):  104
CrawlDb statistics: done

Thanks
Patrick




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Crawling-PDFs-no-file-extension-tp4032174p4033105.html
Sent from the Nutch - User mailing list archive at Nabble.com.
