Right now, I am using regex-urlfilter.txt to disable PDF crawling. However, I
still want to see the PDF links when I read the link db (bin/nutch
readlinkdb).
Is there a crawl filter that I can customize so that crawl requests to PDF
URLs are ignored, or should I update Fetcher?
Thanks.



