How to disable pdf crawling but show pdf links as outlinks

suraj shrestha Sun, 25 Sep 2011 15:21:47 -0700

Right now, I am using regex-urlfilter.txt to disable PDF crawling. However, I
want to be able to see the PDF links when I read the link db (bin/nutch
readlinkdb).
Is there a crawl filter that I can customize so that crawl requests to PDF
URLs are ignored, or should I update Fetcher?
Thanks.
