Nutch reports "No URLs to fetch - check your seed list and URL filters" when I try to index fmforums.com.

I am using this command:

bin/nutch crawl urls -dir crawl -depth 3 -topN 50

- the urls directory contains urls.txt, which contains http://www.fmforums.com/
- crawl-urlfilter.txt contains +^http://([a-z0-9]*\.)*fmforums.com/

Note: my Nutch setup indexes other sites fine.

For example, I am using this command:

bin/nutch crawl urls -dir crawl -depth 3 -topN 50

- the urls directory contains urls.txt, which contains http://dispatch.neocodesoftware.com
- crawl-urlfilter.txt contains +^http://([a-z0-9]*\.)*dispatch.neocodesoftware.com/

With that setup, Nutch generates a good crawl.
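One quick check outside Nutch is to test each seed URL against its filter rules. The sketch below is a hypothetical Python helper that approximates how I understand Nutch's regex URL filter to behave: rules are tried top to bottom, the first matching rule decides (+ accepts, - rejects), and a URL that matches no rule is rejected. The trailing "-." rule is an assumption based on the "skip everything else" line in the default crawl-urlfilter.txt, and the https URL is only an illustrative test case. Python's re and Java's regex engine agree on these simple patterns.

```python
import re


def filter_url(url, rules):
    """Hypothetical approximation of Nutch's regex URL filter.

    Rules are checked in order; the first rule whose pattern is found
    in the URL decides the outcome. Unmatched URLs are rejected.
    """
    for line in rules:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments, as in the rule file
        sign, pattern = line[0], line[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False


# Rules copied from the post, plus an assumed "-." catch-all at the end.
fmforums_rules = [r"+^http://([a-z0-9]*\.)*fmforums.com/", "-."]
neocode_rules = [r"+^http://([a-z0-9]*\.)*dispatch.neocodesoftware.com/", "-."]

print(filter_url("http://www.fmforums.com/", fmforums_rules))
print(filter_url("http://dispatch.neocodesoftware.com/", neocode_rules))
# Illustrative only: the rule accepts http:// URLs, so an https URL is rejected.
print(filter_url("https://www.fmforums.com/", fmforums_rules))
```

Here both seeds pass their filters, which suggests the filter rule alone is not what drops the fmforums.com seed.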

How can I troubleshoot why Nutch says "No URLs to fetch"?
