Nutch says "No URLs to fetch - check your seed list and URL filters" when I
try to crawl and index fmforums.com.
I am using this command:
bin/nutch crawl urls -dir crawl -depth 3 -topN 50
- urls directory contains urls.txt which contains http://www.fmforums.com/
- crawl-urlfilter.txt contains +^http://([a-z0-9]*\.)*fmforums.com/
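
To rule out the filter line itself, the rule can be checked against the seed URL outside of nutch. Below is a minimal Python sketch, not nutch code. It assumes the behaviour described in the comments of the default crawl-urlfilter.txt (the first matching rule decides, "+" includes, "-" excludes, and a URL that matches no rule is dropped). It also assumes Python's re module treats this particular pattern the same way Java's regex engine does. The accepts() helper is hypothetical and exists only for this illustration.

```python
import re

def accepts(rules, url):
    """Hypothetical helper: apply +/- regex rules in order, first match wins."""
    for rule in rules:
        rule = rule.strip()
        if not rule or rule.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = rule[0], rule[1:]
        if sign not in "+-":
            continue  # assumption for this sketch: ignore malformed lines
        if re.search(pattern, url):
            return sign == "+"  # "+" includes the URL, "-" excludes it
    return False  # no rule matched, so the URL is dropped

# The single rule and seed URL from the setup above.
rules = [r"+^http://([a-z0-9]*\.)*fmforums.com/"]
print(accepts(rules, "http://www.fmforums.com/"))
```

If this prints True, the rule as written is probably not what rejects the seed. Since the first matching rule wins, any "-" rule placed earlier in the real crawl-urlfilter.txt would still take precedence, so pasting the whole file into rules is a more faithful check.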
Note: my nutch setup indexes other sites fine.
For example, I am using the same command:
bin/nutch crawl urls -dir crawl -depth 3 -topN 50
- urls directory contains urls.txt which contains
http://dispatch.neocodesoftware.com
- crawl-urlfilter.txt contains
+^http://([a-z0-9]*\.)*dispatch.neocodesoftware.com/
With that setup, nutch generates a good crawl.
How can I troubleshoot why nutch says "No URLs to fetch"?