What is in your regex-urlfilter.txt?
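One way to narrow it down is to run your seed URL through the rules yourself. Below is a rough standalone sketch. It is not part of Nutch and does not use any Nutch API. It assumes the regex URL filter works the way I understand it: rules are tried top to bottom, the first pattern found anywhere in the URL decides (`+` keeps the URL, `-` drops it), and a URL that matches no rule is dropped. The class name `UrlFilterCheck` and the sample rules in `main` are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical standalone checker, NOT part of Nutch. It assumes the
// urlfilter rules are applied top to bottom and the first pattern found
// anywhere in the URL decides: '+' keeps the URL, '-' drops it, and a URL
// that matches no rule is dropped.
public class UrlFilterCheck {

    // One parsed rule: accept flag, compiled regex, and the original line.
    record Rule(boolean accept, Pattern pattern, String source) {}

    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            // Skip blank lines and comments.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            // This sketch simply ignores lines that do not start with + or -.
            if (sign != '+' && sign != '-') {
                continue;
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1)), line));
        }
        return rules;
    }

    // Returns the first rule whose regex is found in the URL, or null.
    static Rule firstMatch(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.pattern().matcher(url).find()) {
                return rule;
            }
        }
        return null;
    }

    // True if the URL survives the filter under the assumed semantics.
    static boolean accepts(List<Rule> rules, String url) {
        Rule rule = firstMatch(rules, url);
        return rule != null && rule.accept();
    }

    public static void main(String[] args) throws IOException {
        List<String> lines;
        String url;
        if (args.length == 2) {
            // Usage: java UrlFilterCheck <path-to-filter-file> <url>
            lines = Files.readAllLines(Path.of(args[0]));
            url = args[1];
        } else {
            // Illustrative rules only; substitute the contents of your own file.
            lines = List.of(
                    "-[?*!@=]",
                    "+^http://([a-z0-9]*\\.)*fmforums.com/",
                    "-.");
            url = "http://www.fmforums.com/";
        }
        Rule rule = firstMatch(parse(lines), url);
        String verdict = (rule != null && rule.accept()) ? "ACCEPT" : "REJECT";
        System.out.println(url + " -> " + verdict
                + " (decided by: " + (rule == null ? "no rule matched" : rule.source()) + ")");
    }
}
```

Point it at each filter file in your conf directory together with http://www.fmforums.com/. If the seed comes back REJECT, the printed rule is the one to look at. If it comes back ACCEPT everywhere, the regex rules are probably not what is dropping it.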

> -----Original Message-----
> From: joshua paul [mailto:jos...@neocodesoftware.com]
> Sent: Wednesday, 21 April 2010 9:44 AM
> To: nutch-user@lucene.apache.org
> Subject: nutch says No URLs to fetch - check your seed list and URL
> filters when trying to index fmforums.com
> 
> Nutch says "No URLs to fetch - check your seed list and URL filters" when
> trying to index fmforums.com.
> 
> I am using this command:
> 
> bin/nutch crawl urls -dir crawl -depth 3 -topN 50
> 
> - urls directory contains urls.txt which contains
> http://www.fmforums.com/
> - crawl-urlfilter.txt contains +^http://([a-z0-9]*\.)*fmforums.com/
> 
> Note: my Nutch setup indexes other sites fine.
> 
> For example, I am using this command:
> 
> bin/nutch crawl urls -dir crawl -depth 3 -topN 50
> 
> - urls directory contains urls.txt which contains
> http://dispatch.neocodesoftware.com
> - crawl-urlfilter.txt contains
> +^http://([a-z0-9]*\.)*dispatch.neocodesoftware.com/
> 
> And nutch generates a good crawl.
> 
> How can I troubleshoot why nutch says "No URLs to fetch"?
