RE: nutch says No URLs to fetch - check your seed list and URL filters when trying to index fmforums.com

Arkadi.Kosmynin Tue, 20 Apr 2010 16:49:47 -0700

What is in your regex-urlfilter.txt?
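
One way to check a set of filter rules against the seed URL outside Nutch is a small script like the sketch below. It assumes the rule semantics of Nutch's regex URL filter as I understand them: each rule line starts with + or -, the first rule whose pattern is found in the URL decides, and a URL that matches no rule is dropped. The function name first_matching_rule and the sample rules other than the fmforums.com line are made up for illustration. Python's re module is close to Java's regex syntax but not identical, and the sketch does no URL normalization, so treat the result as a hint only.

```python
import re

def first_matching_rule(url, rules_text):
    """Return (accepted, rule_line) for the first rule whose pattern is found in url.

    Returns (False, None) when no rule matches, mirroring the assumed
    "reject by default" behaviour.
    """
    for line in rules_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        # Ignore lines that are not +/- rules.
        if sign not in "+-":
            continue
        # Assumption: a rule applies if the pattern is found anywhere in the URL.
        if re.search(pattern, url):
            return sign == "+", line
    return False, None

# Illustrative rule set; only the fmforums.com line comes from the original post.
rules = r"""
# reject URLs containing characters typical of query strings
-[?*!@=]
+^http://([a-z0-9]*\.)*fmforums.com/
# reject everything else
-.
"""

if __name__ == "__main__":
    for seed in ["http://www.fmforums.com/", "http://www.fmforums.com"]:
        print(seed, first_matching_rule(seed, rules))
```

Pasting the real contents of both regex-urlfilter.txt and crawl-urlfilter.txt into rules shows which line, if any, accepts or rejects the seed.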


> -----Original Message-----
> From: joshua paul [mailto:[email protected]]
> Sent: Wednesday, 21 April 2010 9:44 AM
> To: [email protected]
> Subject: nutch says No URLs to fetch - check your seed list and URL
> filters when trying to index fmforums.com
> 
> nutch says No URLs to fetch - check your seed list and URL filters when
> trying to index fmforums.com.
> 
> I am using this command:
> 
> bin/nutch crawl urls -dir crawl -depth 3 -topN 50
> 
> - urls directory contains urls.txt which contains
> http://www.fmforums.com/
> - crawl-urlfilter.txt contains +^http://([a-z0-9]*\.)*fmforums.com/
> 
> Note - my nutch setup indexes other sites fine.
> 
> For example I am using this command:
> 
> bin/nutch crawl urls -dir crawl -depth 3 -topN 50
> 
> - urls directory contains urls.txt which contains
> http://dispatch.neocodesoftware.com
> - crawl-urlfilter.txt contains
> +^http://([a-z0-9]*\.)*dispatch.neocodesoftware.com/
> 
> And nutch generates a good crawl.
> 
> How can I troubleshoot why nutch says "No URLs to fetch"?
