I'm running:

  bin/nutch crawl urls -dir crawl -depth 3 -topN 50

where the urls directory contains urls.txt, which contains:

  http://www.fmforums.com/
and crawl-urlfilter.txt contains:

  +^http://([a-z0-9]*\.)*fmforums.com/

Note: my Nutch setup indexes other sites fine. For example, with a urls.txt containing:

  http://dispatch.neocodesoftware.com

and a crawl-urlfilter.txt containing:

  +^http://([a-z0-9]*\.)*dispatch.neocodesoftware.com/

it generates a good crawl. So I know I have a known-good install. Why, then, does Nutch say "No URLs to fetch - check your seed list and URL filters" when trying to index fmforums.com?

Also, fmforums.com/robots.txt looks OK:

  ###############################
  #
  # sample robots.txt file for this website
  #
  # addresses all robots by using wild card *
  User-agent: *
  #
  # list folders robots are not allowed to index
  #Disallow: /tutorials/404redirect/
  Disallow:
  #
  # list specific files robots are not allowed to index
  #Disallow: /tutorials/custom_error_page.html
  Disallow:
  #
  # list the location of any sitemaps
  Sitemap: http://www.yourdomain.com/site_index.xml
  #
  # End of robots.txt file
  #
  ###############################
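For anyone debugging the same symptom, here is a quick standalone sanity check of the filter line itself - a sketch using an equivalent Python regex test, not Nutch's own filter code. The seed URL and pattern are the ones quoted above:

  import re

  seed = "http://www.fmforums.com/"
  # Pattern from crawl-urlfilter.txt, minus the leading '+' accept marker.
  pattern = r"^http://([a-z0-9]*\.)*fmforums.com/"

  if re.search(pattern, seed):
      print("filter accepts the seed")
  else:
      print("filter REJECTS the seed - fix crawl-urlfilter.txt")

Running this shows the pattern does match the seed, which points away from the regex itself as the cause.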
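The robots.txt above also allows everything (an empty "Disallow:" means no restriction), which can be confirmed against the live site. Again a hedged sketch: the agent string below is a placeholder, since Nutch actually identifies itself with whatever http.agent.name is set to in conf/nutch-site.xml:

  import urllib.robotparser

  seed = "http://www.fmforums.com/"
  rp = urllib.robotparser.RobotFileParser("http://www.fmforums.com/robots.txt")
  rp.read()  # fetch and parse the live robots.txt
  # "nutch-test" is a placeholder agent name, not what Nutch really sends.
  print("robots.txt allows fetch:", rp.can_fetch("nutch-test", seed))

If both checks pass, the problem must lie somewhere other than the filter regex or robots.txt.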