Hello,

I use Nutch 1.2 on Fedora 14 and am trying to index about 4000 domains. I run

  bin/nutch crawl urls -dir crawl -depth 3 -topN -1

and have this in crawl-urlfilter.txt:

  # accept hosts in MY.DOMAIN.NAME
  +^http://([a-z0-9]*\.)*

I noticed that if a domain is entered without www in the seed file, like http://mydomain.com, Nutch reports "failed with: java.net.UnknownHostException" for some domains. If, however, I enter the same domain with www, like http://www.mydomain.com, Nutch does not give any errors. Since entering http://mydomain.com in the browser redirects to http://www.mydomain.com, I thought this might be a bug in Nutch. Any thoughts on how to fix this issue?

Thanks,
Alex
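
One way to narrow this down is to check, outside of Nutch, whether the bare host name resolves in DNS at all. The following is a minimal sketch, not part of Nutch: the class name SeedHostCheck and its helper methods are hypothetical, and it only assumes the standard java.net classes. It takes seed URLs as arguments and reports whether the host and its www variant resolve. If the bare host does not resolve here either, the exception probably comes from DNS rather than from the crawler.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;

// Hypothetical stand-alone helper, not a Nutch class.
class SeedHostCheck {

    // Extract the host part of a seed URL, e.g. "mydomain.com".
    static String hostOf(String seedUrl) {
        return URI.create(seedUrl).getHost();
    }

    // Return the www variant of a host, leaving it unchanged if it already has one.
    static String withWww(String host) {
        return host.startsWith("www.") ? host : "www." + host;
    }

    // True if the JVM's resolver can resolve the host name.
    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Each argument is expected to be a seed URL such as http://mydomain.com
        for (String seed : args) {
            String host = hostOf(seed);
            String www = withWww(host);
            System.out.println(host + " resolves: " + resolves(host));
            System.out.println(www + " resolves: " + resolves(www));
        }
    }
}
```

Running it as "java SeedHostCheck http://mydomain.com" prints one line per host variant. Seeds whose bare host does not resolve could then be listed with the www prefix in the seed file.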

