Hi,

Nutch will fetch URLs from a host that has no robots.txt. But if the request for robots.txt throws an UnknownHostException, the host name cannot be resolved, so fetching the URL itself throws the same exception and fails.
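To make the difference concrete, here is a minimal sketch in Java. It is not Nutch's actual code: the class `RobotsSketch`, the `RobotsFetcher` interface and the `Decision` enum are hypothetical names made up for illustration, and treating a 404 as "no robots.txt" is an assumption of the sketch. The point is only that a missing robots.txt is a normal answer from a reachable host, while an UnknownHostException means there is no host to talk to at all.

```java
import java.net.UnknownHostException;

// Illustrative only; none of these names come from Nutch.
class RobotsSketch {

    enum Decision { ALLOW_ALL, APPLY_RULES, NOT_MODELED }

    // Hypothetical stand-in for an HTTP client: returns the HTTP status
    // of the robots.txt request, or throws if the host does not resolve.
    interface RobotsFetcher {
        int status(String host) throws UnknownHostException;
    }

    static Decision decide(String host, RobotsFetcher fetcher)
            throws UnknownHostException {
        // A DNS failure is not caught here: it propagates to the caller,
        // just as the page fetch for the same host would fail.
        int status = fetcher.status(host);
        if (status == 404) {
            return Decision.ALLOW_ALL;   // no robots.txt, nothing disallowed
        }
        if (status == 200) {
            return Decision.APPLY_RULES; // robots.txt exists, parse and obey it
        }
        return Decision.NOT_MODELED;     // other statuses are outside this sketch
    }

    public static void main(String[] args) {
        try {
            // Reachable host without robots.txt.
            System.out.println(decide("example.org", h -> 404));
            // Host that does not resolve.
            decide("nosuch.invalid", h -> { throw new UnknownHostException(h); });
        } catch (UnknownHostException e) {
            System.out.println("cannot fetch, unknown host: " + e.getMessage());
        }
    }
}
```

So in your case the thing to check is probably whether the host name resolves from the machine running the fetcher, not whether the site has a robots.txt.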
Cheers

-----Original message-----
> From:chethan <[email protected]>
> Sent: Thu 07-Jun-2012 16:16
> To: [email protected]
> Subject: robots.txt UnknownHostException
>
> Hi,
>
> When Nutch doesn't find the robots.txt for a given URL, why does it not
> fetch that URL at all? I mean, if the robots is not found, doesn't it mean
> that the owner of that website doesn't really care about crawlers? So, it's
> ok for Nutch to fetch from it right?
>
> Thanks,
> Chethan
>

