Hi,

Nutch will fetch URLs from hosts that have no robots.txt. But if the 
request for robots.txt throws an UnknownHostException, the host name 
cannot be resolved at all, so fetching the URL itself throws the same 
exception and fails.
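
To illustrate the difference between "no robots.txt" and "unknown host", 
here is a rough Python sketch. It is not Nutch's actual code: the names 
fetch, Response, UnknownHostError and the injected http_get callable are 
all hypothetical, and treating every non-200 robots.txt status as "no 
rules" is a simplification made for the example.

```python
from dataclasses import dataclass
from urllib.robotparser import RobotFileParser


class UnknownHostError(Exception):
    """Hypothetical stand-in for Java's java.net.UnknownHostException."""


@dataclass
class Response:
    status: int
    body: str = ""


def fetch(host, path, http_get, agent="example-bot"):
    """Fetch http://host/path after consulting robots.txt.

    http_get(host, path) is an injected callable that returns a Response
    or raises UnknownHostError when the host name does not resolve.
    """
    try:
        robots = http_get(host, "/robots.txt")
    except UnknownHostError:
        # The host does not resolve, so the page request would fail too.
        return "failed: unknown host"

    if robots.status == 200:
        parser = RobotFileParser()
        parser.parse(robots.body.splitlines())
        if not parser.can_fetch(agent, f"http://{host}{path}"):
            return "skipped: disallowed by robots.txt"

    # A missing robots.txt (e.g. 404) imposes no rules, so the fetch proceeds.
    page = http_get(host, path)
    return f"fetched: {page.status}"


def no_robots(host, path):
    # Host exists but serves no robots.txt.
    return Response(404) if path == "/robots.txt" else Response(200, "page")


def unresolvable(host, path):
    # Host name cannot be resolved for any path.
    raise UnknownHostError(host)


print(fetch("example.com", "/index.html", no_robots))
print(fetch("nosuch.invalid", "/index.html", unresolvable))
```

The first call goes through because a missing robots.txt forbids nothing; 
the second fails before any page request is made, which matches what you 
are seeing.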

Cheers
 
 
-----Original message-----
> From:chethan <[email protected]>
> Sent: Thu 07-Jun-2012 16:16
> To: [email protected]
> Subject: robots.txt UnknownHostException
> 
> Hi,
> 
> When Nutch doesn't find the robots.txt for a given URL, why does it not
> fetch that URL at all? I mean, if robots.txt is not found, doesn't it mean
> that the owner of that website doesn't really care about crawlers? So it's
> OK for Nutch to fetch from it, right?
> 
> Thanks,
> Chethan
> 
