RE: robots.txt UnknownHostException

Markus Jelsma Thu, 07 Jun 2012 07:19:05 -0700

Hi,

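Nutch will fetch URLs from hosts that have no robots.txt. But if fetching robots.txt
throws an UnknownHostException, the host name could not be resolved, so fetching the
URL itself on that same host throws the same exception and fails.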
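A minimal hypothetical Java sketch of that distinction follows. The class, enum and method names are illustrative and are not Nutch's actual code. A missing robots.txt (for example an HTTP 404) means "no rules, fetch freely", while an unresolvable host means the page cannot be fetched at all. `UnknownHostException` is what `java.net.InetAddress.getByName` throws when a host name cannot be resolved.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical sketch, not Nutch's implementation. */
public class RobotsCheckSketch {

    /** Possible outcomes of trying to fetch http://host/robots.txt. */
    enum RobotsOutcome { FOUND, NOT_FOUND, UNKNOWN_HOST }

    /** What the crawler does with the page URL itself. */
    enum PageAction { APPLY_RULES, FETCH_ALLOW_ALL, FAIL }

    /** Maps a robots.txt outcome to an action for the page URL. */
    static PageAction decide(RobotsOutcome outcome) {
        switch (outcome) {
            case FOUND:
                // robots.txt exists: obey its rules for this URL.
                return PageAction.APPLY_RULES;
            case NOT_FOUND:
                // No robots.txt (e.g. HTTP 404): treat as "no restrictions".
                return PageAction.FETCH_ALLOW_ALL;
            default:
                // UNKNOWN_HOST: the name does not resolve, so a page on the
                // same host cannot be fetched either.
                return PageAction.FAIL;
        }
    }

    /** Checks DNS resolution only; no HTTP request is made. */
    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (RobotsOutcome o : RobotsOutcome.values()) {
            System.out.println(o + " -> " + decide(o));
        }
        // ".invalid" is a reserved TLD (RFC 2606), so this normally fails to resolve.
        System.out.println("example.invalid resolves: " + resolves("example.invalid"));
    }
}
```

The point of the sketch is that "robots.txt not found" and "host not found" are different failures: only the first one leaves the page fetchable.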


Cheers
 
 
-----Original message-----
> From:chethan <[email protected]>
> Sent: Thu 07-Jun-2012 16:16
> To: [email protected]
> Subject: robots.txt UnknownHostException
> 
> Hi,
> 
> When Nutch doesn't find the robots.txt for a given URL, why does it not
> fetch that URL at all? I mean, if robots.txt is not found, doesn't it mean
> that the owner of that website doesn't really care about crawlers? So it's
> OK for Nutch to fetch from it, right?
> 
> Thanks,
> Chethan
>
