Well, you could set a fake user agent.
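Nutch builds its User-Agent string from the http.agent.* properties in conf/nutch-site.xml, so a browser-like value there is one way to do it. A sketch (the property names are standard Nutch; the values are only illustrative):

```xml
<!-- conf/nutch-site.xml: override the agent properties that Nutch
     concatenates into its User-Agent header. Values below are
     illustrative, not a recommendation for any particular site. -->
<property>
  <name>http.agent.name</name>
  <value>Mozilla/5.0 (compatible; MyCrawler)</value>
</property>
<property>
  <name>http.agent.description</name>
  <value></value>
</property>
<property>
  <name>http.agent.url</name>
  <value></value>
</property>
<property>
  <name>http.agent.email</name>
  <value></value>
</property>
```

Note this only helps with sites that sniff the User-Agent; pages that genuinely require JavaScript or cookies will still serve the error page.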

> As I crawl more websites, I'm finding that more and more of them reject
> the crawl by redirecting it to an HTML page that states something along
> the lines of:
> 
> HTTP 602 Unsupported Browser The browser you are using (XYZ Spider/0.1 beta
> (xyz.com search engine; http://www.xyz.com))
> 
> or
> 
>  Sorry, but you either have JavaScript turned off or a JavaScript
> incompatible browser
> 
> Or
> 
> Unsupported Browser
> Browser type and version Generic crawler 0.1
> Browser build Platform Unknown
> Cookies supported False
> Cookies enabled Disabled
> JavaScript supported False
> JavaScript enabled False
> ActiveX enabled False
> VBScript enabled False
> Java applets supported False
> Etc...
> 
> 
> Lots of different messages come back, but basically it is rejecting a crawl
> of the website because of browser incompatibility.
> 
> Do I have Nutch configured incorrectly?
> Is there a way to crawl these sites?
> Recommendations?
> 
> 
> Thanks
> Brad
