As I crawl more websites, I'm finding more and more of them that reject
the crawl by redirecting the crawler to an HTML page that states
something along the lines of:

HTTP 602 Unsupported Browser The browser you are using (XYZ Spider/0.1 beta
(xyz.com search engine; http://www.xyz.com))

or

 Sorry, but you either have JavaScript turned off or a JavaScript
incompatible browser

or

Unsupported Browser 
Browser type and version Generic crawler 0.1 
Browser build Platform Unknown 
Cookies supported False 
Cookies enabled Disabled 
JavaScript supported False 
JavaScript enabled False 
ActiveX enabled False 
VBScript enabled False 
Java applets supported False 
Etc...


The messages vary, but in each case the site is rejecting the crawl
because it treats the crawler as an incompatible browser.
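
To make it concrete what these pages have in common, here is a rough
Python sketch of how fetched content might be flagged as one of these
rejection pages. It is not part of Nutch; the function name and the
phrase list are my own assumptions, taken only from the sample messages
above, so it will miss sites that word the rejection differently.

```python
import re

# Phrases taken from the sample rejection messages quoted above.
# This list is an assumption for illustration, not an exhaustive set.
REJECTION_PATTERNS = [
    r"unsupported browser",
    r"javascript turned off",
    r"javascript incompatible browser",
]


def looks_like_browser_rejection(html: str) -> bool:
    """Return True if the page text contains a known rejection phrase."""
    # Collapse runs of whitespace (including line breaks) to single
    # spaces and lowercase the text so the phrases match regardless of
    # wrapping or capitalization.
    text = re.sub(r"\s+", " ", html).lower()
    return any(re.search(pattern, text) for pattern in REJECTION_PATTERNS)
```

Running something like this over the fetched pages would at least give a
count of how many sites are answering with a rejection page instead of
real content.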

Do I have Nutch configured incorrectly?
Is there a way to crawl these sites?
Recommendations?


Thanks
Brad


