But this won't turn on JavaScript. If a site relies on it, crawling that site still won't give useful content.

Best Regards
Alexander Aristov

On 21 October 2010 20:42, Markus Jelsma <[email protected]> wrote:

> Well, you could set a fake user agent.
>
> > As I crawl more websites, I'm finding more and more websites
> > that reject the crawl by redirecting it to an HTML page
> > that states something along the lines of:
> >
> > HTTP 602 Unsupported Browser The browser you are using (XYZ Spider/0.1 beta
> > (xyz.com search engine; http://www.xyz.com))
> >
> > or
> >
> > Sorry, but you either have JavaScript turned off or a JavaScript
> > incompatible browser
> >
> > Or
> >
> > Unsupported Browser
> > Browser type and version Generic crawler 0.1
> > Browser build Platform Unknown
> > Cookies supported False
> > Cookies enabled Disabled
> > JavaScript supported False
> > JavaScript enabled False
> > ActiveX enabled False
> > VBScript enabled False
> > Java applets supported False
> > Etc...
> >
> > Lots of different messages come back, but basically it is rejecting a crawl
> > of the website because of browser incompatibility.
> >
> > Do I have Nutch configured incorrectly?
> > Is there a way to crawl these sites?
> > Recommendations?
> >
> > Thanks
> > Brad
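
For anyone who wants to check what "setting a fake user agent" changes on the wire, here is a minimal Python sketch. It is not Nutch code: in Nutch the agent string comes from configuration properties such as http.agent.name in conf/nutch-site.xml. The agent string and the build_request helper below are hypothetical, chosen only for illustration, and the URL is the placeholder from Brad's message.

```python
from urllib.request import Request

# Hypothetical browser-like agent string, for illustration only.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:1.9.2) Gecko/20100101 Firefox/3.6"


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request carrying a custom User-Agent header (nothing is sent)."""
    return Request(url, headers={"User-Agent": user_agent})


req = build_request("http://www.xyz.com/")
# urllib stores header names capitalized, so the key is "User-agent".
print(req.get_header("User-agent"))
```

Passing the request to urllib.request.urlopen would send it with that header. As noted at the top of the thread, this only changes how the client identifies itself; pages that need JavaScript to render their content will still come back without it.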

