But this won't turn on JavaScript. If a site relies on it, crawling that
site won't give useful content.
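
To illustrate the point, here is a rough Python sketch (not Nutch code) of what a fake user agent amounts to. The agent string, the example URL and the helper name build_request are all made up for illustration. In Nutch itself the agent string comes from the http.agent.* properties (http.agent.name and friends) in the configuration.

```python
import urllib.request

# Hypothetical browser-like agent string, used only for illustration.
BROWSER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:1.9.2) Gecko/20100101 Firefox/3.6"


def build_request(url, user_agent=BROWSER_AGENT):
    """Build a request that announces itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# No network access happens here; we only construct the request object.
req = build_request("http://www.example.com/")

# urllib normalises header names, so the key is stored as "User-agent".
print(req.get_header("User-agent"))
```

Only the request header changes. Whatever fetches the page still gets the raw HTML and never runs its scripts, so a page that builds its content in JavaScript comes back just as empty as before.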
Best Regards
Alexander Aristov


On 21 October 2010 20:42, Markus Jelsma <[email protected]> wrote:

> Well, you could set a fake user agent.
>
> > As I crawl more websites I'm finding more and more websites
> > that reject the crawl by basically redirecting the crawl to an HTML page
> > that states something along the lines of:
> >
> > HTTP 602 Unsupported Browser The browser you are using (XYZ Spider/0.1 beta
> > (xyz.com search engine; http://www.xyz.com))
> >
> > or
> >
> >  Sorry, but you either have JavaScript turned off or a JavaScript
> > incompatible browser
> >
> > Or
> >
> > Unsupported Browser
> > Browser type and version Generic crawler 0.1
> > Browser build Platform Unknown
> > Cookies supported False
> > Cookies enabled Disabled
> > JavaScript supported False
> > JavaScript enabled False
> > ActiveX enabled False
> > VBScript enabled False
> > Java applets supported False
> > Etc...
> >
> >
> > Lots of different messages come back, but basically it is rejecting a crawl
> > of the website because of browser incompatibility.
> >
> > Do I have Nutch configured incorrectly?
> > Is there a way to crawl these sites?
> > Recommendations?
> >
> >
> > Thanks
> > Brad
>
