Can you please set the user agent to something that resembles a browser, such as Chrome, and test again? I posted a query yesterday about a similar issue, except that in my case the mobile version of the site was served rather than a 500.
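
In case it helps to check this outside Nutch, here is a rough sketch in Python that requests the same URL with two different User-Agent headers and prints the status code for each. Both agent strings and the helper names (`build_request`, `fetch_status`) are my own placeholders, not anything from Nutch or from this thread, so substitute whatever your crawler actually sends.

```python
import urllib.error
import urllib.request

# Assumed values for illustration only; neither string comes from the thread.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36"
)
CRAWLER_UA = "test-crawler/0.1"


def build_request(url, user_agent):
    """Return a GET request that carries the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent, timeout=30):
    """Fetch url and return the HTTP status code, including errors such as 500."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the status code is still what we want to see.
        return err.code


if __name__ == "__main__":
    url = "http://www.nature.com"
    for label, agent in (("browser-like", BROWSER_UA), ("crawler-like", CRAWLER_UA)):
        print(label, fetch_status(url, agent))
```

If the two status codes differ, the server is most likely filtering on the user agent. Connection-level failures (DNS, timeouts) are not caught here and will surface as exceptions.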

On Fri, Feb 27, 2015 at 1:08 PM, Iain Lopata <ilopa...@hotmail.com> wrote:
> I get a 500. Have tried removing Nutch from my user-agent string and still
> get the same result.
>
> -----Original Message-----
> From: Markus Jelsma [mailto:markus.jel...@openindex.io]
> Sent: Friday, February 27, 2015 12:05 PM
> To: user@nutch.apache.org
> Subject: RE: Can anyone fetch this page?
>
> Seems fine to me
> http://oldservice.openindex.io/extract.php?url=http%3A%2F%2Fwww.nature.com%2Fnature%2Fjournal%2Fv518%2Fn7540%2Ffull%2Fnature14236.html
>
> -----Original message-----
>> From: Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
>> Sent: Friday 27th February 2015 18:56
>> To: user@nutch.apache.org
>> Subject: Can anyone fetch this page?
>>
>> Hi Folks,
>> I was getting 500 internal server error using Nutch trunk when
>> attempting to fetch content from this domain.
>> http://www.nature.com
>> Just for detail, Nature.com is a catalogue of journals and science
>> resources, including the journal *Nature*. Publishes science news and
>> articles across a wide range of scientific fields. So it is nothing
>> malicious or sensitive/offending content-wise.
>> Can anyone else fetch this URL?
>> I can get it with curl and wget but not Nutch.
>> Thanks
>> Lewis
>>
>> --
>> *Lewis*