I get an HTTP 500. I have tried removing "Nutch" from my user-agent string and still get the same result.
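One way to narrow this down outside of Nutch is to request the same URL with several User-Agent values and compare the status codes. The following is only a minimal sketch using Python's standard library. The helper names `fetch_status` and `compare_agents` are made up for illustration, and any agent strings passed in are placeholders, not the values Nutch, curl or wget actually send.

```python
import urllib.error
import urllib.request


def fetch_status(url, user_agent, timeout=10.0):
    """Return the HTTP status code the server sends for this User-Agent.

    Hypothetical helper for illustration; not part of Nutch.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the code is what we want.
        return err.code


def compare_agents(url, agents):
    """Map each User-Agent string to the status code returned for it."""
    return {agent: fetch_status(url, agent) for agent in agents}
```

Calling `compare_agents("http://www.nature.com", [...])` with a curl-like string, a wget-like string and the agent string from the Nutch configuration would show whether the 500 follows the User-Agent. If every agent gets the same status, the cause is more likely another request header or something else in how the fetcher makes the request.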
-----Original Message-----
From: Markus Jelsma [mailto:[email protected]]
Sent: Friday, February 27, 2015 12:05 PM
To: [email protected]
Subject: RE: Can anyone fetch this page?

Seems fine to me
http://oldservice.openindex.io/extract.php?url=http%3A%2F%2Fwww.nature.com%2Fnature%2Fjournal%2Fv518%2Fn7540%2Ffull%2Fnature14236.html

-----Original message-----
> From: Lewis John Mcgibbney <[email protected]>
> Sent: Friday 27th February 2015 18:56
> To: [email protected]
> Subject: Can anyone fetch this page?
>
> Hi Folks,
> I was getting 500 internal server error using Nutch trunk when
> attempting to fetch content from this domain.
> http://www.nature.com
> Just for detail, Nature.com is a catalogue of journals and science
> resources, including the journal *Nature*. Publishes science news and
> articles across a wide range of scientific fields. So it is nothing
> malicious or sensitive/offending content-wise.
> Can anyone else fetch this URL?
> I can get it with curl and wget but not Nutch.
> Thanks
> Lewis
>
> --
> *Lewis*

