Hey Lewis, I was able to fetch it:

MacBookPro2014:crawls2 almohsin$ NUTCH parsechecker "http://www.nature.com/"
fetching: http://www.nature.com/
parsing: http://www.nature.com/
contentType: text/html
signature: 6cee25dd58e27e7cb0394a1325f3df6e
--------- Url ---------------
http://www.nature.com/
--------- ParseData ---------
Version: 5
Status: success(1,0)
Title:
Outlinks: 120
  outlink: toUrl: http://www.nature.com/#content anchor: Jump to main content
  ..........
  outlink: toUrl: http://static.chartbeat.com/js/chartbeat.js anchor:
Content Metadata: Vary=Accept-Encoding Date=Fri, 27 Feb 2015 18:01:59 GMT P3P=CP="CAO DSP LAW IVA IVD HIS OUR UNR STP UNI COM" Expires=Thu, 01-Jan-1970 00:00:00 GMT nutch.crawl.score=0.0 Content-Encoding=gzip webserver=npgj2ee16.nature.com Set-Cookie=JSESSIONID=1mxz9o0ewy9dwk18aqzndtyiq;Path=/oa;Domain=.nature.com Connection=close Content-Type=text/html; charset=utf-8 Server=Jetty(6.1.26)
Parse Metadata: CharEncodingForConversion=utf-8 OriginalCharEncoding=utf-8 language=en

Best regards,
Mohammad Al-Mohsin

On Fri, Feb 27, 2015 at 9:55 AM, Lewis John Mcgibbney <[email protected]> wrote:

> Hi Folks,
> I was getting 500 internal server error using Nutch trunk when attempting
> to fetch content from this domain.
> http://www.nature.com
> Just for detail, Nature.com is a catalogue of journals and science
> resources, including the journal *Nature*. Publishes science news and
> articles across a wide range of scientific fields. So it is nothing
> malicious or sensitive/offending content-wise.
> Can anyone else fetch this URL?
> I can get it with curl and wget but not Nutch.
> Thanks
> Lewis
>
> --
> *Lewis*

