The page was fetched and parsed successfully, but it seems the title of the
returned document is just: "ERROR: The requested URL could not be retrieved".
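
A success status from the parser only means that whatever came back was parsed; it does not mean the body was the page you expected. This can happen if, for example, something between the crawler and the site answers with its own error document. Below is a minimal sketch in Python (not part of Nutch) for checking the title of what a given client actually receives. The function names and the default User-Agent string are hypothetical, chosen only for illustration, and the windows-1252 fallback is taken from the parse metadata quoted below.

```python
import re
import urllib.request

# Matches the first <title> element; good enough for a quick check,
# not a replacement for a real HTML parser.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the text of the first <title> element, or None if absent."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None


def looks_like_error_page(html):
    """Heuristic: treat a title starting with 'ERROR' as an error page."""
    title = extract_title(html)
    return title is not None and title.upper().startswith("ERROR")


def fetch(url, user_agent="my-test-agent"):
    """Fetch url with an explicit User-Agent and return (status, body).

    Note: urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    so this only returns when the server answers with a success status.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        # Fall back to windows-1252 if the server declares no charset.
        charset = response.headers.get_content_charset() or "windows-1252"
        return response.status, response.read().decode(charset, errors="replace")
```

Calling fetch() with the same URL and with different User-Agent values, then passing the body to looks_like_error_page(), would show whether the error document depends on which client is asking.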

On Thursday 15 December 2011 15:36:40 Christopher Gross wrote:
> I'm getting a success status AND an error message when trying to do a
> parse check.  It is a SharePoint site, but this part allows for
> anonymous access -- I can curl the page just fine without having to do
> anything funky.  I have a robots.txt in place that allows everyone
> through (it is an internal test site, url has been redacted).  Here's
> what I run:
> 
> [user@eval bin]$ ./nutch parsechecker "http://sharepointurl/Home.aspx"
> fetching: http://sharepointurl/Home.aspx
> parsing: http://sharepointurl/Home.aspx
> contentType: text/html
> ---------
> Url
> ---------------
> http://http://sharepointurl/Home.aspx---------
> ParseData
> ---------
> Version: 5
> Status: success(1,0)
> Title: ERROR: The requested URL could not be retrieved
> Outlinks: 0
> Content Metadata: Connection=close Content-Type=text/html
> Parse Metadata: CharEncodingForConversion=windows-1252
> OriginalCharEncoding=windows-1252
> 
> Google searches have been fruitless.  Can anyone help me make sense of
> what is going on here?  I can provide some snippets of config files if
> need be.
> 
> Nutch 1.4, SharePoint 2010, Java 1.6.0_06-b02.
> 
> Thanks!
> 
> -- Chris

-- 
Markus Jelsma - CTO - Openindex
