Hi Bob,

The relevant Javadoc comment precedes the declaration of the variable (here
a constant), so the comment you quoted belongs to the constant declared
after GONE:
  /** Resource is gone. */
  public static final int GONE = 11;

In more detail, GONE results from one of the following HTTP status codes:
 400 Bad request
 401 Unauthorized
 410 Gone   (gone *forever*, as opposed to 404 Not Found)
See 
src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java
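
To illustrate the mapping listed above, here is a minimal sketch. It is not the
actual HttpBase code: the class name GoneStatusSketch, the method
toProtocolStatus and the placeholder constant OTHER are made up for this
example, and the real logic may differ between Nutch versions, so please check
HttpBase.java for your release.

```java
/** Simplified sketch only; not the actual Nutch HttpBase code. */
public class GoneStatusSketch {

  /** Same value as the GONE constant quoted above. */
  public static final int GONE = 11;

  /** Hypothetical placeholder for every status this sketch does not model. */
  public static final int OTHER = -1;

  /** Maps an HTTP response code to a status, following the list above. */
  public static int toProtocolStatus(int httpCode) {
    switch (httpCode) {
      case 400: // Bad request
      case 401: // Unauthorized
      case 410: // Gone
        return GONE;
      default:
        return OTHER;
    }
  }

  public static void main(String[] args) {
    // Print the mapping for a few common response codes.
    for (int code : new int[] {200, 400, 401, 404, 410}) {
      System.out.println(code + " -> " + toProtocolStatus(code));
    }
  }
}
```

In this sketch a 404 does not end up as GONE, which mirrors the distinction
between 410 and 404 noted above.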

My guess would be that "www.sitename.com" requires authentication.

Just repeat the request as
 bin/nutch parsechecker \
    -Dstore.http.headers=true \
    -Dstore.http.request=true \
    ... <url>

(I guess you're already using parsechecker or indexchecker)
This will show the HTTP headers, where you'll find the exact HTTP status code.

Best,
Sebastian



On 12/17/19 4:36 PM, Robert Scavilla wrote:
> Hi again, and thanks in advance for your kind help.
> 
> Nutch 1.14
> 
> I am getting the following error message when crawling a site:
> *Fetch failed with protocol status: gone(11), lastModified=0:
> https://www.sitename.com*
> 
> The only documentation I can find says:
> 
>> public static final int GONE = 11;
>> /** Resource has moved permanently. New url should be found in args. */
>>
> I'm not sure what this means. When I load the page in my browser it shows
> status codes 200 or 304 for all resources.
> 
> The problem only exists on a single site - other sites crawl fine.
> 
> I saved a page from the site locally and that page fetches successfully.
> 
> Can you please steer me in the right direction? Many thanks,
> ...bob
> 
