Hi Bob,

the relevant Javadoc comment stands before the declaration of a variable
(here a constant):

  /** Resource is gone. */
  public static final int GONE = 11;
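
To make the placement concrete, here is a small sketch. The class name is made up, and the second constant's name and value are my assumption for illustration, not copied from the Nutch sources. A Javadoc comment documents the declaration that follows it, so a comment that appears after GONE describes the next constant, not GONE:

```java
// Illustrative sketch only; not copied from the Nutch sources.
final class StatusCodesSketch {

  /** Resource is gone. */
  static final int GONE = 11;

  // The comment below follows GONE in the file, but it documents MOVED.
  /** Resource has moved permanently. New url should be found in args. */
  static final int MOVED = 12; // name and value assumed for illustration

  private StatusCodesSketch() {
    // constants holder, not meant to be instantiated
  }
}
```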

In more detail, GONE results from one of the following HTTP status codes:

  400 Bad request
  401 Unauthorized
  410 Gone (*forever* gone, as opposed to 404 Not Found)

See src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java

My guess would be that "www.sitename.com" requires authentication.
Just repeat the request as

  bin/nutch parsechecker \
     -Dstore.http.headers=true \
     -Dstore.http.request=true \
     ... <url>

(I guess you're already using parsechecker or indexchecker.)

This will show the HTTP headers, where you'll find the exact HTTP status code.

Best,
Sebastian

On 12/17/19 4:36 PM, Robert Scavilla wrote:
> Hi again, and thanks in advance for your kind help.
>
> Nutch 1.14
>
> I am getting the following error message when crawling a site:
> *Fetch failed with protocol status: gone(11), lastModified=0:
> https://www.sitename.com*
>
> The only documentation I can find says:
>
>> public static final int GONE = 11;
>> /** Resource has moved permanently. New url should be found in args. */
>>
> I'm not sure what this means. When I load the page in my browser it shows
> status codes 200 or 304 for all resources.
>
> The problem only exists on a single site - other sites crawl fine.
>
> I saved a page from the site locally and that page fetches successfully.
>
> Can you please steer me in the right direction? Many Thanks,
> ...bob
>
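
P.S. The mapping from HTTP status codes to GONE described above could be sketched roughly like this. It is a simplified illustration of the three codes listed in the reply, not the code from HttpBase.java; the class name, the method name and the OTHER placeholder are hypothetical.

```java
// Simplified sketch of the mapping described in the reply above.
// Not the actual HttpBase.java logic; all names except GONE are hypothetical.
final class GoneMappingSketch {

  /** Resource is gone. */
  static final int GONE = 11;

  /** Hypothetical placeholder for every status this sketch does not cover. */
  static final int OTHER = -1;

  /** Returns GONE for the HTTP codes listed in the reply, OTHER otherwise. */
  static int toProtocolStatus(int httpCode) {
    switch (httpCode) {
      case 400: // Bad request
      case 401: // Unauthorized
      case 410: // Gone
        return GONE;
      default:
        return OTHER;
    }
  }

  public static void main(String[] args) {
    // Print how a few sample HTTP codes are classified by this sketch.
    for (int code : new int[] {200, 400, 401, 404, 410}) {
      String label = toProtocolStatus(code) == GONE ? "gone(11)" : "other";
      System.out.println(code + " -> " + label);
    }
  }
}
```

Read this way, a "gone(11)" in the fetch log narrows the cause down to a handful of HTTP responses, and the headers printed by parsechecker tell you which one the server actually sent.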