    [ http://issues.apache.org/jira/browse/NUTCH-258?page=all ]

Chris A. Mattmann reopened NUTCH-258:
-------------------------------------
    Assign To: Chris A. Mattmann

This turned out to be a real issue with the Fetcher. Here's the proposed solution:

* add a flag field to the Configuration instance (preferably keyed by a public static final String constant) that signifies whether or not a SEVERE error has been logged within a task's context
* check this field within the Fetcher to decide whether to stop fetching, but only for the fetching task identified by that Configuration (and no others)

> Once Nutch logs a SEVERE log item, Nutch fails forevermore
> ----------------------------------------------------------
>
>          Key: NUTCH-258
>          URL: http://issues.apache.org/jira/browse/NUTCH-258
>      Project: Nutch
>         Type: Bug
>   Components: fetcher
>     Versions: 0.8-dev
>  Environment: All
>     Reporter: Scott Ganyo
>     Assignee: Chris A. Mattmann
>     Priority: Critical
>  Attachments: dumbfix.patch
>
> Once a SEVERE log item is written, Nutch shuts down any fetching forevermore.
> This is from the run() method in Fetcher.java:
>
>     public void run() {
>       synchronized (Fetcher.this) {activeThreads++;} // count threads
>
>       try {
>         UTF8 key = new UTF8();
>         CrawlDatum datum = new CrawlDatum();
>
>         while (true) {
>           if (LogFormatter.hasLoggedSevere())     // something bad happened
>             break;                                // exit
>
> Notice the last 2 lines. Once this is hit, Nutch will never fetch again,
> because LogFormatter stores this state in a static field.
> (Also note that "LogFormatter.hasLoggedSevere()" is also checked in
> org.apache.nutch.net.URLFilterChecker and will disable that class as well.)
> This must be fixed or Nutch cannot be run as any kind of long-running
> service. Furthermore, I believe it is a poor decision to rely on a logging
> event to determine the state of the application: this could have any number
> of side-effects that would be extremely difficult to track down. (As it
> already has for me.)
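The difference between the reported behavior and the proposed per-task flag can be sketched in plain Java as below. This is an illustrative sketch only, not the actual patch: `TaskConfig` is a hypothetical stand-in for a per-task Hadoop `Configuration` (its `setBoolean`/`getBoolean` methods are loosely modeled on that class), the key name `fetcher.severe.error.logged` is hypothetical, and the fetch loop simply counts items instead of fetching URLs. The static `severeLoggedGlobally` field imitates the JVM-wide state kept by `LogFormatter`.

```java
import java.util.HashMap;
import java.util.Map;

public class SevereFlagSketch {

    // Hypothetical key name; the proposal only says it should be a public static final String.
    public static final String SEVERE_ERROR_KEY = "fetcher.severe.error.logged";

    // Hypothetical stand-in for a per-task Configuration instance.
    static final class TaskConfig {
        private final Map<String, String> props = new HashMap<>();

        void setBoolean(String key, boolean value) {
            props.put(key, Boolean.toString(value));
        }

        boolean getBoolean(String key, boolean defaultValue) {
            String v = props.get(key);
            return v == null ? defaultValue : Boolean.parseBoolean(v);
        }
    }

    // Imitates the reported bug: one JVM-wide static flag that is never reset.
    static boolean severeLoggedGlobally = false;

    // failAtItem < 0 means this run never hits an error.
    static int runFetchLoopStatic(int itemsAvailable, int failAtItem) {
        int fetched = 0;
        for (int i = 0; i < itemsAvailable; i++) {
            if (severeLoggedGlobally) {
                break; // a SEVERE error in ANY earlier run stops this one too
            }
            if (i == failAtItem) {
                severeLoggedGlobally = true; // simulate logging a SEVERE error
                continue;
            }
            fetched++;
        }
        return fetched;
    }

    // Proposed behavior: the flag lives only in this task's configuration.
    static int runFetchLoop(TaskConfig conf, int itemsAvailable, int failAtItem) {
        int fetched = 0;
        for (int i = 0; i < itemsAvailable; i++) {
            if (conf.getBoolean(SEVERE_ERROR_KEY, false)) {
                break; // a SEVERE error was logged in THIS task; exit
            }
            if (i == failAtItem) {
                conf.setBoolean(SEVERE_ERROR_KEY, true); // simulate logging a SEVERE error
                continue;
            }
            fetched++;
        }
        return fetched;
    }

    public static void main(String[] args) {
        // Static flag: the failure in run 1 poisons run 2.
        int s1 = runFetchLoopStatic(10, 3);
        int s2 = runFetchLoopStatic(10, -1);
        System.out.println("static flag:   run 1 fetched " + s1 + ", run 2 fetched " + s2);

        // Per-task flag: only the task that hit the error stops.
        TaskConfig taskA = new TaskConfig();
        TaskConfig taskB = new TaskConfig();
        int a = runFetchLoop(taskA, 10, 3);
        int b = runFetchLoop(taskB, 10, -1);
        System.out.println("per-task flag: task A fetched " + a + ", task B fetched " + b);
    }
}
```

Running `main` prints "static flag:   run 1 fetched 3, run 2 fetched 0" followed by "per-task flag: task A fetched 3, task B fetched 10". With the static flag, a healthy second run fetches nothing. With the per-task flag, only the task that hit the error stops. Keeping the state in the task's own configuration also means nothing outside that task has to be reset between runs.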