[ http://issues.apache.org/jira/browse/NUTCH-258?page=all ]
Chris A. Mattmann updated NUTCH-258:
------------------------------------
Attachment: NUTCH-258.Mattmann.060906.patch.txt
Hi Folks,
Attached is a patch that implements the two suggested fixes for this issue. I
went through the Nutch code looking for LOG.severe calls and added the
following line after each one:
conf.set(NutchConfiguration.LOG_SEVERE_FIELD, NutchConfiguration.LOG_SEVERE);
In several places where SEVERE errors were being logged, I also had to make
sure that the code had access to the Configuration object. I compiled the code
and ran the unit tests, but did not run any system-level tests. Could Scott, or
someone else who was experiencing this problem, test the patch and let me know
whether it fixes the issue?
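The idea of recording the SEVERE condition on a Configuration instance, rather than in a static, can be sketched roughly as follows. This is a simplified illustration and not the patch itself: the Conf class is a hypothetical stand-in for the real Configuration object, the values given to LOG_SEVERE_FIELD and LOG_SEVERE are assumptions (only the constant names come from the patch), and reportSevere and hasLoggedSevere are illustrative helpers.

```java
import java.util.HashMap;
import java.util.Map;

public class SevereFlagSketch {
    // Hypothetical stand-in for the Configuration object; only get/set are modeled.
    static class Conf {
        private final Map<String, String> props = new HashMap<>();
        void set(String name, String value) { props.put(name, value); }
        String get(String name) { return props.get(name); }
    }

    // Assumed values for illustration; the patch only tells us the constant names.
    static final String LOG_SEVERE_FIELD = "example.log.severe";
    static final String LOG_SEVERE = "true";

    // Log the error, then mark this particular configuration as having seen a SEVERE event.
    static void reportSevere(Conf conf, String message) {
        System.err.println("SEVERE: " + message);
        conf.set(LOG_SEVERE_FIELD, LOG_SEVERE);
    }

    // The check a fetcher thread could make against its own configuration.
    static boolean hasLoggedSevere(Conf conf) {
        return LOG_SEVERE.equals(conf.get(LOG_SEVERE_FIELD));
    }

    public static void main(String[] args) {
        Conf first = new Conf();
        reportSevere(first, "simulated failure");

        Conf second = new Conf(); // a later run with a fresh configuration

        System.out.println("first run flagged:  " + hasLoggedSevere(first));
        System.out.println("second run flagged: " + hasLoggedSevere(second));
    }
}
```

Because the flag lives on the configuration instance, a run that starts with a fresh configuration is not affected by a SEVERE event recorded during an earlier run.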
Thanks!
Cheers,
Chris
> Once Nutch logs a SEVERE log item, Nutch fails forevermore
> ----------------------------------------------------------
>
> Key: NUTCH-258
> URL: http://issues.apache.org/jira/browse/NUTCH-258
> Project: Nutch
> Type: Bug
> Components: fetcher
> Versions: 0.8-dev
> Environment: All
> Reporter: Scott Ganyo
> Assignee: Chris A. Mattmann
> Priority: Critical
> Attachments: NUTCH-258.Mattmann.060906.patch.txt, dumbfix.patch
>
> Once a SEVERE log item is written, Nutch shuts down any fetching forevermore.
> This is from the run() method in Fetcher.java:
> public void run() {
>   synchronized (Fetcher.this) {activeThreads++;} // count threads
>
>   try {
>     UTF8 key = new UTF8();
>     CrawlDatum datum = new CrawlDatum();
>
>     while (true) {
>       if (LogFormatter.hasLoggedSevere()) // something bad happened
>         break; // exit
>
> Notice the last 2 lines. Once this check is hit, Nutch will never fetch
> again, because LogFormatter stores this flag in a static.
> (Note that "LogFormatter.hasLoggedSevere()" is also checked in
> org.apache.nutch.net.URLFilterChecker and will disable that class as well.)
> This must be fixed or Nutch cannot be run as any kind of long-running
> service. Furthermore, I believe it is a poor decision to rely on a logging
> event to determine the state of the application: this could have any number
> of side effects that would be extremely difficult to track down (as it
> already has been for me).
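
The failure mode described in the report can be reproduced in miniature. The sketch below is not Nutch code: the Log class is a hypothetical stand-in for LogFormatter, and fetch is a simplified loop that only mirrors the hasLoggedSevere() check quoted above. It assumes, as the report states, that the flag is static and is never reset.

```java
public class StaticLatchSketch {
    // Hypothetical stand-in for LogFormatter: the flag is static and is never cleared.
    static class Log {
        private static boolean loggedSevere = false;

        static void severe(String message) {
            loggedSevere = true;
            System.err.println("SEVERE: " + message);
        }

        static boolean hasLoggedSevere() {
            return loggedSevere;
        }
    }

    // Simplified fetch loop with the same early-exit check; returns how many URLs were fetched.
    static int fetch(String[] urls) {
        int fetched = 0;
        for (String url : urls) {
            if (Log.hasLoggedSevere()) { // something bad happened, possibly in an earlier run
                break;
            }
            if (url.isEmpty()) {
                Log.severe("empty URL");
                continue;
            }
            fetched++;
        }
        return fetched;
    }

    public static void main(String[] args) {
        // The first run hits one bad entry and stops early.
        System.out.println("first run fetched:  " + fetch(new String[] {"a", "", "b"}));
        // The second run has only good input, but the static flag is still set.
        System.out.println("second run fetched: " + fetch(new String[] {"c", "d"}));
    }
}
```

In a long-running process, the second call fetches nothing even though its input is valid, which is the behavior the report describes.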
_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers