[ http://issues.apache.org/jira/browse/NUTCH-258?page=all ]

Chris A. Mattmann updated NUTCH-258:
------------------------------------

    Attachment: NUTCH-258.Mattmann.080406.patch.txt

Hi Folks,

Sorry I'm a little later than I expected on this one. Attached is a patch that
implements Jerome's suggested fix for NUTCH-258. Everywhere a SEVERE error (or
I guess it's called "fatal" now) is logged, a RuntimeException is now thrown.
That exception is caught in the outermost loop of the Fetcher thread's run()
method, so it should stop only the Fetcher thread that catches it. I have also
tried to clean up places where I felt LOG.fatal wasn't warranted. Please
confirm that none of the places I changed should still use LOG.fatal, and feel
free to move any of them back if they should. The one caveat is that this
requires some discipline from developers: wherever LOG.fatal is used in code
that the Fetcher may call during fetching, a RuntimeException must also be
thrown.
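
A minimal sketch of the per-thread pattern described above, not the code in
the patch. It assumes java.util.logging as a stand-in for the project's logger
(that API has no FATAL level, so SEVERE is used), and FatalFetchException,
fatal(...) and the simulated failure are hypothetical names chosen for
illustration. The point is that the exception is caught once, around the
thread's whole loop, and no global state is recorded.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch only, not the Nutch Fetcher. FatalFetchException and fatal(...)
// are hypothetical; java.util.logging stands in for the real logger.
class PerThreadFailureSketch {
  private static final Logger LOG =
      Logger.getLogger(PerThreadFailureSketch.class.getName());

  /** Hypothetical unchecked exception marking an error that should end one thread. */
  static class FatalFetchException extends RuntimeException {
    FatalFetchException(String msg) { super(msg); }
  }

  /** Hypothetical helper: logs at the highest level and returns the exception
   *  to throw, so "log fatal" and "throw" happen in one place. */
  static FatalFetchException fatal(String msg) {
    LOG.severe(msg);
    return new FatalFetchException(msg);
  }

  static final AtomicInteger fetched = new AtomicInteger();

  static class FetcherThread extends Thread {
    private final int id;
    private final int pages;

    FetcherThread(int id, int pages) {
      this.id = id;
      this.pages = pages;
    }

    @Override
    public void run() {
      try {
        for (int i = 0; i < pages; i++) {
          // Simulated failure: thread 0 hits a fatal error on its third page.
          if (id == 0 && i == 2) {
            throw fatal("thread " + id + ": unrecoverable error");
          }
          fetched.incrementAndGet();  // stand-in for fetching one page
        }
      } catch (RuntimeException e) {
        // Outermost catch: only this thread exits, and no global flag is set,
        // so other threads (and later runs) keep fetching.
        LOG.log(Level.WARNING, "fetcher thread " + id + " exiting", e);
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    FetcherThread[] threads = new FetcherThread[3];
    for (int t = 0; t < threads.length; t++) {
      threads[t] = new FetcherThread(t, 5);
      threads[t].start();
    }
    for (FetcherThread thread : threads) {
      thread.join();
    }
    System.out.println("pages fetched: " + fetched.get());  // 2 + 5 + 5 = 12
  }
}
```

Routing every fatal log through one helper like fatal(...) is just one way to
make the "log, then throw" discipline harder to forget; whether it fits
Nutch's logging setup is an assumption, not something the patch does.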

I tested this patch against the latest Nutch SVN, and all unit tests pass. It
would be great if someone could test this in a distributed environment.

Thanks,
  Chris


> Once Nutch logs a SEVERE log item, Nutch fails forevermore
> ----------------------------------------------------------
>
>                 Key: NUTCH-258
>                 URL: http://issues.apache.org/jira/browse/NUTCH-258
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.8
>         Environment: All
>            Reporter: Scott Ganyo
>         Assigned To: Chris A. Mattmann
>            Priority: Critical
>             Fix For: 0.9.0
>
>         Attachments: dumbfix.patch, NUTCH-258.Mattmann.060906.patch.txt, 
> NUTCH-258.Mattmann.080406.patch.txt
>
>
> Once a SEVERE log item is written, Nutch shuts down any fetching forevermore. 
>  This is from the run() method in Fetcher.java:
>     public void run() {
>       synchronized (Fetcher.this) {activeThreads++;} // count threads
>       
>       try {
>         UTF8 key = new UTF8();
>         CrawlDatum datum = new CrawlDatum();
>         
>         while (true) {
>           if (LogFormatter.hasLoggedSevere())     // something bad happened
>             break;                                // exit
>           
> Notice the last two lines. Once this condition is hit, Nutch will never fetch
> again, because LogFormatter stores this flag in a static field.
> (Note that LogFormatter.hasLoggedSevere() is also checked in
> org.apache.nutch.net.URLFilterChecker, which disables that class as well.)
> This must be fixed, or Nutch cannot be run as any kind of long-running
> service.  Furthermore, I believe it is a poor decision to rely on a logging
> event to determine the state of the application; this can have any number
> of side effects that are extremely difficult to track down, as it already
> has for me.
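
The failure mode in the quoted report can be reproduced outside Nutch with a
small sketch. StickyLogFlag stands in for LogFormatter's static flag, a URL
starting with "bad:" simulates a fetch that logs a SEVERE error, and all names
here are hypothetical. The scoped variant only illustrates keeping failure
state inside each run instead of reading it back from the logger.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the NUTCH-258 failure mode, not Nutch code. StickyLogFlag,
// runFetchBuggy and runFetchScoped are hypothetical names.
class StickyFlagSketch {
  /** Stand-in for a formatter that records SEVERE messages in a static field. */
  static class StickyLogFlag {
    private static volatile boolean loggedSevere = false;

    static void severe(String msg) {
      System.err.println("SEVERE: " + msg);
      loggedSevere = true;  // never reset for the lifetime of the JVM
    }

    static boolean hasLoggedSevere() {
      return loggedSevere;
    }
  }

  /** Mirrors the quoted run(): the loop consults the global logging flag. */
  static int runFetchBuggy(List<String> urls) {
    int fetched = 0;
    for (String url : urls) {
      if (StickyLogFlag.hasLoggedSevere()) break;  // an old error blocks every future run
      if (url.startsWith("bad:")) {
        StickyLogFlag.severe("cannot fetch " + url);
        continue;
      }
      fetched++;
    }
    return fetched;
  }

  /** Same loop, but the failure state is local, so each new run starts clean. */
  static int runFetchScoped(List<String> urls) {
    boolean failed = false;
    int fetched = 0;
    for (String url : urls) {
      if (failed) break;
      if (url.startsWith("bad:")) {
        System.err.println("SEVERE: cannot fetch " + url);
        failed = true;
        continue;
      }
      fetched++;
    }
    return fetched;
  }

  public static void main(String[] args) {
    List<String> first = Arrays.asList("http://a/", "bad:x", "http://b/");
    List<String> second = Arrays.asList("http://c/", "http://d/");
    System.out.println(runFetchBuggy(first));    // 1
    System.out.println(runFetchBuggy(second));   // 0: the earlier SEVERE still blocks fetching
    System.out.println(runFetchScoped(first));   // 1
    System.out.println(runFetchScoped(second));  // 2
  }
}
```

The second buggy run fetches nothing even though its URLs are fine, which is
the "fails forevermore" behaviour in a long-running process.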

-- 
This message is automatically generated by JIRA.

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
