[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore
[ https://issues.apache.org/jira/browse/NUTCH-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467887 ]

Chris A. Mattmann commented on NUTCH-258:
-----------------------------------------

Guys,

From recent conversations on the mailing list, where Doug mentioned that this issue may have been resolved by recent changes to Hadoop, I'm wondering if we can close this issue. It's currently listed as a critical-priority bug, and it currently has 3 watchers. I've asked several times in the last few months whether people are still experiencing this issue. So, the question is: are they?

If not, I'd like to close out the issue, as I'm trying to get things organized here in JIRA so that developers and contributors can have a good idea of which issues are out there that really need some attention. With the recent lack of developer resources, I think closing out issues that are not reproducible, issues that people are no longer experiencing, or issues resolved by recent changes in Hadoop/etc. is an important part of this process.

Thus, I'm opening this issue up to any objections to closing/resolving it. If I don't hear any objections in the next week, I will close this issue out.

Thanks!

Cheers,
Chris

Once Nutch logs a SEVERE log item, Nutch fails forevermore
----------------------------------------------------------

Key: NUTCH-258
URL: https://issues.apache.org/jira/browse/NUTCH-258
Project: Nutch
Issue Type: Bug
Components: fetcher
Affects Versions: 0.8
Environment: All
Reporter: Scott Ganyo
Assigned To: Chris A. Mattmann
Priority: Critical
Fix For: 0.9.0
Attachments: dumbfix.patch, NUTCH-258.Mattmann.060906.patch.txt, NUTCH-258.Mattmann.080406.patch.txt

Once a SEVERE log item is written, Nutch shuts down any fetching forevermore. This is from the run() method in Fetcher.java:

    public void run() {
      synchronized (Fetcher.this) {activeThreads++;} // count threads
      try {
        UTF8 key = new UTF8();
        CrawlDatum datum = new CrawlDatum();
        while (true) {
          if (LogFormatter.hasLoggedSevere()) // something bad happened
            break;                            // exit

Notice the last 2 lines. This will prevent Nutch from ever fetching again once this is hit, as LogFormatter is storing this data as a static. (Also note that LogFormatter.hasLoggedSevere() is also checked in org.apache.nutch.net.URLFilterChecker and will disable this class as well.)

This must be fixed or Nutch cannot be run as any kind of long-running service. Furthermore, I believe it is a poor decision to rely on a logging event to determine the state of the application - this could have any number of side-effects that would be extremely difficult to track down. (As it has already for me.)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
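To make the reported failure mode concrete, here is a minimal, self-contained sketch of how a JVM-wide static flag poisons every later run. It is not Nutch or Hadoop code: SevereFlag is a hypothetical stand-in for the old LogFormatter, and StaticFlagDemo.fetch() is a simplified loop that only mirrors the check quoted from Fetcher.run().

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the old LogFormatter: a single flag shared by the whole JVM.
class SevereFlag {
    private static volatile boolean loggedSevere = false;

    static void severe(String msg) {
        loggedSevere = true; // never reset, which is the core of the reported bug
        System.err.println("SEVERE: " + msg);
    }

    static boolean hasLoggedSevere() {
        return loggedSevere;
    }
}

class StaticFlagDemo {
    // Simplified fetch loop; returns how many URLs were processed.
    static int fetch(List<String> urls) {
        int fetched = 0;
        for (String url : urls) {
            if (SevereFlag.hasLoggedSevere()) { // same kind of check as in the excerpt
                break;
            }
            if (url.startsWith("bad:")) {       // assumed trigger for a SEVERE log entry
                SevereFlag.severe("cannot fetch " + url);
                continue;
            }
            fetched++;
        }
        return fetched;
    }

    public static void main(String[] args) {
        int first = fetch(Arrays.asList("http://a/", "bad:b", "http://c/"));
        int second = fetch(Arrays.asList("http://d/", "http://e/"));
        // In a fresh JVM this prints "1 0": the second, unrelated run never starts.
        System.out.println(first + " " + second);
    }
}
```

The second call fetches nothing even though its URLs are fine, which matches the "fails forevermore" behavior described in the issue.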
[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore
[ https://issues.apache.org/jira/browse/NUTCH-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467916 ]

Sami Siren commented on NUTCH-258:
----------------------------------

I haven't noticed this being a problem for me, so no objections from here.
[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore
[ https://issues.apache.org/jira/browse/NUTCH-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467931 ]

Scott Ganyo commented on NUTCH-258:
-----------------------------------

Chris,

I originally opened the issue... but unfortunately I can neither confirm nor deny that this is fixed, as I'm no longer on the project that originally had the issue. (And, in fact, they never allowed an upgrade to the latest version of Nutch/Hadoop anyway.) So, close away if nobody else is having the issue!

Thanks!

Scott
Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore
Scott Ganyo (JIRA) wrote:
> ... since Hadoop hijacks and reassigns all log formatters (also a bad practice!) in the org.apache.hadoop.util.LogFormatter static constructor ...

FYI, Hadoop no longer does this.

Doug
Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore
Hi Doug,

So, does this render the patch that I wrote obsolete?

Cheers,
Chris

On 1/25/07 10:08 AM, Doug Cutting [EMAIL PROTECTED] wrote:
> Scott Ganyo (JIRA) wrote:
>> ... since Hadoop hijacks and reassigns all log formatters (also a bad practice!) in the org.apache.hadoop.util.LogFormatter static constructor ...
>
> FYI, Hadoop no longer does this.
>
> Doug

______________________________________________
Chris A. Mattmann
[EMAIL PROTECTED]
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group
Jet Propulsion Laboratory, Pasadena, CA
Office: 171-266B    Mailstop: 171-246
______________________________________________
Disclaimer: The opinions presented within are my own and do not reflect those of either NASA, JPL, or the California Institute of Technology.
Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore
Chris Mattmann wrote:
> So, does this render the patch that I wrote obsolete?

It's at least out-of-date and perhaps obsolete. From a quick read of Fetcher.java, it looks like there might be a case where a fatal error is logged but the fetcher doesn't exit, in FetcherThread#output().

Doug
Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore
> It's at least out-of-date and perhaps obsolete. A quick read of Fetcher.java looks like there might be a case where a fatal error is logged but the fetcher doesn't exit, in FetcherThread#output().

So this raises an interesting question: People (such as Scott G.) out there -- are you folks still experiencing similar problems? Do the recent Hadoop changes alleviate the bad behavior you were experiencing? If so, then maybe this issue should be closed...

Cheers,
Chris
[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore
[ http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12429035 ]

Chris A. Mattmann commented on NUTCH-258:
-----------------------------------------

Hi Folks,

A patch is available on this issue. Has anyone who was experiencing the original problem tried out the latest trunk with this patch applied? Does this patch resolve your issue?

Thanks,
Chris
[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore
[ http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12416379 ]

Chris A. Mattmann commented on NUTCH-258:
-----------------------------------------

> Thanks for this patch Chris - even if it is now outdated by NUTCH-303 :-( Since Nutch no longer uses the deprecated Hadoop LogFormatter, there is no longer a logSevere check in the code.

Oh Jerome. You're always trying to scoop me on stuff! ;)

> But I'm not sure all these log severe calls should be marked as severe (the fatal level is used now).

Agreed. Let's review the places in the patch where severe errors are logged, and then remove/add as deemed necessary.

> So, what I suggest is to review all the fatal logs and check if they are really fatal for the whole process.

Agreed. I'll get on this right away.

> And finally, why not simply throw a RuntimeException that will be caught by the Fetcher if something really goes wrong?

Because we don't want one RuntimeException killing all subsequent fetching tasks. See the previous discussions on this by Andrzej, Scott, and me. Basically it boils down to ensuring that LOG.severe and its associated checking mechanism are scoped to the context of the particular fetching task that executes: we believed that the best way to do that would be to use the Hadoop Configuration (which is task specific). Make sense?

Okey dokey, I'll work on an updated patch and submit it for review soon (I won't specify an exact date, because I'm always late ;) ).
[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore
[ http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12415984 ]

Jerome Charron commented on NUTCH-258:
--------------------------------------

Thanks for this patch Chris - even if it is now outdated by NUTCH-303 :-(

Since Nutch no longer uses the deprecated Hadoop LogFormatter, there is no longer a logSevere check in the code. So we quickly need a patch for this issue in order to keep the same behavior.

In your patch, Chris, you set a severe flag each time a log severe is called. But I'm not sure all these log severe calls should be marked as severe (the fatal level is used now). For instance:

- Is it really fatal for the fetcher that the conf file for RegexUrlNormalizer is wrong?
- Is it really fatal for the fetcher if the language identifier raises an exception while loading ngram profiles?
- Is it really fatal for the fetcher if the ontology plugin fails to read an ontology?

But it certainly is fatal if the user-agent is not correctly set in the http plugins!

So, what I suggest is to review all the fatal logs and check if they are really fatal for the whole process.

And finally, why not simply throw a RuntimeException that will be caught by the Fetcher if something really goes wrong?
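As a rough illustration of the exception-based alternative raised in this comment, the sketch below throws a dedicated unchecked exception and catches it at the task boundary, so one failing task does not stop the next one. This is an assumption about how such a scheme could be scoped, not code from any of the attached patches; FatalFetchException, fetchAll() and runTasks() are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical marker for errors that really are fatal to one fetch task.
class FatalFetchException extends RuntimeException {
    FatalFetchException(String msg) {
        super(msg);
    }
}

class ExceptionScopedDemo {
    // Simplified fetch loop: throws on a URL that simulates a fatal condition.
    static int fetchAll(String... urls) {
        int fetched = 0;
        for (String url : urls) {
            if (url.startsWith("fatal:")) {
                throw new FatalFetchException("cannot continue after " + url);
            }
            fetched++;
        }
        return fetched;
    }

    // Runs every task; records the number of fetched URLs, or -1 if the task aborted.
    static Map<String, Integer> runTasks(Map<String, String[]> tasks) {
        Map<String, Integer> results = new LinkedHashMap<>();
        for (Map.Entry<String, String[]> task : tasks.entrySet()) {
            try {
                results.put(task.getKey(), fetchAll(task.getValue()));
            } catch (FatalFetchException e) {
                // The failure is contained here, so later tasks still run.
                results.put(task.getKey(), -1);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, String[]> tasks = new LinkedHashMap<>();
        tasks.put("job-1", new String[] {"http://a/", "fatal:b", "http://c/"});
        tasks.put("job-2", new String[] {"http://d/", "http://e/"});
        Map<String, Integer> results = runTasks(tasks);
        // Prints "-1 2": job-1 aborts, job-2 is unaffected.
        System.out.println(results.get("job-1") + " " + results.get("job-2"));
    }
}
```

Whether an exception or a per-task flag is the better fit depends on where the fatal condition is detected; the thread itself settles on a flag held in the task's Configuration.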
Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore
Thanks, Chris! (And thank you, Andrzej, for interpreting my rantings!) That plan sounds fantastic and I would be happy to help out.

Scott

On Jun 5, 2006, at 1:01 PM, Chris Mattmann wrote:
> Hi Andrzej,
>
>> The main problem, as Scott observed, is that the static flag affects all instances of the task executing inside the same JVM. If there are several Fetcher tasks (or any other tasks that check for SEVERE flag!), belonging to different jobs, all of them will quit. This is certainly not the intended behavior.
>
> Got it.
>
>>> In fact, I believe that this would make a fantastic anti-pattern. If this kind of behavior is *really* wanted (and I argue that it should not be below), it should be done through an explicit mechanism, not as a side-effect.
>>
>> I have a proposal for a simple solution: set a flag in the current Configuration instance, and check for this flag. The Configuration instance provides a task-specific context persisting throughout the lifetime of a task - but limited only to that task. Voila - problem solved. We get rid of the dubious use of LogFormatter (I hope Chris that even you would agree that this pattern is slightly .. unusual ;) )
>
> What, unusual? Huh? :-)
>
>> and we gain flexible mechanism limited in scope to the current task, which ensures isolation from other tasks in the same JVM. How about that?
>
> +1
>
> I like your proposed solution. I haven't really used multiple fetchers inside the same process much; however, I do have an application that calls fetches in more of a sequential way in the same JVM. So, I guess I just never ran across the behavior. The thing I like about the proposed solution is its separation and isolation of a task context, which I think that Nutch (now relying on Hadoop as the underlying architectural computing platform) needed to address.
>
> So, to summarize, the proposed resolution is:
>
> * add flag field in Configuration instance to signify whether or not a SEVERE error has been logged within a task's context
> * check this field within the fetcher to determine whether or not to stop the fetcher, just for that fetching task identified by its Configuration (and no others)
>
> Is this representative of what you're proposing Andrzej? If so, I'd like to take the lead on contributing a small patch that handles this, and then it would be great if people like Scott could test this out in their existing environments where this error was manifesting itself.
>
> Thanks!
>
> Cheers,
> Chris
>
> (BTW: would you like me to re-open the JIRA issue, or do you want to do it?)
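The two-step resolution summarized above (record the SEVERE condition in the task's Configuration, then check it only in that task's fetcher) can be sketched roughly as follows. This is a sketch under stated assumptions, not the attached patch: TaskConf is a hypothetical, minimal stand-in for a per-task configuration object, and the key name "fetcher.severe.logged" is invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, minimal stand-in for a task-specific configuration object.
class TaskConf {
    private final Map<String, String> props = new HashMap<>();

    synchronized void setBoolean(String key, boolean value) {
        props.put(key, Boolean.toString(value));
    }

    synchronized boolean getBoolean(String key, boolean defaultValue) {
        String value = props.get(key);
        return value == null ? defaultValue : Boolean.parseBoolean(value);
    }
}

class FetchTask {
    // Invented key name; the real patch may use something else.
    static final String SEVERE_KEY = "fetcher.severe.logged";

    private final TaskConf conf;

    FetchTask(TaskConf conf) {
        this.conf = conf;
    }

    // Step 1: record the SEVERE condition in this task's own context.
    void logSevere(String msg) {
        System.err.println("SEVERE: " + msg);
        conf.setBoolean(SEVERE_KEY, true);
    }

    // Step 2: the fetch loop checks only its own task's flag.
    int run(String... urls) {
        int fetched = 0;
        for (String url : urls) {
            if (conf.getBoolean(SEVERE_KEY, false)) {
                break;
            }
            if (url.startsWith("bad:")) { // assumed trigger for a SEVERE log entry
                logSevere("cannot fetch " + url);
                continue;
            }
            fetched++;
        }
        return fetched;
    }
}

class PerTaskFlagDemo {
    public static void main(String[] args) {
        int first = new FetchTask(new TaskConf()).run("http://a/", "bad:b", "http://c/");
        int second = new FetchTask(new TaskConf()).run("http://d/", "http://e/");
        // Prints "1 2": the first task stops early, the second is unaffected.
        System.out.println(first + " " + second);
    }
}
```

Because each task owns its configuration object, a task that shares a TaskConf with a failed one would still stop, while tasks with their own instance keep running.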
[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore
[ http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12414762 ]

Scott Ganyo commented on NUTCH-258:
-----------------------------------

For the record: I strongly object to closing this issue for the following reasons:

1) Having a *side-effect* of the entire system stop processing after merely logging a message at a certain event level is a poor practice. In fact, I believe that this would make a fantastic anti-pattern. If this kind of behavior is *really* wanted (and I argue below that it should not be), it should be done through an explicit mechanism, not as a side-effect. For example, did you realize that since Hadoop hijacks and reassigns all log formatters (also a bad practice!) in the org.apache.hadoop.util.LogFormatter static constructor, anyone who uses Nutch as a library and logs a SEVERE error will suffer by having Nutch stop fetching?

2) Moreover, having the system stop processing forevermore by use of a static(!) flag makes the use of the Nutch system as a library within a server or service environment impossible. Once this logging is done, no more Fetcher processing in this run *or any other* can take place. This is inappropriate. You might as well call System.exit() at this point! In fact, I could even argue that the current behavior is worse than a System.exit(), as it can actually obfuscate why the system has ceased being operational even though it is still ostensibly running.

Thus, while there definitely *are* instances of inappropriate logging levels being used and I could document them, I believe that this issue is more endemic to the system and its architecture than the utilization of a particular logging level for a certain event.
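The library-user scenario in point 1 can be reproduced with plain java.util.logging. The sketch below is not Hadoop's LogFormatter code; it only assumes that some component installs a global hook on the root logger that flips a static flag on any SEVERE record. SevereWatcher and GlobalHookDemo are hypothetical names.

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical global hook: flips a JVM-wide flag whenever any SEVERE record is published.
class SevereWatcher extends Handler {
    static volatile boolean loggedSevere = false;

    @Override
    public void publish(LogRecord record) {
        if (record.getLevel().intValue() >= Level.SEVERE.intValue()) {
            loggedSevere = true;
        }
    }

    @Override
    public void flush() {
        // nothing buffered
    }

    @Override
    public void close() {
        // nothing to release
    }
}

class GlobalHookDemo {
    // Installs the hook on the root logger, then logs from code unrelated to fetching.
    static boolean tripFromUnrelatedCode() {
        Logger.getLogger("").addHandler(new SevereWatcher());
        Logger unrelated = Logger.getLogger("com.example.unrelated");
        unrelated.severe("report generation failed; nothing to do with fetching");
        return SevereWatcher.loggedSevere;
    }

    public static void main(String[] args) {
        // Prints "true": a fetcher that polls this flag would now refuse to run.
        System.out.println(tripFromUnrelatedCode());
    }
}
```

The record reaches the root handler because child loggers forward to their parent handlers by default, which is why a hook installed globally cannot tell fetcher errors from anyone else's.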
[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore
[ http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12414763 ]

Stefan Groschupf commented on NUTCH-258:
----------------------------------------

Scott, I agree with you. However, we need a clean patch to solve the problem; we cannot just comment things out of the code. So I vote for the issue and I vote to reopen this issue.
Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore
Folks,

Before I (or someone else) reopens the issue, I think it's important to understand the implications:

> 1) Having a *side-effect* of the entire system stop processing after merely logging a message at a certain event level is a poor practice.

I'm not sure that the Fetcher quitting is a *side-effect* as you call it. In fact, I think it's clearly stated as the behavior of the system, both within the code, and in several mailing list conversations I've seen over the course of the past two years (I can dig these up, if needed).

> In fact, I believe that this would make a fantastic anti-pattern. If this kind of behavior is *really* wanted (and I argue that it should not be below), it should be done through an explicit mechanism, not as a side-effect.

Again, the use of side-effect here is strange to me: how is an explicit check for any LOG messages to the SEVERE level before quitting a side-effect?

> For example, did you realize that since Hadoop hijacks and reassigns all log formatters (also a bad practice!) in the org.apache.hadoop.util.LogFormatter static constructor that anyone using Nutch as a library and logs a SEVERE error will suffer by having Nutch stop fetching?

I'm not convinced that having Nutch stop fetching when a SEVERE error is logged is the wrong behavior. Let's think about what possible SEVERE errors may typically be logged: Out of Memory error, potentially, InterruptedExceptions in Threads (possibly), failure in any of the plugin libraries critical to the fetch running (possibly), the list goes on and on. So, in this case, you argue that the Fetcher should continue operating?

> 2) Moreover, having the system stop processing forever more by use of a static(!) flag makes the use of the Nutch system as a library within a server or service environment impossible. Once this logging is done, no more Fetcher processing in this run *or any other* can take place.
I've been using Nutch in a server environment (JSPs and Tomcat) within a large-scale data system at NASA for the course of the past year, and have never been impeded by the behavior of the fetcher. Can you be more specific here as to the exact use-case that's failing in your scenario? I've also been watching the mailing lists for the better course of almost 2 years, and have seen little traffic (outside of the aforementioned clarifications/etc. above) about this issue. I may be out on an island here, but again, I'm not convinced that this is a core issue.

Just my 2 cents. If the votes continue that this is an issue, however, I'll have no problem opening it up (or one of the committers can do it as well).

Cheers,
Chris
Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore
Chris Mattmann wrote:
> Folks,
>
> Before I (or someone else) reopens the issue, I think it's important to understand the implications:

I vote for re-opening. See below.

>> 1) Having a *side-effect* of the entire system stop processing after merely logging a message at a certain event level is a poor practice.
>
> I'm not sure that the Fetcher quitting is a *side-effect* as you call it. In fact, I think it's clearly stated as the behavior of the system, both within the code, and in several mailing list conversations I've seen over the course of the past two years (I can dig these up, if needed).

The main problem, as Scott observed, is that the static flag affects all instances of the task executing inside the same JVM. If there are several Fetcher tasks (or any other tasks that check for the SEVERE flag!), belonging to different jobs, all of them will quit. This is certainly not the intended behavior.

>> In fact, I believe that this would make a fantastic anti-pattern. If this kind of behavior is *really* wanted (and I argue that it should not be below), it should be done through an explicit mechanism, not as a side-effect.

I have a proposal for a simple solution: set a flag in the current Configuration instance, and check for this flag. The Configuration instance provides a task-specific context persisting throughout the lifetime of a task - but limited only to that task. Voila - problem solved. We get rid of the dubious use of LogFormatter (I hope Chris that even you would agree that this pattern is slightly .. unusual ;) ), and we gain a flexible mechanism limited in scope to the current task, which ensures isolation from other tasks in the same JVM. How about that?

> I've been using Nutch in a server environment (JSPs and Tomcat) within a large-scale data system at NASA for the course of the past year, and have never been impeded by the behavior of the fetcher. Can you be more specific

Have you ever tried to run several different crawls inside the same JVM? That's a common requirement if you want to use Nutch as a crawler component inside a larger application. I have, and as a result of my bad experiences I initiated the discussion which led to the dynamic NutchConf patches implemented by Stefan. The issue of LogFormatter was also discussed around that time, but since we didn't have dynamic NutchConf yet it was postponed, because there was no clear idea how to solve it cleanly. I believe there is now.

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore
Hi Andrzej,

> The main problem, as Scott observed, is that the static flag affects all instances of the task executing inside the same JVM. If there are several Fetcher tasks (or any other tasks that check for SEVERE flag!), belonging to different jobs, all of them will quit. This is certainly not the intended behavior.

Got it.

>>> In fact, I believe that this would make a fantastic anti-pattern. If this kind of behavior is *really* wanted (and I argue that it should not be below), it should be done through an explicit mechanism, not as a side-effect.
>
> I have a proposal for a simple solution: set a flag in the current Configuration instance, and check for this flag. The Configuration instance provides a task-specific context persisting throughout the lifetime of a task - but limited only to that task. Voila - problem solved. We get rid of the dubious use of LogFormatter (I hope Chris that even you would agree that this pattern is slightly .. unusual ;) )

What, unusual? Huh? :-)

> and we gain a flexible mechanism limited in scope to the current task, which ensures isolation from other tasks in the same JVM. How about that?

+1

I like your proposed solution. I haven't really used multiple fetchers inside the same process much; however, I do have an application that calls fetches in a more sequential way in the same JVM. So I guess I just never ran across the behavior.

The thing I like about the proposed solution is its separation and isolation of a task context, which I think Nutch (now relying on Hadoop as the underlying architectural computing platform) needed to address.

So, to summarize, the proposed resolution is:

* add a flag field to the Configuration instance to signify whether or not a SEVERE error has been logged within a task's context
* check this field within the fetcher to determine whether or not to stop the fetcher, just for the fetching task identified by its Configuration (and no others)

Is this representative of what you're proposing, Andrzej? If so, I'd like to take the lead on contributing a small patch that handles this, and then it would be great if people like Scott could test it in their existing environments where this error was manifesting itself.

Thanks!

Cheers,
Chris

(BTW: would you like me to re-open the JIRA issue, or do you want to do it?)

Chris A. Mattmann
[EMAIL PROTECTED]
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group

Jet Propulsion Laboratory    Pasadena, CA
Office: 171-266B    Mailstop: 171-246

Disclaimer: The opinions presented within are my own and do not reflect those of either NASA, JPL, or the California Institute of Technology.
Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore
Chris Mattmann wrote:
> +1
>
> So, to summarize, the proposed resolution is:
>
> * add flag field in Configuration instance to signify whether or not a SEVERE error has been logged within a task's context

Yes, preferably define these as public static final Strings in NutchConfiguration, both the field name and the special field value. The value should be a (short) string too, to minimize conversions from/to other formats.

> * check this field within the fetcher to determine whether or not to stop the fetcher, just for that fetching task identified by its Configuration (and no others)

Yes.

> Is this representative of what you're proposing Andrzej? If so, I'd like to take the lead on contributing a small patch that handles this, and then it would be great if people like Scott could test this out in their existing environments where this error was manifesting itself.
>
> Thanks!
>
> Cheers,
> Chris
>
> (BTW: would you like me to re-open the JIRA issue, or do you want to do it?)

Sure, feel free to follow up on this to its conclusion :)

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
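The resolution agreed in this exchange could look roughly like the sketch below. It is not the eventual patch: `TaskConf` is a minimal stand-in for a per-task configuration object (it is not the Hadoop or Nutch class), and the key name `task.logged.severe` and value `"1"` are invented for illustration. The thread only says that the field name and value should be public static final String constants and that the value should be a short string.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class TaskScopedSevereFlag {
    /** Hypothetical constants; the real names were left to the patch. */
    public static final String SEVERE_KEY = "task.logged.severe";
    public static final String SEVERE_VALUE = "1";

    /** Minimal stand-in for a configuration instance that lives as long as one task. */
    static final class TaskConf {
        private final Map<String, String> props = new ConcurrentHashMap<>();

        void set(String key, String value) {
            props.put(key, value);
        }

        String get(String key) {
            return props.get(key);
        }
    }

    /** Records that a severe error happened, but only in this task's context. */
    static void markSevere(TaskConf conf) {
        conf.set(SEVERE_KEY, SEVERE_VALUE);
    }

    static boolean hasSevere(TaskConf conf) {
        return SEVERE_VALUE.equals(conf.get(SEVERE_KEY));
    }

    /** Simplified fetch loop: returns how many URLs were processed before it stopped. */
    static int runTask(TaskConf conf, String[] urls) {
        int processed = 0;
        for (String url : urls) {
            if (hasSevere(conf)) {          // consults only this task's configuration
                break;
            }
            if (url.isEmpty()) {            // assumed failure condition, for illustration only
                markSevere(conf);
                continue;
            }
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        TaskConf jobA = new TaskConf();
        TaskConf jobB = new TaskConf();
        int a = runTask(jobA, new String[] {"http://a/", "", "http://b/"});
        int b = runTask(jobB, new String[] {"http://c/", "http://d/"});
        System.out.println("jobA=" + a + " jobB=" + b);
    }
}
```

Because each task reads and writes only its own `TaskConf`, the failure in `jobA` stops `jobA` alone, and `jobB` processes both of its URLs in the same JVM.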
Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore
> I have a proposal for a simple solution: set a flag in the current Configuration instance, and check for this flag. The Configuration instance provides a task-specific context persisting throughout the lifetime of a task - but limited only to that task. Voila - problem solved. We get rid of the dubious use of LogFormatter (I hope Chris that even you would agree that this pattern is slightly .. unusual ;) ), and we gain a flexible mechanism limited in scope to the current task, which ensures isolation from other tasks in the same JVM. How about that?

Wonderful idea :-D

+1
[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore
[ http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12414598 ]

Chris A. Mattmann commented on NUTCH-258:
-----------------------------------------

Hi there,

I believe that the fetcher halting on a LOG.severe is the intended behavior of the system. The use of this SEVERE error in Nutch is pretty consistent with Sun's documentation (http://java.sun.com/j2se/1.4.2/docs/guide/util/logging/overview.html#1.2), including its javadoc for JDK 5 (http://java.sun.com/j2se/1.5.0/docs/api/java/util/logging/Level.html). A SEVERE error is defined as "a message level indicating a serious failure." So I think that, in the case of the Fetcher, this behavior is actually warranted: if anything got logged at the SEVERE level, then there was some serious, unrecoverable error while fetching.

If you believe that there is an inappropriate use of LOG.severe in the Fetcher, for instance if an informational message is being logged at the SEVERE level, then that's a separate issue; please indicate where this is happening. However, as I stated, I believe SEVERE errors causing the fetcher to halt is indeed the intended behavior of Nutch, so, if there are no objections, I would like to close this issue.

Thanks,
Chris

Once Nutch logs a SEVERE log item, Nutch fails forevermore
----------------------------------------------------------

Key: NUTCH-258
URL: http://issues.apache.org/jira/browse/NUTCH-258
Project: Nutch
Type: Bug
Components: fetcher
Versions: 0.8-dev
Environment: All
Reporter: Scott Ganyo
Priority: Critical
Attachments: dumbfix.patch

Once a SEVERE log item is written, Nutch shuts down any fetching forevermore. This is from the run() method in Fetcher.java:

    public void run() {
      synchronized (Fetcher.this) {activeThreads++;} // count threads

      try {
        UTF8 key = new UTF8();
        CrawlDatum datum = new CrawlDatum();

        while (true) {
          if (LogFormatter.hasLoggedSevere())     // something bad happened
            break;                                // exit

Notice the last 2 lines. This will prevent Nutch from ever Fetching again once this is hit as LogFormatter is storing this data as a static.

(Also note that LogFormatter.hasLoggedSevere() is also checked in org.apache.nutch.net.URLFilterChecker and will disable this class as well.)

This must be fixed or Nutch cannot be run as any kind of long-running service. Furthermore, I believe it is a poor decision to rely on a logging event to determine the state of the application - this could have any number of side-effects that would be extremely difficult to track down. (As it has already for me.)
[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore
[ http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12412705 ]

Stefan Neufeind commented on NUTCH-258:
---------------------------------------

Beware of simply silencing the error! It helped me at one place - but at another it really caused an infinite loop not to end.