[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2007-01-26 Thread Chris A. Mattmann (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467887 ]

Chris A. Mattmann commented on NUTCH-258:
-

Guys,

 Based on recent conversations on the mailing list, where Doug mentioned that
this issue may have been resolved by recent changes to Hadoop, I'm wondering
whether we can close it. It's currently listed as a critical-priority bug with
3 watchers. I've asked several times over the last few months whether people
are still experiencing this issue. So, the question is: are they? If not, I'd
like to close it out, as I'm trying to get things organized here in JIRA so
that developers and contributors have a good idea of which open issues really
need attention. Given the recent lack of developer resources, I think closing
out issues that are not reproducible, that people are no longer experiencing,
or that have been resolved by recent changes in Hadoop etc. is an important
part of this process.

 Thus, I'm opening this issue up to any objections for closing/resolving it. If 
I don't hear any objections in the next week, I will close this issue out.

Thanks!

Cheers,
  Chris


 Once Nutch logs a SEVERE log item, Nutch fails forevermore
 --

 Key: NUTCH-258
 URL: https://issues.apache.org/jira/browse/NUTCH-258
 Project: Nutch
  Issue Type: Bug
  Components: fetcher
Affects Versions: 0.8
 Environment: All
Reporter: Scott Ganyo
 Assigned To: Chris A. Mattmann
Priority: Critical
 Fix For: 0.9.0

 Attachments: dumbfix.patch, NUTCH-258.Mattmann.060906.patch.txt, 
 NUTCH-258.Mattmann.080406.patch.txt


 Once a SEVERE log item is written, Nutch shuts down any fetching forevermore. 
  This is from the run() method in Fetcher.java:
 public void run() {
   synchronized (Fetcher.this) {activeThreads++;} // count threads

   try {
     UTF8 key = new UTF8();
     CrawlDatum datum = new CrawlDatum();

     while (true) {
       if (LogFormatter.hasLoggedSevere()) // something bad happened
         break;                            // exit

 Notice the last 2 lines.  This will prevent Nutch from ever Fetching again 
 once this is hit as LogFormatter is storing this data as a static.
 (Also note that LogFormatter.hasLoggedSevere() is also checked in 
 org.apache.nutch.net.URLFilterChecker and will disable this class as well.)
 This must be fixed or Nutch cannot be run as any kind of long-running 
 service.  Furthermore, I believe it is a poor decision to rely on a logging 
 event to determine the state of the application - this could have any number 
 of side-effects that would be extremely difficult to track down.  (As it has 
 already for me.)
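
The behavior described above can be reproduced in isolation. The following is a minimal sketch, not Nutch or Hadoop code: SevereTracker and fetch() are hypothetical stand-ins for LogFormatter and the Fetcher loop, written only to show how a static flag that is never reset affects every later run in the same JVM.

```java
import java.util.Arrays;
import java.util.List;

public class StaticSevereFlagDemo {

    /** Hypothetical stand-in for a logging helper that records SEVERE events in a static field. */
    public static final class SevereTracker {
        private static volatile boolean loggedSevere = false;

        public static void severe(String message) {
            System.err.println("SEVERE: " + message);
            loggedSevere = true; // JVM-wide and never reset
        }

        public static boolean hasLoggedSevere() {
            return loggedSevere;
        }
    }

    /** Hypothetical fetch loop: counts the URLs processed before the static flag stops it. */
    public static int fetch(List<String> urls) {
        int fetched = 0;
        for (String url : urls) {
            if (SevereTracker.hasLoggedSevere()) { // same kind of check as in the run() excerpt
                break;
            }
            fetched++;
        }
        return fetched;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("http://a.example/", "http://b.example/");
        System.out.println("first run fetched " + fetch(urls));
        // Any code in the JVM logging at SEVERE flips the shared flag.
        SevereTracker.severe("unrelated failure elsewhere in the application");
        System.out.println("second run fetched " + fetch(urls));
    }
}
```

In this sketch the second run processes nothing, even though the failure had nothing to do with fetching, and no later run in the same JVM can recover.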

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2007-01-26 Thread Sami Siren (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467916 ]

Sami Siren commented on NUTCH-258:
--

I haven't noticed this being a problem for me, so no objections from here.




[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2007-01-26 Thread Scott Ganyo (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467931 ]

Scott Ganyo commented on NUTCH-258:
---

Chris,

I originally opened the issue... but unfortunately I can neither confirm nor 
deny that this is fixed as I'm no longer on the project that originally had the 
issue.  (And, in fact, they never allowed an upgrade to the latest version of 
Nutch/Hadoop anyway.)  So, close away if nobody else is having the issue!

Thanks!
Scott





Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2007-01-25 Thread Doug Cutting

Scott Ganyo (JIRA) wrote:

 ... since Hadoop hijacks and reassigns all log formatters (also a bad 
practice!) in the org.apache.hadoop.util.LogFormatter static constructor ...


FYI, Hadoop no longer does this.

Doug


Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2007-01-25 Thread Chris Mattmann
Hi Doug,

 So, does this render the patch that I wrote obsolete?

Cheers,
  Chris



On 1/25/07 10:08 AM, Doug Cutting [EMAIL PROTECTED] wrote:

 Scott Ganyo (JIRA) wrote:
  ... since Hadoop hijacks and reassigns all log formatters (also a bad
 practice!) in the org.apache.hadoop.util.LogFormatter static constructor ...
 
 FYI, Hadoop no longer does this.
 
 Doug

__
Chris A. Mattmann
[EMAIL PROTECTED]
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group

_
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop: 171-246
___

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.




Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2007-01-25 Thread Doug Cutting

Chris Mattmann wrote:

 So, does this render the patch that I wrote obsolete?


It's at least out-of-date and perhaps obsolete.  A quick read of 
Fetcher.java looks like there might be a case where a fatal error is 
logged but the fetcher doesn't exit, in FetcherThread#output().


Doug


Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2007-01-25 Thread Chris Mattmann
 It's at least out-of-date and perhaps obsolete.  A quick read of
 Fetcher.java looks like there might be a case where a fatal error is
 logged but the fetcher doesn't exit, in FetcherThread#output().
 

So this raises an interesting question:

People (such as Scott G.) out there -- are you folks still experiencing
similar problems? Do the recent Hadoop changes alleviate the bad behavior
you were experiencing? If so, then maybe this issue should be closed...

Cheers,
  Chris





[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-08-18 Thread Chris A. Mattmann (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12429035 ] 

Chris A. Mattmann commented on NUTCH-258:
-

Hi Folks,

  A patch is available on this issue. Has anyone who was experiencing the 
original problem tried out the latest trunk with this patch applied? Does this 
patch resolve your issue?

Thanks,
  Chris






[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-15 Thread Chris A. Mattmann (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12416379 ] 

Chris A. Mattmann commented on NUTCH-258:
-

 Thanks for this patch Chris - even if it is now outdated by NUTCH-303 :-(
 Since Nutch no longer uses the deprecated Hadoop LogFormatter, there is no
 longer a logSevere check in the code.

Oh Jerome. You're always trying to scoop me on stuff! ;)


 But I'm not sure all these severe log calls should be marked as severe (fatal
 level is used now).

Agreed. Let's review the places in the patch where severe errors are logged, 
and then remove/add as deemed necessary. 


 So, what I suggest is to review all the fatal logs and check whether they are
 really fatal for the whole process. 

Agreed. I'll get on this right away.

 And finally, why not simply throw a RuntimeException that will be caught by
 the Fetcher if something really goes wrong?

Because we don't want one RuntimeException killing all subsequent fetching 
tasks. See the previous discussions on this by Andrzej, Scott, and me. Basically 
it boils down to ensuring that LOG.severe and its associated checking mechanism 
are scoped to the context of the particular fetching task that executes: 
we believed that the best way to do that would be to use the Hadoop 
Configuration (which is task specific). Make sense?

Okey dokey, I'll work on an updated patch and submit for review soon (I won't 
specify an exact date, because I'm always late ;) ).


 Once Nutch logs a SEVERE log item, Nutch fails forevermore
 --

  Key: NUTCH-258
  URL: http://issues.apache.org/jira/browse/NUTCH-258
  Project: Nutch
 Type: Bug

   Components: fetcher
 Versions: 0.8-dev
  Environment: All
 Reporter: Scott Ganyo
 Assignee: Chris A. Mattmann
 Priority: Critical
  Attachments: NUTCH-258.Mattmann.060906.patch.txt, dumbfix.patch



[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-13 Thread Jerome Charron (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12415984 ] 

Jerome Charron commented on NUTCH-258:
--

Thanks for this patch Chris - even if it is now outdated by NUTCH-303 :-(
Since Nutch no longer uses the deprecated Hadoop LogFormatter, there is no
longer a logSevere check in the code.
So we quickly need a patch for this issue in order to keep the same
behavior.

In your patch Chris, you set a severe flag each time a severe log call is made.
But I'm not sure all these severe log calls should be marked as severe (fatal
level is used now).
For instance, is it really fatal for the fetcher that the conf file for 
RegexUrlNormalizer is wrong?
Is it really fatal for the fetcher if the language identifier raises an 
exception while loading ngram profiles?
Is it really fatal for the fetcher if the ontology plugin fails to read an 
ontology?
But it certainly is fatal if the user-agent is not correctly set in the http
plugins!

So, what I suggest is to review all the fatal logs and check whether they are
really fatal for the whole process.
And finally, why not simply throw a RuntimeException that will be caught by
the Fetcher if something really goes wrong?
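
One way the RuntimeException suggestion could look is sketched below. This is only an illustration under stated assumptions: FatalFetchException, fetchOne() and runFetcher() are hypothetical names that do not come from Nutch, and the missing user-agent case is borrowed from the comment above as an example of a genuinely fatal condition.

```java
import java.util.Arrays;
import java.util.List;

public class FatalExceptionDemo {

    /** Hypothetical exception type marking a condition that is fatal to one fetcher run. */
    public static class FatalFetchException extends RuntimeException {
        public FatalFetchException(String message) {
            super(message);
        }
    }

    /** Hypothetical per-URL work: fails fatally when no user agent is configured. */
    static void fetchOne(String url, String userAgent) {
        if (userAgent == null || userAgent.isEmpty()) {
            throw new FatalFetchException("no user agent configured");
        }
        // A real fetcher would download and parse the page here.
    }

    /** Runs one fetch pass; a fatal exception ends this pass only and leaves no global state. */
    public static int runFetcher(List<String> urls, String userAgent) {
        int fetched = 0;
        try {
            for (String url : urls) {
                fetchOne(url, userAgent);
                fetched++;
            }
        } catch (FatalFetchException e) {
            System.err.println("fetcher stopped: " + e.getMessage());
        }
        return fetched;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("http://a.example/", "http://b.example/");
        System.out.println("misconfigured run fetched " + runFetcher(urls, null));
        System.out.println("configured run fetched " + runFetcher(urls, "demo-bot"));
    }
}
```

Because the failure travels as an exception rather than as shared state, a later run with a valid configuration is unaffected in this sketch. Whether that holds for real fetching tasks depends on where the exception is caught.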



Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-09 Thread Scott Ganyo

Thanks, Chris!  (And thank you, Andrzej for interpreting my rantings!)

That plan sounds fantastic and I would be happy to help out.

Scott

On Jun 5, 2006, at 1:01 PM, Chris Mattmann wrote:


Hi Andrzej,

 The main problem, as Scott observed, is that the static flag affects all
 instances of the task executing inside the same JVM. If there are
 several Fetcher tasks (or any other tasks that check for SEVERE flag!),
 belonging to different jobs, all of them will quit. This is certainly
 not the intended behavior.

Got it.

  In fact, I believe that this would make a fantastic anti-pattern.  If this
  kind of behavior is *really* wanted (and I argue that it should not be
  below), it should be done through an explicit mechanism, not as a
  side-effect.

 I have a proposal for a simple solution: set a flag in the current
 Configuration instance, and check for this flag. The Configuration
 instance provides a task-specific context persisting throughout the
 lifetime of a task - but limited only to that task. Voila - problem
 solved. We get rid of the dubious use of LogFormatter (I hope Chris that
 even you would agree that this pattern is slightly .. unusual ;) )

What, unusual? Huh? :-)

 and we gain flexible mechanism limited in scope to the current task, which
 ensures isolation from other tasks in the same JVM. How about that?

+1

I like your proposed solution. I haven't really used multiple fetchers inside
the same process much; however, I do have an application that calls fetches
in more of a sequential way in the same JVM. So, I guess I just never ran
across the behavior. The thing I like about the proposed solution is its
separation and isolation of a task context, which I think that Nutch (now
relying on Hadoop as the underlying architectural computing platform) needed
to address.

So, to summarize, the proposed resolution is:

* add a flag field in the Configuration instance to signify whether or not a
SEVERE error has been logged within a task's context

* check this field within the fetcher to determine whether or not to stop
the fetcher, just for that fetching task identified by its Configuration
(and no others)

Is this representative of what you're proposing Andrzej? If so, I'd like to
take the lead on contributing a small patch that handles this, and then it
would be great if people like Scott could test this out in their existing
environments where this error was manifesting itself.

Thanks!

Cheers,
  Chris

(BTW: would you like me to re-open the JIRA issue, or do you want to do it?)







[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-05 Thread Scott Ganyo (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12414762 ] 

Scott Ganyo commented on NUTCH-258:
---

For the record:  I strongly object to closing this issue for the following 
reasons:

1) Having a *side-effect* of the entire system stop processing after merely 
logging a message at a certain event level is a poor practice.  In fact, I 
believe that this would make a fantastic anti-pattern.  If this kind of 
behavior is *really* wanted (and I argue that it should not be below), it 
should be done through an explicit mechanism, not as a side-effect.  For 
example, did you realize that since Hadoop hijacks and reassigns all log 
formatters (also a bad practice!) in the org.apache.hadoop.util.LogFormatter 
static constructor, anyone who uses Nutch as a library and logs a SEVERE error 
will suffer by having Nutch stop fetching?

2) Moreover, having the system stop processing forever more by use of a 
static(!) flag makes the use of the Nutch system as a library within a server 
or service environment impossible.  Once this logging is done, no more Fetcher 
processing in this run *or any other* can take place.  This is inappropriate.  
You might as well call System.exit() at this point!  In fact, I could even 
argue that the current behavior is worse than a System.exit(), as it can 
actually obfuscate why the system has ceased being operational even though it 
is still ostensibly running.

Thus, while there definitely *are* instances of inappropriate logging levels 
being used and I could document them, I believe that this issue is more endemic 
to the system and its architecture than the utilization of a particular 
logging level for a certain event.

 Once Nutch logs a SEVERE log item, Nutch fails forevermore
 --

  Key: NUTCH-258
  URL: http://issues.apache.org/jira/browse/NUTCH-258
  Project: Nutch
 Type: Bug

   Components: fetcher
 Versions: 0.8-dev
  Environment: All
 Reporter: Scott Ganyo
 Priority: Critical
  Attachments: dumbfix.patch



[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-05 Thread Stefan Groschupf (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12414763 ] 

Stefan Groschupf commented on NUTCH-258:


Scott, 
I agree with you. However, we need a clean patch to solve the problem; we 
cannot just comment things out of the code.
So I vote for the issue and I vote to reopen this issue.




Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-05 Thread Chris Mattmann
Folks,

 Before I (or someone else) reopens the issue, I think it's important to
understand the implications:

1) Having a *side-effect* of the entire system stop processing after merely
 logging a message at a certain event level is a poor practice.

I'm not sure that the Fetcher quitting is a * side-effect * as you call it.
In fact, I think it's clearly stated as the behavior of the system, both
within the code, and in several mailing list conversations I've seen over
the course of the past two years (I can dig these up, if needed).

 In fact, I believe that this would make a fantastic anti-pattern.  If this
 kind of behavior is *really* wanted (and I argue that it should not be below),
 it should be done through an explicit mechanism, not as a side-effect.

Again, the use of side-effect here is strange to me: how is an explicit
check for any LOG messages to the SEVERE level before quitting a
side-effect? 

 For example, did you realize that since Hadoop hijacks and reassigns all log
 formatters (also a bad practice!) in the org.apache.hadoop.util.LogFormatter
 static constructor that anyone using Nutch as a library and logs a SEVERE
 error will suffer by having Nutch stop fetching?

I'm not convinced that having Nutch stop fetching when a SEVERE error is
logged is the wrong behavior. Let's think about what possible SEVERE errors
may typically be logged: Out of Memory error, potentially,
InterruptedExceptions in Threads (possibly), failure in any of the plugin
libraries critical to the fetch running (possibly), the list goes on and on.
So, in this case, you argue that the Fetcher should continue operating?

 2) Moreover, having the system stop processing forever more by use of a
 static(!) flag makes the use of the Nutch system as a library within a server
 or service environment impossible.  Once this logging is done, no more Fetcher
 processing in this run *or any other* can take place.

I've been using Nutch in a server environment (JSPs and Tomcat) within a
large-scale data system at NASA for the course of the past year, and have
never been impeded by the behavior of the fetcher. Can you be more specific
here as to the exact use-case that's failing in your scenario? I've also
been watching the mailing lists for the better part of almost 2 years, and
have seen little traffic (outside of the aforementioned clarifications/etc.
above) about this issue. I may be out on an island here, but again, I'm not
convinced that this is a core issue.

Just my 2 cents. If the votes continue that this is an issue, however, I'll
have no problem opening it up (or one of the committers can do it as well).

Cheers,
  Chris





On 6/5/06 7:11 AM, Stefan Groschupf (JIRA) [EMAIL PROTECTED] wrote:

 [ 
 http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12414763 ]
 
 Stefan Groschupf commented on NUTCH-258:
 
 
 Scott, 
 I agree with you. However we need a clean patch to solve the problem, we can
 not just comment things out of the code.
 So I vote for the issue and I vote to reopen this issue.
 


Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-05 Thread Andrzej Bialecki

Chris Mattmann wrote:

 Folks,

  Before I (or someone else) reopens the issue, I think it's important to
 understand the implications:

I vote for re-opening. See below.

  1) Having a *side-effect* of the entire system stop processing after merely
  logging a message at a certain event level is a poor practice.

 I'm not sure that the Fetcher quitting is a * side-effect * as you call it.
 In fact, I think it's clearly stated as the behavior of the system, both
 within the code, and in several mailing list conversations I've seen over
 the course of the past two years (I can dig these up, if needed).


The main problem, as Scott observed, is that the static flag affects all 
instances of the task executing inside the same JVM. If there are 
several Fetcher tasks (or any other tasks that check for SEVERE flag!), 
belonging to different jobs, all of them will quit. This is certainly 
not the intended behavior.
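
A minimal sketch of the problem described above. This is not Nutch code: `SevereFlag` is a hypothetical stand-in for the static state kept by LogFormatter, and `fetch` is a hypothetical stand-in for a fetcher loop. It only shows why one failing job stops every later job in the same JVM.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class StaticFlagDemo {
    // Hypothetical stand-in for LogFormatter's static state: one flag per JVM.
    static final class SevereFlag {
        private static final AtomicBoolean LOGGED_SEVERE = new AtomicBoolean(false);

        static void severe(String message) {
            System.err.println("SEVERE: " + message);
            LOGGED_SEVERE.set(true); // never reset, so it outlives the failing job
        }

        static boolean hasLoggedSevere() {
            return LOGGED_SEVERE.get();
        }
    }

    // Hypothetical fetcher loop: returns how many URLs were processed.
    // failAt is the index at which a SEVERE error is simulated (-1 for none).
    static int fetch(int urlCount, int failAt) {
        int fetched = 0;
        for (int i = 0; i < urlCount; i++) {
            if (SevereFlag.hasLoggedSevere()) {
                break; // same kind of check as in the Fetcher.run() excerpt
            }
            if (i == failAt) {
                SevereFlag.severe("unrecoverable error at url " + i);
                continue;
            }
            fetched++;
        }
        return fetched;
    }

    public static void main(String[] args) {
        int jobA = fetch(5, 2);  // job A hits a SEVERE error on its third URL
        int jobB = fetch(5, -1); // job B is healthy but shares the JVM
        System.out.println("job A fetched " + jobA); // prints "job A fetched 2"
        System.out.println("job B fetched " + jobB); // prints "job B fetched 0"
    }
}
```

Job B never fails on its own, yet it fetches nothing, because the flag set by job A is visible to every task in the process.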


  

In fact, I believe that this would make a fantastic anti-pattern.  If this
kind of behavior is *really* wanted (and I argue that it should not be below),
it should be done through an explicit mechanism, not as a side-effect.



  


I have a proposal for a simple solution: set a flag in the current 
Configuration instance, and check for this flag. The Configuration 
instance provides a task-specific context persisting throughout the 
lifetime of a task - but limited only to that task. Voila - problem 
solved. We get rid of the dubious use of LogFormatter (I hope Chris that 
even you would agree that this pattern is slightly .. unusual ;) ), and 
we gain a flexible mechanism limited in scope to the current task, which 
ensures isolation from other tasks in the same JVM. How about that?




I've been using Nutch in a server environment (JSPs and Tomcat) within a
large-scale data system at NASA for the course of the past year, and have
never been impeded by the behavior of the fetcher. Can you be more specific
  


Have you ever tried to run several different crawls inside the same JVM? 
That's a common requirement if you want to use Nutch as a crawler 
component inside a larger application. I have, and as a result of my 
bad experiences I initiated the discussion, which led to the dynamic 
NutchConf patches implemented by Stefan. The issue of LogFormatter was 
also discussed around that time, but since we didn't have dynamic 
NutchConf yet it was postponed, because there was no clear idea how to 
solve it cleanly. I believe there is now.


--
Best regards,
Andrzej Bialecki 
___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-05 Thread Chris Mattmann
Hi Andrzej,

 
 The main problem, as Scott observed, is that the static flag affects all
 instances of the task executing inside the same JVM. If there are
 several Fetcher tasks (or any other tasks that check for SEVERE flag!),
 belonging to different jobs, all of them will quit. This is certainly
 not the intended behavior.
 

Got it.

   
 In fact, I believe that this would make a fantastic anti-pattern.  If this
 kind of behavior is *really* wanted (and I argue that it should not be
 below),
 it should be done through an explicit mechanism, not as a side-effect.
 
 
   
 
 I have a proposal for a simple solution: set a flag in the current
 Configuration instance, and check for this flag. The Configuration
 instance provides a task-specific context persisting throughout the
 lifetime of a task - but limited only to that task. Voila - problem
 solved. We get rid of the dubious use of LogFormatter (I hope Chris that
 even you would agree that this pattern is slightly .. unusual ;) )

What, unusual? Huh? :-)

 and 
 we gain a flexible mechanism limited in scope to the current task, which
 ensures isolation from other tasks in the same JVM. How about that?

+1

I like your proposed solution. I haven't really used multiple fetchers
inside the same process much; however, I do have an application that
calls fetches in a more sequential way in the same JVM. So, I guess I
just never ran across the behavior. The thing I like about the proposed
solution is its separation and isolation of a task context, which I think
Nutch (now relying on Hadoop as the underlying architectural computing
platform) needed to address.

So, to summarize, the proposed resolution is:

* add flag field in Configuration instance to signify whether or not a
SEVERE error has been logged within a task's context

* check this field within the fetcher to determine whether or not to stop
the fetcher, just for that fetching task identified by its Configuration
(and no others)
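
The two steps above can be sketched as follows. This is an illustration under stated assumptions, not the actual patch: `TaskConf` is a hypothetical stand-in for a per-task Configuration instance, and the key name `fetcher.severe.logged` is invented for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class TaskScopedFlagDemo {
    // Hypothetical stand-in for a per-task Configuration: string key/value pairs.
    static final class TaskConf {
        private final Map<String, String> values = new ConcurrentHashMap<>();

        void set(String key, String value) {
            values.put(key, value);
        }

        String get(String key, String defaultValue) {
            return values.getOrDefault(key, defaultValue);
        }
    }

    // Assumed key name, for illustration only.
    static final String SEVERE_KEY = "fetcher.severe.logged";

    // Hypothetical fetcher loop that consults only its own task's configuration.
    static int fetch(TaskConf conf, int urlCount, int failAt) {
        int fetched = 0;
        for (int i = 0; i < urlCount; i++) {
            if ("true".equals(conf.get(SEVERE_KEY, "false"))) {
                break; // stop this task only
            }
            if (i == failAt) {
                conf.set(SEVERE_KEY, "true"); // record the failure in the task context
                continue;
            }
            fetched++;
        }
        return fetched;
    }

    public static void main(String[] args) {
        TaskConf confA = new TaskConf();
        TaskConf confB = new TaskConf();
        System.out.println("job A fetched " + fetch(confA, 5, 2));  // prints "job A fetched 2"
        System.out.println("job B fetched " + fetch(confB, 5, -1)); // prints "job B fetched 5"
    }
}
```

Because each task carries its own flag, a failure in job A no longer affects job B, while job A itself still stops as before.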

Is this representative of what you're proposing, Andrzej? If so, I'd like to
take the lead on contributing a small patch that handles this, and then it
would be great if people like Scott could test this out in their existing
environments where this error was manifesting itself.

Thanks!

Cheers,
  Chris

(BTW: would you like me to re-open the JIRA issue, or do you want to do it?)

__
Chris A. Mattmann
[EMAIL PROTECTED]
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group

_
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop: 171-246
___

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.




Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-05 Thread Andrzej Bialecki

Chris Mattmann wrote:

+1

So, to summarize, the proposed resolution is:

* add flag field in Configuration instance to signify whether or not a
SEVERE error has been logged within a task's context
  


Yes, preferably define these as public static final Strings in 
NutchConfiguration, both the field name and this special field value. 
The value should be a (short) string too, to minimize conversions 
from/to other formats.
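
A rough sketch of what such constants might look like. Everything here is an assumption made for illustration: the class name `SevereFlagKeys`, the key `nutch.task.severe`, and the value `"1"` are invented, and `java.util.Properties` merely stands in for a string-valued configuration object.

```java
import java.util.Properties;

final class SevereFlagKeys {
    // Assumed names and values, for illustration only.
    public static final String SEVERE_FLAG_KEY = "nutch.task.severe";
    public static final String SEVERE_FLAG_SET = "1"; // short string, no type conversion

    // Records in this task's configuration that a SEVERE error occurred.
    static void markSevere(Properties conf) {
        conf.setProperty(SEVERE_FLAG_KEY, SEVERE_FLAG_SET);
    }

    // True only if this particular configuration has been marked.
    static boolean hasSevere(Properties conf) {
        return SEVERE_FLAG_SET.equals(conf.getProperty(SEVERE_FLAG_KEY));
    }

    public static void main(String[] args) {
        Properties taskA = new Properties();
        Properties taskB = new Properties();
        markSevere(taskA);
        System.out.println(hasSevere(taskA)); // prints "true"
        System.out.println(hasSevere(taskB)); // prints "false"
    }
}
```

Keeping both the key and the value as named constants means callers compare strings directly and never have to parse the flag.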



* check this field within the fetcher to determine whether or not to stop
the fetcher, just for that fetching task identified by its Configuration
(and no others)
  


Yes.


Is this representative of what you're proposing, Andrzej? If so, I'd like to
take the lead on contributing a small patch that handles this, and then it
would be great if people like Scott could test this out in their existing
environments where this error was manifesting itself.

Thanks!

Cheers,
  Chris

(BTW: would you like me to re-open the JIRA issue, or do you want to do it?)
  


Sure, feel free to follow-up on this to its conclusion :)

--
Best regards,
Andrzej Bialecki 
___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-05 Thread Stefan Groschupf
I have a proposal for a simple solution: set a flag in the current  
Configuration instance, and check for this flag. The Configuration  
instance provides a task-specific context persisting throughout the  
lifetime of a task - but limited only to that task. Voila - problem  
solved. We get rid of the dubious use of LogFormatter (I hope Chris  
that even you would agree that this pattern is slightly ..  
unusual ;) ), and we gain a flexible mechanism limited in scope to  
the current task, which ensures isolation from other tasks in the  
same JVM. How about that?

Wonderful idea :-D
+ 1




[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-03 Thread Chris A. Mattmann (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12414598 ] 

Chris A. Mattmann commented on NUTCH-258:
-

Hi there,

 I believe that the fetcher halting on a LOG.severe is the intended behavior of 
the system. The use of this SEVERE error in Nutch is pretty consistent with 
Sun's documentation 
(http://java.sun.com/j2se/1.4.2/docs/guide/util/logging/overview.html#1.2), 
including its javadoc for JDK 5 
(http://java.sun.com/j2se/1.5.0/docs/api/java/util/logging/Level.html). A 
SEVERE error is defined as "a message level indicating a serious failure." So 
I think that, in the case of the Fetcher, this behavior is actually 
warranted: if anything got logged at the SEVERE level, then there 
was some serious, unrecoverable error while fetching.
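
For reference, a minimal sketch of how a SEVERE record can be detected with java.util.logging. This is not how Nutch's LogFormatter is necessarily implemented: `SevereWatcher` and `triggersFlag` are hypothetical names, and the flag here belongs to one handler instance rather than to the whole JVM.

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

final class SevereWatchDemo {
    // Hypothetical handler: remembers whether a record at SEVERE level passed through.
    static final class SevereWatcher extends Handler {
        private volatile boolean sawSevere;

        @Override
        public void publish(LogRecord record) {
            if (record.getLevel().intValue() >= Level.SEVERE.intValue()) {
                sawSevere = true;
            }
        }

        @Override
        public void flush() {
            // nothing is buffered
        }

        @Override
        public void close() {
            // no resources to release
        }

        boolean sawSevere() {
            return sawSevere;
        }
    }

    // Logs one message at the given level and reports whether the watcher was tripped.
    static boolean triggersFlag(Level level) {
        Logger log = Logger.getAnonymousLogger();
        log.setUseParentHandlers(false); // keep the demo quiet on the console
        SevereWatcher watcher = new SevereWatcher();
        log.addHandler(watcher);
        log.log(level, "sample message");
        return watcher.sawSevere();
    }

    public static void main(String[] args) {
        System.out.println(triggersFlag(Level.WARNING)); // prints "false"
        System.out.println(triggersFlag(Level.SEVERE));  // prints "true"
    }
}
```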

 If you believe that there is an inappropriate use of LOG.severe in the 
Fetcher, however (for instance, an informational message being logged at 
the SEVERE level), then that's a separate issue; please indicate where this is 
happening. As I stated, I believe SEVERE errors causing the fetcher to 
halt is indeed the intended behavior of Nutch, so, if there are no objections, 
I would like to close this issue.

Thanks,
  Chris


 Once Nutch logs a SEVERE log item, Nutch fails forevermore
 --

  Key: NUTCH-258
  URL: http://issues.apache.org/jira/browse/NUTCH-258
  Project: Nutch
 Type: Bug

   Components: fetcher
 Versions: 0.8-dev
  Environment: All
 Reporter: Scott Ganyo
 Priority: Critical
  Attachments: dumbfix.patch

 Once a SEVERE log item is written, Nutch shuts down any fetching forevermore. 
  This is from the run() method in Fetcher.java:
 public void run() {
   synchronized (Fetcher.this) {activeThreads++;} // count threads

   try {
     UTF8 key = new UTF8();
     CrawlDatum datum = new CrawlDatum();

     while (true) {
       if (LogFormatter.hasLoggedSevere()) // something bad happened
         break;                            // exit
   
 Notice the last 2 lines.  This will prevent Nutch from ever fetching again 
 once this is hit, because LogFormatter stores this flag in a static field.
 (Note that LogFormatter.hasLoggedSevere() is also checked in 
 org.apache.nutch.net.URLFilterChecker and will disable this class as well.)
 This must be fixed or Nutch cannot be run as any kind of long-running 
 service.  Furthermore, I believe it is a poor decision to rely on a logging 
 event to determine the state of the application - this could have any number 
 of side-effects that would be extremely difficult to track down.  (As it has 
 already for me.)

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira



[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-05-21 Thread Stefan Neufeind (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12412705 ] 

Stefan Neufeind commented on NUTCH-258:
---

Beware of simply silencing the error! It helped me in one place, but in 
another it caused a loop that never ended.
