[jira] Commented: (NUTCH-393) Indexer doesn't handle null documents returned by filters

2007-05-09 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494552
 ] 

Andrzej Bialecki  commented on NUTCH-393:
-

I agree with that - either all filters should run or the document should be 
discarded. If it's acceptable to tolerate exceptions in some indexing filters, 
such exceptions should be caught there.
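
To make the two policies concrete, here is a minimal sketch, not Nutch code. `StrictFilterChainSketch`, `DocFilter`, `FilterException`, `tolerant` and `filterStrictly` are hypothetical names, and a plain `Map` stands in for the document. It assumes the chain discards the document on any failure, while a filter whose failures are acceptable catches them itself.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical stand-ins; this is not the Nutch IndexingFilters API. */
final class StrictFilterChainSketch {

    /** Stand-in for a checked exception such as IndexingException. */
    static class FilterException extends Exception {
        FilterException(String message) { super(message); }
    }

    interface DocFilter {
        Map<String, String> filter(Map<String, String> doc) throws FilterException;
    }

    /** Wraps a filter whose failures are tolerable: the exception is caught "there". */
    static DocFilter tolerant(DocFilter optional) {
        return doc -> {
            try {
                return optional.filter(doc);
            } catch (FilterException e) {
                return doc; // best effort: pass the document on as received
            }
        };
    }

    /** All-or-nothing: returns null (discard) as soon as any filter fails. */
    static Map<String, String> filterStrictly(List<DocFilter> filters, Map<String, String> doc) {
        for (DocFilter f : filters) {
            try {
                doc = f.filter(doc);
            } catch (FilterException e) {
                return null; // one failure discards the whole document
            }
            if (doc == null) {
                return null; // a filter asked for the document to be dropped
            }
        }
        return doc;
    }
}
```

With this split, the chain itself never produces a partially filtered document; only filters explicitly wrapped in `tolerant` are allowed to fail quietly.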

 Indexer doesn't handle null documents returned by filters
 -

 Key: NUTCH-393
 URL: https://issues.apache.org/jira/browse/NUTCH-393
 Project: Nutch
  Issue Type: Bug
  Components: indexer
Affects Versions: 0.8.1, 0.9.0
Reporter: Eelco Lempsink
 Attachments: NUTCH-393.patch


 Plugins (like IndexingFilter) may return a null value, but this isn't handled 
 by the Indexer.  A trivial adjustment is all it takes:
 @@ -237,6 +237,7 @@
          if (LOG.isWarnEnabled()) { LOG.warn("Error indexing "+key+": "+e); }
          return;
        }
 +      if (doc == null) return;
  
        float boost = 1.0f;
        // run scoring filters
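
 The effect of the one-line guard can be sketched in isolation. This is an illustration only: `NullDocGuardSketch`, `DocFilter`, `runFilters` and `index` are hypothetical names, a `Map` stands in for the document, and a `List` stands in for the index writer.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical stand-ins; not the Nutch Indexer or IndexingFilter API. */
final class NullDocGuardSketch {

    /** A filter may return null to signal that the document should be dropped. */
    interface DocFilter {
        Map<String, String> filter(Map<String, String> doc);
    }

    /** Runs the filters in order and stops as soon as one returns null. */
    static Map<String, String> runFilters(List<DocFilter> filters, Map<String, String> doc) {
        for (DocFilter f : filters) {
            doc = f.filter(doc);
            if (doc == null) {
                return null;
            }
        }
        return doc;
    }

    /** Returns true if the document was written, false if it was skipped. */
    static boolean index(List<DocFilter> filters, Map<String, String> doc,
                         List<Map<String, String>> writer) {
        Map<String, String> filtered = runFilters(filters, doc);
        if (filtered == null) {
            return false; // same role as "if (doc == null) return;" in the patch
        }
        writer.add(filtered);
        return true;
    }
}
```

 Without the guard, the code after the filter call would dereference a null document; with it, a null result simply means the document is skipped.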

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (NUTCH-393) Indexer doesn't handle null documents returned by filters

2006-11-07 Thread Enis Soztutar (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-393?page=comments#action_12447787 ] 

Enis Soztutar commented on NUTCH-393:
-

Also, IndexingException is caught by the Indexer, in which case the whole 
document is not added to the writer (the function returns).

Indexer : 334
    try {
      // run indexing filters
      doc = this.filters.filter(doc, parse, (UTF8)key, fetchDatum, inlinks);
    } catch (IndexingException e) {
      if (LOG.isWarnEnabled()) { LOG.warn("Error indexing "+key+": "+e); }
      return;
    }

IndexingException should be caught in IndexingFilters.filter(), so that 
when an IndexingException is thrown in one indexing plugin, the other plugins 
can still run. 
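
A rough sketch of that proposal follows. It is not the actual IndexingFilters code: `LenientFilterChainSketch`, `DocFilter`, `FilterException` and `filterLeniently` are hypothetical names, and a `Map` stands in for the document. The exception is caught per filter inside the loop, so the remaining filters still see the document.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical stand-ins; this is not the Nutch IndexingFilters API. */
final class LenientFilterChainSketch {

    /** Stand-in for a checked exception such as IndexingException. */
    static class FilterException extends Exception {
        FilterException(String message) { super(message); }
    }

    interface DocFilter {
        Map<String, String> filter(Map<String, String> doc) throws FilterException;
    }

    /**
     * Runs every filter; a failing filter is recorded in {@code failures} and
     * skipped. Note that a filter may have partly modified the document before
     * throwing, so the result can reflect an incomplete filter run.
     */
    static Map<String, String> filterLeniently(List<DocFilter> filters,
                                               Map<String, String> doc,
                                               List<String> failures) {
        for (DocFilter f : filters) {
            try {
                doc = f.filter(doc);
            } catch (FilterException e) {
                failures.add(e.getMessage()); // log and continue with the next filter
            }
            if (doc == null) {
                return null; // a filter asked for the document to be dropped
            }
        }
        return doc;
    }
}
```

The trade-off is that the caller gets a document even when some filters did not run, so it has to inspect `failures` if completeness matters.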







[jira] Commented: (NUTCH-393) Indexer doesn't handle null documents returned by filters

2006-11-07 Thread Eelco Lempsink (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-393?page=comments#action_12447939 ] 

Eelco Lempsink commented on NUTCH-393:
--

I'm not sure I agree with that. After running a document through a set of 
filters, you'd expect that all filters ran. If not, that's an exception. For 
instance, your index might depend on all numbers and non-English words being 
stripped. When one of those filters hits an exception but the other one runs, 
your index will become dirty.
