[jira] Commented: (NUTCH-393) Indexer doesn't handle null documents returned by filters
[ https://issues.apache.org/jira/browse/NUTCH-393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494552 ]

Andrzej Bialecki commented on NUTCH-393:
----------------------------------------

I agree with that - either all filters should run or the document should be discarded. If it's acceptable to tolerate exceptions in some indexing filters, such exceptions should be caught there.

Indexer doesn't handle null documents returned by filters
---------------------------------------------------------

Key: NUTCH-393
URL: https://issues.apache.org/jira/browse/NUTCH-393
Project: Nutch
Issue Type: Bug
Components: indexer
Affects Versions: 0.8.1, 0.9.0
Reporter: Eelco Lempsink
Attachments: NUTCH-393.patch

Plugins (like IndexingFilter) may return a null value, but this isn't handled by the Indexer. A trivial adjustment is all it takes:

@@ -237,6 +237,7 @@
         if (LOG.isWarnEnabled()) { LOG.warn("Error indexing "+key+": "+e); }
         return;
       }
+      if (doc == null) return;
       float boost = 1.0f;
       // run scoring filters

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
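To make the control flow in this comment and in the patch concrete, here is a minimal, self-contained sketch. It does not use the real Nutch classes: `NullAwareIndexing`, `DocFilter`, `tolerant`, `runFilters` and `index` are hypothetical names, and a document is modelled as a plain `Map`. It assumes that a filter returns null to reject a document, and that a filter whose failures are tolerable catches its own exceptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration; these are NOT the real Nutch classes.
class NullAwareIndexing {

    /** A filter may return null to signal that the document should be dropped. */
    interface DocFilter {
        Map<String, String> filter(Map<String, String> doc) throws Exception;
    }

    /** Wraps a filter whose failures are tolerable: the exception is caught inside it. */
    static DocFilter tolerant(DocFilter inner) {
        return doc -> {
            try {
                return inner.filter(doc);
            } catch (Exception e) {
                System.err.println("Ignoring optional filter failure: " + e);
                return doc; // pass the document through unchanged
            }
        };
    }

    /** Runs all filters in order; stops and returns null once a filter returns null. */
    static Map<String, String> runFilters(List<DocFilter> filters, Map<String, String> doc)
            throws Exception {
        for (DocFilter f : filters) {
            doc = f.filter(doc);
            if (doc == null) {
                return null;
            }
        }
        return doc;
    }

    /** Returns true only if the document passed every filter and was written. */
    static boolean index(List<DocFilter> filters, Map<String, String> doc,
                         List<Map<String, String>> writer) {
        Map<String, String> filtered;
        try {
            filtered = runFilters(filters, doc);
        } catch (Exception e) {
            System.err.println("Error indexing: " + e);
            return false; // a required filter failed: discard the document
        }
        if (filtered == null) {
            return false; // the null check that the patch adds
        }
        writer.add(filtered);
        return true;
    }

    public static void main(String[] args) {
        List<Map<String, String>> writer = new ArrayList<>();
        Map<String, String> doc = new HashMap<>();
        doc.put("url", "http://example.com/");

        List<DocFilter> filters = new ArrayList<>();
        filters.add(tolerant(d -> { throw new IllegalStateException("optional"); }));
        filters.add(d -> null); // rejects every document

        System.out.println(index(filters, doc, writer)); // false: nothing is written
    }
}
```

Without the null check, the sketch would add a null entry to the writer. In the real Indexer the corresponding failure would presumably surface later as a NullPointerException, although the thread does not spell that out.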
[jira] Commented: (NUTCH-393) Indexer doesn't handle null documents returned by filters
[ http://issues.apache.org/jira/browse/NUTCH-393?page=comments#action_12447787 ]

Enis Soztutar commented on NUTCH-393:
-------------------------------------

Also, IndexingException is caught by the Indexer, in which case the whole document is not added to the writer (the function returns).

Indexer, line 334:

    try {
      // run indexing filters
      doc = this.filters.filter(doc, parse, (UTF8)key, fetchDatum, inlinks);
    } catch (IndexingException e) {
      if (LOG.isWarnEnabled()) { LOG.warn("Error indexing "+key+": "+e); }
      return;
    }

IndexingException should be caught in IndexingFilters.filter(), so that when an IndexingException is thrown in one indexing plugin, the other plugins can still be run.
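The proposal in this comment, catching the exception per filter so that the rest of the chain still runs, could look roughly like the sketch below. It is an illustration only and not the actual IndexingFilters code: `LenientFilterChain`, `TextFilter`, `FilterException` (a stand-in for IndexingException) and `Result` are hypothetical names, and documents are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; this is not the real IndexingFilters class.
class LenientFilterChain {

    /** Stand-in for a checked filter exception such as IndexingException. */
    static class FilterException extends Exception {
        FilterException(String message) { super(message); }
    }

    interface TextFilter {
        String apply(String text) throws FilterException;
    }

    /** Filtered text plus the positions of the filters that threw. */
    static final class Result {
        final String text;
        final List<Integer> failedFilters;

        Result(String text, List<Integer> failedFilters) {
            this.text = text;
            this.failedFilters = failedFilters;
        }
    }

    /** Catches the exception per filter so the remaining filters still run. */
    static Result run(List<TextFilter> filters, String text) {
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < filters.size(); i++) {
            try {
                text = filters.get(i).apply(text);
            } catch (FilterException e) {
                failed.add(i); // remember the failure, keep the previous text
            }
        }
        return new Result(text, failed);
    }

    public static void main(String[] args) {
        List<TextFilter> filters = new ArrayList<>();
        filters.add(t -> t.trim());
        filters.add(t -> { throw new FilterException("simulated plugin failure"); });
        filters.add(t -> t.toUpperCase());

        Result r = run(filters, "  nutch  ");
        System.out.println(r.text + " " + r.failedFilters); // prints: NUTCH [1]
    }
}
```

Returning the list of failed filters alongside the text is a design choice of this sketch. It lets the caller decide whether a partially filtered document is still acceptable.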
[jira] Commented: (NUTCH-393) Indexer doesn't handle null documents returned by filters
[ http://issues.apache.org/jira/browse/NUTCH-393?page=comments#action_12447939 ]

Eelco Lempsink commented on NUTCH-393:
--------------------------------------

I'm not sure I agree with that. After running a document through a set of filters, you'd expect that all filters ran. If not, that's an exception. For instance, your index might depend on all numbers and non-English words being stripped. When one of those filters hits an exception but the other one runs, your index will become dirty.
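The "dirty index" concern can be illustrated with a small sketch that contrasts an all-or-nothing chain with one that skips failing filters. Everything here is hypothetical and unrelated to the Nutch API: the class and filter names are made up, "non-English" is crudely approximated as "contains a non-ASCII character", and the failing number filter is simulated.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.UnaryOperator;

// Hypothetical filters for illustration; none of these names come from Nutch.
class StrictFilterChain {

    /** Collapses whitespace left behind after words are removed. */
    static String tidy(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    /** Removes runs of digits. */
    static final UnaryOperator<String> STRIP_NUMBERS =
            t -> tidy(t.replaceAll("\\d+", ""));

    /** Crude stand-in for "non-English": drops words containing non-ASCII characters. */
    static final UnaryOperator<String> STRIP_NON_ASCII_WORDS =
            t -> tidy(t.replaceAll("\\S*[^\\x00-\\x7F]\\S*", ""));

    /** Simulates the number filter hitting an exception. */
    static final UnaryOperator<String> BROKEN_STRIP_NUMBERS =
            t -> { throw new IllegalStateException("simulated failure"); };

    /** All-or-nothing: returns empty if any filter throws, so the document is discarded. */
    static Optional<String> runStrict(List<UnaryOperator<String>> filters, String text) {
        try {
            for (UnaryOperator<String> f : filters) {
                text = f.apply(text);
            }
            return Optional.of(text);
        } catch (RuntimeException e) {
            return Optional.empty();
        }
    }

    /** Skips failing filters and keeps going, which can leave unfiltered content behind. */
    static String runLenient(List<UnaryOperator<String>> filters, String text) {
        for (UnaryOperator<String> f : filters) {
            try {
                text = f.apply(text);
            } catch (RuntimeException e) {
                // ignore the failure and continue with the next filter
            }
        }
        return text;
    }

    public static void main(String[] args) {
        String input = "price 42 caf\u00e9";
        List<UnaryOperator<String>> healthy = Arrays.asList(STRIP_NUMBERS, STRIP_NON_ASCII_WORDS);
        List<UnaryOperator<String>> broken = Arrays.asList(BROKEN_STRIP_NUMBERS, STRIP_NON_ASCII_WORDS);

        System.out.println(runStrict(healthy, input)); // Optional[price]
        System.out.println(runLenient(broken, input)); // price 42  (the number leaks through)
        System.out.println(runStrict(broken, input));  // Optional.empty
    }
}
```

With the lenient chain the number `42` reaches the index even though the index is supposed to contain no numbers. The strict chain drops the document instead.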