[ https://issues.apache.org/jira/browse/NUTCH-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514592 ]
Doğacan Güney commented on NUTCH-518:
-------------------------------------

> My objection to the original patch was specifically what the
> OPICScoringFilter should do in this case, and not what any
> ScoringFilter should be able to do.

Two common use cases (besides opic) seem to be:

1) Boosters: Boosters check for a specific pattern and, if the pattern exists, add a small boost to the original score.

2) Sinks: These are commonly used to restrict crawls to specific domains. They are used to "pull" the page's score to zero so that the page is never fetched as long as there are other pages to fetch.

If opic adds:
1st use case) returns 0 if it doesn't want to boost, or a positive value.
2nd use case) returns a negative number to sink, or 0.

If opic multiplies:
1st use case) returns 1 if it doesn't want to boost, or >1.
2nd use case) returns 0 to sink, or 1.

As far as I'm concerned, it doesn't matter if opic multiplies or adds. Other scoring filters will have to change their behaviour to work with opic in both cases anyway.

(FWIW, I think multiplication is a tiny bit more elegant. I like the idea of returning 0 to sink better than returning negative numbers.)

> Fix OpicScoringFilter to respect scoring filter chaining
> --------------------------------------------------------
>
>                 Key: NUTCH-518
>                 URL: https://issues.apache.org/jira/browse/NUTCH-518
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.0.0
>            Reporter: Enis Soztutar
>            Assignee: Doğacan Güney
>             Fix For: 1.0.0
>
>         Attachments: opicScoring.chain.patch
>
>
> Opic Scoring returns the score that it calculates, rather than returning
> previous_score * calculated_score. This prevents using another scoring filter
> along with Opic scoring.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
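For illustration only, the two chaining conventions compared in the comment above (additive vs. multiplicative) can be sketched as follows. This is not Nutch code and does not use the ScoringFilter API: the function names, the `example.org` domain, the `/news/` pattern, and the numeric values are all hypothetical, chosen only to show what a booster and a sink would have to return under each convention.

```python
from urllib.parse import urlparse

ALLOWED_HOST = "example.org"   # hypothetical domain the crawl is restricted to
BOOST_PATTERN = "/news/"       # hypothetical pattern a booster looks for


def chain_additive(base_score, url, filters):
    """Additive convention: every filter's return value is added to the score."""
    return base_score + sum(f(url) for f in filters)


def chain_multiplicative(base_score, url, filters):
    """Multiplicative convention: the score is multiplied by every filter's value."""
    score = base_score
    for f in filters:
        score *= f(url)
    return score


# Additive variants: 0 means "no opinion".
def add_booster(url):
    return 0.5 if BOOST_PATTERN in url else 0.0


def add_sink(url):
    # The size of the negative number is an arbitrary assumption.
    return 0.0 if urlparse(url).hostname == ALLOWED_HOST else -1000.0


# Multiplicative variants: 1 means "no opinion".
def mul_booster(url):
    return 1.5 if BOOST_PATTERN in url else 1.0


def mul_sink(url):
    return 1.0 if urlparse(url).hostname == ALLOWED_HOST else 0.0


if __name__ == "__main__":
    inside = "http://example.org/news/a"
    outside = "http://other.com/news/b"
    print(chain_additive(2.0, inside, [add_booster, add_sink]))
    print(chain_additive(2.0, outside, [add_booster, add_sink]))
    print(chain_multiplicative(2.0, inside, [mul_booster, mul_sink]))
    print(chain_multiplicative(2.0, outside, [mul_booster, mul_sink]))
```

In this sketch a multiplicative sink forces the score to exactly zero whatever the other filters return, while an additive sink only works if its negative value is large enough to outweigh the base score and any boosts. That is one way to read the remark that returning 0 to sink is the more elegant option.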
_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers