Scoring API deficiency
----------------------
Key: NUTCH-321
URL: http://issues.apache.org/jira/browse/NUTCH-321
Project: Nutch
Issue Type: Improvement
Affects Versions: 0.8-dev
Reporter: Andrzej Bialecki
Fix For: 0.8-dev
Currently the method ScoringFilter.updateDbScore() doesn't use the "old" value
from the existing CrawlDB. Instead, it uses the value taken from the fetchlist of
the current segment, which is only a snapshot of the "old" value taken at
the moment the fetchlist was generated.
The problem with this approach is that if/when we add the possibility to
interleave generate/fetch/update cycles, the initial score value in the CrawlDatum
instance that comes from the current segment could already be outdated, if
another updatedb was run in the meantime and changed the DB score.
For this reason we should always assume that the value from CrawlDB, if it exists,
represents the most recent version of the CrawlDatum before the update, and use
this instance as the base.
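The proposed rule can be illustrated with a minimal sketch. It does not use Nutch's real classes: `Datum` is a hypothetical stand-in for CrawlDatum that carries only a score, and `chooseBase`/`updatedScore` are hypothetical helpers. The additive "base score plus inlink contributions" update is likewise an assumption made purely for illustration. The point is the base selection: prefer the CrawlDB record when it exists, and fall back to the fetchlist snapshot only for URLs that CrawlDB does not know yet.

```java
import java.util.List;

/**
 * Hypothetical sketch (not Nutch's API): pick the base record for a DB score
 * update, preferring the current CrawlDB value over the fetchlist snapshot.
 */
public class ScoreBaseSketch {

    /** Hypothetical stand-in for CrawlDatum; only the score matters here. */
    static final class Datum {
        final float score;
        Datum(float score) { this.score = score; }
    }

    /**
     * Returns the CrawlDB datum if the URL is already in the DB, otherwise
     * the datum taken from the current segment's fetchlist.
     */
    static Datum chooseBase(Datum fromCrawlDb, Datum fromSegment) {
        return fromCrawlDb != null ? fromCrawlDb : fromSegment;
    }

    /**
     * Hypothetical update: base score plus the contributions of inlinks
     * collected in this segment (the formula is assumed, not Nutch's).
     */
    static float updatedScore(Datum fromCrawlDb, Datum fromSegment,
                              List<Float> inlinkContributions) {
        float score = chooseBase(fromCrawlDb, fromSegment).score;
        for (float c : inlinkContributions) {
            score += c;
        }
        return score;
    }

    public static void main(String[] args) {
        // generate: the fetchlist snapshots the DB score as 1.0
        Datum snapshot = new Datum(1.0f);
        // an interleaved updatedb has since raised the DB score to 1.5
        Datum current = new Datum(1.5f);
        List<Float> inlinks = List.of(0.25f);

        // Base is the CrawlDB value, so the interleaved update is not lost.
        System.out.println(updatedScore(current, snapshot, inlinks)); // 1.75
        // URL not yet in CrawlDB: fall back to the segment value.
        System.out.println(updatedScore(null, snapshot, inlinks));    // 1.25
    }
}
```

Using the snapshot as the base in the first case would yield 1.25 and silently discard the change made by the interleaved updatedb, which is exactly the problem described above.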
--
This message is automatically generated by JIRA.
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers