Scoring API deficiency
----------------------

                 Key: NUTCH-321
                 URL: http://issues.apache.org/jira/browse/NUTCH-321
             Project: Nutch
          Issue Type: Improvement
    Affects Versions: 0.8-dev
            Reporter: Andrzej Bialecki 
             Fix For: 0.8-dev


Currently the method ScoringFilter.updateDbScore() doesn't use the "old" value 
from the existing CrawlDB. Instead it uses the value taken from the fetchlist in 
the current segment, which is only a snapshot of the "old" value taken at 
the moment the fetchlist was generated.

The problem with this approach is that if/when we add the ability to 
interleave generate/fetch/update cycles, the initial score value in the CrawlDatum 
instance that comes from the current segment could already be outdated if 
another updatedb, run in the meantime, has changed the DB score.

For this reason we should always assume that the value from CrawlDB, if it exists, 
represents the most recent version of the CrawlDatum before the update, and use 
this instance as the base.
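
The selection rule proposed above could be sketched roughly as follows. This is only an illustration, not the actual Nutch code: the Datum class is a hypothetical stand-in for CrawlDatum, and chooseBase() is an assumed helper name, not part of the ScoringFilter API.

```java
public class BaseScoreSketch {

    /** Hypothetical stand-in for CrawlDatum; only the score matters here. */
    static final class Datum {
        final float score;

        Datum(float score) {
            this.score = score;
        }
    }

    /**
     * Picks the datum to use as the base for a score update (assumed helper).
     * Prefers the CrawlDB entry, since it reflects any updatedb that ran after
     * the fetchlist was generated; falls back to the fetchlist snapshot only
     * when the URL has no CrawlDB entry yet.
     */
    static Datum chooseBase(Datum fromCrawlDb, Datum fromFetchlist) {
        return fromCrawlDb != null ? fromCrawlDb : fromFetchlist;
    }

    public static void main(String[] args) {
        Datum snapshot = new Datum(1.0f); // score captured when the fetchlist was generated
        Datum current = new Datum(2.5f);  // score after an interleaved updatedb

        // The CrawlDB entry wins when present.
        System.out.println(chooseBase(current, snapshot).score); // prints 2.5

        // No CrawlDB entry: fall back to the fetchlist snapshot.
        System.out.println(chooseBase(null, snapshot).score);    // prints 1.0
    }
}
```

With interleaved cycles, the first case is the one that matters: the snapshot still says 1.0, but the update starts from the 2.5 that is already in the DB.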


        

