Sebastian Nagel created NUTCH-2297:
--------------------------------------

             Summary: CrawlDbReader -stats wrong values for earliest fetch time 
and shortest interval
                 Key: NUTCH-2297
                 URL: https://issues.apache.org/jira/browse/NUTCH-2297
             Project: Nutch
          Issue Type: Bug
          Components: crawldb
    Affects Versions: 1.13
            Reporter: Sebastian Nagel
            Assignee: Sebastian Nagel
            Priority: Minor
             Fix For: 1.13


NUTCH-2286 added min, max and average for fetch interval and fetch time.
When running in distributed mode (not reproducible in local mode), the reported 
minimum values (earliest fetch time and shortest fetch interval) may be wrong 
and implausible, e.g. a shortest interval longer than the longest one:
{noformat}
TOTAL urls: 7180518032
 shortest fetch interval:    175 days, 00:00:00             <<<<<< ????
 avg fetch interval: 10 days, 08:01:36
 longest fetch interval:     15 days, 18:00:00
 earliest fetch time:        Thu Dec 20 05:30:00 UTC 3106   <<<<<< ????
 avg of fetch times: Fri Feb 19 00:07:00 UTC 2016
 latest fetch time:  Mon Jul 18 05:22:00 UTC 2016
 retry 0:    6907984913
 retry 1:    148125397
 retry 2:    82761892
 retry 3:    41645830
 min score:  0.0
 avg score:  0.014360981
 max score:  9.25
 ...
{noformat}
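
The report does not state the root cause. As a rough illustration only, the sketch below shows the property a minimum must satisfy to survive a distributed (combine, then reduce) aggregation: merging per-partition minima has to give the global minimum, and the result should pass a min <= avg <= max sanity check, which the output above fails. The class and method names are hypothetical and are not taken from CrawlDbReader.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Nutch code: shows a min aggregate that stays
// correct when partial results are merged, plus a plausibility check.
public class MinMergeSketch {

    // Minimum of one partition's values (what a combiner-like step would emit).
    static long partialMin(List<Long> values) {
        long min = Long.MAX_VALUE; // identity element for min
        for (long v : values) {
            min = Math.min(min, v);
        }
        return min;
    }

    // Merge partial minima (what a reducer-like step would do).
    // Min is associative, so the same operation can be reused here.
    static long mergeMin(List<Long> partials) {
        return partialMin(partials);
    }

    // Sanity check on reported statistics: min <= avg <= max.
    static boolean plausible(long min, long avg, long max) {
        return min <= avg && avg <= max;
    }

    public static void main(String[] args) {
        // Illustrative interval values in days, split over two partitions.
        List<Long> partitionA = Arrays.asList(30L, 7L, 90L);
        List<Long> partitionB = Arrays.asList(15L, 175L);

        long merged = mergeMin(
            Arrays.asList(partialMin(partitionA), partialMin(partitionB)));
        System.out.println("merged min = " + merged);

        // Values as reported above, in whole days: shortest 175, avg 10, longest 15.
        System.out.println("reported stats plausible: " + plausible(175L, 10L, 15L));
    }
}
```

A check like `plausible` would flag the statistics shown above, since the shortest interval (175 days) exceeds the longest (15 days).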




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
