[ https://issues.apache.org/jira/browse/NUTCH-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel updated NUTCH-2286:
-----------------------------------
    Summary: CrawlDbReader -stats to show fetch time and interval  (was: CrawlDbReader -stats fetch time and interval)

> CrawlDbReader -stats to show fetch time and interval
> ----------------------------------------------------
>
>                 Key: NUTCH-2286
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2286
>             Project: Nutch
>          Issue Type: Improvement
>          Components: crawldb
>    Affects Versions: 1.12
>            Reporter: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.13
>
>
> An overview of fetch times and fetch intervals could be useful when
> configuring a crawl. CrawlDbReader could easily calculate the min, max and
> average of both and show them as part of the statistics job (command-line
> option {{-stats}}):
> {noformat}
> % bin/nutch readdb .../crawldb/ -stats
> ...
> TOTAL urls: 544910
> shortest fetch interval: 7 days, 00:00:00
> avg fetch interval: 7 days, 17:43:58
> longest fetch interval: 10 days, 12:00:00
> earliest fetch time: Wed May 25 11:42:00 CEST 2016
> avg of fetch times: Sun Jun 05 18:11:00 CEST 2016
> latest fetch time: Wed Jun 22 10:25:00 CEST 2016
> ...
> {noformat}
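These statistics need only one pass over the CrawlDb: for each entry, keep a running minimum, maximum, sum and count of the fetch interval and the fetch time, then divide at the end. The Java sketch below illustrates this. The class `FetchStats` and its methods are hypothetical, not the actual Nutch implementation. It assumes the fetch interval is given in seconds and the fetch time in milliseconds since the epoch, which is how Nutch's `CrawlDatum` stores them.

```java
import java.time.Instant;

/** Hypothetical single-pass aggregator for fetch intervals and fetch times (not Nutch code). */
public class FetchStats {
    private long count = 0;
    private int minInterval = Integer.MAX_VALUE; // fetch interval, seconds
    private int maxInterval = Integer.MIN_VALUE;
    private long sumInterval = 0;                // long: an int sum could overflow
    private long minFetchTime = Long.MAX_VALUE;  // fetch time, epoch milliseconds
    private long maxFetchTime = Long.MIN_VALUE;
    private double sumFetchTime = 0;             // double: a long sum of epoch millis can overflow on large CrawlDbs

    /** Records one CrawlDb entry. */
    public void add(int fetchIntervalSecs, long fetchTimeMillis) {
        count++;
        minInterval = Math.min(minInterval, fetchIntervalSecs);
        maxInterval = Math.max(maxInterval, fetchIntervalSecs);
        sumInterval += fetchIntervalSecs;
        minFetchTime = Math.min(minFetchTime, fetchTimeMillis);
        maxFetchTime = Math.max(maxFetchTime, fetchTimeMillis);
        sumFetchTime += fetchTimeMillis;
    }

    /** Merges partial statistics, e.g. computed by different map tasks. */
    public void merge(FetchStats o) {
        count += o.count;
        minInterval = Math.min(minInterval, o.minInterval);
        maxInterval = Math.max(maxInterval, o.maxInterval);
        sumInterval += o.sumInterval;
        minFetchTime = Math.min(minFetchTime, o.minFetchTime);
        maxFetchTime = Math.max(maxFetchTime, o.maxFetchTime);
        sumFetchTime += o.sumFetchTime;
    }

    private void requireEntries() {
        if (count == 0) throw new IllegalStateException("no entries recorded");
    }

    public int minInterval() { requireEntries(); return minInterval; }
    public int maxInterval() { requireEntries(); return maxInterval; }
    public long avgInterval() { requireEntries(); return sumInterval / count; }
    public Instant earliestFetchTime() { requireEntries(); return Instant.ofEpochMilli(minFetchTime); }
    public Instant latestFetchTime() { requireEntries(); return Instant.ofEpochMilli(maxFetchTime); }
    public Instant avgFetchTime() { requireEntries(); return Instant.ofEpochMilli(Math.round(sumFetchTime / count)); }

    /** Formats seconds as "D days, HH:MM:SS", the layout used in the proposed -stats output. */
    public static String formatInterval(long secs) {
        long days = secs / 86400;
        long rem = secs % 86400;
        return String.format("%d days, %02d:%02d:%02d", days, rem / 3600, (rem % 3600) / 60, rem % 60);
    }

    public static void main(String[] args) {
        // Illustrative values only: two partial results, as two map tasks might produce.
        FetchStats part1 = new FetchStats();
        part1.add(7 * 86400, Instant.parse("2016-05-25T09:42:00Z").toEpochMilli());
        FetchStats part2 = new FetchStats();
        part2.add(907200, Instant.parse("2016-06-22T08:25:00Z").toEpochMilli());
        part1.merge(part2);

        System.out.println("shortest fetch interval: " + formatInterval(part1.minInterval()));
        System.out.println("avg fetch interval: " + formatInterval(part1.avgInterval()));
        System.out.println("longest fetch interval: " + formatInterval(part1.maxInterval()));
        System.out.println("earliest fetch time: " + part1.earliestFetchTime());
        System.out.println("avg of fetch times: " + part1.avgFetchTime());
        System.out.println("latest fetch time: " + part1.latestFetchTime());
    }
}
```

Since min, max, sum and count can all be combined from partial results, the same aggregation could plausibly run in a combiner and reducer of the MapReduce statistics job, with the division and formatting done only once at the end.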