[ https://issues.apache.org/jira/browse/NUTCH-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12517843 ]
Doğacan Güney commented on NUTCH-532:
-------------------------------------

Emmanuel, your patch breaks backward compatibility, since old crawl datums can no longer be read correctly (they can still be read, but reading a float as an int may produce gibberish values). You should increase CrawlDatum's VERSION and handle the old float values when reading older records.

Also, as Andrzej has suggested, adding calculateLastFetchTime to CrawlDatum is not a good approach. It works for now because all the fetch schedule implementations calculate the last fetch time in the same way, but in the future someone may write a fetch schedule implementation that calculates it differently. So it is probably best to add it to FetchSchedule. The same goes for setFetchTimeBasedOnInterval (though, after Andrzej's comments, I am not sure it is necessary at all).

> CrawlDbMerger: wrong computation of last fetch time
> ---------------------------------------------------
>
>                 Key: NUTCH-532
>                 URL: https://issues.apache.org/jira/browse/NUTCH-532
>             Project: Nutch
>          Issue Type: Bug
>            Reporter: Emmanuel Joke
>            Assignee: Emmanuel Joke
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-532.patch, NUTCH-532_v2.patch
>
>
> CrawlDbMerger.reduce analyses the last fetch time of each record and keeps
> the more recent record.
> This comparison is based on a FetchInterval in days:
>   resTime = res.getFetchTime() - Math.round(res.getFetchInterval() * 3600 * 24 * 1000);
> It was not really noticeable, as the Math.round method returns
> Integer.MAX_VALUE, i.e. about 25 days.
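
To make the two numeric problems in this thread concrete, here is a small standalone sketch. It is not Nutch code: the class FetchTimeSketch and its method names are hypothetical, and the stream round trip only assumes that an old record stored the interval with writeFloat while newer code reads the same four bytes with readInt. It shows why the expression from the issue saturates at Integer.MAX_VALUE (Math.round(float) returns an int), one possible way to avoid that by widening to double first, and what a float looks like when it is read back as an int.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration class; none of these names exist in Nutch.
class FetchTimeSketch {

  static final long MILLIS_PER_DAY = 24L * 3600L * 1000L;

  // Same arithmetic as the expression quoted in the issue: the product is a
  // float, so Math.round(float) is chosen and the result is clamped to int.
  static long intervalMillisAsInIssue(float intervalDays) {
    return Math.round(intervalDays * 3600 * 24 * 1000);
  }

  // One possible alternative: widen to double so that Math.round(double),
  // which returns a long, is used instead.
  static long intervalMillisWidened(float intervalDays) {
    return Math.round((double) intervalDays * MILLIS_PER_DAY);
  }

  // Writes a float and reads the same four bytes back as an int, which is
  // the assumed situation when a new reader meets an old record.
  static int readFloatAsInt(float value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (DataOutputStream out = new DataOutputStream(bytes)) {
        out.writeFloat(value);
      }
      try (DataInputStream in =
          new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return in.readInt();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    float thirtyDays = 30.0f;
    System.out.println(intervalMillisAsInIssue(thirtyDays)); // 2147483647
    System.out.println(intervalMillisWidened(thirtyDays));   // 2592000000
    System.out.println(readFloatAsInt(thirtyDays));          // 1106247680
  }
}
```

Integer.MAX_VALUE milliseconds is roughly 24.9 days, which matches the "25 days" mentioned in the issue description. The last value shows why a VERSION bump is needed: a 30-day float interval read as an int comes back as a number over one billion.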
_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers