[
https://issues.apache.org/jira/browse/NUTCH-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12517402
]
Andrzej Bialecki commented on NUTCH-532:
-----------------------------------------
Float values were originally intended to express fractions of a day, when fetch
interval was expressed in days, but after we changed the unit to seconds there
is little purpose to it.
However, we need to be careful about the size of the data - long values are ..
long ;), and for all operations that involve CrawlDatum this will have
performance implications. Is it really useful to keep re-fetch interval in
milliseconds? If we limit the resolution to a unit of seconds, as it is now,
then I think an int value should be enough - which means that the
sizeof(CrawlDatum) stays the same.
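To put rough numbers on the size argument, here is a minimal sketch (not Nutch code; the class and method names are made up for illustration). It checks how many days fit into an int counted in seconds, and compares the primitive widths involved.

```java
// Hypothetical helper, only to illustrate the range of an int interval in seconds.
final class IntervalRange {
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    // Largest interval, in whole days, that an int number of seconds can hold.
    static long maxDaysAsIntSeconds() {
        return Integer.MAX_VALUE / SECONDS_PER_DAY;
    }

    // Narrow a long interval to int, failing loudly instead of silently wrapping.
    static int toIntSeconds(long seconds) {
        return Math.toIntExact(seconds); // throws ArithmeticException on overflow
    }

    public static void main(String[] args) {
        System.out.println("max interval as int seconds: " + maxDaysAsIntSeconds() + " days");
        System.out.println("float=" + Float.BYTES + " int=" + Integer.BYTES
                + " long=" + Long.BYTES + " bytes");
    }
}
```

An int in seconds covers roughly 68 years and is the same width as the float it would replace, while a long would add four bytes per field.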
+1 on adding a getLastFetchTime, with a good javadoc that explains the formula
and assumptions. Perhaps it should be called calculateLastFetchTime, to avoid
misunderstandings, because in reality we don't keep that value. The method
should be added to FetchSchedule interface, and it should be implemented in
AbstractFetchSchedule.
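A possible shape for such a method, as a sketch only: the Datum class below is a hypothetical stand-in for CrawlDatum, and the formula assumes that the stored fetch time is the next scheduled fetch and that the interval is an int number of seconds. Neither assumption is confirmed by the actual patch.

```java
// Sketch under stated assumptions; not the real FetchSchedule or CrawlDatum API.
final class LastFetchTimeSketch {

    /** Hypothetical stand-in for CrawlDatum, holding only the two fields used here. */
    static final class Datum {
        final long fetchTimeMs;     // assumed: next scheduled fetch, ms since epoch
        final int fetchIntervalSec; // assumed: re-fetch interval in seconds

        Datum(long fetchTimeMs, int fetchIntervalSec) {
            this.fetchTimeMs = fetchTimeMs;
            this.fetchIntervalSec = fetchIntervalSec;
        }
    }

    /**
     * Derives the last fetch time; the value is calculated, not stored.
     * Assumes fetchTime = lastFetchTime + fetchInterval.
     */
    static long calculateLastFetchTime(Datum datum) {
        // 1000L forces long arithmetic, so the multiplication cannot overflow an int.
        return datum.fetchTimeMs - datum.fetchIntervalSec * 1000L;
    }

    public static void main(String[] args) {
        Datum d = new Datum(1_000_000_000_000L, 30 * 24 * 3600); // 30-day interval
        System.out.println(calculateLastFetchTime(d));
    }
}
```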
Re: datum.setFetchTime - IMHO it's a premature optimization; this expression is
used just twice in the whole code base.
> CrawlDbMerger: wrong computation of last fetch time
> ---------------------------------------------------
>
> Key: NUTCH-532
> URL: https://issues.apache.org/jira/browse/NUTCH-532
> Project: Nutch
> Issue Type: Bug
> Reporter: Emmanuel Joke
> Assignee: Emmanuel Joke
> Fix For: 1.0.0
>
> Attachments: NUTCH-532.patch
>
>
> CrawlDbMerger.reduce analyses the last fetch time of each record and keeps the
> more recent record.
> This comparison is based on a FetchInterval in days: resTime =
> res.getFetchTime() - Math.round(res.getFetchInterval() * 3600 * 24 * 1000);
> It was not really noticeable, as the Math.round method returns
> Integer.MAX_VALUE, i.e. about 25 days.
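The clamping mentioned in the issue description can be reproduced in isolation. This is an illustrative sketch, not code from the patch: the class and method names are invented, and it assumes the interval is a float number of days, as in the quoted expression. Math.round(float) returns an int and saturates at Integer.MAX_VALUE, while Math.round(double) returns a long.

```java
// Illustrative only; names are hypothetical and not taken from CrawlDbMerger.
final class RoundClampDemo {
    static final long MS_PER_DAY = 24L * 3600 * 1000;

    // Mirrors the quoted expression: float arithmetic, then Math.round(float) -> int.
    static int clampedIntervalMs(float days) {
        return Math.round(days * 3600 * 24 * 1000);
    }

    // Widening to double first selects Math.round(double), which returns a long.
    static long intervalMs(float days) {
        return Math.round((double) days * MS_PER_DAY);
    }

    public static void main(String[] args) {
        float days = 30f;
        System.out.println("float round:  " + clampedIntervalMs(days));
        System.out.println("double round: " + intervalMs(days));
        System.out.println("int ceiling in days: " + (double) Integer.MAX_VALUE / MS_PER_DAY);
    }
}
```

Any interval longer than about 24.86 days collapses to the same int ceiling in the first variant, which is why the error was hard to notice.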
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers