[Nutch-dev] [jira] Updated: (NUTCH-532) CrawlDbMerger: wrong computation of last fetch time
[ https://issues.apache.org/jira/browse/NUTCH-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Joke updated NUTCH-532:

    Attachment: NUTCH-532_v2.patch

New patch provided:
* Add new methods to CrawlDatum: calculateLastFetchTime() and setFetchTimeBasedOnInterval(long time).
* Change the Fetch Interval from Float to Int.
* Update the classes affected by the modifications listed above.
* Fix the AdaptiveFetchSchedule and CrawlDbMerger conversion bug.

CrawlDbMerger: wrong computation of last fetch time
---
Key: NUTCH-532
URL: https://issues.apache.org/jira/browse/NUTCH-532
Project: Nutch
Issue Type: Bug
Reporter: Emmanuel Joke
Assignee: Emmanuel Joke
Fix For: 1.0.0
Attachments: NUTCH-532.patch, NUTCH-532_v2.patch

CrawlDbMerger.reduce analyses the last fetch time of each record and keeps the more recent record. This comparison is based on a FetchInterval in days:

    resTime = res.getFetchTime() - Math.round(res.getFetchInterval() * 3600 * 24 * 1000);

It was not really noticeable, as the Math.round method returns Integer.MAX_VALUE (in milliseconds), i.e. about 25 days.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.

___
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
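The clamping mentioned in the issue description can be reproduced in isolation. The sketch below is not taken from any of the attached patches. The class and method names are hypothetical, and the interval is assumed to be a float number of days, as the description suggests. Because the product is a float, Math.round(float) returns an int and saturates at Integer.MAX_VALUE once the product exceeds that value. The second method shows one possible way to avoid the clamp by widening to double, purely for comparison.

```java
public class FetchIntervalOverflow {
    // Milliseconds in one day; fits comfortably in an int.
    static final int MILLIS_PER_DAY = 3600 * 24 * 1000;

    // Same expression shape as in the issue: float * int stays float,
    // and Math.round(float) returns an int clamped to Integer.MAX_VALUE.
    static long lastFetchTimeAsReported(long fetchTime, float intervalDays) {
        return fetchTime - Math.round(intervalDays * 3600 * 24 * 1000);
    }

    // Hypothetical alternative (an assumption, not the patch's approach):
    // widen to double so that Math.round(double) returns a long.
    static long lastFetchTimeWidened(long fetchTime, float intervalDays) {
        return fetchTime - Math.round((double) intervalDays * MILLIS_PER_DAY);
    }

    public static void main(String[] args) {
        long fetchTime = 1_000_000_000_000L; // arbitrary timestamp in ms
        float intervalDays = 30f;            // 30 days is more than Integer.MAX_VALUE ms

        long clamped = fetchTime - lastFetchTimeAsReported(fetchTime, intervalDays);
        long widened = fetchTime - lastFetchTimeWidened(fetchTime, intervalDays);

        System.out.println("subtracted (as reported): " + clamped + " ms");
        System.out.println("subtracted (widened):     " + widened + " ms");
        System.out.println("clamped to Integer.MAX_VALUE: " + (clamped == Integer.MAX_VALUE));
    }
}
```

Integer.MAX_VALUE milliseconds is a little under 25 days, so any interval longer than that is silently treated as the same offset by the reported expression.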
[Nutch-dev] [jira] Updated: (NUTCH-532) CrawlDbMerger: wrong computation of last fetch time
[ https://issues.apache.org/jira/browse/NUTCH-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Joke updated NUTCH-532:

    Attachment: NUTCH-532_v3.patch

My mistake; actually, I'm not really familiar with the VERSION. I made the change as requested, and I hope it is correct now.

CrawlDbMerger: wrong computation of last fetch time
---
Key: NUTCH-532
URL: https://issues.apache.org/jira/browse/NUTCH-532
Project: Nutch
Issue Type: Bug
Reporter: Emmanuel Joke
Assignee: Emmanuel Joke
Fix For: 1.0.0
Attachments: NUTCH-532.patch, NUTCH-532_v2.patch, NUTCH-532_v3.patch
[Nutch-dev] [jira] Updated: (NUTCH-532) CrawlDbMerger: wrong computation of last fetch time
[ https://issues.apache.org/jira/browse/NUTCH-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Joke updated NUTCH-532:

    Attachment: NUTCH-532_v4.patch

I updated the code following Andrzej's comments. I've also updated AbstractFetchSchedule to handle the old maxInterval property.

I noticed that you have removed the old property db.max.fetch.interval from nutch-default.xml. However, the old property db.default.fetch.interval is still in nutch-default.xml. I don't see the point of keeping it. Why don't we remove it from the file?

CrawlDbMerger: wrong computation of last fetch time
---
Key: NUTCH-532
URL: https://issues.apache.org/jira/browse/NUTCH-532
Project: Nutch
Issue Type: Bug
Reporter: Emmanuel Joke
Assignee: Emmanuel Joke
Fix For: 1.0.0
Attachments: NUTCH-532.patch, NUTCH-532_v2.patch, NUTCH-532_v3.patch, NUTCH-532_v4.patch
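Handling an old property alongside its replacement is usually done with a fallback lookup. The sketch below is only an illustration of that idea, not the code in NUTCH-532_v4.patch. It uses java.util.Properties as a stand-in for the real configuration object. The new key name db.fetch.interval.max, the units (old value in days, new value in seconds) and the class and method names are all assumptions made for this example; only db.max.fetch.interval comes from the message above.

```java
import java.util.Properties;

public class MaxIntervalFallback {
    // Old key, as named in the message above.
    static final String OLD_KEY = "db.max.fetch.interval";
    // New key: an assumed name, used here for illustration only.
    static final String NEW_KEY = "db.fetch.interval.max";
    static final int SECONDS_PER_DAY = 24 * 3600;

    // Returns the maximum interval in seconds. Prefers the new key;
    // otherwise converts the old key (assumed to be in days); otherwise
    // falls back to the supplied default.
    static int maxIntervalSeconds(Properties conf, int defaultSeconds) {
        String current = conf.getProperty(NEW_KEY);
        if (current != null) {
            return Integer.parseInt(current.trim());
        }
        String legacy = conf.getProperty(OLD_KEY);
        if (legacy != null) {
            return Math.round(Float.parseFloat(legacy.trim()) * SECONDS_PER_DAY);
        }
        return defaultSeconds;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(OLD_KEY, "90"); // only the old key is set
        System.out.println(maxIntervalSeconds(conf, 0) + " seconds");
    }
}
```

With a lookup of this shape, an existing site configuration that still sets only the old key keeps working, while the new key takes precedence as soon as it is present.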
[Nutch-dev] [jira] Updated: (NUTCH-532) CrawlDbMerger: wrong computation of last fetch time
[ https://issues.apache.org/jira/browse/NUTCH-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Joke updated NUTCH-532:

    Attachment: NUTCH-532.patch

Patch provided.

CrawlDbMerger: wrong computation of last fetch time
---
Key: NUTCH-532
URL: https://issues.apache.org/jira/browse/NUTCH-532
Project: Nutch
Issue Type: Bug
Reporter: Emmanuel Joke
Assignee: Emmanuel Joke
Fix For: 1.0.0
Attachments: NUTCH-532.patch
[Nutch-dev] [jira] Updated: (NUTCH-532) CrawlDbMerger: wrong computation of last fetch time
[ https://issues.apache.org/jira/browse/NUTCH-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Joke updated NUTCH-532:

    Attachment: (was: NUTCH-532.patch)

CrawlDbMerger: wrong computation of last fetch time
---
Key: NUTCH-532
URL: https://issues.apache.org/jira/browse/NUTCH-532
Project: Nutch
Issue Type: Bug
Reporter: Emmanuel Joke
Assignee: Emmanuel Joke
Fix For: 1.0.0
Attachments: NUTCH-532.patch
[Nutch-dev] [jira] Updated: (NUTCH-532) CrawlDbMerger: wrong computation of last fetch time
[ https://issues.apache.org/jira/browse/NUTCH-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Joke updated NUTCH-532:

    Attachment: NUTCH-532.patch

CrawlDbMerger: wrong computation of last fetch time
---
Key: NUTCH-532
URL: https://issues.apache.org/jira/browse/NUTCH-532
Project: Nutch
Issue Type: Bug
Reporter: Emmanuel Joke
Assignee: Emmanuel Joke
Fix For: 1.0.0
Attachments: NUTCH-532.patch