[jira] Commented: (NUTCH-770) Timebomb for Fetcher
[ https://issues.apache.org/jira/browse/NUTCH-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786443#action_12786443 ]

MilleBii commented on NUTCH-770:

Tried it successfully on a Windows platform. It does not work on an Ubuntu, pseudo-distributed Hadoop configuration with mappers running in parallel.

Timebomb for Fetcher

Key: NUTCH-770
URL: https://issues.apache.org/jira/browse/NUTCH-770
Project: Nutch
Issue Type: Improvement
Reporter: Julien Nioche
Assignee: Andrzej Bialecki
Fix For: 1.1
Attachments: log-770, NUTCH-770-v2.patch, NUTCH-770-v3.patch, NUTCH-770.patch

This patch provides the Fetcher with a timebomb mechanism. By default the timebomb is not activated; it can be set using the parameter fetcher.timebomb.mins. The number of minutes is relative to the start of the Fetch job. When the number of minutes is reached, the QueueFeeder skips all remaining entries, then all active queues are purged. This keeps the Fetch step under control and works well in combination with NUTCH-769.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
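The mechanism described in the issue (a deadline relative to the start of the job, after which the feeder skips remaining entries and the queues are purged) can be illustrated roughly as follows. This is a minimal, self-contained sketch and not the code from the NUTCH-770 patch: the class `TimebombSketch`, its method names, and the convention that a non-positive number of minutes means "not activated" are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.LongSupplier;

// Illustrative sketch only; names are hypothetical and not taken from the patch.
class TimebombSketch {

    // Absolute deadline in milliseconds, or -1 when the time bomb is disabled.
    // Assumption: a non-positive number of minutes means "not activated".
    static long deadline(long jobStartMillis, long timebombMins) {
        return timebombMins > 0 ? jobStartMillis + timebombMins * 60_000L : -1L;
    }

    // True once an enabled deadline has been reached.
    static boolean expired(long deadline, long nowMillis) {
        return deadline >= 0 && nowMillis >= deadline;
    }

    // Feeder step: queue entries until the deadline passes, then skip the rest.
    // Returns the number of skipped entries.
    static int feed(List<String> entries, Deque<String> queue,
                    long deadline, LongSupplier clock) {
        int skipped = 0;
        for (String url : entries) {
            if (expired(deadline, clock.getAsLong())) {
                skipped++;
            } else {
                queue.add(url);
            }
        }
        return skipped;
    }

    // Purge step: drop everything still queued. Returns the number of purged entries.
    static int purge(Deque<String> queue) {
        int purged = queue.size();
        queue.clear();
        return purged;
    }

    public static void main(String[] args) {
        long start = 0L;
        long limit = deadline(start, 1); // one minute after job start
        long[] now = {start};
        Deque<String> queue = new ArrayDeque<>();
        List<String> entries =
            List.of("http://a.example/", "http://b.example/", "http://c.example/");
        // Simulated clock: every read advances the time by 40 seconds.
        int skipped = feed(entries, queue, limit,
            () -> { long t = now[0]; now[0] += 40_000L; return t; });
        System.out.println("skipped=" + skipped + " purged=" + purge(queue));
    }
}
```

The clock is passed in as a `LongSupplier` so the behaviour can be exercised without waiting for real time to pass; a real job would presumably read the system clock instead.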
[jira] Commented: (NUTCH-770) Timebomb for Fetcher
[ https://issues.apache.org/jira/browse/NUTCH-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784250#action_12784250 ]

Andrzej Bialecki commented on NUTCH-770:

Fixed in rev. 885776. Thank you!
[jira] Commented: (NUTCH-770) Timebomb for Fetcher
[ https://issues.apache.org/jira/browse/NUTCH-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783638#action_12783638 ]

Andrzej Bialecki commented on NUTCH-770:

> time limit is definitely better than timebomb (but not as amusing).

:) let's go for informative and less confusing now ...

Could you please also add the nutch-default.xml property and its documentation?

Re: FetchQueues - ok, you have a point here.

Re: code style - yes.
[jira] Commented: (NUTCH-770) Timebomb for Fetcher
[ https://issues.apache.org/jira/browse/NUTCH-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783248#action_12783248 ]

Julien Nioche commented on NUTCH-770:

The log simply shows that the patch has not been applied properly. See http://markmail.org/message/wbd3r3t5bfxzkbpn for a discussion on how to apply patches.

It should work fine from the root directory of Nutch with:

    patch -p0 < ~/Desktop/NUTCH-770.patch
[jira] Commented: (NUTCH-770) Timebomb for Fetcher
[ https://issues.apache.org/jira/browse/NUTCH-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783252#action_12783252 ]

MilleBii commented on NUTCH-770:

That's what I did, and I just retried ... so I'm a bit surprised too. Other patches worked fine so far. ???
[jira] Commented: (NUTCH-770) Timebomb for Fetcher
[ https://issues.apache.org/jira/browse/NUTCH-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783283#action_12783283 ]

Andrzej Bialecki commented on NUTCH-770:

I propose to change the name of this functionality - timebomb is not self-explanatory, and it suggests that if you misbehave then your cluster may explode ;) Instead I would use time limit, rename all vars and methods to follow this naming, and document it properly in nutch-default.xml.

A few comments on the patch:

* it has some overlap with NUTCH-769 (the emptyQueue() method), but that's easy to resolve, see also the next point.
* why change the code in FetchQueues at all? Time limit is a global condition, so we could just break the main loop in run() and ignore the QueueFeeder (or not start it at all if the time limit has already passed when run() starts).
* the patch does not follow the code style (notably whitespace in for/while loops and assignments).
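The alternative raised in this comment, treating the time limit as a global condition checked in the main loop instead of inside the queues, might look roughly like the sketch below. It is not Nutch code: `TimeLimitLoopSketch`, its `run` method, the use of a plain `Queue<String>` in place of the fetch queues, and the `-1` "disabled" value are assumptions chosen to keep the example self-contained.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.LongSupplier;

// Illustrative sketch only; names are hypothetical and not taken from Nutch.
class TimeLimitLoopSketch {

    // Processes pending items until the queue is empty or the time limit is hit.
    // timeLimitMillis is an absolute time; -1 is assumed to mean "no limit".
    // Returns the number of items processed.
    static int run(Queue<String> pending, long timeLimitMillis, LongSupplier clock) {
        // If the limit has already passed at start-up, do no work at all
        // (the analogue of never starting the feeder).
        if (timeLimitMillis >= 0 && clock.getAsLong() >= timeLimitMillis) {
            return 0;
        }
        int processed = 0;
        while (!pending.isEmpty()) {
            // Global condition: leave the main loop and ignore whatever remains.
            if (timeLimitMillis >= 0 && clock.getAsLong() >= timeLimitMillis) {
                break;
            }
            pending.poll(); // stand-in for fetching one entry
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        Queue<String> pending = new ArrayDeque<>();
        for (int i = 0; i < 5; i++) {
            pending.add("http://example.org/page" + i);
        }
        long[] now = {0L};
        // Simulated clock: every read advances the time by 10 ms.
        int processed = run(pending, 35L,
            () -> { long t = now[0]; now[0] += 10L; return t; });
        System.out.println("processed=" + processed + " left=" + pending.size());
    }
}
```

With this shape the queue classes need no knowledge of the limit; the trade-off is that entries still queued when the loop exits are simply left unprocessed rather than explicitly purged.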