Can you confirm that duplication is actually happening in the case where one attempt gets underway but is killed before the other completes? I believe that by default (though I'm not sure about Pig), each attempt's output is first isolated to a path keyed to its attempt ID, and is only committed once one, and only one, attempt has completed.
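To make the idea concrete, here is a minimal Python sketch of that commit pattern. It is a simulation on the local filesystem, not Hadoop's actual committer code: the function names, the `_temporary` directory layout, and the `part-00001` file name are all assumptions chosen for illustration.

```python
import os
import shutil
import tempfile

PART_FILE = "part-00001"  # hypothetical output file name for one task


def write_attempt_output(job_dir, attempt_id, records):
    """Write records into a private directory keyed by the attempt id."""
    attempt_dir = os.path.join(job_dir, "_temporary", attempt_id)
    os.makedirs(attempt_dir, exist_ok=True)
    with open(os.path.join(attempt_dir, PART_FILE), "w") as f:
        for record in records:
            f.write(record + "\n")
    return attempt_dir


def abort_attempt(job_dir, attempt_id):
    """Discard the private directory of a killed or losing attempt."""
    attempt_dir = os.path.join(job_dir, "_temporary", attempt_id)
    shutil.rmtree(attempt_dir, ignore_errors=True)


def commit_attempt(job_dir, attempt_id):
    """Promote one attempt's output into the job directory.

    Returns False, and discards the attempt, if another attempt of the
    same task has already been committed.
    """
    src = os.path.join(job_dir, "_temporary", attempt_id, PART_FILE)
    dst = os.path.join(job_dir, PART_FILE)
    if os.path.exists(dst):
        abort_attempt(job_dir, attempt_id)
        return False
    os.rename(src, dst)
    abort_attempt(job_dir, attempt_id)  # remove the now-empty attempt dir
    return True


def run_demo():
    """Race two attempts of one task and commit only the first."""
    job_dir = tempfile.mkdtemp()
    try:
        records = ["a", "b", "c"]
        write_attempt_output(job_dir, "attempt_0", records)
        write_attempt_output(job_dir, "attempt_1", records)
        first = commit_attempt(job_dir, "attempt_0")
        abort_attempt(job_dir, "attempt_1")  # the slower attempt is killed
        with open(os.path.join(job_dir, PART_FILE)) as f:
            committed = f.read().splitlines()
        leftover = os.listdir(os.path.join(job_dir, "_temporary"))
        return first, committed, leftover
    finally:
        shutil.rmtree(job_dir, ignore_errors=True)
```

Calling `run_demo()` leaves a single copy of the records in the job directory even though both attempts wrote them. Note that a scheme like this only covers output written under the attempt's own path; side effects that a streaming command performs elsewhere would presumably not be rolled back when an attempt is killed.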
On Tue, Feb 9, 2010 at 9:52 PM, prasenjit mukherjee <[email protected]> wrote:
> Any thoughts on this problem? I am using a DEFINE command (in Pig)
> and hence the actions are not idempotent. Because of this, duplicate
> execution does have an effect on my results. Is there any way to
> overcome that?
>
> On Tue, Feb 9, 2010 at 9:26 PM, prasenjit mukherjee
> <[email protected]> wrote:
> > But the second attempted job got killed even before the first one was
> > completed. How can we explain that?
> >
> > On Tue, Feb 9, 2010 at 7:38 PM, Eric Sammer <[email protected]> wrote:
> >> Prasen:
> >>
> >> This is most likely speculative execution. Hadoop fires up multiple
> >> attempts for the same task and lets them "race" to see which finishes
> >> first and then kills the others. This is meant to speed things along.
> >>
> >> Speculative execution is on by default, but can be disabled. See the
> >> configuration reference for mapred-*.xml.
> >>
> >> On 2/9/10 9:03 AM, prasenjit mukherjee wrote:
> >>> Sometimes for the same task I see that a duplicate task gets run on a
> >>> different machine and gets killed later. Not always, but sometimes. Any
> >>> reason why duplicate tasks get run? I thought tasks are duplicated
> >>> only if the first attempt either exits (exceptions etc.) or exceeds
> >>> mapred.task.timeout. In this case neither of those happens. As can be
> >>> seen from the timestamps, the second attempt starts even though the
> >>> first attempt is still running (only for 1 minute).
> >>>
> >>> Any explanation?
> >>>
> >>> attempt_201002090552_0009_m_000001_0
> >>> /default-rack/ip-10-242-142-193.ec2.internal
> >>> SUCCEEDED
> >>> 100.00%
> >>> 9-Feb-2010 07:04:37
> >>> 9-Feb-2010 07:07:00 (2mins, 23sec)
> >>>
> >>> attempt_201002090552_0009_m_000001_1
> >>> Task attempt: /default-rack/ip-10-212-147-129.ec2.internal
> >>> Cleanup Attempt: /default-rack/ip-10-212-147-129.ec2.internal
> >>> KILLED
> >>> 100.00%
> >>> 9-Feb-2010 07:05:34
> >>> 9-Feb-2010 07:07:10 (1mins, 36sec)
> >>>
> >>> -Prasen
> >>>
> >>
> >>
> >> --
> >> Eric Sammer
> >> [email protected]
> >> http://esammer.blogspot.com
> >>
> >
>
