Is there any writeup on the rules/conditions under which speculative execution kicks in? I searched the net and found only some JIRA references. It looks like it's based on many factors. Still, if I can get hold of some basic rules, that would be great.

Thanks,
Prasen

On Wed, Feb 10, 2010 at 8:40 AM, Dmitriy Ryaboy <[email protected]> wrote:
> turn off speculative execution.
> but your map tasks should be idempotent. If they are not, rethink.
> Speculative execution is a good thing (and so is preemption, it's
> eviller twin).
>
> -D
>
> On Tue, Feb 9, 2010 at 6:52 PM, prasenjit mukherjee
> <[email protected]> wrote:
>> Any thoughts on this problem ? I am using a DEFINE command ( in PIG )
>> and hence the actions are not idempotent. Because of which duplicate
>> execution does have an affect on my results. Any way to overcome that
>> ?
>>
>> On Tue, Feb 9, 2010 at 9:26 PM, prasenjit mukherjee
>> <[email protected]> wrote:
>>> But the second attempted job got killed even before the first one was
>>> completed. How can we explain that.
>>>
>>> On Tue, Feb 9, 2010 at 7:38 PM, Eric Sammer <[email protected]> wrote:
>>>> Prasen:
>>>>
>>>> This is most likely speculative execution. Hadoop fires up multiple
>>>> attempts for the same task and lets them "race" to see which finishes
>>>> first and then kills the others. This is meant to speed things along.
>>>>
>>>> Speculative execution is on by default, but can be disabled. See the
>>>> configuration reference for mapred-*.xml.
>>>>
>>>> On 2/9/10 9:03 AM, prasenjit mukherjee wrote:
>>>>> Sometimes for the same task I see that a duplicate task gets run on a
>>>>> different machine and gets killed later. Not always but sometimes. Any
>>>>> reason why duplicate tasks get run. I thought tasks are duplicated
>>>>> only if either the first attempt exits( exceptions etc ) or exceeds
>>>>> mapred.task.timeout. In this case none of them happens. As can be seen
>>>>> from timestamp, the second attempt starts even though the first
>>>>> attempt is still running ( only for 1 minute ).
>>>>>
>>>>> Any explanation ?
>>>>>
>>>>> attempt_201002090552_0009_m_000001_0
>>>>> /default-rack/ip-10-242-142-193.ec2.internal
>>>>> SUCCEEDED
>>>>> 100.00%
>>>>> 9-Feb-2010 07:04:37
>>>>> 9-Feb-2010 07:07:00 (2mins, 23sec)
>>>>>
>>>>> attempt_201002090552_0009_m_000001_1
>>>>> Task attempt: /default-rack/ip-10-212-147-129.ec2.internal
>>>>> Cleanup Attempt: /default-rack/ip-10-212-147-129.ec2.internal
>>>>> KILLED
>>>>> 100.00%
>>>>> 9-Feb-2010 07:05:34
>>>>> 9-Feb-2010 07:07:10 (1mins, 36sec)
>>>>>
>>>>> -Prasen
>>>>>
>>>>
>>>> --
>>>> Eric Sammer
>>>> [email protected]
>>>> http://esammer.blogspot.com
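
For reference, the switch Eric points to in mapred-*.xml might look like the fragment below. This is only a sketch: the thread does not name the properties, so the two names here are an assumption based on the Hadoop 0.20-era configuration, and the choice of mapred-site.xml as the file is likewise assumed. Check the configuration reference for the version in use before relying on them.

```xml
<!-- mapred-site.xml (assumed location) -->
<!-- Property names are assumed from Hadoop 0.20-era defaults; verify for your version. -->
<configuration>
  <!-- Do not launch duplicate ("speculative") attempts for map tasks -->
  <property>
    <name>mapred.map.tasks.speculative.execution</name>
    <value>false</value>
  </property>
  <!-- Do not launch duplicate ("speculative") attempts for reduce tasks -->
  <property>
    <name>mapred.reduce.tasks.speculative.execution</name>
    <value>false</value>
  </property>
</configuration>
```

Turning this off only stops the racing duplicates. A task that fails can still be retried, so non-idempotent map tasks remain a risk, as Dmitriy notes.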
