The task IDs are different, so the node was killed after failing on 3 different(!) reduce tasks, not on 3 attempts of the same task. Reduce task 48 was probably resubmitted to another node.
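
To make the point about the IDs concrete, here is a minimal sketch that splits the three task IDs from the report into their parts. It assumes the usual Hadoop layout `task_<jobtracker start>_<job number>_<m|r>_<task number>`; the helper name `parse_task_id` is made up for illustration and is not part of any Hadoop or Hive API.

```python
import re

# Assumed layout: task_<jobtracker start>_<job number>_<m|r>_<task number>
TASK_ID = re.compile(r"task_(\d+)_(\d+)_([mr])_(\d+)$")


def parse_task_id(task_id):
    """Split a task ID into job, task type and task number (hypothetical helper)."""
    match = TASK_ID.match(task_id)
    if match is None:
        raise ValueError(f"unrecognised task id: {task_id}")
    jt_start, job_number, kind, task_number = match.groups()
    return {
        "job": f"job_{jt_start}_{job_number}",
        "type": "reduce" if kind == "r" else "map",
        "task": int(task_number),
    }


# The three IDs from the failure list in the original message.
failed = [
    "task_201312241250_46714_r_000048",
    "task_201312241250_46714_r_000049",
    "task_201312241250_46714_r_000050",
]

parsed = [parse_task_id(t) for t in failed]
jobs = {p["job"] for p in parsed}
distinct_tasks = sorted({p["task"] for p in parsed})

print(jobs)
print(distinct_tasks)
```

All three entries belong to the same job but carry different task numbers, which is why they read as three separate reduce tasks failing on that node rather than repeated attempts of one task.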
2014-03-27 10:22 GMT+01:00 Krishna Rao <[email protected]>:

> Hi,
>
> we have a daily Hive script that usually takes a few hours to run. The
> other day I notice one of the jobs was taking in excess of a few hours.
> Digging into it I saw that there were 3 attempts to launch a job on a
> single node:
>
> Task Id                           Start Time  Finish Time  Error
> task_201312241250_46714_r_000048                           Error launching task
> task_201312241250_46714_r_000049                           Error launching task
> task_201312241250_46714_r_000050                           Error launching task
>
> I later found out that this node had a dodgy/unresponsive disk (still
> being tested right now).
>
> We've seen tasks fail in the past, but re-submitted to another node and
> succeeding. So, shouldn't this task have been kicked off on another node
> after the first failure? Is there anything I could be missing in terms of
> configuration that should be set?
>
> We're using CDH4.4.0.
>
> Cheers,
>
> Krishna
