> 0.23.1 with Pig 0.10.0 on top.

Ok.
> How is the preemption supposed to work? Is a single reducer supposed to
> be preempted or will a batch of reducers be preempted?

A batch of reducers. Enough reducers will be killed to accommodate any/all pending map-tasks.

> Also, when you say preemption, do you mean that the current execution of a
> reducer is actually paused and resumed again later? Or, does preemption mean
> that the reducer's container is discarded and must be started again from
> scratch?

No, by preempted, I mean that the current reduce tasks are killed. Because MapReduce tolerates an arbitrary number of killed task-attempts (as opposed to failed task-attempts), this is okay. So yes, the reducers will start all over again when they get rescheduled.

> Do you know of any doc on the specifics of task scheduling? Would you
> say that the example I gave is in line with how scheduling is intended?

We don't have docs on task-level scheduling, but you can look at RMContainerAllocator.java and the related classes in the MRAppMaster (i.e. the hadoop-mapreduce-client-app/ module) to understand this. And no, as I mentioned before, scheduling isn't random: maps run first, with a slow reduce ramp-up as maps finish.

> FYI: the starvation issue is a known bug
> (https://issues.apache.org/jira/browse/MAPREDUCE-4299).

I mistook that you were using the capacity-scheduler. There were other such bugs in both the Fifo and Capacity schedulers which have since been fixed (not sure of the fixed-version). We've tested the Capacity scheduler a lot more; if you can, pick up the latest version - 0.23.2/branch-0.23.

HTH,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/
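
[Editor's note] For readers following along, here is a minimal, self-contained sketch of the preemption decision described above: kill just enough running reducers that every pending map can get a container. The class and method names are illustrative assumptions, not the actual RMContainerAllocator API; the real logic lives in the hadoop-mapreduce-client-app/ module.

// Illustrative sketch only, NOT the real RMContainerAllocator code.
// Killed attempts (unlike failed ones) don't count against the job,
// so preempted reducers are simply rescheduled later and rerun from scratch.
public class ReducerPreemptionSketch {

    // How many running reducers to kill, given the number of pending maps,
    // the free container headroom, and the number of reducers currently running.
    static int reducersToPreempt(int pendingMaps, int headroom, int runningReducers) {
        if (pendingMaps <= headroom) {
            return 0; // enough free capacity, nothing to preempt
        }
        int shortfall = pendingMaps - headroom;
        return Math.min(shortfall, runningReducers); // kill only as many as needed
    }

    public static void main(String[] args) {
        // 10 maps waiting, no free containers, 6 reducers running
        // -> all 6 reducers are killed and will be rescheduled later.
        System.out.println(reducersToPreempt(10, 0, 6));
    }
}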
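
[Editor's note] On the reduce ramp-up mentioned above: the point at which reducers start being scheduled is governed by the mapreduce.job.reduce.slowstart.completedmaps property (the fraction of maps that must complete before reducers are requested); the gradual ramp-up beyond that point is what the AM's allocator implements. A small example of setting it from a client job, with an arbitrary value chosen for illustration:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SlowstartExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Don't schedule any reducers until 80% of the maps have finished.
        conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 0.80f);
        Job job = Job.getInstance(conf, "slowstart-example");
        // ...set mapper, reducer, input and output as usual, then submit.
    }
}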
