Hey Matt,

It's possibly due to your YARN config: the YARN/MapRed ACLs, the YARN scheduler config, or Cgroups (in case they're enabled) not being set up correctly. We could dig in more if we had the yarn-site.xml and scheduler config files.
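For example, the first things I'd check in yarn-site.xml are the ACL, scheduler and Cgroups settings. The property names below are the standard YARN ones; the values are purely illustrative, not a recommendation for your cluster:

  <property>
    <name>yarn.acl.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.admin.acl</name>
    <value>yarn,mapred</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <!-- only relevant if Cgroups are enabled -->
  <property>
    <name>yarn.nodemanager.container-executor.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
  </property>

If ACLs are in play, the per-queue ACLs in the scheduler config (fair-scheduler.xml or capacity-scheduler.xml, depending on which scheduler you run) are worth a look as well.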
Thanks,
Nat

On Mon, Jun 15, 2015 at 10:39 PM, Matt K <[email protected]> wrote:

> I see there's 2 threads - one that kicks off the mappers, and another that
> kicks off reducers. The one that kicks off the mappers got stuck. It's not
> yet clear to me where it got stuck exactly.
>
> On Tue, Jun 16, 2015 at 1:11 AM, Matt K <[email protected]> wrote:
>
>> Hi all,
>>
>> I'm dealing with a production issue, any help would be appreciated. I am
>> seeing very strange behavior in the TaskTrackers. After they pick up the
>> task, it never comes out of the UNASSIGNED state, and the task just gets
>> killed 10 minutes later.
>>
>> 2015-06-16 02:42:21,114 INFO org.apache.hadoop.mapred.TaskTracker:
>> LaunchTaskAction (registerTask): attempt_201506152116_0046_m_000286_0
>> task's state:UNASSIGNED
>> 2015-06-16 02:52:21,805 INFO org.apache.hadoop.mapred.TaskTracker:
>> attempt_201506152116_0046_m_000286_0: Task
>> attempt_201506152116_0046_m_000286_0 failed to report status for 600
>> seconds. Killing!
>>
>> Normally, I would see the following in the logs:
>>
>> 2015-06-16 04:30:32,328 INFO org.apache.hadoop.mapred.TaskTracker: Trying
>> to launch : attempt_201506152116_0062_r_000004_0 which needs 1 slots
>>
>> However, it doesn't get this far for these particular tasks. I am
>> perusing the source code here, and this doesn't seem to be possible:
>>
>> http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-737/org/apache/hadoop/mapred/TaskTracker.java#TaskTracker.TaskLauncher.0tasksToLaunch
>>
>> The code does something like this:
>>
>> public void addToTaskQueue(LaunchTaskAction action) {
>>   synchronized (tasksToLaunch) {
>>     TaskInProgress tip = registerTask(action, this);
>>     tasksToLaunch.add(tip);
>>     tasksToLaunch.notifyAll();
>>   }
>> }
>>
>> The following should pick it up:
>>
>> public void run() {
>>   while (!Thread.interrupted()) {
>>     try {
>>       TaskInProgress tip;
>>       Task task;
>>       synchronized (tasksToLaunch) {
>>         while (tasksToLaunch.isEmpty()) {
>>           tasksToLaunch.wait();
>>         }
>>         // get the TIP
>>         tip = tasksToLaunch.remove(0);
>>         task = tip.getTask();
>>         LOG.info("Trying to launch : " + tip.getTask().getTaskID() +
>>                  " which needs " + task.getNumSlotsRequired() + " slots");
>>       }
>>
>> What's even stranger is that this is happening for Map tasks only. Reduce
>> tasks are fine.
>>
>> This is only happening on a handful of the nodes, but enough to either slow
>> down jobs or cause them to fail.
>>
>> We're running Hadoop 2.3.0-cdh5.0.2
>>
>> Thanks,
>>
>> -Matt
>>
>
>
> --
> www.calcmachine.com - easy online calculator.
