Hi Nat, I should've mentioned this before. We're running MRv1.

On Tue, Jun 16, 2015 at 2:24 AM, nataraj jonnalagadda <
[email protected]> wrote:

> Hey Matt,
>
> It's possibly due to your YARN config: the YARN/MapRed ACLs, the YARN
> scheduler config, or Cgroups (if enabled) not being set up correctly. We
> could dig in more if we had the yarn-site.xml and scheduler conf files.
>
>
> Thanks,
> Nat.
>
>
>
> On Mon, Jun 15, 2015 at 10:39 PM, Matt K <[email protected]> wrote:
>
>> I see there are two threads - one that kicks off the mappers and another
>> that kicks off the reducers. The one that kicks off the mappers got stuck.
>> It's not yet clear to me where exactly it got stuck.
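>>
>> As far as I can tell from the TaskTracker source, those two threads are
>> just two launcher instances, one per task type, each draining its own
>> queue, which would also explain why only map tasks are affected. A
>> stripped-down sketch of that layout (made-up names, not the real Hadoop
>> classes):
>>
>>     // One launcher thread per task type, each with its own queue: a wedged
>>     // map launcher strands map attempts while reduces keep launching.
>>     import java.util.concurrent.BlockingQueue;
>>     import java.util.concurrent.LinkedBlockingQueue;
>>
>>     public class TwoLaunchersSketch {
>>
>>       static Thread startLauncher(String name, BlockingQueue<String> queue) {
>>         Thread t = new Thread(() -> {
>>           try {
>>             while (true) {
>>               // Blocks until a task attempt of this type is queued.
>>               String attemptId = queue.take();
>>               System.out.println(name + " launching " + attemptId);
>>             }
>>           } catch (InterruptedException e) {
>>             // shutting down
>>           }
>>         }, name);
>>         t.start();
>>         return t;
>>       }
>>
>>       public static void main(String[] args) throws Exception {
>>         BlockingQueue<String> mapQueue = new LinkedBlockingQueue<>();
>>         BlockingQueue<String> reduceQueue = new LinkedBlockingQueue<>();
>>         Thread mapLauncher = startLauncher("mapLauncher", mapQueue);
>>         Thread reduceLauncher = startLauncher("reduceLauncher", reduceQueue);
>>
>>         // Incoming LaunchTaskActions are routed by task type, so a stuck
>>         // mapLauncher thread only strands map attempts in UNASSIGNED.
>>         mapQueue.put("attempt_201506152116_0046_m_000286_0");
>>         reduceQueue.put("attempt_201506152116_0062_r_000004_0");
>>
>>         Thread.sleep(500);
>>         mapLauncher.interrupt();
>>         reduceLauncher.interrupt();
>>       }
>>     }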
>>
>> On Tue, Jun 16, 2015 at 1:11 AM, Matt K <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> I'm dealing with a production issue; any help would be appreciated. I'm
>>> seeing very strange behavior in the TaskTrackers. After a TaskTracker picks
>>> up a task, the task never comes out of the UNASSIGNED state, and it just
>>> gets killed 10 minutes later.
>>>
>>> 2015-06-16 02:42:21,114 INFO org.apache.hadoop.mapred.TaskTracker:
>>> LaunchTaskAction (registerTask): attempt_201506152116_0046_m_000286_0
>>> task's state:UNASSIGNED
>>> 2015-06-16 02:52:21,805 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201506152116_0046_m_000286_0: Task
>>> attempt_201506152116_0046_m_000286_0 failed to report status for 600
>>> seconds. Killing!
>>>
>>> Normally, I would see the following in the logs:
>>>
>>> 2015-06-16 04:30:32,328 INFO org.apache.hadoop.mapred.TaskTracker:
>>> Trying to launch : attempt_201506152116_0062_r_000004_0 which needs 1 slots
>>>
>>> However, it doesn't get that far for these particular tasks. I've been
>>> reading the source code here, and I don't see how that's possible:
>>>
>>> http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-737/org/apache/hadoop/mapred/TaskTracker.java#TaskTracker.TaskLauncher.0tasksToLaunch
>>>
>>> The code does something like this:
>>>
>>>     public void addToTaskQueue(LaunchTaskAction action) {
>>>       synchronized (tasksToLaunch) {
>>>         TaskInProgress tip = registerTask(action, this);
>>>         tasksToLaunch.add(tip);
>>>         tasksToLaunch.notifyAll();
>>>       }
>>>     }
>>>
>>> The following should pick it up:
>>>
>>>     public void run() {
>>>       while (!Thread.interrupted()) {
>>>         try {
>>>           TaskInProgress tip;
>>>           Task task;
>>>           synchronized (tasksToLaunch) {
>>>             while (tasksToLaunch.isEmpty()) {
>>>               tasksToLaunch.wait();
>>>             }
>>>             //get the TIP
>>>             tip = tasksToLaunch.remove(0);
>>>             task = tip.getTask();
>>>             LOG.info("Trying to launch : " + tip.getTask().getTaskID() +
>>>                      " which needs " + task.getNumSlotsRequired() + " slots");
>>>           }
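>>>
>>> Here's a toy, self-contained version of that handshake (made-up names, not
>>> the real TaskTracker code). Once notifyAll() fires, the launcher should
>>> dequeue the task almost immediately, so for an attempt to sit in UNASSIGNED
>>> the launcher thread itself has to be parked somewhere else. In the sketch
>>> that "somewhere else" is a free-slot wait for an earlier attempt - if I'm
>>> reading the rest of run() correctly, the real launcher waits for free slots
>>> right after the "Trying to launch" line - but a jstack of the TaskTracker
>>> would show where it's actually stuck:
>>>
>>>     import java.util.ArrayList;
>>>     import java.util.List;
>>>     import java.util.concurrent.atomic.AtomicInteger;
>>>
>>>     public class StuckLauncherSketch {
>>>       private final List<String> tasksToLaunch = new ArrayList<String>();
>>>       private final AtomicInteger numFreeSlots = new AtomicInteger(0); // never frees up
>>>
>>>       // Mirrors addToTaskQueue(): once queued, the attempt is UNASSIGNED
>>>       // until the launcher thread gets around to it.
>>>       public void addToTaskQueue(String attemptId) {
>>>         synchronized (tasksToLaunch) {
>>>           tasksToLaunch.add(attemptId);
>>>           tasksToLaunch.notifyAll();
>>>         }
>>>       }
>>>
>>>       // Mirrors the launcher's run(): dequeue, log, then wait for a slot.
>>>       public void runLauncher() throws InterruptedException {
>>>         while (!Thread.interrupted()) {
>>>           String attemptId;
>>>           synchronized (tasksToLaunch) {
>>>             while (tasksToLaunch.isEmpty()) {
>>>               tasksToLaunch.wait();
>>>             }
>>>             attemptId = tasksToLaunch.remove(0);
>>>             System.out.println("Trying to launch : " + attemptId);
>>>           }
>>>           synchronized (numFreeSlots) {
>>>             while (numFreeSlots.get() < 1) {
>>>               numFreeSlots.wait(); // the launcher thread is now stuck here
>>>             }
>>>           }
>>>         }
>>>       }
>>>
>>>       public static void main(String[] args) throws Exception {
>>>         StuckLauncherSketch tt = new StuckLauncherSketch();
>>>         Thread mapLauncher = new Thread(() -> {
>>>           try { tt.runLauncher(); } catch (InterruptedException e) { /* exit */ }
>>>         }, "mapLauncher");
>>>         mapLauncher.start();
>>>
>>>         // A hypothetical earlier attempt: it gets logged, then the launcher
>>>         // blocks on the slot wait.
>>>         tt.addToTaskQueue("attempt_201506152116_0046_m_000285_0");
>>>         // The stuck attempt from the log above: it never reaches
>>>         // "Trying to launch" and stays UNASSIGNED.
>>>         tt.addToTaskQueue("attempt_201506152116_0046_m_000286_0");
>>>
>>>         Thread.sleep(1000);
>>>         mapLauncher.interrupt();
>>>       }
>>>     }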
>>>
>>> What's even stranger is that this is happening for Map tasks only. Reduce 
>>> tasks are fine.
>>>
>>> This is only happening on a handful of the nodes, but enough to either slow 
>>> down jobs or cause them to fail.
>>>
>>> We're running Hadoop 2.3.0-cdh5.0.2
>>>
>>> Thanks,
>>>
>>> -Matt
>>>
>>>
>>
>>
>> --
>> www.calcmachine.com - easy online calculator.
>>
>
>


-- 
www.calcmachine.com - easy online calculator.
