Srikanth Sampath created MAPREDUCE-6754:
-------------------------------------------

             Summary: Container Ids for a yarn application should be 
monotonically increasing in the scope of the application
                 Key: MAPREDUCE-6754
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6754
             Project: Hadoop Map/Reduce
          Issue Type: Sub-task
            Reporter: Srikanth Sampath
            Assignee: Srikanth Sampath


Currently across application attempts, container Ids are reused.  The container 
id is stored in AppSchedulingInfo and it is reinitialized with every 
application attempt.  So the containerId scope is limited to the application 
attempt.

In the MR Framework, It is important to note that the containerId is being used 
as part of the JvmId.  JvmId has 3 components <jobId, "m/r?", containerId>.  
The JvmId is used in datastructures in TaskAttemptListener and is passed 
between the AppMaster and the individual tasks.  For an application attempt, no 
two tasks have the same JvmId.

This works well currently, since inflight tasks get killed if the AppMaster 
goes down.  However, if we want to enable WorkPreserving nature for the AM, 
containers (and hence containerIds) live across application attempts.  If we 
recycle containerIds across attempts, then two independent tasks (one inflight 
from a previous attempt  and another new in a succeeding attempt) can have the 
same JvmId and cause havoc.

This can be solved in two ways:

*Approach A*: Include attempt id as part of the JvmId. This is a viable 
solution, however, there is a change in the format of the JVMid. Changing 
something that has existed so long for an optional feature is not persuasive.

*Approach B*: Keep the container id to be a monotonically increasing id for the 
life of an application. So, container ids are not reused across application 
attempts containers should be able to outlive an application attempt. This is 
the preferred approach as it is clean, logical and is backwards compatible. 
Nothing changes for existing applications or the internal workings.  
*How this is achieved:*
Currently, we maintain latest containerId only for application attempts and 
reinitialize for new attempts.  With this approach, we retrieve the latest 
containerId from the just-failed attempt and initialize the new attempt with 
the latest containerId (instead of 0).   I can provide the patch if it helps.  
It currently exists in MAPREDUCE-6726



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org

Reply via email to