[ 
https://issues.apache.org/jira/browse/YARN-3389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong updated YARN-3389:
---------------------------
    Description: In AttemptFailedTransition, the new attempt gets references to 
two pieces of state ('justFinishedContainers' and 'finishedContainersSentToAM') 
from the failed attempt, so these attempts share both (as do any previous 
attempts). Suppose two or more CONTAINER_FINISHED events for different attempts 
are handled at the same time, and the containers ran on the same node. The 
attempts will then concurrently update the value stored under the same key in 
'justFinishedContainers'. Although 'justFinishedContainers' is a 
ConcurrentHashMap, operations on its values (of type 'List<ContainerStatus>') 
are not atomic. For example, 
{code}appAttempt.justFinishedContainers.get(containerFinishedEvent.getNodeId()).add(containerFinishedEvent.getContainerStatus()){code}
 is not atomic.  (was: In AttemptFailedTransition, the new attempt will get 
state('justFinishedContainers' and 'finishedContainersSentToAM') reference from 
the failed attempt. Then the two attempts might operate on these two variables 
concurrently, e.g. they might update 'justFinishedContainers' concurrently when 
they are both handling CONTAINER_FINISHED event.)

> Avoid race conditions when attempts operate on shared states concurrently
> -------------------------------------------------------------------------
>
>                 Key: YARN-3389
>                 URL: https://issues.apache.org/jira/browse/YARN-3389
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Jun Gong
>            Assignee: Jun Gong
>         Attachments: YARN-3389.01.patch
>
>
> In AttemptFailedTransition, the new attempt gets references to two pieces of 
> state ('justFinishedContainers' and 'finishedContainersSentToAM') from the 
> failed attempt, so these attempts share both (as do any previous attempts). 
> Suppose two or more CONTAINER_FINISHED events for different attempts are 
> handled at the same time, and the containers ran on the same node. The 
> attempts will then concurrently update the value stored under the same key in 
> 'justFinishedContainers'. Although 'justFinishedContainers' is a 
> ConcurrentHashMap, operations on its values (of type 'List<ContainerStatus>') 
> are not atomic. For example, 
> {code}appAttempt.justFinishedContainers.get(containerFinishedEvent.getNodeId()).add(containerFinishedEvent.getContainerStatus()){code}
>  is not atomic.
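
To illustrate the kind of race described above, here is a minimal, self-contained sketch of one way to make the per-key list update atomic. It is not the content of YARN-3389.01.patch. The class SharedStateSketch, its methods, and the use of String in place of NodeId and ContainerStatus are all hypothetical stand-ins. The sketch relies on ConcurrentHashMap.compute(), which performs the whole remapping function atomically for a given key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SharedStateSketch {
    // Hypothetical stand-in for 'justFinishedContainers': node id -> statuses.
    private final ConcurrentMap<String, List<String>> justFinished =
        new ConcurrentHashMap<>();

    // Unlike get(nodeId).add(status), the list is created and mutated inside
    // compute(), so two writers for the same key cannot interleave.
    public void addFinished(String nodeId, String status) {
        justFinished.compute(nodeId, (key, list) -> {
            if (list == null) {
                list = new ArrayList<>();
            }
            list.add(status);
            return list;
        });
    }

    // Reads the list size under the same per-key atomicity as the writers.
    public int countFor(String nodeId) {
        int[] size = {0};
        justFinished.computeIfPresent(nodeId, (key, list) -> {
            size[0] = list.size();
            return list;
        });
        return size[0];
    }

    // Simulates several "attempts" reporting finished containers on one node.
    public static int simulate(int threads, int perThread) {
        SharedStateSketch state = new SharedStateSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    state.addFinished("node-1", "attempt-" + id + "-container-" + i);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return state.countFor("node-1");
    }

    public static void main(String[] args) {
        System.out.println(simulate(4, 1000));
    }
}
```

With the unsynchronized get(...).add(...) pattern, concurrent writers on a plain ArrayList can lose entries or throw. In this sketch every access to a list goes through compute() or computeIfPresent(), so no entries are lost. Any code that reads or drains the lists would need the same discipline (or a synchronized list) for this to hold.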



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
