[jira] [Commented] (YARN-5447) Consider including allocationRequestId in NMContainerStatus to allow recovery in case of RM failover

2017-11-04 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16239034#comment-16239034
 ] 

Arun Suresh commented on YARN-5447:
---

Looks like YARN-7371 is already handling this.

> Consider including allocationRequestId in NMContainerStatus to allow recovery 
> in case of RM failover
> 
>
> Key: YARN-5447
> URL: https://issues.apache.org/jira/browse/YARN-5447
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: applications, resourcemanager
> Reporter: Subru Krishnan
> Assignee: Subru Krishnan
> Priority: Major
>
> We have added a mapping of the allocated container to the original request 
> through YARN-4887/YARN-4888. There is a corner case in which the mapping will 
> be lost, i.e. if RM fails over before notifying the AM about newly allocated 
> container(s). This JIRA tracks the changes required to include the 
> allocationRequestId in NMContainerStatus to allow recovery in case of RM 
> failover.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5447) Consider including allocationRequestId in NMContainerStatus to allow recovery in case of RM failover

2017-11-04 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16239033#comment-16239033
 ] 

Arun Suresh commented on YARN-5447:
---

Got it. To enable this, though, we would also need to update the 
ContainerTokenIdentifier to include the allocationRequestId, since the NM 
currently has no knowledge of the allocationRequestId.
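
To illustrate the dependency described above, here is a minimal Python model, not Hadoop code. `TokenIdentifier`, `NodeContainerStatus`, and `status_from_token` are hypothetical names standing in for the container token, the status the NM reports, and the NM-side step that builds one from the other. The only assumption is that the NM can report just what it received in the token.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TokenIdentifier:
    """Hypothetical stand-in for the token the RM issues for a container."""
    container_id: str
    # None models a token that does not carry the request id at all.
    allocation_request_id: Optional[int] = None


@dataclass(frozen=True)
class NodeContainerStatus:
    """Hypothetical stand-in for the per-container status the NM reports."""
    container_id: str
    allocation_request_id: Optional[int]


def status_from_token(token: TokenIdentifier) -> NodeContainerStatus:
    # The NM only knows what the token told it, so the id is copied through.
    return NodeContainerStatus(token.container_id, token.allocation_request_id)


# Token without the id: the NM has nothing to report back after a failover.
status_old = status_from_token(TokenIdentifier("container_01"))

# Token carrying the id: the NM can echo it in its status.
status_new = status_from_token(
    TokenIdentifier("container_01", allocation_request_id=7)
)
```

In this model the status field is only as good as the token field, which is the reason both would have to change together.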




[jira] [Commented] (YARN-5447) Consider including allocationRequestId in NMContainerStatus to allow recovery in case of RM failover

2017-10-31 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16227295#comment-16227295
 ] 

Jian He commented on YARN-5447:
---

Yeah, we ran into case 2: the AM does not persist the mapping, and the mapping 
was lost after the AM and RM restarted.




[jira] [Commented] (YARN-5447) Consider including allocationRequestId in NMContainerStatus to allow recovery in case of RM failover

2017-10-26 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16221447#comment-16221447
 ] 

Arun Suresh commented on YARN-5447:
---

Just noticed your comment, [~jianhe]. I don't think I understand your concern.
Let us try to distinguish between 2 failover scenarios:
# If the RM fails over, and assuming the AM is still running when the new RM 
comes up, the AM will re-register and, in the response, the RM will notify it 
of all its currently running containers. The RM recreates this list from the 
NMContainerStatus it receives from the NM heartbeats. Since the AM keeps the 
mapping between allocationReqId and containerType/role in memory, I am guessing 
we are fine.
# If the AM fails over, we will get a new app attempt, and this new app attempt 
will receive all of the previous attempts' running containers on registration. 
In this case, the mapping might be lost if the AM had not persisted it somewhere.

This JIRA was to track case 1; we can expand the scope to solving case 2.
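
The two scenarios above can be sketched with a small Python model. This is not Hadoop code: `ContainerReport` and `AppMaster` are hypothetical names, and the sketch only assumes that the AM resolves a container to its component by looking up the request id carried in the report it receives on registration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ContainerReport:
    """Hypothetical view of one running container handed to the AM."""
    container_id: str
    # None models a report that does not carry the request id.
    allocation_request_id: Optional[int]


class AppMaster:
    """Hypothetical AM mapping allocation request ids to component names."""

    def __init__(self, persisted: Optional[Dict[int, str]] = None):
        # A fresh app attempt starts with only what was persisted, if anything.
        self.request_to_component: Dict[int, str] = dict(persisted or {})

    def request(self, allocation_request_id: int, component: str) -> None:
        self.request_to_component[allocation_request_id] = component

    def resolve(self, reports: List[ContainerReport]) -> Dict[str, Optional[str]]:
        """Map each reported container to its component, or None if unknown."""
        return {
            r.container_id: self.request_to_component.get(r.allocation_request_id)
            for r in reports
        }


running = [ContainerReport("container_01", 1), ContainerReport("container_02", 2)]

# Case 1: the RM fails over and the same AM re-registers, mapping still in memory.
am = AppMaster()
am.request(1, "master")
am.request(2, "worker")
case1 = am.resolve(running)

# Case 2: the AM fails over and the new attempt persisted nothing.
case2 = AppMaster().resolve(running)

# Case 2 again, but the previous attempt had persisted its mapping.
case2_persisted = AppMaster(persisted=am.request_to_component).resolve(running)

# Either case, if the reports come back without the request id.
no_id = am.resolve([ContainerReport(r.container_id, None) for r in running])
```

In this model, case 1 resolves only as long as the reports still carry the request id, and case 2 resolves only if the mapping was persisted outside the AM's memory.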




[jira] [Commented] (YARN-5447) Consider including allocationRequestId in NMContainerStatus to allow recovery in case of RM failover

2017-10-24 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16217855#comment-16217855
 ] 

Jian He commented on YARN-5447:
---

[~asuresh], in yarn service we rely on allocateRequestId as a mapping from 
allocateId -> component.
However, this fails during recovery because the allocateRequestId is not 
recovered, which breaks the client logic; see YARN-7371.

Do we need to reconsider this?
