[
https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863347#comment-15863347
]
Sunil G commented on YARN-6148:
-------------------------------
[~varun_saxena]
I have some doubts here.
{{allocateResponse.setNumClusterNodes(allocation.getApplicableNMCount());}}
We still send a single integer node count to the AM. This count is better than
before because it is the node count across all partitions used by this app
(instead of all nodes in the cluster). But I am wary of a few use cases.
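As a rough illustration of what a single "applicable" count means, here is a minimal sketch that sums node counts over the partitions an app has used. It is not the patch's `getApplicableNMCount()`. The class, the method `applicableNodeCount`, and the extra unused partition `LabelC` are hypothetical, added only to show that partitions the app does not touch are excluded.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: one node count summed over the partitions an app uses. */
public class ApplicableNodeCount {

    // Sums the node counts of every partition (node label) the app has used.
    // Partitions missing from the map contribute zero.
    static int applicableNodeCount(Map<String, Integer> nodesPerPartition,
                                   Set<String> usedPartitions) {
        int total = 0;
        for (String partition : usedPartitions) {
            total += nodesPerPartition.getOrDefault(partition, 0);
        }
        return total;
    }

    public static void main(String[] args) {
        // LabelA and LabelB as in the comment's scenario; LabelC is a hypothetical
        // partition the app never uses, so it is not counted.
        Map<String, Integer> nodes = Map.of("LabelA", 10, "LabelB", 10, "LabelC", 5);
        int count = applicableNodeCount(nodes, Set.of("LabelA", "LabelB"));
        System.out.println(count); // prints 20
    }
}
```

The result is still one number, so it cannot tell the AM which partition a given node belongs to.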
Let's consider the scenario below for MR:
- The AM container is running in LabelA (10 nodes)
- Maps and reducers are running in LabelB (10 nodes)
Suppose 4 tasks fail on one node of LabelB. That node will now be blacklisted.
As per your code, the node count is 20 (the AM container's LabelA plus the
map/reducer containers' LabelB). But for task failures we need not worry about
LabelA (the AM container's node label). We should consider only LabelB here.
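To make the difference concrete: if the AM compares its blacklisted-node count against the reported node count (an assumption here, e.g. to decide whether blacklisting should be relaxed), the denominator changes the result. This hypothetical sketch computes that percentage for the one blacklisted LabelB node, using 20 (LabelA + LabelB) versus 10 (LabelB only). `blacklistedPercent` is an illustrative name, not an MR API.

```java
/** Hypothetical sketch: blacklisted share of the node count reported to the AM. */
public class BlacklistFraction {

    // Percentage of the reported node count that is blacklisted.
    static double blacklistedPercent(int blacklistedNodes, int reportedNodeCount) {
        if (reportedNodeCount == 0) {
            return 0.0; // nothing to compare against
        }
        return 100.0 * blacklistedNodes / reportedNodeCount;
    }

    public static void main(String[] args) {
        int blacklisted = 1; // the single LabelB node where 4 tasks failed
        System.out.println(blacklistedPercent(blacklisted, 20)); // prints 5.0
        System.out.println(blacklistedPercent(blacklisted, 10)); // prints 10.0
    }
}
```

Counting LabelA halves the apparent share, even though no map or reduce task can ever run there.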
So I think we must send a per-label node count to MR (only for the requested
partitions), e.g. {{<<labelA, 10>, <labelB, 10>>}}. It can be a new parameter in
the existing protocol, and the MR code would need to change accordingly. This may
be more complete.
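A minimal sketch of how an AM might use such a per-label map, assuming it receives the counts for its requested partitions only. The class and method names are hypothetical and do not correspond to any existing YARN or MR protocol field. The point is that the blacklisted ratio can be measured against the label where the tasks actually run.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the proposed per-label node counts on the AM side. */
public class PerLabelNodeCounts {

    // Node count per requested partition, as the RM might report it.
    private final Map<String, Integer> nodeCountPerLabel = new HashMap<>();

    void setNodeCount(String label, int count) {
        nodeCountPerLabel.put(label, count);
    }

    int nodeCount(String label) {
        return nodeCountPerLabel.getOrDefault(label, 0);
    }

    // Blacklisted percentage measured only against the label the failing tasks run in.
    double blacklistedPercent(String label, int blacklistedNodesInLabel) {
        int total = nodeCount(label);
        return total == 0 ? 0.0 : 100.0 * blacklistedNodesInLabel / total;
    }

    public static void main(String[] args) {
        PerLabelNodeCounts counts = new PerLabelNodeCounts();
        counts.setNodeCount("labelA", 10); // AM container's partition
        counts.setNodeCount("labelB", 10); // map/reduce containers' partition
        // One labelB node is blacklisted; labelA does not dilute the ratio.
        System.out.println(counts.blacklistedPercent("labelB", 1)); // prints 10.0
    }
}
```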
> NM node count reported to AM in Allocate Response should consider requested
> node label partitions.
> --------------------------------------------------------------------------------------------------
>
> Key: YARN-6148
> URL: https://issues.apache.org/jira/browse/YARN-6148
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Varun Saxena
> Assignee: Varun Saxena
> Attachments: YARN-6148.01.patch
>
>
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)