[ https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261927#comment-16261927 ]
Juan Rodríguez Hortalá commented on YARN-6483:
----------------------------------------------
Hi [~asuresh],
I have uploaded a new patch and updated the pull request so the changes are
easier to view. In this patch I have:
- provided default implementations for the new methods of NodeReport
- reverted the changes to RMNodeDecommissioningEvent to use an Integer again
- defined a new enum `org.apache.hadoop.yarn.api.records.NodeUpdateType`,
used as a new optional field of NodeReport. The field is set to a non-null
value only for NodeReports that correspond to node transitions, and is null
for node reports not associated with a node transition, such as those
requested through ClientRMService. I have added assertions for the former
case to TestAMRMRPCNodeUpdates, and for the latter to TestClientRMService
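
As a rough illustration of how an Application Master might consume the optional
field, here is a minimal, self-contained sketch. It does not use the real Hadoop
classes: the nested `NodeReport` and `NodeUpdateType` types are simplified,
hypothetical stand-ins, the enum constant names are assumptions, and
`decommissioningNodes` is an illustrative helper that is not part of the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NodeUpdateSketch {

    // Hypothetical stand-in for the new enum; the constant names are assumptions.
    enum NodeUpdateType { NODE_USABLE, NODE_UNUSABLE, NODE_DECOMMISSIONING }

    // Hypothetical, heavily simplified stand-in for NodeReport.
    static final class NodeReport {
        private final String nodeId;
        // Null when the report is not associated with a node transition,
        // e.g. a report requested by a client rather than sent as an update.
        private final NodeUpdateType nodeUpdateType;

        NodeReport(String nodeId, NodeUpdateType nodeUpdateType) {
            this.nodeId = nodeId;
            this.nodeUpdateType = nodeUpdateType;
        }

        String getNodeId() { return nodeId; }

        NodeUpdateType getNodeUpdateType() { return nodeUpdateType; }
    }

    // Illustrative helper: collect the ids of nodes whose update reports a
    // transition to DECOMMISSIONING. Reports with a null type are skipped,
    // because comparing null to an enum constant with == is simply false.
    static List<String> decommissioningNodes(List<NodeReport> updatedNodes) {
        List<String> result = new ArrayList<>();
        for (NodeReport report : updatedNodes) {
            if (report.getNodeUpdateType() == NodeUpdateType.NODE_DECOMMISSIONING) {
                result.add(report.getNodeId());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<NodeReport> updates = Arrays.asList(
            new NodeReport("host1:8041", NodeUpdateType.NODE_DECOMMISSIONING),
            new NodeReport("host2:8041", NodeUpdateType.NODE_USABLE),
            new NodeReport("host3:8041", null));
        System.out.println(decommissioningNodes(updates)); // prints [host1:8041]
    }
}
```

An application such as Spark could feed the returned ids into its own
blacklist, so it stops scheduling tasks on executors running on those nodes.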
Thanks again for taking a look!
> Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes
> returned by the Resource Manager as a response to the Application Master
> heartbeat
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-6483
> URL: https://issues.apache.org/jira/browse/YARN-6483
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Affects Versions: 3.1.0
> Reporter: Juan Rodríguez Hortalá
> Assignee: Juan Rodríguez Hortalá
> Attachments: YARN-6483-v1.patch, YARN-6483.002.patch,
> YARN-6483.003.patch
>
>
> The DECOMMISSIONING node state is currently used as part of the graceful
> decommissioning mechanism to give tasks time to complete on a node that
> is scheduled for decommission, and to give reducer tasks time to read the
> shuffle blocks on that node. YARN also effectively blacklists nodes in the
> DECOMMISSIONING state by assigning them a capacity of 0, which prevents
> additional containers from being launched on those nodes, so no more shuffle
> blocks are written to them. This blacklisting is not effective for
> applications like Spark, because a Spark executor running in a YARN container
> will keep receiving tasks after the corresponding node has been
> blacklisted at the YARN level. We propose modifying the YARN heartbeat
> mechanism so that nodes transitioning to DECOMMISSIONING are added
> to the list of updated nodes returned by the Resource Manager in response
> to the Application Master heartbeat. This way, a Spark application master
> would be able to blacklist a DECOMMISSIONING node at the Spark level.