[ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642498#comment-14642498
 ] 

Dian Fu commented on YARN-3964:
-------------------------------

Hi [~leftnoteasy],
Thanks a lot for your comments.
{quote}
I took a quick look at the patch, some problems I can see now:
- It involves some unnecessary interface/parameter to NodeLabelsProvider, this 
also leads to unnecessary changes to NM
{quote}
This patch tries to move {{NodeLabelsProvider}} from 
{{hadoop-yarn-server-nodemanager}} to {{hadoop-yarn-server-common}} to make it 
usable by both NM and RM. But it's fine to keep it untouched. 
{quote}
- Fetcher implementation is polling updated labels for ALL NMs in the cluster, 
if a cluster has several thousands of NMs, this can be inefficient.
{quote}
Good advice. We can solve this issue by updating the labels for ALL NMs in one 
request, not one by one. Will update the patch accordingly.
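
A rough sketch of the batched approach, only to illustrate the idea: the interface {{BatchNodeLabelsSource}} and the class {{BatchLabelsFetcher}} below are hypothetical names and are not taken from the patch. It assumes the label source can return the labels of many nodes in one call, so each polling cycle issues a single request and only the nodes whose labels changed are passed on.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical interface (not the NodeLabelsProvider from the patch):
// one call returns the labels of every requested node.
interface BatchNodeLabelsSource {
  Map<String, Set<String>> getLabels(Set<String> nodeIds);
}

// Hypothetical RM-side fetcher that issues one request per polling cycle.
class BatchLabelsFetcher {
  private final BatchNodeLabelsSource source;
  // Labels seen in the previous polling cycle, keyed by node id.
  private final Map<String, Set<String>> current = new HashMap<>();

  BatchLabelsFetcher(BatchNodeLabelsSource source) {
    this.source = source;
  }

  // Fetches labels for all nodes in a single request and returns only
  // the nodes whose labels differ from the previous poll.
  Map<String, Set<String>> poll(Set<String> nodeIds) {
    Map<String, Set<String>> fetched = source.getLabels(nodeIds);
    Map<String, Set<String>> changed = new HashMap<>();
    for (Map.Entry<String, Set<String>> e : fetched.entrySet()) {
      Set<String> labels = new HashSet<>(e.getValue());
      if (!labels.equals(current.get(e.getKey()))) {
        current.put(e.getKey(), labels);
        changed.put(e.getKey(), labels);
      }
    }
    return changed;
  }
}
```

With this shape, the number of requests to the label source per cycle does not grow with the number of NMs, and the RM would only need to apply the returned delta.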
{quote}
My biggest concern is still about if this change is must-to-have:
Since we already have a set of APIs to do this, I can't see a big add-on value 
of doing this inside RM. 
{quote}
I understand your concern and agree that with a cron job, some scripts and the 
REST API, we are able to achieve this functionality. Still, this improvement 
has its value. It can largely reduce the additional work and other 
difficulties involved in integrating a label source. It also increases the 
usability of the label feature from a management perspective. As we know, how 
widely a technology is adopted by users often depends largely on how easily it 
can be used or integrated. Although this is not a "must-to-have", this 
improvement takes the label feature a step further from the integration point 
of view.

For large clusters, it's usually not practical to manage the labels of all 
nodes manually. Enterprises usually use some kind of label or label policy 
storage. This improvement can address this requirement with minimal additional 
development work. Also, this feature serves a different use case than 
synchronizing the labels through the REST API: configuring a label provider 
mechanism on the YARN side is a management operation (usually done by an 
administrator) rather than a REST API operation by a client, which makes the 
label source more trustworthy. 

Furthermore, we aim to keep this change simple, lightweight and 
straightforward. It will not add complexity to the YARN architecture but will 
provide flexible functionality for label integration. 

Thank you again for your feedback.


> Support NodeLabelsProvider at Resource Manager side
> ---------------------------------------------------
>
>                 Key: YARN-3964
>                 URL: https://issues.apache.org/jira/browse/YARN-3964
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Dian Fu
>            Assignee: Dian Fu
>         Attachments: YARN-3964 design doc.pdf, YARN-3964.1.patch
>
>
> Currently, CLI/REST API is provided in Resource Manager to allow users to 
> specify labels for nodes. For labels which may change over time, users will 
> have to start a cron job to update the labels. This has the following 
> limitations:
> - The cron job needs to be run as the YARN admin user.
> - This makes it a little complicated to maintain, as users have to make 
> sure this service/daemon is alive.
> Adding a Node Labels Provider in Resource Manager will give users more 
> flexibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
