Wangda Tan commented on YARN-2494:

I'm replying here to your comment on YARN-796, since I think it concerns an 
implementation detail of FileSystemNodeLabelManager:
bq. It looks like the FileSystemNodeLabelManager will just append changes to 
the edit log forever, until it is restarted, is that correct? If so, a 
long-running cluster with lots of changes could result in a rather large edit 
log. I think every so many writes (N writes) a recovery should be "forced" to 
clean up the edit log and consolidate state (do a recover...)

It's a good suggestion, but it seems more like an enhancement to me. As a rough 
estimate: with 10,000 node label changes per hour and an average entry size of 
16 bytes (8 for the label and 8 for the node), a cluster running for one year 
would produce an edit log of {{10000 * 16 * 24 * 365 / 1024 / 1024}} MB = 1336 
MB. Given existing HDFS read throughput (we can get at least 50 MB/sec), 
restarting an RM that has run for a whole year would cost about 30s of extra 
time, which seems acceptable to me.
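
The estimate above can be reproduced with a small sketch. The class and method names ({{EditLogEstimate}}, {{sizeBytes}}, {{replaySeconds}}) are hypothetical and only illustrate the arithmetic; the inputs are the assumed figures from this comment, not measurements.

```java
public final class EditLogEstimate {
    // Estimated edit log size in bytes after the given number of days,
    // assuming a constant change rate and a fixed size per logged change.
    public static long sizeBytes(long changesPerHour, long bytesPerChange, long days) {
        return changesPerHour * bytesPerChange * 24L * days;
    }

    // Seconds needed to read the log back at the given throughput (MB/s).
    public static double replaySeconds(long sizeBytes, double mbPerSecond) {
        return sizeBytes / (1024.0 * 1024.0) / mbPerSecond;
    }

    public static void main(String[] args) {
        // Assumed inputs: 10,000 changes/hour, 16 bytes each, 365 days, 50 MB/s.
        long bytes = sizeBytes(10_000, 16, 365);
        System.out.println("size (MB): " + bytes / 1024 / 1024);
        System.out.println("replay (s): " + replaySeconds(bytes, 50.0));
    }
}
```

With these inputs the log comes to 1336 MB and the replay time to a little under 27 seconds, which is where the "about 30s" figure comes from.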

I agree that periodically creating a new mirror and cleaning up the edit log is 
better than this; we can do it once other higher-priority problems are 
addressed.
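
For reference, a minimal sketch of what "create a new mirror and clean up the edit log every N writes" could look like. This is not the FileSystemNodeLabelManager code: the class {{CompactingLabelStore}}, the file names, and the {{node=label}} line format are all made up for illustration, and it uses {{java.nio.file}} on a local directory instead of the Hadoop FileSystem API to stay self-contained.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public final class CompactingLabelStore {
    private final Path mirror;   // full snapshot of node -> label
    private final Path editLog;  // changes appended since the last snapshot
    private final int compactEvery;
    private final Map<String, String> labels = new TreeMap<>();
    private int writesSinceCompaction = 0;

    public CompactingLabelStore(Path dir, int compactEvery) throws IOException {
        this.mirror = dir.resolve("mirror");
        this.editLog = dir.resolve("editlog");
        this.compactEvery = compactEvery;
        recover();
    }

    // Apply one change in memory, append it to the edit log, and compact
    // once compactEvery writes have accumulated.
    public synchronized void setLabel(String node, String label) throws IOException {
        labels.put(node, label);
        String line = node + "=" + label + System.lineSeparator();
        Files.write(editLog, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        if (++writesSinceCompaction >= compactEvery) {
            compact();
        }
    }

    public synchronized String getLabel(String node) {
        return labels.get(node);
    }

    // Write the full state to a temp file, swap it in as the new mirror,
    // then drop the edit log. If we crash between the swap and the delete,
    // replaying the old log over the new mirror is harmless because the
    // last write for each node wins.
    private void compact() throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> e : labels.entrySet()) {
            lines.add(e.getKey() + "=" + e.getValue());
        }
        Path tmp = mirror.resolveSibling("mirror.tmp");
        Files.write(tmp, lines, StandardCharsets.UTF_8);
        Files.move(tmp, mirror, StandardCopyOption.REPLACE_EXISTING);
        Files.deleteIfExists(editLog);
        writesSinceCompaction = 0;
    }

    // Load the mirror first, then replay the edit log on top of it.
    private void recover() throws IOException {
        for (Path p : new Path[] {mirror, editLog}) {
            if (!Files.exists(p)) {
                continue;
            }
            for (String line : Files.readAllLines(p, StandardCharsets.UTF_8)) {
                int i = line.indexOf('=');
                if (i > 0) {
                    labels.put(line.substring(0, i), line.substring(i + 1));
                }
            }
        }
    }
}
```

With this shape the edit log never holds more than N entries, so recovery cost is bounded by the size of the current state rather than by how long the RM has been running.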


> [YARN-796] Node label manager API and storage implementations
> -------------------------------------------------------------
>                 Key: YARN-2494
>                 URL: https://issues.apache.org/jira/browse/YARN-2494
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-2494.patch, YARN-2494.patch, YARN-2494.patch, 
> YARN-2494.patch, YARN-2494.patch
> This JIRA includes the APIs and storage implementations of the node label 
> manager. NodeLabelManager is an abstract class used to manage labels of nodes 
> in the cluster. It has APIs to query/modify:
> - Nodes according to given label
> - Labels according to given hostname
> - Add/remove labels
> - Set labels of nodes in the cluster
> - Persist/recover changes of labels/labels-on-nodes to/from storage
> It has two implementations to store modifications:
> - Memory based storage: It will not persist changes, so all labels will be 
> lost when the RM restarts
> - FileSystem based storage: It will persist/recover to/from FileSystem (like 
> HDFS), and all labels and labels-on-nodes will be recovered upon RM restart

This message was sent by Atlassian JIRA
