[
https://issues.apache.org/jira/browse/YARN-10780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17355775#comment-17355775
]
Peter Bacsko commented on YARN-10780:
-------------------------------------
There are a lot of NPE at {{serviceStop()}}, [~gandras] could you check those?
In the meantime, I'll review the changes.
> Optimise retrieval of configured node labels in CS queues
> ---------------------------------------------------------
>
> Key: YARN-10780
> URL: https://issues.apache.org/jira/browse/YARN-10780
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Andras Gyori
> Assignee: Andras Gyori
> Priority: Major
> Attachments: YARN-10780.001.patch, YARN-10780.002.patch
>
>
> CapacitySchedulerConfiguration#getConfiguredNodeLabels scales poorly with
> respect to queue numbers (its O(n*m), where n is the number of queues and m
> is the number of properties set by each queue). During CS reinit, the node
> labels are often queried, however looking at the code:
> {code:java}
> for (Entry<String, String> stringStringEntry : this) {
> e = stringStringEntry;
> String key = e.getKey();
> if (key.startsWith(getQueuePrefix(queuePath) + ACCESSIBLE_NODE_LABELS
> + DOT)) {
> // Find <label-name> in
> // <queue-path>.accessible-node-labels.<label-name>.property
> int labelStartIdx =
> key.indexOf(ACCESSIBLE_NODE_LABELS)
> + ACCESSIBLE_NODE_LABELS.length() + 1;
> int labelEndIndx = key.indexOf('.', labelStartIdx);
> String labelName = key.substring(labelStartIdx, labelEndIndx);
> configuredNodeLabels.add(labelName);
> }
> }
> {code}
> This method iterates through ALL properties set in the configuration. For
> example in case of initialising 2500 queues, each having at least 2
> properties:
> 2500 * 5000 ~= over 12 million iteration + additional properties
> There are some ways to resolve this issue while keeping backward
> compatibility:
> # Create a property like the original accessible-node-labels, which contains
> predefined labels. If it is set, then getConfiguredNodeLabels get the value
> of this property, otherwise it falls back to the old logic. I think
> accessible-node-labels are not used for this purpose (though I have a feeling
> that it should have been).
> # Collect node labels for all queues at the beginning of parseQueue and only
> iterate through the properties once. This will increase the space complexity
> in exchange of not requiring intervention from user's perspective.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]