Andras Gyori created YARN-10780:
-----------------------------------
Summary: Optimise retrieval of configured node labels in CS queues
Key: YARN-10780
URL: https://issues.apache.org/jira/browse/YARN-10780
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Andras Gyori
Assignee: Andras Gyori
CapacitySchedulerConfiguration#getConfiguredNodeLabels scales poorly with the
number of queues: each call is O(n*m), where n is the number of queues and m is
the number of properties set by each queue. The node labels are queried often
during CS reinit, yet the method is implemented as follows:
{code:java}
for (Entry<String, String> stringStringEntry : this) {
  e = stringStringEntry;
  String key = e.getKey();
  if (key.startsWith(getQueuePrefix(queuePath) + ACCESSIBLE_NODE_LABELS
      + DOT)) {
    // Find <label-name> in
    // <queue-path>.accessible-node-labels.<label-name>.property
    int labelStartIdx =
        key.indexOf(ACCESSIBLE_NODE_LABELS)
        + ACCESSIBLE_NODE_LABELS.length() + 1;
    int labelEndIndx = key.indexOf('.', labelStartIdx);
    String labelName = key.substring(labelStartIdx, labelEndIndx);
    configuredNodeLabels.add(labelName);
  }
}
{code}
This method iterates through ALL properties set in the configuration on every
call. For example, when initialising 2500 queues, each having at least 2
properties, the configuration holds at least 5000 properties, so querying every
queue once costs:
2500 * 5000 = 12.5 million iterations
There are some ways to resolve this issue while keeping backward compatibility:
# Create a property like the original accessible-node-labels, which contains
predefined labels. If it is set, getConfiguredNodeLabels returns the value of
this property; otherwise it falls back to the old logic. I think
accessible-node-labels is not used for this purpose (though I have a feeling
that it should have been).
# Collect node labels for all queues at the beginning of parseQueue and only
iterate through the properties once. This will increase the space complexity in
exchange for not requiring any intervention from the user.
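
The second option could look roughly like the sketch below. It is only an
illustration, not the actual patch: the class and method names
(NodeLabelIndexSketch, collectLabelsByQueue) are hypothetical, the properties
are passed in as plain map entries instead of a real Configuration object, and
the key layout yarn.scheduler.capacity.<queue-path>.accessible-node-labels.<label-name>.<property>
is assumed. Keys without a trailing .<property> part are skipped here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of CapacitySchedulerConfiguration.
class NodeLabelIndexSketch {
  // Assumed key layout: PREFIX + <queue-path> + MARKER + <label-name> + "." + <property>
  static final String PREFIX = "yarn.scheduler.capacity.";
  static final String MARKER = ".accessible-node-labels.";

  /**
   * Walks all properties once and returns a map from queue path to the
   * labels configured for that queue. Queues without labels get no entry.
   */
  static Map<String, Set<String>> collectLabelsByQueue(
      Iterable<Map.Entry<String, String>> properties) {
    Map<String, Set<String>> labelsByQueue = new HashMap<>();
    for (Map.Entry<String, String> entry : properties) {
      String key = entry.getKey();
      if (!key.startsWith(PREFIX)) {
        continue;
      }
      int markerIdx = key.indexOf(MARKER, PREFIX.length());
      if (markerIdx <= PREFIX.length()) {
        continue; // not a per-label property, or the queue path is empty
      }
      int labelStartIdx = markerIdx + MARKER.length();
      int labelEndIdx = key.indexOf('.', labelStartIdx);
      if (labelEndIdx <= labelStartIdx) {
        continue; // no <label-name>.<property> part after the marker
      }
      String queuePath = key.substring(PREFIX.length(), markerIdx);
      String labelName = key.substring(labelStartIdx, labelEndIdx);
      labelsByQueue.computeIfAbsent(queuePath, q -> new HashSet<>())
          .add(labelName);
    }
    return labelsByQueue;
  }
}
```

With such an index built once at the beginning of parseQueue, each queue would
only need a map lookup (labelsByQueue.get(queuePath)) instead of a full scan,
at the cost of keeping the map in memory during initialisation.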