[ https://issues.apache.org/jira/browse/YARN-10780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17367259#comment-17367259 ]
Peter Bacsko commented on YARN-10780:
-------------------------------------

[~gandras] looks good, could you take care of the checkstyle problems?

> Optimise retrieval of configured node labels in CS queues
> ---------------------------------------------------------
>
>                 Key: YARN-10780
>                 URL: https://issues.apache.org/jira/browse/YARN-10780
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Andras Gyori
>            Assignee: Andras Gyori
>            Priority: Major
>         Attachments: YARN-10780.001.patch, YARN-10780.002.patch,
>                      YARN-10780.003.patch, YARN-10780.004.patch
>
>
> CapacitySchedulerConfiguration#getConfiguredNodeLabels scales poorly with
> respect to the number of queues (each call is O(n*m), where n is the number
> of queues and m is the number of properties set by each queue). During CS
> reinit, the node labels are queried often. However, looking at the code:
> {code:java}
>     for (Entry<String, String> stringStringEntry : this) {
>       e = stringStringEntry;
>       String key = e.getKey();
>       if (key.startsWith(getQueuePrefix(queuePath) + ACCESSIBLE_NODE_LABELS
>           + DOT)) {
>         // Find <label-name> in
>         // <queue-path>.accessible-node-labels.<label-name>.property
>         int labelStartIdx =
>             key.indexOf(ACCESSIBLE_NODE_LABELS)
>                 + ACCESSIBLE_NODE_LABELS.length() + 1;
>         int labelEndIndx = key.indexOf('.', labelStartIdx);
>         String labelName = key.substring(labelStartIdx, labelEndIndx);
>         configuredNodeLabels.add(labelName);
>       }
>     }
> {code}
> This method iterates through ALL properties set in the configuration. For
> example, when initialising 2500 queues, each having at least 2 properties
> (at least 5000 properties in total):
> 2500 * 5000 = 12.5 million iterations, plus those for any additional
> properties.
> There are some ways to resolve this issue while keeping backward
> compatibility:
> # Create a property like the original accessible-node-labels, which contains
> predefined labels. If it is set, then getConfiguredNodeLabels gets the value
> of this property, otherwise it falls back to the old logic. I think
> accessible-node-labels is not used for this purpose (though I have a feeling
> that it should have been).
> # Collect node labels for all queues at the beginning of parseQueue and only
> iterate through the properties once. This will increase the space complexity
> in exchange for not requiring any intervention from the user.
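
The second option in the quoted description (collect the labels for every queue in one pass over the properties) could look roughly like the sketch below. It is only an illustration under stated assumptions, not the code from the attached patches: the class name NodeLabelIndex and the method collectLabelsByQueue are hypothetical, the configuration is modelled as a plain Map<String, String>, and the key layout yarn.scheduler.capacity.<queue-path>.accessible-node-labels.<label-name>.<property> is assumed from the comment in the quoted snippet.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Hadoop: builds a queue -> labels index
// in a single pass instead of rescanning all properties for every queue.
final class NodeLabelIndex {
  // Assumed key layout:
  // yarn.scheduler.capacity.<queue-path>.accessible-node-labels.<label>.<property>
  static final String PREFIX = "yarn.scheduler.capacity.";
  static final String MARKER = ".accessible-node-labels.";

  static Map<String, Set<String>> collectLabelsByQueue(
      Map<String, String> props) {
    Map<String, Set<String>> labelsByQueue = new HashMap<>();
    for (String key : props.keySet()) {
      if (!key.startsWith(PREFIX)) {
        continue;
      }
      // The trailing dot in MARKER skips the plain
      // "<queue-path>.accessible-node-labels" property itself.
      int markerIdx = key.indexOf(MARKER, PREFIX.length());
      if (markerIdx < 0) {
        continue;
      }
      int labelStart = markerIdx + MARKER.length();
      int labelEnd = key.indexOf('.', labelStart);
      if (labelEnd < 0) {
        // No ".<property>" suffix after the label; ignore the key
        // rather than fail in substring.
        continue;
      }
      String queuePath = key.substring(PREFIX.length(), markerIdx);
      String label = key.substring(labelStart, labelEnd);
      labelsByQueue.computeIfAbsent(queuePath, q -> new HashSet<>())
          .add(label);
    }
    return labelsByQueue;
  }
}
```

With such an index built once at the start of queue parsing, each queue's lookup becomes a single map access, at the cost of keeping the extra map in memory, which is the space trade-off the description mentions.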