[ https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773299#comment-16773299 ]
Yufei Gu commented on YARN-9278:
--------------------------------

Hi [~uranus], this looks like a performance issue on a busy, large cluster caused by the preemption implementation, which iterates over nodes and checks each one. I would suggest lowering {{yarn.scheduler.fair.preemption.cluster-utilization-threshold}} to let preemption kick in earlier on a large cluster. The default value is 80%, which means preemption won't kick in until 80% of the whole cluster's resources are in use. Please be aware that a low utilization threshold may cause unnecessary container churn, so you don't want it to be too low either.

> Shuffle nodes when selecting to be preempted nodes
> --------------------------------------------------
>
>                 Key: YARN-9278
>                 URL: https://issues.apache.org/jira/browse/YARN-9278
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: fairscheduler
>            Reporter: Zhaohui Xin
>            Assignee: Zhaohui Xin
>            Priority: Major
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently.
> Also, we should *limit* the number of nodes to make preemption more efficient.
> Just like this:
> {code:java}
> // We should not iterate over all nodes; that would be very slow.
> long maxTryNodeNum =
>     context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum) {
>   Collections.shuffle(potentialNodes);
>   List<FSSchedulerNode> newPotentialNodes = new ArrayList<FSSchedulerNode>();
>   for (int i = 0; i < maxTryNodeNum; i++) {
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
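As a sketch of the configuration change suggested in the comment above: the threshold is set in yarn-site.xml, and preemption itself must be enabled for it to matter. The value 0.7 below is only an illustrative choice, not a recommendation from this thread.

{code:xml}
<!-- yarn-site.xml: enable FairScheduler preemption. -->
<property>
  <name>yarn.scheduler.fair.preemption</name>
  <value>true</value>
</property>
<!-- Preemption kicks in once cluster utilization passes this fraction. -->
<!-- Default is 0.8 (80%); 0.7 here is an illustrative lower value. -->
<property>
  <name>yarn.scheduler.fair.preemption.cluster-utilization-threshold</name>
  <value>0.7</value>
</property>
{code}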
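The shuffle-and-limit idea in the description can also be written more compactly with {{Collections.shuffle}} plus {{subList}}. This is a standalone sketch, not the actual patch: the generic {{limitAndShuffle}} helper and its names are hypothetical, standing in for the {{FSSchedulerNode}} handling in FairScheduler.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ShuffleLimit {
  // Hypothetical helper: shuffle the candidate nodes, then keep at most
  // maxTryNodeNum of them so preemption does not scan the whole cluster.
  static <T> List<T> limitAndShuffle(List<T> nodes, int maxTryNodeNum) {
    if (nodes.size() <= maxTryNodeNum) {
      return nodes;
    }
    List<T> shuffled = new ArrayList<>(nodes);
    Collections.shuffle(shuffled);
    // subList returns a view backed by "shuffled"; copy it so the
    // trimmed list is independent of the larger backing list.
    return new ArrayList<>(shuffled.subList(0, maxTryNodeNum));
  }

  public static void main(String[] args) {
    List<Integer> nodes = new ArrayList<>();
    for (int i = 0; i < 100; i++) {
      nodes.add(i);
    }
    List<Integer> picked = limitAndShuffle(nodes, 10);
    System.out.println(picked.size()); // prints 10
  }
}
{code}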