[ https://issues.apache.org/jira/browse/YARN-6344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957849#comment-15957849 ]

Jason Lowe commented on YARN-6344:
----------------------------------

I'd prefer a configured rack locality delay of zero means no additional rack delay, but I see that is semantically different than disabling it altogether. Specifying a rack locality delay of zero means it will _not_ scale the node locality delay based on the request/cluster sizes like it does today, whereas setting it to -1 will. In that sense it's not purely an additional delay.

Given I don't know the complete backstory on the reasoning behind why it behaves the way it does for node locality delay, I can see the desire to leave the existing behavior unchanged when this new setting isn't configured.

Patch looks good to me.

> Rethinking OFF_SWITCH locality in CapacityScheduler
> ---------------------------------------------------
>
>                 Key: YARN-6344
>                 URL: https://issues.apache.org/jira/browse/YARN-6344
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>            Reporter: Konstantinos Karanasos
>            Assignee: Konstantinos Karanasos
>         Attachments: YARN-6344.001.patch, YARN-6344.002.patch, YARN-6344.003.patch, YARN-6344.004.patch
>
>
> When relaxing locality from node to rack, the {{node-locality-parameter}} is
> used: when scheduling opportunities for a scheduler key are more than the
> value of this parameter, we relax locality and try to assign the container to
> a node in the corresponding rack.
> On the other hand, when relaxing locality to off-switch (i.e., assign the
> container anywhere in the cluster), we are using a {{localityWaitFactor}},
> which is computed based on the number of outstanding requests for a specific
> scheduler key, which is divided by the size of the cluster.
> In case of applications that request containers in big batches (e.g.,
> traditional MR jobs), and for relatively small clusters, the
> localityWaitFactor does not affect relaxing locality much.
> However, in case of applications that request containers in small batches,
> this load factor takes a very small value, which leads to assigning
> off-switch containers too soon. This situation is even more pronounced in big
> clusters.
> For example, if an application requests only one container per request, the
> locality will be relaxed after a single missed scheduling opportunity.
> The purpose of this JIRA is to rethink the way we are relaxing locality for
> off-switch assignments.
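
To make the arithmetic in the quoted description concrete, here is a minimal sketch of the off-switch check as the description words it. It is not the CapacityScheduler source: the class and method names are hypothetical, the cap of the factor at 1.0 is an assumption, and the threshold of "outstanding requests times localityWaitFactor" is one plausible reading of how the factor is applied.

```java
public class OffSwitchDelaySketch {

    // Hypothetical helper: outstanding requests for a scheduler key divided by
    // the cluster size, as the description states. Capping at 1.0 is an assumption.
    static float localityWaitFactor(int outstandingRequests, int clusterNodes) {
        if (clusterNodes <= 0) {
            return 1.0f; // degenerate cluster size; avoid dividing by zero
        }
        return Math.min((float) outstandingRequests / clusterNodes, 1.0f);
    }

    // Assumed rule: relax to off-switch once the missed scheduling opportunities
    // exceed (outstanding requests * localityWaitFactor).
    static boolean canRelaxToOffSwitch(int outstandingRequests, int clusterNodes,
                                       long missedOpportunities) {
        float factor = localityWaitFactor(outstandingRequests, clusterNodes);
        return missedOpportunities > outstandingRequests * factor;
    }

    public static void main(String[] args) {
        // One container per request on a 100-node cluster: the threshold is 0.01,
        // so a single missed opportunity is already enough to relax locality.
        System.out.println(canRelaxToOffSwitch(1, 100, 1));   // true

        // A batch of 500 requests on the same cluster: the factor is capped at 1.0,
        // so the scheduler keeps waiting after one missed opportunity.
        System.out.println(canRelaxToOffSwitch(500, 100, 1)); // false
    }
}
```

Under these assumptions the small-batch case relaxes immediately, and growing the cluster only shrinks the threshold further, which matches the behavior the description calls out.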