[ 
https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502752#comment-16502752
 ] 

Weiwei Yang commented on YARN-8394:
-----------------------------------

Hi [~yufeigu]

By that I mean, when a cluster uses separate storage and computation 
systems, i.e. the file system is remote, there is no locality at all. Such an 
architecture is very popular in the cloud now. If CS continues to use the 
default delay logic, MR job performance suffers: tasks keep waiting through 
missed scheduling opportunities until they are finally relaxed to off-switch 
requests. Does that make sense? 
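
As a rough sketch of what relaxing locality could look like in 
capacity-scheduler.xml (an illustration, not something settled in this 
thread): {{yarn.scheduler.capacity.node-locality-delay}} is the existing 
property that controls how many missed opportunities CS tolerates before 
trying rack-local placement, and the -1 value below assumes that a negative 
setting turns the delay off, which should be verified against the 
CapacityScheduler documentation for the Hadoop version in use.

```xml
<!-- capacity-scheduler.xml: hedged sketch for clusters with remote storage -->
<configuration>
  <property>
    <!-- Assumption: -1 disables the node-locality delay so that tasks are
         not held back waiting for a node-local slot that cannot exist. -->
    <name>yarn.scheduler.capacity.node-locality-delay</name>
    <value>-1</value>
  </property>
  <property>
    <!-- Parameter introduced by YARN-6344: extra missed opportunities, on
         top of node-locality-delay, before off-switch assignment. The value
         shown here is illustrative only. -->
    <name>yarn.scheduler.capacity.rack-locality-additional-delay</name>
    <value>-1</value>
  </property>
</configuration>
```

With settings along these lines, the intent is that requests go off-switch 
right away instead of waiting out the default delay.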

> Improve data locality documentation for Capacity Scheduler
> ----------------------------------------------------------
>
>                 Key: YARN-8394
>                 URL: https://issues.apache.org/jira/browse/YARN-8394
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Weiwei Yang
>            Priority: Major
>
> YARN-6344 introduced a new parameter, 
> {{yarn.scheduler.capacity.rack-locality-additional-delay}}, in 
> capacity-scheduler.xml; we need to add documentation for it in 
> {{CapacityScheduler.md}} accordingly.
> Moreover, we are seeing more and more clusters that separate storage and 
> computation, where the file system is always remote. For such cases we need 
> to document how to relax data locality in CS; otherwise MR jobs suffer.



