[
https://issues.apache.org/jira/browse/YARN-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900467#comment-15900467
]
Wangda Tan commented on YARN-6289:
----------------------------------
[~Huangkx6810],
When locality scheduling does not work as expected, there are typically two causes:
1) The FileSystem/application needs to support locality. For example,
FileInputFormat in MR uses FileSystem.getFileBlockLocations to find out where
the blocks are located.
2) A misconfigured topology script returns the wrong rack name for the given
hosts.
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/RackAwareness.html
In addition:
3) YARN-4287 fixes an overly long delay while waiting for locality in
CapacityScheduler, but it will not help with #1 or #2.
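
To make cause #2 concrete, here is a minimal sketch of a topology script with a
self-check. Hadoop invokes the configured script with one or more host names or
IP addresses as arguments and expects one rack path per argument on stdout.
The host names, rack names, and the helpers `resolve` and `unmapped` are
hypothetical, chosen only for illustration; this is not the script from the
reporter's cluster.

```python
#!/usr/bin/env python3
"""Sketch of a rack topology script (all hosts and racks are made up)."""
import sys

# Hypothetical mapping. Nodes may be looked up by IP or by host name,
# so it is safest to list both forms for every node.
RACK_BY_HOST = {
    "10.0.0.11": "/rack1",
    "node1.example.com": "/rack1",
    "10.0.0.12": "/rack2",
    "node2.example.com": "/rack2",
}

# Rack Hadoop uses for nodes whose location is unknown.
DEFAULT_RACK = "/default-rack"


def resolve(names):
    """Return one rack path per input name, in the same order."""
    return [RACK_BY_HOST.get(name, DEFAULT_RACK) for name in names]


def unmapped(names):
    """Return the names that would silently fall back to the default rack."""
    return [name for name in names if name not in RACK_BY_HOST]


if __name__ == "__main__":
    # Print the racks separated by whitespace, one per argument.
    print(" ".join(resolve(sys.argv[1:])))
```

Running the script by hand with the addresses of the data nodes, and comparing
the result with the output of `hdfs dfsadmin -printTopology`, is a quick way to
spot hosts that end up in the wrong rack or in /default-rack.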
> Fail to achieve data locality when running MapReduce and Spark on HDFS
> ----------------------------------------------------------------------
>
> Key: YARN-6289
> URL: https://issues.apache.org/jira/browse/YARN-6289
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Environment: Hardware configuration
> CPU: 2 x Intel(R) Xeon(R) E5-2620 v2 @ 2.10GHz /15M Cache 6-Core 12-Thread
> Memory: 128GB Memory (16x8GB) 1600MHz
> Disk: 600GBx2 3.5-inch with RAID-1
> Network bandwidth: 968Mb/s
> Software configuration
> Spark-1.6.2 Hadoop-2.7.1
> Reporter: Huangkaixuan
> Attachments: Hadoop_Spark_Conf.zip, YARN-DataLocality.docx
>
>
> When running a simple wordcount experiment on YARN, I noticed that tasks
> failed to achieve data locality, even though no other job was running on
> the cluster at the same time. The experiment was done on a 7-node cluster
> (1 master, 6 data nodes/node managers), and the input of the wordcount job
> (both Spark and MapReduce) is a single-block file in HDFS that is two-way
> replicated (replication factor = 2). I ran wordcount on YARN 10 times.
> The results show that only 30% of tasks achieved data locality, which
> looks like the result of random task placement. The experiment details
> are in the attachment; feel free to reproduce the experiments.