[ https://issues.apache.org/jira/browse/YARN-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15906088#comment-15906088 ]
Huangkaixuan commented on YARN-6289:
------------------------------------

Thanks [~leftnoteasy]

For #1 - Can you explain a little more? The answer is not clear; it should state more conclusively that MR is using FileSystem.getFileBlockLocations but YARN is not honoring locality in the default scheduling mode.

For #2 - Since the data is all rack-local, we are not expecting this experiment to help. Is there a reason you think it might?

For #3 - There were no other jobs running on the cluster at the same time, so we expected 100% locality all the time. Can you please explain how to achieve data locality here?

> Fail to achieve data locality when running MapReduce and Spark on HDFS
> ----------------------------------------------------------------------
>
> Key: YARN-6289
> URL: https://issues.apache.org/jira/browse/YARN-6289
> Project: Hadoop YARN
> Issue Type: Bug
> Components: distributed-scheduling
> Environment: Hardware configuration
> CPU: 2 x Intel(R) Xeon(R) E5-2620 v2 @ 2.10GHz / 15M Cache, 6-Core 12-Thread
> Memory: 128GB (16x8GB) 1600MHz
> Disk: 600GB x 2, 3.5-inch with RAID-1
> Network bandwidth: 968Mb/s
> Software configuration
> Spark-1.6.2, Hadoop-2.7.1
> Reporter: Huangkaixuan
> Attachments: Hadoop_Spark_Conf.zip, YARN-DataLocality.docx
>
> When running a simple wordcount experiment on YARN, I noticed that the task
> failed to achieve data locality, even though there was no other job running on
> the cluster at the same time. The experiment was done on a 7-node (1 master,
> 6 data nodes/node managers) cluster, and the input of the wordcount job (both
> Spark and MapReduce) was a single-block file in HDFS, two-way
> replicated (replication factor = 2). I ran wordcount on YARN 10 times.
> The results show that only 30% of tasks achieved data locality, which
> looks like the result of random task placement. The experiment details
> are in the attachment; feel free to reproduce the experiments.
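For reference, the observed ~30% figure is consistent with locality-blind placement: with a single-block file replicated twice across the 6 DataNodes, a container placed uniformly at random on one of the 6 co-located NodeManagers lands on a replica with probability 2/6 ≈ 33%. A minimal sketch of that arithmetic (the node and replica counts come from the experiment description; the uniform-random-placement assumption is mine, not something the scheduler guarantees):

```java
public class LocalityOdds {
    public static void main(String[] args) {
        int dataNodes = 6;   // NodeManagers co-located with DataNodes (experiment setup)
        int replicas = 2;    // replication factor of the single-block input file

        // If the scheduler ignores block locations, each NM is equally likely
        // to receive the container, so P(node-local) = replicas / dataNodes.
        double pNodeLocal = (double) replicas / dataNodes;

        System.out.printf("P(node-local under random placement) = %.2f%n", pNodeLocal);
    }
}
```

Over only 10 runs, 3/10 node-local placements is well within what this baseline predicts, which supports the suspicion that block locations are not being honored rather than that locality is merely delayed.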
--
This message was sent by Atlassian JIRA (v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org