[ https://issues.apache.org/jira/browse/YARN-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Huangkaixuan updated YARN-6289:
-------------------------------
    Attachment: YARN-DataLocality.docx

> Fail to achieve data locality when running MapReduce and Spark on HDFS
> ----------------------------------------------------------------------
>
>                 Key: YARN-6289
>                 URL: https://issues.apache.org/jira/browse/YARN-6289
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacity scheduler
>         Environment: Hardware configuration:
>               CPU: 2 x Intel(R) Xeon(R) E5-2620 v2 @ 2.10GHz / 15M cache, 6-core 12-thread
>               Memory: 128GB (16 x 8GB) 1600MHz
>               Disk: 2 x 600GB 3.5-inch with RAID-1
>               Network bandwidth: 968Mb/s
>               Software configuration:
>               Spark-1.6.2, Hadoop-2.7.1
>            Reporter: Huangkaixuan
>            Priority: Minor
>         Attachments: YARN-DataLocality.docx
>
> When I ran wordcount experiments with both Spark and MapReduce on YARN, I noticed that tasks failed to achieve data locality, even though no other job was running on the cluster.
> I used a 7-node cluster (1 master, 6 DataNodes/NodeManagers) with 2x replication on HDFS. In the experiments, I ran Spark/MapReduce wordcount on YARN 10 times against a single data block. The results show that only 30% of the tasks achieved data locality; task placement appears to be random. The experiment details are in the attachment, so you can reproduce the results.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
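One way to quantify the locality rate the report describes is to read the standard MapReduce JobCounter values (DATA_LOCAL_MAPS, RACK_LOCAL_MAPS, OTHER_LOCAL_MAPS) for each run and compute the fraction of data-local map tasks. A minimal sketch follows; the counter names are the real MapReduce JobCounter names, but the per-run dictionaries are illustrative stand-ins for values that would come from the job history server or the `mapred job` CLI:

```python
# Sketch: compute a data-locality rate across repeated single-block runs
# from MapReduce JobCounter values. The run data below is illustrative;
# in practice each dict would be read from the job history or the
# `mapred job -counter` output for that job.

def locality_rate(counters):
    """Fraction of map tasks in one job that were data-local."""
    total = (counters.get("DATA_LOCAL_MAPS", 0)
             + counters.get("RACK_LOCAL_MAPS", 0)
             + counters.get("OTHER_LOCAL_MAPS", 0))
    return counters.get("DATA_LOCAL_MAPS", 0) / total if total else 0.0

# Hypothetical example mirroring the report: 10 single-block runs,
# one map task each, only 3 of which landed on a node holding a replica.
runs = [{"DATA_LOCAL_MAPS": 1}] * 3 + [{"RACK_LOCAL_MAPS": 1}] * 7
overall = sum(locality_rate(r) for r in runs) / len(runs)
print(f"data-local in {overall:.0%} of runs")  # → data-local in 30% of runs
```

With 2x replication on 6 DataNodes, a scheduler that ignored locality entirely would place a single-block task on a replica node about 1/3 of the time, which is consistent with the ~30% figure observed.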