[ https://issues.apache.org/jira/browse/YARN-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898619#comment-15898619 ]

Huangkaixuan edited comment on YARN-6289 at 3/7/17 2:45 AM:
------------------------------------------------------------

The detailed results of the experiments are shown in the patch


was (Author: huangkx6810):
The detailed results of the experiment are shown in the patch

> yarn got little data locality
> -----------------------------
>
>                 Key: YARN-6289
>                 URL: https://issues.apache.org/jira/browse/YARN-6289
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacity scheduler
>         Environment: Hardware configuration
> CPU: 2 x Intel(R) Xeon(R) E5-2620 v2 @ 2.10GHz /15M Cache 6-Core 12-Thread
> Memory: 128GB Memory (16x8GB) 1600MHz
> Disk: 600GBx2 3.5-inch with RAID-1
> Network bandwidth: 968Mb/s
> Software configuration
> Spark-1.6.2 Hadoop-2.7.1
>            Reporter: Huangkaixuan
>            Priority: Minor
>         Attachments: YARN-6289.01.docx
>
>
> When I ran experiments with both Spark and MapReduce wordcount on YARN
> against the input file, I noticed that the jobs did not get data locality
> every time. Task placement was seemingly random, even though no other job
> was running on the cluster. I expected tasks to always be placed on a
> machine holding the data block, but that did not happen.
> I ran the experiments on a 7-node cluster with 2x replication (1 master,
> 6 data nodes/node managers). The experiment details are in the patch so
> that you can reproduce the results.
> In the experiments, I ran Spark/MapReduce wordcount on YARN 10 times on a
> single block. The results show that only 30% of tasks achieved data
> locality; task placement appears to be random.
> Next, I will run two more experiments (7-node cluster with 2x replication,
> with 2 blocks and 4 blocks) to verify the results, and I plan to do some
> optimization work (optimizing the scheduling policy) to improve data
> locality.
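
As a rough sanity check on the "seemingly random" observation in the description: if a scheduler ignored locality entirely and placed the single task on one of the 6 node managers uniformly at random, and the block's 2 replicas sat on 2 distinct data nodes, the task would be node-local with probability 2/6, or about 33%. That is close to the reported 30%. The sketch below computes this baseline and cross-checks it with a small simulation. The uniform-placement model and the function names are assumptions made for illustration; they are not taken from the patch or from the YARN scheduler code.

```python
import random


def expected_local_fraction(num_nodes: int, num_replicas: int) -> float:
    """Probability that a uniformly random node holds a replica of the block.

    Assumes replicas are stored on distinct nodes (hypothetical baseline model).
    """
    if not 0 < num_replicas <= num_nodes:
        raise ValueError("need 0 < num_replicas <= num_nodes")
    return num_replicas / num_nodes


def simulate_local_fraction(num_nodes: int, num_replicas: int,
                            trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability.

    Each trial places the replicas on distinct random nodes, then places
    one task on a uniformly random node and checks whether it is local.
    """
    rng = random.Random(seed)
    local = 0
    for _ in range(trials):
        replica_nodes = set(rng.sample(range(num_nodes), num_replicas))
        task_node = rng.randrange(num_nodes)
        if task_node in replica_nodes:
            local += 1
    return local / trials


if __name__ == "__main__":
    # Values from the report: 6 data nodes/node managers, 2x replication.
    print(f"expected:  {expected_local_fraction(6, 2):.3f}")
    print(f"simulated: {simulate_local_fraction(6, 2, 100_000):.3f}")
```

Under the same assumed model, any locality rate well above `num_replicas / num_nodes` would indicate that the scheduler is actually honoring locality preferences, which gives a simple baseline for judging the planned 2-block and 4-block experiments.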