[
https://issues.apache.org/jira/browse/YARN-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Huangkaixuan updated YARN-6289:
-------------------------------
    Summary: Fail to achieve data locality when running MapReduce and Spark on
HDFS  (was: yarn got little data locality)
> Fail to achieve data locality when running MapReduce and Spark on HDFS
> ----------------------------------------------------------------------
>
> Key: YARN-6289
> URL: https://issues.apache.org/jira/browse/YARN-6289
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacity scheduler
> Environment: Hardware configuration
> CPU: 2 x Intel(R) Xeon(R) E5-2620 v2 @ 2.10GHz /15M Cache 6-Core 12-Thread
> Memory: 128GB Memory (16x8GB) 1600MHz
> Disk: 600GBx2 3.5-inch with RAID-1
> Network bandwidth: 968Mb/s
> Software configuration
> Spark-1.6.2 Hadoop-2.7.1
> Reporter: Huangkaixuan
> Priority: Minor
> Attachments: YARN-6289.01.docx
>
>
>                 When I ran experiments with both Spark and MapReduce wordcount on YARN
> against the input file, I noticed that the job did not achieve data locality
> every time. Task placement appeared to be random, even though no other job was
> running on the cluster. I expected the tasks to always be placed on a node
> holding a replica of the data block, but that did not happen.
>                 I ran the experiments on a 7-node cluster with 2x replication (1
> master, 6 DataNodes/NodeManagers); the experiment details are in the
> attachment so the results can be reproduced.
>                 In the experiments, I ran Spark/MapReduce wordcount on YARN 10 times
> on a single-block file, and the results show that only about 30% of the tasks
> achieved data locality; task placement appeared to be random.
>                 Next, I will run two more experiments (the same 7-node cluster with
> 2x replication, using 2-block and 4-block files) to verify the results, and I
> plan to do some optimization work on the scheduling policy to improve data
> locality.
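>                 For reference, a minimal sketch of the settings that govern locality
> relaxation in this setup (assuming the default property names in Hadoop 2.7.1
> and Spark 1.6.2; the values shown are the stock defaults, not tuned
> recommendations):
> {code}
> <!-- capacity-scheduler.xml: number of missed scheduling opportunities the
>      CapacityScheduler waits before relaxing from node-local to rack-local -->
> <property>
>   <name>yarn.scheduler.capacity.node-locality-delay</name>
>   <value>40</value>
> </property>
>
> # spark-defaults.conf: how long a Spark task waits for a node-local slot
> # before falling back to a less local placement
> spark.locality.wait    3s
> {code}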
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]