[ https://issues.apache.org/jira/browse/YARN-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927544#comment-15927544 ]

Huangkaixuan commented on YARN-6289:
------------------------------------

Hi [~leftnoteasy]

To verify whether rack awareness has any influence on the data locality 
observed in the earlier experiments, we carried out two further experiments: 
in the first, all the nodes were configured to be in the same rack; in the 
second, each node was placed in a separate rack.
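
Such a layout can be expressed with a topology script along the following 
lines (a minimal sketch, not our exact script; the rack names are 
illustrative). Hadoop invokes the executable configured via 
net.topology.script.file.name with DataNode IPs/hostnames as arguments and 
reads one rack path per argument from stdout:

  #!/usr/bin/env python
  # topology.py - illustrative rack-awareness script (names are made up).
  # SAME_RACK = True reproduces the single-rack experiment; False puts
  # every node in its own rack.
  import sys

  SAME_RACK = True

  for host in sys.argv[1:]:
      if SAME_RACK:
          print("/rack1")
      else:
          print("/rack-" + host.replace(".", "-"))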

From the results, we can see that tasks get node locality and rack locality 
when all the nodes are configured to be in the same rack, whereas tasks get 
node locality and OffSwitch when the nodes are configured to be in separate 
racks.

The conclusion is that the rack awareness settings do not make it any more 
likely to schedule data-local tasks in our experiments.

The detailed environment and results of the experiments are shown in the 
attachment.
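
For the MapReduce runs, the locality numbers can be tallied from the 
standard job counters, e.g. with a helper along these lines (a sketch; the 
job IDs are placeholders):

  #!/usr/bin/env python
  # locality_stats.py - sketch: tally task locality across finished
  # MapReduce runs from the standard JobCounter values.
  import subprocess

  GROUP = "org.apache.hadoop.mapreduce.JobCounter"

  def counter(job_id, name):
      # `mapred job -counter` prints the value of one counter of a job.
      out = subprocess.check_output(
          ["mapred", "job", "-counter", job_id, GROUP, name])
      return int(out.strip())

  # Job IDs of the finished wordcount runs (placeholders).
  job_ids = ["job_1488888888888_%04d" % i for i in range(1, 11)]

  data_local = sum(counter(j, "DATA_LOCAL_MAPS") for j in job_ids)
  rack_local = sum(counter(j, "RACK_LOCAL_MAPS") for j in job_ids)
  launched = sum(counter(j, "TOTAL_LAUNCHED_MAPS") for j in job_ids)

  print("data-local maps: %d/%d" % (data_local, launched))
  print("rack-local maps: %d/%d" % (rack_local, launched))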


> Fail to achieve data locality when running MapReduce and Spark on HDFS
> ---------------------------------------------------------------------
>
>                 Key: YARN-6289
>                 URL: https://issues.apache.org/jira/browse/YARN-6289
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: distributed-scheduling
>         Environment: Hardware configuration
> CPU: 2 x Intel(R) Xeon(R) E5-2620 v2 @ 2.10GHz /15M Cache 6-Core 12-Thread 
> Memory: 128GB Memory (16x8GB) 1600MHz
> Disk: 600GBx2 3.5-inch with RAID-1
> Network bandwidth: 968Mb/s
> Software configuration
> Spark-1.6.2   Hadoop-2.7.1 
>            Reporter: Huangkaixuan
>         Attachments: Hadoop_Spark_Conf.zip, YARN-DataLocality.docx
>
>
> When running a simple wordcount experiment on YARN, I noticed that the tasks 
> failed to achieve data locality, even though there was no other job running 
> on the cluster at the same time. The experiment was done in a 7-node (1 
> master, 6 data nodes/node managers) cluster, and the input of the wordcount 
> job (both Spark and MapReduce) was a single-block file in HDFS with a 
> replication factor of 2. I ran wordcount on YARN 10 times. The results show 
> that only 30% of the tasks achieved data locality, which looks like the 
> result of random task placement (with 2 replicas on 6 data nodes, a randomly 
> placed task is node-local with probability 2/6, about 33%). The experiment 
> details are in the attachment; feel free to reproduce the experiments.
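>
> Block placement for the single-block input file can be checked with fsck 
> (the path here is illustrative):
>
>   $ hdfs fsck /user/test/wordcount-input.txt -files -blocks -locations
>
> which prints the DataNodes holding each replica of the block.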



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
