[
https://issues.apache.org/jira/browse/YARN-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15896854#comment-15896854
]
Huangkaixuan edited comment on YARN-6289 at 3/6/17 8:02 AM:
------------------------------------------------------------
The experiment details:
7-node cluster (1 master, 6 data nodes / node managers):

HostName   Role
Simple37   Master
Simple27   Node1
Simple28   Node2
Simple30   Node3
Simple31   Node4
Simple32   Node5
Simple33   Node6
- Configure HDFS with a replication factor of 2
- The test file occupies a single block in HDFS
- Configure Spark to use dynamic allocation
- Configure YARN with both the MapReduce shuffle service and the Spark shuffle service (see the configuration sketch after this list)
- Add a single small file (a few bytes) to HDFS
- Run wordcount on the file (once with MapReduce, once with Spark; see the run commands after this list)
- Check whether the single map-stage task was scheduled on a node holding the data
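
For reference, a minimal sketch of the configuration the steps above imply. The property names are the standard Hadoop/Spark ones, but the exact files shown are an assumption; values not listed are assumed to stay at their defaults, and the Spark YARN shuffle jar is assumed to be on every NodeManager's classpath.

{code:xml}
<!-- hdfs-site.xml: two replicas per block -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>

<!-- yarn-site.xml: run both shuffle services on every NodeManager -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
{code}

And in spark-defaults.conf (dynamic allocation requires the external shuffle service):

{code}
spark.dynamicAllocation.enabled true
spark.shuffle.service.enabled   true
{code}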
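
A sketch of the run itself; the file and output paths are placeholders, and the example-jar paths assume the default Hadoop 2.7.1 and prebuilt Spark 1.6.2 layouts:

{code}
# Put a small (single-block) file into HDFS
hdfs dfs -put small.txt /tmp/small.txt

# MapReduce wordcount (bundled example)
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar \
  wordcount /tmp/small.txt /tmp/wc-out

# Spark wordcount (bundled example) on YARN
spark-submit --master yarn --class org.apache.spark.examples.JavaWordCount \
  $SPARK_HOME/lib/spark-examples-*.jar /tmp/small.txt
{code}

Task placement can then be checked from the MapReduce job counters ("Data-local map tasks") and from the Locality Level column (NODE_LOCAL vs. ANY) on the stage page of the Spark web UI.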
Results of experiment one (each workload run 10 times):

7-node cluster (1 master, 6 data nodes / node managers), 2x replication, 1-block file, MapReduce wordcount:

Round  Data location  Scheduled node  Hit  Time cost
1      Node3/Node4    Node6           No   20s
2      Node5/Node3    Node6           No   17s
3      Node3/Node5    Node1           No   21s
4      Node2/Node3    Node6           No   18s
5      Node1/Node2    Node1           Yes  15s
6      Node4/Node5    Node3           No   19s
7      Node2/Node3    Node2           Yes  14s
8      Node1/Node4    Node5           No   16s
9      Node1/Node6    Node6           Yes  15s
10     Node3/Node5    Node4           No   17s
7-node cluster (1 master, 6 data nodes / node managers), 2x replication, 1-block file, Spark wordcount:

Round  Data location  Scheduled node  Hit  Time cost
1      Node3/Node4    Node4           Yes  24s
2      Node2/Node3    Node5           No   30s
3      Node3/Node5    Node4           No   35s
4      Node2/Node3    Node2           Yes  24s
5      Node1/Node2    Node4           No   26s
6      Node4/Node5    Node2           No   25s
7      Node2/Node3    Node4           No   27s
8      Node1/Node4    Node1           Yes  22s
9      Node1/Node6    Node2           No   23s
10     Node1/Node2    Node4           No   33s
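
For context: with 2 replicas spread over 6 data nodes, a scheduler that places the single map task uniformly at random would land on a node holding the data with probability 2/6 ≈ 33%. The observed hit rates above are 3/10 for MapReduce and 3/10 for Spark, which matches random placement; a locality-aware scheduler on an otherwise idle cluster should hit close to 10/10.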
> yarn got little data locality
> -----------------------------
>
> Key: YARN-6289
> URL: https://issues.apache.org/jira/browse/YARN-6289
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacity scheduler
> Environment: Hardware configuration
> CPU: 2 x Intel(R) Xeon(R) E5-2620 v2 @ 2.10GHz /15M Cache 6-Core 12-Thread
> Memory: 128GB Memory (16x8GB) 1600MHz
> Disk: 600GBx2 3.5-inch with RAID-1
> Network bandwidth: 968Mb/s
> Software configuration
> Spark-1.6.2 Hadoop-2.7.1
> Reporter: Huangkaixuan
> Priority: Minor
>
> When I ran this experiment with both Spark and MapReduce wordcount on the
> file, I noticed that the job did not get data locality every time. It was
> seemingly random in the placement of the tasks, even though there is no other
> job running on the cluster. I expected the task placement to always be on the
> single machine which is holding the data block, but that did not happen.