HDFS-13156. HDFS Block Placement Policy - Client Local Rack. Contributed by Ayush Saxena.

(cherry picked from commit de44e1064f051248934ceffdd98a3cc13653d886)


Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/d838d39a
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/d838d39a
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/d838d39a

Branch: refs/heads/branch-3.2
Commit: d838d39a2cb13bfd1ded50fb91f44168b66ed99e
Parents: 62d329c
Author: Vinayakumar B <vinayakum...@apache.org>
Authored: Fri Oct 12 17:27:23 2018 +0530
Committer: Vinayakumar B <vinayakum...@apache.org>
Committed: Fri Oct 12 17:28:48 2018 +0530

----------------------------------------------------------------------
 hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/d838d39a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
index 471a27f..a0121e9 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
@@ -102,7 +102,7 @@ Large HDFS instances run on a cluster of computers that commonly spread across m
 The NameNode determines the rack id each DataNode belongs to via the process outlined in [Hadoop Rack Awareness](../hadoop-common/RackAwareness.html).
 A simple but non-optimal policy is to place replicas on unique racks. This prevents losing data when an entire rack fails and allows use of bandwidth from multiple racks when reading data. This policy evenly distributes replicas in the cluster which makes it easy to balance load on component failure. However, this policy increases the cost of writes because a write needs to transfer blocks to multiple racks.
 
-For the common case, when the replication factor is three, HDFS’s placement policy is to put one replica on the local machine if the writer is on a datanode, otherwise on a random datanode, another replica on a node in a different (remote) rack, and the last on a different node in the same remote rack. This policy cuts the inter-rack write traffic which generally improves write performance. The chance of rack failure is far less than that of node failure; this policy does not impact data reliability and availability guarantees. However, it does reduce the aggregate network bandwidth used when reading data since a block is placed in only two unique racks rather than three. With this policy, the replicas of a file do not evenly distribute across the racks. One third of replicas are on one node, two thirds of replicas are on one rack, and the other third are evenly distributed across the remaining racks. This policy improves write performance without compromising data reliability or read performance.
+For the common case, when the replication factor is three, HDFS’s placement policy is to put one replica on the local machine if the writer is on a datanode, otherwise on a random datanode in the same rack as that of the writer, another replica on a node in a different (remote) rack, and the last on a different node in the same remote rack. This policy cuts the inter-rack write traffic which generally improves write performance. The chance of rack failure is far less than that of node failure; this policy does not impact data reliability and availability guarantees. However, it does reduce the aggregate network bandwidth used when reading data since a block is placed in only two unique racks rather than three. With this policy, the replicas of a file do not evenly distribute across the racks. One third of replicas are on one node, two thirds of replicas are on one rack, and the other third are evenly distributed across the remaining racks. This policy improves write performance without compromising data reliability or read performance.
 
 If the replication factor is greater than 3,
 the placement of the 4th and following replicas are determined randomly
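
To make the updated wording concrete, below is a minimal, self-contained Java sketch of the three-replica choice described in the changed paragraph. It is illustrative only: the real selection is done by HDFS's BlockPlacementPolicyDefault, which also weighs storage types, node load, excluded nodes, and fallback cases, and the Node type and chooseTargets method here are hypothetical names invented for this sketch, not Hadoop APIs.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

/** Illustrative sketch of the rack-aware, replication-factor-3 placement
 *  described above; not the actual HDFS implementation. */
public class RackAwarePlacementSketch {

  /** Hypothetical node model: a DataNode name plus the rack it lives on. */
  record Node(String name, String rack) {}

  private final List<Node> cluster;
  private final Random random = new Random();

  RackAwarePlacementSketch(List<Node> cluster) {
    this.cluster = cluster;
  }

  /** Choose three targets for a new block; {@code writer} carries the
   *  writer's node name and rack. */
  List<Node> chooseTargets(Node writer) {
    List<Node> targets = new ArrayList<>();

    // Replica 1: the writer's own node when the writer runs on a DataNode,
    // otherwise a random DataNode in the writer's rack (falling back to the
    // whole cluster if that rack has no DataNodes).
    List<Node> rackLocal = nodesOnRack(writer.rack(), null);
    Node first = cluster.contains(writer)
        ? writer
        : randomNode(rackLocal.isEmpty() ? cluster : rackLocal);
    targets.add(first);

    // Replica 2: a random node on a different (remote) rack.
    List<Node> remote = cluster.stream()
        .filter(n -> !n.rack().equals(first.rack()))
        .collect(Collectors.toList());
    Node second = randomNode(remote);
    targets.add(second);

    // Replica 3: a different node on the same remote rack as replica 2.
    targets.add(randomNode(nodesOnRack(second.rack(), second)));
    return targets;
  }

  /** All nodes on {@code rack}, optionally excluding one node. */
  private List<Node> nodesOnRack(String rack, Node exclude) {
    return cluster.stream()
        .filter(n -> n.rack().equals(rack) && !n.equals(exclude))
        .collect(Collectors.toList());
  }

  private Node randomNode(List<Node> candidates) {
    return candidates.get(random.nextInt(candidates.size()));
  }

  public static void main(String[] args) {
    List<Node> cluster = List.of(
        new Node("dn1", "/rack1"), new Node("dn2", "/rack1"),
        new Node("dn3", "/rack2"), new Node("dn4", "/rack2"),
        new Node("dn5", "/rack3"), new Node("dn6", "/rack3"));
    RackAwarePlacementSketch policy = new RackAwarePlacementSketch(cluster);
    // Writer runs on dn1: replica 1 stays on dn1 and replicas 2 and 3 land
    // on two different nodes of one remote rack.
    System.out.println(policy.chooseTargets(new Node("dn1", "/rack1")));
  }
}

Each run may pick a different remote rack, but the first replica always stays on the writer's node (or its rack) and the last two always share one remote rack, matching the behaviour the documentation paragraph describes.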

