Dear Sai Sai,

"Hadoop: The Definitive Guide" says the following about default replica placement:
- The first replica is placed on the same node as the client (lowest bandwidth penalty).
- The second replica is placed off-rack, on a random node of a different rack (avoiding busy racks).
- The third replica is placed on the rack where the second replica is stored, on a different random node of that rack.
- Any further replicas are placed on random nodes of the cluster (avoiding busy racks).

If the client is not running on a node of the cluster, the first replica is placed on a random node (avoiding busy racks).

Best regards,
Jens
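
P.S. In case it helps to see the rules side by side, here is a rough sketch of them as a toy Python function. It is not Hadoop code: the name place_replicas, the dict-based topology and the rng parameter are all made up for illustration, and the "avoiding busy racks" part is left out entirely (nodes are simply picked at random).

```python
import random


def place_replicas(topology, replication, client=None, rng=random):
    """Toy model of the placement rules described above.

    topology    -- hypothetical layout: dict mapping rack name -> list of node names
    replication -- number of replicas wanted
    client      -- node name of the writer, or None if it is outside the cluster
    Returns the chosen nodes in placement order.
    """
    node_rack = {n: r for r, nodes in topology.items() for n in nodes}
    all_nodes = list(node_rack)
    chosen = []
    if not all_nodes or replication < 1:
        return chosen

    def pick(candidates):
        # Pick a random node that does not hold a replica yet.
        free = [n for n in candidates if n not in chosen]
        if not free:
            return None
        node = rng.choice(free)
        chosen.append(node)
        return node

    # 1st replica: the client's own node, or a random node for off-cluster clients.
    if client in node_rack:
        chosen.append(client)
    else:
        pick(all_nodes)

    # 2nd replica: random node on a different rack than the first.
    if replication >= 2:
        first_rack = node_rack[chosen[0]]
        off_rack = [n for n in all_nodes if node_rack[n] != first_rack]
        # Assumption: fall back to any node if there is only one rack.
        pick(off_rack or all_nodes)

    # 3rd replica: same rack as the second, but a different node.
    if replication >= 3 and len(chosen) >= 2:
        second_rack = node_rack[chosen[1]]
        if pick(topology[second_rack]) is None:
            pick(all_nodes)  # assumption: fall back if that rack has no free node

    # Further replicas: random nodes anywhere in the cluster.
    while len(chosen) < min(replication, len(all_nodes)):
        pick(all_nodes)

    return chosen


if __name__ == "__main__":
    racks = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"], "rack3": ["n5", "n6"]}
    print(place_replicas(racks, 3, client="n1"))
```

With the client on n1, the first entry is always n1, and the second and third entries share a rack other than rack1; which nodes exactly depends on the random choices.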
