With a replication factor of 3 you would have to lose 3 entire nodes to lose
data. The replication factor is 3 nodes, not 3 spindles. The number of disks
(roughly) determines how HDFS spreads I/O across the spindles for the single
copy of the data that the node owns (each node being one of the 3 with a
copy). Note that things get slightly complicated when the FIRST datum is
written to a cluster. (But that was not your question. ;-)
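
If you want to see the placement yourself, here is a minimal sketch in Java
(the class name and the file path passed in args[0] are placeholders; it
assumes a stock Hadoop client and its configuration on the classpath) that
prints the datanodes holding each block of a file. With a replication factor
of 3, every block should list 3 distinct hosts:

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockHosts {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(new Path(args[0]));
        // One BlockLocation per block; getHosts() names the datanodes
        // that hold a replica of that block.
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.println("block @ " + b.getOffset() + " -> "
                    + Arrays.toString(b.getHosts()));
        }
    }
}

Note that the hosts are whole machines; which of a node's 8 spindles a
replica sits on is a datanode-internal detail that never shows up here.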
On Apr 4, 2015 10:39 PM, "Arthur Chan" <[email protected]> wrote:

> Hi,
>
> I use the default replication factor of 3 here; the cluster has 10 nodes,
> and each of my datanodes has 8 hard disks. If one of the nodes goes down
> because of hardware failure, i.e. its 8 hard disks become unavailable for
> the duration of the outage, does that mean I will have data loss? (8 hard
> disks > 3 replicas)
>
> Or what is the maximum number of servers that can be down without data
> loss here?
>
> Regards
> Arthur
>
> On Wednesday, December 17, 2014, Harshit Mathur <[email protected]>
> wrote:
>
>> Hi Arthur,
>>
>> In HDFS, replication happens at the block level. In case of total
>> failure of a datanode, the lost blocks become under-replicated, and the
>> namenode will create copies of these under-replicated blocks on other
>> datanodes.
>>
>> BR,
>> Harshit
>>
>> On Wed, Dec 17, 2014 at 11:35 AM, [email protected]
>> <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> If each of my datanode servers has 8 hard disks (a 10-node cluster) and
>>> I use the default replication factor of 3, how will Hadoop handle a
>>> datanode that suffers a sudden total hardware failure?
>>>
>>> Regards
>>> Arthur
>>>
>>
>> --
>> Harshit Mathur
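
As Harshit says above, the namenode detects under-replicated blocks after a
datanode failure and schedules new copies on the surviving nodes. A rough
way to watch that from a client (sketch only, with the same assumptions and
placeholders as the earlier snippet) is to compare a file's target
replication factor with the number of live hosts reported per block:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FindUnderReplicated {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(new Path(args[0]));
        short target = status.getReplication(); // configured factor, e.g. 3
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            // Datanodes currently holding a replica of this block.
            int live = b.getHosts().length;
            if (live < target) {
                System.out.println("under-replicated block @ " + b.getOffset()
                        + ": " + live + "/" + target + " replicas");
            }
        }
    }
}

Right after a node failure you would expect some blocks to report 2/3 here;
once re-replication finishes they go back to 3/3. (hdfs fsck reports the
same counts cluster-wide.)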
