All,

We currently have a Hadoop 2.2.0 cluster with the following characteristics:

- 4 nodes
- Each node is a datanode
- Each node has 3 physical disks for data: 2 x 500GB and 1 x 2TB
- HDFS replication factor of 3

It appears that our 500GB disks are filling up first (the alternative would be to put 4 times as many blocks on the 2TB disk in each node). I'm concerned that once the 500GB disks fill, our performance will slow down, since fewer spindles will be read from and written to at the same time on each node. Is this correct? Is there anything we can do to change this behavior?

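To make the concern concrete, here is a rough sketch of the arithmetic in Python. It assumes each datanode spreads new blocks evenly across its disks regardless of their size, which is what the fill pattern we are seeing suggests but which I have not confirmed. The function names are purely illustrative and are not part of Hadoop.

```python
# Disk capacities per node in GB, taken from the cluster description above.
DISK_CAPACITIES_GB = [500, 500, 2000]


def fill_point(capacities_gb):
    """Return (GB written per node, fraction of node capacity used) at the
    moment the smallest disk fills.

    Assumption: every disk receives the same amount of data.
    """
    smallest = min(capacities_gb)
    written = smallest * len(capacities_gb)
    return written, written / sum(capacities_gb)


def proportional_ratio(capacities_gb):
    """Return the relative share of blocks each disk would need so that all
    disks fill at the same time, normalized to the smallest disk."""
    smallest = min(capacities_gb)
    return [capacity / smallest for capacity in capacities_gb]


written_gb, fraction_used = fill_point(DISK_CAPACITIES_GB)
print(f"Small disks fill after {written_gb} GB written per node")
print(f"Node capacity used at that point: {fraction_used:.0%}")
print(f"Block ratio needed to fill evenly: {proportional_ratio(DISK_CAPACITIES_GB)}")
```

Under that assumption, the 500GB disks on a node fill once about half of the node's raw capacity is in use, and from then on all new writes on that node go to the single 2TB disk.
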
Thanks, Brian
