Dear hadoopers,
Has anyone been confronted with deploying a cluster in a traditional IT
shop whose admins handle thousands of servers?
They traditionally use SAN or NAS storage for application data, rely on
RAID 1 for system disks, and in the few cases where internal disks are
used, they configure them with RAID 5 provided by the internal hardware
controller.
Using a JBOD setup, as advised in every Hadoop doc I have ever laid my
hands on, means that each HDD failure implies, on top of the physical
replacement of the drive, that an admin performs at least an mkfs.
Given that these operations will become more frequent as more internal
disks are used, this can be perceived as an annoying disruption in the
industrial handling of numerous servers.
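For context, a JBOD setup on the HDFS side just means listing one data
directory per mount point in hdfs-site.xml; a minimal sketch (the mount
paths are hypothetical) would be something like this, where the second
property lets a datanode keep running after a disk failure so drive
replacements can be batched rather than handled one by one:

```xml
<!-- hdfs-site.xml fragment: one data directory per physical disk (JBOD) -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn</value>
</property>
<property>
  <!-- number of failed volumes a datanode tolerates before shutting down;
       default is 0, meaning any single disk failure stops the datanode -->
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>1</value>
</property>
```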
In Tom White's guide there is a discussion of RAID 0, stating that Yahoo
benchmarks showed a 10% loss in performance compared to JBOD, so we can
expect even worse performance with RAID 5, but I found no figures.
I also found a Hortonworks interview of StackIQ, which provides software
to automate such failure fix-ups. But it would be rather painful to go
straight to another solution, contract and so on, while just starting
with Hadoop.
Please share your experiences with RAID for redundancy (1, 5 or other)
in Hadoop configurations.
Thank you
Ulul