Dear hadoopers,
Has anyone been confronted with deploying a cluster in a traditional IT
shop whose admins handle thousands of servers?
They traditionally use SAN or NAS storage for application data, rely on
RAID 1 for system disks, and in the few cases where internal disks are
used, they configure them with RAID 5 provided by the internal hardware
controller.
Using a JBOD setup, as advised in every Hadoop doc I have ever laid my
hands on, means that each HDD failure implies, on top of the physical
replacement of the drive, that an admin performs at least an mkfs.
Given that these operations will become more frequent as more internal
disks are used, this can be perceived as an annoying disruption in the
industrial handling of numerous servers.
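For context, a JBOD setup on the HDFS side just means listing one data
directory per mount point in hdfs-site.xml; a minimal sketch (the mount
paths are hypothetical) would be something like this, where the second
property lets a datanode keep running after a disk failure so drive
replacements can be batched rather than handled one by one:

```xml
<!-- hdfs-site.xml fragment: one data directory per physical disk (JBOD) -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn</value>
</property>
<property>
  <!-- number of failed volumes a datanode tolerates before shutting down;
       default is 0, meaning any single disk failure stops the datanode -->
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>1</value>
</property>
```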
In Tom White's guide there is a discussion of RAID 0, stating that Yahoo
benchmarks showed a 10% loss in performance compared to JBOD, so we can
expect even worse performance with RAID 5, but I found no figures.
I also found a Hortonworks interview of StackIQ, which provides software
to automate such failure fix-ups. But it would be rather painful to go
straight to another solution, contract and so on, while just starting
with Hadoop.
Please share your experiences with RAID for redundancy (1, 5 or other)
in Hadoop configurations.
Thank you
Ulul