Hi

I have read scattered documentation online, most of which says that HDFS 
does not work well when a SAN is used to store its data, while some sources 
describe it as an emerging trend. I would like to know whether any tests have 
been performed that show in which aspects direct-attached storage excels or 
falls behind a SAN.

We are investigating whether direct-attached storage is a better option than 
SAN storage for a modest cluster holding on the order of 100 TB of data in 
steady state. The SAN can certainly support an order of magnitude more IOPS 
than we need for now. However, because it is shared infrastructure and our 
data may grow, that may not remain an advantage in the future.

I am also interested in data locality: for MapReduce (MR) jobs, where data 
locality is a key driver of performance, how does that pan out when using a 
SAN instead of direct storage?

Finally, I would love to hear your views on the more subjective topics of 
availability and reliability when using a SAN for HDFS data storage.

Thanks,
Abhishek