One other word of advice: since these disks are so slow, you will want
to go for more spindles. Four disks per node is pretty much the minimum,
and some people advocate more like 24 disks per node (!). I'd probably
aim for somewhere between 4 and 12.
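(To put rough, back-of-the-envelope numbers on that: a 7200rpm SATA disk
does something on the order of 75-100 random IOPS, and random-read
capacity scales roughly linearly with spindle count, so:

    4 disks  x ~75-100 IOPS = ~300-400  random IOPS per node
    12 disks x ~75-100 IOPS = ~900-1200 random IOPS per node

Ballpark figures only, not measurements.)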
Remember that even if your SATA disk is rated at 150MB/sec, that is the
data rate for sequential reads, not including seeking, which destroys
that performance. Even with fast disks you can still see a high await as
IO requests pile up behind other ones (you can watch this with iostat;
see the note at the bottom of this mail). If you are expecting low-ms
reads, this will blow your 95th percentile out of the water... I have
seen this many times: running even a medium- or low-IO job on a
latency-sensitive cluster can drive the highest percentiles really,
really high. Literally 20-40ms -> 150-800ms.

On Thu, Sep 30, 2010 at 4:28 PM, Daniel Einspanjer <[email protected]> wrote:
> Sorry, I meant raid 5, so we can lose any one drive and the whole node
> will continue.
> hdfs-site.xml is configured to allow one failed disk before shutting
> down the datanode.
>
> On 9/30/10 7:25 PM, Ryan Rawson wrote:
>>
>> What kind of raid are you doing? Sounds like raid0, which means you
>> have a 100% chance of losing the entire box if a single disk goes
>> down. If you choose just one, let's say sda, to host the OS, you are
>> now at a 33% chance of losing the box if a disk goes bad - assuming
>> that all disks have the same failure probability, of course.
>>
>> What we do is install the OS on disk1 (sda), then have 4 JBODs, and I
>> put our logs on disk1 as well. log4j is tricky because it will cause
>> issues on disk corruption/IO error events, but I have seen systems
>> continue to operate even when log4j can't write to disk due to a
>> disk-full scenario.
>>
>> There is almost no non-HDFS data; you can literally wedge it into
>> something like 8GB. The biggest things that are not HDFS data are the
>> logs, and those can go onto the HDFS partition; they tend to be low
>> volume but can add up over time, since the default is not to reap
>> them.
>>
>>
>>
>> On Thu, Sep 30, 2010 at 4:17 PM, Daniel Einspanjer
>> <[email protected]> wrote:
>>>
>>> Right now, most of our boxes have 3 disks in them. We take a small
>>> partition on each of those and raid-stripe them together to use as
>>> the OS partition, then allocate the rest of the disks as JBOD for
>>> HDFS storage.
>>>
>>> We are building out a new cluster, and I'm wondering if there are
>>> any better ideas for balancing the need for storage and speed on
>>> the HDFS disks with having *some place* to put the OS and non-HDFS
>>> data.
>>>
>>> What are other people doing about that?
>>>
>>> -Daniel
>>>
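A quick way to see the queueing described above is iostat from the
sysstat package; "await" is the average time in ms a request spends
queued plus being serviced, so it spikes when IO piles up:

    # per-device extended stats every 5 seconds; a high "await"
    # together with a high "%util" means requests are queueing
    # behind each other on that spindle
    iostat -x 5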
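For reference, the datanode setting Daniel mentions is, I believe,
dfs.datanode.failed.volumes.tolerated. A sketch of an hdfs-site.xml
combining that with a JBOD layout like the one described above -- the
mount-point paths here are made up:

    <!-- hdfs-site.xml sketch; /mnt/diskN paths are hypothetical -->
    <property>
      <!-- comma-separated JBOD data directories, one per disk -->
      <name>dfs.data.dir</name>
      <value>/mnt/disk1/hdfs,/mnt/disk2/hdfs,/mnt/disk3/hdfs,/mnt/disk4/hdfs</value>
    </property>
    <property>
      <!-- tolerate one failed volume instead of shutting the datanode down -->
      <name>dfs.datanode.failed.volumes.tolerated</name>
      <value>1</value>
    </property>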
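And for the small OS raid Daniel describes (raid5 in his corrected
version, so any single drive can die), a minimal mdadm sketch -- device
names and the ext3 choice are assumptions, not his actual setup:

    # assume sda1/sdb1/sdc1 are the small partitions reserved on each disk;
    # raid5 across the three survives the loss of any one drive
    mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sda1 /dev/sdb1 /dev/sdc1
    mkfs.ext3 /dev/md0    # then install the OS onto /dev/md0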
