One other word of advice: since these disks are so slow, you will want
to go for more spindles.  Four disks per node is pretty much the minimum,
and some people are advocating more like 24 disks per node (!).  I'd
probably aim for somewhere in the 4-12 range.
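
For reference, here is roughly how extra spindles show up in
hdfs-site.xml - one data directory per disk, so the datanode spreads
new blocks across all of them.  A minimal sketch only: the /data/N
mount points are made-up examples, and the property is dfs.data.dir on
0.20-era releases (newer ones rename it dfs.datanode.data.dir):

  <!-- one entry per physical disk; the datanode round-robins
       new blocks across these directories -->
  <property>
    <name>dfs.data.dir</name>
    <value>/data/1/dfs,/data/2/dfs,/data/3/dfs,/data/4/dfs</value>
  </property>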

Remember, even if your SATA disk is rated at 150 MB/sec, that is the data
rate for sequential reads; it does not account for seeking, which destroys
that performance.  Even with fast disks you can still see a higher await
(the average I/O wait time iostat -x reports) as requests pile up behind
one another.  If you are expecting low-millisecond reads, this will blow
your 95th percentile out of the water... I have seen this many times:
running even a medium- or low-IO job on a latency-sensitive cluster can
drive the highest percentiles up really, really high.  Literally 20-40ms
-> 150-800ms.

On Thu, Sep 30, 2010 at 4:28 PM, Daniel Einspanjer
<[email protected]> wrote:
> Sorry, I meant RAID 5, so we can lose any one drive and the whole node
> will continue.
> hdfs-site.xml is configured to allow one failed disk before shutting down
> the datanode (see the config sketch at the bottom of this thread).
>
> On 9/30/10 7:25 PM, Ryan Rawson wrote:
>>
>> What kind of RAID are you doing?  Sounds like RAID 0, which means you
>> have a 100% chance of losing the entire box if a single disk goes
>> down.  If you choose just one disk, let's say sda, to host the OS, you
>> are down to a 33% chance of losing the box if a disk goes bad (only one
>> of your three disks is now fatal) - assuming that all disks have the
>> same failure probability, of course.
>>
>> What we do is install the OS on disk1 (sda), then have 4 JBODs, and I
>> put our logs on disk1 as well.  log4j is tricky because it will cause
>> issues on disk corruption/IO error events, but I have seen systems
>> continue to operate even when log4j can't write to disk due to a
>> disk-full scenario.
>>
>> There is almost no non-HDFS data; you can literally wedge the OS into
>> something like 8 GB.  The biggest things that are not HDFS data are the
>> logs, and those can go onto the HDFS partition; they tend to be low
>> volume but can add up over time, since the default is not to reap them.
>>
>> On Thu, Sep 30, 2010 at 4:17 PM, Daniel Einspanjer
>> <[email protected]>  wrote:
>>>
>>> Right now, most of our boxes have 3 disks in them.  We take a small
>>> partition on each of those and RAID-stripe them together to use as the
>>> OS partition, then allocate the rest of each disk as JBOD for HDFS
>>> storage.
>>>
>>> We are building out a new cluster, and I'm wondering if there are any
>>> better ideas for balancing the need for storage and speed of the HDFS
>>> disks with having *some place* to put the OS and non-HDFS data.
>>>
>>> What are other people doing about that?
>>>
>>> -Daniel
>>>
>
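
On the hdfs-site.xml bit Daniel mentioned above: the knob for tolerating
a failed disk is dfs.datanode.failed.volumes.tolerated.  A minimal
sketch, assuming your Hadoop build has the property (check your version;
the default of 0 shuts the datanode down on the first failed volume):

  <property>
    <!-- stay up through one dead disk instead of shutting the
         datanode down on the first volume failure (default 0) -->
    <name>dfs.datanode.failed.volumes.tolerated</name>
    <value>1</value>
  </property>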
