Sorry, I meant RAID 5, so we can lose any one drive and the whole node
will keep running.
hdfs-site.xml is configured to tolerate one failed disk before the
DataNode shuts down.
On 9/30/10 7:25 PM, Ryan Rawson wrote:
What kind of RAID are you doing? Sounds like RAID 0, which means you
have a 100% chance of losing the entire box if a single disk goes
down. If you choose just one disk, let's say sda, to host the OS, then
with 3 disks you are at a 33% chance of losing the box if a disk goes
bad - assuming that all disks have the same failure probability, of
course.
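
To make that arithmetic concrete, here is a rough sketch in Python. It
assumes exactly one disk fails, that every disk is equally likely to be
the one, and that the box is lost only when the failed disk holds the
OS. The function name is made up for illustration; the 3-disk figure
comes from the original question.

```python
from fractions import Fraction


def box_loss_given_one_failure(total_disks: int, os_disks: int) -> Fraction:
    """Chance that a single disk failure takes down the whole box.

    Assumptions (illustrative, not measured): exactly one disk fails,
    all disks are equally likely to fail, and the box dies only if the
    OS depends on the failed disk with no redundancy.
    """
    if total_disks <= 0 or not 0 <= os_disks <= total_disks:
        raise ValueError("need total_disks > 0 and 0 <= os_disks <= total_disks")
    return Fraction(os_disks, total_disks)


# OS partition striped (RAID 0) across all 3 disks: any failure is fatal.
striped = box_loss_given_one_failure(total_disks=3, os_disks=3)

# OS on sda only: only a failure of sda is fatal.
single = box_loss_given_one_failure(total_disks=3, os_disks=1)

print(f"striped: {float(striped):.0%}, single: {float(single):.0%}")
```

With more data disks per box, the single-OS-disk figure drops further,
since the OS disk is a smaller share of the disks that can fail.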
What we do is install the OS on disk1 (sda), then have 4 JBOD disks,
and I put our logs on disk1 as well. log4j is tricky because it can
cause issues on disk corruption or I/O error events, but I have seen
systems continue to operate even when log4j can't write to disk
because the disk is full.
There is almost no non-HDFS data; you can literally wedge it into
about 8 GB. The biggest things that are not HDFS data are logs, and
those can go on the HDFS partition. They tend to be low volume but can
add up over time, since the default is not to reap them.
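
Since nothing reaps the logs by default, one option is a small cleanup
script run from cron. The following is a minimal sketch, not anything
shipped with Hadoop: the function name, the 14-day cutoff, and the
"*.log.*" pattern (meant to match date-suffixed rolled log files) are
all assumptions to adjust for your own setup.

```python
import time
from pathlib import Path


def reap_old_logs(log_dir, max_age_days=14, pattern="*.log.*", dry_run=True):
    """Delete rolled log files older than max_age_days.

    Returns the list of files that were (or, with dry_run=True, would
    be) removed. The pattern is an assumption about how rolled logs are
    named; the active log file without a suffix does not match it.
    """
    cutoff = time.time() - max_age_days * 86400  # seconds per day
    reaped = []
    for path in sorted(Path(log_dir).glob(pattern)):
        # Compare last-modified time against the cutoff.
        if path.is_file() and path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            reaped.append(path)
    return reaped
```

Run it with dry_run=True first to see what would be deleted, then call
it with dry_run=False on whatever directory your logs actually live in.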
On Thu, Sep 30, 2010 at 4:17 PM, Daniel Einspanjer
<[email protected]> wrote:
Right now, most of our boxes have 3 disks in them. We take a small partition
on each of those and RAID-stripe them together to use as the OS partition,
then allocate the rest of each disk as JBOD for HDFS storage.
We are building out a new cluster and I'm wondering if there are any better
ideas for balancing the need for storage and speed of the HDFS disks with
having *some place* to put the OS and non-HDFS data.
What are other people doing about that?
-Daniel