@Michael: I have done some tests between RAID0, 1, JBOD and LVM on another server.
Results are here: http://www.spaggiari.org/index.php/hbase/hard-drives-performances

LVM and JBOD were close; that's why I mentioned LVM, since it seems to be
pretty close to JBOD performance-wise and can be done on any hardware, even
if the MB doesn't offer a RAID/JBOD option.

@Chris: I will have to test and see. For example, what if I add a drive to
an existing DataNode? Is it going to spread its existing data over the 2
drives? Or are they going to grow at the same speed? I will add one drive to
one server tomorrow and see the results... Then I will run some performance
tests and see...

2013/2/10, Michael Katzenellenbogen <[email protected]>:
> Are you able to create multiple RAID0 volumes? Perhaps you can expose
> each disk as its own RAID0 volume...
>
> Not sure why or where LVM comes into the picture here ... LVM is at
> the software layer and (hopefully) the RAID/JBOD stuff is at the
> hardware layer (and in the case of HDFS, LVM will only add unneeded
> overhead).
>
> -Michael
>
> On Feb 10, 2013, at 9:19 PM, Jean-Marc Spaggiari
> <[email protected]> wrote:
>
>> The issue is that my MB does not do JBOD :( I only have RAID
>> available, and I have been fighting for the last 48h and still
>> can't make it work... That's why I'm thinking about using
>> dfs.data.dir instead.
>>
>> I have 1 drive per node so far and need to move to 2 to reduce
>> WIO (I/O wait).
>>
>> What would be better with JBOD compared to dfs.data.dir? I have done
>> some tests of JBOD vs LVM and did not find any pros for JBOD so far.
>>
>> JM
>>
>> 2013/2/10, Michael Katzenellenbogen <[email protected]>:
>>> One thought comes to mind: disk failure. In the event a disk goes bad,
>>> then with RAID0, you just lost your entire array. With JBOD, you lost
>>> one disk.
>>>
>>> -Michael
>>>
>>> On Feb 10, 2013, at 8:58 PM, Jean-Marc Spaggiari
>>> <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a quick question regarding RAID0 performance vs multiple
>>>> dfs.data.dir entries.
>>>>
>>>> Let's say I have 2 x 2TB drives.
>>>>
>>>> I can configure them as 2 separate drives mounted on 2 folders and
>>>> assigned to Hadoop using dfs.data.dir. Or I can mount the 2 drives
>>>> with RAID0 and assign them as a single folder to dfs.data.dir.
>>>>
>>>> With RAID0, reads and writes are spread over the 2 disks, which
>>>> significantly increases speed. But if I put 2 entries in
>>>> dfs.data.dir, Hadoop is going to spread the data over those 2
>>>> directories too, so in the end the results should be the same, no?
>>>>
>>>> Any experience/advice/results to share?
>>>>
>>>> Thanks,
>>>>
>>>> JM
>>>
>
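
For reference, a minimal sketch of what the two-entry dfs.data.dir setup
discussed above could look like in hdfs-site.xml, assuming the two drives
are mounted at /data/1 and /data/2 (both mount points are hypothetical;
dfs.data.dir takes a comma-separated list of local directories, and the
DataNode spreads new block writes across them):

    <!-- hdfs-site.xml (sketch): one data directory per physical drive.
         /data/1 and /data/2 are assumed mount points, for illustration only. -->
    <property>
      <name>dfs.data.dir</name>
      <value>/data/1/dfs/data,/data/2/dfs/data</value>
    </property>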
