@Michael: I have done some tests between RAID0, 1, JBOD and LVM on another server.
Results are here: http://www.spaggiari.org/index.php/hbase/hard-drives-performances

LVM and JBOD were close; that's why I mentioned LVM, since it seems to be
pretty close to JBOD performance-wise and can be done on any hardware, even
if the MB doesn't offer a RAID/JBOD option.

@Chris: I will have to test and see. For example, what if I add a drive to
an existing DataNode? Is it going to spread its existing data over the 2
drives? Or are they going to grow at the same speed? I will add one drive to
one server tomorrow and see the results... Then I will run some performance
tests and see...

2013/2/10, Michael Katzenellenbogen <[email protected]>:
> Are you able to create multiple RAID0 volumes? Perhaps you can expose
> each disk as its own RAID0 volume...
>
> Not sure why or where LVM comes into the picture here ... LVM is at
> the software layer and (hopefully) the RAID/JBOD stuff is at the
> hardware layer (and in the case of HDFS, LVM will only add unneeded
> overhead).
>
> -Michael
>
> On Feb 10, 2013, at 9:19 PM, Jean-Marc Spaggiari
> <[email protected]> wrote:
>
>> The issue is that my MB does not do JBOD :( I only have RAID
>> available, and I have been fighting for the last 48h and still
>> can't make it work... That's why I'm thinking about using
>> dfs.data.dir instead.
>>
>> I have 1 drive per node so far and need to move to 2 to reduce
>> WIO (I/O wait).
>>
>> What would be better with JBOD compared to dfs.data.dir? I have done
>> some tests of JBOD vs LVM and did not find any pros for JBOD so far.
>>
>> JM
>>
>> 2013/2/10, Michael Katzenellenbogen <[email protected]>:
>>> One thought comes to mind: disk failure. In the event a disk goes bad,
>>> then with RAID0, you just lost your entire array. With JBOD, you lost
>>> one disk.
>>>
>>> -Michael
>>>
>>> On Feb 10, 2013, at 8:58 PM, Jean-Marc Spaggiari
>>> <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a quick question regarding RAID0 performance vs multiple
>>>> dfs.data.dir entries.
>>>>
>>>> Let's say I have 2 x 2TB drives.
>>>>
>>>> I can configure them as 2 separate drives mounted on 2 folders and
>>>> assigned to Hadoop using dfs.data.dir. Or I can mount the 2 drives
>>>> with RAID0 and assign them as a single folder to dfs.data.dir.
>>>>
>>>> With RAID0, reads and writes are spread over the 2 disks, which
>>>> significantly increases speed. But if I put 2 entries in
>>>> dfs.data.dir, Hadoop is going to spread the data over those 2
>>>> directories too, so in the end the results should be the same, no?
>>>>
>>>> Any experience/advice/results to share?
>>>>
>>>> Thanks,
>>>>
>>>> JM
>>>
>
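
For reference, a minimal sketch of what the two-entry dfs.data.dir setup
discussed above could look like in hdfs-site.xml, assuming the two drives
are mounted at /data/1 and /data/2 (both mount points are hypothetical;
dfs.data.dir takes a comma-separated list of local directories, and the
DataNode spreads new block writes across them):

    <!-- hdfs-site.xml (sketch): one data directory per physical drive.
         /data/1 and /data/2 are assumed mount points, for illustration only. -->
    <property>
      <name>dfs.data.dir</name>
      <value>/data/1/dfs/data,/data/2/dfs/data</value>
    </property>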
