One thought comes to mind: disk failure. If a disk goes bad with RAID0, you lose the entire array. With JBOD, you lose only that one disk.
-Michael

On Feb 10, 2013, at 8:58 PM, Jean-Marc Spaggiari <[email protected]> wrote:

> Hi,
>
> I have a quick question regarding RAID0 performance vs. multiple
> dfs.data.dir entries.
>
> Let's say I have 2 x 2TB drives.
>
> I can configure them as 2 separate drives mounted on 2 folders and
> assign them to Hadoop using dfs.data.dir. Or I can mount the 2 drives
> with RAID0 and assign them as a single folder to dfs.data.dir.
>
> With RAID0, the reads and writes are going to be spread over the 2
> disks. This significantly increases the speed. But if I put 2
> entries in dfs.data.dir, Hadoop is going to spread over those 2
> directories too, and in the end, the results should be the same, no?
>
> Any experience/advice/results to share?
>
> Thanks,
>
> JM
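
To put rough numbers on the disk-failure point for the 2 x 2TB setup in the question, here is a minimal sketch. The function name and the layout labels are made up for illustration, and it only counts raw local capacity on one node. It ignores HDFS replication, which would normally let the cluster re-replicate the lost blocks from other nodes.

```python
def capacity_lost_tb(disk_sizes_tb, failed_index, layout):
    """Raw capacity (in TB) that becomes unreadable when one disk fails.

    layout is a hypothetical label:
      "raid0" - all disks striped into one volume, one dfs.data.dir entry
      "jbod"  - each disk mounted separately, one dfs.data.dir entry per disk
    """
    if layout == "raid0":
        # Every stripe touches every member disk, so a single failure
        # takes out the whole array.
        return sum(disk_sizes_tb)
    if layout == "jbod":
        # Only the directory backed by the failed disk is affected.
        return disk_sizes_tb[failed_index]
    raise ValueError("unknown layout: %r" % layout)


disks = [2.0, 2.0]  # the 2 x 2TB drives from the question
for layout in ("raid0", "jbod"):
    print(layout, capacity_lost_tb(disks, 0, layout))
```

With RAID0 the node loses all 4 TB of local block storage on a single disk failure. With separate dfs.data.dir entries it loses 2 TB, so there is half as much data to re-replicate.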
