Hi, I have a quick question regarding RAID0 performance vs. multiple dfs.data.dir entries.
Let's say I have 2 x 2TB drives. I can configure them as 2 separate drives mounted on 2 folders and assign both to Hadoop using dfs.data.dir. Or I can combine the 2 drives with RAID0 and assign them as a single folder to dfs.data.dir. With RAID0, the reads and writes are going to be spread over the 2 disks, which significantly increases the speed. But if I put 2 entries in dfs.data.dir, Hadoop is going to spread its data over those 2 directories too, so in the end the results should be the same, no? Any experience/advice/results to share? Thanks, JM
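P.S. To make the two setups concrete, this is roughly what I mean in hdfs-site.xml. The mount points (/mnt/disk1, /mnt/disk2, /mnt/raid0) are placeholders for illustration, not my actual paths, and only one of the two property blocks would be present in a real file.

```xml
<!-- Option 1: two separate mounts, given as a comma-separated list -->
<!-- (paths are hypothetical) -->
<property>
  <name>dfs.data.dir</name>
  <value>/mnt/disk1/hdfs/data,/mnt/disk2/hdfs/data</value>
</property>

<!-- Option 2: both drives striped into one RAID0 volume, single entry -->
<!-- (path is hypothetical) -->
<property>
  <name>dfs.data.dir</name>
  <value>/mnt/raid0/hdfs/data</value>
</property>
```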
