Apologies for the plug, but using MapR FS would help you a lot here. The trick is that you can run an NFS server on every node and mount that server via localhost on each node.
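As a rough sketch of what that localhost mount buys you (the mount command, cluster name, and paths below are illustrative assumptions, not an exact recipe for your setup), ordinary POSIX I/O then works against cluster storage with no HDFS client in the picture:

    import shutil

    # Assumption: each node has mounted its local NFS gateway with something
    # like:  mount -o nolock localhost:/mapr /mapr   (path is illustrative)
    CLUSTER_ROOT = "/mapr/my.cluster.com"  # hypothetical cluster name

    # Plain POSIX file I/O writes into cluster storage, not the local disk.
    with open(f"{CLUSTER_ROOT}/user/matt/results.csv", "w") as f:
        f.write("id,value\n1,42\n")

    # Any tool or library that expects a local file system path works the
    # same way, so apps that "expect data volumes" need no changes.
    shutil.copy("/etc/hostname", f"{CLUSTER_ROOT}/user/matt/hostname.txt")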
The benefits are:

1) the entire cluster appears as a conventional POSIX-style file system, in addition to being available via the HDFS APIs
2) you get very high I/O speeds
3) you get real snapshots and mirrors if you need them
4) you get the use of the HBase API without having to run HBase; tables are integrated directly into MapR FS
5) programs that need to exceed local disk size can do so
6) data can be isolated to single machines if you want
7) you can get it for free or pay for support

The downsides are:

1) it isn't HDFS
2) the data platform isn't Apache licensed (all of the ecosystem code is unchanged with respect to licensing)

On Thu, May 28, 2015 at 9:37 AM, Matt <[email protected]> wrote:

> I know I can / should assign individual disks to HDFS, but as a test
> cluster there are apps that expect data volumes to work on. A dedicated
> Hadoop production cluster would have a disk layout specific to the task.
