Apologies for the plug, but using MapR FS would help you a lot here.  The
trick is that you can run an NFS server on every node and mount that
server locally via localhost.
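
For example (just a sketch; the mount point /mapr/my.cluster.com and the
paths under it are made up for illustration), once the loopback NFS mount
is in place, ordinary Java file I/O works directly against the cluster:

    // Assumes the local NFS gateway is mounted at /mapr/my.cluster.com (hypothetical).
    // Plain java.nio file operations then read and write cluster data directly.
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;

    public class PosixOnCluster {
        public static void main(String[] args) throws Exception {
            Path src = Paths.get("/mapr/my.cluster.com/user/matt/input.csv");
            Path dst = Paths.get("/mapr/my.cluster.com/user/matt/input-copy.csv");
            // No HDFS client involved; this is ordinary POSIX-style I/O.
            Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
        }
    }

The same files stay reachable through the normal HDFS API as well
(maprfs:///user/matt/...), so existing Hadoop jobs don't have to change.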

The benefits are:

1) the entire cluster appears as a conventional POSIX-style file system in
addition to being available via the HDFS APIs.

2) you get very high I/O speeds

3) you get real snapshots and mirrors if you need them

4) you get the use of the HBase API without having to run HBase.  Tables
are integrated directly into MapR FS (see the sketch after this list).

5) programs that need to exceed local disk size can do so

6) data can be isolated to single machines if you want.

7) you can get it for free or pay for support
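
To make point 4 concrete, here's a rough sketch using the standard HBase
client API, assuming MapR's patched HBase client jars are on the classpath
so the table "name" can be a file-system path (the path
/user/matt/testtable is made up for illustration):

    // Standard HBase 1.x client calls; only the table name being a
    // file-system path is MapR-specific.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class TableWithoutHBase {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("/user/matt/testtable"))) {
                Put put = new Put(Bytes.toBytes("row1"));
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
                // The write goes straight to the table in MapR FS; no HBase daemons are running.
                table.put(put);
            }
        }
    }

No RegionServers or HBase master are needed; the table lives in the file
system alongside regular files.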


The downsides are:

1) it isn't HDFS.

2) the data platform isn't Apache-licensed (all of the ecosystem code is
unchanged with respect to licensing)

On Thu, May 28, 2015 at 9:37 AM, Matt <[email protected]> wrote:

> I know I can / should assign individual disks to HDFS, but as a test
> cluster there are apps that expect data volumes to work on. A dedicated
> Hadoop production cluster would have a disk layout specific to the task.
