We have a box that's a bit overpowered for just running our namenode and jobtracker on a 10-node cluster, and, like you, we also wanted to make use of that node's storage and processor resources.
What we did is use LXC containers to segregate the different processes. LXC is a very lightweight pseudo-virtualization platform for Linux (near 0 overhead). The key benefit of LXC, in this case, is that we can use Linux cgroups (standard, simple config in LXC) to specify that the container running the namenode/jobtracker should get 10x the CPU and IO resources of the container that runs a tasktracker/datanode. Since LXC containers all run under the same kernel, any "unused" resources are still assigned to runnable processes, so the weights only matter when the containers compete.
We run Cloudera Hadoop and deployed a slightly modified tasktracker configuration on the shared box (fewer task slots, so as not to over-utilize memory). That tasktracker doesn't do as much work as the ones on the dedicated nodes, but it does a fair share, and the cgroup settings (cpu.shares & blkio.weight, for the curious) ensure that the bulk processing doesn't interfere with the critical namenode & jobtracker systems.

From: Robert Dyer [mailto:[email protected]]
Sent: Tuesday, May 14, 2013 11:23 PM
To: [email protected]
Subject: Re: About configuring cluster setup

You can; however, note that unless you also run a TaskTracker on that node (bad idea), any blocks that are replicated to this node won't be available as input to MapReduce jobs, and you are lowering the odds of having data locality on those blocks.

On Tue, May 14, 2013 at 2:01 AM, Ramya S <[email protected]> wrote:

Hi,
Can we configure 1 node as both Name node and Data node?
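
For anyone who wants to try the cgroup weighting described in the reply at the top of this thread, here is a rough sketch of how the 10:1 ratio might be expressed. It assumes cgroup v1 and the `lxc.cgroup.*` key style used in LXC container config files; the helper name `lxc_cgroup_lines` and the specific numbers are illustrative assumptions, not the actual values from our cluster.

```python
# Sketch only: renders cgroup v1 weight lines for an LXC container config.
# The helper name and the numbers below are hypothetical, chosen to show a
# 10:1 ratio between the master container and the worker container.

CPU_SHARES_DEFAULT = 1024                        # kernel default for cpu.shares
BLKIO_WEIGHT_MIN, BLKIO_WEIGHT_MAX = 10, 1000    # valid blkio.weight range (cgroup v1)


def lxc_cgroup_lines(cpu_shares, blkio_weight):
    """Return LXC config lines that set relative CPU and block-IO weights."""
    if cpu_shares < 2:
        raise ValueError("cpu.shares must be at least 2")
    if not BLKIO_WEIGHT_MIN <= blkio_weight <= BLKIO_WEIGHT_MAX:
        raise ValueError("blkio.weight must be between 10 and 1000")
    return [
        f"lxc.cgroup.cpu.shares = {cpu_shares}",
        f"lxc.cgroup.blkio.weight = {blkio_weight}",
    ]


# Container running the namenode/jobtracker: 10x the weight of the worker.
master_lines = lxc_cgroup_lines(10 * CPU_SHARES_DEFAULT, BLKIO_WEIGHT_MAX)

# Container running the tasktracker/datanode: baseline weight.
worker_lines = lxc_cgroup_lines(CPU_SHARES_DEFAULT, BLKIO_WEIGHT_MAX // 10)

if __name__ == "__main__":
    print("# master container config")
    print("\n".join(master_lines))
    print("# worker container config")
    print("\n".join(worker_lines))
```

Both settings are relative weights rather than hard caps, which matches the behaviour mentioned above: the worker container can still use idle CPU and IO, and the ratio only applies under contention.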
