Sharing the cluster with HDFS and MapReduce might cause significant
problems. MapReduce is very I/O intensive, and this can cause a lot of
unnecessary hiccups in your cluster. If you really want to share the
nodes, I would suggest at least providing something like this:
- a reasonable amount of memory for the Java heap, say 400-500 MB
(depending on your usage)
- one dedicated disk, not used by MapReduce or the DataNodes, so that
ZooKeeper performance is somewhat predictable for you.
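To make that concrete, here is a minimal sketch of what the two
suggestions above might look like in practice. The mount point
/zk-disk and the 512 MB heap value are assumptions for illustration,
not recommendations for your hardware; the config keys themselves
(dataDir, dataLogDir) are standard zoo.cfg settings, and JVMFLAGS in
conf/java.env is picked up by zkServer.sh.

```
# conf/zoo.cfg -- point ZooKeeper's snapshot and transaction log
# directories at the dedicated disk (here assumed mounted at /zk-disk),
# away from the disks HDFS and MapReduce are writing to.
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/zk-disk/zookeeper/data
dataLogDir=/zk-disk/zookeeper/datalog

# conf/java.env -- cap the heap so ZooKeeper keeps its memory even
# when MR tasks on the same box are hungry (512m is an example figure).
JVMFLAGS="-Xmx512m"
```

The dataLogDir setting matters most here: ZooKeeper fsyncs its
transaction log on every write, so keeping it off the MR/DataNode
disks is what buys you the predictable latency.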
On 3/8/10 10:58 AM, "David Rosenstrauch" <dar...@darose.net> wrote:
> I'm contemplating an upcoming zookeeper rollout and was wondering what
> the zookeeper brain trust here thought about a network deployment question:
> Is it generally considered bad practice to just deploy zookeeper on our
> existing hdfs/MR nodes? Or is it better to run zookeeper instances on
> their own dedicated nodes?
> On the one hand, we're not going to be making heavy-duty use of
> zookeeper, so it might be sufficient for zookeeper nodes to share box
> resources with HDFS & MR. On the other hand, though, I don't want
> zookeeper to become unavailable if the nodes are running a resource
> intensive job that's hogging CPU or network.
> What's generally considered best practice for Zookeeper?