I have used both 5 and 3 servers in different clusters. A moderate amount of sharing is
reasonable, but sharing with less intensive applications is definitely
better. Sharing with the job tracker, for instance, is likely fine since it
doesn't abuse disk so much. The namenode is similar, but not quite as
nice. Sharing with task-only nodes is better than sharing with data nodes.
If your Hadoop cluster is 10 machines, dedicated hosts are probably pretty serious
overhead. If it is 200 machines, it is much less so.
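As a rough sketch of the arithmetic behind those two cases (my own illustration, not anything from ZooKeeper itself): the function names are made up, overhead is assumed to mean dedicated ZK hosts divided by the Hadoop cluster size, and the failure count relies on ZK needing a strict majority of servers to stay up.

```python
def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can fail while a strict majority is still running."""
    return (ensemble_size - 1) // 2


def dedicated_overhead(ensemble_size: int, cluster_size: int) -> float:
    """Dedicated ZK hosts as a fraction of the Hadoop cluster size (assumed definition)."""
    return ensemble_size / cluster_size


if __name__ == "__main__":
    for zk_hosts in (3, 5):
        for cluster in (10, 200):
            print(
                f"{zk_hosts} ZK hosts, {cluster}-machine cluster: "
                f"tolerates {tolerated_failures(zk_hosts)} failure(s), "
                f"overhead {dedicated_overhead(zk_hosts, cluster):.1%}"
            )
```

With 5 dedicated hosts that works out to 50% of a 10-machine cluster but only 2.5% of a 200-machine one, which is why sharing looks more attractive on small clusters.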
If you are running in EC2, then spawning 3 extra small instances is not a
For the record, we share our production ZK machines with other tasks, but
not with map-reduce related tasks and not with our production search
On Mon, Mar 8, 2010 at 11:21 AM, Patrick Hunt <ph...@apache.org> wrote:
> Best practice for "on-line production serving" is 5 dedicated hosts with
> "shared nothing", physically distributed throughout the data center (5 hosts
> in a rack might not be the best idea for super reliability). There's a lot of
> leeway though; many people run with 3 and a SPOF on the switch, for example.