I think the standard advice is to use only one zk node for clusters of size < 10, and to collocate it with the namenode. So I would suggest changing your config to:

1 master + NN + ZK
1 client (doing heavy put & get)
6 RS + DN
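For concreteness, pointing HBase at a single ZooKeeper co-located with the master would look roughly like the hbase-site.xml fragment below. This is a sketch, not a drop-in config; the hostname "master" is an assumption, and 2181 is just the stock ZooKeeper client port.

```xml
<!-- hbase-site.xml fragment: single ZooKeeper on the master node.
     Hostname "master" is hypothetical; substitute your NN host. -->
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>master</value>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>
```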
The reason you want to have an odd number of zk nodes is that zookeeper uses a quorum protocol that requires a majority of the configured nodes to be operational (i.e. n/2 + 1). So if you have 6 zk nodes, you need 4 operational at any one time or the cluster goes down. If you have 5 zk nodes, you only need 3 available -- giving you an opportunity to perform maintenance on one and still be resilient to a single failure.

In general, the more zk nodes you have, the faster reads will be (as they can be distributed among any of the nodes), and the slower writes will be (as each write must be broadcast to the ensemble and acknowledged by a majority before it commits).

That being said, your cluster is so small that you don't have to worry so much about fault tolerance. A single zk node will serve you much better.

Dave

-----Original Message-----
From: Tao Xie [mailto:[email protected]]
Sent: Monday, September 13, 2010 8:09 PM
To: [email protected]
Subject: how about zookeeper overhead?

I see the following recommendation in http://hbase.apache.org/docs/r0.20.6/api/overview-summary.html#requirements

"It is recommended to run a ZooKeeper quorum of 3, 5 or 7 machines, and give each ZooKeeper server around 1GB of RAM, and if possible, its own dedicated disk. For very heavily loaded clusters, run ZooKeeper servers on separate machines from the Region Servers (DataNodes and TaskTrackers)."

Now my configuration is:

1 master + NN
1 client (doing heavy put & get)
6 RS + DN + ZK

If I start only one zk on the master node, I see throughput for put operations increase. I want to know what's the correct way to configure zk, and if I have only one zk, what are the impacts on put and get performance? Can the zk become a bottleneck? I heard someone say the read performance will be negatively affected. I haven't tested it yet. Thanks.
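Dave's quorum arithmetic above can be sanity-checked with a short Python sketch (an illustrative helper, not ZooKeeper code): a 5-node and a 6-node ensemble tolerate the same number of failures, which is why the sixth node buys you nothing.

```python
# Sketch of ZooKeeper quorum sizing: a majority of the configured
# ensemble must be up for the service to stay available.

def majority(n):
    """Smallest number of servers forming a majority of an n-node ensemble."""
    return n // 2 + 1

def tolerated_failures(n):
    """Servers that can fail while a majority remains operational."""
    return n - majority(n)

for n in (1, 3, 5, 6, 7):
    print(f"{n} nodes: need {majority(n)} up, "
          f"tolerate {tolerated_failures(n)} failures")
```

Note that 6 nodes need 4 up while 5 nodes need only 3 up, yet both tolerate exactly 2 failures -- hence the advice to run odd-sized ensembles.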
