Recommendations for zookeeper deployment

Mekaraj, Prashant Tue, 12 Jan 2010 10:38:41 -0800

Hi,

http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html is a great 
resource. It's rare to see an open source project think so much about practical 
enterprise deployment and this is much appreciated.


There are a few more recommendations that I think would be useful to add to the 
page. 

1. dataDir size: Since the dataDir stores snapshots and you recommend storing 
at least 3 snapshots, I am thinking of using 3 times the size of the heap 
allocated to the process as a guideline for how big the dataDir drive should be.
2. dataLogDir size: Since a new log file is started every time a snapshot is 
taken, and using 3 snapshots as a recommendation, I am thinking of using the 
same guideline here: 3 times the size of the heap.
3. Persistence of data and log directories: 
https://issues.apache.org/jira/browse/ZOOKEEPER-546 implies that there are 
cases where all zk data is loaded from a different configuration store. In 
such cases, it seems I would be fine even if I use a disk that is cleaned 
regularly (on reboots or rebuilds). 
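
To make the arithmetic in items 1 and 2 concrete, here is a minimal sketch of the guideline. The function name and the 4 GiB heap are illustrative assumptions of mine, not anything from the ZooKeeper docs, and the result is only a rough upper bound based on the idea that a snapshot cannot be much larger than the heap.

```python
GIB = 1024 ** 3

def disk_guideline_bytes(heap_bytes, snapshots_retained=3):
    """Rough guideline: budget one heap-sized file per retained snapshot.

    Hypothetical helper for illustration; it encodes the "3 times the heap"
    rule of thumb from items 1 and 2, not a measured requirement.
    """
    if heap_bytes <= 0 or snapshots_retained <= 0:
        raise ValueError("heap_bytes and snapshots_retained must be positive")
    return heap_bytes * snapshots_retained

# Assumed example: a server JVM started with a 4 GiB heap.
heap = 4 * GIB
data_dir = disk_guideline_bytes(heap)       # snapshots (dataDir)
data_log_dir = disk_guideline_bytes(heap)   # transaction logs (dataLogDir)
print(f"dataDir: {data_dir / GIB:.0f} GiB, dataLogDir: {data_log_dir / GIB:.0f} GiB")
```

With a 4 GiB heap this gives 12 GiB for each directory.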

Also, if a zk server were to be added to an existing ensemble (for example 
when the machine reboots) and the data and datalog directories are empty, it 
seems to me that the server would sync with the leader and build its log and 
snapshots again, although there would be a performance hit on the entire 
ensemble while this is taking place. Is this correct?

Thanks again
-prashant



