Hi Ted,

> b) EC2 interconnect has a lot more going on than in a dedicated VLAN. That
> can make the ZK servers appear a bit less connected. You have to plan for
> ConnectionLoss events.

Interesting.

> c) for highest reliability, I switched to large instances. On reflection, I
> think that was helpful, but less important than I thought at the time.

Besides the fact that there are more resources for ZooKeeper, this likely
helps as well because it reduces the number of systems competing for the
real hardware.

> d) increasing and decreasing cluster size is nearly painless and is easily
> scriptable. To decrease, do a rolling update on the survivors to update
(...)

Quite interesting indeed. I guess the work that Henry is pushing on these
couple of JIRA tickets will greatly facilitate this.

Do you mind if I ask you a couple of questions on this:

Do you have any kind of performance data about how much load ZK can take
under this environment?

Have you tried to put the log and snapshot files under EBS?

-- 
Gustavo Niemeyer
http://niemeyer.net