Agreed that this is important. Looking at the current code, it seems to do the right thing (Helix ignores changes in connection state unless there's flapping behavior). I can write a test.
Kanak

________________________________
> Date: Wed, 26 Feb 2014 16:08:25 -0800
> Subject: Dependency on Zookeeper at runtime
> From: [email protected]
> To: [email protected]; [email protected]
>
> Nice article by Pinterest folks on Zookeeper as SPoF.
> http://engineering.pinterest.com/post/77933733851/zookeeper-resilience-at-pinterest
>
> Though I agree with the problems, I'm not sure I would go to the extent of
> having separate daemons; it introduces more fault points.
>
> However, with Helix we have designed the system to continue to work in
> the current state if Zookeeper crashes. At least I had that goal during
> the initial coding phase.
>
> Basically, the system should work as if nothing happened. The only
> compromise is that no more transitions can happen in the system while
> Zookeeper is down.
>
> Should we add an integration test to always guarantee this property? Is
> this valuable?
>
> thanks,
> Kishore G
