[Moving to dev] Although I'm in total agreement with the idea of "no complexity until it's necessary", I don't see a really strong technical reason not to include this primitive. It's very similar to the multi-get style API that, say, memcache gives you.
zoo_multi_test_and_set(List<int> versions, List<string> znodes, List<byte[]> data) would be an example API, and it seems to me that it could be implemented in much the same way as a single set_data call. I definitely don't support any kind of multiple-call API (like transactions), because it doesn't fit ZooKeeper's one-method-call = one-linearization-point model.

For those who haven't read it, I really recommend the Sinfonia paper from SOSP '07 (http://www.hpl.hp.com/personal/Mehul_Shah/papers/sosp_2007_aguilera.pdf) for a nice implementation of these kinds of ideas.

A supporting argument is this: if this *is* very hard to implement currently, I think we could expend some effort to make it easier. Further decoupling operations on the data tree from voting on them (and also decoupling session management from data tree updates) would be a worthwhile cleanup for 3.4.0. It would be really cool to be able to put a different storage engine behind ZK (I can think of many examples!) with a minimum of effort. At the same time, there are some API calls I might find useful (get minimum sequential node, for example) whose prototyping and implementation would be made easier.

cheers,
Henry

On 30 March 2010 13:00, Benjamin Reed <br...@yahoo-inc.com> wrote:
> i agree with ted. i think he points out some disadvantages with trying to
> do more. there is a slippery slope with these kinds of things. the
> implementation is complicated enough even with the simple model that we use.
>
> ben
>
>
> On 03/29/2010 08:34 PM, Ted Dunning wrote:
>
>> I perhaps should not have said power, except insofar as ZK's strengths
>> are in reliability, which derives from simplicity.
>>
>> There are essentially two common ways to implement multi-node update.
>> The first is the traditional DB style, with begin-transaction paired
>> with either a commit or a rollback after some number of updates. This is
>> clearly unacceptable in the ZK world if the updates are sent to the
>> server, because there can be an indefinite delay between the begin and
>> the commit.
>>
>> A second approach is to buffer all of the updates on the client side and
>> transmit them in a batch to the server, to succeed or fail as a group.
>> This allows updates to be arbitrarily complex, which begins to eat away
>> at the "no-blocking" guarantee a bit.
>>
>> On Mon, Mar 29, 2010 at 8:08 PM, Henry Robinson <he...@cloudera.com> wrote:
>>
>>> Could you say a bit about how you feel ZK would sacrifice power and
>>> reliability through multi-node updates? My view is that it wouldn't:
>>> since all operations are executed serially, there's no concurrency to
>>> be lost by allowing multi-updates, and there doesn't need to be a
>>> 'start / end' transactional-style interface (which I do believe would
>>> be very bad).
>>>
>>> I could see ZK implementing a Sinfonia-style batch operation API which
>>> makes all-or-none updates. The reason I can see that it doesn't already
>>> allow this is the avowed intent of the original ZK team to keep the API
>>> as simple as it reasonably can be, and to not introduce complexity
>>> without need.
>>>

--
Henry Robinson
Software Engineer
Cloudera
415-994-6679
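[Editor's note] The all-or-nothing, single-linearization-point semantics under discussion can be sketched with a toy in-memory model. Everything here is hypothetical illustration (the `VersionedStore` class and `multi_test_and_set` method are invented names, not ZooKeeper's actual client API): the whole batch is version-checked first, then applied in full or rejected in full, as one atomic step.

```python
# Toy model of the proposed all-or-nothing multi test-and-set.
# Hypothetical names; a sketch of the semantics, not ZooKeeper's real API.

class VersionedStore:
    def __init__(self):
        self._data = {}  # path -> (data bytes, version)

    def create(self, path, data):
        self._data[path] = (data, 0)

    def get(self, path):
        return self._data[path]  # (data, version)

    def multi_test_and_set(self, ops):
        """ops: list of (path, expected_version, new_data).

        Applies every update or none of them. Because the whole call is
        processed as one step, it is a single linearization point, just
        like an individual set_data call.
        """
        # Phase 1: check every expected version before touching anything.
        for path, expected, _ in ops:
            if path not in self._data or self._data[path][1] != expected:
                return False  # reject the entire batch
        # Phase 2: all checks passed; apply every update and bump versions.
        for path, expected, new_data in ops:
            self._data[path] = (new_data, expected + 1)
        return True


store = VersionedStore()
store.create("/a", b"1")
store.create("/b", b"2")

# Both version checks pass, so both writes are applied.
ok = store.multi_test_and_set([("/a", 0, b"10"), ("/b", 0, b"20")])

# Stale version on /b, so neither write is applied.
failed = store.multi_test_and_set([("/a", 1, b"x"), ("/b", 99, b"y")])
```

Note how the failed batch leaves /a untouched even though /a's own version check would have passed: that is the all-or-none property, without any begin/commit window held open by the client.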