Re: need for more conditional write support
Ben,

Can you point me to the place where the queue is inspected to convert updates into idempotent writes?

On Mon, Dec 20, 2010 at 6:56 PM, Qian Ye yeqian@gmail.com wrote:

Hi all, have we reached any consensus on this issue? What's our next step? I'm looking forward to making use of this kind of feature. thanks~

On Fri, Dec 17, 2010 at 3:25 AM, Ted Dunning ted.dunn...@gmail.com wrote:

One alternative is to simply specify -1 in the version list to avoid the version check for that one item. That would allow the subset constraint to be retained as a valid semantic check for most situations and would allow a very explicit way to describe when you want to violate that constraint.

On Thu, Dec 16, 2010 at 10:06 AM, Dave Wright wrig...@gmail.com wrote:

I'm not sure why (other than your syntax) you would require the second list (to update) to be a subset of the first (to test). There are plenty of situations where you may want to update one node based on the value of another (and test that the value hasn't changed before updating) but don't really care about the second node, and it would just be extra overhead to check its current value. In fact, I think that was the OP's situation.

-Dave

On Thu, Dec 16, 2010 at 1:01 PM, Ted Dunning ted.dunn...@gmail.com wrote:

Yes. This is isomorphic to my suggestion to allow null data. We should toss around many options to figure out which is the most congenial idiom. Yours is nice since it has two sets of parallel lists. In Java, with optional arguments, it would be possible to use a builder style:

    zk.testVersions(node1, version1, node2, version2, ...)
      .updateData(node1, data1, node3, data3, ...)

I would tend to make it part of the contract that the nodes in the second part be a subset of the nodes in the first part. The first method would create an object packaging up the first set of args and the second method would do the work. Of course, this is just syntactic sugar for the more list-oriented version.

On Thu, Dec 16, 2010 at 8:16 AM, Dave Wright wrig...@gmail.com wrote:

My recommendation would actually be a combination of the two which offers the most flexibility:

    zoo_multi_test_and_set(List<string> znodesToTest, List<int> versions, List<string> znodesToSet, List<byte[]> data)

...this specifies a list of nodes and versions to check, and if the versions match, a list of nodes to set and the associated data. This allows multiple scenarios, including setting nodes other than the ones you are version checking, setting more nodes than you version check, checking more nodes than you set, etc. I don't think the implementation would be any harder than either of the others.

-Dave

On Wed, Dec 15, 2010 at 10:50 AM, Ted Dunning ted.dunn...@gmail.com wrote:

Well, I would just call the first method set. And I think that the second method is no easier to implement and probably a bit less useful. The idea that the second might be almost as useful as the first is interesting, however. It probably means that we should allow some of the data elements to be null or something, to allow for testing versions but not setting data.

On Tue, Dec 14, 2010 at 11:21 PM, Qian Ye yeqian@gmail.com wrote:

    zoo_multi_test_and_set(List<int> versions, List<string> znodes, List<byte[]> data)

can solve the problem I mentioned before, and some relevant issues, like being hard for programmers to use, as mentioned in the mail-archive thread, should be paid attention to. I think we can take a small step first, that is, provide an interface like

    zoo_multi_test_and_set(List<int> versions, List<string> znodes, byte[] data, string znode)

The API tests the versions of several different znodes before setting one znode, and if the client wants to set another znode, it can call this API repeatedly. Because we only set one node with this API, the result will be straightforward: success or failure. We need not take care of a half-success result. What do you guys think about this API?

--
With Regards!
Ye, Qian
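To make the proposals in this message concrete, here is a minimal in-memory sketch of Dave's four-list form, Ted's -1 wildcard, and the builder-style sugar on top. It is only a toy model, not ZooKeeper code: ToyZk, _PendingUpdate, VersionMismatch and ANY_VERSION are hypothetical names, and versions are assumed to start at 0 and increase by one per write.

```python
ANY_VERSION = -1  # Ted's suggestion: -1 means "skip the version check"


class VersionMismatch(Exception):
    pass


class ToyZk:
    """In-memory stand-in for a server: path -> [data, version]."""

    def __init__(self, initial):
        self.nodes = {path: [data, 0] for path, data in initial.items()}

    def multi_test_and_set(self, znodes_to_test, versions, znodes_to_set, data):
        # List-oriented form: verify every version first, then write.
        for path, expected in zip(znodes_to_test, versions):
            if expected != ANY_VERSION and self.nodes[path][1] != expected:
                raise VersionMismatch(path)
        for path in znodes_to_set:
            self.nodes[path]  # KeyError before any write if a target is missing
        for path, value in zip(znodes_to_set, data):
            self.nodes[path][0] = value
            self.nodes[path][1] += 1

    def test_versions(self, *node_version_pairs):
        # Builder form: package the checks; update_data() does the work.
        return _PendingUpdate(self, node_version_pairs[0::2], node_version_pairs[1::2])


class _PendingUpdate:
    def __init__(self, zk, paths, versions):
        self.zk, self.paths, self.versions = zk, paths, versions

    def update_data(self, *node_data_pairs):
        self.zk.multi_test_and_set(
            self.paths, self.versions, node_data_pairs[0::2], node_data_pairs[1::2]
        )
```

With this model, zk.test_versions("/node1", 0, "/node2", ANY_VERSION).update_data("/node", b"A") sets /node only if /node1 is still at version 0, and does not care about the version of /node2. Note that the set list here is not required to be a subset of the test list, which is Dave's variant rather than Ted's contract.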
Re: need for more conditional write support
Are you guys going to put a limit on the size of the updates? Can someone do an update over 50 znodes where each data value is 500K, for example? If there is a failure during the update, is it okay for just a subset of the znodes to be updated?

ben

On 12/20/2010 06:56 PM, Qian Ye wrote:

Hi all, have we reached any consensus on this issue? What's our next step? I'm looking forward to making use of this kind of feature. thanks~

On Fri, Dec 17, 2010 at 3:25 AM, Ted Dunning ted.dunn...@gmail.com wrote:

One alternative is to simply specify -1 in the version list to avoid the version check for that one item. That would allow the subset constraint to be retained as a valid semantic check for most situations and would allow a very explicit way to describe when you want to violate that constraint.

On Thu, Dec 16, 2010 at 10:06 AM, Dave Wright wrig...@gmail.com wrote:

I'm not sure why (other than your syntax) you would require the second list (to update) to be a subset of the first (to test). There are plenty of situations where you may want to update one node based on the value of another (and test that the value hasn't changed before updating) but don't really care about the second node, and it would just be extra overhead to check its current value. In fact, I think that was the OP's situation.

-Dave

On Thu, Dec 16, 2010 at 1:01 PM, Ted Dunning ted.dunn...@gmail.com wrote:

Yes. This is isomorphic to my suggestion to allow null data. We should toss around many options to figure out which is the most congenial idiom. Yours is nice since it has two sets of parallel lists. In Java, with optional arguments, it would be possible to use a builder style:

    zk.testVersions(node1, version1, node2, version2, ...)
      .updateData(node1, data1, node3, data3, ...)

I would tend to make it part of the contract that the nodes in the second part be a subset of the nodes in the first part. The first method would create an object packaging up the first set of args and the second method would do the work. Of course, this is just syntactic sugar for the more list-oriented version.

On Thu, Dec 16, 2010 at 8:16 AM, Dave Wright wrig...@gmail.com wrote:

My recommendation would actually be a combination of the two which offers the most flexibility:

    zoo_multi_test_and_set(List<string> znodesToTest, List<int> versions, List<string> znodesToSet, List<byte[]> data)

...this specifies a list of nodes and versions to check, and if the versions match, a list of nodes to set and the associated data. This allows multiple scenarios, including setting nodes other than the ones you are version checking, setting more nodes than you version check, checking more nodes than you set, etc. I don't think the implementation would be any harder than either of the others.

-Dave

On Wed, Dec 15, 2010 at 10:50 AM, Ted Dunning ted.dunn...@gmail.com wrote:

Well, I would just call the first method set. And I think that the second method is no easier to implement and probably a bit less useful. The idea that the second might be almost as useful as the first is interesting, however. It probably means that we should allow some of the data elements to be null or something, to allow for testing versions but not setting data.

On Tue, Dec 14, 2010 at 11:21 PM, Qian Ye yeqian@gmail.com wrote:

    zoo_multi_test_and_set(List<int> versions, List<string> znodes, List<byte[]> data)

can solve the problem I mentioned before, and some relevant issues, like being hard for programmers to use, as mentioned in the mail-archive thread, should be paid attention to. I think we can take a small step first, that is, provide an interface like

    zoo_multi_test_and_set(List<int> versions, List<string> znodes, byte[] data, string znode)

The API tests the versions of several different znodes before setting one znode, and if the client wants to set another znode, it can call this API repeatedly. Because we only set one node with this API, the result will be straightforward: success or failure. We need not take care of a half-success result. What do you guys think about this API?
Re: need for more conditional write support
My thought is that this should be handled by limiting the *total* size of all updates in one call the same way that *single* update sizes are limited now.

On Thu, Dec 16, 2010 at 11:04 AM, Henry Robinson he...@cloudera.com wrote:

This should be a cautionary note on performance, however: as there is no parallelism in the execution of updates (although there is plenty in the serialisation process), we should build a mechanism to constrain how much work this operation can perform; otherwise there's a danger of hurting throughput for all clients of a cluster.
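As a rough sketch of that idea, a batch could be rejected up front when its combined payload exceeds one cap. The names check_total_size, UpdateTooLarge and MAX_TOTAL_BYTES are hypothetical, and the 1 MiB value is only an assumed placeholder for whatever limit applies to a single update today.

```python
MAX_TOTAL_BYTES = 1024 * 1024  # assumed cap, standing in for the single-update limit


class UpdateTooLarge(Exception):
    pass


def check_total_size(updates, limit=MAX_TOTAL_BYTES):
    """Reject a batch whose combined payload exceeds `limit` bytes.

    `updates` is a list of (path, data) pairs; only the data bytes are
    counted here, which is a simplification of real request sizing.
    """
    total = sum(len(data) for _path, data in updates)
    if total > limit:
        raise UpdateTooLarge(f"{total} bytes exceeds limit of {limit}")
    return total
```

Under this assumed cap, Ben's example of 50 znodes at 500K each would be refused as one call, while the same updates could still be sent individually if each fits the single-update limit.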
Re: need for more conditional write support
I think the second is easier to implement because it only updates data on one node, and need not handle rollback in the case where some update fails after some other update has succeeded. Well, I agree that the API

    zoo_multi_test_and_set(List<int> versions, List<string> znodes_test, List<byte[]> data, List<string> znodes_set);

is more powerful. If we can decide to do this, I think we could start to discuss some implementation details.

On Wed, Dec 15, 2010 at 11:50 PM, Ted Dunning ted.dunn...@gmail.com wrote:

Well, I would just call the first method set. And I think that the second method is no easier to implement and probably a bit less useful. The idea that the second might be almost as useful as the first is interesting, however. It probably means that we should allow some of the data elements to be null or something, to allow for testing versions but not setting data.

On Tue, Dec 14, 2010 at 11:21 PM, Qian Ye yeqian@gmail.com wrote:

    zoo_multi_test_and_set(List<int> versions, List<string> znodes, List<byte[]> data)

can solve the problem I mentioned before, and some relevant issues, like being hard for programmers to use, as mentioned in the mail-archive thread, should be paid attention to. I think we can take a small step first, that is, provide an interface like

    zoo_multi_test_and_set(List<int> versions, List<string> znodes, byte[] data, string znode)

The API tests the versions of several different znodes before setting one znode, and if the client wants to set another znode, it can call this API repeatedly. Because we only set one node with this API, the result will be straightforward: success or failure. We need not take care of a half-success result. What do you guys think about this API?

--
With Regards!
Ye, Qian
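On the rollback concern: one way a multi-node set could avoid rollback altogether is to stage the writes and publish them only after every one has succeeded. The sketch below shows that idea on a plain dict; apply_all_or_nothing and max_node_bytes are hypothetical, and nothing here reflects how the server would actually implement it.

```python
def apply_all_or_nothing(store, updates, max_node_bytes=1024):
    """Apply (path, data) updates to a dict store, all or none.

    Writes go to a staged copy first; the live dict is only touched after
    every update has been accepted, so a failure midway needs no rollback.
    """
    staged = dict(store)
    for path, data in updates:
        if path not in staged:
            raise KeyError(path)
        if len(data) > max_node_bytes:
            raise ValueError(f"{path}: payload too large")
        staged[path] = data
    # Every update was accepted; publish the staged values.
    store.update(staged)
```

If the second update in a batch fails, the first one never becomes visible, so the caller still sees a plain success or failure and no half-success result.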
Re: need for more conditional write support
Hi Ted:

The solution you mentioned works in some situations, but not in mine. In the third step, after you check the condition on B and C, the values of B and C might still be modified before you update the value of A. The key point is that with the current ZK primitives, you cannot lock node A when you are updating node B.

A possible solution based on current ZK primitives for this scenario is to create an extra node for each data node to act as a lock, so the update on A can be protected by this kind of lock. However, this implementation will bring in much complexity, for example, how to prevent deadlock in some abnormal situations.

What's more, I think this kind of conditional write support is simpler than multiple transactions. Multiple transactions can be built with this kind of support.

The link is broken? http://www.mail-archive.com/zookeeper-...@hadoop.apache.org/msg08315.html

On Fri, Dec 10, 2010 at 7:33 AM, Ted Dunning ted.dunn...@gmail.com wrote:

Qian,

Depending on your situation, you can implement something like this now with the ZK primitives. In particular,

- get the current version v_a of A
- test the values of B and C
- if the condition on B and C is met, update A with required version v_a

You may want to retry the whole thing if you get an exception on the update of A. This does a safe test and set operation, but does not allow for the potential of atomically updating multiple znodes in one operation. A special case solution to that is to put all objects that may need to be updated together in the same znode content. That is clearly not a general solution, but it is often possible.

On Thu, Dec 9, 2010 at 4:19 AM, Qian Ye yeqian@gmail.com wrote:

Hi all:

I'm working on a distributed system these days, and need more conditional write support in ZooKeeper. Right now ZooKeeper only supports modifying (delete or set) node data with a version number representing the current version of the node. I need modification on the condition of other nodes. E.g., I want to set the node data of /node to A, if the node data of /node1 is B and the node data of /node2 is C. Should we support this kind of interface?

thanks

--
With Regards!
Ye, Qian
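The gap described in this message can be shown with a small in-memory model of Ted's three steps. The sketch is only illustrative: Store, BadVersion and set_a_if_b_and_c are hypothetical names, and the between_check_and_write hook is a test device that simulates another client acting between the check on B and C and the write to A.

```python
class BadVersion(Exception):
    pass


class Store:
    """Toy stand-in for ZooKeeper: path -> (data, version)."""

    def __init__(self, initial):
        self._nodes = {path: (data, 0) for path, data in initial.items()}

    def get(self, path):
        return self._nodes[path]  # (data, version)

    def set(self, path, data, expected_version):
        _, version = self._nodes[path]
        if expected_version != -1 and version != expected_version:
            raise BadVersion(path)
        self._nodes[path] = (data, version + 1)


def set_a_if_b_and_c(store, between_check_and_write=None):
    """Ted's three steps; the optional hook simulates another client."""
    _, v_a = store.get("/A")  # 1. read A's version
    # 2. test the values of B and C
    ok = store.get("/B")[0] == b"B" and store.get("/C")[0] == b"C"
    if between_check_and_write:
        between_check_and_write()  # another client may act here
    if ok:
        store.set("/A", b"new", v_a)  # 3. conditional write guarded by A's version only
    return ok
```

If the hook changes /B after step 2, the write in step 3 still succeeds because only A's version is checked, which is exactly the case the recipe cannot rule out. A change to /A in the same window is caught: the set raises BadVersion and the caller can retry.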
Re: need for more conditional write support
Hi Qian,

There have been discussions on multiple transactions in ZooKeeper:

http://www.mail-archive.com/zookeeper-...@hadoop.apache.org/msg08315.html

which I think might help in this case. There have been some discussions on this lately but not much progress. You can start a discussion on the list to see what folks feel about multiple transactions. I think if we are to support something like what you mention, it should be via multiple transactions, wherein a success would mean a success for all the transactions that are part of a multiple transaction.

Thanks
mahadev

On 12/9/10 4:19 AM, Qian Ye yeqian@gmail.com wrote:

Hi all:

I'm working on a distributed system these days, and need more conditional write support in ZooKeeper. Right now ZooKeeper only supports modifying (delete or set) node data with a version number representing the current version of the node. I need modification on the condition of other nodes. E.g., I want to set the node data of /node to A, if the node data of /node1 is B and the node data of /node2 is C. Should we support this kind of interface?

thanks

--
With Regards!
Ye, Qian