Re: need for more conditional write support

2010-12-20 Thread Ted Dunning
Ben,

Can you point me to the place where the queue is inspected to convert
updates into idempotent writes?

On Mon, Dec 20, 2010 at 6:56 PM, Qian Ye yeqian@gmail.com wrote:

 Hi all, have we reached any consensus on this issue? What's our next step
 about it?
 I'm looking forward to making use of this kind of feature.

 thanks~

 On Fri, Dec 17, 2010 at 3:25 AM, Ted Dunning ted.dunn...@gmail.com
 wrote:

  One alternative is to simply specify -1 in the version list to avoid the
  version check for that one item.  That would allow
  the subset constraint to be retained as a valid semantic check for most
  situations and would allow
  a very explicit way to describe when you want to violate that constraint.
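
A minimal sketch of the -1 convention described above, written in Python against an in-memory stand-in for the server. The names `ANY_VERSION` and `versions_match` are hypothetical and exist only for illustration; they are not part of any ZooKeeper client.

```python
ANY_VERSION = -1  # hypothetical name for the "skip this check" marker


def versions_match(store, expected):
    """Return True if every (path, version) expectation holds.

    store maps path -> (data, version); expected maps path -> version.
    An expected version of ANY_VERSION skips the check for that path.
    """
    for path, version in expected.items():
        if version == ANY_VERSION:
            continue  # explicit opt-out of the version check for this node
        if path not in store or store[path][1] != version:
            return False
    return True


store = {"/a": (b"x", 3), "/b": (b"y", 7)}
checked = versions_match(store, {"/a": 3, "/b": ANY_VERSION})
```

Here `/b` appears in the test list, so a subset rule would still be satisfied, but its version is never compared.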
 
  On Thu, Dec 16, 2010 at 10:06 AM, Dave Wright wrig...@gmail.com wrote:
 
   I'm not sure why (other than your syntax) you would require the second
   list (to update) to be a subset of the first (to test). There are
   plenty of situations where you may want to update one node based on
   the value of another (and test that the value hasn't changed before
   updating) but don't really care about the second node, and it would
   just be extra overhead to check its current value. In fact, I think
   that was the OP's situation.
  
   -Dave
  
   On Thu, Dec 16, 2010 at 1:01 PM, Ted Dunning ted.dunn...@gmail.com
   wrote:
    Yes.  This is isomorphic to my suggestion to allow null data.  We should
    toss around many options to figure out which is the most congenial idiom.
    Yours is nice since it has two sets of parallel lists.

    In Java with variable-length arguments it would be possible to use a
    builder style:

      zk.testVersions(node1, version1, node2, version2, ...)
        .updateData(node1, data1, node3, data3, ...)

    I would tend to make it part of the contract that the nodes in the second
    part be a subset of the nodes in the first part.  The first method would
    create an object packaging up the first set of args and the second method
    would do the work.  Of course, this is just syntactic sugar for the more
    list-oriented version.
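
The builder idea above might look roughly like this. It is a sketch in Python rather than Java, run against an in-memory dict instead of a server; `check_versions`, `VersionCheck` and `update_data` are hypothetical names chosen for the example, and the subset rule is enforced exactly as stated in the message.

```python
class VersionCheck:
    """Holds the node -> version expectations collected by check_versions()."""

    def __init__(self, store, checks):
        self._store = store
        self._checks = checks

    def update_data(self, *pairs):
        """Apply alternating node, data arguments if every check passes."""
        updates = dict(zip(pairs[0::2], pairs[1::2]))
        # Contract from the message: updated nodes are a subset of tested nodes.
        if not set(updates) <= set(self._checks):
            raise ValueError("updated nodes must be a subset of tested nodes")
        for node, version in self._checks.items():
            if node not in self._store or self._store[node][1] != version:
                return False  # nothing is written on a mismatch
        for node, data in updates.items():
            self._store[node] = (data, self._store[node][1] + 1)
        return True


def check_versions(store, *pairs):
    """Package alternating node, version arguments for a later update."""
    return VersionCheck(store, dict(zip(pairs[0::2], pairs[1::2])))


store = {"/n1": (b"a", 1), "/n2": (b"b", 4)}
ok = check_versions(store, "/n1", 1, "/n2", 4).update_data("/n1", b"new")
```

The first call only packages its arguments; all of the work happens in the second call, which matches the split described in the message.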
   
    On Thu, Dec 16, 2010 at 8:16 AM, Dave Wright wrig...@gmail.com wrote:

     My recommendation would actually be a combination of the two which
     offers the most flexibility:

     zoo_multi_test_and_set(List<string> znodesToTest, List<int> versions,
     List<string> znodesToSet, List<byte[]> data)

     ...this specifies a list of nodes and versions to check, and if the
     versions match, a list of nodes to set and the associated data.
     This allows multiple scenarios, including setting nodes other than the
     ones you are version checking, setting more nodes than you version
     check, checking more nodes than you set, etc.
     I don't think the implementation would be any harder than either of the
     others.

     -Dave
   
   
     On Wed, Dec 15, 2010 at 10:50 AM, Ted Dunning ted.dunn...@gmail.com wrote:

      Well, I would just call the first method set.

      And I think that the second method is no easier to implement and probably
      a bit less useful.

      The idea that the second might be almost as useful as the first is
      interesting, however.  It probably means that we should allow some of the
      data elements to be null or something to allow for testing versions but
      not setting data.

      On Tue, Dec 14, 2010 at 11:21 PM, Qian Ye yeqian@gmail.com wrote:

       zoo_multi_test_and_set(List<int> versions, List<string> znodes,
       List<byte[]> data)

       can solve the problem I mentioned before, but some relevant issues,
       like being hard for programmers to use, as mentioned in the mail
       archive, should be paid attention to. I think we can take a small step
       first, that is, provide an interface like

       zoo_multi_test_and_set(List<int> versions, List<string> znodes,
       byte[] data, string znode)

       This API tests the versions of several different znodes before setting
       one znode, and if the client wants to set another znode, it can call
       this API repeatedly. Because we only set one node with this API, the
       result will be straightforward: success or failure. We need not take
       care of a half-success result.

       What do you guys think about this API?


   
   
  
 



 --
 With Regards!

 Ye, Qian



Re: need for more conditional write support

2010-12-20 Thread Benjamin Reed
are you guys going to put a limit on the size of the updates? can 
someone do an update over 50 znodes where data value is 500K, for example?


if there is a failure during the update, is it okay for just a subset of 
the znodes to be updated?


ben

On 12/20/2010 06:56 PM, Qian Ye wrote:

Hi all, have we reached any consensus on this issue? What's our next step
about it?
I'm looking forward to making use of this kind of feature.

thanks~


Re: need for more conditional write support

2010-12-16 Thread Ted Dunning
My thought is that this should be handled by limiting the *total* size of
all updates in one call in the same way that *single* update sizes are
limited now.
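
A sketch of that total-size rule, assuming a 1 MiB cap purely for illustration (the actual single-update limit is a server setting and is not stated in this thread). `MAX_UPDATE_BYTES` and `within_size_limit` are hypothetical names.

```python
MAX_UPDATE_BYTES = 1024 * 1024  # assumed cap for the example, not a quoted value


def within_size_limit(updates, limit=MAX_UPDATE_BYTES):
    """Return True if the combined payload of all updates fits the limit.

    updates is a list of (path, data) pairs, where data is bytes.
    """
    total = sum(len(data) for _, data in updates)
    return total <= limit


# The case raised elsewhere in this thread: 50 znodes of roughly 500K each.
payload = b"x" * 500_000
big_batch = [("/n%d" % i, payload) for i in range(50)]
small_batch = [("/a", b"hello"), ("/b", b"world")]
```

Under this rule the 50-znode batch is rejected as a whole, while each of its updates would have been accepted on its own.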

On Thu, Dec 16, 2010 at 11:04 AM, Henry Robinson he...@cloudera.com wrote:

 This should be a cautionary note on performance, however: as there is no
 parallelism in the execution of updates (although there is plenty in the
 serialisation process) we should build a mechanism to constrain how much
 work this operation can perform, otherwise there's a danger of hurting
 throughput for all clients of a cluster.



Re: need for more conditional write support

2010-12-15 Thread Qian Ye
I think the second is easier to implement because it only updates data on
one node, and need not handle rollback when one update fails after others
have succeeded.

Well, I agree that the API

zoo_multi_test_and_set(List<int> versions, List<string> znodes_test,
List<byte[]> data, List<string> znodes_set);

is more powerful. If we can decide to do this, I think we could start to
discuss some implementation details.
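
As a starting point for that discussion, here is a minimal single-process sketch of the four-list call in Python. It models znodes as a dict of path -> (data, version) and is not server code; the function body and the `store` argument are assumptions made for illustration. Validating everything before writing anything is one way to avoid the rollback problem mentioned above.

```python
def multi_test_and_set(store, versions, znodes_test, data, znodes_set):
    """All-or-nothing update of an in-memory store.

    Every znodes_test[i] must currently be at versions[i]; only then is
    every znodes_set[i] set to data[i]. Returns True if the writes were
    applied and False if nothing was changed.
    """
    if len(versions) != len(znodes_test) or len(data) != len(znodes_set):
        raise ValueError("parallel lists must have equal lengths")
    # Phase 1: validate every precondition before touching any node.
    for path, version in zip(znodes_test, versions):
        if path not in store or store[path][1] != version:
            return False
    if any(path not in store for path in znodes_set):
        return False
    # Phase 2: apply all writes; nothing can fail here, so no rollback.
    for path, value in zip(znodes_set, data):
        store[path] = (value, store[path][1] + 1)
    return True


store = {"/node": (b"old", 0), "/node1": (b"B", 2), "/node2": (b"C", 5)}
applied = multi_test_and_set(store, [2, 5], ["/node1", "/node2"],
                             [b"A"], ["/node"])
```

In a real server the two phases would have to run without any other update interleaving between them; the sketch gets that for free only because it is single-threaded.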


On Wed, Dec 15, 2010 at 11:50 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 Well, I would just call the first method set.

 And I think that the second method is no easier to implement and probably a
 bit less useful.

 The idea that the second might be almost as useful as the first is
 interesting, however.  It probably means that we should allow some of the
 data elements to be null or something to allow for testing versions but
 not setting data.

 On Tue, Dec 14, 2010 at 11:21 PM, Qian Ye yeqian@gmail.com wrote:

  zoo_multi_test_and_set(List<int> versions, List<string> znodes,
  List<byte[]> data)

  can solve the problem I mentioned before, but some relevant issues, like
  being hard for programmers to use, as mentioned in the mail archive,
  should be paid attention to. I think we can take a small step first, that
  is, provide an interface like

  zoo_multi_test_and_set(List<int> versions, List<string> znodes, byte[]
  data, string znode)

  This API tests the versions of several different znodes before setting one
  znode, and if the client wants to set another znode, it can call this API
  repeatedly. Because we only set one node with this API, the result will be
  straightforward: success or failure. We need not take care of a
  half-success result.

  What do you guys think about this API?
 




-- 
With Regards!

Ye, Qian


Re: need for more conditional write support

2010-12-10 Thread Qian Ye
Hi Ted:

The solution you mentioned works in some situations, but not in mine. In the
third step, after you check the condition on B and C, the values of B and C
might still be modified before you update the value of A. The key point is
that with the current ZK primitives, you cannot lock node A while you are
updating node B.

A possible solution for this scenario based on current ZK primitives is to
create an extra node for each data node to act as a lock, so that the update
on A can be protected by this kind of lock. However, this implementation
brings in a lot of complexity, for example, how to prevent deadlock in
abnormal situations.

What's more, I think this kind of conditional write support is simpler than
multiple transactions. Multiple transactions can be built with this kind of
support. Is the link broken?
http://www.mail-archive.com/zookeeper-...@hadoop.apache.org/msg08315.html


On Fri, Dec 10, 2010 at 7:33 AM, Ted Dunning ted.dunn...@gmail.com wrote:

 Qian,

 Depending on your situation, you can implement something like this now with
 the ZK primitives.

 In particular,

- get the current version v_a of A
- test the values of B and C
- if the condition on B and C is met, update A with required version v_a

 You may want to retry the whole thing if you get an exception on the update
 of A.

 This does a safe test and set operation, but does not allow for the
 potential of atomically updating multiple znodes in one operation.  A
 special case solution to that is to put all objects that may need to be
 updated together in the same znode content.  That is clearly not a general
 solution, but it is often possible.
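
The three steps above, with the suggested retry, can be sketched in Python as follows. `Store`, `BadVersion` and `set_if` are hypothetical names for an in-memory model that offers only single-node reads and version-checked writes; they are not a real client API. As the reply above points out, the version check guards only A: B and C can still change between step 2 and step 3.

```python
class BadVersion(Exception):
    """Raised when a conditional set sees an unexpected version."""


class Store:
    """In-memory stand-in with single-node get and version-checked set."""

    def __init__(self, nodes):
        self._nodes = dict(nodes)  # path -> (data, version)

    def get(self, path):
        return self._nodes[path]

    def set(self, path, data, version):
        current = self._nodes[path][1]
        if version != current:
            raise BadVersion(path)
        self._nodes[path] = (data, current + 1)


def set_if(store, target, new_data, condition, retries=3):
    """Set target to new_data if condition(store) holds, retrying on conflict."""
    for _ in range(retries):
        _, v_a = store.get(target)            # step 1: read the version of A
        if not condition(store):              # step 2: test B and C
            return False
        try:
            store.set(target, new_data, v_a)  # step 3: versioned update of A
            return True
        except BadVersion:
            continue                          # A changed underneath us; retry
    return False


store = Store({"/node": (b"old", 0), "/node1": (b"B", 0), "/node2": (b"C", 0)})


def b_and_c_match(s):
    return s.get("/node1")[0] == b"B" and s.get("/node2")[0] == b"C"


updated = set_if(store, "/node", b"A", b_and_c_match)
```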

 On Thu, Dec 9, 2010 at 4:19 AM, Qian Ye yeqian@gmail.com wrote:

  Hi all:
 
  I'm working on a distributed system these days, and need more conditional
  write support in ZooKeeper. Currently ZooKeeper only supports modifying
  node data (delete or set) conditioned on a version number representing the
  current version of that node. I need modification conditioned on other
  nodes. For example, I want to set the node data of /node to A if the node
  data of /node1 is B and the node data of /node2 is C. Should we support
  this kind of interface?
 
  thanks
  --
  With Regards!
 
  Ye, Qian
 




-- 
With Regards!

Ye, Qian


Re: need for more conditional write support

2010-12-09 Thread Mahadev Konar

Hi Qian,
   There have been discussions on multiple transaction in zookeeper:

 http://www.mail-archive.com/zookeeper-...@hadoop.apache.org/msg08315.html

Which I think might help this case. There have been some discussions on this
lately but not much progress. You can start a discussion on the list to see
what folks feel about multiple transactions. I think if we are to support
something like what you mention it should be via multiple transactions
wherein a success would mean a success for all the transactions that are
part of a multiple transaction.

Thanks
mahadev

On 12/9/10 4:19 AM, Qian Ye yeqian@gmail.com wrote:

 Hi all:
 
 I'm working on a distributed system these days, and need more conditional
 write support in ZooKeeper. Currently ZooKeeper only supports modifying
 node data (delete or set) conditioned on a version number representing the
 current version of that node. I need modification conditioned on other
 nodes. For example, I want to set the node data of /node to A if the node
 data of /node1 is B and the node data of /node2 is C. Should we support
 this kind of interface?
 
 thanks
 --
 With Regards!
 
 Ye, Qian