[jira] [Updated] (ZOOKEEPER-1011) fix Java Barrier Documentation example's race condition issue and polish up the Barrier Documentation

2018-09-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ZOOKEEPER-1011:
--
Labels: pull-request-available  (was: )

> fix Java Barrier Documentation example's race condition issue and polish up 
> the Barrier Documentation
> -
>
> Key: ZOOKEEPER-1011
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1011
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: documentation
>Reporter: Semih Salihoglu
>Assignee: maoling
>Priority: Major
>  Labels: pull-request-available
>
> There is a race condition in the Barrier example of the Java tutorial 
> (http://hadoop.apache.org/zookeeper/docs/current/zookeeperTutorial.html), 
> in the enter() method. Here's the original example:
> boolean enter() throws KeeperException, InterruptedException {
>     zk.create(root + "/" + name, new byte[0], Ids.OPEN_ACL_UNSAFE,
>             CreateMode.EPHEMERAL_SEQUENTIAL);
>     while (true) {
>         synchronized (mutex) {
>             List list = zk.getChildren(root, true);
>             if (list.size() < size) {
>                 mutex.wait();
>             } else {
>                 return true;
>             }
>         }
>     }
> }
> Here's the race condition scenario:
> Let's say there are two machines/nodes, node1 and node2, that will use this 
> code to synchronize over ZK. Let's say the following steps take place:
> 1- node1 calls zk.create, then reads the number of children, sees that it 
> is 1, and starts waiting.
> 2- node2 calls zk.create but does not call zk.getChildren yet (let's say 
> it's very slow).
> 3- node1 is notified that the number of children of the znode changed. It 
> checks that the size is 2, so it returns from enter(), does its work, and 
> then leaves the barrier, deleting its node.
> 4- node2 calls zk.getChildren and, because node1 has already left, sees 
> that the number of children is 1. Since node1 will never enter the barrier 
> again, node2 will keep waiting.
> --- End of scenario ---
> Here are Flavio's suggested fixes (copied from the email thread):
> ...
> I see two possible action points out of this discussion:
>   
> 1- State clearly in the beginning that the example discussed is not correct 
> under the assumption that a process may finish the computation before another 
> has started, and the example is there for illustration purposes;
> 2- Have another example following the current one that discusses the problem 
> and shows how to fix it. This is an interesting option that illustrates how 
> one could reason about a solution when developing with zookeeper.
> ...
> We'll go with the 2nd option.
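
The interleaving described in the scenario can be replayed deterministically without a ZooKeeper server. The sketch below is only an illustration: BarrierReplay, its in-memory children set, and the ready flag are hypothetical stand-ins, not the ZooKeeper client API, and the "fixed" check is one possible approach (a marker that persists once the barrier has filled, similar in spirit to the ready node of the double barrier recipe), not necessarily the fix adopted for this issue.

```java
import java.util.HashSet;
import java.util.Set;

/** In-memory stand-in for the barrier znode; hypothetical, NOT the ZooKeeper API. */
public class BarrierReplay {
    // Children currently registered under the (hypothetical) barrier root.
    static final Set<String> children = new HashSet<>();
    // Models a marker that stays set once the barrier has filled (assumption).
    static boolean ready = false;
    static final int SIZE = 2;

    /** Original check: pass only if enough children exist at this instant. */
    static boolean originalCheck() {
        return children.size() >= SIZE;
    }

    /** Sketched fix: whoever observes a full barrier sets the marker; pass if it is set. */
    static boolean fixedCheck() {
        if (children.size() >= SIZE) {
            ready = true;
        }
        return ready;
    }

    static boolean check(boolean useFix) {
        return useFix ? fixedCheck() : originalCheck();
    }

    /** Replays the scenario's steps; returns whether node2 gets past enter(). */
    public static boolean replay(boolean useFix) {
        children.clear();
        ready = false;
        children.add("node1");      // node1: create
        check(useFix);              // node1: sees 1 child, waits
        children.add("node2");      // node2: create, then stalls
        check(useFix);              // node1: notified, sees 2 children, proceeds
        children.remove("node1");   // node1: finishes work, leaves, deletes its node
        return check(useFix);       // node2: finally performs its own check
    }

    public static void main(String[] args) {
        System.out.println("original: node2 passes = " + replay(false));
        System.out.println("fixed:    node2 passes = " + replay(true));
    }
}
```

With the original check, node2's result depends on how many children exist at the moment it looks, so it is left waiting. With the marker, the fact that the barrier once filled survives node1's departure. In a real client the marker would have to be a znode that outlives the departing process, and node2 would need a watch on it; both details are omitted here.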



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-1011) fix Java Barrier Documentation example's race condition issue and polish up the Barrier Documentation

2018-08-11 Thread maoling (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maoling updated ZOOKEEPER-1011:
---
Priority: Major  (was: Trivial)

> fix Java Barrier Documentation example's race condition issue and polish up 
> the Barrier Documentation
> -
>
> Key: ZOOKEEPER-1011
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1011
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: documentation
>Reporter: Semih Salihoglu
>Assignee: maoling
>Priority: Major





[jira] [Updated] (ZOOKEEPER-1011) fix Java Barrier Documentation example's race condition issue and polish up the Barrier Documentation

2018-07-29 Thread maoling (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maoling updated ZOOKEEPER-1011:
---
Summary: fix Java Barrier Documentation example's race condition issue and 
polish up the Barrier Documentation  (was: Java Barrier Documentation example 
has a race condition issue)

> fix Java Barrier Documentation example's race condition issue and polish up 
> the Barrier Documentation
> -
>
> Key: ZOOKEEPER-1011
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1011
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: documentation
>Reporter: Semih Salihoglu
>Assignee: maoling
>Priority: Trivial


