Re: Question about the Barrier Java example on the ZooKeeper documentation

Semih Salihoglu Wed, 09 Mar 2011 01:56:15 -0800

I created a bug but I don't see a way to assign it to myself (or anyone
actually). Here's the link:
https://issues.apache.org/jira/browse/ZOOKEEPER-1011.


semih

On Wed, Mar 9, 2011 at 1:30 AM, Flavio Junqueira <[email protected]> wrote:

> Hi Semih, Jira is the system we use to report and discuss zookeeper issues:
>
> https://issues.apache.org/jira/browse/ZOOKEEPER
>
> Once you have an account, you can create a new issue, describe it, and
> propose a fix to the problem at hand.
>
> -Flavio
>
> On Mar 8, 2011, at 10:13 PM, Semih Salihoglu wrote:
>
> Sure, I'll get to it this weekend probably.
>
> I don't know what Jira is, so some information on how to do this would be
> very helpful.
>
> Thank you,
>
> semih
>
> On Tue, Mar 8, 2011 at 8:31 AM, Patrick Hunt <[email protected]> wrote:
>
>> On Tue, Mar 8, 2011 at 5:59 AM, Flavio Junqueira <[email protected]> wrote:
>>
>>> I believe the goal of the examples was never to be complete solutions
>>> to barriers or queues, but just to give a quick bootstrap to beginners. It
>>> is true, though, that the documentation page does not make that claim, and
>>> can be misleading.
>>>
>>> I see two possible action points out of this discussion:
>>> 1- State clearly in the beginning that the example discussed is not
>>> correct under the assumption that a process may finish the computation
>>> before another has started, and the example is there for illustration
>>> purposes;
>>> 2- Have another example following the current one that discusses the
>>> problem and shows how to fix it. This is an interesting option that
>>> illustrates how one could reason about a solution when developing with
>>> zookeeper.
>>>
>>>
>> This (2) sounds much better to me. Semih, would you like to give that a
>> try? (updating the docs I mean)
>>
>> Patrick
>>
>>
>>> If you are interested in helping us fix it, Semih, then you could perhaps
>>> create a jira and assign yourself to fix it. I can help you out.
>>>
>>> -Flavio
>>>
>>> On Mar 7, 2011, at 11:23 AM, Semih Salihoglu wrote:
>>>
>>> Hi Mahadev,
>>>
>>> Sorry for the late response. I agree. Actually, in this other documentation,
>>> http://hadoop.apache.org/zookeeper/docs/r3.0.0/recipes.html, where there is
>>> only the pseudo-code, I think this situation is avoided. Here there is
>>> another znode, /ready, that all nodes have a watch on. After each node
>>> writes its own ephemeral child, it doesn't wait. It reads how many children
>>> have been written, and the last one writes the /ready znode and everyone
>>> wakes up. The only race condition in this one is that two nodes can try to
>>> write /ready and only one of them will succeed, but this is OK.
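
The /ready idea described above can be sketched in plain Java. This is only a
minimal illustration under stated assumptions: ReadyBarrierSketch and all of its
methods are hypothetical names, the znodes are modeled in memory, and nothing
here is the ZooKeeper client API or the recipe's actual code. It shows why a
slow node still passes the barrier after a fast node has already left.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory model of the barrier root and the /ready znode.
// It is NOT the ZooKeeper client API; it only mirrors the decisions each node makes.
class ReadyBarrierSketch {
    private final Set<String> children = new HashSet<>();
    private final int size;
    private boolean ready = false; // models whether /ready exists

    ReadyBarrierSketch(int size) {
        this.size = size;
    }

    // Models creating this node's ephemeral child under the barrier root.
    void register(String name) {
        children.add(name);
    }

    // Models deleting this node's child when it leaves the barrier.
    void leave(String name) {
        children.remove(name);
    }

    // Models creating /ready. Returns false if it already exists,
    // which stands in for the "only one of them will succeed" case.
    boolean createReady() {
        if (ready) {
            return false;
        }
        ready = true;
        return true;
    }

    // Returns true if the caller may proceed. A false result means the caller
    // would wait for its watch on /ready instead of re-counting children.
    boolean check() {
        if (children.size() >= size) {
            createReady(); // losing this race to another node is harmless
        }
        return ready;
    }

    // Replays an interleaving where node2 is slow to read and node1 has already left.
    static boolean replaySlowNode2() {
        ReadyBarrierSketch barrier = new ReadyBarrierSketch(2);
        barrier.register("node1");
        barrier.register("node2");            // node2 creates its child but does not read yet
        boolean node1Passes = barrier.check(); // node1 sees 2 children and creates /ready
        barrier.leave("node1");               // node1 does its work and deletes its child
        boolean node2Passes = barrier.check(); // node2 sees 1 child, but /ready exists
        return node1Passes && node2Passes;
    }

    public static void main(String[] args) {
        System.out.println("node2 passes the barrier: " + replaySlowNode2());
    }
}
```

The decision to proceed depends on /ready existing rather than on the current
child count, so a node that reads late is not affected by another node having
deleted its child. Cleaning up /ready so the barrier can be reused is outside
this sketch.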
>>>
>>> Thank you again,
>>>
>>> semih
>>>
>>> On Sat, Mar 5, 2011 at 6:41 PM, Mahadev Konar <[email protected]>
>>> wrote:
>>>
>>> Semih,
>>>
>>> You pointed it out right. It is possible to enter into a situation
>>> like that. The recipe does have a bug. It can be fixed with the last
>>> client creating a special znode and every node in the list watching
>>> for that (so it'll be an indication for entering the barrier). No?
>>>
>>>
>>> thanks
>>>
>>> mahadev
>>>
>>>
>>> On Sat, Mar 5, 2011 at 5:06 PM, Semih Salihoglu <[email protected]>
>>>
>>> wrote:
>>>
>>> Hi All,
>>>
>>>
>>> I am new to this group and to ZooKeeper. I was reading the Barrier tutorial
>>> in the ZooKeeper documentation:
>>> http://hadoop.apache.org/zookeeper/docs/current/zookeeperTutorial.html
>>> A barrier primitive is exactly how I want to use ZooKeeper. I have a question
>>> about this example. It's not really a ZooKeeper question; it's more a
>>> question about the Barrier primitive, I think. Here it is: in the enter
>>> method of this Barrier implementation below
>>>
>>>
>>> boolean enter() throws KeeperException, InterruptedException {
>>>     zk.create(root + "/" + name, new byte[0], Ids.OPEN_ACL_UNSAFE,
>>>             CreateMode.EPHEMERAL_SEQUENTIAL);
>>>     while (true) {
>>>         synchronized (mutex) {
>>>             List<String> list = zk.getChildren(root, true);
>>>             if (list.size() < size) {
>>>                 mutex.wait();
>>>             } else {
>>>                 return true;
>>>             }
>>>         }
>>>     }
>>> }
>>>
>>>
>>> could there be a race condition? Let's say there are two
>>> machines/nodes, node1 and node2, that will use this code to synchronize
>>> over ZK. Let's say the following steps take place:
>>>
>>>  1. node1 calls the zk.create method and then reads the number of
>>>     children, sees that it's 1, and starts waiting.
>>>  2. node2 calls the zk.create method (it doesn't call the
>>>     zk.getChildren method yet; let's say it's very slow).
>>>  3. node1 is notified that the number of children on the znode
>>>     changed. It checks that the size is 2, so it enters the barrier,
>>>     does its work, and then leaves the barrier, deleting its node.
>>>  4. node2 calls zk.getChildren and, because node1 has already left,
>>>     sees that the number of children is equal to 1. Since node1 will
>>>     never enter the barrier again, node2 will keep waiting.
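
The four steps above can be replayed deterministically. The following is a
minimal sketch in plain Java, assuming a hypothetical in-memory stand-in for the
children of the barrier root (BarrierRaceReplay and its methods are made up for
illustration and are not part of the ZooKeeper client API). It only shows what
node2 would read in step 4.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for the children of the barrier root znode.
// It is NOT the ZooKeeper client API; it only models create, delete and counting.
class BarrierRaceReplay {
    private final List<String> children = new ArrayList<>();

    void create(String name) {
        children.add(name);
    }

    void delete(String name) {
        children.remove(name);
    }

    int childCount() {
        return children.size();
    }

    // Replays steps 1-4 in order and returns the child count node2 sees on its first read.
    static int replay(int size) {
        BarrierRaceReplay root = new BarrierRaceReplay();

        root.create("node1");                              // step 1: node1 creates its child
        boolean node1Waits = root.childCount() < size;     //         and sees 1 < size, so it waits

        root.create("node2");                              // step 2: node2 creates but does not read yet

        boolean node1Passes = root.childCount() >= size;   // step 3: node1 is notified and sees 2
        root.delete("node1");                              //         it works, leaves, deletes its child

        if (!node1Waits || !node1Passes) {
            throw new IllegalStateException("replay assumes size == 2");
        }
        return root.childCount();                          // step 4: node2 finally reads
    }

    public static void main(String[] args) {
        int size = 2;
        int seen = replay(size);
        // prints "node2 sees 1 of 2 children, so it would wait"
        System.out.println("node2 sees " + seen + " of " + size
                + " children, so it would wait");
    }
}
```

In this replay node2 reads a count of 1, which is below size, so the loop in
enter() would call mutex.wait() with no later child creation to wake it up.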
>>>
>>>
>>> Could this scenario happen? If not, what is preventing this? I haven't
>>> copied the piece of code that enters the barrier, does the work, and leaves
>>> the barrier, but in the link I pasted above, it's the
>>> barrierTest(String args[]) method.
>>>
>>>
>>> Thank you very much in advance,
>>>
>>>
>>> semih
>>>
>>>
>>>
>>>
>>>   *flavio*
>>> *junqueira*
>>>
>>> research scientist
>>>
>>> [email protected]
>>> direct +34 93-183-8828
>>>
>>> avinguda diagonal 177, 8th floor, barcelona, 08018, es
>>> phone (408) 349 3300    fax (408) 349 3301
>>>
>>>
>>>
>>
>
