On Tue, Mar 8, 2011 at 5:59 AM, Flavio Junqueira <[email protected]> wrote:
> I believe the goal of the examples was never to be a complete solution to
> barriers or queues, but just to give a quick bootstrap to beginners. It is
> true, though, that the documentation page does not make that claim, and can
> be misleading.
>
> I see two possible action points out of this discussion:
>
> 1- State clearly in the beginning that the example discussed is not correct
> under the assumption that a process may finish the computation before
> another has started, and the example is there for illustration purposes;
> 2- Have another example following the current one that discusses the
> problem and shows how to fix it. This is an interesting option that
> illustrates how one could reason about a solution when developing with
> zookeeper.

This (2) sounds much better to me. Semih, would you like to give that a
try? (updating the docs I mean)

Patrick

> If you are interested in helping us fix it, Semih, then you could perhaps
> create a jira and assign yourself to fix it. I can help you out.
>
> -Flavio
>
> On Mar 7, 2011, at 11:23 AM, Semih Salihoglu wrote:
>
> Hi Mahadev,
>
> Sorry for the late response. I agree, actually in this other documentation
> http://hadoop.apache.org/zookeeper/docs/r3.0.0/recipes.html, where there is
> only the pseudo-code, I think this situation is avoided. Here there is
> another znode /ready that all nodes have a watch on. And after each node
> writes their own ephemeral child, they don't wait. They read how many
> have been written and the last one writes the /ready znode and everyone
> wakes up. The only race condition in this one is that there can be two
> nodes trying to write /ready and only one of them will succeed but this
> is ok.
>
> Thank you again,
>
> semih
>
> On Sat, Mar 5, 2011 at 6:41 PM, Mahadev Konar <[email protected]> wrote:
>
> > Semih,
> > You pointed it out right. It is possible to enter into a situation
> > like that. The recipe does have a bug.
> > It can be fixed with the last
> > client creating a special znode and every node in the list watching
> > for that (so it'll be an indication for entering the barrier). no?
> >
> > thanks
> > mahadev
> >
> > On Sat, Mar 5, 2011 at 5:06 PM, Semih Salihoglu <[email protected]> wrote:
> >
> > > Hi All,
> > >
> > > I am new to this group and to ZooKeeper. I was reading the Barrier
> > > tutorial in one of the ZooKeeper documentations.
> > > http://hadoop.apache.org/zookeeper/docs/current/zookeeperTutorial.html .
> > > A barrier primitive is exactly how I want to use ZooKeeper. I have a
> > > question about this example. It's not really a ZooKeeper question, it's
> > > more a question about the Barrier primitive I think. Here it is: In the
> > > enter method of this Barrier implementation below
> > >
> > >     boolean enter() throws KeeperException, InterruptedException{
> > >         zk.create(root + "/" + name, new byte[0], Ids.OPEN_ACL_UNSAFE,
> > >                 CreateMode.EPHEMERAL_SEQUENTIAL);
> > >         while (true) {
> > >             synchronized (mutex) {
> > >                 List<String> list = zk.getChildren(root, true);
> > >
> > >                 if (list.size() < size) {
> > >                     mutex.wait();
> > >                 } else {
> > >                     return true;
> > >                 }
> > >             }
> > >         }
> > >     }
> > >
> > > could there be a race condition? Let's say there are two
> > > machines/nodes: node1 and node2 that will use this code to synchronize
> > > over ZK. Let's say the following steps take place:
> > >
> > > 1. node1 calls the zk.create method and then reads the number of
> > >    children, and sees that it's 1 and starts waiting.
> > > 2. node2 calls the zk.create method (doesn't call the
> > >    zk.getChildren method yet, let's say it's very slow)
> > > 3. node1 is notified that the number of children on the znode
> > >    changed, it checks that the size is 2 so it gets past the barrier,
> > >    does its work and then leaves the barrier, deleting its node.
> > > 4. node2 calls zk.getChildren and because node1 has already left,
> > >    it sees that the number of children is equal to 1. Since node1 will
> > >    never enter the barrier again, node2 will keep waiting.
> > >
> > > Could this scenario happen? If not, what is preventing this? I haven't
> > > copied the code piece that enters barrier-does work-leaves barrier.
> > > But in the link I pasted above, it's the barrierTest(String args[])
> > > method.
> > >
> > > Thank you very much in advance,
> > >
> > > semih
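For anyone following the thread, the interleaving in steps 1-4 and the /ready-style fix can be replayed without a ZooKeeper server. What follows is only a rough sketch under stated assumptions: the in-memory set, the mayEnter and replay methods, and the /b1 and /b1-ready paths are all made up for illustration and are not ZooKeeper API calls. The marker is placed next to the barrier node here (rather than under it) only so that it is not counted as a child, and the sketch never cleans it up.

```java
import java.util.HashSet;
import java.util.Set;

class BarrierRaceSketch {
    // Hypothetical in-memory stand-in for the znode tree (not the ZooKeeper API).
    static final Set<String> znodes = new HashSet<>();
    static final String ROOT = "/b1";        // hypothetical barrier node
    static final String READY = "/b1-ready"; // hypothetical "ready" marker
    static final int SIZE = 2;               // number of participants

    // Counts the paths directly registered under the barrier node.
    static int childCount() {
        int n = 0;
        for (String path : znodes) {
            if (path.startsWith(ROOT + "/")) {
                n++;
            }
        }
        return n;
    }

    // Entry test. Without the marker it is the tutorial's child-count check.
    // With the marker, whoever observes a full barrier creates it, and every
    // participant enters once the marker exists.
    static boolean mayEnter(boolean useReady) {
        boolean full = childCount() >= SIZE;
        if (!useReady) {
            return full;
        }
        if (full) {
            znodes.add(READY); // a second creator would simply find it present
        }
        return znodes.contains(READY);
    }

    // Replays steps 1-4 from the thread in a fixed order.
    // Returns {node1 entered, node2 entered}.
    static boolean[] replay(boolean useReady) {
        znodes.clear();
        znodes.add(ROOT + "/node1");          // step 1: node1 registers
        boolean node1 = mayEnter(useReady);   //         sees 1 child, waits
        znodes.add(ROOT + "/node2");          // step 2: node2 registers, no check yet
        node1 = mayEnter(useReady);           // step 3: node1 rechecks after its watch fires
        znodes.remove(ROOT + "/node1");       //         node1 does its work and leaves
        boolean node2 = mayEnter(useReady);   // step 4: node2 finally checks
        return new boolean[] {node1, node2};
    }

    public static void main(String[] args) {
        boolean[] tutorial = replay(false);
        boolean[] fixed = replay(true);
        System.out.println("count check:  node1=" + tutorial[0] + ", node2=" + tutorial[1]);
        System.out.println("ready marker: node1=" + fixed[0] + ", node2=" + fixed[1]);
    }
}
```

With the plain count check, node2 is left waiting because the count it reads has already dropped back to 1; with the marker, node2 enters because the marker outlives node1's ephemeral child. A real implementation would set a watch on the marker instead of polling, and would have to decide who removes it.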
