I created a bug but I don't see a way to assign it to myself (or anyone actually). Here's the link: https://issues.apache.org/jira/browse/ZOOKEEPER-1011.
semih

On Wed, Mar 9, 2011 at 1:30 AM, Flavio Junqueira <[email protected]> wrote:

> Hi Semih, Jira is the system we use to report and discuss zookeeper issues:
>
> https://issues.apache.org/jira/browse/ZOOKEEPER
>
> Once you have an account, you can create a new issue, describe it, and
> propose a fix to the problem at hand.
>
> -Flavio
>
> On Mar 8, 2011, at 10:13 PM, Semih Salihoglu wrote:
>
> Sure, I'll get to it this weekend probably.
>
> I don't know what jira is so some information on how to do this would be
> very helpful.
>
> Thank you,
>
> semih
>
> On Tue, Mar 8, 2011 at 8:31 AM, Patrick Hunt <[email protected]> wrote:
>
>> On Tue, Mar 8, 2011 at 5:59 AM, Flavio Junqueira <[email protected]> wrote:
>>
>>> I believe the goal of the examples was never to be a complete solution
>>> to barriers or queues, but just to give a quick bootstrap to beginners. It
>>> is true, though, that the documentation page does not make that claim, and
>>> can be misleading.
>>>
>>> I see two possible action points out of this discussion:
>>> 1- State clearly in the beginning that the example discussed is not
>>> correct under the assumption that a process may finish the computation
>>> before another has started, and the example is there for illustration
>>> purposes;
>>> 2- Have another example following the current one that discusses the
>>> problem and shows how to fix it. This is an interesting option that
>>> illustrates how one could reason about a solution when developing with
>>> zookeeper.
>>>
>>
>> This (2) sounds much better to me. Semih, would you like to give that a
>> try? (updating the docs I mean)
>>
>> Patrick
>>
>>> If you are interested in helping us fix it, Semih, then you could perhaps
>>> create a jira and assign yourself to fix it. I can help you out.
>>>
>>> -Flavio
>>>
>>> On Mar 7, 2011, at 11:23 AM, Semih Salihoglu wrote:
>>>
>>> Hi Mahadev,
>>>
>>> Sorry for the late response.
>>> I agree, actually in this other documentation
>>> http://hadoop.apache.org/zookeeper/docs/r3.0.0/recipes.html, where there
>>> is only the pseudo-code, I think this situation is avoided. Here there is
>>> another znode /ready that all nodes have a watch on. And after each node
>>> writes their own ephemeral child, they don't wait. They read how many
>>> have been written and the last one writes the /ready znode and everyone
>>> wakes up. The only race condition in this one is that there can be two
>>> nodes trying to write /ready and only one of them will succeed but this
>>> is ok.
>>>
>>> Thank you again,
>>>
>>> semih
>>>
>>> On Sat, Mar 5, 2011 at 6:41 PM, Mahadev Konar <[email protected]> wrote:
>>>
>>> >>> Semih,
>>> >>>
>>> >>> You pointed it out right. It is possible to enter into a situation
>>> >>> like that. The recipe does have a bug. It can be fixed with the last
>>> >>> client creating a special znode and every node in the list watching
>>> >>> for that (so it'll be an indication for entering the barrier). no?
>>> >>>
>>> >>> thanks
>>> >>> mahadev
>>> >>>
>>> >>> On Sat, Mar 5, 2011 at 5:06 PM, Semih Salihoglu <[email protected]> wrote:
>>> >>>
>>> >>> Hi All,
>>> >>>
>>> >>> I am new to this group and to ZooKeeper. I was reading the Barrier
>>> >>> tutorial in one of the ZooKeeper documentations.
>>> >>> http://hadoop.apache.org/zookeeper/docs/current/zookeeperTutorial.html .
>>> >>> A barrier primitive is exactly how I want to use ZooKeeper. I have a
>>> >>> question about this example. It's not really a ZooKeeper question, it's
>>> >>> more a question about the Barrier primitive I think.
>>> >>> Here it is: In the enter method of this Barrier implementation below
>>> >>>
>>> >>> boolean enter() throws KeeperException, InterruptedException{
>>> >>>     zk.create(root + "/" + name, new byte[0], Ids.OPEN_ACL_UNSAFE,
>>> >>>             CreateMode.EPHEMERAL_SEQUENTIAL);
>>> >>>     while (true) {
>>> >>>         synchronized (mutex) {
>>> >>>             List<String> list = zk.getChildren(root, true);
>>> >>>
>>> >>>             if (list.size() < size) {
>>> >>>                 mutex.wait();
>>> >>>             } else {
>>> >>>                 return true;
>>> >>>             }
>>> >>>         }
>>> >>>     }
>>> >>> }
>>> >>>
>>> >>> could there be a race condition? Let's say there are two
>>> >>> machines/nodes: node1 and node2 that will use this code to synchronize
>>> >>> over ZK. Let's say the following steps take place:
>>> >>>
>>> >>> 1. node1 calls the zk.create method and then reads the number of
>>> >>>    children, and sees that it's 1 and starts waiting.
>>> >>> 2. node2 calls the zk.create method (doesn't call the
>>> >>>    zk.getChildren method yet, let's say it's very slow)
>>> >>> 3. node1 is notified that the number of children on the znode
>>> >>>    changed, it checks that the size is 2 so it gets past the barrier,
>>> >>>    it does its work and then leaves the barrier, deleting its node.
>>> >>> 4. node2 calls zk.getChildren and because node1 has already left,
>>> >>>    it sees that the number of children is equal to 1. Since node1 will
>>> >>>    never enter the barrier again, node2 will keep waiting.
>>> >>>
>>> >>> Could this scenario happen? If not, what is preventing this? I haven't
>>> >>> copied the code piece that enters barrier-does work-leaves barrier.
>>> >>> But in the link I pasted above, it's the barrierTest(String args[])
>>> >>> method.
>>> >>>
>>> >>> Thank you very much in advance,
>>> >>>
>>> >>> semih
>>>
>>> *flavio*
>>> *junqueira*
>>>
>>> research scientist
>>>
>>> [email protected]
>>> direct +34 93-183-8828
>>>
>>> avinguda diagonal 177, 8th floor, barcelona, 08018, es
>>> phone (408) 349 3300 fax (408) 349 3301
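
For reference, the four-step interleaving described in the quoted thread, and the /ready idea suggested as a fix, can be replayed deterministically without a ZooKeeper server. The sketch below is only an illustration under stated assumptions: FakeBarrierNode, canEnterByCount, canEnterWithReady and replay are hypothetical names made up for this example, the class is an in-memory stand-in and not the ZooKeeper API, and it assumes that /ready stays in place after node1 deletes its own child.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class BarrierRaceSketch {

    // Hypothetical in-memory stand-in for the barrier znode (NOT the ZooKeeper API).
    static final class FakeBarrierNode {
        private final Set<String> children = new LinkedHashSet<>();
        private boolean ready = false; // models the extra /ready znode (assumed to persist)

        void create(String name) { children.add(name); }
        void delete(String name) { children.remove(name); }
        int childCount() { return children.size(); }
        void markReady() { ready = true; } // a second attempt to create /ready changes nothing
        boolean isReady() { return ready; }
    }

    // Check used by the tutorial's enter(): proceed only if enough children exist right now.
    static boolean canEnterByCount(FakeBarrierNode node, int size) {
        return node.childCount() >= size;
    }

    // Check with /ready: whoever observes the full count creates /ready,
    // and any participant may proceed once /ready exists.
    static boolean canEnterWithReady(FakeBarrierNode node, int size) {
        if (node.childCount() >= size) {
            node.markReady();
        }
        return node.isReady();
    }

    // Replays steps 1-4 from the thread and reports whether node2 gets past the barrier.
    public static boolean replay(boolean useReady) {
        final int size = 2;
        FakeBarrierNode node = new FakeBarrierNode();

        node.create("node1");                       // step 1: node1 creates its child
        boolean node1In = useReady
                ? canEnterWithReady(node, size)
                : canEnterByCount(node, size);      // sees 1 child, so node1 waits

        node.create("node2");                       // step 2: node2 creates its child, then stalls

        node1In = useReady
                ? canEnterWithReady(node, size)
                : canEnterByCount(node, size);      // step 3: node1 rechecks and sees 2 children
        if (node1In) {
            node.delete("node1");                   // node1 does its work and leaves
        }

        return useReady
                ? canEnterWithReady(node, size)
                : canEnterByCount(node, size);      // step 4: node2 finally checks
    }

    public static void main(String[] args) {
        System.out.println("count only:  node2 enters = " + replay(false));
        System.out.println("with /ready: node2 enters = " + replay(true));
    }
}
```

With the count-only check node2 is left waiting, because the only thing it can observe is the current number of children. With the /ready flag it gets through, because the fact that the barrier was once full is recorded separately from the children that come and go.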
