Barrier Tutorial Possible Deadlock

Justin Bailey Sun, 08 May 2011 13:15:08 -0700

Hi,

I'm just learning ZK and want to make sure I am understanding everything correctly. In the Barrier Tutorial, it seems like there is a race condition that could cause a deadlock when the code executed within the barrier is short and one client has higher latency than another.

For example, say the number of process nodes required to start computation is 2.


1) Process 1 creates node, and enables children watcher.

2) Process 2 creates node and node creation fires watcher notification to process 1.

3) Process 2 retrieves children with list size 2, executes code, and deletes node.

4) Process 1 receives watcher notification from creation of node 2, and requests children, whose size is now 1.

5) Process 1 indefinitely waits for process 2's node to be created, while process 2 indefinitely waits for process 1's node to be deleted.
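
To make the interleaving concrete, here is a minimal Python sketch that replays the five steps above against an in-memory stand-in for the parent node. It does not use the ZooKeeper client at all: FakeTree, replay, and the client names "P1" and "P2" are hypothetical, and the only behavior it assumes is one-shot child watches whose notifications may be delivered late.

```python
class FakeTree:
    """Hypothetical in-memory stand-in for one parent node with one-shot child watches."""

    def __init__(self):
        self.children = set()
        self.watchers = set()  # clients that currently have a child watch armed
        self.pending = []      # notifications fired but not yet delivered

    def _fire(self):
        # One-shot semantics: each armed watch fires once, then is cleared.
        self.pending.extend(sorted(self.watchers))
        self.watchers.clear()

    def create(self, name):
        self.children.add(name)
        self._fire()

    def delete(self, name):
        self.children.discard(name)
        self._fire()

    def get_children(self, client):
        # Reading the children re-arms the caller's watch.
        self.watchers.add(client)
        return set(self.children)


def replay(required=2):
    tree = FakeTree()

    # Step 1: P1 creates its node and sets a child watch; it sees 1 child and waits.
    tree.create("p1")
    p1_entered = len(tree.get_children("P1")) >= required

    # Step 2: P2 creates its node. P1's watch fires, but delivery is delayed.
    tree.create("p2")

    # Step 3: P2 sees 2 children, enters, runs its short task, and deletes its node.
    p2_entered = len(tree.get_children("P2")) >= required
    tree.delete("p2")
    p2_left = len(tree.get_children("P2")) == 0  # P1's node is still there

    # Steps 4-5: deliver the delayed notifications; each client rechecks and re-arms.
    while tree.pending:
        client = tree.pending.pop(0)
        kids = tree.get_children(client)
        if client == "P1":
            p1_entered = p1_entered or len(kids) >= required
        else:
            p2_left = p2_left or len(kids) == 0

    return p1_entered, p2_entered, p2_left, sorted(tree.children), sorted(tree.watchers)


print(replay())  # (False, True, False, ['p1'], ['P1', 'P2'])
```

In this replay P1 never enters, P2 never leaves, both watches are armed, and no notification is left to deliver, so neither side will ever be woken up again.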

Are my assumptions of ZK's behavior correct? If so, I can't think of any solutions that are both efficient and correct. The only correct solutions I can think of either require watches on all children, or require sending the child nodes and their data to processes multiple times based on a parent data watch event.

To any developers out there, how difficult would it be to customize the ZK code to both send data along with notifications and to have permanent watchers? This would allow notifications for all changes to be guaranteed, sacrificing latency. Having both options would be analogous to having both TCP and UDP protocols available for use depending on the particular requirements of the application.


Thanks,
Justin
