[GitHub] [helix] kaisun2000 commented on a change in pull request #1035: Fix leaking Zk path watch and Callbackhandler issue

GitBox Thu, 04 Jun 2020 01:31:26 -0700


kaisun2000 commented on a change in pull request #1035:
URL: https://github.com/apache/helix/pull/1035#discussion_r435079661




##########
File path: 
zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -1865,17 +1934,24 @@ public Object call() throws Exception {
    *         exist.
    */
   public List<String> watchForChilds(final String path) {
+    return watchForChilds(path, false);
+  }
+
+  private List<String> watchForChilds(final String path, boolean 
skipWatchingNodeNotExist) {
     if (_zookeeperEventThread != null && Thread.currentThread() == 
_zookeeperEventThread) {
       throw new IllegalArgumentException("Must not be done in the zookeeper 
event thread.");
     }
     return retryUntilConnected(new Callable<List<String>>() {
       @Override
       public List<String> call() throws Exception {
-        exists(path, true);
+        if (!skipWatchingNodeNotExist) {

Review comment:
       @dasahcc, this part is kind of complicated. Seems you are concerned that 
there are some insistent state. But fortunately, we can reason this way, which 
covers all the cases.
   
   "if node does not exist and skipWatchingNodeNotExist is true", in some case, 
we do need to install data watch if we expect the parent node is to be created. 
   
   Let us look at the life cycle of CallbackHandler (CB) to see all the cases.
   
   - INIT of CallbackHandler when parent node note existing. CB first installs 
data(exists) watch to this parent node. Then when the node is created, CB got 
notification of creation. Then CB will read all of the children (installing 
child watch at this time) and install data watch for each children. 
   
   - INIT of CallbackHandler when parent node exists. CB still first installs 
data watch to this parent node. And the use getData() to get the list of 
children and install data watch for each children. 
   
   Not the above two code paths are the same. They are started from CB's 
subscribeForChanges() --> subscribeChildChange(INIT) --> 
_zkClient.subscribeChildChanges( skipWatchingNodeNotExist as false) --> 
watchForChild(skipWatchingNodeNotExist as false)
   
   This would trigger both exists() and getData(). This is exactly what we 
want, because in the first case, you need exists() if the parent path does not 
exists as explained before. 
   
   Then your next question can be why for the second case I need exists() if 
the path does exist? The is because later, if the paren path is removed, you 
want to remove the CB from ZkHelixManager too. With data watch, you are 
guaranteed a notification of handleChildChange(). Here the 
`_manager.removeListener(_propertyKey, _listener);` code path would be trigger. 
This is exactly what happens when current state parent node moves due to new 
session established in participant code. 
   
   So above discussion addressed the basic life cycle of CallbackHandler. 
   
   The left case is about what happens when normal notification of CALLBACK 
type.
   This is triggered by adding or deleting a current state node under the 
parent path. At this time, the code path is: 
   
   subscribeForChanges() --> subscribeChildChange(CALLBACK) --> 
_zkClient.subscribeChildChanges( skipWatchingNodeNotExist as true) --> 
watchForChild(skipWatchingNodeNotExist as true)
   
   Here, watchForChild will not install data watch for parent node, only use 
getData to install child watch and return list of children. Then 
CallbackHandler would install data watch for each children. 
   
   So here, if you think about it, there is no in-consistant case of installing 
part of the parent child watch, but not child data watch. 
   
   Note, we should not install data watch to parent node here at CALLBACK stage 
for the reason that the removing of parent node is async to this whole process. 
That can be the source of leaking once the INIT phase passed, as I explained in 
the doc.
   
   The above discussion exhausts all the cases of CallbackHandler life cycle. 
Hope this would give people a clear idea of how we reason it. 
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [helix] kaisun2000 commented on a change in pull request #1035: Fix leaking Zk path watch and Callbackhandler issue

Reply via email to