[GitHub] [helix] pkuwm commented on a change in pull request #1526: Leverage zk paginated getChildren API for ZkClient to fetch a large number of children

GitBox Wed, 16 Dec 2020 22:43:16 -0800


pkuwm commented on a change in pull request #1526:
URL: https://github.com/apache/helix/pull/1526#discussion_r544836381




##########
File path: 
zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkConnection.java
##########
@@ -192,4 +234,60 @@ public String getServers() {
   public void addAuthInfo(String scheme, byte[] auth) {
     _zk.addAuthInfo(scheme, auth);
   }
+
+  private void lookupGetChildrenMethod() {

Review comment:
       I am not sure if I fully understand your point. I guess your point is to 
reduce the places where `_getChildrenMethod` is referred. I just tried and 
realized that the code is not as clean as current way, because 
`lookupNonPaginatedGetChildren` is called twice. It won't achieve "the 
_getChildrenMethod is only referred in this method".
   
   Since it is an internal variable/cache, I think it's fine to refer it 
multiple places as current way.

##########
File path: 
zookeeper-api/src/main/java/org/apache/helix/zookeeper/constant/ZkSystemPropertyKeys.java
##########
@@ -63,4 +63,18 @@
 
   /** System property key for jute.maxbuffer */
   public static final String JUTE_MAXBUFFER = "jute.maxbuffer";
+
+  /**
+   * Setting this property to {@code true} in system properties will force 
Helix ZkClient to use
+   * the <b>non-paginated</b> {@code getChildren} API, no matter if zookeeper 
supports pagination
+   * or not.
+   * <p>
+   * Given both the zookeeper client and server support <b>paginated</b> 
{@code getChildren} API as
+   * a prerequisite, if set to {@code false}, it will enable Helix ZkClient's 
{@code getChildren}
+   * API to call zookeeper's <b>paginated</b> {@code getChildren} API.
+   * <p>
+   * The default value is {@code false}.
+   */
+  public static final String ZK_GETCHILDREN_PAGINATION_DISABLED =

Review comment:
       I think it's not only about internal debugging purpose. This option can 
be used to disable calling pagination if users want to. It can help roll back 
the pagination easily without users downgrade the zk lib or server. I think the 
flexibility is helpful. If there is a case that users still prefer to always 
use the non-paginated getChildren, this option can help.

##########
File path: 
zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkConnection.java
##########
@@ -192,4 +234,60 @@ public String getServers() {
   public void addAuthInfo(String scheme, byte[] auth) {
     _zk.addAuthInfo(scheme, auth);
   }
+
+  private void lookupGetChildrenMethod() {
+    if (_getChildrenMethod != null) {
+      // Method is already cached.
+      return;
+    }
+
+    doLookUpGetChildrenMethod();
+
+    LOG.info("Pagination config {}={}, method to be invoked: {}",
+        ZkSystemPropertyKeys.ZK_GETCHILDREN_PAGINATION_DISABLED, 
GETCHILDREN_PAGINATION_DISABLED,
+        _getChildrenMethod.getName());
+  }
+
+  private void doLookUpGetChildrenMethod() {

Review comment:
       This is to clean up the code a bit. Ref discussion thread: 
https://github.com/apache/helix/pull/1526#discussion_r529066418
   

##########
File path: 
zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkConnection.java
##########
@@ -102,6 +112,7 @@ protected void reconnect(Watcher watcher) throws 
InterruptedException {
         LOG.debug("Creating new ZookKeeper instance to reconnect to " + 
_servers + ".");
         _zk = new ZooKeeper(_servers, _sessionTimeOut, watcher);
         prevZk.close();
+        _getChildrenMethod = null;

Review comment:
       Theoretically it could happen, though the possibility is almost 0. It’s 
like
   ```
   if (a == null) { a = xxx; }
   a.call();
   ```
   Between these two lines.
   NPE does not cause correctness issue, the operation just fails. NPE could 
also happen now: like zkclient is closed, another thread is still trying the zk 
operation, it could also get NPE, as zk = null in close()
   
   There are a few options to avoid it:
   
   1. lock the method reference. Obviously we don’t want it as it’s not nice. 
Would ask why only getChildren is locked but not other APIs. Kind of weird..
   2. Instead of setting method = null in reconnect, we do lookup() and assign 
a value to it, so null is avoided. This is something I considered before when 
comparing it with setting to null (I felt setting to null might be risky not a 
good practice at least..)
   3. Just don’t reset method in reconnect. The purpose of resetting is for 
client has pagination zk version first, and then zk server upgrades, so client 
can reset and detect server supports it then client can start to use 
pagination. But it could not cover all cases, like if servers can upgrade 
really fast without expiring client session, client can still not detect 
server’s upgrade. So I think the right way is, if we want to onboard 
pagination, always expect server upgrades first then client. Vice versa, 
pagination is not leveraged in helix client.
   
   2 is doable, but I prefer 3, because we always follow the rule: "if we want 
to onboard pagination, always expect server upgrades first then client".  2 is 
not transparent or consistent: sometimes it can detect and sometimes not.
   
   What do you think?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [helix] pkuwm commented on a change in pull request #1526: Leverage zk paginated getChildren API for ZkClient to fetch a large number of children

Reply via email to