pkuwm commented on a change in pull request #1526:
URL: https://github.com/apache/helix/pull/1526#discussion_r544836381
##########
File path:
zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkConnection.java
##########
@@ -192,4 +234,60 @@ public String getServers() {
public void addAuthInfo(String scheme, byte[] auth) {
_zk.addAuthInfo(scheme, auth);
}
+
+ private void lookupGetChildrenMethod() {
Review comment:
I am not sure if I fully understand your point. I guess your point is to
reduce the places where `_getChildrenMethod` is referred. I just tried and
realized that the code is not as clean as current way, because
`lookupNonPaginatedGetChildren` is called twice. It won't achieve "the
_getChildrenMethod is only referred in this method".
Since it is an internal variable/cache, I think it's fine to refer it
multiple places as current way.
##########
File path:
zookeeper-api/src/main/java/org/apache/helix/zookeeper/constant/ZkSystemPropertyKeys.java
##########
@@ -63,4 +63,18 @@
/** System property key for jute.maxbuffer */
public static final String JUTE_MAXBUFFER = "jute.maxbuffer";
+
+ /**
+ * Setting this property to {@code true} in system properties will force
Helix ZkClient to use
+ * the <b>non-paginated</b> {@code getChildren} API, no matter if zookeeper
supports pagination
+ * or not.
+ * <p>
+ * Given both the zookeeper client and server support <b>paginated</b>
{@code getChildren} API as
+ * a prerequisite, if set to {@code false}, it will enable Helix ZkClient's
{@code getChildren}
+ * API to call zookeeper's <b>paginated</b> {@code getChildren} API.
+ * <p>
+ * The default value is {@code false}.
+ */
+ public static final String ZK_GETCHILDREN_PAGINATION_DISABLED =
Review comment:
I think it's not only about internal debugging purpose. This option can
be used to disable calling pagination if users want to. It can help roll back
the pagination easily without users downgrade the zk lib or server. I think the
flexibility is helpful. If there is a case that users still prefer to always
use the non-paginated getChildren, this option can help.
##########
File path:
zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkConnection.java
##########
@@ -192,4 +234,60 @@ public String getServers() {
public void addAuthInfo(String scheme, byte[] auth) {
_zk.addAuthInfo(scheme, auth);
}
+
+ private void lookupGetChildrenMethod() {
+ if (_getChildrenMethod != null) {
+ // Method is already cached.
+ return;
+ }
+
+ doLookUpGetChildrenMethod();
+
+ LOG.info("Pagination config {}={}, method to be invoked: {}",
+ ZkSystemPropertyKeys.ZK_GETCHILDREN_PAGINATION_DISABLED,
GETCHILDREN_PAGINATION_DISABLED,
+ _getChildrenMethod.getName());
+ }
+
+ private void doLookUpGetChildrenMethod() {
Review comment:
This is to clean up the code a bit. Ref discussion thread:
https://github.com/apache/helix/pull/1526#discussion_r529066418
##########
File path:
zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkConnection.java
##########
@@ -102,6 +112,7 @@ protected void reconnect(Watcher watcher) throws
InterruptedException {
LOG.debug("Creating new ZookKeeper instance to reconnect to " +
_servers + ".");
_zk = new ZooKeeper(_servers, _sessionTimeOut, watcher);
prevZk.close();
+ _getChildrenMethod = null;
Review comment:
Theoretically it could happen, though the possibility is almost 0. It’s
like
```
if (a == null) { a = xxx; }
a.call();
```
Between these two lines.
NPE does not cause correctness issue, the operation just fails. NPE could
also happen now: like zkclient is closed, another thread is still trying the zk
operation, it could also get NPE, as zk = null in close()
There are a few options to avoid it:
1. lock the method reference. Obviously we don’t want it as it’s not nice.
Would ask why only getChildren is locked but not other APIs. Kind of weird..
2. Instead of setting method = null in reconnect, we do lookup() and assign
a value to it, so null is avoided. This is something I considered before when
comparing it with setting to null (I felt setting to null might be risky not a
good practice at least..)
3. Just don’t reset method in reconnect. The purpose of resetting is for
client has pagination zk version first, and then zk server upgrades, so client
can reset and detect server supports it then client can start to use
pagination. But it could not cover all cases, like if servers can upgrade
really fast without expiring client session, client can still not detect
server’s upgrade. So I think the right way is, if we want to onboard
pagination, always expect server upgrades first then client. Vice versa,
pagination is not leveraged in helix client.
2 is doable, but I prefer 3, because we always follow the rule: "if we want
to onboard pagination, always expect server upgrades first then client". 2 is
not transparent or consistent: sometimes it can detect and sometimes not.
What do you think?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]