pkuwm commented on a change in pull request #1109:
URL: https://github.com/apache/helix/pull/1109#discussion_r449394344
##########
File path:
zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -984,11 +997,50 @@ private void fireAllEvents() {
protected List<String> getChildren(final String path, final boolean watch) {
long startT = System.currentTimeMillis();
+
try {
List<String> children = retryUntilConnected(new Callable<List<String>>()
{
+ private int connectionLossRetryCount = 0;
+
@Override
public List<String> call() throws Exception {
- return getConnection().getChildren(path, watch);
+ try {
+ return getConnection().getChildren(path, watch);
+ } catch (ConnectionLossException e) {
+ ++connectionLossRetryCount;
+ // Allow retrying 3 times before checking stat checking number of
children,
+ // because there is a higher possibility that connection loss is
caused by other
+ // factors such as network connectivity, connected ZK node could
not serve
+ // the request, session expired, etc.
+ if (connectionLossRetryCount >= 3) {
+ // Issue: https://github.com/apache/helix/issues/962
+ // Connection loss might be caused by an excessive number of
children.
+ // Infinitely retrying connecting may cause high GC in ZK server
and kill ZK server.
+ // This is a workaround to check numChildren to have a chance to
exit retry loop.
+ // TODO: remove this check once we have a better way to exit
infinite retry
+ Stat stat = getStat(path);
Review comment:
I also thought about it. It would have nested retryUntilConnected. The
benefit is it would retry to get the stat if there is connection loss for
getStat(). If we use native exists() and there is connection loss, it would not
retry and instead go back to getChildren() again.
The potential issue could be the retry timeout, because new
retryUntilConnected() has a new start time. So getState() would potentially
retry longer than the configured retry timeout for getChildren().
I don't have a strong preference. Seems you prefer native exits(), I'll
change it.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]