xyuanlu commented on code in PR #2805: URL: https://github.com/apache/helix/pull/2805#discussion_r1628558328
########## meta-client/src/main/java/org/apache/helix/metaclient/impl/zk/ZkMetaClient.java: ########## @@ -207,16 +208,55 @@ public void set(String key, T data, int version) { @Override public T update(String key, DataUpdater<T> updater) { - org.apache.zookeeper.data.Stat stat = new org.apache.zookeeper.data.Stat(); - // TODO: add retry logic for ZkBadVersionException. - try { - T oldData = _zkClient.readData(key, stat); - T newData = updater.update(oldData); - set(key, newData, stat.getVersion()); - return newData; - } catch (ZkException e) { - throw translateZkExceptionToMetaclientException(e); - } + return update(key, updater, false); + } + + @Override + public T update(String key, DataUpdater<T> updater, boolean retryOnFailure) { + final int MAX_RETRY_ATTEMPTS = 3; + int retryAttempts = 0; + boolean retry; + T updatedData = null; + do { + retry = false; + retryAttempts++; + try { + ImmutablePair<T, Stat> tup = getDataAndStat(key); + Stat stat = tup.right; + T oldData = tup.left; + T newData = updater.update(oldData); + set(key, newData, stat.getVersion()); + updatedData = newData; + } catch (MetaClientBadVersionException badVersionException) { + // If exceeded max retry attempts, re-throw exception + if (retryAttempts >= MAX_RETRY_ATTEMPTS) { + LOG.error("Failed to update node at " + key + " after " + MAX_RETRY_ATTEMPTS + " attempts."); + throw badVersionException; + } + // Retry on bad version + retry = true; + } catch (MetaClientNoNodeException noNodeException) { + // If node does not exist, attempt to create it - pass null to updater + T newData = updater.update(null); + if (newData != null) { + try { + create(key, newData); Review Comment: Thanks for creating the PR! Question about the retry logic here. For example, when user try to update node "/a/b/c", and there is no node "/a/b", set in line 228 with throw NoNodeException, and we reach line 243. Since there is no node "/a/b", create will also throw NoNodeException, and we keep retry until expire. Should we do recursive create instead? Or we just let it expire and forward the NoNodeException? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For additional commands, e-mail: reviews-h...@helix.apache.org