kaisun2000 commented on a change in pull request #970:
URL: https://github.com/apache/helix/pull/970#discussion_r419899421
##########
File path:
zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/callback/ZkAsyncCallbacks.java
##########
@@ -172,22 +193,52 @@ public int getRc() {
return _rc;
}
+ @Override
+ public void notifyCallers() {
+ LOG.warn("The callback {} has been cancelled.", this);
+ markOperationDone();
+ }
+
+ /**
+ * Additional callback handling.
+ */
abstract public void handle();
- }
- public static class ZkAsyncCallContext {
- private long _startTimeMilliSec;
- private int _bytes;
- private ZkClientMonitor _monitor;
- private boolean _isRead;
+ private void markOperationDone() {
+ synchronized (_isOperationDone) {
+ _isOperationDone.set(true);
+ _isOperationDone.notifyAll();
+ }
+ }
- public ZkAsyncCallContext(final ZkClientMonitor monitor, long
startTimeMilliSec, int bytes,
- boolean isRead) {
- _monitor = monitor;
- _startTimeMilliSec = startTimeMilliSec;
- _bytes = bytes;
- _isRead = isRead;
+ /**
+ * @param rc the return code
+ * @return true if the error is transient and the operation may succeed
when being retried.
+ */
+ private boolean needRetry(int rc) {
+ try {
+ switch (Code.get(rc)) {
+ /** Connection to the server has been lost */
+ case CONNECTIONLOSS:
+ /** The session has been expired by the server */
+ case SESSIONEXPIRED:
+ /** Session moved to another server, so operation is ignored */
Review comment:
These Aync call is normally for batch access from ZkBaseDataAccessor I
believe. Here, the idea is to not create ephemeral nodes because
SESSIONEXPIRED can be retry. Then we should probably fail ephemeral code
creating asyncly too, right?
##########
File path:
zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -183,6 +190,9 @@ protected ZkClient(IZkConnection zkConnection, int
connectionTimeout, long opera
_operationRetryTimeoutInMillis = operationRetryTimeout;
_isNewSessionEventFired = false;
+ _asyncCallRetryThread = new ZkAsyncRetryThread(zkConnection.getServers());
Review comment:
We should give name of this thread that can be tied to the ZkEvent
thread name. This way, when we debug it, we know the relation. Otherwise it
would be very hard to correlate and reason.
##########
File path:
zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/callback/ZkAsyncCallbacks.java
##########
@@ -122,44 +121,66 @@ public void handle() {
}
/**
- * Default callback for zookeeper async api
+ * Default callback for zookeeper async api.
*/
- public static abstract class DefaultCallback {
- AtomicBoolean _lock = new AtomicBoolean(false);
- int _rc = -1;
+ public static abstract class DefaultCallback implements
CancellableZkAsyncCallback {
+ AtomicBoolean _isOperationDone = new AtomicBoolean(false);
+ int _rc = UNKNOWN_RET_CODE;
Review comment:
why change this value from -1 to 255?
##########
File path:
zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/callback/ZkAsyncCallbacks.java
##########
@@ -122,44 +121,66 @@ public void handle() {
}
/**
- * Default callback for zookeeper async api
+ * Default callback for zookeeper async api.
*/
- public static abstract class DefaultCallback {
- AtomicBoolean _lock = new AtomicBoolean(false);
- int _rc = -1;
+ public static abstract class DefaultCallback implements
CancellableZkAsyncCallback {
+ AtomicBoolean _isOperationDone = new AtomicBoolean(false);
+ int _rc = UNKNOWN_RET_CODE;
public void callback(int rc, String path, Object ctx) {
if (rc != 0 && LOG.isDebugEnabled()) {
LOG.debug(this + ", rc:" + Code.get(rc) + ", path: " + path);
}
- if (ctx != null && ctx instanceof ZkAsyncCallContext) {
- ZkAsyncCallContext zkCtx = (ZkAsyncCallContext) ctx;
- if (zkCtx._monitor != null) {
- if (zkCtx._isRead) {
- zkCtx._monitor.record(path, zkCtx._bytes, zkCtx._startTimeMilliSec,
- ZkClientMonitor.AccessType.READ);
- } else {
- zkCtx._monitor.record(path, zkCtx._bytes, zkCtx._startTimeMilliSec,
- ZkClientMonitor.AccessType.WRITE);
- }
- }
+ if (ctx != null && ctx instanceof ZkAsyncCallMonitorContext) {
+ ((ZkAsyncCallMonitorContext) ctx).recordAccess(path);
}
_rc = rc;
- handle();
- synchronized (_lock) {
- _lock.set(true);
- _lock.notify();
+ // If retry is requested by passing the retry callback context, do retry
if necessary.
+ if (needRetry(rc)) {
+ if (ctx != null && ctx instanceof ZkAsyncRetryCallContext) {
+ try {
+ if (((ZkAsyncRetryCallContext) ctx).requestRetry()) {
+ // The retry operation will be done asynchronously. Once it is
done, the same callback
+ // handler object shall be triggered to ensure the result is
notified to the right
+ // caller(s).
+ return;
+ } else {
+ LOG.warn(
+ "Cannot request to retry the operation. The retry request
thread may have been stopped.");
+ }
+ } catch (Throwable t) {
+ LOG.error("Failed to request to retry the operation.", t);
Review comment:
Need retry, retri-able context, but retry operation failed? What to do
here? Mark done, return some retriable RC value like CONNECTIONLOSS is not what
the customer expect to handle right?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]