Hi Curator team,
We have three retry-related questions.

1. We're trying to decide which retry policy to use. Our desired behavior is to retry until the operation succeeds, with an exponential back-off whose wait is capped at a maximum. However, the current ExponentialBackoffRetry implementation doesn't allow an unbounded number of retries. I've found the change [1] that added a maximum number of retries to ExponentialBackoffRetry, and it suggests the reason was integer overflow. I'm happy to write my own policy, but do you know of any reason not to allow an unbounded number of retries?

2. Another issue we've faced is that our users don't always set the ACL entries on the nodes correctly, and as a result they receive NOAUTH errors. We're using the PersistentEphemeralNode and PathChildrenCache recipes, and the behavior we'd like is to retry (with an exponential back-off) until the ACLs are corrected. However, neither of these recipes retries on a NOAUTH error. A possible solution would be to configure the CuratorFramework to retry on the NOAUTH code, but the retriable result codes are hard-coded in RetryLoop. As a feature request, could the retriable result codes be made configurable via the CuratorFramework? The solution we've tried is to add a new field to CuratorFrameworkImpl, a Set of KeeperException.Code, initialized through the builder. In CuratorFrameworkImpl#processBackgroundOperation we also check, as part of the retry condition, whether the result code is in that Set. This way we're able to retry NOAUTH result codes with an exponential back-off.

3. While investigating the retry policy, I noticed that the SharedValue recipe reads the node's value synchronously when a watch event is triggered. However, it doesn't check the keeper state, and it sends the request even when the state is "Disconnected".
This blocks the ZooKeeper event thread until the request's retries are exhausted, which could take quite long depending on the retry policy in use, and it delays delivery of the disconnect event to other listeners. I think in this case the request could either be held back while disconnected and sent when a reconnect event arrives, or the read could be sent asynchronously.

Any advice is appreciated.

Kind regards,
Zoltan Szekeres

[1] https://github.com/Netflix/curator/commit/3c1b1b4dbf256e318b803e7bbcc2a3dcd2b88619
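For question 1, a policy with an unbounded retry count can avoid an int overflow by clamping the shift exponent rather than the number of retries. This assumes the overflow referred to in [1] comes from computing a bound like 1 << (retryCount + 1), which goes negative once the shift reaches 31. The sketch is illustrative only: UnboundedExponentialBackoff, allowRetry(int) and sleepTimeMs(int) are hypothetical names, and it is plain Java rather than an implementation of Curator's RetryPolicy so that it compiles on its own.

```java
import java.util.concurrent.ThreadLocalRandom;

/**
 * Hypothetical retry policy (not part of Curator): never stops retrying, and sleeps
 * baseSleepMs * random[1, 2^(retryCount + 1)) between attempts, capped at maxSleepMs.
 *
 * To plug it into Curator one would implement org.apache.curator.RetryPolicy and, in
 * allowRetry(int retryCount, long elapsedTimeMs, RetrySleeper sleeper), call
 * sleeper.sleepFor(sleepTimeMs(retryCount), TimeUnit.MILLISECONDS) and return true
 * (check this against the Curator version in use).
 */
final class UnboundedExponentialBackoff {
    // 1 << (29 + 1) == 2^30 still fits in a positive int; 1 << 31 would be negative.
    private static final int MAX_EXPONENT = 29;

    private final long baseSleepMs;
    private final long maxSleepMs;

    UnboundedExponentialBackoff(long baseSleepMs, long maxSleepMs) {
        if (baseSleepMs <= 0 || maxSleepMs < baseSleepMs) {
            throw new IllegalArgumentException("require 0 < baseSleepMs <= maxSleepMs");
        }
        this.baseSleepMs = baseSleepMs;
        this.maxSleepMs = maxSleepMs;
    }

    /** The number of retries is never limited; only the sleep time is bounded. */
    boolean allowRetry(int retryCount) {
        return true;
    }

    /** Randomized exponential back-off in [baseSleepMs, maxSleepMs], safe for any retryCount. */
    long sleepTimeMs(int retryCount) {
        // Clamp the exponent instead of the retry count, so the shift never overflows.
        int exponent = Math.min(Math.max(retryCount, 0), MAX_EXPONENT);
        int bound = 1 << (exponent + 1);
        long factor = Math.max(1, ThreadLocalRandom.current().nextInt(bound));
        // Compare before multiplying so baseSleepMs * factor cannot overflow a long.
        if (factor > maxSleepMs / baseSleepMs) {
            return maxSleepMs;
        }
        return baseSleepMs * factor;
    }
}
```

With baseSleepMs = 100 and maxSleepMs = 30000, every sleep stays within 100 ms and 30 s however many attempts have been made, so the retry count can grow without bound.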

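For question 3, the two alternatives could be combined roughly as follows: the watch callback never performs the read on the event thread; it either hands the read to an executor or, while disconnected, records that a refresh is pending and schedules it on reconnect. This is a hypothetical sketch, not SharedValue's code: DeferredValueReader and its on* methods are invented names, the Supplier stands in for the synchronous (possibly retrying) read, and it assumes all callbacks arrive serially on a single thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch, not Curator code. Assumes all on* callbacks are delivered
// serially on one thread (e.g. the ZooKeeper event thread), so the flags need no locking.
final class DeferredValueReader {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Supplier<byte[]> readValue; // stands in for the synchronous read
    private boolean connected = true;          // only touched by the callback thread
    private boolean refreshPending = false;    // only touched by the callback thread
    private volatile byte[] lastValue;         // written by the executor, read by anyone

    DeferredValueReader(Supplier<byte[]> readValue) {
        this.readValue = readValue;
    }

    /** Watch fired: never block the callback thread on the read. */
    void onNodeChanged() {
        if (connected) {
            scheduleRead();
        } else {
            refreshPending = true; // defer until the connection is back
        }
    }

    void onDisconnected() {
        connected = false;
    }

    void onReconnected() {
        connected = true;
        if (refreshPending) {
            refreshPending = false;
            scheduleRead();
        }
    }

    byte[] lastValue() {
        return lastValue;
    }

    /** Stops the executor and waits briefly for in-flight reads; true if they finished. */
    boolean shutdownAndAwait() {
        executor.shutdown();
        try {
            return executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    private void scheduleRead() {
        // The read, including any retries, runs on the executor, not the callback thread.
        executor.execute(() -> lastValue = readValue.get());
    }
}
```

Either way the disconnect event reaches other listeners without waiting for the read's retries. A real implementation would also have to handle session expiry and re-registering the watch, which this sketch ignores.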