Are you using a PromotedToLock? Combined with a reasonable retry policy, it should make failures extremely rare. You can also simply set the number of retries to a very large value.
-Jordan

On September 19, 2014 at 12:38:52 PM, Purshotam Shah ([email protected]) wrote:

Hi Jordan,

Same issue with Curator 2.5.0.

I read one of your mail threads (can't find it now) where you said that DistributedAtomicLong is not guaranteed to succeed in a multithreaded environment and that we have to keep trying until it succeeds. Is that true?

This is becoming a bottleneck in our stress testing. When we call DistributedAtomicLong concurrently from multiple threads (30 threads), we see a few failures (with retry policy "ExponentialBackoffRetry(1000, 3)").

What is the best way to guarantee that DistributedAtomicLong will always succeed?

Thanks,
Puru

From: Purshotam Shah <[email protected]>
Date: Wednesday, June 25, 2014 at 4:12 PM
To: Jordan Zimmerman <[email protected]>, "[email protected]" <[email protected]>
Subject: Re: DistributedAtomicLong fails in multithread env.

Thanks. Looks like it’s working fine with Curator 2.5.0. Will do some more testing. Will respond if it fails with 2.5.0.

Thanks,
Puru.

From: Jordan Zimmerman <[email protected]>
Date: Tuesday, June 24, 2014 at 6:38 PM
To: Purshotam Shah <[email protected]>, "[email protected]" <[email protected]>
Subject: Re: DistributedAtomicLong fails in multithread env.

This sounds like https://issues.apache.org/jira/browse/CURATOR-108 - Curator 2.5.0 added a new method, initialize(), to work around this issue. Please try that and let me know.

-Jordan

From: Purshotam Shah [email protected]
Reply: [email protected]
Date: June 24, 2014 at 8:20:18 PM
To: [email protected]
Subject: DistributedAtomicLong fails in multithread env.

We are using DistributedAtomicLong for the job sequenceID in ZK.

We noticed that getZKId fails in a multithreaded environment: value.preValue() and value.postValue() are both 0, and succeeded is false.

If we synchronize the function it works fine, but I don't think that's the right approach. Another approach is to retry multiple times, but how many times?

We need to make sure that getZKId returns a sequence. What is the best approach?

    DistributedAtomicLong atomicIdGenerator;

    PromotedToLock.Builder lockBuilder = PromotedToLock.builder()
            .lockPath(getPromotedLock()).retryPolicy(ZKUtils.getRetryPloicy())
            .timeout(Service.lockTimeout, TimeUnit.MILLISECONDS);
    atomicIdGenerator = new DistributedAtomicLong(zk.getClient(), ZK_SEQUENCE_PATH,
            ZKUtils.getRetryPloicy(), lockBuilder.build());

    private long getZKId( ) {
        if (atomicIdGenerator == null) {
            throw new RuntimeException("Sequence generator can't be null. Path : " + ZK_SEQUENCE_PATH);
        }
        AtomicValue<Long> value = null;
        try {
            value = atomicIdGenerator.increment();
        }
        catch (Exception e) {
            throw new RuntimeException("Exception incrementing UID for session ", e);
        }
        finally {
            if (value != null && value.succeeded()) {
                return value.preValue();
            }
            else {
                throw new RuntimeException("Exception incrementing UID for session ");
            }
        }
    }
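
For readers of this thread: the "keep trying until it succeeds" approach discussed above can be sketched with the JDK alone. This is only an illustration, not Curator code. The class RetryingSequence, its Attempt type, and the methods tryIncrement, nextId, and run are all hypothetical names, and a local java.util.concurrent.atomic.AtomicLong stands in for the ZooKeeper-backed counter. In a real client, the loop body would presumably call atomicIdGenerator.increment() and check value.succeeded(), as getZKId does above.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class RetryingSequence {
    /** Hypothetical stand-in for the result of one optimistic increment attempt. */
    static final class Attempt {
        final boolean succeeded;
        final long preValue;
        final long postValue;

        Attempt(boolean succeeded, long preValue, long postValue) {
            this.succeeded = succeeded;
            this.preValue = preValue;
            this.postValue = postValue;
        }
    }

    // Assumption: a local counter models the shared value; it is NOT a distributed counter.
    private final AtomicLong counter = new AtomicLong();

    /** One optimistic attempt: read, then compare-and-set. It fails if another thread won the race. */
    Attempt tryIncrement() {
        long pre = counter.get();
        Thread.yield(); // widen the race window so contention is visible
        boolean ok = counter.compareAndSet(pre, pre + 1);
        // A failed attempt reports zeros, like the preValue/postValue of 0 described in the thread.
        return ok ? new Attempt(true, pre, pre + 1) : new Attempt(false, 0, 0);
    }

    /** Retries until an attempt succeeds, or gives up after maxAttempts. */
    long nextId(int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            Attempt attempt = tryIncrement();
            if (attempt.succeeded) {
                return attempt.preValue;
            }
        }
        throw new IllegalStateException("No successful increment after " + maxAttempts + " attempts");
    }

    /** Draws idsPerThread ids from each of the given number of threads and collects them. */
    static Set<Long> run(int threads, int idsPerThread) {
        RetryingSequence sequence = new RetryingSequence();
        Set<Long> ids = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < idsPerThread; i++) {
                    // "A huge number" of retries, as suggested at the top of the thread.
                    ids.add(sequence.nextId(Integer.MAX_VALUE));
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Set<Long> ids = run(30, 100);
        System.out.println("distinct ids: " + ids.size());
    }
}
```

Each successful compare-and-set hands out a distinct pre-value, so the ids stay unique without synchronizing the whole method. A single attempt can still fail under contention, which is why the caller loops. Against ZooKeeper, each attempt is a network round trip, so a backoff between attempts (or a PromotedToLock, as suggested above) is likely to matter much more than it does in this local model.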
