Hi Jae,

Just letting you know that, using ZooKeeper 3.4.6 and Curator 2.4.1, I could not reproduce your case in my environment. It would be nice if I could see this problem in my environment. How can I reproduce it?

After starting the application (using PathChildrenCacheListener), I stop ZooKeeper and restart it 40 seconds later. The application switches to the RECONNECTED state after the SUSPENDED state, reporting ConnectionLoss. (After 30 minutes of checking the logs, it had not gone back to the SUSPENDED state; it was still connected and listening for child node changes.)

java.io.IOException: An existing connection was forcibly closed by the remote host
08:40:34.464 [main-EventThread] INFO o.a.c.f.state.ConnectionStateManager - State change: SUSPENDED
08:40:34.473 [PathChildrenCache-0] ERROR o.a.c.f.r.cache.PathChildrenCache -
08:40:40.198 [CuratorFramework-0] WARN org.apache.curator.ConnectionState - Connection attempt unsuccessful after 2000 (greater than max timeout of 500). Resetting connection and trying again with a new connection.
08:40:40.198 [CuratorFramework-0] DEBUG org.apache.zookeeper.ZooKeeper - Closing session: 0x0
08:40:40.198 [CuratorFramework-0] DEBUG org.apache.zookeeper.ClientCnxn - Closing client for session: 0x0
08:40:42.344 [CuratorFramework-0] WARN org.apache.curator.ConnectionState - Connection attempt unsuccessful after 2146 (greater than max timeout of 500). Resetting connection and trying again with a new connection.
08:40:42.344 [CuratorFramework-0] DEBUG org.apache.curator.ConnectionState - reset
08:40:42.344 [CuratorFramework-0] DEBUG org.apache.zookeeper.ZooKeeper - Closing session: 0x0
08:40:42.344 [CuratorFramework-0] DEBUG org.apache.zookeeper.ClientCnxn - Closing client for session: 0x0
08:40:42.403 [CuratorFramework-0-SendThread(127.0.0.1:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
08:40:42.409 [CuratorFramework-0] ERROR o.a.c.f.imps.CuratorFrameworkImpl - Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
08:40:42.410 [CuratorFramework-0] ERROR o.a.c.f.imps.CuratorFrameworkImpl - Background retry gave up
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
08:40:46.920 [CuratorFramework-0] WARN org.apache.curator.ConnectionState - Connection attempt unsuccessful after 1389 (greater than max timeout of 500). Resetting connection and trying again with a new connection.
08:40:46.920 [CuratorFramework-0] DEBUG org.apache.curator.ConnectionState - reset
08:40:46.920 [CuratorFramework-0] DEBUG org.apache.zookeeper.ZooKeeper - Closing session: 0x0
08:40:46.920 [CuratorFramework-0] DEBUG org.apache.zookeeper.ClientCnxn - Closing client for session: 0x0
08:41:14.303 [CuratorFramework-0-SendThread(0:0:0:0:0:0:0:1:2181)] DEBUG o.a.zookeeper.ClientCnxnSocketNIO - Ignoring exception during shutdown input
java.net.SocketException: Socket is not connected

Then, after starting the ZooKeeper instance again, the PathChildrenCache continues to get updated:

08:41:15.804 [CuratorFramework-0-EventThread] INFO o.a.c.f.state.ConnectionStateManager - State change: RECONNECTED

Regards.

On 9 April 2014 18:55, Bae, Jae Hyeon <[email protected]> wrote:

> Last night, I rolling-restarted zookeeper 3.4.5 to update configuration
> and I saw curator-2.4.0 couldn't recover connection loss.
>
> ERROR 2014-04-09 17:48:15,231 [DaemonThreadFactory-2-thread-2] org.apache.curator.framework.imps.CuratorFrameworkImpl: Background retry gave up
> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:766)
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:749)
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:56)
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl$3.call(CuratorFrameworkImpl.java:244)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:724)
>
> INFO 2014-04-09 17:48:15,276 [ServerInventoryView-0-EventThread] org.apache.curator.framework.state.ConnectionStateManager: State change: RECONNECTED
> INFO 2014-04-09 17:48:15,382 [ServerInventoryView-0-EventThread] org.apache.curator.framework.state.ConnectionStateManager: State change: SUSPENDED
> ERROR 2014-04-09 17:48:15,748 [DaemonThreadFactory-2-thread-2] org.apache.curator.framework.imps.CuratorFrameworkImpl: Background exception was not retry-able or retry gave up
> java.lang.NullPointerException
>     at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:191)
>     at com.google.common.collect.Lists$TransformingSequentialList.<init>(Lists.java:527)
>     at com.google.common.collect.Lists.transform(Lists.java:510)
>     at org.apache.curator.framework.recipes.cache.PathChildrenCache.processChildren(PathChildrenCache.java:635)
>     at org.apache.curator.framework.recipes.cache.PathChildrenCache.access$200(PathChildrenCache.java:68)
>     at org.apache.curator.framework.recipes.cache.PathChildrenCache$4.processResult(PathChildrenCache.java:476)
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:686)
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:659)
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:783)
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:749)
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:56)
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl$3.call(CuratorFrameworkImpl.java:244)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:724)
>
> I am not sure this bug is on PathChildrenCache.
>
> I need to restart all instances using curator-2.4.0, which is really bad.
>
> Thank you
> Best, Jae
>

--
Osman Sebati Çam
https://twitter.com/osmanscam <https://twitter.com/#!/osmanscam>
http://osmanscam.blogspot.ie
