shamil commented on issue #8332: Zk reconnection problem URL: https://github.com/apache/incubator-druid/issues/8332#issuecomment-524992148 I'm experiencing similar issue. Here are logs for historical as an example, but we see similar logs for the rest... When zookeeper is down we see this (which is legit) ``` java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_222] at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:1.8.0_222] at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[zookeeper-3.4.11.jar:3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0] at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1144) [zookeeper-3.4.11.jar:3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0] ``` once zookeeper back we see this: ``` [Curator-Framework-0] org.apache.druid.java.util.common.lifecycle.Lifecycle - Lifecycle [module] already stopped and stop was called. Silently skipping [Curator-Framework-0] org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=zookeeper.service.staging.consul sessionTimeout=30000 watcher=org.apache.curator.ConnectionState@5d352de0 [Curator-Framework-0-SendThread(ip-10-101-48-19.ec2.internal:2181)] org.apache.zookeeper.ClientCnxn - Opening socket connection to server ip-10-101-48-19.ec2.internal/10.101.48.19:2181. Will not attempt to authenticate using SASL (unknown error) [Curator-Framework-0-SendThread(ip-10-101-48-19.ec2.internal:2181)] org.apache.zookeeper.ClientCnxn - Socket connection established to ip-10-101-48-19.ec2.internal/10.101.48.19:2181, initiating session [Curator-Framework-0-SendThread(ip-10-101-48-19.ec2.internal:2181)] org.apache.zookeeper.ClientCnxn - Session establishment complete on server ip-10-101-48-19.ec2.internal/10.101.48.19:2181, sessionid = 0x100f78841360001, negotiated timeout = 30000 [Curator-Framework-0-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: RECONNECTED [Announcer-0] org.apache.druid.curator.announcement.Announcer - Reinstating [/druid/internal-discovery/HISTORICAL/10.101.48.68:8083] java.lang.NullPointerException at org.apache.curator.framework.imps.GzipCompressionProvider.compress(GzipCompressionProvider.java:115) ~[curator-framework-4.1.0.jar:4.1.0] at org.apache.druid.curator.PotentiallyGzippedCompressionProvider.compress(PotentiallyGzippedCompressionProvider.java:42) ~[druid-server-0.15.1-incubating.jar:0.15.1-incubating] at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:583) ~[curator-framework-4.1.0.jar:4.1.0] at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:49) ~[curator-framework-4.1.0.jar:4.1.0] at org.apache.druid.curator.announcement.Announcer.createAnnouncement(Announcer.java:371) ~[druid-server-0.15.1-incubating.jar:0.15.1-incubating] at org.apache.druid.curator.announcement.Announcer.access$100(Announcer.java:58) ~[druid-server-0.15.1-incubating.jar:0.15.1-incubating] at org.apache.druid.curator.announcement.Announcer$1.childEvent(Announcer.java:277) ~[druid-server-0.15.1-incubating.jar:0.15.1-incubating] at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:538) [curator-recipes-4.1.0.jar:4.1.0] at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:532) [curator-recipes-4.1.0.jar:4.1.0] at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93) [curator-framework-4.1.0.jar:4.1.0] at org.apache.curator.shaded.com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:435) [curator-client-4.1.0.jar:?] at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:85) [curator-framework-4.1.0.jar:4.1.0] at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:530) [curator-recipes-4.1.0.jar:4.1.0] at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35) [curator-recipes-4.1.0.jar:4.1.0] at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:808) [curator-recipes-4.1.0.jar:4.1.0] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_222] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_222] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_222] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_222] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_222] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_222] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222] 2019-08-26T15:42:12,267 INFO [NodeTypeWatcher[COORDINATOR]] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeTypeWatcher - Ignored event type [CONNECTION_RECONNECTED] for nodeType watcher. 2019-08-26T15:42:12,269 INFO [Announcer-0] org.apache.druid.curator.announcement.Announcer - Reinstating [/druid/listeners/lookups/__default/http:10.101.48.68:8083] 2019-08-26T15:42:12,270 INFO [Announcer-0] org.apache.druid.curator.announcement.Announcer - Reinstating [/druid/segments/10.101.48.68:8083/10.101.48.68:8083_historical__default_tier_2019-08-21T08:45:24.452Z_c05c476f4b014c0c85dbf8ed8e4858933] 2019-08-26T15:42:12,271 INFO [Announcer-0] org.apache.druid.curator.announcement.Announcer - Reinstating [/druid/segments/10.101.48.68:8083/10.101.48.68:8083_historical__default_tier_2019-08-21T08:45:24.392Z_ea0fcd4a0f854509b00c4527608b80c02] 2019-08-26T15:42:12,274 INFO [NodeTypeWatcher[COORDINATOR]] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeTypeWatcher - Node[10.101.49.211:8081:DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/coordinator', host='10.101.49.211', bindOnHost=false, port=-1, plaintextPort=8081, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType='COORDINATOR', services={}}] disappeared. 2019-08-26T15:42:12,272 INFO [ZkCoordinator] org.apache.druid.curator.discovery.CuratorDruidNodeAnnouncer - Unannounced [DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/historical', host='10.101.48.68', bindOnHost=false, port=-1, plaintextPort=8083, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType='HISTORICAL', services={dataNodeService=DataNodeService{tier='_default_tier', maxSize=300000000000, type=historical, priority=0}, lookupNodeService=LookupNodeService{lookupTier='__default'}}}]. 2019-08-26T15:42:12,274 INFO [ZkCoordinator] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.curator.announcement.Announcer.stop()] on object[org.apache.druid.curator.announcement.Announcer@14d1737a]. 2019-08-26T15:42:12,275 INFO [ZkCoordinator] org.apache.druid.curator.announcement.Announcer - Stopping announcer 2019-08-26T15:42:12,273 INFO [Announcer-0] org.apache.druid.curator.announcement.Announcer - Reinstating [/druid/segments/10.101.48.68:8083/10.101.48.68:8083_historical__default_tier_2019-08-21T08:45:24.168Z_143c5f157a4b4cf7a1497cf48e3d94c60] 2019-08-26T15:42:12,279 INFO [Announcer-0] org.apache.druid.curator.announcement.Announcer - Reinstating [/druid/segments/10.101.48.68:8083/10.101.48.68:8083_historical__default_tier_2019-08-21T08:45:24.347Z_06ebd493e7b04579b9ad26e7f7ef6c421] 2019-08-26T15:42:12,280 INFO [ZkCoordinator] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/announcements/10.101.48.68:8083] 2019-08-26T15:42:12,280 INFO [Announcer-0] org.apache.druid.curator.announcement.Announcer - Reinstating [/druid/segments/10.101.48.68:8083/10.101.48.68:8083_historical__default_tier_2019-08-21T08:45:24.504Z_5badbd11f1f445eda094e6ce48e651024] 2019-08-26T15:42:12,283 INFO [ZkCoordinator] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/listeners/lookups/__default/http:10.101.48.68:8083] 2019-08-26T15:42:12,285 INFO [ZkCoordinator] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/segments/10.101.48.68:8083/10.101.48.68:8083_historical__default_tier_2019-08-21T08:45:24.392Z_ea0fcd4a0f854509b00c4527608b80c02] 2019-08-26T15:42:12,287 INFO [ZkCoordinator] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/segments/10.101.48.68:8083/10.101.48.68:8083_historical__default_tier_2019-08-21T08:45:24.347Z_06ebd493e7b04579b9ad26e7f7ef6c421] 2019-08-26T15:42:12,289 INFO [ZkCoordinator] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/segments/10.101.48.68:8083/10.101.48.68:8083_historical__default_tier_2019-08-21T08:45:24.168Z_143c5f157a4b4cf7a1497cf48e3d94c60] 2019-08-26T15:42:12,290 INFO [ZkCoordinator] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/segments/10.101.48.68:8083/10.101.48.68:8083_historical__default_tier_2019-08-21T08:45:24.452Z_c05c476f4b014c0c85dbf8ed8e4858933] 2019-08-26T15:42:12,292 INFO [ZkCoordinator] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/segments/10.101.48.68:8083/10.101.48.68:8083_historical__default_tier_2019-08-21T08:45:24.504Z_5badbd11f1f445eda094e6ce48e651024] 2019-08-26T15:42:12,293 INFO [ZkCoordinator] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [SERVER] 2019-08-26T15:42:12,294 INFO [ZkCoordinator] org.apache.druid.server.initialization.jetty.JettyServerModule - Stopping Jetty Server... ```
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org