I am not sure this is directly related, but we also sometimes see clients 
losing connections on 6.5.1. This and the problem described below are unique 
to 6.5.1; I have not seen this many cloud issues in such a short time for a 
very long time.

2017-05-09 21:30:36.661 ERROR (Document compiler) [c:logs s:shard1 r:core_node1 
x:logs_shard1_replica1] o.a.s.c.s.i.CloudSolrClient Request to collection 
search failed due to (0) java.lang.IllegalStateException: Connection pool shut 
down, retry? 0

Clients appear unable to recover from this problem. The cloud the clients are 
connecting to is up and doing fine.
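
As a client-side stopgap, something along these lines might work. This is 
only a minimal sketch, not a fix for the root cause. RecreatingClient is a 
hypothetical helper, not part of SolrJ, and it assumes the factory passed in 
builds a fresh client (for example a new CloudSolrClient). It matches on the 
"Connection pool shut down" message from the log line above, rebuilds the 
client, and retries the request once. Whether a rebuilt client actually 
recovers depends on why the pool was shut down in the first place.

```java
import java.util.function.Supplier;

/** Hypothetical helper (not part of SolrJ): rebuilds a client whose pool was shut down. */
final class RecreatingClient<C extends AutoCloseable> {

    /** One request against the client; allowed to throw checked exceptions. */
    interface Request<T, R> {
        R apply(T client) throws Exception;
    }

    // Assumption: the factory returns a brand-new, usable client on every call.
    private final Supplier<C> factory;
    private C client;

    RecreatingClient(Supplier<C> factory) {
        this.factory = factory;
        this.client = factory.get();
    }

    /** Runs the request; on "Connection pool shut down" rebuilds the client and retries once. */
    synchronized <R> R call(Request<C, R> request) throws Exception {
        try {
            return request.apply(client);
        } catch (IllegalStateException e) {
            if (!isPoolShutDown(e)) {
                throw e; // unrelated failure, do not mask it
            }
            try {
                client.close();
            } catch (Exception ignored) {
                // the old client is unusable anyway
            }
            client = factory.get();
            return request.apply(client); // a second failure propagates to the caller
        }
    }

    /** Matches the message seen in the log excerpt above. */
    static boolean isPoolShutDown(Throwable t) {
        String msg = t.getMessage();
        return msg != null && msg.contains("Connection pool shut down");
    }
}
```

The retry is limited to one attempt on purpose, so a client that keeps 
failing surfaces the error instead of looping.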

Any ideas?

Thanks,
Markus

 
 
-----Original message-----
> From:Markus Jelsma <markus.jel...@openindex.io>
> Sent: Monday 8th May 2017 11:35
> To: solr-user <solr-user@lucene.apache.org>
> Subject: 6.5.1. cloud went partially down
> 
> Hi,
> 
> Multiple 6.5.1 clouds / collections went down this weekend around the same 
> time; they share the same ZK quorum. The nodes stayed up but did not rejoin 
> the cluster (that is, they did not find or reconnect to ZK).
> 
> This is what the log told us:
> 
> 2017-05-06 18:58:34.893 WARN  
> (zkCallback-5-thread-9-processing-n:idx6.example.org:8983_solr) [   ] 
> o.a.s.c.c.ConnectionManager Watcher 
> org.apache.solr.common.cloud.ConnectionManager@4f97bdad name: 
> ZooKeeperConnection 
> Watcher:89.188.14.10:2181,89.188.14.11:2181,89.188.14.12:2181/solr_collection_search
>  got event WatchedEvent state:Disconnected type:None path:null path: null 
> type: None
> 2017-05-06 18:58:34.893 WARN  
> (zkCallback-5-thread-9-processing-n:idx6.example.org:8983_solr) [   ] 
> o.a.s.c.c.ConnectionManager zkClient has disconnected
> 2017-05-06 18:58:35.001 WARN  
> (zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr 
> x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread) 
> [c:search s:shard2 r:core_node6 x:search_shard2_replica3] 
> o.a.s.c.c.ConnectionManager Watcher 
> org.apache.solr.common.cloud.ConnectionManager@c226cc name: 
> ZooKeeperConnection 
> Watcher:89.188.14.10:2181,89.188.14.11:2181,89.188.14.12:2181/solr_collection_search
>  got event WatchedEvent state:Disconnected type:None path:null path: null 
> type: None
> 2017-05-06 18:58:35.010 WARN  
> (zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr 
> x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread) 
> [c:search s:shard2 r:core_node6 x:search_shard2_replica3] 
> o.a.s.c.c.ConnectionManager zkClient has disconnected
> 2017-05-06 18:58:45.360 WARN  
> (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr) [   ] 
> o.a.s.c.c.ConnectionManager Watcher 
> org.apache.solr.common.cloud.ConnectionManager@4f97bdad name: 
> ZooKeeperConnection 
> Watcher:89.188.14.10:2181,89.188.14.11:2181,89.188.14.12:2181/solr_collection_search
>  got event WatchedEvent state:Expired type:None path:null path: null type: 
> None
> 2017-05-06 18:58:45.360 WARN  
> (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr) [   ] 
> o.a.s.c.c.ConnectionManager Our previous ZooKeeper session was expired. 
> Attempting to reconnect to recover relationship with ZooKeeper...
> 2017-05-06 18:58:45.380 WARN  
> (OverseerStateUpdate-97740792370385619-idx6.example.org:8983_solr-n_0000000558)
>  [   ] o.a.s.c.Overseer Solr cannot talk to ZK, exiting Overseer main queue 
> loop
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
> = Session expired for /overseer/queue
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
>         at 
> org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:339)
>         at 
> org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:336)
>         at 
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
>         at 
> org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:336)
>         at 
> org.apache.solr.cloud.DistributedQueue.fetchZkChildren(DistributedQueue.java:308)
>         at 
> org.apache.solr.cloud.DistributedQueue.firstChild(DistributedQueue.java:285)
>         at 
> org.apache.solr.cloud.DistributedQueue.firstElement(DistributedQueue.java:393)
>         at 
> org.apache.solr.cloud.DistributedQueue.peek(DistributedQueue.java:159)
>         at 
> org.apache.solr.cloud.DistributedQueue.peek(DistributedQueue.java:137)
>         at 
> org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:180)
>         at java.lang.Thread.run(Thread.java:745)
> 2017-05-06 18:58:45.381 WARN  
> (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr) [   ] 
> o.a.s.c.c.DefaultConnectionStrategy Connection expired - starting a new one...
> 2017-05-06 18:58:45.382 ERROR (OverseerExitThread) [   ] o.a.s.c.Overseer 
> could not read the data
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
> = Session expired for /overseer_elect/leader
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
>         at 
> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:356)
>         at 
> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:353)
>         at 
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
>         at 
> org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:353)
>         at 
> org.apache.solr.cloud.Overseer$ClusterStateUpdater.checkIfIamStillLeader(Overseer.java:287)
>         at java.lang.Thread.run(Thread.java:745)
> 2017-05-06 18:58:46.453 WARN  
> (zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr 
> x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread) 
> [c:search s:shard2 r:core_node6 x:search_shard2_replica3] 
> o.a.s.c.c.ConnectionManager Watcher 
> org.apache.solr.common.cloud.ConnectionManager@c226cc name: 
> ZooKeeperConnection 
> Watcher:89.188.14.10:2181,89.188.14.11:2181,89.188.14.12:2181/solr_collection_search
>  got event WatchedEvent state:Expired type:None path:null path: null type: 
> None
> 2017-05-06 18:58:46.453 WARN  
> (zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr 
> x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread) 
> [c:search s:shard2 r:core_node6 x:search_shard2_replica3] 
> o.a.s.c.c.ConnectionManager Our previous ZooKeeper session was expired. 
> Attempting to reconnect to recover relationship with ZooKeeper...
> 2017-05-06 18:58:46.460 WARN  
> (zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr 
> x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread) 
> [c:search s:shard2 r:core_node6 x:search_shard2_replica3] 
> o.a.s.c.c.DefaultConnectionStrategy Connection expired - starting a new one...
> 2017-05-06 18:58:53.599 ERROR 
> (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr) [   ] 
> o.a.s.c.ZkController 
> :org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
> NodeExists for /live_nodes/idx6.example.org:8983_solr
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>         at 
> org.apache.solr.common.cloud.SolrZkClient$10.execute(SolrZkClient.java:526)
>         at 
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
>         at 
> org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:523)
>         at 
> org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:466)
>         at 
> org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:453)
>         at 
> org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:430)
>         at 
> org.apache.solr.cloud.ZkController.createEphemeralLiveNode(ZkController.java:823)
>         at 
> org.apache.solr.cloud.ZkController.access$600(ZkController.java:120)
>         at org.apache.solr.cloud.ZkController$1.command(ZkController.java:340)
>         at 
> org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:168)
>         at 
> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:57)
>         at 
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:142)
>         at 
> org.apache.solr.common.cloud.SolrZkClient$3.lambda$process$0(SolrZkClient.java:268)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> 
> 2017-05-06 18:58:53.599 ERROR 
> (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr) [   ] 
> o.a.s.c.c.DefaultConnectionStrategy Reconnect to ZooKeeper 
> failed:org.apache.solr.common.cloud.ZooKeeperException: 
>         at org.apache.solr.cloud.ZkController$1.command(ZkController.java:392)
>         at 
> org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:168)
>         at 
> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:57)
>         at 
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:142)
>         at 
> org.apache.solr.common.cloud.SolrZkClient$3.lambda$process$0(SolrZkClient.java:268)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: 
> KeeperErrorCode = NodeExists for /live_nodes/idx6.example.org:8983_solr
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>         at 
> org.apache.solr.common.cloud.SolrZkClient$10.execute(SolrZkClient.java:526)
>         at 
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
>         at 
> org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:523)
>         at 
> org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:466)
>         at 
> org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:453)
>         at 
> org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:430)
>         at 
> org.apache.solr.cloud.ZkController.createEphemeralLiveNode(ZkController.java:823)
>         at 
> org.apache.solr.cloud.ZkController.access$600(ZkController.java:120)
>         at org.apache.solr.cloud.ZkController$1.command(ZkController.java:340)
>         ... 10 more
> 2017-05-06 18:58:53.600 WARN  
> (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr) [   ] 
> o.a.s.c.c.DefaultConnectionStrategy Reconnect to ZooKeeper failed
> 2017-05-06 18:58:57.052 ERROR (qtp1873653341-14807) [   ] 
> o.a.s.h.RequestHandlerBase 
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
> = Session expired for /collections/search/state.json
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
>         at 
> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:356)
>         at 
> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:353)
>         at 
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
>         at 
> org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:353)
>         at 
> org.apache.solr.common.cloud.ZkStateReader.fetchCollectionState(ZkStateReader.java:1110)
>         at 
> org.apache.solr.common.cloud.ZkStateReader.forceUpdateCollection(ZkStateReader.java:321)
>         at 
> org.apache.solr.handler.admin.PrepRecoveryOp.execute(PrepRecoveryOp.java:102)
>         at 
> org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:370)
>         at 
> org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:388)
>         at 
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:174)
>         at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
>         at 
> org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:748)
>         at 
> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:729)
>         at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:510)
> 
> After that we occasionally see:
> 
> 2017-05-06 18:58:59.079 ERROR (qtp1873653341-14989) [   ] 
> o.a.s.s.HttpSolrCall 
> null:org.apache.zookeeper.KeeperException$SessionExpiredException: 
> KeeperErrorCode = Session expired for /collections/search/state.json
> 
> We executed a hard Solr restart to get stuff back up. Is this a known issue?
> 
> Thanks,
> Markus
> 
