Hi Folks,

We are seeing the following in the logs on our Solr nodes, after which the nodes go through multiple full GCs and eventually run out of heap. We saw this ticket, https://issues.apache.org/jira/browse/SOLR-7338, and are wondering whether that is the one causing it. We are currently on 4.10.0.

INFO - 2015-06-17 08:06:28.163; org.apache.solr.common.cloud.ConnectionManager; Watcher org.apache.solr.common.cloud.ConnectionManager@422f41e9 name:ZooKeeperConnection Watcher:got event WatchedEvent state:Expired type:None path:null path:null type:None
INFO - 2015-06-17 08:06:28.163; org.apache.solr.common.cloud.ConnectionManager; Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...
INFO - 2015-06-17 08:06:28.166; org.apache.solr.common.cloud.DefaultConnectionStrategy; Connection expired - starting a new one...
INFO - 2015-06-17 08:06:28.171; org.apache.solr.common.cloud.ConnectionManager; Waiting for client to connect to ZooKeeper
INFO - 2015-06-17 08:06:28.177; org.apache.solr.common.cloud.ConnectionManager; Watcher org.apache.solr.common.cloud.ConnectionManager@422f41e9 name:ZooKeeperConnection Watcher: got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
INFO - 2015-06-17 08:06:28.177; org.apache.solr.common.cloud.ConnectionManager; Client is connected to ZooKeeper
INFO - 2015-06-17 08:06:28.178; org.apache.solr.common.cloud.ConnectionManager$1; Connection with ZooKeeper reestablished.
INFO - 2015-06-17 08:06:28.178; org.apache.solr.common.cloud.DefaultConnectionStrategy; Reconnected to ZooKeeper
INFO - 2015-06-17 08:06:28.179; org.apache.solr.common.cloud.ConnectionManager; Connected:true
WARN - 2015-06-17 08:06:28.179; org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for core=category coreNodeName=core_node2
WARN - 2015-06-17 08:06:28.180; org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for core=category_shadow coreNodeName=core_node2
WARN - 2015-06-17 08:06:28.180; org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for core=rules_shadow coreNodeName=core_node2
WARN - 2015-06-17 08:06:28.180; org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for core=rules coreNodeName=core_node2
WARN - 2015-06-17 08:06:28.180; org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for core=catalog_shadow coreNodeName=core_node2
WARN - 2015-06-17 08:06:28.180; org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for core=catalog coreNodeName=core_node2
INFO - 2015-06-17 08:06:28.180; org.apache.solr.cloud.ZkController; publishing core=category state=down collection=category
INFO - 2015-06-17 08:06:28.180; org.apache.solr.cloud.ZkController; numShards not found on descriptor - reading it from system property
INFO - 2015-06-17 08:06:28.186; org.apache.solr.cloud.ZkController; publishing core=category_shadow state=down collection=category_shadow
INFO - 2015-06-17 08:06:28.186; org.apache.solr.cloud.ZkController; numShards not found on descriptor - reading it from system property
INFO - 2015-06-17 08:06:28.189; org.apache.solr.cloud.ZkController; publishing core=rules_shadow state=down collection=rules_shadow
INFO - 2015-06-17 08:06:28.189; org.apache.solr.cloud.ZkController; numShards not found on descriptor - reading it from system property
INFO - 2015-06-17 08:06:28.191; org.apache.solr.cloud.ZkController; publishing core=rules state=down collection=rules
INFO - 2015-06-17 08:06:28.191; org.apache.solr.cloud.ZkController; numShards not found on descriptor - reading it from system property
INFO - 2015-06-17 08:06:28.193; org.apache.solr.cloud.ZkController; publishing core=catalog_shadow state=down collection=catalog_shadow
INFO - 2015-06-17 08:06:28.193; org.apache.solr.cloud.ZkController; numShards not found on descriptor - reading it from system property
INFO - 2015-06-17 08:06:28.194; org.apache.solr.cloud.ZkController; publishing core=catalog state=down collection=catalog
INFO - 2015-06-17 08:06:28.194; org.apache.solr.cloud.ZkController; numShards not found on descriptor - reading it from system property
INFO - 2015-06-17 08:06:28.198; org.apache.solr.cloud.ZkController; Replica core_node2 NOT in leader-initiated recovery, need to wait for leader to see down state.
WARN - 2015-06-17 08:07:51.188; org.apache.solr.cloud.ZkController; org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections/rules_shadow/leader_elect/shard1/election
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
    at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:290)
    at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:287)
    at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:74)
    at org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:287)
    at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:363)
    at org.apache.solr.cloud.ZkController.access$000(ZkController.java:89)
    at org.apache.solr.cloud.ZkController$1.command(ZkController.java:237)
    at org.apache.solr.common.cloud.ConnectionManager$1$1.run(ConnectionManager.java:166)
ERROR - 2015-06-17 08:07:51.190; org.apache.solr.common.SolrException; There was a problem finding the leader in zk:java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:503)
    at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1153)
    at org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:307)
    at org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:304)
    at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:74)
    at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:304)
    at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:928)
    at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:914)
    at org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1514)
    at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:386)
    at org.apache.solr.cloud.ZkController.access$000(ZkController.java:89)
    at org.apache.solr.cloud.ZkController$1.command(ZkController.java:237)
    at org.apache.solr.common.cloud.ConnectionManager$1$1.run(ConnectionManager.java:166)
INFO - 2015-06-17 08:07:51.220; org.apache.solr.cloud.ZkController; Replica core_node2 NOT in leader-initiated recovery, need to wait for leader to see down state.
INFO - 2015-06-17 08:07:51.240; org.apache.solr.cloud.ZkController; Replica core_node2 NOT in leader-initiated recovery, need to wait for leader to see down state.
INFO - 2015-06-17 08:07:51.258; org.apache.solr.cloud.ZkController; Replica core_node2 NOT in leader-initiated recovery, need to wait for leader to see down state.
INFO - 2015-06-17 08:07:51.274; org.apache.solr.cloud.ZkController; Replica core_node2 NOT in leader-initiated recovery, need to wait for leader to see down state.
INFO - 2015-06-17 08:07:51.284; org.apache.solr.cloud.ElectionContext; canceling election /overseer_elect/election/93424944611198761-<<<>>>>:8080_solr-n_0000000286

Any pointers here?

Thanks,
Sunil