Hi Folks,

We are seeing the following in the logs on our Solr nodes, after which the 
nodes go into multiple full GCs and eventually run out of heap. We saw this 
ticket - https://issues.apache.org/jira/browse/SOLR-7338 - and are wondering 
whether that's the one causing it. We are currently on 4.10.0.

INFO  - 2015-06-17 08:06:28.163; 
org.apache.solr.common.cloud.ConnectionManager; Watcher 
org.apache.solr.common.cloud.ConnectionManager@422f41e9 
name:ZooKeeperConnection Watcher:got event WatchedEvent state:Expired type:None 
path:null path:null type:None
INFO  - 2015-06-17 08:06:28.163; 
org.apache.solr.common.cloud.ConnectionManager; Our previous ZooKeeper session 
was expired. Attempting to reconnect to recover relationship with ZooKeeper...
INFO  - 2015-06-17 08:06:28.166; 
org.apache.solr.common.cloud.DefaultConnectionStrategy; Connection expired - 
starting a new one...
INFO  - 2015-06-17 08:06:28.171; 
org.apache.solr.common.cloud.ConnectionManager; Waiting for client to connect 
to ZooKeeper
INFO  - 2015-06-17 08:06:28.177; 
org.apache.solr.common.cloud.ConnectionManager; Watcher 
org.apache.solr.common.cloud.ConnectionManager@422f41e9 
name:ZooKeeperConnection Watcher: got event WatchedEvent state:SyncConnected 
type:None path:null path:null type:None
INFO  - 2015-06-17 08:06:28.177; 
org.apache.solr.common.cloud.ConnectionManager; Client is connected to ZooKeeper
INFO  - 2015-06-17 08:06:28.178; 
org.apache.solr.common.cloud.ConnectionManager$1; Connection with ZooKeeper 
reestablished.
INFO  - 2015-06-17 08:06:28.178; 
org.apache.solr.common.cloud.DefaultConnectionStrategy; Reconnected to ZooKeeper
INFO  - 2015-06-17 08:06:28.179; 
org.apache.solr.common.cloud.ConnectionManager; Connected:true
WARN  - 2015-06-17 08:06:28.179; org.apache.solr.cloud.RecoveryStrategy; 
Stopping recovery for core=category coreNodeName=core_node2
WARN  - 2015-06-17 08:06:28.180; org.apache.solr.cloud.RecoveryStrategy; 
Stopping recovery for core=category_shadow coreNodeName=core_node2
WARN  - 2015-06-17 08:06:28.180; org.apache.solr.cloud.RecoveryStrategy; 
Stopping recovery for core=rules_shadow coreNodeName=core_node2
WARN  - 2015-06-17 08:06:28.180; org.apache.solr.cloud.RecoveryStrategy; 
Stopping recovery for core=rules coreNodeName=core_node2
WARN  - 2015-06-17 08:06:28.180; org.apache.solr.cloud.RecoveryStrategy; 
Stopping recovery for core=catalog_shadow coreNodeName=core_node2
WARN  - 2015-06-17 08:06:28.180; org.apache.solr.cloud.RecoveryStrategy; 
Stopping recovery for core=catalog coreNodeName=core_node2
INFO  - 2015-06-17 08:06:28.180; org.apache.solr.cloud.ZkController; publishing 
core=category state=down collection=category
INFO  - 2015-06-17 08:06:28.180; org.apache.solr.cloud.ZkController; numShards 
not found on descriptor - reading it from system property
INFO  - 2015-06-17 08:06:28.186; org.apache.solr.cloud.ZkController; publishing 
core=category_shadow state=down collection=category_shadow
INFO  - 2015-06-17 08:06:28.186; org.apache.solr.cloud.ZkController; numShards 
not found on descriptor - reading it from system property
INFO  - 2015-06-17 08:06:28.189; org.apache.solr.cloud.ZkController; publishing 
core=rules_shadow state=down collection=rules_shadow
INFO  - 2015-06-17 08:06:28.189; org.apache.solr.cloud.ZkController; numShards 
not found on descriptor - reading it from system property
INFO  - 2015-06-17 08:06:28.191; org.apache.solr.cloud.ZkController; publishing 
core=rules state=down collection=rules
INFO  - 2015-06-17 08:06:28.191; org.apache.solr.cloud.ZkController; numShards 
not found on descriptor - reading it from system property
INFO  - 2015-06-17 08:06:28.193; org.apache.solr.cloud.ZkController; publishing 
core=catalog_shadow state=down collection=catalog_shadow
INFO  - 2015-06-17 08:06:28.193; org.apache.solr.cloud.ZkController; numShards 
not found on descriptor - reading it from system property
INFO  - 2015-06-17 08:06:28.194; org.apache.solr.cloud.ZkController; publishing 
core=catalog state=down collection=catalog
INFO  - 2015-06-17 08:06:28.194; org.apache.solr.cloud.ZkController; numShards 
not found on descriptor - reading it from system property
INFO  - 2015-06-17 08:06:28.198; org.apache.solr.cloud.ZkController; Replica 
core_node2 NOT in leader-initiated recovery, need to wait for leader to see 
down state.
WARN  - 2015-06-17 08:07:51.188; org.apache.solr.cloud.ZkController;
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections/rules_shadow/leader_elect/shard1/election
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
        at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:290)
        at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:287)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:74)
        at org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:287)
        at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:363)
        at org.apache.solr.cloud.ZkController.access$000(ZkController.java:89)
        at org.apache.solr.cloud.ZkController$1.command(ZkController.java:237)
        at org.apache.solr.common.cloud.ConnectionManager$1$1.run(ConnectionManager.java:166)
ERROR - 2015-06-17 08:07:51.190; org.apache.solr.common.SolrException; There was a problem finding the leader in zk:java.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:503)
        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1153)
        at org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:307)
        at org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:304)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:74)
        at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:304)
        at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:928)
        at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:914)
        at org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1514)
        at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:386)
        at org.apache.solr.cloud.ZkController.access$000(ZkController.java:89)
        at org.apache.solr.cloud.ZkController$1.command(ZkController.java:237)
        at org.apache.solr.common.cloud.ConnectionManager$1$1.run(ConnectionManager.java:166)

INFO  - 2015-06-17 08:07:51.220; org.apache.solr.cloud.ZkController; Replica 
core_node2 NOT in leader-initiated recovery, need to wait for leader to see 
down state.
INFO  - 2015-06-17 08:07:51.240; org.apache.solr.cloud.ZkController; Replica 
core_node2 NOT in leader-initiated recovery, need to wait for leader to see 
down state.
INFO  - 2015-06-17 08:07:51.258; org.apache.solr.cloud.ZkController; Replica 
core_node2 NOT in leader-initiated recovery, need to wait for leader to see 
down state.
INFO  - 2015-06-17 08:07:51.274; org.apache.solr.cloud.ZkController; Replica 
core_node2 NOT in leader-initiated recovery, need to wait for leader to see 
down state.
INFO  - 2015-06-17 08:07:51.284; org.apache.solr.cloud.ElectionContext; 
canceling election 
/overseer_elect/election/93424944611198761-<<<>>>>:8080_solr-n_0000000286


Any pointers here?

Thanks,
Sunil
