Folks, any suggestions here?

On Thu, Mar 1, 2018 at 12:28 PM, Aaryan Reddy <aaryanreddy...@gmail.com> wrote:
> Hello All,
>
> I am running into a frequent issue where the leader shard in SolrCloud stays active but is not acknowledged as "leader". This brings down the other replicas, as they go into recovery mode and eventually fail trying to sync up.
>
> The error seen in solr.log is below. It is also similar to what is shared in this email thread: https://www.mail-archive.com/solr-user@lucene.apache.org/msg127969.html
>
> This has consumed a lot of time, but I have not been able to get any direction here. Any help will be appreciated.
>
> Solr version used: 5.5.2 (comes packaged with HDP 2.5.3)
> The indexes are being stored on HDFS.
>
> ==error==
>
>> completed with http://node06.test.net:8984/solr/TEST_COLLECTION2_shard5_replica1/
>> 2018-02-21 20:41:10.148 INFO (zkCallback-5-thread-4294-processing-n:node04.test.net:8984_solr) [c:TEST_COLLECTION2 s:shard5 r:core_node16 x:TEST_COLLECTION2_shard5_replica2] o.a.s.c.SyncStrategy http://node04.test.net:8984/solr/TEST_COLLECTION2_shard5_replica2/: sync completed with http://node17.test.net:8984/solr/TEST_COLLECTION2_shard5_replica3/
>> 2018-02-21 20:41:10.149 INFO (zkCallback-5-thread-4294-processing-n:node04.test.net:8984_solr) [c:TEST_COLLECTION2 s:shard5 r:core_node16 x:TEST_COLLECTION2_shard5_replica2] o.a.s.c.ShardLeaderElectionContextBase Creating leader registration node /collections/TEST_COLLECTION2/leaders/shard5/leader after winning as /collections/TEST_COLLECTION2/leader_elect/shard5/election/171270658970051676-core_node16-n_0000001784
>> 2018-02-21 20:41:10.151 INFO (zkCallback-5-thread-4294-processing-n:node04.test.net:8984_solr) [c:TEST_COLLECTION2 s:shard5 r:core_node16 x:TEST_COLLECTION2_shard5_replica2] o.a.s.c.u.RetryUtil Retry due to Throwable, org.apache.zookeeper.KeeperException$NodeExistsException KeeperErrorCode = NodeExists
>> 2018-02-21 20:41:10.498 ERROR (recoveryExecutor-3-thread-55-processing-s:shard10 x:TEST_COLLECTION_shard10_replica3 c:TEST_COLLECTION n:node04.test.net:8984_solr r:core_node59) [c:TEST_COLLECTION s:shard10 r:core_node59 x:TEST_COLLECTION_shard10_replica3] o.a.s.c.RecoveryStrategy Error while trying to recover. core=TEST_COLLECTION_shard10_replica3:org.apache.solr.common.SolrException: No registered leader was found after waiting for 4000ms , collection: TEST_COLLECTION slice: shard10
>>         at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:626)
>>         at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:612)
>>         at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:306)
>>         at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>         at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:231)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:744)
>> 2018-02-21 20:41:10.498 INFO (recoveryExecutor-3-thread-55-processing-s:shard10 x:TEST_COLLECTION_shard10_replica3 c:TEST_COLLECTION n:node04.test.net:8984_solr r:core_node59) [c:TEST_COLLECTION s:shard10 r:core_node59 x:TEST_COLLECTION_shard10_replica3] o.a.s.c.RecoveryStrategy Replay not started, or was not successful... still buffering updates.
>> 2018-02-21 20:41:10.498 ERROR (recoveryExecutor-3-thread-55-processing-s:shard10 x:TEST_COLLECTION_shard10_replica3 c:TEST_COLLECTION n:node04.test.net:8984_solr r:core_node59) [c:TEST_COLLECTION s:shard10 r:core_node59 x:TEST_COLLECTION_shard10_replica3] o.a.s.c.RecoveryStrategy Recovery failed - trying again... (0)
>> 2018-02-21 20:41:10.498 INFO (recoveryExecutor-3-thread-55-processing-s:shard10 x:TEST_COLLECTION_shard10_replica3 c:TEST_COLLECTION n:node04.test.net:8984_solr r:core_node59) [c:TEST_COLLECTION s:shard10 r:core_node59 x:TEST_COLLECTION_shard10_replica3] o.a.s.c.RecoveryStrategy Wait [2.0] seconds before trying to recover again (attempt=1)
>> 2018-02-21 20:41:10.928 INFO (zkCallback-5-thread-4295-processing-n:node04.test.net:8984_solr) [ ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/TEST_COLLECTION3/state.json] for collection [TEST_COLLECTION3] has occurred - updating... (live nodes size: [17])
>> 2018-02-21 20:41:10.928 INFO (zkCallback-5-thread-4293-processing-n:node04.test.net:8984_solr) [ ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/TEST_COLLECTION3/state.json] for collection [TEST_COLLECTION3] has occurred - updating... (live nodes size: [17])
>
> Thank You,
> Aaryan
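
One option I have been looking at, and would appreciate a sanity check on: the FORCELEADER Collections API (added in Solr 5.4) is documented as a last-resort tool for exactly this "No registered leader was found" state. Below is a minimal SolrJ sketch of that call; the base URL, collection, and shard names are taken from the logs above, and I have not tested this against our cluster:

    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public class ForceLeaderSketch {
        public static void main(String[] args) throws Exception {
            // Any live node can accept Collections API calls; node04 is
            // taken from the log excerpt above (an assumption, not a requirement).
            try (HttpSolrClient client =
                     new HttpSolrClient("http://node04.test.net:8984/solr")) {
                ModifiableSolrParams params = new ModifiableSolrParams();
                params.set("action", "FORCELEADER");           // Collections API, Solr 5.4+
                params.set("collection", "TEST_COLLECTION2");  // collection from the logs
                params.set("shard", "shard5");                 // shard stuck without a leader
                QueryRequest req = new QueryRequest(params);
                req.setPath("/admin/collections");             // route to the Collections API
                System.out.println(client.request(req));       // print the API response
            }
        }
    }

The same request can of course be issued as a plain GET against /solr/admin/collections from any live node. Before running it, I would first check in the ZooKeeper tree whether /collections/TEST_COLLECTION2/leaders/shard5/leader is present but stale, since the NodeExists retry logged by ShardLeaderElectionContextBase above hints at a leftover leader registration node.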