Folks, any suggestions here?

On Thu, Mar 1, 2018 at 12:28 PM, Aaryan Reddy <aaryanreddy...@gmail.com> wrote:
> Hello All,
>
> I am running into a frequent issue where the leader shard in SolrCloud stays active but is not acknowledged as "leader". This brings down the other replicas, as they go into recovery mode and eventually fail trying to sync up.
>
> The error seen in solr.log is below. It is also similar to what is shared in this email thread: https://www.mail-archive.com/solr-user@lucene.apache.org/msg127969.html
>
> This has consumed a lot of time, but I have not been able to get any direction here. Any help will be appreciated.
>
> Solr version used: 5.5.2 (comes packaged with HDP 2.5.3)
> The indexes are being stored on HDFS.
>
> ==error==
>
>> completed with http://node06.test.net:8984/solr/TEST_COLLECTION2_shard5_replica1/
>> 2018-02-21 20:41:10.148 INFO (zkCallback-5-thread-4294-processing-n:node04.test.net:8984_solr) [c:TEST_COLLECTION2 s:shard5 r:core_node16 x:TEST_COLLECTION2_shard5_replica2] o.a.s.c.SyncStrategy http://node04.test.net:8984/solr/TEST_COLLECTION2_shard5_replica2/: sync completed with http://node17.test.net:8984/solr/TEST_COLLECTION2_shard5_replica3/
>> 2018-02-21 20:41:10.149 INFO (zkCallback-5-thread-4294-processing-n:node04.test.net:8984_solr) [c:TEST_COLLECTION2 s:shard5 r:core_node16 x:TEST_COLLECTION2_shard5_replica2] o.a.s.c.ShardLeaderElectionContextBase Creating leader registration node /collections/TEST_COLLECTION2/leaders/shard5/leader after winning as /collections/TEST_COLLECTION2/leader_elect/shard5/election/171270658970051676-core_node16-n_0000001784
>> 2018-02-21 20:41:10.151 INFO (zkCallback-5-thread-4294-processing-n:node04.test.net:8984_solr) [c:TEST_COLLECTION2 s:shard5 r:core_node16 x:TEST_COLLECTION2_shard5_replica2] o.a.s.c.u.RetryUtil Retry due to Throwable, org.apache.zookeeper.KeeperException$NodeExistsException KeeperErrorCode = NodeExists
>> 2018-02-21 20:41:10.498 ERROR (recoveryExecutor-3-thread-55-processing-s:shard10 x:TEST_COLLECTION_shard10_replica3 c:TEST_COLLECTION n:node04.test.net:8984_solr r:core_node59) [c:TEST_COLLECTION s:shard10 r:core_node59 x:TEST_COLLECTION_shard10_replica3] o.a.s.c.RecoveryStrategy Error while trying to recover. core=TEST_COLLECTION_shard10_replica3:org.apache.solr.common.SolrException: No registered leader was found after waiting for 4000ms , collection: TEST_COLLECTION slice: shard10
>>         at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:626)
>>         at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:612)
>>         at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:306)
>>         at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>         at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:231)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:744)
>> 2018-02-21 20:41:10.498 INFO (recoveryExecutor-3-thread-55-processing-s:shard10 x:TEST_COLLECTION_shard10_replica3 c:TEST_COLLECTION n:node04.test.net:8984_solr r:core_node59) [c:TEST_COLLECTION s:shard10 r:core_node59 x:TEST_COLLECTION_shard10_replica3] o.a.s.c.RecoveryStrategy Replay not started, or was not successful... still buffering updates.
>> 2018-02-21 20:41:10.498 ERROR (recoveryExecutor-3-thread-55-processing-s:shard10 x:TEST_COLLECTION_shard10_replica3 c:TEST_COLLECTION n:node04.test.net:8984_solr r:core_node59) [c:TEST_COLLECTION s:shard10 r:core_node59 x:TEST_COLLECTION_shard10_replica3] o.a.s.c.RecoveryStrategy Recovery failed - trying again... (0)
>> 2018-02-21 20:41:10.498 INFO (recoveryExecutor-3-thread-55-processing-s:shard10 x:TEST_COLLECTION_shard10_replica3 c:TEST_COLLECTION n:node04.test.net:8984_solr r:core_node59) [c:TEST_COLLECTION s:shard10 r:core_node59 x:TEST_COLLECTION_shard10_replica3] o.a.s.c.RecoveryStrategy Wait [2.0] seconds before trying to recover again (attempt=1)
>> 2018-02-21 20:41:10.928 INFO (zkCallback-5-thread-4295-processing-n:node04.test.net:8984_solr) [ ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/TEST_COLLECTION3/state.json] for collection [TEST_COLLECTION3] has occurred - updating... (live nodes size: [17])
>> 2018-02-21 20:41:10.928 INFO (zkCallback-5-thread-4293-processing-n:node04.test.net:8984_solr) [ ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/TEST_COLLECTION3/state.json] for collection [TEST_COLLECTION3] has occurred - updating... (live nodes size: [17])
>
> Thank You,
> Aaryan
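
One option I have been looking at, and would appreciate a sanity check on: the FORCELEADER Collections API (added in Solr 5.4) is documented as a last-resort tool for exactly this "No registered leader was found" state. Below is a minimal SolrJ sketch of that call; the base URL, collection, and shard names are taken from the logs above, and I have not tested this against our cluster:

    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public class ForceLeaderSketch {
        public static void main(String[] args) throws Exception {
            // Any live node can accept Collections API calls; node04 is
            // taken from the log excerpt above (an assumption, not a requirement).
            try (HttpSolrClient client =
                     new HttpSolrClient("http://node04.test.net:8984/solr")) {
                ModifiableSolrParams params = new ModifiableSolrParams();
                params.set("action", "FORCELEADER");           // Collections API, Solr 5.4+
                params.set("collection", "TEST_COLLECTION2");  // collection from the logs
                params.set("shard", "shard5");                 // shard stuck without a leader
                QueryRequest req = new QueryRequest(params);
                req.setPath("/admin/collections");             // route to the Collections API
                System.out.println(client.request(req));       // print the API response
            }
        }
    }

The same request can of course be issued as a plain GET against /solr/admin/collections from any live node. Before running it, I would first check in the ZooKeeper tree whether /collections/TEST_COLLECTION2/leaders/shard5/leader is present but stale, since the NodeExists retry logged by ShardLeaderElectionContextBase above hints at a leftover leader registration node.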