More info - it looks like a ZooKeeper node got deleted somehow. I get NoNode for:
/collections/UNCLASS_30DAYS/leaders/shard31/leader

I then made that node using solr zk mkroot, and now I get the error:

org.apache.solr.common.SolrException: Error getting leader from zk for shard shard31
    at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:1299)
    at org.apache.solr.cloud.ZkController.register(ZkController.java:1150)
    at org.apache.solr.cloud.ZkController.register(ZkController.java:1081)
    at org.apache.solr.core.ZkContainer.lambda$registerInZk$0(ZkContainer.java:187)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.common.SolrException: Could not get leader props
    at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1346)
    at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1310)
    at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:1266)
    ... 7 more
Caused by: java.lang.NullPointerException
    at org.apache.solr.common.util.Utils.fromJSON(Utils.java:239)
    at org.apache.solr.common.cloud.ZkNodeProps.load(ZkNodeProps.java:92)
    at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1328)
    ... 9 more

Can I manually enter information for the leader? How would I get that?

-Joe

On 5/30/2019 8:39 AM, Joe Obernberger wrote:
Hi All - I have a 40 node cluster that has been running great for a long while, but it all came down due to OOM.  I adjusted the parameters and restarted, but one shard with 3 replicas (all NRT) will not elect a leader.  I see messages like:

2019-05-30 12:35:30.597 INFO  (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.c.SyncStrategy Sync replicas to http://elara:9100/solr/UNCLASS_30DAYS_shard31_replica_n182/
2019-05-30 12:35:30.597 INFO  (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.u.PeerSync PeerSync: core=UNCLASS_30DAYS_shard31_replica_n182 url=http://elara:9100/solr START replicas=[http://enceladus:9100/solr/UNCLASS_30DAYS_shard31_replica_n180/, http://rosalind:9100/solr/UNCLASS_30DAYS_shard31_replica_n184/] nUpdates=100
2019-05-30 12:35:30.651 INFO  (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.u.PeerSync PeerSync: core=UNCLASS_30DAYS_shard31_replica_n182 url=http://elara:9100/solr  Received 100 versions from http://enceladus:9100/solr/UNCLASS_30DAYS_shard31_replica_n180/ fingerprint:null
2019-05-30 12:35:30.652 INFO  (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.u.PeerSync PeerSync: core=UNCLASS_30DAYS_shard31_replica_n182 url=http://elara:9100/solr  Our versions are too old. ourHighThreshold=1634891841359839232 otherLowThreshold=1634892098551414784 ourHighest=1634892003501146112 otherHighest=1634892708023631872
2019-05-30 12:35:30.652 INFO  (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.u.PeerSync PeerSync: core=UNCLASS_30DAYS_shard31_replica_n182 url=http://elara:9100/solr DONE. sync failed
2019-05-30 12:35:30.652 INFO  (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.c.SyncStrategy Leader's attempt to sync with shard failed, moving to the next candidate
2019-05-30 12:35:30.683 INFO  (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.c.ShardLeaderElectionContext There may be a better leader candidate than us - going back into recovery
2019-05-30 12:35:30.693 INFO  (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.c.ShardLeaderElectionContextBase No version found for ephemeral leader parent node, won't remove previous leader registration.
2019-05-30 12:35:30.694 WARN (updateExecutor-3-thread-4-processing-n:elara:9100_solr x:UNCLASS_30DAYS_shard31_replica_n182 c:UNCLASS_30DAYS s:shard31 r:core_node185) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.c.RecoveryStrategy Stopping recovery for core=[UNCLASS_30DAYS_shard31_replica_n182] coreNodeName=[core_node185]
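
For reference, the "Our versions are too old" message in the log above can be checked by hand. The sketch below is only an illustration, not Solr code: the helper names parse_thresholds and versions_too_old are made up here, and the comparison is an assumption about what PeerSync tests before it prints that message (this replica's high threshold falling below the other replica's low threshold).

```python
import re

# Message text copied from the PeerSync log line above
# (timestamp and logging context trimmed).
LINE = (
    "Our versions are too old. "
    "ourHighThreshold=1634891841359839232 "
    "otherLowThreshold=1634892098551414784 "
    "ourHighest=1634892003501146112 "
    "otherHighest=1634892708023631872"
)

FIELDS = ("ourHighThreshold", "otherLowThreshold", "ourHighest", "otherHighest")


def parse_thresholds(line):
    """Hypothetical helper: pull the four version numbers out of the log line."""
    pattern = r"\b(" + "|".join(FIELDS) + r")=(\d+)"
    return {name: int(value) for name, value in re.findall(pattern, line)}


def versions_too_old(t):
    """Assumed condition: our recent-version window ends before the other's begins."""
    return t["ourHighThreshold"] < t["otherLowThreshold"]


if __name__ == "__main__":
    t = parse_thresholds(LINE)
    print("too old:", versions_too_old(t))
    print("behind by (version units):", t["otherHighest"] - t["ourHighest"])
```

With the numbers from this log, the check holds: replica n182's newest updates are older than the oldest updates in the window reported by n180, which matches the "sync failed" outcome shown above.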

and

2019-05-30 12:25:39.522 INFO  (zkCallback-7-thread-1) [c:UNCLASS_30DAYS s:shard31 r:core_node187 x:UNCLASS_30DAYS_shard31_replica_n184] o.a.s.c.ActionThrottle Throttling leader attempts - waiting for 136ms
2019-05-30 12:25:39.672 INFO  (zkCallback-7-thread-1) [c:UNCLASS_30DAYS s:shard31 r:core_node187 x:UNCLASS_30DAYS_shard31_replica_n184] o.a.s.c.ShardLeaderElectionContext Can't become leader, other replicas with higher term participated in leader election
2019-05-30 12:25:39.672 INFO  (zkCallback-7-thread-1) [c:UNCLASS_30DAYS s:shard31 r:core_node187 x:UNCLASS_30DAYS_shard31_replica_n184] o.a.s.c.ShardLeaderElectionContext There may be a better leader candidate than us - going back into recovery
2019-05-30 12:25:39.677 INFO  (zkCallback-7-thread-1) [c:UNCLASS_30DAYS s:shard31 r:core_node187 x:UNCLASS_30DAYS_shard31_replica_n184] o.a.s.c.ShardLeaderElectionContextBase No version found for ephemeral leader parent node, won't remove previous leader registration.

and

2019-05-30 12:26:39.820 INFO  (zkCallback-7-thread-5) [c:UNCLASS_30DAYS s:shard31 r:core_node183 x:UNCLASS_30DAYS_shard31_replica_n180] o.a.s.c.ShardLeaderElectionContext Can't become leader, other replicas with higher term participated in leader election
2019-05-30 12:26:39.820 INFO  (zkCallback-7-thread-5) [c:UNCLASS_30DAYS s:shard31 r:core_node183 x:UNCLASS_30DAYS_shard31_replica_n180] o.a.s.c.ShardLeaderElectionContext There may be a better leader candidate than us - going back into recovery
2019-05-30 12:26:39.826 INFO  (zkCallback-7-thread-5) [c:UNCLASS_30DAYS s:shard31 r:core_node183 x:UNCLASS_30DAYS_shard31_replica_n180] o.a.s.c.ShardLeaderElectionContextBase No version found for ephemeral leader parent node, won't remove previous leader registration.

I've tried FORCELEADER, but it had no effect.  I also tried adding a shard, but that one didn't come up either.  The index is on HDFS.

Help!

-Joe
