More info - it looks like a ZooKeeper node got deleted somehow. The
error was NoNode for
/collections/UNCLASS_30DAYS/leaders/shard31/leader
I then created that node using solr zk mkroot, and now I get the error:
org.apache.solr.common.SolrException: Error getting leader from zk for shard shard31
    at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:1299)
    at org.apache.solr.cloud.ZkController.register(ZkController.java:1150)
    at org.apache.solr.cloud.ZkController.register(ZkController.java:1081)
    at org.apache.solr.core.ZkContainer.lambda$registerInZk$0(ZkContainer.java:187)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.common.SolrException: Could not get leader props
    at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1346)
    at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1310)
    at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:1266)
    ... 7 more
Caused by: java.lang.NullPointerException
    at org.apache.solr.common.util.Utils.fromJSON(Utils.java:239)
    at org.apache.solr.common.cloud.ZkNodeProps.load(ZkNodeProps.java:92)
    at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1328)
    ... 9 more
Can I manually enter information for the leader? How would I get that?
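One reading of the trace above: the NullPointerException in Utils.fromJSON is consistent with the hand-made znode holding no data at all. Below is a minimal sketch of the JSON such a node might carry. It assumes the property names core, core_node_name, base_url and node_name, and uses values copied from the log lines in this thread. Which replica should actually lead shard31 is not known here, so replica_n182 is only an illustration. The leader znode is normally ephemeral and written by the election itself, so a hand-written one may be ignored or overwritten. The helper parse_leader_props is hypothetical and only mimics the failing parse step.

```python
import json

# Values copied from the log lines in this thread. The choice of
# replica_n182 as leader is an ASSUMPTION made purely for illustration.
leader_props = {
    "core": "UNCLASS_30DAYS_shard31_replica_n182",
    "core_node_name": "core_node185",
    "base_url": "http://elara:9100/solr",
    "node_name": "elara:9100_solr",
}


def parse_leader_props(data):
    """Hypothetical stand-in for the parse step: empty znode data fails."""
    if not data:
        raise ValueError("leader znode has no data")
    return json.loads(data.decode("utf-8"))


# Bytes that would have to be stored in the znode for the parse to succeed.
payload = json.dumps(leader_props).encode("utf-8")
print(parse_leader_props(payload)["base_url"])
```

Passing None or empty bytes to parse_leader_props raises an error, which mirrors what a node created with no data would do to the real parser.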
-Joe
On 5/30/2019 8:39 AM, Joe Obernberger wrote:
Hi All - I have a 40-node cluster that has been running great for a
long while, but it all came down due to an OOM. I adjusted the
parameters and restarted, but one shard with 3 replicas (all NRT) will
not elect a leader. I see messages like:
2019-05-30 12:35:30.597 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.c.SyncStrategy Sync replicas to http://elara:9100/solr/UNCLASS_30DAYS_shard31_replica_n182/
2019-05-30 12:35:30.597 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.u.PeerSync PeerSync: core=UNCLASS_30DAYS_shard31_replica_n182 url=http://elara:9100/solr START replicas=[http://enceladus:9100/solr/UNCLASS_30DAYS_shard31_replica_n180/, http://rosalind:9100/solr/UNCLASS_30DAYS_shard31_replica_n184/] nUpdates=100
2019-05-30 12:35:30.651 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.u.PeerSync PeerSync: core=UNCLASS_30DAYS_shard31_replica_n182 url=http://elara:9100/solr Received 100 versions from http://enceladus:9100/solr/UNCLASS_30DAYS_shard31_replica_n180/ fingerprint:null
2019-05-30 12:35:30.652 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.u.PeerSync PeerSync: core=UNCLASS_30DAYS_shard31_replica_n182 url=http://elara:9100/solr Our versions are too old. ourHighThreshold=1634891841359839232 otherLowThreshold=1634892098551414784 ourHighest=1634892003501146112 otherHighest=1634892708023631872
2019-05-30 12:35:30.652 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.u.PeerSync PeerSync: core=UNCLASS_30DAYS_shard31_replica_n182 url=http://elara:9100/solr DONE. sync failed
2019-05-30 12:35:30.652 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.c.SyncStrategy Leader's attempt to sync with shard failed, moving to the next candidate
2019-05-30 12:35:30.683 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.c.ShardLeaderElectionContext There may be a better leader candidate than us - going back into recovery
2019-05-30 12:35:30.693 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.c.ShardLeaderElectionContextBase No version found for ephemeral leader parent node, won't remove previous leader registration.
2019-05-30 12:35:30.694 WARN (updateExecutor-3-thread-4-processing-n:elara:9100_solr x:UNCLASS_30DAYS_shard31_replica_n182 c:UNCLASS_30DAYS s:shard31 r:core_node185) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.c.RecoveryStrategy Stopping recovery for core=[UNCLASS_30DAYS_shard31_replica_n182] coreNodeName=[core_node185]
and
2019-05-30 12:25:39.522 INFO (zkCallback-7-thread-1) [c:UNCLASS_30DAYS s:shard31 r:core_node187 x:UNCLASS_30DAYS_shard31_replica_n184] o.a.s.c.ActionThrottle Throttling leader attempts - waiting for 136ms
2019-05-30 12:25:39.672 INFO (zkCallback-7-thread-1) [c:UNCLASS_30DAYS s:shard31 r:core_node187 x:UNCLASS_30DAYS_shard31_replica_n184] o.a.s.c.ShardLeaderElectionContext Can't become leader, other replicas with higher term participated in leader election
2019-05-30 12:25:39.672 INFO (zkCallback-7-thread-1) [c:UNCLASS_30DAYS s:shard31 r:core_node187 x:UNCLASS_30DAYS_shard31_replica_n184] o.a.s.c.ShardLeaderElectionContext There may be a better leader candidate than us - going back into recovery
2019-05-30 12:25:39.677 INFO (zkCallback-7-thread-1) [c:UNCLASS_30DAYS s:shard31 r:core_node187 x:UNCLASS_30DAYS_shard31_replica_n184] o.a.s.c.ShardLeaderElectionContextBase No version found for ephemeral leader parent node, won't remove previous leader registration.
and
2019-05-30 12:26:39.820 INFO (zkCallback-7-thread-5) [c:UNCLASS_30DAYS s:shard31 r:core_node183 x:UNCLASS_30DAYS_shard31_replica_n180] o.a.s.c.ShardLeaderElectionContext Can't become leader, other replicas with higher term participated in leader election
2019-05-30 12:26:39.820 INFO (zkCallback-7-thread-5) [c:UNCLASS_30DAYS s:shard31 r:core_node183 x:UNCLASS_30DAYS_shard31_replica_n180] o.a.s.c.ShardLeaderElectionContext There may be a better leader candidate than us - going back into recovery
2019-05-30 12:26:39.826 INFO (zkCallback-7-thread-5) [c:UNCLASS_30DAYS s:shard31 r:core_node183 x:UNCLASS_30DAYS_shard31_replica_n180] o.a.s.c.ShardLeaderElectionContextBase No version found for ephemeral leader parent node, won't remove previous leader registration.
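To make the "Our versions are too old" message in the first excerpt concrete, here is a small sketch that replays the comparison with the numbers from that log line. It assumes PeerSync gives up when our high threshold is below the other replica's low threshold, and that a Solr version is an epoch-millisecond clock shifted left by 20 bits. Both function names are hypothetical and this is not Solr's own code.

```python
# Numbers copied from the "Our versions are too old" log line.
our_high_threshold = 1634891841359839232
other_low_threshold = 1634892098551414784

# ASSUMPTION: a version is epoch milliseconds shifted left by 20 bits,
# so shifting right recovers an approximate wall-clock difference.
VERSION_CLOCK_SHIFT = 20


def versions_too_old(our_high, other_low):
    """True when our update window ends before the other replica's begins."""
    return our_high < other_low


def version_gap_ms(our_high, other_low):
    """Approximate gap between the two windows, in milliseconds."""
    return (other_low - our_high) >> VERSION_CLOCK_SHIFT


print(versions_too_old(our_high_threshold, other_low_threshold))
print(version_gap_ms(our_high_threshold, other_low_threshold))
```

Under those assumptions the two update windows do not overlap, and replica_n182 is missing roughly four minutes of updates that replica_n180 has. That would explain why replica_n182 cannot sync and steps back into recovery.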
I've tried FORCELEADER, but it had no effect. I also tried adding a
shard, but that one didn't come up either. The index is on HDFS.
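For anyone retracing the FORCELEADER attempt, the sketch below builds the Collections API URL for this collection and shard without sending anything. The exact invocation used is not given in this message. The host elara:9100 is taken from the logs, on the assumption that any live Solr node could receive the request, and the helper name force_leader_url is hypothetical.

```python
from urllib.parse import urlencode


def force_leader_url(base_url, collection, shard):
    """Build a Collections API FORCELEADER request URL (nothing is sent)."""
    query = urlencode({
        "action": "FORCELEADER",
        "collection": collection,
        "shard": shard,
    })
    return f"{base_url}/admin/collections?{query}"


# Host taken from the logs in this thread; collection and shard as reported.
print(force_leader_url("http://elara:9100/solr", "UNCLASS_30DAYS", "shard31"))
```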
Help!
-Joe