Hi All - I have a 40-node cluster that has been running well for a long
while, but the whole cluster went down due to OOM errors. I adjusted the
parameters and restarted, but one shard with 3 replicas (all NRT) will not
elect a leader. I see messages like:
2019-05-30 12:35:30.597 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS
s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182]
o.a.s.c.SyncStrategy Sync replicas to
http://elara:9100/solr/UNCLASS_30DAYS_shard31_replica_n182/
2019-05-30 12:35:30.597 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS
s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182]
o.a.s.u.PeerSync PeerSync: core=UNCLASS_30DAYS_shard31_replica_n182
url=http://elara:9100/solr START
replicas=[http://enceladus:9100/solr/UNCLASS_30DAYS_shard31_replica_n180/,
http://rosalind:9100/solr/UNCLASS_30DAYS_shard31_replica_n184/] nUpdates=100
2019-05-30 12:35:30.651 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS
s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182]
o.a.s.u.PeerSync PeerSync: core=UNCLASS_30DAYS_shard31_replica_n182
url=http://elara:9100/solr Received 100 versions from
http://enceladus:9100/solr/UNCLASS_30DAYS_shard31_replica_n180/
fingerprint:null
2019-05-30 12:35:30.652 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS
s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182]
o.a.s.u.PeerSync PeerSync: core=UNCLASS_30DAYS_shard31_replica_n182
url=http://elara:9100/solr Our versions are too old.
ourHighThreshold=1634891841359839232
otherLowThreshold=1634892098551414784 ourHighest=1634892003501146112
otherHighest=1634892708023631872
2019-05-30 12:35:30.652 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS
s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182]
o.a.s.u.PeerSync PeerSync: core=UNCLASS_30DAYS_shard31_replica_n182
url=http://elara:9100/solr DONE. sync failed
2019-05-30 12:35:30.652 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS
s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182]
o.a.s.c.SyncStrategy Leader's attempt to sync with shard failed, moving
to the next candidate
2019-05-30 12:35:30.683 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS
s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182]
o.a.s.c.ShardLeaderElectionContext There may be a better leader
candidate than us - going back into recovery
2019-05-30 12:35:30.693 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS
s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182]
o.a.s.c.ShardLeaderElectionContextBase No version found for ephemeral
leader parent node, won't remove previous leader registration.
2019-05-30 12:35:30.694 WARN
(updateExecutor-3-thread-4-processing-n:elara:9100_solr
x:UNCLASS_30DAYS_shard31_replica_n182 c:UNCLASS_30DAYS s:shard31
r:core_node185) [c:UNCLASS_30DAYS s:shard31 r:core_node185
x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.c.RecoveryStrategy Stopping
recovery for core=[UNCLASS_30DAYS_shard31_replica_n182]
coreNodeName=[core_node185]
and
2019-05-30 12:25:39.522 INFO (zkCallback-7-thread-1) [c:UNCLASS_30DAYS
s:shard31 r:core_node187 x:UNCLASS_30DAYS_shard31_replica_n184]
o.a.s.c.ActionThrottle Throttling leader attempts - waiting for 136ms
2019-05-30 12:25:39.672 INFO (zkCallback-7-thread-1) [c:UNCLASS_30DAYS
s:shard31 r:core_node187 x:UNCLASS_30DAYS_shard31_replica_n184]
o.a.s.c.ShardLeaderElectionContext Can't become leader, other replicas
with higher term participated in leader election
2019-05-30 12:25:39.672 INFO (zkCallback-7-thread-1) [c:UNCLASS_30DAYS
s:shard31 r:core_node187 x:UNCLASS_30DAYS_shard31_replica_n184]
o.a.s.c.ShardLeaderElectionContext There may be a better leader
candidate than us - going back into recovery
2019-05-30 12:25:39.677 INFO (zkCallback-7-thread-1) [c:UNCLASS_30DAYS
s:shard31 r:core_node187 x:UNCLASS_30DAYS_shard31_replica_n184]
o.a.s.c.ShardLeaderElectionContextBase No version found for ephemeral
leader parent node, won't remove previous leader registration.
and
2019-05-30 12:26:39.820 INFO (zkCallback-7-thread-5) [c:UNCLASS_30DAYS
s:shard31 r:core_node183 x:UNCLASS_30DAYS_shard31_replica_n180]
o.a.s.c.ShardLeaderElectionContext Can't become leader, other replicas
with higher term participated in leader election
2019-05-30 12:26:39.820 INFO (zkCallback-7-thread-5) [c:UNCLASS_30DAYS
s:shard31 r:core_node183 x:UNCLASS_30DAYS_shard31_replica_n180]
o.a.s.c.ShardLeaderElectionContext There may be a better leader
candidate than us - going back into recovery
2019-05-30 12:26:39.826 INFO (zkCallback-7-thread-5) [c:UNCLASS_30DAYS
s:shard31 r:core_node183 x:UNCLASS_30DAYS_shard31_replica_n180]
o.a.s.c.ShardLeaderElectionContextBase No version found for ephemeral
leader parent node, won't remove previous leader registration.
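To make the first excerpt easier to read: my understanding (an assumption, not checked against the Solr source) is that PeerSync logs "Our versions are too old" when the local high threshold is below the other replica's low threshold, so the two recent-update windows do not overlap enough to sync safely. A minimal sketch of that comparison, using the numbers from the replica_n182 log line; the function name is made up for illustration:

```python
# Values copied from the "Our versions are too old" log line for replica_n182.
OUR_HIGH_THRESHOLD = 1634891841359839232
OTHER_LOW_THRESHOLD = 1634892098551414784


def versions_too_old(our_high_threshold: int, other_low_threshold: int) -> bool:
    """Assumed PeerSync condition: our window ends before the other one starts.

    This mirrors what the log message suggests; it is not Solr's actual code.
    """
    return our_high_threshold < other_low_threshold


if __name__ == "__main__":
    # With the logged values the condition holds, which matches "sync failed".
    print(versions_too_old(OUR_HIGH_THRESHOLD, OTHER_LOW_THRESHOLD))
```

If that reading is right, replica_n182 is behind replica_n180, while the other two replicas refuse leadership because of the term check, so nobody wins the election.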
I've tried FORCELEADER, but it had no effect. I also tried adding a
shard, but that one didn't come up either. The index is on HDFS.
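For reference, a FORCELEADER request for this shard goes through the Collections API and would look roughly like the sketch below. The host, port, collection and shard names are taken from the log lines above; the helper name `collections_api_url` is hypothetical, and the script only builds the URLs rather than sending anything.

```python
from urllib.parse import urlencode

# Base URL taken from the log lines above (any live node should work).
SOLR_BASE = "http://elara:9100/solr"


def collections_api_url(base: str, action: str, **params: str) -> str:
    """Build a Collections API URL; helper name is illustrative only."""
    query = urlencode({"action": action, **params})
    return f"{base}/admin/collections?{query}"


# FORCELEADER for the shard that will not elect a leader.
force_leader_url = collections_api_url(
    SOLR_BASE, "FORCELEADER", collection="UNCLASS_30DAYS", shard="shard31"
)

# CLUSTERSTATUS for the same shard, to check replica states afterwards.
cluster_status_url = collections_api_url(
    SOLR_BASE, "CLUSTERSTATUS", collection="UNCLASS_30DAYS", shard="shard31"
)

if __name__ == "__main__":
    print(force_leader_url)
    print(cluster_status_url)
```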
Help!
-Joe
- Solr 7.6.0 - won't elect leader Joe Obernberger