Cao Manh Dat created SOLR-12166:
-----------------------------------

Summary: Race condition in rejoinElection and registering replica
Key: SOLR-12166
URL: https://issues.apache.org/jira/browse/SOLR-12166
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Cao Manh Dat
Assignee: Cao Manh Dat

I found this case when beasting LIROnShardRestartTest. The sequence is:

* ReplicaA may be the new leader. It tries to sync with the other replicas and somehow fails to become the leader (e.g. because of an LIR flag).
* ReplicaA calls rejoinElection, which starts the recovery process.
* After rejoinElection, ReplicaA somehow wins the election (e.g. all replicas participated in the election, so the LIR flag is cleared).
* ReplicaA registers itself as ACTIVE after winning the election.
* The recovery process started above publishes ReplicaA as DOWN or RECOVERY.
* We end up with a dead-end shard whose leader is DOWN, so the other replicas cannot recover from ReplicaA.
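
The interleaving above can be replayed deterministically in a small sketch. This is not Solr code: the class `RejoinRaceSketch`, the `Replica` type, and the methods `winElectionAndRegister`, `recoveryPublish`, and `replay` are hypothetical names made up for illustration, and the state names are simply taken from the description. It only shows that the final published state depends on whether the stale recovery publish lands before or after the replica registers as ACTIVE.

```java
import java.util.concurrent.atomic.AtomicReference;

public class RejoinRaceSketch {
    // State names follow the issue description; this is not Solr's own enum.
    public enum State { ACTIVE, RECOVERY, DOWN }

    /** Hypothetical stand-in for a replica's published cluster state. */
    public static final class Replica {
        final AtomicReference<State> published = new AtomicReference<>(State.DOWN);
        boolean leader = false;

        // Winning the election: become leader and register as ACTIVE.
        void winElectionAndRegister() {
            leader = true;
            published.set(State.ACTIVE);
        }

        // The recovery process started by rejoinElection publishes a state
        // without checking whether the replica has become leader meanwhile.
        void recoveryPublish(State s) {
            published.set(s);
        }
    }

    /**
     * Replays the two steps that race with each other.
     * recoveryPublishesLast == true is the bad ordering from the report.
     */
    public static State replay(boolean recoveryPublishesLast) {
        Replica a = new Replica();
        // rejoinElection has already queued the recovery work at this point.
        Runnable pendingRecovery = () -> a.recoveryPublish(State.DOWN);
        if (recoveryPublishesLast) {
            a.winElectionAndRegister(); // ReplicaA registers as ACTIVE
            pendingRecovery.run();      // stale recovery overwrites it with DOWN
        } else {
            pendingRecovery.run();      // recovery publishes DOWN first
            a.winElectionAndRegister(); // ACTIVE is the last write
        }
        return a.published.get();
    }

    public static void main(String[] args) {
        System.out.println("recovery publishes last:  " + replay(true));
        System.out.println("recovery publishes first: " + replay(false));
    }
}
```

With the recovery publish last, the replica is still the leader but its published state is DOWN, which is the dead-end shard described above. With the opposite ordering the leader ends up ACTIVE.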