Chris M. Hostetter created SOLR-14183:
-----------------------------------------

Summary: replicas do not immediately/synchronously reflect state=RECOVERING when receiving REQUESTRECOVERY commands
Key: SOLR-14183
URL: https://issues.apache.org/jira/browse/SOLR-14183
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Chris M. Hostetter


Spun off of SOLR-13486:

Consider the following situation, which can occur in {{TestTlogReplayVsRecovery}}:

* healthy cluster, healthy shard with multiple replicas
* network partition occurs, leader adds new documents
* network partition is healed, leader is restarted
* leader determines it should be leader again
** sends {{REQUESTRECOVERY}} to replicas
** leader marks itself as {{state=ACTIVE}}
* client checks cluster status and sees all replicas are {{ACTIVE}}
** client assumes all replicas are fair game for searching all documents
** *CLIENT FAILS TO FIND EXPECTED DOCUMENTS IF QUERYING NON-LEADER REPLICA*
* asynchronously, non-leader replicas get around to {{doRecovery}}
** only now are non-leader replicas marking themselves as {{state=RECOVERING}}

----

I think we need to reconsider when replicas are marked {{state=RECOVERING}}. We could do it synchronously in {{CoreAdminOperation.REQUESTRECOVERY_OP}}, or we could let the leader set it when the leader knows it needs to initiate recovery. Either way, the status would be updated and available to clients (and tests) immediately.

Alternatively, we need a more comprehensive way for clients (and tests) to know whether a shard is "healthy" than just checking the state of each replica, since {{state=RECOVERING}} isn't updated in real time.


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
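The race described above, and the first proposed fix (marking {{state=RECOVERING}} synchronously before the REQUESTRECOVERY handler returns), can be modeled with a small self-contained Java program. This is not Solr code. The {{State}} enum, the {{Replica}} class, {{requestRecovery}}, {{allActive}}, and the {{CountDownLatch}} "gate" are all hypothetical simplifications. Real Solr publishes replica state through the ZooKeeper-backed cluster state, and it has more states than the two modeled here. The latch only holds back the background recovery task so that the race window is deterministic in the demo.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class RecoveryRaceSketch {

    // Hypothetical, simplified replica states (real Solr also has DOWN, etc.).
    enum State { ACTIVE, RECOVERING }

    static final class Replica {
        final String name;
        volatile State state = State.ACTIVE;
        Replica(String name) { this.name = name; }
    }

    // Hypothetical stand-in for the REQUESTRECOVERY handler on a non-leader replica.
    // 'gate' delays the background work so the demo's race window is deterministic.
    static void requestRecovery(Replica r, boolean markSynchronously,
                                ExecutorService pool, CountDownLatch gate) {
        if (markSynchronously) {
            // Proposed behavior: publish RECOVERING before the request returns.
            r.state = State.RECOVERING;
        }
        pool.submit(() -> {
            try {
                gate.await(); // replica "asynchronously gets around to doRecovery"
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            if (!markSynchronously) {
                r.state = State.RECOVERING; // current behavior: only now marked RECOVERING
            }
            // ... fetching/replaying the missing documents would happen here ...
            r.state = State.ACTIVE;
        });
    }

    // The naive client-side health check from the issue: "are all replicas ACTIVE?"
    static boolean allActive(List<Replica> replicas) {
        return replicas.stream().allMatch(r -> r.state == State.ACTIVE);
    }

    // Returns what the client observes right after the leader sends REQUESTRECOVERY.
    static boolean clientSeesAllActive(boolean markSynchronously) throws InterruptedException {
        List<Replica> replicas = List.of(new Replica("leader"), new Replica("r1"), new Replica("r2"));
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch gate = new CountDownLatch(1);
        try {
            // Leader sends REQUESTRECOVERY to each non-leader replica, then marks itself ACTIVE.
            for (Replica r : replicas.subList(1, replicas.size())) {
                requestRecovery(r, markSynchronously, pool, gate);
            }
            replicas.get(0).state = State.ACTIVE;
            // Client checks cluster status before any background recovery has started.
            return allActive(replicas);
        } finally {
            gate.countDown(); // let the background recovery tasks run to completion
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("async marking, client sees all ACTIVE: " + clientSeesAllActive(false));
        System.out.println("sync marking, client sees all ACTIVE: " + clientSeesAllActive(true));
    }
}
```

With asynchronous marking, {{clientSeesAllActive(false)}} returns {{true}}. The client wrongly concludes that every replica can serve all documents. With synchronous marking, {{clientSeesAllActive(true)}} returns {{false}}, because the non-leader replicas already report RECOVERING by the time the leader has finished sending its requests.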