[jira] [Created] (SOLR-14183) replicas do not immediately/synchronously reflect state=RECOVERING when receiving REQUESTRECOVERY commands

Chris M. Hostetter (Jira) Sat, 11 Jan 2020 15:24:30 -0800

Chris M. Hostetter created SOLR-14183:
-----------------------------------------


             Summary: replicas do not immediately/synchronously reflect 
state=RECOVERING when receiving REQUESTRECOVERY commands
                 Key: SOLR-14183
                 URL: https://issues.apache.org/jira/browse/SOLR-14183
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Chris M. Hostetter


Spun off of SOLR-13486: Consider the following situation, which can occur in 
{{TestTlogReplayVsRecovery}}:
 * healthy cluster, healthy shard with multiple replicas
 * network partition occurs, leader adds new documents
 * network partition is healed, leader is restarted
 * leader determines it should be leader again
 ** sends {{REQUESTRECOVERY}} to replicas
 ** leader marks itself as {{state=ACTIVE}}
 * client checks cluster status and sees all replicas are {{ACTIVE}}
 ** client assumes all replicas are fair game for searching all documents
 ** *CLIENT FAILS TO FIND EXPECTED DOCUMENTS IF QUERYING NON-LEADER REPLICA*
 * asynchronously, non-leader replicas get around to {{doRecovery}}
 ** only now are non-leader replicas marking themselves as {{state=RECOVERING}}
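The ordering above can be reproduced with a small, single-threaded model. This is a hypothetical sketch, not Solr code: {{RecoveryRaceModel}}, {{requestRecoveryAsync}} and {{drainPendingRecoveries}} are illustrative names, and a queue of pending tasks stands in for the asynchronous path that eventually reaches {{doRecovery}}. It assumes the followers still publish a stale {{ACTIVE}} state when the restarted leader comes back.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical single-threaded model of the race described in SOLR-14183; not Solr code.
class RecoveryRaceModel {
    enum State { ACTIVE, DOWN, RECOVERING }

    // Replica states as a client would read them from the published cluster state.
    final Map<String, State> published = new LinkedHashMap<>();
    // Recovery work accepted by replicas but not yet started (stands in for the async path to doRecovery).
    final Queue<Runnable> pendingRecoveries = new ArrayDeque<>();

    RecoveryRaceModel(String leader, String... followers) {
        published.put(leader, State.DOWN);        // leader is restarting
        for (String f : followers) {
            published.put(f, State.ACTIVE);       // assumed stale ACTIVE left over from before the partition
        }
    }

    // Asynchronous REQUESTRECOVERY handling: the state change is only queued, not published.
    void requestRecoveryAsync(String replica) {
        pendingRecoveries.add(() -> published.put(replica, State.RECOVERING));
    }

    boolean clientSeesAllActive() {
        return published.values().stream().allMatch(s -> s == State.ACTIVE);
    }

    // Runs the queued recoveries, i.e. the replicas finally mark themselves RECOVERING.
    void drainPendingRecoveries() {
        Runnable task;
        while ((task = pendingRecoveries.poll()) != null) {
            task.run();
        }
    }

    // Returns {what the client sees before recovery starts, what it sees afterwards}.
    static boolean[] runScenario() {
        RecoveryRaceModel m = new RecoveryRaceModel("leader", "replica1", "replica2");
        m.requestRecoveryAsync("replica1");
        m.requestRecoveryAsync("replica2");
        m.published.put("leader", State.ACTIVE);  // leader marks itself ACTIVE right away
        boolean before = m.clientSeesAllActive(); // true: the window in which queries can miss documents
        m.drainPendingRecoveries();
        boolean after = m.clientSeesAllActive();  // false: followers now report RECOVERING
        return new boolean[] { before, after };
    }
}
```

The first value returned by {{runScenario()}} is the window described above: every replica looks {{ACTIVE}} even though the followers have already been told to recover.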

----
I think we need to reconsider when replicas are marked {{state=RECOVERING}}, 
either doing it synchronously in {{CoreAdminOperation.REQUESTRECOVERY_OP}}, or 
letting the leader set it when the leader knows it needs to initiate recovery, 
so that the status is updated and available to clients (and tests) immediately.
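A minimal sketch of the first option, assuming a handler that owns both the published state and a background executor. {{SyncRecoveryHandler}}, {{handleRequestRecovery}} and {{stateOf}} are hypothetical names, not the actual {{CoreAdminOperation.REQUESTRECOVERY_OP}} code; the point is only the ordering: publish {{RECOVERING}} before the request returns, then do the slow recovery asynchronously.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of "mark RECOVERING synchronously"; names are illustrative, not Solr APIs.
class SyncRecoveryHandler {
    enum State { ACTIVE, RECOVERING }

    // Stand-in for the published cluster state that clients read.
    private final Map<String, State> published = new ConcurrentHashMap<>();
    // Daemon thread so a stuck recovery cannot keep the JVM alive in this sketch.
    private final ExecutorService recoveryExecutor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "recovery");
        t.setDaemon(true);
        return t;
    });

    SyncRecoveryHandler(String... replicas) {
        for (String r : replicas) {
            published.put(r, State.ACTIVE);
        }
    }

    // Handles REQUESTRECOVERY: the state change is visible as soon as this method returns.
    void handleRequestRecovery(String replica, Runnable doRecovery) {
        published.put(replica, State.RECOVERING);
        recoveryExecutor.submit(() -> {
            doRecovery.run();                     // the expensive part stays asynchronous
            published.put(replica, State.ACTIVE); // only a caught-up replica goes back to ACTIVE
        });
    }

    State stateOf(String replica) {
        return published.get(replica);
    }

    // Stops accepting work and waits for in-flight recoveries; returns false on timeout or interrupt.
    boolean shutdownAndWait() {
        recoveryExecutor.shutdown();
        try {
            return recoveryExecutor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The cheap state write happens on the request path while the expensive recovery does not, so the leader's REQUESTRECOVERY call still returns quickly. The second option (the leader setting the state) would move the same write to the leader, before it marks itself {{ACTIVE}}.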

Alternatively: we need a more comprehensive way for clients (and tests) to know 
if a shard is "healthy" than just checking the state of each replica (since 
{{state=RECOVERING}} isn't set in real time).
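One possible shape for such a check, as a hedged sketch: in addition to requiring every replica to be {{ACTIVE}} and hosted on a live node, it compares each replica against the leader using an assumed per-replica {{indexVersion}} that a client would have to collect separately. {{ShardHealth}}, {{ReplicaInfo}} and {{indexVersion}} are hypothetical; treating "same version as the leader" as "caught up" is a simplification, since replicas can legitimately differ briefly (for example between commits).

```java
import java.util.List;
import java.util.Set;

// Hypothetical shard-health check that looks beyond each replica's published state; not a Solr API.
class ShardHealth {
    enum State { ACTIVE, DOWN, RECOVERING }

    // Illustrative snapshot of one replica. indexVersion is an assumed field a client
    // would gather per replica; it is not part of the published replica state.
    static final class ReplicaInfo {
        final String name;
        final String node;
        final State state;
        final long indexVersion;
        final boolean leader;

        ReplicaInfo(String name, String node, State state, long indexVersion, boolean leader) {
            this.name = name;
            this.node = node;
            this.state = state;
            this.indexVersion = indexVersion;
            this.leader = leader;
        }
    }

    // Healthy only if there is a leader and every replica is ACTIVE,
    // hosted on a live node, and at the same index version as the leader.
    static boolean isHealthy(List<ReplicaInfo> replicas, Set<String> liveNodes) {
        ReplicaInfo leader = null;
        for (ReplicaInfo r : replicas) {
            if (r.leader) {
                leader = r;
            }
        }
        if (leader == null) {
            return false;
        }
        for (ReplicaInfo r : replicas) {
            if (r.state != State.ACTIVE || !liveNodes.contains(r.node)) {
                return false;
            }
            if (r.indexVersion != leader.indexVersion) {
                return false; // ACTIVE on paper, but still behind the leader
            }
        }
        return true;
    }
}
```

In the scenario above, a follower that still publishes {{ACTIVE}} but has not yet received the leader's new documents would fail the version comparison, even before it gets around to marking itself {{RECOVERING}}.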



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
