David Alves created KUDU-1703:
---------------------------------

             Summary: Handle lagging replicas for snapshot reads
                 Key: KUDU-1703
                 URL: https://issues.apache.org/jira/browse/KUDU-1703
             Project: Kudu
          Issue Type: Improvement
    Affects Versions: 1.1.0
            Reporter: David Alves


When we fix safe time advancement, replicas will start to block on snapshot 
scans for arbitrary amounts of time, waiting to have a consistent view of the 
world at that timestamp before serving the scan.

This will be a serious problem for lagging replicas, which might be several 
seconds or even minutes behind. Moreover, in the absence of writes, the same 
will happen even for non-lagging replicas, which will have their safe times 
updated only when the leader heartbeats.

We need to at least make sure that:
- Blocked scanner threads are not starving other work.
- If the replica's safe time is lagging by a lot, the replica refuses to do the 
scan.
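
The two requirements above can be read as a small admission policy. The 
following is only a sketch under stated assumptions, not Kudu code: the names 
(plan_snapshot_scan, ScanDecision, ScanPlan), the use of plain millisecond 
timestamps, and the default thresholds are all hypothetical and chosen for 
illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ScanDecision(Enum):
    SERVE = "serve"    # replica is already consistent at the snapshot timestamp
    WAIT = "wait"      # lag is small: wait, but only up to a bounded deadline
    REFUSE = "refuse"  # lag is too large: reject so the client can go elsewhere


@dataclass(frozen=True)
class ScanPlan:
    decision: ScanDecision
    max_wait_ms: int  # upper bound on the wait; 0 unless the decision is WAIT


def plan_snapshot_scan(snapshot_ts_ms, safe_time_ms,
                       max_lag_ms=30_000, max_wait_ms=5_000):
    """Decide how a replica should treat a snapshot scan.

    All parameters are hypothetical: timestamps are simplified to plain
    milliseconds, and both thresholds are illustrative defaults.
    """
    lag_ms = snapshot_ts_ms - safe_time_ms
    if lag_ms <= 0:
        # Safe time has already reached the snapshot: serve immediately.
        return ScanPlan(ScanDecision.SERVE, 0)
    if lag_ms > max_lag_ms:
        # Replica is lagging by a lot: refuse instead of blocking a thread.
        return ScanPlan(ScanDecision.REFUSE, 0)
    # Modest lag: wait for safe time to advance, with a hard deadline so a
    # blocked scanner cannot hold a thread for an arbitrary amount of time.
    return ScanPlan(ScanDecision.WAIT, max_wait_ms)
```

In this sketch the refusal check bounds how stale a replica may be before it 
stops accepting the scan, and the wait deadline bounds how long any single 
scanner can sit blocked.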

We might also consider other optimizations (like pinging the leader).




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)