David Alves created KUDU-1703:
---------------------------------

             Summary: Handle lagging replicas for snapshot reads
                 Key: KUDU-1703
                 URL: https://issues.apache.org/jira/browse/KUDU-1703
             Project: Kudu
          Issue Type: Improvement
    Affects Versions: 1.1.0
            Reporter: David Alves

When we fix safe time advancement, replicas will start to block on snapshot scans for arbitrary amounts of time. Before serving the scan, they will wait until they have a consistent view of the world at the requested timestamp. This will be a serious problem for lagging replicas, which might be several seconds or even minutes behind. Moreover, in the absence of writes the same will happen even for non-lagging replicas, because their safe time only advances when the leader heartbeats.

We need to at least make sure that:
- Blocked scanner threads are not starving other work.
- If the replica's safe time lags far behind the requested snapshot timestamp, the replica refuses to do the scan.

We might also consider other optimizations, such as pinging the leader.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
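The two safeguards listed in the description can be sketched as follows. This is a minimal illustration, not Kudu's implementation. The `SafeTimeTracker` class, the `SnapshotWaitResult` enum, the `max_lag`/`max_wait` parameters, and the plain `uint64_t` timestamp are all hypothetical; Kudu's real hybrid-time `Timestamp` and scanner code differ. The replica refuses a scan outright if its safe time is more than `max_lag` behind the snapshot timestamp. Otherwise the scanner waits on a condition variable for at most `max_wait`, so no thread is held indefinitely.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical timestamp type (e.g. physical microseconds), for illustration
// only; Kudu's real Timestamp is a hybrid-time value.
using Timestamp = std::uint64_t;

// Hypothetical outcome of asking a replica to serve a snapshot scan.
enum class SnapshotWaitResult {
  kReady,      // safe time >= snapshot timestamp: the scan can proceed
  kTooLagged,  // replica is too far behind and refuses up front
  kTimedOut,   // bounded wait expired before safe time caught up
};

// Hypothetical helper that tracks a replica's safe time and lets scanner
// threads wait for it with both a lag check and a bounded wait.
class SafeTimeTracker {
 public:
  explicit SafeTimeTracker(Timestamp initial) : safe_time_(initial) {}

  // Called when a write is applied or a leader heartbeat arrives.
  // Safe time is monotonic, so stale (e.g. reordered) updates are ignored.
  void Advance(Timestamp ts) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      if (ts <= safe_time_) return;
      safe_time_ = ts;
    }
    cv_.notify_all();  // wake scanners that may now be able to proceed
  }

  SnapshotWaitResult WaitForSafeTime(Timestamp snapshot_ts, Timestamp max_lag,
                                     std::chrono::milliseconds max_wait) {
    std::unique_lock<std::mutex> lock(mu_);
    if (safe_time_ >= snapshot_ts) return SnapshotWaitResult::kReady;
    // Refuse immediately instead of tying up a thread on a badly lagging replica.
    if (snapshot_ts - safe_time_ > max_lag) return SnapshotWaitResult::kTooLagged;
    // Bounded wait: the predicate form handles spurious wakeups and returns
    // whether the condition held when the wait ended.
    bool ready = cv_.wait_for(lock, max_wait,
                              [&] { return safe_time_ >= snapshot_ts; });
    return ready ? SnapshotWaitResult::kReady : SnapshotWaitResult::kTimedOut;
  }

  Timestamp safe_time() const {
    std::lock_guard<std::mutex> lock(mu_);
    return safe_time_;
  }

 private:
  mutable std::mutex mu_;
  std::condition_variable cv_;
  Timestamp safe_time_;
};
```

Bounding the wait caps how long each scanner thread is tied up, but the thread is still occupied while it waits. A fuller fix might park blocked scans without holding a thread, or ping the leader so safe time advances sooner, as the description suggests.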