[jira] [Updated] (KUDU-1188) For snapshot read correctness, enforce simple form of leader leases
[ https://issues.apache.org/jira/browse/KUDU-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Henke updated KUDU-1188:
------------------------------
    Labels: roadmap-candidate  (was: )

> For snapshot read correctness, enforce simple form of leader leases
> -------------------------------------------------------------------
>
>                 Key: KUDU-1188
>                 URL: https://issues.apache.org/jira/browse/KUDU-1188
>             Project: Kudu
>          Issue Type: Sub-task
>          Components: consensus, tserver
>    Affects Versions: Public beta
>            Reporter: David Alves
>            Assignee: David Alves
>            Priority: Major
>              Labels: roadmap-candidate
>
> Since Raft doesn't allow holes in the log, a new leader is guaranteed to
> have all the writes that preceded its election, and to have them in
> flight when elected (meaning MVCC will have those transactions in flight,
> so a snapshot read will wait for them to complete). So, for writes,
> leases aren't really necessary. This is in contrast to Paxos in Spanner,
> where there is no timestamp propagation, the log might have holes, and
> leases are required to enforce write correctness.
>
> However, some form of lease is necessary to enforce read consistency, in
> particular in the following case:
>
> Leader A accepts a write at time 10, which commits and has no following
> writes. It then serves a snapshot read at 15 and crashes.
> Leader B is elected but has a slow clock, which reads 11 when it is ready
> to serve writes. It then accepts a write at time 13.
> The snapshot read at 15 is now broken.
>
> A simple way to avoid this is to have each replica promise, on each ack,
> that if it is ever elected leader it won't accept writes or serve
> snapshot reads until a certain period, say 2 secs, has passed since that
> ack. On the leader side, the leader is only allowed to serve snapshot
> reads up to 2 seconds after _a majority_ of replicas has ack'd, which in
> practice usually means 1 replica.
>
> With such a mechanism in place, if the lease is 5, then leader B wouldn't
> accept the write at time 13 and would instead wait until 15 had passed,
> not breaking the snapshot read.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
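The lease rule described in the issue can be sketched as a toy model. This is only an illustration under stated assumptions, not Kudu's implementation: the names Replica, Leader, ack, can_accept_write and can_serve_snapshot_read are hypothetical, all times are plain integers on the single timeline used in the issue's example, and "up to N since a majority ack" is read here as "snapshot timestamps no later than the majority ack time plus the lease".

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical follower that may later be elected leader."""
    lease: int              # lease length, in the same units as the timestamps
    last_ack_time: int = 0  # time of the most recent ack sent to the leader

    def ack(self, now: int) -> None:
        # Each ack renews the promise: no writes or snapshot reads as a
        # new leader until `lease` has passed since this ack.
        self.last_ack_time = now

    def can_accept_write(self, now: int) -> bool:
        # As a newly elected leader, wait until the promised period has
        # fully passed (strictly later than ack time + lease).
        return now > self.last_ack_time + self.lease


@dataclass
class Leader:
    """Hypothetical current leader."""
    lease: int
    majority_ack_time: int = 0  # latest time a majority of replicas ack'd

    def can_serve_snapshot_read(self, read_ts: int) -> bool:
        # Assumption: only snapshots within the lease window after the
        # last majority ack are safe to serve.
        return read_ts <= self.majority_ack_time + self.lease


# The scenario from the issue, with a lease of 5.
leader_a = Leader(lease=5, majority_ack_time=10)
replica_b = Replica(lease=5)
replica_b.ack(now=10)  # B acks A's write at time 10

read_allowed = leader_a.can_serve_snapshot_read(15)  # A serves the read at 15
write_at_13 = replica_b.can_accept_write(13)         # B, once elected, at 13
write_at_16 = replica_b.can_accept_write(16)         # B after 15 has passed
print(read_allowed, write_at_13, write_at_16)
```

In this model the two checks share one bound: A never serves a snapshot later than 10 + 5 = 15, and B never accepts a write at or before 15, so the read at 15 cannot be invalidated by a later write with a smaller timestamp.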
[jira] [Updated] (KUDU-1188) For snapshot read correctness, enforce simple form of leader leases
David Alves updated KUDU-1188:
------------------------------
    Target Version/s: Backlog  (was: 1.2.0)

Moving this out of 1.2, it won't make it.
[jira] [Updated] (KUDU-1188) For snapshot read correctness, enforce simple form of leader leases
David Alves updated KUDU-1188:
------------------------------
    Component/s: consensus
[jira] [Updated] (KUDU-1188) For snapshot read correctness, enforce simple form of leader leases
David Alves updated KUDU-1188:
------------------------------
    Target Version/s: GA