+1 to both the feature and the concept of how it's implemented.
Haven't looked at the code though.

This feature would be very useful for us at Trustly.
This would mean we got get rid of an entire system component in our
architecture (=memcached) which we only use to write data which must
be immediately readable at the sync slave after the master commits.
The only such data we currently have is the backoffice sessionid which
must be readable on the slave, otherwise the read-only calls which we
route to the slave might fail because it's missing.

On Wed, Nov 11, 2015 at 6:37 AM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
> Hi hackers,
> Many sites use hot standby servers to spread read-heavy workloads over more 
> hardware, or at least would like to.  This works well today if your 
> application can tolerate some time lag on standbys.  The problem is that 
> there is no guarantee of when a particular commit will become visible for 
> clients connected to standbys.  The existing synchronous commit feature is no 
> help here because it guarantees only that the WAL has been flushed on another 
> server before commit returns.  It says nothing about whether it has been 
> applied or whether it has been applied on the standby that you happen to be 
> talking to.
> A while ago I posted a small patch[1] to allow synchronous_commit to wait for 
> remote apply on the current synchronous standby, but (as Simon Riggs rightly 
> pointed out in that thread) that part isn't the main problem.  It seems to me 
> that the main problem for a practical 'writer waits' system is how to support 
> a dynamic set of servers, gracefully tolerating failures and timing out 
> laggards, while also providing a strong guarantee during any such 
> transitions.  Now I would like to propose something to do that, and share a 
> proof-of-concept patch.
> === PROPOSAL ===
> The working name for the proposed feature is "causal reads", because it 
> provides a form of "causal consistency"[2] (and "read-your-writes" 
> consistency) no matter which server the client is connected to.  There is a 
> similar feature by the same name in another product (albeit implemented 
> differently -- 'reader waits'; more about that later).  I'm not wedded to the 
> name.
> The feature allows arbitrary read-only transactions to be run on any hot 
> standby, with a specific guarantee about the visibility of preceding 
> transactions.  The guarantee is that if you set a new GUC "causal_reads = on" 
> in any pair of consecutive transactions (tx1, tx2) where tx2 begins after tx1 
> successfully returns, then tx2 will either see tx1 or fail with a new error 
> "standby is not available for causal reads", no matter which server it runs 
> on.  A discovery mechanism is also provided, giving an instantaneous snapshot 
> of the set of standbys that are currently available for causal reads (ie 
> won't raise the error), in the form of a new column in pg_stat_replication.
> For example, a web server might run tx1 to insert a new row representing a 
> message in a discussion forum on the primary server, and then send the user 
> to another web page that runs tx2 to load all messages in the forum on an 
> arbitrary hot standby server.  If causal_reads = on in both tx1 and tx2 (for 
> example, because it's on globally), then tx2 is guaranteed to see the new 
> post, or get a (hopefully rare) error telling the client to retry on another 
> server.
> Very briefly, the approach is:
> 1.  The primary tracks apply lag on each standby (including between commits).
> 2.  The primary deems standbys 'available' for causal reads if they are 
> applying WAL and replying to keepalives fast enough, and periodically sends 
> the standby an authorization to consider itself available for causal reads 
> until a time in the near future.
> 3.  Commit on the primary with "causal_reads = on" waits for all 'available' 
> standbys either to apply the commit record, or to cease to be 'available' and 
> begin raising the error if they are still alive (because their authorizations 
> have expired).
> 4.  Standbys can start causal reads transactions only while they have an 
> authorization with an expiry time in the future; otherwise they raise an 
> error when an initial snapshot is taken.
> In a follow-up email I can write about the design trade-offs considered 
> (mainly 'writer waits' vs 'reader waits'), comparison with some other 
> products, method of estimating replay lag, wait and timeout logic and how it 
> maintains the guarantee in various failure scenarios, logic for standbys 
> joining and leaving, implications of system clock skew between servers, or 
> any other questions you may have, depending on feedback/interest (but see 
> comments in the attached patch for some of those subjects).  For now I didn't 
> want to clog up the intertubes with too large a wall of text.
> Please see the POC patch attached.  It adds two new GUCs.  After setting up 
> one or more hot standbys as per usual, simply add "causal_reads_timeout = 4s" 
> to the primary's postgresql.conf and restart.  Now, you can set "causal_reads 
> = on" in some/all sessions to get guaranteed causal consistency.  Expected 
> behaviour: the causal reads guarantee is maintained at all times, even when 
> you overwhelm, kill, crash, disconnect, restart, pause, add and remove 
> standbys, and the primary drops them from the set it waits for in a timely 
> fashion.  You can monitor the system with the replay_lag and 
> causal_reads_status in pg_stat_replication and some state transition LOG 
> messages on the primary.  (The patch also supports "synchronous_commit = 
> apply", but it's not clear how useful that is in practice, as already 
> discussed.)
> Lastly, a few notes about how this feature related to some other work:
> The current version of this patch has causal_reads as a feature separate from 
> synchronous_commit, from a user's point of view.  The thinking behind this is 
> that load balancing and data loss avoidance are separate concerns: 
> synchronous_commit deals with the latter, and causal_reads with the former.  
> That said, existing SyncRep machinery is obviously used (specifically  
> SyncRep queues, with a small modification, as a way to wait for apply 
> messages to arrive from standbys).  (An earlier prototype had causal reads as 
> a new level for synchronous_commit and associated states as new walsender 
> states above 'streaming'.  When contemplating how to combine this proposal 
> with the multiple-synchronous-standby patch, some colleagues and I came 
> around to the view that the concerns are separate.  The reason for wanting to 
> configure complicated quorum definitions is to control data loss risks and 
> has nothing to do with load balancing requirements, so we thought the 
> features should probably be separate.)
> The multiple-synchronous-servers patch[3] could be applied or not 
> independently of this feature as a result of that separation, as it doesn't 
> use synchronous_standby_names or indeed any kind of statically defined quorum.
> The standby WAL writer patch[4] would significantly improve walreceiver 
> performance and smoothness which would work very well with this proposal.
> Please let me know what you think!
> Thanks,
> [1] 
> http://www.postgresql.org/message-id/flat/CAEepm=1fqkivl4v-otphwsgw4af9hcogimrcw-ybtjipx9g...@mail.gmail.com
> [2] From http://queue.acm.org/detail.cfm?id=1466448
> "Causal consistency. If process A has communicated to process B that it has 
> updated a data item, a subsequent access by process B will return the updated 
> value, and a write is guaranteed to supersede the earlier write. Access by 
> process C that has no causal relationship to process A is subject to the 
> normal eventual consistency rules.
> Read-your-writes consistency. This is an important model where process A, 
> after it has updated a data item, always accesses the updated value and will 
> never see an older value. This is a special case of the causal consistency 
> model."
> [3] 
> http://www.postgresql.org/message-id/flat/caog9aphycpmtypaawfd3_v7svokbnecfivmrc1axhb40zbs...@mail.gmail.com
> [4] 
> http://www.postgresql.org/message-id/flat/CA+U5nMJifauXvVbx=v3UbYbHO3Jw2rdT4haL6CCooEDM5=4...@mail.gmail.com
> --
> Thomas Munro
> http://www.enterprisedb.com
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

Joel Jacobson

Mobile: +46703603801
Trustly.com | Newsroom | LinkedIn | Twitter

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to