On Thu, Sep 17, 2015 at 12:50 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On 1 September 2015 at 20:25, Thomas Munro <thomas.mu...@enterprisedb.com> wrote:
>> The next problem is that the master can be waiting quite a long time for a
>> reply from the remote walreceiver containing the desired apply LSN: in the
>> best case it learns of apply progress from replies to subsequent unrelated
>> records (which might be very soon on a busy system but still involves
>> waiting for the next transaction's WAL flush), and in the worst case it
>> needs to wait for wal_receiver_status_interval (10 seconds by default),
>> which makes for a long COMMIT delay. I was thinking that the solution to
>> that may be to teach StartupXLOG to signal the walreceiver after it updates
>> XLogCtl->lastReplayedEndRecPtr, which should cause walrcv_receive to be
>> interrupted and return early, and then walreceiver could send a reply if it
>> sees that lastReplayedEndRecPtr has moved. Maybe that would generate an
>> unacceptably high frequency of signals, and maybe there is a better form of
>> IPC for this. Without introducing any new IPC, the walreceiver could
>> instead simply report apply progress to the master whenever it sees that the
>> apply LSN has moved after its regular NAPTIME_PER_CYCLE wait (100ms), but
>> that would still introduce bogus latency. A quick and dirty way to see
>> that on top of the attached patch is to set requestReply = true in
>> WalReceiverMain to force a send after every nap.
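To put rough numbers on the two reply policies described above, here is a toy model (not PostgreSQL code) of how long the master waits to learn that a commit record has been applied. It assumes that status replies and naps tick on fixed boundaries starting at time 0, and it treats the "reply to a subsequent unrelated record" case as a reply sent at the moment that record is flushed. The function names are made up for illustration.

```python
# Toy model, not PostgreSQL code: how long the master waits to hear that a
# standby applied a commit record. All times are in milliseconds.
STATUS_INTERVAL_MS = 10_000  # wal_receiver_status_interval default (10 s)
NAPTIME_MS = 100             # NAPTIME_PER_CYCLE in walreceiver


def next_tick(t, period):
    """Smallest multiple of `period` that is >= t (ticks assumed to start at 0)."""
    return -(-t // period) * period


def feedback_delay_periodic(applied_at, next_unrelated_flush=None):
    """Reply only on the periodic status interval, or earlier if an unrelated
    record is flushed after the apply and its reply carries the apply LSN."""
    report = next_tick(applied_at, STATUS_INTERVAL_MS)
    if next_unrelated_flush is not None and next_unrelated_flush >= applied_at:
        report = min(report, next_unrelated_flush)
    return report - applied_at


def feedback_delay_after_nap(applied_at):
    """Reply whenever the apply LSN is seen to have moved after each nap
    (roughly the effect of forcing requestReply = true)."""
    return next_tick(applied_at, NAPTIME_MS) - applied_at
```

Under these assumptions a quiet system can leave COMMIT waiting almost the full 10 seconds, a busy one waits until the next unrelated flush, and checking after every nap bounds the extra wait at just under NAPTIME_PER_CYCLE, which is the residual latency the quoted text calls bogus.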
> This problem is exactly why I wrote my recent patch to make WALWriter work
> in recovery.
> Currently, the WALReceiver issues regular fsyncs that prevent it from
> replying in time. Also, the WALReceiver waits on incoming data only, so we
> can't (yet) set a latch when the Startup process has applied some records.
> I've solved the first problem and know how to solve the second, just haven't
> coded it yet. I was expecting to do that for CF3 or CF4.
> I don't think we should be using signals, nor would I expect them to work
> effectively while in an fsync.
That sounds much better. I had noticed that with my patch the
walreceiver loop was basically trying to do far too much. I was
contemplating investigating a pipe for IPC, so that it could
select/poll on both the socket connected to master + the new apply
feedback pipe, rather than using raw signals (directly or via latches)
and interrupting syscalls.
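A minimal sketch of the pipe idea, assuming a POSIX system: the walreceiver's wait point polls both the connection to the master (a socketpair stands in for it here) and a pipe to which the startup process would write a byte after advancing lastReplayedEndRecPtr. wait_for_work is a hypothetical name, and real PostgreSQL code would use its own wait primitives rather than Python's select module.

```python
import os
import select
import socket


def wait_for_work(conn, feedback_r, timeout_ms):
    """Hypothetical walreceiver wait point: block until the master sends data,
    the startup process reports apply progress, or the timeout expires.
    Returns the set of sources that are ready."""
    poller = select.poll()
    poller.register(conn.fileno(), select.POLLIN)  # stand-in for the master connection
    poller.register(feedback_r, select.POLLIN)     # apply-feedback pipe
    ready = set()
    for fd, _events in poller.poll(timeout_ms):    # timeout is in milliseconds
        if fd == conn.fileno():
            ready.add("primary")                   # caller would read WAL from conn
        elif fd == feedback_r:
            os.read(feedback_r, 4096)              # drain wakeup bytes
            ready.add("apply")                     # caller would send an apply reply
    return ready


# Usage: the startup process side just writes one byte per wakeup.
primary_side, conn = socket.socketpair()
feedback_r, feedback_w = os.pipe()
os.write(feedback_w, b"x")  # "lastReplayedEndRecPtr has moved"
print(wait_for_work(conn, feedback_r, 100))
```

Because the pipe is just another file descriptor, no syscall has to be interrupted by a signal; the loop wakes when either side becomes readable and can decide whether to receive WAL, send a reply, or both.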
>> I can see that using synchronous_commit = apply in practice might
>> prove difficult: how does a client know which node is the synchronous
>> standby? Perhaps those sorts of practical problems are the reason no one
>> has done or wanted this.
> It means we need quorum sync rep as well, to make this useful in practice
> without sacrificing HA.
> Bringing my patch and Beena's patch together will solve this for us in 9.6
I've been looking at that patch. It makes sense for adding redundancy
in synchronous_commit = on mode (waiting for WAL flush but not apply).
But it strikes me that to make multi-server synchronous_commit = apply
really useful, it is not enough to wait for a quorum of any N servers
in a group to reply, because a client connected to a given standby
doesn't know whether that standby was one of the N and therefore
whether it is guaranteed to see the effects of a committed transaction
that it has heard about. Do you have a plan that could address that?
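The visibility gap with a plain quorum can be shown with a toy model: COMMIT returns once the first n standbys in acknowledgement order have applied the transaction, so whether a particular standby is covered depends on an arrival order that a client connected to it cannot observe. The function names are illustrative only.

```python
def standbys_guaranteed_to_see(ack_order, n):
    """Toy quorum rule: COMMIT returns after the first n acknowledgements,
    so only those standbys are known to have applied the transaction."""
    return set(ack_order[:n])


def client_can_rely_on(standby, ack_order, n):
    """Can a client reading from `standby` rely on seeing the commit?"""
    return standby in standbys_guaranteed_to_see(ack_order, n)


# Same standby, same quorum size, different (unobservable) arrival orders.
print(client_can_rely_on("s3", ["s1", "s2", "s3"], 2))  # s3 was not in the quorum
print(client_can_rely_on("s3", ["s3", "s1", "s2"], 2))  # s3 happened to be in it
```

Setting n to the size of the group makes every standby safe to read from, but then a single failed standby blocks all commits, which is the trade-off the proposal below tries to avoid.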
I have been working on a proposal that adds support for reliable
"causal" and "read-your-writes" consistency, while still allowing for
some number of standbys to fail/fall behind without blocking all
transactions forever. After a COMMIT with synchronous_commit = apply
returns successfully, you can run a query on any standby node, or tell
another process to run a query on any standby node, and it is
guaranteed to either see the committed transaction or receive a new
error "standby not synchronized". This behaviour is activated by also
setting synchronous_commit = apply on the standby, and works by adding
some two-way timeout logic. I will have more to say about this soon
(I have some other work to get out of the way first).
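The two-way timeout logic has not been described yet, so the following is only a guess at one possible shape. It assumes a lease that the master grants to each standby and renews only when that standby replies with apply progress, and clocks that agree closely enough. Everything here (Standby, commit_may_return, run_query, the lease itself) is hypothetical.

```python
from dataclasses import dataclass


class StandbyNotSynchronized(Exception):
    """Raised by a standby that can no longer promise read-your-writes."""


@dataclass
class Standby:
    name: str
    applied_lsn: int
    # Hypothetical lease: granted by the master and renewed only when the
    # standby reports apply progress. Assumes closely synchronized clocks.
    lease_expires: float


def commit_may_return(standbys, commit_lsn, now):
    """Master side (hypothetical): COMMIT returns once every standby has
    either applied commit_lsn or lost its lease, so a lagging or failed
    standby delays commits by at most one lease period, not forever."""
    return all(s.applied_lsn >= commit_lsn or s.lease_expires <= now
               for s in standbys)


def run_query(standby, now):
    """Standby side (hypothetical): refuse reads once the lease has expired,
    because the master may already have stopped waiting for this standby."""
    if standby.lease_expires <= now:
        raise StandbyNotSynchronized("standby not synchronized")
    return f"query ran on {standby.name} at applied LSN {standby.applied_lsn}"
```

The invariant this aims for: once commit_may_return is true, every standby either has applied the commit or will raise "standby not synchronized" for subsequent queries, so a client never silently reads stale data.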
I will not be at all surprised to hear that you already have this
covered and are 18 steps ahead of me!
> So yes, 1) we have thought of it and want it, 2) the basic patch is trivial,
> 3) but it isn't the main problem.
Agreed. I had a go at this because I needed the trivial plumbing in
so I could work on the more difficult problem above, and I didn't know
you had it in the pipeline already. I'm glad to hear that you do, and
that you have solved the problem of the interleaving of operations in
walreceiver, and I will be following along with interest.
Sent via pgsql-hackers mailing list (email@example.com)