Re: [HACKERS] Causal reads take II

Ants Aasma Thu, 19 Jan 2017 06:02:30 -0800

On Thu, Jan 19, 2017 at 2:22 PM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
> On Thu, Jan 19, 2017 at 8:11 PM, Ants Aasma <ants.aa...@eesti.ee> wrote:
>> On Tue, Jan 3, 2017 at 3:43 AM, Thomas Munro
>> <thomas.mu...@enterprisedb.com> wrote:
>>> Long term, I think it would be pretty cool if we could develop a set
>>> of features that give you distributed sequential consistency on top of
>>> streaming replication.  Something like (this | causality-tokens) +
>>> SERIALIZABLE-DEFERRABLE-on-standbys[3] +
>>> distributed-dirty-read-prevention[4].
>>
>> Is it necessary that causal writes wait for replication before making
>> the transaction visible on the master? I'm asking because the per tx
>> variable wait time between logging commit record and making
>> transaction visible makes it really hard to provide matching
>> visibility order on master and standby.
>
> Yeah, that does seem problematic.  Even with async replication or no
> replication, isn't there already a race in CommitTransaction() where
> two backends could reach RecordTransactionCommit() in one order but
> ProcArrayEndTransaction() in the other order?  AFAICS using
> synchronous replication in one of the transactions just makes it more
> likely you'll experience such a visibility difference between the DO
> and REDO histories (!), by making RecordTransactionCommit() wait.
> Nothing prevents you getting a snapshot that can see t2 but not t1 in
> the DO history, while someone doing PITR or querying an asynchronous
> standby gets a snapshot that can see t1 but not t2 because those
> replay the REDO history.


Yes there is a race even with all transactions having the same
synchronization level. But nobody will mind if we some day fix that
race. :) With different synchronization levels it is much trickier to
fix as either async commits must wait behind sync commits before
becoming visible, return without becoming visible or visibility order
must differ from commit record LSN order. The first makes the async
commit feature useless, second seems a no-go for semantic reasons,
third requires extra information sent to standby's so they know the
actual commit order.

>> In CSN based snapshot
>> discussions we came to the conclusion that to make standby visibility
>> order match master while still allowing for async transactions to
>> become visible before they are durable we need to make the commit
>> sequence a vector clock and transmit extra visibility ordering
>> information to standby's. Having one more level of delay between wal
>> logging of commit and making it visible would make the problem even
>> worse.
>
> I'd like to read that... could you please point me at the right bit of
> that discussion?

Some of that discussion was face to face at pgconf.eu, some of it is
here: 
https://www.postgresql.org/message-id/CA+CSw_vbt=cwluogr7gxdpnho_y4cz7x97+o_bh-rfo7ken...@mail.gmail.com

Let me know if you have any questions.

>> It seems that fixing that would
>> require either keeping some per client state or a global agreement on
>> what snapshots are safe to provide, both of which you tried to avoid
>> for this feature.
>
> Agreed.  You briefly mentioned this problem in the context of pairs of
> read-only transactions a while ago[1].  As you said then, it does seem
> plausible to do that with a token system that gives clients the last
> commit LSN from the snapshot used by a read only query, so that you
> can ask another standby to make sure that LSN has been replayed before
> running another read-only transaction.  This could be handled
> explicitly by a motivated client that is talking to multiple nodes.  A
> more general problem is client A telling client B to go and run
> queries and expecting B to see all transactions that A has seen; it
> now has to pass the LSN along with that communication, or rely on some
> kind of magic proxy that sees all transactions, or a radically
> different system with a GTM.

If/when we do CSN based snapshots, adding a GTM could be relatively
straightforward. It's basically not all that far from what Spanner is
doing by using a timestamp as the snapshot. But this is all relatively
independent of this patch.

Regards,
Ants Aasma


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Causal reads take II

Reply via email to