Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-23 Thread Kevin Grittner
On Tue, Aug 23, 2016 at 11:50 AM, Andres Freund  wrote:
> On 2016-08-23 07:26:31 -0500, Kevin Grittner wrote:
>> On Tue, Aug 23, 2016 at 7:10 AM, Kevin Grittner  wrote:
>> > On Mon, Aug 22, 2016 at 6:39 PM, Craig Ringer  
>> > wrote:
>>
>> >> Could you provide an example of a case where xacts replayed in
>> >> commit order will produce incorrect results?
>> >
>> > https://wiki.postgresql.org/wiki/SSI#Deposit_Report
>> >
>> > ... where T3 is on the replication target.
>>
>> I should, perhaps, have mentioned that the cases where this is are
>> problem are "eventually consistent" -- it's a matter of being able
>> to see a state that violates business rule invariants or where data
>> which is "locked down" according to one part of the database is
>> still changing.  Such problems are prevented on a single database,
>> but would not be prevented on a logical replica where transactions
>> are applied in commit order.  Given enough time, the replica would
>> eventually settle into a state without the anomalies, similar to
>> some other products with eventual consistency.
>
> I've generally a bit of difficulty to see this as a significant problem
> for logical rep, as long as hot-standby, and crash-recovery in general,
> also has this problem...

Serialization anomalies cannot be seen on a hot standby nor on
crash recovery.  Granted, the mechanism which prevents it on the
hot standby is that we don't allow the transaction isolation level
to be set to SERIALIZABLE, to prevent the expectation that queries
will be free of anomalies, which is pretty crude.  On crash
recovery you cannot see any anomalies as far as I'm aware -- what
has given you the idea that it is possible?

I'm not sure that the same technique I described for logical
replication could be made to work for page-level physical
replication, but a "safe snapshot" technique has been previously
discussed on the lists which would work for physical replication
(although *that* technique seems unsuited to logical replication).

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-23 Thread Craig Ringer
On 24 August 2016 at 00:50, Andres Freund  wrote:

> On 2016-08-23 07:26:31 -0500, Kevin Grittner wrote:
> > On Tue, Aug 23, 2016 at 7:10 AM, Kevin Grittner 
> wrote:
> > [an explanation of SSI anomalies]
>

I've generally a bit of difficulty to see this as a significant problem
> for logical rep, as long as hot-standby, and crash-recovery in general,
> also has this problem...
>

Same here, as commented upthread. I think it'd be cool to be able to
deliver SSI-alike behaviour on a standby, but it's far from a priority, and
it applies to phys ans well as logical rep.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-23 Thread Andres Freund
On 2016-08-23 07:26:31 -0500, Kevin Grittner wrote:
> On Tue, Aug 23, 2016 at 7:10 AM, Kevin Grittner  wrote:
> > On Mon, Aug 22, 2016 at 6:39 PM, Craig Ringer  wrote:
> 
> >> Could you provide an example of a case where xacts replayed in
> >> commit order will produce incorrect results?
> >
> > https://wiki.postgresql.org/wiki/SSI#Deposit_Report
> >
> > ... where T3 is on the replication target.
> 
> I should, perhaps, have mentioned that the cases where this is are
> problem are "eventually consistent" -- it's a matter of being able
> to see a state that violates business rule invariants or where data
> which is "locked down" according to one part of the database is
> still changing.  Such problems are prevented on a single database,
> but would not be prevented on a logical replica where transactions
> are applied in commit order.  Given enough time, the replica would
> eventually settle into a state without the anomalies, similar to
> some other products with eventual consistency.

I've generally a bit of difficulty to see this as a significant problem
for logical rep, as long as hot-standby, and crash-recovery in general,
also has this problem...


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-23 Thread Craig Ringer
On 24 August 2016 at 00:26, Kevin Grittner  wrote:

> On Tue, Aug 23, 2016 at 9:36 AM, Kevin Grittner  wrote:
> > On Tue, Aug 23, 2016 at 9:07 AM, Kevin Grittner 
> wrote:
> >> On Tue, Aug 23, 2016 at 7:40 AM, Craig Ringer 
> wrote:
> >>> On 23 Aug 2016 20:10, "Kevin Grittner"  wrote:
> 
>  On Mon, Aug 22, 2016 at 6:39 PM, Craig Ringer 
> >>
> > Could you provide an example of a case where xacts replayed in
> > commit order will produce incorrect results?
> 
>  https://wiki.postgresql.org/wiki/SSI#Deposit_Report
> 
>  ... where T3 is on the replication target.
> >>>
> >>> Right. But we don't attempt to replicate locking let alone SSI state.
> As I
> >>> said this is expected. If T1, T2 and T3 run in the master in either
> READ
> >>> COMMITTED or SERIALIZABLE we will correctly replay whatever got
> committed
> >>> and leave the replica in the same state as the master.
> >>
> >> Eventually.  Between the commit of T3 and T2 a state can be seen on
> >> the replica which would not have been allowed on the source.
> >>
> >>> It is row level replication so there is no simple way to detect this
> >>> anomaly.
> >>
> >> That is probably true, but there is a way to *prevent* it.
> >>
> >>> We would have to send a lot of co-ordination data *in both
> >>> directions*, right?
> >>
> >> No.  The source has all the information about both commit order and
> >> read-write dependencies, and could produce a stream of transaction
> >> IDs to specify the safe order for applying transactions to prevent
> >> the anomaly from appearing on the replica.  In this case the commit
> >> order is T1->T3->T2, but the apparent order of execution (AOoE) is
> >> T1->T2->T3.
> >
> > Sorry, trying to keep this conversation going while doing something
> > else and sent a response there which doesn't really make sense,
> > since the issue is whether to allow T3 to run *on the replica*.
> > I'll take another look when I'm less distracted.  You may be right.
>
> I had the right idea, but messed up the details.  The source has
> commit order T2->T1 and AOoE T1->T2.  So as long as making any work
> from T2 visible is held up until at least the point where the work
> of T1 is visible, T3 on the replica cannot see an anomalous state.
>
> It is still true that a one-way stream of information from the
> primary to the replica regarding AOoE would be sufficient
> communication.
>

Hm. That's really interesting.   So we could achieve SSI-like properties
for read-only xacts on a replica if we could defer xact decoding, or buffer
the xact on the downstream and defer apply, based on SSI info from the
upstream. That'd be pretty cool.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-23 Thread Kevin Grittner
On Tue, Aug 23, 2016 at 9:36 AM, Kevin Grittner  wrote:
> On Tue, Aug 23, 2016 at 9:07 AM, Kevin Grittner  wrote:
>> On Tue, Aug 23, 2016 at 7:40 AM, Craig Ringer  wrote:
>>> On 23 Aug 2016 20:10, "Kevin Grittner"  wrote:

 On Mon, Aug 22, 2016 at 6:39 PM, Craig Ringer 
>>
> Could you provide an example of a case where xacts replayed in
> commit order will produce incorrect results?

 https://wiki.postgresql.org/wiki/SSI#Deposit_Report

 ... where T3 is on the replication target.
>>>
>>> Right. But we don't attempt to replicate locking let alone SSI state. As I
>>> said this is expected. If T1, T2 and T3 run in the master in either READ
>>> COMMITTED or SERIALIZABLE we will correctly replay whatever got committed
>>> and leave the replica in the same state as the master.
>>
>> Eventually.  Between the commit of T3 and T2 a state can be seen on
>> the replica which would not have been allowed on the source.
>>
>>> It is row level replication so there is no simple way to detect this
>>> anomaly.
>>
>> That is probably true, but there is a way to *prevent* it.
>>
>>> We would have to send a lot of co-ordination data *in both
>>> directions*, right?
>>
>> No.  The source has all the information about both commit order and
>> read-write dependencies, and could produce a stream of transaction
>> IDs to specify the safe order for applying transactions to prevent
>> the anomaly from appearing on the replica.  In this case the commit
>> order is T1->T3->T2, but the apparent order of execution (AOoE) is
>> T1->T2->T3.
>
> Sorry, trying to keep this conversation going while doing something
> else and sent a response there which doesn't really make sense,
> since the issue is whether to allow T3 to run *on the replica*.
> I'll take another look when I'm less distracted.  You may be right.

I had the right idea, but messed up the details.  The source has
commit order T2->T1 and AOoE T1->T2.  So as long as making any work
from T2 visible is held up until at least the point where the work
of T1 is visible, T3 on the replica cannot see an anomalous state.

It is still true that a one-way stream of information from the
primary to the replica regarding AOoE would be sufficient
communication.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-23 Thread Petr Jelinek

On 23/08/16 15:59, Robert Haas wrote:

On Mon, Aug 22, 2016 at 4:32 PM, Andres Freund  wrote:

On 2016-08-22 16:29:12 -0400, Robert Haas wrote:

So, I wish I could give you some better advice on this topic, but
sadly I am not an expert in this area.  However, it seems to me that
this is just one facet of a much more general problem: given two
transactions T1 and T2, the order of replay must match the order of
commit unless you can prove that there are no dependencies between
them.  I don't see why it matters whether the operations are sequence
operations or data operations; it's just a question of whether they're
modifying the same "stuff".

Of course, it's possible I'm missing something important here...


Maybe that normally logical decoding outputs stuff in commit order?


As slow as I sometimes am, I did know that.  :-)

I think what I was missing is that nextval() operations are
non-transactional.  Craig describes them as non-transactional except
when they are transactional, but I think that's not really the right
way of looking at it.  Creating the sequence is transactional, and
updating the value is not.  What seems to be causing trouble for Craig
is that if the nextval() operation is inserted into the replication
stream non-transactionally, it can happen before the sequence actually
gets created.  I'm wondering if we can't find a way to make it so that
it's OK for those operations to happen out of order, rather than
trying to make the nextval() operation sometimes transactional and
other times non-transactional.



Well, that's what Craig is trying to do by tracking if the transactional 
change has happend on a sequence in current transaction, no?


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-23 Thread Kevin Grittner
On Tue, Aug 23, 2016 at 9:07 AM, Kevin Grittner  wrote:
> On Tue, Aug 23, 2016 at 7:40 AM, Craig Ringer  wrote:
>> On 23 Aug 2016 20:10, "Kevin Grittner"  wrote:
>>>
>>> On Mon, Aug 22, 2016 at 6:39 PM, Craig Ringer 
>
 Could you provide an example of a case where xacts replayed in
 commit order will produce incorrect results?
>>>
>>> https://wiki.postgresql.org/wiki/SSI#Deposit_Report
>>>
>>> ... where T3 is on the replication target.
>>
>> Right. But we don't attempt to replicate locking let alone SSI state. As I
>> said this is expected. If T1, T2 and T3 run in the master in either READ
>> COMMITTED or SERIALIZABLE we will correctly replay whatever got committed
>> and leave the replica in the same state as the master.
>
> Eventually.  Between the commit of T3 and T2 a state can be seen on
> the replica which would not have been allowed on the source.
>
>> It is row level replication so there is no simple way to detect this
>> anomaly.
>
> That is probably true, but there is a way to *prevent* it.
>
>> We would have to send a lot of co-ordination data *in both
>> directions*, right?
>
> No.  The source has all the information about both commit order and
> read-write dependencies, and could produce a stream of transaction
> IDs to specify the safe order for applying transactions to prevent
> the anomaly from appearing on the replica.  In this case the commit
> order is T1->T3->T2, but the apparent order of execution (AOoE) is
> T1->T2->T3.

Sorry, trying to keep this conversation going while doing something
else and sent a response there which doesn't really make sense,
since the issue is whether to allow T3 to run *on the replica*.
I'll take another look when I'm less distracted.  You may be right.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-23 Thread Kevin Grittner
On Tue, Aug 23, 2016 at 7:40 AM, Craig Ringer  wrote:
> On 23 Aug 2016 20:10, "Kevin Grittner"  wrote:
>>
>> On Mon, Aug 22, 2016 at 6:39 PM, Craig Ringer 

>>> Could you provide an example of a case where xacts replayed in
>>> commit order will produce incorrect results?
>>
>> https://wiki.postgresql.org/wiki/SSI#Deposit_Report
>>
>> ... where T3 is on the replication target.
>
> Right. But we don't attempt to replicate locking let alone SSI state. As I
> said this is expected. If T1, T2 and T3 run in the master in either READ
> COMMITTED or SERIALIZABLE we will correctly replay whatever got committed
> and leave the replica in the same state as the master.

Eventually.  Between the commit of T3 and T2 a state can be seen on
the replica which would not have been allowed on the source.

> It is row level replication so there is no simple way to detect this
> anomaly.

That is probably true, but there is a way to *prevent* it.

> We would have to send a lot of co-ordination data *in both
> directions*, right?

No.  The source has all the information about both commit order and
read-write dependencies, and could produce a stream of transaction
IDs to specify the safe order for applying transactions to prevent
the anomaly from appearing on the replica.  In this case the commit
order is T1->T3->T2, but the apparent order of execution (AOoE) is
T1->T2->T3.  If the source communicated that to the replica, and
the replica held up application of any changes from T3 until T2 was
committed there would be no chance to read incorrect results.  It
would not matter if T2 and T3 were committed on the replica
simultaneously or in AOoE, as long as the work of T3 does not
appear before the work of T2.

The replica does not need to send anything back to the source for
this to work.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-23 Thread Robert Haas
On Mon, Aug 22, 2016 at 4:32 PM, Andres Freund  wrote:
> On 2016-08-22 16:29:12 -0400, Robert Haas wrote:
>> So, I wish I could give you some better advice on this topic, but
>> sadly I am not an expert in this area.  However, it seems to me that
>> this is just one facet of a much more general problem: given two
>> transactions T1 and T2, the order of replay must match the order of
>> commit unless you can prove that there are no dependencies between
>> them.  I don't see why it matters whether the operations are sequence
>> operations or data operations; it's just a question of whether they're
>> modifying the same "stuff".
>>
>> Of course, it's possible I'm missing something important here...
>
> Maybe that normally logical decoding outputs stuff in commit order?

As slow as I sometimes am, I did know that.  :-)

I think what I was missing is that nextval() operations are
non-transactional.  Craig describes them as non-transactional except
when they are transactional, but I think that's not really the right
way of looking at it.  Creating the sequence is transactional, and
updating the value is not.  What seems to be causing trouble for Craig
is that if the nextval() operation is inserted into the replication
stream non-transactionally, it can happen before the sequence actually
gets created.  I'm wondering if we can't find a way to make it so that
it's OK for those operations to happen out of order, rather than
trying to make the nextval() operation sometimes transactional and
other times non-transactional.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-23 Thread Craig Ringer
On 23 Aug 2016 20:10, "Kevin Grittner"  wrote:
>
> On Mon, Aug 22, 2016 at 6:39 PM, Craig Ringer 
wrote:
> > On 23 Aug 2016 05:43, "Kevin Grittner"  wrote:
> >> On Mon, Aug 22, 2016 at 3:29 PM, Robert Haas 
wrote:
> >>
> >>> it seems to me that
> >>> this is just one facet of a much more general problem: given two
> >>> transactions T1 and T2, the order of replay must match the order of
> >>> commit unless you can prove that there are no dependencies between
> >>> them.  I don't see why it matters whether the operations are sequence
> >>> operations or data operations; it's just a question of whether they're
> >>> modifying the same "stuff".
>
> >> The commit order is the simplest and safest *unless* there is a
> >> read-write anti-dependency a/k/a read-write dependency a/k/a
> >> rw-conflict: where a read from one transaction sees the "before"
> >> version of data modified by the other transaction.  In such a case
> >> it is necessary for correct serializable transaction behavior for
> >> the transaction that read the "before" image to be replayed before
> >> the write it didn't see, regardless of commit order.  If you're not
> >> trying to avoid serialization anomalies, it is less clear to me
> >> what is best.
> >
> > Could you provide an example of a case where xacts replayed in
> > commit order will produce incorrect results?
>
> https://wiki.postgresql.org/wiki/SSI#Deposit_Report
>
> ... where T3 is on the replication target.

Right. But we don't attempt to replicate locking let alone SSI state. As I
said this is expected. If T1, T2 and T3 run in the master in either READ
COMMITTED or SERIALIZABLE we will correctly replay whatever got committed
and leave the replica in the same state as the master.

It is row level replication so there is no simple way to detect this
anomaly. We would have to send a lot of co-ordination data *in both
directions*, right?

Sounds like a job for tightly coupled clustering with a GTM, GLM etc. We're
a very very long way from that.

The docs probably need to be fairly explicit about the guarantees made and
not made and how behavior of queries on replicas can differ from queries on
master. The same is true for physical replication though, right?

Good point that xacts run on replicas can differ in results from the same
query on master though. At least in read only SERIALIZABLE xacts which are
effectively downgraded to REPEATABLE READ (snapshot) with respect to xacts
on master. But continue to get full SSI with respect to other (possibly
r/w) xacts on the replica. We can't really just disallow SERIALIZABLE
isolation xacts because they remain useful. But we might want to be able to
at least warn if they touch replicated (subscriber) tables.


Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-23 Thread Kevin Grittner
On Tue, Aug 23, 2016 at 7:10 AM, Kevin Grittner  wrote:
> On Mon, Aug 22, 2016 at 6:39 PM, Craig Ringer  wrote:

>> Could you provide an example of a case where xacts replayed in
>> commit order will produce incorrect results?
>
> https://wiki.postgresql.org/wiki/SSI#Deposit_Report
>
> ... where T3 is on the replication target.

I should, perhaps, have mentioned that the cases where this is are
problem are "eventually consistent" -- it's a matter of being able
to see a state that violates business rule invariants or where data
which is "locked down" according to one part of the database is
still changing.  Such problems are prevented on a single database,
but would not be prevented on a logical replica where transactions
are applied in commit order.  Given enough time, the replica would
eventually settle into a state without the anomalies, similar to
some other products with eventual consistency.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-23 Thread Kevin Grittner
On Mon, Aug 22, 2016 at 6:39 PM, Craig Ringer  wrote:
> On 23 Aug 2016 05:43, "Kevin Grittner"  wrote:
>> On Mon, Aug 22, 2016 at 3:29 PM, Robert Haas  wrote:
>>
>>> it seems to me that
>>> this is just one facet of a much more general problem: given two
>>> transactions T1 and T2, the order of replay must match the order of
>>> commit unless you can prove that there are no dependencies between
>>> them.  I don't see why it matters whether the operations are sequence
>>> operations or data operations; it's just a question of whether they're
>>> modifying the same "stuff".

>> The commit order is the simplest and safest *unless* there is a
>> read-write anti-dependency a/k/a read-write dependency a/k/a
>> rw-conflict: where a read from one transaction sees the "before"
>> version of data modified by the other transaction.  In such a case
>> it is necessary for correct serializable transaction behavior for
>> the transaction that read the "before" image to be replayed before
>> the write it didn't see, regardless of commit order.  If you're not
>> trying to avoid serialization anomalies, it is less clear to me
>> what is best.
>
> Could you provide an example of a case where xacts replayed in
> commit order will produce incorrect results?

https://wiki.postgresql.org/wiki/SSI#Deposit_Report

... where T3 is on the replication target.

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-22 Thread Craig Ringer
On 23 Aug 2016 05:43, "Kevin Grittner"  wrote:
>
> On Mon, Aug 22, 2016 at 3:29 PM, Robert Haas 
wrote:
>
> > it seems to me that
> > this is just one facet of a much more general problem: given two
> > transactions T1 and T2, the order of replay must match the order of
> > commit unless you can prove that there are no dependencies between
> > them.  I don't see why it matters whether the operations are sequence
> > operations or data operations; it's just a question of whether they're
> > modifying the same "stuff".

It matters because sequence operations aren't transactional in pg. Except
when they are - operations on a newly CREATEd sequence or one where we did
a TRUNCATE ...RESTART IDENTITY.

But we don't store the xid of the xact associated with a transactional
sequence update along with the sequence update anywhere. We just rely on nk
other xact knowing to look at the sequence relfilenode we're changing.
Doesn't work so well in logical rep.

We also don't store knowledge of whether or not the sequence advance is
transactional. Again important because for two xacts t1 and t2:

* Sequence last value is 50

* T1 calls nextval. Needs a new chunk because all cached values have been
used. Writes sequence wal advancing seq last_value to 100, returns 51.

* T2 calls nextval, gets cached value 52.

* T2 commits

* Master crashes and we fail over to replica.

This is fine for physical rep. We replay the sequence advance and all is
well.

But for logical rep the sequence can't be treated as part of t1. If t1
rolls back or we fail over before replying it we might return value 52 from
nextval even though we replayed and committed t2 that used value 52. Oops.

However if some xact t3 creates a sequence we can't replay updates to it
until the sequence relation is committed. And it's even more fun with
TRUNCATE ... RESTART IDENTITY where we need rollback behaviour too.

Make sense? It's hard because sequences are sometimes but not always exrmpt
from transactional behaviour and pg doesn't record when, since it can rely
on physical wal redo order and can apply sequence advances before the
sequence relation is committed yet.

>
> The commit order is the simplest and safest *unless* there is a
> read-write anti-dependency a/k/a read-write dependency a/k/a
> rw-conflict: where a read from one transaction sees the "before"
> version of data modified by the other transaction.  In such a case
> it is necessary for correct serializable transaction behavior for
> the transaction that read the "before" image to be replayed before
> the write it didn't see, regardless of commit order.  If you're not
> trying to avoid serialization anomalies, it is less clear to me
> what is best.

Could you provide an example of a case where xacts replayed in commit order
will produce incorrect results?

Remember that we aren't doing statement based replication in pg logical
decoding/replication. We don't care how a row got changed, only that we
make consistent transitions from before state to after state to for each
transaction, such that the data committed and visible on the master is
visible on the standby and no uncommitted or not yet visible data on the
master is committed/visible on the replica. The replica should have visible
committed data matching the master as it was when it originally executed
the xact we most recently replayed.

No locking is decoded or replayed. It is not expected that a normal non
replication client executing some other concurrent xact will have the same
effect if run on standby as on master.

It's replication not tightly coupled clustering. If/when we have things
like parallel decoding and replay of  concurrent xacts then issues like the
dependencies you mention will start to become a concern. We are a long way
from there.

For sequences the requirement IMO is that the sequence advances on the
replica to or past the position it was at on the master when the first xact
that saw those sequence values committed. We should never see the sequence
'behind' such that calling nextval on the replica can produce a value
already seen and stored by some committed xact on the replica. Being a bit
ahead is ok, much like pg discards sequence values on crash.

That's not that hard. The problems arise when the sequence it's self isn't
committed yet, per above.


Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-22 Thread Kevin Grittner
On Mon, Aug 22, 2016 at 3:29 PM, Robert Haas  wrote:

> it seems to me that
> this is just one facet of a much more general problem: given two
> transactions T1 and T2, the order of replay must match the order of
> commit unless you can prove that there are no dependencies between
> them.  I don't see why it matters whether the operations are sequence
> operations or data operations; it's just a question of whether they're
> modifying the same "stuff".

The commit order is the simplest and safest *unless* there is a
read-write anti-dependency a/k/a read-write dependency a/k/a
rw-conflict: where a read from one transaction sees the "before"
version of data modified by the other transaction.  In such a case
it is necessary for correct serializable transaction behavior for
the transaction that read the "before" image to be replayed before
the write it didn't see, regardless of commit order.  If you're not
trying to avoid serialization anomalies, it is less clear to me
what is best.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-22 Thread Andres Freund
On 2016-08-22 16:29:12 -0400, Robert Haas wrote:
> So, I wish I could give you some better advice on this topic, but
> sadly I am not an expert in this area.  However, it seems to me that
> this is just one facet of a much more general problem: given two
> transactions T1 and T2, the order of replay must match the order of
> commit unless you can prove that there are no dependencies between
> them.  I don't see why it matters whether the operations are sequence
> operations or data operations; it's just a question of whether they're
> modifying the same "stuff".
> 
> Of course, it's possible I'm missing something important here...

Maybe that normally logical decoding outputs stuff in commit order?

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-22 Thread Robert Haas
On Sun, Aug 21, 2016 at 11:13 PM, Craig Ringer  wrote:
> If the sequence is created in the current xact (i.e. uncommitted) we have to
> add the sequence updates to that xact to be replayed only if it commits. The
> sequence is visible only to the toplevel xact that created the sequence so
> advances of it can only come from that xact and its children. The actual
> CREATE SEQUENCE is presumed to be handled separately by an event trigger or
> similar.
>
> If the new sequence is committed we must replay sequence advances
> immediately and non-transactionally to ensure they're not lost due to xact
> rollback or replayed in the wrong order due to xact commit order.

So, I wish I could give you some better advice on this topic, but
sadly I am not an expert in this area.  However, it seems to me that
this is just one facet of a much more general problem: given two
transactions T1 and T2, the order of replay must match the order of
commit unless you can prove that there are no dependencies between
them.  I don't see why it matters whether the operations are sequence
operations or data operations; it's just a question of whether they're
modifying the same "stuff".

Of course, it's possible I'm missing something important here...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-21 Thread Craig Ringer
On 22 August 2016 at 11:13, Craig Ringer  wrote:


> So yeah. I think extending SeqTableData and xl_seq_rec with xid is the way
> to go. Objections?
>


Prototyping this shows that re-using SeqTableData to store the creator xid
won't work out. We can't just store the toplevel xid because TRUNCATE ...
RESTART IDENTITY is transactional, using a new relfilenode and new sequence
"timeline". If we just store and xlog the toplevel xid as the sequence's
creator/restarter we'll fail to correctly handle it if a subxact that did
TRUNCATE ... RESTART IDENTITY rolls back, e.g.

CREATE TABLE x(id serial not null);
SELECT nextval('x_id_seq');   =>   1
BEGIN;
SELECT nextval('x_id_seq');   =>   2
SAVEPOINT sp1;
SELECT nextval('x_id_seq');   =>   3
TRUNCATE TABLE x RESTART IDENTITY;
SELECT nextval('x_id_seq');   =>   1
ROLLBACK TO SAVEPOINT sp1;
SELECT nextval('x_id_seq');   =>   4

sequence.c:init_sequence() detects this by noticing that the relfilenode
has changed and discarding cached values, resuming at last_value. Knowledge
of whether we created the sequence relfilenode is not retained so we can't
do something similar.

Unless anyone has any clever (or obvious but not to me) solutions to this,
I'll probably need to maintain a separate map of sequence relfilenodes we
created and which xid we created them in, so we can test whether that xid
is still in progress when logging a change. It's still pretty much free
when wal_level < logical or the current xact hasn't created any sequences.

Otherwise I could store a List of xids in the SeqTableData for the sequence
and check that for in-progress xids. It'd usually be NIL. If not, it'll
almost always be a 1-item List, the creating / resetting xid. If subxacts
are involved it'll become a stack. We walk down the stack checking whether
xacts are in progress and popping them if not until we find an in-progress
entry or run out of stack and set it to NIL.

Either will produce the same desired result: the correct subxact xid for
the innermost in-progress xact that created or reset this sequence, if any.

(I initially planned to just punt on TRUNCATE and let event triggers handle
it, but the need to roll back sequence advances if a TRUNCATE ... RESTART
IDENTITY is rolled back means sequence decoding must pay attention to it).


I'm also having trouble working out how to get a historical snapshot for
the most recent committed xact in a decoding session so the sequence's name
can be looked up by oid in the relcache during decoding. Advice would be
welcome if anyone can spare a moment.

I'll keep working on this concurrent with some higher priority work.
Suggestions, advice, or screams of horror welcomed. I think we really,
really need logical decoding of sequence advances...

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] Logical decoding of sequence advances, part II

2016-08-21 Thread Craig Ringer
Hi all

My earlier efforts at logical decoding of sequence advances were too
simplistic[1], falling afoul of issues with sequences being both
transactional and not transactional depending on whether the sequence is
created in the current xact or not.

TL;DR of solution:

* extend SeqTableData and xl_seq_rec with xid
* set xid in SeqTableData during DefineSequence()
* if SeqTableData set, write it in xl_seq_rec when writing xlog
* assign sequence update to specified xid's reorder buffer in decoding if
xid set, otherwise decode immediately


The problem:

If the sequence is created in the current xact (i.e. uncommitted) we have
to add the sequence updates to that xact to be replayed only if it commits.
The sequence is visible only to the toplevel xact that created the sequence
so advances of it can only come from that xact and its children. The actual
CREATE SEQUENCE is presumed to be handled separately by an event trigger or
similar.

If the new sequence is committed we must replay sequence advances
immediately and non-transactionally to ensure they're not lost due to xact
rollback or replayed in the wrong order due to xact commit order.

If the sequence is ALTERed in a way that changes pg_class that's event
triggers' job and sequence decoding doesn't care. If it's ALTERed in a way
that changes Form_pg_sequence we replay the change immediately, using the
last committed snapshot to get the sequence details, so the change will
take immediate effect and is retained whether or not any pg_class changes
are committed. This reflects how it happens on the upstream.


Planned solution:

Extend xl_seq_rec with a created_in_xid TransactionId field. If
created_in_xid != InvalidTransactionId, logical decoding associates the
sequence advance with the given toplevel xact and adds it to the reorder
buffer instead of immediately invoking the sequence decoding callback. The
decoding callback then gets invoked during ReorderBufferCommit processing
at the appropriate time, like any other transactional change.

To determine whether to log an xid for the sequence advance we need some
backend local state to determine whether the sequence is new in this xact.
Handily we already have one, the seqhashtab of SeqTableData in sequence.c,
just where it's needed. So all that's needed is to add a TransactionId
field that we set if we created that sequence in this session. If it's set
we test it for TransactionIsInProgress() when xlog'ing a sequence advance;
if it is, log that xid. If not in progress, clear the xid in SeqTableData
entry so we don't check again.



Another approach would be, during decoding, to look up the relfilenode of
the sequence to get the sequence oid and do a pg_class lookup. Check to see
whether xmin is part of an in-progress xact. If so, add the sequence
advance to that xact's reorder buffer, otherwise decode it immediately. The
problem is that

(a) I think we lack relfilenode-to-oid mapping information at
decoding-time. RelidByRelfilenode() needs a snapshot and is invoked
during ReorderBufferCommit(). We have make the transactional vs
nontransactional decision in LogicalDecodingProcessRecord() when I'm pretty
sure we don't have a snapshot.

(b) It also has issues with ALTER TRANSACTION. We must replay decoded xact
updates immediately even if some in-flight xact has modified the pg_class
entry for the sequence. So we can't just check whether the xmin is one of
our xact's (sub)xids, we must also check whether some older tuple for the
same sequence oid has a corresponding xmax and keep walking backwards until
we determine whether we originally CREATEd the sequence in this xact or
only ALTERed it.


So yeah. I think extending SeqTableData and xl_seq_rec with xid is the way
to go. Objections?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services