Re: [HACKERS] [PATCH] Logical decoding support for sequence advances

2016-03-14 Thread Craig Ringer
On 11 March 2016 at 22:24, Petr Jelinek  wrote:

> On 02/03/16 08:05, Craig Ringer wrote:
>
>> On 1 March 2016 at 05:30, Petr Jelinek > > wrote:
>>
>

>
>>
>> I wonder if it would be acceptable to create new info flag for
>> RM_SEQ_ID that would behave just like XLOG_SEQ_LOG but would be used
>> only for the nontransactional updates (nextval) so that decoding
>> could easily differentiate between transactional and
>> non-transactional update of sequence and then just either call the
>> callback immediately or add the change to reorder buffer based on
>> that. The redo code could just have simple OR expression to behave
>> same with both of the info flags.
>>
>>
>> That's much cleaner than trying to keep track of sequence creations and
>> really pretty harmless. I'll give that a go and see how it looks.
>>
>> Seems like simpler solution than building all the tracking code on
>> the decoding side to me.
>>
>>
>> +1
>>
>>
> Except this won't work for sequences that have been created in same
> transaction as the nextval()/setval() was called because in those cases we
> don't want to decode the advancement of sequence until the end of
> transaction and we can't map the relfilenode to sequence without going
> through reorder buffer in those cases either
>


I'll explain this a bit (for when I forget all about it and come back to it
confused, or if someone else picks this up):

The issue is transactions like

BEGIN;
CREATE TABLE blah (id serial primary key, something text);
INSERT INTO blah (something) SELECT  ;
COMMIT;

Here we create the sequence, then we advance the sequence in subsequent
statements that're part of the same xact but not directly connected to the
sequence creation. There's no convenient way to tell, when we see the
Form_pg_sequence updates in WAL for the newly created sequence, that it's
for a not-yet-committed xact so we shouldn't send the advance to the client
yet.

Once the xact that created the sequence commits we have to make sure we
send its latest state, not the initial state when it was created. So with
the above proposal we'd still need to look up those new-info-flagged
entries against a map of uncommitted sequences by relfilenode and decide
whether to send it immediately or update the latest state of an uncommitted
sequence in a reorder buffer.

IOW we have to do pretty much what I described before. We can still log
sequence updates with a different info flag but we need to know how to
associate the record with the xact that created it, so we have to log the
creating xid in the record for the initial state of a newly created
sequence. At least that'd be less ugly than trying to peek at decoded
catalog updates in the reorder buffer to spot new sequence creation and can
be done only when wal_level = logical, but it'd mean that the two record
types were different in more than just info flag.

The other wrinkle Petr refers to is that when decoding XLOG_SEQ_LOG we only
have a relfilenode. We don't know the oid of the sequence, which we need to
look up its name. The reorder buffer code uses RelidByRelfilenode for that,
which requires a snapshot. I'm not sure what problem that poses, since we'd
obviously need a snapshot set up to look up the name by oid anyway and we'd
be using the most recently committed historic snapshot for both.

Anyway, this is still complicated because of the mess with sequences being
both transactional and not-transactional in ways that rely on how the low
level storage and WAL works.

Unfortunately I don't expect to have time to produce a new patch for 9.6.

(BTW, I'd be interested in seeing what code breaks if we introduced a
compile option in src/include/pg_config_manual.h to force oid and
relfilenode randomization rather than starting off with them being the
same.)

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [PATCH] Logical decoding support for sequence advances

2016-03-11 Thread Petr Jelinek

On 02/03/16 08:05, Craig Ringer wrote:

On 1 March 2016 at 05:30, Petr Jelinek > wrote:


On 29/02/16 03:23, Craig Ringer wrote:

Sound reasonable?


I wonder if it would be acceptable to create new info flag for
RM_SEQ_ID that would behave just like XLOG_SEQ_LOG but would be used
only for the nontransactional updates (nextval) so that decoding
could easily differentiate between transactional and
non-transactional update of sequence and then just either call the
callback immediately or add the change to reorder buffer based on
that. The redo code could just have simple OR expression to behave
same with both of the info flags.


That's much cleaner than trying to keep track of sequence creations and
really pretty harmless. I'll give that a go and see how it looks.

Seems like simpler solution than building all the tracking code on
the decoding side to me.


+1



Except this won't work for sequences that have been created in same 
transaction as the nextval()/setval() was called because in those cases 
we don't want to decode the advancement of sequence until the end of 
transaction and we can't map the relfilenode to sequence without going 
through reorder buffer in those cases either


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Logical decoding support for sequence advances

2016-03-01 Thread Craig Ringer
On 1 March 2016 at 05:30, Petr Jelinek  wrote:

>
> On 29/02/16 03:23, Craig Ringer wrote:
>>
>>

> Sound reasonable?
>>
>
> I wonder if it would be acceptable to create new info flag for RM_SEQ_ID
> that would behave just like XLOG_SEQ_LOG but would be used only for the
> nontransactional updates (nextval) so that decoding could easily
> differentiate between transactional and non-transactional update of
> sequence and then just either call the callback immediately or add the
> change to reorder buffer based on that. The redo code could just have
> simple OR expression to behave same with both of the info flags.
>

That's much cleaner than trying to keep track of sequence creations and
really pretty harmless. I'll give that a go and see how it looks.


> Seems like simpler solution than building all the tracking code on the
> decoding side to me.


+1

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [PATCH] Logical decoding support for sequence advances

2016-02-29 Thread Alvaro Herrera
Petr Jelinek wrote:

> I wonder if it would be acceptable to create new info flag for RM_SEQ_ID
> that would behave just like XLOG_SEQ_LOG but would be used only for the
> nontransactional updates (nextval) so that decoding could easily
> differentiate between transactional and non-transactional update of sequence
> and then just either call the callback immediately or add the change to
> reorder buffer based on that. The redo code could just have simple OR
> expression to behave same with both of the info flags.
> 
> Seems like simpler solution than building all the tracking code on the
> decoding side to me.

Given the mess in Craig's description, the new info flag sounds a much
more reasonable approach to me.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Logical decoding support for sequence advances

2016-02-29 Thread Petr Jelinek



On 29/02/16 03:23, Craig Ringer wrote:

On 17 December 2015 at 10:08, Craig Ringer > wrote:

On 15 December 2015 at 20:17, Andres Freund > wrote:


I think this is quite the wrong approach. You're calling the logical
decoding callback directly from decode.c, circumventing
reorderbuffer.c. While you don't need the actual reordering, you
*do*
need the snapshot integration.


Yeah. I thought I could get away without it because I could use
Form_pg_sequence.sequence_name to bypass any catalog lookups at all,
but per upthread that field should be ignored and can't be used.

As you since noticed you can't just
look into the sequence tuple and be happy with that, you need to do
pg_class lookups et al. Furhtermore callbacks can validly expect
to have
a snapshot set.


Good point. Just because I don't need one doesn't mean others won't,
and all the other callbacks do.

I'll have a read over reorderbuffer.c and see if I can integrate
this properly.

At the very least what you need to do is to SetupHistoricSnapshot(),
then lookup the actual identity of the sequence using
RelidByRelfilenode() and only then you can call the callback.


Yeah. That means it's safe for the callback to do a syscache lookup
for the sequence name by oid.


I've revisited this patch in an attempt to get a corrected version ready
for the next CF.

Turns out it's not that simple.


Sequences advances aren't part of a transaction (though they always
happen during one) and proceed whether the xact that happened to trigger
them commits or rolls back. So it doesn't make sense to add them to a
reorder buffer for an xact. We want to send it immediately, not
associate it with some arbitrary xact that just happened to use the last
value in the cache that might not commit for ages.


But then when do we send it? If sent at the moment of decoding the
advance from WAL then it'll always be sent to the client between the
commit of one xact and the begin of another, because it'll be sent while
we're reading xlog and populating reorder buffers. For advance of an
already-created sequence advance that's what we want, but CREATE
SEQUENCE, ALTER SEQUENCE etc also log RM_SEQ_ID XLOG_SEQ_LOG operations.
The xact that created the sequence isn't committed yet so sending the
advance to the downstream will lead to attempting to advance a sequence
that doesn't yet exist. Similarly ALTER SEQUENCE emits a new
XLOG_SEQ_LOG with the updated Form_pg_sequence. All fine for physical
replication but rather more challenging for logical replication since
sometimes we want the sequence change to be associated with a particular
xact and sometimes not.


So the reorder buffer has to keep track of sequences. It must check to
see if a catalog change is a sequence creation and if so mark the
sequence as pending, keeping a copy of the Form_pg_sequence that's
updated to the latest version as the xact proceeds and writes updates to
the sequence. On commit the sequence advance is replayed at the end of
the xact using the snapshot of the newly committed xact; on rollback
it's discarded since the sequence never became visible to anyone else.
We can safely assert that that sequence will not be updated by any other
xact until this one commits.


In reorderbuffer.c, there's a test for relation->rd_rel->relkind ==
RELKIND_SEQUENCE that skips changes to sequence relations. We'll want to
replicate sequence catalog updates as DDL via event triggers and deparse
so that's OK, but I think we'll need to make a note of sequence creation
here to mark new sequences as uncommitted.


If we see a sequence change that isn't for an uncommitted newly-created
sequence we make an immediate call through the reorder buffer to send
the sequence update info to the client. That's OK even if it's something
like ALTER SEQUENCE ... INCREMENT 10 RENAME TO othername; . The ALTER
... RENAME part gets sent with the xact that did it when/if it commits
since it's part of the pg_class tuple for the sequence; the INCREMENT
gets sent immediately since it's part of the Form_pg_sequence. That
matches what happens locally, and it means there's no need to keep track
of every sequence, only new ones.


On commit or rollback of an xact that creates a sequence we pop the
sequence oid from the ordered list of uncommitted sequence oids that
must be kept by the decoding session.


If we land up decoding the same WAL again we might send sequence updates
that temporarily move a sequence backwards then advance it again when we
replay the newer updates. That's the same as hot standby and seems fine.
You obviously won't be able to safely get new values from sequences
that're replicated from an upstream on the downstream anyway - and I
anticipate that logical decoding receivers will probably want 

Re: [HACKERS] [PATCH] Logical decoding support for sequence advances

2016-02-28 Thread Craig Ringer
On 17 December 2015 at 10:08, Craig Ringer  wrote:

> On 15 December 2015 at 20:17, Andres Freund  wrote:
>>
>>
>> I think this is quite the wrong approach. You're calling the logical
>> decoding callback directly from decode.c, circumventing
>> reorderbuffer.c. While you don't need the actual reordering, you *do*
>> need the snapshot integration.
>
>
> Yeah. I thought I could get away without it because I could use
> Form_pg_sequence.sequence_name to bypass any catalog lookups at all, but
> per upthread that field should be ignored and can't be used.
>
>
>> As you since noticed you can't just
>> look into the sequence tuple and be happy with that, you need to do
>> pg_class lookups et al. Furhtermore callbacks can validly expect to have
>> a snapshot set.
>>
>
> Good point. Just because I don't need one doesn't mean others won't, and
> all the other callbacks do.
>
> I'll have a read over reorderbuffer.c and see if I can integrate this
> properly.
>
>
>> At the very least what you need to do is to SetupHistoricSnapshot(),
>> then lookup the actual identity of the sequence using
>> RelidByRelfilenode() and only then you can call the callback.
>>
>
> Yeah. That means it's safe for the callback to do a syscache lookup for
> the sequence name by oid.
>

I've revisited this patch in an attempt to get a corrected version ready
for the next CF.

Turns out it's not that simple.


Sequences advances aren't part of a transaction (though they always happen
during one) and proceed whether the xact that happened to trigger them
commits or rolls back. So it doesn't make sense to add them to a reorder
buffer for an xact. We want to send it immediately, not associate it with
some arbitrary xact that just happened to use the last value in the cache
that might not commit for ages.


But then when do we send it? If sent at the moment of decoding the advance
from WAL then it'll always be sent to the client between the commit of one
xact and the begin of another, because it'll be sent while we're reading
xlog and populating reorder buffers. For advance of an already-created
sequence advance that's what we want, but CREATE SEQUENCE, ALTER SEQUENCE
etc also log RM_SEQ_ID XLOG_SEQ_LOG operations. The xact that created the
sequence isn't committed yet so sending the advance to the downstream will
lead to attempting to advance a sequence that doesn't yet exist. Similarly
ALTER SEQUENCE emits a new XLOG_SEQ_LOG with the updated Form_pg_sequence.
All fine for physical replication but rather more challenging for logical
replication since sometimes we want the sequence change to be associated
with a particular xact and sometimes not.


So the reorder buffer has to keep track of sequences. It must check to see
if a catalog change is a sequence creation and if so mark the sequence as
pending, keeping a copy of the Form_pg_sequence that's updated to the
latest version as the xact proceeds and writes updates to the sequence. On
commit the sequence advance is replayed at the end of the xact using the
snapshot of the newly committed xact; on rollback it's discarded since the
sequence never became visible to anyone else. We can safely assert that
that sequence will not be updated by any other xact until this one commits.


In reorderbuffer.c, there's a test for relation->rd_rel->relkind ==
RELKIND_SEQUENCE that skips changes to sequence relations. We'll want to
replicate sequence catalog updates as DDL via event triggers and deparse so
that's OK, but I think we'll need to make a note of sequence creation here
to mark new sequences as uncommitted.


If we see a sequence change that isn't for an uncommitted newly-created
sequence we make an immediate call through the reorder buffer to send the
sequence update info to the client. That's OK even if it's something like
ALTER SEQUENCE ... INCREMENT 10 RENAME TO othername; . The ALTER ... RENAME
part gets sent with the xact that did it when/if it commits since it's part
of the pg_class tuple for the sequence; the INCREMENT gets sent immediately
since it's part of the Form_pg_sequence. That matches what happens locally,
and it means there's no need to keep track of every sequence, only new ones.


On commit or rollback of an xact that creates a sequence we pop the
sequence oid from the ordered list of uncommitted sequence oids that must
be kept by the decoding session.


If we land up decoding the same WAL again we might send sequence updates
that temporarily move a sequence backwards then advance it again when we
replay the newer updates. That's the same as hot standby and seems fine.
You obviously won't be able to safely get new values from sequences that're
replicated from an upstream on the downstream anyway - and I anticipate
that logical decoding receivers will probably want to use seqam etc later
to make those sequences read-only until a failover event.


Sound reasonable?



* keep track of the last-committed xact's snapshot in the decoding session

Re: [HACKERS] [PATCH] Logical decoding support for sequence advances

2015-12-16 Thread Craig Ringer
On 15 December 2015 at 20:17, Andres Freund  wrote:
>
>
> I think this is quite the wrong approach. You're calling the logical
> decoding callback directly from decode.c, circumventing
> reorderbuffer.c. While you don't need the actual reordering, you *do*
> need the snapshot integration.


Yeah. I thought I could get away without it because I could use
Form_pg_sequence.sequence_name to bypass any catalog lookups at all, but
per upthread that field should be ignored and can't be used.


> As you since noticed you can't just
> look into the sequence tuple and be happy with that, you need to do
> pg_class lookups et al. Furhtermore callbacks can validly expect to have
> a snapshot set.
>

Good point. Just because I don't need one doesn't mean others won't, and
all the other callbacks do.

I'll have a read over reorderbuffer.c and see if I can integrate this
properly.


> At the very least what you need to do is to SetupHistoricSnapshot(),
> then lookup the actual identity of the sequence using
> RelidByRelfilenode() and only then you can call the callback.
>

Yeah. That means it's safe for the callback to do a syscache lookup for the
sequence name by oid.


> > Needed to make logical replication based failover possible.
>
> While it'll make it easier, I think it's certainly quite possible to do
> so without this feature if you accept some sane restrictions. If you
> e.g. define to only handle serial columns (i.e. sequences that
> used/owned by exactly one table & column), you can do a 'good enough'
> emulation the output plugin.
>
> Take something like BDR's bdr_relcache.c (something which you're going
> to end up needing for any nontrivial replication solution). Everytime
> you build a cache entry you look up whether there's an owned by
> sequence.  When decoding an insert or update to that relation, also emit
> an 'increase this sequence to at least XXX' message.
>

Eh. I don't think it's good enough for failover really. Too much of a
foot-gun risk for my taste. We've both seen the weird things users do and
the way nobody reads the manual or understands the limitations (global DDL
lock anybody?) and I really want to make things work right by default with
minimal caveats.

I'm not against working around it, and we'll need to for 9.4/9.5, but I'm
trying to get something more solid into 9.6.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [PATCH] Logical decoding support for sequence advances

2015-12-15 Thread Petr Jelinek

On 2015-12-15 13:51, Andres Freund wrote:

On 2015-12-15 13:46:29 +0100, Petr Jelinek wrote:

I don't think that approach alone is good enough. It might be ok for
selective replication where the replication is driven by tables anyway, but
in general and especially for failover it's not good enough to tell user
that we handle some sequences and they have to fix the rest manually.


I think it solves roughly 80-90% of all usages of sequences. That's a
significant improvement over the status quo.

I'm not saying it's perfect, just that it's applicable to 9.4, and might
be good enough initially.


And I am saying that I think more can and should be done even for 9.4/5.




That's not much different than fixing them all in practice as you
script it anyway.


If you can easily script it, it's just the same type (sequences owned by
a single column), everything else starts to be a bit more complicated anyway.



Well, there is some difference between scripting it for general use-case 
and scripting it with domain knowledge, but I see what you mean.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Logical decoding support for sequence advances

2015-12-15 Thread Andres Freund
On 2015-12-14 16:19:33 +0800, Craig Ringer wrote:
> On 14 December 2015 at 11:28, Craig Ringer  wrote:
>
> > Hi all
> >
> > Attached is a patch against 9.6 to add support for informing logical
> > decoding plugins of the new sequence last_value when sequence advance WAL
> > records are processed during decoding.
> >
>
>
> Attached a slightly updated version. It just has less spam in the
> regression tests, by adding a new option to test_decoding to show
> sequences, which it doesn't enable except in sequence specific tests.

I think this is quite the wrong approach. You're calling the logical
decoding callback directly from decode.c, circumventing
reorderbuffer.c. While you don't need the actual reordering, you *do*
need the snapshot integration.   As you since noticed you can't just
look into the sequence tuple and be happy with that, you need to do
pg_class lookups et al. Furhtermore callbacks can validly expect to have
a snapshot set.

At the very least what you need to do is to SetupHistoricSnapshot(),
then lookup the actual identity of the sequence using
RelidByRelfilenode() and only then you can call the callback.


> Needed to make logical replication based failover possible.

While it'll make it easier, I think it's certainly quite possible to do
so without this feature if you accept some sane restrictions. If you
e.g. define to only handle serial columns (i.e. sequences that
used/owned by exactly one table & column), you can do a 'good enough'
emulation the output plugin.

Take something like BDR's bdr_relcache.c (something which you're going
to end up needing for any nontrivial replication solution). Everytime
you build a cache entry you look up whether there's an owned by
sequence.  When decoding an insert or update to that relation, also emit
an 'increase this sequence to at least XXX' message.

While it's not perfect, this has the big advantage of being doable with 9.4.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Logical decoding support for sequence advances

2015-12-15 Thread Andres Freund
On 2015-12-15 13:46:29 +0100, Petr Jelinek wrote:
> I don't think that approach alone is good enough. It might be ok for
> selective replication where the replication is driven by tables anyway, but
> in general and especially for failover it's not good enough to tell user
> that we handle some sequences and they have to fix the rest manually.

I think it solves roughly 80-90% of all usages of sequences. That's a
significant improvement over the status quo.

I'm not saying it's perfect, just that it's applicable to 9.4, and might
be good enough initially. I'm not arguing against adding sequence
decoding here.


> That's not much different than fixing them all in practice as you
> script it anyway.

If you can easily script it, it's just the same type (sequences owned by
a single column), everything else starts to be a bit more complicated anyway.


Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Logical decoding support for sequence advances

2015-12-15 Thread Petr Jelinek

On 2015-12-15 13:17, Andres Freund wrote:

On 2015-12-14 16:19:33 +0800, Craig Ringer wrote:


Needed to make logical replication based failover possible.


While it'll make it easier, I think it's certainly quite possible to do
so without this feature if you accept some sane restrictions. If you
e.g. define to only handle serial columns (i.e. sequences that
used/owned by exactly one table & column), you can do a 'good enough'
emulation the output plugin.

Take something like BDR's bdr_relcache.c (something which you're going
to end up needing for any nontrivial replication solution). Everytime
you build a cache entry you look up whether there's an owned by
sequence.  When decoding an insert or update to that relation, also emit
an 'increase this sequence to at least XXX' message.

While it's not perfect, this has the big advantage of being doable with 9.4.



I don't think that approach alone is good enough. It might be ok for 
selective replication where the replication is driven by tables anyway, 
but in general and especially for failover it's not good enough to tell 
user that we handle some sequences and they have to fix the rest 
manually. That's not much different than fixing them all in practice as 
you script it anyway.


However, if it was combined with something like what londiste does, 
which is to check sequences periodically and send last_value + some 
reasonable buffer, it could work well in most cases. In this scenario 
your method would be used for checking that sequence is close to going 
over buffer and firing the periodic send sooner than it would be if it 
was purely time based.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Logical decoding support for sequence advances

2015-12-14 Thread Craig Ringer
On 14 December 2015 at 16:19, Craig Ringer  wrote:


> Attached a slightly updated version. It just has less spam in the
> regression tests, by adding a new option to test_decoding to show
> sequences, which it doesn't enable except in sequence specific tests.
>

Whoops, the patch as written is wrong. I used Form_pg_sequence's
sequence_name field to get the sequence name. This seems to be a vestigial
field with no useful purpose except taking up space; in particular, it's
not updated when ALTER SEQUENCE ... RENAME TO is used to rename a sequence.
The sequence's pg_class entry is updated but the sequence_name in the
sequence page is not.

Since the sequence_name is written to WAL with each sequence recovery
position advance, isn't used anywhere, and can be wrong, shouldn't it just
be removed from Form_pg_sequence entirely? Or at least replaced with the
sequence's pg_class entry oid?

I could always fix ALTER SEQUENCE to update sequence_name, but it seems
pretty pointless to have it at all.

In any case I'll need to update this patch. Which will be fun, since the
sequence oid doesn't seem to be written to the xlog record, just the
HeapTuple header and Form_pg_sequence, and the redo is handled by doing a
page replacement. I'd like to update Form_pg_sequence to store oid not
sequence_name at the same time but that makes all this a dramatically more
intrusive change that's rather less likely to be accepted.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [PATCH] Logical decoding support for sequence advances

2015-12-14 Thread Craig Ringer
On 14 December 2015 at 11:28, Craig Ringer  wrote:

> Hi all
>
> Attached is a patch against 9.6 to add support for informing logical
> decoding plugins of the new sequence last_value when sequence advance WAL
> records are processed during decoding.
>


Attached a slightly updated version. It just has less spam in the
regression tests, by adding a new option to test_decoding to show
sequences, which it doesn't enable except in sequence specific tests.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 8ee88b0850fea17086a293c8afee83af3a6b95e1 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Mon, 14 Dec 2015 15:07:30 +0800
Subject: [PATCH] Logical decoding for sequence advances

Add a new logical decoding callback seq_advance_cb that's invoked
whenever we decode a new sequence chunk allocation from WAL.

Also add support for it in test_decoding.

It's guaranteed that the last_value passed to the callback is equal to
or greater than (for +ve sequences) any value returned by a nextval()
call visible to any transaction committed at or before the callback is
invoked.

In practice it'll be quite a lot larger because PostgreSQL
preallocates large chunks of sequences to minimise WAL writes, at
the cost of wasting values if it has to do crash recovery. Logical
decoding sees the crash recovery position when it gets advanced.

Needed to make logical replication based failover possible.
---
 contrib/test_decoding/Makefile  |   2 +-
 contrib/test_decoding/expected/sequence.out | 108 
 contrib/test_decoding/sql/sequence.sql  |  35 +
 contrib/test_decoding/test_decoding.c   |  33 +
 src/backend/replication/logical/decode.c|  60 +++-
 src/backend/replication/logical/logical.c   |  28 +++-
 src/include/replication/logical.h   |   3 +
 src/include/replication/output_plugin.h |  16 +
 8 files changed, 282 insertions(+), 3 deletions(-)
 create mode 100644 contrib/test_decoding/expected/sequence.out
 create mode 100644 contrib/test_decoding/sql/sequence.sql

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index a362e69..91e2765 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -38,7 +38,7 @@ submake-test_decoding:
 	$(MAKE) -C $(top_builddir)/contrib/test_decoding
 
 REGRESSCHECKS=ddl rewrite toast permissions decoding_in_xact decoding_into_rel \
-	binary prepared replorigin
+	binary prepared replorigin sequence
 
 regresscheck: | submake-regress submake-test_decoding temp-install
 	$(MKDIR_P) regression_output
diff --git a/contrib/test_decoding/expected/sequence.out b/contrib/test_decoding/expected/sequence.out
new file mode 100644
index 000..a13d29a
--- /dev/null
+++ b/contrib/test_decoding/expected/sequence.out
@@ -0,0 +1,108 @@
+SET synchronous_commit = on;
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+ ?column? 
+--
+ init
+(1 row)
+
+CREATE SEQUENCE test_seq;
+SELECT sum(nextval('test_seq')) FROM generate_series(1, 100);
+ sum  
+--
+ 5050
+(1 row)
+
+ALTER SEQUENCE test_seq RESTART WITH 2;
+SELECT sum(nextval('test_seq')) FROM generate_series(1, 100);
+ sum  
+--
+ 5150
+(1 row)
+
+ALTER SEQUENCE test_seq INCREMENT BY 5;
+SELECT sum(nextval('test_seq')) FROM generate_series(1, 100);
+  sum  
+---
+ 35350
+(1 row)
+
+ALTER SEQUENCE test_seq CACHE 1;
+SELECT sum(nextval('test_seq')) FROM generate_series(1, 15000);
+sum
+---
+ 571552500
+(1 row)
+
+DROP SEQUENCE test_seq;
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-sequences', '1');
+ data 
+--
+ sequence test_seq advanced to 1
+ sequence test_seq advanced to 33
+ sequence test_seq advanced to 66
+ sequence test_seq advanced to 99
+ sequence test_seq advanced to 132
+ sequence test_seq advanced to 2
+ sequence test_seq advanced to 34
+ sequence test_seq advanced to 67
+ sequence test_seq advanced to 100
+ sequence test_seq advanced to 133
+ sequence test_seq advanced to 101
+ sequence test_seq advanced to 266
+ sequence test_seq advanced to 431
+ sequence test_seq advanced to 596
+ sequence test_seq advanced to 761
+ sequence test_seq advanced to 601
+ sequence test_seq advanced to 50761
+ sequence test_seq advanced to 100761
+(18 rows)
+
+CREATE SEQUENCE test2_seq MINVALUE 100 MAXVALUE 200 START 200 INCREMENT BY -1 CYCLE;
+SELECT sum(nextval('test2_seq')) FROM generate_series(1, 300);
+  sum  
+---
+ 45147
+(1 row)
+
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-sequences', '1');
+ data