Re: Skipping logical replication transactions on subscriber side

2022-06-23 Thread Robert Haas
On Wed, Jun 22, 2022 at 10:48 PM Noah Misch  wrote:
> Here's a more-verbose description of (2), with additions about what it does
> and doesn't achieve:
>
> 2. On systems where double alignment differs from int64 alignment, require
>NAMEDATALEN%8==0.  Modify the test from commits 79b716c and c1da0ac to stop
>treating "name" fields specially.  The test will still fail for AIX
>compatibility violations, but "name" columns no longer limit your field
>position candidates like they do today (today == option (1)).  Upgrading to
>v16 would require dump/reload for AIX users changing NAMEDATALEN to conform
>to the new restriction.  (I'm not sure pg_upgrade checks NAMEDATALEN
>compatibility, but it should require at least one of: same NAMEDATALEN, or
>absence of "name" columns in user tables.)

Doing this much seems pretty close to free to me. I doubt anyone
really cares about using a NAMEDATALEN value that is not a multiple of
8 on any platform. I also think there are few people who care about
AIX. The intersection must be very small indeed, or so I would think.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Skipping logical replication transactions on subscriber side

2022-06-22 Thread Noah Misch
On Wed, Jun 22, 2022 at 09:50:02AM -0400, Robert Haas wrote:
> On Wed, Jun 22, 2022 at 12:28 AM Noah Misch  wrote:
> > Here's how I rank the options, from most-preferred to least-preferred:
> >
> > 1. Put new eight-byte fields at the front of each catalog, when in doubt.
> > 2. On systems where double alignment differs from int64 alignment, require
> >NAMEDATALEN%8==0.  Upgrading to v16 would require dump/reload for AIX 
> > users
> >changing NAMEDATALEN to conform to the new restriction.
> > 3. Introduce new typalign values.  Upgrading to v16 would require 
> > dump/reload
> >for all AIX users.
> > 4. De-support AIX.
> > 5. From above, "Somehow forcing values that are sometimes 4-byte aligned and
> >sometimes 8-byte aligned to be 8-byte alignment on all platforms".
> >Upgrading to v16 would require dump/reload for all AIX users.
> > 6. Require -qalign=natural on AIX.  Upgrading to v16 would require 
> > dump/reload
> >and possible system library rebuilds for all AIX users.
> >
> > I gather (1) isn't at the top of your ranking, or you wouldn't have written
> > in.  What do you think of (2)?
> 
> (2) pleases me in the sense that it seems to inconvenience very few
> people, perhaps no one, in order to avoid inconveniencing a larger
> number of people. However, it doesn't seem sufficient.

Here's a more-verbose description of (2), with additions about what it does
and doesn't achieve:

2. On systems where double alignment differs from int64 alignment, require
   NAMEDATALEN%8==0.  Modify the test from commits 79b716c and c1da0ac to stop
   treating "name" fields specially.  The test will still fail for AIX
   compatibility violations, but "name" columns no longer limit your field
   position candidates like they do today (today == option (1)).  Upgrading to
   v16 would require dump/reload for AIX users changing NAMEDATALEN to conform
   to the new restriction.  (I'm not sure pg_upgrade checks NAMEDATALEN
   compatibility, but it should require at least one of: same NAMEDATALEN, or
   absence of "name" columns in user tables.)

> If I understand
> correctly, even a catalog that includes no NameData column can have a
> problem.

Correct.

On Wed, Jun 22, 2022 at 10:39:20AM -0400, Tom Lane wrote:
> It appears that what we've got on AIX is that typalign 'd' overstates the
> actual alignment requirement for 'double', which is safe from the SIGBUS
> angle.

On AIX, typalign='d' states the exact alignment requirement for 'double'.  It
understates the alignment requirement for int64_t.

> I don't think we want rules exactly, what we need is mechanical verification
> that the field orderings in use are safe.

Commits 79b716c and c1da0ac did that.




Re: Skipping logical replication transactions on subscriber side

2022-06-22 Thread Robert Haas
On Wed, Jun 22, 2022 at 11:01 AM Tom Lane  wrote:
> Given that we haven't run into this before, it seems like a reasonable
> bet that the problem will seldom arise.  So as long as we have a
> cross-check I'm all right with calling it good and moving on.  Expending
> a whole lot of work to improve the situation seems uncalled-for.

All right. Well, I'm on record as not liking that solution, but
obviously you can and do feel differently.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Skipping logical replication transactions on subscriber side

2022-06-22 Thread Tom Lane
Robert Haas  writes:
> On Wed, Jun 22, 2022 at 10:39 AM Tom Lane  wrote:
>> I don't especially want to
>> invent an additional typalign code that we could only test on legacy
>> platforms.

> I agree with that, but I don't think that having the developers
> enforce alignment rules by reordering catalog columns for the sake of
> legacy platforms is appealing either.

Given that we haven't run into this before, it seems like a reasonable
bet that the problem will seldom arise.  So as long as we have a
cross-check I'm all right with calling it good and moving on.  Expending
a whole lot of work to improve the situation seems uncalled-for.

When and if we get to a point where we're ready to break on-disk
compatibility for user tables, perhaps revisiting the alignment
rules would be an appropriate component of that.  I don't see that
happening in the foreseeable future, though.

regards, tom lane




Re: Skipping logical replication transactions on subscriber side

2022-06-22 Thread Robert Haas
On Wed, Jun 22, 2022 at 10:39 AM Tom Lane  wrote:
> That's a fundamental misreading of the situation.  typalign is essential
> on alignment-picky architectures, else you will get a SIGBUS fault
> when trying to fetch a multibyte value (whether it's just going to get
> stored into a Datum array is not very relevant here).

I mean, that problem is easily worked around. Maybe you think memcpy
would be a lot slower than a direct assignment, but "essential" is a
strong word.

> I concur that Noah's description of #2 is not an accurate statement
> of the rules we'd have to impose to be sure that the C structs line up
> with the actual tuple layouts.  I don't think we want rules exactly,
> what we need is mechanical verification that the field orderings in
> use are safe.  The last time I looked at this thread, what was being
> discussed was (a) re-ordering pg_subscription's columns and (b)
> adding some kind of regression test to verify that all catalogs meet
> the expectation of 'd'-aligned fields not needing alignment padding
> that an AIX compiler might choose not to insert.  That still seems
> like the most plausible answer to me.  I don't especially want to
> invent an additional typalign code that we could only test on legacy
> platforms.

I agree with that, but I don't think that having the developers
enforce alignment rules by reordering catalog columns for the sake of
legacy platforms is appealing either.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Skipping logical replication transactions on subscriber side

2022-06-22 Thread Tom Lane
[ sorry for not having tracked this thread more closely ... ]

Robert Haas  writes:
> Regarding (1), it is my opinion that the only real value of typalign
> is for system catalogs, and specifically that it lets you put the
> fields in an order that is aesthetically pleasing rather than worrying
> about alignment considerations. After all, if we just ordered the
> fields by descending alignment requirement, we could get rid of
> typalign altogether (at least, if we didn't care about backward
> compatibility). User tables would get smaller because we'd get rid of
> alignment padding, and I don't think we'd see much impact on
> performance because, for user tables, we copy the values into a datum
> array before doing anything interesting with them. So (1) seems to me
> to be conceding that typalign is unfit for the only purpose it has.

That's a fundamental misreading of the situation.  typalign is essential
on alignment-picky architectures, else you will get a SIGBUS fault
when trying to fetch a multibyte value (whether it's just going to get
stored into a Datum array is not very relevant here).

It appears that what we've got on AIX is that typalign 'd' overstates the
actual alignment requirement for 'double', which is safe from the SIGBUS
angle.  However, it is a problem for our usage with system catalogs,
where our C struct declarations may not line up with the way that a
tuple is constructed by the tuple assembly routines.

I concur that Noah's description of #2 is not an accurate statement
of the rules we'd have to impose to be sure that the C structs line up
with the actual tuple layouts.  I don't think we want rules exactly,
what we need is mechanical verification that the field orderings in
use are safe.  The last time I looked at this thread, what was being
discussed was (a) re-ordering pg_subscription's columns and (b)
adding some kind of regression test to verify that all catalogs meet
the expectation of 'd'-aligned fields not needing alignment padding
that an AIX compiler might choose not to insert.  That still seems
like the most plausible answer to me.  I don't especially want to
invent an additional typalign code that we could only test on legacy
platforms.

regards, tom lane




Re: Skipping logical replication transactions on subscriber side

2022-06-22 Thread Robert Haas
On Wed, Jun 22, 2022 at 12:28 AM Noah Misch  wrote:
> "Everything" isn't limited to PostgreSQL.  The Perl ABI exposes large structs
> to plperl; a field of type double could require the AIX user to rebuild Perl
> with the same compiler option.

Oh, that isn't so great, then.

> Here's how I rank the options, from most-preferred to least-preferred:
>
> 1. Put new eight-byte fields at the front of each catalog, when in doubt.
> 2. On systems where double alignment differs from int64 alignment, require
>NAMEDATALEN%8==0.  Upgrading to v16 would require dump/reload for AIX users
>changing NAMEDATALEN to conform to the new restriction.
> 3. Introduce new typalign values.  Upgrading to v16 would require dump/reload
>for all AIX users.
> 4. De-support AIX.
> 5. From above, "Somehow forcing values that are sometimes 4-byte aligned and
>sometimes 8-byte aligned to be 8-byte alignment on all platforms".
>Upgrading to v16 would require dump/reload for all AIX users.
> 6. Require -qalign=natural on AIX.  Upgrading to v16 would require dump/reload
>and possible system library rebuilds for all AIX users.
>
> I gather (1) isn't at the top of your ranking, or you wouldn't have written
> in.  What do you think of (2)?

(2) pleases me in the sense that it seems to inconvenience very few
people, perhaps no one, in order to avoid inconveniencing a larger
number of people. However, it doesn't seem sufficient. If I understand
correctly, even a catalog that includes no NameData column can have a
problem.

Regarding (1), it is my opinion that the only real value of typalign
is for system catalogs, and specifically that it lets you put the
fields in an order that is aesthetically pleasing rather than worrying
about alignment considerations. After all, if we just ordered the
fields by descending alignment requirement, we could get rid of
typalign altogether (at least, if we didn't care about backward
compatibility). User tables would get smaller because we'd get rid of
alignment padding, and I don't think we'd see much impact on
performance because, for user tables, we copy the values into a datum
array before doing anything interesting with them. So (1) seems to me
to be conceding that typalign is unfit for the only purpose it has.
Perhaps that's just how things are, but it doesn't seem like a good
way for things to be.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: Skipping logical replication transactions on subscriber side

2022-06-21 Thread Noah Misch
On Mon, Jun 13, 2022 at 10:25:24AM -0400, Robert Haas wrote:
> On Sun, Apr 17, 2022 at 11:22 PM Noah Misch  wrote:
> > > Yes, but it could be false positives in some cases. For instance, the
> > > column {oid, bool, XLogRecPtr} should be okay on ALIGNOF_DOUBLE == 4
> > > and 8 platforms but the new test fails.
> >
> > I'm happy with that, because the affected author should look for 
> > padding-free
> > layouts before settling on your example layout.  If the padding-free layouts
> > are all unacceptable, the author should update the expected sanity_check.out
> > to show the one row where the test "fails".

> Perhaps instead we ought to legislate that NAMEDATALEN must
> be a multiple of 8, or some such thing.
> 
> The other constraint, that typalign='d' fields must always fall on an
> 8 byte boundary, is probably less annoying in practice, but it's easy
> to imagine a future catalog running into trouble. Let's say we want to
> introduce a new catalog that has only an Oid column and a float8
> column. Perhaps with 0-3 bool or uint8 columns as well, or with any
> number of NameData columns as well. Well, the only way to satisfy this
> constraint is to put the float8 column first and the Oid column after
> it, which immediately makes it look different from every other catalog
> we have.

> AFAICS, we could do that by:
> 
> 1. De-supporting platforms that have this problem, or
> 2. Introducing new typalign values, as Noah proposed back on April 2, or
> 3. Somehow forcing values that are sometimes 4-byte aligned and
> sometimes 8-byte aligned to be 8-byte alignment on all platforms

On Mon, Jun 20, 2022 at 10:04:06AM -0400, Robert Haas wrote:
> On Mon, Jun 20, 2022 at 9:52 AM Peter Eisentraut
>  wrote:
> > That means changing the system's ABI, so in the extreme case you then
> > need to compile everything else to match as well.
> 
> I think we wouldn't want to do that in a minor release, but doing it
> in a new major release seems fine -- especially if only AIX is
> affected.

"Everything" isn't limited to PostgreSQL.  The Perl ABI exposes large structs
to plperl; a field of type double could require the AIX user to rebuild Perl
with the same compiler option.


Overall, this could be a textbook example of choosing between:

- Mild harm (unaesthetic column order) to many people.
- Considerable harm (dump/reload instead of pg_upgrade) to a small, unknown,
  possibly-zero quantity of people.

Here's how I rank the options, from most-preferred to least-preferred:

1. Put new eight-byte fields at the front of each catalog, when in doubt.
2. On systems where double alignment differs from int64 alignment, require
   NAMEDATALEN%8==0.  Upgrading to v16 would require dump/reload for AIX users
   changing NAMEDATALEN to conform to the new restriction.
3. Introduce new typalign values.  Upgrading to v16 would require dump/reload
   for all AIX users.
4. De-support AIX.
5. From above, "Somehow forcing values that are sometimes 4-byte aligned and
   sometimes 8-byte aligned to be 8-byte alignment on all platforms".
   Upgrading to v16 would require dump/reload for all AIX users.
6. Require -qalign=natural on AIX.  Upgrading to v16 would require dump/reload
   and possible system library rebuilds for all AIX users.

I gather (1) isn't at the top of your ranking, or you wouldn't have written
in.  What do you think of (2)?




Re: Skipping logical replication transactions on subscriber side

2022-06-20 Thread Robert Haas
On Mon, Jun 20, 2022 at 9:52 AM Peter Eisentraut
 wrote:
> That means changing the system's ABI, so in the extreme case you then
> need to compile everything else to match as well.

I think we wouldn't want to do that in a minor release, but doing it
in a new major release seems fine -- especially if only AIX is
affected.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Skipping logical replication transactions on subscriber side

2022-06-20 Thread Peter Eisentraut

On 16.06.22 18:35, Robert Haas wrote:

On Thu, Jun 16, 2022 at 3:26 AM Masahiko Sawada  wrote:

FWIW, looking at the manual, there might have
been a solution for AIX to specify -qalign=natural compiler option in
order to enforce the alignment of double to 8.


Well if that can work it sure seems better.


That means changing the system's ABI, so in the extreme case you then 
need to compile everything else to match as well.






Re: Skipping logical replication transactions on subscriber side

2022-06-16 Thread Robert Haas
On Thu, Jun 16, 2022 at 3:26 AM Masahiko Sawada  wrote:
> FWIW, looking at the manual, there might have
> been a solution for AIX to specify -qalign=natural compiler option in
> order to enforce the alignment of double to 8.

Well if that can work it sure seems better.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Skipping logical replication transactions on subscriber side

2022-06-16 Thread Masahiko Sawada
On Thu, Jun 16, 2022 at 2:27 AM Robert Haas  wrote:
>
> On Tue, Jun 14, 2022 at 3:54 AM Masahiko Sawada  wrote:
> > > AFAICS, we could do that by:
> > >
> > > 1. De-supporting platforms that have this problem, or
> > > 2. Introducing new typalign values, as Noah proposed back on April 2, or
> > > 3. Somehow forcing values that are sometimes 4-byte aligned and
> > > sometimes 8-byte aligned to be 8-byte alignment on all platforms
> >
> > Introducing new typalign values seems a good idea to me as it's more
> > future-proof. Will this item be for PG16, right? The main concern
> > seems that what this test case enforces would be nuisance when
> > introducing a new system catalog or a new column to the existing
> > catalog but given we're in post PG15-beta1 it is unlikely to happen in
> > PG15.
>
> I agree that we're not likely to introduce a new typalign value any
> sooner than v16. There are a couple of things that bother me about
> that solution. One is that I don't know how many different behaviors
> exist out there in the wild. If we distinguish the alignment of double
> from the alignment of int8, is that good enough, or are there other
> data types whose properties aren't necessarily the same as either of
> those?

Yeah, there might be.

> The other is that 32-bit systems are already relatively rare
> and probably will become more rare until they disappear completely. It
> doesn't seem like a ton of fun to engineer solutions to problems that
> may go away by themselves with the passage of time.

IIUC the system affected by this problem is not necessarily 32-bit
system. For instance, the hoverfly on buildfarm is 64-bit system but
was affected by this problem. According to the XLC manual[1], there is
no difference between 32-bit systems and 64-bit systems in terms of
alignment for double. FWIW, looking at the manual, there might have
been a solution for AIX to specify -qalign=natural compiler option in
order to enforce the alignment of double to 8.

Regards,

[1] https://support.scinet.utoronto.ca/Manuals/xlC++-proguide.pdf;
Table 11 on page 10.

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Skipping logical replication transactions on subscriber side

2022-06-15 Thread Robert Haas
On Tue, Jun 14, 2022 at 3:54 AM Masahiko Sawada  wrote:
> > AFAICS, we could do that by:
> >
> > 1. De-supporting platforms that have this problem, or
> > 2. Introducing new typalign values, as Noah proposed back on April 2, or
> > 3. Somehow forcing values that are sometimes 4-byte aligned and
> > sometimes 8-byte aligned to be 8-byte alignment on all platforms
>
> Introducing new typalign values seems a good idea to me as it's more
> future-proof. Will this item be for PG16, right? The main concern
> seems that what this test case enforces would be nuisance when
> introducing a new system catalog or a new column to the existing
> catalog but given we're in post PG15-beta1 it is unlikely to happen in
> PG15.

I agree that we're not likely to introduce a new typalign value any
sooner than v16. There are a couple of things that bother me about
that solution. One is that I don't know how many different behaviors
exist out there in the wild. If we distinguish the alignment of double
from the alignment of int8, is that good enough, or are there other
data types whose properties aren't necessarily the same as either of
those? The other is that 32-bit systems are already relatively rare
and probably will become more rare until they disappear completely. It
doesn't seem like a ton of fun to engineer solutions to problems that
may go away by themselves with the passage of time. On the other hand,
if the alternative is to live with this kind of ugliness for another 5
years, maybe the time it takes to craft a solution is effort well
spent.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Skipping logical replication transactions on subscriber side

2022-06-14 Thread Masahiko Sawada
On Mon, Jun 13, 2022 at 11:25 PM Robert Haas  wrote:
>
> On Sun, Apr 17, 2022 at 11:22 PM Noah Misch  wrote:
> > > Yes, but it could be false positives in some cases. For instance, the
> > > column {oid, bool, XLogRecPtr} should be okay on ALIGNOF_DOUBLE == 4
> > > and 8 platforms but the new test fails.
> >
> > I'm happy with that, because the affected author should look for 
> > padding-free
> > layouts before settling on your example layout.  If the padding-free layouts
> > are all unacceptable, the author should update the expected sanity_check.out
> > to show the one row where the test "fails".
>
> I realize that it was necessary to get something committed quickly
> here to unbreak the buildfarm, but this is really a mess. As I
> understand it, the problem here is that typalign='d' is either 4 bytes
> or 8 depending on how the 'double' type is aligned on that platform,
> but we use that typalign value also for some other data types that may
> not be aligned in the same way as 'double'. Consequently, it's
> possible to have a situation where the behavior of the C compiler
> diverges from the behavior of heap_form_tuple(). To avoid that, we
> need every catalog column that uses typalign=='d' to begin on an
> 8-byte boundary. We also want all such columns to occur before the
> first NameData column in the catalog, to guard against the possibility
> that NAMEDATALEN has been redefined to an odd value. I think this set
> of constraints is a nuisance and that it's mostly good luck we haven't
> run into any really awkward problems here so far.
>
> In many of our catalogs, the first member is an OID and the second
> member of the struct is of type NameData: pg_namespace, pg_class,
> pg_proc, etc. That common design pattern is in direct contradiction to
> the desires of this test case. As soon as someone wants to add a
> typalign='d' member to any of those system catalogs, the struct layout
> is going to have to get shuffled around -- and then it will look
> different from all the other ones. Or else we'd have to rearrange them
> all to move all the NameData columns to the end. I feel like it's
> weird to introduce a test case that so obviously flies in the face of
> how catalog layout has been done up to this point, especially for the
> sake of a hypothetical user who want to set NAMEDATALEN to an odd
> number. I doubt such scenarios have been thoroughly tested, or ever
> will be. Perhaps instead we ought to legislate that NAMEDATALEN must
> be a multiple of 8, or some such thing.
>
> The other constraint, that typalign='d' fields must always fall on an
> 8 byte boundary, is probably less annoying in practice, but it's easy
> to imagine a future catalog running into trouble. Let's say we want to
> introduce a new catalog that has only an Oid column and a float8
> column. Perhaps with 0-3 bool or uint8 columns as well, or with any
> number of NameData columns as well. Well, the only way to satisfy this
> constraint is to put the float8 column first and the Oid column after
> it, which immediately makes it look different from every other catalog
> we have. It's hard to feel like that would be a good solution here. I
> think we ought to try to engineer a solution where heap_form_tuple()
> is going to do the same thing as the C compiler without the sorts of
> extra rules that this test case enforces.

These seem to be valid concerns.

> AFAICS, we could do that by:
>
> 1. De-supporting platforms that have this problem, or
> 2. Introducing new typalign values, as Noah proposed back on April 2, or
> 3. Somehow forcing values that are sometimes 4-byte aligned and
> sometimes 8-byte aligned to be 8-byte alignment on all platforms

Introducing new typalign values seems a good idea to me as it's more
future-proof. Will this item be for PG16, right? The main concern
seems that what this test case enforces would be nuisance when
introducing a new system catalog or a new column to the existing
catalog but given we're in post PG15-beta1 it is unlikely to happen in
PG15.


Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Skipping logical replication transactions on subscriber side

2022-06-13 Thread Robert Haas
On Sun, Apr 17, 2022 at 11:22 PM Noah Misch  wrote:
> > Yes, but it could be false positives in some cases. For instance, the
> > column {oid, bool, XLogRecPtr} should be okay on ALIGNOF_DOUBLE == 4
> > and 8 platforms but the new test fails.
>
> I'm happy with that, because the affected author should look for padding-free
> layouts before settling on your example layout.  If the padding-free layouts
> are all unacceptable, the author should update the expected sanity_check.out
> to show the one row where the test "fails".

I realize that it was necessary to get something committed quickly
here to unbreak the buildfarm, but this is really a mess. As I
understand it, the problem here is that typalign='d' is either 4 bytes
or 8 depending on how the 'double' type is aligned on that platform,
but we use that typalign value also for some other data types that may
not be aligned in the same way as 'double'. Consequently, it's
possible to have a situation where the behavior of the C compiler
diverges from the behavior of heap_form_tuple(). To avoid that, we
need every catalog column that uses typalign=='d' to begin on an
8-byte boundary. We also want all such columns to occur before the
first NameData column in the catalog, to guard against the possibility
that NAMEDATALEN has been redefined to an odd value. I think this set
of constraints is a nuisance and that it's mostly good luck we haven't
run into any really awkward problems here so far.

In many of our catalogs, the first member is an OID and the second
member of the struct is of type NameData: pg_namespace, pg_class,
pg_proc, etc. That common design pattern is in direct contradiction to
the desires of this test case. As soon as someone wants to add a
typalign='d' member to any of those system catalogs, the struct layout
is going to have to get shuffled around -- and then it will look
different from all the other ones. Or else we'd have to rearrange them
all to move all the NameData columns to the end. I feel like it's
weird to introduce a test case that so obviously flies in the face of
how catalog layout has been done up to this point, especially for the
sake of a hypothetical user who want to set NAMEDATALEN to an odd
number. I doubt such scenarios have been thoroughly tested, or ever
will be. Perhaps instead we ought to legislate that NAMEDATALEN must
be a multiple of 8, or some such thing.

The other constraint, that typalign='d' fields must always fall on an
8 byte boundary, is probably less annoying in practice, but it's easy
to imagine a future catalog running into trouble. Let's say we want to
introduce a new catalog that has only an Oid column and a float8
column. Perhaps with 0-3 bool or uint8 columns as well, or with any
number of NameData columns as well. Well, the only way to satisfy this
constraint is to put the float8 column first and the Oid column after
it, which immediately makes it look different from every other catalog
we have. It's hard to feel like that would be a good solution here. I
think we ought to try to engineer a solution where heap_form_tuple()
is going to do the same thing as the C compiler without the sorts of
extra rules that this test case enforces.

AFAICS, we could do that by:

1. De-supporting platforms that have this problem, or
2. Introducing new typalign values, as Noah proposed back on April 2, or
3. Somehow forcing values that are sometimes 4-byte aligned and
sometimes 8-byte aligned to be 8-byte alignment on all platforms

I also don't like the fact that the test case doesn't even catch
exactly the problematic set of cases, but rather a superset, leaving
it up to future patch authors to make a correct judgment about whether
a certain new column can be listed as an expected output of the test
case or whether the catalog representation must be changed. The idea
that we'll reliably get that right might be optimistic. Again, I don't
mean to say that this is the fault of this test case since, without
the test case, we'd have no idea that there was even a potential
problem, which would not be better. But it feels to me like we're
hacking around the real problem instead of fixing it, and it seems to
me that we should try to do better.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Skipping logical replication transactions on subscriber side

2022-04-17 Thread Masahiko Sawada
On Mon, Apr 18, 2022 at 12:22 PM Noah Misch  wrote:
>
> On Mon, Apr 18, 2022 at 10:45:50AM +0900, Masahiko Sawada wrote:
> > On Fri, Apr 15, 2022 at 4:26 PM Noah Misch  wrote:
> > > On Thu, Apr 07, 2022 at 08:39:58PM +0900, Masahiko Sawada wrote:
> > > > On Thu, Apr 7, 2022 at 7:28 PM Amit Kapila  
> > > > wrote:
> > > > > On Thu, Apr 7, 2022 at 8:25 AM Amit Kapila  
> > > > > wrote:
> > > > > > I'll take care of this today. I think we can mark the new function
> > > > > > get_column_offset() being introduced by this patch as parallel safe.
> > > > >
> > > > > Pushed.
> > > >
> > > > Thanks!
> > >
> > > I took a closer look at the test case.  The "get_column_offset(coltypes) 
> > > % 8"
> > > part would have caught the problem only when run on an ALIGNOF_DOUBLE==4
> > > platform.  Instead of testing the start of the typalign='d' column, let's 
> > > test
> > > the first offset beyond the previous column.  The difference between 
> > > those two
> > > values depends on ALIGNOF_DOUBLE.
> >
> > Yes, but it could be false positives in some cases. For instance, the
> > column {oid, bool, XLogRecPtr} should be okay on ALIGNOF_DOUBLE == 4
> > and 8 platforms but the new test fails.
>
> I'm happy with that, because the affected author should look for padding-free
> layouts before settling on your example layout.  If the padding-free layouts
> are all unacceptable, the author should update the expected sanity_check.out
> to show the one row where the test "fails".

That makes sense.

Regard,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Skipping logical replication transactions on subscriber side

2022-04-17 Thread Noah Misch
On Mon, Apr 18, 2022 at 10:45:50AM +0900, Masahiko Sawada wrote:
> On Fri, Apr 15, 2022 at 4:26 PM Noah Misch  wrote:
> > On Thu, Apr 07, 2022 at 08:39:58PM +0900, Masahiko Sawada wrote:
> > > On Thu, Apr 7, 2022 at 7:28 PM Amit Kapila  
> > > wrote:
> > > > On Thu, Apr 7, 2022 at 8:25 AM Amit Kapila  
> > > > wrote:
> > > > > I'll take care of this today. I think we can mark the new function
> > > > > get_column_offset() being introduced by this patch as parallel safe.
> > > >
> > > > Pushed.
> > >
> > > Thanks!
> >
> > I took a closer look at the test case.  The "get_column_offset(coltypes) % 
> > 8"
> > part would have caught the problem only when run on an ALIGNOF_DOUBLE==4
> > platform.  Instead of testing the start of the typalign='d' column, let's 
> > test
> > the first offset beyond the previous column.  The difference between those 
> > two
> > values depends on ALIGNOF_DOUBLE.
> 
> Yes, but it could be false positives in some cases. For instance, the
> column {oid, bool, XLogRecPtr} should be okay on ALIGNOF_DOUBLE == 4
> and 8 platforms but the new test fails.

I'm happy with that, because the affected author should look for padding-free
layouts before settling on your example layout.  If the padding-free layouts
are all unacceptable, the author should update the expected sanity_check.out
to show the one row where the test "fails".




Re: Skipping logical replication transactions on subscriber side

2022-04-17 Thread Masahiko Sawada
On Fri, Apr 15, 2022 at 4:26 PM Noah Misch  wrote:
>
> On Thu, Apr 07, 2022 at 08:39:58PM +0900, Masahiko Sawada wrote:
> > On Thu, Apr 7, 2022 at 7:28 PM Amit Kapila  wrote:
> > > On Thu, Apr 7, 2022 at 8:25 AM Amit Kapila  
> > > wrote:
> > > > I'll take care of this today. I think we can mark the new function
> > > > get_column_offset() being introduced by this patch as parallel safe.
> > >
> > > Pushed.
> >
> > Thanks!
>
> I took a closer look at the test case.  The "get_column_offset(coltypes) % 8"
> part would have caught the problem only when run on an ALIGNOF_DOUBLE==4
> platform.  Instead of testing the start of the typalign='d' column, let's test
> the first offset beyond the previous column.  The difference between those two
> values depends on ALIGNOF_DOUBLE.

Yes, but it could be false positives in some cases. For instance, the
column {oid, bool, XLogRecPtr} should be okay on ALIGNOF_DOUBLE == 4
and 8 platforms but the new test fails.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Skipping logical replication transactions on subscriber side

2022-04-15 Thread Noah Misch
On Thu, Apr 07, 2022 at 08:39:58PM +0900, Masahiko Sawada wrote:
> On Thu, Apr 7, 2022 at 7:28 PM Amit Kapila  wrote:
> > On Thu, Apr 7, 2022 at 8:25 AM Amit Kapila  wrote:
> > > I'll take care of this today. I think we can mark the new function
> > > get_column_offset() being introduced by this patch as parallel safe.
> >
> > Pushed.
> 
> Thanks!

I took a closer look at the test case.  The "get_column_offset(coltypes) % 8"
part would have caught the problem only when run on an ALIGNOF_DOUBLE==4
platform.  Instead of testing the start of the typalign='d' column, let's test
the first offset beyond the previous column.  The difference between those two
values depends on ALIGNOF_DOUBLE.  While there, ignore typbyval; it doesn't
affect disk tuple layout, so this test shouldn't care.  I plan to push the
attached patch.
Author: Noah Misch 
Commit: Noah Misch 

Test ALIGNOF_DOUBLE==4 compatibility under ALIGNOF_DOUBLE==8.

Today's test case detected alignment problems only when executing on
AIX.  This change lets popular platforms detect the same problems.

Reviewed by FIXME.

Discussion: https://postgr.es/m/FIXME

diff --git a/src/test/regress/expected/sanity_check.out 
b/src/test/regress/expected/sanity_check.out
index a2faefb..c5c675b 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -39,18 +39,18 @@ WITH check_columns AS (
SELECT t.oid
 FROM pg_type t JOIN pg_attribute pa ON t.oid = pa.atttypid
 WHERE pa.attrelid = a.attrelid AND
-  pa.attnum > 0 AND pa.attnum <= a.attnum
+  pa.attnum > 0 AND pa.attnum < a.attnum
 ORDER BY pa.attnum) AS coltypes
  FROM pg_attribute a JOIN pg_class c ON c.oid = attrelid
   JOIN pg_namespace n ON c.relnamespace = n.oid
  WHERE attalign = 'd' AND relkind = 'r' AND
   attnotnull AND attlen <> -1 AND n.nspname = 'pg_catalog'
 )
-SELECT relname, attname, coltypes, get_column_offset(coltypes)
+SELECT relname, attname, coltypes, get_columns_length(coltypes)
  FROM check_columns
- WHERE get_column_offset(coltypes) % 8 != 0 OR
+ WHERE get_columns_length(coltypes) % 8 != 0 OR
'name'::regtype::oid = ANY(coltypes);
- relname | attname | coltypes | get_column_offset 
--+-+--+---
+ relname | attname | coltypes | get_columns_length 
+-+-+--+
 (0 rows)
 
diff --git a/src/test/regress/expected/test_setup.out 
b/src/test/regress/expected/test_setup.out
index 8b8ba7d..391b36d 100644
--- a/src/test/regress/expected/test_setup.out
+++ b/src/test/regress/expected/test_setup.out
@@ -206,7 +206,7 @@ CREATE FUNCTION ttdummy ()
 RETURNS trigger
 AS :'regresslib'
 LANGUAGE C;
-CREATE FUNCTION get_column_offset (oid[])
+CREATE FUNCTION get_columns_length(oid[])
 RETURNS int
 AS :'regresslib'
 LANGUAGE C STRICT STABLE PARALLEL SAFE;
diff --git a/src/test/regress/regress.c b/src/test/regress/regress.c
index 8b0c2d9..ade4b51 100644
--- a/src/test/regress/regress.c
+++ b/src/test/regress/regress.c
@@ -1219,12 +1219,12 @@ binary_coercible(PG_FUNCTION_ARGS)
 }
 
 /*
- * Return the column offset of the last data in the given array of
- * data types.  The input data types must be fixed-length data types.
+ * Return the length of the portion of a tuple consisting of the given array
+ * of data types.  The input data types must be fixed-length data types.
  */
-PG_FUNCTION_INFO_V1(get_column_offset);
+PG_FUNCTION_INFO_V1(get_columns_length);
 Datum
-get_column_offset(PG_FUNCTION_ARGS)
+get_columns_length(PG_FUNCTION_ARGS)
 {
ArrayType   *ta = PG_GETARG_ARRAYTYPE_P(0);
Oid *type_oids;
@@ -1249,14 +1249,10 @@ get_column_offset(PG_FUNCTION_ARGS)
get_typlenbyvalalign(typeoid, , , );
 
/* the data type must be fixed-length */
-   if (!(typbyval || (typlen > 0)))
+   if (typlen < 0)
elog(ERROR, "type %u is not fixed-length data type", 
typeoid);
 
-   column_offset = att_align_nominal(column_offset, typalign);
-
-   /* not include the last type size */
-   if (i != (ntypes - 1))
-   column_offset += typlen;
+   column_offset = att_align_nominal(column_offset + typlen, 
typalign);
}
 
PG_RETURN_INT32(column_offset);
diff --git a/src/test/regress/sql/sanity_check.sql 
b/src/test/regress/sql/sanity_check.sql
index c70ff78..7f338d1 100644
--- a/src/test/regress/sql/sanity_check.sql
+++ b/src/test/regress/sql/sanity_check.sql
@@ -34,14 +34,14 @@ WITH check_columns AS (
SELECT t.oid
 FROM pg_type t JOIN pg_attribute pa ON t.oid = pa.atttypid
 WHERE pa.attrelid = a.attrelid AND
-  pa.attnum > 0 AND pa.attnum <= a.attnum
+  pa.attnum > 0 AND pa.attnum < a.attnum
 ORDER BY pa.attnum) AS coltypes
  FROM pg_attribute a JOIN pg_class c ON c.oid = attrelid
  

Re: Skipping logical replication transactions on subscriber side

2022-04-07 Thread Masahiko Sawada
On Thu, Apr 7, 2022 at 7:28 PM Amit Kapila  wrote:
>
> On Thu, Apr 7, 2022 at 8:25 AM Amit Kapila  wrote:
> >
> > I'll take care of this today. I think we can mark the new function
> > get_column_offset() being introduced by this patch as parallel safe.
> >
>
> Pushed.

Thanks!

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Skipping logical replication transactions on subscriber side

2022-04-07 Thread Amit Kapila
On Thu, Apr 7, 2022 at 8:25 AM Amit Kapila  wrote:
>
> I'll take care of this today. I think we can mark the new function
> get_column_offset() being introduced by this patch as parallel safe.
>

Pushed.

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2022-04-06 Thread Amit Kapila
On Wed, Apr 6, 2022 at 10:01 AM Amit Kapila  wrote:
>
> On Wed, Apr 6, 2022 at 9:25 AM Masahiko Sawada  wrote:
> >
> > On Wed, Apr 6, 2022 at 12:21 PM Noah Misch  wrote:
> >
> > Right. I've attached an updated patch.
> >
>
> Thanks, this looks good to me as well. Noah, would you like to commit it?
>

I'll take care of this today. I think we can mark the new function
get_column_offset() being introduced by this patch as parallel safe.

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2022-04-06 Thread Peter Eisentraut

On 02.04.22 10:13, Noah Misch wrote:

uint64 and pg_lsn use TYPALIGN_DOUBLE.  For AIX, they really need a typalign
corresponding to ALIGNOF_LONG.  Hence, the C struct layout doesn't match the
tuple layout.  Columns potentially affected:

[local] test=*# select attrelid::regclass, attname from pg_attribute a join pg_class 
c on c.oid = attrelid where attalign = 'd' and relkind = 'r' and attnotnull and 
attlen <> -1;
 attrelid │   attname
─┼──
  pg_sequence │ seqstart
  pg_sequence │ seqincrement
  pg_sequence │ seqmax
  pg_sequence │ seqmin
  pg_sequence │ seqcache
  pg_subscription │ subskiplsn
(6 rows)

The pg_sequence fields evade trouble, because there's exactly eight bytes (two
oids) before them.


Yes, we carefully did this when we ran into this the last time.  See 
 
and commit f3b421da5f4addc95812b9db05a24972b8fd9739.





Re: Skipping logical replication transactions on subscriber side

2022-04-05 Thread Amit Kapila
On Wed, Apr 6, 2022 at 9:25 AM Masahiko Sawada  wrote:
>
> On Wed, Apr 6, 2022 at 12:21 PM Noah Misch  wrote:
>
> Right. I've attached an updated patch.
>

Thanks, this looks good to me as well. Noah, would you like to commit it?

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2022-04-05 Thread Masahiko Sawada
On Wed, Apr 6, 2022 at 12:21 PM Noah Misch  wrote:
>
> On Tue, Apr 05, 2022 at 04:41:28PM +0900, Masahiko Sawada wrote:
> > On Tue, Apr 5, 2022 at 4:08 PM Noah Misch  wrote:
> > > On Tue, Apr 05, 2022 at 03:05:10PM +0900, Masahiko Sawada wrote:
> > > > I've attached an updated patch. The patch includes a regression test
> > > > to detect the new violation as we discussed. I've confirmed that
> > > > Cirrus CI tests pass. Please confirm on AIX and review the patch.
> > >
> > > When the context of a "git grep skiplsn" match involves several struct 
> > > fields
> > > in struct order, please change to the new order.  In other words, do for 
> > > all
> > > "git grep skiplsn" matches what the v2 patch does in GetSubscription().  
> > > The
> > > v2 patch does not do this for catalogs.sgml, but it ought to.  I didn't 
> > > check
> > > all the other "git grep" matches; please do so.
> >
> > Oops, I missed many places. I checked all "git grep" matches and fixed them.
>
> > --- a/src/backend/catalog/system_views.sql
> > +++ b/src/backend/catalog/system_views.sql
> > @@ -1285,8 +1285,8 @@ REVOKE ALL ON pg_replication_origin_status FROM 
> > public;
> >
> >  -- All columns of pg_subscription except subconninfo are publicly readable.
> >  REVOKE ALL ON pg_subscription FROM public;
> > -GRANT SELECT (oid, subdbid, subname, subowner, subenabled, subbinary,
> > -  substream, subtwophasestate, subdisableonerr, subskiplsn, 
> > subslotname,
> > +GRANT SELECT (oid, subdbid, subname, subskiplsn, subowner, subenabled,
> > +  subbinary, substream, subtwophasestate, subdisableonerr, 
> > subslotname,
> >subsynccommit, subpublications)
>
> subskiplsn comes before subname.

Right. I've attached an updated patch.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


make_subskiplsn_aligned_v4.patch
Description: Binary data


Re: Skipping logical replication transactions on subscriber side

2022-04-05 Thread Noah Misch
On Tue, Apr 05, 2022 at 04:41:28PM +0900, Masahiko Sawada wrote:
> On Tue, Apr 5, 2022 at 4:08 PM Noah Misch  wrote:
> > On Tue, Apr 05, 2022 at 03:05:10PM +0900, Masahiko Sawada wrote:
> > > I've attached an updated patch. The patch includes a regression test
> > > to detect the new violation as we discussed. I've confirmed that
> > > Cirrus CI tests pass. Please confirm on AIX and review the patch.
> >
> > When the context of a "git grep skiplsn" match involves several struct 
> > fields
> > in struct order, please change to the new order.  In other words, do for all
> > "git grep skiplsn" matches what the v2 patch does in GetSubscription().  The
> > v2 patch does not do this for catalogs.sgml, but it ought to.  I didn't 
> > check
> > all the other "git grep" matches; please do so.
> 
> Oops, I missed many places. I checked all "git grep" matches and fixed them.

> --- a/src/backend/catalog/system_views.sql
> +++ b/src/backend/catalog/system_views.sql
> @@ -1285,8 +1285,8 @@ REVOKE ALL ON pg_replication_origin_status FROM public;
>  
>  -- All columns of pg_subscription except subconninfo are publicly readable.
>  REVOKE ALL ON pg_subscription FROM public;
> -GRANT SELECT (oid, subdbid, subname, subowner, subenabled, subbinary,
> -  substream, subtwophasestate, subdisableonerr, subskiplsn, 
> subslotname,
> +GRANT SELECT (oid, subdbid, subname, subskiplsn, subowner, subenabled,
> +  subbinary, substream, subtwophasestate, subdisableonerr, 
> subslotname,
>subsynccommit, subpublications)

subskiplsn comes before subname.  Other than that, this looks done.  I
recommend committing it with that change.




Re: Skipping logical replication transactions on subscriber side

2022-04-05 Thread Masahiko Sawada
On Tue, Apr 5, 2022 at 4:08 PM Noah Misch  wrote:
>
> On Tue, Apr 05, 2022 at 03:05:10PM +0900, Masahiko Sawada wrote:
> > I've attached an updated patch. The patch includes a regression test
> > to detect the new violation as we discussed. I've confirmed that
> > Cirrus CI tests pass. Please confirm on AIX and review the patch.
>
> When the context of a "git grep skiplsn" match involves several struct fields
> in struct order, please change to the new order.  In other words, do for all
> "git grep skiplsn" matches what the v2 patch does in GetSubscription().  The
> v2 patch does not do this for catalogs.sgml, but it ought to.  I didn't check
> all the other "git grep" matches; please do so.

Oops, I missed many places. I checked all "git grep" matches and fixed them.


Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 7f4f79d1b5..d52aebaf66 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7750,6 +7750,16 @@ SCRAM-SHA-256$iteration count:
   
  
 
+ 
+  
+   subskiplsn pg_lsn
+  
+  
+   Finish LSN of the transaction whose changes are to be skipped, if a valid
+   LSN; otherwise 0/0.
+  
+ 
+
  
   
subname name
@@ -7820,16 +7830,6 @@ SCRAM-SHA-256$iteration count:
   
  
 
- 
-  
-   subskiplsn pg_lsn
-  
-  
-   Finish LSN of the transaction whose changes are to be skipped, if a valid
-   LSN; otherwise 0/0.
-  
- 
-
  
   
subconninfo text
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 0ff0982f7b..add51caadf 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -63,6 +63,7 @@ GetSubscription(Oid subid, bool missing_ok)
 	sub = (Subscription *) palloc(sizeof(Subscription));
 	sub->oid = subid;
 	sub->dbid = subform->subdbid;
+	sub->skiplsn = subform->subskiplsn;
 	sub->name = pstrdup(NameStr(subform->subname));
 	sub->owner = subform->subowner;
 	sub->enabled = subform->subenabled;
@@ -70,7 +71,6 @@ GetSubscription(Oid subid, bool missing_ok)
 	sub->stream = subform->substream;
 	sub->twophasestate = subform->subtwophasestate;
 	sub->disableonerr = subform->subdisableonerr;
-	sub->skiplsn = subform->subskiplsn;
 
 	/* Get conninfo */
 	datum = SysCacheGetAttr(SUBSCRIPTIONOID,
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 9eaa51df29..5383b7d3d0 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1285,8 +1285,8 @@ REVOKE ALL ON pg_replication_origin_status FROM public;
 
 -- All columns of pg_subscription except subconninfo are publicly readable.
 REVOKE ALL ON pg_subscription FROM public;
-GRANT SELECT (oid, subdbid, subname, subowner, subenabled, subbinary,
-  substream, subtwophasestate, subdisableonerr, subskiplsn, subslotname,
+GRANT SELECT (oid, subdbid, subname, subskiplsn, subowner, subenabled,
+  subbinary, substream, subtwophasestate, subdisableonerr, subslotname,
   subsynccommit, subpublications)
 ON pg_subscription TO public;
 
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 85dacbe93d..f7b62dd94c 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -596,6 +596,7 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 			   Anum_pg_subscription_oid);
 	values[Anum_pg_subscription_oid - 1] = ObjectIdGetDatum(subid);
 	values[Anum_pg_subscription_subdbid - 1] = ObjectIdGetDatum(MyDatabaseId);
+	values[Anum_pg_subscription_subskiplsn - 1] = LSNGetDatum(InvalidXLogRecPtr);
 	values[Anum_pg_subscription_subname - 1] =
 		DirectFunctionCall1(namein, CStringGetDatum(stmt->subname));
 	values[Anum_pg_subscription_subowner - 1] = ObjectIdGetDatum(owner);
@@ -607,7 +608,6 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 	 LOGICALREP_TWOPHASE_STATE_PENDING :
 	 LOGICALREP_TWOPHASE_STATE_DISABLED);
 	values[Anum_pg_subscription_subdisableonerr - 1] = BoolGetDatum(opts.disableonerr);
-	values[Anum_pg_subscription_subskiplsn - 1] = LSNGetDatum(InvalidXLogRecPtr);
 	values[Anum_pg_subscription_subconninfo - 1] =
 		CStringGetTextDatum(conninfo);
 	if (opts.slot_name)
diff --git a/src/include/catalog/pg_subscription.h b/src/include/catalog/pg_subscription.h
index 599c2e4422..f006a92612 100644
--- a/src/include/catalog/pg_subscription.h
+++ b/src/include/catalog/pg_subscription.h
@@ -54,6 +54,10 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) BKI_SHARED_RELATION BKI_ROW
 
 	Oid			subdbid BKI_LOOKUP(pg_database);	/* Database the
 	 * subscription is in. */
+
+	XLogRecPtr	subskiplsn;		/* All changes finished at this LSN are
+ * skipped */
+
 	NameData	subname;		/* Name of 

Re: Skipping logical replication transactions on subscriber side

2022-04-05 Thread Noah Misch
On Tue, Apr 05, 2022 at 03:05:10PM +0900, Masahiko Sawada wrote:
> I've attached an updated patch. The patch includes a regression test
> to detect the new violation as we discussed. I've confirmed that
> Cirrus CI tests pass. Please confirm on AIX and review the patch.

When the context of a "git grep skiplsn" match involves several struct fields
in struct order, please change to the new order.  In other words, do for all
"git grep skiplsn" matches what the v2 patch does in GetSubscription().  The
v2 patch does not do this for catalogs.sgml, but it ought to.  I didn't check
all the other "git grep" matches; please do so.

The changes present in this patch all look good.




Re: Skipping logical replication transactions on subscriber side

2022-04-05 Thread Masahiko Sawada
On Tue, Apr 5, 2022 at 12:38 PM Masahiko Sawada  wrote:
>
> On Tue, Apr 5, 2022 at 10:46 AM Noah Misch  wrote:
> >
> > On Tue, Apr 05, 2022 at 10:13:06AM +0900, Masahiko Sawada wrote:
> > > On Tue, Apr 5, 2022 at 9:21 AM Noah Misch  wrote:
> > > > On Mon, Apr 04, 2022 at 06:55:45PM +0900, Masahiko Sawada wrote:
> > > > > On Mon, Apr 4, 2022 at 3:26 PM Noah Misch  wrote:
> > > > > > On Mon, Apr 04, 2022 at 08:20:08AM +0530, Amit Kapila wrote:
> > > > > > > How about a comment like: "It has to be kept at 8-byte alignment
> > > > > > > boundary so as to be accessed directly via C struct as it uses
> > > > > > > TYPALIGN_DOUBLE for storage which has 4-byte alignment on 
> > > > > > > platforms
> > > > > > > like AIX."? Can you please suggest a better comment if you don't 
> > > > > > > like
> > > > > > > this one?
> > > > > >
> > > > > > I'd write it like this, though I'm not sure it's an improvement on 
> > > > > > your words:
> > > > > >
> > > > > >   When ALIGNOF_DOUBLE==4 (e.g. AIX), the C ABI may impose 8-byte 
> > > > > > alignment on
> > > > > >   some of the C types that correspond to TYPALIGN_DOUBLE SQL types. 
> > > > > >  To ensure
> > > > > >   catalog C struct layout matches catalog tuple layout, arrange for 
> > > > > > the tuple
> > > > > >   offset of each fixed-width, attalign='d' catalog column to be 
> > > > > > divisible by 8
> > > > > >   unconditionally.  Keep such columns before the first NameData 
> > > > > > column of the
> > > > > >   catalog, since packagers can override NAMEDATALEN to an odd 
> > > > > > number.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > >
> > > > > > The best place for such a comment would be in one of
> > > > > > src/test/regress/sql/*sanity*.sql, next to a test written to detect 
> > > > > > new
> > > > > > violations.
> > > > >
> > > > > Agreed.
> > > > >
> > > > > IIUC in the new test, we would need a new SQL function to calculate
> > > > > the offset of catalog columns including padding, is that right? Or do
> > > > > you have an idea to do that by using existing functionality?
> > > >
> > > > Something like this:
> > > >
> > > > select
> > > >   attrelid::regclass,
> > > >   attname,
> > > >   array(select typname
> > > > from pg_type t join pg_attribute pa on t.oid = pa.atttypid
> > > > where pa.attrelid = a.attrelid and pa.attnum > 0 and pa.attnum 
> > > > < a.attnum order by pa.attnum) AS types_before,
> > > >   (select sum(attlen)
> > > >from pg_type t join pg_attribute pa on t.oid = pa.atttypid
> > > >where pa.attrelid = a.attrelid and pa.attnum > 0 and pa.attnum < 
> > > > a.attnum) AS len_before
> > > > from pg_attribute a
> > > > join pg_class c on c.oid = attrelid
> > > > where attalign = 'd' and relkind = 'r' and attnotnull and attlen <> -1
> > > > order by attrelid::regclass::text, attnum;
> > > > attrelid │   attname│types_before   
> > > >   │ len_before
> > > > ─┼──┼─┼
> > > >  pg_sequence │ seqstart │ {oid,oid} 
> > > >   │  8
> > > >  pg_sequence │ seqincrement │ {oid,oid,int8}
> > > >   │ 16
> > > >  pg_sequence │ seqmax   │ {oid,oid,int8,int8}   
> > > >   │ 24
> > > >  pg_sequence │ seqmin   │ {oid,oid,int8,int8,int8}  
> > > >   │ 32
> > > >  pg_sequence │ seqcache │ {oid,oid,int8,int8,int8,int8} 
> > > >   │ 40
> > > >  pg_subscription │ subskiplsn   │ 
> > > > {oid,oid,name,oid,bool,bool,bool,char,bool} │ 81
> > > > (6 rows)
> > > >
> > > > That doesn't count padding, but hazardous column changes will cause a 
> > > > diff in
> > > > the output.
> > >
> > > Yes, in this case, we can detect the violated column order even
> > > without considering padding. On the other hand, I think this
> > > calculation could not detect some patterns of order. For instance,
> > > suppose the column order is {oid, bool, bool, oid, bool, bool, oid,
> > > int8}, the len_before is 16 but offset of int8 column including
> > > padding is 20 on ALIGNOF_DOUBLE==4 environment.
> >
> > Correct.  Feel free to make it more precise.  If you do want to add a
> > function, it could be a regress.c function rather than an always-installed
> > part of PostgreSQL.  Again, getting the buildfarm green is a priority; we 
> > can
> > always add tests later.
>
> Agreed. I'll update and submit the patch as soon as possible.
>

I've attached an updated patch. The patch includes a regression test
to detect the new violation as we discussed. I've confirmed that
Cirrus CI tests pass. Please confirm on AIX and review the patch.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


make_subskiplsn_aligned_v2.patch
Description: Binary data


Re: Skipping logical replication transactions on subscriber side

2022-04-04 Thread Masahiko Sawada
On Tue, Apr 5, 2022 at 10:46 AM Noah Misch  wrote:
>
> On Tue, Apr 05, 2022 at 10:13:06AM +0900, Masahiko Sawada wrote:
> > On Tue, Apr 5, 2022 at 9:21 AM Noah Misch  wrote:
> > > On Mon, Apr 04, 2022 at 06:55:45PM +0900, Masahiko Sawada wrote:
> > > > On Mon, Apr 4, 2022 at 3:26 PM Noah Misch  wrote:
> > > > > On Mon, Apr 04, 2022 at 08:20:08AM +0530, Amit Kapila wrote:
> > > > > > How about a comment like: "It has to be kept at 8-byte alignment
> > > > > > boundary so as to be accessed directly via C struct as it uses
> > > > > > TYPALIGN_DOUBLE for storage which has 4-byte alignment on platforms
> > > > > > like AIX."? Can you please suggest a better comment if you don't 
> > > > > > like
> > > > > > this one?
> > > > >
> > > > > I'd write it like this, though I'm not sure it's an improvement on 
> > > > > your words:
> > > > >
> > > > >   When ALIGNOF_DOUBLE==4 (e.g. AIX), the C ABI may impose 8-byte 
> > > > > alignment on
> > > > >   some of the C types that correspond to TYPALIGN_DOUBLE SQL types.  
> > > > > To ensure
> > > > >   catalog C struct layout matches catalog tuple layout, arrange for 
> > > > > the tuple
> > > > >   offset of each fixed-width, attalign='d' catalog column to be 
> > > > > divisible by 8
> > > > >   unconditionally.  Keep such columns before the first NameData 
> > > > > column of the
> > > > >   catalog, since packagers can override NAMEDATALEN to an odd number.
> > > >
> > > > Thanks!
> > > >
> > > > >
> > > > > The best place for such a comment would be in one of
> > > > > src/test/regress/sql/*sanity*.sql, next to a test written to detect 
> > > > > new
> > > > > violations.
> > > >
> > > > Agreed.
> > > >
> > > > IIUC in the new test, we would need a new SQL function to calculate
> > > > the offset of catalog columns including padding, is that right? Or do
> > > > you have an idea to do that by using existing functionality?
> > >
> > > Something like this:
> > >
> > > select
> > >   attrelid::regclass,
> > >   attname,
> > >   array(select typname
> > > from pg_type t join pg_attribute pa on t.oid = pa.atttypid
> > > where pa.attrelid = a.attrelid and pa.attnum > 0 and pa.attnum < 
> > > a.attnum order by pa.attnum) AS types_before,
> > >   (select sum(attlen)
> > >from pg_type t join pg_attribute pa on t.oid = pa.atttypid
> > >where pa.attrelid = a.attrelid and pa.attnum > 0 and pa.attnum < 
> > > a.attnum) AS len_before
> > > from pg_attribute a
> > > join pg_class c on c.oid = attrelid
> > > where attalign = 'd' and relkind = 'r' and attnotnull and attlen <> -1
> > > order by attrelid::regclass::text, attnum;
> > > attrelid │   attname│types_before 
> > > │ len_before
> > > ─┼──┼─┼
> > >  pg_sequence │ seqstart │ {oid,oid}   
> > > │  8
> > >  pg_sequence │ seqincrement │ {oid,oid,int8}  
> > > │ 16
> > >  pg_sequence │ seqmax   │ {oid,oid,int8,int8} 
> > > │ 24
> > >  pg_sequence │ seqmin   │ {oid,oid,int8,int8,int8}
> > > │ 32
> > >  pg_sequence │ seqcache │ {oid,oid,int8,int8,int8,int8}   
> > > │ 40
> > >  pg_subscription │ subskiplsn   │ 
> > > {oid,oid,name,oid,bool,bool,bool,char,bool} │ 81
> > > (6 rows)
> > >
> > > That doesn't count padding, but hazardous column changes will cause a 
> > > diff in
> > > the output.
> >
> > Yes, in this case, we can detect the violated column order even
> > without considering padding. On the other hand, I think this
> > calculation could not detect some patterns of order. For instance,
> > suppose the column order is {oid, bool, bool, oid, bool, bool, oid,
> > int8}, the len_before is 16 but offset of int8 column including
> > padding is 20 on ALIGNOF_DOUBLE==4 environment.
>
> Correct.  Feel free to make it more precise.  If you do want to add a
> function, it could be a regress.c function rather than an always-installed
> part of PostgreSQL.  Again, getting the buildfarm green is a priority; we can
> always add tests later.

Agreed. I'll update and submit the patch as soon as possible.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Skipping logical replication transactions on subscriber side

2022-04-04 Thread Noah Misch
On Tue, Apr 05, 2022 at 10:13:06AM +0900, Masahiko Sawada wrote:
> On Tue, Apr 5, 2022 at 9:21 AM Noah Misch  wrote:
> > On Mon, Apr 04, 2022 at 06:55:45PM +0900, Masahiko Sawada wrote:
> > > On Mon, Apr 4, 2022 at 3:26 PM Noah Misch  wrote:
> > > > On Mon, Apr 04, 2022 at 08:20:08AM +0530, Amit Kapila wrote:
> > > > > How about a comment like: "It has to be kept at 8-byte alignment
> > > > > boundary so as to be accessed directly via C struct as it uses
> > > > > TYPALIGN_DOUBLE for storage which has 4-byte alignment on platforms
> > > > > like AIX."? Can you please suggest a better comment if you don't like
> > > > > this one?
> > > >
> > > > I'd write it like this, though I'm not sure it's an improvement on your 
> > > > words:
> > > >
> > > >   When ALIGNOF_DOUBLE==4 (e.g. AIX), the C ABI may impose 8-byte 
> > > > alignment on
> > > >   some of the C types that correspond to TYPALIGN_DOUBLE SQL types.  To 
> > > > ensure
> > > >   catalog C struct layout matches catalog tuple layout, arrange for the 
> > > > tuple
> > > >   offset of each fixed-width, attalign='d' catalog column to be 
> > > > divisible by 8
> > > >   unconditionally.  Keep such columns before the first NameData column 
> > > > of the
> > > >   catalog, since packagers can override NAMEDATALEN to an odd number.
> > >
> > > Thanks!
> > >
> > > >
> > > > The best place for such a comment would be in one of
> > > > src/test/regress/sql/*sanity*.sql, next to a test written to detect new
> > > > violations.
> > >
> > > Agreed.
> > >
> > > IIUC in the new test, we would need a new SQL function to calculate
> > > the offset of catalog columns including padding, is that right? Or do
> > > you have an idea to do that by using existing functionality?
> >
> > Something like this:
> >
> > select
> >   attrelid::regclass,
> >   attname,
> >   array(select typname
> > from pg_type t join pg_attribute pa on t.oid = pa.atttypid
> > where pa.attrelid = a.attrelid and pa.attnum > 0 and pa.attnum < 
> > a.attnum order by pa.attnum) AS types_before,
> >   (select sum(attlen)
> >from pg_type t join pg_attribute pa on t.oid = pa.atttypid
> >where pa.attrelid = a.attrelid and pa.attnum > 0 and pa.attnum < 
> > a.attnum) AS len_before
> > from pg_attribute a
> > join pg_class c on c.oid = attrelid
> > where attalign = 'd' and relkind = 'r' and attnotnull and attlen <> -1
> > order by attrelid::regclass::text, attnum;
> > attrelid │   attname│types_before   
> >   │ len_before
> > ─┼──┼─┼
> >  pg_sequence │ seqstart │ {oid,oid} 
> >   │  8
> >  pg_sequence │ seqincrement │ {oid,oid,int8}
> >   │ 16
> >  pg_sequence │ seqmax   │ {oid,oid,int8,int8}   
> >   │ 24
> >  pg_sequence │ seqmin   │ {oid,oid,int8,int8,int8}  
> >   │ 32
> >  pg_sequence │ seqcache │ {oid,oid,int8,int8,int8,int8} 
> >   │ 40
> >  pg_subscription │ subskiplsn   │ 
> > {oid,oid,name,oid,bool,bool,bool,char,bool} │ 81
> > (6 rows)
> >
> > That doesn't count padding, but hazardous column changes will cause a diff 
> > in
> > the output.
> 
> Yes, in this case, we can detect the violated column order even
> without considering padding. On the other hand, I think this
> calculation could not detect some patterns of order. For instance,
> suppose the column order is {oid, bool, bool, oid, bool, bool, oid,
> int8}, the len_before is 16 but offset of int8 column including
> padding is 20 on ALIGNOF_DOUBLE==4 environment.

Correct.  Feel free to make it more precise.  If you do want to add a
function, it could be a regress.c function rather than an always-installed
part of PostgreSQL.  Again, getting the buildfarm green is a priority; we can
always add tests later.




Re: Skipping logical replication transactions on subscriber side

2022-04-04 Thread Masahiko Sawada
On Tue, Apr 5, 2022 at 9:21 AM Noah Misch  wrote:
>
> On Mon, Apr 04, 2022 at 06:55:45PM +0900, Masahiko Sawada wrote:
> > On Mon, Apr 4, 2022 at 3:26 PM Noah Misch  wrote:
> > > On Mon, Apr 04, 2022 at 08:20:08AM +0530, Amit Kapila wrote:
> > > > How about a comment like: "It has to be kept at 8-byte alignment
> > > > boundary so as to be accessed directly via C struct as it uses
> > > > TYPALIGN_DOUBLE for storage which has 4-byte alignment on platforms
> > > > like AIX."? Can you please suggest a better comment if you don't like
> > > > this one?
> > >
> > > I'd write it like this, though I'm not sure it's an improvement on your 
> > > words:
> > >
> > >   When ALIGNOF_DOUBLE==4 (e.g. AIX), the C ABI may impose 8-byte 
> > > alignment on
> > >   some of the C types that correspond to TYPALIGN_DOUBLE SQL types.  To 
> > > ensure
> > >   catalog C struct layout matches catalog tuple layout, arrange for the 
> > > tuple
> > >   offset of each fixed-width, attalign='d' catalog column to be divisible 
> > > by 8
> > >   unconditionally.  Keep such columns before the first NameData column of 
> > > the
> > >   catalog, since packagers can override NAMEDATALEN to an odd number.
> >
> > Thanks!
> >
> > >
> > > The best place for such a comment would be in one of
> > > src/test/regress/sql/*sanity*.sql, next to a test written to detect new
> > > violations.
> >
> > Agreed.
> >
> > IIUC in the new test, we would need a new SQL function to calculate
> > the offset of catalog columns including padding, is that right? Or do
> > you have an idea to do that by using existing functionality?
>
> Something like this:
>
> select
>   attrelid::regclass,
>   attname,
>   array(select typname
> from pg_type t join pg_attribute pa on t.oid = pa.atttypid
> where pa.attrelid = a.attrelid and pa.attnum > 0 and pa.attnum < 
> a.attnum order by pa.attnum) AS types_before,
>   (select sum(attlen)
>from pg_type t join pg_attribute pa on t.oid = pa.atttypid
>where pa.attrelid = a.attrelid and pa.attnum > 0 and pa.attnum < a.attnum) 
> AS len_before
> from pg_attribute a
> join pg_class c on c.oid = attrelid
> where attalign = 'd' and relkind = 'r' and attnotnull and attlen <> -1
> order by attrelid::regclass::text, attnum;
> attrelid │   attname│types_before 
> │ len_before
> ─┼──┼─┼
>  pg_sequence │ seqstart │ {oid,oid}   
> │  8
>  pg_sequence │ seqincrement │ {oid,oid,int8}  
> │ 16
>  pg_sequence │ seqmax   │ {oid,oid,int8,int8} 
> │ 24
>  pg_sequence │ seqmin   │ {oid,oid,int8,int8,int8}
> │ 32
>  pg_sequence │ seqcache │ {oid,oid,int8,int8,int8,int8}   
> │ 40
>  pg_subscription │ subskiplsn   │ {oid,oid,name,oid,bool,bool,bool,char,bool} 
> │ 81
> (6 rows)
>
> That doesn't count padding, but hazardous column changes will cause a diff in
> the output.

Yes, in this case, we can detect the violated column order even
without considering padding. On the other hand, I think this
calculation could not detect some patterns of order. For instance,
suppose the column order is {oid, bool, bool, oid, bool, bool, oid,
int8}, the len_before is 16 but offset of int8 column including
padding is 20 on ALIGNOF_DOUBLE==4 environment.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Skipping logical replication transactions on subscriber side

2022-04-04 Thread Noah Misch
On Mon, Apr 04, 2022 at 06:55:45PM +0900, Masahiko Sawada wrote:
> On Mon, Apr 4, 2022 at 3:26 PM Noah Misch  wrote:
> > On Mon, Apr 04, 2022 at 08:20:08AM +0530, Amit Kapila wrote:
> > > How about a comment like: "It has to be kept at 8-byte alignment
> > > boundary so as to be accessed directly via C struct as it uses
> > > TYPALIGN_DOUBLE for storage which has 4-byte alignment on platforms
> > > like AIX."? Can you please suggest a better comment if you don't like
> > > this one?
> >
> > I'd write it like this, though I'm not sure it's an improvement on your 
> > words:
> >
> >   When ALIGNOF_DOUBLE==4 (e.g. AIX), the C ABI may impose 8-byte alignment 
> > on
> >   some of the C types that correspond to TYPALIGN_DOUBLE SQL types.  To 
> > ensure
> >   catalog C struct layout matches catalog tuple layout, arrange for the 
> > tuple
> >   offset of each fixed-width, attalign='d' catalog column to be divisible 
> > by 8
> >   unconditionally.  Keep such columns before the first NameData column of 
> > the
> >   catalog, since packagers can override NAMEDATALEN to an odd number.
> 
> Thanks!
> 
> >
> > The best place for such a comment would be in one of
> > src/test/regress/sql/*sanity*.sql, next to a test written to detect new
> > violations.
> 
> Agreed.
> 
> IIUC in the new test, we would need a new SQL function to calculate
> the offset of catalog columns including padding, is that right? Or do
> you have an idea to do that by using existing functionality?

Something like this:

select
  attrelid::regclass,
  attname,
  array(select typname
from pg_type t join pg_attribute pa on t.oid = pa.atttypid
where pa.attrelid = a.attrelid and pa.attnum > 0 and pa.attnum < 
a.attnum order by pa.attnum) AS types_before,
  (select sum(attlen)
   from pg_type t join pg_attribute pa on t.oid = pa.atttypid
   where pa.attrelid = a.attrelid and pa.attnum > 0 and pa.attnum < a.attnum) 
AS len_before
from pg_attribute a
join pg_class c on c.oid = attrelid
where attalign = 'd' and relkind = 'r' and attnotnull and attlen <> -1
order by attrelid::regclass::text, attnum;
attrelid │   attname│types_before │ 
len_before
─┼──┼─┼
 pg_sequence │ seqstart │ {oid,oid}   │ 
 8
 pg_sequence │ seqincrement │ {oid,oid,int8}  │ 
16
 pg_sequence │ seqmax   │ {oid,oid,int8,int8} │ 
24
 pg_sequence │ seqmin   │ {oid,oid,int8,int8,int8}│ 
32
 pg_sequence │ seqcache │ {oid,oid,int8,int8,int8,int8}   │ 
40
 pg_subscription │ subskiplsn   │ {oid,oid,name,oid,bool,bool,bool,char,bool} │ 
81
(6 rows)

That doesn't count padding, but hazardous column changes will cause a diff in
the output.




Re: Skipping logical replication transactions on subscriber side

2022-04-04 Thread Masahiko Sawada
On Mon, Apr 4, 2022 at 3:26 PM Noah Misch  wrote:
>
> On Mon, Apr 04, 2022 at 08:20:08AM +0530, Amit Kapila wrote:
> > On Mon, Apr 4, 2022 at 8:01 AM Noah Misch  wrote:
> > > On Mon, Apr 04, 2022 at 10:28:30AM +0900, Masahiko Sawada wrote:
> > > > On Sun, Apr 3, 2022 at 9:45 AM Noah Misch  wrote:
> > > > > On Sat, Apr 02, 2022 at 08:44:45PM +0900, Masahiko Sawada wrote:
> > > > > > On Sat, Apr 2, 2022 at 7:04 PM Amit Kapila 
> > > > > >  wrote:
> > > > > > > On Sat, Apr 2, 2022 at 1:43 PM Noah Misch  
> > > > > > > wrote:
> > > > > > > > Some options:
> > > > > > > > - Move subskiplsn after subdbid, so it's always aligned anyway. 
> > > > > > > >  I've
> > > > > > > >   confirmed that this lets the test pass, in 44s.
> > >
> > > > --- a/src/include/catalog/pg_subscription.h
> > > > +++ b/src/include/catalog/pg_subscription.h
> > > > @@ -54,6 +54,17 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) 
> > > > BKI_SHARED_RELATION BKI_ROW
> > > >
> > > >   Oid subdbid BKI_LOOKUP(pg_database);
> > > > /* Database the
> > > > 
> > > >* subscription is in. */
> > > > +
> > > > + /*
> > > > +  * All changes finished at this LSN are skipped.
> > > > +  *
> > > > +  * Note that XLogRecPtr, pg_lsn in the catalog, is 8-byte 
> > > > alignment
> > > > +  * (TYPALIGN_DOUBLE) and it does not match the alignment on some 
> > > > platforms
> > > > +  * such as AIX.  Therefore subskiplsn needs to be placed here so 
> > > > it is
> > > > +  * always aligned.
> > >
> > > I'm reading this comment as saying that TYPALIGN_DOUBLE is always 8 
> > > bytes, but
> > > the problem arises precisely because TYPALIGN_DOUBLE==4 on AIX.
> >
> > How about a comment like: "It has to be kept at 8-byte alignment
> > boundary so as to be accessed directly via C struct as it uses
> > TYPALIGN_DOUBLE for storage which has 4-byte alignment on platforms
> > like AIX."? Can you please suggest a better comment if you don't like
> > this one?
>
> I'd write it like this, though I'm not sure it's an improvement on your words:
>
>   When ALIGNOF_DOUBLE==4 (e.g. AIX), the C ABI may impose 8-byte alignment on
>   some of the C types that correspond to TYPALIGN_DOUBLE SQL types.  To ensure
>   catalog C struct layout matches catalog tuple layout, arrange for the tuple
>   offset of each fixed-width, attalign='d' catalog column to be divisible by 8
>   unconditionally.  Keep such columns before the first NameData column of the
>   catalog, since packagers can override NAMEDATALEN to an odd number.

Thanks!

>
> The best place for such a comment would be in one of
> src/test/regress/sql/*sanity*.sql, next to a test written to detect new
> violations.

Agreed.

IIUC in the new test, we would need a new SQL function to calculate
the offset of catalog columns including padding, is that right? Or do
you have an idea to do that by using existing functionality?

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Skipping logical replication transactions on subscriber side

2022-04-04 Thread Noah Misch
On Mon, Apr 04, 2022 at 08:20:08AM +0530, Amit Kapila wrote:
> On Mon, Apr 4, 2022 at 8:01 AM Noah Misch  wrote:
> > On Mon, Apr 04, 2022 at 10:28:30AM +0900, Masahiko Sawada wrote:
> > > On Sun, Apr 3, 2022 at 9:45 AM Noah Misch  wrote:
> > > > On Sat, Apr 02, 2022 at 08:44:45PM +0900, Masahiko Sawada wrote:
> > > > > On Sat, Apr 2, 2022 at 7:04 PM Amit Kapila  
> > > > > wrote:
> > > > > > On Sat, Apr 2, 2022 at 1:43 PM Noah Misch  wrote:
> > > > > > > Some options:
> > > > > > > - Move subskiplsn after subdbid, so it's always aligned anyway.  
> > > > > > > I've
> > > > > > >   confirmed that this lets the test pass, in 44s.
> >
> > > --- a/src/include/catalog/pg_subscription.h
> > > +++ b/src/include/catalog/pg_subscription.h
> > > @@ -54,6 +54,17 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) 
> > > BKI_SHARED_RELATION BKI_ROW
> > >
> > >   Oid subdbid BKI_LOOKUP(pg_database);/* 
> > > Database the
> > >   
> > >  * subscription is in. */
> > > +
> > > + /*
> > > +  * All changes finished at this LSN are skipped.
> > > +  *
> > > +  * Note that XLogRecPtr, pg_lsn in the catalog, is 8-byte alignment
> > > +  * (TYPALIGN_DOUBLE) and it does not match the alignment on some 
> > > platforms
> > > +  * such as AIX.  Therefore subskiplsn needs to be placed here so it 
> > > is
> > > +  * always aligned.
> >
> > I'm reading this comment as saying that TYPALIGN_DOUBLE is always 8 bytes, 
> > but
> > the problem arises precisely because TYPALIGN_DOUBLE==4 on AIX.
> 
> How about a comment like: "It has to be kept at 8-byte alignment
> boundary so as to be accessed directly via C struct as it uses
> TYPALIGN_DOUBLE for storage which has 4-byte alignment on platforms
> like AIX."? Can you please suggest a better comment if you don't like
> this one?

I'd write it like this, though I'm not sure it's an improvement on your words:

  When ALIGNOF_DOUBLE==4 (e.g. AIX), the C ABI may impose 8-byte alignment on
  some of the C types that correspond to TYPALIGN_DOUBLE SQL types.  To ensure
  catalog C struct layout matches catalog tuple layout, arrange for the tuple
  offset of each fixed-width, attalign='d' catalog column to be divisible by 8
  unconditionally.  Keep such columns before the first NameData column of the
  catalog, since packagers can override NAMEDATALEN to an odd number.

The best place for such a comment would be in one of
src/test/regress/sql/*sanity*.sql, next to a test written to detect new
violations.  If adding such a test would materially delay getting the
buildfarm green, putting the comment in pg_subscription.h works for me.




Re: Skipping logical replication transactions on subscriber side

2022-04-03 Thread Amit Kapila
On Mon, Apr 4, 2022 at 8:41 AM Masahiko Sawada  wrote:
>
> On Mon, Apr 4, 2022 at 11:50 AM Amit Kapila  wrote:
> >
> > Another minor point is that I think it is better to use DatumGetLSN to
> > read this in GetSubscription as we use LSNGetDatum while storing it. I
> > am not sure if there is any direct problem due to this but that looks
> > consistent to me.
>
> But it seems not consistent with other usages since we don't normally
> use DatumGetXXX to get values directly from C struct.
>

Okay, I see that for sequences also we don't use it, so we can
probably leave it as it is.

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2022-04-03 Thread Masahiko Sawada
On Mon, Apr 4, 2022 at 11:50 AM Amit Kapila  wrote:
>
> On Mon, Apr 4, 2022 at 8:01 AM Noah Misch  wrote:
> >
> > On Mon, Apr 04, 2022 at 10:28:30AM +0900, Masahiko Sawada wrote:
> > > On Sun, Apr 3, 2022 at 9:45 AM Noah Misch  wrote:
> > > > On Sat, Apr 02, 2022 at 08:44:45PM +0900, Masahiko Sawada wrote:
> > > > > On Sat, Apr 2, 2022 at 7:04 PM Amit Kapila  
> > > > > wrote:
> > > > > > On Sat, Apr 2, 2022 at 1:43 PM Noah Misch  wrote:
> > > > > > > Some options:
> > > > > > > - Move subskiplsn after subdbid, so it's always aligned anyway.  
> > > > > > > I've
> > > > > > >   confirmed that this lets the test pass, in 44s.
> >
> > > --- a/src/include/catalog/pg_subscription.h
> > > +++ b/src/include/catalog/pg_subscription.h
> > > @@ -54,6 +54,17 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) 
> > > BKI_SHARED_RELATION BKI_ROW
> > >
> > >   Oid subdbid BKI_LOOKUP(pg_database);/* 
> > > Database the
> > >   
> > >  * subscription is in. */
> > > +
> > > + /*
> > > +  * All changes finished at this LSN are skipped.
> > > +  *
> > > +  * Note that XLogRecPtr, pg_lsn in the catalog, is 8-byte alignment
> > > +  * (TYPALIGN_DOUBLE) and it does not match the alignment on some 
> > > platforms
> > > +  * such as AIX.  Therefore subskiplsn needs to be placed here so it 
> > > is
> > > +  * always aligned.
> >
> > I'm reading this comment as saying that TYPALIGN_DOUBLE is always 8 bytes, 
> > but
> > the problem arises precisely because TYPALIGN_DOUBLE==4 on AIX.
> >
>
> How about a comment like: "It has to be kept at 8-byte alignment
> boundary so as to be accessed directly via C struct as it uses
> TYPALIGN_DOUBLE for storage which has 4-byte alignment on platforms
> like AIX."? Can you please suggest a better comment if you don't like
> this one?
>
> > > +  */
> > > + XLogRecPtr  subskiplsn;
> > > +
> > >   NameDatasubname;/* Name of the subscription 
> > > */
> > >
> > >   Oid subowner BKI_LOOKUP(pg_authid); /* Owner of 
> > > the subscription */
> > > @@ -71,9 +82,6 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) 
> > > BKI_SHARED_RELATION BKI_ROW
> > >   boolsubdisableonerr;/* True if a worker error 
> > > should cause the
> > >* 
> > > subscription to be disabled */
> > >
> > > - XLogRecPtr  subskiplsn; /* All changes finished at 
> > > this LSN are
> > > -  * skipped 
> > > */
> >
> > Some code sites list pg_subscription fields in field order.  Please update
> > them so they continue to list fields in field order.  CreateSubscription() 
> > is
> > one example.
> >
>
> Another minor point is that I think it is better to use DatumGetLSN to
> read this in GetSubscription as we use LSNGetDatum while storing it. I
> am not sure if there is any direct problem due to this but that looks
> consistent to me.

But it seems not consistent with other usages since we don't normally
use DatumGetXXX to get values directly from C struct.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Skipping logical replication transactions on subscriber side

2022-04-03 Thread Amit Kapila
On Mon, Apr 4, 2022 at 8:01 AM Noah Misch  wrote:
>
> On Mon, Apr 04, 2022 at 10:28:30AM +0900, Masahiko Sawada wrote:
> > On Sun, Apr 3, 2022 at 9:45 AM Noah Misch  wrote:
> > > On Sat, Apr 02, 2022 at 08:44:45PM +0900, Masahiko Sawada wrote:
> > > > On Sat, Apr 2, 2022 at 7:04 PM Amit Kapila  
> > > > wrote:
> > > > > On Sat, Apr 2, 2022 at 1:43 PM Noah Misch  wrote:
> > > > > > Some options:
> > > > > > - Move subskiplsn after subdbid, so it's always aligned anyway.  
> > > > > > I've
> > > > > >   confirmed that this lets the test pass, in 44s.
>
> > --- a/src/include/catalog/pg_subscription.h
> > +++ b/src/include/catalog/pg_subscription.h
> > @@ -54,6 +54,17 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) 
> > BKI_SHARED_RELATION BKI_ROW
> >
> >   Oid subdbid BKI_LOOKUP(pg_database);/* 
> > Database the
> > 
> >* subscription is in. */
> > +
> > + /*
> > +  * All changes finished at this LSN are skipped.
> > +  *
> > +  * Note that XLogRecPtr, pg_lsn in the catalog, is 8-byte alignment
> > +  * (TYPALIGN_DOUBLE) and it does not match the alignment on some 
> > platforms
> > +  * such as AIX.  Therefore subskiplsn needs to be placed here so it is
> > +  * always aligned.
>
> I'm reading this comment as saying that TYPALIGN_DOUBLE is always 8 bytes, but
> the problem arises precisely because TYPALIGN_DOUBLE==4 on AIX.
>

How about a comment like: "It has to be kept at 8-byte alignment
boundary so as to be accessed directly via C struct as it uses
TYPALIGN_DOUBLE for storage which has 4-byte alignment on platforms
like AIX."? Can you please suggest a better comment if you don't like
this one?

> > +  */
> > + XLogRecPtr  subskiplsn;
> > +
> >   NameDatasubname;/* Name of the subscription */
> >
> >   Oid subowner BKI_LOOKUP(pg_authid); /* Owner of 
> > the subscription */
> > @@ -71,9 +82,6 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) 
> > BKI_SHARED_RELATION BKI_ROW
> >   boolsubdisableonerr;/* True if a worker error 
> > should cause the
> >* 
> > subscription to be disabled */
> >
> > - XLogRecPtr  subskiplsn; /* All changes finished at 
> > this LSN are
> > -  * skipped */
>
> Some code sites list pg_subscription fields in field order.  Please update
> them so they continue to list fields in field order.  CreateSubscription() is
> one example.
>

Another minor point is that I think it is better to use DatumGetLSN to
read this in GetSubscription as we use LSNGetDatum while storing it. I
am not sure if there is any direct problem due to this but that looks
consistent to me.

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2022-04-03 Thread Noah Misch
On Mon, Apr 04, 2022 at 10:28:30AM +0900, Masahiko Sawada wrote:
> On Sun, Apr 3, 2022 at 9:45 AM Noah Misch  wrote:
> > On Sat, Apr 02, 2022 at 08:44:45PM +0900, Masahiko Sawada wrote:
> > > On Sat, Apr 2, 2022 at 7:04 PM Amit Kapila  
> > > wrote:
> > > > On Sat, Apr 2, 2022 at 1:43 PM Noah Misch  wrote:
> > > > > Some options:
> > > > > - Move subskiplsn after subdbid, so it's always aligned anyway.  I've
> > > > >   confirmed that this lets the test pass, in 44s.

> --- a/src/include/catalog/pg_subscription.h
> +++ b/src/include/catalog/pg_subscription.h
> @@ -54,6 +54,17 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) 
> BKI_SHARED_RELATION BKI_ROW
>  
>   Oid subdbid BKI_LOOKUP(pg_database);/* 
> Database the
>   
>  * subscription is in. */
> +
> + /*
> +  * All changes finished at this LSN are skipped.
> +  *
> +  * Note that XLogRecPtr, pg_lsn in the catalog, is 8-byte alignment
> +  * (TYPALIGN_DOUBLE) and it does not match the alignment on some 
> platforms
> +  * such as AIX.  Therefore subskiplsn needs to be placed here so it is
> +  * always aligned.

I'm reading this comment as saying that TYPALIGN_DOUBLE is always 8 bytes, but
the problem arises precisely because TYPALIGN_DOUBLE==4 on AIX.

On most hosts, the C alignment of an XLogRecPtr is 8 bytes, and
TYPALIGN_DOUBLE==8.  On AIX, C alignment is still 8 bytes, but
TYPALIGN_DOUBLE==4.  The tuples on disk and in shared buffers use
TYPALIGN_DOUBLE to decide how much padding to insert, and that amount of
padding needs to match the C alignment padding.  Placing the field here
reduces the padding to zero, making that invariant hold trivially.

> +  */
> + XLogRecPtr  subskiplsn;
> +
>   NameDatasubname;/* Name of the subscription */
>  
>   Oid subowner BKI_LOOKUP(pg_authid); /* Owner of the 
> subscription */
> @@ -71,9 +82,6 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) 
> BKI_SHARED_RELATION BKI_ROW
>   boolsubdisableonerr;/* True if a worker error 
> should cause the
>* 
> subscription to be disabled */
>  
> - XLogRecPtr  subskiplsn; /* All changes finished at this 
> LSN are
> -  * skipped */

Some code sites list pg_subscription fields in field order.  Please update
them so they continue to list fields in field order.  CreateSubscription() is
one example.




Re: Skipping logical replication transactions on subscriber side

2022-04-03 Thread Masahiko Sawada
On Sun, Apr 3, 2022 at 9:45 AM Noah Misch  wrote:
>
> On Sat, Apr 02, 2022 at 08:44:45PM +0900, Masahiko Sawada wrote:
> > On Sat, Apr 2, 2022 at 7:04 PM Amit Kapila  wrote:
> > > On Sat, Apr 2, 2022 at 1:43 PM Noah Misch  wrote:
> > > > Some options:
> > > > - Move subskiplsn after subdbid, so it's always aligned anyway.  I've
> > > >   confirmed that this lets the test pass, in 44s.
> > > > - Move subskiplsn to the CATALOG_VARLEN section, despite its fixed 
> > > > length.
> > >
> > > +1 to any one of the above. I mildly prefer the first option as that
> > > will allow us to access the value directly instead of going via
> > > SysCacheGetAttr but I am fine either way.
> >
> > +1. I also prefer the first option.
>
> Sounds good to me.

I've attached the patch for the first option.

> - Introduce a new typalign value suitable for uint64.  This is more intrusive,
>   but it's more future-proof.  Looking beyond catalog columns, it might
>   improve performance by avoiding unaligned reads.

The third option would be a good item for PG16 or later.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


make_subskiplsn_aligned.patch
Description: Binary data


Re: Skipping logical replication transactions on subscriber side

2022-04-02 Thread Noah Misch
On Sat, Apr 02, 2022 at 08:44:45PM +0900, Masahiko Sawada wrote:
> On Sat, Apr 2, 2022 at 7:04 PM Amit Kapila  wrote:
> > On Sat, Apr 2, 2022 at 1:43 PM Noah Misch  wrote:
> > > Some options:
> > > - Move subskiplsn after subdbid, so it's always aligned anyway.  I've
> > >   confirmed that this lets the test pass, in 44s.
> > > - Move subskiplsn to the CATALOG_VARLEN section, despite its fixed length.
> >
> > +1 to any one of the above. I mildly prefer the first option as that
> > will allow us to access the value directly instead of going via
> > SysCacheGetAttr but I am fine either way.
> 
> +1. I also prefer the first option.

Sounds good to me.




Re: Skipping logical replication transactions on subscriber side

2022-04-02 Thread Masahiko Sawada
On Sat, Apr 2, 2022 at 7:04 PM Amit Kapila  wrote:
>
> On Sat, Apr 2, 2022 at 1:43 PM Noah Misch  wrote:
> >
> > On Sat, Apr 02, 2022 at 04:33:44PM +0900, Masahiko Sawada wrote:
> > > It seems that 0/B0706F72 is not a random value. Two subscriber logs
> > > show the same value. Since 0x70 = 'p', 0x6F = 'o', and 0x72 = 'r', it
> > > might show the next field in the pg_subscription catalog, i.e.,
> > > subconninfo. The subscription is created by "CREATE SUBSCRIPTION sub
> > > CONNECTION 'port=57851 host=/tmp/6u2vRwQYik dbname=postgres'
> > > PUBLICATION pub WITH (disable_on_error = true, streaming = on,
> > > two_phase = on)".
> > >
> > > Given subscription.sql passes, something is wrong when we read the
> > > subskiplsn value by like "sub->skiplsn = subform->subskiplsn;".
> >
> > That's a good clue.  We've never made pg_type.typalign able to represent
> > alignment as it works on AIX.  A uint64 like pg_lsn has 8-byte alignment, so
> > the C struct follows from that.  At the typalign level, we have only these:
> >
> > #define  TYPALIGN_CHAR  'c' /* char alignment (i.e. 
> > unaligned) */
> > #define  TYPALIGN_SHORT 's' /* short alignment (typically 2 
> > bytes) */
> > #define  TYPALIGN_INT   'i' /* int alignment (typically 4 
> > bytes) */
> > #define  TYPALIGN_DOUBLE'd' /* double alignment (often 8 
> > bytes) */
> >
> > On AIX, they are:
> >
> > #define ALIGNOF_DOUBLE 4
> > #define ALIGNOF_INT 4
> > #define ALIGNOF_LONG 8
> > /* #undef ALIGNOF_LONG_LONG_INT */
> > /* #undef ALIGNOF_PG_INT128_TYPE */
> > #define ALIGNOF_SHORT 2
> >
> > uint64 and pg_lsn use TYPALIGN_DOUBLE.  For AIX, they really need a typalign
> > corresponding to ALIGNOF_LONG.  Hence, the C struct layout doesn't match the
> > tuple layout.  Columns potentially affected:
> >
> > [local] test=*# select attrelid::regclass, attname from pg_attribute a join 
> > pg_class c on c.oid = attrelid where attalign = 'd' and relkind = 'r' and 
> > attnotnull and attlen <> -1;
> > attrelid │   attname
> > ─┼──
> >  pg_sequence │ seqstart
> >  pg_sequence │ seqincrement
> >  pg_sequence │ seqmax
> >  pg_sequence │ seqmin
> >  pg_sequence │ seqcache
> >  pg_subscription │ subskiplsn
> > (6 rows)
> >
> > The pg_sequence fields evade trouble, because there's exactly eight bytes 
> > (two
> > oids) before them.

Thanks for helping with the investigation!

> >
> >
> > Some options:
> > - Move subskiplsn after subdbid, so it's always aligned anyway.  I've
> >   confirmed that this lets the test pass, in 44s.
> > - Move subskiplsn to the CATALOG_VARLEN section, despite its fixed length.
> >
>
> +1 to any one of the above. I mildly prefer the first option as that
> will allow us to access the value directly instead of going via
> SysCacheGetAttr but I am fine either way.

+1. I also prefer the first option.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Skipping logical replication transactions on subscriber side

2022-04-02 Thread Amit Kapila
On Sat, Apr 2, 2022 at 1:43 PM Noah Misch  wrote:
>
> On Sat, Apr 02, 2022 at 04:33:44PM +0900, Masahiko Sawada wrote:
> > It seems that 0/B0706F72 is not a random value. Two subscriber logs
> > show the same value. Since 0x70 = 'p', 0x6F = 'o', and 0x72 = 'r', it
> > might show the next field in the pg_subscription catalog, i.e.,
> > subconninfo. The subscription is created by "CREATE SUBSCRIPTION sub
> > CONNECTION 'port=57851 host=/tmp/6u2vRwQYik dbname=postgres'
> > PUBLICATION pub WITH (disable_on_error = true, streaming = on,
> > two_phase = on)".
> >
> > Given subscription.sql passes, something is wrong when we read the
> > subskiplsn value by like "sub->skiplsn = subform->subskiplsn;".
>
> That's a good clue.  We've never made pg_type.typalign able to represent
> alignment as it works on AIX.  A uint64 like pg_lsn has 8-byte alignment, so
> the C struct follows from that.  At the typalign level, we have only these:
>
> #define  TYPALIGN_CHAR  'c' /* char alignment (i.e. 
> unaligned) */
> #define  TYPALIGN_SHORT 's' /* short alignment (typically 2 
> bytes) */
> #define  TYPALIGN_INT   'i' /* int alignment (typically 4 
> bytes) */
> #define  TYPALIGN_DOUBLE'd' /* double alignment (often 8 
> bytes) */
>
> On AIX, they are:
>
> #define ALIGNOF_DOUBLE 4
> #define ALIGNOF_INT 4
> #define ALIGNOF_LONG 8
> /* #undef ALIGNOF_LONG_LONG_INT */
> /* #undef ALIGNOF_PG_INT128_TYPE */
> #define ALIGNOF_SHORT 2
>
> uint64 and pg_lsn use TYPALIGN_DOUBLE.  For AIX, they really need a typalign
> corresponding to ALIGNOF_LONG.  Hence, the C struct layout doesn't match the
> tuple layout.  Columns potentially affected:
>
> [local] test=*# select attrelid::regclass, attname from pg_attribute a join 
> pg_class c on c.oid = attrelid where attalign = 'd' and relkind = 'r' and 
> attnotnull and attlen <> -1;
> attrelid │   attname
> ─┼──
>  pg_sequence │ seqstart
>  pg_sequence │ seqincrement
>  pg_sequence │ seqmax
>  pg_sequence │ seqmin
>  pg_sequence │ seqcache
>  pg_subscription │ subskiplsn
> (6 rows)
>
> The pg_sequence fields evade trouble, because there's exactly eight bytes (two
> oids) before them.
>
>
> Some options:
> - Move subskiplsn after subdbid, so it's always aligned anyway.  I've
>   confirmed that this lets the test pass, in 44s.
> - Move subskiplsn to the CATALOG_VARLEN section, despite its fixed length.
>

+1 to any one of the above. I mildly prefer the first option as that
will allow us to access the value directly instead of going via
SysCacheGetAttr but I am fine either way.

> - Introduce a new typalign value suitable for uint64.  This is more intrusive,
>   but it's more future-proof.  Looking beyond catalog columns, it might
>   improve performance by avoiding unaligned reads.
>
> > Is it possible to run the test again with the attached patch?
>
> Logs attached.  The test "passed", though it printed "poll_query_until timed
> out" three times and took awhile.

Thanks for helping in figuring out the problem.

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2022-04-02 Thread Noah Misch
On Sat, Apr 02, 2022 at 04:33:44PM +0900, Masahiko Sawada wrote:
> It seems that 0/B0706F72 is not a random value. Two subscriber logs
> show the same value. Since 0x70 = 'p', 0x6F = 'o', and 0x72 = 'r', it
> might show the next field in the pg_subscription catalog, i.e.,
> subconninfo. The subscription is created by "CREATE SUBSCRIPTION sub
> CONNECTION 'port=57851 host=/tmp/6u2vRwQYik dbname=postgres'
> PUBLICATION pub WITH (disable_on_error = true, streaming = on,
> two_phase = on)".
> 
> Given subscription.sql passes, something is wrong when we read the
> subskiplsn value by like "sub->skiplsn = subform->subskiplsn;".

That's a good clue.  We've never made pg_type.typalign able to represent
alignment as it works on AIX.  A uint64 like pg_lsn has 8-byte alignment, so
the C struct follows from that.  At the typalign level, we have only these:

#define  TYPALIGN_CHAR  'c' /* char alignment (i.e. unaligned) 
*/
#define  TYPALIGN_SHORT 's' /* short alignment (typically 2 
bytes) */
#define  TYPALIGN_INT   'i' /* int alignment (typically 4 
bytes) */
#define  TYPALIGN_DOUBLE'd' /* double alignment (often 8 bytes) 
*/

On AIX, they are:

#define ALIGNOF_DOUBLE 4 
#define ALIGNOF_INT 4
#define ALIGNOF_LONG 8   
/* #undef ALIGNOF_LONG_LONG_INT */
/* #undef ALIGNOF_PG_INT128_TYPE */
#define ALIGNOF_SHORT 2  

uint64 and pg_lsn use TYPALIGN_DOUBLE.  For AIX, they really need a typalign
corresponding to ALIGNOF_LONG.  Hence, the C struct layout doesn't match the
tuple layout.  Columns potentially affected:

[local] test=*# select attrelid::regclass, attname from pg_attribute a join 
pg_class c on c.oid = attrelid where attalign = 'd' and relkind = 'r' and 
attnotnull and attlen <> -1;
attrelid │   attname
─┼──
 pg_sequence │ seqstart
 pg_sequence │ seqincrement
 pg_sequence │ seqmax
 pg_sequence │ seqmin
 pg_sequence │ seqcache
 pg_subscription │ subskiplsn
(6 rows)

The pg_sequence fields evade trouble, because there's exactly eight bytes (two
oids) before them.


Some options:
- Move subskiplsn after subdbid, so it's always aligned anyway.  I've
  confirmed that this lets the test pass, in 44s.
- Move subskiplsn to the CATALOG_VARLEN section, despite its fixed length.
- Introduce a new typalign value suitable for uint64.  This is more intrusive,
  but it's more future-proof.  Looking beyond catalog columns, it might
  improve performance by avoiding unaligned reads.

> Is it possible to run the test again with the attached patch?

Logs attached.  The test "passed", though it printed "poll_query_until timed
out" three times and took awhile.


log-subscription-20220401c.tar.xz
Description: Binary data


Re: Skipping logical replication transactions on subscriber side

2022-04-02 Thread Masahiko Sawada
On Sat, Apr 2, 2022 at 1:08 PM Amit Kapila  wrote:
>
> On Sat, Apr 2, 2022 at 7:29 AM Noah Misch  wrote:
> >
> > On Sat, Apr 02, 2022 at 06:49:20AM +0530, Amit Kapila wrote:
> >
> > After applying datum_to_lsn_skiplsn_1.patch, I get another failure.  Logs
> > attached.
> >
>
> The failure is for the same reason. I noticed that even when skip lsn
> value should be 0/0, it is some invalid value, see: "LOG:  not started
> skipping changes: my_skiplsn 0/B0706F72 finish_lsn 0/14EB7D8". Here,
> my_skiplsn should be 0/0 instead of 0/B0706F72. Now, I am not sure why
> the LSN's 4 bytes are correct and the other 4 bytes have some random
> value.

It seems that 0/B0706F72 is not a random value. Two subscriber logs
show the same value. Since 0x70 = 'p', 0x6F = 'o', and 0x72 = 'r', it
might show the next field in the pg_subscription catalog, i.e.,
subconninfo. The subscription is created by "CREATE SUBSCRIPTION sub
CONNECTION 'port=57851 host=/tmp/6u2vRwQYik dbname=postgres'
PUBLICATION pub WITH (disable_on_error = true, streaming = on,
two_phase = on)".

Given subscription.sql passes, something is wrong when we read the
subskiplsn value by like "sub->skiplsn = subform->subskiplsn;".

Is it possible to run the test again with the attached patch?

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


add_logs_v2.patch
Description: Binary data


Re: Skipping logical replication transactions on subscriber side

2022-04-01 Thread Amit Kapila
On Sat, Apr 2, 2022 at 7:29 AM Noah Misch  wrote:
>
> On Sat, Apr 02, 2022 at 06:49:20AM +0530, Amit Kapila wrote:
>
> After applying datum_to_lsn_skiplsn_1.patch, I get another failure.  Logs
> attached.
>

The failure is for the same reason. I noticed that even when skip lsn
value should be 0/0, it is some invalid value, see: "LOG:  not started
skipping changes: my_skiplsn 0/B0706F72 finish_lsn 0/14EB7D8". Here,
my_skiplsn should be 0/0 instead of 0/B0706F72. Now, I am not sure why
the LSN's 4 bytes are correct and the other 4 bytes have some random
value. A similar problem is there when we have set the valid value of
skip lsn, see: "LOG:  not started skipping changes: my_skiplsn
14EB7D8/B0706F72 finish_lsn 0/14EB7D8". Here the value of my_skiplsn
should be 0/14EB7D8 instead of 14EB7D8/B0706F72.

I am sure that if you create a subscription with the below test and
check the skip lsn value, it will be correct, otherwise, you would
have seen failure in subscription.sql as well. If possible, can you
please check the following example to rule out the possibility:

For example,
Publisher:
Create table t1(c1 int);
Create Publication pub1 for table t1;

Subscriber:
Create table t1(c1 int);
Create Subscription sub1 connection 'dbname = postgres' Publication pub1;
Select subname, subskiplsn from pg_subsription; -- subskiplsn should be 0/0

Alter Subscription sub1 SKIP (LSN = '0/14EB7D8');
Select subname, subskiplsn from pg_subsription; -- subskiplsn should
be 0/14EB7D8

Assuming the above is correct and we are still getting the wrong value
in apply worker, the only remaining suspect is the following code in
GetSubscription:
sub->skiplsn = DatumGetLSN(subform->subskiplsn);

I don't know what is wrong with this because subskiplsn is stored as
pg_lsn which is a fixed value and we should be able to access it by
struct. Do you see any problem with this?

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2022-04-01 Thread Noah Misch
On Sat, Apr 02, 2022 at 06:49:20AM +0530, Amit Kapila wrote:
> On Sat, Apr 2, 2022 at 5:41 AM Noah Misch  wrote:
> >
> > On Fri, Apr 01, 2022 at 09:25:52PM +0900, Masahiko Sawada wrote:
> > > > On Fri, Apr 1, 2022 at 4:44 PM Noah Misch  wrote:
> > > > > src/test/subscription/t/029_on_error.pl has been failing reliably on 
> > > > > the five
> > > > > AIX buildfarm members:
> > > > >
> > > > > # poll_query_until timed out executing this query:
> > > > > # SELECT subskiplsn = '0/0' FROM pg_subscription WHERE subname = 'sub'
> > > > > # expecting this output:
> > > > > # t
> > > > > # last actual query output:
> > > > > # f
> > > > > # with stderr:
> > > > > timed out waiting for match: (?^:LOG:  done skipping logical 
> > > > > replication transaction finished at 0/1D30788) at t/029_on_error.pl 
> > > > > line 50.
> > > > >
> > > > > I've posted five sets of logs (2.7 MiB compressed) here:
> > > > > https://drive.google.com/file/d/16NkyNIV07o0o8WM7GwcaAYFQDPTkULkR/view?usp=sharing
> >
> > > Given that "SELECT subskiplsn = '0/0'
> > > FROM pg_subscription WHERE subname = 'sub’” didn't return true, some
> > > value was set to subskiplsn even after the unique key error.
> > >
> > > So I'm guessing that the apply worker could not get the updated value
> > > of the subskiplsn or its MySubscription->skiplsn could not match with
> > > the transaction's finish LSN. Also, given that the test is failing on
> > > all AIX buildfarm members, there might be something specific to AIX.
> > >
> > > Noah, to investigate this issue further, is it possible for you to
> > > apply the attached patch and run the 029_on_error.pl test? The patch
> > > adds some logs to get additional information.
> >
> > Logs attached.
> 
> Thank you.
> 
> By seeing the below Logs:
> 
> 
> 2022-04-01 18:19:34.710 CUT [58327402] LOG:  not started skipping
> changes: my_skiplsn 14EB7D8/B0706F72 finish_lsn 0/14EB7D8
> ...
> 
> 
> It seems that the value of skiplsn read in GetSubscription is wrong
> which makes the apply worker think it doesn't need to skip the
> transaction. Now, in Alter/Create Subscription, we are using
> LSNGetDatum() to store skiplsn value in pg_subscription but while
> reading it in GetSubscription(), we are not converting back the datum
> to LSN by using DatumGetLSN(). Is it possible that on this machine it
> might be leading to not getting the right value for skiplsn? I think
> it is worth trying to see if this fixes the problem.

After applying datum_to_lsn_skiplsn_1.patch, I get another failure.  Logs
attached.


log-subscription-20220401b.tar.xz
Description: Binary data


Re: Skipping logical replication transactions on subscriber side

2022-04-01 Thread Amit Kapila
On Sat, Apr 2, 2022 at 5:41 AM Noah Misch  wrote:
>
> On Fri, Apr 01, 2022 at 09:25:52PM +0900, Masahiko Sawada wrote:
> > > On Fri, Apr 1, 2022 at 4:44 PM Noah Misch  wrote:
> > > > src/test/subscription/t/029_on_error.pl has been failing reliably on 
> > > > the five
> > > > AIX buildfarm members:
> > > >
> > > > # poll_query_until timed out executing this query:
> > > > # SELECT subskiplsn = '0/0' FROM pg_subscription WHERE subname = 'sub'
> > > > # expecting this output:
> > > > # t
> > > > # last actual query output:
> > > > # f
> > > > # with stderr:
> > > > timed out waiting for match: (?^:LOG:  done skipping logical 
> > > > replication transaction finished at 0/1D30788) at t/029_on_error.pl 
> > > > line 50.
> > > >
> > > > I've posted five sets of logs (2.7 MiB compressed) here:
> > > > https://drive.google.com/file/d/16NkyNIV07o0o8WM7GwcaAYFQDPTkULkR/view?usp=sharing
>
> > Given that "SELECT subskiplsn = '0/0'
> > FROM pg_subscription WHERE subname = 'sub’” didn't return true, some
> > value was set to subskiplsn even after the unique key error.
> >
> > So I'm guessing that the apply worker could not get the updated value
> > of the subskiplsn or its MySubscription->skiplsn could not match with
> > the transaction's finish LSN. Also, given that the test is failing on
> > all AIX buildfarm members, there might be something specific to AIX.
> >
> > Noah, to investigate this issue further, is it possible for you to
> > apply the attached patch and run the 029_on_error.pl test? The patch
> > adds some logs to get additional information.
>
> Logs attached.
>

Thank you.

By seeing the below Logs:


2022-04-01 18:19:34.710 CUT [58327402] LOG:  not started skipping
changes: my_skiplsn 14EB7D8/B0706F72 finish_lsn 0/14EB7D8
...


It seems that the value of skiplsn read in GetSubscription is wrong
which makes the apply worker think it doesn't need to skip the
transaction. Now, in Alter/Create Subscription, we are using
LSNGetDatum() to store skiplsn value in pg_subscription but while
reading it in GetSubscription(), we are not converting back the datum
to LSN by using DatumGetLSN(). Is it possible that on this machine it
might be leading to not getting the right value for skiplsn? I think
it is worth trying to see if this fixes the problem.

Any other thoughts?

-- 
With Regards,
Amit Kapila.


datum_to_lsn_skiplsn_1.patch
Description: Binary data


Re: Skipping logical replication transactions on subscriber side

2022-04-01 Thread Masahiko Sawada
On Fri, Apr 1, 2022 at 5:10 PM Masahiko Sawada  wrote:
>
> On Fri, Apr 1, 2022 at 4:44 PM Noah Misch  wrote:
> >
> > On Tue, Mar 29, 2022 at 10:43:00AM +0530, Amit Kapila wrote:
> > > On Mon, Mar 21, 2022 at 5:51 PM Euler Taveira  wrote:
> > > > On Mon, Mar 21, 2022, at 12:25 AM, Amit Kapila wrote:
> > > > I have fixed all the above comments as per your suggestion in the
> > > > attached. Do let me know if something is missed?
> > > >
> > > > Looks good to me.
> > >
> > > This patch is committed
> > > (https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=208c5d65bbd60e33e272964578cb74182ac726a8).
> >
> > src/test/subscription/t/029_on_error.pl has been failing reliably on the 
> > five
> > AIX buildfarm members:
> >
> > # poll_query_until timed out executing this query:
> > # SELECT subskiplsn = '0/0' FROM pg_subscription WHERE subname = 'sub'
> > # expecting this output:
> > # t
> > # last actual query output:
> > # f
> > # with stderr:
> > timed out waiting for match: (?^:LOG:  done skipping logical replication 
> > transaction finished at 0/1D30788) at t/029_on_error.pl line 50.
> >
> > I've posted five sets of logs (2.7 MiB compressed) here:
> > https://drive.google.com/file/d/16NkyNIV07o0o8WM7GwcaAYFQDPTkULkR/view?usp=sharing
>
> Thank you for the report. I'm investigating this issue.

Looking at the subscriber logs, it successfully fetched the correct
error-LSN from the server logs and set it to ALTER SUBSCRIPTION …
SKIP:

2022-03-30 09:48:36.617 UTC [17039636:4] CONTEXT:  processing remote
data for replication origin "pg_16391" during "INSERT" for replication
target relation "public.tbl" in transaction 725 finished at 0/1D30788
2022-03-30 09:48:36.617 UTC [17039636:5] LOG:  logical replication
subscription "sub" has been disabled due to an error
:
2022-03-30 09:48:36.670 UTC [17039640:1] [unknown] LOG:  connection
received: host=[local]
2022-03-30 09:48:36.672 UTC [17039640:2] [unknown] LOG:  connection
authorized: user=nm database=postgres application_name=029_on_error.pl
2022-03-30 09:48:36.675 UTC [17039640:3] 029_on_error.pl LOG:
statement: ALTER SUBSCRIPTION sub SKIP (lsn = '0/1D30788')
2022-03-30 09:48:36.676 UTC [17039640:4] 029_on_error.pl LOG:
disconnection: session time: 0:00:00.006 user=nm database=postgres
host=[local]
:
2022-03-30 09:48:36.762 UTC [28246036:2] ERROR:  duplicate key value
violates unique constraint "tbl_pkey"
2022-03-30 09:48:36.762 UTC [28246036:3] DETAIL:  Key (i)=(1) already exists.
2022-03-30 09:48:36.762 UTC [28246036:4] CONTEXT:  processing remote
data for replication origin "pg_16391" during "INSERT" for replication
target relation "public.tbl" in transaction 725 finished at 0/1D30788

However, the worker could not start skipping changes of the error
transaction for some reason. Given that "SELECT subskiplsn = '0/0'
FROM pg_subscription WHERE subname = 'sub’” didn't return true, some
value was set to subskiplsn even after the unique key error.

So I'm guessing that the apply worker could not get the updated value
of the subskiplsn or its MySubscription->skiplsn could not match with
the transaction's finish LSN. Also, given that the test is failing on
all AIX buildfarm members, there might be something specific to AIX.

Noah, to investigate this issue further, is it possible for you to
apply the attached patch and run the 029_on_error.pl test? The patch
adds some logs to get additional information.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


add_logs.patch
Description: Binary data


Re: Skipping logical replication transactions on subscriber side

2022-04-01 Thread Masahiko Sawada
On Fri, Apr 1, 2022 at 4:44 PM Noah Misch  wrote:
>
> On Tue, Mar 29, 2022 at 10:43:00AM +0530, Amit Kapila wrote:
> > On Mon, Mar 21, 2022 at 5:51 PM Euler Taveira  wrote:
> > > On Mon, Mar 21, 2022, at 12:25 AM, Amit Kapila wrote:
> > > I have fixed all the above comments as per your suggestion in the
> > > attached. Do let me know if something is missed?
> > >
> > > Looks good to me.
> >
> > This patch is committed
> > (https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=208c5d65bbd60e33e272964578cb74182ac726a8).
>
> src/test/subscription/t/029_on_error.pl has been failing reliably on the five
> AIX buildfarm members:
>
> # poll_query_until timed out executing this query:
> # SELECT subskiplsn = '0/0' FROM pg_subscription WHERE subname = 'sub'
> # expecting this output:
> # t
> # last actual query output:
> # f
> # with stderr:
> timed out waiting for match: (?^:LOG:  done skipping logical replication 
> transaction finished at 0/1D30788) at t/029_on_error.pl line 50.
>
> I've posted five sets of logs (2.7 MiB compressed) here:
> https://drive.google.com/file/d/16NkyNIV07o0o8WM7GwcaAYFQDPTkULkR/view?usp=sharing

Thank you for the report. I'm investigating this issue.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Skipping logical replication transactions on subscriber side

2022-04-01 Thread Noah Misch
On Tue, Mar 29, 2022 at 10:43:00AM +0530, Amit Kapila wrote:
> On Mon, Mar 21, 2022 at 5:51 PM Euler Taveira  wrote:
> > On Mon, Mar 21, 2022, at 12:25 AM, Amit Kapila wrote:
> > I have fixed all the above comments as per your suggestion in the
> > attached. Do let me know if something is missed?
> >
> > Looks good to me.
> 
> This patch is committed
> (https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=208c5d65bbd60e33e272964578cb74182ac726a8).

src/test/subscription/t/029_on_error.pl has been failing reliably on the five
AIX buildfarm members:

# poll_query_until timed out executing this query:
# SELECT subskiplsn = '0/0' FROM pg_subscription WHERE subname = 'sub'
# expecting this output:
# t
# last actual query output:
# f
# with stderr:
timed out waiting for match: (?^:LOG:  done skipping logical replication 
transaction finished at 0/1D30788) at t/029_on_error.pl line 50.

I've posted five sets of logs (2.7 MiB compressed) here:
https://drive.google.com/file/d/16NkyNIV07o0o8WM7GwcaAYFQDPTkULkR/view?usp=sharing


The members have not actually uploaded these failures, due to an OOM in the
Perl process driving the buildfarm script.  I think the OOM is due to a need
for excess RAM to capture 029_on_error_subscriber.log, which is 27MB here.  I
will move the members to 64-bit Perl.  (AIX 32-bit binaries OOM easily:
https://www.postgresql.org/docs/devel/installation-platform-notes.html#INSTALLATION-NOTES-AIX.)




Re: Skipping logical replication transactions on subscriber side

2022-03-28 Thread Amit Kapila
On Mon, Mar 21, 2022 at 5:51 PM Euler Taveira  wrote:
>
> On Mon, Mar 21, 2022, at 12:25 AM, Amit Kapila wrote:
>
> I have fixed all the above comments as per your suggestion in the
> attached. Do let me know if something is missed?
>
> Looks good to me.
>

This patch is committed
(https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=208c5d65bbd60e33e272964578cb74182ac726a8).
Today, I have marked the corresponding entry in CF as committed.

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2022-03-21 Thread Euler Taveira
On Mon, Mar 21, 2022, at 12:25 AM, Amit Kapila wrote:
> I have fixed all the above comments as per your suggestion in the
> attached. Do let me know if something is missed?
Looks good to me.

> > src/test/subscription/t/029_disable_on_error.pl |  94 --
> > src/test/subscription/t/029_on_error.pl | 183 +++
> >
> > It seems you are removing a test for 
> > 705e20f8550c0e8e47c0b6b20b5f5ffd6ffd9e33.
> > I should also name 029_on_error.pl to something else such as 
> > 030_skip_lsn.pl or
> > a generic name 030_skip_option.pl.
> >
> 
> As explained in my previous email, I don't think any change is
> required for this comment but do let me know if you still think so?
Oh, sorry about the noise. I saw mixed tests between the 2 new features and I
was confused if it was intentional or not.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Skipping logical replication transactions on subscriber side

2022-03-20 Thread Amit Kapila
On Mon, Mar 21, 2022 at 7:09 AM Euler Taveira  wrote:
>
> On Thu, Mar 17, 2022, at 3:03 AM, Amit Kapila wrote:
>
> On Wed, Mar 16, 2022 at 1:53 PM Masahiko Sawada  wrote:
> >
> > I've attached an updated version patch.
> >
>
> The patch LGTM. I have made minor changes in comments and docs in the
> attached patch. Kindly let me know what you think of the attached?
>
> I am planning to commit this early next week (on Monday) unless there
> are more comments/suggestions.
>
> I reviewed this last version and I have a few comments.
>
> +* If the user set subskiplsn, we do a sanity check to make
> +* sure that the specified LSN is a probable value.
>
> ... user *sets*...
>
> +   ereport(ERROR,
> +   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +errmsg("skip WAL location (LSN) must be 
> greater than origin LSN %X/%X",
> +   LSN_FORMAT_ARGS(remote_lsn;
>
> Shouldn't we add the LSN to be skipped in the "(LSN)"?
>
> +* Start a new transaction to clear the subskipxid, if not started
> +* yet.
>
> It seems it means subskiplsn.
>
> + * subskipxid in order to inform users for cases e.g., where the user 
> mistakenly
> + * specified the wrong subskiplsn.
>
> It seems it means subskiplsn.
>
> +sub test_skip_xact
> +{
>
> It seems this function should be named test_skip_lsn. Unless the intention is
> to cover other skip options in the future.
>

I have fixed all the above comments as per your suggestion in the
attached. Do let me know if something is missed?

> src/test/subscription/t/029_disable_on_error.pl |  94 --
> src/test/subscription/t/029_on_error.pl | 183 +++
>
> It seems you are removing a test for 705e20f8550c0e8e47c0b6b20b5f5ffd6ffd9e33.
> I should also name 029_on_error.pl to something else such as 030_skip_lsn.pl 
> or
> a generic name 030_skip_option.pl.
>

As explained in my previous email, I don't think any change is
required for this comment but do let me know if you still think so?

-- 
With Regards,
Amit Kapila.


v17-0001-Add-ALTER-SUBSCRIPTION-.-SKIP.patch
Description: Binary data


Re: Skipping logical replication transactions on subscriber side

2022-03-20 Thread Amit Kapila
On Mon, Mar 21, 2022 at 7:09 AM Euler Taveira  wrote:
>
> src/test/subscription/t/029_disable_on_error.pl |  94 --
> src/test/subscription/t/029_on_error.pl | 183 +++
>
> It seems you are removing a test for 705e20f8550c0e8e47c0b6b20b5f5ffd6ffd9e33.
>

We have covered the same test in the new test file. See "CREATE
SUBSCRIPTION sub CONNECTION '$publisher_connstr' PUBLICATION pub WITH
(disable_on_error = true, ...". This will test the cases we were
earlier testing via 'disable_on_error'.

> I should also name 029_on_error.pl to something else such as 030_skip_lsn.pl 
> or
> a generic name 030_skip_option.pl.
>

The reason to keep the name 'on_error' is that it has tests for both
'disable_on_error' option and 'skip_lsn'. The other option could be
'on_error_action' or something like that. Now, does this make sense to
you?

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2022-03-20 Thread Euler Taveira
On Thu, Mar 17, 2022, at 3:03 AM, Amit Kapila wrote:
> On Wed, Mar 16, 2022 at 1:53 PM Masahiko Sawada  wrote:
> >
> > I've attached an updated version patch.
> >
> 
> The patch LGTM. I have made minor changes in comments and docs in the
> attached patch. Kindly let me know what you think of the attached?
> 
> I am planning to commit this early next week (on Monday) unless there
> are more comments/suggestions.
I reviewed this last version and I have a few comments.

+* If the user set subskiplsn, we do a sanity check to make
+* sure that the specified LSN is a probable value.

... user *sets*...

+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("skip WAL location (LSN) must be 
greater than origin LSN %X/%X",
+   LSN_FORMAT_ARGS(remote_lsn;

Shouldn't we add the LSN to be skipped in the "(LSN)"?

+* Start a new transaction to clear the subskipxid, if not started
+* yet.

It seems it means subskiplsn.

+ * subskipxid in order to inform users for cases e.g., where the user 
mistakenly
+ * specified the wrong subskiplsn.

It seems it means subskiplsn.

+sub test_skip_xact
+{

It seems this function should be named test_skip_lsn. Unless the intention is
to cover other skip options in the future.

src/test/subscription/t/029_disable_on_error.pl |  94 --
src/test/subscription/t/029_on_error.pl | 183 +++

It seems you are removing a test for 705e20f8550c0e8e47c0b6b20b5f5ffd6ffd9e33.
I should also name 029_on_error.pl to something else such as 030_skip_lsn.pl or
a generic name 030_skip_option.pl.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Skipping logical replication transactions on subscriber side

2022-03-17 Thread Masahiko Sawada
On Thu, Mar 17, 2022 at 3:03 PM Amit Kapila  wrote:
>
> On Wed, Mar 16, 2022 at 1:53 PM Masahiko Sawada  wrote:
> >
> > I've attached an updated version patch.
> >
>
> The patch LGTM. I have made minor changes in comments and docs in the
> attached patch. Kindly let me know what you think of the attached?

Thank you for updating the patch. It looks good to me.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




RE: Skipping logical replication transactions on subscriber side

2022-03-17 Thread osumi.takami...@fujitsu.com
On Thursday, March 17, 2022 7:56 PM Masahiko Sawada  
wrote:
 On Thu, Mar 17, 2022 at 5:52 PM Amit Kapila 
> wrote:
> >
> > On Thu, Mar 17, 2022 at 12:39 PM osumi.takami...@fujitsu.com
> >  wrote:
> > >
> > > On Thursday, March 17, 2022 3:04 PM Amit Kapila
>  wrote:
> > > > On Wed, Mar 16, 2022 at 1:53 PM Masahiko Sawada
> > > >  wrote:
> > > > >
> > > > > I've attached an updated version patch.
> > > > >
> > > >
> > > > The patch LGTM. I have made minor changes in comments and docs in
> > > > the attached patch. Kindly let me know what you think of the attached?
> > > Hi, thank you for the patch. Few minor comments.
> > >
> > >
> > > (3) apply_handle_commit_internal
> > >
> > ...
> > >
> > > I feel if we move those two functions at the end of the
> > > apply_handle_commit and apply_handle_stream_commit, then we will
> > > have more aligned codes and improve readability.
> > >
> 
> I think we cannot just move them to the end of apply_handle_commit() and
> apply_handle_stream_commit(). Because if we do that, we end up missing
> updating replication_session_origin_lsn/timestamp when clearing the
> subskiplsn if we're skipping a non-stream transaction.
> 
> Basically, the apply worker differently handles 2pc transactions and non-2pc
> transactions; we always prepare even empty transactions whereas we don't
> commit empty non-2pc transactions. So I think we don’t have to handle both in
> the same way.
Okay. Thank you so much for your explanation.
Then the code looks good to me.


Best Regards,
Takamichi Osumi



Re: Skipping logical replication transactions on subscriber side

2022-03-17 Thread Masahiko Sawada
On Thu, Mar 17, 2022 at 5:52 PM Amit Kapila  wrote:
>
> On Thu, Mar 17, 2022 at 12:39 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > On Thursday, March 17, 2022 3:04 PM Amit Kapila  
> > wrote:
> > > On Wed, Mar 16, 2022 at 1:53 PM Masahiko Sawada
> > >  wrote:
> > > >
> > > > I've attached an updated version patch.
> > > >
> > >
> > > The patch LGTM. I have made minor changes in comments and docs in the
> > > attached patch. Kindly let me know what you think of the attached?
> > Hi, thank you for the patch. Few minor comments.
> >
> >
> > (1) comment of maybe_start_skipping_changes
> >
> >
> > +   /*
> > +* Quick return if it's not requested to skip this transaction. This
> > +* function is called for every remote transaction and we assume 
> > that
> > +* skipping the transaction is not used often.
> > +*/
> >
> > I feel this comment should explain more about our intention and
> > what it confirms. In a case when user requests skip,
> > but it doesn't match the condition, we don't start
> > skipping changes, strictly speaking.
> >
> > From:
> > Quick return if it's not requested to skip this transaction.
> >
> > To:
> > Quick return if we can't ensure possible skiplsn is set
> > and it equals to the finish LSN of this transaction.
> >
>
> Hmm, the current comment seems more appropriate. What you are
> suggesting is almost writing the code in sentence form.
>
> >
> > (2) 029_on_error.pl
> >
> > +   my $contents = slurp_file($node_subscriber->logfile, $offset);
> > +   $contents =~
> > + qr/processing remote data for replication origin \"pg_\d+\" 
> > during "INSERT" for replication target relation "public.tbl" in transaction 
> > \d+ finishe$
> > + or die "could not get error-LSN";
> >
> > I think we shouldn't use a lot of new words.
> >
> > How about a change below  ?
> >
> > From:
> > could not get error-LSN
> > To:
> > failed to find expected error message that contains finish LSN for SKIP 
> > option
> >
> >
> > (3) apply_handle_commit_internal
> >
> ...
> >
> > I feel if we move those two functions at the end
> > of the apply_handle_commit and apply_handle_stream_commit,
> > then we will have more aligned codes and improve readability.
> >

I think we cannot just move them to the end of apply_handle_commit()
and apply_handle_stream_commit(). Because if we do that, we end up
missing updating replication_session_origin_lsn/timestamp when
clearing the subskiplsn if we're skipping a non-stream transaction.

Basically, the apply worker differently handles 2pc transactions and
non-2pc transactions; we always prepare even empty transactions
whereas we don't commit empty non-2pc transactions. So I think we
don’t have to handle both in the same way.

> I think the intention is to avoid duplicate code as we have a common
> function that gets called from both of those.

Yes.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Skipping logical replication transactions on subscriber side

2022-03-17 Thread Amit Kapila
On Thu, Mar 17, 2022 at 12:39 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Thursday, March 17, 2022 3:04 PM Amit Kapila  
> wrote:
> > On Wed, Mar 16, 2022 at 1:53 PM Masahiko Sawada
> >  wrote:
> > >
> > > I've attached an updated version patch.
> > >
> >
> > The patch LGTM. I have made minor changes in comments and docs in the
> > attached patch. Kindly let me know what you think of the attached?
> Hi, thank you for the patch. Few minor comments.
>
>
> (1) comment of maybe_start_skipping_changes
>
>
> +   /*
> +* Quick return if it's not requested to skip this transaction. This
> +* function is called for every remote transaction and we assume that
> +* skipping the transaction is not used often.
> +*/
>
> I feel this comment should explain more about our intention and
> what it confirms. In a case when user requests skip,
> but it doesn't match the condition, we don't start
> skipping changes, strictly speaking.
>
> From:
> Quick return if it's not requested to skip this transaction.
>
> To:
> Quick return if we can't ensure possible skiplsn is set
> and it equals to the finish LSN of this transaction.
>

Hmm, the current comment seems more appropriate. What you are
suggesting is almost writing the code in sentence form.

>
> (2) 029_on_error.pl
>
> +   my $contents = slurp_file($node_subscriber->logfile, $offset);
> +   $contents =~
> + qr/processing remote data for replication origin \"pg_\d+\" during 
> "INSERT" for replication target relation "public.tbl" in transaction \d+ 
> finishe$
> + or die "could not get error-LSN";
>
> I think we shouldn't use a lot of new words.
>
> How about a change below  ?
>
> From:
> could not get error-LSN
> To:
> failed to find expected error message that contains finish LSN for SKIP option
>
>
> (3) apply_handle_commit_internal
>
...
>
> I feel if we move those two functions at the end
> of the apply_handle_commit and apply_handle_stream_commit,
> then we will have more aligned codes and improve readability.
>

I think the intention is to avoid duplicate code as we have a common
function that gets called from both of those. OTOH, if Sawada-San or
others also prefer your approach to rearrange the code then I am fine
with it.

-- 
With Regards,
Amit Kapila.




RE: Skipping logical replication transactions on subscriber side

2022-03-17 Thread osumi.takami...@fujitsu.com
On Thursday, March 17, 2022 3:04 PM Amit Kapila  wrote:
> On Wed, Mar 16, 2022 at 1:53 PM Masahiko Sawada
>  wrote:
> >
> > I've attached an updated version patch.
> >
> 
> The patch LGTM. I have made minor changes in comments and docs in the
> attached patch. Kindly let me know what you think of the attached?
Hi, thank you for the patch. Few minor comments.


(1) comment of maybe_start_skipping_changes


+   /*
+* Quick return if it's not requested to skip this transaction. This
+* function is called for every remote transaction and we assume that
+* skipping the transaction is not used often.
+*/

I feel this comment should explain more about our intention and
what it confirms. In a case when user requests skip,
but it doesn't match the condition, we don't start
skipping changes, strictly speaking.

From:
Quick return if it's not requested to skip this transaction.

To:
Quick return if we can't ensure possible skiplsn is set
and it equals to the finish LSN of this transaction.


(2) 029_on_error.pl

+   my $contents = slurp_file($node_subscriber->logfile, $offset);
+   $contents =~
+ qr/processing remote data for replication origin \"pg_\d+\" during 
"INSERT" for replication target relation "public.tbl" in transaction \d+ 
finishe$
+ or die "could not get error-LSN";

I think we shouldn't use a lot of new words.

How about a change below  ?

From:
could not get error-LSN
To:
failed to find expected error message that contains finish LSN for SKIP option


(3) apply_handle_commit_internal


Lastly, may I have the reasons to call both
stop_skipping_changes and clear_subscription_skip_lsn
in this function, instead of having them at the end
of apply_handle_commit and apply_handle_stream_commit ?

IMHO, this structure looks to create the
extra condition branches in apply_handle_commit_internal.

Also, because of this code, when we call stop_skipping_changes
in the apply_handle_commit_internal, after checking
is_skipping_changes() returns true, we check another
is_skipping_changes() at the top of stop_skipping_changes.

OTOH, for other cases like apply_handle_prepare, apply_handle_stream_prepare,
we call those two functions (or either one) depending on the needs,
after existing commits and during the closing processing.
(In the case of rollback_prepare, it's also called after existing commit)

I feel if we move those two functions at the end
of the apply_handle_commit and apply_handle_stream_commit,
then we will have more aligned codes and improve readability.



Best Regards,
Takamichi Osumi



Re: Skipping logical replication transactions on subscriber side

2022-03-17 Thread Amit Kapila
On Wed, Mar 16, 2022 at 1:53 PM Masahiko Sawada  wrote:
>
> I've attached an updated version patch.
>

The patch LGTM. I have made minor changes in comments and docs in the
attached patch. Kindly let me know what you think of the attached?

I am planning to commit this early next week (on Monday) unless there
are more comments/suggestions.

-- 
With Regards,
Amit Kapila.


v16-0001-Add-ALTER-SUBSCRIPTION-.-SKIP.patch
Description: Binary data


Re: Skipping logical replication transactions on subscriber side

2022-03-16 Thread Amit Kapila
On Thu, Mar 17, 2022 at 8:13 AM shiy.f...@fujitsu.com
 wrote:
>
> On Wed, Mar 16, 2022 4:23 PM Masahiko Sawada  wrote:
> >
> > I've attached an updated version patch.
> >
>
> Thanks for updating the patch. Here are some comments for the v15 patch.
>
> 1. src/backend/replication/logical/worker.c
>
> + * to skip applying the changes when starting to apply changes.  The 
> subskiplsn is
> + * cleared after successfully skipping the transaction or applying non-empty
> + * transaction. The latter prevents the mistakenly specified subskiplsn from
>
> Should "applying non-empty transaction" be modified to "finishing a
> transaction"? To be consistent with the description in the
> alter_subscription.sgml.
>

The current wording in the patch seems okay to me as it is good to
emphasize on non-empty transactions.

> 2. src/test/subscription/t/029_on_error.pl
>
> +# Test of logical replication subscription self-disabling feature.
>
> Should we add something about "skip logical replication transactions" in this
> comment?
>

How about: "Tests for disable_on_error and SKIP transaction features."?

I am making some other minor edits in the patch and will take care of
whatever we decide for these comments.

-- 
With Regards,
Amit Kapila.




RE: Skipping logical replication transactions on subscriber side

2022-03-16 Thread shiy.f...@fujitsu.com
On Wed, Mar 16, 2022 4:23 PM Masahiko Sawada  wrote:
> 
> I've attached an updated version patch.
> 

Thanks for updating the patch. Here are some comments for the v15 patch.

1. src/backend/replication/logical/worker.c

+ * to skip applying the changes when starting to apply changes.  The 
subskiplsn is
+ * cleared after successfully skipping the transaction or applying non-empty
+ * transaction. The latter prevents the mistakenly specified subskiplsn from

Should "applying non-empty transaction" be modified to "finishing a
transaction"? To be consistent with the description in the
alter_subscription.sgml.

2. src/test/subscription/t/029_on_error.pl

+# Test of logical replication subscription self-disabling feature.

Should we add something about "skip logical replication transactions" in this
comment?

Regards,
Shi yu


Re: Skipping logical replication transactions on subscriber side

2022-03-16 Thread Masahiko Sawada
On Wed, Mar 16, 2022 at 1:20 PM Amit Kapila  wrote:
>
> On Wed, Mar 16, 2022 at 7:58 AM Amit Kapila  wrote:
> >
> > On Wed, Mar 16, 2022 at 6:03 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Tue, Mar 15, 2022 at 7:18 PM Amit Kapila  
> > > wrote:
> > > >
> > > > On Tue, Mar 15, 2022 at 11:43 AM Masahiko Sawada 
> > > >  wrote:
> > > > >
> > > >
> > > > 6.
> > > > @@ -1583,7 +1649,8 @@ apply_handle_insert(StringInfo s)
> > > >   TupleTableSlot *remoteslot;
> > > >   MemoryContext oldctx;
> > > >
> > > > - if (handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
> > > > + if (is_skipping_changes() ||
> > > >
> > > > Is there a reason to keep the skip_changes check here and in other DML
> > > > operations instead of at one central place in apply_dispatch?
> > >
> > > Since we already have the check of applying the change on the spot at
> > > the beginning of the handlers I feel it's better to add
> > > is_skipping_changes() to the check than add a new if statement to
> > > apply_dispatch, but do you prefer to check it in one central place in
> > > apply_dispatch?
> > >
> >
> > I think either way is fine. I just wanted to know the reason, your
> > current change looks okay to me.
> >
> > Some questions/comments
> > ==
> >
>
> Some cosmetic suggestions:
> ==
> 1.
> +# Create subscriptions. Both subscription sets disable_on_error to on
> +# so that they get disabled when a conflict occurs.
> +$node_subscriber->safe_psql(
> + 'postgres',
> + qq[
> +CREATE SUBSCRIPTION $subname CONNECTION '$publisher_connstr'
> PUBLICATION tap_pub WITH (streaming = on, two_phase = on,
> disable_on_error = on);
> +]);
>
> I don't understand what you mean by 'Both subscription ...' in the
> above comments.

Fixed.

>
> 2.
> + # Check the log indicating that successfully skipped the transaction,
>
> How about slightly rephrasing this to: "Check the log to ensure that
> the transaction is skipped"?

Fixed.

I've attached an updated version patch.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


v15-0001-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transa.patch
Description: Binary data


Re: Skipping logical replication transactions on subscriber side

2022-03-16 Thread Masahiko Sawada
On Wed, Mar 16, 2022 at 11:28 AM Amit Kapila  wrote:
>
> On Wed, Mar 16, 2022 at 6:03 AM Masahiko Sawada  wrote:
> >
> > On Tue, Mar 15, 2022 at 7:18 PM Amit Kapila  wrote:
> > >
> > > On Tue, Mar 15, 2022 at 11:43 AM Masahiko Sawada  
> > > wrote:
> > > >
> > >
> > > 6.
> > > @@ -1583,7 +1649,8 @@ apply_handle_insert(StringInfo s)
> > >   TupleTableSlot *remoteslot;
> > >   MemoryContext oldctx;
> > >
> > > - if (handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
> > > + if (is_skipping_changes() ||
> > >
> > > Is there a reason to keep the skip_changes check here and in other DML
> > > operations instead of at one central place in apply_dispatch?
> >
> > Since we already have the check of applying the change on the spot at
> > the beginning of the handlers I feel it's better to add
> > is_skipping_changes() to the check than add a new if statement to
> > apply_dispatch, but do you prefer to check it in one central place in
> > apply_dispatch?
> >
>
> I think either way is fine. I just wanted to know the reason, your
> current change looks okay to me.
>
> Some questions/comments
> ==
> 1. IIRC, earlier, we thought of allowing to use of this option (SKIP)
> only for superusers (as this can lead to inconsistent data if not used
> carefully) but I don't see that check in the latest patch. What is the
> reason for the same?

I thought the non-superuser subscription owner can resolve the
conflict by manuall manipulating the relations, which is the same
result of skipping all data modification changes by ALTER SUBSCRIPTION
SKIP feature. But after more thought, it would not be exactly the same
since the skipped transaction might include changes to the relation
that the owner doesn't have permission on it.

>
> 2.
> + /*
> + * Update the subskiplsn of the tuple to InvalidXLogRecPtr.
>
> I think we can change the above part of the comment to "Clear subskiplsn."
>

Fixed.

> 3.
> + * Since we already have
>
> Isn't it better to say here: Since we have already ...?

Fixed.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




RE: Skipping logical replication transactions on subscriber side

2022-03-16 Thread osumi.takami...@fujitsu.com
On Wednesday, March 16, 2022 3:37 PM I wrote:
> On Wednesday, March 16, 2022 11:33 AM Amit Kapila
>  wrote:
> > On Tue, Mar 15, 2022 at 7:30 PM osumi.takami...@fujitsu.com
> >  wrote:
> > >
> > > On Tuesday, March 15, 2022 3:13 PM Masahiko Sawada
> >  wrote:
> > > > I've attached an updated version patch.
> > >
> > > A couple of minor comments on v14.
> > >
> > > (1) apply_handle_commit_internal
> > >
> > >
> > > +   if (is_skipping_changes())
> > > +   {
> > > +   stop_skipping_changes();
> > > +
> > > +   /*
> > > +* Start a new transaction to clear the subskipxid,
> > > + if not
> > started
> > > +* yet. The transaction is committed below.
> > > +*/
> > > +   if (!IsTransactionState())
> > > +   StartTransactionCommand();
> > > +   }
> > > +
> > >
> > > I suppose we can move this condition check and
> > > stop_skipping_changes() call to the inside of the block we enter
> > > when IsTransactionState() returns
> > true.
> > >
> > > As the comment of apply_handle_commit_internal() mentions, it's the
> > > helper function for apply_handle_commit() and
> > > apply_handle_stream_commit().
> > >
> > > Then, I couldn't think that both callers don't open a transaction
> > > before the call of apply_handle_commit_internal().
> > > For applying spooled messages, we call begin_replication_step as well.
> > >
> > > I can miss something, but timing when we receive COMMIT message
> > > without opening a transaction, would be the case of empty
> > > transactions where the subscription (and its subscription worker) is not
> interested.
> > >
> >
> > I think when we skip non-streamed transactions we don't start a transaction.
> > So, if we do what you are suggesting, we will miss to clear the
> > skip_lsn after skipping the transaction.
> OK, this is what I missed.
> 
> On the other hand, what I was worried about is that empty transaction can 
> start
> skipping changes, if the subskiplsn is equal to the finish LSN for the empty
> transaction. The reason is we call maybe_start_skipping_changes even for
> empty ones and set skip_xact_finish_lsn by the finish LSN in that case.
> 
> I checked I could make this happen with debugger and some logs for LSN.
> What I did is just having two pairs of pub/sub and conduct a change for one of
> them, after I set a breakpoint in the logicalrep_write_begin on the walsender
> that will issue an empty transaction.
> Then, I check the finish LSN of it and
> conduct an alter subscription skip lsn command with this LSN value.
> As a result, empty transaction calls stop_skipping_changes in the
> apply_handle_commit_internal and then enter the block for IsTransactionState
> == true, which would not happen before applying the patch.
> 
> Also, this behavior looks contradicted with some comments in worker.c "The
> subskiplsn is cleared after successfully skipping the transaction or applying
> non-empty transaction." so, I was just confused and wrote the above comment.
Sorry, my understanding was not correct.

Even when we clear the subskiplsn by empty transaction,
we can say that it applies to the success of skipping the transaction.
Then this behavior and allowing empty transaction to match the indicated
LSN by alter subscription is fine.

I'm sorry for making noises.


Best Regards,
Takamichi Osumi



RE: Skipping logical replication transactions on subscriber side

2022-03-16 Thread osumi.takami...@fujitsu.com
On Wednesday, March 16, 2022 11:33 AM Amit Kapila  
wrote:
> On Tue, Mar 15, 2022 at 7:30 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > On Tuesday, March 15, 2022 3:13 PM Masahiko Sawada
>  wrote:
> > > I've attached an updated version patch.
> >
> > A couple of minor comments on v14.
> >
> > (1) apply_handle_commit_internal
> >
> >
> > +   if (is_skipping_changes())
> > +   {
> > +   stop_skipping_changes();
> > +
> > +   /*
> > +* Start a new transaction to clear the subskipxid, if not
> started
> > +* yet. The transaction is committed below.
> > +*/
> > +   if (!IsTransactionState())
> > +   StartTransactionCommand();
> > +   }
> > +
> >
> > I suppose we can move this condition check and stop_skipping_changes()
> > call to the inside of the block we enter when IsTransactionState() returns
> true.
> >
> > As the comment of apply_handle_commit_internal() mentions, it's the
> > helper function for apply_handle_commit() and
> > apply_handle_stream_commit().
> >
> > Then, I couldn't think that both callers don't open a transaction
> > before the call of apply_handle_commit_internal().
> > For applying spooled messages, we call begin_replication_step as well.
> >
> > I can miss something, but timing when we receive COMMIT message
> > without opening a transaction, would be the case of empty transactions
> > where the subscription (and its subscription worker) is not interested.
> >
> 
> I think when we skip non-streamed transactions we don't start a transaction.
> So, if we do what you are suggesting, we will miss to clear the skip_lsn after
> skipping the transaction.
OK, this is what I missed.

On the other hand, what I was worried about is that
empty transaction can start skipping changes,
if the subskiplsn is equal to the finish LSN for
the empty transaction. The reason is we call
maybe_start_skipping_changes even for empty ones
and set skip_xact_finish_lsn by the finish LSN in that case.

I checked I could make this happen with debugger and some logs for LSN.
What I did is just having two pairs of pub/sub
and conduct a change for one of them,
after I set a breakpoint in the logicalrep_write_begin
on the walsender that will issue an empty transaction.
Then, I check the finish LSN of it and
conduct an alter subscription skip lsn command with this LSN value.
As a result, empty transaction calls stop_skipping_changes
in the apply_handle_commit_internal and then
enter the block for IsTransactionState == true,
which would not happen before applying the patch.

Also, this behavior looks contradicted with some comments in worker.c
"The subskiplsn is cleared after successfully skipping the transaction
or applying non-empty transaction." so, I was just confused and
wrote the above comment.

I think this would not happen in practice, then
it might be OK without a special measure for this,
but I wasn't sure.


Best Regards,
Takamichi Osumi



Re: Skipping logical replication transactions on subscriber side

2022-03-16 Thread Masahiko Sawada
On Tue, Mar 15, 2022 at 7:18 PM Amit Kapila  wrote:
>
> On Tue, Mar 15, 2022 at 11:43 AM Masahiko Sawada  
> wrote:
> >
> > I've attached an updated version patch.
> >
>
> Review:
> ===

Thank you for the comments.

> 1.
> +++ b/doc/src/sgml/logical-replication.sgml
> @@ -366,15 +366,19 @@ CONTEXT:  processing remote data for replication
> origin "pg_16395" during "INSER
> transaction, the subscription needs to be disabled temporarily by
> ALTER SUBSCRIPTION ... DISABLE first or
> alternatively, the
> subscription can be used with the
> disable_on_error option.
> -   Then, the transaction can be skipped by calling the
> +   Then, the transaction can be skipped by using
> +   ALTER SUBSCRITPION ... SKIP with the finish LSN
> +   (i.e., LSN 0/14C0378). After that the replication
> +   can be resumed by ALTER SUBSCRIPTION ... ENABLE.
> +   Alternatively, the transaction can also be skipped by calling the
>
> Do we really need to disable the subscription for the skip feature? I
> think that is required for origin_advance. Also, probably, we can say
> Finish LSN could be Prepare LSN, Commit LSN, etc.

Not necessary to disable the subscription for skip feature. Fixed.

>
> 2.
> + /*
> + * Quick return if it's not requested to skip this transaction. This
> + * function is called every start of applying changes and we assume that
> + * skipping the transaction is not used in many cases.
> + */
> + if (likely(XLogRecPtrIsInvalid(MySubscription->skiplsn) ||
>
> The second part of this comment (especially ".. every start of
> applying changes ..") sounds slightly odd to me. How about changing it
> to: "This function is called for every remote transaction and we
> assume that skipping the transaction is not used in many cases."
>

Fixed.

> 3.
> +
> + ereport(LOG,
> + errmsg("start skipping logical replication transaction which
> finished at %X/%X",
> ...
> + ereport(LOG,
> + (errmsg("done skipping logical replication transaction which
> finished at %X/%X",
>
> No need of 'which' in above LOG messages. I think the message will be
> clear without the use of which in above message.

Removed.

>
> 4.
> + ereport(LOG,
> + (errmsg("done skipping logical replication transaction which
> finished at %X/%X",
> + LSN_FORMAT_ARGS(skip_xact_finish_lsn;
> +
> + /* Stop skipping changes */
> + skip_xact_finish_lsn = InvalidXLogRecPtr;
>
> Let's reverse the order of these statements to make them consistent
> with the corresponding maybe_start_* function.

But we cannot simply rever the order since skip_xact_finish_lsn is
used in the log message. Do we want to use a variable for it?

>
> 5.
> +
> + if (myskiplsn != finish_lsn)
> + ereport(WARNING,
> + errmsg("skip-LSN of logical replication subscription \"%s\"
> cleared", MySubscription->name),
>
> Shouldn't this be a LOG instead of a WARNING as this will be displayed
> only in server logs and by background apply worker?

WARNINGs are used also by other auxiliary processes such as archiver,
autovacuum workers, and launcher. So I think we can use it here.

>
> 6.
> @@ -1583,7 +1649,8 @@ apply_handle_insert(StringInfo s)
>   TupleTableSlot *remoteslot;
>   MemoryContext oldctx;
>
> - if (handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
> + if (is_skipping_changes() ||
>
> Is there a reason to keep the skip_changes check here and in other DML
> operations instead of at one central place in apply_dispatch?

I'd leave it as is as I mentioned in another email. But I've added
some comments as you suggested.

>
> 7.
> + /*
> + * Start a new transaction to clear the subskipxid, if not started
> + * yet. The transaction is committed below.
> + */
> + if (!IsTransactionState())
>
> I think the second part of the comment: "The transaction is committed
> below." is not required.

Removed.

>
> 8.
> + XLogRecPtr subskiplsn; /* All changes which finished at this LSN are
> + * skipped */
> +
>  #ifdef CATALOG_VARLEN /* variable-length fields start here */
>   /* Connection string to the publisher */
>   text subconninfo BKI_FORCE_NOT_NULL;
> @@ -109,6 +112,8 @@ typedef struct Subscription
>   bool disableonerr; /* Indicates if the subscription should be
>   * automatically disabled if a worker error
>   * occurs */
> + XLogRecPtr skiplsn; /* All changes which finished at this LSN are
> + * skipped */
>
> No need for 'which' in the above comments.

Removed.

>
> 9.
> Can we merge 029_disable_on_error in 030_skip_xact and name it as
> 029_on_error (or 029_on_error_skip_disable or some variant of it)?
> Both seem to be related features. I am slightly worried at the pace at
> which the number of test files are growing in subscription test.

Yes, we can merge them.

I'll submit an updated version patch after incorporating all comments I got.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Skipping logical replication transactions on subscriber side

2022-03-15 Thread Amit Kapila
On Wed, Mar 16, 2022 at 7:58 AM Amit Kapila  wrote:
>
> On Wed, Mar 16, 2022 at 6:03 AM Masahiko Sawada  wrote:
> >
> > On Tue, Mar 15, 2022 at 7:18 PM Amit Kapila  wrote:
> > >
> > > On Tue, Mar 15, 2022 at 11:43 AM Masahiko Sawada  
> > > wrote:
> > > >
> > >
> > > 6.
> > > @@ -1583,7 +1649,8 @@ apply_handle_insert(StringInfo s)
> > >   TupleTableSlot *remoteslot;
> > >   MemoryContext oldctx;
> > >
> > > - if (handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
> > > + if (is_skipping_changes() ||
> > >
> > > Is there a reason to keep the skip_changes check here and in other DML
> > > operations instead of at one central place in apply_dispatch?
> >
> > Since we already have the check of applying the change on the spot at
> > the beginning of the handlers I feel it's better to add
> > is_skipping_changes() to the check than add a new if statement to
> > apply_dispatch, but do you prefer to check it in one central place in
> > apply_dispatch?
> >
>
> I think either way is fine. I just wanted to know the reason, your
> current change looks okay to me.
>
> Some questions/comments
> ==
>

Some cosmetic suggestions:
==
1.
+# Create subscriptions. Both subscription sets disable_on_error to on
+# so that they get disabled when a conflict occurs.
+$node_subscriber->safe_psql(
+ 'postgres',
+ qq[
+CREATE SUBSCRIPTION $subname CONNECTION '$publisher_connstr'
PUBLICATION tap_pub WITH (streaming = on, two_phase = on,
disable_on_error = on);
+]);

I don't understand what you mean by 'Both subscription ...' in the
above comments.

2.
+ # Check the log indicating that successfully skipped the transaction,

How about slightly rephrasing this to: "Check the log to ensure that
the transaction is skipped"?

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2022-03-15 Thread Amit Kapila
On Wed, Mar 16, 2022 at 7:58 AM Amit Kapila  wrote:
>
> On Wed, Mar 16, 2022 at 6:03 AM Masahiko Sawada  wrote:
> >
> > On Tue, Mar 15, 2022 at 7:18 PM Amit Kapila  wrote:
> > >
> > > On Tue, Mar 15, 2022 at 11:43 AM Masahiko Sawada  
> > > wrote:
> > > >
> > >
> > > 6.
> > > @@ -1583,7 +1649,8 @@ apply_handle_insert(StringInfo s)
> > >   TupleTableSlot *remoteslot;
> > >   MemoryContext oldctx;
> > >
> > > - if (handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
> > > + if (is_skipping_changes() ||
> > >
> > > Is there a reason to keep the skip_changes check here and in other DML
> > > operations instead of at one central place in apply_dispatch?
> >
> > Since we already have the check of applying the change on the spot at
> > the beginning of the handlers I feel it's better to add
> > is_skipping_changes() to the check than add a new if statement to
> > apply_dispatch, but do you prefer to check it in one central place in
> > apply_dispatch?
> >
>
> I think either way is fine. I just wanted to know the reason, your
> current change looks okay to me.
>

I feel it is better to at least add a comment suggesting that we skip
only data modification changes because the other part of message
handle_stream_* is there in other message handlers as well. It will
make it easier to add a similar check in future message handlers.

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2022-03-15 Thread Amit Kapila
On Tue, Mar 15, 2022 at 7:30 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Tuesday, March 15, 2022 3:13 PM Masahiko Sawada  
> wrote:
> > I've attached an updated version patch.
>
> A couple of minor comments on v14.
>
> (1) apply_handle_commit_internal
>
>
> +   if (is_skipping_changes())
> +   {
> +   stop_skipping_changes();
> +
> +   /*
> +* Start a new transaction to clear the subskipxid, if not 
> started
> +* yet. The transaction is committed below.
> +*/
> +   if (!IsTransactionState())
> +   StartTransactionCommand();
> +   }
> +
>
> I suppose we can move this condition check and stop_skipping_changes() call
> to the inside of the block we enter when IsTransactionState() returns true.
>
> As the comment of apply_handle_commit_internal() mentions,
> it's the helper function for apply_handle_commit() and
> apply_handle_stream_commit().
>
> Then, I couldn't think that both callers don't open
> a transaction before the call of apply_handle_commit_internal().
> For applying spooled messages, we call begin_replication_step as well.
>
> I can miss something, but timing when we receive COMMIT message
> without opening a transaction, would be the case of empty transactions
> where the subscription (and its subscription worker) is not interested.
>

I think when we skip non-streamed transactions we don't start a
transaction. So, if we do what you are suggesting, we will miss to
clear the skip_lsn after skipping the transaction.

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2022-03-15 Thread Amit Kapila
On Wed, Mar 16, 2022 at 6:03 AM Masahiko Sawada  wrote:
>
> On Tue, Mar 15, 2022 at 7:18 PM Amit Kapila  wrote:
> >
> > On Tue, Mar 15, 2022 at 11:43 AM Masahiko Sawada  
> > wrote:
> > >
> >
> > 6.
> > @@ -1583,7 +1649,8 @@ apply_handle_insert(StringInfo s)
> >   TupleTableSlot *remoteslot;
> >   MemoryContext oldctx;
> >
> > - if (handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
> > + if (is_skipping_changes() ||
> >
> > Is there a reason to keep the skip_changes check here and in other DML
> > operations instead of at one central place in apply_dispatch?
>
> Since we already have the check of applying the change on the spot at
> the beginning of the handlers I feel it's better to add
> is_skipping_changes() to the check than add a new if statement to
> apply_dispatch, but do you prefer to check it in one central place in
> apply_dispatch?
>

I think either way is fine. I just wanted to know the reason, your
current change looks okay to me.

Some questions/comments
==
1. IIRC, earlier, we thought of allowing to use of this option (SKIP)
only for superusers (as this can lead to inconsistent data if not used
carefully) but I don't see that check in the latest patch. What is the
reason for the same?

2.
+ /*
+ * Update the subskiplsn of the tuple to InvalidXLogRecPtr.

I think we can change the above part of the comment to "Clear subskiplsn."

3.
+ * Since we already have

Isn't it better to say here: Since we have already ...?

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2022-03-15 Thread Masahiko Sawada
On Tue, Mar 15, 2022 at 7:18 PM Amit Kapila  wrote:
>
> On Tue, Mar 15, 2022 at 11:43 AM Masahiko Sawada  
> wrote:
> >
>
> 6.
> @@ -1583,7 +1649,8 @@ apply_handle_insert(StringInfo s)
>   TupleTableSlot *remoteslot;
>   MemoryContext oldctx;
>
> - if (handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
> + if (is_skipping_changes() ||
>
> Is there a reason to keep the skip_changes check here and in other DML
> operations instead of at one central place in apply_dispatch?

Since we already have the check of applying the change on the spot at
the beginning of the handlers I feel it's better to add
is_skipping_changes() to the check than add a new if statement to
apply_dispatch, but do you prefer to check it in one central place in
apply_dispatch?

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




RE: Skipping logical replication transactions on subscriber side

2022-03-15 Thread osumi.takami...@fujitsu.com
On Tuesday, March 15, 2022 3:13 PM Masahiko Sawada  
wrote:
> I've attached an updated version patch.

A couple of minor comments on v14.

(1) apply_handle_commit_internal


+   if (is_skipping_changes())
+   {
+   stop_skipping_changes();
+
+   /*
+* Start a new transaction to clear the subskipxid, if not 
started
+* yet. The transaction is committed below.
+*/
+   if (!IsTransactionState())
+   StartTransactionCommand();
+   }
+

I suppose we can move this condition check and stop_skipping_changes() call
to the inside of the block we enter when IsTransactionState() returns true.

As the comment of apply_handle_commit_internal() mentions,
it's the helper function for apply_handle_commit() and
apply_handle_stream_commit().

Then, I couldn't think that both callers don't open
a transaction before the call of apply_handle_commit_internal().
For applying spooled messages, we call begin_replication_step as well.

I can miss something, but timing when we receive COMMIT message
without opening a transaction, would be the case of empty transactions
where the subscription (and its subscription worker) is not interested.
If this is true, currently the patch's code includes
such cases within the range of is_skipping_changes() check.

(2) clear_subscription_skip_lsn's comments.

The comments for this function shouldn't touch
update of origin states, now that we don't update those.

+/*
+ * Clear subskiplsn of pg_subscription catalog with origin state updated.
+ *


This applies to other comments.

+   /*
+* Update the subskiplsn of the tuple to InvalidXLogRecPtr.  If user has
+* already changed subskiplsn before clearing it we don't update the
+* catalog and don't advance the replication origin state.  
...
+*We can reduce the possibility by
+* logging a replication origin WAL record to advance the origin LSN
+* instead but there is no way to advance the origin timestamp and it
+* doesn't seem to be worth doing anything about it since it's a very 
rare
+* case.
+*/



Best Regards,
Takamichi Osumi



Re: Skipping logical replication transactions on subscriber side

2022-03-15 Thread Amit Kapila
On Tue, Mar 15, 2022 at 11:43 AM Masahiko Sawada  wrote:
>
> I've attached an updated version patch.
>

Review:
===
1.
+++ b/doc/src/sgml/logical-replication.sgml
@@ -366,15 +366,19 @@ CONTEXT:  processing remote data for replication
origin "pg_16395" during "INSER
transaction, the subscription needs to be disabled temporarily by
ALTER SUBSCRIPTION ... DISABLE first or
alternatively, the
subscription can be used with the
disable_on_error option.
-   Then, the transaction can be skipped by calling the
+   Then, the transaction can be skipped by using
+   ALTER SUBSCRITPION ... SKIP with the finish LSN
+   (i.e., LSN 0/14C0378). After that the replication
+   can be resumed by ALTER SUBSCRIPTION ... ENABLE.
+   Alternatively, the transaction can also be skipped by calling the

Do we really need to disable the subscription for the skip feature? I
think that is required for origin_advance. Also, probably, we can say
Finish LSN could be Prepare LSN, Commit LSN, etc.

2.
+ /*
+ * Quick return if it's not requested to skip this transaction. This
+ * function is called every start of applying changes and we assume that
+ * skipping the transaction is not used in many cases.
+ */
+ if (likely(XLogRecPtrIsInvalid(MySubscription->skiplsn) ||

The second part of this comment (especially ".. every start of
applying changes ..") sounds slightly odd to me. How about changing it
to: "This function is called for every remote transaction and we
assume that skipping the transaction is not used in many cases."

3.
+
+ ereport(LOG,
+ errmsg("start skipping logical replication transaction which
finished at %X/%X",
...
+ ereport(LOG,
+ (errmsg("done skipping logical replication transaction which
finished at %X/%X",

No need of 'which' in above LOG messages. I think the message will be
clear without the use of which in above message.

4.
+ ereport(LOG,
+ (errmsg("done skipping logical replication transaction which
finished at %X/%X",
+ LSN_FORMAT_ARGS(skip_xact_finish_lsn;
+
+ /* Stop skipping changes */
+ skip_xact_finish_lsn = InvalidXLogRecPtr;

Let's reverse the order of these statements to make them consistent
with the corresponding maybe_start_* function.

5.
+
+ if (myskiplsn != finish_lsn)
+ ereport(WARNING,
+ errmsg("skip-LSN of logical replication subscription \"%s\"
cleared", MySubscription->name),

Shouldn't this be a LOG instead of a WARNING as this will be displayed
only in server logs and by background apply worker?

6.
@@ -1583,7 +1649,8 @@ apply_handle_insert(StringInfo s)
  TupleTableSlot *remoteslot;
  MemoryContext oldctx;

- if (handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
+ if (is_skipping_changes() ||

Is there a reason to keep the skip_changes check here and in other DML
operations instead of at one central place in apply_dispatch?

7.
+ /*
+ * Start a new transaction to clear the subskipxid, if not started
+ * yet. The transaction is committed below.
+ */
+ if (!IsTransactionState())

I think the second part of the comment: "The transaction is committed
below." is not required.

8.
+ XLogRecPtr subskiplsn; /* All changes which finished at this LSN are
+ * skipped */
+
 #ifdef CATALOG_VARLEN /* variable-length fields start here */
  /* Connection string to the publisher */
  text subconninfo BKI_FORCE_NOT_NULL;
@@ -109,6 +112,8 @@ typedef struct Subscription
  bool disableonerr; /* Indicates if the subscription should be
  * automatically disabled if a worker error
  * occurs */
+ XLogRecPtr skiplsn; /* All changes which finished at this LSN are
+ * skipped */

No need for 'which' in the above comments.

9.
Can we merge 029_disable_on_error in 030_skip_xact and name it as
029_on_error (or 029_on_error_skip_disable or some variant of it)?
Both seem to be related features. I am slightly worried at the pace at
which the number of test files are growing in subscription test.

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2022-03-15 Thread Masahiko Sawada
Hi,

On Fri, Mar 11, 2022 at 8:37 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Friday, March 11, 2022 5:20 PM Masahiko Sawada  
> wrote:
> > I've attached an updated version patch. This patch can be applied on top of 
> > the
> > latest disable_on_error patch[1].
> Hi, thank you for the patch. I'll share my review comments on v13.
>
>
> (a) src/backend/commands/subscriptioncmds.c
>
> @@ -84,6 +86,8 @@ typedef struct SubOpts
> boolstreaming;
> booltwophase;
> booldisableonerr;
> +   XLogRecPtr  lsn;/* InvalidXLogRecPtr for 
> resetting purpose,
> +* otherwise 
> a valid LSN */
>
>
> I think this explanation is slightly odd and can be improved.
> Strictly speaking, I feel a *valid* LSN is for retting transaction purpose
> from the functional perspective. Also, the wording "resetting purpose"
> is unclear by itself. I'll suggest below change.
>
> From:
> InvalidXLogRecPtr for resetting purpose, otherwise a valid LSN
> To:
> A valid LSN when we skip transaction, otherwise InvalidXLogRecPtr

"when we skip transaction" sounds incorrect to me since it's just an
option value but does not indicate that we really skip the transaction
that has that LSN. I realized that we directly use InvalidXLogRecPtr
for subskiplsn so I think no need to mention it.

>
> (b) The code position of additional append in describeSubscriptions
>
>
> +
> +   /* Skip LSN is only supported in v15 and higher */
> +   if (pset.sversion >= 15)
> +   appendPQExpBuffer(,
> + ", subskiplsn AS 
> \"%s\"\n",
> + gettext_noop("Skip 
> LSN"));
>
> I suggest to combine this code after subdisableonerr.

I got the comment[1] from Peter to put it at the end, which looks better to me.

>
> (c) parse_subscription_options
>
>
> +   /* Parse the argument as LSN */
> +   lsn = 
> DatumGetTransactionId(DirectFunctionCall1(pg_lsn_in,
>
>
> Here, shouldn't we call DatumGetLSN, instead of DatumGetTransactionId ?

Right, fixed.

>
>
> (d) parse_subscription_options
>
> +   if (strcmp(lsn_str, "none") == 0)
> +   {
> +   /* Setting lsn = NONE is treated as resetting 
> LSN */
> +   lsn = InvalidXLogRecPtr;
> +   }
> +
>
> We should remove this pair of curly brackets that is for one sentence.

I moved the comment on top of the if statement and removed the brackets.

>
>
> (e) src/backend/replication/logical/worker.c
>
> + * to skip applying the changes when starting to apply changes.  The 
> subskiplsn is
> + * cleared after successfully skipping the transaction or applying non-empty
> + * transaction, where the later avoids the mistakenly specified subskiplsn 
> from
> + * being left.
>
> typo "the later" -> "the latter"
>
> At the same time, I feel the last part of this sentence can be an independent 
> sentence.
> From:
> , where the later avoids the mistakenly specified subskiplsn from being left
> To:
> . The latter prevents the mistakenly specified subskiplsn from being left

Fixed.

>
>
> * Note that my comments below are applied if we choose we don't merge 
> disable_on_error test with skip lsn tests.
>
> (f) src/test/subscription/t/030_skip_xact.pl
>
> +use Test::More tests => 4;
>
> It's better to utilize the new style for the TAP test.
> Then, probably we should introduce done_testing()
> at the end of the test.

Fixed.

>
> (g) src/test/subscription/t/030_skip_xact.pl
>
> I think there's no need to create two types of subscriptions.
> Just one subscription with two_phase = on and streaming = on
> would be sufficient for the tests(normal commit, commit prepared,
> stream commit cases). I think this point of view will reduce
> the number of the table and the publication, which will
> make the whole test simpler.

Good point, fixed.

On Mon, Mar 14, 2022 at 9:39 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Friday, March 11, 2022 5:20 PM Masahiko Sawada  
> wrote:
> > I've attached an updated version patch. This patch can be applied on top of 
> > the
> > latest disable_on_error patch[1].
> Hi, few extra comments on v13.
>
>
> (1) src/backend/replication/logical/worker.c
>
>
> With regard to clear_subscription_skip_lsn,
> There are cases that we conduct origin state update twice.
>
> For instance, the case we reset subskiplsn by executing an
> irrelevant non-empty transaction. The first update is
> conducted at apply_handle_commit_internal and the second one
> is at clear_subscription_skip_lsn. In the second change,
> we update replorigin_session_origin_lsn by smaller value(commit_lsn),
> compared to the first update(end_lsn). Were those intentional and OK ?

Good catch, this part 

Re: Skipping logical replication transactions on subscriber side

2022-03-14 Thread Masahiko Sawada
On Mon, Mar 14, 2022 at 6:50 PM shiy.f...@fujitsu.com
 wrote:
>
> On Fri, Mar 11, 2022 4:20 PM Masahiko Sawada  wrote:
> >
> > I've attached an updated version patch. This patch can be applied on
> > top of the latest disable_on_error patch[1].
> >
>
> Thanks for your patch. Here are some comments for the v13 patch.

Thank you for the comments!

>
> 1. doc/src/sgml/ref/alter_subscription.sgml
> +  Specifies the transaction's finish LSN of the remote transaction 
> whose changes
>
> Could it be simplified to "Specifies the finish LSN of the remote transaction
> whose ...".

Fixed.

>
> 2.
> I met a failed assertion, the backtrace is attached. This is caused by the
> following code in maybe_start_skipping_changes().
>
> +   /*
> +* It's a rare case; a past subskiplsn was left because the 
> server
> +* crashed after preparing the transaction and before 
> clearing the
> +* subskiplsn. We clear it without a warning message so as 
> not confuse
> +* the user.
> +*/
> +   if (unlikely(MySubscription->skiplsn < lsn))
> +   {
> +   clear_subscription_skip_lsn(MySubscription->skiplsn, 
> InvalidXLogRecPtr, 0,
> + 
>   false);
> +   Assert(!IsTransactionState());
> +   }
>
> We want to clear subskiplsn in the case mentioned in comment. But if the next
> transaction is a steaming transaction and this function is called by
> apply_spooled_messages(), we are inside a transaction here. So, I think this
> assertion is not suitable for streaming transaction. Thoughts?

Good catch. After more thought, I realized that the assumption of this
if statement is wrong and we don't necessarily need to do here since
the left skip-LSN will eventually be cleared when the next transaction
is finished. So removed this part.

>
> 3.
> +   XLogRecPtr  subskiplsn; /* All changes which 
> committed at this LSN are
> +* skipped */
>
> To be consistent, should the comment be changed to "All changes which finished
> at this LSN are skipped"?

Fixed.

>
> 4.
> +  After logical replication worker successfully skips the transaction or 
> commits
> +  non-empty transaction, the LSN (stored in
> +  
> pg_subscription.subskiplsn)
> +  is cleared.
>
> Besides "commits non-empty transaction", subskiplsn would also be cleared in
> some two-phase commit cases I think. Like prepare/commit/rollback a 
> transaction,
> even if it is an empty transaction. So, should we change it for these cases?

Fixed.

>
> 5.
> + * Clear subskiplsn of pg_subscription catalog with origin state update.
>
> Should "with origin state update" modified to "with origin state updated"?

Fixed.

I'll submit an updated patch soon.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




RE: Skipping logical replication transactions on subscriber side

2022-03-14 Thread osumi.takami...@fujitsu.com
On Friday, March 11, 2022 5:20 PM Masahiko Sawada  wrote:
> I've attached an updated version patch. This patch can be applied on top of 
> the
> latest disable_on_error patch[1].
Hi, few extra comments on v13.


(1) src/backend/replication/logical/worker.c


With regard to clear_subscription_skip_lsn,
There are cases that we conduct origin state update twice.

For instance, the case we reset subskiplsn by executing an
irrelevant non-empty transaction. The first update is
conducted at apply_handle_commit_internal and the second one
is at clear_subscription_skip_lsn. In the second change,
we update replorigin_session_origin_lsn by smaller value(commit_lsn),
compared to the first update(end_lsn). Were those intentional and OK ?


(2) src/backend/replication/logical/worker.c

+ * Both origin_lsn and origin_timestamp are the remote transaction's end_lsn
+ * and commit timestamp, respectively.
+ */
+static void
+stop_skipping_changes(XLogRecPtr origin_lsn, TimestampTz origin_ts)

Typo. Should change 'origin_timestamp' to 'origin_ts',
because the name of the argument is the latter.

Also, here we handle not only commit but also prepare.
You need to fix the comment "commit timestamp" as well.

(3) src/backend/replication/logical/worker.c

+/*
+ * Clear subskiplsn of pg_subscription catalog with origin state update.
+ *
+ * if with_warning is true, we raise a warning when clearing the subskipxid.

It's better to insert this second sentence as the last sentence of
the other comments. It should start with capital letter as well.


Best Regards,
Takamichi Osumi



RE: Skipping logical replication transactions on subscriber side

2022-03-14 Thread shiy.f...@fujitsu.com
On Fri, Mar 11, 2022 4:20 PM Masahiko Sawada  wrote:
> 
> I've attached an updated version patch. This patch can be applied on
> top of the latest disable_on_error patch[1].
> 

Thanks for your patch. Here are some comments for the v13 patch.

1. doc/src/sgml/ref/alter_subscription.sgml
+  Specifies the transaction's finish LSN of the remote transaction 
whose changes

Could it be simplified to "Specifies the finish LSN of the remote transaction
whose ...".

2.
I met a failed assertion, the backtrace is attached. This is caused by the
following code in maybe_start_skipping_changes().

+   /*
+* It's a rare case; a past subskiplsn was left because the 
server
+* crashed after preparing the transaction and before clearing 
the
+* subskiplsn. We clear it without a warning message so as not 
confuse
+* the user.
+*/
+   if (unlikely(MySubscription->skiplsn < lsn))
+   {
+   clear_subscription_skip_lsn(MySubscription->skiplsn, 
InvalidXLogRecPtr, 0,
+   
false);
+   Assert(!IsTransactionState());
+   }

We want to clear subskiplsn in the case mentioned in comment. But if the next
transaction is a steaming transaction and this function is called by
apply_spooled_messages(), we are inside a transaction here. So, I think this
assertion is not suitable for streaming transaction. Thoughts?

3.
+   XLogRecPtr  subskiplsn; /* All changes which committed 
at this LSN are
+* skipped */

To be consistent, should the comment be changed to "All changes which finished
at this LSN are skipped"?

4.
+  After logical replication worker successfully skips the transaction or 
commits
+  non-empty transaction, the LSN (stored in
+  
pg_subscription.subskiplsn)
+  is cleared.

Besides "commits non-empty transaction", subskiplsn would also be cleared in
some two-phase commit cases I think. Like prepare/commit/rollback a transaction,
even if it is an empty transaction. So, should we change it for these cases?

5.
+ * Clear subskiplsn of pg_subscription catalog with origin state update.

Should "with origin state update" modified to "with origin state updated"?

Regards,
Shi yu
#0  0x7fe3d1db170f in raise () from /lib64/libc.so.6
#1  0x7fe3d1d9bb25 in abort () from /lib64/libc.so.6
#2  0x00e0a657 in ExceptionalCondition 
(conditionName=conditionName@entry=0xfe9265 "!IsTransactionState()", 
errorType=errorType@entry=0xeb6917 "FailedAssertion",
fileName=fileName@entry=0xfdc44c "worker.c", 
lineNumber=lineNumber@entry=3846) at assert.c:69
#3  0x00ae5a52 in maybe_start_skipping_changes (lsn=lsn@entry=22983032) 
at worker.c:3846
#4  0x00aedd35 in apply_spooled_messages (xid=xid@entry=722, 
lsn=22983032) at worker.c:1385
#5  0x00aee3b0 in apply_handle_stream_commit (s=s@entry=0x7ffd1354ee00) 
at worker.c:1486
#6  0x00aecf83 in apply_dispatch (s=s@entry=0x7ffd1354ee00) at 
worker.c:2520
#7  0x00aed41c in LogicalRepApplyLoop (last_received=22983080, 
last_received@entry=0) at worker.c:2751
#8  0x00aedb25 in start_apply (origin_startpos=0) at worker.c:3514
#9  0x00aef656 in ApplyWorkerMain (main_arg=) at 
worker.c:3770
#10 0x00a72881 in StartBackgroundWorker () at bgworker.c:858
#11 0x00a8e159 in do_start_bgworker (rw=rw@entry=0x27e99f0) at 
postmaster.c:5916
#12 0x00a8e30a in maybe_start_bgworkers () at postmaster.c:6141
#13 0x00a8f833 in sigusr1_handler (postgres_signal_arg=) 
at postmaster.c:5305
#14 
#15 0x7fe3d1e6d4ab in select () from /lib64/libc.so.6
#16 0x00a90d68 in ServerLoop () at postmaster.c:1765
#17 0x00a933c7 in PostmasterMain (argc=argc@entry=3, 
argv=argv@entry=0x27bdaf0) at postmaster.c:1473
#18 0x009203c4 in main (argc=3, argv=0x27bdaf0) at main.c:202


RE: Skipping logical replication transactions on subscriber side

2022-03-11 Thread osumi.takami...@fujitsu.com
On Friday, March 11, 2022 5:20 PM Masahiko Sawada  wrote:
> I've attached an updated version patch. This patch can be applied on top of 
> the
> latest disable_on_error patch[1].
Hi, thank you for the patch. I'll share my review comments on v13.


(a) src/backend/commands/subscriptioncmds.c

@@ -84,6 +86,8 @@ typedef struct SubOpts
boolstreaming;
booltwophase;
booldisableonerr;
+   XLogRecPtr  lsn;/* InvalidXLogRecPtr for 
resetting purpose,
+* otherwise a 
valid LSN */


I think this explanation is slightly odd and can be improved.
Strictly speaking, I feel a *valid* LSN is for retting transaction purpose
from the functional perspective. Also, the wording "resetting purpose"
is unclear by itself. I'll suggest below change.

From:
InvalidXLogRecPtr for resetting purpose, otherwise a valid LSN
To:
A valid LSN when we skip transaction, otherwise InvalidXLogRecPtr

(b) The code position of additional append in describeSubscriptions


+
+   /* Skip LSN is only supported in v15 and higher */
+   if (pset.sversion >= 15)
+   appendPQExpBuffer(,
+ ", subskiplsn AS 
\"%s\"\n",
+ gettext_noop("Skip 
LSN"));

I suggest to combine this code after subdisableonerr.

(c) parse_subscription_options


+   /* Parse the argument as LSN */
+   lsn = 
DatumGetTransactionId(DirectFunctionCall1(pg_lsn_in,


Here, shouldn't we call DatumGetLSN, instead of DatumGetTransactionId ?


(d) parse_subscription_options

+   if (strcmp(lsn_str, "none") == 0)
+   {
+   /* Setting lsn = NONE is treated as resetting 
LSN */
+   lsn = InvalidXLogRecPtr;
+   }
+

We should remove this pair of curly brackets that is for one sentence.


(e) src/backend/replication/logical/worker.c

+ * to skip applying the changes when starting to apply changes.  The 
subskiplsn is
+ * cleared after successfully skipping the transaction or applying non-empty
+ * transaction, where the later avoids the mistakenly specified subskiplsn from
+ * being left.

typo "the later" -> "the latter"

At the same time, I feel the last part of this sentence can be an independent 
sentence.
From:
, where the later avoids the mistakenly specified subskiplsn from being left
To:
. The latter prevents the mistakenly specified subskiplsn from being left


* Note that my comments below are applied if we choose we don't merge 
disable_on_error test with skip lsn tests.

(f) src/test/subscription/t/030_skip_xact.pl

+use Test::More tests => 4;

It's better to utilize the new style for the TAP test.
Then, probably we should introduce done_testing()
at the end of the test.

(g) src/test/subscription/t/030_skip_xact.pl

I think there's no need to create two types of subscriptions.
Just one subscription with two_phase = on and streaming = on
would be sufficient for the tests(normal commit, commit prepared,
stream commit cases). I think this point of view will reduce
the number of the table and the publication, which will
make the whole test simpler.


Best Regards,
Takamichi Osumi



Re: Skipping logical replication transactions on subscriber side

2022-03-11 Thread Masahiko Sawada
On Thu, Mar 10, 2022 at 9:02 PM Amit Kapila  wrote:
>
> On Tue, Mar 1, 2022 at 8:31 PM Masahiko Sawada  wrote:
> >
> > I've attached an updated patch along with two patches for cfbot tests
> > since the main patch (0003) depends on the other two patches. Both
> > 0001 and 0002 patches are the same ones I attached on another
> > thread[2].
> >
>
> Few comments on 0003:
> =
> 1.
> + 
> +  
> +   subskiplsn pg_lsn
> +  
> +  
> +   Commit LSN of the transaction whose changes are to be skipped,
> if a valid
> +   LSN; otherwise 0/0.
> +  
> + 
>
> Can't this be prepared LSN or rollback prepared LSN? Can we say
> Finish/End LSN and then add some details which all LSNs can be there?

Right, changed to finish LSN.

>
> 2. The conflict resolution explanation needs an update after the
> latest commits and we should probably change the commit LSN
> terminology as mentioned in the previous point.

Updated.

>
> 3. The text in alter_subscription.sgml looks a bit repetitive to me
> (similar to what we have in logical-replication.sgml related to
> conflicts). Here also we refer to only commit LSN which needs to be
> changed as mentioned in the previous two points.

Updated.

>
> 4.
> if (strcmp(lsn_str, "none") == 0)
> + {
> + /* Setting lsn = NONE is treated as resetting LSN */
> + lsn = InvalidXLogRecPtr;
> + }
> + else
> + {
> + /* Parse the argument as LSN */
> + lsn = DatumGetTransactionId(DirectFunctionCall1(pg_lsn_in,
> + CStringGetDatum(lsn_str)));
> +
> + if (XLogRecPtrIsInvalid(lsn))
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("invalid WAL location (LSN): %s", lsn_str)));
>
> Is there a reason that we don't want to allow setting 0
> (InvalidXLogRecPtr) for skip LSN?

0 is obviously an invalid value for skip LSN, which should not be
allowed similar to other options (like setting '' to slot_name). Also,
we use 0 (InvalidXLogRecPtr) internally to reset the subskipxid when
NONE is specified.

>
> 5.
> +# The subscriber will enter an infinite error loop, so we don't want
> +# to overflow the server log with error messages.
> +$node_subscriber->append_conf(
> + 'postgresql.conf',
> + qq[
> +wal_retrieve_retry_interval = 2s
> +]);
>
> Can we change this test to use disable_on_error feature? I am thinking
> if the disable_on_error feature got committed first, maybe we can have
> one test file for this and disable_on_error feature (something like
> conflicts.pl).

Good idea. Updated.

I've attached an updated version patch. This patch can be applied on
top of the latest disable_on_error patch[1].

Regards,

[1] 
https://www.postgresql.org/message-id/CAA4eK1Kes9TsMpGL6m%2BAJNHYCGRvx6piYQt5v6TEbH_t9jh8nA%40mail.gmail.com

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


v13-0001-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transa.patch
Description: Binary data


Re: Skipping logical replication transactions on subscriber side

2022-03-10 Thread Masahiko Sawada
On Thu, Mar 10, 2022 at 2:10 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Wednesday, March 2, 2022 12:01 AM Masahiko Sawada  
> wrote:
> > I've attached an updated patch along with two patches for cfbot tests since 
> > the
> > main patch (0003) depends on the other two patches. Both
> > 0001 and 0002 patches are the same ones I attached on another thread[2].
> Hi, few comments on 
> v12-0003-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transa.patch.

Thank you for the comments.

>
>
> (1) doc/src/sgml/ref/alter_subscription.sgml
>
>
> +SKIP (  class="parameter">skip_option =  class="parameter">value ...
> +  ...After logical replication
> +  successfully skips the transaction or commits non-empty transaction,
> +  the LSN (stored in
> +  
> pg_subscription.subskiplsn)
> +  is cleared.  See  for
> +  the details of logical replication conflicts.
> + 
> ...
> +lsn (pg_lsn)
> +
> + 
> +  Specifies the commit LSN of the remote transaction whose changes 
> are to be skipped
> +  by the logical replication worker.  Skipping
> +  individual subtransactions is not supported.  Setting 
> NONE
> +  resets the LSN.
>
>
> I think we'll extend the SKIP option choices in the future besides the 'lsn' 
> option.
> Then, one sentence "After logical replication successfully skips the 
> transaction or commits non-empty
> transaction, the LSN .. is cleared" should be moved to the explanation for 
> 'lsn' section,
> if we think this behavior to reset LSN is unique for 'lsn' option ?

Hmm, I think that regardless of the type of option (e.g., relid, xid,
and action whatever), resetting the specified something after that is
specific to SKIP command. SKIP command should have an effect on only
the first non-empty transaction. Otherwise, we could end up leaving it
if the user mistakenly specifies the wrong one.

>
>
> (2) doc/src/sgml/catalogs.sgml
>
> + 
> +  
> +   subskiplsn pg_lsn
> +  
> +  
> +   Commit LSN of the transaction whose changes are to be skipped, if a 
> valid
> +   LSN; otherwise 0/0.
> +  
> + 
> +
>
> We need to cover the PREPARE that keeps causing errors on the subscriber.
> This would apply to the entire patch (e.g. the rename of skip_xact_commit_lsn)

Fixed.

>
> (3) apply_handle_commit_internal comments
>
>  /*
>   * Helper function for apply_handle_commit and apply_handle_stream_commit.
> + * Return true if the transaction was committed, otherwise return false.
>   */
>
> If we want to make the new added line alinged with other functions in 
> worker.c,
> we should insert one blank line before it ?

This part is removed.

>
>
> (4) apply_worker_post_transaction
>
> I'm not sure if the current refactoring is good or not.
> For example, the current HEAD calls pgstat_report_stat(false)
> for a commit case if we are in a transaction in apply_handle_commit_internal.
> On the other hand, your refactoring calls pgstat_report_stat unconditionally
> for apply_handle_commit path. I'm not sure if there
> are many cases to call apply_handle_commit without opening a transaction,
> but is that acceptable ?
>
> Also, the name is a bit broad.
> How about making a function only for stopping and resetting LSN at this stage 
> ?

Agreed, it seems to be overkill. I'll revert that change.

>
>
> (5) comments for clear_subscription_skip_lsn
>
> How about changing the comment like below  ?
>
> From:
> Clear subskiplsn of pg_subscription catalog
> To:
> Clear subskiplsn of pg_subscription catalog with origin state update
>

Updated.

I'll submit an updated patch that incorporated comments I got so far
and is rebased to disable_on_error patch.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Skipping logical replication transactions on subscriber side

2022-03-10 Thread Amit Kapila
On Tue, Mar 1, 2022 at 8:31 PM Masahiko Sawada  wrote:
>
> I've attached an updated patch along with two patches for cfbot tests
> since the main patch (0003) depends on the other two patches. Both
> 0001 and 0002 patches are the same ones I attached on another
> thread[2].
>

Few comments on 0003:
=
1.
+ 
+  
+   subskiplsn pg_lsn
+  
+  
+   Commit LSN of the transaction whose changes are to be skipped,
if a valid
+   LSN; otherwise 0/0.
+  
+ 

Can't this be prepared LSN or rollback prepared LSN? Can we say
Finish/End LSN and then add some details which all LSNs can be there?

2. The conflict resolution explanation needs an update after the
latest commits and we should probably change the commit LSN
terminology as mentioned in the previous point.

3. The text in alter_subscription.sgml looks a bit repetitive to me
(similar to what we have in logical-replication.sgml related to
conflicts). Here also we refer to only commit LSN which needs to be
changed as mentioned in the previous two points.

4.
if (strcmp(lsn_str, "none") == 0)
+ {
+ /* Setting lsn = NONE is treated as resetting LSN */
+ lsn = InvalidXLogRecPtr;
+ }
+ else
+ {
+ /* Parse the argument as LSN */
+ lsn = DatumGetTransactionId(DirectFunctionCall1(pg_lsn_in,
+ CStringGetDatum(lsn_str)));
+
+ if (XLogRecPtrIsInvalid(lsn))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid WAL location (LSN): %s", lsn_str)));

Is there a reason that we don't want to allow setting 0
(InvalidXLogRecPtr) for skip LSN?

5.
+# The subscriber will enter an infinite error loop, so we don't want
+# to overflow the server log with error messages.
+$node_subscriber->append_conf(
+ 'postgresql.conf',
+ qq[
+wal_retrieve_retry_interval = 2s
+]);

Can we change this test to use disable_on_error feature? I am thinking
if the disable_on_error feature got committed first, maybe we can have
one test file for this and disable_on_error feature (something like
conflicts.pl).

-- 
With Regards,
Amit Kapila.




RE: Skipping logical replication transactions on subscriber side

2022-03-09 Thread osumi.takami...@fujitsu.com
On Wednesday, March 2, 2022 12:01 AM Masahiko Sawada  
wrote:
> I've attached an updated patch along with two patches for cfbot tests since 
> the
> main patch (0003) depends on the other two patches. Both
> 0001 and 0002 patches are the same ones I attached on another thread[2].
Hi, few comments on 
v12-0003-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transa.patch.


(1) doc/src/sgml/ref/alter_subscription.sgml


+SKIP ( skip_option = valuepg_subscription.subskiplsn)
+  is cleared.  See  for
+  the details of logical replication conflicts.
+ 
...
+lsn (pg_lsn)
+
+ 
+  Specifies the commit LSN of the remote transaction whose changes are 
to be skipped
+  by the logical replication worker.  Skipping
+  individual subtransactions is not supported.  Setting 
NONE
+  resets the LSN.


I think we'll extend the SKIP option choices in the future besides the 'lsn' 
option.
Then, one sentence "After logical replication successfully skips the 
transaction or commits non-empty
transaction, the LSN .. is cleared" should be moved to the explanation for 
'lsn' section,
if we think this behavior to reset LSN is unique for 'lsn' option ?


(2) doc/src/sgml/catalogs.sgml

+ 
+  
+   subskiplsn pg_lsn
+  
+  
+   Commit LSN of the transaction whose changes are to be skipped, if a 
valid
+   LSN; otherwise 0/0.
+  
+ 
+

We need to cover the PREPARE that keeps causing errors on the subscriber.
This would apply to the entire patch (e.g. the rename of skip_xact_commit_lsn)

(3) apply_handle_commit_internal comments

 /*
  * Helper function for apply_handle_commit and apply_handle_stream_commit.
+ * Return true if the transaction was committed, otherwise return false.
  */

If we want to make the new added line alinged with other functions in worker.c,
we should insert one blank line before it ?


(4) apply_worker_post_transaction

I'm not sure if the current refactoring is good or not.
For example, the current HEAD calls pgstat_report_stat(false)
for a commit case if we are in a transaction in apply_handle_commit_internal.
On the other hand, your refactoring calls pgstat_report_stat unconditionally
for apply_handle_commit path. I'm not sure if there
are many cases to call apply_handle_commit without opening a transaction,
but is that acceptable ?

Also, the name is a bit broad.
How about making a function only for stopping and resetting LSN at this stage ?


(5) comments for clear_subscription_skip_lsn

How about changing the comment like below  ?

From:
Clear subskiplsn of pg_subscription catalog
To:
Clear subskiplsn of pg_subscription catalog with origin state update


Best Regards,
Takamichi Osumi



Re: Skipping logical replication transactions on subscriber side

2022-03-03 Thread Amit Kapila
On Tue, Mar 1, 2022 at 8:31 PM Masahiko Sawada  wrote:
>
> I’ve considered a plan for the skipping logical replication
> transaction feature toward PG15. Several ideas and patches have been
> proposed here and another related thread[1][2] for the skipping
> logical replication transaction feature as follows:
>
> A. Change pg_stat_subscription_workers (committed 7a8507329085)
> B. Add origin name and commit-LSN to logical replication worker
> errcontext (proposed[2])
> C. Store error information (e.g., the error message and commit-LSN) to
> the system catalog
> D. Introduce ALTER SUBSCRIPTION SKIP
> E. Record the skipped data somewhere: server logs or a table
>
> Given the remaining time for PG15, it’s unlikely to complete all of
> them for PG15 by the feature freeze. The most realistic plan for PG15
> in my mind is to complete B and D. With these two items, the LSN of
> the error-ed transaction is shown in the server log, and we can ask
> users to check server logs for the LSN and use it with ALTER
> SUBSCRIPTION SKIP command.
>

It makes sense to me to try to finish B and D from the above list for
PG-15. I can review the patch for D in detail if others don't have an
objection to it.

Peter E., others, any opinion on this matter?

> If the community agrees with B+D, we will
> have a user-visible feature for PG15 which can be further
> extended/improved in PG16 by adding C and E.

Agreed.

>
> I've attached an updated patch for D and here is the summary:
>
> * Introduce a new command ALTER SUBSCRIPTION ... SKIP (lsn =
> '0/1234'). The user can get the commit-LSN of the transaction in
> question from the server logs thanks to B[2].
> * The user-specified LSN (say skip-LSN) is stored in the
> pg_subscription catalog.
> * The apply worker skips the whole transaction if the transaction's
> commit-LSN exactly matches to skip-LSN.
> * The skip-LSN has an effect on only the first non-empty transaction
> since the worker started to apply changes. IOW it's cleared after
> either skipping the whole transaction or successfully committing a
> non-empty transaction, preventing the skip-LSN to remain in the
> catalog. Also, since the latter case means that the user set the wrong
> skip-LSN we clear it with a warning.
>

As this will be displayed only in server logs and by background apply
worker, should it be LOG or WARNING?

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2022-03-01 Thread Masahiko Sawada
On Tue, Feb 15, 2022 at 7:35 PM Peter Eisentraut
 wrote:
>
> On 14.02.22 10:16, Amit Kapila wrote:
> > I think exposing LSN is a better approach as it doesn't have the
> > dangers of wraparound. And, I think users can use it with the existing
> > function pg_replication_origin_advance() which will save us from
> > adding additional code for this feature. We can explain/expand in docs
> > how users can use the error information from view/error_logs and use
> > the existing function to skip conflicting transactions. We might want
> > to even expose error_origin to make it a bit easier for users but not
> > sure. I feel the need for the new syntax (and then added code
> > complexity due to that) isn't warranted if we expose error_LSN and let
> > users use it with the existing functions.
>
> Well, the whole point of this feature is to provide a higher-level
> interface instead of pg_replication_origin_advance().  Replication
> origins are currently not something the users have to deal with
> directly.  We already document that you can use
> pg_replication_origin_advance() to skip erroring transactions.  But that
> seems unsatisfactory.  It'd be like using pg_surgery to fix unique
> constraint violations.

+1

I’ve considered a plan for the skipping logical replication
transaction feature toward PG15. Several ideas and patches have been
proposed here and another related thread[1][2] for the skipping
logical replication transaction feature as follows:

A. Change pg_stat_subscription_workers (committed 7a8507329085)
B. Add origin name and commit-LSN to logical replication worker
errcontext (proposed[2])
C. Store error information (e.g., the error message and commit-LSN) to
the system catalog
D. Introduce ALTER SUBSCRIPTION SKIP
E. Record the skipped data somewhere: server logs or a table

Given the remaining time for PG15, it’s unlikely to complete all of
them for PG15 by the feature freeze. The most realistic plan for PG15
in my mind is to complete B and D. With these two items, the LSN of
the error-ed transaction is shown in the server log, and we can ask
users to check server logs for the LSN and use it with ALTER
SUBSCRIPTION SKIP command. If the community agrees with B+D, we will
have a user-visible feature for PG15 which can be further
extended/improved in PG16 by adding C and E. I started a new thread[2]
for B yesterday. In this thread, I'd like to discuss D.

I've attached an updated patch for D and here is the summary:

* Introduce a new command ALTER SUBSCRIPTION ... SKIP (lsn =
'0/1234'). The user can get the commit-LSN of the transaction in
question from the server logs thanks to B[2].
* The user-specified LSN (say skip-LSN) is stored in the
pg_subscription catalog.
* The apply worker skips the whole transaction if the transaction's
commit-LSN exactly matches to skip-LSN.
* The skip-LSN has an effect on only the first non-empty transaction
since the worker started to apply changes. IOW it's cleared after
either skipping the whole transaction or successfully committing a
non-empty transaction, preventing the skip-LSN to remain in the
catalog. Also, since the latter case means that the user set the wrong
skip-LSN we clear it with a warning.
* ALTER SUBSCRIPTION SKIP doesn't support tablesync workers. But it
would not be a problem in practice since an error during table
synchronization is not common and could be resolved by truncating the
table and restarting the synchronization.

For the above reasons, ALTER SUBSCRIPTION SKIP command is safer than
the existing way of using pg_replication_origin_advance().

I've attached an updated patch along with two patches for cfbot tests
since the main patch (0003) depends on the other two patches. Both
0001 and 0002 patches are the same ones I attached on another
thread[2].

Regards,

[1] 
https://www.postgresql.org/message-id/20220125063131.4cmvsxbz2tdg6g65%40alap3.anarazel.de
[2] 
https://www.postgresql.org/message-id/CAD21AoBarBf2oTF71ig2g_o%3D3Z_Dt6_sOpMQma1kFgbnA5OZ_w%40mail.gmail.com

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


v12-0003-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transa.patch
Description: Binary data


v12-0001-Use-complete-sentences-in-logical-replication-wo.patch
Description: Binary data


v12-0002-Add-the-origin-name-and-remote-commit-LSN-to-log.patch
Description: Binary data


Re: Skipping logical replication transactions on subscriber side

2022-02-15 Thread Peter Eisentraut

On 14.02.22 10:16, Amit Kapila wrote:

I think exposing LSN is a better approach as it doesn't have the
dangers of wraparound. And, I think users can use it with the existing
function pg_replication_origin_advance() which will save us from
adding additional code for this feature. We can explain/expand in docs
how users can use the error information from view/error_logs and use
the existing function to skip conflicting transactions. We might want
to even expose error_origin to make it a bit easier for users but not
sure. I feel the need for the new syntax (and then added code
complexity due to that) isn't warranted if we expose error_LSN and let
users use it with the existing functions.


Well, the whole point of this feature is to provide a higher-level 
interface instead of pg_replication_origin_advance().  Replication 
origins are currently not something the users have to deal with 
directly.  We already document that you can use 
pg_replication_origin_advance() to skip erroring transactions.  But that 
seems unsatisfactory.  It'd be like using pg_surgery to fix unique 
constraint violations.





Re: Skipping logical replication transactions on subscriber side

2022-02-14 Thread Amit Kapila
On Fri, Feb 11, 2022 at 7:40 AM Masahiko Sawada  wrote:
>
> On Thu, Jan 27, 2022 at 10:42 PM Peter Eisentraut
>  wrote:
> >
> > On 26.01.22 05:05, Masahiko Sawada wrote:
> > >> I think it is okay to clear after the first successful application of
> > >> any transaction. What I was not sure was about the idea of giving
> > >> WARNING/ERROR if the first xact to be applied is not the same as
> > >> skip_xid.
> > > Do you prefer not to do anything in this case?
> >
> > I think a warning would be sensible.  If the user specifies to skip a
> > certain transaction and then that doesn't happen, we should at least say
> > something.
>
> Meanwhile waiting for comments on the discussion about the designs of
> both pg_stat_subscription_workers and ALTER SUBSCRIPTION SKIP feature,
> I’ve incorporated some (minor) comments on the current design patch,
> which includes:
>
> * Use LSN instead of XID.
>

I think exposing LSN is a better approach as it doesn't have the
dangers of wraparound. And, I think users can use it with the existing
function pg_replication_origin_advance() which will save us from
adding additional code for this feature. We can explain/expand in docs
how users can use the error information from view/error_logs and use
the existing function to skip conflicting transactions. We might want
to even expose error_origin to make it a bit easier for users but not
sure. I feel the need for the new syntax (and then added code
complexity due to that) isn't warranted if we expose error_LSN and let
users use it with the existing functions.

Do you see any problem with the same?

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2022-02-10 Thread Masahiko Sawada
On Thu, Jan 27, 2022 at 10:42 PM Peter Eisentraut
 wrote:
>
> On 26.01.22 05:05, Masahiko Sawada wrote:
> >> I think it is okay to clear after the first successful application of
> >> any transaction. What I was not sure was about the idea of giving
> >> WARNING/ERROR if the first xact to be applied is not the same as
> >> skip_xid.
> > Do you prefer not to do anything in this case?
>
> I think a warning would be sensible.  If the user specifies to skip a
> certain transaction and then that doesn't happen, we should at least say
> something.

Meanwhile waiting for comments on the discussion about the designs of
both pg_stat_subscription_workers and ALTER SUBSCRIPTION SKIP feature,
I’ve incorporated some (minor) comments on the current design patch,
which includes:

* Use LSN instead of XID.
* Raise a warning if the user specifies to skip a certain transaction
and then that doesn’t happen.
* Skip-LSN has an effect on the first non-empty transaction. That is,
it’s cleared after successfully committing a non-empty transaction,
preventing the user-specified wrong LSN to remain.
* Remove some unnecessary tap tests to reduce the test time.

I think we all agree with the first point regardless of where we store
error information. And speaking of the current design, I think we all
agree on other points. Since the design discussion is ongoing, I’ll
incorporate other comments according to the result of the discussion.

The attached 0001 patch modifies the pg_stat_subscription_workers to
report LSN instead of XID, which is required by ALTER SUBSCRIPTION
SKIP patch, the 0002 patch.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


v11-0001-Report-error-transaction-s-commit-LSN-instead-of.patch
Description: Binary data


v11-0002-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transa.patch
Description: Binary data


Re: Skipping logical replication transactions on subscriber side

2022-01-27 Thread Peter Eisentraut

On 26.01.22 05:05, Masahiko Sawada wrote:

I think it is okay to clear after the first successful application of
any transaction. What I was not sure was about the idea of giving
WARNING/ERROR if the first xact to be applied is not the same as
skip_xid.

Do you prefer not to do anything in this case?


I think a warning would be sensible.  If the user specifies to skip a 
certain transaction and then that doesn't happen, we should at least say 
something.





Re: Skipping logical replication transactions on subscriber side

2022-01-26 Thread Masahiko Sawada
On Wed, Jan 26, 2022 at 8:02 PM Amit Kapila  wrote:
>
> On Wed, Jan 26, 2022 at 12:51 PM Masahiko Sawada  
> wrote:
> >
> > On Wed, Jan 26, 2022 at 1:43 PM David G. Johnston
> >  wrote:
> > >
> > > We probably should just provide an option for the user to specify 
> > > "subrelid".  If null, only the main apply worker will skip the given xid, 
> > > otherwise only the worker tasked with syncing that particular table will 
> > > do so.  It might take a sequence of ALTER SUBSCRIPTION SET commands to 
> > > get a broken initial table synchronization to load completely but at 
> > > least there will not be any surprises as to which tables had transactions 
> > > skipped and which did not.
> >
> > That would work but I’m concerned that the users can specify it
> > properly. Also, we would need to change the errcontext message
> > generated by apply_error_callback() so the user can know that the
> > error occurred in either apply worker or tablesync worker.
> >
> > Or, as another idea, since an error during table synchronization is
> > not common and could be resolved by truncating the table and
> > restarting the synchronization in practice, there might be no need
> > this much and we can support it only for apply worker errors.
> >
>
> Yes, that is what I have also in mind. We can always extend this
> feature for tablesync process because it can not only fail for the
> specified skip_xid but also for many other reasons during the initial
> copy.

I'll update the patch accordingly to test and verify this approach.

In the meantime, I’d like to discuss the possible ideas of storing the
error XID somewhere the worker can see it even after a restart. It has
been proposed that the worker updates the catalog when an error
occurs, which was criticized as updating the catalog in such a
situation is not a good idea.

The next idea I considered was to store the error XID somewhere on
shmem (e.g., ReplicationState). But It requires entries at least as
much as subscriptions in principle, not
max_logical_replcation_workers. Since we don’t know it at startup
time, we need to use DSM or cache with a fixed number of entries. It
seems overkill to me.

The third idea, which is slightly better than others, is to update the
catalog by the launcher process, not the worker process; when an error
occurs, the apply worker stores the error XID (and maybe its
subscription OID) into its LogicalRepWorker entry, and the launcher
updates the corresponding entry of pg_subscription catalog before
launching workers. After the worker restarts, it clears the error XID
on the catalog if it successfully applied the transaction with the
error XID. The user can enable the skipping transaction behavior by a
query say ALTER SUBSCRIPTION SKIP ENABLED. The user cannot enable the
skipping behavior if the error XID is not set. If the skipping
behavior is enabled and the error XID is a valid value, the worker
skips the transaction and then clears both the error XID and a flag of
skipping behavior on the catalog.

With this idea, we don’t need a complex mechanism to store the error
XID for each subscription and can ensure to skip only the transaction
in question. But my concern is that the launcher updates the catalog.
Since it doesn’t connect to any database, probably it cannot open the
catalog indexes (because it requires lookup pg_class). Therefore, we
have to use in-place updates here. Through quick tests, I’ve confirmed
that using heap_inplace_update() to update the error XID on
pg_subscription tuples seems to work but not sure using an in-place
update here is a legitimate approach.

What do you think and any ideas?


Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Skipping logical replication transactions on subscriber side

2022-01-26 Thread Amit Kapila
On Wed, Jan 26, 2022 at 12:51 PM Masahiko Sawada  wrote:
>
> On Wed, Jan 26, 2022 at 1:43 PM David G. Johnston
>  wrote:
> >
> > We probably should just provide an option for the user to specify 
> > "subrelid".  If null, only the main apply worker will skip the given xid, 
> > otherwise only the worker tasked with syncing that particular table will do 
> > so.  It might take a sequence of ALTER SUBSCRIPTION SET commands to get a 
> > broken initial table synchronization to load completely but at least there 
> > will not be any surprises as to which tables had transactions skipped and 
> > which did not.
>
> That would work but I’m concerned that the users can specify it
> properly. Also, we would need to change the errcontext message
> generated by apply_error_callback() so the user can know that the
> error occurred in either apply worker or tablesync worker.
>
> Or, as another idea, since an error during table synchronization is
> not common and could be resolved by truncating the table and
> restarting the synchronization in practice, there might be no need
> this much and we can support it only for apply worker errors.
>

Yes, that is what I have also in mind. We can always extend this
feature for tablesync process because it can not only fail for the
specified skip_xid but also for many other reasons during the initial
copy.

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2022-01-25 Thread Masahiko Sawada
On Wed, Jan 26, 2022 at 1:43 PM David G. Johnston
 wrote:
>
> On Tue, Jan 25, 2022 at 9:16 PM Amit Kapila  wrote:
>>
>> On Wed, Jan 26, 2022 at 9:36 AM Masahiko Sawada  
>> wrote:
>> > On Wed, Jan 26, 2022 at 12:54 PM Amit Kapila  
>> > wrote:
>>
>> > > >
>> > > > Probably, we also need to consider the case where the tablesync worker
>> > > > entered an error loop and the user wants to skip the transaction? The
>> > > > apply worker is also running at the same time but it should not clear
>> > > > subskipxid. Similarly, the tablesync worker should not clear
>> > > > subskipxid if the apply worker wants to skip the transaction.
>> > > >
>> > >
>> > > I think for tablesync workers, the skip_xid set via this mechanism
>> > > won't work as we don't have any remote_xid for them, and neither any
>> > > XID is reported in the view for them.
>> >
>> > If the tablesync worker raises an error while applying changes after
>> > finishing the copy, it also reports the error XID.
>> >
>>
>> Right and agreed with your assessment for the same.
>>
>
> IIUC each tablesync process also performs an apply stage but only applies the 
> messages related to the single table it is responsible for.  Once all 
> tablesync workers synchronize they are all destroyed and the main apply 
> worker takes over and applies transactions to all subscribed tables.
>
> We probably should just provide an option for the user to specify "subrelid". 
>  If null, only the main apply worker will skip the given xid, otherwise only 
> the worker tasked with syncing that particular table will do so.  It might 
> take a sequence of ALTER SUBSCRIPTION SET commands to get a broken initial 
> table synchronization to load completely but at least there will not be any 
> surprises as to which tables had transactions skipped and which did not.

That would work but I’m concerned that the users can specify it
properly. Also, we would need to change the errcontext message
generated by apply_error_callback() so the user can know that the
error occurred in either apply worker or tablesync worker.

Or, as another idea, since an error during table synchronization is
not common and could be resolved by truncating the table and
restarting the synchronization in practice, there might be no need
this much and we can support it only for apply worker errors.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Skipping logical replication transactions on subscriber side

2022-01-25 Thread David G. Johnston
On Tue, Jan 25, 2022 at 9:16 PM Amit Kapila  wrote:

> On Wed, Jan 26, 2022 at 9:36 AM Masahiko Sawada 
> wrote:
> > On Wed, Jan 26, 2022 at 12:54 PM Amit Kapila 
> wrote:
>
> > > >
> > > > Probably, we also need to consider the case where the tablesync
> worker
> > > > entered an error loop and the user wants to skip the transaction? The
> > > > apply worker is also running at the same time but it should not clear
> > > > subskipxid. Similarly, the tablesync worker should not clear
> > > > subskipxid if the apply worker wants to skip the transaction.
> > > >
> > >
> > > I think for tablesync workers, the skip_xid set via this mechanism
> > > won't work as we don't have any remote_xid for them, and neither any
> > > XID is reported in the view for them.
> >
> > If the tablesync worker raises an error while applying changes after
> > finishing the copy, it also reports the error XID.
> >
>
> Right and agreed with your assessment for the same.
>
>
IIUC each tablesync process also performs an apply stage but only applies
the messages related to the single table it is responsible for.  Once all
tablesync workers synchronize they are all destroyed and the main apply
worker takes over and applies transactions to all subscribed tables.

We probably should just provide an option for the user to specify
"subrelid".  If null, only the main apply worker will skip the given xid,
otherwise only the worker tasked with syncing that particular table will do
so.  It might take a sequence of ALTER SUBSCRIPTION SET commands to get a
broken initial table synchronization to load completely but at least there
will not be any surprises as to which tables had transactions skipped and
which did not.

It may even make sense, eventually for the main apply worker to skip on a
subrelid basis.  Since the main apply worker isn't applying transactions at
the same time as the tablesync workers the non-null subrelid can also be
interpreted by the main apply worker.

David J.


Re: Skipping logical replication transactions on subscriber side

2022-01-25 Thread Amit Kapila
On Wed, Jan 26, 2022 at 9:36 AM Masahiko Sawada  wrote:
>
> On Wed, Jan 26, 2022 at 12:54 PM Amit Kapila  wrote:
> >
> >
> > I think it is okay to clear after the first successful application of
> > any transaction. What I was not sure was about the idea of giving
> > WARNING/ERROR if the first xact to be applied is not the same as
> > skip_xid.
>
> Do you prefer not to do anything in this case?
>

I am fine with clearing the skip_xid after the first successful
application. But note, we shouldn't do catalog access for this, we can
check if it is set in MySubscription.

> >
> > >
> > > Probably, we also need to consider the case where the tablesync worker
> > > entered an error loop and the user wants to skip the transaction? The
> > > apply worker is also running at the same time but it should not clear
> > > subskipxid. Similarly, the tablesync worker should not clear
> > > subskipxid if the apply worker wants to skip the transaction.
> > >
> >
> > I think for tablesync workers, the skip_xid set via this mechanism
> > won't work as we don't have any remote_xid for them, and neither any
> > XID is reported in the view for them.
>
> If the tablesync worker raises an error while applying changes after
> finishing the copy, it also reports the error XID.
>

Right and agreed with your assessment for the same.

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2022-01-25 Thread Masahiko Sawada
On Wed, Jan 26, 2022 at 7:05 AM David G. Johnston
 wrote:
>
> On Tue, Jan 25, 2022 at 8:33 AM Masahiko Sawada  wrote:
>>
>> Given that we cannot use rely on the pg_stat_subscription_workers view
>> for this purpose, we would need either a new sub-system that tracks
>> each logical replication status so the system can set the error XID to
>> subskipxid, or to wait for shared-memory based stats collector.
>
>
> I'm reading over the monitoring-stats page to try and get my head around all 
> of this.  First of all, it defines two kinds of views:
>
> 1. PostgreSQL's statistics collector is a subsystem that supports collection 
> and reporting of information about server activity.
> 2. PostgreSQL also supports reporting dynamic information ... This facility 
> is independent of the collector process.
>
> In then has two tables:
>
> 28.1 Dynamic Statistics Views (describing #2 above)
> 28.2 Collected Statistics Views (describing #1 above)
>
> Apparently the "collector process" is UDP-like, not reliable.  The 
> documentation fails to mention this fact.  I'd argue that this is a 
> documentation bug.
>
> I do see that the pg_stat_subscription_workers view is correctly placed in 
> Table 28.2
>
> Reviewing the other views listed in that table only pg_stat_archiver abuses 
> the statistics collector in a similar fashion.  All of the others are 
> actually metric oriented.
>
> I don't care for the specification: "will contain one row per subscription 
> worker on which errors have occurred, for workers applying logical 
> replication changes and workers handling the initial data copy of the 
> subscribed tables."
>
> I would much rather have this behave similar to pg_stat_activity (which, of 
> course, is a Dynamic Statistics View...) in that it shows only and all 
> workers that are presently working.

I have no objection against having a dynamic statistics view showing
the status of each running worker but I think it should be implemented
in a separate view and not be something that replaces the
pg_stat_subscription_workers. I think pg_stat_subscription would be
the right place for it.

> The tablesync workers should go away when they have finished synchronizing.  
> I should not have to manually intervene to get rid of unreliable expired 
> data.  The log file feels like a superior solution to this monitoring view.
>
> Alternatively, if the tablesync workers are done but we've been accumulating 
> real statistics for them, then by all means keep them included in the view - 
> but regardless of whether they encountered an error.  But maybe the view can 
> right join in pg_stat_subscription as show a column for "(pid is not null) AS 
> is_active".
>
> Maybe we need to add a track_finished_tablesync_workers GUC so the DBA can 
> decide whether to devote storage and processing resources to that historical 
> information.
>
> If you had kept the original view name, "pg_stat_subscription_error", this 
> whole issue goes away.  But you decided to make it more generic and call it 
> "pg_stat_subscription_workers" - which means you need to get rid of the 
> error-specific condition in the WHERE clause for the view.  Show all workers 
> - I can filter on is_active.  Showing only active workers is also acceptable. 
>  You won't get to change your mind so decide whether this wants to show only 
> current and running state or whether historical statistics for now defunct 
> tablesync workers are desired.  Personally, I would just show active workers 
> and if someone wants to add the feature they can add a 
> track_tablesync_worker_stats GUC and a matching view.

We plan to clear/remove table sync entries who finished synchronization.

It’s better not to merge dynamic statistics such as pid and is_active
and accumulative statistics into one view. I think we can have both
views: pg_stat_subscription_workers view with some changes based on
the review comments (e.g., removing defunct tablesync entry), and
another view showing dynamic statistics such as the worker status.

> From that, every apply worker should be sending a statistics message to the 
> collector periodically.  If error info is not present and the state is "all 
> is well", clear out any existing error info from the view.  The attempt to 
> include an actual statistic field here doesn't seem useful nor redeeming.  I 
> would add a "state" field in its place (well, after subrelid).  And I would 
> still rename the columns to current_error_* and note that these should be 
> null unless the status field shows error (there may be some additional 
> complexity here).  Just get rid of last_error_count.
>

I don't think that using the stats collector to show the current
status of each worker is a good idea because of 500ms lag, UDP
connection etc. Even if error info is not present and the state is
good according to the view, it might be out-of-date or simply not
true. If we want to do that, it’s much better to prepare something on
shmem so each worker can store its 

Re: Skipping logical replication transactions on subscriber side

2022-01-25 Thread Masahiko Sawada
On Wed, Jan 26, 2022 at 12:54 PM Amit Kapila  wrote:
>
> On Wed, Jan 26, 2022 at 8:55 AM Masahiko Sawada  wrote:
> >
> > On Wed, Jan 26, 2022 at 11:51 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Wed, Jan 26, 2022 at 11:28 AM Amit Kapila  
> > > wrote:
> > > >
> > > > IIUC, the proposal is to compare the skip_xid with the very
> > > > transaction the apply worker received to apply and raise a warning if
> > > > it doesn't match with skip_xid and then continue. This seems like a
> > > > reasonable idea but can we guarantee that it is always the first
> > > > transaction that we want to skip? We seem to guarantee that we won't
> > > > get something again once it is written durably/flushed on the
> > > > subscriber side. I guess here it can happen that before the errored
> > > > transaction, there is some empty xact, or maybe part of the stream
> > > > (consider streaming transactions) of some xact, or there could be
> > > > other cases as well where the server will send those xacts again.
> > >
> > > Good point.
> > >
> > > I guess that in the situation the worker entered an error loop, we can
> > > guarantee that the worker fails while applying the first non-empty
> > > transaction since starting logical replication. And the transaction is
> > > what we’d like to skip. If the transaction that can be applied without
> > > an error is resent after a restart, it’s a problem of logical
> > > replication. As you pointed out, it's possible that there are some
> > > empty transactions before the transaction in question since we don't
> > > advance replication origin LSN if the transaction is empty. Also,
> > > probably the same is true for a streamed transaction that is rolled
> > > back or ROLLBACK-PREPARED transactions. So, we can also skip clearing
> > > subskipxid if the transaction is empty? That is, we make sure to clear
> > > it after applying the first non-empty transaction. We would need to
> > > carefully think about this solution otherwise ALTER SUBSCRIPTION SKIP
> > > ends up not working at all in some cases.
>
> I think it is okay to clear after the first successful application of
> any transaction. What I was not sure was about the idea of giving
> WARNING/ERROR if the first xact to be applied is not the same as
> skip_xid.

Do you prefer not to do anything in this case?

>
> >
> > Probably, we also need to consider the case where the tablesync worker
> > entered an error loop and the user wants to skip the transaction? The
> > apply worker is also running at the same time but it should not clear
> > subskipxid. Similarly, the tablesync worker should not clear
> > subskipxid if the apply worker wants to skip the transaction.
> >
>
> I think for tablesync workers, the skip_xid set via this mechanism
> won't work as we don't have any remote_xid for them, and neither any
> XID is reported in the view for them.

If the tablesync worker raises an error while applying changes after
finishing the copy, it also reports the error XID.


Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Skipping logical replication transactions on subscriber side

2022-01-25 Thread Amit Kapila
On Wed, Jan 26, 2022 at 8:55 AM Masahiko Sawada  wrote:
>
> On Wed, Jan 26, 2022 at 11:51 AM Masahiko Sawada  
> wrote:
> >
> > On Wed, Jan 26, 2022 at 11:28 AM Amit Kapila  
> > wrote:
> > >
> > > IIUC, the proposal is to compare the skip_xid with the very
> > > transaction the apply worker received to apply and raise a warning if
> > > it doesn't match with skip_xid and then continue. This seems like a
> > > reasonable idea but can we guarantee that it is always the first
> > > transaction that we want to skip? We seem to guarantee that we won't
> > > get something again once it is written durably/flushed on the
> > > subscriber side. I guess here it can happen that before the errored
> > > transaction, there is some empty xact, or maybe part of the stream
> > > (consider streaming transactions) of some xact, or there could be
> > > other cases as well where the server will send those xacts again.
> >
> > Good point.
> >
> > I guess that in the situation the worker entered an error loop, we can
> > guarantee that the worker fails while applying the first non-empty
> > transaction since starting logical replication. And the transaction is
> > what we’d like to skip. If the transaction that can be applied without
> > an error is resent after a restart, it’s a problem of logical
> > replication. As you pointed out, it's possible that there are some
> > empty transactions before the transaction in question since we don't
> > advance replication origin LSN if the transaction is empty. Also,
> > probably the same is true for a streamed transaction that is rolled
> > back or ROLLBACK-PREPARED transactions. So, we can also skip clearing
> > subskipxid if the transaction is empty? That is, we make sure to clear
> > it after applying the first non-empty transaction. We would need to
> > carefully think about this solution otherwise ALTER SUBSCRIPTION SKIP
> > ends up not working at all in some cases.

I think it is okay to clear after the first successful application of
any transaction. What I was not sure was about the idea of giving
WARNING/ERROR if the first xact to be applied is not the same as
skip_xid.

>
> Probably, we also need to consider the case where the tablesync worker
> entered an error loop and the user wants to skip the transaction? The
> apply worker is also running at the same time but it should not clear
> subskipxid. Similarly, the tablesync worker should not clear
> subskipxid if the apply worker wants to skip the transaction.
>

I think for tablesync workers, the skip_xid set via this mechanism
won't work as we don't have any remote_xid for them, and neither any
XID is reported in the view for them.

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2022-01-25 Thread Masahiko Sawada
On Wed, Jan 26, 2022 at 11:51 AM Masahiko Sawada  wrote:
>
> On Wed, Jan 26, 2022 at 11:28 AM Amit Kapila  wrote:
> >
> > On Tue, Jan 25, 2022 at 8:39 PM Masahiko Sawada  
> > wrote:
> > >
> > > On Tue, Jan 25, 2022 at 11:58 PM David G. Johnston
> > >  wrote:
> > > >
> > > > On Tue, Jan 25, 2022 at 7:47 AM Masahiko Sawada  
> > > > wrote:
> > > >>
> > > >> Yeah, I think it's a good idea to clear the subskipxid after the first
> > > >> transaction regardless of whether the worker skipped it.
> > > >>
> > > >
> > > > So basically instead of stopping the worker with an error you suggest 
> > > > having the worker continue applying changes (after resetting 
> > > > subskipxid, and - arguably - the ?_error_* fields).  Log the 
> > > > transaction xid mis-match as a warning in the log file as opposed to an 
> > > > error.
> > >
> > > Agreed, I think it's better to log a warning than to raise an error.
> > > In the case where the user specified the wrong XID, the worker should
> > > fail again due to the same error.
> > >
> >
> > IIUC, the proposal is to compare the skip_xid with the very
> > transaction the apply worker received to apply and raise a warning if
> > it doesn't match with skip_xid and then continue. This seems like a
> > reasonable idea but can we guarantee that it is always the first
> > transaction that we want to skip? We seem to guarantee that we won't
> > get something again once it is written durably/flushed on the
> > subscriber side. I guess here it can happen that before the errored
> > transaction, there is some empty xact, or maybe part of the stream
> > (consider streaming transactions) of some xact, or there could be
> > other cases as well where the server will send those xacts again.
>
> Good point.
>
> I guess that in the situation the worker entered an error loop, we can
> guarantee that the worker fails while applying the first non-empty
> transaction since starting logical replication. And the transaction is
> what we’d like to skip. If the transaction that can be applied without
> an error is resent after a restart, it’s a problem of logical
> replication. As you pointed out, it's possible that there are some
> empty transactions before the transaction in question since we don't
> advance replication origin LSN if the transaction is empty. Also,
> probably the same is true for a streamed transaction that is rolled
> back or ROLLBACK-PREPARED transactions. So, we can also skip clearing
> subskipxid if the transaction is empty? That is, we make sure to clear
> it after applying the first non-empty transaction. We would need to
> carefully think about this solution otherwise ALTER SUBSCRIPTION SKIP
> ends up not working at all in some cases.

Probably, we also need to consider the case where the tablesync worker
entered an error loop and the user wants to skip the transaction? The
apply worker is also running at the same time but it should not clear
subskipxid. Similarly, the tablesync worker should not clear
subskipxid if the apply worker wants to skip the transaction.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




  1   2   3   4   5   6   >