Re: Duplicated LSN in ReorderBuffer

2020-01-31 Thread Amit Kapila
On Fri, Jan 31, 2020 at 12:35 AM Alvaro Herrera
 wrote:
>
> On 2019-Jul-26, Andres Freund wrote:
>
> > Petr, Simon, see the potential issue related to fast forward at the
> > bottom.
>
> I think we neglected this bit.  I looked at the patch Simon submitted
> downthread, and while I vaguely understand that we need to process
> NEW_CID records during fast-forwarding,
>

Right, IIUC, that is mainly to mark txn has catalog changes
(ReorderBufferXidSetCatalogChanges) so that later such a transaction
can be used to build historic snapshots (during SnapBuildCommitTxn).
Now, if that is true, then won't we need a similar change in
DecodeHeapOp for XLOG_HEAP_INPLACE case as well?  Also, I am not sure
if  SnapBuildProcessNewCid needs to create Cid map in such a case.


> I don't quite understand why we
> still can skip XLOG_INVALIDATION messages.  I *think* we should process
> those too.
>

I also think so.  If you are allowing to execute invalidation
irrespective of fast_forward in DecodeStandbyOp, then why not do the
same in DecodeCommit where we add
ReorderBufferAddInvalidations?

>   Here's a patch that also contains that change; I also
> reworded Simon's proposed comment.  I appreciate reviews.
>

Even though, in theory, the changes look to be in the right direction,
but it is better if we can create a test case to reproduce the
problem.  I am not sure, but I think we need to generate a few DDLs
for the transaction for which we want to fast forward and then after
moving forward the DMLs dependent on those WAL should create some
problem as we have skipped executing invalidations and addition of
such a transaction in the historic snapshot.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Duplicated LSN in ReorderBuffer

2020-01-30 Thread Alvaro Herrera
On 2019-Jul-26, Andres Freund wrote:

> Petr, Simon, see the potential issue related to fast forward at the
> bottom.

I think we neglected this bit.  I looked at the patch Simon submitted
downthread, and while I vaguely understand that we need to process
NEW_CID records during fast-forwarding, I don't quite understand why we
still can skip XLOG_INVALIDATION messages.  I *think* we should process
those too.  Here's a patch that also contains that change; I also
reworded Simon's proposed comment.  I appreciate reviews.

Thoughts?  Relevant extracts from Andres' message below.

> Thinking about it, it was not immediately clear to me that it is
> necessary to process XLOG_HEAP2_NEW_CID at that stage. We only need the
> cid mapping when decoding content of the transaction that the
> XLOG_HEAP2_NEW_CID record was about - which will not happen if it
> started before SNAPBUILD_FULL.
> 
> Except that they also are responsible for signalling that a transaction
> performed catalog modifications (cf ReorderBufferXidSetCatalogChanges()
> call), which in turn is important for SnapBuildCommitTxn() to know
> whether to include that transaction needs to be included in historic
> snapshots.
> 
> So unless I am missing something - which is entirely possible, I've had
> this code thoroughly swapped out - that means that we only need to
> process XLOG_HEAP2_NEW_CID < SNAPBUILD_FULL if there can be transactions
> with relevant catalog changes, that don't have any invalidations
> messages.
> 
> After thinking about it for a bit, that's not guaranteed however. For
> one, even for system catalog tables, looking at
> CacheInvalidateHeapTuple() et al there can be catalog modifications that
> create neither a snapshot invalidation message, nor a catcache
> one. There's also the remote scenario that we possibly might be only
> modifying a toast relation.
> 
> But more importantly, the only modified table could be a user defined
> catalog table (i.e. WITH (user_catalog_table = true)). Which in all
> likelihood won't cause invalidation messages. So I think currently it is
> required to process NEW_ID records - although we don't need to actually
> execute the ReorderBufferAddNewTupleCids() etc calls.

[...]

> And related to the point of the theorizing above, I don't think skipping
> XLOG_HEAP2_NEW_CID entirely when forwarding is correct. As a NEW_CID
> record does not imply an invalidation message as discussed above, we'll
> afaict compute wrong snapshots when such transactions are encountered
> during forwarding. And we'll then log those snapshots to disk. Which
> then means the slot cannot safely be used for actual decoding anymore -
> as we'll use that snapshot when starting up decoding without fast
> forwarding.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 5e1dc8a651..dcb6873f12 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -336,10 +336,9 @@ DecodeStandbyOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 xl_invalidations *invalidations =
 (xl_invalidations *) XLogRecGetData(r);
 
-if (!ctx->fast_forward)
-	ReorderBufferImmediateInvalidation(ctx->reorder,
-	   invalidations->nmsgs,
-	   invalidations->msgs);
+ReorderBufferImmediateInvalidation(ctx->reorder,
+   invalidations->nmsgs,
+   invalidations->msgs);
 			}
 			break;
 		default:
@@ -360,11 +359,9 @@ DecodeHeap2Op(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	ReorderBufferProcessXid(ctx->reorder, xid, buf->origptr);
 
 	/*
-	 * If we don't have snapshot or we are just fast-forwarding, there is no
-	 * point in decoding changes.
+	 * If we don't have snapshot, we have no reason to decode changes.
 	 */
-	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
-		ctx->fast_forward)
+	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
 		return;
 
 	switch (info)
@@ -379,6 +376,13 @@ DecodeHeap2Op(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 xl_heap_new_cid *xlrec;
 
 xlrec = (xl_heap_new_cid *) XLogRecGetData(buf->record);
+
+/*
+ * Must process this one even when fast-forwarding.  (If we
+ * were to skip it, an incorrect snapshout would be built
+ * for catalog changes that don't carry invalidations, such
+ * as for user-marked catalog tables.)
+ */
 SnapBuildProcessNewCid(builder, xid, buf->origptr, xlrec);
 
 break;


Re: Duplicated LSN in ReorderBuffer

2019-09-13 Thread Andres Freund
Hi,

On 2019-09-13 16:42:47 -0300, Alvaro Herrera wrote:
> On 2019-Sep-10, Alvaro Herrera from 2ndQuadrant wrote:
> 
> > Here's a couple of patches.
> > 
> > always_decode_assignment.patch is Masahiko Sawada's patch, which has
> > been confirmed to fix the assertion failure.
> 
> I pushed this one to all branches.  Thanks Ildar for reporting and
> Sawada-san for fixing, and reviewers.
> 
> If you (Andres) want to propose a different fix for this, be my guest.
> We can always revert this one if you have a different better fix.

I'm bit surprised to see this go in just now, after I asked for the
changes you were reporting as not working for three weeks, and you sent
them out three days ago (during which I was at the linux plumbers
conference)...

Greetings,

Andres Freund




Re: Duplicated LSN in ReorderBuffer

2019-09-13 Thread Alvaro Herrera
On 2019-Sep-10, Alvaro Herrera from 2ndQuadrant wrote:

> Here's a couple of patches.
> 
> always_decode_assignment.patch is Masahiko Sawada's patch, which has
> been confirmed to fix the assertion failure.

I pushed this one to all branches.  Thanks Ildar for reporting and
Sawada-san for fixing, and reviewers.

If you (Andres) want to propose a different fix for this, be my guest.
We can always revert this one if you have a different better fix.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Duplicated LSN in ReorderBuffer

2019-09-10 Thread Alvaro Herrera from 2ndQuadrant
Hello,

On 2019-Sep-06, Andres Freund wrote:

> On 2019-08-19 08:51:43 -0700, Andres Freund wrote:
> > On August 19, 2019 7:43:28 AM PDT, Alvaro Herrera 
> >  wrote:

> > >Never mind.  I was able to reproduce it later, and verify that Andres'
> > >proposed strategy doesn't seem to fix the problem.  I'm going to study
> > >the problem again today.
> > 
> > Could you post the patch?

Here's a couple of patches.

always_decode_assignment.patch is Masahiko Sawada's patch, which has
been confirmed to fix the assertion failure.

assign-child.patch is what I understood you were proposing -- namely to
assign the subxid to the top-level xid on NEW_CID.  In order for it to
work at all, I had to remove a different safety check; but the assertion
still hits when running Ildar's test case.  So the patch doesn't
actually fix anything.  And I think it makes sense that it fails, since
the first thing that's happening in this patch is that we create both
the top-level xact and the subxact with the same LSN value, which is
what triggers the assertion in the first place.  It's possible that I
misunderstood what you were suggesting.

If you want to propose a different fix, be my guest, but failing that
I'm inclined to push always_decode_assignment.patch sometime before the
end of the week.

Thanks,

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 5315d93af0..582b6a2b96 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -361,10 +361,12 @@ DecodeHeap2Op(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 	/*
 	 * If we don't have snapshot or we are just fast-forwarding, there is no
-	 * point in decoding changes.
+	 * point in decoding changes (except NEW_CID assignments, because we need
+	 * to have subxid applied to xid in that case.)
 	 */
-	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
-		ctx->fast_forward)
+	if ((info != XLOG_HEAP2_NEW_CID) &&
+		(SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
+		 ctx->fast_forward))
 		return;
 
 	switch (info)
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 8ce28ad629..6faba6077e 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -778,9 +778,6 @@ ReorderBufferAssignChild(ReorderBuffer *rb, TransactionId xid,
 	txn = ReorderBufferTXNByXid(rb, xid, true, _top, lsn, true);
 	subtxn = ReorderBufferTXNByXid(rb, subxid, true, _sub, lsn, false);
 
-	if (new_top && !new_sub)
-		elog(ERROR, "subtransaction logged without previous top-level txn record");
-
 	if (!new_sub)
 	{
 		if (subtxn->is_known_as_subxact)
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 0bd1d0f954..326f5b7809 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -772,6 +772,13 @@ SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid,
 {
 	CommandId	cid;
 
+	/*
+	 * In rare cases, it is possible that we're seeing a subtransaction
+	 * for the first time.  Record it in that case.
+	 */
+	if (xlrec->top_xid != xid)
+		ReorderBufferAssignChild(builder->reorder, xlrec->top_xid, xid, lsn);
+
 	/*
 	 * we only log new_cid's if a catalog tuple was modified, so mark the
 	 * transaction as containing catalog modifications
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 5315d93af0..c53e7e2279 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -217,12 +217,15 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	uint8		info = XLogRecGetInfo(r) & XLOG_XACT_OPMASK;
 
 	/*
-	 * No point in doing anything yet, data could not be decoded anyway. It's
-	 * ok not to call ReorderBufferProcessXid() in that case, except in the
-	 * assignment case there'll not be any later records with the same xid;
-	 * and in the assignment case we'll not decode those xacts.
+	 * If the snapshot isn't yet fully built, we cannot decode anything, so
+	 * bail out.
+	 *
+	 * However, it's critical to process XLOG_XACT_ASSIGNMENT records even
+	 * when the snapshot is being built: it is possible to get later records
+	 * that require subxids to be properly assigned.
 	 */
-	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
+	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT &&
+		info != XLOG_XACT_ASSIGNMENT)
 		return;
 
 	switch (info)


Re: Duplicated LSN in ReorderBuffer

2019-09-06 Thread Andres Freund
Hi,

On 2019-08-19 08:51:43 -0700, Andres Freund wrote:
> On August 19, 2019 7:43:28 AM PDT, Alvaro Herrera  
> wrote:
> >On 2019-Aug-19, Masahiko Sawada wrote:
> >
> >> On Tue, Aug 13, 2019 at 6:36 AM Alvaro Herrera
> > wrote:
> >
> >> > BTW I wrote the code as suggested and it passes all the tests ...
> >but I
> >> > then noticed that the unpatched code doesn't fail Ildar's original
> >> > pgbench-based test for me, either.  So maybe my laptop is not
> >powerful
> >> > enough to reproduce it, or maybe I'm doing something wrong.
> >> 
> >> If you share the patch fixing this issue I'll test it on my
> >> environment where I could reproduce the original problem.
> >
> >Never mind.  I was able to reproduce it later, and verify that Andres'
> >proposed strategy doesn't seem to fix the problem.  I'm going to study
> >the problem again today.
> 
> Could you post the patch?

Ping?

Greetings,

Andres Freund




Re: Duplicated LSN in ReorderBuffer

2019-08-19 Thread Andres Freund
Hi,

On August 19, 2019 7:43:28 AM PDT, Alvaro Herrera  
wrote:
>On 2019-Aug-19, Masahiko Sawada wrote:
>
>> On Tue, Aug 13, 2019 at 6:36 AM Alvaro Herrera
> wrote:
>
>> > BTW I wrote the code as suggested and it passes all the tests ...
>but I
>> > then noticed that the unpatched code doesn't fail Ildar's original
>> > pgbench-based test for me, either.  So maybe my laptop is not
>powerful
>> > enough to reproduce it, or maybe I'm doing something wrong.
>> 
>> If you share the patch fixing this issue I'll test it on my
>> environment where I could reproduce the original problem.
>
>Never mind.  I was able to reproduce it later, and verify that Andres'
>proposed strategy doesn't seem to fix the problem.  I'm going to study
>the problem again today.

Could you post the patch?

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: Duplicated LSN in ReorderBuffer

2019-08-19 Thread Alvaro Herrera
On 2019-Aug-19, Masahiko Sawada wrote:

> On Tue, Aug 13, 2019 at 6:36 AM Alvaro Herrera  
> wrote:

> > BTW I wrote the code as suggested and it passes all the tests ... but I
> > then noticed that the unpatched code doesn't fail Ildar's original
> > pgbench-based test for me, either.  So maybe my laptop is not powerful
> > enough to reproduce it, or maybe I'm doing something wrong.
> 
> If you share the patch fixing this issue I'll test it on my
> environment where I could reproduce the original problem.

Never mind.  I was able to reproduce it later, and verify that Andres'
proposed strategy doesn't seem to fix the problem.  I'm going to study
the problem again today.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Duplicated LSN in ReorderBuffer

2019-08-19 Thread Masahiko Sawada
On Tue, Aug 13, 2019 at 6:36 AM Alvaro Herrera  wrote:
>
> On 2019-Aug-07, Andres Freund wrote:
>
> > I think we would need to do this for all values of
> > SnapBuildCurrentState() - after all the problem occurs because we
> > *previously* didn't assign subxids to the toplevel xid.  Compared to the
> > cost of catalog changes, ReorderBufferAssignChild() is really cheap. So
> > I don't think there's any problem just calling it unconditionally (when
> > top_xid <> xid, of course).
>
> BTW I wrote the code as suggested and it passes all the tests ... but I
> then noticed that the unpatched code doesn't fail Ildar's original
> pgbench-based test for me, either.  So maybe my laptop is not powerful
> enough to reproduce it, or maybe I'm doing something wrong.

If you share the patch fixing this issue I'll test it on my
environment where I could reproduce the original problem.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: Duplicated LSN in ReorderBuffer

2019-08-12 Thread Alvaro Herrera
On 2019-Aug-07, Andres Freund wrote:

> I think we would need to do this for all values of
> SnapBuildCurrentState() - after all the problem occurs because we
> *previously* didn't assign subxids to the toplevel xid.  Compared to the
> cost of catalog changes, ReorderBufferAssignChild() is really cheap. So
> I don't think there's any problem just calling it unconditionally (when
> top_xid <> xid, of course).

BTW I wrote the code as suggested and it passes all the tests ... but I
then noticed that the unpatched code doesn't fail Ildar's original
pgbench-based test for me, either.  So maybe my laptop is not powerful
enough to reproduce it, or maybe I'm doing something wrong.

I'm tempted to just push it, since it seems "obviously" more correct
than the original.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Duplicated LSN in ReorderBuffer

2019-08-07 Thread Andres Freund
Hi,

On 2019-08-07 16:19:13 -0400, Alvaro Herrera wrote:
> On 2019-Jul-26, Andres Freund wrote:
> 
> > 2) We could simply assign the subtransaction to the parent using
> >ReorderBufferAssignChild() in SnapBuildProcessNewCid() or it's
> >caller. That ought to also fix the bug
> > 
> >I also has the advantage that we can save some memory in transactions
> >that have some, but fewer than the ASSIGNMENT limit subtransactions,
> >because it allows us to avoid having a separate base snapshot for
> >them (c.f. ReorderBufferTransferSnapToParent()).
> 
> I'm not sure I understood this suggestion correctly.  I first tried with
> this, which seems the simplest rendition:
> 
> --- a/src/backend/replication/logical/snapbuild.c
> +++ b/src/backend/replication/logical/snapbuild.c
> @@ -772,6 +772,12 @@ SnapBuildProcessNewCid(SnapBuild *builder, TransactionId 
> xid,
>  {
>   CommandId   cid;
>  
> + if ((SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT) &&
> + (xlrec->top_xid != xid))
> + {
> + ReorderBufferAssignChild(builder->reorder, xlrec->top_xid, xid, 
> lsn);
> + }
> +

I think we would need to do this for all values of
SnapBuildCurrentState() - after all the problem occurs because we
*previously* didn't assign subxids to the toplevel xid.  Compared to the
cost of catalog changes, ReorderBufferAssignChild() is really cheap. So
I don't think there's any problem just calling it unconditionally (when
top_xid <> xid, of course).

If the above is the only change, I think the body of the if should be
unreachable, DecodeHeap2Op guards against that:

/*
 * If we don't have snapshot or we are just fast-forwarding, there is no
 * point in decoding changes.
 */
if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
ctx->fast_forward)
return;


> I thought I would create the main txn before calling AssignChild in
> snapbuild; however, ReorderBufferTXNByXid is static in reorderbuffer.c.
> So that seems out.

There shouldn't be any need for doing that, ReorderBufferAssignChild
does that.

Greetings,

Andres Freund




Re: Duplicated LSN in ReorderBuffer

2019-08-07 Thread Alvaro Herrera
On 2019-Jul-26, Andres Freund wrote:

> 2) We could simply assign the subtransaction to the parent using
>ReorderBufferAssignChild() in SnapBuildProcessNewCid() or it's
>caller. That ought to also fix the bug
> 
>I also has the advantage that we can save some memory in transactions
>that have some, but fewer than the ASSIGNMENT limit subtransactions,
>because it allows us to avoid having a separate base snapshot for
>them (c.f. ReorderBufferTransferSnapToParent()).

I'm not sure I understood this suggestion correctly.  I first tried with
this, which seems the simplest rendition:

--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -772,6 +772,12 @@ SnapBuildProcessNewCid(SnapBuild *builder, TransactionId 
xid,
 {
CommandId   cid;
 
+   if ((SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT) &&
+   (xlrec->top_xid != xid))
+   {
+   ReorderBufferAssignChild(builder->reorder, xlrec->top_xid, xid, 
lsn);
+   }
+
/*
 * we only log new_cid's if a catalog tuple was modified, so mark the
 * transaction as containing catalog modifications

test_decoding's tests pass with that, but if I try the example script
provided by Ildar, all pgbench clients die with this:

client 19 script 1 aborted in command 1 query 0: ERROR:  subtransaction logged 
without previous top-level txn record

I thought I would create the main txn before calling AssignChild in
snapbuild; however, ReorderBufferTXNByXid is static in reorderbuffer.c.
So that seems out.  My next try was to remove the elog() that was
causing the failure ... but that leads pretty quickly to a crash with
this backtrace:

#2  0x5653241fb823 in ExceptionalCondition 
(conditionName=conditionName@entry=0x5653243c1960 "!(prev_first_lsn < 
cur_txn->first_lsn)", 
errorType=errorType@entry=0x565324250596 "FailedAssertion", 
fileName=fileName@entry=0x5653243c18e8 
"/pgsql/source/master/src/backend/replication/logical/reorderbuffer.c", 
lineNumber=lineNumber@entry=680) at 
/pgsql/source/master/src/backend/utils/error/assert.c:54
#3  0x565324062a84 in AssertTXNLsnOrder (rb=rb@entry=0x565326304fa8)
at /pgsql/source/master/src/backend/replication/logical/reorderbuffer.c:680
#4  0x565324062e39 in ReorderBufferTXNByXid (rb=rb@entry=0x565326304fa8, 
xid=, xid@entry=185613, create=create@entry=true, 
is_new=is_new@entry=0x0, lsn=lsn@entry=2645271944, 
create_as_top=create_as_top@entry=true)
at /pgsql/source/master/src/backend/replication/logical/reorderbuffer.c:559
#5  0x565324067365 in ReorderBufferAddNewTupleCids (rb=0x565326304fa8, 
xid=185613, lsn=lsn@entry=2645271944, node=..., tid=..., cmin=0, 
cmax=4294967295, combocid=4294967295) at 
/pgsql/source/master/src/backend/replication/logical/reorderbuffer.c:2100
#6  0x565324069451 in SnapBuildProcessNewCid (builder=0x56532630afd8, 
xid=185614, lsn=2645271944, xlrec=0x5653262efc78)
at /pgsql/source/master/src/backend/replication/logical/snapbuild.c:787

Now this failure goes away if I relax the < to <= in the
complained-about line ... but at this point it's two sanity checks that
I've lobotomized in order to get this to run at all.  Not really
comfortable with that.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 5fa3d7323e..55538fa44c 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -677,7 +677,7 @@ AssertTXNLsnOrder(ReorderBuffer *rb)
 
 		/* Current initial LSN must be strictly higher than previous */
 		if (prev_first_lsn != InvalidXLogRecPtr)
-			Assert(prev_first_lsn < cur_txn->first_lsn);
+			Assert(prev_first_lsn <= cur_txn->first_lsn);
 
 		/* known-as-subtxn txns must not be listed */
 		Assert(!cur_txn->is_known_as_subxact);
@@ -778,9 +778,6 @@ ReorderBufferAssignChild(ReorderBuffer *rb, TransactionId xid,
 	txn = ReorderBufferTXNByXid(rb, xid, true, _top, lsn, true);
 	subtxn = ReorderBufferTXNByXid(rb, subxid, true, _sub, lsn, false);
 
-	if (new_top && !new_sub)
-		elog(ERROR, "subtransaction logged without previous top-level txn record");
-
 	if (!new_sub)
 	{
 		if (subtxn->is_known_as_subxact)
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index dc64b1e0c2..c8b1fcd96c 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -772,6 +772,10 @@ SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid,
 {
 	CommandId	cid;
 
+	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT &&
+		xlrec->top_xid != xid)
+		ReorderBufferAssignChild(builder->reorder, xlrec->top_xid, xid, lsn);
+
 	/*
 	 * we only log new_cid's if a catalog tuple was 

Re: Duplicated LSN in ReorderBuffer

2019-07-28 Thread Simon Riggs
On Mon, 29 Jul 2019 at 00:09, Petr Jelinek  wrote:


> Given that we don't process any other records in this function besides
> XLOG_HEAP2_MULTI_INSERT and XLOG_HEAP2_NEW_CID, it seems like simplest
> fix is to just remove the first check for fast forward and keep the one
> in XLOG_HEAP2_MULTI_INSERT.
>

Fix proposed by Petr, with comments as explained by Andres.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Solutions for the Enterprise


allow_XLOG_HEAP2_NEW_CID_while_building_snapshot.v1.patch
Description: Binary data


Re: Duplicated LSN in ReorderBuffer

2019-07-28 Thread Petr Jelinek

Hi,

On 27. 07. 19 3:15, Andres Freund wrote:

Hi,

Petr, Simon, see the potential issue related to fast forward at the
bottom.

[..snip..]

This actually made me look at the nearby changes due to

commit 9c7d06d60680c7f00d931233873dee81fdb311c6
Author: Simon Riggs 
Date:   2018-01-17 11:38:34 +

 Ability to advance replication slots

and uhm, I'm not sure they're fully baked. Something like:

/*
 * If we don't have snapshot or we are just fast-forwarding, there is no
 * point in decoding changes.
 */
if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
ctx->fast_forward)
return;

case XLOG_HEAP2_MULTI_INSERT:
if (!ctx->fast_forward &&
SnapBuildProcessChange(builder, xid, 
buf->origptr))
DecodeMultiInsert(ctx, buf);
break;

is not very suggestive of that (note the second check).



You mean that it's redundant, yeah.., although given your next point, 
see bellow.




And related to the point of the theorizing above, I don't think skipping
XLOG_HEAP2_NEW_CID entirely when forwarding is correct. As a NEW_CID
record does not imply an invalidation message as discussed above, we'll
afaict compute wrong snapshots when such transactions are encountered
during forwarding. And we'll then log those snapshots to disk. Which
then means the slot cannot safely be used for actual decoding anymore -
as we'll use that snapshot when starting up decoding without fast
forwarding.



Hmm, I guess that's true. I think I have convinced myself that CID does 
not matter outside of this transaction, but since we might actually 
share the computed snapshot via file save/restore with other slots, any 
non-fast-forwarding decoding that reads the same transaction could miss 
the CID thanks to the shared snapshot which does not include it.


Given that we don't process any other records in this function besides 
XLOG_HEAP2_MULTI_INSERT and XLOG_HEAP2_NEW_CID, it seems like simplest 
fix is to just remove the first check for fast forward and keep the one 
in XLOG_HEAP2_MULTI_INSERT.


--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/




Re: Duplicated LSN in ReorderBuffer

2019-07-26 Thread Andres Freund
Hi,

Petr, Simon, see the potential issue related to fast forward at the
bottom.


On 2019-07-26 18:46:35 -0400, Alvaro Herrera wrote:
> On 2019-Jul-09, Masahiko Sawada wrote:
>
> > I think the cause of this bug would be that a ReorderBufferTXN entry
> > of sub transaction is created as top-level transaction. And this
> > happens because we skip to decode ASSIGNMENT during the state of
> > snapshot builder < SNAPBUILD_FULL.
>
> That explanation seems to make sense.

Yea. The comment that "in the assignment case we'll not decode those
xacts" is true, but it misses that we *do* currently process
XLOG_HEAP2_NEW_CID records for transactions that started before reaching
FULL_SNAPSHOT.

Thinking about it, it was not immediately clear to me that it is
necessary to process XLOG_HEAP2_NEW_CID at that stage. We only need the
cid mapping when decoding content of the transaction that the
XLOG_HEAP2_NEW_CID record was about - which will not happen if it
started before SNAPBUILD_FULL.

Except that they also are responsible for signalling that a transaction
performed catalog modifications (cf ReorderBufferXidSetCatalogChanges()
call), which in turn is important for SnapBuildCommitTxn() to know
whether to include that transaction needs to be included in historic
snapshots.

So unless I am missing something - which is entirely possible, I've had
this code thoroughly swapped out - that means that we only need to
process XLOG_HEAP2_NEW_CID < SNAPBUILD_FULL if there can be transactions
with relevant catalog changes, that don't have any invalidations
messages.

After thinking about it for a bit, that's not guaranteed however. For
one, even for system catalog tables, looking at
CacheInvalidateHeapTuple() et al there can be catalog modifications that
create neither a snapshot invalidation message, nor a catcache
one. There's also the remote scenario that we possibly might be only
modifying a toast relation.

But more importantly, the only modified table could be a user defined
catalog table (i.e. WITH (user_catalog_table = true)). Which in all
likelihood won't cause invalidation messages. So I think currently it is
required to process NEW_ID records - although we don't need to actually
execute the ReorderBufferAddNewTupleCids() etc calls.

Perhaps the right fix for the future would actually be to not rely on on
NEW_ID for recognizing transactions as such, but instead have an xact.c
marker that signals whether a transaction performed catalog
modifications.

Hm, need to think more about this.


> > Instead, I wonder if we can decode ASSIGNMENT even when the state of
> > snapshot builder < SNAPBUILD_FULL_SNAPSHOT. That way, the
> > ReorderBufferTXN entries of both top transaction and sub transaction
> > are created properly before we decode NEW_CID.
>
> Yeah, that seems a sensible remediation to me.

That does seems like a reasonable approach. I can see two alternatives:

1) Change SnapBuildProcessNewCid()'s ReorderBufferXidSetCatalogChanges()
   call to reference the toplevel xid. That has the disadvantage that if
   the subtransaction that performed DDL rolls back, the main
   transaction will still be treated as a catalog transaction - i have a
   hard time seeing that being common, however.

   That'd then also require SnapBuildProcessNewCid() in
   SNAPBUILD_FULL_SNAPSHOT to return before processing any data assigned
   to subtransactions. Which would be good, because that's currently
   unnecessarily stored in memory.

2) We could simply assign the subtransaction to the parent using
   ReorderBufferAssignChild() in SnapBuildProcessNewCid() or it's
   caller. That ought to also fix the bug

   I also has the advantage that we can save some memory in transactions
   that have some, but fewer than the ASSIGNMENT limit subtransactions,
   because it allows us to avoid having a separate base snapshot for
   them (c.f. ReorderBufferTransferSnapToParent()).

   Like 1) that could be combined with adding an early return when <
   SNAPBUILD_FULL_SNAPSHOT, after ReorderBufferXidSetCatalogChanges(),
   but I don't think it'd be required for correctness in contrast to 1).

Both of these would have the advantage that we only would track
additional information for transactions that have modified the catalog,
whereas the proposal to process ASSIGNMENT earlier, would mean that we
additionally track all transactions with more than 64 children.  So
provided that I didn't mis-analyze here, I think both of my alternatives
are preferrable? I think 2) is simpler?


>   /*
> -  * No point in doing anything yet, data could not be decoded anyway. 
> It's
> -  * ok not to call ReorderBufferProcessXid() in that case, except in the
> -  * assignment case there'll not be any later records with the same xid;
> -  * and in the assignment case we'll not decode those xacts.
> +  * If the snapshot isn't yet fully built, we cannot decode anything, so
> +  * bail out.
> +  *
> +  * However, it's critical to process 

Re: Duplicated LSN in ReorderBuffer

2019-07-26 Thread Alvaro Herrera
On 2019-Jul-09, Masahiko Sawada wrote:

> I think the cause of this bug would be that a ReorderBufferTXN entry
> of sub transaction is created as top-level transaction. And this
> happens because we skip to decode ASSIGNMENT during the state of
> snapshot builder < SNAPBUILD_FULL.

That explanation seems to make sense.

> Instead, I wonder if we can decode ASSIGNMENT even when the state of
> snapshot builder < SNAPBUILD_FULL_SNAPSHOT. That way, the
> ReorderBufferTXN entries of both top transaction and sub transaction
> are created properly before we decode NEW_CID.

Yeah, that seems a sensible remediation to me.

I would reduce the scope a little bit -- only create the assignment in
the BUILDING state, and skip it in the START state.  I'm not sure that
it's possible to get assignments while in START state that are
significant (I'm still trying to digest SnapBuildFindSnapshot).

I would propose the attached.  Andres, do you have an opinion on this?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 0d37a9a64bc20414742a8e9497dc8dffeeea16d1 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 26 Jul 2019 18:38:44 -0400
Subject: [PATCH v3] decode XACT_ASSIGNMENT while building snapshot

---
 src/backend/replication/logical/decode.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 151c3ef882..a6f7dc2723 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -215,14 +215,19 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	ReorderBuffer *reorder = ctx->reorder;
 	XLogReaderState *r = buf->record;
 	uint8		info = XLogRecGetInfo(r) & XLOG_XACT_OPMASK;
+	SnapBuildState state;
 
 	/*
-	 * No point in doing anything yet, data could not be decoded anyway. It's
-	 * ok not to call ReorderBufferProcessXid() in that case, except in the
-	 * assignment case there'll not be any later records with the same xid;
-	 * and in the assignment case we'll not decode those xacts.
+	 * If the snapshot isn't yet fully built, we cannot decode anything, so
+	 * bail out.
+	 *
+	 * However, it's critical to process XLOG_XACT_ASSIGNMENT records even
+	 * when the snapshot is being built: it is possible to get later records
+	 * that require subxids to be properly assigned.
 	 */
-	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
+	state = SnapBuildCurrentState(builder);
+	if (state < SNAPBUILD_FULL_SNAPSHOT ||
+		(info == XLOG_XACT_ASSIGNMENT && state < SNAPBUILD_BUILDING_SNAPSHOT))
 		return;
 
 	switch (info)
-- 
2.17.1



Re: Duplicated LSN in ReorderBuffer

2019-07-09 Thread Masahiko Sawada
On Mon, Jul 8, 2019 at 11:46 PM Masahiko Sawada  wrote:
>
> On Mon, Jul 8, 2019 at 9:00 AM Thomas Munro  wrote:
> >
> > On Wed, Jun 26, 2019 at 2:46 AM Ildar Musin  wrote:
> > > Attached is a simple patch that uses subxid instead of top-level xid
> > > in ReorderBufferAddNewTupleCids() call. It seems to fix the bug, but
> > > i'm not sure that this is a valid change. Can someone please verify it
> > > or maybe suggest a better solution for the issue?
> >
>
> I've reproduced this issue with script Ildar provided. I don't find
> out the root cause yet and I'm not sure the patch takes a correct way
> to fix this.
>
> In my environment, I got the following pg_waldump output and the
> logical decoding failed at 0/019FA058 when processing NEW_CID.
>
>   90489 rmgr: Transaction len (rec/tot): 38/38, tx:
> 1999, lsn: 0/019F9E80, prev 0/019F9E38, desc: ASSIGNMENT xtop 1998:
> subxacts: 1999
>   90490 rmgr: Standby len (rec/tot):405/   405, tx:
> 0, lsn: 0/019F9EA8, prev 0/019F9E80, desc: RUNNING_XACTS nextXid 2000
> latestCompletedXid 1949 oldestRunningXid 1836; 48 xacts: 1990 1954
> 1978 1850 1944 1972 1940 1924 1906 1970 1985 1998 1966 1987 1975 1858
> 1914 1982 1958 1840 1920 1926 1992 1962 1
>   90490 910 1950 1874 1928 1974 1968 1946 1912 1918 1996 1922 1930
> 1964 1952 1994 1934 1980 1836 1984 1960 1956 1916 1908 1938
>   90491 rmgr: Heap2   len (rec/tot): 60/60, tx:
> 1999, lsn: 0/019FA058, prev 0/019F9EA8, desc: NEW_CID rel
> 1663/12678/2615; tid 11/59; cmin: 0, cmax: 4294967295, combo:
> 4294967295
>   90492 rmgr: Heaplen (rec/tot):127/   127, tx:
> 1999, lsn: 0/019FA098, prev 0/019FA058, desc: INSERT off 59 flags
> 0x00, blkref #0: rel 1663/12678/2615 blk 11
>
> I thought that the logical decoding doesn't create ReorderBufferTXN of
> xid=1999 when processing NEW_CID since it decodes ASSIGNMENT of
> xid=1999 beforehand. But what actually happen is that it skips NEW_CID
> since the state of snapshot builder is SNAPBUILD_BUILDING_SNAPSHOT yet
> and then the state becomes SNAPBUILD_FULL_SNAPSHOT when processing
> RUNNING_XACTS , and therefore it creates two ReorderBufferTXN entries
> for xid = 1999 and xid = 1998 as top-level transactions when
> processing NEW_CID (ReorderBufferXidSetCatalogChanges creates xid=1999
> and ReorderBufferAddNewTupleCids creates xid = 1998).

I think the cause of this bug would be that a ReorderBufferTXN entry
of sub transaction is created as top-level transaction. And this
happens because we skip to decode ASSIGNMENT during the state of
snapshot builder < SNAPBUILD_FULL.

@@ -778,7 +778,7 @@ SnapBuildProcessNewCid(SnapBuild *builder,
TransactionId xid,
  */
  ReorderBufferXidSetCatalogChanges(builder->reorder, xid, lsn);

- ReorderBufferAddNewTupleCids(builder->reorder, xlrec->top_xid, lsn,
+ ReorderBufferAddNewTupleCids(builder->reorder, xid, lsn,
  xlrec->target_node, xlrec->target_tid,
  xlrec->cmin, xlrec->cmax,
  xlrec->combocid);

The above change in the proposed patch changes SnapBuildProcessNewCid
so that it passes sub transaction id instead of top transaction id to
ReorderBufferAddNewTupleCids that adds a (relfilenode, tid) -> (cmin,
cmax) mapping to the transaction. But I think the fix is not correct
since as the comment of ReorderBufferTXN describes, the mappings are
always assigned to the top-level transaction.

in reorderbuffer.h,
/*
 * List of (relation, ctid) => (cmin, cmax) mappings for catalog tuples.
 * Those are always assigned to the toplevel transaction. (Keep track of
 * #entries to create a hash of the right size)
 */
dlist_head  tuplecids;
uint64  ntuplecids;

Instead, I wonder if we can decode ASSIGNMENT even when the state of
snapshot builder < SNAPBUILD_FULL_SNAPSHOT. That way, the
ReorderBufferTXN entries of both top transaction and sub transaction
are created properly before we decode NEW_CID.

Attached patch do that. In my environment the issue seems to be fixed
but I'm still not confident that this is the right fix. Please review
it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


always_decode_assignment.patch
Description: Binary data


Re: Duplicated LSN in ReorderBuffer

2019-07-08 Thread Masahiko Sawada
On Mon, Jul 8, 2019 at 9:00 AM Thomas Munro  wrote:
>
> On Wed, Jun 26, 2019 at 2:46 AM Ildar Musin  wrote:
> > Attached is a simple patch that uses subxid instead of top-level xid
> > in ReorderBufferAddNewTupleCids() call. It seems to fix the bug, but
> > i'm not sure that this is a valid change. Can someone please verify it
> > or maybe suggest a better solution for the issue?
>

I've reproduced this issue with script Ildar provided. I don't find
out the root cause yet and I'm not sure the patch takes a correct way
to fix this.

In my environment, I got the following pg_waldump output and the
logical decoding failed at 0/019FA058 when processing NEW_CID.

  90489 rmgr: Transaction len (rec/tot): 38/38, tx:
1999, lsn: 0/019F9E80, prev 0/019F9E38, desc: ASSIGNMENT xtop 1998:
subxacts: 1999
  90490 rmgr: Standby len (rec/tot):405/   405, tx:
0, lsn: 0/019F9EA8, prev 0/019F9E80, desc: RUNNING_XACTS nextXid 2000
latestCompletedXid 1949 oldestRunningXid 1836; 48 xacts: 1990 1954
1978 1850 1944 1972 1940 1924 1906 1970 1985 1998 1966 1987 1975 1858
1914 1982 1958 1840 1920 1926 1992 1962 1
  90490 910 1950 1874 1928 1974 1968 1946 1912 1918 1996 1922 1930
1964 1952 1994 1934 1980 1836 1984 1960 1956 1916 1908 1938
  90491 rmgr: Heap2   len (rec/tot): 60/60, tx:
1999, lsn: 0/019FA058, prev 0/019F9EA8, desc: NEW_CID rel
1663/12678/2615; tid 11/59; cmin: 0, cmax: 4294967295, combo:
4294967295
  90492 rmgr: Heaplen (rec/tot):127/   127, tx:
1999, lsn: 0/019FA098, prev 0/019FA058, desc: INSERT off 59 flags
0x00, blkref #0: rel 1663/12678/2615 blk 11

I thought that the logical decoding doesn't create ReorderBufferTXN of
xid=1999 when processing NEW_CID since it decodes ASSIGNMENT of
xid=1999 beforehand. But what actually happen is that it skips NEW_CID
since the state of snapshot builder is SNAPBUILD_BUILDING_SNAPSHOT yet
and then the state becomes SNAPBUILD_FULL_SNAPSHOT when processing
RUNNING_XACTS , and therefore it creates two ReorderBufferTXN entries
for xid = 1999 and xid = 1998 as top-level transactions when
processing NEW_CID (ReorderBufferXidSetCatalogChanges creates xid=1999
and ReorderBufferAddNewTupleCids creates xid = 1998). And therefore it
got the assertion failure when adding ReorderBufferTXN of xid = 1998.

I'll look into this more deeply tomorrow.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: Duplicated LSN in ReorderBuffer

2019-07-07 Thread Thomas Munro
On Wed, Jun 26, 2019 at 2:46 AM Ildar Musin  wrote:
> Attached is a simple patch that uses subxid instead of top-level xid
> in ReorderBufferAddNewTupleCids() call. It seems to fix the bug, but
> i'm not sure that this is a valid change. Can someone please verify it
> or maybe suggest a better solution for the issue?

Hello Ildar,

I hope someone more familiar with this code than me can comment, but
while going through the Commitfest CI results I saw this segfault with
your patch:

https://travis-ci.org/postgresql-cfbot/postgresql/builds/555184304

At a glance, HistoricSnapshotGetTupleCids() returned NULL in
HeapTupleSatisfiesHistoricMVCC(), so ResolveCminCmaxDuringDecoding()
blew up.

-- 
Thomas Munro
https://enterprisedb.com