Re: Possible bug in logical replication.

2018-07-19 Thread Alvaro Herrera
On 2018-Jul-19, Michael Paquier wrote:

> On Thu, Jul 19, 2018 at 12:38:53AM -0400, Alvaro Herrera wrote:
> > On 2018-Jul-19, Michael Paquier wrote:
> >> I am fine either way if you want to have the last call.  So please feel
> >> free to choose what you prefer here.  That's no big deal.
> > 
> > Okay.  You want to push it, or shall I?
> 
> It seems to me that you are on it more than I.  So please go ahead.
> Thanks for asking.

Done, thanks everyone.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Possible bug in logical replication.

2018-07-18 Thread Erik Rijkers

On 2018-07-19 03:42, Michael Paquier wrote:

On Wed, Jul 18, 2018 at 02:30:53PM -0400, Alvaro Herrera wrote:

In the immortal words of Julian Bream: "yeah, I didn't like any of
that".


One wikipedia lookup later, I still don't know where this quote comes
from, but at least I understand who the man is.


Julian Bream is the 20th century finest classical guitarist.

The quote is from Bream's 1978 guitar masterclass (at 30:25)

https://www.youtube.com/watch?v=iAXUzIrukt0




Re: Possible bug in logical replication.

2018-07-18 Thread Michael Paquier
On Thu, Jul 19, 2018 at 12:38:53AM -0400, Alvaro Herrera wrote:
> On 2018-Jul-19, Michael Paquier wrote:
>> I am fine either way if you want to have the last call.  So please feel
>> free to choose what you prefer here.  That's no big deal.
> 
> Okay.  You want to push it, or shall I?

It seems to me that you are on it more than I.  So please go ahead.
Thanks for asking.
--
Michael


signature.asc
Description: PGP signature


Re: Possible bug in logical replication.

2018-07-18 Thread Alvaro Herrera
On 2018-Jul-19, Michael Paquier wrote:

> I am fine either way if you want to have the last call.  So please feel
> free to choose what you prefer here.  That's no big deal.

Okay.  You want to push it, or shall I?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Possible bug in logical replication.

2018-07-18 Thread Michael Paquier
On Wed, Jul 18, 2018 at 10:45:27PM -0400, Alvaro Herrera wrote:
> On 2018-Jul-19, Michael Paquier wrote:
> 
>> On Wed, Jul 18, 2018 at 02:30:53PM -0400, Alvaro Herrera wrote:
>> > In the immortal words of Julian Bream: "yeah, I didn't like any of
>> > that".
>> 
>> One wikipedia lookup later, I still don't know where this quote comes
>> from, but at least I understand who the man is.
> 
> https://twitter.com/alvherre/status/1019652397306703873
> 
> OK, maybe not that "immortal" after all.

:)

>> This is referring to the system catalog field in pg_replication_slots.
> 
> Yeah, I think that's a bit misleading.  (I very frequently do a tag-jump
> on identifiers in comments, and it's uncomfortable that in this case it
> jumps to the Docbook source rather than to the struct declaration.)

Okay, this makes sense as well and that's easier to grep.

>>> +  gotrecord = XLogReadRecord(ctx->reader, startlsn, ) != NULL;
>> 
>> I would put parenthesis for clarity.
> 
> Or just put it back as a record pointer.

I am fine either way if you want to have the last call.  So please feel
free to choose what you prefer here.  That's no big deal.
--
Michael


signature.asc
Description: PGP signature


Re: Possible bug in logical replication.

2018-07-18 Thread Alvaro Herrera
On 2018-Jul-19, Michael Paquier wrote:

> On Wed, Jul 18, 2018 at 02:30:53PM -0400, Alvaro Herrera wrote:
> > In the immortal words of Julian Bream: "yeah, I didn't like any of
> > that".
> 
> One wikipedia lookup later, I still don't know where this quote comes
> from, but at least I understand who the man is.

https://twitter.com/alvherre/status/1019652397306703873

OK, maybe not that "immortal" after all.

> I may be missing something, but I cannot apply your patch on HEAD so I
> have not tested it.  Anyway, I read through it and the thing does not
> look logically wrong.

Sorry, I forgot to mention this -- it applies to 11.

> > I also moved some assignments from the declaration section to the code
> > section, so that I could attach proper comments to each, to improve
> > clarity of *why* we do those things.
> 
> To be pedantic here, you could move the declarations of startlsn,
> old_resowner and ctx directly inside the PG_TRY block.

Good idea, thanks.

> > I then noticed that we get a XLogRecord from XLogReadRecord, but then
> > fail to do anything with it, so I changed the code to use a bool
> > instead, which I think is clearer.
> 
> Matter of taste perhaps, I was fine with just manipulating the record
> pointer.

Yeah, it works out to the same thing really.  Maybe it's just me being
pedantic and annoyed when I realized that logical decoding does not
operate on the record itself but on the xlogreader struct instead.

TBH what I was actually doing at first was just casting the result of
XLogReadRecord to void and not doing anything with it, until I realized
that the logical decoding call below was important because of
side-effects.  I thought since fast_forward mode was used,
XLogReadRecord would not return anything anyway -- but I added an assert
and realized that was not so.

> > I think the proposed comment before the LogicalDecodingProcessRecord
> > call failed to convey the important ideas, so I rewrote that one also.
> > 
> > There is no struct member called confirmed_flush_lsn anywhere.
> 
> This is referring to the system catalog field in pg_replication_slots.

Yeah, I think that's a bit misleading.  (I very frequently do a tag-jump
on identifiers in comments, and it's uncomfortable that in this case it
jumps to the Docbook source rather than to the struct declaration.)

> > PG_TRY();
> > {
> > -   /* restart at slot's confirmed_flush */
> > +   /*
> > +* Create our decoding context in fast_forward mode, passing 
> > start_lsn
> > +* as Invalid, so that we start processing from confirmed_flush.
> > +*/
> 
> I'd rather mention InvalidXLogRecPtr directly here.  Invalid alone makes
> no real sense.

OK.  I was of two minds about that.

> > +  gotrecord = XLogReadRecord(ctx->reader, startlsn, ) != NULL;
> 
> I would put parenthesis for clarity.

Or just put it back as a record pointer.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Possible bug in logical replication.

2018-07-18 Thread Michael Paquier
On Wed, Jul 18, 2018 at 02:30:53PM -0400, Alvaro Herrera wrote:
> In the immortal words of Julian Bream: "yeah, I didn't like any of
> that".

One wikipedia lookup later, I still don't know where this quote comes
from, but at least I understand who the man is.

I may be missing something, but I cannot apply your patch on HEAD so I
have not tested it.  Anyway, I read through it and the thing does not
look logically wrong.

> I also moved some assignments from the declaration section to the code
> section, so that I could attach proper comments to each, to improve
> clarity of *why* we do those things.

To be pedantic here, you could move the declarations of startlsn,
old_resowner and ctx directly inside the PG_TRY block.

> I then noticed that we get a XLogRecord from XLogReadRecord, but then
> fail to do anything with it, so I changed the code to use a bool
> instead, which I think is clearer.

Matter of taste perhaps, I was fine with just manipulating the record
pointer.

> I think the proposed comment before the LogicalDecodingProcessRecord
> call failed to convey the important ideas, so I rewrote that one also.
> 
> There is no struct member called confirmed_flush_lsn anywhere.

This is referring to the system catalog field in pg_replication_slots.

> BTW I think I'm starting to have a vague idea of logical decoding
> now.

Nice.

>   PG_TRY();
>   {
> - /* restart at slot's confirmed_flush */
> + /*
> +  * Create our decoding context in fast_forward mode, passing 
> start_lsn
> +  * as Invalid, so that we start processing from confirmed_flush.
> +  */

I'd rather mention InvalidXLogRecPtr directly here.  Invalid alone makes
no real sense.

> +  gotrecord = XLogReadRecord(ctx->reader, startlsn, ) != NULL;

I would put parenthesis for clarity.
--
Michael


signature.asc
Description: PGP signature


Re: Possible bug in logical replication.

2018-07-12 Thread Michael Paquier
On Wed, Jul 04, 2018 at 10:50:28AM +0900, Michael Paquier wrote:
> On Tue, Jul 03, 2018 at 01:17:48AM -0400, Alvaro Herrera wrote:
> > Let me review tomorrow.
> 
> Of course, please feel free.

Alvaro, are you planning to look at that to close the loop?  The latest
version is here:
https://postgr.es/m/20180709070200.gc30...@paquier.xyz
--
Michael


signature.asc
Description: PGP signature


Re: Possible bug in logical replication.

2018-07-09 Thread Arseny Sher


Michael Paquier  writes:

> Okay, let's do as you suggest then.  Do you find the attached adapted?

Yes, thanks!

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Possible bug in logical replication.

2018-07-09 Thread Michael Paquier
On Tue, Jul 03, 2018 at 09:16:42AM +0300, Arseny Sher wrote:
> Michael Paquier  writes:
>> On Thu, Jun 21, 2018 at 07:31:20PM +0900, Michael Paquier wrote:
>>> Could it be possible to get a patch from all the feedback and exchange
>>> gathered here?  Petr, I think that it would not hurt if you use the set
>>> of words and comments you think is most adapted as the primary author of
>>> the feature.
>>
>> I have seen no patch, so attached is one to finally close the loop and
>> this open item, which includes both my suggestions and what Arseny has
>> mentioned based on the latest emails exchanged.  Any objections to that?
> 
> I'm practically happy with this.
> 
>>  * while confirmed_lsn is used as base point for the decoding context.
> 
> This line is excessive as now we have comment below saying it doesn't
> matter.

Okay, let's do as you suggest then.  Do you find the attached adapted?
--
Michael
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index c2d0e0c723..7e10a027ca 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -340,6 +340,9 @@ CreateInitDecodingContext(char *plugin,
  * output_plugin_options
  *		contains options passed to the output plugin.
  *
+ * fast_forward
+ *		bypasses the generation of logical changes.
+ *
  * read_page, prepare_write, do_write, update_progress
  *		callbacks that have to be filled to perform the use-case dependent,
  *		actual work.
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 2806e1076c..5cedb688a6 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -341,12 +341,11 @@ pg_physical_replication_slot_advance(XLogRecPtr moveto)
 
 /*
  * Helper function for advancing logical replication slot forward.
- * The slot's restart_lsn is used as start point for reading records,
- * while confirmed_lsn is used as base point for the decoding context.
- * The LSN position to move to is checked by doing a per-record scan and
- * logical decoding which makes sure that confirmed_lsn is updated to a
- * LSN which allows the future slot consumer to get consistent logical
- * changes.
+ * The slot's restart_lsn is used as start point for reading records.
+ * While we could just use LogicalConfirmReceivedLocation to update
+ * confirmed_flush_lsn, we had better digest WAL to advance restart_lsn
+ * allowing to recycle WAL and old catalog tuples.  As decoding is done
+ * with fast_forward mode, no changes are generated.
  */
 static XLogRecPtr
 pg_logical_replication_slot_advance(XLogRecPtr moveto)
@@ -358,7 +357,10 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 
 	PG_TRY();
 	{
-		/* restart at slot's confirmed_flush */
+		/*
+		 * Note that start_lsn does not matter here, as with fast_forward mode
+		 * no transactions are replayed.
+		 */
 		ctx = CreateDecodingContext(InvalidXLogRecPtr,
 	NIL,
 	true,
@@ -378,6 +380,10 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 			XLogRecord *record;
 			char	   *errm = NULL;
 
+			/*
+			 * Start reading WAL at the slot's restart_lsn, which certainly
+			 * points to the valid record.
+			 */
 			record = XLogReadRecord(ctx->reader, startlsn, );
 			if (errm)
 elog(ERROR, "%s", errm);
@@ -389,8 +395,8 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 			startlsn = InvalidXLogRecPtr;
 
 			/*
-			 * The {begin_txn,change,commit_txn}_wrapper callbacks above will
-			 * store the description into our tuplestore.
+			 * Note that changes are not generated in fast_forward mode, and
+			 * that the slot's data is still updated.
 			 */
 			if (record != NULL)
 LogicalDecodingProcessRecord(ctx, ctx->reader);


signature.asc
Description: PGP signature


Re: Possible bug in logical replication.

2018-07-03 Thread Michael Paquier
On Tue, Jul 03, 2018 at 01:17:48AM -0400, Alvaro Herrera wrote:
> Let me review tomorrow.

Of course, please feel free.
--
Michael


signature.asc
Description: PGP signature


Re: Possible bug in logical replication.

2018-07-02 Thread Alvaro Herrera
On 2018-Jul-03, Michael Paquier wrote:

> On Thu, Jun 21, 2018 at 07:31:20PM +0900, Michael Paquier wrote:
> > Could it be possible to get a patch from all the feedback and exchange
> > gathered here?  Petr, I think that it would not hurt if you use the set
> > of words and comments you think is most adapted as the primary author of
> > the feature.
> 
> I have seen no patch, so attached is one to finally close the loop and
> this open item, which includes both my suggestions and what Arseny has
> mentioned based on the latest emails exchanged.  Any objections to that?

Let me review tomorrow.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Possible bug in logical replication.

2018-07-02 Thread Michael Paquier
On Thu, Jun 21, 2018 at 07:31:20PM +0900, Michael Paquier wrote:
> Could it be possible to get a patch from all the feedback and exchange
> gathered here?  Petr, I think that it would not hurt if you use the set
> of words and comments you think is most adapted as the primary author of
> the feature.

I have seen no patch, so attached is one to finally close the loop and
this open item, which includes both my suggestions and what Arseny has
mentioned based on the latest emails exchanged.  Any objections to that?
--
Michael
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index c2d0e0c723..7e10a027ca 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -340,6 +340,9 @@ CreateInitDecodingContext(char *plugin,
  * output_plugin_options
  *		contains options passed to the output plugin.
  *
+ * fast_forward
+ *		bypasses the generation of logical changes.
+ *
  * read_page, prepare_write, do_write, update_progress
  *		callbacks that have to be filled to perform the use-case dependent,
  *		actual work.
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 2806e1076c..6c2addd5b7 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -343,10 +343,10 @@ pg_physical_replication_slot_advance(XLogRecPtr moveto)
  * Helper function for advancing logical replication slot forward.
  * The slot's restart_lsn is used as start point for reading records,
  * while confirmed_lsn is used as base point for the decoding context.
- * The LSN position to move to is checked by doing a per-record scan and
- * logical decoding which makes sure that confirmed_lsn is updated to a
- * LSN which allows the future slot consumer to get consistent logical
- * changes.
+ * While we could just use LogicalConfirmReceivedLocation to update
+ * confirmed_flush_lsn, we had better digest WAL to advance restart_lsn
+ * allowing to recycle WAL and old catalog tuples.  As decoding is done
+ * with fast_forward mode, no changes are generated.
  */
 static XLogRecPtr
 pg_logical_replication_slot_advance(XLogRecPtr moveto)
@@ -358,7 +358,10 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 
 	PG_TRY();
 	{
-		/* restart at slot's confirmed_flush */
+		/*
+		 * Note that start_lsn does not matter here, as with fast_forward mode
+		 * no transactions are replayed.
+		 */
 		ctx = CreateDecodingContext(InvalidXLogRecPtr,
 	NIL,
 	true,
@@ -378,6 +381,10 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 			XLogRecord *record;
 			char	   *errm = NULL;
 
+			/*
+			 * Start reading WAL at the slot's restart_lsn, which certainly
+			 * points to the valid record.
+			 */
 			record = XLogReadRecord(ctx->reader, startlsn, );
 			if (errm)
 elog(ERROR, "%s", errm);
@@ -389,8 +396,8 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 			startlsn = InvalidXLogRecPtr;
 
 			/*
-			 * The {begin_txn,change,commit_txn}_wrapper callbacks above will
-			 * store the description into our tuplestore.
+			 * Note that changes are not generated in fast_forward mode, and
+			 * that the slot's data is still updated.
 			 */
 			if (record != NULL)
 LogicalDecodingProcessRecord(ctx, ctx->reader);


signature.asc
Description: PGP signature


Re: Possible bug in logical replication.

2018-06-22 Thread Michael Paquier
On Fri, Jun 22, 2018 at 04:33:12PM +0900, Kyotaro HORIGUCHI wrote:
> pg_advance_replication_slots can advance uninitialized physical
> slots and that might not be good. (Logical slots always have
> initial invalid values in thw two lsn columns.)

The current logic is careful that users willing to move to a new
position cannot move in the future, but the logic is visibly wanted to
accept past values.  Petr, what do you think?  KeepLogSeg() won't return
negative values so some applications may take advantage of that.  Or
should advancing be simply disabled for non-initialized slots? 
--
Michael


signature.asc
Description: PGP signature


Re: Possible bug in logical replication.

2018-06-22 Thread Kyotaro HORIGUCHI
Hello.

At Thu, 14 Jun 2018 16:06:43 -0400, Alvaro Herrera  
wrote in <20180614200643.3my362zmfiwitrni@alvherre.pgsql>
> Can somebody (Arseny, Konstantin, horiguti, Sawada) please confirm that
> Michaël's commit fixes the reported bug?

pg_advance_replication_slots can advance uninitialized physical
slots and that might not be good. (Logical slots always have
initial invalid values in thw two lsn columns.)

About scanning from restart_lsn in the advancing function, I
think I confirmed that the value always comes from
XLogRecordBuffer.origptr, which comes from ReadRecPtr, not
EndRecPtr, which cannot be on page boundary.

FWIW, as the result, it looks fine for me also regarding the
issue on this thread.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Possible bug in logical replication.

2018-06-21 Thread Petr Jelinek
On 20/06/18 09:59, Arseny Sher wrote:
> 
> Michael Paquier  writes:
> 
>> On Mon, Jun 18, 2018 at 09:42:36PM +0900, Michael Paquier wrote:
>>> On Fri, Jun 15, 2018 at 06:27:56PM +0300, Arseny Sher wrote:
>>> It seems to me that we still want to have the slot forwarding finish in
>>> this case even if this is interrupted.  Petr, isn't that the intention
>>> here?
>>
>> I have been chewing a bit more on the proposed patch, finishing with the
>> attached to close the loop.  Thoughts?
> 
> Sorry for being pedantic, but it seems to me worthwhile to mention *why*
> we need decoding machinery at all -- like I wrote:
> 
> + * While we could just do LogicalConfirmReceivedLocation updating
> + * confirmed_flush_lsn, we'd better digest WAL to advance restart_lsn
> + * (allowing to recycle WAL) and xmin (allowing to vacuum old catalog 
> tuples).
> 

+1

> Also,
> 
>>   * The slot's restart_lsn is used as start point for reading records,
> 
> This is clearly seen from the code, I propose to remove this.

Given there was just bug fix done for this I guess the extra clarity
there does not hurt.

> 
>>   * The LSN position to move to is checked by doing a per-record scan and
>>   * logical decoding which makes sure that confirmed_lsn is updated to a
>>   * LSN which allows the future slot consumer to get consistent logical
>> - * changes.
>> + * changes.  As decoding is done with fast_forward mode, no changes are
>> + * actually generated.
> 
> confirmed_lsn is always updated to `moveto` unless we run out of WAL
> earlier (and unless we try to move slot backwards, which is obviously
> forbidden) -- consistent changes are practically irrelevant
> here. Moreover, we can directly set confirmed_lsn and still have
> consistent changes further as restart_lsn and xmin of the slot are not
> touched. What we actually do here is trying to advance *restart_lsn and
> xmin* as far as we can but up to the point which ensures that decoding
> can assemble a consistent snapshot allowing to fully decode all COMMITs
> since updated `confirmed_flush_lsn`. All this happens in
> SnapBuildProcessRunningXacts.
> 

Those are two different things, here is consistent snapshot for logical
decoding without which we can't decode and that's handled by restart_lsn
and xmin. But the consistent stream of data for the consumer is handled
by confirmed_lsn (and the comment says that). You have to take into
account that next use of the slot can consume data (ie may be done via
one of the other interfaces and not by move). So I think what Michael
has here is correct.

>> It seems to me that we still want to have the slot forwarding finish in
>> this case even if this is interrupted.  Petr, isn't that the intention
>> here?
> 

Well, it seems wasteful to just exit there if we already finished all
the requested work, also gives some consistency with the coding of
get/peek_changes. Not very important for the functionality either way.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Possible bug in logical replication.

2018-06-20 Thread Arseny Sher

Michael Paquier  writes:

> On Mon, Jun 18, 2018 at 09:42:36PM +0900, Michael Paquier wrote:
>> On Fri, Jun 15, 2018 at 06:27:56PM +0300, Arseny Sher wrote:
>> It seems to me that we still want to have the slot forwarding finish in
>> this case even if this is interrupted.  Petr, isn't that the intention
>> here?
>
> I have been chewing a bit more on the proposed patch, finishing with the
> attached to close the loop.  Thoughts?

Sorry for being pedantic, but it seems to me worthwhile to mention *why*
we need decoding machinery at all -- like I wrote:

+ * While we could just do LogicalConfirmReceivedLocation updating
+ * confirmed_flush_lsn, we'd better digest WAL to advance restart_lsn
+ * (allowing to recycle WAL) and xmin (allowing to vacuum old catalog tuples).

Also,

>   * The slot's restart_lsn is used as start point for reading records,

This is clearly seen from the code, I propose to remove this.

>   * while confirmed_lsn is used as base point for the decoding context.

And as I wrote, this doesn't matter as changes are not produced.

>   * The LSN position to move to is checked by doing a per-record scan and
>   * logical decoding which makes sure that confirmed_lsn is updated to a
>   * LSN which allows the future slot consumer to get consistent logical
> - * changes.
> + * changes.  As decoding is done with fast_forward mode, no changes are
> + * actually generated.

confirmed_lsn is always updated to `moveto` unless we run out of WAL
earlier (and unless we try to move slot backwards, which is obviously
forbidden) -- consistent changes are practically irrelevant
here. Moreover, we can directly set confirmed_lsn and still have
consistent changes further as restart_lsn and xmin of the slot are not
touched. What we actually do here is trying to advance *restart_lsn and
xmin* as far as we can but up to the point which ensures that decoding
can assemble a consistent snapshot allowing to fully decode all COMMITs
since updated `confirmed_flush_lsn`. All this happens in
SnapBuildProcessRunningXacts.

> It seems to me that we still want to have the slot forwarding finish in
> this case even if this is interrupted.  Petr, isn't that the intention
> here?

Probably, though I am not sure what is the point of this. Ok, I keep
this check in the updated (with your comments) patch and CC'ing Petr.

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 61588d626f..76bafca41c 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -335,11 +335,14 @@ CreateInitDecodingContext(char *plugin,
  *		The LSN at which to start decoding.  If InvalidXLogRecPtr, restart
  *		from the slot's confirmed_flush; otherwise, start from the specified
  *		location (but move it forwards to confirmed_flush if it's older than
- *		that, see below).
+ *		that, see below). Doesn't matter in fast_forward mode.
  *
  * output_plugin_options
  *		contains options passed to the output plugin.
  *
+ * fast_forward
+ *		bypasses the generation of logical changes.
+ *
  * read_page, prepare_write, do_write, update_progress
  *		callbacks that have to be filled to perform the use-case dependent,
  *		actual work.
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 2806e1076c..0a4985ef8c 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -341,12 +341,10 @@ pg_physical_replication_slot_advance(XLogRecPtr moveto)
 
 /*
  * Helper function for advancing logical replication slot forward.
- * The slot's restart_lsn is used as start point for reading records,
- * while confirmed_lsn is used as base point for the decoding context.
- * The LSN position to move to is checked by doing a per-record scan and
- * logical decoding which makes sure that confirmed_lsn is updated to a
- * LSN which allows the future slot consumer to get consistent logical
- * changes.
+ * While we could just do LogicalConfirmReceivedLocation updating
+ * confirmed_flush_lsn, we'd better digest WAL to advance restart_lsn
+ * (allowing to recycle WAL) and xmin (allowing to vacuum old catalog tuples).
+ * We do it in special fast_forward mode without actual replay.
  */
 static XLogRecPtr
 pg_logical_replication_slot_advance(XLogRecPtr moveto)
@@ -358,7 +356,6 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 
 	PG_TRY();
 	{
-		/* restart at slot's confirmed_flush */
 		ctx = CreateDecodingContext(InvalidXLogRecPtr,
 	NIL,
 	true,
@@ -388,10 +385,7 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 			 */
 			startlsn = InvalidXLogRecPtr;
 
-			/*
-			 * The {begin_txn,change,commit_txn}_wrapper callbacks above will
-			 * store the description into our tuplestore.
-			 */
+			/* Changes are not actually produced in fast_forward mode. */
 			if (record != NULL)
 

Re: Possible bug in logical replication.

2018-06-19 Thread Michael Paquier
On Mon, Jun 18, 2018 at 09:42:36PM +0900, Michael Paquier wrote:
> On Fri, Jun 15, 2018 at 06:27:56PM +0300, Arseny Sher wrote:
> It seems to me that we still want to have the slot forwarding finish in
> this case even if this is interrupted.  Petr, isn't that the intention
> here?

I have been chewing a bit more on the proposed patch, finishing with the
attached to close the loop.  Thoughts?
--
Michael
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 61588d626f..6068aefd02 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -340,6 +340,9 @@ CreateInitDecodingContext(char *plugin,
  * output_plugin_options
  *		contains options passed to the output plugin.
  *
+ * fast_forward
+ *		bypasses the generation of logical changes.
+ *
  * read_page, prepare_write, do_write, update_progress
  *		callbacks that have to be filled to perform the use-case dependent,
  *		actual work.
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 2806e1076c..7c03bf69d3 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -346,7 +346,8 @@ pg_physical_replication_slot_advance(XLogRecPtr moveto)
  * The LSN position to move to is checked by doing a per-record scan and
  * logical decoding which makes sure that confirmed_lsn is updated to a
  * LSN which allows the future slot consumer to get consistent logical
- * changes.
+ * changes.  As decoding is done with fast_forward mode, no changes are
+ * actually generated.
  */
 static XLogRecPtr
 pg_logical_replication_slot_advance(XLogRecPtr moveto)
@@ -389,8 +390,8 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 			startlsn = InvalidXLogRecPtr;
 
 			/*
-			 * The {begin_txn,change,commit_txn}_wrapper callbacks above will
-			 * store the description into our tuplestore.
+			 * Note that changes are not generated in fast_forward mode, but
+			 * that the slot's data still needs to be updated.
 			 */
 			if (record != NULL)
 LogicalDecodingProcessRecord(ctx, ctx->reader);


signature.asc
Description: PGP signature


Re: Possible bug in logical replication.

2018-06-18 Thread Michael Paquier
On Fri, Jun 15, 2018 at 06:27:56PM +0300, Arseny Sher wrote:
> I confirm that starting reading WAL since restart_lsn as implemented in
> f731cfa fixes this issue, as well as the second issue tushar mentioned
> at [1].

Thanks!

+/*
+ * Start reading WAL at restart_lsn, which certainly points to the valid
+ * record.
+ */
 XLogRecPtrstartlsn = MyReplicationSlot->data.restart_lsn;
 XLogRecPtrretlsn = MyReplicationSlot->data.confirmed_flush;

What does this one actually bring?

 PG_TRY();
 {
-/* restart at slot's confirmed_flush */
+/* start_lsn doesn't matter here, we don't replay xacts at all */
 ctx = CreateDecodingContext(InvalidXLogRecPtr,
 NIL,
 true,

Okay for this one.

-/*
- * The {begin_txn,change,commit_txn}_wrapper callbacks above will
- * store the description into our tuplestore.
- */
+/* Changes are not actually produced in fast_forward mode. */

This one is a good idea.  Now CreateDecodingContext is missing the
description of what fast_forward actually does, aka no changes are
produced.  Could you update your patch to reflect that?  That would be
useful for future callers of CreateDecodingContext as well.

-/* Stop once the moving point wanted by caller has been reached */
-if (moveto <= ctx->reader->EndRecPtr)
-break;
-
 CHECK_FOR_INTERRUPTS();

It seems to me that we still want to have the slot forwarding finish in
this case even if this is interrupted.  Petr, isn't that the intention
here?
--
Michael


signature.asc
Description: PGP signature


Re: Possible bug in logical replication.

2018-06-15 Thread Arseny Sher

Alvaro Herrera  writes:

> Can somebody (Arseny, Konstantin, horiguti, Sawada) please confirm that
> Michaël's commit fixes the reported bug?

I confirm that starting reading WAL since restart_lsn as implemented in
f731cfa fixes this issue, as well as the second issue tushar mentioned
at [1]. I think that the code still can be improved a bit though --
consider the attached patch:
* pg_logical_replication_slot_advance comment was not very informative
  and even a bit misleading: it said that we use confirmed_flush_lsn and
  restart_lsn, but didn't explain why.
* Excessive check in its main loop.
* Copy-paste comment fix.

[1] 
https://www.postgresql.org/message-id/5f85bf41-098e-c4e1-7332-9171fef57a0a%40enterprisedb.com

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

>From d8ed8ae3eec54b716d7dbb35379d0047a96c6c75 Mon Sep 17 00:00:00 2001
From: Arseny Sher 
Date: Fri, 15 Jun 2018 18:11:17 +0300
Subject: [PATCH] Cosmetic review for f731cfa.

* pg_logical_replication_slot_advance comment was not very informative and even
  a bit misleading: it said that we use confirmed_flush_lsn and restart_lsn, but
  didn't explain why.
* Excessive check in its main loop.
* Copy-paste comment fix.
---
 src/backend/replication/slotfuncs.c | 25 ++---
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 2806e1076c..597e81f4d9 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -341,24 +341,26 @@ pg_physical_replication_slot_advance(XLogRecPtr moveto)
 
 /*
  * Helper function for advancing logical replication slot forward.
- * The slot's restart_lsn is used as start point for reading records,
- * while confirmed_lsn is used as base point for the decoding context.
- * The LSN position to move to is checked by doing a per-record scan and
- * logical decoding which makes sure that confirmed_lsn is updated to a
- * LSN which allows the future slot consumer to get consistent logical
- * changes.
+ * While we could just do LogicalConfirmReceivedLocation updating
+ * confirmed_flush_lsn, we'd better digest WAL to advance restart_lsn allowing
+ * to recycle WAL and old catalog tuples. We do it in special fast_forward
+ * mode without actual replay.
  */
 static XLogRecPtr
 pg_logical_replication_slot_advance(XLogRecPtr moveto)
 {
 	LogicalDecodingContext *ctx;
 	ResourceOwner old_resowner = CurrentResourceOwner;
+	/*
+	 * Start reading WAL at restart_lsn, which certainly points to the valid
+	 * record.
+	 */
 	XLogRecPtr	startlsn = MyReplicationSlot->data.restart_lsn;
 	XLogRecPtr	retlsn = MyReplicationSlot->data.confirmed_flush;
 
 	PG_TRY();
 	{
-		/* restart at slot's confirmed_flush */
+		/* start_lsn doesn't matter here, we don't replay xacts at all */
 		ctx = CreateDecodingContext(InvalidXLogRecPtr,
 	NIL,
 	true,
@@ -388,17 +390,10 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 			 */
 			startlsn = InvalidXLogRecPtr;
 
-			/*
-			 * The {begin_txn,change,commit_txn}_wrapper callbacks above will
-			 * store the description into our tuplestore.
-			 */
+			/* Changes are not actually produced in fast_forward mode. */
 			if (record != NULL)
 LogicalDecodingProcessRecord(ctx, ctx->reader);
 
-			/* Stop once the moving point wanted by caller has been reached */
-			if (moveto <= ctx->reader->EndRecPtr)
-break;
-
 			CHECK_FOR_INTERRUPTS();
 		}
 
-- 
2.11.0



Re: Possible bug in logical replication.

2018-06-15 Thread Masahiko Sawada
On Fri, Jun 15, 2018 at 5:06 AM, Alvaro Herrera
 wrote:
> Hello
>
> Can somebody (Arseny, Konstantin, horiguti, Sawada) please confirm that
> Michaël's commit fixes the reported bug?
>

I don't confirm that commit deeply yet but I have confirmed that the
reported bug is fixed using the following test case which led an
assertion failure. And this test case is the same as the previous
report[1].

postgres(1:128004)=# select pg_create_logical_replication_slot('s1',
'pgoutput');
2018-06-15 18:04:31.301 JST [128004] LOG:  logical decoding found
consistent point at 0/1645388
2018-06-15 18:04:31.301 JST [128004] DETAIL:  There are no running transactions.
2018-06-15 18:04:31.301 JST [128004] STATEMENT:  select
pg_create_logical_replication_slot('s1', 'pgoutput');
 pg_create_logical_replication_slot

 (s1,0/16453C0)
(1 row)

postgres(1:128004)=# select pg_switch_wal();
 pg_switch_wal
---
 0/16453D8
(1 row)

postgres(1:128004)=# select pg_switch_wal();
 pg_switch_wal
---
 0/200
(1 row)

postgres(1:128004)=# select pg_replication_slot_advance('s1', '0/200');
2018-06-15 18:04:31.338 JST [128004] LOG:  starting logical decoding
for slot "s1"
2018-06-15 18:04:31.338 JST [128004] DETAIL:  Streaming transactions
committing after 0/16453C0, reading WAL from 0/1645388.
2018-06-15 18:04:31.338 JST [128004] STATEMENT:  select
pg_replication_slot_advance('s1', '0/200');
2018-06-15 18:04:31.339 JST [128004] LOG:  logical decoding found
consistent point at 0/1645388
2018-06-15 18:04:31.339 JST [128004] DETAIL:  There are no running transactions.
2018-06-15 18:04:31.339 JST [128004] STATEMENT:  select
pg_replication_slot_advance('s1', '0/200');
 pg_replication_slot_advance
-
 (s1,0/200)
(1 row)

postgres(1:128004)=# create table a (c int);
select pg_replication_slot_advance('s1', pg_current_wal_lsn());
CREATE TABLE
postgres(1:128004)=# select pg_replication_slot_advance('s1',
pg_current_wal_lsn());
2018-06-15 18:04:31.401 JST [128004] LOG:  starting logical decoding
for slot "s1"
2018-06-15 18:04:31.401 JST [128004] DETAIL:  Streaming transactions
committing after 0/200, reading WAL from 0/1645388.
2018-06-15 18:04:31.401 JST [128004] STATEMENT:  select
pg_replication_slot_advance('s1', pg_current_wal_lsn());
2018-06-15 18:04:31.402 JST [128004] LOG:  logical decoding found
consistent point at 0/1645388
2018-06-15 18:04:31.402 JST [128004] DETAIL:  There are no running transactions.
2018-06-15 18:04:31.402 JST [128004] STATEMENT:  select
pg_replication_slot_advance('s1', pg_current_wal_lsn());
 pg_replication_slot_advance
-
 (s1,0/2017828)
(1 row)

[1] 
https://www.postgresql.org/message-id/34d66f63-40a9-4c3e-c9a1-248d1e393d29%40enterprisedb.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Possible bug in logical replication.

2018-06-10 Thread Michael Paquier
On Thu, Jun 07, 2018 at 08:32:10AM +0100, Simon Riggs wrote:
> On 6 June 2018 at 17:22, Alvaro Herrera  wrote:
>> This thread seems to have died down without any fix being proposed.
>> Simon, you own this open item.
> 
> Thanks, will look.

Petr and I have found a couple of issues about the slot advance stuff on
this thread:
https://www.postgresql.org/message-id/20180525052805.GA15634%40paquier.xyz

The result is f731cfa, which, per my tests, is able to take care of this
issue as well as advancing first a slot to a WAL page boundary, and then
advancing it to the latest LSN available does not trigger any assertions
anymore.  It would be nice if there is a double-check though, so I am
letting this thread on the list of open items for now.
--
Michael


signature.asc
Description: PGP signature


Re: Possible bug in logical replication.

2018-06-07 Thread Simon Riggs
On 6 June 2018 at 17:22, Alvaro Herrera  wrote:
> This thread seems to have died down without any fix being proposed.
> Simon, you own this open item.

Thanks, will look.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Possible bug in logical replication.

2018-06-06 Thread Alvaro Herrera
This thread seems to have died down without any fix being proposed.
Simon, you own this open item.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Possible bug in logical replication.

2018-05-25 Thread Arseny Sher

Michael Paquier  writes:

> Maybe I am being too naive, but wouldn't it be just enough to update the
> confirmed flush LSN to ctx->reader->ReadRecPtr?  This way, the slot
> advances up to the beginning of the last record where user wants to
> advance, and not the beginning of the next record:

Same problem should be handled at pg_logical_slot_get_changes_guts and
apply worker feedback; and there is a convention that all commits since
confirmed_flush must be decoded, so we risk decoding such boundary
commit twice.

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Possible bug in logical replication.

2018-05-25 Thread Arseny Sher
Hello,

Kyotaro HORIGUCHI  writes:

> restart_lsn stays at the beginning of a transaction until the
> transaction ends so just using restart_lsn allows repeated
> decoding of a transaction, in short, rewinding occurs. The
> function works only for inactive slot so the current code works
> fine on this point.

Sorry, I do not follow. restart_lsn is advanced whenever there is a
consistent snapshot dumped (in xl_running_xacts) which is old enough to
wholly decode all xacts not yet confirmed by the client. Could you
please elaborate, what's wrong with that?

> Addition to that restart_lsn also can be on a
> page bounary.

Do you have an example of that? restart_lsn is set initially to WAL
insert position at ReplicationSlotReserveWal, and later it always points
to xl_running_xacts record with consistent snapshot dumped.

> So directly set ctx->reader->EndRecPtr by startlsn fixes the
> problem, but I found another problem here.

There is a minor issue with the patch. Now slot advancement hangs
polling for new WAL on my example from [1]; most probably because we
must exit the loop when ctx->reader->EndRecPtr == moveto.

> The function accepts any LSN even if it is not at the begiining
> of a record. We will see errors or crashs or infinite waiting or
> maybe any kind of trouble by such values. The moved LSN must
> always be at the "end of a record" (that is, at the start of the
> next recored). The attached patch also fixes this.

Indeed, but we have these problems only if we are trying to read WAL
since confirmed_flush.

[1] https://www.postgresql.org/message-id/873720e4hf.fsf%40ars-thinkpad

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Possible bug in logical replication.

2018-05-25 Thread Michael Paquier
On Thu, May 24, 2018 at 10:14:01AM +0900, Kyotaro HORIGUCHI wrote:
> At Wed, 23 May 2018 15:56:22 +0900, Masahiko Sawada  
> wrote in 
>> Another possible way might be to make XLogFindNextRecord valid in
>> backend code and move startlsn to the first valid record with an lsn
>> >= startlsn by using that function. Please find attached patch.
> 
> The another reason for the code is the fact that confirmed_lsn is
> storing EndRecPtr after the last XLogReadRecord call. That is,
> from the definition, confirmed_lsn must be on the start of a
> record or page boundary and error out if not. For that reason,
> calling XLogFindNextRecord would not be the right thing to do
> here. We should just skip a header if we are on a boundary but it
> already done in XLogReadRecord.

Maybe I am being too naive, but wouldn't it be just enough to update the
confirmed flush LSN to ctx->reader->ReadRecPtr?  This way, the slot
advances up to the beginning of the last record where user wants to
advance, and not the beginning of the next record:
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -395,7 +395,7 @@ pg_logical_replication_slot_advance(XLogRecPtr startlsn, 
XLogRecPtr moveto)
 
 if (ctx->reader->EndRecPtr != InvalidXLogRecPtr)
 {
-LogicalConfirmReceivedLocation(moveto);
+LogicalConfirmReceivedLocation(ctx->reader->ReadRecPtr);
 
 /*
  * If only the confirmed_flush_lsn has changed the slot won't get

I agree with the point made above to not touch manually the XLogReader
context out of xlogreader.c.
--
Michael


signature.asc
Description: PGP signature


Re: Possible bug in logical replication.

2018-05-23 Thread Kyotaro HORIGUCHI
Hello.

At Wed, 23 May 2018 15:56:22 +0900, Masahiko Sawada  
wrote in 
> > So directly set ctx->reader->EndRecPtr by startlsn fixes the
> > problem, but I found another problem here.
> 
> I confirmed that the attached patch fixes this problem as well as the
> same problem reported on another thread.
> 
> I'm not sure it's a good approach to change the state of xlogreader
> directly in the replication slot codes because it also means that we
> have to care about this code as well when xlogreader code is changed.

XLogReadRecrod checks whether state->ReadRecPtr is invalid or not
in the case and works as the same to the explicit LSN case if
so. That is suggesting the usage. (I found no actual use case,
though.) It seems somewhat uneasy also to me, though..

> Another possible way might be to make XLogFindNextRecord valid in
> backend code and move startlsn to the first valid record with an lsn
> >= startlsn by using that function. Please find attached patch.

The another reason for the code is the fact that confirmed_lsn is
storing EndRecPtr after the last XLogReadRecord call. That is,
from the definition, confirmed_lsn must be on the start of a
record or page boundary and error out if not. For that reason,
calling XLogFindNextRecord would not be the right thing to do
here. We should just skip a header if we are on a boundary but it
already done in XLogReadRecord.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Possible bug in logical replication.

2018-05-23 Thread Masahiko Sawada
On Fri, May 18, 2018 at 2:37 PM, Kyotaro HORIGUCHI
 wrote:
> At Thu, 17 May 2018 13:54:07 +0300, Arseny Sher  wrote 
> in <87in7md034.fsf@ars-thinkpad>
>>
>> Konstantin Knizhnik  writes:
>>
>> > I think that using restart_lsn instead of confirmed_flush is not right 
>> > approach.
>> > If restart_lsn is not available and confirmed_flush is pointing to page
>> > boundary, then in any case we should somehow handle this case and adjust
>> > startlsn to point on the valid record position (by jjust adding page header
>> > size?).
>>
>> Well, restart_lsn is always available on live slot: it is initially set
>> in ReplicationSlotReserveWal during slot creation.
>
> restart_lsn stays at the beginning of a transaction until the
> transaction ends so just using restart_lsn allows repeated
> decoding of a transaction, in short, rewinding occurs. The
> function works only for inactive slot so the current code works
> fine on this point. Addition to that restart_lsn also can be on a
> page bounary.
>
>
> We can see the problem easily.
>
> 1. Just create a logical replication slot with setting current LSN.
>
>   select pg_create_logical_replication_slot('s1', 'pgoutput');
>
> 2. Advance LSN by two or three pages by doing anyting.
>
> 3. Advance the slot to a page bounadry.
>
>   e.g. select pg_replication_slot_advance('s1', '0/9624000');
>
> 4. advance the slot further, then crash.
>
> So directly set ctx->reader->EndRecPtr by startlsn fixes the
> problem, but I found another problem here.

I confirmed that the attached patch fixes this problem as well as the
same problem reported on another thread.

I'm not sure it's a good approach to change the state of xlogreader
directly in the replication slot codes because it also means that we
have to care about this code as well when xlogreader code is changed.
Another possible way might be to make XLogFindNextRecord valid in
backend code and move startlsn to the first valid record with an lsn
>= startlsn by using that function. Please find attached patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


find_next_valid_record.patch
Description: Binary data


Re: Possible bug in logical replication.

2018-05-17 Thread Kyotaro HORIGUCHI
At Thu, 17 May 2018 13:54:07 +0300, Arseny Sher  wrote 
in <87in7md034.fsf@ars-thinkpad>
> 
> Konstantin Knizhnik  writes:
> 
> > I think that using restart_lsn instead of confirmed_flush is not right 
> > approach.
> > If restart_lsn is not available and confirmed_flush is pointing to page
> > boundary, then in any case we should somehow handle this case and adjust
> > startlsn to point on the valid record position (by jjust adding page header
> > size?).
> 
> Well, restart_lsn is always available on live slot: it is initially set
> in ReplicationSlotReserveWal during slot creation.

restart_lsn stays at the beginning of a transaction until the
transaction ends so just using restart_lsn allows repeated
decoding of a transaction, in short, rewinding occurs. The
function works only for inactive slot so the current code works
fine on this point. Addition to that restart_lsn also can be on a
page bounary.


We can see the problem easily.

1. Just create a logical replication slot with setting current LSN.

  select pg_create_logical_replication_slot('s1', 'pgoutput');

2. Advance LSN by two or three pages by doing anyting.

3. Advance the slot to a page bounadry.

  e.g. select pg_replication_slot_advance('s1', '0/9624000');

4. advance the slot further, then crash.

So directly set ctx->reader->EndRecPtr by startlsn fixes the
problem, but I found another problem here.

The function accepts any LSN even if it is not at the begiining
of a record. We will see errors or crashs or infinite waiting or
maybe any kind of trouble by such values. The moved LSN must
always be at the "end of a record" (that is, at the start of the
next recored). The attached patch also fixes this.

The documentation doesn't look requiring a fix.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index d9e10263bb..d3cb777f9f 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -318,6 +318,11 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
 
 /*
  * Helper function for advancing physical replication slot forward.
+ *
+ * This function accepts arbitrary LSN even if the LSN is not at the beginning
+ * of a record. This can lead to any kind of misbehavior but currently the
+ * value is used only to determine up to what wal segment to keep and
+ * successive implicit advancing fixes the state.
  */
 static XLogRecPtr
 pg_physical_replication_slot_advance(XLogRecPtr startlsn, XLogRecPtr moveto)
@@ -344,6 +349,7 @@ pg_logical_replication_slot_advance(XLogRecPtr startlsn, XLogRecPtr moveto)
 	LogicalDecodingContext *ctx;
 	ResourceOwner old_resowner = CurrentResourceOwner;
 	XLogRecPtr	retlsn = InvalidXLogRecPtr;
+	XLogRecPtr	upto;
 
 	PG_TRY();
 	{
@@ -354,6 +360,13 @@ pg_logical_replication_slot_advance(XLogRecPtr startlsn, XLogRecPtr moveto)
 	logical_read_local_xlog_page,
 	NULL, NULL, NULL);
 
+		/*
+		 * startlsn can be on page boundary but it is not accepted as explicit
+		 * parameter to XLogReadRecord. Set it in reader context.
+		 */
+		Assert(startlsn != InvalidXLogRecPtr);
+		upto = ctx->reader->EndRecPtr = startlsn;
+	
 		CurrentResourceOwner = ResourceOwnerCreate(CurrentResourceOwner,
    "logical decoding");
 
@@ -361,22 +374,18 @@ pg_logical_replication_slot_advance(XLogRecPtr startlsn, XLogRecPtr moveto)
 		InvalidateSystemCaches();
 
 		/* Decode until we run out of records */
-		while ((startlsn != InvalidXLogRecPtr && startlsn < moveto) ||
-			   (ctx->reader->EndRecPtr != InvalidXLogRecPtr && ctx->reader->EndRecPtr < moveto))
+		while (ctx->reader->EndRecPtr <= moveto)
 		{
 			XLogRecord *record;
 			char	   *errm = NULL;
+ 
+			/* ctx->reader->EndRecPtr cannot be go backward here */
+			upto = ctx->reader->EndRecPtr;
 
-			record = XLogReadRecord(ctx->reader, startlsn, );
+			record = XLogReadRecord(ctx->reader, InvalidXLogRecPtr, );
 			if (errm)
 elog(ERROR, "%s", errm);
 
-			/*
-			 * Now that we've set up the xlog reader state, subsequent calls
-			 * pass InvalidXLogRecPtr to say "continue from last record"
-			 */
-			startlsn = InvalidXLogRecPtr;
-
 			/*
 			 * The {begin_txn,change,commit_txn}_wrapper callbacks above will
 			 * store the description into our tuplestore.
@@ -384,18 +393,14 @@ pg_logical_replication_slot_advance(XLogRecPtr startlsn, XLogRecPtr moveto)
 			if (record != NULL)
 LogicalDecodingProcessRecord(ctx, ctx->reader);
 
-			/* check limits */
-			if (moveto <= ctx->reader->EndRecPtr)
-break;
-
 			CHECK_FOR_INTERRUPTS();
 		}
 
 		CurrentResourceOwner = old_resowner;
 
-		if (ctx->reader->EndRecPtr != InvalidXLogRecPtr)
+		if (startlsn != upto)
 		{
-			LogicalConfirmReceivedLocation(moveto);
+			LogicalConfirmReceivedLocation(upto);
 
 			/*
 			 * If only the confirmed_flush_lsn has changed the slot won't get


Re: Possible bug in logical replication.

2018-05-17 Thread Arseny Sher

Konstantin Knizhnik  writes:

> I think that using restart_lsn instead of confirmed_flush is not right 
> approach.
> If restart_lsn is not available and confirmed_flush is pointing to page
> boundary, then in any case we should somehow handle this case and adjust
> startlsn to point on the valid record position (by jjust adding page header
> size?).

Well, restart_lsn is always available on live slot: it is initially set
in ReplicationSlotReserveWal during slot creation.


--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Possible bug in logical replication.

2018-05-17 Thread Konstantin Knizhnik



On 17.05.2018 10:45, Konstantin Knizhnik wrote:
We got the following assertion failure at our buildfarm of master 
branch of Postgres in contrib/test_decoding regression tests:


2018-05-07 19:50:07.241 MSK [5af083bf.54ae:49] DETAIL:  Streaming transactions 
committing after 0/2A0, reading WAL from 0/29FFF1C.
2018-05-07 19:50:07.241 MSK [5af083bf.54ae:50] STATEMENT:  SELECT end_lsn FROM 
pg_replication_slot_advance('regression_slot1', '0/2A00174')
TRAP: FailedAssertion("!(((RecPtr) % 8192 >= (((uintptr_t) ((sizeof(XLogPageHeaderData))) + ((4) - 
1)) & ~((uintptr_t) ((4) - 1)", File: "xlogreader.c", Line: 241)


Stack trace is the following:

$ gdb -x ./gdbcmd --batch 
pgsql.build/tmp_install/home/buildfarm/build-farm/CORE-353-stable-func/inst/bin/postgres
 pgsql.build/contrib/test_decoding/tmp_check/data/core
[New LWP 21678]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: buildfarm regression [local] SELECT
 '.
Program terminated with signal SIGABRT, Aborted.
#0  0xf7722c89 in __kernel_vsyscall ()
#0  0xf7722c89 in __kernel_vsyscall ()
#1  0xf6b5ddd0 in __libc_signal_restore_set (set=0xffaf2240) at 
../sysdeps/unix/sysv/linux/nptl-signals.h:79
#2  __GI_raise (sig=6) at ../sysdeps/unix/sysv/linux/raise.c:48
#3  0xf6b5f297 in __GI_abort () at abort.c:89
#4  0x56b3931a in ExceptionalCondition (conditionName=0x56bd0c38 "!(((RecPtr) % 8192 >= (((uintptr_t) 
((sizeof(XLogPageHeaderData))) + ((4) - 1)) & ~((uintptr_t) ((4) - 1)", errorType=0x56b8bf61 
"FailedAssertion", fileName=0x56bd0df0 "xlogreader.c", lineNumber=241) at assert.c:54
#5  0x5678c573 in XLogReadRecord (state=0x57628c84, RecPtr=44040192, 
errormsg=0xffaf2560) at xlogreader.c:241
#6  0x569c3191 in pg_logical_replication_slot_advance (startlsn=, moveto=44040564) at slotfuncs.c:370
#7  0x569c3c8e in pg_replication_slot_advance (fcinfo=0xffaf2708) at 
slotfuncs.c:487
#8  0x568a69c0 in ExecMakeTableFunctionResult (setexpr=0x57626e30, 
econtext=0x57626d88, argContext=0x57620b48, expectedDesc=0x57627e44, 
randomAccess=false) at execSRF.c:231
#9  0x568b41d3 in FunctionNext (node=0x57626cfc) at nodeFunctionscan.c:94
#10 0x568a5ce2 in ExecScanFetch (recheckMtd=0x568b3ec0 , 
accessMtd=0x568b3ed0 , node=0x57626cfc) at execScan.c:95
#11 ExecScan (node=0x57626cfc, accessMtd=0x568b3ed0 , 
recheckMtd=0x568b3ec0 ) at execScan.c:162
#12 0x568b4243 in ExecFunctionScan (pstate=0x57626cfc) at nodeFunctionscan.c:270
#13 0x5689caba in ExecProcNode (node=0x57626cfc) at 
../../../src/include/executor/executor.h:238
#14 ExecutePlan (execute_once=, dest=0x0, direction=NoMovementScanDirection, 
numberTuples=, sendTuples=, operation=CMD_SELECT, 
use_parallel_mode=, planstate=0x57626cfc, estate=0x57626bf0) at execMain.c:1731
#15 standard_ExecutorRun (queryDesc=0x5760f248, direction=ForwardScanDirection, 
count=0, execute_once=true) at execMain.c:368
#16 0x56a132cd in PortalRunSelect (portal=portal@entry=0x575c4ea8, 
forward=forward@entry=true, count=0, count@entry=2147483647, dest=0x576232d4) 
at pquery.c:932
#17 0x56a14b00 in PortalRun (portal=0x575c4ea8, count=2147483647, isTopLevel=true, 
run_once=true, dest=0x576232d4, altdest=0x576232d4, completionTag=0xffaf2c40 
"") at pquery.c:773
#18 0x56a0fbb7 in exec_simple_query (query_string=query_string@entry=0x57579070 
"SELECT end_lsn FROM pg_replication_slot_advance('regression_slot1', '0/2A00174') 
") at postgres.c:1122
#19 0x56a11a8e in PostgresMain (argc=1, argv=0x575a0b8c, dbname=, 
username=0x575a09f0 "buildfarm") at postgres.c:4153
#20 0x566cd9cb in BackendRun (port=0x5759a358) at postmaster.c:4361
#21 BackendStartup (port=0x5759a358) at postmaster.c:4033
#22 ServerLoop () at postmaster.c:1706
#23 0x5698a608 in PostmasterMain (argc=, argv=) 
at postmaster.c:1379
#24 0x566cf642 in main (argc=, argv=) at 
main.c:228



As you can see, assertion failure happen because specified startlsn (0x2a0) 
is not considered to be valid.
This LSN is taken from slot's confirmed flush LSN in 
pg_replication_slot_advance:
startlsn = MyReplicationSlot->data.confirmed_flush;


Unfortunately I was not able to reproduce the problem even repeating this 
regression tests 1000 times, so it seems to be very difficult to reproduced 
non-deterministic bug.
I wonder if there is a warranty that confirmed_flush always points to the start 
of the record.
Inspecting of xlogreader.c code shows that confirmed_flush is for example 
assigned EndRecPtr in DecodingContextFindStartpoint.
And EndRecPtr is updated in  XLogReadRecord and if record doesn't cross page 
boundary, then the following formula is used:

state->EndRecPtr = RecPtr + MAXALIGN(total_len);

And if record ends at page boundary, then looks like EndRecPtr can point to 
page boundary.
It is confirmed by the following comment in XLogReadRecord function:

/*
 * RecPtr is pointing to end+1