Re: [HACKERS] Replication status in logical replication

2018-07-11 Thread Masahiko Sawada
On Thu, Jul 12, 2018 at 10:22 AM, Michael Paquier  wrote:
> On Tue, Jul 10, 2018 at 10:14:35AM +0900, Michael Paquier wrote:
>> Thanks.  If there are no objections, then I will try to wrap this stuff
>> on Thursday my time.
>
> And done down to 9.4.

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Replication status in logical replication

2018-07-11 Thread Michael Paquier
On Tue, Jul 10, 2018 at 10:14:35AM +0900, Michael Paquier wrote:
> Thanks.  If there are no objections, then I will try to wrap this stuff
> on Thursday my time.

And done down to 9.4.
--
Michael




Re: [HACKERS] Replication status in logical replication

2018-07-09 Thread Michael Paquier
On Mon, Jul 09, 2018 at 05:25:55PM +0900, Masahiko Sawada wrote:
> Thank you for updating the patch. The patch looks fine to me, and I
> agree with all the changes you made.

Thanks.  If there are no objections, then I will try to wrap this stuff
on Thursday my time.
--
Michael




Re: [HACKERS] Replication status in logical replication

2018-07-09 Thread Masahiko Sawada
On Fri, Jul 6, 2018 at 2:21 PM, Michael Paquier  wrote:
> On Thu, Jul 05, 2018 at 05:13:27PM +0900, Michael Paquier wrote:
>> This concerns v10 as well, so that's not actually an open item...
>> Well, it was an open item last year.  The last set of patches is from
>> Simon here:
>> https://www.postgresql.org/message-id/CANP8%2BjLwgsexwdPkBtkN5kdHN5TwV-d%3Di311Tq_FdOmzJ8QyRQ%40mail.gmail.com
>
> After testing, logical_repl_caught_up_v4alt2.patch turns out to be
> incorrect, while logical_repl_caught_up_v4.patch makes the correct call.
>
>> Simon, do you feel confident about your patch?  If yes, could you finish
>> wrapping it?  I am getting myself familiar with the problem, as it has
>> been around for some time now, so I am reviewing it as well and can then
>> board the ship.
>
> Okay, I have spent some time today looking at this patch, and the error
> is very easy to reproduce with the following changes in the TAP tests:
> 1) Stop and restart the publisher once in one of the tests of
> src/test/subscription.
> 2) Make wait_for_catchup() check for state = 'streaming'.
> The tests then wait until the timeout is reached and die.
>
> I would be inclined to add those tests to the final patch; the
> disadvantage is that 1) makes one of the test scripts a bit longer, but
> it reproduces the failures most of the time.  Having 2) is also nice for
> physical replication, as the tests in src/test/recovery/ use
> wait_for_catchup() in various ways.
>
> Some other notes about the patch:
> - I switched the error message in WalSndLoop for nodes catching up, as
> mentioned upthread: it now refers to the "upstream server" instead of the
> "primary".
> - Added a note about using only GetFlushRecPtr in XLogSendLogical, as
> logical decoding cannot be used on standby nodes.  If logical decoding
> ever gets supported on standbys, this would need an update as well.
>
> Does this look fine to all the folks involved in this thread?  It is
> Friday afternoon here so my brain is getting fried, but I can finish
> wrapping up this patch at the beginning of next week if there are no
> objections.  At a quick glance this would indeed need a backpatch down to
> 9.4, but I have not spent time testing those configurations yet.

Thank you for updating the patch. The patch looks fine to me, and I
agree with all the changes you made.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Replication status in logical replication

2018-07-05 Thread Michael Paquier
On Thu, Jul 05, 2018 at 05:13:27PM +0900, Michael Paquier wrote:
> This concerns v10 as well, so that's not actually an open item...
> Well, it was an open item last year.  The last set of patches is from
> Simon here:
> https://www.postgresql.org/message-id/CANP8%2BjLwgsexwdPkBtkN5kdHN5TwV-d%3Di311Tq_FdOmzJ8QyRQ%40mail.gmail.com

After testing, logical_repl_caught_up_v4alt2.patch turns out to be
incorrect, while logical_repl_caught_up_v4.patch makes the correct call.

> Simon, do you feel confident about your patch?  If yes, could you finish
> wrapping it?  I am getting myself familiar with the problem, as it has
> been around for some time now, so I am reviewing it as well and can then
> board the ship.

Okay, I have spent some time today looking at this patch, and the error
is very easy to reproduce with the following changes in the TAP tests:
1) Stop and restart the publisher once in one of the tests of
src/test/subscription.
2) Make wait_for_catchup() check for state = 'streaming'.
The tests then wait until the timeout is reached and die.

I would be inclined to add those tests to the final patch; the
disadvantage is that 1) makes one of the test scripts a bit longer, but
it reproduces the failures most of the time.  Having 2) is also nice for
physical replication, as the tests in src/test/recovery/ use
wait_for_catchup() in various ways.

Some other notes about the patch:
- I switched the error message in WalSndLoop for nodes catching up, as
mentioned upthread: it now refers to the "upstream server" instead of the
"primary".
- Added a note about using only GetFlushRecPtr in XLogSendLogical, as
logical decoding cannot be used on standby nodes.  If logical decoding
ever gets supported on standbys, this would need an update as well.

Does this look fine to all the folks involved in this thread?  It is
Friday afternoon here so my brain is getting fried, but I can finish
wrapping up this patch at the beginning of next week if there are no
objections.  At a quick glance this would indeed need a backpatch down to
9.4, but I have not spent time testing those configurations yet.
--
Michael
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index e47ddca6bc..3a0106bc93 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2169,7 +2169,7 @@ WalSndLoop(WalSndSendDataCallback send_data)
 			if (MyWalSnd->state == WALSNDSTATE_CATCHUP)
 			{
 				ereport(DEBUG1,
-						(errmsg("standby \"%s\" has now caught up with primary",
+						(errmsg("\"%s\" has now caught up with upstream server",
 								application_name)));
 				WalSndSetState(WALSNDSTATE_STREAMING);
 			}
@@ -2758,10 +2758,10 @@ XLogSendLogical(void)
 	char	   *errm;
 
 	/*
-	 * Don't know whether we've caught up yet. We'll set it to true in
-	 * WalSndWaitForWal, if we're actually waiting. We also set to true if
-	 * XLogReadRecord() had to stop reading but WalSndWaitForWal didn't wait -
-	 * i.e. when we're shutting down.
+	 * Don't know whether we've caught up yet. We'll set WalSndCaughtUp to
+	 * true in WalSndWaitForWal, if we're actually waiting. We also set to
+	 * true if XLogReadRecord() had to stop reading but WalSndWaitForWal
+	 * didn't wait - i.e. when we're shutting down.
 	 */
 	WalSndCaughtUp = false;
 
@@ -2774,6 +2774,9 @@ XLogSendLogical(void)
 
 	if (record != NULL)
 	{
+		/* XXX: Note that logical decoding cannot be used while in recovery */
+		XLogRecPtr	flushPtr = GetFlushRecPtr();
+
 		/*
 		 * Note the lack of any call to LagTrackerWrite() which is handled by
 		 * WalSndUpdateProgress which is called by output plugin through
@@ -2782,6 +2785,13 @@ XLogSendLogical(void)
 		LogicalDecodingProcessRecord(logical_decoding_ctx, logical_decoding_ctx->reader);
 
 		sentPtr = logical_decoding_ctx->reader->EndRecPtr;
+
+		/*
+		 * If we have sent a record that is at or beyond the flushed point, we
+		 * have caught up.
+		 */
+		if (sentPtr >= flushPtr)
+			WalSndCaughtUp = true;
 	}
 	else
 	{
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index a08af65695..79fb457075 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -1535,7 +1535,8 @@ also works for logical subscriptions)
 until its replication location in pg_stat_replication equals or passes the
 upstream's WAL insert point at the time this function is called. By default
 the replay_lsn is waited for, but 'mode' may be specified to wait for any of
-sent|write|flush|replay.
+sent|write|flush|replay. The connection catching up must be in a streaming
+state.
 
 If there is no active replication connection from this peer, waits until
 poll_query_until timeout.
@@ -1580,7 +1581,7 @@ sub wait_for_catchup
 	  . $lsn_expr . " on "
 	  . $self->name . "\n";
 	my $query =
-	  qq[SELECT $lsn_expr <= ${mode}_lsn FROM pg_catalog.pg_stat_replication WHERE application_name = '$standby_name';];
+	  qq[SELECT $lsn_expr <= ${mode}_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = '$standby_name';];

Re: [HACKERS] Replication status in logical replication

2018-07-05 Thread Michael Paquier
On Mon, May 21, 2018 at 10:15:02AM +0900, Masahiko Sawada wrote:
> I've added this to Open Items so as not to forget.

This concerns v10 as well, so that's not actually an open item...
Well, it was an open item last year.  The last set of patches is from
Simon here:
https://www.postgresql.org/message-id/CANP8%2BjLwgsexwdPkBtkN5kdHN5TwV-d%3Di311Tq_FdOmzJ8QyRQ%40mail.gmail.com

Simon, do you feel confident about your patch?  If yes, could you finish
wrapping it?  I am getting myself familiar with the problem, as it has
been around for some time now, so I am reviewing it as well and can then
board the ship.
--
Michael




Re: [HACKERS] Replication status in logical replication

2018-05-20 Thread Masahiko Sawada
On Tue, Apr 10, 2018 at 11:11 PM, David Steele  wrote:
> On 4/10/18 6:14 AM, Masahiko Sawada wrote:
>> On Fri, Mar 30, 2018 at 5:37 AM, Fujii Masao  wrote:
>>> On Wed, Jan 17, 2018 at 2:16 AM, Simon Riggs  wrote:
 On 16 January 2018 at 06:21, Michael Paquier  
 wrote:
> On Tue, Jan 16, 2018 at 10:40:43AM +0900, Masahiko Sawada wrote:
>> On Sun, Jan 14, 2018 at 12:43 AM, Simon Riggs  
>> wrote:
>>> On 9 January 2018 at 04:36, Masahiko Sawada  
>>> wrote:
>>> We're not talking about standbys, so the message is incorrect.
>>
>> Ah, I understood. How about "\"%s\"  has now caught up with upstream
>> server"?
>
> +1.

 upstream is what I would have suggested, so +1 here also.

 Will commit.
>>>
>>> ping?
>>>
>>
>> Should I add this item to the "Older Bugs" section of the open items
>> page, since we regarded it as a bug?
>
> I'm going to reclassify this as a bug since everyone seems to agree it
> is one.

I've added this to Open Items so as not to forget.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Replication status in logical replication

2018-04-10 Thread David Steele
On 4/10/18 6:14 AM, Masahiko Sawada wrote:
> On Fri, Mar 30, 2018 at 5:37 AM, Fujii Masao  wrote:
>> On Wed, Jan 17, 2018 at 2:16 AM, Simon Riggs  wrote:
>>> On 16 January 2018 at 06:21, Michael Paquier  
>>> wrote:
 On Tue, Jan 16, 2018 at 10:40:43AM +0900, Masahiko Sawada wrote:
> On Sun, Jan 14, 2018 at 12:43 AM, Simon Riggs  
> wrote:
>> On 9 January 2018 at 04:36, Masahiko Sawada  
>> wrote:
>> We're not talking about standbys, so the message is incorrect.
>
> Ah, I understood. How about "\"%s\"  has now caught up with upstream
> server"?

 +1.
>>>
>>> upstream is what I would have suggested, so +1 here also.
>>>
>>> Will commit.
>>
>> ping?
>>
> 
> Should I add this item to the "Older Bugs" section of the open items
> page, since we regarded it as a bug?

I'm going to reclassify this as a bug since everyone seems to agree it
is one.

Simon, are you still planning to commit this?

Thanks,
-- 
-David
da...@pgmasters.net



Re: [HACKERS] Replication status in logical replication

2018-04-10 Thread Masahiko Sawada
On Fri, Mar 30, 2018 at 5:37 AM, Fujii Masao  wrote:
> On Wed, Jan 17, 2018 at 2:16 AM, Simon Riggs  wrote:
>> On 16 January 2018 at 06:21, Michael Paquier  
>> wrote:
>>> On Tue, Jan 16, 2018 at 10:40:43AM +0900, Masahiko Sawada wrote:
 On Sun, Jan 14, 2018 at 12:43 AM, Simon Riggs  
 wrote:
> On 9 January 2018 at 04:36, Masahiko Sawada  wrote:
> We're not talking about standbys, so the message is incorrect.

 Ah, I understood. How about "\"%s\"  has now caught up with upstream
 server"?
>>>
>>> +1.
>>
>> upstream is what I would have suggested, so +1 here also.
>>
>> Will commit.
>
> ping?
>

Should I add this item to the "Older Bugs" section of the open items
page, since we regarded it as a bug?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Replication status in logical replication

2018-03-29 Thread Fujii Masao
On Wed, Jan 17, 2018 at 2:16 AM, Simon Riggs  wrote:
> On 16 January 2018 at 06:21, Michael Paquier  
> wrote:
>> On Tue, Jan 16, 2018 at 10:40:43AM +0900, Masahiko Sawada wrote:
>>> On Sun, Jan 14, 2018 at 12:43 AM, Simon Riggs  wrote:
 On 9 January 2018 at 04:36, Masahiko Sawada  wrote:
 We're not talking about standbys, so the message is incorrect.
>>>
>>> Ah, I understood. How about "\"%s\"  has now caught up with upstream
>>> server"?
>>
>> +1.
>
> upstream is what I would have suggested, so +1 here also.
>
> Will commit.

ping?

Regards,

-- 
Fujii Masao



Re: [HACKERS] Replication status in logical replication

2018-01-16 Thread Masahiko Sawada
On Wed, Jan 17, 2018 at 2:16 AM, Simon Riggs  wrote:
> On 16 January 2018 at 06:21, Michael Paquier  
> wrote:
>> On Tue, Jan 16, 2018 at 10:40:43AM +0900, Masahiko Sawada wrote:
>>> On Sun, Jan 14, 2018 at 12:43 AM, Simon Riggs  wrote:
 On 9 January 2018 at 04:36, Masahiko Sawada  wrote:
 We're not talking about standbys, so the message is incorrect.
>>>
>>> Ah, I understood. How about "\"%s\"  has now caught up with upstream
>>> server"?
>>
>> +1.
>
> upstream is what I would have suggested, so +1 here also.
>
> Will commit.
>

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Replication status in logical replication

2018-01-16 Thread Simon Riggs
On 16 January 2018 at 06:21, Michael Paquier  wrote:
> On Tue, Jan 16, 2018 at 10:40:43AM +0900, Masahiko Sawada wrote:
>> On Sun, Jan 14, 2018 at 12:43 AM, Simon Riggs  wrote:
>>> On 9 January 2018 at 04:36, Masahiko Sawada  wrote:
>>> We're not talking about standbys, so the message is incorrect.
>>
>> Ah, I understood. How about "\"%s\"  has now caught up with upstream
>> server"?
>
> +1.

upstream is what I would have suggested, so +1 here also.

Will commit.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Replication status in logical replication

2018-01-15 Thread Michael Paquier
On Tue, Jan 16, 2018 at 10:40:43AM +0900, Masahiko Sawada wrote:
> On Sun, Jan 14, 2018 at 12:43 AM, Simon Riggs  wrote:
>> On 9 January 2018 at 04:36, Masahiko Sawada  wrote:
>> We're not talking about standbys, so the message is incorrect.
> 
> Ah, I understood. How about "\"%s\"  has now caught up with upstream
> server"?

+1.
--
Michael




Re: [HACKERS] Replication status in logical replication

2018-01-13 Thread Simon Riggs
On 9 January 2018 at 04:36, Masahiko Sawada  wrote:

>> This patch appears to cause this DEBUG1 message
>>
>> "standby \"%s\" has now caught up with primary"
>>
>> which probably isn't the right message, but might be OK to backpatch.
>>
>> Thoughts on better wording?
>>
>
> I think that this DEBUG1 message appears whenever any WAL is sent after
> catching up, even without this patch. This patch just makes the message
> appear at the proper timing. Or am I missing something?

We're not talking about standbys, so the message is incorrect.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Replication status in logical replication

2018-01-08 Thread Masahiko Sawada
On Sun, Jan 7, 2018 at 7:50 PM, Simon Riggs  wrote:
> On 26 December 2017 at 00:26, Masahiko Sawada  wrote:
>> On Tue, Dec 26, 2017 at 1:10 AM, Petr Jelinek
>>  wrote:
>>> On 21/11/17 22:06, Masahiko Sawada wrote:

 After investigation, I found out that my previous patch was the wrong
 direction. I should have changed XLogSendLogical() so that we can
 check the read LSN and set WalSndCaughtUp = true even after reading a
 record without waiting. The attached updated patch passed 'make check-world'.
 Please review it.

>>>
>>> Hi,
>>>
>>> This version looks good to me and seems to be in line with what we do in
>>> physical replication.
>>>
>>> Marking as ready for committer.
>>>
>>
>> Thank you for reviewing this patch!
>
> The patch calls this AFTER processing the record
>if (sentPtr >= GetFlushRecPtr())
> but it seems better to call GetFlushRecPtr() before we process the
> record, otherwise the flush pointer might have moved forwards while we
> process the record and it wouldn't catch up. (Physical replication
> works like that).

Agreed.

> New patch version attached for discussion before commit. (v4)

v4 patch looks good to me.

>
> I'd rather not call it at all at that point though; if we made
> RecentFlushPtr static at the module level rather than within
> WalSndWaitForWal, we could use it here also. That's a bit more invasive
> for backpatching, so I have not implemented that here.
>
> Overall, I find setting WalSndCaughtUp = false at the top of
> XLogSendLogical() to be incredibly ugly and I would like to remove it.
> It can't be correct to have a static status variable that oscillates
> between false and true with every record. This seems to be done
> because of the lack of a logical initialization call. Petr? Peter?
> A version with this removed is attached (v4alt2).
>
> I've removed the edit that fusses over English grammar: both ways are correct.
>
>> I think this patch can be
>> back-patched to 9.4 as Simon mentioned.
>
> This patch appears to cause this DEBUG1 message
>
> "standby \"%s\" has now caught up with primary"
>
> which probably isn't the right message, but might be OK to backpatch.
>
> Thoughts on better wording?
>

I think that this DEBUG1 message appears whenever any WAL is sent after
catching up, even without this patch. This patch just makes the message
appear at the proper timing. Or am I missing something?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Replication status in logical replication

2018-01-07 Thread Simon Riggs
On 26 December 2017 at 00:26, Masahiko Sawada  wrote:
> On Tue, Dec 26, 2017 at 1:10 AM, Petr Jelinek
>  wrote:
>> On 21/11/17 22:06, Masahiko Sawada wrote:
>>>
>>> After investigation, I found out that my previous patch was the wrong
>>> direction. I should have changed XLogSendLogical() so that we can
>>> check the read LSN and set WalSndCaughtUp = true even after reading a
>>> record without waiting. The attached updated patch passed 'make check-world'.
>>> Please review it.
>>>
>>
>> Hi,
>>
>> This version looks good to me and seems to be in line with what we do in
>> physical replication.
>>
>> Marking as ready for committer.
>>
>
> Thank you for reviewing this patch!

The patch calls this AFTER processing the record
   if (sentPtr >= GetFlushRecPtr())
but it seems better to call GetFlushRecPtr() before we process the
record, otherwise the flush pointer might have moved forwards while we
process the record and it wouldn't catch up. (Physical replication
works like that).
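
For illustration, a condensed sketch of that ordering as it would sit in
XLogSendLogical() in walsender.c (this is the same shape as the
XLogSendLogical() hunk in the patch shown near the top of this digest):
capture the flush pointer before the record is processed, then compare the
position just sent against that snapshot.

    if (record != NULL)
    {
        /* Capture the flush pointer first; logical decoding only runs on a primary. */
        XLogRecPtr  flushPtr = GetFlushRecPtr();

        /* Decode and send the record. */
        LogicalDecodingProcessRecord(logical_decoding_ctx,
                                     logical_decoding_ctx->reader);
        sentPtr = logical_decoding_ctx->reader->EndRecPtr;

        /* Sent a record at or beyond the flushed point: we have caught up. */
        if (sentPtr >= flushPtr)
            WalSndCaughtUp = true;
    }

Taking the flush pointer into a local variable before decoding removes the
window in which a concurrent flush could keep the sender in the catchup
state for one extra record.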

New patch version attached for discussion before commit. (v4)

I'd rather not call it at all at that point though; if we made
RecentFlushPtr static at the module level rather than within
WalSndWaitForWal, we could use it here also. That's a bit more invasive
for backpatching, so I have not implemented that here.

Overall, I find setting WalSndCaughtUp = false at the top of
XLogSendLogical() to be incredibly ugly and I would like to remove it.
It can't be correct to have a static status variable that oscillates
between false and true with every record. This seems to be done
because of the lack of a logical initialization call. Petr? Peter?
A version with this removed is attached (v4alt2).

I've removed the edit that fusses over English grammar: both ways are correct.

> I think this patch can be
> back-patched to 9.4 as Simon mentioned.

This patch appears to cause this DEBUG1 message

"standby \"%s\" has now caught up with primary"

which probably isn't the right message, but might be OK to backpatch.

Thoughts on better wording?

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


logical_repl_caught_up_v4.patch
Description: Binary data


logical_repl_caught_up_v4alt2.patch
Description: Binary data


Re: [HACKERS] Replication status in logical replication

2017-12-26 Thread Tels
Moin,

On Tue, December 26, 2017 5:26 am, Masahiko Sawada wrote:
> On Tue, Dec 26, 2017 at 6:19 PM, Tels 
> wrote:
>> Moin,
>>
>> On Mon, December 25, 2017 7:26 pm, Masahiko Sawada wrote:
>>> On Tue, Dec 26, 2017 at 1:10 AM, Petr Jelinek
>>>  wrote:
 On 21/11/17 22:06, Masahiko Sawada wrote:
>
> After investigation, I found out that my previous patch was the wrong
> direction. I should have changed XLogSendLogical() so that we can
> check the read LSN and set WalSndCaughtUp = true even after reading a
> record without waiting. The attached updated patch passed 'make check-world'.
> Please review it.
>

 Hi,

 This version looks good to me and seems to be in line with what we do
 in
 physical replication.

 Marking as ready for committer.
>>
>> (Sorry Masahiko, you'll get this twice, as I fumbled the reply button.)
>>
>> I have not verified that comment and/or code are correct, just a grammar
>> fix:
>>
>> +		/*
>> +		 * If we've sent a record is at or beyond the flushed point, then
>> +		 * we're caught up.
>>
>> That should read more like this:
>>
>> "If we've sent a record that is at or beyond the flushed point, we have
>> caught up."
>>
>
> Thank you for reviewing the patch!
> Actually, that comment is inspired by the comment just below it.
> ISTM it's better to fix both if their grammar is not appropriate.

Oh yes. Your attached version reads fine to me.

All the best,

Tels



Re: [HACKERS] Replication status in logical replication

2017-12-26 Thread Masahiko Sawada
On Tue, Dec 26, 2017 at 6:19 PM, Tels  wrote:
> Moin,
>
> On Mon, December 25, 2017 7:26 pm, Masahiko Sawada wrote:
>> On Tue, Dec 26, 2017 at 1:10 AM, Petr Jelinek
>>  wrote:
>>> On 21/11/17 22:06, Masahiko Sawada wrote:

 After investigation, I found out that my previous patch was the wrong
 direction. I should have changed XLogSendLogical() so that we can
 check the read LSN and set WalSndCaughtUp = true even after reading a
 record without waiting. The attached updated patch passed 'make check-world'.
 Please review it.

>>>
>>> Hi,
>>>
>>> This version looks good to me and seems to be in line with what we do in
>>> physical replication.
>>>
>>> Marking as ready for committer.
>
> (Sorry Masahiko, you'll get this twice, as I fumbled the reply button.)
>
> I have not verified that comment and/or code are correct, just a grammar fix:
>
> +		/*
> +		 * If we've sent a record is at or beyond the flushed point, then
> +		 * we're caught up.
>
> That should read more like this:
>
> "If we've sent a record that is at or beyond the flushed point, we have
> caught up."
>

Thank you for reviewing the patch!
Actually, that comment is inspired by the comment just below it.
ISTM it's better to fix both if their grammar is not appropriate.

+
+		/*
+		 * If we've sent a record that is at or beyond the flushed point, we
+		 * have caught up.
+		 */
+		if (sentPtr >= GetFlushRecPtr())
+			WalSndCaughtUp = true;
 	}
 	else
 	{
 		/*
 		 * If the record we just wanted read is at or beyond the flushed
 		 * point, then we're caught up.
 		 */
 		if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr())
 		{

Attached an updated patch. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


logical_repl_caught_up_v3.patch
Description: Binary data


Re: [HACKERS] Replication status in logical replication

2017-12-26 Thread Tels
Moin,

On Mon, December 25, 2017 7:26 pm, Masahiko Sawada wrote:
> On Tue, Dec 26, 2017 at 1:10 AM, Petr Jelinek
>  wrote:
>> On 21/11/17 22:06, Masahiko Sawada wrote:
>>>
>>> After investigation, I found out that my previous patch was the wrong
>>> direction. I should have changed XLogSendLogical() so that we can
>>> check the read LSN and set WalSndCaughtUp = true even after reading a
>>> record without waiting. The attached updated patch passed 'make check-world'.
>>> Please review it.
>>>
>>
>> Hi,
>>
>> This version looks good to me and seems to be in line with what we do in
>> physical replication.
>>
>> Marking as ready for committer.

(Sorry Masahiko, you'll get this twice, as I fumbled the reply button.)

I have not verified that comment and/or code are correct, just a grammar fix:

+		/*
+		 * If we've sent a record is at or beyond the flushed point, then
+		 * we're caught up.

That should read more like this:

"If we've sent a record that is at or beyond the flushed point, we have
caught up."

All the best,

Tels




Re: [HACKERS] Replication status in logical replication

2017-12-25 Thread Masahiko Sawada
On Tue, Dec 26, 2017 at 1:10 AM, Petr Jelinek
 wrote:
> On 21/11/17 22:06, Masahiko Sawada wrote:
>>
>> After investigation, I found out that my previous patch was the wrong
>> direction. I should have changed XLogSendLogical() so that we can
>> check the read LSN and set WalSndCaughtUp = true even after reading a
>> record without waiting. The attached updated patch passed 'make check-world'.
>> Please review it.
>>
>
> Hi,
>
> This version looks good to me and seems to be in line with what we do in
> physical replication.
>
> Marking as ready for committer.
>

Thank you for reviewing this patch! I think this patch can be
back-patched to 9.4 as Simon mentioned.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Replication status in logical replication

2017-12-25 Thread Petr Jelinek
On 21/11/17 22:06, Masahiko Sawada wrote:
> 
> After investigation, I found out that my previous patch was the wrong
> direction. I should have changed XLogSendLogical() so that we can
> check the read LSN and set WalSndCaughtUp = true even after reading a
> record without waiting. The attached updated patch passed 'make check-world'.
> Please review it.
> 

Hi,

This version looks good to me and seems to be in line with what we do in
physical replication.

Marking as ready for committer.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Replication status in logical replication

2017-11-29 Thread Michael Paquier
On Wed, Nov 22, 2017 at 6:06 AM, Masahiko Sawada  wrote:
> After investigation, I found out that my previous patch was the wrong
> direction. I should have changed XLogSendLogical() so that we can
> check the read LSN and set WalSndCaughtUp = true even after reading a
> record without waiting. The attached updated patch passed 'make check-world'.
> Please review it.

Moved to next CF.
-- 
Michael



Re: [HACKERS] Replication status in logical replication

2017-11-21 Thread Masahiko Sawada
On Tue, Nov 14, 2017 at 6:46 AM, Thomas Munro
 wrote:
> On Tue, Sep 26, 2017 at 3:45 PM, Masahiko Sawada  
> wrote:
>> On Tue, Sep 26, 2017 at 10:36 AM, Vaishnavi Prabakaran
>>  wrote:
>>> On Wed, Sep 13, 2017 at 9:59 AM, Daniel Gustafsson  wrote:
 I'm not entirely sure why this was flagged as "Waiting for Author" by the
 automatic run; the patch applies for me and builds, so I'm resetting it
 back to "Needs review".

>>>
>>> This patch applies and builds cleanly, and I did some testing with one
>>> publisher and one subscriber, confirming that the replication state after
>>> restarting the server is now "streaming" and not "catchup".
>>>
>>> I don't find any issues with the code, and the patch looks ready for
>>> committer to me; I have marked it as such in the CF entry.
>
> Hi Sawada-san,
>
> My patch-testing robot doesn't like this patch[1].  I just tried it on
> my laptop to double-check and get some more details, and saw the same
> failures:
>
> (1) "make check" under src/test/recovery fails like this:
>
> t/006_logical_decoding.pl  2/16 # Looks like your test
> exited with 29 just after 4.
> t/006_logical_decoding.pl  Dubious, test returned 29
> (wstat 7424, 0x1d00)
> Failed 12/16 subtests
>
> regress_log_006_logical_decoding says:
>
> ok 4 - got same expected output from pg_recvlogical decoding session
> pg_recvlogical timed out at
> /opt/local/lib/perl5/vendor_perl/5.24/IPC/Run.pm line 2918.
>  waiting for endpos 0/1609B60 with stdout '', stderr '' at
> /Users/munro/projects/postgres/src/test/recovery/../../../src/test/perl/PostgresNode.pm
> line 1700.
> ### Stopping node "master" using mode immediate
> # Running: pg_ctl -D
> /Users/munro/projects/postgres/src/test/recovery/tmp_check/t_006_logical_decoding_master_data/pgdata
> -m immediate stop
> waiting for server to shut down done
> server stopped
> # No postmaster PID for node "master"
> # Looks like your test exited with 29 just after 4.
>
> (2) "make check" under src/test/subscription says:
>
> t/001_rep_changes.pl .. ok
> t/002_types.pl  #
> # Looks like your test exited with 60 before it could output anything.
> t/002_types.pl  Dubious, test returned 60 (wstat 15360, 0x3c00)
> Failed 3/3 subtests
> t/003_constraints.pl ..
>
> Each of those took several minutes, and I stopped it there.  It may
> be going to say some more things but is taking a very long time
> (presumably timing out, but the 001 took ages and then succeeded...
> hmm).  In fact I had to run this on my laptop to see that, because on
> Travis CI the whole test job just gets killed after 10 minutes of
> non-output, and the above output was never logged because of the way
> concurrent test jobs' output is buffered.
>
> I didn't try to figure out what is going wrong.
>

Thank you for the notification!

After investigation, I found out that my previous patch was the wrong
direction. I should have changed XLogSendLogical() so that we can
check the read LSN and set WalSndCaughtUp = true even after reading a
record without waiting. The attached updated patch passed 'make check-world'.
Please review it.
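
To make the idea concrete, here is a minimal sketch of the change in the
record-sending branch of XLogSendLogical() (the attached patch is the
actual change):

    /* Decode and send the record we just read. */
    LogicalDecodingProcessRecord(logical_decoding_ctx,
                                 logical_decoding_ctx->reader);
    sentPtr = logical_decoding_ctx->reader->EndRecPtr;

    /*
     * If we've sent a record that is at or beyond the flushed point, we
     * have caught up, even though we did not have to wait for new WAL.
     */
    if (sentPtr >= GetFlushRecPtr())
        WalSndCaughtUp = true;

With this, a logical walsender that keeps up with the flush pointer is
reported as "streaming" in pg_stat_replication instead of staying in
"catchup" after sending a record.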

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


logical_repl_caught_up_v2.patch
Description: Binary data