Re: [HACKERS] Replication status in logical replication
On Thu, Jul 12, 2018 at 10:22 AM, Michael Paquier wrote: > On Tue, Jul 10, 2018 at 10:14:35AM +0900, Michael Paquier wrote: >> Thanks. If there are no objections, then I will try to wrap this stuff >> on Thursday my time. > > And done down to 9.4. Thank you! Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Replication status in logical replication
On Tue, Jul 10, 2018 at 10:14:35AM +0900, Michael Paquier wrote: > Thanks. If there are no objections, then I will try to wrap this stuff > on Thursday my time. And done down to 9.4. -- Michael
Re: [HACKERS] Replication status in logical replication
On Mon, Jul 09, 2018 at 05:25:55PM +0900, Masahiko Sawada wrote: > Thank you for updating the patch. The patch looks fine to me, and I > agree with all changes you made. Thanks. If there are no objections, then I will try to wrap this stuff on Thursday my time. -- Michael
Re: [HACKERS] Replication status in logical replication
On Fri, Jul 6, 2018 at 2:21 PM, Michael Paquier wrote: > On Thu, Jul 05, 2018 at 05:13:27PM +0900, Michael Paquier wrote: >> This concerns as well v10, so that's not actually an open item... >> Well, it was an open item last year. The last set of patches is from >> Simon here: >> https://www.postgresql.org/message-id/CANP8%2BjLwgsexwdPkBtkN5kdHN5TwV-d%3Di311Tq_FdOmzJ8QyRQ%40mail.gmail.com > > logical_repl_caught_up_v4alt2.patch is actually incorrect after I tested > the thing, and that logical_repl_caught_up_v4.patch gets the correct > call. > >> Simon, do you feel confident with your patch? If yes, could you finish >> wrapping it? I am getting myself familiar with the problem as this has >> been around for some time now so I am reviewing the thing as well and >> then I can board the ship.. > > Okay, I have spent some time today looking at this patch, and the error > is very easy to reproduce once you do that in the TAP tests: > 1) Stop and start once the publisher in one of the tests of > src/test/subscription. > 2) Enforce wait_for_catchup() to check for state = 'streaming'. > And then you would see the tests waiting until timeout is reached and > then die. > > I would be inclined to add those tests in the final patch, the > disadvantage being that 1) makes one of the test scripts a bit longer, > but it can reproduce the failures most of the time. Having 2) is > actually nice for physical replication as the tests in > src/test/recovery/ use wait_for_catchup() in various ways. > > Some other notes about the patch: > - I switched the error message in WalSndLoop as mentioned upthread for > nodes catching up, aka no more "primary" but "upstream server". > - Added a note about using only GetFlushRecPtr in XLogSendLogical as > logical decoding cannot be used on standby nodes. If one day logical > decoding gets supports on standby then this would need an update as > well. > > Does this look fine to all the folks involved in this thread? 
> It is > Friday afternoon here so my brain is getting fried, but I can finish > wrapping up this patch at the beginning of next week if there are no > objections. At quick glance this indeed would need a backpatch down to > 9.4 but I have not spent time testing those configurations yet. Thank you for updating the patch. The patch looks fine to me, and I agree with all changes you made. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Replication status in logical replication
On Thu, Jul 05, 2018 at 05:13:27PM +0900, Michael Paquier wrote: > This concerns as well v10, so that's not actually an open item... > Well, it was an open item last year. The last set of patches is from > Simon here: > https://www.postgresql.org/message-id/CANP8%2BjLwgsexwdPkBtkN5kdHN5TwV-d%3Di311Tq_FdOmzJ8QyRQ%40mail.gmail.com logical_repl_caught_up_v4alt2.patch is actually incorrect after I tested the thing, and that logical_repl_caught_up_v4.patch gets the correct call. > Simon, do you feel confident with your patch? If yes, could you finish > wrapping it? I am getting myself familiar with the problem as this has > been around for some time now so I am reviewing the thing as well and > then I can board the ship.. Okay, I have spent some time today looking at this patch, and the error is very easy to reproduce once you do that in the TAP tests: 1) Stop and start once the publisher in one of the tests of src/test/subscription. 2) Enforce wait_for_catchup() to check for state = 'streaming'. And then you would see the tests waiting until timeout is reached and then die. I would be inclined to add those tests in the final patch, the disadvantage being that 1) makes one of the test scripts a bit longer, but it can reproduce the failures most of the time. Having 2) is actually nice for physical replication as the tests in src/test/recovery/ use wait_for_catchup() in various ways. Some other notes about the patch: - I switched the error message in WalSndLoop as mentioned upthread for nodes catching up, aka no more "primary" but "upstream server". - Added a note about using only GetFlushRecPtr in XLogSendLogical as logical decoding cannot be used on standby nodes. If one day logical decoding gets supports on standby then this would need an update as well. Does this look fine to all the folks involved in this thread? It is Friday afternoon here so my brain is getting fried, but I can finish wrapping up this patch at the beginning of next week if there are no objections. 
At quick glance this indeed would need a backpatch down to 9.4 but I have not spent time testing those configurations yet. -- Michael

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index e47ddca6bc..3a0106bc93 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2169,7 +2169,7 @@ WalSndLoop(WalSndSendDataCallback send_data)
 			if (MyWalSnd->state == WALSNDSTATE_CATCHUP)
 			{
 				ereport(DEBUG1,
-						(errmsg("standby \"%s\" has now caught up with primary",
+						(errmsg("\"%s\" has now caught up with upstream server",
 								application_name)));
 				WalSndSetState(WALSNDSTATE_STREAMING);
 			}
@@ -2758,10 +2758,10 @@ XLogSendLogical(void)
 	char	   *errm;
 
 	/*
-	 * Don't know whether we've caught up yet. We'll set it to true in
-	 * WalSndWaitForWal, if we're actually waiting. We also set to true if
-	 * XLogReadRecord() had to stop reading but WalSndWaitForWal didn't wait -
-	 * i.e. when we're shutting down.
+	 * Don't know whether we've caught up yet. We'll set WalSndCaughtUp to
+	 * true in WalSndWaitForWal, if we're actually waiting. We also set to
+	 * true if XLogReadRecord() had to stop reading but WalSndWaitForWal
+	 * didn't wait - i.e. when we're shutting down.
 	 */
 	WalSndCaughtUp = false;
 
@@ -2774,6 +2774,9 @@ XLogSendLogical(void)
 
 	if (record != NULL)
 	{
+		/* XXX: Note that logical decoding cannot be used while in recovery */
+		XLogRecPtr	flushPtr = GetFlushRecPtr();
+
 		/*
 		 * Note the lack of any call to LagTrackerWrite() which is handled by
 		 * WalSndUpdateProgress which is called by output plugin through
@@ -2782,6 +2785,13 @@ XLogSendLogical(void)
 		LogicalDecodingProcessRecord(logical_decoding_ctx,
 									 logical_decoding_ctx->reader);
 		sentPtr = logical_decoding_ctx->reader->EndRecPtr;
+
+		/*
+		 * If we have sent a record that is at or beyond the flushed point, we
+		 * have caught up.
+		 */
+		if (sentPtr >= flushPtr)
+			WalSndCaughtUp = true;
 	}
 	else
 	{
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index a08af65695..79fb457075 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -1535,7 +1535,8 @@
 also works for logical subscriptions) until its replication location in
 pg_stat_replication equals or passes the upstream's WAL insert point at
 the time this function is called. By default the replay_lsn is waited
 for, but 'mode' may be specified to wait for any of
-sent|write|flush|replay.
+sent|write|flush|replay. The connection catching up must be in a streaming
+state.
 If there is no active replication connection from this peer, waits until
 poll_query_until timeout.
@@ -1580,7 +1581,7 @@ sub wait_for_catchup
 	  . $lsn_expr . " on " . $self->name . "\n";
 	my $query =
-	  qq[SELECT $lsn_expr <= ${mode}_lsn FROM pg_catalog.pg_stat_replication WHERE application_name = '$standby_name';];
+	  qq[SELECT $lsn_expr <= ${mode}_lsn AND state =
Re: [HACKERS] Replication status in logical replication
On Mon, May 21, 2018 at 10:15:02AM +0900, Masahiko Sawada wrote: > I've added this to Open Items so as not to forget. This concerns as well v10, so that's not actually an open item... Well, it was an open item last year. The last set of patches is from Simon here: https://www.postgresql.org/message-id/CANP8%2BjLwgsexwdPkBtkN5kdHN5TwV-d%3Di311Tq_FdOmzJ8QyRQ%40mail.gmail.com Simon, do you feel confident with your patch? If yes, could you finish wrapping it? I am getting myself familiar with the problem as this has been around for some time now so I am reviewing the thing as well and then I can board the ship.. -- Michael
Re: [HACKERS] Replication status in logical replication
On Tue, Apr 10, 2018 at 11:11 PM, David Steele wrote: > On 4/10/18 6:14 AM, Masahiko Sawada wrote: >> On Fri, Mar 30, 2018 at 5:37 AM, Fujii Masao wrote: >>> On Wed, Jan 17, 2018 at 2:16 AM, Simon Riggs wrote: On 16 January 2018 at 06:21, Michael Paquier wrote: > On Tue, Jan 16, 2018 at 10:40:43AM +0900, Masahiko Sawada wrote: >> On Sun, Jan 14, 2018 at 12:43 AM, Simon Riggs >> wrote: >>> On 9 January 2018 at 04:36, Masahiko Sawada >>> wrote: >>> We're not talking about standbys, so the message is incorrect. >> >> Ah, I understood. How about "\"%s\" has now caught up with upstream >> server"? > > +1. upstream is what I would have suggested, so +1 here also. Will commit. >>> >>> ping? >>> >> >> Should I add this item to "Older Bugs" of the open item since we >> regarded it as a bug? > I'm going to reclassify this as a bug since everyone seems to agree it > is one. I've added this to Open Items so as not to forget. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Replication status in logical replication
On 4/10/18 6:14 AM, Masahiko Sawada wrote: > On Fri, Mar 30, 2018 at 5:37 AM, Fujii Masao wrote: >> On Wed, Jan 17, 2018 at 2:16 AM, Simon Riggs wrote: >>> On 16 January 2018 at 06:21, Michael Paquier >>> wrote: On Tue, Jan 16, 2018 at 10:40:43AM +0900, Masahiko Sawada wrote: > On Sun, Jan 14, 2018 at 12:43 AM, Simon Riggs > wrote: >> On 9 January 2018 at 04:36, Masahiko Sawada >> wrote: >> We're not talking about standbys, so the message is incorrect. > > Ah, I understood. How about "\"%s\" has now caught up with upstream > server"? +1. >>> >>> upstream is what I would have suggested, so +1 here also. >>> >>> Will commit. >> >> ping? >> > > Should I add this item to "Older Bugs" of the open item since we > regarded it as a bug? I'm going to reclassify this as a bug since everyone seems to agree it is one. Simon, are you still planning to commit this? Thanks, -- -David da...@pgmasters.net
Re: [HACKERS] Replication status in logical replication
On Fri, Mar 30, 2018 at 5:37 AM, Fujii Masao wrote: > On Wed, Jan 17, 2018 at 2:16 AM, Simon Riggs wrote: >> On 16 January 2018 at 06:21, Michael Paquier >> wrote: >>> On Tue, Jan 16, 2018 at 10:40:43AM +0900, Masahiko Sawada wrote: On Sun, Jan 14, 2018 at 12:43 AM, Simon Riggs wrote: > On 9 January 2018 at 04:36, Masahiko Sawada wrote: > We're not talking about standbys, so the message is incorrect. Ah, I understood. How about "\"%s\" has now caught up with upstream server"? >>> >>> +1. >> >> upstream is what I would have suggested, so +1 here also. >> >> Will commit. > > ping? > Should I add this item to "Older Bugs" of the open item since we regarded it as a bug? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Replication status in logical replication
On Wed, Jan 17, 2018 at 2:16 AM, Simon Riggs wrote: > On 16 January 2018 at 06:21, Michael Paquier > wrote: >> On Tue, Jan 16, 2018 at 10:40:43AM +0900, Masahiko Sawada wrote: >>> On Sun, Jan 14, 2018 at 12:43 AM, Simon Riggs wrote: On 9 January 2018 at 04:36, Masahiko Sawada wrote: We're not talking about standbys, so the message is incorrect. >>> >>> Ah, I understood. How about "\"%s\" has now caught up with upstream >>> server"? >> >> +1. > > upstream is what I would have suggested, so +1 here also. > > Will commit. ping? Regards, -- Fujii Masao
Re: [HACKERS] Replication status in logical replication
On Wed, Jan 17, 2018 at 2:16 AM, Simon Riggs wrote: > On 16 January 2018 at 06:21, Michael Paquier > wrote: >> On Tue, Jan 16, 2018 at 10:40:43AM +0900, Masahiko Sawada wrote: >>> On Sun, Jan 14, 2018 at 12:43 AM, Simon Riggs wrote: On 9 January 2018 at 04:36, Masahiko Sawada wrote: We're not talking about standbys, so the message is incorrect. >>> >>> Ah, I understood. How about "\"%s\" has now caught up with upstream >>> server"? >> >> +1. > > upstream is what I would have suggested, so +1 here also. > > Will commit. > Thank you! Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Replication status in logical replication
On 16 January 2018 at 06:21, Michael Paquier wrote: > On Tue, Jan 16, 2018 at 10:40:43AM +0900, Masahiko Sawada wrote: >> On Sun, Jan 14, 2018 at 12:43 AM, Simon Riggs wrote: >>> On 9 January 2018 at 04:36, Masahiko Sawada wrote: >>> We're not talking about standbys, so the message is incorrect. >> >> Ah, I understood. How about "\"%s\" has now caught up with upstream >> server"? > > +1. upstream is what I would have suggested, so +1 here also. Will commit. -- Simon Riggs  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Replication status in logical replication
On Tue, Jan 16, 2018 at 10:40:43AM +0900, Masahiko Sawada wrote: > On Sun, Jan 14, 2018 at 12:43 AM, Simon Riggs wrote: >> On 9 January 2018 at 04:36, Masahiko Sawada wrote: >> We're not talking about standbys, so the message is incorrect. > > Ah, I understood. How about "\"%s\" has now caught up with upstream > server"? +1. -- Michael
Re: [HACKERS] Replication status in logical replication
On 9 January 2018 at 04:36, Masahiko Sawada wrote: >> This patch appears to cause this DEBUG1 message >> >> "standby \"%s\" has now caught up with primary" >> >> which probably isn't the right message, but might be OK to backpatch. >> >> Thoughts on better wording? >> > > I think that this DEBUG1 message appears when sent any WAL after > caught up even without this patch. This patch makes this message > appear at a properly timing. Or am I missing something? We're not talking about standbys, so the message is incorrect. -- Simon Riggs  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Replication status in logical replication
On Sun, Jan 7, 2018 at 7:50 PM, Simon Riggs wrote: > On 26 December 2017 at 00:26, Masahiko Sawada wrote: >> On Tue, Dec 26, 2017 at 1:10 AM, Petr Jelinek >> wrote: >>> On 21/11/17 22:06, Masahiko Sawada wrote: After investigation, I found out that my previous patch was wrong direction. I should have changed XLogSendLogical() so that we can check the read LSN and set WalSndCaughtUp = true even after read a record without wait. Attached updated patch passed 'make check-world'. Please review it. >>> >>> Hi, >>> >>> This version looks good to me and seems to be in line with what we do in >>> physical replication. >>> >>> Marking as ready for committer. >>> >> >> Thank you for reviewing this patch! > > The patch calls this AFTER processing the record > if (sentPtr >= GetFlushRecPtr()) > but it seems better to call GetFlushRecPtr() before we process the > record, otherwise the flush pointer might have moved forwards while we > process the record and it wouldn't catch up. (Physical replication > works like that). Agreed. > New patch version attached for discussion before commit. (v4) v4 patch looks good to me. > > I'd rather not call it at all at that point though, so if we made > RecentFlushPtr static at the module level rather than within > WalSndWaitForWal we could use it here also. That's a bit more invasive > for backpatching, so not implemented that here. > > Overall, I find setting WalSndCaughtUp = false at the top of > XLogSendLogical() to be incredibly ugly and I would like to remove it. > It can't be correct to have a static status variable that oscillates > between false and true with every record. This seems to be done > because of the lack of a logical initialization call. Petr? Peter? > Version with this removed (v4alt2) > > I've removed the edit that fusses over English grammar: both ways are correct. > >> I think this patch can be >> back-patched to 9.4 as Simon mentioned. 
> > This patch appears to cause this DEBUG1 message > > "standby \"%s\" has now caught up with primary" > > which probably isn't the right message, but might be OK to backpatch. > > Thoughts on better wording? > I think that this DEBUG1 message appears when sent any WAL after caught up even without this patch. This patch makes this message appear at a properly timing. Or am I missing something? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Replication status in logical replication
On 26 December 2017 at 00:26, Masahiko Sawada wrote: > On Tue, Dec 26, 2017 at 1:10 AM, Petr Jelinek > wrote: >> On 21/11/17 22:06, Masahiko Sawada wrote: >>> >>> After investigation, I found out that my previous patch was wrong >>> direction. I should have changed XLogSendLogical() so that we can >>> check the read LSN and set WalSndCaughtUp = true even after read a >>> record without wait. Attached updated patch passed 'make check-world'. >>> Please review it. >>> >> >> Hi, >> >> This version looks good to me and seems to be in line with what we do in >> physical replication. >> >> Marking as ready for committer. >> > > Thank you for reviewing this patch! The patch calls this AFTER processing the record if (sentPtr >= GetFlushRecPtr()) but it seems better to call GetFlushRecPtr() before we process the record, otherwise the flush pointer might have moved forwards while we process the record and it wouldn't catch up. (Physical replication works like that). New patch version attached for discussion before commit. (v4) I'd rather not call it at all at that point though, so if we made RecentFlushPtr static at the module level rather than within WalSndWaitForWal we could use it here also. That's a bit more invasive for backpatching, so not implemented that here. Overall, I find setting WalSndCaughtUp = false at the top of XLogSendLogical() to be incredibly ugly and I would like to remove it. It can't be correct to have a static status variable that oscillates between false and true with every record. This seems to be done because of the lack of a logical initialization call. Petr? Peter? Version with this removed (v4alt2) I've removed the edit that fusses over English grammar: both ways are correct. > I think this patch can be > back-patched to 9.4 as Simon mentioned. This patch appears to cause this DEBUG1 message "standby \"%s\" has now caught up with primary" which probably isn't the right message, but might be OK to backpatch. Thoughts on better wording? 
-- Simon Riggs  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services logical_repl_caught_up_v4.patch Description: Binary data logical_repl_caught_up_v4alt2.patch Description: Binary data
Re: [HACKERS] Replication status in logical replication
Moin, On Tue, December 26, 2017 5:26 am, Masahiko Sawada wrote: > On Tue, Dec 26, 2017 at 6:19 PM, Tels wrote: >> Moin, >> >> On Mon, December 25, 2017 7:26 pm, Masahiko Sawada wrote: >>> On Tue, Dec 26, 2017 at 1:10 AM, Petr Jelinek >>> wrote: On 21/11/17 22:06, Masahiko Sawada wrote: > > After investigation, I found out that my previous patch was wrong > direction. I should have changed XLogSendLogical() so that we can > check the read LSN and set WalSndCaughtUp = true even after read a > record without wait. Attached updated patch passed 'make > check-world'. > Please review it. > Hi, This version looks good to me and seems to be in line with what we do in physical replication. Marking as ready for committer. >> >> (Sorry Masahiko, you'll get this twice, as fumbled the reply button.) >> >> I have not verifed that comment and/or code are correct, just a grammar >> fix: >> >> +/* >> + * If we've sent a record is at or beyond the flushed >> point, then >> + * we're caught up. >> >> That should read more like this: >> >> "If we've sent a record that is at or beyond the flushed point, we have >> caught up." >> > > Thank you for reviewing the patch! > Actually, that comment is inspired by the comment just below comment. > ISTM it's better to fix both if grammar of them is not appropriate. Oh yes. Your attached version reads fine to me. All the best, Tels
Re: [HACKERS] Replication status in logical replication
On Tue, Dec 26, 2017 at 6:19 PM, Tels wrote: > Moin, > > On Mon, December 25, 2017 7:26 pm, Masahiko Sawada wrote: >> On Tue, Dec 26, 2017 at 1:10 AM, Petr Jelinek >> wrote: >>> On 21/11/17 22:06, Masahiko Sawada wrote: After investigation, I found out that my previous patch was wrong direction. I should have changed XLogSendLogical() so that we can check the read LSN and set WalSndCaughtUp = true even after read a record without wait. Attached updated patch passed 'make check-world'. Please review it. >>> >>> Hi, >>> >>> This version looks good to me and seems to be in line with what we do in >>> physical replication. >>> >>> Marking as ready for committer. > > (Sorry Masahiko, you'll get this twice, as fumbled the reply button.) > > I have not verifed that comment and/or code are correct, just a grammar fix: > > +/* > + * If we've sent a record is at or beyond the flushed > point, then > + * we're caught up. > > That should read more like this: > > "If we've sent a record that is at or beyond the flushed point, we have > caught up." > Thank you for reviewing the patch! Actually, that comment is inspired by the comment just below comment. ISTM it's better to fix both if grammar of them is not appropriate.

+
+		/*
+		 * If we've sent a record that is at or beyond the flushed point, we
+		 * have caught up.
+		 */
+		if (sentPtr >= GetFlushRecPtr())
+			WalSndCaughtUp = true;
 	}
 	else
 	{
 		/*
 		 * If the record we just wanted read is at or beyond the flushed
 		 * point, then we're caught up.
 		 */
 		if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr())
 		{

Attached an updated patch. Please review it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center logical_repl_caught_up_v3.patch Description: Binary data
Re: [HACKERS] Replication status in logical replication
Moin, On Mon, December 25, 2017 7:26 pm, Masahiko Sawada wrote: > On Tue, Dec 26, 2017 at 1:10 AM, Petr Jelinek > wrote: >> On 21/11/17 22:06, Masahiko Sawada wrote: >>> >>> After investigation, I found out that my previous patch was wrong >>> direction. I should have changed XLogSendLogical() so that we can >>> check the read LSN and set WalSndCaughtUp = true even after read a >>> record without wait. Attached updated patch passed 'make check-world'. >>> Please review it. >>> >> >> Hi, >> >> This version looks good to me and seems to be in line with what we do in >> physical replication. >> >> Marking as ready for committer. (Sorry Masahiko, you'll get this twice, as fumbled the reply button.) I have not verifed that comment and/or code are correct, just a grammar fix: +/* + * If we've sent a record is at or beyond the flushed point, then + * we're caught up. That should read more like this: "If we've sent a record that is at or beyond the flushed point, we have caught up." All the best, Tels
Re: [HACKERS] Replication status in logical replication
On Tue, Dec 26, 2017 at 1:10 AM, Petr Jelinek wrote: > On 21/11/17 22:06, Masahiko Sawada wrote: >> >> After investigation, I found out that my previous patch was wrong >> direction. I should have changed XLogSendLogical() so that we can >> check the read LSN and set WalSndCaughtUp = true even after read a >> record without wait. Attached updated patch passed 'make check-world'. >> Please review it. >> > > Hi, > > This version looks good to me and seems to be in line with what we do in > physical replication. > > Marking as ready for committer. > Thank you for reviewing this patch! I think this patch can be back-patched to 9.4 as Simon mentioned. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Replication status in logical replication
On 21/11/17 22:06, Masahiko Sawada wrote: > > After investigation, I found out that my previous patch was wrong > direction. I should have changed XLogSendLogical() so that we can > check the read LSN and set WalSndCaughtUp = true even after read a > record without wait. Attached updated patch passed 'make check-world'. > Please review it. > Hi, This version looks good to me and seems to be in line with what we do in physical replication. Marking as ready for committer. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Replication status in logical replication
On Wed, Nov 22, 2017 at 6:06 AM, Masahiko Sawada wrote: > After investigation, I found out that my previous patch was wrong > direction. I should have changed XLogSendLogical() so that we can > check the read LSN and set WalSndCaughtUp = true even after read a > record without wait. Attached updated patch passed 'make check-world'. > Please review it. Moved to next CF. -- Michael
Re: [HACKERS] Replication status in logical replication
On Tue, Nov 14, 2017 at 6:46 AM, Thomas Munro wrote: > On Tue, Sep 26, 2017 at 3:45 PM, Masahiko Sawada > wrote: >> On Tue, Sep 26, 2017 at 10:36 AM, Vaishnavi Prabakaran >> wrote: >>> On Wed, Sep 13, 2017 at 9:59 AM, Daniel Gustafsson wrote: I'm not entirely sure why this was flagged as "Waiting for Author" by the automatic run, the patch applies for me and builds so resetting back to "Needs review". >>> >>> This patch applies and build cleanly and I did a testing with one publisher >>> and one subscriber, and confirm that the replication state after restarting >>> the server now is "streaming" and not "Catchup". >>> >>> And, I don't find any issues with code and patch to me is ready for >>> committer, marked the same in cf entry. > > Hi Sawada-san, > > My patch-testing robot doesn't like this patch[1]. I just tried it on > my laptop to double-check and get some more details, and saw the same > failures: > > (1) "make check" under src/test/recovery fails like this: > > t/006_logical_decoding.pl 2/16 # Looks like your test > exited with 29 just after 4. > t/006_logical_decoding.pl Dubious, test returned 29 > (wstat 7424, 0x1d00) > Failed 12/16 subtests > > regress_log_006_logical_decoding says: > > ok 4 - got same expected output from pg_recvlogical decoding session > pg_recvlogical timed out at > /opt/local/lib/perl5/vendor_perl/5.24/IPC/Run.pm line 2918. > waiting for endpos 0/1609B60 with stdout '', stderr '' at > /Users/munro/projects/postgres/src/test/recovery/../../../src/test/perl/PostgresNode.pm > line 1700. > ### Stopping node "master" using mode immediate > # Running: pg_ctl -D > /Users/munro/projects/postgres/src/test/recovery/tmp_check/t_006_logical_decoding_master_data/pgdata > -m immediate stop > waiting for server to shut down done > server stopped > # No postmaster PID for node "master" > # Looks like your test exited with 29 just after 4. > > (2) "make check" under src/test/subscription says: > > t/001_rep_changes.pl .. 
ok > t/002_types.pl # > # Looks like your test exited with 60 before it could output anything. > t/002_types.pl Dubious, test returned 60 (wstat 15360, 0x3c00) > Failed 3/3 subtests > t/003_constraints.pl .. > > Each of those tooks several minutes, and I stopped it there. It may > be going to say some more things but is taking a very long time > (presumably timing out, but the 001 took ages and then succeeded... > hmm). In fact I had to run this on my laptop to see that because on > Travis CI the whole test job just gets killed after 10 minutes of > non-output and the above output was never logged because of the way > concurrent test jobs' output is buffered. > > I didn't try to figure out what is going wrong. > Thank you for the notification! After investigation, I found out that my previous patch was wrong direction. I should have changed XLogSendLogical() so that we can check the read LSN and set WalSndCaughtUp = true even after read a record without wait. Attached updated patch passed 'make check-world'. Please review it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center logical_repl_caught_up_v2.patch Description: Binary data