On Fri, Apr 17, 2020 at 05:06:29PM +0900, Kyotaro Horiguchi wrote: > At Fri, 17 Apr 2020 17:00:15 +0900 (JST), Kyotaro Horiguchi > <horikyota....@gmail.com> wrote in > > By the way, if latch is consumed in WalSndLoop, succeeding call to > > WalSndWaitForWal cannot be woke-up by the latch-set. Doesn't that > > cause missing wakeups? (in other words, overlooking of wakeup latch). > > - Since the only source other than timeout of walsender wakeup is latch, > - we should avoid wasteful consuming of latch. (It is the same issue > - with [1]). > > + Since walsender is wokeup by LSN advancement via latch, we should > + avoid wasteful consuming of latch. (It is the same issue with [1]). > > > > If wakeup signal is not remembered on walsender (like > > InterruptPending), WalSndPhysical cannot enter a sleep with > > confidence.
No; per latch.h, "What must be avoided is placing any checks for asynchronous events after WaitLatch and before ResetLatch, as that creates a race condition." In other words, the thing to avoid is calling ResetLatch() without next examining all pending work that a latch would signal. Each walsender.c WaitLatch call does follow the rules. On Sat, Apr 18, 2020 at 12:29:58AM +0900, Fujii Masao wrote: > On 2020/04/17 14:41, Noah Misch wrote: > >1. Make XLogSendPhysical() more like XLogSendLogical(), calling > > WalSndWaitForWal() when no WAL is available. A quick version of this > > passes tests, but I'll need to audit WalSndWaitForWal() for things that > > are > > wrong for physical replication. > > (1) makes even physical replication walsender sleep in two places and > which seems to make the code for physical replication complicated > more than necessary. I'd like to avoid (1) if possible. Good point. > >2. Make XLogSendLogical() more like XLogSendPhysical(), returning when > > insufficient WAL is available. This complicates the xlogreader.h API to > > pass back "wait for this XLogRecPtr", and we'd then persist enough state > > to > > resume decoding. This doesn't have any advantages to make up for those. > > > >3. Don't avoid waiting in WalSndLoop(); instead, fix the stall by copying the > > WalSndKeepalive() call from WalSndWaitForWal() to WalSndLoop(). This > > risks > > further drift between the two wait sites; on the other hand, one could > > refactor later to help avoid that. > > Since the additional call of WalSndKeepalive() is necessary only for > logical replication, it should be copied to, e.g., XLogSendLogical(), > instead of WalSndLoop()? For example, when XLogSendLogical() sets > WalSndCaughtUp to true, it should call WalSndKeepalive()? We'd send a keepalive even when pq_flush_if_writable() can't empty the output buffer. That could be acceptable, but it's not ideal. > The root problem seems that when WAL record that's no-opes for > logical rep is processed, keep alive message has not sent immediately, > in spite of that we want pg_stat_replication to be updated promptly. The degree of promptness should be predictable, at least. If we removed the WalSndKeepalive() from WalSndWaitForWal(), pg_stat_replication updates would not be prompt, but they would be predictable. I do, however, think prompt updates are worthwhile. > (3) seems to try to address this problem straightly and looks better to me. > > >4. Keep the WalSndLoop() wait, but condition it on !logical. This is the > > minimal fix, but it crudely punches through the abstraction between > > WalSndLoop() and its WalSndSendDataCallback. > > (4) also looks good because it's simple, if we can redesign those > functions in good shape. Let's do that. I'm attaching the replacement implementation and the revert of v1.
Author: Noah Misch <n...@leadboat.com> Commit: Noah Misch <n...@leadboat.com> Revert "When WalSndCaughtUp, sleep only in WalSndWaitForWal()." This reverts commit 421685812290406daea58b78dfab0346eb683bbb. It caused idle physical walsenders to busy-wait, as reported by Fujii Masao. Discussion: https://postgr.es/m/20200417054146.ga1061...@rfd.leadboat.com diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index fc475d1..122d884 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -1428,10 +1428,8 @@ WalSndWaitForWal(XLogRecPtr loc) /* * We only send regular messages to the client for full decoded * transactions, but a synchronous replication and walsender shutdown - * possibly are waiting for a later location. So, before sleeping, we - * send a ping containing the flush location. If the receiver is - * otherwise idle, this keepalive will trigger a reply. Processing the - * reply will update these MyWalSnd locations. + * possibly are waiting for a later location. So we send pings + * containing the flush location every now and then. */ if (MyWalSnd->flush < sentPtr && MyWalSnd->write < sentPtr && @@ -2316,16 +2314,20 @@ WalSndLoop(WalSndSendDataCallback send_data) WalSndKeepaliveIfNecessary(); /* - * Block if we have unsent data. Let WalSndWaitForWal() handle any - * other blocking; idle receivers need its additional actions. + * We don't block if not caught up, unless there is unsent data + * pending in which case we'd better block until the socket is + * write-ready. This test is only needed for the case where the + * send_data callback handled a subset of the available data but then + * pq_flush_if_writable flushed it all --- we should immediately try + * to send more. */ - if (pq_is_send_pending()) + if ((WalSndCaughtUp && !streamingDoneSending) || pq_is_send_pending()) { long sleeptime; int wakeEvents; wakeEvents = WL_LATCH_SET | WL_EXIT_ON_PM_DEATH | WL_TIMEOUT | - WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE; + WL_SOCKET_READABLE; /* * Use fresh timestamp, not last_processing, to reduce the chance @@ -2333,6 +2335,9 @@ WalSndLoop(WalSndSendDataCallback send_data) */ sleeptime = WalSndComputeSleeptime(GetCurrentTimestamp()); + if (pq_is_send_pending()) + wakeEvents |= WL_SOCKET_WRITEABLE; + /* Sleep until something happens or we time out */ (void) WaitLatchOrSocket(MyLatch, wakeEvents, MyProcPort->sock, sleeptime,
Author: Noah Misch <n...@leadboat.com> Commit: Noah Misch <n...@leadboat.com> In caught-up logical walsender, sleep only in WalSndWaitForWal(). Before sleeping, WalSndWaitForWal() sends a keepalive if MyWalSnd->write < sentPtr. When the latest physical LSN yields no logical replication messages (a common case), that keepalive elicits a reply. Processing the reply updates pg_stat_replication.replay_lsn. WalSndLoop() lacks that; when WalSndLoop() slept, replay_lsn advancement could stall until wal_receiver_status_interval elapsed. This sometimes stalled src/test/subscription/t/001_rep_changes.pl for up to 10s. Reviewed by FIXME. Discussion: https://postgr.es/m/FIXME diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index 9e56115..a880195 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -1428,8 +1428,10 @@ WalSndWaitForWal(XLogRecPtr loc) /* * We only send regular messages to the client for full decoded * transactions, but a synchronous replication and walsender shutdown - * possibly are waiting for a later location. So we send pings - * containing the flush location every now and then. + * possibly are waiting for a later location. So, before sleeping, we + * send a ping containing the flush location. If the receiver is + * otherwise idle, this keepalive will trigger a reply. Processing the + * reply will update these MyWalSnd locations. */ if (MyWalSnd->flush < sentPtr && MyWalSnd->write < sentPtr && @@ -2314,14 +2316,14 @@ WalSndLoop(WalSndSendDataCallback send_data) WalSndKeepaliveIfNecessary(); /* - * We don't block if not caught up, unless there is unsent data - * pending in which case we'd better block until the socket is - * write-ready. This test is only needed for the case where the - * send_data callback handled a subset of the available data but then - * pq_flush_if_writable flushed it all --- we should immediately try - * to send more. + * Block if we have unsent data. XXX For logical replication, let + * WalSndWaitForWal(), handle any other blocking; idle receivers need + * its additional actions. For physical replication, also block if + * caught up; its send_data does not block. */ - if ((WalSndCaughtUp && !streamingDoneSending) || pq_is_send_pending()) + if ((WalSndCaughtUp && send_data != XLogSendLogical && + !streamingDoneSending) || + pq_is_send_pending()) { long sleeptime; int wakeEvents;