Hi, On 2019-05-01 02:28:45 +0200, Tomas Vondra wrote: > The reason for that is pretty simple - the walsenders are doing logical > decoding, and during shutdown they're waiting for WalSndCaughtUp=true. > But per XLogSendLogical() this only happens if > > if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr()) > { > WalSndCaughtUp = true; > ... > } > > That is, we need to get beyong GetFlushRecPtr(). But that may never > happen, because there may be incomplete (and unflushed) page in WAL > buffers, not forced out by any transaction. So if there's a WAL record > overflowing to the unflushed page, the walsender will never catch up. > > Now, this situation is apparently expected, because WalSndWaitForWal() > does this: > > /* > * If we're shutting down, trigger pending WAL to be written out, > * otherwise we'd possibly end up waiting for WAL that never gets > * written, because walwriter has shut down already. > */ > if (got_STOPPING) > XLogBackgroundFlush(); > > but unfortunately that does not actually do anything, because right at > the very beginning XLogBackgroundFlush() does this: > > /* back off to last completed page boundary */ > WriteRqst.Write -= WriteRqst.Write % XLOG_BLCKSZ;
> That is, it intentionally ignores the incomplete page, which means the > walsender can't read the record and reach GetFlushRecPtr(). > > XLogBackgroundFlush() does this since (at least) 2007, so how come we > never had issues with this before? I assume that's because of the following logic: /* if we have already flushed that far, consider async commit records */ if (WriteRqst.Write <= LogwrtResult.Flush) { SpinLockAcquire(&XLogCtl->info_lck); WriteRqst.Write = XLogCtl->asyncXactLSN; SpinLockRelease(&XLogCtl->info_lck); flexible = false; /* ensure it all gets written */ } and various pieces of the code doing XLogSetAsyncXactLSN() to force flushing. I wonder if the issue is that we're better at avoiding unnecessary WAL to be written due to 6ef2eba3f57f17960b7cd4958e18aa79e357de2f > But I don't think we're safe without the failover slots patch, because > any output plugin can do something similar - say, LogLogicalMessage() or > something like that. I'm not aware of a plugin doing such things, but I > don't think it's illegal / prohibited either. (Of course, plugins that > generate WAL won't be useful for decoding on standby in the future.) FWIW, I'd consider such an output plugin outright broken. > So what I think we should do is to tweak XLogBackgroundFlush() so that > during shutdown it skips the back-off to page boundary, and flushes even > the last piece of WAL. There are only two callers anyway, so something > like XLogBackgroundFlush(bool) would be simple enough. I think it then just ought to be a normal XLogFlush(). I.e. something along the lines of: /* * If we're shutting down, trigger pending WAL to be written out, * otherwise we'd possibly end up waiting for WAL that never gets * written, because walwriter has shut down already. */ if (got_STOPPING && !RecoveryInProgress()) XLogFlush(GetXLogInsertRecPtr()); /* Update our idea of the currently flushed position. */ if (!RecoveryInProgress()) RecentFlushPtr = GetFlushRecPtr(); else RecentFlushPtr = GetXLogReplayRecPtr(NULL); Greetings, Andres Freund