On Tue, Aug 24, 2021 at 09:51:25PM -0700, Soumyadeep Chakraborty wrote: > Ashwin and I recently got a chance to work on this and we addressed all > outstanding feedback and suggestions. PFA a significantly reworked patch.
+static void +StartWALReceiverEagerly() +{ The patch fails to apply because of the recent changes from Robert to eliminate ThisTimeLineID. The correct thing to do would be to add one TimeLineID argument, passing down the local ThisTimeLineID in StartupXLOG() and using XLogCtl->lastReplayedTLI in CheckRecoveryConsistency(). + /* + * We should never reach here. We should have at least one valid WAL + * segment in our pg_wal, for the standby to have started. + */ + Assert(false); The reason behind that is not that we have a standby, but that we read at least the segment that included the checkpoint record we are replaying from, at least (it is possible for a standby to start without any contents in pg_wal/ as long as recovery is configured), and because StartWALReceiverEagerly() is called just after that. It would be better to make sure that StartWALReceiverEagerly() gets only called from the startup process, perhaps? + RequestXLogStreaming(ThisTimeLineID, startptr, PrimaryConnInfo, + PrimarySlotName, wal_receiver_create_temp_slot); + XLogReaderFree(state); XLogReaderFree() should happen before RequestXLogStreaming(). The tipping point of the patch is here, where the WAL receiver is started based on the location of the first valid WAL record found. wal_receiver_start_condition is missing in postgresql.conf.sample. + /* + * Start WAL receiver eagerly if requested. + */ + if (StandbyModeRequested && !WalRcvStreaming() && + PrimaryConnInfo && strcmp(PrimaryConnInfo, "") != 0 && + wal_receiver_start_condition == WAL_RCV_START_AT_STARTUP) + StartWALReceiverEagerly(); [...] + if (StandbyModeRequested && !WalRcvStreaming() && reachedConsistency && + PrimaryConnInfo && strcmp(PrimaryConnInfo, "") != 0 && + wal_receiver_start_condition == WAL_RCV_START_AT_CONSISTENCY) + StartWALReceiverEagerly(); This repeats two times the same set of conditions, which does not look like a good idea to me. I think that you'd better add an extra argument to StartWALReceiverEagerly to track the start timing expected in this code path, that will be matched with the GUC in the routine. It would be better to document the reasons behind each check done, as well. + /* Find the latest and earliest WAL segments in pg_wal */ + dir = AllocateDir("pg_wal"); + while ((de = ReadDir(dir, "pg_wal")) != NULL) + { [ ... ] + /* Find the latest valid WAL segment and request streaming from its start */ + while (endsegno >= startsegno) + { [...] + XLogReaderFree(state); + endsegno--; + } So, this reads the contents of pg_wal/ for any files that exist, then goes down to the first segment found with a valid beginning. That's going to be expensive with a large max_wal_size. When searching for a point like that, a dichotomy method would be better to calculate a LSN you'd like to start from. Anyway, I think that there is a problem with the approach: what should we do if there are holes in the segments present in pg_wal/? As of HEAD, or wal_receiver_start_condition = 'exhaust' in this patch, we would switch across local pg_wal/, archive and stream in a linear way, thanks to WaitForWALToBecomeAvailable(). For example, imagine that we have a standby with the following set of valid segments, because of the buggy way a base backup has been taken: 000000010000000000000001 000000010000000000000003 000000010000000000000005 What the patch would do is starting a WAL receiver from segment 5, which is in contradiction with the existing logic where we should try to look for the segment once we are waiting for something in segment 2. This would be dangerous once the startup process waits for some WAL to become available, because we have a WAL receiver started, but we cannot fetch the segment we have. Perhaps a deployment has archiving, in which case it would be able to grab segment 2 (if no archiving, recovery would not be able to move on, so that would be game over). /* * Move to XLOG_FROM_STREAM state, and set to start a - * walreceiver if necessary. + * walreceiver if necessary. The WAL receiver may have + * already started (if it was configured to start + * eagerly). */ currentSource = XLOG_FROM_STREAM; - startWalReceiver = true; + startWalReceiver = !WalRcvStreaming(); break; case XLOG_FROM_ARCHIVE: case XLOG_FROM_PG_WAL: - /* - * WAL receiver must not be running when reading WAL from - * archive or pg_wal. - */ - Assert(!WalRcvStreaming()); These parts should IMO not be changed. They are strong assumptions we rely on in the startup process, and this comes down to the fact that it is not a good idea to mix a WAL receiver started while currentSource could be pointing at a WAL source completely different. That's going to bring a lot of racy conditions, I am afraid, as we rely on currentSource a lot during recovery, in combination that we expect the code to be able to retrieve WAL in a linear fashion from the LSN position that recovery is looking for. So, I think that deciding if a WAL receiver should be started blindly outside of the code path deciding if the startup process is waiting for some WAL is not a good idea, and the position we may begin to stream from may be something that we may have zero need for at the end (this is going to be tricky if we detect a TLI jump while replaying the local WAL, also?). The issue is that I am not sure what a good design for that should be. We have no idea when the startup process will need WAL from a different source until replay comes around, but what we want here is to anticipate othis LSN :) I am wondering if there should be a way to work out something with the control file, though, but things can get very fancy with HA and base backup deployments and the various cases we support thanks to the current way recovery works, as well. We could also go simpler and rework the priority order if both archiving and streaming are options wanted by the user. -- Michael
signature.asc
Description: PGP signature