Hi, I have noticed that $SUBJECT, and I think it's bad, and I'd like to propose some patches to fix it. 0001 is hopefully uncontroversial, but 0002 might need some discussion as I'm not quite sure what approach to take.
The reason why RecoveryInProgress() has critical side effects is that it calls InitXLOGAccess(). It turns out that the only critical thing that InitXLOGAccess() does is call InitXLogInsert().[1] If a backend doesn't call InitXLogInsert(), max_rdatas won't be initialized, and the first attempt to insert WAL will probably fail with something like "ERROR: too much WAL data". But there's no real reason why InitXLogInsert() needs to happen only after recovery is finished. The work that it does is just setting up data structures in backend-local memory, and not much is lost if they are set up and not used. So, in 0001, I propose to move the call to InitXLogInsert() from InitXLOGAccess() to BaseInit(). That means every backend will do it the first time it starts up. It also means that CreateCheckPoint() will no longer need a special case call to InitXLogInsert(), which seems like a good thing. Now, what about the other stuff that InitXLOGAccess() does? As far as I can tell, "wal_segment_size = ControlFile->xlog_seg_size" is always redundant, because LocalProcessControlFile() has always been called, and it does the same thing. The postmaster does that call, and in non-EXEC_BACKEND builds, the global variable setting will be inherited by all child processes. In EXEC_BACKEND builds, every child process will go through SubPostmasterMain() which also calls LocalProcessControlFile() very early in the startup sequence. InitXLOGAccess() also sets RedoRecPtr and doPageWrites ... but these are non-critical values. If you set them to 0 and false respectively and then call XLogInsert(), GetFullPagesWritesInfo() will return 0 and false, XLogRecordAssemble() will use those values to make wrong decisions, and then XLogInsertRecord() will notice that the wrong values were used and will update them and cause XLogInsert() to loop around. Similarly, XLogCheckpointNeeded() relies on the caller to set RedoRecPtr appropriately, but all callers recheck after updating RedoRecPtr, so it's fine if the initial value is zero. The only other interesting case is XLogCheckBufferNeedsBackup(), which calls GetFullPageWriteInfo(), but if somehow it gets the wrong answer, the only thing that can happen is we might come the wrong conclusion about whether a heap-update WAL record should attempt to compress by looking for a common prefix and suffix between the old and new tuples. That's non-critical, and if it happens or doesn't happen at most once per backend, fine. Nevertheless, what I did in 0002 is remove InitXLOGAccess() but move the initialization of RedoRecPtr and doPageWrites into GetFullPageWritesInfo(), only in the case where RedoRecPtr hasn't been set yet. This has one property that I really really like, which is that makes the code more understandable. There is no suggestion that the correctness of WAL insertion somehow depends on a RecoveryInProgress() call which may or may not have occurred at some arbitrarily distant time in the past. Initializing the value at the point of first use is just way clearer than initializing it as a side-effect of some other function that's not even that tightly connected. However, it does have the effect of introducing a branch into GetFullPageWritesInfo(), which I'm not 100% sure is cheap enough not to matter. I think the obvious alternative is to just not initialize RedoRecPtr and doPageWrites at all and document in the comments that the first time each backend does something with them it's going to end up retrying once; perhaps that is preferable. Going the other way, we could get rid of the global variables altogether and have GetFullPageWrites() read the data from shared memory. However, unless 8-byte atomicity is guaranteed, that requires a spinlock, so it seems likely unappealing. (In case the explanation above is unclear to casual readers, here's a bit more detail: RedoRecPtr is the redo pointer from what we believe to be the most recent checkpoint record, and doFullPageWrites is whether full_page_writes are required at the moment. The latter can be true either because full_page_writes=on or because a base backup is in progress. When assembling an XLOG record, we have to decide whether to include full-page images in that or not, and we do that based on the value of these variables: if full page writes are required in general and if the page LSN precedes RedoRecPtr, we include a full page image. If it turns out that our RedoRecPtr was out of date - i.e. there's a newer checkpoint we didn't know about - then we go back and re-assemble the record to make sure all required full-page images are included.) Thoughts? -- Robert Haas EDB: http://www.enterprisedb.com [1] Isn't it great that these two similarly-named functions capitalize "xlog" differently? Wa-hoo.
v1-0001-Move-InitXLogInsert-call-from-InitXLOGAccess-to-B.patch
Description: Binary data
v1-0002-Remove-InitXLOGAccess.patch
Description: Binary data