On 03/02/2026 00:33, Andres Freund wrote:
   - Now that we use the normal order of WAL logging, we don't need to delay
     checkpoint starts anymore.

     I think the explanation for why that is ok is correct [1], but it needs to
     be looked at by somebody with experience around this. Maybe Heikki?

So that's patch 0004 "bufmgr: Switch to standard order in MarkBufferDirtyHint()". Yes, looks correct to me.

        /*
         * Update RedoRecPtr so that we can make the right decision. It's possible
         * that a new checkpoint will start just after GetRedoRecPtr(), but that
         * is ok, as the buffer is already dirty, ensuring that any BufferSync()
         * started after the buffer was marked dirty cannot complete without
         * flushing this buffer.  If a checkpoint started between marking the
         * buffer dirty and this check, we will emit an unnecessary WAL record (as
         * the buffer will be written out as part of the checkpoint), but the
         * window for that is small.
         */
        RedoRecPtr = GetRedoRecPtr();

That "small window" is actually pretty big if you think of it a little more loosely. Our rule is that we write the full page image if a checkpoint has started since the page LSN, but that's very conservative already. It would be sufficient to write the full page image only if the checkpoint has already flushed the page. This small window is just a special case of that conservatism.
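
To make the two rules concrete, here is a minimal standalone sketch, not PostgreSQL code. It assumes the current rule can be simplified to comparing the page LSN against the redo pointer, and it models the tighter rule with a hypothetical flushed_by_checkpoint flag; how such a flag would actually be tracked is left open. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr: a 64-bit WAL position. */
typedef uint64_t XLogRecPtr;

/*
 * Current, conservative rule (simplified): write a full page image whenever
 * the page was last modified at or before the latest checkpoint's redo
 * pointer, i.e. a checkpoint has started since the page LSN.
 */
bool
needs_fpi_conservative(XLogRecPtr page_lsn, XLogRecPtr redo_rec_ptr)
{
	return page_lsn <= redo_rec_ptr;
}

/*
 * Hypothetical tighter rule: additionally require that the checkpoint has
 * already flushed this page. The flag is an assumption of this sketch; no
 * such tracking is described in the patch.
 */
bool
needs_fpi_precise(XLogRecPtr page_lsn, XLogRecPtr redo_rec_ptr,
				  bool flushed_by_checkpoint)
{
	return needs_fpi_conservative(page_lsn, redo_rec_ptr) &&
		flushed_by_checkpoint;
}

/* Walk through the cases that distinguish the two rules. */
void
fpi_rules_demo(void)
{
	XLogRecPtr	redo = 200;

	/* Page modified after the checkpoint started: no image under either rule. */
	assert(!needs_fpi_conservative(300, redo));
	assert(!needs_fpi_precise(300, redo, false));

	/* Page older than the redo pointer, not yet flushed by the checkpoint. */
	assert(needs_fpi_conservative(100, redo));
	assert(!needs_fpi_precise(100, redo, false));

	/* Same page after the checkpoint has flushed it: both rules agree. */
	assert(needs_fpi_conservative(100, redo));
	assert(needs_fpi_precise(100, redo, true));
}
```

The middle case is the conservatism in question: the two rules disagree for every page that the checkpoint has not flushed yet, which is a much larger set than the pages caught by the race described in the comment.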

I've been thinking of trying to track that more accurately for a long time, because it would smooth out the WAL spike when a checkpoint begins.

That gets off-topic, but my point is that it feels a little silly to mention that small window when there's the other giant panoramic window next to it.

- Heikki


