On Tue, Jan 23, 2024 at 9:37 AM Jeff Davis <pg...@j-davis.com> wrote: > > On Mon, 2024-01-22 at 12:12 -0800, Andres Freund wrote: > > I still think that anything that requires such checks shouldn't be > > merged. It's completely bogus to check page contents for validity > > when we > > should have metadata telling us which range of the buffers is valid > > and which > > not. > > The check seems entirely unnecessary, to me. A leftover from v18? > > I have attached a new patch (version "19j") to illustrate some of my > previous suggestions. I didn't spend a lot of time on it so it's not > ready for commit, but I believe my suggestions are easier to understand > in code form.
> Note that, right now, it only works for XLogSendPhysical(). I believe > it's best to just make it work for 1-3 callers that we understand well, > and we can generalize later if it makes sense. +1 to do it for XLogSendPhysical() first. Enabling it for others can just be done as something like the attached v20-0003. > I'm still not clear on why some callers are reading XLOG_BLCKSZ > (expecting zeros at the end), and if it's OK to just change them to use > the exact byte count. "expecting zeros at the end" - this can't always be true as the WAL can get flushed after determining the flush ptr before reading it from the WAL file. FWIW, here's what I've tried previoulsy - https://github.com/BRupireddy2/postgres/tree/ensure_extra_read_WAL_page_is_zero_padded_at_the_end_WIP, the tests hit the Assert(false); added. Which means, the zero-padding comment around WALRead() call-sites isn't quite right. /* * Even though we just determined how much of the page can be validly read * as 'count', read the whole page anyway. It's guaranteed to be * zero-padded up to the page boundary if it's incomplete. */ if (!WALRead(state, cur_page, targetPagePtr, XLOG_BLCKSZ, tli, I think this needs to be discussed separately. If okay, I'll start a new thread. > Also, if we've detected that the first requested buffer has been > evicted, is there any value in continuing the loop to see if more > recent buffers are available? For example, if the requested LSNs range > over buffers 4, 5, and 6, and 4 has already been evicted, should we try > to return LSN data from 5 and 6 at the proper offset in the dest > buffer? If so, we'd need to adjust the API so the caller knows what > parts of the dest buffer were filled in. I'd second this capability for now to keep the API simple and clear, but we can consider expanding it as needed. I reviewed the v19j and attached v20 patch set: 1. * The caller must ensure that it's reasonable to read from the WAL buffers, * i.e. that the requested data is from the current timeline, that we're not * in recovery, etc. I still think the XLogReadFromBuffers can just return in any of the above cases instead of comments. I feel we must assume the caller is going to ask the WAL from a different timeline and/or in recovery and design the API to deal with it. Done that way in v20 patch. 2. Fixed some typos, reworded a few comments (i.e. used "current insert/write position" instead of "Insert/Write pointer" like elsewhere), ran pgindent. 3. - * XXX probably this should be improved to suck data directly from the - * WAL buffers when possible. Removed the above comment before WALRead() since we have that facility now. Perhaps, we can say the callers can suck data directly from the WAL buffers using XLogReadFromBuffers. But I have no strong opinion on this. 4. + * Most callers will have already updated LogwrtResult when determining + * how far to read, but it's OK if it's out of date. (XXX: is it worth + * taking a spinlock to update LogwrtResult and check again before calling + * WaitXLogInsertionsToFinish()?) If the callers use GetFlushRecPtr() to determine how far to read, LogwrtResult will be *reasonably* latest, otherwise not. If LogwrtResult is a bit old, XLogReadFromBuffers will call WaitXLogInsertionsToFinish which will just loop over all insertion locks and return. As far as the current WAL readers are concerned, we don't need an explicit spinlock to determine LogwrtResult because all of them use GetFlushRecPtr() to determine how far to read. If there's any caller that's not updating LogwrtResult at all, we can consider reading LogwrtResult it ourselves in future. 5. I think the two requirements specified at https://www.postgresql.org/message-id/20231109205836.zjoawdrn4q77yemv%40awork3.anarazel.de still hold with the v19j. 5.1 Never allow WAL being read that's past XLogBytePosToRecPtr(XLogCtl->Insert->CurrBytePos) as it does not exist. 5.2 If the to-be-read LSN is between XLogCtl->LogwrtResult->Write and XLogBytePosToRecPtr(Insert->CurrBytePos) we need to call WaitXLogInsertionsToFinish() before copying the data. + if (upto > LogwrtResult.Write) + { + XLogRecPtr writtenUpto = WaitXLogInsertionsToFinish(upto, false); + + upto = Min(upto, writtenUpto); + nbytes = upto - startptr; + } XLogReadFromBuffers ensures the above two with adjusting upto based on Min(upto, writtenUpto) as WaitXLogInsertionsToFinish returns the oldest insertion that is still in-progress. For instance, the current write LSN is 100, current insert LSN is 150 and upto is 200 - we only read upto 150 if startptr is < 150; we don't read anything if startptr is > 150. 6. I've modified the test module in v20-0002 patch as follows: 6.1 Renamed the module to read_wal_from_buffers stripping "test_" which otherwise is making the name longer. Longer names can cause failures on some Windows BF members if the PATH/FILE name is too long. 6.2 Tweaked tests to hit WaitXLogInsertionsToFinish() and upto = Min(upto, writtenUpto); in XLogReadFromBuffers. PSA v20 patch set. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
v20-0001-Add-XLogReadFromBuffers.patch
Description: Binary data
v20-0002-Add-test-module-for-verifying-WAL-read-from-WAL-.patch
Description: Binary data
v20-0003-Use-XLogReadFromBuffers-in-more-places.patch
Description: Binary data