On 2020-Mar-17, Thomas Munro wrote: Hi Thomas
> On Sat, Mar 14, 2020 at 10:15 AM Alvaro Herrera > <alvhe...@2ndquadrant.com> wrote: > > I didn't manage to go over 0005 though, but I agree with Tomas that > > having this be configurable in terms of bytes of WAL is not very > > user-friendly. > > The primary control is now maintenance_io_concurrency, which is > basically what Tomas suggested. > The byte-based control is just a cap to prevent it reading a crazy > distance ahead, that also functions as the on/off switch for the > feature. In this version I've added "max" to the name, to make that > clearer. Mumble. I guess I should wait to comment on this after reading 0005 more in depth. > > First of all, let me join the crowd chanting that this is badly needed; > > I don't need to repeat what Chittenden's talk showed. "WAL recovery is > > now 10x-20x times faster" would be a good item for pg13 press release, > > I think. > > We should be careful about over-promising here: Sean basically had a > best case scenario for this type of techology, partly due to his 16kB > filesystem blocks. Common results may be a lot more pedestrian, > though it could get more interesting if we figure out how to get rid > of FPWs... Well, in my mind it's an established fact that our WAL replay uses far too little of the available I/O speed. I guess if the system is generating little WAL, then this change will show no benefit, but that's not the kind of system that cares about this anyway -- for the others, the parallelisation gains will be substantial, I'm sure. > > > From a61b4e00c42ace5db1608e02165f89094bf86391 Mon Sep 17 00:00:00 2001 > > > From: Thomas Munro <thomas.mu...@gmail.com> > > > Date: Tue, 3 Dec 2019 17:13:40 +1300 > > > Subject: [PATCH 1/5] Allow PrefetchBuffer() to be called with a > > > SMgrRelation. > > > > > > Previously a Relation was required, but it's annoying to have > > > to create a "fake" one in recovery. > While staring at this, I decided that SharedPrefetchBuffer() was a > weird word order, so I changed it to PrefetchSharedBuffer(). Then, by > analogy, I figured I should also change the pre-existing function > LocalPrefetchBuffer() to PrefetchLocalBuffer(). Do you think this is > an improvement? Looks good. I doubt you'll break anything by renaming that routine. > > > From acbff1444d0acce71b0218ce083df03992af1581 Mon Sep 17 00:00:00 2001 > > > From: Thomas Munro <tmu...@postgresql.org> > > > Date: Mon, 9 Dec 2019 17:10:17 +1300 > > > Subject: [PATCH 2/5] Rename GetWalRcvWriteRecPtr() to > > > GetWalRcvFlushRecPtr(). > > > > > > The new name better reflects the fact that the value it returns > > > is updated only when received data has been flushed to disk. > > > > > > An upcoming patch will make use of the latest data that was > > > written without waiting for it to be flushed, so use more > > > precise function names. > > > > Ugh. (Not for your patch -- I mean for the existing naming convention). > > It would make sense to rename WalRcvData->receivedUpto in this commit, > > maybe to flushedUpto. > > Ok, I renamed that variable and a related one. There are more things > you could rename if you pull on that thread some more, including > pg_stat_wal_receiver's received_lsn column, but I didn't do that in > this patch. +1 for that approach. Maybe we'll want to rename the SQL-visible name, but I wouldn't burden this patch with that, lest we lose the entire series to that :-) > > > + pg_atomic_uint64 writtenUpto; > > > > Are we already using uint64s for XLogRecPtrs anywhere? This seems > > novel. Given this, I wonder if the comment near "mutex" needs an > > update ("except where atomics are used"), or perhaps just move the > > member to after the line with mutex. > > Moved. LGTM. > We use [u]int64 in various places in the replication code. Ideally > I'd have a magic way to say atomic<XLogRecPtr> so I didn't have to > assume that pg_atomic_uint64 is the right atomic integer width and > signedness, but here we are. In dsa.h I made a special typedef for > the atomic version of something else, but that's because the size of > that thing varied depending on the build, whereas our LSNs are of a > fixed width that ought to be en... <trails off>. Let's rewrite Postgres in Rust ... > > I didn't understand the purpose of inc_counter() as written. Why not > > just pg_atomic_fetch_add_u64(..., 1)? > > I didn't want counters that wrap at ~4 billion, but I did want to be > able to read and write concurrently without tearing. Instructions > like "lock xadd" would provide more guarantees that I don't need, > since only one thread is doing all the writing and there's no ordering > requirement. It's basically just counter++, but some platforms need a > spinlock to perform atomic read and write of 64 bit wide numbers, so > more hoop jumping is required. Ah, I see, you don't want lock xadd ... That's non-obvious. I suppose the function could use more commentary on *why* you're doing it that way then. > > > /* > > > * smgrprefetch() -- Initiate asynchronous read of the specified block > > > of a relation. > > > + * > > > + * In recovery only, this can return false to indicate that a > > > file > > > + * doesn't exist (presumably it has been dropped by a later WAL > > > + * record). > > > */ > > > -void > > > +bool > > > smgrprefetch(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum) > > > > I think this API, where the behavior of a low-level module changes > > depending on InRecovery, is confusingly crazy. I'd rather have the > > callers specifying whether they're OK with a file that doesn't exist. > > Hmm. But... md.c has other code like that. It's true that I'm adding > InRecovery awareness to a function that didn't previously have it, but > that's just because we previously had no reason to prefetch stuff in > recovery. True. I'm uncomfortable about it anyway. I also noticed that _mdfd_getseg() already has InRecovery-specific behavior flags. Clearly that ship has sailed. Consider my objection^W comment withdrawn. > > Umm, I would keep the return values of both these functions in sync. > > It's really strange that PrefetchBuffer does not return > > PrefetchBufferResult, don't you think? > > Agreed, and changed. I suspect that other users of the main > PrefetchBuffer() call will eventually want that, to do a better job of > keeping the request queue full, for example bitmap heap scan and > (hypothetical) btree scan with prefetch. LGTM. As before, I didn't get to reading 0005 in depth. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services