On Tue, 11 Jun 2024 at 04:01, Nathan Bossart <nathandboss...@gmail.com> wrote: > > On Fri, Jun 07, 2024 at 11:10:25AM +0200, Matthias van de Meent wrote: > > My primary concern isn't the IO, but the O(shared_buffers) that we > > have to go through during a checkpoint. As I mentioned upthread, it is > > reasonably possible the new cluster is already setup with a good > > fraction of the old system's shared_buffers configured. Every > > checkpoint has to scan all those buffers, which IMV can get (much) > > more expensive than the IO overhead caused by the WAL_LOG strategy. It > > may be a baseless fear as I haven't done the performance benchmarks > > for this, but I wouldn't be surprised if shared_buffers=8GB would > > measurably impact the upgrade performance in the current patch (vs the > > default 128MB). > > I did a handful of benchmarks on an r5.24xlarge that seem to prove your > point. The following are the durations of the pg_restore step of > pg_upgrade: > > * 10k empty databases, 128MB shared_buffers > WAL_LOG: 1m 01s > FILE_COPY: 0m 22s > > * 10k empty databases, 100GB shared_buffers > WAL_LOG: 2m 03s > FILE_COPY: 5m 08s > > * 2.5k databases with 10k tables each, 128MB shared_buffers > WAL_LOG: 17m 20s > FILE_COPY: 16m 44s > > * 2.5k databases with 10k tables each, 100GB shared_buffers > WAL_LOG: 16m 39s > FILE_COPY: 15m 21s > > I was surprised with the last result, but there's enough other stuff > happening during such a test that I hesitate to conclude much.
If you still have the test data set up, could you test the attached patch (which does skip the checkpoints in FILE_COPY mode during binary upgrades)? >>>> If such a change were implemented (i.e. no checkpoints for FILE_COPY >>>> in binary upgrade, with a single manual checkpoint after restoring >>>> template1 in create_new_objects) I think most of my concerns with this >>>> patch would be alleviated. >>> >>> Yeah, I think that's a valid point. The second checkpoint is to ensure >>> that the XLOG_DBASE_CREATE_FILE_COPY never gets replayed. However, for >>> binary upgrades, we don't need that guarantee because a checkpoint >>> will be performed during shutdown at the end of the upgrade anyway. >> >> Indeed. > > It looks like pg_dump always uses template0, so AFAICT we don't even need > the suggested manual checkpoint after restoring template1. Thanks for reminding me. It seems I misunderstood the reason why we first process template1 in create_new_objects, as I didn't read the comments thoroughly enough. > I do like the idea of skipping a bunch of unnecessary operations in binary > upgrade mode, since it'll help me in my goal of speeding up pg_upgrade. > But I'm a bit hesitant to get too fancy here and to introduce a bunch of > new "if (IsBinaryUpgrade)" checks if the gains in the field won't be all > that exciting. However, we've already sprinkled such checks quite > liberally, so maybe I'm being too cautious... Hmm, yes. From an IO perspective I think this could be an improvement, but let's check the numbers first. Kind regards, Matthias van de Meent
0001-pg_upgrade-use-CREATE-DATABASE-.-STRATEGY-FILE_COPY.patch
Description: Binary data