Hi hackers,

I happen to find a case where a transaction involves dropping a relation
file, and a relation happens to be empty, the WAL recovery will waste
a lot of time on scanning the whole buffer pool.

In DropRelationBuffers/DropRelationsAllBuffers, we will take fast path
to avoid scanning whole buffer pool, if the buffers to be invalidated
are less than typically 1/32 of buffer pool. However, since no more WAL
records are referencing an empty relation, if we want to drop it, startup
process will never initialize smgr_cached_nblocks for that relation,
causing fallback to slow path.

The situation can be like refreshing a materialized view whose query
returns no row, or truncating a table and then loading it but the query
returns no row. It will generate WAL sequence like:

rmgr: Standby     len (rec/tot):     42/    42, tx:        780, lsn: 
0/41238520, prev 0/41238490, desc: LOCK xid 780 db 16384 rel 16416
rmgr: Storage     len (rec/tot):     42/    42, tx:        780, lsn: 
0/412385F0, prev 0/412385C0, desc: CREATE base/16384/16416
...
rmgr: Transaction len (rec/tot):   1073/  1073, tx:        783, lsn: 
0/4136BF70, prev 0/41369F60, desc: COMMIT 2026-01-27 08:07:49.953301 UTC; rels: 
base/16384/16416

This makes me think about initializing the cached nblocks to zero when
redo CREATE record. Note that the 0-size may not correct, because the
relation may have been zeroextended before server crash, and recovery
starts from an earlier checkpoint. But we can correct that by invalidating
the cached value when we really meet a WAL record referencing a block of
that relation, so that a later smgrnblocks() will fill-in the correct
file length. It can avoid extending the relation in the middle of a file
by mistake.

Quickly proposed a patch for it, with a Perl program for judging the
performance difference.

In my environment with 16GB shared buffers, -O2 optimized, debugging off,
I've run a test of 10 client * 500 transactions using such workload, made
the server crash and poll it until recovery is done. The CPU time:

w/o patch: CPU user: 77.58 s, system: 0.27 s
patched:   CPU user: 0.14 s,  system: 0.09 s

Without the patch, the CPU is wasted on BufTagMatchesRelFileLocator:

  Overhead  Command   Shared Object       Symbol
-   98.23%  postgres  postgres            [.] DropRelationsAllBuffers
     98.21% DropRelationsAllBuffers
        smgrdounlinkall
        DropRelationFiles
        xact_redo_commit
        xact_redo
        PerformWalRecovery
        StartupXLOG
        StartupProcessMain
        postmaster_child_launch
        StartChildProcess
        PostmasterStateMachine
        ServerLoop.isra.0
     0.06%  postgres  postgres            [.] hash_search_with_hash_value

Not sure if there will be any problem of correctness. Also, not sure about
what to do about INIT fork.

Any thoughts?

--
Regards, Jingtang


Attachment: 0001-Optimize-CPU-usage-of-dropping-buffers-during-recove.patch
Description: Binary data

Reply via email to