Hi, Zhigou
Thanks for the patch. On Thu, 02 Jan 2025 at 09:14, "Zhou, Zhiguo" <zhiguo.z...@intel.com> wrote: > Hi all, > > I am reaching out to solicit your insights and comments on a recent > proposal regarding the "Lock-free XLog Reservation from WAL." We have > identified some challenges with the current WAL insertions, which > require space reservations in the WAL buffer which involve updating > two shared-memory statuses in XLogCtlInsert: CurrBytePos (the start > position of the current XLog) and PrevBytePos (the prev-link to the > previous XLog). Currently, the use of XLogCtlInsert.insertpos_lck > ensures consistency but introduces lock contention, hindering the > parallelism of XLog insertions. > > To address this issue, we propose the following changes: > > 1. Removal of PrevBytePos: This will allow increments of the CurrBytePos (a > single uint64 field) to be implemented with an atomic operation (fetch_add). > 2. Updating Prev-Link of next XLog: Based on the fact that the > prev-link of the next XLog always points to the head of the current > Xlog,we will slightly exceed the reserved memory range of the current > XLog to update the prev-link of the next XLog, regardless of which > backend acquires the next memory space. The next XLog inserter will > wait until its prev-link is updated for CRC calculation before > starting its own XLog copy into the WAL. > 3. Breaking Sequential Write Convention: Each backend will update the > prev-link of its next XLog first, then return to the header position > for the current log insertion. This change will reduce the dependency > of XLog writes on previous ones (compared with the sequential writes). > 4. Revised GetXLogBuffer: To support #3, we need update this function to > separate the LSN it intends to access from the LSN it expects to update in > the insertingAt field. > 5. Increase NUM_XLOGINSERT_LOCKS: With the above changes, increasing > NUM_XLOGINSERT_LOCKS, for example to 128, could effectively enhance the > parallelism. > > The attached patch could pass the regression tests (make check, make > check-world), and in the performance test of this POC on SPR (480 > vCPU) shows that this optimization could help the TPCC benchmark > better scale with the core count and as a result the performance with > full cores enabled could be improved by 2.04x. > I tried it and found it cannot pass all test-cases. Here is the output comes from Rocky 9.5. mkdir build && cd build ../configure \ --prefix=/tmp/pg \ --enable-tap-tests \ --enable-debug \ --enable-cassert \ --enable-depend \ --enable-dtrace \ --with-icu \ --with-openssl \ --with-python \ --with-libxml \ --with-libxslt \ --with-lz4 \ --with-pam \ CFLAGS='-Wmissing-prototypes -Wincompatible-pointer-types' make -j $(nproc) -s && make install -s (cd contrib/ && make -j $(nproc) -s && make install -s) make check-world The error as follows: echo "# +++ tap check in src/test/recovery +++" && rm -rf '/home/japin/postgresql/build/src/test/recovery'/tmp_check && /usr/bin/mkdir -p '/home/japin/postgresql/build/src/test/recovery'/tmp_check && cd /home/japin/postgresql/build/../src/test/recovery && TESTLOGDIR='/home/japin/postgresql/build/src/test/recovery/tmp_check/log' TESTDATADIR='/home/japin/postgresql/build/src/test/recovery/tmp_check' PATH="/home/japin/postgresql/build/tmp_install/home/japin/postgresql/build/pg/bin:/home/japin/postgresql/build/src/test/recovery:$PATH" LD_LIBRARY_PATH="/home/japin/postgresql/build/tmp_install/home/japin/postgresql/build/pg/lib" INITDB_TEMPLATE='/home/japin/postgresql/build'/tmp_install/initdb-template PGPORT='65432' top_builddir='/home/japin/postgresql/build/src/test/recovery/../../..' PG_REGRESS='/home/japin/postgresql/build/src/test/recovery/../../../src/test/regress/pg_regress' /usr/bin/prove -I /home/japin/postgresql/build/../src/test/perl/ -I /home/japin/postgresql/build/../src/test/recovery t/*.pl # +++ tap check in src/test/recovery +++ t/001_stream_rep.pl ................... ok t/002_archiving.pl .................... 1/? Bailout called. Further testing stopped: command "pg_ctl -D /home/japin/postgresql/build/src/test/recovery/tmp_check/t_002_archiving_standby_data/pgdata -l /home/japin/postgresql/build/src/test/recovery/tmp_check/log/002_archiving_standby.log promote" exited with value 1 FAILED--Further testing stopped: command "pg_ctl -D /home/japin/postgresql/build/src/test/recovery/tmp_check/t_002_archiving_standby_data/pgdata -l /home/japin/postgresql/build/src/test/recovery/tmp_check/log/002_archiving_standby.log promote" exited with value 1 make[2]: *** [Makefile:28: check] Error 255 make[2]: Leaving directory '/home/japin/postgresql/build/src/test/recovery' make[1]: *** [Makefile:42: check-recovery-recurse] Error 2 make[1]: Leaving directory '/home/japin/postgresql/build/src/test' make: *** [GNUmakefile:71: check-world-src/test-recurse] Error 2 -- Regrads, Japin Li