On 14.01.2025 17:49, Zhou, Zhiguo wrote:
Good day, Yura!
On 1/10/2025 8:42 PM, Yura Sokolov wrote:
If you consider the hash-table fill rate, then 256 is quite enough for 128
concurrent inserters.
The profile of your patch didn't show significant hotspots in the hash
table functions, so I believe the 256 entries should be enough.
I will soon try to evaluate the performance impact of your patch on
my device with the TPCC benchmark and also profile it to see if there
are any changes that could be made to further improve it.
It would be great. On my notebook (Mac Air M1) I don't see any
benefit from either my patch or yours ))
My colleague will also test it on a 20-core virtual machine (but
backported to v15).
I've tested the performance impact of our patches on an Intel Sapphire
Rapids device with 480 vCPUs using a HammerDB TPC-C workload (256 VUs).
The results show a 72.3% improvement (average of 3 rounds, RSD: 1.5%)
with your patch and a 76.0% boost (average of 3 rounds, RSD: 2.95%) with
mine, applied to the latest codebase. This optimization is most
effective on systems with over 64 cores, as our core-scaling experiments
suggest minimal impact on lower-core setups like your notebook or a 20-
core VM.
BTW, do you have a plan to merge this patch to the master branch?
Thanks!
I'm not a committer )) We will both struggle for many months to get
something committed ;-)
BTW, your version could use a similar trick for guaranteed atomicity:
- change XLogRecord's `XLogRecPtr xl_prev` to `uint32 xl_prev_offset`
and store the offset to the previous record's start.
Since there are two limits:
#define XLogRecordMaxSize (1020 * 1024 * 1024)
#define WalSegMaxSize 1024 * 1024 * 1024
the offset to the previous record cannot be larger than 2GB.
Yes, it is a format change that some backup utilities will have to adapt to.
But it saves 4 bytes in XLogRecord (which could be spent to store a
FullTransactionId instead of a TransactionId), and it compresses better.
And your version then will not need to handle the case where this value
is split between two buffers (since MAXALIGN is not less than 4), and
PostgreSQL already relies on 4-byte read/write atomicity (in some places
even without using pg_atomic_uint32).
----
regards
Sokolov Yura aka funny-falcon
Thanks for the great suggestion!
I think we've arrived at a critical juncture where we need to decide
which patch to move forward with for our optimization efforts. I've
evaluated the pros and cons of my implementation:
Pros:
- Achieves an additional 4% performance improvement.
Cons:
- Breaks the current convention of XLog insertions.
- TAP tests do not fully pass yet and will require time to resolve.
- May necessitate changes to the WAL format and backup tools, potentially
leading to backward-compatibility issues.
Given these considerations, I believe your implementation is superior to
mine. I'd greatly appreciate it if you could share your insights on this
matter.
Good day, Zhiguo.
Excuse me, I feel a bit sneaky, but I've started another thread just
about increasing NUM_XLOGINSERT_LOCK, because I can measure its effect
even on my working notebook (it is a different one: Ryzen 5825U limited
to 2GHz).
http://postgr.es/m/flat/3b11fdc2-9793-403d-b3d4-67ff9a00d447%40postgrespro.ru
-----
regards
Yura Sokolov aka funny-falcon