Re: [RFC] Lock-free XLog Reservation from WAL

Zhou, Zhiguo Tue, 14 Jan 2025 06:57:38 -0800

Good day, Yura!

On 1/10/2025 8:42 PM, Yura Sokolov wrote:

If you consider hash-table fillrate, than 256 is quite enough for 128concurrent inserters.

The profile of your patch didn't show significant hotspots in the hashtable functions, so I believe the 256 entries should be enough.

I will soon try to evaluate the performance impact of your patch on mydevice with the TPCC benchmark and also profile it to see if there areany changes that could be made to further improve it.
It would be great. On my notebook (Mac Air M1) I don't see any benefitsneither from mine, nor from yours patch ))My colleague will also test it on 20 core virtual machine (butbackported to v15).

I've tested the performance impact of our patches on an Intel SapphireRapids device with 480 vCPUs using a HammerDB TPC-C workload (256 VUs).The results show a 72.3% improvement (average of 3 rounds, RSD: 1.5%)with your patch and a 76.0% boost (average of 3 rounds, RSD: 2.95%) withmine, applied to the latest codebase. This optimization is mosteffective on systems with over 64 cores, as our core-scaling experimentssuggest minimal impact on lower-core setups like your notebook or a20-core VM.

BTW, do you have a plan to merge this patch to the master branch? Thanks!
I'm not committer )) We are both will struggle to make somethingcommitted for many months ;-)
BTW, your version could make alike trick for guaranteed atomicity:
- change XLogRecord's `XLogRecPtr xl_prev` to `uint32 xl_prev_offset`and store offset to prev record's start.
Since there are two limits:

     #define XLogRecordMaxSize    (1020 * 1024 * 1024)
     #define WalSegMaxSize 1024 * 1024 * 1024

offset to previous record could not be larger than 2GB.

Yes, it is format change, that some backup utilities will have to adopt.
But it saves 4 bytes in XLogRecord (that could be spent to storeFullTransactionId instead of TransactionId) and it is better compressible.
And your version than will not need the case when this value is splitamong two buffers (since MAXALIGN is not less than 4), and PostgreSQLalready relies on 4 byte read/write atomicity (in some places evenwithout use of pg_atomic_uint32).
----

regards
Sokolov Yura aka funny-falcon


Thanks for the great suggestion!

I think we've arrived at a critical juncture where we need to decidewhich patch to move forward with for our optimization efforts. I'veevaluated the pros and cons of my implementation:


Pros:
-Achieves an additional 4% performance improvement.

Cons:
-Breaks the current convention of XLog insertions.
-TAP tests are not fully passed, requiring time to resolve.

-May necessitate changes to the format and backup tools, potentiallyleading to backward compatibility issues.

Given these considerations, I believe your implementation is superior tomine. I'd greatly appreciate it if you could share your insights on thismatter.


Regards,
Zhiguo

Re: [RFC] Lock-free XLog Reservation from WAL

Reply via email to