Good day, Yura!

On 1/10/2025 8:42 PM, Yura Sokolov wrote:
If you consider hash-table fillrate, than 256 is quite enough for 128 concurrent inserters.

The profile of your patch didn't show significant hotspots in the hash table functions, so I believe the 256 entries should be enough.


I will soon try to evaluate the performance impact of your patch on my device with the TPCC benchmark and also profile it to see if there are any changes that could be made to further improve it.

It would be great. On my notebook (Mac Air M1) I don't see any benefits neither from mine, nor from yours patch )) My colleague will also test it on 20 core virtual machine (but backported to v15).


I've tested the performance impact of our patches on an Intel Sapphire Rapids device with 480 vCPUs using a HammerDB TPC-C workload (256 VUs). The results show a 72.3% improvement (average of 3 rounds, RSD: 1.5%) with your patch and a 76.0% boost (average of 3 rounds, RSD: 2.95%) with mine, applied to the latest codebase. This optimization is most effective on systems with over 64 cores, as our core-scaling experiments suggest minimal impact on lower-core setups like your notebook or a 20-core VM.


BTW, do you have a plan to merge this patch to the master branch? Thanks!

I'm not committer )) We are both will struggle to make something committed for many months ;-)

BTW, your version could make alike trick for guaranteed atomicity:
- change XLogRecord's `XLogRecPtr xl_prev` to `uint32 xl_prev_offset` and store offset to prev record's start.

Since there are two limits:

     #define XLogRecordMaxSize    (1020 * 1024 * 1024)
     #define WalSegMaxSize 1024 * 1024 * 1024

offset to previous record could not be larger than 2GB.

Yes, it is format change, that some backup utilities will have to adopt.
But it saves 4 bytes in XLogRecord (that could be spent to store FullTransactionId instead of TransactionId) and it is better compressible.

And your version than will not need the case when this value is split among two buffers (since MAXALIGN is not less than 4), and PostgreSQL already relies on 4 byte read/write atomicity (in some places even without use of pg_atomic_uint32).

----

regards
Sokolov Yura aka funny-falcon

Thanks for the great suggestion!

I think we've arrived at a critical juncture where we need to decide which patch to move forward with for our optimization efforts. I've evaluated the pros and cons of my implementation:

Pros:
-Achieves an additional 4% performance improvement.

Cons:
-Breaks the current convention of XLog insertions.
-TAP tests are not fully passed, requiring time to resolve.
-May necessitate changes to the format and backup tools, potentially leading to backward compatibility issues.

Given these considerations, I believe your implementation is superior to mine. I'd greatly appreciate it if you could share your insights on this matter.

Regards,
Zhiguo



Reply via email to