On 03/30/2016 07:09 PM, Andres Freund wrote:
Yes. That looks good. My testing shows that increasing the number of
buffers can both increase throughput and reduce latency variance. The
former is a smaller effect with one of the discussed patches applied,
the latter seems to actually increase in scale (with increased
I've attached patches to:
0001: Increase the max number of clog buffers
0002: Implement 64bit atomics fallback and optimize read/write
0003: Edited version of Simon's clog scalability patch
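For readers unfamiliar with what a 64-bit atomics fallback involves: on platforms without native 8-byte atomic instructions, a 64-bit atomic can be emulated by guarding a plain uint64_t with a lock. That also yields tear-free reads, which is the property the group_lsn change in 0003 is after. The following is a minimal sketch of that idea only. It assumes a simple test-and-set spinlock built on C11 atomic_flag, and the names (fallback_atomic_u64, fallback_read_u64 and so on) are hypothetical; this is not the API or the code in 0002.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: a 64-bit value whose every access is serialized by a
 * spinlock, standing in for missing native 8-byte atomics. */
typedef struct
{
    atomic_flag lock;   /* test-and-set spinlock guarding value */
    uint64_t    value;  /* only touched while lock is held */
} fallback_atomic_u64;

static void
spin_acquire(atomic_flag *lock)
{
    /* Spin until this caller is the one that flips the flag to "set". */
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ;
}

static void
spin_release(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

static void
fallback_init_u64(fallback_atomic_u64 *ptr, uint64_t val)
{
    atomic_flag_clear(&ptr->lock);
    ptr->value = val;
}

/* Reading under the lock means a concurrent writer can never be observed
 * halfway through a store, i.e. the read is tear-free. */
static uint64_t
fallback_read_u64(fallback_atomic_u64 *ptr)
{
    uint64_t val;

    spin_acquire(&ptr->lock);
    val = ptr->value;
    spin_release(&ptr->lock);
    return val;
}

static void
fallback_write_u64(fallback_atomic_u64 *ptr, uint64_t val)
{
    spin_acquire(&ptr->lock);
    ptr->value = val;
    spin_release(&ptr->lock);
}

/* If value equals *expected, store newval and return true; otherwise copy
 * the current value into *expected and return false. */
static bool
fallback_compare_exchange_u64(fallback_atomic_u64 *ptr,
                              uint64_t *expected, uint64_t newval)
{
    bool ok;

    assert(ptr != NULL && expected != NULL);
    spin_acquire(&ptr->lock);
    ok = (ptr->value == *expected);
    if (ok)
        ptr->value = newval;
    else
        *expected = ptr->value;
    spin_release(&ptr->lock);
    return ok;
}
```

Every operation, including a plain read, takes the lock. That is correct but slow under contention, which is usually acceptable for a path that only matters on platforms lacking native support.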
WRT 0003 - still clearly WIP - I've:
- made group_lsn pg_atomic_u64*, to allow for tear-free reads
- split content from IO lock
- made SimpleLruReadPage_optShared always return with only share lock
- implemented a different, experimental concurrency model for
SetStatusBit using cmpxchg. A define, USE_CONTENT_LOCK, controls which
of the two variants is used.
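To illustrate the difference between the two concurrency models toggled by USE_CONTENT_LOCK, here is a minimal sketch of setting a transaction's two status bits either under an exclusive lock or with a compare-and-exchange retry loop. It is not the code in 0003. It assumes status bits packed into 32-bit words (16 transactions per word) rather than the clog's byte layout, it uses C11 atomics and a pthread mutex in place of PostgreSQL's own primitives, and set_status_bits, get_status_bits and the XS_* constants are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Two status bits per transaction, as in the clog; here packed into
 * 32-bit words, so 16 transactions share one word (an assumption). */
#define STATUS_BITS_PER_XACT 2
#define XACTS_PER_WORD       (32 / STATUS_BITS_PER_XACT)
#define STATUS_MASK          ((1u << STATUS_BITS_PER_XACT) - 1)

/* Hypothetical status codes, using the same 2-bit encoding as the clog. */
enum
{
    XS_IN_PROGRESS = 0,
    XS_COMMITTED = 1,
    XS_ABORTED = 2,
    XS_SUB_COMMITTED = 3
};

#ifdef USE_CONTENT_LOCK
#include <pthread.h>
static pthread_mutex_t content_lock = PTHREAD_MUTEX_INITIALIZER;
#endif

/* Set the status of transaction xid within a page of atomic words. */
static void
set_status_bits(_Atomic uint32_t *page, uint32_t xid, uint32_t status)
{
    _Atomic uint32_t *word = &page[xid / XACTS_PER_WORD];
    unsigned    shift = (xid % XACTS_PER_WORD) * STATUS_BITS_PER_XACT;
    uint32_t    mask = STATUS_MASK << shift;

    assert(status <= STATUS_MASK);

#ifdef USE_CONTENT_LOCK
    /* Lock-based variant: an exclusive lock serializes all writers. */
    pthread_mutex_lock(&content_lock);
    uint32_t    old = atomic_load_explicit(word, memory_order_relaxed);

    atomic_store_explicit(word, (old & ~mask) | (status << shift),
                          memory_order_relaxed);
    pthread_mutex_unlock(&content_lock);
#else
    /* Lock-free variant: retry until no other backend modified the word
     * between our read and our write. On failure, the CAS reloads old. */
    uint32_t    old = atomic_load(word);
    uint32_t    new_word;

    do
    {
        new_word = (old & ~mask) | (status << shift);
    } while (!atomic_compare_exchange_weak(word, &old, new_word));
#endif
}

/* Read back the two status bits of transaction xid. */
static uint32_t
get_status_bits(_Atomic uint32_t *page, uint32_t xid)
{
    uint32_t    word = atomic_load(&page[xid / XACTS_PER_WORD]);
    unsigned    shift = (xid % XACTS_PER_WORD) * STATUS_BITS_PER_XACT;

    return (word >> shift) & STATUS_MASK;
}
```

With the CAS variant, two backends updating different transactions in the same word never block each other; one of them simply retries. Whether that wins in practice over the lock-based variant is a benchmarking question.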
I've tested this and saw it outperform Amit's approach, especially when
using a read/write mix rather than only reads. I saw an increase of over
30% on a large EC2 instance with -btpcb-like@1 -bselect-only@3. But
that's a virtualized environment, which is not very good for reproducibility.
Amit, could you run benchmarks on your bigger hardware, both with
USE_CONTENT_LOCK commented out and with it defined?
I think we should go for 1) and 2) unconditionally, and then evaluate
whether to go with your approach or 3) from above. If the latter, we'll
have to do some cleanup :)
I have been testing Amit's patch in various setups and workloads, with
up to 400 connections on a 2 x Xeon E5-2683 (28C/56T @ 2 GHz), and am
not seeing an improvement, but no regression either.
Testing with 0001 and 0002 does show up to a 5% improvement when using an
HDD for data + WAL, and about 1% when using 2 x RAID10 SSD (unlogged).
I can do a USE_CONTENT_LOCK run on 0003 if it is something for 9.6.
Thanks for your work on this!
Sent via pgsql-hackers mailing list (email@example.com)