On Thu, 26 Nov 2020 at 10:55, Krunal Bauskar <[email protected]> wrote:
> Hardware: ARM Kunpeng 920 BareMetal Server 2.6 GHz. 64 cores (56 cores for
> server and 8 for client) [2 numa nodes]
> Storage: 3.2 TB NVMe SSD
> OS: CentOS Linux release 7.6
> PGSQL: baseline = Release Tag 13.1
> Invocation suite:
> https://github.com/mysqlonarm/benchmark-suites/tree/master/pgsql-pbench (Uses
> pgbench)

Using the same hardware, I have attached my improvement figures, which are
pretty much in line with yours, except that I did not run with more than 400
clients. I am also getting some improvement for select-only workloads in the
200-400 client range.

For the read-write load, I had seen that the s_lock() contention was caused
when XLogFlush() uses the spinlock. For the read-only case, I have not
analyzed where the improvement comes from.

The .png files in the attached tar have the graphs for head versus patch.

The GUCs that I changed:

work_mem = 64MB
shared_buffers = 128GB
maintenance_work_mem = 1GB
min_wal_size = 20GB
max_wal_size = 100GB
checkpoint_timeout = 60min
checkpoint_completion_target = 0.9
full_page_writes = on
synchronous_commit = on
effective_io_concurrency = 200
log_checkpoints = on

For backends, 64 CPUs were allotted (covering 2 NUMA nodes), and for the
pgbench clients a separate set of 28 CPUs was allotted on a different socket.
The server was pre_warmed().

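For anyone who wants to put head and patch numbers side by side as a percentage, here is a minimal sketch of how the improvement could be computed per client count. This is only an illustration, not the script used for the attached graphs: the function names are hypothetical, and the TPS values in the example are made-up placeholders, not measurements from this run.

```python
from typing import Dict


def pct_improvement(head_tps: float, patch_tps: float) -> float:
    """Relative change of patch over head, in percent (positive = patch faster)."""
    if head_tps <= 0:
        raise ValueError("head_tps must be positive")
    return (patch_tps - head_tps) * 100.0 / head_tps


def improvement_by_clients(
    head: Dict[int, float], patch: Dict[int, float]
) -> Dict[int, float]:
    """Percent improvement for every client count present in both runs."""
    common = sorted(head.keys() & patch.keys())
    return {clients: pct_improvement(head[clients], patch[clients]) for clients in common}


if __name__ == "__main__":
    # Placeholder TPS values for illustration only; NOT taken from results.tar.gz.
    head = {200: 1000.0, 400: 1000.0}
    patch = {200: 1100.0, 400: 1250.0}
    for clients, pct in improvement_by_clients(head, patch).items():
        print(f"{clients} clients: {pct:+.1f}%")
```

Client counts that appear in only one of the two runs are skipped, so a partial run on either side does not produce misleading entries.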
results.tar.gz
Description: application/gzip
