On 09/14/2016 06:04 PM, Dilip Kumar wrote:
> On Wed, Sep 14, 2016 at 8:59 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> Sure, but you're testing at *really* high client counts here. Almost nobody is going to benefit from a 5% improvement at 256 clients.
>
> I agree with your point, but here we need to consider one more thing: on head we are gaining ~30% with both approaches. So for comparing these two patches we can consider:
>
> A. Other workloads (one can be as below) -> load on CLogControlLock at commit (exclusive mode) + load on CLogControlLock at transaction status (shared mode). I think we can mix (savepoint + updates).
> B. Simplicity of the patch (if both perform almost equally in all practical scenarios).
> C. Based on the algorithm, whichever seems the winner.
>
> I will try to test these patches with other workloads...
>
>> You need to test 64 clients and 32 clients and 16 clients and 8 clients and see what happens there. Those cases are a lot more likely than these stratospheric client counts.
>
> I tested with 64 clients as well:
> 1. On head we are gaining ~15% with both patches.
> 2. But group lock vs. granular lock is almost the same.
I've been doing some testing too, but I haven't managed to measure any significant difference between master and any of the patches. I'm not sure why; I've repeated the test from scratch to make sure I didn't do anything stupid, but I got the same results (which is one of the main reasons the testing took me so long).
Attached is an archive with a script running the benchmark (including the SQL scripts generating the data and the custom transaction for pgbench), and the results in CSV format.
The benchmark is fairly simple - for each case (master + 3 different patches) we do 10 runs, 5 minutes each, for 32, 64, 128 and 192 clients (the machine has 32 physical cores).
The transaction uses a single unlogged table, initialized like this:

    create unlogged table t(id int, val int);
    insert into t select i, i from generate_series(1,100000) s(i);
    vacuum t;
    create index on t(id);

(I've also run it with 100M rows, called "large" in the results), and pgbench is running this transaction:
    \set id random(1, 100000)
    BEGIN;
    UPDATE t SET val = val + 1 WHERE id = :id;
    SAVEPOINT s1;
    UPDATE t SET val = val + 1 WHERE id = :id;
    SAVEPOINT s2;
    UPDATE t SET val = val + 1 WHERE id = :id;
    SAVEPOINT s3;
    UPDATE t SET val = val + 1 WHERE id = :id;
    SAVEPOINT s4;
    UPDATE t SET val = val + 1 WHERE id = :id;
    SAVEPOINT s5;
    UPDATE t SET val = val + 1 WHERE id = :id;
    SAVEPOINT s6;
    UPDATE t SET val = val + 1 WHERE id = :id;
    SAVEPOINT s7;
    UPDATE t SET val = val + 1 WHERE id = :id;
    SAVEPOINT s8;
    COMMIT;

So 8 simple UPDATEs interleaved with savepoints. The benchmark was run on a machine with 256GB of RAM, 32 cores (4x E5-4620) and a fairly large SSD array. I'd done some basic tuning on the system, most importantly:
    effective_io_concurrency = 32
    work_mem = 512MB
    maintenance_work_mem = 512MB
    max_connections = 300
    checkpoint_completion_target = 0.9
    checkpoint_timeout = 3600
    max_wal_size = 128GB
    min_wal_size = 16GB
    shared_buffers = 16GB

Although most of these changes probably don't matter much for unlogged tables (I planned to see how this affects regular tables, but as I see no difference for unlogged ones, I haven't done that yet).
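For reference, the driver loop boils down to roughly this (a minimal sketch; the file names transaction.sql and results.log are placeholders here, the actual invocation is in the attached script):

    #!/bin/bash
    # For each client count, do 10 runs of 5 minutes each with the custom
    # transaction shown above. The same loop is repeated for each build
    # (master + the three patches) and for both data set sizes.
    for clients in 32 64 128 192; do
        for run in $(seq 1 10); do
            # -n skips pgbench's built-in vacuum (we use a custom script),
            # -j 32 gives one pgbench thread per physical core
            pgbench -n -f transaction.sql -c $clients -j 32 -T 300 test \
                >> results.log
        done
    done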
So the question is why Dilip sees a +30% improvement while my results are almost exactly the same for master and the patches. Looking at Dilip's benchmark, I see he only ran the test for 10 seconds, and I'm not sure how many runs he did, whether there was a warmup, etc. Dilip, can you provide additional info?
I'll ask someone else to redo the benchmark after the weekend to make sure it's not actually some stupid mistake of mine.
regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
clog.tgz