On 09/14/2016 06:04 PM, Dilip Kumar wrote:
On Wed, Sep 14, 2016 at 8:59 PM, Robert Haas <robertmh...@gmail.com> wrote:
Sure, but you're testing at *really* high client counts here.  Almost
nobody is going to benefit from a 5% improvement at 256 clients.

I agree with your point, but we need to consider one more thing here:
on head we are gaining ~30% with both approaches.

So for comparing these two patches, we can consider:

A. Other workloads (one could be as below)
   -> Load on CLogControlLock at commit (exclusive mode) + load on
CLogControlLock when checking transaction status (shared mode).
   I think we can mix savepoints and updates.

B. Simplicity of the patch (if both perform almost equally in all
practical scenarios).

C. The algorithm itself, whichever seems the winner.

I will try to test these patches with other workloads...
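As a rough sketch of workload A (script names and the exact mix are my assumptions, not a tested script), one pgbench script could generate commit-time CLogControlLock traffic in exclusive mode via savepoints + updates, while a second script reads recently updated rows, forcing transaction status lookups in shared mode:

```sql
-- write.sql: hypothetical pgbench script; subtransaction commits
-- hit CLogControlLock in exclusive mode
\set id random(1, 100000)
BEGIN;
UPDATE t SET val = val + 1 WHERE id = :id;
SAVEPOINT s1;
UPDATE t SET val = val + 1 WHERE id = :id;
COMMIT;

-- read.sql: hypothetical script; visibility checks on tuples touched
-- by concurrent writers look up clog status in shared mode
\set id random(1, 100000)
SELECT val FROM t WHERE id = :id;
```

Recent pgbench versions can mix such scripts with weights, e.g. `-f write.sql@3 -f read.sql@1`, to vary the exclusive/shared ratio.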

need to test 64 clients and 32 clients and 16 clients and 8 clients
and see what happens there.  Those cases are a lot more likely than
these stratospheric client counts.

I tested with 64 clients as well:
1. On head we are gaining ~15% with both patches.
2. But group lock vs. granular lock is almost the same.

I've been doing some testing too, but I haven't managed to measure any significant difference between master and any of the patches. I'm not sure why; I've repeated the test from scratch to make sure I haven't done anything stupid, but I got the same results (which is one of the main reasons the testing took me so long).

Attached is an archive with a script running the benchmark (including SQL scripts generating the data and custom transaction for pgbench), and results in a CSV format.

The benchmark is fairly simple - for each case (master + 3 different patches) we do 10 runs, 5 minutes each, for 32, 64, 128 and 192 clients (the machine has 32 physical cores).
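The run matrix above (4 builds x 4 client counts x 10 five-minute runs) could be driven by a loop along these lines; the build names, script path and log naming are my assumptions, not the actual script from the attached archive:

```shell
#!/bin/sh
# Sketch of the benchmark driver: 4 builds x 4 client counts x 10 runs.
# "update.sql" is the custom transaction script; names are hypothetical.
# The sketch prints the commands rather than executing them.
total=0
for build in master patch-1 patch-2 patch-3; do
    for clients in 32 64 128 192; do
        run=1
        while [ "$run" -le 10 ]; do
            # -T 300: five minutes per run; one pgbench thread per client
            echo "pgbench -n -M prepared -f update.sql -c $clients -j $clients -T 300 > $build-$clients-$run.log"
            run=$((run + 1))
            total=$((total + 1))
        done
    done
done
echo "$total runs"
```

With 10 runs per combination, the per-case variance is visible in the CSV results rather than hidden behind a single number.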

The transaction is using a single unlogged table initialized like this:

    create unlogged table t(id int, val int);
    insert into t select i, i from generate_series(1,100000) s(i);
    vacuum t;
    create index on t(id);

(I've also run it with 100M rows, labeled "large" in the results), and pgbench is running this transaction:

    \set id random(1, 100000)

    BEGIN;
    UPDATE t SET val = val + 1 WHERE id = :id;
    SAVEPOINT s1;
    UPDATE t SET val = val + 1 WHERE id = :id;
    SAVEPOINT s2;
    UPDATE t SET val = val + 1 WHERE id = :id;
    SAVEPOINT s3;
    UPDATE t SET val = val + 1 WHERE id = :id;
    SAVEPOINT s4;
    UPDATE t SET val = val + 1 WHERE id = :id;
    SAVEPOINT s5;
    UPDATE t SET val = val + 1 WHERE id = :id;
    SAVEPOINT s6;
    UPDATE t SET val = val + 1 WHERE id = :id;
    SAVEPOINT s7;
    UPDATE t SET val = val + 1 WHERE id = :id;
    COMMIT;
So 8 simple UPDATEs interleaved with savepoints. The benchmark was running on a machine with 256GB of RAM, 32 cores (4x E5-4620) and a fairly large SSD array. I'd done some basic tuning on the system, most importantly:

    effective_io_concurrency = 32
    work_mem = 512MB
    maintenance_work_mem = 512MB
    max_connections = 300
    checkpoint_completion_target = 0.9
    checkpoint_timeout = 3600
    max_wal_size = 128GB
    min_wal_size = 16GB
    shared_buffers = 16GB

Most of these settings probably don't matter much for unlogged tables (I planned to check how this affects regular tables, but as I see no difference for unlogged ones, I haven't done that yet).

So the question is why Dilip sees a ~30% improvement while my results are almost exactly the same across the board. Looking at Dilip's benchmark, I see he only ran the test for 10 seconds, and I'm not sure how many runs he did, what warmup he used, etc. Dilip, can you provide additional info?

I'll ask someone else to redo the benchmark after the weekend to make sure it's not actually some stupid mistake of mine.


Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: clog.tgz
Description: application/compressed-tar
