On Mon, Sep 5, 2016 at 11:34 PM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
> On 09/05/2016 06:03 AM, Amit Kapila wrote:
>>  So, in short we have to compare three
>> approaches here.
>> 1) Group mode to reduce CLOGControlLock contention
>> 2) Use granular locking model
>> 3) Use atomic operations
>> For approach-1, you can use patch [1].  For approach-2, you can use
>> 0001-Improve-64bit-atomics-support patch[2] and the patch attached
>> with this mail.  For approach-3, you can use
>> 0001-Improve-64bit-atomics-support patch[2] and the patch attached
>> with this mail by commenting out USE_CONTENT_LOCK.  If the third doesn't
>> work for you then for now we can compare approach-1 and approach-2.
> OK, I can compile all three cases - but only with gcc 4.7 or newer. Sadly
> the 4-socket 64-core machine runs Debian Jessie with just gcc 4.6 and my
> attempts to update to a newer version were unsuccessful so far.

So which of the patches are you able to compile on the 4-socket machine?  I
think it is better to measure the performance on the bigger machine.

>> I have done some testing of these patches for read-write pgbench
>> workload and didn't find a big difference.  Now the interesting test
>> case could be to use a few sub-transactions (maybe 4-8) for each
>> transaction, as with that we can see more contention for
>> CLOGControlLock.
> Understood. So a bunch of inserts/updates interleaved by savepoints?
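
A test along those lines could be sketched roughly as below.  This is
only an illustration, not a script used for any of the measurements in
this thread: the helper name savepoint_script and the choice of updating
pgbench_accounts are assumptions.  It generates a pgbench custom script
in which each transaction opens a few savepoints and writes after each
one, so that every sub-transaction is assigned its own XID and has to be
recorded in the CLOG at commit.

```python
def savepoint_script(n_savepoints=4):
    """Return a pgbench custom script with n_savepoints sub-transactions.

    Hypothetical helper for illustration only. The \\set ... random()
    syntax needs pgbench 9.6 or later; older versions use \\setrandom.
    """
    lines = [
        r"\set aid random(1, 100000 * :scale)",
        r"\set delta random(-5000, 5000)",
        "BEGIN;",
    ]
    for i in range(1, n_savepoints + 1):
        # A write after each SAVEPOINT forces XID assignment for that
        # sub-transaction, so it takes part in the CLOG update at commit.
        lines.append(f"SAVEPOINT s{i};")
        lines.append(
            "UPDATE pgbench_accounts SET abalance = abalance + :delta "
            "WHERE aid = :aid;"
        )
    lines.append("END;")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the script; redirect to a file and pass it to pgbench with -f.
    print(savepoint_script(4), end="")
```

The number of savepoints is a parameter so that the 4-8 range can be
tried without editing the script by hand.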


> I presume you started looking into this based on a real-world
> performance issue, right? Would that be a good test case?

I started looking into it based on LWLOCK_STATS data for a
read-write workload (pgbench tpc-b).  I think it is representative of
many real-world read-write workloads.

>> A few points to note for performance testing: one should use
>> --unlogged-tables, else the WAL writing and WALWriteLock contention
>> mask the impact of this patch.  The impact of this patch is visible
>> at higher client counts (say at 64~128).
> Even on good hardware (say, PCIe SSD storage that can do thousands of
> fsyncs per second)?

Not sure, because it could be masked by WALWriteLock contention.
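
For reference, the kind of run described above (unlogged tables, 64 and
128 clients) could be scripted roughly as follows.  This is a sketch
under assumptions: the database name, scale factor and run duration are
placeholders, and pgbench_commands is a hypothetical helper.  Only the
pgbench options themselves (-i, --unlogged-tables, -s, -M, -c, -j, -T,
-f) are real.

```python
import shlex


def pgbench_commands(db="postgres", scale=300, clients=(64, 128),
                     seconds=300, script=None):
    """Build pgbench command lines: one init step, then one run per client count.

    Hypothetical helper; db, scale and seconds are placeholder values.
    """
    # Initialize with unlogged tables so WAL writes do not mask the effect.
    commands = [["pgbench", "-i", "--unlogged-tables", "-s", str(scale), db]]
    for c in clients:
        cmd = ["pgbench", "-M", "prepared",
               "-c", str(c), "-j", str(c), "-T", str(seconds)]
        if script is not None:
            # Optional custom script instead of the built-in tpc-b-like one.
            cmd += ["-f", script]
        cmd.append(db)
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in pgbench_commands():
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Running the same commands against tables initialized without
--unlogged-tables gives the logged-table numbers for comparison.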

> Does it then make sense to try optimizing this if
> the effect can only be observed without the WAL overhead (so almost
> never in practice)?

It is not that there is no improvement with WAL overhead (one can
observe it via LWLOCK_STATS, apart from TPS), but it is clearly
visible with unlogged tables.  The situation is not that simple:
suppose we do nothing about the remaining contention on
CLOGControlLock; then, when we try to reduce the contention around
other locks like WALWriteLock or maybe ProcArrayLock, there is a
chance that the contention will shift to CLOGControlLock.  So the basic
idea is that, to get the big benefits, we need to eliminate contention
around each of the locks.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)