Re: [HACKERS] An attempt to reduce WALWriteLock contention

2017-06-23 Thread Amit Kapila
On Thu, Jun 22, 2017 at 6:54 AM, Andres Freund  wrote:
> On 2017-06-21 00:57:32 -0700, jasrajd wrote:
>> We are also seeing contention on the walwritelock and repeated writes to the
>> same offset if we move the flush outside the lock in the Azure environment.
>> pgbench doesn't scale beyond ~8 cores without saturating the IOPs or
>> bandwidth. Is there more work being done in this area?
>

That should not happen if the writes from various backends are
combined in some way.  However, it is not very clear what exactly you
have done as part of taking the flush calls out of WALWriteLock.  Can
you share the patch or some details about how you have done it, and
how you have measured the contention you are seeing?



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] An attempt to reduce WALWriteLock contention

2017-06-22 Thread Sokolov Yura

On 2017-06-22 04:16, Michael Paquier wrote:
> On Wed, Jun 21, 2017 at 4:57 PM, jasrajd  wrote:
>> We are also seeing contention on the walwritelock and repeated writes to the
>> same offset if we move the flush outside the lock in the Azure environment.
>> pgbench doesn't scale beyond ~8 cores without saturating the IOPs or
>> bandwidth. Is there more work being done in this area?
>
> As of now, there is no patch in the development queue for Postgres 11
> that is dedicated to this particular lock contention. There is a
> patch for LWLocks in general on PowerPC, but that's all:
> https://commitfest.postgresql.org/14/984/
>
> Not sure if Kuntal has plans to submit this patch again. It is
> actually a bit sad to not see things moving on with an approach that
> groups flushes.
> --
> Michael


There is also a patch addressing LWLock degradation on NUMA:
https://commitfest.postgresql.org/14/1166/

But both of them are about the LWLock implementation itself, not about how it is used.

--
Sokolov Yura
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company




Re: [HACKERS] An attempt to reduce WALWriteLock contention

2017-06-21 Thread Kuntal Ghosh
On Thu, Jun 22, 2017 at 6:46 AM, Michael Paquier wrote:
> On Wed, Jun 21, 2017 at 4:57 PM, jasrajd  wrote:
>> We are also seeing contention on the walwritelock and repeated writes to the
>> same offset if we move the flush outside the lock in the Azure environment.
>> pgbench doesn't scale beyond ~8 cores without saturating the IOPs or
>> bandwidth. Is there more work being done in this area?
>
> As of now, there is no patch in the development queue for Postgres 11
> that is dedicated to this particular lock contention. There is a
> patch for LWLocks in general on PowerPC, but that's all:
> https://commitfest.postgresql.org/14/984/
>
> Not sure if Kuntal has plans to submit this patch again. It is
> actually a bit sad to not see things moving on with an approach that
> groups flushes.
As of now, I've no plans to re-submit the patch. Actually, I'm not
sure what I should try next. I would love to get some advice/direction
regarding this.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] An attempt to reduce WALWriteLock contention

2017-06-21 Thread Kuntal Ghosh
On Thu, Dec 22, 2016 at 11:29 PM, Tomas Vondra wrote:
>
> How do these counts compare to the other wait events? For example
> CLogControlLock, which is what Amit's patch [1] is about?
>
> [1]
> https://www.postgresql.org/message-id/flat/84c22fbb-b9c4-a02f-384b-b4feb2c67193%402ndquadrant.com
>
Hello Tomas,

I'm really sorry for the late reply; I somehow missed the thread. I did
see some performance improvement when testing together with the
CLogControlLock patch, but it turned out that all of the improvement came
from the CLogControlLock patch alone.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] An attempt to reduce WALWriteLock contention

2017-06-21 Thread Andres Freund
On 2017-06-21 00:57:32 -0700, jasrajd wrote:
> We are also seeing contention on the walwritelock and repeated writes to the
> same offset if we move the flush outside the lock in the Azure environment.
> pgbench doesn't scale beyond ~8 cores without saturating the IOPs or
> bandwidth. Is there more work being done in this area?

I kind of doubt that scalability limit is directly related to this patch;
I've seen Postgres scale further without that lock becoming the prime
issue.  What exactly are you measuring / observing?

Andres Freund




Re: [HACKERS] An attempt to reduce WALWriteLock contention

2017-06-21 Thread Michael Paquier
On Wed, Jun 21, 2017 at 4:57 PM, jasrajd  wrote:
> We are also seeing contention on the walwritelock and repeated writes to the
> same offset if we move the flush outside the lock in the Azure environment.
> pgbench doesn't scale beyond ~8 cores without saturating the IOPs or
> bandwidth. Is there more work being done in this area?

As of now, there is no patch in the development queue for Postgres 11
that is dedicated to this particular lock contention. There is a
patch for LWLocks in general on PowerPC, but that's all:
https://commitfest.postgresql.org/14/984/

Not sure if Kuntal has plans to submit this patch again. It is
actually a bit sad to not see things moving on with an approach that
groups flushes.
-- 
Michael




Re: [HACKERS] An attempt to reduce WALWriteLock contention

2017-06-21 Thread jasrajd
We are also seeing contention on the walwritelock and repeated writes to the
same offset if we move the flush outside the lock in the Azure environment.
pgbench doesn't scale beyond ~8 cores without saturating the IOPs or
bandwidth. Is there more work being done in this area?







Re: [HACKERS] An attempt to reduce WALWriteLock contention

2016-12-22 Thread Tomas Vondra

On 12/22/2016 04:00 PM, Kuntal Ghosh wrote:
> Hello all,
>
> ...
>
> \t
> select wait_event_type, wait_event from pg_stat_activity where pid !=
> pg_backend_pid();
> \watch 0.5
> HEAD
>
> 48642 LWLockNamed | WALWriteLock
>
> With Patch
> --
> 31889 LWLockNamed | WALFlushLock
> 25212 LWLockNamed | WALWriteLock



How do these counts compare to the other wait events? For example 
CLogControlLock, which is what Amit's patch [1] is about?


[1] 
https://www.postgresql.org/message-id/flat/84c22fbb-b9c4-a02f-384b-b4feb2c67193%402ndquadrant.com


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] An attempt to reduce WALWriteLock contention

2016-12-22 Thread Kuntal Ghosh
Hello all,

In a recent post [1] by Robert, wait events for different LWLocks were
analyzed. The results clearly indicate significant lock contention
overhead on WALWriteLock. To get an idea of this overhead, we ran the
following two tests.

1. Hacked the code to comment out the WAL write and flush calls (sketched
below) to measure the overhead of WAL writing. The TPS for read-write
pgbench tests at scale factor 300 with 64 clients increased from 27871 to
45068.

2. Hacked the code to comment out only the WAL flush calls to measure the
overhead of WAL flushing (fsync=off). The TPS for read-write pgbench tests
at scale factor 300 with 64 clients increased from 27871 to 41835.
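For reference, here is a heavily simplified sketch of where those two calls
live. It is not the actual source of XLogWrite() in
src/backend/access/transam/xlog.c, which loops over dirty WAL buffer pages
and handles segment switches; "from" and "nbytes" below are placeholders:

static void
XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
{
    /* ... collect the dirty WAL buffer pages up to WriteRqst.Write ... */

    /* Test 1 commented out both steps below; test 2 only the fsync. */

    /* OS write of the accumulated pages */
    if (write(openLogFile, from, nbytes) != nbytes)
        ereport(PANIC,
                (errcode_for_file_access(),
                 errmsg("could not write to log file")));

    /* flush the written data to durable storage, when requested */
    issue_xlog_fsync(openLogFile, openLogSegNo);
}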

All the tests were run for 15 minutes with the following PostgreSQL
settings:
max_wal_size: 40GB
checkpoint_timeout: 15min
maintenance_work_mem: 4GB
checkpoint_completion_target: 0.9
shared_buffers: 8GB
(Other settings have default values)

From the above experiments, it is clear that the flush is the main cost in
WAL writing. That is no surprise, but the data above quantifies the
overhead. Robert and Amit suggested (in offline discussions) using a
separate WALFlushLock to flush the WAL data. The idea is to take the WAL
flush calls out of WALWriteLock and introduce a new lock (WALFlushLock)
under which the data is flushed. This should allow OS writes to proceed
while an fsync is in progress. LWLockAcquireOrWait is used for the newly
introduced WALFlushLock so that flush requests get accumulated. A minimal
sketch of the idea follows.
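To make the two-lock split concrete, this is an illustration only, not the
actual patch: WALFlushLock, XLogWriteUpTo() and XLogFsyncUpTo() are
hypothetical names, while WALWriteLock, LogwrtResult, LWLockAcquire/Release
and LWLockAcquireOrWait are existing PostgreSQL facilities (the sketch
assumes it lives inside xlog.c):

static void
XLogFlushWithSeparateLocks(XLogRecPtr record)
{
    /* Phase 1: write dirty WAL buffers while holding only WALWriteLock. */
    if (record > LogwrtResult.Write)
    {
        LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
        XLogWriteUpTo(record);          /* write(), but no fsync here */
        LWLockRelease(WALWriteLock);
    }

    /* Phase 2: fsync under the new WALFlushLock. */
    while (record > LogwrtResult.Flush)
    {
        /*
         * LWLockAcquireOrWait() returns false if it merely waited for the
         * current holder to release the lock; by then that holder may have
         * flushed our data already, so re-check before retrying.  This is
         * how flush requests from different backends get accumulated.
         */
        if (LWLockAcquireOrWait(WALFlushLock, LW_EXCLUSIVE))
        {
            XLogFsyncUpTo(record);      /* issue_xlog_fsync() etc. */
            LWLockRelease(WALFlushLock);
            break;
        }
        /* lock was released by someone else: refresh LogwrtResult and loop */
    }
}

This mirrors how LWLockAcquireOrWait is already used with WALWriteLock in
XLogFlush() so that a backend can piggyback on another backend's flush.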
We did a pgbench read/write (s.f. 300) test with the above configuration
for various client counts, but we didn't see any performance improvement;
rather, performance decreased by 10%-12%. Hence, to measure the wait
events, we performed a run for 30 minutes with 64 clients.

\t
select wait_event_type, wait_event from pg_stat_activity where pid !=
pg_backend_pid();
\watch 0.5
HEAD

48642 LWLockNamed | WALWriteLock

With Patch
--
31889 LWLockNamed | WALFlushLock
25212 LWLockNamed | WALWriteLock

The contention on WALWriteLock was reduced, but taken together with
WALFlushLock the total contention increased. We also measured the number of
times fsync() and write() were called during a 10-minute pgbench read/write
test with 16 clients. We noticed a huge increase in write() system calls,
and this happens precisely because we reduced the contention on
WALWriteLock.

Due to the reduced contention on WALWriteLock, many backends issue small
OS writes, sometimes to the same 8KB page, i.e., write calls are not being
accumulated properly. For example:
backend 1 - 1 KB write() - 15-20 microseconds
backend 2 - 1 KB write() - 15-20 microseconds
backend 3 - 1 KB write() - 15-20 microseconds
backend 4 - 1 KB write() - 15-20 microseconds
Issued separately, these four writes cost roughly 60-80 microseconds in
total, whereas the same 4KB written as one accumulated request takes only
50-60 microseconds. Apart from that, each separate write also pays for a
lock acquire and release and for an lseek(). For the same reason, while one
fsync is in progress we cannot accumulate sufficient data for the next
fsync, which further increases the contention on WALFlushLock. So, we tried
adding a delay (pg_usleep) before the flush/write to accumulate more data,
but this severely increased the contention on WALFlushLock.

To reduce the contention on WALWriteLock further, Amit suggested the
following change on top of the existing patch:

Backend as write leader:
Except for one proc (the leader), all procs wait for their write location
to be written to the OS buffer. Each proc advertises its write location and
waits on its semaphore until that location has been written. Only the
leader competes for WALWriteLock. After the data is written, the leader
wakes all the procs whose WAL it has written and, once done waking them,
releases WALWriteLock. Ashutosh and Amit helped a lot with the
implementation of this idea. A rough sketch of the protocol follows.
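The sketch below is an illustration of that protocol only, loosely modeled
on the group-update approach used for CLogControlLock, and not the actual
patch: the list head walGroupFirst (which would live in shared memory, e.g.
in XLogCtl), the PGPROC fields walWriteUpto, walGroupNext and walWriteDone,
and the helper WALGroupWriteAndWakeup() are hypothetical, while the
atomics, the semaphore calls and WALWriteLock are real (semaphore handling
is simplified):

static void
XLogGroupWrite(XLogRecPtr upto)
{
    PGPROC *proc = MyProc;
    uint32  head;

    /* Advertise our write location and push ourselves onto the list. */
    proc->walWriteUpto = upto;
    proc->walWriteDone = false;
    do
    {
        head = pg_atomic_read_u32(&walGroupFirst);
        proc->walGroupNext = head;
    } while (!pg_atomic_compare_exchange_u32(&walGroupFirst, &head,
                                             (uint32) proc->pgprocno));

    if (head != INVALID_PGPROCNO)
    {
        /* Not the leader: sleep until the leader has written our WAL. */
        for (;;)
        {
            PGSemaphoreLock(proc->sem);
            if (proc->walWriteDone)
                break;
        }
        return;
    }

    /* We are the leader: only we compete for WALWriteLock. */
    LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);

    /*
     * Detach the list, write WAL up to the maximum advertised location,
     * then set walWriteDone and PGSemaphoreUnlock() each member before
     * releasing the lock.
     */
    WALGroupWriteAndWakeup();

    LWLockRelease(WALWriteLock);
}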
Even with this change, we didn't see any noticeable performance improvement
in synchronous_commit=on mode, although there was no regression either.
Again, to measure the wait events, we performed a 30-minute run with 64
clients (pgbench read/write test, s.f. 300).

\t
select wait_event_type, wait_event from pg_stat_activity where pid !=
pg_backend_pid();
\watch 0.5
HEAD

48642  LWLockNamed | WALWriteLock

With Patch
--
38952 LWLockNamed | WALFlushLock
1679 LWLockNamed | WALWriteLock

We reduced the contention on WALWriteLock because only the group leader
competes for the write lock on behalf of a whole group of procs. Still, the
number of small write requests was not reduced.

Finally, we performed some tests with synchronous_commit=off and a data set
that does not fit in shared buffers. This should accumulate the data
properly for writing without waiting on locks or semaphores; besides, write
and fsync can then proceed simultaneously. The next results are for various
scale factors and shared_buffers settings (please see below for the system
configuration):

./pgbench -c $threads -j $threads -T 900 -M prepared postgres
n