[HACKERS] Re[2]: [HACKERS] 9.4 -> 9.5 regression with queries through pgbouncer on RHEL 6
>> The results above are not really fair: pgbouncer.ini was a bit different on
>> the Ubuntu host (application_name_add_host was disabled). Here are the
>> correct results with exactly the same configuration:
>>
>> OS            PostgreSQL version   TPS     Avg. latency
>> RHEL 6        9.4                  44898   1.425 ms
>> RHEL 6        9.5                  26199   2.443 ms
>> RHEL 6        9.5                  43027   1.487 ms
>> Ubuntu 14.04  9.4                  45971   1.392 ms
>> Ubuntu 14.04  9.5                  40282   1.589 ms
>> Ubuntu 14.04  9.6                  45410   1.409 ms
>>
>> There is a regression for 9.5 on Ubuntu as well, but it is much less
>> significant. We first thought the cause was
>> 38628db8d8caff21eb6cf8d775c0b2d04cf07b9b (Add memory barriers for
>> PgBackendStatus.st_changecount protocol), but in that case the regression
>> would also show up in 9.6.
>>
>> There was also a bunch of changes in FE/BE communication (such as
>> 387da18874afa17156ee3af63766f17efb53c4b9 or
>> 98a64d0bd713cb89e61bef6432befc4b7b5da59e) that may explain the regression
>> in 9.5 and the normal results in 9.6. The right way to find the answer is
>> probably to bisect. I'll do it, but if more diagnostic information would
>> help, feel free to ask.
>
> Yep, bisect confirms that the first bad commit in REL9_5_STABLE is
> 387da18874afa17156ee3af63766f17efb53c4b9. Full output is attached.
> And bisect on the master branch confirms that the situation became much
> better after 98a64d0bd713cb89e61bef6432befc4b7b5da59e. Output is also attached.
>
> On Ubuntu the performance degradation is ~15% and on RHEL it is ~100%. I
> don't know what causes the different numbers on RHEL and Ubuntu, but there
> is certainly a regression when pgbouncer connects to postgres through
> localhost. When I connect pgbouncer to postgres through a unix socket,
> performance is consistently bad on all postgres versions.
>
> Both servers are for testing, but I can easily provide you SSH access to the
> Ubuntu host (only) if necessary. I can also gather more diagnostics if needed.
We haven't found anything better than backporting
98a64d0bd713cb89e61bef6432befc4b7b5da59e from 9.6 to 9.5. It completely
solved the problem. In case anyone is interested or runs into the same
problem, the patch is attached.

Regards,
Dmitriy Sarafannikov

diff --git a/configure b/configure
index adee368..311340f 100755
--- a/configure
+++ b/configure
@@ -9364,7 +9364,7 @@ fi
 done
 
 
-for ac_header in atomic.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h
+for ac_header in atomic.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h pwd.h sys/epoll.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h
 do :
   as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh`
 ac_fn_c_check_header_mongrel "$LINENO" "$ac_header" "$as_ac_Header" "$ac_includes_default"
diff --git a/configure.in b/configure.in
index 5025798..66e8fa9 100644
--- a/configure.in
+++ b/configure.in
@@ -1082,7 +1082,7 @@ AC_SUBST(UUID_LIBS)
 ##
 
 dnl sys/socket.h is required by AC_FUNC_ACCEPT_ARGTYPES
-AC_CHECK_HEADERS([atomic.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h])
+AC_CHECK_HEADERS([atomic.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h pwd.h sys/epoll.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h])
 
 # On BSD, test for net/if.h will fail unless sys/socket.h
 # is included first.
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 26d8faa..52b35a2 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -139,16 +139,38 @@ retry:
 		/* In blocking mode, wait until the socket is ready */
 		if (n < 0 && !port->noblock && (errno == EWOULDBLOCK || errno == EAGAIN))
 		{
-			int			w;
+			WaitEvent	event;
 
 			Assert(waitfor);
 
-			w = WaitLatchOrSocket(MyLatch,
-								  WL_LATCH_SET | waitfor,
-								  port->sock, 0);
+			ModifyWaitEvent(FeBeWaitSet, 0, waitfor, NULL);
+
+			WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , &event, 1);
+
+			/*
+			 * If the postmaster has died, it's not safe to continue running,
+			 * because it is the postmaster's job to kill us if some other backend
+			 * exists uncleanly.  Moreover, we won't run very well in this state;
+			 * helper processes like walwriter and the bgwriter will exit, so
+			 * performance may
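As far as the excerpt of the patch shows, the backport replaces the per-call `WaitLatchOrSocket(MyLatch, WL_LATCH_SET | waitfor, port->sock, 0)` with a long-lived wait set, `FeBeWaitSet`, whose socket event mask is adjusted with `ModifyWaitEvent` before each `WaitEventSetWait`; the `configure` hunks add a `sys/epoll.h` check, presumably so that this set can be backed by epoll on Linux. The thread does not pin down exactly why this helps, but one general difference is that `poll()` is handed and registers every descriptor again on each call, while an epoll instance keeps its registrations in the kernel between waits. The standalone Linux sketch below illustrates only that reuse pattern with raw epoll calls; it is not PostgreSQL code, and `fe_wait_set` and its functions are hypothetical names.

```c
/*
 * Sketch, not PostgreSQL code: one long-lived epoll set per connection,
 * reused for every blocking wait instead of re-registering descriptors
 * on each call. Linux-only. All names here are hypothetical.
 */
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>   /* socketpair(), used by the demo below */
#include <unistd.h>

typedef struct
{
    int epfd;   /* epoll instance, created once */
    int sock;   /* client socket registered in the set */
} fe_wait_set;

/* Create the set once and register the socket with an empty event mask. */
static int
fe_wait_set_create(fe_wait_set *ws, int sock)
{
    struct epoll_event ev = {0};

    ws->epfd = epoll_create1(EPOLL_CLOEXEC);
    if (ws->epfd < 0)
        return -1;
    ws->sock = sock;
    ev.data.fd = sock;
    if (epoll_ctl(ws->epfd, EPOLL_CTL_ADD, sock, &ev) < 0)
    {
        close(ws->epfd);
        return -1;
    }
    return 0;
}

/*
 * Wait for the requested events (EPOLLIN and/or EPOLLOUT) on the socket.
 * Only the mask is changed; the registration itself persists.
 * Returns the ready event bits, 0 on timeout, or -1 on error.
 */
static int
fe_wait_set_wait(fe_wait_set *ws, uint32_t events, int timeout_ms)
{
    struct epoll_event ev = {0};
    struct epoll_event ready;
    int n;

    assert(events != 0);
    ev.events = events;
    ev.data.fd = ws->sock;
    if (epoll_ctl(ws->epfd, EPOLL_CTL_MOD, ws->sock, &ev) < 0)
        return -1;
    n = epoll_wait(ws->epfd, &ready, 1, timeout_ms);
    return n <= 0 ? n : (int) ready.events;
}

static void
fe_wait_set_free(fe_wait_set *ws)
{
    close(ws->epfd);
}

/* Demo with a socketpair standing in for a client connection; 0 = success. */
static int
fe_wait_set_demo(void)
{
    int sv[2];
    int r_out, r_idle, r_in = -1;
    fe_wait_set ws;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (fe_wait_set_create(&ws, sv[0]) < 0)
    {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    r_out = fe_wait_set_wait(&ws, EPOLLOUT, 0);       /* writable at once */
    r_idle = fe_wait_set_wait(&ws, EPOLLIN, 0);       /* nothing to read: 0 */
    if (write(sv[1], "x", 1) == 1)
        r_in = fe_wait_set_wait(&ws, EPOLLIN, 1000);  /* now readable */

    fe_wait_set_free(&ws);
    close(sv[0]);
    close(sv[1]);
    return (r_out > 0 && (r_out & EPOLLOUT) && r_idle == 0 &&
            r_in > 0 && (r_in & EPOLLIN)) ? 0 : -1;
}
```

Calling `fe_wait_set_demo()` should return 0: the socket pair is writable at once, reports nothing to read with a zero timeout, and becomes readable after one byte is written to the peer. In a server following this pattern, the set would be created once per connection and the wait function called whenever a non-blocking `recv()` or `send()` fails with `EAGAIN`.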
Re: [HACKERS] 9.4 -> 9.5 regression with queries through pgbouncer on RHEL 6
On 27 May 2016, at 19:57, Vladimir Borodin wrote:

-performance
+hackers

On 25 May 2016, at 17:33, Vladimir Borodin wrote:

Hi all.

We have found that queries through PgBouncer 1.7.2 (with transaction pooling)
to a local PostgreSQL are almost two times slower in 9.5.3 than in 9.4.8 on
RHEL 6 hosts (all packages are updated to the latest versions). Meanwhile, the
problem can't be reproduced on, e.g., Ubuntu 14.04 (also fully updated).

Here is what the results look like for 9.4, 9.5 and 9.6. All are built from
yesterday's latest commits in
* REL9_4_STABLE (a0cc89a28141595d888d8aba43163d58a1578bfb),
* REL9_5_STABLE (e504d915bbf352ecfc4ed335af934e799bf01053),
* master (6ee7fb8244560b7a3f224784b8ad2351107fa55d).

All of them are built on the host where the testing is done (with stock gcc
versions). Sysctls, pgbouncer config and everything else we found are the
same, postgres configs are default, and PGDATA is in tmpfs. All numbers are
reproducible; they are stable between runs.

In short:

OS            PostgreSQL version   TPS     Avg. latency
RHEL 6        9.4                  44898   1.425 ms
RHEL 6        9.5                  26199   2.443 ms
RHEL 6        9.5                  43027   1.487 ms
Ubuntu 14.04  9.4                  67458   0.949 ms
Ubuntu 14.04  9.5                  64065   0.999 ms
Ubuntu 14.04  9.6                  64350   0.995 ms

The results above are not really fair: pgbouncer.ini was a bit different on
the Ubuntu host (application_name_add_host was disabled). Here are the correct
results with exactly the same configuration:

OS            PostgreSQL version   TPS     Avg. latency
RHEL 6        9.4                  44898   1.425 ms
RHEL 6        9.5                  26199   2.443 ms
RHEL 6        9.5                  43027   1.487 ms
Ubuntu 14.04  9.4                  45971   1.392 ms
Ubuntu 14.04  9.5                  40282   1.589 ms
Ubuntu 14.04  9.6                  45410   1.409 ms

There is a regression for 9.5 on Ubuntu as well, but it is much less
significant.
We first thought the cause was 38628db8d8caff21eb6cf8d775c0b2d04cf07b9b (Add
memory barriers for PgBackendStatus.st_changecount protocol), but in that case
the regression would also show up in 9.6.

There was also a bunch of changes in FE/BE communication (such as
387da18874afa17156ee3af63766f17efb53c4b9 or
98a64d0bd713cb89e61bef6432befc4b7b5da59e) that may explain the regression in
9.5 and the normal results in 9.6. The right way to find the answer is
probably to bisect. I'll do it, but if more diagnostic information would help,
feel free to ask.

Yep, bisect confirms that the first bad commit in REL9_5_STABLE is
387da18874afa17156ee3af63766f17efb53c4b9. Full output is attached.
And bisect on the master branch confirms that the situation became much better
after 98a64d0bd713cb89e61bef6432befc4b7b5da59e. Output is also attached.

On Ubuntu the performance degradation is ~15% and on RHEL it is ~100%. I don't
know what causes the different numbers on RHEL and Ubuntu, but there is
certainly a regression when pgbouncer connects to postgres through localhost.
When I connect pgbouncer to postgres through a unix socket, performance is
consistently bad on all postgres versions.

Both servers are for testing, but I can easily provide you SSH access to the
Ubuntu host (only) if necessary. I can also gather more diagnostics if needed.
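`git bisect` finds the first bad commit by, in essence, a binary search over the commits between a known-good and a known-bad revision, assuming every commit after the culprit is also bad. The C sketch below shows that search simplified to a linear history of commits numbered 0 to n-1; the `is_bad` predicate is hypothetical and stands in for "build this commit and rerun the pgbouncer benchmark". For the master-branch run, which looked for the point where performance got better, the same search applies with the predicate inverted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate: does commit number i show the regression? */
typedef bool (*bisect_pred)(size_t i, void *ctx);

/*
 * Return the index of the first "bad" commit in [0, n), assuming the
 * predicate is false for a prefix of the range and true for the rest.
 * Returns n if no commit is bad. Calls is_bad() O(log n) times.
 */
static size_t
first_bad(size_t n, bisect_pred is_bad, void *ctx)
{
    size_t lo = 0;
    size_t hi = n;              /* invariant: the answer lies in [lo, hi] */

    assert(is_bad != NULL);
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (is_bad(mid, ctx))
            hi = mid;           /* culprit is mid or an earlier commit */
        else
            lo = mid + 1;       /* culprit comes after mid */
    }
    return lo;
}

/* Toy predicate for the example: commits at or after *threshold are bad. */
static bool
bad_from_threshold(size_t i, void *ctx)
{
    return i >= *(const size_t *) ctx;
}
```

With 100 commits and the culprit at index 37, `first_bad(100, bad_from_threshold, &culprit)` returns 37 after at most seven probes, which is why bisecting even a long stable branch needs only a few builds and benchmark runs.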
bisect95.out
Description: Binary data

bisect96.out
Description: Binary data

As you can see, the difference between major versions on Ubuntu is not
significant, but on RHEL, 9.5 is 70% slower than 9.4 and 9.6.

Below are more details.

RHEL 6:

postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg94'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 2693962
latency average: 1.425 ms
tps = 44897.461518 (including connections establishing)
tps = 44898.763258 (excluding connections establishing)
postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg95'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 1572014
latency average: 2.443 ms
tps = 26198.928627 (including connections establishing)
tps = 26199.803363 (excluding connections establishing)
postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg96'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 2581645
latency average: 1.487 ms
tps = 43025.676995 (including connections establishing)
tps = 43027.038275 (excluding connections establishing)
postgres@pgload05g ~ $

Ubuntu 14.04 (the same hardware):

postgres@pgloadpublic02:~$ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg94'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of thr
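The pgbench output above is internally consistent in a way that helps when reading the summary tables: in this closed-loop setup each of the 64 clients always has one transaction in flight, so by Little's law the average latency is about clients × duration / transactions, i.e. clients / TPS. The C helpers below (hypothetical names; an arithmetic cross-check, not pgbench's own code) reproduce the printed `latency average` values of the three RHEL 6 runs.

```c
#include <assert.h>

/*
 * Little's law for a closed-loop benchmark: with "clients" sessions, each
 * always running exactly one transaction, the mean latency (ms) is
 * clients * duration / transactions.
 */
static double
avg_latency_ms(double duration_s, int clients, long transactions)
{
    assert(transactions > 0);
    return 1000.0 * duration_s * clients / (double) transactions;
}

/* Throughput over the run, ignoring time spent establishing connections. */
static double
approx_tps(double duration_s, long transactions)
{
    assert(duration_s > 0.0);
    return (double) transactions / duration_s;
}
```

`avg_latency_ms(60, 64, 2693962)`, `avg_latency_ms(60, 64, 1572014)` and `avg_latency_ms(60, 64, 2581645)` give about 1.4254, 2.4427 and 1.4874 ms, which round to the printed 1.425, 2.443 and 1.487 ms; `approx_tps(60, 2693962)` is about 44899, close to the printed 44898.76 (the measured run time is not exactly 60 s). In these runs the latency column therefore carries no information beyond the TPS column.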
Re: [HACKERS] 9.4 -> 9.5 regression with queries through pgbouncer on RHEL 6
> On 28 May 2016, at 0:56, Andres Freund wrote:
>
> On 2016-05-27 19:57:34 +0300, Vladimir Borodin wrote:
>>
>> OS            PostgreSQL version   TPS     Avg. latency
>> RHEL 6        9.4                  44898   1.425 ms
>> RHEL 6        9.5                  26199   2.443 ms
>> RHEL 6        9.5                  43027   1.487 ms
>
> Hm. I'm a bit confused. You show one result for 9.5 with bad and one
> with good performance. I suspect the second one is supposed to be a 9.6?

Sorry, I misunderstood. Yes, the last line above is for 9.6; that was a typo.

>
> Greetings,
>
> Andres Freund

--
May the force be with you…
https://simply.name
Re: [HACKERS] 9.4 -> 9.5 regression with queries through pgbouncer on RHEL 6
> On 28 May 2016, at 0:56, Andres Freund wrote:
>
> Hi,
>
>
> On 2016-05-27 19:57:34 +0300, Vladimir Borodin wrote:
>> -performance
>>> Here is what the results look like for 9.4, 9.5 and 9.6. All are built from
>>> yesterday's latest commits in
>>> * REL9_4_STABLE (a0cc89a28141595d888d8aba43163d58a1578bfb),
>>> * REL9_5_STABLE (e504d915bbf352ecfc4ed335af934e799bf01053),
>>> * master (6ee7fb8244560b7a3f224784b8ad2351107fa55d).
>>>
>>> All of them are built on the host where the testing is done (with stock gcc
>>> versions). Sysctls, pgbouncer config and everything else we found are the
>>> same, postgres configs are default, and PGDATA is in tmpfs. All numbers are
>>> reproducible; they are stable between runs.
>>>
>>> In short:
>>>
>>> OS            PostgreSQL version   TPS     Avg. latency
>>> RHEL 6        9.4                  44898   1.425 ms
>>> RHEL 6        9.5                  26199   2.443 ms
>>> RHEL 6        9.5                  43027   1.487 ms
>>> Ubuntu 14.04  9.4                  67458   0.949 ms
>>> Ubuntu 14.04  9.5                  64065   0.999 ms
>>> Ubuntu 14.04  9.6                  64350   0.995 ms
>>
>> The results above are not really fair: pgbouncer.ini was a bit different on
>> the Ubuntu host (application_name_add_host was disabled). Here are the
>> correct results with exactly the same configuration:
>>
>> OS            PostgreSQL version   TPS     Avg. latency
>> RHEL 6        9.4                  44898   1.425 ms
>> RHEL 6        9.5                  26199   2.443 ms
>> RHEL 6        9.5                  43027   1.487 ms
>> Ubuntu 14.04  9.4                  45971   1.392 ms
>> Ubuntu 14.04  9.5                  40282   1.589 ms
>> Ubuntu 14.04  9.6                  45410   1.409 ms
>
> Hm. I'm a bit confused. You show one result for 9.5 with bad and one
> with good performance. I suspect the second one is supposed to be a 9.6?

No, they are both for 9.5. One of them is on the RHEL 6 host, the other on
Ubuntu 14.04.

>
> Am I understanding correctly that the performance nearly entirely
> recovered with 9.6?

Yes, 9.6 is much better than 9.5.

> If so, I suspect we might be dealing with a memory
> alignment issue. Do the 9.5 results change if you increase
> max_connections by one or two (without changing anything else)?
Results with max_connections=100:

OS            Version   TPS     Avg. latency, ms
RHEL 6        9.4       69810   0.917
RHEL 6        9.5       35303   1.812
RHEL 6        9.6       71827   0.891
Ubuntu 14.04  9.4       76829   0.833
Ubuntu 14.04  9.5       67574   0.947
Ubuntu 14.04  9.6       79200   0.808

Results with max_connections=101:

OS            Version   TPS     Avg. latency, ms
RHEL 6        9.4       70059   0.914
RHEL 6        9.5       35979   1.779
RHEL 6        9.6       71183   0.899
Ubuntu 14.04  9.4       78934   0.811
Ubuntu 14.04  9.5       67803   0.944
Ubuntu 14.04  9.6       79624   0.804

Results with max_connections=102:

OS            Version   TPS     Avg. latency, ms
RHEL 6        9.4       70710   0.905
RHEL 6        9.5       36615   1.748
RHEL 6        9.6       69742   0.918
Ubuntu 14.04  9.4       76356   0.838
Ubuntu 14.04  9.5       66814   0.958
Ubuntu 14.04  9.6       78528   0.815

It doesn't seem to be a memory alignment issue.

Also, please note that there is no performance degradation when pgbench
connects to postgres directly, without pgbouncer:

OS            Version   TPS      Avg. latency, ms
RHEL 6        9.4       167427   0.382
RHEL 6        9.5       223674   0.286
RHEL 6        9.6       215580   0.297
Ubuntu 14.04  9.4       176659   0.362
Ubuntu 14.04  9.5       248277   0.258
Ubuntu 14.04  9.6       245871   0.260

>
> What's the actual hardware?

The RHEL host has an Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz (2 sockets, 16
physical cores, 32 cores with Hyper-Threading) and 256 GB of RAM, while the
Ubuntu host has an Intel(R) Xeon(R) CPU E5-2667 v2 @ 3.30GHz (2 sockets, 16
physical cores, 32 cores with Hyper-Threading) and 128 GB of RAM.

>
> Greetings,
>
> Andres Freund

--
May the fo
Re: [HACKERS] 9.4 -> 9.5 regression with queries through pgbouncer on RHEL 6
Hi,

On 2016-05-27 19:57:34 +0300, Vladimir Borodin wrote:
> -performance
> > Here is what the results look like for 9.4, 9.5 and 9.6. All are built from
> > yesterday's latest commits in
> > * REL9_4_STABLE (a0cc89a28141595d888d8aba43163d58a1578bfb),
> > * REL9_5_STABLE (e504d915bbf352ecfc4ed335af934e799bf01053),
> > * master (6ee7fb8244560b7a3f224784b8ad2351107fa55d).
> >
> > All of them are built on the host where the testing is done (with stock gcc
> > versions). Sysctls, pgbouncer config and everything else we found are the
> > same, postgres configs are default, and PGDATA is in tmpfs. All numbers are
> > reproducible; they are stable between runs.
> >
> > In short:
> >
> > OS            PostgreSQL version   TPS     Avg. latency
> > RHEL 6        9.4                  44898   1.425 ms
> > RHEL 6        9.5                  26199   2.443 ms
> > RHEL 6        9.5                  43027   1.487 ms
> > Ubuntu 14.04  9.4                  67458   0.949 ms
> > Ubuntu 14.04  9.5                  64065   0.999 ms
> > Ubuntu 14.04  9.6                  64350   0.995 ms
>
> The results above are not really fair: pgbouncer.ini was a bit different on
> the Ubuntu host (application_name_add_host was disabled). Here are the
> correct results with exactly the same configuration:
>
> OS            PostgreSQL version   TPS     Avg. latency
> RHEL 6        9.4                  44898   1.425 ms
> RHEL 6        9.5                  26199   2.443 ms
> RHEL 6        9.5                  43027   1.487 ms
> Ubuntu 14.04  9.4                  45971   1.392 ms
> Ubuntu 14.04  9.5                  40282   1.589 ms
> Ubuntu 14.04  9.6                  45410   1.409 ms

Hm. I'm a bit confused. You show one result for 9.5 with bad and one
with good performance. I suspect the second one is supposed to be a 9.6?

Am I understanding correctly that the performance nearly entirely
recovered with 9.6? If so, I suspect we might be dealing with a memory
alignment issue. Do the 9.5 results change if you increase
max_connections by one or two (without changing anything else)?

What's the actual hardware?

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
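The max_connections test rests on the fact that the sizes of several shared-memory arrays (for example the per-backend PGPROC entries) scale with max_connections, so adding one or two connection slots shifts the address of everything allocated after them. The sketch below, with a hypothetical 40-byte per-backend entry and an assumed 64-byte cache line, shows how the cache-line offset of the next structure changes with the slot count when allocations are rounded only to 8 bytes, and stops changing when each allocation is rounded up to a full cache line. The names are illustrative, not PostgreSQL's.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64   /* assumed cache-line size, bytes */

/* Round off up to the next multiple of align (a power of two). */
static size_t
align_up(size_t off, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (off + align - 1) & ~(align - 1);
}

/*
 * Where, within a cache line, does the next shared structure start if it is
 * allocated right after an array of nslots per-backend entries of
 * entry_size bytes, with each allocation rounded up to alloc_align bytes?
 */
static size_t
next_struct_line_offset(size_t base, size_t nslots, size_t entry_size,
                        size_t alloc_align)
{
    size_t start = align_up(base + nslots * entry_size, alloc_align);

    return start % CACHE_LINE_SIZE;
}
```

With 8-byte rounding, 100, 101 and 102 slots place the next structure at offsets 32, 8 and 48 within its cache line; with 64-byte rounding it always starts at offset 0. In the thread, the 9.5 RHEL 6 throughput barely moved across max_connections=100, 101 and 102 (35303, 35979 and 36615 TPS), which is why this explanation was ruled out here.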
Re: [HACKERS] 9.4 -> 9.5 regression with queries through pgbouncer on RHEL 6
-performance
+hackers

> On 25 May 2016, at 17:33, Vladimir Borodin wrote:
>
> Hi all.
>
> We have found that queries through PgBouncer 1.7.2 (with transaction pooling)
> to a local PostgreSQL are almost two times slower in 9.5.3 than in 9.4.8 on
> RHEL 6 hosts (all packages are updated to the latest versions). Meanwhile, the
> problem can't be reproduced on, e.g., Ubuntu 14.04 (also fully updated).
>
> Here is what the results look like for 9.4, 9.5 and 9.6. All are built from
> yesterday's latest commits in
> * REL9_4_STABLE (a0cc89a28141595d888d8aba43163d58a1578bfb),
> * REL9_5_STABLE (e504d915bbf352ecfc4ed335af934e799bf01053),
> * master (6ee7fb8244560b7a3f224784b8ad2351107fa55d).
>
> All of them are built on the host where the testing is done (with stock gcc
> versions). Sysctls, pgbouncer config and everything else we found are the
> same, postgres configs are default, and PGDATA is in tmpfs. All numbers are
> reproducible; they are stable between runs.
>
> In short:
>
> OS            PostgreSQL version   TPS     Avg. latency
> RHEL 6        9.4                  44898   1.425 ms
> RHEL 6        9.5                  26199   2.443 ms
> RHEL 6        9.5                  43027   1.487 ms
> Ubuntu 14.04  9.4                  67458   0.949 ms
> Ubuntu 14.04  9.5                  64065   0.999 ms
> Ubuntu 14.04  9.6                  64350   0.995 ms

The results above are not really fair: pgbouncer.ini was a bit different on
the Ubuntu host (application_name_add_host was disabled). Here are the correct
results with exactly the same configuration:

OS            PostgreSQL version   TPS     Avg. latency
RHEL 6        9.4                  44898   1.425 ms
RHEL 6        9.5                  26199   2.443 ms
RHEL 6        9.5                  43027   1.487 ms
Ubuntu 14.04  9.4                  45971   1.392 ms
Ubuntu 14.04  9.5                  40282   1.589 ms
Ubuntu 14.04  9.6                  45410   1.409 ms

There is a regression for 9.5 on Ubuntu as well, but it is much less
significant. We first thought the cause was
38628db8d8caff21eb6cf8d775c0b2d04cf07b9b (Add memory barriers for
PgBackendStatus.st_changecount protocol), but in that case the regression
would also show up in 9.6.
There was also a bunch of changes in FE/BE communication (such as
387da18874afa17156ee3af63766f17efb53c4b9 or
98a64d0bd713cb89e61bef6432befc4b7b5da59e) that may explain the regression in
9.5 and the normal results in 9.6. The right way to find the answer is
probably to bisect. I'll do it, but if more diagnostic information would help,
feel free to ask.

>
> As you can see, the difference between major versions on Ubuntu is not
> significant, but on RHEL, 9.5 is 70% slower than 9.4 and 9.6.
>
> Below are more details.
>
> RHEL 6:
>
> postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg94'
> transaction type: SELECT only
> scaling factor: 100
> query mode: simple
> number of clients: 64
> number of threads: 64
> duration: 60 s
> number of transactions actually processed: 2693962
> latency average: 1.425 ms
> tps = 44897.461518 (including connections establishing)
> tps = 44898.763258 (excluding connections establishing)
> postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg95'
> transaction type: SELECT only
> scaling factor: 100
> query mode: simple
> number of clients: 64
> number of threads: 64
> duration: 60 s
> number of transactions actually processed: 1572014
> latency average: 2.443 ms
> tps = 26198.928627 (including connections establishing)
> tps = 26199.803363 (excluding connections establishing)
> postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg96'
> transaction type: SELECT only
> scaling factor: 100
> query mode: simple
> number of clients: 64
> number of threads: 64
> duration: 60 s
> number of transactions actually processed: 2581645
> latency average: 1.487 ms
> tps = 43025.676995 (including connections establishing)
> tps = 43027.038275 (excluding connections establishing)
> postgres@pgload05g ~ $
>
> Ubuntu 14.04 (the same hardware):
>
> postgres@pgloadpublic02:~$ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg94'
> transaction type: SELECT only
> scaling factor: 100
> query mode: simple
> number of clients: 64
> numbe
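Figures such as "~15%", "~100%" and "70% slower" depend on which ratio is taken. The small C helpers below (hypothetical names) compute both the throughput drop and the latency increase of 9.5 relative to 9.4 from the summary tables in this thread, as a simple arithmetic aid rather than a new measurement.

```c
#include <assert.h>

/* Fractional drop in throughput going from tps_old to tps_new. */
static double
tps_drop(double tps_old, double tps_new)
{
    assert(tps_old > 0.0);
    return 1.0 - tps_new / tps_old;
}

/* Fractional increase in average latency going from lat_old to lat_new. */
static double
latency_increase(double lat_old_ms, double lat_new_ms)
{
    assert(lat_old_ms > 0.0);
    return lat_new_ms / lat_old_ms - 1.0;
}
```

For RHEL 6, `tps_drop(44898, 26199)` is about 0.416 and `latency_increase(1.425, 2.443)` about 0.714; for Ubuntu 14.04 with the same pgbouncer configuration, `tps_drop(45971, 40282)` is about 0.124 and `latency_increase(1.392, 1.589)` about 0.142. The "70% slower" figure for RHEL 6 thus matches the latency ratio, and the Ubuntu degradation is roughly 12 to 14% depending on the measure.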