Re: [PERFORM] archive_command too slow.
On Fri, Nov 4, 2016 at 1:19 PM, Jeff Janes wrote:

> If setting up the ssh tunnel is the problem, you could assess whether you
> really need that security, or compile a custom postgresql with larger WAL
> file sizes, or write a fancy archive_command which first archives the files
> to a local directory, and then transfers them in chunks to the slave. Or
> maybe use streaming rather than file shipping.

Another option is to use ssh's ControlMaster and ControlPersist features to keep the SSH tunnel alive between commands. You'd have to set the RSYNC_CONNECT_PROG environment variable for your archive_command and include the relevant ssh options in the command.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
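A minimal sketch of such an ssh configuration for the postgres user on the master; the host alias "slave" and the 10-minute persist timeout are assumptions to adapt for your environment:

```
# ~/.ssh/config on the master (postgres user)
Host slave
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h-%p
    ControlPersist 10m
```

With this in place, the first rsync/ssh invocation opens a master connection, and subsequent invocations within the persist window reuse it, skipping the per-segment tunnel setup cost.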
Re: [PERFORM] archive_command too slow.
On Wed, Nov 2, 2016 at 12:06 PM, Joao Junior wrote:

> Hi friends,
>
> I am running 2 Linux machines, kernel 3.13.0-45-generic #74-Ubuntu SMP.
> Postgresql version 9.4 on both machines, in a Hot Standby scenario.
>
> Master-Slave using WAL files, not streaming replication.
>
> The archive_command from master is:
>
> archive_command = '/usr/bin/rsync -a -e "ssh" "%p"
> slave:/data2/postgres/standby/main/incoming/"%f"'

How long does it take just to set up the ssh tunnel?

$ time ssh slave hostname

In my hands, this takes about 0.5 seconds, every time. If you need to archive 26 segments per minute, that much overhead is going to consume a substantial fraction of your time budget.

How much network bandwidth do you have? If you scp a big chunk of files in one command over to the slave (not into a production directory of it, of course), how fast does that go?

$ time rsync datadir/pg_xlog/0001C960004? slave:/tmp/foo/

> It seems that archive_command is very slow compared with the amount of
> WAL segments generated.
> Any suggestions??? Should I use another strategy to increase the
> archive_command process speed???

If network throughput is the problem, use compression, or get a faster network. If setting up the ssh tunnel is the problem, you could assess whether you really need that security, or compile a custom postgresql with larger WAL file sizes, or write a fancy archive_command which first archives the files to a local directory, and then transfers them in chunks to the slave. Or maybe use streaming rather than file shipping.

Cheers,

Jeff
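The "archive to a local directory first, ship in chunks" idea can be sketched as a tiny staging function; the spool path, environment variable, and shipping job are all assumptions, not anything from the thread:

```shell
# archive_wal PATH FILENAME: stage a WAL segment into a local spool
# directory so archive_command returns quickly; a separate job then ships
# the whole spool over a single connection. Spool location is an assumption.
archive_wal() {
    spool="${WAL_SPOOL:-/var/lib/postgresql/wal_spool}"
    mkdir -p "$spool" || return 1
    # Copy to a temporary name, then rename, so the shipper never
    # sees a half-written segment.
    cp "$1" "$spool/$2.tmp" && mv "$spool/$2.tmp" "$spool/$2"
}

# postgresql.conf would then point at a script wrapping this function:
#   archive_command = 'archive_wal.sh "%p" "%f"'
# and a cron job could batch-transfer and clean the spool, e.g.:
#   rsync -a --remove-source-files "$WAL_SPOOL"/ slave:/data2/postgres/standby/main/incoming/
```

Because the local copy is nearly instant, PostgreSQL can recycle segments promptly even when the network transfer is the bottleneck; the batching also amortizes the per-connection ssh setup cost across many segments.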
Re: [PERFORM] Perf decreased although server is better
On Fri, Nov 4, 2016 at 6:53 AM, Benjamin Toueg wrote:

> I don't see how this can be due to network latency!

I'm not suggesting it is due to network latency -- it is due to the latency for storage requests. That won't depend on network latency unless you are going to a LAN for storage.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] Perf decreased although server is better
> Rick, what did you mean by kernel configuration? The OS is a standard Ubuntu 16.04:
>
> - Linux 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>
> Do you think losing half the number of cores can explain my performance issue? (AMD 8 cores down to Haswell 4 cores).

I was referring to some of the tunables discussed on this page:

https://www.postgresql.org/docs/9.6/static/kernel-resources.html

Specifically, in my environment I update /etc/security/limits.conf to include:

* hard nofile 65536
* soft nofile 65536
* hard stack 16384
* soft stack 16384
* hard memlock unlimited
* soft memlock unlimited

And then add this to /etc/pam.d/common-session so that they get picked up when I su to the postgres user:

session required pam_limits.so

I update sysctl.conf with huge pages:

vm.hugetlb_shm_group=5432
vm.nr_hugepages=4300

(The number of huge pages may be different for your environment.)

And create and add the postgres user to the huge pages group:

hugepages:x:5432:postgres

You may also want to look at some TCP tunables, and check your shared memory limits too. I only mention this because sometimes when you move from one system to another, you can get so caught up in getting the database set up and the data migrated that you overlook the basic system settings...

Regarding the number of cores, most PostgreSQL queries are going to be single-threaded. The number of cores won't impact the performance of a single query except in certain circumstances:

1) You have parallel queries enabled and the table is doing some sort of expensive sequential scan.

2) You have so many concurrent queries running that the whole system is CPU starved.

3) There is some other resource contention between the CPUs that causes _more_ CPUs to actually run slower than fewer. (It happens - I had a server back in the 90's which had severe lock contention over /dev/tcp. Adding more CPUs made it slower.)

4) The near-cache memory gets fragmented in a way that processors have to reach deeper into the caches to find what they need. (I'm not explaining that very well, but it is unlikely to be a problem in your case anyway.)

A quick and simple command to get a sense of how busy your CPUs are is:

$ mpstat -P ALL 5

(let it run for a few of the 5-second intervals)

If they are all running pretty hot, then more cores might help. If just one is running hot, then more cores probably won't do anything.

On Fri, Nov 4, 2016 at 7:53 AM, Benjamin Toueg wrote:

> I've noticed a network latency increase. Ping between web server and
> database : 0.6 ms avg before, 5.3 ms avg after -- it wasn't that big 4 days
> ago :(
>
> I've narrowed my investigation to one particular "Transaction" in terms of
> the NewRelic APM. It's basically the main HTTP request of my application.
>
> Looks like the ping impacts psycopg2:connect (see http://imgur.com/a/LDH1c):
> 4 ms up to 16 ms on average.
>
> That I can understand. However, I don't understand the performance
> decrease of the select queries on table1 (see
> https://i.stack.imgur.com/QaUqy.png): 80 ms up to 160 ms on average.
>
> Same goes for table 2 (see http://imgur.com/a/CnETs): 4 ms up to 20 ms on
> average.
>
> However, there is a commit in my request, and it performs better (see
> http://imgur.com/a/td8Dc): 12 ms down to 6 ms on average.
>
> I don't see how this can be due to network latency!
>
> I will provide a new bonnie++ benchmark when the requests per minute is at
> the lowest (remember I can only run benchmarks while the server is in use).
>
> Rick, what did you mean by kernel configuration? The OS is a standard
> Ubuntu 16.04:
>
> - Linux 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016
> x86_64 x86_64 x86_64 GNU/Linux
>
> Do you think losing half the number of cores can explain my performance
> issue? (AMD 8 cores down to Haswell 4 cores).
>
> Best Regards,
>
> Benjamin
>
> PS: I've edited the SO post
> http://serverfault.com/questions/812702/posgres-perf-decreased-although-server-is-better
>
> 2016-11-04 1:05 GMT+01:00 Kevin Grittner:
>
>> On Thu, Nov 3, 2016 at 9:51 AM, Benjamin Toueg wrote:
>>>
>>> Stream gives substantially better results with the new server
>>> (before/after)
>>
>> Yep, the new server can access RAM at about twice the speed of the old.
>>
>>> I've run "bonnie++ -u postgres -d /tmp/ -s 4096M -r 1096" on both
>>> machines. I don't know how to read bonnie++ results (before/after)
>>> but it looks quite the same, sometimes better for the new,
>>> sometimes better for the old.
>>
>> On most metrics the new machine looks better, but there are a few
>> things that look potentially problematic with the new machine: the
>> new machine uses about 1.57x the CPU time of the old per block
>> written sequentially ((41 / 143557) / (16 / 87991)); so if the box
>> becomes CPU starved, you might notice writes getting slower than on
>> the old box. Also, several of the latency numbers
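The vm.nr_hugepages value mentioned above (4300) is specific to one machine. As a rough sketch of how one might size it, assuming 2 MB huge pages (the x86_64 default; check Hugepagesize in /proc/meminfo) and an illustrative target of ~8.4 GB of shared memory (shared_buffers plus some headroom):

```shell
# Rough vm.nr_hugepages estimate. The 8600 MB target and the 2 MB huge
# page size are assumptions; substitute your own numbers.
shared_mb=8600
hugepage_kb=2048
nr_hugepages=$(( shared_mb * 1024 / hugepage_kb ))
echo "vm.nr_hugepages=$nr_hugepages"   # → vm.nr_hugepages=4300
```

Allocating too few pages means PostgreSQL falls back to (or fails with) regular pages, and too many wastes memory the rest of the system cannot use, so it is worth checking HugePages_Free in /proc/meminfo after startup.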
Re: [PERFORM] Perf decreased although server is better
My guess would be that your server upgrade wasn't the upgrade you thought it was.

Your network latency could definitely be the cause of most of this. The problem is you're not measuring this from the server side. It's not only going to impact connect time; you're going to get your data a bit slower as well. I'm assuming your pgbouncer is installed on the postgres box. Do you have the pgbouncer addon installed? If you're looking to isolate network transfer, pgbouncer should give you statistics like avg query time to see if they line up with what you're seeing in your APM.

Also, you reduced your number of cores, which is a big problem because basically you just cut your max query capacity in half. Assuming a dedicated box, each CPU can only process up to one postgres query at a time. Previously, you could process up to 8 queries simultaneously, whereas now you can only do 4. Now, since most of your queries probably run in milliseconds, that can still be quite a few queries in a second, and you may never hit 4 going at the same exact time, but without seeing your pgbouncer config, this may actually be happening, based on all the idle connections you're seeing.

If ALL your connections are coming from a single pgbouncer locally on the postgres box, then you can use your server resources better by setting max_connections to 6 (the number of cores, plus one or two more for you to connect locally via psql). Then set your default_pool_size in pgbouncer to 4 (the number of cores) and reserve_pool_size to 0, and restart. This will keep the number of sessions limited to what your box is actually capable of doing and will help you avoid loading postgres down with more than it's capable of, which can make a bad situation worse.

My recommendation would be to go back to the old server if it's available. If not, get a new one in the same data center as your web servers with at least 8 cores to put you back where you were.
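The pool sizing described above might look like this in the two config files; the exact values are for the 4-core box in this thread and are assumptions to adapt:

```
; pgbouncer.ini -- size the pool to the number of cores
[pgbouncer]
default_pool_size = 4
reserve_pool_size = 0
```

```
# postgresql.conf -- cores plus a slot or two for local admin via psql
max_connections = 6
```

After editing, restart pgbouncer and reload/restart postgres; the pooler then queues excess clients instead of letting them pile up as active backends.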
On Fri, Nov 4, 2016 at 7:55 AM Benjamin Toueg wrote:

> I've noticed a network latency increase. Ping between web server and
> database : 0.6 ms avg before, 5.3 ms avg after -- it wasn't that big 4 days
> ago :(
>
> I've narrowed my investigation to one particular "Transaction" in terms of
> the NewRelic APM. It's basically the main HTTP request of my application.
>
> Looks like the ping impacts psycopg2:connect (see http://imgur.com/a/LDH1c):
> 4 ms up to 16 ms on average.
>
> That I can understand. However, I don't understand the performance
> decrease of the select queries on table1 (see
> https://i.stack.imgur.com/QaUqy.png): 80 ms up to 160 ms on average.
>
> Same goes for table 2 (see http://imgur.com/a/CnETs): 4 ms up to 20 ms on
> average.
>
> However, there is a commit in my request, and it performs better (see
> http://imgur.com/a/td8Dc): 12 ms down to 6 ms on average.
>
> I don't see how this can be due to network latency!
>
> I will provide a new bonnie++ benchmark when the requests per minute is at
> the lowest (remember I can only run benchmarks while the server is in use).
>
> Rick, what did you mean by kernel configuration? The OS is a standard
> Ubuntu 16.04:
>
> - Linux 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016
> x86_64 x86_64 x86_64 GNU/Linux
>
> Do you think losing half the number of cores can explain my performance
> issue? (AMD 8 cores down to Haswell 4 cores).
>
> Best Regards,
>
> Benjamin
>
> PS: I've edited the SO post
> http://serverfault.com/questions/812702/posgres-perf-decreased-although-server-is-better
>
> 2016-11-04 1:05 GMT+01:00 Kevin Grittner:
>
>> On Thu, Nov 3, 2016 at 9:51 AM, Benjamin Toueg wrote:
>>>
>>> Stream gives substantially better results with the new server
>>> (before/after)
>>
>> Yep, the new server can access RAM at about twice the speed of the old.
>>
>>> I've run "bonnie++ -u postgres -d /tmp/ -s 4096M -r 1096" on both
>>> machines. I don't know how to read bonnie++ results (before/after)
>>> but it looks quite the same, sometimes better for the new,
>>> sometimes better for the old.
>>
>> On most metrics the new machine looks better, but there are a few
>> things that look potentially problematic with the new machine: the
>> new machine uses about 1.57x the CPU time of the old per block
>> written sequentially ((41 / 143557) / (16 / 87991)); so if the box
>> becomes CPU starved, you might notice writes getting slower than on
>> the old box. Also, several of the latency numbers are worse -- in
>> some cases far worse. If I'm understanding that properly, it
>> suggests that while total throughput from a number of connections
>> may be better on the new machine, a single connection may not run
>> the same query as quickly. That probably makes the new machine
>> better for handling an OLTP workload from many concurrent clients,
>> but perhaps not as good at cranking out a single big report or
>> running a dump/restore.
>>
>> Yes, it is quite possible that the new machine could be faster at
>> some things and slower at others.
>>
>> --
>> Kevin Grittner
>> EDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
Re: [PERFORM] Perf decreased although server is better
I've noticed a network latency increase. Ping between web server and database: 0.6 ms avg before, 5.3 ms avg after -- it wasn't that big 4 days ago :(

I've narrowed my investigation to one particular "Transaction" in terms of the NewRelic APM. It's basically the main HTTP request of my application.

Looks like the ping impacts psycopg2:connect (see http://imgur.com/a/LDH1c): 4 ms up to 16 ms on average.

That I can understand. However, I don't understand the performance decrease of the select queries on table1 (see https://i.stack.imgur.com/QaUqy.png): 80 ms up to 160 ms on average.

Same goes for table 2 (see http://imgur.com/a/CnETs): 4 ms up to 20 ms on average.

However, there is a commit in my request, and it performs better (see http://imgur.com/a/td8Dc): 12 ms down to 6 ms on average.

I don't see how this can be due to network latency!

I will provide a new bonnie++ benchmark when the requests per minute is at the lowest (remember I can only run benchmarks while the server is in use).

Rick, what did you mean by kernel configuration? The OS is a standard Ubuntu 16.04:

- Linux 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Do you think losing half the number of cores can explain my performance issue? (AMD 8 cores down to Haswell 4 cores).

Best Regards,

Benjamin

PS: I've edited the SO post http://serverfault.com/questions/812702/posgres-perf-decreased-although-server-is-better

2016-11-04 1:05 GMT+01:00 Kevin Grittner:

> On Thu, Nov 3, 2016 at 9:51 AM, Benjamin Toueg wrote:
>>
>> Stream gives substantially better results with the new server
>> (before/after)
>
> Yep, the new server can access RAM at about twice the speed of the old.
>
>> I've run "bonnie++ -u postgres -d /tmp/ -s 4096M -r 1096" on both
>> machines. I don't know how to read bonnie++ results (before/after)
>> but it looks quite the same, sometimes better for the new,
>> sometimes better for the old.
>
> On most metrics the new machine looks better, but there are a few
> things that look potentially problematic with the new machine: the
> new machine uses about 1.57x the CPU time of the old per block
> written sequentially ((41 / 143557) / (16 / 87991)); so if the box
> becomes CPU starved, you might notice writes getting slower than on
> the old box. Also, several of the latency numbers are worse -- in
> some cases far worse. If I'm understanding that properly, it
> suggests that while total throughput from a number of connections
> may be better on the new machine, a single connection may not run
> the same query as quickly. That probably makes the new machine
> better for handling an OLTP workload from many concurrent clients,
> but perhaps not as good at cranking out a single big report or
> running a dump/restore.
>
> Yes, it is quite possible that the new machine could be faster at
> some things and slower at others.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
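For readers checking the arithmetic behind the "about 1.57x the CPU time per block" figure quoted above, it is simply the new machine's CPU-per-block ratio divided by the old machine's:

```shell
# (41 / 143557) for the new machine vs (16 / 87991) for the old,
# using the bonnie++ numbers quoted in the thread.
ratio=$(awk 'BEGIN { printf "%.2f", (41 / 143557) / (16 / 87991) }')
echo "$ratio"   # → 1.57
```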