Re: [PERFORM] archive_command too slow.

2016-11-04 Thread Claudio Freire
On Fri, Nov 4, 2016 at 1:19 PM, Jeff Janes  wrote:
> If setting up the ssh tunnel is the problem, you could assess whether you
> really need that security, or compile a custom postgresql with larger WAL
> file sizes, or write a fancy archive_command which first archives the files
> to a local directory, and then transfers them in chunks to the slave.  Or
> maybe use streaming rather than file shipping.

Another option is to use ssh's ControlMaster and ControlPersist
features to keep the SSH tunnel alive between commands.

You'd have to set up the RSYNC_CONNECT_PROG environment variable on
your archive command for that and include the relevant options for ssh
in the command.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] archive_command too slow.

2016-11-04 Thread Jeff Janes
On Wed, Nov 2, 2016 at 12:06 PM, Joao Junior  wrote:

> Hi friends,
>
> I am running 2 Linux machines, kernel  3.13.0-45-generic #74-Ubuntu SMP.
> Postgresql version 9.4 in both machine, in a Hot Standby cenario.
>
> Master-Slave using WAL files, not streaming replication.
>
> The archive_command from master is:
>
> archive_command = '/usr/bin/rsync -a -e "ssh" "%p"
> slave:/data2/postgres/standby/main/incoming/"%f"' #
>


How long does it take just to set up the ssh tunnel?

$ time ssh slave hostname

In my hands, this takes about 0.5, every time.  If you need to archive 26
segments per minute, that much overhead is going to consume a substantial
fraction of your time budget.

How much network bandwidth do you have?  If you scp a big chunk of files in
one command over to the slave (not into a production directory of it,of
course) how fast does that go?

$ time rsync datadir/pg_xlog/0001C960004? slave:/tmp/foo/


...



>
> It seems that archive_command is very slowly compared with the amount of
> WAL segments generated.
> Any suggestions??? Should I use another strategy to increase the
> archive_command process speed???
>


If network throughput is the problem, use compression, or get a faster
network.

If setting up the ssh tunnel is the problem, you could assess whether you
really need that security, or compile a custom postgresql with larger WAL
file sizes, or write a fancy archive_command which first archives the files
to a local directory, and then transfers them in chunks to the slave.  Or
maybe use streaming rather than file shipping.


Cheers,

Jeff


Re: [PERFORM] Perf decreased although server is better

2016-11-04 Thread Kevin Grittner
On Fri, Nov 4, 2016 at 6:53 AM, Benjamin Toueg  wrote:

> I don't see how this can be due to network latency!

I'm not suggesting it is due to network latency -- it is due to the
latency for storage requests.  That won't depend on network latency
unless you are going to a LAN for storage.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Perf decreased although server is better

2016-11-04 Thread Rick Otten
> Rick, what did you mean by kernel configuration? The OS is a standard
Ubuntu 16.04:
>
> - Linux 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016
x86_64 x86_64 x86_64 GNU/Linux
>
> Do you think losing half the number of cores can explain my performance
issue ? (AMD 8 cores down to Haswell 4 cores).

I was referring to some of the tunables discussed on this page:
https://www.postgresql.org/docs/9.6/static/kernel-resources.html

Specifically, in my environment I update /etc/security/limits.conf to
include:


* hard nofile 65536
* soft nofile 65536

* hard stack 16384
* soft stack 16384

* hard memlock unlimited
* soft memlock unlimited


And then add this to /etc/pam.d/common-session so that they get picked up
when I su to the postgres user:

session requiredpam_limits.so


I update sysctl.conf with huge pages:

vm.hugetlb_shm_group=5432
vm.nr_hugepages=4300


(The number of huge pages may be different for your environment.)
And create and add the postgres user to the huge pages group:

hugepages:x:5432:postgres


You may also want to look at some TCP tunables, and check your shared
memory limits too.

I only mentioned this because sometimes when you move from one system to
another, you can get so caught up in getting the database set up  and data
migration that you overlook the basic system settings...

Regarding the number of cores, most of the postgresql queries are going to
be single threaded.  The number of cores won't impact the performance of a
single query except in certain circumstances:
1) You have parallel queries enabled and the table is doing some sort
of expensive sequence scan
2) You have so many concurrent queries running the whole system is cpu
starved.
3) There is some other resource contention between the cpus that causes
_more_ cpus to actually run slower than fewer.  (It happens - I had a
server back in the 90's which had severe lock contention over /dev/tcp.
Adding more cpus made it slower.)
4) The near-cache memory gets fragmented in a way that processors have
to reach deeper in the caches to find what they need.  (I'm not explaining
that very well, but it is unlikely to be a problem in your case anyway.)

A quick and simple command to get a sense of how busy your cpus are is:

$ mpstat -P ALL 5

(let it run for a few of the 5 second intervals)

If they are all running pretty hot, then more cores might help.  If just
one is running hot, then more cores probably won't do anything.



On Fri, Nov 4, 2016 at 7:53 AM, Benjamin Toueg  wrote:

> I've noticed a network latency increase. Ping between web server and
> database : 0.6 ms avg before, 5.3 ms avg after -- it wasn't that big 4 days
> ago :(
>
> I've narrowed my investigation to one particular "Transaction" in terms of
> the NewRelic APM. It's basically the main HTTP request of my application.
>
> Looks like the ping impacts psycopg2:connect (see http://imgur.com/a/LDH1c):
> 4 ms up to 16 ms on average.
>
> That I can understand. However, I don't understand the performance
> decrease of the select queries on table1 (see https://i.stack.imgur.com/
> QaUqy.png): 80 ms up to 160 ms on average
>
> Same goes for table 2 (see http://imgur.com/a/CnETs): 4 ms up to 20 ms on
> average
>
> However, there is a commit in my request, and it performs better (see
> http://imgur.com/a/td8Dc): 12 ms down to 6 ms on average.
>
> I don't see how this can be due to network latency!
>
> I will provide a new bonnie++ benchmark when the requests per minute is at
> the lowest (remember I can only run benchmarks while the server is in use).
>
> Rick, what did you mean by kernel configuration? The OS is a standard
> Ubuntu 16.04:
>
>  - Linux 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016
> x86_64 x86_64 x86_64 GNU/Linux
>
> Do you think losing half the number of cores can explain my performance
> issue ? (AMD 8 cores down to Haswell 4 cores).
>
> Best Regards,
>
> Benjamin
>
> PS : I've edited the SO post http://serverfault.com/
> questions/812702/posgres-perf-decreased-although-server-is-better
>
> 2016-11-04 1:05 GMT+01:00 Kevin Grittner :
>
>> On Thu, Nov 3, 2016 at 9:51 AM, Benjamin Toueg  wrote:
>> >
>> > Stream gives substantially better results with the new server
>> (before/after)
>>
>> Yep, the new server can access RAM at about twice the speed of the old.
>>
>> > I've run "bonnie++ -u postgres -d /tmp/ -s 4096M -r 1096" on both
>> > machines. I don't know how to read bonnie++ results (before/after)
>> > but it looks quite the same, sometimes better for the new,
>> > sometimes better for the old.
>>
>> On most metrics the new machine looks better, but there are a few
>> things that look potentially problematic with the new machine: the
>> new machine uses about 1.57x the CPU time of the old per block
>> written sequentially ((41 / 143557) / (16 / 87991)); so if the box
>> becomes CPU starved, you might notice writes getting slower than on
>> the new box.  Also, several of the latency numbers 

Re: [PERFORM] Perf decreased although server is better

2016-11-04 Thread Will Platnick
My guess would be that your server upgrade wasn't the upgrade you thought
it was.

You network latency could definitely be the cause of most of this. The
problem is you're not measuring this from the server side. It's not only
going to impact connect time, but you're going to get your data a bit
slower as well. I'm assuming your pgbouncer is installed on the postgres
box. Do you have the pgbouncer addon installed? If you're looking to
isolate network transfer, pgbouncer should give the statistics like avg
query time to see if they line up with what you're seeing in your APM.

Also, you reduced your number of cores, which is a big problem because
basically you just cut your max query capacity in half. Assuming a
dedicated box, each CPU can only process up to one postgres query at a
time. Previously, you could process up to 8 queries simultaneously, whereas
now you can only do 4. Now, since most of your queries are probably in the
ms, that can still be quite a bit of queries in a second time frame and you
may never hit 4 going at the same exact time, but without seeing your
pgbouncer config, this may actually be happening based on all the idle
connections you're seeing.

If ALL your connections are coming from a single pgbouncer locally on the
postgres box, then you can use your server resources better by setting
max_connections to 6 (the number of cores + one or two more for you to
connect locally via pgsql). Then, set your default_pool_size in pgbouncer
to 4 (number of cores) and reserve_pool_size to 0, and restart. This will
keep the number of sessions limited to what your box is actually capable of
doing and will help you avoid loading postgres down with more than it's
capable of doing, which can make a bad situation worse.

My recommendation would be to go back to the old server if it's available.
If not, get a new one in the same data center as your web servers with at
least 8 cores to put you back where you were.


On Fri, Nov 4, 2016 at 7:55 AM Benjamin Toueg  wrote:

> I've noticed a network latency increase. Ping between web server and
> database : 0.6 ms avg before, 5.3 ms avg after -- it wasn't that big 4 days
> ago :(
>
> I've narrowed my investigation to one particular "Transaction" in terms of
> the NewRelic APM. It's basically the main HTTP request of my application.
>
> Looks like the ping impacts psycopg2:connect (see http://imgur.com/a/LDH1c):
> 4 ms up to 16 ms on average.
>
> That I can understand. However, I don't understand the performance
> decrease of the select queries on table1 (see
> https://i.stack.imgur.com/QaUqy.png): 80 ms up to 160 ms on average
>
> Same goes for table 2 (see http://imgur.com/a/CnETs): 4 ms up to 20 ms on
> average
>
> However, there is a commit in my request, and it performs better (see
> http://imgur.com/a/td8Dc): 12 ms down to 6 ms on average.
>
> I don't see how this can be due to network latency!
>
> I will provide a new bonnie++ benchmark when the requests per minute is at
> the lowest (remember I can only run benchmarks while the server is in use).
>
> Rick, what did you mean by kernel configuration? The OS is a standard
> Ubuntu 16.04:
>
>  - Linux 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016
> x86_64 x86_64 x86_64 GNU/Linux
>
> Do you think losing half the number of cores can explain my performance
> issue ? (AMD 8 cores down to Haswell 4 cores).
>
> Best Regards,
>
> Benjamin
>
> PS : I've edited the SO post
> http://serverfault.com/questions/812702/posgres-perf-decreased-although-server-is-better
>
> 2016-11-04 1:05 GMT+01:00 Kevin Grittner :
>
> On Thu, Nov 3, 2016 at 9:51 AM, Benjamin Toueg  wrote:
> >
> > Stream gives substantially better results with the new server
> (before/after)
>
> Yep, the new server can access RAM at about twice the speed of the old.
>
> > I've run "bonnie++ -u postgres -d /tmp/ -s 4096M -r 1096" on both
> > machines. I don't know how to read bonnie++ results (before/after)
> > but it looks quite the same, sometimes better for the new,
> > sometimes better for the old.
>
> On most metrics the new machine looks better, but there are a few
> things that look potentially problematic with the new machine: the
> new machine uses about 1.57x the CPU time of the old per block
> written sequentially ((41 / 143557) / (16 / 87991)); so if the box
> becomes CPU starved, you might notice writes getting slower than on
> the new box.  Also, several of the latency numbers are worse -- in
> some cases far worse.  If I'm understanding that properly, it
> suggests that while total throughput from a number of connections
> may be better on the new machine, a single connection may not run
> the same query as quickly.  That probably makes the new machine
> better for handling an OLTP workload from many concurrent clients,
> but perhaps not as good at cranking out a single big report or
> running dump/restore.
>
> Yes, it is quite possible that the new machine could be faster at
> some things and slower at others.
>
> --
> Kevi

Re: [PERFORM] Perf decreased although server is better

2016-11-04 Thread Benjamin Toueg
I've noticed a network latency increase. Ping between web server and
database : 0.6 ms avg before, 5.3 ms avg after -- it wasn't that big 4 days
ago :(

I've narrowed my investigation to one particular "Transaction" in terms of
the NewRelic APM. It's basically the main HTTP request of my application.

Looks like the ping impacts psycopg2:connect (see http://imgur.com/a/LDH1c):
4 ms up to 16 ms on average.

That I can understand. However, I don't understand the performance decrease
of the select queries on table1 (see https://i.stack.imgur.com/QaUqy.png):
80 ms up to 160 ms on average

Same goes for table 2 (see http://imgur.com/a/CnETs): 4 ms up to 20 ms on
average

However, there is a commit in my request, and it performs better (see
http://imgur.com/a/td8Dc): 12 ms down to 6 ms on average.

I don't see how this can be due to network latency!

I will provide a new bonnie++ benchmark when the requests per minute is at
the lowest (remember I can only run benchmarks while the server is in use).

Rick, what did you mean by kernel configuration? The OS is a standard
Ubuntu 16.04:

 - Linux 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016
x86_64 x86_64 x86_64 GNU/Linux

Do you think losing half the number of cores can explain my performance
issue ? (AMD 8 cores down to Haswell 4 cores).

Best Regards,

Benjamin

PS : I've edited the SO post
http://serverfault.com/questions/812702/posgres-perf-decreased-although-server-is-better

2016-11-04 1:05 GMT+01:00 Kevin Grittner :

> On Thu, Nov 3, 2016 at 9:51 AM, Benjamin Toueg  wrote:
> >
> > Stream gives substantially better results with the new server
> (before/after)
>
> Yep, the new server can access RAM at about twice the speed of the old.
>
> > I've run "bonnie++ -u postgres -d /tmp/ -s 4096M -r 1096" on both
> > machines. I don't know how to read bonnie++ results (before/after)
> > but it looks quite the same, sometimes better for the new,
> > sometimes better for the old.
>
> On most metrics the new machine looks better, but there are a few
> things that look potentially problematic with the new machine: the
> new machine uses about 1.57x the CPU time of the old per block
> written sequentially ((41 / 143557) / (16 / 87991)); so if the box
> becomes CPU starved, you might notice writes getting slower than on
> the new box.  Also, several of the latency numbers are worse -- in
> some cases far worse.  If I'm understanding that properly, it
> suggests that while total throughput from a number of connections
> may be better on the new machine, a single connection may not run
> the same query as quickly.  That probably makes the new machine
> better for handling an OLTP workload from many concurrent clients,
> but perhaps not as good at cranking out a single big report or
> running dump/restore.
>
> Yes, it is quite possible that the new machine could be faster at
> some things and slower at others.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>