Re: [PERFORM] Slow queries / commits, mis-configuration or hardware issues?

2011-11-16 Thread Cody Caughlan

On Nov 16, 2011, at 8:52 AM, Tomas Vondra wrote:

 On 16 Listopad 2011, 2:21, Cody Caughlan wrote:
 How did you build your RAID array? Maybe I have a fundamental flaw /
 misconfiguration. I am doing it via:
 
 $ yes | mdadm --create /dev/md0 --level=10 -c256 --raid-devices=4
 /dev/xvdb /dev/xvdc /dev/xvdd /dev/xvde
 $ pvcreate /dev/md0
 $ vgcreate lvm-raid10 /dev/md0
 $ lvcreate -l 215021 lvm-raid10 -n lvm0
 $ blockdev --setra 65536 /dev/lvm-raid10/lvm0
 $ mkfs.xfs -f /dev/lvm-raid10/lvm0
 $ mkdir -p /data && mount -t xfs -o noatime /dev/lvm-raid10/lvm0 /data
 
 I'm not using EC2 much, and those were my first attempts with ephemeral
 storage, so this may be a stupid question, but why are you building a
 RAID-10 array on ephemeral storage, anyway?
 
 You already have a standby, so if the primary instance fails you can
 easily failover.
 

Yes, the slave will become master if the master goes down. We have no plan to
try to resurrect the master in the case of failure, hence the choice of
ephemeral over EBS.

We chose RAID10 over RAID0 to get the best combination of performance while
minimizing the probability of a single drive failure bringing down the house.

So, yes, RAID0 would ultimately deliver the best performance, but with more risk.

 What are you going to do in case of a drive failure? With a server this is
 rather easy - just put in a new drive and you're done - but can you do
 that on EC2? I guess you can't do that when the instance is running, so
 you'll have to switch to the standby anyway, right? Have you ever tried
 this (how it affects the performance etc.)?
 

As far as I know one cannot alter the ephemeral drives in a running instance, 
so yes, the whole instance would have to be written off.

 So what additional protection does that give you? Wouldn't a RAID-0 be a
 better utilization of the resources?
 

Too much risk.

 Tomas
 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow queries / commits, mis-configuration or hardware issues?

2011-11-15 Thread Cody Caughlan
On Tue, Nov 15, 2011 at 5:16 PM, Tomas Vondra t...@fuzzy.cz wrote:
 Dne 14.11.2011 22:58, Cody Caughlan napsal(a):
 I ran bonnie++ on a slave node, doing active streaming replication but
 otherwise idle:
 http://batch-files-test.s3.amazonaws.com/sql03.prod.html

 bonnie++ on the master node:
 http://batch-files-test.s3.amazonaws.com/sql01.prod.html

 If I am reading this right (this is my first time using it), the
 numbers don't look too good.

 I've done some benchmarks on my own (m1.xlarge instance), and the
 results are these (http://pastebin.com/T1LXHru0):

 single drive
 ------------
 dd writes: 62 MB/s
 dd reads: 110 MB/s
 bonnie seq. writes: 55 MB/s
 bonnie seq. rewrite: 33 MB/s
 bonnie seq. reads: 91 MB/s
 bonnie seeks: 370/s

 raid 0 (4 devices)
 ------------------
 dd writes: 220 MB/s
 dd reads: 380 MB/s
 bonnie seq. writes: 130 MB/s
 bonnie seq. rewrite: 114 MB/s
 bonnie seq. reads: 280 MB/s
 bonnie seeks: 570/s

 raid 10 (4 devices)
 -------------------
 dd writes: 90 MB/s
 dd reads: 200 MB/s
 bonnie seq. writes: 49 MB/s
 bonnie seq. rewrite: 56 MB/s
 bonnie seq. reads: 160 MB/s
 bonnie seeks: 590/s


Interesting. I spun up a new m1.xlarge and did the same RAID10 config
(4 drives) except with a chunk size of 512K (instead of 256K) and the
machine was completely idle. Bonnie:

http://batch-files-test.s3.amazonaws.com/idle-512k-chunk.html

This has similar performance to yours, except for worse seeks and slightly
better seq. reads.

The other bonnies I sent over were NOT on idle systems. This one is the
master, which receives a heavy stream of writes and some reads:

http://batch-files-test.s3.amazonaws.com/sql01.prod.html

And this is the slave, which is all writes and no reads:
http://batch-files-test.s3.amazonaws.com/sql03.prod.html

How did you build your RAID array? Maybe I have a fundamental flaw /
misconfiguration. I am doing it via:

$ yes | mdadm --create /dev/md0 --level=10 -c256 --raid-devices=4
/dev/xvdb /dev/xvdc /dev/xvdd /dev/xvde
$ pvcreate /dev/md0
$ vgcreate lvm-raid10 /dev/md0
$ lvcreate -l 215021 lvm-raid10 -n lvm0
$ blockdev --setra 65536 /dev/lvm-raid10/lvm0
$ mkfs.xfs -f /dev/lvm-raid10/lvm0
$ mkdir -p /data && mount -t xfs -o noatime /dev/lvm-raid10/lvm0 /data
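
(As a sanity check, here is a minimal sketch of the commands I would use to
verify the resulting layout and readahead; device names match the ones above:)

$ cat /proc/mdstat
$ mdadm --detail /dev/md0
$ blockdev --getra /dev/lvm-raid10/lvm0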

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Slow queries / commits, mis-configuration or hardware issues?

2011-11-14 Thread Cody Caughlan
Hi, running Postgres 9.1.1 on an EC2 m1.xlarge instance. Machine is a
dedicated master with 2 streaming replication nodes.

The machine has 16GB of RAM and 4 cores.

We're starting to see some slow queries, and especially slow COMMITs, which
are happening more and more frequently. The slow queries are against
seemingly well-indexed tables.

I have log_min_duration_statement = 150ms.

Slow commits like:

2011-11-14 17:47:11 UTC pid:14366 (44/0-0) LOG:  duration: 3062.784 ms
 statement: COMMIT
2011-11-14 17:47:11 UTC pid:14604 (48/0-0) LOG:  duration: 2593.351 ms
 statement: COMMIT

These slow COMMITs are against tables that receive a large number of
UPDATEs and are growing fairly rapidly.

And slow queries like:

2011-11-14 17:50:20 UTC pid:6519 (16/0-0) LOG:  duration: 1694.456 ms
statement: SELECT  facebook_wall_posts.* FROM facebook_wall_posts
WHERE facebook_wall_posts.token =
'984c44e75975b224b38197cf8f8fc76a' LIMIT 1

query plan: http://explain.depesz.com/s/wbm
The # of rows in facebook_wall_posts is 5841
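
(For anyone who wants to reproduce the plan, something along these lines
works; "mydb" is a placeholder for the actual database name:)

$ psql -d mydb -c "EXPLAIN (ANALYZE, BUFFERS) SELECT facebook_wall_posts.*
    FROM facebook_wall_posts
    WHERE facebook_wall_posts.token = '984c44e75975b224b38197cf8f8fc76a' LIMIT 1;"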

Structure of facebook_wall_posts:
                             Table "public.facebook_wall_posts"
       Column       |            Type             |                            Modifiers
--------------------+-----------------------------+------------------------------------------------------------------
 id                 | integer                     | not null default nextval('facebook_wall_posts_id_seq'::regclass)
 album_id           | integer                     | not null
 token              | character varying(32)       | not null
 fb_recipient_id    | character varying(64)       | not null
 post_id            | character varying(100)      | not null
 fb_post_created_at | timestamp without time zone |
 data               | text                        |
 created_at         | timestamp without time zone |
 updated_at         | timestamp without time zone |
 fb_post_deleted_at | timestamp without time zone |
Indexes:
    "facebook_wall_posts_pkey" PRIMARY KEY, btree (id)
    "index_facebook_wall_posts_on_token" UNIQUE, btree (token)
    "index_facebook_wall_posts_on_album_id" btree (album_id)


And another slow query:

2011-11-14 17:52:44 UTC pid:14912 (58/0-0) LOG:  duration: 979.740 ms
statement: SELECT facebook_friends.friend_id FROM
facebook_friends  WHERE facebook_friends.user_id = 9134671

Query plan: http://explain.depesz.com/s/x1q
# of rows in facebook_friends is 27075088

Structure of facebook_friends:
 Table public.facebook_friends
   Column   |Type |
Modifiers
+-+---
 id | integer | not null default
nextval('facebook_friends_id_seq'::regclass)
 user_id| integer | not null
 friend_id  | integer | not null
 created_at | timestamp without time zone |
Indexes:
facebook_friends_pkey PRIMARY KEY, btree (id)
index_facebook_friends_on_user_id_and_friend_id UNIQUE, btree
(user_id, friend_id)

We have auto-vacuum enabled and running. But yesterday I manually ran
vacuum on the database. Autovacuum settings:

autovacuum                      | on
autovacuum_analyze_scale_factor | 0.1
autovacuum_analyze_threshold    | 50
autovacuum_freeze_max_age       | 200000000
autovacuum_max_workers          | 3
autovacuum_naptime              | 60
autovacuum_vacuum_cost_delay    | 20
autovacuum_vacuum_cost_limit    | -1
autovacuum_vacuum_scale_factor  | 0.2
autovacuum_vacuum_threshold     | 50
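
(To check whether autovacuum is actually keeping up with the hot tables, a
quick query along these lines helps; "mydb" is a placeholder:)

$ psql -d mydb -c "SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
    FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 10;"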

other postgresql.conf settings:

shared_buffers = 3584MB
wal_buffers = 16MB
checkpoint_segments = 32
max_wal_senders = 10
checkpoint_completion_target = 0.9
wal_keep_segments = 1024
maintenance_work_mem = 256MB
work_mem = 88MB
shared_buffers = 3584MB
effective_cache_size = 10GB

The PGDATA dir is a RAID10 on 4 local (ephemeral in EC2 speak)
drives. I ran some dd tests and received the following output:

--- WRITING ---
root@sql03:/data# time sh -c "dd if=/dev/zero of=/data/tmp/bigfile
bs=8k count=4000000 && sync"
4000000+0 records in
4000000+0 records out
32768000000 bytes (33 GB) copied, 670.663 s, 48.9 MB/s

real    11m52.199s
user    0m2.720s
sys     0m45.330s


--- READING ---
root@sql03:/data# time dd of=/dev/zero if=/data/tmp/bigfile bs=8k
4000000+0 records in
4000000+0 records out
32768000000 bytes (33 GB) copied, 155.429 s, 211 MB/s

real    2m35.434s
user    0m2.400s
sys     0m33.160s


I have enabled log_checkpoints and here is a recent sample from the log:

2011-11-14 17:38:48 UTC pid:3965 (-0) LOG:  checkpoint complete: wrote
15121 buffers (3.3%); 0 transaction log file(s) added, 0 removed, 8
recycled; write=270.101 s, sync=2.989 s, total=273.112 s; sync
files=60, longest=1.484 s, average=0.049 s
2011-11-14 17:39:15 UTC pid:3965 (-0) LOG:  checkpoint starting: time
2011-11-14 17:43:49 UTC pid:3965 (-0) LOG:  checkpoint complete: wrote
16462 buffers (3.6%); 0 transaction log file(s) added, 0 removed, 9
recycled; write=269.978 s, sync=4.106 s, total=274.117 s; sync
files=82, longest=2.943 s, average=0.050 s

I've been collecting random samples from pg_stat_bgwriter:
https://gist.github.com/4faec2ca9a79ede281e1

So given all this information (if you need more just let me know), is
there something fundamentally wrong or mis-configured? Do I have an
I/O issue?

Re: [PERFORM] Slow queries / commits, mis-configuration or hardware issues?

2011-11-14 Thread Cody Caughlan
Thanks for your response. Please see below for answers to your questions.

On Mon, Nov 14, 2011 at 11:22 AM, Tomas Vondra t...@fuzzy.cz wrote:
 On 14 Listopad 2011, 19:16, Cody Caughlan wrote:
 shared_buffers = 3584MB
 wal_buffers = 16MB
 checkpoint_segments = 32
 max_wal_senders = 10
 checkpoint_completion_target = 0.9
 wal_keep_segments = 1024
 maintenance_work_mem = 256MB
 work_mem = 88MB
 shared_buffers = 3584MB
 effective_cache_size = 10GB

 Seems reasonable, although I'd bump up the checkpoint_timeout (the 5m is
 usually too low).

Ok, will do.


 The PGDATA dir is a RAID10 on 4 local (ephemeral in EC2 speak)
 drives. I ran some dd tests and received the following output:

 --- WRITING ---
 root@sql03:/data# time sh -c "dd if=/dev/zero of=/data/tmp/bigfile
 bs=8k count=4000000 && sync"
 4000000+0 records in
 4000000+0 records out
 32768000000 bytes (33 GB) copied, 670.663 s, 48.9 MB/s

 real  11m52.199s
 user  0m2.720s
 sys   0m45.330s

 This measures sequential write performance (and the same holds for the
 read test). We need to know the random I/O performance too - use bonnie++
 or similar tool.

 Based on the AWS benchmarks I've seen so far, I'd expect about 90 MB/s for
 sequential read/writes, and about twice that performance for a 4-drive
 RAID10. So while the reads (211 MB/s) seem perfectly OK, the writes
 (50MB/s) are rather slow. Have you measured this on an idle system, or
 when the db was running?


I ran bonnie++ on a slave node, doing active streaming replication but
otherwise idle:
http://batch-files-test.s3.amazonaws.com/sql03.prod.html

bonnie++ on the master node:
http://batch-files-test.s3.amazonaws.com/sql01.prod.html

If I am reading this right (this is my first time using it), the
numbers don't look too good.
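
(For anyone repeating this, a typical bonnie++ invocation on a 16GB box would
be something like the following; the target directory and user are
assumptions, and -s is set to roughly twice RAM so the OS cache doesn't mask
the results:)

$ bonnie++ -d /data/bonnie -s 32g -n 0 -u postgres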

 See for example this:

 [1] http://victortrac.com/EC2_Ephemeral_Disks_vs_EBS_Volumes
 [2]
 http://www.gabrielweinberg.com/blog/2011/05/raid0-ephemeral-storage-on-aws-ec2.html

 I have enabled log_checkpoints and here is a recent sample from the log:
 2011-11-14 17:39:15 UTC pid:3965 (-0) LOG:  checkpoint starting: time
 2011-11-14 17:43:49 UTC pid:3965 (-0) LOG:  checkpoint complete: wrote
 16462 buffers (3.6%); 0 transaction log file(s) added, 0 removed, 9
 recycled; write=269.978 s, sync=4.106 s, total=274.117 s; sync
 files=82, longest=2.943 s, average=0.050 s

 Nothing special here - this just says that the checkpoints were timed and
 finished on time (the default checkpoint timeout is 5 minutes, with
 completion target 0.9 the expected checkpoint time is about 270s). Not a
 checkpoint issue, probably.

 I've been collecting random samples from pg_stat_bgwriter:
 https://gist.github.com/4faec2ca9a79ede281e1

 It's a bit difficult to interpret this (collect the data at regular
 intervals - e.g. every hour - and post the differences, please), but it
 seems reasonable.

Ok, I have a cron running every hour to grab this data. I will post
back in a few hours or tomorrow.
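
(For reference, the crontab entry looks roughly like the following; the
database name and output path are placeholders:)

0 * * * * psql -U postgres -d mydb -Atc "SELECT now(), * FROM pg_stat_bgwriter;" >> /var/lib/postgresql/bgwriter_samples.csv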


 So given all this information (if you need more just let me know), is
 there something fundamentally wrong or mis-configured? Do I have an
 I/O issue?

 Probably - the discrepancy between read/write performance is a bit
 suspicious.

 Try to watch the I/O performance when this happens, i.e. run iostat -x
 and watch the output (especially %util, r_await, w_await) and post several
 lines of the output.


Here's a gist of running iostat -x 3 for a few minutes:

https://gist.github.com/f94d98f2ef498a522ac2

Indeed, the %iowait and await values can spike up drastically.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow queries / commits, mis-configuration or hardware issues?

2011-11-14 Thread Cody Caughlan
On Mon, Nov 14, 2011 at 2:57 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 14 Listopad 2011, 22:58, Cody Caughlan wrote:
 Seems reasonable, although I'd bump up the checkpoint_timeout (the 5m is
 usually too low).

 Ok, will do.

 Yes, but find out what that means and think about the possible impact
 first. It usually improves the checkpoint behaviour but increases the
 recovery time and you may need more checkpoint segments. And I doubt this
 will fix the issue you've described.


Ok, I understand the ramifications of increasing the checkpoint
timeout, but I will investigate more before I jump in.
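
(For reference, the change being discussed would look something like this in
postgresql.conf; the values are only an illustration, not a recommendation:)

checkpoint_timeout = 15min
checkpoint_segments = 64
checkpoint_completion_target = 0.9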


 I ran bonnie++ on a slave node, doing active streaming replication but
 otherwise idle:
 http://batch-files-test.s3.amazonaws.com/sql03.prod.html

 bonnie++ on the master node:
 http://batch-files-test.s3.amazonaws.com/sql01.prod.html

 If I am reading this right (this is my first time using it), the
 numbers don't look too good.

 Are those instances equal, i.e. using the same RAID10 config etc.? It
 surprises me a bit that the slave performs much better than the master,
 for example the sequential reads are much faster (210MB/s vs. 60MB/s) and
 it handles about twice the number of seeks (345 vs. 170). But this may be
 skewed because of the workload.

Yes, these two machines are the same. sql01 is the master node and is
quite busy. Running bonnie++ on it during its normal workload spiked
I/O for the duration. sql03 is a pure slave and is quite idle, save
for receiving WAL segments.


 Heres a gist of running iostat -x 3 for about a few minutes:

 https://gist.github.com/f94d98f2ef498a522ac2

 Indeed, the %iowait and await values can spike up drastically.

 OK, so xvdb-xvde are individual drives and dm-0 is the RAID10 device,
 right? According to the log_checkpoint info, you're writing about 15000
 (120MB) buffers in 270s, i.e. about 440kB/s. But according to the iostat
 you're writing up to 4MB/s, so it's not just about the checkpoints.

 What else is going on there? How much WAL do you write?

Yes, dm-0 is the RAID10 device. The WAL config is:

wal_buffers = 16MB
checkpoint_segments = 32
max_wal_senders = 10
checkpoint_completion_target = 0.9
checkpoint_timeout = 300
wal_keep_segments = 1024
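
(To get a rough number for the "how much WAL" question, one sketch is to
sample the insert location and diff it over an interval; "mydb" is a
placeholder:)

$ psql -d mydb -Atc "SELECT now(), pg_current_xlog_insert_location();"
$ sleep 300
$ psql -d mydb -Atc "SELECT now(), pg_current_xlog_insert_location();"

(the byte difference between the two positions divided by the interval gives
the WAL write rate; each WAL segment is 16MB)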


 Do you have iotop installed? That might give you a hint what processes are
 writing data etc.

I do have iotop and have been watching it. The only I/O users are
postgres and its backends. I don't see anything else consuming any I/O.
By eyeballing iotop, the big consumers of disk writes are:

idle in transaction, SELECT, COMMIT

The first two are what I would think would be largely read operations
(certainly the SELECT), so it's not clear why a SELECT would consume
write time.

Here is the output of some pg_stat_bgwriter stats from the last couple of hours:

https://gist.github.com/41ee26caca01471a9b77

One thing that I might not have made very clear earlier is that this
DB, especially a single table, receives a very large number of UPDATEs.
However, it seems to be well cached; I have shared_buffers = 3584MB
and a view of pg_buffercache shows:
https://gist.github.com/53c520571290cae14613
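
(For context, a view like that can be produced with the standard
pg_buffercache example query, roughly as below; it requires the
pg_buffercache extension and "mydb" is a placeholder:)

$ psql -d mydb -c "SELECT c.relname, count(*) AS buffers
    FROM pg_buffercache b
    JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
     AND b.reldatabase IN (0, (SELECT oid FROM pg_database WHERE datname = current_database()))
    GROUP BY c.relname ORDER BY buffers DESC LIMIT 10;"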

Is it possible that we're just trying to handle too many UPDATEs and
they are all trying to hit disk at once, causing this I/O
contention? Here is a view of pg_stat_user_tables that shows the
number of live/dead tuples:

https://gist.github.com/5ac1ae7d11facd72913f

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Large number of short lived connections - could a connection pool help?

2011-11-14 Thread Cody Caughlan
We have anywhere from 60-80 background worker processes connecting to
Postgres, performing a short task and then disconnecting. The lifetime
of these tasks averages 1-3 seconds.

I know that there is some connection overhead to Postgres, but I don't
know the best way to measure this overhead and/or to
determine whether it's currently an issue at all.
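
(One rough way I can think of to measure it: pgbench with -C opens a new
connection for every transaction, so comparing a -C run against a normal run
on a pgbench-initialized test database shows roughly what the setup/teardown
costs; the database name below is a placeholder:)

$ pgbench -c 16 -T 30 pgbench_test
$ pgbench -C -c 16 -T 30 pgbench_test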

If there is substantial overhead, I would think that employing a
connection pool like pgbouncer to keep a static set of connections
and dole them out to the transient workers on demand would help.

So the overall cumulative number of connections wouldn't change; I
would just be avoiding the constant setup/teardown.
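
(For what it's worth, the kind of setup I have in mind is a minimal
transaction-pooling pgbouncer config, something like the sketch below; all
names, paths and sizes are placeholders:)

$ cat /etc/pgbouncer/pgbouncer.ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
default_pool_size = 20
max_client_conn = 200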

Is this something that I should look into, or is it not much of an
issue? What's the best way to determine whether I could benefit from using
a connection pool?

Thanks.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Large number of short lived connections - could a connection pool help?

2011-11-14 Thread Cody Caughlan
On Mon, Nov 14, 2011 at 4:59 PM, Ben Chobot be...@silentmedia.com wrote:
 On Nov 14, 2011, at 4:42 PM, Cody Caughlan wrote:

 We have anywhere from 60-80 background worker processes connecting to
 Postgres, performing a short task and then disconnecting. The lifetime
 of these tasks averages 1-3 seconds.

 [snip]

 Is this something that I should look into, or is it not much of an
 issue? What's the best way to determine whether I could benefit from using
 a connection pool?

 Yes, this is precisely the kind of situation a connection pooler will help
 with. Not only does it reduce the connection setup/teardown overhead, it also
 uses the resources on your server better: you probably don't actually have
 60-80 cores on your server, so reducing that number down to just a few that
 are actually working lets Postgres finish them faster and move on to others.
 Basically, the queueing happens off the Postgres server, letting Postgres use
 the box with fewer interruptions.

 Now, is it a problem to not use a pooler? That depends on whether it's causing
 you grief or not. But if you think you'll get more connection churn or larger
 numbers of workers, then a connection pooler will only help more.

Thanks for your input. It's not causing me grief per se. The load on
the pg machine is small. I guess I am just wondering if I am being
stupid and leaving resources and/or performance on the table.

But it sounds like, as a whole, it would be a good use case for
pgbouncer and in the long run will prove beneficial. But no, it's not
obviously killing me right now.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance