Re: [PERFORM] Subquery in a JOIN not getting restricted?
Tom Lane wrote:
> Jay Levitt <jay.lev...@gmail.com> writes:
>> If the query was more like
>>
>>   select questions.id
>>   from questions
>>   join (
>>     select sum(u.id)
>>     from users as u
>>     group by u.id
>>   ) as s on s.id = questions.user_id
>>   where questions.id = 1;
>>
>> would you no longer be surprised that it scanned all user rows?
>
> I'd suggest rephrasing the query to do the join underneath the GROUP BY.

Well, my real goal is to have that inner query in a set-returning function that gives a computed table of other users relative to the current user, and then be able to JOIN that with other things and ORDER BY it:

  select questions.id
  from questions
  join (select * from relevance(current_user)) as r
    on r.id = questions.user_id
  where questions.id = 1;

I assume there's no way for that function (in SQL or PL/pgSQL) to reach up to the upper node and say "do that join again here", or to force the join order from down below? I can't imagine how there could be, but it never hurts to ask.

Right now, our workaround is to pass the joined target user as a function parameter and do the JOIN in the function, but that means we have to put the function in the select list, or else we hit the lack of LATERAL support:

  -- This would need LATERAL
  select questions.id
  from questions
  join (
    select * from relevance(current_user, questions.user_id)
  ) as r on r.id = questions.user_id
  where questions.id = 1;

  -- This works but has lots of row-at-a-time overhead
  select questions.id,
    ( select * from relevance(current_user, questions.user_id) ) as r
  from questions
  where questions.id = 1;

Again, just checking if there's a solution I'm missing. I know the optimizer is only asymptotically approaching optimal!

Jay

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
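For what it's worth, the missing piece Jay describes was eventually added: PostgreSQL 9.3 introduced the LATERAL keyword, which expresses exactly the first workaround. A sketch, assuming the same hypothetical relevance() set-returning function from the message above:

```sql
-- Requires PostgreSQL 9.3 or later; relevance() is the hypothetical
-- set-returning function from the message above, not a built-in.
select questions.id
from questions
join lateral (
  -- inside a LATERAL subquery, columns of "questions" are visible,
  -- so the per-row parameter no longer forces a select-list subquery
  select * from relevance(current_user, questions.user_id)
) as r on r.id = questions.user_id
where questions.id = 1;
```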
Re: [PERFORM] Slow queries / commits, mis-configuration or hardware issues?
On 16 Listopad 2011, 5:27, Greg Smith wrote:
> On 11/14/2011 01:16 PM, Cody Caughlan wrote:
>> We're starting to see some slow queries, especially COMMITs that are
>> happening more frequently. The slow queries are against seemingly
>> well-indexed tables. Slow commits like:
>>
>> 2011-11-14 17:47:11 UTC pid:14366 (44/0-0) LOG: duration: 3062.784 ms statement: COMMIT
>> 2011-11-14 17:47:11 UTC pid:14604 (48/0-0) LOG: duration: 2593.351 ms statement: COMMIT
>>
>> These slow COMMITs are against tables that received a large number of
>> UPDATEs and are growing fairly rapidly.
>
> Linux will fill its write cache with all of the writes coming out of
> each checkpoint. With a 16GB instance, I would expect that 5% * 16GB
> ~= 800MB of writes are batched up when your system is slow. You should
> be able to confirm that by looking at the "Dirty:" line in /proc/meminfo.
>
> With 800MB queued up and I/O that is lucky to get 50MB/s, the sync
> calls at the end of each checkpoint are sometimes blocking for
> multiple seconds:
>
> 2011-11-14 17:38:48 UTC pid:3965 (-0) LOG: checkpoint complete: wrote 15121 buffers (3.3%); 0 transaction log file(s) added, 0 removed, 8 recycled; write=270.101 s, sync=2.989 s, total=273.112 s; sync files=60, longest=1.484 s, average=0.049 s
> 2011-11-14 17:39:15 UTC pid:3965 (-0) LOG: checkpoint starting: time
> 2011-11-14 17:43:49 UTC pid:3965 (-0) LOG: checkpoint complete: wrote 16462 buffers (3.6%); 0 transaction log file(s) added, 0 removed, 9 recycled; write=269.978 s, sync=4.106 s, total=274.117 s; sync files=82, longest=2.943 s, average=0.050 s
>
> When an individual sync call gets stuck for that long, clients can
> easily get stuck behind it too. There are a couple of techniques that
> might help:

The sync times I see there seem quite acceptable - 4.2s is not perfect, but I wouldn't rate it as terrible. What actually annoys me is the amount of data written - it's just 17000 pages, i.e. about 130 MB for a checkpoint (spread over 5 minutes). So it's just like 0.5 MB/s.
> -Switch filesystems if you're running a slow one. ext3 has bad latency
> behavior here; XFS and ext4 are better.

He's using xfs, IIRC. That's one of the better behaving ones when it comes to sync.

> -Lower the dirty_* tunables like dirty_background_ratio or its bytes
> version. This will reduce average throughput, but can lower latency.
> -Spread checkpoints out more so that less average writes are happening.
> -Decrease shared_buffers so less data is getting pushed out at
> checkpoint time.
> -Reduce your reliability expectations and turn off synchronous_commit.

The question here probably is whether those high latencies are caused or significantly influenced by the checkpoint, or are a feature of the storage. Because if it's a feature, then all this is a futile attempt to fix it.

I don't think he has problems with checkpoints - he's complaining about regular queries being slow (even a plain SELECT, i.e. something that usually does not require a sync).

No doubt this may be connected, but a regular SELECT usually does not perform a sync, right? It may need to fetch some data, and if the I/O is saturated by a checkpoint, this may take time. But again - those bonnie results were collected on a running system, i.e. with checkpoints in progress and all of that. And I'd expect most of the SELECT queries to be handled without actually touching the devices - by connecting https://gist.github.com/5ac1ae7d11facd72913f and https://gist.github.com/5ac1ae7d11facd72913f it seems that the largest table (users) is almost completely in shared buffers, while the two other large tables (external_user and facebook_friends) are cached by about 30%. And I'd expect the rest of those tables to be in the page cache, so SELECT queries on those tables should be fast.

A commit obviously requires a sync on the WAL - I wonder if moving the WAL would improve the performance here. This is obviously based on an incomplete set of stats, and maybe I'm missing something.
> Your server is sometimes showing multi-second latency issues with
> bonnie++ too; that suggests how this problem is not even specific to
> PostgreSQL. Linux is hard to tune for low latency under all

Don't forget those data were collected on a production system, i.e. it was actually under load. That probably skews the results a lot.

> circumstances; fighting latency down under a heavy update workload is
> hard to do even with good hardware to accelerate write performance. In
> an EC2 environment, it may not even be possible to do without making
> trade-offs like disabling synchronous writes. I can easily get
> transactions hung for 10 to 15 seconds on one of their servers if I
> try to make that problem bad; you're only seeing the middle range of
> latency issues so far.

Are you talking about EBS or ephemeral storage? Because all this is about ephemeral (something like a virtualized local storage).

Tomas
Re: [PERFORM] Slow queries / commits, mis-configuration or hardware issues?
On 16 Listopad 2011, 2:21, Cody Caughlan wrote:
>> How did you build your RAID array? Maybe I have a fundamental flaw /
>> misconfiguration.
>
> I am doing it via:
>
> $ yes | mdadm --create /dev/md0 --level=10 -c256 --raid-devices=4 /dev/xvdb /dev/xvdc /dev/xvdd /dev/xvde
> $ pvcreate /dev/md0
> $ vgcreate lvm-raid10 /dev/md0
> $ lvcreate -l 215021 lvm-raid10 -n lvm0
> $ blockdev --setra 65536 /dev/lvm-raid10/lvm0
> $ mkfs.xfs -f /dev/lvm-raid10/lvm0
> $ mkdir -p /data
> $ mount -t xfs -o noatime /dev/lvm-raid10/lvm0 /data

I'm not using EC2 much, and those were my first attempts with ephemeral storage, so this may be a stupid question, but why are you building a RAID-10 array on ephemeral storage, anyway? You already have a standby, so if the primary instance fails you can easily fail over.

What are you going to do in case of a drive failure? With a physical server this is rather easy - just put in a new drive and you're done - but can you do that on EC2? I guess you can't do that while the instance is running, so you'll have to switch to the standby anyway, right? Have you ever tried this (how it affects the performance etc.)?

So what additional protection does that give you? Wouldn't a RAID-0 be a better utilization of the resources?

Tomas
Re: [PERFORM] Slow queries / commits, mis-configuration or hardware issues?
On Nov 16, 2011, at 8:52 AM, Tomas Vondra wrote:
> On 16 Listopad 2011, 2:21, Cody Caughlan wrote:
>>> How did you build your RAID array? Maybe I have a fundamental flaw /
>>> misconfiguration.
>>
>> I am doing it via:
>>
>> $ yes | mdadm --create /dev/md0 --level=10 -c256 --raid-devices=4 /dev/xvdb /dev/xvdc /dev/xvdd /dev/xvde
>> $ pvcreate /dev/md0
>> $ vgcreate lvm-raid10 /dev/md0
>> $ lvcreate -l 215021 lvm-raid10 -n lvm0
>> $ blockdev --setra 65536 /dev/lvm-raid10/lvm0
>> $ mkfs.xfs -f /dev/lvm-raid10/lvm0
>> $ mkdir -p /data
>> $ mount -t xfs -o noatime /dev/lvm-raid10/lvm0 /data
>
> I'm not using EC2 much, and those were my first attempts with ephemeral
> storage, so this may be a stupid question, but why are you building a
> RAID-10 array on ephemeral storage, anyway? You already have a standby,
> so if the primary instance fails you can easily fail over.

Yes, the slave will become master if the master goes down. We have no plan to try and resurrect the master in the case of failure, hence the choice of ephemeral vs EBS. We chose RAID10 over RAID0 to get the best combination of performance while minimizing the probability of a single drive failure bringing down the house. So, yes, RAID0 would ultimately deliver the best performance, with more risk.

> What are you going to do in case of a drive failure? With a server this
> is rather easy - just put in a new drive and you're done - but can you
> do that on EC2? I guess you can't do that while the instance is
> running, so you'll have to switch to the standby anyway, right? Have
> you ever tried this (how it affects the performance etc.)?

As far as I know one cannot alter the ephemeral drives in a running instance, so yes, the whole instance would have to be written off.

> So what additional protection does that give you? Wouldn't a RAID-0 be
> a better utilization of the resources?

Too much risk.
Re: [PERFORM] Slow queries / commits, mis-configuration or hardware issues?
On 16 Listopad 2011, 18:31, Cody Caughlan wrote:
> On Nov 16, 2011, at 8:52 AM, Tomas Vondra wrote:
>> I'm not using EC2 much, and those were my first attempts with
>> ephemeral storage, so this may be a stupid question, but why are you
>> building a RAID-10 array on ephemeral storage, anyway? You already
>> have a standby, so if the primary instance fails you can easily fail
>> over.
>
> Yes, the slave will become master if the master goes down. We have no
> plan to try and resurrect the master in the case of failure, hence the
> choice of ephemeral vs EBS. We chose RAID10 over RAID0 to get the best
> combination of performance while minimizing the probability of a single
> drive failure bringing down the house. So, yes, RAID0 would ultimately
> deliver the best performance, with more risk.
>
>> What are you going to do in case of a drive failure? With a server
>> this is rather easy - just put in a new drive and you're done - but
>> can you do that on EC2? I guess you can't do that while the instance
>> is running, so you'll have to switch to the standby anyway, right?
>> Have you ever tried this (how it affects the performance etc.)?
>
> As far as I know one cannot alter the ephemeral drives in a running
> instance, so yes, the whole instance would have to be written off.
>
>> So what additional protection does that give you? Wouldn't a RAID-0 be
>> a better utilization of the resources?
>
> Too much risk.

Why?
If I understand that correctly, the only case where a RAID-10 actually helps is when an ephemeral drive fails, but not the whole instance. Do you have some numbers on how often this happens, i.e. how often a drive fails without the instance failing?

But you can't actually replace the failed drive, so the only option you have is to fail over to the standby - right? Sure - with async replication, you could lose the not-yet-sent transactions. I see two possible solutions:

a) use sync rep, available in 9.1 (you already run 9.1.1)

b) place the WAL on an EBS volume, mounted as part of the failover

The EBS volumes are not exactly fast, but it seems (e.g. http://www.mysqlperformanceblog.com/2009/08/06/ec2ebs-single-and-raid-volumes-io-bencmark/) the sequential performance might be acceptable.

According to the stats you've posted, you've written about 5632 MB of WAL data per hour. That's about 1.5 MB/s on average, and that might be handled by the EBS. Yes, if you have a peak where you need to write much more data, this is going to be a bottleneck.

Tomas
[PERFORM] Performance question 83 GB Table 150 million rows, distinct select
Linux F12 64bit
Postgres 8.4.4
16 proc / 32GB
8 disk 15KRPM SAS/Raid 5 (I know!)

shared_buffers = 6000MB
#temp_buffers = 8MB
max_prepared_transactions = 0
work_mem = 250MB
maintenance_work_mem = 1000MB

We now have about 180mill records in that table. The database size is about 580GB, and the userstats table - which is the biggest one and the one we query the most - is 83GB.

Just a basic query takes 4 minutes. For e.g.:

  select count(distinct uid) from userstats where log_date > '11/7/2011';

Since we are looking for distinct values we obviously can't use an index. But I'm wondering what should be expected and what is caused by tuning or lack thereof. Doing an iostat I see maybe 10-15%, however the cpu that this query is attached to is obviously in the 99-100% busy arena. Or am I really IO-bound for this single query (sure, lots of data, but?!)?

It takes roughly 5.5 hours to do a concurrent re-index, and this DB is vac'd nightly.

Just not sure if this is what to expect; however, there are many other DBs out there bigger than ours, so I'm curious what I can do?
Thanks
Tory

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.41    0.00    0.20    1.61    0.00   96.78

Device:            tps   Blk_read/s   Blk_wrtn/s     Blk_read     Blk_wrtn
sda              92.17      3343.06      1718.85  46273281004  23791660544

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.47    0.00    0.61    5.85    0.00   92.07

Device:            tps   Blk_read/s   Blk_wrtn/s     Blk_read     Blk_wrtn
sda             417.50     90372.00         0.00       180744            0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.88    0.00    0.76    6.34    0.00   90.03

Device:            tps   Blk_read/s   Blk_wrtn/s     Blk_read     Blk_wrtn
sda             725.00    183560.00       148.00       367120          296

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.18    0.00    0.60    3.59    0.00   93.63

Device:            tps   Blk_read/s   Blk_wrtn/s     Blk_read     Blk_wrtn
sda             711.00    179952.00       240.00       359904          480

[blue@adb01 ~]$ iostat -xd 2
Linux 2.6.32.26-175.fc12.x86_64 (adb01)  11/16/2011  _x86_64_  (16 CPU)

Device:  rrqm/s  wrqm/s     r/s    w/s     rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda        0.18  191.40   68.71  23.45    3343.22  1718.85    54.92     0.12    4.61   2.05  18.94
sda        2.00    0.00  706.50   8.00  178832.00   128.00   250.47    77.76   31.21   1.40 100.00
sda        4.98   17.41  584.58  35.32  148497.51   672.64   240.64    38.04  227.07   1.61  99.55
sda        3.50    0.00  688.50   2.00  174556.00    32.00   252.84     2.81    4.66   1.44  99.30
sda        1.00   10.00  717.50   1.50  182084.00    92.00   253.37     2.43    3.37   1.38  99.45
^C

[blue@]$ iostat 2
Linux 2.6.32.26-175.fc12.x86_64 (adb01)  11/16/2011  _x86_64_  (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.41    0.00    0.20    1.61    0.00   96.78

Device:            tps   Blk_read/s   Blk_wrtn/s     Blk_read     Blk_wrtn
sda              92.17      3343.33      1718.85  46277115652  23791678248

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.79    0.00    0.51    8.51    0.00   83.20

Device:            tps   Blk_read/s   Blk_wrtn/s     Blk_read     Blk_wrtn
sda             235.50     45168.00         0.00        90336            0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.90    0.00    0.35    4.46    0.00   89.29

Device:            tps   Blk_read/s   Blk_wrtn/s     Blk_read     Blk_wrtn
sda             160.00     14688.00       132.00        29376          264

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.01    0.00    0.51   12.80    0.00   78.67

Device:            tps   Blk_read/s   Blk_wrtn/s     Blk_read     Blk_wrtn
sda             163.50     11324.00       700.00        22648         1400
Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select
On November 16, 2011 02:53:17 PM Tory M Blue wrote:
> We now have about 180mill records in that table. The database size is
> about 580GB and the userstats table, which is the biggest one and the
> one we query the most, is 83GB.
>
> Just a basic query takes 4 minutes. For e.g.:
>
>   select count(distinct uid) from userstats where log_date > '11/7/2011';
>
> Just not sure if this is what to expect, however there are many other
> DB's out there bigger than ours, so I'm curious what can I do?

That query should use an index on log_date if one exists - unless the planner thinks it would need to look at too much of the table.

Also, the normal approach to making large statistics tables more manageable is to partition them by date range.
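On the 8.4/9.x releases discussed in this thread, date-range partitioning means table inheritance plus CHECK constraints (declarative partitioning only arrived in PostgreSQL 10). A minimal sketch, using a simplified, hypothetical userstats layout:

```sql
-- Hypothetical simplified parent table; real schema is not shown in
-- the thread.
create table userstats (uid integer, log_date date);

-- One child per month; constraint_exclusion lets the planner skip
-- children whose CHECK constraint contradicts the WHERE clause.
create table userstats_2011_11 (
  check (log_date >= date '2011-11-01' and log_date < date '2011-12-01')
) inherits (userstats);

create index userstats_2011_11_log_date_idx
  on userstats_2011_11 (log_date);

-- With constraint_exclusion = partition (the default in 8.4+),
-- a query like
--   select count(distinct uid) from userstats
--   where log_date > '2011-11-07';
-- scans only the November child, not every partition.
```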
Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select
On Wed, Nov 16, 2011 at 4:27 PM, Alan Hodgson <ahodg...@simkin.ca> wrote:
> On November 16, 2011 02:53:17 PM Tory M Blue wrote:
>> We now have about 180mill records in that table. The database size is
>> about 580GB and the userstats table, which is the biggest one and the
>> one we query the most, is 83GB.
>>
>> Just a basic query takes 4 minutes. For e.g.:
>>
>>   select count(distinct uid) from userstats where log_date > '11/7/2011';
>>
>> Just not sure if this is what to expect, however there are many other
>> DB's out there bigger than ours, so I'm curious what can I do?
>
> That query should use an index on log_date if one exists - unless the
> planner thinks it would need to look at too much of the table.

Agreed. We'd need to know how selective that where clause is. Seeing some forced index usage versus a regular explain analyze would be useful, i.e.:

  set enable_seqscan = off;
  explain analyze select ...

> Also, the normal approach to making large statistics tables more
> manageable is to partition them by date range.

If the OP's considering partitioning, they should really consider upgrading to 9.1, which has much better performance for things like aggregates against partitioned tables.
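Spelled out for the query in this thread (table and column names taken from the original post), the comparison Scott suggests would look something like:

```sql
-- 1. The planner's own choice:
explain analyze
  select count(distinct uid) from userstats where log_date > '11/7/2011';

-- 2. Discourage sequential scans for this session only; enable_seqscan
--    does not forbid them outright, it just makes them look expensive,
--    so an index plan wins if one is possible at all.
set enable_seqscan = off;
explain analyze
  select count(distinct uid) from userstats where log_date > '11/7/2011';
reset enable_seqscan;
```

Comparing the two plans' actual times shows whether the planner's seq-scan estimate was right or whether the index plan would in fact be faster.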
Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select
Hi.

On 16 Listopad 2011, 23:53, Tory M Blue wrote:
> We now have about 180mill records in that table. The database size is
> about 580GB and the userstats table, which is the biggest one and the
> one we query the most, is 83GB.
>
> Just a basic query takes 4 minutes. For e.g.:
>
>   select count(distinct uid) from userstats where log_date > '11/7/2011';
>
> Since we are looking for distinct values we obviously can't use an
> index. But I'm wondering what should be expected and what is caused by
> tuning or lack thereof. Doing an iostat I see maybe 10-15%, however
> the cpu that this query is attached to is obviously in the 99-100%
> busy arena. Or am I really IO-bound for this single query (sure, lots
> of data, but?!)?

What do you mean by "can't use an index"? The query may use an index to evaluate the WHERE condition, no matter if there's a distinct or not. The index-only scans that might be used to speed up this query further were committed in 9.2 - but even without them an index may be used both for a plain count and a count distinct.

But you're right that you're not bound by I/O (although I don't know what those 15% are - iowait, util or what?). The COUNT(DISTINCT) has to actually keep all the distinct values to determine which are actually distinct.

If there's enough memory (work_mem) to keep all the values, this may be done using a hash table (hash aggregate). Otherwise it has to sort them. You can see which in the explain plan (which you haven't posted).

Anyway, this is actually rather CPU intensive - how much exactly depends on the data type. Comparing integers is much easier / cheaper than comparing text values. What data type is the 'uid' column?

> It takes roughly 5.5 hours to do a concurrent re-index, and this DB is
> vac'd nightly.
>
> Just not sure if this is what to expect, however there are many other
> DB's out there bigger than ours, so I'm curious what can I do?

Well, not much. Use an integer data type for the 'uid' column (unless you're already using one).
Then you can use more work_mem so that a hash aggregate is used (maybe it's already used - we need to see the explain plan to check).

Then you could precompute the distinct values somehow - for example, if there are only a few distinct values for each day, you could do something like this every day:

  INSERT INTO userstats_distinct
  SELECT DISTINCT date_trunc('day', log_date), uid
  FROM userstats
  WHERE log_date BETWEEN date_trunc('day', log_date) - interval '1 day'
                     AND date_trunc('day', log_date);

and then just:

  SELECT COUNT(DISTINCT uid) FROM userstats_distinct
  WHERE log_date > '11/7/2011';

The point is to preaggregate the data to the desired granularity (e.g. day); how much it improves the performance depends on how much the amount of data decreases.

Another option is to use estimates instead of exact results - I've actually written an extension for that, maybe you'll find it useful. It's available on github (https://github.com/tvondra/distinct_estimators) and pgxn (http://pgxn.org/tag/estimate/). I've posted a brief description here:

http://www.fuzzy.cz/en/articles/aggregate-functions-for-distinct-estimation/

and the current extension actually performs much better. It's not that difficult to reach 1% precision. Let me know if this is interesting for you and if you need help with the extensions.

Tomas
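A note on the daily INSERT above: as it reads in the archive, its WHERE clause compares log_date against functions of itself, which is always true and so selects the whole table - something was presumably lost in transit. A sketch of the likely intent (folding in only the previous day's rows; the cutoff expression is an assumption, not from the original message):

```sql
-- Presumed intent: each day, append the distinct (day, uid) pairs
-- seen yesterday. userstats_distinct is the summary table named in
-- the message above.
INSERT INTO userstats_distinct
SELECT DISTINCT date_trunc('day', log_date), uid
FROM userstats
WHERE log_date >= date_trunc('day', now()) - interval '1 day'
  AND log_date <  date_trunc('day', now());
```

Storing the distinct pairs per day (rather than per-day counts) matters: per-day distinct counts cannot be combined into a distinct count across a date range, but the pairs can.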
Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select
Thanks all. I misspoke on our use of the index. We do have an index on log_date and it is being used. Here is the explain analyze plan:

  Aggregate  (cost=7266186.16..7266186.17 rows=1 width=8) (actual time=127575.030..127575.030 rows=1 loops=1)
    ->  Bitmap Heap Scan on userstats  (cost=135183.17..7240890.38 rows=10118312 width=8) (actual time=8986.425..74815.790 rows=33084417 loops=1)
          Recheck Cond: (log_date > '2011-11-04'::date)
          ->  Bitmap Index Scan on idx_userstats_logdate  (cost=0.00..132653.59 rows=10118312 width=0) (actual time=8404.147..8404.147 rows=33084417 loops=1)
                Index Cond: (log_date > '2011-11-04'::date)
  Total runtime: 127583.898 ms

Partitioning tables: this is used primarily when you are usually accessing only a part of the data. We want our queries to go across the entire date range, so we don't really meet the criteria for partitioning (had to do some quick research).

Thanks again
Tory
Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select
On Wed, Nov 16, 2011 at 4:59 PM, Tomas Vondra <t...@fuzzy.cz> wrote:
> But you're right that you're not bound by I/O (although I don't know
> what those 15% are - iowait, util or what?). The COUNT(DISTINCT) has
> to actually keep all the distinct values to determine which are
> actually distinct.

Actually I meant to comment on this: he is IO bound. Look at %util - it's at 99 or 100. Also, if you have 16 cores and look at something like vmstat, you'll see a 6% wait state. That 6% represents one CPU core waiting for IO; the other cores will add up the rest to 100%.
Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select
On Wed, Nov 16, 2011 at 4:52 PM, Samuel Gendler <sgend...@ideasculptor.com> wrote:
> Could you elaborate on this a bit, or point me at some docs? I manage
> a 600GB db which does almost nothing but aggregates on partitioned
> tables - the largest of which has approx 600 million rows across all
> partitions. Grouping in the aggregates tends to be on the partition
> column and rarely, if ever, would a group cross multiple partitions.
> We're on 9.0 and could definitely use some performance gains.

It's covered in the release notes for 9.1:

http://developer.postgresql.org/pgdocs/postgres/release-9-1.html
Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select
On 11/16/2011 04:53 PM, Tory M Blue wrote:
> Linux F12 64bit
> Postgres 8.4.4
> 16 proc / 32GB
> 8 disk 15KRPM SAS/Raid 5 (I know!)
>
> shared_buffers = 6000MB
> #temp_buffers = 8MB
> max_prepared_transactions = 0
> work_mem = 250MB
> maintenance_work_mem = 1000MB
>
> We now have about 180mill records in that table. The database size is
> about 580GB and the userstats table, which is the biggest one and the
> one we query the most, is 83GB.
>
> Just a basic query takes 4 minutes. For e.g.:
>
>   select count(distinct uid) from userstats where log_date > '11/7/2011';

How would you feel about keeping a monthly summary table? Update it daily, with only a day's worth of stats, and then you could query the summary table much faster.

That's what I do for my website stats. I log details for a month, then summarize everything into a summary table and blow away the details. You wouldn't have to delete the details if you wanted them; just keeping the summary table updated would be enough.

-Andy
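The summary-table approach Andy describes can be sketched as follows. The table name, layout, and scheduling are illustrative assumptions, not from the original post; the key design point is storing one row per (day, uid) pair, so a distinct count over any date range can still be answered from the summary:

```sql
-- Hypothetical summary table: one row per user per day.
CREATE TABLE userstats_daily (
  day date    NOT NULL,
  uid integer NOT NULL,
  PRIMARY KEY (day, uid)
);

-- Run once a day (e.g. from cron) to fold in yesterday's details:
INSERT INTO userstats_daily (day, uid)
SELECT DISTINCT log_date, uid
FROM userstats
WHERE log_date = current_date - 1;

-- The original four-minute query becomes a scan of a far smaller table:
SELECT count(DISTINCT uid)
FROM userstats_daily
WHERE day > date '2011-11-07';
```

The details in userstats can then be kept or pruned independently, as Andy notes.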
Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select
On 17 Listopad 2011, 2:57, Scott Marlowe wrote:
> On Wed, Nov 16, 2011 at 4:59 PM, Tomas Vondra <t...@fuzzy.cz> wrote:
>> But you're right that you're not bound by I/O (although I don't know
>> what those 15% are - iowait, util or what?). The COUNT(DISTINCT) has
>> to actually keep all the distinct values to determine which are
>> actually distinct.
>
> Actually I meant to comment on this: he is IO bound. Look at %util -
> it's at 99 or 100. Also, if you have 16 cores and look at something
> like vmstat, you'll see a 6% wait state. That 6% represents one CPU
> core waiting for IO; the other cores will add up the rest to 100%.

Aaaah, I keep forgetting about this, and I somehow ignored the iostat results too. Yes, he's obviously IO bound. But this actually means that pre-aggregating the data (as I described in my previous post) would probably help him even more (less data, less CPU).

Tomas
Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select
On Wed, Nov 16, 2011 at 6:27 PM, Tomas Vondra <t...@fuzzy.cz> wrote:
> On 17 Listopad 2011, 2:57, Scott Marlowe wrote:
>> Actually I meant to comment on this: he is IO bound. Look at %util -
>> it's at 99 or 100. Also, if you have 16 cores and look at something
>> like vmstat, you'll see a 6% wait state. That 6% represents one CPU
>> core waiting for IO; the other cores will add up the rest to 100%.
>
> Aaaah, I keep forgetting about this, and I somehow ignored the iostat
> results too. Yes, he's obviously IO bound.

I'm not so sure on the IO-bound. Been battling/reading about it all day. One CPU is pegged at 100%, but the disk is not. If I do something else via another CPU I have no issues accessing the disks - writing/deleting/reading.

It appears that what was said about this being very CPU intensive makes more sense to me. The query is only using 1 CPU, and that appears to be getting overwhelmed.

%util: this number depicts the percentage of time that the device spent servicing requests. On a large query, or something that is taking a while, it's going to be writing to disk all the time, and I'm thinking that is what the util is telling me, especially since iowait is in the 10-15% range. Again, just trying to absorb:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.93    0.00    0.60    9.84    0.00   88.62

Device:  rrqm/s  wrqm/s      r/s   w/s    rsec/s  wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00   86.50  3453.00  1.50  55352.00   16.00    16.03     5.24   0.66   0.29 100.00

I mean, await time and service time are in the 0.29 to 0.66 msec range - that doesn't read as IO-bound to me. But I'm more than willing to learn something not totally Postgres specific. I just don't see it...
Average queue size of 2.21 to 6 - that's really not a ton of stuff waiting.

Device:  rrqm/s  wrqm/s      r/s   w/s    rsec/s  wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    3.50  3060.00  2.00  49224.00   20.00    16.08     2.21   0.76   0.33  99.95

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.80    0.00    0.51   11.01    0.00   87.68

Device:  rrqm/s  wrqm/s      r/s   w/s    rsec/s  wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    5.00  3012.50  3.00  48200.00   92.00    16.01     2.11   0.74   0.33  99.95

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.93    0.00    0.60    9.84    0.00   88.62

Device:  rrqm/s  wrqm/s      r/s   w/s    rsec/s  wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00   86.50  3453.00  1.50  55352.00   16.00    16.03     5.24   0.66   0.29 100.00
Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select
On Wed, Nov 16, 2011 at 7:42 PM, Tory M Blue <tmb...@gmail.com> wrote:
> On Wed, Nov 16, 2011 at 6:27 PM, Tomas Vondra <t...@fuzzy.cz> wrote:
>> Aaaah, I keep forgetting about this, and I somehow ignored the iostat
>> results too. Yes, he's obviously IO bound.
>
> I'm not so sure on the IO-bound. Been battling/reading about it all
> day. One CPU is pegged at 100%, but the disk is not. If I do something

Look here in iostat:

Device:  rrqm/s  wrqm/s      r/s   w/s    rsec/s  wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    3.50  3060.00  2.00  49224.00   20.00    16.08     2.21   0.76   0.33  99.95

See that last column? It's % utilization. Once it hits 100% you are anywhere from pretty close to IO bound to right on past it.

I agree with the previous poster: you should roll these up ahead of time into a materialized view for fast reporting.
Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select
On Wed, Nov 16, 2011 at 8:02 PM, Scott Marlowe scott.marl...@gmail.com wrote:
> On Wed, Nov 16, 2011 at 7:42 PM, Tory M Blue tmb...@gmail.com wrote:
>> On Wed, Nov 16, 2011 at 6:27 PM, Tomas Vondra t...@fuzzy.cz wrote:
>>> On 17 Listopad 2011, 2:57, Scott Marlowe wrote:
>>>> On Wed, Nov 16, 2011 at 4:59 PM, Tomas Vondra t...@fuzzy.cz wrote:
>>>>> But you're right - you're not bound by I/O (although I don't know what are those 15% - iowait, util or what?). The COUNT(DISTINCT) has to actually keep all the distinct values to determine which are actually distinct.
>>>> Actually I meant to comment on this, he is IO bound. Look at % Util, it's at 99 or 100. Also, if you have 16 cores and look at something like vmstat you'll see 6% wait state. That 6% represents one CPU core waiting for IO, the other cores will add up the rest to 100%.
>>> Aaaah, I keep forgetting about this and I somehow ignored the iostat results too. Yes, he's obviously IO bound.
>> I'm not so sure on the io-bound. Been battling/reading about it all day. 1 CPU is pegged at 100%, but the disk is not. If I do something
>
> Look here in iostat:
>
> Device:  rrqm/s  wrqm/s      r/s   w/s    rsec/s  wsec/s avgrq-sz avgqu-sz await svctm  %util
> sda        0.00    3.50  3060.00  2.00  49224.00   20.00    16.08     2.21  0.76  0.33  99.95
>
> See that last column, it's % utilization. Once it hits 100% you are anywhere from pretty close to IO bound to right on past it. I agree with the previous poster, you should roll these up ahead of time into a materialized view for fast reporting.

A followup. A good tool to see how your machine is running over time is the sar command and the needed sysstat service running and collecting data. You can get summary views of the last x weeks rolled up in 5 minute increments on all kinds of system metrics.
Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select
On Wed, Nov 16, 2011 at 7:02 PM, Scott Marlowe scott.marl...@gmail.com wrote:
> On Wed, Nov 16, 2011 at 7:42 PM, Tory M Blue tmb...@gmail.com wrote:
>> On Wed, Nov 16, 2011 at 6:27 PM, Tomas Vondra t...@fuzzy.cz wrote:
>>> On 17 Listopad 2011, 2:57, Scott Marlowe wrote:
>>>> On Wed, Nov 16, 2011 at 4:59 PM, Tomas Vondra t...@fuzzy.cz wrote:
>>>>> But you're right - you're not bound by I/O (although I don't know what are those 15% - iowait, util or what?). The COUNT(DISTINCT) has to actually keep all the distinct values to determine which are actually distinct.
>>>> Actually I meant to comment on this, he is IO bound. Look at % Util, it's at 99 or 100. Also, if you have 16 cores and look at something like vmstat you'll see 6% wait state. That 6% represents one CPU core waiting for IO, the other cores will add up the rest to 100%.
>>> Aaaah, I keep forgetting about this and I somehow ignored the iostat results too. Yes, he's obviously IO bound.
>> I'm not so sure on the io-bound. Been battling/reading about it all day. 1 CPU is pegged at 100%, but the disk is not. If I do something
>
> Look here in iostat:
>
> Device:  rrqm/s  wrqm/s      r/s   w/s    rsec/s  wsec/s avgrq-sz avgqu-sz await svctm  %util
> sda        0.00    3.50  3060.00  2.00  49224.00   20.00    16.08     2.21  0.76  0.33  99.95
>
> See that last column, it's % utilization. Once it hits 100% you are anywhere from pretty close to IO bound to right on past it. I agree with the previous poster, you should roll these up ahead of time into a materialized view for fast reporting.

Ya I'm getting mixed opinions on that. avg queue size is nothing and await and svctime is nothing, so maybe I'm on the edge, but it's not, at face value, the cause of the slow query times. I think the data structure is, however as it seems I need to query against all the data, I'm unclear how to best set that up. Partitioning is not the answer it seems.
Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select
On 17 Listopad 2011, 4:16, Tory M Blue wrote:
> On Wed, Nov 16, 2011 at 7:02 PM, Scott Marlowe scott.marl...@gmail.com wrote:
>> On Wed, Nov 16, 2011 at 7:42 PM, Tory M Blue tmb...@gmail.com wrote:
>>> On Wed, Nov 16, 2011 at 6:27 PM, Tomas Vondra t...@fuzzy.cz wrote:
>>>> On 17 Listopad 2011, 2:57, Scott Marlowe wrote:
>>>>> On Wed, Nov 16, 2011 at 4:59 PM, Tomas Vondra t...@fuzzy.cz wrote:
>>>>>> But you're right - you're not bound by I/O (although I don't know what are those 15% - iowait, util or what?). The COUNT(DISTINCT) has to actually keep all the distinct values to determine which are actually distinct.
>>>>> Actually I meant to comment on this, he is IO bound. Look at % Util, it's at 99 or 100. Also, if you have 16 cores and look at something like vmstat you'll see 6% wait state. That 6% represents one CPU core waiting for IO, the other cores will add up the rest to 100%.
>>>> Aaaah, I keep forgetting about this and I somehow ignored the iostat results too. Yes, he's obviously IO bound.
>>> I'm not so sure on the io-bound. Been battling/reading about it all day. 1 CPU is pegged at 100%, but the disk is not. If I do something
>>
>> Look here in iostat:
>>
>> Device:  rrqm/s  wrqm/s      r/s   w/s    rsec/s  wsec/s avgrq-sz avgqu-sz await svctm  %util
>> sda        0.00    3.50  3060.00  2.00  49224.00   20.00    16.08     2.21  0.76  0.33  99.95
>>
>> See that last column, it's % utilization. Once it hits 100% you are anywhere from pretty close to IO bound to right on past it. I agree with the previous poster, you should roll these up ahead of time into a materialized view for fast reporting.
> Ya I'm getting mixed opinions on that. avg queue size is nothing and await and svctime is nothing, so maybe I'm on the edge, but it's not

What do you mean by nothing? There are 3060 reads/s, servicing each one takes 0.33 ms - that means the drive is 100% utilized.

The problem with the iostat results you've posted earlier is that they either use -xd or none of those switches. That means you can't see CPU stats and extended I/O stats at the same time - use just -x next time. Anyway the results show that %iowait is about 6% - as Scott Marlowe pointed out, this means 1 core is waiting for I/O. That's the core running your query. Try to execute the query 16x and you'll see the iowait is 100%.

> at face value, the cause of the slow query times. I think the data structure is, however as it seems I need to query against all the data, I'm unclear how to best set that up. Partitioning is not the answer it seems.

I'm not sure I understand what you mean by accessing all the data. You can do that with partitioning too, although the execution plan may not be as efficient as with a plain table. Try to partition the data by date (a partition for each day / week) - my impression is that you're querying data by date, so this is a natural partitioning.

Anyway, what I've recommended in my previous post was intelligent reduction of the data - imagine for example there are 1000 unique visitors and each of them does 1000 actions per day. That means 1,000,000 rows. What you can do is aggregate the data by user (at the end of the day, thus processing just the single day), i.e. something like this:

SELECT uid, count(*) FROM users WHERE log_date ... GROUP BY uid

and store this in a table users_aggregated. This table has just 1000 rows (one for each user), so it's 1000x smaller. But you can do this:

SELECT COUNT(DISTINCT uid) FROM users_aggregated

and you'll get exactly the correct result.

Tomas
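The roll-up idea above can be sketched end to end with an in-memory database. This is a hypothetical illustration using SQLite and made-up data (1000 users × 5 actions each), not the poster's schema; the point is that COUNT(DISTINCT uid) over the small users_aggregated table matches the count over the raw table exactly:

```python
import sqlite3

# Raw event table: one row per user action, many rows per user.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (uid INTEGER, log_date TEXT)")
con.executemany("INSERT INTO users VALUES (?, ?)",
                [(u, "2011-11-16") for u in range(1000) for _ in range(5)])

# End-of-day aggregation: one row per user instead of one per action.
con.execute("""CREATE TABLE users_aggregated AS
               SELECT uid, COUNT(*) AS actions
               FROM users WHERE log_date = '2011-11-16'
               GROUP BY uid""")

# The distinct-user count is preserved in the much smaller roll-up.
full = con.execute("SELECT COUNT(DISTINCT uid) FROM users").fetchone()[0]
rolled = con.execute("SELECT COUNT(DISTINCT uid) FROM users_aggregated").fetchone()[0]
print(full, rolled)  # both 1000
```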
Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select
Tory,

A seq scan across 83GB in 4 minutes is pretty good. That's over 300MB/s. Even if you assume that 1/3 of the table was already cached, that's still over 240MB/s. Good disk array.

Either you need an index, or you need to not do this query at user request time. Or a LOT more RAM.

-- Josh Berkus, PostgreSQL Experts Inc., http://pgexperts.com
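The back-of-envelope throughput above is easy to verify; assuming exactly 83 GB and a flat 4 minutes (the cached-fraction split is approximate):

```python
# Sequential-scan throughput implied by the figures in the post above.
table_gb = 83
seconds = 4 * 60

mb_per_sec = table_gb * 1024 / seconds   # ~354 MB/s if nothing was cached
from_disk = mb_per_sec * 2 / 3           # ~236 MB/s if 1/3 was already cached
print(round(mb_per_sec), round(from_disk))
```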
Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select
On Wed, Nov 16, 2011 at 7:47 PM, Tomas Vondra t...@fuzzy.cz wrote:
> On 17 Listopad 2011, 4:16, Tory M Blue wrote:
>> On Wed, Nov 16, 2011 at 7:02 PM, Scott Marlowe scott.marl...@gmail.com wrote:
>>> On Wed, Nov 16, 2011 at 7:42 PM, Tory M Blue tmb...@gmail.com wrote:
>>>> On Wed, Nov 16, 2011 at 6:27 PM, Tomas Vondra t...@fuzzy.cz wrote:
>>>>> On 17 Listopad 2011, 2:57, Scott Marlowe wrote:
>>>>>> On Wed, Nov 16, 2011 at 4:59 PM, Tomas Vondra t...@fuzzy.cz wrote:
>>>>>>> But you're right - you're not bound by I/O (although I don't know what are those 15% - iowait, util or what?). The COUNT(DISTINCT) has to actually keep all the distinct values to determine which are actually distinct.
>>>>>> Actually I meant to comment on this, he is IO bound. Look at % Util, it's at 99 or 100. Also, if you have 16 cores and look at something like vmstat you'll see 6% wait state. That 6% represents one CPU core waiting for IO, the other cores will add up the rest to 100%.
>>>>> Aaaah, I keep forgetting about this and I somehow ignored the iostat results too. Yes, he's obviously IO bound.
>>>> I'm not so sure on the io-bound. Been battling/reading about it all day. 1 CPU is pegged at 100%, but the disk is not. If I do something
>>>
>>> Look here in iostat:
>>>
>>> Device:  rrqm/s  wrqm/s      r/s   w/s    rsec/s  wsec/s avgrq-sz avgqu-sz await svctm  %util
>>> sda        0.00    3.50  3060.00  2.00  49224.00   20.00    16.08     2.21  0.76  0.33  99.95
>>>
>>> See that last column, it's % utilization. Once it hits 100% you are anywhere from pretty close to IO bound to right on past it. I agree with the previous poster, you should roll these up ahead of time into a materialized view for fast reporting.
>> Ya I'm getting mixed opinions on that. avg queue size is nothing and await and svctime is nothing, so maybe I'm on the edge, but it's not
>
> What do you mean by nothing? There are 3060 reads/s, servicing each one takes 0.33 ms - that means the drive is 100% utilized.
>
> The problem with the iostat results you've posted earlier is that they either use -xd or none of those switches. That means you can't see CPU stats and extended I/O stats at the same time - use just -x next time. Anyway the results show that %iowait is about 6% - as Scott Marlowe pointed out, this means 1 core is waiting for I/O. That's the core running your query. Try to execute the query 16x and you'll see the iowait is 100%.

Yes this I understand and is correct. But I'm wrestling with the idea that the disk is completely saturated. I've seen where I actually run into high IO/wait and see that load climbs as processes stack.

I'm not arguing (please know this), I appreciate the help and will try almost anything that is offered here, but I think if I just threw money at the situation (hardware), I wouldn't get any closer to resolution of my issue. I am very interested in other solutions and more DB structure changes etc.

Thanks!

Tory
Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select
On Wed, Nov 16, 2011 at 9:19 PM, Josh Berkus j...@agliodbs.com wrote:
> Tory,
>
> A seq scan across 83GB in 4 minutes is pretty good. That's over 300MB/s. Even if you assume that 1/3 of the table was already cached, that's still over 240MB/s. Good disk array.
>
> Either you need an index, or you need to not do this query at user request time. Or a LOT more RAM.

Thanks Josh,

That's also the other scenario: what is expected. Maybe the 4 minutes, which turns into 5.5 hours or 23 hours for a report, is just standard based on our data and sizing. Then it's about stopping the chase and starting to look at tuning or redesign, if possible, to allow reports to finish in a timely fashion. The data is going to grow a tad still, but reporting requirements are on the rise.

You folks are the right place to seek answers from; I just need to make sure I'm giving you the information that will allow you to assist/help me.

Memory is not expensive these days, so it's possible that I bump the server to 192GB or whatever to give me the headroom, but we are trying to dig a tad deeper into the data/queries/tuning before I go the hardware route again.

Tory