Re: [PERFORM] Subquery in a JOIN not getting restricted?

2011-11-16 Thread Jay Levitt

Tom Lane wrote:

Jay Levitt <jay.lev...@gmail.com> writes:

If the query was more like



select questions.id
from questions
join (
select u.id, sum(u.id)
from users as u
group by u.id
) as s
on s.id = questions.user_id
where questions.id = 1;



would you no longer be surprised that it scanned all user rows?


I'd suggest rephrasing the query to do the join underneath the GROUP BY.
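
For illustration, one reading of that suggestion - a sketch only, not
necessarily the exact rephrasing Tom had in mind - is to join first, so the
questions.id restriction applies before the aggregate runs:

select questions.id, sum(u.id)
from questions
join users as u on u.id = questions.user_id
where questions.id = 1
group by questions.id;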


Well, my real goal is to have that inner query in a set-returning function 
that gives a computed table of other users relative to the current user, and 
then be able to JOIN that with other things and ORDER BY it:


select questions.id
from questions
join (select * from relevance(current_user)) as r
on r.id = questions.user_id
where questions.id = 1;

I assume there's no way for that function (in SQL or PL/pgSQL) to reach up to 
the upper node and say "do that join again here", or to force the join order 
from down below? I can't imagine how there could be, but it never hurts to ask. 
Right now, our workaround is to pass the joined target user as a function 
parameter and do the JOIN in the function, but that means we have to put the 
function in the select list, or else we hit the lack of LATERAL support:


 -- This would need LATERAL

select questions.id
from questions
join (
  select * from relevance(current_user, questions.user_id)
) as r
on r.id = questions.user_id
where questions.id = 1;

 -- This works but has lots of row-at-a-time overhead

select questions.id, (
  select * from relevance(current_user, questions.user_id)
) as r
from questions
where questions.id = 1;
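
For reference, LATERAL - added to PostgreSQL in 9.3, long after this thread -
makes the first workaround legal essentially as written:

select questions.id
from questions
join lateral (
  select * from relevance(current_user, questions.user_id)
) as r
on r.id = questions.user_id
where questions.id = 1;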

Again, just checking if there's a solution I'm missing.  I know the 
optimizer is only asymptotically approaching optimal!


Jay

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow queries / commits, mis-configuration or hardware issues?

2011-11-16 Thread Tomas Vondra
On 16 Listopad 2011, 5:27, Greg Smith wrote:
 On 11/14/2011 01:16 PM, Cody Caughlan wrote:
 We're starting to see some slow queries, especially COMMITs that are
 happening more frequently. The slow queries are against seemingly
 well-indexed tables.
 Slow commits like:

 2011-11-14 17:47:11 UTC pid:14366 (44/0-0) LOG:  duration: 3062.784 ms
   statement: COMMIT
 2011-11-14 17:47:11 UTC pid:14604 (48/0-0) LOG:  duration: 2593.351 ms
   statement: COMMIT

 These slow COMMITs are against tables that received a large number of
 UPDATEs and are growing fairly rapidly.

 Linux will fill its write cache with all of the writes coming out of
 each checkpoint.  With a 16GB instance, I would expect that 5% * 16GB ~=
 800MB of writes are batched up when your system is slow.  You should be
 able to confirm that by looking at the Dirty: line in /proc/meminfo

 With 800MB queued up and I/O that is lucky to get 50MB/s, the sync calls
 at the end of each checkpoint are sometimes blocking for multiple seconds:

 2011-11-14 17:38:48 UTC pid:3965 (-0) LOG:  checkpoint complete: wrote
 15121 buffers (3.3%); 0 transaction log file(s) added, 0 removed, 8
 recycled; write=270.101 s, sync=2.989 s, total=273.112 s; sync
 files=60, longest=1.484 s, average=0.049 s
 2011-11-14 17:39:15 UTC pid:3965 (-0) LOG:  checkpoint starting: time
 2011-11-14 17:43:49 UTC pid:3965 (-0) LOG:  checkpoint complete: wrote
 16462 buffers (3.6%); 0 transaction log file(s) added, 0 removed, 9
 recycled; write=269.978 s, sync=4.106 s, total=274.117 s; sync
 files=82, longest=2.943 s, average=0.050 s


 When an individual sync call gets stuck for that long, clients can
 easily get stuck behind it too.  There are a couple of techniques that
 might help:

The sync times I see there seem quite acceptable - 4.2s is not perfect, but I
wouldn't rate it as terrible. What actually annoys me is the amount of data
written - it's just 17000 pages, i.e. about 130 MB per checkpoint (spread over
5 minutes). So that's only about 0.5 MB/s.

 -Switch filesystems if you're running a slow one.  ext3 has bad latency
 behavior here; XFS and ext4 are better.

He's using xfs, IIRC. That's one of the better behaving ones, when it
comes to sync.

 -Lower the dirty_* tunables like dirty_background_ratio or its bytes
 version.  This will reduce average throughput, but can lower latency.
 -Spread checkpoints out more so that fewer writes are happening on average.
 -Decrease shared_buffers so less data is getting pushed out at
 checkpoint time.
 -Reduce your reliability expectations and turn off synchronous_commit.

The question here probably is whether those high latencies are caused or
significantly influenced by the checkpoint, or are a feature of the
storage. Because if it's a feature, then all this is a futile attempt to
fix it.

I don't think he has problems with checkpoints - he's complaining about
regular queries being slow (even plain SELECTs, i.e. something that usually
does not require a sync).

No doubt this may be connected, but a regular SELECT usually does not
perform a sync, right? It may need to fetch some data, and if the I/O is
saturated by a checkpoint, that may take time. But again - those bonnie
results were collected on a running system, i.e. with checkpoints in
progress and all of that.

And I'd expect most of the SELECT queries to be handled without actually
touching the devices, but by connecting
https://gist.github.com/5ac1ae7d11facd72913f and
https://gist.github.com/5ac1ae7d11facd72913f it seems that the largest
table (users) is almost completely in shared buffers, while the two other
large tables (external_user and facebook_friends) are cached by about 30%.
And I'd expect the rest of those tables to be in the page cache, so SELECT
queries on those tables should be fast.

A commit obviously requires a sync on the WAL - I wonder if moving the WAL
would improve the performance here.

This is obviously based on an incomplete set of stats, and maybe I'm
missing something.

 Your server is sometimes showing multi-second latency issues with
 bonnie++ too; that suggests this problem is not even specific to
 PostgreSQL.  Linux is hard to tune for low latency under all

Don't forget those data were collected on a production system, i.e. it was
actually under load. That probably skews the results a lot.

 circumstances; fighting latency down under a heavy update workload is
 hard to do even with good hardware to accelerate write performance.  In
 an EC2 environment, it may not even be possible to do without making
 trade-offs like disabling synchronous writes.  I can easily get
 transactions hung for 10 to 15 seconds on one of their servers if I try
 to make that problem bad; you're only seeing the middle range of latency
 issues so far.

Are you talking about EBS or ephemeral storage? Because all this is about
ephemeral storage (something like virtualized local storage).

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Slow queries / commits, mis-configuration or hardware issues?

2011-11-16 Thread Tomas Vondra
On 16 Listopad 2011, 2:21, Cody Caughlan wrote:
 How did you build your RAID array? Maybe I have a fundamental flaw /
 misconfiguration. I am doing it via:

 $ yes | mdadm --create /dev/md0 --level=10 -c256 --raid-devices=4
 /dev/xvdb /dev/xvdc /dev/xvdd /dev/xvde
 $ pvcreate /dev/md0
 $ vgcreate lvm-raid10 /dev/md0
 $ lvcreate -l 215021 lvm-raid10 -n lvm0
 $ blockdev --setra 65536 /dev/lvm-raid10/lvm0
 $ mkfs.xfs -f /dev/lvm-raid10/lvm0
 $ mkdir -p /data && mount -t xfs -o noatime /dev/lvm-raid10/lvm0 /data

I'm not using EC2 much, and those were my first attempts with ephemeral
storage, so this may be a stupid question, but why are you building a
RAID-10 array on ephemeral storage, anyway?

You already have a standby, so if the primary instance fails you can
easily failover.

What are you going to do in case of a drive failure? With a server this is
rather easy - just put in a new drive and you're done - but can you do
that on EC2? I guess you can't do that when the instance is running, so
you'll have to switch to the standby anyway, right? Have you ever tried
this (how it affects the performance etc.)?

So what additional protection does that give you? Wouldn't a RAID-0 be a
better utilization of the resources?

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow queries / commits, mis-configuration or hardware issues?

2011-11-16 Thread Cody Caughlan

On Nov 16, 2011, at 8:52 AM, Tomas Vondra wrote:

 On 16 Listopad 2011, 2:21, Cody Caughlan wrote:
 How did you build your RAID array? Maybe I have a fundamental flaw /
 misconfiguration. I am doing it via:
 
 $ yes | mdadm --create /dev/md0 --level=10 -c256 --raid-devices=4
 /dev/xvdb /dev/xvdc /dev/xvdd /dev/xvde
 $ pvcreate /dev/md0
 $ vgcreate lvm-raid10 /dev/md0
 $ lvcreate -l 215021 lvm-raid10 -n lvm0
 $ blockdev --setra 65536 /dev/lvm-raid10/lvm0
 $ mkfs.xfs -f /dev/lvm-raid10/lvm0
 $ mkdir -p /data && mount -t xfs -o noatime /dev/lvm-raid10/lvm0 /data
 
 I'm not using EC2 much, and those were my first attempts with ephemeral
 storage, so this may be a stupid question, but why are you building a
 RAID-10 array on an ephemeral storage, anyway?
 
 You already have a standby, so if the primary instance fails you can
 easily failover.
 

Yes, the slave will become master if master goes down. We have no plan to try 
and resurrect the master in the case of failure, hence the choice of ephemeral 
vs EBS. 

We chose RAID10 over RAID0 to get the best combination of performance and 
minimizing probability of a single drive failure bringing down the house.

So, yes, RAID0 would ultimately deliver the best performance, with more risk.

 What are you going to do in case of a drive failure? With a server this is
 rather easy - just put in a new drive and you're done - but can you do
 that on EC2? I guess you can't do that when the instance is running, so
 you'll have to switch to the standby anyway, right? Have you ever tried
 this (how it affects the performance etc.)?
 

As far as I know one cannot alter the ephemeral drives in a running instance, 
so yes, the whole instance would have to be written off.

 So what additional protection does that give you? Wouldn't a RAID-0 be a
 better utilization of the resources?
 

Too much risk.

 Tomas
 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow queries / commits, mis-configuration or hardware issues?

2011-11-16 Thread Tomas Vondra
On 16 Listopad 2011, 18:31, Cody Caughlan wrote:

 On Nov 16, 2011, at 8:52 AM, Tomas Vondra wrote:

 On 16 Listopad 2011, 2:21, Cody Caughlan wrote:
 How did you build your RAID array? Maybe I have a fundamental flaw /
 misconfiguration. I am doing it via:

 $ yes | mdadm --create /dev/md0 --level=10 -c256 --raid-devices=4
 /dev/xvdb /dev/xvdc /dev/xvdd /dev/xvde
 $ pvcreate /dev/md0
 $ vgcreate lvm-raid10 /dev/md0
 $ lvcreate -l 215021 lvm-raid10 -n lvm0
 $ blockdev --setra 65536 /dev/lvm-raid10/lvm0
 $ mkfs.xfs -f /dev/lvm-raid10/lvm0
 $ mkdir -p /data && mount -t xfs -o noatime /dev/lvm-raid10/lvm0 /data

 I'm not using EC2 much, and those were my first attempts with ephemeral
 storage, so this may be a stupid question, but why are you building a
 RAID-10 array on an ephemeral storage, anyway?

 You already have a standby, so if the primary instance fails you can
 easily failover.


 Yes, the slave will become master if master goes down. We have no plan to
 try and resurrect the master in the case of failure, hence the choice of
 ephemeral vs EBS.

 We chose RAID10 over RAID0 to get the best combination of performance and
 minimizing probability of a single drive failure bringing down the house.

 So, yes, RAID0 would ultimately deliver the best performance, with more
 risk.

 What are you going to do in case of a drive failure? With a server this
 is
 rather easy - just put in a new drive and you're done - but can you do
 that on EC2? I guess you can't do that when the instance is running, so
 you'll have to switch to the standby anyway, right? Have you ever tried
 this (how it affects the performance etc.)?


 As far as I know one cannot alter the ephemeral drives in a running
 instance, so yes, the whole instance would have to be written off.

 So what additional protection does that give you? Wouldn't a RAID-0 be a
 better utilization of the resources?


 Too much risk.

Why? If I understand it correctly, the only case where a RAID-10
actually helps is when an ephemeral drive fails but the whole
instance does not. Do you have some numbers on how often this happens,
i.e. how often a drive fails without the instance?

But you can't actually replace the failed drive, so the only option you
have is to fail over to the standby - right? Sure - with async replication,
you could lose the not-yet-sent transactions. I see two possible
solutions:

a) use sync rep, available in 9.1 (you already run 9.1.1)

b) place WAL on an EBS, mounted as part of the failover

EBS volumes are not exactly fast, but it seems (e.g.
http://www.mysqlperformanceblog.com/2009/08/06/ec2ebs-single-and-raid-volumes-io-bencmark/)
that the sequential performance might be acceptable.

According to the stats you've posted, you've written about 5632 MB of WAL
data per hour. That's about 1.5 MB/s on average, and that might be handled
by EBS. Of course, if you have a peak where you need to write much more
data, this is going to be a bottleneck.

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Performance question 83 GB Table 150 million rows, distinct select

2011-11-16 Thread Tory M Blue
Linux F12 64bit
Postgres 8.4.4
16 proc / 32GB
8 disk 15KRPM SAS/Raid 5 (I know!)


shared_buffers = 6000MB 
#temp_buffers = 8MB 
max_prepared_transactions = 0   
work_mem = 250MB
maintenance_work_mem = 1000MB   





We now have about 180mill records in that table. The database size is
about 580GB and the userstats table which is the biggest one and the
one we query the most is 83GB.

Just a basic query takes 4 minutes:

E.g. select count(distinct uid) from userstats where log_date > '11/7/2011'

Since we are looking for distinct values we obviously can't use an index. But
I'm wondering what should be expected, and what is caused by tuning or the
lack thereof? Doing an iostat I see maybe 10-15%, however the CPU that this
query is attached to is obviously in the 99-100% busy arena. Or am I really
I/O-bound for this single query (sure, lots of data, but?!).

It takes roughly 5.5 hours to do a concurrent re-index, and this DB is
vac'd nightly.

Just not sure if this is what to expect, however there are many other
DB's out there bigger than ours, so I'm curious what can I do?

Thanks
Tory

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.41    0.00    0.20    1.61    0.00   96.78

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              92.17      3343.06      1718.85 46273281004 23791660544

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.47    0.00    0.61    5.85    0.00   92.07

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda             417.50     90372.00         0.00     180744          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.88    0.00    0.76    6.34    0.00   90.03

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda             725.00    183560.00       148.00     367120        296

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.18    0.00    0.60    3.59    0.00   93.63

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda             711.00    179952.00       240.00     359904        480

[blue@adb01 ~]$ iostat -xd 2
Linux 2.6.32.26-175.fc12.x86_64 (adb01)  11/16/2011  _x86_64_  (16 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s
avgrq-sz avgqu-sz   await  svctm  %util
sda               0.18   191.40   68.71   23.45  3343.22  1718.85
   54.92     0.12    4.61   2.05  18.94

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s
avgrq-sz avgqu-sz   await  svctm  %util
sda               2.00     0.00  706.50    8.00 178832.00   128.00
  250.47    77.76   31.21   1.40 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s
avgrq-sz avgqu-sz   await  svctm  %util
sda               4.98    17.41  584.58   35.32 148497.51   672.64
  240.64    38.04  227.07   1.61  99.55

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s
avgrq-sz avgqu-sz   await  svctm  %util
sda               3.50     0.00  688.50    2.00 174556.00    32.00
  252.84     2.81    4.66   1.44  99.30

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s
avgrq-sz avgqu-sz   await  svctm  %util
sda               1.00    10.00  717.50    1.50 182084.00    92.00
  253.37     2.43    3.37   1.38  99.45

^C
[blue@]$ iostat 2
Linux 2.6.32.26-175.fc12.x86_64 (adb01)  11/16/2011  _x86_64_  (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.41    0.00    0.20    1.61    0.00   96.78

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              92.17      3343.33      1718.85 46277115652 23791678248

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.79    0.00    0.51    8.51    0.00   83.20

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda             235.50     45168.00         0.00      90336          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.90    0.00    0.35    4.46    0.00   89.29

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda             160.00     14688.00       132.00      29376        264

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.01    0.00    0.51   12.80    0.00   78.67

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda             163.50     11324.00       700.00      22648       1400

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select

2011-11-16 Thread Alan Hodgson
On November 16, 2011 02:53:17 PM Tory M Blue wrote:
 We now have about 180mill records in that table. The database size is
 about 580GB and the userstats table which is the biggest one and the
 one we query the most is 83GB.
 
 Just a basic query takes 4 minutes:
 
 E.g. select count(distinct uid) from userstats where log_date >
 '11/7/2011'

 Just not sure if this is what to expect, however there are many other
 DB's out there bigger than ours, so I'm curious what can I do?

That query should use an index on log_date if one exists. Unless the planner 
thinks it would need to look at too much of the table.

Also, the normal approach to making large statistics tables more manageable is 
to partition them by date range.
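
A minimal sketch of that approach using inheritance plus CHECK constraints
(the table and column names come from this thread; everything else is
assumed):

create table userstats_2011_11 (
  check (log_date >= date '2011-11-01'
     and log_date <  date '2011-12-01')
) inherits (userstats);

create index userstats_2011_11_log_date_idx
  on userstats_2011_11 (log_date);

-- with constraint_exclusion enabled, a query qualified by log_date only
-- scans the child tables whose CHECK constraint can possibly match

With one child table per month, a query restricted by log_date touches only
the relevant children.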

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select

2011-11-16 Thread Scott Marlowe
On Wed, Nov 16, 2011 at 4:27 PM, Alan Hodgson ahodg...@simkin.ca wrote:
 On November 16, 2011 02:53:17 PM Tory M Blue wrote:
 We now have about 180mill records in that table. The database size is
 about 580GB and the userstats table which is the biggest one and the
 one we query the most is 83GB.

 Just a basic query takes 4 minutes:

 E.g. select count(distinct uid) from userstats where log_date >
 '11/7/2011'

 Just not sure if this is what to expect, however there are many other
 DB's out there bigger than ours, so I'm curious what can I do?

 That query should use an index on log_date if one exists. Unless the planner
 thinks it would need to look at too much of the table.

Agreed.  We'd need to know how selective that where clause is.  Seeing
some forced index usage versus regular explain analyze would be
useful.  i.e.

set enable_seqscan=off;
explain analyze select ...
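
With the query from upthread, that test would be something like:

set enable_seqscan = off;
explain analyze
select count(distinct uid) from userstats
where log_date > '2011-11-07';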

 Also, the normal approach to making large statistics tables more manageable is
 to partition them by date range.

If the OP's considering partitioning, they should really consider
upgrading to 9.1, which has much better performance for things like
aggregates against partitioned tables.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select

2011-11-16 Thread Tomas Vondra
Hi.

On 16 Listopad 2011, 23:53, Tory M Blue wrote:

 We now have about 180mill records in that table. The database size is
 about 580GB and the userstats table which is the biggest one and the
 one we query the most is 83GB.

 Just a basic query takes 4 minutes:

 E.g. select count(distinct uid) from userstats where log_date >
 '11/7/2011'

 Since we are looking for distinct values we obviously can't use an index. But
 I'm wondering what should be expected, and what is caused by tuning or the
 lack thereof? Doing an iostat I see maybe 10-15%, however the CPU that this
 query is attached to is obviously in the 99-100% busy arena. Or am I really
 I/O-bound for this single query (sure, lots of data, but?!).

What do you mean by "can't use an index"? The query may use an index to
evaluate the WHERE condition, no matter whether there's a distinct or not.

The index-only scans that might speed this query up further are committed
for 9.2 - but even those would help both a plain count and a count
distinct.

But you're right - you're not bound by I/O (although I don't know what
those 15% are - iowait, util or what?). The COUNT(DISTINCT) has to actually
keep all the distinct values to determine which are actually distinct.

If there's enough memory (work_mem) to keep all the values, this may be
done using a hash table (hash aggregate). Otherwise it has to sort them.
You can see this in the explain plan (which you haven't posted).
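
Note that COUNT(DISTINCT ...) hides the deduplication inside the Aggregate
node, so one way to actually see which strategy gets picked (a sketch,
spelling the distinct out as a subquery) is:

explain analyze
select count(*)
  from (select distinct uid
          from userstats
         where log_date > '2011-11-07') t;

-- HashAggregate means the hash strategy fit within work_mem;
-- Sort + Unique (or GroupAggregate) means it fell back to sorting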

Anyway, this is actually rather CPU-intensive - how much so depends on
the data type. Comparing integers is much easier / cheaper than comparing
text values. What data type is the 'uid' column?

 It takes roughly 5.5 hours to do a concurrent re-index and this DB is
 vac'd nightly.

 Just not sure if this is what to expect, however there are many other
 DB's out there bigger than ours, so I'm curious what can I do?

Well, not much. Use an integer data type for the 'uid' column (unless
you're already doing so). Then you can use more work_mem so that a hash
aggregate is used (maybe it's already used - we'd need to see the explain
plan to check).

Then you could precompute the distinct values somehow - for example, if
there are only a few distinct values for each day, you could do something
like this every day:

INSERT INTO userstats_distinct
SELECT DISTINCT date_trunc('day', log_date) AS log_date, uid
  FROM userstats
 WHERE log_date >= date_trunc('day', now() - interval '1 day')
   AND log_date <  date_trunc('day', now());

and then just

SELECT COUNT(DISTINCT uid) FROM userstats_distinct
 WHERE log_date > '11/7/2011';


The point is to preaggregate the data to the desired granularity (e.g.
day); how much this improves performance depends on how much the amount
of data decreases.

Another option is to use estimates instead of exact results - I've
actually written an extension for that; maybe you'll find it useful.
It's available on github (https://github.com/tvondra/distinct_estimators)
and pgxn (http://pgxn.org/tag/estimate/). I've posted a brief description
here:
here:

http://www.fuzzy.cz/en/articles/aggregate-functions-for-distinct-estimation/

and the current extensions actually perform much better. It's not that
difficult to reach 1% precision. Let me know if this is interesting for
you and if you need help with the extensions.

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select

2011-11-16 Thread Tory M Blue
Thanks all - I misspoke on our use of the index.

We do have an index on log_date and it is being used; here is the
explain analyze plan.



'Aggregate  (cost=7266186.16..7266186.17 rows=1 width=8) (actual
time=127575.030..127575.030 rows=1 loops=1)'
'  ->  Bitmap Heap Scan on userstats  (cost=135183.17..7240890.38
rows=10118312 width=8) (actual time=8986.425..74815.790 rows=33084417
loops=1)'
'        Recheck Cond: (log_date > '2011-11-04'::date)'
'        ->  Bitmap Index Scan on idx_userstats_logdate
(cost=0.00..132653.59 rows=10118312 width=0) (actual
time=8404.147..8404.147 rows=33084417 loops=1)'
'              Index Cond: (log_date > '2011-11-04'::date)'
'Total runtime: 127583.898 ms'

Partitioning Tables

This is useful primarily when you are usually accessing only a part of
the data. We want our queries to go across the entire date range, so
we don't really meet the criteria for partitioning (I had to do some
quick research).

Thanks again
Tory

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select

2011-11-16 Thread Scott Marlowe
On Wed, Nov 16, 2011 at 4:59 PM, Tomas Vondra t...@fuzzy.cz wrote:

 But you're right - you're not bound by I/O (although I don't know what are
 those 15% - iowait, util or what?). The COUNT(DISTINCT) has to actually
 keep all the distinct values to determine which are actually distinct.

Actually I meant to comment on this: he is IO bound. Look at %util -
it's at 99 or 100.

Also, if you have 16 cores and look at something like vmstat you'll
see a 6% wait state. That 6% represents one CPU core waiting for IO
(one core out of 16 is 6.25%); the other cores add up the rest to 100%.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select

2011-11-16 Thread Scott Marlowe
On Wed, Nov 16, 2011 at 4:52 PM, Samuel Gendler
sgend...@ideasculptor.com wrote:
 Could you elaborate on this a bit, or point me at some docs?  I manage a
 600GB db which does almost nothing but aggregates on partitioned tables -
 the largest of which has approx 600 million rows across all partitions.
 Grouping in the aggregates tends to be on the partition column and rarely,
 if ever, would a group cross multiple partitions. We're on 9.0 and could
 definitely use some performance gains.

It's covered in the release notes for 9.1:

http://developer.postgresql.org/pgdocs/postgres/release-9-1.html

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select

2011-11-16 Thread Andy Colson

On 11/16/2011 04:53 PM, Tory M Blue wrote:

Linux F12 64bit
Postgres 8.4.4
16 proc / 32GB
8 disk 15KRPM SAS/Raid 5 (I know!)


shared_buffers = 6000MB 
#temp_buffers = 8MB 
max_prepared_transactions = 0   
work_mem = 250MB
maintenance_work_mem = 1000MB   





We now have about 180mill records in that table. The database size is
about 580GB and the userstats table which is the biggest one and the
one we query the most is 83GB.

Just a basic query takes 4 minutes:

E.g. select count(distinct uid) from userstats where log_date > '11/7/2011'



How'd you feel about keeping a monthly summary table? Update it daily, with 
only a day's worth of stats; then you could query the summary table much faster.

That's what I do for my website stats. I log details for a month, then 
summarize everything into a summary table and blow away the details. You 
wouldn't have to delete the details if you wanted to keep them - just keeping 
the summary table updated would be enough.
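
A sketch of what that might look like here (the summary table and the hit
count are hypothetical; log_date is assumed to be a date, per the explain
plan posted upthread):

create table userstats_daily (
  day  date    not null,
  uid  integer not null,
  hits bigint  not null,
  primary key (day, uid)
);

-- run once a day, e.g. from cron, to roll up yesterday:
insert into userstats_daily
select log_date, uid, count(*)
  from userstats
 where log_date = current_date - 1
 group by log_date, uid;

A report like select count(distinct uid) from userstats_daily where day >
'2011-11-07' then scans a table that is orders of magnitude smaller.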

-Andy

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select

2011-11-16 Thread Tomas Vondra
On 17 Listopad 2011, 2:57, Scott Marlowe wrote:
 On Wed, Nov 16, 2011 at 4:59 PM, Tomas Vondra t...@fuzzy.cz wrote:

 But you're right - you're not bound by I/O (although I don't know what
 are
 those 15% - iowait, util or what?). The COUNT(DISTINCT) has to actually
 keep all the distinct values to determine which are actually distinct.

 Actually I meant to comment on this, he is IO bound.  Look at % Util,
 it's at 99 or 100.

 Also, if you have 16 cores and look at something like vmstat you'll
 see 6% wait state.  That 6% represents one CPU core waiting for IO,
 the other cores will add up the rest to 100%.

Aaaah, I keep forgetting about this, and I somehow ignored the iostat
results too. Yes, he's obviously IO bound.

But this actually means that pre-aggregating the data (as I described in my
previous post) would probably help him even more (less data, less CPU).

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select

2011-11-16 Thread Tory M Blue
On Wed, Nov 16, 2011 at 6:27 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 17 Listopad 2011, 2:57, Scott Marlowe wrote:
 On Wed, Nov 16, 2011 at 4:59 PM, Tomas Vondra t...@fuzzy.cz wrote:

 But you're right - you're not bound by I/O (although I don't know what
 are
 those 15% - iowait, util or what?). The COUNT(DISTINCT) has to actually
 keep all the distinct values to determine which are actually distinct.

 Actually I meant to comment on this, he is IO bound.  Look at % Util,
 it's at 99 or 100.

 Also, if you have 16 cores and look at something like vmstat you'll
 see 6% wait state.  That 6% represents one CPU core waiting for IO,
 the other cores will add up the rest to 100%.

 Aaaah, I keep forgetting about this and I somehow ignored the iostat
 results too. Yes, he's obviously IO bound.

I'm not so sure about the io-bound part. I've been battling/reading about it
all day. 1 CPU is pegged at 100%, but the disk is not. If I do something
else via another CPU I have no issues accessing the disks -
writing/deleting/reading. What was said about this being very CPU-intensive
makes more sense to me. The query is only using 1 CPU, and that appears to
be getting overwhelmed.

%util: This number depicts the percentage of time that the device
spent servicing requests.

On a large query, or something that is taking a while, it's going to be
writing to disk all the time, and I'm thinking that is what the util is
telling me, especially since iowait is in the 10-15% range.

Again, just trying to absorb all of this.

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.93    0.00    0.60    9.84    0.00   88.62

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s
avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00    86.50 3453.00    1.50 55352.00    16.00
   16.03     5.24    0.66   0.29 100.00

I mean, await time and service time are in the .29 to .66 msec range; that
doesn't read as IO-bound to me. But I'm more than willing to learn
something not totally Postgres-specific.

But I just don't see it... An average queue size of 2.21 to 6 - that's
really not a ton of stuff waiting.

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s
avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     3.50 3060.00    2.00 49224.00    20.00
   16.08     2.21    0.76   0.33  99.95

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.80    0.00    0.51   11.01    0.00   87.68

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s
avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     5.00 3012.50    3.00 48200.00    92.00
   16.01     2.11    0.74   0.33  99.95

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.93    0.00    0.60    9.84    0.00   88.62

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s
avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00    86.50 3453.00    1.50 55352.00    16.00
   16.03     5.24    0.66   0.29 100.00

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select

2011-11-16 Thread Scott Marlowe
On Wed, Nov 16, 2011 at 7:42 PM, Tory M Blue tmb...@gmail.com wrote:
 On Wed, Nov 16, 2011 at 6:27 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 17 Listopad 2011, 2:57, Scott Marlowe wrote:
 On Wed, Nov 16, 2011 at 4:59 PM, Tomas Vondra t...@fuzzy.cz wrote:

 But you're right - you're not bound by I/O (although I don't know what
 are
 those 15% - iowait, util or what?). The COUNT(DISTINCT) has to actually
 keep all the distinct values to determine which are actually distinct.

 Actually I meant to comment on this, he is IO bound.  Look at % Util,
 it's at 99 or 100.

 Also, if you have 16 cores and look at something like vmstat you'll
 see 6% wait state.  That 6% represents one CPU core waiting for IO,
 the other cores will add up the rest to 100%.

 Aaaah, I keep forgetting about this and I somehow ignored the iostat
 results too. Yes, he's obviously IO bound.

 I'm not so sure on the io-bound. Been battling/reading about it all
 day. 1 CPU is pegged at 100%, but the disk is not. If I do something

Look here in iostat:

 Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s
 avgrq-sz avgqu-sz   await  svctm  %util
 sda               0.00     3.50 3060.00    2.00 49224.00    20.00
 16.08     2.21    0.76   0.33  99.95

See that last column, it's % utilization.   Once it hits 100% you are
anywhere from pretty close to IO bound to right on past it.

I agree with the previous poster, you should roll these up ahead of
time into a materialized view for fast reporting.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select

2011-11-16 Thread Scott Marlowe
On Wed, Nov 16, 2011 at 8:02 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Wed, Nov 16, 2011 at 7:42 PM, Tory M Blue tmb...@gmail.com wrote:
 On Wed, Nov 16, 2011 at 6:27 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 17 Listopad 2011, 2:57, Scott Marlowe wrote:
 On Wed, Nov 16, 2011 at 4:59 PM, Tomas Vondra t...@fuzzy.cz wrote:

 But you're right - you're not bound by I/O (although I don't know what
 are
 those 15% - iowait, util or what?). The COUNT(DISTINCT) has to actually
 keep all the distinct values to determine which are actually distinct.

 Actually I meant to comment on this, he is IO bound.  Look at % Util,
 it's at 99 or 100.

 Also, if you have 16 cores and look at something like vmstat you'll
 see 6% wait state.  That 6% represents one CPU core waiting for IO,
 the other cores will add up the rest to 100%.

 Aaaah, I keep forgetting about this and I somehow ignored the iostat
 results too. Yes, he's obviously IO bound.

 I'm not so sure on the io-bound. Been battling/reading about it all
 day. 1 CPU is pegged at 100%, but the disk is not. If I do something

 Look here in iostat:

 Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s
 avgrq-sz avgqu-sz   await  svctm  %util
 sda               0.00     3.50 3060.00    2.00 49224.00    20.00
 16.08     2.21    0.76   0.33  99.95

 See that last column, it's % utilization.   Once it hits 100% you are
 anywhere from pretty close to IO bound to right on past it.

 I agree with the previous poster, you should roll these up ahead of
 time into a materialized view for fast reporting.

A followup: a good tool to see how your machine is running over time
is the sar command, with the sysstat service running and collecting
data. You can get summary views of the last x weeks rolled up in
5-minute increments on all kinds of system metrics.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select

2011-11-16 Thread Tory M Blue
On Wed, Nov 16, 2011 at 7:02 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Wed, Nov 16, 2011 at 7:42 PM, Tory M Blue tmb...@gmail.com wrote:
 On Wed, Nov 16, 2011 at 6:27 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 17 Listopad 2011, 2:57, Scott Marlowe wrote:
 On Wed, Nov 16, 2011 at 4:59 PM, Tomas Vondra t...@fuzzy.cz wrote:

 But you're right - you're not bound by I/O (although I don't know what
 are
 those 15% - iowait, util or what?). The COUNT(DISTINCT) has to actually
 keep all the distinct values to determine which are actually distinct.

 Actually I meant to comment on this, he is IO bound.  Look at % Util,
 it's at 99 or 100.

 Also, if you have 16 cores and look at something like vmstat you'll
 see 6% wait state.  That 6% represents one CPU core waiting for IO,
 the other cores will add up the rest to 100%.

 Aaaah, I keep forgetting about this and I somehow ignored the iostat
 results too. Yes, he's obviously IO bound.

 I'm not so sure on the io-bound. Been battling/reading about it all
 day. 1 CPU is pegged at 100%, but the disk is not. If I do something

 Look here in iostat:

 Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s
 avgrq-sz avgqu-sz   await  svctm  %util
 sda               0.00     3.50 3060.00    2.00 49224.00    20.00
 16.08     2.21    0.76   0.33  99.95

 See that last column, it's % utilization.   Once it hits 100% you are
 anywhere from pretty close to IO bound to right on past it.

 I agree with the previous poster, you should roll these up ahead of
 time into a materialized view for fast reporting.

Ya, I'm getting mixed opinions on that. Avg queue size is nothing, and
await and svctime are nothing, so maybe I'm on the edge - but at face
value it's not the cause of the slow query times. I think the data
structure is; however, since it seems I need to query against all the
data, I'm unclear how to best set that up. Partitioning is not the
answer, it seems.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select

2011-11-16 Thread Tomas Vondra
On 17 Listopad 2011, 4:16, Tory M Blue wrote:
 On Wed, Nov 16, 2011 at 7:02 PM, Scott Marlowe scott.marl...@gmail.com
 wrote:
 On Wed, Nov 16, 2011 at 7:42 PM, Tory M Blue tmb...@gmail.com wrote:
 On Wed, Nov 16, 2011 at 6:27 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 17 Listopad 2011, 2:57, Scott Marlowe wrote:
 On Wed, Nov 16, 2011 at 4:59 PM, Tomas Vondra t...@fuzzy.cz wrote:

 But you're right - you're not bound by I/O (although I don't know
 what
 are
 those 15% - iowait, util or what?). The COUNT(DISTINCT) has to
 actually
 keep all the distinct values to determine which are actually
 distinct.

 Actually I meant to comment on this, he is IO bound.  Look at % Util,
 it's at 99 or 100.

 Also, if you have 16 cores and look at something like vmstat you'll
 see 6% wait state.  That 6% represents one CPU core waiting for IO,
 the other cores will add up the rest to 100%.

 Aaaah, I keep forgetting about this and I somehow ignored the iostat
 results too. Yes, he's obviously IO bound.

 I'm not so sure on the io-bound. Been battling/reading about it all
 day. 1 CPU is pegged at 100%, but the disk is not. If I do something

 Look here in iostat:

 Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s
 avgrq-sz avgqu-sz   await  svctm  %util
 sda               0.00     3.50 3060.00    2.00 49224.00    20.00
 16.08     2.21    0.76   0.33  99.95

 See that last column, it's % utilization.   Once it hits 100% you are
 anywhere from pretty close to IO bound to right on past it.

 I agree with the previous poster, you should roll these up ahead of
 time into a materialized view for fast reporting.

 Ya I'm getting mixed opinions on that. avg queue size is nothing and
 await and svctime is nothing, so maybe I'm on the edge, but it's not

What do you mean by "nothing"? There are 3060 reads/s, and servicing each
one takes 0.33 ms - that means the drive is 100% utilized.

The problem with the iostat results you've posted earlier is that they
either use -xd or none of those switches. That means you can't see CPU
stats and extended I/O stats at the same time - use just -x next time.

Anyway, the results show that %iowait is about 6% - as Scott Marlowe
pointed out, this means 1 core is waiting for I/O. That's the core running
your query. Try executing the query 16x in parallel and you'll see iowait
approach 100%.

 at face value, the cause of the slow query times. I think the data
 structure is, however as it seems I need to query against all the
 data, I'm unclear how to best set that up. Partitioning is not the
 answer it seems.

I'm not sure I understand what you mean by accessing all the data. You can
do that with partitioning too, although the execution plan may not be as
efficient as with a plain table. Try partitioning the data by date (a
partition for each day / week) - my impression is that you're querying
data by date, so this is a natural partitioning.

Anyway, what I've recommended in my previous post was an intelligent
reduction of the data - imagine for example there are 1000 unique visitors
and each of them does 1000 actions per day. That means 1,000,000 rows. What
you can do is aggregate the data by user (at the end of the day, thus
processing just that single day), i.e. something like this:

SELECT uid, count(*) FROM users WHERE log_date ... GROUP BY uid

and storing this in a table users_aggregated. This table has just 1000
rows (one for each user), so it's 1000x smaller.

But you can do this

SELECT COUNT(DISTINCT uid) FROM users_aggregated

and you'll get exactly the correct result.
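
And if the aggregated table also keeps the day (as in the userstats_distinct
example earlier in the thread), the date filter still works - a sketch,
assuming a column named day:

SELECT COUNT(DISTINCT uid)
  FROM users_aggregated
 WHERE day > '2011-11-07';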


Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select

2011-11-16 Thread Josh Berkus
Tory,

A seq scan across 83GB in 4 minutes is pretty good.   That's over
300MB/s.  Even if you assume that 1/3 of the table was already cached,
that's still over 240mb/s.  Good disk array.

Either you need an index, or you need to not do this query at user
request time.  Or a LOT more RAM.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select

2011-11-16 Thread Tory M Blue
On Wed, Nov 16, 2011 at 7:47 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 17 Listopad 2011, 4:16, Tory M Blue wrote:
 On Wed, Nov 16, 2011 at 7:02 PM, Scott Marlowe scott.marl...@gmail.com
 wrote:
 On Wed, Nov 16, 2011 at 7:42 PM, Tory M Blue tmb...@gmail.com wrote:
 On Wed, Nov 16, 2011 at 6:27 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 17 Listopad 2011, 2:57, Scott Marlowe wrote:
 On Wed, Nov 16, 2011 at 4:59 PM, Tomas Vondra t...@fuzzy.cz wrote:

 But you're right - you're not bound by I/O (although I don't know
 what
 are
 those 15% - iowait, util or what?). The COUNT(DISTINCT) has to
 actually
 keep all the distinct values to determine which are actually
 distinct.

 Actually I meant to comment on this, he is IO bound.  Look at % Util,
 it's at 99 or 100.

 Also, if you have 16 cores and look at something like vmstat you'll
 see 6% wait state.  That 6% represents one CPU core waiting for IO,
 the other cores will add up the rest to 100%.

 Aaaah, I keep forgetting about this and I somehow ignored the iostat
 results too. Yes, he's obviously IO bound.

 I'm not so sure on the io-bound. Been battling/reading about it all
 day. 1 CPU is pegged at 100%, but the disk is not. If I do something

 Look here in iostat:

 Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s
 avgrq-sz avgqu-sz   await  svctm  %util
 sda               0.00     3.50 3060.00    2.00 49224.00    20.00
 16.08     2.21    0.76   0.33  99.95

 See that last column, it's % utilization.   Once it hits 100% you are
 anywhere from pretty close to IO bound to right on past it.

 I agree with the previous poster, you should roll these up ahead of
 time into a materialized view for fast reporting.

 Ya I'm getting mixed opinions on that. avg queue size is nothing and
 await and svctime is nothing, so maybe I'm on the edge, but it's not

 What do you mean by "nothing"? There are 3060 reads/s, and servicing each
 one takes 0.33 ms - that means the drive is 100% utilized.

 The problem with the iostat results you've posted earlier is that they
 either use -xd or none of those switches. That means you can't see CPU
 stats and extended I/O stats at the same time - use just -x next time.

 Anyway, the results show that %iowait is about 6% - as Scott Marlowe
 pointed out, this means 1 core is waiting for I/O. That's the core running
 your query. Try executing the query 16x in parallel and you'll see iowait
 approach 100%.

Yes, this I understand, and it is correct. But I'm wrestling with the idea
that the disk is completely saturated. I've seen cases where I actually run
into high IO/wait and watch load climb as processes stack up.

I'm not arguing (please know this) - I appreciate the help and will try
almost anything that is offered here - but I think if I just threw
money at the situation (hardware), I wouldn't get any closer to a
resolution of my issue. I am very interested in other solutions and
more DB structure changes etc.

Thanks !
Tory

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select

2011-11-16 Thread Tory M Blue
On Wed, Nov 16, 2011 at 9:19 PM, Josh Berkus j...@agliodbs.com wrote:
 Tory,

 A seq scan across 83GB in 4 minutes is pretty good.   That's over
 300MB/s.  Even if you assume that 1/3 of the table was already cached,
 that's still over 240mb/s.  Good disk array.

 Either you need an index, or you need to not do this query at user
 request time.  Or a LOT more RAM.

Thanks josh,

That's also the other scenario: what is expected? Maybe the 4 minutes,
which turns into 5.5 hours or 23 hours for a report, is just standard
based on our data and sizing.

Then it's about stopping the chase and starting to look at tuning, or a
redesign if possible, to allow reports to finish in a timely fashion. The
data is still going to grow a tad, but reporting requirements are on the
rise.

You folks are the right place to seek answers from; I just need to
make sure I'm giving you the information that will allow you to
assist/help me.

Memory is not expensive these days, so it's possible that I bump the
server to 192GB or whatever to give me the headroom, but we are
trying to dig a tad deeper into the data/queries/tuning before I go
the hardware route again.

Tory

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance