James G Wilkinson wrote:

All,

...

Firstly, I  am frankly mystified on how to interpret all this.  If anyone
could point me to a document or two that will help me decipher this,
I will greatly appreciate it.

I assume you have looked at:
http://www.postgresql.org/docs/8.0/static/performance-tips.html
And didn't find it helpful enough. I'm not really sure what help you are asking. Are you saying that this query is performing slowly and you want to speed it up? Or you just want to understand how to interpret the output of explain?

Secondly, I have figured out that SEQ SCANs are typically bad.  I am
concerned that a SEQ SCAN is being performed on 'biogenic_beld3_data'
which is the largest table in the query.  I would rather have a SEQ SCAN
be performed on 'tmpgrid' which contains the keys that subset the data
from 'biogenic_beld3_data.'  Is this naive on my part?

It depends how much data is being extracted. If you have 1,000,000 rows, and only need 10, then an index scan is wonderful. If you need 999,999, then a sequential scan is much better (the break even point is <10%)
From the explain, it thinks it is going to be needing 5,637 rows from biogenic_beld3_data, what is that portion relative to the total?

The values at least look like you've run vacuum analyze. Have you tried running "explain analyze" instead of just explain? Then you can see if the planners estimates are accurate.

If you want some help to force it, you could try a subselect query. Something like:

select * from biogenic_beld3_data b where b.beld3_icell = (select b_icell from tmpgrid_pk) and b.beld3_jcell = (select b_jcell from tmpgrid_pk);


Thirdly, I have run EXPLAIN on other queries that report back a GroupAggregate Cost=<low 300,000s> that runs in about 30 minutes on my relatively highend linux machine. But when I run this particular query, it takes on the order of 90 minutes to complete. Any thoughts on why this happens will be appreciated.

Remember cost is in terms of page fetches, not in seconds.
Probably it is just an issue of postgres mis-estimating the selectivity of one of your queries.
Also, you have a fairly complex SUM occurring involving 4 multiplications on an estimated 150,000 rows. While doesn't seem like it should take 90 minutes, it also isn't a trivial operation.

Finally, if anyone can be so kind as to provide insight on how to better
optimize this query, I will, again, be deeply grateful.

Thanks in advance.

terrakit

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

John =:->


Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to