Hi John,

Thanks very much for your analysis. I'll probably need to reorganize some things.

Regards,

Dirk

John A Meinel wrote:

Dirk Lutzebaeck wrote:

Hi,

here is a query which produces a temp file of over 1 GB in pgsql_tmp. This
is on pgsql 7.4.2, RHEL 3.0, on a Xeon MP machine with 32 GB RAM, 300 MB
sort_mem and 320 MB shared_mem.

Below is the query and results for EXPLAIN and EXPLAIN ANALYZE. All
tables have been analyzed before.

Can someone please explain why the temp file is so huge? I understand
there are a lot of rows.

Thanks in advance,

Dirk


...

-> Nested Loop (cost=0.00..8346.73 rows=3 width=1361) (actual time=34.104..18016.005 rows=703677 loops=1)


Well, there is this particular plan step where the planner thinks there will be only 3 rows, but in fact there are 703,677 of them. And the previous line:

-> Sort (cost=8346.75..8346.76 rows=3 width=1361) (actual time=75357.448..75499.263 rows=22439 loops=1)


Seems to indicate that after sorting you still have 22,439 rows, which then get pared down again to 1,000.

I'm assuming that the sort you are trying to do is extremely expensive. You are sorting 700k rows, which takes more memory than sort_mem allows (over 1 GB), forcing it to spill the sort to a temporary file on disk.
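
For what it's worth, and purely as a sanity check rather than a fix for the bad row estimate: on 7.4 a sort spills to pgsql_tmp as soon as it no longer fits in sort_mem, which is given in kB. Something like the following, with a value I picked arbitrarily, would show whether the spill goes away once the sort fits in memory:

show sort_mem;           -- current per-sort limit, in kB
set sort_mem = 1200000;  -- roughly 1.2 GB, this session only; arbitrary value
-- then re-run EXPLAIN ANALYZE on the original query and check pgsql_tmp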

I didn't analyze it in depth, but you might get much better performance from using a subselect rather than the query you wrote.

You are joining 4 tables (bi, en, df AS ft, es); I don't know what size each table is. In the end, though, you don't really care about the en or es tables (they aren't in your output).

So maybe one of your subselects could be:

where bi.en = (select en from es where es.co = bi.co and es.spec=122293729);

I'm pretty sure the reason you need 1 GB of temp space is that at one point you have 700k rows. Is it possible to rewrite the query so that it does more filtering earlier? Your DISTINCT criteria seem to filter it down to 20k rows, so maybe it's possible to do some sort of DISTINCT in part of the subselect, before you start joining against the other tables.
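
Very roughly, something like this is what I have in mind; the output column list and the join against ft are pure guesses, since I don't know your schema, so treat it as a sketch only:

select <the bi/ft columns you actually need>
from (select distinct es.co, es.en       -- filter and de-duplicate first
      from es
      where es.spec = 122293729) sub
join bi on bi.co = sub.co and bi.en = sub.en
join df as ft on ft.<key> = bi.<key>     -- I don't know the real join condition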

If you have that much redundancy, you might also want to consider a different normalization.

Just some thoughts.

Also, I thought using the "oid" column wasn't really recommended, since in *high* volume databases oids aren't even guaranteed to be unique. (I think it is a 32-bit number that rolls over.) And on a database dump and restore, they don't stay the same unless you take extra care to include them in both the dump and the restore. I believe it is better to create your own "id" column per table (say SERIAL or BIGSERIAL).
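
For example, just a generic sketch, not your actual schema:

create table example_tbl (
    id bigserial primary key    -- your own stable key instead of oid
    -- , ...the rest of the table's columns
);

Then you reference and join rows by id, and it survives a dump/restore without having to preserve oids.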

John
=:->


