[PERFORM] query produces 1 GB temp file
Hi, here is a query which produces over 1G temp file in pgsql_tmp. This is on pgsql 7.4.2, RHEL 3.0, XEON MP machine with 32GB RAM, 300MB sort_mem and 320MB shared_mem. Below is the query and results for EXPLAIN and EXPLAIN ANALYZE. All tables have been analyzed before. Can some please explain why the temp file is so huge? I understand there are a lot of rows. All relevant indices seem to be used. Thanks in advance, Dirk EXPLAIN SELECT DISTINCT ON (ft.val_9, ft.created, ft.flatid) ft.docstart, ft.flatobj, bi.oid, bi.en FROM bi, en, df AS ft, es WHERE bi.rc=130170467 AND bi.en=ft.en AND bi.co=117305223 AND bi.hide=FALSE AND ft.en=en.oid AND es.en=bi.en AND es.co=bi.co AND es.spec=122293729 AND (ft.val_2='DG' OR ft.val_2='SK') AND ft.docstart=1 ORDER BY ft.val_9 ASC, ft.created DESC LIMIT 1000 OFFSET 0; Limit (cost=8346.75..8346.78 rows=3 width=1361) - Unique (cost=8346.75..8346.78 rows=3 width=1361) - Sort (cost=8346.75..8346.76 rows=3 width=1361) Sort Key: ft.val_9, ft.created, ft.flatid - Nested Loop (cost=0.00..8346.73 rows=3 width=1361) - Nested Loop (cost=0.00..5757.17 rows=17 width=51) - Nested Loop (cost=0.00..5606.39 rows=30 width=42) - Index Scan using es_sc_index on es (cost=0.00..847.71 rows=301 width=8) Index Cond: ((spec = 122293729) AND (co = 117305223::oid)) - Index Scan using bi_env_index on bi (cost=0.00..15.80 rows=1 width=42) Index Cond: (outer.en = bi.en) Filter: ((rc = 130170467::oid) AND (co = 117305223::oid) AND (hide = false)) - Index Scan using en_oid_index on en (cost=0.00..5.01 rows=1 width=9) Index Cond: (outer.en = en.oid) - Index Scan using df_en on df ft (cost=0.00..151.71 rows=49 width=1322) Index Cond: (outer.en = ft.en) Filter: (((val_2 = 'DG'::text) OR (val_2 = 'SK'::text)) AND (docstart = 1)) (17 rows) -- EXPLAIN ANALYZE gives: Limit (cost=8346.75..8346.78 rows=3 width=1361) (actual time=75357.465..75679.964 rows=1000 loops=1) - Unique (cost=8346.75..8346.78 rows=3 width=1361) (actual time=75357.459..75675.371 rows=1000 loops=1) - Sort (cost=8346.75..8346.76 rows=3 width=1361) (actual time=75357.448..75499.263 rows=22439 loops=1) Sort Key: ft.val_9, ft.created, ft.flatid - Nested Loop (cost=0.00..8346.73 rows=3 width=1361) (actual time=34.104..18016.005 rows=703677 loops=1) - Nested Loop (cost=0.00..5757.17 rows=17 width=51) (actual time=0.467..3216.342 rows=48563 loops=1) - Nested Loop (cost=0.00..5606.39 rows=30 width=42) (actual time=0.381..1677.014 rows=48563 loops=1) - Index Scan using es_sc_index on es (cost=0.00..847.71 rows=301 width=8) (actual time=0.184..46.519 rows=5863 loops=1) Index Cond: ((spec = 122293729) AND (co = 117305223::oid)) - Index Scan using bi_env_index on bi (cost=0.00..15.80 rows=1 width=42) (actual time=0.052..0.218 rows=8 loops=5863) Index Cond: (outer.en = bi.en) Filter: ((rc = 130170467::oid) AND (co = 117305223::oid) AND (hide = false)) - Index Scan using en_oid_index on en (cost=0.00..5.01 rows=1 width=9) (actual time=0.015..0.019 rows=1 loops=48563) Index Cond: (outer.en = en.oid) - Index Scan using df_en on df ft (cost=0.00..151.71 rows=49 width=1322) (actual time=0.038..0.148 rows=14 loops=48563) Index Cond: (outer.en = ft.en) Filter: (((val_2 = 'DG'::text) OR (val_2 = 'SK'::text)) AND (docstart = 1)) Total runtime: 81782.052 ms (18 rows) ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] query produces 1 GB temp file
Greg Stark wrote: I gave a bunch of explain analyze select commands to test estimates for individual columns. What results do they come up with? If those are inaccurate then raising the statistics target is a good route. If those are accurate individually but the combination is inaccurate then you have a more difficult problem. After setting the new statistics target to 200 they did slightly better but not accurate. The results were attached to my last post. Here is a copy: explain analyze select * from bi where rc=130170467; QUERY PLAN --- Seq Scan on bi (cost=0.00..41078.76 rows=190960 width=53) (actual time=0.157..3066.028 rows=513724 loops=1) Filter: (rc = 130170467::oid) Total runtime: 4208.663 ms (3 rows) explain analyze select * from bi where co=117305223; QUERY PLAN --- Seq Scan on bi (cost=0.00..41078.76 rows=603988 width=53) (actual time=0.021..3692.238 rows=945487 loops=1) Filter: (co = 117305223::oid) Total runtime: 5786.268 ms (3 rows) Here is the distribution of the data in bi: select count(*) from bi; 1841966 select count(*) from bi where rc=130170467::oid; 513732 select count(*) from bi where co=117305223::oid; 945503 ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] query produces 1 GB temp file
Hi John, thanks very much for your analysis. I'll probably need to reorganize some things. Regards, Dirk John A Meinel wrote: Dirk Lutzebaeck wrote: Hi, here is a query which produces over 1G temp file in pgsql_tmp. This is on pgsql 7.4.2, RHEL 3.0, XEON MP machine with 32GB RAM, 300MB sort_mem and 320MB shared_mem. Below is the query and results for EXPLAIN and EXPLAIN ANALYZE. All tables have been analyzed before. Can some please explain why the temp file is so huge? I understand there are a lot of rows. Thanks in advance, Dirk ... - Nested Loop (cost=0.00..8346.73 rows=3 width=1361) (actual time=34.104..18016.005 rows=703677 loops=1) Well, there is this particular query where it thinks there will only be 3 rows, but in fact there are 703,677 of them. And the previous line: - Sort (cost=8346.75..8346.76 rows=3 width=1361) (actual time=75357.448..75499.263 rows=22439 loops=1) Seem to indicate that after sorting you still have 22,439 rows, which then gets pared down again down to 1000. I'm assuming that the sort you are trying to do is extremely expensive. You are sorting 700k rows, which takes up too much memory (1GB), which forces it to create a temporary table, and write it out to disk. I didn't analyze it a lot, but you might get a lot better performance from doing a subselect, rather than the query you wrote. You are joining 4 tables (bi, en, df AS ft, es) I don't know which tables are what size. In the end, though, you don't really care about the en table or es tables (they aren't in your output). So maybe one of you subselects could be: where bi.en = (select en from es where es.co = bi.co and es.spec=122293729); I'm pretty sure the reason you need 1GB of temp space is because at one point you have 700k rows. Is it possible to rewrite the query so that it does more filtering earlier? Your distinct criteria seems to filter it down to 20k rows. So maybe it's possible to do some sort of a distinct in part of the subselect, before you start joining against other tables. If you have that much redundancy, you might also need to think of doing a different normalization. Just some thoughts. Also, I thought using the oid column wasn't really recommended, since in *high* volume databases they aren't even guaranteed to be unique. (I think it is a 32-bit number that rolls over.) Also on a database dump and restore, they don't stay the same, unless you take a lot of extra care that they are included in both the dump and the restore. I believe it is better to create your own id per table (say SERIAL or BIGSERIAL). John =:- ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] query produces 1 GB temp file
Tom, the orginal query has more output columns. I reduced it for readability. Specifically it returns a persitent object (flatobj column) which needs to be processed by the application as the returned result. The problem of the huge sort space usage seems to be that the flatobj is part of the row, so it used always copied in the sort algorithm I guess. When I drop the flatobj from the output columns the size of the temp space file drops dramatically. So I'll probably need to read flatobj after the sorting from the limited return result in a subselect. Regards, Dirk Tom Lane wrote: [EMAIL PROTECTED] (Dirk Lutzebaeck) writes: SELECT DISTINCT ON (df.val_9, df.created, df.flatid) df.docindex, df.flatobj, bi.oid, bi.en FROM bi,df WHERE bi.rc=130170467 ... ORDER BY df.val_9 ASC, df.created DESC LIMIT 1000 OFFSET 0 Just out of curiosity, what is this query supposed to *do* exactly? It looks to me like it will give indeterminate results. Practical uses of DISTINCT ON generally specify more ORDER BY columns than there are DISTINCT ON columns, because the extra columns determine which rows have priority to survive the DISTINCT filter. With the above query, you have absolutely no idea which row will be output for a given combination of val_9/created/flatid. regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] query produces 1 GB temp file
John A Meinel wrote: Dirk Lutzebaeck wrote: Greg Stark wrote: I gave a bunch of explain analyze select commands to test estimates for individual columns. What results do they come up with? If those are inaccurate then raising the statistics target is a good route. If those are accurate individually but the combination is inaccurate then you have a more difficult problem. After setting the new statistics target to 200 they did slightly better but not accurate. The results were attached to my last post. Here is a copy: It does seem that setting the statistics to a higher value would help. Since rc=130170467 seems to account for almost 1/3 of the data. Probably you have other values that are much less common. So setting a high statistics target would help the planner realize that this value occurs at a different frequency from the other ones. Can you try other numbers and see what the counts are? There is not much effect when increasing statistics target much higher. I guess this is because rc=130170467 takes a large portion of the column distribution. I assume you did do a vacuum analyze after adjusting the statistics target. Yes. Also interesting that in the time it took you to place these queries, you had received 26 new rows. Yes, it's a live system... And finally, what is the row count if you do explain analyze select * from bi where rc=130170467::oid and co=117305223::oid; explain analyze select * from bi where rc=130170467::oid and co=117305223::oid; QUERY PLAN --- Seq Scan on bi (cost=0.00..43866.19 rows=105544 width=51) (actual time=0.402..3724.222 rows=513732 loops=1) Filter: ((rc = 130170467::oid) AND (co = 117305223::oid)) Well both columns data take about 1/4 of the whole table. There is not much distributed data. So it needs to do full scans... If this is a lot less than say 500k, then probably you aren't going to be helped a lot. The postgresql statistics engine doesn't generate cross column statistics. It always assumes random distribution of data. So if two columns are correlated (or anti-correlated), it won't realize that. 105k, that seems to be may problem. No much random data. Does 8.0 address this problem? Even so, your original desire was to reduce the size of the intermediate step (where you have 700k rows). So you need to try and design a subselect on bi which is as restrictive as possible, so that you don't get all of these rows. With any luck, the planner will realize ahead of time that there won't be that many rows, and can use indexes, etc. But even if it doesn't use an index scan, if you have a query that doesn't use a lot of rows, then you won't need a lot of disk space. I'll try that. What I have already noticed it that one of my output column is quite large so that's why it uses so much temp space. I'll probably need to sort without that output column and read it in afterwards using a subselect on the limted result. Thanks for your help, Dirk John =:- explain analyze select * from bi where rc=130170467; QUERY PLAN --- Seq Scan on bi (cost=0.00..41078.76 rows=190960 width=53) (actual time=0.157..3066.028 rows=513724 loops=1) Filter: (rc = 130170467::oid) Total runtime: 4208.663 ms (3 rows) explain analyze select * from bi where co=117305223; QUERY PLAN --- Seq Scan on bi (cost=0.00..41078.76 rows=603988 width=53) (actual time=0.021..3692.238 rows=945487 loops=1) Filter: (co = 117305223::oid) Total runtime: 5786.268 ms (3 rows) Here is the distribution of the data in bi: select count(*) from bi; 1841966 select count(*) from bi where rc=130170467::oid; 513732 select count(*) from bi where co=117305223::oid; 945503 ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
[PERFORM] query produces 1 GB temp file
Hi, here is a query which produces over 1G temp file in pgsql_tmp. This is on pgsql 7.4.2, RHEL 3.0, XEON MP machine with 32GB RAM, 300MB sort_mem and 320MB shared_mem. Below is the query and results for EXPLAIN and EXPLAIN ANALYZE. All tables have been analyzed before. Can some please explain why the temp file is so huge? I understand there are a lot of rows. Thanks in advance, Dirk EXPLAIN SELECT DISTINCT ON (ft.val_9, ft.created, ft.flatid) ft.docstart, ft.docindex, ft.flatobj, bi.oid, bi.en FROM bi, en, df AS ft, es WHERE bi.rc=130170467 AND bi.en=ft.en AND bi.co=117305223 AND bi.hide=FALSE AND ft.en=en.oid AND es.en=bi.en AND es.co=bi.co AND es.spec=122293729 AND (ft.val_2='DG' OR ft.val_2='SK') AND ft.docstart=1 ORDER BY ft.val_9 ASC, ft.created DESC LIMIT 1000 OFFSET 0; Limit (cost=8346.75..8346.78 rows=3 width=1361) - Unique (cost=8346.75..8346.78 rows=3 width=1361) - Sort (cost=8346.75..8346.76 rows=3 width=1361) Sort Key: ft.val_9, ft.created, ft.flatid - Nested Loop (cost=0.00..8346.73 rows=3 width=1361) - Nested Loop (cost=0.00..5757.17 rows=17 width=51) - Nested Loop (cost=0.00..5606.39 rows=30 width=42) - Index Scan using es_sc_index on es (cost=0.00..847.71 rows=301 width=8) Index Cond: ((spec = 122293729) AND (co = 117305223::oid)) - Index Scan using bi_env_index on bi (cost=0.00..15.80 rows=1 width=42) Index Cond: (outer.en = bi.en) Filter: ((rc = 130170467::oid) AND (co = 117305223::oid) AND (hide = false)) - Index Scan using en_oid_index on en (cost=0.00..5.01 rows=1 width=9) Index Cond: (outer.en = en.oid) - Index Scan using df_en on df ft (cost=0.00..151.71 rows=49 width=1322) Index Cond: (outer.en = ft.en) Filter: (((val_2 = 'DG'::text) OR (val_2 = 'SK'::text)) AND (docstart = 1)) (17 rows) -- EXPLAIN ANALYZE gives: Limit (cost=8346.75..8346.78 rows=3 width=1361) (actual time=75357.465..75679.964 rows=1000 loops=1) - Unique (cost=8346.75..8346.78 rows=3 width=1361) (actual time=75357.459..75675.371 rows=1000 loops=1) - Sort (cost=8346.75..8346.76 rows=3 width=1361) (actual time=75357.448..75499.263 rows=22439 loops=1) Sort Key: ft.val_9, ft.created, ft.flatid - Nested Loop (cost=0.00..8346.73 rows=3 width=1361) (actual time=34.104..18016.005 rows=703677 loops=1) - Nested Loop (cost=0.00..5757.17 rows=17 width=51) (actual time=0.467..3216.342 rows=48563 loops=1) - Nested Loop (cost=0.00..5606.39 rows=30 width=42) (actual time=0.381..1677.014 rows=48563 loops=1) - Index Scan using es_sc_index on es (cost=0.00..847.71 rows=301 width=8) (actual time=0.184..46.519 rows=5863 loops=1) Index Cond: ((spec = 122293729) AND (co = 117305223::oid)) - Index Scan using bi_env_index on bi (cost=0.00..15.80 rows=1 width=42) (actual time=0.052..0.218 rows=8 loops=5863) Index Cond: (outer.en = bi.en) Filter: ((rc = 130170467::oid) AND (co = 117305223::oid) AND (hide = false)) - Index Scan using en_oid_index on en (cost=0.00..5.01 rows=1 width=9) (actual time=0.015..0.019 rows=1 loops=48563) Index Cond: (outer.en = en.oid) - Index Scan using df_en on df ft (cost=0.00..151.71 rows=49 width=1322) (actual time=0.038..0.148 rows=14 loops=48563) Index Cond: (outer.en = ft.en) Filter: (((val_2 = 'DG'::text) OR (val_2 = 'SK'::text)) AND (docstart = 1)) Total runtime: 81782.052 ms (18 rows) ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Toooo many context switches (maybe SLES8?)
Could this be related to the O(1) scheduler backpatches from 2.6 to 2.4 kernel on newer 2.4er distros (RedHat, SuSE)? Tom Lane wrote: Joe Conway [EMAIL PROTECTED] writes: Improve spinlock code for recent x86 processors: insert a PAUSE instruction in the s_lock() wait loop, and use test before test-and-set in TAS() macro to avoid unnecessary bus traffic. Patch from Manfred Spraul, reworked a bit by Tom. I thought this had been committed to the 7.4 stable branch as well, but it appears not. I am currently chasing what seems to be the same issue: massive context swapping on a dual Xeon system. I tried back-patching the above-mentioned patch ... it helps a little but by no means solves the problem ... regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly