Re: [PERFORM] Optimization idea
On Thu, Apr 22, 2010 at 10:37 PM, Vlad Arkhipov <arhi...@dc.baikal.ru> wrote:
> I don't think this is just an issue with statistics, because the same
> problem arises when I try executing a query like this:

I'm not sure how you think this proves that it isn't a problem with
statistics, but I think what you should be focusing on here, looking back
to your original email, is that the plans that are actually much faster
have almost as much estimated cost as the slower one. Since all your data
is probably fully cached, at a first cut, I might try setting
random_page_cost and seq_page_cost to 0.005 or so, and adjusting
effective_cache_size to something appropriate.

...Robert

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
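[Robert's suggestion can be tried per-session before committing anything to postgresql.conf. A minimal sketch, assuming the t1/t2 tables from the original post; the 0.005 values are his suggestion for a fully cached database, and the effective_cache_size figure is a placeholder to size to your own OS cache:]

```sql
-- Try the planner cost settings for the current session only, so a bad
-- choice does not affect other connections.
SET random_page_cost = 0.005;      -- suggested value for fully cached data
SET seq_page_cost = 0.005;         -- keep equal: no seek penalty when cached
SET effective_cache_size = '4GB';  -- hypothetical: set to your actual cache size

-- Re-test the query and compare the chosen plan and estimated costs.
EXPLAIN ANALYZE
SELECT * FROM t2 JOIN t1 ON t1.t = t2.t WHERE t1.id = 1;
```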
Re: [PERFORM] Optimization idea
2010/4/23 Robert Haas <robertmh...@gmail.com>:
> On Thu, Apr 22, 2010 at 10:37 PM, Vlad Arkhipov <arhi...@dc.baikal.ru> wrote:
>> I don't think this is just an issue with statistics, because the same
>> problem arises when I try executing a query like this:
>
> I'm not sure how you think this proves that it isn't a problem with
> statistics, but I think what you should be focusing on here, looking back
> to your original email, is that the plans that are actually much faster
> have almost as much estimated cost as the slower one. Since all your data
> is probably fully cached, at a first cut, I might try setting
> random_page_cost and seq_page_cost to 0.005 or so, and adjusting
> effective_cache_size to something appropriate.

That will help correct the situation, but I think the planner is losing here.

--
Cédric Villemain
Re: [PERFORM] Optimization idea
On Fri, Apr 23, 2010 at 9:09 AM, Cédric Villemain
<cedric.villemain.deb...@gmail.com> wrote:
> 2010/4/23 Robert Haas <robertmh...@gmail.com>:
>> [...] Since all your data is probably fully cached, at a first cut, I
>> might try setting random_page_cost and seq_page_cost to 0.005 or so, and
>> adjusting effective_cache_size to something appropriate.
>
> that will help correct the situation, but the planner is losing here I think.

Well, what do you think the planner should do differently?

...Robert
Re: [PERFORM] Optimization idea
Cédric Villemain <cedric.villemain.deb...@gmail.com> wrote:
> 2010/4/23 Robert Haas <robertmh...@gmail.com>:
>> Since all your data is probably fully cached, at a first cut, I might
>> try setting random_page_cost and seq_page_cost to 0.005 or so, and
>> adjusting effective_cache_size to something appropriate.
>
> that will help correct the situation, but the planner is losing here I think.

The planner produces a lot of possible plans to produce the requested
results, and then calculates a cost for each. The lowest-cost plan which
will produce the correct results is the one chosen. If your costing
factors don't represent the reality of your environment, it won't pick
the best plan for your environment.

-Kevin
Re: [PERFORM] Optimization idea
2010/4/23 Robert Haas <robertmh...@gmail.com>:
> On Fri, Apr 23, 2010 at 9:09 AM, Cédric Villemain
> <cedric.villemain.deb...@gmail.com> wrote:
>> that will help correct the situation, but the planner is losing here I think.
>
> Well, what do you think the planner should do differently?

Here the planner just divides the number of rows in the t2 table by the
number of distinct values of t1.t; this is the rows=20200 we can see in
the explains. That seems normal, but it also looks to me like it can be
improved. When estimating the rowcount as just num_rows/n_distinct, it
*knows* that this is wrong, because the most_common_freqs of t2.t say
that 99600 of the rows have the value 1, and fewer than 200 in every
other case. So in every case the planner makes a (perhaps good) choice,
while being sure its estimates are wrong.

I wonder if we can improve the planner here. In this case, where the
number of distinct values is lower than the stats target (in t1.t),
perhaps the planner could improve its decision by going a bit further and
trying a plan for each of the n_distinct values corresponding in t2.t.

I don't have a very clear idea of how to do that, but it may be better if
the planner estimates whether its plan is 100% (or lower, just an idea)
sure to happen; if so, that's fine, else try another plan.

In this test case, if the query is:

select * from t2 join t1 on t1.t = t2.t where t1.id = X;

then if X=1 the planner has a 20% chance that the rowcount is 99600 and
an 80% chance that it is 200 or less; by providing a rowcount of 20200,
how can it find the good plan anyway? Is it better to start with a bad
estimate and perhaps find a good plan, or to start with an estimate which
may be bad but leads to a good plan in more than XX% of the cases?

So, currently, the planner does as expected, but can we try another
approach for those corner cases?

--
Cédric Villemain
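[The arithmetic behind Cédric's numbers can be sketched as follows. The table size and distinct count are inferred so that the division reproduces the rows=20200 estimate from the thread; treat them as an illustration, not the actual test data:]

```python
# Sketch of the planner's rowcount arithmetic for "t2.t = <value>",
# with figures inferred from the thread (illustrative assumptions).
num_rows = 101000    # inferred so that num_rows / n_distinct = 20200
n_distinct = 5       # hypothetical number of distinct t values

# Without consulting the MCV list, the planner spreads rows uniformly:
flat_estimate = num_rows // n_distinct
print(flat_estimate)            # 20200, the rows=20200 seen in the EXPLAIN output

# The MCV list, however, says value 1 alone accounts for 99600 rows,
# and every other value for 200 or fewer -- so the flat estimate is
# guaranteed wrong for every individual value:
mcv = {1: 99600}
default_rows = 200              # upper bound quoted in the thread

def per_value_estimate(x):
    """MCV-aware estimate: exact for common values, bounded otherwise."""
    return mcv.get(x, default_rows)

print(per_value_estimate(1))    # 99600 -- wildly different from 20200
print(per_value_estimate(3))    # 200
```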
Re: [PERFORM] Replacing Cursors with Temporary Tables
To answer the question of whether calling a stored procedure adds any
significant overhead, I built a test case and the short answer is that it
seems that it does:

CREATE OR REPLACE FUNCTION Test1() RETURNS INTEGER AS $BODY$
DECLARE
    temp INTEGER;
BEGIN
    FOR i IN 1..1000 LOOP
        SELECT 1 AS id INTO temp;
    END LOOP;
    RETURN 1;
END;
$BODY$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION Test2A() RETURNS SETOF INTEGER AS $BODY$
BEGIN
    RETURN QUERY SELECT 1 AS id;
END;
$BODY$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION Test2B() RETURNS INTEGER AS $BODY$
DECLARE
    temp INTEGER;
BEGIN
    FOR i IN 1..1000 LOOP
        temp := Test2A();
    END LOOP;
    RETURN 1;
END;
$BODY$ LANGUAGE plpgsql;

EXPLAIN ANALYZE SELECT * FROM Test1();
Function Scan on test1  (cost=0.00..0.26 rows=1 width=4) (actual time=6.568..6.569 rows=1 loops=1)
Total runtime: 6.585 ms

EXPLAIN ANALYZE SELECT * FROM Test2B();
Function Scan on test2b  (cost=0.00..0.26 rows=1 width=4) (actual time=29.006..29.007 rows=1 loops=1)
Total runtime: 29.020 ms

So, when chasing milliseconds, don't call sub-functions if it can
realistically and easily be avoided. I only have one operation/algorithm
broken out into another stored procedure, because I call it in about 8
different places and it is 900+ lines long. While everything else could be
broken out into different stored procedures to make it easier to analyze
the whole set of code, and probably easier to maintain, it does not make
sense from a performance perspective. Each logical group of actions that
would be in its own stored procedure is only ever used once in the whole
algorithm, so there is no good code re-use going on. Further, since the
containing stored procedure gets called by itself hundreds or even
thousands of times per second on a production system, the nested calls to
individual sub-stored-procedures would just add extra overhead for no real
gain. And, from these tests, it would be significant overhead.
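[For isolating how much of that difference is pure call overhead, per-call wall-clock timing can also be taken inside plpgsql with clock_timestamp(), which advances during a transaction (unlike now()). A sketch, assuming the Test1/Test2B functions above; the Time_Tests helper name is invented:]

```sql
CREATE OR REPLACE FUNCTION Time_Tests() RETURNS void AS $BODY$
DECLARE
    t0 timestamptz;
    dummy INTEGER;
BEGIN
    -- Time the direct-statement loop.
    t0 := clock_timestamp();
    dummy := Test1();
    RAISE NOTICE 'Test1:  % ms', 1000 * extract(epoch FROM clock_timestamp() - t0);

    -- Time the loop that goes through a sub-function call.
    t0 := clock_timestamp();
    dummy := Test2B();
    RAISE NOTICE 'Test2B: % ms', 1000 * extract(epoch FROM clock_timestamp() - t0);
END;
$BODY$ LANGUAGE plpgsql;

SELECT Time_Tests();
```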
On Thu, Apr 22, 2010 at 4:57 PM, Eliot Gable
<egable+pgsql-performa...@gmail.com> wrote:
> I appreciate all the comments. I will perform some benchmarking before
> doing the rewrite to be certain of how it will impact performance. At the
> very least, I think I can say for near-certain now that the indexes are
> not going to help me, given the particular queries I am dealing with and
> the limited number of records the temp tables will have, combined with
> the limited number of times I will re-use them.

On Thu, Apr 22, 2010 at 10:42 AM, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Thu, Apr 22, 2010 at 10:11 AM, Merlin Moncure <mmonc...@gmail.com> wrote:
>> The timings are similar, but the array-returning case:
>> *) runs in a single statement. If this is executed from the client, that
>> means fewer round trips
>> *) can be passed around as a variable between functions. A temp table
>> requires re-querying
>> *) makes some things easier/cheap, such as counting the array -- you get
>> to call the basically free array_upper()
>> *) makes some things harder. Specifically, dealing with arrays on the
>> client is a pain UNLESS you expand the array w/unnest() or use libpqtypes
>> *) can nest. You can trivially nest complicated sets w/arrays
>> *) does not require explicit transaction mgmt

I neglected to mention perhaps the most important point about the array
method:

*) does not rely on any temporary resources.

If you write a lot of plpgsql, you will start to appreciate the difference
in execution time between planned and unplanned functions. The first time
you run a function in a database session, it has to be parsed and planned.
The planning time in particular, for large-ish functions that touch a lot
of objects, can exceed the execution time of the function. Depending on
_any_ temporary resources causes plan mgmt issues, because the database
detects that a table in the old plan is gone ('on commit drop') and has to
re-plan.
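[A minimal sketch of the array approach Merlin describes; the table, column, and function names here are invented for illustration:]

```sql
-- Hypothetical example: return matching ids as an array instead of a
-- temp table, so the result can be passed between functions directly.
CREATE OR REPLACE FUNCTION get_ids() RETURNS integer[] AS $BODY$
BEGIN
    RETURN array(SELECT id FROM some_table WHERE flag = true);
END;
$BODY$ LANGUAGE plpgsql;

-- Counting the result is basically free:
SELECT array_upper(get_ids(), 1);

-- Expanding back to rows for client consumption via unnest():
SELECT * FROM unnest(get_ids()) AS id;
```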
If your functions are complex/long and you are counting milliseconds, then
that alone should be enough to dump any approach that depends on temp
tables.

merlin

--
Eliot Gable

"We do not inherit the Earth from our ancestors: we borrow it from our
children." ~David Brower

"I decided the words were too conservative for me. We're not borrowing
from our children, we're stealing from them--and it's not even considered
to be a crime." ~David Brower

"Esse oportet ut vivas, non vivere ut edas. (Thou shouldst eat to live;
not live to eat.)" ~Marcus Tullius Cicero
Re: [PERFORM] Replacing Cursors with Temporary Tables
On Fri, Apr 23, 2010 at 4:42 PM, Eliot Gable
<egable+pgsql-performa...@gmail.com> wrote:
> To answer the question of whether calling a stored procedure adds any
> significant overhead, I built a test case and the short answer is that it
> seems that it does:
> [... Test1(), Test2A(), Test2B() definitions trimmed ...]
> EXPLAIN ANALYZE SELECT * FROM Test1();
> Total runtime: 6.585 ms
> EXPLAIN ANALYZE SELECT * FROM Test2B();
> Total runtime: 29.020 ms

That's not a fair test. test2a() is a SRF, which has higher overhead than
a regular function. Try it this way and the timings will level out:

CREATE OR REPLACE FUNCTION Test2A() RETURNS INTEGER AS $BODY$
BEGIN
    RETURN 1;
END;
$BODY$ LANGUAGE plpgsql;

merlin
Re: [PERFORM] Optimization idea
On Fri, Apr 23, 2010 at 3:22 PM, Cédric Villemain
<cedric.villemain.deb...@gmail.com> wrote:
> Here the planner just divides the number of rows in the t2 table by the
> number of distinct values of t1.t; this is the rows=20200 we can see in
> the explains. [...] When estimating the rowcount as just
> num_rows/n_distinct, it *knows* that this is wrong, because the
> most_common_freqs of t2.t say that 99600 of the rows have the value 1,
> and fewer than 200 in every other case.
> [...]
> if X=1 then the planner has a 20% chance that the rowcount is 99600 and
> an 80% chance that it is 200 or less; by providing a rowcount of 20200,
> how can it find the good plan anyway? Is it better to start with a bad
> estimate and perhaps find a good plan, or to start with an estimate which
> may be bad but leads to a good plan in more than XX% of the cases?
>
> So, currently, the planner does as expected, but can we try another
> approach for those corner cases?

Hmm. We currently have a heuristic that we don't record a value as an MCV
unless it's more frequent than the average frequency. When the number of
MCVs is substantially smaller than the number of distinct values in the
table, this is probably a good heuristic, since it prevents us from
bothering with the recording of some values that are probably only
marginally more interesting than other values we don't have space to
record. But if ndistinct is less than the stats target, we could in
theory record every value we find in the MCVs table and leave the
histogram empty. Not sure if that would be better in general, or not, but
it's a thought.

...Robert
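[The statistics being discussed here can be inspected directly. A sketch, assuming the t1/t2 tables from the original test case:]

```sql
-- Inspect the MCV list and estimated distinct count the planner uses
-- for t2.t (table and column names from the thread's test case).
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 't2' AND attname = 't';

-- Compare the planner's estimate against reality for one value:
EXPLAIN SELECT * FROM t2 WHERE t = 1;   -- estimated rows
SELECT count(*) FROM t2 WHERE t = 1;    -- actual rows
```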
Re: [PERFORM] Replacing Cursors with Temporary Tables
On Fri, Apr 23, 2010 at 4:42 PM, Eliot Gable
<egable+pgsql-performa...@gmail.com> wrote:
> And, from these tests, it would be significant overhead.

Yeah, I've been very disappointed by the size of the function-call
overhead on many occasions. It might be worth putting some effort into
seeing if there's anything that can be done about this, but I haven't. :-)

...Robert
Re: [PERFORM] Optimization idea
Robert Haas <robertmh...@gmail.com> writes:
> Hmm. We currently have a heuristic that we don't record a value as an MCV
> unless it's more frequent than the average frequency. [...] But if
> ndistinct is less than the stats target, we could in theory record every
> value we find in the MCVs table and leave the histogram empty.

Which, in fact, is exactly what we do. Cf. analyze.c lines 2414ff (as of
CVS HEAD). The heuristic you mention only gets applied after we determine
that a complete MCV list won't fit.

regards, tom lane
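[Since the complete-MCV-list behavior Tom describes only applies when the list fits, one practical lever is the per-column statistics target. A sketch using the thread's table names; the target value of 1000 is arbitrary, and the text[] cast is a workaround for most_common_vals being of type anyarray:]

```sql
-- Raise the statistics target for t2.t so ANALYZE has room to store a
-- complete MCV list when n_distinct is small enough.
ALTER TABLE t2 ALTER COLUMN t SET STATISTICS 1000;
ANALYZE t2;

-- Verify that the MCV list now covers every distinct value:
SELECT n_distinct,
       array_length(most_common_vals::text::text[], 1) AS mcv_entries
FROM pg_stats
WHERE tablename = 't2' AND attname = 't';
```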
Re: [PERFORM] Optimization idea
On Fri, Apr 23, 2010 at 6:53 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Which, in fact, is exactly what we do. Cf. analyze.c lines 2414ff (as of
> CVS HEAD). The heuristic you mention only gets applied after we determine
> that a complete MCV list won't fit.

Oh, hrmm. I guess I need to go try to understand this example again, then.

...Robert
Re: [PERFORM] Replacing Cursors with Temporary Tables
That's a good point. However, even after changing it, it is still 12 ms
with the function call versus 6 ms without the extra function call.
Though, it is worth noting that if you can make the function call
guaranteed to return the same results when used with the same input
parameters, it ends up being faster (roughly 3 ms in my test case) due to
caching -- at least when executing it multiple times in a row like this.
Unfortunately, I cannot take advantage of that, because in my particular
use case the chances of it being called again with the same input values
within the cache lifetime of the results are close to zero. Add to that
the fact that the function queries tables that could change between
transactions (meaning the function is volatile), and it's a moot point.
However, it is worth noting that for those people using a non-volatile
function call multiple times in the same transaction with the same input
values, there is no need to inline the function call.

On Fri, Apr 23, 2010 at 5:01 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
> That's not a fair test. test2a() is a SRF, which has higher overhead than
> a regular function. Try it this way and the timings will level out:
>
> CREATE OR REPLACE FUNCTION Test2A() RETURNS INTEGER AS $BODY$
> BEGIN
>     RETURN 1;
> END;
> $BODY$ LANGUAGE plpgsql;
>
> merlin

--
Eliot Gable
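[The caching behavior Eliot describes hinges on the function's declared volatility. A sketch of the declarations (the trivial bodies echo the thread's Test2A; the _stable variant and its name are illustrative):]

```sql
-- VOLATILE (the default): re-evaluated on every call; results are never
-- reused, which matches functions that read tables changing between calls.
CREATE OR REPLACE FUNCTION Test2A() RETURNS INTEGER AS $BODY$
BEGIN
    RETURN 1;
END;
$BODY$ LANGUAGE plpgsql VOLATILE;

-- STABLE: promises the same result for the same arguments within a single
-- statement, letting the planner avoid repeated evaluation. Only safe when
-- the data the function reads cannot change mid-statement.
CREATE OR REPLACE FUNCTION Test2A_stable() RETURNS INTEGER AS $BODY$
BEGIN
    RETURN 1;
END;
$BODY$ LANGUAGE plpgsql STABLE;
```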
Re: [PERFORM] Replacing Cursors with Temporary Tables
More benchmarking results are in, with a comparison between cursors,
arrays, and temporary tables for storing, using, and accessing data
outside the stored procedure:

CREATE OR REPLACE FUNCTION Test_Init() RETURNS INTEGER AS $BODY$
BEGIN
    DROP TABLE IF EXISTS test_table1 CASCADE;
    CREATE TABLE test_table1 (
        id SERIAL NOT NULL PRIMARY KEY,
        junk_field1 INTEGER,
        junk_field2 INTEGER,
        junk_field3 INTEGER
    ) WITH (OIDS=FALSE);
    DROP INDEX IF EXISTS test_table1_junk_field1_idx CASCADE;
    DROP INDEX IF EXISTS test_table1_junk_field2_idx CASCADE;
    DROP INDEX IF EXISTS test_table1_junk_field3_idx CASCADE;
    FOR i IN 1..10000 LOOP
        INSERT INTO test_table1 (junk_field1, junk_field2, junk_field3)
        VALUES (i%10, i%20, i%30);
    END LOOP;
    CREATE INDEX test_table1_junk_field1_idx ON test_table1 USING btree (junk_field1);
    CREATE INDEX test_table1_junk_field2_idx ON test_table1 USING btree (junk_field2);
    CREATE INDEX test_table1_junk_field3_idx ON test_table1 USING btree (junk_field3);

    DROP TABLE IF EXISTS test_table2 CASCADE;
    CREATE TABLE test_table2 (
        id SERIAL NOT NULL PRIMARY KEY,
        junk_field1 INTEGER,
        junk_field2 INTEGER,
        junk_field3 INTEGER
    ) WITH (OIDS=FALSE);
    DROP INDEX IF EXISTS test_table2_junk_field1_idx CASCADE;
    DROP INDEX IF EXISTS test_table2_junk_field2_idx CASCADE;
    DROP INDEX IF EXISTS test_table2_junk_field3_idx CASCADE;
    FOR i IN 1..10000 LOOP
        INSERT INTO test_table2 (junk_field1, junk_field2, junk_field3)
        VALUES (i%15, i%25, i%35);
    END LOOP;
    CREATE INDEX test_table2_junk_field1_idx ON test_table2 USING btree (junk_field1);
    CREATE INDEX test_table2_junk_field2_idx ON test_table2 USING btree (junk_field2);
    CREATE INDEX test_table2_junk_field3_idx ON test_table2 USING btree (junk_field3);
    RETURN 1;
END;
$BODY$ LANGUAGE plpgsql;

SELECT * FROM Test_Init();

DROP TYPE IF EXISTS test_row_type CASCADE;
CREATE TYPE test_row_type AS (
    junk_field1 INTEGER,
    junk_field2 INTEGER,
    junk_field3 INTEGER
);

CREATE OR REPLACE FUNCTION Test1() RETURNS INTEGER AS $BODY$
DECLARE
    temp_row test_row_type;
    cursresults test_row_type[];
    curs SCROLL CURSOR IS SELECT junk_field1, junk_field2, junk_field3
                          FROM test_table1 WHERE junk_field1=8;
BEGIN
    FOR temp_row IN curs LOOP
        cursresults := array_append(cursresults, temp_row);
    END LOOP;
    OPEN curs;
    RETURN 1;
END;
$BODY$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION Test2() RETURNS INTEGER AS $BODY$
DECLARE
    cursresults test_row_type[];
    cur SCROLL CURSOR IS SELECT * FROM unnest(cursresults);
BEGIN
    cursresults := array(
        SELECT (junk_field1, junk_field2, junk_field3)::test_row_type AS rec
        FROM test_table1 WHERE junk_field1=8
    );
    OPEN cur;
    RETURN 1;
END;
$BODY$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION Test3() RETURNS INTEGER AS $BODY$
BEGIN
    CREATE TEMPORARY TABLE results WITH (OIDS=FALSE) ON COMMIT DROP AS (
        SELECT junk_field1, junk_field2, junk_field3
        FROM test_table1 WHERE junk_field1=8
    );
    RETURN 1;
END;
$BODY$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION Test4() RETURNS INTEGER AS $BODY$
DECLARE
    cur SCROLL CURSOR IS SELECT * FROM results;
BEGIN
    CREATE TEMPORARY TABLE results WITH (OIDS=FALSE) ON COMMIT DROP AS (
        SELECT junk_field1, junk_field2, junk_field3
        FROM test_table1 WHERE junk_field1=8
    );
    OPEN cur;
    RETURN 1;
END;
$BODY$ LANGUAGE plpgsql;

EXPLAIN ANALYZE SELECT * FROM Test1();
Function Scan on test1  (cost=0.00..0.26 rows=1 width=4) (actual time=17.701..17.701 rows=1 loops=1)
Total runtime: 17.714 ms -- Ouch

EXPLAIN ANALYZE SELECT * FROM Test2();
Function Scan on test2  (cost=0.00..0.26 rows=1 width=4) (actual time=1.137..1.137 rows=1 loops=1)
Total runtime: 1.153 ms -- Wow

EXPLAIN ANALYZE SELECT * FROM Test3();
Function Scan on test3  (cost=0.00..0.26 rows=1 width=4) (actual time=2.033..2.034 rows=1 loops=1)
Total runtime: 2.050 ms

EXPLAIN ANALYZE SELECT * FROM Test4();
Function Scan on test4  (cost=0.00..0.26 rows=1 width=4) (actual time=2.001..2.001 rows=1 loops=1)
Total runtime: 2.012 ms

In each case, the results are available outside the stored procedure by
either fetching from the cursor or selecting from the temporary table.
Clearly, the temporary table takes a performance hit compared to using
arrays. Building an array with array_append is horrendously inefficient.
Unnesting an array is surprisingly efficient. As can be seen from Test3
and Test4, cursors have no detectable overhead for opening the cursor (at
least in this example with 1000 result rows). It is unclear whether there
is any difference at all between Test3 and Test4 for retrieving the data,
as I have no easy way right now to measure that accurately. However,
since arrays+cursors are more efficient than anything having to do with
temp tables, that is the way I will go. With the number of rows I am
dealing with (which should always be less than 1,000 in the final