Re: [PERFORM] Optimization idea

2010-04-23 Thread Robert Haas
On Thu, Apr 22, 2010 at 10:37 PM, Vlad Arkhipov arhi...@dc.baikal.ru wrote:
 I don't think this is just an issue with statistics, because the same
 problem arises when I try executing a query like this:

I'm not sure how you think this proves that it isn't a problem with
statistics, but I think what you should be focusing on here, looking
back to your original email, is that the plans that are actually much
faster have almost as much estimated cost as the slower one.  Since
all your data is probably fully cached, at a first cut, I might try
setting random_page_cost and seq_page_cost to 0.005 or so, and
adjusting effective_cache_size to something appropriate.
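
For reference, these can be tried per session before touching
postgresql.conf; the values below are only illustrative, not
recommendations:

SET seq_page_cost = 0.005;
SET random_page_cost = 0.005;
SET effective_cache_size = '6GB';  -- illustrative: roughly the RAM available for caching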

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Optimization idea

2010-04-23 Thread Cédric Villemain
2010/4/23 Robert Haas robertmh...@gmail.com:
 On Thu, Apr 22, 2010 at 10:37 PM, Vlad Arkhipov arhi...@dc.baikal.ru wrote:
 I don't think this is just an issue with statistics, because the same
 problem arises when I try executing a query like this:

 I'm not sure how you think this proves that it isn't a problem with
 statistics, but I think what you should be focusing on here, looking
 back to your original email, is that the plans that are actually much
 faster have almost as much estimated cost as the slower one.  Since
 all your data is probably fully cached, at a first cut, I might try
 setting random_page_cost and seq_page_cost to 0.005 or so, and
 adjusting effective_cache_size to something appropriate.

that will help correct the situation, but the planner is losing here I think.


 ...Robert

 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance




-- 
Cédric Villemain

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Optimization idea

2010-04-23 Thread Robert Haas
On Fri, Apr 23, 2010 at 9:09 AM, Cédric Villemain
cedric.villemain.deb...@gmail.com wrote:
 2010/4/23 Robert Haas robertmh...@gmail.com:
 On Thu, Apr 22, 2010 at 10:37 PM, Vlad Arkhipov arhi...@dc.baikal.ru wrote:
 I don't think this is just an issue with statistics, because the same
 problem arises when I try executing a query like this:

 I'm not sure how you think this proves that it isn't a problem with
 statistics, but I think what you should be focusing on here, looking
 back to your original email, is that the plans that are actually much
 faster have almost as much estimated cost as the slower one.  Since
 all your data is probably fully cached, at a first cut, I might try
 setting random_page_cost and seq_page_cost to 0.005 or so, and
 adjusting effective_cache_size to something appropriate.

 that will help correct the situation, but the planner is losing here I think.

Well, what do you think the planner should do differently?

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Optimization idea

2010-04-23 Thread Kevin Grittner
Cédric Villemain cedric.villemain.deb...@gmail.com wrote:
 2010/4/23 Robert Haas robertmh...@gmail.com:
 
 Since all your data is probably fully cached, at a first cut, I
 might try setting random_page_cost and seq_page_cost to 0.005 or
 so, and adjusting effective_cache_size to something appropriate.
 
 that will help correct the situation, but the planner is losing
 here I think.
 
The planner generates a lot of possible plans to produce the
requested results, and then calculates a cost for each.  The
lowest-cost plan which will produce the correct results is the one
chosen. 
If your costing factors don't represent the reality of your
environment, it won't pick the best plan for your environment.
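
As a concrete sketch of that (t1 and t2 are the tables from Vlad's test
case earlier in the thread; the cost values are only illustrative):

BEGIN;
EXPLAIN SELECT * FROM t2 JOIN t1 ON t1.t = t2.t WHERE t1.id = 1;
SET LOCAL seq_page_cost = 0.005;
SET LOCAL random_page_cost = 0.005;
EXPLAIN SELECT * FROM t2 JOIN t1 ON t1.t = t2.t WHERE t1.id = 1;  -- the chosen plan may change
ROLLBACK;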
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Optimization idea

2010-04-23 Thread Cédric Villemain
2010/4/23 Robert Haas robertmh...@gmail.com:
 On Fri, Apr 23, 2010 at 9:09 AM, Cédric Villemain
 cedric.villemain.deb...@gmail.com wrote:
 2010/4/23 Robert Haas robertmh...@gmail.com:
 On Thu, Apr 22, 2010 at 10:37 PM, Vlad Arkhipov arhi...@dc.baikal.ru 
 wrote:
 I don't think this is just an issue with statistics, because the same
 problem arises when I try executing a query like this:

 I'm not sure how you think this proves that it isn't a problem with
 statistics, but I think what you should be focusing on here, looking
 back to your original email, is that the plans that are actually much
 faster have almost as much estimated cost as the slower one.  Since
 all your data is probably fully cached, at a first cut, I might try
 setting random_page_cost and seq_page_cost to 0.005 or so, and
 adjusting effective_cache_size to something appropriate.

 that will help correct the situation, but the planner is losing here I 
 think.

 Well, what do you think the planner should do differently?

Here the planner just divides the number of rows in the t2 table by the
number of distinct values of t1.t; this is the rows=20200 we can see in
the explains.
That seems normal, but it also looks to me like it can be improved.
When estimating the rowcount as just num_rows/n_distinct, the planner
*knows* that this is wrong, because the most_common_freqs of t2.t say
that 99600 of the rows have the value 1, and fewer than 200 rows have
each of the other values.
So in every case the planner makes a (perhaps good) choice while being
sure its estimate is wrong.
I wonder if we can improve the planner here.

In this case, where the number of rows (in t1.t) is lower than the stats
target, perhaps the planner can improve its decision by going a bit
further and trying a plan for each of the n_distinct values found
in t2.t.

I don't have a very clear idea of how to do that, but it might be better
if the planner estimated whether its plan is 100% (or lower, just an
idea) sure to happen; if so, fine, else try another plan.

In this test case, if the query is:
select *
from t2
join t1 on t1.t = t2.t
where t1.id = X;

then if X=1 the planner has a 20% chance that the rowcount is 99600 and
an 80% chance that it is 200 or less; by assuming rowcount=20200, how can
it find the good plan anyway? Is it better to start with a bad estimate
and perhaps find a good plan, or to start with an estimate which may be
bad but leads to a good plan in more than XX% of the cases?

So, currently, the planner does as expected, but can we try another
approach for those corner cases?
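
A rough way to reproduce the kind of skew described above and to look at
the statistics the planner is working from (a sketch only; the row counts
approximate the test case rather than copying the original script):

CREATE TABLE t1 (id integer PRIMARY KEY, t integer);
CREATE TABLE t2 (id serial PRIMARY KEY, t integer);
INSERT INTO t1 SELECT i, i FROM generate_series(1, 5) i;
-- one very common value in t2.t and a few rare ones
INSERT INTO t2 (t) SELECT 1 FROM generate_series(1, 99600);
INSERT INTO t2 (t) SELECT 2 + (i % 4) FROM generate_series(1, 800) i;
ANALYZE t1;
ANALYZE t2;
SELECT n_distinct, most_common_vals, most_common_freqs
  FROM pg_stats WHERE tablename = 't2' AND attname = 't';
EXPLAIN SELECT * FROM t2 JOIN t1 ON t1.t = t2.t WHERE t1.id = 1;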


 ...Robert




-- 
Cédric Villemain

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Replacing Cursors with Temporary Tables

2010-04-23 Thread Eliot Gable
To answer the question of whether calling a stored procedure adds any
significant overhead, I built a test case and the short answer is that it
seems that it does:

CREATE OR REPLACE FUNCTION Test1() RETURNS INTEGER AS
$BODY$
DECLARE
temp INTEGER;
BEGIN
FOR i IN 1..1000 LOOP
SELECT 1 AS id INTO temp;
END LOOP;
RETURN 1;
END;
$BODY$
LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION Test2A() RETURNS SETOF INTEGER AS
$BODY$
DECLARE
BEGIN
RETURN QUERY SELECT 1 AS id;
END;
$BODY$
LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION Test2B() RETURNS INTEGER AS
$BODY$
DECLARE
temp INTEGER;
BEGIN
FOR i IN 1..1000 LOOP
temp := Test2A();
END LOOP;
RETURN 1;
END;
$BODY$
LANGUAGE plpgsql;


EXPLAIN ANALYZE SELECT * FROM Test1();
Function Scan on test1  (cost=0.00..0.26 rows=1 width=4) (actual
time=6.568..6.569 rows=1 loops=1)
Total runtime: 6.585 ms


EXPLAIN ANALYZE SELECT * FROM Test2B();
Function Scan on test2b  (cost=0.00..0.26 rows=1 width=4) (actual
time=29.006..29.007 rows=1 loops=1)
Total runtime: 29.020 ms


So, when chasing milliseconds, don't call sub-functions if doing so can
realistically and easily be avoided. I only have one operation/algorithm
broken out into another stored procedure because I call it in about 8
different places and it is 900+ lines long. While everything else could be
broken out into different stored procedures to make it easier to analyze the
whole set of code and probably make it easier to maintain, it does not make
sense from a performance perspective. Each different logical group of
actions that would be in its own stored procedure is only ever used once in
the whole algorithm, so there is no good code re-use going on. Further,
since the containing stored procedure gets called by itself hundreds or even
thousands of times per second on a production system, the nested calls to
individual sub-stored procedures would just add extra overhead for no real
gain. And, from these tests, it would be significant overhead.



On Thu, Apr 22, 2010 at 4:57 PM, Eliot Gable 
egable+pgsql-performa...@gmail.com wrote:

 I appreciate all the comments.

 I will perform some benchmarking before doing the rewrite to be certain of
 how it will impact performance. At the very least, I think I can say for
 near-certain now that the indexes are not going to help me given the
 particular queries I am dealing with and limited number of records the temp
 tables will have combined with the limited number of times I will re-use
 them.


 On Thu, Apr 22, 2010 at 10:42 AM, Merlin Moncure mmonc...@gmail.com wrote:

 On Thu, Apr 22, 2010 at 10:11 AM, Merlin Moncure mmonc...@gmail.com
 wrote:
  The timings are similar, but the array returning case:
  *)  runs in a single statement.  If this is executed from the client
  that means less round trips
  *) can be passed around as a variable between functions.  temp table
  requires re-query
  *) make some things easier/cheap such as counting the array -- you get
  to call the basically free array_upper()
  *) makes some things harder.  specifically dealing with arrays on the
  client is a pain UNLESS you expand the array w/unnest() or use
  libpqtypes
  *) can nest. you can trivially nest complicated sets w/arrays
  *) does not require explicit transaction mgmt


 I neglected to mention perhaps the most important point about the array
 method:
 *) does not rely on any temporary resources.

 If you write a lot of plpgsql, you will start to appreciate the
 difference in execution time between planned and unplanned functions.
 The first time you run a function in a database session, it has to be
 parsed and planned.  The planning time in particular for large-ish
 functions that touch a lot of objects can exceed the execution time of
 the function.  Depending on _any_ temporary resources causes plan mgmt
 issues because the database detects that a table in the old plan is
 gone ('on commit drop') and has to re-plan.   If your functions are
 complex/long and you are counting milliseconds, then that alone should
 be enough to dump any approach that depends on temp tables.

 merlin
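
A minimal sketch of the array-returning style described above (the
function and the test_table1/junk_field1 names are just borrowed from the
benchmark later in this thread for illustration):

CREATE OR REPLACE FUNCTION get_matching_ids() RETURNS integer[] AS
$BODY$
BEGIN
    -- build the whole result set as an array in a single statement
    RETURN ARRAY(SELECT id FROM test_table1 WHERE junk_field1 = 8);
END;
$BODY$
LANGUAGE plpgsql;

-- the caller expands the array back into rows with unnest()
SELECT unnest(get_matching_ids());
-- counting it is essentially free via array_upper()
SELECT array_upper(get_matching_ids(), 1);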




 --
 Eliot Gable

 We do not inherit the Earth from our ancestors: we borrow it from our
 children. ~David Brower

 I decided the words were too conservative for me. We're not borrowing from
 our children, we're stealing from them--and it's not even considered to be a
 crime. ~David Brower

 Esse oportet ut vivas, non vivere ut edas. (Thou shouldst eat to live;
 not live to eat.) ~Marcus Tullius Cicero




-- 
Eliot Gable

We do not inherit the Earth from our ancestors: we borrow it from our
children. ~David Brower

I decided the words were too conservative for me. We're not borrowing from
our children, we're stealing from them--and it's not even considered to be a
crime. ~David Brower

Esse oportet ut vivas, non vivere ut edas. (Thou shouldst eat to live; not
live to eat.) ~Marcus Tullius Cicero


Re: [PERFORM] Replacing Cursors with Temporary Tables

2010-04-23 Thread Merlin Moncure
On Fri, Apr 23, 2010 at 4:42 PM, Eliot Gable
egable+pgsql-performa...@gmail.com wrote:
 To answer the question of whether calling a stored procedure adds any
 significant overhead, I built a test case and the short answer is that it
 seems that it does:

 CREATE OR REPLACE FUNCTION Test1() RETURNS INTEGER AS
 $BODY$
 DECLARE
     temp INTEGER;
 BEGIN
     FOR i IN 1..1000 LOOP
         SELECT 1 AS id INTO temp;
     END LOOP;
     RETURN 1;
 END;
 $BODY$
 LANGUAGE plpgsql;

 CREATE OR REPLACE FUNCTION Test2A() RETURNS SETOF INTEGER AS
 $BODY$
 DECLARE
 BEGIN
     RETURN QUERY SELECT 1 AS id;
 END;
 $BODY$
 LANGUAGE plpgsql;

 CREATE OR REPLACE FUNCTION Test2B() RETURNS INTEGER AS
 $BODY$
 DECLARE
     temp INTEGER;
 BEGIN
     FOR i IN 1..1000 LOOP
         temp := Test2A();
     END LOOP;
     RETURN 1;
 END;
 $BODY$
 LANGUAGE plpgsql;


 EXPLAIN ANALYZE SELECT * FROM Test1();
 Function Scan on test1  (cost=0.00..0.26 rows=1 width=4) (actual
 time=6.568..6.569 rows=1 loops=1)
 Total runtime: 6.585 ms


 EXPLAIN ANALYZE SELECT * FROM Test2B();
 Function Scan on test2b  (cost=0.00..0.26 rows=1 width=4) (actual
 time=29.006..29.007 rows=1 loops=1)
 Total runtime: 29.020 ms

That's not a fair test.  test2a() is a SRF, which has higher overhead
than a regular function.  Try it this way and the timings will level
out:

CREATE OR REPLACE FUNCTION Test2A() RETURNS  INTEGER AS
$BODY$
DECLARE
BEGIN
RETURN  1 ;
END;
$BODY$
LANGUAGE plpgsql ;

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Optimization idea

2010-04-23 Thread Robert Haas
On Fri, Apr 23, 2010 at 3:22 PM, Cédric Villemain
cedric.villemain.deb...@gmail.com wrote:
 2010/4/23 Robert Haas robertmh...@gmail.com:
 On Fri, Apr 23, 2010 at 9:09 AM, Cédric Villemain
 cedric.villemain.deb...@gmail.com wrote:
 2010/4/23 Robert Haas robertmh...@gmail.com:
 On Thu, Apr 22, 2010 at 10:37 PM, Vlad Arkhipov arhi...@dc.baikal.ru 
 wrote:
 I don't think this is just an issue with statistics, because the same
 problem arises when I try executing a query like this:

 I'm not sure how you think this proves that it isn't a problem with
 statistics, but I think what you should be focusing on here, looking
 back to your original email, is that the plans that are actually much
 faster have almost as much estimated cost as the slower one.  Since
 all your data is probably fully cached, at a first cut, I might try
 setting random_page_cost and seq_page_cost to 0.005 or so, and
 adjusting effective_cache_size to something appropriate.

 that will help correct the situation, but the planner is losing here I 
 think.

 Well, what do you think the planner should do differently?

 Here the planner just divides the number of rows in the t2 table by the
 number of distinct values of t1.t; this is the rows=20200 we can see in
 the explains.
 That seems normal, but it also looks to me like it can be improved.
 When estimating the rowcount as just num_rows/n_distinct, the planner
 *knows* that this is wrong, because the most_common_freqs of t2.t say
 that 99600 of the rows have the value 1, and fewer than 200 rows have
 each of the other values.
 So in every case the planner makes a (perhaps good) choice while being
 sure its estimate is wrong.
 I wonder if we can improve the planner here.

 In this case, where the number of rows (in t1.t) is lower than the stats
 target, perhaps the planner can improve its decision by going a bit
 further and trying a plan for each of the n_distinct values found
 in t2.t.

 I don't have a very clear idea of how to do that, but it might be better
 if the planner estimated whether its plan is 100% (or lower, just an
 idea) sure to happen; if so, fine, else try another plan.

 In this test case, if the query is:
 select *
 from t2
 join t1 on t1.t = t2.t
 where t1.id = X;

 then if X=1 the planner has a 20% chance that the rowcount is 99600 and
 an 80% chance that it is 200 or less; by assuming rowcount=20200, how can
 it find the good plan anyway? Is it better to start with a bad estimate
 and perhaps find a good plan, or to start with an estimate which may be
 bad but leads to a good plan in more than XX% of the cases?

 So, currently, the planner does as expected, but can we try another
 approach for those corner cases?

Hmm.  We currently have a heuristic that we don't record a value as an
MCV unless it's more frequent than the average frequency.  When the
number of MCVs is substantially smaller than the number of distinct
values in the table this is probably a good heuristic, since it
prevents us from bothering with the recording of some values that are
probably only marginally more interesting than other values we don't
have space to record.  But if ndistinct is less than the stats target
we could in theory record every value we find in the MCVs table and
leave the histogram empty.  Not sure if that would be better in
general, or not, but it's a thought.
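
One way to check whether every value made it into the MCV list for a
column is to raise its statistics target and look at pg_stats (a sketch,
using the t2.t column from this thread's test case):

ALTER TABLE t2 ALTER COLUMN t SET STATISTICS 100;
ANALYZE t2;
SELECT n_distinct, most_common_vals, most_common_freqs, histogram_bounds
  FROM pg_stats WHERE tablename = 't2' AND attname = 't';
-- when the complete MCV list fits, histogram_bounds comes back NULL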

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Replacing Cursors with Temporary Tables

2010-04-23 Thread Robert Haas
On Fri, Apr 23, 2010 at 4:42 PM, Eliot Gable
egable+pgsql-performa...@gmail.com wrote:
 And, from these tests, it would be significant overhead.

Yeah, I've been very disappointed by the size of the function-call
overhead on many occasions.  It might be worth putting some effort
into seeing if there's anything that can be done about this, but I
haven't.  :-)

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Optimization idea

2010-04-23 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Hmm.  We currently have a heuristic that we don't record a value as an
 MCV unless it's more frequent than the average frequency.  When the
 number of MCVs is substantially smaller than the number of distinct
 values in the table this is probably a good heuristic, since it
 prevents us from bothering with the recording of some values that are
 probably only marginally more interesting than other values we don't
 have space to record.  But if ndistinct is less than the stats target
 we could in theory record every value we find in the MCVs table and
 leave the histogram empty.

Which, in fact, is exactly what we do.  Cf analyze.c lines 2414ff
(as of CVS HEAD).  The heuristic you mention only gets applied after
we determine that a complete MCV list won't fit.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Optimization idea

2010-04-23 Thread Robert Haas
On Fri, Apr 23, 2010 at 6:53 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Hmm.  We currently have a heuristic that we don't record a value as an
 MCV unless it's more frequent than the average frequency.  When the
 number of MCVs is substantially smaller than the number of distinct
 values in the table this is probably a good heuristic, since it
 prevents us from bothering with the recording of some values that are
 probably only marginally more interesting than other values we don't
 have space to record.  But if ndistinct is less than the stats target
 we could in theory record every value we find in the MCVs table and
 leave the histogram empty.

 Which, in fact, is exactly what we do.  Cf analyze.c lines 2414ff
 (as of CVS HEAD).  The heuristic you mention only gets applied after
 we determine that a complete MCV list won't fit.

Oh, hrmm.  I guess I need to go try to understand this example again, then.

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Replacing Cursors with Temporary Tables

2010-04-23 Thread Eliot Gable
That's a good point. However, even after changing it, it is still 12ms with
the function call versus 6ms without the extra function call. Though, it is
worth noting that if you can make the function call be guaranteed to return
the same results when used with the same input parameters, it ends up being
faster (roughly 3ms in my test case) due to caching -- at least when
executing it multiple times in a row like this. Unfortunately, I cannot take
advantage of that, because in my particular use case, the chances of it
being called again with the same input values within the cache lifetime of
the results is close to zero. Add to that the fact that the function queries
tables that could change between transactions (meaning the function is
volatile) and it's a moot point. However, it is worth noting that for those
people using a non-volatile function call multiple times in the same
transaction with the same input values, there is no need to inline the
function call.
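
For that non-volatile case, the difference is just the volatility label on
the sub-function (a sketch based on the Test2A() above; IMMUTABLE is only
correct if the function truly depends on nothing but its inputs, STABLE if
it only reads tables that don't change within the statement):

CREATE OR REPLACE FUNCTION Test2A() RETURNS INTEGER AS
$BODY$
BEGIN
    RETURN 1;
END;
$BODY$
LANGUAGE plpgsql IMMUTABLE;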


On Fri, Apr 23, 2010 at 5:01 PM, Merlin Moncure mmonc...@gmail.com wrote:

 On Fri, Apr 23, 2010 at 4:42 PM, Eliot Gable
 egable+pgsql-performa...@gmail.com wrote:
  To answer the question of whether calling a stored procedure adds any
  significant overhead, I built a test case and the short answer is that it
  seems that it does:
 
  CREATE OR REPLACE FUNCTION Test1() RETURNS INTEGER AS
  $BODY$
  DECLARE
  temp INTEGER;
  BEGIN
  FOR i IN 1..1000 LOOP
  SELECT 1 AS id INTO temp;
  END LOOP;
  RETURN 1;
  END;
  $BODY$
  LANGUAGE plpgsql;
 
  CREATE OR REPLACE FUNCTION Test2A() RETURNS SETOF INTEGER AS
  $BODY$
  DECLARE
  BEGIN
  RETURN QUERY SELECT 1 AS id;
  END;
  $BODY$
  LANGUAGE plpgsql;
 
  CREATE OR REPLACE FUNCTION Test2B() RETURNS INTEGER AS
  $BODY$
  DECLARE
  temp INTEGER;
  BEGIN
  FOR i IN 1..1000 LOOP
  temp := Test2A();
  END LOOP;
  RETURN 1;
  END;
  $BODY$
  LANGUAGE plpgsql;
 
 
  EXPLAIN ANALYZE SELECT * FROM Test1();
  Function Scan on test1  (cost=0.00..0.26 rows=1 width=4) (actual
  time=6.568..6.569 rows=1 loops=1)
  Total runtime: 6.585 ms
 
 
  EXPLAIN ANALYZE SELECT * FROM Test2B();
  Function Scan on test2b  (cost=0.00..0.26 rows=1 width=4) (actual
  time=29.006..29.007 rows=1 loops=1)
  Total runtime: 29.020 ms

 That's not a fair test.  test2a() is a SRF, which has higher overhead
 than a regular function.  Try it this way and the timings will level
 out:

 CREATE OR REPLACE FUNCTION Test2A() RETURNS  INTEGER AS
 $BODY$
 DECLARE
 BEGIN
 RETURN  1 ;
 END;
 $BODY$
 LANGUAGE plpgsql ;

 merlin




-- 
Eliot Gable

We do not inherit the Earth from our ancestors: we borrow it from our
children. ~David Brower

I decided the words were too conservative for me. We're not borrowing from
our children, we're stealing from them--and it's not even considered to be a
crime. ~David Brower

Esse oportet ut vivas, non vivere ut edas. (Thou shouldst eat to live; not
live to eat.) ~Marcus Tullius Cicero


Re: [PERFORM] Replacing Cursors with Temporary Tables

2010-04-23 Thread Eliot Gable
More benchmarking results are in with a comparison between cursors, arrays,
and temporary tables for storing, using, and accessing data outside the
stored procedure:

CREATE OR REPLACE FUNCTION Test_Init() RETURNS INTEGER AS
$BODY$
DECLARE
   temp INTEGER;
BEGIN
   DROP TABLE IF EXISTS test_table1 CASCADE;
   CREATE TABLE test_table1 (
  id SERIAL NOT NULL PRIMARY KEY,
  junk_field1 INTEGER,
  junk_field2 INTEGER,
  junk_field3 INTEGER
   ) WITH (OIDS=FALSE);
   DROP INDEX IF EXISTS test_table1_junk_field1_idx CASCADE;
   DROP INDEX IF EXISTS test_table1_junk_field2_idx CASCADE;
   DROP INDEX IF EXISTS test_table1_junk_field3_idx CASCADE;
   FOR i IN 1..1 LOOP
  INSERT INTO test_table1 (junk_field1, junk_field2, junk_field3) VALUES
(i%10, i%20, i%30);
   END LOOP;
   CREATE INDEX test_table1_junk_field1_idx ON test_table1 USING btree
(junk_field1);
   CREATE INDEX test_table1_junk_field2_idx ON test_table1 USING btree
(junk_field2);
   CREATE INDEX test_table1_junk_field3_idx ON test_table1 USING btree
(junk_field3);
   DROP TABLE IF EXISTS test_table2 CASCADE;
   CREATE TABLE test_table2 (
  id SERIAL NOT NULL PRIMARY KEY,
  junk_field1 INTEGER,
  junk_field2 INTEGER,
  junk_field3 INTEGER
   ) WITH (OIDS=FALSE);
   DROP INDEX IF EXISTS test_table2_junk_field1_idx CASCADE;
   DROP INDEX IF EXISTS test_table2_junk_field2_idx CASCADE;
   DROP INDEX IF EXISTS test_table2_junk_field3_idx CASCADE;
   FOR i IN 1..1 LOOP
  INSERT INTO test_table2 (junk_field1, junk_field2, junk_field3) VALUES
  (i%15, i%25, i%35);
   END LOOP;
   CREATE INDEX test_table2_junk_field1_idx ON test_table2 USING btree
(junk_field1);
   CREATE INDEX test_table2_junk_field2_idx ON test_table2 USING btree
(junk_field2);
   CREATE INDEX test_table2_junk_field3_idx ON test_table2 USING btree
(junk_field3);
   RETURN 1;
END;
$BODY$
LANGUAGE plpgsql;

SELECT * FROM Test_Init();

DROP TYPE IF EXISTS test_row_type CASCADE;
CREATE TYPE test_row_type AS (
   junk_field1 INTEGER,
   junk_field2 INTEGER,
   junk_field3 INTEGER
);

CREATE OR REPLACE FUNCTION Test1() RETURNS INTEGER AS
$BODY$
DECLARE
   temp_row test_row_type;
   cursresults test_row_type[];
   curs SCROLL CURSOR IS
  SELECT * FROM test_table1 WHERE junk_field1=8;
BEGIN
FOR temp IN curs LOOP
   temp_row := temp;
   cursresults := array_append(cursresults, temp_row);
END LOOP;
OPEN curs;
RETURN 1;
END;
$BODY$
LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION Test2() RETURNS INTEGER AS
$BODY$
DECLARE
cursresults test_row_type[];
   cur SCROLL CURSOR IS
 SELECT * FROM unnest(cursresults);
BEGIN
   cursresults := array(SELECT (junk_field1, junk_field2,
junk_field3)::test_row_type AS rec FROM test_table1 WHERE junk_field1=8);
   OPEN cur;
   RETURN 1;
END;
$BODY$
LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION Test3() RETURNS INTEGER AS
$BODY$
DECLARE
BEGIN
   CREATE TEMPORARY TABLE results WITH (OIDS=FALSE) ON COMMIT DROP AS (
  SELECT junk_field1, junk_field2, junk_field3 FROM test_table1 WHERE
junk_field1=8
   );
   RETURN 1;
END;
$BODY$
LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION Test4() RETURNS INTEGER AS
$BODY$
DECLARE
   cur SCROLL CURSOR IS
  SELECT * FROM results;
BEGIN
   CREATE TEMPORARY TABLE results WITH (OIDS=FALSE) ON COMMIT DROP AS (
  SELECT junk_field1, junk_field2, junk_field3 FROM test_table1 WHERE
junk_field1=8
   );
   OPEN cur;
   RETURN 1;
END;
$BODY$
LANGUAGE plpgsql;

EXPLAIN ANALYZE SELECT * FROM Test1();
Function Scan on test1 (cost=0.00..0.26 rows=1 width=4) (actual
time=17.701..17.701 rows=1 loops=1)
Total runtime: 17.714 ms -- Ouch


EXPLAIN ANALYZE SELECT * FROM Test2();
Function Scan on test2 (cost=0.00..0.26 rows=1 width=4) (actual
time=1.137..1.137 rows=1 loops=1)
Total runtime: 1.153 ms -- Wow


EXPLAIN ANALYZE SELECT * FROM Test3();
Function Scan on test3 (cost=0.00..0.26 rows=1 width=4) (actual
time=2.033..2.034 rows=1 loops=1)
Total runtime: 2.050 ms


EXPLAIN ANALYZE SELECT * FROM Test4();
Function Scan on test4 (cost=0.00..0.26 rows=1 width=4) (actual
time=2.001..2.001 rows=1 loops=1)
Total runtime: 2.012 ms


In each case, the results are available outside the stored procedure by
either fetching from the cursor or selecting from the temporary table.
Clearly, the temporary table takes a performance hit compared to using arrays.
Building an array with array_append is horrendously inefficient. Unnesting
an array is surprisingly efficient. As can be seen from Test3 and Test4,
cursors have no detectable overhead for opening the cursor (at least in this
example with 1000 result rows). It is unclear whether there is any
difference at all between Test3 and Test4 for retrieving the data, as I have
no easy way right now to measure that accurately. However, since arrays+cursors
are more efficient than anything having to do with temp tables, that is the
way I will go. With the number of rows I am dealing with (which should
always be less than 1,000 in the final