Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

Bryn Llewellyn Sun, 26 Mar 2023 13:16:38 -0700

> [email protected] wrote:
> 
>> [email protected] replied to [email protected]:
>> 
>> Thanks for the link to your SQL file at the line where you get the row count 
>> in the way that you describe... I noted that neither of these functions has 
>> a refcursor formal argument and that, rather, you open (i.e. create) each of 
>> the three cursors that you use within the two functions that uses them.
> 
> CREATE FUNCTION materialize_foreign_table(...)
> 
> CREATE FUNCTION db_migrate_refresh(...) 
> 
> What is not formal about the above?


I used the term of art "formal argument" ordinarily to denote what's listed in 
parentheses at the start of a subprogram definition. The term stands in 
contrast to "actual argument"—meaning the expression that's used in a 
subprogram invocation to provide a value for the corresponding formal argument. 
Go to this page:

dotnettutorials.net/lesson/types-of-function-arguments-in-python/

and search in it for "Example: formal and actual function arguments in python". 
The blurb that comes just before the example says what I just said. It seems to 
be more common to talk about formal and actual "parameters". But the PG doc 
prefers "argument".

I didn't at all mean that "formal" is good and that Laurenz's code is not 
formal and therefore is bad! So sorry if you took it to mean this. 

> Though it does not matter as, back to the docs again:
> 
> www.postgresql.org/docs/current/plpgsql-cursors.html
> 
> "All access to cursors in PL/pgSQL goes through cursor variables, which are 
> always of the special data type refcursor. One way to create a cursor 
> variable is just to declare it as a variable of type refcursor..." Though 
> maybe you are trying to differentiate between bound and unbound refcursor's...


Ah... you missed my point entirely. Looks like my prose was opaque. I was 
referring to this paradigm as described at the start of the page that you 
referenced:

«
A more interesting usage is to return a reference to a cursor that a function 
has created, allowing the caller to read the rows. This provides an efficient 
way to return large row sets from functions.
»

<aside> This seems to me to imply that the producer function with a "refcursor" 
return is "security definer" and that the consumer subprogram cannot access the 
tables of interest with explicit "select”. I thinks that's nice. </aside>

I needed to know that the definition and use of each "refcursor" variable, in 
Laurentz's code, are in the same subprogram because it's only when this is the 
case that using an ordinary "select count(*)" and an ordinary "for" loop, where 
the "select" is written in place within the "for" clause, are viable 
alternatives.

> Also order by is not relevant for getting a count.

Yes, of course. I know that. (But it does matter in many cases of results 
presentation.) I was simply noting that Laurenz’s use-case didn’t need ordering 
and that the "select" statements will be faster than if ordering were needed. 
Sorry if I implied something that I didn’t mean too. (The example in Laurenz's 
post, www.cybertec-postgresql.com/en/pagination-problem-total-result-count/ 
<http://www.cybertec-postgresql.com/en/pagination-problem-total-result-count/>, 
does use "order by".)

>> I did a little test an saw that this:
>> 
>> move last in cur;
>> get diagnostics n = row_count;
>> 
>> reported just 1—in contrast to what you used:
>> 
>> move forward all in cur;
> 
> Because they are different actions. The first directly moves to the last row 
> in the cursor and counts just that row. The second moves through all the rows 
> left in the cursor and counts all the rows it traversed. It is spelled out 
> here:
> 
> https://www.postgresql.org/docs/current/sql-move.html
> 
> "The count is the number of rows that a FETCH command with the same 
> parameters would have returned (possibly zero)."

Yes, I'm afraid that I expressed myself poorly again. I meant to say that while 
I know nothing about the internals, I can't see how "move last in cur" can know 
where the last row is unless it traverses the entire result set from its 
current position until it finds no more rows. And while its doing this, it may 
just as well count the rows it traverses. It would seem, therefore, that a 
single operation that moves to where you asked for and that tells you how many 
rows it traversed would be enough—rather than two that both achieve the same 
effect in the same time where one gives you the count of rows traversed and the 
other doesn't. But never mind. The simple way to see it is that the operations 
simply do what they do—and I won't worry about how they might have been 
conceived differently.

I did some timing tests. I’ve copied the results and the code at the end—just 
for completeness. Here’s what I concluded.

(1) (and this is unremarkable), if you can find a way to meet your requirement 
with just a single SQL statement, then this is bound to be best. I chose a 
simple requirement: read from a table and insert half the total number of rows 
into one table and the remainder into another. This single SQL statement meets 
that requirement:

with
  chunk(n) as (select count(*)/2 from s.t0),
  i1 as (insert into s.t1(k, v) select k, v from s.t0 where k < (select n from 
chunk) returning k)
insert into s.t2(k, v) select k, v from s.t0 where k >= (select n from chunk);

Internet search shows that this CTE approach seems to be the popular PG pattern 
to achieve what Oracle’s dedicated multi-table "insert" achieves. On the 
assumption that the source table suffers constant concurrent changes from other 
sessions, the PL/pgSQL loop that meets this requirement _must_ use the 
cursor-based approach for getting the count that we’ve been discussing. I’m 
glad to have learned all this because some requirements (e.g. when the source 
table traversal has to do DDLs) can’t be met with a single SQL statement, and 
so a PL/pgSQL loop must be used.

(2) Even though the declaration (in a txn that you commit) materializes the 
entire result set, this manages to be a lot faster than doing this yourself 
with a temp table like this:

with
  c(n) as (
    select count(*) from s.t),
  i1(n) as (
    insert into pg_temp.t_count select n from c returning n)
insert into pg_temp.t(k, v, r) select k, v, row_number() over(order by k) from 
s.t;
create unique index temp_t_k_unq on pg_temp.t(k asc);

The index is meant to help "move". Maybe some use cases wouldn’t need it. Same 
goes for recording the count in its own one-row, one-column table.

(3) The times to do this:

move absolute 0 in cur;
move forward all in cur;
get diagnostics n = row_count;
move absolute 0 in cur;

and this:

move absolute 0 in cur;
move last in cur;
move absolute 0 in cur;

are the same (within the limits of my measurements). But, of course, there’s no 
need to use the second alternative.

(4) The time to get the count using "move forward all in cur" is about the same 
as doing an ordinary "count(*)"—but "move forward" is a little slower. I'm 
therefore puzzled by Laurentz's "the result set count comes for free [with the 
cursor approach]" in his blog post (x-ref above). Its appeal seems to me to be 
limited to its rock-solid semantics.

(5) It's somewhat quicker to move around in a holdable cursor than in a 
non-holdable one. But, of course, you pay a price when you create the former.

Here are my timings—all in seconds and all with three digits of precision. I 
used PG 15.2 in a Ubuntu VM on my Mac Book. I run psql in the bare macOS and 
connect into the VM. On some runs, some times are higher than usual. I expect 
that the way the VM handles files has something to do with this. I chose a run 
where the timings were pretty stable to copy here.

Insert 10,000,000 rows time:             13.7  

Populate temp tables time:               10.6  

Native count(*) time:                     0.633
                                          0.637
                                          0.677
                                          0.781

Declare "Is-Holdable" time:               1.91 

Is-Holdable move fwd count time:          0.703
                                          0.700
                                          0.700
                                          0.700

Declare "Not-Holdable" time:              0.000

Not-Holdable move fwd count time:         0.877
                                          0.866
                                          0.864
                                          0.872

Declare "Is-Holdable" time:               1.78 

Is-Holdable move back & fore time:        0.691
                                          0.695
                                          0.709
                                          0.708

Declare "Not-Holdable" time:              0.000

Not-Holdable move back & fore time:       0.872
                                          0.871
                                          0.874
                                          0.888

Here's the code. If you want to run it, just copy it into a single file and 
start it at the psql prompt. You need just to connect as a regular user that 
has "create" on the database that you use.
--------------------------------------------------------------------------------

-- Create the objects.
\c :db :u
set search_path = pg_catalog, pg_temp;
drop schema if exists s cascade;
create schema s;
create table s.t(k int primary key, v int not null);

/*
  Format the elapsed time since t0 in seconds
  with three digits of precision. 
*/;
create function s.elapsed_time(t0 in double precision)
  returns text
  set search_path = pg_catalog, pg_temp
  language plpgsql
as $body$
declare
  e constant  double
                precision not null := extract(epoch from clock_timestamp()) - 
t0;

  t constant  text        not null :=
    case
      when e >= 100000.0 then '>= 100K'                                  ||'    
'
      when e >=  10000.0 then to_char(round(e/100.0)*100.0, '99,999'    )||'    
'
      when e >=   1000.0 then to_char(round(e/10.0)* 10.0,   '9,999'    )||'    
'
      when e >=    100.0 then to_char(e,                       '999'    )||'    
'
      when e >=     10.0 then to_char(e,                        '99.9'  )||'  '
      when e >=      1.0 then to_char(e,                         '9.99' )||' '
      else                   to_char(e,                         '0.999')
    end;
begin
  return lpad(t, 12);
end;
$body$;

create procedure s.insert_table_time(no_of_rows int, t inout text)
  set search_path = pg_catalog, pg_temp
  language plpgsql
as $body$
declare
  n_text  constant  text        not null := ltrim(to_char(no_of_rows, 
'999,999,999,999'));
  caption constant  text        not null := rpad('Insert '||n_text||' rows 
time:', 35);

  t0                double
                      precision not null := 0.0;
begin
  truncate table s.t;

  t0 := extract(epoch from clock_timestamp());
  with g(v) as (select generate_series(1, no_of_rows))
  insert into s.t(k, v) select g.v, (g.v)*2 from g;
  t := caption||s.elapsed_time(t0);
end;
$body$;

create procedure s.populate_temp_tables_time(t inout text)
  set search_path = pg_catalog, pg_temp
  language plpgsql
as $body$
declare
  caption constant  text        not null := rpad('Populate temp tables time:', 
35);
  t0                double
                      precision not null := 0.0;
begin
  create table pg_temp.t_count(n int);
  create table pg_temp.t(k int, v int, r int);

  t0 := extract(epoch from clock_timestamp());
  with
    c(n) as (
      select count(*) from s.t),
    i1(n) as (
      insert into pg_temp.t_count select n from c returning n)
  insert into pg_temp.t(k, v, r) select k, v, row_number() over(order by k) 
from s.t;
  create unique index temp_t_k_unq on pg_temp.t(k asc);
  t := caption||s.elapsed_time(t0);
end;
$body$;

create procedure s.close_cursor(cur in refcursor)
  set search_path = pg_catalog, pg_temp
  language plpgsql
as $body$
declare
begin
  close cur;
end;
$body$;

create procedure s.declare_cursor_time(cur_name in text, holdable in boolean, t 
inout text)
  -- set search_path = pg_catalog, pg_temp :: Incomparible with "commit".
  language plpgsql
as $body$
declare
  hld     constant  text not null := case holdable
                                       when true then 'with hold'
                                       else           'without hold'
                                     end;
  slct    constant  text not null := 'select k, v from s.t order by k';
  decl    constant  text not null := format('declare %I scroll cursor %s for 
'||slct, cur_name, hld);
  caption constant  text        not null := rpad('Declare "'||cur_name||'" 
time:', 35);

  t0      constant  double
                      precision not null := extract(epoch from 
clock_timestamp());
begin
  execute decl;
  if holdable then
    commit;
  end if;
  t := caption||s.elapsed_time(t0);
end;
$body$;

create function s.native_count_time(expected_n in int, show_caption in boolean 
= false)
  returns text
  set search_path = pg_catalog, pg_temp
  language plpgsql
as $body$
declare
  n                 int         not null := 0;
  caption constant  text        not null := case show_caption
                                              when true then rpad('Native 
count(*) time:', 35)
                                              else           rpad('', 35)
                                            end;
  t0      constant  double
                      precision not null := extract(epoch from 
clock_timestamp());
begin
  n := (select count(*) from s.t);
  assert n = expected_n;
  return caption||s.elapsed_time(t0);
end;
$body$;

drop function if exists s.count_by_move_time(refcursor, int, boolean) cascade;
create function s.count_by_move_time(cur in refcursor, expected_n in int, 
show_caption in boolean = false)
  returns text
  set search_path = pg_catalog, pg_temp
  language plpgsql
as $body$
declare
  n                 int         not null := 0;
  caption constant  text        not null := case show_caption
                                              when true then rpad(cur::text||' 
move fwd count time:', 35)
                                              else           rpad('', 35)
                                            end;
  t0      constant  double
                      precision not null := extract(epoch from 
clock_timestamp());
begin
  move absolute 0 in cur;
  move forward all in cur;
  get diagnostics n = row_count;
  move absolute 0 in cur;
  assert n = expected_n;
  return caption||s.elapsed_time(t0);
end;
$body$;

create function s.move_back_and_fore_time(cur in refcursor, show_caption in 
boolean = false)
  returns text
  set search_path = pg_catalog, pg_temp
  language plpgsql
as $body$
declare
  caption constant  text        not null := case show_caption
                                              when true then rpad(cur::text||' 
move back & fore time:', 35)
                                              else           rpad('', 35)
                                            end;
  t0      constant  double
                      precision not null := extract(epoch from 
clock_timestamp());
begin
  move absolute 0 in cur;
  move last in cur;
  move absolute 0 in cur;
  return caption||s.elapsed_time(t0);
end;
$body$;
--------------------------------------------------------------------------------

-- Do the timimg.
-- Ten thousand rows.
\set no_of_rows 10000000
\set Is_Holdable ''''Is-Holdable''''
\set Not_Holdable ''''Not-Holdable''''
\t on
\o results.txt

call s.insert_table_time(:no_of_rows, null::text);
call s.populate_temp_tables_time(null::text);

select s.native_count_time(:no_of_rows, true);
select s.native_count_time(:no_of_rows);
select s.native_count_time(:no_of_rows);
select s.native_count_time(:no_of_rows);

call s.declare_cursor_time(:Is_Holdable, true, null::text);

select s.count_by_move_time(:Is_Holdable, :no_of_rows, true);
select s.count_by_move_time(:Is_Holdable, :no_of_rows);
select s.count_by_move_time(:Is_Holdable, :no_of_rows);
select s.count_by_move_time(:Is_Holdable, :no_of_rows);

start transaction;
  call s.declare_cursor_time(:Not_Holdable, false, null::text);

  select s.count_by_move_time(:Not_Holdable, :no_of_rows, true);
  select s.count_by_move_time(:Not_Holdable, :no_of_rows);
  select s.count_by_move_time(:Not_Holdable, :no_of_rows);
  select s.count_by_move_time(:Not_Holdable, :no_of_rows);
rollback;

call s.close_cursor(:Is_Holdable);

call s.declare_cursor_time(:Is_Holdable, true, null::text);

select s.move_back_and_fore_time(:Is_Holdable, true);
select s.move_back_and_fore_time(:Is_Holdable);
select s.move_back_and_fore_time(:Is_Holdable);
select s.move_back_and_fore_time(:Is_Holdable);

start transaction;
  call s.declare_cursor_time(:Not_Holdable, false, null::text);

  select s.move_back_and_fore_time(:Not_Holdable, true);
  select s.move_back_and_fore_time(:Not_Holdable);
  select s.move_back_and_fore_time(:Not_Holdable);
  select s.move_back_and_fore_time(:Not_Holdable);
rollback;

\o
\t off

Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

Reply via email to