A while back, I posted a pathological minimal-case query where, in order to select one row from a users table, Postgres needed to scan the whole users table, because the restriction was not visible to the GROUP BY.

At the time, Tom wrote:

> Don't hold your breath waiting for that to change.  To do what you're
> wishing for, we'd have to treat the GROUP BY subquery as if it were an
> inner indexscan, and push a join condition into it.  That's not even
> possible today.  It might be possible after I get done with the
> parameterized-path stuff I've been speculating about for a couple of
> years now; but I suspect that even if it is possible, we won't do it
> for subqueries because of the planner-performance hit we'd take from
> repeatedly replanning the same subquery.

http://archives.postgresql.org/pgsql-performance/2011-11/msg00131.php

Given all the work that's been done on parameterized paths and LATERAL, is this something to revisit yet? In 9.3, it's at least possible to manually tweak the SRF, thanks to LATERAL. But it'd be great to allow set-returning functions to remain blissfully unaware of their surroundings. Modular code, Single Responsibility Principle, all that.

I guess a more general question is: Are there cases where the planner can *use* LATERAL functionality to push down restrictions like this? (Do LATERAL and pushdown conceptually overlap? I think maybe they do.)

Example code below - and before you say "but you could just use 'where exists'", trust me that the original queries were much more involved :)

-----


    drop schema if exists jaytest cascade;
    create schema jaytest;
    set search_path to jaytest;

    create table questions (
      id int not null primary key,
      user_id int not null
    );
    insert into questions
      select generate_series(1,1100), (random()*2000000)::int;

    create table users (
      id int not null primary key
    );
    insert into users select generate_series(1, 2000000);

    vacuum freeze analyze;

    create function srf() returns table (user_id int) as $$
      select u.id
      from users as u
      group by u.id
    $$ language sql stable;

    -- Option 1: Use the set-returning function

    explain analyze
    select questions.id
    from questions
    join srf()
    on srf.user_id = questions.user_id
    where questions.id = 1;

    -- Option 2: Use the equivalent of the set-returning function
    -- (remove any doubts about function call overhead)

    explain analyze
    select questions.id
    from questions
    join (
      select u.id as user_id
      from users as u
      group by u.id
    ) as srf1
    on srf1.user_id = questions.user_id
    where questions.id = 1;

    -- Option 3: Now that we have LATERAL, manually
    -- push the join down into the set-returning function

create function srf_lateral(questions_user_id int) returns table (user_id int) as $$
      select u.id
      from users as u
      where u.id = questions_user_id
      group by u.id
    $$ language sql stable;

    explain analyze
    select questions.id
    from questions, lateral srf_lateral(questions.user_id)
    where questions.id = 1;

    drop schema jaytest cascade;

-----

On my machine, options 1 and 2 usually have runtimes of 100-400 ms, and option 3 is about 0.04ms.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to