[PERFORM] Cost estimate vs. actual - do I care?

2012-01-01 Thread Jay Levitt
I gather that a big part of making queries performant is making sure the 
planner's estimates reflect reality.


Given a random explain analyze line:

Limit  (cost=0.02..943.11 rows=50 width=16) (actual time=0.603..79.729 
rows=50 loops=1)


which is the truer statement?

1. As long as costs go up with actual time, you're fine.

2. You should try to ensure that costs go up linearly with actual time.

3. You should try to ensure that costs are as close as possible to actual time.

4. The number 4.

Jay Levitt

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Subquery in a JOIN not getting restricted?

2011-11-16 Thread Jay Levitt

Tom Lane wrote:

Jay Levittjay.lev...@gmail.com  writes:

If the query was more like



select questions.id
from questions
join (
select sum(u.id)
from users as u
group by u.id
) as s
on s.id = questions.user_id
where questions.id = 1;



would you no longer be surprised that it scanned all user rows?


I'd suggest rephrasing the query to do the join underneath the GROUP BY.


Well, my real goal is to have that inner query in a set-returning function 
that gives a computed table of other users relative to the current user, and 
then be able to JOIN that with other things and ORDER BY it:


select questions.id
from questions
join (select * from relevance(current_user)) as r
on r.id = questions.user_id
where questions.id = 1;

I assume there's no way for that function (in SQL or PL/pgSQL) to reach to 
the upper node and say do that join again here, or force the join order 
from down below? I can't imagine how there could be, but never hurts to ask. 
 Right now, our workaround is to pass the joined target user as a function 
parameter and do the JOIN in the function, but that means we have to put the 
function in the select list, else we hit the lack of LATERAL support:


 -- This would need LATERAL

select questions.id
from questions
join (
  select * from relevance(current_user, questions.user_id)) as r
)
on r.id = questions.user_id
where questions.id = 1;

 -- This works but has lots of row-at-a-time overhead

select questions.id, (
  select * from relevance(current_user, questions.user_id)
) as r
from questions
where questions.id = 1;

Again, just checking if there's a solution I'm missing.  I know the 
optimizer is only asymptotically approaching optimal!


Jay

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Subquery in a JOIN not getting restricted?

2011-11-10 Thread Jay Levitt

Kevin Grittner wrote:

Merlin Moncuremmonc...@gmail.com  wrote:


Well, this may not fit the OP's 'real' query


Right, if I recall correctly, the OP said it was simplified down as
far as it could be and still have the issue show.


but the inner subquery is probably better written as a semi-join
(WHERE EXISTS).


Kevin's right.  The real query involves several SQL and PL/pgsql functions 
(all now inlineable), custom aggregates, a union or two and a small coyote. 
 I could post it, but that feels like Please write my code for me. 
Still, if you really want to...


Meanwhile, it's good for me to learn how the planner sees my queries and how 
I can best state them.  I assume this is me not understanding something 
about restrictions across group-by nodes.


If the query was more like

select questions.id
from questions
join (
  select sum(u.id)
  from users as u
  group by u.id
) as s
on s.id = questions.user_id
where questions.id = 1;

would you no longer be surprised that it scanned all user rows?  I.E. is the 
group by a red herring, which usually wouldn't be present without an 
aggregate, and the real problem is that the planner can't restrict aggregates?


This comment in planagg.c may be relevant; I'm not doing min/max, but is it 
still true that GROUP BY always looks at all the rows, period?


void
preprocess_minmax_aggregates(PlannerInfo *root, List *tlist)
...
/* We don't handle GROUP BY or windowing, because our current
* implementations of grouping require looking at all the rows anyway,
*/

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Subquery in a JOIN not getting restricted?

2011-11-10 Thread Jay Levitt

Don't hold your breath waiting for that to change.  To do what you're
wishing for, we'd have to treat the GROUP BY subquery as if it were an
inner indexscan, and push a join condition into it.  That's not even
possible today.


Thanks! Knowing that's not a thing helps; we'll just have to rephrase the 
query.


Jay

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Subquery in a JOIN not getting restricted?

2011-11-09 Thread Jay Levitt

Kevin Grittner wrote:

Jay Levittjay.lev...@gmail.com  wrote:


I don't get why the GROUP BY in this subquery forces it to scan
the entire users table (seq scan here, index scan on a larger
table) when there's only one row in users that can match:



Are you sure there's a plan significantly faster than 1.3 ms?


Yep!  Watch this:

drop schema if exists jaytest cascade;
create schema jaytest;
set search_path to jaytest;

create table questions (
  id int not null primary key,
  user_id int not null
);
insert into questions
  select generate_series(1,1100), (random()*200)::int;

create table users (
  id int not null primary key
);
insert into users select generate_series(1, 200);

vacuum freeze analyze;

explain analyze
select questions.id
from questions
join (
  select u.id
  from users as u
  group by u.id
) as s
on s.id = questions.user_id
where questions.id = 1;

---
 Merge Join  (cost=8.28..90833.02 rows=1818 width=4) (actual 
time=888.787..888.790 rows=1 loops=1)

   Merge Cond: (u.id = questions.user_id)
   -  Group  (cost=0.00..65797.47 rows=200 width=4) (actual 
time=0.017..735.509 rows=1747305 loops=1)
 -  Index Scan using users_pkey on users u  (cost=0.00..60797.47 
rows=200 width=4) (actual time=0.015..331.990 rows=1747305 loops=1)
   -  Materialize  (cost=8.28..8.29 rows=1 width=8) (actual 
time=0.013..0.015 rows=1 loops=1)
 -  Sort  (cost=8.28..8.28 rows=1 width=8) (actual 
time=0.012..0.013 rows=1 loops=1)

   Sort Key: questions.user_id
   Sort Method:  quicksort  Memory: 25kB
   -  Index Scan using questions_pkey on questions 
(cost=0.00..8.27 rows=1 width=8) (actual time=0.006..0.006 rows=1 loops=1)

 Index Cond: (id = 1)
 Total runtime: 888.832 ms
(11 rows)

explain analyze
select questions.id
from questions
join (
  select u.id
  from users as u
) as s
on s.id = questions.user_id
where questions.id = 1;

---
 Nested Loop  (cost=0.00..16.77 rows=1 width=4) (actual time=0.019..0.021 
rows=1 loops=1)
   -  Index Scan using questions_pkey on questions  (cost=0.00..8.27 
rows=1 width=8) (actual time=0.008..0.009 rows=1 loops=1)

 Index Cond: (id = 1)
   -  Index Scan using users_pkey on users u  (cost=0.00..8.49 rows=1 
width=4) (actual time=0.007..0.007 rows=1 loops=1)

 Index Cond: (u.id = questions.user_id)
 Total runtime: 0.045 ms
(6 rows)


That said, there might be some room for an optimization which pushes
that test into the query with the group by clause.  I don't know
if there's a problem with that which I'm missing, the construct was
judged to be too rare to be worth the cost of testing for it, or
it's just that nobody has yet gotten to it.


Anyone have more insights on whether this is hard to optimize or simply 
not-yet-optimized?  And if the latter, where might I start looking? (Not 
that you -really- want me to submit a patch; my C has regressed to the try 
an ampersand.  OK, try an asterisk. level...)


Jay

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Predicates not getting pushed into SQL function?

2011-11-07 Thread Jay Levitt

Tom Lane wrote:
 Please don't send HTML-only email to these lists.

Oops - new mail client, sorry.

 Anyway, the answer seems to be that inline_set_returning_function needs
 some work to handle cases with declared OUT parameters.  I will see
 about fixing that going forward, but in existing releases what you need
 to do is declare the function as returning SETOF some named composite
 type


Yes, that patch works great!  Oddly enough, the workaround now does NOT 
work; functions returning SETOF named composite types don't get inlined, but 
functions returning the equivalent TABLE do get inlined.  Let me know if you 
need a failcase, but the bug doesn't actually affect me now :)


Jay


 create type matcher_result as (user_id int, match int);

 create or replace function matcher() returns setof matcher_result as ...


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Predicates not getting pushed into SQL function?

2011-11-07 Thread Jay Levitt

Jay Levitt wrote:

Yes, that patch works great! Oddly enough, the workaround now does NOT work;
functions returning SETOF named composite types don't get inlined, but
functions returning the equivalent TABLE do get inlined. Let me know if you
need a failcase, but the bug doesn't actually affect me now :)


Never mind... I left a strict in my test.  Works great all around.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Subquery in a JOIN not getting restricted?

2011-11-07 Thread Jay Levitt

When I run the following query:

select questions.id
from questions
join (
select u.id as user_id
from users as u
left join scores as s
on s.user_id = u.id
) as subquery
on subquery.user_id = questions.user_id;

the subquery is scanning my entire user table, even though it's restricted 
by the outer query.  (My real subquery is much more complicated, of course, 
but this is the minimal fail case.)


Is this just not a thing the optimizer can do?  Are there ways to rewrite 
this, still as a subquery, that will be smart enough to only produce the one 
row of subquery that matches questions.user_id?


Jay Levitt

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Subquery in a JOIN not getting restricted?

2011-11-07 Thread Jay Levitt

Kevin Grittner wrote:

Kevin Grittnerkevin.gritt...@wicourts.gov  wrote:


If I had made the scores table wider, it might have gone from the
user table to scores on the index.


Bah.  I just forgot to put an index on scores.user_id.  With that
index available it did what you were probably expecting -- seq scan
on questions, nested loop index scan on users, nested loop index
scan on scores.

You weren't running you test with just a few rows in each table and
expecting the same plan to be generated as for tables with a lot of
rows, were you?


No, we're a startup - we only have 2,000 users and 17,000 scores!  We don't 
need test databases yet...


But I just realized something I'd completely forgot (or blocked) - scores is 
a view.  And views don't have indexes.  The underlying tables are ultimately 
indexed by user_id, but I can believe that Postgres doesn't think that's a 
cheap way to do it - especially since we're still using stock tuning 
settings (I know) so its costs are all screwed up.


And yep!  When I do a CREATE TABLE AS from that view, and add an index on 
user_id, it works just as I'd like.  I've been meaning to persist that view 
anyway, so that's what I'll do.


Thanks for the push in the right direction..

Jay

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Subquery in a JOIN not getting restricted?

2011-11-07 Thread Jay Levitt

Jay Levitt wrote:

And yep! When I do a CREATE TABLE AS from that view, and add an index on
user_id, it works just as I'd like.


Or not.  Feel free to kick me back over to pgsql-novice, but I don't get why 
the GROUP BY in this subquery forces it to scan the entire users table (seq 
scan here, index scan on a larger table) when there's only one row in users 
that can match:


create table questions (
  id int not null primary key,
  user_id int not null
);
insert into questions
  select generate_series(1,1100), (random()*2000)::int;

create table users (
  id int not null primary key
);
insert into users select generate_series(1, 2000);

vacuum freeze analyze;

explain analyze
select questions.id
from questions
join (
  select u.id
  from users as u
  group by u.id
) as s
on s.id = questions.user_id
where questions.id = 1;


Hash Join  (cost=42.28..89.80 rows=2 width=4) (actual time=0.857..1.208 
rows=1 loops=1)

   Hash Cond: (u.id = questions.user_id)
   -  HashAggregate  (cost=34.00..54.00 rows=2000 width=4) (actual 
time=0.763..1.005 rows=2000 loops=1)
 -  Seq Scan on users u  (cost=0.00..29.00 rows=2000 width=4) 
(actual time=0.003..0.160 rows=2000 loops=1)
   -  Hash  (cost=8.27..8.27 rows=1 width=8) (actual time=0.015..0.015 
rows=1 loops=1)

 Buckets: 1024  Batches: 1  Memory Usage: 1kB
 -  Index Scan using questions_pkey on questions  (cost=0.00..8.27 
rows=1 width=8) (actual time=0.012..0.013 rows=1 loops=1)

   Index Cond: (id = 1)
 Total runtime: 1.262 ms

This is on patched 9.0.5 built earlier today.  The real query has 
aggregates, so it really does need GROUP BY.. I think..


Jay

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Predicates not getting pushed into SQL function?

2011-11-03 Thread Jay Levitt
I'm confused.  I have a now-trivial SQL function that, unrestricted, would 
scan my whole users table.  When I paste the body of the function as a 
subquery and restrict it to one row, it only produces one row.  When I paste 
the body of the function into a view and restrict it to one row, it produces 
one row.  But when I put it in a SQL function... it scans the whole users 
table and then throws the other rows away.


I thought SQL functions were generally inline-able, push-down-able, etc.  As 
a workaround, I can put my WHERE clause inside the function and pass it 
parameters, but that feels ugly, and it won't help for things like 
resticting via JOINs.  The real function needs parameters, so I can't use it 
as a view.  Are there better workarounds?


I suspect the problem is (something like) the planner doesn't realize the 
function will produce a variable number of rows; I can specify COST or ROWS, 
but they're both fixed values.


Pretty-printed function and explain analyze results:

https://gist.github.com/1336963

In ASCII for web-haters and posterity:

-- THE OVERLY SIMPLIFIED FUNCTION

create or replace function matcher()
returns table(user_id int, match int) as $$

select o.user_id, 1 as match
from (
  select u.id as user_id, u.gender
  from users as u
) as o
cross join
(
  select u.id as user_id, u.gender
  from users as u
  where u.id = 1
) as my;

$$ language sql stable;

-- WHEN I CALL IT AS A FUNCTION

select * from matcher() where user_id = 2;

LOG:  duration: 1.242 ms  plan:
Query Text:

select o.user_id, 1 as match
from (
  select u.id as user_id, u.gender
  from users as u
) as o
cross join
(
  select u.id as user_id, u.gender
  from users as u
  where u.id = 1
) as my;


Nested Loop  (cost=0.00..118.39 rows=1656 width=4) (actual 
time=0.022..0.888 rows=1613 loops=1)

  Output: u.id, 1
  -  Index Scan using users_pkey on public.users u  (cost=0.00..8.27 
rows=1 width=0) (actual time=0.013..0.015 rows=1 loops=1)

Index Cond: (u.id = 1)
  -  Seq Scan on public.users u  (cost=0.00..93.56 rows=1656 width=4) 
(actual time=0.004..0.479 rows=1613 loops=1)

Output: u.id
CONTEXT:  SQL function matcher statement 1
LOG:  duration: 1.951 ms  plan:
Query Text: select * from matcher() where user_id = 2;
Function Scan on public.matcher  (cost=0.25..12.75 rows=5 width=8) 
(actual time=1.687..1.940 rows=1 loops=1)

  Output: user_id, match
  Filter: (matcher.user_id = 2)

-- WHEN I CALL IT AS A SUBQUERY

select * from
(
  select o.user_id, 1 as match
  from (
select u.id as user_id, u.gender
from users as u
  ) as o
  cross join
  (
select u.id as user_id, u.gender
from users as u
where u.id = 1
  ) as my
) as matcher
where user_id = 2;

LOG:  duration: 0.044 ms  plan:
Query Text: select * from
(
  select o.user_id, 1 as match
  from (
select u.id as user_id, u.gender
from users as u
  ) as o
  cross join
  (
select u.id as user_id, u.gender
from users as u
where u.id = 1
  ) as my
) as matcher
where user_id = 2;
Nested Loop  (cost=0.00..16.55 rows=1 width=4) (actual 
time=0.028..0.031 rows=1 loops=1)

  Output: u.id, 1
  -  Index Scan using users_pkey on public.users u  (cost=0.00..8.27 
rows=1 width=4) (actual time=0.021..0.022 rows=1 loops=1)

Output: u.id
Index Cond: (u.id = 2)
  -  Index Scan using users_pkey on public.users u  (cost=0.00..8.27 
rows=1 width=0) (actual time=0.004..0.006 rows=1 loops=1)

Index Cond: (u.id = 1)

-- WHEN I CALL IT AS A VIEW

create view matchview as
select o.user_id, 1 as match
  from (
select u.id as user_id, u.gender
from users as u
  ) as o
  cross join
  (
select u.id as user_id, u.gender
from users as u
where u.id = 1
  ) as my;

select * from matchview where user_id = 2;


LOG:  duration: 0.044 ms  plan:
Query Text: select * from matchview where user_id = 2;
Nested Loop  (cost=0.00..16.55 rows=1 width=4) (actual 
time=0.028..0.031 rows=1 loops=1)

  Output: u.id, 1
  -  Index Scan using users_pkey on public.users u  (cost=0.00..8.27 
rows=1 width=4) (actual time=0.021..0.022 rows=1 loops=1)

Output: u.id
Index Cond: (u.id = 2)
  -  Index Scan using users_pkey on public.users u  (cost=0.00..8.27 
rows=1 width=0) (actual time=0.005..0.007 rows=1 loops=1)

Index Cond: (u.id = 1)


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Predicates not getting pushed into SQL function?

2011-11-03 Thread Jay Levitt
What other info can I 
provide? id is int, gender is varchar(255), and it's happening on 
9.0.4...

  
   
   	Tom Lane 
 
  November 3, 2011 
2:41 PM
  
  Jay Levitt jay.lev...@gmail.com writes:
I'm confused.  I have a now-trivial SQL function that, unrestricted, would 
scan my whole users table.  When I paste the body of the function as a 
subquery and restrict it to one row, it only produces one row.  When I paste 
the body of the function into a view and restrict it to one row, it produces 
one row.  But when I put it in a SQL function... it scans the whole users 
table and then throws the other rows away.

I thought SQL functions were generally inline-able, push-down-able, etc.

inline-able, yes, but if they're not inlined you don't get any such
thing as pushdown of external conditions into the function body.
A non-inlined function is a black box.

The interesting question here is why the function doesn't get inlined
into the calling query.  You got the obvious showstoppers: it has a
SETOF result, it's not volatile, nor strict.  The only other possibility
I can see offhand is that there's some sort of result datatype mismatch,
but you've not provided enough info to be sure about that.

			regards, tom lane

   	   
   	Jay Levitt
  
  November 3, 2011 
1:47 PM
  I'm confused. I have a 
now-trivial SQL function that, unrestricted, would 
scan my whole users table. When I paste the body of the function as a 
subquery and restrict it to one row, it only produces one row. When I 
paste 
the body of the function into a view and restrict it to one row, it 
produces 
one row. But when I put it in a SQL function... it scans the whole 
users 
table and then throws the other rows away.

I thought SQL functions were generally inline-able, push-down-able, 
etc. As 
a workaround, I can put my WHERE clause inside the function and pass it 
parameters, but that feels ugly, and it won't help for things like 
resticting via JOINs. The real function needs parameters, so I can't 
use it 
as a view. Are there better workarounds?

I suspect the problem is (something like) the planner doesn't 
realize the 
function will produce a variable number of rows; I can specify COST or 
ROWS, 
but they're both fixed values.

Pretty-printed function and explain analyze results:

https://gist.github.com/1336963

In ASCII for web-haters and posterity:

-- THE OVERLY SIMPLIFIED FUNCTION

create or replace function matcher()
returns table(user_id int, match int) as $$

 select o.user_id, 1 as match
 from (
 select u.id as user_id, u.gender
 from users as u
 ) as o
 cross join
 (
 select u.id as user_id, u.gender
 from users as u
 where u.id = 1
 ) as my;

$$ language sql stable;

-- WHEN I CALL IT AS A FUNCTION

select * from matcher() where user_id = 2;

LOG: duration: 1.242 ms plan:
 Query Text:

 select o.user_id, 1 as match
 from (
 select u.id as user_id, u.gender
 from users as u
 ) as o
 cross join
 (
 select u.id as user_id, u.gender
 from users as u
 where u.id = 1
 ) as my;


 Nested Loop (cost=0.00..118.39 rows=1656 width=4) (actual 
time=0.022..0.888 rows=1613 loops=1)
 Output: u.id, 1
 - Index Scan using users_pkey on public.users u 
(cost=0.00..8.27 
rows=1 width=0) (actual time=0.013..0.015 rows=1 loops=1)
 Index Cond: (u.id = 1)
 - Seq Scan on public.users u (cost=0.00..93.56 rows=1656
 width=4) 
(actual time=0.004..0.479 rows=1613 loops=1)
 Output: u.id
CONTEXT: SQL function "matcher" statement 1
LOG: duration: 1.951 ms plan:
 Query Text: select * from matcher() where user_id = 2;
 Function Scan on public.matcher (cost=0.25..12.75 rows=5 
width=8) 
(actual time=1.687..1.940 rows=1 loops=1)
 Output: user_id, match
 Filter: (matcher.user_id = 2)

-- WHEN I CALL IT AS A SUBQUERY

select * from
(
 select o.user_id, 1 as match
 from (
 select u.id as user_id, u.gender
 from users as u
 ) as o
 cross join
 (
 select u.id as user_id, u.gender
 from users as u
 where u.id = 1
 ) as my
) as matcher
where user_id = 2;

LOG: duration: 0.044 ms plan:
 Query Text: select * from
 (
 select o.user_id, 1 as match
 from (
 select u.id as user_id, u.gender
 from users as u
 ) as o
 cross join
 (
 select u.id as user_id, u.gender
 from users as u
 where u.id = 1
 ) as my
 ) as matcher
 where user_id = 2;
 Nested Loop (cost=0.00..16.55 rows=1 width=4) (actual 
time=0.028..0.031 rows=1 loops=1)
 Output: u.id, 1
 - Index Scan using users_pkey on public.users u 
(cost=0.00..8.27 
rows=1 width=4) (actual time=0.021..0.022 rows=1 loops=1)
 Output: u.id
 Index Cond: (u.id = 2)
 - Index Scan using users_pkey on public.users u 
(cost=0.00..8.27 
rows=1 width=0) (actual time=0.004..0.006 rows=1 loops=1)
 Index Cond: (u.id = 1)

-- WHEN I CALL IT AS A VIEW

create view matchview as
select o.user_id, 1 as match
 from (
 select u.id as user_id, u.gender
 from users as u
 ) as o
 cross join
 (
 select u.id as user_id, u.gender
 from users as u
 where u.id = 1
 ) as my;

select * from matchview where user_id = 2;


LOG: duration

[PERFORM] Guide to PG's capabilities for inlining, predicate hoisting, flattening, etc?

2011-11-02 Thread Jay Levitt
As I've come up to speed on SQL and PostgreSQL with some
medium-complexity queries, I've asked a few questions about what the
optimizer will do in various situations.  I'm not talking about the
seq-scan-vs-index type of optimizing; I mean transforming within the
relational calculus (algebra?) to an equivalent but more performant
query.  The same topics come up:

- Flattening.  I think that means Merge the intent of the subquery
into the various clauses of the parent query.

- Inlining. That's Don't run this function/subquery/view as an atomic
unit; instead, push it up into the parent query so the optimizer can
see it all at once.  Maybe that's the same as flattening.

- Predicate pushdown. That's This subquery produces a lot of rows,
but the parent query has a WHERE clause that will eliminate half of
them, so don't produce the unnecessary rows.

Am I right so far?  Now, the big question, which I haven't seen
documented anywhere: Under what circumstances can the optimizer do
each of these things?

For instance, I have a complex query that calculates the similarity of
one user to every other user.  The output is two columns, one row per
user:

   select * from similarity(my_user_id);

   other_user | similarity%
   ---|-
  123 |  99

Being a novice at SQL, I first wrote it in PL/pgSQL, so I could stay
in my imperative, iterative head.  The query performed decently well
when scanning the whole table, but when I only wanted to compare
myself to a single user, I said:

   select * from similarity(my_user_id) as s where s.other_user = 321;

And, of course, similarity() produced the whole table anyway, because
predicates don't get pushed down into PL/pgSQL functions.

So I went and rewrote similarity as a SQL function, but I still didn't
want one big hairy SQL query. Ah ha! CTEs let you write modular
subqueries, and you also avoid problems with lack of LATERAL. I'll use
those.

.. But of course predicates don't get pushed into CTEs, either.  (Or
maybe it was that they would, but only if they were inline with the
predicate.. I forget now.)

So you can see where I'm going.  I know if I break everything into
elegant, composable functions, it'll continue to perform poorly.  If I
write one big hairy, it'll perform great but it will be difficult to
maintain, and it will be inelegant and a kitten will die.  My tools
are CTEs, subqueries, aliases, SQL functions, PL/pgSQL functions, and
views (and other tools?)  What optimizations do each of those prevent?

We're on 9.0 now but will happily upgrade to 9.1 if that matters.

Jay Levitt

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance