Re: [PERFORM] left join + case - how is it processed?

2009-01-19 Thread Tom Lane
Chris dmag...@gmail.com writes:
 I can see it's doing the extra filter step at the start (4th line) which 
 is not present without the coalesce/case statement. I just don't 
 understand why it's being done at that stage.

It's not that hard to understand.  With the original view formulation
(or the COALESCE version), the fully expanded form of the query looks
like

select ... from p left join r ...
  where expression_involving_both_p_and_r = constant

If you make the view output be just p.assetid then you have

select ... from p left join r ...
  where p.assetid = constant

In the first case the planner cannot apply the WHERE restriction until
it's formed the p+r join; so you see the condition applied as a filter
on the join node's output.  In the second case, the planner can push the
WHERE restriction down into the scan of p, since the left join doesn't
affect it.  (If a p row doesn't pass the restriction, then no join row
formed from it can either; ergo there is no need to form those join rows
at all.)

In general a WHERE or JOIN/ON clause cannot be applied below the point
at which all the relations mentioned in it have been joined.  There are
a few special cases where the planner can transform clauses into some
other form that's more optimizable, but you can pretty much bet that a
CASE will never be one of them --- CASE is more or less *defined* to
defeat optimization.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] left join + case - how is it processed?

2009-01-19 Thread Chris

Tom Lane wrote:

Chris dmag...@gmail.com writes:
I can see it's doing the extra filter step at the start (4th line) which 
is not present without the coalesce/case statement. I just don't 
understand why it's being done at that stage.


It's not that hard to understand.  With the original view formulation
(or the COALESCE version), the fully expanded form of the query looks
like

select ... from p left join r ...
  where expression_involving_both_p_and_r = constant

If you make the view output be just p.assetid then you have

select ... from p left join r ...
  where p.assetid = constant

In the first case the planner cannot apply the WHERE restriction until
it's formed the p+r join; so you see the condition applied as a filter
on the join node's output.  In the second case, the planner can push the
WHERE restriction down into the scan of p, since the left join doesn't
affect it.  (If a p row doesn't pass the restriction, then no join row
formed from it can either; ergo there is no need to form those join rows
at all.)


So because the CASE is on (some of) the fields I'm joining on, in effect 
it's made part of the join condition. If the fields are outside that 
(r.userid/p.userid), then it's evaluated after.


Thanks!

--
Postgresql  php tutorials
http://www.designmagick.com/


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] left join + case - how is it processed?

2009-01-18 Thread Chris

Hi all,

I have a view that looks like this:

 SELECT
CASE
WHEN r.assetid IS NULL THEN p.assetid
ELSE r.assetid
END AS assetid,
CASE
WHEN r.userid IS NULL THEN p.userid
ELSE r.userid
END AS userid, p.permission, p.granted, p.cascades
   FROM sq_ast_perm p
   LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND 
r.assetid::text = p.assetid::text;



It was pointed out to me that the first CASE is useless (since r.assetid 
will always be the same as p.assetid because of the left join condition) 
so I'm looking at that to see if it'll make much of a difference and it 
does.


I won't post the whole lot but the first line is the most interesting.

# explain analyze SELECT * from sq_vw_ast_perm where assetid='30748';


 Merge Left Join  (cost=9529.34..13823.76 rows=75721 width=102) (actual 
time=284.371..341.536 rows=1 loops=1)


(The row count is right - it's the total # of rows from sq_ast_perm).


When I change the view to be:

 SELECT p.assetid,
CASE
WHEN r.userid IS NULL THEN p.userid
ELSE r.userid
END AS userid, p.permission, p.granted, p.cascades
   FROM sq_ast_perm p
   LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND 
r.assetid::text = p.assetid::text;



The Merge left join only returns 3 rows:

# explain analyze SELECT * from sq_vw_ast_perm where assetid='30748';

   Merge Left Join  (cost=9507.18..9508.23 rows=3 width=70) 
(actual time=11.544..11.549 rows=1 loops=1)


I thought the where condition would cut down on the rows returned, then 
the case statement would take effect to do the null check. It seems to 
be doing it in reverse ??


Recently analyzed, only just imported so free of bloat. Running 8.1.11.

Thanks!
--
Postgresql  php tutorials
http://www.designmagick.com/


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] left join + case - how is it processed?

2009-01-18 Thread Robert Haas
On Sun, Jan 18, 2009 at 10:30 PM, Chris dmag...@gmail.com wrote:
 Hi all,

 I have a view that looks like this:

  SELECT
CASE
WHEN r.assetid IS NULL THEN p.assetid
ELSE r.assetid
END AS assetid,
CASE
WHEN r.userid IS NULL THEN p.userid
ELSE r.userid
END AS userid, p.permission, p.granted, p.cascades
   FROM sq_ast_perm p
   LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND
 r.assetid::text = p.assetid::text;


 It was pointed out to me that the first CASE is useless (since r.assetid
 will always be the same as p.assetid because of the left join condition) so
 I'm looking at that to see if it'll make much of a difference and it does.

 I won't post the whole lot but the first line is the most interesting.

 # explain analyze SELECT * from sq_vw_ast_perm where assetid='30748';


  Merge Left Join  (cost=9529.34..13823.76 rows=75721 width=102) (actual
 time=284.371..341.536 rows=1 loops=1)

 (The row count is right - it's the total # of rows from sq_ast_perm).

The row count is VERY WRONG.   Apparently the actual number of rows is
1 and the estimate is 75721: that's bad.

 When I change the view to be:

  SELECT p.assetid,
CASE
WHEN r.userid IS NULL THEN p.userid
ELSE r.userid
END AS userid, p.permission, p.granted, p.cascades
   FROM sq_ast_perm p
   LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND
 r.assetid::text = p.assetid::text;


 The Merge left join only returns 3 rows:

 # explain analyze SELECT * from sq_vw_ast_perm where assetid='30748';

   Merge Left Join  (cost=9507.18..9508.23 rows=3 width=70) (actual
 time=11.544..11.549 rows=1 loops=1)

 I thought the where condition would cut down on the rows returned, then the
 case statement would take effect to do the null check. It seems to be doing
 it in reverse ??

The ESTIMATE is 3 rows - the actual rows are 1, just as before.
Notice this is a much more accurate estimate: that's good.

The reason why the CASE is affecting your query planning is because
you are using a query that compares assetid to a constant:

SELECT * from sq_vw_ast_perm where assetid='30748';

When PostgreSQL evaluates this statement, assetid gets expanded either
into a case statement (with your first view definition) or into
sq_ast_perm.assetid (with your second view definition).  The latter
definition allows PostgreSQL to make use of the column statistics
(which are pretty accurate) whereas the former is probably leading to
a SWAG, because PostgreSQL isn't very good at estimating the
selectivity of CASE.  The bad selectivity estimate, in turn, is
leading to a poor plan choice...

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] left join + case - how is it processed?

2009-01-18 Thread Evan Carroll
On Sun, Jan 18, 2009 at 9:30 PM, Chris dmag...@gmail.com wrote:
 Hi all,

 I have a view that looks like this:

  SELECT
CASE
WHEN r.assetid IS NULL THEN p.assetid
ELSE r.assetid
END AS assetid,
CASE
WHEN r.userid IS NULL THEN p.userid
ELSE r.userid
END AS userid, p.permission, p.granted, p.cascades
   FROM sq_ast_perm p
   LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND
 r.assetid::text = p.assetid::text;


 It was pointed out to me that the first CASE is useless (since r.assetid
 will always be the same as p.assetid because of the left join condition) so
 I'm looking at that to see if it'll make much of a difference and it does.

Lets assume it wasn't useless because of that, it would still be
obfuscated and probably slower because it is an explicit coalesce()

SELECT coalesce( r.assetid, p.assetid ) , coalesce( r.userid , p.userid )


 I thought the where condition would cut down on the rows returned, then the
 case statement would take effect to do the null check. It seems to be doing
 it in reverse ??
# explain analyze SELECT * from sq_vw_ast_perm where assetid='30748';
# explain analyze SELECT * from sq_vw_ast_perm where assetid='30748';

It aperas to me that both of your statements have where clauses, but I
believe where isn't that explicit.  I'm not sure the nature of your
problem permits the query optimizer to eliminate rows at all, even
with the where statement. assetid is probably not known when the
query optimizer hits, because it is computed based on the nullness of
the columns.  I'd assume that the optimizer *could* more easily
optimize this if you had used coalesce rather than an ad-hoc method
with CASE. My guess is you can exclude rows with WHERE if the the
column used is an run-time computation involving an ad-hoc CASE.


-- 
Evan Carroll
System Lord of the Internets

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] left join + case - how is it processed?

2009-01-18 Thread Chris



The reason why the CASE is affecting your query planning is because
you are using a query that compares assetid to a constant:

SELECT * from sq_vw_ast_perm where assetid='30748';

When PostgreSQL evaluates this statement, assetid gets expanded either
into a case statement (with your first view definition) or into
sq_ast_perm.assetid (with your second view definition).  The latter
definition allows PostgreSQL to make use of the column statistics
(which are pretty accurate) whereas the former is probably leading to
a SWAG, because PostgreSQL isn't very good at estimating the
selectivity of CASE.  The bad selectivity estimate, in turn, is
leading to a poor plan choice...


If I take it out of the view, it's fine:

#  SELECT
# CASE
# WHEN r.assetid IS NULL THEN p.assetid
# ELSE r.assetid
# END AS assetid,
# CASE
# WHEN r.userid IS NULL THEN p.userid
# ELSE r.userid
# END AS userid, p.permission, p.granted, p.cascades
#FROM sq_ast_perm p
#LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND 
r.assetid::text = p.assetid::text

# where p.assetid='30748';

  QUERY PLAN 


-
 Merge Left Join  (cost=9459.89..9463.13 rows=3 width=102) (actual 
time=0.096..0.098 rows=1 loops=1)


In this case I assume the planner is doing the 'WHERE' first to cut down 
the rows, then applying the CASE at the end.


The view it seems to be the opposite - I still don't understand why 
that's the case.



Though I do get the same behaviour as the view when I do it as a subselect.

--
Postgresql  php tutorials
http://www.designmagick.com/


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] left join + case - how is it processed?

2009-01-18 Thread Evan Carroll
 My guess is you can exclude rows with WHERE if the the
 column used is an run-time computation involving an ad-hoc CASE.


* that you can't


-- 
Evan Carroll
System Lord of the Internets

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] left join + case - how is it processed?

2009-01-18 Thread Chris





I thought the where condition would cut down on the rows returned, then the
case statement would take effect to do the null check. It seems to be doing
it in reverse ??

# explain analyze SELECT * from sq_vw_ast_perm where assetid='30748';

It aperas to me that both of your statements have where clauses, but I
believe where isn't that explicit.  I'm not sure the nature of your
problem permits the query optimizer to eliminate rows at all, even
with the where statement. assetid is probably not known when the
query optimizer hits, because it is computed based on the nullness of
the columns.  I'd assume that the optimizer *could* more easily
optimize this if you had used coalesce rather than an ad-hoc method
with CASE. My guess is you can exclude rows with WHERE if the the
column used is an run-time computation involving an ad-hoc CASE.


No difference.

Full explain plan here:

http://explain-analyze.info/query_plans/2725-query-plan-1447

I can see it's doing the extra filter step at the start (4th line) which 
is not present without the coalesce/case statement. I just don't 
understand why it's being done at that stage.


--
Postgresql  php tutorials
http://www.designmagick.com/


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance