[PERFORM] left join + case - how is it processed?
Hi all, I have a view that looks like this: SELECT CASE WHEN r.assetid IS NULL THEN p.assetid ELSE r.assetid END AS assetid, CASE WHEN r.userid IS NULL THEN p.userid ELSE r.userid END AS userid, p.permission, p.granted, p.cascades FROM sq_ast_perm p LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND r.assetid::text = p.assetid::text; It was pointed out to me that the first CASE is useless (since r.assetid will always be the same as p.assetid because of the left join condition) so I'm looking at that to see if it'll make much of a difference and it does. I won't post the whole lot but the first line is the most interesting. # explain analyze SELECT * from sq_vw_ast_perm where assetid='30748'; Merge Left Join (cost=9529.34..13823.76 rows=75721 width=102) (actual time=284.371..341.536 rows=1 loops=1) (The row count is right - it's the total # of rows from sq_ast_perm). When I change the view to be: SELECT p.assetid, CASE WHEN r.userid IS NULL THEN p.userid ELSE r.userid END AS userid, p.permission, p.granted, p.cascades FROM sq_ast_perm p LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND r.assetid::text = p.assetid::text; The Merge left join only returns 3 rows: # explain analyze SELECT * from sq_vw_ast_perm where assetid='30748'; Merge Left Join (cost=9507.18..9508.23 rows=3 width=70) (actual time=11.544..11.549 rows=1 loops=1) I thought the where condition would cut down on the rows returned, then the case statement would take effect to do the null check. It seems to be doing it in reverse ?? Recently analyzed, only just imported so free of bloat. Running 8.1.11. Thanks! -- Postgresql php tutorials http://www.designmagick.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] left join + case - how is it processed?
On Sun, Jan 18, 2009 at 10:30 PM, Chris dmag...@gmail.com wrote: Hi all, I have a view that looks like this: SELECT CASE WHEN r.assetid IS NULL THEN p.assetid ELSE r.assetid END AS assetid, CASE WHEN r.userid IS NULL THEN p.userid ELSE r.userid END AS userid, p.permission, p.granted, p.cascades FROM sq_ast_perm p LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND r.assetid::text = p.assetid::text; It was pointed out to me that the first CASE is useless (since r.assetid will always be the same as p.assetid because of the left join condition) so I'm looking at that to see if it'll make much of a difference and it does. I won't post the whole lot but the first line is the most interesting. # explain analyze SELECT * from sq_vw_ast_perm where assetid='30748'; Merge Left Join (cost=9529.34..13823.76 rows=75721 width=102) (actual time=284.371..341.536 rows=1 loops=1) (The row count is right - it's the total # of rows from sq_ast_perm). The row count is VERY WRONG. Apparently the actual number of rows is 1 and the estimate is 75721: that's bad. When I change the view to be: SELECT p.assetid, CASE WHEN r.userid IS NULL THEN p.userid ELSE r.userid END AS userid, p.permission, p.granted, p.cascades FROM sq_ast_perm p LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND r.assetid::text = p.assetid::text; The Merge left join only returns 3 rows: # explain analyze SELECT * from sq_vw_ast_perm where assetid='30748'; Merge Left Join (cost=9507.18..9508.23 rows=3 width=70) (actual time=11.544..11.549 rows=1 loops=1) I thought the where condition would cut down on the rows returned, then the case statement would take effect to do the null check. It seems to be doing it in reverse ?? The ESTIMATE is 3 rows - the actual rows are 1, just as before. Notice this is a much more accurate estimate: that's good. The reason why the CASE is affecting your query planning is because you are using a query that compares assetid to a constant: SELECT * from sq_vw_ast_perm where assetid='30748'; When PostgreSQL evaluates this statement, assetid gets expanded either into a case statement (with your first view definition) or into sq_ast_perm.assetid (with your second view definition). The latter definition allows PostgreSQL to make use of the column statistics (which are pretty accurate) whereas the former is probably leading to a SWAG, because PostgreSQL isn't very good at estimating the selectivity of CASE. The bad selectivity estimate, in turn, is leading to a poor plan choice... ...Robert -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] left join + case - how is it processed?
On Sun, Jan 18, 2009 at 9:30 PM, Chris dmag...@gmail.com wrote: Hi all, I have a view that looks like this: SELECT CASE WHEN r.assetid IS NULL THEN p.assetid ELSE r.assetid END AS assetid, CASE WHEN r.userid IS NULL THEN p.userid ELSE r.userid END AS userid, p.permission, p.granted, p.cascades FROM sq_ast_perm p LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND r.assetid::text = p.assetid::text; It was pointed out to me that the first CASE is useless (since r.assetid will always be the same as p.assetid because of the left join condition) so I'm looking at that to see if it'll make much of a difference and it does. Lets assume it wasn't useless because of that, it would still be obfuscated and probably slower because it is an explicit coalesce() SELECT coalesce( r.assetid, p.assetid ) , coalesce( r.userid , p.userid ) I thought the where condition would cut down on the rows returned, then the case statement would take effect to do the null check. It seems to be doing it in reverse ?? # explain analyze SELECT * from sq_vw_ast_perm where assetid='30748'; # explain analyze SELECT * from sq_vw_ast_perm where assetid='30748'; It aperas to me that both of your statements have where clauses, but I believe where isn't that explicit. I'm not sure the nature of your problem permits the query optimizer to eliminate rows at all, even with the where statement. assetid is probably not known when the query optimizer hits, because it is computed based on the nullness of the columns. I'd assume that the optimizer *could* more easily optimize this if you had used coalesce rather than an ad-hoc method with CASE. My guess is you can exclude rows with WHERE if the the column used is an run-time computation involving an ad-hoc CASE. -- Evan Carroll System Lord of the Internets -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] left join + case - how is it processed?
The reason why the CASE is affecting your query planning is because you are using a query that compares assetid to a constant: SELECT * from sq_vw_ast_perm where assetid='30748'; When PostgreSQL evaluates this statement, assetid gets expanded either into a case statement (with your first view definition) or into sq_ast_perm.assetid (with your second view definition). The latter definition allows PostgreSQL to make use of the column statistics (which are pretty accurate) whereas the former is probably leading to a SWAG, because PostgreSQL isn't very good at estimating the selectivity of CASE. The bad selectivity estimate, in turn, is leading to a poor plan choice... If I take it out of the view, it's fine: # SELECT # CASE # WHEN r.assetid IS NULL THEN p.assetid # ELSE r.assetid # END AS assetid, # CASE # WHEN r.userid IS NULL THEN p.userid # ELSE r.userid # END AS userid, p.permission, p.granted, p.cascades #FROM sq_ast_perm p #LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND r.assetid::text = p.assetid::text # where p.assetid='30748'; QUERY PLAN - Merge Left Join (cost=9459.89..9463.13 rows=3 width=102) (actual time=0.096..0.098 rows=1 loops=1) In this case I assume the planner is doing the 'WHERE' first to cut down the rows, then applying the CASE at the end. The view it seems to be the opposite - I still don't understand why that's the case. Though I do get the same behaviour as the view when I do it as a subselect. -- Postgresql php tutorials http://www.designmagick.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] left join + case - how is it processed?
My guess is you can exclude rows with WHERE if the the column used is an run-time computation involving an ad-hoc CASE. * that you can't -- Evan Carroll System Lord of the Internets -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] left join + case - how is it processed?
I thought the where condition would cut down on the rows returned, then the case statement would take effect to do the null check. It seems to be doing it in reverse ?? # explain analyze SELECT * from sq_vw_ast_perm where assetid='30748'; It aperas to me that both of your statements have where clauses, but I believe where isn't that explicit. I'm not sure the nature of your problem permits the query optimizer to eliminate rows at all, even with the where statement. assetid is probably not known when the query optimizer hits, because it is computed based on the nullness of the columns. I'd assume that the optimizer *could* more easily optimize this if you had used coalesce rather than an ad-hoc method with CASE. My guess is you can exclude rows with WHERE if the the column used is an run-time computation involving an ad-hoc CASE. No difference. Full explain plan here: http://explain-analyze.info/query_plans/2725-query-plan-1447 I can see it's doing the extra filter step at the start (4th line) which is not present without the coalesce/case statement. I just don't understand why it's being done at that stage. -- Postgresql php tutorials http://www.designmagick.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance