[HACKERS] Outer where pushed down

2005-10-06 Thread Gaetano Mendola
Hi all,

consider this view:

CREATE OR REPLACE VIEW v_current_connection AS
SELECT ul.id_user
FROM   user_login ul,
   current_connection cc
WHERE ul.id_user = cc.id_user;


And this is the explain on a usage of that view:

# explain select * from v_current_connection_test where 
sp_connected_test(id_user) = FALSE;
QUERY PLAN
--
 Hash Join  (cost=42.79..1325.14 rows=451 width=5)
   Hash Cond: (outer.id_user = inner.id_user)
   -  Seq Scan on user_login ul  (cost=0.00..1142.72 rows=27024 width=4)
 Filter: (sp_connected_test(id_user) = false)
   -  Hash  (cost=40.49..40.49 rows=919 width=5)
 -  Index Scan using idx_connected on current_connection cc  
(cost=0.00..40.49 rows=919 width=5)
   Index Cond: (connected = true)
(7 rows)

apart my initial surprise to see that function applied at rows not returned by 
the view
( Tom Lane explained me that the planner is able to push down the outer 
condition )
why postgres doesn't apply that function at table current_connection given the 
fact are extimated
only 919 vs 27024 rows?


redefining the view:

CREATE OR REPLACE VIEW v_current_connection AS
SELECT cc.id_user
FROM   user_login ul,
   current_connection cc
WHERE ul.id_user = cc.id_user;

then I obtain the desidered plan.


# explain select * from v_current_connection_test where 
sp_connected_test(id_user ) = FALSE;
QUERY PLAN
--
 Hash Join  (cost=46.23..1193.47 rows=452 width=5)
   Hash Cond: (outer.id_user = inner.id_user)
   -  Seq Scan on user_login ul  (cost=0.00..872.48 rows=54048 width=4)
   -  Hash  (cost=45.08..45.08 rows=460 width=5)
 -  Index Scan using idx_connected on current_connection cc  
(cost=0.00..45.08 rows=460 width=5)
   Index Cond: (connected = true)
   Filter: (sp_connected_test(id_user) = false)
(7 rows)



Is not possible in any way push postgres to apply that function to the right 
table ?
Shall I rewrite the views figuring out wich column is better to expose ?



Regards
Gaetano Mendola
















---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Outer where pushed down

2005-10-06 Thread Tom Lane
Gaetano Mendola [EMAIL PROTECTED] writes:
 CREATE OR REPLACE VIEW v_current_connection AS
 SELECT ul.id_user
 FROM   user_login ul,
current_connection cc
 WHERE ul.id_user = cc.id_user;

 # explain select * from v_current_connection_test where 
 sp_connected_test(id_user) = FALSE;

 why postgres doesn't apply that function at table current_connection given 
 the fact are extimated
 only 919 vs 27024 rows?

Because the condition is on a field of the other table.

You seem to wish that the planner would use ul.id_user = cc.id_user
to decide that sp_connected_test(ul.id_user) can be rewritten as
sp_connected_test(cc.id_user), but in general this is not safe.
The planner has little idea of what the datatype-specific semantics
of equality are, and none whatsoever what the semantics of your
function are.  As a real-world example: IEEE-standard floating
point math considers that +0 and -0 are different bit patterns.
They compare as equal, but it's very easy to come up with user-defined
functions that will yield different results for the two inputs.
So the proposed transformation is definitely unsafe for float8.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Outer where pushed down

2005-10-06 Thread Gaetano Mendola
Tom Lane wrote:
 Gaetano Mendola [EMAIL PROTECTED] writes:
 CREATE OR REPLACE VIEW v_current_connection AS
 SELECT ul.id_user
 FROM   user_login ul,
current_connection cc
 WHERE ul.id_user = cc.id_user;
 
 # explain select * from v_current_connection_test where 
 sp_connected_test(id_user) = FALSE;
 
 why postgres doesn't apply that function at table current_connection given 
 the fact are extimated
 only 919 vs 27024 rows?
 
 Because the condition is on a field of the other table.
 
 You seem to wish that the planner would use ul.id_user = cc.id_user
 to decide that sp_connected_test(ul.id_user) can be rewritten as
 sp_connected_test(cc.id_user), but in general this is not safe.
 The planner has little idea of what the datatype-specific semantics
 of equality are, and none whatsoever what the semantics of your
 function are.  As a real-world example: IEEE-standard floating
 point math considers that +0 and -0 are different bit patterns.
 They compare as equal, but it's very easy to come up with user-defined
 functions that will yield different results for the two inputs.
 So the proposed transformation is definitely unsafe for float8.

And what about to define for each type when this is safe and let
the planner make his best choice ?
Rewriting that view the execution time passed from 4 secs to 1 sec,
that is not bad if the planner can do it autonomously. In this very
example I can decide if it's better expose one column or the other
one but in other cases not...


Regards
Gaetano Mendola


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings