Re: [PERFORM] IS NOT NULL and LEFT JOIN

2014-10-20 Thread Laurent Martelli

Hi David,

Do we agree that both queries are identical ? Since we join on 
c.user_info=u.id  having u.id  is not null or 
c.user_info is not null in the where clause is the same, isn't it ?


Since c.user_info=u.id  the condition onu.id is not null 
does not use any *new* information from user_user_info.


Regards,
Laurent

Le 19/10/2014 10:41, David Rowley a écrit :
On Sun, Oct 19, 2014 at 5:10 PM, Laurent Martelli 
mailto:laurent.marte...@enercoop.org>> 
wrote:


Hello there,

I have a strange query plan involving an IS NOT NULL and a LEFT JOIN.

I grant you that the query can be written without the JOIN on
user_user_info,
but it is generated like this by hibernate. Just changing the IS
NOT NULL condition
to the other side of useless JOIN makes a big difference in the
query plan :

-- THE BAD ONE : given the selectivity on c.name 
and c.email, barely more than one row will ever be returned


But it looks like you're ignoring the fact that the OR condition would 
force the query to match not only the user and the email, but also any 
row that finds a match in the user_user_info table, which going by the 
planner's estimates, that's every row in the contract_contract table. 
This is why the planner chooses a seqscan on the contract_contract 
table instead of using the index on lower(name).


Is it really your intention to get all rows that find a this martelli 
contract that has this email, and along with that, get every contract 
that has a not null user_info record?


I see that you have a foreign key on c.user_info to reference the 
user, so this should be matching everything with a non null user_info 
record.


explain analyze select c.*
   from contact_contact c
   left outer join user_user_info u on c.user_info=u.id

   left outer join contact_address a on c.address=a.id

  where lower(c.name )='martelli'
and c.email='ds...@ezrfz.com ' or
u.id  is not null;

   QUERY PLAN


 Hash Left Join  (cost=1.83..2246.76 rows=59412 width=4012)
(actual time=53.645..53.645 rows=0 loops=1)
   Hash Cond: (c.user_info = u.id )
   Filter: (((lower((c.name )::text) =
'martelli'::text) AND ((c.email)::text = 'ds...@ezrfz.com
'::text)) OR (u.id  IS NOT NULL))
   Rows Removed by Filter: 58247
   ->  Seq Scan on contact_contact c (cost=0.00..2022.12
rows=59412 width=4012) (actual time=0.007..6.892 rows=58247 loops=1)
   ->  Hash  (cost=1.37..1.37 rows=37 width=8) (actual
time=0.029..0.029 rows=37 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 2kB
 ->  Seq Scan on user_user_info u (cost=0.00..1.37 rows=37
width=8) (actual time=0.004..0.015 rows=37 loops=1)
 Planning time: 0.790 ms
 Execution time: 53.712 ms

-- THE GOOD ONE (test IS NOT NULL on contact0_.user_info instead
of userinfo1_.id)
explain analyze select c.*
   from contact_contact c
   left outer join user_user_info u on c.user_info=u.id

   left outer join contact_address a on c.address=a.id

  where lower(c.name )='martelli'
and c.email='ds...@ezrfz.com ' or
c.user_info is not null;
QUERY PLAN


 Bitmap Heap Scan on contact_contact c (cost=8.60..16.41 rows=1
width=4012) (actual time=0.037..0.037 rows=0 loops=1)
   Recheck Cond: (((email)::text = 'ds...@ezrfz.com
'::text) OR (user_info IS NOT NULL))
   Filter: (((lower((name)::text) = 'martelli'::text) AND
((email)::text = 'ds...@ezrfz.com
'::text)) OR (user_info IS NOT NULL))
   ->  BitmapOr  (cost=8.60..8.60 rows=2 width=0) (actual
time=0.034..0.034 rows=0 loops=1)
 ->  Bitmap Index Scan on idx_contact_email 
(cost=0.00..4.30 rows=2 width=0) (actual time=0.027..0.027 rows=0

loops=1)
   Index Cond: ((email)::text = 'ds...@ezrfz.com
'::text)
 ->  Bitmap Index Scan on contact_contact_user_info_idx 
(cost=0.00..4.30 rows=1 width=0) (actual time=0.007..0.007 rows=0

loops=1)
   Index Cond: (user_info IS NOT NULL)
 Planning time: 0.602 ms
 Execution time: 0.118 ms


If you look closely at the 2nd query plan, you'll see that no joins 
are performed, and it's only the contract_contract table that's looked 
at. This is 

Re: [PERFORM] IS NOT NULL and LEFT JOIN

2014-10-20 Thread Tom Lane
Laurent Martelli  writes:
> Do we agree that both queries are identical ?

No, they *aren't* identical.  Go consult any SQL reference.  Left join
conditions don't work the way you seem to be thinking: after the join,
the RHS column might be null, rather than equal to the LHS column.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Query with large number of joins

2014-10-20 Thread Marco Di Cesare
We are using Postgres for the first time after being SQLServer users for a long 
time so forgive for being noobs.

We are using a BI tool that generates a query with an unusually large number of 
joins. My understanding is that with this many joins Postgres query planner 
can't possibly use an exhaustive search so it drops into a heuristics 
algorithm. Unfortunately, the query runs quite slow (~35 seconds) and seems to 
ignore using primary keys and indexes where available.

Query plan here (sorry had to anonymize):
http://explain.depesz.com/s/Uml

Line 30 is one of the pain points where a full table scan is running on 4.2 
million rows even though there are indexes on oscar_bravo.foxtrot_four and 
oscar_charlie.foxtrot_four

We've tried to play around with the join_collapse_limit value by upping it from 
the default of 8 to 10 or 12 but it doesn't seem to help much. Cranking the 
value up to an unreasonable value of 20 does shave some seconds off the query 
time but not substantially (explain plan with the value set to 20: 
http://explain.depesz.com/s/sW6).

We haven't tried playing around with the geqo_threshold at this point.

Any thoughts on ways to speed up the run time of this query or any other 
Postgres settings we should be aware of when dealing with this unusually large 
number of joins?

Thanks in advance



Marco Di Cesare




Re: [PERFORM] IS NOT NULL and LEFT JOIN

2014-10-20 Thread Laurent Martelli

Le 20/10/2014 15:58, Tom Lane a écrit :

Laurent Martelli  writes:

Do we agree that both queries are identical ?

No, they *aren't* identical.  Go consult any SQL reference.  Left join
conditions don't work the way you seem to be thinking: after the join,
the RHS column might be null, rather than equal to the LHS column.

Yes, I was wrong to assume that c.user_info=u.id because of the LEFT JOIN.

But since I only want rows where u.id IS NOT NULL, in any case I will 
also have c.user_info IS NOT NULL.


Also, having a foreign key, if c.user_info is not null, it will have a 
match in u. So in that case, either both c.user_info and c.id are null 
in the result rows, or they are equal.


Regards,
Laurent


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] IS NOT NULL and LEFT JOIN

2014-10-20 Thread David G Johnston
Laurent Martelli wrote
> Le 20/10/2014 15:58, Tom Lane a écrit :
>> Laurent Martelli <

> laurent.martelli@

> > writes:
>>> Do we agree that both queries are identical ?
>> No, they *aren't* identical.  Go consult any SQL reference.  Left join
>> conditions don't work the way you seem to be thinking: after the join,
>> the RHS column might be null, rather than equal to the LHS column.
> Yes, I was wrong to assume that c.user_info=u.id because of the LEFT JOIN.
> 
> But since I only want rows where u.id IS NOT NULL, in any case I will 
> also have c.user_info IS NOT NULL.
> 
> Also, having a foreign key, if c.user_info is not null, it will have a 
> match in u. So in that case, either both c.user_info and c.id are null 
> in the result rows, or they are equal.

The planner only expends so much effort converting between equivalent query
forms.  By adding u.id IS NOT NULL you are saying that you really meant to
use INNER JOIN instead of LEFT JOIN but whether the planner can and/or does
act on that information in the WHERE clause to modify its joins is beyond my
knowledge.  It doesn't seem to and probably correctly isn't worth adding the
planner cycles to fix a poorly written/generated query on-the-fly.


Now that it has been pointed out that the two queries you supplied are
semantically different it is unclear what your point here is.  It is known
that Hibernate (and humans too) will generate sub-optimal plans that can be
rewritten using relational algebra and better optimized for having done so. 
But such work takes resources that would be expended for every single query
while manually rewriting the sub-optimal query solves the problem
once-and-for-all.

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/IS-NOT-NULL-and-LEFT-JOIN-tp5823591p5823737.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] IS NOT NULL and LEFT JOIN

2014-10-20 Thread David G Johnston
David G Johnston wrote
> 
> Laurent Martelli wrote
>> Le 20/10/2014 15:58, Tom Lane a écrit :
>>> Laurent Martelli <

>> laurent.martelli@

>> > writes:
 Do we agree that both queries are identical ?
>>> No, they *aren't* identical.  Go consult any SQL reference.  Left join
>>> conditions don't work the way you seem to be thinking: after the join,
>>> the RHS column might be null, rather than equal to the LHS column.
>> Yes, I was wrong to assume that c.user_info=u.id because of the LEFT
>> JOIN.
>> 
>> But since I only want rows where u.id IS NOT NULL, in any case I will 
>> also have c.user_info IS NOT NULL.
>> 
>> Also, having a foreign key, if c.user_info is not null, it will have a 
>> match in u. So in that case, either both c.user_info and c.id are null 
>> in the result rows, or they are equal.
> The planner only expends so much effort converting between equivalent
> query forms.  By adding u.id IS NOT NULL you are saying that you really
> meant to use INNER JOIN instead of LEFT JOIN but whether the planner can
> and/or does act on that information in the WHERE clause to modify its
> joins is beyond my knowledge.  It doesn't seem to and probably correctly
> isn't worth adding the planner cycles to fix a poorly written/generated
> query on-the-fly.
> 
> 
> Now that it has been pointed out that the two queries you supplied are
> semantically different it is unclear what your point here is.  It is known
> that Hibernate (and humans too) will generate sub-optimal plans that can
> be rewritten using relational algebra and better optimized for having done
> so.  But such work takes resources that would be expended for every single
> query while manually rewriting the sub-optimal query solves the problem
> once-and-for-all.
> 
> David J.

Didn't sound right what I wrote above...

The presence of the "OR" screws things up even further since it does force
the use of LEFT JOIN mechanics for the single case where the name and e-mail
match.

I would maybe try a UNION DISTINCT query instead of an OR clause if you want
to have a query that performs better than the Hibernate one...otherwise
others more knowledgeable than myself have not made any indication that the
planner is unintentionally deficient in its handling of your original query.

You may try posting your actual question, and not the SQL, and see if that
sparks any suggestions.

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/IS-NOT-NULL-and-LEFT-JOIN-tp5823591p5823739.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query with large number of joins

2014-10-20 Thread Tom Lane
Marco Di Cesare  writes:
> We are using a BI tool that generates a query with an unusually large number 
> of joins. My understanding is that with this many joins Postgres query 
> planner can't possibly use an exhaustive search so it drops into a heuristics 
> algorithm. Unfortunately, the query runs quite slow (~35 seconds) and seems 
> to ignore using primary keys and indexes where available.

> Query plan here (sorry had to anonymize):
> http://explain.depesz.com/s/Uml

It's difficult to make any detailed comments when you've shown us only an
allegedly-bad query plan, and not either the query itself or the table
definitions.

However, it appears to me that the query plan is aggregating over a rather
large number of join rows, and there are very few constraints that would
allow eliminating rows.  So I'm not at all sure there is a significantly
better plan available.  Are you claiming this query was instantaneous
on SQL Server?

The only thing that jumps out at me as possibly improvable is that with
a further increase in work_mem, you could probably get it to change the
last aggregation step from Sort+GroupAggregate into HashAggregate,
which'd likely run faster ... assuming you can spare some more memory.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance