Re: [HACKERS] Predicate migration on complex self joins

2009-07-14 Thread Sam Mason
On Mon, Jul 13, 2009 at 07:06:40PM +0100, Simon Riggs wrote:
 On Mon, 2009-07-13 at 13:33 -0400, Tom Lane wrote:
  It's hard to believe
  this sort of case comes up often enough to justify the cycles that would
  be expended (on *every* join query) to try to recognize it.
 
 Yeh, damn ORMs seem to spring up faster than vines.
 
 Not just because of this but I wonder if we might benefit from an
 optimizer setting specifically aimed at the foolishnesses of
 automatically generated SQL.

The best suggestion I heard was to carry on optimizing until the plan
looked cheap enough or all the options had been exhausted.

In practical terms; I think this means doing the planning in two stages,
try with all the simple optimizations and see if it results in less
than n page accesses, if it's above this level then try again with all
the optimizations turned on.

-- 
  Sam  http://samason.me.uk/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Predicate migration on complex self joins

2009-07-13 Thread Simon Riggs

In some cases, we have SQL being submitted that has superfluous
self-joins. An example would be

select count(*) 
from foo1 a, foo1 b 
where a.c1 = b.c1 /* PK join */
and a.c2 = 5 
and b.c2 = 10;

We can recognise that a and b are the same table because they are
joined on the PK. PK is never NULL, so a join b == a in set terms. We
can use this to re-write the query as if all predicates on either of the
two aliases were on the LHS only. e.g. rewrite query like this:

select count(*) 
from foo1 a, foo1 b 
where a.c1 = b.c1 
and a.c2 = 5 
and a.c2 = 10;  /* predicate migration */

Predicate migration is important because it either allows us to detect
impossible logic, as above, or to use multi-column index access/ bitmap
scans, or to allow join removal of the RHS as a superfluous join. (I
believe that self-joins were not originally part of the analysis of
potentially removable joins).

You may well ask who would be stupid enough to write SQL like that. The
answer is of course that it is automatically generated by an ORM.

Implementing something along these lines is secondary to join removal,
but it seems worth noting as non-high priority item for the TODO.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Predicate migration on complex self joins

2009-07-13 Thread Kevin Grittner
Simon Riggs si...@2ndquadrant.com wrote:
 
 select count(*) 
 from foo1 a, foo1 b 
 where a.c1 = b.c1 /* PK join */
 
 You may well ask who would be stupid enough to write SQL like that.
 The answer is of course that it is automatically generated by an
 ORM.
 
We had to do something like that to get acceptable performance from
Sybase ASE.  That code probably has not been changed since migrating
to PostgreSQL, and since we have a strong portability mandate, it
probably should be left alone, since the penalty for the extra join in
PostgreSQL is small and the penalty for not having it in Sybase ASE is
large.
 
In short, it would be a welcome optimization here, although (in our
case) a relatively minor one.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Predicate migration on complex self joins

2009-07-13 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 In some cases, we have SQL being submitted that has superfluous
 self-joins. An example would be

 select count(*) 
 from foo1 a, foo1 b 
 where a.c1 = b.c1 /* PK join */
 and a.c2 = 5 
 and b.c2 = 10;

 You may well ask who would be stupid enough to write SQL like that. The
 answer is of course that it is automatically generated by an ORM.

Seems like the right answer is fix the damn ORM.  It's hard to believe
this sort of case comes up often enough to justify the cycles that would
be expended (on *every* join query) to try to recognize it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Predicate migration on complex self joins

2009-07-13 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Simon Riggs si...@2ndquadrant.com wrote:
 select count(*) 
 from foo1 a, foo1 b 
 where a.c1 = b.c1 /* PK join */
 
 We had to do something like that to get acceptable performance from
 Sybase ASE.

Writing a join for a single-table query?  Why, in heavens name?
(Or have you mercifully blotted the details from your memory?)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Predicate migration on complex self joins

2009-07-13 Thread Simon Riggs

On Mon, 2009-07-13 at 13:33 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  In some cases, we have SQL being submitted that has superfluous
  self-joins. An example would be
 
  select count(*) 
  from foo1 a, foo1 b 
  where a.c1 = b.c1 /* PK join */
  and a.c2 = 5 
  and b.c2 = 10;
 
  You may well ask who would be stupid enough to write SQL like that. The
  answer is of course that it is automatically generated by an ORM.
 
 Seems like the right answer is fix the damn ORM.  It's hard to believe
 this sort of case comes up often enough to justify the cycles that would
 be expended (on *every* join query) to try to recognize it.

Yeh, damn ORMs seem to spring up faster than vines.

Not just because of this but I wonder if we might benefit from an
optimizer setting specifically aimed at the foolishnesses of
automatically generated SQL.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Predicate migration on complex self joins

2009-07-13 Thread Robert Haas
On Mon, Jul 13, 2009 at 1:33 PM, Tom Lanet...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 In some cases, we have SQL being submitted that has superfluous
 self-joins. An example would be

 select count(*)
 from foo1 a, foo1 b
 where a.c1 = b.c1 /* PK join */
 and a.c2 = 5
 and b.c2 = 10;

 You may well ask who would be stupid enough to write SQL like that. The
 answer is of course that it is automatically generated by an ORM.

 Seems like the right answer is fix the damn ORM.  It's hard to believe
 this sort of case comes up often enough to justify the cycles that would
 be expended (on *every* join query) to try to recognize it.

I think it's more common than you might think.  It's been requested on
-performance within recent memory, and I've had cases where I needed
to deal with it as well.  You can't write:

DELETE FROM table AS alias LEFT JOIN othertable ...

so you end up writing:

DELETE FROM table AS alias USING sametable LEFT JOIN othertable ...

or sometimes:

DELETE FROM table AS alias USING viewthatconstainstable ...

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Predicate migration on complex self joins

2009-07-13 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 Writing a join for a single-table query?  Why, in heavens name?
 (Or have you mercifully blotted the details from your memory?)
 
Actually, I had only the vaguest recollection of why, but I found an
email where I was explaining the problem to Sybase.  Basically, it
boiled down to a corner case involving the intersection of named
caches and index optimizations similar to what Heikki's currently
developing.  If we did a search such as:
 
SELECT searchName FROM Party WHERE searchName LIKE 'PET%,RANDY%'
 
where searchName was the first column of an index bound to a named
cache, it would scan the range of the index where searchName = 'PET'
and searchName  'PEU', determine which rows actually matched the
whole pattern, and access the heap pages only for those rows which
matched the pattern.  (In this case, 298 rows.)  As long as only
columns from the index were returned, there were only 298 access to
the heap.  Now, if you added a column which was not in the index, it
went to the heap before checking the full pattern, so it went to the
heap 87,632 times for the above criteria, and returned the same 298
rows.  Since the primary key columns were in all indexes (to allow use
of READ UNCOMMITTED :-/ ), we could select those columns without
driving it to the heap, so we used the first table reference just
for selecting the rows, and joined back to the same table on primary
key to get the values to return.
 
We could not convince Sybase that they should fix that issue.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Predicate migration on complex self joins

2009-07-13 Thread decibel

On Jul 13, 2009, at 1:06 PM, Simon Riggs wrote:

Not just because of this but I wonder if we might benefit from an
optimizer setting specifically aimed at the foolishnesses of
automatically generated SQL.



+1. And it's not just ORMs that do stupid things, I've seen crap like  
this come out of users too (not this exact case, but similar).


Perhaps what we really want is an optimization level GUC so that  
users can tell the backend how much overhead they want the optimizer  
to spend on trying to work around stupidity... :)

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Predicate migration on complex self joins

2009-07-13 Thread Jaime Casanova
On Mon, Jul 13, 2009 at 3:48 PM, decibeldeci...@decibel.org wrote:
 On Jul 13, 2009, at 1:06 PM, Simon Riggs wrote:

 Not just because of this but I wonder if we might benefit from an
 optimizer setting specifically aimed at the foolishnesses of
 automatically generated SQL.


 +1. And it's not just ORMs that do stupid things, I've seen crap like this
 come out of users too (not this exact case, but similar).


i've this come from users most of the time...
the few systems i have seen that generate sql, try to avoid using
complex queries and make simple ones and the JOIN them at the app
level...

 Perhaps what we really want is an optimization level GUC so that users can
 tell the backend how much overhead they want the optimizer to spend on
 trying to work around stupidity... :)

i wonder what the levels have to be:
- smart_sql
- application_generated_sql
- normal_user_sql
- xtremely_stupid_sql
- what_the_hell_sql

;)

-- 
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers