Re: [HACKERS] Predicate migration on complex self joins
On Mon, Jul 13, 2009 at 07:06:40PM +0100, Simon Riggs wrote: On Mon, 2009-07-13 at 13:33 -0400, Tom Lane wrote: It's hard to believe this sort of case comes up often enough to justify the cycles that would be expended (on *every* join query) to try to recognize it. Yeh, damn ORMs seem to spring up faster than vines. Not just because of this but I wonder if we might benefit from an optimizer setting specifically aimed at the foolishnesses of automatically generated SQL. The best suggestion I heard was to carry on optimizing until the plan looked cheap enough or all the options had been exhausted. In practical terms; I think this means doing the planning in two stages, try with all the simple optimizations and see if it results in less than n page accesses, if it's above this level then try again with all the optimizations turned on. -- Sam http://samason.me.uk/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Predicate migration on complex self joins
In some cases, we have SQL being submitted that has superfluous self-joins. An example would be select count(*) from foo1 a, foo1 b where a.c1 = b.c1 /* PK join */ and a.c2 = 5 and b.c2 = 10; We can recognise that a and b are the same table because they are joined on the PK. PK is never NULL, so a join b == a in set terms. We can use this to re-write the query as if all predicates on either of the two aliases were on the LHS only. e.g. rewrite query like this: select count(*) from foo1 a, foo1 b where a.c1 = b.c1 and a.c2 = 5 and a.c2 = 10; /* predicate migration */ Predicate migration is important because it either allows us to detect impossible logic, as above, or to use multi-column index access/ bitmap scans, or to allow join removal of the RHS as a superfluous join. (I believe that self-joins were not originally part of the analysis of potentially removable joins). You may well ask who would be stupid enough to write SQL like that. The answer is of course that it is automatically generated by an ORM. Implementing something along these lines is secondary to join removal, but it seems worth noting as non-high priority item for the TODO. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Predicate migration on complex self joins
Simon Riggs si...@2ndquadrant.com wrote: select count(*) from foo1 a, foo1 b where a.c1 = b.c1 /* PK join */ You may well ask who would be stupid enough to write SQL like that. The answer is of course that it is automatically generated by an ORM. We had to do something like that to get acceptable performance from Sybase ASE. That code probably has not been changed since migrating to PostgreSQL, and since we have a strong portability mandate, it probably should be left alone, since the penalty for the extra join in PostgreSQL is small and the penalty for not having it in Sybase ASE is large. In short, it would be a welcome optimization here, although (in our case) a relatively minor one. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Predicate migration on complex self joins
Simon Riggs si...@2ndquadrant.com writes: In some cases, we have SQL being submitted that has superfluous self-joins. An example would be select count(*) from foo1 a, foo1 b where a.c1 = b.c1 /* PK join */ and a.c2 = 5 and b.c2 = 10; You may well ask who would be stupid enough to write SQL like that. The answer is of course that it is automatically generated by an ORM. Seems like the right answer is fix the damn ORM. It's hard to believe this sort of case comes up often enough to justify the cycles that would be expended (on *every* join query) to try to recognize it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Predicate migration on complex self joins
Kevin Grittner kevin.gritt...@wicourts.gov writes: Simon Riggs si...@2ndquadrant.com wrote: select count(*) from foo1 a, foo1 b where a.c1 = b.c1 /* PK join */ We had to do something like that to get acceptable performance from Sybase ASE. Writing a join for a single-table query? Why, in heavens name? (Or have you mercifully blotted the details from your memory?) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Predicate migration on complex self joins
On Mon, 2009-07-13 at 13:33 -0400, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: In some cases, we have SQL being submitted that has superfluous self-joins. An example would be select count(*) from foo1 a, foo1 b where a.c1 = b.c1 /* PK join */ and a.c2 = 5 and b.c2 = 10; You may well ask who would be stupid enough to write SQL like that. The answer is of course that it is automatically generated by an ORM. Seems like the right answer is fix the damn ORM. It's hard to believe this sort of case comes up often enough to justify the cycles that would be expended (on *every* join query) to try to recognize it. Yeh, damn ORMs seem to spring up faster than vines. Not just because of this but I wonder if we might benefit from an optimizer setting specifically aimed at the foolishnesses of automatically generated SQL. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Predicate migration on complex self joins
On Mon, Jul 13, 2009 at 1:33 PM, Tom Lanet...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: In some cases, we have SQL being submitted that has superfluous self-joins. An example would be select count(*) from foo1 a, foo1 b where a.c1 = b.c1 /* PK join */ and a.c2 = 5 and b.c2 = 10; You may well ask who would be stupid enough to write SQL like that. The answer is of course that it is automatically generated by an ORM. Seems like the right answer is fix the damn ORM. It's hard to believe this sort of case comes up often enough to justify the cycles that would be expended (on *every* join query) to try to recognize it. I think it's more common than you might think. It's been requested on -performance within recent memory, and I've had cases where I needed to deal with it as well. You can't write: DELETE FROM table AS alias LEFT JOIN othertable ... so you end up writing: DELETE FROM table AS alias USING sametable LEFT JOIN othertable ... or sometimes: DELETE FROM table AS alias USING viewthatconstainstable ... ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Predicate migration on complex self joins
Tom Lane t...@sss.pgh.pa.us wrote: Writing a join for a single-table query? Why, in heavens name? (Or have you mercifully blotted the details from your memory?) Actually, I had only the vaguest recollection of why, but I found an email where I was explaining the problem to Sybase. Basically, it boiled down to a corner case involving the intersection of named caches and index optimizations similar to what Heikki's currently developing. If we did a search such as: SELECT searchName FROM Party WHERE searchName LIKE 'PET%,RANDY%' where searchName was the first column of an index bound to a named cache, it would scan the range of the index where searchName = 'PET' and searchName 'PEU', determine which rows actually matched the whole pattern, and access the heap pages only for those rows which matched the pattern. (In this case, 298 rows.) As long as only columns from the index were returned, there were only 298 access to the heap. Now, if you added a column which was not in the index, it went to the heap before checking the full pattern, so it went to the heap 87,632 times for the above criteria, and returned the same 298 rows. Since the primary key columns were in all indexes (to allow use of READ UNCOMMITTED :-/ ), we could select those columns without driving it to the heap, so we used the first table reference just for selecting the rows, and joined back to the same table on primary key to get the values to return. We could not convince Sybase that they should fix that issue. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Predicate migration on complex self joins
On Jul 13, 2009, at 1:06 PM, Simon Riggs wrote: Not just because of this but I wonder if we might benefit from an optimizer setting specifically aimed at the foolishnesses of automatically generated SQL. +1. And it's not just ORMs that do stupid things, I've seen crap like this come out of users too (not this exact case, but similar). Perhaps what we really want is an optimization level GUC so that users can tell the backend how much overhead they want the optimizer to spend on trying to work around stupidity... :) -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Predicate migration on complex self joins
On Mon, Jul 13, 2009 at 3:48 PM, decibeldeci...@decibel.org wrote: On Jul 13, 2009, at 1:06 PM, Simon Riggs wrote: Not just because of this but I wonder if we might benefit from an optimizer setting specifically aimed at the foolishnesses of automatically generated SQL. +1. And it's not just ORMs that do stupid things, I've seen crap like this come out of users too (not this exact case, but similar). i've this come from users most of the time... the few systems i have seen that generate sql, try to avoid using complex queries and make simple ones and the JOIN them at the app level... Perhaps what we really want is an optimization level GUC so that users can tell the backend how much overhead they want the optimizer to spend on trying to work around stupidity... :) i wonder what the levels have to be: - smart_sql - application_generated_sql - normal_user_sql - xtremely_stupid_sql - what_the_hell_sql ;) -- Atentamente, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers