Re: [HACKERS] Pulling up more complicated subqueries

2017-05-19 Thread Ashutosh Bapat
On Wed, May 17, 2017 at 8:38 PM, Heikki Linnakangas  wrote:

>
> As a final note, I found an interesting paper called "Unnesting Arbitrary
> Queries", by Thomas Neumann and Alfons Kemper
> (http://www.btw-2015.de/res/proceedings/Hauptband/Wiss/Neumann-Unnesting_Arbitrary_Querie.pdf).
> It describes a series of transformations that can be applied to de-correlate
> (de-lateralize?) any lateral subquery join. The trick is to build a
> temporary relation that contains all the distinct outer values of the join,
> and pass that relation down to the subquery. So instead of "executing" the
> subquery for every outer row, passing the outer values as parameters (using
> PostgreSQL terminology), you collect a list of all the outer values first,
> and then execute the subquery only once. In the subquery, you perform a join
> with the temporary relation. And then the paper describes a bunch of rules
> and optimizations, that can optimize away the temporary relation in many
> cases. That might be one way to implement the de-lateralization steps in
> PostgreSQL.
>

IIUC what you describe here, in a query involving foreign scan, we
could pass down a list of parameters to the foreign server instead of
running foreign scan on every parameter in the list. That would
improve performance of such queries by a lot.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pulling up more complicated subqueries

2017-05-17 Thread David Rowley
On 18 May 2017 at 04:30, Robert Haas  wrote:
> On Wed, May 17, 2017 at 11:08 AM, Heikki Linnakangas  wrote:
>> That's not a straight semi-join, but we could still turn it into a new kind
>> of LEFT-SEMI join. A left-semi join is like a left join, in that it returns
>> all rows from the left side, and NULLs for any non-matches. And like a
>> semi-join, it returns only one matching row from the right-side, if there
>> are duplicates. In the qual, replace the SubLink with an IS NOT NULL test.
> ...
>> This can be implemented using yet another new join type, a LEFT-UNIQUE join.
>> It's like a LEFT JOIN, but it must check that there are no duplicates in the
>> right-hand-side, and throw an error if there are (ERROR:  more than one row
>> returned by a subquery used as an expression).
>
> It seems like we might want to split what is currently called JoinType
> into two separate things -- one that is INNER/LEFT/RIGHT/FULL and the
> other that says what to do about multiple matches, which could be that
> they are expected, they are to be ignored (as in your LEFT-SEMI case),
> or they should error out (as in your LEFT-UNIQUE case).

I just wanted to mention that I almost got sucked down that hole with
unique joins. Instead, I'd recommend following the pattern of passing
bool flags down to the executor from the planner just like what is
done for inner_unique in, say make_hashjoin()

I did change unique joins at one point to overload the JoinType, but
it was a much more scary thing to do as there's lots of code in the
planner that does special things based on the join type, and the
footprint of your patch and risk factor start to grow pretty rapidly
once you do that.

I mention something around this in [1].

[1] 
https://www.postgresql.org/message-id/CAKJS1f_jRki1PQ4X-9UGKa-wnBhECQLnrxCX5haQzu4SDR_r2Q%40mail.gmail.com

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pulling up more complicated subqueries

2017-05-17 Thread Robert Haas
On Wed, May 17, 2017 at 11:08 AM, Heikki Linnakangas  wrote:
> That's not a straight semi-join, but we could still turn it into a new kind
> of LEFT-SEMI join. A left-semi join is like a left join, in that it returns
> all rows from the left side, and NULLs for any non-matches. And like a
> semi-join, it returns only one matching row from the right-side, if there
> are duplicates. In the qual, replace the SubLink with an IS NOT NULL test.
...
> This can be implemented using yet another new join type, a LEFT-UNIQUE join.
> It's like a LEFT JOIN, but it must check that there are no duplicates in the
> right-hand-side, and throw an error if there are (ERROR:  more than one row
> returned by a subquery used as an expression).

It seems like we might want to split what is currently called JoinType
into two separate things -- one that is INNER/LEFT/RIGHT/FULL and the
other that says what to do about multiple matches, which could be that
they are expected, they are to be ignored (as in your LEFT-SEMI case),
or they should error out (as in your LEFT-UNIQUE case).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Pulling up more complicated subqueries

2017-05-17 Thread Heikki Linnakangas

Hi,

I spent some time staring at TPC-DS benchmark's query 6. It contains a 
somewhat complicated subquery, and most of the time spent on that query 
is currently spent on executing the subquery again and again. The 
essence of the query boils down to this:


CREATE TABLE foo (i int4, j int4);
CREATE TABLE bar (i int4, j int4);

INSERT INTO foo SELECT g%10, g FROM generate_series(1, 1) g;
INSERT INTO bar SELECT g%10, g FROM generate_series(1, 1) g;


SELECT * FROM foo
WHERE foo.j >= 1.2 *
  (SELECT avg(bar.j) FROM bar WHERE foo.i = bar.i);


The planner can pull up simpler subqueries, converting them to joins, 
but unfortunately this case is beyond its capabilities. If it was smart 
enough, it could be transformed into something like this:


SELECT * FROM foo
LEFT JOIN (SELECT avg(bar.j) as avg, bar.i FROM bar GROUP BY bar.i) as 
avg_bar ON foo.i = avg_bar.i

WHERE foo.j >= 1.2 * avg_bar.avg


There was some brief discussion on doing this kind of transformation 
back in 2011, at 
https://www.postgresql.org/message-id/4E2F163B.6060105%40gmail.com. I 
think we're in a better position to do that now than back then, thanks 
to all the other planner work that's been done since.


So how do we get there? I've identified a number of smaller steps that, 
when all put together, can handle this, as well as many other queries of 
course.


0. This simple case is already handled:

SELECT * FROM foo
WHERE foo.j IN (select bar.j from bar)

It's turned into a semi-join. The next steps will extend this basic 
capability, to handle more complicated cases.


postgres=# explain SELECT * FROM foo WHERE foo.j IN (select bar.j from bar);
   QUERY PLAN
-
 Hash Join  (cost=42.75..93.85 rows=1130 width=8)
   Hash Cond: (foo.j = bar.j)
   ->  Seq Scan on foo  (cost=0.00..32.60 rows=2260 width=8)
   ->  Hash  (cost=40.25..40.25 rows=200 width=4)
 ->  HashAggregate  (cost=38.25..40.25 rows=200 width=4)
   Group Key: bar.j
   ->  Seq Scan on bar  (cost=0.00..32.60 rows=2260 width=4)
(7 rows)


1. Currently, the IN/EXISTS pullup is only done when the IN/EXISTS is a 
top-level qual. For example, this isn't currently pulled-up:


SELECT * FROM foo
WHERE foo.j IN (select bar.j from bar) OR foo.j < 10;

That's not a straight semi-join, but we could still turn it into a new 
kind of LEFT-SEMI join. A left-semi join is like a left join, in that it 
returns all rows from the left side, and NULLs for any non-matches. And 
like a semi-join, it returns only one matching row from the right-side, 
if there are duplicates. In the qual, replace the SubLink with an IS NOT 
NULL test. So logically, the above would be converted into:


SELECT * FROM foo
/* SEMI- */ LEFT JOIN (select bar.j from bar) ON foo.j = bar.j
WHERE bar.j IS NOT NULL OR foo.j < 10

This transformation can be applied to IN/EXIST type SubPlan references 
anywhere in the query, not only those in the WHERE clause.



2. Using a similar trick, we can transform subqueries that are not of 
the IN/EXISTS kind. (This isn't useful on its own, because this example 
would be handled as an InitPlan, which performs well. But it's a 
stepping stone in explaining this.)


For example:

SELECT * FROM foo
WHERE foo.j > (select avg(bar.j) from bar)

This can be implemented using yet another new join type, a LEFT-UNIQUE 
join. It's like a LEFT JOIN, but it must check that there are no 
duplicates in the right-hand-side, and throw an error if there are 
(ERROR:  more than one row returned by a subquery used as an expression).


SELECT * FROM foo
/* unique-checking */ LEFT JOIN (select avg(bar.j) as min_j from bar) as 
subq ON TRUE

WHERE foo.j > subq.min_j


3. All the previous examples are uncorrelated subqueries, but all these 
transformations can also be performed on correlated subqueries. The 
difference is that the subquery that's pulled up to the range table must 
be marked as LATERAL. For example:


SELECT * FROM foo
WHERE foo.j > (select avg(bar.j) from bar where bar.i = foo.i)

->

SELECT * FROM foo
/* UNIQUE-CHECKING */ LEFT JOIN LATERAL (select avg(bar.j) from bar 
where bar.i = foo.i) AS subq(j) ON true

WHERE foo.j > subq.j


4. The previous transformation isn't very interesting on its own, 
because the only way to implement a LATERAL join is a nested loop join, 
which is essentially the same as executing the subplan the naive way. 
But we can potentially de-correlate it, by pulling up the correlation qual:



SELECT * FROM foo
/* UNIQUE-CHECKING */ LEFT JOIN LATERAL (select bar.j, bar.i from bar 
WHERE bar.i = foo.i) AS subq(j, i) ON true

WHERE foo.j > subq.j

->

SELECT * FROM foo
/* UNIQUE-CHECKING */ LEFT JOIN (select bar.j, bar.i from bar) AS 
subq(j, i) ON subq.i = foo.i

WHERE foo.j > subq.j


This will get inlined into the parent query, getting rid of the subquery 
altogether. But even if that cannot be done, this is potentially much 
better