Re: Early WIP/PoC for inlining CTEs

2019-04-09 Thread Tom Lane
I wrote:
> Andrew Gierth  writes:
>> So what I think we need to do here is to forbid inlining if (a) the
>> refcount is greater than 1 and (b) the CTE in question contains,
>> recursively anywhere inside its rtable or the rtables of any of its
>> nested CTEs, a "self_reference" RTE.

> That's kind of "ugh" too: it sounds expensive, and doing it in a way
> that doesn't produce false positives would be even more complicated.

After further investigation, I concluded that that wasn't that awful,
so done that way.

I'm still not entirely convinced about the behavior for nested WITHs
with different materialization specifications, but that seems like
a separate topic.

regards, tom lane




Re: Early WIP/PoC for inlining CTEs

2019-03-12 Thread Tatsuo Ishii
> On 2018-08-08 16:55:22 +1200, Thomas Munro wrote:
>> On Fri, Jul 27, 2018 at 8:10 PM, David Fetter  wrote:
>> > On Fri, Jul 27, 2018 at 02:55:26PM +1200, Thomas Munro wrote:
>> >> On Thu, Jul 26, 2018 at 7:14 AM, David Fetter  wrote:
>> >> > Please find attached the next version, which passes 'make check'.
>> >>
>> >> ... but not 'make check-world' (contrib/postgres_fdw's EXPLAIN is 
>> >> different).
>> >
>> > Please find attached a patch that does.
>> >
>> > It doesn't always pass make installcheck-world, but I need to sleep
>> > rather than investigate that at the moment.
>> 
>> One observation I wanted to share: CTE scans inhibit parallelism today
>> (something we might eventually want to fix with shared tuplestores).
>> This patch therefore allows parallelism in some WITH queries, which
>> seems like a very valuable thing.
> 
> Might be interesting to see how big a difference it makes for
> TPC-DS. Currently the results are bad (as in many queries don't finish
> in a relevant time) because it uses CTEs so widely, and there's often
> predicates outside the CTE that could be pushed down.

Now that the patch was committed, I played with TPCS-DS and found at
least one of their queries gets speedup. Query 2 runs 2 times faster
than 11. In 12, it seems that CTE wscs is pushed down.

with wscs as
 (select sold_date_sk
,sales_price
  from (select ws_sold_date_sk sold_date_sk
  ,ws_ext_sales_price sales_price
from web_sales 
union all
select cs_sold_date_sk sold_date_sk
  ,cs_ext_sales_price sales_price
from catalog_sales) as s1),
 wswscs as 
 (select d_week_seq,
sum(case when (d_day_name='Sunday') then sales_price else null end) 
sun_sales,
sum(case when (d_day_name='Monday') then sales_price else null end) 
mon_sales,
sum(case when (d_day_name='Tuesday') then sales_price else  null end) 
tue_sales,
sum(case when (d_day_name='Wednesday') then sales_price else null end) 
wed_sales,
sum(case when (d_day_name='Thursday') then sales_price else null end) 
thu_sales,
sum(case when (d_day_name='Friday') then sales_price else null end) 
fri_sales,
sum(case when (d_day_name='Saturday') then sales_price else null end) 
sat_sales
 from wscs
 ,date_dim
 where d_date_sk = sold_date_sk
 group by d_week_seq)
 select d_week_seq1
   ,round(sun_sales1/sun_sales2,2)
   ,round(mon_sales1/mon_sales2,2)
   ,round(tue_sales1/tue_sales2,2)
   ,round(wed_sales1/wed_sales2,2)
   ,round(thu_sales1/thu_sales2,2)
   ,round(fri_sales1/fri_sales2,2)
   ,round(sat_sales1/sat_sales2,2)
 from
 (select wswscs.d_week_seq d_week_seq1
,sun_sales sun_sales1
,mon_sales mon_sales1
,tue_sales tue_sales1
,wed_sales wed_sales1
,thu_sales thu_sales1
,fri_sales fri_sales1
,sat_sales sat_sales1
  from wswscs,date_dim 
  where date_dim.d_week_seq = wswscs.d_week_seq and
d_year = 1998) y,
 (select wswscs.d_week_seq d_week_seq2
,sun_sales sun_sales2
,mon_sales mon_sales2
,tue_sales tue_sales2
,wed_sales wed_sales2
,thu_sales thu_sales2
,fri_sales fri_sales2
,sat_sales sat_sales2
  from wswscs
  ,date_dim 
  where date_dim.d_week_seq = wswscs.d_week_seq and
d_year = 1998+1) z
 where d_week_seq1=d_week_seq2-53
 order by d_week_seq1;

Here's the 12's plan:

  QUERY PLAN
  
--
 Sort  (cost=118929.39..118929.43 rows=13 width=228) (actual 
time=792.588..792.710 rows=2513 loops=1)
   Sort Key: wswscs.d_week_seq
   Sort Method: quicksort  Memory: 323kB
   CTE wswscs
 ->  Finalize GroupAggregate  (cost=110164.09..113672.71 rows=10447 
width=228) (actual time=766.232..768.415 rows=263 loops=1)
   Group Key: date_dim_2.d_week_seq
   ->  Gather Merge  (cost=110164.09..112601.89 rows=20894 width=228) 
(actual time=766.209..767.158 rows=789 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 ->  Sort  (cost=109164.06..109190.18 rows=10447 width=228) 
(actual time=763.059..763.078 rows=263 loops=3)
   Sort Key: date_dim_2.d_week_seq
   Sort Method: quicksort  Memory: 160kB
   Worker 0:  Sort Method: quicksort  Memory: 160kB
   Worker 1:  Sort Method: quicksort  Memory: 160kB
   ->  Partial HashAggregate  (cost=108179.39..108466.69 
rows=10447 width=228) (actual time=762.202..762.889 rows=263 loops=3)
 Group Key: date_dim_2.d_week_seq
 ->  Parallel 

Re: Early WIP/PoC for inlining CTEs

2019-02-26 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> I also thought about that. But what I thought about it on reflection
 >> was: if the user explicitly wrote NOT MATERIALIZED, then we should
 >> assume they mean it.

 Tom> Ah, but the example I gave also had MATERIALIZED on the inner WITH.
 Tom> Why should the user not also mean that?

The inner WITH does get materialized, it just gets materialized twice.
If the user doesn't want that, then they can avoid using NOT MATERIALIZED
on the outer CTE; but if we force it to materialize the outer query,
then that leaves the user without recourse.

Consider a case like:

create view foo as
  with s as materialized (select something)
  select * from large l
  where l.foo in (select * from s) or l.bar in (select * from s);

with
  bar as not materialized (select * from foo)
select * from bar b1, bar b2 where b1.col='x' and b2.col='y';

In a case like this, materializing "s" twice may be far less expensive
than materializing the result of "select * from large..." without
benefit of pushed-down quals.

-- 
Andrew (irc:RhodiumToad)



Re: Early WIP/PoC for inlining CTEs

2019-02-26 Thread Tom Lane
Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> Also, I thought of a somewhat-related scenario that the code isn't
>  Tom> accounting for: you can break the restrictions about single
>  Tom> evaluation with nested WITHs, like

> I also thought about that. But what I thought about it on reflection
> was: if the user explicitly wrote NOT MATERIALIZED, then we should
> assume they mean it.

Ah, but the example I gave also had MATERIALIZED on the inner WITH.
Why should the user not also mean that?

regards, tom lane



Re: Early WIP/PoC for inlining CTEs

2019-02-26 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> Also, I thought of a somewhat-related scenario that the code isn't
 Tom> accounting for: you can break the restrictions about single
 Tom> evaluation with nested WITHs, like

I also thought about that. But what I thought about it on reflection
was: if the user explicitly wrote NOT MATERIALIZED, then we should
assume they mean it.

-- 
Andrew (irc:RhodiumToad)



Re: Early WIP/PoC for inlining CTEs

2019-02-26 Thread Tom Lane
Andrew Gierth  writes:
> Here, uncommenting that NOT actually changes the result, from 22 rows to
> 4 rows, because we end up generating multiple worktable scans and the
> recursion logic is not set up to handle that.

Ugh.

> So what I think we need to do here is to forbid inlining if (a) the
> refcount is greater than 1 and (b) the CTE in question contains,
> recursively anywhere inside its rtable or the rtables of any of its
> nested CTEs, a "self_reference" RTE.

That's kind of "ugh" too: it sounds expensive, and doing it in a way
that doesn't produce false positives would be even more complicated.

Idle uncaffeinated speculation: is it practical to fix the restriction
about multiple worktable scans?

Also, I thought of a somewhat-related scenario that the code isn't
accounting for: you can break the restrictions about single evaluation
with nested WITHs, like

with x as not materialized (with y as materialized (select random() r) select * 
from y)
select * from x, x x1;

In this particular example, we're saved from computing random() twice
by the checks for volatile functions.  But without that, y is inlined
and computed twice, e.g.

explain verbose with x as not materialized (with y as (select now() r) select * 
from y)
select * from x, x x1;
   QUERY PLAN   

 Nested Loop  (cost=0.00..0.06 rows=1 width=16)
   Output: (now()), (now())
   ->  Result  (cost=0.00..0.01 rows=1 width=8)
 Output: now()
   ->  Result  (cost=0.00..0.01 rows=1 width=8)
 Output: now()
(6 rows)

As a user I think I'd find that surprising, and bad if y were expensive.

Is it practical to inline the outer "x" level and still compute "y"
only once?  If not, I think we need to disallow inlining anything
that contains a "with".

regards, tom lane



Re: Early WIP/PoC for inlining CTEs

2019-02-26 Thread Andrew Gierth
So it turns out there's another case we have to check for, as reported
on IRC by Yaroslav Schekin (though I've simplified his query a little):

WITH RECURSIVE
  x(a) AS ((VALUES ('a'),('b'))
   UNION ALL
   (WITH
  z AS /*NOT*/ MATERIALIZED (SELECT a FROM x)
SELECT z.a || z1.a AS a FROM z CROSS JOIN z AS z1
 WHERE length(z.a || z1.a) < 5))
SELECT * FROM x;

Here, uncommenting that NOT actually changes the result, from 22 rows to
4 rows, because we end up generating multiple worktable scans and the
recursion logic is not set up to handle that.

So what I think we need to do here is to forbid inlining if (a) the
refcount is greater than 1 and (b) the CTE in question contains,
recursively anywhere inside its rtable or the rtables of any of its
nested CTEs, a "self_reference" RTE. We only need to make this check if
we're somewhere (possibly nested) inside a recursive query, but I don't
know how practical it will be to test that condition at the point we do
inlining; doing it unconditionally will (I think) be harmless because
self_reference will not be set.

-- 
Andrew (irc:RhodiumToad)



Re: Early WIP/PoC for inlining CTEs

2019-02-16 Thread Andreas Karlsson

On 16/02/2019 22.14, Tom Lane wrote:

I was expecting more controversy ... pushed that way.


Thanks! And thanks to everyone else who worked on this patch!

Andreas



Re: Early WIP/PoC for inlining CTEs

2019-02-16 Thread Tomas Vondra



On 2/16/19 10:14 PM, Tom Lane wrote:
> Tomas Vondra  writes:
>> On 2/14/19 8:22 PM, Merlin Moncure wrote:
>>> On Thu, Feb 14, 2019 at 10:02 AM Alvaro Herrera
>>>  wrote:
 On 2019-Feb-14, Peter Eisentraut wrote:
> If we're not really planning to add any more options, I'd register a
> light vote for MATERIALIZED.  It reads easier, seems more grammatically
> correct, and uses an existing word.
> 
 +1 for MATERIALIZED, as I proposed in
 https://postgr.es/m/20170503173305.fetj4tz7kd56tjlr@alvherre.pgsql
> 
>>> Seconded!
> 
>> +1 to MATERIALIZED too
> 
> I was expecting more controversy ... pushed that way.
> 

As you wish. I withdraw my previous vote and I propose

   AS [NOT] MATERIALIZED [yes|false]

;-)

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Early WIP/PoC for inlining CTEs

2019-02-16 Thread Jonathan S. Katz
On 2/16/19 4:14 PM, Tom Lane wrote:
> Tomas Vondra  writes:
>> On 2/14/19 8:22 PM, Merlin Moncure wrote:
>>> On Thu, Feb 14, 2019 at 10:02 AM Alvaro Herrera
>>>  wrote:
 On 2019-Feb-14, Peter Eisentraut wrote:
> If we're not really planning to add any more options, I'd register a
> light vote for MATERIALIZED.  It reads easier, seems more grammatically
> correct, and uses an existing word.
> 
 +1 for MATERIALIZED, as I proposed in
 https://postgr.es/m/20170503173305.fetj4tz7kd56tjlr@alvherre.pgsql
> 
>>> Seconded!
> 
>> +1 to MATERIALIZED too
> 
> I was expecting more controversy ... pushed that way.

This is awesome. Thank you everyone for working on this!



signature.asc
Description: OpenPGP digital signature


Re: Early WIP/PoC for inlining CTEs

2019-02-16 Thread Tom Lane
Tomas Vondra  writes:
> On 2/14/19 8:22 PM, Merlin Moncure wrote:
>> On Thu, Feb 14, 2019 at 10:02 AM Alvaro Herrera
>>  wrote:
>>> On 2019-Feb-14, Peter Eisentraut wrote:
 If we're not really planning to add any more options, I'd register a
 light vote for MATERIALIZED.  It reads easier, seems more grammatically
 correct, and uses an existing word.

>>> +1 for MATERIALIZED, as I proposed in
>>> https://postgr.es/m/20170503173305.fetj4tz7kd56tjlr@alvherre.pgsql

>> Seconded!

> +1 to MATERIALIZED too

I was expecting more controversy ... pushed that way.

regards, tom lane



Re: Early WIP/PoC for inlining CTEs

2019-02-15 Thread Tomas Vondra
On 2/14/19 8:22 PM, Merlin Moncure wrote:
> On Thu, Feb 14, 2019 at 10:02 AM Alvaro Herrera
>  wrote:
>>
>> On 2019-Feb-14, Peter Eisentraut wrote:
>>
>>> On 14/02/2019 16:11, Tom Lane wrote:
 ... so, have we beaten this topic to death yet?  Can we make a decision?

 Personally, I'd be happy with either of the last two patch versions
 I posted (that is, either AS [[NOT] MATERIALIZED] or
 AS [MATERIALIZE [ON|OFF]] syntax).  But we gotta pick something.
>>>
>>> If we're not really planning to add any more options, I'd register a
>>> light vote for MATERIALIZED.  It reads easier, seems more grammatically
>>> correct, and uses an existing word.
>>
>> +1 for MATERIALIZED, as I proposed in
>> https://postgr.es/m/20170503173305.fetj4tz7kd56tjlr@alvherre.pgsql
> 
> Seconded!
> 

+1 to MATERIALIZED too

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Early WIP/PoC for inlining CTEs

2019-02-14 Thread Andreas Karlsson

On 14/02/2019 16.11, Tom Lane wrote:

... so, have we beaten this topic to death yet?  Can we make a decision?

Personally, I'd be happy with either of the last two patch versions
I posted (that is, either AS [[NOT] MATERIALIZED] or
AS [MATERIALIZE [ON|OFF]] syntax).  But we gotta pick something.


I am happy with either of those.

Andreas



Re: Early WIP/PoC for inlining CTEs

2019-02-14 Thread Bruce Momjian
On Thu, Feb 14, 2019 at 04:25:27PM +0100, Peter Eisentraut wrote:
> On 06/02/2019 10:00, Bruce Momjian wrote:
> > I think "materialize" is the right word since "materialized" would be
> > past tense.
> 
> It's an adjective.  The sentence is, "with foo as the materialized
> $query, do the $main_query".
> 
> It's the same as "materialized view".

Agreed, thanks.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Early WIP/PoC for inlining CTEs

2019-02-14 Thread Merlin Moncure
On Thu, Feb 14, 2019 at 10:02 AM Alvaro Herrera
 wrote:
>
> On 2019-Feb-14, Peter Eisentraut wrote:
>
> > On 14/02/2019 16:11, Tom Lane wrote:
> > > ... so, have we beaten this topic to death yet?  Can we make a decision?
> > >
> > > Personally, I'd be happy with either of the last two patch versions
> > > I posted (that is, either AS [[NOT] MATERIALIZED] or
> > > AS [MATERIALIZE [ON|OFF]] syntax).  But we gotta pick something.
> >
> > If we're not really planning to add any more options, I'd register a
> > light vote for MATERIALIZED.  It reads easier, seems more grammatically
> > correct, and uses an existing word.
>
> +1 for MATERIALIZED, as I proposed in
> https://postgr.es/m/20170503173305.fetj4tz7kd56tjlr@alvherre.pgsql

Seconded!

merlin



Re: Early WIP/PoC for inlining CTEs

2019-02-14 Thread Alvaro Herrera
On 2019-Feb-14, Peter Eisentraut wrote:

> On 14/02/2019 16:11, Tom Lane wrote:
> > ... so, have we beaten this topic to death yet?  Can we make a decision?
> > 
> > Personally, I'd be happy with either of the last two patch versions
> > I posted (that is, either AS [[NOT] MATERIALIZED] or
> > AS [MATERIALIZE [ON|OFF]] syntax).  But we gotta pick something.
> 
> If we're not really planning to add any more options, I'd register a
> light vote for MATERIALIZED.  It reads easier, seems more grammatically
> correct, and uses an existing word.

+1 for MATERIALIZED, as I proposed in
https://postgr.es/m/20170503173305.fetj4tz7kd56tjlr@alvherre.pgsql

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Early WIP/PoC for inlining CTEs

2019-02-14 Thread Peter Eisentraut
On 06/02/2019 10:00, Bruce Momjian wrote:
> I think "materialize" is the right word since "materialized" would be
> past tense.

It's an adjective.  The sentence is, "with foo as the materialized
$query, do the $main_query".

It's the same as "materialized view".

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Early WIP/PoC for inlining CTEs

2019-02-14 Thread Tom Lane
... so, have we beaten this topic to death yet?  Can we make a decision?

Personally, I'd be happy with either of the last two patch versions
I posted (that is, either AS [[NOT] MATERIALIZED] or
AS [MATERIALIZE [ON|OFF]] syntax).  But we gotta pick something.

regards, tom lane



Re: Early WIP/PoC for inlining CTEs

2019-02-09 Thread Tom Lane
Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> After further reflection I really don't like Andrew's suggestion
>  Tom> that we not document the rule that multiply-referenced CTEs won't
>  Tom> be inlined by default. That would be giving up the principle that
>  Tom> WITH calculations are not done multiple times by default, and I
>  Tom> draw the line at that. It's an often-useful behavior as well as
>  Tom> one that's been documented from day one, so I do not accept the
>  Tom> argument that we might someday override it on the basis of nothing
>  Tom> but planner cost estimates.

> The case that springs to mind is when a CTE with grouping is then joined
> multiple times in the main query with different conditions. If the
> planner is able to deduce (e.g. via ECs) that restrictions on grouped
> columns can be pushed into the CTE, then inlining the CTE multiple times
> might be a significant win. But if that isn't possible, then inlining
> multiple times might be a significant loss.

Sure, but this is exactly the sort of situation where we should offer
a way for the user to force either decision to be made.  I think it's
very unlikely that we'll ever be in a position to make a realistic
cost-based decision for that.  Actually planning it out both ways would
be horrendously expensive (and probably none too reliable anyway, given
how shaky ndistinct estimates tend to be); and we certainly don't have
enough info to make a smart choice without doing that.

regards, tom lane



Re: Early WIP/PoC for inlining CTEs

2019-02-06 Thread Bruce Momjian
On Sat, Feb  2, 2019 at 02:01:01PM -0500, Tom Lane wrote:
> I wrote:
> > I propose that we implement and document this as
> > WITH ctename AS [ MATERIALIZE { ON | OFF } ] ( query )
> > which is maybe a bit clunky but not awful, and it would leave room
> > to generalize it to "AS [ optionname optionvalue [ , ... ] ]" if we
> > ever need to.  Looking at the precedent of e.g. EXPLAIN, we could
> > probably allow just "MATERIALIZE" as well, with the boolean value
> > defaulting to true.
> 
> In hopes of moving things along, here's a version of the patch that
> does it like that.  This demonstrates that, in fact, we can accept
> "keyword [value] [, ...]" style options without any parens and
> there's no syntax conflict.  We'd have to work a bit harder on the
> actual code in gram.y if we wanted to handle multiple options,
> but the Bison productions will work.
> 
> There's nothing particularly stopping us from accepting
> "materialized" with a D in this syntax, instead of or in addition
> to "materialize"; though I hesitate to mention it for fear of
> another round of bikeshedding.

I think "materialize" is the right word since "materialized" would be
past tense.

> After further reflection I really don't like Andrew's suggestion
> that we not document the rule that multiply-referenced CTEs won't
> be inlined by default.  That would be giving up the principle
> that WITH calculations are not done multiple times by default,
> and I draw the line at that.  It's an often-useful behavior as
> well as one that's been documented from day one, so I do not accept
> the argument that we might someday override it on the basis of
> nothing but planner cost estimates.

Thinking of the history of documenting optimizer issues, I think we
should document when CTEs are inlined by default, because the user will
want to know when they should override the default behavior.  When we
didn't document how PREPARED queries worked, we got many questions about
odd query performance until we finally documented it in 2016 in commit
fab9d1da4a213fab08fe2d263eedf2408bc4a27a.  If we change the inlining
behavior later, we can update the docs.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Early WIP/PoC for inlining CTEs

2019-02-03 Thread Tom Lane
Vik Fearing  writes:
> On 28/01/2019 23:05, Tom Lane wrote:
>> Peter Eisentraut  writes:
>>> Or put it at the end?
>>> WITH ctename AS ( query ) MATERIALIZED

>> Yeah, I thought about that too, but it doesn't seem like an improvement.
>> If the query is very long (which isn't unlikely) I think people would
>> prefer to see the option(s) up front.

> On the other hand, the end is where the other options go (that we
> haven't implemented yet).  See .

Yeah, I noticed that too while working on the latest patch revision.
ISTM that's actually an argument for *not* putting PG-specific
syntax there.  We'd increase the risk of conflicting with future spec
additions, assuming that they continue to add stuff at the end rather
than just after AS.

regards, tom lane



Re: Early WIP/PoC for inlining CTEs

2019-02-03 Thread Vik Fearing
On 28/01/2019 23:05, Tom Lane wrote:
> Peter Eisentraut  writes:
>> On 28/01/2019 21:35, Tom Lane wrote:
>>> Conceivably we could make it work without the parens:
>>> ...
> 
>> Or put it at the end?
>> WITH ctename AS ( query ) MATERIALIZED
> 
> Yeah, I thought about that too, but it doesn't seem like an improvement.
> If the query is very long (which isn't unlikely) I think people would
> prefer to see the option(s) up front.

On the other hand, the end is where the other options go (that we
haven't implemented yet).  See .
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: Early WIP/PoC for inlining CTEs

2019-02-03 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> After further reflection I really don't like Andrew's suggestion
 Tom> that we not document the rule that multiply-referenced CTEs won't
 Tom> be inlined by default. That would be giving up the principle that
 Tom> WITH calculations are not done multiple times by default, and I
 Tom> draw the line at that. It's an often-useful behavior as well as
 Tom> one that's been documented from day one, so I do not accept the
 Tom> argument that we might someday override it on the basis of nothing
 Tom> but planner cost estimates.

The case that springs to mind is when a CTE with grouping is then joined
multiple times in the main query with different conditions. If the
planner is able to deduce (e.g. via ECs) that restrictions on grouped
columns can be pushed into the CTE, then inlining the CTE multiple times
might be a significant win. But if that isn't possible, then inlining
multiple times might be a significant loss.

If such a query is made into a view, then (given your position) the
decision of whether to inline has to be made at view creation time,
which doesn't seem desirable (even if we have to put up with it for the
present).

-- 
Andrew (irc:RhodiumToad)



Re: Early WIP/PoC for inlining CTEs

2019-02-02 Thread Tom Lane
I wrote:
> I propose that we implement and document this as
>   WITH ctename AS [ MATERIALIZE { ON | OFF } ] ( query )
> which is maybe a bit clunky but not awful, and it would leave room
> to generalize it to "AS [ optionname optionvalue [ , ... ] ]" if we
> ever need to.  Looking at the precedent of e.g. EXPLAIN, we could
> probably allow just "MATERIALIZE" as well, with the boolean value
> defaulting to true.

In hopes of moving things along, here's a version of the patch that
does it like that.  This demonstrates that, in fact, we can accept
"keyword [value] [, ...]" style options without any parens and
there's no syntax conflict.  We'd have to work a bit harder on the
actual code in gram.y if we wanted to handle multiple options,
but the Bison productions will work.

There's nothing particularly stopping us from accepting
"materialized" with a D in this syntax, instead of or in addition
to "materialize"; though I hesitate to mention it for fear of
another round of bikeshedding.

After further reflection I really don't like Andrew's suggestion
that we not document the rule that multiply-referenced CTEs won't
be inlined by default.  That would be giving up the principle
that WITH calculations are not done multiple times by default,
and I draw the line at that.  It's an often-useful behavior as
well as one that's been documented from day one, so I do not accept
the argument that we might someday override it on the basis of
nothing but planner cost estimates.

regards, tom lane

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index ea2e4bc..9905593 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2927,6 +2927,7 @@ JumbleExpr(pgssJumbleState *jstate, Node *node)
 
 /* we store the string name because RTE_CTE RTEs need it */
 APP_JUMB_STRING(cte->ctename);
+APP_JUMB(cte->ctematerialized);
 JumbleQuery(jstate, castNode(Query, cte->ctequery));
 			}
 			break;
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index b3894d0..226ba56 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1888,7 +1888,7 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 
 -- join in CTE
 EXPLAIN (VERBOSE, COSTS OFF)
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZE ON (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
  QUERY PLAN  
 -
  Limit
@@ -1905,7 +1905,7 @@ WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2
Output: t.c1_1, t.c2_1, t.c1_3
 (12 rows)
 
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZE ON (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
  c1_1 | c2_1 
 --+--
   101 |  101
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index f438165..6399b8b 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -493,8 +493,8 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
 -- join in CTE
 EXPLAIN (VERBOSE, COSTS OFF)
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZE ON (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZE ON (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
 -- ctid with whole-row reference
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, 

Re: Early WIP/PoC for inlining CTEs

2019-01-29 Thread Merlin Moncure
On Tuesday, January 29, 2019, Tom Lane  wrote:

> Merlin Moncure  writes:
> > On Tue, Jan 29, 2019 at 1:53 PM Tom Lane  wrote:
> >> I propose that we implement and document this as
> >> WITH ctename AS [ MATERIALIZE { ON | OFF } ] ( query )
>
> > I hate to bikeshed here, but I think it's better english using that
> > style of syntax to say,
> >  WITH ctename AS [ MATERIALIZATION { ON | OFF } ] ( query )
>
> Hm.  Doesn't really seem to fit with our style for options elsewhere;
> for instance, EXPLAIN's options are things like ANALYZE ON/OFF and
> VERBOSE ON/OFF, not ANALYSIS and VERBOSITY.
>
> regards, tom lane
>

Yep...I'll concede the point.

merlin


Re: Early WIP/PoC for inlining CTEs

2019-01-29 Thread Tom Lane
David Fetter  writes:
> On Tue, Jan 29, 2019 at 02:52:44PM -0500, Tom Lane wrote:
>> I propose that we implement and document this as
>> WITH ctename AS [ MATERIALIZE { ON | OFF } ] ( query )

> I think this would be better with parentheses like this: 
> WITH ctename [ ( MATERIALIZE { ON | OFF } ) ] AS ( query ) [, ... ]

I take it you haven't actually been reading this thread.

regards, tom lane



Re: Early WIP/PoC for inlining CTEs

2019-01-29 Thread David Fetter
On Tue, Jan 29, 2019 at 02:52:44PM -0500, Tom Lane wrote:
> Michael Paquier  writes:
> > On Mon, Jan 28, 2019 at 05:05:32PM -0500, Tom Lane wrote:
> >> Yeah, I thought about that too, but it doesn't seem like an improvement.
> >> If the query is very long (which isn't unlikely) I think people would
> >> prefer to see the option(s) up front.
> 
> > Having these options at the front of the WITH clause looks more
> > natural to me.
> 
> Well, we've managed to get agreement on the semantics of this thing,
> let's not get hung up on the syntax details.
> 
> I propose that we implement and document this as
> 
>   WITH ctename AS [ MATERIALIZE { ON | OFF } ] ( query )

I think this would be better with parentheses like this: 

WITH ctename [ ( MATERIALIZE { ON | OFF } ) ] AS ( query ) [, ... ]

and it's a lot easier to add more query hints later.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Early WIP/PoC for inlining CTEs

2019-01-29 Thread Chapman Flack
On 1/29/19 3:36 PM, Merlin Moncure wrote:
> I hate to bikeshed here, but I think it's better english using that
> style of syntax to say,
>  WITH ctename AS [ MATERIALIZATION { ON | OFF } ] ( query )

I had been just about to also engage in bikeshedding, on grounds
that (to me) the MATERIALIZED/NOT MATERIALIZED form seemed more
natural:

FROM GROCER OBTAIN WAXED CUCUMBERS. (this seems downright natural)
FROM GROCER OBTAIN NOT WAXED CUCUMBERS. (nearly natural, s/NOT /UN/)

FROM GROCER OBTAIN WAX ON CUCUMBERS. (these read oddly to me)
FROM GROCER OBTAIN WAX OFF CUCUMBERS.

I do understand Tom's point that the wax-on/wax-off form generalizes
more easily to non-boolean future options. It would really read
better as a parenthetical, so too bad parentheses are already taken
to go around the query.

While gawking at the bikeshed, one more thing came to mind:

I like to hold out hope [1] that, one day, the WITH grammar could
be extended to handle lexically-scoped option settings like those in
the ISO standard.

It doesn't seem to me that any of these current proposals would get
in the way of that. Just another thing to have in mind.

Regards,
-Chap


[1]
https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#XMLBINARY_and_XMLNAMESPACES



Re: Early WIP/PoC for inlining CTEs

2019-01-29 Thread Tom Lane
Merlin Moncure  writes:
> On Tue, Jan 29, 2019 at 1:53 PM Tom Lane  wrote:
>> I propose that we implement and document this as
>> WITH ctename AS [ MATERIALIZE { ON | OFF } ] ( query )

> I hate to bikeshed here, but I think it's better english using that
> style of syntax to say,
>  WITH ctename AS [ MATERIALIZATION { ON | OFF } ] ( query )

Hm.  Doesn't really seem to fit with our style for options elsewhere;
for instance, EXPLAIN's options are things like ANALYZE ON/OFF and
VERBOSE ON/OFF, not ANALYSIS and VERBOSITY.

regards, tom lane



Re: Early WIP/PoC for inlining CTEs

2019-01-29 Thread Merlin Moncure
On Tue, Jan 29, 2019 at 1:53 PM Tom Lane  wrote:
>
> Michael Paquier  writes:
> > On Mon, Jan 28, 2019 at 05:05:32PM -0500, Tom Lane wrote:
> >> Yeah, I thought about that too, but it doesn't seem like an improvement.
> >> If the query is very long (which isn't unlikely) I think people would
> >> prefer to see the option(s) up front.
>
> > Having these options at the front of the WITH clause looks more
> > natural to me.
>
> Well, we've managed to get agreement on the semantics of this thing,
> let's not get hung up on the syntax details.
>
> I propose that we implement and document this as
>
> WITH ctename AS [ MATERIALIZE { ON | OFF } ] ( query )
>
> which is maybe a bit clunky but not awful, and it would leave room
> to generalize it to "AS [ optionname optionvalue [ , ... ] ]" if we
> ever need to.  Looking at the precedent of e.g. EXPLAIN, we could
> probably allow just "MATERIALIZE" as well, with the boolean value
> defaulting to true.

I hate to bikeshed here, but I think it's better english using that
style of syntax to say,
 WITH ctename AS [ MATERIALIZATION { ON | OFF } ] ( query )

merlin



Re: Early WIP/PoC for inlining CTEs

2019-01-29 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Jan 28, 2019 at 05:05:32PM -0500, Tom Lane wrote:
>> Yeah, I thought about that too, but it doesn't seem like an improvement.
>> If the query is very long (which isn't unlikely) I think people would
>> prefer to see the option(s) up front.

> Having these options at the front of the WITH clause looks more
> natural to me.

Well, we've managed to get agreement on the semantics of this thing,
let's not get hung up on the syntax details.

I propose that we implement and document this as

WITH ctename AS [ MATERIALIZE { ON | OFF } ] ( query )

which is maybe a bit clunky but not awful, and it would leave room
to generalize it to "AS [ optionname optionvalue [ , ... ] ]" if we
ever need to.  Looking at the precedent of e.g. EXPLAIN, we could
probably allow just "MATERIALIZE" as well, with the boolean value
defaulting to true.

regards, tom lane



Re: Early WIP/PoC for inlining CTEs

2019-01-28 Thread Michael Paquier
On Mon, Jan 28, 2019 at 05:05:32PM -0500, Tom Lane wrote:
> Yeah, I thought about that too, but it doesn't seem like an improvement.
> If the query is very long (which isn't unlikely) I think people would
> prefer to see the option(s) up front.

Having these options at the front of the WITH clause looks more
natural to me.
--
Michael


signature.asc
Description: PGP signature


Re: Early WIP/PoC for inlining CTEs

2019-01-28 Thread Andreas Karlsson

On 1/28/19 10:54 PM, Peter Eisentraut wrote:

Or put it at the end?

 WITH ctename AS ( query ) MATERIALIZED


Hm, seems like that would be easy to miss for long queries.

Andreas




Re: Early WIP/PoC for inlining CTEs

2019-01-28 Thread Tom Lane
Peter Eisentraut  writes:
> On 28/01/2019 21:35, Tom Lane wrote:
>> Conceivably we could make it work without the parens:
>> ...

> Or put it at the end?
> WITH ctename AS ( query ) MATERIALIZED

Yeah, I thought about that too, but it doesn't seem like an improvement.
If the query is very long (which isn't unlikely) I think people would
prefer to see the option(s) up front.

regards, tom lane



Re: Early WIP/PoC for inlining CTEs

2019-01-28 Thread Peter Eisentraut
On 28/01/2019 21:35, Tom Lane wrote:
> Conceivably we could make it work without the parens:
> 
> WITH ctename AS [ option = value [ ,  ] ] ( query  )
> 
> which for the immediate feature I'd be tempted to spell as
> 
> WITH ctename AS [ materialize = on/off ] ( query ... )
> 
> I think the only reason the syntax is MATERIALIZED with a D is that
> that's already a keyword; it reads a bit awkwardly IMO.  But if we
> were accepting a ColId there, there'd be room to adjust the spelling.

Or put it at the end?

WITH ctename AS ( query ) MATERIALIZED

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Early WIP/PoC for inlining CTEs

2019-01-28 Thread Tom Lane
Robert Haas  writes:
> However, generally we have not had great luck with just sticking
> keywords in there (cf. VACUUM, ANALYZE, EXPLAIN, COPY) which is why I
> suggested using a flexible syntax with parenthesized options.

Fair, except that as you then proceed to point out, that does not work
either before or after the AS.

Conceivably we could make it work without the parens:

WITH ctename AS [ option = value [ ,  ] ] ( query  )

which for the immediate feature I'd be tempted to spell as

WITH ctename AS [ materialize = on/off ] ( query ... )

I think the only reason the syntax is MATERIALIZED with a D is that
that's already a keyword; it reads a bit awkwardly IMO.  But if we
were accepting a ColId there, there'd be room to adjust the spelling.

That said, this is still clunkier than the existing proposal, and
I'm not really convinced that we need to leave room for more options.

regards, tom lane



Re: Early WIP/PoC for inlining CTEs

2019-01-28 Thread Robert Haas
On Sun, Jan 27, 2019 at 8:23 AM Andrew Gierth
 wrote:
> I think it makes more sense to say
> that we never inline if MATERIALIZED is specified, that we always inline
> if NOT MATERIALIZED is specified, and that if neither is specified the
> planner will choose (but perhaps note that currently it always chooses
> only based on refcount).

I, too, like this approach.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Early WIP/PoC for inlining CTEs

2019-01-28 Thread Robert Haas
On Mon, Jan 21, 2019 at 10:28 AM Tom Lane  wrote:
> Andreas Karlsson  writes:
> > I have a minor biksheddish question about the syntax.
> > You proposed:
> > WITH cte_name [[NOT] MATERIALIZED] AS (query) main_query
> > While Andrew proposed:
> > WITH cte_name AS [[NOT] MATERIALIZED] (query) main_query
> > Do people have any preference between these two?
>
> FWIW, I'd independently thought that the latter is more readable,
> and probably less likely to have syntax problems with future
> extensions (since AS is already fully reserved).  Didn't get
> around to mentioning it yet, but +1 for putting AS first.

It seems to me that as long as the query has to be surrounded by
non-optional parentheses and the options list never starts with a
parenthesis, there isn't much room for a grammar conflict either way.
If the query didn't have to be surrounded by parentheses, you would
want to put the options before the word AS so that the word AS would
serve as an unambiguous terminator for the options specification, but
if the query must be preceded by an opening parenthesis then as long
as the options list can't include an option that begins with such a
parenthesis we are in good shape.

However, generally we have not had great luck with just sticking
keywords in there (cf. VACUUM, ANALYZE, EXPLAIN, COPY) which is why I
suggested using a flexible syntax with parenthesized options.  And
then you're going to have trouble getting bison to figure out what to
do after...

WITH something AS ( some_keyword

...because some_keyword might be the an option name or the first word
of a query.  And putting that syntax before AS isn't any better,
because now it can be confused with a column name list.

I am not deeply worked up about this, just proposing some things to think about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Early WIP/PoC for inlining CTEs

2019-01-27 Thread Andreas Karlsson
On 1/26/19 11:55 PM, Tom Lane wrote:> Hearing no immediate pushback on 
that proposal, I went ahead and made

a version of the patch that does it like that, as attached.  I also took
a stab at documenting it fully.


Thanks! This version of the patch looks solid, including the 
documentation. The only remaining question I see is the one Andrew 
raised about if we should change the language to allow for future 
versions of PostgreSQL to add costing for when the same CTE is 
referenced multiple times.


Andreas



Re: Early WIP/PoC for inlining CTEs

2019-01-27 Thread Andreas Karlsson

On 1/27/19 4:21 PM, Tom Lane wrote:

Andrew Gierth  writes:

I'm not sure we should nail down the rule that the absence of NOT
MATERIALIZED will mean a multiply-referenced CTE is evaluated once. One
would hope that in the future the planner might be taught to inline or
not in that case depending on cost. I think it makes more sense to say
that we never inline if MATERIALIZED is specified, that we always inline
if NOT MATERIALIZED is specified, and that if neither is specified the
planner will choose (but perhaps note that currently it always chooses
only based on refcount).


I have no objection to documenting it like that; I just don't want us
to go off into the weeds trying to actually implement something smarter
for v12.


+1

Andreas



Re: Early WIP/PoC for inlining CTEs

2019-01-27 Thread Tom Lane
Andrew Gierth  writes:
> I'm not sure we should nail down the rule that the absence of NOT
> MATERIALIZED will mean a multiply-referenced CTE is evaluated once. One
> would hope that in the future the planner might be taught to inline or
> not in that case depending on cost. I think it makes more sense to say
> that we never inline if MATERIALIZED is specified, that we always inline
> if NOT MATERIALIZED is specified, and that if neither is specified the
> planner will choose (but perhaps note that currently it always chooses
> only based on refcount).

I have no objection to documenting it like that; I just don't want us
to go off into the weeds trying to actually implement something smarter
for v12.

regards, tom lane



Re: Early WIP/PoC for inlining CTEs

2019-01-27 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> I was interested to find, while writing the docs, that it's a real
 Tom> struggle to invent plausible reasons to write MATERIALIZED given
 Tom> the above specification. You pretty much have to have lied to the
 Tom> planner, eg by making a volatile function that's not marked
 Tom> volatile, before there's a real need for that. Am I missing
 Tom> something? If I'm not, then we're in a really good place
 Tom> backwards-compatibility-wise, because the new default behavior
 Tom> shouldn't break any cases where people weren't cheating.

The cases where the new default will break things are where people used
WITH to force a choice of join order or otherwise constrain the planner
in order to avoid a misplan.

I'm not sure we should nail down the rule that the absence of NOT
MATERIALIZED will mean a multiply-referenced CTE is evaluated once. One
would hope that in the future the planner might be taught to inline or
not in that case depending on cost. I think it makes more sense to say
that we never inline if MATERIALIZED is specified, that we always inline
if NOT MATERIALIZED is specified, and that if neither is specified the
planner will choose (but perhaps note that currently it always chooses
only based on refcount).

-- 
Andrew (irc:RhodiumToad)



Re: Early WIP/PoC for inlining CTEs

2019-01-26 Thread Marko Tiikkaja
On Sat, Jan 26, 2019 at 12:22 AM Tom Lane  wrote:

> Therefore, I'm reversing my previous opinion that we should not have
> an explicit NOT MATERIALIZED option.  I think we should add that, and
> the behavior ought to be:
>
> * No option given: inline if there's exactly one reference.
>
> * With MATERIALIZED: never inline.
>
> * With NOT MATERIALIZED: inline regardless of the number of references.
>

This much has been obvious to most people for a long time.


.m


Re: Early WIP/PoC for inlining CTEs

2019-01-26 Thread Tom Lane
I wrote:
> Therefore, I'm reversing my previous opinion that we should not have
> an explicit NOT MATERIALIZED option.  I think we should add that, and
> the behavior ought to be:
> * No option given: inline if there's exactly one reference.
> * With MATERIALIZED: never inline.
> * With NOT MATERIALIZED: inline regardless of the number of references.
> (Obviously, we should not inline if there's RECURSIVE or the CTE
> potentially has side-effects, regardless of the user option;
> I don't think those cases are up for debate.)

Hearing no immediate pushback on that proposal, I went ahead and made
a version of the patch that does it like that, as attached.  I also took
a stab at documenting it fully.

I was interested to find, while writing the docs, that it's a real
struggle to invent plausible reasons to write MATERIALIZED given the
above specification.  You pretty much have to have lied to the planner,
eg by making a volatile function that's not marked volatile, before
there's a real need for that.  Am I missing something?  If I'm not,
then we're in a really good place backwards-compatibility-wise,
because the new default behavior shouldn't break any cases where people
weren't cheating.

regards, tom lane

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index f177eba..6d456f6 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2925,6 +2925,7 @@ JumbleExpr(pgssJumbleState *jstate, Node *node)
 
 /* we store the string name because RTE_CTE RTEs need it */
 APP_JUMB_STRING(cte->ctename);
+APP_JUMB(cte->ctematerialized);
 JumbleQuery(jstate, castNode(Query, cte->ctequery));
 			}
 			break;
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index bb92d9d..8c26dd1 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1888,7 +1888,7 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 
 -- join in CTE
 EXPLAIN (VERBOSE, COSTS OFF)
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
  QUERY PLAN  
 -
  Limit
@@ -1905,7 +1905,7 @@ WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2
Output: t.c1_1, t.c2_1, t.c1_3
 (12 rows)
 
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
  c1_1 | c2_1 
 --+--
   101 |  101
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index f438165..56602a1 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -493,8 +493,8 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
 -- join in CTE
 EXPLAIN (VERBOSE, COSTS OFF)
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
 -- ctid with whole-row reference
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index 88bc189..2225255 100644
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -2199,16 +2199,19 @@ SELECT n FROM t LIMIT 100;
   
 
   
-   A useful property of WITH queries is that 

Re: Early WIP/PoC for inlining CTEs

2019-01-25 Thread Tom Lane
Andreas Karlsson  writes:
> [ inlining-ctes-v8.patch ]

I went ahead and pushed the stuff about QTW_EXAMINE_RTES_BEFORE/_AFTER,
because that seems like an independent change with other possible uses.

Attached is an updated version of the rest of the patch, with mostly
cosmetic changes.  I've not touched the documentation, but I think this
is otherwise committable if we are satisfied with the semantics.

However ... after thinking about it more, I'm not really satisfied
with that.  In particular I don't like the fact that by default this
will inline regardless of the number of references to the CTE.  I doubt
that inlining when there are multiple references is so likely to be a
win as to justify it being the default, especially given that it flies
in the face of what our documentation has said for as long as we've
had CTEs.

Therefore, I'm reversing my previous opinion that we should not have
an explicit NOT MATERIALIZED option.  I think we should add that, and
the behavior ought to be:

* No option given: inline if there's exactly one reference.

* With MATERIALIZED: never inline.

* With NOT MATERIALIZED: inline regardless of the number of references.

(Obviously, we should not inline if there's RECURSIVE or the CTE
potentially has side-effects, regardless of the user option;
I don't think those cases are up for debate.)

I haven't done anything about that here, but the changes would be pretty
minor.

regards, tom lane

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index f177eba..6d456f6 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2925,6 +2925,7 @@ JumbleExpr(pgssJumbleState *jstate, Node *node)
 
 /* we store the string name because RTE_CTE RTEs need it */
 APP_JUMB_STRING(cte->ctename);
+APP_JUMB(cte->ctematerialized);
 JumbleQuery(jstate, castNode(Query, cte->ctequery));
 			}
 			break;
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index bb92d9d..8c26dd1 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1888,7 +1888,7 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 
 -- join in CTE
 EXPLAIN (VERBOSE, COSTS OFF)
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
  QUERY PLAN  
 -
  Limit
@@ -1905,7 +1905,7 @@ WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2
Output: t.c1_1, t.c2_1, t.c1_3
 (12 rows)
 
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
  c1_1 | c2_1 
 --+--
   101 |  101
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index f438165..56602a1 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -493,8 +493,8 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
 -- join in CTE
 EXPLAIN (VERBOSE, COSTS OFF)
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
 -- ctid with whole-row reference
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
diff --git a/doc/src/sgml/queries.sgml 

Re: Early WIP/PoC for inlining CTEs

2019-01-21 Thread Gavin Flower

On 22/01/2019 02:40, Andreas Karlsson wrote:

On 1/18/19 9:34 PM, Robert Haas wrote:
On Thu, Jan 17, 2019 at 10:48 AM Andreas Karlsson  
wrote:

On 1/11/19 8:10 PM, Robert Haas wrote:

WITH cte_name [[NOT] MATERIALIZED] AS (query) main_query...


Hm, when would one want "NOT MATERIALIZED"? I am not sure I see the
usefulness of forcing inlining other than if we by default do not 
inline

when a CTE is referenced multiple times.


When the planner materializes it, but the performance of the resulting
plan therefore sucks, I suppose.

I don't feel super-strongly about this, and Tom is right that there
may be cases where materialization is just not practical due to
implementation restrictions.  But it's not crazy to imagine that
inlining a multiply-referenced CTE might create opportunities for
optimization at each of those places, perhaps not the same ones in
each case, whereas materializing it results in doing extra work.


I see.

I have a minor biksheddish question about the syntax.

You proposed:

WITH cte_name [[NOT] MATERIALIZED] AS (query) main_query

While Andrew proposed:

WITH cte_name AS [[NOT] MATERIALIZED] (query) main_query

Do people have any preference between these two?

Andreas


+1

For putting the 'AS' earlier, 2nd option,  I think it reads better.


Cheers,
Gavin





Re: Early WIP/PoC for inlining CTEs

2019-01-21 Thread Andreas Karlsson

On 1/10/19 2:28 AM, Andreas Karlsson wrote:
Here is a new version of the patch with added tests, improved comments, 
some minor code cleanup and most importantly slightly changed logic for 
when we should inline.


Add ctematerialized to the JumbleExpr() in pg_stat_statements on 
suggestion from Andrew Gierth. I think that is the correct thing to do 
since it can have a major impact on performance.


Andreas
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index f177ebaa2c..6d456f6bce 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2925,6 +2925,7 @@ JumbleExpr(pgssJumbleState *jstate, Node *node)
 
 /* we store the string name because RTE_CTE RTEs need it */
 APP_JUMB_STRING(cte->ctename);
+APP_JUMB(cte->ctematerialized);
 JumbleQuery(jstate, castNode(Query, cte->ctequery));
 			}
 			break;
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index bb92d9d37a..8c26dd1f26 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1888,7 +1888,7 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 
 -- join in CTE
 EXPLAIN (VERBOSE, COSTS OFF)
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
  QUERY PLAN  
 -
  Limit
@@ -1905,7 +1905,7 @@ WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2
Output: t.c1_1, t.c2_1, t.c1_3
 (12 rows)
 
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
  c1_1 | c2_1 
 --+--
   101 |  101
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index f438165650..56602a164c 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -493,8 +493,8 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
 -- join in CTE
 EXPLAIN (VERBOSE, COSTS OFF)
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
 -- ctid with whole-row reference
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index 88bc189646..92f180c5cf 100644
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -2199,6 +2199,8 @@ SELECT n FROM t LIMIT 100;
   
 
   
+   TODO: Update this for inlining and MATERIALIZED.
+
A useful property of WITH queries is that they are evaluated
only once per execution of the parent query, even if they are referred to
more than once by the parent query or sibling WITH queries.
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 4db8142afa..92ede4fc6c 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -72,7 +72,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionand with_query is:
 
-with_query_name [ ( column_name [, ...] ) ] AS ( select | values | insert | update | delete )
+with_query_name [ ( column_name [, ...] ) ] AS [ MATERIALIZED ] ( select | values | insert | update | delete )
 
 TABLE [ ONLY ] table_name [ * ]
 
@@ -273,6 +273,8 @@ TABLE [ ONLY ] table_name [ * ]

 

+TODO: Update this 

Re: Early WIP/PoC for inlining CTEs

2019-01-21 Thread Tom Lane
Andreas Karlsson  writes:
> I have a minor biksheddish question about the syntax.
> You proposed:
> WITH cte_name [[NOT] MATERIALIZED] AS (query) main_query
> While Andrew proposed:
> WITH cte_name AS [[NOT] MATERIALIZED] (query) main_query
> Do people have any preference between these two?

FWIW, I'd independently thought that the latter is more readable,
and probably less likely to have syntax problems with future
extensions (since AS is already fully reserved).  Didn't get
around to mentioning it yet, but +1 for putting AS first.

regards, tom lane



Re: Early WIP/PoC for inlining CTEs

2019-01-21 Thread Andreas Karlsson

On 1/18/19 9:34 PM, Robert Haas wrote:

On Thu, Jan 17, 2019 at 10:48 AM Andreas Karlsson  wrote:

On 1/11/19 8:10 PM, Robert Haas wrote:

WITH cte_name [[NOT] MATERIALIZED] AS (query) main_query...


Hm, when would one want "NOT MATERIALIZED"? I am not sure I see the
usefulness of forcing inlining other than if we by default do not inline
when a CTE is referenced multiple times.


When the planner materializes it, but the performance of the resulting
plan therefore sucks, I suppose.

I don't feel super-strongly about this, and Tom is right that there
may be cases where materialization is just not practical due to
implementation restrictions.  But it's not crazy to imagine that
inlining a multiply-referenced CTE might create opportunities for
optimization at each of those places, perhaps not the same ones in
each case, whereas materializing it results in doing extra work.


I see.

I have a minor biksheddish question about the syntax.

You proposed:

WITH cte_name [[NOT] MATERIALIZED] AS (query) main_query

While Andrew proposed:

WITH cte_name AS [[NOT] MATERIALIZED] (query) main_query

Do people have any preference between these two?

Andreas



Re: Early WIP/PoC for inlining CTEs

2019-01-18 Thread Robert Haas
On Fri, Jan 18, 2019 at 3:42 PM Andres Freund  wrote:
> On 2019-01-18 15:34:46 -0500, Robert Haas wrote:
> > On Thu, Jan 17, 2019 at 10:48 AM Andreas Karlsson  wrote:
> > > On 1/11/19 8:10 PM, Robert Haas wrote:
> > > > WITH cte_name [[NOT] MATERIALIZED] AS (query) main_query...
> > >
> > > Hm, when would one want "NOT MATERIALIZED"? I am not sure I see the
> > > usefulness of forcing inlining other than if we by default do not inline
> > > when a CTE is referenced multiple times.
> >
> > When the planner materializes it, but the performance of the resulting
> > plan therefore sucks, I suppose.
> >
> > I don't feel super-strongly about this, and Tom is right that there
> > may be cases where materialization is just not practical due to
> > implementation restrictions.
>
> *not* materializing I assume?

Right, sorry.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Early WIP/PoC for inlining CTEs

2019-01-18 Thread Andres Freund
On 2019-01-18 15:34:46 -0500, Robert Haas wrote:
> On Thu, Jan 17, 2019 at 10:48 AM Andreas Karlsson  wrote:
> > On 1/11/19 8:10 PM, Robert Haas wrote:
> > > WITH cte_name [[NOT] MATERIALIZED] AS (query) main_query...
> >
> > Hm, when would one want "NOT MATERIALIZED"? I am not sure I see the
> > usefulness of forcing inlining other than if we by default do not inline
> > when a CTE is referenced multiple times.
> 
> When the planner materializes it, but the performance of the resulting
> plan therefore sucks, I suppose.
> 
> I don't feel super-strongly about this, and Tom is right that there
> may be cases where materialization is just not practical due to
> implementation restrictions.

*not* materializing I assume?

Greetings,

Andres Freund



Re: Early WIP/PoC for inlining CTEs

2019-01-18 Thread Robert Haas
On Thu, Jan 17, 2019 at 10:48 AM Andreas Karlsson  wrote:
> On 1/11/19 8:10 PM, Robert Haas wrote:
> > WITH cte_name [[NOT] MATERIALIZED] AS (query) main_query...
>
> Hm, when would one want "NOT MATERIALIZED"? I am not sure I see the
> usefulness of forcing inlining other than if we by default do not inline
> when a CTE is referenced multiple times.

When the planner materializes it, but the performance of the resulting
plan therefore sucks, I suppose.

I don't feel super-strongly about this, and Tom is right that there
may be cases where materialization is just not practical due to
implementation restrictions.  But it's not crazy to imagine that
inlining a multiply-referenced CTE might create opportunities for
optimization at each of those places, perhaps not the same ones in
each case, whereas materializing it results in doing extra work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Early WIP/PoC for inlining CTEs

2019-01-17 Thread Tom Lane
Andreas Karlsson  writes:
> On 1/11/19 8:10 PM, Robert Haas wrote:
>> WITH cte_name [[NOT] MATERIALIZED] AS (query) main_query...

> Hm, when would one want "NOT MATERIALIZED"? I am not sure I see the 
> usefulness of forcing inlining other than if we by default do not inline 
> when a CTE is referenced multiple times.

I'm also concerned about what we do if the user says NOT MATERIALIZED
but there are semantic or implementation reasons not to inline.  Either
we throw an error or do something the user didn't expect, and neither
is very nice.  So I'm not in favor of having that option.

regards, tom lane



Re: Early WIP/PoC for inlining CTEs

2019-01-17 Thread Andreas Karlsson

On 1/11/19 8:10 PM, Robert Haas wrote:

WITH cte_name [[NOT] MATERIALIZED] AS (query) main_query...


Hm, when would one want "NOT MATERIALIZED"? I am not sure I see the 
usefulness of forcing inlining other than if we by default do not inline 
when a CTE is referenced multiple times.


Do you imaging it working something like the below?

1. Default

# Not inlined

- Referenced multiple times
- Includes FOR UPDATE|SHARE
- Includes volatile functions
- Recurisve
- DML

# Inlined

- Simple case (no side effects, referenced once)

2. MATERIALIZED

# Not inlined

- Everything

# Inlined

- (none)

3. NOT MATERIALIZED

# Not inlined

- Recursive
- DML

# Inlined

- Everything else

Andreas




Re: Early WIP/PoC for inlining CTEs

2019-01-11 Thread Pavel Stehule
pá 11. 1. 2019 v 20:11 odesílatel Robert Haas 
napsal:

> On Fri, Jan 11, 2019 at 2:04 PM Andres Freund  wrote:
> > > Maybe we could consider a more extensible syntax that is attached to
> > > the contained SELECT rather than the containing WITH.  Then CTEs would
> > > be less special; there'd be a place to put hints controlling top-level
> > > queries, subselects, views etc too (perhaps eventually join hints,
> > > parallelism hints etc, but "materialize this" would be just another
> > > one of those things).  That'd be all-in.
> >
> > I think you have some purity arguments here, but the likelihood of us
> > developing a full-blown solution is not that high, and the lack of
> > inlinable CTEs is *really* hurting us. As long as the design doesn't
> > block a full solution, if we go there, I think it's a very acceptable
> > blemish in comparison to the benefits we'd get.
>
> Also, it seems to me that this is properly a property of the
> individual WITH clause, not the query as a whole.
>
> I mean I suppose we could do
>
> WITH or_with_out_you OPTIONS (materialized false) AS (SELECT 'mariah
> carey') SELECT ...
>
> That'd allow for extensibility, have the write scope, and look like
> what we do elsewhere.  It looks a little less elegant than
>
> WITH cte_name [[NOT] MATERIALIZED] AS (query) main_query...
>
> ...but maybe elegance for extensibility is a good trade.
>

I like this explicit syntax (both variant can be used). From my
perspective, it is much better than hints in comments.

Regards

Pavel

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>


Re: Early WIP/PoC for inlining CTEs

2019-01-11 Thread Merlin Moncure
On Tue, Jul 24, 2018 at 6:10 PM Andres Freund  wrote:
> My read of the concensus (in which I am in the majority, so I might be
> biased) is that we do want inlining to be the default. We were thinking
> that it'd be necessary to provide a way to force inlining on the SQL
> level for individual CTEs.

This is correct.  Suggesting that we need syntax to disabling inlining
at the CTE level, and/or GUC to control the behavior (which I agree
should be defualted to inline).  Something like
enable_cte_inline=true; I'm not very enthusiastic about explicitly
breaking intentionally introduced optimization fences and then forcing
people to inject our OFFSET 0 hack.   This is just too unpleasant to
contemplate...what  happens if we come up with a better implemntation
of OFFSET?  yuck.

Thanks for providing this, CTE plan problems are a real bugaboo.

merlin



Re: Early WIP/PoC for inlining CTEs

2019-01-11 Thread Mike Rylander
On Fri, Jan 11, 2019 at 2:10 PM Robert Haas  wrote:
>
> On Fri, Jan 11, 2019 at 2:04 PM Andres Freund  wrote:
> > > Maybe we could consider a more extensible syntax that is attached to
> > > the contained SELECT rather than the containing WITH.  Then CTEs would
> > > be less special; there'd be a place to put hints controlling top-level
> > > queries, subselects, views etc too (perhaps eventually join hints,
> > > parallelism hints etc, but "materialize this" would be just another
> > > one of those things).  That'd be all-in.
> >
> > I think you have some purity arguments here, but the likelihood of us
> > developing a full-blown solution is not that high, and the lack of
> > inlinable CTEs is *really* hurting us. As long as the design doesn't
> > block a full solution, if we go there, I think it's a very acceptable
> > blemish in comparison to the benefits we'd get.
>
> Also, it seems to me that this is properly a property of the
> individual WITH clause, not the query as a whole.
>
> I mean I suppose we could do
>
> WITH or_with_out_you OPTIONS (materialized false) AS (SELECT 'mariah
> carey') SELECT ...
>
> That'd allow for extensibility, have the write scope, and look like
> what we do elsewhere.  It looks a little less elegant than
>
> WITH cte_name [[NOT] MATERIALIZED] AS (query) main_query...
>
> ...but maybe elegance for extensibility is a good trade.
>

Here, have $0.02 from the peanut gallery...

I mildly prefer the latter, elegant spelling, but if CTE inlining does
become a thing then I would /really/ want some way, any way, of
telling Postgres that I want it to materialize a particular CTE.

I use that currently-documented property of CTEs to structure large,
complicated OLAP queries on a regular basis, for performance.
Sometimes, such as when you have dozens of tables in a complex join
tree, breaking the query into logically related chunks (which I know
about, but the planner does not) via CTE is the only way to give the
planner a fighting chance of finding a good plan.  Otherwise you get
stuck in the GEQO ghetto, or planning time is some non-trivial
multiple of execution time.

Thanks,

--
Mike Rylander
 | Executive Director
 | Equinox Open Library Initiative
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  mi...@equinoxinitiative.org
 | web:  http://equinoxinitiative.org



Re: Early WIP/PoC for inlining CTEs

2019-01-11 Thread Robert Haas
On Fri, Jan 11, 2019 at 2:04 PM Andres Freund  wrote:
> > Maybe we could consider a more extensible syntax that is attached to
> > the contained SELECT rather than the containing WITH.  Then CTEs would
> > be less special; there'd be a place to put hints controlling top-level
> > queries, subselects, views etc too (perhaps eventually join hints,
> > parallelism hints etc, but "materialize this" would be just another
> > one of those things).  That'd be all-in.
>
> I think you have some purity arguments here, but the likelihood of us
> developing a full-blown solution is not that high, and the lack of
> inlinable CTEs is *really* hurting us. As long as the design doesn't
> block a full solution, if we go there, I think it's a very acceptable
> blemish in comparison to the benefits we'd get.

Also, it seems to me that this is properly a property of the
individual WITH clause, not the query as a whole.

I mean I suppose we could do

WITH or_with_out_you OPTIONS (materialized false) AS (SELECT 'mariah
carey') SELECT ...

That'd allow for extensibility, have the write scope, and look like
what we do elsewhere.  It looks a little less elegant than

WITH cte_name [[NOT] MATERIALIZED] AS (query) main_query...

...but maybe elegance for extensibility is a good trade.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Early WIP/PoC for inlining CTEs

2019-01-11 Thread Andres Freund
Hi,

On 2019-01-12 07:58:39 +1300, Thomas Munro wrote:
> On Sat, Jan 12, 2019 at 7:19 AM Andres Freund  wrote:
> > On 2019-01-11 11:12:25 -0500, Robert Haas wrote:
> > > I actually think that we should go "all in" here and allow the user to
> > > specify either that they want materialization or that they don't want
> > > materialization.  If they specify neither, then we make some decision
> > > which we may change in a future release.  If they do specify
> > > something, we do that.
> >
> > +many
> 
> I think the syntax as proposed is almost OK if we only want to
> grandfather in our historically hintful CTEs but discourage the
> development of any future kinds of hints.  Even then I don't love the
> way it formalises a semi-procedural step at the same language level as
> a glorious declarative relational query.
> 
> Maybe we could consider a more extensible syntax that is attached to
> the contained SELECT rather than the containing WITH.  Then CTEs would
> be less special; there'd be a place to put hints controlling top-level
> queries, subselects, views etc too (perhaps eventually join hints,
> parallelism hints etc, but "materialize this" would be just another
> one of those things).  That'd be all-in.

I think you have some purity arguments here, but the likelihood of us
developing a full-blown solution is not that high, and the lack of
inlinable CTEs is *really* hurting us. As long as the design doesn't
block a full solution, if we go there, I think it's a very acceptable
blemish in comparison to the benefits we'd get.

Greetings,

Andres Freund



Re: Early WIP/PoC for inlining CTEs

2019-01-11 Thread Thomas Munro
On Sat, Jan 12, 2019 at 7:19 AM Andres Freund  wrote:
> On 2019-01-11 11:12:25 -0500, Robert Haas wrote:
> > I actually think that we should go "all in" here and allow the user to
> > specify either that they want materialization or that they don't want
> > materialization.  If they specify neither, then we make some decision
> > which we may change in a future release.  If they do specify
> > something, we do that.
>
> +many

I think the syntax as proposed is almost OK if we only want to
grandfather in our historically hintful CTEs but discourage the
development of any future kinds of hints.  Even then I don't love the
way it formalises a semi-procedural step at the same language level as
a glorious declarative relational query.

Maybe we could consider a more extensible syntax that is attached to
the contained SELECT rather than the containing WITH.  Then CTEs would
be less special; there'd be a place to put hints controlling top-level
queries, subselects, views etc too (perhaps eventually join hints,
parallelism hints etc, but "materialize this" would be just another
one of those things).  That'd be all-in.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Early WIP/PoC for inlining CTEs

2019-01-11 Thread Robert Haas
On Fri, Jan 11, 2019 at 1:49 PM David Fetter  wrote:
> I don't see those as the same thing even slightly. Functions are
> Turing complete, generally speaking, which means that unless we send
> along those descriptors, we're asking the planner to solve the Halting
> Problem.

So... your argument is that functions are Turing-complete, but the
queries which call those functions somehow aren't?  Actually, there's
probably a decent argument that WITH RECURSIVE is Turing-complete even
without any fancy functions.  See
https://wiki.postgresql.org/wiki/Turing_Machine_(with_recursive)

> > OK, I know that's a bit of a straw man -- you're talking about hints
> > within a query, not DDL.  Still, I think our theory about not having
> > hints is that we should have the optimizer try to figure it out
> > instead of making the user specify the behavior that they want -- and
> > I think sometimes that's setting the bar at an impossible level.
>
> There is a worked example that's open source.
> https://github.com/ossc-db/pg_hint_plan
>
> Have we looked over it seriously for inclusion in PostgreSQL?

That really has very little to do with what's under discussion here,
unless you're proposing that the right strategy for determining
whether to materialize the CTE or not is to execute the query both
ways and then use that to construct the plan we use to execute the
query.

> When they're specifying it, are they specifying it globally, or
> per WITH clause, or...?

Per WITH clause.  That's the proposal which is under discussion here,
not anything else.  Did you read the thread?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Early WIP/PoC for inlining CTEs

2019-01-11 Thread David Fetter
On Fri, Jan 11, 2019 at 11:12:25AM -0500, Robert Haas wrote:
> > I know this is a thorny topic, but I have to say that I am uneasy
> > about the MATERIALIZED syntax.  Here's how you write that in some
> > other RDBMS that loves hints:
> >
> > WITH foo AS (SELECT /*+ MATERIALIZE */ ...)
> >
> > I understood that it was a long standing project policy that we don't
> > want planner hints, but now we have a proposal to support one with a
> > top-level non-standard syntax.  If we take this syntax, should we not
> > also accept MATERIALIZED in front of subselects?
> >
> > -1
> 
> I think this is unduly negative.  Do you want to also remove the
> IMMUTABLE, STABLE, and VOLATILE keywords from functions because those
> are hints to the planner as to how those things should be treated?

I don't see those as the same thing even slightly. Functions are
Turing complete, generally speaking, which means that unless we send
along those descriptors, we're asking the planner to solve the Halting
Problem.

> Should we remove CREATE INDEX -- which is, after all, a non-standard
> extension to SQL syntax -- because it presumes that the user is in a
> better position than we are to know which columns ought to be
> indexed?

Not yet.  Maybe in a decade or two.

> OK, I know that's a bit of a straw man -- you're talking about hints
> within a query, not DDL.  Still, I think our theory about not having
> hints is that we should have the optimizer try to figure it out
> instead of making the user specify the behavior that they want -- and
> I think sometimes that's setting the bar at an impossible level.

There is a worked example that's open source.
https://github.com/ossc-db/pg_hint_plan

Have we looked over it seriously for inclusion in PostgreSQL?

> It's not like this is a thing where we can get this right 90% of the
> time and with some more planner work we can drive it up to near 100%.
> We're going to be wrong a lot, even if we do expensive things like try
> planning it both ways and see which one comes out cheaper on cost, and
> we don't like driving up planner CPU consumption, either.  So it seems
> to me that letting the user say what they want is a very pragmatic
> approach.  Sometimes, like with things that have side effects, it's
> the only way to know that we're even delivering the right answer; and
> even when it's just an optimization problem, it's nice to give users a
> way of fixing our planning failures that is better than asking them to
> wait until we invent a way of improving the optimization decision in
> some future release - which we may never even do.
> 
> I actually think that we should go "all in" here and allow the user to
> specify either that they want materialization or that they don't want
> materialization.  If they specify neither, then we make some decision
> which we may change in a future release.  If they do specify
> something, we do that.

When they're specifying it, are they specifying it globally, or
per WITH clause, or...?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Early WIP/PoC for inlining CTEs

2019-01-11 Thread Andres Freund
On 2019-01-11 11:12:25 -0500, Robert Haas wrote:
> I actually think that we should go "all in" here and allow the user to
> specify either that they want materialization or that they don't want
> materialization.  If they specify neither, then we make some decision
> which we may change in a future release.  If they do specify
> something, we do that.

+many



Re: Early WIP/PoC for inlining CTEs

2019-01-11 Thread Robert Haas
> I know this is a thorny topic, but I have to say that I am uneasy
> about the MATERIALIZED syntax.  Here's how you write that in some
> other RDBMS that loves hints:
>
> WITH foo AS (SELECT /*+ MATERIALIZE */ ...)
>
> I understood that it was a long standing project policy that we don't
> want planner hints, but now we have a proposal to support one with a
> top-level non-standard syntax.  If we take this syntax, should we not
> also accept MATERIALIZED in front of subselects?
>
> -1

I think this is unduly negative.  Do you want to also remove the
IMMUTABLE, STABLE, and VOLATILE keywords from functions because those
are hints to the planner as to how those things should be treated?
Should we remove CREATE INDEX -- which is, after all, a non-standard
extension to SQL syntax -- because it presumes that the user is in a
better position than we are to know which columns ought to be indexed?

OK, I know that's a bit of a straw man -- you're talking about hints
within a query, not DDL.  Still, I think our theory about not having
hints is that we should have the optimizer try to figure it out
instead of making the user specify the behavior that they want -- and
I think sometimes that's setting the bar at an impossible level.  In
the case of things that have side effects, like FOR UPDATE/SHARE or
volatile functions, we really can't know the user's intention unless
the user tells us.  But even in cases where there are no side effects
and it's just a question of finding the most efficient plan, it
doesn't seem crazy to me to allow the user to give us a clue about
what they have in mind.

It's not like this is a thing where we can get this right 90% of the
time and with some more planner work we can drive it up to near 100%.
We're going to be wrong a lot, even if we do expensive things like try
planning it both ways and see which one comes out cheaper on cost, and
we don't like driving up planner CPU consumption, either.  So it seems
to me that letting the user say what they want is a very pragmatic
approach.  Sometimes, like with things that have side effects, it's
the only way to know that we're even delivering the right answer; and
even when it's just an optimization problem, it's nice to give users a
way of fixing our planning failures that is better than asking them to
wait until we invent a way of improving the optimization decision in
some future release - which we may never even do.

I actually think that we should go "all in" here and allow the user to
specify either that they want materialization or that they don't want
materialization.  If they specify neither, then we make some decision
which we may change in a future release.  If they do specify
something, we do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Early WIP/PoC for inlining CTEs

2019-01-09 Thread Thomas Munro
On Thu, Jan 10, 2019 at 2:28 PM Andreas Karlsson  wrote:
> 2. Feedback on the new syntax. I am personally fine with the current
> syntax, but it was just something I just quickly hacked together to move
> the patch forward and which also solved my personal uses cases.

Thanks for working on this.  I very much want to see this feature go
in.  As mentioned by Andres up-thread, TPC-DS makes a lot of use of
CTEs... let me see, 34 queries out of 99 have a WITH clause.  These
will hopefully become candidates for parallel query.

I know this is a thorny topic, but I have to say that I am uneasy
about the MATERIALIZED syntax.  Here's how you write that in some
other RDBMS that loves hints:

WITH foo AS (SELECT /*+ MATERIALIZE */ ...)

I understood that it was a long standing project policy that we don't
want planner hints, but now we have a proposal to support one with a
top-level non-standard syntax.  If we take this syntax, should we not
also accept MATERIALIZED in front of subselects?

-1

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Early WIP/PoC for inlining CTEs

2019-01-09 Thread Andreas Karlsson
Here is a new version of the patch with added tests, improved comments, 
some minor code cleanup and most importantly slightly changed logic for 
when we should inline.


The current strategy is to always inline unless the CTE is recursive or 
it has side effects, i.e. it is a DML query, it contains calls it a 
volatile function, or it contains FOR SHARE/FOR UPDATE. I feel this is a 
conservative approach which prevents behavioral changes (other than 
performance).


Current open issues:

1. Currently the patch will inline CTEs even when there are multiple 
references to them. If this is a win or not depends on the specific 
query so I am not sure what we should do here. One thing worth noting is 
that our current documentation talks a lot about how CTEs are only 
evaluated once.


"A useful property of WITH queries is that they are 
evaluated

only once per execution of the parent query, even if they are referred to
more than once by the parent query or sibling WITH 
queries.

Thus, expensive calculations that are needed in multiple places can be
placed within a WITH query to avoid redundant work. 
Another

possible application is to prevent unwanted multiple evaluations of
functions with side-effects."

What do you think?

2. Feedback on the new syntax. I am personally fine with the current 
syntax, but it was just something I just quickly hacked together to move 
the patch forward and which also solved my personal uses cases.


3. Are we happy with how I modified query_tree_walker()? I feel the code 
would be clearer if we could change the tree walker to treat the RTE as 
the parent node of the subquery instead of a sibling, but this seems 
like potentially a quite invasive change.


4. I need to update the user documentation.

Andreas
commit 60aab4af243cc93d504b99c99010593c02a5705c
Author: Andreas Karlsson 
Date:   Mon Sep 17 03:42:40 2018 +0200

CTE inlining

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index bb92d9d37a..8c26dd1f26 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1888,7 +1888,7 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 
 -- join in CTE
 EXPLAIN (VERBOSE, COSTS OFF)
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
  QUERY PLAN  
 -
  Limit
@@ -1905,7 +1905,7 @@ WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2
Output: t.c1_1, t.c2_1, t.c1_3
 (12 rows)
 
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
  c1_1 | c2_1 
 --+--
   101 |  101
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index f438165650..56602a164c 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -493,8 +493,8 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
 -- join in CTE
 EXPLAIN (VERBOSE, COSTS OFF)
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
 -- ctid with whole-row reference
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index 88bc189646..92f180c5cf 100644
--- a/doc/src/sgml/queries.sgml
+++ 

Re: Early WIP/PoC for inlining CTEs

2019-01-01 Thread Andreas Karlsson

On 1/1/19 1:42 AM, Andrew Gierth wrote:

"Andreas" == Andreas Karlsson  writes:


  Andreas> I believe I have fixed these except for the comment on the
  Andreas> conditions for when we inline.

  Andreas> Andrew Gierth: Why did you chose to not inline on FOR UPDATE
  Andreas> but inline volatile functions? I feel that this might be
  Andreas> inconsistent since in both cases the query in the CTE can
  Andreas> change behavior if the planner pushes a WHERE clause into the
  Andreas> subquery, but maybe I am missing something.

I chose not to inline FOR UPDATE because it was an obvious compatibility
break, potentially changing the set of locked rows, and it was an easy
condition to test.

I did not test for volatile functions simply because this was a very
early stage of the project (which wasn't my project, I was just
assisting someone else). I left the comment "this likely needs some
additional checks" there for a reason.


Thanks, that makes sense! I will need to ponder some on if the behavior 
change when predicates are pushed into a subquery with volatile 
functions is ok. I am leaning towards no, because otherwise inlining 
CTEs would affect more than query performance.


Andreas



Re: Early WIP/PoC for inlining CTEs

2019-01-01 Thread Andreas Karlsson

On 1/1/19 3:18 AM, Andrew Gierth wrote:

I had a comment around here which seems to have been lost:

  * Secondly, views (and explicit subqueries) currently have
  * different behaviour w.r.t. SELECT FOR UPDATE than CTEs do. A
  * FOR UPDATE clause is treated as extending into views and
  * subqueries, but not into CTEs. We preserve this distinction
  * by not trying to push rowmarks into the new subquery.

This comment seems to me to be worth preserving (unless this behavior is
changed). What I'm referring to is the following, which is unchanged by
the patch:

create table t1 as select 123 as a;
create view v1 as select * from t1;
select * from t1 for update;  -- locks row in t1
select * from t1 for update of t1;  -- locks row in t1
select * from v1 for update;  -- locks row in t1
select * from v1 for update of v1;  -- locks row in t1
select * from (select * from t1) s1 for update;  -- locks row in t1
select * from (select * from t1) s1 for update of s1;  -- locks row in t1
with c1 as (select * from t1)
   select * from c1 for update;  -- does NOT lock anything at all
with c1 as (select * from t1)
   select * from c1 for update of c1;  -- parse-time error

(Obviously, inlining decisions should not change what gets locked;
the behavior here should not be changed unless it is changed for both
inlined and non-inlined CTEs.)


I see, I misread the comment. I will re-add it, possibly with some word 
smithing. Thanks!


Andreas




Re: Early WIP/PoC for inlining CTEs

2018-12-31 Thread Andrew Gierth
> "Andreas" == Andreas Karlsson  writes:

 Andreas> +   if (rte->rtekind == RTE_CTE &&
 Andreas> +   strcmp(rte->ctename, context->ctename) == 0 &&
 Andreas> +   rte->ctelevelsup == context->levelsup)
 Andreas> +   {
 Andreas> +   Query *newquery = copyObject(context->ctequery);
 Andreas> +
 Andreas> +   /* Preserve outer references, for example to other CTEs */
 Andreas> +   if (context->levelsup > 0)
 Andreas> +   IncrementVarSublevelsUp((Node *) newquery, 
context->levelsup, 1);

I had a comment around here which seems to have been lost:

 * Secondly, views (and explicit subqueries) currently have
 * different behaviour w.r.t. SELECT FOR UPDATE than CTEs do. A
 * FOR UPDATE clause is treated as extending into views and
 * subqueries, but not into CTEs. We preserve this distinction
 * by not trying to push rowmarks into the new subquery.

This comment seems to me to be worth preserving (unless this behavior is
changed). What I'm referring to is the following, which is unchanged by
the patch:

create table t1 as select 123 as a;
create view v1 as select * from t1;
select * from t1 for update;  -- locks row in t1
select * from t1 for update of t1;  -- locks row in t1
select * from v1 for update;  -- locks row in t1
select * from v1 for update of v1;  -- locks row in t1
select * from (select * from t1) s1 for update;  -- locks row in t1
select * from (select * from t1) s1 for update of s1;  -- locks row in t1
with c1 as (select * from t1)
  select * from c1 for update;  -- does NOT lock anything at all
with c1 as (select * from t1)
  select * from c1 for update of c1;  -- parse-time error

(Obviously, inlining decisions should not change what gets locked;
the behavior here should not be changed unless it is changed for both
inlined and non-inlined CTEs.)

-- 
Andrew (irc:RhodiumToad)



Re: Early WIP/PoC for inlining CTEs

2018-12-31 Thread Andrew Gierth
> "Andreas" == Andreas Karlsson  writes:

 Andreas> I believe I have fixed these except for the comment on the
 Andreas> conditions for when we inline.

 Andreas> Andrew Gierth: Why did you chose to not inline on FOR UPDATE
 Andreas> but inline volatile functions? I feel that this might be
 Andreas> inconsistent since in both cases the query in the CTE can
 Andreas> change behavior if the planner pushes a WHERE clause into the
 Andreas> subquery, but maybe I am missing something.

I chose not to inline FOR UPDATE because it was an obvious compatibility
break, potentially changing the set of locked rows, and it was an easy
condition to test.

I did not test for volatile functions simply because this was a very
early stage of the project (which wasn't my project, I was just
assisting someone else). I left the comment "this likely needs some
additional checks" there for a reason.

-- 
Andrew (irc:RhodiumToad)



Re: Early WIP/PoC for inlining CTEs

2018-12-31 Thread Andreas Karlsson

On 11/16/18 10:15 PM, Tom Lane wrote:
> I took a little bit of a look through this.  Some thoughts:

Thanks for the review! I have decided to pick up this patch and work on 
it since nothing has happened in a while. Here is a new version with 
most of the feedback fixed.



* This is not the idiomatic way to declare an expression tree walker:

+static bool inline_cte_walker(Node *node, void *ctxp)
+{
+   struct inline_cte_walker_ctx *ctx = ctxp;

* I have no faith in the idea that we can skip doing a copyObject on the
inlined subquery, except maybe in the case where we know there's exactly
one reference.

* In "here's where we do the actual substitution", if we're going to
scribble on the RTE rather than make a new one, we must take pains
to zero out the RTE_CTE-specific fields so that the RTE looks the
same as if it had been a RTE_SUBQUERY all along; cf db1071d4e.

* The lack of comments about what conditions we inline under
(at subselect.c:1318) is distressing.  I'm not particularly
in love with the way that inline_cte_walker is commented, either.
And dare I mention that this falsifies the intro comment for
SS_process_ctes?

* Speaking of the comments, I'm not convinced that view rewrite is
a good comparison point; I think this is more like subquery pullup.


I believe I have fixed these except for the comment on the conditions 
for when we inline.


Andrew Gierth: Why did you chose to not inline on FOR UPDATE but inline 
volatile functions? I feel that this might be inconsistent since in both 
cases the query in the CTE can change behavior if the planner pushes a 
WHERE clause into the subquery, but maybe I am missing something.



* I wonder whether we should make range_table_walker more friendly
to the needs of this patch.  The fact that it doesn't work for this usage
suggests that it might not work for others, too.  I could see replacing
the QTW_EXAMINE_RTES flag with QTW_EXAMINE_RTES_BEFORE and
QTW_EXAMINE_RTES_AFTER so that callers could say which order of operations
they want (ie, visit RTE itself before or after its substructure).  Then
we could get rid of the double traversal of the RTE list.


I did as suggested and the code is now much cleaner, but I feel while 
RTE walking business would become cleaner if we could change from having 
the range table walker yield both RTEs and the contents of the RTEs to 
having it just yeild the RTEs and then the walker callback can call 
expression_tree_walker() with the RTE so RTEs are treated like any other 
node in the tree.


I might look into how big impact such a change would have and if it is 
worth the churn.



* I think a large fraction of the regression test cases that this
changes are actually broken by the patch, in the sense that we meant
to test the behavior with a CTE and now we're not getting that.
So we'd need to add MATERIALIZED in many more places than this has
done.  Somebody else (Stephen?) would need to opine on whether that's
true for the CTEs in rowsecurity.sql, but it's definitely true for
the one in rowtypes.sql, where the point is to test what happens
with a whole-row Var.


Agreed, fixed.


* Which will mean we need some new test cases showing that this patch does
anything.  It'd be a good idea to add a test case showing that this gets
things right for conflicting CTE names at different levels, eg

explain verbose
with x as (select 1 as y)
select * from (with x as (select 2 as y) select * from x) ss;


Added this test case, but more are needed. Any suggestion for what file 
these tests belong (right now I just added it to subselect.sql)? Or 
should I add a new called cte.sql?



* ruleutils.c needs adjustments for the new syntax, if we keep that.


Thanks, fixed!


* And of course the documentation needs much more work than this.


Yeah, I was waiting for there to be more agreement on when CTEs should 
be inlined, but maybe I should start writing anyway.


Andreas

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index bb92d9d37a..8c26dd1f26 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1888,7 +1888,7 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 
 -- join in CTE
 EXPLAIN (VERBOSE, COSTS OFF)
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
  QUERY PLAN  
 -
  Limit
@@ -1905,7 +1905,7 @@ WITH t (c1_1, c1_3, c2_1) AS 

Re: Early WIP/PoC for inlining CTEs

2018-11-18 Thread Craig Ringer
On Sat, 17 Nov 2018 at 10:12, Stephen Frost  wrote:

> Greetings,
>
> * Andrew Gierth (and...@tao11.riddles.org.uk) wrote:
> > > "Tom" == Tom Lane  writes:
> >
> >  >> [ inlining-ctes-v5.patch ]
> >
> >  Tom> I took a little bit of a look through this.  Some thoughts:
> >
> >  Tom> * I think it'd be a good idea if we made OFFSET/LIMIT in a CTE be
> >  Tom> an alternate way of keeping it from being inlined. As the patch
> >  Tom> stands, if that's the behavior you want, you have no way to
> >  Tom> express it in a query that will also work in older servers. (I
> >  Tom> will manfully resist suggesting that then we don't need the
> >  Tom> nonstandard syntax at all ... oops, too late.)
> >
> > I think this is the wrong approach, because you may want the
> > optimization-barrier effects of OFFSET/LIMIT _without_ the actual
> > materialization - there is no need to force a query like
> >
> > with d as (select stuff from bigtable offset 1) select * from d;
> >
> > to push all the data through an (on-disk) tuplestore.
>
> Agreed, there's going to be cases where you want the CTE to be inlined
> even with OFFSET/LIMIT.  Let's please not cater to the crowd who
> happened to know that they could hack around with OFFSET/LIMIT to make
> something not be inlined when it comes to the question of if the CTE
> should be inlined or not.  That's the same issue we were argueing around
> when discussing if we should allow parallel array_agg, imv.
>
> Particularly since, with CTEs anyway, we never inlined them, so the
> whole OFFSET/LIMIT thing doesn't really make any sense- today, if you
> wrote a CTE, you wouldn't bother with OFFSET/LIMIT because you knew it
> wasn't going to be inlined, that entire line of thinking is for
> subqueries, not CTEs.  If you're going to force people to change their
> CTEs to require that they not be inlined, let's not pick a method which
> makes it ambiguous and makes us have to ask "do they really want this
> limit/offset, or did they just want to make the CTE not be inlined...?"
>
>
To satisfy Tom's understandable desire to let people write queries that
behave the same on old and new versions, can we get away with back-patching
the MATERIALIZED parser enhancement as a no-op in point releases?


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Early WIP/PoC for inlining CTEs

2018-11-16 Thread Stephen Frost
Greetings,

* Andrew Gierth (and...@tao11.riddles.org.uk) wrote:
> > "Tom" == Tom Lane  writes:
> 
>  >> [ inlining-ctes-v5.patch ]
> 
>  Tom> I took a little bit of a look through this.  Some thoughts:
> 
>  Tom> * I think it'd be a good idea if we made OFFSET/LIMIT in a CTE be
>  Tom> an alternate way of keeping it from being inlined. As the patch
>  Tom> stands, if that's the behavior you want, you have no way to
>  Tom> express it in a query that will also work in older servers. (I
>  Tom> will manfully resist suggesting that then we don't need the
>  Tom> nonstandard syntax at all ... oops, too late.)
> 
> I think this is the wrong approach, because you may want the
> optimization-barrier effects of OFFSET/LIMIT _without_ the actual
> materialization - there is no need to force a query like
> 
> with d as (select stuff from bigtable offset 1) select * from d;
> 
> to push all the data through an (on-disk) tuplestore.

Agreed, there's going to be cases where you want the CTE to be inlined
even with OFFSET/LIMIT.  Let's please not cater to the crowd who
happened to know that they could hack around with OFFSET/LIMIT to make
something not be inlined when it comes to the question of if the CTE
should be inlined or not.  That's the same issue we were argueing around
when discussing if we should allow parallel array_agg, imv.

Particularly since, with CTEs anyway, we never inlined them, so the
whole OFFSET/LIMIT thing doesn't really make any sense- today, if you
wrote a CTE, you wouldn't bother with OFFSET/LIMIT because you knew it
wasn't going to be inlined, that entire line of thinking is for
subqueries, not CTEs.  If you're going to force people to change their
CTEs to require that they not be inlined, let's not pick a method which
makes it ambiguous and makes us have to ask "do they really want this
limit/offset, or did they just want to make the CTE not be inlined...?"

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Early WIP/PoC for inlining CTEs

2018-11-16 Thread Tom Lane
Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> * I have no faith in the idea that we can skip doing a copyObject
>  Tom> on the inlined subquery, except maybe in the case where we know
>  Tom> there's exactly one reference.

> The code doesn't do a copyObject on the query if there are no level
> changes because everywhere else just does another copyObject on it as
> the first processing step (cf. set_subquery_pathlist and the various
> functions called from pull_up_subqueries).

Perhaps it's safe today, but is that likely to remain true?  We've had
enough pain in the past from multiply-linked parse subtrees that I am
not eager to introduce another case, especially not here where there's
absolutely no evidence that it'd provide a meaningful performance
improvement.

>  Tom> * Speaking of the comments, I'm not convinced that view rewrite is
>  Tom> a good comparison point; I think this is more like subquery
>  Tom> pullup.

> It's not really like subquery pullup because we actually do go on to do
> subquery pullup _after_ inlining CTEs.

Subquery pullup can happen across multiple levels, too.

regards, tom lane



Re: Early WIP/PoC for inlining CTEs

2018-11-16 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> [ inlining-ctes-v5.patch ]

 Tom> I took a little bit of a look through this.  Some thoughts:

 Tom> * I think it'd be a good idea if we made OFFSET/LIMIT in a CTE be
 Tom> an alternate way of keeping it from being inlined. As the patch
 Tom> stands, if that's the behavior you want, you have no way to
 Tom> express it in a query that will also work in older servers. (I
 Tom> will manfully resist suggesting that then we don't need the
 Tom> nonstandard syntax at all ... oops, too late.)

I think this is the wrong approach, because you may want the
optimization-barrier effects of OFFSET/LIMIT _without_ the actual
materialization - there is no need to force a query like

with d as (select stuff from bigtable offset 1) select * from d;

to push all the data through an (on-disk) tuplestore.

 Tom> * I have no faith in the idea that we can skip doing a copyObject
 Tom> on the inlined subquery, except maybe in the case where we know
 Tom> there's exactly one reference.

The code doesn't do a copyObject on the query if there are no level
changes because everywhere else just does another copyObject on it as
the first processing step (cf. set_subquery_pathlist and the various
functions called from pull_up_subqueries).

 Tom> * In "here's where we do the actual substitution", if we're going
 Tom> to scribble on the RTE rather than make a new one, we must take
 Tom> pains to zero out the RTE_CTE-specific fields so that the RTE
 Tom> looks the same as if it had been a RTE_SUBQUERY all along; cf
 Tom> db1071d4e.

Yes, this needs fixing. (This code predates that commit.)

 Tom> * Speaking of the comments, I'm not convinced that view rewrite is
 Tom> a good comparison point; I think this is more like subquery
 Tom> pullup.

It's not really like subquery pullup because we actually do go on to do
subquery pullup _after_ inlining CTEs.

 Tom> * I wonder whether we should make range_table_walker more friendly
 Tom> to the needs of this patch. The fact that it doesn't work for this
 Tom> usage suggests that it might not work for others, too. I could see
 Tom> replacing the QTW_EXAMINE_RTES flag with QTW_EXAMINE_RTES_BEFORE
 Tom> and QTW_EXAMINE_RTES_AFTER so that callers could say which order
 Tom> of operations they want (ie, visit RTE itself before or after its
 Tom> substructure). Then we could get rid of the double traversal of
 Tom> the RTE list.

Sure, why not.

-- 
Andrew.



Re: Early WIP/PoC for inlining CTEs

2018-11-16 Thread David Fetter
On Fri, Nov 16, 2018 at 04:15:10PM -0500, Tom Lane wrote:
> Andreas Karlsson  writes:
> >  [ inlining-ctes-v5.patch ]
> 
> I took a little bit of a look through this.  Some thoughts:
> 
> * I think it'd be a good idea if we made OFFSET/LIMIT in a CTE be an
> alternate way of keeping it from being inlined.  As the patch stands,
> if that's the behavior you want, you have no way to express it in
> a query that will also work in older servers.  (I will manfully
> resist suggesting that then we don't need the nonstandard syntax
> at all ... oops, too late.)

If we're on board with exposing pilot error, we could decide not to
implement the nonstandard WITH syntax. One type of pilot error this
would expose is a situation where:

- A UDF has side effects, but is declared IMMUTABLE
- A WITH clause calls it in order to get those side effects on the
  entire result set

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Early WIP/PoC for inlining CTEs

2018-11-16 Thread Tom Lane
Andreas Karlsson  writes:
>  [ inlining-ctes-v5.patch ]

I took a little bit of a look through this.  Some thoughts:

* I think it'd be a good idea if we made OFFSET/LIMIT in a CTE be an
alternate way of keeping it from being inlined.  As the patch stands,
if that's the behavior you want, you have no way to express it in
a query that will also work in older servers.  (I will manfully
resist suggesting that then we don't need the nonstandard syntax
at all ... oops, too late.)

* This is not the idiomatic way to declare an expression tree walker:

+static bool inline_cte_walker(Node *node, void *ctxp)
+{
+   struct inline_cte_walker_ctx *ctx = ctxp;

* I have no faith in the idea that we can skip doing a copyObject on the
inlined subquery, except maybe in the case where we know there's exactly
one reference.

* In "here's where we do the actual substitution", if we're going to
scribble on the RTE rather than make a new one, we must take pains
to zero out the RTE_CTE-specific fields so that the RTE looks the
same as if it had been a RTE_SUBQUERY all along; cf db1071d4e.

* The lack of comments about what conditions we inline under
(at subselect.c:1318) is distressing.  I'm not particularly
in love with the way that inline_cte_walker is commented, either.
And dare I mention that this falsifies the intro comment for
SS_process_ctes?

* Speaking of the comments, I'm not convinced that view rewrite is
a good comparison point; I think this is more like subquery pullup.

* I wonder whether we should make range_table_walker more friendly
to the needs of this patch.  The fact that it doesn't work for this usage
suggests that it might not work for others, too.  I could see replacing
the QTW_EXAMINE_RTES flag with QTW_EXAMINE_RTES_BEFORE and
QTW_EXAMINE_RTES_AFTER so that callers could say which order of operations
they want (ie, visit RTE itself before or after its substructure).  Then
we could get rid of the double traversal of the RTE list.

* I think a large fraction of the regression test cases that this
changes are actually broken by the patch, in the sense that we meant
to test the behavior with a CTE and now we're not getting that.
So we'd need to add MATERIALIZED in many more places than this has
done.  Somebody else (Stephen?) would need to opine on whether that's
true for the CTEs in rowsecurity.sql, but it's definitely true for
the one in rowtypes.sql, where the point is to test what happens
with a whole-row Var.

* Which will mean we need some new test cases showing that this patch does
anything.  It'd be a good idea to add a test case showing that this gets
things right for conflicting CTE names at different levels, eg

explain verbose
with x as (select 1 as y)
select * from (with x as (select 2 as y) select * from x) ss;

* ruleutils.c needs adjustments for the new syntax, if we keep that.

* And of course the documentation needs much more work than this.

regards, tom lane



Re: Early WIP/PoC for inlining CTEs

2018-10-05 Thread Andrew Gierth
> "David" == David Fetter  writes:

 >> Consider the difference between (in the absence of CTE inlining):
 >> 
 >> -- inline subquery with no optimization barrier (qual may be pushed down)
 >> select * from (select x from y) s where x=1;

 David> ...and doesn't need to materialize all of y,

 >> -- inline subquery with optimization barrier (qual not pushed down)
 >> select * from (select x from y offset 0) s where x=1;
 >> 
 >> -- CTE with materialization
 >> with s as (select x from y) select * from s where x=1;

 David> while both of these do.

The non-CTE one has to _evaluate_ the whole of the "s" subquery, but it
doesn't have to actually store the result, whereas the CTE version needs
to put it all in a tuplestore and read it back.

-- 
Andrew (irc:RhodiumToad)



Re: Early WIP/PoC for inlining CTEs

2018-10-05 Thread David Fetter
On Fri, Oct 05, 2018 at 01:40:05AM +0100, Andrew Gierth wrote:
> > "Andreas" == Andreas Karlsson  writes:
> 
>  > On 10/03/2018 05:57 PM, David Fetter wrote:
>  >> Is there any meaningful distinction between "inlining," by which I
>  >> mean converting to a subquery, and predicate pushdown, which
>  >> would happen at least for a first cut, at the rewrite stage?
> 
> Yes.
> 
>  Andreas> Sorry, but I do not think I understand your question. The
>  Andreas> ability to push down predicates is just one of the potential
>  Andreas> benefits from inlining.
> 
> Consider the difference between (in the absence of CTE inlining):
> 
> -- inline subquery with no optimization barrier (qual may be pushed down)
> select * from (select x from y) s where x=1;

...and doesn't need to materialize all of y,

> -- inline subquery with optimization barrier (qual not pushed down)
> select * from (select x from y offset 0) s where x=1;
> 
> -- CTE with materialization
> with s as (select x from y) select * from s where x=1;

while both of these do.  I was interested to discover that on my
synthetic test of 10 million integers from generate_series(1,1000)
both with and without a b-tree index on x--as expected, the index has
no effect--I consistently get stuff like this:

shackle@[local]:5432/shackle(10.5)(18539) > explain (analyze, verbose, costs 
on, buffers on, timing on) with s as (select x from y) select * from s where 
x=1;
  QUERY PLAN
  
══
 CTE Scan on s  (cost=144247.77..369247.25 rows=5 width=4) (actual 
time=0.213..2287.355 rows=1 loops=1)
   Output: s.x
   Filter: (s.x = 1)
   Rows Removed by Filter: 999
   Buffers: shared hit=16310 read=27938, temp written=17089
   CTE s
 ->  Seq Scan on public.y  (cost=0.00..144247.77 rows=977 width=4) 
(actual time=0.208..593.426 rows=1000 loops=1)
   Output: y.x
   Buffers: shared hit=16310 read=27938
 Planning time: 0.110 ms
 Execution time: 2313.682 ms
(11 rows)

shackle@[local]:5432/shackle(10.5)(18539) > explain (analyze, verbose, costs 
on, buffers on, timing on) select * from (select x from y offset 0) s where x=1;
 QUERY PLAN 


 Subquery Scan on s  (cost=0.00..269247.48 rows=1 width=4) (actual 
time=0.734..1069.012 rows=1 loops=1)
   Output: s.x
   Filter: (s.x = 1)
   Rows Removed by Filter: 999
   Buffers: shared hit=16316 read=27932
   ->  Seq Scan on public.y  (cost=0.00..144247.77 rows=977 width=4) 
(actual time=0.731..539.463 rows=1000 loops=1)
 Output: y.x
 Buffers: shared hit=16316 read=27932
 Planning time: 0.114 ms
 Execution time: 1069.032 ms
(10 rows)

i.e. for this case, the CTE scan takes over 2.3x the time the simple
materialization does. Also, when I boost work_mem to 1GB (256MB wasn't
enough to avoid "temp written"), there's still a 1.8x penalty.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Early WIP/PoC for inlining CTEs

2018-10-04 Thread Andrew Gierth
> "Andreas" == Andreas Karlsson  writes:

 > On 10/03/2018 05:57 PM, David Fetter wrote:
 >> Is there any meaningful distinction between "inlining," by which I
 >> mean converting to a subquery, and predicate pushdown, which
 >> would happen at least for a first cut, at the rewrite stage?

Yes.

 Andreas> Sorry, but I do not think I understand your question. The
 Andreas> ability to push down predicates is just one of the potential
 Andreas> benefits from inlining.

Consider the difference between (in the absence of CTE inlining):

-- inline subquery with no optimization barrier (qual may be pushed down)
select * from (select x from y) s where x=1;

-- inline subquery with optimization barrier (qual not pushed down)
select * from (select x from y offset 0) s where x=1;

-- CTE with materialization
with s as (select x from y) select * from s where x=1;

-- 
Andrew (irc:RhodiumToad)



Re: Early WIP/PoC for inlining CTEs

2018-10-04 Thread David Fetter
On Thu, Oct 04, 2018 at 11:22:32AM +0200, Andreas Karlsson wrote:
> On 10/03/2018 05:57 PM, David Fetter wrote:
> >Is there any meaningful distinction between "inlining," by which I
> >mean converting to a subquery, and predicate pushdown, which
> >would happen at least for a first cut, at the rewrite stage?
> 
> Sorry, but I do not think I understand your question. The ability to push
> down predicates is just one of the potential benefits from inlining.

Oracle appears to have such a distinction, and it appears we don't.
https://medium.com/@hakibenita/be-careful-with-cte-in-postgresql-fca5e24d2119

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Early WIP/PoC for inlining CTEs

2018-10-04 Thread Andreas Karlsson

On 10/03/2018 05:57 PM, David Fetter wrote:

Is there any meaningful distinction between "inlining," by which I
mean converting to a subquery, and predicate pushdown, which
would happen at least for a first cut, at the rewrite stage?


Sorry, but I do not think I understand your question. The ability to 
push down predicates is just one of the potential benefits from inlining.


Andreas



Re: Early WIP/PoC for inlining CTEs

2018-10-03 Thread David Fetter
On Tue, Oct 02, 2018 at 12:03:06PM +0200, Andreas Karlsson wrote:
> Hi,
> 
> Here is an updated patch which adds some simple syntax for adding the
> optimization barrier. For example:
> 
> WITH x AS MATERIALIZED (
>SELECT 1
> )
> SELECT * FROM x;
> 
> Andreas

This is great!

Is there any meaningful distinction between "inlining," by which I
mean converting to a subquery, and predicate pushdown, which
would happen at least for a first cut, at the rewrite stage?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Early WIP/PoC for inlining CTEs

2018-10-02 Thread Andreas Karlsson

Hi,

Here is an updated patch which adds some simple syntax for adding the 
optimization barrier. For example:


WITH x AS MATERIALIZED (
   SELECT 1
)
SELECT * FROM x;

Andreas
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 21a2ef5ad3a..017b73e0a29 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1888,7 +1888,7 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 
 -- join in CTE
 EXPLAIN (VERBOSE, COSTS OFF)
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
  QUERY PLAN  
 -
  Limit
@@ -1905,7 +1905,7 @@ WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2
Output: t.c1_1, t.c2_1, t.c1_3
 (12 rows)
 
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
  c1_1 | c2_1 
 --+--
   101 |  101
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 88c4cb4783f..3032aed34f9 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -501,8 +501,8 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
 -- join in CTE
 EXPLAIN (VERBOSE, COSTS OFF)
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
 -- ctid with whole-row reference
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 4db8142afaa..43f1b0e3f15 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -72,7 +72,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionand with_query is:
 
-with_query_name [ ( column_name [, ...] ) ] AS ( select | values | insert | update | delete )
+with_query_name [ ( column_name [, ...] ) ] AS [ MATERIALIZED ] ( select | values | insert | update | delete )
 
 TABLE [ ONLY ] table_name [ * ]
 
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 925cb8f3800..84435acb83e 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2542,6 +2542,7 @@ _copyCommonTableExpr(const CommonTableExpr *from)
 	COPY_NODE_FIELD(ctequery);
 	COPY_LOCATION_FIELD(location);
 	COPY_SCALAR_FIELD(cterecursive);
+	COPY_SCALAR_FIELD(ctematerialized);
 	COPY_SCALAR_FIELD(cterefcount);
 	COPY_NODE_FIELD(ctecolnames);
 	COPY_NODE_FIELD(ctecoltypes);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 3bb91c95958..65875d90201 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2794,6 +2794,7 @@ _equalCommonTableExpr(const CommonTableExpr *a, const CommonTableExpr *b)
 	COMPARE_NODE_FIELD(ctequery);
 	COMPARE_LOCATION_FIELD(location);
 	COMPARE_SCALAR_FIELD(cterecursive);
+	COMPARE_SCALAR_FIELD(ctematerialized);
 	COMPARE_SCALAR_FIELD(cterefcount);
 	COMPARE_NODE_FIELD(ctecolnames);
 	COMPARE_NODE_FIELD(ctecoltypes);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 22dbae15d3b..56bfdcc5d10 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -3094,6 +3094,7 @@ _outCommonTableExpr(StringInfo str, const CommonTableExpr *node)
 	WRITE_NODE_FIELD(ctequery);
 	

Re: Early WIP/PoC for inlining CTEs

2018-10-01 Thread Andreas Karlsson

On 07/27/2018 10:10 AM, David Fetter wrote:

On Fri, Jul 27, 2018 at 02:55:26PM +1200, Thomas Munro wrote:

On Thu, Jul 26, 2018 at 7:14 AM, David Fetter  wrote:

Please find attached the next version, which passes 'make check'.


... but not 'make check-world' (contrib/postgres_fdw's EXPLAIN is different).


Please find attached a patch that does.

It doesn't always pass make installcheck-world, but I need to sleep
rather than investigate that at the moment.


I took a quick look at this patch and I have a couple of comments.

1) Is it really safe, for backwards compatibility reasons, to inline 
when there are volatile functions? I imagine that it is possible that 
there are people who rely on that volatile functions inside a CTE are 
always run.


Imagine this case:

WITH cte AS (
  SELECT x, volatile_f(x) FROM tab ORDER BY x
)
SELECT * FROM cte LIMIT 10;

It could change behavior if volatile_f() has side effects and we inline 
the CTE. Is backwards compatibility for cases like this worth 
preserving? People can get the old behavior by adding OFFSET 0 or 
MATERIALIZED, but existing code would break.


2) The code in inline_cte_walker() is quite finicky. It looks correct to 
me but it is hard to follow and some simple bug could easily be hiding 
in there. I wonder if this code could be rewritten in some way to make 
it easier to follow.


3) Can you recall what the failing test was because I have so far not 
managed to reproduce it?


Andreas



Re: Early WIP/PoC for inlining CTEs

2018-08-07 Thread Andres Freund
Hi,

On 2018-08-08 16:55:22 +1200, Thomas Munro wrote:
> On Fri, Jul 27, 2018 at 8:10 PM, David Fetter  wrote:
> > On Fri, Jul 27, 2018 at 02:55:26PM +1200, Thomas Munro wrote:
> >> On Thu, Jul 26, 2018 at 7:14 AM, David Fetter  wrote:
> >> > Please find attached the next version, which passes 'make check'.
> >>
> >> ... but not 'make check-world' (contrib/postgres_fdw's EXPLAIN is 
> >> different).
> >
> > Please find attached a patch that does.
> >
> > It doesn't always pass make installcheck-world, but I need to sleep
> > rather than investigate that at the moment.
> 
> One observation I wanted to share: CTE scans inhibit parallelism today
> (something we might eventually want to fix with shared tuplestores).
> This patch therefore allows parallelism in some WITH queries, which
> seems like a very valuable thing.

Might be interesting to see how big a difference it makes for
TPC-DS. Currently the results are bad (as in many queries don't finish
in a relevant time) because it uses CTEs so widely, and there's often
predicates outside the CTE that could be pushed down.

- Andres



Re: Early WIP/PoC for inlining CTEs

2018-08-07 Thread Thomas Munro
On Fri, Jul 27, 2018 at 8:10 PM, David Fetter  wrote:
> On Fri, Jul 27, 2018 at 02:55:26PM +1200, Thomas Munro wrote:
>> On Thu, Jul 26, 2018 at 7:14 AM, David Fetter  wrote:
>> > Please find attached the next version, which passes 'make check'.
>>
>> ... but not 'make check-world' (contrib/postgres_fdw's EXPLAIN is different).
>
> Please find attached a patch that does.
>
> It doesn't always pass make installcheck-world, but I need to sleep
> rather than investigate that at the moment.

One observation I wanted to share: CTE scans inhibit parallelism today
(something we might eventually want to fix with shared tuplestores).
This patch therefore allows parallelism in some WITH queries, which
seems like a very valuable thing.   Example:

postgres=# create table foo as select generate_series(1, 100) i;
SELECT 100
postgres=# create table bar as select generate_series(1, 100) i;
SELECT 100
postgres=# create table baz as select generate_series(1, 100) i;
SELECT 100
postgres=# analyze;
ANALYZE

=== unpatched master ===

postgres=# explain analyze with cte as (select * from foo join bar
using (i)) select count(*) from cte join baz using (i);
QUERY PLAN
---
 Aggregate  (cost=149531.00..149531.01 rows=1 width=8) (actual
time=4400.951..4400.951 rows=1 loops=1)
   CTE cte
 ->  Hash Join  (cost=30832.00..70728.00 rows=100 width=4)
(actual time=551.243..1961.319 rows=100 loops=1)
   Hash Cond: (foo.i = bar.i)
   ->  Seq Scan on foo  (cost=0.00..14425.00 rows=100
width=4) (actual time=0.048..219.238 rows=100 loops=1)
   ->  Hash  (cost=14425.00..14425.00 rows=100 width=4)
(actual time=550.477..550.478 rows=100 loops=1)
 Buckets: 131072  Batches: 16  Memory Usage: 3227kB
 ->  Seq Scan on bar  (cost=0.00..14425.00
rows=100 width=4) (actual time=0.031..213.238 rows=100
loops=1)
   ->  Hash Join  (cost=30832.00..76303.00 rows=100 width=0)
(actual time=1090.162..4279.945 rows=100 loops=1)
 Hash Cond: (cte.i = baz.i)
 ->  CTE Scan on cte  (cost=0.00..2.00 rows=100
width=4) (actual time=551.247..2564.529 rows=100 loops=1)
 ->  Hash  (cost=14425.00..14425.00 rows=100 width=4)
(actual time=538.833..538.833 rows=100 loops=1)
   Buckets: 131072  Batches: 16  Memory Usage: 3227kB
   ->  Seq Scan on baz  (cost=0.00..14425.00 rows=100
width=4) (actual time=0.039..208.658 rows=100 loops=1)
 Planning Time: 0.291 ms
 Execution Time: 4416.732 ms
(16 rows)

=== 0001-Inlining-CTEs-v0005.patch  ===

postgres=# explain analyze with cte as (select * from foo join bar
using (i)) select count(*) from cte join baz using (i);

QUERY PLAN

 Finalize Aggregate  (cost=57854.78..57854.79 rows=1 width=8) (actual
time=1441.663..1441.664 rows=1 loops=1)
   ->  Gather  (cost=57854.57..57854.78 rows=2 width=8) (actual
time=1440.506..1474.974 rows=3 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 ->  Partial Aggregate  (cost=56854.57..56854.58 rows=1
width=8) (actual time=1435.017..1435.018 rows=1 loops=3)
   ->  Parallel Hash Join  (cost=30856.01..55812.90
rows=416667 width=0) (actual time=1135.164..1393.437 rows=33
loops=3)
 Hash Cond: (foo.i = baz.i)
 ->  Parallel Hash Join  (cost=15428.00..32202.28
rows=416667 width=8) (actual time=457.786..753.374 rows=33
loops=3)
   Hash Cond: (foo.i = bar.i)
   ->  Parallel Seq Scan on foo
(cost=0.00..8591.67 rows=416667 width=4) (actual time=0.094..87.666
rows=33 loops=3)
   ->  Parallel Hash  (cost=8591.67..8591.67
rows=416667 width=4) (actual time=217.222..217.222 rows=33
loops=3)
 Buckets: 131072  Batches: 16  Memory
Usage: 3520kB
 ->  Parallel Seq Scan on bar
(cost=0.00..8591.67 rows=416667 width=4) (actual time=0.061..84.631
rows=33 loops=3)
 ->  Parallel Hash  (cost=8591.67..8591.67
rows=416667 width=4) (actual time=227.240..227.241 rows=33
loops=3)
   Buckets: 131072  Batches: 16  Memory Usage: 3520kB
   ->  Parallel Seq Scan on baz
(cost=0.00..8591.67 rows=416667 width=4) (actual time=0.060..84.270
rows=33 loops=3)
 Planning Time: 0.407 ms
 Execution Time: 1475.113 ms
(18 rows)

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Early WIP/PoC for inlining CTEs

2018-07-27 Thread Andrew Gierth
> "Andreas" == Andreas Karlsson  writes:

 >> WITH ctename AS [[NOT] MATERIALIZED] (query)

 Andreas> I think "NOT MATERIALIZED" would be a bit misleading since the
 Andreas> planner may choose to materialize anyway,

It would certainly be possible to make an explicit NOT MATERIALIZED
override such a choice by the planner, or produce an error if it is
actually impossible to do so (e.g. using NOT MATERIALIZED on a wCTE)

-- 
Andrew (irc:RhodiumToad)



Re: Early WIP/PoC for inlining CTEs

2018-07-27 Thread David Fetter
On Fri, Jul 27, 2018 at 02:55:26PM +1200, Thomas Munro wrote:
> On Thu, Jul 26, 2018 at 7:14 AM, David Fetter  wrote:
> > Please find attached the next version, which passes 'make check'.
> 
> ... but not 'make check-world' (contrib/postgres_fdw's EXPLAIN is different).

Please find attached a patch that does.

It doesn't always pass make installcheck-world, but I need to sleep
rather than investigate that at the moment.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From f2c2b80653056058b7fb799bb8b9d2c0443ddd5f Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Tue, 24 Jul 2018 22:09:55 -0700
Subject: [PATCH] Inlining CTEs v0004
To: pgsql-hack...@postgresql.org

---
 .../postgres_fdw/expected/postgres_fdw.out|  22 +--
 src/backend/nodes/copyfuncs.c |   1 +
 src/backend/nodes/equalfuncs.c|   1 +
 src/backend/nodes/outfuncs.c  |   1 +
 src/backend/nodes/readfuncs.c |   1 +
 src/backend/optimizer/plan/subselect.c| 152 ++
 src/include/nodes/parsenodes.h|   1 +
 src/test/regress/expected/rowsecurity.out |  92 +--
 src/test/regress/expected/rowtypes.out|  13 +-
 src/test/regress/expected/rules.out   |   6 +-
 10 files changed, 209 insertions(+), 81 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index f5498c62bd..8bf011df46 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1889,21 +1889,15 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 -- join in CTE
 EXPLAIN (VERBOSE, COSTS OFF)
 WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
- QUERY PLAN  
--
+QUERY PLAN
+--
  Limit
-   Output: t.c1_1, t.c2_1, t.c1_3
-   CTE t
- ->  Foreign Scan
-   Output: t1.c1, t1.c3, t2.c1
-   Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
-   Remote SQL: SELECT r1."C 1", r1.c3, r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1"
-   ->  Sort
- Output: t.c1_1, t.c2_1, t.c1_3
- Sort Key: t.c1_3, t.c1_1
- ->  CTE Scan on t
-   Output: t.c1_1, t.c2_1, t.c1_3
-(12 rows)
+   Output: t1.c1, t2.c1, t1.c3
+   ->  Foreign Scan
+ Output: t1.c1, t2.c1, t1.c3
+ Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
+ Remote SQL: SELECT r2."C 1", r3."C 1", r2.c3 FROM ("S 1"."T 1" r2 INNER JOIN "S 1"."T 1" r3 ON (((r2."C 1" = r3."C 1" ORDER BY r2.c3 ASC NULLS LAST, r2."C 1" ASC NULLS LAST
+(6 rows)
 
 WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
  c1_1 | c2_1 
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 17b650b8cb..8ba2fc1aed 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2530,6 +2530,7 @@ _copyCommonTableExpr(const CommonTableExpr *from)
 	COPY_NODE_FIELD(ctequery);
 	COPY_LOCATION_FIELD(location);
 	COPY_SCALAR_FIELD(cterecursive);
+	COPY_SCALAR_FIELD(ctematerialized);
 	COPY_SCALAR_FIELD(cterefcount);
 	COPY_NODE_FIELD(ctecolnames);
 	COPY_NODE_FIELD(ctecoltypes);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 378f2facb8..2a04506d0d 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2793,6 +2793,7 @@ _equalCommonTableExpr(const CommonTableExpr *a, const CommonTableExpr *b)
 	COMPARE_NODE_FIELD(ctequery);
 	COMPARE_LOCATION_FIELD(location);
 	COMPARE_SCALAR_FIELD(cterecursive);
+	COMPARE_SCALAR_FIELD(ctematerialized);
 	COMPARE_SCALAR_FIELD(cterefcount);
 	COMPARE_NODE_FIELD(ctecolnames);
 	COMPARE_NODE_FIELD(ctecoltypes);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index a6454ce28b..0ea6b7125a 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -3080,6 +3080,7 @@ _outCommonTableExpr(StringInfo str, const CommonTableExpr *node)
 	WRITE_NODE_FIELD(ctequery);
 	

Re: Early WIP/PoC for inlining CTEs

2018-07-26 Thread Thomas Munro
On Thu, Jul 26, 2018 at 7:14 AM, David Fetter  wrote:
> Please find attached the next version, which passes 'make check'.

... but not 'make check-world' (contrib/postgres_fdw's EXPLAIN is different).

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Early WIP/PoC for inlining CTEs

2018-07-26 Thread David Fetter
On Thu, Jul 26, 2018 at 02:51:53PM +0200, Andreas Karlsson wrote:
> On 07/25/2018 06:08 PM, Andrew Gierth wrote:
> >WITH ctename AS [[NOT] MATERIALIZED] (query)
> 
> I think "NOT MATERIALIZED" would be a bit misleading since the
> planner may choose to materialize anyway, so I suggest skipping that
> part of the syntax unless there is a really strong reason for having
> it.

If we're going to give people specific knobs to turn as part of
queries to affect query plans[1], we should think it through at a much
higher level than this.

If we're not going to do that right away, we just need to treat
instances where the wrong call was made as planner bugs and address
them at that level.

Best,
David.

[1] Not to use the word that starts with 'h' and sounds like lints 
http://pghintplan.osdn.jp/
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Early WIP/PoC for inlining CTEs

2018-07-26 Thread Andreas Karlsson

On 07/25/2018 06:08 PM, Andrew Gierth wrote:

WITH ctename AS [[NOT] MATERIALIZED] (query)


I think "NOT MATERIALIZED" would be a bit misleading since the planner 
may choose to materialize anyway, so I suggest skipping that part of the 
syntax unless there is a really strong reason for having it.


Andreas



Re: Early WIP/PoC for inlining CTEs

2018-07-25 Thread David Fetter
On Wed, Jul 25, 2018 at 04:18:42PM +0100, Andrew Gierth wrote:
> > "David" == David Fetter  writes:
> 
>  David> Please find attached a version rebased atop 167075be3ab1547e18
>  David> with what I believe are appropriate changes to regression test
>  David> output. The other changes to the regression tests output are
>  David> somewhat puzzling, as they change the actual results of queries.
> 
> Both of those changes are the result of volatile CTEs being inlined more
> than once (in one case, as part of an explicit test to ensure that CTEs
> are being materialized and not multiply evaluated).
> 
> If you look for the XXX comment in the patch, it should be easy to add a
> check that skips inlining if cterefcount > 1 and
> contains_volatile_functions is true.

Thanks for the broad hints!

Please find attached the next version, which passes 'make check'.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From 92cc38740341b74dd7d26c0fd80d285554faf038 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Tue, 24 Jul 2018 22:09:55 -0700
Subject: [PATCH] Inlining CTEs v0003
To: pgsql-hack...@postgresql.org

---
 src/backend/nodes/copyfuncs.c |   1 +
 src/backend/nodes/equalfuncs.c|   1 +
 src/backend/nodes/outfuncs.c  |   1 +
 src/backend/nodes/readfuncs.c |   1 +
 src/backend/optimizer/plan/subselect.c| 152 ++
 src/include/nodes/parsenodes.h|   1 +
 src/test/regress/expected/rowsecurity.out |  92 ++---
 src/test/regress/expected/rowtypes.out|  13 +-
 src/test/regress/expected/rules.out   |   6 +-
 9 files changed, 201 insertions(+), 67 deletions(-)

diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 17b650b8cb..8ba2fc1aed 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2530,6 +2530,7 @@ _copyCommonTableExpr(const CommonTableExpr *from)
 	COPY_NODE_FIELD(ctequery);
 	COPY_LOCATION_FIELD(location);
 	COPY_SCALAR_FIELD(cterecursive);
+	COPY_SCALAR_FIELD(ctematerialized);
 	COPY_SCALAR_FIELD(cterefcount);
 	COPY_NODE_FIELD(ctecolnames);
 	COPY_NODE_FIELD(ctecoltypes);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 378f2facb8..2a04506d0d 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2793,6 +2793,7 @@ _equalCommonTableExpr(const CommonTableExpr *a, const CommonTableExpr *b)
 	COMPARE_NODE_FIELD(ctequery);
 	COMPARE_LOCATION_FIELD(location);
 	COMPARE_SCALAR_FIELD(cterecursive);
+	COMPARE_SCALAR_FIELD(ctematerialized);
 	COMPARE_SCALAR_FIELD(cterefcount);
 	COMPARE_NODE_FIELD(ctecolnames);
 	COMPARE_NODE_FIELD(ctecoltypes);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index a6454ce28b..0ea6b7125a 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -3080,6 +3080,7 @@ _outCommonTableExpr(StringInfo str, const CommonTableExpr *node)
 	WRITE_NODE_FIELD(ctequery);
 	WRITE_LOCATION_FIELD(location);
 	WRITE_BOOL_FIELD(cterecursive);
+	WRITE_BOOL_FIELD(ctematerialized);
 	WRITE_INT_FIELD(cterefcount);
 	WRITE_NODE_FIELD(ctecolnames);
 	WRITE_NODE_FIELD(ctecoltypes);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 9a01eb6b63..01db3c1c85 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -409,6 +409,7 @@ _readCommonTableExpr(void)
 	READ_NODE_FIELD(ctequery);
 	READ_LOCATION_FIELD(location);
 	READ_BOOL_FIELD(cterecursive);
+	READ_BOOL_FIELD(ctematerialized);
 	READ_INT_FIELD(cterefcount);
 	READ_NODE_FIELD(ctecolnames);
 	READ_NODE_FIELD(ctecoltypes);
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 83008d7661..5aa242ff26 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1146,6 +1146,141 @@ hash_ok_operator(OpExpr *expr)
 }
 
 
+struct inline_cte_walker_ctx
+{
+	const char *ctename;
+	int levelsup;
+	int refcount;
+	Query *ctequery;
+	CommonTableExpr *cte;
+};
+
+static bool inline_cte_walker(Node *node, void *ctxp)
+{
+	struct inline_cte_walker_ctx *ctx = ctxp;
+
+	if (!node)
+		return false;
+
+	if (IsA(node, Query))
+	{
+		/*
+		 * This one is a bit tricky. It's our job to handle the recursion here,
+		 * but we do some of the lifting normally handled by query_tree_walker
+		 * in order to get the sequence of operations right.
+		 *
+		 * First, if the Query we're looking at is the one containing our CTE
+		 * definition, then we don't need to recurse into our own CTE or CTEs
+		 * that are earlier in the list than ours (since cteList has been
+		 * sorted for us into dependency order). We could check whether a
+		 * nested query is hiding ours, but that seems too much of an edge case
+		 * to be worth optimizing (the levelsup check will ensure we don't
+		 * 

Re: Early WIP/PoC for inlining CTEs

2018-07-25 Thread Nico Williams
On Tue, Jul 24, 2018 at 07:57:49PM -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-07-24 19:49:19 -0400, Tom Lane wrote:
> >> However, a singly-referenced SELECT CTE could reasonably be treated as
> >> equivalent to a sub-select-in-FROM, and then you would have the same
> >> mechanisms for preventing inlining as you do for those cases,
> >> e.g. OFFSET 0.  And sticking in OFFSET 0 would be backwards-compatible
> >> too: your code would still work the same in older releases, unlike if
> >> we invent new syntax for this.
> 
> > I still think this is just doubling down on prior mistakes.
> 
> Not following what you think a better alternative is?  I'd be the
> first to agree that OFFSET 0 is a hack, but people are used to it.
> 
> Assuming that we go for inline-by-default for at least some cases,
> there's a separate discussion to be had about whether it's worth
> making a planner-control GUC to force the old behavior.  I'm not
> very excited about that, but I bet some people will be.

It is widely known that CTEs in PG are optimizer barriers.

That actually is useful, and I do make use of that fact (though I'm not
proud of it).

My proposal is that PG add an extension for specifying that a CTE is to
be materialized (barrier) or not (then inlined).

Nico
-- 



Re: Early WIP/PoC for inlining CTEs

2018-07-25 Thread Nico Williams
On Wed, Jul 25, 2018 at 05:08:43PM +0100, Andrew Gierth wrote:
>  Nico> It is possible to add a keyword for this purpose in the WITH syntax:
> 
>  Nico> WITH   VIEW (...) AS a_view
> 
> The existing (and standard) syntax is WITH ctename AS (query).

Oy, I flubbed that up.

> Syntaxes that have been suggested for explicitly controlling the
> materialization are along the lines of:
> 
> WITH ctename AS [[NOT] MATERIALIZED] (query)
> 
> (MATERIALIZED is already an un-reserved keyword)

Works for me.



Re: Early WIP/PoC for inlining CTEs

2018-07-25 Thread Andrew Gierth
> "Nico" == Nico Williams  writes:

 Nico> It is possible to add a keyword for this purpose in the WITH syntax:

 Nico> WITH   VIEW (...) AS a_view

The existing (and standard) syntax is WITH ctename AS (query).

Syntaxes that have been suggested for explicitly controlling the
materialization are along the lines of:

WITH ctename AS [[NOT] MATERIALIZED] (query)

(MATERIALIZED is already an un-reserved keyword)

-- 
Andrew (irc:RhodiumToad)



Re: Early WIP/PoC for inlining CTEs

2018-07-25 Thread Nico Williams
On Wed, Jul 25, 2018 at 07:42:37AM +0200, David Fetter wrote:
> Please find attached a version rebased atop 167075be3ab1547e18 with
> what I believe are appropriate changes to regression test output.  The
> other changes to the regression tests output are somewhat puzzling, as
> they change the actual results of queries.  I've also attached both
> the "leftover" diff and the files to which it should be applied.

I think the SQL programmer needs some control over whether a CTE is:

 - a materialized view -- and therefore a barrier
 - a view (which can then be inlined by the optimizer)

It is possible to add a keyword for this purpose in the WITH syntax:

WITH   VIEW (...) AS a_view
 , MATERIALIZED VIEW (...) AS a_barrier
...;

This would be a lot like creating TEMP views, but without the catalog
overhead.

(I wonder how hard it would be to partiion the OID namespace into
temp/persistent ranges so that temp schema elements need not be written
into the catalog.)

Nico
-- 



Re: Early WIP/PoC for inlining CTEs

2018-07-25 Thread Andrew Gierth
> "David" == David Fetter  writes:

 David> Please find attached a version rebased atop 167075be3ab1547e18
 David> with what I believe are appropriate changes to regression test
 David> output. The other changes to the regression tests output are
 David> somewhat puzzling, as they change the actual results of queries.

Both of those changes are the result of volatile CTEs being inlined more
than once (in one case, as part of an explicit test to ensure that CTEs
are being materialized and not multiply evaluated).

If you look for the XXX comment in the patch, it should be easy to add a
check that skips inlining if cterefcount > 1 and
contains_volatile_functions is true.

-- 
Andrew (irc:RhodiumToad)



Re: Early WIP/PoC for inlining CTEs

2018-07-24 Thread Andres Freund
On 2018-07-25 01:08:44 +0100, Andrew Gierth wrote:
> > "Andres" == Andres Freund  writes:
> 
>  Andres> Even in queries with a non-0 OFFSET you can push down in a
>  Andres> number of cases,
> 
> really?

Yea. I guess it's a bit dependant on what kind of behaviour you consider
as "pushing down".  I'm doubtful it's worth the analytical complexity on
ensuring it's safe, however.  With knowledge from the outer query you
e.g. can: trim the target list; remove outer joins below the OFFSET 0;
push down a restriction into an outer join below the OFFSET if that's
guaranteed to only return max one row, and not needed if not matching
the restrcition. I'm sure you can come up with more?

Greetings,

Andres Freund



Re: Early WIP/PoC for inlining CTEs

2018-07-24 Thread Andrew Gierth
> "Andres" == Andres Freund  writes:

 Andres> Even in queries with a non-0 OFFSET you can push down in a
 Andres> number of cases,

really?

-- 
Andrew (irc:RhodiumToad)



  1   2   >