Re: [HACKERS] CTE inlining

2017-05-09 Thread Ilya Shkuratov

Ok, it seems that most people in discussion are agree that removing optimization
fence is a right thing to do. But so far the main topic was whether it worth to 
make "inlining" by default, and how we should enable it.

Nonetheless I still hoping to discuss the algorithm and its implementation. 

I suppose, in case of a single reference we can validate CTE subquery and 
inline it
just before SS_process_ctes() in subquery_planner() and then process remaining
CTEs as before. 

The case of multiple reference is more interesting.
Ideally, we would decide whether to inline just before pull_up_sublinks(), so 
all 
the optimizations can be applied to inlined subquery. But It is impossible as 
we 
have no information to build subquery paths and estimate they costs at this 
point. 
All necessary initialization is performed in query_planner(), that invoked far
later in grouping_planner(). (As far as I understand.)

The most straighforward way is to compare CTE scan cost with subquery execution
and result scan cost in set_rel_size(), just after set_cte_pathlist(), and 
alter 
RelOptInfo, if we choose to inline.
(e.g (CTE scan) < (cheapest_path(subquery) + subquery scan))
This way we still can push down predicates as it is performed in 
set_subquery_pathlist(), but we missed pull_up_subquery().
Besides, it seems like a dirty quick solution.

Maybe it possible to add subquery scan to RTE_CTE RelOptInfo, but I'm not sure.

So what is a right way to conduct comparison between CTE scan and subquery 
execution with subsequent scan?

I am new to PostgreSQL development, so I need a guidance from someone who 
familiar with optimizer infrastructure to ensure that I moving in a right 
direction and not making something weird.


P.S. There is a paper [1] describing implementation of CTE optimization in Orca 
optimizer. It may be useful, though architecture is completely different.

[1] Optimization of Common Table Expressions in MPP Database Systems 
(http://www.vldb.org/pvldb/vol8/p1704-elhelw.pdf)


Ilya Shkuratov


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CTE inlining

2017-05-01 Thread Ilya Shkuratov
  30.04.2017, 08:58, "Craig Ringer" :  On 30 Apr. 2017 13:28, "Andres Freund"  wrote:On 2017-04-30 00:28:46 -0400, Tom Lane wrote:> There's already a pretty large hill to climb here in the way of> breaking peoples' expectations about CTEs being optimization> fences.  Breaking the documented semantics about CTEs being> single-evaluation seems to me to be an absolute non-starter. If all referenced functions are non-volatile, I don't quite see theproblem?  Personally I believe we'll have to offer a properanti-inlining workaround anyway, and in that case there's really nothingthat should stop us from inlining CTE without volatile functions twice? Exactly. The initial implementation had limitations. So they got documented as features, not bugs or possible future enhancements. Yay? So we're stuck with it forever? I agree we shouldn't break working, correct queries such that they return different results. But if someone is lying about volatility they don't get the expectation of correctness. And we have a policy against hints, so surely we should be keen to remove this hack that serves as a hint - right?  We have OFFSET 0 for anyone really depending on it, and at least when you read that you know to go "wtf" and look at the manual, wheras the CTE fence behaviour is invisible and silent. Yes, experienced and established postgres users expect the optimisation fence behaviour. They abuse it as a query hint or work around it with subqueries in FROM. They also know OFFSET 0 ... and ideally should even read the relnotes. Users from other DMBSes looking to migrate, and new users, are regularly surprised by our CTEs. I see it a lot on Stack Overflow and other places outside our comfortable walls.  Personally I find it very annoying when I'd like to use CTEs to structure queries more readably, but land up having to use subqueries in FROM instead. Like the work Andes has been doing on our bizarre handing of SRFs in the SELECT target list I really think it's just something that needs to be done.  Also, I would like to remind that the disabling optimization fence is suggested to be OPTIONAL.So we don't break peoples' expectations, nor documented semantics.

[HACKERS] CTE inlining

2017-04-29 Thread Ilya Shkuratov
Hello, dear hackers!

There is task in todo list about optional CTE optimization fence 
disabling.

I am not interested at this point in disabling mechanism 
implementation, but I would like to discuss the optimization 
mechanism, that should work when the fence is disabled.


It seems, that we can replace CTE with subquery, so the optimizer 
can do all available optimizations. This idea is quite 
straightforward, but I could not find a discussion of it. 
(Maybe it is so, because everyone knows that the idea is bad and it is 
not worth to discuss. But I hope it is not, so I start this thread. =))

First of all, to such replacement to be valid, the CTE must be 
1. non-writable (e.g. be of form: SELECT ...),
2. do not use VOLATILE or STABLE functions,
3. ... (maybe there must be more restrictions?) 

Also, before inlining, we should check that some optimization 
can be applied, using functions from 
'pull_up_subqueries_recurse' and 'subquery_push_qual'.

If it is true, and there only one reference to CTE, 
we can inline it immediately.


What it is not clear is how we should estimate whether it is worth 
to inline, when there is multiple references. Here are my preliminary
ideas.


Let consider "pull up subquery" and "push down qualifiers" cases 
separately.

For "push down qualifiers", if `subquery_push_qual` is `true`, 
we can do the following: 
1. copy CTE subquery,
2. push down quals,
3. find paths,
3. inline if cost of 
(CTE scan) > (cheapest_path(subquery) + subquery scan) 

Probably, this approach is not feasible, because it involves subquery 
replaning, and we should consider a more "lightweight" heuristic.

For "pull up subquery" similar approach may lead to duplicate planning 
of the whole query, that almost sure is too expensive.
So I wonder, is it possible to estimate a join predicate selectivity 
against CTE subquery result and inline it if selectivity is "high" enough?
(If it is possible the same can be applied to the first case.)


I would be glad to hear feedback on described approach.

Ilya Shkuratov


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers