Re: [HACKERS] CTE inlining
Ok, it seems that most people in discussion are agree that removing optimization fence is a right thing to do. But so far the main topic was whether it worth to make "inlining" by default, and how we should enable it. Nonetheless I still hoping to discuss the algorithm and its implementation. I suppose, in case of a single reference we can validate CTE subquery and inline it just before SS_process_ctes() in subquery_planner() and then process remaining CTEs as before. The case of multiple reference is more interesting. Ideally, we would decide whether to inline just before pull_up_sublinks(), so all the optimizations can be applied to inlined subquery. But It is impossible as we have no information to build subquery paths and estimate they costs at this point. All necessary initialization is performed in query_planner(), that invoked far later in grouping_planner(). (As far as I understand.) The most straighforward way is to compare CTE scan cost with subquery execution and result scan cost in set_rel_size(), just after set_cte_pathlist(), and alter RelOptInfo, if we choose to inline. (e.g (CTE scan) < (cheapest_path(subquery) + subquery scan)) This way we still can push down predicates as it is performed in set_subquery_pathlist(), but we missed pull_up_subquery(). Besides, it seems like a dirty quick solution. Maybe it possible to add subquery scan to RTE_CTE RelOptInfo, but I'm not sure. So what is a right way to conduct comparison between CTE scan and subquery execution with subsequent scan? I am new to PostgreSQL development, so I need a guidance from someone who familiar with optimizer infrastructure to ensure that I moving in a right direction and not making something weird. P.S. There is a paper [1] describing implementation of CTE optimization in Orca optimizer. It may be useful, though architecture is completely different. [1] Optimization of Common Table Expressions in MPP Database Systems (http://www.vldb.org/pvldb/vol8/p1704-elhelw.pdf) Ilya Shkuratov -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CTE inlining
30.04.2017, 08:58, "Craig Ringer" : On 30 Apr. 2017 13:28, "Andres Freund"wrote:On 2017-04-30 00:28:46 -0400, Tom Lane wrote:> There's already a pretty large hill to climb here in the way of> breaking peoples' expectations about CTEs being optimization> fences. Breaking the documented semantics about CTEs being> single-evaluation seems to me to be an absolute non-starter. If all referenced functions are non-volatile, I don't quite see theproblem? Personally I believe we'll have to offer a properanti-inlining workaround anyway, and in that case there's really nothingthat should stop us from inlining CTE without volatile functions twice? Exactly. The initial implementation had limitations. So they got documented as features, not bugs or possible future enhancements. Yay? So we're stuck with it forever? I agree we shouldn't break working, correct queries such that they return different results. But if someone is lying about volatility they don't get the expectation of correctness. And we have a policy against hints, so surely we should be keen to remove this hack that serves as a hint - right? We have OFFSET 0 for anyone really depending on it, and at least when you read that you know to go "wtf" and look at the manual, wheras the CTE fence behaviour is invisible and silent. Yes, experienced and established postgres users expect the optimisation fence behaviour. They abuse it as a query hint or work around it with subqueries in FROM. They also know OFFSET 0 ... and ideally should even read the relnotes. Users from other DMBSes looking to migrate, and new users, are regularly surprised by our CTEs. I see it a lot on Stack Overflow and other places outside our comfortable walls. Personally I find it very annoying when I'd like to use CTEs to structure queries more readably, but land up having to use subqueries in FROM instead. Like the work Andes has been doing on our bizarre handing of SRFs in the SELECT target list I really think it's just something that needs to be done. Also, I would like to remind that the disabling optimization fence is suggested to be OPTIONAL.So we don't break peoples' expectations, nor documented semantics.
[HACKERS] CTE inlining
Hello, dear hackers! There is task in todo list about optional CTE optimization fence disabling. I am not interested at this point in disabling mechanism implementation, but I would like to discuss the optimization mechanism, that should work when the fence is disabled. It seems, that we can replace CTE with subquery, so the optimizer can do all available optimizations. This idea is quite straightforward, but I could not find a discussion of it. (Maybe it is so, because everyone knows that the idea is bad and it is not worth to discuss. But I hope it is not, so I start this thread. =)) First of all, to such replacement to be valid, the CTE must be 1. non-writable (e.g. be of form: SELECT ...), 2. do not use VOLATILE or STABLE functions, 3. ... (maybe there must be more restrictions?) Also, before inlining, we should check that some optimization can be applied, using functions from 'pull_up_subqueries_recurse' and 'subquery_push_qual'. If it is true, and there only one reference to CTE, we can inline it immediately. What it is not clear is how we should estimate whether it is worth to inline, when there is multiple references. Here are my preliminary ideas. Let consider "pull up subquery" and "push down qualifiers" cases separately. For "push down qualifiers", if `subquery_push_qual` is `true`, we can do the following: 1. copy CTE subquery, 2. push down quals, 3. find paths, 3. inline if cost of (CTE scan) > (cheapest_path(subquery) + subquery scan) Probably, this approach is not feasible, because it involves subquery replaning, and we should consider a more "lightweight" heuristic. For "pull up subquery" similar approach may lead to duplicate planning of the whole query, that almost sure is too expensive. So I wonder, is it possible to estimate a join predicate selectivity against CTE subquery result and inline it if selectivity is "high" enough? (If it is possible the same can be applied to the first case.) I would be glad to hear feedback on described approach. Ilya Shkuratov -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers