On Tue, Nov 21, 2017 at 6:36 PM, Peter Moser <pitiz...@gmail.com> wrote: > 2017-11-14 18:42 GMT+01:00 Tom Lane <t...@sss.pgh.pa.us>: >> You might consider putting the rewriting into, um, the rewriter. >> It could be a separate pass after view expansion, if direct integration >> with the existing behavior seems unduly spaghetti-ish. Or do it in >> an early phase of planning as he suggested. There's not really that >> much difference between the rewriter and the planner for this purpose. >> Although one way to draw the distinction is that the output of the >> rewriter is (currently) still fully expressible as plain SQL, whereas >> once the planner goes into action the intermediate states of the tree >> might not really be SQL anymore (eg, it might contain join types that >> don't correspond to any SQL syntax). So depending on what your rewrite >> emits, there would be a weak preference for calling it part of the >> rewriter or planner respectively. > > 2017-11-16 16:42 GMT+01:00 Robert Haas <robertmh...@gmail.com>: >> Another thing to think about is that even though the CURRENT >> implementation just rewrites the relevant constructs as SQL, in the >> future somebody might want to do something else. I feel like it's not >> hard to imagine a purpose-build ALIGN or NORMALIZE join type being a >> lot faster than the version that's just done by rewriting the SQL. >> That would be more work, potentially, but it would be nice if the >> initial implementation leant itself to be extended that way in the >> future, which an all-rewriter implementation would not. On the other >> hand, maybe an early-in-the-optimizer implementation wouldn't either, >> and maybe it's not worth worrying about it anyway. But it would be >> cool if this worked out in a way that meant it could be further >> improved without having to change it completely. > > Hi hackers, > we like to rethink our approach... > > For simplicity I'll drop ALIGN for the moment and focus solely on NORMALIZE: > > SELECT * FROM (R NORMALIZE S ON R.x = S.y WITH (R.time, S.time)) c; > > Our normalization executor node needs the following input (for now > expressed in plain SQL): > > SELECT R.*, p1 > FROM (SELECT *, row_id() OVER () rn FROM R) R > LEFT OUTER JOIN ( > SELECT y, LOWER(time) p1 FROM S > UNION > SELECT y, UPPER(time) p1 FROM S > ) S > ON R.x = S.y AND p1 <@ R.time > ORDER BY rn, p1; > > In other words: > 1) The left subquery adds an unique ID to each tuple (i.e., rn). > 2) The right subquery creates two results for each input tuple: one for > the upper and one for the lower bound of each input tuple's valid time > column. The boundaries get put into a single (scalar) column, namely p1. > 3) We join both subqueries if the normalization predicates hold (R.x = S.y) > and p1 is inside the time of the current outer tuple. > 4) Finally, we sort the result by the unique ID (rn) and p1, and give all > columns of the outer relation, rn and p1 back. > > Our first attempt to understand the new approach would be as follows: The > left base rel of the inner left-outer-join can be expressed as a WindowAgg > node. However, the right query of the join is much more difficult to build > (maybe through hash aggregates). Both queries could be put together with a > MergeJoin for instance. However, if we create the plan tree by hand and > choose algorithms for it manually, how is it possible to have it optimized > later? Or, if that is not possible, how do we choose the best algorithms > for it?
As far as I can see, this patch has received some feedback. In order to digest them properly, I am marking the patch as returned with feedback. -- Michael