On Wed, Feb 15, 2017 at 9:33 PM, David G. Johnston <david.g.johns...@gmail.com> wrote: > On Wed, Feb 15, 2017 at 12:24 PM, Robert Haas <robertmh...@gmail.com> wrote: >> >> So it seems like an ALIGN or NORMALIZE option is kind of like a JOIN, >> except apparently there's no join type and the optimizer can never >> reorder these operations with each other or with other joins. Is that >> right? The optimizer changes in this patch seem fairly minimal, so >> I'm guessing it can't be doing anything very complex here. >> >> + * INPUT: >> + * (r ALIGN s ON q WITH (r.ts, r.te, s.ts, s.te)) c >> + * where q can be any join qualifier, and r.ts, r.te, s.ts, >> and s.t >> e >> + * can be any column name. >> + * >> + * OUTPUT: >> + * ( >> + * SELECT r.*, GREATEST(r.ts, s.ts) P1, LEAST(r.te, s.te) P2 >> + * FROM >> + * ( >> + * SELECT *, row_id() OVER () rn FROM r >> + * ) r >> + * LEFT OUTER JOIN >> + * s >> + * ON q AND r.ts < s.te AND r.te > s.ts >> + * ORDER BY rn, P1, P2 >> + * ) c >> >> It's hard to see what's going on here. What's ts? What's te? If you >> used longer names for these things, it might be a bit more >> self-documenting. > > > Just reasoning out loud here... > > ISTM ts and te are "temporal [range] start" and "temporal [range] end" (or > probably just the common "timestamp start/end") > > From what I can see it is affecting an intersection of the two ranges and, > furthermore, splitting the LEFT range into sub-ranges that match up with the > sub-ranges found on the right side. From the example above this seems like > it should be acting on self-normalized ranges - but I may be missing > something by evaluating this out of context. > > r1 [1, 6] [ts, te] [time period start, time period end] > s1 [2, 3] > s2 [3, 4] > s3 [5, 7] > > r LEFT JOIN s ON (r.ts < s.te AND r.te > s.ts) > > r1[1, 6],s1[2, 3] => [max(r.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[2, 3] > r1[1, 6],s2[3, 4] => [max(t.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[3, 4] > r1[1, 6],s3[5, 7] => [max(t.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[5, 6] > > Thus the intersection is [2,6] but since s1 has three ranges that begin > between 2 and 6 (i.e., 2, 3, and 5) there are three output records that > correspond to those sub-ranges.
Yes, this is what the internal rewriting produces for r1. Note that till now we only support half-open ranges, i.e., [), but for visibility I will continue this example using closed ranges []. The executor function (ExecTemporalAdjustment) gets this (the output above) as the input and will then produce: r1[1, 1] r1[2, 3] r1[3, 4] r1[5, 6] Which means also for the ALIGN the non-overlapping parts are retained. > > The description in the OP basically distinguishes between NORMALIZE and > ALIGN in that ALIGN, as described above, affects an INTERSECTION on the two > ranges - discarding the non-overlapping data - while NORMALIZE performs the > alignment while also retaining the non-overlapping data. Also for ALIGN we retain the non-overlapping part. Intersections are symmetric/commutative, so a subsequent outer join can then use equality on the ranges to produce join matches (for overlapping) as well as null-extend the produced non-overlapping parts. The difference between ALIGN and NORMALIZE is how they split, while ALIGN produces intersections between pairs of tuples (used for joins) and the non-overlapping parts, NORMALIZE produces intersections between groups of tuples (used for aggregation, so that all tuples with the same group have equal ranges) and the non-overlapping parts. For instance, the NORMALIZE between r1, s1, s2, and s3 in your example above would give the following: r1[1, 1] r1[2, 2] r1[3, 3] r1[4, 4] r1[5, 6] > > The rest of the syntax seems to deal with selecting subsets of range records > based upon attribute data. Yes, exactly! Best regards, Anton, Johann, Michael, Peter -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers