On Wed, Feb 15, 2017 at 12:24 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, Jan 24, 2017 at 4:32 AM, Peter Moser <pitiz...@gmail.com> wrote: > >> Using common terms such as ALIGN and NORMALIZE for such a specific > >> functionality seems a bit wrong. > > > > Would ALIGN RANGES/RANGE ALIGN and NORMALIZE RANGES/RANGE NORMALIZE be > better > > options? We are also thankful for any suggestion or comments about the > syntax. > > So it seems like an ALIGN or NORMALIZE option is kind of like a JOIN, > except apparently there's no join type and the optimizer can never > reorder these operations with each other or with other joins. Is that > right? The optimizer changes in this patch seem fairly minimal, so > I'm guessing it can't be doing anything very complex here. > > + * INPUT: > + * (r ALIGN s ON q WITH (r.ts, r.te, s.ts, s.te)) c > + * where q can be any join qualifier, and r.ts, r.te, s.ts, > and s.t > e > + * can be any column name. > + * > + * OUTPUT: > + * ( > + * SELECT r.*, GREATEST(r.ts, s.ts) P1, LEAST(r.te, s.te) P2 > + * FROM > + * ( > + * SELECT *, row_id() OVER () rn FROM r > + * ) r > + * LEFT OUTER JOIN > + * s > + * ON q AND r.ts < s.te AND r.te > s.ts > + * ORDER BY rn, P1, P2 > + * ) c > > It's hard to see what's going on here. What's ts? What's te? If you > used longer names for these things, it might be a bit more > self-documenting. > Just reasoning out loud here... ISTM ts and te are "temporal [range] start" and "temporal [range] end" (or probably just the common "timestamp start/end") From what I can see it is affecting an intersection of the two ranges and, furthermore, splitting the LEFT range into sub-ranges that match up with the sub-ranges found on the right side. From the example above this seems like it should be acting on self-normalized ranges - but I may be missing something by evaluating this out of context. r1 [1, 6] [ts, te] [time period start, time period end] s1 [2, 3] s2 [3, 4] s3 [5, 7] r LEFT JOIN s ON (r.ts < s.te AND r.te > s.ts) r1[1, 6],s1[2, 3] => [max(r.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[2, 3] r1[1, 6],s2[3, 4] => [max(t.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[3, 4] r1[1, 6],s3[5, 7] => [max(t.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[5, 6] Thus the intersection is [2,6] but since s1 has three ranges that begin between 2 and 6 (i.e., 2, 3, and 5) there are three output records that correspond to those sub-ranges. The description in the OP basically distinguishes between NORMALIZE and ALIGN in that ALIGN, as described above, affects an INTERSECTION on the two ranges - discarding the non-overlapping data - while NORMALIZE performs the alignment while also retaining the non-overlapping data. The rest of the syntax seems to deal with selecting subsets of range records based upon attribute data. David J.