Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

David G. Johnston Wed, 15 Feb 2017 12:34:25 -0800

On Wed, Feb 15, 2017 at 12:24 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Tue, Jan 24, 2017 at 4:32 AM, Peter Moser <pitiz...@gmail.com> wrote:
> >> Using common terms such as ALIGN and NORMALIZE for such a specific
> >> functionality seems a bit wrong.
> >
> > Would ALIGN RANGES/RANGE ALIGN and NORMALIZE RANGES/RANGE NORMALIZE be
> better
> > options? We are also thankful for any suggestion or comments about the
> syntax.
>
> So it seems like an ALIGN or NORMALIZE option is kind of like a JOIN,
> except apparently there's no join type and the optimizer can never
> reorder these operations with each other or with other joins.  Is that
> right?  The optimizer changes in this patch seem fairly minimal, so
> I'm guessing it can't be doing anything very complex here.
>
> + * INPUT:
> + *             (r ALIGN s ON q WITH (r.ts, r.te, s.ts, s.te)) c
> + *             where q can be any join qualifier, and r.ts, r.te, s.ts,
> and s.t
> e
> + *             can be any column name.
> + *
> + * OUTPUT:
> + *             (
> + *             SELECT r.*, GREATEST(r.ts, s.ts) P1, LEAST(r.te, s.te) P2
> + *      FROM
> + *      (
> + *             SELECT *, row_id() OVER () rn FROM r
> + *      ) r
> + *      LEFT OUTER JOIN
> + *      s
> + *      ON q AND r.ts < s.te AND r.te > s.ts
> + *      ORDER BY rn, P1, P2
> + *      ) c
>
> It's hard to see what's going on here.  What's ts?  What's te?  If you
> used longer names for these things, it might be a bit more
> self-documenting.
>

Just reasoning out loud here...

ISTM ts and te are "temporal [range] start" and "temporal [range] end" (or
probably just the common "timestamp start/end")

From what I can see it is affecting an intersection of the two ranges and,
furthermore, splitting the LEFT range into sub-ranges that match up with
the sub-ranges found on the right side.  From the example above this seems
like it should be acting on self-normalized ranges - but I may be missing
something by evaluating this out of context.

r1 [1, 6] [ts, te] [time period start, time period end]
s1 [2, 3]
s2 [3, 4]
s3 [5, 7]

r LEFT JOIN s ON (r.ts < s.te AND r.te > s.ts)

r1[1, 6],s1[2, 3] => [max(r.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[2, 3]
r1[1, 6],s2[3, 4] => [max(t.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[3, 4]
r1[1, 6],s3[5, 7] => [max(t.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[5, 6]

Thus the intersection is [2,6] but since s1 has three ranges that begin
between 2 and 6 (i.e., 2, 3, and 5) there are three output records that
correspond to those sub-ranges.

The description in the OP basically distinguishes between NORMALIZE and
ALIGN in that ALIGN, as described above, affects an INTERSECTION on the two
ranges - discarding the non-overlapping data - while NORMALIZE performs the
alignment while also retaining the non-overlapping data.

The rest of the syntax seems to deal with selecting subsets of range
records based upon attribute data.

David J.

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

Reply via email to