2017-03-01 10:56 GMT+01:00 Peter Moser <[email protected]>:
> A similar walkthrough for ALIGN will follow soon.
>
> We are thankful for any suggestion or ideas, to be used to write a
> good SGML documentation.
The attached README explains the ALIGN operation step-by-step with a
TEMPORAL LEFT OUTER JOIN example. That is, we start from a query
input, show how we rewrite it during parser stage, and show how the
final execution generates result tuples.
Best regards,
Anton, Michael, Johann, Peter
================================================================================
ALIGN
================================================================================
This is an exemplary walkthrough of a temporal ALIGN operation, from the
query input, over the query rewrite, until the result tuple outputs during
execution.
EXAMPLE
================================================================================
These tables represent employees and projects in a company. An employee works
for a department over a certain time period. Different projects get developed
in a department. The start and end time points of each project is stored in
the period column.
CREATE TABLE emp (empname VARCHAR, dept VARCHAR, period INT4RANGE);
INSERT INTO emp VALUES
('Ann', 'DB', '[1,5)'),
('Bea', 'DB', '[3,8)'),
('Ron', 'AI', '[6,9)');
CREATE TABLE prj (prjname VARCHAR, dept VARCHAR, period INT4RANGE);
INSERT INTO prj VALUES
('PR1', 'DB', '[3,7)'),
('PR2', 'DB', '[1,5)'),
('PR3', 'HW', '[2,8)');
Timeline representation
-----------------------
EMPNAME DEPT PERIOD
Ann DB [1,5) ---- <- Tuple 1, valid from 1 to 5 excl.
Bea DB [3,8) ----- <- Tuple 2
Ron AI [6,9) --- <- Tuple 3
123456789 <- Timeline
PRJNAME DEPT PERIOD
PR1 DB [3,7) ---- <- Tuple 1, valid from 3 to 7 excl.
PR2 DB [1,5) ---- <- Tuple 2
PR3 HW [2,8) ------ <- Tuple 3
123456789 <- Timeline
TEMPORAL LEFT OUTER JOIN query
------------------------------
Query: At each time point, to which projects is an employee assigned, and
when does an employee not have an assigned project?
WITH emp AS (SELECT period u, * FROM emp),
prj AS (SELECT period v, * FROM prj)
SELECT empname, prjname, dept, period FROM (
emp ALIGN prj ON emp.dept = prj.dept WITH (emp.period, prj.period)
) emp_aligned
LEFT OUTER JOIN (
prj ALIGN emp ON emp.dept = prj.dept WITH (prj.period, emp.period)
) prj_aligned
USING(dept, period)
WHERE period = u * v OR u IS NULL OR v IS NULL;
Result
------
empname | prjname | dept | period
---------+---------+------+--------
Ann | PR2 | DB | [1,5) ----
Ann | PR1 | DB | [3,5) --
Bea | PR1 | DB | [3,7) ----
Bea | PR2 | DB | [3,5) --
Bea | | DB | [7,8) -
Ron | | AI | [6,9) ---
123456789 <- Timeline
STEP-BY-STEP EXPLANATION
================================================================================
In this chapter we describe first the rewrite and processing of the two ALIGN
clauses, that are,
( emp ALIGN prj ON emp.dept = prj.dept WITH (emp.period, prj.period)
) emp_aligned
...and...
( prj ALIGN emp ON emp.dept = prj.dept WITH (prj.period, emp.period)
) prj_aligned.
Secondly, we explain what the above query does in general, and why we need two
ALIGNs, the WHERE-clause and CTEs (WITH clauses).
ALIGN subquery processing
-------------------------
After receiving the ALIGN query (see above) as input, we rewrite it into
the following query:
SELECT emp.*, GREATEST(LOWER(emp.period), LOWER(prj.period)) p1,
LEAST(UPPER(emp.period), UPPER(prj.period)) p2
FROM
(SELECT *, row_id() OVER () rn FROM emp) emp
LEFT OUTER JOIN
prj
ON emp.dept = prj.dept AND emp.period && prj.period
ORDER BY rn, p1, p2;
Then, we rewrite the second ALIGN subquery analoguously.
Intermediate results: These are the inputs of our ALIGN execution nodes
-----------------------------------------------------------------------
(See Appendix [1] for details)
For emp ALIGN prj...
TUPID empname | dept | period | rn | p1 | p2
---------+------+--------+----+----+----
A Ann | DB | [1,5) | 1 | 1 | 5
B Ann | DB | [1,5) | 1 | 3 | 5
C Bea | DB | [3,8) | 2 | 3 | 5
D Bea | DB | [3,8) | 2 | 3 | 7
E Ron | AI | [6,9) | 3 | 6 | 9
For prj ALIGN emp...
TUPID prjname | dept | period | rn | p1 | p2
---------+------+--------+----+----+----
a PR1 | DB | [3,7) | 1 | 3 | 7
b PR1 | DB | [3,7) | 1 | 3 | 5
c PR2 | DB | [1,5) | 2 | 3 | 5
d PR2 | DB | [1,5) | 2 | 1 | 5
e PR3 | HW | [2,6) | 3 | 2 | 6
We have four possibilities inside ALIGN during execution
--------------------------------------------------------
(See Appendix [2] for details)
1) CURR == PREV
a) S < CURR.P1 --> generate (CURR, [S, CURR.P1)) and set S = CURR.P1
b) otherwise:
if OUT != CURR:
generate (CURR, [CURR.P1, CURR.P2))
set S = MAX(S, CURR.P2)
fetch next tuple
if fetched tuple is NULL --> goto(2) forcing CURR != PREV to be true
2) CURR != PREV
a) S < PREV.te --> generate (PREV, [S, PREV.te))
set S = CURR.ts and goto(1) forcing CURR == PREV to be true
Executor steps for "emp ALIGN prj ON emp.dept = prj.dept..."
------------------------------------------------------------
1) First call of ExecTemporalAdjustment: Fetch tuple A, which is now CURR.
Set sweepline S = 1 (A.ts), and copy tuple A also into PREV. Goto(1)
forcing CURR == PREV to be true. The last result tuple (OUT) is NULL.
2) (1): CURR == PREV, that is, tuple A is equal to itself.
(1b): OUT == NULL, hence we generate a result tuple:
(Ann, DB, [1,5))
...where the interval is [CURR.P1, CURR.P2) and all other attributes
are copied from the current tuple (omitting helper columns: RN, P1,
and P2). This result is now the new OUT.
We update the sweepline: S = 5 <-- MAX(1, 5).
Fetch tuple B; CURR = B, PREV = A
3) (1): A == B.
(1b): OUT != CURR (A.ts != B.p1), hence we generate a result tuple:
(Ann, DB, [3,5))
Fetch tuple C; CURR = C, PREV = B
4) (2): B != C.
We update the sweepline: S = 3 (CURR.ts)
We set PREV = C. Goto(1) forcing CURR == PREV to be true.
5) (1): C == C.
(1b): OUT != CURR (OUT.rn != CURR.rn), hence we generate a result tuple:
(Bea, DB, [3,5))
We update the sweepline: S = 5 (CURR.p2)
Fetch tuple D; CURR = D, PREV = C
6) (1): C == D.
(1b): OUT != CURR (OUT.te != CURR.p2), hence we generate a result tuple:
(Bea, DB, [3,7))
We update the sweepline: S = 7 (CURR.p2)
Fetch tuple E; CURR = E, PREV = D
7) (2): D != E.
(2a): S < D.te (7 < 8), hence we generate a result tuple:
(Bea, DB, [3,7))
We update the sweepline: S = 6 (CURR.ts)
We set PREV = E. Goto(1) forcing CURR == PREV to be true.
8) (1): E == E.
(1b): OUT != CURR (OUT.ts != CURR.p1), hence we generate a result tuple:
(Ron, AI, [6,9))
We update the sweepline: S = 9 (CURR.p2)
No new tuple can be fetched, nor is any new result tuple produced, hence we
terminate our executor.
The result of "emp ALIGN prj..." is as follows:
empname | dept | period
---------+------+--------
Ann | DB | [1,5)
Ann | DB | [3,5)
Bea | DB | [3,5)
Bea | DB | [3,7)
Bea | DB | [7,8)
Ron | AI | [6,9)
Whereas, the result of "prj ALIGN emp..." is the following:
prjname | dept | period
---------+------+--------
PR1 | DB | [3,5)
PR1 | DB | [3,7)
PR2 | DB | [1,5)
PR2 | DB | [3,5)
PR3 | HW | [2,6)
The initial goal was to find at each time point, for which projects inside a
department is an employee responsible. These can now be easily done from these
aligned inputs with standard (reused) PostgreSQL operators. In our example a
regular LEFT OUTER JOIN, which gives the output below:
empname | prjname | dept | period
---------+---------+------+--------
Ann | PR2 | DB | [1,5)
Ann | PR1 | DB | [3,5)
Ann | PR2 | DB | [3,5)
Bea | PR1 | DB | [3,7)
Bea | PR1 | DB | [3,5)
Bea | PR2 | DB | [3,5)
Bea | | DB | [7,8)
Ron | | AI | [6,9)
As you can see, some result tuples are already covered by others. For instance,
(Ann, PR2, DB, [3,5)) is already covered by (Ann, PR2, DB, [1,5)). Therefore, we
need an "absorbing" WHERE-clause, and CTEs (WITH-clauses) to compare the
original
periods with the new result tuples. After nesting the JOIN within these clauses
the final result is:
empname | prjname | dept | period
---------+---------+------+--------
Ann | PR2 | DB | [1,5)
Ann | PR1 | DB | [3,5)
Bea | PR1 | DB | [3,7)
Bea | PR2 | DB | [3,5)
Bea | | DB | [7,8)
Ron | | AI | [6,9)
Best regards,
Anton, Michael, Johann, Peter
--------------------------------------------------------------------------------
Appendix:
[1]
After the intermediate result we get an input, which is splitted into so-called
temporal groups. Each of these groups satisfy the USING-condition (a.dept =
b.dept), has overlapping periods, and a row number as ID (rn). The input is
ordered by temporal group ID, and the start and ending timepoints, i.e., P1 and
P2. Please note, that each tuple in a certain rn ID is always equal to any other
tuple in that rn ID, except for P1 and P2.
[2]
Additinal information for ALIGN:
- sweepline: to sweep from left-to-right on the time line (S)
- slots to the current and previous tuple (CURR, PREV)
- Two tuples A and B are equal, i.e. A == B, if A.rn == B.rn
- ts means LOWER(time), and te means UPPER(time)
- OUT is the last produced output tuple (last result tuple)
- OUT != CURR, if OUT.rn != CURR.rn
or OUT.ts != CURR.p1
or OUT.te != CURR.p2
or OUT == NULL
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers