Re: [PROPOSAL] Temporal query processing with range types

2020-01-30 Thread Peter Moser
Hi Hackers,

On 12/1/19 4:45 AM, Michael Paquier wrote:
> For now I have moved the patch to
> next CF, waiting on author.

We have withdrawn this patch for now. The reason is that we have ideas
on how to split it into multiple simpler, independent patches that can
be reviewed and committed one by one.

Best regards,
Anton and Peter




Re: [PROPOSAL] Temporal query processing with range types

2019-08-08 Thread Peter Moser
Hi Ibrar, Thomas and Robert,

On 8/2/19 11:00 PM, Ibrar Ahmed wrote:
> I have rebased the patch and currently reviewing the patch 
> on master (1e2fddfa33d3c7cc93ca3ee0f32852699bd3e012).

Thanks a lot for your effort. We are now putting more work and time
into this patch again.
We are grateful for any feedback.

Thanks,
Peter





Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2019-02-28 Thread Peter Moser
Dear all,
we rebased our temporal normalization patch on top of 
554ebf687852d045f0418d3242b978b49f160f44 from 2019-02-28.


On 9/7/18 1:02 PM, Peter Moser wrote:
> The syntax is
> SELECT * FROM (r NORMALIZE s USING() WITH(period_r, period_s)) c;

Please find all information about our decisions and current state within the 
previous email.

> What we like to discuss now is:
> - Is sort_inner_and_outer the correct place to perform this split?
> - How could we support OID_RANGE_ELEM_CONTAINED_OP for a NORMALIZE
>   mergejoin executor? If we use RANGE_ELEM_CONTAINED as operator, it is
>   not an equality operator, but if we use RANGE_EQ it assumes that the
>   right-hand-side of the operator must be a range as well.
> - Should we better change our mergeclause to a RANGE_ELEM_CONTAINED
>   comparison, or keep RANGE_EQ and fix pathkeys later?
> - How do we update equivalence classes, and pathkeys
>   when changing the inner relation's data type from "int4range" to "int"
>   in the query tree inside "sort_inner_and_outer" to get the correct
>   ordering and data types

I will also add this prototype (WIP) patch to the March commitfest, as
suggested by two developers we met at FOSDEM a few weeks ago.


Best regards,
Anton, Johann, Michael, Peter

diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 2a1d000b03..a309596fa1 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -99,6 +99,106 @@
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 
+// XXX TEMPORAL NORMALIZE PEMOSER 
+// !!! This is just for prototyping, delete asap...
+
+#include "catalog/pg_operator.h"
+#include "nodes/nodeFuncs.h"
+#include "utils/fmgroids.h"
+#include "utils/rangetypes.h"
+#include "utils/typcache.h"
+#include "access/htup_details.h"/* for heap_getattr */
+#include "nodes/print.h"/* for print_slot */
+#include "utils/datum.h"/* for datumCopy */
+
+
+
+#define TEMPORAL_DEBUG
+/*
+ * #define TEMPORAL_DEBUG
+ * XXX PEMOSER Maybe we should use execdebug.h stuff here?
+ */
+#ifdef TEMPORAL_DEBUG
+static char*
+datumToString(Oid typeinfo, Datum attr)
+{
+	Oid			typoutput;
+	bool		typisvarlena;
+
+	getTypeOutputInfo(typeinfo, &typoutput, &typisvarlena);
+	return OidOutputFunctionCall(typoutput, attr);
+}
+
+#define TPGdebug(...)   { printf(__VA_ARGS__); printf("\n"); fflush(stdout); }
+#define TPGdebugDatum(attr, typeinfo)   TPGdebug("%s = %s %ld\n", #attr, datumToString(typeinfo, attr), attr)
+#define TPGdebugSlot(slot)  { printf("Printing Slot '%s'\n", #slot); print_slot(slot); fflush(stdout); }
+
+#else
+#define datumToString(typeinfo, attr)
+#define TPGdebug(...)
+#define TPGdebugDatum(attr, typeinfo)
+#define TPGdebugSlot(slot)
+#endif
+
+TypeCacheEntry *testmytypcache;
+#define setSweepline(datum) \
+	node->sweepline = datumCopy(datum, node->datumFormat->attbyval, node->datumFormat->attlen)
+
+#define freeSweepline() \
+	if (! node->datumFormat->attbyval) pfree(DatumGetPointer(node->sweepline))
+
+ /*
+  * slotGetAttrNotNull
+  *  Same as slot_getattr, but throws an error if NULL is returned.
+  */
+static Datum
+slotGetAttrNotNull(TupleTableSlot *slot, int attnum)
+{
+	bool isNull;
+	Datum result;
+
+	result = slot_getattr(slot, attnum, &isNull);
+
+	if (isNull)
+		ereport(ERROR,
+				(errcode(ERRCODE_NOT_NULL_VIOLATION),
+				 errmsg("Attribute \"%s\" at position %d is null. Temporal "
+						"adjustment not possible.",
+						NameStr(TupleDescAttr(slot->tts_tupleDescriptor, attnum - 1)->attname),
+						attnum)));
+
+	return result;
+}
+
+/*
+ * heapGetAttrNotNull
+ *  Same as heap_getattr, but throws an error if NULL is returned.
+ */
+static Datum
+heapGetAttrNotNull(TupleTableSlot *slot, int attnum)
+{
+	bool isNull;
+	Datum result;
+	HeapTuple tuple;
+
+	tuple = ExecFetchSlotHeapTuple(slot, true, NULL);
+	result = heap_getattr(tuple,
+						  attnum,
+						  slot->tts_tupleDescriptor,
+						  &isNull);
+	if (isNull)
+		ereport(ERROR,
+				(errcode(ERRCODE_NOT_NULL_VIOLATION),
+				 errmsg("Attribute \"%s\" at position %d is null. Temporal "
+						"adjustment not possible.",
+						NameStr(TupleDescAttr(slot->tts_tupleDescriptor, attnum - 1)->attname),
+						attnum)));
+
+	return result;
+}
+
+// XXX TEMPORAL NORMALIZE PEMOSER END 
+
 
 /*
  * States of the ExecMergeJoin state machine
@@ -138,6 +238,10 @@ typedef struct MergeJoinClauseData
 	 * stored here.
 	 */
 	SortSupportData ssup;
+
+	/* needed for Temporal Normalization */
+	bool isnormalize;
+	TypeCacheEntry  *range_typcache;
 

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2018-09-07 Thread Peter Moser

On 01/26/2018 07:55 AM, Peter Moser wrote:

> We have now a new approach to plan and execute NORMALIZE as a special
> join node of type NORMALIZE, an append-plan on the inner join path,
> and a merge-join executor. For the latter, we would need to
> extend nodeMergejoin.c with a point-in-range-containment join.


We are ready with a new prototype of the temporal NORMALIZE operation.
In this prototype we do not rewrite queries as in the previous patch,
but have one executor node that performs the normalize operation. This
executor is based on a merge join.


Our new patch is based on top of 
75f7855369ec56d4a8e7d6eae98aff1182b85cac from September 6, 2018.


The syntax is
SELECT * FROM (r NORMALIZE s USING() WITH(period_r, period_s)) c;

It is currently implemented only for empty USING clauses, and solely
for int4range range attributes.


Example:

A=# table r;
 a | b | period_r
---+---+----------
 a | B | [1,7)
 b | B | [3,9)
 c | G | [8,10)
(3 rows)


A=# table s;
 c | d | period_s
---+---+----------
 1 | B | [2,5)
 2 | B | [3,4)
 3 | B | [7,9)
(3 rows)




A=# SELECT * FROM (r NORMALIZE s USING() WITH(period_r, period_s)) c;
 period_r | a | b
----------+---+---
 [1,2)    | a | B
 [2,3)    | a | B
 [3,4)    | a | B
 [4,5)    | a | B
 [5,7)    | a | B
 [3,4)    | b | B
 [4,5)    | b | B
 [5,7)    | b | B
 [7,9)    | b | B
(9 rows)
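To make the intended splitting semantics concrete, here is a small
Python sketch (an illustration only, not the patch's code; with an
empty USING clause every tuple of s is considered a join partner of
every tuple of r, and the example is restricted to the first two
tuples of r, whose splits appear in the output above):

```python
def normalize(r_rows, s_periods):
    """Split each r tuple's half-open period [lo, hi) at every bound
    of an s period that falls strictly inside it."""
    result = []
    for a, b, (lo, hi) in r_rows:
        points = sorted({p for slo, shi in s_periods
                         for p in (slo, shi) if lo < p < hi})
        bounds = [lo] + points + [hi]
        for i in range(len(bounds) - 1):
            result.append(((bounds[i], bounds[i + 1]), a, b))
    return result

r = [('a', 'B', (1, 7)), ('b', 'B', (3, 9))]
s = [(2, 5), (3, 4), (7, 9)]
for period, a, b in normalize(r, s):
    print(period, a, b)
```

Running this prints the same nine period/a/b rows as the query above,
in the same order.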




A=# EXPLAIN SELECT * FROM (r NORMALIZE s USING() WITH(period_r, period_s)) c;
                              QUERY PLAN
----------------------------------------------------------------------
 Result  (cost=2.15..2.22 rows=3 width=18)
   ->  Merge ??? Join  (cost=2.15..2.23 rows=3 width=22)
         Merge Cond: (r.period_r @> (range_split(s.period_s)))
         ->  Sort  (cost=1.05..1.06 rows=3 width=18)
               Sort Key: r.period_r
               ->  Seq Scan on r  (cost=0.00..1.03 rows=3 width=18)
         ->  Sort  (cost=1.09..1.10 rows=3 width=4)
               Sort Key: (range_split(s.period_s)) USING <
               ->  ProjectSet  (cost=0.00..1.07 rows=3 width=4)
                     ->  Seq Scan on s  (cost=0.00..1.03 rows=3 width=14)
(10 rows)





> That is, we create a new join path within sort_inner_and_outer
> (joinpath.c). First two projection nodes to extract the start- and
> end-timepoints of the range type used as interval, and above an
> append-plan to merge both subplans. In detail, outer_path is just sort,
> whereas inner_path is append of (B, ts) projection with (B, te)
> projection.


We changed this implementation and now use a set-returning function
called "range_split" that extracts the upper and lower bounds of a
range and returns two tuples. For instance, a tuple '[4,10),a' becomes
two tuples of the form '4,a' and '10,a'.
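As a sketch of what range_split does to each inner tuple (an
illustration of the idea in Python, not the actual set-returning
function from the patch):

```python
def range_split(row):
    """Emit two tuples per input tuple, one per range bound:
    '[4,10),a' becomes '4,a' and '10,a'."""
    (lo, hi), payload = row
    yield (lo, payload)
    yield (hi, payload)

print(list(range_split(((4, 10), 'a'))))  # [(4, 'a'), (10, 'a')]
```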



> Hereby, B is a set of non-temporal attributes used in join equality
> predicates, and [ts,te) forms the valid-time interval. Non-equality
> predicates must be handled separately as a filter step.


The current prototype supports only an integer range-type without any 
additional non-temporal attributes (empty USING clause).



> Do you think, it is a good idea to extend the sort_inner_and_outer()
> with a new branch, where jointype == NORMALIZE and add the projection
> and append sub-paths there?


We have now extended sort_inner_and_outer. It is an early solution,
meant to support discussion. Please see the two sections starting with
"if (jointype == JOIN_TEMPORAL_NORMALIZE)" inside sort_inner_and_outer:


The purpose of these sections is to change the inner path's range type
into its individual bounds.


We accomplish this with a new function called range_split. We take the
inner clause and extract the second operand of a RANGE_EQ expression
from it. We assume *for this prototype* that there is only one such
operator and that it is used solely for NORMALIZE. Then, we replace it
with range_split. Since range_split returns a set of tuples, we add a
new "set projection path" above the inner path, and another sort path
above that.


What we like to discuss now is:
- Is sort_inner_and_outer the correct place to perform this split?
- How could we support OID_RANGE_ELEM_CONTAINED_OP for a NORMALIZE
  mergejoin executor? If we use RANGE_ELEM_CONTAINED as operator, it is
  not an equality operator, but if we use RANGE_EQ it assumes that the
  right-hand-side of the operator must be a range as well.
- Should we better change our mergeclause to a RANGE_ELEM_CONTAINED
  comparison, or keep RANGE_EQ and fix pathkeys later?
- How do we update equivalence classes, pathkeys, and any other struct,
  when changing the inner relation's data type from "int4range" to "int"
  in the query tree inside "sort_inner_and_outer", to get the correct
  ordering and data types?


Best regards,
Anton, Johann, Michael, Peter

diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/exe

Re: Unexpected behavior of DROP VIEW/TABLE IF EXISTS

2018-06-28 Thread Peter Moser

On 06/26/2018 07:06 PM, Tom Lane wrote:

> Also worth noting is that similar issues arise elsewhere, eg we now
> have "procedures" vs "functions" in a single namespace.  Let's not have
> DROP TABLE acting in a way that's inconsistent with other parts of the
> system.


I think that

DROP <type> <name> ...

could search within the type's namespace for the <name> in <namespace>
combination, and only fail if it cannot be found.


I use those commands in a project with a Java ORM in place that
automatically generates/updates a schema on each startup. It wrongly
generates a table X where it should generate a view X. Hence, I do the
following inside an SQL script after startup:


DROP TABLE IF EXISTS X;
DROP VIEW IF EXISTS X;
CREATE VIEW X ...

It works on the first run, but not on a subsequent one, because the
view X already exists and hence the DROP TABLE fails.


If I switch the first two lines, it already fails during the first
run, because a table X exists...

DROP VIEW IF EXISTS X;
DROP TABLE IF EXISTS X;
CREATE VIEW X ...

This is only solvable with two separate calls to the database and
error handling on the application side.


Intuitively, I (and also others whom I asked) think that this command
should only fail if a search for <name> in <namespace> succeeds and
the DROP itself fails.


In general, my use case is that I want to delete an X in a certain
namespace where the type is not known in advance. I could query the
catalog to get that information and then build a procedure to execute
the right DROP command (as Pavel Stehule suggested), but that adds
complexity to the application code where it shouldn't be necessary, IMHO.


Best regards,
Peter



Re: Unexpected behavior of DROP VIEW/TABLE IF EXISTS

2018-06-26 Thread Peter Moser

On 06/26/2018 05:34 PM, Pavel Stehule wrote:
> 2018-06-26 17:23 GMT+02:00 Peter Moser <pitiz...@gmail.com>:
>
>> What do you think, is it worth to create a patch to solve this
>> issue, where a DROP TABLE does not fail, if the given name is
>> actually a VIEW or vice-versa?
>
> DROP TABLE should to remove table and nothing else, like DROP VIEW
> should to drop just view and nothing else. It is safeguard.


My last sentence was misleading. What I meant is:

DROP TABLE X;

Should not delete a view X, and should result in an error only if
table X does not exist. It should not look at views for error
handling, except perhaps as a hint that there is a view X and that
DROP VIEW X might be used instead.


DROP TABLE IF EXISTS X;

Should also not delete a view X, and should not result in any error if
X is a view and not a table.


Hopefully I explained things better now.

Cheers,
Peter



Unexpected behavior of DROP VIEW/TABLE IF EXISTS

2018-06-26 Thread Peter Moser

Hi,
I want to delete a table X, that may not exist, hence I execute

DROP TABLE IF EXISTS X;

However, if X is a view, I get an error

ERROR: "X" is not a table
HINT: Use DROP VIEW to remove a view.
SQL state: 42809

That is unexpected and also difficult to handle if I want to be sure
that I can delete all old tables *and* views called X, and create a
new view, for instance with

CREATE VIEW X AS ...

I cannot do that safely, because having both DROP commands would for
sure cause an error and therefore a rollback.


What do you think, is it worth creating a patch to solve this issue,
so that a DROP TABLE does not fail if the given name is actually a
VIEW, or vice-versa?


Best regards,
Peter



Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2018-01-25 Thread Peter Moser
On Mon, 2018-01-08 at 11:18 -0500, Robert Haas wrote:
> I also don't agree with the idea that we should reject syntax that
> doesn't appear in the SQL standard.  We have a great deal of such
> syntax already, and we add more of it in every release, and a good
> deal of it is contributed by you and your colleagues.  I don't see
> why this patch should be held to a stricter standard than we do in
> general.  I agree that there is some possibility for pain if the SQL
> standards committee adopts syntax that is similar to whatever we pick
> but different in detail, but I don't think we should be too worried
> about that unless other database systems, such as Oracle, have syntax
> that is similar to what is proposed here but different in
> detail.  The
> SQL standards committee seems to like standardizing on whatever
> companies with a lot of money have already implemented; it's unlikely
> that they are going to adopt something totally different from any
> existing system but inconveniently similar to ours.

We agree with you.

Best regards,
Anton, Johann, Michael, Peter



Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2018-01-25 Thread Peter Moser
On Sun, 2018-01-07 at 09:58 +, Simon Riggs wrote:
> > The attached README explains the ALIGN operation step-by-step with
> > a TEMPORAL LEFT OUTER JOIN example. That is, we start from a query
> > input, show how we rewrite it during parser stage, and show how the
> > final execution generates result tuples.
> Sorry, this was too complex for me.
>
> Can we get a much simpler example please?

Please see the following simpler example:

DROP TABLE budg;
CREATE TABLE budg(name VARCHAR(5), amnt INTEGER, t DATERANGE);
INSERT INTO budg VALUES ('Joe', 5, '[2012/2/1,2012/9/1)');
INSERT INTO budg VALUES ('Ann', 7, '[2012/5/1,2012/9/1)');
INSERT INTO budg VALUES ('Per', 3, '[2012/4/1,2012/10/1)');
SELECT * FROM budg AS r;

SELECT *
FROM ( budg r ALIGN budg s ON s.amnt > r.amnt WITH (t,t)) r
WHERE NOT EXISTS (
  SELECT *
  FROM (budg s ALIGN budg r ON s.amnt > r.amnt WITH (t,t)) s
  WHERE s.amnt > r.amnt
  AND r.t = s.t  );

--  name | amnt |t
-- --+--+-
--  Joe  |5 | [2012-02-01,2012-05-01)
--  Ann  |7 | [2012-05-01,2012-09-01)
--  Per  |3 | [2012-09-01,2012-10-01)



Best regards,
Anton, Johann, Michael, Peter



Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2018-01-25 Thread Peter Moser
On Sat, 2018-01-06 at 20:29 +, Simon Riggs wrote:
> * various PostJoinSetProjection() functions to be called as needed
> So the idea is we enable Postgres to allow major new functionality,
> as was done for PostGIS so successfully.

With a post-join approach, many intermediate result tuples that do not
contribute to the final result would be emitted, and thus performance
would suffer.

Best regards,
Anton, Johann, Michael, Peter



Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2018-01-25 Thread Peter Moser
On Thu, 2017-11-30 at 12:26 -0500, Robert Haas wrote:
> I wonder if we could instead think about R NORMALIZE S ON R.x = S.y
> WITH (R.time, S.time) as an ordinary join on R.x = S.y with some
> extra processing afterwards.  After finding all the join partners for
> a row in R, extract all the lower and upper bounds from the S.time
> fields of the join partners and use that to emit as many rows from R
> as may be necessary.

We have now a new approach to plan and execute NORMALIZE as a special
join node of type NORMALIZE, an append-plan on the inner join path,
and a merge-join executor. For the latter, we would need to
extend nodeMergejoin.c with an point-in-range-containment join.

That is, we create a new join path within sort_inner_and_outer
(joinpath.c). First two projection nodes to extract the start- and
end-timepoints of the range type used as interval, and above an
append-plan to merge both subplans. In detail, outer_path is just sort,
whereas inner_path is append of (B, ts) projection with (B, te)
projection.
Hereby, B is a set of non-temporal attributes used in join equality
predicates, and [ts,te) forms the valid-time interval. Non-equality
predicates must be handled separately as a filter step.

Do you think, it is a good idea to extend the sort_inner_and_outer()
with a new branch, where jointype == NORMALIZE and add the projection
and append sub-paths there?

> The main problem that I see with that approach is that it
> seems desirable to separate the extra-processing-afterwards step into
> a separate executor node, and there's no way for a separate type of
> plan node to know when the join advances the outer side of the join.
> That's the purpose that row_id() is serving as you have it; we'd have
> to invent some other kind of signalling mechanism, which might not be
> entirely trivial.  :-(

One advantage of the new approach is that row_id() is no longer needed.
The executor knows that it is inside a new "group" after every NEXTOUTER;
within the whole loop over the inner relation the row_id stays the same.
The whole mechanism could be handled through the MergeJoinState struct.

> If we could do it, though, it might be quite a bit more efficient,
> because it would avoid scanning S twice and performing a UNION on the
> results of those scans.  Also, we wouldn't necessarily need to sort
> the whole set of rows from R, which I suspect is unavoidable in your
> implementation.  We'd just need to sort the individual groups of rows
> from S, and my guess is that in many practical cases those groups are
> fairly small.

We would need a special SEQSCAN node/executor that does the projection
and append steps in a single read. Do you have any suggestions on how
to implement this?

> I wonder what identities hold for NORMALIZE.  It does not seem to be
> commutative, but it looks like it might be associative... i.e. (R
> NORMALIZE S) NORMALIZE T produces the same output as R NORMALIZE (S
> NORMALIZE T), perhaps?

It is neither commutative nor associative.
Assume relations that each contain one tuple, as follows:

R={[1, 100)}, S={[50, 100)}, and T={[10, 20)}

(R NORMALIZE S) NORMALIZE T gives {[1, 10), [10, 20), [20,50), [50, 100)}

while

R NORMALIZE (S NORMALIZE T) gives {[1, 50), [50, 100)}
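The counterexample can be checked mechanically with a small Python
sketch of the splitting semantics (an illustration only: each
left-hand range is split at every right-hand bound that falls strictly
inside it):

```python
def normalize(R, S):
    """NORMALIZE for single-attribute relations of half-open ranges:
    split every range in R at each bound of S that lies inside it."""
    out = []
    for lo, hi in R:
        points = sorted({p for slo, shi in S
                         for p in (slo, shi) if lo < p < hi})
        bounds = [lo] + points + [hi]
        out.extend((bounds[i], bounds[i + 1])
                   for i in range(len(bounds) - 1))
    return out

R, S, T = [(1, 100)], [(50, 100)], [(10, 20)]
left = normalize(normalize(R, S), T)    # (R NORMALIZE S) NORMALIZE T
right = normalize(R, normalize(S, T))   # R NORMALIZE (S NORMALIZE T)
print(left)   # [(1, 10), (10, 20), (20, 50), (50, 100)]
print(right)  # [(1, 50), (50, 100)]
```

The two results differ, confirming non-associativity.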

Best regards,
Anton, Johann, Michael, Peter



Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2017-11-21 Thread Peter Moser
2017-11-14 18:42 GMT+01:00 Tom Lane:
> You might consider putting the rewriting into, um, the rewriter.
> It could be a separate pass after view expansion, if direct integration
> with the existing behavior seems unduly spaghetti-ish.  Or do it in
> an early phase of planning as he suggested.  There's not really that
> much difference between the rewriter and the planner for this purpose.
> Although one way to draw the distinction is that the output of the
> rewriter is (currently) still fully expressible as plain SQL, whereas
> once the planner goes into action the intermediate states of the tree
> might not really be SQL anymore (eg, it might contain join types that
> don't correspond to any SQL syntax).  So depending on what your rewrite
> emits, there would be a weak preference for calling it part of the
> rewriter or planner respectively.

2017-11-16 16:42 GMT+01:00 Robert Haas:
> Another thing to think about is that even though the CURRENT
> implementation just rewrites the relevant constructs as SQL, in the
> future somebody might want to do something else.  I feel like it's not
> hard to imagine a purpose-build ALIGN or NORMALIZE join type being a
> lot faster than the version that's just done by rewriting the SQL.
> That would be more work, potentially, but it would be nice if the
> initial implementation leant itself to be extended that way in the
> future, which an all-rewriter implementation would not.  On the other
> hand, maybe an early-in-the-optimizer implementation wouldn't either,
> and maybe it's not worth worrying about it anyway.  But it would be
> cool if this worked out in a way that meant it could be further
> improved without having to change it completely.

Hi hackers,
we would like to rethink our approach...

For simplicity I'll drop ALIGN for the moment and focus solely on NORMALIZE:

SELECT * FROM (R NORMALIZE S ON R.x = S.y WITH (R.time, S.time)) c;

Our normalization executor node needs the following input (for now
expressed in plain SQL):

SELECT R.*, p1
FROM (SELECT *, row_id() OVER () rn FROM R) R
 LEFT OUTER JOIN (
SELECT y, LOWER(time) p1 FROM S
UNION
SELECT y, UPPER(time) p1 FROM S
 ) S
 ON R.x = S.y AND p1 <@ R.time
ORDER BY rn, p1;

In other words:
1) The left subquery adds a unique ID to each tuple (i.e., rn).
2) The right subquery creates two results for each input tuple: one for
   the upper and one for the lower bound of each input tuple's valid time
   column. The boundaries get put into a single (scalar) column, namely p1.
3) We join both subqueries if the normalization predicates hold (R.x = S.y)
   and p1 is inside the time of the current outer tuple.
4) Finally, we sort the result by the unique ID (rn) and p1, and give all
   columns of the outer relation, rn and p1 back.
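A small Python emulation of this rewritten query may help to see the
input stream our executor node would receive (relation contents and
names here are illustrative only):

```python
def executor_input(R, S):
    """Emulate the rewritten SQL: number R's rows (rn), explode S into
    its lower/upper bounds (p1, deduplicated as UNION does),
    left-outer-join on x = y with p1 contained in R's half-open time
    range, and order the result by (rn, p1)."""
    bounds = {(y, p) for y, (lo, hi) in S for p in (lo, hi)}
    out = []
    for rn, (x, (lo, hi)) in enumerate(R, start=1):
        matches = sorted(p for y, p in bounds if y == x and lo <= p < hi)
        for p1 in matches or [None]:      # LEFT OUTER JOIN keeps r rows
            out.append((rn, x, (lo, hi), p1))
    return out

R = [('x1', (1, 7))]
S = [('x1', (2, 5)), ('x1', (3, 4))]
print(executor_input(R, S))
# [(1, 'x1', (1, 7), 2), (1, 'x1', (1, 7), 3),
#  (1, 'x1', (1, 7), 4), (1, 'x1', (1, 7), 5)]
```

Each outer tuple thus arrives once per boundary point p1 that falls
inside its valid time, ordered by rn and p1, which is exactly what the
normalization node needs to split the tuple.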

Our first attempt to understand the new approach is as follows: the
left base rel of the inner left outer join can be expressed as a
WindowAgg node. However, the right-hand query of the join is much more
difficult to build (maybe through hash aggregates). Both queries could
be put together with a MergeJoin, for instance. However, if we create
the plan tree by hand and choose algorithms for it manually, how is it
possible to have it optimized later? Or, if that is not possible, how
do we choose the best algorithms for it?

Best regards,
Anton, Johann, Michael, Peter



Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2017-11-16 Thread Peter Moser
2017-11-14 18:42 GMT+01:00 Tom Lane:
> Robert is correct that putting this into the parser is completely the
> wrong thing.  If you do that, then for example views using the features
> will reverse-list in the rewritten form, which we Do Not Want, even
> if the rewritten form is completely valid SQL (is it?).

Yes, the subnode to our executor is rewritten into valid SQL.

> You might consider putting the rewriting into, um, the rewriter.
> It could be a separate pass after view expansion, if direct integration
> with the existing behavior seems unduly spaghetti-ish.  Or do it in
> an early phase of planning as he suggested.  There's not really that
> much difference between the rewriter and the planner for this purpose.
> Although one way to draw the distinction is that the output of the
> rewriter is (currently) still fully expressible as plain SQL, whereas
> once the planner goes into action the intermediate states of the tree
> might not really be SQL anymore (eg, it might contain join types that
> don't correspond to any SQL syntax).  So depending on what your rewrite
> emits, there would be a weak preference for calling it part of the
> rewriter or planner respectively.

Thank you for your feedback. We'll have a look at this and come back to you.



Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2017-11-14 Thread Peter Moser
2017-11-11 13:19 GMT+01:00 Robert Haas:
> This is really good input.  If the feature weren't useful, then it
> wouldn't make sense to try to figure out how to integrate it, but if
> it is, then we should try.

We are happy to hear this and will do the implementation.  Any input
regarding the implementation is much appreciated.

> I don't think that implementing a feature like this by SQL
> transformation can work.  It's certainly got the advantage of
> simplicity of implemention, but there are quite a few things that seem
> like they won't always work correctly.
> [...]
> Overall, I think that the whole approach here probably needs to be
> scrapped and rethought.  The stuff this patch is doing really belongs
> in the optimizer, not the parser, I think.  It could possibly happen
> at a relatively early stage in the optimizer so that the rest of the
> optimizer can see the results of the transformation and, well,
> optimize.  But parse time is way too early.

We create these query rewrites during the parser stage, because we
want the optimizer to choose the best strategies for each rewritten
subplan, and our executor nodes to get the desired input format in the
most optimal way. Our goal was an integration that fully re-uses the
existing PostgreSQL rewrites and optimizations.

Another approach would be to optimize the temporal primitives manually.
This would not reuse the existing PostgreSQL optimizations automatically.

Is there a general guideline or policy as to which approach is
preferable?

Regarding all the other issues, we will look into them in detail and
report back soon.


Best regards,
Anton, Johann, Michael, Peter



Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2017-11-14 Thread Peter Moser
2017-10-06 19:22 GMT+02:00 Paul A Jungwirth:
> I just wanted to chime in and say that the work these people have done
> is *amazing*. I read two of their papers yesterday [1, 2], and if you
> are interested in temporal data, I encourage you to read them too. The
> first one is only 12 pages and quite readable. After that the second
> is easy because it covers a lot of the same ground but adds "scaling"
> of values when a tuple is split, and some other interesting points.
> Their contributions could be used to implement SQL:2011 syntax but go
> way beyond that.
>
> Almost every project I work on could use temporal database support,
> but there is nothing available in the Open Source world. The
> temporal_tables extension [3] offers transaction-time support, which
> is great for auditing, but it has no valid-time support (aka
> application-time or state-time). Same with Magnus Hagander's TARDIS
> approach [4], Chronomodel [5] (an extension to the Rails ORM), or any
> other project I've seen. But valid-time is the more valuable
> dimension, because it tells you the history of the thing itself (not
> just when the database was changed). Also nothing is even attempting
> full bitemporal support.
>
> The ideas behind temporal data are covered extensively in Snodgrass's
> 1999 book [6], which shows how valuable it is to handle temporal data
> in a principled way, rather than ad hoc. But that book also
> demonstrates how complex the queries become to do things like temporal
> foreign key constraints and temporal joins. I was sad to learn that
> his proposed TSQL2 was rejected as a standard back in the 90s,
> although the critiques by C. J. Date [7] have some merit. In
> particular, since TSQL2 used *statement* modifiers, some of the
> behavior was unclear or bad when using subqueries, views, and
> set-returning functions. It makes more sense to have temporal
> *operators*, so alongside inner join you have temporal inner join, and
> likewise with temporal left outer join, temporal
> union/intersection/difference, temporal aggregation, etc. (I think the
> drawbacks of TSQL2 came from pursuing an unachievable goal, which was
> to enable seamlessly converting existing non-temporal tables to
> temporal without breaking any queries.)
>
> Another unsatisfactory approach at historical data, from the industry
> rather than academia, is in chapter 4 and elsewhere of Ralph Kimball's
> *Data Warehouse Toolkit* [8]. His first suggestion (Type 1 Dimensions)
> is to ignore the problem and overwrite old data with new. His Type 2
> approach (make a new row) is better but loses the continuity between
> the old row and the new. Type 3 fixes that but supports only one
> change, not several. And anyway his ideas are tailored to star-schema
> designs so are not as broadly useful. Workarounds like bridge tables
> and "put the data in the fact table" are even more wedded to a
> star-schema approach. But I think his efforts do show how valuable
> historical data is, and how hard it is to handle without built-in
> support.
>
> As far as I can tell SQL:2011 avoids the statement modifier problem
> (I'm not 100% sure), but it is quite limited, mostly covering
> transaction-time semantics and not giving any way to do valid-time
> outer joins or aggregations. It is clearly an early first step.
> Unfortunately the syntax feels (to me) crippled by over-specificity,
> like it will have a hard time growing to support all the things you'd
> want to do.
>
> The research by Dignös et al shows how you can define temporal
> variants for every operator in the relational algebra, and then
> implement them by using just two transformations (ALIGN and NORMALIZE)
> combined with the existing non-temporal operators. It has a strong
> theoretical basis and avoids the TSQL2 problems with composability.
> And unlike SQL:2011 it has a great elegance and completeness I haven't
> seen anywhere else.
>
> I believe with range types the approach was to build up useful
> primitives rather than jumping straight to a less-factored full
> implementation of temporal features. (This in spite of SQL:2011
> choosing to model begin/end times as separate columns, not as ranges.
> :-) It seems to me the Dignös work follows the same philosophy. Their
> ALIGN and NORMALIZE could be used to implement SQL:2011 features, but
> they are also useful for much more. In their papers they actually
> suggest that these transformations need not be exposed to end-users,
> although it was convenient to have access to them for their own
> research. I think it'd be great if Postgres's SQL dialect supported
> them though, since SQL:2011 leaves out so much.
>
> Anyway, I wanted to thank them for their excellent work, their
> generosity, and also their perseverance. ([1] is from 2012 and was
> built against Postgres 9.0!) I hope we take their contribution
> seriously, because it would truly move Postgres's temporal support
> beyond any database on the market.