Re: SQL:2011 application time

2024-05-13 Thread Paul Jungwirth

On 5/13/24 03:11, Peter Eisentraut wrote:
It looks like we missed some of these fundamental design questions early on, and it might be too 
late now to fix them for PG17.


For example, the discussion on unique constraints misses that the question of null values in unique 
constraints itself is controversial and that there is now a way to change the behavior.  So I 
imagine there is also a selection of possible behaviors you might want for empty ranges.  
Intuitively, I don't think empty ranges are sensible for temporal unique constraints.  But anyway, 
it's a bit late now to be discussing this.


I'm also concerned that if ranges have this fundamental incompatibility with periods, then the plan 
to eventually evolve this patch set to support standard periods will also have as-yet-unknown problems.


Some of these issues might be design flaws in the underlying mechanisms, like range types and 
exclusion constraints.  Like, if you're supposed to use this for scheduling but you can use empty 
ranges to bypass exclusion constraints, how is one supposed to use this?  Yes, a check constraint 
using isempty() might be the right answer.  But I don't see this documented anywhere.
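
[For concreteness, the isempty() workaround mentioned here would look something like the following; the table and column names are illustrative only.]

```sql
-- A CHECK constraint keeps 'empty' out of the application-time column,
-- so an empty range can no longer bypass the exclusion constraint:
CREATE TABLE scheduling (
    id       int4range,
    valid_at daterange,
    CHECK (NOT isempty(valid_at)),
    EXCLUDE USING gist (id WITH =, valid_at WITH &&)
);

INSERT INTO scheduling VALUES ('[1,2)', '[2000-01-01,2001-01-01)');  -- ok
INSERT INTO scheduling VALUES ('[1,2)', 'empty');  -- now fails the CHECK
```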


On the technical side, adding an implicit check constraint as part of a primary key constraint is 
quite a difficult implementation task, as I think you are discovering.  I'm just reminded about how 
the patch for catalogued not-null constraints struggled with linking these not-null constraints to 
primary keys correctly.  This sounds a bit similar.


I'm afraid that these issues cannot be resolved in good time for this release, so we should revert 
this patch set for now.


I think reverting is a good idea. I'm not really happy with the CHECK constraint solution either. 
I'd be happy to have some more time to rework this for v18.


A couple alternatives I'd like to explore:

1. Domain constraints instead of a CHECK constraint. I think this is probably worse, and I don't 
plan to spend much time on it, but I thought I'd mention it in case someone else thought otherwise.


2. A slightly different overlaps operator, say &&&, where 'empty' &&& 'empty' is true. But 'empty' 
with anything else could still be false (or not). That operator would prevent duplicates in an 
exclusion constraint. This also means we could support more types than just ranges & multiranges. I 
need to think about whether this combines badly with existing operators, but if not it has a lot of 
promise. If anything it might be *less* contradictory, because it fits better with 'empty' @> 
'empty', which we say is true.
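
[A sketch of what such an operator could look like, done at the SQL level for int4range only; a real version would be implemented in C and would need GiST opclass support before an exclusion constraint could actually use it.]

```sql
-- Hypothetical &&& operator: like &&, except 'empty' &&& 'empty' is true,
-- so an exclusion constraint built on it would reject duplicate empty keys.
CREATE FUNCTION range_overlaps_or_both_empty(a int4range, b int4range)
RETURNS boolean
LANGUAGE sql IMMUTABLE STRICT
AS $$ SELECT (isempty(a) AND isempty(b)) OR a && b $$;

CREATE OPERATOR &&& (
    LEFTARG    = int4range,
    RIGHTARG   = int4range,
    FUNCTION   = range_overlaps_or_both_empty,
    COMMUTATOR = &&&
);

SELECT 'empty'::int4range &&& 'empty'::int4range;  -- true
SELECT 'empty'::int4range &&& '[1,2)'::int4range;  -- false, same as &&
```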


Another thing a revert would give me some time to consider: even though it's not standard syntax, I 
wonder if we want to require syntax something like `PRIMARY KEY USING gist (id, valid_at WITHOUT 
OVERLAPS)`. Everywhere else we default to btree, so defaulting to gist feels a little weird. In 
theory we could even someday support WITHOUT OVERLAPS with btree, if we taught that AM to answer 
that question. (I admit there is probably not a lot of desire for that though.)


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: SQL:2011 application time

2024-05-13 Thread Paul Jungwirth

On 5/12/24 08:51, Paul Jungwirth wrote:

On 5/12/24 05:55, Matthias van de Meent wrote:

  > pg=# CREATE UNIQUE INDEX ON temporal_testing USING gist (id, valid_during);
  > ERROR:  access method "gist" does not support unique indexes

To me that error message seems correct. The programmer hasn't said anything 
about the special
temporal behavior they are looking for.


But I showed that I had a GIST index that does have the indisunique
flag set, which shows that GIST does support indexes with unique
semantics.

That I can't use CREATE UNIQUE INDEX to create such an index doesn't
mean the feature doesn't exist, which is what the error message
implies.


True, the error message is not really telling the truth anymore. I do think most people who hit this 
error are not thinking about temporal constraints at all though, and for non-temporal constraints it 
is still true. It's also true for CREATE INDEX, since WITHOUT OVERLAPS is only available on the 
*constraint*. So how about adding a hint, something like this?


ERROR:  access method "gist" does not support unique indexes
HINT: To create a unique constraint with non-overlap behavior, use ADD 
CONSTRAINT ... WITHOUT OVERLAPS.


I thought a little more about eventually implementing WITHOUT OVERLAPS support for CREATE INDEX, and 
how it relates to this error message in particular. Even when that is done, it will still depend on 
the stratnum support function for the keys' opclasses, so the GiST AM itself will still have false 
amcanunique, I believe. Probably the existing error message is still the right one. The hint won't 
need to mention ADD CONSTRAINT anymore. It should still point users to WITHOUT OVERLAPS, and 
possibly the stratnum support function too. I think what we are doing for v17 is all compatible with 
that plan.


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: SQL:2011 application time

2024-05-12 Thread Paul Jungwirth

On 5/5/24 20:01, jian he wrote:

hi.
I hope I understand the problem correctly.
my understanding is that we are trying to solve a corner case:
create table t(a int4range, b int4range, primary key(a, b WITHOUT OVERLAPS));
insert into t values ('[1,2]','empty'), ('[1,2]','empty');


I think the entry point is ATAddCheckNNConstraint and index_create.
in a chain of DDL commands, you cannot be sure which one
(primary key constraint or check constraint) is being created first,
you just want to make sure that after both constraints are created,
then add a dependency between primary key and check constraint.

so you need to validate at different functions
(ATAddCheckNNConstraint, index_create)
that these two constraints are indeed created,
only after that we have a dependency linking these two constraints.


I've attached a patch trying to solve this problem.
the patch is not totally polished, but works as expected, and also has
lots of comments.


Thanks for this! I've incorporated it into the CHECK constraint patch with some changes. In 
particular I thought index_create was a strange place to change the conperiod value of a 
pg_constraint record, and it is not actually needed if we are copying that value correctly.


Some other comments on the patch file:

> N.B. we also need to take special care for the case
> where the check constraint is re-added, e.g. by ALTER TYPE.
> If ALTER TYPE alters the PERIOD column of the primary key,
> altering the column recreates both the index and the check
> constraint; however, the former internally also adds a check
> constraint, so we need to take care of merging the two check
> constraints.

This is a good point. I've included tests for this based on your patch.

> N.B. the check constraint name is hard-wired, so if you create the constraint
> with the same name, PERIOD primary key cannot be created.

Yes, it may be worth doing something like other auto-named constraints and trying to avoid 
duplicates. I haven't taken that on yet; I'm curious what others have to say about it.


> N.B. what about UNIQUE constraint?

See my previous posts on this thread about allowing 'empty' in UNIQUE 
constraints.

> N.B. seems ok to not care about FOREIGN KEY regarding this corner case?

Agreed.

v3 patches attached, rebased to 3ca43dbbb6.

Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com

From 4f4428fb41ea79056a13e425826fdac9c7b5d349 Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth" 
Date: Tue, 2 Apr 2024 15:39:04 -0700
Subject: [PATCH v3 1/2] Don't treat WITHOUT OVERLAPS indexes as unique in
 planner

Because the special rangetype 'empty' never overlaps another value, it
is possible for WITHOUT OVERLAPS tables to have two rows with the same
key, despite being indisunique, if the application-time range is
'empty'. So to be safe we should not treat WITHOUT OVERLAPS indexes as
unique in any proofs.

This still needs a test, but I'm having trouble finding a query that
gives wrong results.
---
 src/backend/optimizer/path/indxpath.c | 5 +++--
 src/backend/optimizer/plan/analyzejoins.c | 6 +++---
 src/backend/optimizer/util/plancat.c  | 1 +
 src/include/nodes/pathnodes.h | 2 ++
 4 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index c0fcc7d78df..72346f78ebe 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3498,13 +3498,14 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 
 		/*
 		 * If the index is not unique, or not immediately enforced, or if it's
-		 * a partial index, it's useless here.  We're unable to make use of
+		 * a partial index, or if it's a WITHOUT OVERLAPS index (so not
+		 * literally unique), it's useless here.  We're unable to make use of
 		 * predOK partial unique indexes due to the fact that
 		 * check_index_predicates() also makes use of join predicates to
 		 * determine if the partial index is usable. Here we need proofs that
 		 * hold true before any joins are evaluated.
 		 */
-		if (!ind->unique || !ind->immediate || ind->indpred != NIL)
+		if (!ind->unique || !ind->immediate || ind->indpred != NIL || ind->hasperiod)
 			continue;
 
 		/*
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index c3fd4a81f8a..dc8327d5769 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -814,8 +814,8 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
 		 * For a plain relation, we only know how to prove uniqueness by
 		 * reference to unique indexes.  Make sure there's at least one
 		 * suitable unique index.  It must be immediately enforced, and not a
-		 * partial index. (Keep these conditions in sync with
-		 * relation_has_unique_index_for!)
+		 * partial index, and not WITHOUT OVERLAPS (Keep these conditions
+		 * in sync with relation_has_unique_index_for!)
 		 

Re: SQL:2011 application time

2024-05-12 Thread Paul Jungwirth

On 5/12/24 05:55, Matthias van de Meent wrote:

  > pg=# CREATE UNIQUE INDEX ON temporal_testing USING gist (id, valid_during);
  > ERROR:  access method "gist" does not support unique indexes

To me that error message seems correct. The programmer hasn't said anything 
about the special
temporal behavior they are looking for.


But I showed that I had a GIST index that does have the indisunique
flag set, which shows that GIST does support indexes with unique
semantics.

That I can't use CREATE UNIQUE INDEX to create such an index doesn't
mean the feature doesn't exist, which is what the error message
implies.


True, the error message is not really telling the truth anymore. I do think most people who hit this 
error are not thinking about temporal constraints at all though, and for non-temporal constraints it 
is still true. It's also true for CREATE INDEX, since WITHOUT OVERLAPS is only available on the 
*constraint*. So how about adding a hint, something like this?


ERROR:  access method "gist" does not support unique indexes
HINT: To create a unique constraint with non-overlap behavior, use ADD 
CONSTRAINT ... WITHOUT OVERLAPS.

Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: SQL:2011 application time

2024-05-11 Thread Paul Jungwirth

On 5/9/24 17:44, Matthias van de Meent wrote:

I haven't really been following this thread, but after playing around
a bit with the feature I feel there are new gaps in error messages. I
also think there are gaps in the functionality regarding the (lack of)
support for CREATE UNIQUE INDEX, and attaching these indexes to
constraints
Thank you for trying this out and sharing your thoughts! I think these are good points about CREATE 
UNIQUE INDEX and then creating the constraint by handing it an existing index. This is something 
that I am hoping to add, but it's not covered by the SQL:2011 standard, so I think it needs some 
discussion, and I don't think it needs to go into v17.


For instance you are saying:

> pg=# CREATE UNIQUE INDEX ON temporal_testing USING gist (id, valid_during);
> ERROR:  access method "gist" does not support unique indexes

To me that error message seems correct. The programmer hasn't said anything about the special 
temporal behavior they are looking for. To get non-overlapping semantics from an index, this more 
explicit syntax seems better, similar to PKs in the standard:


> pg=# CREATE UNIQUE INDEX ON temporal_testing USING gist (id, valid_during 
WITHOUT OVERLAPS);
> ERROR:  access method "gist" does not support unique indexes

We could also support *non-temporal* unique GiST indexes, particularly now that we have the stratnum 
support function. Those would use the syntax you gave, omitting WITHOUT OVERLAPS. But that seems 
like a separate effort to me.



Additionally, because I can't create my own non-constraint-backing
unique GIST indexes, I can't pre-create my unique constraints
CONCURRENTLY as one could do for the non-temporal case: UNIQUE
constraints hold ownership of the index and would drop the index if
the constraint is dropped, too, and don't support a CONCURRENTLY
modifier, nor an INVALID modifier. This means temporal unique
constraints have much less administrative wiggle room than normal
unique constraints, and I think that's not great.


This is a great use-case for why we should support this eventually, even if it 
uses non-standard syntax.

Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: SQL:2011 application time

2024-05-11 Thread Paul Jungwirth

On 5/11/24 17:00, jian he wrote:

I hope I understand the problem correctly.
my understanding is that we are trying to solve a corner case:
create table t(a int4range, b int4range, primary key(a, b WITHOUT OVERLAPS));
insert into t values ('[1,2]','empty'), ('[1,2]','empty');




but we still have not yet addressed cases like:
create table t10(a int4range, b int4range, unique (a, b WITHOUT OVERLAPS));
insert into t10 values ('[1,2]','empty'), ('[1,2]','empty');

one table can have more than one temporal unique constraint,
and adding a not-empty check constraint for each temporal unique
constraint seems not easy.


I think we should add the not-empty constraint only for PRIMARY KEYs, not all UNIQUE constraints. 
The empty edge case is very similar to the NULL edge case, and while every PK column must be 
non-null, we do allow nulls in ordinary UNIQUE constraints. If users want to have 'empty' in those 
constraints, I think we should let them. And then the problems you give don't arise.
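
[The NULL analogy in today's behavior, for comparison:]

```sql
-- A plain UNIQUE constraint admits multiple NULLs (by default), while a
-- PRIMARY KEY forbids NULL entirely:
CREATE TABLE u (a int UNIQUE);
INSERT INTO u VALUES (NULL), (NULL);  -- allowed: NULLs are treated as distinct

CREATE TABLE p (a int PRIMARY KEY);
INSERT INTO p VALUES (NULL);  -- fails: NULL is not allowed in a primary key

-- The proposal treats 'empty' the same way: forbidden in a temporal
-- PRIMARY KEY, permitted in a temporal UNIQUE constraint.
```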



Maybe we can just mention that the special 'empty' range value makes
temporal unique constraints not "unique".


Just documenting the behavior is also an okay solution here I think. I see two downsides though: (1) 
it makes rangetype temporal keys differ from PERIOD temporal keys (2) it could allow more 
planner/etc bugs than we have thought of. So I think it's worth adding the constraint instead.



also we can make sure that a FOREIGN KEY can only reference primary
keys, not temporal unique constraints, so the implication of temporal
unique constraints not being "unique" stays limited.
I played around with it, we can error out these cases in the function
transformFkeyCheckAttrs.


I don't think it is a problem to reference a temporal UNIQUE constraint, even if it contains empty 
values. An empty value means you're not asserting that row at any time (though another row might 
assert the same thing for some time), so it could never contribute toward fulfilling a reference anyway.


I do think it would be nice if the *reference* could contain empty values. Right now the FK SQL will 
cause that to never match, because we use `&&` as an optimization, but we could tweak the SQL (maybe 
for v18 instead) so that users could get away with that kind of thing. As I said in an earlier 
email, this would give you an escape hatch to reference a temporal table from a non-temporal table. 
Otherwise temporal tables are "contagious," which is a bit of a drawback.
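
[To illustrate why an empty referencing range can never match under the current SQL:]

```sql
-- The FK check relies on && against the referenced rows, and 'empty'
-- overlaps nothing, so a reference with an empty range never matches:
SELECT 'empty'::daterange && '[2000-01-01,2001-01-01)'::daterange;  -- false

-- Containment, by contrast, is trivially true for an empty range, which
-- is why tweaking the generated SQL could allow such references:
SELECT '[2000-01-01,2001-01-01)'::daterange @> 'empty'::daterange;  -- true
```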


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: SQL:2011 application time

2024-05-08 Thread Paul Jungwirth
Here are a couple new patches, rebased to e305f715, addressing Peter's feedback. I'm still working 
on integrating jian he's suggestions for the last patch, so I've omitted that one here.


On 5/8/24 06:51, Peter Eisentraut wrote:

About v2-0001-Fix-ON-CONFLICT-DO-NOTHING-UPDATE-for-temporal-in.patch, I think 
the
ideas are right, but I wonder if we can fine-tune the new conditionals a bit.

--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -210,7 +210,7 @@ ExecOpenIndices(ResultRelInfo *resultRelInfo, bool 
speculative)
  * If the indexes are to be used for speculative insertion, 
add extra
  * information required by unique index entries.
  */
-   if (speculative && ii->ii_Unique)
+   if (speculative && ii->ii_Unique && !ii->ii_HasWithoutOverlaps)
     BuildSpeculativeIndexInfo(indexDesc, ii);

Here, I think we could check !indexDesc->rd_index->indisexclusion instead.  So 
we
wouldn't need ii_HasWithoutOverlaps.


Okay.


Or we could push this into BuildSpeculativeIndexInfo(); it could just skip the 
rest
if an exclusion constraint is passed, on the theory that all the speculative 
index
info is already present in that case.


I like how BuildSpeculativeIndexInfo starts with an Assert that it's given a unique index, so I've 
left the check outside the function. This seems cleaner anyway: the function stays more focused.



--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -815,7 +815,7 @@ infer_arbiter_indexes(PlannerInfo *root)
  */
     if (indexOidFromConstraint == idxForm->indexrelid)
     {
-   if (!idxForm->indisunique && onconflict->action == 
ONCONFLICT_UPDATE)
+   if ((!idxForm->indisunique || idxForm->indisexclusion) && onconflict->action == 
ONCONFLICT_UPDATE)

     ereport(ERROR,
     (errcode(ERRCODE_WRONG_OBJECT_TYPE),
  errmsg("ON CONFLICT DO UPDATE not supported with exclusion 
constraints")));

Shouldn't this use only idxForm->indisexclusion anyway?  Like

+   if (idxForm->indisexclusion && onconflict->action == 
ONCONFLICT_UPDATE)

That matches what the error message is reporting afterwards.


Agreed.


  * constraints), so index under consideration can be immediately
  * skipped if it's not unique
  */
-   if (!idxForm->indisunique)
+   if (!idxForm->indisunique || idxForm->indisexclusion)
     goto next;

Maybe here we need a comment.  Or make that a separate statement, like


Yes, that is nice. Done.

Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com

From 0ebe2e6d6cd8dc6f8120fe93b9024cf80472f8cc Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth" 
Date: Sun, 24 Mar 2024 21:46:30 -0700
Subject: [PATCH v2 1/2] Fix ON CONFLICT DO NOTHING/UPDATE for temporal indexes

A PRIMARY KEY or UNIQUE constraint with WITHOUT OVERLAPS will be a GiST
index, not a B-Tree, but it will still have indisunique set. The code
for ON CONFLICT fails if it sees a non-btree index that has indisunique.
This commit fixes that and adds some tests. But now that we can't just
test indisunique, we also need some extra checks to prevent DO UPDATE
from running against a WITHOUT OVERLAPS constraint (because the conflict
could happen against more than one row, and we'd only update one).
---
 src/backend/catalog/index.c   |   1 +
 src/backend/executor/execIndexing.c   |   2 +-
 src/backend/optimizer/util/plancat.c  |   9 +-
 src/include/nodes/execnodes.h |   1 +
 .../regress/expected/without_overlaps.out | 176 ++
 src/test/regress/sql/without_overlaps.sql | 113 +++
 6 files changed, 300 insertions(+), 2 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5a8568c55c9..1fd543cc550 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2443,6 +2443,7 @@ BuildIndexInfo(Relation index)
								 &ii->ii_ExclusionOps,
								 &ii->ii_ExclusionProcs,
								 &ii->ii_ExclusionStrats);
+		ii->ii_HasWithoutOverlaps = ii->ii_Unique;
 	}
 
 	return ii;
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9f05b3654c1..59acf67a36a 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -210,7 +210,7 @@ ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative)
 		 * If the indexes are to be used for speculative insertion, add extra
 		 * information required by unique index entries.
 		 */
-		if (speculative && ii->ii_Unique)
+		if (speculative && ii->ii_Unique && !indexDesc->rd_index->indisexclusion)
 			BuildSpeculativeIndexInfo(indexDesc, ii);
 
 		relationDescs[i] = indexDesc;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 130f838629f..775c3e26cd8 100644
--- 

Re: PERIOD foreign key feature

2024-05-08 Thread Paul Jungwirth

On 5/8/24 07:44, Bruce Momjian wrote:

On Wed, May  8, 2024 at 02:29:34PM +0200, Peter Eisentraut wrote:

Yes, David is correct here on all points. I like his suggestion to
clarify the language here also. If you need a patch from me let me know,
but I assume it's something a committer can just make happen?


In principle yes, but it's also very helpful if someone produces an actual
patch file, with complete commit message, credits, mailing list link, etc.


I am ready to do the work, but waited a day for Peter to reply, since he
was the author of the text.


Here is a patch for this.

Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com

From d97a60b7e56c57a242668609c8fb82e6a6a32506 Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth" 
Date: Wed, 8 May 2024 20:41:05 -0700
Subject: [PATCH] Reword docs for temporal PKs with implicit referent

The old docs had confusing language about "one side" of the foreign key,
which could mean either the referenced primary key itself or the
REFERENCES clause of the FK declaration.

Author: David G. Johnston 
Discussion: https://www.postgresql.org/message-id/ZjpApuq8I9DE5Elv%40momjian.us
---
 doc/src/sgml/ref/create_table.sgml | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 02f31d2d6fd..75f06bc49cc 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1184,11 +1184,13 @@ WITH ( MODULUS numeric_literal, REM
   referent for its entire duration.  This column must be a range or
   multirange type.  In addition, the referenced table must have a primary
   key or unique constraint declared with WITHOUT
-  OVERLAPS.  Finally, if one side of the foreign key uses
-  PERIOD, the other side must too.  If the refcolumn list is omitted, the
-  WITHOUT OVERLAPS part of the primary key is treated
-  as if marked with PERIOD.
+  OVERLAPS. Finally, if the foreign key has a PERIOD
+  column_name specification
+  the corresponding refcolumn,
+  if present, must also be marked PERIOD.  If the
+  refcolumn clause is omitted,
+  and thus the reftable's primary key constraint chosen, the primary key
+  must have its final column marked WITHOUT OVERLAPS.
  
 
  
-- 
2.42.0



Re: PERIOD foreign key feature

2024-05-07 Thread Paul Jungwirth

On 5/7/24 08:23, David G. Johnston wrote:

On Tue, May 7, 2024 at 7:54 AM Bruce Momjian <br...@momjian.us> wrote:
In the two marked lines, it says "if one side of the foreign key uses
PERIOD, the other side must too."  However, looking at the example
queries, it seems like if the foreign side has PERIOD, the primary side
must have WITHOUT OVERLAPS, not PERIOD.

Does this doc text need correcting?


The text is factually correct, though a bit hard to parse.

"the other side" refers to the part after "REFERENCES":

FOREIGN KEY ( column_name [, ... ] [, PERIOD column_name ] ) REFERENCES reftable [ ( refcolumn [, 
... ] [, PERIOD column_name ] ) ]


***(shouldn't the second occurrence be [, PERIOD refcolumn] ?)

The text is pointing out that since the refcolumn specification is optional you may very well not 
see a second PERIOD keyword in the clause.  Instead it will be inferred from the PK.


Maybe:

Finally, if the foreign key has a PERIOD column_name specification the corresponding refcolumn, if 
present, must also be marked PERIOD.  If the refcolumn clause is omitted, and thus the reftable's 
primary key constraint chosen, the primary key must have its final column marked WITHOUT OVERLAPS.


Yes, David is correct here on all points. I like his suggestion to clarify the language here also. 
If you need a patch from me let me know, but I assume it's something a committer can just make happen?


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: SQL:2011 application time

2024-04-30 Thread Paul Jungwirth

On 4/30/24 09:24, Robert Haas wrote:

Peter, could you have a look at
http://postgr.es/m/47550967-260b-4180-9791-b224859fe...@illuminatedcomputing.com
and express an opinion about whether each of those proposals are (a)
good or bad ideas and (b) whether they need to be fixed for the
current release?


Here are the same patches but rebased. I've added a fourth which is my progress on adding the CHECK 
constraint. I don't really consider it finished though, because it has these problems:


- The CHECK constraint should be marked as an internal dependency of the PK, so that you can't drop 
it, and it gets dropped when you drop the PK. I don't see a good way to tie the two together though, 
so I'd appreciate any advice there. They are separate AlterTableCmds, so how do I get the 
ObjectAddress of both constraints at the same time? I wanted to store the PK's ObjectAddress on the 
Constraint node, but since ObjectAddress isn't a Node it doesn't work.


- The CHECK constraint should maybe be hidden when you say `\d foo`? Or maybe not, but that's what 
we do with FK triggers.


- When you create partitions you get a warning about the constraint already existing, because it 
gets created via the PK and then also the partitioning code tries to copy it. Solving the first 
issue here should solve this nicely though.


Alternately we could just fix the GROUP BY functional dependency code to only accept b-tree indexes. 
But I think the CHECK constraint approach is a better solution.


Thanks,

--
Paul  ~{:-)
p...@illuminatedcomputing.com

From 0dbc008a654ab1fdc5f492345ee4575c352499d3 Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth" 
Date: Sun, 24 Mar 2024 21:46:30 -0700
Subject: [PATCH v2 1/4] Fix ON CONFLICT DO NOTHING/UPDATE for temporal indexes

A PRIMARY KEY or UNIQUE constraint with WITHOUT OVERLAPS will be a GiST
index, not a B-Tree, but it will still have indisunique set. The code
for ON CONFLICT fails if it sees a non-btree index that has indisunique.
This commit fixes that and adds some tests. But now that we can't just
test indisunique, we also need some extra checks to prevent DO UPDATE
from running against a WITHOUT OVERLAPS constraint (because the conflict
could happen against more than one row, and we'd only update one).
---
 src/backend/catalog/index.c   |   1 +
 src/backend/executor/execIndexing.c   |   2 +-
 src/backend/optimizer/util/plancat.c  |   4 +-
 src/include/nodes/execnodes.h |   1 +
 .../regress/expected/without_overlaps.out | 176 ++
 src/test/regress/sql/without_overlaps.sql | 113 +++
 6 files changed, 294 insertions(+), 3 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5a8568c55c9..1fd543cc550 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2443,6 +2443,7 @@ BuildIndexInfo(Relation index)
								 &ii->ii_ExclusionOps,
								 &ii->ii_ExclusionProcs,
								 &ii->ii_ExclusionStrats);
+		ii->ii_HasWithoutOverlaps = ii->ii_Unique;
 	}
 
 	return ii;
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9f05b3654c1..faa37ca56db 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -210,7 +210,7 @@ ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative)
 		 * If the indexes are to be used for speculative insertion, add extra
 		 * information required by unique index entries.
 		 */
-		if (speculative && ii->ii_Unique)
+		if (speculative && ii->ii_Unique && !ii->ii_HasWithoutOverlaps)
 			BuildSpeculativeIndexInfo(indexDesc, ii);
 
 		relationDescs[i] = indexDesc;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 130f838629f..a398d7a78d1 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -815,7 +815,7 @@ infer_arbiter_indexes(PlannerInfo *root)
 		 */
 		if (indexOidFromConstraint == idxForm->indexrelid)
 		{
-			if (!idxForm->indisunique && onconflict->action == ONCONFLICT_UPDATE)
+			if ((!idxForm->indisunique || idxForm->indisexclusion) && onconflict->action == ONCONFLICT_UPDATE)
 ereport(ERROR,
 		(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 		 errmsg("ON CONFLICT DO UPDATE not supported with exclusion constraints")));
@@ -837,7 +837,7 @@ infer_arbiter_indexes(PlannerInfo *root)
 		 * constraints), so index under consideration can be immediately
 		 * skipped if it's not unique
 		 */
-		if (!idxForm->indisunique)
+		if (!idxForm->indisunique || idxForm->indisexclusion)
 			goto next;
 
 		/* Build BMS representation of plain (non expression) index attrs */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index d927ac44a82..fdfaef284e9 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -204,6 +204,7 @@ typedef struct IndexInfo
 	bool		ii_Summarizing;
 	int			ii_ParallelWorkers;
 	Oid			ii_Am;
+	bool	

Re: SQL:2011 application time

2024-04-26 Thread Paul Jungwirth

On 4/26/24 12:25, Robert Haas wrote:

I think this thread should be added to the open items list.


Thanks! I sent a request to pgsql-www to get edit permission. I didn't realize there was a wiki page 
tracking things like this. I agree it needs to be fixed if we want to include the feature.


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: SQL:2011 application time

2024-04-02 Thread Paul Jungwirth

On 3/24/24 00:38, Peter Eisentraut wrote:

I have committed the patches v33-0001-Add-temporal-FOREIGN-KEYs.patch
and v33-0002-Support-multiranges-in-temporal-FKs.patch (together).


Hi Hackers,

I found some problems with temporal primary keys and the idea of uniqueness, especially around the 
indisunique column. Here are some small fixes and a proposal for a larger fix, which I think we need 
but I'd like some feedback on.


The first patch fixes problems with ON CONFLICT DO NOTHING/UPDATE.

DO NOTHING fails because it doesn't expect a non-btree unique index. It's fine to make it accept a 
temporal PRIMARY KEY/UNIQUE index though (i.e. an index with both indisunique and indisexclusion).
This is no different than an exclusion constraint. So I skip BuildSpeculativeIndexInfo for WITHOUT 
OVERLAPS indexes. (Incidentally, AFAICT ii_UniqueOps is never used, only ii_UniqueProcs. Right?)


We should still forbid temporally-unique indexes for ON CONFLICT DO UPDATE, since there may be more 
than one row that conflicts. Ideally we would find and update all the conflicting rows, but we don't 
now, and we don't want to update just one:


postgres=# create table t (id int4range, valid_at daterange, name text, constraint tpk primary 
key (id, valid_at without overlaps));

CREATE TABLE
postgres=# insert into t values ('[1,2)', '[2000-01-01,2001-01-01)', 'a'), ('[1,2)', 
'[2001-01-01,2002-01-01)', 'b');

INSERT 0 2
postgres=# insert into t values ('[1,2)', '[2000-01-01,2002-01-01)', 'c') on conflict (id, 
valid_at) do update set name = excluded.name;

INSERT 0 1
postgres=# select * from t;
  id   |        valid_at         | name
-------+-------------------------+------
 [1,2) | [2001-01-01,2002-01-01) | b
 [1,2) | [2000-01-01,2001-01-01) | c
(2 rows)

So I also added code to prevent that. This is just preserving the old behavior for exclusion 
constraints, which was bypassed because of indisunique. All this is in the first patch.


That got me thinking about indisunique and where else it could cause problems. Perhaps there are 
other places that assume only b-trees are unique. I couldn't find anywhere that just gives an error 
like ON CONFLICT, but I can imagine more subtle problems.


A temporal PRIMARY KEY or UNIQUE constraint is unique in at least three ways: It is *metaphorically* 
unique: the conceit is that the scalar part is unique at every moment in time. You may have id 5 in 
your table more than once, as long as the records' application times don't overlap.


And it is *officially* unique: the standard calls these constraints unique. I think it is correct 
for us to report them as unique in pg_index.


But is it *literally* unique? Well, two identical keys, e.g. (5, '[Jan24,Mar24)') and (5, 
'[Jan24,Mar24)'), do have overlapping ranges, so the second is excluded. Normally a temporal unique 
index is *more* restrictive than a standard one, since it forbids other values too (e.g. (5, 
'[Jan24,Feb24)')). But sadly there is one exception: the ranges in these keys do not overlap: (5, 
'empty'), (5, 'empty'). With ranges/multiranges, `'empty' && x` is false for all x. You can add that 
key as many times as you like, despite a PK/UQ constraint:


postgres=# insert into t values
('[1,2)', 'empty', 'foo'),
('[1,2)', 'empty', 'bar');
INSERT 0 2
postgres=# select * from t;
  id   | valid_at | name
-------+----------+------
 [1,2) | empty    | foo
 [1,2) | empty    | bar
(2 rows)

Cases like this shouldn't actually happen for temporal tables, since empty is not a meaningful 
value. An UPDATE/DELETE FOR PORTION OF would never produce an empty range. But we should still make 
sure they don't cause problems.


One place we should avoid temporally-unique indexes is REPLICA IDENTITY. Fortunately we already do 
that, but patch 2 adds a test to keep it that way.


Uniqueness is an important property to the planner, too.

We consider indisunique often for estimates, where it needn't be 100% true. Even if there are 
nullable columns or a non-indimmediate index, it still gives useful stats. Duplicates from 'empty' 
shouldn't cause any new problems there.


In proof code we must be more careful. Patch 3 updates relation_has_unique_index_ext and 
rel_supports_distinctness to disqualify WITHOUT OVERLAPS indexes. Maybe that's more cautious than 
needed, but better safe than sorry. This patch has no new test though. I had trouble writing SQL 
that was wrong before its change. I'd be happy for help here!


Another problem is GROUP BY and functional dependencies. This is wrong:

postgres=# create table a (id int4range, valid_at daterange, name text, constraint apk primary 
key (id, valid_at without overlaps));

CREATE TABLE
postgres=# insert into a values ('[1,2)', 'empty', 'foo'), ('[1,2)', 
'empty', 'bar');
INSERT 0 2
postgres=# select * from a group by id, valid_at;
  id   | valid_at | name
-------+----------+------
 [1,2) | 

Re: altering a column's collation leaves an invalid foreign key

2024-03-25 Thread Paul Jungwirth

On 3/23/24 10:04, Paul Jungwirth wrote:

Perhaps if the previous collation was nondeterministic we should force a 
re-check.


Here is a patch implementing this. It was a bit more fuss than I expected, so maybe someone has a 
better way.


We have had nondeterministic collations since v12, so perhaps it is something 
to back-patch?

Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com

From a8078f85859400f4e4cffb57d8ec1b069e46bfb8 Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth" 
Date: Sun, 24 Mar 2024 23:37:48 -0700
Subject: [PATCH v1] Re-check foreign key if referenced collation was
 nondeterministic

With deterministic collations, we break ties by looking at binary
equality. But nondeterministic collations can allow non-identical values
to be considered equal, e.g. 'a' and 'A' when case-insensitive. So some
references that used to be valid may not be anymore.
---
 src/backend/commands/tablecmds.c  | 96 ---
 src/include/nodes/parsenodes.h|  2 +
 .../regress/expected/collate.icu.utf8.out |  9 ++
 src/test/regress/sql/collate.icu.utf8.sql |  8 ++
 4 files changed, 103 insertions(+), 12 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 71740984f33..940b1845edb 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -194,6 +194,7 @@ typedef struct AlteredTableInfo
 	/* Objects to rebuild after completing ALTER TYPE operations */
 	List	   *changedConstraintOids;	/* OIDs of constraints to rebuild */
 	List	   *changedConstraintDefs;	/* string definitions of same */
+	List	   *changedCollationOids;	/* collation of each attnum */
 	List	   *changedIndexOids;	/* OIDs of indexes to rebuild */
 	List	   *changedIndexDefs;	/* string definitions of same */
 	char	   *replicaIdentityIndex;	/* index to reset as REPLICA IDENTITY */
@@ -572,6 +573,7 @@ static ObjectAddress ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
 		   AlterTableCmd *cmd, LOCKMODE lockmode);
 static void RememberAllDependentForRebuilding(AlteredTableInfo *tab, AlterTableType subtype,
 			  Relation rel, AttrNumber attnum, const char *colName);
+static void RememberCollationForRebuilding(AttrNumber attnum, AlteredTableInfo *tab);
 static void RememberConstraintForRebuilding(Oid conoid, AlteredTableInfo *tab);
 static void RememberIndexForRebuilding(Oid indoid, AlteredTableInfo *tab);
 static void RememberStatisticsForRebuilding(Oid stxoid, AlteredTableInfo *tab);
@@ -579,12 +581,12 @@ static void ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab,
    LOCKMODE lockmode);
 static void ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId,
  char *cmd, List **wqueue, LOCKMODE lockmode,
- bool rewrite);
+ bool rewrite, List *collationOids);
 static void RebuildConstraintComment(AlteredTableInfo *tab, AlterTablePass pass,
 	 Oid objid, Relation rel, List *domname,
 	 const char *conname);
 static void TryReuseIndex(Oid oldId, IndexStmt *stmt);
-static void TryReuseForeignKey(Oid oldId, Constraint *con);
+static void TryReuseForeignKey(Oid oldId, Constraint *con, List *changedCollationOids);
 static ObjectAddress ATExecAlterColumnGenericOptions(Relation rel, const char *colName,
 	 List *options, LOCKMODE lockmode);
 static void change_owner_fix_column_acls(Oid relationOid,
@@ -9826,6 +9828,7 @@ ATAddForeignKeyConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel,
 	bool		old_check_ok;
 	ObjectAddress address;
 	ListCell   *old_pfeqop_item = list_head(fkconstraint->old_conpfeqop);
+	ListCell   *old_collation_item = list_head(fkconstraint->old_collations);
 
 	/*
 	 * Grab ShareRowExclusiveLock on the pk table, so that someone doesn't
@@ -10212,9 +10215,15 @@ ATAddForeignKeyConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel,
 			CoercionPathType new_pathtype;
 			Oid			old_castfunc;
 			Oid			new_castfunc;
+			Oid			old_collation;
 			Form_pg_attribute attr = TupleDescAttr(tab->oldDesc,
    fkattnum[i] - 1);
 
+			/* Get the collation for this key part. */
+			old_collation = lfirst_oid(old_collation_item);
+			old_collation_item = lnext(fkconstraint->old_collations,
+	   old_collation_item);
+
 			/*
 			 * Identify coercion pathways from each of the old and new FK-side
 			 * column types to the right (foreign) operand type of the pfeqop.
@@ -10250,9 +10259,12 @@ ATAddForeignKeyConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel,
 			 * turn conform to the domain.  Consequently, we need not treat
 			 * domains specially here.
 			 *
-			 * Since we require that all collations share the same notion of
-			 * equality (which they do, because texteq reduces to bitwise
-			 * equality), we don't compare collation here.
+			 * All deterministic collations use bitwise equality to resolve
+			 * ties, but if the previous collation was nonde

altering a column's collation leaves an invalid foreign key

2024-03-23 Thread Paul Jungwirth

Dear hackers,

I was looking at how foreign keys deal with collations, and I came across this comment about not 
re-checking a foreign key if the column type changes in a compatible way:


  * Since we require that all collations share the same notion of
  * equality (which they do, because texteq reduces to bitwise
  * equality), we don't compare collation here.

But now that we have nondeterministic collations, isn't that out of date?

For instance I could make this foreign key:

paul=# create collation itext (provider = 'icu', locale = 'und-u-ks-level1', 
deterministic = false);
CREATE COLLATION
paul=# create table t1 (id text collate itext primary key);
CREATE TABLE
paul=# create table t2 (id text, parent_id text references t1);
CREATE TABLE

And then:

paul=# insert into t1 values ('a');
INSERT 0 1
paul=# insert into t2 values ('.', 'A');
INSERT 0 1

So far that behavior seems correct, because the user told us 'a' and 'A' were 
equivalent,
but now I can change the collation on the referenced table and the FK doesn't 
complain:

paul=# alter table t1 alter column id type text collate "C";
ALTER TABLE

The constraint claims to be valid, but I can't drop & add it:

paul=# alter table t2 drop constraint t2_parent_id_fkey;
ALTER TABLE
paul=# alter table t2 add constraint t2_parent_id_fkey foreign key (parent_id) 
references t1;
ERROR:  insert or update on table "t2" violates foreign key constraint 
"t2_parent_id_fkey"
DETAIL:  Key (parent_id)=(A) is not present in table "t1".

Isn't that a problem?

Perhaps if the previous collation was nondeterministic we should force a 
re-check.

(Tested on 17devel 697f8d266c and also 16.)

Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: SQL:2011 application time

2024-02-29 Thread Paul Jungwirth

On 2/13/24 21:00, jian he wrote:

Hi
more minor issues.

+ FindFKComparisonOperators(
+ fkconstraint, tab, i, fkattnum,
+ _check_ok, _pfeqop_item,
+ pktypoid[i], fktypoid[i], opclasses[i],
+ is_temporal, false,
+ [i], [i], [i]);
+ }
+ if (is_temporal) {
+ pkattnum[numpks] = pkperiodattnum;
+ pktypoid[numpks] = pkperiodtypoid;
+ fkattnum[numpks] = fkperiodattnum;
+ fktypoid[numpks] = fkperiodtypoid;

- pfeqop = get_opfamily_member(opfamily, opcintype, fktyped,
- eqstrategy);
- if (OidIsValid(pfeqop))
- {
- pfeqop_right = fktyped;
- ffeqop = get_opfamily_member(opfamily, fktyped, fktyped,
- eqstrategy);
- }
- else
- {
- /* keep compiler quiet */
- pfeqop_right = InvalidOid;
- ffeqop = InvalidOid;
- }
+ FindFKComparisonOperators(
+ fkconstraint, tab, numpks, fkattnum,
+ _check_ok, _pfeqop_item,
+ pkperiodtypoid, fkperiodtypoid, opclasses[numpks],
+ is_temporal, true,
+ [numpks], [numpks], [numpks]);
+ numfks += 1;
+ numpks += 1;
+ }

opening curly brace should be the next line,


Fixed in v25 (submitted in my other email).


also do you think it's
good idea to add following in the `if (is_temporal)` branch
`
Assert(OidIsValid(fkperiodtypoid) && OidIsValid(pkperiodtypoid));
Assert(OidIsValid(pkperiodattnum > 0 && fkperiodattnum > 0));
`

` if (is_temporal)` branch, you can set the FindFKComparisonOperators
10th argument (is_temporal)
to true, since you are already in the ` if (is_temporal)` branch.

maybe we need some extra comments on
`
+ numfks += 1;
+ numpks += 1;
`
since it might not be that evident?


That branch doesn't exist anymore. Same with the increments.


Do you think it's a good idea to list arguments line by line (with
good indentation) is good format? like:
FindFKComparisonOperators(fkconstraint,
tab,
i,
fkattnum,
_check_ok,
_pfeqop_item,
pktypoid[i],
fktypoid[i],
opclasses[i],
false,
false,
[i],
[i],
[i]);


There are places we do that, but most code I've seen tries to fill the line. I haven't followed that 
strictly here, but I'm trying to get better at doing what pg_indent wants.


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: automating RangeTblEntry node support

2024-02-16 Thread Paul Jungwirth

On 1/15/24 02:37, Peter Eisentraut wrote:
In this updated patch set, I have also added the treatment of the Constraint type.  (I also noted 
that the manual read/write functions for the Constraint type are out-of-sync again, so simplifying 
this would be really helpful.)  I have also added commit messages to each patch.


The way I have re-ordered the patch series now, I think patches 0001 through 0003 are candidates for 
inclusion after review, patch 0004 still needs a bit more analysis and testing, as described therein.


I had to apply the first patch by hand (on 9f13376396), so this looks due for a rebase. Patches 2-4 
applied fine.


Compiles & passes tests after each patch.

The overall idea seems like a good improvement to me.

A few remarks about cleaning up the RangeTblEntry comments:

After the fourth patch we have a "Fields valid in all RTEs" comment twice in the struct, once at the 
top and once at the bottom. It's fine IMO but maybe the second could be "More fields valid in all RTEs"?


The new order of fields in RangeTblEntry matches the intro comment, which seems like another small 
benefit.


It seems like we are moving away from ever putting RTEKind-specific fields into a union as suggested 
by the FIXME comment here. It was written in 2002. Is it time to remove it?


This now needs to say "above" not "below":

/*
 * join_using_alias is an alias clause attached directly to JOIN/USING. It
 * is different from the alias field (below) in that it does not hide the
 * range variables of the tables being joined.
 */
Alias  *join_using_alias pg_node_attr(query_jumble_ignore);

Re bloating the serialization output, we could leave this last patch until after the work on that 
other thread is done to skip default-valued items.


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: SQL:2011 application time

2024-01-08 Thread Paul Jungwirth

On 1/8/24 06:54, jian he wrote:
> On Fri, Jan 5, 2024 at 1:06 PM jian he  wrote:
>
> range_intersect returns the intersection of two ranges.
> I think here we are doing the opposite.
> names the main SQL function "range_not_intersect" and the internal
> function as "range_not_intersect_internal" should be fine.
> so people don't need to understand the meaning of "portion".

Thank you for helping me figure out a name here! I realize that can be a bike-sheddy kind of 
discussion, so let me share some of my principles.


Range and multirange are highly mathematically "pure", and that's something I value in them. It 
makes them more general-purpose, less encumbered by edge cases, easier to combine, and easier to 
reason about. Preserving that close connection to math is a big goal.


What I've called `without_portion` is (like) a closed form of minus (hence `@-` for the operator). 
Minus isn't closed under everything (e.g. ranges), so `without_portion` adds arrays---much as to 
close subtraction we add negative numbers and to close division we add rationals. We get the same 
effect from multiranges, but that only buys us range support. It would be awesome to support 
arbitrary types: ranges, multiranges, mdranges, boxes, polygons, inets, etc., so I think an array is 
the way to go here. And then each array element is a "leftover". What do we call a closed form of 
minus that returns arrays?


Using "not" suggests a function that returns true/false, but `@-` returns an array of things. So 
instead of "not" let's consider "complement". I think that's what you're expressing re intersection.


But `@-` is not the same as the complement of intersection. For one thing, `@-` is not commutative. 
`old_range @- target_portion` is not the same as `target_portion @- old_range`. But 
`complement(old_range * target_portion)` *is* the same as `complement(target_portion * old_range)`. 
Or from another angle: it's true that `old_range @- target_portion = old_range @- (old_range * 
target_portion)`, but the intersection isn't "doing" anything here. It's true that intersection and 
minus both "reduce" what you put in, but minus is more accurate.


So I think we want a name that captures that idea of "minus". Both "not" and "intersection" are 
misleading IMO.


Of course "minus" is already taken (and you wouldn't expect it to return arrays anyway), which is 
why I'm thinking about names like "without" or "except". Or maybe "multi-minus". I still think 
"without portion" is the closest to capturing everything above (and avoids ambiguity with other SQL 
operations). And the "portion" ties the operator to `FOR PORTION OF`, which is its purpose. But I 
wouldn't be surprised if there were something better.


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: Improve rowcount estimate for UNNEST(column)

2023-12-06 Thread Paul Jungwirth

Hello,

On 11/26/23 12:22, Tom Lane wrote:
> Yes, this regression test is entirely unacceptable; the numbers will
> not be stable enough.  Even aside from the different-settings issue,
> you can't rely on ANALYZE deriving exactly the same stats every time.
> Usually what we try to do is devise a query where the plan shape
> changes because of the better estimate.

Here is a patch with an improved test. With the old "10" estimate we get a Merge Join, but now that 
the planner can see there are only ~4 elements per array, we get a Nested Loop.


It was actually hard to get a new plan, since all our regress tables' arrays have around 5 elements 
average, which isn't so far from 10. Adding a table with 1- or 2-element arrays would be more 
dramatic. So I resorted to tuning the query with `WHERE seqno <= 50`. Hopefully that's not cheating 
too much.


I thought about also adding a test where the old code *underestimates*, but then I'd have to add a 
new table with big arrays. If it's worth it let me know.


On 11/26/23 23:05, jian he wrote:
> I found using table array_op_test test more convincing.

True, arrtest is pretty small. The new test uses array_op_test instead.

On 11/29/23 20:35, jian he wrote:
> I did a minor change. change estimate_array_length return type to double

I'm not sure I want to change estimate_array_length from returning ints to returning doubles, since 
it's called in many places. I can see how that might give plans that are more accurate yet, so maybe 
it's worth it? It feels like it ought to be a separate patch to me, but if others want me to include 
it here please let me know.


I did add clamp_row_est since it's essentially free and maybe gives some safety.

Rebased onto d43bd090a8.

Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.comFrom f8246ff806f4b3ec1441e515ada97662a8fe8967 Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth" 
Date: Sat, 25 Nov 2023 09:12:52 -0800
Subject: [PATCH v2] Use statistics for estimating UNNEST(column) rows

Extends estimate_array_length to consult DECHIST statistics when called
with a Var. This improves planning with array_unnest_support (added in
a391ff3c3d41) and should give more accurate results for other callers
too. DECHIST has the average *distinct* element count which isn't
exactly what we want, but it should still be an improvement over a fixed
10.
---
 src/backend/optimizer/path/costsize.c|  8 ++---
 src/backend/utils/adt/array_typanalyze.c |  1 +
 src/backend/utils/adt/arrayfuncs.c   |  2 +-
 src/backend/utils/adt/selfuncs.c | 39 +++-
 src/include/utils/selfuncs.h |  2 +-
 src/test/regress/expected/arrays.out | 20 
 src/test/regress/sql/arrays.sql  | 11 +++
 7 files changed, 70 insertions(+), 13 deletions(-)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index d6ceafd51c..8bcdcc7fd8 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1254,7 +1254,7 @@ cost_tidscan(Path *path, PlannerInfo *root,
 			ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) qual;
 			Node	   *arraynode = (Node *) lsecond(saop->args);
 
-			ntuples += estimate_array_length(arraynode);
+			ntuples += estimate_array_length(root, arraynode);
 		}
 		else if (IsA(qual, CurrentOfExpr))
 		{
@@ -4741,7 +4741,7 @@ cost_qual_eval_walker(Node *node, cost_qual_eval_context *context)
 		Node	   *arraynode = (Node *) lsecond(saop->args);
 		QualCost	sacosts;
 		QualCost	hcosts;
-		int			estarraylen = estimate_array_length(arraynode);
+		int			estarraylen = estimate_array_length(context->root, arraynode);
 
 		set_sa_opfuncid(saop);
 		sacosts.startup = sacosts.per_tuple = 0;
@@ -4779,7 +4779,7 @@ cost_qual_eval_walker(Node *node, cost_qual_eval_context *context)
 			 */
 			context->total.startup += sacosts.startup;
 			context->total.per_tuple += sacosts.per_tuple *
-estimate_array_length(arraynode) * 0.5;
+estimate_array_length(context->root, arraynode) * 0.5;
 		}
 	}
 	else if (IsA(node, Aggref) ||
@@ -4830,7 +4830,7 @@ cost_qual_eval_walker(Node *node, cost_qual_eval_context *context)
 		context->total.startup += perelemcost.startup;
 		if (perelemcost.per_tuple > 0)
 			context->total.per_tuple += perelemcost.per_tuple *
-estimate_array_length((Node *) acoerce->arg);
+estimate_array_length(context->root, (Node *) acoerce->arg);
 	}
 	else if (IsA(node, RowCompareExpr))
 	{
diff --git a/src/backend/utils/adt/array_typanalyze.c b/src/backend/utils/adt/array_typanalyze.c
index 04b3764b68..a60d5fb5d2 100644
--- a/src/backend/utils/adt/array_typanalyze.c
+++ b/src/backend/utils/adt/array_typanalyze.c
@@ -604,6 +604,7 @@ compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
 			/*
 			 * Prepare to fill stanumbers with the histogram, followed by the
 			 * average count.  This array must be stored in anl_context.
+			 * We use the average count in estimate_array_length.
 

Re: SQL:2011 application time

2023-12-02 Thread Paul Jungwirth

On Thu, Nov 23, 2023 at 1:08 AM Peter Eisentraut  wrote:
> After further thought, I think the right solution is to change
> btree_gist (and probably also btree_gin) to use the common RT* strategy
> numbers.

Okay. That will mean bumping the version of btree_gist, and you must be running that version to use 
the new temporal features, or you will get silly results. Right? Is there a way to protect users 
against that and communicate they need to upgrade the extension?


This also means temporal features may not work in custom GIST opclasses. What we're saying is they 
must have an appropriate operator for RTEqualStrategyNumber (18) and RTOverlapStrategyNumber (3). 
Equal matters for the scalar key part(s), overlap for the range part. So equal is more likely to be 
an issue, but overlap matters if we want to support non-ranges (which I'd say is worth doing).


Also if they get it wrong, we won't really have any way to report an error.

I did some research on other extensions in contrib, as well as PostGIS. Here is 
what I found:

## btree_gin:

3 is =
18 is undefined

same for all types: macaddr8, int2, int4, int8, float4, float8, oid, timestamp, timestamptz, time, 
timetz, date, interval, inet, cidr, text, varchar, char, bytea, bit, varbit, numeric, anyenum, uuid, 
name, bool, bpchar


## cube

3 is &&
18 is <=>

## intarray

3 is &&
18 is undefined

## ltree

3 is =
18 is undefined

## hstore

3 and 18 are undefined

## seg

3 is &&
18 is undefined

## postgis: geometry

3 is &&
18 is undefined

## postgis: geometry_nd

3 is &&&
18 is undefined

I thought about looking through pgxn for more, but I haven't yet. I may still 
do that.
But already it seems like there is not much consistency.

So what do you think of this idea instead?:

We could add a new (optional) support function to GiST that translates "well-known" strategy numbers 
into the opclass's own strategy numbers. This would be support function 12. Then we can say 
translateStrategyNumber(RTEqualStrategyNumber) and look up the operator with the result.


There is not a performance hit, because we do this for the DDL command (create pk/uq/fk), then store 
the operator in the index/constraint.


If you don't provide this new support function, then creating the pk/uq/fk fails with a hint about 
what you can do to make it work.


This approach means we don't change the rules about GiST opclasses: you can still use the stranums 
how you like.


This function would also let me support non-range "temporal" foreign keys, where I'll need to build 
queries with && and maybe other operators.


What do you think?

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: SQL:2011 application time

2023-11-09 Thread Paul Jungwirth

On 11/9/23 05:47, Peter Eisentraut wrote:
I went over the patch 
v17-0001-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch in more 
detail


Thanks Peter!

I'm about halfway through jian he's last two emails. I'll address your 
feedback also. I wanted to reply to this without waiting though:


Overall, with these fixes, I think this patch is structurally okay.  We 
just need to make sure we have all the weird corner cases covered.


One remaining issue I know about is with table partitions whose column 
order has changed. I've got an in-progress fix for that, but I've been 
prioritizing reviewer feedback the last few months. Just want to make 
sure you know about it for now.


Thanks!

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: SQL:2011 application time

2023-10-10 Thread Paul Jungwirth

On 9/25/23 14:00, Peter Eisentraut wrote:

Looking through the tests in v16-0001:

+-- PK with no columns just WITHOUT OVERLAPS:
+CREATE TABLE temporal_rng (
+   valid_at tsrange,
+   CONSTRAINT temporal_rng_pk PRIMARY KEY (valid_at WITHOUT OVERLAPS)
+);
+ERROR:  syntax error at or near "WITHOUT"
+LINE 3:  CONSTRAINT temporal_rng_pk PRIMARY KEY (valid_at WITHOUT OV...
+  ^

I think this error is confusing.  The SQL standard requires at least one 
non-period column in a PK.  I don't know why that is or why we should 
implement it.  But if we want to implement it, maybe we should enforce 
that in parse analysis rather than directly in the parser, to be able to 
produce a more friendly error message.


Okay.

(I think the reason the standard requires one non-period column is to 
identify the "entity". If philosophically the row is an Aristotelian 
proposition about that thing, the period qualifies it as true just 
during some time span. So the scalar part is doing the work that a PK 
conventionally does, and the period part does something else. Perhaps a 
PK/UNIQUE constraint with no scalar part would still be useful, but not 
very often I think, and I'm not sure it makes sense to call it PK/UNIQUE.)



+-- PK with a range column/PERIOD that isn't there:
+CREATE TABLE temporal_rng (
+   id INTEGER,
+   CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT 
OVERLAPS)

+);
+ERROR:  range or PERIOD "valid_at" in WITHOUT OVERLAPS does not exist

I think here we should just produce a "column doesn't exist" error 
message, the same as if the "id" column was invalid.  We don't need to 
get into the details of what kind of column it should be.  That is done 
in the next test


I'll change it. The reason for the different wording is that it might 
not be a column at all. It might be a PERIOD. So what about just "column 
or PERIOD doesn't exist"? (Your suggestion is fine too though.)



+ERROR:  column "valid_at" in WITHOUT OVERLAPS is not a range type

Also, in any case it would be nice to have a location pointer here (for 
both cases).


Agreed.


+-- PK with one column plus a range:
+CREATE TABLE temporal_rng (
+   -- Since we can't depend on having btree_gist here,
+   -- use an int4range instead of an int.
+   -- (The rangetypes regression test uses the same trick.)
+   id int4range,
+   valid_at tsrange,
+   CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT 
OVERLAPS)

+);

I'm confused why you are using int4range here (and in further tests) for 
the scalar (non-range) part of the primary key.  Wouldn't a plaint int4 
serve here?


A plain int4 would be better, and it would match the normal use-case, 
but you must have btree_gist to create an index like that, and the 
regress tests can't assume we have that. Here is the part from 
sql/rangetypes.sql I'm referring to:


--
-- Btree_gist is not included by default, so to test exclusion
-- constraints with range types, use singleton int ranges for the "="
-- portion of the constraint.
--

create table test_range_excl(
  room int4range,
  speaker int4range,
  during tsrange,
  exclude using gist (room with =, during with &&),
  exclude using gist (speaker with =, during with &&)
);

+SELECT pg_get_indexdef(conindid, 0, true) FROM pg_constraint WHERE 
conname = 'temporal_rng_pk';

+                               pg_get_indexdef
+------------------------------------------------------------------------------
+ CREATE UNIQUE INDEX temporal_rng_pk ON temporal_rng USING gist (id, 
valid_at)


Shouldn't this somehow show the operator classes for the columns?  We 
are using different operator classes for the id and valid_at columns, 
aren't we?


We only print the operator classes if they are not the default, so they 
don't appear here.


I do suspect something more is desirable though. For exclusion 
constraints we replace everything before the columns with just "EXCLUDE 
USING gist". I could embed WITHOUT OVERLAPS but it's not valid syntax in 
CREATE INDEX. Let me know if you have any ideas.



+-- PK with USING INDEX (not possible):
+CREATE TABLE temporal3 (
+   id int4range,
+   valid_at tsrange
+);
+CREATE INDEX idx_temporal3_uq ON temporal3 USING gist (id, valid_at);
+ALTER TABLE temporal3
+   ADD CONSTRAINT temporal3_pk
+   PRIMARY KEY USING INDEX idx_temporal3_uq;
+ERROR:  "idx_temporal3_uq" is not a unique index
+LINE 2:  ADD CONSTRAINT temporal3_pk
+ ^
+DETAIL:  Cannot create a primary key or unique constraint using such an 
index.


Could you also add a test where the index is unique and the whole thing 
does work?


No problem!

Apart from the tests, how about renaming the column 
pg_constraint.contemporal to something like conwithoutoverlaps?


Is that too verbose? I've got some code already changing it to 
conoverlaps but I'm probably happier with conwithoutoverlaps, assuming 
no one else minds it.


Yours,

--

Re: SQL:2011 application time

2023-10-10 Thread Paul Jungwirth

Hi Peter et al,

On 9/1/23 12:56, Paul Jungwirth wrote:

On 9/1/23 11:30, Peter Eisentraut wrote:
I think the WITHOUT OVERLAPS clause should be per-column, so that 
something like UNIQUE (a WITHOUT OVERLAPS, b, c WITHOUT OVERLAPS) 
would be possible.  Then the WITHOUT OVERLAPS clause would directly 
correspond to the choice between equality or overlaps operator per 
column.
I think allowing multiple uses of `WITHOUT OVERLAPS` (and in any 
position) is a great recommendation that enables a lot of new 
functionality.


I've been working on implementing this, but I've come to think it is the 
wrong way to go.


If we support this in primary key and unique constraints, then we must 
also support it for foreign keys and UPDATE/DELETE FOR PORTION OF. But 
implementing that logic is pretty tricky. For example take a foreign key 
on (id, PERIOD valid_at, PERIOD asserted_at). We need to ensure the 
referenced two-dimensional time space `contains` the referencing 
two-dimensional space. You can visualize a rectangle in two-dimensional 
space for each referencing record (which we validate one at a time). The 
referenced records must be aggregated and so form a polygon (of all 
right angles). For example the referencing record may be (1, [0,2), 
[0,2)) with referenced records of (1, [0,2), [0,1)) and (1, [0,1), 
[1,2)). (I'm using intranges since they're easier to read, but you could 
imagine these as dateranges like [2000-01-01,2002-01-01).) Now the 
range_agg of their valid_ats is [0,2) and of their asserted_ats is 
[0,2). But the referenced 2d space still doesn't contain the referencing 
space. It's got one corner missing. This is a well-known problem among 
game developers. We're lucky not to have arbitrary polygons, but it's 
still a tough issue.


Besides `contains` we also need to compute `overlaps` and `intersects` 
to support these temporal features. Implementing that for 2d, 3d, etc 
looks very complicated, for something that is far outside the normal use 
case and also not part of the standard. It will cost a little 
performance for the normal 1d use case too.


I think a better approach (which I want to attempt as an add-on patch, 
not in this main series) is to support not just range types, but any 
type with the necessary operators. Then you could have an mdrange 
(multi-dimensional range) or potentially even an arbitrary n-dimensional 
polygon. (PostGIS has something like this, but its `contains` operator 
compares (non-concave) *bounding boxes*, so it would not work for the 
example above. Still the similarity between temporal and spatial data is 
striking. I'm going to see if I can get some input from PostGIS folks 
about how useful any of this is to them.) This approach would also let 
us use multiranges: not for multiple dimensions, but for non-contiguous 
time spans stored in a single row. This puts the complexity in the types 
themselves (which seems more appropriate) and is ultimately more 
flexible (supporting not just mdrange but also multirange, and other 
things too).


This approach also means that instead of storing a mask/list of which 
columns use WITHOUT OVERLAPS, I can just store one attnum. Again, this 
saves the common use-case from paying a performance penalty to support a 
much rarer one.


I've still got my multi-WITHOUT OVERLAPS work, but I'm going to switch 
gears to what I've described here. Please let me know if you disagree!


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: SQL:2011 application time

2023-09-14 Thread Paul Jungwirth

On 9/7/23 18:24, jian he wrote:

for a range primary key, is it fine to expect it to be unique, not
null, and also not overlapping? (I am not sure how hard that is to implement.)

-
quote from 7IWD2-02-Foundation-2011-12.pdf. 4.18.3.2 Unique
constraints, page 97 of 1483.

...
-
based on the above, the unique constraint does not specify that the
columns in the list must be range types. In UNIQUE (a, c WITHOUT OVERLAPS),
column "a" can be a range type (which has an overlaps property) or not.
In fact, many of your primary key and foreign key regression tests use
something like '[11,11]' (which makes them easier to understand),
which is logically a non-range usage.
So does UNIQUE (a, c WITHOUT OVERLAPS) with column "a" being a non-range
data type make sense?


I'm not sure I understand this question, but here are a few things that 
might help clarify things:


In SQL:2011, a temporal primary key, unique constraint, or foreign key 
may have one or more "scalar" parts (just like a regular key) followed 
by one "PERIOD" part, which is denoted with "WITHOUT OVERLAPS" (in 
PKs/UNIQUEs) or "PERIOD" (in FKs). Except for this last key part, 
everything is still compared for equality, just as in a traditional key. 
But this last part is compared for overlaps. It's exactly the same as 
`EXCLUDE (id WITH =, valid_at WITH &&)`. The overlap part must come last 
and you can have only one (but you may have more than one scalar part if 
you like).
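In other words, two rows conflict under a temporal key exactly when every scalar part is equal AND the WITHOUT OVERLAPS part overlaps. A toy sketch of that predicate (my own names, half-open integer ranges, not the patch's implementation):

```python
# The same predicate as EXCLUDE (id WITH =, valid_at WITH &&):
# a new row is rejected only when the scalar parts match AND the
# application-time ranges overlap.

def overlaps(a, b):
    """Overlap test for half-open ranges [lo, hi)."""
    return a[0] < b[1] and b[0] < a[1]

def conflicts(row, table):
    rid, valid = row
    return any(rid == eid and overlaps(valid, evalid) for eid, evalid in table)

table = [(1, (0, 5)), (2, (0, 5))]
print(conflicts((1, (3, 8)), table))   # True: same id, ranges overlap
print(conflicts((1, (5, 9)), table))   # False: [0,5) and [5,9) merely touch
print(conflicts((3, (0, 5)), table))   # False: different id
```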


In the patch, I have followed that pattern, except I also allow a 
regular range column anywhere I allow a PERIOD. In fact PERIODs are 
mostly implemented on top of range types. (Until recently PERIOD support 
was in the first patch, not the last, and there was code all throughout 
for handling both, e.g. within indexes, etc. But at pgcon Peter 
suggested building everything on just range columns, and then having 
PERIODs create an "internal" GENERATED column, and that cleaned up the 
code considerably.)


One possible source of confusion is that in the tests I'm using range 
columns *also* for the scalar key part. So valid_at is a tsrange, and 
the scalar key part is an int4range. This is not normally how you'd use the feature, but
you need the btree_gist extension to mix int & tsrange (e.g.), and 
that's not available in the regress tests. We are still comparing the 
int4range for regular equality and the tsrange for overlaps. If you 
search this thread there was some discussion about bringing btree_gist 
into core, but it sounds like it doesn't need to happen. (It might be 
still desirable independently. EXCLUDE constraints are also not really 
something you can use practically without it, and their tests use the 
same trick of comparing ranges for plain equality.)


The piece of discussion you're replying to is about allowing *multiple* 
WITHOUT OVERLAPS modifiers on a PK/UNIQUE constraint, and in any 
position. I think that's a good idea, so I've started adapting the code 
to support it. (In fact there is a lot of code that assumes the overlaps 
key part will be in the last position, and I've never really been happy 
with that, so it's an excuse to make that more robust.) Here I'm saying 
(1) you will still need at least one scalar key part, (2) if there are 
no WITHOUT OVERLAPS parts then you just have a regular key, not a 
temporal one, (3) changing this obliges us to do the same for foreign 
keys and FOR PORTION OF.


I hope that helps! I apologize if I've completely missed the point. If 
so please try again. :-)


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: SQL:2011 application time

2023-09-01 Thread Paul Jungwirth

On 9/1/23 03:50, Vik Fearing wrote:

On 9/1/23 11:30, Peter Eisentraut wrote:
1) If I write UNIQUE (a, b, c WITHOUT OVERLAPS), does the WITHOUT 
OVERLAPS clause attach to the last column, or to the whole column 
list? In the SQL standard, you can only have one period and it has to 
be listed last, so this question does not arise.  But here we are 
building a more general facility to then build the SQL facility on top 
of.  So I think it doesn't make sense that the range column must be 
last or that there can only be one.  Also, your implementation 
requires at least one non-overlaps column, which also seems like a 
confusing restriction.


I think the WITHOUT OVERLAPS clause should be per-column, so that 
something like UNIQUE (a WITHOUT OVERLAPS, b, c WITHOUT OVERLAPS) 
would be possible.  Then the WITHOUT OVERLAPS clause would directly 
correspond to the choice between equality or overlaps operator per 
column.


An alternative interpretation would be that WITHOUT OVERLAPS applies 
to the whole column list, and we would take it to mean, for any range 
column, use the overlaps operator, for any non-range column, use the 
equals operator.  But I think this would be confusing and would 
prevent the case of using the equality operator for some ranges and 
the overlaps operator for some other ranges in the same key.


I prefer the first option.  That is: WITHOUT OVERLAPS applies only to 
the column or expression it is attached to, and need not be last in line.


I agree. The second option seems confusing and is more restrictive.

I think allowing multiple uses of `WITHOUT OVERLAPS` (and in any 
position) is a great recommendation that enables a lot of new 
functionality. Several books[1,2] about temporal databases describe a 
multi-dimensional temporal space (even beyond application time vs. 
system time), and the standard is pretty disappointing here. It's not a 
weird idea.


But I just want to be explicit that this isn't something the standard 
describes. (I think everyone in the conversation so far understands 
that.) So far I've tried to be pretty scrupulous about following 
SQL:2011, although personally I'd rather see Postgres support this 
functionality. And it's not like it goes *against* what the standard 
says. But if there are any objections, I'd love to hear them before 
putting in the work. :-)


If we allow multiple+anywhere WITHOUT OVERLAPS in PRIMARY KEY & UNIQUE 
constraints, then surely we also allow multiple+anywhere PERIOD in 
FOREIGN KEY constraints too. (I guess the standard switched keywords 
because a FK is more like "MUST OVERLAPS". :-)


Also if you have multiple application-time dimensions we probably need 
to allow multiple FOR PORTION OF clauses. I think the syntax would be:


UPDATE t
  FOR PORTION OF valid_at FROM ... TO ...
  FOR PORTION OF asserted_at FROM ... TO ...
  [...]
  SET foo = bar

Does that sound okay?
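For background, each such clause would apply the standard's one-dimensional FOR PORTION OF behavior along its own dimension: split the targeted row at the portion boundaries, update the middle piece, and keep the leftovers with their old values. A toy sketch of the 1-D behavior (my own model, not the implementation):

```python
# FOR PORTION OF valid_at FROM frm TO to, for one row with a
# half-open [lo, hi) range: the updated portion is the intersection,
# and any uncovered leftovers are re-inserted unchanged.

def for_portion_of(row, frm, to, update):
    """row: (lo, hi, data); returns the rows that replace it."""
    lo, hi, data = row
    out = []
    if lo < frm:
        out.append((lo, min(frm, hi), data))          # leftover before
    mid_lo, mid_hi = max(lo, frm), min(hi, to)
    if mid_lo < mid_hi:
        out.append((mid_lo, mid_hi, update(data)))    # updated portion
    if to < hi:
        out.append((max(lo, to), hi, data))           # leftover after
    return out

print(for_portion_of((0, 10, "old"), 2, 5, lambda d: "new"))
# [(0, 2, 'old'), (2, 5, 'new'), (5, 10, 'old')]
```

With multiple clauses you'd apply this splitting along each named dimension in turn.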

I don't quite understand this part:

>> Also, your implementation
>> requires at least one non-overlaps column, which also seems like a
>> confusing restriction.

That's just a regular non-temporal constraint. Right? If I'm missing 
something let me know.


[1] C. J. Date, Hugh Darwen, Nikos Lorentzos. Time and Relational 
Theory, Second Edition: Temporal Databases in the Relational Model and 
SQL. 2nd edition, 2014.

[2] Tom Johnston. Bitemporal Data: Theory and Practice. 2014.

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: Exclusion constraints on partitioned tables

2023-03-17 Thread Paul Jungwirth

On 1/24/23 06:38, Ronan Dunklau wrote:

I've taken a look at the patch, and I'm not sure why you keep the restriction
on the GiST operator being of the RTEqualStrategyNumber strategy. I don't
think we have any other place where we expect those strategy numbers to
match. For hash it's different, as hash-equality is the only operator
strategy and as such there is no other way to look at it. Can't we just
enforce partition_operator == exclusion_operator without adding the
RTEqualStrategyNumber for the opfamily into the mix?


Thank you for taking a look! I did some research on the history of the 
code here, and I think I understand Tom's concern about making sure the 
index uses the same equality operator as the partition. I was confused 
about his remarks about the opfamily, but I agree with you that if the 
operator is the same, we should be okay.


I added the code about RTEqualStrategyNumber because that's what we need 
to find an equals operator when the index is GiST (except if it's using 
an opclass from btree_gist; then it needs to be BTEqual again). But then 
I realized that for exclusion constraints we have already figured out 
the operator (in RelationGetExclusionInfo) and put it in 
indexInfo->ii_ExclusionOps. So we can just compare against that. This 
works whether your index uses btree_gist or not.


Here is an updated patch with that change (also rebased).

I also included a more specific error message. If we find a matching 
column in the index but with the wrong operator, we should say so, and 
not say there is no matching column.


Thanks,

--
Paul  ~{:-)
p...@illuminatedcomputing.com

From 928a31433a4b8cad25f74017ca96b843aa30e02d Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth" 
Date: Wed, 23 Nov 2022 14:55:43 -0800
Subject: [PATCH v3] Allow some exclusion constraints on partitions

Previously we only allowed UNIQUE B-tree constraints on partitions
(and only if the constraint included all the partition keys). But we
could allow exclusion constraints with the same restriction. We also
require that those columns be compared for equality, not something like
&&.
---
 doc/src/sgml/ddl.sgml  | 12 ++--
 src/backend/commands/indexcmds.c   | 66 ---
 src/backend/parser/parse_utilcmd.c |  6 --
 src/test/regress/expected/alter_table.out  | 31 +++--
 src/test/regress/expected/create_table.out | 16 +++--
 src/test/regress/expected/indexing.out | 73 ++
 src/test/regress/sql/alter_table.sql   | 29 +++--
 src/test/regress/sql/create_table.sql  | 13 +++-
 src/test/regress/sql/indexing.sql  | 57 +++--
 9 files changed, 236 insertions(+), 67 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 91c036d1cb..9d3c423ffd 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -4233,11 +4233,13 @@ ALTER INDEX measurement_city_id_logdate_key
 
  
   
-   There is no way to create an exclusion constraint spanning the
-   whole partitioned table.  It is only possible to put such a
-   constraint on each leaf partition individually.  Again, this
-   limitation stems from not being able to enforce cross-partition
-   restrictions.
+   Similarly an exclusion constraint must include all the
+   partition key columns. Furthermore the constraint must compare those
+   columns for equality (not e.g. &&).
+   Again, this limitation stems from not being able to enforce
+   cross-partition restrictions. The constraint may include additional
+   columns that aren't part of the partition key, and it may compare
+   those with any operators you like.
   
  
 
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 16ec0b114e..52d2395daa 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -709,11 +709,6 @@ DefineIndex(Oid relationId,
 	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 	 errmsg("cannot create index on partitioned table \"%s\" concurrently",
			RelationGetRelationName(rel))));
-		if (stmt->excludeOpNames)
-			ereport(ERROR,
-	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-	 errmsg("cannot create exclusion constraints on partitioned table \"%s\"",
-	RelationGetRelationName(rel))));
 	}
 
 	/*
@@ -918,15 +913,16 @@ DefineIndex(Oid relationId,
 		index_check_primary_key(rel, indexInfo, is_alter_table, stmt);
 
 	/*
-	 * If this table is partitioned and we're creating a unique index or a
-	 * primary key, make sure that the partition key is a subset of the
-	 * index's columns.  Otherwise it would be possible to violate uniqueness
-	 * by putting values that ought to be unique in different partitions.
+	 * If this table is partitioned and we're creating a unique index,
+	 * primary key, or exclusion constraint, make sure that the partition key
+	 * is a subset of the index's columns.  Otherwise it would be possible 

Re: Exclusion constraints on partitioned tables

2022-12-15 Thread Paul Jungwirth

On 12/15/22 16:12, Tom Lane wrote:

This patch also requires the matching constraint columns to use equality
comparisons (`(foo WITH =)`), so it is really equivalent to the existing
b-tree rule.


That's not quite good enough: you'd better enforce that it's the same
equality operator (and same collation, if relevant) as is being used
in the partition key.
[snip]
It might work better to consider the operator itself and ask if
it's equality in the same btree opfamily that's used by the
partition key.


Thank you for taking a look! Here is a comparison on just the operator 
itself.


I included a collation check too, but I'm not sure it's necessary. 
Exclusion constraints don't have a collation per se; it comes from the 
index, and we choose it just a little above in this function. (I'm not 
even sure how to elicit that new error message in a test case.)


I'm not sure what to do about matching the opfamily. In practice an 
exclusion constraint will typically use gist, but the partition key will 
always use btree/hash. You're saying that the equals operator can be 
inconsistent between those access methods? That is surprising to me, but 
I admit op classes/families are still sinking in. (Even prior to this 
patch, isn't the code for hash-based partitions looking up ptkey_eqop 
via the hash opfamily, and then comparing it to idx_eqop looked up via 
the btree opfamily?)


If partitions can only support btree-based exclusion constraints, you 
still wouldn't be able to partition a temporal table, because those 
constraints would always be gist. So I guess what I really want is to 
support gist index constraints on partitioned tables.


Regards,

--
Paul  ~{:-)
p...@illuminatedcomputing.com

From df25cb63cacd992fc7453fc432488c6046e59479 Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth" 
Date: Wed, 23 Nov 2022 14:55:43 -0800
Subject: [PATCH v2] Allow some exclusion constraints on partitions

Previously we only allowed UNIQUE B-tree constraints on partitions
(and only if the constraint included all the partition keys). But we
could allow exclusion constraints with the same restriction. We also
require that those columns be compared for equality, not something like
&&.
---
 doc/src/sgml/ddl.sgml  | 12 ++--
 src/backend/commands/indexcmds.c   | 42 +++--
 src/backend/parser/parse_utilcmd.c |  6 --
 src/test/regress/expected/alter_table.out  | 31 +++--
 src/test/regress/expected/create_table.out | 16 +++--
 src/test/regress/expected/indexing.out | 73 ++
 src/test/regress/sql/alter_table.sql   | 29 +++--
 src/test/regress/sql/create_table.sql  | 13 +++-
 src/test/regress/sql/indexing.sql  | 57 +++--
 9 files changed, 227 insertions(+), 52 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 6e92bbddd2..59be911471 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -4206,11 +4206,13 @@ ALTER INDEX measurement_city_id_logdate_key
 
  
   
-   There is no way to create an exclusion constraint spanning the
-   whole partitioned table.  It is only possible to put such a
-   constraint on each leaf partition individually.  Again, this
-   limitation stems from not being able to enforce cross-partition
-   restrictions.
+   Similarly an exclusion constraint must include all the 
+   partition key columns. Furthermore the constraint must compare those
+   columns for equality (not e.g. &&).
+   Again, this limitation stems from not being able to enforce
+   cross-partition restrictions. The constraint may include additional
+   columns that aren't part of the partition key, and it may compare
+   those with any operators you like.
   
  
 
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 7dc1aca8fe..a9ecb7df19 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -709,11 +709,6 @@ DefineIndex(Oid relationId,
 	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 	 errmsg("cannot create index on partitioned table \"%s\" concurrently",
			RelationGetRelationName(rel))));
-		if (stmt->excludeOpNames)
-			ereport(ERROR,
-	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-	 errmsg("cannot create exclusion constraints on partitioned table \"%s\"",
-	RelationGetRelationName(rel))));
 	}
 
 	/*
@@ -926,7 +921,7 @@ DefineIndex(Oid relationId,
 	 * We could lift this limitation if we had global indexes, but those have
 	 * their own problems, so this is a useful feature combination.
 	 */
-	if (partitioned && (stmt->unique || stmt->primary))
+	if (partitioned && (stmt->unique || stmt->primary || stmt->excludeOpNames != NIL))
 	{
 		PartitionKey key = RelationGetPartitionKey(rel);
 		const char *constraint_type;
@@ -983,6 +978,8 @@ DefineIndex(Oid relationId,
 			 */
 			if (accessMethodId == BTREE_AM_OID)
 eq_strategy = 

Exclusion constraints on partitioned tables

2022-12-15 Thread Paul Jungwirth

Hello Hackers,

I'm trying to get things going again on my temporal tables work, and 
here is a small patch to move that forward.


It lets you create exclusion constraints on partitioned tables, similar 
to today's rules for b-tree primary keys & unique constraints:
just as we permit a PK on a partitioned table when the PK's columns are 
a superset of the partition keys, so we could also allow an exclusion 
constraint when its columns are a superset of the partition keys.


This patch also requires the matching constraint columns to use equality 
comparisons (`(foo WITH =)`), so it is really equivalent to the existing 
b-tree rule. Perhaps that is more conservative than necessary, but we 
can't permit an arbitrary operator, since some might require testing 
rows that fall into other partitions. For example `(foo WITH <>)` would 
obviously cause problems.


The exclusion constraint may still include other columns beyond the 
partition keys, and those may use equality operators or something else.


This patch is required to support temporal partitioned tables, because 
temporal tables use exclusion constraints as their primary key.
Essentially they are `(id WITH =, valid_at WITH &&)`. Since the primary
key is not a b-tree, partitioning them would be forbidden prior to this 
patch. But now you could partition that table on `id`, and we could 
still correctly validate the temporal PK without requiring rows from 
other partitions.
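The reasoning can be sketched as follows (a toy model; `partition_of` is a stand-in for the real routing, not actual Postgres code):

```python
# Why requiring equality on the partition key columns is enough:
# any two rows that can conflict under (id WITH =, valid_at WITH &&)
# share the same id, so they always land in the same partition, and
# checking each partition locally finds every conflict.

def partition_of(rid, nparts=4):
    return rid % nparts               # stand-in for hash/list/range routing

def overlaps(a, b):
    return a[0] < b[1] and b[0] < a[1]

def conflicts(rs):
    return {(a, b) for i, a in enumerate(rs) for b in rs[i + 1:]
            if a[0] == b[0] and overlaps(a[1], b[1])}

rows = [(7, (0, 5)), (7, (3, 8)), (11, (0, 5))]

global_c = conflicts(rows)
local_c = set()
for p in range(4):
    local_c |= conflicts([r for r in rows if partition_of(r[0]) == p])

print(global_c == local_c)   # True: per-partition checking misses nothing
```

With a non-equality operator such as `(foo WITH <>)`, conflicting rows need not share partition key values, so a purely local check could miss conflicts sitting in other partitions.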


This patch may be helpful beyond just temporal tables (or for DIY 
temporal tables), so it seems worth submitting it separately.


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com

From 7daadf9e822509186c9e32794d0e29effdc90edc Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth" 
Date: Wed, 23 Nov 2022 14:55:43 -0800
Subject: [PATCH v1] Allow some exclusion constraints on partitions

Previously we only allowed UNIQUE B-tree constraints on partitions
(and only if the constraint included all the partition keys). But we
could allow exclusion constraints with the same restriction. We also
require that those columns be compared for equality, not something like
&&.
---
 doc/src/sgml/ddl.sgml  | 12 ++--
 src/backend/commands/indexcmds.c   | 36 +--
 src/backend/parser/parse_utilcmd.c |  6 --
 src/test/regress/expected/alter_table.out  | 31 +++--
 src/test/regress/expected/create_table.out | 16 +++--
 src/test/regress/expected/indexing.out | 73 ++
 src/test/regress/sql/alter_table.sql   | 29 +++--
 src/test/regress/sql/create_table.sql  | 13 +++-
 src/test/regress/sql/indexing.sql  | 57 +++--
 9 files changed, 221 insertions(+), 52 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 6e92bbddd2..59be911471 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -4206,11 +4206,13 @@ ALTER INDEX measurement_city_id_logdate_key
 
  
   
-   There is no way to create an exclusion constraint spanning the
-   whole partitioned table.  It is only possible to put such a
-   constraint on each leaf partition individually.  Again, this
-   limitation stems from not being able to enforce cross-partition
-   restrictions.
+   Similarly an exclusion constraint must include all the 
+   partition key columns. Furthermore the constraint must compare those
+   columns for equality (not e.g. &&).
+   Again, this limitation stems from not being able to enforce
+   cross-partition restrictions. The constraint may include additional
+   columns that aren't part of the partition key, and it may compare
+   those with any operators you like.
   
  
 
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 7dc1aca8fe..28840544b5 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -709,11 +709,6 @@ DefineIndex(Oid relationId,
 	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 	 errmsg("cannot create index on partitioned table \"%s\" concurrently",
			RelationGetRelationName(rel))));
-		if (stmt->excludeOpNames)
-			ereport(ERROR,
-	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-	 errmsg("cannot create exclusion constraints on partitioned table \"%s\"",
-	RelationGetRelationName(rel))));
 	}
 
 	/*
@@ -926,7 +921,7 @@ DefineIndex(Oid relationId,
 	 * We could lift this limitation if we had global indexes, but those have
 	 * their own problems, so this is a useful feature combination.
 	 */
-	if (partitioned && (stmt->unique || stmt->primary))
+	if (partitioned && (stmt->unique || stmt->primary || stmt->excludeOpNames != NIL))
 	{
 		PartitionKey key = RelationGetPartitionKey(rel);
 		const char *constraint_type;
@@ -983,6 +978,8 @@ DefineIndex(Oid relationId,
 			 */
 			if (accessMethodId == BTREE_AM_OID)
 eq_strategy = BTEqualStrategyNumber;
+			else if (accessMethodId == GIST_AM_OID)
+eq_strategy = 

Think-o in foreign key comments

2022-12-02 Thread Paul Jungwirth

Hello,

I noticed a few places in the new foreign key code where a comment says 
"the ON DELETE SET NULL/DELETE clause". I believe it should say "ON 
DELETE SET NULL/DEFAULT".


These comments were added in d6f96ed94e7, "Allow specifying column list 
for foreign key ON DELETE SET actions." Here is a patch to correct them. 
I don't think you usually create a commitfest entry for tiny fixes like 
this, right? But if you'd like one please let me know and I'll add it.


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com

From dc5e317762266b93bfa44bb303fe1882853d8de7 Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth" 
Date: Fri, 2 Dec 2022 14:02:55 -0800
Subject: [PATCH v1] Fix FK comment think-o

---
 src/backend/commands/tablecmds.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3e83f375b5..716793e157 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -9467,10 +9467,10 @@ validateFkOnDeleteSetColumns(int numfks, const int16 *fkattnums,
  * numfks is the number of columns in the foreign key
  * pkattnum is the attnum array of referenced attributes.
  * fkattnum is the attnum array of referencing attributes.
- * numfkdelsetcols is the number of columns in the ON DELETE SET NULL/DELETE
+ * numfkdelsetcols is the number of columns in the ON DELETE SET NULL/DEFAULT
  *  (...) clause
  * fkdelsetcols is the attnum array of the columns in the ON DELETE SET
- *  NULL/DELETE clause
+ *  NULL/DEFAULT clause
  * pf/pp/ffeqoperators are OID array of operators between columns.
  * old_check_ok signals that this constraint replaces an existing one that
  * was already validated (thus this one doesn't need validation).
@@ -9686,10 +9686,10 @@ addFkRecurseReferenced(List **wqueue, Constraint *fkconstraint, Relation rel,
  * pkattnum is the attnum array of referenced attributes.
  * fkattnum is the attnum array of referencing attributes.
  * pf/pp/ffeqoperators are OID array of operators between columns.
- * numfkdelsetcols is the number of columns in the ON DELETE SET NULL/DELETE
+ * numfkdelsetcols is the number of columns in the ON DELETE SET NULL/DEFAULT
  *  (...) clause
  * fkdelsetcols is the attnum array of the columns in the ON DELETE SET
- *  NULL/DELETE clause
+ *  NULL/DEFAULT clause
  * old_check_ok signals that this constraint replaces an existing one that
  *		was already validated (thus this one doesn't need validation).
  * lockmode is the lockmode to acquire on partitions when recursing.
-- 
2.25.1



Re: range_agg with multirange inputs

2022-03-11 Thread Paul Jungwirth

On 3/10/22 14:07, Chapman Flack wrote:

When I apply this patch, I get a func.sgml with two entries for
range_intersect_agg(anymultirange).


Arg, fixed.


In range_agg_transfn, you've changed the message in the "must be called
with a range or multirange"; that seems like another good candidate to
be an elog.


Agreed. Updated here.


I think your query finds aggregate declarations that share the same SQL
function declaration as their finalizer functions. That seems to be more
common.

The query I used looks for cases where different SQL-declared functions
appear as finalizers of aggregates, but the different SQL declared functions
share the same internal C implementation.


Okay, I see. I believe that is quite common for ordinary SQL functions. 
Sharing a prosrc seems even less remarkable than sharing an aggfinalfn. 
You're right there are no cases for other finalfns yet, but I don't 
think there is anything special about finalfns that would make this a 
weirder thing to do there than with ordinary functions. Still, noting it 
with a comment does seem helpful. I've updated the remark to match what 
you suggested.
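As a side note for anyone following along, the aggregate's overall behavior — gather every input range, then sort and merge touching or overlapping ones — can be sketched like this (my toy model, not the C code):

```python
# Mimics range_agg's transfn/finalfn split: collect all input ranges
# (a multirange input just contributes each of its ranges), then sort
# and merge touching or overlapping ones into a multirange-like list.

def range_agg(inputs):
    """inputs: iterable of half-open (lo, hi) ranges; returns the merged union."""
    merged = []
    for lo, hi in sorted(inputs):
        if merged and lo <= merged[-1][1]:            # touches or overlaps
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

print(range_agg([(0, 2), (2, 4), (6, 8)]))   # [(0, 4), (6, 8)]
```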


Thank you again for the review, and sorry for so many iterations! :-)

Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com

From b409d7333132e34a928ec8cc0ecca0a3421b7268 Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth" 
Date: Fri, 10 Dec 2021 16:04:57 -0800
Subject: [PATCH v4] Add range_agg with multirange inputs

---
 doc/src/sgml/func.sgml|  30 ++
 src/backend/utils/adt/multirangetypes.c   |  69 ++--
 src/include/catalog/pg_aggregate.dat  |   3 +
 src/include/catalog/pg_proc.dat   |  11 ++
 src/test/regress/expected/multirangetypes.out | 100 ++
 src/test/regress/expected/opr_sanity.out  |   1 +
 src/test/regress/sql/multirangetypes.sql  |  21 
 src/test/regress/sql/opr_sanity.sql   |   1 +
 8 files changed, 230 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index df3cd5987b..ab6c4f0093 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -20012,8 +20012,23 @@ SELECT NULLIF(value, '(none)') ...

No
   
 
+  
+   
+
+ range_agg
+
+range_agg ( value
+ anymultirange )
+anymultirange
+   
+   
+Computes the union of the non-null input values.
+   
+   No
+  
+
   

 
  range_intersect_agg
@@ -20027,8 +20042,23 @@ SELECT NULLIF(value, '(none)') ...

No
   
 
+  
+   
+
+ range_intersect_agg
+
+range_intersect_agg ( value
+ anymultirange )
+anymultirange
+   
+   
+Computes the intersection of the non-null input values.
+   
+   No
+  
+
   

 
  string_agg
diff --git a/src/backend/utils/adt/multirangetypes.c b/src/backend/utils/adt/multirangetypes.c
index c474b24431..a6d376b083 100644
--- a/src/backend/utils/adt/multirangetypes.c
+++ b/src/backend/utils/adt/multirangetypes.c
@@ -1344,11 +1344,9 @@ range_agg_transfn(PG_FUNCTION_ARGS)
 		elog(ERROR, "range_agg_transfn called in non-aggregate context");
 
 	rngtypoid = get_fn_expr_argtype(fcinfo->flinfo, 1);
 	if (!type_is_range(rngtypoid))
-		ereport(ERROR,
-(errcode(ERRCODE_DATATYPE_MISMATCH),
- errmsg("range_agg must be called with a range")));
+		elog(ERROR, "range_agg must be called with a range or multirange");
 
 	if (PG_ARGISNULL(0))
 		state = initArrayResult(rngtypoid, aggContext, false);
 	else
@@ -1362,8 +1360,11 @@ range_agg_transfn(PG_FUNCTION_ARGS)
 }
 
 /*
  * range_agg_finalfn: use our internal array to merge touching ranges.
+ *
+ * Shared by range_agg_finalfn(anyrange) and
+ * multirange_agg_finalfn(anymultirange).
  */
 Datum
 range_agg_finalfn(PG_FUNCTION_ARGS)
 {
@@ -1397,8 +1398,66 @@ range_agg_finalfn(PG_FUNCTION_ARGS)
 
 	PG_RETURN_MULTIRANGE_P(make_multirange(mltrngtypoid, typcache->rngtype, range_count, ranges));
 }
 
+/*
+ * multirange_agg_transfn: combine adjacent/overlapping multiranges.
+ *
+ * All we do here is gather the input multiranges' ranges into an array
+ * so that the finalfn can sort and combine them.
+ */
+Datum
+multirange_agg_transfn(PG_FUNCTION_ARGS)
+{
+	MemoryContext aggContext;
+	Oid			mltrngtypoid;
+	TypeCacheEntry *typcache;
+	TypeCacheEntry *rngtypcache;
+	ArrayBuildState *state;
+	MultirangeType *current;
+	int32		range_count;
+	RangeType **ranges;
+	int32		i;
+
+	if (!AggCheckCallContext(fcinfo, &aggContext))
+		elog(ERROR, "multirange_agg_transfn called in non-aggregate context");
+
+	mltrngtypoid = get_fn_expr_argtype(fcinfo->flinfo, 1);
+	if (!type_is_multirange(mltrngtypoid))
+		elog(ERROR, "range_agg must be called with a range or multirange");
+
+	typcache = multirange_get_typcache(fcinfo, mltrngtypoid);
+	rngtypcache = typcache->rngtype;
+
+	if (PG_ARGISNULL(0))
+		

Re: range_agg with multirange inputs

2022-03-05 Thread Paul Jungwirth

On 3/1/22 13:33, Chapman Flack wrote:

I think the 4 lines should suffice, but it looks like this patch was
generated from a rebase of the old one (with three lines) that ended up
putting the new 'range_agg' entry ahead of 'max' in func.sgml, which
position is now baked into the 4 lines of context. :)


You're right, my last rebase messed up the docs. Here it is fixed. Sorry 
about that!



I would not change them to actual Assert, which would blow up the whole
process on failure. If it's a genuine "not expected to happen" case,
maybe changing it to elog (or ereport with errmsg_internal) would save
a little workload for translators.


I like the elog solution. I've changed them in both places.


I did a small double-take seeing the C range_agg_finalfn being shared
by the SQL range_agg_finalfn and multirange_agg_finalfn. I infer that
the reason it works is get_fn_expr_rettype works equally well with
either parameter type.

Do you think it would be worth adding a comment at the C function
explaining that? In a quick query I just did, I found no other aggregate
final functions sharing a C function that way, so this could be the first.


I see 13 other shared finalfns (using `select 
array_agg(aggfnoid::regproc) as procs, array_agg(aggtransfn) as 
transfns, aggfinalfn from pg_aggregate where aggfinalfn is distinct from 
0 group by aggfinalfn having count(*) > 1;`) but a comment can't hurt! Added.


Thanks,

--
Paul  ~{:-)
p...@illuminatedcomputing.com

From 1e7a96668b0ad341d29281055927ad693c656843 Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth" 
Date: Fri, 10 Dec 2021 16:04:57 -0800
Subject: [PATCH v3] Add range_agg with multirange inputs

---
 doc/src/sgml/func.sgml|  45 
 src/backend/utils/adt/multirangetypes.c   |  66 +++-
 src/include/catalog/pg_aggregate.dat  |   3 +
 src/include/catalog/pg_proc.dat   |  11 ++
 src/test/regress/expected/multirangetypes.out | 100 ++
 src/test/regress/expected/opr_sanity.out  |   1 +
 src/test/regress/sql/multirangetypes.sql  |  21 
 src/test/regress/sql/opr_sanity.sql   |   1 +
 8 files changed, 244 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index df3cd5987b..fb1963a687 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -20012,8 +20012,23 @@ SELECT NULLIF(value, '(none)') ...

No
   
 
+  
+   
+
+ range_agg
+
+range_agg ( value
+ anymultirange )
+anymultirange
+   
+   
+Computes the union of the non-null input values.
+   
+   No
+  
+
   

 
  range_intersect_agg
@@ -20027,8 +20042,38 @@ SELECT NULLIF(value, '(none)') ...

No
   
 
+  
+   
+
+ range_intersect_agg
+
+range_intersect_agg ( value
+ anymultirange )
+anymultirange
+   
+   
+Computes the intersection of the non-null input values.
+   
+   No
+  
+
+  
+   
+
+ range_intersect_agg
+
+range_intersect_agg ( value
+ anymultirange )
+anymultirange
+   
+   
+Computes the intersection of the non-null input values.
+   
+   No
+  
+
   

 
  string_agg
diff --git a/src/backend/utils/adt/multirangetypes.c b/src/backend/utils/adt/multirangetypes.c
index c474b24431..5a4c00457a 100644
--- a/src/backend/utils/adt/multirangetypes.c
+++ b/src/backend/utils/adt/multirangetypes.c
@@ -1346,9 +1346,9 @@ range_agg_transfn(PG_FUNCTION_ARGS)
 	rngtypoid = get_fn_expr_argtype(fcinfo->flinfo, 1);
 	if (!type_is_range(rngtypoid))
 		ereport(ERROR,
 (errcode(ERRCODE_DATATYPE_MISMATCH),
- errmsg("range_agg must be called with a range")));
+ errmsg("range_agg must be called with a range or multirange")));
 
 	if (PG_ARGISNULL(0))
 		state = initArrayResult(rngtypoid, aggContext, false);
 	else
@@ -1362,8 +1362,10 @@ range_agg_transfn(PG_FUNCTION_ARGS)
 }
 
 /*
  * range_agg_finalfn: use our internal array to merge touching ranges.
+ *
+ * Shared by range_agg(anyrange) and range_agg(anymultirange).
  */
 Datum
 range_agg_finalfn(PG_FUNCTION_ARGS)
 {
@@ -1397,8 +1399,66 @@ range_agg_finalfn(PG_FUNCTION_ARGS)
 
 	PG_RETURN_MULTIRANGE_P(make_multirange(mltrngtypoid, typcache->rngtype, range_count, ranges));
 }
 
+/*
+ * multirange_agg_transfn: combine adjacent/overlapping multiranges.
+ *
+ * All we do here is gather the input multiranges' ranges into an array
+ * so that the finalfn can sort and combine them.
+ */
+Datum
+multirange_agg_transfn(PG_FUNCTION_ARGS)
+{
+	MemoryContext aggContext;
+	Oid			mltrngtypoid;
+	TypeCacheEntry *typcache;
+	TypeCacheEntry *rngtypcache;
+	ArrayBuildState *state;
+	MultirangeType *current;
+	int32		range_count;
+	RangeType **ranges;
+	int32		i;
+
+	if 

Re: range_agg with multirange inputs

2022-02-28 Thread Paul Jungwirth

On 2/26/22 17:13, Chapman Flack wrote:

This applies (with some fuzz) and passes installcheck-world, but a rebase
is needed, because 3 lines of context aren't enough to get the doc changes
in the right place in the aggregate function table. (I think generating
the patch with 4 lines of context would be enough to keep that from being
a recurring issue.)


Thank you for the review and the tip re 4 lines of context! Rebase attached.


One thing that seems a bit funny is this message in the new
multirange_agg_transfn:

+   if (!type_is_multirange(mltrngtypoid))
+   ereport(ERROR,
+   (errcode(ERRCODE_DATATYPE_MISMATCH),
+errmsg("range_agg must be called with a multirange")));


I agree it would be more helpful to users to let them know we can take 
either kind of argument. I changed the message to "range_agg must be 
called with a range or multirange". How does that seem?



I kind of wonder whether either message is really reachable, at least
through the aggregate machinery in the expected way. Won't that machinery
ensure that it is calling the right transfn with the right type of
argument? If that's the case, maybe the messages could only be seen
by someone calling the transfn directly ... which also seems ruled out
for these transfns because the state type is internal. Is this whole test
more of the nature of an assertion?


I don't think they are reachable, so perhaps they are more like asserts. 
Do you think I should change it? It seems like a worthwhile check in any 
case.


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com
From a6689485aab9b1aaa6e866f2a577368c7a0e324e Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth" 
Date: Fri, 10 Dec 2021 16:04:57 -0800
Subject: [PATCH v2] Add range_agg with multirange inputs

---
 doc/src/sgml/func.sgml|  30 ++
 src/backend/utils/adt/multirangetypes.c   |  62 ++-
 src/include/catalog/pg_aggregate.dat  |   3 +
 src/include/catalog/pg_proc.dat   |  11 ++
 src/test/regress/expected/multirangetypes.out | 100 ++
 src/test/regress/expected/opr_sanity.out  |   1 +
 src/test/regress/sql/multirangetypes.sql  |  21 
 src/test/regress/sql/opr_sanity.sql   |   1 +
 8 files changed, 228 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index df3cd5987b..6a72785327 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19959,8 +19959,23 @@ SELECT NULLIF(value, '(none)') ...

No
   
 
+  
+   
+
+ range_agg
+
+range_agg ( value
+ anymultirange )
+anymultirange
+   
+   
+Computes the union of the non-null input values.
+   
+   No
+  
+
   

 
  max
@@ -20012,8 +20027,23 @@ SELECT NULLIF(value, '(none)') ...

No
   
 
+  
+   
+
+ range_intersect_agg
+
+range_intersect_agg ( value
+ anymultirange )
+anymultirange
+   
+   
+Computes the intersection of the non-null input values.
+   
+   No
+  
+
   

 
  range_intersect_agg
diff --git a/src/backend/utils/adt/multirangetypes.c b/src/backend/utils/adt/multirangetypes.c
index c474b24431..0efef8cf35 100644
--- a/src/backend/utils/adt/multirangetypes.c
+++ b/src/backend/utils/adt/multirangetypes.c
@@ -1346,9 +1346,9 @@ range_agg_transfn(PG_FUNCTION_ARGS)
 	rngtypoid = get_fn_expr_argtype(fcinfo->flinfo, 1);
 	if (!type_is_range(rngtypoid))
 		ereport(ERROR,
 (errcode(ERRCODE_DATATYPE_MISMATCH),
- errmsg("range_agg must be called with a range")));
+ errmsg("range_agg must be called with a range or multirange")));
 
 	if (PG_ARGISNULL(0))
 		state = initArrayResult(rngtypoid, aggContext, false);
 	else
@@ -1397,8 +1397,68 @@ range_agg_finalfn(PG_FUNCTION_ARGS)
 
 	PG_RETURN_MULTIRANGE_P(make_multirange(mltrngtypoid, typcache->rngtype, range_count, ranges));
 }
 
+/*
+ * multirange_agg_transfn: combine adjacent/overlapping multiranges.
+ *
+ * All we do here is gather the input multiranges' ranges into an array
+ * so that the finalfn can sort and combine them.
+ */
+Datum
+multirange_agg_transfn(PG_FUNCTION_ARGS)
+{
+	MemoryContext aggContext;
+	Oid			mltrngtypoid;
+	TypeCacheEntry *typcache;
+	TypeCacheEntry *rngtypcache;
+	ArrayBuildState *state;
+	MultirangeType *current;
+	int32		range_count;
+	RangeType **ranges;
+	int32		i;
+
+	if (!AggCheckCallContext(fcinfo, &aggContext))
+		elog(ERROR, "multirange_agg_transfn called in non-aggregate context");
+
+	mltrngtypoid = get_fn_expr_argtype(fcinfo->flinfo, 1);
+	if (!type_is_multirange(mltrngtypoid))
+		ereport(ERROR,
+(errcode(ERRCODE_DATATYPE_MISMATCH),
+ errmsg("range_agg must be called with a range or multirange")));
+
+	typcache = multirange_get_typcache(fcinfo, mltrngtypoid);
+	rngtypcache = typcache->rngtype;
+
+	if 

range_agg with multirange inputs

2021-12-10 Thread Paul Jungwirth
Here is a patch adding range_agg(anymultirange). Previously range_agg 
only accepted anyrange.


Here is a bug report from last month requesting this addition:

https://www.postgresql.org/message-id/CAOC8YUcOtAGscPa31ik8UEMzgn8uAWA09s6CYOGPyP9_cBbWTw%40mail.gmail.com

As that message points out, range_intersect_agg accepts either anyrange 
or anymultirange, so it makes sense for range_agg to do the same.


I noticed that the docs only mentioned range_intersect_agg(anyrange), so 
I added the anymultirange versions of both on the aggregate functions page.


I also added a few more tests for range_intersect_agg since the coverage 
there seemed light.


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com
From 116c9dd5b3fbb6626d81588c124cf8fdcb4185ce Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth" 
Date: Fri, 10 Dec 2021 16:04:57 -0800
Subject: [PATCH v1] Add range_agg with multirange inputs

---
 doc/src/sgml/func.sgml|  30 ++
 src/backend/utils/adt/multirangetypes.c   |  60 +++
 src/include/catalog/pg_aggregate.dat  |   3 +
 src/include/catalog/pg_proc.dat   |  11 ++
 src/test/regress/expected/multirangetypes.out | 100 ++
 src/test/regress/expected/opr_sanity.out  |   1 +
 src/test/regress/sql/multirangetypes.sql  |  21 
 src/test/regress/sql/opr_sanity.sql   |   1 +
 8 files changed, 227 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 0a725a6711..697bf1924e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19951,6 +19951,21 @@ SELECT NULLIF(value, '(none)') ...
No
   
 
+  
+   
+
+ range_agg
+
+range_agg ( value
+ anymultirange )
+anymultirange
+   
+   
+Computes the union of the non-null input values.
+   
+   No
+  
+
   

 
@@ -19966,6 +19981,21 @@ SELECT NULLIF(value, '(none)') ...
No
   
 
+  
+   
+
+ range_intersect_agg
+
+range_intersect_agg ( value
+ anymultirange )
+anymultirange
+   
+   
+Computes the intersection of the non-null input values.
+   
+   No
+  
+
   

 
diff --git a/src/backend/utils/adt/multirangetypes.c b/src/backend/utils/adt/multirangetypes.c
index 7773215564..beaa843eab 100644
--- a/src/backend/utils/adt/multirangetypes.c
+++ b/src/backend/utils/adt/multirangetypes.c
@@ -1394,6 +1394,66 @@ range_agg_finalfn(PG_FUNCTION_ARGS)
 	PG_RETURN_MULTIRANGE_P(make_multirange(mltrngtypoid, typcache->rngtype, range_count, ranges));
 }
 
+/*
+ * multirange_agg_transfn: combine adjacent/overlapping multiranges.
+ *
+ * All we do here is gather the input multiranges' ranges into an array
+ * so that the finalfn can sort and combine them.
+ */
+Datum
+multirange_agg_transfn(PG_FUNCTION_ARGS)
+{
+	MemoryContext aggContext;
+	Oid			mltrngtypoid;
+	TypeCacheEntry *typcache;
+	TypeCacheEntry *rngtypcache;
+	ArrayBuildState *state;
+	MultirangeType *current;
+	int32		range_count;
+	RangeType **ranges;
+	int32		i;
+
+	if (!AggCheckCallContext(fcinfo, &aggContext))
+		elog(ERROR, "multirange_agg_transfn called in non-aggregate context");
+
+	mltrngtypoid = get_fn_expr_argtype(fcinfo->flinfo, 1);
+	if (!type_is_multirange(mltrngtypoid))
+		ereport(ERROR,
+(errcode(ERRCODE_DATATYPE_MISMATCH),
+ errmsg("range_agg must be called with a multirange")));
+
+	typcache = multirange_get_typcache(fcinfo, mltrngtypoid);
+	rngtypcache = typcache->rngtype;
+
+	if (PG_ARGISNULL(0))
+		state = initArrayResult(rngtypcache->type_id, aggContext, false);
+	else
+		state = (ArrayBuildState *) PG_GETARG_POINTER(0);
+
+	/* skip NULLs */
+	if (!PG_ARGISNULL(1))
+	{
+		current = PG_GETARG_MULTIRANGE_P(1);
+		multirange_deserialize(rngtypcache, current, &range_count, &ranges);
+		if (range_count == 0)
+		{
+			/*
+			 * Add an empty range so we get an empty result (not a null result).
+			 */
+			accumArrayResult(state,
+			 RangeTypePGetDatum(make_empty_range(rngtypcache)),
+			 false, rngtypcache->type_id, aggContext);
+		}
+		else
+		{
+			for (i = 0; i < range_count; i++)
+accumArrayResult(state, RangeTypePGetDatum(ranges[i]), false, rngtypcache->type_id, aggContext);
+		}
+	}
+
+	PG_RETURN_POINTER(state);
+}
+
 Datum
 multirange_intersect_agg_transfn(PG_FUNCTION_ARGS)
 {
diff --git a/src/include/catalog/pg_aggregate.dat b/src/include/catalog/pg_aggregate.dat
index fc6d3bfd94..5a06583798 100644
--- a/src/include/catalog/pg_aggregate.dat
+++ b/src/include/catalog/pg_aggregate.dat
@@ -557,6 +557,9 @@
 { aggfnoid => 'range_agg(anyrange)', aggtransfn => 'range_agg_transfn',
   aggfinalfn => 'range_agg_finalfn', aggfinalextra => 't',
   aggtranstype => 'internal' },
+{ aggfnoid => 'range_agg(anymultirange)', aggtransfn => 'multirange_agg_transfn',
+  aggfinalfn => 'multirange_agg_finalfn', aggfinalextra => 't',
+  aggtranstype => 

Re: SQL:2011 PERIODS vs Postgres Ranges?

2020-10-27 Thread Paul Jungwirth

On 10/27/20 7:11 AM, Ibrar Ahmed wrote:
I have spent some more time on the patch and did a lot of cleanup along 
with some fixes, compilation errors, and warnings.


Thank you for taking a look at this! I've been swamped with ordinary 
work and haven't had a chance to focus on it for a while, but I'm hoping 
to make some improvements over the coming holidays, especially based on 
feedback from my talk at PgCon. There are a handful of small specific 
things I'd like to do, and then one big thing: add support for PERIODs. 
Vik said I could include his old patch for PERIODs, so I'd like to get 
that working on the latest master, and then rebase my own work on top of 
it. Then we can accept either ranges or PERIODs in various places 
(marked by TODOs in the code).


Vik also pointed out a way to check foreign keys without using 
range_agg. He thinks it may even be more efficient. On the other hand 
it's a much more complicated SQL statement. I'd like to do a performance 
comparison to get concrete numbers, but if we did use his query, then 
this patch wouldn't depend on multiranges anymore---which seems like a 
big aid to moving it forward. Assuming multiranges gets committed, we 
can always swap in the range_agg query depending on the performance 
comparison results.


I apologize for the slow progress here, and thank you for your help!

Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: range_agg

2020-03-23 Thread Paul Jungwirth

Thanks Alvaro!

On Mon, Mar 23, 2020 at 4:33 PM Alvaro Herrera 
 wrote:

>
> Thinking about the on-disk representation, can we do better than putting
> the contained ranges in long-varlena format, including padding; also we
> include the type OID with each element.  Sounds wasteful.  A more
> compact representation might be to allow short varlenas and doing away
> with the alignment padding, put the type OID just once.  This is
> important because we cannot change it later.

Can you give me some guidance on this? I don't know how to make the 
on-disk format different from the in-memory format. (And for the 
in-memory format, I think it's important to have actual RangeTypes 
inside the multirange.) Is there something in the documentation, or a 
README in the repo, or even another type I can follow?


> I'm also wondering if multirange_in() is the right strategy.  Would it
> be sensible to give each range to range_parse or range_parse_bound, so

> that it determines where each range starts and ends?  Then that function
> doesn't have to worry about each quote and escape, duplicating range
> parsing code.  (This will probably require changing signature of the
> rangetypes.c function, and exporting it; for example have
> range_parse_bound allow bound_str to be NULL and in that case don't mess
> with the StringInfo and just return the end position of the parsed
> bound.)

Yeah, I really wanted to do it that way originally too. As you say it 
would require passing back more information from the range-parsing code. 
I can take a stab at making the necessary changes. I'm a bit more 
confident now than I was then in changing the range code we have already.


Regards,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: useless RangeIOData->typiofunc

2020-03-04 Thread Paul Jungwirth

On 3/4/20 1:57 PM, Alvaro Herrera wrote:

I noticed while going over the multirange types patch that it adds a
pointless typiofunc cached OID to a struct used for I/O functions'
fn_extra.  It seems to go completely unused, so I checked range types
(which this was cribbed from) and indeed, it is completely unused there
either.  My guess is that it was in turn cribbed from array's
ArrayMetaState, which is considerably more sophisticated; I suspect
nobody noticed that caching it was pointless.


I didn't believe it at first but I think you're right. :-)


Here's a patch to remove it from rangetypes.c.  It doesn't really waste
much memory anyway, but removing it lessens the cognitive load by one or
two bits.


Looks good to me, and it seems okay to make the same edits to 
multirangetypes.c


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: range_agg

2020-03-04 Thread Paul Jungwirth

Thanks for looking at this again!

On 3/4/20 1:33 PM, Alvaro Herrera wrote:

I came across an interesting thing, namely multirange_canonicalize()'s
use of qsort_arg with a callback of range_compare().  range_compare()
calls range_deserialize() (non-trivial parsing) for each input range;
multirange_canonicalize() later does a few extra deserialize calls of
its own.  Call me a premature optimization guy if you will, but I think
it makes sense to have a different struct (let's call it
"InMemoryRange") which stores the parsed representation of each range;
then we can deserialize all ranges up front, and use that as many times
as needed, without having to deserialize each range every time.


I don't know, this sounds like a drastic change. I agree that 
multirange_deserialize and range_deserialize do a lot of copying (not 
really any parsing though, and they both assume their inputs are already 
de-TOASTED). But they are used very extensively, so if you wanted to 
remove them you'd have to rewrite a lot.


I interpreted the intention of range_deserialize to be a way to keep the 
range struct fairly "private" and give a standard interface to 
extracting its attributes. Its motive seems akin to deconstruct_array. 
So I wrote multirange_deserialize to follow that principle. Both 
functions also handle memory alignment issues for you. With 
multirange_deserialize, there isn't actually much structure (just the 
list of ranges), so perhaps you could more easily omit it and give 
callers direct access into the multirange contents. That still seems 
risky though, and less well encapsulated.


My preference would be to see if these functions are really a 
performance problem first, and only redo the in-memory structures if 
they are. Also that seems like something you could do as a separate 
project. (I wouldn't mind working on it myself, although I'd prefer to 
do actual temporal database features first.) There are no 
backwards-compatibility concerns to changing the in-memory structure, 
right? (Even if there are, it's too late to avoid them for ranges.)



While I'm at this, why not name the new file simply multiranges.c
instead of multirangetypes.c?


As someone who doesn't do a lot of Postgres hacking, I tried to follow 
the approach in rangetypes.c as closely as I could, especially for 
naming things. So I named the file multirangetypes.c because there was 
already rangetypes.c. But also I can see how the "types" emphasizes that 
ranges and multiranges are not concrete types themselves, but more like 
abstract data types or generics (like arrays).


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: range_agg

2019-11-21 Thread Paul Jungwirth

On 11/21/19 1:06 AM, Pavel Stehule wrote:
2. I don't like the introduction of "safe" operators - now the basic 
operators are doubled, and nobody without documentation will use the @* 
operators.

It is not intuitive. I think it is better to map this functionality to 
the basic operators + - * and implement it just for the pairs 
(Multirange, Multirange) and (Multirange, Range) if possible.


It's the same relationship as between numeric and integer. No new 
operators should be introduced. If somebody needs this for ranges, they 
can cast to multirange and continue.

> [snip]

3. There are not prepared casts -

postgres=# select int8range(10,15)::int8multirange;
ERROR:  cannot cast type int8range to int8multirange
LINE 1: select int8range(10,15)::int8multirange;
                                ^
There should be some a) fully generic solution, or b) possibility to 
build implicit cast when any multirange type is created.


Okay, I like the idea of just having `range + range` and `multirange + 
multirange`, then letting you cast between ranges and multiranges. The 
analogy to int/numeric seems strong. I guess if you cast a multirange 
with more than one element to a range it will raise an error. That will 
let me clean up the docs a lot too.
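The cast behavior being agreed on here can be sketched in a toy Python model (the names, the `(lo, hi)` tuple representation, and the `None`-for-empty convention are illustrative assumptions, not PostgreSQL internals):

```python
# Sketch of the proposed cast semantics: a range casts up to a
# one-element multirange; a multirange casts down to a range only if it
# holds at most one element, otherwise it raises an error (mirroring the
# numeric/integer analogy discussed above).

def range_to_multirange(r):
    """Upcast: wrap a single range in a multirange (modeled as a list)."""
    return [r]

def multirange_to_range(mr):
    """Downcast: only legal when the multirange has at most one element."""
    if len(mr) > 1:
        raise ValueError("cannot cast multirange with more than one element")
    return mr[0] if mr else None  # empty multirange modeled as None

print(multirange_to_range([(10, 15)]))  # (10, 15)
```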


Thanks!

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: Add json_object(text[], json[])?

2019-10-25 Thread Paul Jungwirth

On 10/25/19 6:40 AM, Andrew Dunstan wrote:

json{b}_build_object and json{b}_build_array are designed for creating
nested json{b}. Not sure if they would work for your purpose.


Thanks for the suggestion! I looked at these a bit, but they only work 
if you have a known-ahead-of-time number of arguments. (I did explore 
building an array and calling jsonb_build_object using VARIADIC, but you 
can't build an array with alternating text & jsonb elements. That made 
me curious how these functions even worked, which led me to 
extract_variadic_args (utils/fmgr/funcapi.c), which has some magic to 
support heterogeneous types when not called with the VARIADIC keyword, 
so it seems they bypass the normal variadic handling.)


Regards,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Add json_object(text[], json[])?

2019-10-24 Thread Paul Jungwirth

Hello,

I noticed that our existing 2-param json{,b}_object functions take 
text[] for both keys and values, so they are only able to build 
one-layer-deep JSON objects. I'm interested in adding json{,b}_object 
functions that take text[] for the keys and json{,b}[] for the values. 
It would otherwise behave the same as json_object(text[], text[]) (e.g. 
re NULL handling). Does that seem worthwhile to anyone?


I'll share my specific problem where I felt I could use this function, 
although you can stop reading here if that isn't interesting to you. :-) 
I was building a jsonb_dasherize(j jsonb) function, which converts 
snake_case JSON keys into dashed-case JSON keys. (It's because of a 
Javascript framework :-) My function needs to walk the whole JSON 
structure, doing this recursively when it sees objects inside arrays or 
other objects. Here is the definition, including a comment where my 
proposed jsonb_object would have helped:


CREATE FUNCTION jsonb_dasherize(j jsonb)
RETURNS jsonb
IMMUTABLE
AS
$$
DECLARE
t text;
key text;
val jsonb;
ret jsonb;
BEGIN
  t := jsonb_typeof(j);
  IF t = 'object' THEN
-- So close! If only jsonb_object took text[] and jsonb[] params
-- SELECT  jsonb_object(
--   array_agg(dasherize_key(k)),
--   array_agg(jsonb_dasherize(v)))
-- FROMjsonb_each(j) AS t(k, v);
ret := '{}';
FOR key, val IN SELECT * FROM jsonb_each(j) LOOP
  ret := jsonb_set(ret,
   array[REPLACE(key, '_', '-')],
   jsonb_dasherize(val), true);
END LOOP;
RETURN ret;
  ELSIF t = 'array' THEN
SELECT  COALESCE(jsonb_agg(jsonb_dasherize(elem)), '[]')
INTOret
FROMjsonb_array_elements(j) AS t(elem);
RETURN ret;
  ELSIF t IS NULL THEN
-- This should never happen internally
-- but only from a passed-in NULL.
RETURN NULL;
  ELSE
-- string/number/null:
RETURN j;
  END IF;
END;
$$
LANGUAGE plpgsql;

I also tried a recursive CTE there using jsonb_set, but it was too late 
at night for me to figure that one out. :-)


It seems like a json-taking json_object would be just what I needed. And 
in general I was surprised that Postgres didn't have a more convenient 
way to build multi-layer JSON. I'm happy to add this myself if other 
folks want it.


Regards,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: SQL:2011 PERIODS vs Postgres Ranges?

2019-08-06 Thread Paul Jungwirth

Hi Ibrar,

On 8/6/19 3:26 AM, Ibrar Ahmed wrote:

- Why we are not allowing any other datatype other than ranges in the
primary key. Without that there is no purpose of a primary key.


A temporal primary key always has at least one ordinary column (of any 
type), so it is just a traditional primary key *plus* a PERIOD and/or 
range column to indicate when the record was true.



- Thinking about some special token to differentiate between normal
primary key and temporal primary key


There is already some extra syntax. For the time part of a PK, you say 
`WITHOUT OVERLAPS`, like this:


CONSTRAINT pk_on_t PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)

In this example `id` is an ordinary column, and `valid_at` is either a 
Postgres range or a SQL:2011 PERIOD. (The latter is not yet implemented 
in my patch but there are some placeholder comments.)


Similarly a foreign key has one or more traditional columns *plus* a 
range/PERIOD. It needs to have a range/PERIOD on both sides. It too has 
some special syntax, but instead of `WITHOUT OVERLAPS` it is `PERIOD`. 
(Don't blame me, I didn't write the standard :-) So here is an example:


CONSTRAINT fk_t2_to_t FOREIGN KEY (id, PERIOD valid_at)
  REFERENCES t (id, PERIOD valid_at)

You should be able to see my changes to gram.y to support this new syntax.

I hope this clears up how it works! I'm happy to answer more questions 
if you have any. Also if you want to read more:


- This paper by Kulkarni & Michels is a 10-page overview of SQL:2011:

https://sigmodrecord.org/publications/sigmodRecord/1209/pdfs/07.industry.kulkarni.pdf

- This is a talk I gave at PGCon 2019 going over the concepts, with a 
lot of pictures. You can find text, slides, and a link to the video here:


https://github.com/pjungwir/postgres-temporal-talk

- This link is ostensibly an annotated bibliography but really tells a 
story about how the research has developed:


https://illuminatedcomputing.com/posts/2017/12/temporal-databases-bibliography/

- There is also some discussion about PERIODs vs ranges upthread here, 
as well as here:


https://www.postgresql-archive.org/Periods-td6022563.html


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: range_agg

2019-07-10 Thread Paul Jungwirth

On 7/9/19 11:24 PM, David Fetter wrote:

I seem to recall that the usual convention (at least in math) is
to use intervals that are generally represented as open on the
infinity side, but that might not fit how we do things.


I think it does, unless I'm misunderstanding?


Oh, I was just wondering about the square bracket on the left side of
[null, 1).  It's not super important.


Ah, I understand now. Just a typo on my part. Thanks for catching it, 
and sorry for the confusion!



!mr , perhaps?


I like that suggestion. Honestly I'm not sure we even want an inverse,
but it's so important theoretically we should at least consider
whether it is appropriate here. Or maybe "inverse" is the wrong word
for this, or there is a different meaning it should have.


Jeff's suggestion of ~ for complement is better.


Okay, thanks. I like it better too.

Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: range_agg

2019-07-09 Thread Paul Jungwirth

On 7/9/19 12:01 PM, Alvaro Herrera wrote:

On 2019-Jul-08, Paul A Jungwirth wrote:


- You can subscript a multirange like you do an array (? This could be
a function instead.)


Note that we already have a patch in the pipe to make subscripting an
extensible operation, which would fit pretty well here, I think.


I'll take a look at that!


Also, I suppose you would need unnest(multirange) to yield the set of
ranges.


I think that would be really nice, although it isn't critical I think if 
you can do something like UNNEST(multirange::tstzrange[]).


--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: range_agg

2019-05-06 Thread Paul Jungwirth

I suspect that if you build it, the will come, "they" being anyone who
has to schedule coverage, check usage of a resource over time, etc. Is
this something you want help with at some level? Coding, testing,
promoting...


You might be right. :-) Most of this is done already, since it was 
largely copy/paste from my extension plus figuring out how to register 
built-in functions with the .dat files. I need to write some docs and do 
some cleanup and I'll have a CF entry. And I'll probably go ahead and 
add your two suggestions too. Things I'd love help with:


- Getting more opinions about the functions' interface, either from you 
or others, especially:
  - In the extension I have a boolean param to let you accept gaps or 
raise an error, and another for overlaps. But what about 
accepting/raising/returning null? How should the parameters expose that? 
Maybe leave them as bools but accept true/false/null for 
permit/raise/nullify respectively? That seems like a questionable UI, 
but I'm not sure what would be better. Maybe someone with better taste 
can weigh in. :-) I tried to find existing built-in functions that gave 
an enumeration of options like that but couldn't find an existing example.
  - Also: what do you think of the question I asked in my reply to 
Corey? Is it better to have *all* range_agg functions return an array of 
ranges, or it is nicer to have a variant that always returns a single range?

- Getting it reviewed.
- Advice about sequencing it with respect to my temporal foreign keys 
patch, where I'm planning to call range_agg to check an FK. E.g. should 
my FK patch be a diff on top of the range_agg code? I assume they should 
have separate CF entries though?


Oh and here's something specific:

- I gave oids to my new functions starting with 8000, because I thought 
I saw some discussion about that recently, and the final committer will 
correct the oids to the current n+1? But I can't find that discussion 
anymore, so if that's the wrong approach let me know.


Thanks!

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: range_agg

2019-05-06 Thread Paul Jungwirth

On 5/3/19 6:41 PM, David Fetter wrote:

This suggests two different ways to extend ranges over aggregation:
one which is a union of (in general) disjoint intervals, two others
are a union of intervals, each of which has a weight.
. . .
I think the cases above, or at least the first two of them, should be
available. They could be called range_agg, weighted_range_agg, and
covering_range_agg.


Thanks David! I think these two new functions make sense. Before I 
implement them too I wonder if anyone else has uses for them?


Thanks,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: range_agg

2019-05-06 Thread Paul Jungwirth

On 5/4/19 3:11 PM, Corey Huinker wrote:

One question is how to aggregate ranges that would leave gaps and/or
overlaps. So in my extension there is a one-param version that forbids
gaps & overlaps, but I let you permit them by passing extra parameters,
so the signature is:


Perhaps a third way would be to allow and preserve the gaps.


Thanks for the feedback! I think this is what I'm doing already 
(returning an array of ranges), but let me know if I'm misunderstanding. 
My extension has these signatures:


range_agg(anyrange) returning anyrange
range_agg(anyrange, bool) returning anyarray
range_agg(anyrange, bool, bool) returning anyarray.

The first variant raises an error if there are gaps or overlaps and 
always returns a single range, but the other two return an array of ranges.


I was planning to use the same signatures for my patch to pg, unless 
someone thinks they should be different. But I'm starting to wonder if 
they shouldn't *all* return arrays. I have two concrete use-cases for 
these functions and they both require the array-returning versions. Is 
it helpful to have a version that always returns a single range? Or 
should I make them all consistent?


Thanks,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




range_agg

2019-05-03 Thread Paul Jungwirth

Hello,

I wrote an extension to add a range_agg function with similar behavior 
to existing *_agg functions, and I'm wondering if folks would like to 
have it in core? Here is the repo: https://github.com/pjungwir/range_agg


I'm also working on a patch for temporal foreign keys, and having 
range_agg would make the FK check easier and faster, which is why I'd 
like to get it added. But also it just seems useful, like array_agg, 
json_agg, etc.
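To show why range_agg helps the FK check: the referencing row's range must be fully covered by the union of the referenced rows' ranges, and after merging (which is what range_agg provides) coverage reduces to a single containment test per merged piece. A rough Python sketch of the idea, on half-open integer tuples rather than real ranges:

```python
def covered(ref_range, pk_ranges):
    """Temporal-FK-style check: is ref_range fully covered by the
    union of pk_ranges?  First merge overlapping/adjacent [lo, hi)
    intervals (the range_agg step), then test containment."""
    merged = []
    for lo, hi in sorted(pk_ranges):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    lo, hi = ref_range
    return any(a <= lo and hi <= b for a, b in merged)

print(covered((2, 7), [(1, 4), (4, 9)]))  # True: union is [1, 9)
print(covered((2, 7), [(1, 4), (5, 9)]))  # False: gap at [4, 5)
```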


One question is how to aggregate ranges that would leave gaps and/or 
overlaps. So in my extension there is a one-param version that forbids 
gaps & overlaps, but I let you permit them by passing extra parameters, 
so the signature is:


range_agg(r anyrange, permit_gaps boolean, permit_overlaps boolean)

Perhaps another useful choice would be to return NULL if a gap/overlap 
is found, so that each param would have three choices instead of just 
two: accept the inputs, raise an error, or return NULL.
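To make the proposed interface concrete, here's a rough Python sketch of the three-way behavior, with string-valued flags standing in for the boolean/NULL parameters and half-open integer tuples standing in for ranges (names here are illustrative only):

```python
def range_agg(ranges, on_gap="error", on_overlap="error"):
    """Sketch of the proposed interface: each flag chooses among
    'permit' (merge/keep), 'error' (raise), or 'null' (return None)
    when a gap or overlap between consecutive inputs is found."""
    result = []
    for lo, hi in sorted(ranges):
        if result:
            prev_hi = result[-1][1]
            kind = "overlap" if lo < prev_hi else "gap" if lo > prev_hi else None
            if kind:
                mode = on_overlap if kind == "overlap" else on_gap
                if mode == "error":
                    raise ValueError(f"range_agg: {kind} detected")
                if mode == "null":
                    return None
        if result and lo <= result[-1][1]:
            result[-1] = (result[-1][0], max(result[-1][1], hi))
        else:
            result.append((lo, hi))
    return result

print(range_agg([(1, 3), (3, 6)]))                     # [(1, 6)]
print(range_agg([(1, 3), (5, 6)], on_gap="permit"))    # [(1, 3), (5, 6)]
print(range_agg([(1, 4), (3, 6)], on_overlap="null"))  # None
```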


What do people think? I plan to work on a patch regardless, so that I 
can use it for temporal FKs, but I'd appreciate some feedback on the 
"user interface".


Thanks,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: Temporal Table Proposal

2019-03-03 Thread Paul Jungwirth

On 2/25/19 4:21 AM, Ibrar Ahmed wrote:
Great, to hear that you are working on that. Do you think I can help you 
with this? I did some groundwork to make it possible. I can help in 
coding/reviewing or even can take lead if you want to.


Hi Ibrar,

I'd love some help with this! I submitted my patch to the March 
commitfest, and Peter Moser & Anton Dignös submitted theirs also. I 
still need to rebase on the most recent commits, but I'll try to do that 
tonight or tomorrow. Personally I'd love some review and feedback, 
because this is my first substantial patch. (I made a small change to 
btree_gist a couple years ago also)


I think the challenge with temporal functionality is that there are a 
lot of new concepts, and we'd like them all to hang together in a 
coherent way. (That's why I want to give a talk about it: to increase 
background understanding in the Postgres community.) So having someone 
take the lead on it makes sense. I'm happy to provide some opinions and 
direction, but my own coding contributions are likely to be slow, and 
having a regular contributor more closely involved would help a lot.


Here are some thoughts about things that need work:

- temporal primary keys (my patch)
- temporal foreign keys (I've done some work on this adding to my patch 
but I haven't finished it yet.)

- temporal joins (Moser/Dignös patch)
- declaring PERIODs (Vik Fearing's patch)
- showing PERIODs in the system catalog (Vik Fearing's patch)
- using PERIODs in SELECT, WHERE, GROUP BY, HAVING, function arguments, 
etc. (TODO)

- SYSTEM_TIME PERIODs for transaction-time tables (TODO)
- temporal insert/update/delete for transaction-time tables (TODO)
- temporal insert/update/delete for valid-time tables (TODO)
- temporal SELECT for valid-time tables (TODO, could build off the 
Moser/Dignös work)

- temporal SELECT for transaction-time tables (TODO)

I think the transaction-time stuff is easier, but also less interesting, 
and there are well-known patterns for accomplishing it already. I'm more 
interested in supporting valid-time tables personally.


--
Paul  ~{:-)
p...@illuminatedcomputing.com



Re: Temporal Table Proposal

2019-02-22 Thread Paul Jungwirth

On 2/22/19 11:31 AM, Euler Taveira wrote:

Em sex, 22 de fev de 2019 às 15:41, Ibrar Ahmed
 escreveu:


While working on another PostgreSQL feature, I was thinking that we could use a 
temporal table in PostgreSQL. Some existing databases offer this. I searched 
for any discussion on the PostgreSQL mailing list, but could not find any. 
Maybe my search wasn’t accurate enough:  if anyone can point me to a 
discussion, that would be useful.


https://www.postgresql.org/message-id/CA%2BrenyUb%2BXHzsrPHHR6ELqguxaUPGhOPyVc7NW%2BkRsRpBZuUFQ%40mail.gmail.com

This is the last one. I don't know why it wasn't in the January CF.


Oh that's by me! :-)

I didn't put it into the CF because I wanted to get some feedback on 
primary keys before I got too far into foreign keys, but someone 
recently advised me to start adding to CFs anyway with "WIP" in the 
title, so I'll do that next time.


Btw my own patch is very modest, and I'd love to see this other much 
more extensive patch get some attention:


https://www.postgresql.org/message-id/flat/CAHO0eLYyvuqwF%3D2FsgDn1xOs_NOrFBu9Xh-Wq%2BaWfFy0y6%3DjWQ%40mail.gmail.com#4f7fbace3a2f2ce85fcc161cc3fdd273

They were told to adjust where in the query pipeline they do their work, 
and the latest patch does that (as I understand it), but I don't think 
anyone has looked at it yet.


Both of these patches use range types rather than SQL:2011 PERIODs, but 
I'd like to *also* support PERIODs (and accept ranges everywhere we 
accept PERIODs). Vik Fearing already has a patch to let you *declare* 
PERIODs:


https://www.postgresql-archive.org/Periods-td6022563.html

Actually using PERIODs in queries seems like a decent chunk of work 
though: basically it means making our grammar & processing accept 
PERIODs anywhere they currently accept columns. I'd love to hear some 
thoughts/suggestions around that. For example: a PERIOD is *similar* to 
a GENERATED column, so maybe the work being done there can/should 
influence how we implement them.


I'm excited to be getting some momentum around temporal features though! 
I'm supposed to give a talk about them at PGCon in Ottawa this spring, 
so hopefully that will help too.


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com



Re: SQL:2011 PERIODS vs Postgres Ranges?

2018-10-28 Thread Paul Jungwirth

Hi Jeff,

Thanks for sharing your thoughts and encouragement! :-)

> The model in [7] is
> based heavily on pack/unpack operators, and it's hard for me to see
> how those fit into SQL. Also, the pack/unpack operators have some
> theoretical weirdness that the book does not make clear*.
>
> *: My question was about the significance
> of the order when packing on two intervals. Hugh Darwen was kind
> enough to reply at length, and offered a lot of insight, but was still
> somewhat inconclusive.

I'd be interested in seeing that conversation if you ever find it again.

I really like how Date/Darwen/Lorentzos use pack/unpack to explain 
temporal operations as operating on every concurrent "instant" 
separately, and then bringing the adjacent instants back together into 
ranges again. Even if you don't materialize that approach, conceptually 
it makes it easy to understand what's going on.
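To make that concrete (on integer intervals, since instants of a real timestamp range aren't enumerable), here's a rough Python sketch of the pack/unpack idea: unpack expands a range into individual instants, operators work one instant at a time, and pack folds adjacent instants back into maximal ranges. This is just my toy rendering of the book's concept:

```python
def unpack(ranges):
    """Expand half-open integer intervals into their distinct instants."""
    return sorted({t for lo, hi in ranges for t in range(lo, hi)})

def pack(instants):
    """Fold sorted distinct instants back into maximal [lo, hi) intervals."""
    out = []
    for t in instants:
        if out and t == out[-1][1]:          # adjacent to the last interval
            out[-1] = (out[-1][0], t + 1)
        else:
            out.append((t, t + 1))
    return out

# E.g. a union computed "an instant at a time", then re-packed:
print(pack(unpack([(1, 3), (2, 5), (7, 8)])))  # [(1, 5), (7, 8)]
```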


So what is great about the patch from Anton Dignös 
(https://www.postgresql-archive.org/PROPOSAL-Temporal-query-processing-with-range-types-tt5913058.html) 
is that (like Date/Darwen/Lorentzos) you still have temporal variants 
for every operator in the relational algebra, but they give 
straightforward & efficient implementations of each based on traditional 
operators plus just their two new "normalize" and "align" operations. (I 
think they renamed these in later papers/patches though?) Their main 
paper is at https://files.ifi.uzh.ch/boehlen/Papers/modf174-dignoes.pdf 
if anyone wants to read it. It's short! :-)


The biggest challenge implementing temporal operators in plain SQL is 
merging/splitting ranges from the left & right sides of an operator so 
they line up. A single row can get split into multiple rows, or several 
rows might be merged into one, etc. You can see how tricky Snodgrass's 
"coalesce" operation is in his book. I gave some example SQL to 
implement coalesce with UNNEST plus a range_agg function at 
https://github.com/pjungwir/range_agg but with the Dignös approach I 
don't think you'd need that. Normalize/align targets roughly the same 
problem.
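For anyone who hasn't seen it, here's a rough Python sketch of what coalesce does (value-equivalent rows whose periods overlap or meet get merged into one row), again using half-open integer tuples as stand-in periods. The real SQL versions are far trickier precisely because SQL lacks this kind of one-pass merge:

```python
def coalesce(rows):
    """Sketch of Snodgrass-style coalescing: rows are (value, lo, hi)
    with half-open periods; merge rows with equal values whose periods
    overlap or are adjacent.  Sorting groups each value's periods
    together, so one pass suffices."""
    out = []
    for value, lo, hi in sorted(rows):
        if out and out[-1][0] == value and lo <= out[-1][2]:
            out[-1][2] = max(out[-1][2], hi)   # extend the previous row
        else:
            out.append([value, lo, hi])
    return [tuple(r) for r in out]

print(coalesce([("a", 1, 3), ("a", 3, 5), ("b", 4, 6), ("a", 7, 9)]))
# [('a', 1, 5), ('a', 7, 9), ('b', 4, 6)]
```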


Anyway I'd be curious whether the theoretical weirdness you found in 
pack/unpack also applies to normalize/align.


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com