Re: Referential Integrity Checks with Statement-level Triggers

2019-08-01 Thread Corey Huinker
>
>
> > The people who expressed opinions on nuking triggers from orbit (it's
> the only way to be sure) have yet to offer up any guidance on how to
> proceed from here, and I suspect it's because they're all very busy getting
> things ready for v12. I definitely have an interest in working on this for
> 13, but I don't feel good about striking out on my own without their input.
>
> Very interesting thread, but the current patch has been through two
> CFs without comments or new patches, so I'm going to mark it "Returned
> with feedback".  I hope all this discussion will trigger more research
> in this space.
>

I've noticed that the zedstore efforts ran into the same problem that
refactoring triggers has: we cannot determine which columns in a table will
be affected by a trigger, so we have to assume that all of them will be.
This causes a lot of unnecessary overhead with triggers. If we had a
compilation step for triggers (which ultimately means a compilation step
for procedures) that kept a dependency tree of which tables/columns were
touched, then we would have that insight. It's true that one dynamic
statement or SELECT * would force us right back to keep-everything, but if
procedures that did not do such things had performance benefits, that
would be an incentive to code them more fastidiously.
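To make the idea concrete, here is a sketch (all table, column, and function names are hypothetical) of the two extremes such a compilation step would have to distinguish:

```sql
-- Analyzable: a compiler could record that this function writes only
-- audit_log(account_id, amount), so statements touching other columns
-- could skip the related trigger overhead.
CREATE FUNCTION audit_balance() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  INSERT INTO audit_log(account_id, amount)   -- static column list
  VALUES (NEW.account_id, NEW.amount);
  RETURN NEW;
END $$;

-- Not analyzable: dynamic SQL plus a whole-row reference defeats any
-- column-dependency analysis, forcing the keep-everything assumption.
CREATE FUNCTION audit_everything() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  EXECUTE format('INSERT INTO %I SELECT ($1).*', TG_ARGV[0]) USING NEW;
  RETURN NEW;
END $$;
```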


Re: Referential Integrity Checks with Statement-level Triggers

2019-08-01 Thread Thomas Munro
On Tue, Feb 26, 2019 at 5:41 AM Corey Huinker  wrote:
>> In order to avoid per-row calls of the constraint trigger functions, we could
>> try to "aggregate" the constraint-specific events somehow, but I think a
>> separate queue would be needed for the constraint-specific events.
>>
> In general, the (after) triggers and constraints have too much in common, so
> separating them without seeing code changes is beyond my imagination.
>
> Yeah, there's a lot of potential for overlap where a trigger could "borrow" 
> an RI tuplestore or vice versa.
>
> The people who expressed opinions on nuking triggers from orbit (it's the 
> only way to be sure) have yet to offer up any guidance on how to proceed from 
> here, and I suspect it's because they're all very busy getting things ready 
> for v12. I definitely have an interest in working on this for 13, but I don't 
> feel good about striking out on my own without their input.

Very interesting thread, but the current patch has been through two
CFs without comments or new patches, so I'm going to mark it "Returned
with feedback".  I hope all this discussion will trigger more research
in this space.

-- 
Thomas Munro
https://enterprisedb.com




Re: Referential Integrity Checks with Statement-level Triggers

2019-02-25 Thread Corey Huinker
>
>
> In order to avoid per-row calls of the constraint trigger functions, we
> could
> try to "aggregate" the constraint-specific events somehow, but I think a
> separate queue would be needed for the constraint-specific events.
>
> In general, the (after) triggers and constraints have too much in common,
> so separating them without seeing code changes is beyond my imagination.
>
>
Yeah, there's a lot of potential for overlap where a trigger could "borrow"
an RI tuplestore or vice versa.

The people who expressed opinions on nuking triggers from orbit (it's the
only way to be sure) have yet to offer up any guidance on how to proceed
from here, and I suspect it's because they're all very busy getting things
ready for v12. I definitely have an interest in working on this for 13, but
I don't feel good about striking out on my own without their input.


Re: Referential Integrity Checks with Statement-level Triggers

2019-02-25 Thread Antonin Houska
Corey Huinker  wrote:

>  Have you considered bulk processing of individual rows by a row-level trigger?
>  For IMMEDIATE constraints we'd have to ensure that the trigger is notified
>  that the current row is the last one from the current query, but that might
>  not be difficult.
> 
> I'm not sure I understand what you're suggesting, but if it keeps the
> overhead of one trigger firing per row deleted, then it doesn't seem like
> much of a win.

I thought of a trigger that still fires for each row, but most of the time
only stores the deleted row into a tuplestore of the appropriate lifespan. The
queries that you proposed would only be executed if the tuplestore contains a
given amount of tuples or if the last row of the current statement has been
stored.
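For what it's worth, that buffering scheme can be approximated today in userland with a pair of triggers, which may help in benchmarking the idea. This is only a sketch: the tables echo the t_pk/t_fk reproducer elsewhere in this thread, the buffer is a plain temp table, and the locking and concurrency handling the real RI triggers perform is ignored entirely.

```sql
-- Assumes tables like t_pk(i int PRIMARY KEY) and t_fk(i int REFERENCES t_pk).
CREATE TEMP TABLE deleted_pks(i int) ON COMMIT DELETE ROWS;

-- Row-level trigger: just buffer the deleted key; no RI query per row.
CREATE FUNCTION buffer_deleted_pk() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  INSERT INTO deleted_pks VALUES (OLD.i);
  RETURN OLD;
END $$;

-- Statement-level trigger: one set-oriented check over the whole buffer.
CREATE FUNCTION check_deleted_pks() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  IF EXISTS (SELECT 1 FROM deleted_pks d JOIN t_fk f USING (i)) THEN
    RAISE foreign_key_violation
      USING MESSAGE = 'rows in t_fk still reference deleted t_pk keys';
  END IF;
  TRUNCATE deleted_pks;
  RETURN NULL;
END $$;

-- Row-level AFTER events are queued first, so the statement-level
-- trigger fires after the buffer is fully populated.
CREATE TRIGGER t_pk_buffer_del AFTER DELETE ON t_pk
  FOR EACH ROW EXECUTE FUNCTION buffer_deleted_pk();
CREATE TRIGGER t_pk_check_del AFTER DELETE ON t_pk
  FOR EACH STATEMENT EXECUTE FUNCTION check_deleted_pks();
```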

> Given that this patch has been punted to v13, I'd like to instead look at
> how we might go about building up the transition tuplestores for the
> specific purpose of doing the RI checks, not just deletes, and executing
> those at the appropriate time, rather than trying to make our needs fit into
> trigger form.

A constraint-specific tuplestore can make some things easier, but if a table
has both constraints and (non-constraint) triggers which use the transition
tables, then the tuples will have to be stored in both tuplestores. On the
other hand, using the same tuplestore for both constraint and non-constraint
triggers is difficult because deferred constraint triggers need to see rows
added by all statements, while the non-constraint triggers should only see
rows of the current statement.
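The visibility difference is easy to state as an example (the constraint and table names here are hypothetical):

```sql
BEGIN;
SET CONSTRAINTS t_fk_i_fkey DEFERRED;
DELETE FROM t_pk WHERE i < 100;    -- statement 1
DELETE FROM t_pk WHERE i >= 100;   -- statement 2
COMMIT;
-- At COMMIT the deferred constraint check must see the deleted rows of
-- BOTH statements, but a transition table (REFERENCING OLD TABLE ...)
-- would have contained only the rows of its own statement.
```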

In order to avoid per-row calls of the constraint trigger functions, we could
try to "aggregate" the constraint-specific events somehow, but I think a
separate queue would be needed for the constraint-specific events.

In general, the (after) triggers and constraints have too much in common, so
separation of these w/o seeing code changes is beyond my imagination.

-- 
Antonin Houska
https://www.cybertec-postgresql.com



Re: Referential Integrity Checks with Statement-level Triggers

2019-02-22 Thread Corey Huinker
>
>
> While the idea to use the transition table is good, this approach probably
> requires the trigger engine (trigger.c) to be adjusted, and that in a
> non-trivial way.
>

It probably does. Several people with advanced knowledge of trigger.c
expressed a desire to rebuild trigger.c from the ground up, and with it
create case-specific tuplestores for handling referential integrity
constraints, which would be lighter than either the transition tables or
the per-row invocation of a trigger. After all, we need an RI check to
happen; we don't need it to happen *through a trigger function*.

> I'm also not sure if it's o.k. that a performance-related patch potentially
> makes performance worse in some cases. If FK violations are checked at the
> statement boundary, the wasted effort / time can (at least in theory) be high
> if early rows violate the FK.
>

That concern was also expressed with varying levels of alarm in their
voices.

> Have you considered bulk processing of individual rows by a row-level trigger?
> For IMMEDIATE constraints we'd have to ensure that the trigger is notified
> that the current row is the last one from the current query, but that might
> not be difficult.
>

I'm not sure I understand what you're suggesting, but if it keeps the
overhead of one trigger firing per row deleted, then it doesn't seem like
much of a win.

Given that this patch has been punted to v13, I'd like to instead look at
how we might go about building up the transition tuplestores for the
specific purpose of doing the RI checks, not just deletes, and executing
those at the appropriate time, rather than trying to make our needs fit
into trigger form.


Re: Referential Integrity Checks with Statement-level Triggers

2019-02-20 Thread Antonin Houska
Corey Huinker  wrote:

> Attached is a patch that refactors DELETE triggers to fire at the statement 
> level.
> 
> I chose delete triggers partly out of simplicity, and partly because there is
> some before/after row linkage in the ON UPDATE CASCADE cases where statement
> level triggers might not be feasible as we have currently implemented them.

I tried to review this patch, also with the intention to learn more about
AFTER trigger internals. As for the code, I understood your idea and it
really seems like low-hanging fruit. However, in trigger.c I noticed a comment
that transition tables are not supported for deferred constraints. Of course I
tried to use this information to test your patch:

CREATE TABLE t_pk(i int PRIMARY KEY);
CREATE TABLE t_fk(i int REFERENCES t_pk ON DELETE NO ACTION DEFERRABLE);
INSERT INTO t_pk(i) VALUES (1);
INSERT INTO t_fk(i) VALUES (1);
BEGIN;
SET CONSTRAINTS t_fk_i_fkey DEFERRED;
DELETE FROM t_pk;
COMMIT;

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Missing transition table really appears to be the problem:

#0  0x00a8c2a8 in tuplestore_tuple_count (state=0x7f7f7f7f7f7f7f7f) at tuplestore.c:548
#1  0x009b0e58 in ri_on_delete_restrict (trigdata=0x7ffe800c2890, is_no_action=true) at ri_triggers.c:720
#2  0x009b0d3f in RI_FKey_noaction_del (fcinfo=0x7ffe800c27a0) at ri_triggers.c:607
#3  0x006b8768 in ExecCallTriggerFunc (trigdata=0x7ffe800c2890, tgindx=0, finfo=0x1f08c10, instr=0x0, per_tuple_context=0x1f30690) at trigger.c:2412
#4  0x006bbb86 in AfterTriggerExecute (event=0x1f0ab10, rel=0x7fc71306ea70, trigdesc=0x1f089f8, finfo=0x1f08c10, instr=0x0, per_tuple_context=0x1f30690, trig_tuple_slot1=0x0, trig_tuple_slot2=0x0) at trigger.c:4367
#5  0x006bbf9e in afterTriggerInvokeEvents (events=0xf97950 , firing_id=1, estate=0x1f086c8, delete_ok=true) at trigger.c:4560
#6  0x006bc844 in AfterTriggerFireDeferred () at trigger.c:4996
#7  0x0055c989 in CommitTransaction () at xact.c:1987
#8  0x0055d6a4 in CommitTransactionCommand () at xact.c:2832

While the idea to use the transition table is good, this approach probably
requires the trigger engine (trigger.c) to be adjusted, and that in a
non-trivial way.

I'm also not sure if it's o.k. that a performance-related patch potentially
makes performance worse in some cases. If FK violations are checked at the
statement boundary, the wasted effort / time can (at least in theory) be high
if early rows violate the FK.

Have you considered bulk processing of individual rows by a row-level trigger?
For IMMEDIATE constraints we'd have to ensure that the trigger is notified
that the current row is the last one from the current query, but that might
not be difficult.

-- 
Antonin Houska
https://www.cybertec-postgresql.com



Re: Referential Integrity Checks with Statement-level Triggers

2019-01-23 Thread Corey Huinker
Attached is a patch that refactors DELETE triggers to fire at the statement
level.

I chose delete triggers partly out of simplicity, and partly because there is
some before/after row linkage in the ON UPDATE CASCADE cases where
statement level triggers might not be feasible as we have currently
implemented them.

After having done the work, I think INSERT triggers would be similarly
straightforward, but wanted to limit scope.

Also, after having stripped the delete cases out of the update-or-delete
functions, it became obvious that the on-update-set-null and
on-update-set-default cases differed by only 3-4 lines, so those functions
were combined.

On a vagrant VM running on my desktop machine, I'm seeing a speed-up of
about 25% in the benchmark provided. I think that figure is cloudy and
below my expectations. Perhaps we'd get a much better picture of whether or
not this is worth it on a bare metal machine, or at least a VM better
suited to benchmarking.

Currently 4 make-check tests are failing. Two of them appear to be false
positives (the tests make assumptions about triggers that are no longer
true), and the other two are outside the scope of this benchmark, so I'll
revisit them if we go forward.

ri-set-logic.sql is an edited benchmark script adapted from the benchmark
Kevin Grittner ran against hand-rolled triggers and posted on 2016-11-02.
ri_test.out is a copy-paste of two runs of the benchmark script.

Many thanks to everyone who helped, often despite their own objections to
the overall reasoning behind the endeavor. I'm aware that a large
contingent of highly experienced people would very much like to replace our
entire trigger architecture, or at least divorce RI checks from triggers.
Maybe this patch spurs on that change. Even if nothing comes of it, it's
been a great learning experience.

On Sat, Dec 22, 2018 at 11:28 AM Emre Hasegeli  wrote:

> > It is far from a premature optimization IMO, it is super useful and
> something I was hoping would happen ever since I heard about transition
> tables being worked on.
>
> Me too.  Never-ending DELETEs are a common pain point especially for
> people migrated from MySQL which creates indexes for foreign keys
> automatically.
>
From 8a73f9233211076421a565b5c90ecd029b5e6581 Mon Sep 17 00:00:00 2001
From: vagrant 
Date: Wed, 23 Jan 2019 16:59:17 +
Subject: [PATCH] Change Delete RI triggers to Statement-Level Triggers

---
 src/backend/commands/tablecmds.c|   9 +-
 src/backend/commands/trigger.c  |   2 +
 src/backend/utils/adt/ri_triggers.c | 779 
 3 files changed, 566 insertions(+), 224 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 28a137bb53..21f5bf94a4 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8954,6 +8954,11 @@ createForeignKeyActionTriggers(Relation rel, Oid refRelOid, Constraint *fkconstr
 {
 	CreateTrigStmt *fk_trigger;
 
+	TriggerTransition *del = makeNode(TriggerTransition);
+	del->name = "pg_deleted_transition_table";
+	del->isNew = false;
+	del->isTable = true;
+
 	/*
 	 * Build and execute a CREATE CONSTRAINT TRIGGER statement for the ON
 	 * DELETE action on the referenced table.
@@ -8961,11 +8966,11 @@ createForeignKeyActionTriggers(Relation rel, Oid refRelOid, Constraint *fkconstr
 	fk_trigger = makeNode(CreateTrigStmt);
 	fk_trigger->trigname = "RI_ConstraintTrigger_a";
 	fk_trigger->relation = NULL;
-	fk_trigger->row = true;
+	fk_trigger->row = false;
 	fk_trigger->timing = TRIGGER_TYPE_AFTER;
 	fk_trigger->events = TRIGGER_TYPE_DELETE;
 	fk_trigger->columns = NIL;
-	fk_trigger->transitionRels = NIL;
+	fk_trigger->transitionRels = list_make1(del);
 	fk_trigger->whenClause = NULL;
 	fk_trigger->isconstraint = true;
 	fk_trigger->constrrel = NULL;
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 7ffaeaffc6..080587215f 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -510,7 +510,9 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
 			 *
 			 * Currently this is enforced by the grammar, so just Assert here.
 			 */
+			/*
 			Assert(!stmt->isconstraint);
+			*/
 
 			if (tt->isNew)
 			{
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index e1aa3d0044..6f89ab4c77 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -194,9 +194,10 @@ static int	ri_constraint_cache_valid_count = 0;
 static bool ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
   HeapTuple old_row,
   const RI_ConstraintInfo *riinfo);
-static Datum ri_restrict(TriggerData *trigdata, bool is_no_action);
-static Datum ri_setnull(TriggerData *trigdata);
-static Datum ri_setdefault(TriggerData *trigdata);
+static Datum ri_on_update_restrict(TriggerData *trigdata, bool is_no_action);
+static Datum ri_on_delete_restrict(TriggerData *trigdata, bool is_no_action);
+static 

Re: Referential Integrity Checks with Statement-level Triggers

2018-12-22 Thread Emre Hasegeli
> It is far from a premature optimization IMO, it is super useful and something 
> I was hoping would happen ever since I heard about transition tables being 
> worked on.

Me too.  Never-ending DELETEs are a common pain point especially for
people migrated from MySQL which creates indexes for foreign keys
automatically.
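(The standard workaround, for anyone hitting those never-ending DELETEs today, is simply to create the index MySQL would have created for you; the table and column names below are illustrative.)

```sql
-- Without this index, each row deleted from parent costs a sequential
-- scan of child during the RI check.
CREATE INDEX child_parent_id_idx ON child (parent_id);
```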



Re: Referential Integrity Checks with Statement-level Triggers

2018-12-17 Thread Kevin Grittner
On Mon, Dec 17, 2018 at 8:32 AM Corey Huinker  wrote:

> All suggestions are appreciated.

As I recall, the main issue is how to handle concurrency.  The
existing RI triggers do some very special handling of the multi-xact
ID to manage concurrent modifications.  Has anyone looked at the
issues with using those techniques with set-oriented statements for
the transition table approach?

-- 
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/



Re: Referential Integrity Checks with Statement-level Triggers

2018-12-17 Thread Kevin Grittner
On Mon, Dec 17, 2018 at 11:27 AM Alvaro Herrera
 wrote:
> On 2018-Dec-17, Pavel Stehule wrote:
>
>> ROW triggers call the RI check too often, and statement triggers too rarely.
>> I think an ideal design could call the RI check every 10K rows. It can be
>> unfriendly if somebody does a very long import and it fails at the end. I
>> don't think there should be any performance difference if the RI check
>> is called per 1000 or more rows.
>
> This is a good point, but I'm not sure if it's possible to implement
> using statement-level triggers.  I think the way transition tables work
> is that you get the full results at the end of the command; there's no
> way to pass control to the RI stuff at arbitrary points during the
> execution of the command.
>
> Is there any guidance on the SQL standard about this?  I don't think the
> timing indicators in the standard (IMMEDIATE, DEFERRED) have any say on
> this.  Or do they?

Yes, they do.  *ALL* AFTER triggers fire after the statement
completes; it's a question of whether a particular trigger fires once
for the whole statement or once for each row.  Observe:

test=# CREATE TABLE t1 (t1id int PRIMARY KEY, t1desc text);
CREATE TABLE
test=# CREATE TABLE t2 (t2id int PRIMARY KEY, t1id int NOT NULL, t2desc text,
test(#   FOREIGN KEY (t1id) REFERENCES t1);
CREATE TABLE
test=# CREATE FUNCTION t2_insert_func()
test-#   RETURNS TRIGGER
test-#   LANGUAGE plpgsql
test-# AS $$
test$# BEGIN
test$#   RAISE NOTICE '%', new;
test$#   RETURN new;
test$# END;
test$# $$;
CREATE FUNCTION
test=# CREATE TRIGGER t2_insert_trig
test-#   BEFORE INSERT ON t2
test-#   FOR EACH ROW
test-#   EXECUTE FUNCTION t2_insert_func();
CREATE TRIGGER
test=# INSERT INTO t1 VALUES (1), (2), (3);
INSERT 0 3
test=# INSERT INTO t2 VALUES (10, 1), (20, 2), (30, 3), (40, 4), (50, 5);
NOTICE:  (10,1,)
NOTICE:  (20,2,)
NOTICE:  (30,3,)
NOTICE:  (40,4,)
NOTICE:  (50,5,)
ERROR:  insert or update on table "t2" violates foreign key constraint
"t2_t1id_fkey"
DETAIL:  Key (t1id)=(4) is not present in table "t1".

All inserts occur before the statement fails, per standard.

Kevin Grittner
VMware vCenter Server
https://www.vmware.com/



Re: Referential Integrity Checks with Statement-level Triggers

2018-12-17 Thread Pavel Stehule
po 17. 12. 2018 v 19:19 odesílatel Adam Brusselback <
adambrusselb...@gmail.com> napsal:

> It's something I know I am interested in. For me, I don't really care if
> my statement doesn't cancel until the very end if there is a RI violation.
> The benefit of not having deletes be slow on tables which have others
> referencing it with a fkey which don't have their own index is huge IMO. I
> have a good number of those type of logging tables where an index is not
> useful 99% of the time, but every once in a while a bulk delete needs to
> happen.
>
> It is far from a premature optimization IMO, it is super useful and
> something I was hoping would happen ever since I heard about transition
> tables being worked on.
>

note: my sentence about premature optimization was related to my idea to
divide the RI check per 10K rows.

It would be great if the RI check were faster.


> Just my $0.02.
> -Adam
>


Re: Referential Integrity Checks with Statement-level Triggers

2018-12-17 Thread Adam Brusselback
It's something I know I am interested in. For me, I don't really care if my
statement doesn't cancel until the very end if there is a RI violation. The
benefit of not having deletes be slow on tables which have others
referencing it with a fkey which don't have their own index is huge IMO. I
have a good number of those type of logging tables where an index is not
useful 99% of the time, but every once in a while a bulk delete needs to
happen.

It is far from a premature optimization IMO, it is super useful and
something I was hoping would happen ever since I heard about transition
tables being worked on.

Just my $0.02.
-Adam


Re: Referential Integrity Checks with Statement-level Triggers

2018-12-17 Thread Pavel Stehule
po 17. 12. 2018 v 18:27 odesílatel Alvaro Herrera 
napsal:

> On 2018-Dec-17, Pavel Stehule wrote:
>
> > ROW triggers call the RI check too often, and statement triggers too
> > rarely. I think an ideal design could call the RI check every 10K rows. It
> > can be unfriendly if somebody does a very long import and it fails at the
> > end. I don't think there should be any performance difference if the RI
> > check is called per 1000 or more rows.
>
> This is a good point, but I'm not sure if it's possible to implement
> using statement-level triggers.  I think the way transition tables work
> is that you get the full results at the end of the command; there's no
> way to pass control to the RI stuff at arbitrary points during the
> execution of the command.
>

It was just an idea. When I think about it, I am not sure that an RI check
from a statement trigger is not worse when the statement-related changes are
too big. On the other hand, maybe it is premature optimization. We can check
performance on a prototype.



> Is there any guidance on the SQL standard about this?  I don't think the
> timing indicators in the standard (IMMEDIATE, DEFERRED) have any say on
> this.  Or do they?
>
> Maybe there is a solution for this.  I think it's worth thinking about,
> even if it's just to say that we won't do it.
>
> --
> Álvaro Herrera    https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: Referential Integrity Checks with Statement-level Triggers

2018-12-17 Thread Alvaro Herrera
On 2018-Dec-17, Pavel Stehule wrote:

> ROW triggers call the RI check too often, and statement triggers too rarely.
> I think an ideal design could call the RI check every 10K rows. It can be
> unfriendly if somebody does a very long import and it fails at the end. I
> don't think there should be any performance difference if the RI check is
> called per 1000 or more rows.

This is a good point, but I'm not sure if it's possible to implement
using statement-level triggers.  I think the way transition tables work
is that you get the full results at the end of the command; there's no
way to pass control to the RI stuff at arbitrary points during the
execution of the command.

Is there any guidance on the SQL standard about this?  I don't think the
timing indicators in the standard (IMMEDIATE, DEFERRED) have any say on
this.  Or do they?

Maybe there is a solution for this.  I think it's worth thinking about,
even if it's just to say that we won't do it.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Referential Integrity Checks with Statement-level Triggers

2018-12-17 Thread Pavel Stehule
po 17. 12. 2018 v 15:32 odesílatel Corey Huinker 
napsal:

> Back when Pg added statement-level triggers, I was interested in the
> potential promise of moving referential integrity checks to statement-level
> triggers.
>
> The initial conversation, along with Kevin Grittner's POC script (in SQL)
> that showed a potential for a 98% reduction in time spent doing RI checks.
> The original thread is here:
>
>
> https://www.postgresql.org/message-id/CACjxUsM4s9%3DCUmPU4YFOYiD5f%3D2ULVDBjuFSo20Twe7KbUe8Mw%40mail.gmail.com
>
> I dug around in the code, and was rather surprised at how close we already
> are to implementing this. The function RI_Initial_Check() already does a
> left-join query via SPI to look for any invalid data, so if we could just
> replace the near table with the transition table for inserted rows, we'd be
> home free. The function SPI_register_trigger_data() makes the transition
> tables visible to SPI, so I started to wonder why this hadn't been done
> already.
>
> I approached Kevin and Thomas Munro seeking feedback on my approach. I
> also made it into a session at the PgConf.ASIA un-conference, and then
> later with Michael Paquier at that same conference, and the coalesced
> feedback was this:
>
> - the overhead of registering the transition tables probably makes it
> unprofitable for single row inserts
> - the single row overhead is itself significant, so maybe the transition
> tables aren't worse
> - there has been talk of replacing transition tables with an in-memory
> data structure that would be closer to "free" from a startup perspective
> and might even coalesce the transition tables of multiple statements in the
> same transaction
> - because no declarative code changes are needed, it's trivial to switch
> from row level to statement level triggering via pg_upgrade
> - assuming that transition tables are an overhead that only pays off when
> > N rows have been updated, does it make sense to enforce RI with something
> that isn't actually a trigger?
> - there was also some mention that parallel query uses a queue mechanism
> that might be leveraged to do row-level triggers for updates of <= N rows
> and statement level for > N
>
> That's what I have so far. I'm going to be working on a POC patch so that
> I can benchmark a pure-statement-level solution, which if nothing else will
> let us know the approximate value of N.
>
> All suggestions are appreciated.
>

It is great. I thought a little bit about it, just theoretically.

ROW triggers call the RI check too often, and statement triggers too rarely. I
think an ideal design could call the RI check every 10K rows. It can be
unfriendly if somebody does a very long import and it fails at the end. I
don't think there should be any performance difference if the RI check is
called per 1000 or more rows.

Regards

Pavel


Referential Integrity Checks with Statement-level Triggers

2018-12-17 Thread Corey Huinker
Back when Pg added statement-level triggers, I was interested in the
potential promise of moving referential integrity checks to statement-level
triggers.

The initial conversation, along with Kevin Grittner's POC script (in SQL)
that showed a potential for a 98% reduction in time spent doing RI checks.
The original thread is here:

https://www.postgresql.org/message-id/CACjxUsM4s9%3DCUmPU4YFOYiD5f%3D2ULVDBjuFSo20Twe7KbUe8Mw%40mail.gmail.com

I dug around in the code, and was rather surprised at how close we already
are to implementing this. The function RI_Initial_Check() already does a
left-join query via SPI to look for any invalid data, so if we could just
replace the near table with the transition table for inserted rows, we'd be
home free. The function SPI_register_trigger_data() makes the transition
tables visible to SPI, so I started to wonder why this hadn't been done
already.
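For concreteness, the statement-level check for inserts on the referencing side might boil down to a single query shaped like RI_Initial_Check()'s left join, with the transition table standing in for the whole FK table (the names here are illustrative, not taken from any patch):

```sql
-- Inside an AFTER INSERT ... REFERENCING NEW TABLE AS new_rows
-- FOR EACH STATEMENT trigger: one set-oriented probe instead of one
-- per-row lookup against the PK table.
SELECT n.fk_col
  FROM new_rows n
  LEFT JOIN pk_table p ON p.pk_col = n.fk_col
 WHERE n.fk_col IS NOT NULL
   AND p.pk_col IS NULL
 LIMIT 1;  -- any row returned indicates an FK violation
```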

I approached Kevin and Thomas Munro seeking feedback on my approach. I also
made it into a session at the PgConf.ASIA un-conference, and then later
with Michael Paquier at that same conference, and the coalesced feedback
was this:

- the overhead of registering the transition tables probably makes it
unprofitable for single row inserts
- the single row overhead is itself significant, so maybe the transition
tables aren't worse
- there has been talk of replacing transition tables with an in-memory data
structure that would be closer to "free" from a startup perspective and
might even coalesce the transition tables of multiple statements in the
same transaction
- because no declarative code changes are needed, it's trivial to switch from
row level to statement level triggering via pg_upgrade
- assuming that transition tables are an overhead that only pays off when >
N rows have been updated, does it make sense to enforce RI with something
that isn't actually a trigger?
- there was also some mention that parallel query uses a queue mechanism
that might be leveraged to do row-level triggers for updates of <= N rows
and statement level for > N

That's what I have so far. I'm going to be working on a POC patch so that I
can benchmark a pure-statement-level solution, which if nothing else will
let us know the approximate value of N.

All suggestions are appreciated.
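For anyone who wants to reproduce the effect while the POC is being written, a minimal psql timing harness might look like this (sizes and names are arbitrary; the unindexed fkey column is what makes the per-row RI cost visible):

```sql
\timing on
CREATE TABLE parent (id int PRIMARY KEY);
CREATE TABLE child  (id int PRIMARY KEY,
                     parent_id int NOT NULL REFERENCES parent ON DELETE CASCADE);
INSERT INTO parent SELECT g FROM generate_series(1, 100000) g;
INSERT INTO child  SELECT g, g FROM generate_series(1, 100000) g;
-- The number to watch: per-row AFTER trigger overhead dominates this DELETE.
DELETE FROM parent;
```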