Re: Referential Integrity Checks with Statement-level Triggers
> > The people who expressed opinions on nuking triggers from orbit (it's
> > the only way to be sure) have yet to offer up any guidance on how to
> > proceed from here, and I suspect it's because they're all very busy
> > getting things ready for v12. I definitely have an interest in working
> > on this for 13, but I don't feel good about striking out on my own
> > without their input.
>
> Very interesting thread, but the current patch has been through two
> CFs without comments or new patches, so I'm going to mark it "Returned
> with feedback". I hope all this discussion will trigger more research
> in this space.

I've noticed that the zedstore effort ran into the same problem that
refactoring triggers has: we cannot determine which columns in a table
will be affected by a trigger, so we have to assume that all of them will
be. This causes a lot of unnecessary overhead with triggers.

If we had a compilation step for triggers (which ultimately means a
compilation step for procedures) that kept a dependency tree of which
tables/columns were touched, then we would have that insight. It's true
that one dynamic statement or SELECT * would force us right back to
keep-everything, but if procedures that avoided such things had
performance benefits, that would be an incentive to code them more
fastidiously.
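The dependency-tracking idea above can be sketched in miniature. This is a hypothetical illustration only — the statement representation, the `column_dependencies` helper, and the regex-based scan are all invented for the example; a real implementation would walk the backend's parsed query trees, not strings:

```python
import re

# Sentinel meaning "we could not prove anything; assume every column is used".
ALL_COLUMNS = "*"

def column_dependencies(statements):
    """Map table name -> set of referenced columns, or ALL_COLUMNS if the
    body contains dynamic SQL or SELECT *, which force the pessimistic
    keep-everything behavior described in the message above."""
    deps = {}
    for stmt in statements:
        # Dynamic SQL or SELECT *: give up and assume every column is used.
        if "EXECUTE" in stmt.upper() or re.search(r"SELECT\s+\*", stmt, re.I):
            return ALL_COLUMNS
        # Toy pattern for "SELECT col, col, ... FROM table" only.
        m = re.match(r"SELECT\s+(.+?)\s+FROM\s+(\w+)", stmt, re.I)
        if m:
            cols = {c.strip() for c in m.group(1).split(",")}
            deps.setdefault(m.group(2), set()).update(cols)
    return deps

# A fastidious procedure body yields a precise column set ...
print(sorted(column_dependencies(["SELECT a, b FROM t1"])["t1"]))  # ['a', 'b']
# ... while one dynamic statement forces keep-everything.
print(column_dependencies(["EXECUTE 'SELECT 1'"]))  # *
```

The incentive structure is visible even in the toy: only bodies that avoid dynamic SQL and `SELECT *` get a precise dependency set and could benefit from column-level optimizations.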
Re: Referential Integrity Checks with Statement-level Triggers
On Tue, Feb 26, 2019 at 5:41 AM Corey Huinker wrote:
>> In order to avoid per-row calls of the constraint trigger functions, we
>> could try to "aggregate" the constraint-specific events somehow, but I
>> think a separate queue would be needed for the constraint-specific
>> events.
>>
>> In general, the (after) triggers and constraints have too much in
>> common, so separation of these w/o seeing code changes is beyond my
>> imagination.
>
> Yeah, there's a lot of potential for overlap where a trigger could
> "borrow" an RI tuplestore or vice versa.
>
> The people who expressed opinions on nuking triggers from orbit (it's
> the only way to be sure) have yet to offer up any guidance on how to
> proceed from here, and I suspect it's because they're all very busy
> getting things ready for v12. I definitely have an interest in working
> on this for 13, but I don't feel good about striking out on my own
> without their input.

Very interesting thread, but the current patch has been through two CFs
without comments or new patches, so I'm going to mark it "Returned with
feedback". I hope all this discussion will trigger more research in this
space.

-- 
Thomas Munro
https://enterprisedb.com
Re: Referential Integrity Checks with Statement-level Triggers
> In order to avoid per-row calls of the constraint trigger functions, we
> could try to "aggregate" the constraint-specific events somehow, but I
> think a separate queue would be needed for the constraint-specific
> events.
>
> In general, the (after) triggers and constraints have too much in
> common, so separation of these w/o seeing code changes is beyond my
> imagination.

Yeah, there's a lot of potential for overlap where a trigger could
"borrow" an RI tuplestore or vice versa.

The people who expressed opinions on nuking triggers from orbit (it's the
only way to be sure) have yet to offer up any guidance on how to proceed
from here, and I suspect it's because they're all very busy getting things
ready for v12. I definitely have an interest in working on this for 13,
but I don't feel good about striking out on my own without their input.
Re: Referential Integrity Checks with Statement-level Triggers
Corey Huinker wrote:

> > Have you considered bulk processing of individual rows by a row-level
> > trigger? For IMMEDIATE constraints we'd have to ensure that the
> > trigger is notified that the current row is the last one from the
> > current query, but that might not be difficult.
>
> I'm not sure I understand what you're suggesting, but if it keeps the
> overhead of one trigger firing per row deleted, then it doesn't seem
> like much of a win.

I thought of a trigger that still fires for each row, but most of the
time it only stores the deleted row into a tuplestore of the appropriate
lifespan. The queries that you proposed would only be executed if the
tuplestore contains a given number of tuples or if the last row of the
current statement has been stored.

> Given that this patch has been punted to v13, I'd like to instead look
> at how we might go about building up the transition tuplestores for the
> specific purpose of doing the RI checks, not just deletes, and executing
> those at the appropriate time, rather than trying to make our needs fit
> into trigger form.

A constraint-specific tuplestore can make some things easier, but if a
table has both constraints and (non-constraint) triggers which use the
transition tables, then the tuples will have to be stored in both
tuplestores. On the other hand, using the same tuplestore for both
constraint and non-constraint triggers is difficult because deferred
constraint triggers need to see rows added by all statements, while the
non-constraint triggers should only see rows of the current statement.

In order to avoid per-row calls of the constraint trigger functions, we
could try to "aggregate" the constraint-specific events somehow, but I
think a separate queue would be needed for the constraint-specific
events.

In general, the (after) triggers and constraints have too much in common,
so separation of these w/o seeing code changes is beyond my imagination.

-- 
Antonin Houska
https://www.cybertec-postgresql.com
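The batch-flush behavior described above — a per-row trigger that usually just appends to a tuplestore, running the set-oriented check only when a threshold is reached or the statement's last row arrives — can be modeled in miniature. Everything here (the class name, the set standing in for the referencing table) is invented for illustration; this is not trigger.c code:

```python
class BatchedRICheck:
    """Toy model: buffer deleted PK values; run one set-oriented check per
    full buffer or at end of statement, instead of one check per row."""

    def __init__(self, referencing_keys, batch_size=3):
        self.referencing_keys = referencing_keys  # stand-in for the FK table
        self.batch_size = batch_size              # would be ~1000+ in practice
        self.buffer = []                          # stand-in for the tuplestore
        self.checks_run = 0

    def _flush(self):
        if not self.buffer:
            return
        self.checks_run += 1  # one bulk query instead of len(buffer) queries
        still_referenced = self.referencing_keys & set(self.buffer)
        self.buffer.clear()
        if still_referenced:
            raise ValueError(f"FK violation on keys {sorted(still_referenced)}")

    def on_row_deleted(self, pk_value, is_last_row):
        """The per-row 'trigger': most of the time it only stores the row."""
        self.buffer.append(pk_value)
        if len(self.buffer) >= self.batch_size or is_last_row:
            self._flush()

# Deleting 5 unreferenced rows runs 2 bulk checks rather than 5 per-row ones.
check = BatchedRICheck(referencing_keys={99})
for i, key in enumerate([1, 2, 3, 4, 5]):
    check.on_row_deleted(key, is_last_row=(i == 4))
print(check.checks_run)  # 2
```

The `is_last_row` flag is exactly the "trigger is notified that the current row is the last one from the current query" notification discussed for IMMEDIATE constraints; deferred constraints would instead flush at transaction end.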
Re: Referential Integrity Checks with Statement-level Triggers
> While the idea to use the transition table is good, this approach
> probably requires the trigger engine (trigger.c) to be adjusted, and
> that in a non-trivial way.

It probably does. Several people with advanced knowledge of trigger.c
expressed a desire to rebuild trigger.c from the ground up, and with it
create case-specific tuplestores for handling referential integrity
constraints, which would be lighter than either the transition tables or
the per-row invocation of a trigger. After all, we need an RI check to
happen; we don't need it to happen *through a trigger function*.

> I'm also not sure if it's o.k. that a performance-related patch
> potentially makes performance worse in some cases. If FK violations are
> checked at statement boundary, the wasted effort / time can (at least in
> theory) be high if early rows violate the FK.

That concern was also expressed with varying levels of alarm in their
voices.

> Have you considered bulk processing of individual rows by a row-level
> trigger? For IMMEDIATE constraints we'd have to ensure that the trigger
> is notified that the current row is the last one from the current query,
> but that might not be difficult.

I'm not sure I understand what you're suggesting, but if it keeps the
overhead of one trigger firing per row deleted, then it doesn't seem like
much of a win.

Given that this patch has been punted to v13, I'd like to instead look at
how we might go about building up the transition tuplestores for the
specific purpose of doing the RI checks, not just deletes, and executing
those at the appropriate time, rather than trying to make our needs fit
into trigger form.
Re: Referential Integrity Checks with Statement-level Triggers
Corey Huinker wrote:

> Attached is a patch that refactors DELETE triggers to fire at the
> statement level.
>
> I chose delete triggers partly out of simplicity, and partly because
> there is some before/after row linkage in the ON UPDATE CASCADE cases
> where statement-level triggers might not be feasible as we have
> currently implemented them.

I tried to review this patch, also with the intention to learn more about
AFTER trigger internals. As for the code, I understood your idea and it
really seems like low-hanging fruit. However, in trigger.c I noticed a
comment that the transition table is not supported for deferred
constraints. Of course I tried to use this information to test your patch:

CREATE TABLE t_pk(i int PRIMARY KEY);
CREATE TABLE t_fk(i int REFERENCES t_pk ON DELETE NO ACTION DEFERRABLE);
INSERT INTO t_pk(i) VALUES (1);
INSERT INTO t_fk(i) VALUES (1);
BEGIN;
SET CONSTRAINTS t_fk_i_fkey DEFERRED;
DELETE FROM t_pk;
COMMIT;

server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

The missing transition table really appears to be the problem:

#0  0x00a8c2a8 in tuplestore_tuple_count (state=0x7f7f7f7f7f7f7f7f) at tuplestore.c:548
#1  0x009b0e58 in ri_on_delete_restrict (trigdata=0x7ffe800c2890, is_no_action=true) at ri_triggers.c:720
#2  0x009b0d3f in RI_FKey_noaction_del (fcinfo=0x7ffe800c27a0) at ri_triggers.c:607
#3  0x006b8768 in ExecCallTriggerFunc (trigdata=0x7ffe800c2890, tgindx=0, finfo=0x1f08c10, instr=0x0, per_tuple_context=0x1f30690) at trigger.c:2412
#4  0x006bbb86 in AfterTriggerExecute (event=0x1f0ab10, rel=0x7fc71306ea70, trigdesc=0x1f089f8, finfo=0x1f08c10, instr=0x0, per_tuple_context=0x1f30690, trig_tuple_slot1=0x0, trig_tuple_slot2=0x0) at trigger.c:4367
#5  0x006bbf9e in afterTriggerInvokeEvents (events=0xf97950, firing_id=1, estate=0x1f086c8, delete_ok=true) at trigger.c:4560
#6  0x006bc844 in AfterTriggerFireDeferred () at trigger.c:4996
#7  0x0055c989 in CommitTransaction () at xact.c:1987
#8  0x0055d6a4 in CommitTransactionCommand () at xact.c:2832

While the idea to use the transition table is good, this approach
probably requires the trigger engine (trigger.c) to be adjusted, and that
in a non-trivial way.

I'm also not sure if it's o.k. that a performance-related patch
potentially makes performance worse in some cases. If FK violations are
checked at statement boundary, the wasted effort / time can (at least in
theory) be high if early rows violate the FK.

Have you considered bulk processing of individual rows by a row-level
trigger? For IMMEDIATE constraints we'd have to ensure that the trigger
is notified that the current row is the last one from the current query,
but that might not be difficult.

-- 
Antonin Houska
https://www.cybertec-postgresql.com
Re: Referential Integrity Checks with Statement-level Triggers
Attached is a patch that refactors DELETE triggers to fire at the
statement level.

I chose delete triggers partly out of simplicity, and partly because
there is some before/after row linkage in the ON UPDATE CASCADE cases
where statement-level triggers might not be feasible as we have currently
implemented them. After having done the work, I think INSERT triggers
would be similarly straightforward, but I wanted to limit scope. Also,
after having stripped the delete cases out of the update-or-delete
functions, it became obvious that the on-update-set-null and
on-update-set-default cases differed by only 3-4 lines, so those
functions were combined.

On a vagrant VM running on my desktop machine, I'm seeing a speed-up of
about 25% in the benchmark provided. I think that figure is cloudy and
below my expectations. Perhaps we'd get a much better picture of whether
or not this is worth it on a bare-metal machine, or at least a VM better
suited to benchmarking.

Currently 4 make-check tests are failing. Two of them appear to be false
positives (the tests make assumptions about triggers that are no longer
true), and the other two are outside the scope of this benchmark, so I'll
revisit them if we go forward.

ri-set-logic.sql is an edited benchmark script adapted from Kevin
Grittner's benchmark that he ran against hand-rolled triggers and posted
on 2016-11-02. ri_test.out is a copy-paste of two runs of the benchmark
script.

Many thanks to everyone who helped, often despite their own objections to
the overall reasoning behind the endeavor. I'm aware that a large
contingent of highly experienced people would very much like to replace
our entire trigger architecture, or at least divorce RI checks from
triggers. Maybe this patch spurs on that change. Even if nothing comes of
it, it's been a great learning experience.
On Sat, Dec 22, 2018 at 11:28 AM Emre Hasegeli wrote:
>> It is far from a premature optimization IMO, it is super useful and
>> something I was hoping would happen ever since I heard about transition
>> tables being worked on.
>
> Me too. Never-ending DELETEs are a common pain point especially for
> people migrated from MySQL which creates indexes for foreign keys
> automatically.

From 8a73f9233211076421a565b5c90ecd029b5e6581 Mon Sep 17 00:00:00 2001
From: vagrant
Date: Wed, 23 Jan 2019 16:59:17 +
Subject: [PATCH] Change Delete RI triggers to Statement-Level Triggers

---
 src/backend/commands/tablecmds.c    |   9 +-
 src/backend/commands/trigger.c      |   2 +
 src/backend/utils/adt/ri_triggers.c | 779
 3 files changed, 566 insertions(+), 224 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 28a137bb53..21f5bf94a4 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8954,6 +8954,11 @@ createForeignKeyActionTriggers(Relation rel, Oid refRelOid, Constraint *fkconstr
 {
 	CreateTrigStmt *fk_trigger;
 
+	TriggerTransition *del = makeNode(TriggerTransition);
+	del->name = "pg_deleted_transition_table";
+	del->isNew = false;
+	del->isTable = true;
+
 	/*
 	 * Build and execute a CREATE CONSTRAINT TRIGGER statement for the ON
 	 * DELETE action on the referenced table.
@@ -8961,11 +8966,11 @@ createForeignKeyActionTriggers(Relation rel, Oid refRelOid, Constraint *fkconstr
 	fk_trigger = makeNode(CreateTrigStmt);
 	fk_trigger->trigname = "RI_ConstraintTrigger_a";
 	fk_trigger->relation = NULL;
-	fk_trigger->row = true;
+	fk_trigger->row = false;
 	fk_trigger->timing = TRIGGER_TYPE_AFTER;
 	fk_trigger->events = TRIGGER_TYPE_DELETE;
 	fk_trigger->columns = NIL;
-	fk_trigger->transitionRels = NIL;
+	fk_trigger->transitionRels = list_make1(del);
 	fk_trigger->whenClause = NULL;
 	fk_trigger->isconstraint = true;
 	fk_trigger->constrrel = NULL;
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 7ffaeaffc6..080587215f 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -510,7 +510,9 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
 	 *
 	 * Currently this is enforced by the grammar, so just Assert here.
 	 */
+	/*
 	Assert(!stmt->isconstraint);
+	*/
 
 	if (tt->isNew)
 	{
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index e1aa3d0044..6f89ab4c77 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -194,9 +194,10 @@ static int ri_constraint_cache_valid_count = 0;
 static bool ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
 			   HeapTuple old_row,
 			   const RI_ConstraintInfo *riinfo);
-static Datum ri_restrict(TriggerData *trigdata, bool is_no_action);
-static Datum ri_setnull(TriggerData *trigdata);
-static Datum ri_setdefault(TriggerData *trigdata);
+static Datum ri_on_update_restrict(TriggerData *trigdata, bool is_no_action);
+static Datum ri_on_delete_restrict(TriggerData *trigdata, bool is_no_action);
+static
Re: Referential Integrity Checks with Statement-level Triggers
> It is far from a premature optimization IMO, it is super useful and
> something I was hoping would happen ever since I heard about transition
> tables being worked on.

Me too. Never-ending DELETEs are a common pain point especially for
people migrated from MySQL, which creates indexes for foreign keys
automatically.
Re: Referential Integrity Checks with Statement-level Triggers
On Mon, Dec 17, 2018 at 8:32 AM Corey Huinker wrote:
> All suggestions are appreciated.

As I recall, the main issue is how to handle concurrency. The existing RI
triggers do some very special handling of the multi-xact ID to manage
concurrent modifications. Has anyone looked at the issues with using
those techniques with set-oriented statements for the transition table
approach?

-- 
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: Referential Integrity Checks with Statement-level Triggers
On Mon, Dec 17, 2018 at 11:27 AM Alvaro Herrera wrote:
> On 2018-Dec-17, Pavel Stehule wrote:
>
>> ROW trigger call RI check too often, and statement trigger too less. I
>> think so ideal design can be call RI check every 10K rows. I think so
>> can be unfriendly if somebody does very long import and it fails on the
>> end. I don't think so there should not be any performance difference,
>> if RI check is called per 1000 or more rows.
>
> This is a good point, but I'm not sure if it's possible to implement
> using statement-level triggers. I think the way transition tables work
> is that you get the full results at the end of the command; there's no
> way to pass control to the RI stuff at arbitrary points during the
> execution of the command.
>
> Is there any guidance on the SQL standard about this? I don't think the
> timing indicators in the standard (IMMEDIATE, DEFERRED) have any say on
> this. Or do they?

Yes, they do. *ALL* AFTER triggers fire after the statement completes; it
is a question of whether a particular trigger fires once for the whole
statement or once for each row.

Observe:

test=# CREATE TABLE t1 (t1id int PRIMARY KEY, t1desc text);
CREATE TABLE
test=# CREATE TABLE t2 (t2id int PRIMARY KEY, t1id int NOT NULL, t2desc text,
test(#   FOREIGN KEY (t1id) REFERENCES t1);
CREATE TABLE
test=# CREATE FUNCTION t2_insert_func()
test-#   RETURNS TRIGGER
test-#   LANGUAGE plpgsql
test-# AS $$
test$# BEGIN
test$#   RAISE NOTICE '%', new;
test$#   RETURN new;
test$# END;
test$# $$;
CREATE FUNCTION
test=# CREATE TRIGGER t2_insert_trig
test-#   BEFORE INSERT ON t2
test-#   FOR EACH ROW
test-#   EXECUTE FUNCTION t2_insert_func();
CREATE TRIGGER
test=# INSERT INTO t1 VALUES (1), (2), (3);
INSERT 0 3
test=# INSERT INTO t2 VALUES (10, 1), (20, 2), (30, 3), (40, 4), (50, 5);
NOTICE:  (10,1,)
NOTICE:  (20,2,)
NOTICE:  (30,3,)
NOTICE:  (40,4,)
NOTICE:  (50,5,)
ERROR:  insert or update on table "t2" violates foreign key constraint "t2_t1id_fkey"
DETAIL:  Key (t1id)=(4) is not present in table "t1".

All inserts occur before the statement fails, per standard.

-- 
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: Referential Integrity Checks with Statement-level Triggers
On Mon 17. 12. 2018 at 19:19, Adam Brusselback <adambrusselb...@gmail.com> wrote:

> It's something I know I am interested in. For me, I don't really care
> if my statement doesn't cancel until the very end if there is an RI
> violation. The benefit of not having deletes be slow on tables which
> have others referencing them with a fkey that don't have their own
> index is huge IMO. I have a good number of those types of logging
> tables where an index is not useful 99% of the time, but every once in
> a while a bulk delete needs to happen.
>
> It is far from a premature optimization IMO, it is super useful and
> something I was hoping would happen ever since I heard about transition
> tables being worked on.

note: my sentence about premature optimization was related to my idea to
divide the RI check per 10K rows.

It would be great if the RI check will be faster.

> Just my $0.02.
> -Adam
Re: Referential Integrity Checks with Statement-level Triggers
It's something I know I am interested in. For me, I don't really care if
my statement doesn't cancel until the very end if there is an RI
violation. The benefit of not having deletes be slow on tables which have
others referencing them with a fkey that don't have their own index is
huge IMO. I have a good number of those types of logging tables where an
index is not useful 99% of the time, but every once in a while a bulk
delete needs to happen.

It is far from a premature optimization IMO, it is super useful and
something I was hoping would happen ever since I heard about transition
tables being worked on.

Just my $0.02.
-Adam
Re: Referential Integrity Checks with Statement-level Triggers
On Mon 17. 12. 2018 at 18:27, Alvaro Herrera wrote:

> On 2018-Dec-17, Pavel Stehule wrote:
>
> > ROW trigger call RI check too often, and statement trigger too less.
> > I think so ideal design can be call RI check every 10K rows. I think
> > so can be unfriendly if somebody does very long import and it fails
> > on the end. I don't think so there should not be any performance
> > difference, if RI check is called per 1000 or more rows.
>
> This is a good point, but I'm not sure if it's possible to implement
> using statement-level triggers. I think the way transition tables work
> is that you get the full results at the end of the command; there's no
> way to pass control to the RI stuff at arbitrary points during the
> execution of the command.

It was just an idea. When I think about it, I am not sure if the RI check
from a statement trigger is not worse when the statement's changes are
too big. On the other hand, maybe it is premature optimization. We can
check performance on a prototype.

> Is there any guidance on the SQL standard about this? I don't think the
> timing indicators in the standard (IMMEDIATE, DEFERRED) have any say on
> this. Or do they?
>
> Maybe there is a solution for this. I think it's worth thinking about,
> even if it's just to say that we won't do it.
>
> -- 
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Referential Integrity Checks with Statement-level Triggers
On 2018-Dec-17, Pavel Stehule wrote:

> ROW trigger call RI check too often, and statement trigger too less. I
> think so ideal design can be call RI check every 10K rows. I think so
> can be unfriendly if somebody does very long import and it fails on the
> end. I don't think so there should not be any performance difference,
> if RI check is called per 1000 or more rows.

This is a good point, but I'm not sure if it's possible to implement
using statement-level triggers. I think the way transition tables work
is that you get the full results at the end of the command; there's no
way to pass control to the RI stuff at arbitrary points during the
execution of the command.

Is there any guidance on the SQL standard about this? I don't think the
timing indicators in the standard (IMMEDIATE, DEFERRED) have any say on
this. Or do they?

Maybe there is a solution for this. I think it's worth thinking about,
even if it's just to say that we won't do it.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Referential Integrity Checks with Statement-level Triggers
On Mon 17. 12. 2018 at 15:32, Corey Huinker wrote:

> Back when Pg added statement-level triggers, I was interested in the
> potential promise of moving referential integrity checks to
> statement-level triggers.
>
> The initial conversation, along with Kevin Grittner's POC script (in
> SQL), showed a potential for a 98% reduction in time spent doing RI
> checks. The original thread is here:
>
> https://www.postgresql.org/message-id/CACjxUsM4s9%3DCUmPU4YFOYiD5f%3D2ULVDBjuFSo20Twe7KbUe8Mw%40mail.gmail.com
>
> I dug around in the code, and was rather surprised at how close we
> already are to implementing this. The function RI_Initial_Check()
> already does a left-join query via SPI to look for any invalid data, so
> if we could just replace the near table with the transition table for
> inserted rows, we'd be home free. The function
> SPI_register_trigger_data() makes the transition tables visible to SPI,
> so I started to wonder why this hadn't been done already.
>
> I approached Kevin and Thomas Munro seeking feedback on my approach. I
> also made it into a session at the PgConf.ASIA un-conference, and then
> later discussed it with Michael Paquier at that same conference, and
> the coalesced feedback was this:
>
> - the overhead of registering the transition tables probably makes it
>   unprofitable for single-row inserts
> - the single-row overhead is itself significant, so maybe the
>   transition tables aren't worse
> - there has been talk of replacing transition tables with an in-memory
>   data structure that would be closer to "free" from a startup
>   perspective and might even coalesce the transition tables of multiple
>   statements in the same transaction
> - because no declarative code changes, it's trivial to switch from row
>   level to statement level triggering via pg_upgrade
> - assuming that transition tables are an overhead that only pays off
>   when > N rows have been updated, does it make sense to enforce RI
>   with something that isn't actually a trigger?
> - there was also some mention that parallel query uses a queue
>   mechanism that might be leveraged to do row-level triggers for
>   updates of <= N rows and statement level for > N
>
> That's what I have so far. I'm going to be working on a POC patch so
> that I can benchmark a pure-statement-level solution, which if nothing
> else will let us know the approximate value of N.
>
> All suggestions are appreciated.

It is great. I thought a little bit about it - just theoretically.

ROW triggers call the RI check too often, and statement triggers too
rarely. I think an ideal design could call the RI check every 10K rows. I
think it can be unfriendly if somebody does a very long import and it
only fails at the end. I don't think there should be any performance
difference if the RI check is called per 1000 or more rows.

Regards

Pavel
Referential Integrity Checks with Statement-level Triggers
Back when Pg added statement-level triggers, I was interested in the
potential promise of moving referential integrity checks to
statement-level triggers.

The initial conversation, along with Kevin Grittner's POC script (in
SQL), showed a potential for a 98% reduction in time spent doing RI
checks. The original thread is here:

https://www.postgresql.org/message-id/CACjxUsM4s9%3DCUmPU4YFOYiD5f%3D2ULVDBjuFSo20Twe7KbUe8Mw%40mail.gmail.com

I dug around in the code, and was rather surprised at how close we
already are to implementing this. The function RI_Initial_Check() already
does a left-join query via SPI to look for any invalid data, so if we
could just replace the near table with the transition table for inserted
rows, we'd be home free. The function SPI_register_trigger_data() makes
the transition tables visible to SPI, so I started to wonder why this
hadn't been done already.

I approached Kevin and Thomas Munro seeking feedback on my approach. I
also made it into a session at the PgConf.ASIA un-conference, and then
later discussed it with Michael Paquier at that same conference, and the
coalesced feedback was this:

- the overhead of registering the transition tables probably makes it
  unprofitable for single-row inserts
- the single-row overhead is itself significant, so maybe the transition
  tables aren't worse
- there has been talk of replacing transition tables with an in-memory
  data structure that would be closer to "free" from a startup
  perspective and might even coalesce the transition tables of multiple
  statements in the same transaction
- because no declarative code changes, it's trivial to switch from row
  level to statement level triggering via pg_upgrade
- assuming that transition tables are an overhead that only pays off when
  > N rows have been updated, does it make sense to enforce RI with
  something that isn't actually a trigger?
- there was also some mention that parallel query uses a queue mechanism
  that might be leveraged to do row-level triggers for updates of <= N
  rows and statement level for > N

That's what I have so far. I'm going to be working on a POC patch so that
I can benchmark a pure-statement-level solution, which if nothing else
will let us know the approximate value of N.

All suggestions are appreciated.
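The core of the left-join idea — validating the whole set of affected rows in one query rather than probing once per row — can be shown in miniature. The tables here are plain Python collections and the helper name is invented; the real check would be SQL issued via SPI, as in RI_Initial_Check(), but pointed at the transition table of inserted rows:

```python
def find_fk_violations(inserted_fk_values, pk_keys):
    """Return inserted FK values that have no matching PK row.

    Morally a LEFT JOIN from the 'inserted rows' transition table to the
    referenced table, keeping rows where the PK side IS NULL."""
    pk_set = set(pk_keys)  # hash the referenced keys once for the whole set
    return sorted({fk for fk in inserted_fk_values if fk not in pk_set})

# One set-oriented pass over five new rows instead of five per-row probes:
print(find_fk_violations([10, 1, 20, 2, 30], pk_keys=[1, 2, 3]))  # [10, 20, 30]
```

The per-statement payoff comes from amortizing setup over the whole row set, which is also why the feedback above suspects it loses for single-row inserts: the fixed cost of building the "transition table" dominates when there is only one row to check.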