Re: [HACKERS] Declarative partitioning - another take
Hi Maksim,

On 2017/04/07 19:52, Maksim Milyutin wrote:
> On 07.04.2017 13:05, Etsuro Fujita wrote:
>> On 2016/12/09 19:46, Maksim Milyutin wrote:
>>> I would like to work on two tasks:
>>> - insert (and eventually update) tuple routing for foreign partition.
>>> - the ability to create an index on the parent and have all of the
>>>   children inherit it;
>> There seems to be no work on the first one, so I'd like to work on that.
> Yes, you can start to work on this, I'll join later as a reviewer.

Great!  I added the patch to the next commitfest:
https://commitfest.postgresql.org/14/1184/

Sorry for the delay.

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On Wed, May 10, 2017 at 6:50 AM, Etsuro Fujita wrote:
> I started working on this.  I agree that the changes made in
> make_modifytable would be unacceptable, but I'd vote for Amit's idea of
> passing a modified PlannerInfo to PlanForeignModify so that the FDW can do
> query planning for INSERT into a foreign partition in the same way as for
> INSERT into a non-partition foreign table.  (Though, I think we should
> generate a more-valid-looking working-copy of the PlannerInfo which has
> Query with the foreign partition as target.)  I'm not sure it's a good idea
> to add a new FDW API or modify the existing one such as PlanForeignModify
> for this purpose.

Thanks for working on this, but I think it would be better to start a new
thread for this discussion, and probably also for any other issues that
come up.  This thread has gotten so long that, between the original
discussion, the commit of the patch, and the discussion of multiple
follow-on commits and patches, it's very hard to follow.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 2016/11/18 1:43, Robert Haas wrote:
> On Thu, Nov 17, 2016 at 6:27 AM, Amit Langote wrote:
>>> - The code in make_modifytable() to swap out the rte_array for a fake
>>>   one looks like an unacceptable kludge.  I don't know offhand what a
>>>   better design would look like, but what you've got is really ugly.
>> Agree that it looks horrible.  The problem is we don't add partition
>> (child table) RTEs when planning an insert on the parent, and FDW
>> partitions can't do without some planner handling - PlanForeignModify()
>> expects a valid PlannerInfo for deparsing target lists (basically, to be
>> able to use planner_rt_fetch()).
> If it's only needed for foreign tables, how about for v1 we just throw an
> error and say errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("cannot route
> inserted tuples to a foreign table") for now.  We can come back and fix it
> later.  Doing more inheritance expansion
> Coming up with some new FDW API or some modification to the existing one
> is probably better, but I don't really want to get hung up on that right
> now.

I started working on this.  I agree that the changes made in
make_modifytable would be unacceptable, but I'd vote for Amit's idea of
passing a modified PlannerInfo to PlanForeignModify so that the FDW can do
query planning for INSERT into a foreign partition in the same way as for
INSERT into a non-partition foreign table.  (Though, I think we should
generate a more-valid-looking working-copy of the PlannerInfo which has
Query with the foreign partition as target.)  I'm not sure it's a good idea
to add a new FDW API or modify the existing one such as PlanForeignModify
for this purpose.

Best regards,
Etsuro Fujita
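[Editor's note: the v1 restriction discussed above — erroring out instead of routing inserted tuples to a foreign partition — can be reproduced with a session sketch like the following.  This is illustrative only: the foreign server name `remote` and the table names are invented, and it assumes the postgres_fdw extension and server are already set up.]

```sql
-- Assumes: CREATE EXTENSION postgres_fdw; and a foreign server "remote".
CREATE TABLE parent (a int) PARTITION BY LIST (a);

-- A foreign table is allowed to be a partition...
CREATE FOREIGN TABLE part_remote PARTITION OF parent
    FOR VALUES IN (1) SERVER remote;

-- ...but in v10, INSERT through the parent cannot route tuples to it:
INSERT INTO parent VALUES (1);
-- ERROR:  cannot route inserted tuples to a foreign table
```

Inserting directly into the foreign table (`INSERT INTO part_remote ...`) still works, since that path goes through the ordinary PlanForeignModify machinery; only tuple routing from the parent is rejected.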
Re: [HACKERS] Declarative partitioning - another take
On 2017/05/10 12:59, Robert Haas wrote:
> On Tue, May 9, 2017 at 11:54 PM, Thomas Munro wrote:
>> +A statement that targets a parent table in a inheritance or partitioning
>>
>> A tiny typo: s/a inheritance/an inheritance/
>
> Now he tells me.

Thanks both.

Regards,
Amit
Re: [HACKERS] Declarative partitioning - another take
On Tue, May 9, 2017 at 11:54 PM, Thomas Munro wrote:
> +A statement that targets a parent table in a inheritance or partitioning
>
> A tiny typo: s/a inheritance/an inheritance/

Now he tells me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On Wed, May 10, 2017 at 3:51 PM, Robert Haas wrote:
> On Sun, May 7, 2017 at 9:44 PM, Amit Langote wrote:
>> I think that makes sense.  Modified it to read: "A statement that targets
>> a parent table in a inheritance or partitioning hierarchy..." in the
>> attached updated patch.
>
> LGTM.  Committed.

+A statement that targets a parent table in a inheritance or partitioning

A tiny typo: s/a inheritance/an inheritance/

--
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] Declarative partitioning - another take
On Sun, May 7, 2017 at 9:44 PM, Amit Langote wrote:
> I think that makes sense.  Modified it to read: "A statement that targets
> a parent table in a inheritance or partitioning hierarchy..." in the
> attached updated patch.

LGTM.  Committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 2017/05/08 10:22, Thomas Munro wrote:
> On Mon, May 8, 2017 at 12:47 PM, Amit Langote wrote:
>> On 2017/05/03 2:48, Robert Haas wrote:
>>> On Tue, May 2, 2017 at 3:30 AM, Amit Langote wrote:
>>>> You're right.  I agree that whatever text we add here should be
>>>> pointing out that statement-level triggers of affected child tables are
>>>> not fired, when root parent is specified in the command.  Since there
>>>> was at least some talk of changing that behavior for regular
>>>> inheritance so that statement triggers of any affected children are
>>>> fired [1], I thought we shouldn't say something general that applies to
>>>> both inheritance and partitioning.  But since nothing has happened in
>>>> that regard, we might as well.  How about the attached?
>>>
>>> Looks better, but I think we should say "statement" instead of
>>> "operation" for consistency with the previous paragraph, and it
>>> certainly shouldn't be capitalized.
>>
>> Agreed, done.  Attached updated patch.
>
> +A statement that targets the root table in a inheritance or partitioning
> +hierarchy does not cause the statement-level triggers of affected child
> +tables to be fired; only the root table's statement-level triggers are
> +fired.  However, row-level triggers of any affected child tables will be
> +fired.
> +
> +
>
> Why talk specifically about the "root" table?  Wouldn't we describe
> the situation more generally if we said [a,the] "parent"?

I think that makes sense.  Modified it to read: "A statement that targets
a parent table in a inheritance or partitioning hierarchy..." in the
attached updated patch.

Thanks,
Amit

From 5c2c453235d5eedf857a5e7123337aae5aedd761 Mon Sep 17 00:00:00 2001
From: amit
Date: Mon, 24 Apr 2017 14:55:08 +0900
Subject: [PATCH] Clarify statement trigger behavior with inheritance

---
 doc/src/sgml/trigger.sgml | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/doc/src/sgml/trigger.sgml b/doc/src/sgml/trigger.sgml
index 6f8416dda7..ce76a1f042 100644
--- a/doc/src/sgml/trigger.sgml
+++ b/doc/src/sgml/trigger.sgml
@@ -123,6 +123,14 @@
+A statement that targets a parent table in a inheritance or partitioning
+hierarchy does not cause the statement-level triggers of affected child
+tables to be fired; only the parent table's statement-level triggers are
+fired.  However, row-level triggers of any affected child tables will be
+fired.
+
+
+
 If an INSERT contains an ON CONFLICT DO UPDATE clause, it is possible
 that the effects of all row-level BEFORE INSERT triggers
--
2.11.0
Re: [HACKERS] Declarative partitioning - another take
On Mon, May 8, 2017 at 12:47 PM, Amit Langote wrote:
> On 2017/05/03 2:48, Robert Haas wrote:
>> On Tue, May 2, 2017 at 3:30 AM, Amit Langote wrote:
>>> You're right.  I agree that whatever text we add here should be pointing
>>> out that statement-level triggers of affected child tables are not fired,
>>> when root parent is specified in the command.
>>>
>>> Since there was at least some talk of changing that behavior for regular
>>> inheritance so that statement triggers of any affected children are fired
>>> [1], I thought we shouldn't say something general that applies to both
>>> inheritance and partitioning.  But since nothing has happened in that
>>> regard, we might as well.
>>>
>>> How about the attached?
>>
>> Looks better, but I think we should say "statement" instead of
>> "operation" for consistency with the previous paragraph, and it
>> certainly shouldn't be capitalized.
>
> Agreed, done.  Attached updated patch.

+A statement that targets the root table in a inheritance or partitioning
+hierarchy does not cause the statement-level triggers of affected child
+tables to be fired; only the root table's statement-level triggers are
+fired.  However, row-level triggers of any affected child tables will be
+fired.
+
+
+

Why talk specifically about the "root" table?  Wouldn't we describe
the situation more generally if we said [a,the] "parent"?

--
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] Declarative partitioning - another take
On 2017/05/03 2:48, Robert Haas wrote:
> On Tue, May 2, 2017 at 3:30 AM, Amit Langote wrote:
>> You're right.  I agree that whatever text we add here should be pointing
>> out that statement-level triggers of affected child tables are not fired,
>> when root parent is specified in the command.
>>
>> Since there was at least some talk of changing that behavior for regular
>> inheritance so that statement triggers of any affected children are fired
>> [1], I thought we shouldn't say something general that applies to both
>> inheritance and partitioning.  But since nothing has happened in that
>> regard, we might as well.
>>
>> How about the attached?
>
> Looks better, but I think we should say "statement" instead of
> "operation" for consistency with the previous paragraph, and it
> certainly shouldn't be capitalized.

Agreed, done.  Attached updated patch.

Thanks,
Amit

From 1d7e383c6d89ebabacc7aa3f7d9987779daaa4fb Mon Sep 17 00:00:00 2001
From: amit
Date: Mon, 24 Apr 2017 14:55:08 +0900
Subject: [PATCH] Clarify statement trigger behavior with inheritance

---
 doc/src/sgml/trigger.sgml | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/doc/src/sgml/trigger.sgml b/doc/src/sgml/trigger.sgml
index 6f8416dda7..89e8c01a71 100644
--- a/doc/src/sgml/trigger.sgml
+++ b/doc/src/sgml/trigger.sgml
@@ -123,6 +123,14 @@
+A statement that targets the root table in a inheritance or partitioning
+hierarchy does not cause the statement-level triggers of affected child
+tables to be fired; only the root table's statement-level triggers are
+fired.  However, row-level triggers of any affected child tables will be
+fired.
+
+
+
 If an INSERT contains an ON CONFLICT DO UPDATE clause, it is possible
 that the effects of all row-level BEFORE INSERT triggers
--
2.11.0
Re: [HACKERS] Declarative partitioning - another take
On Tue, May 2, 2017 at 3:30 AM, Amit Langote wrote:
> You're right.  I agree that whatever text we add here should be pointing
> out that statement-level triggers of affected child tables are not fired,
> when root parent is specified in the command.
>
> Since there was at least some talk of changing that behavior for regular
> inheritance so that statement triggers of any affected children are fired
> [1], I thought we shouldn't say something general that applies to both
> inheritance and partitioning.  But since nothing has happened in that
> regard, we might as well.
>
> How about the attached?

Looks better, but I think we should say "statement" instead of
"operation" for consistency with the previous paragraph, and it
certainly shouldn't be capitalized.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 2017/05/01 21:30, Robert Haas wrote:
> On Mon, May 1, 2017 at 12:18 AM, Amit Langote wrote:
>> Attached updated patch.
>
> Committed, except for this bit:

Thanks.

> +A statement-level trigger defined on partitioned tables is fired only
> +once for the table itself, not once for every table in the partitioning
> +hierarchy.  However, row-level triggers of any affected leaf partitions
> +will be fired.
>
> The first sentence there has a number of issues.  Grammatically, there
> is an agreement problem: trigger is singular, but partitioned table is
> plural, and one trigger isn't defined across multiple tables.  It
> would have to say something like "A statement-level trigger defined on
> a partitioned table".  But even with that correction, it's not really
> saying what you want it to say.  Nobody would expect that the same
> statement-level trigger would be fired multiple times.  The issue is
> whether statement-level triggers on the partitions themselves will be
> fired, not the statement-level trigger on the partitioned table.
> Also, if this applies to inheritance as well as partitioning, then why
> mention only partitioning?  I think we might want to document
> something more here, but not like this.

You're right.  I agree that whatever text we add here should be pointing
out that statement-level triggers of affected child tables are not fired,
when root parent is specified in the command.

Since there was at least some talk of changing that behavior for regular
inheritance so that statement triggers of any affected children are fired
[1], I thought we shouldn't say something general that applies to both
inheritance and partitioning.  But since nothing has happened in that
regard, we might as well.

How about the attached?

Thanks,
Amit

[1] https://www.postgresql.org/message-id/cd282adde5b70b20c57f53bb9ab75...@biglumber.com

From d2d27ca458fbd341efd5ae231f977385a5e3c951 Mon Sep 17 00:00:00 2001
From: amit
Date: Mon, 24 Apr 2017 14:55:08 +0900
Subject: [PATCH] Clarify statement trigger behavior with inheritance

---
 doc/src/sgml/trigger.sgml | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/doc/src/sgml/trigger.sgml b/doc/src/sgml/trigger.sgml
index 6f8416dda7..ef41085744 100644
--- a/doc/src/sgml/trigger.sgml
+++ b/doc/src/sgml/trigger.sgml
@@ -123,6 +123,14 @@
+An Operation that targets the root table in a inheritance or partitioning
+hierarchy does not cause the statement-level triggers of affected child
+tables to be fired; only the root table's statement-level triggers are
+fired.  However, row-level triggers of any affected child tables will be
+fired.
+
+
+
 If an INSERT contains an ON CONFLICT DO UPDATE clause, it is possible
 that the effects of all row-level BEFORE INSERT triggers
--
2.11.0
Re: [HACKERS] Declarative partitioning - another take
On Mon, May 1, 2017 at 12:18 AM, Amit Langote wrote:
> Attached updated patch.

Committed, except for this bit:

+A statement-level trigger defined on partitioned tables is fired only
+once for the table itself, not once for every table in the partitioning
+hierarchy.  However, row-level triggers of any affected leaf partitions
+will be fired.

The first sentence there has a number of issues.  Grammatically, there
is an agreement problem: trigger is singular, but partitioned table is
plural, and one trigger isn't defined across multiple tables.  It
would have to say something like "A statement-level trigger defined on
a partitioned table".  But even with that correction, it's not really
saying what you want it to say.  Nobody would expect that the same
statement-level trigger would be fired multiple times.  The issue is
whether statement-level triggers on the partitions themselves will be
fired, not the statement-level trigger on the partitioned table.
Also, if this applies to inheritance as well as partitioning, then why
mention only partitioning?  I think we might want to document something
more here, but not like this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
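[Editor's note: the firing semantics being hashed out above can be seen with a small sketch.  The table, partition, and trigger names are invented, and this shows the behavior the thread's fix aims for on v10 (statement-level triggers fire for the table named in the command; statement-level triggers of its children do not):]

```sql
CREATE TABLE p (a int) PARTITION BY LIST (a);
CREATE TABLE p1 PARTITION OF p FOR VALUES IN (1);

CREATE FUNCTION report() RETURNS trigger LANGUAGE plpgsql AS
$$ BEGIN RAISE NOTICE 'fired: %', TG_NAME; RETURN NULL; END; $$;

CREATE TRIGGER p_stmt  AFTER INSERT ON p  FOR EACH STATEMENT EXECUTE PROCEDURE report();
CREATE TRIGGER p1_stmt AFTER INSERT ON p1 FOR EACH STATEMENT EXECUTE PROCEDURE report();
CREATE TRIGGER p1_row  AFTER INSERT ON p1 FOR EACH ROW EXECUTE PROCEDURE report();

INSERT INTO p VALUES (1);
-- fires: p_stmt  (statement-level, on the table named in the command)
-- fires: p1_row  (row-level, on the affected leaf partition)
-- does not fire: p1_stmt (statement-level triggers of children are skipped)
```

Targeting the partition directly (`INSERT INTO p1 ...`) would fire p1_stmt and p1_row, but not p_stmt.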
Re: [HACKERS] Declarative partitioning - another take
On 2017/04/29 6:56, David Fetter wrote:
> On Fri, Apr 28, 2017 at 06:29:48PM +0900, Amit Langote wrote:
>> On 2017/04/28 7:36, David Fetter wrote:
>>> Did I notice correctly that there's no way to handle transition tables
>>> for partitions in AFTER STATEMENT triggers?
>>
>> Did you mean to ask about AFTER STATEMENT triggers defined on
>> "partitioned" tables?  Specifying transition table for them is disallowed
>> at all.
>>
>> ERROR:  "p" is a partitioned table
>> DETAIL:  Triggers on partitioned tables cannot have transition tables.
>
> OK, I suppose.  It wasn't clear from the documentation.
>
>> Triggers created on (leaf) partitions *do* allow specifying transition
>> table.
>
> That includes the upcoming "default" tables, I presume.

If a "default" table is also a "leaf" table, then yes.  A default
table/partition can also be itself a partitioned table, in which case, it
won't allow triggers that require transition tables.  AFAICS, it's the
table's being partitioned that stops it from supporting transition tables,
not whether it is a "default" partition or not.

>> Or are you asking something else altogether?
>
> I was just fuzzy on the interactions among these features.
>
>>> If not, I'm not suggesting that this be added at this late date, but
>>> we might want to document that.
>>
>> I don't see mentioned in the documentation that such triggers cannot be
>> defined on partitioned tables.  Is that what you are saying should be
>> documented?
>
> Yes, but I bias toward documenting a lot, and this restriction could
> go away in some future version, which would make things more confusing
> in the long run.

Yeah, it would be a good idea to document this.

> I'm picturing a conversation in 2020 that goes something like this:
>
> "On 10, you could have AFTER STATEMENT triggers on tables, foreign
> tables, and leaf partition tables which referenced transition tables,
> but not on DEFAULT partitions.  On 11, you could on DEFAULT partition
> tables.  From 12 onward, you can have transition tables on any
> relation."

What we could document now is that partitioned tables don't allow
specifying triggers that reference transition tables.  Although, I am
wondering where this note really belongs - the partitioning chapter, the
triggers chapter or the CREATE TRIGGER reference page?  Maybe, Kevin and
Thomas have something to say about that.

If it turns out that the partitioning chapter is a good place, here is a
documentation patch.

Thanks,
Amit

From 289b4d906abb50451392d0efe13926f710952ca0 Mon Sep 17 00:00:00 2001
From: amit
Date: Mon, 1 May 2017 13:46:58 +0900
Subject: [PATCH] Document transition table trigger limitation of partitioned
 tables

---
 doc/src/sgml/ddl.sgml | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 84c4f20990..5f5a956d41 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3293,7 +3293,9 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
     Row triggers, if necessary, must be defined on individual partitions,
-    not the partitioned table.
+    not the partitioned tables.  Also, triggers that require accessing
+    transition tables are not allowed to be defined on the partitioned
+    tables.
--
2.11.0
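[Editor's note: concretely, the restriction being documented looks like this.  A sketch for v10; the trigger names are invented, and `log_stmt()` stands for any existing trigger function (CREATE TRIGGER checks that it exists).  The error and detail text are quoted from the discussion above.]

```sql
CREATE TABLE p (a int) PARTITION BY LIST (a);

-- Transition tables are rejected on a partitioned table:
CREATE TRIGGER p_after AFTER INSERT ON p
    REFERENCING NEW TABLE AS new_rows
    FOR EACH STATEMENT EXECUTE PROCEDURE log_stmt();
-- ERROR:  "p" is a partitioned table
-- DETAIL:  Triggers on partitioned tables cannot have transition tables.

-- ...but are accepted on a leaf partition:
CREATE TABLE p1 PARTITION OF p FOR VALUES IN (1);
CREATE TRIGGER p1_after AFTER INSERT ON p1
    REFERENCING NEW TABLE AS new_rows
    FOR EACH STATEMENT EXECUTE PROCEDURE log_stmt();
```

As Amit notes above, the same rejection applies to a DEFAULT partition that is itself partitioned: what matters is whether the relation is partitioned, not whether it is the default partition.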
Re: [HACKERS] Declarative partitioning - another take
Thanks for the review.

On 2017/04/29 1:25, Robert Haas wrote:
> On Fri, Apr 28, 2017 at 2:13 AM, Amit Langote wrote:
>>>> By the way, code changes I made in the attached are such that a
>>>> subsequent patch could implement firing statement-level triggers of all
>>>> the tables in a partition hierarchy, which it seems we don't want to
>>>> do.  Should then the code be changed to not create ResultRelInfos of
>>>> all the tables but only the root table (the one mentioned in the
>>>> command)?  You will see that the patch adds fields named
>>>> es_nonleaf_result_relations and es_num_nonleaf_result_relations,
>>>> whereas just es_root_result_relation would perhaps do, for example.
>>> It seems better not to create any ResultRelInfos that we don't
>>> actually need, so +1 for such a revision to the patch.
>> OK, done.  It took a bit more work than I thought.
>
> So, this seems weird, because rootResultRelIndex is initialized even
> when splan->partitioned_rels == NIL, but isn't actually valid in that
> case.  ExecInitModifyTable seems to think it's always valid, though.

OK, rootResultRelIndex is now set to a value >= 0 only when the node
modifies a partitioned table.  And then ExecInitModifyTable() checks if
the index is valid before initializing the root table info.

> I think the way that you've refactored fireBSTriggers and
> fireASTriggers is a bit confusing.  Instead of splitting out a
> separate function, how about just having the existing function begin
> with if (node->rootResultRelInfo) resultRelInfo =
> node->rootResultRelInfo; else resultRelInfo = node->resultRelInfo; ?
> I think the way you've coded it is a holdover from the earlier design
> where you were going to call it multiple times, but now that's not
> needed.

OK, done that way.

> Looks OK, otherwise.

Attached updated patch.

Thanks,
Amit

From 53c964f9f1fd3e27824080b0e9e5b672fa53cdf0 Mon Sep 17 00:00:00 2001
From: amit
Date: Mon, 24 Apr 2017 14:55:08 +0900
Subject: [PATCH] Fire per-statement triggers of partitioned tables

Current implementation completely misses them, because it assumes that
no ResultRelInfos need to be created for partitioned tables, whereas
they *are* needed to fire the per-statement triggers, which we *do*
allow to be defined on the partitioned tables.

Reported by: Rajkumar Raghuwanshi
Patch by: Amit Langote
Report: https://www.postgresql.org/message-id/CAKcux6%3DwYospCRY2J4XEFuVy0L41S%3Dfic7rmkbsU-GXhhSbmBg%40mail.gmail.com
---
 doc/src/sgml/trigger.sgml               | 26 +++
 src/backend/executor/execMain.c         | 55 --
 src/backend/executor/nodeModifyTable.c  | 42 +
 src/backend/nodes/copyfuncs.c           |  2 +
 src/backend/nodes/outfuncs.c            |  3 ++
 src/backend/nodes/readfuncs.c           |  2 +
 src/backend/optimizer/plan/createplan.c |  1 +
 src/backend/optimizer/plan/planner.c    |  3 ++
 src/backend/optimizer/plan/setrefs.c    | 19 ++--
 src/include/nodes/execnodes.h           | 12 +
 src/include/nodes/plannodes.h           | 13 +-
 src/include/nodes/relation.h            |  1 +
 src/test/regress/expected/triggers.out  | 81 +
 src/test/regress/sql/triggers.sql       | 70
 14 files changed, 303 insertions(+), 27 deletions(-)

diff --git a/doc/src/sgml/trigger.sgml b/doc/src/sgml/trigger.sgml
index 2a718d7f47..5771bd5fdc 100644
--- a/doc/src/sgml/trigger.sgml
+++ b/doc/src/sgml/trigger.sgml
@@ -33,7 +33,8 @@
 A trigger is a specification that the database should automatically
 execute a particular function whenever a certain type of operation is
-performed.  Triggers can be attached to tables, views, and foreign tables.
+performed.  Triggers can be attached to tables (partitioned or not),
+views, and foreign tables.
@@ -111,14 +112,21 @@
 Statement-level BEFORE triggers naturally fire before the statement
 starts to do anything, while statement-level AFTER triggers fire at the
 very end of the statement.  These types of
-triggers may be defined on tables or views.  Row-level BEFORE
-triggers fire immediately before a particular row is operated on,
-while row-level AFTER triggers fire at the end of the
-statement (but before any statement-level AFTER triggers).
-These types of triggers may only be defined on tables and foreign tables.
-Row-level INSTEAD OF triggers may only be defined on views,
-and fire immediately as each row in the view is identified as needing to
-be operated on.
+triggers may be defined on tables, views, or foreign tables.  Row-level
+BEFORE triggers fire immediately before a particular row is
+operated on, while row-level AFTER triggers fire at the end of
+the statement (but before any statement-level AFTER triggers).
+These types of triggers may only be defined on non-partitioned tables and
Re: [HACKERS] Declarative partitioning - another take
On Mon, Apr 24, 2017 at 07:43:29PM +0900, Amit Langote wrote:
> On 2017/04/21 17:00, Rajkumar Raghuwanshi wrote:
>> I am able to create statement triggers at root partition, but these
>> triggers, not getting fired on updating partition.
> It would be great if you could check if the patches fix the issues.
>
> Also, adding this to the PostgreSQL 10 open items list.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.
Robert, since you committed the patch believed to have created it, you
own this open item.  If some other commit is more relevant or if this
does not belong as a v10 open item, please let us know.  Otherwise,
please observe the policy on open item ownership[1] and send a status
update within three calendar days of this message.  Include a date for
your subsequent status update.  Testers may discover new open items at
any time, and I want to plan to get them all fixed well in advance of
shipping v10.  Consequently, I will appreciate your efforts toward
speedy resolution.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
Re: [HACKERS] Declarative partitioning - another take
On Fri, Apr 28, 2017 at 06:29:48PM +0900, Amit Langote wrote:
> On 2017/04/28 7:36, David Fetter wrote:
>> On Thu, Apr 27, 2017 at 10:30:54AM +0900, Amit Langote wrote:
>>> On 2017/04/27 1:52, Robert Haas wrote:
>>>> On Tue, Apr 25, 2017 at 10:34 PM, Amit Langote wrote:
>>>>> FWIW, I too prefer the latter, that is, fire only the parent's
>>>>> triggers.  In that case, applying only the patch 0001 will do.
>>>> Do we need to update the documentation?
>>> Yes, I think we should.  How about as in the attached?
>>>
>>> By the way, code changes I made in the attached are such that a
>>> subsequent patch could implement firing statement-level triggers of all
>>> the tables in a partition hierarchy, which it seems we don't want to
>>> do.  Should then the code be changed to not create ResultRelInfos of
>>> all the tables but only the root table (the one mentioned in the
>>> command)?  You will see that the patch adds fields named
>>> es_nonleaf_result_relations and es_num_nonleaf_result_relations,
>>> whereas just es_root_result_relation would perhaps do, for example.
>>
>> Did I notice correctly that there's no way to handle transition tables
>> for partitions in AFTER STATEMENT triggers?
>
> Did you mean to ask about AFTER STATEMENT triggers defined on
> "partitioned" tables?  Specifying transition table for them is disallowed
> at all.
>
> ERROR:  "p" is a partitioned table
> DETAIL:  Triggers on partitioned tables cannot have transition tables.

OK, I suppose.  It wasn't clear from the documentation.

> Triggers created on (leaf) partitions *do* allow specifying transition
> table.

That includes the upcoming "default" tables, I presume.

> Or are you asking something else altogether?

I was just fuzzy on the interactions among these features.

>> If not, I'm not suggesting that this be added at this late date, but
>> we might want to document that.
>
> I don't see mentioned in the documentation that such triggers cannot be
> defined on partitioned tables.  Is that what you are saying should be
> documented?

Yes, but I bias toward documenting a lot, and this restriction could go
away in some future version, which would make things more confusing in
the long run.  I'm picturing a conversation in 2020 that goes something
like this:

"On 10, you could have AFTER STATEMENT triggers on tables, foreign
tables, and leaf partition tables which referenced transition tables,
but not on DEFAULT partitions.  On 11, you could on DEFAULT partition
tables.  From 12 onward, you can have transition tables on any
relation."

Kevin?  Thomas?

Best,
David.
--
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Declarative partitioning - another take
On Fri, Apr 28, 2017 at 2:13 AM, Amit Langotewrote: > It seems to me that there is no difference in behavior between > inheritance-based and declarative partitioning as far as statement-level > triggers are concerned (at least currently). In both the cases, we fire > statement-level triggers only for the table specified in the command. OK. >>> By the way, code changes I made in the attached are such that a subsequent >>> patch could implement firing statement-level triggers of all the tables in >>> a partition hierarchy, which it seems we don't want to do. Should then >>> the code be changed to not create ResultRelInfos of all the tables but >>> only the root table (the one mentioned in the command)? You will see that >>> the patch adds fields named es_nonleaf_result_relations and >>> es_num_nonleaf_result_relations, whereas just es_root_result_relation >>> would perhaps do, for example. >> >> It seems better not to create any ResultRelInfos that we don't >> actually need, so +1 for such a revision to the patch. > > OK, done. It took a bit more work than I thought. So, this seems weird, because rootResultRelIndex is initialized even when splan->partitioned_rels == NIL, but isn't actually valid in that case. ExecInitModifyTable seems to think it's always valid, though. I think the way that you've refactored fireBSTriggers and fireASTriggers is a bit confusing. Instead of splitting out a separate function, how about just having the existing function begin with if (node->rootResultRelInfo) resultRelInfo = node->rootResultRelInfo; else resultRelInfo = node->resultRelInfo; ? I think the way you've coded it is a holdover from the earlier design where you were going to call it multiple times, but now that's not needed. Looks OK, otherwise. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 2017/04/28 7:36, David Fetter wrote:
> On Thu, Apr 27, 2017 at 10:30:54AM +0900, Amit Langote wrote:
>> On 2017/04/27 1:52, Robert Haas wrote:
>>> On Tue, Apr 25, 2017 at 10:34 PM, Amit Langote wrote:
>>>> FWIW, I too prefer the latter, that is, fire only the parent's triggers.
>>>> In that case, applying only the patch 0001 will do.
>>>
>>> Do we need to update the documentation?
>>
>> Yes, I think we should. How about as in the attached?
>>
>> By the way, code changes I made in the attached are such that a subsequent
>> patch could implement firing statement-level triggers of all the tables in
>> a partition hierarchy, which it seems we don't want to do. Should then
>> the code be changed to not create ResultRelInfos of all the tables but
>> only the root table (the one mentioned in the command)? You will see that
>> the patch adds fields named es_nonleaf_result_relations and
>> es_num_nonleaf_result_relations, whereas just es_root_result_relation
>> would perhaps do, for example.
>
> Did I notice correctly that there's no way to handle transition tables
> for partitions in AFTER STATEMENT triggers?

Did you mean to ask about AFTER STATEMENT triggers defined on "partitioned" tables? Specifying a transition table for them is disallowed altogether:

ERROR:  "p" is a partitioned table
DETAIL:  Triggers on partitioned tables cannot have transition tables.

Triggers created on (leaf) partitions *do* allow specifying a transition table. Or are you asking something else altogether?

> If not, I'm not suggesting that this be added at this late date, but
> we might want to document that.

I don't see it mentioned in the documentation that such triggers cannot be defined on partitioned tables. Is that what you are saying should be documented?

Thanks, Amit
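[Editor's sketch, with hypothetical table/function names, of the restriction Amit describes; the error text is the one quoted in his message.]

```sql
-- Transition tables are rejected on partitioned tables,
-- but allowed on leaf partitions (PG10 behavior).
CREATE TABLE p (a int) PARTITION BY LIST (a);
CREATE TABLE p1 PARTITION OF p FOR VALUES IN (1);

CREATE FUNCTION dump_new() RETURNS trigger AS $$
BEGIN
  RAISE NOTICE 'inserted % row(s)', (SELECT count(*) FROM new_rows);
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- On the partitioned table: rejected.
CREATE TRIGGER p_ins AFTER INSERT ON p
  REFERENCING NEW TABLE AS new_rows
  FOR EACH STATEMENT EXECUTE PROCEDURE dump_new();
-- ERROR:  "p" is a partitioned table
-- DETAIL:  Triggers on partitioned tables cannot have transition tables.

-- On the leaf partition: accepted.
CREATE TRIGGER p1_ins AFTER INSERT ON p1
  REFERENCING NEW TABLE AS new_rows
  FOR EACH STATEMENT EXECUTE PROCEDURE dump_new();
```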
Re: [HACKERS] Declarative partitioning - another take
Hi Rajkumar,

On 2017/04/28 17:11, Rajkumar Raghuwanshi wrote:
> On Fri, Apr 28, 2017 at 11:43 AM, Amit Langote wrote:
>> Updated patch attached.
>
> I have applied given patch, could see below behaviour with statement
> trigger.
>
> When trying to delete value within partition range, triggers got fired
> (with zero row affected) as expected, but if trying to delete value which
> is outside of partition range (with zero row affected), no trigger fired.
> Is this expected??

Good catch. I'm afraid though that this is not a defect of this patch, but an unrelated (maybe) bug, which affects not only partitioned tables but inheritance in general. The problem is that the plan created is such that the executor has no opportunity to fire the trigger in question, because the plan contains no information about which table is affected by the statement. You can see that with inheritance. See below:

create table foo ();
create table bar () inherits (foo);
create or replace function trig_notice() returns trigger as $$
begin raise notice 'trigger fired'; return null; end;
$$ language plpgsql;
create trigger foo_del_before before delete on foo
  for each statement execute procedure trig_notice();

explain delete from foo where false;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

-- no trigger fired
delete from foo where false;
DELETE 0

The trigger *is* fired though, if inheritance is not used:

explain delete from only foo where false;
                    QUERY PLAN
--------------------------------------------------
 Delete on foo  (cost=0.00..0.00 rows=0 width=0)
   ->  Result  (cost=0.00..0.00 rows=0 width=6)
         One-Time Filter: false
(3 rows)

delete from only foo where false;
NOTICE:  trigger fired
DELETE 0

I'm not sure how to go about fixing this, if at all. Or to fix it at least for partitioned tables. Would like to hear what others think.

Thanks, Amit
Re: [HACKERS] Declarative partitioning - another take
On Fri, Apr 28, 2017 at 11:43 AM, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote:
> Updated patch attached.

Hi Amit,

I have applied the given patch, and could see the below behaviour with statement triggers.

When trying to delete a value within the partition range, triggers got fired (with zero rows affected) as expected, but when trying to delete a value which is outside of the partition range (also with zero rows affected), no trigger fired. Is this expected?? Below are steps to reproduce.

CREATE TABLE trigger_test_table (a INT, b INT) PARTITION BY RANGE(a);
CREATE TABLE trigger_test_table1 PARTITION OF trigger_test_table FOR VALUES FROM (0) to (6);
INSERT INTO trigger_test_table (a,b) SELECT i,i FROM generate_series(1,3)i;

CREATE TABLE trigger_test_statatics(TG_NAME varchar, TG_TABLE_NAME varchar, TG_LEVEL varchar, TG_WHEN varchar, TG_OP varchar);

CREATE FUNCTION trigger_test_procedure() RETURNS TRIGGER AS $ttp$
BEGIN
  INSERT INTO trigger_test_statatics SELECT TG_NAME,TG_TABLE_NAME,TG_LEVEL,TG_WHEN,TG_OP;
  RETURN OLD;
END;
$ttp$ LANGUAGE plpgsql;

CREATE TRIGGER trigger_test11 AFTER DELETE ON trigger_test_table FOR EACH STATEMENT EXECUTE PROCEDURE trigger_test_procedure();
CREATE TRIGGER trigger_test12 AFTER DELETE ON trigger_test_table1 FOR EACH STATEMENT EXECUTE PROCEDURE trigger_test_procedure();

postgres=# DELETE FROM trigger_test_table WHERE a = 5;
DELETE 0
postgres=# SELECT * FROM trigger_test_statatics;
    tg_name     |   tg_table_name    | tg_level  | tg_when | tg_op
----------------+--------------------+-----------+---------+--------
 trigger_test11 | trigger_test_table | STATEMENT | AFTER   | DELETE
(1 row)

TRUNCATE TABLE trigger_test_statatics;

-- statement trigger NOT fired, when trying to delete data outside partition range.
postgres=# DELETE FROM trigger_test_table WHERE a = 10;
DELETE 0
postgres=# SELECT * FROM trigger_test_statatics;
 tg_name | tg_table_name | tg_level | tg_when | tg_op
---------+---------------+----------+---------+-------
(0 rows)

Thanks & Regards, Rajkumar Raghuwanshi QMG, EnterpriseDB Corporation
Re: [HACKERS] Declarative partitioning - another take
Thanks for taking a look.

On 2017/04/28 5:24, Robert Haas wrote:
> On Wed, Apr 26, 2017 at 9:30 PM, Amit Langote wrote:
>>> Do we need to update the documentation?
>>
>> Yes, I think we should. How about as in the attached?
>
> Looks reasonable, but I was thinking you might also update the section
> which contrasts inheritance-based partitioning with declarative
> partitioning.

It seems to me that there is no difference in behavior between inheritance-based and declarative partitioning as far as statement-level triggers are concerned (at least currently). In both cases, we fire statement-level triggers only for the table specified in the command. Maybe we will fix things someday so that statement-level triggers are fired for all the tables in an inheritance hierarchy when the root parent is updated or deleted, but that's currently not true. We may never implement that behavior for declaratively partitioned tables though, so there will be a difference if and when we implement the former. Am I missing something?

>> By the way, code changes I made in the attached are such that a subsequent
>> patch could implement firing statement-level triggers of all the tables in
>> a partition hierarchy, which it seems we don't want to do. Should then
>> the code be changed to not create ResultRelInfos of all the tables but
>> only the root table (the one mentioned in the command)? You will see that
>> the patch adds fields named es_nonleaf_result_relations and
>> es_num_nonleaf_result_relations, whereas just es_root_result_relation
>> would perhaps do, for example.
>
> It seems better not to create any ResultRelInfos that we don't
> actually need, so +1 for such a revision to the patch.

OK, done. It took a bit more work than I thought. Updated patch attached.
Thanks, Amit >From a8b584f09bc02b6c16c33e816fc12c7e443dd65c Mon Sep 17 00:00:00 2001 From: amit Date: Mon, 24 Apr 2017 14:55:08 +0900 Subject: [PATCH] Fire per-statement triggers of partitioned tables Current implementation completely misses them, because it assumes that no ResultRelInfos need to be created for partitioned tables, whereas they *are* needed to fire the per-statment triggers, which we *do* allow to be defined on the partitioned tables. Reported By: Rajkumar Raghuwanshi Patch by: Amit Langote Report: https://www.postgresql.org/message-id/CAKcux6%3DwYospCRY2J4XEFuVy0L41S%3Dfic7rmkbsU-GXhhSbmBg%40mail.gmail.com --- doc/src/sgml/trigger.sgml | 26 +++ src/backend/executor/execMain.c | 55 -- src/backend/executor/nodeModifyTable.c | 64 ++ src/backend/nodes/copyfuncs.c | 2 + src/backend/nodes/outfuncs.c| 3 ++ src/backend/nodes/readfuncs.c | 2 + src/backend/optimizer/plan/createplan.c | 1 + src/backend/optimizer/plan/planner.c| 3 ++ src/backend/optimizer/plan/setrefs.c| 19 ++-- src/include/nodes/execnodes.h | 12 + src/include/nodes/plannodes.h | 13 +- src/include/nodes/relation.h| 1 + src/test/regress/expected/triggers.out | 81 + src/test/regress/sql/triggers.sql | 70 14 files changed, 323 insertions(+), 29 deletions(-) diff --git a/doc/src/sgml/trigger.sgml b/doc/src/sgml/trigger.sgml index 2a718d7f47..5771bd5fdc 100644 --- a/doc/src/sgml/trigger.sgml +++ b/doc/src/sgml/trigger.sgml @@ -33,7 +33,8 @@ A trigger is a specification that the database should automatically execute a particular function whenever a certain type of operation is -performed. Triggers can be attached to tables, views, and foreign tables. +performed. Triggers can be attached to tables (partitioned or not), +views, and foreign tables. @@ -111,14 +112,21 @@ Statement-level BEFORE triggers naturally fire before the statement starts to do anything, while statement-level AFTER triggers fire at the very end of the statement. These types of -triggers may be defined on tables or views. 
Row-level BEFORE -triggers fire immediately before a particular row is operated on, -while row-level AFTER triggers fire at the end of the -statement (but before any statement-level AFTER triggers). -These types of triggers may only be defined on tables and foreign tables. -Row-level INSTEAD OF triggers may only be defined on views, -and fire immediately as each row in the view is identified as needing to -be operated on. +triggers may be defined on tables, views, or foreign tables. Row-level +BEFORE triggers fire immediately before a particular row is +operated on, while row-level AFTER triggers fire at the end of +the statement (but before any statement-level AFTER triggers). +These types of triggers may only be defined on non-partitioned tables and +foreign tables.
Re: [HACKERS] Declarative partitioning - another take
On Thu, Apr 27, 2017 at 10:30:54AM +0900, Amit Langote wrote: > On 2017/04/27 1:52, Robert Haas wrote: > > On Tue, Apr 25, 2017 at 10:34 PM, Amit Langote > >wrote: > >> FWIW, I too prefer the latter, that is, fire only the parent's triggers. > >> In that case, applying only the patch 0001 will do. > > > > Do we need to update the documentation? > > Yes, I think we should. How about as in the attached? > > By the way, code changes I made in the attached are such that a subsequent > patch could implement firing statement-level triggers of all the tables in > a partition hierarchy, which it seems we don't want to do. Should then > the code be changed to not create ResultRelInfos of all the tables but > only the root table (the one mentioned in the command)? You will see that > the patch adds fields named es_nonleaf_result_relations and > es_num_nonleaf_result_relations, whereas just es_root_result_relation > would perhaps do, for example. Did I notice correctly that there's no way to handle transition tables for partitions in AFTER STATEMENT triggers? If not, I'm not suggesting that this be added at this late date, but we might want to document that. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On Wed, Apr 26, 2017 at 9:30 PM, Amit Langote wrote:
>> Do we need to update the documentation?
>
> Yes, I think we should. How about as in the attached?

Looks reasonable, but I was thinking you might also update the section which contrasts inheritance-based partitioning with declarative partitioning.

> By the way, code changes I made in the attached are such that a subsequent
> patch could implement firing statement-level triggers of all the tables in
> a partition hierarchy, which it seems we don't want to do. Should then
> the code be changed to not create ResultRelInfos of all the tables but
> only the root table (the one mentioned in the command)? You will see that
> the patch adds fields named es_nonleaf_result_relations and
> es_num_nonleaf_result_relations, whereas just es_root_result_relation
> would perhaps do, for example.

It seems better not to create any ResultRelInfos that we don't actually need, so +1 for such a revision to the patch.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 2017/04/27 1:52, Robert Haas wrote: > On Tue, Apr 25, 2017 at 10:34 PM, Amit Langote >wrote: >> FWIW, I too prefer the latter, that is, fire only the parent's triggers. >> In that case, applying only the patch 0001 will do. > > Do we need to update the documentation? Yes, I think we should. How about as in the attached? By the way, code changes I made in the attached are such that a subsequent patch could implement firing statement-level triggers of all the tables in a partition hierarchy, which it seems we don't want to do. Should then the code be changed to not create ResultRelInfos of all the tables but only the root table (the one mentioned in the command)? You will see that the patch adds fields named es_nonleaf_result_relations and es_num_nonleaf_result_relations, whereas just es_root_result_relation would perhaps do, for example. Thanks, Amit >From c8b785a77a2a193906967762066e03e09de80442 Mon Sep 17 00:00:00 2001 From: amit Date: Mon, 24 Apr 2017 14:55:08 +0900 Subject: [PATCH 1/2] Fire per-statement triggers of partitioned tables Current implementation completely misses them, because it assumes that no ResultRelInfos need to be created for partitioned tables, whereas they *are* needed to fire the per-statment triggers, which we *do* allow to be defined on the partitioned tables. 
Reported By: Rajkumar Raghuwanshi Patch by: Amit Langote Report: https://www.postgresql.org/message-id/CAKcux6%3DwYospCRY2J4XEFuVy0L41S%3Dfic7rmkbsU-GXhhSbmBg%40mail.gmail.com --- doc/src/sgml/trigger.sgml | 26 - src/backend/executor/execMain.c | 33 - src/backend/executor/nodeModifyTable.c | 66 - src/backend/nodes/copyfuncs.c | 1 + src/backend/nodes/outfuncs.c| 1 + src/backend/nodes/readfuncs.c | 1 + src/backend/optimizer/plan/createplan.c | 1 + src/backend/optimizer/plan/setrefs.c| 4 ++ src/include/nodes/execnodes.h | 11 ++ src/include/nodes/plannodes.h | 2 + src/test/regress/expected/triggers.out | 54 +++ src/test/regress/sql/triggers.sql | 48 12 files changed, 227 insertions(+), 21 deletions(-) diff --git a/doc/src/sgml/trigger.sgml b/doc/src/sgml/trigger.sgml index 2a718d7f47..5771bd5fdc 100644 --- a/doc/src/sgml/trigger.sgml +++ b/doc/src/sgml/trigger.sgml @@ -33,7 +33,8 @@ A trigger is a specification that the database should automatically execute a particular function whenever a certain type of operation is -performed. Triggers can be attached to tables, views, and foreign tables. +performed. Triggers can be attached to tables (partitioned or not), +views, and foreign tables. @@ -111,14 +112,21 @@ Statement-level BEFORE triggers naturally fire before the statement starts to do anything, while statement-level AFTER triggers fire at the very end of the statement. These types of -triggers may be defined on tables or views. Row-level BEFORE -triggers fire immediately before a particular row is operated on, -while row-level AFTER triggers fire at the end of the -statement (but before any statement-level AFTER triggers). -These types of triggers may only be defined on tables and foreign tables. -Row-level INSTEAD OF triggers may only be defined on views, -and fire immediately as each row in the view is identified as needing to -be operated on. +triggers may be defined on tables, views, or foreign tables. 
Row-level +BEFORE triggers fire immediately before a particular row is +operated on, while row-level AFTER triggers fire at the end of +the statement (but before any statement-level AFTER triggers). +These types of triggers may only be defined on non-partitioned tables and +foreign tables. Row-level INSTEAD OF triggers may only be +defined on views, and fire immediately as each row in the view is +identified as needing to be operated on. + + + +A statement-level trigger defined on partitioned tables is fired only +once for the table itself, not once for every table in the partitioning +hierarchy. However, row-level triggers of any affected leaf partitions +will be fired. diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index 5c12fb457d..586b396b3e 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -861,18 +861,35 @@ InitPlan(QueryDesc *queryDesc, int eflags) /* * In the partitioned result relation case, lock the non-leaf result - * relations too. We don't however need ResultRelInfos for them. + * relations too. We also need ResultRelInfos so that per-statement + * triggers defined on these relations can be fired. */ if (plannedstmt->nonleafResultRelations) { + numResultRelations = +
Re: [HACKERS] Declarative partitioning - another take
On Tue, Apr 25, 2017 at 10:34 PM, Amit Langote wrote:
> FWIW, I too prefer the latter, that is, fire only the parent's triggers.
> In that case, applying only the patch 0001 will do.

Do we need to update the documentation?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 26 April 2017 at 00:28, Robert Haas wrote:
> So what I'd prefer -- on the totally unprincipled basis that it would
> let us improve performance in the future -- if you operate on a
> partition directly, you fire the partition's triggers, but if you
> operate on the parent, only the parent's triggers fire.

I would also opt for this behaviour.

Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] Declarative partitioning - another take
Hi,

Thanks for testing.

On 2017/04/25 19:03, Rajkumar Raghuwanshi wrote:
> Thanks for looking into it. I have applied fixes and checked for triggers.
> I could see difference in behaviour of statement triggers for INSERT and
> UPDATE: for insert only root partition triggers are getting fired, but for
> update root as well as child partition table triggers are both getting
> fired. Is this expected??

Yes, because I didn't implement anything for the insert case yet. I posed a question whether to fire partitions' per-statement triggers when inserting data through the root table. Robert replied [1] that it would be desirable to not fire partitions' per-statement triggers if the root table is mentioned in the query; only fire their per-row triggers if any. It already works that way for inserts, and applying only 0001 will get you the same for update/delete.

Patch 0002 is to enable firing partitions' per-statement triggers even if the root table is mentioned in the query, but it implemented the same only for the update/delete cases. If we decide that that's the right thing to do, then I will implement the same behavior for the insert case too.

Thanks, Amit

[1] https://www.postgresql.org/message-id/CA%2BTgmoatBYy8Hyi3cYR1rFrCkD2NM4ZLZcck4QDGvH%3DHddfDwA%40mail.gmail.com
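[Editor's sketch, with hypothetical names, of the insert behavior Amit describes: inserting through the root fires only the root's statement-level triggers, while the chosen partition's row-level triggers still fire. This matches the output of Rajkumar's experiment downthread.]

```sql
CREATE TABLE root_tbl (a int) PARTITION BY RANGE (a);
CREATE TABLE leaf_tbl PARTITION OF root_tbl FOR VALUES FROM (1) TO (10);

CREATE FUNCTION log_it() RETURNS trigger AS $$
BEGIN
  RAISE NOTICE '% fired on %', TG_NAME, TG_TABLE_NAME;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER root_stmt AFTER INSERT ON root_tbl
  FOR EACH STATEMENT EXECUTE PROCEDURE log_it();
CREATE TRIGGER leaf_stmt AFTER INSERT ON leaf_tbl
  FOR EACH STATEMENT EXECUTE PROCEDURE log_it();
CREATE TRIGGER leaf_row AFTER INSERT ON leaf_tbl
  FOR EACH ROW EXECUTE PROCEDURE log_it();

INSERT INTO root_tbl VALUES (5);
-- fires root_stmt and leaf_row, but not leaf_stmt
```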
Re: [HACKERS] Declarative partitioning - another take
On 2017/04/26 3:58, Robert Haas wrote:
> On Mon, Apr 24, 2017 at 6:43 AM, Amit Langote wrote:
>> The reason it doesn't work is that we do not allocate ResultRelInfos for
>> partitioned tables (not even for the root partitioned table in the
>> update/delete cases), because the current implementation assumes they are
>> not required. That's fine only so long as we consider that no rows are
>> inserted into them, no indexes need to be updated, and that no row-level
>> triggers are to be fired. But it misses the fact that we do allow
>> statement-level triggers on partitioned tables. So, we should fix things
>> such that ResultRelInfos are allocated so that any statement-level
>> triggers are fired. But there are following questions to consider:
>>
>> 1. Should we consider only the root partitioned table or all partitioned
>>    tables in a given hierarchy, including the child partitioned tables?
>>    Note that this is related to a current limitation of updating/deleting
>>    inheritance hierarchies that we do not currently fire per-statement
>>    triggers of the child tables. See the TODO item in the wiki:
>>    https://wiki.postgresql.org/wiki/Todo#Triggers, which says: "When
>>    statement-level triggers are defined on a parent table, have them fire
>>    only on the parent table, and fire child table triggers only where
>>    appropriate"
>>
>> 2. Do we apply the same to inserts on the partitioned tables? Since
>>    insert on a partitioned table implicitly affects its partitions, one
>>    might say that we would need to fire per-statement insert triggers of
>>    all the partitions.
>
> It seems to me that it doesn't make a lot of sense to fire the
> triggers for some tables involved in the hierarchy and not others. I
> suppose the consistent thing to do here is to make sure that we fire
> the statement triggers for all tables in the partitioning hierarchy
> for all operations (insert, update, delete, etc.).
>
> TBH, I don't like that very much. I'd rather fire the triggers only
> for the table actually named in the query and skip all the others,
> mostly because it seems like it would be faster and less likely to
> block future optimizations; eventually, I'd like to consider not even
> locking the children we're not touching, but that's not going to work
> if we have to check them all for triggers. So what I'd prefer -- on
> the totally unprincipled basis that it would let us improve
> performance in the future -- if you operate on a partition directly,
> you fire the partition's triggers, but if you operate on the parent,
> only the parent's triggers fire.

FWIW, I too prefer the latter, that is, fire only the parent's triggers. In that case, applying only the patch 0001 will do.

Thanks, Amit
Re: [HACKERS] Declarative partitioning - another take
On Mon, Apr 24, 2017 at 6:43 AM, Amit Langotewrote: > The reason it doesn't work is that we do not allocate ResultRelInfos for > partitioned tables (not even for the root partitioned table in the > update/delete cases), because the current implementation assumes they are > not required. That's fine only so long as we consider that no rows are > inserted into them, no indexes need to be updated, and that no row-level > triggers are to be fired. But it misses the fact that we do allow > statement-level triggers on partitioned tables. So, we should fix things > such that ResultRelInfos are allocated so that any statement-level > triggers are fired. But there are following questions to consider: > > 1. Should we consider only the root partitioned table or all partitioned >tables in a given hierarchy, including the child partitioned tables? >Note that this is related to a current limitation of updating/deleting >inheritance hierarchies that we do not currently fire per-statement >triggers of the child tables. See the TODO item in wiki: >https://wiki.postgresql.org/wiki/Todo#Triggers, which says: "When >statement-level triggers are defined on a parent table, have them fire >only on the parent table, and fire child table triggers only where >appropriate" > > 2. Do we apply the same to inserts on the partitioned tables? Since >insert on a partitioned table implicitly affects its partitions, one >might say that we would need to fire per-statement insert triggers of >all the partitions. It seems to me that it doesn't make a lot of sense to fire the triggers for some tables involved in the hierarchy and not others. I suppose the consistent thing to do here is to make sure that we fire the statement triggers for all tables in the partitioning hierarchy for all operations (insert, update, delete, etc.). TBH, I don't like that very much. 
I'd rather fire the triggers only for the table actually named in the query and skip all the others, mostly because it seems like it would be faster and less likely to block future optimizations; eventually, I'd like to consider not even locking the children we're not touching, but that's not going to work if we have to check them all for triggers. So what I'd prefer -- on the totally unprincipled basis that it would let us improve performance in the future -- if you operate on a partition directly, you fire the partition's triggers, but if you operate on the parent, only the parent's triggers fire. How heretical is that idea? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
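[Editor's sketch, with hypothetical names, of the semantics Robert is proposing here -- this illustrates the proposal, not what any committed code does.]

```sql
CREATE TABLE parent (a int) PARTITION BY LIST (a);
CREATE TABLE part1 PARTITION OF parent FOR VALUES IN (1);

CREATE FUNCTION note_it() RETURNS trigger AS $$
BEGIN
  RAISE NOTICE '% fired', TG_NAME;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER parent_stmt AFTER DELETE ON parent
  FOR EACH STATEMENT EXECUTE PROCEDURE note_it();
CREATE TRIGGER part1_stmt AFTER DELETE ON part1
  FOR EACH STATEMENT EXECUTE PROCEDURE note_it();

DELETE FROM parent;  -- would fire only parent_stmt (table named in the query)
DELETE FROM part1;   -- would fire only part1_stmt
```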
Re: [HACKERS] Declarative partitioning - another take
On Mon, Apr 24, 2017 at 4:13 PM, Amit Langotewrote: > Hi Rajkumar, > > It would be great if you could check if the patches fix the issues. > Hi Amit, Thanks for looking into it. I have applied fixes and checked for triggers. I could see difference in behaviour of statement triggers for INSERT and UPDATE, for insert only root partition triggers are getting fired but for update root as well as child partition table triggers both getting fired. is this expected?? Below are steps to reproduce. CREATE TABLE pt (a INT, b INT) PARTITION BY RANGE(a); CREATE TABLE pt1 PARTITION OF pt FOR VALUES FROM (1) to (6); CREATE TABLE pt2 PARTITION OF pt FOR VALUES FROM (6) to (11); INSERT INTO pt (a,b) SELECT i,i FROM generate_series(1,7)i; CREATE TABLE pt_trigger(TG_NAME varchar,TG_TABLE_NAME varchar,TG_LEVEL varchar,TG_WHEN varchar,a_old int,a_new int,b_old int,b_new int); CREATE FUNCTION process_pt_trigger() RETURNS TRIGGER AS $ttp$ BEGIN IF (TG_OP = 'INSERT') THEN IF (TG_LEVEL = 'STATEMENT') THEN INSERT INTO pt_trigger SELECT TG_NAME,TG_TABLE_NAME,TG_LEVEL,TG_WHEN,NULL,NULL,NULL,NULL; END IF; IF (TG_LEVEL = 'ROW') THEN INSERT INTO pt_trigger SELECT TG_NAME,TG_TABLE_NAME,TG_LEVEL,TG_WHEN,NULL,NEW.a,NULL,NEW.b; END IF; RETURN NEW; END IF; IF (TG_OP = 'UPDATE') THEN IF (TG_LEVEL = 'STATEMENT') THEN INSERT INTO pt_trigger SELECT TG_NAME,TG_TABLE_NAME,TG_LEVEL,TG_WHEN,NULL,NULL,NULL,NULL; END IF; IF (TG_LEVEL = 'ROW') THEN INSERT INTO pt_trigger SELECT TG_NAME,TG_TABLE_NAME,TG_LEVEL,TG_WHEN,OLD.a,NEW.a,OLD.b,NEW.b; END IF; RETURN NEW; END IF; END; $ttp$ LANGUAGE plpgsql; CREATE TRIGGER trigger_test11 AFTER INSERT OR UPDATE ON pt FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER trigger_test12 AFTER INSERT OR UPDATE ON pt1 FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER trigger_test13 AFTER INSERT OR UPDATE ON pt2 FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER trigger_test21 BEFORE INSERT OR UPDATE ON pt FOR 
EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER trigger_test22 BEFORE INSERT OR UPDATE ON pt1 FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER trigger_test23 BEFORE INSERT OR UPDATE ON pt2 FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER trigger_test32 AFTER INSERT OR UPDATE ON pt1 FOR EACH ROW EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER trigger_test33 AFTER INSERT OR UPDATE ON pt2 FOR EACH ROW EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER trigger_test42 BEFORE INSERT OR UPDATE ON pt1 FOR EACH ROW EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER trigger_test43 BEFORE INSERT OR UPDATE ON pt2 FOR EACH ROW EXECUTE PROCEDURE process_pt_trigger(); postgres=# INSERT INTO pt (a,b) VALUES (8,8); INSERT 0 1 postgres=# SELECT * FROM pt_trigger; tg_name | tg_table_name | tg_level | tg_when | a_old | a_new | b_old | b_new +---+---+-+---+---+---+--- trigger_test21 | pt| STATEMENT | BEFORE | | | | trigger_test43 | pt2 | ROW | BEFORE | | 8 | | 8 trigger_test33 | pt2 | ROW | AFTER | | 8 | | 8 trigger_test11 | pt| STATEMENT | AFTER | | | | (4 rows) postgres=# TRUNCATE TABLE pt_trigger; TRUNCATE TABLE postgres=# UPDATE pt SET a = 2 WHERE a = 1; UPDATE 1 postgres=# SELECT * FROM pt_trigger; tg_name | tg_table_name | tg_level | tg_when | a_old | a_new | b_old | b_new +---+---+-+---+---+---+--- trigger_test21 | pt| STATEMENT | BEFORE | | | | trigger_test22 | pt1 | STATEMENT | BEFORE | | | | trigger_test42 | pt1 | ROW | BEFORE | 1 | 2 | 1 | 1 trigger_test32 | pt1 | ROW | AFTER | 1 | 2 | 1 | 1 trigger_test11 | pt| STATEMENT | AFTER | | | | trigger_test12 | pt1 | STATEMENT | AFTER | | | | (6 rows) Thanks & Regards, Rajkumar Raghuwanshi QMG, EnterpriseDB
Re: [HACKERS] Declarative partitioning - another take
Hi Rajkumar,

Thanks for testing and the report.

On 2017/04/21 17:00, Rajkumar Raghuwanshi wrote:
> Hi,
>
> I have observed below with the statement triggers.
>
> I am able to create statement triggers at the root partition, but these
> triggers are not getting fired on updating a partition.
>
> CREATE TABLE pt (a INT, b INT) PARTITION BY RANGE(a);
> CREATE TABLE pt1 PARTITION OF pt FOR VALUES FROM (1) to (7);
> CREATE TABLE pt2 PARTITION OF pt FOR VALUES FROM (7) to (11);
> INSERT INTO pt (a,b) SELECT i,i FROM generate_series(1,10)i;
> CREATE TABLE pt_trigger(TG_NAME varchar,TG_TABLE_NAME varchar,TG_LEVEL varchar,TG_WHEN varchar);
> CREATE OR REPLACE FUNCTION process_pt_trigger() RETURNS TRIGGER AS $pttg$
>     BEGIN
>         IF (TG_OP = 'UPDATE') THEN
>             INSERT INTO pt_trigger SELECT TG_NAME,TG_TABLE_NAME,TG_LEVEL,TG_WHEN;
>             RETURN NEW;
>         END IF;
>         RETURN NULL;
>     END;
> $pttg$ LANGUAGE plpgsql;
> CREATE TRIGGER pt_trigger_after_p0 AFTER UPDATE ON pt FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger();
> CREATE TRIGGER pt_trigger_before_p0 BEFORE UPDATE ON pt FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger();
>
> postgres=# UPDATE pt SET a = 2 WHERE a = 1;
> UPDATE 1
> postgres=# SELECT * FROM pt_trigger ORDER BY 1;
>  tg_name | tg_table_name | tg_level | tg_when
> ---------+---------------+----------+---------
> (0 rows)
>
> no statement-level trigger fired in this case; is this expected behaviour?

I think we discussed this during the development and decided to allow
per-statement triggers on partitioned tables [1], but it seems it doesn't
quite work for update/delete, as your example shows.  You can however see
that per-statement triggers of the root partitioned table *are* fired in
the case of insert.

The reason it doesn't work is that we do not allocate ResultRelInfos for
partitioned tables (not even for the root partitioned table in the
update/delete cases), because the current implementation assumes they are
not required.  That's fine only so long as we consider that no rows are
inserted into them, no indexes need to be updated, and that no row-level
triggers are to be fired.  But it misses the fact that we do allow
statement-level triggers on partitioned tables.  So, we should fix things
such that ResultRelInfos are allocated and any statement-level triggers
are fired.  But there are the following questions to consider:

1. Should we consider only the root partitioned table or all partitioned
   tables in a given hierarchy, including the child partitioned tables?
   Note that this is related to a current limitation of updating/deleting
   inheritance hierarchies: we do not currently fire per-statement
   triggers of the child tables.  See the TODO item in the wiki:
   https://wiki.postgresql.org/wiki/Todo#Triggers, which says: "When
   statement-level triggers are defined on a parent table, have them fire
   only on the parent table, and fire child table triggers only where
   appropriate"

2. Do we apply the same to inserts on the partitioned tables?  Since
   insert on a partitioned table implicitly affects its partitions, one
   might say that we would need to fire per-statement insert triggers of
   all the partitions.

Meanwhile, attached is a set of patches to fix this.  Descriptions follow:

0001: fire per-statement triggers of the partitioned table mentioned in a
      given update or delete statement

0002: fire per-statement triggers of the child tables during update/delete
      of inheritance hierarchies (including the partitioned table case)

Depending on the answer to 2 above, we can arrange for the per-statement
triggers of all the partitions to be fired upon insert into the
partitioned table.

> but if i am creating triggers on a leaf partition, the trigger is getting fired.
>
> CREATE TRIGGER pt_trigger_after_p1 AFTER UPDATE ON pt1 FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger();
> CREATE TRIGGER pt_trigger_before_p1 BEFORE UPDATE ON pt1 FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger();
>
> postgres=# UPDATE pt SET a = 5 WHERE a = 4;
> UPDATE 1
> postgres=# SELECT * FROM pt_trigger ORDER BY 1;
>        tg_name        | tg_table_name | tg_level  | tg_when
> ----------------------+---------------+-----------+---------
>  pt_trigger_after_p1  | pt1           | STATEMENT | AFTER
>  pt_trigger_before_p1 | pt1           | STATEMENT | BEFORE
> (2 rows)

Actually, this works only by accident (with the current implementation,
the *first* partition's triggers will get fired).  If you create another
partition, its per-statement triggers won't get fired.  The patches
described above fix that.

It would be great if you could check if the patches fix the issues.  Also,
adding this to the PostgreSQL 10 open items list.

Thanks,
Amit

[1] https://www.postgresql.org/message-id/8e05817e-14c8-13e4-502b-e440adb24258%40lab.ntt.co.jp

>From 9c7543615ccb05c004591a9176f818413df7ea9d Mon Sep 17 00:00:00 2001
From: amit
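[Editor's note] To make the intended trigger-firing behavior concrete, here is a toy model in Python. It is purely illustrative (the function and its arguments are hypothetical, not PostgreSQL code): patch 0001 makes the root table's per-statement UPDATE/DELETE triggers fire, and patch 0002 additionally fires those of the child tables involved.

```python
# Toy model of which per-statement triggers fire for an UPDATE on a
# partitioned hierarchy under the two proposed patches.  All names are
# illustrative; this is not backend code.

def stmt_triggers_fired(root, children_touched, with_patch_0002):
    """Return the table names whose per-statement UPDATE triggers fire."""
    fired = [root]                      # 0001: the root's triggers fire
    if with_patch_0002:
        fired.extend(children_touched)  # 0002: child tables' triggers too
    return fired

# Before the patches, an UPDATE on "pt" fired no statement triggers at all.
print(stmt_triggers_fired("pt", ["pt1"], with_patch_0002=False))  # ['pt']
print(stmt_triggers_fired("pt", ["pt1"], with_patch_0002=True))   # ['pt', 'pt1']
```

This also shows why question 1 above matters: with only 0001, `pt1`'s statement triggers would still be skipped.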
Re: [HACKERS] Declarative partitioning - another take
Hi,

I have observed below with the statement triggers.

I am able to create statement triggers at the root partition, but these
triggers are not getting fired on updating a partition.

CREATE TABLE pt (a INT, b INT) PARTITION BY RANGE(a);
CREATE TABLE pt1 PARTITION OF pt FOR VALUES FROM (1) to (7);
CREATE TABLE pt2 PARTITION OF pt FOR VALUES FROM (7) to (11);
INSERT INTO pt (a,b) SELECT i,i FROM generate_series(1,10)i;
CREATE TABLE pt_trigger(TG_NAME varchar,TG_TABLE_NAME varchar,TG_LEVEL varchar,TG_WHEN varchar);
CREATE OR REPLACE FUNCTION process_pt_trigger() RETURNS TRIGGER AS $pttg$
    BEGIN
        IF (TG_OP = 'UPDATE') THEN
            INSERT INTO pt_trigger SELECT TG_NAME,TG_TABLE_NAME,TG_LEVEL,TG_WHEN;
            RETURN NEW;
        END IF;
        RETURN NULL;
    END;
$pttg$ LANGUAGE plpgsql;
CREATE TRIGGER pt_trigger_after_p0 AFTER UPDATE ON pt FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger();
CREATE TRIGGER pt_trigger_before_p0 BEFORE UPDATE ON pt FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger();

postgres=# UPDATE pt SET a = 2 WHERE a = 1;
UPDATE 1
postgres=# SELECT * FROM pt_trigger ORDER BY 1;
 tg_name | tg_table_name | tg_level | tg_when
---------+---------------+----------+---------
(0 rows)

no statement-level trigger fired in this case; is this expected behaviour?

but if i am creating triggers on a leaf partition, the trigger is getting fired.

CREATE TRIGGER pt_trigger_after_p1 AFTER UPDATE ON pt1 FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger();
CREATE TRIGGER pt_trigger_before_p1 BEFORE UPDATE ON pt1 FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger();

postgres=# UPDATE pt SET a = 5 WHERE a = 4;
UPDATE 1
postgres=# SELECT * FROM pt_trigger ORDER BY 1;
       tg_name        | tg_table_name | tg_level  | tg_when
----------------------+---------------+-----------+---------
 pt_trigger_after_p1  | pt1           | STATEMENT | AFTER
 pt_trigger_before_p1 | pt1           | STATEMENT | BEFORE
(2 rows)

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
Re: [HACKERS] Declarative partitioning - another take
On 07.04.2017 13:05, Etsuro Fujita wrote:
> On 2016/12/14 16:20, Etsuro Fujita wrote:
>> On 2016/12/09 19:46, Maksim Milyutin wrote:
>>> I would like to work on two tasks:
>>> - insert (and eventually update) tuple routing for foreign partition.
>>> - the ability to create an index on the parent and have all of the
>>>   children inherit it;
>>> The first one has been implemented in pg_pathman somehow, but the code
>>> relies on dirty hacks, so the FDW API has to be improved.  As for the
>>> extended index support, it doesn't look like a super-hard task.
>> That would be great!  I'd like to help review the first one.
> There seems to be no work on the first one, so I'd like to work on that.

Yes, you can start to work on this, I'll join later as a reviewer.

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On 2016/12/14 16:20, Etsuro Fujita wrote:
> On 2016/12/09 19:46, Maksim Milyutin wrote:
>> I would like to work on two tasks:
>> - insert (and eventually update) tuple routing for foreign partition.
>> - the ability to create an index on the parent and have all of the
>>   children inherit it;
>> The first one has been implemented in pg_pathman somehow, but the code
>> relies on dirty hacks, so the FDW API has to be improved.  As for the
>> extended index support, it doesn't look like a super-hard task.
> That would be great!  I'd like to help review the first one.

There seems to be no work on the first one, so I'd like to work on that.

Best regards,
Etsuro Fujita
Re: [HACKERS] Declarative partitioning - another take
Hi Amit,

On Thu, 23 Feb 2017 16:29:32 +0900
Amit Langote wrote:
> Thanks for taking care of that.
>
> + *		PartitionRoot			relation descriptor for parent relation
>
> Maybe: relation descriptor for root parent relation

This seems better.  Patch is updated.

Thanks,
--
Yugo Nagata

diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 6332ea0..af3367a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -326,6 +326,7 @@ typedef struct JunkFilter
  *		onConflictSetWhere		list of ON CONFLICT DO UPDATE exprs (qual)
  *		PartitionCheck			partition check expression
  *		PartitionCheckExpr		partition check expression state
+ *		PartitionRoot			relation descriptor for root parent relation
  *
  */
 typedef struct ResultRelInfo
Re: [HACKERS] Declarative partitioning - another take
Nagata-san,

On 2017/02/23 16:17, Yugo Nagata wrote:
> Hi,
>
> I found that a comment for PartitionRoot in ResultRelInfo is missing.
> Although this is trivial, since all other members have comments, I
> think it is needed.  Attached is the patch to fix it.

Thanks for taking care of that.

+ *		PartitionRoot			relation descriptor for parent relation

Maybe: relation descriptor for root parent relation

Thanks,
Amit
Re: [HACKERS] Declarative partitioning - another take
Hi,

I found that a comment for PartitionRoot in ResultRelInfo is missing.
Although this is trivial, since all other members have comments, I
think it is needed.  Attached is the patch to fix it.

Regards,
Yugo Nagata

On Tue, 27 Dec 2016 17:59:05 +0900
Amit Langote wrote:
> On 2016/12/26 19:46, Amit Langote wrote:
>> (Perhaps, the following should be its own new thread)
>>
>> I noticed that ExecProcessReturning() doesn't work properly after tuple
>> routing (example shows how returning tableoid currently fails but I
>> mention some other issues below):
>>
>>   create table p (a int, b int) partition by range (a);
>>   create table p1 partition of p for values from (1) to (10);
>>   insert into p values (1) returning tableoid::regclass, *;
>>    tableoid | a | b
>>   ----------+---+---
>>    -        | 1 |
>>   (1 row)
>>
>>   INSERT 0 1
>>
>> I tried to fix that in 0007 to get:
>>
>>   insert into p values (1) returning tableoid::regclass, *;
>>    tableoid | a | b
>>   ----------+---+---
>>    p        | 1 |
>>   (1 row)
>>
>>   INSERT 0 1
>>
>> But I think it *may* be wrong to return the root table OID for tuples
>> inserted into leaf partitions, because with select we get partition OIDs:
>>
>>   select tableoid::regclass, * from p;
>>    tableoid | a | b
>>   ----------+---+---
>>    p1       | 1 |
>>   (1 row)
>>
>> If so, that means we should build the projection info (corresponding to
>> the returning list) for each target partition somehow.  ISTM, that's
>> going to have to be done within the planner by appropriate inheritance
>> translation of the original returning targetlist.
>
> Turns out getting the 2nd result may not require planner tweaks after
> all.  Unless I'm missing something, translation of varattnos of the
> RETURNING target list can be done as late as ExecInitModifyTable() for
> the insert case, unlike update/delete (which do require planner's
> attention).
>
> I updated the patch 0007 to implement the same, including the test.
> While doing that, I realized map_partition_varattnos introduced in 0003
> is rather restrictive in its applicability, because it assumes varno = 1
> for the expressions it accepts as input for the mapping.  Mapping the
> returning (target) list required modifying map_partition_varattnos to
> accept target_varno as an additional argument.  That way, we can map
> arbitrary expressions from the parent attribute numbers to partition
> attribute numbers, for expressions not limited to partition constraints.
>
> Patches 0001 to 0006 unchanged.
>
> Thanks,
> Amit

--
Yugo Nagata

diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 6332ea0..845bdf2 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -326,6 +326,7 @@ typedef struct JunkFilter
  *		onConflictSetWhere		list of ON CONFLICT DO UPDATE exprs (qual)
  *		PartitionCheck			partition check expression
  *		PartitionCheckExpr		partition check expression state
+ *		PartitionRoot			relation descriptor for parent relation
  *
  */
 typedef struct ResultRelInfo
Re: [HACKERS] Declarative partitioning - another take
On Fri, Feb 10, 2017 at 12:54 AM, Amit Langote wrote:
> On 2017/02/09 15:22, amul sul wrote:
>> About 0001-Check-partition-strategy-in-ATExecDropNotNull.patch,
>> following test is already covered in alter_table.sql @ Line # 1918,
>> instead of this kindly add test for list_partition:
>>
>> 77 +-- cannot drop NOT NULL constraint of a range partition key column
>> 78 +ALTER TABLE range_parted ALTER a DROP NOT NULL;
>> 79 +
>
> Thanks for the review!  You're right.  Updated patch attached.

Committed, thanks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 2017/02/09 15:22, amul sul wrote:
> About 0001-Check-partition-strategy-in-ATExecDropNotNull.patch,
> following test is already covered in alter_table.sql @ Line # 1918,
> instead of this kindly add test for list_partition:
>
> 77 +-- cannot drop NOT NULL constraint of a range partition key column
> 78 +ALTER TABLE range_parted ALTER a DROP NOT NULL;
> 79 +

Thanks for the review!  You're right.  Updated patch attached.

Thanks,
Amit

>From a4335048e92462fb55fa6db0d830c7c066cd7011 Mon Sep 17 00:00:00 2001
From: amit
Date: Fri, 10 Feb 2017 14:52:18 +0900
Subject: [PATCH] Check partition strategy in ATExecDropNotNull

We should prevent dropping the NOT NULL constraint on a column only if
the column is in the *range* partition key (as the error message also
says).  Add a regression test while at it.

Reported by: Amul Sul
Patch by: Amit Langote
Report: https://postgr.es/m/CAAJ_b95g5AgkpJKbLajAt8ovKub874-B9X08PiOqHvVfMO2mLw%40mail.gmail.com
---
 src/backend/commands/tablecmds.c          | 22 +++++++++++++---------
 src/test/regress/expected/alter_table.out |  4 ++++
 src/test/regress/sql/alter_table.sql      |  5 +++++
 3 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 37a4c4a3d6..f33aa70da6 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5593,18 +5593,22 @@ ATExecDropNotNull(Relation rel, const char *colName, LOCKMODE lockmode)
 	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 	{
 		PartitionKey key = RelationGetPartitionKey(rel);
-		int			partnatts = get_partition_natts(key),
-					i;

-		for (i = 0; i < partnatts; i++)
+		if (get_partition_strategy(key) == PARTITION_STRATEGY_RANGE)
 		{
-			AttrNumber	partattnum = get_partition_col_attnum(key, i);
+			int			partnatts = get_partition_natts(key),
+						i;

-			if (partattnum == attnum)
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
-						 errmsg("column \"%s\" is in range partition key",
-								colName)));
+			for (i = 0; i < partnatts; i++)
+			{
+				AttrNumber	partattnum = get_partition_col_attnum(key, i);
+
+				if (partattnum == attnum)
+					ereport(ERROR,
+							(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+							 errmsg("column \"%s\" is in range partition key",
+									colName)));
+			}
 		}
 	}
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index d8e7b61294..b0e80a7788 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -3029,6 +3029,10 @@ ERROR:  cannot alter type of column referenced in partition key expression
 -- cannot drop NOT NULL on columns in the range partition key
 ALTER TABLE partitioned ALTER COLUMN a DROP NOT NULL;
 ERROR:  column "a" is in range partition key
+-- it's fine however to drop one on the list partition key column
+CREATE TABLE list_partitioned (a int not null) partition by list (a);
+ALTER TABLE list_partitioned ALTER a DROP NOT NULL;
+DROP TABLE list_partitioned;
 -- partitioned table cannot participate in regular inheritance
 CREATE TABLE foo (
   a int,
diff --git a/src/test/regress/sql/alter_table.sql b/src/test/regress/sql/alter_table.sql
index 1f551ec53c..7513769359 100644
--- a/src/test/regress/sql/alter_table.sql
+++ b/src/test/regress/sql/alter_table.sql
@@ -1917,6 +1917,11 @@ ALTER TABLE partitioned ALTER COLUMN b TYPE char(5);
 -- cannot drop NOT NULL on columns in the range partition key
 ALTER TABLE partitioned ALTER COLUMN a DROP NOT NULL;

+-- it's fine however to drop one on the list partition key column
+CREATE TABLE list_partitioned (a int not null) partition by list (a);
+ALTER TABLE list_partitioned ALTER a DROP NOT NULL;
+DROP TABLE list_partitioned;
+
 -- partitioned table cannot participate in regular inheritance
 CREATE TABLE foo (
   a int,
--
2.11.0
Re: [HACKERS] Declarative partitioning - another take
About 0001-Check-partition-strategy-in-ATExecDropNotNull.patch: the
following test is already covered in alter_table.sql @ Line # 1918;
instead of this, kindly add a test for list_partition:

77 +-- cannot drop NOT NULL constraint of a range partition key column
78 +ALTER TABLE range_parted ALTER a DROP NOT NULL;
79 +

Regards,
Amul
Re: [HACKERS] Declarative partitioning - another take
On 2017/02/08 21:20, amul sul wrote:
> Regarding following code in ATExecDropNotNull function, I don't see
> any special check for RANGE partitioned, is it intended to have same
> restriction for LIST partitioned too, I guess not?
>
> 	/*
> 	 * If the table is a range partitioned table, check that the column is not
> 	 * in the partition key.
> 	 */
> 	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> 	{
> 		PartitionKey key = RelationGetPartitionKey(rel);
> 		int			partnatts = get_partition_natts(key),
> 					i;
>
> 		for (i = 0; i < partnatts; i++)
> 		{
> 			AttrNumber	partattnum = get_partition_col_attnum(key, i);
>
> 			if (partattnum == attnum)
> 				ereport(ERROR,
> 						(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> 						 errmsg("column \"%s\" is in range partition key",
> 								colName)));
> 		}
> 	}

Good catch!  Seems like an oversight; attached fixes it.

Thanks,
Amit

>From b84ac63b6b4acd19a09e8507cf199db127c06d71 Mon Sep 17 00:00:00 2001
From: amit
Date: Thu, 9 Feb 2017 10:31:58 +0900
Subject: [PATCH] Check partition strategy in ATExecDropNotNull

We should prevent dropping the NOT NULL constraint on a column only if
the column is in the *range* partition key (as the error message also
says).  Add a regression test while at it.

Reported by: Amul Sul
Patch by: Amit Langote
Report: https://postgr.es/m/CAAJ_b95g5AgkpJKbLajAt8ovKub874-B9X08PiOqHvVfMO2mLw%40mail.gmail.com
---
 src/backend/commands/tablecmds.c          | 22 +++++++++++++---------
 src/test/regress/expected/alter_table.out |  3 +++
 src/test/regress/sql/alter_table.sql      |  3 +++
 3 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 37a4c4a3d6..f33aa70da6 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5593,18 +5593,22 @@ ATExecDropNotNull(Relation rel, const char *colName, LOCKMODE lockmode)
 	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 	{
 		PartitionKey key = RelationGetPartitionKey(rel);
-		int			partnatts = get_partition_natts(key),
-					i;

-		for (i = 0; i < partnatts; i++)
+		if (get_partition_strategy(key) == PARTITION_STRATEGY_RANGE)
 		{
-			AttrNumber	partattnum = get_partition_col_attnum(key, i);
+			int			partnatts = get_partition_natts(key),
+						i;

-			if (partattnum == attnum)
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
-						 errmsg("column \"%s\" is in range partition key",
-								colName)));
+			for (i = 0; i < partnatts; i++)
+			{
+				AttrNumber	partattnum = get_partition_col_attnum(key, i);
+
+				if (partattnum == attnum)
+					ereport(ERROR,
+							(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+							 errmsg("column \"%s\" is in range partition key",
+									colName)));
+			}
 		}
 	}
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index d8e7b61294..e6d45fbf9c 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -3330,6 +3330,9 @@ ALTER TABLE list_parted2 DROP COLUMN b;
 ERROR:  cannot drop column named in partition key
 ALTER TABLE list_parted2 ALTER COLUMN b TYPE text;
 ERROR:  cannot alter type of column named in partition key
+-- cannot drop NOT NULL constraint of a range partition key column
+ALTER TABLE range_parted ALTER a DROP NOT NULL;
+ERROR:  column "a" is in range partition key
 -- cleanup: avoid using CASCADE
 DROP TABLE list_parted, part_1;
 DROP TABLE list_parted2, part_2, part_5, part_5_a;
diff --git a/src/test/regress/sql/alter_table.sql b/src/test/regress/sql/alter_table.sql
index 1f551ec53c..12edb8b3ba 100644
--- a/src/test/regress/sql/alter_table.sql
+++ b/src/test/regress/sql/alter_table.sql
@@ -2189,6 +2189,9 @@ ALTER TABLE part_2 INHERIT inh_test;
 ALTER TABLE list_parted2 DROP COLUMN b;
 ALTER TABLE list_parted2 ALTER COLUMN b TYPE text;

+-- cannot drop NOT NULL constraint of a range partition key column
+ALTER TABLE range_parted ALTER a DROP NOT NULL;
+
 -- cleanup: avoid using CASCADE
 DROP TABLE list_parted, part_1;
 DROP TABLE list_parted2, part_2, part_5, part_5_a;
--
2.11.0
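[Editor's note] The gist of the fix above can be sketched outside of C. This is an illustrative Python model with hypothetical names (not the actual backend API): after the patch, dropping NOT NULL on a partition key column is rejected only when the partitioning strategy is RANGE, matching what the error message claims.

```python
# Illustrative model of the fixed ATExecDropNotNull check: only a
# RANGE-partitioned table forbids dropping NOT NULL on a key column.
# Function and constant names are hypothetical.

RANGE, LIST = "range", "list"

def check_drop_not_null(strategy, key_columns, column):
    """Raise if dropping NOT NULL on `column` must be rejected."""
    if strategy == RANGE and column in key_columns:
        raise ValueError('column "%s" is in range partition key' % column)

check_drop_not_null(LIST, ["a"], "a")       # allowed after the fix
try:
    check_drop_not_null(RANGE, ["a"], "a")  # still rejected
except ValueError as e:
    print(e)  # column "a" is in range partition key
```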
Re: [HACKERS] Declarative partitioning - another take
Hi Amit,

Regarding the following code in the ATExecDropNotNull function, I don't
see any special check for RANGE partitioning; is it intended to have the
same restriction for LIST partitioning too?  I guess not.

	/*
	 * If the table is a range partitioned table, check that the column is not
	 * in the partition key.
	 */
	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
	{
		PartitionKey key = RelationGetPartitionKey(rel);
		int			partnatts = get_partition_natts(key),
					i;

		for (i = 0; i < partnatts; i++)
		{
			AttrNumber	partattnum = get_partition_col_attnum(key, i);

			if (partattnum == attnum)
				ereport(ERROR,
						(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
						 errmsg("column \"%s\" is in range partition key",
								colName)));
		}
	}

Regards,
Amul
Re: [HACKERS] Declarative partitioning - another take
On Mon, Jan 30, 2017 at 4:42 PM, Peter Eisentraut wrote:
> On 1/25/17 12:54 AM, Ashutosh Bapat wrote:
>> The documentation available at
>> https://www.postgresql.org/docs/devel/static/sql-createtable.html
>> does not make it clear that the lower bound of a range partition is
>> always inclusive and the higher one is exclusive.  I think a note in
>> section "PARTITION OF parent_table FOR VALUES partition_bound_spec"
>> would be helpful.
>
> Hmm.  I see the practical use of that, but I think this is going to be a
> source of endless confusion.  Can we make that a bit clearer in the
> syntax, for example by using additional keywords (INCLUSIVE/EXCLUSIVE)?

I am not in favor of adding mandatory noise words to the syntax; it's
fairly verbose already.  I think it would be reasonable to eventually
consider supporting optional keywords to allow multiple behaviors, but I
don't think that the usefulness of that has been so firmly established
that we should do it right this minute.  I think there are a heck of a
lot of other things about this partitioning implementation that are more
urgently in need of improvement.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/31 6:42, Peter Eisentraut wrote:
> On 1/25/17 12:54 AM, Ashutosh Bapat wrote:
>> The documentation available at
>> https://www.postgresql.org/docs/devel/static/sql-createtable.html
>> does not make it clear that the lower bound of a range partition is
>> always inclusive and the higher one is exclusive.  I think a note in
>> section "PARTITION OF parent_table FOR VALUES partition_bound_spec"
>> would be helpful.
>
> Hmm.  I see the practical use of that, but I think this is going to be a
> source of endless confusion.  Can we make that a bit clearer in the
> syntax, for example by using additional keywords (INCLUSIVE/EXCLUSIVE)?

The decision not to make that configurable with INCLUSIVE/EXCLUSIVE
syntax was deliberate [1].  To summarize: we can start with a default
behavior catering to most practical cases (that is, inclusive lower and
exclusive upper bounds) and document it so (not done yet; I will post a
doc patch for that today).  If it turns out that there is some demand for
making it configurable, we can later add the code to handle that
internally, plus the syntax.  But *starting* with that syntax means we
would have to carry, potentially needlessly, the code to handle seldom
used cases that could not be made as efficient as the current scheme of
all lower bounds being inclusive and all upper bounds exclusive.

Thanks,
Amit

[1] https://www.postgresql.org/message-id/CA%2BTgmoZou4ApEvC_nfhOxsi5G4SoD_evwNaiYn60ZcJ4XB_-QQ%40mail.gmail.com
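[Editor's note] The inclusive-lower/exclusive-upper convention being discussed can be illustrated with a small routing sketch. This is plain Python, purely illustrative; the partitions mirror the pt1/pt2 example from earlier in the thread (pt1 for values from 1 to 7, pt2 from 7 to 11), where each partition accepts lower <= key < upper.

```python
from bisect import bisect_right

# Range partitions as (lower, upper, name), sorted by lower bound.
# Each partition accepts keys with lower <= key < upper.
partitions = [(1, 7, "pt1"), (7, 11, "pt2")]

def route(key):
    """Return the name of the partition that accepts `key`, or None."""
    lowers = [p[0] for p in partitions]
    i = bisect_right(lowers, key) - 1   # rightmost partition with lower <= key
    if i >= 0 and key < partitions[i][1]:
        return partitions[i][2]
    return None                         # no partition: the insert would fail

print(route(1))   # pt1   (lower bound is inclusive)
print(route(7))   # pt2   (pt1's upper bound is exclusive)
print(route(11))  # None  (outside every partition)
```

Note how the boundary value 7 lands in pt2, never in pt1; with all lower bounds inclusive and upper bounds exclusive, no value can ever match two partitions, which is the efficiency argument made above.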
Re: [HACKERS] Declarative partitioning - another take
On 1/25/17 12:54 AM, Ashutosh Bapat wrote:
> The documentation available at
> https://www.postgresql.org/docs/devel/static/sql-createtable.html
> does not make it clear that the lower bound of a range partition is
> always inclusive and the higher one is exclusive.  I think a note in
> section "PARTITION OF parent_table FOR VALUES partition_bound_spec"
> would be helpful.

Hmm.  I see the practical use of that, but I think this is going to be a
source of endless confusion.  Can we make that a bit clearer in the
syntax, for example by using additional keywords (INCLUSIVE/EXCLUSIVE)?

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Declarative partitioning - another take
Hi Ashutosh,

On 2017/01/25 14:54, Ashutosh Bapat wrote:
> The documentation available at
> https://www.postgresql.org/docs/devel/static/sql-createtable.html
> does not make it clear that the lower bound of a range partition is
> always inclusive and the higher one is exclusive.  I think a note in
> section "PARTITION OF parent_table FOR VALUES partition_bound_spec"
> would be helpful.

Yes, I'm working on a documentation patch for that and a few other
things, such as revising the Partitioning chapter.

Thanks,
Amit
Re: [HACKERS] Declarative partitioning - another take
The documentation available at
https://www.postgresql.org/docs/devel/static/sql-createtable.html
does not make it clear that the lower bound of a range partition is
always inclusive and the higher one is exclusive.  I think a note in
section "PARTITION OF parent_table FOR VALUES partition_bound_spec"
would be helpful.

On Wed, Jan 25, 2017 at 7:11 AM, Amit Langote wrote:
> On 2017/01/25 5:55, Robert Haas wrote:
>> On Thu, Jan 19, 2017 at 9:58 PM, Amit Langote wrote:
>>> [ new patches ]
>>
>> Committed 0001 and 0002.  See my earlier email for comments on 0003.
>
> It seems patches for all the issues mentioned in this thread so far,
> except 0003 (I just sent an updated version in another email), have been
> committed.  Thanks a lot for your time.  I will create new threads for
> any more patches from here on.
>
> Regards,
> Amit

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/25 5:55, Robert Haas wrote:
> On Thu, Jan 19, 2017 at 9:58 PM, Amit Langote wrote:
>> [ new patches ]
>
> Committed 0001 and 0002.  See my earlier email for comments on 0003.

It seems patches for all the issues mentioned in this thread so far,
except 0003 (I just sent an updated version in another email), have been
committed.  Thanks a lot for your time.  I will create new threads for
any more patches from here on.

Regards,
Amit
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/25 2:56, Robert Haas wrote:
> On Thu, Jan 19, 2017 at 9:58 PM, Amit Langote wrote:
>>> But I wonder why we don't instead just change this function to
>>> consider tdhasoid rather than tdtypeid.  I mean, if the only point of
>>> comparing the type OIDs is to find out whether the table-has-OIDs
>>> setting matches, we could instead test that directly and avoid needing
>>> to pass an extra argument.  I wonder if there's some other reason this
>>> code is there which is not documented in the comment...
>>
>> With the following patch, regression tests run fine:
>>
>>      if (indesc->natts == outdesc->natts &&
>> -        indesc->tdtypeid == outdesc->tdtypeid)
>> +        indesc->tdhasoid != outdesc->tdhasoid)
>>      {
>>
>> If checking tdtypeid (instead of tdhasoid directly) has some other
>> consideration, I would have seen at least some tests broken by this
>> change.  So, if we are to go with this, I too prefer it over my previous
>> proposal to add an argument to convert_tuples_by_name().  Attached 0003
>> implements the above approach.
>
> I think this is not quite right.  First, the patch compares the
> tdhasoid status with != rather than ==, which would have the effect of
> saying that we can skip conversion if the has-OID statuses do NOT
> match.  That can't be right.

You're right.

> Second, I believe that the comments
> imply that conversion should be done if *either* tuple has OIDs.  I
> believe that's because whoever wrote this comment thought that we
> needed to replace the OID if the tuple already had one, which is what
> do_convert_tuple would do.  I'm not sure whether that's really
> necessary, but we're less likely to break anything if we preserve the
> existing behavior, and I don't think we lose much from doing so
> because few user tables will have OIDs.  So I would change this test
> to if (indesc->natts == outdesc->natts && !indesc->tdhasoid &&
> !outdesc->tdhasoid), and I'd revise the one in
> convert_tuples_by_position() to match.  Then I think it's much clearer
> that we're just optimizing what's there already, not changing the
> behavior.

Agreed.  Updated patch attached.

Thanks,
Amit

>From c1fa4b9f04f328b8df54ef487ee9d46f5978e0de Mon Sep 17 00:00:00 2001
From: amit
Date: Mon, 26 Dec 2016 17:44:14 +0900
Subject: [PATCH] Avoid tuple conversion in common partitioning cases

Currently, the tuple conversion is performed after a tuple is routed,
even if the attributes of a target leaf partition map one-to-one with
those of the root table, which is wasteful.  Avoid that by making
convert_tuples_by_name() return a NULL map for such cases.

Reported by: n/a
Patch by: Amit Langote
Reports: n/a
---
 src/backend/access/common/tupconvert.c | 18 ++++++++++--------
 src/backend/catalog/partition.c        |  3 +--
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/common/tupconvert.c b/src/backend/access/common/tupconvert.c
index b17ceafa6e..a4012525d8 100644
--- a/src/backend/access/common/tupconvert.c
+++ b/src/backend/access/common/tupconvert.c
@@ -138,12 +138,13 @@ convert_tuples_by_position(TupleDesc indesc,
 						nincols, noutcols)));

 	/*
-	 * Check to see if the map is one-to-one and the tuple types are the same.
-	 * (We check the latter because if they're not, we want to do conversion
-	 * to inject the right OID into the tuple datum.)
+	 * Check to see if the map is one-to-one, in which case we need not do
+	 * the tuple conversion.  That's not enough though if either source or
+	 * destination (tuples) contains OIDs; we'd need conversion in that case
+	 * to inject the right OID into the tuple datum.
 	 */
 	if (indesc->natts == outdesc->natts &&
-		indesc->tdtypeid == outdesc->tdtypeid)
+		!indesc->tdhasoid && !outdesc->tdhasoid)
 	{
 		for (i = 0; i < n; i++)
 		{
@@ -214,12 +215,13 @@ convert_tuples_by_name(TupleDesc indesc,
 	attrMap = convert_tuples_by_name_map(indesc, outdesc, msg);

 	/*
-	 * Check to see if the map is one-to-one and the tuple types are the same.
-	 * (We check the latter because if they're not, we want to do conversion
-	 * to inject the right OID into the tuple datum.)
+	 * Check to see if the map is one-to-one, in which case we need not do
+	 * the tuple conversion.  That's not enough though if either source or
+	 * destination (tuples) contains OIDs; we'd need conversion in that case
+	 * to inject the right OID into the tuple datum.
 	 */
 	if (indesc->natts == outdesc->natts &&
-		indesc->tdtypeid == outdesc->tdtypeid)
+		!indesc->tdhasoid && !outdesc->tdhasoid)
 	{
 		same = true;
 		for (i = 0; i < n; i++)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 7be60529c5..4bcef58763 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1700,12 +1700,11 @@ get_partition_for_tuple(PartitionDispatch *pd,
 		return -1;
 	}

-	if (myslot != NULL)
+	if (myslot != NULL && map != NULL)
 	{
Re: [HACKERS] Declarative partitioning - another take
Hi Keith,

On 2017/01/20 12:40, Keith Fiske wrote:
> So testing things out in pg_partman for native sub-partitioning and ran
> into what is a bug for me that I know I have to fix, but I'm curious if
> this can be prevented in the first place within the native partitioning
> code itself.  The below shows a sub-partitioning set where the
> sub-partition has a constraint range that is outside of the range of its
> parent.  If the columns were different I could see where this would be
> allowed, but the same column is used throughout the levels of
> sub-partitioning.  Understandable if that may be too complex to check
> for, but figured I'd bring it up as something I accidentally ran into in
> case you see an easy way to prevent it.

This was discussed.  See Robert's response (2nd part of):
https://www.postgresql.org/message-id/CA%2BTgmoaQABrsLQK4ms_4NiyavyJGS-b6ZFkZBBNC%2B-P5DjJNFA%40mail.gmail.com

In short, if partitions are defined across different levels such that
some of the data the user intended to insert into the table (the part of
the sub-partition's range that doesn't overlap with its parent's) can
never be inserted, that's an operator error.  It's like adding
contradictory check constraints to the table:

create table foo (a int check (a > 0 and a < 0));
insert into foo values (1);
ERROR:  new row for relation "foo" violates check constraint "foo_a_check"
DETAIL:  Failing row contains (1).

One (perhaps the only) thing that could be done is to warn users against
this kind of mistake in the documentation.  Trying to do anything else in
the core partitioning code would make it too complicated for not much
benefit (also see Robert's last line, :)).

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On Thu, Jan 19, 2017 at 9:58 PM, Amit Langote wrote:
> [ new patches ]

Committed 0001 and 0002.  See my earlier email for comments on 0003.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On Thu, Jan 19, 2017 at 9:58 PM, Amit Langote wrote:
>> But I wonder why we don't instead just change this function to
>> consider tdhasoid rather than tdtypeid.  I mean, if the only point of
>> comparing the type OIDs is to find out whether the table-has-OIDs
>> setting matches, we could instead test that directly and avoid needing
>> to pass an extra argument.  I wonder if there's some other reason this
>> code is there which is not documented in the comment...
>
> With the following patch, regression tests run fine:
>
>      if (indesc->natts == outdesc->natts &&
> -        indesc->tdtypeid == outdesc->tdtypeid)
> +        indesc->tdhasoid != outdesc->tdhasoid)
>      {
>
> If checking tdtypeid (instead of tdhasoid directly) has some other
> consideration, I'd have seen at least some tests broken by this
> change.  So, if we are to go with this, I too prefer it over my previous
> proposal to add an argument to convert_tuples_by_name().  Attached 0003
> implements the above approach.

I think this is not quite right.  First, the patch compares the
tdhasoid status with != rather than ==, which would have the effect of
saying that we can skip conversion if the has-OID statuses do NOT
match.  That can't be right.  Second, I believe that the comments
imply that conversion should be done if *either* tuple has OIDs.  I
believe that's because whoever wrote this comment thought that we
needed to replace the OID if the tuple already had one, which is what
do_convert_tuple would do.  I'm not sure whether that's really
necessary, but we're less likely to break anything if we preserve the
existing behavior, and I don't think we lose much from doing so
because few user tables will have OIDs.  So I would change this test
to if (indesc->natts == outdesc->natts && !indesc->tdhasoid &&
!outdesc->tdhasoid), and I'd revise the one in
convert_tuples_by_position() to match.  Then I think it's much clearer
that we're just optimizing what's there already, not changing the
behavior.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/21 6:29, Robert Haas wrote:
> On Fri, Jan 20, 2017 at 1:15 AM, Andres Freund wrote:
>> On 2017-01-19 14:18:23 -0500, Robert Haas wrote:
>>> Committed.
>>
>> One of the patches around this topic committed recently seems to cause
>> valgrind failures like
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-01-19%2008%3A40%3A02
>> :
>
> Tom Lane independently reported this same issue.  I believe I've fixed
> it.  Sorry for not noticing and crediting this report also.

Thanks Robert for looking at this and committing the fix.

Regards,
Amit
Re: [HACKERS] Declarative partitioning - another take
On Fri, Jan 20, 2017 at 1:15 AM, Andres Freund wrote:
> On 2017-01-19 14:18:23 -0500, Robert Haas wrote:
>> Committed.
>
> One of the patches around this topic committed recently seems to cause
> valgrind failures like
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-01-19%2008%3A40%3A02
> :

Tom Lane independently reported this same issue.  I believe I've fixed
it.  Sorry for not noticing and crediting this report also.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
Hi Andres,

On 2017/01/20 15:15, Andres Freund wrote:
> On 2017-01-19 14:18:23 -0500, Robert Haas wrote:
>> Committed.
>
> One of the patches around this topic committed recently seems to cause
> valgrind failures like
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-01-19%2008%3A40%3A02
> :
> ==24969== Conditional jump or move depends on uninitialised value(s)
> ==24969==    at 0x4C38DA: btint4cmp (nbtcompare.c:97)
> ==24969==    by 0x83860C: FunctionCall2Coll (fmgr.c:1318)
> ==24969==    by 0x536643: partition_bounds_equal (partition.c:627)
> ==24969==    by 0x820864: equalPartitionDescs (relcache.c:1203)
> ==24969==    by 0x82423A: RelationClearRelation (relcache.c:2553)
> ==24969==    by 0x8248BA: RelationFlushRelation (relcache.c:2662)
> ==24969==    by 0x824983: RelationCacheInvalidateEntry (relcache.c:2714)
> ==24969==    by 0x81D9D6: LocalExecuteInvalidationMessage (inval.c:568)
> ==24969==    by 0x81CB0D: ProcessInvalidationMessages (inval.c:444)
> ==24969==    by 0x81D3CB: CommandEndInvalidationMessages (inval.c:1056)
> ==24969==    by 0x4F6735: AtCCI_LocalCache (xact.c:1374)
> ==24969==    by 0x4F8249: CommandCounterIncrement (xact.c:957)
> ==24969== Uninitialised value was created by a heap allocation
> ==24969==    at 0x85AA83: palloc (mcxt.c:914)
> ==24969==    by 0x53648E: RelationBuildPartitionDesc (partition.c:528)
> ==24969==    by 0x823F93: RelationBuildDesc (relcache.c:1348)
> ==24969==    by 0x8241DB: RelationClearRelation (relcache.c:2524)
> ==24969==    by 0x8248BA: RelationFlushRelation (relcache.c:2662)
> ==24969==    by 0x824983: RelationCacheInvalidateEntry (relcache.c:2714)
> ==24969==    by 0x81D9D6: LocalExecuteInvalidationMessage (inval.c:568)
> ==24969==    by 0x81CB0D: ProcessInvalidationMessages (inval.c:444)
> ==24969==    by 0x81D3CB: CommandEndInvalidationMessages (inval.c:1056)
> ==24969==    by 0x4F6735: AtCCI_LocalCache (xact.c:1374)
> ==24969==    by 0x4F8249: CommandCounterIncrement (xact.c:957)
> ==24969==    by 0x82538B: RelationSetNewRelfilenode (relcache.c:3490)
> ==24969==
> ==24969== VALGRINDERROR-END

Thanks for the report.  This being my first time reading a valgrind
report on the buildfarm, is it correct to assume that the command
immediately preceding VALGRINDERROR-BEGIN is what triggered the failure?

...
LOG:  statement: truncate list_parted;
==24969== VALGRINDERROR-BEGIN
==24969== Conditional jump or move depends on uninitialised value(s)
==24969==    at 0x4C38DA: btint4cmp (nbtcompare.c:97)

So in this case: truncate list_parted?

Thanks,
Amit
Re: [HACKERS] Declarative partitioning - another take
On 2017-01-19 14:18:23 -0500, Robert Haas wrote:
> Committed.

One of the patches around this topic committed recently seems to cause
valgrind failures like
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-01-19%2008%3A40%3A02
:
==24969== Conditional jump or move depends on uninitialised value(s)
==24969==    at 0x4C38DA: btint4cmp (nbtcompare.c:97)
==24969==    by 0x83860C: FunctionCall2Coll (fmgr.c:1318)
==24969==    by 0x536643: partition_bounds_equal (partition.c:627)
==24969==    by 0x820864: equalPartitionDescs (relcache.c:1203)
==24969==    by 0x82423A: RelationClearRelation (relcache.c:2553)
==24969==    by 0x8248BA: RelationFlushRelation (relcache.c:2662)
==24969==    by 0x824983: RelationCacheInvalidateEntry (relcache.c:2714)
==24969==    by 0x81D9D6: LocalExecuteInvalidationMessage (inval.c:568)
==24969==    by 0x81CB0D: ProcessInvalidationMessages (inval.c:444)
==24969==    by 0x81D3CB: CommandEndInvalidationMessages (inval.c:1056)
==24969==    by 0x4F6735: AtCCI_LocalCache (xact.c:1374)
==24969==    by 0x4F8249: CommandCounterIncrement (xact.c:957)
==24969== Uninitialised value was created by a heap allocation
==24969==    at 0x85AA83: palloc (mcxt.c:914)
==24969==    by 0x53648E: RelationBuildPartitionDesc (partition.c:528)
==24969==    by 0x823F93: RelationBuildDesc (relcache.c:1348)
==24969==    by 0x8241DB: RelationClearRelation (relcache.c:2524)
==24969==    by 0x8248BA: RelationFlushRelation (relcache.c:2662)
==24969==    by 0x824983: RelationCacheInvalidateEntry (relcache.c:2714)
==24969==    by 0x81D9D6: LocalExecuteInvalidationMessage (inval.c:568)
==24969==    by 0x81CB0D: ProcessInvalidationMessages (inval.c:444)
==24969==    by 0x81D3CB: CommandEndInvalidationMessages (inval.c:1056)
==24969==    by 0x4F6735: AtCCI_LocalCache (xact.c:1374)
==24969==    by 0x4F8249: CommandCounterIncrement (xact.c:957)
==24969==    by 0x82538B: RelationSetNewRelfilenode (relcache.c:3490)
==24969==
==24969== VALGRINDERROR-END

Regards,

Andres
Re: [HACKERS] Declarative partitioning - another take
So testing things out in pg_partman for native sub-partitioning and ran
into what is a bug for me that I know I have to fix, but I'm curious if
this can be prevented in the first place within the native partitioning
code itself.  The below shows a sub-partitioning set where the
sub-partition has a constraint range that is outside of the range of its
parent.  If the columns were different I could see where this would be
allowed, but the same column is used throughout the levels of
sub-partitioning.  Understandable if that may be too complex to check
for, but figured I'd bring it up as something I accidentally ran into in
case you see an easy way to prevent it.

This was encountered as of commit ba61a04bc7fefeee03416d9911eb825c4897c223.

keith@keith=# \d+ partman_test.time_taptest_table
                                 Table "partman_test.time_taptest_table"
 Column |           Type           | Collation | Nullable | Default | Storage  | Stats target | Description
--------+--------------------------+-----------+----------+---------+----------+--------------+-------------
 col1   | integer                  |           |          |         | plain    |              |
 col2   | text                     |           |          |         | extended |              |
 col3   | timestamp with time zone |           | not null | now()   | plain    |              |
Partition key: RANGE (col3)
Partitions: partman_test.time_taptest_table_p2015 FOR VALUES FROM ('2015-01-01 00:00:00-05') TO ('2016-01-01 00:00:00-05'),
            partman_test.time_taptest_table_p2016 FOR VALUES FROM ('2016-01-01 00:00:00-05') TO ('2017-01-01 00:00:00-05'),
            partman_test.time_taptest_table_p2017 FOR VALUES FROM ('2017-01-01 00:00:00-05') TO ('2018-01-01 00:00:00-05'),
            partman_test.time_taptest_table_p2018 FOR VALUES FROM ('2018-01-01 00:00:00-05') TO ('2019-01-01 00:00:00-05'),
            partman_test.time_taptest_table_p2019 FOR VALUES FROM ('2019-01-01 00:00:00-05') TO ('2020-01-01 00:00:00-05')

keith@keith=# \d+ partman_test.time_taptest_table_p2015
                              Table "partman_test.time_taptest_table_p2015"
 Column |           Type           | Collation | Nullable | Default | Storage  | Stats target | Description
--------+--------------------------+-----------+----------+---------+----------+--------------+-------------
 col1   | integer                  |           |          |         | plain    |              |
 col2   | text                     |           |          |         | extended |              |
 col3   | timestamp with time zone |           | not null | now()   | plain    |              |
Partition of: partman_test.time_taptest_table FOR VALUES FROM ('2015-01-01 00:00:00-05') TO ('2016-01-01 00:00:00-05')
Partition key: RANGE (col3)
Partitions: partman_test.time_taptest_table_p2015_p2016_11 FOR VALUES FROM ('2016-11-01 00:00:00-04') TO ('2016-12-01 00:00:00-05'),
            partman_test.time_taptest_table_p2015_p2016_12 FOR VALUES FROM ('2016-12-01 00:00:00-05') TO ('2017-01-01 00:00:00-05'),
            partman_test.time_taptest_table_p2015_p2017_01 FOR VALUES FROM ('2017-01-01 00:00:00-05') TO ('2017-02-01 00:00:00-05'),
            partman_test.time_taptest_table_p2015_p2017_02 FOR VALUES FROM ('2017-02-01 00:00:00-05') TO ('2017-03-01 00:00:00-05'),
            partman_test.time_taptest_table_p2015_p2017_03 FOR VALUES FROM ('2017-03-01 00:00:00-05') TO ('2017-04-01 00:00:00-04')

keith@keith=# \d+ partman_test.time_taptest_table_p2015_p2017_03
                          Table "partman_test.time_taptest_table_p2015_p2017_03"
 Column |           Type           | Collation | Nullable | Default | Storage  | Stats target | Description
--------+--------------------------+-----------+----------+---------+----------+--------------+-------------
 col1   | integer                  |           |          |         | plain    |              |
 col2   | text                     |           |          |         | extended |              |
 col3   | timestamp with time zone |           | not null | now()   | plain    |              |
Partition of: partman_test.time_taptest_table_p2015 FOR VALUES FROM ('2017-03-01 00:00:00-05') TO ('2017-04-01 00:00:00-04')

--
Keith Fiske
Database Administrator
OmniTI Computer Consulting, Inc.
http://www.keithf4.com
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/20 4:18, Robert Haas wrote:
> On Thu, Jan 19, 2017 at 12:15 AM, Amit Langote wrote:
>> 0002-Set-ecxt_scantuple-correctly-for-tuple-routing.patch
>>
>> In 2ac3ef7a01df859c62d0a02333b646d65eaec5ff, we changed things so that
>> it's possible for a different TupleTableSlot to be used for partitioned
>> tables at successively lower levels.  If we do end up changing the slot
>> from the original, we must update ecxt_scantuple to point to the new one
>> for the partition key of the tuple to be computed correctly.
>>
>> Last posted here:
>> https://www.postgresql.org/message-id/99fbfad6-b89b-9427-a6ca-197aad98c48e%40lab.ntt.co.jp
>
> Why does FormPartitionKeyDatum need this?  Could that be documented in
> a comment in here someplace, perhaps the header comment to
> FormPartitionKeyDatum?

FormPartitionKeyDatum() computes partition key expressions (if any), and
the expression evaluation machinery expects ecxt_scantuple to point to
the tuple to extract attribute values from.  FormPartitionKeyDatum()
already has a tiny comment, which it seems is the only thing we could
say about this here:

 * the ecxt_scantuple slot of estate's per-tuple expr context must point to
 * the heap tuple passed in.

In get_partition_for_tuple() which calls FormPartitionKeyDatum(), the
patch adds this comment (changed a little from the last version):

+        /*
+         * Extract partition key from tuple.  Expression evaluation machinery
+         * that FormPartitionKeyDatum() invokes expects ecxt_scantuple to
+         * point to the correct tuple slot.  The slot might have changed from
+         * what was used for the parent table if the table of the current
+         * partitioning level has different tuple descriptor from the parent.
+         * So update ecxt_scantuple accordingly.
+         */
+        ecxt->ecxt_scantuple = slot;
         FormPartitionKeyDatum(parent, slot, estate, values, isnull);

It says why we need to change which slot ecxt_scantuple points to.
>> 0004-Fix-some-issues-with-views-and-partitioned-tables.patch >> >> Automatically updatable views failed to handle partitioned tables. >> Once that's fixed, WITH CHECK OPTIONS wouldn't work correctly without >> the WCO expressions having been suitably converted for each partition >> (think applying map_partition_varattnos to Vars in the WCO expressions >> just like with partition constraint expressions). > > The changes to execMain.c contain a hunk which has essentially the > same code twice. That looks like a mistake. Also, the patch doesn't > compile because convert_tuples_by_name() takes 3 arguments, not 4. Actually, I realized that and sent the updated version [1] of this patch that fixed this issue. In the updated version, I removed that code block (the 2 copies of it), because we are still discussing what to do about showing tuples in constraint violation (in this case, WITH CHECK OPTION violation) messages. Anyway, attached here again. >> 0006-Avoid-tuple-coversion-in-common-partitioning-cases.patch >> >> Currently, the tuple conversion is performed after a tuple is routed, >> even if the attributes of a target leaf partition map one-to-one with >> those of the root table, which is wasteful. Avoid that by making >> convert_tuples_by_name() return a NULL map for such cases. > > +Assert(!consider_typeid && indesc->tdhasoid == outdesc->tdhasoid); > > I think you mean Assert(consider_typeid || indesc->tdhasoid == > outdesc->tdhasoid); Ah, you're right. > But I wonder why we don't instead just change this function to > consider tdhasoid rather than tdtypeid. I mean, if the only point of > comparing the type OIDs is to find out whether the table-has-OIDs > setting matches, we could instead test that directly and avoid needing > to pass an extra argument. I wonder if there's some other reason this > code is there which is not documented in the comment... 
With the following patch, regression tests run fine:

     if (indesc->natts == outdesc->natts &&
-        indesc->tdtypeid == outdesc->tdtypeid)
+        indesc->tdhasoid != outdesc->tdhasoid)
     {

If checking tdtypeid (instead of tdhasoid directly) has some other
consideration, I'd have seen at least some tests broken by this change.
So, if we are to go with this, I too prefer it over my previous proposal
to add an argument to convert_tuples_by_name().  Attached 0003
implements the above approach.

> Phew.  Thanks for all the patches, sorry I'm having trouble keeping up.

Thanks a lot for taking time to look through them and committing.

Thanks,
Amit

[1] https://www.postgresql.org/message-id/92fe2a71-5eb7-ee8d-53ef-cfd5a65dfc3d%40lab.ntt.co.jp

>From c7088290221a9fe0818139145b7bdf6731bc419a Mon Sep 17 00:00:00 2001
From: amit
Date: Wed, 28 Dec 2016 10:10:26 +0900
Subject: [PATCH 1/3] Set ecxt_scantuple correctly for tuple-routing

In 2ac3ef7a01df859c62d0a02333b646d65eaec5ff, we changed things so that
it's possible for a different TupleTableSlot to be used for partitioned
tables at successively lower levels.  If we do
Re: [HACKERS] Declarative partitioning - another take
On Thu, Jan 19, 2017 at 12:15 AM, Amit Langote wrote:
> 0001-Fix-a-bug-of-insertion-into-an-internal-partition.patch
>
> Since implicit partition constraints are not inherited, an internal
> partition's constraint was not being enforced when targeted directly.
> So, include such constraint when setting up leaf partition result
> relations for tuple-routing.

Committed.

> 0002-Set-ecxt_scantuple-correctly-for-tuple-routing.patch
>
> In 2ac3ef7a01df859c62d0a02333b646d65eaec5ff, we changed things so that
> it's possible for a different TupleTableSlot to be used for partitioned
> tables at successively lower levels.  If we do end up changing the slot
> from the original, we must update ecxt_scantuple to point to the new one
> for partition key of the tuple to be computed correctly.
>
> Last posted here:
> https://www.postgresql.org/message-id/99fbfad6-b89b-9427-a6ca-197aad98c48e%40lab.ntt.co.jp

Why does FormPartitionKeyDatum need this?  Could that be documented in
a comment in here someplace, perhaps the header comment to
FormPartitionKeyDatum?

> 0003-Fix-RETURNING-to-work-correctly-after-tuple-routing.patch
>
> In ExecInsert(), do not switch back to the root partitioned table
> ResultRelInfo until after we finish ExecProcessReturning(), so that
> RETURNING projection is done using the partition's descriptor.  For
> the projection to work correctly, we must initialize the same for
> each leaf partition during ModifyTableState initialization.

Committed.

> 0004-Fix-some-issues-with-views-and-partitioned-tables.patch
>
> Automatically updatable views failed to handle partitioned tables.
> Once that's fixed, WITH CHECK OPTIONS wouldn't work correctly without
> the WCO expressions having been suitably converted for each partition
> (think applying map_partition_varattnos to Vars in the WCO expressions
> just like with partition constraint expressions).

The changes to execMain.c contain a hunk which has essentially the
same code twice.  That looks like a mistake.
Also, the patch doesn't compile because convert_tuples_by_name() takes 3
arguments, not 4.

> 0005-Fix-some-wrong-thinking-in-check_new_partition_bound.patch
>
> Because a given range bound in the PartitionBoundInfo.datums array
> is sometimes a range lower bound and upper bound at other times, we
> must be careful when assuming which, especially when interpreting
> the result of partition_bound_bsearch which returns the index of the
> greatest bound that is less than or equal to probe.  Due to an error
> in thinking about the same, the relevant code in
> check_new_partition_bound() caused invalid partition (index = -1)
> to be chosen as the partition being overlapped.
>
> Last posted here:
> https://www.postgresql.org/message-id/603acb8b-5dec-31e8-29b0-609a68aac50f%40lab.ntt.co.jp

             }
+            /*
+             * If equal has been set to true or if there is no "gap"
+             * between the bound at off1 and that at off1 + 1, the new
+             * partition will overlap some partition.  In the former
+             * case, the new lower bound is found to be equal to the
+             * bound at off1, which could only ever be true if the
+             * latter is the lower bound of some partition.  It's
+             * clear in such a case that the new partition overlaps
+             * that partition, whose index we get using its upper
+             * bound (that is, using the bound at off1 + 1).
+             */
             else

Stylistically, we usually avoid this, or at least I do.  The comment
should go inside the "else" block.  But it looks OK apart from that, so
committed with a little rephrasing and reformatting of the comment.

> 0006-Avoid-tuple-coversion-in-common-partitioning-cases.patch
>
> Currently, the tuple conversion is performed after a tuple is routed,
> even if the attributes of a target leaf partition map one-to-one with
> those of the root table, which is wasteful.  Avoid that by making
> convert_tuples_by_name() return a NULL map for such cases.
+Assert(!consider_typeid && indesc->tdhasoid == outdesc->tdhasoid); I think you mean Assert(consider_typeid || indesc->tdhasoid == outdesc->tdhasoid); But I wonder why we don't instead just change this function to consider tdhasoid rather than tdtypeid. I mean, if the only point of comparing the type OIDs is to find out whether the table-has-OIDs setting matches, we could instead test that directly and avoid needing to pass an extra argument. I wonder if there's some other reason this code is there which is not documented in the comment... > 0007-Avoid-code-duplication-in-map_partition_varattnos.patch > > Code to map attribute numbers in map_partition_varattnos() duplicates > what convert_tuples_by_name_map() does. Avoid that as pointed out by > Álvaro Herrera. > > Last posted here: >
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/19 14:15, Amit Langote wrote:
> So, here are all the patches I posted to date (and one new at the bottom)
> for reported and unreported bugs, excluding the one involving
> BulkInsertState for which you replied in a new thread.
>
> I'll describe the attached patches in brief:

Sorry, I forgot to mention that I have skipped the patch I proposed to
modify the committed approach [1] to get the correct tuple to show in
the constraint violation messages.  It might be better to continue that
discussion at [2].  And because I skipped that patch, I should have
removed the related logic in ExecWithCheckOptions() that I added in one
of the later patches (patch 0004), which I forgot to do.  So, here are
all the patches again with the correct 0004 this time.  Sigh.

Thanks,
Amit

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f1b4c771ea74f42447dccaed42ffcdcccf3aa694
[2] https://www.postgresql.org/message-id/CA%2BTgmoZjGzSM5WwnyapFaw3GxnDLWh7pm8Xiz8_QWQnUQy%3DSCA%40mail.gmail.com

>From 37c861d5eae1c8f11b3fae93cb209da262a13fbc Mon Sep 17 00:00:00 2001
From: amit
Date: Tue, 13 Dec 2016 15:07:41 +0900
Subject: [PATCH 1/8] Fix a bug of insertion into an internal partition.

Since implicit partition constraints are not inherited, an internal
partition's constraint was not being enforced when targeted directly.
So, include such constraint when setting up leaf partition result
relations for tuple-routing.
Reported by: n/a
Patch by: Amit Langote
Reports: n/a
---
 src/backend/commands/copy.c          |  1 -
 src/backend/commands/tablecmds.c     |  1 -
 src/backend/executor/execMain.c      | 42
 src/include/executor/executor.h      |  1 -
 src/test/regress/expected/insert.out |  6 ++
 src/test/regress/sql/insert.sql      |  5 +
 6 files changed, 44 insertions(+), 12 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 1fd2162794..75386212e0 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2432,7 +2432,6 @@ CopyFrom(CopyState cstate)
     InitResultRelInfo(resultRelInfo,
                       cstate->rel,
                       1,    /* dummy rangetable index */
-                      true, /* do load partition check expression */
                       NULL,
                       0);

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index e633a50dd2..06e43cbb3a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1324,7 +1324,6 @@ ExecuteTruncate(TruncateStmt *stmt)
     InitResultRelInfo(resultRelInfo,
                       rel,
                       0,    /* dummy rangetable index */
-                      false,
                       NULL,
                       0);
     resultRelInfo++;

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ff277d300a..5457f8fbde 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -824,10 +824,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
             resultRelationOid = getrelid(resultRelationIndex, rangeTable);
             resultRelation = heap_open(resultRelationOid, RowExclusiveLock);
+
             InitResultRelInfo(resultRelInfo,
                               resultRelation,
                               resultRelationIndex,
-                              true,
                               NULL,
                               estate->es_instrument);
             resultRelInfo++;
@@ -1218,10 +1218,11 @@
 void
 InitResultRelInfo(ResultRelInfo *resultRelInfo,
                   Relation resultRelationDesc,
                   Index resultRelationIndex,
-                  bool load_partition_check,
                   Relation partition_root,
                   int instrument_options)
 {
+    List       *partition_check = NIL;
+
     MemSet(resultRelInfo, 0, sizeof(ResultRelInfo));
     resultRelInfo->type = T_ResultRelInfo;
     resultRelInfo->ri_RangeTableIndex = resultRelationIndex;
@@ -1257,13 +1258,38 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
     resultRelInfo->ri_ConstraintExprs = NULL;
     resultRelInfo->ri_junkFilter = NULL;
     resultRelInfo->ri_projectReturning = NULL;
-    if (load_partition_check)
-        resultRelInfo->ri_PartitionCheck =
-            RelationGetPartitionQual(resultRelationDesc);
+
     /*
-     * The following gets set to NULL unless we are initializing leaf
-     * partitions for tuple-routing.
+     * If partition_root has been specified, that means we are building the
+     * ResultRelationInfo for one of its leaf partitions.  In that case, we
+     * need *not* initialize the leaf partition's constraint, but rather
+     * the partition_root's (if any).  We must do that explicitly like this,
+     * because implicit partition constraints are not inherited like user-
+     * defined constraints and would fail to be enforced by ExecConstraints()
+     * after a tuple is routed to a leaf partition.
      */
+    if (partition_root)
+    {
+        /*
+         * Root table itself may or may not be a partition; partition_check
+         * would be NIL in the latter case.
+         */
+        partition_check = RelationGetPartitionQual(partition_root);
+
+        /*
+         * This is not our own partition constraint, but rather an ancestor's.
+         * So any Vars in it bear the ancestor's attribute
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/19 5:29, Robert Haas wrote:
> On Wed, Jan 18, 2017 at 3:12 PM, Robert Haas wrote:
>> On Tue, Jan 10, 2017 at 6:06 AM, Amit Langote wrote:
>>> [ updated patches ]
>>
>> I committed 0004 and also fixed the related regression test not to
>> rely on DROP .. CASCADE, which isn't always stable.  The remainder of

Thanks!

>> this patch set needs a rebase, and perhaps you could also fold in
>> other pending partitioning fixes so I have everything to look at it in
>> one place.
>
> Just to be a little more clear, I don't mind multiple threads each
> with a patch or patch set so much, but multiple patch sets on the same
> thread gets hard for me to track.  Sorry for the inconvenience.

OK, I agree that having multiple patch sets on the same thread is
cumbersome.  So, here are all the patches I posted to date (and one new
at the bottom) for reported and unreported bugs, excluding the one
involving BulkInsertState for which you replied in a new thread.

I'll describe the attached patches in brief:

0001-Fix-a-bug-of-insertion-into-an-internal-partition.patch

Since implicit partition constraints are not inherited, an internal
partition's constraint was not being enforced when targeted directly.
So, include such constraint when setting up leaf partition result
relations for tuple-routing.

0002-Set-ecxt_scantuple-correctly-for-tuple-routing.patch

In 2ac3ef7a01df859c62d0a02333b646d65eaec5ff, we changed things so that
it's possible for a different TupleTableSlot to be used for partitioned
tables at successively lower levels.  If we do end up changing the slot
from the original, we must update ecxt_scantuple to point to the new one
for the partition key of the tuple to be computed correctly.
Last posted here: https://www.postgresql.org/message-id/99fbfad6-b89b-9427-a6ca-197aad98c48e%40lab.ntt.co.jp 0003-Fix-RETURNING-to-work-correctly-after-tuple-routing.patch In ExecInsert(), do not switch back to the root partitioned table ResultRelInfo until after we finish ExecProcessReturning(), so that RETURNING projection is done using the partition's descriptor. For the projection to work correctly, we must initialize the same for each leaf partition during ModifyTableState initialization. 0004-Fix-some-issues-with-views-and-partitioned-tables.patch Automatically updatable views failed to handle partitioned tables. Once that's fixed, WITH CHECK OPTIONS wouldn't work correctly without the WCO expressions having been suitably converted for each partition (think applying map_partition_varattnos to Vars in the WCO expressions just like with partition constraint expressions). 0005-Fix-some-wrong-thinking-in-check_new_partition_bound.patch Because a given range bound in the PartitionBoundInfo.datums array is sometimes a range lower bound and upper bound at other times, we must be careful when assuming which, especially when interpreting the result of partition_bound_bsearch which returns the index of the greatest bound that is less than or equal to probe. Due to an error in thinking about the same, the relevant code in check_new_partition_bound() caused invalid partition (index = -1) to be chosen as the partition being overlapped. Last posted here: https://www.postgresql.org/message-id/603acb8b-5dec-31e8-29b0-609a68aac50f%40lab.ntt.co.jp 0006-Avoid-tuple-coversion-in-common-partitioning-cases.patch Currently, the tuple conversion is performed after a tuple is routed, even if the attributes of a target leaf partition map one-to-one with those of the root table, which is wasteful. Avoid that by making convert_tuples_by_name() return a NULL map for such cases. 
0007-Avoid-code-duplication-in-map_partition_varattnos.patch

Code to map attribute numbers in map_partition_varattnos() duplicates what convert_tuples_by_name_map() does. Avoid that, as pointed out by Álvaro Herrera.

Last posted here:
https://www.postgresql.org/message-id/9ce97382-54c8-deb3-9ee9-a2ec271d866b%40lab.ntt.co.jp

0008-Avoid-DROP-TABLE-.-CASCADE-in-more-partitioning-test.patch

This is the new one. There were quite a few commits recently to fix regression test breakage due to not using ORDER BY in queries on system catalogs and due to using DROP TABLE ... CASCADE. There were still some instances of the latter in create_table.sql and alter_table.sql.

Thanks,
Amit

>From 37c861d5eae1c8f11b3fae93cb209da262a13fbc Mon Sep 17 00:00:00 2001
From: amit
Date: Tue, 13 Dec 2016 15:07:41 +0900
Subject: [PATCH 1/8] Fix a bug of insertion into an internal partition.

Since implicit partition constraints are not inherited, an internal partition's constraint was not being enforced when targeted directly. So, include such constraint when setting up leaf partition result relations for tuple-routing.

Reported by: n/a
Patch by: Amit Langote
Reports: n/a
---
 src/backend/commands/copy.c | 1 -
 src/backend/commands/tablecmds.c | 1 -
 src/backend/executor/execMain.c
Re: [HACKERS] Declarative partitioning - another take
On Wed, Jan 18, 2017 at 3:12 PM, Robert Haas wrote:
> On Tue, Jan 10, 2017 at 6:06 AM, Amit Langote wrote:
>> [ updated patches ]
>
> I committed 0004 and also fixed the related regression test not to
> rely on DROP .. CASCADE, which isn't always stable. The remainder of
> this patch set needs a rebase, and perhaps you could also fold in
> other pending partitioning fixes so I have everything to look at in
> one place.

Just to be a little more clear, I don't mind multiple threads each with a patch or patch set so much, but multiple patch sets on the same thread gets hard for me to track. Sorry for the inconvenience.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On Tue, Jan 10, 2017 at 6:06 AM, Amit Langote wrote:
> [ updated patches ]

I committed 0004 and also fixed the related regression test not to rely on DROP .. CASCADE, which isn't always stable. The remainder of this patch set needs a rebase, and perhaps you could also fold in other pending partitioning fixes so I have everything to look at in one place.

Thanks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On Mon, Jan 16, 2017 at 4:09 AM, Amit Langote wrote:
> The problem is that whereas the SlotValueDescription that we build to show
> in the error message should be based on the tuple that was passed to
> ExecInsert() or whatever NextCopyFrom() returned for CopyFrom() to
> process, it might fail to be the case if the tuple needed to be
> converted after tuple routing. The slot (the tuple in it and its tuple
> descriptor) and the resultRelInfo that ExecConstraints() receives *do*
> correspond with each other, even after possible tuple conversion following
> tuple routing, and hence constraint checking itself works fine (since
> commit 2ac3ef7a01 [1]). As said, it's the val_desc built to show in the
> error message being based on the converted-for-partition tuple that could
> be seen as a problem - is it acceptable if we showed in the error message
> whatever the converted-for-partition tuple looks like, which might have
> columns ordered differently from the root table? If so, we could simply
> forget the whole thing, including reverting f1b4c771 [2].
>
> An example:
>
> create table p (a int, b char, c int) partition by list (a);
> create table p1 (b char, c int, a int);  -- note the column order
> alter table p attach partition p1 for values in (1);
> alter table p add constraint check_b check (b = 'x');
>
> insert into p values (1, 'y', 1);
> ERROR: new row for relation "p1" violates check constraint "check_b"
> DETAIL: Failing row contains (y, 1, 1).
>
> Note that "(y, 1, 1)" results from using p1's descriptor on the converted
> tuple. As long as that's clear and acceptable, I think we need not worry
> about this patch, and can revert the previously committed patch for this
> "problem".

Hmm. It would be fine, IMHO, if the detail message looked like the one that BuildIndexValueDescription produces. Without the column names, the clarity is somewhat lessened. Anybody else have an opinion on this?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/14 6:24, Robert Haas wrote:
> On Tue, Jan 10, 2017 at 6:06 AM, Amit Langote wrote:
>> Thanks! I realized however that the approach I used in 0002 of passing
>> the original slot to ExecConstraints() fails in certain situations. For
>> example, if a BR trigger changes the tuple, the original slot would not
>> receive those changes, so it would be wrong to use such a tuple anymore.
>> In the attached 0001, I switched back to the approach of converting the
>> partition-tupdesc-based tuple back to the root partitioned table's format.
>> The converted tuple contains the changes made by BR triggers, if any.
>> Sorry about some unnecessary work.
>
> Hmm. Even with this patch, I wonder if this is really correct. I
> mean, isn't the root of the problem here that ExecConstraints() is
> expecting that resultRelInfo matches slot, and it doesn't?

The problem is that whereas the SlotValueDescription that we build to show in the error message should be based on the tuple that was passed to ExecInsert() or whatever NextCopyFrom() returned for CopyFrom() to process, it might fail to be the case if the tuple needed to be converted after tuple routing. The slot (the tuple in it and its tuple descriptor) and the resultRelInfo that ExecConstraints() receives *do* correspond with each other, even after possible tuple conversion following tuple routing, and hence constraint checking itself works fine (since commit 2ac3ef7a01 [1]). As said, it's the val_desc built to show in the error message being based on the converted-for-partition tuple that could be seen as a problem - is it acceptable if we showed in the error message whatever the converted-for-partition tuple looks like, which might have columns ordered differently from the root table? If so, we could simply forget the whole thing, including reverting f1b4c771 [2].
An example:

create table p (a int, b char, c int) partition by list (a);
create table p1 (b char, c int, a int);  -- note the column order
alter table p attach partition p1 for values in (1);
alter table p add constraint check_b check (b = 'x');

insert into p values (1, 'y', 1);
ERROR: new row for relation "p1" violates check constraint "check_b"
DETAIL: Failing row contains (y, 1, 1).

Note that "(y, 1, 1)" results from using p1's descriptor on the converted tuple. As long as that's clear and acceptable, I think we need not worry about this patch, and can revert the previously committed patch for this "problem".

> And why
> isn't that also a problem for the things that get passed resultRelInfo
> and slot after tuple routing and before ExecConstraints? In
> particular, in copy.c, ExecBRInsertTriggers.

If I have explained the problem (as I see it) well enough above, you may see why that's not an issue in the other cases. The other sub-routines, viz. ExecBRInsertTriggers, ExecWithCheckOptions for RLS INSERT WITH CHECK policies, ExecARInsertTriggers, etc., do receive the correct slot and resultRelInfo for whatever they do with a given tuple and the relation (partition) it is being inserted into. However, if we do consider the above to be a problem, then we would need to fix ExecWithCheckOptions() to do whatever we decide ExecConstraints() should do, because it can also report a WITH CHECK violation for a tuple.

> Committed 0002.

Thanks,
Amit

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ac3ef7
[2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f1b4c77
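The column-order effect in the example above can be reproduced outside the server. The following is a minimal Python sketch (hypothetical names, not PostgreSQL code) of routing-time conversion followed by formatting the row with the partition's own descriptor, which is exactly why the DETAIL shows the partition's column order:

```python
# Root table p has columns (a, b, c); partition p1 declares them as (b, c, a).
ROOT_ATTRS = ["a", "b", "c"]
PART_ATTRS = ["b", "c", "a"]

def route_and_convert(row):
    """Convert a row from the root table's column order to the
    partition's column order, as tuple routing does."""
    by_name = dict(zip(ROOT_ATTRS, row))
    return tuple(by_name[name] for name in PART_ATTRS)

def failing_row_detail(converted):
    # The error detail is built from the tuple as the partition sees it,
    # so values appear in the partition's column order.
    return "Failing row contains (%s)." % ", ".join(str(v) for v in converted)

print(failing_row_detail(route_and_convert((1, "y", 1))))
# Prints: Failing row contains (y, 1, 1).
```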
Re: [HACKERS] Declarative partitioning - another take
On Tue, Jan 10, 2017 at 6:06 AM, Amit Langote wrote:
> On 2017/01/05 5:50, Robert Haas wrote:
>> On Tue, Dec 27, 2016 at 3:59 AM, Amit Langote wrote:
>>> Patches 0001 to 0006 unchanged.
>>
>> Committed 0001 earlier, as mentioned in a separate email. Committed
>> 0002 and part of 0003.
>
> Thanks! I realized however that the approach I used in 0002 of passing
> the original slot to ExecConstraints() fails in certain situations. For
> example, if a BR trigger changes the tuple, the original slot would not
> receive those changes, so it will be wrong to use such a tuple anymore.
> In attached 0001, I switched back to the approach of converting the
> partition-tupdesc-based tuple back to the root partitioned table's format.
> The converted tuple contains the changes by BR triggers, if any. Sorry
> about some unnecessary work.

Hmm. Even with this patch, I wonder if this is really correct. I mean, isn't the root of the problem here that ExecConstraints() is expecting that resultRelInfo matches slot, and it doesn't? And why isn't that also a problem for the things that get passed resultRelInfo and slot after tuple routing and before ExecConstraints? In particular, in copy.c, ExecBRInsertTriggers.

Committed 0002.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/06 20:23, Amit Langote wrote:
> On 2017/01/05 3:26, Robert Haas wrote:
>> It's unclear to me why we need to do 0002. It doesn't seem like it
>> should be necessary, it doesn't seem like a good idea, and the commit
>> message you proposed is uninformative.
>
> If a single BulkInsertState object is passed to
> heap_insert()/heap_multi_insert() for different heaps corresponding to
> different partitions (from one input tuple to the next), tuples might end up
> going into the wrong heaps (as demonstrated in one of the reports [1]). A
> simple solution is to disable bulk-insert in the case of partitioned tables.
>
> But my patch (or its motivations) was slightly wrongheaded, wherein I
> conflated multi-insert stuff and bulk-insert considerations. I revised
> 0002 to not do that.

Ragnar Ouchterlony pointed out [1] on pgsql-bugs that 0002 wasn't correct. Attaching an updated 0002 along with rebased 0001 and 0003.

Thanks,
Amit

[1] https://www.postgresql.org/message-id/732dfc84-25f5-413c-1eee-0bfa7a370093%40agama.tv

>From f2a64348021c7dba1f96d0c8b4e3e253f635b019 Mon Sep 17 00:00:00 2001
From: amit
Date: Wed, 28 Dec 2016 10:10:26 +0900
Subject: [PATCH 1/3] Set ecxt_scantuple correctly for tuple-routing

In 2ac3ef7a01df859c62d0a02333b646d65eaec5ff, we changed things so that it's possible for a different TupleTableSlot to be used for partitioned tables at successively lower levels. If we do end up changing the slot from the original, we must update ecxt_scantuple to point to the new one for the partition key of the tuple to be computed correctly.

Also update the regression tests so that the code manipulating ecxt_scantuple is covered.
Reported by: Rajkumar Raghuwanshi Patch by: Amit Langote Reports: https://www.postgresql.org/message-id/CAKcux6%3Dm1qyqB2k6cjniuMMrYXb75O-MB4qGQMu8zg-iGGLjDw%40mail.gmail.com --- src/backend/catalog/partition.c | 29 ++--- src/backend/executor/execMain.c | 2 -- src/test/regress/expected/insert.out | 2 +- src/test/regress/sql/insert.sql | 2 +- 4 files changed, 24 insertions(+), 11 deletions(-) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index f54e1bdf3f..0de1cf245a 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -1643,7 +1643,10 @@ get_partition_for_tuple(PartitionDispatch *pd, bool isnull[PARTITION_MAX_KEYS]; int cur_offset, cur_index; - int i; + int i, +result; + ExprContext *ecxt = GetPerTupleExprContext(estate); + TupleTableSlot *ecxt_scantuple_old = ecxt->ecxt_scantuple; /* start with the root partitioned table */ parent = pd[0]; @@ -1672,7 +1675,14 @@ get_partition_for_tuple(PartitionDispatch *pd, slot = myslot; } - /* Extract partition key from tuple */ + /* + * Extract partition key from tuple; FormPartitionKeyDatum() expects + * ecxt_scantuple to point to the correct tuple slot (which might be + * different from the slot we received from the caller if the + * partitioned table of the current level has different tuple + * descriptor from its parent). 
+ */ + ecxt->ecxt_scantuple = slot; FormPartitionKeyDatum(parent, slot, estate, values, isnull); if (key->strategy == PARTITION_STRATEGY_RANGE) @@ -1727,16 +1737,21 @@ get_partition_for_tuple(PartitionDispatch *pd, */ if (cur_index < 0) { + result = -1; *failed_at = RelationGetRelid(parent->reldesc); - return -1; + break; } - else if (parent->indexes[cur_index] < 0) - parent = pd[-parent->indexes[cur_index]]; - else + else if (parent->indexes[cur_index] >= 0) + { + result = parent->indexes[cur_index]; break; + } + else + parent = pd[-parent->indexes[cur_index]]; } - return parent->indexes[cur_index]; + ecxt->ecxt_scantuple = ecxt_scantuple_old; + return result; } /* diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index ff277d300a..6a9bc8372f 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -3167,9 +3167,7 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd, { int result; Oid failed_at; - ExprContext *econtext = GetPerTupleExprContext(estate); - econtext->ecxt_scantuple = slot; result = get_partition_for_tuple(pd, slot, estate, _at); if (result < 0) { diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out index ca3134c34c..1c7b8047ee 100644 --- a/src/test/regress/expected/insert.out +++ b/src/test/regress/expected/insert.out @@ -302,7 +302,7 @@ drop cascades to table part_ee_ff1 drop cascades to table part_ee_ff2 -- more tests for certain multi-level partitioning scenarios create table p (a int, b int) partition by range (a, b); -create table p1 (b int, a int not null) partition by range (b); +create table p1 (b int not null, a int not null) partition by range ((b+0)); create table p11 (like p1); alter table p11 drop a; alter table p11 add a
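The save/restore discipline this patch applies to ecxt_scantuple generalizes to any per-call override of shared context state. The following is a minimal Python sketch (hypothetical names, not the executor code) of the same pattern: point the shared context at each level's slot while computing that level's key, then restore the caller's slot before returning so the override cannot leak out:

```python
class ExprContext:
    """Tiny stand-in for the executor's per-tuple expression context."""
    def __init__(self, scantuple=None):
        self.ecxt_scantuple = scantuple

def get_partition_for_tuple(ecxt, levels, tup):
    """Walk partitioned-table levels, updating ecxt.ecxt_scantuple to the
    slot used at each level, and restore the original on the way out."""
    saved = ecxt.ecxt_scantuple
    try:
        slot = tup
        for convert in levels:
            slot = convert(slot)        # per-level tuple conversion, if any
            ecxt.ecxt_scantuple = slot  # key expressions read from here
        return slot
    finally:
        ecxt.ecxt_scantuple = saved     # restore, as the patch does
```

The try/finally mirrors saving ecxt_scantuple_old at entry and assigning it back before every return path, which is what removes the function's side effect.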
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/05 5:50, Robert Haas wrote:
> On Tue, Dec 27, 2016 at 3:59 AM, Amit Langote wrote:
>> Patches 0001 to 0006 unchanged.
>
> Committed 0001 earlier, as mentioned in a separate email. Committed
> 0002 and part of 0003.

Thanks! I realized however that the approach I used in 0002 of passing the original slot to ExecConstraints() fails in certain situations. For example, if a BR trigger changes the tuple, the original slot would not receive those changes, so it would be wrong to use such a tuple anymore. In the attached 0001, I switched back to the approach of converting the partition-tupdesc-based tuple back to the root partitioned table's format. The converted tuple contains the changes made by BR triggers, if any. Sorry about some unnecessary work.

> But I'm skeptical that the as-patched-by-0003
> logic in generate_partition_qual() makes sense. You do this:
>
> result = list_concat(generate_partition_qual(parent),
>                      copyObject(rel->rd_partcheck));
>
> /* Mark Vars with correct attnos */
> result = map_partition_varattnos(result, rel, parent);
>
> But that has the effect of applying map_partition_varattnos to
> everything in rel->rd_partcheck in addition to applying it to
> everything returned by generate_partition_qual() on the parent, which
> doesn't seem right.

I've replaced this portion of the code with (as also mentioned below):

/* Quick copy */
if (rel->rd_partcheck != NIL)
    return copyObject(rel->rd_partcheck);

Down below (for the case when the partition qual is not cached), we now do this:

my_qual = get_qual_from_partbound(rel, parent, bound);

/* Add the parent's quals to the list (if any) */
if (parent->rd_rel->relispartition)
    result = list_concat(generate_partition_qual(parent), my_qual);
else
    result = my_qual;

/*
 * Change Vars to have partition's attnos instead of the parent's.
 * We do this after we concatenate the parent's quals, because
 * we want every Var in it to bear this relation's attnos.
 */
result = map_partition_varattnos(result, rel, parent);

which is then cached whole in rd_partcheck. As for your concern about whether it's correct to do so, consider that generate_partition_qual() on the parent returns a qual with Vars that bear the parent's attnos (which is fine as far as the parent, as a partition, is concerned). To apply the qual to the current relation as a partition, we must change the Vars to have this relation's attnos.

> Also, don't we want to do map_partition_varattnos() just ONCE, rather
> than on every call to this function? I think maybe your concern is
> that the parent might be changed without a relcache flush on the
> child, but I don't quite see how that could happen. If the parent's
> tuple descriptor changes, surely the child's tuple descriptor would
> have to be altered at the same time...

Makes sense. I fixed it so that we return copyObject(rel->rd_partcheck), if it's non-NIL, instead of generating the parent's qual and doing the mapping again. For some reason, I had thought we couldn't save the mapped version in the relcache.

By the way, in addition to the previously mentioned bug in RETURNING, I found that WITH CHECK OPTION didn't work correctly either. In fact, automatically updatable views failed to consider partitioned tables at all. Patch 0007 is addressed towards fixing that.

Thanks,
Amit

>From e408234633c01817d6a2313fdbdccdb4f0057c1e Mon Sep 17 00:00:00 2001
From: amit
Date: Fri, 6 Jan 2017 15:53:10 +0900
Subject: [PATCH 1/7] Fix reporting of violation in ExecConstraints, again

We decided in f1b4c771ea74f42447dccaed42ffcdcccf3aa694 to pass the original slot (one containing the tuple formatted per the root partitioned table's tupdesc) to ExecConstraints(), but that breaks certain cases. Imagine what would happen if a BR trigger changed the tuple - the original slot would not contain those changes.
So, it seems better to convert (if necessary) the tuple formatted per partition tupdesc after tuple-routing back to the root table's format and use the converted tuple to make val_desc shown in the message if an error occurs. --- src/backend/commands/copy.c| 6 ++-- src/backend/executor/execMain.c| 53 +- src/backend/executor/nodeModifyTable.c | 5 ++-- src/include/executor/executor.h| 3 +- src/test/regress/expected/insert.out | 18 ++-- src/test/regress/sql/insert.sql| 17 ++- 6 files changed, 82 insertions(+), 20 deletions(-) diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index f56b2ac49b..65eb167087 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -2491,8 +2491,7 @@ CopyFrom(CopyState cstate) for (;;) { - TupleTableSlot *slot, - *oldslot; + TupleTableSlot *slot; bool skip_tuple; Oid loaded_oid = InvalidOid; @@
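The caching scheme settled on above - concatenate the parent's (already cached) quals with this level's qual, map the whole list to this relation's attnos, and cache the mapped result in rd_partcheck - can be modeled in miniature. This is a hypothetical Python sketch (invented names, not the C code), where attno "mapping" is stood in for by tagging each qual with the relation whose numbering it currently bears:

```python
class Rel:
    def __init__(self, name, own_qual, parent=None):
        self.name = name
        self.own_qual = own_qual   # this level's qual, in the parent's numbering
        self.parent = parent
        self.partcheck = None      # cache of the fully mapped qual list

def map_varattnos(rel, quals):
    # Stand-in for map_partition_varattnos(): re-tag each qual with the
    # relation whose column numbering it now uses.
    return [q.split("@")[0] + "@" + rel.name for q in quals]

def generate_partition_qual(rel):
    """Own qual plus all ancestors' quals, every entry bearing rel's
    numbering; computed once per relation and then served from cache."""
    if rel.partcheck is not None:
        return list(rel.partcheck)
    result = list(rel.own_qual)
    if rel.parent is not None:
        result = generate_partition_qual(rel.parent) + result
    result = map_varattnos(rel, result)   # map only after concatenating
    rel.partcheck = result
    return list(result)
```

Mapping after concatenation means each cached list is mapped exactly once per relation, and ancestors' caches are reused rather than remapped on every call.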
Re: [HACKERS] Declarative partitioning - another take
Hi Keith,

On 2017/01/10 14:44, Keith Fiske wrote:
> Is there any reason for the exclusion of parent tables from the pg_tables
> system catalog view? They do not show up in information_schema.tables as
> well. I believe I found where to make the changes and I tested to make sure
> it works for my simple case. Attached is my first attempt at patching
> anything in core. Not sure if there's anywhere else this would need to be
> fixed.

That's an oversight. The original partitioning patch didn't touch information_schema.sql and system_views.sql at all. I added the relkind = 'P' check in some other views as well, including the ones your patch considered.

Thanks,
Amit

>From 9eef3e87b9a025d233aa4b935b50bb0c7633efbb Mon Sep 17 00:00:00 2001
From: amit
Date: Tue, 10 Jan 2017 18:12:25 +0900
Subject: [PATCH] information_schema/system_views.sql and relkind 'P'

Currently, partitioned tables are not taken into account in various information_schema and system views.

Reported by: Keith Fiske
Patch by: Keith Fiske, Amit Langote
Reports: https://www.postgresql.org/message-id/CAG1_KcDJiZB=l6youo_bvufj2q2851_xdkfhw0jdcd_2vtk...@mail.gmail.com
---
 src/backend/catalog/information_schema.sql | 37 +++---
 src/backend/catalog/system_views.sql | 3 ++-
 2 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index 4df390a763..318f195b81 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -453,7 +453,7 @@ CREATE VIEW check_constraints AS AND a.attnum > 0 AND NOT a.attisdropped AND a.attnotnull - AND r.relkind = 'r' + AND r.relkind IN ('r', 'P') AND pg_has_role(r.relowner, 'USAGE'); GRANT SELECT ON check_constraints TO PUBLIC; @@ -525,7 +525,7 @@ CREATE VIEW column_domain_usage AS AND a.attrelid = c.oid AND a.atttypid = t.oid AND t.typtype = 'd' - AND c.relkind IN ('r', 'v', 'f') + AND c.relkind IN ('r', 'v', 'f', 'P') AND a.attnum > 0 AND NOT
a.attisdropped AND pg_has_role(t.typowner, 'USAGE'); @@ -564,7 +564,7 @@ CREATE VIEW column_privileges AS pr_c.relowner FROM (SELECT oid, relname, relnamespace, relowner, (aclexplode(coalesce(relacl, acldefault('r', relowner.* FROM pg_class - WHERE relkind IN ('r', 'v', 'f') + WHERE relkind IN ('r', 'v', 'f', 'P') ) pr_c (oid, relname, relnamespace, relowner, grantor, grantee, prtype, grantable), pg_attribute a WHERE a.attrelid = pr_c.oid @@ -586,7 +586,7 @@ CREATE VIEW column_privileges AS ) pr_a (attrelid, attname, grantor, grantee, prtype, grantable), pg_class c WHERE pr_a.attrelid = c.oid - AND relkind IN ('r', 'v', 'f') + AND relkind IN ('r', 'v', 'f', 'P') ) x, pg_namespace nc, pg_authid u_grantor, @@ -629,7 +629,7 @@ CREATE VIEW column_udt_usage AS WHERE a.attrelid = c.oid AND a.atttypid = t.oid AND nc.oid = c.relnamespace - AND a.attnum > 0 AND NOT a.attisdropped AND c.relkind in ('r', 'v', 'f') + AND a.attnum > 0 AND NOT a.attisdropped AND c.relkind in ('r', 'v', 'f', 'P') AND pg_has_role(coalesce(bt.typowner, t.typowner), 'USAGE'); GRANT SELECT ON column_udt_usage TO PUBLIC; @@ -738,7 +738,7 @@ CREATE VIEW columns AS CAST('NEVER' AS character_data) AS is_generated, CAST(null AS character_data) AS generation_expression, - CAST(CASE WHEN c.relkind = 'r' OR + CAST(CASE WHEN c.relkind IN ('r', 'P') OR (c.relkind IN ('v', 'f') AND pg_column_is_updatable(c.oid, a.attnum, false)) THEN 'YES' ELSE 'NO' END AS yes_or_no) AS is_updatable @@ -753,7 +753,7 @@ CREATE VIEW columns AS WHERE (NOT pg_is_other_temp_schema(nc.oid)) - AND a.attnum > 0 AND NOT a.attisdropped AND c.relkind in ('r', 'v', 'f') + AND a.attnum > 0 AND NOT a.attisdropped AND c.relkind in ('r', 'v', 'f', 'P') AND (pg_has_role(c.relowner, 'USAGE') OR has_column_privilege(c.oid, a.attnum, @@ -789,7 +789,7 @@ CREATE VIEW constraint_column_usage AS AND d.objid = c.oid AND c.connamespace = nc.oid AND c.contype = 'c' -AND r.relkind = 'r' +AND r.relkind IN ('r', 'P') AND NOT a.attisdropped UNION ALL @@ 
-841,7 +841,7 @@ CREATE VIEW constraint_table_usage AS WHERE c.connamespace = nc.oid AND r.relnamespace = nr.oid AND ( (c.contype = 'f' AND c.confrelid = r.oid) OR (c.contype IN ('p', 'u') AND c.conrelid = r.oid) ) - AND
Re: [HACKERS] Declarative partitioning - another take
Hi Amul,

On 2017/01/09 17:29, amul sul wrote:
> I got a server crash due to an assert failure when ATTACHing an
> overlapping range partition; here is a test case to reproduce it:
>
> CREATE TABLE test_parent(a int) PARTITION BY RANGE (a);
> CREATE TABLE test_parent_part2 PARTITION OF test_parent FOR VALUES
> FROM(100) TO(200);
> CREATE TABLE test_parent_part1(a int NOT NULL);
> ALTER TABLE test_parent ATTACH PARTITION test_parent_part1 FOR VALUES
> FROM(1) TO(200);
>
> I think this bug exists in the following code of check_new_partition_bound():
>
> 767 if (equal || off1 != off2)
> 768 {
> 769     overlap = true;
> 770     with = boundinfo->indexes[off2 + 1];
> 771 }
>
> When equal is true, the array index should not be 'off2 + 1'.

Good catch. The attached patch should fix that. I observed a crash with the following command as well:

ALTER TABLE test_parent ATTACH PARTITION test_parent_part1 FOR VALUES FROM (1) TO (300);

That's because there is one more case when the array index shouldn't be off2 + 1 - the case where the bound at off2 is an upper bound (I'd wrongly assumed that it's always a lower bound). Anyway, I rewrote the surrounding comments to clarify the logic a bit.

> While reading code related to this, I wondered why
> partition_bound_bsearch does not immediately return when cmpval == 0?

partition_bound_bsearch() is meant to return the *greatest* index of a bound less than or equal to the input bound ("probe"). But it seems to me now that we would always return the first index at which we get 0 for cmpval, albeit after wasting cycles trying to find an even greater index. Because we don't have duplicates in the datums array, once we encounter a bound that is equal to the probe, continuing to look right can only find bounds that are *greater than* the probe, only to turn back again and return the equal index (which wastes cycles invoking the partition key comparison function(s)).
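The search described here - the rightmost bound less than or equal to the probe, with an early exit on equality that is valid only because the array has no duplicates - can be sketched in Python (a hypothetical illustration, not the C implementation):

```python
def bound_bsearch(datums, probe):
    """Return (index, is_equal): the greatest index whose datum is <= probe,
    or (-1, False) if every datum is greater.  Exits early on an exact
    match, which is safe only because datums contains no duplicates."""
    lo, hi = -1, len(datums)
    is_equal = False
    while lo < hi - 1:
        mid = (lo + hi) // 2
        if datums[mid] <= probe:
            lo = mid
            is_equal = datums[mid] == probe
            if is_equal:
                break           # no duplicates, so this index is final
        else:
            hi = mid
    return lo, is_equal
```

Without the break, a probe equal to some datum would keep bisecting to the right, repeatedly invoking the comparison, and still end up returning the same index.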
So, it perhaps makes sense to do this, per your suggestion:

@@ -1988,8 +2018,11 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
 		if (cmpval <= 0)
 		{
 			lo = mid;
 			*is_equal = (cmpval == 0);
+
+			if (*is_equal)
+				break;
 		}

Thanks,
Amit

>From 7fe537f8e8efeac51c3b0cc91ac51a1aa39399cd Mon Sep 17 00:00:00 2001
From: amit
Date: Tue, 10 Jan 2017 17:43:36 +0900
Subject: [PATCH] Fix some wrong thinking in check_new_partition_bound()

Because a given range bound in the PartitionBoundInfo.datums array is sometimes a lower range bound and at other times an upper range bound, we must be careful when assuming which, especially when interpreting the result of partition_bound_bsearch, which returns the index of the greatest bound that is less than or equal to the probe. Due to an error in thinking about this, the relevant code in check_new_partition_bound() caused an invalid partition (index == -1) to be chosen as the partition being overlapped.

Also, we need not continue searching for an even greater bound in partition_bound_bsearch() once we find the first bound that is *equal* to the probe, because we don't have duplicate datums. That spends cycles needlessly. Per suggestion from Amul Sul.

Reported by: Amul Sul
Patch by: Amit Langote
Reports: https://www.postgresql.org/message-id/CAAJ_b94XgbqVoXMyxxs63CaqWoMS1o2gpHiU0F7yGnJBnvDc_A%40mail.gmail.com
---
 src/backend/catalog/partition.c | 62 ++
 src/test/regress/expected/create_table.out | 10 -
 src/test/regress/sql/create_table.sql | 4 ++
 3 files changed, 60 insertions(+), 16 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index f54e1bdf3f..df5652de4c 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -741,35 +741,64 @@ check_new_partition_bound(char *relname, Relation parent, Node *bound)
 boundinfo->strategy == PARTITION_STRATEGY_RANGE);

 /*
- * Find the greatest index of a range bound that is less
- * than or equal with the new lower bound.
+ * Firstly, find the greatest range bound that is less + * than or equal to the new lower bound. */ off1 = partition_bound_bsearch(key, boundinfo, lower, true, ); /* - * If equal has been set to true, that means the new lower - * bound is found to be equal with the bound at off1, - * which clearly means an overlap with the partition at - * index off1+1). - * - * Otherwise, check if there is a "gap" that could be - * occupied by the new partition. In case of a gap, the - * new upper bound should not cross past the upper - * boundary of the gap, that is, off2 == off1 should be - * true. + * off1 == -1 means that all existing bounds are greater + *
Re: [HACKERS] Declarative partitioning - another take
Is there any reason for the exclusion of parent tables from the pg_tables system catalog view? They do not show up in information_schema.tables as well. I believe I found where to make the changes and I tested to make sure it works for my simple case. Attached is my first attempt at patching anything in core. Not sure if there's anywhere else this would need to be fixed. -- Keith Fiske Database Administrator OmniTI Computer Consulting, Inc. http://www.keithf4.com diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql index 4df390a..c31d0d8 100644 --- a/src/backend/catalog/information_schema.sql +++ b/src/backend/catalog/information_schema.sql @@ -1901,6 +1901,7 @@ CREATE VIEW tables AS WHEN c.relkind = 'r' THEN 'BASE TABLE' WHEN c.relkind = 'v' THEN 'VIEW' WHEN c.relkind = 'f' THEN 'FOREIGN TABLE' + WHEN c.relkind = 'P' THEN 'PARTITIONED TABLE' ELSE null END AS character_data) AS table_type, @@ -1912,7 +1913,7 @@ CREATE VIEW tables AS CAST(t.typname AS sql_identifier) AS user_defined_type_name, CAST(CASE WHEN c.relkind = 'r' OR - (c.relkind IN ('v', 'f') AND + (c.relkind IN ('v', 'f', 'P') AND -- 1 << CMD_INSERT pg_relation_is_updatable(c.oid, false) & 8 = 8) THEN 'YES' ELSE 'NO' END AS yes_or_no) AS is_insertable_into, @@ -1923,7 +1924,7 @@ CREATE VIEW tables AS FROM pg_namespace nc JOIN pg_class c ON (nc.oid = c.relnamespace) LEFT JOIN (pg_type t JOIN pg_namespace nt ON (t.typnamespace = nt.oid)) ON (c.reloftype = t.oid) -WHERE c.relkind IN ('r', 'v', 'f') +WHERE c.relkind IN ('r', 'v', 'f', 'P') AND (NOT pg_is_other_temp_schema(nc.oid)) AND (pg_has_role(c.relowner, 'USAGE') OR has_table_privilege(c.oid, 'SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER') diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 31aade1..f4dc460 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -136,7 +136,7 @@ CREATE VIEW pg_tables AS 
C.relrowsecurity AS rowsecurity FROM pg_class C LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace) LEFT JOIN pg_tablespace T ON (T.oid = C.reltablespace) -WHERE C.relkind = 'r'; +WHERE C.relkind = 'r' OR C.relkind = 'P'; CREATE VIEW pg_matviews AS SELECT
Re: [HACKERS] Declarative partitioning - another take
Hi,

I got a server crash due to an assert failure when ATTACHing an overlapping range partition; here is a test case to reproduce it:

CREATE TABLE test_parent(a int) PARTITION BY RANGE (a);
CREATE TABLE test_parent_part2 PARTITION OF test_parent FOR VALUES FROM(100) TO(200);
CREATE TABLE test_parent_part1(a int NOT NULL);
ALTER TABLE test_parent ATTACH PARTITION test_parent_part1 FOR VALUES FROM(1) TO(200);

I think this bug exists in the following code of check_new_partition_bound():

767 if (equal || off1 != off2)
768 {
769     overlap = true;
770     with = boundinfo->indexes[off2 + 1];
771 }

When equal is true, the array index should not be 'off2 + 1'.

While reading code related to this, I wondered why partition_bound_bsearch does not immediately return when cmpval == 0?

Apologies if this has already been reported.

Regards,
Amul
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/05 3:26, Robert Haas wrote: > On Tue, Dec 27, 2016 at 8:41 PM, Amit Langote >wrote: >> On 2016/12/27 19:07, Amit Langote wrote: >>> Attached should fix that. >> >> Here are the last two patches with additional information like other >> patches. Forgot to do that yesterday. > > 0001 has the disadvantage that get_partition_for_tuple() acquires a > side effect. That seems undesirable. At the least, it needs to be > documented in the function's header comment. That's true. How about we save away the original ecxt_scantuple at entry and restore the same just before returning from the function? That way there would be no side effect. 0001 implements that. > It's unclear to me why we need to do 0002. It doesn't seem like it > should be necessary, it doesn't seem like a good idea, and the commit > message you proposed is uninformative. If a single BulkInsertState object is passed to heap_insert()/heap_multi_insert() for different heaps corresponding to different partitions (from one input tuple to next), tuples might end up going into wrong heaps (like demonstrated in one of the reports [1]). A simple solution is to disable bulk-insert in case of partitioned tables. But my patch (or its motivations) was slightly wrongheaded, wherein I conflated multi-insert stuff and bulk-insert considerations. I revised 0002 to not do that. However if we disable bulk-insert mode, COPY's purported performance benefit compared with INSERT is naught. Patch 0003 is a proposal to implement bulk-insert mode even for partitioned tables. Basically, allocate separate BulkInsertState objects for each partition and switch to the appropriate one just before calling heap_insert()/heap_multi_insert(). Then to be able to use heap_multi_insert(), we must also manage buffered tuples separately for each partition. Although, I didn't modify the limit on number of buffered tuples and/or size of buffered tuples which controls when we pause buffering and do heap_multi_insert() on buffered tuples. 
Maybe, it should work slightly differently for the partitioned table case, like for example, increase the overall limit on both the number of tuples and tuple size in the partitioning case (I observed that increasing it 10x or 100x helped to some degree). Thoughts on this? Thanks, Amit [1] https://www.postgresql.org/message-id/CAFmBtr32FDOqofo8yG-4mjzL1HnYHxXK5S9OGFJ%3D%3DcJpgEW4vA%40mail.gmail.com >From 332c67c258a0f25f76c29ced23199fe0ee8e153e Mon Sep 17 00:00:00 2001 From: amit Date: Wed, 28 Dec 2016 10:10:26 +0900 Subject: [PATCH 1/3] Set ecxt_scantuple correctly for tuple-routing In 2ac3ef7a01df859c62d0a02333b646d65eaec5ff, we changed things so that it's possible for a different TupleTableSlot to be used for partitioned tables at successively lower levels. If we do end up changing the slot from the original, we must update ecxt_scantuple to point to the new one for partition key of the tuple to be computed correctly. Also update the regression tests so that the code manipulating ecxt_scantuple is covered. 
Reported by: Rajkumar Raghuwanshi Patch by: Amit Langote Reports: https://www.postgresql.org/message-id/CAKcux6%3Dm1qyqB2k6cjniuMMrYXb75O-MB4qGQMu8zg-iGGLjDw%40mail.gmail.com --- src/backend/catalog/partition.c | 29 ++--- src/backend/executor/execMain.c | 2 -- src/test/regress/expected/insert.out | 2 +- src/test/regress/sql/insert.sql | 2 +- 4 files changed, 24 insertions(+), 11 deletions(-) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index f54e1bdf3f..0de1cf245a 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -1643,7 +1643,10 @@ get_partition_for_tuple(PartitionDispatch *pd, bool isnull[PARTITION_MAX_KEYS]; int cur_offset, cur_index; - int i; + int i, +result; + ExprContext *ecxt = GetPerTupleExprContext(estate); + TupleTableSlot *ecxt_scantuple_old = ecxt->ecxt_scantuple; /* start with the root partitioned table */ parent = pd[0]; @@ -1672,7 +1675,14 @@ get_partition_for_tuple(PartitionDispatch *pd, slot = myslot; } - /* Extract partition key from tuple */ + /* + * Extract partition key from tuple; FormPartitionKeyDatum() expects + * ecxt_scantuple to point to the correct tuple slot (which might be + * different from the slot we received from the caller if the + * partitioned table of the current level has different tuple + * descriptor from its parent). + */ + ecxt->ecxt_scantuple = slot; FormPartitionKeyDatum(parent, slot, estate, values, isnull); if (key->strategy == PARTITION_STRATEGY_RANGE) @@ -1727,16 +1737,21 @@ get_partition_for_tuple(PartitionDispatch *pd, */ if (cur_index < 0) { + result = -1; *failed_at = RelationGetRelid(parent->reldesc); - return -1; + break; } - else if (parent->indexes[cur_index] < 0) - parent =
Re: [HACKERS] Declarative partitioning - another take
Hi Keith, On 2017/01/06 2:16, Keith Fiske wrote: > Could we get some clarification on the partition_bound_spec portion of the > PARTITION OF clause? Just doing some testing it seems it's inclusive of the > FROM value but exclusive of the TO value. I don't see mention of this in > the docs as of commit 18fc5192a631441a73e6a3b911ecb14765140389 yesterday. > It does mention that the values aren't allowed to overlap, but looking at > the schema below, without the clarification of which side is > inclusive/exclusive it seems confusing because 2016-08-01 is in both. Even > the child table does not clarify this. Not sure if there's a way to do this > in the \d+ display which would be ideal, but it should at least be > mentioned in the docs. I agree that needs highlighting. I'm planning to write a doc patch for that (among other documentation improvements). Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
Could we get some clarification on the partition_bound_spec portion of the PARTITION OF clause? Just doing some testing it seems it's inclusive of the FROM value but exclusive of the TO value. I don't see mention of this in the docs as of commit 18fc5192a631441a73e6a3b911ecb14765140389 yesterday. It does mention that the values aren't allowed to overlap, but looking at the schema below, without the clarification of which side is inclusive/exclusive it seems confusing because 2016-08-01 is in both. Even the child table does not clarify this. Not sure if there's a way to do this in the \d+ display which would be ideal, but it should at least be mentioned in the docs. keith@keith=# \d+ measurement Table "public.measurement" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description ---+-+---+--+-+-+--+- logdate | date| | not null | | plain | | peaktemp | integer | | | 1 | plain | | unitsales | integer | | | | plain | | Partition key: RANGE (logdate) Check constraints: "measurement_peaktemp_check" CHECK (peaktemp > 0) Partitions: measurement_y2016m07 FOR VALUES FROM ('2016-07-01') TO ('2016-08-01'), measurement_y2016m08 FOR VALUES FROM ('2016-08-01') TO ('2016-09-01') keith@keith=# \d+ measurement_y2016m07 Table "public.measurement_y2016m07" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description ---+-+---+--+-+-+--+- logdate | date| | not null | | plain | | peaktemp | integer | | | 1 | plain | | unitsales | integer | | | 0 | plain | | Partition of: measurement FOR VALUES FROM ('2016-07-01') TO ('2016-08-01') Check constraints: "measurement_peaktemp_check" CHECK (peaktemp > 0) keith@keith=# insert into measurement (logdate) values ('2016-08-01'); INSERT 0 1 Time: 2.848 ms keith@keith=# select * from measurement_y2016m07; logdate | peaktemp | unitsales -+--+--- (0 rows) Time: 0.273 ms keith@keith=# select * from measurement_y2016m08; logdate | peaktemp | unitsales +--+--- 2016-08-01 |1 |«NULL» (1 row) Time: 0.272 
ms keith@keith=# drop table measurement_y2016m08; DROP TABLE Time: 5.919 ms keith@keith=# select * from only measurement; logdate | peaktemp | unitsales -+--+--- (0 rows) Time: 0.307 ms keith@keith=# insert into measurement (logdate) values ('2016-08-01'); ERROR: no partition of relation "measurement" found for row DETAIL: Failing row contains (2016-08-01, 1, null). Time: 0.622 ms
Re: [HACKERS] Declarative partitioning - another take
On 2016/12/14 12:14, Venkata B Nagothi wrote: > Loading the data into a normal table is not an issue (infact the csv is > generated from the table itself) > > The issue is occurring only when i am trying to load the data from CSV file > into a partitioned table - > > db01=# CREATE TABLE orders_y1992 > PARTITION OF orders2 FOR VALUES FROM ('1992-01-01') TO ('1992-12-31'); > CREATE TABLE > db01=# copy orders2 from '/data/orders-1993.csv' delimiter '|'; > ERROR: could not read block 6060 in file "base/16384/16407": read only 0 > of 8192 bytes > CONTEXT: COPY orders2, line 376589: > "9876391|374509|O|54847|1997-07-16|3-MEDIUM |Clerk#01993|0|ithely > regular pack" > > Not sure why COPY is failing. I think I've been able to reproduce this issue and suspect that it's a bug. I tried to solve it in response to another related report [1], where it was apparent that the cause was related to how the bulk-insert mode in the COPY FROM code is not handled correctly for a partitioned table. My proposed solution [2] was to disable bulk-insert mode completely for partitioned tables. But it may not be desirable performance-wise (for example, COPY FROM on partitioned tables would have same performance as INSERT, whereas in case of regular tables, COPY FROM is much faster than INSERT due to the bulk insert mode). I will propose another solution for the same. Meanwhile, could you please try your test again with the patch posted at [1], although it will not likely be committed as the fix for this issue. Thanks, Amit [1] https://www.postgresql.org/message-id/CAFmBtr32FDOqofo8yG-4mjzL1HnYHxXK5S9OGFJ%3D%3DcJpgEW4vA%40mail.gmail.com [2] https://www.postgresql.org/message-id/101e2c2d-45d6-fb1a-468c-d3f67572a2f3%40lab.ntt.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On Tue, Dec 27, 2016 at 3:59 AM, Amit Langotewrote: > Patches 0001 to 0006 unchanged. Committed 0001 earlier, as mentioned in a separate email. Committed 0002 and part of 0003. But I'm skeptical that the as-patched-by-0003 logic in generate_partition_qual() makes sense. You do this: result = list_concat(generate_partition_qual(parent), copyObject(rel->rd_partcheck)); /* Mark Vars with correct attnos */ result = map_partition_varattnos(result, rel, parent); But that has the effect of applying map_partition_varattnos to everything in rel->rd_partcheck in addition to applying it to everything returned by generate_partition_qual() on the parent, which doesn't seem right. Also, don't we want to do map_partition_varattnos() just ONCE, rather than on every call to this function? I think maybe your concern is that the parent might be changed without a relcache flush on the child, but I don't quite see how that could happen. If the parent's tuple descriptor changes, surely the child's tuple descriptor would have to be altered at the same time... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On Tue, Dec 27, 2016 at 8:41 PM, Amit Langotewrote: > On 2016/12/27 19:07, Amit Langote wrote: >> Attached should fix that. > > Here are the last two patches with additional information like other > patches. Forgot to do that yesterday. 0001 has the disadvantage that get_partition_for_tuple() acquires a side effect. That seems undesirable. At the least, it needs to be documented in the function's header comment. It's unclear to me why we need to do 0002. It doesn't seem like it should be necessary, it doesn't seem like a good idea, and the commit message you proposed is uninformative. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On Mon, Dec 26, 2016 at 5:46 AM, Amit Langotewrote: > On 2016/12/23 8:08, Robert Haas wrote: >> On Thu, Dec 22, 2016 at 3:35 AM, Amit Langote >> wrote: >>> While working on that, I discovered yet-another-bug having to do with the >>> tuple descriptor that's used as we route a tuple down a partition tree. If >>> attnums of given key attribute(s) are different on different levels, it >>> would be incorrect to use the original slot's (one passed by ExecInsert()) >>> tuple descriptor to inspect the original slot's heap tuple, as we go down >>> the tree. It might cause spurious "partition not found" at some level due >>> to looking at incorrect field in the input tuple because of using the >>> wrong tuple descriptor (root table's attnums not always same as other >>> partitioned tables in the tree). Patch 0001 fixes that including a test. >> >> I committed this, but I'm a bit uncomfortable with it: should the >> TupleTableSlot be part of the ModifyTableState rather than the EState? > > Done that way in 0001 of the attached patches. So, instead of making the > standalone partition_tuple_slot a field of EState (with the actual > TupleTableSlot in its tupleTable), it is now allocated within > ModifyTableState and CopyState, and released when ModifyTable node or > CopyFrom finishes, respectively. I dropped some comments from this and committed it. They were formatted in a way that wouldn't survive pgindent. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/04 16:31, 高增琦 wrote: > Server crash(failed assertion) when two "insert" in one SQL: > > Step to reproduce: > create table t(a int, b int) partition by range(a); > create table t_p1 partition of t for values from (1) to (100); > create table t_p2 partition of t for values from (100) to (200); > create table t_p3 partition of t for values from (200) to (300); > > create table b(a int, b int); > with a(a,b) as(insert into t values(3, 3) returning a, b) insert into b > select * from a; > > Please check it. Thanks for testing! This should be fixed by a patch I posted earlier (Try the patch 0001 of the patches posted at [1]). Robert did express his concern [2] about the approach used in my patch that was committed as 2ac3ef7a01 [3]; your test demonstrates that it wasn't a good approach after all. Regards, Amit [1] https://www.postgresql.org/message-id/f6f3a214-5bb5-aa8c-f82c-c720348cf086%40lab.ntt.co.jp [2] https://www.postgresql.org/message-id/CA%2BTgmoYUfs8peo-p%2BStw7afTdXqNWv_S4dx_6AWc-Y_ZrGWZbQ%40mail.gmail.com [3] https://git.postgresql.org/gitweb/?p=postgresql.git=commit=2ac3ef7a01df859c62d0a02333b646d65eaec5ff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
Server crash(failed assertion) when two "insert" in one SQL: Step to reproduce: create table t(a int, b int) partition by range(a); create table t_p1 partition of t for values from (1) to (100); create table t_p2 partition of t for values from (100) to (200); create table t_p3 partition of t for values from (200) to (300); create table b(a int, b int); with a(a,b) as(insert into t values(3, 3) returning a, b) insert into b select * from a; Please check it. 2017-01-04 14:11 GMT+08:00 Rajkumar Raghuwanshi < rajkumar.raghuwan...@enterprisedb.com>: > On Wed, Jan 4, 2017 at 10:37 AM, Amit Langote < > langote_amit...@lab.ntt.co.jp> wrote: > >> On 2017/01/03 19:04, Rajkumar Raghuwanshi wrote: >> > On Tue, Dec 27, 2016 at 3:24 PM, Amit Langote wrote: >> >> >> >> Attached patch should fix the same. >> > >> > I have applied attached patch, server crash for range is fixed, but >> still >> > getting crash for multi-level list partitioning insert. >> > >> > postgres=# CREATE TABLE test_ml_l (a int, b int, c varchar) PARTITION BY >> > LIST(c); >> >> [ ... ] >> >> > postgres=# INSERT INTO test_ml_l SELECT i, i, to_char(i/50, 'FM') >> FROM >> > generate_series(0, 599, 2) i; >> > server closed the connection unexpectedly >> > This probably means the server terminated abnormally >> > before or while processing the request. >> > The connection to the server was lost. Attempting reset: Failed. >> >> Hm, that's odd. I tried your new example, but didn't get the crash. >> >> Thanks, >> Amit >> > > Thanks, I have pulled latest sources from git, and then applied patch > "fix-wrong-ecxt_scantuple-crash.patch", Not getting crash now, may be I > have missed something last time. > > -- GaoZengqi pgf...@gmail.com zengqi...@gmail.com
Re: [HACKERS] Declarative partitioning - another take
On Wed, Jan 4, 2017 at 10:37 AM, Amit Langotewrote: > On 2017/01/03 19:04, Rajkumar Raghuwanshi wrote: > > On Tue, Dec 27, 2016 at 3:24 PM, Amit Langote wrote: > >> > >> Attached patch should fix the same. > > > > I have applied attached patch, server crash for range is fixed, but still > > getting crash for multi-level list partitioning insert. > > > > postgres=# CREATE TABLE test_ml_l (a int, b int, c varchar) PARTITION BY > > LIST(c); > > [ ... ] > > > postgres=# INSERT INTO test_ml_l SELECT i, i, to_char(i/50, 'FM') > FROM > > generate_series(0, 599, 2) i; > > server closed the connection unexpectedly > > This probably means the server terminated abnormally > > before or while processing the request. > > The connection to the server was lost. Attempting reset: Failed. > > Hm, that's odd. I tried your new example, but didn't get the crash. > > Thanks, > Amit > Thanks, I have pulled latest sources from git, and then applied patch "fix-wrong-ecxt_scantuple-crash.patch", Not getting crash now, may be I have missed something last time.
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/03 19:04, Rajkumar Raghuwanshi wrote: > On Tue, Dec 27, 2016 at 3:24 PM, Amit Langote wrote: >> >> Attached patch should fix the same. > > I have applied attached patch, server crash for range is fixed, but still > getting crash for multi-level list partitioning insert. > > postgres=# CREATE TABLE test_ml_l (a int, b int, c varchar) PARTITION BY > LIST(c); [ ... ] > postgres=# INSERT INTO test_ml_l SELECT i, i, to_char(i/50, 'FM') FROM > generate_series(0, 599, 2) i; > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: Failed. Hm, that's odd. I tried your new example, but didn't get the crash. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On Tue, Dec 27, 2016 at 3:24 PM, Amit Langotewrote: > On 2016/12/27 18:30, Rajkumar Raghuwanshi wrote: > > Hi Amit, > > > > I have pulled latest sources from git and tried to create multi-level > > partition, getting a server crash, below are steps to reproduce. please > > check if it is reproducible in your machine also. > > > > [ ... ] > > > postgres=# INSERT INTO test_ml SELECT i, i, to_char(i, 'FM') FROM > > generate_series(0, 599, 2) i; > > server closed the connection unexpectedly > > This probably means the server terminated abnormally > > before or while processing the request. > > The connection to the server was lost. Attempting reset: Failed. > > Thanks for the example. Looks like there was an oversight in my patch > that got committed as 2ac3ef7a01 [1]. > > Attached patch should fix the same. > > Thanks, > Amit > > [1] > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h= > 2ac3ef7a01df859c62d0a02333b646d65eaec5ff > Hi Amit, I have applied attached patch, server crash for range is fixed, but still getting crash for multi-level list partitioning insert. 
postgres=# CREATE TABLE test_ml_l (a int, b int, c varchar) PARTITION BY LIST(c); CREATE TABLE postgres=# CREATE TABLE test_ml_l_p1 PARTITION OF test_ml_l FOR VALUES IN ('', '0003', '0004', '0010') PARTITION BY LIST (c); CREATE TABLE postgres=# CREATE TABLE test_ml_l_p1_p1 PARTITION OF test_ml_l_p1 FOR VALUES IN ('', '0003'); CREATE TABLE postgres=# CREATE TABLE test_ml_l_p1_p2 PARTITION OF test_ml_l_p1 FOR VALUES IN ('0004', '0010'); CREATE TABLE postgres=# CREATE TABLE test_ml_l_p2 PARTITION OF test_ml_l FOR VALUES IN ('0001', '0005', '0002', '0009') PARTITION BY LIST (c); CREATE TABLE postgres=# CREATE TABLE test_ml_l_p2_p1 PARTITION OF test_ml_l_p2 FOR VALUES IN ('0001', '0005'); CREATE TABLE postgres=# CREATE TABLE test_ml_l_p2_p2 PARTITION OF test_ml_l_p2 FOR VALUES IN ('0002', '0009'); CREATE TABLE postgres=# CREATE TABLE test_ml_l_p3 PARTITION OF test_ml_l FOR VALUES IN ('0006', '0007', '0008', '0011') PARTITION BY LIST (ltrim(c,'A')); CREATE TABLE postgres=# CREATE TABLE test_ml_l_p3_p1 PARTITION OF test_ml_l_p3 FOR VALUES IN ('0006', '0007'); CREATE TABLE postgres=# CREATE TABLE test_ml_l_p3_p2 PARTITION OF test_ml_l_p3 FOR VALUES IN ('0008', '0011'); CREATE TABLE postgres=# INSERT INTO test_ml_l SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 599, 2) i; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. Thanks & Regards, Rajkumar Raghuwanshi QMG, EnterpriseDB Corporation
Re: [HACKERS] Declarative partitioning - another take
On 2016/12/27 19:07, Amit Langote wrote: > Attached should fix that. Here are the last two patches with additional information like other patches. Forgot to do that yesterday. Thanks, Amit >From 5a82b4caa6cec7845eb48e0397fab49c74b8dd98 Mon Sep 17 00:00:00 2001 From: amitDate: Wed, 28 Dec 2016 10:10:26 +0900 Subject: [PATCH 1/2] Set ecxt_scantuple correctly for tuple-routing In 2ac3ef7a01df859c62d0a02333b646d65eaec5ff, we changed things so that it's possible for a different TupleTableSlot to be used for partitioned tables at successive levels. If we do end up changing the slot from the original, we must update ecxt_scantuple to point to the new one for partition key expressions to be computed correctly. Also update the regression tests so that the code manipulating ecxt_scantuple is covered. Reported by: Rajkumar Raghuwanshi Patch by: Amit Langote Reports: https://www.postgresql.org/message-id/CAKcux6%3Dm1qyqB2k6cjniuMMrYXb75O-MB4qGQMu8zg-iGGLjDw%40mail.gmail.com --- src/backend/catalog/partition.c | 2 ++ src/backend/executor/execMain.c | 2 -- src/test/regress/expected/insert.out | 2 +- src/test/regress/sql/insert.sql | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index fca874752f..f9daf8052d 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -1647,6 +1647,7 @@ get_partition_for_tuple(PartitionDispatch *pd, PartitionDesc partdesc = parent->partdesc; TupleTableSlot *myslot = parent->tupslot; TupleConversionMap *map = parent->tupmap; + ExprContext *econtext = GetPerTupleExprContext(estate); /* Quick exit */ if (partdesc->nparts == 0) @@ -1667,6 +1668,7 @@ get_partition_for_tuple(PartitionDispatch *pd, } /* Extract partition key from tuple */ + econtext->ecxt_scantuple = slot; FormPartitionKeyDatum(parent, slot, estate, values, isnull); if (key->strategy == PARTITION_STRATEGY_RANGE) diff --git a/src/backend/executor/execMain.c 
b/src/backend/executor/execMain.c index bca34a509c..1d699c1dab 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -3107,9 +3107,7 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd, { int result; Oid failed_at; - ExprContext *econtext = GetPerTupleExprContext(estate); - econtext->ecxt_scantuple = slot; result = get_partition_for_tuple(pd, slot, estate, _at); if (result < 0) { diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out index 49f667b119..ae54625034 100644 --- a/src/test/regress/expected/insert.out +++ b/src/test/regress/expected/insert.out @@ -302,7 +302,7 @@ drop cascades to table part_ee_ff1 drop cascades to table part_ee_ff2 -- more tests for certain multi-level partitioning scenarios create table p (a int, b int) partition by range (a, b); -create table p1 (b int, a int not null) partition by range (b); +create table p1 (b int not null, a int not null) partition by range ((b+0)); create table p11 (like p1); alter table p11 drop a; alter table p11 add a int; diff --git a/src/test/regress/sql/insert.sql b/src/test/regress/sql/insert.sql index 08dc068de8..9d3a34073c 100644 --- a/src/test/regress/sql/insert.sql +++ b/src/test/regress/sql/insert.sql @@ -173,7 +173,7 @@ drop table list_parted cascade; -- more tests for certain multi-level partitioning scenarios create table p (a int, b int) partition by range (a, b); -create table p1 (b int, a int not null) partition by range (b); +create table p1 (b int not null, a int not null) partition by range ((b+0)); create table p11 (like p1); alter table p11 drop a; alter table p11 add a int; -- 2.11.0 >From 798c51254563735dff843c71f1bbf34b969d8162 Mon Sep 17 00:00:00 2001 From: amit Date: Wed, 28 Dec 2016 10:28:37 +0900 Subject: [PATCH 2/2] No BulkInsertState when tuple-routing is in action MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In future, we might consider alleviating this 
restriction by allocating a BulkInsertState per partition. Reported by: 高增琦 Patch by: Amit Langote (with pointers from 高增琦) Reports: https://www.postgresql.org/message-id/CAFmBtr32FDOqofo8yG-4mjzL1HnYHxXK5S9OGFJ%3D%3DcJpgEW4vA%40mail.gmail.com --- src/backend/commands/copy.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index aa25a23336..e9bf4afa44 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -2290,7 +2290,7 @@ CopyFrom(CopyState cstate) ErrorContextCallback errcallback; CommandId mycid = GetCurrentCommandId(true); int hi_options = 0; /* start with default heap_insert options */ - BulkInsertState bistate; + BulkInsertState bistate = NULL; uint64 processed = 0; bool useHeapMultiInsert; int
Re: [HACKERS] Declarative partitioning - another take
On 2016/12/27 18:48, 高增琦 wrote: > Hi , > > I tried "COPY FROM" in the git version. It inserts rows to wrong partition. > > step to reproduce: > create table t(a int, b int) partition by range(a); > create table t_p1 partition of t for values from (1) to (100); > create table t_p2 partition of t for values from (100) to (200); > create table t_p3 partition of t for values from (200) to (300); > insert into t values(1,1); > insert into t values(101,101); > insert into t values(201,201); > copy (select * from t) to '/tmp/test2.txt'; > copy t from '/tmp/test2.txt'; > select * from t_p1; > > result: > postgres=# select * from t_p1; > a | b > -+- >1 | 1 >1 | 1 > 101 | 101 > 201 | 201 > (4 rows) > > I think the argument "BulkInsertState" used in CopyFrom/heap_insert > is related to this problem. Please check it. You're quite right. Attached should fix that. Thanks, Amit diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index aa25a23336..e9bf4afa44 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -2290,7 +2290,7 @@ CopyFrom(CopyState cstate) ErrorContextCallback errcallback; CommandId mycid = GetCurrentCommandId(true); int hi_options = 0; /* start with default heap_insert options */ - BulkInsertState bistate; + BulkInsertState bistate = NULL; uint64 processed = 0; bool useHeapMultiInsert; int nBufferedTuples = 0; @@ -2482,7 +2482,8 @@ CopyFrom(CopyState cstate) values = (Datum *) palloc(tupDesc->natts * sizeof(Datum)); nulls = (bool *) palloc(tupDesc->natts * sizeof(bool)); - bistate = GetBulkInsertState(); + if (useHeapMultiInsert) + bistate = GetBulkInsertState(); econtext = GetPerTupleExprContext(estate); /* Set up callback to identify error line number */ @@ -2707,7 +2708,8 @@ CopyFrom(CopyState cstate) /* Done, clean up */ error_context_stack = errcallback.previous; - FreeBulkInsertState(bistate); + if (bistate != NULL) + FreeBulkInsertState(bistate); MemoryContextSwitchTo(oldcontext); -- Sent via pgsql-hackers mailing 
list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On 2016/12/27 18:30, Rajkumar Raghuwanshi wrote: > Hi Amit, > > I have pulled latest sources from git and tried to create multi-level > partition, getting a server crash, below are steps to reproduce. please > check if it is reproducible in your machine also. > [ ... ] > postgres=# INSERT INTO test_ml SELECT i, i, to_char(i, 'FM') FROM > generate_series(0, 599, 2) i; > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: Failed. Thanks for the example. Looks like there was an oversight in my patch that got committed as 2ac3ef7a01 [1]. Attached patch should fix the same. Thanks, Amit [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ac3ef7a01df859c62d0a02333b646d65eaec5ff diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index fca874752f..f9daf8052d 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -1647,6 +1647,7 @@ get_partition_for_tuple(PartitionDispatch *pd, PartitionDesc partdesc = parent->partdesc; TupleTableSlot *myslot = parent->tupslot; TupleConversionMap *map = parent->tupmap; + ExprContext *econtext = GetPerTupleExprContext(estate); /* Quick exit */ if (partdesc->nparts == 0) @@ -1667,6 +1668,7 @@ get_partition_for_tuple(PartitionDispatch *pd, } /* Extract partition key from tuple */ + econtext->ecxt_scantuple = slot; FormPartitionKeyDatum(parent, slot, estate, values, isnull); if (key->strategy == PARTITION_STRATEGY_RANGE) diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index bca34a509c..1d699c1dab 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -3107,9 +3107,7 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd, { int result; Oid failed_at; - ExprContext *econtext = GetPerTupleExprContext(estate); - econtext->ecxt_scantuple = slot; result = 
get_partition_for_tuple(pd, slot, estate, _at); if (result < 0) { -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
Hi , I tried "COPY FROM" in the git version. It inserts rows to wrong partition. step to reproduce: create table t(a int, b int) partition by range(a); create table t_p1 partition of t for values from (1) to (100); create table t_p2 partition of t for values from (100) to (200); create table t_p3 partition of t for values from (200) to (300); insert into t values(1,1); insert into t values(101,101); insert into t values(201,201); copy (select * from t) to '/tmp/test2.txt'; copy t from '/tmp/test2.txt'; select * from t_p1; result: postgres=# select * from t_p1; a | b -+- 1 | 1 1 | 1 101 | 101 201 | 201 (4 rows) I think the argument "BulkInsertState" used in CopyFrom/heap_insert is related to this problem. Please check it. Thanks. 2016-12-27 17:30 GMT+08:00 Rajkumar Raghuwanshi < rajkumar.raghuwan...@enterprisedb.com>: > Hi Amit, > > I have pulled latest sources from git and tried to create multi-level > partition, getting a server crash, below are steps to reproduce. please > check if it is reproducible in your machine also. 
> > postgres=# CREATE TABLE test_ml (a int, b int, c varchar) PARTITION BY > RANGE(a); > CREATE TABLE > postgres=# CREATE TABLE test_ml_p1 PARTITION OF test_ml FOR VALUES FROM > (0) TO (250) PARTITION BY RANGE (b); > CREATE TABLE > postgres=# CREATE TABLE test_ml_p1_p1 PARTITION OF test_ml_p1 FOR VALUES > FROM (0) TO (100); > CREATE TABLE > postgres=# CREATE TABLE test_ml_p1_p2 PARTITION OF test_ml_p1 FOR VALUES > FROM (100) TO (250); > CREATE TABLE > postgres=# CREATE TABLE test_ml_p2 PARTITION OF test_ml FOR VALUES FROM > (250) TO (500) PARTITION BY RANGE (c); > CREATE TABLE > postgres=# CREATE TABLE test_ml_p2_p1 PARTITION OF test_ml_p2 FOR VALUES > FROM ('0250') TO ('0400'); > CREATE TABLE > postgres=# CREATE TABLE test_ml_p2_p2 PARTITION OF test_ml_p2 FOR VALUES > FROM ('0400') TO ('0500'); > CREATE TABLE > postgres=# CREATE TABLE test_ml_p3 PARTITION OF test_ml FOR VALUES FROM > (500) TO (600) PARTITION BY RANGE ((b + a)); > CREATE TABLE > postgres=# CREATE TABLE test_ml_p3_p1 PARTITION OF test_ml_p3 FOR VALUES > FROM (1000) TO (1100); > CREATE TABLE > postgres=# CREATE TABLE test_ml_p3_p2 PARTITION OF test_ml_p3 FOR VALUES > FROM (1100) TO (1200); > CREATE TABLE > postgres=# INSERT INTO test_ml SELECT i, i, to_char(i, 'FM') FROM > generate_series(0, 599, 2) i; > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: Failed. > > Thanks & Regards, > Rajkumar Raghuwanshi > QMG, EnterpriseDB Corporation > > > -- GaoZengqi pgf...@gmail.com zengqi...@gmail.com
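The suspicion above is right in spirit: CopyFrom passes a bulk-insert state to heap_insert, and state cached while inserting into one target relation must not be carried over after a tuple is routed to a different partition. The failure mode can be pictured with a minimal sketch (plain Python; all names here are illustrative, the real fix lives in the C code of copy.c/heapam.c):

```python
class BulkInsertState:
    """Caches the current insertion target, much as heap_insert's
    bulk-insert state caches the buffer of the relation it last wrote to."""
    def __init__(self):
        self.target = None

def route(bounds, key):
    """Find the range partition whose [lo, hi) bound contains key."""
    for name, (lo, hi) in bounds.items():
        if lo <= key < hi:
            return name
    raise LookupError("partition not found")

def copy_from(rows, bounds, reset_bulk_state_on_switch):
    """Load rows; the buggy variant keeps writing to the stale cached target."""
    tables = {name: [] for name in bounds}
    bistate = BulkInsertState()
    for row in rows:
        part = route(bounds, row[0])
        # Fixed behavior: retarget whenever the destination partition changes.
        if bistate.target is None or reset_bulk_state_on_switch:
            bistate.target = part
        tables[bistate.target].append(row)
    return tables
```

With bounds {'t_p1': (1, 100), 't_p2': (100, 200)} and rows [(1, 1), (101, 101)], the buggy variant (reset flag off) puts both rows into t_p1, mirroring the reported t_p1 contents; with the flag on, each row lands in its own partition.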
Re: [HACKERS] Declarative partitioning - another take
Hi Amit, I have pulled latest sources from git and tried to create multi-level partition, getting a server crash, below are steps to reproduce. please check if it is reproducible in your machine also. postgres=# CREATE TABLE test_ml (a int, b int, c varchar) PARTITION BY RANGE(a); CREATE TABLE postgres=# CREATE TABLE test_ml_p1 PARTITION OF test_ml FOR VALUES FROM (0) TO (250) PARTITION BY RANGE (b); CREATE TABLE postgres=# CREATE TABLE test_ml_p1_p1 PARTITION OF test_ml_p1 FOR VALUES FROM (0) TO (100); CREATE TABLE postgres=# CREATE TABLE test_ml_p1_p2 PARTITION OF test_ml_p1 FOR VALUES FROM (100) TO (250); CREATE TABLE postgres=# CREATE TABLE test_ml_p2 PARTITION OF test_ml FOR VALUES FROM (250) TO (500) PARTITION BY RANGE (c); CREATE TABLE postgres=# CREATE TABLE test_ml_p2_p1 PARTITION OF test_ml_p2 FOR VALUES FROM ('0250') TO ('0400'); CREATE TABLE postgres=# CREATE TABLE test_ml_p2_p2 PARTITION OF test_ml_p2 FOR VALUES FROM ('0400') TO ('0500'); CREATE TABLE postgres=# CREATE TABLE test_ml_p3 PARTITION OF test_ml FOR VALUES FROM (500) TO (600) PARTITION BY RANGE ((b + a)); CREATE TABLE postgres=# CREATE TABLE test_ml_p3_p1 PARTITION OF test_ml_p3 FOR VALUES FROM (1000) TO (1100); CREATE TABLE postgres=# CREATE TABLE test_ml_p3_p2 PARTITION OF test_ml_p3 FOR VALUES FROM (1100) TO (1200); CREATE TABLE postgres=# INSERT INTO test_ml SELECT i, i, to_char(i, 'FM') FROM generate_series(0, 599, 2) i; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. Thanks & Regards, Rajkumar Raghuwanshi QMG, EnterpriseDB Corporation
Re: [HACKERS] Declarative partitioning - another take
On 2016/12/26 19:46, Amit Langote wrote: > (Perhaps, the following should be its own new thread) > > I noticed that ExecProcessReturning() doesn't work properly after tuple > routing (example shows how returning tableoid currently fails but I > mention some other issues below): > > create table p (a int, b int) partition by range (a); > create table p1 partition of p for values from (1) to (10); > insert into p values (1) returning tableoid::regclass, *; > tableoid | a | b > --+---+--- > -| 1 | > (1 row) > > INSERT 0 1 > > I tried to fix that in 0007 to get: > > insert into p values (1) returning tableoid::regclass, *; > tableoid | a | b > --+---+--- > p| 1 | > (1 row) > > INSERT 0 1 > > But I think it *may* be wrong to return the root table OID for tuples > inserted into leaf partitions, because with select we get partition OIDs: > > select tableoid::regclass, * from p; > tableoid | a | b > --+---+--- > p1 | 1 | > (1 row) > > If so, that means we should build the projection info (corresponding to > the returning list) for each target partition somehow. ISTM, that's going > to have to be done within the planner by appropriate inheritance > translation of the original returning targetlist. Turns out getting the 2nd result may not require planner tweaks after all. Unless I'm missing something, translation of varattnos of the RETURNING target list can be done as late as ExecInitModifyTable() for the insert case, unlike update/delete (which do require planner's attention). I updated the patch 0007 to implement the same, including the test. While doing that, I realized map_partition_varattnos introduced in 0003 is rather restrictive in its applicability, because it assumes varno = 1 for the expressions it accepts as input for the mapping. Mapping returning (target) list required modifying map_partition_varattnos to accept target_varno as additional argument. 
That way, we can map arbitrary expressions from the parent's attribute numbers to a partition's attribute numbers, not just partition constraints. Patches 0001 to 0006 unchanged. Thanks, Amit From fcfe08948d31802547e93ac6551873afd554bc36 Mon Sep 17 00:00:00 2001 From: amit Date: Mon, 26 Dec 2016 11:53:19 +0900 Subject: [PATCH 1/7] Allocate partition_tuple_slot in respective nodes ...instead of making it part of EState and its tuple table. Respective nodes means ModifyTableState and CopyState for now. Reported by: n/a Patch by: Amit Langote Reports: n/a --- src/backend/commands/copy.c| 30 +- src/backend/executor/execMain.c| 12 src/backend/executor/nodeModifyTable.c | 17 - src/include/executor/executor.h| 1 + src/include/nodes/execnodes.h | 6 +++--- 5 files changed, 41 insertions(+), 25 deletions(-) diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index aa25a23336..e5a0f1bf80 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -161,11 +161,18 @@ typedef struct CopyStateData ExprState **defexprs; /* array of default att expressions */ bool volatile_defexprs; /* is any of defexprs volatile? */ List *range_table; + PartitionDispatch *partition_dispatch_info; - int num_dispatch; - int num_partitions; - ResultRelInfo *partitions; + /* Tuple-routing support info */ + int num_dispatch; /* Number of entries in the above array */ + int num_partitions; /* Number of members in the following + * arrays */ + ResultRelInfo *partitions; /* Per partition result relation */ TupleConversionMap **partition_tupconv_maps; + /* Per partition tuple conversion map */ + TupleTableSlot *partition_tuple_slot; + /* Slot used to manipulate a tuple after + * it is routed to a partition */ /* * These variables are used to reduce overhead in textual COPY FROM.
@@ -1409,6 +1416,7 @@ BeginCopy(ParseState *pstate, PartitionDispatch *partition_dispatch_info; ResultRelInfo *partitions; TupleConversionMap **partition_tupconv_maps; + TupleTableSlot *partition_tuple_slot; int num_parted, num_partitions; @@ -1416,12 +1424,14 @@ BeginCopy(ParseState *pstate, &partition_dispatch_info, &partitions, &partition_tupconv_maps, + &partition_tuple_slot, &num_parted, &num_partitions); cstate->partition_dispatch_info = partition_dispatch_info; cstate->num_dispatch = num_parted; cstate->partitions = partitions; cstate->num_partitions = num_partitions; cstate->partition_tupconv_maps = partition_tupconv_maps; + cstate->partition_tuple_slot = partition_tuple_slot; } } else @@ -2436,15 +2446,6 @@ CopyFrom(CopyState cstate) estate->es_trig_tuple_slot = ExecInitExtraTupleSlot(estate); /* - * Initialize a dedicated slot to manipulate tuples of any given - * partition's rowtype. - */ - if
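The varattno translation discussed above can be pictured with a toy expression tree. This is an illustrative Python sketch, not the C implementation: a Var is a ('var', varno, attno) tuple, the map is built by attribute name, and target_varno selects which Vars get remapped (the generalization the modified map_partition_varattnos provides):

```python
def build_attno_map(parent_atts, child_atts):
    """For each parent attno (1-based), the matching child attno, by name."""
    child_by_name = {name: i + 1 for i, name in enumerate(child_atts)}
    return [child_by_name[name] for name in parent_atts]

def map_varattnos(expr, target_varno, attno_map):
    """Rewrite ('var', varno, attno) leaves of a nested-tuple expression,
    touching only Vars whose varno equals target_varno."""
    if isinstance(expr, tuple) and expr and expr[0] == "var":
        _, varno, attno = expr
        if varno == target_varno:
            return ("var", varno, attno_map[attno - 1])
        return expr
    if isinstance(expr, tuple):
        return tuple(map_varattnos(e, target_varno, attno_map) for e in expr)
    return expr
```

For a parent with columns (a, b) and a partition declared as (b, a), the map is [2, 1], so a RETURNING expression referencing parent column 1 (a) is rewritten to reference partition column 2.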
Re: [HACKERS] Declarative partitioning - another take
Sorry about the delay in replying. On 2016/12/23 8:08, Robert Haas wrote: > On Thu, Dec 22, 2016 at 3:35 AM, Amit Langote >wrote: >> While working on that, I discovered yet-another-bug having to do with the >> tuple descriptor that's used as we route a tuple down a partition tree. If >> attnums of given key attribute(s) are different on different levels, it >> would be incorrect to use the original slot's (one passed by ExecInsert()) >> tuple descriptor to inspect the original slot's heap tuple, as we go down >> the tree. It might cause spurious "partition not found" at some level due >> to looking at incorrect field in the input tuple because of using the >> wrong tuple descriptor (root table's attnums not always same as other >> partitioned tables in the tree). Patch 0001 fixes that including a test. > > I committed this, but I'm a bit uncomfortable with it: should the > TupleTableSlot be part of the ModifyTableState rather than the EState? Done that way in 0001 of the attached patches. So, instead of making the standalone partition_tuple_slot a field of EState (with the actual TupleTableSlot in its tupleTable), it is now allocated within ModifyTableState and CopyState, and released when ModifyTable node or CopyFrom finishes, respectively. >> It also addresses the problem I mentioned previously that once >> tuple-routing is done, we failed to switch to a slot with the leaf >> partition's tupdesc (IOW, continued to use the original slot with root >> table's tupdesc causing spurious failures due to differences in attums >> between the leaf partition and the root table). >> >> Further patches 0002, 0003 and 0004 fix bugs that I sent one-big-patch for >> in my previous message. Each patch has a test for the bug it's meant to fix. > > Regarding 0002, I think that this is kind of a strange fix. Wouldn't > it be better to get hold of the original tuple instead of reversing > the conversion? 
And what of the idea of avoiding the conversion in > the (probably very common) case where we can? To get hold of the original tuple, how about adding an argument orig_slot to ExecConstraints()? I've implemented that approach in the new 0002. Regarding the possibility of avoiding the conversion in very common cases, I think that could be done considering the following: If the mapping from the attribute numbers of the parent table to that of a child table is an identity map, we don't need to convert tuples. Currently however, convert_tuples_by_name() also requires tdtypeid of the input and output TupleDescs to be equal. The reason cited for that is that we may fail to "inject the right OID into the tuple datum" if the types don't match. In case of partitioning, hasoid status must match between the parent and its partitions at all times, so the aforementioned condition is satisfied without requiring that tdtypeid are same. And oid column (if present) is always located at a given position in HeapTuple, so need not map that. Based on the above argument, patch 0006 teaches convert_tuples_by_name() to *optionally* not require tdtypeid of input and output tuple descriptors to be equal. It's implemented by introducing a new argument to convert_tuples_by_name() named 'consider_typeid'. We pass 'false' only for the partitioning cases. 
(Perhaps, the following should be its own new thread) I noticed that ExecProcessReturning() doesn't work properly after tuple routing (example shows how returning tableoid currently fails but I mention some other issues below): create table p (a int, b int) partition by range (a); create table p1 partition of p for values from (1) to (10); insert into p values (1) returning tableoid::regclass, *; tableoid | a | b --+---+--- -| 1 | (1 row) INSERT 0 1 I tried to fix that in 0007 to get: insert into p values (1) returning tableoid::regclass, *; tableoid | a | b --+---+--- p| 1 | (1 row) INSERT 0 1 But I think it *may* be wrong to return the root table OID for tuples inserted into leaf partitions, because with select we get partition OIDs: select tableoid::regclass, * from p; tableoid | a | b --+---+--- p1 | 1 | (1 row) If so, that means we should build the projection info (corresponding to the returning list) for each target partition somehow. ISTM, that's going to have to be done within the planner by appropriate inheritance translation of the original returning targetlist. Thanks, Amit >From 89f8740195189cc77391bdb844f5092c0440f061 Mon Sep 17 00:00:00 2001 From: amit Date: Mon, 26 Dec 2016 11:53:19 +0900 Subject: [PATCH 1/7] Allocate partition_tuple_slot in respective nodes ...instead of making it part of EState and its tuple table. Respective nodes means ModifyTableState and CopyState for now. Reported by: n/a Patch by: Amit Langote Reports: n/a --- src/backend/commands/copy.c| 30 +- src/backend/executor/execMain.c
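The relaxation argued for above can be sketched as follows (a hedged Python model, not the C code of convert_tuples_by_name): build the by-name attribute map first, and return no converter at all when the map turns out to be the identity — the case that the proposed 'consider_typeid' argument makes reachable for partitions, since their hasoid status always matches the parent's:

```python
def attribute_map(indesc, outdesc):
    """Map each output column (by name) to its 1-based input position;
    descriptors are modeled as lists of (name, type) pairs."""
    in_pos = {name: i + 1 for i, (name, _) in enumerate(indesc)}
    amap = []
    for name, typ in outdesc:
        pos = in_pos.get(name)
        if pos is None or indesc[pos - 1][1] != typ:
            raise ValueError("could not convert row type: column " + name)
        amap.append(pos)
    return amap

def convert_tuples_by_name(indesc, outdesc):
    """Return a conversion function, or None when no conversion is needed
    because the map is the identity (i-th output comes from i-th input)."""
    amap = attribute_map(indesc, outdesc)
    if len(indesc) == len(outdesc) and all(p == i + 1 for i, p in enumerate(amap)):
        return None
    return lambda tup: tuple(tup[p - 1] for p in amap)
```

Returning None for the identity map is the optimization: callers skip per-row conversion entirely, which is the common case when a partition's column order matches its parent's.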
Re: [HACKERS] Declarative partitioning - another take
On Thu, Dec 22, 2016 at 3:35 AM, Amit Langote wrote: > While working on that, I discovered yet-another-bug having to do with the > tuple descriptor that's used as we route a tuple down a partition tree. If > attnums of given key attribute(s) are different on different levels, it > would be incorrect to use the original slot's (one passed by ExecInsert()) > tuple descriptor to inspect the original slot's heap tuple, as we go down > the tree. It might cause spurious "partition not found" at some level due > to looking at incorrect field in the input tuple because of using the > wrong tuple descriptor (root table's attnums not always same as other > partitioned tables in the tree). Patch 0001 fixes that including a test. I committed this, but I'm a bit uncomfortable with it: should the TupleTableSlot be part of the ModifyTableState rather than the EState? > It also addresses the problem I mentioned previously that once > tuple-routing is done, we failed to switch to a slot with the leaf > partition's tupdesc (IOW, continued to use the original slot with root > table's tupdesc causing spurious failures due to differences in attnums > between the leaf partition and the root table). > > Further patches 0002, 0003 and 0004 fix bugs that I sent one-big-patch for > in my previous message. Each patch has a test for the bug it's meant to fix. Regarding 0002, I think that this is kind of a strange fix. Wouldn't it be better to get hold of the original tuple instead of reversing the conversion? And what of the idea of avoiding the conversion in the (probably very common) case where we can? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On 2016/12/22 1:50, Robert Haas wrote: > On Wed, Dec 21, 2016 at 5:33 AM, Amit Langote > wrote: >> Breaking changes into multiple commits/patches does not seem to work for >> adding regression tests. So, I've combined multiple patches into a single >> patch which is now patch 0002 in the attached set of patches. > > Ugh, seriously? It's fine to combine closely related bug fixes but > not all of these are. I don't see why you can't add some regression > tests in one patch and then add some more in the next patch. I managed to do that this time around with the attached set of patches. Guess I gave up too easily in the previous attempt. While working on that, I discovered yet-another-bug having to do with the tuple descriptor that's used as we route a tuple down a partition tree. If attnums of given key attribute(s) are different on different levels, it would be incorrect to use the original slot's (one passed by ExecInsert()) tuple descriptor to inspect the original slot's heap tuple, as we go down the tree. It might cause spurious "partition not found" at some level due to looking at incorrect field in the input tuple because of using the wrong tuple descriptor (root table's attnums not always same as other partitioned tables in the tree). Patch 0001 fixes that including a test. It also addresses the problem I mentioned previously that once tuple-routing is done, we failed to switch to a slot with the leaf partition's tupdesc (IOW, continued to use the original slot with root table's tupdesc causing spurious failures due to differences in attnums between the leaf partition and the root table). Further patches 0002, 0003 and 0004 fix bugs that I sent one-big-patch for in my previous message. Each patch has a test for the bug it's meant to fix. Patch 0005 is the same old "Add some more tests for tuple-routing" per [1]: > Meanwhile, committed the latest 0001 and the elog() patch. Thanks!
Regards, Amit [1] https://www.postgresql.org/message-id/CA%2BTgmoZ86v1G%2Bzx9etMiSQaBBvYMKfU-iitqZArSh5z0n8Q4cA%40mail.gmail.com From cac625b22990c12a537eaa4c77434017a2303b92 Mon Sep 17 00:00:00 2001 From: amit Date: Thu, 22 Dec 2016 15:33:41 +0900 Subject: [PATCH 1/5] Use correct tuple descriptor before and after routing a tuple When we have a multi-level partitioned hierarchy where tables at successive levels have different attnums for the partition key attribute(s), we must use the tuple descriptor of the partitioned table at the given level to inspect the appropriately converted representation of the input tuple before computing the partition key to perform tuple-routing. So, store in each PartitionDispatch object a standalone TupleTableSlot initialized with the TupleDesc of the corresponding partitioned table, along with a TupleConversionMap to map tuples from its parent's rowtype to its own rowtype. After the routing is done and a leaf partition is returned, we must use the leaf partition's tuple descriptor, not the root table's, for roughly the same reasons as above. For the next row and so on, we must then switch back to the root table's tuple descriptor. To that end, a dedicated TupleTableSlot is allocated in EState called es_partition_tuple_slot, whose descriptor is set to that of the given leaf partition for every input tuple after it's routed.
Reported by: n/a Patch by: Amit Langote Reports: n/a --- src/backend/catalog/partition.c| 74 -- src/backend/commands/copy.c| 31 +- src/backend/executor/nodeModifyTable.c | 27 + src/include/catalog/partition.h| 7 src/include/nodes/execnodes.h | 3 ++ src/test/regress/expected/insert.out | 37 + src/test/regress/sql/insert.sql| 26 7 files changed, 190 insertions(+), 15 deletions(-) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index 9980582b77..fca874752f 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -923,13 +923,19 @@ RelationGetPartitionQual(Relation rel, bool recurse) return generate_partition_qual(rel, recurse); } -/* Turn an array of OIDs with N elements into a list */ -#define OID_ARRAY_TO_LIST(arr, N, list) \ +/* + * Append OIDs of rel's partitions to the list 'partoids' and for each OID, + * append pointer rel to the list 'parents'. + */ +#define APPEND_REL_PARTITION_OIDS(rel, partoids, parents) \ do\ {\ int i;\ - for (i = 0; i < (N); i++)\ - (list) = lappend_oid((list), (arr)[i]);\ + for (i = 0; i < (rel)->rd_partdesc->nparts; i++)\ + {\ + (partoids) = lappend_oid((partoids), (rel)->rd_partdesc->oids[i]);\ + (parents) = lappend((parents), (rel));\ + }\ } while(0) /* @@ -944,11 +950,13 @@ PartitionDispatch * RelationGetPartitionDispatchInfo(Relation rel, int lockmode, int *num_parted, List **leaf_part_oids) { - PartitionDesc rootpartdesc =
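The commit message above is easier to follow with a toy model of the routing loop. In this Python sketch (illustrative only — PartitionDispatch here is not the C struct), the row is converted to each level's own column order before that level's key is extracted, which is exactly the step that goes wrong if the root's descriptor is used throughout a tree whose levels order the columns differently:

```python
class PartitionDispatch:
    """One level of the tree: this level's column order (its 'tupdesc'),
    its partition key column, and range-bounded children, each of which is
    either another PartitionDispatch or a leaf partition name."""
    def __init__(self, atts, key, children):
        self.atts = atts          # tuple of column names at this level
        self.key = key            # partition key column name at this level
        self.children = children  # list of (lo, hi, child)

def route_tuple(pd, row, parent_atts=None):
    # Convert the row from the parent's column order to this level's order,
    # so the key attribute is looked up via the *correct* descriptor.
    if parent_atts is not None and parent_atts != pd.atts:
        by_name = dict(zip(parent_atts, row))
        row = tuple(by_name[a] for a in pd.atts)
    keyval = row[pd.atts.index(pd.key)]
    for lo, hi, child in pd.children:
        if lo <= keyval < hi:
            if isinstance(child, PartitionDispatch):
                return route_tuple(child, row, pd.atts)
            return child, row     # leaf name plus the converted row
    raise LookupError("partition not found")
```

Dropping the conversion step and indexing the sub-level's key into the root-ordered row reproduces the spurious "partition not found" (or worse, routing by the wrong column) that the patch fixes.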
Re: [HACKERS] Declarative partitioning - another take
On Wed, Dec 21, 2016 at 5:33 AM, Amit Langote wrote: > Breaking changes into multiple commits/patches does not seem to work for > adding regression tests. So, I've combined multiple patches into a single > patch which is now patch 0002 in the attached set of patches. Ugh, seriously? It's fine to combine closely related bug fixes but not all of these are. I don't see why you can't add some regression tests in one patch and then add some more in the next patch. Meanwhile, committed the latest 0001 and the elog() patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On 2016/12/21 1:53, Robert Haas wrote: > On Mon, Dec 19, 2016 at 10:59 PM, Robert Haas wrote: >> On Sun, Dec 18, 2016 at 10:00 PM, Amit Langote >> wrote: >>> Here are updated patches including the additional information. >> >> Thanks. Committed 0001. Will have to review the others when I'm less tired. > > 0002. Can you add a test case for the bug fixed by this patch? Done (also see below). > 0003. Loses equalTupleDescs() check and various cases where > ExecOpenIndices can be skipped. Isn't that bad? I realized that as far as checking whether tuple conversion mapping is required, the checks performed by convert_tuples_by_name() at the beginning of the function following the comment /* Verify compatibility and prepare attribute-number map */ are enough. If equalTupleDescs() returned false (which it always does because tdtypeid are never the same for passed in tuple descriptors), we would be repeating some of the tests in convert_tuples_by_name() anyway. As for the checks performed for ExecOpenIndices(), it seemed better to keep the following in place, so added back: if (leaf_part_rri->ri_RelationDesc->rd_rel->relhasindex && leaf_part_rri->ri_IndexRelationDescs == NULL) ExecOpenIndices(leaf_part_rri, false); > Also, "We locked all > the partitions above including the leaf partitions" should say "Caller > must have locked all the partitions including the leaf partitions". No, we do the locking in RelationGetPartitionDispatchInfo(), which is called by ExecSetupPartitionTupleRouting() itself. ExecSetupPartitionTupleRouting() is where we lock all the partitions for the first time. > 0004. Unnecessary whitespace change in executor.h. Still don't > understand why we need to hoist RelationGetPartitionQual() into the > caller. We only need to check a result relation's (ri_RelationDesc's) partition constraint if we are inserting into the result relation directly.
In the case of tuple-routing, we do not want to check the leaf partitions' partition constraint, but if the direct target in that case is an internal partition, we must check its partition constraint, which is the same for all leaf partitions in that sub-tree. It wouldn't be wrong per se to check each leaf partition's constraint in that case, which includes the target partitioned table's constraint as well, but that would be inefficient due to both having to retrieve the constraints and having ExecConstraints() *unnecessarily* check them for every row. If we keep doing RelationGetPartitionQual() in InitResultRelInfo() controlled by a bool argument (load_partition_check), we cannot avoid the above-mentioned inefficiency if we want to fix this bug. > 0005. Can you add a test case for the bug fixed by this patch? Done, but... Breaking changes into multiple commits/patches does not seem to work for adding regression tests. So, I've combined multiple patches into a single patch which is now patch 0002 in the attached set of patches. Its commit message is very long now.
To show an example of bugs that 0002 is meant for: create table p (a int, b int) partition by range (a, b); create table p1 (b int, a int not null) partition by range (b); create table p11 (like p1); alter table p11 drop a; alter table p11 add a int; alter table p11 drop a; alter table p11 add a int not null; # select attrelid::regclass, attname, attnum from pg_attribute where attnum > 0 and (attrelid = 'p'::regclass or attrelid = 'p1'::regclass or attrelid = 'p11'::regclass) and attname = 'a'; attrelid | attname | attnum --+-+ p| a | 1 p1 | a | 2 p11 | a | 4 (3 rows) alter table p1 attach partition p11 for values from (1) to (5); alter table p attach partition p1 for values from (1, 1) to (1, 10); -- the following is wrong # insert into p11 (a, b) values (10, 4); INSERT 0 1 -- wrong too (using the wrong TupleDesc after tuple routing) # insert into p1 (a, b) values (10, 4); ERROR: null value in column "a" violates not-null constraint DETAIL: Failing row contains (4, null). -- once we fix the wrong TupleDesc issue # insert into p1 (a, b) values (10, 4); INSERT 0 1 which is wrong because p1, as a partition of p, should not accept 10 for a. But its partition constraint is not being applied to the leaf partition p11 into which the tuple is routed (the bug). Thanks, Amit From d21ae74ae01e93f21df5b84ed097ebbc85e529dd Mon Sep 17 00:00:00 2001 From: amit Date: Thu, 15 Dec 2016 15:56:27 +0900 Subject: [PATCH 1/3] Refactor tuple-routing setup code It's unnecessarily repeated in copy.c and nodeModifyTable.c, which makes it a burden to maintain. Should've been like this to begin with. I moved the common code to ExecSetupPartitionTupleRouting() in execMain.c that also houses ExecFindPartition() currently. Hmm, should there be a new src/backend/executor/execPartition.c? Reported
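The p/p1/p11 example boils down to which constraint gets checked where. A hedged sketch of the intended behavior (plain Python with hypothetical names, and a deliberately simplified version of p1's multi-column range constraint): check the *direct* insert target's partition constraint, since the bounds below it are guaranteed by tuple routing itself:

```python
# Simplified partition constraints from the example above, as predicates on
# a row dict. p1 is p's partition FOR VALUES FROM (1, 1) TO (1, 10) on key
# (a, b) -- modeled here as a = 1 AND 1 <= b < 10, a simplification of the
# real multi-column range constraint -- and p11 is p1's partition FOR
# VALUES FROM (1) TO (5) on key (b).
p1_constraint = [lambda r: r["a"] == 1 and 1 <= r["b"] < 10]
p11_constraint = p1_constraint + [lambda r: 1 <= r["b"] < 5]

def exec_constraints(partition_constraint, row):
    """Sketch of the check: raise if the target's constraint fails."""
    if not all(q(row) for q in partition_constraint):
        raise ValueError("new row violates the partition constraint")

def insert_into_p1(row):
    # Inserting directly into p1: p1's own partition constraint must hold
    # even though the row is then routed onward to p11; routing itself
    # enforces p11's bound, so it need not be rechecked.
    exec_constraints(p1_constraint, row)
    return "p11"  # the leaf the row is routed to in this toy tree
```

Under this model, inserting (a=10, b=4) through p1 fails p1's constraint, which is exactly the rejection the buggy code was missing.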
Re: [HACKERS] Declarative partitioning - another take
On 2016/12/21 14:03, Amit Langote wrote: > OK, updated patch attached. Oops, that was an incomplete patch. Complete patch attached this time. Thanks, Amit diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 1c219b03dd..115b98313e 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -13297,8 +13297,11 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd) } } + /* It's safe to skip the validation scan after all */ if (skip_validate) - elog(NOTICE, "skipping scan to validate partition constraint"); + ereport(INFO, +(errmsg("partition constraint for table \"%s\" is implied by existing constraints", + RelationGetRelationName(attachRel)))); /* * Set up to have the table to be scanned to validate the partition diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out index 99e20eb922..62e18961d3 100644 --- a/src/test/regress/expected/alter_table.out +++ b/src/test/regress/expected/alter_table.out @@ -3179,7 +3179,7 @@ ALTER TABLE list_parted2 ATTACH PARTITION part_3_4 FOR VALUES IN (3, 4); ALTER TABLE list_parted2 DETACH PARTITION part_3_4; ALTER TABLE part_3_4 ALTER a SET NOT NULL; ALTER TABLE list_parted2 ATTACH PARTITION part_3_4 FOR VALUES IN (3, 4); -NOTICE: skipping scan to validate partition constraint +INFO: partition constraint for table "part_3_4" is implied by existing constraints -- check validation when attaching range partitions CREATE TABLE range_parted ( a int, @@ -3204,7 +3204,7 @@ CREATE TABLE part2 ( b int NOT NULL CHECK (b >= 10 AND b < 18) ); ALTER TABLE range_parted ATTACH PARTITION part2 FOR VALUES FROM (1, 10) TO (1, 20); -NOTICE: skipping scan to validate partition constraint +INFO: partition constraint for table "part2" is implied by existing constraints -- check that leaf partitions are scanned when attaching a partitioned -- table CREATE TABLE part_5 ( @@ -3219,7 +3219,7 @@ ERROR: partition constraint is violated by some
row DELETE FROM part_5_a WHERE a NOT IN (3); ALTER TABLE part_5 ADD CONSTRAINT check_a CHECK (a IN (5)), ALTER a SET NOT NULL; ALTER TABLE list_parted2 ATTACH PARTITION part_5 FOR VALUES IN (5); -NOTICE: skipping scan to validate partition constraint +INFO: partition constraint for table "part_5" is implied by existing constraints -- check that the table being attached is not already a partition ALTER TABLE list_parted2 ATTACH PARTITION part_2 FOR VALUES IN (2); ERROR: "part_2" is already a partition -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
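The message wording aside, the interesting logic in the patch above is the test that lets ATExecAttachPartition skip the validation scan: the scan is unnecessary when the partition constraint is already implied by the table's existing constraints (predicate_implied_by() in the real code). A much-simplified interval model in Python — real constraints are arbitrary predicates, so this is only a sketch of the idea:

```python
def scan_skippable(partition_bound, existing_checks):
    """True when some existing CHECK interval [lo, hi) is contained in the
    partition bound's [lo, hi): every row the CHECK admits, present or
    future, then also satisfies the partition constraint, so scanning the
    table to validate it proves nothing new.

    Simplification: the real implication proof is done over general
    predicates by predicate_implied_by(), not by interval arithmetic."""
    plo, phi = partition_bound
    return any(plo <= lo and hi <= phi for lo, hi in existing_checks)
```

This mirrors the regression test above: part2 carries CHECK (b >= 10 AND b < 18), so attaching it with a bound covering b in [10, 20) needs no scan, and the new INFO message is emitted instead.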
Re: [HACKERS] Declarative partitioning - another take
On 2016/12/21 13:42, Robert Haas wrote: > On Tue, Dec 20, 2016 at 9:14 PM, Amit Langote > wrote: >> On 2016/12/21 1:45, Alvaro Herrera wrote: >>> Robert Haas wrote: On Tue, Dec 20, 2016 at 10:27 AM, Alvaro Herrera wrote: > Even if we decide to keep the message, I think it's not very good > wording anyhow; as a translator I disliked it on sight. Instead of > "skipping scan to validate" I would use "skipping validation scan", > except that it's not clear what it is we're validating. Mentioning > partition constraint in errcontext() doesn't seem like a great solution, but > I can't think of anything better. Maybe something like: partition constraint for table \"%s\" is implied by existing constraints >>> >>> Actually, shouldn't we emit a message if we *don't* skip the check? >> >> Scanning (aka, not skipping) to validate the partition constraint is the >> default behavior, so a user would be expecting it anyway, IOW, need not be >> informed of it. But when ATExecAttachPartition's efforts to avoid the >> scan by comparing the partition constraint against existing constraints >> (which the user most probably deliberately added just for this) succeed, >> that seems like a better piece of information to provide the user with, >> IMHO. But then again, having a message printed before a potentially long >> validation scan seems like something a user would like to see, to know >> what it is that is going to take so long. Hmm. >> >> Anyway, what would the opposite of Robert's suggested message look like: >> "scanning table \"%s\" to validate partition constraint"? > > Maybe: partition constraint for table \"%s\" is implied by existing > constraints OK, updated patch attached.
Thanks, Amit diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out index 2a324c0b49..62e18961d3 100644 --- a/src/test/regress/expected/alter_table.out +++ b/src/test/regress/expected/alter_table.out @@ -3179,7 +3179,7 @@ ALTER TABLE list_parted2 ATTACH PARTITION part_3_4 FOR VALUES IN (3, 4); ALTER TABLE list_parted2 DETACH PARTITION part_3_4; ALTER TABLE part_3_4 ALTER a SET NOT NULL; ALTER TABLE list_parted2 ATTACH PARTITION part_3_4 FOR VALUES IN (3, 4); -INFO: skipping scan to validate partition constraint +INFO: partition constraint for table "part_3_4" is implied by existing constraints -- check validation when attaching range partitions CREATE TABLE range_parted ( a int, @@ -3204,7 +3204,7 @@ CREATE TABLE part2 ( b int NOT NULL CHECK (b >= 10 AND b < 18) ); ALTER TABLE range_parted ATTACH PARTITION part2 FOR VALUES FROM (1, 10) TO (1, 20); -INFO: skipping scan to validate partition constraint +INFO: partition constraint for table "part2" is implied by existing constraints -- check that leaf partitions are scanned when attaching a partitioned -- table CREATE TABLE part_5 ( @@ -3219,7 +3219,7 @@ ERROR: partition constraint is violated by some row DELETE FROM part_5_a WHERE a NOT IN (3); ALTER TABLE part_5 ADD CONSTRAINT check_a CHECK (a IN (5)), ALTER a SET NOT NULL; ALTER TABLE list_parted2 ATTACH PARTITION part_5 FOR VALUES IN (5); -INFO: skipping scan to validate partition constraint +INFO: partition constraint for table "part_5" is implied by existing constraints -- check that the table being attached is not already a partition ALTER TABLE list_parted2 ATTACH PARTITION part_2 FOR VALUES IN (2); ERROR: "part_2" is already a partition -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On Tue, Dec 20, 2016 at 9:14 PM, Amit Langote wrote: > On 2016/12/21 1:45, Alvaro Herrera wrote: >> Robert Haas wrote: >>> On Tue, Dec 20, 2016 at 10:27 AM, Alvaro Herrera >>> wrote: Even if we decide to keep the message, I think it's not very good wording anyhow; as a translator I disliked it on sight. Instead of "skipping scan to validate" I would use "skipping validation scan", except that it's not clear what it is we're validating. Mentioning partition constraint in errcontext() doesn't seem like a great solution, but I can't think of anything better. >>> >>> Maybe something like: partition constraint for table \"%s\" is implied >>> by existing constraints >> >> Actually, shouldn't we emit a message if we *don't* skip the check? > > Scanning (aka, not skipping) to validate the partition constraint is the > default behavior, so a user would be expecting it anyway, IOW, need not be > informed of it. But when ATExecAttachPartition's efforts to avoid the > scan by comparing the partition constraint against existing constraints > (which the user most probably deliberately added just for this) succeed, > that seems like a better piece of information to provide the user with, > IMHO. But then again, having a message printed before a potentially long > validation scan seems like something a user would like to see, to know > what it is that is going to take so long. Hmm. > > Anyway, what would the opposite of Robert's suggested message look like: > "scanning table \"%s\" to validate partition constraint"? Maybe: partition constraint for table \"%s\" is implied by existing constraints -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On 2016/12/21 1:45, Alvaro Herrera wrote: > Robert Haas wrote: >> On Tue, Dec 20, 2016 at 10:27 AM, Alvaro Herrera >> wrote: >>> Even if we decide to keep the message, I think it's not very good >>> wording anyhow; as a translator I disliked it on sight. Instead of >>> "skipping scan to validate" I would use "skipping validation scan", >>> except that it's not clear what it is we're validating. Mentioning >>> partition constraint in errcontext() doesn't seem like a great solution, but >>> I can't think of anything better. >> >> Maybe something like: partition constraint for table \"%s\" is implied >> by existing constraints > > Actually, shouldn't we emit a message if we *don't* skip the check? Scanning (aka, not skipping) to validate the partition constraint is the default behavior, so a user would be expecting it anyway, IOW, need not be informed of it. But when ATExecAttachPartition's efforts to avoid the scan by comparing the partition constraint against existing constraints (which the user most probably deliberately added just for this) succeed, that seems like a better piece of information to provide the user with, IMHO. But then again, having a message printed before a potentially long validation scan seems like something a user would like to see, to know what it is that is going to take so long. Hmm. Anyway, what would the opposite of Robert's suggested message look like: "scanning table \"%s\" to validate partition constraint"? Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers