Re: [HACKERS] Declarative partitioning - another take
Hi Maksim,

On 2017/04/07 19:52, Maksim Milyutin wrote:
> On 07.04.2017 13:05, Etsuro Fujita wrote:
>> On 2016/12/09 19:46, Maksim Milyutin wrote:
>>> I would like to work on two tasks:
>>> - insert (and eventually update) tuple routing for foreign partition.
>>> - the ability to create an index on the parent and have all of the
>>>   children inherit it;
>> There seems to be no work on the first one, so I'd like to work on that.
> Yes, you can start to work on this, I'll join later as a reviewer.

Great!  I added the patch to the next commitfest:
https://commitfest.postgresql.org/14/1184/

Sorry for the delay.

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On Wed, May 10, 2017 at 6:50 AM, Etsuro Fujita wrote:
> I started working on this.  I agree that the changes made in
> make_modifytable would be unacceptable, but I'd vote for Amit's idea of
> passing a modified PlannerInfo to PlanForeignModify so that the FDW can do
> query planning for INSERT into a foreign partition in the same way as for
> INSERT into a non-partition foreign table.  (Though, I think we should
> generate a more-valid-looking working-copy of the PlannerInfo which has
> Query with the foreign partition as target.)  I'm not sure it's a good idea
> to add a new FDW API or modify the existing one such as PlanForeignModify
> for this purpose.

Thanks for working on this, but I think it would be better to start a new
thread for this discussion, and probably also for any other issues that
come up.  This thread has gotten so long that, between the original
discussion, the commit of the patch, and the discussion of multiple
follow-on commits and patches, it's very hard to follow.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 2016/11/18 1:43, Robert Haas wrote:
> On Thu, Nov 17, 2016 at 6:27 AM, Amit Langote wrote:
>>> - The code in make_modifytable() to swap out the rte_array for a fake
>>>   one looks like an unacceptable kludge.  I don't know offhand what a
>>>   better design would look like, but what you've got is really ugly.
>> Agree that it looks horrible.  The problem is we don't add partition
>> (child table) RTEs when planning an insert on the parent, and FDW
>> partitions can't do without some planner handling - PlanForeignModify()
>> expects a valid PlannerInfo for deparsing target lists (basically, to be
>> able to use planner_rt_fetch()).
> If it's only needed for foreign tables, how about for v1 we just throw an
> error and say errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("cannot route
> inserted tuples to a foreign table") for now.  We can come back and fix it
> later.  Doing more inheritance expansion
> Coming up with some new FDW API or some modification to the existing one
> is probably better, but I don't really want to get hung up on that right
> now.

I started working on this.  I agree that the changes made in
make_modifytable would be unacceptable, but I'd vote for Amit's idea of
passing a modified PlannerInfo to PlanForeignModify so that the FDW can do
query planning for INSERT into a foreign partition in the same way as for
INSERT into a non-partition foreign table.  (Though, I think we should
generate a more-valid-looking working-copy of the PlannerInfo which has
Query with the foreign partition as target.)  I'm not sure it's a good idea
to add a new FDW API or modify the existing one such as PlanForeignModify
for this purpose.

Best regards,
Etsuro Fujita
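[Editor's note: the v1 restriction discussed above — erroring out instead of routing inserted tuples to a foreign partition — can be reproduced with a session sketch like the following.  This is illustrative only: the foreign server name `remote` and the table names are invented, and it assumes the postgres_fdw extension and server are already set up.]

```sql
-- Assumes: CREATE EXTENSION postgres_fdw; and a foreign server "remote".
CREATE TABLE parent (a int) PARTITION BY LIST (a);

-- A foreign table is allowed to be a partition...
CREATE FOREIGN TABLE part_remote PARTITION OF parent
    FOR VALUES IN (1) SERVER remote;

-- ...but in v10, INSERT through the parent cannot route tuples to it:
INSERT INTO parent VALUES (1);
-- ERROR:  cannot route inserted tuples to a foreign table
```

Inserting directly into the foreign table (`INSERT INTO part_remote ...`) still works, since that path goes through the ordinary PlanForeignModify machinery; only tuple routing from the parent is rejected.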
Re: [HACKERS] Declarative partitioning - another take
On 2017/05/10 12:59, Robert Haas wrote:
> On Tue, May 9, 2017 at 11:54 PM, Thomas Munro wrote:
>> +A statement that targets a parent table in a inheritance or partitioning
>>
>> A tiny typo: s/a inheritance/an inheritance/
>
> Now he tells me.

Thanks both.

Regards,
Amit
Re: [HACKERS] Declarative partitioning - another take
On Tue, May 9, 2017 at 11:54 PM, Thomas Munro wrote:
> +A statement that targets a parent table in a inheritance or partitioning
>
> A tiny typo: s/a inheritance/an inheritance/

Now he tells me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On Wed, May 10, 2017 at 3:51 PM, Robert Haas wrote:
> On Sun, May 7, 2017 at 9:44 PM, Amit Langote wrote:
>> I think that makes sense.  Modified it to read: "A statement that targets
>> a parent table in a inheritance or partitioning hierarchy..." in the
>> attached updated patch.
>
> LGTM.  Committed.

+A statement that targets a parent table in a inheritance or partitioning

A tiny typo: s/a inheritance/an inheritance/

--
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] Declarative partitioning - another take
On Sun, May 7, 2017 at 9:44 PM, Amit Langote wrote:
> I think that makes sense.  Modified it to read: "A statement that targets
> a parent table in a inheritance or partitioning hierarchy..." in the
> attached updated patch.

LGTM.  Committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 2017/05/08 10:22, Thomas Munro wrote:
> On Mon, May 8, 2017 at 12:47 PM, Amit Langote wrote:
>> On 2017/05/03 2:48, Robert Haas wrote:
>>> On Tue, May 2, 2017 at 3:30 AM, Amit Langote wrote:
>>>> You're right.  I agree that whatever text we add here should be
>>>> pointing out that statement-level triggers of affected child tables are
>>>> not fired, when root parent is specified in the command.  Since there
>>>> was at least some talk of changing that behavior for regular
>>>> inheritance so that statement triggers of any affected children are
>>>> fired [1], I thought we shouldn't say something general that applies to
>>>> both inheritance and partitioning.  But since nothing has happened in
>>>> that regard, we might as well.  How about the attached?
>>>
>>> Looks better, but I think we should say "statement" instead of
>>> "operation" for consistency with the previous paragraph, and it
>>> certainly shouldn't be capitalized.
>>
>> Agreed, done.  Attached updated patch.
>
> +A statement that targets the root table in a inheritance or partitioning
> +hierarchy does not cause the statement-level triggers of affected child
> +tables to be fired; only the root table's statement-level triggers are
> +fired.  However, row-level triggers of any affected child tables will be
> +fired.
> +
> +
>
> Why talk specifically about the "root" table?  Wouldn't we describe
> the situation more generally if we said [a,the] "parent"?

I think that makes sense.  Modified it to read: "A statement that targets
a parent table in a inheritance or partitioning hierarchy..." in the
attached updated patch.

Thanks,
Amit

From 5c2c453235d5eedf857a5e7123337aae5aedd761 Mon Sep 17 00:00:00 2001
From: amit
Date: Mon, 24 Apr 2017 14:55:08 +0900
Subject: [PATCH] Clarify statement trigger behavior with inheritance

---
 doc/src/sgml/trigger.sgml | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/doc/src/sgml/trigger.sgml b/doc/src/sgml/trigger.sgml
index 6f8416dda7..ce76a1f042 100644
--- a/doc/src/sgml/trigger.sgml
+++ b/doc/src/sgml/trigger.sgml
@@ -123,6 +123,14 @@
+A statement that targets a parent table in a inheritance or partitioning
+hierarchy does not cause the statement-level triggers of affected child
+tables to be fired; only the parent table's statement-level triggers are
+fired.  However, row-level triggers of any affected child tables will be
+fired.
+
+
+
 If an INSERT contains an ON CONFLICT DO UPDATE clause, it is possible
 that the effects of all row-level BEFORE INSERT triggers
--
2.11.0
Re: [HACKERS] Declarative partitioning - another take
On Mon, May 8, 2017 at 12:47 PM, Amit Langote wrote:
> On 2017/05/03 2:48, Robert Haas wrote:
>> On Tue, May 2, 2017 at 3:30 AM, Amit Langote wrote:
>>> You're right.  I agree that whatever text we add here should be pointing
>>> out that statement-level triggers of affected child tables are not fired,
>>> when root parent is specified in the command.
>>>
>>> Since there was at least some talk of changing that behavior for regular
>>> inheritance so that statement triggers of any affected children are fired
>>> [1], I thought we shouldn't say something general that applies to both
>>> inheritance and partitioning.  But since nothing has happened in that
>>> regard, we might as well.
>>>
>>> How about the attached?
>>
>> Looks better, but I think we should say "statement" instead of
>> "operation" for consistency with the previous paragraph, and it
>> certainly shouldn't be capitalized.
>
> Agreed, done.  Attached updated patch.

+A statement that targets the root table in a inheritance or partitioning
+hierarchy does not cause the statement-level triggers of affected child
+tables to be fired; only the root table's statement-level triggers are
+fired.  However, row-level triggers of any affected child tables will be
+fired.
+
+
+

Why talk specifically about the "root" table?  Wouldn't we describe
the situation more generally if we said [a,the] "parent"?

--
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] Declarative partitioning - another take
On 2017/05/03 2:48, Robert Haas wrote:
> On Tue, May 2, 2017 at 3:30 AM, Amit Langote wrote:
>> You're right.  I agree that whatever text we add here should be pointing
>> out that statement-level triggers of affected child tables are not fired,
>> when root parent is specified in the command.
>>
>> Since there was at least some talk of changing that behavior for regular
>> inheritance so that statement triggers of any affected children are fired
>> [1], I thought we shouldn't say something general that applies to both
>> inheritance and partitioning.  But since nothing has happened in that
>> regard, we might as well.
>>
>> How about the attached?
>
> Looks better, but I think we should say "statement" instead of
> "operation" for consistency with the previous paragraph, and it
> certainly shouldn't be capitalized.

Agreed, done.  Attached updated patch.

Thanks,
Amit

From 1d7e383c6d89ebabacc7aa3f7d9987779daaa4fb Mon Sep 17 00:00:00 2001
From: amit
Date: Mon, 24 Apr 2017 14:55:08 +0900
Subject: [PATCH] Clarify statement trigger behavior with inheritance

---
 doc/src/sgml/trigger.sgml | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/doc/src/sgml/trigger.sgml b/doc/src/sgml/trigger.sgml
index 6f8416dda7..89e8c01a71 100644
--- a/doc/src/sgml/trigger.sgml
+++ b/doc/src/sgml/trigger.sgml
@@ -123,6 +123,14 @@
+A statement that targets the root table in a inheritance or partitioning
+hierarchy does not cause the statement-level triggers of affected child
+tables to be fired; only the root table's statement-level triggers are
+fired.  However, row-level triggers of any affected child tables will be
+fired.
+
+
+
 If an INSERT contains an ON CONFLICT DO UPDATE clause, it is possible
 that the effects of all row-level BEFORE INSERT triggers
--
2.11.0
Re: [HACKERS] Declarative partitioning - another take
On Tue, May 2, 2017 at 3:30 AM, Amit Langote wrote:
> You're right.  I agree that whatever text we add here should be pointing
> out that statement-level triggers of affected child tables are not fired,
> when root parent is specified in the command.
>
> Since there was at least some talk of changing that behavior for regular
> inheritance so that statement triggers of any affected children are fired
> [1], I thought we shouldn't say something general that applies to both
> inheritance and partitioning.  But since nothing has happened in that
> regard, we might as well.
>
> How about the attached?

Looks better, but I think we should say "statement" instead of
"operation" for consistency with the previous paragraph, and it
certainly shouldn't be capitalized.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 2017/05/01 21:30, Robert Haas wrote:
> On Mon, May 1, 2017 at 12:18 AM, Amit Langote wrote:
>> Attached updated patch.
>
> Committed, except for this bit:

Thanks.

> +A statement-level trigger defined on partitioned tables is fired only
> +once for the table itself, not once for every table in the partitioning
> +hierarchy.  However, row-level triggers of any affected leaf partitions
> +will be fired.
>
> The first sentence there has a number of issues.  Grammatically, there
> is an agreement problem: trigger is singular, but partitioned table is
> plural, and one trigger isn't defined across multiple tables.  It
> would have to say something like "A statement-level trigger defined on
> a partitioned table".  But even with that correction, it's not really
> saying what you want it to say.  Nobody would expect that the same
> statement-level trigger would be fired multiple times.  The issue is
> whether statement-level triggers on the partitions themselves will be
> fired, not the statement-level trigger on the partitioned table.
> Also, if this applies to inheritance as well as partitioning, then why
> mention only partitioning?  I think we might want to document
> something more here, but not like this.

You're right.  I agree that whatever text we add here should be pointing
out that statement-level triggers of affected child tables are not fired,
when root parent is specified in the command.

Since there was at least some talk of changing that behavior for regular
inheritance so that statement triggers of any affected children are fired
[1], I thought we shouldn't say something general that applies to both
inheritance and partitioning.  But since nothing has happened in that
regard, we might as well.

How about the attached?

Thanks,
Amit

[1] https://www.postgresql.org/message-id/cd282adde5b70b20c57f53bb9ab75...@biglumber.com

From d2d27ca458fbd341efd5ae231f977385a5e3c951 Mon Sep 17 00:00:00 2001
From: amit
Date: Mon, 24 Apr 2017 14:55:08 +0900
Subject: [PATCH] Clarify statement trigger behavior with inheritance

---
 doc/src/sgml/trigger.sgml | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/doc/src/sgml/trigger.sgml b/doc/src/sgml/trigger.sgml
index 6f8416dda7..ef41085744 100644
--- a/doc/src/sgml/trigger.sgml
+++ b/doc/src/sgml/trigger.sgml
@@ -123,6 +123,14 @@
+An Operation that targets the root table in a inheritance or partitioning
+hierarchy does not cause the statement-level triggers of affected child
+tables to be fired; only the root table's statement-level triggers are
+fired.  However, row-level triggers of any affected child tables will be
+fired.
+
+
+
 If an INSERT contains an ON CONFLICT DO UPDATE clause, it is possible
 that the effects of all row-level BEFORE INSERT triggers
--
2.11.0
Re: [HACKERS] Declarative partitioning - another take
On Mon, May 1, 2017 at 12:18 AM, Amit Langote wrote:
> Attached updated patch.

Committed, except for this bit:

+A statement-level trigger defined on partitioned tables is fired only
+once for the table itself, not once for every table in the partitioning
+hierarchy.  However, row-level triggers of any affected leaf partitions
+will be fired.

The first sentence there has a number of issues.  Grammatically, there
is an agreement problem: trigger is singular, but partitioned table is
plural, and one trigger isn't defined across multiple tables.  It
would have to say something like "A statement-level trigger defined on
a partitioned table".  But even with that correction, it's not really
saying what you want it to say.  Nobody would expect that the same
statement-level trigger would be fired multiple times.  The issue is
whether statement-level triggers on the partitions themselves will be
fired, not the statement-level trigger on the partitioned table.
Also, if this applies to inheritance as well as partitioning, then why
mention only partitioning?  I think we might want to document something
more here, but not like this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
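[Editor's note: the firing semantics being hashed out above can be seen with a small sketch.  The table, partition, and trigger names are invented, and this shows the behavior the thread's fix aims for on v10 (statement-level triggers fire for the table named in the command; statement-level triggers of its children do not):]

```sql
CREATE TABLE p (a int) PARTITION BY LIST (a);
CREATE TABLE p1 PARTITION OF p FOR VALUES IN (1);

CREATE FUNCTION report() RETURNS trigger LANGUAGE plpgsql AS
$$ BEGIN RAISE NOTICE 'fired: %', TG_NAME; RETURN NULL; END; $$;

CREATE TRIGGER p_stmt  AFTER INSERT ON p  FOR EACH STATEMENT EXECUTE PROCEDURE report();
CREATE TRIGGER p1_stmt AFTER INSERT ON p1 FOR EACH STATEMENT EXECUTE PROCEDURE report();
CREATE TRIGGER p1_row  AFTER INSERT ON p1 FOR EACH ROW EXECUTE PROCEDURE report();

INSERT INTO p VALUES (1);
-- fires: p_stmt  (statement-level, on the table named in the command)
-- fires: p1_row  (row-level, on the affected leaf partition)
-- does not fire: p1_stmt (statement-level triggers of children are skipped)
```

Targeting the partition directly (`INSERT INTO p1 ...`) would fire p1_stmt and p1_row, but not p_stmt.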
Re: [HACKERS] Declarative partitioning - another take
On 2017/04/29 6:56, David Fetter wrote:
> On Fri, Apr 28, 2017 at 06:29:48PM +0900, Amit Langote wrote:
>> On 2017/04/28 7:36, David Fetter wrote:
>>> Did I notice correctly that there's no way to handle transition tables
>>> for partitions in AFTER STATEMENT triggers?
>>
>> Did you mean to ask about AFTER STATEMENT triggers defined on
>> "partitioned" tables?  Specifying transition table for them is disallowed
>> at all.
>>
>> ERROR:  "p" is a partitioned table
>> DETAIL:  Triggers on partitioned tables cannot have transition tables.
>
> OK, I suppose.  It wasn't clear from the documentation.
>
>> Triggers created on (leaf) partitions *do* allow specifying transition
>> table.
>
> That includes the upcoming "default" tables, I presume.

If a "default" table is also a "leaf" table, then yes.  A default
table/partition can also be itself a partitioned table, in which case, it
won't allow triggers that require transition tables.  AFAICS, it's the
table's being partitioned that stops it from supporting transition tables,
not whether it is a "default" partition or not.

>> Or are you asking something else altogether?
>
> I was just fuzzy on the interactions among these features.
>
>>> If not, I'm not suggesting that this be added at this late date, but
>>> we might want to document that.
>>
>> I don't see mentioned in the documentation that such triggers cannot be
>> defined on partitioned tables.  Is that what you are saying should be
>> documented?
>
> Yes, but I bias toward documenting a lot, and this restriction could
> go away in some future version, which would make things more confusing
> in the long run.

Yeah, it would be a good idea to document this.

> I'm picturing a conversation in 2020 that goes something like this:
>
> "On 10, you could have AFTER STATEMENT triggers on tables, foreign
> tables, and leaf partition tables which referenced transition tables,
> but not on DEFAULT partitions.  On 11, you could on DEFAULT partition
> tables.  From 12 onward, you can have transition tables on any
> relation."

What we could document now is that partitioned tables don't allow
specifying triggers that reference transition tables.  Although, I am
wondering where this note really belongs - the partitioning chapter, the
triggers chapter or the CREATE TRIGGER reference page?  Maybe, Kevin and
Thomas have something to say about that.

If it turns out that the partitioning chapter is a good place, here is a
documentation patch.

Thanks,
Amit

From 289b4d906abb50451392d0efe13926f710952ca0 Mon Sep 17 00:00:00 2001
From: amit
Date: Mon, 1 May 2017 13:46:58 +0900
Subject: [PATCH] Document transition table trigger limitation of partitioned
 tables

---
 doc/src/sgml/ddl.sgml | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 84c4f20990..5f5a956d41 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3293,7 +3293,9 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
     Row triggers, if necessary, must be defined on individual partitions,
-    not the partitioned table.
+    not the partitioned tables.  Also, triggers that require accessing
+    transition tables are not allowed to be defined on the partitioned
+    tables.
--
2.11.0
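[Editor's note: concretely, the restriction being documented looks like this.  A sketch for v10; the trigger names are invented, and `log_stmt()` stands for any existing trigger function (CREATE TRIGGER checks that it exists).  The error and detail text are quoted from the discussion above.]

```sql
CREATE TABLE p (a int) PARTITION BY LIST (a);

-- Transition tables are rejected on a partitioned table:
CREATE TRIGGER p_after AFTER INSERT ON p
    REFERENCING NEW TABLE AS new_rows
    FOR EACH STATEMENT EXECUTE PROCEDURE log_stmt();
-- ERROR:  "p" is a partitioned table
-- DETAIL:  Triggers on partitioned tables cannot have transition tables.

-- ...but are accepted on a leaf partition:
CREATE TABLE p1 PARTITION OF p FOR VALUES IN (1);
CREATE TRIGGER p1_after AFTER INSERT ON p1
    REFERENCING NEW TABLE AS new_rows
    FOR EACH STATEMENT EXECUTE PROCEDURE log_stmt();
```

As Amit notes above, the same rejection applies to a DEFAULT partition that is itself partitioned: what matters is whether the relation is partitioned, not whether it is the default partition.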
Re: [HACKERS] Declarative partitioning - another take
Thanks for the review.

On 2017/04/29 1:25, Robert Haas wrote:
> On Fri, Apr 28, 2017 at 2:13 AM, Amit Langote wrote:
>>>> By the way, code changes I made in the attached are such that a
>>>> subsequent patch could implement firing statement-level triggers of all
>>>> the tables in a partition hierarchy, which it seems we don't want to
>>>> do.  Should then the code be changed to not create ResultRelInfos of
>>>> all the tables but only the root table (the one mentioned in the
>>>> command)?  You will see that the patch adds fields named
>>>> es_nonleaf_result_relations and es_num_nonleaf_result_relations,
>>>> whereas just es_root_result_relation would perhaps do, for example.
>>> It seems better not to create any ResultRelInfos that we don't
>>> actually need, so +1 for such a revision to the patch.
>> OK, done.  It took a bit more work than I thought.
>
> So, this seems weird, because rootResultRelIndex is initialized even
> when splan->partitioned_rels == NIL, but isn't actually valid in that
> case.  ExecInitModifyTable seems to think it's always valid, though.

OK, rootResultRelIndex is now set to a value >= 0 only when the node
modifies a partitioned table.  And then ExecInitModifyTable() checks if
the index is valid before initializing the root table info.

> I think the way that you've refactored fireBSTriggers and
> fireASTriggers is a bit confusing.  Instead of splitting out a
> separate function, how about just having the existing function begin
> with if (node->rootResultRelInfo) resultRelInfo =
> node->rootResultRelInfo; else resultRelInfo = node->resultRelInfo; ?
> I think the way you've coded it is a holdover from the earlier design
> where you were going to call it multiple times, but now that's not
> needed.

OK, done that way.

> Looks OK, otherwise.

Attached updated patch.

Thanks,
Amit

From 53c964f9f1fd3e27824080b0e9e5b672fa53cdf0 Mon Sep 17 00:00:00 2001
From: amit
Date: Mon, 24 Apr 2017 14:55:08 +0900
Subject: [PATCH] Fire per-statement triggers of partitioned tables

Current implementation completely misses them, because it assumes that
no ResultRelInfos need to be created for partitioned tables, whereas
they *are* needed to fire the per-statement triggers, which we *do*
allow to be defined on the partitioned tables.

Reported by: Rajkumar Raghuwanshi
Patch by: Amit Langote
Report: https://www.postgresql.org/message-id/CAKcux6%3DwYospCRY2J4XEFuVy0L41S%3Dfic7rmkbsU-GXhhSbmBg%40mail.gmail.com
---
 doc/src/sgml/trigger.sgml               | 26 +++
 src/backend/executor/execMain.c         | 55 --
 src/backend/executor/nodeModifyTable.c  | 42 +
 src/backend/nodes/copyfuncs.c           |  2 +
 src/backend/nodes/outfuncs.c            |  3 ++
 src/backend/nodes/readfuncs.c           |  2 +
 src/backend/optimizer/plan/createplan.c |  1 +
 src/backend/optimizer/plan/planner.c    |  3 ++
 src/backend/optimizer/plan/setrefs.c    | 19 ++--
 src/include/nodes/execnodes.h           | 12 +
 src/include/nodes/plannodes.h           | 13 +-
 src/include/nodes/relation.h            |  1 +
 src/test/regress/expected/triggers.out  | 81 +
 src/test/regress/sql/triggers.sql       | 70
 14 files changed, 303 insertions(+), 27 deletions(-)

diff --git a/doc/src/sgml/trigger.sgml b/doc/src/sgml/trigger.sgml
index 2a718d7f47..5771bd5fdc 100644
--- a/doc/src/sgml/trigger.sgml
+++ b/doc/src/sgml/trigger.sgml
@@ -33,7 +33,8 @@
 A trigger is a specification that the database should automatically
 execute a particular function whenever a certain type of operation is
-performed.  Triggers can be attached to tables, views, and foreign tables.
+performed.  Triggers can be attached to tables (partitioned or not),
+views, and foreign tables.
@@ -111,14 +112,21 @@
 Statement-level BEFORE triggers naturally fire before the statement
 starts to do anything, while statement-level AFTER triggers fire at the
 very end of the statement.  These types of
-triggers may be defined on tables or views.  Row-level BEFORE
-triggers fire immediately before a particular row is operated on,
-while row-level AFTER triggers fire at the end of the
-statement (but before any statement-level AFTER triggers).
-These types of triggers may only be defined on tables and foreign tables.
-Row-level INSTEAD OF triggers may only be defined on views,
-and fire immediately as each row in the view is identified as needing to
-be operated on.
+triggers may be defined on tables, views, or foreign tables.  Row-level
+BEFORE triggers fire immediately before a particular row is
+operated on, while row-level AFTER triggers fire at the end of
+the statement (but before any statement-level AFTER triggers).
+These types of triggers may only be defined on non-partitioned tables and
Re: [HACKERS] Declarative partitioning - another take
On Mon, Apr 24, 2017 at 07:43:29PM +0900, Amit Langote wrote:
> On 2017/04/21 17:00, Rajkumar Raghuwanshi wrote:
>> I am able to create statement triggers at root partition, but these
>> triggers, not getting fired on updating partition.
> It would be great if you could check if the patches fix the issues.
>
> Also, adding this to the PostgreSQL 10 open items list.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.
Robert, since you committed the patch believed to have created it, you
own this open item.  If some other commit is more relevant or if this
does not belong as a v10 open item, please let us know.  Otherwise,
please observe the policy on open item ownership[1] and send a status
update within three calendar days of this message.  Include a date for
your subsequent status update.  Testers may discover new open items at
any time, and I want to plan to get them all fixed well in advance of
shipping v10.  Consequently, I will appreciate your efforts toward
speedy resolution.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
Re: [HACKERS] Declarative partitioning - another take
On Fri, Apr 28, 2017 at 06:29:48PM +0900, Amit Langote wrote:
> On 2017/04/28 7:36, David Fetter wrote:
>> On Thu, Apr 27, 2017 at 10:30:54AM +0900, Amit Langote wrote:
>>> On 2017/04/27 1:52, Robert Haas wrote:
>>>> On Tue, Apr 25, 2017 at 10:34 PM, Amit Langote wrote:
>>>>> FWIW, I too prefer the latter, that is, fire only the parent's
>>>>> triggers.  In that case, applying only the patch 0001 will do.
>>>> Do we need to update the documentation?
>>> Yes, I think we should.  How about as in the attached?
>>>
>>> By the way, code changes I made in the attached are such that a
>>> subsequent patch could implement firing statement-level triggers of all
>>> the tables in a partition hierarchy, which it seems we don't want to
>>> do.  Should then the code be changed to not create ResultRelInfos of
>>> all the tables but only the root table (the one mentioned in the
>>> command)?  You will see that the patch adds fields named
>>> es_nonleaf_result_relations and es_num_nonleaf_result_relations,
>>> whereas just es_root_result_relation would perhaps do, for example.
>>
>> Did I notice correctly that there's no way to handle transition tables
>> for partitions in AFTER STATEMENT triggers?
>
> Did you mean to ask about AFTER STATEMENT triggers defined on
> "partitioned" tables?  Specifying transition table for them is disallowed
> at all.
>
> ERROR:  "p" is a partitioned table
> DETAIL:  Triggers on partitioned tables cannot have transition tables.

OK, I suppose.  It wasn't clear from the documentation.

> Triggers created on (leaf) partitions *do* allow specifying transition
> table.

That includes the upcoming "default" tables, I presume.

> Or are you asking something else altogether?

I was just fuzzy on the interactions among these features.

>> If not, I'm not suggesting that this be added at this late date, but
>> we might want to document that.
>
> I don't see mentioned in the documentation that such triggers cannot be
> defined on partitioned tables.  Is that what you are saying should be
> documented?

Yes, but I bias toward documenting a lot, and this restriction could go
away in some future version, which would make things more confusing in
the long run.  I'm picturing a conversation in 2020 that goes something
like this:

"On 10, you could have AFTER STATEMENT triggers on tables, foreign
tables, and leaf partition tables which referenced transition tables,
but not on DEFAULT partitions.  On 11, you could on DEFAULT partition
tables.  From 12 onward, you can have transition tables on any
relation."

Kevin?  Thomas?

Best,
David.
--
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Declarative partitioning - another take
On Fri, Apr 28, 2017 at 2:13 AM, Amit Langotewrote: > It seems to me that there is no difference in behavior between > inheritance-based and declarative partitioning as far as statement-level > triggers are concerned (at least currently). In both the cases, we fire > statement-level triggers only for the table specified in the command. OK. >>> By the way, code changes I made in the attached are such that a subsequent >>> patch could implement firing statement-level triggers of all the tables in >>> a partition hierarchy, which it seems we don't want to do. Should then >>> the code be changed to not create ResultRelInfos of all the tables but >>> only the root table (the one mentioned in the command)? You will see that >>> the patch adds fields named es_nonleaf_result_relations and >>> es_num_nonleaf_result_relations, whereas just es_root_result_relation >>> would perhaps do, for example. >> >> It seems better not to create any ResultRelInfos that we don't >> actually need, so +1 for such a revision to the patch. > > OK, done. It took a bit more work than I thought. So, this seems weird, because rootResultRelIndex is initialized even when splan->partitioned_rels == NIL, but isn't actually valid in that case. ExecInitModifyTable seems to think it's always valid, though. I think the way that you've refactored fireBSTriggers and fireASTriggers is a bit confusing. Instead of splitting out a separate function, how about just having the existing function begin with if (node->rootResultRelInfo) resultRelInfo = node->rootResultRelInfo; else resultRelInfo = node->resultRelInfo; ? I think the way you've coded it is a holdover from the earlier design where you were going to call it multiple times, but now that's not needed. Looks OK, otherwise. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 2017/04/28 7:36, David Fetter wrote:
> On Thu, Apr 27, 2017 at 10:30:54AM +0900, Amit Langote wrote:
>> On 2017/04/27 1:52, Robert Haas wrote:
>>> On Tue, Apr 25, 2017 at 10:34 PM, Amit Langote wrote:
>>>> FWIW, I too prefer the latter, that is, fire only the parent's triggers.
>>>> In that case, applying only the patch 0001 will do.
>>>
>>> Do we need to update the documentation?
>>
>> Yes, I think we should. How about as in the attached?
>>
>> By the way, code changes I made in the attached are such that a subsequent
>> patch could implement firing statement-level triggers of all the tables in
>> a partition hierarchy, which it seems we don't want to do. Should then
>> the code be changed to not create ResultRelInfos of all the tables but
>> only the root table (the one mentioned in the command)? You will see that
>> the patch adds fields named es_nonleaf_result_relations and
>> es_num_nonleaf_result_relations, whereas just es_root_result_relation
>> would perhaps do, for example.
>
> Did I notice correctly that there's no way to handle transition tables
> for partitions in AFTER STATEMENT triggers?

Did you mean to ask about AFTER STATEMENT triggers defined on "partitioned" tables? Specifying a transition table for them is disallowed altogether:

ERROR:  "p" is a partitioned table
DETAIL:  Triggers on partitioned tables cannot have transition tables.

Triggers created on (leaf) partitions *do* allow specifying a transition table. Or are you asking something else altogether?

> If not, I'm not suggesting that this be added at this late date, but
> we might want to document that.

I don't see it mentioned in the documentation that such triggers cannot be defined on partitioned tables. Is that what you are saying should be documented?

Thanks, Amit
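[Editor's sketch, with hypothetical table/function names, of the restriction Amit describes; the error text is the one quoted in his message.]

```sql
-- Transition tables are rejected on partitioned tables,
-- but allowed on leaf partitions (PG10 behavior).
CREATE TABLE p (a int) PARTITION BY LIST (a);
CREATE TABLE p1 PARTITION OF p FOR VALUES IN (1);

CREATE FUNCTION dump_new() RETURNS trigger AS $$
BEGIN
  RAISE NOTICE 'inserted % row(s)', (SELECT count(*) FROM new_rows);
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- On the partitioned table: rejected.
CREATE TRIGGER p_ins AFTER INSERT ON p
  REFERENCING NEW TABLE AS new_rows
  FOR EACH STATEMENT EXECUTE PROCEDURE dump_new();
-- ERROR:  "p" is a partitioned table
-- DETAIL:  Triggers on partitioned tables cannot have transition tables.

-- On the leaf partition: accepted.
CREATE TRIGGER p1_ins AFTER INSERT ON p1
  REFERENCING NEW TABLE AS new_rows
  FOR EACH STATEMENT EXECUTE PROCEDURE dump_new();
```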
Re: [HACKERS] Declarative partitioning - another take
Hi Rajkumar,

On 2017/04/28 17:11, Rajkumar Raghuwanshi wrote:
> On Fri, Apr 28, 2017 at 11:43 AM, Amit Langote wrote:
>> Updated patch attached.
>
> I have applied given patch, could see below behaviour with statement
> trigger.
>
> When trying to delete value within partition range, triggers got fired
> (with zero row affected) as expected, but if trying to delete value which
> is outside of partition range (with zero row affected), no trigger fired.
> Is this expected??

Good catch. I'm afraid though that this is not a defect of this patch, but an unrelated (maybe) bug, which affects not only partitioned tables but inheritance in general. The problem is that the plan created is such that the executor has no opportunity to fire the trigger in question, because the plan contains no information about which table is affected by the statement. You can see that with inheritance. See below:

create table foo ();
create table bar () inherits (foo);
create or replace function trig_notice() returns trigger as $$
begin raise notice 'trigger fired'; return null; end;
$$ language plpgsql;
create trigger foo_del_before before delete on foo
  for each statement execute procedure trig_notice();

explain delete from foo where false;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

-- no trigger fired
delete from foo where false;
DELETE 0

The trigger *is* fired though, if inheritance is not used:

explain delete from only foo where false;
                    QUERY PLAN
--------------------------------------------------
 Delete on foo  (cost=0.00..0.00 rows=0 width=0)
   ->  Result  (cost=0.00..0.00 rows=0 width=6)
         One-Time Filter: false
(3 rows)

delete from only foo where false;
NOTICE:  trigger fired
DELETE 0

I'm not sure how to go about fixing this, if at all. Or to fix it at least for partitioned tables. Would like to hear what others think.

Thanks, Amit
Re: [HACKERS] Declarative partitioning - another take
On Fri, Apr 28, 2017 at 11:43 AM, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote:
> Updated patch attached.

Hi Amit,

I have applied the given patch, and could see the below behaviour with statement triggers.

When trying to delete a value within the partition range, triggers got fired (with zero rows affected) as expected, but when trying to delete a value which is outside of the partition range (also with zero rows affected), no trigger fired. Is this expected?? Below are steps to reproduce.

CREATE TABLE trigger_test_table (a INT, b INT) PARTITION BY RANGE(a);
CREATE TABLE trigger_test_table1 PARTITION OF trigger_test_table FOR VALUES FROM (0) to (6);
INSERT INTO trigger_test_table (a,b) SELECT i,i FROM generate_series(1,3)i;

CREATE TABLE trigger_test_statatics(TG_NAME varchar, TG_TABLE_NAME varchar, TG_LEVEL varchar, TG_WHEN varchar, TG_OP varchar);

CREATE FUNCTION trigger_test_procedure() RETURNS TRIGGER AS $ttp$
BEGIN
  INSERT INTO trigger_test_statatics SELECT TG_NAME,TG_TABLE_NAME,TG_LEVEL,TG_WHEN,TG_OP;
  RETURN OLD;
END;
$ttp$ LANGUAGE plpgsql;

CREATE TRIGGER trigger_test11 AFTER DELETE ON trigger_test_table FOR EACH STATEMENT EXECUTE PROCEDURE trigger_test_procedure();
CREATE TRIGGER trigger_test12 AFTER DELETE ON trigger_test_table1 FOR EACH STATEMENT EXECUTE PROCEDURE trigger_test_procedure();

postgres=# DELETE FROM trigger_test_table WHERE a = 5;
DELETE 0
postgres=# SELECT * FROM trigger_test_statatics;
    tg_name     |   tg_table_name    | tg_level  | tg_when | tg_op
----------------+--------------------+-----------+---------+--------
 trigger_test11 | trigger_test_table | STATEMENT | AFTER   | DELETE
(1 row)

TRUNCATE TABLE trigger_test_statatics;

-- statement trigger NOT fired, when trying to delete data outside partition range.
postgres=# DELETE FROM trigger_test_table WHERE a = 10;
DELETE 0
postgres=# SELECT * FROM trigger_test_statatics;
 tg_name | tg_table_name | tg_level | tg_when | tg_op
---------+---------------+----------+---------+-------
(0 rows)

Thanks & Regards, Rajkumar Raghuwanshi QMG, EnterpriseDB Corporation
Re: [HACKERS] Declarative partitioning - another take
Thanks for taking a look.

On 2017/04/28 5:24, Robert Haas wrote:
> On Wed, Apr 26, 2017 at 9:30 PM, Amit Langote wrote:
>>> Do we need to update the documentation?
>>
>> Yes, I think we should. How about as in the attached?
>
> Looks reasonable, but I was thinking you might also update the section
> which contrasts inheritance-based partitioning with declarative
> partitioning.

It seems to me that there is no difference in behavior between inheritance-based and declarative partitioning as far as statement-level triggers are concerned (at least currently). In both cases, we fire statement-level triggers only for the table specified in the command. Maybe we will fix things someday so that statement-level triggers are fired for all the tables in an inheritance hierarchy when the root parent is updated or deleted, but that's currently not true. We may never implement that behavior for declaratively partitioned tables though, so there will be a difference if and when we implement the former. Am I missing something?

>> By the way, code changes I made in the attached are such that a subsequent
>> patch could implement firing statement-level triggers of all the tables in
>> a partition hierarchy, which it seems we don't want to do. Should then
>> the code be changed to not create ResultRelInfos of all the tables but
>> only the root table (the one mentioned in the command)? You will see that
>> the patch adds fields named es_nonleaf_result_relations and
>> es_num_nonleaf_result_relations, whereas just es_root_result_relation
>> would perhaps do, for example.
>
> It seems better not to create any ResultRelInfos that we don't
> actually need, so +1 for such a revision to the patch.

OK, done. It took a bit more work than I thought. Updated patch attached.
Thanks, Amit >From a8b584f09bc02b6c16c33e816fc12c7e443dd65c Mon Sep 17 00:00:00 2001 From: amit Date: Mon, 24 Apr 2017 14:55:08 +0900 Subject: [PATCH] Fire per-statement triggers of partitioned tables Current implementation completely misses them, because it assumes that no ResultRelInfos need to be created for partitioned tables, whereas they *are* needed to fire the per-statment triggers, which we *do* allow to be defined on the partitioned tables. Reported By: Rajkumar Raghuwanshi Patch by: Amit Langote Report: https://www.postgresql.org/message-id/CAKcux6%3DwYospCRY2J4XEFuVy0L41S%3Dfic7rmkbsU-GXhhSbmBg%40mail.gmail.com --- doc/src/sgml/trigger.sgml | 26 +++ src/backend/executor/execMain.c | 55 -- src/backend/executor/nodeModifyTable.c | 64 ++ src/backend/nodes/copyfuncs.c | 2 + src/backend/nodes/outfuncs.c| 3 ++ src/backend/nodes/readfuncs.c | 2 + src/backend/optimizer/plan/createplan.c | 1 + src/backend/optimizer/plan/planner.c| 3 ++ src/backend/optimizer/plan/setrefs.c| 19 ++-- src/include/nodes/execnodes.h | 12 + src/include/nodes/plannodes.h | 13 +- src/include/nodes/relation.h| 1 + src/test/regress/expected/triggers.out | 81 + src/test/regress/sql/triggers.sql | 70 14 files changed, 323 insertions(+), 29 deletions(-) diff --git a/doc/src/sgml/trigger.sgml b/doc/src/sgml/trigger.sgml index 2a718d7f47..5771bd5fdc 100644 --- a/doc/src/sgml/trigger.sgml +++ b/doc/src/sgml/trigger.sgml @@ -33,7 +33,8 @@ A trigger is a specification that the database should automatically execute a particular function whenever a certain type of operation is -performed. Triggers can be attached to tables, views, and foreign tables. +performed. Triggers can be attached to tables (partitioned or not), +views, and foreign tables. @@ -111,14 +112,21 @@ Statement-level BEFORE triggers naturally fire before the statement starts to do anything, while statement-level AFTER triggers fire at the very end of the statement. These types of -triggers may be defined on tables or views. 
Row-level BEFORE -triggers fire immediately before a particular row is operated on, -while row-level AFTER triggers fire at the end of the -statement (but before any statement-level AFTER triggers). -These types of triggers may only be defined on tables and foreign tables. -Row-level INSTEAD OF triggers may only be defined on views, -and fire immediately as each row in the view is identified as needing to -be operated on. +triggers may be defined on tables, views, or foreign tables. Row-level +BEFORE triggers fire immediately before a particular row is +operated on, while row-level AFTER triggers fire at the end of +the statement (but before any statement-level AFTER triggers). +These types of triggers may only be defined on non-partitioned tables and +foreign tables.
Re: [HACKERS] Declarative partitioning - another take
On Thu, Apr 27, 2017 at 10:30:54AM +0900, Amit Langote wrote: > On 2017/04/27 1:52, Robert Haas wrote: > > On Tue, Apr 25, 2017 at 10:34 PM, Amit Langote > >wrote: > >> FWIW, I too prefer the latter, that is, fire only the parent's triggers. > >> In that case, applying only the patch 0001 will do. > > > > Do we need to update the documentation? > > Yes, I think we should. How about as in the attached? > > By the way, code changes I made in the attached are such that a subsequent > patch could implement firing statement-level triggers of all the tables in > a partition hierarchy, which it seems we don't want to do. Should then > the code be changed to not create ResultRelInfos of all the tables but > only the root table (the one mentioned in the command)? You will see that > the patch adds fields named es_nonleaf_result_relations and > es_num_nonleaf_result_relations, whereas just es_root_result_relation > would perhaps do, for example. Did I notice correctly that there's no way to handle transition tables for partitions in AFTER STATEMENT triggers? If not, I'm not suggesting that this be added at this late date, but we might want to document that. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On Wed, Apr 26, 2017 at 9:30 PM, Amit Langote wrote:
>> Do we need to update the documentation?
>
> Yes, I think we should. How about as in the attached?

Looks reasonable, but I was thinking you might also update the section which contrasts inheritance-based partitioning with declarative partitioning.

> By the way, code changes I made in the attached are such that a subsequent
> patch could implement firing statement-level triggers of all the tables in
> a partition hierarchy, which it seems we don't want to do. Should then
> the code be changed to not create ResultRelInfos of all the tables but
> only the root table (the one mentioned in the command)? You will see that
> the patch adds fields named es_nonleaf_result_relations and
> es_num_nonleaf_result_relations, whereas just es_root_result_relation
> would perhaps do, for example.

It seems better not to create any ResultRelInfos that we don't actually need, so +1 for such a revision to the patch.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 2017/04/27 1:52, Robert Haas wrote: > On Tue, Apr 25, 2017 at 10:34 PM, Amit Langote >wrote: >> FWIW, I too prefer the latter, that is, fire only the parent's triggers. >> In that case, applying only the patch 0001 will do. > > Do we need to update the documentation? Yes, I think we should. How about as in the attached? By the way, code changes I made in the attached are such that a subsequent patch could implement firing statement-level triggers of all the tables in a partition hierarchy, which it seems we don't want to do. Should then the code be changed to not create ResultRelInfos of all the tables but only the root table (the one mentioned in the command)? You will see that the patch adds fields named es_nonleaf_result_relations and es_num_nonleaf_result_relations, whereas just es_root_result_relation would perhaps do, for example. Thanks, Amit >From c8b785a77a2a193906967762066e03e09de80442 Mon Sep 17 00:00:00 2001 From: amit Date: Mon, 24 Apr 2017 14:55:08 +0900 Subject: [PATCH 1/2] Fire per-statement triggers of partitioned tables Current implementation completely misses them, because it assumes that no ResultRelInfos need to be created for partitioned tables, whereas they *are* needed to fire the per-statment triggers, which we *do* allow to be defined on the partitioned tables. 
Reported By: Rajkumar Raghuwanshi Patch by: Amit Langote Report: https://www.postgresql.org/message-id/CAKcux6%3DwYospCRY2J4XEFuVy0L41S%3Dfic7rmkbsU-GXhhSbmBg%40mail.gmail.com --- doc/src/sgml/trigger.sgml | 26 - src/backend/executor/execMain.c | 33 - src/backend/executor/nodeModifyTable.c | 66 - src/backend/nodes/copyfuncs.c | 1 + src/backend/nodes/outfuncs.c| 1 + src/backend/nodes/readfuncs.c | 1 + src/backend/optimizer/plan/createplan.c | 1 + src/backend/optimizer/plan/setrefs.c| 4 ++ src/include/nodes/execnodes.h | 11 ++ src/include/nodes/plannodes.h | 2 + src/test/regress/expected/triggers.out | 54 +++ src/test/regress/sql/triggers.sql | 48 12 files changed, 227 insertions(+), 21 deletions(-) diff --git a/doc/src/sgml/trigger.sgml b/doc/src/sgml/trigger.sgml index 2a718d7f47..5771bd5fdc 100644 --- a/doc/src/sgml/trigger.sgml +++ b/doc/src/sgml/trigger.sgml @@ -33,7 +33,8 @@ A trigger is a specification that the database should automatically execute a particular function whenever a certain type of operation is -performed. Triggers can be attached to tables, views, and foreign tables. +performed. Triggers can be attached to tables (partitioned or not), +views, and foreign tables. @@ -111,14 +112,21 @@ Statement-level BEFORE triggers naturally fire before the statement starts to do anything, while statement-level AFTER triggers fire at the very end of the statement. These types of -triggers may be defined on tables or views. Row-level BEFORE -triggers fire immediately before a particular row is operated on, -while row-level AFTER triggers fire at the end of the -statement (but before any statement-level AFTER triggers). -These types of triggers may only be defined on tables and foreign tables. -Row-level INSTEAD OF triggers may only be defined on views, -and fire immediately as each row in the view is identified as needing to -be operated on. +triggers may be defined on tables, views, or foreign tables. 
Row-level +BEFORE triggers fire immediately before a particular row is +operated on, while row-level AFTER triggers fire at the end of +the statement (but before any statement-level AFTER triggers). +These types of triggers may only be defined on non-partitioned tables and +foreign tables. Row-level INSTEAD OF triggers may only be +defined on views, and fire immediately as each row in the view is +identified as needing to be operated on. + + + +A statement-level trigger defined on partitioned tables is fired only +once for the table itself, not once for every table in the partitioning +hierarchy. However, row-level triggers of any affected leaf partitions +will be fired. diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index 5c12fb457d..586b396b3e 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -861,18 +861,35 @@ InitPlan(QueryDesc *queryDesc, int eflags) /* * In the partitioned result relation case, lock the non-leaf result - * relations too. We don't however need ResultRelInfos for them. + * relations too. We also need ResultRelInfos so that per-statement + * triggers defined on these relations can be fired. */ if (plannedstmt->nonleafResultRelations) { + numResultRelations = +
Re: [HACKERS] Declarative partitioning - another take
On Tue, Apr 25, 2017 at 10:34 PM, Amit Langote wrote:
> FWIW, I too prefer the latter, that is, fire only the parent's triggers.
> In that case, applying only the patch 0001 will do.

Do we need to update the documentation?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 26 April 2017 at 00:28, Robert Haas wrote:
> So what I'd prefer -- on the totally unprincipled basis that it would
> let us improve performance in the future -- if you operate on a
> partition directly, you fire the partition's triggers, but if you
> operate on the parent, only the parent's triggers fire.

I would also opt for this behaviour.

Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] Declarative partitioning - another take
Hi,

Thanks for testing.

On 2017/04/25 19:03, Rajkumar Raghuwanshi wrote:
> Thanks for looking into it. I have applied fixes and checked for triggers.
> I could see difference in behaviour of statement triggers for INSERT and
> UPDATE: for insert only root partition triggers are getting fired, but for
> update root as well as child partition table triggers are both getting
> fired. Is this expected??

Yes, because I didn't implement anything for the insert case yet. I posed a question whether to fire partitions' per-statement triggers when inserting data through the root table. Robert replied [1] that it would be desirable to not fire partitions' per-statement triggers if the root table is mentioned in the query; only fire their per-row triggers if any. It already works that way for inserts, and applying only 0001 will get you the same for update/delete.

Patch 0002 is to enable firing partitions' per-statement triggers even if the root table is mentioned in the query, but it implemented the same only for the update/delete cases. If we decide that that's the right thing to do, then I will implement the same behavior for the insert case too.

Thanks, Amit

[1] https://www.postgresql.org/message-id/CA%2BTgmoatBYy8Hyi3cYR1rFrCkD2NM4ZLZcck4QDGvH%3DHddfDwA%40mail.gmail.com
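[Editor's sketch, with hypothetical names, of the insert behavior Amit describes: inserting through the root fires only the root's statement-level triggers, while the chosen partition's row-level triggers still fire. This matches the output of Rajkumar's experiment downthread.]

```sql
CREATE TABLE root_tbl (a int) PARTITION BY RANGE (a);
CREATE TABLE leaf_tbl PARTITION OF root_tbl FOR VALUES FROM (1) TO (10);

CREATE FUNCTION log_it() RETURNS trigger AS $$
BEGIN
  RAISE NOTICE '% fired on %', TG_NAME, TG_TABLE_NAME;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER root_stmt AFTER INSERT ON root_tbl
  FOR EACH STATEMENT EXECUTE PROCEDURE log_it();
CREATE TRIGGER leaf_stmt AFTER INSERT ON leaf_tbl
  FOR EACH STATEMENT EXECUTE PROCEDURE log_it();
CREATE TRIGGER leaf_row AFTER INSERT ON leaf_tbl
  FOR EACH ROW EXECUTE PROCEDURE log_it();

INSERT INTO root_tbl VALUES (5);
-- fires root_stmt and leaf_row, but not leaf_stmt
```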
Re: [HACKERS] Declarative partitioning - another take
On 2017/04/26 3:58, Robert Haas wrote:
> On Mon, Apr 24, 2017 at 6:43 AM, Amit Langote wrote:
>> The reason it doesn't work is that we do not allocate ResultRelInfos for
>> partitioned tables (not even for the root partitioned table in the
>> update/delete cases), because the current implementation assumes they are
>> not required. That's fine only so long as we consider that no rows are
>> inserted into them, no indexes need to be updated, and that no row-level
>> triggers are to be fired. But it misses the fact that we do allow
>> statement-level triggers on partitioned tables. So, we should fix things
>> such that ResultRelInfos are allocated so that any statement-level
>> triggers are fired. But there are following questions to consider:
>>
>> 1. Should we consider only the root partitioned table or all partitioned
>>    tables in a given hierarchy, including the child partitioned tables?
>>    Note that this is related to a current limitation of updating/deleting
>>    inheritance hierarchies that we do not currently fire per-statement
>>    triggers of the child tables. See the TODO item in the wiki:
>>    https://wiki.postgresql.org/wiki/Todo#Triggers, which says: "When
>>    statement-level triggers are defined on a parent table, have them fire
>>    only on the parent table, and fire child table triggers only where
>>    appropriate"
>>
>> 2. Do we apply the same to inserts on the partitioned tables? Since
>>    insert on a partitioned table implicitly affects its partitions, one
>>    might say that we would need to fire per-statement insert triggers of
>>    all the partitions.
>
> It seems to me that it doesn't make a lot of sense to fire the
> triggers for some tables involved in the hierarchy and not others. I
> suppose the consistent thing to do here is to make sure that we fire
> the statement triggers for all tables in the partitioning hierarchy
> for all operations (insert, update, delete, etc.).
>
> TBH, I don't like that very much. I'd rather fire the triggers only
> for the table actually named in the query and skip all the others,
> mostly because it seems like it would be faster and less likely to
> block future optimizations; eventually, I'd like to consider not even
> locking the children we're not touching, but that's not going to work
> if we have to check them all for triggers. So what I'd prefer -- on
> the totally unprincipled basis that it would let us improve
> performance in the future -- if you operate on a partition directly,
> you fire the partition's triggers, but if you operate on the parent,
> only the parent's triggers fire.

FWIW, I too prefer the latter, that is, fire only the parent's triggers. In that case, applying only the patch 0001 will do.

Thanks, Amit
Re: [HACKERS] Declarative partitioning - another take
On Mon, Apr 24, 2017 at 6:43 AM, Amit Langotewrote: > The reason it doesn't work is that we do not allocate ResultRelInfos for > partitioned tables (not even for the root partitioned table in the > update/delete cases), because the current implementation assumes they are > not required. That's fine only so long as we consider that no rows are > inserted into them, no indexes need to be updated, and that no row-level > triggers are to be fired. But it misses the fact that we do allow > statement-level triggers on partitioned tables. So, we should fix things > such that ResultRelInfos are allocated so that any statement-level > triggers are fired. But there are following questions to consider: > > 1. Should we consider only the root partitioned table or all partitioned >tables in a given hierarchy, including the child partitioned tables? >Note that this is related to a current limitation of updating/deleting >inheritance hierarchies that we do not currently fire per-statement >triggers of the child tables. See the TODO item in wiki: >https://wiki.postgresql.org/wiki/Todo#Triggers, which says: "When >statement-level triggers are defined on a parent table, have them fire >only on the parent table, and fire child table triggers only where >appropriate" > > 2. Do we apply the same to inserts on the partitioned tables? Since >insert on a partitioned table implicitly affects its partitions, one >might say that we would need to fire per-statement insert triggers of >all the partitions. It seems to me that it doesn't make a lot of sense to fire the triggers for some tables involved in the hierarchy and not others. I suppose the consistent thing to do here is to make sure that we fire the statement triggers for all tables in the partitioning hierarchy for all operations (insert, update, delete, etc.). TBH, I don't like that very much. 
I'd rather fire the triggers only for the table actually named in the query and skip all the others, mostly because it seems like it would be faster and less likely to block future optimizations; eventually, I'd like to consider not even locking the children we're not touching, but that's not going to work if we have to check them all for triggers. So what I'd prefer -- on the totally unprincipled basis that it would let us improve performance in the future -- if you operate on a partition directly, you fire the partition's triggers, but if you operate on the parent, only the parent's triggers fire. How heretical is that idea? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
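[Editor's sketch, with hypothetical names, of the semantics Robert is proposing here -- this illustrates the proposal, not what any committed code does.]

```sql
CREATE TABLE parent (a int) PARTITION BY LIST (a);
CREATE TABLE part1 PARTITION OF parent FOR VALUES IN (1);

CREATE FUNCTION note_it() RETURNS trigger AS $$
BEGIN
  RAISE NOTICE '% fired', TG_NAME;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER parent_stmt AFTER DELETE ON parent
  FOR EACH STATEMENT EXECUTE PROCEDURE note_it();
CREATE TRIGGER part1_stmt AFTER DELETE ON part1
  FOR EACH STATEMENT EXECUTE PROCEDURE note_it();

DELETE FROM parent;  -- would fire only parent_stmt (table named in the query)
DELETE FROM part1;   -- would fire only part1_stmt
```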
Re: [HACKERS] Declarative partitioning - another take
On Mon, Apr 24, 2017 at 4:13 PM, Amit Langotewrote: > Hi Rajkumar, > > It would be great if you could check if the patches fix the issues. > Hi Amit, Thanks for looking into it. I have applied fixes and checked for triggers. I could see difference in behaviour of statement triggers for INSERT and UPDATE, for insert only root partition triggers are getting fired but for update root as well as child partition table triggers both getting fired. is this expected?? Below are steps to reproduce. CREATE TABLE pt (a INT, b INT) PARTITION BY RANGE(a); CREATE TABLE pt1 PARTITION OF pt FOR VALUES FROM (1) to (6); CREATE TABLE pt2 PARTITION OF pt FOR VALUES FROM (6) to (11); INSERT INTO pt (a,b) SELECT i,i FROM generate_series(1,7)i; CREATE TABLE pt_trigger(TG_NAME varchar,TG_TABLE_NAME varchar,TG_LEVEL varchar,TG_WHEN varchar,a_old int,a_new int,b_old int,b_new int); CREATE FUNCTION process_pt_trigger() RETURNS TRIGGER AS $ttp$ BEGIN IF (TG_OP = 'INSERT') THEN IF (TG_LEVEL = 'STATEMENT') THEN INSERT INTO pt_trigger SELECT TG_NAME,TG_TABLE_NAME,TG_LEVEL,TG_WHEN,NULL,NULL,NULL,NULL; END IF; IF (TG_LEVEL = 'ROW') THEN INSERT INTO pt_trigger SELECT TG_NAME,TG_TABLE_NAME,TG_LEVEL,TG_WHEN,NULL,NEW.a,NULL,NEW.b; END IF; RETURN NEW; END IF; IF (TG_OP = 'UPDATE') THEN IF (TG_LEVEL = 'STATEMENT') THEN INSERT INTO pt_trigger SELECT TG_NAME,TG_TABLE_NAME,TG_LEVEL,TG_WHEN,NULL,NULL,NULL,NULL; END IF; IF (TG_LEVEL = 'ROW') THEN INSERT INTO pt_trigger SELECT TG_NAME,TG_TABLE_NAME,TG_LEVEL,TG_WHEN,OLD.a,NEW.a,OLD.b,NEW.b; END IF; RETURN NEW; END IF; END; $ttp$ LANGUAGE plpgsql; CREATE TRIGGER trigger_test11 AFTER INSERT OR UPDATE ON pt FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER trigger_test12 AFTER INSERT OR UPDATE ON pt1 FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER trigger_test13 AFTER INSERT OR UPDATE ON pt2 FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER trigger_test21 BEFORE INSERT OR UPDATE ON pt FOR 
EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER trigger_test22 BEFORE INSERT OR UPDATE ON pt1 FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER trigger_test23 BEFORE INSERT OR UPDATE ON pt2 FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER trigger_test32 AFTER INSERT OR UPDATE ON pt1 FOR EACH ROW EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER trigger_test33 AFTER INSERT OR UPDATE ON pt2 FOR EACH ROW EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER trigger_test42 BEFORE INSERT OR UPDATE ON pt1 FOR EACH ROW EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER trigger_test43 BEFORE INSERT OR UPDATE ON pt2 FOR EACH ROW EXECUTE PROCEDURE process_pt_trigger(); postgres=# INSERT INTO pt (a,b) VALUES (8,8); INSERT 0 1 postgres=# SELECT * FROM pt_trigger; tg_name | tg_table_name | tg_level | tg_when | a_old | a_new | b_old | b_new +---+---+-+---+---+---+--- trigger_test21 | pt| STATEMENT | BEFORE | | | | trigger_test43 | pt2 | ROW | BEFORE | | 8 | | 8 trigger_test33 | pt2 | ROW | AFTER | | 8 | | 8 trigger_test11 | pt| STATEMENT | AFTER | | | | (4 rows) postgres=# TRUNCATE TABLE pt_trigger; TRUNCATE TABLE postgres=# UPDATE pt SET a = 2 WHERE a = 1; UPDATE 1 postgres=# SELECT * FROM pt_trigger; tg_name | tg_table_name | tg_level | tg_when | a_old | a_new | b_old | b_new +---+---+-+---+---+---+--- trigger_test21 | pt| STATEMENT | BEFORE | | | | trigger_test22 | pt1 | STATEMENT | BEFORE | | | | trigger_test42 | pt1 | ROW | BEFORE | 1 | 2 | 1 | 1 trigger_test32 | pt1 | ROW | AFTER | 1 | 2 | 1 | 1 trigger_test11 | pt| STATEMENT | AFTER | | | | trigger_test12 | pt1 | STATEMENT | AFTER | | | | (6 rows) Thanks & Regards, Rajkumar Raghuwanshi QMG, EnterpriseDB
Re: [HACKERS] Declarative partitioning - another take
Hi Rajkumar,

Thanks for testing and the report.

On 2017/04/21 17:00, Rajkumar Raghuwanshi wrote:
> Hi,
>
> I have observed below with the statement triggers.
>
> I am able to create statement triggers at the root partition, but these
> triggers are not getting fired on updating a partition.
>
> CREATE TABLE pt (a INT, b INT) PARTITION BY RANGE(a);
> CREATE TABLE pt1 PARTITION OF pt FOR VALUES FROM (1) to (7);
> CREATE TABLE pt2 PARTITION OF pt FOR VALUES FROM (7) to (11);
> INSERT INTO pt (a,b) SELECT i,i FROM generate_series(1,10)i;
> CREATE TABLE pt_trigger(TG_NAME varchar,TG_TABLE_NAME varchar,TG_LEVEL varchar,TG_WHEN varchar);
> CREATE OR REPLACE FUNCTION process_pt_trigger() RETURNS TRIGGER AS $pttg$
>     BEGIN
>         IF (TG_OP = 'UPDATE') THEN
>             INSERT INTO pt_trigger SELECT TG_NAME,TG_TABLE_NAME,TG_LEVEL,TG_WHEN;
>             RETURN NEW;
>         END IF;
>         RETURN NULL;
>     END;
> $pttg$ LANGUAGE plpgsql;
> CREATE TRIGGER pt_trigger_after_p0 AFTER UPDATE ON pt FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger();
> CREATE TRIGGER pt_trigger_before_p0 BEFORE UPDATE ON pt FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger();
>
> postgres=# UPDATE pt SET a = 2 WHERE a = 1;
> UPDATE 1
> postgres=# SELECT * FROM pt_trigger ORDER BY 1;
>  tg_name | tg_table_name | tg_level | tg_when
> ---------+---------------+----------+---------
> (0 rows)
>
> no statement-level trigger fired in this case; is this expected behaviour?

I think we discussed this during the development and decided to allow
per-statement triggers on partitioned tables [1], but it seems it doesn't
quite work for update/delete, as your example shows.  You can however see
that per-statement triggers of the root partitioned table *are* fired in
the case of insert.

The reason it doesn't work is that we do not allocate ResultRelInfos for
partitioned tables (not even for the root partitioned table in the
update/delete cases), because the current implementation assumes they are
not required.  That's fine only so long as we consider that no rows are
inserted into them, no indexes need to be updated, and that no row-level
triggers are to be fired.  But it misses the fact that we do allow
statement-level triggers on partitioned tables.  So, we should fix things
such that ResultRelInfos are allocated and any statement-level triggers
are fired.  But there are the following questions to consider:

1. Should we consider only the root partitioned table or all partitioned
   tables in a given hierarchy, including the child partitioned tables?
   Note that this is related to a current limitation of updating/deleting
   inheritance hierarchies: we do not currently fire per-statement
   triggers of the child tables.  See the TODO item in the wiki:
   https://wiki.postgresql.org/wiki/Todo#Triggers, which says: "When
   statement-level triggers are defined on a parent table, have them fire
   only on the parent table, and fire child table triggers only where
   appropriate"

2. Do we apply the same to inserts on the partitioned tables?  Since
   insert on a partitioned table implicitly affects its partitions, one
   might say that we would need to fire per-statement insert triggers of
   all the partitions.

Meanwhile, attached is a set of patches to fix this.  Descriptions follow:

0001: fire per-statement triggers of the partitioned table mentioned in a
      given update or delete statement

0002: fire per-statement triggers of the child tables during update/delete
      of inheritance hierarchies (including the partitioned table case)

Depending on the answer to 2 above, we can arrange for the per-statement
triggers of all the partitions to be fired upon insert into the
partitioned table.

> but if i am creating triggers on a leaf partition, the trigger is getting fired.
>
> CREATE TRIGGER pt_trigger_after_p1 AFTER UPDATE ON pt1 FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger();
> CREATE TRIGGER pt_trigger_before_p1 BEFORE UPDATE ON pt1 FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger();
>
> postgres=# UPDATE pt SET a = 5 WHERE a = 4;
> UPDATE 1
> postgres=# SELECT * FROM pt_trigger ORDER BY 1;
>        tg_name        | tg_table_name | tg_level  | tg_when
> ----------------------+---------------+-----------+---------
>  pt_trigger_after_p1  | pt1           | STATEMENT | AFTER
>  pt_trigger_before_p1 | pt1           | STATEMENT | BEFORE
> (2 rows)

Actually, this works only by accident (with the current implementation,
the *first* partition's triggers will get fired).  If you create another
partition, its per-statement triggers won't get fired.  The patches
described above fix that.

It would be great if you could check if the patches fix the issues.  Also,
adding this to the PostgreSQL 10 open items list.

Thanks,
Amit

[1] https://www.postgresql.org/message-id/8e05817e-14c8-13e4-502b-e440adb24258%40lab.ntt.co.jp

>From 9c7543615ccb05c004591a9176f818413df7ea9d Mon Sep 17 00:00:00 2001
From: amit
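[Editor's note] To make the intended trigger-firing behavior concrete, here is a toy model in Python. It is purely illustrative (the function and its arguments are hypothetical, not PostgreSQL code): patch 0001 makes the root table's per-statement UPDATE/DELETE triggers fire, and patch 0002 additionally fires those of the child tables involved.

```python
# Toy model of which per-statement triggers fire for an UPDATE on a
# partitioned hierarchy under the two proposed patches.  All names are
# illustrative; this is not backend code.

def stmt_triggers_fired(root, children_touched, with_patch_0002):
    """Return the table names whose per-statement UPDATE triggers fire."""
    fired = [root]                      # 0001: the root's triggers fire
    if with_patch_0002:
        fired.extend(children_touched)  # 0002: child tables' triggers too
    return fired

# Before the patches, an UPDATE on "pt" fired no statement triggers at all.
print(stmt_triggers_fired("pt", ["pt1"], with_patch_0002=False))  # ['pt']
print(stmt_triggers_fired("pt", ["pt1"], with_patch_0002=True))   # ['pt', 'pt1']
```

This also shows why question 1 above matters: with only 0001, `pt1`'s statement triggers would still be skipped.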
Re: [HACKERS] Declarative partitioning - another take
Hi,

I have observed below with the statement triggers.

I am able to create statement triggers at the root partition, but these
triggers are not getting fired on updating a partition.

CREATE TABLE pt (a INT, b INT) PARTITION BY RANGE(a);
CREATE TABLE pt1 PARTITION OF pt FOR VALUES FROM (1) to (7);
CREATE TABLE pt2 PARTITION OF pt FOR VALUES FROM (7) to (11);
INSERT INTO pt (a,b) SELECT i,i FROM generate_series(1,10)i;
CREATE TABLE pt_trigger(TG_NAME varchar,TG_TABLE_NAME varchar,TG_LEVEL varchar,TG_WHEN varchar);
CREATE OR REPLACE FUNCTION process_pt_trigger() RETURNS TRIGGER AS $pttg$
    BEGIN
        IF (TG_OP = 'UPDATE') THEN
            INSERT INTO pt_trigger SELECT TG_NAME,TG_TABLE_NAME,TG_LEVEL,TG_WHEN;
            RETURN NEW;
        END IF;
        RETURN NULL;
    END;
$pttg$ LANGUAGE plpgsql;
CREATE TRIGGER pt_trigger_after_p0 AFTER UPDATE ON pt FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger();
CREATE TRIGGER pt_trigger_before_p0 BEFORE UPDATE ON pt FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger();

postgres=# UPDATE pt SET a = 2 WHERE a = 1;
UPDATE 1
postgres=# SELECT * FROM pt_trigger ORDER BY 1;
 tg_name | tg_table_name | tg_level | tg_when
---------+---------------+----------+---------
(0 rows)

no statement-level trigger fired in this case; is this expected behaviour?

but if i am creating triggers on a leaf partition, the trigger is getting fired.

CREATE TRIGGER pt_trigger_after_p1 AFTER UPDATE ON pt1 FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger();
CREATE TRIGGER pt_trigger_before_p1 BEFORE UPDATE ON pt1 FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger();

postgres=# UPDATE pt SET a = 5 WHERE a = 4;
UPDATE 1
postgres=# SELECT * FROM pt_trigger ORDER BY 1;
       tg_name        | tg_table_name | tg_level  | tg_when
----------------------+---------------+-----------+---------
 pt_trigger_after_p1  | pt1           | STATEMENT | AFTER
 pt_trigger_before_p1 | pt1           | STATEMENT | BEFORE
(2 rows)

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
Re: [HACKERS] Declarative partitioning - another take
On 07.04.2017 13:05, Etsuro Fujita wrote:
> On 2016/12/14 16:20, Etsuro Fujita wrote:
>> On 2016/12/09 19:46, Maksim Milyutin wrote:
>>> I would like to work on two tasks:
>>> - insert (and eventually update) tuple routing for foreign partition.
>>> - the ability to create an index on the parent and have all of the
>>>   children inherit it;
>>> The first one has been implemented in pg_pathman somehow, but the code
>>> relies on dirty hacks, so the FDW API has to be improved.  As for the
>>> extended index support, it doesn't look like a super-hard task.
>> That would be great!  I'd like to help review the first one.
> There seems to be no work on the first one, so I'd like to work on that.

Yes, you can start to work on this, I'll join later as a reviewer.

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On 2016/12/14 16:20, Etsuro Fujita wrote:
> On 2016/12/09 19:46, Maksim Milyutin wrote:
>> I would like to work on two tasks:
>> - insert (and eventually update) tuple routing for foreign partition.
>> - the ability to create an index on the parent and have all of the
>>   children inherit it;
>> The first one has been implemented in pg_pathman somehow, but the code
>> relies on dirty hacks, so the FDW API has to be improved.  As for the
>> extended index support, it doesn't look like a super-hard task.
> That would be great!  I'd like to help review the first one.

There seems to be no work on the first one, so I'd like to work on that.

Best regards,
Etsuro Fujita
Re: [HACKERS] Declarative partitioning - another take
Hi Amit,

On Thu, 23 Feb 2017 16:29:32 +0900
Amit Langote wrote:
> Thanks for taking care of that.
>
> + *		PartitionRoot			relation descriptor for parent relation
>
> Maybe: relation descriptor for root parent relation

This seems better.  Patch is updated.

Thanks,
--
Yugo Nagata

diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 6332ea0..af3367a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -326,6 +326,7 @@ typedef struct JunkFilter
  *		onConflictSetWhere		list of ON CONFLICT DO UPDATE exprs (qual)
  *		PartitionCheck			partition check expression
  *		PartitionCheckExpr		partition check expression state
+ *		PartitionRoot			relation descriptor for root parent relation
  *
  */
 typedef struct ResultRelInfo
Re: [HACKERS] Declarative partitioning - another take
Nagata-san,

On 2017/02/23 16:17, Yugo Nagata wrote:
> Hi,
>
> I found that a comment for PartitionRoot in ResultRelInfo is missing.
> Although this is trivial, since all other members have comments, I
> think it is needed.  Attached is the patch to fix it.

Thanks for taking care of that.

+ *		PartitionRoot			relation descriptor for parent relation

Maybe: relation descriptor for root parent relation

Thanks,
Amit
Re: [HACKERS] Declarative partitioning - another take
Hi,

I found that a comment for PartitionRoot in ResultRelInfo is missing.
Although this is trivial, since all other members have comments, I
think it is needed.  Attached is the patch to fix it.

Regards,
Yugo Nagata

On Tue, 27 Dec 2016 17:59:05 +0900
Amit Langote wrote:
> On 2016/12/26 19:46, Amit Langote wrote:
>> (Perhaps, the following should be its own new thread)
>>
>> I noticed that ExecProcessReturning() doesn't work properly after tuple
>> routing (example shows how returning tableoid currently fails but I
>> mention some other issues below):
>>
>>   create table p (a int, b int) partition by range (a);
>>   create table p1 partition of p for values from (1) to (10);
>>   insert into p values (1) returning tableoid::regclass, *;
>>    tableoid | a | b
>>   ----------+---+---
>>    -        | 1 |
>>   (1 row)
>>
>>   INSERT 0 1
>>
>> I tried to fix that in 0007 to get:
>>
>>   insert into p values (1) returning tableoid::regclass, *;
>>    tableoid | a | b
>>   ----------+---+---
>>    p        | 1 |
>>   (1 row)
>>
>>   INSERT 0 1
>>
>> But I think it *may* be wrong to return the root table OID for tuples
>> inserted into leaf partitions, because with select we get partition OIDs:
>>
>>   select tableoid::regclass, * from p;
>>    tableoid | a | b
>>   ----------+---+---
>>    p1       | 1 |
>>   (1 row)
>>
>> If so, that means we should build the projection info (corresponding to
>> the returning list) for each target partition somehow.  ISTM, that's
>> going to have to be done within the planner by appropriate inheritance
>> translation of the original returning targetlist.
>
> Turns out getting the 2nd result may not require planner tweaks after
> all.  Unless I'm missing something, translation of varattnos of the
> RETURNING target list can be done as late as ExecInitModifyTable() for
> the insert case, unlike update/delete (which do require planner's
> attention).
>
> I updated the patch 0007 to implement the same, including the test.
> While doing that, I realized map_partition_varattnos introduced in 0003
> is rather restrictive in its applicability, because it assumes varno = 1
> for the expressions it accepts as input for the mapping.  Mapping the
> returning (target) list required modifying map_partition_varattnos to
> accept target_varno as an additional argument.  That way, we can map
> arbitrary expressions from the parent attribute numbers to partition
> attribute numbers, for expressions not limited to partition constraints.
>
> Patches 0001 to 0006 unchanged.
>
> Thanks,
> Amit

--
Yugo Nagata

diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 6332ea0..845bdf2 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -326,6 +326,7 @@ typedef struct JunkFilter
  *		onConflictSetWhere		list of ON CONFLICT DO UPDATE exprs (qual)
  *		PartitionCheck			partition check expression
  *		PartitionCheckExpr		partition check expression state
+ *		PartitionRoot			relation descriptor for parent relation
  *
  */
 typedef struct ResultRelInfo
Re: [HACKERS] Declarative partitioning - another take
On Fri, Feb 10, 2017 at 12:54 AM, Amit Langote wrote:
> On 2017/02/09 15:22, amul sul wrote:
>> About 0001-Check-partition-strategy-in-ATExecDropNotNull.patch,
>> following test is already covered in alter_table.sql @ Line # 1918,
>> instead of this kindly add test for list_partition:
>>
>> 77 +-- cannot drop NOT NULL constraint of a range partition key column
>> 78 +ALTER TABLE range_parted ALTER a DROP NOT NULL;
>> 79 +
>
> Thanks for the review!  You're right.  Updated patch attached.

Committed, thanks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 2017/02/09 15:22, amul sul wrote:
> About 0001-Check-partition-strategy-in-ATExecDropNotNull.patch,
> following test is already covered in alter_table.sql @ Line # 1918,
> instead of this kindly add test for list_partition:
>
> 77 +-- cannot drop NOT NULL constraint of a range partition key column
> 78 +ALTER TABLE range_parted ALTER a DROP NOT NULL;
> 79 +

Thanks for the review!  You're right.  Updated patch attached.

Thanks,
Amit

>From a4335048e92462fb55fa6db0d830c7c066cd7011 Mon Sep 17 00:00:00 2001
From: amit
Date: Fri, 10 Feb 2017 14:52:18 +0900
Subject: [PATCH] Check partition strategy in ATExecDropNotNull

We should prevent dropping the NOT NULL constraint on a column only if
the column is in the *range* partition key (as the error message also
says).  Add a regression test while at it.

Reported by: Amul Sul
Patch by: Amit Langote
Report: https://postgr.es/m/CAAJ_b95g5AgkpJKbLajAt8ovKub874-B9X08PiOqHvVfMO2mLw%40mail.gmail.com
---
 src/backend/commands/tablecmds.c          | 22 +++++++++++++---------
 src/test/regress/expected/alter_table.out |  4 ++++
 src/test/regress/sql/alter_table.sql      |  5 +++++
 3 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 37a4c4a3d6..f33aa70da6 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5593,18 +5593,22 @@ ATExecDropNotNull(Relation rel, const char *colName, LOCKMODE lockmode)
 	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 	{
 		PartitionKey key = RelationGetPartitionKey(rel);
-		int			partnatts = get_partition_natts(key),
-					i;

-		for (i = 0; i < partnatts; i++)
+		if (get_partition_strategy(key) == PARTITION_STRATEGY_RANGE)
 		{
-			AttrNumber	partattnum = get_partition_col_attnum(key, i);
+			int			partnatts = get_partition_natts(key),
+						i;

-			if (partattnum == attnum)
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
-						 errmsg("column \"%s\" is in range partition key",
-								colName)));
+			for (i = 0; i < partnatts; i++)
+			{
+				AttrNumber	partattnum = get_partition_col_attnum(key, i);
+
+				if (partattnum == attnum)
+					ereport(ERROR,
+							(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+							 errmsg("column \"%s\" is in range partition key",
+									colName)));
+			}
 		}
 	}
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index d8e7b61294..b0e80a7788 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -3029,6 +3029,10 @@ ERROR:  cannot alter type of column referenced in partition key expression
 -- cannot drop NOT NULL on columns in the range partition key
 ALTER TABLE partitioned ALTER COLUMN a DROP NOT NULL;
 ERROR:  column "a" is in range partition key
+-- it's fine however to drop one on the list partition key column
+CREATE TABLE list_partitioned (a int not null) partition by list (a);
+ALTER TABLE list_partitioned ALTER a DROP NOT NULL;
+DROP TABLE list_partitioned;
 -- partitioned table cannot participate in regular inheritance
 CREATE TABLE foo (
   a int,
diff --git a/src/test/regress/sql/alter_table.sql b/src/test/regress/sql/alter_table.sql
index 1f551ec53c..7513769359 100644
--- a/src/test/regress/sql/alter_table.sql
+++ b/src/test/regress/sql/alter_table.sql
@@ -1917,6 +1917,11 @@ ALTER TABLE partitioned ALTER COLUMN b TYPE char(5);
 -- cannot drop NOT NULL on columns in the range partition key
 ALTER TABLE partitioned ALTER COLUMN a DROP NOT NULL;

+-- it's fine however to drop one on the list partition key column
+CREATE TABLE list_partitioned (a int not null) partition by list (a);
+ALTER TABLE list_partitioned ALTER a DROP NOT NULL;
+DROP TABLE list_partitioned;
+
 -- partitioned table cannot participate in regular inheritance
 CREATE TABLE foo (
   a int,
--
2.11.0
Re: [HACKERS] Declarative partitioning - another take
About 0001-Check-partition-strategy-in-ATExecDropNotNull.patch: the
following test is already covered in alter_table.sql @ Line # 1918;
instead of this, kindly add a test for list_partition:

77 +-- cannot drop NOT NULL constraint of a range partition key column
78 +ALTER TABLE range_parted ALTER a DROP NOT NULL;
79 +

Regards,
Amul
Re: [HACKERS] Declarative partitioning - another take
On 2017/02/08 21:20, amul sul wrote:
> Regarding following code in ATExecDropNotNull function, I don't see
> any special check for RANGE partitioned, is it intended to have same
> restriction for LIST partitioned too, I guess not?
>
> 	/*
> 	 * If the table is a range partitioned table, check that the column is not
> 	 * in the partition key.
> 	 */
> 	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> 	{
> 		PartitionKey key = RelationGetPartitionKey(rel);
> 		int			partnatts = get_partition_natts(key),
> 					i;
>
> 		for (i = 0; i < partnatts; i++)
> 		{
> 			AttrNumber	partattnum = get_partition_col_attnum(key, i);
>
> 			if (partattnum == attnum)
> 				ereport(ERROR,
> 						(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> 						 errmsg("column \"%s\" is in range partition key",
> 								colName)));
> 		}
> 	}

Good catch!  Seems like an oversight; attached fixes it.

Thanks,
Amit

>From b84ac63b6b4acd19a09e8507cf199db127c06d71 Mon Sep 17 00:00:00 2001
From: amit
Date: Thu, 9 Feb 2017 10:31:58 +0900
Subject: [PATCH] Check partition strategy in ATExecDropNotNull

We should prevent dropping the NOT NULL constraint on a column only if
the column is in the *range* partition key (as the error message also
says).  Add a regression test while at it.

Reported by: Amul Sul
Patch by: Amit Langote
Report: https://postgr.es/m/CAAJ_b95g5AgkpJKbLajAt8ovKub874-B9X08PiOqHvVfMO2mLw%40mail.gmail.com
---
 src/backend/commands/tablecmds.c          | 22 +++++++++++++---------
 src/test/regress/expected/alter_table.out |  3 +++
 src/test/regress/sql/alter_table.sql      |  3 +++
 3 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 37a4c4a3d6..f33aa70da6 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5593,18 +5593,22 @@ ATExecDropNotNull(Relation rel, const char *colName, LOCKMODE lockmode)
 	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 	{
 		PartitionKey key = RelationGetPartitionKey(rel);
-		int			partnatts = get_partition_natts(key),
-					i;

-		for (i = 0; i < partnatts; i++)
+		if (get_partition_strategy(key) == PARTITION_STRATEGY_RANGE)
 		{
-			AttrNumber	partattnum = get_partition_col_attnum(key, i);
+			int			partnatts = get_partition_natts(key),
+						i;

-			if (partattnum == attnum)
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
-						 errmsg("column \"%s\" is in range partition key",
-								colName)));
+			for (i = 0; i < partnatts; i++)
+			{
+				AttrNumber	partattnum = get_partition_col_attnum(key, i);
+
+				if (partattnum == attnum)
+					ereport(ERROR,
+							(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+							 errmsg("column \"%s\" is in range partition key",
+									colName)));
+			}
 		}
 	}
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index d8e7b61294..e6d45fbf9c 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -3330,6 +3330,9 @@ ALTER TABLE list_parted2 DROP COLUMN b;
 ERROR:  cannot drop column named in partition key
 ALTER TABLE list_parted2 ALTER COLUMN b TYPE text;
 ERROR:  cannot alter type of column named in partition key
+-- cannot drop NOT NULL constraint of a range partition key column
+ALTER TABLE range_parted ALTER a DROP NOT NULL;
+ERROR:  column "a" is in range partition key
 -- cleanup: avoid using CASCADE
 DROP TABLE list_parted, part_1;
 DROP TABLE list_parted2, part_2, part_5, part_5_a;
diff --git a/src/test/regress/sql/alter_table.sql b/src/test/regress/sql/alter_table.sql
index 1f551ec53c..12edb8b3ba 100644
--- a/src/test/regress/sql/alter_table.sql
+++ b/src/test/regress/sql/alter_table.sql
@@ -2189,6 +2189,9 @@ ALTER TABLE part_2 INHERIT inh_test;
 ALTER TABLE list_parted2 DROP COLUMN b;
 ALTER TABLE list_parted2 ALTER COLUMN b TYPE text;

+-- cannot drop NOT NULL constraint of a range partition key column
+ALTER TABLE range_parted ALTER a DROP NOT NULL;
+
 -- cleanup: avoid using CASCADE
 DROP TABLE list_parted, part_1;
 DROP TABLE list_parted2, part_2, part_5, part_5_a;
--
2.11.0
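[Editor's note] The gist of the fix above can be sketched outside of C. This is an illustrative Python model with hypothetical names (not the actual backend API): after the patch, dropping NOT NULL on a partition key column is rejected only when the partitioning strategy is RANGE, matching what the error message claims.

```python
# Illustrative model of the fixed ATExecDropNotNull check: only a
# RANGE-partitioned table forbids dropping NOT NULL on a key column.
# Function and constant names are hypothetical.

RANGE, LIST = "range", "list"

def check_drop_not_null(strategy, key_columns, column):
    """Raise if dropping NOT NULL on `column` must be rejected."""
    if strategy == RANGE and column in key_columns:
        raise ValueError('column "%s" is in range partition key' % column)

check_drop_not_null(LIST, ["a"], "a")       # allowed after the fix
try:
    check_drop_not_null(RANGE, ["a"], "a")  # still rejected
except ValueError as e:
    print(e)  # column "a" is in range partition key
```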
Re: [HACKERS] Declarative partitioning - another take
Hi Amit,

Regarding the following code in the ATExecDropNotNull function, I don't
see any special check for RANGE partitioning; is it intended to have the
same restriction for LIST partitioning too?  I guess not.

	/*
	 * If the table is a range partitioned table, check that the column is not
	 * in the partition key.
	 */
	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
	{
		PartitionKey key = RelationGetPartitionKey(rel);
		int			partnatts = get_partition_natts(key),
					i;

		for (i = 0; i < partnatts; i++)
		{
			AttrNumber	partattnum = get_partition_col_attnum(key, i);

			if (partattnum == attnum)
				ereport(ERROR,
						(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
						 errmsg("column \"%s\" is in range partition key",
								colName)));
		}
	}

Regards,
Amul
Re: [HACKERS] Declarative partitioning - another take
On Mon, Jan 30, 2017 at 4:42 PM, Peter Eisentraut wrote:
> On 1/25/17 12:54 AM, Ashutosh Bapat wrote:
>> The documentation available at
>> https://www.postgresql.org/docs/devel/static/sql-createtable.html
>> does not make it clear that the lower bound of a range partition is
>> always inclusive and the higher one is exclusive.  I think a note in
>> section "PARTITION OF parent_table FOR VALUES partition_bound_spec"
>> would be helpful.
>
> Hmm.  I see the practical use of that, but I think this is going to be a
> source of endless confusion.  Can we make that a bit clearer in the
> syntax, for example by using additional keywords (INCLUSIVE/EXCLUSIVE)?

I am not in favor of adding mandatory noise words to the syntax; it's
fairly verbose already.  I think it would be reasonable to eventually
consider supporting optional keywords to allow multiple behaviors, but I
don't think that the usefulness of that has been so firmly established
that we should do it right this minute.  I think there are a heck of a
lot of other things about this partitioning implementation that are more
urgently in need of improvement.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/31 6:42, Peter Eisentraut wrote:
> On 1/25/17 12:54 AM, Ashutosh Bapat wrote:
>> The documentation available at
>> https://www.postgresql.org/docs/devel/static/sql-createtable.html
>> does not make it clear that the lower bound of a range partition is
>> always inclusive and the higher one is exclusive.  I think a note in
>> section "PARTITION OF parent_table FOR VALUES partition_bound_spec"
>> would be helpful.
>
> Hmm.  I see the practical use of that, but I think this is going to be a
> source of endless confusion.  Can we make that a bit clearer in the
> syntax, for example by using additional keywords (INCLUSIVE/EXCLUSIVE)?

The decision not to make that configurable with INCLUSIVE/EXCLUSIVE
syntax was deliberate [1].  To summarize: we can start with a default
behavior catering to most practical cases (that is, inclusive lower and
exclusive upper bounds) and document it so (not done yet; I will post a
doc patch for that today).  If it turns out that there is some demand for
making it configurable, we can later add the code to handle that
internally, plus the syntax.  But *starting* with that syntax means we
would have to carry, potentially needlessly, the code to handle seldom
used cases that could not be made as efficient as the current scheme of
all lower bounds being inclusive and all upper bounds exclusive.

Thanks,
Amit

[1] https://www.postgresql.org/message-id/CA%2BTgmoZou4ApEvC_nfhOxsi5G4SoD_evwNaiYn60ZcJ4XB_-QQ%40mail.gmail.com
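[Editor's note] The inclusive-lower/exclusive-upper convention being discussed can be illustrated with a small routing sketch. This is plain Python, purely illustrative; the partitions mirror the pt1/pt2 example from earlier in the thread (pt1 for values from 1 to 7, pt2 from 7 to 11), where each partition accepts lower <= key < upper.

```python
from bisect import bisect_right

# Range partitions as (lower, upper, name), sorted by lower bound.
# Each partition accepts keys with lower <= key < upper.
partitions = [(1, 7, "pt1"), (7, 11, "pt2")]

def route(key):
    """Return the name of the partition that accepts `key`, or None."""
    lowers = [p[0] for p in partitions]
    i = bisect_right(lowers, key) - 1   # rightmost partition with lower <= key
    if i >= 0 and key < partitions[i][1]:
        return partitions[i][2]
    return None                         # no partition: the insert would fail

print(route(1))   # pt1   (lower bound is inclusive)
print(route(7))   # pt2   (pt1's upper bound is exclusive)
print(route(11))  # None  (outside every partition)
```

Note how the boundary value 7 lands in pt2, never in pt1; with all lower bounds inclusive and upper bounds exclusive, no value can ever match two partitions, which is the efficiency argument made above.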
Re: [HACKERS] Declarative partitioning - another take
On 1/25/17 12:54 AM, Ashutosh Bapat wrote:
> The documentation available at
> https://www.postgresql.org/docs/devel/static/sql-createtable.html
> does not make it clear that the lower bound of a range partition is
> always inclusive and the higher one is exclusive.  I think a note in
> section "PARTITION OF parent_table FOR VALUES partition_bound_spec"
> would be helpful.

Hmm.  I see the practical use of that, but I think this is going to be a
source of endless confusion.  Can we make that a bit clearer in the
syntax, for example by using additional keywords (INCLUSIVE/EXCLUSIVE)?

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Declarative partitioning - another take
Hi Ashutosh,

On 2017/01/25 14:54, Ashutosh Bapat wrote:
> The documentation available at
> https://www.postgresql.org/docs/devel/static/sql-createtable.html
> does not make it clear that the lower bound of a range partition is
> always inclusive and the higher one is exclusive.  I think a note in
> section "PARTITION OF parent_table FOR VALUES partition_bound_spec"
> would be helpful.

Yes, I'm working on a documentation patch for that and a few other
things, such as revising the Partitioning chapter.

Thanks,
Amit
Re: [HACKERS] Declarative partitioning - another take
The documentation available at
https://www.postgresql.org/docs/devel/static/sql-createtable.html
does not make it clear that the lower bound of a range partition is
always inclusive and the higher one is exclusive.  I think a note in
section "PARTITION OF parent_table FOR VALUES partition_bound_spec"
would be helpful.

On Wed, Jan 25, 2017 at 7:11 AM, Amit Langote wrote:
> On 2017/01/25 5:55, Robert Haas wrote:
>> On Thu, Jan 19, 2017 at 9:58 PM, Amit Langote wrote:
>>> [ new patches ]
>>
>> Committed 0001 and 0002.  See my earlier email for comments on 0003.
>
> It seems patches for all the issues mentioned in this thread so far,
> except 0003 (I just sent an updated version in another email), have been
> committed.  Thanks a lot for your time.  I will create new threads for
> any more patches from here on.
>
> Regards,
> Amit

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/25 5:55, Robert Haas wrote:
> On Thu, Jan 19, 2017 at 9:58 PM, Amit Langote wrote:
>> [ new patches ]
>
> Committed 0001 and 0002.  See my earlier email for comments on 0003.

It seems patches for all the issues mentioned in this thread so far,
except 0003 (I just sent an updated version in another email), have been
committed.  Thanks a lot for your time.  I will create new threads for
any more patches from here on.

Regards,
Amit
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/25 2:56, Robert Haas wrote:
> On Thu, Jan 19, 2017 at 9:58 PM, Amit Langote wrote:
>>> But I wonder why we don't instead just change this function to
>>> consider tdhasoid rather than tdtypeid.  I mean, if the only point of
>>> comparing the type OIDs is to find out whether the table-has-OIDs
>>> setting matches, we could instead test that directly and avoid needing
>>> to pass an extra argument.  I wonder if there's some other reason this
>>> code is there which is not documented in the comment...
>>
>> With the following patch, regression tests run fine:
>>
>>      if (indesc->natts == outdesc->natts &&
>> -        indesc->tdtypeid == outdesc->tdtypeid)
>> +        indesc->tdhasoid != outdesc->tdhasoid)
>>      {
>>
>> If checking tdtypeid (instead of tdhasoid directly) has some other
>> consideration, I would have seen at least some tests broken by this
>> change.  So, if we are to go with this, I too prefer it over my previous
>> proposal to add an argument to convert_tuples_by_name().  Attached 0003
>> implements the above approach.
>
> I think this is not quite right.  First, the patch compares the
> tdhasoid status with != rather than ==, which would have the effect of
> saying that we can skip conversion if the has-OID statuses do NOT
> match.  That can't be right.

You're right.

> Second, I believe that the comments
> imply that conversion should be done if *either* tuple has OIDs.  I
> believe that's because whoever wrote this comment thought that we
> needed to replace the OID if the tuple already had one, which is what
> do_convert_tuple would do.  I'm not sure whether that's really
> necessary, but we're less likely to break anything if we preserve the
> existing behavior, and I don't think we lose much from doing so
> because few user tables will have OIDs.  So I would change this test
> to if (indesc->natts == outdesc->natts && !indesc->tdhasoid &&
> !outdesc->tdhasoid), and I'd revise the one in
> convert_tuples_by_position() to match.  Then I think it's much clearer
> that we're just optimizing what's there already, not changing the
> behavior.

Agreed.  Updated patch attached.

Thanks,
Amit

>From c1fa4b9f04f328b8df54ef487ee9d46f5978e0de Mon Sep 17 00:00:00 2001
From: amit
Date: Mon, 26 Dec 2016 17:44:14 +0900
Subject: [PATCH] Avoid tuple conversion in common partitioning cases

Currently, the tuple conversion is performed after a tuple is routed,
even if the attributes of a target leaf partition map one-to-one with
those of the root table, which is wasteful.  Avoid that by making
convert_tuples_by_name() return a NULL map for such cases.

Reported by: n/a
Patch by: Amit Langote
Reports: n/a
---
 src/backend/access/common/tupconvert.c | 18 ++++++++++--------
 src/backend/catalog/partition.c        |  3 +--
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/common/tupconvert.c b/src/backend/access/common/tupconvert.c
index b17ceafa6e..a4012525d8 100644
--- a/src/backend/access/common/tupconvert.c
+++ b/src/backend/access/common/tupconvert.c
@@ -138,12 +138,13 @@ convert_tuples_by_position(TupleDesc indesc,
 						nincols, noutcols)));

 	/*
-	 * Check to see if the map is one-to-one and the tuple types are the same.
-	 * (We check the latter because if they're not, we want to do conversion
-	 * to inject the right OID into the tuple datum.)
+	 * Check to see if the map is one-to-one, in which case we need not do
+	 * the tuple conversion.  That's not enough though if either source or
+	 * destination (tuples) contains OIDs; we'd need conversion in that case
+	 * to inject the right OID into the tuple datum.
 	 */
 	if (indesc->natts == outdesc->natts &&
-		indesc->tdtypeid == outdesc->tdtypeid)
+		!indesc->tdhasoid && !outdesc->tdhasoid)
 	{
 		for (i = 0; i < n; i++)
 		{
@@ -214,12 +215,13 @@ convert_tuples_by_name(TupleDesc indesc,
 	attrMap = convert_tuples_by_name_map(indesc, outdesc, msg);

 	/*
-	 * Check to see if the map is one-to-one and the tuple types are the same.
-	 * (We check the latter because if they're not, we want to do conversion
-	 * to inject the right OID into the tuple datum.)
+	 * Check to see if the map is one-to-one, in which case we need not do
+	 * the tuple conversion.  That's not enough though if either source or
+	 * destination (tuples) contains OIDs; we'd need conversion in that case
+	 * to inject the right OID into the tuple datum.
 	 */
 	if (indesc->natts == outdesc->natts &&
-		indesc->tdtypeid == outdesc->tdtypeid)
+		!indesc->tdhasoid && !outdesc->tdhasoid)
 	{
 		same = true;
 		for (i = 0; i < n; i++)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 7be60529c5..4bcef58763 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1700,12 +1700,11 @@ get_partition_for_tuple(PartitionDispatch *pd,
 		return -1;
 	}

-	if (myslot != NULL)
+	if (myslot != NULL && map != NULL)
 	{
Re: [HACKERS] Declarative partitioning - another take
Hi Keith,

On 2017/01/20 12:40, Keith Fiske wrote:
> So testing things out in pg_partman for native sub-partitioning and ran
> into what is a bug for me that I know I have to fix, but I'm curious if
> this can be prevented in the first place within the native partitioning
> code itself.  The below shows a sub-partitioning set where the
> sub-partition has a constraint range that is outside of the range of its
> parent.  If the columns were different I could see where this would be
> allowed, but the same column is used throughout the levels of
> sub-partitioning.  Understandable if that may be too complex to check
> for, but figured I'd bring it up as something I accidentally ran into in
> case you see an easy way to prevent it.

This was discussed.  See Robert's response (2nd part of):
https://www.postgresql.org/message-id/CA%2BTgmoaQABrsLQK4ms_4NiyavyJGS-b6ZFkZBBNC%2B-P5DjJNFA%40mail.gmail.com

In short, if partitions are defined across different levels such that
some of the data the user intended to insert into the table (the part of
the sub-partition's range that doesn't overlap with its parent's) can
never be inserted, that's an operator error.  It's like adding
contradictory check constraints to the table:

create table foo (a int check (a > 0 and a < 0));
insert into foo values (1);
ERROR:  new row for relation "foo" violates check constraint "foo_a_check"
DETAIL:  Failing row contains (1).

One (perhaps the only) thing that could be done is to warn users against
this kind of mistake in the documentation.  Trying to do anything else in
the core partitioning code would make it too complicated for not much
benefit (also see Robert's last line, :)).

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On Thu, Jan 19, 2017 at 9:58 PM, Amit Langote wrote:
> [ new patches ]

Committed 0001 and 0002.  See my earlier email for comments on 0003.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On Thu, Jan 19, 2017 at 9:58 PM, Amit Langote wrote:
>> But I wonder why we don't instead just change this function to
>> consider tdhasoid rather than tdtypeid.  I mean, if the only point of
>> comparing the type OIDs is to find out whether the table-has-OIDs
>> setting matches, we could instead test that directly and avoid needing
>> to pass an extra argument.  I wonder if there's some other reason this
>> code is there which is not documented in the comment...
>
> With the following patch, regression tests run fine:
>
>      if (indesc->natts == outdesc->natts &&
> -        indesc->tdtypeid == outdesc->tdtypeid)
> +        indesc->tdhasoid != outdesc->tdhasoid)
>      {
>
> If checking tdtypeid (instead of tdhasoid directly) has some other
> consideration, I'd have seen at least some tests broken by this
> change.  So, if we are to go with this, I too prefer it over my previous
> proposal to add an argument to convert_tuples_by_name().  Attached 0003
> implements the above approach.

I think this is not quite right.  First, the patch compares the
tdhasoid status with != rather than ==, which would have the effect of
saying that we can skip conversion if the has-OID statuses do NOT
match.  That can't be right.  Second, I believe that the comments
imply that conversion should be done if *either* tuple has OIDs.  I
believe that's because whoever wrote this comment thought that we
needed to replace the OID if the tuple already had one, which is what
do_convert_tuple would do.  I'm not sure whether that's really
necessary, but we're less likely to break anything if we preserve the
existing behavior, and I don't think we lose much from doing so
because few user tables will have OIDs.  So I would change this test
to if (indesc->natts == outdesc->natts && !indesc->tdhasoid &&
!outdesc->tdhasoid), and I'd revise the one in
convert_tuples_by_position() to match.  Then I think it's much clearer
that we're just optimizing what's there already, not changing the
behavior.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/21 6:29, Robert Haas wrote:
> On Fri, Jan 20, 2017 at 1:15 AM, Andres Freund wrote:
>> On 2017-01-19 14:18:23 -0500, Robert Haas wrote:
>>> Committed.
>>
>> One of the patches around this topic committed recently seems to cause
>> valgrind failures like
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-01-19%2008%3A40%3A02
>> :
>
> Tom Lane independently reported this same issue.  I believe I've fixed
> it.  Sorry for not noticing and crediting this report also.

Thanks Robert for looking at this and committing the fix.

Regards,
Amit
Re: [HACKERS] Declarative partitioning - another take
On Fri, Jan 20, 2017 at 1:15 AM, Andres Freund wrote:
> On 2017-01-19 14:18:23 -0500, Robert Haas wrote:
>> Committed.
>
> One of the patches around this topic committed recently seems to cause
> valgrind failures like
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-01-19%2008%3A40%3A02
> :

Tom Lane independently reported this same issue.  I believe I've fixed
it.  Sorry for not noticing and crediting this report also.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
Hi Andres,

On 2017/01/20 15:15, Andres Freund wrote:
> On 2017-01-19 14:18:23 -0500, Robert Haas wrote:
>> Committed.
>
> One of the patches around this topic committed recently seems to cause
> valgrind failures like
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-01-19%2008%3A40%3A02
> :
> ==24969== Conditional jump or move depends on uninitialised value(s)
> ==24969==    at 0x4C38DA: btint4cmp (nbtcompare.c:97)
> ==24969==    by 0x83860C: FunctionCall2Coll (fmgr.c:1318)
> ==24969==    by 0x536643: partition_bounds_equal (partition.c:627)
> ==24969==    by 0x820864: equalPartitionDescs (relcache.c:1203)
> ==24969==    by 0x82423A: RelationClearRelation (relcache.c:2553)
> ==24969==    by 0x8248BA: RelationFlushRelation (relcache.c:2662)
> ==24969==    by 0x824983: RelationCacheInvalidateEntry (relcache.c:2714)
> ==24969==    by 0x81D9D6: LocalExecuteInvalidationMessage (inval.c:568)
> ==24969==    by 0x81CB0D: ProcessInvalidationMessages (inval.c:444)
> ==24969==    by 0x81D3CB: CommandEndInvalidationMessages (inval.c:1056)
> ==24969==    by 0x4F6735: AtCCI_LocalCache (xact.c:1374)
> ==24969==    by 0x4F8249: CommandCounterIncrement (xact.c:957)
> ==24969== Uninitialised value was created by a heap allocation
> ==24969==    at 0x85AA83: palloc (mcxt.c:914)
> ==24969==    by 0x53648E: RelationBuildPartitionDesc (partition.c:528)
> ==24969==    by 0x823F93: RelationBuildDesc (relcache.c:1348)
> ==24969==    by 0x8241DB: RelationClearRelation (relcache.c:2524)
> ==24969==    by 0x8248BA: RelationFlushRelation (relcache.c:2662)
> ==24969==    by 0x824983: RelationCacheInvalidateEntry (relcache.c:2714)
> ==24969==    by 0x81D9D6: LocalExecuteInvalidationMessage (inval.c:568)
> ==24969==    by 0x81CB0D: ProcessInvalidationMessages (inval.c:444)
> ==24969==    by 0x81D3CB: CommandEndInvalidationMessages (inval.c:1056)
> ==24969==    by 0x4F6735: AtCCI_LocalCache (xact.c:1374)
> ==24969==    by 0x4F8249: CommandCounterIncrement (xact.c:957)
> ==24969==    by 0x82538B: RelationSetNewRelfilenode (relcache.c:3490)
> ==24969==
> ==24969== VALGRINDERROR-END

Thanks for the report.  This being my first time reading a valgrind
report on the buildfarm, is it correct to assume that the command
immediately preceding VALGRINDERROR-BEGIN is what triggered the failure?

...
LOG:  statement: truncate list_parted;
==24969== VALGRINDERROR-BEGIN
==24969== Conditional jump or move depends on uninitialised value(s)
==24969==    at 0x4C38DA: btint4cmp (nbtcompare.c:97)

So in this case: truncate list_parted?

Thanks,
Amit
Re: [HACKERS] Declarative partitioning - another take
On 2017-01-19 14:18:23 -0500, Robert Haas wrote:
> Committed.

One of the patches around this topic committed recently seems to cause
valgrind failures like
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-01-19%2008%3A40%3A02
:
==24969== Conditional jump or move depends on uninitialised value(s)
==24969==    at 0x4C38DA: btint4cmp (nbtcompare.c:97)
==24969==    by 0x83860C: FunctionCall2Coll (fmgr.c:1318)
==24969==    by 0x536643: partition_bounds_equal (partition.c:627)
==24969==    by 0x820864: equalPartitionDescs (relcache.c:1203)
==24969==    by 0x82423A: RelationClearRelation (relcache.c:2553)
==24969==    by 0x8248BA: RelationFlushRelation (relcache.c:2662)
==24969==    by 0x824983: RelationCacheInvalidateEntry (relcache.c:2714)
==24969==    by 0x81D9D6: LocalExecuteInvalidationMessage (inval.c:568)
==24969==    by 0x81CB0D: ProcessInvalidationMessages (inval.c:444)
==24969==    by 0x81D3CB: CommandEndInvalidationMessages (inval.c:1056)
==24969==    by 0x4F6735: AtCCI_LocalCache (xact.c:1374)
==24969==    by 0x4F8249: CommandCounterIncrement (xact.c:957)
==24969== Uninitialised value was created by a heap allocation
==24969==    at 0x85AA83: palloc (mcxt.c:914)
==24969==    by 0x53648E: RelationBuildPartitionDesc (partition.c:528)
==24969==    by 0x823F93: RelationBuildDesc (relcache.c:1348)
==24969==    by 0x8241DB: RelationClearRelation (relcache.c:2524)
==24969==    by 0x8248BA: RelationFlushRelation (relcache.c:2662)
==24969==    by 0x824983: RelationCacheInvalidateEntry (relcache.c:2714)
==24969==    by 0x81D9D6: LocalExecuteInvalidationMessage (inval.c:568)
==24969==    by 0x81CB0D: ProcessInvalidationMessages (inval.c:444)
==24969==    by 0x81D3CB: CommandEndInvalidationMessages (inval.c:1056)
==24969==    by 0x4F6735: AtCCI_LocalCache (xact.c:1374)
==24969==    by 0x4F8249: CommandCounterIncrement (xact.c:957)
==24969==    by 0x82538B: RelationSetNewRelfilenode (relcache.c:3490)
==24969==
==24969== VALGRINDERROR-END

Regards,

Andres
Re: [HACKERS] Declarative partitioning - another take
So testing things out in pg_partman for native sub-partitioning and ran
into what is a bug for me that I know I have to fix, but I'm curious if
this can be prevented in the first place within the native partitioning
code itself.  The below shows a sub-partitioning set where the
sub-partition has a constraint range that is outside of the range of its
parent.  If the columns were different I could see where this would be
allowed, but the same column is used throughout the levels of
sub-partitioning.  Understandable if that may be too complex to check
for, but figured I'd bring it up as something I accidentally ran into in
case you see an easy way to prevent it.

This was encountered as of commit ba61a04bc7fefeee03416d9911eb825c4897c223.

keith@keith=# \d+ partman_test.time_taptest_table
                                 Table "partman_test.time_taptest_table"
 Column |           Type           | Collation | Nullable | Default | Storage  | Stats target | Description
--------+--------------------------+-----------+----------+---------+----------+--------------+-------------
 col1   | integer                  |           |          |         | plain    |              |
 col2   | text                     |           |          |         | extended |              |
 col3   | timestamp with time zone |           | not null | now()   | plain    |              |
Partition key: RANGE (col3)
Partitions: partman_test.time_taptest_table_p2015 FOR VALUES FROM ('2015-01-01 00:00:00-05') TO ('2016-01-01 00:00:00-05'),
            partman_test.time_taptest_table_p2016 FOR VALUES FROM ('2016-01-01 00:00:00-05') TO ('2017-01-01 00:00:00-05'),
            partman_test.time_taptest_table_p2017 FOR VALUES FROM ('2017-01-01 00:00:00-05') TO ('2018-01-01 00:00:00-05'),
            partman_test.time_taptest_table_p2018 FOR VALUES FROM ('2018-01-01 00:00:00-05') TO ('2019-01-01 00:00:00-05'),
            partman_test.time_taptest_table_p2019 FOR VALUES FROM ('2019-01-01 00:00:00-05') TO ('2020-01-01 00:00:00-05')

keith@keith=# \d+ partman_test.time_taptest_table_p2015
                              Table "partman_test.time_taptest_table_p2015"
 Column |           Type           | Collation | Nullable | Default | Storage  | Stats target | Description
--------+--------------------------+-----------+----------+---------+----------+--------------+-------------
 col1   | integer                  |           |          |         | plain    |              |
 col2   | text                     |           |          |         | extended |              |
 col3   | timestamp with time zone |           | not null | now()   | plain    |              |
Partition of: partman_test.time_taptest_table FOR VALUES FROM ('2015-01-01 00:00:00-05') TO ('2016-01-01 00:00:00-05')
Partition key: RANGE (col3)
Partitions: partman_test.time_taptest_table_p2015_p2016_11 FOR VALUES FROM ('2016-11-01 00:00:00-04') TO ('2016-12-01 00:00:00-05'),
            partman_test.time_taptest_table_p2015_p2016_12 FOR VALUES FROM ('2016-12-01 00:00:00-05') TO ('2017-01-01 00:00:00-05'),
            partman_test.time_taptest_table_p2015_p2017_01 FOR VALUES FROM ('2017-01-01 00:00:00-05') TO ('2017-02-01 00:00:00-05'),
            partman_test.time_taptest_table_p2015_p2017_02 FOR VALUES FROM ('2017-02-01 00:00:00-05') TO ('2017-03-01 00:00:00-05'),
            partman_test.time_taptest_table_p2015_p2017_03 FOR VALUES FROM ('2017-03-01 00:00:00-05') TO ('2017-04-01 00:00:00-04')

keith@keith=# \d+ partman_test.time_taptest_table_p2015_p2017_03
                          Table "partman_test.time_taptest_table_p2015_p2017_03"
 Column |           Type           | Collation | Nullable | Default | Storage  | Stats target | Description
--------+--------------------------+-----------+----------+---------+----------+--------------+-------------
 col1   | integer                  |           |          |         | plain    |              |
 col2   | text                     |           |          |         | extended |              |
 col3   | timestamp with time zone |           | not null | now()   | plain    |              |
Partition of: partman_test.time_taptest_table_p2015 FOR VALUES FROM ('2017-03-01 00:00:00-05') TO ('2017-04-01 00:00:00-04')

--
Keith Fiske
Database Administrator
OmniTI Computer Consulting, Inc.
http://www.keithf4.com
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/20 4:18, Robert Haas wrote:
> On Thu, Jan 19, 2017 at 12:15 AM, Amit Langote wrote:
>> 0002-Set-ecxt_scantuple-correctly-for-tuple-routing.patch
>>
>> In 2ac3ef7a01df859c62d0a02333b646d65eaec5ff, we changed things so that
>> it's possible for a different TupleTableSlot to be used for partitioned
>> tables at successively lower levels.  If we do end up changing the slot
>> from the original, we must update ecxt_scantuple to point to the new one
>> for the partition key of the tuple to be computed correctly.
>>
>> Last posted here:
>> https://www.postgresql.org/message-id/99fbfad6-b89b-9427-a6ca-197aad98c48e%40lab.ntt.co.jp
>
> Why does FormPartitionKeyDatum need this?  Could that be documented in
> a comment in here someplace, perhaps the header comment to
> FormPartitionKeyDatum?

FormPartitionKeyDatum() computes partition key expressions (if any), and
the expression evaluation machinery expects ecxt_scantuple to point to
the tuple to extract attribute values from.  FormPartitionKeyDatum()
already has a tiny comment, which it seems is the only thing we could
say about this here:

 * the ecxt_scantuple slot of estate's per-tuple expr context must point to
 * the heap tuple passed in.

In get_partition_for_tuple() which calls FormPartitionKeyDatum(), the
patch adds this comment (changed a little from the last version):

+        /*
+         * Extract partition key from tuple.  Expression evaluation machinery
+         * that FormPartitionKeyDatum() invokes expects ecxt_scantuple to
+         * point to the correct tuple slot.  The slot might have changed from
+         * what was used for the parent table if the table of the current
+         * partitioning level has different tuple descriptor from the parent.
+         * So update ecxt_scantuple accordingly.
+         */
+        ecxt->ecxt_scantuple = slot;
         FormPartitionKeyDatum(parent, slot, estate, values, isnull);

It says why we need to change which slot ecxt_scantuple points to.
>> 0004-Fix-some-issues-with-views-and-partitioned-tables.patch >> >> Automatically updatable views failed to handle partitioned tables. >> Once that's fixed, WITH CHECK OPTIONS wouldn't work correctly without >> the WCO expressions having been suitably converted for each partition >> (think applying map_partition_varattnos to Vars in the WCO expressions >> just like with partition constraint expressions). > > The changes to execMain.c contain a hunk which has essentially the > same code twice. That looks like a mistake. Also, the patch doesn't > compile because convert_tuples_by_name() takes 3 arguments, not 4. Actually, I realized that and sent the updated version [1] of this patch that fixed this issue. In the updated version, I removed that code block (the 2 copies of it), because we are still discussing what to do about showing tuples in constraint violation (in this case, WITH CHECK OPTION violation) messages. Anyway, attached here again. >> 0006-Avoid-tuple-coversion-in-common-partitioning-cases.patch >> >> Currently, the tuple conversion is performed after a tuple is routed, >> even if the attributes of a target leaf partition map one-to-one with >> those of the root table, which is wasteful. Avoid that by making >> convert_tuples_by_name() return a NULL map for such cases. > > +Assert(!consider_typeid && indesc->tdhasoid == outdesc->tdhasoid); > > I think you mean Assert(consider_typeid || indesc->tdhasoid == > outdesc->tdhasoid); Ah, you're right. > But I wonder why we don't instead just change this function to > consider tdhasoid rather than tdtypeid. I mean, if the only point of > comparing the type OIDs is to find out whether the table-has-OIDs > setting matches, we could instead test that directly and avoid needing > to pass an extra argument. I wonder if there's some other reason this > code is there which is not documented in the comment... 
With the following patch, regression tests run fine:

     if (indesc->natts == outdesc->natts &&
-        indesc->tdtypeid == outdesc->tdtypeid)
+        indesc->tdhasoid != outdesc->tdhasoid)
     {

If checking tdtypeid (instead of tdhasoid directly) has some other
consideration, I'd have seen at least some tests broken by this change.
So, if we are to go with this, I too prefer it over my previous proposal
to add an argument to convert_tuples_by_name().  Attached 0003
implements the above approach.

> Phew.  Thanks for all the patches, sorry I'm having trouble keeping up.

Thanks a lot for taking time to look through them and committing.

Thanks,
Amit

[1] https://www.postgresql.org/message-id/92fe2a71-5eb7-ee8d-53ef-cfd5a65dfc3d%40lab.ntt.co.jp

>From c7088290221a9fe0818139145b7bdf6731bc419a Mon Sep 17 00:00:00 2001
From: amit
Date: Wed, 28 Dec 2016 10:10:26 +0900
Subject: [PATCH 1/3] Set ecxt_scantuple correctly for tuple-routing

In 2ac3ef7a01df859c62d0a02333b646d65eaec5ff, we changed things so that
it's possible for a different TupleTableSlot to be used for partitioned
tables at successively lower levels.  If we do
Re: [HACKERS] Declarative partitioning - another take
On Thu, Jan 19, 2017 at 12:15 AM, Amit Langote wrote:
> 0001-Fix-a-bug-of-insertion-into-an-internal-partition.patch
>
> Since implicit partition constraints are not inherited, an internal
> partition's constraint was not being enforced when targeted directly.
> So, include such constraint when setting up leaf partition result
> relations for tuple-routing.

Committed.

> 0002-Set-ecxt_scantuple-correctly-for-tuple-routing.patch
>
> In 2ac3ef7a01df859c62d0a02333b646d65eaec5ff, we changed things so that
> it's possible for a different TupleTableSlot to be used for partitioned
> tables at successively lower levels.  If we do end up changing the slot
> from the original, we must update ecxt_scantuple to point to the new one
> for partition key of the tuple to be computed correctly.
>
> Last posted here:
> https://www.postgresql.org/message-id/99fbfad6-b89b-9427-a6ca-197aad98c48e%40lab.ntt.co.jp

Why does FormPartitionKeyDatum need this?  Could that be documented in
a comment in here someplace, perhaps the header comment to
FormPartitionKeyDatum?

> 0003-Fix-RETURNING-to-work-correctly-after-tuple-routing.patch
>
> In ExecInsert(), do not switch back to the root partitioned table
> ResultRelInfo until after we finish ExecProcessReturning(), so that
> RETURNING projection is done using the partition's descriptor.  For
> the projection to work correctly, we must initialize the same for
> each leaf partition during ModifyTableState initialization.

Committed.

> 0004-Fix-some-issues-with-views-and-partitioned-tables.patch
>
> Automatically updatable views failed to handle partitioned tables.
> Once that's fixed, WITH CHECK OPTIONS wouldn't work correctly without
> the WCO expressions having been suitably converted for each partition
> (think applying map_partition_varattnos to Vars in the WCO expressions
> just like with partition constraint expressions).

The changes to execMain.c contain a hunk which has essentially the
same code twice.  That looks like a mistake.
Also, the patch doesn't compile because convert_tuples_by_name() takes 3
arguments, not 4.

> 0005-Fix-some-wrong-thinking-in-check_new_partition_bound.patch
>
> Because a given range bound in the PartitionBoundInfo.datums array
> is sometimes a range lower bound and upper bound at other times, we
> must be careful when assuming which, especially when interpreting
> the result of partition_bound_bsearch which returns the index of the
> greatest bound that is less than or equal to probe.  Due to an error
> in thinking about the same, the relevant code in
> check_new_partition_bound() caused invalid partition (index = -1)
> to be chosen as the partition being overlapped.
>
> Last posted here:
> https://www.postgresql.org/message-id/603acb8b-5dec-31e8-29b0-609a68aac50f%40lab.ntt.co.jp

             }
+            /*
+             * If equal has been set to true or if there is no "gap"
+             * between the bound at off1 and that at off1 + 1, the new
+             * partition will overlap some partition.  In the former
+             * case, the new lower bound is found to be equal to the
+             * bound at off1, which could only ever be true if the
+             * latter is the lower bound of some partition.  It's
+             * clear in such a case that the new partition overlaps
+             * that partition, whose index we get using its upper
+             * bound (that is, using the bound at off1 + 1).
+             */
             else

Stylistically, we usually avoid this, or at least I do.  The comment
should go inside the "else" block.  But it looks OK apart from that, so
committed with a little rephrasing and reformatting of the comment.

> 0006-Avoid-tuple-coversion-in-common-partitioning-cases.patch
>
> Currently, the tuple conversion is performed after a tuple is routed,
> even if the attributes of a target leaf partition map one-to-one with
> those of the root table, which is wasteful.  Avoid that by making
> convert_tuples_by_name() return a NULL map for such cases.
+Assert(!consider_typeid && indesc->tdhasoid == outdesc->tdhasoid); I think you mean Assert(consider_typeid || indesc->tdhasoid == outdesc->tdhasoid); But I wonder why we don't instead just change this function to consider tdhasoid rather than tdtypeid. I mean, if the only point of comparing the type OIDs is to find out whether the table-has-OIDs setting matches, we could instead test that directly and avoid needing to pass an extra argument. I wonder if there's some other reason this code is there which is not documented in the comment... > 0007-Avoid-code-duplication-in-map_partition_varattnos.patch > > Code to map attribute numbers in map_partition_varattnos() duplicates > what convert_tuples_by_name_map() does. Avoid that as pointed out by > Álvaro Herrera. > > Last posted here: >
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/19 14:15, Amit Langote wrote:
> So, here are all the patches I posted to date (and one new at the bottom)
> for reported and unreported bugs, excluding the one involving
> BulkInsertState for which you replied in a new thread.
>
> I'll describe the attached patches in brief:

Sorry, I forgot to mention that I have skipped the patch I proposed to
modify the committed approach [1] to get the correct tuple to show in
the constraint violation messages.  It might be better to continue that
discussion at [2].  And because I skipped that patch, I should have
removed the related logic in ExecWithCheckOptions() that I added in one
of the later patches (patch 0004), which I forgot to do.  So, here are
all the patches again with the correct 0004 this time.  Sigh.

Thanks,
Amit

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f1b4c771ea74f42447dccaed42ffcdcccf3aa694
[2] https://www.postgresql.org/message-id/CA%2BTgmoZjGzSM5WwnyapFaw3GxnDLWh7pm8Xiz8_QWQnUQy%3DSCA%40mail.gmail.com

>From 37c861d5eae1c8f11b3fae93cb209da262a13fbc Mon Sep 17 00:00:00 2001
From: amit
Date: Tue, 13 Dec 2016 15:07:41 +0900
Subject: [PATCH 1/8] Fix a bug of insertion into an internal partition.

Since implicit partition constraints are not inherited, an internal
partition's constraint was not being enforced when targeted directly.
So, include such constraint when setting up leaf partition result
relations for tuple-routing.
Reported by: n/a
Patch by: Amit Langote
Reports: n/a
---
 src/backend/commands/copy.c          |  1 -
 src/backend/commands/tablecmds.c     |  1 -
 src/backend/executor/execMain.c      | 42
 src/include/executor/executor.h      |  1 -
 src/test/regress/expected/insert.out |  6 ++
 src/test/regress/sql/insert.sql      |  5 +
 6 files changed, 44 insertions(+), 12 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 1fd2162794..75386212e0 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2432,7 +2432,6 @@ CopyFrom(CopyState cstate)
     InitResultRelInfo(resultRelInfo,
                       cstate->rel,
                       1,    /* dummy rangetable index */
-                      true, /* do load partition check expression */
                       NULL,
                       0);

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index e633a50dd2..06e43cbb3a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1324,7 +1324,6 @@ ExecuteTruncate(TruncateStmt *stmt)
     InitResultRelInfo(resultRelInfo,
                       rel,
                       0,    /* dummy rangetable index */
-                      false,
                       NULL,
                       0);
     resultRelInfo++;

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ff277d300a..5457f8fbde 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -824,10 +824,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
             resultRelationOid = getrelid(resultRelationIndex, rangeTable);
             resultRelation = heap_open(resultRelationOid, RowExclusiveLock);
+
             InitResultRelInfo(resultRelInfo,
                               resultRelation,
                               resultRelationIndex,
-                              true,
                               NULL,
                               estate->es_instrument);
             resultRelInfo++;
@@ -1218,10 +1218,11 @@
 void
 InitResultRelInfo(ResultRelInfo *resultRelInfo,
                   Relation resultRelationDesc,
                   Index resultRelationIndex,
-                  bool load_partition_check,
                   Relation partition_root,
                   int instrument_options)
 {
+    List       *partition_check = NIL;
+
     MemSet(resultRelInfo, 0, sizeof(ResultRelInfo));
     resultRelInfo->type = T_ResultRelInfo;
     resultRelInfo->ri_RangeTableIndex = resultRelationIndex;
@@ -1257,13 +1258,38 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
     resultRelInfo->ri_ConstraintExprs = NULL;
     resultRelInfo->ri_junkFilter = NULL;
     resultRelInfo->ri_projectReturning = NULL;
-    if (load_partition_check)
-        resultRelInfo->ri_PartitionCheck =
-            RelationGetPartitionQual(resultRelationDesc);
+
     /*
-     * The following gets set to NULL unless we are initializing leaf
-     * partitions for tuple-routing.
+     * If partition_root has been specified, that means we are building the
+     * ResultRelationInfo for one of its leaf partitions.  In that case, we
+     * need *not* initialize the leaf partition's constraint, but rather
+     * the partition_root's (if any).  We must do that explicitly like this,
+     * because implicit partition constraints are not inherited like user-
+     * defined constraints and would fail to be enforced by ExecConstraints()
+     * after a tuple is routed to a leaf partition.
      */
+    if (partition_root)
+    {
+        /*
+         * Root table itself may or may not be a partition; partition_check
+         * would be NIL in the latter case.
+         */
+        partition_check = RelationGetPartitionQual(partition_root);
+
+        /*
+         * This is not our own partition constraint, but rather an ancestor's.
+         * So any Vars in it bear the ancestor's attribute
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/19 5:29, Robert Haas wrote:
> On Wed, Jan 18, 2017 at 3:12 PM, Robert Haas wrote:
>> On Tue, Jan 10, 2017 at 6:06 AM, Amit Langote wrote:
>>> [ updated patches ]
>>
>> I committed 0004 and also fixed the related regression test not to
>> rely on DROP .. CASCADE, which isn't always stable.  The remainder of

Thanks!

>> this patch set needs a rebase, and perhaps you could also fold in
>> other pending partitioning fixes so I have everything to look at it in
>> one place.
>
> Just to be a little more clear, I don't mind multiple threads each
> with a patch or patch set so much, but multiple patch sets on the same
> thread gets hard for me to track.  Sorry for the inconvenience.

OK, I agree that having multiple patch sets on the same thread is
cumbersome.  So, here are all the patches I posted to date (and one new
at the bottom) for reported and unreported bugs, excluding the one
involving BulkInsertState for which you replied in a new thread.

I'll describe the attached patches in brief:

0001-Fix-a-bug-of-insertion-into-an-internal-partition.patch

Since implicit partition constraints are not inherited, an internal
partition's constraint was not being enforced when targeted directly.
So, include such constraint when setting up leaf partition result
relations for tuple-routing.

0002-Set-ecxt_scantuple-correctly-for-tuple-routing.patch

In 2ac3ef7a01df859c62d0a02333b646d65eaec5ff, we changed things so that
it's possible for a different TupleTableSlot to be used for partitioned
tables at successively lower levels.  If we do end up changing the slot
from the original, we must update ecxt_scantuple to point to the new one
for the partition key of the tuple to be computed correctly.
Last posted here: https://www.postgresql.org/message-id/99fbfad6-b89b-9427-a6ca-197aad98c48e%40lab.ntt.co.jp 0003-Fix-RETURNING-to-work-correctly-after-tuple-routing.patch In ExecInsert(), do not switch back to the root partitioned table ResultRelInfo until after we finish ExecProcessReturning(), so that RETURNING projection is done using the partition's descriptor. For the projection to work correctly, we must initialize the same for each leaf partition during ModifyTableState initialization. 0004-Fix-some-issues-with-views-and-partitioned-tables.patch Automatically updatable views failed to handle partitioned tables. Once that's fixed, WITH CHECK OPTIONS wouldn't work correctly without the WCO expressions having been suitably converted for each partition (think applying map_partition_varattnos to Vars in the WCO expressions just like with partition constraint expressions). 0005-Fix-some-wrong-thinking-in-check_new_partition_bound.patch Because a given range bound in the PartitionBoundInfo.datums array is sometimes a range lower bound and upper bound at other times, we must be careful when assuming which, especially when interpreting the result of partition_bound_bsearch which returns the index of the greatest bound that is less than or equal to probe. Due to an error in thinking about the same, the relevant code in check_new_partition_bound() caused invalid partition (index = -1) to be chosen as the partition being overlapped. Last posted here: https://www.postgresql.org/message-id/603acb8b-5dec-31e8-29b0-609a68aac50f%40lab.ntt.co.jp 0006-Avoid-tuple-coversion-in-common-partitioning-cases.patch Currently, the tuple conversion is performed after a tuple is routed, even if the attributes of a target leaf partition map one-to-one with those of the root table, which is wasteful. Avoid that by making convert_tuples_by_name() return a NULL map for such cases. 
0007-Avoid-code-duplication-in-map_partition_varattnos.patch

Code to map attribute numbers in map_partition_varattnos() duplicates what convert_tuples_by_name_map() does. Avoid that, as pointed out by Álvaro Herrera.

Last posted here:
https://www.postgresql.org/message-id/9ce97382-54c8-deb3-9ee9-a2ec271d866b%40lab.ntt.co.jp

0008-Avoid-DROP-TABLE-.-CASCADE-in-more-partitioning-test.patch

This is the new one. There were quite a few commits recently to fix regression test breakage due to not using ORDER BY in queries on system catalogs and due to using DROP TABLE ... CASCADE. There were still some instances of the latter in create_table.sql and alter_table.sql.

Thanks,
Amit

>From 37c861d5eae1c8f11b3fae93cb209da262a13fbc Mon Sep 17 00:00:00 2001
From: amit
Date: Tue, 13 Dec 2016 15:07:41 +0900
Subject: [PATCH 1/8] Fix a bug of insertion into an internal partition.

Since implicit partition constraints are not inherited, an internal partition's constraint was not being enforced when targeted directly. So, include such constraint when setting up leaf partition result relations for tuple-routing.

Reported by: n/a
Patch by: Amit Langote
Reports: n/a
---
 src/backend/commands/copy.c | 1 -
 src/backend/commands/tablecmds.c | 1 -
 src/backend/executor/execMain.c
Re: [HACKERS] Declarative partitioning - another take
On Wed, Jan 18, 2017 at 3:12 PM, Robert Haas wrote:
> On Tue, Jan 10, 2017 at 6:06 AM, Amit Langote wrote:
>> [ updated patches ]
>
> I committed 0004 and also fixed the related regression test not to
> rely on DROP .. CASCADE, which isn't always stable. The remainder of
> this patch set needs a rebase, and perhaps you could also fold in
> other pending partitioning fixes so I have everything to look at in
> one place.

Just to be a little more clear, I don't mind multiple threads each with a patch or patch set so much, but multiple patch sets on the same thread gets hard for me to track. Sorry for the inconvenience.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On Tue, Jan 10, 2017 at 6:06 AM, Amit Langote wrote:
> [ updated patches ]

I committed 0004 and also fixed the related regression test not to rely on DROP .. CASCADE, which isn't always stable. The remainder of this patch set needs a rebase, and perhaps you could also fold in other pending partitioning fixes so I have everything to look at in one place.

Thanks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On Mon, Jan 16, 2017 at 4:09 AM, Amit Langote wrote:
> The problem is that whereas the SlotValueDescription that we build to show
> in the error message should be based on the tuple that was passed to
> ExecInsert() or whatever NextCopyFrom() returned for CopyFrom() to
> process, it might fail to be the case if the tuple needed to be
> converted after tuple routing. The slot (the tuple in it and its tuple
> descriptor) and the resultRelInfo that ExecConstraints() receives *do*
> correspond with each other, even after possible tuple conversion following
> tuple routing, and hence constraint checking itself works fine (since
> commit 2ac3ef7a01 [1]). As said, it's the val_desc built to show in the
> error message being based on the converted-for-partition tuple that could
> be seen as a problem - is it acceptable if we showed in the error message
> whatever the converted-for-partition tuple looks like, which might have
> columns ordered differently from the root table? If so, we could simply
> forget the whole thing, including reverting f1b4c771 [2].
>
> An example:
>
> create table p (a int, b char, c int) partition by list (a);
> create table p1 (b char, c int, a int);  -- note the column order
> alter table p attach partition p1 for values in (1);
> alter table p add constraint check_b check (b = 'x');
>
> insert into p values (1, 'y', 1);
> ERROR: new row for relation "p1" violates check constraint "check_b"
> DETAIL: Failing row contains (y, 1, 1).
>
> Note that "(y, 1, 1)" results from using p1's descriptor on the converted
> tuple. As long as that's clear and acceptable, I think we need not worry
> about this patch, and can revert the previously committed patch for this
> "problem".

Hmm. It would be fine, IMHO, if the detail message looked like the one that BuildIndexValueDescription produces. Without the column names, the clarity is somewhat lessened. Anybody else have an opinion on this?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/14 6:24, Robert Haas wrote:
> On Tue, Jan 10, 2017 at 6:06 AM, Amit Langote wrote:
>> Thanks! I realized however that the approach I used in 0002 of passing
>> the original slot to ExecConstraints() fails in certain situations. For
>> example, if a BR trigger changes the tuple, the original slot would not
>> receive those changes, so it would be wrong to use such a tuple anymore.
>> In the attached 0001, I switched back to the approach of converting the
>> partition-tupdesc-based tuple back to the root partitioned table's format.
>> The converted tuple contains the changes made by BR triggers, if any.
>> Sorry about some unnecessary work.
>
> Hmm. Even with this patch, I wonder if this is really correct. I
> mean, isn't the root of the problem here that ExecConstraints() is
> expecting that resultRelInfo matches slot, and it doesn't?

The problem is that whereas the SlotValueDescription that we build to show in the error message should be based on the tuple that was passed to ExecInsert() or whatever NextCopyFrom() returned for CopyFrom() to process, it might fail to be the case if the tuple needed to be converted after tuple routing. The slot (the tuple in it and its tuple descriptor) and the resultRelInfo that ExecConstraints() receives *do* correspond with each other, even after possible tuple conversion following tuple routing, and hence constraint checking itself works fine (since commit 2ac3ef7a01 [1]). As said, it's the val_desc built to show in the error message being based on the converted-for-partition tuple that could be seen as a problem - is it acceptable if we showed in the error message whatever the converted-for-partition tuple looks like, which might have columns ordered differently from the root table? If so, we could simply forget the whole thing, including reverting f1b4c771 [2].
An example:

create table p (a int, b char, c int) partition by list (a);
create table p1 (b char, c int, a int);  -- note the column order
alter table p attach partition p1 for values in (1);
alter table p add constraint check_b check (b = 'x');

insert into p values (1, 'y', 1);
ERROR: new row for relation "p1" violates check constraint "check_b"
DETAIL: Failing row contains (y, 1, 1).

Note that "(y, 1, 1)" results from using p1's descriptor on the converted tuple. As long as that's clear and acceptable, I think we need not worry about this patch, and can revert the previously committed patch for this "problem".

> And why
> isn't that also a problem for the things that get passed resultRelInfo
> and slot after tuple routing and before ExecConstraints? In
> particular, in copy.c, ExecBRInsertTriggers.

If I have explained the problem (as I see it) well enough above, you may see why that's not an issue in the other cases. The other sub-routines, viz. ExecBRInsertTriggers, ExecWithCheckOptions for RLS INSERT WITH CHECK policies, ExecARInsertTriggers, etc., do receive the correct slot and resultRelInfo for whatever they do with a given tuple and the relation (partition) it is being inserted into. However, if we do consider the above to be a problem, then we would need to fix ExecWithCheckOptions() to do whatever we decide ExecConstraints() should do, because it can also report a WITH CHECK violation for a tuple.

> Committed 0002.

Thanks,
Amit

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ac3ef7
[2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f1b4c77
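The column-order effect in the example above can be reproduced outside the server. The following is a minimal Python sketch (hypothetical names, not PostgreSQL code) of routing-time conversion followed by formatting the row with the partition's own descriptor, which is exactly why the DETAIL shows the partition's column order:

```python
# Root table p has columns (a, b, c); partition p1 declares them as (b, c, a).
ROOT_ATTRS = ["a", "b", "c"]
PART_ATTRS = ["b", "c", "a"]

def route_and_convert(row):
    """Convert a row from the root table's column order to the
    partition's column order, as tuple routing does."""
    by_name = dict(zip(ROOT_ATTRS, row))
    return tuple(by_name[name] for name in PART_ATTRS)

def failing_row_detail(converted):
    # The error detail is built from the tuple as the partition sees it,
    # so values appear in the partition's column order.
    return "Failing row contains (%s)." % ", ".join(str(v) for v in converted)

print(failing_row_detail(route_and_convert((1, "y", 1))))
# Prints: Failing row contains (y, 1, 1).
```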
Re: [HACKERS] Declarative partitioning - another take
On Tue, Jan 10, 2017 at 6:06 AM, Amit Langote wrote:
> On 2017/01/05 5:50, Robert Haas wrote:
>> On Tue, Dec 27, 2016 at 3:59 AM, Amit Langote wrote:
>>> Patches 0001 to 0006 unchanged.
>>
>> Committed 0001 earlier, as mentioned in a separate email. Committed
>> 0002 and part of 0003.
>
> Thanks! I realized however that the approach I used in 0002 of passing
> the original slot to ExecConstraints() fails in certain situations. For
> example, if a BR trigger changes the tuple, the original slot would not
> receive those changes, so it will be wrong to use such a tuple anymore.
> In attached 0001, I switched back to the approach of converting the
> partition-tupdesc-based tuple back to the root partitioned table's format.
> The converted tuple contains the changes by BR triggers, if any. Sorry
> about some unnecessary work.

Hmm. Even with this patch, I wonder if this is really correct. I mean, isn't the root of the problem here that ExecConstraints() is expecting that resultRelInfo matches slot, and it doesn't? And why isn't that also a problem for the things that get passed resultRelInfo and slot after tuple routing and before ExecConstraints? In particular, in copy.c, ExecBRInsertTriggers.

Committed 0002.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/06 20:23, Amit Langote wrote:
> On 2017/01/05 3:26, Robert Haas wrote:
>> It's unclear to me why we need to do 0002. It doesn't seem like it
>> should be necessary, it doesn't seem like a good idea, and the commit
>> message you proposed is uninformative.
>
> If a single BulkInsertState object is passed to
> heap_insert()/heap_multi_insert() for different heaps corresponding to
> different partitions (from one input tuple to the next), tuples might end up
> going into the wrong heaps (as demonstrated in one of the reports [1]). A
> simple solution is to disable bulk-insert in the case of partitioned tables.
>
> But my patch (or its motivations) was slightly wrongheaded, wherein I
> conflated multi-insert stuff and bulk-insert considerations. I revised
> 0002 to not do that.

Ragnar Ouchterlony pointed out [1] on pgsql-bugs that 0002 wasn't correct. Attaching an updated 0002 along with rebased 0001 and 0003.

Thanks,
Amit

[1] https://www.postgresql.org/message-id/732dfc84-25f5-413c-1eee-0bfa7a370093%40agama.tv

>From f2a64348021c7dba1f96d0c8b4e3e253f635b019 Mon Sep 17 00:00:00 2001
From: amit
Date: Wed, 28 Dec 2016 10:10:26 +0900
Subject: [PATCH 1/3] Set ecxt_scantuple correctly for tuple-routing

In 2ac3ef7a01df859c62d0a02333b646d65eaec5ff, we changed things so that it's possible for a different TupleTableSlot to be used for partitioned tables at successively lower levels. If we do end up changing the slot from the original, we must update ecxt_scantuple to point to the new one for the partition key of the tuple to be computed correctly.

Also update the regression tests so that the code manipulating ecxt_scantuple is covered.
Reported by: Rajkumar Raghuwanshi Patch by: Amit Langote Reports: https://www.postgresql.org/message-id/CAKcux6%3Dm1qyqB2k6cjniuMMrYXb75O-MB4qGQMu8zg-iGGLjDw%40mail.gmail.com --- src/backend/catalog/partition.c | 29 ++--- src/backend/executor/execMain.c | 2 -- src/test/regress/expected/insert.out | 2 +- src/test/regress/sql/insert.sql | 2 +- 4 files changed, 24 insertions(+), 11 deletions(-) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index f54e1bdf3f..0de1cf245a 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -1643,7 +1643,10 @@ get_partition_for_tuple(PartitionDispatch *pd, bool isnull[PARTITION_MAX_KEYS]; int cur_offset, cur_index; - int i; + int i, +result; + ExprContext *ecxt = GetPerTupleExprContext(estate); + TupleTableSlot *ecxt_scantuple_old = ecxt->ecxt_scantuple; /* start with the root partitioned table */ parent = pd[0]; @@ -1672,7 +1675,14 @@ get_partition_for_tuple(PartitionDispatch *pd, slot = myslot; } - /* Extract partition key from tuple */ + /* + * Extract partition key from tuple; FormPartitionKeyDatum() expects + * ecxt_scantuple to point to the correct tuple slot (which might be + * different from the slot we received from the caller if the + * partitioned table of the current level has different tuple + * descriptor from its parent). 
+ */ + ecxt->ecxt_scantuple = slot; FormPartitionKeyDatum(parent, slot, estate, values, isnull); if (key->strategy == PARTITION_STRATEGY_RANGE) @@ -1727,16 +1737,21 @@ get_partition_for_tuple(PartitionDispatch *pd, */ if (cur_index < 0) { + result = -1; *failed_at = RelationGetRelid(parent->reldesc); - return -1; + break; } - else if (parent->indexes[cur_index] < 0) - parent = pd[-parent->indexes[cur_index]]; - else + else if (parent->indexes[cur_index] >= 0) + { + result = parent->indexes[cur_index]; break; + } + else + parent = pd[-parent->indexes[cur_index]]; } - return parent->indexes[cur_index]; + ecxt->ecxt_scantuple = ecxt_scantuple_old; + return result; } /* diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index ff277d300a..6a9bc8372f 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -3167,9 +3167,7 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd, { int result; Oid failed_at; - ExprContext *econtext = GetPerTupleExprContext(estate); - econtext->ecxt_scantuple = slot; result = get_partition_for_tuple(pd, slot, estate, _at); if (result < 0) { diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out index ca3134c34c..1c7b8047ee 100644 --- a/src/test/regress/expected/insert.out +++ b/src/test/regress/expected/insert.out @@ -302,7 +302,7 @@ drop cascades to table part_ee_ff1 drop cascades to table part_ee_ff2 -- more tests for certain multi-level partitioning scenarios create table p (a int, b int) partition by range (a, b); -create table p1 (b int, a int not null) partition by range (b); +create table p1 (b int not null, a int not null) partition by range ((b+0)); create table p11 (like p1); alter table p11 drop a; alter table p11 add a
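The save/restore discipline this patch applies to ecxt_scantuple generalizes to any per-call override of shared context state. The following is a minimal Python sketch (hypothetical names, not the executor code) of the same pattern: point the shared context at each level's slot while computing that level's key, then restore the caller's slot before returning so the override cannot leak out:

```python
class ExprContext:
    """Tiny stand-in for the executor's per-tuple expression context."""
    def __init__(self, scantuple=None):
        self.ecxt_scantuple = scantuple

def get_partition_for_tuple(ecxt, levels, tup):
    """Walk partitioned-table levels, updating ecxt.ecxt_scantuple to the
    slot used at each level, and restore the original on the way out."""
    saved = ecxt.ecxt_scantuple
    try:
        slot = tup
        for convert in levels:
            slot = convert(slot)        # per-level tuple conversion, if any
            ecxt.ecxt_scantuple = slot  # key expressions read from here
        return slot
    finally:
        ecxt.ecxt_scantuple = saved     # restore, as the patch does
```

The try/finally mirrors saving ecxt_scantuple_old at entry and assigning it back before every return path, which is what removes the function's side effect.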
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/05 5:50, Robert Haas wrote:
> On Tue, Dec 27, 2016 at 3:59 AM, Amit Langote wrote:
>> Patches 0001 to 0006 unchanged.
>
> Committed 0001 earlier, as mentioned in a separate email. Committed
> 0002 and part of 0003.

Thanks! I realized however that the approach I used in 0002 of passing the original slot to ExecConstraints() fails in certain situations. For example, if a BR trigger changes the tuple, the original slot would not receive those changes, so it would be wrong to use such a tuple anymore. In the attached 0001, I switched back to the approach of converting the partition-tupdesc-based tuple back to the root partitioned table's format. The converted tuple contains the changes made by BR triggers, if any. Sorry about some unnecessary work.

> But I'm skeptical that the as-patched-by-0003
> logic in generate_partition_qual() makes sense. You do this:
>
> result = list_concat(generate_partition_qual(parent),
>                      copyObject(rel->rd_partcheck));
>
> /* Mark Vars with correct attnos */
> result = map_partition_varattnos(result, rel, parent);
>
> But that has the effect of applying map_partition_varattnos to
> everything in rel->rd_partcheck in addition to applying it to
> everything returned by generate_partition_qual() on the parent, which
> doesn't seem right.

I've replaced this portion of the code with (as also mentioned below):

/* Quick copy */
if (rel->rd_partcheck != NIL)
    return copyObject(rel->rd_partcheck);

Down below (for the case when the partition qual is not cached), we now do this:

my_qual = get_qual_from_partbound(rel, parent, bound);

/* Add the parent's quals to the list (if any) */
if (parent->rd_rel->relispartition)
    result = list_concat(generate_partition_qual(parent), my_qual);
else
    result = my_qual;

/*
 * Change Vars to have partition's attnos instead of the parent's.
 * We do this after we concatenate the parent's quals, because
 * we want every Var in it to bear this relation's attnos.
 */
result = map_partition_varattnos(result, rel, parent);

which is then cached whole in rd_partcheck. As for your concern about whether it's correct to do so, consider that generate_partition_qual() on the parent returns a qual with Vars that bear the parent's attnos (which is fine as far as the parent, as a partition, is concerned). To apply the qual to the current relation as a partition, we must change the Vars to have this relation's attnos.

> Also, don't we want to do map_partition_varattnos() just ONCE, rather
> than on every call to this function? I think maybe your concern is
> that the parent might be changed without a relcache flush on the
> child, but I don't quite see how that could happen. If the parent's
> tuple descriptor changes, surely the child's tuple descriptor would
> have to be altered at the same time...

Makes sense. I fixed it so that we return copyObject(rel->rd_partcheck), if it's non-NIL, instead of generating the parent's qual and doing the mapping again. For some reason, I had thought we couldn't save the mapped version in the relcache.

By the way, in addition to the previously mentioned bug in RETURNING, I found that WITH CHECK OPTION didn't work correctly either. In fact, automatically updatable views failed to consider partitioned tables at all. Patch 0007 is addressed towards fixing that.

Thanks,
Amit

>From e408234633c01817d6a2313fdbdccdb4f0057c1e Mon Sep 17 00:00:00 2001
From: amit
Date: Fri, 6 Jan 2017 15:53:10 +0900
Subject: [PATCH 1/7] Fix reporting of violation in ExecConstraints, again

We decided in f1b4c771ea74f42447dccaed42ffcdcccf3aa694 to pass the original slot (one containing the tuple formatted per the root partitioned table's tupdesc) to ExecConstraints(), but that breaks certain cases. Imagine what would happen if a BR trigger changed the tuple - the original slot would not contain those changes.
So, it seems better to convert (if necessary) the tuple formatted per partition tupdesc after tuple-routing back to the root table's format and use the converted tuple to make val_desc shown in the message if an error occurs. --- src/backend/commands/copy.c| 6 ++-- src/backend/executor/execMain.c| 53 +- src/backend/executor/nodeModifyTable.c | 5 ++-- src/include/executor/executor.h| 3 +- src/test/regress/expected/insert.out | 18 ++-- src/test/regress/sql/insert.sql| 17 ++- 6 files changed, 82 insertions(+), 20 deletions(-) diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index f56b2ac49b..65eb167087 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -2491,8 +2491,7 @@ CopyFrom(CopyState cstate) for (;;) { - TupleTableSlot *slot, - *oldslot; + TupleTableSlot *slot; bool skip_tuple; Oid loaded_oid = InvalidOid; @@
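The caching scheme settled on above - concatenate the parent's (already cached) quals with this level's qual, map the whole list to this relation's attnos, and cache the mapped result in rd_partcheck - can be modeled in miniature. This is a hypothetical Python sketch (invented names, not the C code), where attno "mapping" is stood in for by tagging each qual with the relation whose numbering it currently bears:

```python
class Rel:
    def __init__(self, name, own_qual, parent=None):
        self.name = name
        self.own_qual = own_qual   # this level's qual, in the parent's numbering
        self.parent = parent
        self.partcheck = None      # cache of the fully mapped qual list

def map_varattnos(rel, quals):
    # Stand-in for map_partition_varattnos(): re-tag each qual with the
    # relation whose column numbering it now uses.
    return [q.split("@")[0] + "@" + rel.name for q in quals]

def generate_partition_qual(rel):
    """Own qual plus all ancestors' quals, every entry bearing rel's
    numbering; computed once per relation and then served from cache."""
    if rel.partcheck is not None:
        return list(rel.partcheck)
    result = list(rel.own_qual)
    if rel.parent is not None:
        result = generate_partition_qual(rel.parent) + result
    result = map_varattnos(rel, result)   # map only after concatenating
    rel.partcheck = result
    return list(result)
```

Mapping after concatenation means each cached list is mapped exactly once per relation, and ancestors' caches are reused rather than remapped on every call.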
Re: [HACKERS] Declarative partitioning - another take
Hi Keith,

On 2017/01/10 14:44, Keith Fiske wrote:
> Is there any reason for the exclusion of parent tables from the pg_tables
> system catalog view? They do not show up in information_schema.tables as
> well. I believe I found where to make the changes and I tested to make sure
> it works for my simple case. Attached is my first attempt at patching
> anything in core. Not sure if there's anywhere else this would need to be
> fixed.

That's an oversight. The original partitioning patch didn't touch information_schema.sql and system_views.sql at all. I added the relkind = 'P' check in some other views as well, including the ones your patch considered.

Thanks,
Amit

>From 9eef3e87b9a025d233aa4b935b50bb0c7633efbb Mon Sep 17 00:00:00 2001
From: amit
Date: Tue, 10 Jan 2017 18:12:25 +0900
Subject: [PATCH] information_schema/system_views.sql and relkind 'P'

Currently, partitioned tables are not taken into account in various information_schema and system views.

Reported by: Keith Fiske
Patch by: Keith Fiske, Amit Langote
Reports: https://www.postgresql.org/message-id/CAG1_KcDJiZB=l6youo_bvufj2q2851_xdkfhw0jdcd_2vtk...@mail.gmail.com
---
 src/backend/catalog/information_schema.sql | 37 +++---
 src/backend/catalog/system_views.sql | 3 ++-
 2 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index 4df390a763..318f195b81 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -453,7 +453,7 @@ CREATE VIEW check_constraints AS AND a.attnum > 0 AND NOT a.attisdropped AND a.attnotnull - AND r.relkind = 'r' + AND r.relkind IN ('r', 'P') AND pg_has_role(r.relowner, 'USAGE'); GRANT SELECT ON check_constraints TO PUBLIC; @@ -525,7 +525,7 @@ CREATE VIEW column_domain_usage AS AND a.attrelid = c.oid AND a.atttypid = t.oid AND t.typtype = 'd' - AND c.relkind IN ('r', 'v', 'f') + AND c.relkind IN ('r', 'v', 'f', 'P') AND a.attnum > 0 AND NOT
a.attisdropped AND pg_has_role(t.typowner, 'USAGE'); @@ -564,7 +564,7 @@ CREATE VIEW column_privileges AS pr_c.relowner FROM (SELECT oid, relname, relnamespace, relowner, (aclexplode(coalesce(relacl, acldefault('r', relowner.* FROM pg_class - WHERE relkind IN ('r', 'v', 'f') + WHERE relkind IN ('r', 'v', 'f', 'P') ) pr_c (oid, relname, relnamespace, relowner, grantor, grantee, prtype, grantable), pg_attribute a WHERE a.attrelid = pr_c.oid @@ -586,7 +586,7 @@ CREATE VIEW column_privileges AS ) pr_a (attrelid, attname, grantor, grantee, prtype, grantable), pg_class c WHERE pr_a.attrelid = c.oid - AND relkind IN ('r', 'v', 'f') + AND relkind IN ('r', 'v', 'f', 'P') ) x, pg_namespace nc, pg_authid u_grantor, @@ -629,7 +629,7 @@ CREATE VIEW column_udt_usage AS WHERE a.attrelid = c.oid AND a.atttypid = t.oid AND nc.oid = c.relnamespace - AND a.attnum > 0 AND NOT a.attisdropped AND c.relkind in ('r', 'v', 'f') + AND a.attnum > 0 AND NOT a.attisdropped AND c.relkind in ('r', 'v', 'f', 'P') AND pg_has_role(coalesce(bt.typowner, t.typowner), 'USAGE'); GRANT SELECT ON column_udt_usage TO PUBLIC; @@ -738,7 +738,7 @@ CREATE VIEW columns AS CAST('NEVER' AS character_data) AS is_generated, CAST(null AS character_data) AS generation_expression, - CAST(CASE WHEN c.relkind = 'r' OR + CAST(CASE WHEN c.relkind IN ('r', 'P') OR (c.relkind IN ('v', 'f') AND pg_column_is_updatable(c.oid, a.attnum, false)) THEN 'YES' ELSE 'NO' END AS yes_or_no) AS is_updatable @@ -753,7 +753,7 @@ CREATE VIEW columns AS WHERE (NOT pg_is_other_temp_schema(nc.oid)) - AND a.attnum > 0 AND NOT a.attisdropped AND c.relkind in ('r', 'v', 'f') + AND a.attnum > 0 AND NOT a.attisdropped AND c.relkind in ('r', 'v', 'f', 'P') AND (pg_has_role(c.relowner, 'USAGE') OR has_column_privilege(c.oid, a.attnum, @@ -789,7 +789,7 @@ CREATE VIEW constraint_column_usage AS AND d.objid = c.oid AND c.connamespace = nc.oid AND c.contype = 'c' -AND r.relkind = 'r' +AND r.relkind IN ('r', 'P') AND NOT a.attisdropped UNION ALL @@ 
-841,7 +841,7 @@ CREATE VIEW constraint_table_usage AS WHERE c.connamespace = nc.oid AND r.relnamespace = nr.oid AND ( (c.contype = 'f' AND c.confrelid = r.oid) OR (c.contype IN ('p', 'u') AND c.conrelid = r.oid) ) - AND
Re: [HACKERS] Declarative partitioning - another take
Hi Amul,

On 2017/01/09 17:29, amul sul wrote:
> I got a server crash due to an assert failure when ATTACHing an
> overlapping range partition; here is a test case to reproduce it:
>
> CREATE TABLE test_parent(a int) PARTITION BY RANGE (a);
> CREATE TABLE test_parent_part2 PARTITION OF test_parent FOR VALUES
> FROM(100) TO(200);
> CREATE TABLE test_parent_part1(a int NOT NULL);
> ALTER TABLE test_parent ATTACH PARTITION test_parent_part1 FOR VALUES
> FROM(1) TO(200);
>
> I think this bug exists in the following code of check_new_partition_bound():
>
> 767 if (equal || off1 != off2)
> 768 {
> 769     overlap = true;
> 770     with = boundinfo->indexes[off2 + 1];
> 771 }
>
> When equal is true, the array index should not be 'off2 + 1'.

Good catch. The attached patch should fix that. I observed a crash with the following command as well:

ALTER TABLE test_parent ATTACH PARTITION test_parent_part1 FOR VALUES FROM (1) TO (300);

That's because there is one more case when the array index shouldn't be off2 + 1 - the case where the bound at off2 is an upper bound (I'd wrongly assumed that it's always a lower bound). Anyway, I rewrote the surrounding comments to clarify the logic a bit.

> While reading code related to this, I wondered why
> partition_bound_bsearch does not immediately return when cmpval == 0?

partition_bound_bsearch() is meant to return the *greatest* index of a bound less than or equal to the input bound ("probe"). But it seems to me now that we would always return the first index at which we get 0 for cmpval, albeit after wasting cycles trying to find an even greater index. Because we don't have duplicates in the datums array, once we encounter a bound that is equal to the probe, continuing to look right can only find bounds that are *greater than* the probe, only to turn back again and return the equal index (which wastes cycles invoking the partition key comparison function(s)).
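The search described here - the rightmost bound less than or equal to the probe, with an early exit on equality that is valid only because the array has no duplicates - can be sketched in Python (a hypothetical illustration, not the C implementation):

```python
def bound_bsearch(datums, probe):
    """Return (index, is_equal): the greatest index whose datum is <= probe,
    or (-1, False) if every datum is greater.  Exits early on an exact
    match, which is safe only because datums contains no duplicates."""
    lo, hi = -1, len(datums)
    is_equal = False
    while lo < hi - 1:
        mid = (lo + hi) // 2
        if datums[mid] <= probe:
            lo = mid
            is_equal = datums[mid] == probe
            if is_equal:
                break           # no duplicates, so this index is final
        else:
            hi = mid
    return lo, is_equal
```

Without the break, a probe equal to some datum would keep bisecting to the right, repeatedly invoking the comparison, and still end up returning the same index.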
So, it perhaps makes sense to do this, per your suggestion:

@@ -1988,8 +2018,11 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
 		if (cmpval <= 0)
 		{
 			lo = mid;
 			*is_equal = (cmpval == 0);
+
+			if (*is_equal)
+				break;
 		}

Thanks,
Amit

>From 7fe537f8e8efeac51c3b0cc91ac51a1aa39399cd Mon Sep 17 00:00:00 2001
From: amit
Date: Tue, 10 Jan 2017 17:43:36 +0900
Subject: [PATCH] Fix some wrong thinking in check_new_partition_bound()

Because a given range bound in the PartitionBoundInfo.datums array is sometimes a lower range bound and at other times an upper range bound, we must be careful when assuming which, especially when interpreting the result of partition_bound_bsearch, which returns the index of the greatest bound that is less than or equal to the probe. Due to an error in thinking about this, the relevant code in check_new_partition_bound() caused an invalid partition (index == -1) to be chosen as the partition being overlapped.

Also, we need not continue searching for an even greater bound in partition_bound_bsearch() once we find the first bound that is *equal* to the probe, because we don't have duplicate datums. That spends cycles needlessly. Per suggestion from Amul Sul.

Reported by: Amul Sul
Patch by: Amit Langote
Reports: https://www.postgresql.org/message-id/CAAJ_b94XgbqVoXMyxxs63CaqWoMS1o2gpHiU0F7yGnJBnvDc_A%40mail.gmail.com
---
 src/backend/catalog/partition.c | 62 ++
 src/test/regress/expected/create_table.out | 10 -
 src/test/regress/sql/create_table.sql | 4 ++
 3 files changed, 60 insertions(+), 16 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index f54e1bdf3f..df5652de4c 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -741,35 +741,64 @@ check_new_partition_bound(char *relname, Relation parent, Node *bound)
 boundinfo->strategy == PARTITION_STRATEGY_RANGE);

 /*
- * Find the greatest index of a range bound that is less
- * than or equal with the new lower bound.
+ * Firstly, find the greatest range bound that is less + * than or equal to the new lower bound. */ off1 = partition_bound_bsearch(key, boundinfo, lower, true, ); /* - * If equal has been set to true, that means the new lower - * bound is found to be equal with the bound at off1, - * which clearly means an overlap with the partition at - * index off1+1). - * - * Otherwise, check if there is a "gap" that could be - * occupied by the new partition. In case of a gap, the - * new upper bound should not cross past the upper - * boundary of the gap, that is, off2 == off1 should be - * true. + * off1 == -1 means that all existing bounds are greater + *
Re: [HACKERS] Declarative partitioning - another take
Is there any reason for the exclusion of parent tables from the pg_tables system catalog view? They do not show up in information_schema.tables as well. I believe I found where to make the changes and I tested to make sure it works for my simple case. Attached is my first attempt at patching anything in core. Not sure if there's anywhere else this would need to be fixed. -- Keith Fiske Database Administrator OmniTI Computer Consulting, Inc. http://www.keithf4.com diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql index 4df390a..c31d0d8 100644 --- a/src/backend/catalog/information_schema.sql +++ b/src/backend/catalog/information_schema.sql @@ -1901,6 +1901,7 @@ CREATE VIEW tables AS WHEN c.relkind = 'r' THEN 'BASE TABLE' WHEN c.relkind = 'v' THEN 'VIEW' WHEN c.relkind = 'f' THEN 'FOREIGN TABLE' + WHEN c.relkind = 'P' THEN 'PARTITIONED TABLE' ELSE null END AS character_data) AS table_type, @@ -1912,7 +1913,7 @@ CREATE VIEW tables AS CAST(t.typname AS sql_identifier) AS user_defined_type_name, CAST(CASE WHEN c.relkind = 'r' OR - (c.relkind IN ('v', 'f') AND + (c.relkind IN ('v', 'f', 'P') AND -- 1 << CMD_INSERT pg_relation_is_updatable(c.oid, false) & 8 = 8) THEN 'YES' ELSE 'NO' END AS yes_or_no) AS is_insertable_into, @@ -1923,7 +1924,7 @@ CREATE VIEW tables AS FROM pg_namespace nc JOIN pg_class c ON (nc.oid = c.relnamespace) LEFT JOIN (pg_type t JOIN pg_namespace nt ON (t.typnamespace = nt.oid)) ON (c.reloftype = t.oid) -WHERE c.relkind IN ('r', 'v', 'f') +WHERE c.relkind IN ('r', 'v', 'f', 'P') AND (NOT pg_is_other_temp_schema(nc.oid)) AND (pg_has_role(c.relowner, 'USAGE') OR has_table_privilege(c.oid, 'SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER') diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 31aade1..f4dc460 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -136,7 +136,7 @@ CREATE VIEW pg_tables AS 
C.relrowsecurity AS rowsecurity FROM pg_class C LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace) LEFT JOIN pg_tablespace T ON (T.oid = C.reltablespace) -WHERE C.relkind = 'r'; +WHERE C.relkind = 'r' OR C.relkind = 'P'; CREATE VIEW pg_matviews AS SELECT
Re: [HACKERS] Declarative partitioning - another take
Hi,

I got a server crash due to an assert failure when ATTACHing an overlapping range partition; here is a test case to reproduce it:

CREATE TABLE test_parent(a int) PARTITION BY RANGE (a);
CREATE TABLE test_parent_part2 PARTITION OF test_parent FOR VALUES FROM(100) TO(200);
CREATE TABLE test_parent_part1(a int NOT NULL);
ALTER TABLE test_parent ATTACH PARTITION test_parent_part1 FOR VALUES FROM(1) TO(200);

I think this bug exists in the following code of check_new_partition_bound():

767 if (equal || off1 != off2)
768 {
769     overlap = true;
770     with = boundinfo->indexes[off2 + 1];
771 }

When equal is true, the array index should not be 'off2 + 1'.

While reading code related to this, I wondered why partition_bound_bsearch does not immediately return when cmpval == 0?

Apologies if this has already been reported.

Regards,
Amul
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/05 3:26, Robert Haas wrote: > On Tue, Dec 27, 2016 at 8:41 PM, Amit Langote >wrote: >> On 2016/12/27 19:07, Amit Langote wrote: >>> Attached should fix that. >> >> Here are the last two patches with additional information like other >> patches. Forgot to do that yesterday. > > 0001 has the disadvantage that get_partition_for_tuple() acquires a > side effect. That seems undesirable. At the least, it needs to be > documented in the function's header comment. That's true. How about we save away the original ecxt_scantuple at entry and restore the same just before returning from the function? That way there would be no side effect. 0001 implements that. > It's unclear to me why we need to do 0002. It doesn't seem like it > should be necessary, it doesn't seem like a good idea, and the commit > message you proposed is uninformative. If a single BulkInsertState object is passed to heap_insert()/heap_multi_insert() for different heaps corresponding to different partitions (from one input tuple to next), tuples might end up going into wrong heaps (like demonstrated in one of the reports [1]). A simple solution is to disable bulk-insert in case of partitioned tables. But my patch (or its motivations) was slightly wrongheaded, wherein I conflated multi-insert stuff and bulk-insert considerations. I revised 0002 to not do that. However if we disable bulk-insert mode, COPY's purported performance benefit compared with INSERT is naught. Patch 0003 is a proposal to implement bulk-insert mode even for partitioned tables. Basically, allocate separate BulkInsertState objects for each partition and switch to the appropriate one just before calling heap_insert()/heap_multi_insert(). Then to be able to use heap_multi_insert(), we must also manage buffered tuples separately for each partition. Although, I didn't modify the limit on number of buffered tuples and/or size of buffered tuples which controls when we pause buffering and do heap_multi_insert() on buffered tuples. 
Maybe, it should work slightly differently for the partitioned table case, like for example, increase the overall limit on both the number of tuples and tuple size in the partitioning case (I observed that increasing it 10x or 100x helped to some degree). Thoughts on this? Thanks, Amit [1] https://www.postgresql.org/message-id/CAFmBtr32FDOqofo8yG-4mjzL1HnYHxXK5S9OGFJ%3D%3DcJpgEW4vA%40mail.gmail.com >From 332c67c258a0f25f76c29ced23199fe0ee8e153e Mon Sep 17 00:00:00 2001 From: amit Date: Wed, 28 Dec 2016 10:10:26 +0900 Subject: [PATCH 1/3] Set ecxt_scantuple correctly for tuple-routing In 2ac3ef7a01df859c62d0a02333b646d65eaec5ff, we changed things so that it's possible for a different TupleTableSlot to be used for partitioned tables at successively lower levels. If we do end up changing the slot from the original, we must update ecxt_scantuple to point to the new one for partition key of the tuple to be computed correctly. Also update the regression tests so that the code manipulating ecxt_scantuple is covered. 
Reported by: Rajkumar Raghuwanshi Patch by: Amit Langote Reports: https://www.postgresql.org/message-id/CAKcux6%3Dm1qyqB2k6cjniuMMrYXb75O-MB4qGQMu8zg-iGGLjDw%40mail.gmail.com --- src/backend/catalog/partition.c | 29 ++--- src/backend/executor/execMain.c | 2 -- src/test/regress/expected/insert.out | 2 +- src/test/regress/sql/insert.sql | 2 +- 4 files changed, 24 insertions(+), 11 deletions(-) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index f54e1bdf3f..0de1cf245a 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -1643,7 +1643,10 @@ get_partition_for_tuple(PartitionDispatch *pd, bool isnull[PARTITION_MAX_KEYS]; int cur_offset, cur_index; - int i; + int i, +result; + ExprContext *ecxt = GetPerTupleExprContext(estate); + TupleTableSlot *ecxt_scantuple_old = ecxt->ecxt_scantuple; /* start with the root partitioned table */ parent = pd[0]; @@ -1672,7 +1675,14 @@ get_partition_for_tuple(PartitionDispatch *pd, slot = myslot; } - /* Extract partition key from tuple */ + /* + * Extract partition key from tuple; FormPartitionKeyDatum() expects + * ecxt_scantuple to point to the correct tuple slot (which might be + * different from the slot we received from the caller if the + * partitioned table of the current level has different tuple + * descriptor from its parent). + */ + ecxt->ecxt_scantuple = slot; FormPartitionKeyDatum(parent, slot, estate, values, isnull); if (key->strategy == PARTITION_STRATEGY_RANGE) @@ -1727,16 +1737,21 @@ get_partition_for_tuple(PartitionDispatch *pd, */ if (cur_index < 0) { + result = -1; *failed_at = RelationGetRelid(parent->reldesc); - return -1; + break; } - else if (parent->indexes[cur_index] < 0) - parent =
Re: [HACKERS] Declarative partitioning - another take
Hi Keith, On 2017/01/06 2:16, Keith Fiske wrote: > Could we get some clarification on the partition_bound_spec portion of the > PARTITION OF clause? Just doing some testing it seems it's inclusive of the > FROM value but exclusive of the TO value. I don't see mention of this in > the docs as of commit 18fc5192a631441a73e6a3b911ecb14765140389 yesterday. > It does mention that the values aren't allowed to overlap, but looking at > the schema below, without the clarification of which side is > inclusive/exclusive it seems confusing because 2016-08-01 is in both. Even > the child table does not clarify this. Not sure if there's a way to do this > in the \d+ display which would be ideal, but it should at least be > mentioned in the docs. I agree that needs highlighting. I'm planning to write a doc patch for that (among other documentation improvements). Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
Could we get some clarification on the partition_bound_spec portion of the PARTITION OF clause? Just doing some testing it seems it's inclusive of the FROM value but exclusive of the TO value. I don't see mention of this in the docs as of commit 18fc5192a631441a73e6a3b911ecb14765140389 yesterday. It does mention that the values aren't allowed to overlap, but looking at the schema below, without the clarification of which side is inclusive/exclusive it seems confusing because 2016-08-01 is in both. Even the child table does not clarify this. Not sure if there's a way to do this in the \d+ display which would be ideal, but it should at least be mentioned in the docs. keith@keith=# \d+ measurement Table "public.measurement" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description ---+-+---+--+-+-+--+- logdate | date| | not null | | plain | | peaktemp | integer | | | 1 | plain | | unitsales | integer | | | | plain | | Partition key: RANGE (logdate) Check constraints: "measurement_peaktemp_check" CHECK (peaktemp > 0) Partitions: measurement_y2016m07 FOR VALUES FROM ('2016-07-01') TO ('2016-08-01'), measurement_y2016m08 FOR VALUES FROM ('2016-08-01') TO ('2016-09-01') keith@keith=# \d+ measurement_y2016m07 Table "public.measurement_y2016m07" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description ---+-+---+--+-+-+--+- logdate | date| | not null | | plain | | peaktemp | integer | | | 1 | plain | | unitsales | integer | | | 0 | plain | | Partition of: measurement FOR VALUES FROM ('2016-07-01') TO ('2016-08-01') Check constraints: "measurement_peaktemp_check" CHECK (peaktemp > 0) keith@keith=# insert into measurement (logdate) values ('2016-08-01'); INSERT 0 1 Time: 2.848 ms keith@keith=# select * from measurement_y2016m07; logdate | peaktemp | unitsales -+--+--- (0 rows) Time: 0.273 ms keith@keith=# select * from measurement_y2016m08; logdate | peaktemp | unitsales +--+--- 2016-08-01 |1 |«NULL» (1 row) Time: 0.272 
ms keith@keith=# drop table measurement_y2016m08; DROP TABLE Time: 5.919 ms keith@keith=# select * from only measurement; logdate | peaktemp | unitsales -+--+--- (0 rows) Time: 0.307 ms keith@keith=# insert into measurement (logdate) values ('2016-08-01'); ERROR: no partition of relation "measurement" found for row DETAIL: Failing row contains (2016-08-01, 1, null). Time: 0.622 ms
Re: [HACKERS] Declarative partitioning - another take
On 2016/12/14 12:14, Venkata B Nagothi wrote: > Loading the data into a normal table is not an issue (infact the csv is > generated from the table itself) > > The issue is occurring only when i am trying to load the data from CSV file > into a partitioned table - > > db01=# CREATE TABLE orders_y1992 > PARTITION OF orders2 FOR VALUES FROM ('1992-01-01') TO ('1992-12-31'); > CREATE TABLE > db01=# copy orders2 from '/data/orders-1993.csv' delimiter '|'; > ERROR: could not read block 6060 in file "base/16384/16407": read only 0 > of 8192 bytes > CONTEXT: COPY orders2, line 376589: > "9876391|374509|O|54847|1997-07-16|3-MEDIUM |Clerk#01993|0|ithely > regular pack" > > Not sure why COPY is failing. I think I've been able to reproduce this issue and suspect that it's a bug. I tried to solve it in response to another related report [1], where it was apparent that the cause was related to how the bulk-insert mode in the COPY FROM code is not handled correctly for a partitioned table. My proposed solution [2] was to disable bulk-insert mode completely for partitioned tables. But it may not be desirable performance-wise (for example, COPY FROM on partitioned tables would have same performance as INSERT, whereas in case of regular tables, COPY FROM is much faster than INSERT due to the bulk insert mode). I will propose another solution for the same. Meanwhile, could you please try your test again with the patch posted at [1], although it will not likely be committed as the fix for this issue. Thanks, Amit [1] https://www.postgresql.org/message-id/CAFmBtr32FDOqofo8yG-4mjzL1HnYHxXK5S9OGFJ%3D%3DcJpgEW4vA%40mail.gmail.com [2] https://www.postgresql.org/message-id/101e2c2d-45d6-fb1a-468c-d3f67572a2f3%40lab.ntt.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On Tue, Dec 27, 2016 at 3:59 AM, Amit Langotewrote: > Patches 0001 to 0006 unchanged. Committed 0001 earlier, as mentioned in a separate email. Committed 0002 and part of 0003. But I'm skeptical that the as-patched-by-0003 logic in generate_partition_qual() makes sense. You do this: result = list_concat(generate_partition_qual(parent), copyObject(rel->rd_partcheck)); /* Mark Vars with correct attnos */ result = map_partition_varattnos(result, rel, parent); But that has the effect of applying map_partition_varattnos to everything in rel->rd_partcheck in addition to applying it to everything returned by generate_partition_qual() on the parent, which doesn't seem right. Also, don't we want to do map_partition_varattnos() just ONCE, rather than on every call to this function? I think maybe your concern is that the parent might be changed without a relcache flush on the child, but I don't quite see how that could happen. If the parent's tuple descriptor changes, surely the child's tuple descriptor would have to be altered at the same time... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On Tue, Dec 27, 2016 at 8:41 PM, Amit Langotewrote: > On 2016/12/27 19:07, Amit Langote wrote: >> Attached should fix that. > > Here are the last two patches with additional information like other > patches. Forgot to do that yesterday. 0001 has the disadvantage that get_partition_for_tuple() acquires a side effect. That seems undesirable. At the least, it needs to be documented in the function's header comment. It's unclear to me why we need to do 0002. It doesn't seem like it should be necessary, it doesn't seem like a good idea, and the commit message you proposed is uninformative. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On Mon, Dec 26, 2016 at 5:46 AM, Amit Langotewrote: > On 2016/12/23 8:08, Robert Haas wrote: >> On Thu, Dec 22, 2016 at 3:35 AM, Amit Langote >> wrote: >>> While working on that, I discovered yet-another-bug having to do with the >>> tuple descriptor that's used as we route a tuple down a partition tree. If >>> attnums of given key attribute(s) are different on different levels, it >>> would be incorrect to use the original slot's (one passed by ExecInsert()) >>> tuple descriptor to inspect the original slot's heap tuple, as we go down >>> the tree. It might cause spurious "partition not found" at some level due >>> to looking at incorrect field in the input tuple because of using the >>> wrong tuple descriptor (root table's attnums not always same as other >>> partitioned tables in the tree). Patch 0001 fixes that including a test. >> >> I committed this, but I'm a bit uncomfortable with it: should the >> TupleTableSlot be part of the ModifyTableState rather than the EState? > > Done that way in 0001 of the attached patches. So, instead of making the > standalone partition_tuple_slot a field of EState (with the actual > TupleTableSlot in its tupleTable), it is now allocated within > ModifyTableState and CopyState, and released when ModifyTable node or > CopyFrom finishes, respectively. I dropped some comments from this and committed it. They were formatted in a way that wouldn't survive pgindent. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/04 16:31, 高增琦 wrote: > Server crash(failed assertion) when two "insert" in one SQL: > > Step to reproduce: > create table t(a int, b int) partition by range(a); > create table t_p1 partition of t for values from (1) to (100); > create table t_p2 partition of t for values from (100) to (200); > create table t_p3 partition of t for values from (200) to (300); > > create table b(a int, b int); > with a(a,b) as(insert into t values(3, 3) returning a, b) insert into b > select * from a; > > Please check it. Thanks for testing! This should be fixed by a patch I posted earlier (Try the patch 0001 of the patches posted at [1]). Robert did express his concern [2] about the approach used in my patch that was committed as 2ac3ef7a01 [3]; your test demonstrates that it wasn't a good approach after all. Regards, Amit [1] https://www.postgresql.org/message-id/f6f3a214-5bb5-aa8c-f82c-c720348cf086%40lab.ntt.co.jp [2] https://www.postgresql.org/message-id/CA%2BTgmoYUfs8peo-p%2BStw7afTdXqNWv_S4dx_6AWc-Y_ZrGWZbQ%40mail.gmail.com [3] https://git.postgresql.org/gitweb/?p=postgresql.git=commit=2ac3ef7a01df859c62d0a02333b646d65eaec5ff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
Server crash(failed assertion) when two "insert" in one SQL: Step to reproduce: create table t(a int, b int) partition by range(a); create table t_p1 partition of t for values from (1) to (100); create table t_p2 partition of t for values from (100) to (200); create table t_p3 partition of t for values from (200) to (300); create table b(a int, b int); with a(a,b) as(insert into t values(3, 3) returning a, b) insert into b select * from a; Please check it. 2017-01-04 14:11 GMT+08:00 Rajkumar Raghuwanshi < rajkumar.raghuwan...@enterprisedb.com>: > On Wed, Jan 4, 2017 at 10:37 AM, Amit Langote < > langote_amit...@lab.ntt.co.jp> wrote: > >> On 2017/01/03 19:04, Rajkumar Raghuwanshi wrote: >> > On Tue, Dec 27, 2016 at 3:24 PM, Amit Langote wrote: >> >> >> >> Attached patch should fix the same. >> > >> > I have applied attached patch, server crash for range is fixed, but >> still >> > getting crash for multi-level list partitioning insert. >> > >> > postgres=# CREATE TABLE test_ml_l (a int, b int, c varchar) PARTITION BY >> > LIST(c); >> >> [ ... ] >> >> > postgres=# INSERT INTO test_ml_l SELECT i, i, to_char(i/50, 'FM') >> FROM >> > generate_series(0, 599, 2) i; >> > server closed the connection unexpectedly >> > This probably means the server terminated abnormally >> > before or while processing the request. >> > The connection to the server was lost. Attempting reset: Failed. >> >> Hm, that's odd. I tried your new example, but didn't get the crash. >> >> Thanks, >> Amit >> > > Thanks, I have pulled latest sources from git, and then applied patch > "fix-wrong-ecxt_scantuple-crash.patch", Not getting crash now, may be I > have missed something last time. > > -- GaoZengqi pgf...@gmail.com zengqi...@gmail.com
Re: [HACKERS] Declarative partitioning - another take
On Wed, Jan 4, 2017 at 10:37 AM, Amit Langotewrote: > On 2017/01/03 19:04, Rajkumar Raghuwanshi wrote: > > On Tue, Dec 27, 2016 at 3:24 PM, Amit Langote wrote: > >> > >> Attached patch should fix the same. > > > > I have applied attached patch, server crash for range is fixed, but still > > getting crash for multi-level list partitioning insert. > > > > postgres=# CREATE TABLE test_ml_l (a int, b int, c varchar) PARTITION BY > > LIST(c); > > [ ... ] > > > postgres=# INSERT INTO test_ml_l SELECT i, i, to_char(i/50, 'FM') > FROM > > generate_series(0, 599, 2) i; > > server closed the connection unexpectedly > > This probably means the server terminated abnormally > > before or while processing the request. > > The connection to the server was lost. Attempting reset: Failed. > > Hm, that's odd. I tried your new example, but didn't get the crash. > > Thanks, > Amit > Thanks, I have pulled latest sources from git, and then applied patch "fix-wrong-ecxt_scantuple-crash.patch", Not getting crash now, may be I have missed something last time.
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/03 19:04, Rajkumar Raghuwanshi wrote: > On Tue, Dec 27, 2016 at 3:24 PM, Amit Langote wrote: >> >> Attached patch should fix the same. > > I have applied attached patch, server crash for range is fixed, but still > getting crash for multi-level list partitioning insert. > > postgres=# CREATE TABLE test_ml_l (a int, b int, c varchar) PARTITION BY > LIST(c); [ ... ] > postgres=# INSERT INTO test_ml_l SELECT i, i, to_char(i/50, 'FM') FROM > generate_series(0, 599, 2) i; > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: Failed. Hm, that's odd. I tried your new example, but didn't get the crash. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On Tue, Dec 27, 2016 at 3:24 PM, Amit Langotewrote: > On 2016/12/27 18:30, Rajkumar Raghuwanshi wrote: > > Hi Amit, > > > > I have pulled latest sources from git and tried to create multi-level > > partition, getting a server crash, below are steps to reproduce. please > > check if it is reproducible in your machine also. > > > > [ ... ] > > > postgres=# INSERT INTO test_ml SELECT i, i, to_char(i, 'FM') FROM > > generate_series(0, 599, 2) i; > > server closed the connection unexpectedly > > This probably means the server terminated abnormally > > before or while processing the request. > > The connection to the server was lost. Attempting reset: Failed. > > Thanks for the example. Looks like there was an oversight in my patch > that got committed as 2ac3ef7a01 [1]. > > Attached patch should fix the same. > > Thanks, > Amit > > [1] > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h= > 2ac3ef7a01df859c62d0a02333b646d65eaec5ff > Hi Amit, I have applied attached patch, server crash for range is fixed, but still getting crash for multi-level list partitioning insert. 
postgres=# CREATE TABLE test_ml_l (a int, b int, c varchar) PARTITION BY LIST(c); CREATE TABLE postgres=# CREATE TABLE test_ml_l_p1 PARTITION OF test_ml_l FOR VALUES IN ('', '0003', '0004', '0010') PARTITION BY LIST (c); CREATE TABLE postgres=# CREATE TABLE test_ml_l_p1_p1 PARTITION OF test_ml_l_p1 FOR VALUES IN ('', '0003'); CREATE TABLE postgres=# CREATE TABLE test_ml_l_p1_p2 PARTITION OF test_ml_l_p1 FOR VALUES IN ('0004', '0010'); CREATE TABLE postgres=# CREATE TABLE test_ml_l_p2 PARTITION OF test_ml_l FOR VALUES IN ('0001', '0005', '0002', '0009') PARTITION BY LIST (c); CREATE TABLE postgres=# CREATE TABLE test_ml_l_p2_p1 PARTITION OF test_ml_l_p2 FOR VALUES IN ('0001', '0005'); CREATE TABLE postgres=# CREATE TABLE test_ml_l_p2_p2 PARTITION OF test_ml_l_p2 FOR VALUES IN ('0002', '0009'); CREATE TABLE postgres=# CREATE TABLE test_ml_l_p3 PARTITION OF test_ml_l FOR VALUES IN ('0006', '0007', '0008', '0011') PARTITION BY LIST (ltrim(c,'A')); CREATE TABLE postgres=# CREATE TABLE test_ml_l_p3_p1 PARTITION OF test_ml_l_p3 FOR VALUES IN ('0006', '0007'); CREATE TABLE postgres=# CREATE TABLE test_ml_l_p3_p2 PARTITION OF test_ml_l_p3 FOR VALUES IN ('0008', '0011'); CREATE TABLE postgres=# INSERT INTO test_ml_l SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 599, 2) i; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. Thanks & Regards, Rajkumar Raghuwanshi QMG, EnterpriseDB Corporation
Re: [HACKERS] Declarative partitioning - another take
On 2016/12/27 19:07, Amit Langote wrote: > Attached should fix that. Here are the last two patches with additional information like other patches. Forgot to do that yesterday. Thanks, Amit >From 5a82b4caa6cec7845eb48e0397fab49c74b8dd98 Mon Sep 17 00:00:00 2001 From: amitDate: Wed, 28 Dec 2016 10:10:26 +0900 Subject: [PATCH 1/2] Set ecxt_scantuple correctly for tuple-routing In 2ac3ef7a01df859c62d0a02333b646d65eaec5ff, we changed things so that it's possible for a different TupleTableSlot to be used for partitioned tables at successive levels. If we do end up changing the slot from the original, we must update ecxt_scantuple to point to the new one for partition key expressions to be computed correctly. Also update the regression tests so that the code manipulating ecxt_scantuple is covered. Reported by: Rajkumar Raghuwanshi Patch by: Amit Langote Reports: https://www.postgresql.org/message-id/CAKcux6%3Dm1qyqB2k6cjniuMMrYXb75O-MB4qGQMu8zg-iGGLjDw%40mail.gmail.com --- src/backend/catalog/partition.c | 2 ++ src/backend/executor/execMain.c | 2 -- src/test/regress/expected/insert.out | 2 +- src/test/regress/sql/insert.sql | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index fca874752f..f9daf8052d 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -1647,6 +1647,7 @@ get_partition_for_tuple(PartitionDispatch *pd, PartitionDesc partdesc = parent->partdesc; TupleTableSlot *myslot = parent->tupslot; TupleConversionMap *map = parent->tupmap; + ExprContext *econtext = GetPerTupleExprContext(estate); /* Quick exit */ if (partdesc->nparts == 0) @@ -1667,6 +1668,7 @@ get_partition_for_tuple(PartitionDispatch *pd, } /* Extract partition key from tuple */ + econtext->ecxt_scantuple = slot; FormPartitionKeyDatum(parent, slot, estate, values, isnull); if (key->strategy == PARTITION_STRATEGY_RANGE) diff --git a/src/backend/executor/execMain.c 
b/src/backend/executor/execMain.c index bca34a509c..1d699c1dab 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -3107,9 +3107,7 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd, { int result; Oid failed_at; - ExprContext *econtext = GetPerTupleExprContext(estate); - econtext->ecxt_scantuple = slot; result = get_partition_for_tuple(pd, slot, estate, _at); if (result < 0) { diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out index 49f667b119..ae54625034 100644 --- a/src/test/regress/expected/insert.out +++ b/src/test/regress/expected/insert.out @@ -302,7 +302,7 @@ drop cascades to table part_ee_ff1 drop cascades to table part_ee_ff2 -- more tests for certain multi-level partitioning scenarios create table p (a int, b int) partition by range (a, b); -create table p1 (b int, a int not null) partition by range (b); +create table p1 (b int not null, a int not null) partition by range ((b+0)); create table p11 (like p1); alter table p11 drop a; alter table p11 add a int; diff --git a/src/test/regress/sql/insert.sql b/src/test/regress/sql/insert.sql index 08dc068de8..9d3a34073c 100644 --- a/src/test/regress/sql/insert.sql +++ b/src/test/regress/sql/insert.sql @@ -173,7 +173,7 @@ drop table list_parted cascade; -- more tests for certain multi-level partitioning scenarios create table p (a int, b int) partition by range (a, b); -create table p1 (b int, a int not null) partition by range (b); +create table p1 (b int not null, a int not null) partition by range ((b+0)); create table p11 (like p1); alter table p11 drop a; alter table p11 add a int; -- 2.11.0 >From 798c51254563735dff843c71f1bbf34b969d8162 Mon Sep 17 00:00:00 2001 From: amit Date: Wed, 28 Dec 2016 10:28:37 +0900 Subject: [PATCH 2/2] No BulkInsertState when tuple-routing is in action MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In future, we might consider alleviating this 
restriction by allocating a BulkInsertState per partition. Reported by: 高增琦 Patch by: Amit Langote (with pointers from 高增琦) Reports: https://www.postgresql.org/message-id/CAFmBtr32FDOqofo8yG-4mjzL1HnYHxXK5S9OGFJ%3D%3DcJpgEW4vA%40mail.gmail.com --- src/backend/commands/copy.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index aa25a23336..e9bf4afa44 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -2290,7 +2290,7 @@ CopyFrom(CopyState cstate) ErrorContextCallback errcallback; CommandId mycid = GetCurrentCommandId(true); int hi_options = 0; /* start with default heap_insert options */ - BulkInsertState bistate; + BulkInsertState bistate = NULL; uint64 processed = 0; bool useHeapMultiInsert; int
Re: [HACKERS] Declarative partitioning - another take
On 2016/12/27 18:48, 高增琦 wrote: > Hi , > > I tried "COPY FROM" in the git version. It inserts rows to wrong partition. > > step to reproduce: > create table t(a int, b int) partition by range(a); > create table t_p1 partition of t for values from (1) to (100); > create table t_p2 partition of t for values from (100) to (200); > create table t_p3 partition of t for values from (200) to (300); > insert into t values(1,1); > insert into t values(101,101); > insert into t values(201,201); > copy (select * from t) to '/tmp/test2.txt'; > copy t from '/tmp/test2.txt'; > select * from t_p1; > > result: > postgres=# select * from t_p1; > a | b > -+- >1 | 1 >1 | 1 > 101 | 101 > 201 | 201 > (4 rows) > > I think the argument "BulkInsertState" used in CopyFrom/heap_insert > is related to this problem. Please check it. You're quite right. Attached should fix that. Thanks, Amit diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index aa25a23336..e9bf4afa44 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -2290,7 +2290,7 @@ CopyFrom(CopyState cstate) ErrorContextCallback errcallback; CommandId mycid = GetCurrentCommandId(true); int hi_options = 0; /* start with default heap_insert options */ - BulkInsertState bistate; + BulkInsertState bistate = NULL; uint64 processed = 0; bool useHeapMultiInsert; int nBufferedTuples = 0; @@ -2482,7 +2482,8 @@ CopyFrom(CopyState cstate) values = (Datum *) palloc(tupDesc->natts * sizeof(Datum)); nulls = (bool *) palloc(tupDesc->natts * sizeof(bool)); - bistate = GetBulkInsertState(); + if (useHeapMultiInsert) + bistate = GetBulkInsertState(); econtext = GetPerTupleExprContext(estate); /* Set up callback to identify error line number */ @@ -2707,7 +2708,8 @@ CopyFrom(CopyState cstate) /* Done, clean up */ error_context_stack = errcallback.previous; - FreeBulkInsertState(bistate); + if (bistate != NULL) + FreeBulkInsertState(bistate); MemoryContextSwitchTo(oldcontext); -- Sent via pgsql-hackers mailing 
list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On 2016/12/27 18:30, Rajkumar Raghuwanshi wrote: > Hi Amit, > > I have pulled latest sources from git and tried to create multi-level > partition, getting a server crash, below are steps to reproduce. please > check if it is reproducible in your machine also. > [ ... ] > postgres=# INSERT INTO test_ml SELECT i, i, to_char(i, 'FM') FROM > generate_series(0, 599, 2) i; > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: Failed. Thanks for the example. Looks like there was an oversight in my patch that got committed as 2ac3ef7a01 [1]. Attached patch should fix the same. Thanks, Amit [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ac3ef7a01df859c62d0a02333b646d65eaec5ff diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index fca874752f..f9daf8052d 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -1647,6 +1647,7 @@ get_partition_for_tuple(PartitionDispatch *pd, PartitionDesc partdesc = parent->partdesc; TupleTableSlot *myslot = parent->tupslot; TupleConversionMap *map = parent->tupmap; + ExprContext *econtext = GetPerTupleExprContext(estate); /* Quick exit */ if (partdesc->nparts == 0) @@ -1667,6 +1668,7 @@ get_partition_for_tuple(PartitionDispatch *pd, } /* Extract partition key from tuple */ + econtext->ecxt_scantuple = slot; FormPartitionKeyDatum(parent, slot, estate, values, isnull); if (key->strategy == PARTITION_STRATEGY_RANGE) diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index bca34a509c..1d699c1dab 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -3107,9 +3107,7 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd, { int result; Oid failed_at; - ExprContext *econtext = GetPerTupleExprContext(estate); - econtext->ecxt_scantuple = slot; result = 
get_partition_for_tuple(pd, slot, estate, _at); if (result < 0) { -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
Hi , I tried "COPY FROM" in the git version. It inserts rows to wrong partition. step to reproduce: create table t(a int, b int) partition by range(a); create table t_p1 partition of t for values from (1) to (100); create table t_p2 partition of t for values from (100) to (200); create table t_p3 partition of t for values from (200) to (300); insert into t values(1,1); insert into t values(101,101); insert into t values(201,201); copy (select * from t) to '/tmp/test2.txt'; copy t from '/tmp/test2.txt'; select * from t_p1; result: postgres=# select * from t_p1; a | b -+- 1 | 1 1 | 1 101 | 101 201 | 201 (4 rows) I think the argument "BulkInsertState" used in CopyFrom/heap_insert is related to this problem. Please check it. Thanks. 2016-12-27 17:30 GMT+08:00 Rajkumar Raghuwanshi < rajkumar.raghuwan...@enterprisedb.com>: > Hi Amit, > > I have pulled latest sources from git and tried to create multi-level > partition, getting a server crash, below are steps to reproduce. please > check if it is reproducible in your machine also. 
> > postgres=# CREATE TABLE test_ml (a int, b int, c varchar) PARTITION BY > RANGE(a); > CREATE TABLE > postgres=# CREATE TABLE test_ml_p1 PARTITION OF test_ml FOR VALUES FROM > (0) TO (250) PARTITION BY RANGE (b); > CREATE TABLE > postgres=# CREATE TABLE test_ml_p1_p1 PARTITION OF test_ml_p1 FOR VALUES > FROM (0) TO (100); > CREATE TABLE > postgres=# CREATE TABLE test_ml_p1_p2 PARTITION OF test_ml_p1 FOR VALUES > FROM (100) TO (250); > CREATE TABLE > postgres=# CREATE TABLE test_ml_p2 PARTITION OF test_ml FOR VALUES FROM > (250) TO (500) PARTITION BY RANGE (c); > CREATE TABLE > postgres=# CREATE TABLE test_ml_p2_p1 PARTITION OF test_ml_p2 FOR VALUES > FROM ('0250') TO ('0400'); > CREATE TABLE > postgres=# CREATE TABLE test_ml_p2_p2 PARTITION OF test_ml_p2 FOR VALUES > FROM ('0400') TO ('0500'); > CREATE TABLE > postgres=# CREATE TABLE test_ml_p3 PARTITION OF test_ml FOR VALUES FROM > (500) TO (600) PARTITION BY RANGE ((b + a)); > CREATE TABLE > postgres=# CREATE TABLE test_ml_p3_p1 PARTITION OF test_ml_p3 FOR VALUES > FROM (1000) TO (1100); > CREATE TABLE > postgres=# CREATE TABLE test_ml_p3_p2 PARTITION OF test_ml_p3 FOR VALUES > FROM (1100) TO (1200); > CREATE TABLE > postgres=# INSERT INTO test_ml SELECT i, i, to_char(i, 'FM') FROM > generate_series(0, 599, 2) i; > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: Failed. > > Thanks & Regards, > Rajkumar Raghuwanshi > QMG, EnterpriseDB Corporation > > > -- GaoZengqi pgf...@gmail.com zengqi...@gmail.com
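The suspicion above is right in spirit: CopyFrom passes a bulk-insert state to heap_insert, and state cached while inserting into one target relation must not be carried over after a tuple is routed to a different partition. The failure mode can be pictured with a minimal sketch (plain Python; all names here are illustrative, the real fix lives in the C code of copy.c/heapam.c):

```python
class BulkInsertState:
    """Caches the current insertion target, much as heap_insert's
    bulk-insert state caches the buffer of the relation it last wrote to."""
    def __init__(self):
        self.target = None

def route(bounds, key):
    """Find the range partition whose [lo, hi) bound contains key."""
    for name, (lo, hi) in bounds.items():
        if lo <= key < hi:
            return name
    raise LookupError("partition not found")

def copy_from(rows, bounds, reset_bulk_state_on_switch):
    """Load rows; the buggy variant keeps writing to the stale cached target."""
    tables = {name: [] for name in bounds}
    bistate = BulkInsertState()
    for row in rows:
        part = route(bounds, row[0])
        # Fixed behavior: retarget whenever the destination partition changes.
        if bistate.target is None or reset_bulk_state_on_switch:
            bistate.target = part
        tables[bistate.target].append(row)
    return tables
```

With bounds {'t_p1': (1, 100), 't_p2': (100, 200)} and rows [(1, 1), (101, 101)], the buggy variant (reset flag off) puts both rows into t_p1, mirroring the reported t_p1 contents; with the flag on, each row lands in its own partition.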
Re: [HACKERS] Declarative partitioning - another take
Hi Amit, I have pulled latest sources from git and tried to create multi-level partition, getting a server crash, below are steps to reproduce. please check if it is reproducible in your machine also. postgres=# CREATE TABLE test_ml (a int, b int, c varchar) PARTITION BY RANGE(a); CREATE TABLE postgres=# CREATE TABLE test_ml_p1 PARTITION OF test_ml FOR VALUES FROM (0) TO (250) PARTITION BY RANGE (b); CREATE TABLE postgres=# CREATE TABLE test_ml_p1_p1 PARTITION OF test_ml_p1 FOR VALUES FROM (0) TO (100); CREATE TABLE postgres=# CREATE TABLE test_ml_p1_p2 PARTITION OF test_ml_p1 FOR VALUES FROM (100) TO (250); CREATE TABLE postgres=# CREATE TABLE test_ml_p2 PARTITION OF test_ml FOR VALUES FROM (250) TO (500) PARTITION BY RANGE (c); CREATE TABLE postgres=# CREATE TABLE test_ml_p2_p1 PARTITION OF test_ml_p2 FOR VALUES FROM ('0250') TO ('0400'); CREATE TABLE postgres=# CREATE TABLE test_ml_p2_p2 PARTITION OF test_ml_p2 FOR VALUES FROM ('0400') TO ('0500'); CREATE TABLE postgres=# CREATE TABLE test_ml_p3 PARTITION OF test_ml FOR VALUES FROM (500) TO (600) PARTITION BY RANGE ((b + a)); CREATE TABLE postgres=# CREATE TABLE test_ml_p3_p1 PARTITION OF test_ml_p3 FOR VALUES FROM (1000) TO (1100); CREATE TABLE postgres=# CREATE TABLE test_ml_p3_p2 PARTITION OF test_ml_p3 FOR VALUES FROM (1100) TO (1200); CREATE TABLE postgres=# INSERT INTO test_ml SELECT i, i, to_char(i, 'FM') FROM generate_series(0, 599, 2) i; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. Thanks & Regards, Rajkumar Raghuwanshi QMG, EnterpriseDB Corporation
Re: [HACKERS] Declarative partitioning - another take
On 2016/12/26 19:46, Amit Langote wrote: > (Perhaps, the following should be its own new thread) > > I noticed that ExecProcessReturning() doesn't work properly after tuple > routing (example shows how returning tableoid currently fails but I > mention some other issues below): > > create table p (a int, b int) partition by range (a); > create table p1 partition of p for values from (1) to (10); > insert into p values (1) returning tableoid::regclass, *; > tableoid | a | b > --+---+--- > -| 1 | > (1 row) > > INSERT 0 1 > > I tried to fix that in 0007 to get: > > insert into p values (1) returning tableoid::regclass, *; > tableoid | a | b > --+---+--- > p| 1 | > (1 row) > > INSERT 0 1 > > But I think it *may* be wrong to return the root table OID for tuples > inserted into leaf partitions, because with select we get partition OIDs: > > select tableoid::regclass, * from p; > tableoid | a | b > --+---+--- > p1 | 1 | > (1 row) > > If so, that means we should build the projection info (corresponding to > the returning list) for each target partition somehow. ISTM, that's going > to have to be done within the planner by appropriate inheritance > translation of the original returning targetlist. Turns out getting the 2nd result may not require planner tweaks after all. Unless I'm missing something, translation of varattnos of the RETURNING target list can be done as late as ExecInitModifyTable() for the insert case, unlike update/delete (which do require planner's attention). I updated the patch 0007 to implement the same, including the test. While doing that, I realized map_partition_varattnos introduced in 0003 is rather restrictive in its applicability, because it assumes varno = 1 for the expressions it accepts as input for the mapping. Mapping returning (target) list required modifying map_partition_varattnos to accept target_varno as additional argument. 
That way, we can map arbitrary expressions from the parent's attribute numbers to a partition's attribute numbers, not just partition constraints. Patches 0001 to 0006 unchanged. Thanks, Amit From fcfe08948d31802547e93ac6551873afd554bc36 Mon Sep 17 00:00:00 2001 From: amit Date: Mon, 26 Dec 2016 11:53:19 +0900 Subject: [PATCH 1/7] Allocate partition_tuple_slot in respective nodes ...instead of making it part of EState and its tuple table. Respective nodes means ModifyTableState and CopyState for now. Reported by: n/a Patch by: Amit Langote Reports: n/a --- src/backend/commands/copy.c| 30 +- src/backend/executor/execMain.c| 12 src/backend/executor/nodeModifyTable.c | 17 - src/include/executor/executor.h| 1 + src/include/nodes/execnodes.h | 6 +++--- 5 files changed, 41 insertions(+), 25 deletions(-) diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index aa25a23336..e5a0f1bf80 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -161,11 +161,18 @@ typedef struct CopyStateData ExprState **defexprs; /* array of default att expressions */ bool volatile_defexprs; /* is any of defexprs volatile? */ List *range_table; + PartitionDispatch *partition_dispatch_info; - int num_dispatch; - int num_partitions; - ResultRelInfo *partitions; + /* Tuple-routing support info */ + int num_dispatch; /* Number of entries in the above array */ + int num_partitions; /* Number of members in the following + * arrays */ + ResultRelInfo *partitions; /* Per partition result relation */ TupleConversionMap **partition_tupconv_maps; + /* Per partition tuple conversion map */ + TupleTableSlot *partition_tuple_slot; + /* Slot used to manipulate a tuple after + * it is routed to a partition */ /* * These variables are used to reduce overhead in textual COPY FROM.
@@ -1409,6 +1416,7 @@ BeginCopy(ParseState *pstate, PartitionDispatch *partition_dispatch_info; ResultRelInfo *partitions; TupleConversionMap **partition_tupconv_maps; + TupleTableSlot *partition_tuple_slot; int num_parted, num_partitions; @@ -1416,12 +1424,14 @@ BeginCopy(ParseState *pstate, &partition_dispatch_info, &partitions, &partition_tupconv_maps, + &partition_tuple_slot, &num_parted, &num_partitions); cstate->partition_dispatch_info = partition_dispatch_info; cstate->num_dispatch = num_parted; cstate->partitions = partitions; cstate->num_partitions = num_partitions; cstate->partition_tupconv_maps = partition_tupconv_maps; + cstate->partition_tuple_slot = partition_tuple_slot; } } else @@ -2436,15 +2446,6 @@ CopyFrom(CopyState cstate) estate->es_trig_tuple_slot = ExecInitExtraTupleSlot(estate); /* - * Initialize a dedicated slot to manipulate tuples of any given - * partition's rowtype. - */ - if
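The varattno translation discussed above can be pictured with a toy expression tree. This is an illustrative Python sketch, not the C implementation: a Var is a ('var', varno, attno) tuple, the map is built by attribute name, and target_varno selects which Vars get remapped (the generalization the modified map_partition_varattnos provides):

```python
def build_attno_map(parent_atts, child_atts):
    """For each parent attno (1-based), the matching child attno, by name."""
    child_by_name = {name: i + 1 for i, name in enumerate(child_atts)}
    return [child_by_name[name] for name in parent_atts]

def map_varattnos(expr, target_varno, attno_map):
    """Rewrite ('var', varno, attno) leaves of a nested-tuple expression,
    touching only Vars whose varno equals target_varno."""
    if isinstance(expr, tuple) and expr and expr[0] == "var":
        _, varno, attno = expr
        if varno == target_varno:
            return ("var", varno, attno_map[attno - 1])
        return expr
    if isinstance(expr, tuple):
        return tuple(map_varattnos(e, target_varno, attno_map) for e in expr)
    return expr
```

For a parent with columns (a, b) and a partition declared as (b, a), the map is [2, 1], so a RETURNING expression referencing parent column 1 (a) is rewritten to reference partition column 2.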
Re: [HACKERS] Declarative partitioning - another take
Sorry about the delay in replying. On 2016/12/23 8:08, Robert Haas wrote: > On Thu, Dec 22, 2016 at 3:35 AM, Amit Langote >wrote: >> While working on that, I discovered yet-another-bug having to do with the >> tuple descriptor that's used as we route a tuple down a partition tree. If >> attnums of given key attribute(s) are different on different levels, it >> would be incorrect to use the original slot's (one passed by ExecInsert()) >> tuple descriptor to inspect the original slot's heap tuple, as we go down >> the tree. It might cause spurious "partition not found" at some level due >> to looking at incorrect field in the input tuple because of using the >> wrong tuple descriptor (root table's attnums not always same as other >> partitioned tables in the tree). Patch 0001 fixes that including a test. > > I committed this, but I'm a bit uncomfortable with it: should the > TupleTableSlot be part of the ModifyTableState rather than the EState? Done that way in 0001 of the attached patches. So, instead of making the standalone partition_tuple_slot a field of EState (with the actual TupleTableSlot in its tupleTable), it is now allocated within ModifyTableState and CopyState, and released when ModifyTable node or CopyFrom finishes, respectively. >> It also addresses the problem I mentioned previously that once >> tuple-routing is done, we failed to switch to a slot with the leaf >> partition's tupdesc (IOW, continued to use the original slot with root >> table's tupdesc causing spurious failures due to differences in attums >> between the leaf partition and the root table). >> >> Further patches 0002, 0003 and 0004 fix bugs that I sent one-big-patch for >> in my previous message. Each patch has a test for the bug it's meant to fix. > > Regarding 0002, I think that this is kind of a strange fix. Wouldn't > it be better to get hold of the original tuple instead of reversing > the conversion? 
And what of the idea of avoiding the conversion in > the (probably very common) case where we can? To get hold of the original tuple, how about adding an argument orig_slot to ExecConstraints()? I've implemented that approach in the new 0002. Regarding the possibility of avoiding the conversion in very common cases, I think that could be done considering the following: If the mapping from the attribute numbers of the parent table to that of a child table is an identity map, we don't need to convert tuples. Currently however, convert_tuples_by_name() also requires tdtypeid of the input and output TupleDescs to be equal. The reason cited for that is that we may fail to "inject the right OID into the tuple datum" if the types don't match. In case of partitioning, hasoid status must match between the parent and its partitions at all times, so the aforementioned condition is satisfied without requiring that tdtypeid are same. And oid column (if present) is always located at a given position in HeapTuple, so need not map that. Based on the above argument, patch 0006 teaches convert_tuples_by_name() to *optionally* not require tdtypeid of input and output tuple descriptors to be equal. It's implemented by introducing a new argument to convert_tuples_by_name() named 'consider_typeid'. We pass 'false' only for the partitioning cases. 
(Perhaps, the following should be its own new thread) I noticed that ExecProcessReturning() doesn't work properly after tuple routing (example shows how returning tableoid currently fails but I mention some other issues below): create table p (a int, b int) partition by range (a); create table p1 partition of p for values from (1) to (10); insert into p values (1) returning tableoid::regclass, *; tableoid | a | b --+---+--- -| 1 | (1 row) INSERT 0 1 I tried to fix that in 0007 to get: insert into p values (1) returning tableoid::regclass, *; tableoid | a | b --+---+--- p| 1 | (1 row) INSERT 0 1 But I think it *may* be wrong to return the root table OID for tuples inserted into leaf partitions, because with select we get partition OIDs: select tableoid::regclass, * from p; tableoid | a | b --+---+--- p1 | 1 | (1 row) If so, that means we should build the projection info (corresponding to the returning list) for each target partition somehow. ISTM, that's going to have to be done within the planner by appropriate inheritance translation of the original returning targetlist. Thanks, Amit >From 89f8740195189cc77391bdb844f5092c0440f061 Mon Sep 17 00:00:00 2001 From: amit Date: Mon, 26 Dec 2016 11:53:19 +0900 Subject: [PATCH 1/7] Allocate partition_tuple_slot in respective nodes ...instead of making it part of EState and its tuple table. Respective nodes means ModifyTableState and CopyState for now. Reported by: n/a Patch by: Amit Langote Reports: n/a --- src/backend/commands/copy.c| 30 +- src/backend/executor/execMain.c
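The relaxation argued for above can be sketched as follows (a hedged Python model, not the C code of convert_tuples_by_name): build the by-name attribute map first, and return no converter at all when the map turns out to be the identity — the case that the proposed 'consider_typeid' argument makes reachable for partitions, since their hasoid status always matches the parent's:

```python
def attribute_map(indesc, outdesc):
    """Map each output column (by name) to its 1-based input position;
    descriptors are modeled as lists of (name, type) pairs."""
    in_pos = {name: i + 1 for i, (name, _) in enumerate(indesc)}
    amap = []
    for name, typ in outdesc:
        pos = in_pos.get(name)
        if pos is None or indesc[pos - 1][1] != typ:
            raise ValueError("could not convert row type: column " + name)
        amap.append(pos)
    return amap

def convert_tuples_by_name(indesc, outdesc):
    """Return a conversion function, or None when no conversion is needed
    because the map is the identity (i-th output comes from i-th input)."""
    amap = attribute_map(indesc, outdesc)
    if len(indesc) == len(outdesc) and all(p == i + 1 for i, p in enumerate(amap)):
        return None
    return lambda tup: tuple(tup[p - 1] for p in amap)
```

Returning None for the identity map is the optimization: callers skip per-row conversion entirely, which is the common case when a partition's column order matches its parent's.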
Re: [HACKERS] Declarative partitioning - another take
On Thu, Dec 22, 2016 at 3:35 AM, Amit Langote wrote: > While working on that, I discovered yet-another-bug having to do with the > tuple descriptor that's used as we route a tuple down a partition tree. If > attnums of given key attribute(s) are different on different levels, it > would be incorrect to use the original slot's (one passed by ExecInsert()) > tuple descriptor to inspect the original slot's heap tuple, as we go down > the tree. It might cause spurious "partition not found" at some level due > to looking at incorrect field in the input tuple because of using the > wrong tuple descriptor (root table's attnums not always same as other > partitioned tables in the tree). Patch 0001 fixes that including a test. I committed this, but I'm a bit uncomfortable with it: should the TupleTableSlot be part of the ModifyTableState rather than the EState? > It also addresses the problem I mentioned previously that once > tuple-routing is done, we failed to switch to a slot with the leaf > partition's tupdesc (IOW, continued to use the original slot with root > table's tupdesc causing spurious failures due to differences in attnums > between the leaf partition and the root table). > > Further patches 0002, 0003 and 0004 fix bugs that I sent one-big-patch for > in my previous message. Each patch has a test for the bug it's meant to fix. Regarding 0002, I think that this is kind of a strange fix. Wouldn't it be better to get hold of the original tuple instead of reversing the conversion? And what of the idea of avoiding the conversion in the (probably very common) case where we can? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On 2016/12/22 1:50, Robert Haas wrote: > On Wed, Dec 21, 2016 at 5:33 AM, Amit Langote > wrote: >> Breaking changes into multiple commits/patches does not seem to work for >> adding regression tests. So, I've combined multiple patches into a single >> patch which is now patch 0002 in the attached set of patches. > > Ugh, seriously? It's fine to combine closely related bug fixes but > not all of these are. I don't see why you can't add some regression > tests in one patch and then add some more in the next patch. I managed to do that this time around with the attached set of patches. Guess I gave up too easily in the previous attempt. While working on that, I discovered yet-another-bug having to do with the tuple descriptor that's used as we route a tuple down a partition tree. If attnums of given key attribute(s) are different on different levels, it would be incorrect to use the original slot's (one passed by ExecInsert()) tuple descriptor to inspect the original slot's heap tuple, as we go down the tree. It might cause spurious "partition not found" at some level due to looking at incorrect field in the input tuple because of using the wrong tuple descriptor (root table's attnums not always same as other partitioned tables in the tree). Patch 0001 fixes that including a test. It also addresses the problem I mentioned previously that once tuple-routing is done, we failed to switch to a slot with the leaf partition's tupdesc (IOW, continued to use the original slot with root table's tupdesc causing spurious failures due to differences in attnums between the leaf partition and the root table). Further patches 0002, 0003 and 0004 fix bugs that I sent one-big-patch for in my previous message. Each patch has a test for the bug it's meant to fix. Patch 0005 is the same old "Add some more tests for tuple-routing" per [1]: > Meanwhile, committed the latest 0001 and the elog() patch. Thanks!
Regards, Amit [1] https://www.postgresql.org/message-id/CA%2BTgmoZ86v1G%2Bzx9etMiSQaBBvYMKfU-iitqZArSh5z0n8Q4cA%40mail.gmail.com From cac625b22990c12a537eaa4c77434017a2303b92 Mon Sep 17 00:00:00 2001 From: amit Date: Thu, 22 Dec 2016 15:33:41 +0900 Subject: [PATCH 1/5] Use correct tuple descriptor before and after routing a tuple When we have a multi-level partitioned hierarchy where tables at successive levels have different attnums for the partition key attribute(s), we must use the tuple descriptor of the partitioned table at the given level to inspect the appropriately converted representation of the input tuple before computing the partition key to perform tuple-routing. So, store in each PartitionDispatch object a standalone TupleTableSlot initialized with the TupleDesc of the corresponding partitioned table, along with a TupleConversionMap to map tuples from its parent's rowtype to its own rowtype. After the routing is done and a leaf partition is returned, we must use the leaf partition's tuple descriptor, not the root table's, for roughly the same reasons as above. For the next row and so on, we must then switch back to the root table's tuple descriptor. To that end, a dedicated TupleTableSlot is allocated in EState called es_partition_tuple_slot, whose descriptor is set to that of the given leaf partition for every input tuple after it's routed.
Reported by: n/a Patch by: Amit Langote Reports: n/a --- src/backend/catalog/partition.c| 74 -- src/backend/commands/copy.c| 31 +- src/backend/executor/nodeModifyTable.c | 27 + src/include/catalog/partition.h| 7 src/include/nodes/execnodes.h | 3 ++ src/test/regress/expected/insert.out | 37 + src/test/regress/sql/insert.sql| 26 7 files changed, 190 insertions(+), 15 deletions(-) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index 9980582b77..fca874752f 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -923,13 +923,19 @@ RelationGetPartitionQual(Relation rel, bool recurse) return generate_partition_qual(rel, recurse); } -/* Turn an array of OIDs with N elements into a list */ -#define OID_ARRAY_TO_LIST(arr, N, list) \ +/* + * Append OIDs of rel's partitions to the list 'partoids' and for each OID, + * append pointer rel to the list 'parents'. + */ +#define APPEND_REL_PARTITION_OIDS(rel, partoids, parents) \ do\ {\ int i;\ - for (i = 0; i < (N); i++)\ - (list) = lappend_oid((list), (arr)[i]);\ + for (i = 0; i < (rel)->rd_partdesc->nparts; i++)\ + {\ + (partoids) = lappend_oid((partoids), (rel)->rd_partdesc->oids[i]);\ + (parents) = lappend((parents), (rel));\ + }\ } while(0) /* @@ -944,11 +950,13 @@ PartitionDispatch * RelationGetPartitionDispatchInfo(Relation rel, int lockmode, int *num_parted, List **leaf_part_oids) { - PartitionDesc rootpartdesc =
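The commit message above is easier to follow with a toy model of the routing loop. In this Python sketch (illustrative only — PartitionDispatch here is not the C struct), the row is converted to each level's own column order before that level's key is extracted, which is exactly the step that goes wrong if the root's descriptor is used throughout a tree whose levels order the columns differently:

```python
class PartitionDispatch:
    """One level of the tree: this level's column order (its 'tupdesc'),
    its partition key column, and range-bounded children, each of which is
    either another PartitionDispatch or a leaf partition name."""
    def __init__(self, atts, key, children):
        self.atts = atts          # tuple of column names at this level
        self.key = key            # partition key column name at this level
        self.children = children  # list of (lo, hi, child)

def route_tuple(pd, row, parent_atts=None):
    # Convert the row from the parent's column order to this level's order,
    # so the key attribute is looked up via the *correct* descriptor.
    if parent_atts is not None and parent_atts != pd.atts:
        by_name = dict(zip(parent_atts, row))
        row = tuple(by_name[a] for a in pd.atts)
    keyval = row[pd.atts.index(pd.key)]
    for lo, hi, child in pd.children:
        if lo <= keyval < hi:
            if isinstance(child, PartitionDispatch):
                return route_tuple(child, row, pd.atts)
            return child, row     # leaf name plus the converted row
    raise LookupError("partition not found")
```

Dropping the conversion step and indexing the sub-level's key into the root-ordered row reproduces the spurious "partition not found" (or worse, routing by the wrong column) that the patch fixes.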
Re: [HACKERS] Declarative partitioning - another take
On Wed, Dec 21, 2016 at 5:33 AM, Amit Langote wrote: > Breaking changes into multiple commits/patches does not seem to work for > adding regression tests. So, I've combined multiple patches into a single > patch which is now patch 0002 in the attached set of patches. Ugh, seriously? It's fine to combine closely related bug fixes but not all of these are. I don't see why you can't add some regression tests in one patch and then add some more in the next patch. Meanwhile, committed the latest 0001 and the elog() patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On 2016/12/21 1:53, Robert Haas wrote: > On Mon, Dec 19, 2016 at 10:59 PM, Robert Haas wrote: >> On Sun, Dec 18, 2016 at 10:00 PM, Amit Langote >> wrote: >>> Here are updated patches including the additional information. >> >> Thanks. Committed 0001. Will have to review the others when I'm less tired. > > 0002. Can you add a test case for the bug fixed by this patch? Done (also see below). > 0003. Loses equalTupleDescs() check and various cases where > ExecOpenIndices can be skipped. Isn't that bad? I realized that as far as checking whether tuple conversion mapping is required, the checks performed by convert_tuples_by_name() at the beginning of the function following the comment /* Verify compatibility and prepare attribute-number map */ are enough. If equalTupleDescs() returned false (which it always does because tdtypeid are never the same for passed in tuple descriptors), we would be repeating some of the tests in convert_tuples_by_name() anyway. As for the checks performed for ExecOpenIndices(), it seemed better to keep the following in place, so added back: if (leaf_part_rri->ri_RelationDesc->rd_rel->relhasindex && leaf_part_rri->ri_IndexRelationDescs == NULL) ExecOpenIndices(leaf_part_rri, false); > Also, "We locked all > the partitions above including the leaf partitions" should say "Caller > must have locked all the partitions including the leaf partitions". No, we do the locking in RelationGetPartitionDispatchInfo(), which is called by ExecSetupPartitionTupleRouting() itself. ExecSetupPartitionTupleRouting() is where we lock all the partitions for the first time. > 0004. Unnecessary whitespace change in executor.h. Still don't > understand why we need to hoist RelationGetPartitionQual() into the > caller. We only need to check a result relation's (ri_RelationDesc's) partition constraint if we are inserting into the result relation directly.
In the case of tuple-routing, we do not want to check the leaf partitions' partition constraint, but if the direct target in that case is an internal partition, we must check its partition constraint, which is the same for all leaf partitions in that sub-tree. It wouldn't be wrong per se to check each leaf partition's constraint in that case, which includes the target partitioned table's constraint as well, but that would be inefficient due to both having to retrieve the constraints and having ExecConstraints() *unnecessarily* check them for every row. If we keep doing RelationGetPartitionQual() in InitResultRelInfo() controlled by a bool argument (load_partition_check), we cannot avoid the above-mentioned inefficiency if we want to fix this bug. > 0005. Can you add a test case for the bug fixed by this patch? Done, but... Breaking changes into multiple commits/patches does not seem to work for adding regression tests. So, I've combined multiple patches into a single patch which is now patch 0002 in the attached set of patches. Its commit message is very long now.
To show an example of bugs that 0002 is meant for: create table p (a int, b int) partition by range (a, b); create table p1 (b int, a int not null) partition by range (b); create table p11 (like p1); alter table p11 drop a; alter table p11 add a int; alter table p11 drop a; alter table p11 add a int not null; # select attrelid::regclass, attname, attnum from pg_attribute where attnum > 0 and (attrelid = 'p'::regclass or attrelid = 'p1'::regclass or attrelid = 'p11'::regclass) and attname = 'a'; attrelid | attname | attnum --+-+ p| a | 1 p1 | a | 2 p11 | a | 4 (3 rows) alter table p1 attach partition p11 for values from (1) to (5); alter table p attach partition p1 for values from (1, 1) to (1, 10); -- the following is wrong # insert into p11 (a, b) values (10, 4); INSERT 0 1 -- wrong too (using the wrong TupleDesc after tuple routing) # insert into p1 (a, b) values (10, 4); ERROR: null value in column "a" violates not-null constraint DETAIL: Failing row contains (4, null). -- once we fix the wrong TupleDesc issue # insert into p1 (a, b) values (10, 4); INSERT 0 1 which is wrong because p1, as a partition of p, should not accept 10 for a. But its partition constraint is not being applied to the leaf partition p11 into which the tuple is routed (the bug). Thanks, Amit From d21ae74ae01e93f21df5b84ed097ebbc85e529dd Mon Sep 17 00:00:00 2001 From: amit Date: Thu, 15 Dec 2016 15:56:27 +0900 Subject: [PATCH 1/3] Refactor tuple-routing setup code It's unnecessarily repeated in copy.c and nodeModifyTable.c, which makes it a burden to maintain. Should've been like this to begin with. I moved the common code to ExecSetupPartitionTupleRouting() in execMain.c that also houses ExecFindPartition() currently. Hmm, should there be a new src/backend/executor/execPartition.c? Reported
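The p/p1/p11 example boils down to which constraint gets checked where. A hedged sketch of the intended behavior (plain Python with hypothetical names, and a deliberately simplified version of p1's multi-column range constraint): check the *direct* insert target's partition constraint, since the bounds below it are guaranteed by tuple routing itself:

```python
# Simplified partition constraints from the example above, as predicates on
# a row dict. p1 is p's partition FOR VALUES FROM (1, 1) TO (1, 10) on key
# (a, b) -- modeled here as a = 1 AND 1 <= b < 10, a simplification of the
# real multi-column range constraint -- and p11 is p1's partition FOR
# VALUES FROM (1) TO (5) on key (b).
p1_constraint = [lambda r: r["a"] == 1 and 1 <= r["b"] < 10]
p11_constraint = p1_constraint + [lambda r: 1 <= r["b"] < 5]

def exec_constraints(partition_constraint, row):
    """Sketch of the check: raise if the target's constraint fails."""
    if not all(q(row) for q in partition_constraint):
        raise ValueError("new row violates the partition constraint")

def insert_into_p1(row):
    # Inserting directly into p1: p1's own partition constraint must hold
    # even though the row is then routed onward to p11; routing itself
    # enforces p11's bound, so it need not be rechecked.
    exec_constraints(p1_constraint, row)
    return "p11"  # the leaf the row is routed to in this toy tree
```

Under this model, inserting (a=10, b=4) through p1 fails p1's constraint, which is exactly the rejection the buggy code was missing.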
Re: [HACKERS] Declarative partitioning - another take
On 2016/12/21 14:03, Amit Langote wrote: > OK, updated patch attached. Oops, that was an incomplete patch. Complete patch attached this time. Thanks, Amit diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 1c219b03dd..115b98313e 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -13297,8 +13297,11 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd) } } + /* It's safe to skip the validation scan after all */ if (skip_validate) - elog(NOTICE, "skipping scan to validate partition constraint"); + ereport(INFO, +(errmsg("partition constraint for table \"%s\" is implied by existing constraints", + RelationGetRelationName(attachRel)))); /* * Set up to have the table to be scanned to validate the partition diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out index 99e20eb922..62e18961d3 100644 --- a/src/test/regress/expected/alter_table.out +++ b/src/test/regress/expected/alter_table.out @@ -3179,7 +3179,7 @@ ALTER TABLE list_parted2 ATTACH PARTITION part_3_4 FOR VALUES IN (3, 4); ALTER TABLE list_parted2 DETACH PARTITION part_3_4; ALTER TABLE part_3_4 ALTER a SET NOT NULL; ALTER TABLE list_parted2 ATTACH PARTITION part_3_4 FOR VALUES IN (3, 4); -NOTICE: skipping scan to validate partition constraint +INFO: partition constraint for table "part_3_4" is implied by existing constraints -- check validation when attaching range partitions CREATE TABLE range_parted ( a int, @@ -3204,7 +3204,7 @@ CREATE TABLE part2 ( b int NOT NULL CHECK (b >= 10 AND b < 18) ); ALTER TABLE range_parted ATTACH PARTITION part2 FOR VALUES FROM (1, 10) TO (1, 20); -NOTICE: skipping scan to validate partition constraint +INFO: partition constraint for table "part2" is implied by existing constraints -- check that leaf partitions are scanned when attaching a partitioned -- table CREATE TABLE part_5 ( @@ -3219,7 +3219,7 @@ ERROR: partition constraint is violated by some
row DELETE FROM part_5_a WHERE a NOT IN (3); ALTER TABLE part_5 ADD CONSTRAINT check_a CHECK (a IN (5)), ALTER a SET NOT NULL; ALTER TABLE list_parted2 ATTACH PARTITION part_5 FOR VALUES IN (5); -NOTICE: skipping scan to validate partition constraint +INFO: partition constraint for table "part_5" is implied by existing constraints -- check that the table being attached is not already a partition ALTER TABLE list_parted2 ATTACH PARTITION part_2 FOR VALUES IN (2); ERROR: "part_2" is already a partition -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
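The message wording aside, the interesting logic in the patch above is the test that lets ATExecAttachPartition skip the validation scan: the scan is unnecessary when the partition constraint is already implied by the table's existing constraints (predicate_implied_by() in the real code). A much-simplified interval model in Python — real constraints are arbitrary predicates, so this is only a sketch of the idea:

```python
def scan_skippable(partition_bound, existing_checks):
    """True when some existing CHECK interval [lo, hi) is contained in the
    partition bound's [lo, hi): every row the CHECK admits, present or
    future, then also satisfies the partition constraint, so scanning the
    table to validate it proves nothing new.

    Simplification: the real implication proof is done over general
    predicates by predicate_implied_by(), not by interval arithmetic."""
    plo, phi = partition_bound
    return any(plo <= lo and hi <= phi for lo, hi in existing_checks)
```

This mirrors the regression test above: part2 carries CHECK (b >= 10 AND b < 18), so attaching it with a bound covering b in [10, 20) needs no scan, and the new INFO message is emitted instead.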
Re: [HACKERS] Declarative partitioning - another take
On 2016/12/21 13:42, Robert Haas wrote: > On Tue, Dec 20, 2016 at 9:14 PM, Amit Langote > wrote: >> On 2016/12/21 1:45, Alvaro Herrera wrote: >>> Robert Haas wrote: On Tue, Dec 20, 2016 at 10:27 AM, Alvaro Herrera wrote: > Even if we decide to keep the message, I think it's not very good > wording anyhow; as a translator I disliked it on sight. Instead of > "skipping scan to validate" I would use "skipping validation scan", > except that it's not clear what it is we're validating. Mentioning > partition constraint in errcontext() doesn't seem like a great solution, but > I can't think of anything better. Maybe something like: partition constraint for table \"%s\" is implied by existing constraints >>> >>> Actually, shouldn't we emit a message if we *don't* skip the check? >> >> Scanning (aka, not skipping) to validate the partition constraint is the >> default behavior, so a user would be expecting it anyway, IOW, need not be >> informed of it. But when ATExecAttachPartition's efforts to avoid the >> scan by comparing the partition constraint against existing constraints >> (which the user most probably deliberately added just for this) succeed, >> that seems like a better piece of information to provide the user with, >> IMHO. But then again, having a message printed before a potentially long >> validation scan seems like something a user would like to see, to know >> what it is that is going to take so long. Hmm. >> >> Anyway, what would the opposite of Robert's suggested message look like: >> "scanning table \"%s\" to validate partition constraint"? > > Maybe: partition constraint for table \"%s\" is implied by existing > constraints OK, updated patch attached.
Thanks, Amit diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out index 2a324c0b49..62e18961d3 100644 --- a/src/test/regress/expected/alter_table.out +++ b/src/test/regress/expected/alter_table.out @@ -3179,7 +3179,7 @@ ALTER TABLE list_parted2 ATTACH PARTITION part_3_4 FOR VALUES IN (3, 4); ALTER TABLE list_parted2 DETACH PARTITION part_3_4; ALTER TABLE part_3_4 ALTER a SET NOT NULL; ALTER TABLE list_parted2 ATTACH PARTITION part_3_4 FOR VALUES IN (3, 4); -INFO: skipping scan to validate partition constraint +INFO: partition constraint for table "part_3_4" is implied by existing constraints -- check validation when attaching range partitions CREATE TABLE range_parted ( a int, @@ -3204,7 +3204,7 @@ CREATE TABLE part2 ( b int NOT NULL CHECK (b >= 10 AND b < 18) ); ALTER TABLE range_parted ATTACH PARTITION part2 FOR VALUES FROM (1, 10) TO (1, 20); -INFO: skipping scan to validate partition constraint +INFO: partition constraint for table "part2" is implied by existing constraints -- check that leaf partitions are scanned when attaching a partitioned -- table CREATE TABLE part_5 ( @@ -3219,7 +3219,7 @@ ERROR: partition constraint is violated by some row DELETE FROM part_5_a WHERE a NOT IN (3); ALTER TABLE part_5 ADD CONSTRAINT check_a CHECK (a IN (5)), ALTER a SET NOT NULL; ALTER TABLE list_parted2 ATTACH PARTITION part_5 FOR VALUES IN (5); -INFO: skipping scan to validate partition constraint +INFO: partition constraint for table "part_5" is implied by existing constraints -- check that the table being attached is not already a partition ALTER TABLE list_parted2 ATTACH PARTITION part_2 FOR VALUES IN (2); ERROR: "part_2" is already a partition -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On Tue, Dec 20, 2016 at 9:14 PM, Amit Langote wrote: > On 2016/12/21 1:45, Alvaro Herrera wrote: >> Robert Haas wrote: >>> On Tue, Dec 20, 2016 at 10:27 AM, Alvaro Herrera >>> wrote: Even if we decide to keep the message, I think it's not very good wording anyhow; as a translator I disliked it on sight. Instead of "skipping scan to validate" I would use "skipping validation scan", except that it's not clear what it is we're validating. Mentioning partition constraint in errcontext() doesn't seem like a great solution, but I can't think of anything better. >>> >>> Maybe something like: partition constraint for table \"%s\" is implied >>> by existing constraints >> >> Actually, shouldn't we emit a message if we *don't* skip the check? > > Scanning (aka, not skipping) to validate the partition constraint is the > default behavior, so a user would be expecting it anyway, IOW, need not be > informed of it. But when ATExecAttachPartition's efforts to avoid the > scan by comparing the partition constraint against existing constraints > (which the user most probably deliberately added just for this) succeed, > that seems like a better piece of information to provide the user with, > IMHO. But then again, having a message printed before a potentially long > validation scan seems like something a user would like to see, to know > what it is that is going to take so long. Hmm. > > Anyway, what would the opposite of Robert's suggested message look like: > "scanning table \"%s\" to validate partition constraint"? Maybe: partition constraint for table \"%s\" is implied by existing constraints -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On 2016/12/21 1:45, Alvaro Herrera wrote: > Robert Haas wrote: >> On Tue, Dec 20, 2016 at 10:27 AM, Alvaro Herrera >> wrote: >>> Even if we decide to keep the message, I think it's not very good >>> wording anyhow; as a translator I disliked it on sight. Instead of >>> "skipping scan to validate" I would use "skipping validation scan", >>> except that it's not clear what it is we're validating. Mentioning >>> partition constraint in errcontext() doesn't seem like a great solution, but >>> I can't think of anything better. >> >> Maybe something like: partition constraint for table \"%s\" is implied >> by existing constraints > > Actually, shouldn't we emit a message if we *don't* skip the check? Scanning (aka, not skipping) to validate the partition constraint is the default behavior, so a user would be expecting it anyway, IOW, need not be informed of it. But when ATExecAttachPartition's efforts to avoid the scan by comparing the partition constraint against existing constraints (which the user most probably deliberately added just for this) succeed, that seems like a better piece of information to provide the user with, IMHO. But then again, having a message printed before a potentially long validation scan seems like something a user would like to see, to know what it is that is going to take so long. Hmm. Anyway, what would the opposite of Robert's suggested message look like: "scanning table \"%s\" to validate partition constraint"? Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers