Re: ON CONFLICT DO UPDATE for partitioned tables

2018-04-19 Thread Amit Langote
On 2018/04/20 4:40, Alvaro Herrera wrote:
> Alvaro Herrera wrote:
>> Amit Langote wrote:
>>
>>> Yeah, I too have wondered in the past what it would take to make
>>> equalTupDescs() return true for parent and partitions.  Maybe we can make
>>> it work by looking a bit harder than I did then.
>>
>> How about simply relaxing the tdtypeid test from equalTupleDescs?  I
>> haven't looked deeply but I think just checking whether or not both are
>> RECORDOID might be sufficient, for typecache purposes.
> 
> After looking at the code, I'm a bit nervous about doing this, because I
> don't fully understand what is going on in typcache, and what is the
> HeapTupleHeaderGetTypeId macro really doing.  I'm afraid that if we
> confuse a table's tupdesc with one of its partition's , something
> entirely random might end up happening.
> 
> Maybe this is completely off-base, but if so I'd like to have to proof.
> So I'm thinking of reverting that patch instead per your patch.
> 
> While composing this we got emails from Robert and Peter G suggesting
> the same too, so consider it done.

Thank you for committing the patch.

Regards,
Amit




Re: ON CONFLICT DO UPDATE for partitioned tables

2018-04-19 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Amit Langote wrote:
> 
> > Yeah, I too have wondered in the past what it would take to make
> > equalTupDescs() return true for parent and partitions.  Maybe we can make
> > it work by looking a bit harder than I did then.
> 
> How about simply relaxing the tdtypeid test from equalTupleDescs?  I
> haven't looked deeply but I think just checking whether or not both are
> RECORDOID might be sufficient, for typecache purposes.

After looking at the code, I'm a bit nervous about doing this, because I
don't fully understand what is going on in typcache, and what is the
HeapTupleHeaderGetTypeId macro really doing.  I'm afraid that if we
confuse a table's tupdesc with one of its partition's , something
entirely random might end up happening.

Maybe this is completely off-base, but if so I'd like to have to proof.
So I'm thinking of reverting that patch instead per your patch.

While composing this we got emails from Robert and Peter G suggesting
the same too, so consider it done.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-04-19 Thread Peter Geoghegan
On Thu, Apr 19, 2018 at 12:00 PM, Robert Haas  wrote:
> On Thu, Apr 19, 2018 at 1:20 PM, Alvaro Herrera  
> wrote:
>> How about simply relaxing the tdtypeid test from equalTupleDescs?  I
>> haven't looked deeply but I think just checking whether or not both are
>> RECORDOID might be sufficient, for typecache purposes.
>
> That strike me as a very scary thing to do.  There's code all over the
> system that may have non-obvious assumptions about the behavior of
> equalTupleDescs(), and I don't think we can have any confidence that
> nothing will break unless we do a detailed audit of all that code.

+1. I think that it is plainly a bad idea to do something like that at
this point in the cycle.

-- 
Peter Geoghegan



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-04-19 Thread Robert Haas
On Thu, Apr 19, 2018 at 1:20 PM, Alvaro Herrera  wrote:
> Amit Langote wrote:
>> Yeah, I too have wondered in the past what it would take to make
>> equalTupDescs() return true for parent and partitions.  Maybe we can make
>> it work by looking a bit harder than I did then.
>
> How about simply relaxing the tdtypeid test from equalTupleDescs?  I
> haven't looked deeply but I think just checking whether or not both are
> RECORDOID might be sufficient, for typecache purposes.

That strike me as a very scary thing to do.  There's code all over the
system that may have non-obvious assumptions about the behavior of
equalTupleDescs(), and I don't think we can have any confidence that
nothing will break unless we do a detailed audit of all that code.

> If we just remove the tdtypeid test, check-world passes.

That does not reassure me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-04-19 Thread Alvaro Herrera
Amit Langote wrote:

> Yeah, I too have wondered in the past what it would take to make
> equalTupDescs() return true for parent and partitions.  Maybe we can make
> it work by looking a bit harder than I did then.

How about simply relaxing the tdtypeid test from equalTupleDescs?  I
haven't looked deeply but I think just checking whether or not both are
RECORDOID might be sufficient, for typecache purposes.

If we just remove the tdtypeid test, check-world passes.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-04-18 Thread Amit Langote
On 2018/04/18 22:40, Alvaro Herrera wrote:
> Amit Langote wrote:
>> On 2018/04/18 0:04, Alvaro Herrera wrote:
>>> Amit Langote wrote:
>>>
 I just confirmed my hunch that this wouldn't somehow do the right thing
 when the OID system column is involved.  Like this case:
>>>
>>> This looks too big a patch to pursue now.  I'm inclined to just remove
>>> the equalTupdesc changes.
>>
>> OK.  Here is the patch that removes equalTupdesc optimization.
> 
> Hmm.  If we modify (during pg12, of course -- not now) partition tables
> that are created identical to their parent table so that they share the
> pg_type row, this would become useful.  Unless there a reason why that
> change is completely unworkable, I'd just leave it there.  (I claim that
> it works like that only because it used to work like that, not because
> it's impossible to make work the other way.)

Yeah, I too have wondered in the past what it would take to make
equalTupDescs() return true for parent and partitions.  Maybe we can make
it work by looking a bit harder than I did then.

Although, just leaving it there now would mean we're adding a few cycles
needlessly in the PG 11 code.  Why not add that optimization when we
surely know it can work?

Thanks,
Amit




Re: ON CONFLICT DO UPDATE for partitioned tables

2018-04-18 Thread Alvaro Herrera
Amit Langote wrote:
> On 2018/04/18 0:04, Alvaro Herrera wrote:
> > Amit Langote wrote:
> > 
> >> I just confirmed my hunch that this wouldn't somehow do the right thing
> >> when the OID system column is involved.  Like this case:
> > 
> > This looks too big a patch to pursue now.  I'm inclined to just remove
> > the equalTupdesc changes.
> 
> OK.  Here is the patch that removes equalTupdesc optimization.

Hmm.  If we modify (during pg12, of course -- not now) partition tables
that are created identical to their parent table so that they share the
pg_type row, this would become useful.  Unless there a reason why that
change is completely unworkable, I'd just leave it there.  (I claim that
it works like that only because it used to work like that, not because
it's impossible to make work the other way.)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-04-17 Thread Amit Langote
On 2018/04/18 0:04, Alvaro Herrera wrote:
> Amit Langote wrote:
> 
>> I just confirmed my hunch that this wouldn't somehow do the right thing
>> when the OID system column is involved.  Like this case:
> 
> This looks too big a patch to pursue now.  I'm inclined to just remove
> the equalTupdesc changes.

OK.  Here is the patch that removes equalTupdesc optimization.

I will add the rest of the patch to the next CF after starting a new
thread for it sometime later.

Thanks,
Amit
From ceaba0f59653be237f9bafee47fe82205db6fe14 Mon Sep 17 00:00:00 2001
From: amit 
Date: Wed, 18 Apr 2018 13:22:49 +0900
Subject: [PATCH v1] Remove equalTupDescs-based optimization in
 ExecInitPartitionInfo

---
 src/backend/executor/execPartition.c | 151 ---
 1 file changed, 67 insertions(+), 84 deletions(-)

diff --git a/src/backend/executor/execPartition.c 
b/src/backend/executor/execPartition.c
index 23a74bc3d9..a2f6b29cd5 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -313,7 +313,6 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
MemoryContext oldContext;
AttrNumber *part_attnos = NULL;
boolfound_whole_row;
-   boolequalTupdescs;
 
/*
 * We locked all the partitions in ExecSetupPartitionTupleRouting
@@ -361,10 +360,6 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
(node != NULL &&
 node->onConflictAction != 
ONCONFLICT_NONE));
 
-   /* if tuple descs are identical, we don't need to map the attrs */
-   equalTupdescs = equalTupleDescs(RelationGetDescr(partrel),
-   
RelationGetDescr(firstResultRel));
-
/*
 * Build WITH CHECK OPTION constraints for the partition.  Note that we
 * didn't build the withCheckOptionList for partitions within the 
planner,
@@ -405,21 +400,18 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
/*
 * Convert Vars in it to contain this partition's attribute 
numbers.
 */
-   if (!equalTupdescs)
-   {
-   part_attnos =
-   
convert_tuples_by_name_map(RelationGetDescr(partrel),
-   
   RelationGetDescr(firstResultRel),
-   
   gettext_noop("could not convert row type"));
-   wcoList = (List *)
-   map_variable_attnos((Node *) wcoList,
-   
firstVarno, 0,
-   
part_attnos,
-   
RelationGetDescr(firstResultRel)->natts,
-   
RelationGetForm(partrel)->reltype,
-   
_whole_row);
-   /* We ignore the value of found_whole_row. */
-   }
+   part_attnos =
+   convert_tuples_by_name_map(RelationGetDescr(partrel),
+  
RelationGetDescr(firstResultRel),
+  
gettext_noop("could not convert row type"));
+   wcoList = (List *)
+   map_variable_attnos((Node *) wcoList,
+   firstVarno, 0,
+   part_attnos,
+   
RelationGetDescr(firstResultRel)->natts,
+   
RelationGetForm(partrel)->reltype,
+   
_whole_row);
+   /* We ignore the value of found_whole_row. */
 
foreach(ll, wcoList)
{
@@ -464,25 +456,22 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 */
returningList = linitial(node->returningLists);
 
-   if (!equalTupdescs)
-   {
-   /*
-* Convert Vars in it to contain this partition's 
attribute numbers.
-*/
-   if (part_attnos == NULL)
-   part_attnos =
-   
convert_tuples_by_name_map(RelationGetDescr(partrel),
-   
   

Re: ON CONFLICT DO UPDATE for partitioned tables

2018-04-17 Thread Amit Langote
On 2018/04/18 0:02, Alvaro Herrera wrote:
> Amit Langote wrote:
> 
>> Attached find a patch that does that.  When working on this, I noticed
>> that when recursing for inheritance children, ATPrepAlterColumnType()
>> would use a AlterTableCmd (cmd) that's already scribbled on as if it were
>> the original.
> 
> While I agree that the code here is in poor style, there is no live bug
> here, because the only thing that is changed each time is the copy's
> cmd->def, and its value is not obtained from the scribbled 'cmd' -- it's
> obtained from the passed-in cmd->def, which is unmodified.

Ah, you're right.  The original cmd->def itself remains intact.

Thanks,
Amit




Re: ON CONFLICT DO UPDATE for partitioned tables

2018-04-17 Thread Alvaro Herrera
Amit Langote wrote:

> I just confirmed my hunch that this wouldn't somehow do the right thing
> when the OID system column is involved.  Like this case:

This looks too big a patch to pursue now.  I'm inclined to just remove
the equalTupdesc changes.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-04-17 Thread Alvaro Herrera
Amit Langote wrote:

> Attached find a patch that does that.  When working on this, I noticed
> that when recursing for inheritance children, ATPrepAlterColumnType()
> would use a AlterTableCmd (cmd) that's already scribbled on as if it were
> the original.

While I agree that the code here is in poor style, there is no live bug
here, because the only thing that is changed each time is the copy's
cmd->def, and its value is not obtained from the scribbled 'cmd' -- it's
obtained from the passed-in cmd->def, which is unmodified.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-04-17 Thread Amit Langote
On 2018/04/17 16:45, Amit Langote wrote:
> Instead of doing this, I think we should try to make
> convert_tuples_by_name_map() a bit smarter by integrating the logic in
> convert_tuples_by_name() that's used conclude if no tuple conversion is
> necessary.  So, if it turns that the tuples descriptors passed to
> convert_tuples_by_name_map() contain the same number of attributes and the
> individual attributes are at the same positions, we signal to the caller
> that no conversion is necessary by returning NULL.
> 
> Attached find a patch that does that.
I just confirmed my hunch that this wouldn't somehow do the right thing
when the OID system column is involved.  Like this case:

create table parent (a int);
create table child () inherits (parent) with oids;
insert into parent values (1);
insert into child values (1);
analyze parent;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

That's because, convert_tuples_by_name() that's called by
acquire_inherited_sample_rows() gets a TupleConversionMap whose attrMap is
set to NULL.  do_convert_tuple() may then try to access a member of such
NULL attrMap.  In this case, even if parent and child tables have same
user attributes, patched convert_tuples_by_name_map would return NULL, but
since their hasoids setting doesn't match, a TupleConversionMap is still
returned but has its attrMap set to NULL.  To fix that, I taught
do_convert_tuple() to ignore the map if NULL.  Also, free_conversion_map()
shouldn't try to free attrMap if it's NULL.

Attached updated patch.

Thanks,
Amit
From d5cc2db9bd610523915d1512c2fcad84e8bae3b6 Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 17 Apr 2018 15:51:42 +0900
Subject: [PATCH v2] Optimize convert_tuples_by_name_map usage a bit

---
 src/backend/access/common/tupconvert.c |  79 -
 src/backend/catalog/index.c|  31 +
 src/backend/catalog/partition.c|  13 ++--
 src/backend/catalog/pg_constraint.c|   5 +-
 src/backend/commands/indexcmds.c   |  14 ++--
 src/backend/commands/tablecmds.c   |  32 +
 src/backend/executor/execPartition.c   | 122 ++---
 src/backend/parser/parse_utilcmd.c |   9 +--
 8 files changed, 143 insertions(+), 162 deletions(-)

diff --git a/src/backend/access/common/tupconvert.c 
b/src/backend/access/common/tupconvert.c
index 2d0d2f4b32..2ee4a7f40f 100644
--- a/src/backend/access/common/tupconvert.c
+++ b/src/backend/access/common/tupconvert.c
@@ -214,57 +214,20 @@ convert_tuples_by_name(TupleDesc indesc,
TupleConversionMap *map;
AttrNumber *attrMap;
int n = outdesc->natts;
-   int i;
-   boolsame;
 
/* Verify compatibility and prepare attribute-number map */
attrMap = convert_tuples_by_name_map(indesc, outdesc, msg);
 
/*
-* Check to see if the map is one-to-one, in which case we need not do a
-* tuple conversion.  We must also insist that both tupdescs either
-* specify or don't specify an OID column, else we need a conversion to
-* add/remove space for that.  (For some callers, presence or absence of
-* an OID column perhaps would not really matter, but let's be safe.)
+* If attributes are at the same positions in the input and output
+* descriptors, there is no need for tuple conversion.  Also, we must 
also
+* insist that both tupdescs either specify or don't specify an OID 
column,
+* else we need a conversion to add/remove space for that.  (For some
+* callers, presence or absence of an OID column perhaps would not 
really
+* matter, but let's be safe.)
 */
-   if (indesc->natts == outdesc->natts &&
-   indesc->tdhasoid == outdesc->tdhasoid)
-   {
-   same = true;
-   for (i = 0; i < n; i++)
-   {
-   Form_pg_attribute inatt;
-   Form_pg_attribute outatt;
-
-   if (attrMap[i] == (i + 1))
-   continue;
-
-   /*
-* If it's a dropped column and the corresponding input 
column is
-* also dropped, we needn't convert.  However, attlen 
and attalign
-* must agree.
-*/
-   inatt = TupleDescAttr(indesc, i);
-   outatt = TupleDescAttr(outdesc, i);
-   if (attrMap[i] == 0 &&
-   inatt->attisdropped &&
-   inatt->attlen == outatt->attlen &&
-   inatt->attalign == outatt->attalign)
-   continue;
-
-   

Re: ON CONFLICT DO UPDATE for partitioned tables

2018-04-17 Thread Amit Langote
On 2018/04/17 4:10, Alvaro Herrera wrote:
> Amit Langote wrote:
> 
>> The solution I came up with is to call map_variable_attnos() directly,
>> instead of going through map_partition_varattnos() every time, after first
>> creating the attribute map ourselves.
> 
> Yeah, sounds good.  I added a tweak: if the tupledescs are equal, there
> should be no need to do any mapping.

Thanks for the commit!

About the equalTupleDescs()-based optimization you added -- It seems to me
that that *always* returns false if you pass it tupledescs of two
different tables, which in this case, we do.  That's because
equalTupleDescs has this:

if (tupdesc1->tdtypeid != tupdesc2->tdtypeid)
return false;

So, it fails to optimize as you were hoping it would.

Instead of doing this, I think we should try to make
convert_tuples_by_name_map() a bit smarter by integrating the logic in
convert_tuples_by_name() that's used conclude if no tuple conversion is
necessary.  So, if it turns that the tuples descriptors passed to
convert_tuples_by_name_map() contain the same number of attributes and the
individual attributes are at the same positions, we signal to the caller
that no conversion is necessary by returning NULL.

Attached find a patch that does that.  When working on this, I noticed
that when recursing for inheritance children, ATPrepAlterColumnType()
would use a AlterTableCmd (cmd) that's already scribbled on as if it were
the original.

Thanks,
Amit
From ec390e3d5d25e53d39d0be30961e9e272a5cf88e Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 17 Apr 2018 15:51:42 +0900
Subject: [PATCH v1] Optimize convert_tuples_by_name_map usage a bit

---
 src/backend/access/common/tupconvert.c |  74 +++-
 src/backend/catalog/index.c|  31 +
 src/backend/catalog/partition.c|  13 ++--
 src/backend/catalog/pg_constraint.c|   5 +-
 src/backend/commands/indexcmds.c   |  14 ++--
 src/backend/commands/tablecmds.c   |  32 +
 src/backend/executor/execPartition.c   | 121 ++---
 src/backend/parser/parse_utilcmd.c |   9 +--
 8 files changed, 139 insertions(+), 160 deletions(-)

diff --git a/src/backend/access/common/tupconvert.c 
b/src/backend/access/common/tupconvert.c
index 2d0d2f4b32..6f3de2d8f2 100644
--- a/src/backend/access/common/tupconvert.c
+++ b/src/backend/access/common/tupconvert.c
@@ -214,57 +214,20 @@ convert_tuples_by_name(TupleDesc indesc,
TupleConversionMap *map;
AttrNumber *attrMap;
int n = outdesc->natts;
-   int i;
-   boolsame;
 
/* Verify compatibility and prepare attribute-number map */
attrMap = convert_tuples_by_name_map(indesc, outdesc, msg);
 
/*
-* Check to see if the map is one-to-one, in which case we need not do a
-* tuple conversion.  We must also insist that both tupdescs either
-* specify or don't specify an OID column, else we need a conversion to
-* add/remove space for that.  (For some callers, presence or absence of
-* an OID column perhaps would not really matter, but let's be safe.)
+* If attributes are at the same positions in the input and output
+* descriptors, there is no need for tuple conversion.  Also, we must 
also
+* insist that both tupdescs either specify or don't specify an OID 
column,
+* else we need a conversion to add/remove space for that.  (For some
+* callers, presence or absence of an OID column perhaps would not 
really
+* matter, but let's be safe.)
 */
-   if (indesc->natts == outdesc->natts &&
-   indesc->tdhasoid == outdesc->tdhasoid)
-   {
-   same = true;
-   for (i = 0; i < n; i++)
-   {
-   Form_pg_attribute inatt;
-   Form_pg_attribute outatt;
-
-   if (attrMap[i] == (i + 1))
-   continue;
-
-   /*
-* If it's a dropped column and the corresponding input 
column is
-* also dropped, we needn't convert.  However, attlen 
and attalign
-* must agree.
-*/
-   inatt = TupleDescAttr(indesc, i);
-   outatt = TupleDescAttr(outdesc, i);
-   if (attrMap[i] == 0 &&
-   inatt->attisdropped &&
-   inatt->attlen == outatt->attlen &&
-   inatt->attalign == outatt->attalign)
-   continue;
-
-   same = false;
-   break;
-   }
-   }
-   else
-   same = false;
-
-   if (same)
-   {
-   /* Runtime conversion is not needed */
-   pfree(attrMap);
+   

Re: ON CONFLICT DO UPDATE for partitioned tables

2018-04-16 Thread Tom Lane
Alvaro Herrera  writes:
> Pushed now, thanks.

Buildfarm doesn't like this even a little bit.

regards, tom lane



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-04-16 Thread Alvaro Herrera
Amit Langote wrote:

> The solution I came up with is to call map_variable_attnos() directly,
> instead of going through map_partition_varattnos() every time, after first
> creating the attribute map ourselves.

Yeah, sounds good.  I added a tweak: if the tupledescs are equal, there
should be no need to do any mapping.

(Minor adjustment to the test: "cuarenta" means forty, so I changed the
new test to say "cincuenta" instead, which means fifty).

Pushed now, thanks.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-04-16 Thread Amit Langote
On 2018/04/10 11:56, Amit Langote wrote:
> On 2018/03/27 13:27, Amit Langote wrote:
>> On 2018/03/26 23:20, Alvaro Herrera wrote:
>>> The one thing I wasn't terribly in love with is the four calls to
>>> map_partition_varattnos(), creating the attribute map four times ... but
>>> we already have it in the TupleConversionMap, no?  Looks like we could
>>> save a bunch of work there.
>>
>> Hmm, actually we can't use that map, assuming you're talking about the
>> following map:
>>
>>   TupleConversionMap *map = proute->parent_child_tupconv_maps[partidx];
>>
>> We can only use that to tell if we need converting expressions (as we
>> currently do), but it cannot be used to actually convert the expressions.
>> The map in question is for use by do_convert_tuple(), not to map varattnos
>> in Vars using map_variable_attnos().
>>
>> But it's definitely a bit undesirable to have various
>> map_partition_varattnos() calls within ExecInitPartitionInfo() to
>> initialize the same information (the map) multiple times.
> 
> I will try to think of doing something about this later this week.

The solution I came up with is to call map_variable_attnos() directly,
instead of going through map_partition_varattnos() every time, after first
creating the attribute map ourselves.

>>> And a final item is: can we have a whole-row expression in the clauses?
>>> We currently don't handle those either, not even to throw an error.
>>> [figures a test case] ... and now that I test it, it does crash!
>>>
>>> create table part (a int primary key, b text) partition by range (a);
>>> create table part1 (b text, a int not null);
>>> alter table part attach partition part1 for values from (1) to (1000);
>>> insert into part values (1, 'two') on conflict (a)
>>>   do update set b = format('%s (was %s)', excluded.b, part.b)
>>>   where part.* *<> (1, text 'two');
>>>
>>> I think this means we should definitely handle found_whole_row.  (If you
>>> create part1 in the normal way, it works as you'd expect.)
>>
>> I agree.  That means we simply remove the Assert after the
>> map_partition_varattnos call.
>>
>>> I'm going to close a few other things first, then come back to this.
>>
>> Attached find a patch to fix the whole-row expression issue.  I added your
>> test to insert_conflict.sql.

Combined the above patch into the attached patch.

Thanks,
Amit
From a90decd69a42bebdb6e07c8268686c0500f8c48e Mon Sep 17 00:00:00 2001
From: amit 
Date: Mon, 16 Apr 2018 17:31:42 +0900
Subject: [PATCH v2] Couple of fixes for ExecInitPartitionInfo

First, avoid repeated calling of map_partition_varattnos.  To do that,
generate the rootrel -> partrel attribute conversion map ourselves
just once and call map_variable_attnos() directly with it.

Second, support conversion of whole-row variables that appear in
ON CONFLICT expressions.  Add relevant test.
---
 src/backend/executor/execPartition.c  | 88 ---
 src/test/regress/expected/insert_conflict.out | 16 +
 src/test/regress/sql/insert_conflict.sql  | 14 +
 3 files changed, 97 insertions(+), 21 deletions(-)

diff --git a/src/backend/executor/execPartition.c 
b/src/backend/executor/execPartition.c
index 218645d43b..1727e111bb 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -24,6 +24,7 @@
 #include "nodes/makefuncs.h"
 #include "partitioning/partbounds.h"
 #include "partitioning/partprune.h"
+#include "rewrite/rewriteManip.h"
 #include "utils/lsyscache.h"
 #include "utils/partcache.h"
 #include "utils/rel.h"
@@ -309,6 +310,8 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
partrel;
ResultRelInfo *leaf_part_rri;
MemoryContext oldContext;
+   AttrNumber *part_attnos = NULL;
+   boolfound_whole_row;
 
/*
 * We locked all the partitions in ExecSetupPartitionTupleRouting
@@ -397,8 +400,19 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
/*
 * Convert Vars in it to contain this partition's attribute 
numbers.
 */
-   wcoList = map_partition_varattnos(wcoList, firstVarno,
-   
  partrel, firstResultRel, NULL);
+   part_attnos =
+   convert_tuples_by_name_map(RelationGetDescr(partrel),
+  
RelationGetDescr(firstResultRel),
+  
gettext_noop("could not convert row type"));
+   wcoList = (List *)
+   map_variable_attnos((Node *) wcoList,
+   
firstVarno, 0,
+   
part_attnos,
+   

Re: ON CONFLICT DO UPDATE for partitioned tables

2018-04-09 Thread Amit Langote
On 2018/03/27 13:27, Amit Langote wrote:
> On 2018/03/26 23:20, Alvaro Herrera wrote:
>> The one thing I wasn't terribly in love with is the four calls to
>> map_partition_varattnos(), creating the attribute map four times ... but
>> we already have it in the TupleConversionMap, no?  Looks like we could
>> save a bunch of work there.
> 
> Hmm, actually we can't use that map, assuming you're talking about the
> following map:
> 
>   TupleConversionMap *map = proute->parent_child_tupconv_maps[partidx];
> 
> We can only use that to tell if we need converting expressions (as we
> currently do), but it cannot be used to actually convert the expressions.
> The map in question is for use by do_convert_tuple(), not to map varattnos
> in Vars using map_variable_attnos().
> 
> But it's definitely a bit undesirable to have various
> map_partition_varattnos() calls within ExecInitPartitionInfo() to
> initialize the same information (the map) multiple times.

I will try to think of doing something about this later this week.

>> And a final item is: can we have a whole-row expression in the clauses?
>> We currently don't handle those either, not even to throw an error.
>> [figures a test case] ... and now that I test it, it does crash!
>>
>> create table part (a int primary key, b text) partition by range (a);
>> create table part1 (b text, a int not null);
>> alter table part attach partition part1 for values from (1) to (1000);
>> insert into part values (1, 'two') on conflict (a)
>>   do update set b = format('%s (was %s)', excluded.b, part.b)
>>   where part.* *<> (1, text 'two');
>>
>> I think this means we should definitely handle found_whole_row.  (If you
>> create part1 in the normal way, it works as you'd expect.)
> 
> I agree.  That means we simply remove the Assert after the
> map_partition_varattnos call.
> 
>> I'm going to close a few other things first, then come back to this.
> 
> Attached find a patch to fix the whole-row expression issue.  I added your
> test to insert_conflict.sql.

Adding this to the open items list.

Thanks,
Amit




Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-26 Thread Amit Langote
On 2018/03/26 23:20, Alvaro Herrera wrote:
> Pushed now.

Thank you!

> Amit Langote wrote:
>> On 2018/03/24 9:23, Alvaro Herrera wrote:
> 
>>> To fix this, I had to completely rework the "get partition parent root"
>>> stuff into "get list of ancestors of this partition".
>>
>> I wondered if a is_partition_ancestor(partrelid, ancestorid) isn't enough
>> instead of creating a list of ancestors and then looping over it as you've
>> done, but maybe what you have here is fine.
> 
> Yeah, I wondered about doing it that way too (since you can stop looking
> early), but decided that I didn't like repeatedly opening pg_inherits
> for each level.  Anyway the most common case is a single level, and in
> rare cases two levels ... I don't think we're going to see much more
> than that.  So it doesn't matter too much.  We can refine later anyway,
> if this becomes a hot spot (I doubt it TBH).

Yes, I suppose.

>>> * General code style improvements, comment rewording, etc.
>>
>> There was one comment in Fujita-san's review he posted on Friday [1] that
>> doesn't seem to be addressed in v10, which I think we probably should.  It
>> was this comment:
>>
>> "ExecBuildProjectionInfo is called without setting the tuple descriptor of
>> mtstate->mt_conflproj to tupDesc.  That might work at least for now, but I
>> think it's a good thing to set it appropriately to make that future proof."
>>
>> All of his other comments seem to have been taken care of in v10.  I have
>> fixed the above one in the attached updated version.
> 
> I was of two minds about this item myself; we don't use the tupdesc for
> anything at that point and I expect more things would break if we
> required that.  But I don't think it hurts, so I kept it.
> 
> The one thing I wasn't terribly in love with is the four calls to
> map_partition_varattnos(), creating the attribute map four times ... but
> we already have it in the TupleConversionMap, no?  Looks like we could
> save a bunch of work there.

Hmm, actually we can't use that map, assuming you're talking about the
following map:

  TupleConversionMap *map = proute->parent_child_tupconv_maps[partidx];

We can only use that to tell if we need converting expressions (as we
currently do), but it cannot be used to actually convert the expressions.
The map in question is for use by do_convert_tuple(), not to map varattnos
in Vars using map_variable_attnos().

But it's definitely a bit undesirable to have various
map_partition_varattnos() calls within ExecInitPartitionInfo() to
initialize the same information (the map) multiple times.

> And a final item is: can we have a whole-row expression in the clauses?
> We currently don't handle those either, not even to throw an error.
> [figures a test case] ... and now that I test it, it does crash!
> 
> create table part (a int primary key, b text) partition by range (a);
> create table part1 (b text, a int not null);
> alter table part attach partition part1 for values from (1) to (1000);
> insert into part values (1, 'two') on conflict (a)
>   do update set b = format('%s (was %s)', excluded.b, part.b)
>   where part.* *<> (1, text 'two');
> 
> I think this means we should definitely handle found_whole_row.  (If you
> create part1 in the normal way, it works as you'd expect.)

I agree.  That means we simply remove the Assert after the
map_partition_varattnos call.

> I'm going to close a few other things first, then come back to this.

Attached find a patch to fix the whole-row expression issue.  I added your
test to insert_conflict.sql.

Thanks,
Amit
diff --git a/src/backend/executor/execPartition.c 
b/src/backend/executor/execPartition.c
index 9a13188649..f1a972e235 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -557,7 +557,6 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
{
List   *onconflset;
TupleDesc   tupDesc;
-   boolfound_whole_row;
 
leaf_part_rri->ri_onConflict = 
makeNode(OnConflictSetState);
 
@@ -571,12 +570,10 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
onconflset = (List *) copyObject((Node *) 
node->onConflictSet);
onconflset =
map_partition_varattnos(onconflset, 
INNER_VAR, partrel,
-   
firstResultRel, _whole_row);
-   Assert(!found_whole_row);
+   
firstResultRel, NULL);
onconflset =
map_partition_varattnos(onconflset, 
firstVarno, partrel,
-   
firstResultRel, 

Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-26 Thread Alvaro Herrera
Pushed now.

Amit Langote wrote:
> On 2018/03/24 9:23, Alvaro Herrera wrote:

> > To fix this, I had to completely rework the "get partition parent root"
> > stuff into "get list of ancestors of this partition".
> 
> I wondered if a is_partition_ancestor(partrelid, ancestorid) isn't enough
> instead of creating a list of ancestors and then looping over it as you've
> done, but maybe what you have here is fine.

Yeah, I wondered about doing it that way too (since you can stop looking
early), but decided that I didn't like repeatedly opening pg_inherits
for each level.  Anyway the most common case is a single level, and in
rare cases two levels ... I don't think we're going to see much more
than that.  So it doesn't matter too much.  We can refine later anyway,
if this becomes a hot spot (I doubt it TBH).


> > * General code style improvements, comment rewording, etc.
> 
> There was one comment in Fujita-san's review he posted on Friday [1] that
> doesn't seem to be addressed in v10, which I think we probably should.  It
> was this comment:
> 
> "ExecBuildProjectionInfo is called without setting the tuple descriptor of
> mtstate->mt_conflproj to tupDesc.  That might work at least for now, but I
> think it's a good thing to set it appropriately to make that future proof."
> 
> All of his other comments seem to have been taken care of in v10.  I have
> fixed the above one in the attached updated version.

I was of two minds about this item myself; we don't use the tupdesc for
anything at that point and I expect more things would break if we
required that.  But I don't think it hurts, so I kept it.

The one thing I wasn't terribly in love with is the four calls to
map_partition_varattnos(), creating the attribute map four times ... but
we already have it in the TupleConversionMap, no?  Looks like we could
save a bunch of work there.

And a final item is: can we have a whole-row expression in the clauses?
We currently don't handle those either, not even to throw an error.
[figures a test case] ... and now that I test it, it does crash!

create table part (a int primary key, b text) partition by range (a);
create table part1 (b text, a int not null);
alter table part attach partition part1 for values from (1) to (1000);
insert into part values (1, 'two') on conflict (a)
  do update set b = format('%s (was %s)', excluded.b, part.b)
  where part.* *<> (1, text 'two');

I think this means we should definitely handle found_whole_row.  (If you
create part1 in the normal way, it works as you'd expect.)

I'm going to close a few other things first, then come back to this.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-25 Thread Amit Langote
On 2018/03/24 9:23, Alvaro Herrera wrote:
> I made a bunch of further edits and I think this v10 is ready to push.
> Before doing so I'll give it a final look, particularly because of the
> new elog(ERROR) I added.  Post-commit review is of course always
> appreciated.
> 
> Most notable change is because I noticed that if you mention an
> intermediate partition level in the INSERT command, and the index is on
> the top level, arbiter index selection fails to find the correct index
> because it walks all the way to the top instead of stopping in the
> middle, as it should (the command was still working because it ended up
> with an empty arbiter index list).

Good catch!

> To fix this, I had to completely rework the "get partition parent root"
> stuff into "get list of ancestors of this partition".

I wondered if a is_partition_ancestor(partrelid, ancestorid) isn't enough
instead of creating a list of ancestors and then looping over it as you've
done, but maybe what you have here is fine.

> Because of this, I added a new check that the partition's arbiter index
> list is same length as parent's; if not, throw an error.  I couldn't get
> it to fire (so it's just an elog not ereport), but maybe I just didn't
> try any weird enough scenarios.
> 
> Other changes:
> 
> * I added a copyObject() call for nodes we're operating upon.  Maybe
>   this is unnecessary but the comments claimed "we're working on a copy"
>   and I couldn't find any place where we were actually making one.
>   Anyway it seems sane to make a copy, because we're scribbling on those
>   nodes ... I hope I didn't introduce any serious memory leaks.

That seems fine as ExecInitPartitionInfo allocates in the query context
(es_query_cxt).

> * I made the new OnConflictSetState thing into a proper node.  Like
>   ResultRelInfo, it doesn't have any niceties like nodeToString support,
>   but it seems saner this way (palloc -> makeNode).  I reworked the
>   formatting of that struct definition too, and renamed members.

Looks good, thanks.

> * I removed an assertion block at the bottom of adjust_partition_tlist.
>   It seemed quite pointless, since it was just about checking that the
>   resno values were sorted, but by construction we already know that
>   they are indeed sorted ...

Hmm yes.

> * General code style improvements, comment rewording, etc.

There was one comment in Fujita-san's review he posted on Friday [1] that
doesn't seem to be addressed in v10, which I think we probably should.  It
was this comment:

"ExecBuildProjectionInfo is called without setting the tuple descriptor of
mtstate->mt_conflproj to tupDesc.  That might work at least for now, but I
think it's a good thing to set it appropriately to make that future proof."

All of his other comments seem to have been taken care of in v10.  I have
fixed the above one in the attached updated version.

Thanks,
Amit

[1] https://www.postgresql.org/message-id/5AB4DEB6.2020901%40lab.ntt.co.jp
>From 7c0e1432f8f9516647126eab962656c13e691f3e Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 22 Mar 2018 19:12:57 -0300
Subject: [PATCH v11] on conflict

---
 doc/src/sgml/ddl.sgml |  15 --
 doc/src/sgml/ref/insert.sgml  |   8 +
 src/backend/catalog/partition.c   |  88 --
 src/backend/executor/execMain.c   |   4 +
 src/backend/executor/execPartition.c  | 230 --
 src/backend/executor/nodeModifyTable.c|  74 +++--
 src/backend/parser/analyze.c  |   7 -
 src/include/catalog/partition.h   |   1 +
 src/include/nodes/execnodes.h |  22 ++-
 src/include/nodes/nodes.h |   1 +
 src/test/regress/expected/insert_conflict.out | 108 ++--
 src/test/regress/expected/triggers.out|  33 
 src/test/regress/sql/insert_conflict.sql  |  95 +--
 src/test/regress/sql/triggers.sql |  33 
 14 files changed, 636 insertions(+), 83 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 3a54ba9d5a..8805b88d82 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3324,21 +3324,6 @@ ALTER TABLE measurement ATTACH PARTITION 
measurement_y2008m02
 
  
   
-   Using the ON CONFLICT clause with partitioned tables
-   will cause an error if the conflict target is specified (see
-for more details on how the clause
-   works).  Therefore, it is not possible to specify
-   DO UPDATE as the alternative action, because
-   specifying the conflict target is mandatory in that case.  On the other
-   hand, specifying DO NOTHING as the alternative action
-   works fine provided the conflict target is not specified.  In that case,
-   unique constraints (or exclusion constraints) of the individual leaf
-   partitions are considered.
-  
- 
-
- 
-  
When an UPDATE causes a 

Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-23 Thread Alvaro Herrera
I made a bunch of further edits and I think this v10 is ready to push.
Before doing so I'll give it a final look, particularly because of the
new elog(ERROR) I added.  Post-commit review is of course always
appreciated.

Most notable change is because I noticed that if you mention an
intermediate partition level in the INSERT command, and the index is on
the top level, arbiter index selection fails to find the correct index
because it walks all the way to the top instead of stopping in the
middle, as it should (the command was still working because it ended up
with an empty arbiter index list).

To fix this, I had to completely rework the "get partition parent root"
stuff into "get list of ancestors of this partition".

Because of this, I added a new check that the partition's arbiter index
list is same length as parent's; if not, throw an error.  I couldn't get
it to fire (so it's just an elog not ereport), but maybe I just didn't
try any weird enough scenarios.

Other changes:

* I added a copyObject() call for nodes we're operating upon.  Maybe
  this is unnecessary but the comments claimed "we're working on a copy"
  and I couldn't find any place where we were actually making one.
  Anyway it seems sane to make a copy, because we're scribbling on those
  nodes ... I hope I didn't introduce any serious memory leaks.

* I made the new OnConflictSetState thing into a proper node.  Like
  ResultRelInfo, it doesn't have any niceties like nodeToString support,
  but it seems saner this way (palloc -> makeNode).  I reworked the
  formatting of that struct definition too, and renamed members.

* I removed an assertion block at the bottom of adjust_partition_tlist.
  It seemed quite pointless, since it was just about checking that the
  resno values were sorted, but by construction we already know that
  they are indeed sorted ...

* General code style improvements, comment rewording, etc.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From d87cd6154fe026f7641e98c8a43683b208a61f5b Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 22 Mar 2018 19:12:57 -0300
Subject: [PATCH v10] on conflict

---
 doc/src/sgml/ddl.sgml |  15 --
 doc/src/sgml/ref/insert.sgml  |   8 +
 src/backend/catalog/partition.c   |  88 --
 src/backend/executor/execMain.c   |   4 +
 src/backend/executor/execPartition.c  | 229 --
 src/backend/executor/nodeModifyTable.c|  74 +++--
 src/backend/parser/analyze.c  |   7 -
 src/include/catalog/partition.h   |   1 +
 src/include/nodes/execnodes.h |  22 ++-
 src/include/nodes/nodes.h |   1 +
 src/test/regress/expected/insert_conflict.out | 108 ++--
 src/test/regress/expected/triggers.out|  33 
 src/test/regress/sql/insert_conflict.sql  |  95 +--
 src/test/regress/sql/triggers.sql |  33 
 14 files changed, 635 insertions(+), 83 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 3a54ba9d5a..8805b88d82 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3324,21 +3324,6 @@ ALTER TABLE measurement ATTACH PARTITION 
measurement_y2008m02
 
  
   
-   Using the ON CONFLICT clause with partitioned tables
-   will cause an error if the conflict target is specified (see
-for more details on how the clause
-   works).  Therefore, it is not possible to specify
-   DO UPDATE as the alternative action, because
-   specifying the conflict target is mandatory in that case.  On the other
-   hand, specifying DO NOTHING as the alternative action
-   works fine provided the conflict target is not specified.  In that case,
-   unique constraints (or exclusion constraints) of the individual leaf
-   partitions are considered.
-  
- 
-
- 
-  
When an UPDATE causes a row to move from one
partition to another, there is a chance that another concurrent
UPDATE or DELETE misses this row.
diff --git a/doc/src/sgml/ref/insert.sgml b/doc/src/sgml/ref/insert.sgml
index 134092fa9c..62e142fd8e 100644
--- a/doc/src/sgml/ref/insert.sgml
+++ b/doc/src/sgml/ref/insert.sgml
@@ -518,6 +518,14 @@ INSERT INTO table_name [ AS 
+
+   
+Note that it is currently not supported for the
+ON CONFLICT DO UPDATE clause of an
+INSERT applied to a partitioned table to update the
+partition key of a conflicting row such that it requires the row be moved
+to a new partition.
+   

 
  It is often preferable to use unique index inference rather than
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 53855f5088..b00a986432 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -138,6 

Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-23 Thread Alvaro Herrera
Thanks for these changes.  I'm going over this now, with intention to
push it shortly.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-23 Thread Etsuro Fujita

(2018/03/22 18:31), Amit Langote wrote:

On 2018/03/20 20:53, Etsuro Fujita wrote:

Here are comments on executor changes in (the latest version of) the patch:

@@ -421,8 +424,18 @@ ExecInsert(ModifyTableState *mtstate,
  ItemPointerData conflictTid;
  boolspecConflict;
  List   *arbiterIndexes;
+PartitionTupleRouting *proute =
+mtstate->mt_partition_tuple_routing;

-arbiterIndexes = node->arbiterIndexes;
+/* Use the appropriate list of arbiter indexes. */
+if (mtstate->mt_partition_tuple_routing != NULL)
+{
+Assert(partition_index>= 0&&  proute != NULL);
+arbiterIndexes =
+proute->partition_arbiter_indexes[partition_index];
+}
+else
+arbiterIndexes = node->arbiterIndexes;

To handle both cases the same way, I wonder if it would be better to have
the arbiterindexes list in ResultRelInfo as well, as mentioned by Alvaro
upthread, or to re-add mt_arbiterindexes as before and set it to
proute->partition_arbiter_indexes[partition_index] before we get here,
maybe in ExecPrepareTupleRouting, in the case of tuple routing.


It's a good idea.  I somehow missed that Alvaro had already mentioned it.

In HEAD, we now have ri_onConflictSetProj and ri_onConflictSetWhere.  I
propose we name the field ri_onConflictArbiterIndexes as done in the
updated patch.


I like that naming.


@@ -2368,9 +2419,13 @@ ExecInitModifyTable(ModifyTable *node, EState
*estate, int eflags)
  econtext = mtstate->ps.ps_ExprContext;
  relationDesc = resultRelInfo->ri_RelationDesc->rd_att;

-/* initialize slot for the existing tuple */
-mtstate->mt_existing =
-ExecInitExtraTupleSlot(mtstate->ps.state, relationDesc);
+/*
+ * Initialize slot for the existing tuple.  We determine which
+ * tupleDesc to use for this after we have determined which relation
+ * the insert/update will be applied to, possibly after performing
+ * tuple routing.
+ */
+mtstate->mt_existing = ExecInitExtraTupleSlot(mtstate->ps.state,
NULL);

  /* carried forward solely for the benefit of explain */
  mtstate->mt_excludedtlist = node->exclRelTlist;
@@ -2378,8 +2433,16 @@ ExecInitModifyTable(ModifyTable *node, EState
*estate, int eflags)
  /* create target slot for UPDATE SET projection */
  tupDesc = ExecTypeFromTL((List *) node->onConflictSet,
   relationDesc->tdhasoid);
+PinTupleDesc(tupDesc);
+mtstate->mt_conflproj_tupdesc = tupDesc;
+
+/*
+ * Just like the "existing tuple" slot, we'll defer deciding which
+ * tupleDesc to use for this slot to a point where tuple routing has
+ * been performed.
+ */
  mtstate->mt_conflproj =
-ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc);
+ExecInitExtraTupleSlot(mtstate->ps.state, NULL);

If we do ExecInitExtraTupleSlot for mtstate->mt_existing and
mtstate->mt_conflproj in ExecPrepareTupleRouting in the case of tuple
routing, as said above, we wouldn't need this changes.  I think doing that
only in the case of tuple routing and keeping this as-is would be better
because that would save cycles in the normal case.


Hmm, I think we shouldn't be doing ExecInitExtraTupleSlot in
ExecPrepareTupleRouting, because we shouldn't have more than one instance
of mtstate->mt_existing and mtstate->mt_conflproj slots.


Yeah, I think so too.  What I was going to say here is 
ExecSetSlotDescriptor, not ExecInitExtraTupleSlot, as you said below. 
Sorry about the incorrectness.  I guess I was too tired when writing 
that comments.



As you also said above, I think you meant to say here that we do
ExecInitExtraTupleSlot only once for both mtstate->mt_existing and
mtstate->mt_conflproj in ExecInitModifyTable and only do
ExecSetSlotDescriptor in ExecPrepareTupleRouting.


That's right.


I have changed it so
that ExecInitModifyTable now both creates the slot and sets the descriptor
for non-tuple-routing cases and only creates but doesn't set the
descriptor in the tuple-routing case.


IMHO I don't see much value in modifying code as such, because we do 
ExecSetSlotDescriptor for mt_existing and mt_conflproj in 
ExecPrepareTupleRouting for every inserted tuple.  So, I would leave 
that as-is, to keep that simple.



For ExecPrepareTupleRouting to be able to access the tupDesc of the
onConflictSet target list, I've added ri_onConflictSetProjTupDesc which is
set by ExecInitPartitionInfo on first call for a give partition.  This is
also suggested by Pavan in his review.


Seems like a good idea.

Here are some comments on the latest version of the patch:

+   /*
+* Caller must set mtstate->mt_conflproj's tuple 
descriptor to

+   

Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-22 Thread Amit Langote
On 2018/03/22 20:48, Pavan Deolasee wrote:
> Thanks. It's looking much better now.

Thanks.

> I think we can possibly move all ON
> CONFLICT related members to a separate structure and just copy the pointer
> to the structure if (map == NULL). That might make the code a bit more tidy.

OK, I tried that in the attached updated patch.

> Is there anything that needs to be done for transition tables? I checked
> and didn't see anything, but please check.

There doesn't seem to be anything that this patch has to do for transition
tables.  If you look at the tests I added in triggers.sql which exercise
INSERT ON CONFLICT's interaction with transition tables, you can see that
we get the same output for a partitioned table as we get for a normal table.

Thanks,
Amit
From d385c307fbe98935661d7b983229eb5b2e2e6436 Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 20 Mar 2018 10:09:38 +0900
Subject: [PATCH v9] Fix ON CONFLICT to work with partitioned tables

Author: Amit Langote, Alvaro Herrera, Etsuro Fujita
---
 doc/src/sgml/ddl.sgml |  15 --
 doc/src/sgml/ref/insert.sgml  |   8 +
 src/backend/catalog/partition.c   |  83 +++--
 src/backend/executor/execMain.c   |   4 +
 src/backend/executor/execPartition.c  | 252 --
 src/backend/executor/nodeModifyTable.c|  82 +++--
 src/backend/parser/analyze.c  |   7 -
 src/include/catalog/partition.h   |   1 +
 src/include/nodes/execnodes.h |  25 ++-
 src/test/regress/expected/insert_conflict.out |  86 +++--
 src/test/regress/expected/triggers.out|  33 
 src/test/regress/sql/insert_conflict.sql  |  72 +++-
 src/test/regress/sql/triggers.sql |  33 
 13 files changed, 616 insertions(+), 85 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 3a54ba9d5a..8805b88d82 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3324,21 +3324,6 @@ ALTER TABLE measurement ATTACH PARTITION 
measurement_y2008m02
 
  
   
-   Using the ON CONFLICT clause with partitioned tables
-   will cause an error if the conflict target is specified (see
-for more details on how the clause
-   works).  Therefore, it is not possible to specify
-   DO UPDATE as the alternative action, because
-   specifying the conflict target is mandatory in that case.  On the other
-   hand, specifying DO NOTHING as the alternative action
-   works fine provided the conflict target is not specified.  In that case,
-   unique constraints (or exclusion constraints) of the individual leaf
-   partitions are considered.
-  
- 
-
- 
-  
When an UPDATE causes a row to move from one
partition to another, there is a chance that another concurrent
UPDATE or DELETE misses this row.
diff --git a/doc/src/sgml/ref/insert.sgml b/doc/src/sgml/ref/insert.sgml
index 134092fa9c..62e142fd8e 100644
--- a/doc/src/sgml/ref/insert.sgml
+++ b/doc/src/sgml/ref/insert.sgml
@@ -518,6 +518,14 @@ INSERT INTO table_name [ AS 
+
+   
+Note that it is currently not supported for the
+ON CONFLICT DO UPDATE clause of an
+INSERT applied to a partitioned table to update the
+partition key of a conflicting row such that it requires the row be moved
+to a new partition.
+   

 
  It is often preferable to use unique index inference rather than
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 53855f5088..bfe559490e 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -192,6 +192,7 @@ static int  
get_partition_bound_num_indexes(PartitionBoundInfo b);
 static int get_greatest_modulus(PartitionBoundInfo b);
 static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
 Datum *values, 
bool *isnull);
+static Oid get_partition_parent_recurse(Relation inhRel, Oid relid, bool 
getroot);
 
 /*
  * RelationBuildPartitionDesc
@@ -1377,6 +1378,7 @@ check_default_allows_bound(Relation parent, Relation 
default_rel,
 
 /*
  * get_partition_parent
+ * Obtain direct parent of given relation
  *
  * Returns inheritance parent of a partition by scanning pg_inherits
  *
@@ -1387,14 +1389,59 @@ check_default_allows_bound(Relation parent, Relation 
default_rel,
 Oid
 get_partition_parent(Oid relid)
 {
-   Form_pg_inherits form;
-   RelationcatalogRelation;
+   RelationinhRel;
+   Oid parentOid;
+
+   inhRel = heap_open(InheritsRelationId, AccessShareLock);
+
+   parentOid = get_partition_parent_recurse(inhRel, relid, false);
+   if (parentOid == InvalidOid)
+   elog(ERROR, "could not find parent of relation %u", relid);
+
+   heap_close(inhRel, AccessShareLock);
+
+   return 

Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-22 Thread Pavan Deolasee
On Thu, Mar 22, 2018 at 3:01 PM, Amit Langote  wrote:

>
> >
> > Why do we need to pin the descriptor? If we do need, why don't we need
> > corresponding ReleaseTupDesc() call?
>
> PinTupleDesc was added in the patch as Alvaro had submitted it upthread,
> which it wasn't clear to me either why it was needed.  Looking at it
> closely, it seems we can get rid of it if for the lack of corresponding
> ReleaseTupleDesc().  ExecSetSlotDescriptor() seems to take care of pinning
> and releasing tuple descriptors that are passed to it.  If some
> partition's tupDesc remains pinned because it was the last one that was
> passed to it, the final ExecResetTupleTable will take care of releasing it.
>
> I have removed the instances of PinTupleDesc in the updated patch, but I'm
> not sure why the loose PinTupleDesc() in the previous version of the patch
> didn't cause reference leak warnings or some such.
>

Yeah, it wasn't clear to me as well. But I did not investigate. May be
Alvaro knows better.


> > I am still confused if the partition_conflproj_tdescs really belongs to
> > PartitionTupleRouting or should it be a part of the ResultRelInfo. FWIW
> for
> > the
> > MERGE patch, I moved everything to a new struct and made it part of the
> > ResultRelInfo. If no re-mapping is necessary, we can just point to the
> > struct
> > in the root relation's ResultRelInfo. Otherwise create and populate a new
> > one
> > in the partition relation's ResultRelInfo.
> >
> > + leaf_part_rri->ri_onConflictSetWhere =
> > + ExecInitQual(onconflwhere, >ps);
> > + }
> >
> > So ri_onConflictSetWhere and ri_onConflictSetProj are part of the
> > ResultRelInfo. Why not move mt_conflproj_tupdesc,
> > partition_conflproj_tdescs,
> > partition_arbiter_indexes etc to the ResultRelInfo as well?
>
> I have moved both the projection tupdesc and the arbiter indexes list into
> ResultRelInfo as I wrote above.
>
>
Thanks. It's looking much better now. I think we can possibly move all ON
CONFLICT related members to a separate structure and just copy the pointer
to the structure if (map == NULL). That might make the code a bit more tidy.

Is there anything that needs to be done for transition tables? I checked
and didn't see anything, but please check.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-22 Thread Amit Langote
Fujita-san, Pavan,

Thank you both for reviewing.  Replying to both emails here.

On 2018/03/20 20:53, Etsuro Fujita wrote:
> Here are comments on executor changes in (the latest version of) the patch:
> 
> @@ -421,8 +424,18 @@ ExecInsert(ModifyTableState *mtstate,
>  ItemPointerData conflictTid;
>  bool    specConflict;
>  List   *arbiterIndexes;
> +    PartitionTupleRouting *proute =
> +    mtstate->mt_partition_tuple_routing;
> 
> -    arbiterIndexes = node->arbiterIndexes;
> +    /* Use the appropriate list of arbiter indexes. */
> +    if (mtstate->mt_partition_tuple_routing != NULL)
> +    {
> +    Assert(partition_index >= 0 && proute != NULL);
> +    arbiterIndexes =
> +    proute->partition_arbiter_indexes[partition_index];
> +    }
> +    else
> +    arbiterIndexes = node->arbiterIndexes;
> 
> To handle both cases the same way, I wonder if it would be better to have
> the arbiterindexes list in ResultRelInfo as well, as mentioned by Alvaro
> upthread, or to re-add mt_arbiterindexes as before and set it to
> proute->partition_arbiter_indexes[partition_index] before we get here,
> maybe in ExecPrepareTupleRouting, in the case of tuple routing.

It's a good idea.  I somehow missed that Alvaro had already mentioned it.

In HEAD, we now have ri_onConflictSetProj and ri_onConflictSetWhere.  I
propose we name the field ri_onConflictArbiterIndexes as done in the
updated patch.

>  ExecOnConflictUpdate(ModifyTableState *mtstate,
>   ResultRelInfo *resultRelInfo,
> + TupleDesc onConflictSetTupdesc,
>   ItemPointer conflictTid,
>   TupleTableSlot *planSlot,
>   TupleTableSlot *excludedSlot,
> @@ -1419,6 +1459,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
>  ExecCheckHeapTupleVisible(estate, , buffer);
> 
>  /* Store target's existing tuple in the state's dedicated slot */
> +    ExecSetSlotDescriptor(mtstate->mt_existing, RelationGetDescr(relation));
>  ExecStoreTuple(, mtstate->mt_existing, buffer, false);
> 
>  /*
> @@ -1462,6 +1503,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
>  }
> 
>  /* Project the new tuple version */
> +    ExecSetSlotDescriptor(mtstate->mt_conflproj, onConflictSetTupdesc);
>  ExecProject(resultRelInfo->ri_onConflictSetProj);
> 
> Can we do ExecSetSlotDescriptor for mtstate->mt_existing and
> mtstate->mt_conflproj in ExecPrepareTupleRouting in the case of tuple
> routing?  That would make the API changes to ExecOnConflictUpdate
> unnecessary.

That's a good idea too, so done.

> 
> @@ -2368,9 +2419,13 @@ ExecInitModifyTable(ModifyTable *node, EState
> *estate, int eflags)
>  econtext = mtstate->ps.ps_ExprContext;
>  relationDesc = resultRelInfo->ri_RelationDesc->rd_att;
> 
> -    /* initialize slot for the existing tuple */
> -    mtstate->mt_existing =
> -    ExecInitExtraTupleSlot(mtstate->ps.state, relationDesc);
> +    /*
> + * Initialize slot for the existing tuple.  We determine which
> + * tupleDesc to use for this after we have determined which relation
> + * the insert/update will be applied to, possibly after performing
> + * tuple routing.
> + */
> +    mtstate->mt_existing = ExecInitExtraTupleSlot(mtstate->ps.state,
> NULL);
> 
>  /* carried forward solely for the benefit of explain */
>  mtstate->mt_excludedtlist = node->exclRelTlist;
> @@ -2378,8 +2433,16 @@ ExecInitModifyTable(ModifyTable *node, EState
> *estate, int eflags)
>  /* create target slot for UPDATE SET projection */
>  tupDesc = ExecTypeFromTL((List *) node->onConflictSet,
>   relationDesc->tdhasoid);
> +    PinTupleDesc(tupDesc);
> +    mtstate->mt_conflproj_tupdesc = tupDesc;
> +
> +    /*
> + * Just like the "existing tuple" slot, we'll defer deciding which
> + * tupleDesc to use for this slot to a point where tuple routing has
> + * been performed.
> + */
>  mtstate->mt_conflproj =
> -    ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc);
> +    ExecInitExtraTupleSlot(mtstate->ps.state, NULL);
> 
> If we do ExecInitExtraTupleSlot for mtstate->mt_existing and
> mtstate->mt_conflproj in ExecPrepareTupleRouting in the case of tuple
> routing, as said above, we wouldn't need this changes.  I think doing that
> only in the case of tuple routing and keeping this as-is would be better
> because that would save cycles in the normal case.

Hmm, I think we shouldn't be doing ExecInitExtraTupleSlot in
ExecPrepareTupleRouting, because we shouldn't have more than one instance
of mtstate->mt_existing and mtstate->mt_conflproj slots.

As you also said above, I think you meant to 

Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-21 Thread Pavan Deolasee
On Tue, Mar 20, 2018 at 10:09 AM, Amit Langote <
langote_amit...@lab.ntt.co.jp> wrote:

> On 2018/03/20 13:30, Amit Langote wrote:
> > I have incorporated your patch in the main patch after updating the
> > comments a bit.  Also, now that ee49f49 is in [1], the transition
> > table related tests I proposed yesterday pass nicely.  Instead of posting
> > as a separate patch, I have merged it with the main patch.  So now that
> > planner refactoring is unnecessary, attached is just one patch.
>
> Sorry, I forgot to remove a hunk in the patch affecting
> src/include/optimizer/prep.h.  Fixed in the attached updated version.
>


Thanks for writing a new version. A few comments:


  
   
-   Using the ON CONFLICT clause with partitioned
tables
-   will cause an error if the conflict target is specified (see
-for more details on how the
clause
-   works).  Therefore, it is not possible to specify
-   DO UPDATE as the alternative action, because
-   specifying the conflict target is mandatory in that case.  On the
other
-   hand, specifying DO NOTHING as the alternative
action
-   works fine provided the conflict target is not specified.  In that
case,
-   unique constraints (or exclusion constraints) of the individual leaf
-   partitions are considered.
-  
- 

We should document it somewhere that partition key update is not supported
by
ON CONFLICT DO UPDATE

 /*
  * get_partition_parent
+ * Obtain direct parent or topmost ancestor of given relation
  *
- * Returns inheritance parent of a partition by scanning pg_inherits
+ * Returns direct inheritance parent of a partition by scanning
pg_inherits;
+ * or, if 'getroot' is true, the topmost parent in the inheritance
hierarchy.
  *
  * Note: Because this function assumes that the relation whose OID is
passed
  * as an argument will have precisely one parent, it should only be called
  * when it is known that the relation is a partition.
  */

Given that most callers only look for immediate parent, I wonder if it makes
sense to have a new function, get_partition_root(), instead of changing
signature of the current function. That will reduce foot-print of this patch
quite a lot.


@@ -36,6 +38,7 @@ static char
*ExecBuildSlotPartitionKeyDescription(Relation rel,
  Datum *values,
  bool *isnull,
  int maxfieldlen);
+static List *adjust_onconflictset_tlist(List *tlist, TupleConversionMap
*map);

We should name this function in a more generic way, given that it's going
to be
used for other things too. What about adjust_partition_tlist?

+
+ /*
+ * If partition's tuple descriptor differs from the root parent,
+ * we need to adjust the onConflictSet target list to account for
+ * differences in attribute numbers.
+ */
+ if (map != NULL)
+ {
+ /*
+ * First convert Vars to contain partition's atttribute
+ * numbers.
+ */
+
+ /* Convert Vars referencing EXCLUDED pseudo-relation. */
+ onconflset = map_partition_varattnos(node->onConflictSet,
+ INNER_VAR,
+ partrel,
+ firstResultRel, NULL);

Are we not modifying node->onConflictSet in place? Or does
map_partition_varattnos() create a fresh copy before scribbling on the
input?
If it's former then I guess that's a problem. If it's latter then we ought
to
have comments explaining that.


+ tupDesc = ExecTypeFromTL(onconflset, partrelDesc->tdhasoid);
+ PinTupleDesc(tupDesc);

Why do we need to pin the descriptor? If we do need, why don't we need
corresponding ReleaseTupDesc() call?

I am still confused if the partition_conflproj_tdescs really belongs to
PartitionTupleRouting or should it be a part of the ResultRelInfo. FWIW for
the
MERGE patch, I moved everything to a new struct and made it part of the
ResultRelInfo. If no re-mapping is necessary, we can just point to the
struct
in the root relation's ResultRelInfo. Otherwise create and populate a new
one
in the partition relation's ResultRelInfo.

+ leaf_part_rri->ri_onConflictSetWhere =
+ ExecInitQual(onconflwhere, >ps);
+ }

So ri_onConflictSetWhere and ri_onConflictSetProj are part of the
ResultRelInfo. Why not move mt_conflproj_tupdesc,
partition_conflproj_tdescs,
partition_arbiter_indexes etc to the ResultRelInfo as well?

+
+/*
+ * Adjust the targetlist entries of an inherited ON CONFLICT DO UPDATE
+ * operation for a given partition
+ *

As I said above, we should disassociate this function from ON CONFLICT DO
UPDATE and rather have it as a general purpose facility.

+ * The expressions have already been fixed, but we have to make sure that
the
+ * target resnos match the partition.  In some cases, this can force us to
+ * re-order the tlist to preserve resno ordering.
+ *

Can we have some explanation regarding how the targetlist is reordered? I
know
the function does that by updating the resno in place, but some explanation
would help. Also, should we add an assertion-build check to confirm that the
resultant list is actually ordered?

@@ -66,7 +67,8 @@ static TupleTableSlot

Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-20 Thread Etsuro Fujita

(2018/03/20 13:30), Amit Langote wrote:

On 2018/03/19 21:59, Etsuro Fujita wrote:

(2018/03/18 13:17), Alvaro Herrera wrote:

Alvaro Herrera wrote:
The only thing that I remain unhappy about this patch is the whole
adjust_and_expand_partition_tlist() thing.  I fear we may be doing
redundant and/or misplaced work.  I'll look into it next week.


I'm still reviewing the patches, but I really agree on that point.  As
Pavan mentioned upthread, the onConflictSet tlist for the root parent,
from which we create a translated onConflictSet tlist for a partition,
would have already been processed by expand_targetlist() to contain all
missing columns as well, so I think we could create the tlist for the
partition by simply re-ordering the expression-converted tlist (ie,
conv_setproj) based on the conversion map for the partition.  The Attached
defines a function for that, which could be called, instead of calling
adjust_and_expand_partition_tlist().  This would allow us to get rid of
planner changes from the patches.  Maybe I'm missing something, though.


Thanks for the patch.  I can confirm your proposed
adjust_onconflictset_tlist() is enough to replace adjust_inherited_tlist()
+ expand_targetlist() combination (aka
adjust_and_expand_partition_tlist()), thus rendering the planner changes
in this patch unnecessary.  I tested it with a partition tree involving
partitions of varying attribute numbers (dropped columns included) and it
seems to work as expected (as also exercised in regression tests) as shown
below.


Thanks for testing!


I have incorporated your patch in the main patch after updating the
comments a bit.  Also, now that ee49f49 is in [1], the transition
table related tests I proposed yesterday pass nicely.  Instead of posting
as a separate patch, I have merged it with the main patch.  So now that
planner refactoring is unnecessary, attached is just one patch.


Here are comments on executor changes in (the latest version of) the patch:

@@ -421,8 +424,18 @@ ExecInsert(ModifyTableState *mtstate,
ItemPointerData conflictTid;
boolspecConflict;
List   *arbiterIndexes;
+   PartitionTupleRouting *proute =
+   
mtstate->mt_partition_tuple_routing;

-   arbiterIndexes = node->arbiterIndexes;
+   /* Use the appropriate list of arbiter indexes. */
+   if (mtstate->mt_partition_tuple_routing != NULL)
+   {
+   Assert(partition_index >= 0 && proute != NULL);
+   arbiterIndexes =
+   
proute->partition_arbiter_indexes[partition_index];
+   }
+   else
+   arbiterIndexes = node->arbiterIndexes;

To handle both cases the same way, I wonder if it would be better to 
have the arbiterindexes list in ResultRelInfo as well, as mentioned by 
Alvaro upthread, or to re-add mt_arbiterindexes as before and set it to 
proute->partition_arbiter_indexes[partition_index] before we get here, 
maybe in ExecPrepareTupleRouting, in the case of tuple routing.


 ExecOnConflictUpdate(ModifyTableState *mtstate,
 ResultRelInfo *resultRelInfo,
+TupleDesc onConflictSetTupdesc,
 ItemPointer conflictTid,
 TupleTableSlot *planSlot,
 TupleTableSlot *excludedSlot,
@@ -1419,6 +1459,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
ExecCheckHeapTupleVisible(estate, , buffer);

/* Store target's existing tuple in the state's dedicated slot */
+   ExecSetSlotDescriptor(mtstate->mt_existing, RelationGetDescr(relation));
ExecStoreTuple(, mtstate->mt_existing, buffer, false);

/*
@@ -1462,6 +1503,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
}

/* Project the new tuple version */
+   ExecSetSlotDescriptor(mtstate->mt_conflproj, onConflictSetTupdesc);
ExecProject(resultRelInfo->ri_onConflictSetProj);

Can we do ExecSetSlotDescriptor for mtstate->mt_existing and 
mtstate->mt_conflproj in ExecPrepareTupleRouting in the case of tuple 
routing?  That would make the API changes to ExecOnConflictUpdate 
unnecessary.


@@ -2368,9 +2419,13 @@ ExecInitModifyTable(ModifyTable *node, EState 
*estate, int eflags)

econtext = mtstate->ps.ps_ExprContext;
relationDesc = resultRelInfo->ri_RelationDesc->rd_att;

-   /* initialize slot for the existing tuple */
-   mtstate->mt_existing =
-   ExecInitExtraTupleSlot(mtstate->ps.state, relationDesc);
+   /*
+* Initialize 

Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-19 Thread Amit Langote
On 2018/03/20 13:30, Amit Langote wrote:
> I have incorporated your patch in the main patch after updating the
> comments a bit.  Also, now that ee49f49 is in [1], the transition
> table related tests I proposed yesterday pass nicely.  Instead of posting
> as a separate patch, I have merged it with the main patch.  So now that
> planner refactoring is unnecessary, attached is just one patch.

Sorry, I forgot to remove a hunk in the patch affecting
src/include/optimizer/prep.h.  Fixed in the attached updated version.

Thanks,
Amit
>From 793b407545b0d24715f0d44a9a546689e9a4282a Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 20 Mar 2018 10:09:38 +0900
Subject: [PATCH v7] Fix ON CONFLICT to work with partitioned tables

Author: Amit Langote, Alvaro Herrera, Etsuro Fujita
---
 doc/src/sgml/ddl.sgml |  15 --
 src/backend/catalog/heap.c|   2 +-
 src/backend/catalog/partition.c   |  62 +--
 src/backend/commands/tablecmds.c  |  15 +-
 src/backend/executor/execPartition.c  | 237 --
 src/backend/executor/nodeModifyTable.c|  93 --
 src/backend/parser/analyze.c  |   7 -
 src/include/catalog/partition.h   |   2 +-
 src/include/executor/execPartition.h  |  10 ++
 src/include/nodes/execnodes.h |   1 +
 src/test/regress/expected/insert_conflict.out |  73 ++--
 src/test/regress/expected/triggers.out|  33 
 src/test/regress/sql/insert_conflict.sql  |  64 +--
 src/test/regress/sql/triggers.sql |  33 
 14 files changed, 551 insertions(+), 96 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 3a54ba9d5a..8805b88d82 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3324,21 +3324,6 @@ ALTER TABLE measurement ATTACH PARTITION 
measurement_y2008m02
 
  
   
-   Using the ON CONFLICT clause with partitioned tables
-   will cause an error if the conflict target is specified (see
-for more details on how the clause
-   works).  Therefore, it is not possible to specify
-   DO UPDATE as the alternative action, because
-   specifying the conflict target is mandatory in that case.  On the other
-   hand, specifying DO NOTHING as the alternative action
-   works fine provided the conflict target is not specified.  In that case,
-   unique constraints (or exclusion constraints) of the individual leaf
-   partitions are considered.
-  
- 
-
- 
-  
When an UPDATE causes a row to move from one
partition to another, there is a chance that another concurrent
UPDATE or DELETE misses this row.
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 3d80ff9e5b..13489162df 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -1776,7 +1776,7 @@ heap_drop_with_catalog(Oid relid)
elog(ERROR, "cache lookup failed for relation %u", relid);
if (((Form_pg_class) GETSTRUCT(tuple))->relispartition)
{
-   parentOid = get_partition_parent(relid);
+   parentOid = get_partition_parent(relid, false);
LockRelationOid(parentOid, AccessExclusiveLock);
 
/*
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 786c05df73..8dc73ae092 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -192,6 +192,7 @@ static int  
get_partition_bound_num_indexes(PartitionBoundInfo b);
 static int get_greatest_modulus(PartitionBoundInfo b);
 static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
 Datum *values, 
bool *isnull);
+static Oid get_partition_parent_recurse(Relation inhRel, Oid relid, bool 
getroot);
 
 /*
  * RelationBuildPartitionDesc
@@ -1384,24 +1385,43 @@ check_default_allows_bound(Relation parent, Relation 
default_rel,
 
 /*
  * get_partition_parent
+ * Obtain direct parent or topmost ancestor of given relation
  *
- * Returns inheritance parent of a partition by scanning pg_inherits
+ * Returns direct inheritance parent of a partition by scanning pg_inherits;
+ * or, if 'getroot' is true, the topmost parent in the inheritance hierarchy.
  *
  * Note: Because this function assumes that the relation whose OID is passed
  * as an argument will have precisely one parent, it should only be called
  * when it is known that the relation is a partition.
  */
 Oid
-get_partition_parent(Oid relid)
+get_partition_parent(Oid relid, bool getroot)
+{
+   RelationinhRel;
+   Oid parentOid;
+
+   inhRel = heap_open(InheritsRelationId, AccessShareLock);
+
+   parentOid = get_partition_parent_recurse(inhRel, relid, getroot);
+   if (parentOid == InvalidOid)
+   elog(ERROR, 

Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-19 Thread Amit Langote
Fujita-san,

On 2018/03/19 21:59, Etsuro Fujita wrote:
> (2018/03/18 13:17), Alvaro Herrera wrote:
>> Alvaro Herrera wrote:
>> The only thing that I remain unhappy about this patch is the whole
>> adjust_and_expand_partition_tlist() thing.  I fear we may be doing
>> redundant and/or misplaced work.  I'll look into it next week.
> 
> I'm still reviewing the patches, but I really agree on that point.  As
> Pavan mentioned upthread, the onConflictSet tlist for the root parent,
> from which we create a translated onConflictSet tlist for a partition,
> would have already been processed by expand_targetlist() to contain all
> missing columns as well, so I think we could create the tlist for the
> partition by simply re-ordering the expression-converted tlist (ie,
> conv_setproj) based on the conversion map for the partition.  The Attached
> defines a function for that, which could be called, instead of calling
> adjust_and_expand_partition_tlist().  This would allow us to get rid of
> planner changes from the patches.  Maybe I'm missing something, though.

Thanks for the patch.  I can confirm your proposed
adjust_onconflictset_tlist() is enough to replace adjust_inherited_tlist()
+ expand_targetlist() combination (aka
adjust_and_expand_partition_tlist()), thus rendering the planner changes
in this patch unnecessary.  I tested it with a partition tree involving
partitions of varying attribute numbers (dropped columns included) and it
seems to work as expected (as also exercised in regression tests) as shown
below.

Partitioned table p has partitions p1, p2, p3, p4, and p5 whose attributes
look like this; shown as (colname: attnum, ...).

p:  (a: 1, b: 4)
p1: (a: 1, b: 4)
p2: (a: 2, b: 4)
p3: (a: 1, b: 3)
p4: (a: 3, b: 8)
p5: (a: 1, b: 5)

You may notice that all partitions but p1 will have a tuple conversion map
and hence will undergo adjust_onconflictset_tlist() treatment.

insert into p values (1, 'a') on conflict (a) do update set b = excluded.b
where excluded.b = 'b';
INSERT 0 1

insert into p values (1, 'b') on conflict (a) do update set b = excluded.b
where excluded.b = 'b';
INSERT 0 1

insert into p values (1, 'c') on conflict (a) do update set b = excluded.b
where excluded.b = 'b';
INSERT 0 0

insert into p values (1) on conflict (a) do update set b = excluded.b
where excluded.b = 'b';
INSERT 0 0

insert into p values (2, 'a') on conflict (a) do update set b = excluded.b
where excluded.b = 'b';
INSERT 0 1

insert into p values (2, 'b') on conflict (a) do update set b = excluded.b
where excluded.b = 'b';
INSERT 0 1

insert into p values (2, 'c') on conflict (a) do update set b = excluded.b
where excluded.b = 'b';
INSERT 0 0

insert into p values (5, 'a') on conflict (a) do update set b = excluded.b
where excluded.b = 'b';
INSERT 0 1

insert into p values (5, 'b') on conflict (a) do update set b = excluded.b
where excluded.b = 'b';
INSERT 0 1

insert into p values (5, 'c') on conflict (a) do update set b = excluded.b
where excluded.b = 'b';
INSERT 0 0

insert into p values (5) on conflict (a) do update set b = excluded.b
where excluded.b = 'b';
INSERT 0 0

select tableoid::regclass, * from p;
 tableoid | a | b
--+---+---
 p1   | 1 | b
 p2   | 2 | b
 p5   | 5 | b
(3 rows)

I have incorporated your patch in the main patch after updating the
comments a bit.  Also, now that ee49f49 is in [1], the transition
table related tests I proposed yesterday pass nicely.  Instead of posting
as a separate patch, I have merged it with the main patch.  So now that
planner refactoring is unnecessary, attached is just one patch.

Thanks,
Amit

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ee49f49
>From ac4f82d67720994ff8c632515bcf6760542c0d2f Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 20 Mar 2018 10:09:38 +0900
Subject: [PATCH v6] Fix ON CONFLICT to work with partitioned tables

Author: Amit Langote, Alvaro Herrera, Etsuro Fujita
---
 doc/src/sgml/ddl.sgml |  15 --
 src/backend/catalog/heap.c|   2 +-
 src/backend/catalog/partition.c   |  62 +--
 src/backend/commands/tablecmds.c  |  15 +-
 src/backend/executor/execPartition.c  | 237 --
 src/backend/executor/nodeModifyTable.c|  93 --
 src/backend/parser/analyze.c  |   7 -
 src/include/catalog/partition.h   |   2 +-
 src/include/executor/execPartition.h  |  10 ++
 src/include/nodes/execnodes.h |   1 +
 src/include/optimizer/prep.h  |   9 +
 src/test/regress/expected/insert_conflict.out |  73 ++--
 src/test/regress/expected/triggers.out|  33 
 src/test/regress/sql/insert_conflict.sql  |  64 +--
 src/test/regress/sql/triggers.sql |  33 
 15 files changed, 560 insertions(+), 96 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 3a54ba9d5a..8805b88d82 

Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-19 Thread Etsuro Fujita

(2018/03/18 13:17), Alvaro Herrera wrote:

Alvaro Herrera wrote:
The only thing that I remain unhappy about this patch is the whole
adjust_and_expand_partition_tlist() thing.  I fear we may be doing
redundant and/or misplaced work.  I'll look into it next week.


I'm still reviewing the patches, but I really agree on that point.  As 
Pavan mentioned upthread, the onConflictSet tlist for the root parent, 
from which we create a translated onConflictSet tlist for a partition, 
would have already been processed by expand_targetlist() to contain all 
missing columns as well, so I think we could create the tlist for the 
partition by simply re-ordering the expression-converted tlist (ie, 
conv_setproj) based on the conversion map for the partition.  The 
Attached defines a function for that, which could be called, instead of 
calling adjust_and_expand_partition_tlist().  This would allow us to get 
rid of planner changes from the patches.  Maybe I'm missing something, 
though.


Best regards,
Etsuro Fujita
*** src/backend/executor/execPartition.c.org	2018-03-19 21:32:52.0 +0900
--- src/backend/executor/execPartition.c	2018-03-19 21:44:17.0 +0900
***
*** 15,24 
--- 15,26 
  #include "postgres.h"
  
  #include "catalog/pg_inherits_fn.h"
+ #include "catalog/pg_type.h"
  #include "executor/execPartition.h"
  #include "executor/executor.h"
  #include "mb/pg_wchar.h"
  #include "miscadmin.h"
+ #include "nodes/makefuncs.h"
  #include "optimizer/prep.h"
  #include "utils/lsyscache.h"
  #include "utils/rls.h"
***
*** 37,42 
--- 39,45 
  	 Datum *values,
  	 bool *isnull,
  	 int maxfieldlen);
+ static List *adjust_onconflictset_tlist(List *tlist, TupleConversionMap *map);
  
  /*
   * ExecSetupPartitionTupleRouting - sets up information needed during
***
*** 554,565 
  		firstVarno, partrel,
  		firstResultRel, NULL);
  
! 			conv_setproj =
! adjust_and_expand_partition_tlist(RelationGetDescr(firstResultRel),
!   RelationGetDescr(partrel),
!   RelationGetRelationName(partrel),
!   firstVarno,
!   conv_setproj);
  
  			tupDesc = ExecTypeFromTL(conv_setproj, partrelDesc->tdhasoid);
  			part_conflproj_slot = ExecInitExtraTupleSlot(mtstate->ps.state,
--- 557,563 
  		firstVarno, partrel,
  		firstResultRel, NULL);
  
! 			conv_setproj = adjust_onconflictset_tlist(conv_setproj, map);
  
  			tupDesc = ExecTypeFromTL(conv_setproj, partrelDesc->tdhasoid);
  			part_conflproj_slot = ExecInitExtraTupleSlot(mtstate->ps.state,
***
*** 1091,1093 
--- 1089,1153 
  
  	return buf.data;
  }
+ 
+ /*
+  * Adjust the targetlist entries of a translated ON CONFLICT UPDATE operation
+  *
+  * The expressions have already been fixed, but we need to re-order the tlist
+  * so that the target resnos match the child table.
+  *
+  * The given tlist has already been through expression_tree_mutator;
+  * therefore the TargetEntry nodes are fresh copies that it's okay to
+  * scribble on.
+  */
+ static List *
+ adjust_onconflictset_tlist(List *tlist, TupleConversionMap *map)
+ {
+ 	List	   *new_tlist = NIL;
+ 	TupleDesc	tupdesc = map->outdesc;
+ 	AttrNumber *attrMap = map->attrMap;
+ 	int			numattrs = tupdesc->natts;
+ 	int			attrno;
+ 
+ 	for (attrno = 1; attrno <= numattrs; attrno++)
+ 	{
+ 		Form_pg_attribute att_tup = TupleDescAttr(tupdesc, attrno - 1);
+ 		TargetEntry *tle;
+ 
+ 		if (attrMap[attrno - 1] != 0)
+ 		{
+ 			Assert(!att_tup->attisdropped);
+ 
+ 			/* Get the corresponding tlist entry from the given tlist */
+ 			tle = (TargetEntry *) list_nth(tlist, attrMap[attrno - 1] - 1);
+ 
+ 			/* Get the resno right */
+ 			if (tle->resno != attrno)
+ tle->resno = attrno;
+ 		}
+ 		else
+ 		{
+ 			Node	   *expr;
+ 
+ 			Assert(att_tup->attisdropped);
+ 
+ 			/* Insert NULL for dropped column */
+ 			expr = (Node *) makeConst(INT4OID,
+ 	  -1,
+ 	  InvalidOid,
+ 	  sizeof(int32),
+ 	  (Datum) 0,
+ 	  true, /* isnull */
+ 	  true /* byval */ );
+ 
+ 			tle = makeTargetEntry((Expr *) expr,
+   attrno,
+   pstrdup(NameStr(att_tup->attname)),
+   false);
+ 		}
+ 
+ 		new_tlist = lappend(new_tlist, tle);
+ 	}
+ 
+ 	return new_tlist;
+ }


Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-19 Thread Amit Langote
On 2018/03/19 16:45, Amit Langote wrote:
> I have tried to make these changes and attached are the updated patches
> containing those, including the change I suggested for 0001 (that is,
> getting rid of mt_onconflict).  I also expanded some comments in 0003
> while making those changes.

I realized that there should be a test where transition table is involved
for an ON CONFLICT DO UPDATE on a partitioned table due to relevant
statement trigger on the table; something like the attached.  But due to a
bug being discussed over at [1], we won't get the correct expected output
for the test until the latest patch submitted for that bug [2] is
committed as a bug fix.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/ba19eff9-2120-680e-4671-55a9bea9454f%40lab.ntt.co.jp

[2]
https://www.postgresql.org/message-id/df921671-32df-45ea-c0e4-9b51ee86ba3b%40lab.ntt.co.jp
>From 152c6d5afed21e775caf4862c00e5c6388f7403b Mon Sep 17 00:00:00 2001
From: amit 
Date: Mon, 19 Mar 2018 17:13:08 +0900
Subject: [PATCH] More tests for ON CONFLICT DO UPDATE on partitioned tables

For transition tables of the DO UPDATE action.
---
 src/test/regress/expected/triggers.out | 33 +
 src/test/regress/sql/triggers.sql  | 33 +
 2 files changed, 66 insertions(+)

diff --git a/src/test/regress/expected/triggers.out 
b/src/test/regress/expected/triggers.out
index 99be9ac6e9..e8b849f773 100644
--- a/src/test/regress/expected/triggers.out
+++ b/src/test/regress/expected/triggers.out
@@ -2328,6 +2328,39 @@ insert into my_table values (3, 'CCC'), (4, 'DDD')
 NOTICE:  trigger = my_table_update_trig, old table = (3,CCC), (4,DDD), new 
table = (3,CCC:CCC), (4,DDD:DDD)
 NOTICE:  trigger = my_table_insert_trig, new table = 
 --
+-- now using a partitioned table
+--
+create table iocdu_tt_parted (a int primary key, b text) partition by list (a);
+create table iocdu_tt_parted1 partition of iocdu_tt_parted for values in (1);
+create table iocdu_tt_parted2 partition of iocdu_tt_parted for values in (2);
+create table iocdu_tt_parted3 partition of iocdu_tt_parted for values in (3);
+create table iocdu_tt_parted4 partition of iocdu_tt_parted for values in (4);
+create trigger iocdu_tt_parted_insert_trig
+  after insert on iocdu_tt_parted referencing new table as new_table
+  for each statement execute procedure dump_insert();
+create trigger iocdu_tt_parted_update_trig
+  after update on iocdu_tt_parted referencing old table as old_table new table 
as new_table
+  for each statement execute procedure dump_update();
+-- inserts only
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+NOTICE:  trigger = iocdu_tt_parted_update_trig, old table = , new table 
= 
+NOTICE:  trigger = iocdu_tt_parted_insert_trig, new table = (1,AAA), (2,BBB)
+-- mixture of inserts and updates
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB'), (3, 'CCC'), (4, 
'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+ERROR:  new row for relation "iocdu_tt_parted1" violates partition constraint
+DETAIL:  Failing row contains (2, BBB).
+-- updates only
+insert into iocdu_tt_parted values (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+NOTICE:  trigger = iocdu_tt_parted_update_trig, old table = , new table 
= 
+NOTICE:  trigger = iocdu_tt_parted_insert_trig, new table = (3,CCC), (4,DDD)
+drop table iocdu_tt_parted;
+--
 -- Verify that you can't create a trigger with transition tables for
 -- more than one event.
 --
diff --git a/src/test/regress/sql/triggers.sql 
b/src/test/regress/sql/triggers.sql
index 3354f4899f..3773c6bc98 100644
--- a/src/test/regress/sql/triggers.sql
+++ b/src/test/regress/sql/triggers.sql
@@ -1773,6 +1773,39 @@ insert into my_table values (3, 'CCC'), (4, 'DDD')
   update set b = my_table.b || ':' || excluded.b;
 
 --
+-- now using a partitioned table
+--
+
+create table iocdu_tt_parted (a int primary key, b text) partition by list (a);
+create table iocdu_tt_parted1 partition of iocdu_tt_parted for values in (1);
+create table iocdu_tt_parted2 partition of iocdu_tt_parted for values in (2);
+create table iocdu_tt_parted3 partition of iocdu_tt_parted for values in (3);
+create table iocdu_tt_parted4 partition of iocdu_tt_parted for values in (4);
+create trigger iocdu_tt_parted_insert_trig
+  after insert on iocdu_tt_parted referencing new table as new_table
+  for each statement execute procedure dump_insert();
+create trigger iocdu_tt_parted_update_trig
+  after update on iocdu_tt_parted referencing old table as old_table new table 
as new_table
+  for each statement execute procedure dump_update();
+
+-- inserts only
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+
+-- mixture of inserts and 

Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-19 Thread Amit Langote
Thanks for the updated patches.

On 2018/03/18 13:17, Alvaro Herrera wrote:
> Alvaro Herrera wrote:
> 
>> I think what I should be doing is the same as the returning stuff: keep
>> a tupdesc around, and use a single slot, whose descriptor is changed
>> just before the projection.
> 
> Yes, this works, though it's ugly.  Not any uglier than what's already
> there, though, so I think it's okay.
> 
> The only thing that I remain unhappy about this patch is the whole
> adjust_and_expand_partition_tlist() thing.  I fear we may be doing
> redundant and/or misplaced work.  I'll look into it next week.
> 
> 0001 should be pretty much ready to push -- adjustments to ExecInsert
>   and ModifyTableState I already mentioned.

This seems like good cleanup.

While at it, why not also get rid of mt_onconflict in favor of always just
using its counterpart in ModifyTable -- onConflictAction?

> 0002 is stuff I would like to get rid of completely -- changes to
>   planner code so that it better supports functionality we need for
>   0003.

Hmm.  I'm not sure if we can completely get rid of this, because we do
need the adjust_inherited_tlist() facility to translate TargetEntry resnos
in any case.

But as I just said in reply to Pavan's email suggesting deferring
onConflistSet expansion to execution time, we don't need the hack in
adjust_inherited_tlist() if we go with the suggestion.

> 0003 is the main patch.  Compared to the previous version, this one
>   reuses slots by switching them to different tupdescs as needed.

Your proposed change to use just one slot (the existing mt_conflproj slot)
sounds good.  Instead, it seems now we have an array to hold tupleDescs
for the onConflistSet target lists for each partition.

Some comments:

1. I noticed a bug that crashes a test in insert_conflit.sql that uses DO
NOTHING instead of DO UPDATE SET.  It's illegal for ExecInitPartitionInfo
to expect mt_conflproj_tupdesc to be valid in the DO NOTHING case, because
ExecInitModifyTable would only set it if (onConflictAction == DO_UPDATE).

2. It seems better to name the new array field in PartitionTupleRouting
partition_conflproj_tupdescs rather than partition_onconfl_tupdescs to be
consistent with the new field in ModifyTableState.

3. I think it was an oversight in my original patch, but it seems we
should allocate the partition_onconfl_tdescs array only if DO UPDATE
action is used.  Also, ri_onConflictSetProj, ri_onConflictSetWhere should
be only set in that case.  OTOH, we always need to set
partition_arbiter_indexes, that is, for both DO NOTHING and DO UPDATE SET
actions.

4. Need to remove the comments for partition_conflproj_slots and
partition_existing_slots, fields of PartitionTupleRouting that no longer
exist.  Instead one for partition_conflproj_tupdescs should be added.

5. I know the following is so as not to break the Assert in
adjust_inherited_tlist(), so why not have a parentOid argument for
adjust_and_expand_partition_tlist()?

+appinfo.parent_reloid = 1; // dummy  parentRel->rd_id;

6. There is a sentence in the comment above adjust_inherited_tlist():

  Note that this is not needed for INSERT because INSERT isn't
  inheritable.

Maybe, we need to delete that and mention that we do need it in the case
of INSERT ON CONFLICT DO UPDATE on partitioned tables for translating DO
UPDATE SET target list.

7. In ExecInsert, it'd be better to have a partArbiterIndexes, just like
partConflTupdesc in the outermost scope and then do:

+   /* Use the appropriate list of arbiter indexes. */
+   if (mtstate->mt_partition_tuple_routing != NULL)
+   arbiterIndexes = partArbiterIndexes;
+   else
+   arbiterIndexes = node->arbiterIndexes;

and

+   /* Use the appropriate tuple descriptor. */
+   if (mtstate->mt_partition_tuple_routing != NULL)
+   onconfl_tupdesc = partConflTupdesc;
+   else
+   onconfl_tupdesc = mtstate->mt_conflproj_tupdesc;

using arbiterIndexes and onconfl_tupdesc declared in the appropriate scopes.


I have tried to make these changes and attached are the updated patches
containing those, including the change I suggested for 0001 (that is,
getting rid of mt_onconflict).  I also expanded some comments in 0003
while making those changes.

Thanks,
Amit
>From e208f2fb3c2eaac6f7932f8c15afab789679e5ee Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 16 Mar 2018 14:29:28 -0300
Subject: [PATCH v5 1/3] Simplify ExecInsert API re. ON CONFLICT data

Instead of passing the ON CONFLICT-related members of ModifyTableState
into ExecInsert(), we can have that routine obtain them from the node,
since that is already an argument into the function.

While at it, remove arbiterIndexes from ModifyTableState, since that's
just a copy of the list already in the ModifyTable node, to which the
state node already 

Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-19 Thread Amit Langote
On 2018/03/05 18:04, Pavan Deolasee wrote:
> On Fri, Mar 2, 2018 at 9:06 PM, Alvaro Herrera 
> wrote:
>> Re. the "ugly hack" comments in adjust_inherited_tlist(), I'm confused:
>> expand_targetlist() runs *after*, not before, so how could it have
>> affected the result?
>>
> 
> If I understand correctly, planner must have called expand_targetlist()
> once for the parent relation's descriptor and added any dropped columns
> from the parent relation. So we may not find mapped attributes for those
> dropped columns in the parent. I haven't actually tested this case though.
> 
> I wonder if it makes sense to actually avoid expanding on-conflict-set
> targetlist in case the target is a partition table and deal with it during
> execution, like we are doing now.
>> I'm obviously confused about what
>> expand_targetlist call this comment is talking about.  Anyway I wanted
>> to make it use resjunk entries instead, but that broke some other case
>> that I didn't have time to research yesterday.  I'll get back to this
>> soon, but in the meantime, here's what I have.
>>
> 
> +   conv_setproj =
> +
>  adjust_and_expand_partition_tlist(RelationGetDescr(firstResultRel),
> + RelationGetDescr(partrel),
> +
>  RelationGetRelationName(partrel),
> + firstVarno,
> + conv_setproj);
> 
> Aren't we are adding back Vars referring to the parent relation, after
> having converted the existing ones to use partition relation's varno? May
> be that works because missing attributes are already added during planning
> and expand_targetlist() here only adds dropped columns, which are just NULL
> constants.

I think this suggestion to defer onConflictSet target list expansion  to
execution time is a good one.  So, in preprocess_targetlist(), we'd only
perform exapand_targetlist on the onConflictSet list if the table is not a
partitioned table.  For partitioned tables, we don't know which
partition(s) will be affected, so it's useless to do the expansion there.
Instead it's better to expand in ExecInitPartitionInfo(), possibly after
changing original TargetEntry nodes to have correct resnos due to
attribute number differences (calling adjust_inherited_tlist(), etc.).

And then since we're not expanding the target list in the planner anymore,
we don't need to install those hacks in adjust_inherited_tlist() like the
patch currently does to ignore entries for dropped columns in the parent
the plan-time expansion adds.

Thanks,
Amit




Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-17 Thread Alvaro Herrera
Alvaro Herrera wrote:

> I think what I should be doing is the same as the returning stuff: keep
> a tupdesc around, and use a single slot, whose descriptor is changed
> just before the projection.

Yes, this works, though it's ugly.  Not any uglier than what's already
there, though, so I think it's okay.

The only thing that I remain unhappy about this patch is the whole
adjust_and_expand_partition_tlist() thing.  I fear we may be doing
redundant and/or misplaced work.  I'll look into it next week.

0001 should be pretty much ready to push -- adjustments to ExecInsert
  and ModifyTableState I already mentioned.

0002 is stuff I would like to get rid of completely -- changes to
  planner code so that it better supports functionality we need for
  0003.

0003 is the main patch.  Compared to the previous version, this one
  reuses slots by switching them to different tupdescs as needed.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 9f9d78e02a474402ee37ebcbed8390f4f3470743 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 16 Mar 2018 14:29:28 -0300
Subject: [PATCH v4 1/3] Simplify ExecInsert API re. ON CONFLICT data

Instead of passing the ON CONFLICT-related members of ModifyTableState
into ExecInsert(), we can have that routine obtain them from the node,
since that is already an argument into the function.

While at it, remove arbiterIndexes from ModifyTableState, since that's
just a copy of the list already in the ModifyTable node, to which the
state node already has access.
---
 src/backend/executor/nodeModifyTable.c | 18 +-
 src/include/nodes/execnodes.h  |  2 --
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c 
b/src/backend/executor/nodeModifyTable.c
index 3332ae4bf3..a9a48e914f 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -258,8 +258,6 @@ static TupleTableSlot *
 ExecInsert(ModifyTableState *mtstate,
   TupleTableSlot *slot,
   TupleTableSlot *planSlot,
-  List *arbiterIndexes,
-  OnConflictAction onconflict,
   EState *estate,
   bool canSetTag)
 {
@@ -271,6 +269,7 @@ ExecInsert(ModifyTableState *mtstate,
List   *recheckIndexes = NIL;
TupleTableSlot *result = NULL;
TransitionCaptureState *ar_insert_trig_tcs;
+   OnConflictAction onconflict = mtstate->mt_onconflict;
 
/*
 * get the heap tuple out of the tuple table slot, making sure we have a
@@ -455,6 +454,7 @@ ExecInsert(ModifyTableState *mtstate,
else
{
WCOKind wco_kind;
+   boolcheck_partition_constr;
 
/*
 * We always check the partition constraint, including when the 
tuple
@@ -463,8 +463,7 @@ ExecInsert(ModifyTableState *mtstate,
 * trigger might modify the tuple such that the partition 
constraint
 * is no longer satisfied, so we need to check in that case.
 */
-   boolcheck_partition_constr =
-   (resultRelInfo->ri_PartitionCheck != NIL);
+   check_partition_constr = (resultRelInfo->ri_PartitionCheck != 
NIL);
 
/*
 * Constraints might reference the tableoid column, so 
initialize
@@ -510,6 +509,9 @@ ExecInsert(ModifyTableState *mtstate,
uint32  specToken;
ItemPointerData conflictTid;
boolspecConflict;
+   List   *arbiterIndexes;
+
+   arbiterIndexes = ((ModifyTable *) 
mtstate->ps.plan)->arbiterIndexes;
 
/*
 * Do a non-conclusive check for conflicts first.
@@ -627,7 +629,7 @@ ExecInsert(ModifyTableState *mtstate,
if (resultRelInfo->ri_NumIndices > 0)
recheckIndexes = ExecInsertIndexTuples(slot, 
&(tuple->t_self),

   estate, false, NULL,
-   
   arbiterIndexes);
+   
   NIL);
}
}
 
@@ -1217,8 +1219,8 @@ lreplace:;
Assert(mtstate->rootResultRelInfo != NULL);
estate->es_result_relation_info = 
mtstate->rootResultRelInfo;
 
-   ret_slot = ExecInsert(mtstate, slot, planSlot, NULL,
- 
ONCONFLICT_NONE, estate, canSetTag);
+

Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-16 Thread Alvaro Herrera
Andres Freund wrote:
> Hi,
> 
> On 2018-03-16 18:23:44 -0300, Alvaro Herrera wrote:
> > Another thing I noticed is that the split of the ON CONFLICT slot and
> > its corresponding projection is pretty weird.  The projection is in
> > ResultRelInfo, but the slot itself is in ModifyTableState.  You can't
> > make the projection work without a corresponding slot initialized with
> > the correct descriptor, so splitting it this way doesn't make a lot of
> > sense to me.
> > 
> > (Now, TBH the split between resultRelInfo->ri_projectReturning and
> > ModifyTableState->ps.ps_ResultTupleSlot, which is the slot that the
> > returning project uses, doesn't make a lot of sense to me either; so
> > maybe there some reason that I'm just not seeing.  But I digress.)
> 
> The projections for different child tables / child plans can look
> different, therefore it can't be stored in ModifyTableState itself. No?
> 
> The slot's descriptor is changed to be "appropriate" when necessary:
>   if (slot->tts_tupleDescriptor != 
> RelationGetDescr(resultRelationDesc))
>   ExecSetSlotDescriptor(slot, 
> RelationGetDescr(resultRelationDesc));

Grumble.  This stuff looks like full of cheats to me, but I won't touch
it anyway.

> > So I want to propose that we move the slot to be together with the
> > projection node that it serves, ie. we put the slot in ResultRelInfo:
> 
> That'll mean we have a good number of additional slots in some cases. I
> don't think the overhead of that is going to break the bank, but it's
> worth considering.

Good point.

I think what I should be doing is the same as the returning stuff: keep
a tupdesc around, and use a single slot, whose descriptor is changed
just before the projection.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-16 Thread Andres Freund
Hi,

On 2018-03-16 18:23:44 -0300, Alvaro Herrera wrote:
> Another thing I noticed is that the split of the ON CONFLICT slot and
> its corresponding projection is pretty weird.  The projection is in
> ResultRelInfo, but the slot itself is in ModifyTableState.  You can't
> make the projection work without a corresponding slot initialized with
> the correct descriptor, so splitting it this way doesn't make a lot of
> sense to me.
> 
> (Now, TBH the split between resultRelInfo->ri_projectReturning and
> ModifyTableState->ps.ps_ResultTupleSlot, which is the slot that the
> returning project uses, doesn't make a lot of sense to me either; so
> maybe there some reason that I'm just not seeing.  But I digress.)

The projections for different child tables / child plans can look
different, therefore it can't be stored in ModifyTableState itself. No?

The slot's descriptor is changed to be "appropriate" when necessary:
if (slot->tts_tupleDescriptor != 
RelationGetDescr(resultRelationDesc))
ExecSetSlotDescriptor(slot, 
RelationGetDescr(resultRelationDesc));


> So I want to propose that we move the slot to be together with the
> projection node that it serves, ie. we put the slot in ResultRelInfo:

That'll mean we have a good number of additional slots in some cases. I
don't think the overhead of that is going to break the bank, but it's
worth considering.

Greetings,

Andres Freund



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-16 Thread Alvaro Herrera
Peter Geoghegan wrote:
> On Fri, Mar 16, 2018 at 11:21 AM, Alvaro Herrera
>  wrote:
> > So ExecInsert receives the ModifyTableState, and separately it receives
> > arbiterIndexes and the OnConflictAction, both of which are members of
> > the passed ModifyTableState.  I wonder why does it do that; wouldn't it
> > be simpler to extract those members from the node?
> 
> > Or is there another reason to pass the index list?
> 
> It works that way pretty much by accident, as far as I can tell.
> Removing the two extra arguments sounds like a good idea.

Great, thanks.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-16 Thread Peter Geoghegan
On Fri, Mar 16, 2018 at 11:21 AM, Alvaro Herrera
 wrote:
> So ExecInsert receives the ModifyTableState, and separately it receives
> arbiterIndexes and the OnConflictAction, both of which are members of
> the passed ModifyTableState.  I wonder why does it do that; wouldn't it
> be simpler to extract those members from the node?

> Or is there another reason to pass the index list?

It works that way pretty much by accident, as far as I can tell.
Removing the two extra arguments sounds like a good idea.

-- 
Peter Geoghegan



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-16 Thread Alvaro Herrera
So ExecInsert receives the ModifyTableState, and separately it receives
arbiterIndexes and the OnConflictAction, both of which are members of
the passed ModifyTableState.  I wonder why does it do that; wouldn't it
be simpler to extract those members from the node?

With the patch proposed upthread, we receive arbiterIndexes as a
parameter and if the table is a partition we summarily ignore those and
use the list as extracted from the PartitionRoutingInfo.  This is
confusing and pointless.  It seems to me that the logic ought to be "if
partition then use the list in PartitionRoutingInfo; if not partition
use it from ModifyTableState".  This requires changing as per above,
i.e. make the arbiter index list not part of the ExecInsert's API.

The OnConflictAction doesn't matter much; not passing it is merely a
matter of cleanliness.

Or is there another reason to pass the index list?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-16 Thread Alvaro Herrera
Pavan Deolasee wrote:

> Hmm, yeah, probably you're right. The reason I got confused is because the
> patch uses ri_onConflictSetProj/ri_onConflictSetWhere to store the
> converted projection info/where clause for a given partition in its
> ResultRelationInfo. So I was wondering if we can
> move mt_arbiterindexes, mt_existing and mt_conflproj to ResultRelInfo and
> then just use that per-partition structures too.

I wonder if the proposed structure is good actually.

Some notes as I go along.

1. ModifyTableState->mt_arbiterindexes is just a copy of
ModifyTable->arbiterIndexes.  So why do we need it?  For an
unpartitioned relation we can just use
ModifyTableState.ps->arbiterIndexes.  Now, for each partition we need to
map these indexes onto the partition indexes.  Not sure where to put
these; I'm tempted to say ResultRelInfo is the place.  Maybe the list
should always be in ResultRelInfo instead of the state/plan node?  Not
sure.

2. We don't need mt_existing to be per-relation; a single tuple slot can
do for all partitions; we just need to ExecSlotSetDescriptor to the
partition's descriptor whenever the slot is going to be used.  (This
means the slot no longer has a fixed tupdesc.  That seems OK).

3. ModifyTableState->mt_conflproj is more or less the same thing; the
same TTS can be reused by all the different projections, as long as we
set the descriptor before projecting.  So we don't
need PartitionTupleRouting->partition_conflproj_slots, but we do need a
descriptor to be used when needed.  Let's say
  PartitionTupleRouting->partition_confl_tupdesc 

I'll experiment with these ideas and see where that leads me.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-16 Thread Pavan Deolasee
On Fri, Mar 16, 2018 at 5:13 PM, Etsuro Fujita 
wrote:

> (2018/03/16 19:43), Pavan Deolasee wrote:
>
>> On Fri, Mar 2, 2018 at 9:06 PM, Alvaro Herrera > > wrote:
>>
>
> @@ -106,6 +120,9 @@ typedef struct PartitionTupleRouting
>>  int num_subplan_partition_offsets;
>>  TupleTableSlot *partition_tuple_slot;
>>  TupleTableSlot *root_tuple_slot;
>> +   List  **partition_arbiter_indexes;
>> +   TupleTableSlot **partition_conflproj_slots;
>> +   TupleTableSlot **partition_existing_slots;
>>   } PartitionTupleRouting;
>>
>
> I am curious why you decided to add these members to
>> PartitionTupleRouting structure. Wouldn't ResultRelationInfo be a better
>> place to track these or is there some rule that we follow?
>>
>
> I just started reviewing the patch, so maybe I'm missing something, but I
> think it would be a good idea to have these in that structure, not in
> ResultRelInfo, because these would be required only for partitions chosen
> via tuple routing.
>

Hmm, yeah, probably you're right. The reason I got confused is because the
patch uses ri_onConflictSetProj/ri_onConflictSetWhere to store the
converted projection info/where clause for a given partition in its
ResultRelationInfo. So I was wondering if we can
move mt_arbiterindexes, mt_existing and mt_conflproj to ResultRelInfo and
then just use that per-partition structures too.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-16 Thread Etsuro Fujita

(2018/03/16 19:43), Pavan Deolasee wrote:

On Fri, Mar 2, 2018 at 9:06 PM, Alvaro Herrera > wrote:



@@ -106,6 +120,9 @@ typedef struct PartitionTupleRouting
 int num_subplan_partition_offsets;
 TupleTableSlot *partition_tuple_slot;
 TupleTableSlot *root_tuple_slot;
+   List  **partition_arbiter_indexes;
+   TupleTableSlot **partition_conflproj_slots;
+   TupleTableSlot **partition_existing_slots;
  } PartitionTupleRouting;



I am curious why you decided to add these members to
PartitionTupleRouting structure. Wouldn't ResultRelationInfo be a better
place to track these or is there some rule that we follow?


I just started reviewing the patch, so maybe I'm missing something, but 
I think it would be a good idea to have these in that structure, not in 
ResultRelInfo, because these would be required only for partitions 
chosen via tuple routing.


Best regards,
Etsuro Fujita



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-16 Thread Pavan Deolasee
On Fri, Mar 2, 2018 at 9:06 PM, Alvaro Herrera 
wrote:

>
>
@@ -106,6 +120,9 @@ typedef struct PartitionTupleRouting
int num_subplan_partition_offsets;
TupleTableSlot *partition_tuple_slot;
TupleTableSlot *root_tuple_slot;
+   List  **partition_arbiter_indexes;
+   TupleTableSlot **partition_conflproj_slots;
+   TupleTableSlot **partition_existing_slots;
 } PartitionTupleRouting;


I am curious why you decided to add these members to PartitionTupleRouting
structure. Wouldn't ResultRelationInfo be a better place to track these or
is there some rule that we follow?

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-05 Thread Pavan Deolasee
On Fri, Mar 2, 2018 at 9:06 PM, Alvaro Herrera 
wrote:

>
>
> Re. the "ugly hack" comments in adjust_inherited_tlist(), I'm confused:
> expand_targetlist() runs *after*, not before, so how could it have
> affected the result?
>

If I understand correctly, planner must have called expand_targetlist()
once for the parent relation's descriptor and added any dropped columns
from the parent relation. So we may not find mapped attributes for those
dropped columns in the parent. I haven't actually tested this case though.

I wonder if it makes sense to actually avoid expanding on-conflict-set
targetlist in case the target is a partition table and deal with it during
execution, like we are doing now.


> I'm obviously confused about what
> expand_targetlist call this comment is talking about.  Anyway I wanted
> to make it use resjunk entries instead, but that broke some other case
> that I didn't have time to research yesterday.  I'll get back to this
> soon, but in the meantime, here's what I have.
>

+   conv_setproj =
+
 adjust_and_expand_partition_tlist(RelationGetDescr(firstResultRel),
+ RelationGetDescr(partrel),
+
 RelationGetRelationName(partrel),
+ firstVarno,
+ conv_setproj);

Aren't we are adding back Vars referring to the parent relation, after
having converted the existing ones to use partition relation's varno? May
be that works because missing attributes are already added during planning
and expand_targetlist() here only adds dropped columns, which are just NULL
constants.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-04 Thread Amit Langote
On 2018/03/03 0:36, Alvaro Herrera wrote:
> Amit Langote wrote:
> 
>> Actually, after your comment on my original patch [1], I did make it work
>> for multiple levels by teaching the partition initialization code to find
>> a given partition's indexes that are inherited from the root table (that
>> is the table mentioned in command).  So, after a tuple is routed to a
>> partition, we switch from the original arbiterIndexes list to the one we
>> created for the partition, which must contain OIDs corresponding to those
>> in the original list.  After all, for each of the parent's indexes that
>> the planner put into the original arbiterIndexes list, there must exist an
>> index in each of the leaf partitions.
> 
> Oh, your solution for this seems simple enough.  Silly me, I was trying
> to implement it in a quite roundabout way.  Thanks.  (I do wonder if we
> should save the "root" reloid in the relcache).

Do you mean store the root reloid for "any" partition (table or index)?

>> I had also observed when working on the patch that various TupleTableSlots
>> used by the ON CONFLICT DO UPDATE code must be based on TupleDesc of the
>> inheritance-translated target list (DO UPDATE SET target list).  In fact,
>> that has to take into account also the dropped columns; we may have
>> dropped columns either in parent or in a partition or in both at same or
>> different attnum positions.  That means, simple map_partition_varattnos()
>> translation doesn't help in this case.
> 
> Yeah, I was aware these corner cases could become a problem though I
> hadn't gotten around to testing them yet.  Thanks for all your work on
> this.
> 
> The usage of the few optimizer/prep/ functions that are currently static
> doesn't fill me with joy.  These functions have weird APIs because
> they're static so we don't rally care, but once we export them we should
> strive to be more careful.  I'd rather stay away from just exporting
> them all, so I chose to encapsulate those calls in a single function and
> export only expand_targetlist from preptlist.c, keeping the others
> static in prepunion.c.  In the attached patch set, I put an API change
> (work on tupdescs rather than full-blown relations) for a couple of
> those functions as 0001, then your patch as 0002, then a few fixups of
> my own.  (0002 is not bit-by-bit identical to yours; I think I had to
> fix some merge conflict with 0001, but should be pretty much the same).
> 
> But looking further, I think there is much cruft that has accumulated in
> those functions (because they've gotten simplified over time), and we
> could do some additional cleanup surgery.  For example, there is no
> reason to pass a list pointer to make_inh_translation_list(); we could
> just return it.  And then we don't have to cons up a fake AppendRelInfo
> with all dummy values that adjust_inherited_tlist doesn't even care
> about.  I think there was a point for all these contortions back at some
> point (visible by checking git history of this code), but it all seems
> useless now.

Yeah, I think it might as well work to fix up these interfaces the way
you've done.

> Re. the "ugly hack" comments in adjust_inherited_tlist(), I'm confused:
> expand_targetlist() runs *after*, not before, so how could it have
> affected the result?  I'm obviously confused about what
> expand_targetlist call this comment is talking about.  Anyway I wanted
> to make it use resjunk entries instead, but that broke some other case
> that I didn't have time to research yesterday.  I'll get back to this
> soon, but in the meantime, here's what I have.

Hmm.  I can imagine how the newly added comments in
adjust_inherited_tlist() may be confusing.  With this patch, we're now
calling adjust_inherited_tlist() from the executor, whereas it was
designed only to be run in the planner prep phase.  What we're passing to
it from the executor is the DO UPDATE SET's targetlist that has undergone
the expand_targetlist() treatment by the planner.  Maybe, we need to
update the adjust_inherited_tlist() comments to reflect the expansion of
its scope due to this patch.

Some comments on 0003:

+ * Fist, fix the target entries' resnos, by using inheritance
translation.

First

+appinfo.parent_reltype = InvalidOid; // parentRel->rd_rel->reltype;

I guess you won't retain that comment. :)

+result_tl = expand_targetlist(result_tl, CMD_UPDATE,

Should we add a CmdType argument to adjust_and_expand_partition_tlist()
and use it instead of hard-coding CMD_UPDATE here?

Thanks,
Amit




Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-02 Thread Alvaro Herrera
Amit Langote wrote:

> Actually, after your comment on my original patch [1], I did make it work
> for multiple levels by teaching the partition initialization code to find
> a given partition's indexes that are inherited from the root table (that
> is the table mentioned in command).  So, after a tuple is routed to a
> partition, we switch from the original arbiterIndexes list to the one we
> created for the partition, which must contain OIDs corresponding to those
> in the original list.  After all, for each of the parent's indexes that
> the planner put into the original arbiterIndexes list, there must exist an
> index in each of the leaf partitions.

Oh, your solution for this seems simple enough.  Silly me, I was trying
to implement it in a quite roundabout way.  Thanks.  (I do wonder if we
should save the "root" reloid in the relcache).

> I had also observed when working on the patch that various TupleTableSlots
> used by the ON CONFLICT DO UPDATE code must be based on TupleDesc of the
> inheritance-translated target list (DO UPDATE SET target list).  In fact,
> that has to take into account also the dropped columns; we may have
> dropped columns either in parent or in a partition or in both at same or
> different attnum positions.  That means, simple map_partition_varattnos()
> translation doesn't help in this case.

Yeah, I was aware these corner cases could become a problem though I
hadn't gotten around to testing them yet.  Thanks for all your work on
this.

The usage of the few optimizer/prep/ functions that are currently static
doesn't fill me with joy.  These functions have weird APIs because
they're static so we don't rally care, but once we export them we should
strive to be more careful.  I'd rather stay away from just exporting
them all, so I chose to encapsulate those calls in a single function and
export only expand_targetlist from preptlist.c, keeping the others
static in prepunion.c.  In the attached patch set, I put an API change
(work on tupdescs rather than full-blown relations) for a couple of
those functions as 0001, then your patch as 0002, then a few fixups of
my own.  (0002 is not bit-by-bit identical to yours; I think I had to
fix some merge conflict with 0001, but should be pretty much the same).

But looking further, I think there is much cruft that has accumulated in
those functions (because they've gotten simplified over time), and we
could do some additional cleanup surgery.  For example, there is no
reason to pass a list pointer to make_inh_translation_list(); we could
just return it.  And then we don't have to cons up a fake AppendRelInfo
with all dummy values that adjust_inherited_tlist doesn't even care
about.  I think there was a point for all these contortions back at some
point (visible by checking git history of this code), but it all seems
useless now.

Re. the "ugly hack" comments in adjust_inherited_tlist(), I'm confused:
expand_targetlist() runs *after*, not before, so how could it have
affected the result?  I'm obviously confused about what
expand_targetlist call this comment is talking about.  Anyway I wanted
to make it use resjunk entries instead, but that broke some other case
that I didn't have time to research yesterday.  I'll get back to this
soon, but in the meantime, here's what I have.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From d5b3b7c252da2c2a25bd9b8de40a03f2d5f30081 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 1 Mar 2018 19:58:50 -0300
Subject: [PATCH v3 1/3] Make some static functions work on TupleDesc rather
 than Relation

---
 src/backend/optimizer/prep/preptlist.c | 21 +++--
 src/backend/optimizer/prep/prepunion.c | 26 --
 2 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/src/backend/optimizer/prep/preptlist.c 
b/src/backend/optimizer/prep/preptlist.c
index 8603feef2b..8c94dd4f59 100644
--- a/src/backend/optimizer/prep/preptlist.c
+++ b/src/backend/optimizer/prep/preptlist.c
@@ -116,7 +116,8 @@ preprocess_targetlist(PlannerInfo *root)
tlist = parse->targetList;
if (command_type == CMD_INSERT || command_type == CMD_UPDATE)
tlist = expand_targetlist(tlist, command_type,
- 
result_relation, target_relation);
+ 
result_relation,
+ 
RelationGetDescr(target_relation));
 
/*
 * Add necessary junk columns for rowmarked rels.  These values are 
needed
@@ -230,7 +231,7 @@ preprocess_targetlist(PlannerInfo *root)
expand_targetlist(parse->onConflict->onConflictSet,
  CMD_UPDATE,
  result_relation,

Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-02 Thread Alvaro Herrera
Amit Langote wrote:

> Actually, after your comment on my original patch [1], I did make it work
> for multiple levels by teaching the partition initialization code to find
> a given partition's indexes that are inherited from the root table (that
> is the table mentioned in command).  So, after a tuple is routed to a
> partition, we switch from the original arbiterIndexes list to the one we
> created for the partition, which must contain OIDs corresponding to those
> in the original list.  After all, for each of the parent's indexes that
> the planner put into the original arbiterIndexes list, there must exist an
> index in each of the leaf partitions.

Oh, your solution for this seems simple enough.  Silly me, I was trying
to implement it in a quite roundabout way.  Thanks.  (I do wonder if we
should save the "root" reloid in the relcache).

> I had also observed when working on the patch that various TupleTableSlots
> used by the ON CONFLICT DO UPDATE code must be based on TupleDesc of the
> inheritance-translated target list (DO UPDATE SET target list).  In fact,
> that has to take into account also the dropped columns; we may have
> dropped columns either in parent or in a partition or in both at same or
> different attnum positions.  That means, simple map_partition_varattnos()
> translation doesn't help in this case.

Yeah, I was aware these corner cases could become a problem though I
hadn't gotten around to testing them yet.  Thanks for all your work on
this.

The usage of the few optimizer/prep/ functions that are currently static
doesn't fill me with joy.  These functions have weird APIs because
they're static so we don't rally care, but once we export them we should
strive to be more careful.  I'd rather stay away from just exporting
them all, so I chose to encapsulate those calls in a single function and
export only expand_targetlist from preptlist.c, keeping the others
static in prepunion.c.  In the attached patch set, I put an API change
(work on tupdescs rather than full-blown relations) for a couple of
those functions as 0001, then your patch as 0002, then a few fixups of
my own.  (0002 is not bit-by-bit identical to yours; I think I had to
fix some merge conflict with 0001, but should be pretty much the same).

But looking further, I think there is much cruft that has accumulated in
those functions (because they've gotten simplified over time), and we
could do some additional cleanup surgery.  For example, there is no
reason to pass a list pointer to make_inh_translation_list(); we could
just return it.  And then we don't have to cons up a fake AppendRelInfo
with all dummy values that adjust_inherited_tlist doesn't even care
about.  I think there was a point for all these contortions back at some
point (visible by checking git history of this code), but it all seems
useless now.

Re. the "ugly hack" comments in adjust_inherited_tlist(), I'm confused:
expand_targetlist() runs *after*, not before, so how could it have
affected the result?  I'm obviously confused about what
expand_targetlist call this comment is talking about.  Anyway I wanted
to make it use resjunk entries instead, but that broke some other case
that I didn't have time to research yesterday.  I'll get back to this
soon, but in the meantime, here's what I have.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-02-28 Thread Amit Langote
On 2018/03/01 1:03, Robert Haas wrote:
> On Tue, Feb 27, 2018 at 7:46 PM, Alvaro Herrera
>  wrote:
>> I updated Amit Langote's patch for INSERT ON CONFLICT DO UPDATE[1].
>> Following the lead of edd44738bc88 ("Be lazier about partition tuple
>> routing.") this incarnation only does the necessary push-ups for the
>> specific partition that needs it, at execution time.  As far as I can
>> tell, it works as intended.
>>
>> I chose to refuse the case where the DO UPDATE clause causes the tuple
>> to move to another partition (i.e. you're updating the partition key of
>> the tuple).  While it's probably possible to implement that, it doesn't
>> seem a very productive use of time.
> 
> I would have thought that to be the only case we could support with
> the current infrastructure.  Doesn't a correct implementation for any
> other case require a global index?

I'm thinking that Alvaro is talking here about the DO UPDATE action part,
not the conflict checking part.  The latter will definitely require global
indexes if conflict were to be checked on columns not containing the
partition key.

The case Alvaro mentions arises after checking the conflict, presumably
using an inherited unique index whose keys must include the partition
keys. If the conflict action is DO UPDATE and its SET clause changes
partition key columns such that the row will have to change the partition,
then the current patch will result in an error.  I think that's because
making update row movement work in this case will require some adjustments
to what 2f178441044 (Allow UPDATE to move rows between partitions)
implemented.  We wouldn't have things set up in the ModifyTableState that
the current row-movement code depends on being set up; for example, there
wouldn't be per-subplan ResultRelInfo's in the ON CONFLICT DO UPDATE case.
 The following Assert in ExecUpdate() will fail for instance:

map_index = resultRelInfo - mtstate->resultRelInfo;
Assert(map_index >= 0 && map_index < mtstate->mt_nplans);

Thanks,
Amit




Re: ON CONFLICT DO UPDATE for partitioned tables

2018-02-28 Thread Robert Haas
On Tue, Feb 27, 2018 at 7:46 PM, Alvaro Herrera
 wrote:
> I updated Amit Langote's patch for INSERT ON CONFLICT DO UPDATE[1].
> Following the lead of edd44738bc88 ("Be lazier about partition tuple
> routing.") this incarnation only does the necessary push-ups for the
> specific partition that needs it, at execution time.  As far as I can
> tell, it works as intended.
>
> I chose to refuse the case where the DO UPDATE clause causes the tuple
> to move to another partition (i.e. you're updating the partition key of
> the tuple).  While it's probably possible to implement that, it doesn't
> seem a very productive use of time.

I would have thought that to be the only case we could support with
the current infrastructure.  Doesn't a correct implementation for any
other case require a global index?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-02-28 Thread Amit Langote
On 2018/02/28 9:46, Alvaro Herrera wrote:
> I updated Amit Langote's patch for INSERT ON CONFLICT DO UPDATE[1].
> Following the lead of edd44738bc88 ("Be lazier about partition tuple
> routing.") this incarnation only does the necessary push-ups for the
> specific partition that needs it, at execution time.  As far as I can
> tell, it works as intended.

Thanks.  I too have been meaning to send an updated (nay, significantly
rewritten) version of the patch, but you beat me to it.

> I chose to refuse the case where the DO UPDATE clause causes the tuple
> to move to another partition (i.e. you're updating the partition key of
> the tuple).  While it's probably possible to implement that, it doesn't
> seem a very productive use of time.

Probably a good idea to save it for later.

> However, there is a shortcoming in the design: it fails if there are
> multiple levels of partitioning, because there is no easy (efficient)
> way to map the index OID more than one level.  I had already mentioned
> this shortcoming to Amit's patch.  So this case (which I added in the
> regression tests) fails unexpectedly:
> 
> -- multiple-layered partitioning
> create table parted_conflict_test (a int primary key, b text) partition by 
> range (a);
> create table parted_conflict_test_1 partition of parted_conflict_test
>   for values from (0) to (1) partition by range (a);
> create table parted_conflict_test_1_1 partition of parted_conflict_test_1
>   for values from (0) to (100);
> insert into parted_conflict_test values ('10', 'ten');
> insert into parted_conflict_test values ('10', 'ten two')
>   on conflict (a) do update set b = excluded.b;
> ERROR:  invalid ON CONFLICT DO UPDATE specification
> DETAIL:  An inferred index was not found in partition 
> "parted_conflict_test_1_1".
> 
> So the problem here is that my MapPartitionIndexList() implementation is
> too stupid.  I think it would be smarter to use the ResultRelInfo
> instead of bare Relation, for one.  But that still doesn't answer how to
> find a "path" from root to leaf partition, which is what I'd need to
> verify that there are valid pg_inherits relationships for the partition
> indexes.  I'm probably missing something about the partitionDesc or
> maybe the partitioned_rels lists that helps me do it efficiently, but I
> hope figure out soon.
> 
> One idea I was toying with is to add RelationData->rd_children as a list
> of OIDs of children relations.  So it should be possible to walk the
> list from the root to the desired descendant, without having to scan
> pg_inherits repeatedly ... although that would probably require doing
> relation_open() for every index, which sounds undesirable.
> 
> (ISTM that having RelationData->rd_children might be a good idea in
> general anyway -- I mean to speed up some code that currently scans
> pg_inherits via find_inheritance_children.  However, since the partition
> descriptor itself is already in relcache, maybe this doesn't matter too
> much.)
> 
> Another idea is to abandon the notion that we need to find a path from
> parent index to descendant index, and just run the inference algorithm
> again on the partition.  I'm not sure how much I like this idea, yet.
> 
> Anyway, while this is still WIP, I think it works correctly for the case
> where there is a single partition level.

Actually, after your comment on my original patch [1], I did make it work
for multiple levels by teaching the partition initialization code to find
a given partition's indexes that are inherited from the root table (that
is the table mentioned in command).  So, after a tuple is routed to a
partition, we switch from the original arbiterIndexes list to the one we
created for the partition, which must contain OIDs corresponding to those
in the original list.  After all, for each of the parent's indexes that
the planner put into the original arbiterIndexes list, there must exist an
index in each of the leaf partitions.

I had also observed when working on the patch that various TupleTableSlots
used by the ON CONFLICT DO UPDATE code must be based on TupleDesc of the
inheritance-translated target list (DO UPDATE SET target list).  In fact,
that has to take into account also the dropped columns; we may have
dropped columns either in parent or in a partition or in both at same or
different attnum positions.  That means, simple map_partition_varattnos()
translation doesn't help in this case.

For example, with your patch (sorry, I know you said it's a WIP patch), I
see either a crash or errors when dealing with such differing attribute
numbers:

drop table p;
create table p (a int, b text) partition by list (a);
create table p12 (b text, a int);
alter table p attach partition p12 for values in (1, 2);
alter table p drop b, add b text;
create table p4 partition of p for values in (4);
create unique index on p (a);

insert into p values (1, 'a') on conflict (a) do update set b = excluded.b;

insert into p values (1, 'b') on conflict (a) do update set b =