Re: default partition and concurrent attach partition

2020-09-08 Thread Amit Langote
On Wed, Sep 9, 2020 at 7:41 AM Alvaro Herrera  wrote:
> On 2020-Sep-08, Amit Langote wrote:
>
> > Yeah, we need to make sure that ExecPartitionCheck gets a slot whose
> > TupleDesc matches the partition's.  Actually we do have such dedicated
> > slots for partitions around (for both sub-partitioned and leaf
> > partitions), so we can simply use them instead of creating one from
> > scratch for every use.  It did take some code rearrangement to
> > actually make that work though.
>
> Pushed, thanks for working on this fix.

Thanks.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com




Re: default partition and concurrent attach partition

2020-09-08 Thread Alvaro Herrera
On 2020-Sep-08, Amit Langote wrote:

> Yeah, we need to make sure that ExecPartitionCheck gets a slot whose
> TupleDesc matches the partition's.  Actually we do have such dedicated
> slots for partitions around (for both sub-partitioned and leaf
> partitions), so we can simply use them instead of creating one from
> scratch for every use.  It did take some code rearrangement to
> actually make that work though.

Pushed, thanks for working on this fix.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: default partition and concurrent attach partition

2020-09-08 Thread Alvaro Herrera
On 2020-Sep-08, Alvaro Herrera wrote:

> Andres added to CC because of TTS interface: apparently calling
> slot_getallattrs() on a virtual slot raises the error "getsomeattrs is
> not required to be called on a virtual tuple table slot".  I'm thinking
> that this exposes implementation details that should not be necessary
> for a caller to know; patch 0001 fixes that at the problematic caller by
> making the slot_getallattrs() call conditional on the slot not being
> virtual, but I wonder if the better fix isn't to remove the elog(ERROR)
> at tts_virtual_getsomeattrs.

Actually I misread the code, so this is all bogus.  Nevermind ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: default partition and concurrent attach partition

2020-09-08 Thread Alvaro Herrera
Hi,

Andres added to CC because of TTS interface: apparently calling
slot_getallattrs() on a virtual slot raises the error "getsomeattrs is
not required to be called on a virtual tuple table slot".  I'm thinking
that this exposes implementation details that should not be necessary
for a caller to know; patch 0001 fixes that at the problematic caller by
making the slot_getallattrs() call conditional on the slot not being
virtual, but I wonder if the better fix isn't to remove the elog(ERROR)
at tts_virtual_getsomeattrs.

Moving on from that -- this is a version of Amit's previous patch that I
like better.  I think the "prev_myslot" thing was a bit ugly, but it
turns out that with the above fix we can clear the slot before creating
the new one, making things more sensible.  I also changed the "do {}
while ()" into a simple "while {}" loop, which works because the
condition is true on loop entrance.  There is some minor comment
rewording as well.
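
In sketch form, that restructuring is (a sketch only, not the committed
code; the real loop body routes the tuple one level down and rechecks
the default partition's constraint):

	/* earlier patch: body-first form */
	dispatch = pd[0];
	do
	{
		/* ... descend one level, check default partition ... */
	} while (dispatch != NULL);

	/* v3: a plain while loop -- equivalent here, because dispatch
	 * (the root table's PartitionDispatch) is never NULL on entry */
	dispatch = pd[0];
	while (dispatch != NULL)
	{
		/* ... same body ... */
	}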

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From ef139f89531ba15f480cbb64c2bddeee03dc20ab Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 8 Sep 2020 16:55:30 -0300
Subject: [PATCH v3 1/2] Don't "getsomeattrs" on virtual slots
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

execute_attr_map_slot() was not careful about not calling
slot_getallattrs() on tuple slots that could be virtual.  Repair.

Author: Álvaro Herrera 
---
 src/backend/access/common/tupconvert.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/common/tupconvert.c b/src/backend/access/common/tupconvert.c
index 3cb0cbefaa..d440c1201a 100644
--- a/src/backend/access/common/tupconvert.c
+++ b/src/backend/access/common/tupconvert.c
@@ -193,7 +193,8 @@ execute_attr_map_slot(AttrMap *attrMap,
 	outnatts = out_slot->tts_tupleDescriptor->natts;
 
 	/* Extract all the values of the in slot. */
-	slot_getallattrs(in_slot);
+	if (!TTS_IS_VIRTUAL(in_slot))
+		slot_getallattrs(in_slot);
 
 	/* Before doing the mapping, clear any old contents from the out slot */
 	ExecClearTuple(out_slot);
-- 
2.20.1

From e5a2b5ddbbb55dd9a83474f3f55b8f8724a3a63e Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 3 Sep 2020 13:18:35 -0400
Subject: [PATCH v3 2/2] Check default partitions constraints while descending
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Partitioning tuple route code assumes that the partition chosen while
descending the partition hierarchy is always the correct one.  This is
true except when the partition is the default partition and another
partition has been added concurrently: the partition constraint changes
and we don't recheck it.  This can lead to tuples mistakenly being added
to the default partition that should have been rejected.

Fix by rechecking the default partition constraint while descending the
hierarchy.

An isolation test based on the reproduction steps described by Hao Wu
(with tweaks for extra coverage) is included.

Reported by: Hao Wu 
Author: Amit Langote 
Author: Álvaro Herrera 
Discussion: https://postgr.es/m/ca+hiwqfqbmcssap4sfncbuel_vfommekaq3gwuhyfa4c7j_...@mail.gmail.com
Discussion: https://postgr.es/m/dm5pr0501mb3910e97a9edfb4c775cf3d75a4...@dm5pr0501mb3910.namprd05.prod.outlook.com
---
 src/backend/executor/execPartition.c  | 127 ++
 .../expected/partition-concurrent-attach.out  |  49 +++
 src/test/isolation/isolation_schedule |   1 +
 .../specs/partition-concurrent-attach.spec|  43 ++
 4 files changed, 195 insertions(+), 25 deletions(-)
 create mode 100644 src/test/isolation/expected/partition-concurrent-attach.out
 create mode 100644 src/test/isolation/specs/partition-concurrent-attach.spec

diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 79fcbd6b06..bc2372959c 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -51,6 +51,11 @@
  *		PartitionDispatchData->indexes for details on how this array is
  *		indexed.
  *
+ * nonleaf_partitions
+ *		Array of 'max_dispatch' elements containing pointers to fake
+ *		ResultRelInfo objects for nonleaf partitions, useful for checking
+ *		the partition constraint.
+ *
  * num_dispatch
  *		The current number of items stored in the 'partition_dispatch_info'
  *		array.  Also serves as the index of the next free array element for
@@ -89,6 +94,7 @@ struct PartitionTupleRouting
 {
 	Relation	partition_root;
 	PartitionDispatch *partition_dispatch_info;
+	ResultRelInfo **nonleaf_partitions;
 	int			num_dispatch;
 	int			max_dispatch;
 	ResultRelInfo **partitions;
@@ -280,9 +286,11 @@ ExecFindPartition(ModifyTableState *mtstate,
 	PartitionDispatch dispatch;
 	PartitionDesc partdesc;
 	ExprContext *ecxt = GetPerTupleExprContext(estate);
-	TupleTableSlot *ecxt_scantuple_old = 

Re: default partition and concurrent attach partition

2020-09-08 Thread Alvaro Herrera
Hello Amit,

On 2020-Sep-08, Amit Langote wrote:

> On Tue, Sep 8, 2020 at 8:44 AM Alvaro Herrera wrote:
> > On 2020-Sep-07, Alvaro Herrera wrote:
> >
> > > Ah, it looks like we can get away with initializing the RRI to 0, and
> > > then explicitly handle that case in ExecPartitionCheckEmitError, as in
> > > the attached (which means reindenting, but I left it alone to make it
> > > easy to read).
> 
> At this point, I think it may be clear that ri_RangeTableIndex being
> set to a dummy value isn't problematic.

Yep ... I misled myself.

> Yeah, we need to make sure that ExecPartitionCheck gets a slot whose
> TupleDesc matches the partition's.  Actually we do have such dedicated
> slots for partitions around (for both sub-partitioned and leaf
> partitions),

Yeah, that's what I was looking for.

> so we can simply use them instead of creating one from
> scratch for every use.  It did take some code rearrangement to
> actually make that work though.

Thanks.  It doesn't look too bad ... I'd say it even looks easier to
read now in terms of code structure.

> Attached is the latest patch including those changes.  Also, I have
> merged your earlier "fixes" patch and updated the isolation test to
> exercise a case with sub-partitioned default partition, as well as
> varying attribute numbers.  Patch applies as-is to both HEAD and v13
> branches, but needs slight changes to apply to v12 branch, so also
> attaching one for that branch.

Thanks, will dispatch it shortly.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: default partition and concurrent attach partition

2020-09-08 Thread Amit Langote
Hi Alvaro,

On Tue, Sep 8, 2020 at 8:44 AM Alvaro Herrera  wrote:
> On 2020-Sep-07, Alvaro Herrera wrote:
>
> > Ah, it looks like we can get away with initializing the RRI to 0, and
> > then explicitly handle that case in ExecPartitionCheckEmitError, as in
> > the attached (which means reindenting, but I left it alone to make it
> > easy to read).

At this point, I think it may be clear that ri_RangeTableIndex being
set to a dummy value isn't problematic.  I checked the code paths you
mentioned upthread, but don't see anything that would be broken by
ri_RangeTableIndex being set to a dummy value of 1.

Moving on from that...

> Well, that was silly -- this seems fixable by mapping the tuple columns
> prior to ExecPartitionCheck, which is achievable with something like
>
> 	if (partidx == partdesc->boundinfo->default_index)
> 	{
> 		TupleTableSlot *tempslot;
>
> 		if (dispatch->tupmap != NULL)
> 		{
> 			tempslot = MakeSingleTupleTableSlot(RelationGetDescr(rri->ri_RelationDesc),
> 												&TTSOpsVirtual);
> 			tempslot = execute_attr_map_slot(dispatch->tupmap, slot, tempslot);
> 		}
> 		else
> 			tempslot = slot;
> 		ExecPartitionCheck(rri, tempslot, estate, true);
> 		if (dispatch->tupmap != NULL)
> 			ExecDropSingleTupleTableSlot(tempslot);
> 	}
>
> though this exact incantation, apart from being pretty horrible, doesn't
> actually work (other than to fix this specific test case -- it crashes
> elsewhere.)

Yeah, we need to make sure that ExecPartitionCheck gets a slot whose
TupleDesc matches the partition's.  Actually we do have such dedicated
slots for partitions around (for both sub-partitioned and leaf
partitions), so we can simply use them instead of creating one from
scratch for every use.  It did take some code rearrangement to
actually make that work though.
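
In rough outline, the approach is (a sketch only, not the exact patch;
"checkslot" is an illustrative name, and this assumes the tupslot that
each PartitionDispatch keeps for tuple conversion -- the sub-partitioned
case; leaf partitions have their own dedicated slot):

	/* Reuse the dispatch's dedicated slot to get the tuple into the
	 * partition's own layout before checking its constraint, instead
	 * of building a throwaway slot for every use. */
	if (partidx == partdesc->boundinfo->default_index)
	{
		TupleTableSlot *checkslot = slot;

		if (dispatch->tupmap != NULL)
			checkslot = execute_attr_map_slot(dispatch->tupmap, slot,
											  dispatch->tupslot);
		ExecPartitionCheck(rri, checkslot, estate, true);
	}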

Attached is the latest patch including those changes.  Also, I have
merged your earlier "fixes" patch and updated the isolation test to
exercise a case with sub-partitioned default partition, as well as
varying attribute numbers.  Patch applies as-is to both HEAD and v13
branches, but needs slight changes to apply to v12 branch, so also
attaching one for that branch.



--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com


v2-0001-Check-default-partition-s-constraint-even-after-t.patch
Description: Binary data


v2-0001-Check-default-partition-s-constraint-even-after-t_v12.patch
Description: Binary data


Re: default partition and concurrent attach partition

2020-09-07 Thread Alvaro Herrera
On 2020-Sep-07, Alvaro Herrera wrote:

> Ah, it looks like we can get away with initializing the RRI to 0, and
> then explicitly handle that case in ExecPartitionCheckEmitError, as in
> the attached (which means reindenting, but I left it alone to make it
> easy to read).

Well, that was silly -- this seems fixable by mapping the tuple columns
prior to ExecPartitionCheck, which is achievable with something like

	if (partidx == partdesc->boundinfo->default_index)
	{
		TupleTableSlot *tempslot;

		if (dispatch->tupmap != NULL)
		{
			tempslot = MakeSingleTupleTableSlot(RelationGetDescr(rri->ri_RelationDesc),
												&TTSOpsVirtual);
			tempslot = execute_attr_map_slot(dispatch->tupmap, slot, tempslot);
		}
		else
			tempslot = slot;
		ExecPartitionCheck(rri, tempslot, estate, true);
		if (dispatch->tupmap != NULL)
			ExecDropSingleTupleTableSlot(tempslot);
	}

though this exact incantation, apart from being pretty horrible, doesn't
actually work (other than to fix this specific test case -- it crashes
elsewhere.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: default partition and concurrent attach partition

2020-09-07 Thread Alvaro Herrera
Ah, it looks like we can get away with initializing the RRI to 0, and
then explicitly handle that case in ExecPartitionCheckEmitError, as in
the attached (which means reindenting, but I left it alone to make it
easy to read).  It kinda sucks because we don't report the tuple that
causes the error, but

a) it's a very unlikely case anyway
b) it's better than the bogus error message
c) it's better than some hypothetical crash
d) nobody uses partitioned default partitions anyway
e) nobody uses differing column order anyway

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 7289a0706d95aa621b9d6f626a836ac381fd4f61 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 7 Sep 2020 19:26:37 -0300
Subject: [PATCH] Avoid invalid RRI

---
 src/backend/executor/execMain.c  | 7 +++
 src/backend/executor/execPartition.c | 2 +-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4fdffad6f3..a0c3e56a03 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1840,6 +1840,12 @@ ExecPartitionCheckEmitError(ResultRelInfo *resultRelInfo,
 	 * back to the root table's rowtype so that val_desc in the error message
 	 * matches the input tuple.
 	 */
+	if (resultRelInfo->ri_RangeTableIndex == 0)
+	{
+		val_desc = NULL;
+	}
+	else
+	{
 	if (resultRelInfo->ri_PartitionRoot)
 	{
 		TupleDesc	old_tupdesc;
@@ -1874,6 +1880,7 @@ ExecPartitionCheckEmitError(ResultRelInfo *resultRelInfo,
 			 tupdesc,
 			 modifiedCols,
 			 64);
+	}
 	ereport(ERROR,
 			(errcode(ERRCODE_CHECK_VIOLATION),
 			 errmsg("new row for relation \"%s\" violates partition constraint",
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 0ca1d34dfa..c1ef34d771 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -,7 +,7 @@ ExecInitPartitionDispatchInfo(EState *estate,
 	{
 		ResultRelInfo *rri = makeNode(ResultRelInfo);
 
-		InitResultRelInfo(rri, rel, 1, proute->partition_root, 0);
+		InitResultRelInfo(rri, rel, 0, proute->partition_root, 0);
 		proute->nonleaf_partitions[dispatchidx] = rri;
 	}
 	else
-- 
2.20.1



Re: default partition and concurrent attach partition

2020-09-07 Thread Alvaro Herrera
On 2020-Sep-07, Alvaro Herrera wrote:

> Well, they are fake in that the ri_RangeTableIndex they carry is bogus,
> which means that ExecBuildSlotValueDescription will misbehave if the
> partitioned default partition has a different column order than its
> parent.  That can be evidenced by changing the setup block in the
> isolation test as in the attached; and you'll get an undesirable error
> like this:
> 
> 2020-09-07 19:00:49.513 -03 [12981] ERROR:  attribute 2 of type record has wrong type
> 2020-09-07 19:00:49.513 -03 [12981] DETAIL:  Table has type text, but query expects integer.
> 2020-09-07 19:00:49.513 -03 [12981] STATEMENT:  insert into attach_tab values (110, 'eleven and five twenties');

... and I sent before completing.  I'm not sure what a good fix for this
is.  We could try to initialize the resultRelInfo honestly, or we could
set ri_RangeTableIndex to some invalid value that will ... eh ... do
something else.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: default partition and concurrent attach partition

2020-09-07 Thread Alvaro Herrera
On 2020-Sep-04, Amit Langote wrote:

Hello

> FWIW, I still prefer "minimally valid ResultRelInfo" to "fake
> ResultRelInfo", because those aren't really fake, are they?  They are
> as valid as any other ResultRelInfo as far as I can see.  I said
> "minimally valid" because a fully valid partition ResultRelInfo is one
> made by ExecInitPartitionInfo(), which I avoided using here, thinking
> that would be overkill.

Well, they are fake in that the ri_RangeTableIndex they carry is bogus,
which means that ExecBuildSlotValueDescription will misbehave if the
partitioned default partition has a different column order than its
parent.  That can be evidenced by changing the setup block in the
isolation test as in the attached; and you'll get an undesirable error
like this:

2020-09-07 19:00:49.513 -03 [12981] ERROR:  attribute 2 of type record has wrong type
2020-09-07 19:00:49.513 -03 [12981] DETAIL:  Table has type text, but query expects integer.
2020-09-07 19:00:49.513 -03 [12981] STATEMENT:  insert into attach_tab values (110, 'eleven and five twenties');

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
# Verify that default partition constraint is enforced correctly
# in light of partitions being added concurrently to its parent
setup {
  drop table if exists attach_tab;
  create table attach_tab (i int, j text) partition by range (i);
  create table attach_tab_1 partition of attach_tab for values from (0) to (100);
  create table attach_tab_default (j text, i int) partition by range (i);
  alter table attach_tab attach partition attach_tab_default default;
  create table attach_tab_default_1 partition of attach_tab_default for values from (110) to (120);
  create table attach_tab_2 (like attach_tab);
  insert into attach_tab_2 values (111, 'ONE HUNDRED AND ELEVENTH BIRTHDAY');
}

session "s1"
step "s1b"	{ begin; }
step "s1a"	{ alter table attach_tab attach partition attach_tab_2 for values from (100) to (200); }
step "s1c"	{ commit; }

session "s2"
step "s2b"	{ begin; }
step "s2i"	{ insert into attach_tab values (111, 'eleven and five twenties'); }
step "s2c"	{ commit; }

teardown	{ drop table if exists attach_tab, attach_tab_1, attach_tab_2; }

# insert into attach_tab by s2 which routes to attach_tab_default due to not seeing
# concurrently added attach_tab_2 should fail, because the partition constraint
# of attach_tab_default would have changed due to attach_tab_2 having been added
permutation "s1b" "s1a" "s2b" "s2i" "s1c" "s2c"

# reverse: now the insert into attach_tab_default by s2 occurs first followed by
# attach in s1, which should fail when it scans the default partitions to
# find the violating rows
permutation "s1b" "s2b" "s2i" "s1a" "s2c" "s1c"


Re: default partition and concurrent attach partition

2020-09-03 Thread Amit Langote
Hi Alvaro,

On Fri, Sep 4, 2020 at 6:28 AM Alvaro Herrera  wrote:
>
> Also, I should have pointed out that ExecInsert doesn't actually check
> the partition constraint except in very specific cases; what it does is
> expect that the partition routing code got it right.  So the comment
> you're adding about that is wrong, and it did mislead me into changing
> your code in a way that actually caused a bug -- hence my proposed
> rewording.

Thanks for taking a look.

/*
 * If this partition is the default one, we must check its partition
-* constraint, because it may have changed due to partitions being
-* added to the parent concurrently.  We do the check here instead of
-* in ExecInsert(), because we would not want to miss checking the
-* constraint of any nonleaf partitions as they never make it to
-* ExecInsert().
+* constraint now, which may have changed due to partitions being
+* added to the parent concurrently.

I am fine with your rewording but let me explain how I ended up
writing what I wrote:

At first I had pondered tweaking the following code in ExecInsert() to
fix this bug:

/*
 * Also check the tuple against the partition constraint, if there is
 * one; except that if we got here via tuple-routing, we don't need to
 * if there's no BR trigger defined on the partition.
 */
if (resultRelInfo->ri_PartitionCheck &&
(resultRelInfo->ri_PartitionRoot == NULL ||
 (resultRelInfo->ri_TrigDesc &&
  resultRelInfo->ri_TrigDesc->trig_insert_before_row)))
ExecPartitionCheck(resultRelInfo, slot, estate, true);

I gave up because I concluded that there isn't enough information at
this place in the code to determine whether the partition is a default
partition, so I moved my attention to ExecFindPartition(), where we
have access to the parent's PartitionDesc, which is enough to do so.
In the process of modifying ExecFindPartition() to fix the bug, I
realized that default partitions can be partitioned and that
sub-partitioned partitions never reach ExecInsert().  So, besides the
earlier-mentioned inconvenience of fixing this bug in ExecInsert(),
there is also the problem that such a fix would have missed addressing
sub-partitioned default partitions.  I thought someone besides me would
also wonder why ExecInsert() is not touched in this fix, hence the
comment.

FWIW, I still prefer "minimally valid ResultRelInfo" to "fake
ResultRelInfo", because those aren't really fake, are they?  They are
as valid as any other ResultRelInfo as far as I can see.  I said
"minimally valid" because a fully valid partition ResultRelInfo is one
made by ExecInitPartitionInfo(), which I avoided using here, thinking
that would be overkill.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com




Re: default partition and concurrent attach partition

2020-09-03 Thread Alvaro Herrera
Also, I should have pointed out that ExecInsert doesn't actually check
the partition constraint except in very specific cases; what it does is
expect that the partition routing code got it right.  So the comment
you're adding about that is wrong, and it did mislead me into changing
your code in a way that actually caused a bug -- hence my proposed
rewording.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: default partition and concurrent attach partition

2020-09-03 Thread Alvaro Herrera
On 2020-Sep-03, Alvaro Herrera wrote:

> + /*
> +  * If setting up a PartitionDispatch for a sub-partitioned table, we may
> +  * also need a fake ResultRelInfo for checking the partition constraint
> +  * later; set that up now.
> +  */
> + if (parent_pd)
> + {
> + ResultRelInfo *rri = makeNode(ResultRelInfo);
> +
> + InitResultRelInfo(rri, rel, 1, proute->partition_root, 0);
> + proute->nonleaf_partitions[dispatchidx] = rri;
> + }
> +

Heh, I had intended to remove the attachment before sending, because I
thought I was seeing a problem with this proposed coding of mine.  But
since I sent it by mistake, I'll explain: I think this will fail if we
have a partitioned default partition, and we direct the insert to it
directly while attaching a partition to its parent.  I think that kind
of scenario deserves its own test case.
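
To spell the scenario out (hypothetical SQL, borrowing table names from
the isolation test that appears later in this thread):

  -- Session 1: concurrently attach a new partition to the parent.
  begin;
  alter table attach_tab attach partition attach_tab_2
    for values from (100) to (200);

  -- Session 2: insert directly into the partitioned default partition
  -- rather than through attach_tab; attach_tab_default's columns are
  -- (j text, i int), and i = 111 conflicts with the partition being
  -- attached above.
  insert into attach_tab_default values ('one eleven', 111);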

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: default partition and concurrent attach partition

2020-09-03 Thread Alvaro Herrera
Thanks for this fix!  Looking into it now.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 4d34734a45..fe42670e0a 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -52,11 +52,9 @@
  *		indexed.
  *
  * nonleaf_partitions
- *		Array of 'max_dispatch' elements containing pointers to
- *		ResultRelInfos of nonleaf partitions touched by tuple routing.  These
- *		are currently only used for checking the partition's constraint if
- *		the nonleaf partition happens to be a default partition of its
- *		parent
+ *		Array of 'max_dispatch' elements containing pointers to fake
+ *		ResultRelInfo objects for nonleaf partitions, useful for checking
+ *		the partition constraint.
  *
  * num_dispatch
  *		The current number of items stored in the 'partition_dispatch_info'
@@ -305,7 +303,7 @@ ExecFindPartition(ModifyTableState *mtstate,
 
 	/* start with the root partitioned table */
 	dispatch = pd[0];
-	while (true)
+	do
 	{
 		AttrNumber *map = dispatch->tupmap;
 		int			partidx = -1;
@@ -455,22 +453,12 @@ ExecFindPartition(ModifyTableState *mtstate,
 
 		/*
 		 * If this partition is the default one, we must check its partition
-		 * constraint, because it may have changed due to partitions being
-		 * added to the parent concurrently.  We do the check here instead of
-		 * in ExecInsert(), because we would not want to miss checking the
-		 * constraint of any nonleaf partitions as they never make it to
-		 * ExecInsert().
+		 * constraint now, which may have changed due to partitions being
+		 * added to the parent concurrently.
 		 */
 		if (partidx == partdesc->boundinfo->default_index)
-		{
-			Assert(rri != NULL);
 			ExecPartitionCheck(rri, slot, estate, true);
-		}
-
-		/* Break out and return if we've found the leaf partition. */
-		if (dispatch == NULL)
-			break;
-	}
+	} while (dispatch != NULL);
 
 	MemoryContextSwitchTo(oldcxt);
 	ecxt->ecxt_scantuple = ecxt_scantuple_old;
@@ -1037,7 +1025,6 @@ ExecInitPartitionDispatchInfo(EState *estate,
 	Relation	rel;
 	PartitionDesc partdesc;
 	PartitionDispatch pd;
-	ResultRelInfo *rri = NULL;
 	int			dispatchidx;
 	MemoryContext oldcxt;
 
@@ -1123,6 +1110,19 @@ ExecInitPartitionDispatchInfo(EState *estate,
 	}
 	proute->partition_dispatch_info[dispatchidx] = pd;
 
+	/*
+	 * If setting up a PartitionDispatch for a sub-partitioned table, we may
+	 * also need a fake ResultRelInfo for checking the partition constraint
+	 * later; set that up now.
+	 */
+	if (parent_pd)
+	{
+		ResultRelInfo *rri = makeNode(ResultRelInfo);
+
+		InitResultRelInfo(rri, rel, 1, proute->partition_root, 0);
+		proute->nonleaf_partitions[dispatchidx] = rri;
+	}
+
 	/*
 	 * Finally, if setting up a PartitionDispatch for a sub-partitioned table,
 	 * install a downlink in the parent to allow quick descent.
@@ -1133,17 +1133,6 @@ ExecInitPartitionDispatchInfo(EState *estate,
 		parent_pd->indexes[partidx] = dispatchidx;
 	}
 
-	/*
-	 * Also create a minimally valid ResultRelInfo to check the partition's
-	 * partition's constraint.
-	 */
-	if (rel->rd_rel->relispartition)
-	{
-		rri = makeNode(ResultRelInfo);
-		InitResultRelInfo(rri, rel, 1, proute->partition_root, 0);
-	}
-	proute->nonleaf_partitions[dispatchidx] = rri;
-
 	MemoryContextSwitchTo(oldcxt);
 
 	return pd;


Re: default partition and concurrent attach partition

2020-09-03 Thread Amit Langote
On Thu, Sep 3, 2020 at 6:50 PM Amit Langote  wrote:
>
> Hi,
>
> Starting a new thread to discuss a bug related to $subject that Hao Wu
> reported on thread titled "ALTER TABLE .. DETACH PARTITION
> CONCURRENTLY" [1].  I have been able to reproduce the bug using steps
> that Hao gave in that email:
>
> create table tpart (i int, j int) partition by range(i);
> create table tpart_1 (like tpart);
> create table tpart_2 (like tpart);
> create table tpart_default (like tpart);
> alter table tpart attach partition tpart_1 for values from (0) to (100);
> alter table tpart attach partition tpart_default default;
> insert into tpart_2 values (110,110), (120,120), (150,150);
>
> Session 1:
>
> begin;
> alter table tpart attach partition tpart_2 for values from (100) to (200);
>
> Session 2:
>
> begin;
> insert into tpart values (110,110), (120,120), (150,150);
> (insert blocks, waiting for session 1)
>
> Session 1:
>
> end;
>
> Session 2:
>
> select tableoid::regclass, * from tpart;
> end;
>
> The select will show that the rows inserted by session 2 went into
> tpart_default, even though, with tpart_2 successfully attached, they do
> not actually belong there.
>
> The problem is that when session 2 inserts those rows into tpart, it
> only knows about two partitions, tpart_1 and tpart_default, of which it
> selects tpart_default to insert those rows into.  When tpart_default
> is locked to perform the insert, the session waits for session 1 to
> release the lock taken on tpart_default during the attach command.
> When it is unblocked, it proceeds to finish the insert without
> rechecking the partition constraint, which would have been updated as a
> result of a new partition having been added to the parent.
>
> Note that we don't normally check the partition constraint when
> inserting a row into a partition if the insert occurs via tuple
> routing, which makes sense for non-default partitions whose partition
> constraint cannot change due to concurrent activity.  But this test
> case has shown that the assumption is not safe for a default partition
> whose constraint is a function of other partitions that exist as of
> when the insert occurs.
>
> By the way, if you reverse the order of operations between session 1
> and 2 such that the insert by session 2 occurs first and then the
> attach by session 1, then you will correctly get this error from the
> attach command:
>
> ERROR:  updated partition constraint for default partition
> "tpart_default" would be violated by some row
>
> Attached is a patch to fix things on the insert side.

Forgot to mention that the problem exists as of v12 (commit: 898e5e329).

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com




default partition and concurrent attach partition

2020-09-03 Thread Amit Langote
Hi,

Starting a new thread to discuss a bug related to $subject that Hao Wu
reported on thread titled "ALTER TABLE .. DETACH PARTITION
CONCURRENTLY" [1].  I have been able to reproduce the bug using steps
that Hao gave in that email:

create table tpart (i int, j int) partition by range(i);
create table tpart_1 (like tpart);
create table tpart_2 (like tpart);
create table tpart_default (like tpart);
alter table tpart attach partition tpart_1 for values from (0) to (100);
alter table tpart attach partition tpart_default default;
insert into tpart_2 values (110,110), (120,120), (150,150);

Session 1:

begin;
alter table tpart attach partition tpart_2 for values from (100) to (200);

Session 2:

begin;
insert into tpart values (110,110), (120,120), (150,150);
(insert blocks, waiting for session 1)

Session 1:

end;

Session 2:

select tableoid::regclass, * from tpart;
end;

The select will show that the rows inserted by session 2 went into
tpart_default, even though, with tpart_2 successfully attached, they do
not actually belong there.

The problem is that when session 2 inserts those rows into tpart, it
only knows about two partitions, tpart_1 and tpart_default, of which it
selects tpart_default to insert those rows into.  When tpart_default
is locked to perform the insert, the session waits for session 1 to
release the lock taken on tpart_default during the attach command.
When it is unblocked, it proceeds to finish the insert without
rechecking the partition constraint, which would have been updated as a
result of a new partition having been added to the parent.

Note that we don't normally check the partition constraint when
inserting a row into a partition if the insert occurs via tuple
routing, which makes sense for non-default partitions whose partition
constraint cannot change due to concurrent activity.  But this test
case has shown that the assumption is not safe for a default partition
whose constraint is a function of other partitions that exist as of
when the insert occurs.
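
To make that concrete (a sketch using the tables above;
pg_get_partition_constraintdef() shows the implicit constraint, though
the exact expression PostgreSQL prints may differ):

select pg_get_partition_constraintdef('tpart_default'::regclass);
-- before attaching tpart_2, roughly:  NOT (i >= 0 AND i < 100)
-- after attaching tpart_2, roughly:   NOT (i >= 0 AND i < 100)
--                                     AND NOT (i >= 100 AND i < 200)
-- A session still operating under the old constraint can thus route
-- rows with i = 110, 120, 150 into tpart_default even though the new
-- constraint rejects them.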

By the way, if you reverse the order of operations between session 1
and 2 such that the insert by session 2 occurs first and then the
attach by session 1, then you will correctly get this error from the
attach command:

ERROR:  updated partition constraint for default partition
"tpart_default" would be violated by some row

Attached is a patch to fix things on the insert side.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com


[1] https://www.postgresql.org/message-id/DM5PR0501MB3910E97A9EDFB4C775CF3D75A42F0%40DM5PR0501MB3910.namprd05.prod.outlook.com


v1-0001-Check-default-partition-s-constraint-even-after-t.patch
Description: Binary data