Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

2017-09-18 Thread Kyotaro HORIGUCHI
At Fri, 15 Sep 2017 15:36:26 +0900, Amit Langote 
 wrote in 

> Hi.
> 
> On 2017/08/28 18:28, Kyotaro HORIGUCHI wrote:
> > << the following is another topic >>
> > 
>  BTW, in the partitioned table case, the parent is always locked first
>  using an AccessExclusiveLock.  There are other considerations in that 
>  case
>  such as needing to recreate the partition descriptor upon termination of
>  inheritance (both the DETACH PARTITION and also DROP TABLE child cases).
> >>>
> >>> Apart from the degree of concurrency, if we keep parent->children
> >>> order of locking, such recreation does not seem to be
> >>> needed. Maybe I'm missing something.
> >>
> >> Sorry to have introduced that topic in this thread, but I will try to
> >> explain anyway why things are the way they are currently:
> >>
> >> Once a table is no longer a partition of the parent (detached or dropped),
> >> we must make sure that the next commands in the transaction don't see it
> >> as one.  That information is currently present in the relcache
> >> (rd_partdesc), which is used by a few callers, most notably the
> >> tuple-routing code.  Next commands must recreate the entry so that the
> >> correct thing happens based on the updated information.  More precisely,
> >> we must invalidate the current entry.  RelationClearRelation() will either
> >> delete the entry or rebuild it.  If it's being referenced somewhere, it
> >> will be rebuilt.  The place holding the reference may also be looking at
> >> the content of rd_partdesc, which we don't require them to make a copy of,
> >> so we must preserve its content while other fields of RelationData are
> >> being read anew from the catalog.  We don't have to preserve it if there
> >> has been any change (partition added/dropped), but to make such a change
> >> one would need to take a strong enough lock on the relation (parent).  We
> >> assume here that anyone who wants to reference rd_partdesc takes at least
> >> AccessShareLock lock on the relation, and anyone who wants to change its
> >> content must take a lock that will conflict with it, so
> >> AccessExclusiveLock.  Note that in all of this, we are only talking about
> >> one relation, that is the parent, so parent -> child ordering of taking
> >> locks may be irrelevant.
> > 
> > I think I understand this, anyway DropInherit and DropPartition
> > is different-but-at-the-same-level operations so surely needs
> > amendment for drop/detach cases. Is there already a solution? Or
> > reproducing steps?
> 
> Sorry, I think I forgot to reply to this.  Since you seem to have chosen
> the other solution (checking that child is still a child), maybe this
> reply is a bit too late, but anyway.

I choosed it at that time for the reason mentioned upthread, but
haven't decided which is better.

> DropInherit or NO INHERIT is seen primarily as changing a child table's
> (which is the target table of the command) property that it is no longer a
> child of the parent, so we lock the child table to block concurrent
> operations from considering it a child of parent anymore.  The fact that
> parent is locked after the child and with ShareUpdateExclusiveLock instead
> of AccessExclusiveLock, we observe this race condition when SELECTing from
> the parent.
> 
> DropPartition or DETACH PARTITION is seen primarily as changing the parent
> table's (which is the target table of the command) property that one of
> the partitions is removed, so we lock the parent.   Any concurrent
> operations that rely on the parent's relcache to get the partition list
> will wait for the session that is dropping the partition to finish, so
> that they get the fresh information from the relcache (or more importantly
> do not end up with information obtained from the relcache going invalid
> under them without notice).  Note that the lock on the partition/child is
> also present and it plays more or less the the same role as it does in the
> DropInherit case, but due to different order of locking, reported race
> condition does not occur between SELECT on partitioned table and
> DROP/DETACH PARTITION.

Thank you for the explanation. I understand that the difference
comes from which of parent and children has the information about
inheritance/partitioning. DROP child and ALTER child NO INHERITS
are (I think) the only two operations that intiated from children
side. The parent-locking patch results in different solutions for
similar problems.

However, parent locking on DROPped child requires a new index
InheritsRelidIndex to avoid full scan on pg_inherits, but the NO
INHERITS case doesn't. This might be a good reason for the
difference.

I noticed that DROP TABLE partition acquires lock on the parent
in a callback for RangeVarGetRelid(Extended). The same way seems
reasonable for the NO INHERIT case. As the result the patch
becomes far small and less invasive than the previous
one. (dropinh_lock_parent_v2.patch)

As a negative PoC, attacched dropchild_lock_parent

Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

2017-09-14 Thread Amit Langote
On 2017/09/15 15:36, Amit Langote wrote:
> The fact that
> parent is locked after the child and with ShareUpdateExclusiveLock instead
> of AccessExclusiveLock, we observe this race condition when SELECTing from
> the parent.

Oops, I meant "parent table is locked with AccessShareLock instead of
AccessExclusiveLock..."

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

2017-09-14 Thread Amit Langote
Hi.

On 2017/08/28 18:28, Kyotaro HORIGUCHI wrote:
> << the following is another topic >>
> 
 BTW, in the partitioned table case, the parent is always locked first
 using an AccessExclusiveLock.  There are other considerations in that case
 such as needing to recreate the partition descriptor upon termination of
 inheritance (both the DETACH PARTITION and also DROP TABLE child cases).
>>>
>>> Apart from the degree of concurrency, if we keep parent->children
>>> order of locking, such recreation does not seem to be
>>> needed. Maybe I'm missing something.
>>
>> Sorry to have introduced that topic in this thread, but I will try to
>> explain anyway why things are the way they are currently:
>>
>> Once a table is no longer a partition of the parent (detached or dropped),
>> we must make sure that the next commands in the transaction don't see it
>> as one.  That information is currently present in the relcache
>> (rd_partdesc), which is used by a few callers, most notably the
>> tuple-routing code.  Next commands must recreate the entry so that the
>> correct thing happens based on the updated information.  More precisely,
>> we must invalidate the current entry.  RelationClearRelation() will either
>> delete the entry or rebuild it.  If it's being referenced somewhere, it
>> will be rebuilt.  The place holding the reference may also be looking at
>> the content of rd_partdesc, which we don't require them to make a copy of,
>> so we must preserve its content while other fields of RelationData are
>> being read anew from the catalog.  We don't have to preserve it if there
>> has been any change (partition added/dropped), but to make such a change
>> one would need to take a strong enough lock on the relation (parent).  We
>> assume here that anyone who wants to reference rd_partdesc takes at least
>> AccessShareLock lock on the relation, and anyone who wants to change its
>> content must take a lock that will conflict with it, so
>> AccessExclusiveLock.  Note that in all of this, we are only talking about
>> one relation, that is the parent, so parent -> child ordering of taking
>> locks may be irrelevant.
> 
> I think I understand this, anyway DropInherit and DropPartition
> is different-but-at-the-same-level operations so surely needs
> amendment for drop/detach cases. Is there already a solution? Or
> reproducing steps?

Sorry, I think I forgot to reply to this.  Since you seem to have chosen
the other solution (checking that child is still a child), maybe this
reply is a bit too late, but anyway.

DropInherit or NO INHERIT is seen primarily as changing a child table's
(which is the target table of the command) property that it is no longer a
child of the parent, so we lock the child table to block concurrent
operations from considering it a child of parent anymore.  The fact that
parent is locked after the child and with ShareUpdateExclusiveLock instead
of AccessExclusiveLock, we observe this race condition when SELECTing from
the parent.

DropPartition or DETACH PARTITION is seen primarily as changing the parent
table's (which is the target table of the command) property that one of
the partitions is removed, so we lock the parent.   Any concurrent
operations that rely on the parent's relcache to get the partition list
will wait for the session that is dropping the partition to finish, so
that they get the fresh information from the relcache (or more importantly
do not end up with information obtained from the relcache going invalid
under them without notice).  Note that the lock on the partition/child is
also present and it plays more or less the the same role as it does in the
DropInherit case, but due to different order of locking, reported race
condition does not occur between SELECT on partitioned table and
DROP/DETACH PARTITION.


By the way, I will take a look at your patch when I come back from the
vacation.  Meanwhile, I noticed that it needs another rebase after
0a480502b092 [1].

Thanks,
Amit

[1]
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0a480502b092



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

2017-09-14 Thread Kyotaro HORIGUCHI
At Wed, 13 Sep 2017 20:20:48 -0400, Robert Haas  wrote 
in 
> On Mon, Jun 26, 2017 at 4:46 AM, Kyotaro HORIGUCHI
>  wrote:
> > The cause is that NO INHERIT doesn't take an exlusive lock on the
> > parent. This allows expand_inherited_rtentry to add the child
> > relation into appendrel after removal from the inheritance but
> > still exists.
> >
> > I see two ways to fix this.
> >
> > The first patch adds a recheck of inheritance relationship if the
> > corresponding attribute is missing in the child in
> > make_inh_translation_list(). The recheck is a bit complex but it
> > is not performed unless the sequence above is happen. It checks
> > duplication of relid (or cycles in inheritance) following
> > find_all_inheritors (but doing a bit different) but I'm not sure
> > it is really useful.
> >
> >
> > The second patch lets ALTER TABLE NO INHERIT to acquire locks on
> > the parent first.
> >
> > Since the latter has a larger impact on the current behavior and
> > we already treat "DROP TABLE child" case in the similar way, I
> > suppose that the first approach would be preferable.
> >
> >
> > Any comments or thoughts?
> 
> I agree that the second has less user impact, but I wonder if we can
> think that will really fix the bug completely, or more generally if it
> will fix all of the bugs that come from ALTER TABLE .. NO INHERIT not
> locking the parent.  I have a sneaking suspicion that may be wishful
> thinking.

Thanks for the comment.

The recheck patch prevent planner from involving just-detached
children while inheritance expansion. No simultaneous detatching
of children doesn't affect the planning before the time.

Once planner (the select side) gets lock on the child, the alter
side cannot do anything until the select finishes. If the alter
side won, the select side detects detaching immediately after the
lock is released then excludes the children. No problem will
occur ever after. Even in the case a child is replaced with
another table, it is nothing different from simple detaching.

As the result, I think that the recheck patch saves all possible
problem caused by simultaneously detached children.

However, the parent-locking patch is far smaller and it doesn't
need such an explanation on its perfectness. If another problem
occurs by simlultaneous detaching, it must haven't taken
necessary locks in the right order.

The most significant reason for my decision on this ptach was the
fact that the DROP child case have been resolved by rechecking
without parent locks, which is a kind of waste of resources if it
could be resolved without it. (And I saw that the same solution
is taken at least several places)

I don't think there's noticeable difference of behavior
(excluding performance) for users between the two solutions.

Is there anyone who has an opinion on the point?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

2017-09-13 Thread Robert Haas
On Mon, Jun 26, 2017 at 4:46 AM, Kyotaro HORIGUCHI
 wrote:
> The cause is that NO INHERIT doesn't take an exlusive lock on the
> parent. This allows expand_inherited_rtentry to add the child
> relation into appendrel after removal from the inheritance but
> still exists.
>
> I see two ways to fix this.
>
> The first patch adds a recheck of inheritance relationship if the
> corresponding attribute is missing in the child in
> make_inh_translation_list(). The recheck is a bit complex but it
> is not performed unless the sequence above is happen. It checks
> duplication of relid (or cycles in inheritance) following
> find_all_inheritors (but doing a bit different) but I'm not sure
> it is really useful.
>
>
> The second patch lets ALTER TABLE NO INHERIT to acquire locks on
> the parent first.
>
> Since the latter has a larger impact on the current behavior and
> we already treat "DROP TABLE child" case in the similar way, I
> suppose that the first approach would be preferable.
>
>
> Any comments or thoughts?

I agree that the second has less user impact, but I wonder if we can
think that will really fix the bug completely, or more generally if it
will fix all of the bugs that come from ALTER TABLE .. NO INHERIT not
locking the parent.  I have a sneaking suspicion that may be wishful
thinking.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

2017-09-12 Thread Amit Langote
On 2017/09/13 12:05, Simon Riggs wrote:
> On 26 June 2017 at 10:16, Amit Langote  wrote:
> 
>> BTW, in the partitioned table case, the parent is always locked first
>> using an AccessExclusiveLock.  There are other considerations in that case
>> such as needing to recreate the partition descriptor upon termination of
>> inheritance (both the DETACH PARTITION and also DROP TABLE child cases).
> 
> Is this requirement documented or in comments anywhere?

Yes.  See the last sentence in the description of PARTITION OF clause in
CREATE TABLE:

https://www.postgresql.org/docs/devel/static/sql-createtable.html#sql-createtable-partition

And, the 4th point in the list of differences between declarative
partitioning and inheritance:

https://www.postgresql.org/docs/devel/static/ddl-partitioning.html#ddl-partitioning-implementation-inheritance

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

2017-09-12 Thread Simon Riggs
On 26 June 2017 at 10:16, Amit Langote  wrote:

> BTW, in the partitioned table case, the parent is always locked first
> using an AccessExclusiveLock.  There are other considerations in that case
> such as needing to recreate the partition descriptor upon termination of
> inheritance (both the DETACH PARTITION and also DROP TABLE child cases).

Is this requirement documented or in comments anywhere?

I can't see anything about that, which is a fairly major usage point.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

2017-09-12 Thread Kyotaro HORIGUCHI
At Mon, 28 Aug 2017 18:28:07 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20170828.182807.98097766.horiguchi.kyot...@lab.ntt.co.jp>
> I'll add this to CF2017-09.

This patch got deadly atack from the commit 30833ba. I changed
the signature of expand_single_inheritance_child in addition to
make_inh_translation_list to notify that the specified child is
no longer a child of the parent.

This passes regular regression test and fixed the the problem.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
*** a/src/backend/catalog/pg_inherits.c
--- b/src/backend/catalog/pg_inherits.c
***
*** 42,47  typedef struct SeenRelsEntry
--- 42,49 
  	ListCell   *numparents_cell;	/* corresponding list cell */
  } SeenRelsEntry;
  
+ static bool is_descendent_of_internal(Oid parentId, Oid childId,
+ 	  HTAB *seen_rels);
  /*
   * find_inheritance_children
   *
***
*** 400,402  typeInheritsFrom(Oid subclassTypeId, Oid superclassTypeId)
--- 402,472 
  
  	return result;
  }
+ 
+ /*
+  * Check if the child is really a descendent of the parent
+  */
+ bool
+ is_descendent_of(Oid parentId, Oid childId)
+ {
+ 	HTAB	   *seen_rels;
+ 	HASHCTL		ctl;
+ 	bool		ischild = false;
+ 
+ 	memset(&ctl, 0, sizeof(ctl));
+ 	ctl.keysize = sizeof(Oid);
+ 	ctl.entrysize = sizeof(Oid);
+ 	ctl.hcxt = CurrentMemoryContext;
+ 
+ 	seen_rels = hash_create("is_descendent_of temporary table",
+ 			32, /* start small and extend */
+ 			&ctl,
+ 			HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ 
+ 	ischild = is_descendent_of_internal(parentId, childId, seen_rels);
+ 
+ 	hash_destroy(seen_rels);
+ 
+ 	return ischild;
+ }
+ 
+ static bool
+ is_descendent_of_internal(Oid parentId, Oid childId, HTAB *seen_rels)
+ {
+ 	Relation	inhrel;
+ 	SysScanDesc scan;
+ 	ScanKeyData key[1];
+ 	bool		ischild = false;
+ 	HeapTuple	inheritsTuple;
+ 
+ 	inhrel = heap_open(InheritsRelationId, AccessShareLock);
+ 	ScanKeyInit(&key[0], Anum_pg_inherits_inhparent,
+ BTEqualStrategyNumber, F_OIDEQ,	ObjectIdGetDatum(parentId));
+ 	scan = systable_beginscan(inhrel, InheritsParentIndexId, true,
+ 			  NULL, 1, key);
+ 
+ 	while ((inheritsTuple = systable_getnext(scan)) != NULL)
+ 	{
+ 		bool found;
+ 		Oid inhrelid = ((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhrelid;
+ 
+ 		hash_search(seen_rels, &inhrelid, HASH_ENTER, &found);
+ 
+ 		/*
+ 		 * Recursively check into children. Although there can't theoretically
+ 		 * be any cycles in the inheritance graph, check the cycles following
+ 		 * find_all_inheritors.
+ 		 */
+ 		if (inhrelid == childId ||
+ 			(!found && is_descendent_of_internal(inhrelid, childId, seen_rels)))
+ 		{
+ 			ischild = true;
+ 			break;
+ 		}
+ 	}
+ 
+ 	systable_endscan(scan);
+ 	heap_close(inhrel, AccessShareLock);
+ 
+ 	return ischild;
+ }
*** a/src/backend/optimizer/prep/prepunion.c
--- b/src/backend/optimizer/prep/prepunion.c
***
*** 108,123  static void expand_partitioned_rtentry(PlannerInfo *root,
  		   LOCKMODE lockmode,
  		   bool *has_child, List **appinfos,
  		   List **partitioned_child_rels);
! static void expand_single_inheritance_child(PlannerInfo *root,
  RangeTblEntry *parentrte,
  Index parentRTindex, Relation parentrel,
  PlanRowMark *parentrc, Relation childrel,
  bool *has_child, List **appinfos,
  List **partitioned_child_rels);
! static void make_inh_translation_list(Relation oldrelation,
  		  Relation newrelation,
! 		  Index newvarno,
! 		  List **translated_vars);
  static Bitmapset *translate_col_privs(const Bitmapset *parent_privs,
  	List *translated_vars);
  static Node *adjust_appendrel_attrs_mutator(Node *node,
--- 108,122 
  		   LOCKMODE lockmode,
  		   bool *has_child, List **appinfos,
  		   List **partitioned_child_rels);
! static bool expand_single_inheritance_child(PlannerInfo *root,
  RangeTblEntry *parentrte,
  Index parentRTindex, Relation parentrel,
  PlanRowMark *parentrc, Relation childrel,
  bool *has_child, List **appinfos,
  List **partitioned_child_rels);
! static List *make_inh_translation_list(Relation oldrelation,
  		  Relation newrelation,
! 		  Index newvarno);
  static Bitmapset *translate_col_privs(const Bitmapset *parent_privs,
  	List *translated_vars);
  static Node *adjust_appendrel_attrs_mutator(Node *node,
***
*** 1476,1481  expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
--- 1475,1482 
  		 * in which they appear in the PartitionDesc.  But first, expand the
  		 * parent itself.
  		 */
+ 
+ 		/* ignore the return value since this doesn't exclude the parent */
  		expand_single_inheritance_child(root, rte, rti, oldrelation, oldrc,
  		oldrelation,
  		&has_child, &appinfos,
***
*** 1497,1502  expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
--- 1498,

Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

2017-08-28 Thread Kyotaro HORIGUCHI
Hello,

I'll add this to CF2017-09.

At Tue, 27 Jun 2017 16:27:18 +0900, Amit Langote 
 wrote in 
<75fe42df-b1d8-89ff-596d-d9da0749e...@lab.ntt.co.jp>
> On 2017/06/26 18:44, Kyotaro HORIGUCHI wrote:
> > At Mon, 26 Jun 2017 18:16:42 +0900, Amit Langote wrote:
> >>
> >> I recall I had proposed a fix for the same thing some time ago [1].
> > 
> > Wow. About 1.5 years ago and stuck? I have a concrete case where
> > this harms  so I'd like to fix that this time. How can we move on?
> 
> Agreed that this should be fixed.
> 
> Your proposed approach #1 to recheck the inheritance after obtaining the
> lock on the child table seems to be a good way forward.
> 
> Approach #2 of reordering locking is a simpler solution, but does not
> completely prevent the problem, because DROP TABLE child can still cause
> it to occur, as you mentioned.
> 
> >>> The cause is that NO INHERIT doesn't take an exlusive lock on the
> >>> parent. This allows expand_inherited_rtentry to add the child
> >>> relation into appendrel after removal from the inheritance but
> >>> still exists.
> >>
> >> Right.
> >>
> >>> I see two ways to fix this.
> >>>
> >>> The first patch adds a recheck of inheritance relationship if the
> >>> corresponding attribute is missing in the child in
> >>> make_inh_translation_list(). The recheck is a bit complex but it
> >>> is not performed unless the sequence above is happen. It checks
> >>> duplication of relid (or cycles in inheritance) following
> >>> find_all_inheritors (but doing a bit different) but I'm not sure
> >>> it is really useful.
> >>
> >> I had proposed adding a syscache on pg_inherits(inhrelid, inhparent), but
> >> I guess your hash table based solution will do the job as far as
> >> performance of this check is concerned, although I haven't checked the
> >> code closely.
> > 
> > The hash table is not crucial in the patch. Substantially it just
> > recurs down looking up pg_inherits for the child. I considered
> > the new index but abandoned because I thought that such case
> > won't occur so frequently.
> 
> Agreed.  BTW, the hash table in patch #1 does not seem to be really
> helpful.  In the while loop in is_descendant_of_internal(), does
> hash_search() ever return found = true?  AFAICS, it does not.
> 
> >>> The second patch lets ALTER TABLE NO INHERIT to acquire locks on
> >>> the parent first.
> >>>
> >>> Since the latter has a larger impact on the current behavior and
> >>> we already treat "DROP TABLE child" case in the similar way, I
> >>> suppose that the first approach would be preferable.
> >>
> >> That makes sense.

So, I attached only the first patch, rebased on the current
master (It actually failed to apply on it.) and fixed a typo in a
comment.

This still runs a closed-loop test using temporary hash but it
seems a bit paranoic. (this is the same check with what
find_all_inheritors is doing)


<< the following is another topic >>

> >> BTW, in the partitioned table case, the parent is always locked first
> >> using an AccessExclusiveLock.  There are other considerations in that case
> >> such as needing to recreate the partition descriptor upon termination of
> >> inheritance (both the DETACH PARTITION and also DROP TABLE child cases).
> > 
> > Apart from the degree of concurrency, if we keep parent->children
> > order of locking, such recreation does not seem to be
> > needed. Maybe I'm missing something.
> 
> Sorry to have introduced that topic in this thread, but I will try to
> explain anyway why things are the way they are currently:
> 
> Once a table is no longer a partition of the parent (detached or dropped),
> we must make sure that the next commands in the transaction don't see it
> as one.  That information is currently present in the relcache
> (rd_partdesc), which is used by a few callers, most notably the
> tuple-routing code.  Next commands must recreate the entry so that the
> correct thing happens based on the updated information.  More precisely,
> we must invalidate the current entry.  RelationClearRelation() will either
> delete the entry or rebuild it.  If it's being referenced somewhere, it
> will be rebuilt.  The place holding the reference may also be looking at
> the content of rd_partdesc, which we don't require them to make a copy of,
> so we must preserve its content while other fields of RelationData are
> being read anew from the catalog.  We don't have to preserve it if there
> has been any change (partition added/dropped), but to make such a change
> one would need to take a strong enough lock on the relation (parent).  We
> assume here that anyone who wants to reference rd_partdesc takes at least
> AccessShareLock lock on the relation, and anyone who wants to change its
> content must take a lock that will conflict with it, so
> AccessExclusiveLock.  Note that in all of this, we are only talking about
> one relation, that is the parent, so parent -> child ordering of taking
> locks may be irrelevant.

I think I understand this, anyway DropInhe

Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

2017-06-27 Thread Amit Langote
On 2017/06/26 18:44, Kyotaro HORIGUCHI wrote:
> At Mon, 26 Jun 2017 18:16:42 +0900, Amit Langote wrote:
>>
>> I recall I had proposed a fix for the same thing some time ago [1].
> 
> Wow. About 1.5 years ago and stuck? I have a concrete case where
> this harms  so I'd like to fix that this time. How can we move on?

Agreed that this should be fixed.

Your proposed approach #1 to recheck the inheritance after obtaining the
lock on the child table seems to be a good way forward.

Approach #2 of reordering locking is a simpler solution, but does not
completely prevent the problem, because DROP TABLE child can still cause
it to occur, as you mentioned.

>>> The cause is that NO INHERIT doesn't take an exlusive lock on the
>>> parent. This allows expand_inherited_rtentry to add the child
>>> relation into appendrel after removal from the inheritance but
>>> still exists.
>>
>> Right.
>>
>>> I see two ways to fix this.
>>>
>>> The first patch adds a recheck of inheritance relationship if the
>>> corresponding attribute is missing in the child in
>>> make_inh_translation_list(). The recheck is a bit complex but it
>>> is not performed unless the sequence above is happen. It checks
>>> duplication of relid (or cycles in inheritance) following
>>> find_all_inheritors (but doing a bit different) but I'm not sure
>>> it is really useful.
>>
>> I had proposed adding a syscache on pg_inherits(inhrelid, inhparent), but
>> I guess your hash table based solution will do the job as far as
>> performance of this check is concerned, although I haven't checked the
>> code closely.
> 
> The hash table is not crucial in the patch. Substantially it just
> recurs down looking up pg_inherits for the child. I considered
> the new index but abandoned because I thought that such case
> won't occur so frequently.

Agreed.  BTW, the hash table in patch #1 does not seem to be really
helpful.  In the while loop in is_descendant_of_internal(), does
hash_search() ever return found = true?  AFAICS, it does not.

>>> The second patch lets ALTER TABLE NO INHERIT to acquire locks on
>>> the parent first.
>>>
>>> Since the latter has a larger impact on the current behavior and
>>> we already treat "DROP TABLE child" case in the similar way, I
>>> suppose that the first approach would be preferable.
>>
>> That makes sense.
>>
>> BTW, in the partitioned table case, the parent is always locked first
>> using an AccessExclusiveLock.  There are other considerations in that case
>> such as needing to recreate the partition descriptor upon termination of
>> inheritance (both the DETACH PARTITION and also DROP TABLE child cases).
> 
> Apart from the degree of concurrency, if we keep parent->children
> order of locking, such recreation does not seem to be
> needed. Maybe I'm missing something.

Sorry to have introduced that topic in this thread, but I will try to
explain anyway why things are the way they are currently:

Once a table is no longer a partition of the parent (detached or dropped),
we must make sure that the next commands in the transaction don't see it
as one.  That information is currently present in the relcache
(rd_partdesc), which is used by a few callers, most notably the
tuple-routing code.  Next commands must recreate the entry so that the
correct thing happens based on the updated information.  More precisely,
we must invalidate the current entry.  RelationClearRelation() will either
delete the entry or rebuild it.  If it's being referenced somewhere, it
will be rebuilt.  The place holding the reference may also be looking at
the content of rd_partdesc, which we don't require them to make a copy of,
so we must preserve its content while other fields of RelationData are
being read anew from the catalog.  We don't have to preserve it if there
has been any change (partition added/dropped), but to make such a change
one would need to take a strong enough lock on the relation (parent).  We
assume here that anyone who wants to reference rd_partdesc takes at least
AccessShareLock lock on the relation, and anyone who wants to change its
content must take a lock that will conflict with it, so
AccessExclusiveLock.  Note that in all of this, we are only talking about
one relation, that is the parent, so parent -> child ordering of taking
locks may be irrelevant.

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

2017-06-26 Thread Kyotaro HORIGUCHI
Hello,

At Mon, 26 Jun 2017 18:16:42 +0900, Amit Langote 
 wrote in 
<7f5fe522-f328-3c40-565f-5e1ce3745...@lab.ntt.co.jp>
> Hello,
> 
> On 2017/06/26 17:46, Kyotaro HORIGUCHI wrote:
> > Hello.
> > 
> > I had a case of unexpected error caused by ALTER TABLE NO
> > INHERIT.
> > 
> > =# CREATE TABLE p (a int);
> > =# CREATE TABLE c1 () INHERITS (p);
> > 
> > session A=# BEGIN;
> > session A=# ALTER TABLE c1 NO INHERIT p;
> > 
> > session B=# EXPLAIN ANALYZE SELECT * FROM p;
> > (blocked)
> > 
> > session A=# COMMIT;
> > 
> > session B: ERROR:  could not find inherited attribute "a" of relation "c1"
> > 
> > This happens at least back to 9.1 to master and doesn't seem to
> > be a designed behavior.
> 
> I recall I had proposed a fix for the same thing some time ago [1].

Wow. About 1.5 years ago and stuck? I have a concrete case where
this harms  so I'd like to fix that this time. How can we move on?

> > The cause is that NO INHERIT doesn't take an exlusive lock on the
> > parent. This allows expand_inherited_rtentry to add the child
> > relation into appendrel after removal from the inheritance but
> > still exists.
> 
> Right.
> 
> > I see two ways to fix this.
> > 
> > The first patch adds a recheck of inheritance relationship if the
> > corresponding attribute is missing in the child in
> > make_inh_translation_list(). The recheck is a bit complex but it
> > is not performed unless the sequence above is happen. It checks
> > duplication of relid (or cycles in inheritance) following
> > find_all_inheritors (but doing a bit different) but I'm not sure
> > it is really useful.
> 
> I had proposed adding a syscache on pg_inherits(inhrelid, inhparent), but
> I guess your hash table based solution will do the job as far as
> performance of this check is concerned, although I haven't checked the
> code closely.

The hash table is not crucial in the patch. Substantially it just
recurs down looking up pg_inherits for the child. I considered
the new index but abandoned because I thought that such case
won't occur so frequently.

> > The second patch lets ALTER TABLE NO INHERIT to acquire locks on
> > the parent first.
> > 
> > Since the latter has a larger impact on the current behavior and
> > we already treat "DROP TABLE child" case in the similar way, I
> > suppose that the first approach would be preferable.
> 
> That makes sense.
> 
> BTW, in the partitioned table case, the parent is always locked first
> using an AccessExclusiveLock.  There are other considerations in that case
> such as needing to recreate the partition descriptor upon termination of
> inheritance (both the DETACH PARTITION and also DROP TABLE child cases).

Apart from the degree of concurrency, if we keep parent->children
order of locking, such recreation does not seem to be
needed. Maybe I'm missing something.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

2017-06-26 Thread Amit Langote
Hello,

On 2017/06/26 17:46, Kyotaro HORIGUCHI wrote:
> Hello.
> 
> I had a case of unexpected error caused by ALTER TABLE NO
> INHERIT.
> 
> =# CREATE TABLE p (a int);
> =# CREATE TABLE c1 () INHERITS (p);
> 
> session A=# BEGIN;
> session A=# ALTER TABLE c1 NO INHERIT p;
> 
> session B=# EXPLAIN ANALYZE SELECT * FROM p;
> (blocked)
> 
> session A=# COMMIT;
> 
> session B: ERROR:  could not find inherited attribute "a" of relation "c1"
> 
> This happens at least back to 9.1 to master and doesn't seem to
> be a designed behavior.

I recall I had proposed a fix for the same thing some time ago [1].

> The cause is that NO INHERIT doesn't take an exlusive lock on the
> parent. This allows expand_inherited_rtentry to add the child
> relation into appendrel after removal from the inheritance but
> still exists.

Right.

> I see two ways to fix this.
> 
> The first patch adds a recheck of inheritance relationship if the
> corresponding attribute is missing in the child in
> make_inh_translation_list(). The recheck is a bit complex but it
> is not performed unless the sequence above is happen. It checks
> duplication of relid (or cycles in inheritance) following
> find_all_inheritors (but doing a bit different) but I'm not sure
> it is really useful.

I had proposed adding a syscache on pg_inherits(inhrelid, inhparent), but
I guess your hash table based solution will do the job as far as
performance of this check is concerned, although I haven't checked the
code closely.

> The second patch lets ALTER TABLE NO INHERIT to acquire locks on
> the parent first.
> 
> Since the latter has a larger impact on the current behavior and
> we already treat "DROP TABLE child" case in the similar way, I
> suppose that the first approach would be preferable.

That makes sense.

BTW, in the partitioned table case, the parent is always locked first
using an AccessExclusiveLock.  There are other considerations in that case
such as needing to recreate the partition descriptor upon termination of
inheritance (both the DETACH PARTITION and also DROP TABLE child cases).

Thanks,
Amit

[1] https://www.postgresql.org/message-id/565EB1F2.7000201%40lab.ntt.co.jp



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers