Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-11-29 Thread Michael Paquier
On Fri, Nov 30, 2012 at 4:43 AM, Andres Freund wrote:

> On 2012-11-29 11:53:50 -0500, Tom Lane wrote:
> > And here is a version for 9.1.  This omits the code changes directly
> > relevant to DROP INDEX CONCURRENTLY, but includes the changes to avoid
> > transactional updates of the pg_index row during CREATE CONCURRENTLY,
> > as well as the changes to prevent use of not-valid or not-ready indexes
> > in places where it matters.  I also chose to keep on using the
> > IndexIsValid and IndexIsReady macros, so as to avoid unnecessary
> > divergences of the branches.
>
> Looks good me.
>
> > I think this much of the patch needs to go into all supported branches.
>
> Looks like that to me, yes.
>
> Thanks for all that work!
>
Thanks. Just by looking at the patch it will be necessary to realign the
patch of REINDEX CONCURRENTLY.
However, as the discussion regarding the lock taken at phase 2 (index
swapping) is still not done, I am not sure if it is worth to do that yet.
Andres, please let me know in case you want a better version for your
review.
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-11-29 Thread Andres Freund
On 2012-11-29 11:53:50 -0500, Tom Lane wrote:
> And here is a version for 9.1.  This omits the code changes directly
> relevant to DROP INDEX CONCURRENTLY, but includes the changes to avoid
> transactional updates of the pg_index row during CREATE CONCURRENTLY,
> as well as the changes to prevent use of not-valid or not-ready indexes
> in places where it matters.  I also chose to keep on using the
> IndexIsValid and IndexIsReady macros, so as to avoid unnecessary
> divergences of the branches.

Looks good me.

> I think this much of the patch needs to go into all supported branches.

Looks like that to me, yes.

Thanks for all that work!

Andres

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-11-29 Thread Tom Lane
And here is a version for 9.1.  This omits the code changes directly
relevant to DROP INDEX CONCURRENTLY, but includes the changes to avoid
transactional updates of the pg_index row during CREATE CONCURRENTLY,
as well as the changes to prevent use of not-valid or not-ready indexes
in places where it matters.  I also chose to keep on using the
IndexIsValid and IndexIsReady macros, so as to avoid unnecessary
divergences of the branches.

I think this much of the patch needs to go into all supported branches.

regards, tom lane

diff --git a/src/backend/access/heap/README.HOT b/src/backend/access/heap/README.HOT
index f12cad44e56c363e62fa617497bbedfe1ba8c1fe..7cbc6a1d7e7328364938ef5d69bebe865524d7f8 100644
*** a/src/backend/access/heap/README.HOT
--- b/src/backend/access/heap/README.HOT
*** from the index, as well as ensuring that
*** 386,391 
--- 386,397 
  rows in a broken HOT chain (the first condition is stronger than the
  second).  Finally, we can mark the index valid for searches.
  
+ Note that we do not need to set pg_index.indcheckxmin in this code path,
+ because we have outwaited any transactions that would need to avoid using
+ the index.  (indcheckxmin is only needed because non-concurrent CREATE
+ INDEX doesn't want to wait; its stronger lock would create too much risk of
+ deadlock if it did.)
+ 
  
  Limitations and Restrictions
  
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 2e023d5ab56ac4efe1ab243739b48149e67a7408..129a1ac11f0e79408961abd73f0a33395569c5d5 100644
*** a/src/backend/catalog/index.c
--- b/src/backend/catalog/index.c
*** static void ResetReindexPending(void);
*** 129,134 
--- 129,138 
   *		See whether an existing relation has a primary key.
   *
   * Caller must have suitable lock on the relation.
+  *
+  * Note: we intentionally do not check IndexIsValid here; that's because this
+  * is used to enforce the rule that there can be only one indisprimary index,
+  * and we want that to be true even if said index is invalid.
   */
  static bool
  relationHasPrimaryKey(Relation rel)
*** index_constraint_create(Relation heapRel
*** 1247,1254 
  	 * Note: since this is a transactional update, it's unsafe against
  	 * concurrent SnapshotNow scans of pg_index.  When making an existing
  	 * index into a constraint, caller must have a table lock that prevents
! 	 * concurrent table updates, and there is a risk that concurrent readers
! 	 * of the table will miss seeing this index at all.
  	 */
  	if (update_pgindex && (mark_as_primary || deferrable))
  	{
--- 1251,1259 
  	 * Note: since this is a transactional update, it's unsafe against
  	 * concurrent SnapshotNow scans of pg_index.  When making an existing
  	 * index into a constraint, caller must have a table lock that prevents
! 	 * concurrent table updates; if it's less than a full exclusive lock,
! 	 * there is a risk that concurrent readers of the table will miss seeing
! 	 * this index at all.
  	 */
  	if (update_pgindex && (mark_as_primary || deferrable))
  	{
*** BuildIndexInfo(Relation index)
*** 1450,1456 
  
  	/* other info */
  	ii->ii_Unique = indexStruct->indisunique;
! 	ii->ii_ReadyForInserts = indexStruct->indisready;
  
  	/* initialize index-build state to default */
  	ii->ii_Concurrent = false;
--- 1455,1461 
  
  	/* other info */
  	ii->ii_Unique = indexStruct->indisunique;
! 	ii->ii_ReadyForInserts = IndexIsReady(indexStruct);
  
  	/* initialize index-build state to default */
  	ii->ii_Concurrent = false;
*** index_build(Relation heapRelation,
*** 1789,1796 
  	 * index's usability horizon.  Moreover, we *must not* try to change the
  	 * index's pg_index entry while reindexing pg_index itself, and this
  	 * optimization nicely prevents that.
  	 */
! 	if (indexInfo->ii_BrokenHotChain && !isreindex)
  	{
  		Oid			indexId = RelationGetRelid(indexRelation);
  		Relation	pg_index;
--- 1794,1813 
  	 * index's usability horizon.  Moreover, we *must not* try to change the
  	 * index's pg_index entry while reindexing pg_index itself, and this
  	 * optimization nicely prevents that.
+ 	 *
+ 	 * We also need not set indcheckxmin during a concurrent index build,
+ 	 * because we won't set indisvalid true until all transactions that care
+ 	 * about the broken HOT chains are gone.
+ 	 *
+ 	 * Therefore, this code path can only be taken during non-concurrent
+ 	 * CREATE INDEX.  Thus the fact that heap_update will set the pg_index
+ 	 * tuple's xmin doesn't matter, because that tuple was created in the
+ 	 * current transaction anyway.	That also means we don't need to worry
+ 	 * about any concurrent readers of the tuple; no other transaction can see
+ 	 * it yet.
  	 */
! 	if (indexInfo->ii_BrokenHotChain && !isreindex &&
! 		!indexInfo->ii_Concurrent)
  	{
  		Oid			indexId = RelationGetRelid(indexRelation);
  		Relation	pg_ind

Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-11-29 Thread Amit Kapila
On Thursday, November 29, 2012 4:24 PM Andres Freund wrote:
> On 2012-11-29 16:18:07 +0530, Amit Kapila wrote:
> > On Thursday, November 29, 2012 12:39 AM Tom Lane wrote.
> > > Andres Freund  writes:
> > > > On 2012-11-27 23:46:58 -0500, Tom Lane wrote:
> > > >> Attached is a very preliminary draft patch for this.  I've not
> > > >> addressed the question of whether we can clear indcheckxmin
> during
> > > >> transactional updates of pg_index rows, but I think it covers
> > > >> everything else talked about in this thread.
> > >
> >
> > > Attached is an updated patch for HEAD that I think is about ready to
> go.
> > > I'll start making a back-patchable version shortly.
> >
> > I had verified in the Patch committed that the problem is resolved.
> >
> > I have a doubt related to RelationGetIndexList() function.
> >
> > In while loop, if index is not live then it continues, so it can be
> possible
> > that we don't find a valid index after this while loop.
> > But still after the loop, it marks relation->rd_indexvalid = 1. I am
> not
> > able to see any problem with it, but why to mark it as valid when
> actually
> > there is no valid index.
> 
> rd_indexvalid is just saying whether rd_indexlist is valid. A zero
> element list seems to be valid to me. If we don't set rd_indexvalid
> pg_index will constantly be rescanned because the result isn't
> considered cached anymore.

Got the point.
Thanks.

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-11-29 Thread Andres Freund
On 2012-11-29 16:18:07 +0530, Amit Kapila wrote:
> On Thursday, November 29, 2012 12:39 AM Tom Lane wrote.
> > Andres Freund  writes:
> > > On 2012-11-27 23:46:58 -0500, Tom Lane wrote:
> > >> Attached is a very preliminary draft patch for this.  I've not
> > >> addressed the question of whether we can clear indcheckxmin during
> > >> transactional updates of pg_index rows, but I think it covers
> > >> everything else talked about in this thread.
> >
>
> > Attached is an updated patch for HEAD that I think is about ready to go.
> > I'll start making a back-patchable version shortly.
>
> I had verified in the Patch committed that the problem is resolved.
>
> I have a doubt related to RelationGetIndexList() function.
>
> In while loop, if index is not live then it continues, so it can be possible
> that we don't find a valid index after this while loop.
> But still after the loop, it marks relation->rd_indexvalid = 1. I am not
> able to see any problem with it, but why to mark it as valid when actually
> there is no valid index.

rd_indexvalid is just saying whether rd_indexlist is valid. A zero
element list seems to be valid to me. If we don't set rd_indexvalid
pg_index will constantly be rescanned because the result isn't
considered cached anymore.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-11-29 Thread Amit Kapila
On Thursday, November 29, 2012 12:39 AM Tom Lane wrote.
> Andres Freund  writes:
> > On 2012-11-27 23:46:58 -0500, Tom Lane wrote:
> >> Attached is a very preliminary draft patch for this.  I've not
> >> addressed the question of whether we can clear indcheckxmin during
> >> transactional updates of pg_index rows, but I think it covers
> >> everything else talked about in this thread.
> 
 
> Attached is an updated patch for HEAD that I think is about ready to go.
> I'll start making a back-patchable version shortly.

I had verified in the Patch committed that the problem is resolved. 

I have a doubt related to RelationGetIndexList() function.

In while loop, if index is not live then it continues, so it can be possible
that we don't find a valid index after this while loop.
But still after the loop, it marks relation->rd_indexvalid = 1. I am not
able to see any problem with it, but why to mark it as valid when actually
there is no valid index.

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-11-28 Thread Pavan Deolasee
On Thu, Nov 29, 2012 at 10:10 AM, Tom Lane  wrote:

> I wrote:
> > Attached is an updated patch for HEAD that I think is about ready to go.
> > I'll start making a back-patchable version shortly.
>
> Here is an only-lightly-tested version for 9.2.
>
>
Looks good at a glance. I wonder though if it would have been better to
have IndexSetValid/IndexClearValid family of macros instead of the
index_set_state_flags kind of a machinery, purely from consistency
perspective. I understand that index_set_state_flags also takes care of
opening the catalogue etc, but there are only two callers to the function
and we could do that outside that.

May be too late since we already have the patch committed to HEAD.

Thanks,
Pavan


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-11-28 Thread Michael Paquier

On 2012/11/29, at 9:23, Andres Freund  wrote:

> On 2012-11-28 19:11:46 -0500, Tom Lane wrote:
>> Andres Freund  writes:
>>> On 2012-11-28 18:41:39 -0500, Tom Lane wrote:
 However, this is more complicated and harder to understand.  So unless
 somebody is really excited about being able to tell the difference
 between create-in-progress and drop-in-progress, I'd rather leave the
 patch as-is.
>> 
>>> The only real argument for doing this that I can see is a potential
>>> REINDEX CONCURRENTLY.
>> 
>> While I was working on this patch, I came to the conclusion that the
>> only way REINDEX CONCURRENTLY could possibly work is:
>> 
>> 1. Do CREATE INDEX CONCURRENTLY with a temporary index name.
>> 
>> 2. Swap index names and any dependencies (eg for unique/pkey
>>   constraints), in a transaction of its own.
>> 
>> 3. Do DROP INDEX CONCURRENTLY on the now-obsolete index.
>> 
>> If you try to do it with just one set of index catalog entries, you'll
>> find the pg_class row has to be in two states at once, since there
>> certainly have to be two underlying physical files while all this is
>> going on.  That being the case, there'll be two different pg_index rows
>> as well, and thus my worries upthread about whether REINDEX CONCURRENTLY
>> would need to do something special with the pg_index row seem unfounded.
>> 
>> Of course, there's still plenty of magic required to make this happen
>> --- I don't see how to do step 2 safely without taking exclusive lock
>> for at least a short interval.  But that's mostly about the SnapshotNow
>> scan problem, which we at least have some ideas about how to solve.
> 
> That's actually pretty similar to the way Michael has implemented it in
> his submitted patch and what has been discussed in a recent thread. His
> patch doesn't claim to solve the concurrency issues around 2) though...
Correct, that is the same approach.
The patch took as approach to create a completely separate and new index entry 
which is a clone of the former index. This way all the entries are in catalogs 
are doubled, and the switch of the names is made while the two indexes are 
valid, but yes, I am myself wondering about the necessary lock that needs to be 
taken when switching the 2 index names. By the way, just by knowing that, I 
would agree to first rework the SnapshotNow mechanisms that would make a far 
better base for concurrent DDLs, and this is not limited to REINDEX only, but 
other things like CLUSTER, ALTER TABLE and perhaps others.

Then once this is done PG will have better prospectives with features using 
CONCURRENTLY, and we could envisage a clean implementation for REINDEX 
CONCURRENTLY,

Regards,

Michael Paquier

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-11-28 Thread Tom Lane
Andres Freund  writes:
> On 2012-11-29 09:10:22 +0900, Michael Paquier wrote:
>> and is going to need a lot of rework as well as more infrastructure
>> like a better MVCC-ish SnapshotNow.

> Which is a major project in itself. I wonder whether my crazy "follow
> updates via t_ctid isn't the actually easier way to get there in the
> short term. On the other hand, a more MVCCish catalog access would be
> awesome.

Yeah, eliminating the race conditions for SnapshotNow scans would be
valuable enough to justify a lot of work --- we could get rid of a
bunch of kluges once we had that, not to mention that Simon's project of
reducing ALTER TABLE lock strength might stand a chance of working.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-11-28 Thread Andres Freund
On 2012-11-28 19:11:46 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2012-11-28 18:41:39 -0500, Tom Lane wrote:
> >> However, this is more complicated and harder to understand.  So unless
> >> somebody is really excited about being able to tell the difference
> >> between create-in-progress and drop-in-progress, I'd rather leave the
> >> patch as-is.
>
> > The only real argument for doing this that I can see is a potential
> > REINDEX CONCURRENTLY.
>
> While I was working on this patch, I came to the conclusion that the
> only way REINDEX CONCURRENTLY could possibly work is:
>
> 1. Do CREATE INDEX CONCURRENTLY with a temporary index name.
>
> 2. Swap index names and any dependencies (eg for unique/pkey
>constraints), in a transaction of its own.
>
> 3. Do DROP INDEX CONCURRENTLY on the now-obsolete index.
>
> If you try to do it with just one set of index catalog entries, you'll
> find the pg_class row has to be in two states at once, since there
> certainly have to be two underlying physical files while all this is
> going on.  That being the case, there'll be two different pg_index rows
> as well, and thus my worries upthread about whether REINDEX CONCURRENTLY
> would need to do something special with the pg_index row seem unfounded.
>
> Of course, there's still plenty of magic required to make this happen
> --- I don't see how to do step 2 safely without taking exclusive lock
> for at least a short interval.  But that's mostly about the SnapshotNow
> scan problem, which we at least have some ideas about how to solve.

That's actually pretty similar to the way Michael has implemented it in
his submitted patch and what has been discussed in a recent thread. His
patch doesn't claim to solve the concurrency issues around 2) though...

Greetings,

Andres

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-11-28 Thread Andres Freund
On 2012-11-29 09:10:22 +0900, Michael Paquier wrote:
> On Thu, Nov 29, 2012 at 8:52 AM, Andres Freund wrote:
> 
> > On 2012-11-28 18:41:39 -0500, Tom Lane wrote:
> > > Andres Freund  writes:
> > > > On 2012-11-28 17:42:18 -0500, Tom Lane wrote:
> > > >> I agree it's a judgment call, though.  Anybody want to argue for the
> > > >> other position?
> > >
> > > > Hm. Seems odd to include indexes that are being dropped concurrently at
> > > > that moment. But then, we can't really detect that situation and as you
> > > > say its consistent with pg_dump...
> > >
> > > [ thinks about that for a bit... ]  We could have that, for about the
> > same
> > > cost as the currently proposed patch: instead of defining the added flag
> > > column as "index is live", define it as "drop in progress", and set it
> > > immediately at the start of the DROP CONCURRENTLY sequence.  Then the
> > > "dead" condition that RelationGetIndexList must check for is "drop in
> > > progress and not indisvalid and not indisready".
> >
> > You're right.
> >
> > > However, this is more complicated and harder to understand.  So unless
> > > somebody is really excited about being able to tell the difference
> > > between create-in-progress and drop-in-progress, I'd rather leave the
> > > patch as-is.
> >
> > The only real argument for doing this that I can see is a potential
> > REINDEX CONCURRENTLY.
> >
> Patch that has been submitted to this commit fest

Yea, I did a first look through it recently... Not really sure where to
start with the necessary infrastructure yet.

> and is going to need a lot of rework as well as more infrastructure
> like a better MVCC-ish SnapshotNow.

Which is a major project in itself. I wonder whether my crazy "follow
updates via t_ctid isn't the actually easier way to get there in the
short term. On the other hand, a more MVCCish catalog access would be
awesome.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-11-28 Thread Tom Lane
Andres Freund  writes:
> On 2012-11-28 18:41:39 -0500, Tom Lane wrote:
>> However, this is more complicated and harder to understand.  So unless
>> somebody is really excited about being able to tell the difference
>> between create-in-progress and drop-in-progress, I'd rather leave the
>> patch as-is.

> The only real argument for doing this that I can see is a potential
> REINDEX CONCURRENTLY.

While I was working on this patch, I came to the conclusion that the
only way REINDEX CONCURRENTLY could possibly work is:

1. Do CREATE INDEX CONCURRENTLY with a temporary index name.

2. Swap index names and any dependencies (eg for unique/pkey
   constraints), in a transaction of its own.

3. Do DROP INDEX CONCURRENTLY on the now-obsolete index.

If you try to do it with just one set of index catalog entries, you'll
find the pg_class row has to be in two states at once, since there
certainly have to be two underlying physical files while all this is
going on.  That being the case, there'll be two different pg_index rows
as well, and thus my worries upthread about whether REINDEX CONCURRENTLY
would need to do something special with the pg_index row seem unfounded.

Of course, there's still plenty of magic required to make this happen
--- I don't see how to do step 2 safely without taking exclusive lock
for at least a short interval.  But that's mostly about the SnapshotNow
scan problem, which we at least have some ideas about how to solve.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-11-28 Thread Andres Freund
On 2012-11-28 18:41:39 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2012-11-28 17:42:18 -0500, Tom Lane wrote:
> >> I agree it's a judgment call, though.  Anybody want to argue for the
> >> other position?
>
> > Hm. Seems odd to include indexes that are being dropped concurrently at
> > that moment. But then, we can't really detect that situation and as you
> > say its consistent with pg_dump...
>
> [ thinks about that for a bit... ]  We could have that, for about the same
> cost as the currently proposed patch: instead of defining the added flag
> column as "index is live", define it as "drop in progress", and set it
> immediately at the start of the DROP CONCURRENTLY sequence.  Then the
> "dead" condition that RelationGetIndexList must check for is "drop in
> progress and not indisvalid and not indisready".

You're right.

> However, this is more complicated and harder to understand.  So unless
> somebody is really excited about being able to tell the difference
> between create-in-progress and drop-in-progress, I'd rather leave the
> patch as-is.

The only real argument for doing this that I can see is a potential
REINDEX CONCURRENTLY.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-11-28 Thread Tom Lane
Andres Freund  writes:
> On 2012-11-28 17:42:18 -0500, Tom Lane wrote:
>> I agree it's a judgment call, though.  Anybody want to argue for the
>> other position?

> Hm. Seems odd to include indexes that are being dropped concurrently at
> that moment. But then, we can't really detect that situation and as you
> say its consistent with pg_dump...

[ thinks about that for a bit... ]  We could have that, for about the same
cost as the currently proposed patch: instead of defining the added flag
column as "index is live", define it as "drop in progress", and set it
immediately at the start of the DROP CONCURRENTLY sequence.  Then the
"dead" condition that RelationGetIndexList must check for is "drop in
progress and not indisvalid and not indisready".

However, this is more complicated and harder to understand.  So unless
somebody is really excited about being able to tell the difference
between create-in-progress and drop-in-progress, I'd rather leave the
patch as-is.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-11-28 Thread Andres Freund
On 2012-11-28 17:42:18 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > One minor thing I haven't noticed earlier: Perhaps we should also skip
> > over invalid indexes in transformTableLikeClause's
> > CREATE_TABLE_LIKE_INDEXES case?
>
> I left that as-is intentionally: the fact that an index isn't valid
> doesn't prevent us from cloning it.  A relevant data point is that
> pg_dump doesn't care whether indexes are valid or not --- it'll dump
> their definitions anyway.
>
> I agree it's a judgment call, though.  Anybody want to argue for the
> other position?

Hm. Seems odd to include indexes that are being dropped concurrently at
that moment. But then, we can't really detect that situation and as you
say its consistent with pg_dump...

Hm.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-11-28 Thread Tom Lane
Andres Freund  writes:
> One minor thing I haven't noticed earlier: Perhaps we should also skip
> over invalid indexes in transformTableLikeClause's
> CREATE_TABLE_LIKE_INDEXES case?

I left that as-is intentionally: the fact that an index isn't valid
doesn't prevent us from cloning it.  A relevant data point is that
pg_dump doesn't care whether indexes are valid or not --- it'll dump
their definitions anyway.

I agree it's a judgment call, though.  Anybody want to argue for the
other position?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-11-28 Thread Andres Freund
On 2012-11-28 14:09:11 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2012-11-27 23:46:58 -0500, Tom Lane wrote:
> >> Attached is a very preliminary draft patch for this.  I've not addressed
> >> the question of whether we can clear indcheckxmin during transactional
> >> updates of pg_index rows, but I think it covers everything else talked
> >> about in this thread.

> > - I noticed while trying my old isolationtester test that
> > heap_update_inplace disregards any locks on the tuple. I don't really
> > see a scenario where this is problematic right now, seems a bit
> > dangerous for the future though.
>
> I think this should be all right --- we have at least
> ShareUpdateExclusiveLock on the table and the index before we do
> anything, so nobody else should be fooling with its pg_index entry.
>
> Attached is an updated patch for HEAD that I think is about ready to go.
> I'll start making a back-patchable version shortly.

Looks good!

One minor thing I haven't noticed earlier: Perhaps we should also skip
over invalid indexes in transformTableLikeClause's
CREATE_TABLE_LIKE_INDEXES case?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-11-28 Thread Andres Freund
On 2012-11-27 23:46:58 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2012-11-27 16:31:15 -0500, Tom Lane wrote:
> >> Andres Freund  writes:
> >>> Isn't inisprimary updated when an ALTER TABLE ... ADD PRIMARY KEY
> >>> ... USING someindex ; is done? Also I think indoption might be written
> >>> to as well.
>
> >> Ugh, you're right.  Somebody wasn't paying attention when those ALTER
> >> commands were added.
>
> On closer look, indoption is never updated --- perhaps you were thinking
> about pg_class.reloptions.  indisprimary, indimmediate, and
> indisclustered are all subject to post-creation updates though.

Yea, I haven't really checked what inoption actually does.

> >> We could probably alleviate the consequences of this by having those
> >> operations reset indcheckxmin if the tuple's old xmin is below the
> >> GlobalXmin horizon.  That's something for later though --- it's not
> >> a data corruption issue, it just means that the index might unexpectedly
> >> not be used for queries for a little bit after an ALTER.
>
> > mark_index_clustered() does the same btw, its not a problem in the
> > CLUSTER ... USING ...; case because that creates a new pg_index entry
> > anyway but in the ALTER TABLE one thats not the case.
>
> After further study I think the situation is that
>
> (1) The indisvalid/indisready/indislive updates in CREATE/DROP INDEX
> CONCURRENTLY can, and must, be done in-place since we don't have
> exclusive lock on the parent table.
>
> (2) All the other updates can be transactional because we hold
> sufficient locks to ensure that nothing bad will happen.  The proposed
> reductions in ALTER TABLE lock strength would break this in several
> cases, but that's a problem for another day.


> Attached is a very preliminary draft patch for this.  I've not addressed
> the question of whether we can clear indcheckxmin during transactional
> updates of pg_index rows, but I think it covers everything else talked
> about in this thread.

Looks good on a quick lookthrough. Will play a bit more once the
indexcheckxmin stuff is sorted out.

Some comments:
- INDEX_DROP_CLEAR_READY clears indislive, perhasp INDEX_DROP_SET_DEAD
or NOT_ALIVE is more appropriate?
- I noticed while trying my old isolationtester test that
heap_update_inplace disregards any locks on the tuple. I don't really
see a scenario where this is problematic right now, seems a bit
dangerous for the future though.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-11-27 Thread Tom Lane
Andres Freund  writes:
> On 2012-11-27 16:31:15 -0500, Tom Lane wrote:
>> Andres Freund  writes:
>>> Isn't inisprimary updated when an ALTER TABLE ... ADD PRIMARY KEY
>>> ... USING someindex ; is done? Also I think indoption might be written
>>> to as well.

>> Ugh, you're right.  Somebody wasn't paying attention when those ALTER
>> commands were added.

On closer look, indoption is never updated --- perhaps you were thinking
about pg_class.reloptions.  indisprimary, indimmediate, and
indisclustered are all subject to post-creation updates though.

>> We could probably alleviate the consequences of this by having those
>> operations reset indcheckxmin if the tuple's old xmin is below the
>> GlobalXmin horizon.  That's something for later though --- it's not
>> a data corruption issue, it just means that the index might unexpectedly
>> not be used for queries for a little bit after an ALTER.

> mark_index_clustered() does the same btw, its not a problem in the
> CLUSTER ... USING ...; case because that creates a new pg_index entry
> anyway but in the ALTER TABLE one thats not the case.

After further study I think the situation is that

(1) The indisvalid/indisready/indislive updates in CREATE/DROP INDEX
CONCURRENTLY can, and must, be done in-place since we don't have
exclusive lock on the parent table.

(2) All the other updates can be transactional because we hold
sufficient locks to ensure that nothing bad will happen.  The proposed
reductions in ALTER TABLE lock strength would break this in several
cases, but that's a problem for another day.

Attached is a very preliminary draft patch for this.  I've not addressed
the question of whether we can clear indcheckxmin during transactional
updates of pg_index rows, but I think it covers everything else talked
about in this thread.

regards, tom lane

diff --git a/contrib/tcn/tcn.c b/contrib/tcn/tcn.c
index 6a8a96f60339118f7edb11668c5db23fbf85c211..6172137512f5ee72d39262a394c5bd1bef183e14 100644
*** a/contrib/tcn/tcn.c
--- b/contrib/tcn/tcn.c
*** triggered_change_notification(PG_FUNCTIO
*** 141,148 
  		if (!HeapTupleIsValid(indexTuple))		/* should not happen */
  			elog(ERROR, "cache lookup failed for index %u", indexoid);
  		index = (Form_pg_index) GETSTRUCT(indexTuple);
! 		/* we're only interested if it is the primary key */
! 		if (index->indisprimary)
  		{
  			int			numatts = index->indnatts;
  
--- 141,148 
  		if (!HeapTupleIsValid(indexTuple))		/* should not happen */
  			elog(ERROR, "cache lookup failed for index %u", indexoid);
  		index = (Form_pg_index) GETSTRUCT(indexTuple);
! 		/* we're only interested if it is the primary key and valid */
! 		if (index->indisprimary && index->indisvalid)
  		{
  			int			numatts = index->indnatts;
  
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index f99919093ca0da8c59ee4f4df0643837dfbdb38b..5f270404bf1279ad2ba745ca417ab00b0b5dbb08 100644
*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
***
*** 3480,3486 
 index is possibly incomplete: it must still be modified by
 INSERT/UPDATE operations, but it cannot safely
 be used for queries. If it is unique, the uniqueness property is not
!true either.

   
  
--- 3480,3486 
 index is possibly incomplete: it must still be modified by
 INSERT/UPDATE operations, but it cannot safely
 be used for queries. If it is unique, the uniqueness property is not
!guaranteed true either.

   
  
***
*** 3508,3513 
--- 3508,3523 
   
  
   
+   indislive
+   bool
+   
+   
+If false, the index is in process of being dropped, and should be
+ignored for all purposes (including HOT-safety decisions)
+   
+  
+ 
+  
indkey
int2vector
pg_attribute.attnum
diff --git a/src/backend/access/heap/README.HOT b/src/backend/access/heap/README.HOT
index f12cad44e56c363e62fa617497bbedfe1ba8c1fe..4cf3c3a0d4c2db96a57e73e46fdd7463db439f79 100644
*** a/src/backend/access/heap/README.HOT
--- b/src/backend/access/heap/README.HOT
*** from the index, as well as ensuring that
*** 386,391 
--- 386,419 
  rows in a broken HOT chain (the first condition is stronger than the
  second).  Finally, we can mark the index valid for searches.
  
+ Note that we do not need to set pg_index.indcheckxmin in this code path,
+ because we have outwaited any transactions that would need to avoid using
+ the index.  (indcheckxmin is only needed because non-concurrent CREATE
+ INDEX doesn't want to wait; its stronger lock would create too much risk of
+ deadlock if it did.)
+ 
+ 
+ DROP INDEX CONCURRENTLY
+ ---
+ 
+ DROP INDEX CONCURRENTLY is sort of the reverse sequence of CREATE INDEX
+ CONCURRENTLY.  We first mark the index as not indisvalid, and then wait for
+ any transactions that could be usin

Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-11-27 Thread Andres Freund
On 2012-11-27 16:31:15 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2012-11-27 14:41:34 -0500, Tom Lane wrote:
> >> The stuff you are allowed to ALTER is pretty much
> >> irrelevant to the index's life as an index.
>
> > Isn't inisprimary updated when an ALTER TABLE ... ADD PRIMARY KEY
> > ... USING someindex ; is done? Also I think indoption might be written
> > to as well.
>
> Ugh, you're right.  Somebody wasn't paying attention when those ALTER
> commands were added.
>
> We could probably alleviate the consequences of this by having those
> operations reset indcheckxmin if the tuple's old xmin is below the
> GlobalXmin horizon.  That's something for later though --- it's not
> a data corruption issue, it just means that the index might unexpectedly
> not be used for queries for a little bit after an ALTER.

mark_index_clustered() does the same btw, its not a problem in the
CLUSTER ... USING ...; case because that creates a new pg_index entry
anyway but in the ALTER TABLE one thats not the case.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-11-27 Thread Tom Lane
Andres Freund  writes:
> On 2012-11-27 14:41:34 -0500, Tom Lane wrote:
>> The stuff you are allowed to ALTER is pretty much
>> irrelevant to the index's life as an index.

> Isn't inisprimary updated when an ALTER TABLE ... ADD PRIMARY KEY
> ... USING someindex ; is done? Also I think indoption might be written
> to as well.

Ugh, you're right.  Somebody wasn't paying attention when those ALTER
commands were added.

We could probably alleviate the consequences of this by having those
operations reset indcheckxmin if the tuple's old xmin is below the
GlobalXmin horizon.  That's something for later though --- it's not
a data corruption issue, it just means that the index might unexpectedly
not be used for queries for a little bit after an ALTER.

> It strikes me that the whole idea of reusing xmin when indexcheckxmin is
> set is overly clever and storing the xid itself would have be been
> better... Too late though.

Well, the reason for that is documented in README.HOT: if the XID were
in an ordinary column there'd be no nice way to get it frozen when it
gets too old.  As-is, VACUUM takes care of that problem automatically.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-11-27 Thread Andres Freund
On 2012-11-27 14:41:34 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2012-11-27 13:45:08 -0500, Tom Lane wrote:
> >> In short, all flag changes in pg_index should be done by
> >> update-in-place, and the tuple's xmin will never change from the
> >> original creating transaction (until frozen).
>
> > Hm. That doesn't sound that easy to guarantee. Several ALTER TABLE and
> > ALTER INDEX operations are expected to work transactionally right
> > now. As far as I see there is nothing that prohibits a indexcheckxmin
> > requiring index to be altered while indexcheckxmin is still required?
>
> I said "in pg_index".  There is no reason to ever alter an index's
> pg_index entry transactionally, because we don't support redefining
> the index columns.  The stuff you are allowed to ALTER is pretty much
> irrelevant to the index's life as an index.

Isn't inisprimary updated when an ALTER TABLE ... ADD PRIMARY KEY
... USING someindex ; is done? Also I think indoption might be written
to as well.

> >> What we want the xmin to do, for indcheckxmin purposes, is reflect the
> >> time at which the index started to be included in HOT-safety decisions.
> >> Since we're never going to move the xmin, that means that *we cannot
> >> allow REINDEX to mark a dead index live again*.
>
> > That would be a regression compared with the current state though. We
> > have officially documented REINDEX as a way out of INVALID indexes...
>
> It's a way out of failed CREATE operations.  If DROP fails at the last
> step, you won't be able to go back, but why would you want to?  Just
> do the DROP again.

Oh, sorry, misunderstood you.

>
> >> Anybody feel like bikeshedding the flag column name?  I'm thinking
> >> "indislive" but maybe there's a better name.
>
> > I personally would slightly favor indisdead instead...
>
> Meh ... the other two flags are positive, in the sense of
> true-is-the-normal-state, so I thought this one should be too.

Good point.

> I had also toyed with "indishot", to reflect the idea that this controls
> whether the index is included in HOT-safety decisions, but that seems
> maybe a little too cute.

indislive seems better than that, yes.

> >>> Btw, even if we manage to find a sensible fix for this I would suggest
> >>> postponing it after the next back branch release.
>
> >> AFAICS this is a data loss/corruption problem, and as such a "must fix".
> >> If we can't have it done by next week, I'd rather postpone the releases
> >> until it is done.
>
> > Ok, just seemed like a rather complex fix in a short time for something
> > that seemingly hasn't been noticed since 8.3. I am a bit worried about
> > introducing something worse while fixing this.
>
> Hm?  The fact that the DROP patch broke CREATE INDEX CONCURRENTLY is a
> new and very nasty bug in 9.2.  I would agree with you if we were
> considering the unsafe-row-update problem alone, but it seems like we
> might as well fix both aspects while we're looking at this code.

> There is a question of whether we should risk trying to back-patch the
> in-place-update changes further than 9.2.  Given the lack of complaints
> I'm inclined not to, but could be persuaded differently.

Oh, I only was talking about the inplace changes. The DROP INDEX
CONCURRENTLY breakage definitely needs to get backpatched.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-11-27 Thread Tom Lane
Andres Freund  writes:
> On 2012-11-27 13:45:08 -0500, Tom Lane wrote:
>> In short, all flag changes in pg_index should be done by
>> update-in-place, and the tuple's xmin will never change from the
>> original creating transaction (until frozen).

> Hm. That doesn't sound that easy to guarantee. Several ALTER TABLE and
> ALTER INDEX operations are expected to work transactionally right
> now. As far as I see there is nothing that prohibits a indexcheckxmin
> requiring index to be altered while indexcheckxmin is still required?

I said "in pg_index".  There is no reason to ever alter an index's
pg_index entry transactionally, because we don't support redefining
the index columns.  The stuff you are allowed to ALTER is pretty much
irrelevant to the index's life as an index.

>> What we want the xmin to do, for indcheckxmin purposes, is reflect the
>> time at which the index started to be included in HOT-safety decisions.
>> Since we're never going to move the xmin, that means that *we cannot
>> allow REINDEX to mark a dead index live again*.

> That would be a regression compared with the current state though. We
> have officially documented REINDEX as a way out of INVALID indexes...

It's a way out of failed CREATE operations.  If DROP fails at the last
step, you won't be able to go back, but why would you want to?  Just
do the DROP again.

>> Anybody feel like bikeshedding the flag column name?  I'm thinking
>> "indislive" but maybe there's a better name.

> I personally would slightly favor indisdead instead...

Meh ... the other two flags are positive, in the sense of
true-is-the-normal-state, so I thought this one should be too.

I had also toyed with "indishot", to reflect the idea that this controls
whether the index is included in HOT-safety decisions, but that seems
maybe a little too cute.

>>> Btw, even if we manage to find a sensible fix for this I would suggest
>>> postponing it after the next back branch release.

>> AFAICS this is a data loss/corruption problem, and as such a "must fix".
>> If we can't have it done by next week, I'd rather postpone the releases
>> until it is done.

> Ok, just seemed like a rather complex fix in a short time for something
> that seemingly hasn't been noticed since 8.3. I am a bit worried about
> introducing something worse while fixing this.

Hm?  The fact that the DROP patch broke CREATE INDEX CONCURRENTLY is a
new and very nasty bug in 9.2.  I would agree with you if we were
considering the unsafe-row-update problem alone, but it seems like we
might as well fix both aspects while we're looking at this code.

There is a question of whether we should risk trying to back-patch the
in-place-update changes further than 9.2.  Given the lack of complaints
I'm inclined not to, but could be persuaded differently.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-11-27 Thread Andres Freund
On 2012-11-27 13:45:08 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2012-11-26 22:44:49 -0500, Tom Lane wrote:
> >> I looked through the code a bit, and I think we should be able to make
> >> this work, but note there are also places that update indcheckxmin using
> >> heap_update, and that's just as dangerous as changing the other two
> >> flags via heap_update, if you don't have exclusive lock on the table.
>
> > Isn't setting indexcheckxmin already problematic in the case of CREATE
> > INDEX CONCURRENTLY? index_build already runs in a separate transaction
> > there.
>
> Yeah, you are right, except that AFAICS indcheckxmin is really only
> needed for regular non-concurrent CREATE INDEX, which needs it because
> it commits without waiting for readers that might be bothered by broken
> HOT chains.  In a concurrent CREATE INDEX, we handle that problem by
> waiting out all such readers before setting indisvalid.  So the
> concurrent code path should not be bothering to set indcheckxmin at all,
> I think.  (This is underdocumented.)

Seems to be correct to me.

> Looking closer at the comment in reindex_index, what it's really full of
> angst about is that simple_heap_update will update the tuple's xmin
> *when we would rather that it didn't*.  So switching to update-in-place
> there will solve a problem, not create one.

It strikes me that the whole idea of reusing xmin when indexcheckxmin is
set is overly clever and storing the xid itself would have be been
better... Too late though.

> In short, all flag changes in pg_index should be done by
> update-in-place, and the tuple's xmin will never change from the
> original creating transaction (until frozen).

Hm. That doesn't sound that easy to guarantee. Several ALTER TABLE and
ALTER INDEX operations are expected to work transactionally right
now. As far as I see there is nothing that prohibits a indexcheckxmin
requiring index to be altered while indexcheckxmin is still required?

> What we want the xmin to do, for indcheckxmin purposes, is reflect the
> time at which the index started to be included in HOT-safety decisions.
> Since we're never going to move the xmin, that means that *we cannot
> allow REINDEX to mark a dead index live again*.  Once DROP INDEX
> CONCURRENTLY has reached the final state, you can't revalidate the index
> by reindexing it, you'll have to drop it and then make a brand new one.
> That seems like a pretty minor compromise.

That would be a regression compared with the current state though. We
have officially documented REINDEX as a way out of INVALID indexes...

If we store the xid of the reindexing transaction there that might be
pessimal (because there should be not HOT safety problems) but should
always be correct, or am I missing something?

> What I propose to do next is create a patch for HEAD that includes a
> new pg_index flag column, since I think the logic will be clearer
> with that.  Once we're happy with that, we can back-port the patch
> into a form that uses the existing flag columns.
>
> Anybody feel like bikeshedding the flag column name?  I'm thinking
> "indislive" but maybe there's a better name.

I personally would slightly favor indisdead instead...

> > Btw, even if we manage to find a sensible fix for this I would suggest
> > postponing it after the next back branch release.
>
> AFAICS this is a data loss/corruption problem, and as such a "must fix".
> If we can't have it done by next week, I'd rather postpone the releases
> until it is done.

Ok, just seemed like a rather complex fix in a short time for something
that seemingly hasn't been noticed since 8.3. I am a bit worried about
introducing something worse while fixing this.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-11-27 Thread Tom Lane
Andres Freund  writes:
> On 2012-11-26 22:44:49 -0500, Tom Lane wrote:
>> I looked through the code a bit, and I think we should be able to make
>> this work, but note there are also places that update indcheckxmin using
>> heap_update, and that's just as dangerous as changing the other two
>> flags via heap_update, if you don't have exclusive lock on the table.

> Isn't setting indexcheckxmin already problematic in the case of CREATE
> INDEX CONCURRENTLY? index_build already runs in a separate transaction
> there.

Yeah, you are right, except that AFAICS indcheckxmin is really only
needed for regular non-concurrent CREATE INDEX, which needs it because
it commits without waiting for readers that might be bothered by broken
HOT chains.  In a concurrent CREATE INDEX, we handle that problem by
waiting out all such readers before setting indisvalid.  So the
concurrent code path should not be bothering to set indcheckxmin at all,
I think.  (This is underdocumented.)

Looking closer at the comment in reindex_index, what it's really full of
angst about is that simple_heap_update will update the tuple's xmin
*when we would rather that it didn't*.  So switching to update-in-place
there will solve a problem, not create one.

In short, all flag changes in pg_index should be done by
update-in-place, and the tuple's xmin will never change from the
original creating transaction (until frozen).

What we want the xmin to do, for indcheckxmin purposes, is reflect the
time at which the index started to be included in HOT-safety decisions.
Since we're never going to move the xmin, that means that *we cannot
allow REINDEX to mark a dead index live again*.  Once DROP INDEX
CONCURRENTLY has reached the final state, you can't revalidate the index
by reindexing it, you'll have to drop it and then make a brand new one.
That seems like a pretty minor compromise.

What I propose to do next is create a patch for HEAD that includes a
new pg_index flag column, since I think the logic will be clearer
with that.  Once we're happy with that, we can back-port the patch
into a form that uses the existing flag columns.

Anybody feel like bikeshedding the flag column name?  I'm thinking
"indislive" but maybe there's a better name.

Note: I'm not impressed by the proposal to replace these columns with
a single integer flag column.  Aside from any possible incompatibility
with existing client code, it just isn't going to be easy to read the
index's state manually if we do that.  We could maybe dodge that
complaint with a char (pseudo-enum) status column but I don't think
that will simplify the code at all, and it's got the same or worse
compatibility issues.

> Btw, even if we manage to find a sensible fix for this I would suggest
> postponing it after the next back branch release.

AFAICS this is a data loss/corruption problem, and as such a "must fix".
If we can't have it done by next week, I'd rather postpone the releases
until it is done.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-11-27 Thread Andres Freund
On 2012-11-26 22:44:49 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2012-10-05 19:56:40 -0400, Tom Lane wrote:
> >> I think this could possibly be fixed by using nontransactional
> >> update-in-place when we're trying to change indisvalid and/or
> >> indisready, but I've not really thought through the details.
>
> > I couldn't really think of any realistic method to fix this other than
> > update in place. I thought about it for a while and I think it should
> > work, but I have to say it makes me slightly uneasy.
>
> I looked through the code a bit, and I think we should be able to make
> this work, but note there are also places that update indcheckxmin using
> heap_update, and that's just as dangerous as changing the other two
> flags via heap_update, if you don't have exclusive lock on the table.
> This is going to need some careful thought, because we currently expect
> that such an update will set the pg_index row's xmin to the current XID,
> which is something an in-place update can *not* do.  I think this is a
> non-problem during construction of a new index, since the xmin ought to
> be the current XID already anyway, but it's less clear what to do in
> REINDEX.  In the short term there may be no problem since REINDEX takes
> exclusive lock on the parent table anyway (and hence could safely do
> a transactional update) but if we ever want REINDEX CONCURRENTLY we'll
> need a better answer.

Isn't setting indexcheckxmin already problematic in the case of CREATE
INDEX CONCURRENTLY? index_build already runs in a separate transaction
there.

I am really surprised that we haven't seen any evidence of a problem
with this. On a database with a busy & bigger catalog that ought to be
a real problem.

I wonder whether we couldn't fix it by introducing a variant/wrapper of
heap_fetch et al. that follows t_ctid if the tuple became invisible
"recently". That ought to be able to fix most of these issues in a
pretty general fashion.
That would make a nice implementation of REINDEX CONCURRENTLY easier as
well...

Btw, even if we manage to find a sensible fix for this I would suggest
postponing it after the next back branch release.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-11-26 Thread Tom Lane
Andres Freund  writes:
> On 2012-10-05 19:56:40 -0400, Tom Lane wrote:
>> I think this could possibly be fixed by using nontransactional
>> update-in-place when we're trying to change indisvalid and/or
>> indisready, but I've not really thought through the details.

> I couldn't really think of any realistic method to fix this other than
> update in place. I thought about it for a while and I think it should
> work, but I have to say it makes me slightly uneasy.

I looked through the code a bit, and I think we should be able to make
this work, but note there are also places that update indcheckxmin using
heap_update, and that's just as dangerous as changing the other two
flags via heap_update, if you don't have exclusive lock on the table.
This is going to need some careful thought, because we currently expect
that such an update will set the pg_index row's xmin to the current XID,
which is something an in-place update can *not* do.  I think this is a
non-problem during construction of a new index, since the xmin ought to
be the current XID already anyway, but it's less clear what to do in
REINDEX.  In the short term there may be no problem since REINDEX takes
exclusive lock on the parent table anyway (and hence could safely do
a transactional update) but if we ever want REINDEX CONCURRENTLY we'll
need a better answer.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-11-24 Thread Andres Freund
On 2012-10-05 19:56:40 -0400, Tom Lane wrote:
> 1.  These operations think they can use ordinary heap_update operations
> to change pg_index entries when they don't have exclusive lock on the
> parent table.  The lack of ex-lock means that another backend could be
> currently loading up its list of index OIDs for the table --- and since
> it scans pg_index with SnapshotNow to do that, the heap_update could
> result in the other backend failing to see this index *at all*.  That's
> okay if it causes the other backend to not use the index for scanning...
> but not okay if it causes the other backend to fail to make index
> entries it is supposed to make.
>
> I think this could possibly be fixed by using nontransactional
> update-in-place when we're trying to change indisvalid and/or
> indisready, but I've not really thought through the details.

I couldn't really think of any realistic method to fix this other than
update in place. I thought about it for a while and I think it should
work, but I have to say it makes me slightly uneasy.

If we could could ensure both land on the same page it would be possible
to fix in a nicer way, but thats not really possible. Especially not in
any way thats backpatchable.

Unless somebody has a better idea I am going to write a patch for that.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-10-21 Thread Kevin Grittner
Kevin Grittner wrote:
> Tom Lane wrote:
> > This needs to be back-patched, no?
> 
> Looking at that now.

Back-patched to 9.2.  I don't know how I got it in my head that this
was a pending 9.3 feature.  I'll check next time, even if I think I
know.

Thanks to both Andres and Tom for pointing that out.

-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-10-21 Thread Kevin Grittner
Tom Lane wrote:
> This needs to be back-patched, no?

Looking at that now.

-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-10-21 Thread Tom Lane
"Kevin Grittner"  writes:
> Kevin Grittner wrote:
>> Will apply tomorrow if there are no further objections.

> Done.

This needs to be back-patched, no?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-10-21 Thread Kevin Grittner
Kevin Grittner wrote:

> Will apply tomorrow if there are no further objections.

Done.

-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-10-20 Thread Kevin Grittner
Simon Riggs wrote:
> Kevin, you're good to go on the SSI patch, or I'll apply next week
> if you don't. Thanks for that.

There were some hunks failing because of minor improvements to the
comments you applied, so attached is a version with trivial
adjustments for that.  Will apply tomorrow if there are no further
objections.

Nice feature, BTW!

-Kevin
*** a/src/backend/catalog/index.c
--- b/src/backend/catalog/index.c
***
*** 1320,1325  index_drop(Oid indexId, bool concurrent)
--- 1320,1337 
  	 * In the concurrent case we make sure that nobody can be looking at the
  	 * indexes by dropping the index in multiple steps, so we don't need a full
  	 * AccessExclusiveLock yet.
+ 	 *
+ 	 * All predicate locks on the index are about to be made invalid. Promote
+ 	 * them to relation locks on the heap. For correctness the index must not
+ 	 * be seen with indisvalid = true during query planning after the move
+ 	 * starts, so that the index will not be used for a scan after the
+ 	 * predicate lock move, as this could create new predicate locks on the
+ 	 * index which would not ensure a heap relation lock. Also, the index must
+ 	 * not be seen during execution of a heap tuple insert with indisready =
+ 	 * false before the move is complete, since the conflict with the
+ 	 * predicate lock on the index gap could be missed before the lock on the
+ 	 * heap relation is in place to detect a conflict based on the heap tuple
+ 	 * insert.
  	 */
  	heapId = IndexGetRelation(indexId, false);
  	if (concurrent)
***
*** 1445,1450  index_drop(Oid indexId, bool concurrent)
--- 1457,1470 
  		}
  
  		/*
+ 		 * No more predicate locks will be acquired on this index, and we're
+ 		 * about to stop doing inserts into the index which could show
+ 		 * conflicts with existing predicate locks, so now is the time to move
+ 		 * them to the heap relation.
+ 		 */
+ 		TransferPredicateLocksToHeapRelation(userIndexRelation);
+ 
+ 		/*
  		 * Now we are sure that nobody uses the index for queries, they just
  		 * might have it opened for updating it. So now we can unset
  		 * indisready and wait till nobody could update the index anymore.
***
*** 1514,1525  index_drop(Oid indexId, bool concurrent)
  		userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
  		userIndexRelation = index_open(indexId, AccessExclusiveLock);
  	}
! 
! 	/*
! 	 * All predicate locks on the index are about to be made invalid. Promote
! 	 * them to relation locks on the heap.
! 	 */
! 	TransferPredicateLocksToHeapRelation(userIndexRelation);
  
  	/*
  	 * Schedule physical removal of the files
--- 1534,1541 
  		userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
  		userIndexRelation = index_open(indexId, AccessExclusiveLock);
  	}
! 	else
! 		TransferPredicateLocksToHeapRelation(userIndexRelation);
  
  	/*
  	 * Schedule physical removal of the files

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-10-19 Thread Simon Riggs
On 18 October 2012 19:48, Simon Riggs  wrote:
> On 18 October 2012 10:20, Andres Freund  wrote:
>> On Thursday, October 18, 2012 06:12:02 AM Kevin Grittner wrote:
>>> Kevin Grittner wrote:
>>> > Hmm. The comment is probably better now, but I've been re-checking
>>> > the code, and I think my actual code change is completely wrong.
>>> > Give me a bit to sort this out.
>>>
>>> I'm having trouble seeing a way to make this work without rearranging
>>> the code for concurrent drop to get to a state where it has set
>>> indisvalid = false, made that visible to all processes, and ensured
>>> that all scans of the index are complete -- while indisready is still
>>> true. That is the point where TransferPredicateLocksToHeapRelation()
>>> could be safely called. Then we would need to set indisready = false,
>>> make that visible to all processes, and ensure that all access to the
>>> index is complete. I can't see where it works to set both flags at
>>> the same time. I want to sleep on it to see if I can come up with any
>>> other way, but right now that's the only way I'm seeing to make DROP
>>> INDEX CONCURRENTLY compatible with SERIALIZABLE transactions. :-(
>>
>> In a nearby bug I had to restructure the code that in a way thats similar to
>> this anyway, so that seems fine. Maybe you can fix the bug ontop of the two
>> attached patches?
>
> First patch and first test committed.
>
> Working on second patch/test.

I've applied the second patch as-is.

The second test shows it passes, but the nature of the bug is fairly
obscure, so having a specific test for dropping an already dropped
object is a little strange and so I've not applied that.

Thanks for fixes and tests.

Kevin, you're good to go on the SSI patch, or I'll apply next week if
you don't. Thanks for that.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-10-18 Thread Simon Riggs
On 18 October 2012 10:20, Andres Freund  wrote:
> On Thursday, October 18, 2012 06:12:02 AM Kevin Grittner wrote:
>> Kevin Grittner wrote:
>> > Hmm. The comment is probably better now, but I've been re-checking
>> > the code, and I think my actual code change is completely wrong.
>> > Give me a bit to sort this out.
>>
>> I'm having trouble seeing a way to make this work without rearranging
>> the code for concurrent drop to get to a state where it has set
>> indisvalid = false, made that visible to all processes, and ensured
>> that all scans of the index are complete -- while indisready is still
>> true. That is the point where TransferPredicateLocksToHeapRelation()
>> could be safely called. Then we would need to set indisready = false,
>> make that visible to all processes, and ensure that all access to the
>> index is complete. I can't see where it works to set both flags at
>> the same time. I want to sleep on it to see if I can come up with any
>> other way, but right now that's the only way I'm seeing to make DROP
>> INDEX CONCURRENTLY compatible with SERIALIZABLE transactions. :-(
>
> In a nearby bug I had to restructure the code that in a way thats similar to
> this anyway, so that seems fine. Maybe you can fix the bug ontop of the two
> attached patches?

First patch and first test committed.

Working on second patch/test.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-10-18 Thread Kevin Grittner
Andres Freund wrote:
> On Thursday, October 18, 2012 06:12:02 AM Kevin Grittner wrote:
 
>> I'm having trouble seeing a way to make this work without
>> rearranging the code for concurrent drop to get to a state where
>> it has set indisvalid = false, made that visible to all processes,
>> and ensured that all scans of the index are complete -- while
>> indisready is still true. That is the point where
>> TransferPredicateLocksToHeapRelation() could be safely called.
>> Then we would need to set indisready = false, make that visible to
>> all processes, and ensure that all access to the index is
>> complete. I can't see where it works to set both flags at the same
>> time.

> In a nearby bug I had to restructure the code that in a way thats
> similar to this anyway, so that seems fine. Maybe you can fix the
> bug ontop of the two attached patches?

Perfect; these two patches provide a spot in the code which is
exactly right for handling the predicate lock adjustments. Attached
is a patch which applies on top of the two you sent.

Thanks!

-Kevin
*** a/src/backend/catalog/index.c
--- b/src/backend/catalog/index.c
***
*** 1320,1325  index_drop(Oid indexId, bool concurrent)
--- 1320,1337 
  	 * In the concurrent case we make sure that nobody can be looking at the
  	 * indexes by dropping the index in multiple steps, so we don't need a full
  	 * fledged AccessExlusiveLock yet.
+ 	 *
+ 	 * All predicate locks on the index are about to be made invalid. Promote
+ 	 * them to relation locks on the heap. For correctness the index must not
+ 	 * be seen with indisvalid = true during query planning after the move
+ 	 * starts, so that the index will not be used for a scan after the
+ 	 * predicate lock move, as this could create new predicate locks on the
+ 	 * index which would not ensure a heap relation lock. Also, the index must
+ 	 * not be seen during execution of a heap tuple insert with indisready =
+ 	 * false before the move is complete, since the conflict with the
+ 	 * predicate lock on the index gap could be missed before the lock on the
+ 	 * heap relation is in place to detect a conflict based on the heap tuple
+ 	 * insert.
  	 */
  	heapId = IndexGetRelation(indexId, false);
  	if (concurrent)
***
*** 1439,1444  index_drop(Oid indexId, bool concurrent)
--- 1451,1464 
  		}
  
  		/*
+ 		 * No more predicate locks will be acquired on this index, and we're
+ 		 * about to stop doing inserts into the index which could show
+ 		 * conflicts with existing predicate locks, so now is the time to move
+ 		 * them to the heap relation.
+ 		 */
+ 		TransferPredicateLocksToHeapRelation(userIndexRelation);
+ 
+ 		/*
  		 * now we are sure that nobody uses the index for queries, they just
  		 * might have it opened for updating it. So now we can unset
  		 * ->indisready and wait till nobody could update the index anymore.
***
*** 1507,1518  index_drop(Oid indexId, bool concurrent)
  		userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
  		userIndexRelation = index_open(indexId, AccessExclusiveLock);
  	}
! 
! 	/*
! 	 * All predicate locks on the index are about to be made invalid. Promote
! 	 * them to relation locks on the heap.
! 	 */
! 	TransferPredicateLocksToHeapRelation(userIndexRelation);
  
  	/*
  	 * Schedule physical removal of the files
--- 1527,1534 
  		userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
  		userIndexRelation = index_open(indexId, AccessExclusiveLock);
  	}
! 	else
! 		TransferPredicateLocksToHeapRelation(userIndexRelation);
  
  	/*
  	 * Schedule physical removal of the files

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-10-18 Thread Andres Freund
On Thursday, October 18, 2012 06:12:02 AM Kevin Grittner wrote:
> Kevin Grittner wrote:
> > Hmm. The comment is probably better now, but I've been re-checking
> > the code, and I think my actual code change is completely wrong.
> > Give me a bit to sort this out.
> 
> I'm having trouble seeing a way to make this work without rearranging
> the code for concurrent drop to get to a state where it has set
> indisvalid = false, made that visible to all processes, and ensured
> that all scans of the index are complete -- while indisready is still
> true. That is the point where TransferPredicateLocksToHeapRelation()
> could be safely called. Then we would need to set indisready = false,
> make that visible to all processes, and ensure that all access to the
> index is complete. I can't see where it works to set both flags at
> the same time. I want to sleep on it to see if I can come up with any
> other way, but right now that's the only way I'm seeing to make DROP
> INDEX CONCURRENTLY compatible with SERIALIZABLE transactions. :-(

In a nearby bug I had to restructure the code that in a way thats similar to 
this anyway, so that seems fine. Maybe you can fix the bug ontop of the two 
attached patches?

Greetings,

Andres
-- 
Andres Freund   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From 037daa464d3fc63dbc943b13dd90f477a4fc9aba Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 25 Sep 2012 01:41:29 +0200
Subject: [PATCH 1/2] Fix concurrency issues in concurrent index drops

Previously a DROP INDEX CONCURRENTLY started with unsetting indisvalid *and*
indisready. Thats problematic if some transaction is still looking at the index
and another transction makes changes. See the example below.

Now we do the drop in three stages, just as a concurrent index build. First
unset indivalid, wait, unset indisready, wait, drop index.

Example:

Session 1:
CREATE TABLE test_drop_concurrently(id serial primary key, data int);
INSERT INTO test_drop_concurrently(data) SELECT * FROM generate_series(1,
10);
CREATE INDEX test_drop_concurrently_data ON test_drop_concurrently(data);
BEGIN;
EXPLAIN ANALYZE SELECT * FROM test_drop_concurrently WHERE data = 34343;
SELECT * FROM test_drop_concurrently WHERE data = 34343;
(1 row)

Session 2:
BEGIN;
SELECT * FROM test_drop_concurrently WHERE data = 34343;

Session 3:
DROP INDEX CONCURRENTLY test_drop_concurrently_data;
(in-progress)

Session 2:
INSERT INTO test_drop_concurrently(data) SELECT * FROM generate_series(1, 10);
COMMIT;

Session 1:
SELECT * FROM test_drop_concurrently WHERE data = 34343;
(1 row)
SET enable_bitmapscan = false;
SET enable_indexscan = false;
SELECT * FROM test_drop_concurrently WHERE data = 34343;
(2 rows)
---
 src/backend/catalog/index.c |   99 ---
 1 file changed, 84 insertions(+), 15 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 464950b..b39536e 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1316,6 +1316,10 @@ index_drop(Oid indexId, bool concurrent)
 	 * table lock strong enough to prevent all queries on the table from
 	 * proceeding until we commit and send out a shared-cache-inval notice
 	 * that will make them update their index lists.
+	 *
+	 * In the concurrent case we make sure that nobody can be looking at the
+	 * indexes by dropping the index in multiple steps, so we don't need a full
+	 * fledged AccessExlusiveLock yet.
 	 */
 	heapId = IndexGetRelation(indexId, false);
 	if (concurrent)
@@ -1336,7 +1340,19 @@ index_drop(Oid indexId, bool concurrent)
 
 	/*
 	 * Drop Index concurrently is similar in many ways to creating an index
-	 * concurrently, so some actions are similar to DefineIndex()
+	 * concurrently, so some actions are similar to DefineIndex() just in the
+	 * reverse order.
+	 *
+	 * First we unset indisvalid so queries starting afterwards don't use the
+	 * index to answer queries anymore. We have to keep indisready = true
+	 * though so transactions that are still using the index can continue to
+	 * see valid index contents. E.g. when they are using READ COMMITTED mode,
+	 * and another transactions that started later commits makes changes and
+	 * commits, they need to see those new tuples in the index.
+	 *
+	 * After all transactions that could possibly have used it for queries
+	 * ended we can unset indisready and wait till nobody could be updating it
+	 * anymore.
 	 */
 	if (concurrent)
 	{
@@ -1355,21 +1371,14 @@ index_drop(Oid indexId, bool concurrent)
 			elog(ERROR, "cache lookup failed for index %u", indexId);
 		indexForm = (Form_pg_index) GETSTRUCT(tuple);
 
-		indexForm->indisvalid = false;	/* make unusable for queries */
-		indexForm->indisready = false;	/* make invisible to changes */
+		indexForm->indisvalid = false;	/* make unusable for new queries */
+		/* we keep indisready == true so it still gets updated */
 
 		simple_heap_update(i

Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-10-17 Thread Kevin Grittner
Kevin Grittner wrote:
> Hmm. The comment is probably better now, but I've been re-checking
> the code, and I think my actual code change is completely wrong.
> Give me a bit to sort this out.

I'm having trouble seeing a way to make this work without rearranging
the code for concurrent drop to get to a state where it has set
indisvalid = false, made that visible to all processes, and ensured
that all scans of the index are complete -- while indisready is still
true. That is the point where TransferPredicateLocksToHeapRelation()
could be safely called. Then we would need to set indisready = false,
make that visible to all processes, and ensure that all access to the
index is complete. I can't see where it works to set both flags at
the same time. I want to sleep on it to see if I can come up with any
other way, but right now that's the only way I'm seeing to make DROP
INDEX CONCURRENTLY compatible with SERIALIZABLE transactions. :-(

-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-10-17 Thread Kevin Grittner
Kevin Grittner wrote:

> It took me a while to spot it, but yeah -- I reversed the field
> names in the comment. :-( The email was right; I fixed the comment.

Hmm. The comment is probably better now, but I've been re-checking
the code, and I think my actual code change is completely wrong. Give
me a bit to sort this out.

-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-10-17 Thread Kevin Grittner
Tom Lane wrote:
> "Kevin Grittner"  writes:
>> To put that another way, it should be done at a time when it is
>> sure that no query sees indisvalid = true and no query has yet
>> seen indisready = false. Patch attached. Will apply if nobody sees
>> a problem with it.
> 
> The above statement of the requirement doesn't seem to match what
> you put in the comment:
> 
>> + * All predicate locks on the index are about to be made invalid. Promote
>> + * them to relation locks on the heap. For correctness this must be done
>> + * after the index was last seen with indisready = true and before it is
>> + * seen with indisvalid = false.

It took me a while to spot it, but yeah -- I reversed the field names
in the comment. :-( The email was right; I fixed the comment.

> and the comment is rather vaguely worded too (last seen by what?).
> Please wordsmith that a bit more.

I tried to leave nothing to the imagination this time.

I think the README or function comments on the predicate locking
should include more about this sort of thing. In retrospect, the
documentation reads more like a maintainer's guide for SSI, and the
user's guide is lacking. Something like that would make it easier for
people working on other features (like DROP INDEX CONCURRENTLY) to
know where to put which calls. That's more than I can do right at the
moment, but I'll make a note of it. I suspect that if enough of this
is in the comment above the function definition, less of it needs to
be near each call site.

-Kevin
*** a/src/backend/catalog/index.c
--- b/src/backend/catalog/index.c
***
*** 1316,1321  index_drop(Oid indexId, bool concurrent)
--- 1316,1333 
  	 * table lock strong enough to prevent all queries on the table from
  	 * proceeding until we commit and send out a shared-cache-inval notice
  	 * that will make them update their index lists.
+ 	 *
+ 	 * All predicate locks on the index are about to be made invalid. Promote
+ 	 * them to relation locks on the heap. For correctness the index must not
+ 	 * be seen with indisvalid = true during query planning after the move
+ 	 * starts, so that the index will not be used for a scan after the
+ 	 * predicate lock move, as this could create new predicate locks on the
+ 	 * index which would not ensure a heap relation lock. Also, the index must
+ 	 * not be seen during execution of a heap tuple insert with indisready =
+ 	 * false before the move is complete, since the conflict with the
+ 	 * predicate lock on the index gap could be missed before the lock on the
+ 	 * heap relation is in place to detect a conflict based on the heap tuple
+ 	 * insert.
  	 */
  	heapId = IndexGetRelation(indexId, false);
  	if (concurrent)
***
*** 1349,1354  index_drop(Oid indexId, bool concurrent)
--- 1361,1368 
  		 */
  		indexRelation = heap_open(IndexRelationId, RowExclusiveLock);
  
+ 		TransferPredicateLocksToHeapRelation(userIndexRelation);
+ 
  		tuple = SearchSysCacheCopy1(INDEXRELID,
  	ObjectIdGetDatum(indexId));
  		if (!HeapTupleIsValid(tuple))
***
*** 1438,1449  index_drop(Oid indexId, bool concurrent)
  		userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
  		userIndexRelation = index_open(indexId, AccessExclusiveLock);
  	}
! 
! 	/*
! 	 * All predicate locks on the index are about to be made invalid. Promote
! 	 * them to relation locks on the heap.
! 	 */
! 	TransferPredicateLocksToHeapRelation(userIndexRelation);
  
  	/*
  	 * Schedule physical removal of the files
--- 1452,1459 
  		userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
  		userIndexRelation = index_open(indexId, AccessExclusiveLock);
  	}
! 	else
! 		TransferPredicateLocksToHeapRelation(userIndexRelation);
  
  	/*
  	 * Schedule physical removal of the files

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-10-17 Thread Tom Lane
"Kevin Grittner"  writes:
> To put that another way, it should be done at a time when it is sure
> that no query sees indisvalid = true and no query has yet seen
> indisready = false.  Patch attached.  Will apply if nobody sees a
> problem with it.

The above statement of the requirement doesn't seem to match what you
put in the comment:

> +  * All predicate locks on the index are about to be made invalid. 
> Promote
> +  * them to relation locks on the heap. For correctness this must be done
> +  * after the index was last seen with indisready = true and before it is
> +  * seen with indisvalid = false.

and the comment is rather vaguely worded too (last seen by what?).
Please wordsmith that a bit more.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-10-17 Thread Kevin Grittner
Kevin Grittner wrote:
> Simon Riggs wrote:
>> On 6 October 2012 00:56, Tom Lane  wrote:
>  
>>> 2. DROP INDEX CONCURRENTLY doesn't bother to do
>>> TransferPredicateLocksToHeapRelation until long after it's
>>> invalidated the index. Surely that's no good? Is it even possible
>>> to do that correctly, when we don't have a lock that will prevent
>>> new predicate locks from being taken out meanwhile?
>> 
>> No idea there. Input appreciated.
 
> If the creation of a new tuple by insert or update would not
> perform the related index tuple insertion, and the lock has not yet
> been transfered to the heap relation, yes we have a problem.  Will
> take a look at the code.
> 
> Creation of new predicate locks while in this state has no bearing
> on the issue as long as locks are transferred to the heap relation
> after the last scan using the index has completed.
 
To put that another way, it should be done at a time when it is sure
that no query sees indisvalid = true and no query has yet seen
indisready = false.  Patch attached.  Will apply if nobody sees a
problem with it.

-Kevin
*** a/src/backend/catalog/index.c
--- b/src/backend/catalog/index.c
***
*** 1316,1321  index_drop(Oid indexId, bool concurrent)
--- 1316,1326 
  	 * table lock strong enough to prevent all queries on the table from
  	 * proceeding until we commit and send out a shared-cache-inval notice
  	 * that will make them update their index lists.
+ 	 *
+ 	 * All predicate locks on the index are about to be made invalid. Promote
+ 	 * them to relation locks on the heap. For correctness this must be done
+ 	 * after the index was last seen with indisready = true and before it is
+ 	 * seen with indisvalid = false.
  	 */
  	heapId = IndexGetRelation(indexId, false);
  	if (concurrent)
***
*** 1349,1354  index_drop(Oid indexId, bool concurrent)
--- 1354,1361 
  		 */
  		indexRelation = heap_open(IndexRelationId, RowExclusiveLock);
  
+ 		TransferPredicateLocksToHeapRelation(userIndexRelation);
+ 
  		tuple = SearchSysCacheCopy1(INDEXRELID,
  	ObjectIdGetDatum(indexId));
  		if (!HeapTupleIsValid(tuple))
***
*** 1438,1449  index_drop(Oid indexId, bool concurrent)
  		userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
  		userIndexRelation = index_open(indexId, AccessExclusiveLock);
  	}
! 
! 	/*
! 	 * All predicate locks on the index are about to be made invalid. Promote
! 	 * them to relation locks on the heap.
! 	 */
! 	TransferPredicateLocksToHeapRelation(userIndexRelation);
  
  	/*
  	 * Schedule physical removal of the files
--- 1445,1452 
  		userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
  		userIndexRelation = index_open(indexId, AccessExclusiveLock);
  	}
! 	else
! 		TransferPredicateLocksToHeapRelation(userIndexRelation);
  
  	/*
  	 * Schedule physical removal of the files

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-10-17 Thread Kevin Grittner
Simon Riggs wrote:
> On 6 October 2012 00:56, Tom Lane  wrote:
 
>> 2. DROP INDEX CONCURRENTLY doesn't bother to do
>> TransferPredicateLocksToHeapRelation until long after it's
>> invalidated the index. Surely that's no good? Is it even possible
>> to do that correctly, when we don't have a lock that will prevent
>> new predicate locks from being taken out meanwhile?
> 
> No idea there. Input appreciated.

[Sorry for delayed response; fighting through a backlog.]
 
If the creation of a new tuple by insert or update would not perform
the related index tuple insertion, and the lock has not yet been
transfered to the heap relation, yes we have a problem.  Will take a
look at the code.

Creation of new predicate locks while in this state has no bearing on
the issue as long as locks are transferred to the heap relation after
the last scan using the index has completed.

-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-10-09 Thread Michael Paquier
On Tue, Oct 9, 2012 at 6:11 PM, Simon Riggs  wrote:

> On 6 October 2012 00:56, Tom Lane  wrote:
> > 1.  These operations think they can use ordinary heap_update operations
> > to change pg_index entries when they don't have exclusive lock on the
> > parent table.  The lack of ex-lock means that another backend could be
> > currently loading up its list of index OIDs for the table --- and since
> > it scans pg_index with SnapshotNow to do that, the heap_update could
> > result in the other backend failing to see this index *at all*.  That's
> > okay if it causes the other backend to not use the index for scanning...
> > but not okay if it causes the other backend to fail to make index
> > entries it is supposed to make.
> >
> > I think this could possibly be fixed by using nontransactional
> > update-in-place when we're trying to change indisvalid and/or
> > indisready, but I've not really thought through the details.
>
> Only solution there is to fix SnapshotNow, as we discussed last
> Christmas. I'll dig out my patch for that, which IIRC I was nervous of
> committing at last minute into 9.2.
>
Hi Simon,
Do you have an URL to this patch?
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-10-09 Thread Simon Riggs
On 6 October 2012 00:56, Tom Lane  wrote:
> 1.  These operations think they can use ordinary heap_update operations
> to change pg_index entries when they don't have exclusive lock on the
> parent table.  The lack of ex-lock means that another backend could be
> currently loading up its list of index OIDs for the table --- and since
> it scans pg_index with SnapshotNow to do that, the heap_update could
> result in the other backend failing to see this index *at all*.  That's
> okay if it causes the other backend to not use the index for scanning...
> but not okay if it causes the other backend to fail to make index
> entries it is supposed to make.
>
> I think this could possibly be fixed by using nontransactional
> update-in-place when we're trying to change indisvalid and/or
> indisready, but I've not really thought through the details.

Only solution there is to fix SnapshotNow, as we discussed last
Christmas. I'll dig out my patch for that, which IIRC I was nervous of
committing at last minute into 9.2.

> 2.  DROP INDEX CONCURRENTLY doesn't bother to do
> TransferPredicateLocksToHeapRelation until long after it's invalidated
> the index.  Surely that's no good?  Is it even possible to do that
> correctly, when we don't have a lock that will prevent new predicate
> locks from being taken out meanwhile?

No idea there. Input appreciated.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-10-07 Thread Greg Stark
On Sat, Oct 6, 2012 at 12:56 AM, Tom Lane  wrote:
> 1.  These operations think they can use ordinary heap_update operations
> to change pg_index entries when they don't have exclusive lock on the
> parent table.

I wonder if we need a manual that lists exhaustively what operations
can be taken with locks on various objects. Relying on understanding
every other possible operation in the system and knowing which
operations might or might not happen is clearly getting us into
trouble.

This would be a lot less worrying if we had a regression test suite
that could reliably test race conditions like this.

> The lack of ex-lock means that another backend could be
> currently loading up its list of index OIDs for the table --- and since
> it scans pg_index with SnapshotNow to do that, the heap_update could
> result in the other backend failing to see this index *at all*.  That's
> okay if it causes the other backend to not use the index for scanning...
> but not okay if it causes the other backend to fail to make index
> entries it is supposed to make.

I marked this email to read later but now I'm reading it and I realize
I'm not sure I can really help. Using a nontransactional update to
flag indisready makes sense to me since the whole point of the wait is
that we've waited long enough that anyone should see this record. I
guess that's what I had in mind was happening when I wrote the update.
But I'm not sure what other requirements nontransactional updates have
to work properly.


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-10-06 Thread Simon Riggs
On 6 October 2012 00:56, Tom Lane  wrote:
> 1.  These operations think they can use ordinary heap_update operations
> to change pg_index entries when they don't have exclusive lock on the
> parent table.  The lack of ex-lock means that another backend could be
> currently loading up its list of index OIDs for the table --- and since
> it scans pg_index with SnapshotNow to do that, the heap_update could
> result in the other backend failing to see this index *at all*.  That's
> okay if it causes the other backend to not use the index for scanning...
> but not okay if it causes the other backend to fail to make index
> entries it is supposed to make.
>
> I think this could possibly be fixed by using nontransactional
> update-in-place when we're trying to change indisvalid and/or
> indisready, but I've not really thought through the details.
>
> 2.  DROP INDEX CONCURRENTLY doesn't bother to do
> TransferPredicateLocksToHeapRelation until long after it's invalidated
> the index.  Surely that's no good?  Is it even possible to do that
> correctly, when we don't have a lock that will prevent new predicate
> locks from being taken out meanwhile?

I'm in the middle of reviewing other fixes there, so will comment
soon, just not right now.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY

2012-10-05 Thread Tom Lane
1.  These operations think they can use ordinary heap_update operations
to change pg_index entries when they don't have exclusive lock on the
parent table.  The lack of ex-lock means that another backend could be
currently loading up its list of index OIDs for the table --- and since
it scans pg_index with SnapshotNow to do that, the heap_update could
result in the other backend failing to see this index *at all*.  That's
okay if it causes the other backend to not use the index for scanning...
but not okay if it causes the other backend to fail to make index
entries it is supposed to make.

I think this could possibly be fixed by using nontransactional
update-in-place when we're trying to change indisvalid and/or
indisready, but I've not really thought through the details.

2.  DROP INDEX CONCURRENTLY doesn't bother to do
TransferPredicateLocksToHeapRelation until long after it's invalidated
the index.  Surely that's no good?  Is it even possible to do that
correctly, when we don't have a lock that will prevent new predicate
locks from being taken out meanwhile?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers