Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-03-28 Thread Robert Haas
On Fri, Feb 3, 2012 at 10:28 AM, Robert Haas  wrote:
> On Wed, Feb 1, 2012 at 2:39 PM, Simon Riggs  wrote:
>> On Wed, Feb 1, 2012 at 7:31 PM, Peter Eisentraut  wrote:
>>> On sön, 2012-01-29 at 22:01 +, Simon Riggs wrote:
>>>> Patch now locks index in AccessExclusiveLock in final stage of drop.
>>>
>>> Doesn't that defeat the point of doing the CONCURRENTLY business in the
>>> first place?
>>
>> That was my initial reaction.
>>
>> We lock the index in AccessExclusiveLock only once we are certain
>> nobody else is looking at it any more.
>>
>> So it's a Kansas City Shuffle, with safe locking in case of people
>> doing strange low-level things.
>
> Yeah, I think this is much safer, and in this version that doesn't
> seem to harm concurrency.
>
> Given our previous experiences in this area, I wouldn't like to bet my
> life savings on this having no remaining bugs - but if it does, I
> can't find them.
>
> I'll mark this "Ready for Committer".

I don't think this has been committed - does that mean you've decided
against doing so, or just haven't had the round tuits?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-02-12 Thread Jeff Janes
On Sun, Jan 8, 2012 at 8:19 AM, Simon Riggs  wrote:
> On Wed, Jan 4, 2012 at 11:14 AM, Simon Riggs  wrote:
>
>> Not having a freelist at all is probably a simpler way of avoiding the
>> lock contention, so I'll happily back that suggestion instead. Patch
>> attached, previous patch revoked.
>
> v2 attached with cleanup of some random stuff that crept onto patch.

Hi Simon,

Based on the way this patch leaves the old code behind (using #ifdef),
this looks more like a WIP patch which you want people to do
performance testing with, rather than a patch proposed for committing.
If that is the case, could you outline the type of performance testing
where you think it would make a difference (and whether it should be
done on top of the main patch from this thread, the concurrent index
drop one)?

Also, it would be much easier to do the performance testing if this
behavior was controlled by a temporarily added GUC, rather than an
#ifdef.  Do you think it is feasible to do that, or would the overhead
of a single "if (some_guc)" per StrategyGetBuffer and
StrategyFreeBuffer call distort things too much?

Cheers,

Jeff



Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-02-03 Thread Robert Haas
On Wed, Feb 1, 2012 at 2:39 PM, Simon Riggs  wrote:
> On Wed, Feb 1, 2012 at 7:31 PM, Peter Eisentraut  wrote:
>> On sön, 2012-01-29 at 22:01 +, Simon Riggs wrote:
>>> Patch now locks index in AccessExclusiveLock in final stage of drop.
>>
>> Doesn't that defeat the point of doing the CONCURRENTLY business in the
>> first place?
>
> That was my initial reaction.
>
> We lock the index in AccessExclusiveLock only once we are certain
> nobody else is looking at it any more.
>
> So it's a Kansas City Shuffle, with safe locking in case of people
> doing strange low-level things.

Yeah, I think this is much safer, and in this version that doesn't
seem to harm concurrency.

Given our previous experiences in this area, I wouldn't like to bet my
life savings on this having no remaining bugs - but if it does, I
can't find them.

I'll mark this "Ready for Committer".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-02-01 Thread Simon Riggs
On Wed, Feb 1, 2012 at 7:31 PM, Peter Eisentraut  wrote:
> On sön, 2012-01-29 at 22:01 +, Simon Riggs wrote:
>> Patch now locks index in AccessExclusiveLock in final stage of drop.
>
> Doesn't that defeat the point of doing the CONCURRENTLY business in the
> first place?

That was my initial reaction.

We lock the index in AccessExclusiveLock only once we are certain
nobody else is looking at it any more.

So it's a Kansas City Shuffle, with safe locking in case of people
doing strange low-level things.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-02-01 Thread Peter Eisentraut
On sön, 2012-01-29 at 22:01 +, Simon Riggs wrote:
> Patch now locks index in AccessExclusiveLock in final stage of drop.

Doesn't that defeat the point of doing the CONCURRENTLY business in the
first place?




Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-02-01 Thread Simon Riggs
On Wed, Feb 1, 2012 at 2:56 AM, Robert Haas  wrote:

> I improved the grammar issues in the attached version of the patch -
> the syntax is now simpler and more consistent, IF EXISTS now works,
> and RESTRICT is accepted (without changing the behavior) while CASCADE
> fails with a nicer error message.  I also fixed a bug in
> RangeVarCallbackForDropRelation.

Plus tests as well. Many thanks.

I fixed the main bug you observed, and your test case now works
perfectly. I used pgbench to continuously drop/create an index, so it
got a little more than manual testing.

v5 Attached.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/ref/drop_index.sgml b/doc/src/sgml/ref/drop_index.sgml
index 7177ef2..343f7ac 100644
--- a/doc/src/sgml/ref/drop_index.sgml
+++ b/doc/src/sgml/ref/drop_index.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-DROP INDEX [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ]
+DROP INDEX [ CONCURRENTLY ] [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ]
 
  
 
@@ -50,6 +50,29 @@ DROP INDEX [ IF EXISTS ] name [, ..

 

+CONCURRENTLY
+
+ 
+  When this option is used, PostgreSQL will drop the
+  index without taking any locks that prevent concurrent selects, inserts,
+  updates, or deletes on the table; whereas a standard index drop
+  waits for a lock that locks out everything on the table until it's done.
+  Concurrent drop index is a two stage process. First, we mark the index
+  both invalid and not ready then commit the change. Next we wait until
+  there are no users locking the table who can see the index.
+ 
+ 
+  There are several caveats to be aware of when using this option.
+  Only one index name can be specified if the CONCURRENTLY
+  parameter is specified.  Regular DROP INDEX command can be
+  performed within a transaction block, but
+  DROP INDEX CONCURRENTLY cannot.
+  The CASCADE option is not supported when dropping an index concurrently.
+ 
+
+   
+
+   
 name
 
  
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index db86262..58d8f5c 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -173,8 +173,8 @@ static void reportDependentObjects(const ObjectAddresses *targetObjects,
 	   const ObjectAddress *origObject);
 static void deleteOneObject(const ObjectAddress *object,
 			Relation depRel, int32 flags);
-static void doDeletion(const ObjectAddress *object);
-static void AcquireDeletionLock(const ObjectAddress *object);
+static void doDeletion(const ObjectAddress *object, int flags);
+static void AcquireDeletionLock(const ObjectAddress *object, int flags);
 static void ReleaseDeletionLock(const ObjectAddress *object);
 static bool find_expr_references_walker(Node *node,
 			find_expr_references_context *context);
@@ -232,7 +232,7 @@ performDeletion(const ObjectAddress *object,
 	 * Acquire deletion lock on the target object.	(Ideally the caller has
 	 * done this already, but many places are sloppy about it.)
 	 */
-	AcquireDeletionLock(object);
+	AcquireDeletionLock(object, 0);
 
 	/*
 	 * Construct a list of objects to delete (ie, the given object plus
@@ -316,7 +316,7 @@ performMultipleDeletions(const ObjectAddresses *objects,
 		 * Acquire deletion lock on each target object.  (Ideally the caller
 		 * has done this already, but many places are sloppy about it.)
 		 */
-		AcquireDeletionLock(thisobj);
+		AcquireDeletionLock(thisobj, flags);
 
 		findDependentObjects(thisobj,
 			 DEPFLAG_ORIGINAL,
@@ -350,7 +350,11 @@ performMultipleDeletions(const ObjectAddresses *objects,
 	/* And clean up */
 	free_object_addresses(targetObjects);
 
-	heap_close(depRel, RowExclusiveLock);
+	/*
+	 * We closed depRel earlier in deleteOneObject if doing a drop concurrently
+	 */
+	if ((flags & PERFORM_DELETION_CONCURRENTLY) != PERFORM_DELETION_CONCURRENTLY)
+		heap_close(depRel, RowExclusiveLock);
 }
 
 /*
@@ -380,7 +384,7 @@ deleteWhatDependsOn(const ObjectAddress *object,
 	 * Acquire deletion lock on the target object.	(Ideally the caller has
 	 * done this already, but many places are sloppy about it.)
 	 */
-	AcquireDeletionLock(object);
+	AcquireDeletionLock(object, 0);
 
 	/*
 	 * Construct a list of objects to delete (ie, the given object plus
@@ -611,7 +615,7 @@ findDependentObjects(const ObjectAddress *object,
  * deletion of the owning object.)
  */
 ReleaseDeletionLock(object);
-AcquireDeletionLock(&otherObject);
+AcquireDeletionLock(&otherObject, 0);
 
 /*
  * The owning object might have been deleted while we waited
@@ -706,7 +710,7 @@ findDependentObjects(const ObjectAddress *object,
 		/*
 		 * Must lock the dependent object before recursing to it.
 		 */
-		AcquireDeletionLock(&otherObject);
+		AcquireDeletionLock(&otherObject, 0);
 
 		/*
 		 * The dependent object mig

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-31 Thread Robert Haas
On Sun, Jan 29, 2012 at 5:01 PM, Simon Riggs  wrote:
> I can't see any way that situation can occur. The patch *explicitly*
> waits until all people that can see the index as usable have dropped
> their lock. So I don't think this is necessary. Having said that,
> since we are talking about the index and not the whole table, if I
> believe the above statement then I can't have any reasonable objection
> to doing as you suggest.
>
> Patch now locks index in AccessExclusiveLock in final stage of drop.
>
> v3 attached.
>
> If you have suggestions to improve grammar issues, they're most
> welcome. Otherwise this seems good to go.

I improved the grammar issues in the attached version of the patch -
the syntax is now simpler and more consistent, IF EXISTS now works,
and RESTRICT is accepted (without changing the behavior) while CASCADE
fails with a nicer error message.  I also fixed a bug in
RangeVarCallbackForDropRelation.

However, testing reveals that this doesn't really work.  I found that
by playing around with a couple of concurrent sessions, I could get a
SELECT statement to hang waiting on the AccessExclusiveLock in the
second phase, because we're really still opening the relation (even
though it's invalid) and thus still locking it, and thus still waiting
for the AccessExclusiveLock.  And I managed to get this:

rhaas=# select * from foo where a = 1;
ERROR:  could not open relation with OID 16388

So then I tried changing the lock level back to
ShareUpdateExclusiveLock.  It appears that that merely narrows the
window for errors of this type, rather than getting rid of them.  I
ran this in one window:

pgbench -c 30 -j 30 -f f -n -T 180

where the file f contained this:

select * from foo where a = 1

and then in the other window I repeatedly did this:

rhaas=# create index foo_a on foo (a);
CREATE INDEX
rhaas=# drop index concurrently foo_a;
DROP INDEX

and pgbench started issuing messages like this:

Client 2 aborted in state 0: ERROR:  could not open relation with OID 16394
Client 9 aborted in state 0: ERROR:  could not open relation with OID 16397
Client 18 aborted in state 0: ERROR:  could not open relation with OID 16397

My conclusion is that regardless of whether ShareUpdateExclusiveLock
or AccessExclusiveLock is used for the final phase of the drop, this
isn't safe.  I haven't tracked down exactly where the wheels are
coming off, but the problem does not occur when using the non-concurrent
form of DROP INDEX.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


drop_index_concurrently.v4.patch
Description: Binary data



Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-29 Thread Simon Riggs
On Thu, Jan 26, 2012 at 2:25 AM, Robert Haas  wrote:
> On Sat, Jan 21, 2012 at 10:11 AM, Simon Riggs  wrote:
>> Your caution is wise. All users of an index have already checked
>> whether the index is usable at plan time, so although there is much
>> code that assumes they can look at the index itself, that is not
>> executed until after the correct checks.
>>
>> I'll look at VACUUM and other utilities, so thanks for that.
>>
>> I don't see much point in having the higher-level lock, except perhaps
>> as a test that this code works.
>
> I thought of another way this can cause a problem: suppose that while
> we're dropping the relation with only ShareUpdateExclusiveLock, we get
> as far as calling DropRelFileNodeBuffers.  Just after we finish, some
> other process that holds AccessShareLock or RowShareLock or
> RowExclusiveLock reads and perhaps even dirties a page in the
> relation.

I can't see any way that situation can occur. The patch *explicitly*
waits until all people that can see the index as usable have dropped
their lock. So I don't think this is necessary. Having said that,
since we are talking about the index and not the whole table, if I
believe the above statement then I can't have any reasonable objection
to doing as you suggest.

Patch now locks index in AccessExclusiveLock in final stage of drop.

v3 attached.

If you have suggestions to improve grammar issues, they're most
welcome. Otherwise this seems good to go.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/ref/drop_index.sgml b/doc/src/sgml/ref/drop_index.sgml
index 7177ef2..aeb1531 100644
--- a/doc/src/sgml/ref/drop_index.sgml
+++ b/doc/src/sgml/ref/drop_index.sgml
@@ -21,7 +21,9 @@ PostgreSQL documentation
 
  
 
-DROP INDEX [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ]
+DROP INDEX
+	[ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ]
+	CONCURRENTLY name
 
  
 
@@ -50,6 +52,30 @@ DROP INDEX [ IF EXISTS ] name [, ..

 

+CONCURRENTLY
+
+ 
+  When this option is used, PostgreSQL will drop the
+  index without taking any locks that prevent concurrent selects, inserts,
+  updates, or deletes on the table; whereas a standard index drop
+  waits for a lock that locks out everything on the table until it's done.
+  Concurrent drop index is a two stage process. First, we mark the index
+  both invalid and not ready then commit the change. Next we wait until
+  there are no users locking the table who can see the index.
+ 
+ 
+  There are several caveats to be aware of when using this option.
+  Only one index name can be specified if the CONCURRENTLY
+  parameter is specified. Only one concurrent index drop can occur on a
+  table at a time and no modifications on the table are allowed meanwhile.
+  Regular DROP INDEX command can be performed within
+  a transaction block, but DROP INDEX CONCURRENTLY cannot.
+  There is no CASCADE option when dropping an index concurrently.
+ 
+
+   
+
+   
 name
 
  
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index db86262..58d8f5c 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -173,8 +173,8 @@ static void reportDependentObjects(const ObjectAddresses *targetObjects,
 	   const ObjectAddress *origObject);
 static void deleteOneObject(const ObjectAddress *object,
 			Relation depRel, int32 flags);
-static void doDeletion(const ObjectAddress *object);
-static void AcquireDeletionLock(const ObjectAddress *object);
+static void doDeletion(const ObjectAddress *object, int flags);
+static void AcquireDeletionLock(const ObjectAddress *object, int flags);
 static void ReleaseDeletionLock(const ObjectAddress *object);
 static bool find_expr_references_walker(Node *node,
 			find_expr_references_context *context);
@@ -232,7 +232,7 @@ performDeletion(const ObjectAddress *object,
 	 * Acquire deletion lock on the target object.	(Ideally the caller has
 	 * done this already, but many places are sloppy about it.)
 	 */
-	AcquireDeletionLock(object);
+	AcquireDeletionLock(object, 0);
 
 	/*
 	 * Construct a list of objects to delete (ie, the given object plus
@@ -316,7 +316,7 @@ performMultipleDeletions(const ObjectAddresses *objects,
 		 * Acquire deletion lock on each target object.  (Ideally the caller
 		 * has done this already, but many places are sloppy about it.)
 		 */
-		AcquireDeletionLock(thisobj);
+		AcquireDeletionLock(thisobj, flags);
 
 		findDependentObjects(thisobj,
 			 DEPFLAG_ORIGINAL,
@@ -350,7 +350,11 @@ performMultipleDeletions(const ObjectAddresses *objects,
 	/* And clean up */
 	free_object_addresses(targetObjects);
 
-	heap_close(depRel, RowExclusiveLock);
+	/*
+	 * We closed depRel earlier in deleteOneObject if doing a drop concurrently
+	 */
+	if ((flags & PERFORM_DELETION_CONCURRENTLY) != PERFORM_DE

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-25 Thread Robert Haas
On Sat, Jan 21, 2012 at 10:11 AM, Simon Riggs  wrote:
> Your caution is wise. All users of an index have already checked
> whether the index is usable at plan time, so although there is much
> code that assumes they can look at the index itself, that is not
> executed until after the correct checks.
>
> I'll look at VACUUM and other utilities, so thanks for that.
>
> I don't see much point in having the higher-level lock, except perhaps
> as a test that this code works.

I thought of another way this can cause a problem: suppose that while
we're dropping the relation with only ShareUpdateExclusiveLock, we get
as far as calling DropRelFileNodeBuffers.  Just after we finish, some
other process that holds AccessShareLock or RowShareLock or
RowExclusiveLock reads and perhaps even dirties a page in the
relation.  Now we've got pages in the buffer pool that might even be
dirty, but the backing file is truncated or perhaps removed
altogether.  If the page is dirty, I think the background writer will
eventually choke trying to write out the page.  If the page is not
dirty, I'm less certain whether anything is going to explode
violently, but it seems mighty sketchy to have pages in the buffer
pool with a buffer tag that don't exist any more.  As the comment
says:

 *  It is the responsibility of higher-level code to ensure that the
 *  deletion or truncation does not lose any data that could be needed
 *  later.  It is also the responsibility of higher-level code to ensure
 *  that no other process could be trying to load more pages of the
 *  relation into buffers.

...and of course, the intended mechanism is an AccessExclusiveLock.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-24 Thread Simon Riggs
On Mon, Jan 23, 2012 at 4:01 PM, Tom Lane  wrote:

> The real
> problem there is that BufFreelistLock is also used to protect the
> clock sweep pointer.

Agreed

> I think basically we gotta find a way to allow
> multiple backends to run clock sweeps concurrently.  Or else fix
> things so that the freelist never (well, hardly ever) runs dry.

Agreed.

The only question is what we do now. I'm happy with the promise of jam
tomorrow, but I'd like to see some minor improvements now, given that
when we say "we'll do that even better in the next release" it often
doesn't happen. That is where those patches come in.

I've posted an improved lock wait analysis patch and have scheduled
some runs on heavily used systems to see what this tells us. Results
from live machines would also be greatly appreciated, since test
systems are unlikely to be representative. Patch copied here again.

This is a key issue since RAM is cheap enough now that people are
awash in it, so large shared_buffers settings are very common.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-24 Thread Jeff Janes
On Mon, Jan 23, 2012 at 5:06 AM, Robert Haas  wrote:
> On Mon, Jan 23, 2012 at 12:12 AM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> On Sat, Jan 21, 2012 at 5:29 PM, Jim Nasby  wrote:
>>>> We should also look at having the freelist do something useful, instead of 
>>>> just dropping it completely. Unfortunately that's probably more work...
>>
>>> That's kinda my feeling as well.  The free list in its current form is
>>> pretty much useless, but I don't think we'll save much by getting rid
>>> of it, because that's just a single test.  The expensive part of what
>>> we do while holding BufFreelistLock is, I think, iterating through
>>> buffers taking and releasing a spinlock on each one (!).
>>
>> Yeah ... spinlocks that, by definition, will be uncontested.
>
> What makes you think that they are uncontested?

The work is automatically partitioned over 131,072 spinlocks at
shared_buffers=1GB, so the chances of contention seem pretty low.  Even
if a few very hot index root blocks are heavily contended, they would
only be encountered a few times out of the 131,072.  I guess we could
count the contended acquisitions, similar to your LWLOCK_STATS changes.

> Or for that matter,
> that even an uncontested spinlock operation is cheap enough to do
> while holding a badly contended LWLock?

Is the concern that getting an uncontested spinlock takes a lot of
instructions, or that the associated fences are expensive?

>
>> So I think
>> it would be advisable to prove rather than just assume that that's a
>> problem.
>
> It's pretty trivial to prove that there is a very serious problem with
> BufFreelistLock.  I'll admit I can't prove what the right fix is just
> yet, and certainly measurement is warranted.


From my own analysis and experiments, the mere act of getting the
BufFreelistLock when it is contended matters far more than anything
actually done while holding that lock.  When the clock sweep runs often
enough for the lock to be contended, it also runs often enough to keep
the average usagecount low, so the average number of buffers visited
during a sweep before finding a usable one was only around 2.0 in my
tests.  YMMV.

Can we get some people who run busy, high-CPU servers that churn the
cache, and who think they may be getting hit by this problem, to post
the results of this query:

select usagecount, count(*) from pg_buffercache group by usagecount;


Cheers,

Jeff



Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-24 Thread Jeff Janes
On Sat, Jan 21, 2012 at 2:29 PM, Jim Nasby  wrote:
> On Jan 20, 2012, at 11:54 AM, Heikki Linnakangas wrote:

>> So, you're proposing that we remove freelist altogether? Sounds reasonable, 
>> but that needs to be performance tested somehow. I'm not sure what exactly 
>> the test should look like, but it probably should involve an OLTP workload, 
>> and large tables being created, populated to fill the cache with pages from 
>> the table, and dropped, while the OLTP workload is running. I'd imagine that 
>> the worst case scenario looks something like that.
>
> We should also look at having the freelist do something useful, instead of 
> just dropping it completely. Unfortunately that's probably more work...

If the head and tail are both protected by BufFreelistLock, I'm pretty
sure this will make things worse, not better.

If we change to having head and tail protected by separate spinlocks,
then I don't see how you can remove the last buffer from the list, or
add a buffer to an empty list, without causing havoc.

Does anyone have ideas for implementing a cross-platform, lock-free,
FIFO linked list?  Without that, I don't see how we are going to get
anywhere on this approach.
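For what it's worth, here is a toy single-producer/single-consumer ring
written with C11 atomics (invented names, purely illustrative). It is
not the multi-producer/multi-consumer structure a shared freelist would
actually need, but it shows the kind of primitive people usually reach
for before attempting a true lock-free linked list:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define RING_SIZE 1024              /* must be a power of two */

typedef struct
{
	_Atomic uint32_t head;          /* next slot to write; producer only */
	_Atomic uint32_t tail;          /* next slot to read; consumer only */
	int		items[RING_SIZE];
} spsc_ring;

static bool
ring_push(spsc_ring *r, int v)
{
	uint32_t	head = atomic_load_explicit(&r->head, memory_order_relaxed);
	uint32_t	tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == RING_SIZE)
		return false;               /* full */
	r->items[head % RING_SIZE] = v;
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}

static bool
ring_pop(spsc_ring *r, int *v)
{
	uint32_t	tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	uint32_t	head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (head == tail)
		return false;               /* empty */
	*v = r->items[tail % RING_SIZE];
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}

int
main(void)
{
	static spsc_ring r;
	int		v;

	for (int i = 0; i < 5; i++)
		ring_push(&r, i);
	while (ring_pop(&r, &v))
		printf("%d\n", v);
	return 0;
}

With exactly one producer and one consumer this is safe without any
lock, but generalising it to many backends pushing and popping is
precisely the hard part.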

Cheers,

Jeff



Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-23 Thread Robert Haas
On Mon, Jan 23, 2012 at 11:01 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Mon, Jan 23, 2012 at 12:12 AM, Tom Lane  wrote:
>>>> The expensive part of what
>>>> we do while holding BufFreelistLock is, I think, iterating through
>>>> buffers taking and releasing a spinlock on each one (!).
>
>>> Yeah ... spinlocks that, by definition, will be uncontested.
>
>> What makes you think that they are uncontested?
>
> Ah, never mind.  I was thinking that we'd only be touching buffers that
> were *on* the freelist, but of course this is incorrect.  The real
> problem there is that BufFreelistLock is also used to protect the
> clock sweep pointer.  I think basically we gotta find a way to allow
> multiple backends to run clock sweeps concurrently.  Or else fix
> things so that the freelist never (well, hardly ever) runs dry.

I'd come to the same conclusion myself.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-23 Thread Tom Lane
Robert Haas  writes:
> On Mon, Jan 23, 2012 at 12:12 AM, Tom Lane  wrote:
>>> The expensive part of what
>>> we do while holding BufFreelistLock is, I think, iterating through
>>> buffers taking and releasing a spinlock on each one (!).

>> Yeah ... spinlocks that, by definition, will be uncontested.

> What makes you think that they are uncontested?

Ah, never mind.  I was thinking that we'd only be touching buffers that
were *on* the freelist, but of course this is incorrect.  The real
problem there is that BufFreelistLock is also used to protect the
clock sweep pointer.  I think basically we gotta find a way to allow
multiple backends to run clock sweeps concurrently.  Or else fix
things so that the freelist never (well, hardly ever) runs dry.
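To sketch the sort of thing I mean -- purely illustrative, with invented
names and none of the real pin/buffer-header machinery -- the sweep hand
could become an atomic counter advanced with fetch-and-add, so several
backends can run the sweep at once without taking a common lock (builds
with cc -std=c11 -pthread):

#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define NBUFFERS 64
#define NTHREADS 4
#define VICTIMS_PER_THREAD 8

static _Atomic unsigned int sweep_hand;         /* shared clock hand */
static _Atomic int usage_count[NBUFFERS];

/* Each caller advances the hand atomically, so concurrent sweeps never
 * serialize on a lock; a real version would still have to pin the chosen
 * buffer before reusing it. */
static int
clock_sweep_victim(void)
{
	for (;;)
	{
		unsigned int pos = atomic_fetch_add(&sweep_hand, 1) % NBUFFERS;
		int		count = atomic_load(&usage_count[pos]);

		if (count > 0)
			/* decrement, but only if nobody else changed it meanwhile */
			atomic_compare_exchange_strong(&usage_count[pos], &count, count - 1);
		else
			return (int) pos;
	}
}

static void *
worker(void *arg)
{
	int		id = (int) (intptr_t) arg;

	for (int i = 0; i < VICTIMS_PER_THREAD; i++)
		printf("worker %d got victim %d\n", id, clock_sweep_victim());
	return NULL;
}

int
main(void)
{
	pthread_t	threads[NTHREADS];

	for (int i = 0; i < NBUFFERS; i++)
		atomic_store(&usage_count[i], i % 3);   /* some buffers look hot */

	for (int i = 0; i < NTHREADS; i++)
		pthread_create(&threads[i], NULL, worker, (void *) (intptr_t) i);
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(threads[i], NULL);
	return 0;
}

That leaves all the interesting problems (pinning, the freelist itself,
backend-local batching) untouched, of course; it only shows that the
hand itself does not have to live behind BufFreelistLock.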

regards, tom lane



Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-23 Thread Simon Riggs
On Mon, Jan 23, 2012 at 1:06 PM, Robert Haas  wrote:

> It's pretty trivial to prove that there is a very serious problem with
> BufFreelistLock.  I'll admit I can't prove what the right fix is just
> yet, and certainly measurement is warranted.

I agree there is a problem with BufFreelistLock (so please share your
results with Heikki, who seems not to agree).

As a result, I've published patches which reduce contention on that
lock in various ways, all of which seem valid to me.

There are lots of things we could have done for 9.2 but didn't; still,
we do have some finished work on the table right now, so it would be
useful to push that through immediately, or at least to defer
discussion of other ideas until we get back around to this.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-23 Thread Robert Haas
On Mon, Jan 23, 2012 at 12:12 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Sat, Jan 21, 2012 at 5:29 PM, Jim Nasby  wrote:
>>> We should also look at having the freelist do something useful, instead of 
>>> just dropping it completely. Unfortunately that's probably more work...
>
>> That's kinda my feeling as well.  The free list in its current form is
>> pretty much useless, but I don't think we'll save much by getting rid
>> of it, because that's just a single test.  The expensive part of what
>> we do while holding BufFreelistLock is, I think, iterating through
>> buffers taking and releasing a spinlock on each one (!).
>
> Yeah ... spinlocks that, by definition, will be uncontested.

What makes you think that they are uncontested?  Or for that matter,
that even an uncontested spinlock operation is cheap enough to do
while holding a badly contended LWLock?

> So I think
> it would be advisable to prove rather than just assume that that's a
> problem.

It's pretty trivial to prove that there is a very serious problem with
BufFreelistLock.  I'll admit I can't prove what the right fix is just
yet, and certainly measurement is warranted.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-22 Thread Tom Lane
Robert Haas  writes:
> On Sat, Jan 21, 2012 at 5:29 PM, Jim Nasby  wrote:
>> We should also look at having the freelist do something useful, instead of 
>> just dropping it completely. Unfortunately that's probably more work...

> That's kinda my feeling as well.  The free list in its current form is
> pretty much useless, but I don't think we'll save much by getting rid
> of it, because that's just a single test.  The expensive part of what
> we do while holding BufFreelistLock is, I think, iterating through
> buffers taking and releasing a spinlock on each one (!).

Yeah ... spinlocks that, by definition, will be uncontested.  So I think
it would be advisable to prove rather than just assume that that's a
problem.

regards, tom lane



Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-22 Thread Robert Haas
On Sat, Jan 21, 2012 at 5:29 PM, Jim Nasby  wrote:
> We should also look at having the freelist do something useful, instead of 
> just dropping it completely. Unfortunately that's probably more work...

That's kinda my feeling as well.  The free list in its current form is
pretty much useless, but I don't think we'll save much by getting rid
of it, because that's just a single test.  The expensive part of what
we do while holding BufFreelistLock is, I think, iterating through
buffers taking and releasing a spinlock on each one (!).  So I guess
my vote would be to leave it alone pending further study, and maybe
remove it later if we can't find a way to rejigger it to be more
useful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-22 Thread Jim Nasby
On Jan 20, 2012, at 11:54 AM, Heikki Linnakangas wrote:
> On 04.01.2012 13:14, Simon Riggs wrote:
>> On Tue, Jan 3, 2012 at 11:28 PM, Tom Lane  wrote:
>>> Jim Nasby  writes:
>>>> On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
>>>>> This could well be related to the fact that DropRelFileNodeBuffers()
>>>>> does a scan of shared_buffers, which is an O(N) approach no matter the
>>>>> size of the index.
>>> 
>>>> Couldn't we just leave the buffers alone? Once an index is dropped and 
>>>> that's pushed out through the catalog then nothing should be trying to 
>>>> access them and they'll eventually just get aged out.
>>> 
>>> No, we can't, because if they're still dirty then the bgwriter would
>>> first try to write them to the no-longer-existing storage file.  It's
>>> important that we kill the buffers immediately during relation drop.
>>> 
>>> I'm still thinking that it might be sufficient to mark the buffers
>>> invalid and let the clock sweep find them, thereby eliminating the need
>>> for a freelist.
>> 
>> My patch puts things on the freelist only when it is free to do so.
>> Not having a freelist at all is probably a simpler way of avoiding the
>> lock contention, so I'll happily back that suggestion instead. Patch
>> attached, previous patch revoked.
> 
> So, you're proposing that we remove freelist altogether? Sounds reasonable, 
> but that needs to be performance tested somehow. I'm not sure what exactly 
> the test should look like, but it probably should involve an OLTP workload, 
> and large tables being created, populated to fill the cache with pages from 
> the table, and dropped, while the OLTP workload is running. I'd imagine that 
> the worst case scenario looks something like that.

We should also look at having the freelist do something useful, instead of just 
dropping it completely. Unfortunately that's probably more work...
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-21 Thread Simon Riggs
On Sat, Jan 21, 2012 at 1:53 AM, Robert Haas  wrote:
> On Thu, Jan 19, 2012 at 10:02 AM, Simon Riggs  wrote:
>> On Wed, Jan 18, 2012 at 11:12 PM, Robert Haas  wrote:
>>> On Wed, Jan 18, 2012 at 5:49 PM, Simon Riggs  wrote:
>>>> Can I just check with you that the only review comment is a one line
>>>> change? Seems better to make any additional review comments in one go.
>>>
>>> No, I haven't done a full review yet - I just noticed that right at
>>> the outset and wanted to be sure that got addressed.  I can look it
>>> over more carefully, and test it.
>>
>> Corrected your point, and another found during retest.
>>
>> Works as advertised, docs corrected.
>
> This comment needs updating:
>
>    /*
>     * To drop an index safely, we must grab exclusive lock on its parent
>     * table.  Exclusive lock on the index alone is insufficient because
>     * another backend might be about to execute a query on the parent table.
>     * If it relies on a previously cached list of index OIDs, then it could
>     * attempt to access the just-dropped index.  We must therefore take a
>     * table lock strong enough to prevent all queries on the table from
>     * proceeding until we commit and send out a shared-cache-inval notice
>     * that will make them update their index lists.
>     */

OK, I did reword some but clearly not enough.

> Speaking of that comment, why does a concurrent index drop need any
> lock at all on the parent table?  If, as the current comment text
> says, the point of locking the parent table is to make sure that no
> new backends can create relcache entries until our transaction
> commits, then it's not needed here, or at least not for that purpose.
> We're going to out-wait anyone who has the index open *after* we've
> committed our changes to the pg_index entry, so there shouldn't be a
> race condition there.  There may very well be a reason why we need a
> lock on the parent, or there may not, but that's not it.

I feel happier locking the parent as well. I'd rather avoid strange
weirdness later, as has happened before in this type of discussion.

> On the flip side, it strikes me that there's considerable danger that
> ShareUpdateExclusiveLock on the index might not be strong enough.  In
> particular, it would fall down if there's anyone who takes an
> AccessShareLock, RowShareLock, or RowExclusiveLock and then proceeds
> to access the index data despite its being marked !indisready and
> !indisvalid.  There are certainly instances of this in contrib, such
> as in pgstatindex(), which I'm pretty sure will abort with a nastygram
> if somebody truncates away the file under it.  That might not be
> enough reason to reject the patch, but I do think it would be the only
> place in the system where we allow something like that to happen.  I'm
> not sure whether there are any similar risks in core - I am a bit
> worried about get_actual_variable_range() and gincostestimate(), but I
> can't tell for sure whether there is any actual problem there, or
> elsewhere.  I wonder if we should only do the *first* phase of the
> drop with ShareUpdateExclusiveLock, and then after outwaiting
> everyone, take an AccessExclusiveLock on the index (but not the table)
> to do the rest of the work.  If that doesn't allow the drop to proceed
> concurrently, then we go find the places that block against it and
> teach them not to lock/use/examine indexes that are !indisready.  That
> way, the worst case is that D-I-C is less concurrent than we'd like,
> rather than making random other things fall over - specifically,
> anything that counts on our traditional guarantee that AccessShareLock
> is enough to keep the file from disappearing.

Your caution is wise. All users of an index have already checked
whether the index is usable at plan time, so although there is much
code that assumes they can look at the index itself, that is not
executed until after the correct checks.

I'll look at VACUUM and other utilities, so thanks for that.

I don't see much point in having the higher-level lock, except perhaps
as a test that this code works.


> In the department of minor nitpicks, the syntax synopsis you've added
> looks wrong:
>
> DROP INDEX
>       [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ]
>       CONCURRENTLY name
>
> That implies that you may write if exists, followed by 1 or more
> names, followed by cascade or restrict.  After that you must write
> CONCURRENTLY, followed by exactly one name.  I think what you want is:
>
> DROP INDEX [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ]
> DROP INDEX CONCURRENTLY name
>
> ...but I also wonder if we shouldn't support DROP INDEX [ IF EXISTS ]
> CONCURRENTLY.  It could also be very useful to be able to concurrently
> drop multiple indexes at once, since that would require waiting out
> all concurrent transactions only once instead of once per drop.
> Either way, I think we should have only one syntax, and then reject
> combinations we can't handle in t

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-20 Thread Robert Haas
On Thu, Jan 19, 2012 at 10:02 AM, Simon Riggs  wrote:
> On Wed, Jan 18, 2012 at 11:12 PM, Robert Haas  wrote:
>> On Wed, Jan 18, 2012 at 5:49 PM, Simon Riggs  wrote:
>>> Can I just check with you that the only review comment is a one line
>>> change? Seems better to make any additional review comments in one go.
>>
>> No, I haven't done a full review yet - I just noticed that right at
>> the outset and wanted to be sure that got addressed.  I can look it
>> over more carefully, and test it.
>
> Corrected your point, and another found during retest.
>
> Works as advertised, docs corrected.

This comment needs updating:

/*
 * To drop an index safely, we must grab exclusive lock on its parent
 * table.  Exclusive lock on the index alone is insufficient because
 * another backend might be about to execute a query on the parent table.
 * If it relies on a previously cached list of index OIDs, then it could
 * attempt to access the just-dropped index.  We must therefore take a
 * table lock strong enough to prevent all queries on the table from
 * proceeding until we commit and send out a shared-cache-inval notice
 * that will make them update their index lists.
 */

Speaking of that comment, why does a concurrent index drop need any
lock at all on the parent table?  If, as the current comment text
says, the point of locking the parent table is to make sure that no
new backends can create relcache entries until our transaction
commits, then it's not needed here, or at least not for that purpose.
We're going to out-wait anyone who has the index open *after* we've
committed our changes to the pg_index entry, so there shouldn't be a
race condition there.  There may very well be a reason why we need a
lock on the parent, or there may not, but that's not it.

On the flip side, it strikes me that there's considerable danger that
ShareUpdateExclusiveLock on the index might not be strong enough.  In
particular, it would fall down if there's anyone who takes an
AccessShareLock, RowShareLock, or RowExclusiveLock and then proceeds
to access the index data despite its being marked !indisready and
!indisvalid.  There are certainly instances of this in contrib, such
as in pgstatindex(), which I'm pretty sure will abort with a nastygram
if somebody truncates away the file under it.  That might not be
enough reason to reject the patch, but I do think it would be the only
place in the system where we allow something like that to happen.  I'm
not sure whether there are any similar risks in core - I am a bit
worried about get_actual_variable_range() and gincostestimate(), but I
can't tell for sure whether there is any actual problem there, or
elsewhere.  I wonder if we should only do the *first* phase of the
drop with ShareUpdateExclusiveLock, and then after outwaiting
everyone, take an AccessExclusiveLock on the index (but not the table)
to do the rest of the work.  If that doesn't allow the drop to proceed
concurrently, then we go find the places that block against it and
teach them not to lock/use/examine indexes that are !indisready.  That
way, the worst case is that D-I-C is less concurrent than we'd like,
rather than making random other things fall over - specifically,
anything that counts on our traditional guarantee that AccessShareLock
is enough to keep the file from disappearing.

In the department of minor nitpicks, the syntax synopsis you've added
looks wrong:

DROP INDEX
    [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ]
    CONCURRENTLY name

That implies that you may write if exists, followed by 1 or more
names, followed by cascade or restrict.  After that you must write
CONCURRENTLY, followed by exactly one name.  I think what you want is:

DROP INDEX [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ]
DROP INDEX CONCURRENTLY name

...but I also wonder if we shouldn't support DROP INDEX [ IF EXISTS ]
CONCURRENTLY.  It could also be very useful to be able to concurrently
drop multiple indexes at once, since that would require waiting out
all concurrent transactions only once instead of once per drop.
Either way, I think we should have only one syntax, and then reject
combinations we can't handle in the code.  So I think we should have
the parser accept:

DROP INDEX [ IF EXISTS ] [ CONCURRENTLY ] name [, ...] [ CASCADE | RESTRICT ]

And then, if you try to use cascade, or try to drop multiple indexes
at once, we should throw an error saying that those options aren't
supported in combination with CONCURRENTLY, rather than rejecting them
as a syntax error at parse time.  That gives a better error message
and will make it easier for anyone who wants to extend this in the
future - plus it will allow things like DROP INDEX CONCURRENTLY bob
RESTRICT, which ought to be legal even if CASCADE isn't supported.
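To illustrate, here is a toy stand-in for that post-parse check
(invented names, plain printf rather than the real ereport machinery):
the grammar accepts the general form, and the utility code rejects the
unsupported combinations with a useful message.

#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for the post-parse validation described above; not the
 * actual PostgreSQL utility-command code. */
typedef struct
{
	bool	concurrent;             /* CONCURRENTLY given? */
	bool	cascade;                /* CASCADE given? */
	int		nindexes;               /* number of index names listed */
} DropIndexRequest;

static bool
check_drop_index(const DropIndexRequest *req, const char **msg)
{
	if (req->concurrent && req->cascade)
	{
		*msg = "DROP INDEX CONCURRENTLY does not support CASCADE";
		return false;
	}
	if (req->concurrent && req->nindexes != 1)
	{
		*msg = "DROP INDEX CONCURRENTLY must name exactly one index";
		return false;
	}
	return true;                    /* RESTRICT, IF EXISTS, etc. are fine */
}

int
main(void)
{
	const char *msg;
	DropIndexRequest bad = {true, true, 1};     /* CONCURRENTLY ... CASCADE */

	if (!check_drop_index(&bad, &msg))
		printf("ERROR: %s\n", msg);
	return 0;
}

That way DROP INDEX CONCURRENTLY bob RESTRICT parses and runs, while the
CASCADE and multiple-name cases fail with a real error message instead
of a syntax error.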

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-20 Thread Heikki Linnakangas

On 04.01.2012 13:14, Simon Riggs wrote:

> On Tue, Jan 3, 2012 at 11:28 PM, Tom Lane  wrote:
>> Jim Nasby  writes:
>>> On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
>>>> This could well be related to the fact that DropRelFileNodeBuffers()
>>>> does a scan of shared_buffers, which is an O(N) approach no matter the
>>>> size of the index.
>>
>>> Couldn't we just leave the buffers alone? Once an index is dropped and that's
>>> pushed out through the catalog then nothing should be trying to access them and
>>> they'll eventually just get aged out.
>>
>> No, we can't, because if they're still dirty then the bgwriter would
>> first try to write them to the no-longer-existing storage file.  It's
>> important that we kill the buffers immediately during relation drop.
>>
>> I'm still thinking that it might be sufficient to mark the buffers
>> invalid and let the clock sweep find them, thereby eliminating the need
>> for a freelist.
>
> My patch puts things on the freelist only when it is free to do so.
> Not having a freelist at all is probably a simpler way of avoiding the
> lock contention, so I'll happily back that suggestion instead. Patch
> attached, previous patch revoked.


So, you're proposing that we remove freelist altogether? Sounds 
reasonable, but that needs to be performance tested somehow. I'm not 
sure what exactly the test should look like, but it probably should 
involve an OLTP workload, and large tables being created, populated to 
fill the cache with pages from the table, and dropped, while the OLTP 
workload is running. I'd imagine that the worst case scenario looks 
something like that.


Removing the freelist should speed up the drop, but the question is how 
costly are the extra clock sweeps that the backends have to do.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-19 Thread Simon Riggs
On Wed, Jan 18, 2012 at 11:12 PM, Robert Haas  wrote:
> On Wed, Jan 18, 2012 at 5:49 PM, Simon Riggs  wrote:
>> Can I just check with you that the only review comment is a one line
>> change? Seems better to make any additional review comments in one go.
>
> No, I haven't done a full review yet - I just noticed that right at
> the outset and wanted to be sure that got addressed.  I can look it
> over more carefully, and test it.

Corrected your point, and another found during retest.

Works as advertised, docs corrected.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/ref/drop_index.sgml b/doc/src/sgml/ref/drop_index.sgml
index 7177ef2..aeb1531 100644
--- a/doc/src/sgml/ref/drop_index.sgml
+++ b/doc/src/sgml/ref/drop_index.sgml
@@ -21,7 +21,9 @@ PostgreSQL documentation
 
  
 
-DROP INDEX [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ]
+DROP INDEX
+	[ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ]
+	CONCURRENTLY name
 
  
 
@@ -50,6 +52,30 @@ DROP INDEX [ IF EXISTS ] name [, ..

 

+CONCURRENTLY
+
+ 
+  When this option is used, PostgreSQL will drop the
+  index without taking any locks that prevent concurrent selects, inserts,
+  updates, or deletes on the table; whereas a standard index drop
+  waits for a lock that locks out everything on the table until it's done.
+  Concurrent drop index is a two stage process. First, we mark the index
+  both invalid and not ready then commit the change. Next we wait until
+  there are no users locking the table who can see the index.
+ 
+ 
+  There are several caveats to be aware of when using this option.
+  Only one index name can be specified if the CONCURRENTLY
+  parameter is specified. Only one concurrent index drop can occur on a
+  table at a time and no modifications on the table are allowed meanwhile.
+  Regular DROP INDEX command can be performed within
+  a transaction block, but DROP INDEX CONCURRENTLY cannot.
+  There is no CASCADE option when dropping an index concurrently.
+ 
+
+   
+
+   
 name
 
  
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 0b3d489..6884fdb 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -171,9 +171,10 @@ static void reportDependentObjects(const ObjectAddresses *targetObjects,
 	   DropBehavior behavior,
 	   int msglevel,
 	   const ObjectAddress *origObject);
-static void deleteOneObject(const ObjectAddress *object, Relation depRel);
-static void doDeletion(const ObjectAddress *object);
-static void AcquireDeletionLock(const ObjectAddress *object);
+static void deleteOneObject(const ObjectAddress *object, Relation depRel,
+			int option_flags);
+static void doDeletion(const ObjectAddress *object, int option_flags);
+static void AcquireDeletionLock(const ObjectAddress *object, int option_flags);
 static void ReleaseDeletionLock(const ObjectAddress *object);
 static bool find_expr_references_walker(Node *node,
 			find_expr_references_context *context);
@@ -224,7 +225,7 @@ performDeletion(const ObjectAddress *object,
 	 * Acquire deletion lock on the target object.	(Ideally the caller has
 	 * done this already, but many places are sloppy about it.)
 	 */
-	AcquireDeletionLock(object);
+	AcquireDeletionLock(object, 0);
 
 	/*
 	 * Construct a list of objects to delete (ie, the given object plus
@@ -254,7 +255,7 @@ performDeletion(const ObjectAddress *object,
 	{
 		ObjectAddress *thisobj = targetObjects->refs + i;
 
-		deleteOneObject(thisobj, depRel);
+		deleteOneObject(thisobj, depRel, 0);
 	}
 
 	/* And clean up */
@@ -276,6 +277,13 @@ void
 performMultipleDeletions(const ObjectAddresses *objects,
 		 DropBehavior behavior)
 {
+	performMultipleDeletionsWithOptions(objects, behavior, 0);
+}
+
+void
+performMultipleDeletionsWithOptions(const ObjectAddresses *objects,
+		 DropBehavior behavior, int option_flags)
+{
 	Relation	depRel;
 	ObjectAddresses *targetObjects;
 	int			i;
@@ -308,7 +316,7 @@ performMultipleDeletions(const ObjectAddresses *objects,
 		 * Acquire deletion lock on each target object.  (Ideally the caller
 		 * has done this already, but many places are sloppy about it.)
 		 */
-		AcquireDeletionLock(thisobj);
+		AcquireDeletionLock(thisobj, option_flags);
 
 		findDependentObjects(thisobj,
 			 DEPFLAG_ORIGINAL,
@@ -336,13 +344,17 @@ performMultipleDeletions(const ObjectAddresses *objects,
 	{
 		ObjectAddress *thisobj = targetObjects->refs + i;
 
-		deleteOneObject(thisobj, depRel);
+		deleteOneObject(thisobj, depRel, option_flags);
 	}
 
 	/* And clean up */
 	free_object_addresses(targetObjects);
 
-	heap_close(depRel, RowExclusiveLock);
+	/*
+	 * We closed depRel earlier in deleteOneObject if doing a drop concurrently
+	 */
+	if ((option_flags & OPT_CONCURRENTLY) != OPT_CONCURRENTLY)
+		hea

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-18 Thread Robert Haas
On Wed, Jan 18, 2012 at 5:49 PM, Simon Riggs  wrote:
> Can I just check with you that the only review comment is a one line
> change? Seems better to make any additional review comments in one go.

No, I haven't done a full review yet - I just noticed that right at
the outset and wanted to be sure that got addressed.  I can look it
over more carefully, and test it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-18 Thread Simon Riggs
On Wed, Jan 18, 2012 at 1:28 AM, Robert Haas  wrote:
> On Tue, Jan 17, 2012 at 12:06 PM, Peter Eisentraut  wrote:
>> On mån, 2012-01-16 at 14:46 -0500, Robert Haas wrote:
>>> On Mon, Jan 16, 2012 at 2:06 PM, Peter Eisentraut  wrote:
>>> > On mån, 2012-01-16 at 11:17 -0500, Robert Haas wrote:
>>> >> I don't see how setting indisvalid to false helps with this, because
>>> >> IIUC when a session sees indisvalid = false, it is supposed to avoid
>>> >> using the index for queries but still make new index entries when a
>>> >> write operation happens - but to drop an index, I think you'd need to
>>> >> get into a state where no one was using the index for anything at all.
>>> >
>>> > ISTM that one would need to set indisready to false instead.
>>>
>>> Maybe we should set both to false?
>>
>> Well, ready = false and valid = true doesn't make any sense.  There is
>> only just-created -> ready -> valid.  We might as well convert that to a
>> single "char" column, as you had indicated in your earlier email.  But
>> that's independent of the proposed patch.
>
> Sure, but the point is that I think if you want everyone to stop
> touching the index, you ought to mark it both not-valid and not-ready,
> which the current patch doesn't do.

Thanks for the review and the important observation. I agree that I've
changed the wrong column: indisready must be set to false. I also agree
that setting both to false makes the most sense.

Can I just check with you that the only review comment is a one line
change? Seems better to make any additional review comments in one go.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-17 Thread Robert Haas
On Tue, Jan 17, 2012 at 12:06 PM, Peter Eisentraut  wrote:
> On mån, 2012-01-16 at 14:46 -0500, Robert Haas wrote:
>> On Mon, Jan 16, 2012 at 2:06 PM, Peter Eisentraut  wrote:
>> > On mån, 2012-01-16 at 11:17 -0500, Robert Haas wrote:
>> >> I don't see how setting indisvalid to false helps with this, because
>> >> IIUC when a session sees indisvalid = false, it is supposed to avoid
>> >> using the index for queries but still make new index entries when a
>> >> write operation happens - but to drop an index, I think you'd need to
>> >> get into a state where no one was using the index for anything at all.
>> >
>> > ISTM that one would need to set indisready to false instead.
>>
>> Maybe we should set both to false?
>
> Well, ready = false and valid = true doesn't make any sense.  There is
> only just-created -> ready -> valid.  We might as well convert that to a
> single "char" column, as you had indicated in your earlier email.  But
> that's independent of the proposed patch.

Sure, but the point is that I think if you want everyone to stop
touching the index, you ought to mark it both not-valid and not-ready,
which the current patch doesn't do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-17 Thread Peter Eisentraut
On mån, 2012-01-16 at 14:46 -0500, Robert Haas wrote:
> On Mon, Jan 16, 2012 at 2:06 PM, Peter Eisentraut  wrote:
> > On mån, 2012-01-16 at 11:17 -0500, Robert Haas wrote:
> >> I don't see how setting indisvalid to false helps with this, because
> >> IIUC when a session sees indisvalid = false, it is supposed to avoid
> >> using the index for queries but still make new index entries when a
> >> write operation happens - but to drop an index, I think you'd need to
> >> get into a state where no one was using the index for anything at all.
> >
> > ISTM that one would need to set indisready to false instead.
> 
> Maybe we should set both to false?

Well, ready = false and valid = true doesn't make any sense.  There is
only just-created -> ready -> valid.  We might as well convert that to a
single "char" column, as you had indicated in your earlier email.  But
that's independent of the proposed patch.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-16 Thread Robert Haas
On Mon, Jan 16, 2012 at 2:06 PM, Peter Eisentraut  wrote:
> On mån, 2012-01-16 at 11:17 -0500, Robert Haas wrote:
>> I don't see how setting indisvalid to false helps with this, because
>> IIUC when a session sees indisvalid = false, it is supposed to avoid
>> using the index for queries but still make new index entries when a
>> write operation happens - but to drop an index, I think you'd need to
>> get into a state where no one was using the index for anything at all.
>
> ISTM that one would need to set indisready to false instead.

Maybe we should set both to false?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-16 Thread Peter Eisentraut
On mån, 2012-01-16 at 11:17 -0500, Robert Haas wrote:
> I don't see how setting indisvalid to false helps with this, because
> IIUC when a session sees indisvalid = false, it is supposed to avoid
> using the index for queries but still make new index entries when a
> write operation happens - but to drop an index, I think you'd need to
> get into a state where no one was using the index for anything at all.

ISTM that one would need to set indisready to false instead.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-16 Thread Robert Haas
On Sat, Dec 31, 2011 at 8:26 AM, Simon Riggs  wrote:
> On Fri, Dec 30, 2011 at 10:20 PM, Peter Eisentraut  wrote:
>> On ons, 2011-08-24 at 11:24 -0700, Daniel Farina wrote:
>>> I was poking around at tablecmds and index.c and wonder if a similar
>>> two-pass approach as used by CREATE INDEX CONCURRENTLY can be used to
>>> create a DROP INDEX CONCURRENTLY, and if there would be any interest
>>> in accepting such a patch.
>>
>> Hmm, it seems I just independently came up with this same concept.  My
>> problem is that if a CREATE INDEX CONCURRENTLY fails, you need an
>> exclusive lock on the table just to clean that up.  If the table is
>> under constant load, you can't easily do that.  So a two-pass DROP INDEX
>> CONCURRENTLY might have been helpful for me.
>
> Here's a patch for this. Please review.

I don't see how setting indisvalid to false helps with this, because
IIUC when a session sees indisvalid = false, it is supposed to avoid
using the index for queries but still make new index entries when a
write operation happens - but to drop an index, I think you'd need to
get into a state where no one was using the index for anything at all.

Maybe we need to change indisvalid to a "char" and make it three
valued: c = being created currently, v = valid, d = being dropped
concurrently, or something like that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-08 Thread Simon Riggs
On Wed, Jan 4, 2012 at 11:14 AM, Simon Riggs  wrote:

> Not having a freelist at all is probably a simpler way of avoiding the
> lock contention, so I'll happily back that suggestion instead. Patch
> attached, previous patch revoked.

v2 attached with cleanup of some random stuff that crept onto patch.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
index 94cefba..9332a74 100644
--- a/src/backend/storage/buffer/buf_init.c
+++ b/src/backend/storage/buffer/buf_init.c
@@ -115,7 +115,7 @@ InitBufferPool(void)
 			 * Initially link all the buffers together as unused. Subsequent
 			 * management of this list is done by freelist.c.
 			 */
-			buf->freeNext = i + 1;
+			StrategyInitFreelistBuffer(buf);
 
 			buf->io_in_progress_lock = LWLockAssign();
 			buf->content_lock = LWLockAssign();
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 3e62448..6b49cae 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -27,6 +27,7 @@ typedef struct
 	/* Clock sweep hand: index of next buffer to consider grabbing */
 	int			nextVictimBuffer;
 
+#ifdef USE_BUFMGR_FREELIST
 	int			firstFreeBuffer;	/* Head of list of unused buffers */
 	int			lastFreeBuffer; /* Tail of list of unused buffers */
 
@@ -34,7 +35,7 @@ typedef struct
 	 * NOTE: lastFreeBuffer is undefined when firstFreeBuffer is -1 (that is,
 	 * when the list is empty)
 	 */
-
+#endif
 	/*
 	 * Statistics.	These counters should be wide enough that they can't
 	 * overflow during a single bgwriter cycle.
@@ -134,6 +135,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 	 */
 	StrategyControl->numBufferAllocs++;
 
+#ifdef USE_BUFMGR_FREELIST
 	/*
 	 * Try to get a buffer from the freelist.  Note that the freeNext fields
 	 * are considered to be protected by the BufFreelistLock not the
@@ -165,8 +167,9 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 		}
 		UnlockBufHdr(buf);
 	}
+#endif
 
-	/* Nothing on the freelist, so run the "clock sweep" algorithm */
+	/* Run the "clock sweep" algorithm */
 	trycounter = NBuffers;
 	for (;;)
 	{
@@ -223,6 +229,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 void
 StrategyFreeBuffer(volatile BufferDesc *buf)
 {
+#ifdef USE_BUFMGR_FREELIST
 	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
 
 	/*
@@ -238,6 +245,24 @@ StrategyFreeBuffer(volatile BufferDesc *buf)
 	}
 
 	LWLockRelease(BufFreelistLock);
+#endif
+}
+
+/*
+ * StrategyInitFreelist: put a buffer on the freelist during InitBufferPool
+ */
+void
+StrategyInitFreelistBuffer(volatile BufferDesc *buf)
+{
+#ifdef USE_BUFMGR_FREELIST
+	/*
+	 * Initially link all the buffers together as unused. Subsequent
+	 * management of this list is done by freelist.c.
+	 */
+	buf->freeNext = (int) (buf - BufferDescriptors) + 1;
+#else
+	buf->freeNext = FREENEXT_NOT_IN_LIST;
+#endif
 }
 
 /*
@@ -331,12 +356,14 @@ StrategyInitialize(bool init)
 		 */
 		Assert(init);
 
+#ifdef USE_BUFMGR_FREELIST
 		/*
 		 * Grab the whole linked list of free buffers for our strategy. We
 		 * assume it was previously set up by InitBufferPool().
 		 */
 		StrategyControl->firstFreeBuffer = 0;
 		StrategyControl->lastFreeBuffer = NBuffers - 1;
+#endif
 
 		/* Initialize the clock sweep pointer */
 		StrategyControl->nextVictimBuffer = 0;
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index e43719e..6a03d5d 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -190,6 +190,8 @@ extern bool StrategyRejectBuffer(BufferAccessStrategy strategy,
 extern int	StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc);
 extern Size StrategyShmemSize(void);
 extern void StrategyInitialize(bool init);
+extern void StrategyInitFreelistBuffer(volatile BufferDesc *buf);
+
 
 /* buf_table.c */
 extern Size BufTableShmemSize(int size);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-04 Thread Simon Riggs
On Tue, Jan 3, 2012 at 11:28 PM, Tom Lane  wrote:
> Jim Nasby  writes:
>> On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
>>> This could well be related to the fact that DropRelFileNodeBuffers()
>>> does a scan of shared_buffers, which is an O(N) approach no matter the
>>> size of the index.
>
>> Couldn't we just leave the buffers alone? Once an index is dropped and 
>> that's pushed out through the catalog then nothing should be trying to 
>> access them and they'll eventually just get aged out.
>
> No, we can't, because if they're still dirty then the bgwriter would
> first try to write them to the no-longer-existing storage file.  It's
> important that we kill the buffers immediately during relation drop.
>
> I'm still thinking that it might be sufficient to mark the buffers
> invalid and let the clock sweep find them, thereby eliminating the need
> for a freelist.

My patch puts things on the freelist only when it is free to do so.
Not having a freelist at all is probably a simpler way of avoiding the
lock contention, so I'll happily back that suggestion instead. Patch
attached, previous patch revoked.

> Simon is after a different solution involving getting
> rid of the clock sweep...

err, No, he isn't. Not sure where that came from since I'm advocating
only minor changes there to curb worst case behaviour.

But lets discuss that on the main freelist thread.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
index 94cefba..9332a74 100644
--- a/src/backend/storage/buffer/buf_init.c
+++ b/src/backend/storage/buffer/buf_init.c
@@ -115,7 +115,7 @@ InitBufferPool(void)
 			 * Initially link all the buffers together as unused. Subsequent
 			 * management of this list is done by freelist.c.
 			 */
-			buf->freeNext = i + 1;
+			StrategyInitFreelistBuffer(buf);
 
 			buf->io_in_progress_lock = LWLockAssign();
 			buf->content_lock = LWLockAssign();
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 3e62448..6b49cae 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -27,6 +27,7 @@ typedef struct
 	/* Clock sweep hand: index of next buffer to consider grabbing */
 	int			nextVictimBuffer;
 
+#ifdef USE_BUFMGR_FREELIST
 	int			firstFreeBuffer;	/* Head of list of unused buffers */
 	int			lastFreeBuffer; /* Tail of list of unused buffers */
 
@@ -34,7 +35,7 @@ typedef struct
 	 * NOTE: lastFreeBuffer is undefined when firstFreeBuffer is -1 (that is,
 	 * when the list is empty)
 	 */
-
+#endif
 	/*
 	 * Statistics.	These counters should be wide enough that they can't
 	 * overflow during a single bgwriter cycle.
@@ -134,6 +135,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 	 */
 	StrategyControl->numBufferAllocs++;
 
+#ifdef USE_BUFMGR_FREELIST
 	/*
 	 * Try to get a buffer from the freelist.  Note that the freeNext fields
 	 * are considered to be protected by the BufFreelistLock not the
@@ -165,8 +167,9 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 		}
 		UnlockBufHdr(buf);
 	}
+#endif
 
-	/* Nothing on the freelist, so run the "clock sweep" algorithm */
+	/* Run the "clock sweep" algorithm */
 	trycounter = NBuffers;
 	for (;;)
 	{
@@ -182,20 +185,25 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 		 * If the buffer is pinned or has a nonzero usage_count, we cannot use
 		 * it; decrement the usage_count (unless pinned) and keep scanning.
 		 */
-		LockBufHdr(buf);
 		if (buf->refcount == 0)
 		{
-			if (buf->usage_count > 0)
+			if (buf->usage_count > StrategyControl->completePasses)
 			{
 buf->usage_count--;
 trycounter = NBuffers;
 			}
 			else
 			{
-/* Found a usable buffer */
-if (strategy != NULL)
-	AddBufferToRing(strategy, buf);
-return buf;
+LockBufHdr(buf);
+if (buf->refcount == 0)
+{
+	UnlockBufHdr(buf);
+	/* Found a usable buffer */
+	if (strategy != NULL)
+		AddBufferToRing(strategy, buf);
+	return buf;
+}
+UnlockBufHdr(buf);
 			}
 		}
 		else if (--trycounter == 0)
@@ -207,10 +215,8 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 			 * probably better to fail than to risk getting stuck in an
 			 * infinite loop.
 			 */
-			UnlockBufHdr(buf);
 			elog(ERROR, "no unpinned buffers available");
 		}
-		UnlockBufHdr(buf);
 	}
 
 	/* not reached */
@@ -223,6 +229,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 void
 StrategyFreeBuffer(volatile BufferDesc *buf)
 {
+#ifdef USE_BUFMGR_FREELIST
 	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
 
 	/*
@@ -238,6 +245,24 @@ StrategyFreeBuffer(volatile BufferDesc *buf)
 	}
 
 	LWLockRelease(BufFreelistLock);
+#endif
+}
+
+/*
+ * StrategyInitFreelist: put a buffer on the freelist during InitBufferPool
+ */
+void
+StrategyInit

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-03 Thread Jim Nasby
On Jan 3, 2012, at 7:34 PM, Robert Haas wrote:
> On Tue, Jan 3, 2012 at 7:11 PM, Tom Lane  wrote:
>> Jim Nasby  writes:
>>> Yeah, but the problem we run into is that with every backend trying to run 
>>> the clock on its own we end up with high contention again... it's just in 
>>> a different place than when we had a true LRU. The clock sweep might be 
>>> cheaper than the linked list was, but it's still awfully expensive. I 
>>> believe our best bet is to have a free list that is actually useful in 
>>> normal operations, and then optimize the cost of pulling buffers out of 
>>> that list as much as possible (and let the bgwriter deal with keeping 
>>> enough pages in that list to satisfy demand).
>> 
>> Well, maybe, but I think the historical evidence suggests that that
>> approach will be a loser, simply because no matter how cheap, the
>> freelist will remain a centralized and heavily contended data structure.
>> IMO we need to be looking for a mechanism that has no single point of
>> contention, and modifying the clock sweep rules looks like the best way
>> to get there.
>> 
>> Still, he who wants to do the work can try whatever approach he feels
>> like.
> 
> It might be possible to partition the free list.  So imagine, for
> example, 8 free lists.  The background writer runs the clock sweep and
> finds some buffers that are about ready to be reallocated and
> distributes one-eighth of them to each free list.  Then, when a
> backend wants to allocate a buffer, it picks a free list somehow
> (round robin?) and pulls a buffer off it.  If the buffer turns out to
> have been used since it was added to the free list, we give up on it
> and try again.  This hopefully shouldn't happen too often, though, as
> long as we only add enough buffers to the free list to satisfy the
> requests that we expect to get over the next
> small-fraction-of-a-second.
> 
> Of course you have to think about what happens if you find that your
> chosen free list is empty.  In that case you probably want to try a
> different one, and if in the worst case where they're all empty, run
> the clock sweep in the foreground.  You probably also want to kick the
> background writer in the pants if even a single one is empty (or
> nearly empty?) and tell it to hurry up and add some more buffers to
> the freelist.

If it comes down to it, we can look at partitioning the free list. But here's 
the thing: this is the strategy that FreeBSD (and I think now Linux as well) 
use to service memory requests, be they for free memory or for reading data 
from disk. If it's good enough for an OS to use, I would expect we could make 
it work as well.

I would expect that pulling a page off the free list would be an extremely fast 
operation... lock the read lock on the list (we should probably have separate 
locks for putting stuff on the list vs taking it off), pin the buffer (which 
shouldn't be contentious), update the read pointer and drop the read lock. Even 
in C code that's probably less than 100 machine instructions, and no looping. 
Compared to the cost of running a clock sweep it should be significantly 
faster. So while there will be contention on the free list read lock, none of 
that contention should last very long.

Do we have any other places where we take extremely short locks like this and 
still run into problems?
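
To make the "extremely short lock" claim concrete, here is a standalone C
model of such a free list: a bounded ring of buffer ids with one lock for
the consumer side and one for the producer side. It is not PostgreSQL
code, just a sketch of why the pop path can be a handful of instructions:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NFREE 1024

/* A bounded ring of free buffer ids.  Consumers (backends popping) and
 * the producer (bgwriter refilling) take different locks, so a pop only
 * contends with other pops, and only for a few instructions. */
static int	ring[NFREE];
static atomic_int head;			/* next slot to pop */
static atomic_int tail;			/* next slot to fill */
static pthread_mutex_t pop_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t push_lock = PTHREAD_MUTEX_INITIALIZER;

/* Backend side: return a free buffer id, or -1 meaning "list empty, fall
 * back to the clock sweep". */
static int
freelist_pop(void)
{
	int			result = -1;

	pthread_mutex_lock(&pop_lock);
	int			h = atomic_load(&head);

	if (h != atomic_load(&tail))	/* non-empty */
	{
		result = ring[h % NFREE];
		atomic_store(&head, h + 1);
	}
	pthread_mutex_unlock(&pop_lock);
	return result;
}

/* bgwriter side: add a buffer id found by the clock sweep; if the ring is
 * full the id is simply dropped, which is harmless (the sweep will find
 * the buffer again). */
static void
freelist_push(int buf)
{
	pthread_mutex_lock(&push_lock);
	int			t = atomic_load(&tail);

	if (t - atomic_load(&head) < NFREE) /* not full */
	{
		ring[t % NFREE] = buf;
		atomic_store(&tail, t + 1);
	}
	pthread_mutex_unlock(&push_lock);
}

int
main(void)
{
	for (int i = 0; i < 8; i++)
		freelist_push(i);

	int			a = freelist_pop();
	int			b = freelist_pop();

	printf("popped %d then %d\n", a, b);
	return 0;
}

A pop that finds the ring empty returns -1, which is the point where a
real backend would fall back to the clock sweep and nudge the bgwriter to
refill the list.
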
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-03 Thread Robert Haas
On Tue, Jan 3, 2012 at 7:11 PM, Tom Lane  wrote:
> Jim Nasby  writes:
>> Yeah, but the problem we run into is that with every backend trying to run 
>> the clock on its own we end up with high contention again... it's just in a 
>> different place than when we had a true LRU. The clock sweep might be 
>> cheaper than the linked list was, but it's still awfully expensive. I 
>> believe our best bet is to have a free list that is actually useful in 
>> normal operations, and then optimize the cost of pulling buffers out of that 
>> list as much as possible (and let the bgwriter deal with keeping enough 
>> pages in that list to satisfy demand).
>
> Well, maybe, but I think the historical evidence suggests that that
> approach will be a loser, simply because no matter how cheap, the
> freelist will remain a centralized and heavily contended data structure.
> IMO we need to be looking for a mechanism that has no single point of
> contention, and modifying the clock sweep rules looks like the best way
> to get there.
>
> Still, he who wants to do the work can try whatever approach he feels
> like.

It might be possible to partition the free list.  So imagine, for
example, 8 free lists.  The background writer runs the clock sweep and
finds some buffers that are about ready to be reallocated and
distributes one-eighth of them to each free list.  Then, when a
backend wants to allocate a buffer, it picks a free list somehow
(round robin?) and pulls a buffer off it.  If the buffer turns out to
have been used since it was added to the free list, we give up on it
and try again.  This hopefully shouldn't happen too often, though, as
long as we only add enough buffers to the free list to satisfy the
requests that we expect to get over the next
small-fraction-of-a-second.

Of course you have to think about what happens if you find that your
chosen free list is empty.  In that case you probably want to try a
different one, and if in the worst case where they're all empty, run
the clock sweep in the foreground.  You probably also want to kick the
background writer in the pants if even a single one is empty (or
nearly empty?) and tell it to hurry up and add some more buffers to
the freelist.
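
To pin down the shape of that idea, here is a small standalone C sketch of
eight independently locked free lists with a round-robin pick and a
fallback when the chosen list turns out to be empty (invented names, not a
patch against PostgreSQL):

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NUM_FREELISTS	8
#define PER_LIST_CAP	128

/* One small, independently locked free list per partition. */
typedef struct FreeList
{
	pthread_mutex_t lock;
	int			ids[PER_LIST_CAP];
	int			count;
} FreeList;

static FreeList lists[NUM_FREELISTS];
static atomic_uint next_pick;	/* round-robin cursor for backends */

/* bgwriter side: hand a reclaimable buffer id to partition p. */
static void
freelist_add(int p, int buf)
{
	FreeList   *fl = &lists[p % NUM_FREELISTS];

	pthread_mutex_lock(&fl->lock);
	if (fl->count < PER_LIST_CAP)
		fl->ids[fl->count++] = buf;
	pthread_mutex_unlock(&fl->lock);
}

/* Backend side: try my round-robin partition first, then the others.
 * -1 means every list was empty, so the caller must run the clock sweep
 * itself (and would also poke the bgwriter to refill the lists). */
static int
freelist_get(void)
{
	unsigned	start = atomic_fetch_add(&next_pick, 1);

	for (int i = 0; i < NUM_FREELISTS; i++)
	{
		FreeList   *fl = &lists[(start + i) % NUM_FREELISTS];
		int			buf = -1;

		pthread_mutex_lock(&fl->lock);
		if (fl->count > 0)
			buf = fl->ids[--fl->count];
		pthread_mutex_unlock(&fl->lock);

		if (buf >= 0)
			return buf;			/* caller still re-checks pin/usage */
	}
	return -1;
}

int
main(void)
{
	for (int i = 0; i < NUM_FREELISTS; i++)
	{
		pthread_mutex_init(&lists[i].lock, NULL);
		freelist_add(i, 1000 + i);
	}
	printf("got buffer %d\n", freelist_get());
	printf("got buffer %d\n", freelist_get());
	return 0;
}

The round-robin cursor spreads backends across partitions, so in the
common case two allocating backends touch different locks; only when the
lists run dry does everyone converge on the same fallback path.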

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-03 Thread Tom Lane
Jim Nasby  writes:
> Yeah, but the problem we run into is that with every backend trying to run 
> the clock on its own we end up with high contention again... it's just in a 
> different place than when we had a true LRU. The clock sweep might be cheaper 
> than the linked list was, but it's still awfully expensive. I believe our 
> best bet is to have a free list that is actually useful in normal operations, 
> and then optimize the cost of pulling buffers out of that list as much as 
> possible (and let the bgwriter deal with keeping enough pages in that list to 
> satisfy demand).

Well, maybe, but I think the historical evidence suggests that that
approach will be a loser, simply because no matter how cheap, the
freelist will remain a centralized and heavily contended data structure.
IMO we need to be looking for a mechanism that has no single point of
contention, and modifying the clock sweep rules looks like the best way
to get there.

Still, he who wants to do the work can try whatever approach he feels
like.
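
For readers who have not looked at freelist.c recently, this is roughly
the clock sweep technique being discussed, reduced to a standalone sketch
(simplified: no buffer header spinlocks and no BufFreelistLock, and the
names are not the real ones):

#include <stdatomic.h>
#include <stdio.h>

#define NBUFFERS 16

/* Simplified buffer descriptors: a pin count and a usage count. */
typedef struct Buf
{
	atomic_int	refcount;		/* pins: > 0 means in active use */
	atomic_int	usage_count;	/* bumped on access, decayed by the sweep */
} Buf;

static Buf	buffers[NBUFFERS];
static atomic_int next_victim;	/* the shared clock hand */

/*
 * Clock sweep: advance the hand, skip pinned buffers, give recently used
 * buffers another lap by decrementing usage_count, and take the first
 * buffer that is unpinned with usage_count == 0.  Returns a buffer index,
 * or -1 if everything stayed pinned for a full pass.
 */
static int
clock_sweep(void)
{
	int			tries = NBUFFERS;

	for (;;)
	{
		int			i = atomic_fetch_add(&next_victim, 1) % NBUFFERS;
		Buf		   *b = &buffers[i];

		if (atomic_load(&b->refcount) == 0)
		{
			if (atomic_load(&b->usage_count) > 0)
			{
				atomic_fetch_sub(&b->usage_count, 1);	/* spare it this lap */
				tries = NBUFFERS;
			}
			else
				return i;		/* victim found */
		}
		else if (--tries == 0)
			return -1;			/* all buffers pinned */
	}
}

int
main(void)
{
	atomic_store(&buffers[0].usage_count, 3);	/* recently used */
	atomic_store(&buffers[1].refcount, 1);		/* currently pinned */
	printf("victim: buffer %d\n", clock_sweep());
	return 0;
}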

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-03 Thread Jim Nasby
On Jan 3, 2012, at 5:28 PM, Tom Lane wrote:
> Jim Nasby  writes:
>> On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
>>> This could well be related to the fact that DropRelFileNodeBuffers()
>>> does a scan of shared_buffers, which is an O(N) approach no matter the
>>> size of the index.
> 
>> Couldn't we just leave the buffers alone? Once an index is dropped and 
>> that's pushed out through the catalog then nothing should be trying to 
>> access them and they'll eventually just get aged out.
> 
> No, we can't, because if they're still dirty then the bgwriter would
> first try to write them to the no-longer-existing storage file.  It's
> important that we kill the buffers immediately during relation drop.
> 
> I'm still thinking that it might be sufficient to mark the buffers
> invalid and let the clock sweep find them, thereby eliminating the need
> for a freelist.  Simon is after a different solution involving getting
> rid of the clock sweep, but he has failed to explain how that's not
> going to end up being the same type of contention-prone coding that we
> got rid of by adopting the clock sweep, some years ago.  Yeah, the sweep
> takes a lot of spinlocks, but that only matters if there is contention
> for them, and the sweep approach avoids the need for a centralized data
> structure.

Yeah, but the problem we run into is that with every backend trying to run the 
clock on its own we end up with high contention again... it's just in a 
different place than when we had a true LRU. The clock sweep might be cheaper 
than the linked list was, but it's still awfully expensive. I believe our best 
bet is to have a free list that is actually useful in normal operations, and 
then optimize the cost of pulling buffers out of that list as much as possible 
(and let the bgwriter deal with keeping enough pages in that list to satisfy 
demand).

Heh, it occurs to me that the SQL analogy for how things work right now is that 
backends currently have to run a SeqScan (or 5) to find a free page... what we 
need to do is CREATE INDEX free ON buffers(buffer_id) WHERE count = 0;.

> (BTW, do we have a separate clock sweep hand for each backend?  If not,
> there might be some low hanging fruit there.)

No... having multiple clock hands is an interesting idea, but I'm worried that 
it could potentially get us into trouble if scores of backends were suddenly 
decrementing usage counts all over the place. For example, what if 5 backends 
all had their hands in basically the same place, all pointing at a very heavily 
used buffer. All 5 backends go for free space, they each grab the spinlock on 
that buffer in succession and suddenly this highly used buffer that started 
with a count of 5 has now been freed. We could potentially use more than one 
hand, but I think the relation between the number of hands and the maximum 
usage count has to be tightly controlled.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-03 Thread Tom Lane
Jim Nasby  writes:
> On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
>> This could well be related to the fact that DropRelFileNodeBuffers()
>> does a scan of shared_buffers, which is an O(N) approach no matter the
>> size of the index.

> Couldn't we just leave the buffers alone? Once an index is dropped and that's 
> pushed out through the catalog then nothing should be trying to access them 
> and they'll eventually just get aged out.

No, we can't, because if they're still dirty then the bgwriter would
first try to write them to the no-longer-existing storage file.  It's
important that we kill the buffers immediately during relation drop.

I'm still thinking that it might be sufficient to mark the buffers
invalid and let the clock sweep find them, thereby eliminating the need
for a freelist.  Simon is after a different solution involving getting
rid of the clock sweep, but he has failed to explain how that's not
going to end up being the same type of contention-prone coding that we
got rid of by adopting the clock sweep, some years ago.  Yeah, the sweep
takes a lot of spinlocks, but that only matters if there is contention
for them, and the sweep approach avoids the need for a centralized data
structure.

(BTW, do we have a separate clock sweep hand for each backend?  If not,
there might be some low hanging fruit there.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-03 Thread Jim Nasby
On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
> This could well be related to the fact that DropRelFileNodeBuffers()
> does a scan of shared_buffers, which is an O(N) approach no matter the
> size of the index.
> 
> On top of that, taking what Robert Haas mentioned on another thread,
> InvalidateBuffer currently calls StrategyFreeBuffer(), which waits for
> an ExclusiveLock on the BufFreelistLock. On a busy system this will be
> heavily contended, so adding blocks to the freelist only if the lock
> is free seems warranted.

Couldn't we just leave the buffers alone? Once an index is dropped and that's 
pushed out through the catalog then nothing should be trying to access them and 
they'll eventually just get aged out.

In fact, IIRC the function that scans for buffers actually checks to see if a 
rel still exists before it returns the buffer...
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-03 Thread Simon Riggs
On Fri, Sep 9, 2011 at 11:02 PM, Daniel Farina  wrote:
> On Wed, Aug 24, 2011 at 1:04 PM, Daniel Farina  wrote:
>> On Wed, Aug 24, 2011 at 12:38 PM, Tom Lane  wrote:
>>> Assuming the issue really is the physical unlinks (which I agree I'd
>>> like to see some evidence for), I wonder whether the problem could be
>>> addressed by moving smgrDoPendingDeletes() to after locks are released,
>>> instead of before, in CommitTransaction/AbortTransaction.  There does
>>> not seem to be any strong reason why we have to do that before lock
>>> release, since incoming potential users of a table should not be trying
>>> to access the old physical storage after that anyway.
>>
>> Alright, since this concern about confirming the expensive part of
>> index dropping has come up a few times but otherwise the waters are
>> warm, I'll go ahead and do some work to pin things down a bit before
>> we continue working on those assumptions.
>>
>
> This suspicion seems to be proven correct; there came an opportunity
> where we were removing some indexes on a live system and I took the
> opportunity to carefully control and time the process.  There's not
> much relationship between size of the index and the delay, but the
> pauses are still very real. On the other hand, the first time this was
> noticed there was significantly higher load.
>
> I'd still like to do something to solve this problem, though: even if
> the time-consuming part of the process is not file unlinking, it's
> clearly something after the AccessExclusiveLock is acquired based on
> our other measurements.

This could well be related to the fact that DropRelFileNodeBuffers()
does a scan of shared_buffers, which is an O(N) approach no matter the
size of the index.

On top of that, taking what Robert Haas mentioned on another thread,
InvalidateBuffer currently calls StrategyFreeBuffer(), which waits for
an ExclusiveLock on the BufFreelistLock. On a busy system this will be
heavily contended, so adding blocks to the freelist only if the lock
is free seems warranted.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 3e62448..af7215f 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -218,12 +218,21 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 }
 
 /*
- * StrategyFreeBuffer: put a buffer on the freelist
+ * StrategyFreeBuffer: put a buffer on the freelist, unless we're busy
  */
 void
 StrategyFreeBuffer(volatile BufferDesc *buf)
 {
-	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
+	/*
+	 * The buffer is already invalidated and is now an allocation target.
+	 * Adding buffers back onto the freelist is an optimisation only,
+	 * so we can decide to skip this step if the lock is busy.
+	 * This improves the speed of dropping indexes and tables on a busy system.
+	 * If the system is busy the newly invalidated buffers will be reallocated
+	 * within one clock sweep.
+	 */
+	if (!LWLockConditionalAcquire(BufFreelistLock, LW_EXCLUSIVE))
+		return;
 
 	/*
 	 * It is possible that we are told to put something in the freelist that

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2011-12-31 Thread Simon Riggs
On Fri, Dec 30, 2011 at 10:20 PM, Peter Eisentraut  wrote:
> On ons, 2011-08-24 at 11:24 -0700, Daniel Farina wrote:
>> I was poking around at tablecmds and index.c and wonder if a similar
>> two-pass approach as used by CREATE INDEX CONCURRENTLY can be used to
>> create a DROP INDEX CONCURRENTLY, and if there would be any interest
>> in accepting such a patch.
>
> Hmm, it seems I just independently came up with this same concept.  My
> problem is that if a CREATE INDEX CONCURRENTLY fails, you need an
> exclusive lock on the table just to clean that up.  If the table is
> under constant load, you can't easily do that.  So a two-pass DROP INDEX
> CONCURRENTLY might have been helpful for me.

Here's a patch for this. Please review.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/ref/drop_index.sgml b/doc/src/sgml/ref/drop_index.sgml
index 7177ef2..a5abb5b 100644
--- a/doc/src/sgml/ref/drop_index.sgml
+++ b/doc/src/sgml/ref/drop_index.sgml
@@ -21,7 +21,9 @@ PostgreSQL documentation
 
  
 
-DROP INDEX [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ]
+DROP INDEX
+	[ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ]
+	CONCURRENTLY name
 
  
 
@@ -50,6 +52,30 @@ DROP INDEX [ IF EXISTS ] name [, ..

 

+CONCURRENTLY
+
+ 
+  When this option is used, PostgreSQL will drop the
+  index without taking any locks that prevent concurrent selects, inserts,
+  updates, or deletes on the table; whereas a standard index drop
+  waits for a lock that locks out everything on the table until it's done.
+  Concurrent drop index is a two stage process. First, we mark the index
+  invalid and then commit the change. Next we wait until there are no
+  users locking the table who can see the invalid index.
+ 
+ 
+  There are several caveats to be aware of when using this option.
+  Only one index name can be specified if the CONCURRENTLY
+  parameter is specified. Only one concurrent index drop can occur on a
+  table at a time and no modifications on the table are allowed meanwhile.
+  Regular DROP INDEX command can be performed within
+  a transaction block, but DROP INDEX CONCURRENTLY cannot.
+  There is no CASCADE option when dropping an index concurrently.
+ 
+
+   
+
+   
 name
 
  
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 223c097..656af21 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -171,8 +171,9 @@ static void reportDependentObjects(const ObjectAddresses *targetObjects,
 	   DropBehavior behavior,
 	   int msglevel,
 	   const ObjectAddress *origObject);
-static void deleteOneObject(const ObjectAddress *object, Relation depRel);
-static void doDeletion(const ObjectAddress *object);
+static void deleteOneObject(const ObjectAddress *object, Relation depRel,
+			int option_flags);
+static void doDeletion(const ObjectAddress *object, int option_flags);
 static void AcquireDeletionLock(const ObjectAddress *object);
 static void ReleaseDeletionLock(const ObjectAddress *object);
 static bool find_expr_references_walker(Node *node,
@@ -254,7 +255,7 @@ performDeletion(const ObjectAddress *object,
 	{
 		ObjectAddress *thisobj = targetObjects->refs + i;
 
-		deleteOneObject(thisobj, depRel);
+		deleteOneObject(thisobj, depRel, 0);
 	}
 
 	/* And clean up */
@@ -276,6 +277,13 @@ void
 performMultipleDeletions(const ObjectAddresses *objects,
 		 DropBehavior behavior)
 {
+	performMultipleDeletionsWithOptions(objects, behavior, 0);
+}
+
+void
+performMultipleDeletionsWithOptions(const ObjectAddresses *objects,
+		 DropBehavior behavior, int option_flags)
+{
 	Relation	depRel;
 	ObjectAddresses *targetObjects;
 	int			i;
@@ -336,13 +344,17 @@ performMultipleDeletions(const ObjectAddresses *objects,
 	{
 		ObjectAddress *thisobj = targetObjects->refs + i;
 
-		deleteOneObject(thisobj, depRel);
+		deleteOneObject(thisobj, depRel, option_flags);
 	}
 
 	/* And clean up */
 	free_object_addresses(targetObjects);
 
-	heap_close(depRel, RowExclusiveLock);
+	/*
+	 * We closed depRel earlier if doing a drop concurrently
+	 */
+	if ((option_flags & OPT_CONCURRENTLY) != OPT_CONCURRENTLY)
+		heap_close(depRel, RowExclusiveLock);
 }
 
 /*
@@ -407,7 +419,7 @@ deleteWhatDependsOn(const ObjectAddress *object,
 		if (thisextra->flags & DEPFLAG_ORIGINAL)
 			continue;
 
-		deleteOneObject(thisobj, depRel);
+		deleteOneObject(thisobj, depRel, 0);
 	}
 
 	/* And clean up */
@@ -950,7 +962,7 @@ reportDependentObjects(const ObjectAddresses *targetObjects,
  * depRel is the already-open pg_depend relation.
  */
 static void
-deleteOneObject(const ObjectAddress *object, Relation depRel)
+deleteOneObject(const ObjectAddress *object, Relation depRel, int option_flags)
 {
 	ScanKeyData key[3];
 	int			nkeys;
@@ -1001,9 +1013,16 @@ deleteOneObject(const 

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2011-12-30 Thread Peter Eisentraut
On ons, 2011-08-24 at 11:24 -0700, Daniel Farina wrote:
> At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
> recently when frobbing around some indexes I realized that there is no
> equivalent for DROP INDEX, and this is a similar but lesser problem
> (as CREATE INDEX takes much longer), as DROP INDEX takes an ACCESS
> EXCLUSIVE lock on the parent table while doing the work to unlink
> files, which nominally one would think to be trivial, but I assure you
> it is not at times for even indexes that are a handful of gigabytes
> (let's say ~=< a dozen).  By non-trivial, I mean it can take 30+
> seconds, but less than a couple of minutes.  The storage layer
> (starting from the higher levels of abstraction) are XFS, a somewhat
> trivial lvm setup, mdraid (8-ways), Amazon EBS (NBD?).
> 
> I was poking around at tablecmds and index.c and wonder if a similar
> two-pass approach as used by CREATE INDEX CONCURRENTLY can be used to
> create a DROP INDEX CONCURRENTLY, and if there would be any interest
> in accepting such a patch.

Hmm, it seems I just independently came up with this same concept.  My
problem is that if a CREATE INDEX CONCURRENTLY fails, you need an
exclusive lock on the table just to clean that up.  If the table is
under constant load, you can't easily do that.  So a two-pass DROP INDEX
CONCURRENTLY might have been helpful for me.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2011-09-09 Thread Daniel Farina
On Wed, Aug 24, 2011 at 1:04 PM, Daniel Farina  wrote:
> On Wed, Aug 24, 2011 at 12:38 PM, Tom Lane  wrote:
>> Assuming the issue really is the physical unlinks (which I agree I'd
>> like to see some evidence for), I wonder whether the problem could be
>> addressed by moving smgrDoPendingDeletes() to after locks are released,
>> instead of before, in CommitTransaction/AbortTransaction.  There does
>> not seem to be any strong reason why we have to do that before lock
>> release, since incoming potential users of a table should not be trying
>> to access the old physical storage after that anyway.
>
> Alright, since this concern about confirming the expensive part of
> index dropping has come up a few times but otherwise the waters are
> warm, I'll go ahead and do some work to pin things down a bit before
> we continue working on those assumptions.
>

This suspicion seems to be proven correct; there came an opportunity
where we were removing some indexes on a live system and I took the
opportunity to carefully control and time the process.  There's not
much relationship between size of the index and the delay, but the
pauses are still very real. On the other hand, the first time this was
noticed there was significantly higher load.

I'd still like to do something to solve this problem, though: even if
the time-consuming part of the process is not file unlinking, it's
clearly something after the AccessExclusiveLock is acquired based on
our other measurements.

Back to the drawing board...

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2011-08-24 Thread Daniel Farina
On Wed, Aug 24, 2011 at 12:38 PM, Tom Lane  wrote:
> Merlin Moncure  writes:
>> On Wed, Aug 24, 2011 at 1:24 PM, Daniel Farina  wrote:
>>> At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
>>> recently when frobbing around some indexes I realized that there is no
>>> equivalent for DROP INDEX, and this is a similar but lesser problem
>>> (as CREATE INDEX takes much longer), as DROP INDEX takes an ACCESS
>>> EXCLUSIVE lock on the parent table while doing the work to unlink
>>> files, which nominally one would think to be trivial, but I assure you
>>> it is not at times for even indexes that are a handful of gigabytes
>>> (let's say ~=< a dozen).
>
>> Are you sure that you are really waiting on the time to unlink the
>> file?  there's other stuff going on in there like waiting for lock,
>> plan invalidation, etc.  Point being, maybe the time consuming stuff
>> can't really be deferred which would make the proposal moot.
>
> Assuming the issue really is the physical unlinks (which I agree I'd
> like to see some evidence for), I wonder whether the problem could be
> addressed by moving smgrDoPendingDeletes() to after locks are released,
> instead of before, in CommitTransaction/AbortTransaction.  There does
> not seem to be any strong reason why we have to do that before lock
> release, since incoming potential users of a table should not be trying
> to access the old physical storage after that anyway.

Alright, since this concern about confirming the expensive part of
index dropping has come up a few times but otherwise the waters are
warm, I'll go ahead and do some work to pin things down a bit before
we continue working on those assumptions.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2011-08-24 Thread Tom Lane
Merlin Moncure  writes:
> On Wed, Aug 24, 2011 at 1:24 PM, Daniel Farina  wrote:
>> At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
>> recently when frobbing around some indexes I realized that there is no
>> equivalent for DROP INDEX, and this is a similar but lesser problem
>> (as CREATE INDEX takes much longer), as DROP INDEX takes an ACCESS
>> EXCLUSIVE lock on the parent table while doing the work to unlink
>> files, which nominally one would think to be trivial, but I assure you
>> it is not at times for even indexes that are a handful of gigabytes
>> (let's say ~=< a dozen).

> Are you sure that you are really waiting on the time to unlink the
> file?  there's other stuff going on in there like waiting for lock,
> plan invalidation, etc.  Point being, maybe the time consuming stuff
> can't really be deferred which would make the proposal moot.

Assuming the issue really is the physical unlinks (which I agree I'd
like to see some evidence for), I wonder whether the problem could be
addressed by moving smgrDoPendingDeletes() to after locks are released,
instead of before, in CommitTransaction/AbortTransaction.  There does
not seem to be any strong reason why we have to do that before lock
release, since incoming potential users of a table should not be trying
to access the old physical storage after that anyway.
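
To make the proposed reordering explicit, here is a trivial standalone
sketch of the two call orders (the function names are hypothetical
stand-ins, not the real CommitTransaction code); the only difference is
whether the slow unlink work happens inside or outside the window during
which other backends are blocked on the relation lock:

#include <stdio.h>

/* Hypothetical stand-ins for the steps discussed above. */
static void
write_commit_record(void)
{
	puts("commit record flushed");
}

static void
unlink_dropped_relation_files(void)
{
	puts("physical unlink of dropped storage (can be slow)");
}

static void
release_locks(void)
{
	puts("locks released; blocked backends wake up");
}

/* Today: the unlink runs while the AccessExclusiveLock taken by DROP
 * INDEX is still held, so every waiter pays for it. */
static void
commit_as_today(void)
{
	write_commit_record();
	unlink_dropped_relation_files();
	release_locks();
}

/* Proposed: release locks first; incoming users get the invalidation
 * and never look at the old storage, so the unlink no longer extends
 * the time anyone spends waiting. */
static void
commit_as_proposed(void)
{
	write_commit_record();
	release_locks();
	unlink_dropped_relation_files();
}

int
main(void)
{
	puts("-- current ordering --");
	commit_as_today();
	puts("-- proposed ordering --");
	commit_as_proposed();
	return 0;
}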

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2011-08-24 Thread Robert Haas
On Wed, Aug 24, 2011 at 2:24 PM, Daniel Farina  wrote:
> Could I make the ACCESS EXCLUSIVE section just long enough to commit
> catalog updates, and then have the bulk of the work happen afterwards?
>
> The general idea is:
>
> 1) set an index as "invalid", to ensure no backend will use it in planning
> 2) wait for the xmin horizon to advance to ensure no open snapshots
> that may not see the invalidation of the index are gone (is there a
> way to tighten that up? although even this conservative version would
> be 80-90% of the value for us...)
> 3) then use performDeletions without taking a lock on the parent
> table, similar to what's in tablecmds.c already.
>
> A DROP INDEX CONCURRENTLY may leave an invalid index if aborted
> instead of waiting for statement confirmation, just like CREATE INDEX
> CONCURRENTLY.

This might be a dumb idea, but could we rearrange CommitTransaction()
so that smgrDoPendingDeletes() happens just a bit further down, after
those ResourceOwnerRelease() calls?  It seems like that might
accomplish what you're trying to do here without needing a new
command.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2011-08-24 Thread Merlin Moncure
On Wed, Aug 24, 2011 at 1:24 PM, Daniel Farina  wrote:
> Hello list,
>
> At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
> recently when frobbing around some indexes I realized that there is no
> equivalent for DROP INDEX, and this is a similar but lesser problem
> (as CREATE INDEX takes much longer), as DROP INDEX takes an ACCESS
> EXCLUSIVE lock on the parent table while doing the work to unlink
> files, which nominally one would think to be trivial, but I assure you
> it is not at times for even indexes that are a handful of gigabytes
> (let's say ~=< a dozen).  By non-trivial, I mean it can take 30+
> seconds, but less than a couple of minutes.  The storage layer
> (starting from the higher levels of abstraction) are XFS, a somewhat
> trivial lvm setup, mdraid (8-ways), Amazon EBS (NBD?).

Are you sure that you are really waiting on the time to unlink the
file?  there's other stuff going on in there like waiting for lock,
plan invalidation, etc.  Point being, maybe the time consuming stuff
can't really be deferred which would make the proposal moot.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2011-08-24 Thread Daniel Farina
Hello list,

At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
recently when frobbing around some indexes I realized that there is no
equivalent for DROP INDEX, and this is a similar but lesser problem
(as CREATE INDEX takes much longer), as DROP INDEX takes an ACCESS
EXCLUSIVE lock on the parent table while doing the work to unlink
files, which nominally one would think to be trivial, but I assure you
it is not at times for even indexes that are a handful of gigabytes
(let's say ~=< a dozen).  By non-trivial, I mean it can take 30+
seconds, but less than a couple of minutes.  The storage layer
(starting from the higher levels of abstraction) are XFS, a somewhat
trivial lvm setup, mdraid (8-ways), Amazon EBS (NBD?).

I was poking around at tablecmds and index.c and wonder if a similar
two-pass approach as used by CREATE INDEX CONCURRENTLY can be used to
create a DROP INDEX CONCURRENTLY, and if there would be any interest
in accepting such a patch.

Quoth index.c:

/*
 * To drop an index safely, we must grab exclusive lock on its parent
 * table.  Exclusive lock on the index alone is insufficient because
 * another backend might be about to execute a query on the parent 
table.
 * If it relies on a previously cached list of index OIDs, then it could
 * attempt to access the just-dropped index.  We must therefore take a
 * table lock strong enough to prevent all queries on the table from
 * proceeding until we commit and send out a shared-cache-inval notice
 * that will make them update their index lists.
 */

Could I make the ACCESS EXCLUSIVE section just long enough to commit
catalog updates, and then have the bulk of the work happen afterwards?

The general idea is:

1) set an index as "invalid", to ensure no backend will use it in planning
2) wait for the xmin horizon to advance to ensure no open snapshots
that may not see the invalidation of the index are gone (is there a
way to tighten that up? although even this conservative version would
be 80-90% of the value for us...)
3) then use performDeletions without taking a lock on the parent
table, similar to what's in tablecmds.c already.

A DROP INDEX CONCURRENTLY may leave an invalid index if aborted
instead of waiting for statement confirmation, just like CREATE INDEX
CONCURRENTLY.
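
Purely to pin down the shape of the proposal, here is a standalone sketch
of those three steps as straight-line code (every function here is a
made-up stand-in, not an existing backend routine):

#include <stdio.h>

/* Step 1: mark the index unusable for planning and commit, so the change
 * becomes visible to new snapshots. */
static void
mark_index_invalid_and_commit(const char *index_name)
{
	printf("catalog change committed: %s marked invalid\n", index_name);
}

/* Step 2: wait until no remaining snapshot is old enough to still
 * consider the index usable (the conservative version: wait for the
 * xmin horizon to pass the invalidation). */
static void
wait_for_old_snapshots(void)
{
	printf("waiting for the xmin horizon to advance...\n");
}

/* Step 3: nobody can be using the index any more, so remove the catalog
 * entries and physical files without an ACCESS EXCLUSIVE lock on the
 * parent table. */
static void
drop_index_storage(const char *index_name)
{
	printf("performDeletion-style cleanup of %s\n", index_name);
}

int
main(void)
{
	const char *idx = "some_big_index";

	mark_index_invalid_and_commit(idx);
	wait_for_old_snapshots();
	drop_index_storage(idx);

	/*
	 * If this sequence is aborted between steps 1 and 3, what is left
	 * behind is an invalid index -- the same failure mode CREATE INDEX
	 * CONCURRENTLY already has.
	 */
	return 0;
}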

--
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers