Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Fri, Feb 3, 2012 at 10:28 AM, Robert Haas wrote: > On Wed, Feb 1, 2012 at 2:39 PM, Simon Riggs wrote: >> On Wed, Feb 1, 2012 at 7:31 PM, Peter Eisentraut wrote: >>> On sön, 2012-01-29 at 22:01 +, Simon Riggs wrote: Patch now locks index in AccessExclusiveLock in final stage of drop. >>> >>> Doesn't that defeat the point of doing the CONCURRENTLY business in the >>> first place? >> >> That was my initial reaction. >> >> We lock the index in AccessExclusiveLock only once we are certain >> nobody else is looking at it any more. >> >> So its a Kansas City Shuffle, with safe locking in case of people >> doing strange low level things. > > Yeah, I think this is much safer, and in this version that doesn't > seem to harm concurrency. > > Given our previous experiences in this area, I wouldn't like to bet my > life savings on this having no remaining bugs - but if it does, I > can't find them. > > I'll mark this "Ready for Committer". I don't think this has been committed - does that mean you've decided against doing so, or just haven't had the round tuits? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Sun, Jan 8, 2012 at 8:19 AM, Simon Riggs wrote: > On Wed, Jan 4, 2012 at 11:14 AM, Simon Riggs wrote: > >> Not having a freelist at all is probably a simpler way of avoiding the >> lock contention, so I'll happily back that suggestion instead. Patch >> attached, previous patch revoked. > > v2 attached with cleanup of some random stuff that crept onto patch. Hi Simon, Based on the way this patch leaves the old code behind (using #ifdef), this looks more like a WIP patch which you want people to do performance testing with, rather than patch proposed for committing. If that is the case, could you outline the type of performance testing where you think it would make a difference (and whether it should be done on top of the main patch from this thread, the concurrent index drop one)? Also, it would be much easier to do the performance testing if this behavior was controlled by a temporarily added GUC, rather than an #ifdef. Do you think it is feasible to do that, or would the overhead of a single "if (some_guc)" per StrategyGetBuffer and StrategyFreeBuffer call distort things too much? Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Wed, Feb 1, 2012 at 2:39 PM, Simon Riggs wrote: > On Wed, Feb 1, 2012 at 7:31 PM, Peter Eisentraut wrote: >> On sön, 2012-01-29 at 22:01 +, Simon Riggs wrote: >>> Patch now locks index in AccessExclusiveLock in final stage of drop. >> >> Doesn't that defeat the point of doing the CONCURRENTLY business in the >> first place? > > That was my initial reaction. > > We lock the index in AccessExclusiveLock only once we are certain > nobody else is looking at it any more. > > So its a Kansas City Shuffle, with safe locking in case of people > doing strange low level things. Yeah, I think this is much safer, and in this version that doesn't seem to harm concurrency. Given our previous experiences in this area, I wouldn't like to bet my life savings on this having no remaining bugs - but if it does, I can't find them. I'll mark this "Ready for Committer". -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Wed, Feb 1, 2012 at 7:31 PM, Peter Eisentraut wrote: > On sön, 2012-01-29 at 22:01 +, Simon Riggs wrote: >> Patch now locks index in AccessExclusiveLock in final stage of drop. > > Doesn't that defeat the point of doing the CONCURRENTLY business in the > first place? That was my initial reaction. We lock the index in AccessExclusiveLock only once we are certain nobody else is looking at it any more. So its a Kansas City Shuffle, with safe locking in case of people doing strange low level things. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On sön, 2012-01-29 at 22:01 +, Simon Riggs wrote: > Patch now locks index in AccessExclusiveLock in final stage of drop. Doesn't that defeat the point of doing the CONCURRENTLY business in the first place? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Wed, Feb 1, 2012 at 2:56 AM, Robert Haas wrote: > I improved the grammar issues in the attached version of the patch - > the syntax is now simpler and more consistent, IF EXISTS now works, > and RESTRICT is accepted (without changing the behavior) while CASCADE > fails with a nicer error message. I also fixed a bug in > RangeVarCallbackForDropRelation. Plus tests as well. Many thanks. I fixed the main bug you observed and your test case now works perfectly. I used pgbench to continuously drop/create an index, so a little more than manual testing. v5 Attached. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services diff --git a/doc/src/sgml/ref/drop_index.sgml b/doc/src/sgml/ref/drop_index.sgml index 7177ef2..343f7ac 100644 --- a/doc/src/sgml/ref/drop_index.sgml +++ b/doc/src/sgml/ref/drop_index.sgml @@ -21,7 +21,7 @@ PostgreSQL documentation -DROP INDEX [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ] +DROP INDEX [ CONCURRENTLY ] [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ] @@ -50,6 +50,29 @@ DROP INDEX [ IF EXISTS ] name [, .. +CONCURRENTLY + + + When this option is used, PostgreSQL will drop the + index without taking any locks that prevent concurrent selects, inserts, + updates, or deletes on the table; whereas a standard index drop + waits for a lock that locks out everything on the table until it's done. + Concurrent drop index is a two stage process. First, we mark the index + both invalid and not ready then commit the change. Next we wait until + there are no users locking the table who can see the index. + + + There are several caveats to be aware of when using this option. + Only one index name can be specified if the CONCURRENTLY + parameter is specified. Regular DROP INDEX command can be + performed within a transaction block, but + DROP INDEX CONCURRENTLY cannot. + The CASCADE option is not supported when dropping an index concurrently. + + + + + name diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c index db86262..58d8f5c 100644 --- a/src/backend/catalog/dependency.c +++ b/src/backend/catalog/dependency.c @@ -173,8 +173,8 @@ static void reportDependentObjects(const ObjectAddresses *targetObjects, const ObjectAddress *origObject); static void deleteOneObject(const ObjectAddress *object, Relation depRel, int32 flags); -static void doDeletion(const ObjectAddress *object); -static void AcquireDeletionLock(const ObjectAddress *object); +static void doDeletion(const ObjectAddress *object, int flags); +static void AcquireDeletionLock(const ObjectAddress *object, int flags); static void ReleaseDeletionLock(const ObjectAddress *object); static bool find_expr_references_walker(Node *node, find_expr_references_context *context); @@ -232,7 +232,7 @@ performDeletion(const ObjectAddress *object, * Acquire deletion lock on the target object. (Ideally the caller has * done this already, but many places are sloppy about it.) */ - AcquireDeletionLock(object); + AcquireDeletionLock(object, 0); /* * Construct a list of objects to delete (ie, the given object plus @@ -316,7 +316,7 @@ performMultipleDeletions(const ObjectAddresses *objects, * Acquire deletion lock on each target object. (Ideally the caller * has done this already, but many places are sloppy about it.) */ - AcquireDeletionLock(thisobj); + AcquireDeletionLock(thisobj, flags); findDependentObjects(thisobj, DEPFLAG_ORIGINAL, @@ -350,7 +350,11 @@ performMultipleDeletions(const ObjectAddresses *objects, /* And clean up */ free_object_addresses(targetObjects); - heap_close(depRel, RowExclusiveLock); + /* + * We closed depRel earlier in deleteOneObject if doing a drop concurrently + */ + if ((flags & PERFORM_DELETION_CONCURRENTLY) != PERFORM_DELETION_CONCURRENTLY) + heap_close(depRel, RowExclusiveLock); } /* @@ -380,7 +384,7 @@ deleteWhatDependsOn(const ObjectAddress *object, * Acquire deletion lock on the target object. (Ideally the caller has * done this already, but many places are sloppy about it.) */ - AcquireDeletionLock(object); + AcquireDeletionLock(object, 0); /* * Construct a list of objects to delete (ie, the given object plus @@ -611,7 +615,7 @@ findDependentObjects(const ObjectAddress *object, * deletion of the owning object.) */ ReleaseDeletionLock(object); -AcquireDeletionLock(&otherObject); +AcquireDeletionLock(&otherObject, 0); /* * The owning object might have been deleted while we waited @@ -706,7 +710,7 @@ findDependentObjects(const ObjectAddress *object, /* * Must lock the dependent object before recursing to it. */ - AcquireDeletionLock(&otherObject); + AcquireDeletionLock(&otherObject, 0); /* * The dependent object mig
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Sun, Jan 29, 2012 at 5:01 PM, Simon Riggs wrote: > I can't see any way that situation can occur. The patch *explicitly* > waits until all people that can see the index as usable have dropped > their lock. So I don't think this is necessary. Having said that, > since we are talking about the index and not the whole table, if I > believe the above statement then I can't have any reasonable objection > to doing as you suggest. > > Patch now locks index in AccessExclusiveLock in final stage of drop. > > v3 attached. > > If you have suggestions to improve grammar issues, they;re most > welcome. Otherwise this seems good to go. I improved the grammar issues in the attached version of the patch - the syntax is now simpler and more consistent, IF EXISTS now works, and RESTRICT is accepted (without changing the behavior) while CASCADE fails with a nicer error message. I also fixed a bug in RangeVarCallbackForDropRelation. However, testing reveals that this doesn't really work. I found that by playing around with a couple of concurrent sessions, I could get a SELECT statement to hang waiting on the AccessExclusiveLock in the second phase, because we're really still opening the relation (even though it's invalid) and thus still locking it, and thus still waiting for the AccessExclusiveLock. And I managed to get this: rhaas=# select * from foo where a = 1; ERROR: could not open relation with OID 16388 So then I tried changing the lock level back to ShareUpdateExclusiveLock. It appears that that merely narrows the window for errors of this type, rather than getting rid of them. I ran this in one window: pgbench -c 30 -j 30 -f f -n -T 180 where the file f contained this: select * from foo where a = 1 and then in the other window I repeatedly did this: rhaas=# create index foo_a on foo (a); CREATE INDEX rhaas=# drop index concurrently foo_a; DROP INDEX and pgbench started issuing messages like this: Client 2 aborted in state 0: ERROR: could not open relation with OID 16394 Client 9 aborted in state 0: ERROR: could not open relation with OID 16397 Client 18 aborted in state 0: ERROR: could not open relation with OID 16397 My conclusion is that regardless of whether ShareUpdateExclusiveLock or AccessExclusiveLock is used for the final phase of the drop, this isn't safe. I haven't tracked down exactly where the wheels are coming off, but the problem does not occur when using the non-current form of DROP INDEX. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company drop_index_concurrently.v4.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Thu, Jan 26, 2012 at 2:25 AM, Robert Haas wrote: > On Sat, Jan 21, 2012 at 10:11 AM, Simon Riggs wrote: >> Your caution is wise. All users of an index have already checked >> whether the index is usable at plan time, so although there is much >> code that assumes they can look at the index itself, that is not >> executed until after the correct checks. >> >> I'll look at VACUUM and other utilities, so thanks for that. >> >> I don't see much point in having the higher level lock, except perhaps >> as a test this code works. > > I thought of another way this can cause a problem: suppose that while > we're dropping the relation with only ShareUpdateExclusiveLock, we get > as far as calling DropRelFileNodeBuffers. Just after we finish, some > other process that holds AccessShareLock or RowShareLock or > RowExclusiveLock reads and perhaps even dirties a page in the > relation. I can't see any way that situation can occur. The patch *explicitly* waits until all people that can see the index as usable have dropped their lock. So I don't think this is necessary. Having said that, since we are talking about the index and not the whole table, if I believe the above statement then I can't have any reasonable objection to doing as you suggest. Patch now locks index in AccessExclusiveLock in final stage of drop. v3 attached. If you have suggestions to improve grammar issues, they;re most welcome. Otherwise this seems good to go. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services diff --git a/doc/src/sgml/ref/drop_index.sgml b/doc/src/sgml/ref/drop_index.sgml index 7177ef2..aeb1531 100644 --- a/doc/src/sgml/ref/drop_index.sgml +++ b/doc/src/sgml/ref/drop_index.sgml @@ -21,7 +21,9 @@ PostgreSQL documentation -DROP INDEX [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ] +DROP INDEX + [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ] + CONCURRENTLY name @@ -50,6 +52,30 @@ DROP INDEX [ IF EXISTS ] name [, .. +CONCURRENTLY + + + When this option is used, PostgreSQL will drop the + index without taking any locks that prevent concurrent selects, inserts, + updates, or deletes on the table; whereas a standard index drop + waits for a lock that locks out everything on the table until it's done. + Concurrent drop index is a two stage process. First, we mark the index + both invalid and not ready then commit the change. Next we wait until + there are no users locking the table who can see the index. + + + There are several caveats to be aware of when using this option. + Only one index name can be specified if the CONCURRENTLY + parameter is specified. Only one concurrent index drop can occur on a + table at a time and no modifications on the table are allowed meanwhile. + Regular DROP INDEX command can be performed within + a transaction block, but DROP INDEX CONCURRENTLY cannot. + There is no CASCADE option when dropping an index concurrently. + + + + + name diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c index db86262..58d8f5c 100644 --- a/src/backend/catalog/dependency.c +++ b/src/backend/catalog/dependency.c @@ -173,8 +173,8 @@ static void reportDependentObjects(const ObjectAddresses *targetObjects, const ObjectAddress *origObject); static void deleteOneObject(const ObjectAddress *object, Relation depRel, int32 flags); -static void doDeletion(const ObjectAddress *object); -static void AcquireDeletionLock(const ObjectAddress *object); +static void doDeletion(const ObjectAddress *object, int flags); +static void AcquireDeletionLock(const ObjectAddress *object, int flags); static void ReleaseDeletionLock(const ObjectAddress *object); static bool find_expr_references_walker(Node *node, find_expr_references_context *context); @@ -232,7 +232,7 @@ performDeletion(const ObjectAddress *object, * Acquire deletion lock on the target object. (Ideally the caller has * done this already, but many places are sloppy about it.) */ - AcquireDeletionLock(object); + AcquireDeletionLock(object, 0); /* * Construct a list of objects to delete (ie, the given object plus @@ -316,7 +316,7 @@ performMultipleDeletions(const ObjectAddresses *objects, * Acquire deletion lock on each target object. (Ideally the caller * has done this already, but many places are sloppy about it.) */ - AcquireDeletionLock(thisobj); + AcquireDeletionLock(thisobj, flags); findDependentObjects(thisobj, DEPFLAG_ORIGINAL, @@ -350,7 +350,11 @@ performMultipleDeletions(const ObjectAddresses *objects, /* And clean up */ free_object_addresses(targetObjects); - heap_close(depRel, RowExclusiveLock); + /* + * We closed depRel earlier in deleteOneObject if doing a drop concurrently + */ + if ((flags & PERFORM_DELETION_CONCURRENTLY) != PERFORM_DE
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Sat, Jan 21, 2012 at 10:11 AM, Simon Riggs wrote: > Your caution is wise. All users of an index have already checked > whether the index is usable at plan time, so although there is much > code that assumes they can look at the index itself, that is not > executed until after the correct checks. > > I'll look at VACUUM and other utilities, so thanks for that. > > I don't see much point in having the higher level lock, except perhaps > as a test this code works. I thought of another way this can cause a problem: suppose that while we're dropping the relation with only ShareUpdateExclusiveLock, we get as far as calling DropRelFileNodeBuffers. Just after we finish, some other process that holds AccessShareLock or RowShareLock or RowExclusiveLock reads and perhaps even dirties a page in the relation. Now we've got pages in the buffer pool that might even be dirty, but the backing file is truncated or perhaps removed altogether. If the page is dirty, I think the background writer will eventually choke trying to write out the page. If the page is not dirty, I'm less certain whether anything is going to explode violently, but it seems mighty sketchy to have pages in the buffer pool with a buffer tag that don't exist any more. As the comment says: * It is the responsibility of higher-level code to ensure that the * deletion or truncation does not lose any data that could be needed * later. It is also the responsibility of higher-level code to ensure * that no other process could be trying to load more pages of the * relation into buffers. ...and of course, the intended mechanism is an AccessExclusiveLock. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)
On Mon, Jan 23, 2012 at 4:01 PM, Tom Lane wrote: > The real > problem there is that BufFreelistLock is also used to protect the > clock sweep pointer. Agreed > I think basically we gotta find a way to allow > multiple backends to run clock sweeps concurrently. Or else fix > things so that the freelist never (well, hardly ever) runs dry. Agreed. The only question is what do we do now. I'm happy with the thought of jam tomorrow, I'd like to see some minor improvements now, given that when we say "we'll do that even better in the next release" it often doesn't happen. Which is where those patches come in. I've posted an improved lock wait analysis patch and have scheduled some runs on heavily used systems to see what this tells us. Results from live machines also greatly appreciated, since test systems seem likely to be inappropriate tests. Patch copied here again. This is a key issue since RAM is cheap enough now that people are swamped by it, so large shared_buffers are very common. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)
On Mon, Jan 23, 2012 at 5:06 AM, Robert Haas wrote: > On Mon, Jan 23, 2012 at 12:12 AM, Tom Lane wrote: >> Robert Haas writes: >>> On Sat, Jan 21, 2012 at 5:29 PM, Jim Nasby wrote: We should also look at having the freelist do something useful, instead of just dropping it completely. Unfortunately that's probably more work... >> >>> That's kinda my feeling as well. The free list in its current form is >>> pretty much useless, but I don't think we'll save much by getting rid >>> of it, because that's just a single test. The expensive part of what >>> we do while holding BufFreelistLock is, I think, iterating through >>> buffers taking and releasing a spinlock on each one (!). >> >> Yeah ... spinlocks that, by definition, will be uncontested. > > What makes you think that they are uncontested? It is automatically partitioned over 131,072 spinlocks if shared_buffers=1GB, so the chances of contention seem pretty low. If a few very hot index root blocks are heavily contested, still that would only be encountered a few times out of the 131,072. I guess we could count them, similar to your LWLOCK_STATS changes. > Or for that matter, > that even an uncontested spinlock operation is cheap enough to do > while holding a badly contended LWLock? Is the concern that getting a uncontested spinlock has lots of instructions, or that the associated fences are expensive? > >> So I think >> it would be advisable to prove rather than just assume that that's a >> problem. > > It's pretty trivial to prove that there is a very serious problem with > BufFreelistLock. I'll admit I can't prove what the right fix is just > yet, and certainly measurement is warranted. From my own analysis and experiments, the mere act of getting the BufFreelistLock when it is contended is far more important than anything actually done while holding that lock. When the clock-sweep is done often enough that the lock is contended, then it is done often enough to keep the average usagecount low and so the average number of buffers visited during a clock sweep before finding a usable one was around 2.0. YMMV. Can we get some people who run busy high-CPU servers that churn the cache and think they may be getting hit with this problem post the results of this query: select usagecount, count(*) from pg_buffercache group by usagecount; Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)
On Sat, Jan 21, 2012 at 2:29 PM, Jim Nasby wrote: > On Jan 20, 2012, at 11:54 AM, Heikki Linnakangas wrote: >> So, you're proposing that we remove freelist altogether? Sounds reasonable, >> but that needs to be performance tested somehow. I'm not sure what exactly >> the test should look like, but it probably should involve an OLTP workload, >> and large tables being created, populated to fill the cache with pages from >> the table, and dropped, while the OLTP workload is running. I'd imagine that >> the worst case scenario looks something like that. > > We should also look at having the freelist do something useful, instead of > just dropping it completely. Unfortunately that's probably more work... If the head and tail are both protected by BufFreelistLock, I'm pretty sure this will make things worse, not better. If we change to having head and tail protected by separate spinlocks, then I don't see how you can remove the last buffer from the list, or add a buffer to an empty list, without causing havoc. Does anyone have ideas for implementing a cross-platform, lock-free, FIFO linked list? Without that, I don't see how we are going to get anywhere on this approach. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)
On Mon, Jan 23, 2012 at 11:01 AM, Tom Lane wrote: > Robert Haas writes: >> On Mon, Jan 23, 2012 at 12:12 AM, Tom Lane wrote: The expensive part of what we do while holding BufFreelistLock is, I think, iterating through buffers taking and releasing a spinlock on each one (!). > >>> Yeah ... spinlocks that, by definition, will be uncontested. > >> What makes you think that they are uncontested? > > Ah, never mind. I was thinking that we'd only be touching buffers that > were *on* the freelist, but of course this is incorrect. The real > problem there is that BufFreelistLock is also used to protect the > clock sweep pointer. I think basically we gotta find a way to allow > multiple backends to run clock sweeps concurrently. Or else fix > things so that the freelist never (well, hardly ever) runs dry. I'd come to the same conclusion myself. :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)
Robert Haas writes: > On Mon, Jan 23, 2012 at 12:12 AM, Tom Lane wrote: >>> The expensive part of what >>> we do while holding BufFreelistLock is, I think, iterating through >>> buffers taking and releasing a spinlock on each one (!). >> Yeah ... spinlocks that, by definition, will be uncontested. > What makes you think that they are uncontested? Ah, never mind. I was thinking that we'd only be touching buffers that were *on* the freelist, but of course this is incorrect. The real problem there is that BufFreelistLock is also used to protect the clock sweep pointer. I think basically we gotta find a way to allow multiple backends to run clock sweeps concurrently. Or else fix things so that the freelist never (well, hardly ever) runs dry. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)
On Mon, Jan 23, 2012 at 1:06 PM, Robert Haas wrote: > It's pretty trivial to prove that there is a very serious problem with > BufFreelistLock. I'll admit I can't prove what the right fix is just > yet, and certainly measurement is warranted. I agree there is a problem with BufFreelistLock (so please share your results with Heikki, who seems not to). As a result, I've published patches which reduce contention on that lock in various ways, all of which seem valid to me. There are lots of things we could have done for 9.2 but didn't, yet we have some things that did get done on the table right now so it would be useful to push those through immediately or at least defer discussion on other things until we get back around to this. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)
On Mon, Jan 23, 2012 at 12:12 AM, Tom Lane wrote: > Robert Haas writes: >> On Sat, Jan 21, 2012 at 5:29 PM, Jim Nasby wrote: >>> We should also look at having the freelist do something useful, instead of >>> just dropping it completely. Unfortunately that's probably more work... > >> That's kinda my feeling as well. The free list in its current form is >> pretty much useless, but I don't think we'll save much by getting rid >> of it, because that's just a single test. The expensive part of what >> we do while holding BufFreelistLock is, I think, iterating through >> buffers taking and releasing a spinlock on each one (!). > > Yeah ... spinlocks that, by definition, will be uncontested. What makes you think that they are uncontested? Or for that matter, that even an uncontested spinlock operation is cheap enough to do while holding a badly contended LWLock? > So I think > it would be advisable to prove rather than just assume that that's a > problem. It's pretty trivial to prove that there is a very serious problem with BufFreelistLock. I'll admit I can't prove what the right fix is just yet, and certainly measurement is warranted. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)
Robert Haas writes: > On Sat, Jan 21, 2012 at 5:29 PM, Jim Nasby wrote: >> We should also look at having the freelist do something useful, instead of >> just dropping it completely. Unfortunately that's probably more work... > That's kinda my feeling as well. The free list in its current form is > pretty much useless, but I don't think we'll save much by getting rid > of it, because that's just a single test. The expensive part of what > we do while holding BufFreelistLock is, I think, iterating through > buffers taking and releasing a spinlock on each one (!). Yeah ... spinlocks that, by definition, will be uncontested. So I think it would be advisable to prove rather than just assume that that's a problem. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)
On Sat, Jan 21, 2012 at 5:29 PM, Jim Nasby wrote: > We should also look at having the freelist do something useful, instead of > just dropping it completely. Unfortunately that's probably more work... That's kinda my feeling as well. The free list in its current form is pretty much useless, but I don't think we'll save much by getting rid of it, because that's just a single test. The expensive part of what we do while holding BufFreelistLock is, I think, iterating through buffers taking and releasing a spinlock on each one (!). So I guess my vote would be to leave it alone pending further study, and maybe remove it later if we can't find a way to rejigger it to be more useful. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)
On Jan 20, 2012, at 11:54 AM, Heikki Linnakangas wrote: > On 04.01.2012 13:14, Simon Riggs wrote: >> On Tue, Jan 3, 2012 at 11:28 PM, Tom Lane wrote: >>> Jim Nasby writes: On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote: > This could well be related to the fact that DropRelFileNodeBuffers() > does a scan of shared_buffers, which is an O(N) approach no matter the > size of the index. >>> Couldn't we just leave the buffers alone? Once an index is dropped and that's pushed out through the catalog then nothing should be trying to access them and they'll eventually just get aged out. >>> >>> No, we can't, because if they're still dirty then the bgwriter would >>> first try to write them to the no-longer-existing storage file. It's >>> important that we kill the buffers immediately during relation drop. >>> >>> I'm still thinking that it might be sufficient to mark the buffers >>> invalid and let the clock sweep find them, thereby eliminating the need >>> for a freelist. >> >> My patch puts things on the freelist only when it is free to do so. >> Not having a freelist at all is probably a simpler way of avoiding the >> lock contention, so I'll happily back that suggestion instead. Patch >> attached, previous patch revoked. > > So, you're proposing that we remove freelist altogether? Sounds reasonable, > but that needs to be performance tested somehow. I'm not sure what exactly > the test should look like, but it probably should involve an OLTP workload, > and large tables being created, populated to fill the cache with pages from > the table, and dropped, while the OLTP workload is running. I'd imagine that > the worst case scenario looks something like that. We should also look at having the freelist do something useful, instead of just dropping it completely. Unfortunately that's probably more work... -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Sat, Jan 21, 2012 at 1:53 AM, Robert Haas wrote: > On Thu, Jan 19, 2012 at 10:02 AM, Simon Riggs wrote: >> On Wed, Jan 18, 2012 at 11:12 PM, Robert Haas wrote: >>> On Wed, Jan 18, 2012 at 5:49 PM, Simon Riggs wrote: Can I just check with you that the only review comment is a one line change? Seems better to make any additional review comments in one go. >>> >>> No, I haven't done a full review yet - I just noticed that right at >>> the outset and wanted to be sure that got addressed. I can look it >>> over more carefully, and test it. >> >> Corrected your point, and another found during retest. >> >> Works as advertised, docs corrected. > > This comment needs updating: > > /* > * To drop an index safely, we must grab exclusive lock on its parent > * table. Exclusive lock on the index alone is insufficient because > * another backend might be about to execute a query on the parent table. > * If it relies on a previously cached list of index OIDs, then it could > * attempt to access the just-dropped index. We must therefore take a > * table lock strong enough to prevent all queries on the table from > * proceeding until we commit and send out a shared-cache-inval notice > * that will make them update their index lists. > */ OK, I did reword some but clearly not enough. > Speaking of that comment, why does a concurrent index drop need any > lock at all on the parent table? If, as the current comment text > says, the point of locking the parent table is to make sure that no > new backends can create relcache entries until our transaction > commits, then it's not needed here, or at least not for that purpose. > We're going to out-wait anyone who has the index open *after* we've > committed our changes to the pg_index entry, so there shouldn't be a > race condition there. There may very well be a reason why we need a > lock on the parent, or there may not, but that's not it. I feel happier locking the parent as well. I'd rather avoid strange weirdness later as has happened before in this type of discussion. > On the flip side, it strikes me that there's considerable danger that > ShareUpdateExclusiveLock on the index might not be strong enough. In > particular, it would fall down if there's anyone who takes an > AccessShareLock, RowShareLock, or RowExclusiveLock and then proceeds > to access the index data despite its being marked !indisready and > !indisvalid. There are certainly instances of this in contrib, such > as in pgstatindex(), which I'm pretty sure will abort with a nastygram > if somebody truncates away the file under it. That might not be > enough reason to reject the patch, but I do think it would be the only > place in the system where we allow something like that to happen. I'm > not sure whether there are any similar risks in core - I am a bit > worried about get_actual_variable_range() and gincostestimate(), but I > can't tell for sure whether there is any actual problem there, or > elsewhere. I wonder if we should only do the *first* phase of the > drop with ShareUpdateExcusliveLock, and then after outwaiting > everyone, take an AccessExclusiveLock on the index (but not the table) > to do the rest of the work. If that doesn't allow the drop to proceed > concurrently, then we go find the places that block against it and > teach them not to lock/use/examine indexes that are !indisready. That > way, the worst case is that D-I-C is less concurrent than we'd like, > rather than making random other things fall over - specifically, > anything that count on our traditional guarantee that AccessShareLock > is enough to keep the file from disappearing. Your caution is wise. All users of an index have already checked whether the index is usable at plan time, so although there is much code that assumes they can look at the index itself, that is not executed until after the correct checks. I'll look at VACUUM and other utilities, so thanks for that. I don't see much point in having the higher level lock, except perhaps as a test this code works. > In the department of minor nitpicks, the syntax synopsis you've added > looks wrong: > > DROP INDEX > [ IF EXISTS ] name > [, ...] [ CASCADE | RESTRICT ] > CONCURRENTLY name > > That implies that you may write if exists, followed by 1 or more > names, followed by cascade or restrict. After that you must write > CONCURRENTLY, followed by exactly one name. I think what you want is: > > DROP INDEX [ IF EXISTS ] class="PARAMETER">name [, ...] [ CASCADE | RESTRICT ] > DROP INDEX CONCURRENTLY name > > ...but I also wonder if we shouldn't support DROP INDEX [ IF EXISTS ] > CONCURRENTLY. It could also be very useful to be able to concurrently > drop multiple indexes at once, since that would require waiting out > all concurrent transactions only once instead of once per drop. > Either way, I think we should have only one syntax, and then reject > combinations we can't handle in t
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Thu, Jan 19, 2012 at 10:02 AM, Simon Riggs wrote: > On Wed, Jan 18, 2012 at 11:12 PM, Robert Haas wrote: >> On Wed, Jan 18, 2012 at 5:49 PM, Simon Riggs wrote: >>> Can I just check with you that the only review comment is a one line >>> change? Seems better to make any additional review comments in one go. >> >> No, I haven't done a full review yet - I just noticed that right at >> the outset and wanted to be sure that got addressed. I can look it >> over more carefully, and test it. > > Corrected your point, and another found during retest. > > Works as advertised, docs corrected. This comment needs updating: /* * To drop an index safely, we must grab exclusive lock on its parent * table. Exclusive lock on the index alone is insufficient because * another backend might be about to execute a query on the parent table. * If it relies on a previously cached list of index OIDs, then it could * attempt to access the just-dropped index. We must therefore take a * table lock strong enough to prevent all queries on the table from * proceeding until we commit and send out a shared-cache-inval notice * that will make them update their index lists. */ Speaking of that comment, why does a concurrent index drop need any lock at all on the parent table? If, as the current comment text says, the point of locking the parent table is to make sure that no new backends can create relcache entries until our transaction commits, then it's not needed here, or at least not for that purpose. We're going to out-wait anyone who has the index open *after* we've committed our changes to the pg_index entry, so there shouldn't be a race condition there. There may very well be a reason why we need a lock on the parent, or there may not, but that's not it. On the flip side, it strikes me that there's considerable danger that ShareUpdateExclusiveLock on the index might not be strong enough. In particular, it would fall down if there's anyone who takes an AccessShareLock, RowShareLock, or RowExclusiveLock and then proceeds to access the index data despite its being marked !indisready and !indisvalid. There are certainly instances of this in contrib, such as in pgstatindex(), which I'm pretty sure will abort with a nastygram if somebody truncates away the file under it. That might not be enough reason to reject the patch, but I do think it would be the only place in the system where we allow something like that to happen. I'm not sure whether there are any similar risks in core - I am a bit worried about get_actual_variable_range() and gincostestimate(), but I can't tell for sure whether there is any actual problem there, or elsewhere. I wonder if we should only do the *first* phase of the drop with ShareUpdateExcusliveLock, and then after outwaiting everyone, take an AccessExclusiveLock on the index (but not the table) to do the rest of the work. If that doesn't allow the drop to proceed concurrently, then we go find the places that block against it and teach them not to lock/use/examine indexes that are !indisready. That way, the worst case is that D-I-C is less concurrent than we'd like, rather than making random other things fall over - specifically, anything that count on our traditional guarantee that AccessShareLock is enough to keep the file from disappearing. In the department of minor nitpicks, the syntax synopsis you've added looks wrong: DROP INDEX [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ] CONCURRENTLY name That implies that you may write if exists, followed by 1 or more names, followed by cascade or restrict. After that you must write CONCURRENTLY, followed by exactly one name. I think what you want is: DROP INDEX [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ] DROP INDEX CONCURRENTLY name ...but I also wonder if we shouldn't support DROP INDEX [ IF EXISTS ] CONCURRENTLY. It could also be very useful to be able to concurrently drop multiple indexes at once, since that would require waiting out all concurrent transactions only once instead of once per drop. Either way, I think we should have only one syntax, and then reject combinations we can't handle in the code. So I think we should have the parser accept: DROP INDEX [ IF EXISTS ] [ CONCURRENTLY ] name [, ...] [ CASCADE | RESTRICT ] And then, if you try to use cascade, or try to drop multiple indexes at once, we should throw an error saying that those options aren't supported in combination with CONCURRENTLY, rather than rejecting them as a syntax error at parse time. That gives a better error message and will make it easier for anyone who wants to extend this in the future - plus it will allow things like DROP INDEX CONCURRENTLY bob RESTRICT, which ought to be legal even if CASCADE isn't supported. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make
Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)
On 04.01.2012 13:14, Simon Riggs wrote: On Tue, Jan 3, 2012 at 11:28 PM, Tom Lane wrote: Jim Nasby writes: On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote: This could well be related to the fact that DropRelFileNodeBuffers() does a scan of shared_buffers, which is an O(N) approach no matter the size of the index. Couldn't we just leave the buffers alone? Once an index is dropped and that's pushed out through the catalog then nothing should be trying to access them and they'll eventually just get aged out. No, we can't, because if they're still dirty then the bgwriter would first try to write them to the no-longer-existing storage file. It's important that we kill the buffers immediately during relation drop. I'm still thinking that it might be sufficient to mark the buffers invalid and let the clock sweep find them, thereby eliminating the need for a freelist. My patch puts things on the freelist only when it is free to do so. Not having a freelist at all is probably a simpler way of avoiding the lock contention, so I'll happily back that suggestion instead. Patch attached, previous patch revoked. So, you're proposing that we remove freelist altogether? Sounds reasonable, but that needs to be performance tested somehow. I'm not sure what exactly the test should look like, but it probably should involve an OLTP workload, and large tables being created, populated to fill the cache with pages from the table, and dropped, while the OLTP workload is running. I'd imagine that the worst case scenario looks something like that. Removing the freelist should speed up the drop, but the question is how costly are the extra clock sweeps that the backends have to do. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Wed, Jan 18, 2012 at 11:12 PM, Robert Haas wrote: > On Wed, Jan 18, 2012 at 5:49 PM, Simon Riggs wrote: >> Can I just check with you that the only review comment is a one line >> change? Seems better to make any additional review comments in one go. > > No, I haven't done a full review yet - I just noticed that right at > the outset and wanted to be sure that got addressed. I can look it > over more carefully, and test it. Corrected your point, and another found during retest. Works as advertised, docs corrected. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services diff --git a/doc/src/sgml/ref/drop_index.sgml b/doc/src/sgml/ref/drop_index.sgml index 7177ef2..aeb1531 100644 --- a/doc/src/sgml/ref/drop_index.sgml +++ b/doc/src/sgml/ref/drop_index.sgml @@ -21,7 +21,9 @@ PostgreSQL documentation -DROP INDEX [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ] +DROP INDEX + [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ] + CONCURRENTLY name @@ -50,6 +52,30 @@ DROP INDEX [ IF EXISTS ] name [, .. +CONCURRENTLY + + + When this option is used, PostgreSQL will drop the + index without taking any locks that prevent concurrent selects, inserts, + updates, or deletes on the table; whereas a standard index drop + waits for a lock that locks out everything on the table until it's done. + Concurrent drop index is a two stage process. First, we mark the index + both invalid and not ready then commit the change. Next we wait until + there are no users locking the table who can see the index. + + + There are several caveats to be aware of when using this option. + Only one index name can be specified if the CONCURRENTLY + parameter is specified. Only one concurrent index drop can occur on a + table at a time and no modifications on the table are allowed meanwhile. + Regular DROP INDEX command can be performed within + a transaction block, but DROP INDEX CONCURRENTLY cannot. + There is no CASCADE option when dropping an index concurrently. + + + + + name diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c index 0b3d489..6884fdb 100644 --- a/src/backend/catalog/dependency.c +++ b/src/backend/catalog/dependency.c @@ -171,9 +171,10 @@ static void reportDependentObjects(const ObjectAddresses *targetObjects, DropBehavior behavior, int msglevel, const ObjectAddress *origObject); -static void deleteOneObject(const ObjectAddress *object, Relation depRel); -static void doDeletion(const ObjectAddress *object); -static void AcquireDeletionLock(const ObjectAddress *object); +static void deleteOneObject(const ObjectAddress *object, Relation depRel, + int option_flags); +static void doDeletion(const ObjectAddress *object, int option_flags); +static void AcquireDeletionLock(const ObjectAddress *object, int option_flags); static void ReleaseDeletionLock(const ObjectAddress *object); static bool find_expr_references_walker(Node *node, find_expr_references_context *context); @@ -224,7 +225,7 @@ performDeletion(const ObjectAddress *object, * Acquire deletion lock on the target object. (Ideally the caller has * done this already, but many places are sloppy about it.) */ - AcquireDeletionLock(object); + AcquireDeletionLock(object, 0); /* * Construct a list of objects to delete (ie, the given object plus @@ -254,7 +255,7 @@ performDeletion(const ObjectAddress *object, { ObjectAddress *thisobj = targetObjects->refs + i; - deleteOneObject(thisobj, depRel); + deleteOneObject(thisobj, depRel, 0); } /* And clean up */ @@ -276,6 +277,13 @@ void performMultipleDeletions(const ObjectAddresses *objects, DropBehavior behavior) { + performMultipleDeletionsWithOptions(objects, behavior, 0); +} + +void +performMultipleDeletionsWithOptions(const ObjectAddresses *objects, + DropBehavior behavior, int option_flags) +{ Relation depRel; ObjectAddresses *targetObjects; int i; @@ -308,7 +316,7 @@ performMultipleDeletions(const ObjectAddresses *objects, * Acquire deletion lock on each target object. (Ideally the caller * has done this already, but many places are sloppy about it.) */ - AcquireDeletionLock(thisobj); + AcquireDeletionLock(thisobj, option_flags); findDependentObjects(thisobj, DEPFLAG_ORIGINAL, @@ -336,13 +344,17 @@ performMultipleDeletions(const ObjectAddresses *objects, { ObjectAddress *thisobj = targetObjects->refs + i; - deleteOneObject(thisobj, depRel); + deleteOneObject(thisobj, depRel, option_flags); } /* And clean up */ free_object_addresses(targetObjects); - heap_close(depRel, RowExclusiveLock); + /* + * We closed depRel earlier in deleteOneObject if doing a drop concurrently + */ + if ((option_flags & OPT_CONCURRENTLY) != OPT_CONCURRENTLY) + hea
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Wed, Jan 18, 2012 at 5:49 PM, Simon Riggs wrote: > Can I just check with you that the only review comment is a one line > change? Seems better to make any additional review comments in one go. No, I haven't done a full review yet - I just noticed that right at the outset and wanted to be sure that got addressed. I can look it over more carefully, and test it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Wed, Jan 18, 2012 at 1:28 AM, Robert Haas wrote: > On Tue, Jan 17, 2012 at 12:06 PM, Peter Eisentraut wrote: >> On mån, 2012-01-16 at 14:46 -0500, Robert Haas wrote: >>> On Mon, Jan 16, 2012 at 2:06 PM, Peter Eisentraut wrote: >>> > On mån, 2012-01-16 at 11:17 -0500, Robert Haas wrote: >>> >> I don't see how setting indisvalid to false helps with this, because >>> >> IIUC when a session sees indisvalid = false, it is supposed to avoid >>> >> using the index for queries but still make new index entries when a >>> >> write operation happens - but to drop an index, I think you'd need to >>> >> get into a state where no one was using the index for anything at all. >>> > >>> > ISTM that one would need to set indisready to false instead. >>> >>> Maybe we should set both to false? >> >> Well, ready = false and valid = true doesn't make any sense. There is >> only just-created -> ready -> valid. We might as well convert that to a >> single "char" column, as you had indicated in your earlier email. But >> that's independent of the proposed patch. > > Sure, but the point is that I think if you want everyone to stop > touching the index, you ought to mark it both not-valid and not-ready, > which the current patch doesn't do. Thanks for the review and the important observation. I agree that I've changed the wrong column. indisready must be set false. Also agree setting both false makes most sense. Can I just check with you that the only review comment is a one line change? Seems better to make any additional review comments in one go. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Tue, Jan 17, 2012 at 12:06 PM, Peter Eisentraut wrote: > On mån, 2012-01-16 at 14:46 -0500, Robert Haas wrote: >> On Mon, Jan 16, 2012 at 2:06 PM, Peter Eisentraut wrote: >> > On mån, 2012-01-16 at 11:17 -0500, Robert Haas wrote: >> >> I don't see how setting indisvalid to false helps with this, because >> >> IIUC when a session sees indisvalid = false, it is supposed to avoid >> >> using the index for queries but still make new index entries when a >> >> write operation happens - but to drop an index, I think you'd need to >> >> get into a state where no one was using the index for anything at all. >> > >> > ISTM that one would need to set indisready to false instead. >> >> Maybe we should set both to false? > > Well, ready = false and valid = true doesn't make any sense. There is > only just-created -> ready -> valid. We might as well convert that to a > single "char" column, as you had indicated in your earlier email. But > that's independent of the proposed patch. Sure, but the point is that I think if you want everyone to stop touching the index, you ought to mark it both not-valid and not-ready, which the current patch doesn't do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On mån, 2012-01-16 at 14:46 -0500, Robert Haas wrote: > On Mon, Jan 16, 2012 at 2:06 PM, Peter Eisentraut wrote: > > On mån, 2012-01-16 at 11:17 -0500, Robert Haas wrote: > >> I don't see how setting indisvalid to false helps with this, because > >> IIUC when a session sees indisvalid = false, it is supposed to avoid > >> using the index for queries but still make new index entries when a > >> write operation happens - but to drop an index, I think you'd need to > >> get into a state where no one was using the index for anything at all. > > > > ISTM that one would need to set indisready to false instead. > > Maybe we should set both to false? Well, ready = false and valid = true doesn't make any sense. There is only just-created -> ready -> valid. We might as well convert that to a single "char" column, as you had indicated in your earlier email. But that's independent of the proposed patch. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Mon, Jan 16, 2012 at 2:06 PM, Peter Eisentraut wrote: > On mån, 2012-01-16 at 11:17 -0500, Robert Haas wrote: >> I don't see how setting indisvalid to false helps with this, because >> IIUC when a session sees indisvalid = false, it is supposed to avoid >> using the index for queries but still make new index entries when a >> write operation happens - but to drop an index, I think you'd need to >> get into a state where no one was using the index for anything at all. > > ISTM that one would need to set indisready to false instead. Maybe we should set both to false? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On mån, 2012-01-16 at 11:17 -0500, Robert Haas wrote: > I don't see how setting indisvalid to false helps with this, because > IIUC when a session sees indisvalid = false, it is supposed to avoid > using the index for queries but still make new index entries when a > write operation happens - but to drop an index, I think you'd need to > get into a state where no one was using the index for anything at all. ISTM that one would need to set indisready to false instead. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Sat, Dec 31, 2011 at 8:26 AM, Simon Riggs wrote: > On Fri, Dec 30, 2011 at 10:20 PM, Peter Eisentraut wrote: >> On ons, 2011-08-24 at 11:24 -0700, Daniel Farina wrote: >>> I was poking around at tablecmds and index.c and wonder if a similar >>> two-pass approach as used by CREATE INDEX CONCURRENTLY can be used to >>> create a DROP INDEX CONCURRENTLY, and if there would be any interest >>> in accepting such a patch. >> >> Hmm, it seems I just independently came up with this same concept. My >> problem is that if a CREATE INDEX CONCURRENTLY fails, you need an >> exclusive lock on the table just to clean that up. If the table is >> under constant load, you can't easily do that. So a two-pass DROP INDEX >> CONCURRENTLY might have been helpful for me. > > Here's a patch for this. Please review. I don't see how setting indisvalid to false helps with this, because IIUC when a session sees indisvalid = false, it is supposed to avoid using the index for queries but still make new index entries when a write operation happens - but to drop an index, I think you'd need to get into a state where no one was using the index for anything at all. Maybe we need to change indisvalid to a "char" and make it three valued: c = being created currently, v = valid, d = being dropped concurrently, or something like that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Wed, Jan 4, 2012 at 11:14 AM, Simon Riggs wrote: > Not having a freelist at all is probably a simpler way of avoiding the > lock contention, so I'll happily back that suggestion instead. Patch > attached, previous patch revoked. v2 attached with cleanup of some random stuff that crept onto patch. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c index 94cefba..9332a74 100644 --- a/src/backend/storage/buffer/buf_init.c +++ b/src/backend/storage/buffer/buf_init.c @@ -115,7 +115,7 @@ InitBufferPool(void) * Initially link all the buffers together as unused. Subsequent * management of this list is done by freelist.c. */ - buf->freeNext = i + 1; + StrategyInitFreelistBuffer(buf); buf->io_in_progress_lock = LWLockAssign(); buf->content_lock = LWLockAssign(); diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c index 3e62448..6b49cae 100644 --- a/src/backend/storage/buffer/freelist.c +++ b/src/backend/storage/buffer/freelist.c @@ -27,6 +27,7 @@ typedef struct /* Clock sweep hand: index of next buffer to consider grabbing */ int nextVictimBuffer; +#ifdef USE_BUFMGR_FREELIST int firstFreeBuffer; /* Head of list of unused buffers */ int lastFreeBuffer; /* Tail of list of unused buffers */ @@ -34,7 +35,7 @@ typedef struct * NOTE: lastFreeBuffer is undefined when firstFreeBuffer is -1 (that is, * when the list is empty) */ - +#endif /* * Statistics. These counters should be wide enough that they can't * overflow during a single bgwriter cycle. @@ -134,6 +135,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held) */ StrategyControl->numBufferAllocs++; +#ifdef USE_BUFMGR_FREELIST /* * Try to get a buffer from the freelist. Note that the freeNext fields * are considered to be protected by the BufFreelistLock not the @@ -165,8 +167,9 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held) } UnlockBufHdr(buf); } +#endif - /* Nothing on the freelist, so run the "clock sweep" algorithm */ + /* Run the "clock sweep" algorithm */ trycounter = NBuffers; for (;;) { @@ -223,6 +229,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held) void StrategyFreeBuffer(volatile BufferDesc *buf) { +#ifdef USE_BUFMGR_FREELIST LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE); /* @@ -238,6 +245,24 @@ StrategyFreeBuffer(volatile BufferDesc *buf) } LWLockRelease(BufFreelistLock); +#endif +} + +/* + * StrategyInitFreelist: put a buffer on the freelist during InitBufferPool + */ +void +StrategyInitFreelistBuffer(volatile BufferDesc *buf) +{ +#ifdef USE_BUFMGR_FREELIST + /* + * Initially link all the buffers together as unused. Subsequent + * management of this list is done by freelist.c. + */ + buf->freeNext = i + 1; +#else + buf->freeNext = FREENEXT_NOT_IN_LIST; +#endif } /* @@ -331,12 +356,14 @@ StrategyInitialize(bool init) */ Assert(init); +#ifdef USE_BUFMGR_FREELIST /* * Grab the whole linked list of free buffers for our strategy. We * assume it was previously set up by InitBufferPool(). */ StrategyControl->firstFreeBuffer = 0; StrategyControl->lastFreeBuffer = NBuffers - 1; +#endif /* Initialize the clock sweep pointer */ StrategyControl->nextVictimBuffer = 0; diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h index e43719e..6a03d5d 100644 --- a/src/include/storage/buf_internals.h +++ b/src/include/storage/buf_internals.h @@ -190,6 +190,8 @@ extern bool StrategyRejectBuffer(BufferAccessStrategy strategy, extern int StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc); extern Size StrategyShmemSize(void); extern void StrategyInitialize(bool init); +extern void StrategyInitFreelistBuffer(volatile BufferDesc *buf); + /* buf_table.c */ extern Size BufTableShmemSize(int size); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Tue, Jan 3, 2012 at 11:28 PM, Tom Lane wrote: > Jim Nasby writes: >> On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote: >>> This could well be related to the fact that DropRelFileNodeBuffers() >>> does a scan of shared_buffers, which is an O(N) approach no matter the >>> size of the index. > >> Couldn't we just leave the buffers alone? Once an index is dropped and >> that's pushed out through the catalog then nothing should be trying to >> access them and they'll eventually just get aged out. > > No, we can't, because if they're still dirty then the bgwriter would > first try to write them to the no-longer-existing storage file. It's > important that we kill the buffers immediately during relation drop. > > I'm still thinking that it might be sufficient to mark the buffers > invalid and let the clock sweep find them, thereby eliminating the need > for a freelist. My patch puts things on the freelist only when it is free to do so. Not having a freelist at all is probably a simpler way of avoiding the lock contention, so I'll happily back that suggestion instead. Patch attached, previous patch revoked. > Simon is after a different solution involving getting > rid of the clock sweep... err, No, he isn't. Not sure where that came from since I'm advocating only minor changes there to curb worst case behaviour. But lets discuss that on the main freelist thread. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c index 94cefba..9332a74 100644 --- a/src/backend/storage/buffer/buf_init.c +++ b/src/backend/storage/buffer/buf_init.c @@ -115,7 +115,7 @@ InitBufferPool(void) * Initially link all the buffers together as unused. Subsequent * management of this list is done by freelist.c. */ - buf->freeNext = i + 1; + StrategyInitFreelistBuffer(buf); buf->io_in_progress_lock = LWLockAssign(); buf->content_lock = LWLockAssign(); diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c index 3e62448..6b49cae 100644 --- a/src/backend/storage/buffer/freelist.c +++ b/src/backend/storage/buffer/freelist.c @@ -27,6 +27,7 @@ typedef struct /* Clock sweep hand: index of next buffer to consider grabbing */ int nextVictimBuffer; +#ifdef USE_BUFMGR_FREELIST int firstFreeBuffer; /* Head of list of unused buffers */ int lastFreeBuffer; /* Tail of list of unused buffers */ @@ -34,7 +35,7 @@ typedef struct * NOTE: lastFreeBuffer is undefined when firstFreeBuffer is -1 (that is, * when the list is empty) */ - +#endif /* * Statistics. These counters should be wide enough that they can't * overflow during a single bgwriter cycle. @@ -134,6 +135,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held) */ StrategyControl->numBufferAllocs++; +#ifdef USE_BUFMGR_FREELIST /* * Try to get a buffer from the freelist. Note that the freeNext fields * are considered to be protected by the BufFreelistLock not the @@ -165,8 +167,9 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held) } UnlockBufHdr(buf); } +#endif - /* Nothing on the freelist, so run the "clock sweep" algorithm */ + /* Run the "clock sweep" algorithm */ trycounter = NBuffers; for (;;) { @@ -182,20 +185,25 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held) * If the buffer is pinned or has a nonzero usage_count, we cannot use * it; decrement the usage_count (unless pinned) and keep scanning. */ - LockBufHdr(buf); if (buf->refcount == 0) { - if (buf->usage_count > 0) + if (buf->usage_count > StrategyControl->completePasses) { buf->usage_count--; trycounter = NBuffers; } else { -/* Found a usable buffer */ -if (strategy != NULL) - AddBufferToRing(strategy, buf); -return buf; +LockBufHdr(buf); +if (buf->refcount == 0) +{ + UnlockBufHdr(buf); + /* Found a usable buffer */ + if (strategy != NULL) + AddBufferToRing(strategy, buf); + return buf; +} +UnlockBufHdr(buf); } } else if (--trycounter == 0) @@ -207,10 +215,8 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held) * probably better to fail than to risk getting stuck in an * infinite loop. */ - UnlockBufHdr(buf); elog(ERROR, "no unpinned buffers available"); } - UnlockBufHdr(buf); } /* not reached */ @@ -223,6 +229,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held) void StrategyFreeBuffer(volatile BufferDesc *buf) { +#ifdef USE_BUFMGR_FREELIST LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE); /* @@ -238,6 +245,24 @@ StrategyFreeBuffer(volatile BufferDesc *buf) } LWLockRelease(BufFreelistLock); +#endif +} + +/* + * StrategyInitFreelist: put a buffer on the freelist during InitBufferPool + */ +void +StrategyInit
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Jan 3, 2012, at 7:34 PM, Robert Haas wrote: > On Tue, Jan 3, 2012 at 7:11 PM, Tom Lane wrote: >> Jim Nasby writes: >>> Yeah, but the problem we run into is that with every backend trying to run >>> the clock on it's own we end up with high contention again... it's just in >>> a different place than when we had a true LRU. The clock sweep might be >>> cheaper than the linked list was, but it's still awfully expensive. I >>> believe our best bet is to have a free list that is actually useful in >>> normal operations, and then optimize the cost of pulling buffers out of >>> that list as much as possible (and let the bgwriter deal with keeping >>> enough pages in that list to satisfy demand). >> >> Well, maybe, but I think the historical evidence suggests that that >> approach will be a loser, simply because no matter how cheap, the >> freelist will remain a centralized and heavily contended data structure. >> IMO we need to be looking for a mechanism that has no single point of >> contention, and modifying the clock sweep rules looks like the best way >> to get there. >> >> Still, he who wants to do the work can try whatever approach he feels >> like. > > It might be possible to partition the free list. So imagine, for > example, 8 free lists. The background writer runs the clock sweep and > finds some buffers that are about ready to be reallocated and > distributes one-eighth of them to each free list. Then, when a > backend wants to allocate a buffer, it picks a free list somehow > (round robin?) and pulls a buffer off it. If the buffer turns out to > have been used since it was added to the free list, we give up on it > and try again. This hopefully shouldn't happen too often, though, as > long as we only add enough buffers to the free list to satisfy the > requests that we expect to get over the next > small-fraction-of-a-second. > > Of course you have to think about what happens if you find that your > chosen free list is empty. In that case you probably want to try a > different one, and if in the worst case where they're all empty, run > the clock sweep in the foreground. You probably also want to kick the > background writer in the pants if even a single one is empty (or > nearly empty?) and tell it to hurry up and add some more buffers to > the freelist. If it comes down to it, we can look at partitioning the free list. But here's the thing: this is the strategy that FreeBSD (and I think now Linux as well) use to service memory requests, be they for free memory or for reading data from disk. If it's good enough for an OS to use, I would expect we could make it work as well. I would expect that pulling a page off the free list would be an extremely fast operation... lock the read lock on the list (we should probably have separate locks for putting stuff on the list vs taking it off), pin the buffer (which shouldn't be contentious), update the read pointer and drop the read lock. Even in C code that's probably less than 100 machine instructions, and no looping. Compared to the cost of running a clock sweep it should be significantly faster. So while there will be contention on the free list read lock, none of that contention should last very long. Do we have any other places where we take extremely short locks like this and still run into problems? -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Tue, Jan 3, 2012 at 7:11 PM, Tom Lane wrote: > Jim Nasby writes: >> Yeah, but the problem we run into is that with every backend trying to run >> the clock on it's own we end up with high contention again... it's just in a >> different place than when we had a true LRU. The clock sweep might be >> cheaper than the linked list was, but it's still awfully expensive. I >> believe our best bet is to have a free list that is actually useful in >> normal operations, and then optimize the cost of pulling buffers out of that >> list as much as possible (and let the bgwriter deal with keeping enough >> pages in that list to satisfy demand). > > Well, maybe, but I think the historical evidence suggests that that > approach will be a loser, simply because no matter how cheap, the > freelist will remain a centralized and heavily contended data structure. > IMO we need to be looking for a mechanism that has no single point of > contention, and modifying the clock sweep rules looks like the best way > to get there. > > Still, he who wants to do the work can try whatever approach he feels > like. It might be possible to partition the free list. So imagine, for example, 8 free lists. The background writer runs the clock sweep and finds some buffers that are about ready to be reallocated and distributes one-eighth of them to each free list. Then, when a backend wants to allocate a buffer, it picks a free list somehow (round robin?) and pulls a buffer off it. If the buffer turns out to have been used since it was added to the free list, we give up on it and try again. This hopefully shouldn't happen too often, though, as long as we only add enough buffers to the free list to satisfy the requests that we expect to get over the next small-fraction-of-a-second. Of course you have to think about what happens if you find that your chosen free list is empty. In that case you probably want to try a different one, and if in the worst case where they're all empty, run the clock sweep in the foreground. You probably also want to kick the background writer in the pants if even a single one is empty (or nearly empty?) and tell it to hurry up and add some more buffers to the freelist. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
Jim Nasby writes: > Yeah, but the problem we run into is that with every backend trying to run > the clock on it's own we end up with high contention again... it's just in a > different place than when we had a true LRU. The clock sweep might be cheaper > than the linked list was, but it's still awfully expensive. I believe our > best bet is to have a free list that is actually useful in normal operations, > and then optimize the cost of pulling buffers out of that list as much as > possible (and let the bgwriter deal with keeping enough pages in that list to > satisfy demand). Well, maybe, but I think the historical evidence suggests that that approach will be a loser, simply because no matter how cheap, the freelist will remain a centralized and heavily contended data structure. IMO we need to be looking for a mechanism that has no single point of contention, and modifying the clock sweep rules looks like the best way to get there. Still, he who wants to do the work can try whatever approach he feels like. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Jan 3, 2012, at 5:28 PM, Tom Lane wrote: > Jim Nasby writes: >> On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote: >>> This could well be related to the fact that DropRelFileNodeBuffers() >>> does a scan of shared_buffers, which is an O(N) approach no matter the >>> size of the index. > >> Couldn't we just leave the buffers alone? Once an index is dropped and >> that's pushed out through the catalog then nothing should be trying to >> access them and they'll eventually just get aged out. > > No, we can't, because if they're still dirty then the bgwriter would > first try to write them to the no-longer-existing storage file. It's > important that we kill the buffers immediately during relation drop. > > I'm still thinking that it might be sufficient to mark the buffers > invalid and let the clock sweep find them, thereby eliminating the need > for a freelist. Simon is after a different solution involving getting > rid of the clock sweep, but he has failed to explain how that's not > going to end up being the same type of contention-prone coding that we > got rid of by adopting the clock sweep, some years ago. Yeah, the sweep > takes a lot of spinlocks, but that only matters if there is contention > for them, and the sweep approach avoids the need for a centralized data > structure. Yeah, but the problem we run into is that with every backend trying to run the clock on it's own we end up with high contention again... it's just in a different place than when we had a true LRU. The clock sweep might be cheaper than the linked list was, but it's still awfully expensive. I believe our best bet is to have a free list that is actually useful in normal operations, and then optimize the cost of pulling buffers out of that list as much as possible (and let the bgwriter deal with keeping enough pages in that list to satisfy demand). Heh, it occurs to me that the SQL analogy for how things work right now is that backends currently have to run a SeqScan (or 5) to find a free page... what we need to do is CREATE INDEX free ON buffers(buffer_id) WHERE count = 0;. > (BTW, do we have a separate clock sweep hand for each backend? If not, > there might be some low hanging fruit there.) No... having multiple clock hands is an interesting idea, but I'm worried that it could potentially get us into trouble if scores of backends were suddenly decrementing usage counts all over the place. For example, what if 5 backends all had their hands in basically the same place, all pointing at a very heavily used buffer. All 5 backends go for free space, they each grab the spinlock on that buffer in succession and suddenly this highly used buffer that started with a count of 5 has now been freed. We could potentially use more than one hand, but I think the relation between the number of hands and the maximum usage count has to be tightly controlled. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
Jim Nasby writes: > On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote: >> This could well be related to the fact that DropRelFileNodeBuffers() >> does a scan of shared_buffers, which is an O(N) approach no matter the >> size of the index. > Couldn't we just leave the buffers alone? Once an index is dropped and that's > pushed out through the catalog then nothing should be trying to access them > and they'll eventually just get aged out. No, we can't, because if they're still dirty then the bgwriter would first try to write them to the no-longer-existing storage file. It's important that we kill the buffers immediately during relation drop. I'm still thinking that it might be sufficient to mark the buffers invalid and let the clock sweep find them, thereby eliminating the need for a freelist. Simon is after a different solution involving getting rid of the clock sweep, but he has failed to explain how that's not going to end up being the same type of contention-prone coding that we got rid of by adopting the clock sweep, some years ago. Yeah, the sweep takes a lot of spinlocks, but that only matters if there is contention for them, and the sweep approach avoids the need for a centralized data structure. (BTW, do we have a separate clock sweep hand for each backend? If not, there might be some low hanging fruit there.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote: > This could well be related to the fact that DropRelFileNodeBuffers() > does a scan of shared_buffers, which is an O(N) approach no matter the > size of the index. > > On top of that, taking what Robert Haas mentioned on another thread, > InvalidateBuffer currently calls StretegyFreeBuffer(), which waits for > an ExclusiveLock on the BufFreelistLock. On a busy system this will be > heavily contended, so adding blocks to the freelist only if the lock > is free seems warranted. Couldn't we just leave the buffers alone? Once an index is dropped and that's pushed out through the catalog then nothing should be trying to access them and they'll eventually just get aged out. In fact, IIRC the function that scans for buffers actually checks to see if a rel still exists before it returns the buffer... -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Fri, Sep 9, 2011 at 11:02 PM, Daniel Farina wrote: > On Wed, Aug 24, 2011 at 1:04 PM, Daniel Farina wrote: >> On Wed, Aug 24, 2011 at 12:38 PM, Tom Lane wrote: >>> Assuming the issue really is the physical unlinks (which I agree I'd >>> like to see some evidence for), I wonder whether the problem could be >>> addressed by moving smgrDoPendingDeletes() to after locks are released, >>> instead of before, in CommitTransaction/AbortTransaction. There does >>> not seem to be any strong reason why we have to do that before lock >>> release, since incoming potential users of a table should not be trying >>> to access the old physical storage after that anyway. >> >> Alright, since this concern about confirming the expensive part of >> index dropping has come up a few times but otherwise the waters are >> warm, I'll go ahead and do some work to pin things down a bit before >> we continue working on those assumptions. >> > > This suspicion seems to be proven correct; there came an opportunity > where we were removing some indexes on a live system and I took the > opportunity to carefully control and time the process. There's not > much relationship between size of the index and the delay, but the > pauses are still very real. On the other hand, the first time this was > noticed there was significantly higher load. > > I'd still like to do something to solve this problem, though: even if > the time-consuming part of the process is not file unlinking, it's > clearly something after the AccessExclusiveLock is acquired based on > our other measurements. This could well be related to the fact that DropRelFileNodeBuffers() does a scan of shared_buffers, which is an O(N) approach no matter the size of the index. On top of that, taking what Robert Haas mentioned on another thread, InvalidateBuffer currently calls StretegyFreeBuffer(), which waits for an ExclusiveLock on the BufFreelistLock. On a busy system this will be heavily contended, so adding blocks to the freelist only if the lock is free seems warranted. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c index 3e62448..af7215f 100644 --- a/src/backend/storage/buffer/freelist.c +++ b/src/backend/storage/buffer/freelist.c @@ -218,12 +218,21 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held) } /* - * StrategyFreeBuffer: put a buffer on the freelist + * StrategyFreeBuffer: put a buffer on the freelist, unless we're busy */ void StrategyFreeBuffer(volatile BufferDesc *buf) { - LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE); + /* + * The buffer is already invalidated and is now an allocation target. + * Adding buffers back onto the freelist is an optimisation only, + * so we can decide to skip this step if the lock is busy. + * This improves the speed of dropping indexes and tables on a busy system. + * If the system is busy the newly invalidated buffers will be reallocated + * within one clock sweep. + */ + if (!LWLockConditionalAcquire(BufFreelistLock, LW_EXCLUSIVE)) + return; /* * It is possible that we are told to put something in the freelist that -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Fri, Dec 30, 2011 at 10:20 PM, Peter Eisentraut wrote: > On ons, 2011-08-24 at 11:24 -0700, Daniel Farina wrote: >> I was poking around at tablecmds and index.c and wonder if a similar >> two-pass approach as used by CREATE INDEX CONCURRENTLY can be used to >> create a DROP INDEX CONCURRENTLY, and if there would be any interest >> in accepting such a patch. > > Hmm, it seems I just independently came up with this same concept. My > problem is that if a CREATE INDEX CONCURRENTLY fails, you need an > exclusive lock on the table just to clean that up. If the table is > under constant load, you can't easily do that. So a two-pass DROP INDEX > CONCURRENTLY might have been helpful for me. Here's a patch for this. Please review. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services diff --git a/doc/src/sgml/ref/drop_index.sgml b/doc/src/sgml/ref/drop_index.sgml index 7177ef2..a5abb5b 100644 --- a/doc/src/sgml/ref/drop_index.sgml +++ b/doc/src/sgml/ref/drop_index.sgml @@ -21,7 +21,9 @@ PostgreSQL documentation -DROP INDEX [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ] +DROP INDEX + [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ] + CONCURRENTLY name @@ -50,6 +52,30 @@ DROP INDEX [ IF EXISTS ] name [, .. +CONCURRENTLY + + + When this option is used, PostgreSQL will drop the + index without taking any locks that prevent concurrent selects, inserts, + updates, or deletes on the table; whereas a standard index drop + waits for a lock that locks out everything on the table until it's done. + Concurrent drop index is a two stage process. First, we mark the index + invalid and then commit the change. Next we wait until there are no + users locking the table who can see the invalid index. + + + There are several caveats to be aware of when using this option. + Only one index name can be specified if the CONCURRENTLY + parameter is specified. Only one concurrent index drop can occur on a + table at a time and no modifications on the table are allowed meanwhile. + Regular DROP INDEX command can be performed within + a transaction block, but DROP INDEX CONCURRENTLY cannot. + There is no CASCADE option when dropping an index concurrently. + + + + + name diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c index 223c097..656af21 100644 --- a/src/backend/catalog/dependency.c +++ b/src/backend/catalog/dependency.c @@ -171,8 +171,9 @@ static void reportDependentObjects(const ObjectAddresses *targetObjects, DropBehavior behavior, int msglevel, const ObjectAddress *origObject); -static void deleteOneObject(const ObjectAddress *object, Relation depRel); -static void doDeletion(const ObjectAddress *object); +static void deleteOneObject(const ObjectAddress *object, Relation depRel, + int option_flags); +static void doDeletion(const ObjectAddress *object, int option_flags); static void AcquireDeletionLock(const ObjectAddress *object); static void ReleaseDeletionLock(const ObjectAddress *object); static bool find_expr_references_walker(Node *node, @@ -254,7 +255,7 @@ performDeletion(const ObjectAddress *object, { ObjectAddress *thisobj = targetObjects->refs + i; - deleteOneObject(thisobj, depRel); + deleteOneObject(thisobj, depRel, 0); } /* And clean up */ @@ -276,6 +277,13 @@ void performMultipleDeletions(const ObjectAddresses *objects, DropBehavior behavior) { + performMultipleDeletionsWithOptions(objects, behavior, 0); +} + +void +performMultipleDeletionsWithOptions(const ObjectAddresses *objects, + DropBehavior behavior, int option_flags) +{ Relation depRel; ObjectAddresses *targetObjects; int i; @@ -336,13 +344,17 @@ performMultipleDeletions(const ObjectAddresses *objects, { ObjectAddress *thisobj = targetObjects->refs + i; - deleteOneObject(thisobj, depRel); + deleteOneObject(thisobj, depRel, option_flags); } /* And clean up */ free_object_addresses(targetObjects); - heap_close(depRel, RowExclusiveLock); + /* + * We closed depRel earlier if doing a drop concurrently + */ + if ((option_flags & OPT_CONCURRENTLY) != OPT_CONCURRENTLY) + heap_close(depRel, RowExclusiveLock); } /* @@ -407,7 +419,7 @@ deleteWhatDependsOn(const ObjectAddress *object, if (thisextra->flags & DEPFLAG_ORIGINAL) continue; - deleteOneObject(thisobj, depRel); + deleteOneObject(thisobj, depRel, 0); } /* And clean up */ @@ -950,7 +962,7 @@ reportDependentObjects(const ObjectAddresses *targetObjects, * depRel is the already-open pg_depend relation. */ static void -deleteOneObject(const ObjectAddress *object, Relation depRel) +deleteOneObject(const ObjectAddress *object, Relation depRel, int option_flags) { ScanKeyData key[3]; int nkeys; @@ -1001,9 +1013,16 @@ deleteOneObject(const
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On ons, 2011-08-24 at 11:24 -0700, Daniel Farina wrote: > At Heroku we use CREATE INDEX CONCURRENTLY with great success, but > recently when frobbing around some indexes I realized that there is no > equivalent for DROP INDEX, and this is a similar but lesser problem > (as CREATE INDEX takes much longer), as DROP INDEX takes an ACCESS > EXCLUSIVE lock on the parent table while doing the work to unlink > files, which nominally one would think to be trivial, but I assure you > it is not at times for even indexes that are a handful of gigabytes > (let's say ~=< a dozen). By non-trivial, I mean it can take 30+ > seconds, but less than a couple of minutes. The storage layer > (starting from the higher levels of abstraction) are XFS, a somewhat > trivial lvm setup, mdraid (8-ways), Amazon EBS (NBD?). > > I was poking around at tablecmds and index.c and wonder if a similar > two-pass approach as used by CREATE INDEX CONCURRENTLY can be used to > create a DROP INDEX CONCURRENTLY, and if there would be any interest > in accepting such a patch. Hmm, it seems I just independently came up with this same concept. My problem is that if a CREATE INDEX CONCURRENTLY fails, you need an exclusive lock on the table just to clean that up. If the table is under constant load, you can't easily do that. So a two-pass DROP INDEX CONCURRENTLY might have been helpful for me. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Wed, Aug 24, 2011 at 1:04 PM, Daniel Farina wrote: > On Wed, Aug 24, 2011 at 12:38 PM, Tom Lane wrote: >> Assuming the issue really is the physical unlinks (which I agree I'd >> like to see some evidence for), I wonder whether the problem could be >> addressed by moving smgrDoPendingDeletes() to after locks are released, >> instead of before, in CommitTransaction/AbortTransaction. There does >> not seem to be any strong reason why we have to do that before lock >> release, since incoming potential users of a table should not be trying >> to access the old physical storage after that anyway. > > Alright, since this concern about confirming the expensive part of > index dropping has come up a few times but otherwise the waters are > warm, I'll go ahead and do some work to pin things down a bit before > we continue working on those assumptions. > This suspicion seems to be proven correct; there came an opportunity where we were removing some indexes on a live system and I took the opportunity to carefully control and time the process. There's not much relationship between size of the index and the delay, but the pauses are still very real. On the other hand, the first time this was noticed there was significantly higher load. I'd still like to do something to solve this problem, though: even if the time-consuming part of the process is not file unlinking, it's clearly something after the AccessExclusiveLock is acquired based on our other measurements. Back to the drawing board... -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Wed, Aug 24, 2011 at 12:38 PM, Tom Lane wrote: > Merlin Moncure writes: >> On Wed, Aug 24, 2011 at 1:24 PM, Daniel Farina wrote: >>> At Heroku we use CREATE INDEX CONCURRENTLY with great success, but >>> recently when frobbing around some indexes I realized that there is no >>> equivalent for DROP INDEX, and this is a similar but lesser problem >>> (as CREATE INDEX takes much longer), as DROP INDEX takes an ACCESS >>> EXCLUSIVE lock on the parent table while doing the work to unlink >>> files, which nominally one would think to be trivial, but I assure you >>> it is not at times for even indexes that are a handful of gigabytes >>> (let's say ~=< a dozen). > >> Are you sure that you are really waiting on the time to unlink the >> file? there's other stuff going on in there like waiting for lock, >> plan invalidation, etc. Point being, maybe the time consuming stuff >> can't really be deferred which would make the proposal moot. > > Assuming the issue really is the physical unlinks (which I agree I'd > like to see some evidence for), I wonder whether the problem could be > addressed by moving smgrDoPendingDeletes() to after locks are released, > instead of before, in CommitTransaction/AbortTransaction. There does > not seem to be any strong reason why we have to do that before lock > release, since incoming potential users of a table should not be trying > to access the old physical storage after that anyway. Alright, since this concern about confirming the expensive part of index dropping has come up a few times but otherwise the waters are warm, I'll go ahead and do some work to pin things down a bit before we continue working on those assumptions. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
Merlin Moncure writes: > On Wed, Aug 24, 2011 at 1:24 PM, Daniel Farina wrote: >> At Heroku we use CREATE INDEX CONCURRENTLY with great success, but >> recently when frobbing around some indexes I realized that there is no >> equivalent for DROP INDEX, and this is a similar but lesser problem >> (as CREATE INDEX takes much longer), as DROP INDEX takes an ACCESS >> EXCLUSIVE lock on the parent table while doing the work to unlink >> files, which nominally one would think to be trivial, but I assure you >> it is not at times for even indexes that are a handful of gigabytes >> (let's say ~=< a dozen). > Are you sure that you are really waiting on the time to unlink the > file? there's other stuff going on in there like waiting for lock, > plan invalidation, etc. Point being, maybe the time consuming stuff > can't really be deferred which would make the proposal moot. Assuming the issue really is the physical unlinks (which I agree I'd like to see some evidence for), I wonder whether the problem could be addressed by moving smgrDoPendingDeletes() to after locks are released, instead of before, in CommitTransaction/AbortTransaction. There does not seem to be any strong reason why we have to do that before lock release, since incoming potential users of a table should not be trying to access the old physical storage after that anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Wed, Aug 24, 2011 at 2:24 PM, Daniel Farina wrote: > Could I make the ACCESS EXCLUSIVE section just long enough to commit > catalog updates, and then have the bulk of the work happen afterwards? > > The general idea is: > > 1) set an index as "invalid", to ensure no backend will use it in planning > 2) wait for the xmin horizon to advance to ensure no open snapshots > that may not see the invalidation of the index are gone (is there a > way to tighten that up? although even this conservative version would > be 80-90% of the value for us...) > 3) then use performDeletions without taking a lock on the parent > table, similar to what's in tablecmds.c already. > > A DROP INDEX CONCURRENTLY may leave an invalid index if aborted > instead of waiting for statement confirmation, just like CREATE INDEX > CONCURRENTLY. This might be a dumb idea, but could we rearrange CommitTransaction() so that smgrDoPendingDeletes() happens just a bit further down, after those ResourceOwnerRelease() calls? It seems like that might accomplish what you're trying to do here without needing a new command. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Wed, Aug 24, 2011 at 1:24 PM, Daniel Farina wrote: > Hello list, > > At Heroku we use CREATE INDEX CONCURRENTLY with great success, but > recently when frobbing around some indexes I realized that there is no > equivalent for DROP INDEX, and this is a similar but lesser problem > (as CREATE INDEX takes much longer), as DROP INDEX takes an ACCESS > EXCLUSIVE lock on the parent table while doing the work to unlink > files, which nominally one would think to be trivial, but I assure you > it is not at times for even indexes that are a handful of gigabytes > (let's say ~=< a dozen). By non-trivial, I mean it can take 30+ > seconds, but less than a couple of minutes. The storage layer > (starting from the higher levels of abstraction) are XFS, a somewhat > trivial lvm setup, mdraid (8-ways), Amazon EBS (NBD?). Are you sure that you are really waiting on the time to unlink the file? there's other stuff going on in there like waiting for lock, plan invalidation, etc. Point being, maybe the time consuming stuff can't really be deferred which would make the proposal moot. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Should I implement DROP INDEX CONCURRENTLY?
Hello list, At Heroku we use CREATE INDEX CONCURRENTLY with great success, but recently when frobbing around some indexes I realized that there is no equivalent for DROP INDEX, and this is a similar but lesser problem (as CREATE INDEX takes much longer), as DROP INDEX takes an ACCESS EXCLUSIVE lock on the parent table while doing the work to unlink files, which nominally one would think to be trivial, but I assure you it is not at times for even indexes that are a handful of gigabytes (let's say ~=< a dozen). By non-trivial, I mean it can take 30+ seconds, but less than a couple of minutes. The storage layer (starting from the higher levels of abstraction) are XFS, a somewhat trivial lvm setup, mdraid (8-ways), Amazon EBS (NBD?). I was poking around at tablecmds and index.c and wonder if a similar two-pass approach as used by CREATE INDEX CONCURRENTLY can be used to create a DROP INDEX CONCURRENTLY, and if there would be any interest in accepting such a patch. Quoth index.c: /* * To drop an index safely, we must grab exclusive lock on its parent * table. Exclusive lock on the index alone is insufficient because * another backend might be about to execute a query on the parent table. * If it relies on a previously cached list of index OIDs, then it could * attempt to access the just-dropped index. We must therefore take a * table lock strong enough to prevent all queries on the table from * proceeding until we commit and send out a shared-cache-inval notice * that will make them update their index lists. */ Could I make the ACCESS EXCLUSIVE section just long enough to commit catalog updates, and then have the bulk of the work happen afterwards? The general idea is: 1) set an index as "invalid", to ensure no backend will use it in planning 2) wait for the xmin horizon to advance to ensure no open snapshots that may not see the invalidation of the index are gone (is there a way to tighten that up? although even this conservative version would be 80-90% of the value for us...) 3) then use performDeletions without taking a lock on the parent table, similar to what's in tablecmds.c already. A DROP INDEX CONCURRENTLY may leave an invalid index if aborted instead of waiting for statement confirmation, just like CREATE INDEX CONCURRENTLY. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers