On Thu, Jun 16, 2016 at 3:28 AM, Amit Kapila <amit.kapil...@gmail.com> wrote: >> Incomplete splits can be completed either by vacuum or insert as both >> needs exclusive lock on bucket. If vacuum finds split-in-progress flag on a >> bucket then it will complete the split operation, vacuum won't see this flag >> if actually split is in progress on that bucket as vacuum needs cleanup lock >> and split retains pin till end of operation. To make it work for Insert >> operation, one simple idea could be that if insert finds split-in-progress >> flag, then it releases the current exclusive lock on bucket and tries to >> acquire a cleanup lock on bucket, if it gets cleanup lock, then it can >> complete the split and then the insertion of tuple, else it will have a >> exclusive lock on bucket and just perform the insertion of tuple. The >> disadvantage of trying to complete the split in vacuum is that split might >> require new pages and allocating new pages at time of vacuum is not >> advisable. The disadvantage of doing it at time of Insert is that Insert >> might skip it even if there is some scan on the bucket is going on as scan >> will also retain pin on the bucket, but I think that is not a big deal. The >> actual completion of split can be done in two ways: (a) scan the new bucket >> and build a hash table with all of the TIDs you find there. When copying >> tuples from the old bucket, first probe the hash table; if you find a match, >> just skip that tuple (idea suggested by Robert Haas offlist) (b) delete all >> the tuples that are marked as moved_by_split in the new bucket and perform >> the split operation from the beginning using old bucket. > > I have completed the patch with respect to incomplete splits and delayed > cleanup of garbage tuples. For incomplete splits, I have used the option > (a) as mentioned above. The incomplete splits are completed if the > insertion sees split-in-progress flag in a bucket.
It seems to me that there is a potential performance problem here. If the split is still being performed, every insert will see the split-in-progress flag set. The in-progress split retains only a pin on the primary bucket, so other backends could also get an exclusive lock, which is all they need for an insert. It seems that under this algorithm they will now take the exclusive lock, release the exclusive lock, try to take a cleanup lock, fail, again take the exclusive lock. That seems like a lot of extra monkeying around. Wouldn't it be better to take the exclusive lock and then afterwards check if the pin count is 1? If so, even though we only intended to take an exclusive lock, it is actually a cleanup lock. If not, we can simply proceed with the insertion. This way you avoid unlocking and relocking the buffer repeatedly. > The second major thing > this new version of patch has achieved is cleanup of garbage tuples i.e the > tuples that are left in old bucket during split. Currently (in HEAD), as > part of a split operation, we clean the tuples from old bucket after moving > them to new bucket, as we have heavy-weight locks on both old and new bucket > till the whole split operation. In the new design, we need to take cleanup > lock on old bucket and exclusive lock on new bucket to perform the split > operation and we don't retain those locks till the end (release the lock as > we move on to overflow buckets). Now to cleanup the tuples we need a > cleanup lock on a bucket which we might not have at split-end. So I choose > to perform the cleanup of garbage tuples during vacuum and when re-split of > the bucket happens as during both the operations, we do hold cleanup lock. > We can extend the cleanup of garbage to other operations as well if > required. I think it's OK for the squeeze phase to be deferred until vacuum or a subsequent split, but simply removing dead tuples seems like it should be done earlier if possible. As I noted in my last email, it seems like any process that gets an exclusive lock can do that, and probably should. Otherwise, the index might become quite bloated. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers