Re: [HACKERS] Gin page deletion bug

2013-11-08 Thread Heikki Linnakangas

On 07.11.2013 22:49, Heikki Linnakangas wrote:

Gin page deletion fails to take into account that there might be a
search in-flight to the page that is deleted. If the page is reused for
something else, the search can get very confused.

...

The regular b-tree code solves this by stamping deleted pages with the
current XID, and only allowing them to be reused once that XID becomes
old enough ( RecentGlobalXmin). Another approach might be to grab a
cleanup-strength lock on the left and parent pages when deleting a page,
and requiring search to keep the pin on the page its coming from, until
it has locked the next page.


I came up with the attached fix. In a nutshell, when walking along a 
right-link, the new page is locked before releasing the lock on the old 
one. Also, never delete the leftmost branch of a posting tree. I believe 
these changes are sufficient to fix the problem, because of the way the 
posting tree is searched:


All searches on a posting tree are full searches, scanning the whole 
tree from left to right. Insertions do seek to the middle of the tree, 
but they are locked out during vacuum by holding a cleanup lock on the 
posting tree root (even without this patch). Thanks to this property, a 
search cannot step on a page that's being deleted by following a 
downlink, if we refrain from deleting the leftmost page on a level, as 
searches only descend down the leftmost branch.


It would be nice to improve that in master - holding an exclusive lock 
on the root page is pretty horrible - but this is a nice back-patchable 
patch. I'm not worried about the loss of concurrency because we now have 
to hold the lock on previous page when stepping to next page. In 
searches, it's only a share-lock, and in insertions, it's very rare that 
you have to step right.


The patch also adds some sanity checks to stepping right: the next page 
should be of the same type as the current page, e.g stepping right 
should not go from leaf to non-leaf or vice versa.


- Heikki
diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index 2a6be4b..a738d80 100644
--- a/src/backend/access/gin/ginbtree.c
+++ b/src/backend/access/gin/ginbtree.c
@@ -112,10 +112,8 @@ ginFindLeafPage(GinBtree btree, GinBtreeStack *stack)
 /* rightmost page */
 break;
 
+			stack-buffer = ginStepRight(stack-buffer, btree-index, access);
 			stack-blkno = rightlink;
-			LockBuffer(stack-buffer, GIN_UNLOCK);
-			stack-buffer = ReleaseAndReadBuffer(stack-buffer, btree-index, stack-blkno);
-			LockBuffer(stack-buffer, access);
 			page = BufferGetPage(stack-buffer);
 		}
 
@@ -148,6 +146,41 @@ ginFindLeafPage(GinBtree btree, GinBtreeStack *stack)
 	}
 }
 
+/*
+ * Step right from current page.
+ *
+ * The next page is locked first, before releasing the current page. This is
+ * crucial to protect from concurrent page deletion (see comment in
+ * ginDeletePage).
+ */
+Buffer
+ginStepRight(Buffer buffer, Relation index, int lockmode)
+{
+	Buffer		nextbuffer;
+	Page		page = BufferGetPage(buffer);
+	bool		isLeaf = GinPageIsLeaf(page);
+	bool		isData = GinPageIsData(page);
+	BlockNumber	blkno = GinPageGetOpaque(page)-rightlink;
+
+	nextbuffer = ReadBuffer(index, blkno);
+	LockBuffer(nextbuffer, lockmode);
+	UnlockReleaseBuffer(buffer);
+
+	/* Sanity check that the page we stepped to is of similar kind. */
+	page = BufferGetPage(nextbuffer);
+	if (isLeaf != GinPageIsLeaf(page) || isData != GinPageIsData(page))
+		elog(ERROR, right sibling of GIN page is of different type);
+
+	/*
+	 * Given the proper lock sequence above, we should never land on a
+	 * deleted page.
+	 */
+	if  (GinPageIsDeleted(page))
+		elog(ERROR, right sibling of GIN page was deleted);
+
+	return nextbuffer;
+}
+
 void
 freeGinBtreeStack(GinBtreeStack *stack)
 {
@@ -235,12 +268,12 @@ ginFindParents(GinBtree btree, GinBtreeStack *stack,
 		while ((offset = btree-findChildPtr(btree, page, stack-blkno, InvalidOffsetNumber)) == InvalidOffsetNumber)
 		{
 			blkno = GinPageGetOpaque(page)-rightlink;
-			LockBuffer(buffer, GIN_UNLOCK);
-			ReleaseBuffer(buffer);
 			if (blkno == InvalidBlockNumber)
+			{
+UnlockReleaseBuffer(buffer);
 break;
-			buffer = ReadBuffer(btree-index, blkno);
-			LockBuffer(buffer, GIN_EXCLUSIVE);
+			}
+			buffer = ginStepRight(buffer, btree-index, GIN_EXCLUSIVE);
 			page = BufferGetPage(buffer);
 		}
 
@@ -439,23 +472,21 @@ ginInsertValue(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
 		{
 			BlockNumber rightlink = GinPageGetOpaque(page)-rightlink;
 
-			LockBuffer(parent-buffer, GIN_UNLOCK);
-
 			if (rightlink == InvalidBlockNumber)
 			{
 /*
  * rightmost page, but we don't find parent, we should use
  * plain search...
  */
+LockBuffer(parent-buffer, GIN_UNLOCK);
 ginFindParents(btree, stack, rootBlkno);
 parent = stack-parent;
 Assert(parent != NULL);
 break;
 			}
 
+			parent-buffer = ginStepRight(parent-buffer, btree-index, GIN_EXCLUSIVE);
 			

Re: [HACKERS] Gin page deletion bug

2013-11-08 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 I came up with the attached fix. In a nutshell, when walking along a 
 right-link, the new page is locked before releasing the lock on the old 
 one. Also, never delete the leftmost branch of a posting tree. I believe 
 these changes are sufficient to fix the problem, because of the way the 
 posting tree is searched:

This seems reasonable, but I wonder whether the concurrency discussion
shouldn't be in gist/README, rather than buried in a comment inside
ginDeletePage (not exactly the first place one would think to look IMO).

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gin page deletion bug

2013-11-08 Thread Heikki Linnakangas

On 08.11.2013 19:14, Tom Lane wrote:

Heikki Linnakangas hlinnakan...@vmware.com writes:

I came up with the attached fix. In a nutshell, when walking along a
right-link, the new page is locked before releasing the lock on the old
one. Also, never delete the leftmost branch of a posting tree. I believe
these changes are sufficient to fix the problem, because of the way the
posting tree is searched:


This seems reasonable, but I wonder whether the concurrency discussion
shouldn't be in gist/README, rather than buried in a comment inside
ginDeletePage (not exactly the first place one would think to look IMO).


Yeah, README is probably better. There's zero explanation of concurrency 
and locking issues there at the moment, so I added a new section there 
explaining why the page deletion is (now) safe. There's a lot more to 
say about locking in GIN, but I'll leave that for another day.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers