Re: [HACKERS] [PATCH] introduce XLogLockBlockRangeForCleanup()

2015-06-23 Thread Abhijit Menon-Sen
At 2014-08-20 11:07:44 +0300, hlinnakan...@vmware.com wrote:

 I don't think the new GetBufferWithoutRelcache function is in line
 with the existing ReadBuffer API. I think it would be better to add a
 new ReadBufferMode - RBM_CACHE_ONLY? - that only returns the buffer if
 it's already in cache, and InvalidBuffer otherwise.

Hi Heikki.

I liked the idea of having a new ReadBufferMode of RBM_CACHE_ONLY, but I
wasn't happy with the result when I tried to modify the code to suit. It
didn't make the code easier for me to follow.

(I'll say straight up that it can be done the way you said, and if the
consensus is that it would be an improvement, I'll do it that way. I'm
just not convinced about it, hence this mail.)

For example:

 static Buffer
 ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
   BlockNumber blockNum, ReadBufferMode mode,
   BufferAccessStrategy strategy, bool *hit)
 {
 volatile BufferDesc *bufHdr;
 Block   bufBlock;
 boolfound;
 boolisExtend;
 boolisLocalBuf = SmgrIsTemp(smgr);
 
 *hit = false;
 
 /* Make sure we will have room to remember the buffer pin */
 ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
 
 isExtend = (blockNum == P_NEW);

isLocalBuf and isExtend will both be false for the RBM_CACHE_ONLY case,
which is good for us. But that probably needs to be mentioned in a
comment here.
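Something like this, perhaps (just a sketch of the comment I have in
mind; the wording is mine, not from the patch):

/*
 * In RBM_CACHE_ONLY mode, the caller is asking about an existing
 * block of a non-temporary relation, so isLocalBuf and isExtend are
 * both false here. The code below depends on that.
 */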

 TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
                                    smgr->smgr_rnode.node.spcNode,
                                    smgr->smgr_rnode.node.dbNode,
                                    smgr->smgr_rnode.node.relNode,
                                    smgr->smgr_rnode.backend,
                                    isExtend);

We know we're not going to be doing IO in the RBM_CACHE_ONLY case, but
that's probably OK? I don't understand this TRACE_* stuff. But we need
to either skip this, or also do the corresponding TRACE_*_DONE later.
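For example, we could guard the probe like this (sketch only; I haven't
checked how these DTrace probes are meant to be paired):

if (mode != RBM_CACHE_ONLY)
    TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
                                       smgr->smgr_rnode.node.spcNode,
                                       smgr->smgr_rnode.node.dbNode,
                                       smgr->smgr_rnode.node.relNode,
                                       smgr->smgr_rnode.backend,
                                       isExtend);

…with the same test around TRACE_POSTGRESQL_BUFFER_READ_DONE if we
don't return early.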

 if (isLocalBuf)
 {
 bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found);
 if (found)
 pgBufferUsage.local_blks_hit++;
 else
 pgBufferUsage.local_blks_read++;
 }
 else
 {
 /*
  * lookup the buffer.  IO_IN_PROGRESS is set if the requested block is
  * not currently in memory.
  */
 bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
                      strategy, &found);
 if (found)
 pgBufferUsage.shared_blks_hit++;
 else
 pgBufferUsage.shared_blks_read++;
 }

The nicest option I could think of here was to copy the shared_buffers
lookup from BufferAlloc into a new BufferAlloc_shared function, and add
a new branch for RBM_CACHE_ONLY. That would look like:

if (isLocalBuf)
…
else if (mode == RBM_CACHE_ONLY)
{
/*
 * We check to see if the buffer is already cached, and if it's
 * not, we return InvalidBuffer because we know it's not pinned.
 */
bufHdr = BufferAlloc_shared(…, &found);
if (!found)
return InvalidBuffer;
LockBufferForCleanup(BufferDescriptorGetBuffer(bufHdr));
return BufferDescriptorGetBuffer(bufHdr);
}
else
{
/*
 * lookup the buffer …
 */
}
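BufferAlloc_shared doesn't exist yet; what I have in mind is an
untested sketch along these lines — just the mapping-table lookup from
BufferAlloc, plus the pin-only-if-valid sequence Simon suggested
earlier in this thread, with no victim selection and no IO:

static volatile BufferDesc *
BufferAlloc_shared(SMgrRelation smgr, ForkNumber forkNum,
                   BlockNumber blockNum, bool *foundPtr)
{
    BufferTag   tag;            /* identity of requested block */
    uint32      hash;           /* hash value for tag */
    LWLock     *partitionLock;  /* buffer partition lock */
    int         buf_id;
    volatile BufferDesc *buf;
    bool        valid;

    /* create a tag so we can look up the buffer */
    INIT_BUFFERTAG(tag, smgr->smgr_rnode.node, forkNum, blockNum);
    hash = BufTableHashCode(&tag);
    partitionLock = BufMappingPartitionLock(hash);

    /* see if the block is in the buffer pool already */
    LWLockAcquire(partitionLock, LW_SHARED);
    buf_id = BufTableLookup(&tag, hash);
    if (buf_id < 0)
    {
        /* not cached, so certainly not pinned */
        LWLockRelease(partitionLock);
        *foundPtr = false;
        return NULL;
    }

    buf = &BufferDescriptors[buf_id];

    /* check BM_VALID and pin while holding the buffer header spinlock */
    LockBufHdr(buf);
    valid = ((buf->flags & BM_VALID) != 0);
    if (valid)
        PinBuffer_Locked(buf);  /* releases the spinlock */
    else
        UnlockBufHdr(buf);
    LWLockRelease(partitionLock);

    *foundPtr = valid;
    return valid ? buf : NULL;
}

(The caller has already done ResourceOwnerEnlargeBuffers(), so pinning
here should be safe.)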

…or if we went with the rest of the code and didn't do the lock/return
immediately, we would fall through to this code:

 /* At this point we do NOT hold any locks. */
 
 /* if it was already in the buffer pool, we're done */
 if (found)
 {
 if (!isExtend)
 {
 /* Just need to update stats before we exit */
 *hit = true;
 VacuumPageHit++;
 
 if (VacuumCostActive)
 VacuumCostBalance += VacuumCostPageHit;
 
 TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
                                   smgr->smgr_rnode.node.spcNode,
                                   smgr->smgr_rnode.node.dbNode,
                                   smgr->smgr_rnode.node.relNode,
                                   smgr->smgr_rnode.backend,
                                   isExtend,
                                   found);
 
 /*
  * In RBM_ZERO_AND_LOCK mode the caller expects the page to be
  * locked on return.
  */
 if (!isLocalBuf)
 {
 if (mode == RBM_ZERO_AND_LOCK)
 LWLockAcquire(bufHdr->content_lock, LW_EXCLUSIVE);
 else if (mode == RBM_ZERO_AND_CLEANUP_LOCK)
 LockBufferForCleanup(BufferDescriptorGetBuffer(bufHdr));
 }
 
 return BufferDescriptorGetBuffer(bufHdr);
 }

Some of this isn't applicable to RBM_CACHE_ONLY, so we could 

Re: [HACKERS] [PATCH] introduce XLogLockBlockRangeForCleanup()

2014-08-20 Thread Heikki Linnakangas

On 07/07/2014 11:46 AM, Abhijit Menon-Sen wrote:

At 2014-07-07 14:02:15 +0530, amit.khande...@enterprisedb.com wrote:


Other than some minor comments as mentioned below, I don't have any
more issues; it looks all good.


Thank you. (Updated patch attached.)


I don't think the new GetBufferWithoutRelcache function is in line with 
the existing ReadBuffer API. I think it would be better to add a new 
ReadBufferMode - RBM_CACHE_ONLY? - that only returns the buffer if it's 
already in cache, and InvalidBuffer otherwise.


- Heikki





Re: [HACKERS] [PATCH] introduce XLogLockBlockRangeForCleanup()

2014-07-07 Thread Amit Khandekar
On 4 July 2014 19:11, Abhijit Menon-Sen a...@2ndquadrant.com wrote:

 Updated patch attached, thanks.

 Amit: what's your conclusion from the review?


Other than some minor comments as mentioned below, I don't have any more
issues; it looks all good.

The XLogLockBlockRangeForCleanup() function's header comment has the
function name misspelled as XLogBlockRangeForCleanup.

In GetBufferWithoutRelcache(), we can call BufferDescriptorGetBuffer(buf)
rather than BufferDescriptorGetBuffer(&BufferDescriptors[buf_id]).



 -- Abhijit



Re: [HACKERS] [PATCH] introduce XLogLockBlockRangeForCleanup()

2014-07-07 Thread Abhijit Menon-Sen
At 2014-07-07 14:02:15 +0530, amit.khande...@enterprisedb.com wrote:

 Other than some minor comments as mentioned below, I don't have any
 more issues; it looks all good.

Thank you. (Updated patch attached.)

-- Abhijit
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 5f9fc49..dc90c02 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -501,33 +501,9 @@ btree_xlog_vacuum(XLogRecPtr lsn, XLogRecord *record)
 	 * here during crash recovery.
 	 */
 	if (HotStandbyActiveInReplay())
-	{
-		BlockNumber blkno;
-
-		for (blkno = xlrec->lastBlockVacuumed + 1; blkno < xlrec->block; blkno++)
-		{
-			/*
-			 * We use RBM_NORMAL_NO_LOG mode because it's not an error
-			 * condition to see all-zero pages.  The original btvacuumpage
-			 * scan would have skipped over all-zero pages, noting them in FSM
-			 * but not bothering to initialize them just yet; so we mustn't
-			 * throw an error here.  (We could skip acquiring the cleanup lock
-			 * if PageIsNew, but it's probably not worth the cycles to test.)
-			 *
-			 * XXX we don't actually need to read the block, we just need to
-			 * confirm it is unpinned. If we had a special call into the
-			 * buffer manager we could optimise this so that if the block is
-			 * not in shared_buffers we confirm it as unpinned.
-			 */
-			buffer = XLogReadBufferExtended(xlrec->node, MAIN_FORKNUM, blkno,
-											RBM_NORMAL_NO_LOG);
-			if (BufferIsValid(buffer))
-			{
-				LockBufferForCleanup(buffer);
-				UnlockReleaseBuffer(buffer);
-			}
-		}
-	}
+		XLogLockBlockRangeForCleanup(xlrec->node, MAIN_FORKNUM,
+									 xlrec->lastBlockVacuumed + 1,
+									 xlrec->block);
 
 	/*
 	 * If we have a full-page image, restore it (using a cleanup lock) and
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index b7829ff..a9aea31 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -287,10 +287,6 @@ XLogReadBuffer(RelFileNode rnode, BlockNumber blkno, bool init)
  *
  * In RBM_ZERO and RBM_ZERO_ON_ERROR modes, if the page doesn't exist, the
  * relation is extended with all-zeroes pages up to the given block number.
- *
- * In RBM_NORMAL_NO_LOG mode, we return InvalidBuffer if the page doesn't
- * exist, and we don't check for all-zeroes.  Thus, no log entry is made
- * to imply that the page should be dropped or truncated later.
  */
 Buffer
 XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
@@ -331,8 +327,6 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 			log_invalid_page(rnode, forknum, blkno, false);
 			return InvalidBuffer;
 		}
-		if (mode == RBM_NORMAL_NO_LOG)
-			return InvalidBuffer;
 		/* OK to extend the file */
 		/* we do this in recovery only - no rel-extension lock needed */
 		Assert(InRecovery);
@@ -375,6 +369,62 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 	return buffer;
 }
 
+/*
+ * XLogLockBlockRangeForCleanup is used in Hot Standby mode to emulate
+ * the locking effects imposed by VACUUM for nbtrees.
+ */
+void
+XLogLockBlockRangeForCleanup(RelFileNode rnode, ForkNumber forkNum,
+			 BlockNumber startBlkno,
+			 BlockNumber uptoBlkno)
+{
+	BlockNumber blkno;
+	BlockNumber lastblock;
+	BlockNumber	endingBlkno;
+	Buffer		buffer;
+	SMgrRelation smgr;
+
+	Assert(startBlkno != P_NEW);
+	Assert(uptoBlkno != P_NEW);
+
+	/* Open the relation at smgr level */
+	smgr = smgropen(rnode, InvalidBackendId);
+
+	/*
+	 * Create the target file if it doesn't already exist.  This lets us cope
+	 * if the replay sequence contains writes to a relation that is later
+	 * deleted.  (The original coding of this routine would instead suppress
+	 * the writes, but that seems like it risks losing valuable data if the
+	 * filesystem loses an inode during a crash.  Better to write the data
+	 * until we are actually told to delete the file.)
+	 */
+	smgrcreate(smgr, forkNum, true);
+
+	lastblock = smgrnblocks(smgr, forkNum);
+
+	endingBlkno = uptoBlkno;
+	if (lastblock < endingBlkno)
+		endingBlkno = lastblock;
+
+	for (blkno = startBlkno; blkno < endingBlkno; blkno++)
+	{
+		/*
+		 * All we need to do here is prove that we can lock each block
+		 * with a cleanup lock. It's not an error to see all-zero pages
+		 * here because the original btvacuumpage would not have thrown
+		 * an error either.
+		 *
+		 * We don't actually need to read the block, we just need to
+		 * confirm it is unpinned.
+		 */
+		buffer = GetBufferWithoutRelcache(rnode, forkNum, blkno);
+		if (BufferIsValid(buffer))
+		{
+			LockBufferForCleanup(buffer);
+			UnlockReleaseBuffer(buffer);
+		}
+	}
+}
 
 /*
  * Struct actually returned by XLogFakeRelcacheEntry, though the declared
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 07ea665..f7976a0 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ 

Re: [HACKERS] [PATCH] introduce XLogLockBlockRangeForCleanup()

2014-07-04 Thread Amit Khandekar
On 3 July 2014 16:59, Simon Riggs si...@2ndquadrant.com wrote:


 I think we should say this though

 LockBufHdr(buf);
 valid = ((buf->flags & BM_VALID) != 0);
 if (valid)
 PinBuffer_Locked(buf);
 else
 UnlockBufHdr(buf);

 since otherwise we would access the buffer flags without the spinlock
 and we would leak a pin if the buffer was not valid


Ah right. That is essential.


Re: [HACKERS] [PATCH] introduce XLogLockBlockRangeForCleanup()

2014-07-04 Thread Abhijit Menon-Sen
Updated patch attached, thanks.

Amit: what's your conclusion from the review?

-- Abhijit
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 5f9fc49..dc90c02 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -501,33 +501,9 @@ btree_xlog_vacuum(XLogRecPtr lsn, XLogRecord *record)
 	 * here during crash recovery.
 	 */
 	if (HotStandbyActiveInReplay())
-	{
-		BlockNumber blkno;
-
-		for (blkno = xlrec->lastBlockVacuumed + 1; blkno < xlrec->block; blkno++)
-		{
-			/*
-			 * We use RBM_NORMAL_NO_LOG mode because it's not an error
-			 * condition to see all-zero pages.  The original btvacuumpage
-			 * scan would have skipped over all-zero pages, noting them in FSM
-			 * but not bothering to initialize them just yet; so we mustn't
-			 * throw an error here.  (We could skip acquiring the cleanup lock
-			 * if PageIsNew, but it's probably not worth the cycles to test.)
-			 *
-			 * XXX we don't actually need to read the block, we just need to
-			 * confirm it is unpinned. If we had a special call into the
-			 * buffer manager we could optimise this so that if the block is
-			 * not in shared_buffers we confirm it as unpinned.
-			 */
-			buffer = XLogReadBufferExtended(xlrec->node, MAIN_FORKNUM, blkno,
-											RBM_NORMAL_NO_LOG);
-			if (BufferIsValid(buffer))
-			{
-				LockBufferForCleanup(buffer);
-				UnlockReleaseBuffer(buffer);
-			}
-		}
-	}
+		XLogLockBlockRangeForCleanup(xlrec->node, MAIN_FORKNUM,
+									 xlrec->lastBlockVacuumed + 1,
+									 xlrec->block);
 
 	/*
 	 * If we have a full-page image, restore it (using a cleanup lock) and
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index b7829ff..d84f83a 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -287,10 +287,6 @@ XLogReadBuffer(RelFileNode rnode, BlockNumber blkno, bool init)
  *
  * In RBM_ZERO and RBM_ZERO_ON_ERROR modes, if the page doesn't exist, the
  * relation is extended with all-zeroes pages up to the given block number.
- *
- * In RBM_NORMAL_NO_LOG mode, we return InvalidBuffer if the page doesn't
- * exist, and we don't check for all-zeroes.  Thus, no log entry is made
- * to imply that the page should be dropped or truncated later.
  */
 Buffer
 XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
@@ -331,8 +327,6 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 			log_invalid_page(rnode, forknum, blkno, false);
 			return InvalidBuffer;
 		}
-		if (mode == RBM_NORMAL_NO_LOG)
-			return InvalidBuffer;
 		/* OK to extend the file */
 		/* we do this in recovery only - no rel-extension lock needed */
 		Assert(InRecovery);
@@ -375,6 +369,62 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 	return buffer;
 }
 
+/*
+ * XLogBlockRangeForCleanup is used in Hot Standby mode to emulate the
+ * locking effects imposed by VACUUM for nbtrees.
+ */
+void
+XLogLockBlockRangeForCleanup(RelFileNode rnode, ForkNumber forkNum,
+			 BlockNumber startBlkno,
+			 BlockNumber uptoBlkno)
+{
+	BlockNumber blkno;
+	BlockNumber lastblock;
+	BlockNumber	endingBlkno;
+	Buffer		buffer;
+	SMgrRelation smgr;
+
+	Assert(startBlkno != P_NEW);
+	Assert(uptoBlkno != P_NEW);
+
+	/* Open the relation at smgr level */
+	smgr = smgropen(rnode, InvalidBackendId);
+
+	/*
+	 * Create the target file if it doesn't already exist.  This lets us cope
+	 * if the replay sequence contains writes to a relation that is later
+	 * deleted.  (The original coding of this routine would instead suppress
+	 * the writes, but that seems like it risks losing valuable data if the
+	 * filesystem loses an inode during a crash.  Better to write the data
+	 * until we are actually told to delete the file.)
+	 */
+	smgrcreate(smgr, forkNum, true);
+
+	lastblock = smgrnblocks(smgr, forkNum);
+
+	endingBlkno = uptoBlkno;
+	if (lastblock < endingBlkno)
+		endingBlkno = lastblock;
+
+	for (blkno = startBlkno; blkno < endingBlkno; blkno++)
+	{
+		/*
+		 * All we need to do here is prove that we can lock each block
+		 * with a cleanup lock. It's not an error to see all-zero pages
+		 * here because the original btvacuumpage would not have thrown
+		 * an error either.
+		 *
+		 * We don't actually need to read the block, we just need to
+		 * confirm it is unpinned.
+		 */
+		buffer = GetBufferWithoutRelcache(rnode, forkNum, blkno);
+		if (BufferIsValid(buffer))
+		{
+			LockBufferForCleanup(buffer);
+			UnlockReleaseBuffer(buffer);
+		}
+	}
+}
 
 /*
  * Struct actually returned by XLogFakeRelcacheEntry, though the declared
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 07ea665..ce5ff70 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -222,8 +222,6 @@ ReadBuffer(Relation reln, BlockNumber blockNum)
  * current physical EOF; that is likely to cause problems in md.c when
  

Re: [HACKERS] [PATCH] introduce XLogLockBlockRangeForCleanup()

2014-07-03 Thread Abhijit Menon-Sen
At 2014-07-03 11:15:53 +0530, amit.khande...@enterprisedb.com wrote:

 In GetBufferWithoutRelcache(), I was wondering, rather than calling
 PinBuffer(), if we do this :
 LockBufHdr(buf);
 PinBuffer_Locked(buf);
 valid = ((buf->flags & BM_VALID) != 0);
 then we can avoid having the new buffer access strategy BAS_DISCARD
 that is introduced in this patch. And so the code changes in
 freelist.c would not be necessary.

Thanks for the suggestion. It sounds sensible at first glance. I'll
think about it a little more, then try it and see.

  will follow up with some performance numbers soon.
 
 Yes, that would be nice.

I'm afraid I haven't had a chance to do this yet.

-- Abhijit




Re: [HACKERS] [PATCH] introduce XLogLockBlockRangeForCleanup()

2014-07-03 Thread Abhijit Menon-Sen
FYI, I've attached a patch that does what you suggested. I haven't done
anything else (i.e. testing) with it yet.

-- Abhijit
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 5f9fc49..dc90c02 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -501,33 +501,9 @@ btree_xlog_vacuum(XLogRecPtr lsn, XLogRecord *record)
 	 * here during crash recovery.
 	 */
 	if (HotStandbyActiveInReplay())
-	{
-		BlockNumber blkno;
-
-		for (blkno = xlrec->lastBlockVacuumed + 1; blkno < xlrec->block; blkno++)
-		{
-			/*
-			 * We use RBM_NORMAL_NO_LOG mode because it's not an error
-			 * condition to see all-zero pages.  The original btvacuumpage
-			 * scan would have skipped over all-zero pages, noting them in FSM
-			 * but not bothering to initialize them just yet; so we mustn't
-			 * throw an error here.  (We could skip acquiring the cleanup lock
-			 * if PageIsNew, but it's probably not worth the cycles to test.)
-			 *
-			 * XXX we don't actually need to read the block, we just need to
-			 * confirm it is unpinned. If we had a special call into the
-			 * buffer manager we could optimise this so that if the block is
-			 * not in shared_buffers we confirm it as unpinned.
-			 */
-			buffer = XLogReadBufferExtended(xlrec->node, MAIN_FORKNUM, blkno,
-											RBM_NORMAL_NO_LOG);
-			if (BufferIsValid(buffer))
-			{
-				LockBufferForCleanup(buffer);
-				UnlockReleaseBuffer(buffer);
-			}
-		}
-	}
+		XLogLockBlockRangeForCleanup(xlrec->node, MAIN_FORKNUM,
+									 xlrec->lastBlockVacuumed + 1,
+									 xlrec->block);
 
 	/*
 	 * If we have a full-page image, restore it (using a cleanup lock) and
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index b7829ff..d84f83a 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -287,10 +287,6 @@ XLogReadBuffer(RelFileNode rnode, BlockNumber blkno, bool init)
  *
  * In RBM_ZERO and RBM_ZERO_ON_ERROR modes, if the page doesn't exist, the
  * relation is extended with all-zeroes pages up to the given block number.
- *
- * In RBM_NORMAL_NO_LOG mode, we return InvalidBuffer if the page doesn't
- * exist, and we don't check for all-zeroes.  Thus, no log entry is made
- * to imply that the page should be dropped or truncated later.
  */
 Buffer
 XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
@@ -331,8 +327,6 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 			log_invalid_page(rnode, forknum, blkno, false);
 			return InvalidBuffer;
 		}
-		if (mode == RBM_NORMAL_NO_LOG)
-			return InvalidBuffer;
 		/* OK to extend the file */
 		/* we do this in recovery only - no rel-extension lock needed */
 		Assert(InRecovery);
@@ -375,6 +369,62 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 	return buffer;
 }
 
+/*
+ * XLogBlockRangeForCleanup is used in Hot Standby mode to emulate the
+ * locking effects imposed by VACUUM for nbtrees.
+ */
+void
+XLogLockBlockRangeForCleanup(RelFileNode rnode, ForkNumber forkNum,
+			 BlockNumber startBlkno,
+			 BlockNumber uptoBlkno)
+{
+	BlockNumber blkno;
+	BlockNumber lastblock;
+	BlockNumber	endingBlkno;
+	Buffer		buffer;
+	SMgrRelation smgr;
+
+	Assert(startBlkno != P_NEW);
+	Assert(uptoBlkno != P_NEW);
+
+	/* Open the relation at smgr level */
+	smgr = smgropen(rnode, InvalidBackendId);
+
+	/*
+	 * Create the target file if it doesn't already exist.  This lets us cope
+	 * if the replay sequence contains writes to a relation that is later
+	 * deleted.  (The original coding of this routine would instead suppress
+	 * the writes, but that seems like it risks losing valuable data if the
+	 * filesystem loses an inode during a crash.  Better to write the data
+	 * until we are actually told to delete the file.)
+	 */
+	smgrcreate(smgr, forkNum, true);
+
+	lastblock = smgrnblocks(smgr, forkNum);
+
+	endingBlkno = uptoBlkno;
+	if (lastblock < endingBlkno)
+		endingBlkno = lastblock;
+
+	for (blkno = startBlkno; blkno < endingBlkno; blkno++)
+	{
+		/*
+		 * All we need to do here is prove that we can lock each block
+		 * with a cleanup lock. It's not an error to see all-zero pages
+		 * here because the original btvacuumpage would not have thrown
+		 * an error either.
+		 *
+		 * We don't actually need to read the block, we just need to
+		 * confirm it is unpinned.
+		 */
+		buffer = GetBufferWithoutRelcache(rnode, forkNum, blkno);
+		if (BufferIsValid(buffer))
+		{
+			LockBufferForCleanup(buffer);
+			UnlockReleaseBuffer(buffer);
+		}
+	}
+}
 
 /*
  * Struct actually returned by XLogFakeRelcacheEntry, though the declared
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 07ea665..d3c7eb7 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -222,8 +222,6 @@ ReadBuffer(Relation reln, BlockNumber blockNum)
  * current physical EOF; that is 

Re: [HACKERS] [PATCH] introduce XLogLockBlockRangeForCleanup()

2014-07-03 Thread Simon Riggs
On 3 July 2014 06:45, Amit Khandekar amit.khande...@enterprisedb.com wrote:

 In GetBufferWithoutRelcache(), I was wondering, rather than calling
 PinBuffer(), if we do this :
 LockBufHdr(buf);
 PinBuffer_Locked(buf);
 valid = ((buf->flags & BM_VALID) != 0);
 then we can avoid having the new buffer access strategy BAS_DISCARD that is
 introduced in this patch. And so the code changes in freelist.c would not be
 necessary.

That looks like a good idea, thanks.

I think we should say this though

LockBufHdr(buf);
valid = ((buf->flags & BM_VALID) != 0);
if (valid)
PinBuffer_Locked(buf);
else
UnlockBufHdr(buf);

since otherwise we would access the buffer flags without the spinlock
and we would leak a pin if the buffer was not valid
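In full, GetBufferWithoutRelcache would then be something like this
(an untested sketch following the usual bufmgr.c conventions, since the
bufmgr.c part of the attachment is cut off in the archive):

Buffer
GetBufferWithoutRelcache(RelFileNode rnode, ForkNumber forkNum,
                         BlockNumber blockNum)
{
    BufferTag   tag;
    uint32      hash;
    LWLock     *partitionLock;
    int         buf_id;
    volatile BufferDesc *buf;
    bool        valid;

    /* Make sure we will have room to remember the buffer pin */
    ResourceOwnerEnlargeBuffers(CurrentResourceOwner);

    INIT_BUFFERTAG(tag, rnode, forkNum, blockNum);
    hash = BufTableHashCode(&tag);
    partitionLock = BufMappingPartitionLock(hash);

    LWLockAcquire(partitionLock, LW_SHARED);
    buf_id = BufTableLookup(&tag, hash);
    if (buf_id < 0)
    {
        /* not in shared_buffers, hence certainly unpinned */
        LWLockRelease(partitionLock);
        return InvalidBuffer;
    }

    buf = &BufferDescriptors[buf_id];

    /* check BM_VALID and pin under the buffer header spinlock */
    LockBufHdr(buf);
    valid = ((buf->flags & BM_VALID) != 0);
    if (valid)
        PinBuffer_Locked(buf);  /* releases the buffer header spinlock */
    else
        UnlockBufHdr(buf);
    LWLockRelease(partitionLock);

    return valid ? BufferDescriptorGetBuffer(buf) : InvalidBuffer;
}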

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] introduce XLogLockBlockRangeForCleanup()

2014-07-02 Thread Amit Khandekar
On 13 June 2014 14:10, Abhijit Menon-Sen a...@2ndquadrant.com wrote:

 nbtxlog.c:btree_xlog_vacuum() contains the following comment:

  * XXX we don't actually need to read the block, we just need to
  * confirm it is unpinned. If we had a special call into the
  * buffer manager we could optimise this so that if the block is
  * not in shared_buffers we confirm it as unpinned.

 The attached patch introduces an XLogLockBlockRangeForCleanup() function
 that addresses this, as well as a special call into the buffer manager
 that makes it possible to lock the buffers without reading them.


In GetBufferWithoutRelcache(), I was wondering, rather than calling
PinBuffer(), if we do this :
LockBufHdr(buf);
PinBuffer_Locked(buf);
valid = ((buf->flags & BM_VALID) != 0);
then we can avoid having the new buffer access strategy BAS_DISCARD that is
introduced in this patch. And so the code changes in freelist.c would not
be necessary.

Will give further thought on the overall logic in
XLogLockBlockRangeForCleanup().


 will follow up with some performance numbers soon.


Yes, that would be nice.

-Amit


[HACKERS] [PATCH] introduce XLogLockBlockRangeForCleanup()

2014-06-13 Thread Abhijit Menon-Sen
nbtxlog.c:btree_xlog_vacuum() contains the following comment:

 * XXX we don't actually need to read the block, we just need to
 * confirm it is unpinned. If we had a special call into the
 * buffer manager we could optimise this so that if the block is
 * not in shared_buffers we confirm it as unpinned.

The attached patch introduces an XLogLockBlockRangeForCleanup() function
that addresses this, as well as a special call into the buffer manager
that makes it possible to lock the buffers without reading them.

The patch is by Simon Riggs, with some minor edits and testing by me.

I'm adding the patch to the CommitFest, and will follow up with some
performance numbers soon.

-- Abhijit
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 5f9fc49..dc90c02 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -501,33 +501,9 @@ btree_xlog_vacuum(XLogRecPtr lsn, XLogRecord *record)
 	 * here during crash recovery.
 	 */
 	if (HotStandbyActiveInReplay())
-	{
-		BlockNumber blkno;
-
-		for (blkno = xlrec->lastBlockVacuumed + 1; blkno < xlrec->block; blkno++)
-		{
-			/*
-			 * We use RBM_NORMAL_NO_LOG mode because it's not an error
-			 * condition to see all-zero pages.  The original btvacuumpage
-			 * scan would have skipped over all-zero pages, noting them in FSM
-			 * but not bothering to initialize them just yet; so we mustn't
-			 * throw an error here.  (We could skip acquiring the cleanup lock
-			 * if PageIsNew, but it's probably not worth the cycles to test.)
-			 *
-			 * XXX we don't actually need to read the block, we just need to
-			 * confirm it is unpinned. If we had a special call into the
-			 * buffer manager we could optimise this so that if the block is
-			 * not in shared_buffers we confirm it as unpinned.
-			 */
-			buffer = XLogReadBufferExtended(xlrec->node, MAIN_FORKNUM, blkno,
-											RBM_NORMAL_NO_LOG);
-			if (BufferIsValid(buffer))
-			{
-				LockBufferForCleanup(buffer);
-				UnlockReleaseBuffer(buffer);
-			}
-		}
-	}
+		XLogLockBlockRangeForCleanup(xlrec->node, MAIN_FORKNUM,
+									 xlrec->lastBlockVacuumed + 1,
+									 xlrec->block);
 
 	/*
 	 * If we have a full-page image, restore it (using a cleanup lock) and
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index b7829ff..302551f 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -287,10 +287,6 @@ XLogReadBuffer(RelFileNode rnode, BlockNumber blkno, bool init)
  *
  * In RBM_ZERO and RBM_ZERO_ON_ERROR modes, if the page doesn't exist, the
  * relation is extended with all-zeroes pages up to the given block number.
- *
- * In RBM_NORMAL_NO_LOG mode, we return InvalidBuffer if the page doesn't
- * exist, and we don't check for all-zeroes.  Thus, no log entry is made
- * to imply that the page should be dropped or truncated later.
  */
 Buffer
 XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
@@ -331,8 +327,6 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 			log_invalid_page(rnode, forknum, blkno, false);
 			return InvalidBuffer;
 		}
-		if (mode == RBM_NORMAL_NO_LOG)
-			return InvalidBuffer;
 		/* OK to extend the file */
 		/* we do this in recovery only - no rel-extension lock needed */
 		Assert(InRecovery);
@@ -375,6 +369,73 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 	return buffer;
 }
 
+/*
+ * XLogBlockRangeForCleanup is used in Hot Standby mode to emulate the
+ * locking effects imposed by VACUUM for nbtrees.
+ */
+void
+XLogLockBlockRangeForCleanup(RelFileNode rnode, ForkNumber forkNum,
+			 BlockNumber startBlkno,
+			 BlockNumber uptoBlkno)
+{
+	BlockNumber blkno;
+	BlockNumber lastblock;
+	BlockNumber	endingBlkno;
+	Buffer		buffer;
+	BufferAccessStrategy bstrategy;
+	SMgrRelation smgr;
+
+	Assert(startBlkno != P_NEW);
+	Assert(uptoBlkno != P_NEW);
+
+	/* Open the relation at smgr level */
+	smgr = smgropen(rnode, InvalidBackendId);
+
+	/*
+	 * Create the target file if it doesn't already exist.  This lets us cope
+	 * if the replay sequence contains writes to a relation that is later
+	 * deleted.  (The original coding of this routine would instead suppress
+	 * the writes, but that seems like it risks losing valuable data if the
+	 * filesystem loses an inode during a crash.  Better to write the data
+	 * until we are actually told to delete the file.)
+	 */
+	smgrcreate(smgr, forkNum, true);
+
+	lastblock = smgrnblocks(smgr, forkNum);
+
+	endingBlkno = uptoBlkno;
+	if (lastblock < endingBlkno)
+		endingBlkno = lastblock;
+
+	/*
+	 * We need to use a BufferAccessStrategy because we do not want to
+	 * increment the buffer's usage_count as we read them. However,
+	 * there is no ring buffer associated with this strategy, since we
+	 * never actually read or write the contents, just lock them.
+	 */
+	bstrategy = GetAccessStrategy(BAS_DISCARD);
+
+	for (blkno