Re: [HACKERS] [PATCH] introduce XLogLockBlockRangeForCleanup()
At 2014-08-20 11:07:44 +0300, hlinnakan...@vmware.com wrote:
>
> I don't think the new GetBufferWithoutRelcache function is in line
> with the existing ReadBuffer API. I think it would be better to add a
> new ReadBufferMode - RBM_CACHE_ONLY? - that only returns the buffer if
> it's already in cache, and InvalidBuffer otherwise.

Hi Heikki.

I liked the idea of having a new ReadBufferMode of RBM_CACHE_ONLY, but I
wasn't happy with the result when I tried to modify the code to suit. It
didn't make the code easier for me to follow.

(I'll say straight up that it can be done the way you said, and if the
consensus is that it would be an improvement, I'll do it that way. I'm
just not convinced about it, hence this mail.)

For example:

> static Buffer
> ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
>                  BlockNumber blockNum, ReadBufferMode mode,
>                  BufferAccessStrategy strategy, bool *hit)
> {
>     volatile BufferDesc *bufHdr;
>     Block       bufBlock;
>     bool        found;
>     bool        isExtend;
>     bool        isLocalBuf = SmgrIsTemp(smgr);
>
>     *hit = false;
>
>     /* Make sure we will have room to remember the buffer pin */
>     ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
>
>     isExtend = (blockNum == P_NEW);

isLocalBuf and isExtend will both be false for the RBM_CACHE_ONLY case,
which is good for us. But that probably needs to be mentioned in a
comment here.

>     TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
>                                        smgr->smgr_rnode.node.spcNode,
>                                        smgr->smgr_rnode.node.dbNode,
>                                        smgr->smgr_rnode.node.relNode,
>                                        smgr->smgr_rnode.backend,
>                                        isExtend);

We know we're not going to be doing IO in the RBM_CACHE_ONLY case, but
that's probably OK? I don't understand this TRACE_* stuff. But we need
to either skip this, or also do the corresponding TRACE_*_DONE later.

>     if (isLocalBuf)
>     {
>         bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found);
>         if (found)
>             pgBufferUsage.local_blks_hit++;
>         else
>             pgBufferUsage.local_blks_read++;
>     }
>     else
>     {
>         /*
>          * lookup the buffer.  IO_IN_PROGRESS is set if the requested block is
>          * not currently in memory.
>          */
>         bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
>                              strategy, &found);
>         if (found)
>             pgBufferUsage.shared_blks_hit++;
>         else
>             pgBufferUsage.shared_blks_read++;
>     }

The nicest option I could think of here was to copy the shared_buffers
lookup from BufferAlloc into a new BufferAlloc_shared function, and add
a new branch for RBM_CACHE_ONLY. That would look like:

    if (isLocalBuf)
        …
    else if (mode == RBM_CACHE_ONLY)
    {
        /*
         * We check to see if the buffer is already cached, and if it's
         * not, we return InvalidBuffer because we know it's not pinned.
         */
        bufHdr = BufferAlloc_shared(…, &found);
        if (!found)
            return InvalidBuffer;
        LockBufferForCleanup(BufferDescriptorGetBuffer(bufHdr));
        return BufferDescriptorGetBuffer(bufHdr);
    }
    else
    {
        /*
         * lookup the buffer …
    }

…or if we went with the rest of the code and didn't do the lock/return
immediately, we would fall through to this code:

>     /* At this point we do NOT hold any locks. */
>
>     /* if it was already in the buffer pool, we're done */
>     if (found)
>     {
>         if (!isExtend)
>         {
>             /* Just need to update stats before we exit */
>             *hit = true;
>             VacuumPageHit++;
>
>             if (VacuumCostActive)
>                 VacuumCostBalance += VacuumCostPageHit;
>
>             TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
>                                               smgr->smgr_rnode.node.spcNode,
>                                               smgr->smgr_rnode.node.dbNode,
>                                               smgr->smgr_rnode.node.relNode,
>                                               smgr->smgr_rnode.backend,
>                                               isExtend,
>                                               found);
>
>             /*
>              * In RBM_ZERO_AND_LOCK mode the caller expects the page to be
>              * locked on return.
>              */
>             if (!isLocalBuf)
>             {
>                 if (mode == RBM_ZERO_AND_LOCK)
>                     LWLockAcquire(bufHdr->content_lock, LW_EXCLUSIVE);
>                 else if (mode == RBM_ZERO_AND_CLEANUP_LOCK)
>                     LockBufferForCleanup(BufferDescriptorGetBuffer(bufHdr));
>             }
>
>             return BufferDescript
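[Editor's note: the control flow being debated above can be modelled in isolation. The sketch below is a toy, self-contained illustration of what an RBM_CACHE_ONLY lookup would do — return a buffer only when the block is already resident, never starting I/O. It is not PostgreSQL's bufmgr: `FakeBufferDesc`, `read_buffer_cache_only`, and the linear scan are invented stand-ins for the real buffer mapping table.]

```c
#include <assert.h>
#include <stdbool.h>

#define NBUFFERS       8
#define INVALID_BUFFER (-1)

/* Toy stand-in for a shared-buffer slot: which block it holds, if any. */
typedef struct
{
    int  tag;    /* block number cached in this slot */
    bool valid;  /* slot actually holds a usable page */
} FakeBufferDesc;

static FakeBufferDesc buffers[NBUFFERS];

/*
 * Toy model of an RBM_CACHE_ONLY lookup: return the buffer only if the
 * block is already cached; otherwise report "not here" without reading
 * anything from disk.  An uncached block cannot be pinned by anyone, so
 * the caller can safely treat it as already cleanup-lockable.
 */
static int
read_buffer_cache_only(int blockNum)
{
    for (int i = 0; i < NBUFFERS; i++)
    {
        if (buffers[i].valid && buffers[i].tag == blockNum)
            return i;           /* cache hit: hand back the buffer */
    }
    return INVALID_BUFFER;      /* cache miss: no I/O, no pin possible */
}
```

The point of the mode is exactly the early-return path: a miss proves the block is unpinned, which is all btree_xlog_vacuum() needs.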
Re: [HACKERS] [PATCH] introduce XLogLockBlockRangeForCleanup()
On 07/07/2014 11:46 AM, Abhijit Menon-Sen wrote:
> At 2014-07-07 14:02:15 +0530, amit.khande...@enterprisedb.com wrote:
> > Other than some minor comments as mentioned below, I don't have any
> > more issues, it looks all good.
>
> Thank you. (Updated patch attached.)

I don't think the new GetBufferWithoutRelcache function is in line
with the existing ReadBuffer API. I think it would be better to add a
new ReadBufferMode - RBM_CACHE_ONLY? - that only returns the buffer if
it's already in cache, and InvalidBuffer otherwise.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] introduce XLogLockBlockRangeForCleanup()
At 2014-07-07 14:02:15 +0530, amit.khande...@enterprisedb.com wrote:
>
> Other than some minor comments as mentioned below, I don't have any
> more issues, it looks all good.

Thank you. (Updated patch attached.)

-- Abhijit

diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 5f9fc49..dc90c02 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -501,33 +501,9 @@ btree_xlog_vacuum(XLogRecPtr lsn, XLogRecord *record)
 	 * here during crash recovery.
 	 */
 	if (HotStandbyActiveInReplay())
-	{
-		BlockNumber blkno;
-
-		for (blkno = xlrec->lastBlockVacuumed + 1; blkno < xlrec->block; blkno++)
-		{
-			/*
-			 * We use RBM_NORMAL_NO_LOG mode because it's not an error
-			 * condition to see all-zero pages.  The original btvacuumpage
-			 * scan would have skipped over all-zero pages, noting them in FSM
-			 * but not bothering to initialize them just yet; so we mustn't
-			 * throw an error here.  (We could skip acquiring the cleanup lock
-			 * if PageIsNew, but it's probably not worth the cycles to test.)
-			 *
-			 * XXX we don't actually need to read the block, we just need to
-			 * confirm it is unpinned. If we had a special call into the
-			 * buffer manager we could optimise this so that if the block is
-			 * not in shared_buffers we confirm it as unpinned.
-			 */
-			buffer = XLogReadBufferExtended(xlrec->node, MAIN_FORKNUM, blkno,
-											RBM_NORMAL_NO_LOG);
-			if (BufferIsValid(buffer))
-			{
-				LockBufferForCleanup(buffer);
-				UnlockReleaseBuffer(buffer);
-			}
-		}
-	}
+		XLogLockBlockRangeForCleanup(xlrec->node, MAIN_FORKNUM,
+									 xlrec->lastBlockVacuumed + 1,
+									 xlrec->block);
 
 	/*
 	 * If we have a full-page image, restore it (using a cleanup lock) and
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index b7829ff..a9aea31 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -287,10 +287,6 @@ XLogReadBuffer(RelFileNode rnode, BlockNumber blkno, bool init)
  *
  * In RBM_ZERO and RBM_ZERO_ON_ERROR modes, if the page doesn't exist, the
  * relation is extended with all-zeroes pages up to the given block number.
- *
- * In RBM_NORMAL_NO_LOG mode, we return InvalidBuffer if the page doesn't
- * exist, and we don't check for all-zeroes.  Thus, no log entry is made
- * to imply that the page should be dropped or truncated later.
  */
 Buffer
 XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
@@ -331,8 +327,6 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 			log_invalid_page(rnode, forknum, blkno, false);
 			return InvalidBuffer;
 		}
-		if (mode == RBM_NORMAL_NO_LOG)
-			return InvalidBuffer;
 		/* OK to extend the file */
 		/* we do this in recovery only - no rel-extension lock needed */
 		Assert(InRecovery);
@@ -375,6 +369,62 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 	return buffer;
 }
 
+/*
+ * XLogLockBlockRangeForCleanup is used in Hot Standby mode to emulate
+ * the locking effects imposed by VACUUM for nbtrees.
+ */
+void
+XLogLockBlockRangeForCleanup(RelFileNode rnode, ForkNumber forkNum,
+							 BlockNumber startBlkno,
+							 BlockNumber uptoBlkno)
+{
+	BlockNumber	blkno;
+	BlockNumber	lastblock;
+	BlockNumber	endingBlkno;
+	Buffer		buffer;
+	SMgrRelation smgr;
+
+	Assert(startBlkno != P_NEW);
+	Assert(uptoBlkno != P_NEW);
+
+	/* Open the relation at smgr level */
+	smgr = smgropen(rnode, InvalidBackendId);
+
+	/*
+	 * Create the target file if it doesn't already exist.  This lets us cope
+	 * if the replay sequence contains writes to a relation that is later
+	 * deleted.  (The original coding of this routine would instead suppress
+	 * the writes, but that seems like it risks losing valuable data if the
+	 * filesystem loses an inode during a crash.  Better to write the data
+	 * until we are actually told to delete the file.)
+	 */
+	smgrcreate(smgr, forkNum, true);
+
+	lastblock = smgrnblocks(smgr, forkNum);
+
+	endingBlkno = uptoBlkno;
+	if (lastblock < endingBlkno)
+		endingBlkno = lastblock;
+
+	for (blkno = startBlkno; blkno < endingBlkno; blkno++)
+	{
+		/*
+		 * All we need to do here is prove that we can lock each block
+		 * with a cleanup lock.  It's not an error to see all-zero pages
+		 * here because the original btvacuumpage would not have thrown
+		 * an error either.
+		 *
+		 * We don't actually need to read the block, we just need to
+		 * confirm it is unpinned.
+		 */
+		buffer = GetBufferWithoutRelcache(rnode, forkNum, blkno);
+		if (BufferIsValid(buffer))
+		{
+			LockBufferForCleanup(buffer);
+			UnlockReleaseBuffer(buffer);
+		}
+	}
+}
 
 /*
  * Struct actually returned by XLogFakeRelcacheEntry, though the declared
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 07ea665..f7976a0 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufm
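[Editor's note: the `endingBlkno` computation in the patch above is a simple min() — blocks at or beyond the relation's current EOF cannot be in shared_buffers, so there is nothing to lock there. A minimal standalone sketch of that clamp; `clamp_ending_blkno` is a name invented here, not from the patch:]

```c
#include <assert.h>

typedef unsigned int BlockNumber;

/*
 * Clamp the half-open cleanup range [startBlkno, uptoBlkno) to the
 * relation's actual size, as XLogLockBlockRangeForCleanup does:
 * nonexistent blocks cannot be pinned, so the loop can stop at EOF.
 */
static BlockNumber
clamp_ending_blkno(BlockNumber uptoBlkno, BlockNumber lastblock)
{
    return (lastblock < uptoBlkno) ? lastblock : uptoBlkno;
}
```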
Re: [HACKERS] [PATCH] introduce XLogLockBlockRangeForCleanup()
On 4 July 2014 19:11, Abhijit Menon-Sen wrote:
> Updated patch attached, thanks.
>
> Amit: what's your conclusion from the review?

Other than some minor comments as mentioned below, I don't have any
more issues, it looks all good.

XLogLockBlockRangeForCleanup()'s function header comment has the
function name spelled XLogBlockRangeForCleanup.

In GetBufferWithoutRelcache(), we can call
BufferDescriptorGetBuffer(buf) rather than
BufferDescriptorGetBuffer(&BufferDescriptors[buf_id]).

> -- Abhijit
Re: [HACKERS] [PATCH] introduce XLogLockBlockRangeForCleanup()
Updated patch attached, thanks.

Amit: what's your conclusion from the review?

-- Abhijit

diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 5f9fc49..dc90c02 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -501,33 +501,9 @@ btree_xlog_vacuum(XLogRecPtr lsn, XLogRecord *record)
 	 * here during crash recovery.
 	 */
 	if (HotStandbyActiveInReplay())
-	{
-		BlockNumber blkno;
-
-		for (blkno = xlrec->lastBlockVacuumed + 1; blkno < xlrec->block; blkno++)
-		{
-			/*
-			 * We use RBM_NORMAL_NO_LOG mode because it's not an error
-			 * condition to see all-zero pages.  The original btvacuumpage
-			 * scan would have skipped over all-zero pages, noting them in FSM
-			 * but not bothering to initialize them just yet; so we mustn't
-			 * throw an error here.  (We could skip acquiring the cleanup lock
-			 * if PageIsNew, but it's probably not worth the cycles to test.)
-			 *
-			 * XXX we don't actually need to read the block, we just need to
-			 * confirm it is unpinned. If we had a special call into the
-			 * buffer manager we could optimise this so that if the block is
-			 * not in shared_buffers we confirm it as unpinned.
-			 */
-			buffer = XLogReadBufferExtended(xlrec->node, MAIN_FORKNUM, blkno,
-											RBM_NORMAL_NO_LOG);
-			if (BufferIsValid(buffer))
-			{
-				LockBufferForCleanup(buffer);
-				UnlockReleaseBuffer(buffer);
-			}
-		}
-	}
+		XLogLockBlockRangeForCleanup(xlrec->node, MAIN_FORKNUM,
+									 xlrec->lastBlockVacuumed + 1,
+									 xlrec->block);
 
 	/*
 	 * If we have a full-page image, restore it (using a cleanup lock) and
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index b7829ff..d84f83a 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -287,10 +287,6 @@ XLogReadBuffer(RelFileNode rnode, BlockNumber blkno, bool init)
  *
  * In RBM_ZERO and RBM_ZERO_ON_ERROR modes, if the page doesn't exist, the
  * relation is extended with all-zeroes pages up to the given block number.
- *
- * In RBM_NORMAL_NO_LOG mode, we return InvalidBuffer if the page doesn't
- * exist, and we don't check for all-zeroes.  Thus, no log entry is made
- * to imply that the page should be dropped or truncated later.
  */
 Buffer
 XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
@@ -331,8 +327,6 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 			log_invalid_page(rnode, forknum, blkno, false);
 			return InvalidBuffer;
 		}
-		if (mode == RBM_NORMAL_NO_LOG)
-			return InvalidBuffer;
 		/* OK to extend the file */
 		/* we do this in recovery only - no rel-extension lock needed */
 		Assert(InRecovery);
@@ -375,6 +369,62 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 	return buffer;
 }
 
+/*
+ * XLogBlockRangeForCleanup is used in Hot Standby mode to emulate the
+ * locking effects imposed by VACUUM for nbtrees.
+ */
+void
+XLogLockBlockRangeForCleanup(RelFileNode rnode, ForkNumber forkNum,
+							 BlockNumber startBlkno,
+							 BlockNumber uptoBlkno)
+{
+	BlockNumber	blkno;
+	BlockNumber	lastblock;
+	BlockNumber	endingBlkno;
+	Buffer		buffer;
+	SMgrRelation smgr;
+
+	Assert(startBlkno != P_NEW);
+	Assert(uptoBlkno != P_NEW);
+
+	/* Open the relation at smgr level */
+	smgr = smgropen(rnode, InvalidBackendId);
+
+	/*
+	 * Create the target file if it doesn't already exist.  This lets us cope
+	 * if the replay sequence contains writes to a relation that is later
+	 * deleted.  (The original coding of this routine would instead suppress
+	 * the writes, but that seems like it risks losing valuable data if the
+	 * filesystem loses an inode during a crash.  Better to write the data
+	 * until we are actually told to delete the file.)
+	 */
+	smgrcreate(smgr, forkNum, true);
+
+	lastblock = smgrnblocks(smgr, forkNum);
+
+	endingBlkno = uptoBlkno;
+	if (lastblock < endingBlkno)
+		endingBlkno = lastblock;
+
+	for (blkno = startBlkno; blkno < endingBlkno; blkno++)
+	{
+		/*
+		 * All we need to do here is prove that we can lock each block
+		 * with a cleanup lock.  It's not an error to see all-zero pages
+		 * here because the original btvacuumpage would not have thrown
+		 * an error either.
+		 *
+		 * We don't actually need to read the block, we just need to
+		 * confirm it is unpinned.
+		 */
+		buffer = GetBufferWithoutRelcache(rnode, forkNum, blkno);
+		if (BufferIsValid(buffer))
+		{
+			LockBufferForCleanup(buffer);
+			UnlockReleaseBuffer(buffer);
+		}
+	}
+}
 
 /*
  * Struct actually returned by XLogFakeRelcacheEntry, though the declared
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 07ea665..ce5ff70 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -222,8 +222,6 @@ ReadBuffer(Relation reln, BlockNumber blockNum)
  * current physical EOF; that is likely to cause problems in md.
Re: [HACKERS] [PATCH] introduce XLogLockBlockRangeForCleanup()
On 3 July 2014 16:59, Simon Riggs wrote:
>
> I think we should say this though
>
>     LockBufHdr(buf);
>     valid = ((buf->flags & BM_VALID) != 0);
>     if (valid)
>         PinBuffer_Locked(buf);
>     else
>         UnlockBufHdr(buf);
>
> since otherwise we would access the buffer flags without the spinlock
> and we would leak a pin if the buffer was not valid

Ah right. That is essential.
Re: [HACKERS] [PATCH] introduce XLogLockBlockRangeForCleanup()
On 3 July 2014 06:45, Amit Khandekar wrote:
> In GetBufferWithoutRelcache(), I was wondering, rather than calling
> PinBuffer(), if we do this :
>
>     LockBufHdr(buf);
>     PinBuffer_Locked(buf);
>     valid = ((buf->flags & BM_VALID) != 0);
>
> then we can avoid having the new buffer access strategy BAS_DISCARD
> that is introduced in this patch. And so the code changes in
> freelist.c would not be necessary.

That looks like a good idea, thanks. I think we should say this though:

    LockBufHdr(buf);
    valid = ((buf->flags & BM_VALID) != 0);
    if (valid)
        PinBuffer_Locked(buf);
    else
        UnlockBufHdr(buf);

since otherwise we would access the buffer flags without the spinlock,
and we would leak a pin if the buffer was not valid.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
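[Editor's note: Simon's ordering can be modelled in isolation. The sketch below is a toy, not PostgreSQL's buffer manager — `FakeBufHdr` and its helpers are invented here, and the "spinlock" is just a flag — but it shows why the BM_VALID test must happen while the header lock is held, and why the invalid path must release that lock without taking a pin.]

```c
#include <assert.h>
#include <stdbool.h>

#define BM_VALID 0x1

typedef struct
{
    bool locked;    /* models the buffer-header spinlock */
    int  flags;     /* BM_VALID et al. */
    int  refcount;  /* models the pin count */
} FakeBufHdr;

static void lock_buf_hdr(FakeBufHdr *buf)   { assert(!buf->locked); buf->locked = true; }
static void unlock_buf_hdr(FakeBufHdr *buf) { assert(buf->locked);  buf->locked = false; }

/* Models PinBuffer_Locked: takes the pin, then drops the header lock. */
static void
pin_buffer_locked(FakeBufHdr *buf)
{
    assert(buf->locked);
    buf->refcount++;
    unlock_buf_hdr(buf);
}

/*
 * The ordering from the thread: read the flags under the header lock,
 * and only pin when the buffer is valid.  Reversing the first two
 * steps would read flags without the spinlock; forgetting the else
 * branch would leak the header lock (and, in the reversed version,
 * a pin) for an invalid buffer.
 */
static bool
pin_if_valid(FakeBufHdr *buf)
{
    bool valid;

    lock_buf_hdr(buf);
    valid = ((buf->flags & BM_VALID) != 0);
    if (valid)
        pin_buffer_locked(buf);  /* pin taken; header lock released */
    else
        unlock_buf_hdr(buf);     /* no pin taken; lock still released */
    return valid;
}
```

Either way the function returns with the header unlocked; the only difference the caller sees is whether it now holds a pin.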
Re: [HACKERS] [PATCH] introduce XLogLockBlockRangeForCleanup()
FYI, I've attached a patch that does what you suggested. I haven't done
anything else (i.e. testing) with it yet.

-- Abhijit

diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 5f9fc49..dc90c02 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -501,33 +501,9 @@ btree_xlog_vacuum(XLogRecPtr lsn, XLogRecord *record)
 	 * here during crash recovery.
 	 */
 	if (HotStandbyActiveInReplay())
-	{
-		BlockNumber blkno;
-
-		for (blkno = xlrec->lastBlockVacuumed + 1; blkno < xlrec->block; blkno++)
-		{
-			/*
-			 * We use RBM_NORMAL_NO_LOG mode because it's not an error
-			 * condition to see all-zero pages.  The original btvacuumpage
-			 * scan would have skipped over all-zero pages, noting them in FSM
-			 * but not bothering to initialize them just yet; so we mustn't
-			 * throw an error here.  (We could skip acquiring the cleanup lock
-			 * if PageIsNew, but it's probably not worth the cycles to test.)
-			 *
-			 * XXX we don't actually need to read the block, we just need to
-			 * confirm it is unpinned. If we had a special call into the
-			 * buffer manager we could optimise this so that if the block is
-			 * not in shared_buffers we confirm it as unpinned.
-			 */
-			buffer = XLogReadBufferExtended(xlrec->node, MAIN_FORKNUM, blkno,
-											RBM_NORMAL_NO_LOG);
-			if (BufferIsValid(buffer))
-			{
-				LockBufferForCleanup(buffer);
-				UnlockReleaseBuffer(buffer);
-			}
-		}
-	}
+		XLogLockBlockRangeForCleanup(xlrec->node, MAIN_FORKNUM,
+									 xlrec->lastBlockVacuumed + 1,
+									 xlrec->block);
 
 	/*
 	 * If we have a full-page image, restore it (using a cleanup lock) and
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index b7829ff..d84f83a 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -287,10 +287,6 @@ XLogReadBuffer(RelFileNode rnode, BlockNumber blkno, bool init)
  *
  * In RBM_ZERO and RBM_ZERO_ON_ERROR modes, if the page doesn't exist, the
  * relation is extended with all-zeroes pages up to the given block number.
- *
- * In RBM_NORMAL_NO_LOG mode, we return InvalidBuffer if the page doesn't
- * exist, and we don't check for all-zeroes.  Thus, no log entry is made
- * to imply that the page should be dropped or truncated later.
  */
 Buffer
 XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
@@ -331,8 +327,6 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 			log_invalid_page(rnode, forknum, blkno, false);
 			return InvalidBuffer;
 		}
-		if (mode == RBM_NORMAL_NO_LOG)
-			return InvalidBuffer;
 		/* OK to extend the file */
 		/* we do this in recovery only - no rel-extension lock needed */
 		Assert(InRecovery);
@@ -375,6 +369,62 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 	return buffer;
 }
 
+/*
+ * XLogBlockRangeForCleanup is used in Hot Standby mode to emulate the
+ * locking effects imposed by VACUUM for nbtrees.
+ */
+void
+XLogLockBlockRangeForCleanup(RelFileNode rnode, ForkNumber forkNum,
+							 BlockNumber startBlkno,
+							 BlockNumber uptoBlkno)
+{
+	BlockNumber	blkno;
+	BlockNumber	lastblock;
+	BlockNumber	endingBlkno;
+	Buffer		buffer;
+	SMgrRelation smgr;
+
+	Assert(startBlkno != P_NEW);
+	Assert(uptoBlkno != P_NEW);
+
+	/* Open the relation at smgr level */
+	smgr = smgropen(rnode, InvalidBackendId);
+
+	/*
+	 * Create the target file if it doesn't already exist.  This lets us cope
+	 * if the replay sequence contains writes to a relation that is later
+	 * deleted.  (The original coding of this routine would instead suppress
+	 * the writes, but that seems like it risks losing valuable data if the
+	 * filesystem loses an inode during a crash.  Better to write the data
+	 * until we are actually told to delete the file.)
+	 */
+	smgrcreate(smgr, forkNum, true);
+
+	lastblock = smgrnblocks(smgr, forkNum);
+
+	endingBlkno = uptoBlkno;
+	if (lastblock < endingBlkno)
+		endingBlkno = lastblock;
+
+	for (blkno = startBlkno; blkno < endingBlkno; blkno++)
+	{
+		/*
+		 * All we need to do here is prove that we can lock each block
+		 * with a cleanup lock.  It's not an error to see all-zero pages
+		 * here because the original btvacuumpage would not have thrown
+		 * an error either.
+		 *
+		 * We don't actually need to read the block, we just need to
+		 * confirm it is unpinned.
+		 */
+		buffer = GetBufferWithoutRelcache(rnode, forkNum, blkno);
+		if (BufferIsValid(buffer))
+		{
+			LockBufferForCleanup(buffer);
+			UnlockReleaseBuffer(buffer);
+		}
+	}
+}
 
 /*
  * Struct actually returned by XLogFakeRelcacheEntry, though the declared
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 07ea665..d3c7eb7 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -222,8 +222,6 @@ ReadBuffer(Relation reln, BlockNumber blockNum)
  * current physical EOF; that
Re: [HACKERS] [PATCH] introduce XLogLockBlockRangeForCleanup()
At 2014-07-03 11:15:53 +0530, amit.khande...@enterprisedb.com wrote:
>
> In GetBufferWithoutRelcache(), I was wondering, rather than calling
> PinBuffer(), if we do this :
>
>     LockBufHdr(buf);
>     PinBuffer_Locked(buf);
>     valid = ((buf->flags & BM_VALID) != 0);
>
> then we can avoid having the new buffer access strategy BAS_DISCARD
> that is introduced in this patch. And so the code changes in
> freelist.c would not be necessary.

Thanks for the suggestion. It sounds sensible at first glance. I'll
think about it a little more, then try it and see.

> > will follow up with some performance numbers soon.
>
> Yes, that would be nice.

I'm afraid I haven't had a chance to do this yet.

-- Abhijit
Re: [HACKERS] [PATCH] introduce XLogLockBlockRangeForCleanup()
On 13 June 2014 14:10, Abhijit Menon-Sen wrote:
> nbtxlog.c:btree_xlog_vacuum() contains the following comment:
>
>  * XXX we don't actually need to read the block, we just need to
>  * confirm it is unpinned. If we had a special call into the
>  * buffer manager we could optimise this so that if the block is
>  * not in shared_buffers we confirm it as unpinned.
>
> The attached patch introduces an XLogLockBlockRangeForCleanup() function
> that addresses this, as well as a "special call into the buffer manager"
> that makes it possible to lock the buffers without reading them.

In GetBufferWithoutRelcache(), I was wondering, rather than calling
PinBuffer(), if we do this:

    LockBufHdr(buf);
    PinBuffer_Locked(buf);
    valid = ((buf->flags & BM_VALID) != 0);

then we can avoid having the new buffer access strategy BAS_DISCARD
that is introduced in this patch. And so the code changes in freelist.c
would not be necessary.

Will give further thought on the overall logic in
XLogLockBlockRangeForCleanup().

> will follow up with some performance numbers soon.

Yes, that would be nice.

-Amit
[HACKERS] [PATCH] introduce XLogLockBlockRangeForCleanup()
nbtxlog.c:btree_xlog_vacuum() contains the following comment:

 * XXX we don't actually need to read the block, we just need to
 * confirm it is unpinned. If we had a special call into the
 * buffer manager we could optimise this so that if the block is
 * not in shared_buffers we confirm it as unpinned.

The attached patch introduces an XLogLockBlockRangeForCleanup() function
that addresses this, as well as a "special call into the buffer manager"
that makes it possible to lock the buffers without reading them.

The patch is by Simon Riggs, with some minor edits and testing by me.
I'm adding the patch to the CommitFest, and will follow up with some
performance numbers soon.

-- Abhijit

diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 5f9fc49..dc90c02 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -501,33 +501,9 @@ btree_xlog_vacuum(XLogRecPtr lsn, XLogRecord *record)
 	 * here during crash recovery.
 	 */
 	if (HotStandbyActiveInReplay())
-	{
-		BlockNumber blkno;
-
-		for (blkno = xlrec->lastBlockVacuumed + 1; blkno < xlrec->block; blkno++)
-		{
-			/*
-			 * We use RBM_NORMAL_NO_LOG mode because it's not an error
-			 * condition to see all-zero pages.  The original btvacuumpage
-			 * scan would have skipped over all-zero pages, noting them in FSM
-			 * but not bothering to initialize them just yet; so we mustn't
-			 * throw an error here.  (We could skip acquiring the cleanup lock
-			 * if PageIsNew, but it's probably not worth the cycles to test.)
-			 *
-			 * XXX we don't actually need to read the block, we just need to
-			 * confirm it is unpinned. If we had a special call into the
-			 * buffer manager we could optimise this so that if the block is
-			 * not in shared_buffers we confirm it as unpinned.
-			 */
-			buffer = XLogReadBufferExtended(xlrec->node, MAIN_FORKNUM, blkno,
-											RBM_NORMAL_NO_LOG);
-			if (BufferIsValid(buffer))
-			{
-				LockBufferForCleanup(buffer);
-				UnlockReleaseBuffer(buffer);
-			}
-		}
-	}
+		XLogLockBlockRangeForCleanup(xlrec->node, MAIN_FORKNUM,
+									 xlrec->lastBlockVacuumed + 1,
+									 xlrec->block);
 
 	/*
 	 * If we have a full-page image, restore it (using a cleanup lock) and
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index b7829ff..302551f 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -287,10 +287,6 @@ XLogReadBuffer(RelFileNode rnode, BlockNumber blkno, bool init)
  *
  * In RBM_ZERO and RBM_ZERO_ON_ERROR modes, if the page doesn't exist, the
  * relation is extended with all-zeroes pages up to the given block number.
- *
- * In RBM_NORMAL_NO_LOG mode, we return InvalidBuffer if the page doesn't
- * exist, and we don't check for all-zeroes.  Thus, no log entry is made
- * to imply that the page should be dropped or truncated later.
  */
 Buffer
 XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
@@ -331,8 +327,6 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 			log_invalid_page(rnode, forknum, blkno, false);
 			return InvalidBuffer;
 		}
-		if (mode == RBM_NORMAL_NO_LOG)
-			return InvalidBuffer;
 		/* OK to extend the file */
 		/* we do this in recovery only - no rel-extension lock needed */
 		Assert(InRecovery);
@@ -375,6 +369,73 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 	return buffer;
 }
 
+/*
+ * XLogBlockRangeForCleanup is used in Hot Standby mode to emulate the
+ * locking effects imposed by VACUUM for nbtrees.
+ */
+void
+XLogLockBlockRangeForCleanup(RelFileNode rnode, ForkNumber forkNum,
+							 BlockNumber startBlkno,
+							 BlockNumber uptoBlkno)
+{
+	BlockNumber	blkno;
+	BlockNumber	lastblock;
+	BlockNumber	endingBlkno;
+	Buffer		buffer;
+	BufferAccessStrategy bstrategy;
+	SMgrRelation smgr;
+
+	Assert(startBlkno != P_NEW);
+	Assert(uptoBlkno != P_NEW);
+
+	/* Open the relation at smgr level */
+	smgr = smgropen(rnode, InvalidBackendId);
+
+	/*
+	 * Create the target file if it doesn't already exist.  This lets us cope
+	 * if the replay sequence contains writes to a relation that is later
+	 * deleted.  (The original coding of this routine would instead suppress
+	 * the writes, but that seems like it risks losing valuable data if the
+	 * filesystem loses an inode during a crash.  Better to write the data
+	 * until we are actually told to delete the file.)
+	 */
+	smgrcreate(smgr, forkNum, true);
+
+	lastblock = smgrnblocks(smgr, forkNum);
+
+	endingBlkno = uptoBlkno;
+	if (lastblock < endingBlkno)
+		endingBlkno = lastblock;
+
+	/*
+	 * We need to use a BufferAccessStrategy because we do not want to
+	 * increment the buffer's usage_count as we read them.  However,
+	 * there is no ring buffer associated with this strategy, since we
+	 * never actually read or write the contents, just lock them.
+	 */
+	bstrategy = GetAccessStrategy(BAS_DISCARD);
+
+	fo