On Tue, Apr 7, 2015 at 11:22 AM, Sawada Masahiko <[email protected]> wrote:
> On Tue, Apr 7, 2015 at 7:53 AM, Jim Nasby <[email protected]> wrote:
>> On 4/6/15 5:18 PM, Greg Stark wrote:
>>>
>>> Only I would suggest thinking of it in terms of two orthogonal boolean
>>> flags rather than three states. It's easier to reason about whether a
>>> table has a specific property than trying to control a state machine in
>>> a predefined pathway.
>>>
>>> So I would say the two flags are:
>>> READONLY: guarantees nothing can be dirtied
>>> ALLFROZEN: guarantees no unfrozen tuples are present
>>>
>>> In practice you can't have the latter without the former, since vacuum
>>> can't know everything is frozen unless it knows nobody is inserting. But
>>> perhaps there will be cases in the future where that's not true.
>>
>>
>> I'm not so sure about that. There's a logical state progression here (see
>> below). ISTM it's easier to just enforce that in one place instead of a
>> bunch of places having to check multiple conditions. But, I'm not wed to a
>> single field.
>>
>>> Incidentally, there are a number of other optimisations I've had in
>>> mind that are only possible on frozen read-only tables:
>>>
>>> 1) Compression: compress the pages and pack them one after the other.
>>> Build a new fork with offsets for each page.
>>>
>>> 2) Automatic partition elimination, where the statistics track the
>>> minimum and maximum value per partition (and number of tuples) and treat
>>> them as implicit constraints. In particular it would magically make read
>>> only empty parent partitions be excluded regardless of the where clause.
>>
>>
>> AFAICT neither of those actually requires ALLFROZEN, no? You'll need to
>> uncompact and re-compact for #1 when you actually freeze (which maybe isn't
>> worth it), but freezing isn't absolutely required. #2 would only require
>> that everything in the relation is visible; not frozen.
>>
>> I think there's value here to having an ALLVISIBLE state as well as
>> ALLFROZEN.
>>
>
> Based on many suggestions, I'm going to deal with the FM first as one
> patch. The first patch will be a simple mechanism, similar to the VM:
> - Each bit of the FM represents a single heap page
> - The bit is set only by vacuum
> - The bit is cleared by INSERT, UPDATE and DELETE
>
> Second, I'll deal with a simple read-only table feature with two states,
> Read/Write (default) and ReadOnly, as one patch. ISTM that adding a
> Frozen state needs more discussion. A read-only table just disallows
> any updates to the table, and it's controlled by a read-only flag in
> pg_class. The DDL command which changes this status would be something
> like ALTER TABLE ... SET READ ONLY / READ WRITE.
> Also, as Alvaro suggested, a read-only table is useful not only for
> freezing the table but also for performance optimizations. I'll consider
> including those when I work on the read-only table patch.
>
The attached WIP patch adds a Frozen Map, which enables us to avoid
scanning the whole table even when a full scan would otherwise be
required to prevent XID wraparound failures.
The Frozen Map is a bitmap with one bit per heap page, quite similar
to the Visibility Map. A set bit means that all tuples on the heap page
are completely frozen, so we don't need to vacuum-freeze that page.
A bit is set when vacuum (or autovacuum) determines that all tuples on
the corresponding heap page are completely frozen, and a bit is cleared
when INSERT or UPDATE (onto a new heap page) is executed.
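To make the bitmap density concrete, here is a small standalone sketch
(not part of the patch) that mirrors the HEAPBLK_TO_MAPBLOCK /
HEAPBLK_TO_MAPBYTE / HEAPBLK_TO_MAPBIT arithmetic in frozenmap.c,
assuming the default 8kB BLCKSZ and a 24-byte (MAXALIGN'd) page header;
with one bit per heap page, a single FM page covers roughly 510MB of heap:

    #include <stdio.h>
    #include <stdint.h>

    #define BLCKSZ               8192   /* assumed default block size */
    #define PAGE_HEADER_SIZE     24     /* MAXALIGN(SizeOfPageHeaderData), typically */
    #define MAPSIZE              (BLCKSZ - PAGE_HEADER_SIZE)
    #define HEAPBLOCKS_PER_BYTE  8
    #define HEAPBLOCKS_PER_PAGE  (MAPSIZE * HEAPBLOCKS_PER_BYTE)

    int
    main(void)
    {
        uint32 heapBlk = 100000;    /* an arbitrary heap block number */
        uint32 mapBlock = heapBlk / HEAPBLOCKS_PER_PAGE;
        uint32 mapByte = (heapBlk % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE;
        uint32 mapBit = heapBlk % HEAPBLOCKS_PER_BYTE;

        printf("heap block %u -> FM block %u, byte %u, bit %u\n",
               heapBlk, mapBlock, mapByte, mapBit);
        printf("one FM page covers %d heap blocks (~%ld MB of heap)\n",
               HEAPBLOCKS_PER_PAGE,
               (long) HEAPBLOCKS_PER_PAGE * BLCKSZ / (1024L * 1024L));
        return 0;
    }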
The current patch adds a new source file,
src/backend/access/heap/frozenmap.c, which is quite similar to
visibilitymap.c. They share a lot of code but are kept separate for now;
I can refactor them later, e.g. into a common bitmap.c, if needed.
Also, when skipping pages based on the visibility map, vacuum only skips
runs of at least SKIP_PAGES_THRESHOLD consecutive pages; the frozen map
does not have such a mechanism yet. A toy illustration of the idea is
sketched below.
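For illustration only (this is not code from the patch), a standalone toy
simulation of how a SKIP_PAGES_THRESHOLD-style rule could be layered on
top of the frozen map bits, skipping only runs of at least 32 consecutive
all-frozen pages so sequential read-ahead isn't defeated:

    #include <stdbool.h>
    #include <stdio.h>

    #define NBLOCKS              100
    #define SKIP_PAGES_THRESHOLD 32    /* value vacuumlazy.c uses for the VM */

    /* length of the run of consecutive all-frozen blocks starting at 'start' */
    static int
    frozen_run_length(const bool *frozen, int start)
    {
        int len = 0;

        while (start + len < NBLOCKS && frozen[start + len])
            len++;
        return len;
    }

    int
    main(void)
    {
        bool frozen[NBLOCKS] = {false};
        int blkno;

        /* pretend blocks 10..79 are marked all-frozen in the FM */
        for (blkno = 10; blkno < 80; blkno++)
            frozen[blkno] = true;

        for (blkno = 0; blkno < NBLOCKS; blkno++)
        {
            int run = frozen_run_length(frozen, blkno);

            if (run >= SKIP_PAGES_THRESHOLD)
            {
                printf("skip blocks %d..%d\n", blkno, blkno + run - 1);
                blkno += run - 1;
                continue;
            }
            /* otherwise this block would still be scanned by vacuum */
        }
        return 0;
    }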
Please give me feedback.
Regards,
-------
Sawada Masahiko
diff --git a/src/backend/access/heap/Makefile b/src/backend/access/heap/Makefile
index b83d496..53f07fd 100644
--- a/src/backend/access/heap/Makefile
+++ b/src/backend/access/heap/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/access/heap
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o visibilitymap.o
+OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o visibilitymap.o frozenmap.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/heap/frozenmap.c b/src/backend/access/heap/frozenmap.c
new file mode 100644
index 0000000..6e64cb8
--- /dev/null
+++ b/src/backend/access/heap/frozenmap.c
@@ -0,0 +1,567 @@
+/*-------------------------------------------------------------------------
+ *
+ * frozenmap.c
+ * bitmap for tracking frozen heap tuples
+ *
+ * Portions Copyright (c) 2015, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/access/heap/frozenmap.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/frozenmap.h"
+#include "access/heapam_xlog.h"
+#include "access/xlog.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "storage/lmgr.h"
+#include "storage/smgr.h"
+#include "utils/inval.h"
+
+
+/* #define TRACE_FROZENMAP */
+
+/*
+ * Size of the bitmap on each frozen map page, in bytes. There are no
+ * extra headers, so the whole page minus the standard page header is
+ * used for the bitmap.
+ */
+#define MAPSIZE (BLCKSZ - MAXALIGN(SizeOfPageHeaderData))
+
+/* Number of bits allocated for each heap block. */
+#define BITS_PER_HEAPBLOCK 1
+
+/* Number of heap blocks we can represent in one byte. */
+#define HEAPBLOCKS_PER_BYTE 8
+
+/* Number of heap blocks we can represent in one frozen map page. */
+#define HEAPBLOCKS_PER_PAGE (MAPSIZE * HEAPBLOCKS_PER_BYTE)
+
+/* Mapping from heap block number to the right bit in the frozen map */
+#define HEAPBLK_TO_MAPBLOCK(x) ((x) / HEAPBLOCKS_PER_PAGE)
+#define HEAPBLK_TO_MAPBYTE(x) (((x) % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE)
+#define HEAPBLK_TO_MAPBIT(x) ((x) % HEAPBLOCKS_PER_BYTE)
+
+/* table for fast counting of set bits */
+static const uint8 number_of_ones[256] = {
+ 0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
+ 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
+ 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
+ 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
+ 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
+ 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
+ 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
+ 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
+ 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
+ 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
+ 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
+ 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
+ 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
+ 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
+ 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
+ 4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8
+};
+
+/* prototypes for internal routines */
+static Buffer fm_readbuf(Relation rel, BlockNumber blkno, bool extend);
+static void fm_extend(Relation rel, BlockNumber nfmblocks);
+
+
+/*
+ * frozenmap_clear - clear a bit in frozen map
+ *
+ * This function has the same logic as visibilitymap_clear.
+ * You must pass a buffer containing the correct map page to this function.
+ * Call frozenmap_pin first to pin the right one. This function doesn't do
+ * any I/O.
+ */
+void
+frozenmap_clear(Relation rel, BlockNumber heapBlk, Buffer buf)
+{
+ BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+ int mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+ int mapBit = HEAPBLK_TO_MAPBIT(heapBlk);
+ uint8 mask = 1 << mapBit;
+ char *map;
+
+#ifdef TRACE_FROZENMAP
+ elog(DEBUG1, "fm_clear %s %d", RelationGetRelationName(rel), heapBlk);
+#endif
+
+ if (!BufferIsValid(buf) || BufferGetBlockNumber(buf) != mapBlock)
+ elog(ERROR, "wrong buffer passed to frozenmap_clear");
+
+ LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+ map = PageGetContents(BufferGetPage(buf));
+
+ if (map[mapByte] & mask)
+ {
+ map[mapByte] &= ~mask;
+
+ MarkBufferDirty(buf);
+ }
+
+ LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+}
+
+/*
+ * frozenmap_pin - pin a map page for setting a bit
+ *
+ * This function has the same logic as visibilitymap_pin.
+ * Setting a bit in the frozen map is a two-phase operation. First, call
+ * frozenmap_pin, to pin the frozen map page containing the bit for
+ * the heap page. Because that can require I/O to read the map page, you
+ * shouldn't hold a lock on the heap page while doing that. Then, call
+ * frozenmap_set to actually set the bit.
+ *
+ * On entry, *buf should be InvalidBuffer or a valid buffer returned by
+ * an earlier call to frozenmap_pin or frozenmap_test on the same
+ * relation. On return, *buf is a valid buffer with the map page containing
+ * the bit for heapBlk.
+ *
+ * If the page doesn't exist in the map file yet, it is extended.
+ */
+void
+frozenmap_pin(Relation rel, BlockNumber heapBlk, Buffer *buf)
+{
+ BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+
+ /* Reuse the old pinned buffer if possible */
+ if (BufferIsValid(*buf))
+ {
+ if (BufferGetBlockNumber(*buf) == mapBlock)
+ return;
+
+ ReleaseBuffer(*buf);
+ }
+ *buf = fm_readbuf(rel, mapBlock, true);
+}
+
+/*
+ * frozenmap_pin_ok - do we already have the correct page pinned?
+ *
+ * On entry, buf should be InvalidBuffer or a valid buffer returned by
+ * an earlier call to frozenmap_pin or frozenmap_test on the same
+ * relation. The return value indicates whether the buffer covers the
+ * given heapBlk.
+ */
+bool
+frozenmap_pin_ok(BlockNumber heapBlk, Buffer buf)
+{
+ BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+
+ return BufferIsValid(buf) && BufferGetBlockNumber(buf) == mapBlock;
+}
+
+/*
+ * frozenmap_set - set a bit on a previously pinned page
+ *
+ * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
+ * or InvalidXLogRecPtr in normal running. The page LSN is advanced to the
+ * one provided; in normal running, we generate a new XLOG record and set the
+ * page LSN to that value. cutoff_xid is the largest xmin on the page being
+ * marked all-frozen; it is needed for Hot Standby, and can be
+ * InvalidTransactionId if the page contains no tuples.
+ *
+ * Caller is expected to set the heap page's PD_ALL_FROZEN bit before calling
+ * this function. Except in recovery, caller should also pass the heap
+ * buffer. When checksums are enabled and we're not in recovery, we must add
+ * the heap buffer to the WAL chain to protect it from being torn.
+ *
+ * You must pass a buffer containing the correct map page to this function.
+ * Call frozenmap_pin first to pin the right one. This function doesn't do
+ * any I/O.
+ */
+void
+frozenmap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
+ XLogRecPtr recptr, Buffer fmBuf)
+{
+ BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+ uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+ uint8 mapBit = HEAPBLK_TO_MAPBIT(heapBlk);
+ Page page;
+ char *map;
+
+#ifdef TRACE_FROZENMAP
+ elog(DEBUG1, "fm_set %s %d", RelationGetRelationName(rel), heapBlk);
+#endif
+
+ Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
+ Assert(InRecovery || BufferIsValid(heapBuf));
+
+ /* Check that we have the right heap page pinned, if present */
+ if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
+ elog(ERROR, "wrong heap buffer passed to frozenmap_set");
+
+ /* Check that we have the right FM page pinned */
+ if (!BufferIsValid(fmBuf) || BufferGetBlockNumber(fmBuf) != mapBlock)
+ elog(ERROR, "wrong FM buffer passed to frozenmap_set");
+
+ page = BufferGetPage(fmBuf);
+ map = PageGetContents(page);
+ LockBuffer(fmBuf, BUFFER_LOCK_EXCLUSIVE);
+
+ if (!(map[mapByte] & (1 << mapBit)))
+ {
+ START_CRIT_SECTION();
+
+ map[mapByte] |= (1 << mapBit);
+ MarkBufferDirty(fmBuf);
+
+ if (RelationNeedsWAL(rel))
+ {
+ if (XLogRecPtrIsInvalid(recptr))
+ {
+ Assert(!InRecovery);
+ recptr = log_heap_frozenmap(rel->rd_node, heapBuf, fmBuf);
+
+ /*
+ * If data checksums are enabled (or wal_log_hints=on), we
+ * need to protect the heap page from being torn.
+ */
+ if (XLogHintBitIsNeeded())
+ {
+ Page heapPage = BufferGetPage(heapBuf);
+
+ /* caller is expected to set PD_ALL_FROZEN first */
+ Assert(PageIsAllFrozen(heapPage));
+ PageSetLSN(heapPage, recptr);
+ }
+ }
+ PageSetLSN(page, recptr);
+ }
+
+ END_CRIT_SECTION();
+ }
+
+ LockBuffer(fmBuf, BUFFER_LOCK_UNLOCK);
+}
+
+/*
+ * frozenmap_test - test if a bit is set
+ *
+ * Are all tuples on heapBlk completely frozen, according to the frozen map?
+ *
+ * On entry, *buf should be InvalidBuffer or a valid buffer returned by an
+ * earlier call to frozenmap_pin or frozenmap_test on the same
+ * relation. On return, *buf is a valid buffer with the map page containing
+ * the bit for heapBlk, or InvalidBuffer. The caller is responsible for
+ * releasing *buf after it's done testing and setting bits.
+ *
+ * NOTE: This function is typically called without a lock on the heap page,
+ * so somebody else could change the bit just after we look at it. In fact,
+ * since we don't lock the frozen map page either, it's even possible that
+ * someone else could have changed the bit just before we look at it, but yet
+ * we might see the old value. It is the caller's responsibility to deal with
+ * all concurrency issues!
+ */
+bool
+frozenmap_test(Relation rel, BlockNumber heapBlk, Buffer *buf)
+{
+ BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+ uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+ uint8 mapBit = HEAPBLK_TO_MAPBIT(heapBlk);
+ bool result;
+ char *map;
+
+#ifdef TRACE_FROZENMAP
+ elog(DEBUG1, "fm_test %s %d", RelationGetRelationName(rel), heapBlk);
+#endif
+
+ /* Reuse the old pinned buffer if possible */
+ if (BufferIsValid(*buf))
+ {
+ if (BufferGetBlockNumber(*buf) != mapBlock)
+ {
+ ReleaseBuffer(*buf);
+ *buf = InvalidBuffer;
+ }
+ }
+
+ if (!BufferIsValid(*buf))
+ {
+ *buf = fm_readbuf(rel, mapBlock, false);
+ if (!BufferIsValid(*buf))
+ return false;
+ }
+
+ map = PageGetContents(BufferGetPage(*buf));
+
+ /*
+ * A single-bit read is atomic. There could be memory-ordering effects
+ * here, but for performance reasons we make it the caller's job to worry
+ * about that.
+ */
+ result = (map[mapByte] & (1 << mapBit)) ? true : false;
+
+ return result;
+}
+
+/*
+ * frozenmap_count - count number of bits set in frozen map
+ *
+ * Note: we ignore the possibility of race conditions when the table is being
+ * extended concurrently with the call. New pages added to the table aren't
+ * going to be marked all-frozen, so they won't affect the result.
+ */
+BlockNumber
+frozenmap_count(Relation rel)
+{
+ BlockNumber result = 0;
+ BlockNumber mapBlock;
+
+ for (mapBlock = 0;; mapBlock++)
+ {
+ Buffer mapBuffer;
+ unsigned char *map;
+ int i;
+
+ /*
+ * Read till we fall off the end of the map. We assume that any extra
+ * bytes in the last page are zeroed, so we don't bother excluding
+ * them from the count.
+ */
+ mapBuffer = fm_readbuf(rel, mapBlock, false);
+ if (!BufferIsValid(mapBuffer))
+ break;
+
+ /*
+ * We choose not to lock the page, since the result is going to be
+ * immediately stale anyway if anyone is concurrently setting or
+ * clearing bits, and we only really need an approximate value.
+ */
+ map = (unsigned char *) PageGetContents(BufferGetPage(mapBuffer));
+
+ for (i = 0; i < MAPSIZE; i++)
+ {
+ result += number_of_ones[map[i]];
+ }
+
+ ReleaseBuffer(mapBuffer);
+ }
+
+ return result;
+}
+
+/*
+ * frozenmap_truncate - truncate the frozen map
+ *
+ * The caller must hold AccessExclusiveLock on the relation, to ensure that
+ * other backends receive the smgr invalidation event that this function sends
+ * before they access the FM again.
+ *
+ * nheapblocks is the new size of the heap.
+ */
+void
+frozenmap_truncate(Relation rel, BlockNumber nheapblocks)
+{
+ BlockNumber newnblocks;
+
+ /* last remaining block, byte, and bit */
+ BlockNumber truncBlock = HEAPBLK_TO_MAPBLOCK(nheapblocks);
+ uint32 truncByte = HEAPBLK_TO_MAPBYTE(nheapblocks);
+ uint8 truncBit = HEAPBLK_TO_MAPBIT(nheapblocks);
+
+#ifdef TRACE_FROZENMAP
+ elog(DEBUG1, "fm_truncate %s %d", RelationGetRelationName(rel), nheapblocks);
+#endif
+
+ RelationOpenSmgr(rel);
+
+ /*
+ * If no frozen map has been created yet for this relation, there's
+ * nothing to truncate.
+ */
+ if (!smgrexists(rel->rd_smgr, FROZENMAP_FORKNUM))
+ return;
+
+ /*
+ * Unless the new size is exactly at a frozen map page boundary, the
+ * tail bits in the last remaining map page, representing truncated heap
+ * blocks, need to be cleared. This is not only tidy, but also necessary
+ * because we don't get a chance to clear the bits if the heap is extended
+ * again.
+ */
+ if (truncByte != 0 || truncBit != 0)
+ {
+ Buffer mapBuffer;
+ Page page;
+ char *map;
+
+ newnblocks = truncBlock + 1;
+
+ mapBuffer = fm_readbuf(rel, truncBlock, false);
+ if (!BufferIsValid(mapBuffer))
+ {
+ /* nothing to do, the file was already smaller */
+ return;
+ }
+
+ page = BufferGetPage(mapBuffer);
+ map = PageGetContents(page);
+
+ LockBuffer(mapBuffer, BUFFER_LOCK_EXCLUSIVE);
+
+ /* Clear out the unwanted bytes. */
+ MemSet(&map[truncByte + 1], 0, MAPSIZE - (truncByte + 1));
+
+ /*----
+ * Mask out the unwanted bits of the last remaining byte.
+ *
+ * ((1 << 0) - 1) = 00000000
+ * ((1 << 1) - 1) = 00000001
+ * ...
+ * ((1 << 6) - 1) = 00111111
+ * ((1 << 7) - 1) = 01111111
+ *----
+ */
+ map[truncByte] &= (1 << truncBit) - 1;
+
+ MarkBufferDirty(mapBuffer);
+ UnlockReleaseBuffer(mapBuffer);
+ }
+ else
+ newnblocks = truncBlock;
+
+ if (smgrnblocks(rel->rd_smgr, FROZENMAP_FORKNUM) <= newnblocks)
+ {
+ /* nothing to do, the file was already smaller than requested size */
+ return;
+ }
+
+ /* Truncate the unused FM pages, and send smgr inval message */
+ smgrtruncate(rel->rd_smgr, FROZENMAP_FORKNUM, newnblocks);
+
+ /*
+ * We might as well update the local smgr_fm_nblocks setting. smgrtruncate
+ * sent an smgr cache inval message, which will cause other backends to
+ * invalidate their copy of smgr_fm_nblocks, and this one too at the next
+ * command boundary. But this ensures it isn't outright wrong until then.
+ */
+ if (rel->rd_smgr)
+ rel->rd_smgr->smgr_fm_nblocks = newnblocks;
+}
+
+/*
+ * Read a frozen map page.
+ *
+ * If the page doesn't exist, InvalidBuffer is returned, or if 'extend' is
+ * true, the frozen map file is extended.
+ */
+static Buffer
+fm_readbuf(Relation rel, BlockNumber blkno, bool extend)
+{
+ Buffer buf;
+
+ /*
+ * We might not have opened the relation at the smgr level yet, or we
+ * might have been forced to close it by a sinval message. The code below
+ * won't necessarily notice relation extension immediately when extend =
+ * false, so we rely on sinval messages to ensure that our ideas about the
+ * size of the map aren't too far out of date.
+ */
+ RelationOpenSmgr(rel);
+
+ /*
+ * If we haven't cached the size of the frozen map fork yet, check it
+ * first.
+ */
+ if (rel->rd_smgr->smgr_fm_nblocks == InvalidBlockNumber)
+ {
+ if (smgrexists(rel->rd_smgr, FROZENMAP_FORKNUM))
+ rel->rd_smgr->smgr_fm_nblocks = smgrnblocks(rel->rd_smgr,
+ FROZENMAP_FORKNUM);
+ else
+ rel->rd_smgr->smgr_fm_nblocks = 0;
+ }
+
+ /* Handle requests beyond EOF */
+ if (blkno >= rel->rd_smgr->smgr_fm_nblocks)
+ {
+ if (extend)
+ fm_extend(rel, blkno + 1);
+ else
+ return InvalidBuffer;
+ }
+
+ /*
+ * Use ZERO_ON_ERROR mode, and initialize the page if necessary. It's
+ * always safe to clear bits, so it's better to clear corrupt pages than
+ * error out.
+ */
+ buf = ReadBufferExtended(rel, FROZENMAP_FORKNUM, blkno,
+ RBM_ZERO_ON_ERROR, NULL);
+ if (PageIsNew(BufferGetPage(buf)))
+ PageInit(BufferGetPage(buf), BLCKSZ, 0);
+ return buf;
+}
+
+/*
+ * Ensure that the frozen map fork is at least fm_nblocks long, extending
+ * it if necessary with zeroed pages.
+ */
+static void
+fm_extend(Relation rel, BlockNumber fm_nblocks)
+{
+ BlockNumber fm_nblocks_now;
+ Page pg;
+
+ pg = (Page) palloc(BLCKSZ);
+ PageInit(pg, BLCKSZ, 0);
+
+ /*
+ * We use the relation extension lock to lock out other backends trying to
+ * extend the frozen map at the same time. It also locks out extension
+ * of the main fork, unnecessarily, but extending the frozen map
+ * happens seldom enough that it doesn't seem worthwhile to have a
+ * separate lock tag type for it.
+ *
+ * Note that another backend might have extended or created the relation
+ * by the time we get the lock.
+ */
+ LockRelationForExtension(rel, ExclusiveLock);
+
+ /* Might have to re-open if a cache flush happened */
+ RelationOpenSmgr(rel);
+
+ /*
+ * Create the file first if it doesn't exist. If smgr_fm_nblocks is
+ * positive then it must exist, no need for an smgrexists call.
+ */
+ if ((rel->rd_smgr->smgr_fm_nblocks == 0 ||
+ rel->rd_smgr->smgr_fm_nblocks == InvalidBlockNumber) &&
+ !smgrexists(rel->rd_smgr, FROZENMAP_FORKNUM))
+ smgrcreate(rel->rd_smgr, FROZENMAP_FORKNUM, false);
+
+ fm_nblocks_now = smgrnblocks(rel->rd_smgr, FROZENMAP_FORKNUM);
+
+ /* Now extend the file */
+ while (fm_nblocks_now < fm_nblocks)
+ {
+ PageSetChecksumInplace(pg, fm_nblocks_now);
+
+ smgrextend(rel->rd_smgr, FROZENMAP_FORKNUM, fm_nblocks_now,
+ (char *) pg, false);
+ fm_nblocks_now++;
+ }
+
+ /*
+ * Send a shared-inval message to force other backends to close any smgr
+ * references they may have for this rel, which we are about to change.
+ * This is a useful optimization because it means that backends don't have
+ * to keep checking for creation or extension of the file, which happens
+ * infrequently.
+ */
+ CacheInvalidateSmgr(rel->rd_smgr->smgr_rnode);
+
+ /* Update local cache with the up-to-date size */
+ rel->rd_smgr->smgr_fm_nblocks = fm_nblocks_now;
+
+ UnlockRelationForExtension(rel, ExclusiveLock);
+
+ pfree(pg);
+}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index cb6f8a3..7f7c147 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -38,6 +38,7 @@
*/
#include "postgres.h"
+#include "access/frozenmap.h"
#include "access/heapam.h"
#include "access/heapam_xlog.h"
#include "access/hio.h"
@@ -86,7 +87,8 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup,
HeapTuple newtup, HeapTuple old_key_tup,
- bool all_visible_cleared, bool new_all_visible_cleared);
+ bool all_visible_cleared, bool new_all_visible_cleared,
+ bool all_frozen_cleared, bool new_all_frozen_cleared);
static void HeapSatisfiesHOTandKeyUpdate(Relation relation,
Bitmapset *hot_attrs,
Bitmapset *key_attrs, Bitmapset *id_attrs,
@@ -2067,8 +2069,10 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
- Buffer vmbuffer = InvalidBuffer;
+ Buffer vmbuffer = InvalidBuffer,
+ fmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
+ bool all_frozen_cleared = false;
/*
* Fill in tuple header fields, assign an OID, and toast the tuple if
@@ -2092,12 +2096,14 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
CheckForSerializableConflictIn(relation, NULL, InvalidBuffer);
/*
- * Find buffer to insert this tuple into. If the page is all visible,
- * this will also pin the requisite visibility map page.
+ * Find buffer to insert this tuple into. If the page is all visible
+ * or all frozen, this will also pin the requisite visibility map and
+ * frozen map pages.
*/
buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
InvalidBuffer, options, bistate,
- &vmbuffer, NULL);
+ &vmbuffer, NULL,
+ &fmbuffer, NULL);
/* NO EREPORT(ERROR) from here till changes are logged */
START_CRIT_SECTION();
@@ -2113,6 +2119,15 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
vmbuffer);
}
+ if (PageIsAllFrozen(BufferGetPage(buffer)))
+ {
+ all_frozen_cleared = true;
+ PageClearAllFrozen(BufferGetPage(buffer));
+ frozenmap_clear(relation,
+ ItemPointerGetBlockNumber(&(heaptup->t_self)),
+ fmbuffer);
+ }
+
/*
* XXX Should we set PageSetPrunable on this page ?
*
@@ -2157,6 +2172,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xlrec.offnum = ItemPointerGetOffsetNumber(&heaptup->t_self);
xlrec.flags = all_visible_cleared ? XLOG_HEAP_ALL_VISIBLE_CLEARED : 0;
+ if (all_frozen_cleared)
+ xlrec.flags |= XLOG_HEAP_ALL_FROZEN_CLEARED;
Assert(ItemPointerGetBlockNumber(&heaptup->t_self) == BufferGetBlockNumber(buffer));
/*
@@ -2199,6 +2216,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
UnlockReleaseBuffer(buffer);
if (vmbuffer != InvalidBuffer)
ReleaseBuffer(vmbuffer);
+ if (fmbuffer != InvalidBuffer)
+ ReleaseBuffer(fmbuffer);
/*
* If tuple is cachable, mark it for invalidation from the caches in case
@@ -2346,8 +2365,10 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
while (ndone < ntuples)
{
Buffer buffer;
- Buffer vmbuffer = InvalidBuffer;
+ Buffer vmbuffer = InvalidBuffer,
+ fmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
+ bool all_frozen_cleared = false;
int nthispage;
CHECK_FOR_INTERRUPTS();
@@ -2358,7 +2379,8 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
*/
buffer = RelationGetBufferForTuple(relation, heaptuples[ndone]->t_len,
InvalidBuffer, options, bistate,
- &vmbuffer, NULL);
+ &vmbuffer, NULL,
+ &fmbuffer, NULL);
page = BufferGetPage(buffer);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -2395,6 +2417,15 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
vmbuffer);
}
+ if (PageIsAllFrozen(page))
+ {
+ all_frozen_cleared = true;
+ PageClearAllFrozen(page);
+ frozenmap_clear(relation,
+ BufferGetBlockNumber(buffer),
+ fmbuffer);
+ }
+
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
*/
@@ -2437,6 +2468,8 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
tupledata = scratchptr;
xlrec->flags = all_visible_cleared ? XLOG_HEAP_ALL_VISIBLE_CLEARED : 0;
+ if (all_frozen_cleared)
+ xlrec->flags |= XLOG_HEAP_ALL_FROZEN_CLEARED;
xlrec->ntuples = nthispage;
/*
@@ -2509,6 +2542,8 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
UnlockReleaseBuffer(buffer);
if (vmbuffer != InvalidBuffer)
ReleaseBuffer(vmbuffer);
+ if (fmbuffer != InvalidBuffer)
+ ReleaseBuffer(fmbuffer);
ndone += nthispage;
}
@@ -3053,7 +3088,9 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
Buffer buffer,
newbuf,
vmbuffer = InvalidBuffer,
- vmbuffer_new = InvalidBuffer;
+ vmbuffer_new = InvalidBuffer,
+ fmbuffer = InvalidBuffer,
+ fmbuffer_new = InvalidBuffer;
bool need_toast,
already_marked;
Size newtupsize,
@@ -3067,6 +3104,8 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
bool key_intact;
bool all_visible_cleared = false;
bool all_visible_cleared_new = false;
+ bool all_frozen_cleared = false;
+ bool all_frozen_cleared_new = false;
bool checked_lockers;
bool locker_remains;
TransactionId xmax_new_tuple,
@@ -3100,14 +3139,17 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
page = BufferGetPage(buffer);
/*
- * Before locking the buffer, pin the visibility map page if it appears to
- * be necessary. Since we haven't got the lock yet, someone else might be
- * in the middle of changing this, so we'll need to recheck after we have
- * the lock.
+ * Before locking the buffer, pin the visibility map and frozen map page
+ * if it appears to be necessary. Since we haven't got the lock yet,
+ * someone else might be in the middle of changing this, so we'll need to
+ * recheck after we have the lock.
*/
if (PageIsAllVisible(page))
visibilitymap_pin(relation, block, &vmbuffer);
+ if (PageIsAllFrozen(page))
+ frozenmap_pin(relation, block, &fmbuffer);
+
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
lp = PageGetItemId(page, ItemPointerGetOffsetNumber(otid));
@@ -3390,19 +3432,21 @@ l2:
UnlockTupleTuplock(relation, &(oldtup.t_self), *lockmode);
if (vmbuffer != InvalidBuffer)
ReleaseBuffer(vmbuffer);
+ if (fmbuffer != InvalidBuffer)
+ ReleaseBuffer(fmbuffer);
bms_free(hot_attrs);
bms_free(key_attrs);
return result;
}
/*
- * If we didn't pin the visibility map page and the page has become all
- * visible while we were busy locking the buffer, or during some
- * subsequent window during which we had it unlocked, we'll have to unlock
- * and re-lock, to avoid holding the buffer lock across an I/O. That's a
- * bit unfortunate, especially since we'll now have to recheck whether the
- * tuple has been locked or updated under us, but hopefully it won't
- * happen very often.
+ * If we didn't pin the visibility (and frozen) map page and the page has
+ * become all visible (and frozen) while we were busy locking the buffer,
+ * or during some subsequent window during which we had it unlocked,
+ * we'll have to unlock and re-lock, to avoid holding the buffer lock
+ * across an I/O. That's a bit unfortunate, especially since we'll now
+ * have to recheck whether the tuple has been locked or updated under us,
+ * but hopefully it won't happen very often.
*/
if (vmbuffer == InvalidBuffer && PageIsAllVisible(page))
{
@@ -3412,6 +3456,15 @@ l2:
goto l2;
}
+ if (fmbuffer == InvalidBuffer && PageIsAllFrozen(page))
+ {
+ LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+ frozenmap_pin(relation, block, &fmbuffer);
+ LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+ goto l2;
+ }
+
/*
* We're about to do the actual update -- check for conflict first, to
* avoid possibly having to roll back work we've just done.
@@ -3570,7 +3623,8 @@ l2:
/* Assume there's no chance to put heaptup on same page. */
newbuf = RelationGetBufferForTuple(relation, heaptup->t_len,
buffer, 0, NULL,
- &vmbuffer_new, &vmbuffer);
+ &vmbuffer_new, &vmbuffer,
+ &fmbuffer_new, &fmbuffer);
}
else
{
@@ -3588,7 +3642,8 @@ l2:
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
newbuf = RelationGetBufferForTuple(relation, heaptup->t_len,
buffer, 0, NULL,
- &vmbuffer_new, &vmbuffer);
+ &vmbuffer_new, &vmbuffer,
+ &fmbuffer_new, &fmbuffer);
}
else
{
@@ -3713,6 +3768,22 @@ l2:
vmbuffer_new);
}
+ /* clear PD_ALL_FROZEN flag */
+ if (newbuf == buffer && PageIsAllFrozen(BufferGetPage(buffer)))
+ {
+ all_frozen_cleared = true;
+ PageClearAllFrozen(BufferGetPage(buffer));
+ frozenmap_clear(relation, BufferGetBlockNumber(buffer),
+ fmbuffer);
+ }
+ else if (newbuf != buffer && PageIsAllFrozen(BufferGetPage(newbuf)))
+ {
+ all_frozen_cleared_new = true;
+ PageClearAllFrozen(BufferGetPage(newbuf));
+ frozenmap_clear(relation, BufferGetBlockNumber(newbuf),
+ fmbuffer_new);
+ }
+
if (newbuf != buffer)
MarkBufferDirty(newbuf);
MarkBufferDirty(buffer);
@@ -3736,7 +3807,9 @@ l2:
newbuf, &oldtup, heaptup,
old_key_tuple,
all_visible_cleared,
- all_visible_cleared_new);
+ all_visible_cleared_new,
+ all_frozen_cleared,
+ all_frozen_cleared_new);
if (newbuf != buffer)
{
PageSetLSN(BufferGetPage(newbuf), recptr);
@@ -3768,6 +3841,10 @@ l2:
ReleaseBuffer(vmbuffer_new);
if (BufferIsValid(vmbuffer))
ReleaseBuffer(vmbuffer);
+ if (BufferIsValid(fmbuffer_new))
+ ReleaseBuffer(fmbuffer_new);
+ if (BufferIsValid(fmbuffer))
+ ReleaseBuffer(fmbuffer);
/*
* Release the lmgr tuple lock, if we had it.
@@ -6534,6 +6611,34 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
}
/*
+ * Perform XLogInsert for a heap-all-frozen operation. heap_buffer is the block
+ * being marked all-frozen, and fm_buffer is the buffer containing the
+ * corresponding frozen map block. Both should have already been modified and dirtied.
+ */
+XLogRecPtr
+log_heap_frozenmap(RelFileNode rnode, Buffer heap_buffer, Buffer fm_buffer)
+{
+ XLogRecPtr recptr;
+ uint8 flags;
+
+ Assert(BufferIsValid(heap_buffer));
+ Assert(BufferIsValid(fm_buffer));
+
+ XLogBeginInsert();
+
+ XLogRegisterBuffer(0, fm_buffer, 0);
+
+ flags = REGBUF_STANDARD;
+ if (!XLogHintBitIsNeeded())
+ flags |= REGBUF_NO_IMAGE;
+ XLogRegisterBuffer(1, heap_buffer, flags);
+
+ recptr = XLogInsert(RM_HEAP3_ID, XLOG_HEAP3_FROZENMAP);
+
+ return recptr;
+}
+
+/*
* Perform XLogInsert for a heap-visible operation. 'block' is the block
* being marked all-visible, and vm_buffer is the buffer containing the
* corresponding visibility map block. Both should have already been modified
@@ -6577,7 +6682,8 @@ static XLogRecPtr
log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup, HeapTuple newtup,
HeapTuple old_key_tuple,
- bool all_visible_cleared, bool new_all_visible_cleared)
+ bool all_visible_cleared, bool new_all_visible_cleared,
+ bool all_frozen_cleared, bool new_all_frozen_cleared)
{
xl_heap_update xlrec;
xl_heap_header xlhdr;
@@ -6660,6 +6766,10 @@ log_heap_update(Relation reln, Buffer oldbuf,
xlrec.flags |= XLOG_HEAP_ALL_VISIBLE_CLEARED;
if (new_all_visible_cleared)
xlrec.flags |= XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED;
+ if (all_frozen_cleared)
+ xlrec.flags |= XLOG_HEAP_ALL_FROZEN_CLEARED;
+ if (new_all_frozen_cleared)
+ xlrec.flags |= XLOG_HEAP_NEW_ALL_FROZEN_CLEARED;
if (prefixlen > 0)
xlrec.flags |= XLOG_HEAP_PREFIX_FROM_OLD;
if (suffixlen > 0)
@@ -7198,6 +7308,75 @@ heap_xlog_visible(XLogReaderState *record)
UnlockReleaseBuffer(vmbuffer);
}
+
+/*
+ * Replay XLOG_HEAP3_FROZENMAP record.
+ */
+static void
+heap_xlog_frozenmap(XLogReaderState *record)
+{
+ XLogRecPtr lsn = record->EndRecPtr;
+ Buffer fmbuffer = InvalidBuffer;
+ Buffer buffer;
+ Page page;
+ RelFileNode rnode;
+ BlockNumber blkno;
+ XLogRedoAction action;
+
+ XLogRecGetBlockTag(record, 1, &rnode, NULL, &blkno);
+
+ /*
+ * Read the heap page, if it still exists. If the heap file has been
+ * dropped or truncated later in recovery, we don't need to update the
+ * page, but we'd better still update the frozen map.
+ * better still update the frozen map.
+ */
+ action = XLogReadBufferForRedo(record, 1, &buffer);
+ if (action == BLK_NEEDS_REDO)
+ {
+ page = BufferGetPage(buffer);
+ PageSetAllFrozen(page);
+ MarkBufferDirty(buffer);
+ }
+ else if (action == BLK_RESTORED)
+ {
+ /*
+ * If heap block was backed up, restore it. This can only happen with
+ * checksums enabled.
+ */
+ Assert(DataChecksumsEnabled());
+ }
+ if (BufferIsValid(buffer))
+ UnlockReleaseBuffer(buffer);
+
+ if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
+ &fmbuffer) == BLK_NEEDS_REDO)
+ {
+ Page fmpage = BufferGetPage(fmbuffer);
+ Relation reln;
+
+ /* initialize the page if it was read as zeros */
+ if (PageIsNew(fmpage))
+ PageInit(fmpage, BLCKSZ, 0);
+
+ /*
+ * XLogReplayBufferExtended locked the buffer. But frozenmap_set
+ * will handle locking itself.
+ */
+ LockBuffer(fmbuffer, BUFFER_LOCK_UNLOCK);
+
+ reln = CreateFakeRelcacheEntry(rnode);
+ frozenmap_pin(reln, blkno, &fmbuffer);
+
+ if (lsn > PageGetLSN(fmpage))
+ frozenmap_set(reln, blkno, InvalidBuffer, lsn, fmbuffer);
+
+ ReleaseBuffer(fmbuffer);
+ FreeFakeRelcacheEntry(reln);
+ }
+ else if (BufferIsValid(fmbuffer))
+ UnlockReleaseBuffer(fmbuffer);
+}
+
/*
* Replay XLOG_HEAP2_FREEZE_PAGE records
*/
@@ -7384,6 +7563,20 @@ heap_xlog_insert(XLogReaderState *record)
FreeFakeRelcacheEntry(reln);
}
+ /*
+ * The frozen map may need to be fixed even if the heap page is
+ * already up-to-date.
+ */
+ if (xlrec->flags & XLOG_HEAP_ALL_FROZEN_CLEARED)
+ {
+ Relation reln = CreateFakeRelcacheEntry(target_node);
+ Buffer fmbuffer = InvalidBuffer;
+
+ frozenmap_pin(reln, blkno, &fmbuffer);
+ frozenmap_clear(reln, blkno, fmbuffer);
+ ReleaseBuffer(fmbuffer);
+ FreeFakeRelcacheEntry(reln);
+ }
+
/*
* If we inserted the first and only tuple on the page, re-initialize the
* page from scratch.
@@ -7439,6 +7632,9 @@ heap_xlog_insert(XLogReaderState *record)
if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
+ if (xlrec->flags & XLOG_HEAP_ALL_FROZEN_CLEARED)
+ PageClearAllFrozen(page);
+
MarkBufferDirty(buffer);
}
if (BufferIsValid(buffer))
@@ -7504,6 +7700,21 @@ heap_xlog_multi_insert(XLogReaderState *record)
FreeFakeRelcacheEntry(reln);
}
+ /*
+ * The frozen map may need to be fixed even if the heap page is
+ * already up-to-date.
+ */
+ if (xlrec->flags & XLOG_HEAP_ALL_FROZEN_CLEARED)
+ {
+ Relation reln = CreateFakeRelcacheEntry(rnode);
+ Buffer fmbuffer = InvalidBuffer;
+
+ frozenmap_pin(reln, blkno, &fmbuffer);
+ frozenmap_clear(reln, blkno, fmbuffer);
+ ReleaseBuffer(fmbuffer);
+ FreeFakeRelcacheEntry(reln);
+ }
+
if (isinit)
{
buffer = XLogInitBufferForRedo(record, 0);
@@ -7577,6 +7788,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
+ if (xlrec->flags & XLOG_HEAP_ALL_FROZEN_CLEARED)
+ PageClearAllFrozen(page);
MarkBufferDirty(buffer);
}
@@ -7660,6 +7873,22 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
}
/*
+ * The frozen map may need to be fixed even if the heap page is
+ * already up-to-date.
+ */
+ if (xlrec->flags & XLOG_HEAP_ALL_FROZEN_CLEARED)
+ {
+ Relation reln = CreateFakeRelcacheEntry(rnode);
+ Buffer fmbuffer = InvalidBuffer;
+
+ frozenmap_pin(reln, oldblk, &fmbuffer);
+ frozenmap_clear(reln, oldblk, fmbuffer);
+ ReleaseBuffer(fmbuffer);
+ FreeFakeRelcacheEntry(reln);
+ }
+
+ /*
* In normal operation, it is important to lock the two pages in
* page-number order, to avoid possible deadlocks against other update
* operations going the other way. However, during WAL replay there can
@@ -7705,6 +7934,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
+ if (xlrec->flags & XLOG_HEAP_ALL_FROZEN_CLEARED)
+ PageClearAllFrozen(page);
PageSetLSN(page, lsn);
MarkBufferDirty(obuffer);
@@ -7743,6 +7974,21 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
FreeFakeRelcacheEntry(reln);
}
+ /*
+ * The frozen map of the new page may need to be fixed even if the
+ * heap page is already up-to-date.
+ */
+ if (xlrec->flags & XLOG_HEAP_NEW_ALL_FROZEN_CLEARED)
+ {
+ Relation reln = CreateFakeRelcacheEntry(rnode);
+ Buffer fmbuffer = InvalidBuffer;
+
+ frozenmap_pin(reln, newblk, &fmbuffer);
+ frozenmap_clear(reln, newblk, fmbuffer);
+ ReleaseBuffer(fmbuffer);
+ FreeFakeRelcacheEntry(reln);
+ }
+
/* Deal with new tuple */
if (newaction == BLK_NEEDS_REDO)
{
@@ -7840,6 +8086,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
if (xlrec->flags & XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
+ if (xlrec->flags & XLOG_HEAP_NEW_ALL_FROZEN_CLEARED)
+ PageClearAllFrozen(page);
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
@@ -8072,6 +8320,21 @@ heap2_redo(XLogReaderState *record)
}
}
+void
+heap3_redo(XLogReaderState *record)
+{
+ uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+ switch (info & XLOG_HEAP_OPMASK)
+ {
+ case XLOG_HEAP3_FROZENMAP:
+ heap_xlog_frozenmap(record);
+ break;
+ default:
+ elog(PANIC, "heap3_redo: unknown op code %u", info);
+ }
+}
+
/*
* heap_sync - sync a heap, for use when no WAL has been written
*
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 6d091f6..5460d4f 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -15,6 +15,7 @@
#include "postgres.h"
+#include "access/frozenmap.h"
#include "access/heapam.h"
#include "access/hio.h"
#include "access/htup_details.h"
@@ -156,6 +157,62 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
}
/*
+ * For each heap page which is all-frozen, acquire a pin on the appropriate
+ * frozen map page, if we haven't already got one.
+ *
+ * This function has the same logic as GetVisibilityMapPins.
+ */
+static void
+GetFrozenMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
+ BlockNumber block1, BlockNumber block2,
+ Buffer *fmbuffer1, Buffer *fmbuffer2)
+{
+ bool need_to_pin_buffer1;
+ bool need_to_pin_buffer2;
+
+ Assert(BufferIsValid(buffer1));
+ Assert(buffer2 == InvalidBuffer || buffer1 <= buffer2);
+
+ while (1)
+ {
+ /* Figure out which pins we need but don't have. */
+ need_to_pin_buffer1 = PageIsAllFrozen(BufferGetPage(buffer1))
+ && !frozenmap_pin_ok(block1, *fmbuffer1);
+ need_to_pin_buffer2 = buffer2 != InvalidBuffer
+ && PageIsAllFrozen(BufferGetPage(buffer2))
+ && !frozenmap_pin_ok(block2, *fmbuffer2);
+ if (!need_to_pin_buffer1 && !need_to_pin_buffer2)
+ return;
+
+ /* We must unlock both buffers before doing any I/O. */
+ LockBuffer(buffer1, BUFFER_LOCK_UNLOCK);
+ if (buffer2 != InvalidBuffer && buffer2 != buffer1)
+ LockBuffer(buffer2, BUFFER_LOCK_UNLOCK);
+
+ /* Get pins. */
+ if (need_to_pin_buffer1)
+ frozenmap_pin(relation, block1, fmbuffer1);
+ if (need_to_pin_buffer2)
+ frozenmap_pin(relation, block2, fmbuffer2);
+
+ /* Relock buffers. */
+ LockBuffer(buffer1, BUFFER_LOCK_EXCLUSIVE);
+ if (buffer2 != InvalidBuffer && buffer2 != buffer1)
+ LockBuffer(buffer2, BUFFER_LOCK_EXCLUSIVE);
+
+ /*
+ * If there are two buffers involved and we pinned just one of them,
+ * it's possible that the second one became all-frozen while we were
+ * busy pinning the first one. If it looks like that's a possible
+ * scenario, we'll need to make a second pass through this loop.
+ */
+ if (buffer2 == InvalidBuffer || buffer1 == buffer2
+ || (need_to_pin_buffer1 && need_to_pin_buffer2))
+ break;
+ }
+}
+
+/*
* RelationGetBufferForTuple
*
* Returns pinned and exclusive-locked buffer of a page in given relation
@@ -215,7 +272,8 @@ Buffer
RelationGetBufferForTuple(Relation relation, Size len,
Buffer otherBuffer, int options,
BulkInsertState bistate,
- Buffer *vmbuffer, Buffer *vmbuffer_other)
+ Buffer *vmbuffer, Buffer *vmbuffer_other,
+ Buffer *fmbuffer, Buffer *fmbuffer_other)
{
bool use_fsm = !(options & HEAP_INSERT_SKIP_FSM);
Buffer buffer = InvalidBuffer;
@@ -316,6 +374,8 @@ RelationGetBufferForTuple(Relation relation, Size len,
buffer = ReadBufferBI(relation, targetBlock, bistate);
if (PageIsAllVisible(BufferGetPage(buffer)))
visibilitymap_pin(relation, targetBlock, vmbuffer);
+ if (PageIsAllFrozen(BufferGetPage(buffer)))
+ frozenmap_pin(relation, targetBlock, fmbuffer);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
}
else if (otherBlock == targetBlock)
@@ -324,6 +384,8 @@ RelationGetBufferForTuple(Relation relation, Size len,
buffer = otherBuffer;
if (PageIsAllVisible(BufferGetPage(buffer)))
visibilitymap_pin(relation, targetBlock, vmbuffer);
+ if (PageIsAllFrozen(BufferGetPage(buffer)))
+ frozenmap_pin(relation, targetBlock, fmbuffer);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
}
else if (otherBlock < targetBlock)
@@ -332,6 +394,8 @@ RelationGetBufferForTuple(Relation relation, Size len,
buffer = ReadBuffer(relation, targetBlock);
if (PageIsAllVisible(BufferGetPage(buffer)))
visibilitymap_pin(relation, targetBlock, vmbuffer);
+ if (PageIsAllFrozen(BufferGetPage(buffer)))
+ frozenmap_pin(relation, targetBlock, fmbuffer);
LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
}
@@ -341,6 +405,8 @@ RelationGetBufferForTuple(Relation relation, Size len,
buffer = ReadBuffer(relation, targetBlock);
if (PageIsAllVisible(BufferGetPage(buffer)))
visibilitymap_pin(relation, targetBlock, vmbuffer);
+ if (PageIsAllFrozen(BufferGetPage(buffer)))
+ frozenmap_pin(relation, targetBlock, fmbuffer);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
}
@@ -367,13 +433,23 @@ RelationGetBufferForTuple(Relation relation, Size len,
* done.
*/
if (otherBuffer == InvalidBuffer || buffer <= otherBuffer)
+ {
GetVisibilityMapPins(relation, buffer, otherBuffer,
targetBlock, otherBlock, vmbuffer,
vmbuffer_other);
+ GetFrozenMapPins(relation, buffer, otherBuffer,
+ targetBlock, otherBlock, fmbuffer,
+ fmbuffer_other);
+ }
else
+ {
GetVisibilityMapPins(relation, otherBuffer, buffer,
otherBlock, targetBlock, vmbuffer_other,
vmbuffer);
+ GetFrozenMapPins(relation, otherBuffer, buffer,
+ otherBlock, targetBlock, fmbuffer_other,
+ fmbuffer);
+ }
/*
* Now we can check to see if there's enough free space here. If so,
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 4f06a26..9a67733 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -149,6 +149,20 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
+void
+heap3_desc(StringInfo buf, XLogReaderState *record)
+{
+ uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+ /* XLOG_HEAP3_FROZENMAP records carry no payload beyond the registered buffers */
+ if (info == XLOG_HEAP3_FROZENMAP)
+ {
+ /* nothing to print */
+ }
+}
+
const char *
heap_identify(uint8 info)
{
@@ -226,3 +240,18 @@ heap2_identify(uint8 info)
return id;
}
+
+const char *
+heap3_identify(uint8 info)
+{
+ const char *id = NULL;
+
+ switch (info & ~XLR_INFO_MASK)
+ {
+ case XLOG_HEAP3_FROZENMAP:
+ id = "FROZENMAP";
+ break;
+ }
+
+ return id;
+}
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index ce398fc..961775e 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -19,6 +19,7 @@
#include "postgres.h"
+#include "access/frozenmap.h"
#include "access/visibilitymap.h"
#include "access/xact.h"
#include "access/xlog.h"
@@ -228,6 +229,7 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
{
bool fsm;
bool vm;
+ bool fm;
/* Open it at the smgr level if not already done */
RelationOpenSmgr(rel);
@@ -238,6 +240,7 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
rel->rd_smgr->smgr_targblock = InvalidBlockNumber;
rel->rd_smgr->smgr_fsm_nblocks = InvalidBlockNumber;
rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber;
+ rel->rd_smgr->smgr_fm_nblocks = InvalidBlockNumber;
/* Truncate the FSM first if it exists */
fsm = smgrexists(rel->rd_smgr, FSM_FORKNUM);
@@ -249,6 +252,11 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
if (vm)
visibilitymap_truncate(rel, nblocks);
+ /* Truncate the frozen map too if it exists. */
+ fm = smgrexists(rel->rd_smgr, FROZENMAP_FORKNUM);
+ if (fm)
+ frozenmap_truncate(rel, nblocks);
+
/*
* We WAL-log the truncation before actually truncating, which means
* trouble if the truncation fails. If we then crash, the WAL replay
@@ -282,7 +290,7 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
* with a truncated heap, but the FSM or visibility map would still
* contain entries for the non-existent heap pages.
*/
- if (fsm || vm)
+ if (fsm || vm || fm)
XLogFlush(lsn);
}
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 3febdd5..80a9f96 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -17,6 +17,7 @@
*/
#include "postgres.h"
+#include "access/frozenmap.h"
#include "access/multixact.h"
#include "access/relscan.h"
#include "access/rewriteheap.h"
@@ -1484,6 +1485,10 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
Oid mapped_tables[4];
int reindex_flags;
int i;
+ Buffer fmbuffer = InvalidBuffer,
+ buf = InvalidBuffer;
+ Relation rel;
+ BlockNumber nblocks, blkno;
/* Zero out possible results from swapped_relation_files */
memset(mapped_tables, 0, sizeof(mapped_tables));
@@ -1591,6 +1596,26 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RelationMapRemoveMapping(mapped_tables[i]);
/*
+ * We can ensure that all tuples of the new relation are completely
+ * frozen at this point, since we already hold AccessExclusiveLock.
+ * Set the bit in the frozen map and the flag in the page header for each page.
+ */
+ rel = relation_open(OIDOldHeap, NoLock);
+ nblocks = RelationGetNumberOfBlocks(rel);
+ for (blkno = 0; blkno < nblocks; blkno++)
+ {
+ buf = ReadBuffer(rel, blkno);
+ PageSetAllFrozen(BufferGetPage(buf));
+ frozenmap_pin(rel, blkno, &fmbuffer);
+ frozenmap_set(rel, blkno, buf, InvalidXLogRecPtr, fmbuffer);
+ ReleaseBuffer(buf);
+ }
+
+ if (fmbuffer != InvalidBuffer)
+ ReleaseBuffer(fmbuffer);
+ relation_close(rel, NoLock);
+
+ /*
* At this point, everything is kosher except that, if we did toast swap
* by links, the toast table's name corresponds to the transient table.
* The name is irrelevant to the backend because it's referenced by OID,
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index c3d6e59..8e9940b 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -37,6 +37,7 @@
#include <math.h>
+#include "access/frozenmap.h"
#include "access/genam.h"
#include "access/heapam.h"
#include "access/heapam_xlog.h"
@@ -106,6 +107,7 @@ typedef struct LVRelStats
BlockNumber rel_pages; /* total number of pages */
BlockNumber scanned_pages; /* number of pages we examined */
BlockNumber pinskipped_pages; /* # of pages we skipped due to a pin */
+ BlockNumber fmskipped_pages; /* # of pages we skipped by frozen map */
double scanned_tuples; /* counts only tuples on scanned pages */
double old_rel_tuples; /* previous value of pg_class.reltuples */
double new_rel_tuples; /* new estimated total # of tuples */
@@ -222,6 +224,8 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
* than or equal to the requested Xid full-table scan limit; or if the
* table's minimum MultiXactId is older than or equal to the requested
* mxid full-table scan limit.
+ * Even if scan_all is set, we may still be able to skip some pages
+ * according to the frozen map.
*/
scan_all = TransactionIdPrecedesOrEquals(onerel->rd_rel->relfrozenxid,
xidFullScanLimit);
@@ -247,20 +251,22 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
vac_close_indexes(nindexes, Irel, NoLock);
/*
- * Compute whether we actually scanned the whole relation. If we did, we
- * can adjust relfrozenxid and relminmxid.
+ * Compute whether we actually scanned the whole relation. If we did,
+ * we can adjust relfrozenxid and relminmxid.
*
* NB: We need to check this before truncating the relation, because that
* will change ->rel_pages.
*/
- if (vacrelstats->scanned_pages < vacrelstats->rel_pages)
+ if ((vacrelstats->scanned_pages + vacrelstats->fmskipped_pages)
+ < vacrelstats->rel_pages)
{
- Assert(!scan_all);
scanned_all = false;
}
else
scanned_all = true;
+ scanned_all |= scan_all;
+
/*
* Optionally truncate the relation.
*
@@ -450,7 +456,8 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
IndexBulkDeleteResult **indstats;
int i;
PGRUsage ru0;
- Buffer vmbuffer = InvalidBuffer;
+ Buffer vmbuffer = InvalidBuffer,
+ fmbuffer = InvalidBuffer;
BlockNumber next_not_all_visible_block;
bool skipping_all_visible_blocks;
xl_heap_freeze_tuple *frozen;
@@ -533,6 +540,8 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
hastup;
int prev_dead_count;
int nfrozen;
+ int already_nfrozen; /* # of tuples already frozen */
+ int ntup_blk; /* # of tuples in single page */
Size freespace;
bool all_visible_according_to_vm;
bool all_visible;
@@ -562,12 +571,33 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
else
skipping_all_visible_blocks = false;
all_visible_according_to_vm = false;
+
+ /*
+ * Even if the current block is not all-visible, we can skip
+ * vacuuming this block when the corresponding frozen map bit is
+ * set and a whole-table scan is required.
+ */
+ if (frozenmap_test(onerel, blkno, &fmbuffer) && scan_all)
+ {
+ vacrelstats->fmskipped_pages++;
+ continue;
+ }
}
else
{
- /* Current block is all-visible */
+ /*
+ * Current block is all-visible.
+ * If the frozen map shows that it is also all-frozen and this
+ * vacuum requires a whole-table scan, we can skip vacuuming
+ * this block.
+ */
+ if (frozenmap_test(onerel, blkno, &fmbuffer) && scan_all)
+ {
+ vacrelstats->fmskipped_pages++;
+ continue;
+ }
if (skipping_all_visible_blocks && !scan_all)
continue;
+
all_visible_according_to_vm = true;
}
@@ -592,6 +622,12 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
vmbuffer = InvalidBuffer;
}
+ if (BufferIsValid(fmbuffer))
+ {
+ ReleaseBuffer(fmbuffer);
+ fmbuffer = InvalidBuffer;
+ }
+
/* Log cleanup info before we touch indexes */
vacuum_log_cleanup_info(onerel, vacrelstats);
@@ -621,6 +657,7 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
* and did a cycle of index vacuuming.
*/
visibilitymap_pin(onerel, blkno, &vmbuffer);
+ frozenmap_pin(onerel, blkno, &fmbuffer);
buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
RBM_NORMAL, vac_strategy);
@@ -763,6 +800,8 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
all_visible = true;
has_dead_tuples = false;
nfrozen = 0;
+ already_nfrozen = 0;
+ ntup_blk = 0;
hastup = false;
prev_dead_count = vacrelstats->num_dead_tuples;
maxoff = PageGetMaxOffsetNumber(page);
@@ -917,8 +956,13 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
else
{
num_tuples += 1;
+ ntup_blk += 1;
hastup = true;
+ /* If current tuple is already frozen, count it up */
+ if (HeapTupleHeaderXminFrozen(tuple.t_data))
+ already_nfrozen += 1;
+
/*
* Each non-removable tuple must be checked to see if it needs
* freezing. Note we already have exclusive buffer lock.
@@ -952,6 +996,27 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
heap_execute_freeze_tuple(htup, &frozen[i]);
}
+ /*
+ * If any unfrozen tuples remain on the current page and the page
+ * is marked ALL_FROZEN, we should clear the flag and the FM bit.
+ */
+ if (ntup_blk != (nfrozen + already_nfrozen)
+ && PageIsAllFrozen(page))
+ {
+ PageClearAllFrozen(page);
+ frozenmap_clear(onerel, blkno, fmbuffer);
+ }
+ /*
+ * If, as a result of scanning the page, we find that all tuples
+ * are completely frozen, set the bit in the frozen map and the
+ * PD_ALL_FROZEN flag on the page.
+ */
+ else if (ntup_blk == (nfrozen + already_nfrozen))
+ {
+ PageSetAllFrozen(page);
+ frozenmap_set(onerel, blkno, buf, InvalidXLogRecPtr, fmbuffer);
+ }
+
/* Now WAL-log freezing if neccessary */
if (RelationNeedsWAL(onerel))
{
@@ -1077,13 +1142,18 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
num_tuples);
/*
- * Release any remaining pin on visibility map page.
+ * Release any remaining pin on visibility map and frozen map page.
*/
if (BufferIsValid(vmbuffer))
{
ReleaseBuffer(vmbuffer);
vmbuffer = InvalidBuffer;
}
+ if (BufferIsValid(fmbuffer))
+ {
+ ReleaseBuffer(fmbuffer);
+ fmbuffer = InvalidBuffer;
+ }
/* If any tuples need to be deleted, perform final vacuum cycle */
/* XXX put a threshold on min number of tuples here? */
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index f96fb24..67898df 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -92,7 +92,7 @@ ExecCheckPlanOutput(Relation resultRel, List *targetList)
if (exprType((Node *) tle->expr) != attr->atttypid)
ereport(ERROR,
(errcode(ERRCODE_DATATYPE_MISMATCH),
- errmsg("table row type and query-specified row type do not match"),
+ errmsg("table row type and query-specified row type do not match"),
errdetail("Table has type %s at ordinal position %d, but query expects %s.",
format_type_be(attr->atttypid),
attno,
@@ -117,7 +117,7 @@ ExecCheckPlanOutput(Relation resultRel, List *targetList)
if (attno != resultDesc->natts)
ereport(ERROR,
(errcode(ERRCODE_DATATYPE_MISMATCH),
- errmsg("table row type and query-specified row type do not match"),
+ errmsg("table row type and query-specified row type do not match"),
errdetail("Query has too few columns.")));
}
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index eb7293f..d66660d 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -55,6 +55,7 @@ typedef struct XLogRecordBuffer
static void DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
static void DecodeHeapOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
static void DecodeHeap2Op(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
+static void DecodeHeap3Op(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
static void DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
static void DecodeStandbyOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
@@ -104,6 +105,10 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
DecodeStandbyOp(ctx, &buf);
break;
+ case RM_HEAP3_ID:
+ DecodeHeap3Op(ctx, &buf);
+ break;
+
case RM_HEAP2_ID:
DecodeHeap2Op(ctx, &buf);
break;
@@ -300,6 +305,29 @@ DecodeStandbyOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
}
/*
+ * Handle rmgr HEAP3_ID records for DecodeRecordIntoReorderBuffer().
+ */
+static void
+DecodeHeap3Op(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
+{
+ uint8 info = XLogRecGetInfo(buf->record) & XLOG_HEAP_OPMASK;
+ SnapBuild *builder = ctx->snapshot_builder;
+
+ /* no point in doing anything yet */
+ if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
+ return;
+
+ switch (info)
+ {
+ case XLOG_HEAP3_FROZENMAP:
+ break;
+ default:
+ elog(ERROR, "unexpected RM_HEAP3_ID record type: %u", info);
+ }
+
+}
+
+/*
* Handle rmgr HEAP2_ID records for DecodeRecordIntoReorderBuffer().
*/
static void
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index 244b4ea..666e682 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -168,6 +168,7 @@ smgropen(RelFileNode rnode, BackendId backend)
reln->smgr_targblock = InvalidBlockNumber;
reln->smgr_fsm_nblocks = InvalidBlockNumber;
reln->smgr_vm_nblocks = InvalidBlockNumber;
+ reln->smgr_fm_nblocks = InvalidBlockNumber;
reln->smgr_which = 0; /* we only have md.c at present */
/* mark it not open */
diff --git a/src/common/relpath.c b/src/common/relpath.c
index 66dfef1..7eba9ee 100644
--- a/src/common/relpath.c
+++ b/src/common/relpath.c
@@ -35,6 +35,7 @@ const char *const forkNames[] = {
"main", /* MAIN_FORKNUM */
"fsm", /* FSM_FORKNUM */
"vm", /* VISIBILITYMAP_FORKNUM */
+ "fm", /* FROZENMAP_FORKNUM */
"init" /* INIT_FORKNUM */
};
@@ -58,7 +59,7 @@ forkname_to_number(const char *forkName)
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid fork name"),
errhint("Valid fork names are \"main\", \"fsm\", "
- "\"vm\", and \"init\".")));
+ "\"vm\", \"fm\" and \"init\".")));
#endif
return InvalidForkNumber;
diff --git a/src/include/access/frozenmap.h b/src/include/access/frozenmap.h
new file mode 100644
index 0000000..0f2e54e
--- /dev/null
+++ b/src/include/access/frozenmap.h
@@ -0,0 +1,33 @@
+/*-------------------------------------------------------------------------
+ *
+ * frozenmap.h
+ * frozen map interface
+ *
+ *
+ * Portions Copyright (c) 2007-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/access/frozenmap.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FROZENMAP_H
+#define FROZENMAP_H
+
+#include "access/xlogdefs.h"
+#include "storage/block.h"
+#include "storage/buf.h"
+#include "utils/relcache.h"
+
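+/*
+ * The functions below intentionally mirror the visibilitymap.h API; the
+ * implementation lives in src/backend/access/heap/frozenmap.c.
+ */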
+extern void frozenmap_clear(Relation rel, BlockNumber heapBlk,
+ Buffer fmbuf);
+extern void frozenmap_pin(Relation rel, BlockNumber heapBlk,
+ Buffer *fmbuf);
+extern bool frozenmap_pin_ok(BlockNumber heapBlk, Buffer fmbuf);
+extern void frozenmap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
+ XLogRecPtr recptr, Buffer fmBuf);
+extern bool frozenmap_test(Relation rel, BlockNumber heapBlk, Buffer *fmbuf);
+extern BlockNumber frozenmap_count(Relation rel);
+extern void frozenmap_truncate(Relation rel, BlockNumber nheapblocks);
+
+#endif /* FROZENMAP_H */
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index f0f89de..087cfeb 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,6 +60,13 @@
#define XLOG_HEAP2_NEW_CID 0x70
/*
+ * heapam.c has a third RmgrId now. These opcodes are associated with
+ * RM_HEAP3_ID, but are not logically different from the ones above
+ * associated with RM_HEAP_ID. XLOG_HEAP_OPMASK applies to these, too.
+ */
+#define XLOG_HEAP3_FROZENMAP 0x00
+
+/*
* xl_heap_* ->flag values, 8 bits are available.
*/
/* PD_ALL_VISIBLE was cleared */
@@ -73,6 +80,10 @@
#define XLOG_HEAP_SUFFIX_FROM_OLD (1<<6)
/* last xl_heap_multi_insert record for one heap_multi_insert() call */
#define XLOG_HEAP_LAST_MULTI_INSERT (1<<7)
+/* PD_ALL_FROZEN was cleared for INSERT and UPDATE */
+#define XLOG_HEAP_ALL_FROZEN_CLEARED (1<<8)
+/* PD_ALL_FROZEN was cleared in the new heap page of an UPDATE */
+#define XLOG_HEAP_NEW_ALL_FROZEN_CLEARED (1<<9)
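+/*
+ * Note: the two frozen map bits above do not fit in eight bits, which is
+ * why the flags fields of xl_heap_insert, xl_heap_multi_insert and
+ * xl_heap_update below are widened from uint8 to uint16.
+ */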
/* convenience macro for checking whether any form of old tuple was logged */
#define XLOG_HEAP_CONTAINS_OLD \
@@ -110,12 +121,12 @@ typedef struct xl_heap_header
typedef struct xl_heap_insert
{
OffsetNumber offnum; /* inserted tuple's offset */
- uint8 flags;
+ uint16 flags;
/* xl_heap_header & TUPLE DATA in backup block 0 */
} xl_heap_insert;
-#define SizeOfHeapInsert (offsetof(xl_heap_insert, flags) + sizeof(uint8))
+#define SizeOfHeapInsert (offsetof(xl_heap_insert, flags) + sizeof(uint16))
/*
* This is what we need to know about a multi-insert.
@@ -130,7 +141,7 @@ typedef struct xl_heap_insert
*/
typedef struct xl_heap_multi_insert
{
- uint8 flags;
+ uint16 flags;
uint16 ntuples;
OffsetNumber offsets[FLEXIBLE_ARRAY_MEMBER];
} xl_heap_multi_insert;
@@ -170,7 +181,7 @@ typedef struct xl_heap_update
TransactionId old_xmax; /* xmax of the old tuple */
OffsetNumber old_offnum; /* old tuple's offset */
uint8 old_infobits_set; /* infomask bits to set on old tuple */
- uint8 flags;
+ uint16 flags;
TransactionId new_xmax; /* xmax of the new tuple */
OffsetNumber new_offnum; /* new tuple's offset */
@@ -342,6 +353,9 @@ extern const char *heap_identify(uint8 info);
extern void heap2_redo(XLogReaderState *record);
extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
+extern void heap3_redo(XLogReaderState *record);
+extern void heap3_desc(StringInfo buf, XLogReaderState *record);
+extern const char *heap3_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
extern XLogRecPtr log_heap_cleanup_info(RelFileNode rnode,
@@ -354,6 +368,8 @@ extern XLogRecPtr log_heap_clean(Relation reln, Buffer buffer,
extern XLogRecPtr log_heap_freeze(Relation reln, Buffer buffer,
TransactionId cutoff_xid, xl_heap_freeze_tuple *tuples,
int ntuples);
+extern XLogRecPtr log_heap_frozenmap(RelFileNode rnode, Buffer heap_buffer,
+ Buffer fm_buffer);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
TransactionId cutoff_xid,
TransactionId cutoff_multi,
diff --git a/src/include/access/hio.h b/src/include/access/hio.h
index b014029..1a27ee8 100644
--- a/src/include/access/hio.h
+++ b/src/include/access/hio.h
@@ -40,6 +40,8 @@ extern void RelationPutHeapTuple(Relation relation, Buffer buffer,
extern Buffer RelationGetBufferForTuple(Relation relation, Size len,
Buffer otherBuffer, int options,
BulkInsertState bistate,
- Buffer *vmbuffer, Buffer *vmbuffer_other);
+ Buffer *vmbuffer, Buffer *vmbuffer_other,
+ Buffer *fmbuffer, Buffer *fmbuffer_other);
#endif /* HIO_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 48f04c6..e49c0b0 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -34,6 +34,7 @@ PG_RMGR(RM_TBLSPC_ID, "Tablespace", tblspc_redo, tblspc_desc, tblspc_identify, N
PG_RMGR(RM_MULTIXACT_ID, "MultiXact", multixact_redo, multixact_desc, multixact_identify, NULL, NULL)
PG_RMGR(RM_RELMAP_ID, "RelMap", relmap_redo, relmap_desc, relmap_identify, NULL, NULL)
PG_RMGR(RM_STANDBY_ID, "Standby", standby_redo, standby_desc, standby_identify, NULL, NULL)
+PG_RMGR(RM_HEAP3_ID, "Heap3", heap3_redo, heap3_desc, heap3_identify, NULL, NULL)
PG_RMGR(RM_HEAP2_ID, "Heap2", heap2_redo, heap2_desc, heap2_identify, NULL, NULL)
PG_RMGR(RM_HEAP_ID, "Heap", heap_redo, heap_desc, heap_identify, NULL, NULL)
PG_RMGR(RM_BTREE_ID, "Btree", btree_redo, btree_desc, btree_identify, NULL, NULL)
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 8b4c35c..8420e47 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -47,6 +47,8 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
float4 reltuples; /* # of tuples (not always up-to-date) */
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
+ int32 relallfrozen; /* # of all-frozen blocks (not always
+ * up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
@@ -95,7 +97,7 @@ typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
-#define Natts_pg_class 30
+#define Natts_pg_class 31
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
@@ -107,25 +109,26 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_relpages 9
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
-#define Anum_pg_class_reltoastrelid 12
-#define Anum_pg_class_relhasindex 13
-#define Anum_pg_class_relisshared 14
-#define Anum_pg_class_relpersistence 15
-#define Anum_pg_class_relkind 16
-#define Anum_pg_class_relnatts 17
-#define Anum_pg_class_relchecks 18
-#define Anum_pg_class_relhasoids 19
-#define Anum_pg_class_relhaspkey 20
-#define Anum_pg_class_relhasrules 21
-#define Anum_pg_class_relhastriggers 22
-#define Anum_pg_class_relhassubclass 23
-#define Anum_pg_class_relrowsecurity 24
-#define Anum_pg_class_relispopulated 25
-#define Anum_pg_class_relreplident 26
-#define Anum_pg_class_relfrozenxid 27
-#define Anum_pg_class_relminmxid 28
-#define Anum_pg_class_relacl 29
-#define Anum_pg_class_reloptions 30
+#define Anum_pg_class_relallfrozen 12
+#define Anum_pg_class_reltoastrelid 13
+#define Anum_pg_class_relhasindex 14
+#define Anum_pg_class_relisshared 15
+#define Anum_pg_class_relpersistence 16
+#define Anum_pg_class_relkind 17
+#define Anum_pg_class_relnatts 18
+#define Anum_pg_class_relchecks 19
+#define Anum_pg_class_relhasoids 20
+#define Anum_pg_class_relhaspkey 21
+#define Anum_pg_class_relhasrules 22
+#define Anum_pg_class_relhastriggers 23
+#define Anum_pg_class_relhassubclass 24
+#define Anum_pg_class_relrowsecurity 25
+#define Anum_pg_class_relispopulated 26
+#define Anum_pg_class_relreplident 27
+#define Anum_pg_class_relfrozenxid 28
+#define Anum_pg_class_relminmxid 29
+#define Anum_pg_class_relacl 30
+#define Anum_pg_class_reloptions 31
/* ----------------
* initial contents of pg_class
@@ -140,13 +143,13 @@ typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f f t n 3 1 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f f t n 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f f t n 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f f t n 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f f t n 3 1 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f f t n 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f f t n 3 1 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 31 0 t f f f f f t n 3 1 _null_ _null_ ));
DESCR("");
diff --git a/src/include/common/relpath.h b/src/include/common/relpath.h
index a263779..5d40997 100644
--- a/src/include/common/relpath.h
+++ b/src/include/common/relpath.h
@@ -27,6 +27,7 @@ typedef enum ForkNumber
MAIN_FORKNUM = 0,
FSM_FORKNUM,
VISIBILITYMAP_FORKNUM,
+ FROZENMAP_FORKNUM,
INIT_FORKNUM
/*
@@ -38,7 +39,7 @@ typedef enum ForkNumber
#define MAX_FORKNUM INIT_FORKNUM
-#define FORKNAMECHARS 4 /* max chars for a fork name */
+#define FORKNAMECHARS 5 /* max chars for a fork name */
extern const char *const forkNames[];
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index c2fbffc..f46375d 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -178,8 +178,10 @@ typedef PageHeaderData *PageHeader;
* tuple? */
#define PD_ALL_VISIBLE 0x0004 /* all tuples on page are visible to
* everyone */
+#define PD_ALL_FROZEN 0x0008 /* all tuples on page are completely
+ * frozen */
-#define PD_VALID_FLAG_BITS 0x0007 /* OR of all valid pd_flags bits */
+#define PD_VALID_FLAG_BITS 0x000F /* OR of all valid pd_flags bits */
/*
* Page layout version number 0 is for pre-7.3 Postgres releases.
@@ -367,6 +369,13 @@ typedef PageHeaderData *PageHeader;
#define PageClearAllVisible(page) \
(((PageHeader) (page))->pd_flags &= ~PD_ALL_VISIBLE)
+#define PageIsAllFrozen(page) \
+ (((PageHeader) (page))->pd_flags & PD_ALL_FROZEN)
+#define PageSetAllFrozen(page) \
+ (((PageHeader) (page))->pd_flags |= PD_ALL_FROZEN)
+#define PageClearAllFrozen(page) \
+ (((PageHeader) (page))->pd_flags &= ~PD_ALL_FROZEN)
+
#define PageIsPrunable(page, oldestxmin) \
( \
AssertMacro(TransactionIdIsNormal(oldestxmin)), \
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index 69a624f..2173c20 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -55,6 +55,7 @@ typedef struct SMgrRelationData
BlockNumber smgr_targblock; /* current insertion target block */
BlockNumber smgr_fsm_nblocks; /* last known size of fsm fork */
BlockNumber smgr_vm_nblocks; /* last known size of vm fork */
+ BlockNumber smgr_fm_nblocks; /* last known size of fm fork */
/* additional public fields may someday exist here */
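
For readers comparing this with the visibility map code, here is a rough
caller-side sketch (not part of the patch) of how the frozenmap.h API and
the new PD_ALL_FROZEN flag are meant to be used from vacuum; onerel, blkno,
page, buf and all_frozen are placeholders for variables that
lazy_scan_heap() already has:

    Buffer  fmbuffer = InvalidBuffer;

    /* make sure the FM page covering blkno is pinned */
    frozenmap_pin(onerel, blkno, &fmbuffer);

    if (frozenmap_test(onerel, blkno, &fmbuffer))
    {
        /* every tuple on blkno is already frozen; no freezing needed */
    }
    else if (all_frozen)        /* this pass found all tuples frozen */
    {
        PageSetAllFrozen(page);
        frozenmap_set(onerel, blkno, buf, InvalidXLogRecPtr, fmbuffer);
    }

    if (BufferIsValid(fmbuffer))
        ReleaseBuffer(fmbuffer);

Since the API deliberately mirrors visibilitymap.h, callers that already
manage a vmbuffer pin can handle the fmbuffer the same way, which is why
RelationGetBufferForTuple() above simply grows a second pair of buffer
arguments.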
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers