On Sun, Nov 9, 2014 at 10:32 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
> On Sun, Nov 9, 2014 at 6:41 AM, Rahila Syed <rahilasye...@gmail.com> wrote:
>> Hello,
>>
>>> The patch was not applied to the master cleanly. Could you update the
>>> patch?
>> Please find attached an updated and rebased patch to compress FPW. Review
>> comments given above have been implemented.
>
> Thanks for updating the patch! Will review it.
>
> BTW, I got the following compiler warnings.
>
> xlogreader.c:755: warning: assignment from incompatible pointer type
> autovacuum.c:1412: warning: implicit declaration of function
> 'CompressBackupBlocksPagesAlloc'
> xlogreader.c:755: warning: assignment from incompatible pointer type

I have been looking at this patch; here are some comments:

1) This documentation change is incorrect:
-     <term><varname>full_page_writes</varname> (<type>boolean</type>)
+     <term><varname>full_page_writes</varname> (<type>enum</type>)</term>
      <indexterm>
       <primary><varname>full_page_writes</> configuration parameter</primary>
      </indexterm>
-     </term>
The closing </term> tag was correctly placed before this change.

2) This patch defines both FullPageWritesStr and full_page_writes_str, which
do more or less the same thing.

3) This patch touches worker_spi.c and calls CompressBackupBlocksPagesAlloc
directly. Why is that necessary? Doesn't a bgworker call InitXLOGAccess once
it connects to a database?

4) Be careful as well about whitespace, and keep code lines within a maximum
of 80 characters:
+ * If compression is set on replace the rdata nodes of backup blocks added in the loop
+ * above by single rdata node that contains compressed backup blocks and their headers
+ * except the header of first block which is used to store the information about compression.
+ */

5) GetFullPageWriteGUC or something similar is necessary, but I think that for
consistency with doPageWrites its value should be fetched in XLogInsert and
then passed as an extra argument to XLogRecordAssemble.
Thinking more about this, I think that it would be cleaner to simply have a
boolean flag tracking whether compression is active or not, something like
doPageCompression, that could be fetched using GetFullPageWriteInfo. Thinking
more about it, we could directly track forcePageWrites and fullPageWrites, but
that would make back-patching more difficult for not much gain.

6) Not really a complaint, but note that this patch uses two bits that were
unused up to now to store the compression status of a backup block. This is
actually safe as long as the maximum page size is not higher than 32k, which
is the limit authorized by --with-blocksize btw. I think that this deserves a
comment at the top of the declaration of BkpBlock.
!   unsigned    hole_offset:15, /* number of bytes before "hole" */
!               flags:2,        /* state of a backup block, see below */
!               hole_length:15; /* number of bytes in "hole" */

7) Some code in RestoreBackupBlock:
+   char       *uncompressedPages;
+
+   uncompressedPages = (char *) palloc(XLR_TOTAL_BLCKSZ);
[...]
+   /* Check if blocks in WAL record are compressed */
+   if (bkpb.flag_compress == BKPBLOCKS_COMPRESSED)
+   {
+       /* Checks to see if decompression is successful is made inside the function */
+       pglz_decompress((PGLZ_Header *) blk, uncompressedPages);
+       blk = uncompressedPages;
+   }
uncompressedPages is palloc'd every time, but you actually only need to do
that when the block is compressed.

8) Arf, I don't much like the logic around CompressBackupBlocksPagesAlloc,
which uses a malloc to allocate once the space necessary for compressed and
uncompressed pages. You are right not to do that inside a critical section,
but PG tries to maximize the allocations done with palloc. Now it is true
that if a palloc does not succeed, PG always ERROR's out (writer adding an
entry to the TODO list)... Hence I think that using static variables for
those compressed and uncompressed pages makes more sense, and this greatly
simplifies the patch as well.
9) Is avw_sighup_handler really necessary? What's wrong with allocating the
buffers all the time by default? This avoids some potential caveats in error
handling, as well as in value updates for full_page_writes.
So, note that I am not only complaining about the patch: I actually rewrote it
as attached while reviewing, with additional minor cleanups and enhancements.
I also did a couple of tests like the attached script; compression numbers are
more or less the same as with your previous patch, with some noise creating
differences. I have also done some regression test runs with a standby
replaying behind. I'll go through the patch once again a bit later, but feel
free to comment.

Regards,
--
Michael
compress_test_2.sh
Description: Bourne shell script
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 47b1192..6756172 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2181,14 +2181,15 @@ include_dir 'conf.d'
      </varlistentry>

      <varlistentry id="guc-full-page-writes" xreflabel="full_page_writes">
-      <term><varname>full_page_writes</varname> (<type>boolean</type>)
+      <term><varname>full_page_writes</varname> (<type>enum</type>)
       <indexterm>
        <primary><varname>full_page_writes</> configuration parameter</primary>
       </indexterm>
      </term>
      <listitem>
       <para>
-        When this parameter is on, the <productname>PostgreSQL</> server
+        When this parameter is <literal>on</> or <literal>compress</>,
+        the <productname>PostgreSQL</> server
        writes the entire content of each disk page to WAL during the
        first modification of that page after a checkpoint.
        This is needed because
@@ -2206,6 +2207,11 @@ include_dir 'conf.d'
       </para>

       <para>
+        Valid values are <literal>on</>, <literal>compress</>, and <literal>off</>.
+        The default is <literal>on</>.
+      </para>
+
+      <para>
        Turning this parameter off speeds normal operation, but
        might lead to either unrecoverable data corruption, or silent
        data corruption, after a system failure. The risks are similar to
        turning off
@@ -2220,9 +2226,13 @@ include_dir 'conf.d'
       </para>

       <para>
+        Setting this parameter to <literal>compress</> compresses
+        the full page image to reduce the amount of WAL data.
+      </para>
+
+      <para>
        This parameter can only be set in the <filename>postgresql.conf</>
        file or on the server command line.
-        The default is <literal>on</>.
       </para>
      </listitem>
     </varlistentry>
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index e0957ff..15ec521 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -49,7 +49,7 @@ xlog_desc(StringInfo buf, XLogRecord *record)
                (uint32) (checkpoint->redo >> 32), (uint32) checkpoint->redo,
                checkpoint->ThisTimeLineID,
                checkpoint->PrevTimeLineID,
-               checkpoint->fullPageWrites ? "true" : "false",
+               FullPageWritesStr(checkpoint->fullPageWrites),
                checkpoint->nextXidEpoch, checkpoint->nextXid,
                checkpoint->nextOid,
                checkpoint->nextMulti,
@@ -118,10 +118,10 @@ xlog_desc(StringInfo buf, XLogRecord *record)
    }
    else if (info == XLOG_FPW_CHANGE)
    {
-       bool        fpw;
+       int         fpw;

-       memcpy(&fpw, rec, sizeof(bool));
-       appendStringInfo(buf, "%s", fpw ? "true" : "false");
+       memcpy(&fpw, rec, sizeof(int));
+       appendStringInfo(buf, "fpw: %s", FullPageWritesStr(fpw));
    }
    else if (info == XLOG_END_OF_RECOVERY)
    {
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 99f702c..dc369b4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -62,6 +62,7 @@
 #include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
+#include "utils/pg_lzcompress.h"
 #include "utils/ps_status.h"
 #include "utils/relmapper.h"
 #include "utils/snapmgr.h"
@@ -85,7 +86,7 @@ int          XLogArchiveTimeout = 0;
 bool        XLogArchiveMode = false;
 char       *XLogArchiveCommand = NULL;
 bool        EnableHotStandby = false;
-bool        fullPageWrites = true;
+int         fullPageWrites = FULL_PAGE_WRITES_ON;
 bool        wal_log_hints = false;
 bool        log_checkpoints = false;
 int         sync_method = DEFAULT_SYNC_METHOD;
@@ -179,7 +180,7 @@ static TimeLineID receiveTLI = 0;
  * that the recovery starting checkpoint record indicates, and then updated
  * each time XLOG_FPW_CHANGE record is replayed.
  */
-static bool lastFullPageWrites;
+static int  lastFullPageWrites;

 /*
  * Local copy of SharedRecoveryInProgress variable. True actually means "not
  * known, need to check the shared state".
@@ -316,6 +317,13 @@ static XLogRecPtr RedoRecPtr;
 static bool doPageWrites;

 /*
+ * doPageCompression is this backend's local copy of
+ * (fullPageWrites == FULL_PAGE_WRITES_COMPRESS). It is used to check if
+ * a full page write can be compressed.
+ */
+static int doPageCompression;
+
+/*
  * RedoStartLSN points to the checkpoint's REDO location which is specified
  * in a backup label file, backup history file or control file. In standby
  * mode, XLOG streaming usually starts from the position where an invalid
@@ -464,7 +472,7 @@ typedef struct XLogCtlInsert
     */
    XLogRecPtr  RedoRecPtr;     /* current redo point for insertions */
    bool        forcePageWrites;    /* forcing full-page writes for PITR? */
-   bool        fullPageWrites;
+   int         fullPageWrites;

    /*
     * exclusiveBackup is true if a backup started with pg_start_backup() is
@@ -931,10 +939,11 @@ XLogInsertRecord(XLogRecData *rdata, XLogRecPtr fpw_lsn)
    WALInsertLockAcquire();

    /*
-    * Check to see if my copy of RedoRecPtr or doPageWrites is out of date.
-    * If so, may have to go back and have the caller recompute everything.
-    * This can only happen just after a checkpoint, so it's better to be
-    * slow in this case and fast otherwise.
+    * Check to see if my copy of RedoRecPtr, doPageWrites or
+    * doPageCompression is out of date. If so, may have to go back and
+    * have the caller recompute everything. This can only happen just
+    * after a checkpoint, so it's better to be slow in this case and
+    * fast otherwise.
     *
     * If we aren't doing full-page writes then RedoRecPtr doesn't actually
     * affect the contents of the XLOG record, so we'll update our local copy
@@ -948,6 +957,7 @@ XLogInsertRecord(XLogRecData *rdata, XLogRecPtr fpw_lsn)
        RedoRecPtr = Insert->RedoRecPtr;
    }
    doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+   doPageCompression = (Insert->fullPageWrites == FULL_PAGE_WRITES_COMPRESS);

    if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites)
    {
@@ -5928,6 +5938,7 @@ StartupXLOG(void)
    RedoRecPtr = XLogCtl->RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo;
    doPageWrites = lastFullPageWrites;
+   doPageCompression = (lastFullPageWrites == FULL_PAGE_WRITES_COMPRESS);

    if (RecPtr < checkPoint.redo)
        ereport(PANIC,
@@ -7221,17 +7232,23 @@ GetRedoRecPtr(void)

 /*
  * Return information needed to decide whether a modified block needs a
- * full-page image to be included in the WAL record.
+ * full-page image to be included in the WAL record, compressed or not.
  *
  * The returned values are cached copies from backend-private memory, and
  * possibly out-of-date. XLogInsertRecord will re-check them against
  * up-to-date values, while holding the WAL insert lock.
  */
 void
-GetFullPageWriteInfo(XLogRecPtr *RedoRecPtr_p, bool *doPageWrites_p)
-{
-   *RedoRecPtr_p = RedoRecPtr;
-   *doPageWrites_p = doPageWrites;
+GetFullPageWriteInfo(XLogRecPtr *RedoRecPtr_p,
+                    bool *doPageWrites_p,
+                    bool *doPageCompression_p)
+{
+   if (RedoRecPtr_p)
+       *RedoRecPtr_p = RedoRecPtr;
+   if (doPageWrites_p)
+       *doPageWrites_p = doPageWrites;
+   if (doPageCompression_p)
+       *doPageCompression_p = doPageCompression;
 }

 /*
@@ -8471,10 +8488,10 @@ UpdateFullPageWrites(void)
     * setting it to false, first write the WAL record and then set the global
     * flag.
     */
-   if (fullPageWrites)
+   if (fullPageWrites != FULL_PAGE_WRITES_OFF)
    {
        WALInsertLockAcquireExclusive();
-       Insert->fullPageWrites = true;
+       Insert->fullPageWrites = fullPageWrites;
        WALInsertLockRelease();
    }

@@ -8487,17 +8504,17 @@ UpdateFullPageWrites(void)
        XLogRecData rdata;

        rdata.data = (char *) (&fullPageWrites);
-       rdata.len = sizeof(bool);
+       rdata.len = sizeof(int);
        rdata.buffer = InvalidBuffer;
        rdata.next = NULL;

        XLogInsert(RM_XLOG_ID, XLOG_FPW_CHANGE, &rdata);
    }

-   if (!fullPageWrites)
+   if (fullPageWrites == FULL_PAGE_WRITES_OFF)
    {
        WALInsertLockAcquireExclusive();
-       Insert->fullPageWrites = false;
+       Insert->fullPageWrites = fullPageWrites;
        WALInsertLockRelease();
    }
    END_CRIT_SECTION();
@@ -8840,16 +8857,16 @@ xlog_redo(XLogRecPtr lsn, XLogRecord *record)
    }
    else if (info == XLOG_FPW_CHANGE)
    {
-       bool        fpw;
+       int         fpw;

-       memcpy(&fpw, XLogRecGetData(record), sizeof(bool));
+       memcpy(&fpw, XLogRecGetData(record), sizeof(int));

        /*
         * Update the LSN of the last replayed XLOG_FPW_CHANGE record so that
         * do_pg_start_backup() and do_pg_stop_backup() can check whether
         * full_page_writes has been disabled during online backup.
         */
-       if (!fpw)
+       if (fpw == FULL_PAGE_WRITES_OFF)
        {
            SpinLockAcquire(&XLogCtl->info_lck);
            if (XLogCtl->lastFpwDisableRecPtr < ReadRecPtr)
@@ -9192,7 +9209,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,

    do
    {
-       bool        checkpointfpw;
+       int         checkpointfpw;

        /*
         * Force a CHECKPOINT. Aside from being necessary to prevent torn
@@ -9241,7 +9258,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
        recptr = XLogCtl->lastFpwDisableRecPtr;
        SpinLockRelease(&XLogCtl->info_lck);

-       if (!checkpointfpw || startpoint <= recptr)
+       if (checkpointfpw == FULL_PAGE_WRITES_OFF || startpoint <= recptr)
            ereport(ERROR,
                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                     errmsg("WAL generated with full_page_writes=off was replayed "
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index b83343b..a42b514 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -23,13 +23,21 @@
 #include "storage/proc.h"
 #include "utils/memutils.h"
 #include "pg_trace.h"
+#include "utils/pg_lzcompress.h"

 static XLogRecData *XLogRecordAssemble(RmgrId rmid, uint8 info,
-                  XLogRecData *rdata,
-                  XLogRecPtr RedoRecPtr, bool doPageWrites,
+                  XLogRecData *rdata, XLogRecPtr RedoRecPtr,
+                  bool doPageWrites, bool doPageCompression,
                   XLogRecPtr *fpw_lsn, XLogRecData **rdt_lastnormal);
 static void XLogFillBkpBlock(Buffer buffer, bool buffer_std, BkpBlock *bkpb);
+static char *CompressBackupBlocks(char *page, uint32 orig_len,
+                  char *dest, uint32 *len);
+
+/* Storage for backup blocks before and after compression */
+static char compressedPages[BKPBLOCK_COMPRESSED_MAX_SIZE];
+static char uncompressedPages[BKPBLOCK_COMPRESSED_MAX_SIZE];
+

 /*
  * Insert an XLOG record having the specified RMID and info bytes,
  * with the body of the record being the data chunk(s) described by
@@ -50,6 +58,7 @@ XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata)
 {
    XLogRecPtr  RedoRecPtr;
    bool        doPageWrites;
+   bool        doPageCompression;
    XLogRecPtr  EndPos;
    XLogRecPtr  fpw_lsn;
    XLogRecData *rdt;
@@ -72,11 +81,12 @@ XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata)
    }

    /*
-    * Get values needed to decide whether to do full-page writes. Since we
-    * don't yet have an insertion lock, these could change under us, but
-    * XLogInsertRecord will recheck them once it has a lock.
+    * Get values needed to decide whether to do full-page writes and if yes
+    * under which conditions. Since we don't yet have an insertion lock,
+    * these could change under us, but XLogInsertRecord will recheck them
+    * once it has a lock.
     */
-   GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites);
+   GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites, &doPageCompression);

    /*
     * Assemble an XLogRecData chain representing the WAL record, including
@@ -89,7 +99,7 @@ XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata)
     */
retry:
    rdt = XLogRecordAssemble(rmid, info, rdata, RedoRecPtr, doPageWrites,
-                            &fpw_lsn, &rdt_lastnormal);
+                            doPageCompression, &fpw_lsn, &rdt_lastnormal);

    EndPos = XLogInsertRecord(rdt, fpw_lsn);

@@ -106,9 +116,11 @@ retry:
         * we've modified the rdata chain.
         */
        bool        newDoPageWrites;
+       bool        newDoPageCompression;

-       GetFullPageWriteInfo(&RedoRecPtr, &newDoPageWrites);
+       GetFullPageWriteInfo(&RedoRecPtr, &newDoPageWrites, &newDoPageCompression);
        doPageWrites = doPageWrites || newDoPageWrites;
+       doPageCompression = newDoPageCompression;

        rdt_lastnormal->next = NULL;
        goto retry;
@@ -136,7 +148,8 @@ retry:
 static XLogRecData *
 XLogRecordAssemble(RmgrId rmid, uint8 info, XLogRecData *rdata,
                   XLogRecPtr RedoRecPtr, bool doPageWrites,
-                  XLogRecPtr *fpw_lsn, XLogRecData **rdt_lastnormal)
+                  bool doPageCompression, XLogRecPtr *fpw_lsn,
+                  XLogRecData **rdt_lastnormal)
 {
    bool        isLogSwitch = (rmid == RM_XLOG_ID && info == XLOG_SWITCH);
    XLogRecData *rdt;
@@ -333,6 +346,58 @@ XLogRecordAssemble(RmgrId rmid, uint8 info, XLogRecData *rdata,
    }

    /*
+    * If compression is enabled, replace the rdata nodes of the backup
+    * blocks added in the loop above by a single rdata node that contains
+    * the compressed backup blocks and their headers, except the header of
+    * the first block, which is used to store the information about
+    * compression.
+    */
+   if (doPageCompression &&
+       (*rdt_lastnormal)->next != NULL)
+   {
+       uint32      orig_len = 0;
+       bool        compressed = false;
+       BkpBlock   *bkpb;
+
+       rdt = (*rdt_lastnormal)->next;
+       rdt = rdt->next;
+       for (; rdt != NULL; rdt = rdt->next)
+       {
+           memcpy(uncompressedPages + orig_len, rdt->data, rdt->len);
+           orig_len += rdt->len;
+       }
+       if (orig_len)
+       {
+           char       *compressed_blocks;
+           uint32      compressed_len = 0;
+
+           /* Compress the backup blocks before including them in the rdata chain */
+           compressed_blocks = CompressBackupBlocks(uncompressedPages, orig_len,
+                                       compressedPages, &(compressed_len));
+           if (compressed_blocks != NULL)
+           {
+               /*
+                * total_len is the length of compressed block and its varlena
+                * header
+                */
+               rdt = ((*rdt_lastnormal)->next)->next;
+               rdt->data = compressed_blocks;
+               rdt->len = compressed_len;
+               total_len = SizeOfXLogRecord + len;
+               total_len = total_len + sizeof(BkpBlock);
+               total_len += rdt->len;
+               rdt->next = NULL;
+               compressed = true;
+           }
+       }
+       /* Add information about compression in the backup block header */
+       bkpb = (BkpBlock *) (*rdt_lastnormal)->next->data;
+       if (!compressed)
+           bkpb->flag_compress = BKPBLOCK_UNCOMPRESSED;
+       else
+           bkpb->flag_compress = BKPBLOCK_COMPRESSED;
+   }
+
+   /*
     * We disallow len == 0 because it provides a useful bit of extra error
     * checking in ReadRecord. This means that all callers of XLogInsert
     * must supply at least some not-in-a-buffer data. However, we make an
@@ -371,7 +436,7 @@ XLogCheckBufferNeedsBackup(Buffer buffer)
    bool        doPageWrites;
    Page        page;

-   GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites);
+   GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites, NULL);

    page = BufferGetPage(buffer);

@@ -630,4 +695,39 @@ XLogFillBkpBlock(Buffer buffer, bool buffer_std, BkpBlock *bkpb)
        bkpb->hole_offset = 0;
        bkpb->hole_length = 0;
    }
+   bkpb->flag_compress = BKPBLOCK_UNCOMPRESSED;
+}
+
+/*
+ * Create a compressed version of a backup block.
+ * If successful, return a compressed result and set 'len' to its length.
+ * Otherwise (ie, compressed result is actually bigger than original),
+ * return NULL.
+ */
+static char *
+CompressBackupBlocks(char *page, uint32 orig_len, char *dest, uint32 *len)
+{
+   struct varlena *buf = (struct varlena *) dest;
+   bool        ret;
+
+   ret = pglz_compress(page, BLCKSZ,
+                       (PGLZ_Header *) buf, PGLZ_strategy_default);
+
+   /* Zero is returned for incompressible data */
+   if (!ret)
+       return NULL;
+
+   /*
+    * We recheck the actual size even if pglz_compress() reports success,
+    * because it might be satisfied with having saved as little as one byte
+    * in the compressed data --- which could turn into a net loss once you
+    * consider header and alignment padding. Worst case, the compressed
+    * format might require three padding bytes (plus header, which is
+    * included in VARSIZE(buf)), whereas the uncompressed format would take
+    * only one header byte and no padding if the value is short enough. So
+    * we insist on a savings of more than 2 bytes to ensure we have a gain.
+    */
+   if (VARSIZE(buf) >= orig_len - 2)
+       return NULL;
+
+   *len = VARSIZE(buf);
+   return (char *) buf;
+}
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 7d573cc..c921870 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -689,50 +689,84 @@ ValidXLogRecord(XLogReaderState *state, XLogRecord *record, XLogRecPtr recptr)

    /* Add in the backup blocks, if any */
    blk = (char *) XLogRecGetData(record) + len;
-   for (i = 0; i < XLR_MAX_BKP_BLOCKS; i++)
+   if (remaining != 0)
    {
-       uint32      blen;
-
-       if (!(record->xl_info & XLR_BKP_BLOCK(i)))
-           continue;
-
        if (remaining < sizeof(BkpBlock))
        {
-           report_invalid_record(state,
-                                 "invalid backup block size in record at %X/%X",
-                                 (uint32) (recptr >> 32), (uint32) recptr);
+           report_invalid_record(state, "invalid backup block size in record at %X/%X",
+                                 (uint32) (recptr >> 32), (uint32) recptr);
            return false;
        }
        memcpy(&bkpb, blk, sizeof(BkpBlock));
-       if (bkpb.hole_offset + bkpb.hole_length > BLCKSZ)
+       if (bkpb.flag_compress == BKPBLOCK_UNCOMPRESSED)
        {
-           report_invalid_record(state,
-                                 "incorrect hole size in record at %X/%X",
-                                 (uint32) (recptr >> 32), (uint32) recptr);
-           return false;
+           for (i = 0; i < XLR_MAX_BKP_BLOCKS; i++)
+           {
+               uint32      blen;
+
+               if (!(record->xl_info & XLR_BKP_BLOCK(i)))
+                   continue;
+
+               if (remaining < sizeof(BkpBlock))
+               {
+                   report_invalid_record(state,
+                               "invalid backup block size in record at %X/%X",
+                               (uint32) (recptr >> 32), (uint32) recptr);
+                   return false;
+               }
+               memcpy(&bkpb, blk, sizeof(BkpBlock));
+
+               if (bkpb.hole_offset + bkpb.hole_length > BLCKSZ)
+               {
+                   report_invalid_record(state,
+                               "incorrect hole size in record at %X/%X",
+                               (uint32) (recptr >> 32), (uint32) recptr);
+                   return false;
+               }
+
+               blen = sizeof(BkpBlock) + BLCKSZ - bkpb.hole_length;
+
+               if (remaining < blen)
+               {
+                   report_invalid_record(state,
+                               "invalid backup block size in record at %X/%X",
+                               (uint32) (recptr >> 32), (uint32) recptr);
+                   return false;
+               }
+               remaining -= blen;
+               COMP_CRC32C(crc, blk, blen);
+               blk += blen;
+           }
+           /* Check that xl_tot_len agrees with our calculation */
+           if (remaining != 0)
+           {
+               report_invalid_record(state,
+                               "incorrect total length in record at %X/%X",
+                               (uint32) (recptr >> 32), (uint32) recptr);
+               return false;
+           }
        }
-
-       blen = sizeof(BkpBlock) + BLCKSZ - bkpb.hole_length;
-
-       if (remaining < blen)
+       else
        {
-           report_invalid_record(state,
-                                 "invalid backup block size in record at %X/%X",
-                                 (uint32) (recptr >> 32), (uint32) recptr);
-           return false;
-       }
-       remaining -= blen;
-       COMP_CRC32C(crc, blk, blen);
-       blk += blen;
-   }
+           struct varlena *tmp;
+           uint32      b_tot_len;

-   /* Check that xl_tot_len agrees with our calculation */
-   if (remaining != 0)
-   {
-       report_invalid_record(state,
-                             "incorrect total length in record at %X/%X",
-                             (uint32) (recptr >> 32), (uint32) recptr);
-       return false;
+           tmp = (struct varlena *) (blk + sizeof(BkpBlock));
+           b_tot_len = VARSIZE(tmp);
+
+           /*
+            * Check to ensure that the total length of compressed blocks
+            * stored as varlena agrees with the xl_tot_len stored in
+            * XLogRecord
+            */
+           if ((remaining - sizeof(BkpBlock)) != b_tot_len)
+           {
+               report_invalid_record(state, "invalid backup block size in record at %X/%X",
+                                     (uint32) (recptr >> 32), (uint32) recptr);
+               return false;
+           }
+           COMP_CRC32C(crc, blk, remaining);
+       }
    }

    /* Finally include the record header */
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 1a21dac..868ce47 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -24,6 +24,7 @@
 #include "utils/guc.h"
 #include "utils/hsearch.h"
 #include "utils/rel.h"
+#include "utils/pg_lzcompress.h"


 /*
@@ -498,14 +499,24 @@ RestoreBackupBlock(XLogRecPtr lsn, XLogRecord *record, int block_index,

    /* Locate requested BkpBlock in the record */
    blk = (char *) XLogRecGetData(record) + record->xl_len;
+
+   memcpy(&bkpb, blk, sizeof(BkpBlock));
+   blk = blk + sizeof(BkpBlock);
+
+   /* Decompress blocks if they are compressed */
+   if (bkpb.flag_compress == BKPBLOCK_COMPRESSED)
+   {
+       char       *uncompressedPages;
+
+       uncompressedPages = (char *) palloc(XLR_TOTAL_BLCKSZ);
+       pglz_decompress((PGLZ_Header *) blk, (char *) uncompressedPages);
+       blk = uncompressedPages;
+   }
+
    for (i = 0; i < XLR_MAX_BKP_BLOCKS; i++)
    {
        if (!(record->xl_info & XLR_BKP_BLOCK(i)))
            continue;

-       memcpy(&bkpb, blk, sizeof(BkpBlock));
-       blk += sizeof(BkpBlock);
-
        if (i == block_index)
        {
            /* Found it, apply the update */
@@ -514,6 +525,8 @@ RestoreBackupBlock(XLogRecPtr lsn, XLogRecord *record, int block_index,
        }

        blk += BLCKSZ - bkpb.hole_length;
+       memcpy(&bkpb, blk, sizeof(BkpBlock));
+       blk += sizeof(BkpBlock);
    }

    /* Caller specified a bogus block_index */
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index aca4243..1073f47 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -418,6 +418,23 @@ static const struct config_enum_entry row_security_options[] = {
 };

 /*
+ * Although only "on", "off", and "compress" are documented, we
+ * accept all the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry full_page_writes_options[] = {
+   {"compress", FULL_PAGE_WRITES_COMPRESS, false},
+   {"on", FULL_PAGE_WRITES_ON, false},
+   {"off", FULL_PAGE_WRITES_OFF, false},
+   {"true", FULL_PAGE_WRITES_ON, true},
+   {"false", FULL_PAGE_WRITES_OFF, true},
+   {"yes", FULL_PAGE_WRITES_ON, true},
+   {"no", FULL_PAGE_WRITES_OFF, true},
+   {"1", FULL_PAGE_WRITES_ON, true},
+   {"0", FULL_PAGE_WRITES_OFF, true},
+   {NULL, 0, false}
+};
+
+/*
  * Options for enum values stored in other modules
  */
 extern const struct config_enum_entry wal_level_options[];
@@ -893,20 +910,6 @@ static struct config_bool ConfigureNamesBool[] =
        NULL, NULL, NULL
    },

    {
-       {"full_page_writes", PGC_SIGHUP, WAL_SETTINGS,
-           gettext_noop("Writes full pages to WAL when first modified after a checkpoint."),
-           gettext_noop("A page write in process during an operating system crash might be "
-                        "only partially written to disk. During recovery, the row changes "
-                        "stored in WAL are not enough to recover. This option writes "
-                        "pages when first modified after a checkpoint to WAL so full recovery "
-                        "is possible.")
-       },
-       &fullPageWrites,
-       true,
-       NULL, NULL, NULL
-   },
-
-   {
        {"wal_log_hints", PGC_POSTMASTER, WAL_SETTINGS,
            gettext_noop("Writes full pages to WAL when first modified after a checkpoint, even for a non-critical modifications"),
            NULL
@@ -3423,6 +3426,20 @@ static struct config_enum ConfigureNamesEnum[] =
    },

    {
+       {"full_page_writes", PGC_SIGHUP, WAL_SETTINGS,
+           gettext_noop("Writes full pages to WAL when first modified after a checkpoint."),
+           gettext_noop("A page write in process during an operating system crash might be "
+                        "only partially written to disk. During recovery, the row changes "
+                        "stored in WAL are not enough to recover. This option writes "
+                        "pages when first modified after a checkpoint to WAL so full recovery "
+                        "is possible.")
+       },
+       &fullPageWrites,
+       FULL_PAGE_WRITES_ON, full_page_writes_options,
+       NULL, NULL, NULL
+   },
+
+   {
        {"trace_recovery_messages", PGC_SIGHUP, DEVELOPER_OPTIONS,
            gettext_noop("Enables logging of recovery-related debugging information."),
            gettext_noop("Each level includes all the levels that follow it. The later"
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index dac6776..9f51e30 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -185,7 +185,8 @@
                                        # fsync
                                        # fsync_writethrough
                                        # open_sync
-#full_page_writes = on                 # recover from partial page writes
+#full_page_writes = on                 # recover from partial page writes;
+                                       # off, compress, or on
 #wal_log_hints = off                   # also do full page writes of non-critical updates
 #wal_buffers = -1                      # min 32kB, -1 sets based on shared_buffers
                                        # (change requires restart)
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index b2e0793..e250ee0 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -250,7 +250,7 @@ main(int argc, char *argv[])
    printf(_("Latest checkpoint's PrevTimeLineID:   %u\n"),
           ControlFile.checkPointCopy.PrevTimeLineID);
    printf(_("Latest checkpoint's full_page_writes: %s\n"),
-          ControlFile.checkPointCopy.fullPageWrites ? _("on") : _("off"));
+          FullPageWritesStr(ControlFile.checkPointCopy.fullPageWrites));
    printf(_("Latest checkpoint's NextXID:          %u/%u\n"),
           ControlFile.checkPointCopy.nextXidEpoch,
           ControlFile.checkPointCopy.nextXid);
diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index 2ba9946..79b387d 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -517,7 +517,7 @@ GuessControlValues(void)
    ControlFile.checkPointCopy.redo = SizeOfXLogLongPHD;
    ControlFile.checkPointCopy.ThisTimeLineID = 1;
    ControlFile.checkPointCopy.PrevTimeLineID = 1;
-   ControlFile.checkPointCopy.fullPageWrites = false;
+   ControlFile.checkPointCopy.fullPageWrites = FULL_PAGE_WRITES_OFF;
    ControlFile.checkPointCopy.nextXidEpoch = 0;
    ControlFile.checkPointCopy.nextXid = FirstNormalTransactionId;
    ControlFile.checkPointCopy.nextOid = FirstBootstrapObjectId;
@@ -601,7 +601,7 @@ PrintControlValues(bool guessed)
    printf(_("Latest checkpoint's TimeLineID:       %u\n"),
           ControlFile.checkPointCopy.ThisTimeLineID);
    printf(_("Latest checkpoint's full_page_writes: %s\n"),
-          ControlFile.checkPointCopy.fullPageWrites ? _("on") : _("off"));
+          FullPageWritesStr(ControlFile.checkPointCopy.fullPageWrites));
    printf(_("Latest checkpoint's NextXID:          %u/%u\n"),
           ControlFile.checkPointCopy.nextXidEpoch,
           ControlFile.checkPointCopy.nextXid);
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 6f8b5f4..f98572f 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -96,7 +96,6 @@ extern int    XLogArchiveTimeout;
 extern bool XLogArchiveMode;
 extern char *XLogArchiveCommand;
 extern bool EnableHotStandby;
-extern bool fullPageWrites;
 extern bool wal_log_hints;
 extern bool log_checkpoints;
@@ -110,6 +109,23 @@ typedef enum WalLevel
 } WalLevel;
 extern int  wal_level;

+/* full-page writes */
+typedef enum FullPageWritesLevel
+{
+   FULL_PAGE_WRITES_OFF = 0,
+   FULL_PAGE_WRITES_COMPRESS,
+   FULL_PAGE_WRITES_ON
+} FullPageWritesLevel;
+extern int fullPageWrites;
+
+/*
+ * Convert full-page write parameter into a readable string.
+ */
+#define FullPageWritesStr(fpw) \
+   (fpw == FULL_PAGE_WRITES_ON ? _("on") : \
+    (fpw == FULL_PAGE_WRITES_COMPRESS ? _("compress") : \
+     (fpw == FULL_PAGE_WRITES_OFF ? _("off") : _("unrecognized"))))
+
 #define XLogArchivingActive()  (XLogArchiveMode && wal_level >= WAL_LEVEL_ARCHIVE)
 #define XLogArchiveCommandSet() (XLogArchiveCommand[0] != '\0')
@@ -233,7 +249,9 @@ extern bool CreateRestartPoint(int flags);
 extern void XLogPutNextOid(Oid nextOid);
 extern XLogRecPtr XLogRestorePoint(const char *rpName);
 extern void UpdateFullPageWrites(void);
-extern void GetFullPageWriteInfo(XLogRecPtr *RedoRecPtr_p, bool *doPageWrites_p);
+extern void GetFullPageWriteInfo(XLogRecPtr *RedoRecPtr_p,
+                   bool *doPageWrites_p,
+                   bool *doPageCompression_p);
 extern XLogRecPtr GetRedoRecPtr(void);
 extern XLogRecPtr GetInsertRecPtr(void);
 extern XLogRecPtr GetFlushRecPtr(void);
diff --git a/src/include/access/xloginsert.h b/src/include/access/xloginsert.h
index 30c2e84..fd8937c 100644
--- a/src/include/access/xloginsert.h
+++ b/src/include/access/xloginsert.h
@@ -63,4 +63,9 @@ extern XLogRecPtr log_newpage_buffer(Buffer buffer, bool page_std);
 extern XLogRecPtr XLogSaveBufferForHint(Buffer buffer, bool buffer_std);
 extern bool XLogCheckBufferNeedsBackup(Buffer buffer);

+/*
+ * Total size of maximum number of backup blocks in an XLOG record (including
+ * backup block headers)
+ */
+#define XLR_TOTAL_BLCKSZ   XLR_MAX_BKP_BLOCKS * BLCKSZ + XLR_MAX_BKP_BLOCKS * sizeof(BkpBlock)
 #endif   /* XLOGINSERT_H */
diff --git a/src/include/access/xlogrecord.h b/src/include/access/xlogrecord.h
index ab0fb1c..ce2a58e 100644
--- a/src/include/access/xlogrecord.h
+++ b/src/include/access/xlogrecord.h
@@ -73,6 +73,20 @@ typedef struct XLogRecord
 #define XLR_BKP_BLOCK(iblk)    (0x08 >> (iblk))    /* iblk in 0..3 */

 /*
+ * Compression state of a backup block.
+ */
+#define BKPBLOCK_UNCOMPRESSED  0   /* uncompressed */
+#define BKPBLOCK_COMPRESSED    1   /* compressed */
+
+/*
+ * Total size of maximum number of backup blocks in an XLOG record (including
+ * backup block headers). This is used to evaluate the maximum size of a
+ * record whose backup blocks are compressed.
+ */
+#define BKPBLOCK_COMPRESSED_MAX_SIZE \
+   XLR_MAX_BKP_BLOCKS * (sizeof(BkpBlock) + BLCKSZ)
+
+/*
  * Header info for a backup block appended to an XLOG record.
  *
  * As a trivial form of data compression, the XLOG code is aware that
@@ -81,6 +95,10 @@ typedef struct XLogRecord
  * such a "hole" from the stored data (and it's not counted in the
  * XLOG record's CRC, either). Hence, the amount of block data actually
  * present following the BkpBlock struct is BLCKSZ - hole_length bytes.
+ * Compression information is stored on two bits within this structure,
+ * while the hole offset and length use 15 bits each. This is safe as long
+ * as the maximum authorized block size is up to 32k; this structure
+ * would need to increase in size if a higher page size is used.
  *
  * Note that we don't attempt to align either the BkpBlock struct or the
  * block's data. So, the struct must be copied to aligned local storage
@@ -91,8 +109,9 @@ typedef struct BkpBlock
    RelFileNode node;           /* relation containing block */
    ForkNumber  fork;           /* fork within the relation */
    BlockNumber block;          /* block number */
-   uint16      hole_offset;    /* number of bytes before "hole" */
-   uint16      hole_length;    /* number of bytes in "hole" */
+   unsigned    hole_offset:15, /* number of bytes before "hole" */
+               flag_compress:2,    /* flag to store compression information */
+               hole_length:15; /* number of bytes in "hole" */

    /* ACTUAL BLOCK DATA FOLLOWS AT END OF STRUCT */
 } BkpBlock;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index ba79d25..6a536fc 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -35,7 +35,7 @@ typedef struct CheckPoint
    TimeLineID  ThisTimeLineID; /* current TLI */
    TimeLineID  PrevTimeLineID; /* previous TLI, if this record begins a new
                                 * timeline (equals ThisTimeLineID otherwise) */
-   bool        fullPageWrites; /* current full_page_writes */
+   int         fullPageWrites; /* current full_page_writes */
    uint32      nextXidEpoch;   /* higher-order bits of nextXid */
    TransactionId nextXid;      /* next free XID */
    Oid         nextOid;        /* next free OID */
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers