This is a proposal for archive log compression keeping physical log in WAL.

In PotgreSQL 8.2, full-page_writes option came back to cut out physical
log both from WAL and archive log.   To deal with the partial write
during the online backup, physical log is written only during the online

Although this dramatically reduces the log size, it can risk the crash
recovery.   If any page is inconsisitent because of the fault, crash
recovery doesn't work because full page images are necessary to recover
the page in such case.  For critical use, especially in commercial use,
 we don't like to risk the crash recovery chance, while reducing the
archive log size will be crucial too for larger databases.    WAL size
itself may be less critical, because they're reused cyclickly.

Here, I have a simple idea to reduce archive log size while keeping
physical log in xlog:

1. Create new GUC: full_page_compress,

2. Turn on both the full_page_writes and full_page_compress: physical
log will be written to WAL at the first write to a page after the
checkpoint, just as conventional full_page_writes ON.

3. Unless physical log is written during the online backup, this can be
removed from the archive log.   One bit in XLR_BKP_BLOCK_MASK
(XLR_BKP_REMOVABLE) is available to indicate this (out of four, only
three of them are in use) and this mark can be set in XLogInsert().
With the both full_page_writes and full_page_compress on, both logical
log and physical log will also be written to WAL with XLR_BKP_REMOVABLE
flag on.  Having both physical and logical log in a same WAL is not
harmful in the crash recovery.  In the crash recovery, physical log is
used if it's available.  Logical log is used in the archive recovery, as
the corresponding physical log will be removed.

4. The archive command (separate binary), removes physical logs if
XLR_BKP_REMOVABLE flag is on.   Physical logs will be replaced by a
minumum information of very small size, which is used to restore the
physical log to keep other log records's LSN consistent.

5. The restore command (separate binary) restores removed physical log
using the dummy record and restores LSN of other log records.

6. We need to rewrite redo functions so that they ignore the dummy
record inserted in 5.  The amount of code modification will be very small.

As a result, size of the archive log becomes as small as the case with
full_page_writes off, while the physical log is still available in the
crash recovery, maintaining the crash recovery chance.

Comments, questions and any input is welcome.

Koichi Suzuki, NTT Open Source Center

Koichi Suzuki

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Reply via email to