Where are we on this patch idea?

---------------------------------------------------------------------------

Koichi Suzuki wrote:
> Sorry for the late response;
>
> Gzip can reduce the archive log size to about one fourth.  My point is
> that it can still be large.  Removing physical log records (by
> replacing them with logical log records) from the archive log will
> shrink the archive log to about one twentieth of its size, in the case
> of a pgbench test of about ten hours (3,600,000 transactions) with a
> database size of about 2GB.  In the case of gzip, maybe because of the
> higher CPU load, total throughput is less than just copying WAL to the
> archive.  In our case, throughput seems to be slightly higher than
> just copying (preserving the physical log) or gzip.  I'll gather the
> measurement results and try to post them.
>
> The size of the archive log seems not to be affected by the size of
> the database, but just by the number of transactions.  In the case of
> full_page_writes=on and full_page_compress=on, the compressed archive
> log size seems to depend only on the number of transactions and the
> transaction characteristics.
>
> Our evaluation result is as follows:
>   Database size:                          2GB
>   WAL size (after 10-hour pgbench run):   48.3GB
>   gzipped size:                           8.8GB
>   removal of the physical log:            2.36GB
>   full_page_writes=off log size:          2.42GB
>
> The reason why the archive log size in our case is slightly smaller
> than with full_page_writes=off is that we remove not only the physical
> logs but also each page header and the dummy part at the tail of each
> log segment.
>
> Further, we can apply gzip to this archive (2.36GB).  The final size is
> 0.75GB, less than one sixtieth of the original WAL.
>
> The overall duration to gzip the WAL (48.3GB to 8.8GB) was about
> 4000sec, and our compression to 2.36GB needed about 1010sec, slightly
> less than a plain cat command (1386sec).  When gzip is combined with
> our compression (48.3GB to 0.75GB), the total duration was about
> 1330sec.
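The ratios quoted above can be checked with a few lines of arithmetic. The sketch below uses only the sizes and durations reported in the message; the function and variable names are made up for illustration and are not part of the patch.

```python
# Figures as reported in the message above (sizes in GB, durations in seconds).
WAL_GB = 48.3

SIZES_GB = {
    "gzip": 8.8,
    "physical log removed": 2.36,
    "full_page_writes=off": 2.42,
    "physical log removed + gzip": 0.75,
}

DURATIONS_SEC = {
    "cat": 1386,
    "gzip": 4000,
    "physical log removed": 1010,
    "physical log removed + gzip": 1330,
}


def reduction_factor(original_gb, reduced_gb):
    """How many times smaller the reduced archive is than the original WAL."""
    return original_gb / reduced_gb


def relative_to_cat(seconds):
    """Duration relative to a plain copy with cat (below 1.0 means faster)."""
    return seconds / DURATIONS_SEC["cat"]


for name, size in SIZES_GB.items():
    print(f"{name}: {size}GB, 1/{reduction_factor(WAL_GB, size):.1f} of the WAL")

for name, seconds in DURATIONS_SEC.items():
    print(f"{name}: {seconds}sec, {relative_to_cat(seconds):.2f}x the cat duration")
```

With these inputs, removing the physical log gives roughly 1/20 of the WAL and the combination with gzip roughly 1/64, which matches the "one twentieth" and "less than one sixtieth" figures. Plain gzip comes out nearer 1/5.5 than one fourth.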
>
> This shows that physical log removal is a good choice for the
> following case:
>
> 1) Need the same crash recovery capability as full_page_writes=on, and
> 2) Need to shrink the archive log so that it can be stored for a
>    longer period.
>
> Of course, if we care about crash recovery on a PITR slave, we still
> need physical log records in the archive log.  In this case, because
> the archive log is not intended to be kept long, its size will not be
> an issue.
>
> I'm planning to do archive log size evaluation with other benchmarks
> such as DBT-2 as well.
>
> Materials for this have already been posted to HACKERS and PATCHES.  I
> hope you try this.
>
>
> Jim Nasby wrote:
> > I thought the drive behind full_page_writes = off was to reduce the
> > amount of data being written to pg_xlog, not to shrink the size of a
> > PITR log archive.
> >
> > ISTM that if you want to shrink a PITR log archive you'd be able to get
> > good results by (b|g)zip'ing the WAL files in the archive.  A quick test
> > on my laptop shows over a 4x reduction in size.  Presumably that'd be
> > even larger if you increased the size of WAL segments.
> >
> > On Jan 29, 2007, at 2:15 AM, Koichi Suzuki wrote:
> >
> >> This is a proposal for archive log compression keeping physical log in
> >> WAL.
> >>
> >> In PostgreSQL 8.2, the full_page_writes option came back to cut
> >> physical log out of both WAL and archive log.  To deal with partial
> >> writes during the online backup, physical log is written only during
> >> the online backup.
> >>
> >> Although this dramatically reduces the log size, it can put crash
> >> recovery at risk.  If any page is inconsistent because of a fault,
> >> crash recovery doesn't work, because full page images are necessary
> >> to recover the page in such a case.  For critical use, especially in
> >> commercial use, we don't like to risk the crash recovery chance, while
> >> reducing the archive log size will be crucial too for larger databases.
> >> WAL size itself may be less critical, because the segments are reused
> >> cyclically.
> >>
> >> Here, I have a simple idea to reduce archive log size while keeping
> >> physical log in xlog:
> >>
> >> 1. Create a new GUC: full_page_compress,
> >>
> >> 2. Turn on both full_page_writes and full_page_compress: physical
> >> log will be written to WAL at the first write to a page after the
> >> checkpoint, just as with conventional full_page_writes ON.
> >>
> >> 3. Unless physical log is written during the online backup, it can be
> >> removed from the archive log.  One bit in XLR_BKP_BLOCK_MASK
> >> (XLR_BKP_REMOVABLE) is available to indicate this (out of four, only
> >> three of them are in use) and this mark can be set in XLogInsert().
> >> With both full_page_writes and full_page_compress on, both logical
> >> log and physical log will be written to WAL with the
> >> XLR_BKP_REMOVABLE flag on.  Having both physical and logical log in
> >> the same WAL is not harmful in crash recovery.  In crash recovery,
> >> physical log is used if it's available.  Logical log is used in
> >> archive recovery, as the corresponding physical log will have been
> >> removed.
> >>
> >> 4. The archive command (separate binary) removes physical logs if the
> >> XLR_BKP_REMOVABLE flag is on.  Physical logs will be replaced by a
> >> minimum amount of information of very small size, which is used to
> >> restore the physical log to keep other log records' LSNs consistent.
> >>
> >> 5. The restore command (separate binary) restores removed physical log
> >> using the dummy record and restores the LSNs of other log records.
> >>
> >> 6. We need to rewrite redo functions so that they ignore the dummy
> >> record inserted in 5.  The amount of code modification will be very
> >> small.
> >>
> >> As a result, the size of the archive log becomes as small as in the
> >> case with full_page_writes off, while the physical log is still
> >> available in crash recovery, maintaining the crash recovery chance.
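Steps 3 to 6 above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the patch itself: the `Record` class, the `kind` strings, the bit value chosen for `XLR_BKP_REMOVABLE`, and the use of a byte offset as a stand-in for the LSN are all hypothetical. Real WAL records have headers and page boundaries that are ignored here, and the dummy record is modelled as occupying zero bytes in the archive rather than "very small size".

```python
from dataclasses import dataclass, replace

# Hypothetical bit value; the proposal only names the flag.
XLR_BKP_REMOVABLE = 0x1


@dataclass(frozen=True)
class Record:
    kind: str          # "logical", "physical" or "dummy" (toy categories)
    flags: int
    payload: bytes
    orig_len: int = 0  # only meaningful for dummy records


def lsns(records):
    """Byte offset of each record, used here as a stand-in for its LSN."""
    offsets, pos = [], 0
    for rec in records:
        offsets.append(pos)
        pos += len(rec.payload)
    return offsets


def archive(records):
    """Step 4: replace removable physical log by a dummy that keeps its length."""
    out = []
    for rec in records:
        if rec.kind == "physical" and rec.flags & XLR_BKP_REMOVABLE:
            out.append(Record("dummy", rec.flags, b"", orig_len=len(rec.payload)))
        else:
            out.append(rec)
    return out


def restore(records):
    """Step 5: pad each dummy back to its original length so LSNs line up."""
    return [
        replace(rec, payload=bytes(rec.orig_len)) if rec.kind == "dummy" else rec
        for rec in records
    ]


def redo(records):
    """Step 6: replay skips dummy records and relies on the logical log."""
    return [rec.kind for rec in records if rec.kind != "dummy"]


wal = [
    Record("physical", XLR_BKP_REMOVABLE, bytes(8192)),
    Record("logical", 0, b"insert row"),
    Record("physical", 0, bytes(8192)),  # written during online backup: kept
    Record("logical", 0, b"update row"),
]

archived = archive(wal)
restored = restore(archived)

print("WAL bytes:     ", sum(len(r.payload) for r in wal))
print("archived bytes:", sum(len(r.payload) for r in archived))
print("LSNs preserved:", lsns(restored) == lsns(wal))
print("replayed kinds:", redo(restored))
```

The point of the model is the invariant in step 5: after restore, every surviving record sits at the same offset it had in the original WAL, even though the archive stored far fewer bytes.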
> >>
> >> Comments, questions and any input are welcome.
> >>
> >> -----
> >> Koichi Suzuki, NTT Open Source Center
> >>
> >> --Koichi Suzuki
> >>
> >> ---------------------------(end of broadcast)---------------------------
> >> TIP 6: explain analyze is your friend
> >>
> >
> > --
> > Jim Nasby                                            [EMAIL PROTECTED]
> > EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
> >
>
> --
> Koichi Suzuki

--
  Bruce Momjian  <[EMAIL PROTECTED]>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +