While looking at the streaming replication patch, I can't help but wonder why our WAL format is so complicated.
WAL is divided into WAL segments, each 16 MB by default. Each WAL segment is divided into pages, 8k by default. At the beginning of each WAL page, there's a page header, but the header at the first page of each WAL segment contains a few extra fields. If a WAL record crosses a page boundary, we write as much of it as fits onto the first page, and so-called continuation records with the rest of the data on subsequent pages.

In particular I wonder why we bother with the page headers. A much simpler format would be:

- get rid of page headers, except for the header at the beginning of each WAL segment
- get rid of continuation records
- at the end of a WAL segment, when there's not enough space to write the next WAL record, always write an XLOG_SWITCH record to fill the rest of the segment.

The page address stored in the WAL page header gives some extra protection for detecting the end of valid WAL correctly, but we rely on the prev-links and CRC within the page for that anyway, so I wouldn't mind losing that.

The changes to ReadRecord in the streaming replication patch feel a bit awkward, because it has to work around the fact that WAL is streamed as a stream of bytes, but ReadRecord works one page at a time. I'd like to replace ReadRecord with a simpler ring buffer approach, but handling the continuation records makes it a bit hard.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
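
P.S. To make the proposed layout concrete, here is a rough toy model in Python of where records would land if only the segment start carried a header and a record that doesn't fit were pushed to the next segment. Everything in it is an illustrative assumption: the function name place_records, the header size, and the tuple layout are made up for this sketch and are not PostgreSQL's actual structs or constants. It also ignores the question of whether the leftover space is always large enough to hold a switch record at all.

```python
# Toy model of the proposed simpler WAL layout. All names and sizes here are
# hypothetical, chosen only to illustrate the idea; this is not PostgreSQL code.

SEG_SIZE = 16 * 1024 * 1024  # segment size (16 MB is the default per the text)
SEG_HDR = 32                 # assumed size of the single per-segment header


def place_records(record_lengths, seg_size=SEG_SIZE, seg_hdr=SEG_HDR):
    """Lay out records one after another with no page headers.

    Returns a list of (segment_no, offset, kind, length) tuples, where kind
    is "record" for a normal record or "switch" for the filler written when
    the next record does not fit in the current segment.
    """
    layout = []
    seg, off = 0, seg_hdr
    for length in record_lengths:
        if length > seg_size - seg_hdr:
            # Without continuation records, a record can never span segments.
            raise ValueError("record does not fit in a single segment")
        if off + length > seg_size:
            remaining = seg_size - off
            if remaining > 0:
                # Fill the rest of the segment instead of splitting the record.
                layout.append((seg, off, "switch", remaining))
            seg, off = seg + 1, seg_hdr
        layout.append((seg, off, "record", length))
        off += length
    return layout
```

With a tiny 100-byte segment and a 20-byte header, place_records([50, 40], seg_size=100, seg_hdr=20) puts the first record at offset 20, a 30-byte switch filler at offset 70, and the second record at offset 20 of the next segment. The sketch also makes one cost of the proposal visible: a record larger than a segment has nowhere to go once continuation records are gone.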