When I worked on the XLogInsert scaling patch, it became apparent that some changes to the WAL format would make it a lot easier. So for 9.3, I'd like to do some refactoring:

1. Use a 64-bit integer instead of the two-variable log/seg representation, for identifying a WAL segment. This has no user-visible effect, but makes the code a bit simpler.

2. Don't waste the last WAL segment in each logical 4GB file. Currently, we skip the WAL segment ending with "FF". The comments claim that wasting the last segment "ensures that we don't have problems representing last-byte-position-plus-1", but in my experience, it just makes things more complicated. You have two ways to represent the segment boundary, and some functions are picky about which one is used. For example, XLogWrite() assumes that when you want to flush to the end of a logical log file, you use the "5/FF000000" representation, not "6/00000000". Other functions, like XLogPageRead(), expect the latter.

This is a backwards-incompatible change for external utilities that know how the WAL segment numbering works. Hopefully there aren't too many of those around.

3. Move the only field, xl_rem_len, from the continuation record header straight to the xlog page header, eliminating XLogContRecord altogether. This makes it easier to calculate in advance how much space a WAL record requires, as it no longer depends on how many pages it has to be split across. This wastes 4-8 bytes on every xlog page, but that's not much.

4. Allow the WAL record header to be split across page boundaries. Currently, if there are fewer than SizeOfXLogRecord bytes left on the current WAL page, they are wasted, and the next record is inserted at the beginning of the next page. The problem with that is again that it makes it impossible to know in advance exactly how much space a WAL record requires, because it depends on how many bytes need to be wasted at the end of the current page.
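
To illustrate items 1 and 2, here is a rough sketch of what a single 64-bit segment number could look like once no segment is skipped. The names XLogSegNo, segno_from_log_seg, pos_from_xlogid_off and segno_from_pos are made up for this example, and the 16MB segment size is an assumption (the default); this is not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed default: 16 MB WAL segments. */
#define XLOG_SEG_SIZE   UINT64_C(0x1000000)

/* Segments per 4 GB logical file when none is skipped: 256. */
#define SEGS_PER_XLOGID (UINT64_C(0x100000000) / XLOG_SEG_SIZE)

/* Hypothetical name for the single 64-bit segment identifier. */
typedef uint64_t XLogSegNo;

/* Combine the old two-variable (log, seg) pair into one segment number. */
XLogSegNo
segno_from_log_seg(uint32_t log, uint32_t seg)
{
    assert(seg < SEGS_PER_XLOGID);
    return (XLogSegNo) log * SEGS_PER_XLOGID + seg;
}

/* Combine a (logical file, byte offset) pair into one 64-bit WAL position. */
uint64_t
pos_from_xlogid_off(uint32_t xlogid, uint32_t off)
{
    return ((uint64_t) xlogid << 32) | off;
}

/* Segment that contains a given 64-bit WAL position. */
XLogSegNo
segno_from_pos(uint64_t pos)
{
    return pos / XLOG_SEG_SIZE;
}
```

Under these assumptions, position 5/FF000000 falls in segment 1535, and adding one segment size to it gives exactly 6/00000000, the start of segment 1536. The end of a logical file then has only one representation.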

These changes will help the XLogInsert scaling patch, by making the space calculations simpler. In essence, to reserve space for a WAL record of size X, you just need to do "bytepos += X". There are a lot more details to that, like mapping from the contiguous byte position to an XLogRecPtr that takes page headers into account, and noticing RedoRecPtr changes safely, but it's a start.
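
As a rough sketch of the "bytepos += X" idea and the mapping back to a WAL position, something like the following might work. It assumes 8kB WAL pages, 16MB segments, a 24-byte short page header and a 40-byte long header on the first page of each segment; those sizes, and the names reserve_space and bytepos_to_recptr, are assumptions for illustration only. It also ignores locking and the RedoRecPtr issue entirely.

```c
#include <stdint.h>

/* All sizes below are assumptions made for this illustration. */
#define PAGE_SIZE        8192u                  /* WAL page size */
#define SEG_SIZE         (16u * 1024 * 1024)    /* WAL segment size */
#define SHORT_HDR        24u                    /* header on ordinary pages */
#define LONG_HDR         40u                    /* header on first page of a segment */

#define USABLE_PER_PAGE  (PAGE_SIZE - SHORT_HDR)
#define PAGES_PER_SEG    (SEG_SIZE / PAGE_SIZE)
#define USABLE_PER_SEG   ((uint64_t) PAGES_PER_SEG * USABLE_PER_PAGE \
                          - (LONG_HDR - SHORT_HDR))

/*
 * Reserve 'size' bytes of record data. 'bytepos' counts only usable bytes,
 * i.e. it excludes page headers. Returns the start of the reserved range.
 */
uint64_t
reserve_space(uint64_t *bytepos, uint32_t size)
{
    uint64_t start = *bytepos;

    *bytepos += size;
    return start;
}

/* Map a usable-byte position to a 64-bit WAL position, adding page headers. */
uint64_t
bytepos_to_recptr(uint64_t bytepos)
{
    uint64_t fullsegs = bytepos / USABLE_PER_SEG;
    uint64_t rem = bytepos % USABLE_PER_SEG;
    uint64_t first_page_usable = PAGE_SIZE - LONG_HDR;
    uint64_t seg_offset;

    if (rem < first_page_usable)
    {
        /* Lands on the first page of the segment, after the long header. */
        seg_offset = LONG_HDR + rem;
    }
    else
    {
        /* Skip the first page, then count whole pages with short headers. */
        uint64_t fullpages;
        uint64_t in_page;

        rem -= first_page_usable;
        fullpages = rem / USABLE_PER_PAGE;
        in_page = rem % USABLE_PER_PAGE;
        seg_offset = PAGE_SIZE + fullpages * PAGE_SIZE + SHORT_HDR + in_page;
    }

    return fullsegs * SEG_SIZE + seg_offset;
}
```

The point is that the reservation itself never has to look at page boundaries. That only works if a record's size doesn't depend on where it lands, which is what items 3 and 4 are for.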

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
