On 03/17/2014 04:33 PM, Tom Lane wrote:
> Heikki Linnakangas <hlinnakan...@vmware.com> writes:
>> 2. Instead of storing the new compressed posting list in the WAL record,
>> store only the new item pointers added to the page. WAL replay would
>> then have to duplicate the work done in the main insertion code path:
>> find the right posting lists to insert to, decode them, add the new
>> items, and re-encode.
>
> That sounds fairly dangerous ... is any user-defined code involved in
> those decisions?

No.
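
For illustration, here is a rough sketch of the kind of mechanical work
replay would have to do, using a toy delta+varbyte encoding of abstract
64-bit item pointers (this is not the actual GIN segment format or code):

    #include <stdint.h>

    #define MAX_ITEMS 1024          /* toy fixed-size buffers */

    /* Decode a delta+varbyte buffer into 'out'; returns the item count. */
    static int
    decode_items(const uint8_t *buf, int len, uint64_t *out)
    {
        uint64_t val = 0, prev = 0;
        int      shift = 0, n = 0;

        for (int i = 0; i < len; i++)
        {
            val |= (uint64_t) (buf[i] & 0x7F) << shift;
            if (buf[i] & 0x80)
                shift += 7;         /* continuation bit: more bytes follow */
            else
            {
                prev += val;        /* undo the delta encoding */
                out[n++] = prev;
                val = 0;
                shift = 0;
            }
        }
        return n;
    }

    /* Encode a sorted item array as delta+varbyte; returns encoded length. */
    static int
    encode_items(const uint64_t *items, int n, uint8_t *buf)
    {
        uint64_t prev = 0;
        int      len = 0;

        for (int i = 0; i < n; i++)
        {
            uint64_t delta = items[i] - prev;

            prev = items[i];
            while (delta >= 0x80)
            {
                buf[len++] = (uint8_t) (delta & 0x7F) | 0x80;
                delta >>= 7;
            }
            buf[len++] = (uint8_t) delta;
        }
        return len;
    }

    /* Replay: decode, sorted-merge in the new items, re-encode. */
    static int
    replay_insert(const uint8_t *seg, int seglen,
                  const uint64_t *newitems, int nnew, uint8_t *outseg)
    {
        uint64_t old[MAX_ITEMS], merged[MAX_ITEMS];
        int      nold = decode_items(seg, seglen, old);
        int      i = 0, j = 0, n = 0;

        while (i < nold || j < nnew)
        {
            if (j >= nnew || (i < nold && old[i] < newitems[j]))
                merged[n++] = old[i++];
            else
                merged[n++] = newitems[j++];
        }
        return encode_items(merged, n, outseg);
    }

Nothing in that path calls out to user-defined code; it is pure decoding
and re-encoding.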

>> This record format would be higher-level, in the sense that we would not
>> store the physical copy of the compressed posting list as it was formed
>> originally. The same work would be done at WAL replay. As the code
>> stands, it will produce exactly the same result, but that's not
>> guaranteed if we make bugfixes to the code later, and a master and
>> standby are running different minor versions. There's not necessarily
>> anything wrong with that, but it's something to keep in mind.
>
> Version skew would be a hazard too, all right.  I think it's important
> that WAL replay be a pretty mechanical, predictable process.

Yeah. One particular point to note is that if in one place we do the more "high-level" thing and have WAL replay re-encode the page as it sees fit, then we can *not* rely on the page being byte-by-byte identical in other places either, for example in vacuum, where items are deleted.
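
To make the distinction concrete, compare two hypothetical delete-record
layouts (neither is the actual GIN WAL format). The byte-oriented one is
only correct if the standby's page is byte-for-byte identical to the
master's; the item-oriented one leaves replay free to re-encode:

    #include <stdint.h>

    /* Physical style: "overwrite these bytes at this offset".  Replaying
     * this is only correct if the standby page has exactly the layout
     * the master page had when the record was written. */
    typedef struct
    {
        uint16_t offset;    /* byte offset on the page */
        uint16_t len;       /* number of replacement bytes that follow */
    } XlPhysicalUpdate;

    /* Logical style: "delete this item".  Replay looks the item up and
     * re-encodes the posting list, so small layout differences between
     * master and standby don't matter. */
    typedef struct
    {
        uint32_t blkno;     /* block number part of the item pointer */
        uint16_t offnum;    /* offset number part of the item pointer */
    } XlLogicalDelete;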

Heap and B-tree WAL records also rely on PageAddItem etc. to reconstruct the page, instead of making a physical copy of the modified parts. And _bt_restore_page even inserts the items physically in a different order than the normal codepath does (see the sketch below). So for good or bad, there is some precedent for this.
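
For reference, that restore loop is roughly this shape (paraphrased from
memory of _bt_restore_page in src/backend/access/nbtree/nbtxlog.c;
consult the source for the real thing). Because every tuple is added at
FirstOffsetNumber, the physical order on the rebuilt page ends up the
reverse of the order in the WAL record:

    /* Paraphrased sketch of _bt_restore_page(); not a verbatim copy. */
    static void
    restore_page_sketch(Page page, char *from, int len)
    {
        char   *end = from + len;

        while (from < end)
        {
            IndexTupleData itupdata;
            Size           itemsz;

            memcpy(&itupdata, from, sizeof(IndexTupleData));
            itemsz = MAXALIGN(IndexTupleDSize(itupdata));

            /* Always insert at FirstOffsetNumber: the page is rebuilt
             * in reverse physical order vs. the normal codepath. */
            if (PageAddItem(page, (Item) from, itemsz, FirstOffsetNumber,
                            false, false) == InvalidOffsetNumber)
                elog(PANIC, "cannot add item to page");

            from += itemsz;
        }
    }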

The immediate danger I see is that we change the logic for how the items are divided into posting lists, and end up in a situation where the master adds an item to a page and it just fits, but with the compression logic the standby version has, it cannot be made to fit. As an escape hatch for that, we could have the WAL replay code retry the compression with a larger max. posting list size if it doesn't fit at first. And/or always leave something like 10 bytes of free space on every data page to make up for small differences in the logic.
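
A sketch of that escape hatch; encode_segment, the constants, and all the
names here are placeholders for illustration, not actual GIN functions:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define SEG_MAX_SIZE  256   /* hypothetical normal segment size cap */
    #define REPLAY_SLOP    10   /* hypothetical extra headroom on replay */

    /* Hypothetical encoder: packs the items into a segment of at most
     * 'cap' bytes; returns the encoded size, or 0 if they don't fit. */
    extern size_t encode_segment(const uint64_t *items, int nitems,
                                 size_t cap, uint8_t *out);

    /* Replay-side insertion: if the standby's compression packs slightly
     * worse than the master's did, loosen the cap instead of failing. */
    static bool
    replay_encode(const uint64_t *items, int nitems,
                  uint8_t *out, size_t free_space)
    {
        size_t sz = encode_segment(items, nitems, SEG_MAX_SIZE, out);

        if (sz == 0 || sz > free_space)
            sz = encode_segment(items, nitems,
                                SEG_MAX_SIZE + REPLAY_SLOP, out);

        return (sz != 0 && sz <= free_space);
    }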

- Heikki

