[HACKERS] Timeline following is a bit tangled...

Craig Ringer Sun, 17 Apr 2016 06:42:57 -0700

Hi all

As part of the work I did on timeline following for logical decoding I
mapped out the various code paths relating to timeline following in Pg.


https://wiki.postgresql.org/wiki/TimelineFollowing97

It's surprisingly complex (to me), with lots of completely separate logic
for each different path. Redo has one way to decide when to switch
timelines and which WAL segment to read from. The walsender has two, one
for physical replication and one (with a small overlap) for logical
replication. Logical replication over the SQL interface now has another,
which overlaps mostly but not entirely with the logical walsender one.

One thing that makes it very hard to follow the code (IMO) is that the
xlogreader is totally timeline agnostic. The xlogreader's callers decide
which timeline to read from and when to switch timelines. The actual WAL
segment to read from is determined by the read page callback that the
xlogreader invokes. The callback figures out the timeline by looking
"around" the xlogreader at global state in xlog.c (for redo) or walsender.c
(for phys/logical walsender).

Each place has its own logic for things like the early timeline switch
required to ensure that we read from a segment that's actually locally
present, since older timelines of the same segment won't be present or will
be renamed to .partial.

I'd like to reduce the duplication here and try to make it a bit easier to
follow. If doing so doesn't seem worth the (undeniable) risks when messing
with redo then I'll just leave it untouched; I don't feel that strongly
about it.


Because physical replication doesn't use the xlogreader, it doesn't make
sense to just add timeline following to the xlogreader directly. It has to
be separate, usable by both physical replication and the xlogreader. I think
it should be reasonable to have them both use the same state struct and
function though: they can call a function before reading each page to update
the timeline to read from if needed, then have their page read callback use
that timeline. The struct can keep track of the next timeline, the TLI
switchpoint, whether the timeline became historical since the last page was
read, a copy of the latest timeline history, etc., and should probably live
in timeline.c.
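
To make the idea concrete, here is a rough, standalone sketch of what such a
shared state struct and update function might look like. It is not existing
PostgreSQL code: TLHistoryEntry, TimelineFollowState and tl_follow_update
are hypothetical names, the two typedefs only mirror the widths of the real
types, and "end == 0" is assumed to mean "no switchpoint yet". The early
switch is modelled by picking the timeline that owns the last byte of the
segment containing the requested page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TimeLineID;    /* stand-in, same width as the real type */
typedef uint64_t XLogRecPtr;    /* stand-in, same width as the real type */

/* Hypothetical history entry: the timeline covers [begin, end). */
typedef struct TLHistoryEntry
{
    TimeLineID  tli;
    XLogRecPtr  begin;
    XLogRecPtr  end;            /* 0 = still the current timeline */
} TLHistoryEntry;

/* Hypothetical shared state for walsender and xlogreader callers. */
typedef struct TimelineFollowState
{
    const TLHistoryEntry *history;  /* latest history, oldest entry first */
    size_t      nhistory;
    TimeLineID  read_tli;           /* timeline the read callback should use */
    TimeLineID  next_tli;           /* timeline after read_tli, 0 if none */
    XLogRecPtr  switchpoint;        /* where read_tli ends, 0 if current */
    bool        became_historical;  /* read_tli gained a switchpoint */
} TimelineFollowState;

/*
 * Call before reading each page. Chooses the timeline whose range contains
 * the last byte of the page's segment, so a switch inside that segment is
 * taken early and we read a segment file that is actually present locally.
 * Returns false if the history does not cover the requested position.
 */
bool
tl_follow_update(TimelineFollowState *st, XLogRecPtr page_ptr,
                 uint64_t seg_size)
{
    assert(st != NULL && seg_size > 0);

    XLogRecPtr  seg_last = (page_ptr / seg_size + 1) * seg_size - 1;

    /* Did the timeline we were reading gain a switchpoint since last call? */
    st->became_historical = false;
    if (st->read_tli != 0 && st->switchpoint == 0)
    {
        for (size_t i = 0; i < st->nhistory; i++)
            if (st->history[i].tli == st->read_tli &&
                st->history[i].end != 0)
                st->became_historical = true;
    }

    for (size_t i = 0; i < st->nhistory; i++)
    {
        const TLHistoryEntry *e = &st->history[i];

        if (seg_last >= e->begin && (e->end == 0 || seg_last < e->end))
        {
            st->read_tli = e->tli;
            st->switchpoint = e->end;
            st->next_tli = (i + 1 < st->nhistory) ? st->history[i + 1].tli : 0;
            return true;
        }
    }
    return false;
}
```

A page read callback would then call tl_follow_update() first and open the
segment file for st->read_tli, rather than consulting globals in xlog.c or
walsender.c. Refreshing st->history when a new history file shows up is left
to the caller in this sketch.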

I'm not especially thrilled with the code I wrote for logical decoding
timeline following there, and I'm actually more inclined to base that logic
on the physical walsender's code, which is IMO the clearest we currently
have. Extract it, move its state globals into a struct, generalize it.

Sound not completely insane?

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
