On 1 June 2017 at 09:23, Andres Freund <and...@anarazel.de> wrote:
> Hi,
>
> On 2017-06-01 09:12:04 +0800, Craig Ringer wrote:
>> TL;DR: replication origins track LSN without timeline. This is
>> ambiguous when physical failover is present since XXXXXXXX/XXXXXXXX
>> can now represent more than one state due to timeline forks with
>> promotions. Replication origins should track timelines so we can tell
>> the difference, I propose to patch them accordingly for pg11.
>
> I'm not quite convinced that this should be tracked at the origin level.
> If you fail over physically, shouldn't we also reconfigure logical
> replication?
>
> Even if we decide this is necessary, I *strongly* suggest trying to get
> the existing standby decoding etc wrapped up before starting something
> nontrivial afresh.

Yeah, I'm not thinking of leaping straight to a patch before we've got
the rep on standby stuff nailed down. I just wanted to raise early
discussion to make sure it's not entirely the wrong path and/or totally
hopeless for core.

Logical decoding output plugins would need to keep track of the timeline
and send an extra message informing the downstream of a timeline change
whenever they see a new timeline, or include it in all messages (see:
extra overhead). That's because we don't stop a decoding session and
force re-connection when we hit a timeline boundary. (Nor can we, since
at some point our restart_lsn will be on the old timeline but the first
commits will be on the new timeline.)

I'll need to think about if/how the decoding plugin can reliably do that.

>> Take master A, its physical replica B, and logical decoding client C
>> streaming changes from A. B is lagging. A is at lsn 1/1000, B is only
>> at 1/500. C has replicated from A up to 1/1000, when A fails. We
>> promote B to replace A. Now C connects to B, and requests to resume at
>> LSN 1/1000.
>
> Wouldn't it be better to solve this by querying the new master's
> timeline history, and checking whether the current replay point is
> pre/post fork?

That could work.

The decoding client would need to track the last-commit timeline in its
own metadata if we're not letting it put it in the replication origin.
Manageable, if awkward.

Clients would need to know how to fetch and parse timeline history
files, which is an irritating thing for every decoding client that wants
to support failover to have to implement. But I guess it's manageable,
if not friendly. And non-Pg-based downstreams would have to do it anyway.

-- 
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
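
For illustration only, here is a rough sketch of what the client-side
pre/post-fork check might look like. It assumes the client has already
obtained the text of the new master's timeline history file (for example
via the TIMELINE_HISTORY replication command) and that each non-comment
line holds a parent timeline ID, its switchpoint LSN, and a free-text
reason, separated by whitespace. The function names (parse_lsn,
parse_history, is_on_history) are hypothetical, and the client is assumed
to have stored the timeline of its last commit in its own metadata. It is
not a proposal for how core or any existing client does this.

```python
def parse_lsn(text):
    """Convert an 'X/Y' hex LSN string into a single 64-bit integer."""
    hi, lo = text.strip().split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def parse_history(content):
    """Map each ancestor timeline ID to the LSN where it was left.

    Assumed line format: <parent TLI> <switchpoint LSN> <reason>.
    Blank lines and lines starting with '#' are skipped.
    """
    switchpoints = {}
    for line in content.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 2)
        switchpoints[int(fields[0])] = parse_lsn(fields[1])
    return switchpoints


def is_on_history(content, current_tli, client_tli, client_lsn):
    """Return True if (client_tli, client_lsn) lies on the server's history.

    A position on the server's current timeline is accepted as-is (this
    sketch does not check it against the server's current write position).
    A position on an ancestor timeline is valid only up to the switchpoint.
    A position on an unknown timeline is treated as diverged.
    """
    if client_tli == current_tli:
        return True
    switchpoints = parse_history(content)
    if client_tli not in switchpoints:
        return False
    return parse_lsn(client_lsn) <= switchpoints[client_tli]


# The scenario from this thread: B was promoted at 1/500, creating
# timeline 2, while client C had already consumed up to 1/1000 on
# timeline 1.
history = "1\t1/500\tno recovery target specified\n"
client_is_safe = is_on_history(history, 2, 1, "1/1000")
```

With the example above, client_is_safe is False, so C would know that
its 1/1000 refers to changes the promoted B never saw, rather than to B's
own 1/1000 on the new timeline.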