On Wed, Aug 3, 2016 at 9:39 AM, Craig Ringer <cr...@2ndquadrant.com> wrote:
> I think we have a bit of a problem with the behaviour specified for logical
> slots, one that makes it hard for an outdated snapshot or backup of a
> logical-slot-using downstream to know it's missing a chunk of data that's
> been consumed from a slot. That's not great, since slots are supposed to
> ensure a continuous, gapless data stream.
>
> If the downstream requests that logical decoding restart at an LSN older
> than the slot's confirmed_flush_lsn, we silently ignore the client's request
> and start replay at the confirmed_flush_lsn. That's by design and fine
> normally, since we know the gap LSNs contained no transactions of interest
> to the downstream.
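To make the behaviour concrete, here's roughly what the exchange looks like
from the client side; the slot name and LSNs are invented for illustration:

    -- normal connection: where the slot believes the client has flushed to
    SELECT slot_name, restart_lsn, confirmed_flush_lsn
      FROM pg_replication_slots
     WHERE slot_name = 'my_slot';
    --  my_slot | 0/19000000 | 0/1A000000

    -- replication connection: ask to resume from an older position
    START_REPLICATION SLOT my_slot LOGICAL 0/15000000;
    -- no error or notice: streaming silently starts at 0/1A000000, the
    -- confirmed_flush_lsn, not at the LSN the client asked for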
Wow, that sucks.

> The cause is an optimisation intended to allow the downstream to avoid
> having to do local writes and flushes when the upstream's activity isn't of
> interest to it and doesn't result in replicated rows. When the upstream does
> a bunch of writes to another database or otherwise produces WAL not of
> interest to the downstream, we send the downstream keepalive messages that
> include the upstream's current xlog position, and the client replies to
> acknowledge it's seen the new LSN. But, so that we can avoid disk flushes on
> the downstream, we permit it to skip advancing its replication origin in
> response to those keepalives. We continue to advance the confirmed_flush_lsn
> and restart_lsn in the replication slot on the upstream so we can free WAL
> that's not needed and move the catalog_xmin up. The replication origin on
> the downstream falls behind the confirmed_flush_lsn on the upstream.

This seems entirely too clever.  The upstream could safely remember that if
the downstream asks for WAL position X it's safe to begin streaming from WAL
position Y, because nothing in the middle is interesting, but it can hardly
decide to unilaterally ignore the requested position.

> The simplest fix would be to require downstreams to flush their replication
> origin when they get a hot standby feedback message, before they send a
> reply with confirmation. That could be somewhat painful for performance, but
> it can be alleviated by waiting for the downstream postgres to get around to
> doing a flush anyway and only forcing it if we're getting close to the
> walsender timeout. That's pretty much what BDR and pglogical do when
> applying transactions, to avoid having to do a disk flush for each and every
> applied xact. Then we change START_REPLICATION ... LOGICAL so it ERRORs if
> you ask for a too-old LSN rather than silently ignoring it.

That's basically just proposing to revert this broken optimization, IIUC, and
instead trying not to flush too often on the standby.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company