On Wed, Aug 3, 2016 at 9:39 AM, Craig Ringer <cr...@2ndquadrant.com> wrote:
> I think we have a bit of a problem with the behaviour specified for logical
> slots, one that makes it hard for an outdated snapshot or backup of a
> logical-slot-using downstream to know it's missing a chunk of data that's
> been consumed from a slot. That's not great, since slots are supposed to
> ensure a continuous, gapless data stream.
> If the downstream requests that logical decoding restart at an LSN older
> than the slot's confirmed_flush_lsn, we silently ignore the client's request
> and start replay at the confirmed_flush_lsn. That's by design and normally
> fine, since we know the gap LSNs contained no transactions of interest to
> the downstream.

Wow, that sucks.
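
To make the behaviour concrete, here's a minimal standalone model of the
silent clamp being described -- simplified stand-ins, not the actual
walsender code:

/* Toy model of the current behaviour: a restart request older than the
 * slot's confirmed_flush_lsn is silently moved forward, so the client
 * never learns there was a gap.  XLogRecPtr and the slot are fakes. */
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;

/* Pretend this is the slot's confirmed_flush_lsn on the upstream. */
static XLogRecPtr confirmed_flush_lsn = 0x30000000;

static XLogRecPtr
start_pos(XLogRecPtr requested)
{
    if (requested < confirmed_flush_lsn)
        return confirmed_flush_lsn;     /* silent skip: the problem */
    return requested;
}

int
main(void)
{
    XLogRecPtr from_old_backup = 0x10000000;    /* stale origin position */

    printf("asked for %lX, replay starts at %lX\n",
           (unsigned long) from_old_backup,
           (unsigned long) start_pos(from_old_backup));
    return 0;
}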

> The cause is an optimisation intended to let the downstream avoid local
> writes and flushes when the upstream's activity isn't of interest to it and
> doesn't result in replicated rows. When the upstream does a bunch of writes
> to another database or otherwise produces WAL not of interest to the
> downstream, we send the downstream keepalive messages that include the
> upstream's current xlog position, and the client replies to acknowledge
> it's seen the new LSN. But, so that we can avoid disk flushes on the
> downstream, we permit it to skip advancing its replication origin in
> response to those keepalives. We continue to advance the confirmed_flush_lsn
> and restart_lsn in the replication slot on the upstream so we can free WAL
> that's no longer needed and move the catalog_xmin up. The replication origin
> on the downstream thus falls behind the confirmed_flush_lsn on the upstream.

This seems entirely too clever.  The upstream could safely remember
that if the downstream asks for WAL position X it's safe to begin
streaming from WAL position Y because nothing in the middle is
interesting, but it can hardly decide to unilaterally ignore the
requested position.
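
Here's a standalone sketch of the rule I mean, with made-up names: the
upstream may map a requested position forward only when it knows the gap
contains nothing of interest, and must refuse otherwise rather than
silently skipping:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;

typedef struct
{
    XLogRecPtr confirmed_flush_lsn; /* where replay would resume */
    XLogRecPtr safe_request_lsn;    /* hypothetical: oldest requested LSN
                                     * known to have an empty gap up to
                                     * confirmed_flush_lsn */
} Slot;

/* Returns true and sets *start if the request can be honoured. */
static bool
resolve_start(const Slot *slot, XLogRecPtr requested, XLogRecPtr *start)
{
    if (requested >= slot->confirmed_flush_lsn)
    {
        *start = requested;
        return true;
    }
    if (requested >= slot->safe_request_lsn)
    {
        /* Gap is known-empty, so skipping forward loses nothing. */
        *start = slot->confirmed_flush_lsn;
        return true;
    }
    return false;               /* caller should ERROR, not skip */
}

int
main(void)
{
    Slot        slot = {0x30000000, 0x20000000};
    XLogRecPtr  start;

    if (!resolve_start(&slot, 0x10000000, &start))
        printf("ERROR: requested LSN predates what the slot can replay\n");
    return 0;
}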

> The simplest fix would be to require downstreams to flush their replication
> origin when they get a keepalive message, before they send a reply with
> confirmation. That could be painful for performance, but it can be
> alleviated somewhat by waiting for the downstream postgres to get around to
> doing a flush anyway and only forcing one if we're getting close to the
> walsender timeout. That's pretty much what BDR and pglogical do when
> applying transactions to avoid having to do a disk flush for each and every
> applied xact. Then we change START_REPLICATION ... LOGICAL so it ERRORs if
> you ask for a too-old LSN rather than silently ignoring it.

That's basically proposing, IIUC, to revert this broken optimization
and instead just try not to flush too often on the standby.
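
Roughly, the downstream-side logic might look like this standalone model
(names and the timeout threshold are invented): reply to keepalives with
whatever origin position is durably flushed, and only pay for a forced
flush when the walsender timeout is getting close:

#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;

typedef struct
{
    XLogRecPtr applied_lsn;     /* origin advanced in memory only */
    XLogRecPtr flushed_lsn;     /* origin durably flushed to disk */
} Origin;

static void
origin_flush(Origin *o)
{
    o->flushed_lsn = o->applied_lsn;    /* stand-in for an fsync */
}

/* On keepalive: confirm only what's flushed, unless the walsender timeout
 * is near, in which case force a flush so we can confirm more. */
static XLogRecPtr
keepalive_reply(Origin *o, long ms_until_timeout)
{
    if (o->flushed_lsn < o->applied_lsn && ms_until_timeout < 1000)
        origin_flush(o);
    return o->flushed_lsn;
}

int
main(void)
{
    Origin      o = {0x2000, 0x1000};

    printf("relaxed reply: %lX\n",
           (unsigned long) keepalive_reply(&o, 30000));
    printf("near-deadline reply: %lX\n",
           (unsigned long) keepalive_reply(&o, 500));
    return 0;
}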

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
