On 5 August 2016 at 14:07, Andres Freund <and...@anarazel.de> wrote:
> > > The simplest fix would be to require downstreams to flush their
> > > origin when they get a hot standby feedback message, before they send a
> > > reply with confirmation. That could be somewhat painful for performance, but
> > > can be alleviated somewhat by waiting for the downstream postgres to get
> > > around to doing a flush anyway and only forcing it if we're getting close to
> > > the walsender timeout. That's pretty much what BDR and pglogical do when
> > > applying transactions to avoid having to do a disk flush for each and every
> > > applied xact. Then we change START_REPLICATION ... LOGICAL so it ERRORs if
> > > you ask for a too-old LSN rather than silently ignoring it.
> > That's basically just proposing to revert this broken optimization,
> > IIUC, and instead just try not to flush too often on the standby.
> The effect of the optimization is *massive* if you are replicating a
> less active database, or a less active subset of a database, in a
> cluster with lots of other activity. I don't think that can just be
> disregarded, to protect against something with plenty of other failure
Right. If we flush lazily I'm surprised the effect is that big, but
you're the one who did the work and know its significance.
All I'm trying to say is that I think the current behaviour is too
dangerous. It doesn't just lead to failure; it leads to easy, undetectable,
silent failure when users perform common and simple tasks like taking a
snapshot or a filesystem-level pg_start_backup() copy of a DB. The only
reason it can't happen for pg_basebackup too is that we omit slots during
pg_basebackup. That inconsistency between snapshot/fs-level copies and
pg_basebackup is unfortunate but understandable.
So I'm not saying "this whole idea must go". I'm saying I think it's too
permissive and needs to be stricter about what it's allowed to
skip, so we can differentiate between "nothing interesting here" and "um, I
think someone else consumed data I needed, I'd better bail out now". I've
never been comfortable with the skipping behaviour and found it confusing
right from the start, but now I have definite cases it can cause silent
inconsistencies and really think it needs to be modified.
Robert's point that we could keep track of the skippable range is IMO a
good one. An extra slot attribute with the last LSN that resulted in the
output plugin doing a write to the client would be sufficient, at least at
this point. To anticipate future needs where we might want to allow output
plugins to ignore some things, control could be handed to the output plugin
by also letting it call a function that explicitly advances the position
even when it performs no writes.
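To make the proposed slot attribute concrete, here is a rough model in Python (for brevity, rather than PostgreSQL's C). `SlotSketch`, `last_output_lsn`, `note_write` and `advance_without_write` are hypothetical names, not existing PostgreSQL structures or APIs, and LSNs are modelled as plain integers:

```python
from dataclasses import dataclass


@dataclass
class SlotSketch:
    """Hypothetical model of a logical slot; field names are illustrative,
    not the real ReplicationSlot members."""
    confirmed_flush: int = 0  # LSN the client has confirmed as flushed
    last_output_lsn: int = 0  # proposed: last LSN whose decoding produced output

    def note_write(self, lsn: int) -> None:
        """Record that the output plugin sent data to the client for `lsn`."""
        self.last_output_lsn = max(self.last_output_lsn, lsn)

    def advance_without_write(self, lsn: int) -> None:
        """Hypothetical hook: the plugin marks `lsn` as significant even though
        it chose not to write anything for it (e.g. it filtered the change)."""
        self.last_output_lsn = max(self.last_output_lsn, lsn)

    def confirm(self, lsn: int) -> None:
        """Client feedback moves confirmed_flush forward, never backward."""
        self.confirmed_flush = max(self.confirmed_flush, lsn)
```

Both paths only ever move `last_output_lsn` forward. In a busy cluster where the replicated database is quiet, `confirmed_flush` keeps advancing while `last_output_lsn` stays put, and that gap is what lets the walsender distinguish "nothing interesting here" from data the client may have missed.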
That way we can safely skip ahead if the client asks us for an LSN equal to
or after the last real data we sent them, but refuse to skip if we sent
them data after the LSN they're asking for.
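The skip-or-refuse rule might look roughly like this. It is a sketch under the assumption that the slot tracks a hypothetical `last_output_lsn` as proposed above; `resolve_start_lsn` and `TooOldLSNError` are illustrative names, not part of PostgreSQL:

```python
class TooOldLSNError(Exception):
    """Raised instead of silently skipping data the client may not have."""


def resolve_start_lsn(requested: int, confirmed_flush: int,
                      last_output_lsn: int) -> int:
    """Decide where decoding resumes for a client asking for `requested`.

    Assumes `last_output_lsn` (hypothetical) is the last LSN at which the
    output plugin sent data, or explicitly advanced the position.
    """
    if requested >= confirmed_flush:
        # Normal case: the client is at or ahead of what it confirmed.
        return requested
    if requested >= last_output_lsn:
        # Nothing was sent between `requested` and confirmed_flush, so
        # fast-forwarding cannot lose anything the client needs.
        return confirmed_flush
    # We sent data after the position the client is asking for, and it was
    # confirmed by someone. Refuse rather than silently skip it.
    raise TooOldLSNError(
        f"requested LSN {requested:#x} precedes last sent data at "
        f"{last_output_lsn:#x} (confirmed_flush {confirmed_flush:#x})"
    )
```

Raising an error here corresponds to making START_REPLICATION ... LOGICAL fail for a too-old LSN, but only when skipping would actually discard sent data.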
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services