On 11 March 2016 at 20:15, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> Craig Ringer wrote:
> > Hi all
> > I think I found a couple of logical decoding issues while writing tests
> > for failover slots.
> > Despite the docs' claim that a logical slot will replay data "exactly
> > once", a slot's confirmed_lsn can go backwards and the SQL functions can
> > replay the same data more than once. We don't mark a slot as dirty if only
> > its confirmed_lsn is advanced, so it isn't flushed to disk. For failover
> > slots this means it also doesn't get replicated via WAL. After a master
> > crash, or for failover slots after a promote event, the confirmed_lsn can
> > go backwards. Users of the SQL interface must keep track of the safely
> > locally flushed slot position themselves and throw the repeated data away.
> > Unlike with the walsender protocol it has no way to ask the server to skip
> > that data.
> > Worse, because we don't dirty the slot even a *clean shutdown* causes
> > confirmed_lsn to go backwards. That's a bug IMO. We should force a flush of
> > all slots at the shutdown checkpoint, whether dirty or not, to address this.
> Why don't we mark the slot dirty when confirmed_lsn advances? If we fix
> that, doesn't it fix the other problems too?
Yes, it does.
That'll cause slots to be written out at checkpoints when they otherwise
wouldn't have to be, but I'd rather be doing a little more work in this
case. Compared to the disk activity from WAL decoding etc., the effect should
be undetectable anyway.
Andres? Any objection to dirtying a slot when the confirmed_lsn advances,
so we write it out at the next checkpoint?
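
To make the failure mode concrete, here is a toy model of the behaviour
described above. It is only a sketch, not PostgreSQL source: ToySlot, its
fields and lsn_after_restart are hypothetical names, and it assumes a
checkpoint writes out only dirty slots and a restart reloads whatever was
last written to disk.

```python
from dataclasses import dataclass


@dataclass
class ToySlot:
    """Toy stand-in for a logical slot (hypothetical, not the real struct)."""
    confirmed_lsn: int = 0   # in-memory confirmed position
    on_disk_lsn: int = 0     # position last written to disk
    dirty: bool = False

    def confirm(self, lsn: int, mark_dirty: bool) -> None:
        # Advance the in-memory position; never move it backwards here.
        if lsn > self.confirmed_lsn:
            self.confirmed_lsn = lsn
            if mark_dirty:
                # The proposed change: dirty the slot on every advance.
                self.dirty = True

    def checkpoint(self) -> None:
        # Assumption: only dirty slots are written out at a checkpoint.
        if self.dirty:
            self.on_disk_lsn = self.confirmed_lsn
            self.dirty = False

    def restart(self) -> None:
        # Crash or clean shutdown: in-memory state is reloaded from disk.
        self.confirmed_lsn = self.on_disk_lsn
        self.dirty = False


def lsn_after_restart(mark_dirty: bool) -> int:
    """Confirm up to LSN 100, checkpoint, restart, and report the position."""
    slot = ToySlot()
    slot.confirm(100, mark_dirty)
    slot.checkpoint()
    slot.restart()
    return slot.confirmed_lsn
```

In this model lsn_after_restart(False) comes back at 0, i.e. the confirmed
position has gone backwards and the same data would be replayed, while
lsn_after_restart(True) keeps the position at 100.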
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services