On 2020-Apr-28, Kyotaro Horiguchi wrote:
> At Mon, 27 Apr 2020 18:33:42 -0400, Alvaro Herrera <[email protected]>
> wrote in
> > On 2020-Apr-08, Kyotaro Horiguchi wrote:
> >
> > > At Wed, 08 Apr 2020 09:37:10 +0900 (JST), Kyotaro Horiguchi
> > > <[email protected]> wrote in
> > Thanks for the fix! I propose two changes:
> >
> > 1. reword the error like this:
> >
> > ERROR: replication slot "regression_slot3" cannot be advanced
> > DETAIL: This slot has never previously reserved WAL, or has been
> > invalidated
>
> Agreed to describe what is failed rather than the cause. However,
> logical replications slots are always "previously reserved" at
> creation.
Bah, of course. I was thinking in making the equivalent messages all
identical in all callsites, but maybe they should be different when
slots are logical. I'll go over them again.
> > 2. use the same error in one other place, to wit
> > pg_logical_slot_get_changes() and pg_replication_slot_advance(). I
> > made the DETAIL part the same in all places, but the ERROR line is
> > adjusted to what each callsite is doing.
> > I do think that this change in test_decoding is a bit unpleasant:
> >
> > -ERROR: cannot use physical replication slot for logical decoding
> > +ERROR: cannot get changes from replication slot "repl"
> >
> > The test is
> > -- check that we're detecting a streaming rep slot used for logical
> > decoding
> > SELECT 'init' FROM pg_create_physical_replication_slot('repl');
> > SELECT data FROM pg_logical_slot_get_changes('repl', NULL, NULL,
> > 'include-xids', '0', 'skip-empty-xacts', '1');
>
> The message may be understood as "No change has been made since
> restart_lsn". Does something like the following work?
>
> ERROR: replication slot "repl" is not usable to get changes
That wording seems okay, but my specific point for this error message is
that we were trying to use a physical slot to get logical changes; so
the fact that the slot has been invalidated is secondary and we should
complain about the *type* of slot rather than the restart_lsn.
> By the way there are some other messages that doesn't render the
> symptom but the cause.
>
> "cannot use physical replication slot for logical decoding"
> "replication slot \"%s\" was not created in this database"
>
> Don't they need the same amendment?
Maybe, but I don't want to start rewording every single message in uses
of replication slots ... I prefer to only modify the ones related to the
problem at hand.
> > > > On the other hand, physical replication doesn't break by invlidation.
> > > > [...]
> > Anyway I think the current patch can be applied as is -- and if we want
> > physical replication to have some other behavior, we can patch for that
> > afterwards.
>
> Agreed here. The false-invalidation doesn't lead to any serious
> consequences.
But does it? What happens, for example, if we have a slot used to get a
pg_basebackup, then time passes before starting to stream from it and is
invalidated? I think this "works fine" (meaning that once we try to
stream from the slot to replay at the restored base backup, we will
raise an error immediately), but I haven't tried.
The worst situation would be producing a corrupt replica. I don't think
this is possible.
The ideal behavior I think would be that pg_basebackup aborts
immediately when the slot is invalidated, to avoid wasting more time
producing a doomed backup.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services