Re: Is Recovery actually paused?

Dilip Kumar Sun, 24 Jan 2021 20:35:55 -0800

On Mon, Jan 25, 2021 at 8:42 AM Kyotaro Horiguchi
<horikyota....@gmail.com> wrote:
>
> At Sun, 24 Jan 2021 14:26:08 +0530, Dilip Kumar <dilipbal...@gmail.com> wrote 
> in
> > On Sun, Jan 24, 2021 at 12:16 PM Bharath Rupireddy
> > <bharath.rupireddyforpostg...@gmail.com> wrote:
> > >
> > > On Sun, Jan 24, 2021 at 7:17 AM Dilip Kumar <dilipbal...@gmail.com> wrote:
> > > > On Sat, 23 Jan 2021 at 4:40 PM, Bharath Rupireddy 
> > > > <bharath.rupireddyforpostg...@gmail.com> wrote:
> > > >>
> > > >> On Sat, Jan 23, 2021 at 1:36 PM Dilip Kumar <dilipbal...@gmail.com> 
> > > >> wrote:
> > > >> > Please find the patch for the same.  I haven't added a test case for
> > > >> > this yet.  I mean we can write a test case to pause the recovery and
> > > >> > get the status.  But I am not sure that we can really write a 
> > > >> > reliable
> > > >> > test case for 'pause requested' and 'paused'.
> > > >>
> > > >> +1 to just show the recovery pause state in the output of
> > > >> pg_is_wal_replay_paused. But, should the function name
> > > >> "pg_is_wal_replay_paused" be something like
> > > >> "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists
> > > >> in a function, I expect a boolean output. Others may have better
> > > >> thoughts.
> > > >>
> > > >> IIUC the above change, ensuring the recovery is paused after it's
> > > >> requested lies with the user. IMHO, the main problem we are trying to
> > > >> solve is not met
> > > >
> > > >
> > > > Basically earlier their was no way for the user yo know whether the 
> > > > recovery is actually paused or not because it was always returning true 
> > > > after pause requested.  Now, we will return whether pause requested or 
> > > > actually paused.  So > for tool designer who want to wait for recovery 
> > > > to get paused can have a loop and wait until the recovery state reaches 
> > > > to paused.  That will give a better control.
> > >
> > > I get it and I agree to have that change. My point was whether we can
> > > have a new function pg_wal_replay_pause_and_wait that waits until
> > > recovery is actually paused ((along with pg_is_wal_replay_paused
> > > returning the actual state than a true/false) so that tool developers
> > > don't need to have the waiting code outside, if at all they care about
> > > it? Others may have better thoughts than me.
> >
> > I think the previous patch was based on that idea where we thought
> > that we can pass an argument to pg_is_wal_replay_paused which can
> > decide whether to wait or return without the wait.  I think this
> > version looks better to me where we give the status instead of
> > waiting.  I am not sure whether we want another version of
> > pg_wal_replay_pause which will wait for actually it to get paused.  I
> > mean there is always a scope to include the functionality in the
> > database which can be achieved by the tool, but this patch was trying
> > to solve the problem that there was no way to know the status so I
> > think returning the correct status should be the scope of this.
>
> I understand that the requirement here is that no record is replayed
> after pg_wal_replay_pause() is returned, or pg_is_wal_replay_paused()
> returns true, and delays taken while recovery don't delay the state
> change. That requirements are really synchronous.
>
> On the other hand the machinery is designed to be asynchronous.
>
> >        * Note that we intentionally don't take the info_lck spinlock
> >        * here.  We might therefore read a slightly stale value of
> >        * the recoveryPause flag, but it can't be very stale (no
> >        * worse than the last spinlock we did acquire).  Since a
> >        * pause request is a pretty asynchronous thing anyway,
> >        * possibly responding to it one WAL record later than we
> >        * otherwise would is a minor issue, so it doesn't seem worth
> >        * adding another spinlock cycle to prevent that.
>
> As the result, this patch tries to introduce several new checkpoints
> to some delaying point so that that waits can find pause request in a
> timely manner. I think we had better use locking (or atomics) for the
> information instead of such scattered checkpoints if we expect that
> machinery to work in a such syhcnronous manner.
>
> That would make the tri-state state variable and many checkpoints
> unnecessary. Maybe.


I don't think the intention was so to make it synchronous,  I think
the main intention was that pg_is_wal_replay_paused can return us the
correct state, in short user can know that whether any more wal will
be replayed after pg_is_wal_replay_paused returns true or some other
state.  I agree that along with that we have also introduced some
extra checkpoint where the recovery process is waiting for WAL and
apply delay and from the pg_wal_replay_pause we had sent a signal to
wakeup the recovery process.  So I am not sure is it worth adding the
lock/atomic variable to make this synchronous.  Any other thoughts on
this?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: Is Recovery actually paused?

Reply via email to