On Mon, Jan 25, 2021 at 8:42 AM Kyotaro Horiguchi <horikyota....@gmail.com> wrote: > > At Sun, 24 Jan 2021 14:26:08 +0530, Dilip Kumar <dilipbal...@gmail.com> wrote > in > > On Sun, Jan 24, 2021 at 12:16 PM Bharath Rupireddy > > <bharath.rupireddyforpostg...@gmail.com> wrote: > > > > > > On Sun, Jan 24, 2021 at 7:17 AM Dilip Kumar <dilipbal...@gmail.com> wrote: > > > > On Sat, 23 Jan 2021 at 4:40 PM, Bharath Rupireddy > > > > <bharath.rupireddyforpostg...@gmail.com> wrote: > > > >> > > > >> On Sat, Jan 23, 2021 at 1:36 PM Dilip Kumar <dilipbal...@gmail.com> > > > >> wrote: > > > >> > Please find the patch for the same. I haven't added a test case for > > > >> > this yet. I mean we can write a test case to pause the recovery and > > > >> > get the status. But I am not sure that we can really write a > > > >> > reliable > > > >> > test case for 'pause requested' and 'paused'. > > > >> > > > >> +1 to just show the recovery pause state in the output of > > > >> pg_is_wal_replay_paused. But, should the function name > > > >> "pg_is_wal_replay_paused" be something like > > > >> "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists > > > >> in a function, I expect a boolean output. Others may have better > > > >> thoughts. > > > >> > > > >> IIUC the above change, ensuring the recovery is paused after it's > > > >> requested lies with the user. IMHO, the main problem we are trying to > > > >> solve is not met > > > > > > > > > > > > Basically earlier their was no way for the user yo know whether the > > > > recovery is actually paused or not because it was always returning true > > > > after pause requested. Now, we will return whether pause requested or > > > > actually paused. So > for tool designer who want to wait for recovery > > > > to get paused can have a loop and wait until the recovery state reaches > > > > to paused. That will give a better control. > > > > > > I get it and I agree to have that change. My point was whether we can > > > have a new function pg_wal_replay_pause_and_wait that waits until > > > recovery is actually paused ((along with pg_is_wal_replay_paused > > > returning the actual state than a true/false) so that tool developers > > > don't need to have the waiting code outside, if at all they care about > > > it? Others may have better thoughts than me. > > > > I think the previous patch was based on that idea where we thought > > that we can pass an argument to pg_is_wal_replay_paused which can > > decide whether to wait or return without the wait. I think this > > version looks better to me where we give the status instead of > > waiting. I am not sure whether we want another version of > > pg_wal_replay_pause which will wait for actually it to get paused. I > > mean there is always a scope to include the functionality in the > > database which can be achieved by the tool, but this patch was trying > > to solve the problem that there was no way to know the status so I > > think returning the correct status should be the scope of this. > > I understand that the requirement here is that no record is replayed > after pg_wal_replay_pause() is returned, or pg_is_wal_replay_paused() > returns true, and delays taken while recovery don't delay the state > change. That requirements are really synchronous. > > On the other hand the machinery is designed to be asynchronous. > > > * Note that we intentionally don't take the info_lck spinlock > > * here. We might therefore read a slightly stale value of > > * the recoveryPause flag, but it can't be very stale (no > > * worse than the last spinlock we did acquire). Since a > > * pause request is a pretty asynchronous thing anyway, > > * possibly responding to it one WAL record later than we > > * otherwise would is a minor issue, so it doesn't seem worth > > * adding another spinlock cycle to prevent that. > > As the result, this patch tries to introduce several new checkpoints > to some delaying point so that that waits can find pause request in a > timely manner. I think we had better use locking (or atomics) for the > information instead of such scattered checkpoints if we expect that > machinery to work in a such syhcnronous manner. > > That would make the tri-state state variable and many checkpoints > unnecessary. Maybe.
I don't think the intention was so to make it synchronous, I think the main intention was that pg_is_wal_replay_paused can return us the correct state, in short user can know that whether any more wal will be replayed after pg_is_wal_replay_paused returns true or some other state. I agree that along with that we have also introduced some extra checkpoint where the recovery process is waiting for WAL and apply delay and from the pg_wal_replay_pause we had sent a signal to wakeup the recovery process. So I am not sure is it worth adding the lock/atomic variable to make this synchronous. Any other thoughts on this? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com