On Wed, 1 Apr 2020 at 22:32, Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
>
>
>
> On 2020/03/30 20:10, Masahiko Sawada wrote:
> > On Fri, 27 Mar 2020 at 17:54, Fujii Masao <masao.fu...@oss.nttdata.com> 
> > wrote:
> >>
> >>
> >>
> >> On 2020/03/04 14:31, Masahiko Sawada wrote:
> >>> On Wed, 4 Mar 2020 at 13:48, Fujii Masao <masao.fu...@oss.nttdata.com> 
> >>> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2020/03/04 13:27, Michael Paquier wrote:
> >>>>> On Wed, Mar 04, 2020 at 01:13:19PM +0900, Masahiko Sawada wrote:
> >>>>>> Yeah, so 0001 patch sets existing wait events to recovery conflict
> >>>>>> resolution. For instance, it sets (PG_WAIT_LOCK | LOCKTAG_TRANSACTION)
> >>>>>> to the recovery conflict on a snapshot. 0003 patch improves these wait
> >>>>>> events by adding the new type of wait event such as
> >>>>>> WAIT_EVENT_RECOVERY_CONFLICT_SNAPSHOT. Therefore 0001 (and 0002) patch
> >>>>>> is the fix for existing versions and 0003 patch is an improvement for
> >>>>>> only PG13. Did you mean even 0001 patch doesn't fit for back-patching?
> >>>>
> >>>> Yes, it looks like a improvement rather than bug fix.
> >>>>
> >>>
> >>> Okay, understand.
> >>>
> >>>>> I got my eyes on this patch set.  The full patch set is in my opinion
> >>>>> just a set of improvements, and not bug fixes, so I would refrain from
> >>>>> back-backpatching.
> >>>>
> >>>> I think that the issue (i.e., "waiting" is reported twice needlessly
> >>>> in PS display) that 0002 patch tries to fix is a bug. So it should be
> >>>> fixed even in the back branches.
> >>>
> >>> So we need only two patches: one fixes process title issue and another
> >>> improve wait event. I've attached updated patches.
> >>
> >> I started reading 
> >> v2-0002-Improve-wait-events-for-recovery-conflict-resolut.patch.
> >>
> >> -       ProcWaitForSignal(PG_WAIT_BUFFER_PIN);
> >> +       ProcWaitForSignal(WAIT_EVENT_RECOVERY_CONFLICT_BUFFER_PIN);
> >>
> >> Currently the wait event indicating the wait for buffer pin has already
> >> been reported. But the above change in the patch changes the name of
> >> wait event for buffer pin only in the startup process. Is this really 
> >> useful?
> >> Isn't the existing wait event for buffer pin enough?
> >>
> >> -       /* Wait to be signaled by the release of the Relation Lock */
> >> -       ProcWaitForSignal(PG_WAIT_LOCK | locktag.locktag_type);
> >> +               /* Wait to be signaled by the release of the Relation Lock 
> >> */
> >> +               ProcWaitForSignal(WAIT_EVENT_RECOVERY_CONFLICT_LOCK);
> >>
> >> Same as above. Isn't the existing wait event enough?
> >
> > Yeah, we can use the existing wait events for buffer pin and lock.
> >
> >>
> >> -       /*
> >> -        * Progressively increase the sleep times, but not to more than 
> >> 1s, since
> >> -        * pg_usleep isn't interruptible on some platforms.
> >> -        */
> >> -       standbyWait_us *= 2;
> >> -       if (standbyWait_us > 1000000)
> >> -               standbyWait_us = 1000000;
> >> +       WaitLatch(MyLatch,
> >> +                         WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_TIMEOUT,
> >> +                         STANDBY_WAIT_MS,
> >> +                         wait_event_info);
> >> +       ResetLatch(MyLatch);
> >>
> >> ResetLatch() should be called before WaitLatch()?
> >
> > Fixed.
> >
> >>
> >> Could you tell me why you dropped the "increase-sleep-times" logic?
> >
> > I thought we can remove it because WaitLatch is interruptible but my
> > observation was not correct. The waiting startup process is not
> > necessarily woken up by signal. I think it's still better to not wait
> > more than 1 sec even if it's an interruptible wait.
>
> So we don't need to use WaitLatch() there, i.e., it's ok to keep using
> pg_usleep()?
>
> > Attached patch fixes the above and introduces only two wait events of
> > conflict resolution: snapshot and tablespace.
>
> Many thanks for updating the patch!
>
> -       /* Wait to be signaled by the release of the Relation Lock */
> -       ProcWaitForSignal(PG_WAIT_LOCK | locktag.locktag_type);
> +               /* Wait to be signaled by the release of the Relation Lock */
> +               ProcWaitForSignal(PG_WAIT_LOCK | locktag.locktag_type);
> +       }
>
> Is this change really valid? What happens if the latch is set during
> ResolveRecoveryConflictWithVirtualXIDs()?
> ResolveRecoveryConflictWithVirtualXIDs() can return after the latch
> is set but before WaitLatch() in WaitExceedsMaxStandbyDelay() is reached.

Thank you for reviewing the patch!

You're right. It's better to keep using pg_usleep() and set the wait
event by pgstat_report_wait_start().

>
> +               default:
> +                       event_name = "unknown wait event";
> +                       break;
>
> Seems this default case should be removed. Please see other
> pgstat_get_wait_xxx() function, so there is no such code.
>
> > I also removed the wait
> > event of conflict resolution of database since it's unlikely to become
> > a user-visible and a long sleep as we discussed before.
>
> Is it worth defining new wait event type RecoveryConflict only for
> those two events? ISTM that it's ok to use IPC event type here.
>

I dropped a new wait even type and added them to IPC wait event type.

I've attached the new version patch.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: recovery_conflict_wait_event_v4.patch
Description: Binary data

Reply via email to