> Date: Mon, 13 Jul 2020 17:07:31 -0400
> From: George Koehler <[email protected]>
> 
> On Mon, 13 Jul 2020 01:18:14 -0500
> Scott Cheloha <[email protected]> wrote:
> 
> > To review, the update protocol is:
> > 
> > 1. Set tk_generation to zero; the update has begun.
> > 
> > 2. Memory barrier.  The side effects of step (1) are "visible" to
> >    other CPUs before those of step (3).
> > 
> > 3. Update the other tk_* members from the active timehands.
> > 
> > 4. Memory barrier.  The side effects of step (3) are "visible" before
> >    those of step (5).
> > 
> > 5. Set tk_generation to th_generation; the update is over.  The
> >    timekeep page is consistent once more.
> > 
> > --
> > 
> > As far as I can tell, tk_generation always changes after an update,
> > during step (5).
> 
> There are 2 values, th0.th_generation and th1.th_generation.  When the
> kernel updates tk_generation, it switches between these 2 values, but
> the values might be equal.
> 
> >From /sys/kern/kern_tc.c tc_windup():
> 
>       tho = timehands;
>       th = tho->th_next;
>       ogen = th->th_generation;
>       th->th_generation = 0;
>       membar_producer();
>       ... /* update th */
>       if (++ogen == 0)
>               ogen = 1;
>       membar_producer();
>       th->th_generation = ogen;
> 
> timehands is a circular list, where th0->th_next == th1 and
> th1->th_next == th0.  Now suppose that we call tc_windup() with,
> 
>       timehands == th0
>       th0.th_generation == 2177
>       th1.th_generation == 2176
> 
> We assign tho = th0, th = th1, and ogen = 2176.  After we update the
> clock, we increment ogen and set th1.th_generation = 2177.  When we
> update the user timekeep page, we change tk_generation from 2177, the
> old th0.th_generation, to 2177, the new th1.th_generation; but libc
> sees the same value 2177 and can't detect the update.
> 
> We update the clock 100 times per second (in a 100 Hz kernel), but
> were incrementing tk_generation only 50 times per second.
> 
> I run the diff from my first mail (copied below).  It moves apart the
> initial values of th_generation, so they don't become equal:
> 
>       th0.th_generation = UINT_MAX / 2        /* changed from 1 */
>       th1.th_generation = 0                   /* not changed */
> 
> It might be better to change how tc_windup() sets ogen.    --George

Yeah, one possible approach would be to increment ogen by two.  A
little bit easier to check that they can never be the same since one
is always odd and the other is always even.

Another possible approach would be to export both timehands.  This
could help avoiding some of the looping int the bin*time() functions
in libc.  Not sure that's worth it.  And we should probably go for the
"quick" fix initially regardless.

> Index: kern/kern_tc.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_tc.c,v
> retrieving revision 1.62
> diff -u -p -r1.62 kern_tc.c
> --- kern/kern_tc.c    6 Jul 2020 13:33:09 -0000       1.62
> +++ kern/kern_tc.c    13 Jul 2020 02:59:58 -0000
> @@ -98,7 +98,8 @@ static struct timehands th0 = {
>       .th_counter = &dummy_timecounter,
>       .th_scale = UINT64_MAX / 1000000,
>       .th_offset = { .sec = 1, .frac = 0 },
> -     .th_generation = 1,
> +     /* Keep apart generations of th0, th1, for user timekeep. */
> +     .th_generation = UINT_MAX / 2,
>       .th_next = &th1
>  };
> 
> 

Reply via email to