> Date: Thu, 10 Aug 2023 16:01:10 -0500
> From: Scott Cheloha <scottchel...@gmail.com>
> 
> On Tue, Aug 08, 2023 at 08:00:47PM +0200, Mark Kettenis wrote:
> > > From: Dale Rahn <dale.r...@gmail.com>
> > > Date: Tue, 8 Aug 2023 12:36:45 -0400
> > > 
> > > Switching the computation of cycles/delaycnt to a proper 64 value math
> > > instead of the '32 bit safe' complex math is likely a good idea.
> > > However I am not completely convinced that switching to 'yield' (current
> > > CPU_BUSY_CYCLE implementation) for every loop of a 'short wait' in the 
> > > wait
> > > loop makes sense. In a hypervisor environment, this could cause a very
> > > short wait between register writes to become very long, In a 
> > > non-hypervisor
> > > environment there is essentially no improvement because the yield doesn't
> > > really have any benefits on non-hypervisor.
> > 
> > Dale, I think you're confused here.  There is no arcitectural way to
> > trap a YIELD instruction.  The docs explicitly state that SMT and SMP
> > systems.  I suspect that this instruction was primarily introduced for
> > SMT systems that never materialized; I completely believe you that on
> > current hardware it does nothing.
> > 
> > Even without a YIELD instruction a hypervisor might interrupt us and
> > schedule a different vCPU onto the core.  And on real hardware
> > external interrupts may happen.  So you really can't count on delay(9)
> > being accurate.
> > 
> > Linux does use YIELD in it delay loop.
> 
> Okay good.
> 
> > > To my current understanding, there is no useful 'wait short period' on arm
> > > cores.
> > 
> > There is WFET (and WFIT), but that is Armv8.7 material, so not
> > availble on actual hardware that we run on.  Linux uses that
> > instruction in its delay loop when available.
> 
> These look great.  Can't wait for them to be implemented outside of a
> functional simulator.
> 
> > On machines with a working Generic Timer event stream, Linux uses WFE
> > if the delay is long enough.
> 
> I have a prototype for this.  Obviously it's a separate patch.
> 
> Anyway, can I go ahead with the patch below to start?
> 
> - Use 64-bit arithmetic to simplify agtimer_delay().
> 
> - Use CPU_BUSY_CYCLE() ("yield" on arm64) in the loop to match other
>   delay(9) implementations.

ok kettenis@

> Index: agtimer.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/arm64/dev/agtimer.c,v
> retrieving revision 1.23
> diff -u -p -r1.23 agtimer.c
> --- agtimer.c 25 Jul 2023 18:16:19 -0000      1.23
> +++ agtimer.c 10 Aug 2023 20:59:27 -0000
> @@ -323,32 +323,12 @@ agtimer_cpu_initclocks(void)
>  void
>  agtimer_delay(u_int usecs)
>  {
> -     uint64_t                clock, oclock, delta, delaycnt;
> -     uint64_t                csec, usec;
> -     volatile int            j;
> +     uint64_t cycles, start;
>  
> -     if (usecs > (0x80000000 / agtimer_frequency)) {
> -             csec = usecs / 10000;
> -             usec = usecs % 10000;
> -
> -             delaycnt = (agtimer_frequency / 100) * csec +
> -                 (agtimer_frequency / 100) * usec / 10000;
> -     } else {
> -             delaycnt = agtimer_frequency * usecs / 1000000;
> -     }
> -     if (delaycnt <= 1)
> -             for (j = 100; j > 0; j--)
> -                     ;
> -
> -     oclock = agtimer_readcnt64();
> -     while (1) {
> -             for (j = 100; j > 0; j--)
> -                     ;
> -             clock = agtimer_readcnt64();
> -             delta = clock - oclock;
> -             if (delta > delaycnt)
> -                     break;
> -     }
> +     start = agtimer_readcnt64();
> +     cycles = (uint64_t)usecs * agtimer_frequency / 1000000;
> +     while (agtimer_readcnt64() - start < cycles)
> +             CPU_BUSY_CYCLE();
>  }
>  
>  void
> 
> 

Reply via email to