On Tue, Aug 08, 2023 at 08:00:47PM +0200, Mark Kettenis wrote: > > From: Dale Rahn <dale.r...@gmail.com> > > Date: Tue, 8 Aug 2023 12:36:45 -0400 > > > > Switching the computation of cycles/delaycnt to a proper 64 value math > > instead of the '32 bit safe' complex math is likely a good idea. > > However I am not completely convinced that switching to 'yield' (current > > CPU_BUSY_CYCLE implementation) for every loop of a 'short wait' in the wait > > loop makes sense. In a hypervisor environment, this could cause a very > > short wait between register writes to become very long, In a non-hypervisor > > environment there is essentially no improvement because the yield doesn't > > really have any benefits on non-hypervisor. > > Dale, I think you're confused here. There is no arcitectural way to > trap a YIELD instruction. The docs explicitly state that SMT and SMP > systems. I suspect that this instruction was primarily introduced for > SMT systems that never materialized; I completely believe you that on > current hardware it does nothing. > > Even without a YIELD instruction a hypervisor might interrupt us and > schedule a different vCPU onto the core. And on real hardware > external interrupts may happen. So you really can't count on delay(9) > being accurate. > > Linux does use YIELD in it delay loop.
Okay good. > > To my current understanding, there is no useful 'wait short period' on arm > > cores. > > There is WFET (and WFIT), but that is Armv8.7 material, so not > availble on actual hardware that we run on. Linux uses that > instruction in its delay loop when available. These look great. Can't wait for them to be implemented outside of a functional simulator. > On machines with a working Generic Timer event stream, Linux uses WFE > if the delay is long enough. I have a prototype for this. Obviously it's a separate patch. Anyway, can I go ahead with the patch below to start? - Use 64-bit arithmetic to simplify agtimer_delay(). - Use CPU_BUSY_CYCLE() ("yield" on arm64) in the loop to match other delay(9) implementations. Index: agtimer.c =================================================================== RCS file: /cvs/src/sys/arch/arm64/dev/agtimer.c,v retrieving revision 1.23 diff -u -p -r1.23 agtimer.c --- agtimer.c 25 Jul 2023 18:16:19 -0000 1.23 +++ agtimer.c 10 Aug 2023 20:59:27 -0000 @@ -323,32 +323,12 @@ agtimer_cpu_initclocks(void) void agtimer_delay(u_int usecs) { - uint64_t clock, oclock, delta, delaycnt; - uint64_t csec, usec; - volatile int j; + uint64_t cycles, start; - if (usecs > (0x80000000 / agtimer_frequency)) { - csec = usecs / 10000; - usec = usecs % 10000; - - delaycnt = (agtimer_frequency / 100) * csec + - (agtimer_frequency / 100) * usec / 10000; - } else { - delaycnt = agtimer_frequency * usecs / 1000000; - } - if (delaycnt <= 1) - for (j = 100; j > 0; j--) - ; - - oclock = agtimer_readcnt64(); - while (1) { - for (j = 100; j > 0; j--) - ; - clock = agtimer_readcnt64(); - delta = clock - oclock; - if (delta > delaycnt) - break; - } + start = agtimer_readcnt64(); + cycles = (uint64_t)usecs * agtimer_frequency / 1000000; + while (agtimer_readcnt64() - start < cycles) + CPU_BUSY_CYCLE(); } void