Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()
On Thu, Sep 08, 2022 at 08:24:11AM -0500, Scott Cheloha wrote: > On Thu, Sep 08, 2022 at 05:52:43AM +0300, Pavel Korovin wrote: > > On 09/07, Scott Cheloha wrote: > > > Just to make sure that my changes to acpihpet(4) actually caused > > > the problem, I have a few more questions: > > > > > > 1. When did you change the OS type? > > > > 03 August, after that, there was a local snpashot built from sources > > fetched on 17 Aug 2022 22:12:57 +0300 which wasn't affected. > > > > > 2. What was the compilation date of the kernel where you first saw the > > >problem? > > > > Locally built snapshot from sources fetched on Wed, 31 Aug 2022 02:05:34 > > +0300. > > > > > 3. If you boot an unpatched kernel does the problem manifest in > > >exactly the same way? > > > > As I said, I mistakenly changed the OS type not to "FreeBSD Pre-11 > > versions (32-bit)", but to "FreeBSD 11 (32-bit)". > > The problem affects only VMs which have Guest OS Version set to "FreeBSD > > Pre-11 versions (32-bit)" on snapshots built after 31 Aug 2022. > > > > Sample outputs from the machine running older snapshot which is not > > affected: > > > > $ sysctl kern.version | head -n1 > > kern.version=OpenBSD 7.2-beta (GENERIC.MP) #1: Thu Aug 18 15:15:13 MSK > > 2022 > > > > $ sysctl | grep tsc > > kern.timecounter.hardware=tsc > > kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000) > > acpitimer0(1000) > > machdep.tscfreq=1500017850 > > machdep.invarianttsc=1 > > Please send a full bug report to b...@openbsd.org. > > Include full a dmesg for the affected machine. > > Something else with VMWare is involved here and I'm not enough of an > expert to say what. More eyes will be helpful. > > Your problem is *probably* caused by my changes, but there are too > many moving parts with all the different VMWare configurations for me > to narrow down what the issue is definitively. I have committed the acpihpet(4) fix. Please send a full bug report to b...@openbsd.org so we can continue poking at this. I have some test code I'd like you to try out so we can get a better look at the TSC calibration process on your machine.
Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()
On Thu, Sep 08, 2022 at 05:52:43AM +0300, Pavel Korovin wrote: > On 09/07, Scott Cheloha wrote: > > Just to make sure that my changes to acpihpet(4) actually caused > > the problem, I have a few more questions: > > > > 1. When did you change the OS type? > > 03 August, after that, there was a local snpashot built from sources > fetched on 17 Aug 2022 22:12:57 +0300 which wasn't affected. > > > 2. What was the compilation date of the kernel where you first saw the > >problem? > > Locally built snapshot from sources fetched on Wed, 31 Aug 2022 02:05:34 > +0300. > > > 3. If you boot an unpatched kernel does the problem manifest in > >exactly the same way? > > As I said, I mistakenly changed the OS type not to "FreeBSD Pre-11 > versions (32-bit)", but to "FreeBSD 11 (32-bit)". > The problem affects only VMs which have Guest OS Version set to "FreeBSD > Pre-11 versions (32-bit)" on snapshots built after 31 Aug 2022. > > Sample outputs from the machine running older snapshot which is not > affected: > > $ sysctl kern.version | head -n1 > kern.version=OpenBSD 7.2-beta (GENERIC.MP) #1: Thu Aug 18 15:15:13 MSK > 2022 > > $ sysctl | grep tsc > kern.timecounter.hardware=tsc > kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000) > acpitimer0(1000) > machdep.tscfreq=1500017850 > machdep.invarianttsc=1 Please send a full bug report to b...@openbsd.org. Include full a dmesg for the affected machine. Something else with VMWare is involved here and I'm not enough of an expert to say what. More eyes will be helpful. Your problem is *probably* caused by my changes, but there are too many moving parts with all the different VMWare configurations for me to narrow down what the issue is definitively.
Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()
On 09/07, Scott Cheloha wrote: > Just to make sure that my changes to acpihpet(4) actually caused > the problem, I have a few more questions: > > 1. When did you change the OS type? 03 August, after that, there was a local snpashot built from sources fetched on 17 Aug 2022 22:12:57 +0300 which wasn't affected. > 2. What was the compilation date of the kernel where you first saw the >problem? Locally built snapshot from sources fetched on Wed, 31 Aug 2022 02:05:34 +0300. > 3. If you boot an unpatched kernel does the problem manifest in >exactly the same way? As I said, I mistakenly changed the OS type not to "FreeBSD Pre-11 versions (32-bit)", but to "FreeBSD 11 (32-bit)". The problem affects only VMs which have Guest OS Version set to "FreeBSD Pre-11 versions (32-bit)" on snapshots built after 31 Aug 2022. Sample outputs from the machine running older snapshot which is not affected: $ sysctl kern.version | head -n1 kern.version=OpenBSD 7.2-beta (GENERIC.MP) #1: Thu Aug 18 15:15:13 MSK 2022 $ sysctl | grep tsc kern.timecounter.hardware=tsc kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000) acpitimer0(1000) machdep.tscfreq=1500017850 machdep.invarianttsc=1 -- With best regards, Pavel Korovin
Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()
Oh no, sorry, that was too early. Double-checked once again, no, this problem still exists if Guest OS Version is set to "FreeBSD Pre-11 versions (32-bit)". If I set Guest OS Version to "FreeBSD 11 (32-bit)", the problem disappears. On 09/08, Pavel Korovin wrote: > Hi Scott, > > Thank you for the fix! > > I found what triggered this behaviour: it was the change in Guest OS > Version in VM Options. > > I deploy VMs with sysutils/packer, some time ago I noticed that VM type in > my templates is specified as freebsd11_64Guest, which isn't consistent with > vmt(4), as it presents itself as "FreeBSD Pre-11 versions (32-bit)". > > After changing OS type to "FreeBSD Pre-11 versions (32-bit)", I've got this > problem with tsc. > > The provided diff fixes it: > > $ sysctl -a | grep tsc > kern.timecounter.hardware=tsc > kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000) > acpitimer0(1000) > machdep.tscfreq=158129 > machdep.invarianttsc=1 -- With best regards, Pavel Korovin
Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()
On Thu, Sep 08, 2022 at 05:01:33AM +0300, Pavel Korovin wrote: > Hi Scott, > > Thank you for the fix! > > I found what triggered this behaviour: it was the change in Guest OS > Version in VM Options. > > I deploy VMs with sysutils/packer, some time ago I noticed that VM type in > my templates is specified as freebsd11_64Guest, which isn't consistent with > vmt(4), as it presents itself as "FreeBSD Pre-11 versions (32-bit)". > > After changing OS type to "FreeBSD Pre-11 versions (32-bit)", I've got this > problem with tsc. > > > The provided diff fixes it: > > $ sysctl -a | grep tsc > kern.timecounter.hardware=tsc > kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000) > acpitimer0(1000) > machdep.tscfreq=158129 > machdep.invarianttsc=1 I'm glad you are not seeing the issue anymore. Just to make sure that my changes to acpihpet(4) actually caused the problem, I have a few more questions: 1. When did you change the OS type? 2. What was the compilation date of the kernel where you first saw the problem? 3. If you boot an unpatched kernel does the problem manifest in exactly the same way? -Scott
Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()
Hi Scott, Thank you for the fix! I found what triggered this behaviour: it was the change in Guest OS Version in VM Options. I deploy VMs with sysutils/packer, some time ago I noticed that VM type in my templates is specified as freebsd11_64Guest, which isn't consistent with vmt(4), as it presents itself as "FreeBSD Pre-11 versions (32-bit)". After changing OS type to "FreeBSD Pre-11 versions (32-bit)", I've got this problem with tsc. The provided diff fixes it: $ sysctl -a | grep tsc kern.timecounter.hardware=tsc kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000) acpitimer0(1000) machdep.tscfreq=158129 machdep.invarianttsc=1 On 09/06, Scott Cheloha wrote: > On Sat, Sep 03, 2022 at 01:50:28PM +0300, Pavel Korovin wrote: > > After these changes, OpenBSD VMware guest's clock is galloping into the > > future like this: > > Aug 31 02:42:18 build ntpd[55904]: adjusting local clock by -27.085360s > > Aug 31 02:44:26 build ntpd[55904]: adjusting local clock by -116.270573s > > Aug 31 02:47:40 build ntpd[55904]: adjusting local clock by -281.085430s > > Aug 31 02:52:01 build ntpd[55904]: adjusting local clock by -320.064639s > > Aug 31 02:53:09 build ntpd[55904]: adjusting local clock by -385.095886s > > Aug 31 02:54:47 build ntpd[55904]: adjusting local clock by -532.542486s > > Aug 31 02:58:33 build ntpd[55904]: adjusting local clock by -572.363323s > > Aug 31 02:59:38 build ntpd[55904]: adjusting local clock by -655.253598s > > Aug 31 03:01:54 build ntpd[55904]: adjusting local clock by -823.653978s > > Aug 31 03:06:14 build ntpd[55904]: adjusting local clock by -926.705093s > > Aug 31 03:09:00 build ntpd[55904]: adjusting local clock by -1071.837887s > > > > VM time right after boot: > > rdate -pn $ntp; date > > Sat Sep 3 13:39:43 MSK 2022 > > Sat Sep 3 13:43:24 MSK 2022 > > > > $ sysctl -a | grep tsc > > kern.timecounter.hardware=tsc > > kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000) > > acpitimer0(1000) > > machdep.tscfreq=580245275 > > This frequency looks wrong. > > My first guess is that you are hitting a split-read problem in > acpihpet_delay() when recalibrating the TSC. > > Does this patch fix it? > > If you can't build a kernel for testing I can just commit this and you > can try the snapshot in a day or two. > > Index: acpihpet.c > === > RCS file: /cvs/src/sys/dev/acpi/acpihpet.c,v > retrieving revision 1.28 > diff -u -p -r1.28 acpihpet.c > --- acpihpet.c25 Aug 2022 18:01:54 - 1.28 > +++ acpihpet.c6 Sep 2022 16:12:23 - > @@ -281,13 +281,19 @@ acpihpet_attach(struct device *parent, s > void > acpihpet_delay(int usecs) > { > - uint64_t c, s; > + uint64_t count = 0, cycles; > struct acpihpet_softc *sc = hpet_timecounter.tc_priv; > + uint32_t val1, val2; > > - s = acpihpet_r(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER); > - c = usecs * hpet_timecounter.tc_frequency / 100; > - while (acpihpet_r(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER) - s < c) > + val2 = bus_space_read_4(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER); > + cycles = usecs * hpet_timecounter.tc_frequency / 100; > + while (count < cycles) { > CPU_BUSY_CYCLE(); > + val1 = val2; > + val2 = bus_space_read_4(sc->sc_iot, sc->sc_ioh, > + HPET_MAIN_COUNTER); > + count += val2 - val1; > + } > } > > u_int -- With best regards, Pavel Korovin
Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()
On Sat, Sep 03, 2022 at 01:50:28PM +0300, Pavel Korovin wrote: > After these changes, OpenBSD VMware guest's clock is galloping into the > future like this: > Aug 31 02:42:18 build ntpd[55904]: adjusting local clock by -27.085360s > Aug 31 02:44:26 build ntpd[55904]: adjusting local clock by -116.270573s > Aug 31 02:47:40 build ntpd[55904]: adjusting local clock by -281.085430s > Aug 31 02:52:01 build ntpd[55904]: adjusting local clock by -320.064639s > Aug 31 02:53:09 build ntpd[55904]: adjusting local clock by -385.095886s > Aug 31 02:54:47 build ntpd[55904]: adjusting local clock by -532.542486s > Aug 31 02:58:33 build ntpd[55904]: adjusting local clock by -572.363323s > Aug 31 02:59:38 build ntpd[55904]: adjusting local clock by -655.253598s > Aug 31 03:01:54 build ntpd[55904]: adjusting local clock by -823.653978s > Aug 31 03:06:14 build ntpd[55904]: adjusting local clock by -926.705093s > Aug 31 03:09:00 build ntpd[55904]: adjusting local clock by -1071.837887s > > VM time right after boot: > rdate -pn $ntp; date > Sat Sep 3 13:39:43 MSK 2022 > Sat Sep 3 13:43:24 MSK 2022 > > $ sysctl -a | grep tsc > kern.timecounter.hardware=tsc > kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000) > acpitimer0(1000) > machdep.tscfreq=580245275 This frequency looks wrong. My first guess is that you are hitting a split-read problem in acpihpet_delay() when recalibrating the TSC. Does this patch fix it? If you can't build a kernel for testing I can just commit this and you can try the snapshot in a day or two. Index: acpihpet.c === RCS file: /cvs/src/sys/dev/acpi/acpihpet.c,v retrieving revision 1.28 diff -u -p -r1.28 acpihpet.c --- acpihpet.c 25 Aug 2022 18:01:54 - 1.28 +++ acpihpet.c 6 Sep 2022 16:12:23 - @@ -281,13 +281,19 @@ acpihpet_attach(struct device *parent, s void acpihpet_delay(int usecs) { - uint64_t c, s; + uint64_t count = 0, cycles; struct acpihpet_softc *sc = hpet_timecounter.tc_priv; + uint32_t val1, val2; - s = acpihpet_r(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER); - c = usecs * hpet_timecounter.tc_frequency / 100; - while (acpihpet_r(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER) - s < c) + val2 = bus_space_read_4(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER); + cycles = usecs * hpet_timecounter.tc_frequency / 100; + while (count < cycles) { CPU_BUSY_CYCLE(); + val1 = val2; + val2 = bus_space_read_4(sc->sc_iot, sc->sc_ioh, + HPET_MAIN_COUNTER); + count += val2 - val1; + } } u_int
Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()
After these changes, OpenBSD VMware guest's clock is galloping into the future like this: Aug 31 02:42:18 build ntpd[55904]: adjusting local clock by -27.085360s Aug 31 02:44:26 build ntpd[55904]: adjusting local clock by -116.270573s Aug 31 02:47:40 build ntpd[55904]: adjusting local clock by -281.085430s Aug 31 02:52:01 build ntpd[55904]: adjusting local clock by -320.064639s Aug 31 02:53:09 build ntpd[55904]: adjusting local clock by -385.095886s Aug 31 02:54:47 build ntpd[55904]: adjusting local clock by -532.542486s Aug 31 02:58:33 build ntpd[55904]: adjusting local clock by -572.363323s Aug 31 02:59:38 build ntpd[55904]: adjusting local clock by -655.253598s Aug 31 03:01:54 build ntpd[55904]: adjusting local clock by -823.653978s Aug 31 03:06:14 build ntpd[55904]: adjusting local clock by -926.705093s Aug 31 03:09:00 build ntpd[55904]: adjusting local clock by -1071.837887s VM time right after boot: rdate -pn $ntp; date Sat Sep 3 13:39:43 MSK 2022 Sat Sep 3 13:43:24 MSK 2022 $ sysctl -a | grep tsc kern.timecounter.hardware=tsc kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000) acpitimer0(1000) machdep.tscfreq=580245275 machdep.invarianttsc=1 And in a couple of minutes after boot the timeskew getting like this: $ rdate -pn $ntp; date Sat Sep 3 13:41:49 MSK 2022 Sat Sep 3 13:48:46 MSK 2022 After several hours the VM lives in tomorrow. As a workaround, changing TSC source helps: # sysctl kern.timecounter.hardware=acpihpet0 -- With best regards, Pavel Korovin
Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()
> On Aug 29, 2022, at 22:54, Jonathan Gray wrote: > > On Mon, Aug 29, 2022 at 12:02:42PM -0500, Scott Cheloha wrote: >> If hv_delay() never causes a vm exit, but tsc_delay() *might* cause a >> vm exit, and both have microsecond or better resolution, then >> hv_delay() is the preferable delay(9) implementation where it is >> available because vm exits have ambiguous overhead. > > with hv_delay() currently doing rdmsr I wouldn't say never > >> >> If that seems sensible to you, I'll commit this switch. > > There is a msr to allow cpuid to report invariant tsc (0x4118). > Used by linux but not documented in the hyper-v tlfs. > > Without that, tsc delay is never used on hyper-v. So leave it > as is until someone running hyper-v/azure would like it changed? Works for me.
Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()
On Mon, Aug 29, 2022 at 12:02:42PM -0500, Scott Cheloha wrote: > If hv_delay() never causes a vm exit, but tsc_delay() *might* cause a > vm exit, and both have microsecond or better resolution, then > hv_delay() is the preferable delay(9) implementation where it is > available because vm exits have ambiguous overhead. with hv_delay() currently doing rdmsr I wouldn't say never > > If that seems sensible to you, I'll commit this switch. There is a msr to allow cpuid to report invariant tsc (0x4118). Used by linux but not documented in the hyper-v tlfs. Without that, tsc delay is never used on hyper-v. So leave it as is until someone running hyper-v/azure would like it changed? > > Index: arch/amd64/amd64/tsc.c > === > RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v > retrieving revision 1.26 > diff -u -p -r1.26 tsc.c > --- arch/amd64/amd64/tsc.c25 Aug 2022 17:38:16 - 1.26 > +++ arch/amd64/amd64/tsc.c29 Aug 2022 16:58:25 - > @@ -109,7 +109,7 @@ tsc_identify(struct cpu_info *ci) > > tsc_frequency = tsc_freq_cpuid(ci); > if (tsc_frequency > 0) > - delay_init(tsc_delay, 5000); > + delay_init(tsc_delay, 4000); > } > > static inline int > Index: dev/pv/pvbus.c > === > RCS file: /cvs/src/sys/dev/pv/pvbus.c,v > retrieving revision 1.25 > diff -u -p -r1.25 pvbus.c > --- dev/pv/pvbus.c25 Aug 2022 17:38:16 - 1.25 > +++ dev/pv/pvbus.c29 Aug 2022 16:58:26 - > @@ -320,7 +320,7 @@ pvbus_hyperv(struct pvbus_hv *hv) > > #if NHYPERV > 0 > if (hv->hv_features & CPUID_HV_MSR_TIME_REFCNT) > - delay_init(hv_delay, 4000); > + delay_init(hv_delay, 5000); > #endif > } > > >
Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()
On Thu, Aug 25, 2022 at 03:57:48PM +1000, Jonathan Gray wrote: > On Wed, Aug 24, 2022 at 11:05:30PM -0500, Scott Cheloha wrote: > > On Wed, Aug 24, 2022 at 05:51:14PM +1000, Jonathan Gray wrote: > > > On Tue, Aug 23, 2022 at 12:20:39PM -0500, Scott Cheloha wrote: > > > > > Hyper-V generation 1 VMs are bios boot with emulation of the usual > > > > > devices. 32-bit and 64-bit guests. > > > > > > > > > > Hyper-V generation 2 VMs are 64-bit uefi with paravirtualised devices. > > > > > 64-bit guests only. > > > > > > > > > > There is no 8254 in generation 2. > > > > > No HPET in either generation. > > > > > > > > > > hv_delay uses the "Partition Reference Counter MSR" described in > > > > > https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers > > > > > It seems it is available in both generations and could be used from > > > > > i386? > > > > > > > > > > From reading that page hv_delay() should be preferred over > > > > > lapic_delay() > > > > > > > > Alright, I have nudged hv_delay's quality up over lapic_delay's > > > > quality. > > > > > > Before these changes, tsc is probed before pvbus. Do the tsc sanity > > > checks result in it not being considered an option on hyper-v? I think > > > the tsc_delay and hv_delay numbers should be swapped in a later commit. > > > It is unclear if that would change the final delay_func setting. > > > > Why would we prefer hv_delay() to tsc_delay() if we had a > > constant/invariant TSC available in our Hyper-V guest? > > > > When patrick@ emailed me last year about issues with delay(9) on > > Hyper-V, he started by saying that the root of the problem was that > > the OpenBSD guest was not opting to use tsc_delay() because the host > > wasn't reporting a constant/invariant TSC. So the guest was trying to > > use i8254_delay(), which was impossible because Hyper-V Gen2 guests > > don't have an i8254. Hence, hv_delay() was added to the tree. > > > > So, my understanding is that the addition of hv_delay() does not mean > > tsc_delay() is worse than hv_delay(). hv_delay() was added because > > tsc_delay() isn't always an option and (to our surprise) neither is > > i8254_delay(). > > I'm not clear on when rdtsc and rdmsr would cause a vm exit. > Presumably the reference tsc page is provided to avoid that, > but we don't use it. rdtsc and rdmsr don't always cause an exit. > > The wording of Microsoft's "Hypervisor Top Level Functional > Specification" reads as the interface is only available when > the underlying machine has a constant frequency tsc. It also > makes the point that the interface being in time not cycles avoids > problems when the tsc frequency changes on live migration. > > "12.3 Partition Reference Time Enlightenment > > The partition reference time enlightenment presents a reference > time source to a partition which does not require an intercept into > the hypervisor. This enlightenment is available only when the > underlying platform provides support of an invariant processor Time > Stamp Counter (TSC), or iTSC. In such platforms, the processor TSC > frequency remains constant irrespective of changes in the processor's > clock frequency due to the use of power management states such as > ACPI processor performance states, processor idle sleep states (ACPI > C-states), etc. > > The partition reference time enlightenment uses a virtual TSC value, > an offset and a multiplier to enable a guest partition to compute > the normalized reference time since partition creation, in 100nS > units. The mechanism also allows a guest partition to atomically > compute the reference time when the guest partition is migrated to > a platform with a different TSC rate, and provides a fallback > mechanism to support migration to platforms without the constant > rate TSC feature. > > This facility is not intended to be used a source of wall clock > time, since the reference time computed using this facility will > appear to stop during the time that a guest partition is saved until > the subsequent restore." If hv_delay() never causes a vm exit, but tsc_delay() *might* cause a vm exit, and both have microsecond or better resolution, then hv_delay() is the preferable delay(9) implementation where it is available because vm exits have ambiguous overhead. If that seems sensible to you, I'll commit this switch. Index: arch/amd64/amd64/tsc.c === RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v retrieving revision 1.26 diff -u -p -r1.26 tsc.c --- arch/amd64/amd64/tsc.c 25 Aug 2022 17:38:16 - 1.26 +++ arch/amd64/amd64/tsc.c 29 Aug 2022 16:58:25 - @@ -109,7 +109,7 @@ tsc_identify(struct cpu_info *ci) tsc_frequency = tsc_freq_cpuid(ci); if (tsc_frequency > 0) - delay_init(tsc_delay, 5000); + delay_init(tsc_delay, 4000); } static inline int Index: dev/pv/pvbus.c
Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()
On Wed, Aug 24, 2022 at 11:05:30PM -0500, Scott Cheloha wrote: > On Wed, Aug 24, 2022 at 05:51:14PM +1000, Jonathan Gray wrote: > > On Tue, Aug 23, 2022 at 12:20:39PM -0500, Scott Cheloha wrote: > > > > Hyper-V generation 1 VMs are bios boot with emulation of the usual > > > > devices. 32-bit and 64-bit guests. > > > > > > > > Hyper-V generation 2 VMs are 64-bit uefi with paravirtualised devices. > > > > 64-bit guests only. > > > > > > > > There is no 8254 in generation 2. > > > > No HPET in either generation. > > > > > > > > hv_delay uses the "Partition Reference Counter MSR" described in > > > > https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers > > > > It seems it is available in both generations and could be used from > > > > i386? > > > > > > > > From reading that page hv_delay() should be preferred over lapic_delay() > > > > > > Alright, I have nudged hv_delay's quality up over lapic_delay's > > > quality. > > > > Before these changes, tsc is probed before pvbus. Do the tsc sanity > > checks result in it not being considered an option on hyper-v? I think > > the tsc_delay and hv_delay numbers should be swapped in a later commit. > > It is unclear if that would change the final delay_func setting. > > Why would we prefer hv_delay() to tsc_delay() if we had a > constant/invariant TSC available in our Hyper-V guest? > > When patrick@ emailed me last year about issues with delay(9) on > Hyper-V, he started by saying that the root of the problem was that > the OpenBSD guest was not opting to use tsc_delay() because the host > wasn't reporting a constant/invariant TSC. So the guest was trying to > use i8254_delay(), which was impossible because Hyper-V Gen2 guests > don't have an i8254. Hence, hv_delay() was added to the tree. > > So, my understanding is that the addition of hv_delay() does not mean > tsc_delay() is worse than hv_delay(). hv_delay() was added because > tsc_delay() isn't always an option and (to our surprise) neither is > i8254_delay(). I'm not clear on when rdtsc and rdmsr would cause a vm exit. Presumably the reference tsc page is provided to avoid that, but we don't use it. rdtsc and rdmsr don't always cause an exit. The wording of Microsoft's "Hypervisor Top Level Functional Specification" reads as the interface is only available when the underlying machine has a constant frequency tsc. It also makes the point that the interface being in time not cycles avoids problems when the tsc frequency changes on live migration. "12.3 Partition Reference Time Enlightenment The partition reference time enlightenment presents a reference time source to a partition which does not require an intercept into the hypervisor. This enlightenment is available only when the underlying platform provides support of an invariant processor Time Stamp Counter (TSC), or iTSC. In such platforms, the processor TSC frequency remains constant irrespective of changes in the processor's clock frequency due to the use of power management states such as ACPI processor performance states, processor idle sleep states (ACPI C-states), etc. The partition reference time enlightenment uses a virtual TSC value, an offset and a multiplier to enable a guest partition to compute the normalized reference time since partition creation, in 100nS units. The mechanism also allows a guest partition to atomically compute the reference time when the guest partition is migrated to a platform with a different TSC rate, and provides a fallback mechanism to support migration to platforms without the constant rate TSC feature. This facility is not intended to be used a source of wall clock time, since the reference time computed using this facility will appear to stop during the time that a guest partition is saved until the subsequent restore."
Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()
On Wed, Aug 24, 2022 at 05:51:14PM +1000, Jonathan Gray wrote: > On Tue, Aug 23, 2022 at 12:20:39PM -0500, Scott Cheloha wrote: > > > Hyper-V generation 1 VMs are bios boot with emulation of the usual > > > devices. 32-bit and 64-bit guests. > > > > > > Hyper-V generation 2 VMs are 64-bit uefi with paravirtualised devices. > > > 64-bit guests only. > > > > > > There is no 8254 in generation 2. > > > No HPET in either generation. > > > > > > hv_delay uses the "Partition Reference Counter MSR" described in > > > https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers > > > It seems it is available in both generations and could be used from i386? > > > > > > From reading that page hv_delay() should be preferred over lapic_delay() > > > > Alright, I have nudged hv_delay's quality up over lapic_delay's > > quality. > > Before these changes, tsc is probed before pvbus. Do the tsc sanity > checks result in it not being considered an option on hyper-v? I think > the tsc_delay and hv_delay numbers should be swapped in a later commit. > It is unclear if that would change the final delay_func setting. Why would we prefer hv_delay() to tsc_delay() if we had a constant/invariant TSC available in our Hyper-V guest? When patrick@ emailed me last year about issues with delay(9) on Hyper-V, he started by saying that the root of the problem was that the OpenBSD guest was not opting to use tsc_delay() because the host wasn't reporting a constant/invariant TSC. So the guest was trying to use i8254_delay(), which was impossible because Hyper-V Gen2 guests don't have an i8254. Hence, hv_delay() was added to the tree. So, my understanding is that the addition of hv_delay() does not mean tsc_delay() is worse than hv_delay(). hv_delay() was added because tsc_delay() isn't always an option and (to our surprise) neither is i8254_delay(). > It would be a good idea to have different commits for the places new > delay callbacks are introduced. > > - add delay_init() > - use delay_init() in lapic, tsc, hv_delay > - commit acpihpet > - commit acpitimer I had planned to do separate commits. This ordering seems right. > - swap tsc and hv_delay numbers See above. > > How are we looking now? > > some minor suggestions inline > > have you built a release with this? Just finished building a release and upgrading with it from physical media. I think we are good to go. I incorporated your suggestions below and I'm going to do the first four suggested commits tomorrow unless I hear otherwise. Current combined patch is attached. > > Index: sys/arch/amd64/amd64/lapic.c > > === > > RCS file: /cvs/src/sys/arch/amd64/amd64/lapic.c,v > > retrieving revision 1.60 > > diff -u -p -r1.60 lapic.c > > --- sys/arch/amd64/amd64/lapic.c15 Aug 2022 04:17:50 - 1.60 > > +++ sys/arch/amd64/amd64/lapic.c23 Aug 2022 17:18:30 - > > @@ -486,8 +486,6 @@ wait_next_cycle(void) > > } > > } > > > > -extern void tsc_delay(int); > > - > > this cleanup is unrelated and should be a different diff/commit Ack, will do it separately. > > /* > > * Calibrate the local apic count-down timer (which is running at > > * bus-clock speed) vs. the i8254 counter/timer (which is running at > > @@ -592,8 +590,7 @@ skip_calibration: > > * Now that the timer's calibrated, use the apic timer routines > > * for all our timing needs.. > > */ > > - if (delay_func == i8254_delay) > > - delay_func = lapic_delay; > > + delay_init(lapic_delay, 3000); > > initclock_func = lapic_initclocks; > > } > > } > > Index: sys/arch/amd64/amd64/machdep.c > > === > > RCS file: /cvs/src/sys/arch/amd64/amd64/machdep.c,v > > retrieving revision 1.279 > > diff -u -p -r1.279 machdep.c > > --- sys/arch/amd64/amd64/machdep.c 7 Aug 2022 23:56:06 - 1.279 > > +++ sys/arch/amd64/amd64/machdep.c 23 Aug 2022 17:18:31 - > > @@ -2069,3 +2069,13 @@ check_context(const struct reg *regs, st > > > > return 0; > > } > > + > > +void > > +delay_init(void(*fn)(int), int fn_quality) > > +{ > > + static int cur_quality = 0; > > + if (fn_quality > cur_quality) { > > + delay_func = fn; > > + cur_quality = fn_quality; > > + } > > +} > > Index: sys/arch/amd64/amd64/tsc.c > > === > > RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v > > retrieving revision 1.25 > > diff -u -p -r1.25 tsc.c > > --- sys/arch/amd64/amd64/tsc.c 12 Aug 2022 02:20:36 - 1.25 > > +++ sys/arch/amd64/amd64/tsc.c 23 Aug 2022 17:18:31 - > > @@ -109,7 +109,7 @@ tsc_identify(struct cpu_info *ci) > > > > tsc_frequency = tsc_freq_cpuid(ci); > > if (tsc_frequency > 0) > > - delay_func = tsc_delay; > > +
Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()
On Tue, Aug 23, 2022 at 12:20:39PM -0500, Scott Cheloha wrote: > > Hyper-V generation 1 VMs are bios boot with emulation of the usual > > devices. 32-bit and 64-bit guests. > > > > Hyper-V generation 2 VMs are 64-bit uefi with paravirtualised devices. > > 64-bit guests only. > > > > There is no 8254 in generation 2. > > No HPET in either generation. > > > > hv_delay uses the "Partition Reference Counter MSR" described in > > https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers > > It seems it is available in both generations and could be used from i386? > > > > From reading that page hv_delay() should be preferred over lapic_delay() > > Alright, I have nudged hv_delay's quality up over lapic_delay's > quality. Before these changes, tsc is probed before pvbus. Do the tsc sanity checks result in it not being considered an option on hyper-v? I think the tsc_delay and hv_delay numbers should be swapped in a later commit. It is unclear if that would change the final delay_func setting. It would be a good idea to have different commits for the places new delay callbacks are introduced. - add delay_init() - use delay_init() in lapic, tsc, hv_delay - commit acpihpet - commit acpitimer - swap tsc and hv_delay numbers > > How are we looking now? some minor suggestions inline have you built a release with this? > > Index: sys/arch/amd64/amd64/lapic.c > === > RCS file: /cvs/src/sys/arch/amd64/amd64/lapic.c,v > retrieving revision 1.60 > diff -u -p -r1.60 lapic.c > --- sys/arch/amd64/amd64/lapic.c 15 Aug 2022 04:17:50 - 1.60 > +++ sys/arch/amd64/amd64/lapic.c 23 Aug 2022 17:18:30 - > @@ -486,8 +486,6 @@ wait_next_cycle(void) > } > } > > -extern void tsc_delay(int); > - this cleanup is unrelated and should be a different diff/commit > /* > * Calibrate the local apic count-down timer (which is running at > * bus-clock speed) vs. the i8254 counter/timer (which is running at > @@ -592,8 +590,7 @@ skip_calibration: >* Now that the timer's calibrated, use the apic timer routines >* for all our timing needs.. >*/ > - if (delay_func == i8254_delay) > - delay_func = lapic_delay; > + delay_init(lapic_delay, 3000); > initclock_func = lapic_initclocks; > } > } > Index: sys/arch/amd64/amd64/machdep.c > === > RCS file: /cvs/src/sys/arch/amd64/amd64/machdep.c,v > retrieving revision 1.279 > diff -u -p -r1.279 machdep.c > --- sys/arch/amd64/amd64/machdep.c7 Aug 2022 23:56:06 - 1.279 > +++ sys/arch/amd64/amd64/machdep.c23 Aug 2022 17:18:31 - > @@ -2069,3 +2069,13 @@ check_context(const struct reg *regs, st > > return 0; > } > + > +void > +delay_init(void(*fn)(int), int fn_quality) > +{ > + static int cur_quality = 0; > + if (fn_quality > cur_quality) { > + delay_func = fn; > + cur_quality = fn_quality; > + } > +} > Index: sys/arch/amd64/amd64/tsc.c > === > RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v > retrieving revision 1.25 > diff -u -p -r1.25 tsc.c > --- sys/arch/amd64/amd64/tsc.c12 Aug 2022 02:20:36 - 1.25 > +++ sys/arch/amd64/amd64/tsc.c23 Aug 2022 17:18:31 - > @@ -109,7 +109,7 @@ tsc_identify(struct cpu_info *ci) > > tsc_frequency = tsc_freq_cpuid(ci); > if (tsc_frequency > 0) > - delay_func = tsc_delay; > + delay_init(tsc_delay, 5000); > } > > static inline int > Index: sys/arch/amd64/include/cpu.h > === > RCS file: /cvs/src/sys/arch/amd64/include/cpu.h,v > retrieving revision 1.148 > diff -u -p -r1.148 cpu.h > --- sys/arch/amd64/include/cpu.h 22 Aug 2022 08:57:54 - 1.148 > +++ sys/arch/amd64/include/cpu.h 23 Aug 2022 17:18:31 - > @@ -359,6 +359,7 @@ void signotify(struct proc *); > * We need a machine-independent name for this. > */ > extern void (*delay_func)(int); > +void delay_init(void (*)(int), int); > struct timeval; > > #define DELAY(x) (*delay_func)(x) > Index: sys/arch/i386/i386/lapic.c > === > RCS file: /cvs/src/sys/arch/i386/i386/lapic.c,v > retrieving revision 1.49 > diff -u -p -r1.49 lapic.c > --- sys/arch/i386/i386/lapic.c15 Aug 2022 04:17:50 - 1.49 > +++ sys/arch/i386/i386/lapic.c23 Aug 2022 17:18:31 - > @@ -395,7 +395,7 @@ lapic_calibrate_timer(struct cpu_info *c >* Now that the timer's calibrated, use the apic timer routines >* for all our timing needs.. >*/ > - delay_func = lapic_delay; > + delay_init(lapic_delay, 3000); >
Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()
On Tue, Aug 23, 2022 at 04:04:39PM +1000, Jonathan Gray wrote: > On Mon, Aug 22, 2022 at 09:37:02AM -0500, Scott Cheloha wrote: > > On Wed, Aug 17, 2022 at 09:00:12PM +1000, Jonathan Gray wrote: > > > On Wed, Aug 17, 2022 at 04:53:20PM +1000, Jonathan Gray wrote: > > > > > > > > It seems to me it would be cleaner if the decision of what to use for > > > > delay could be moved into an md file. > > > > > > > > Or abstract it by having a numeric weight like timecounters or driver > > > > match return numbers. > > > > > > diff against your previous, does not change lapic_delay > > > > Sorry for the delay. > > > > :) > > > > I was out of town. > > > > I slept on this and you're right, this is better. > > > > Couple tweaks: > > > > - Move the quality numbers into cpu.h and give them names. That way, > > the next time Intel, or AMD, or Microsoft or [...] does something > > foolish we don't need to rototill all these files to juggle the > > quality hierarchy. > > While that would allow arch specific differences, removing one would > require changing at least three places in the tree. Okay, nevermind. Patch updated below. > > - Update both amd64 and i386's lapic.c to use delay_init(). While we > > have a lapic_delay() in the tree it should cooperate with everything > > else. > > > > - Include in any files calling delay_init() where it > > isn't already included. > > > > - Give the variables in delay_init() real names. > > hmm > > > I'm unsure about two small things: > > > > - Can i386 use hv_delay()? The i386 GENERIC config does not list > > hyperv(4) support so my guess is "no" and I have excluded > > HV_DELAY_QUALITY from i386's cpu.h. > > > > - If a Hyper-V guest could choose between hv_delay() and > > lapic_delay(), which would be preferable? Right now I > > have hv_delay() scored lower than lapic_delay(). > > Hyper-V generation 1 VMs are bios boot with emulation of the usual > devices. 32-bit and 64-bit guests. > > Hyper-V generation 2 VMs are 64-bit uefi with paravirtualised devices. > 64-bit guests only. > > There is no 8254 in generation 2. > No HPET in either generation. > > hv_delay uses the "Partition Reference Counter MSR" described in > https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers > It seems it is available in both generations and could be used from i386? > > From reading that page hv_delay() should be preferred over lapic_delay() Alright, I have nudged hv_delay's quality up over lapic_delay's quality. How are we looking now? Index: sys/arch/amd64/amd64/lapic.c === RCS file: /cvs/src/sys/arch/amd64/amd64/lapic.c,v retrieving revision 1.60 diff -u -p -r1.60 lapic.c --- sys/arch/amd64/amd64/lapic.c15 Aug 2022 04:17:50 - 1.60 +++ sys/arch/amd64/amd64/lapic.c23 Aug 2022 17:18:30 - @@ -486,8 +486,6 @@ wait_next_cycle(void) } } -extern void tsc_delay(int); - /* * Calibrate the local apic count-down timer (which is running at * bus-clock speed) vs. the i8254 counter/timer (which is running at @@ -592,8 +590,7 @@ skip_calibration: * Now that the timer's calibrated, use the apic timer routines * for all our timing needs.. */ - if (delay_func == i8254_delay) - delay_func = lapic_delay; + delay_init(lapic_delay, 3000); initclock_func = lapic_initclocks; } } Index: sys/arch/amd64/amd64/machdep.c === RCS file: /cvs/src/sys/arch/amd64/amd64/machdep.c,v retrieving revision 1.279 diff -u -p -r1.279 machdep.c --- sys/arch/amd64/amd64/machdep.c 7 Aug 2022 23:56:06 - 1.279 +++ sys/arch/amd64/amd64/machdep.c 23 Aug 2022 17:18:31 - @@ -2069,3 +2069,13 @@ check_context(const struct reg *regs, st return 0; } + +void +delay_init(void(*fn)(int), int fn_quality) +{ + static int cur_quality = 0; + if (fn_quality > cur_quality) { + delay_func = fn; + cur_quality = fn_quality; + } +} Index: sys/arch/amd64/amd64/tsc.c === RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v retrieving revision 1.25 diff -u -p -r1.25 tsc.c --- sys/arch/amd64/amd64/tsc.c 12 Aug 2022 02:20:36 - 1.25 +++ sys/arch/amd64/amd64/tsc.c 23 Aug 2022 17:18:31 - @@ -109,7 +109,7 @@ tsc_identify(struct cpu_info *ci) tsc_frequency = tsc_freq_cpuid(ci); if (tsc_frequency > 0) - delay_func = tsc_delay; + delay_init(tsc_delay, 5000); } static inline int Index: sys/arch/amd64/include/cpu.h === RCS file: /cvs/src/sys/arch/amd64/include/cpu.h,v retrieving revision 1.148 diff -u -p -r1.148 cpu.h ---
Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()
On Mon, Aug 22, 2022 at 09:37:02AM -0500, Scott Cheloha wrote: > On Wed, Aug 17, 2022 at 09:00:12PM +1000, Jonathan Gray wrote: > > On Wed, Aug 17, 2022 at 04:53:20PM +1000, Jonathan Gray wrote: > > > > > > It seems to me it would be cleaner if the decision of what to use for > > > delay could be moved into an md file. > > > > > > Or abstract it by having a numeric weight like timecounters or driver > > > match return numbers. > > > > diff against your previous, does not change lapic_delay > > Sorry for the delay. > > :) > > I was out of town. > > I slept on this and you're right, this is better. > > Couple tweaks: > > - Move the quality numbers into cpu.h and give them names. That way, > the next time Intel, or AMD, or Microsoft or [...] does something > foolish we don't need to rototill all these files to juggle the > quality hierarchy. While that would allow arch specific differences, removing one would require changing at least three places in the tree. > > - Update both amd64 and i386's lapic.c to use delay_init(). While we > have a lapic_delay() in the tree it should cooperate with everything > else. > > - Include in any files calling delay_init() where it > isn't already included. > > - Give the variables in delay_init() real names. hmm > > I'm unsure about two small things: > > - Can i386 use hv_delay()? The i386 GENERIC config does not list > hyperv(4) support so my guess is "no" and I have excluded > HV_DELAY_QUALITY from i386's cpu.h. > > - If a Hyper-V guest could choose between hv_delay() and > lapic_delay(), which would be preferable? Right now I > have hv_delay() scored lower than lapic_delay(). Hyper-V generation 1 VMs are bios boot with emulation of the usual devices. 32-bit and 64-bit guests. Hyper-V generation 2 VMs are 64-bit uefi with paravirtualised devices. 64-bit guests only. There is no 8254 in generation 2. No HPET in either generation. hv_delay uses the "Partition Reference Counter MSR" described in https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers It seems it is available in both generations and could be used from i386? >From reading that page hv_delay() should be preferred over lapic_delay()
Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()
On Wed, Aug 17, 2022 at 09:00:12PM +1000, Jonathan Gray wrote: > On Wed, Aug 17, 2022 at 04:53:20PM +1000, Jonathan Gray wrote: > > > > It seems to me it would be cleaner if the decision of what to use for > > delay could be moved into an md file. > > > > Or abstract it by having a numeric weight like timecounters or driver > > match return numbers. > > diff against your previous, does not change lapic_delay Sorry for the delay. :) I was out of town. I slept on this and you're right, this is better. Couple tweaks: - Move the quality numbers into cpu.h and give them names. That way, the next time Intel, or AMD, or Microsoft or [...] does something foolish we don't need to rototill all these files to juggle the quality hierarchy. - Update both amd64 and i386's lapic.c to use delay_init(). While we have a lapic_delay() in the tree it should cooperate with everything else. - Include in any files calling delay_init() where it isn't already included. - Give the variables in delay_init() real names. I'm unsure about two small things: - Can i386 use hv_delay()? The i386 GENERIC config does not list hyperv(4) support so my guess is "no" and I have excluded HV_DELAY_QUALITY from i386's cpu.h. - If a Hyper-V guest could choose between hv_delay() and lapic_delay(), which would be preferable? Right now I have hv_delay() scored lower than lapic_delay(). Once we've sorted those out, are you OK with the attached patch? mlarkin: still ok? Index: sys/arch/amd64/amd64/lapic.c === RCS file: /cvs/src/sys/arch/amd64/amd64/lapic.c,v retrieving revision 1.60 diff -u -p -r1.60 lapic.c --- sys/arch/amd64/amd64/lapic.c15 Aug 2022 04:17:50 - 1.60 +++ sys/arch/amd64/amd64/lapic.c22 Aug 2022 14:33:30 - @@ -486,8 +486,6 @@ wait_next_cycle(void) } } -extern void tsc_delay(int); - /* * Calibrate the local apic count-down timer (which is running at * bus-clock speed) vs. the i8254 counter/timer (which is running at @@ -592,8 +590,7 @@ skip_calibration: * Now that the timer's calibrated, use the apic timer routines * for all our timing needs.. */ - if (delay_func == i8254_delay) - delay_func = lapic_delay; + delay_init(lapic_delay, LAPIC_DELAY_QUALITY); initclock_func = lapic_initclocks; } } Index: sys/arch/amd64/amd64/machdep.c === RCS file: /cvs/src/sys/arch/amd64/amd64/machdep.c,v retrieving revision 1.279 diff -u -p -r1.279 machdep.c --- sys/arch/amd64/amd64/machdep.c 7 Aug 2022 23:56:06 - 1.279 +++ sys/arch/amd64/amd64/machdep.c 22 Aug 2022 14:33:30 - @@ -2069,3 +2069,13 @@ check_context(const struct reg *regs, st return 0; } + +void +delay_init(void(*fn)(int), int fn_quality) +{ + static int cur_quality = 0; + if (fn_quality > cur_quality) { + delay_func = fn; + cur_quality = fn_quality; + } +} Index: sys/arch/amd64/amd64/tsc.c === RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v retrieving revision 1.25 diff -u -p -r1.25 tsc.c --- sys/arch/amd64/amd64/tsc.c 12 Aug 2022 02:20:36 - 1.25 +++ sys/arch/amd64/amd64/tsc.c 22 Aug 2022 14:33:30 - @@ -109,7 +109,7 @@ tsc_identify(struct cpu_info *ci) tsc_frequency = tsc_freq_cpuid(ci); if (tsc_frequency > 0) - delay_func = tsc_delay; + delay_init(tsc_delay, TSC_DELAY_QUALITY); } static inline int Index: sys/arch/amd64/include/cpu.h === RCS file: /cvs/src/sys/arch/amd64/include/cpu.h,v retrieving revision 1.148 diff -u -p -r1.148 cpu.h --- sys/arch/amd64/include/cpu.h22 Aug 2022 08:57:54 - 1.148 +++ sys/arch/amd64/include/cpu.h22 Aug 2022 14:33:30 - @@ -364,6 +364,13 @@ struct timeval; #define DELAY(x) (*delay_func)(x) #define delay(x) (*delay_func)(x) +void delay_init(void(*)(int), int); + +#define ACPITIMER_DELAY_QUALITY100 +#define ACPIHPET_DELAY_QUALITY 200 +#define HV_DELAY_QUALITY 250 +#define LAPIC_DELAY_QUALITY300 +#define TSC_DELAY_QUALITY 400 #ifdef _KERNEL /* locore.S */ Index: sys/arch/i386/i386/lapic.c === RCS file: /cvs/src/sys/arch/i386/i386/lapic.c,v retrieving revision 1.49 diff -u -p -r1.49 lapic.c --- sys/arch/i386/i386/lapic.c 15 Aug 2022 04:17:50 - 1.49 +++ sys/arch/i386/i386/lapic.c 22 Aug 2022 14:33:30 - @@ -395,7 +395,7 @@ lapic_calibrate_timer(struct cpu_info *c * Now that the timer's calibrated, use the apic timer routines * for all our timing needs..
Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()
On Wed, Aug 17, 2022 at 09:00:12PM +1000, Jonathan Gray wrote: > On Wed, Aug 17, 2022 at 04:53:20PM +1000, Jonathan Gray wrote: > > > > It seems to me it would be cleaner if the decision of what to use for > > delay could be moved into an md file. > > > > Or abstract it by having a numeric weight like timecounters or driver > > match return numbers. > > diff against your previous, does not change lapic_delay > I think the combination of diffs is a move in the right direction, so ok mlarkin on these when ready. -ml > diff --git sys/arch/amd64/amd64/machdep.c sys/arch/amd64/amd64/machdep.c > index 932b1dfeb47..c4645b6a6fd 100644 > --- sys/arch/amd64/amd64/machdep.c > +++ sys/arch/amd64/amd64/machdep.c > @@ -2069,3 +2069,13 @@ check_context(const struct reg *regs, struct trapframe > *tf) > > return 0; > } > + > +void > +delay_init(void(*f)(int), int v) > +{ > + static int c = 0; > + if (v > c) { > + delay_func = f; > + c = v; > + } > +} > diff --git sys/arch/amd64/amd64/tsc.c sys/arch/amd64/amd64/tsc.c > index fd38dc6359d..8c839357dd2 100644 > --- sys/arch/amd64/amd64/tsc.c > +++ sys/arch/amd64/amd64/tsc.c > @@ -109,7 +109,7 @@ tsc_identify(struct cpu_info *ci) > > tsc_frequency = tsc_freq_cpuid(ci); > if (tsc_frequency > 0) > - delay_func = tsc_delay; > + delay_init(tsc_delay, 300); > } > > static inline int > diff --git sys/arch/amd64/include/cpu.h sys/arch/amd64/include/cpu.h > index b8db48f2714..a82af172452 100644 > --- sys/arch/amd64/include/cpu.h > +++ sys/arch/amd64/include/cpu.h > @@ -359,6 +359,7 @@ void signotify(struct proc *); > * We need a machine-independent name for this. > */ > extern void (*delay_func)(int); > +void delay_init(void(*)(int), int); > struct timeval; > > #define DELAY(x) (*delay_func)(x) > diff --git sys/arch/i386/i386/machdep.c sys/arch/i386/i386/machdep.c > index e4cb15b4dc1..7da5c26e240 100644 > --- sys/arch/i386/i386/machdep.c > +++ sys/arch/i386/i386/machdep.c > @@ -3996,3 +3996,13 @@ cpu_rnd_messybits(void) > nanotime(); > return (ts.tv_nsec ^ (ts.tv_sec << 20)); > } > + > +void > +delay_init(void(*f)(int), int v) > +{ > + static int c = 0; > + if (v > c) { > + delay_func = f; > + c = v; > + } > +} > diff --git sys/arch/i386/include/cpu.h sys/arch/i386/include/cpu.h > index 5f300710562..211ee475678 100644 > --- sys/arch/i386/include/cpu.h > +++ sys/arch/i386/include/cpu.h > @@ -302,6 +302,7 @@ void signotify(struct proc *); > * We need a machine-independent name for this. > */ > extern void (*delay_func)(int); > +void delay_init(void(*)(int), int); > struct timeval; > > #define DELAY(x)(*delay_func)(x) > diff --git sys/dev/acpi/acpihpet.c sys/dev/acpi/acpihpet.c > index 6dc595e50ab..4332b4dbc0e 100644 > --- sys/dev/acpi/acpihpet.c > +++ sys/dev/acpi/acpihpet.c > @@ -27,8 +27,6 @@ > #include > #include > > -#include "acpitimer.h" > - > int acpihpet_attached; > > int acpihpet_match(struct device *, void *, void *); > @@ -270,15 +268,7 @@ acpihpet_attach(struct device *parent, struct device > *self, void *aux) > hpet_timecounter.tc_name = sc->sc_dev.dv_xname; > tc_init(_timecounter); > > -#if defined(__amd64__) || defined(__i386__) > - if (delay_func == i8254_delay) > - delay_func = acpihpet_delay; > -#if NACPITIMER > 1 > - extern void acpitimer_delay(int); > - if (delay_func == acpitimer_delay) > - delay_func = acpihpet_delay; > -#endif > -#endif /* defined(__amd64__) || defined(__i386__) */ > + delay_init(acpihpet_delay, 200); > > #if defined(__amd64__) > extern void cpu_recalibrate_tsc(struct timecounter *); > diff --git sys/dev/acpi/acpitimer.c sys/dev/acpi/acpitimer.c > index 0c4c7b71a01..e2757a40f3d 100644 > --- sys/dev/acpi/acpitimer.c > +++ sys/dev/acpi/acpitimer.c > @@ -103,10 +103,8 @@ acpitimerattach(struct device *parent, struct device > *self, void *aux) > acpi_timecounter.tc_name = sc->sc_dev.dv_xname; > tc_init(_timecounter); > > -#if defined(__amd64__) || defined(__i386__) > - if (delay_func == i8254_delay) > - delay_func = acpitimer_delay; > -#endif > + delay_init(acpitimer_delay, 100); > + > #if defined(__amd64__) > extern void cpu_recalibrate_tsc(struct timecounter *); > cpu_recalibrate_tsc(_timecounter); > diff --git sys/dev/acpi/files.acpi sys/dev/acpi/files.acpi > index 8ec3ec2f8b3..f97eb6d4e3e 100644 > --- sys/dev/acpi/files.acpi > +++ sys/dev/acpi/files.acpi > @@ -13,7 +13,7 @@ filedev/acpi/acpidebug.cacpi & ddb > # ACPI timer > device acpitimer > attach acpitimer at acpi > -file dev/acpi/acpitimer.cacpitimer needs-flag > +file dev/acpi/acpitimer.cacpitimer > > # AC device > device acpiac > diff --git sys/dev/pv/pvbus.c sys/dev/pv/pvbus.c > index cbe543ac312..ee53afe2138 100644 > ---
Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()
On Wed, Aug 17, 2022 at 04:53:20PM +1000, Jonathan Gray wrote: > > It seems to me it would be cleaner if the decision of what to use for > delay could be moved into an md file. > > Or abstract it by having a numeric weight like timecounters or driver > match return numbers. diff against your previous, does not change lapic_delay diff --git sys/arch/amd64/amd64/machdep.c sys/arch/amd64/amd64/machdep.c index 932b1dfeb47..c4645b6a6fd 100644 --- sys/arch/amd64/amd64/machdep.c +++ sys/arch/amd64/amd64/machdep.c @@ -2069,3 +2069,13 @@ check_context(const struct reg *regs, struct trapframe *tf) return 0; } + +void +delay_init(void(*f)(int), int v) +{ + static int c = 0; + if (v > c) { + delay_func = f; + c = v; + } +} diff --git sys/arch/amd64/amd64/tsc.c sys/arch/amd64/amd64/tsc.c index fd38dc6359d..8c839357dd2 100644 --- sys/arch/amd64/amd64/tsc.c +++ sys/arch/amd64/amd64/tsc.c @@ -109,7 +109,7 @@ tsc_identify(struct cpu_info *ci) tsc_frequency = tsc_freq_cpuid(ci); if (tsc_frequency > 0) - delay_func = tsc_delay; + delay_init(tsc_delay, 300); } static inline int diff --git sys/arch/amd64/include/cpu.h sys/arch/amd64/include/cpu.h index b8db48f2714..a82af172452 100644 --- sys/arch/amd64/include/cpu.h +++ sys/arch/amd64/include/cpu.h @@ -359,6 +359,7 @@ void signotify(struct proc *); * We need a machine-independent name for this. */ extern void (*delay_func)(int); +void delay_init(void(*)(int), int); struct timeval; #define DELAY(x) (*delay_func)(x) diff --git sys/arch/i386/i386/machdep.c sys/arch/i386/i386/machdep.c index e4cb15b4dc1..7da5c26e240 100644 --- sys/arch/i386/i386/machdep.c +++ sys/arch/i386/i386/machdep.c @@ -3996,3 +3996,13 @@ cpu_rnd_messybits(void) nanotime(); return (ts.tv_nsec ^ (ts.tv_sec << 20)); } + +void +delay_init(void(*f)(int), int v) +{ + static int c = 0; + if (v > c) { + delay_func = f; + c = v; + } +} diff --git sys/arch/i386/include/cpu.h sys/arch/i386/include/cpu.h index 5f300710562..211ee475678 100644 --- sys/arch/i386/include/cpu.h +++ sys/arch/i386/include/cpu.h @@ -302,6 +302,7 @@ void signotify(struct proc *); * We need a machine-independent name for this. */ extern void (*delay_func)(int); +void delay_init(void(*)(int), int); struct timeval; #defineDELAY(x)(*delay_func)(x) diff --git sys/dev/acpi/acpihpet.c sys/dev/acpi/acpihpet.c index 6dc595e50ab..4332b4dbc0e 100644 --- sys/dev/acpi/acpihpet.c +++ sys/dev/acpi/acpihpet.c @@ -27,8 +27,6 @@ #include #include -#include "acpitimer.h" - int acpihpet_attached; int acpihpet_match(struct device *, void *, void *); @@ -270,15 +268,7 @@ acpihpet_attach(struct device *parent, struct device *self, void *aux) hpet_timecounter.tc_name = sc->sc_dev.dv_xname; tc_init(_timecounter); -#if defined(__amd64__) || defined(__i386__) - if (delay_func == i8254_delay) - delay_func = acpihpet_delay; -#if NACPITIMER > 1 - extern void acpitimer_delay(int); - if (delay_func == acpitimer_delay) - delay_func = acpihpet_delay; -#endif -#endif /* defined(__amd64__) || defined(__i386__) */ + delay_init(acpihpet_delay, 200); #if defined(__amd64__) extern void cpu_recalibrate_tsc(struct timecounter *); diff --git sys/dev/acpi/acpitimer.c sys/dev/acpi/acpitimer.c index 0c4c7b71a01..e2757a40f3d 100644 --- sys/dev/acpi/acpitimer.c +++ sys/dev/acpi/acpitimer.c @@ -103,10 +103,8 @@ acpitimerattach(struct device *parent, struct device *self, void *aux) acpi_timecounter.tc_name = sc->sc_dev.dv_xname; tc_init(_timecounter); -#if defined(__amd64__) || defined(__i386__) - if (delay_func == i8254_delay) - delay_func = acpitimer_delay; -#endif + delay_init(acpitimer_delay, 100); + #if defined(__amd64__) extern void cpu_recalibrate_tsc(struct timecounter *); cpu_recalibrate_tsc(_timecounter); diff --git sys/dev/acpi/files.acpi sys/dev/acpi/files.acpi index 8ec3ec2f8b3..f97eb6d4e3e 100644 --- sys/dev/acpi/files.acpi +++ sys/dev/acpi/files.acpi @@ -13,7 +13,7 @@ file dev/acpi/acpidebug.cacpi & ddb # ACPI timer device acpitimer attach acpitimer at acpi -file dev/acpi/acpitimer.cacpitimer needs-flag +file dev/acpi/acpitimer.cacpitimer # AC device device acpiac diff --git sys/dev/pv/pvbus.c sys/dev/pv/pvbus.c index cbe543ac312..ee53afe2138 100644 --- sys/dev/pv/pvbus.c +++ sys/dev/pv/pvbus.c @@ -319,9 +319,8 @@ pvbus_hyperv(struct pvbus_hv *hv) HYPERV_VERSION_EBX_MINOR_S; #if NHYPERV > 0 - if (hv->hv_features & CPUID_HV_MSR_TIME_REFCNT && - delay_func == i8254_delay) - delay_func = hv_delay; + if (hv->hv_features & CPUID_HV_MSR_TIME_REFCNT) + delay_init(hv_delay, 250); #endif }
Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()
On Wed, Aug 17, 2022 at 12:37:52AM -0500, Scott Cheloha wrote: > On Wed, Aug 17, 2022 at 02:28:14PM +1000, Jonathan Gray wrote: > > On Tue, Aug 16, 2022 at 11:53:51AM -0500, Scott Cheloha wrote: > > > On Sun, Aug 14, 2022 at 11:24:37PM -0500, Scott Cheloha wrote: > > > > > > > > In the future when the LAPIC timer is run in oneshot mode there will > > > > be no lapic_delay(). > > > > > > > > [...] > > > > > > > > This is *very* bad for older amd64 machines, because you are left with > > > > i8254_delay(). > > > > > > > > I would like to offer a less awful delay(9) implementation for this > > > > class of hardware. Otherwise we may trip over bizarre phantom bugs on > > > > MP kernels because only one CPU can read the i8254 at a time. > > > > > > > > [...] > > > > > > > > Real i386 hardware should be fine. Later models with an ACPI PM timer > > > > will be fine using acpitimer_delay() instead of i8254_delay(). > > > > > > > > [...] > > > > > > > > Here are the sample measurements from my 2017 laptop (kaby lake > > > > refresh) running the attached patch. It takes longer than a > > > > microsecond to read either of the ACPI timers. The PM timer is better > > > > than the HPET. The HPET is a bit better than the i8254. I hope the > > > > numbers are a little better on older hardware. > > > > > > > > acpitimer_test_delay: expected 0.01000 actual 0.10638 > > > > error 0.09638 > > > > acpitimer_test_delay: expected 0.1 actual 0.15464 > > > > error 0.05464 > > > > acpitimer_test_delay: expected 0.00010 actual 0.000107619 > > > > error 0.07619 > > > > acpitimer_test_delay: expected 0.00100 actual 0.001007275 > > > > error 0.07275 > > > > acpitimer_test_delay: expected 0.01000 actual 0.010007891 > > > > error 0.07891 > > > > > > > > acpihpet_test_delay: expected 0.01000 actual 0.22208 > > > > error 0.21208 > > > > acpihpet_test_delay: expected 0.1 actual 0.31690 > > > > error 0.21690 > > > > acpihpet_test_delay: expected 0.00010 actual 0.000112647 > > > > error 0.12647 > > > > acpihpet_test_delay: expected 0.00100 actual 0.001021480 > > > > error 0.21480 > > > > acpihpet_test_delay: expected 0.01000 actual 0.010013736 > > > > error 0.13736 > > > > > > > > i8254_test_delay: expected 0.01000 actual 0.40110 > > > > error 0.39110 > > > > i8254_test_delay: expected 0.1 actual 0.39471 > > > > error 0.29471 > > > > i8254_test_delay: expected 0.00010 actual 0.000128031 > > > > error 0.28031 > > > > i8254_test_delay: expected 0.00100 actual 0.001024586 > > > > error 0.24586 > > > > i8254_test_delay: expected 0.01000 actual 0.010021859 > > > > error 0.21859 > > > > > > Attched is an updated patch. I left the test measurement code in > > > place because I would like to see a test on a real i386 machine, just > > > to make sure it works as expected. I can't imagine why it wouldn't > > > work, but we should never assume anything. > > > > > > Changes from v1: > > > > > > - Actually set delay_func from acpitimerattach() and > > > acpihpet_attach(). > > > > > > I think it's safe to assume, on real hardware, that the ACPI PMT is > > > preferable to the i8254 and the HPET is preferable to both of them. > > > > > > This is not *always* true, but it is true on the older machines that > > > can't use tsc_delay(), so the assumption works in practice. > > > > > > Outside of those three timers, the hierarchy gets murky. There are > > > other timers that are better than the HPET, but they aren't always > > > available. If those timers are already providing delay_func this > > > code does not usurp them. > > > > As I understand it, you want lapic to be in one-shot mode for something > > along the lines of tickless. > > Yes. > > Although "tickless" is a misnomer. > > > So you are trying to find MP machines > > where TSC is not useable for delay? > > Right. Those are the only machines where it's relevant to consider > the accuracy of acpitimer_delay() or acpihpet_delay()... unless I've > forgotten something. > > > TSC is only considered for delay if the invariant and constant flags > > are set. > > invariant: > > "In the Core i7 and future processor generations, the TSC will continue > > to run in the deepest C-states. Therefore, the TSC will run at a > > constant rate in all ACPI P-, C-. and T-states. Support for this feature > > is indicated by CPUID.0x8000_0007.EDX[8]. On processors with invariant > > TSC support, the OS may use the TSC for wall clock timer services > > (instead of ACPI or HPET timers). TSC reads are much more efficient and > > do not incur the overhead associated with a ring transition or access to > > a platform resource." > > > > constant: > > runs at a constant rate across frequency/P
Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()
On Wed, Aug 17, 2022 at 02:28:14PM +1000, Jonathan Gray wrote: > On Tue, Aug 16, 2022 at 11:53:51AM -0500, Scott Cheloha wrote: > > On Sun, Aug 14, 2022 at 11:24:37PM -0500, Scott Cheloha wrote: > > > > > > In the future when the LAPIC timer is run in oneshot mode there will > > > be no lapic_delay(). > > > > > > [...] > > > > > > This is *very* bad for older amd64 machines, because you are left with > > > i8254_delay(). > > > > > > I would like to offer a less awful delay(9) implementation for this > > > class of hardware. Otherwise we may trip over bizarre phantom bugs on > > > MP kernels because only one CPU can read the i8254 at a time. > > > > > > [...] > > > > > > Real i386 hardware should be fine. Later models with an ACPI PM timer > > > will be fine using acpitimer_delay() instead of i8254_delay(). > > > > > > [...] > > > > > > Here are the sample measurements from my 2017 laptop (kaby lake > > > refresh) running the attached patch. It takes longer than a > > > microsecond to read either of the ACPI timers. The PM timer is better > > > than the HPET. The HPET is a bit better than the i8254. I hope the > > > numbers are a little better on older hardware. > > > > > > acpitimer_test_delay: expected 0.01000 actual 0.10638 error > > > 0.09638 > > > acpitimer_test_delay: expected 0.1 actual 0.15464 error > > > 0.05464 > > > acpitimer_test_delay: expected 0.00010 actual 0.000107619 error > > > 0.07619 > > > acpitimer_test_delay: expected 0.00100 actual 0.001007275 error > > > 0.07275 > > > acpitimer_test_delay: expected 0.01000 actual 0.010007891 error > > > 0.07891 > > > > > > acpihpet_test_delay: expected 0.01000 actual 0.22208 error > > > 0.21208 > > > acpihpet_test_delay: expected 0.1 actual 0.31690 error > > > 0.21690 > > > acpihpet_test_delay: expected 0.00010 actual 0.000112647 error > > > 0.12647 > > > acpihpet_test_delay: expected 0.00100 actual 0.001021480 error > > > 0.21480 > > > acpihpet_test_delay: expected 0.01000 actual 0.010013736 error > > > 0.13736 > > > > > > i8254_test_delay: expected 0.01000 actual 0.40110 error > > > 0.39110 > > > i8254_test_delay: expected 0.1 actual 0.39471 error > > > 0.29471 > > > i8254_test_delay: expected 0.00010 actual 0.000128031 error > > > 0.28031 > > > i8254_test_delay: expected 0.00100 actual 0.001024586 error > > > 0.24586 > > > i8254_test_delay: expected 0.01000 actual 0.010021859 error > > > 0.21859 > > > > Attched is an updated patch. I left the test measurement code in > > place because I would like to see a test on a real i386 machine, just > > to make sure it works as expected. I can't imagine why it wouldn't > > work, but we should never assume anything. > > > > Changes from v1: > > > > - Actually set delay_func from acpitimerattach() and > > acpihpet_attach(). > > > > I think it's safe to assume, on real hardware, that the ACPI PMT is > > preferable to the i8254 and the HPET is preferable to both of them. > > > > This is not *always* true, but it is true on the older machines that > > can't use tsc_delay(), so the assumption works in practice. > > > > Outside of those three timers, the hierarchy gets murky. There are > > other timers that are better than the HPET, but they aren't always > > available. If those timers are already providing delay_func this > > code does not usurp them. > > As I understand it, you want lapic to be in one-shot mode for something > along the lines of tickless. Yes. Although "tickless" is a misnomer. > So you are trying to find MP machines > where TSC is not useable for delay? Right. Those are the only machines where it's relevant to consider the accuracy of acpitimer_delay() or acpihpet_delay()... unless I've forgotten something. > TSC is only considered for delay if the invariant and constant flags > are set. > invariant: > "In the Core i7 and future processor generations, the TSC will continue > to run in the deepest C-states. Therefore, the TSC will run at a > constant rate in all ACPI P-, C-. and T-states. Support for this feature > is indicated by CPUID.0x8000_0007.EDX[8]. On processors with invariant > TSC support, the OS may use the TSC for wall clock timer services > (instead of ACPI or HPET timers). TSC reads are much more efficient and > do not incur the overhead associated with a ring transition or access to > a platform resource." > > constant: > runs at a constant rate across frequency/P state changes > > Intel constant > (family == 0x0f && model >= 0x03) || (family == 0x06 && model >= 0x0e) > > family 0x06 model 0x0e is yonah, core solo/duo > Intel CPUID doc has > "Intel Core Duo processor, Intel Core Solo processor, model 0Eh. > All processors are manufactured using
Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()
On Tue, Aug 16, 2022 at 11:53:51AM -0500, Scott Cheloha wrote: > On Sun, Aug 14, 2022 at 11:24:37PM -0500, Scott Cheloha wrote: > > > > In the future when the LAPIC timer is run in oneshot mode there will > > be no lapic_delay(). > > > > [...] > > > > This is *very* bad for older amd64 machines, because you are left with > > i8254_delay(). > > > > I would like to offer a less awful delay(9) implementation for this > > class of hardware. Otherwise we may trip over bizarre phantom bugs on > > MP kernels because only one CPU can read the i8254 at a time. > > > > [...] > > > > Real i386 hardware should be fine. Later models with an ACPI PM timer > > will be fine using acpitimer_delay() instead of i8254_delay(). > > > > [...] > > > > Here are the sample measurements from my 2017 laptop (kaby lake > > refresh) running the attached patch. It takes longer than a > > microsecond to read either of the ACPI timers. The PM timer is better > > than the HPET. The HPET is a bit better than the i8254. I hope the > > numbers are a little better on older hardware. > > > > acpitimer_test_delay: expected 0.01000 actual 0.10638 error > > 0.09638 > > acpitimer_test_delay: expected 0.1 actual 0.15464 error > > 0.05464 > > acpitimer_test_delay: expected 0.00010 actual 0.000107619 error > > 0.07619 > > acpitimer_test_delay: expected 0.00100 actual 0.001007275 error > > 0.07275 > > acpitimer_test_delay: expected 0.01000 actual 0.010007891 error > > 0.07891 > > > > acpihpet_test_delay: expected 0.01000 actual 0.22208 error > > 0.21208 > > acpihpet_test_delay: expected 0.1 actual 0.31690 error > > 0.21690 > > acpihpet_test_delay: expected 0.00010 actual 0.000112647 error > > 0.12647 > > acpihpet_test_delay: expected 0.00100 actual 0.001021480 error > > 0.21480 > > acpihpet_test_delay: expected 0.01000 actual 0.010013736 error > > 0.13736 > > > > i8254_test_delay: expected 0.01000 actual 0.40110 error > > 0.39110 > > i8254_test_delay: expected 0.1 actual 0.39471 error > > 0.29471 > > i8254_test_delay: expected 0.00010 actual 0.000128031 error > > 0.28031 > > i8254_test_delay: expected 0.00100 actual 0.001024586 error > > 0.24586 > > i8254_test_delay: expected 0.01000 actual 0.010021859 error > > 0.21859 > > Attched is an updated patch. I left the test measurement code in > place because I would like to see a test on a real i386 machine, just > to make sure it works as expected. I can't imagine why it wouldn't > work, but we should never assume anything. > > Changes from v1: > > - Actually set delay_func from acpitimerattach() and > acpihpet_attach(). > > I think it's safe to assume, on real hardware, that the ACPI PMT is > preferable to the i8254 and the HPET is preferable to both of them. > > This is not *always* true, but it is true on the older machines that > can't use tsc_delay(), so the assumption works in practice. > > Outside of those three timers, the hierarchy gets murky. There are > other timers that are better than the HPET, but they aren't always > available. If those timers are already providing delay_func this > code does not usurp them. As I understand it, you want lapic to be in one-shot mode for something along the lines of tickless. So you are trying to find MP machines where TSC is not useable for delay? TSC is only considered for delay if the invariant and constant flags are set. invariant: "In the Core i7 and future processor generations, the TSC will continue to run in the deepest C-states. Therefore, the TSC will run at a constant rate in all ACPI P-, C-. and T-states. Support for this feature is indicated by CPUID.0x8000_0007.EDX[8]. On processors with invariant TSC support, the OS may use the TSC for wall clock timer services (instead of ACPI or HPET timers). TSC reads are much more efficient and do not incur the overhead associated with a ring transition or access to a platform resource." constant: runs at a constant rate across frequency/P state changes Intel constant (family == 0x0f && model >= 0x03) || (family == 0x06 && model >= 0x0e) family 0x06 model 0x0e is yonah, core solo/duo Intel CPUID doc has "Intel Core Duo processor, Intel Core Solo processor, model 0Eh. All processors are manufactured using the 65 nm process." family 0x0f model 0x03 "Pentium 4 processor, Intel Xeon processor, Intel Celeron D processor. All processors are model 03h and manufactured using the 90 nm process." VIA constant model >= 0x0f model 0x0f is Nano AMD constant CPUIDEDX_ITSC set invariant CPUIDEDX_ITSC set > > - Duplicate test measurement code from amd64/lapic.c into i386/lapic.c. > Will be removed in the committed version. > > - Use bus_space_read_8() in acpihpet.c if it's
Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()
On Tue, Aug 16, 2022 at 11:53:51AM -0500, Scott Cheloha wrote: > On Sun, Aug 14, 2022 at 11:24:37PM -0500, Scott Cheloha wrote: > > > > In the future when the LAPIC timer is run in oneshot mode there will > > be no lapic_delay(). > > > > [...] > > > > This is *very* bad for older amd64 machines, because you are left with > > i8254_delay(). > > > > I would like to offer a less awful delay(9) implementation for this > > class of hardware. Otherwise we may trip over bizarre phantom bugs on > > MP kernels because only one CPU can read the i8254 at a time. > > > > [...] > > > > Real i386 hardware should be fine. Later models with an ACPI PM timer > > will be fine using acpitimer_delay() instead of i8254_delay(). > > > > [...] > > Attched is an updated patch. I left the test measurement code in > place because I would like to see a test on a real i386 machine, just > to make sure it works as expected. I can't imagine why it wouldn't > work, but we should never assume anything. > > [...] > > One remaining question I have: > > Is there a nice way to test whether ACPI PMT support is compiled into > the kernel? We can assume the existence of i8254_delay() because > clock.c is required on amd64 and i386. However, acpitimer.c is a > optional, so acpitimer_delay() isn't necessarily there. > > I would rather not introduce a hard requirement on acpitimer.c into > acpihpet.c if there's an easy way to check for the latter. > > Any ideas? And here's the cleaned up patch. Just in case nobody tests i386. Pretty straightforward. acpitimer is preferable to i8254, hpet is preferable to acpitimer and i8254. The only obvious problem I see is the hard dependency this creates in acpihpet.c on acpitimer.c. Index: acpitimer.c === RCS file: /cvs/src/sys/dev/acpi/acpitimer.c,v retrieving revision 1.15 diff -u -p -r1.15 acpitimer.c --- acpitimer.c 6 Apr 2022 18:59:27 - 1.15 +++ acpitimer.c 17 Aug 2022 02:56:10 - @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -25,10 +26,13 @@ #include #include +struct acpitimer_softc; + int acpitimermatch(struct device *, void *, void *); void acpitimerattach(struct device *, struct device *, void *); - +void acpitimer_delay(int); u_int acpi_get_timecount(struct timecounter *tc); +uint32_t acpitimer_read(struct acpitimer_softc *); static struct timecounter acpi_timecounter = { .tc_get_timecount = acpi_get_timecount, @@ -98,18 +102,45 @@ acpitimerattach(struct device *parent, s acpi_timecounter.tc_priv = sc; acpi_timecounter.tc_name = sc->sc_dev.dv_xname; tc_init(_timecounter); + +#if defined(__amd64__) || defined(__i386__) + if (delay_func == i8254_delay) + delay_func = acpitimer_delay; +#endif #if defined(__amd64__) extern void cpu_recalibrate_tsc(struct timecounter *); cpu_recalibrate_tsc(_timecounter); #endif } +void +acpitimer_delay(int usecs) +{ + uint64_t count = 0, cycles; + struct acpitimer_softc *sc = acpi_timecounter.tc_priv; + uint32_t mask = acpi_timecounter.tc_counter_mask; + uint32_t val1, val2; + + val2 = acpitimer_read(sc); + cycles = usecs * acpi_timecounter.tc_frequency / 100; + while (count < cycles) { + CPU_BUSY_CYCLE(); + val1 = val2; + val2 = acpitimer_read(sc); + count += (val2 - val1) & mask; + } +} u_int acpi_get_timecount(struct timecounter *tc) { - struct acpitimer_softc *sc = tc->tc_priv; - u_int u1, u2, u3; + return acpitimer_read(tc->tc_priv); +} + +uint32_t +acpitimer_read(struct acpitimer_softc *sc) +{ + uint32_t u1, u2, u3; u2 = bus_space_read_4(sc->sc_iot, sc->sc_ioh, 0); u3 = bus_space_read_4(sc->sc_iot, sc->sc_ioh, 0); Index: acpihpet.c === RCS file: /cvs/src/sys/dev/acpi/acpihpet.c,v retrieving revision 1.26 diff -u -p -r1.26 acpihpet.c --- acpihpet.c 6 Apr 2022 18:59:27 - 1.26 +++ acpihpet.c 17 Aug 2022 02:56:10 - @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -31,7 +32,7 @@ int acpihpet_attached; int acpihpet_match(struct device *, void *, void *); void acpihpet_attach(struct device *, struct device *, void *); int acpihpet_activate(struct device *, int); - +void acpihpet_delay(int); u_int acpihpet_gettime(struct timecounter *tc); uint64_t acpihpet_r(bus_space_tag_t _iot, bus_space_handle_t _ioh, @@ -262,15 +263,37 @@ acpihpet_attach(struct device *parent, s freq = 1000ull / period; printf(": %lld Hz\n", freq); - hpet_timecounter.tc_frequency = (uint32_t)freq; + hpet_timecounter.tc_frequency = freq; hpet_timecounter.tc_priv = sc; hpet_timecounter.tc_name = sc->sc_dev.dv_xname;
Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()
On Sun, Aug 14, 2022 at 11:24:37PM -0500, Scott Cheloha wrote: > > In the future when the LAPIC timer is run in oneshot mode there will > be no lapic_delay(). > > [...] > > This is *very* bad for older amd64 machines, because you are left with > i8254_delay(). > > I would like to offer a less awful delay(9) implementation for this > class of hardware. Otherwise we may trip over bizarre phantom bugs on > MP kernels because only one CPU can read the i8254 at a time. > > [...] > > Real i386 hardware should be fine. Later models with an ACPI PM timer > will be fine using acpitimer_delay() instead of i8254_delay(). > > [...] > > Here are the sample measurements from my 2017 laptop (kaby lake > refresh) running the attached patch. It takes longer than a > microsecond to read either of the ACPI timers. The PM timer is better > than the HPET. The HPET is a bit better than the i8254. I hope the > numbers are a little better on older hardware. > > acpitimer_test_delay: expected 0.01000 actual 0.10638 error > 0.09638 > acpitimer_test_delay: expected 0.1 actual 0.15464 error > 0.05464 > acpitimer_test_delay: expected 0.00010 actual 0.000107619 error > 0.07619 > acpitimer_test_delay: expected 0.00100 actual 0.001007275 error > 0.07275 > acpitimer_test_delay: expected 0.01000 actual 0.010007891 error > 0.07891 > > acpihpet_test_delay: expected 0.01000 actual 0.22208 error > 0.21208 > acpihpet_test_delay: expected 0.1 actual 0.31690 error > 0.21690 > acpihpet_test_delay: expected 0.00010 actual 0.000112647 error > 0.12647 > acpihpet_test_delay: expected 0.00100 actual 0.001021480 error > 0.21480 > acpihpet_test_delay: expected 0.01000 actual 0.010013736 error > 0.13736 > > i8254_test_delay: expected 0.01000 actual 0.40110 error > 0.39110 > i8254_test_delay: expected 0.1 actual 0.39471 error > 0.29471 > i8254_test_delay: expected 0.00010 actual 0.000128031 error > 0.28031 > i8254_test_delay: expected 0.00100 actual 0.001024586 error > 0.24586 > i8254_test_delay: expected 0.01000 actual 0.010021859 error > 0.21859 Attched is an updated patch. I left the test measurement code in place because I would like to see a test on a real i386 machine, just to make sure it works as expected. I can't imagine why it wouldn't work, but we should never assume anything. Changes from v1: - Actually set delay_func from acpitimerattach() and acpihpet_attach(). I think it's safe to assume, on real hardware, that the ACPI PMT is preferable to the i8254 and the HPET is preferable to both of them. This is not *always* true, but it is true on the older machines that can't use tsc_delay(), so the assumption works in practice. Outside of those three timers, the hierarchy gets murky. There are other timers that are better than the HPET, but they aren't always available. If those timers are already providing delay_func this code does not usurp them. - Duplicate test measurement code from amd64/lapic.c into i386/lapic.c. Will be removed in the committed version. - Use bus_space_read_8() in acpihpet.c if it's available. The HPET is a 64-bit counter and the spec permits 32-bit or 64-bit aligned access. As one might predict, this cuts the overhead in half because we're doing half as many reads. This part can go into a separate commit, but I thought it was neat so I'm including it here. One remaining question I have: Is there a nice way to test whether ACPI PMT support is compiled into the kernel? We can assume the existence of i8254_delay() because clock.c is required on amd64 and i386. However, acpitimer.c is a optional, so acpitimer_delay() isn't necessarily there. I would rather not introduce a hard requirement on acpitimer.c into acpihpet.c if there's an easy way to check for the latter. Any ideas? Anyone have i386 hardware results? If I'm reading the timeline right, most P6 machines and beyond (NetBurst, etc) will have an ACPI PMT. I don't know if any real x86 motherboards shipped with an HPET, but it's possible. Here are my updated results with the bus_space_read_8 change: acpitimer_test_delay: expected 0.01000 actual 0.10607 error 0.09607 acpitimer_test_delay: expected 0.1 actual 0.15491 error 0.05491 acpitimer_test_delay: expected 0.00010 actual 0.000107734 error 0.07734 acpitimer_test_delay: expected 0.00100 actual 0.001008006 error 0.08006 acpitimer_test_delay: expected 0.01000 actual 0.010007042 error 0.07042 acpihpet_test_delay: expected 0.01000 actual 0.13282 error 0.12282 acpihpet_test_delay: expected 0.1 actual 0.22743 error 0.12743 acpihpet_test_delay:
Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()
On Sun, Aug 14, 2022 at 11:24:37PM -0500, Scott Cheloha wrote: > Hi, > > In the future when the LAPIC timer is run in oneshot mode there will > be no lapic_delay(). > > This is fine if you have a constant TSC, because we have tsc_delay(). > > This is *very* bad for older amd64 machines, because you are left with > i8254_delay(). > > I would like to offer a less awful delay(9) implementation for this > class of hardware. Otherwise we may trip over bizarre phantom bugs on > MP kernels because only one CPU can read the i8254 at a time. > > I think patrick@ was struggling with some version of that problem last > year, but in a VM. > > Real i386 hardware should be fine. Later models with an ACPI PM timer > will be fine using acpitimer_delay() instead of i8254_delay(). > > If this seems reasonable to people I will come back with a cleaned up > patch for testing. > > Thoughts? Preferences? > Seems reasonable to me. Would be interested to see the revised diff once you're done. -ml > -Scott > > Here are the sample measurements from my 2017 laptop (kaby lake > refresh) running the attached patch. It takes longer than a > microsecond to read either of the ACPI timers. The PM timer is better > than the HPET. The HPET is a bit better than the i8254. I hope the > numbers are a little better on older hardware. > > acpitimer_test_delay: expected 0.01000 actual 0.10638 error > 0.09638 > acpitimer_test_delay: expected 0.1 actual 0.15464 error > 0.05464 > acpitimer_test_delay: expected 0.00010 actual 0.000107619 error > 0.07619 > acpitimer_test_delay: expected 0.00100 actual 0.001007275 error > 0.07275 > acpitimer_test_delay: expected 0.01000 actual 0.010007891 error > 0.07891 > > acpihpet_test_delay: expected 0.01000 actual 0.22208 error > 0.21208 > acpihpet_test_delay: expected 0.1 actual 0.31690 error > 0.21690 > acpihpet_test_delay: expected 0.00010 actual 0.000112647 error > 0.12647 > acpihpet_test_delay: expected 0.00100 actual 0.001021480 error > 0.21480 > acpihpet_test_delay: expected 0.01000 actual 0.010013736 error > 0.13736 > > i8254_test_delay: expected 0.01000 actual 0.40110 error > 0.39110 > i8254_test_delay: expected 0.1 actual 0.39471 error > 0.29471 > i8254_test_delay: expected 0.00010 actual 0.000128031 error > 0.28031 > i8254_test_delay: expected 0.00100 actual 0.001024586 error > 0.24586 > i8254_test_delay: expected 0.01000 actual 0.010021859 error > 0.21859 > > Index: dev/acpi/acpihpet.c > === > RCS file: /cvs/src/sys/dev/acpi/acpihpet.c,v > retrieving revision 1.26 > diff -u -p -r1.26 acpihpet.c > --- dev/acpi/acpihpet.c 6 Apr 2022 18:59:27 - 1.26 > +++ dev/acpi/acpihpet.c 15 Aug 2022 04:21:58 - > @@ -18,6 +18,7 @@ > #include > #include > #include > +#include > #include > > #include > @@ -31,8 +32,9 @@ int acpihpet_attached; > int acpihpet_match(struct device *, void *, void *); > void acpihpet_attach(struct device *, struct device *, void *); > int acpihpet_activate(struct device *, int); > - > +void acpiphet_delay(u_int); > u_int acpihpet_gettime(struct timecounter *tc); > +void acpihpet_test_delay(u_int); > > uint64_t acpihpet_r(bus_space_tag_t _iot, bus_space_handle_t _ioh, > bus_size_t _ioa); > @@ -262,7 +264,7 @@ acpihpet_attach(struct device *parent, s > freq = 1000ull / period; > printf(": %lld Hz\n", freq); > > - hpet_timecounter.tc_frequency = (uint32_t)freq; > + hpet_timecounter.tc_frequency = freq; > hpet_timecounter.tc_priv = sc; > hpet_timecounter.tc_name = sc->sc_dev.dv_xname; > tc_init(_timecounter); > @@ -273,10 +275,43 @@ acpihpet_attach(struct device *parent, s > acpihpet_attached++; > } > > +void > +acpihpet_delay(u_int usecs) > +{ > + uint64_t d, s; > + struct acpihpet_softc *sc = hpet_timecounter.tc_priv; > + > + d = usecs * hpet_timecounter.tc_frequency / 100; > + s = acpihpet_r(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER); > + while (acpihpet_r(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER) - s < d) > + CPU_BUSY_CYCLE(); > +} > + > u_int > acpihpet_gettime(struct timecounter *tc) > { > struct acpihpet_softc *sc = tc->tc_priv; > > return (bus_space_read_4(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER)); > +} > + > +void > +acpihpet_test_delay(u_int usecs) > +{ > + struct timespec ac, er, ex, t0, t1; > + > + if (!acpihpet_attached) { > + printf("%s: (no hpet attached)\n", __func__); > + return; > + } > + > + nanouptime(); > + acpihpet_delay(usecs); > + nanouptime(); > + timespecsub(, , ); > +
Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()
On Sun, Aug 14, 2022 at 11:24:37PM -0500, Scott Cheloha wrote: > Hi, > > In the future when the LAPIC timer is run in oneshot mode there will > be no lapic_delay(). > > This is fine if you have a constant TSC, because we have tsc_delay(). > > This is *very* bad for older amd64 machines, because you are left with > i8254_delay(). > > I would like to offer a less awful delay(9) implementation for this > class of hardware. Otherwise we may trip over bizarre phantom bugs on > MP kernels because only one CPU can read the i8254 at a time. i8254 is often gated on Intel since around Comet Lake which is why we started replacing delay_func with tsc_delay earlier when delay was still 8254 in early boot the cpu_freq_ctr() which gets the cpu speed with performance counter msrs and delay gave a wrong result as delay didn't work as expected sys/arch/amd64/amd64/identcpu.c revision 1.119 date: 2021/08/31 15:11:54; author: kettenis; state: Exp; lines: +4 -1; commitid: fyhbiR533m0v3XO9; Use the TSC delay(9) backend earlier on machines where we can. Also use the TSC for delays even if there is a skew between the TSCs of the cores as this doesn't matter for delay(9). Gets rid of te unreasonable clock speed reports on Intel Tiget Lake CPUs where the i8254 behaves in weird ways. ok patrick@, deraadt@, mlarkin@ sys/arch/amd64/amd64/lapic.c revision 1.55 date: 2019/08/03 14:57:51; author: jcs; state: Exp; lines: +8 -4; commitid: NTF41FCHEhE2NllV; If the CPU frequency is available during TSC init, make it available for lapic timer init to avoid calibrating against the 8254 clock. Some newer machines are shipping with the 8254 clock gated for power saving, so it may not be usable. ok mlarkin discussed with deraadt On some Intel hardware getting the frequency out of cpuid for the tsc isn't possible. tsc.c 1.22 describes some of the mess > > I think patrick@ was struggling with some version of that problem last > year, but in a VM. patrick moved pvbus probing so hv_delay would be used earlier sys/arch/amd64/amd64/mainbus.c revision 1.51 date: 2021/08/31 15:52:59; author: patrick; state: Exp; lines: +1 -7; commitid: reA7Lsk1mKRhapKD; Identify the paravirtual bus earlier, as we need to make sure that we have a working delay func ready before the first occurence of delay(). This is necessary on Hyper-V Gen 2 VMs where we don't use the TSC. Discussed with the hackroom ok kettenis@ > > Real i386 hardware should be fine. Later models with an ACPI PM timer > will be fine using acpitimer_delay() instead of i8254_delay(). > > If this seems reasonable to people I will come back with a cleaned up > patch for testing. > > Thoughts? Preferences? > > -Scott > > Here are the sample measurements from my 2017 laptop (kaby lake > refresh) running the attached patch. It takes longer than a > microsecond to read either of the ACPI timers. The PM timer is better > than the HPET. The HPET is a bit better than the i8254. I hope the > numbers are a little better on older hardware. > > acpitimer_test_delay: expected 0.01000 actual 0.10638 error > 0.09638 > acpitimer_test_delay: expected 0.1 actual 0.15464 error > 0.05464 > acpitimer_test_delay: expected 0.00010 actual 0.000107619 error > 0.07619 > acpitimer_test_delay: expected 0.00100 actual 0.001007275 error > 0.07275 > acpitimer_test_delay: expected 0.01000 actual 0.010007891 error > 0.07891 > > acpihpet_test_delay: expected 0.01000 actual 0.22208 error > 0.21208 > acpihpet_test_delay: expected 0.1 actual 0.31690 error > 0.21690 > acpihpet_test_delay: expected 0.00010 actual 0.000112647 error > 0.12647 > acpihpet_test_delay: expected 0.00100 actual 0.001021480 error > 0.21480 > acpihpet_test_delay: expected 0.01000 actual 0.010013736 error > 0.13736 > > i8254_test_delay: expected 0.01000 actual 0.40110 error > 0.39110 > i8254_test_delay: expected 0.1 actual 0.39471 error > 0.29471 > i8254_test_delay: expected 0.00010 actual 0.000128031 error > 0.28031 > i8254_test_delay: expected 0.00100 actual 0.001024586 error > 0.24586 > i8254_test_delay: expected 0.01000 actual 0.010021859 error > 0.21859 cpu0: Intel(R) Core(TM)2 Duo CPU T7250 @ 2.00GHz, 2194.88 MHz, 06-0f-0d cpu0: apic clock running at 199MHz acpitimer0 at acpi0: 3579545 Hz, 24 bits acpihpet0 at acpi0: 14318179 Hz acpitimer_test_delay: expected 0.01000 actual 0.10267 error 0.09267 acpitimer_test_delay: expected 0.1 actual 0.18648 error 0.08648 acpitimer_test_delay: expected 0.00010 actual 0.000108254 error 0.08254 acpitimer_test_delay: expected 0.00100 actual