Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-09-12 Thread Scott Cheloha
On Thu, Sep 08, 2022 at 08:24:11AM -0500, Scott Cheloha wrote:
> On Thu, Sep 08, 2022 at 05:52:43AM +0300, Pavel Korovin wrote:
> > On 09/07, Scott Cheloha wrote:
> > > Just to make sure that my changes to acpihpet(4) actually caused
> > > the problem, I have a few more questions:
> > > 
> > > 1. When did you change the OS type?
> > 
> > 03 August, after that, there was a local snpashot built from sources
> > fetched on 17 Aug 2022 22:12:57 +0300 which wasn't affected.
> > 
> > > 2. What was the compilation date of the kernel where you first saw the
> > >problem?
> > 
> > Locally built snapshot from sources fetched on Wed, 31 Aug 2022 02:05:34
> > +0300.
> >  
> > > 3. If you boot an unpatched kernel does the problem manifest in
> > >exactly the same way?
> >  
> > As I said, I mistakenly changed the OS type not to "FreeBSD Pre-11
> > versions (32-bit)", but to "FreeBSD 11 (32-bit)".
> > The problem affects only VMs which have Guest OS Version set to "FreeBSD
> > Pre-11 versions (32-bit)" on snapshots built after 31 Aug 2022.
> > 
> > Sample outputs from the machine running older snapshot which is not
> > affected:
> > 
> > $ sysctl kern.version | head -n1
> > kern.version=OpenBSD 7.2-beta (GENERIC.MP) #1: Thu Aug 18 15:15:13 MSK
> > 2022
> > 
> > $ sysctl | grep tsc
> > kern.timecounter.hardware=tsc
> > kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000)
> > acpitimer0(1000)
> > machdep.tscfreq=1500017850
> > machdep.invarianttsc=1
> 
> Please send a full bug report to b...@openbsd.org.
> 
> Include full a dmesg for the affected machine.
> 
> Something else with VMWare is involved here and I'm not enough of an
> expert to say what.  More eyes will be helpful.
> 
> Your problem is *probably* caused by my changes, but there are too
> many moving parts with all the different VMWare configurations for me
> to narrow down what the issue is definitively.

I have committed the acpihpet(4) fix.

Please send a full bug report to b...@openbsd.org so we can continue
poking at this.  I have some test code I'd like you to try out so we
can get a better look at the TSC calibration process on your machine.



Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-09-08 Thread Scott Cheloha
On Thu, Sep 08, 2022 at 05:52:43AM +0300, Pavel Korovin wrote:
> On 09/07, Scott Cheloha wrote:
> > Just to make sure that my changes to acpihpet(4) actually caused
> > the problem, I have a few more questions:
> > 
> > 1. When did you change the OS type?
> 
> 03 August, after that, there was a local snpashot built from sources
> fetched on 17 Aug 2022 22:12:57 +0300 which wasn't affected.
> 
> > 2. What was the compilation date of the kernel where you first saw the
> >problem?
> 
> Locally built snapshot from sources fetched on Wed, 31 Aug 2022 02:05:34
> +0300.
>  
> > 3. If you boot an unpatched kernel does the problem manifest in
> >exactly the same way?
>  
> As I said, I mistakenly changed the OS type not to "FreeBSD Pre-11
> versions (32-bit)", but to "FreeBSD 11 (32-bit)".
> The problem affects only VMs which have Guest OS Version set to "FreeBSD
> Pre-11 versions (32-bit)" on snapshots built after 31 Aug 2022.
> 
> Sample outputs from the machine running older snapshot which is not
> affected:
> 
> $ sysctl kern.version | head -n1
> kern.version=OpenBSD 7.2-beta (GENERIC.MP) #1: Thu Aug 18 15:15:13 MSK
> 2022
> 
> $ sysctl | grep tsc
> kern.timecounter.hardware=tsc
> kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000)
> acpitimer0(1000)
> machdep.tscfreq=1500017850
> machdep.invarianttsc=1

Please send a full bug report to b...@openbsd.org.

Include full a dmesg for the affected machine.

Something else with VMWare is involved here and I'm not enough of an
expert to say what.  More eyes will be helpful.

Your problem is *probably* caused by my changes, but there are too
many moving parts with all the different VMWare configurations for me
to narrow down what the issue is definitively.



Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-09-07 Thread Pavel Korovin
On 09/07, Scott Cheloha wrote:
> Just to make sure that my changes to acpihpet(4) actually caused
> the problem, I have a few more questions:
> 
> 1. When did you change the OS type?

03 August, after that, there was a local snpashot built from sources
fetched on 17 Aug 2022 22:12:57 +0300 which wasn't affected.

> 2. What was the compilation date of the kernel where you first saw the
>problem?

Locally built snapshot from sources fetched on Wed, 31 Aug 2022 02:05:34
+0300.
 
> 3. If you boot an unpatched kernel does the problem manifest in
>exactly the same way?
 
As I said, I mistakenly changed the OS type not to "FreeBSD Pre-11
versions (32-bit)", but to "FreeBSD 11 (32-bit)".
The problem affects only VMs which have Guest OS Version set to "FreeBSD
Pre-11 versions (32-bit)" on snapshots built after 31 Aug 2022.

Sample outputs from the machine running older snapshot which is not
affected:

$ sysctl kern.version | head -n1
kern.version=OpenBSD 7.2-beta (GENERIC.MP) #1: Thu Aug 18 15:15:13 MSK
2022

$ sysctl | grep tsc
kern.timecounter.hardware=tsc
kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000)
acpitimer0(1000)
machdep.tscfreq=1500017850
machdep.invarianttsc=1

-- 
With best regards,
Pavel Korovin



Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-09-07 Thread Pavel Korovin
Oh no, sorry, that was too early.
Double-checked once again, no, this problem still exists if Guest OS
Version is set to "FreeBSD Pre-11 versions (32-bit)".
If I set Guest OS Version to "FreeBSD 11 (32-bit)", the problem
disappears.

On 09/08, Pavel Korovin wrote:
> Hi Scott,
> 
> Thank you for the fix!
> 
> I found what triggered this behaviour: it was the change in Guest OS
> Version in VM Options.
> 
> I deploy VMs with sysutils/packer, some time ago I noticed that VM type in
> my templates is specified as freebsd11_64Guest, which isn't consistent with
> vmt(4), as it presents itself as "FreeBSD Pre-11 versions (32-bit)".
> 
> After changing OS type to "FreeBSD Pre-11 versions (32-bit)", I've got this
> problem with tsc.
> 
> The provided diff fixes it:
> 
> $ sysctl -a | grep tsc
> kern.timecounter.hardware=tsc
> kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000)
> acpitimer0(1000)
> machdep.tscfreq=158129
> machdep.invarianttsc=1

-- 
With best regards,
Pavel Korovin



Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-09-07 Thread Scott Cheloha
On Thu, Sep 08, 2022 at 05:01:33AM +0300, Pavel Korovin wrote:
> Hi Scott,
> 
> Thank you for the fix!
> 
> I found what triggered this behaviour: it was the change in Guest OS
> Version in VM Options.
> 
> I deploy VMs with sysutils/packer, some time ago I noticed that VM type in
> my templates is specified as freebsd11_64Guest, which isn't consistent with
> vmt(4), as it presents itself as "FreeBSD Pre-11 versions (32-bit)".
> 
> After changing OS type to "FreeBSD Pre-11 versions (32-bit)", I've got this
> problem with tsc.
> 
> 
> The provided diff fixes it:
> 
> $ sysctl -a | grep tsc
> kern.timecounter.hardware=tsc
> kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000)
> acpitimer0(1000)
> machdep.tscfreq=158129
> machdep.invarianttsc=1

I'm glad you are not seeing the issue anymore.

Just to make sure that my changes to acpihpet(4) actually caused
the problem, I have a few more questions:

1. When did you change the OS type?

2. What was the compilation date of the kernel where you first saw the
   problem?

3. If you boot an unpatched kernel does the problem manifest in
   exactly the same way?

-Scott



Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-09-07 Thread Pavel Korovin
Hi Scott,

Thank you for the fix!

I found what triggered this behaviour: it was the change in Guest OS
Version in VM Options.

I deploy VMs with sysutils/packer, some time ago I noticed that VM type in
my templates is specified as freebsd11_64Guest, which isn't consistent with
vmt(4), as it presents itself as "FreeBSD Pre-11 versions (32-bit)".

After changing OS type to "FreeBSD Pre-11 versions (32-bit)", I've got this
problem with tsc.

The provided diff fixes it:

$ sysctl -a | grep tsc
kern.timecounter.hardware=tsc
kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000)
acpitimer0(1000)
machdep.tscfreq=158129
machdep.invarianttsc=1


On 09/06, Scott Cheloha wrote:
> On Sat, Sep 03, 2022 at 01:50:28PM +0300, Pavel Korovin wrote:
> > After these changes, OpenBSD VMware guest's clock is galloping into the
> > future like this:
> > Aug 31 02:42:18 build ntpd[55904]: adjusting local clock by -27.085360s
> > Aug 31 02:44:26 build ntpd[55904]: adjusting local clock by -116.270573s
> > Aug 31 02:47:40 build ntpd[55904]: adjusting local clock by -281.085430s
> > Aug 31 02:52:01 build ntpd[55904]: adjusting local clock by -320.064639s
> > Aug 31 02:53:09 build ntpd[55904]: adjusting local clock by -385.095886s
> > Aug 31 02:54:47 build ntpd[55904]: adjusting local clock by -532.542486s
> > Aug 31 02:58:33 build ntpd[55904]: adjusting local clock by -572.363323s
> > Aug 31 02:59:38 build ntpd[55904]: adjusting local clock by -655.253598s
> > Aug 31 03:01:54 build ntpd[55904]: adjusting local clock by -823.653978s
> > Aug 31 03:06:14 build ntpd[55904]: adjusting local clock by -926.705093s
> > Aug 31 03:09:00 build ntpd[55904]: adjusting local clock by -1071.837887s
> > 
> > VM time right after boot:
> > rdate -pn $ntp; date
> > Sat Sep  3 13:39:43 MSK 2022
> > Sat Sep  3 13:43:24 MSK 2022
> > 
> > $ sysctl -a | grep tsc
> > kern.timecounter.hardware=tsc
> > kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000)
> > acpitimer0(1000)
> > machdep.tscfreq=580245275
> 
> This frequency looks wrong.
> 
> My first guess is that you are hitting a split-read problem in
> acpihpet_delay() when recalibrating the TSC.
> 
> Does this patch fix it?
> 
> If you can't build a kernel for testing I can just commit this and you
> can try the snapshot in a day or two.
> 
> Index: acpihpet.c
> ===
> RCS file: /cvs/src/sys/dev/acpi/acpihpet.c,v
> retrieving revision 1.28
> diff -u -p -r1.28 acpihpet.c
> --- acpihpet.c25 Aug 2022 18:01:54 -  1.28
> +++ acpihpet.c6 Sep 2022 16:12:23 -
> @@ -281,13 +281,19 @@ acpihpet_attach(struct device *parent, s
>  void
>  acpihpet_delay(int usecs)
>  {
> - uint64_t c, s;
> + uint64_t count = 0, cycles;
>   struct acpihpet_softc *sc = hpet_timecounter.tc_priv;
> + uint32_t val1, val2;
>  
> - s = acpihpet_r(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER);
> - c = usecs * hpet_timecounter.tc_frequency / 100;
> - while (acpihpet_r(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER) - s < c)
> + val2 = bus_space_read_4(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER);
> + cycles = usecs * hpet_timecounter.tc_frequency / 100;
> + while (count < cycles) {
>   CPU_BUSY_CYCLE();
> + val1 = val2;
> + val2 = bus_space_read_4(sc->sc_iot, sc->sc_ioh,
> + HPET_MAIN_COUNTER);
> + count += val2 - val1;
> + }
>  }
>  
>  u_int

-- 
With best regards,
Pavel Korovin



Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-09-06 Thread Scott Cheloha
On Sat, Sep 03, 2022 at 01:50:28PM +0300, Pavel Korovin wrote:
> After these changes, OpenBSD VMware guest's clock is galloping into the
> future like this:
> Aug 31 02:42:18 build ntpd[55904]: adjusting local clock by -27.085360s
> Aug 31 02:44:26 build ntpd[55904]: adjusting local clock by -116.270573s
> Aug 31 02:47:40 build ntpd[55904]: adjusting local clock by -281.085430s
> Aug 31 02:52:01 build ntpd[55904]: adjusting local clock by -320.064639s
> Aug 31 02:53:09 build ntpd[55904]: adjusting local clock by -385.095886s
> Aug 31 02:54:47 build ntpd[55904]: adjusting local clock by -532.542486s
> Aug 31 02:58:33 build ntpd[55904]: adjusting local clock by -572.363323s
> Aug 31 02:59:38 build ntpd[55904]: adjusting local clock by -655.253598s
> Aug 31 03:01:54 build ntpd[55904]: adjusting local clock by -823.653978s
> Aug 31 03:06:14 build ntpd[55904]: adjusting local clock by -926.705093s
> Aug 31 03:09:00 build ntpd[55904]: adjusting local clock by -1071.837887s
> 
> VM time right after boot:
> rdate -pn $ntp; date
> Sat Sep  3 13:39:43 MSK 2022
> Sat Sep  3 13:43:24 MSK 2022
> 
> $ sysctl -a | grep tsc
> kern.timecounter.hardware=tsc
> kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000)
> acpitimer0(1000)
> machdep.tscfreq=580245275

This frequency looks wrong.

My first guess is that you are hitting a split-read problem in
acpihpet_delay() when recalibrating the TSC.

Does this patch fix it?

If you can't build a kernel for testing I can just commit this and you
can try the snapshot in a day or two.

Index: acpihpet.c
===
RCS file: /cvs/src/sys/dev/acpi/acpihpet.c,v
retrieving revision 1.28
diff -u -p -r1.28 acpihpet.c
--- acpihpet.c  25 Aug 2022 18:01:54 -  1.28
+++ acpihpet.c  6 Sep 2022 16:12:23 -
@@ -281,13 +281,19 @@ acpihpet_attach(struct device *parent, s
 void
 acpihpet_delay(int usecs)
 {
-   uint64_t c, s;
+   uint64_t count = 0, cycles;
struct acpihpet_softc *sc = hpet_timecounter.tc_priv;
+   uint32_t val1, val2;
 
-   s = acpihpet_r(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER);
-   c = usecs * hpet_timecounter.tc_frequency / 100;
-   while (acpihpet_r(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER) - s < c)
+   val2 = bus_space_read_4(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER);
+   cycles = usecs * hpet_timecounter.tc_frequency / 100;
+   while (count < cycles) {
CPU_BUSY_CYCLE();
+   val1 = val2;
+   val2 = bus_space_read_4(sc->sc_iot, sc->sc_ioh,
+   HPET_MAIN_COUNTER);
+   count += val2 - val1;
+   }
 }
 
 u_int



Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-09-03 Thread Pavel Korovin
After these changes, OpenBSD VMware guest's clock is galloping into the
future like this:
Aug 31 02:42:18 build ntpd[55904]: adjusting local clock by -27.085360s
Aug 31 02:44:26 build ntpd[55904]: adjusting local clock by -116.270573s
Aug 31 02:47:40 build ntpd[55904]: adjusting local clock by -281.085430s
Aug 31 02:52:01 build ntpd[55904]: adjusting local clock by -320.064639s
Aug 31 02:53:09 build ntpd[55904]: adjusting local clock by -385.095886s
Aug 31 02:54:47 build ntpd[55904]: adjusting local clock by -532.542486s
Aug 31 02:58:33 build ntpd[55904]: adjusting local clock by -572.363323s
Aug 31 02:59:38 build ntpd[55904]: adjusting local clock by -655.253598s
Aug 31 03:01:54 build ntpd[55904]: adjusting local clock by -823.653978s
Aug 31 03:06:14 build ntpd[55904]: adjusting local clock by -926.705093s
Aug 31 03:09:00 build ntpd[55904]: adjusting local clock by -1071.837887s

VM time right after boot:
rdate -pn $ntp; date
Sat Sep  3 13:39:43 MSK 2022
Sat Sep  3 13:43:24 MSK 2022

$ sysctl -a | grep tsc
kern.timecounter.hardware=tsc
kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000)
acpitimer0(1000)
machdep.tscfreq=580245275
machdep.invarianttsc=1

And in a couple of minutes after boot the timeskew getting like this:
$ rdate -pn $ntp; date
Sat Sep  3 13:41:49 MSK 2022
Sat Sep  3 13:48:46 MSK 2022

After several hours the VM lives in tomorrow.

As a workaround, changing TSC source helps:
# sysctl kern.timecounter.hardware=acpihpet0

-- 
With best regards,
Pavel Korovin



Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-29 Thread Scott Cheloha
> On Aug 29, 2022, at 22:54, Jonathan Gray  wrote:
> 
> On Mon, Aug 29, 2022 at 12:02:42PM -0500, Scott Cheloha wrote:
>> If hv_delay() never causes a vm exit, but tsc_delay() *might* cause a
>> vm exit, and both have microsecond or better resolution, then
>> hv_delay() is the preferable delay(9) implementation where it is
>> available because vm exits have ambiguous overhead.
> 
> with hv_delay() currently doing rdmsr I wouldn't say never
> 
>> 
>> If that seems sensible to you, I'll commit this switch.
> 
> There is a msr to allow cpuid to report invariant tsc (0x4118).
> Used by linux but not documented in the hyper-v tlfs.
> 
> Without that, tsc delay is never used on hyper-v.  So leave it
> as is until someone running hyper-v/azure would like it changed?

Works for me.



Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-29 Thread Jonathan Gray
On Mon, Aug 29, 2022 at 12:02:42PM -0500, Scott Cheloha wrote:
> If hv_delay() never causes a vm exit, but tsc_delay() *might* cause a
> vm exit, and both have microsecond or better resolution, then
> hv_delay() is the preferable delay(9) implementation where it is
> available because vm exits have ambiguous overhead.

with hv_delay() currently doing rdmsr I wouldn't say never

> 
> If that seems sensible to you, I'll commit this switch.

There is a msr to allow cpuid to report invariant tsc (0x4118).
Used by linux but not documented in the hyper-v tlfs.

Without that, tsc delay is never used on hyper-v.  So leave it
as is until someone running hyper-v/azure would like it changed?

> 
> Index: arch/amd64/amd64/tsc.c
> ===
> RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
> retrieving revision 1.26
> diff -u -p -r1.26 tsc.c
> --- arch/amd64/amd64/tsc.c25 Aug 2022 17:38:16 -  1.26
> +++ arch/amd64/amd64/tsc.c29 Aug 2022 16:58:25 -
> @@ -109,7 +109,7 @@ tsc_identify(struct cpu_info *ci)
>  
>   tsc_frequency = tsc_freq_cpuid(ci);
>   if (tsc_frequency > 0)
> - delay_init(tsc_delay, 5000);
> + delay_init(tsc_delay, 4000);
>  }
>  
>  static inline int
> Index: dev/pv/pvbus.c
> ===
> RCS file: /cvs/src/sys/dev/pv/pvbus.c,v
> retrieving revision 1.25
> diff -u -p -r1.25 pvbus.c
> --- dev/pv/pvbus.c25 Aug 2022 17:38:16 -  1.25
> +++ dev/pv/pvbus.c29 Aug 2022 16:58:26 -
> @@ -320,7 +320,7 @@ pvbus_hyperv(struct pvbus_hv *hv)
>  
>  #if NHYPERV > 0
>   if (hv->hv_features & CPUID_HV_MSR_TIME_REFCNT)
> - delay_init(hv_delay, 4000);
> + delay_init(hv_delay, 5000);
>  #endif
>  }
>  
> 
> 



Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-29 Thread Scott Cheloha
On Thu, Aug 25, 2022 at 03:57:48PM +1000, Jonathan Gray wrote:
> On Wed, Aug 24, 2022 at 11:05:30PM -0500, Scott Cheloha wrote:
> > On Wed, Aug 24, 2022 at 05:51:14PM +1000, Jonathan Gray wrote:
> > > On Tue, Aug 23, 2022 at 12:20:39PM -0500, Scott Cheloha wrote:
> > > > > Hyper-V generation 1 VMs are bios boot with emulation of the usual
> > > > > devices.  32-bit and 64-bit guests.
> > > > > 
> > > > > Hyper-V generation 2 VMs are 64-bit uefi with paravirtualised devices.
> > > > > 64-bit guests only.
> > > > > 
> > > > > There is no 8254 in generation 2.
> > > > > No HPET in either generation.
> > > > > 
> > > > > hv_delay uses the "Partition Reference Counter MSR" described in
> > > > > https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers
> > > > > It seems it is available in both generations and could be used from 
> > > > > i386?
> > > > > 
> > > > > From reading that page hv_delay() should be preferred over 
> > > > > lapic_delay()
> > > > 
> > > > Alright, I have nudged hv_delay's quality up over lapic_delay's
> > > > quality.
> > > 
> > > Before these changes, tsc is probed before pvbus.  Do the tsc sanity
> > > checks result in it not being considered an option on hyper-v?  I think
> > > the tsc_delay and hv_delay numbers should be swapped in a later commit.
> > > It is unclear if that would change the final delay_func setting.
> > 
> > Why would we prefer hv_delay() to tsc_delay() if we had a
> > constant/invariant TSC available in our Hyper-V guest?
> > 
> > When patrick@ emailed me last year about issues with delay(9) on
> > Hyper-V, he started by saying that the root of the problem was that
> > the OpenBSD guest was not opting to use tsc_delay() because the host
> > wasn't reporting a constant/invariant TSC.  So the guest was trying to
> > use i8254_delay(), which was impossible because Hyper-V Gen2 guests
> > don't have an i8254.  Hence, hv_delay() was added to the tree.
> > 
> > So, my understanding is that the addition of hv_delay() does not mean
> > tsc_delay() is worse than hv_delay().  hv_delay() was added because
> > tsc_delay() isn't always an option and (to our surprise) neither is
> > i8254_delay().
> 
> I'm not clear on when rdtsc and rdmsr would cause a vm exit.
> Presumably the reference tsc page is provided to avoid that,
> but we don't use it.  rdtsc and rdmsr don't always cause an exit.
> 
> The wording of Microsoft's "Hypervisor Top Level Functional
> Specification" reads as the interface is only available when
> the underlying machine has a constant frequency tsc.  It also
> makes the point that the interface being in time not cycles avoids
> problems when the tsc frequency changes on live migration.
> 
> "12.3 Partition Reference Time Enlightenment
> 
> The partition reference time enlightenment presents a reference
> time source to a partition which does not require an intercept into
> the hypervisor. This enlightenment is available only when the
> underlying platform provides support of an invariant processor Time
> Stamp Counter (TSC), or iTSC. In such platforms, the processor TSC
> frequency remains constant irrespective of changes in the processor's
> clock frequency due to the use of power management states such as
> ACPI processor performance states, processor idle sleep states (ACPI
> C-states), etc.
> 
> The partition reference time enlightenment uses a virtual TSC value,
> an offset and a multiplier to enable a guest partition to compute
> the normalized reference time since partition creation, in 100nS
> units. The mechanism also allows a guest partition to atomically
> compute the reference time when the guest partition is migrated to
> a platform with a different TSC rate, and provides a fallback
> mechanism to support migration to platforms without the constant
> rate TSC feature.
> 
> This facility is not intended to be used a source of wall clock
> time, since the reference time computed using this facility will
> appear to stop during the time that a guest partition is saved until
> the subsequent restore."

If hv_delay() never causes a vm exit, but tsc_delay() *might* cause a
vm exit, and both have microsecond or better resolution, then
hv_delay() is the preferable delay(9) implementation where it is
available because vm exits have ambiguous overhead.

If that seems sensible to you, I'll commit this switch.

Index: arch/amd64/amd64/tsc.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.26
diff -u -p -r1.26 tsc.c
--- arch/amd64/amd64/tsc.c  25 Aug 2022 17:38:16 -  1.26
+++ arch/amd64/amd64/tsc.c  29 Aug 2022 16:58:25 -
@@ -109,7 +109,7 @@ tsc_identify(struct cpu_info *ci)
 
tsc_frequency = tsc_freq_cpuid(ci);
if (tsc_frequency > 0)
-   delay_init(tsc_delay, 5000);
+   delay_init(tsc_delay, 4000);
 }
 
 static inline int
Index: dev/pv/pvbus.c

Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-24 Thread Jonathan Gray
On Wed, Aug 24, 2022 at 11:05:30PM -0500, Scott Cheloha wrote:
> On Wed, Aug 24, 2022 at 05:51:14PM +1000, Jonathan Gray wrote:
> > On Tue, Aug 23, 2022 at 12:20:39PM -0500, Scott Cheloha wrote:
> > > > Hyper-V generation 1 VMs are bios boot with emulation of the usual
> > > > devices.  32-bit and 64-bit guests.
> > > > 
> > > > Hyper-V generation 2 VMs are 64-bit uefi with paravirtualised devices.
> > > > 64-bit guests only.
> > > > 
> > > > There is no 8254 in generation 2.
> > > > No HPET in either generation.
> > > > 
> > > > hv_delay uses the "Partition Reference Counter MSR" described in
> > > > https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers
> > > > It seems it is available in both generations and could be used from 
> > > > i386?
> > > > 
> > > > From reading that page hv_delay() should be preferred over lapic_delay()
> > > 
> > > Alright, I have nudged hv_delay's quality up over lapic_delay's
> > > quality.
> > 
> > Before these changes, tsc is probed before pvbus.  Do the tsc sanity
> > checks result in it not being considered an option on hyper-v?  I think
> > the tsc_delay and hv_delay numbers should be swapped in a later commit.
> > It is unclear if that would change the final delay_func setting.
> 
> Why would we prefer hv_delay() to tsc_delay() if we had a
> constant/invariant TSC available in our Hyper-V guest?
> 
> When patrick@ emailed me last year about issues with delay(9) on
> Hyper-V, he started by saying that the root of the problem was that
> the OpenBSD guest was not opting to use tsc_delay() because the host
> wasn't reporting a constant/invariant TSC.  So the guest was trying to
> use i8254_delay(), which was impossible because Hyper-V Gen2 guests
> don't have an i8254.  Hence, hv_delay() was added to the tree.
> 
> So, my understanding is that the addition of hv_delay() does not mean
> tsc_delay() is worse than hv_delay().  hv_delay() was added because
> tsc_delay() isn't always an option and (to our surprise) neither is
> i8254_delay().

I'm not clear on when rdtsc and rdmsr would cause a vm exit.
Presumably the reference tsc page is provided to avoid that,
but we don't use it.  rdtsc and rdmsr don't always cause an exit.

The wording of Microsoft's "Hypervisor Top Level Functional
Specification" reads as the interface is only available when
the underlying machine has a constant frequency tsc.  It also
makes the point that the interface being in time not cycles avoids
problems when the tsc frequency changes on live migration.

"12.3 Partition Reference Time Enlightenment

The partition reference time enlightenment presents a reference
time source to a partition which does not require an intercept into
the hypervisor. This enlightenment is available only when the
underlying platform provides support of an invariant processor Time
Stamp Counter (TSC), or iTSC. In such platforms, the processor TSC
frequency remains constant irrespective of changes in the processor's
clock frequency due to the use of power management states such as
ACPI processor performance states, processor idle sleep states (ACPI
C-states), etc.

The partition reference time enlightenment uses a virtual TSC value,
an offset and a multiplier to enable a guest partition to compute
the normalized reference time since partition creation, in 100nS
units. The mechanism also allows a guest partition to atomically
compute the reference time when the guest partition is migrated to
a platform with a different TSC rate, and provides a fallback
mechanism to support migration to platforms without the constant
rate TSC feature.

This facility is not intended to be used a source of wall clock
time, since the reference time computed using this facility will
appear to stop during the time that a guest partition is saved until
the subsequent restore."



Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-24 Thread Scott Cheloha
On Wed, Aug 24, 2022 at 05:51:14PM +1000, Jonathan Gray wrote:
> On Tue, Aug 23, 2022 at 12:20:39PM -0500, Scott Cheloha wrote:
> > > Hyper-V generation 1 VMs are bios boot with emulation of the usual
> > > devices.  32-bit and 64-bit guests.
> > > 
> > > Hyper-V generation 2 VMs are 64-bit uefi with paravirtualised devices.
> > > 64-bit guests only.
> > > 
> > > There is no 8254 in generation 2.
> > > No HPET in either generation.
> > > 
> > > hv_delay uses the "Partition Reference Counter MSR" described in
> > > https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers
> > > It seems it is available in both generations and could be used from i386?
> > > 
> > > From reading that page hv_delay() should be preferred over lapic_delay()
> > 
> > Alright, I have nudged hv_delay's quality up over lapic_delay's
> > quality.
> 
> Before these changes, tsc is probed before pvbus.  Do the tsc sanity
> checks result in it not being considered an option on hyper-v?  I think
> the tsc_delay and hv_delay numbers should be swapped in a later commit.
> It is unclear if that would change the final delay_func setting.

Why would we prefer hv_delay() to tsc_delay() if we had a
constant/invariant TSC available in our Hyper-V guest?

When patrick@ emailed me last year about issues with delay(9) on
Hyper-V, he started by saying that the root of the problem was that
the OpenBSD guest was not opting to use tsc_delay() because the host
wasn't reporting a constant/invariant TSC.  So the guest was trying to
use i8254_delay(), which was impossible because Hyper-V Gen2 guests
don't have an i8254.  Hence, hv_delay() was added to the tree.

So, my understanding is that the addition of hv_delay() does not mean
tsc_delay() is worse than hv_delay().  hv_delay() was added because
tsc_delay() isn't always an option and (to our surprise) neither is
i8254_delay().

> It would be a good idea to have different commits for the places new
> delay callbacks are introduced.
> 
> - add delay_init()
> - use delay_init() in lapic, tsc, hv_delay
> - commit acpihpet
> - commit acpitimer

I had planned to do separate commits.  This ordering seems right.

> - swap tsc and hv_delay numbers

See above.

> > How are we looking now?
> 
> some minor suggestions inline
> 
> have you built a release with this?

Just finished building a release and upgrading with it from physical
media.  I think we are good to go.  I incorporated your suggestions
below and I'm going to do the first four suggested commits tomorrow
unless I hear otherwise.

Current combined patch is attached.

> > Index: sys/arch/amd64/amd64/lapic.c
> > ===
> > RCS file: /cvs/src/sys/arch/amd64/amd64/lapic.c,v
> > retrieving revision 1.60
> > diff -u -p -r1.60 lapic.c
> > --- sys/arch/amd64/amd64/lapic.c15 Aug 2022 04:17:50 -  1.60
> > +++ sys/arch/amd64/amd64/lapic.c23 Aug 2022 17:18:30 -
> > @@ -486,8 +486,6 @@ wait_next_cycle(void)
> > }
> >  }
> >  
> > -extern void tsc_delay(int);
> > -
> 
> this cleanup is unrelated and should be a different diff/commit

Ack, will do it separately.

> >  /*
> >   * Calibrate the local apic count-down timer (which is running at
> >   * bus-clock speed) vs. the i8254 counter/timer (which is running at
> > @@ -592,8 +590,7 @@ skip_calibration:
> >  * Now that the timer's calibrated, use the apic timer routines
> >  * for all our timing needs..
> >  */
> > -   if (delay_func == i8254_delay)
> > -   delay_func = lapic_delay;
> > +   delay_init(lapic_delay, 3000);
> > initclock_func = lapic_initclocks;
> > }
> >  }
> > Index: sys/arch/amd64/amd64/machdep.c
> > ===
> > RCS file: /cvs/src/sys/arch/amd64/amd64/machdep.c,v
> > retrieving revision 1.279
> > diff -u -p -r1.279 machdep.c
> > --- sys/arch/amd64/amd64/machdep.c  7 Aug 2022 23:56:06 -   1.279
> > +++ sys/arch/amd64/amd64/machdep.c  23 Aug 2022 17:18:31 -
> > @@ -2069,3 +2069,13 @@ check_context(const struct reg *regs, st
> >  
> > return 0;
> >  }
> > +
> > +void
> > +delay_init(void(*fn)(int), int fn_quality)
> > +{
> > +   static int cur_quality = 0;
> > +   if (fn_quality > cur_quality) {
> > +   delay_func = fn;
> > +   cur_quality = fn_quality;
> > +   }
> > +}
> > Index: sys/arch/amd64/amd64/tsc.c
> > ===
> > RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
> > retrieving revision 1.25
> > diff -u -p -r1.25 tsc.c
> > --- sys/arch/amd64/amd64/tsc.c  12 Aug 2022 02:20:36 -  1.25
> > +++ sys/arch/amd64/amd64/tsc.c  23 Aug 2022 17:18:31 -
> > @@ -109,7 +109,7 @@ tsc_identify(struct cpu_info *ci)
> >  
> > tsc_frequency = tsc_freq_cpuid(ci);
> > if (tsc_frequency > 0)
> > -   delay_func = tsc_delay;
> > +   

Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-24 Thread Jonathan Gray
On Tue, Aug 23, 2022 at 12:20:39PM -0500, Scott Cheloha wrote:
> > Hyper-V generation 1 VMs are bios boot with emulation of the usual
> > devices.  32-bit and 64-bit guests.
> > 
> > Hyper-V generation 2 VMs are 64-bit uefi with paravirtualised devices.
> > 64-bit guests only.
> > 
> > There is no 8254 in generation 2.
> > No HPET in either generation.
> > 
> > hv_delay uses the "Partition Reference Counter MSR" described in
> > https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers
> > It seems it is available in both generations and could be used from i386?
> > 
> > From reading that page hv_delay() should be preferred over lapic_delay()
> 
> Alright, I have nudged hv_delay's quality up over lapic_delay's
> quality.

Before these changes, tsc is probed before pvbus.  Do the tsc sanity
checks result in it not being considered an option on hyper-v?  I think
the tsc_delay and hv_delay numbers should be swapped in a later commit.
It is unclear if that would change the final delay_func setting.

It would be a good idea to have different commits for the places new
delay callbacks are introduced.

- add delay_init()
- use delay_init() in lapic, tsc, hv_delay
- commit acpihpet
- commit acpitimer
- swap tsc and hv_delay numbers

> 
> How are we looking now?

some minor suggestions inline

have you built a release with this?

> 
> Index: sys/arch/amd64/amd64/lapic.c
> ===
> RCS file: /cvs/src/sys/arch/amd64/amd64/lapic.c,v
> retrieving revision 1.60
> diff -u -p -r1.60 lapic.c
> --- sys/arch/amd64/amd64/lapic.c  15 Aug 2022 04:17:50 -  1.60
> +++ sys/arch/amd64/amd64/lapic.c  23 Aug 2022 17:18:30 -
> @@ -486,8 +486,6 @@ wait_next_cycle(void)
>   }
>  }
>  
> -extern void tsc_delay(int);
> -

this cleanup is unrelated and should be a different diff/commit

>  /*
>   * Calibrate the local apic count-down timer (which is running at
>   * bus-clock speed) vs. the i8254 counter/timer (which is running at
> @@ -592,8 +590,7 @@ skip_calibration:
>* Now that the timer's calibrated, use the apic timer routines
>* for all our timing needs..
>*/
> - if (delay_func == i8254_delay)
> - delay_func = lapic_delay;
> + delay_init(lapic_delay, 3000);
>   initclock_func = lapic_initclocks;
>   }
>  }
> Index: sys/arch/amd64/amd64/machdep.c
> ===
> RCS file: /cvs/src/sys/arch/amd64/amd64/machdep.c,v
> retrieving revision 1.279
> diff -u -p -r1.279 machdep.c
> --- sys/arch/amd64/amd64/machdep.c7 Aug 2022 23:56:06 -   1.279
> +++ sys/arch/amd64/amd64/machdep.c23 Aug 2022 17:18:31 -
> @@ -2069,3 +2069,13 @@ check_context(const struct reg *regs, st
>  
>   return 0;
>  }
> +
> +void
> +delay_init(void(*fn)(int), int fn_quality)
> +{
> + static int cur_quality = 0;
> + if (fn_quality > cur_quality) {
> + delay_func = fn;
> + cur_quality = fn_quality;
> + }
> +}
> Index: sys/arch/amd64/amd64/tsc.c
> ===
> RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
> retrieving revision 1.25
> diff -u -p -r1.25 tsc.c
> --- sys/arch/amd64/amd64/tsc.c12 Aug 2022 02:20:36 -  1.25
> +++ sys/arch/amd64/amd64/tsc.c23 Aug 2022 17:18:31 -
> @@ -109,7 +109,7 @@ tsc_identify(struct cpu_info *ci)
>  
>   tsc_frequency = tsc_freq_cpuid(ci);
>   if (tsc_frequency > 0)
> - delay_func = tsc_delay;
> + delay_init(tsc_delay, 5000);
>  }
>  
>  static inline int
> Index: sys/arch/amd64/include/cpu.h
> ===
> RCS file: /cvs/src/sys/arch/amd64/include/cpu.h,v
> retrieving revision 1.148
> diff -u -p -r1.148 cpu.h
> --- sys/arch/amd64/include/cpu.h  22 Aug 2022 08:57:54 -  1.148
> +++ sys/arch/amd64/include/cpu.h  23 Aug 2022 17:18:31 -
> @@ -359,6 +359,7 @@ void signotify(struct proc *);
>   * We need a machine-independent name for this.
>   */
>  extern void (*delay_func)(int);
> +void delay_init(void (*)(int), int);
>  struct timeval;
>  
>  #define DELAY(x) (*delay_func)(x)
> Index: sys/arch/i386/i386/lapic.c
> ===
> RCS file: /cvs/src/sys/arch/i386/i386/lapic.c,v
> retrieving revision 1.49
> diff -u -p -r1.49 lapic.c
> --- sys/arch/i386/i386/lapic.c15 Aug 2022 04:17:50 -  1.49
> +++ sys/arch/i386/i386/lapic.c23 Aug 2022 17:18:31 -
> @@ -395,7 +395,7 @@ lapic_calibrate_timer(struct cpu_info *c
>* Now that the timer's calibrated, use the apic timer routines
>* for all our timing needs..
>*/
> - delay_func = lapic_delay;
> + delay_init(lapic_delay, 3000);
> 

Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-23 Thread Scott Cheloha
On Tue, Aug 23, 2022 at 04:04:39PM +1000, Jonathan Gray wrote:
> On Mon, Aug 22, 2022 at 09:37:02AM -0500, Scott Cheloha wrote:
> > On Wed, Aug 17, 2022 at 09:00:12PM +1000, Jonathan Gray wrote:
> > > On Wed, Aug 17, 2022 at 04:53:20PM +1000, Jonathan Gray wrote:
> > > > 
> > > > It seems to me it would be cleaner if the decision of what to use for
> > > > delay could be moved into an md file.
> > > > 
> > > > Or abstract it by having a numeric weight like timecounters or driver
> > > > match return numbers.
> > > 
> > > diff against your previous, does not change lapic_delay
> > 
> > Sorry for the delay.
> > 
> > :)
> > 
> > I was out of town.
> > 
> > I slept on this and you're right, this is better.
> > 
> > Couple tweaks:
> > 
> > - Move the quality numbers into cpu.h and give them names.  That way,
> >   the next time Intel, or AMD, or Microsoft or [...] does something
> >   foolish we don't need to rototill all these files to juggle the
> >   quality hierarchy.
> 
> While that would allow arch specific differences, removing one would
> require changing at least three places in the tree.

Okay, nevermind.  Patch updated below.

> > - Update both amd64 and i386's lapic.c to use delay_init().  While we
> >   have a lapic_delay() in the tree it should cooperate with everything
> >   else.
> > 
> > - Include  in any files calling delay_init() where it
> >   isn't already included.
> > 
> > - Give the variables in delay_init() real names.
> 
> hmm
> 
> > I'm unsure about two small things:
> > 
> > - Can i386 use hv_delay()?  The i386 GENERIC config does not list
> >   hyperv(4) support so my guess is "no" and I have excluded
> >   HV_DELAY_QUALITY from i386's cpu.h.
> > 
> > - If a Hyper-V guest could choose between hv_delay() and
> >   lapic_delay(), which would be preferable?  Right now I
> >   have hv_delay() scored lower than lapic_delay().
> 
> Hyper-V generation 1 VMs are bios boot with emulation of the usual
> devices.  32-bit and 64-bit guests.
> 
> Hyper-V generation 2 VMs are 64-bit uefi with paravirtualised devices.
> 64-bit guests only.
> 
> There is no 8254 in generation 2.
> No HPET in either generation.
> 
> hv_delay uses the "Partition Reference Counter MSR" described in
> https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers
> It seems it is available in both generations and could be used from i386?
> 
> From reading that page hv_delay() should be preferred over lapic_delay()

Alright, I have nudged hv_delay's quality up over lapic_delay's
quality.

How are we looking now?

Index: sys/arch/amd64/amd64/lapic.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/lapic.c,v
retrieving revision 1.60
diff -u -p -r1.60 lapic.c
--- sys/arch/amd64/amd64/lapic.c15 Aug 2022 04:17:50 -  1.60
+++ sys/arch/amd64/amd64/lapic.c23 Aug 2022 17:18:30 -
@@ -486,8 +486,6 @@ wait_next_cycle(void)
}
 }
 
-extern void tsc_delay(int);
-
 /*
  * Calibrate the local apic count-down timer (which is running at
  * bus-clock speed) vs. the i8254 counter/timer (which is running at
@@ -592,8 +590,7 @@ skip_calibration:
 * Now that the timer's calibrated, use the apic timer routines
 * for all our timing needs..
 */
-   if (delay_func == i8254_delay)
-   delay_func = lapic_delay;
+   delay_init(lapic_delay, 3000);
initclock_func = lapic_initclocks;
}
 }
Index: sys/arch/amd64/amd64/machdep.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/machdep.c,v
retrieving revision 1.279
diff -u -p -r1.279 machdep.c
--- sys/arch/amd64/amd64/machdep.c  7 Aug 2022 23:56:06 -   1.279
+++ sys/arch/amd64/amd64/machdep.c  23 Aug 2022 17:18:31 -
@@ -2069,3 +2069,13 @@ check_context(const struct reg *regs, st
 
return 0;
 }
+
+void
+delay_init(void(*fn)(int), int fn_quality)
+{
+   static int cur_quality = 0;
+   if (fn_quality > cur_quality) {
+   delay_func = fn;
+   cur_quality = fn_quality;
+   }
+}
Index: sys/arch/amd64/amd64/tsc.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.25
diff -u -p -r1.25 tsc.c
--- sys/arch/amd64/amd64/tsc.c  12 Aug 2022 02:20:36 -  1.25
+++ sys/arch/amd64/amd64/tsc.c  23 Aug 2022 17:18:31 -
@@ -109,7 +109,7 @@ tsc_identify(struct cpu_info *ci)
 
tsc_frequency = tsc_freq_cpuid(ci);
if (tsc_frequency > 0)
-   delay_func = tsc_delay;
+   delay_init(tsc_delay, 5000);
 }
 
 static inline int
Index: sys/arch/amd64/include/cpu.h
===
RCS file: /cvs/src/sys/arch/amd64/include/cpu.h,v
retrieving revision 1.148
diff -u -p -r1.148 cpu.h
--- 

Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-23 Thread Jonathan Gray
On Mon, Aug 22, 2022 at 09:37:02AM -0500, Scott Cheloha wrote:
> On Wed, Aug 17, 2022 at 09:00:12PM +1000, Jonathan Gray wrote:
> > On Wed, Aug 17, 2022 at 04:53:20PM +1000, Jonathan Gray wrote:
> > > 
> > > It seems to me it would be cleaner if the decision of what to use for
> > > delay could be moved into an md file.
> > > 
> > > Or abstract it by having a numeric weight like timecounters or driver
> > > match return numbers.
> > 
> > diff against your previous, does not change lapic_delay
> 
> Sorry for the delay.
> 
> :)
> 
> I was out of town.
> 
> I slept on this and you're right, this is better.
> 
> Couple tweaks:
> 
> - Move the quality numbers into cpu.h and give them names.  That way,
>   the next time Intel, or AMD, or Microsoft or [...] does something
>   foolish we don't need to rototill all these files to juggle the
>   quality hierarchy.

While that would allow arch specific differences, removing one would
require changing at least three places in the tree.

> 
> - Update both amd64 and i386's lapic.c to use delay_init().  While we
>   have a lapic_delay() in the tree it should cooperate with everything
>   else.
> 
> - Include  in any files calling delay_init() where it
>   isn't already included.
> 
> - Give the variables in delay_init() real names.

hmm

> 
> I'm unsure about two small things:
> 
> - Can i386 use hv_delay()?  The i386 GENERIC config does not list
>   hyperv(4) support so my guess is "no" and I have excluded
>   HV_DELAY_QUALITY from i386's cpu.h.
> 
> - If a Hyper-V guest could choose between hv_delay() and
>   lapic_delay(), which would be preferable?  Right now I
>   have hv_delay() scored lower than lapic_delay().

Hyper-V generation 1 VMs are bios boot with emulation of the usual
devices.  32-bit and 64-bit guests.

Hyper-V generation 2 VMs are 64-bit uefi with paravirtualised devices.
64-bit guests only.

There is no 8254 in generation 2.
No HPET in either generation.

hv_delay uses the "Partition Reference Counter MSR" described in
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers
It seems it is available in both generations and could be used from i386?

>From reading that page hv_delay() should be preferred over lapic_delay()



Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-22 Thread Scott Cheloha
On Wed, Aug 17, 2022 at 09:00:12PM +1000, Jonathan Gray wrote:
> On Wed, Aug 17, 2022 at 04:53:20PM +1000, Jonathan Gray wrote:
> > 
> > It seems to me it would be cleaner if the decision of what to use for
> > delay could be moved into an md file.
> > 
> > Or abstract it by having a numeric weight like timecounters or driver
> > match return numbers.
> 
> diff against your previous, does not change lapic_delay

Sorry for the delay.

:)

I was out of town.

I slept on this and you're right, this is better.

Couple tweaks:

- Move the quality numbers into cpu.h and give them names.  That way,
  the next time Intel, or AMD, or Microsoft or [...] does something
  foolish we don't need to rototill all these files to juggle the
  quality hierarchy.

- Update both amd64 and i386's lapic.c to use delay_init().  While we
  have a lapic_delay() in the tree it should cooperate with everything
  else.

- Include  in any files calling delay_init() where it
  isn't already included.

- Give the variables in delay_init() real names.

I'm unsure about two small things:

- Can i386 use hv_delay()?  The i386 GENERIC config does not list
  hyperv(4) support so my guess is "no" and I have excluded
  HV_DELAY_QUALITY from i386's cpu.h.

- If a Hyper-V guest could choose between hv_delay() and
  lapic_delay(), which would be preferable?  Right now I
  have hv_delay() scored lower than lapic_delay().

Once we've sorted those out, are you OK with the attached patch?

mlarkin: still ok?

Index: sys/arch/amd64/amd64/lapic.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/lapic.c,v
retrieving revision 1.60
diff -u -p -r1.60 lapic.c
--- sys/arch/amd64/amd64/lapic.c15 Aug 2022 04:17:50 -  1.60
+++ sys/arch/amd64/amd64/lapic.c22 Aug 2022 14:33:30 -
@@ -486,8 +486,6 @@ wait_next_cycle(void)
}
 }
 
-extern void tsc_delay(int);
-
 /*
  * Calibrate the local apic count-down timer (which is running at
  * bus-clock speed) vs. the i8254 counter/timer (which is running at
@@ -592,8 +590,7 @@ skip_calibration:
 * Now that the timer's calibrated, use the apic timer routines
 * for all our timing needs..
 */
-   if (delay_func == i8254_delay)
-   delay_func = lapic_delay;
+   delay_init(lapic_delay, LAPIC_DELAY_QUALITY);
initclock_func = lapic_initclocks;
}
 }
Index: sys/arch/amd64/amd64/machdep.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/machdep.c,v
retrieving revision 1.279
diff -u -p -r1.279 machdep.c
--- sys/arch/amd64/amd64/machdep.c  7 Aug 2022 23:56:06 -   1.279
+++ sys/arch/amd64/amd64/machdep.c  22 Aug 2022 14:33:30 -
@@ -2069,3 +2069,13 @@ check_context(const struct reg *regs, st
 
return 0;
 }
+
+void
+delay_init(void(*fn)(int), int fn_quality)
+{
+   static int cur_quality = 0;
+   if (fn_quality > cur_quality) {
+   delay_func = fn;
+   cur_quality = fn_quality;
+   }
+}
Index: sys/arch/amd64/amd64/tsc.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.25
diff -u -p -r1.25 tsc.c
--- sys/arch/amd64/amd64/tsc.c  12 Aug 2022 02:20:36 -  1.25
+++ sys/arch/amd64/amd64/tsc.c  22 Aug 2022 14:33:30 -
@@ -109,7 +109,7 @@ tsc_identify(struct cpu_info *ci)
 
tsc_frequency = tsc_freq_cpuid(ci);
if (tsc_frequency > 0)
-   delay_func = tsc_delay;
+   delay_init(tsc_delay, TSC_DELAY_QUALITY);
 }
 
 static inline int
Index: sys/arch/amd64/include/cpu.h
===
RCS file: /cvs/src/sys/arch/amd64/include/cpu.h,v
retrieving revision 1.148
diff -u -p -r1.148 cpu.h
--- sys/arch/amd64/include/cpu.h22 Aug 2022 08:57:54 -  1.148
+++ sys/arch/amd64/include/cpu.h22 Aug 2022 14:33:30 -
@@ -364,6 +364,13 @@ struct timeval;
 #define DELAY(x)   (*delay_func)(x)
 #define delay(x)   (*delay_func)(x)
 
+void delay_init(void(*)(int), int);
+
+#define ACPITIMER_DELAY_QUALITY100
+#define ACPIHPET_DELAY_QUALITY 200
+#define HV_DELAY_QUALITY   250
+#define LAPIC_DELAY_QUALITY300
+#define TSC_DELAY_QUALITY  400
 
 #ifdef _KERNEL
 /* locore.S */
Index: sys/arch/i386/i386/lapic.c
===
RCS file: /cvs/src/sys/arch/i386/i386/lapic.c,v
retrieving revision 1.49
diff -u -p -r1.49 lapic.c
--- sys/arch/i386/i386/lapic.c  15 Aug 2022 04:17:50 -  1.49
+++ sys/arch/i386/i386/lapic.c  22 Aug 2022 14:33:30 -
@@ -395,7 +395,7 @@ lapic_calibrate_timer(struct cpu_info *c
 * Now that the timer's calibrated, use the apic timer routines
 * for all our timing needs..
 

Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-18 Thread Mike Larkin
On Wed, Aug 17, 2022 at 09:00:12PM +1000, Jonathan Gray wrote:
> On Wed, Aug 17, 2022 at 04:53:20PM +1000, Jonathan Gray wrote:
> >
> > It seems to me it would be cleaner if the decision of what to use for
> > delay could be moved into an md file.
> >
> > Or abstract it by having a numeric weight like timecounters or driver
> > match return numbers.
>
> diff against your previous, does not change lapic_delay
>

I think the combination of diffs is a move in the right direction, so ok
mlarkin on these when ready.

-ml

> diff --git sys/arch/amd64/amd64/machdep.c sys/arch/amd64/amd64/machdep.c
> index 932b1dfeb47..c4645b6a6fd 100644
> --- sys/arch/amd64/amd64/machdep.c
> +++ sys/arch/amd64/amd64/machdep.c
> @@ -2069,3 +2069,13 @@ check_context(const struct reg *regs, struct trapframe 
> *tf)
>
>   return 0;
>  }
> +
> +void
> +delay_init(void(*f)(int), int v)
> +{
> + static int c = 0;
> + if (v > c) {
> + delay_func = f;
> + c = v;
> + }
> +}
> diff --git sys/arch/amd64/amd64/tsc.c sys/arch/amd64/amd64/tsc.c
> index fd38dc6359d..8c839357dd2 100644
> --- sys/arch/amd64/amd64/tsc.c
> +++ sys/arch/amd64/amd64/tsc.c
> @@ -109,7 +109,7 @@ tsc_identify(struct cpu_info *ci)
>
>   tsc_frequency = tsc_freq_cpuid(ci);
>   if (tsc_frequency > 0)
> - delay_func = tsc_delay;
> + delay_init(tsc_delay, 300);
>  }
>
>  static inline int
> diff --git sys/arch/amd64/include/cpu.h sys/arch/amd64/include/cpu.h
> index b8db48f2714..a82af172452 100644
> --- sys/arch/amd64/include/cpu.h
> +++ sys/arch/amd64/include/cpu.h
> @@ -359,6 +359,7 @@ void signotify(struct proc *);
>   * We need a machine-independent name for this.
>   */
>  extern void (*delay_func)(int);
> +void delay_init(void(*)(int), int);
>  struct timeval;
>
>  #define DELAY(x) (*delay_func)(x)
> diff --git sys/arch/i386/i386/machdep.c sys/arch/i386/i386/machdep.c
> index e4cb15b4dc1..7da5c26e240 100644
> --- sys/arch/i386/i386/machdep.c
> +++ sys/arch/i386/i386/machdep.c
> @@ -3996,3 +3996,13 @@ cpu_rnd_messybits(void)
>   nanotime();
>   return (ts.tv_nsec ^ (ts.tv_sec << 20));
>  }
> +
> +void
> +delay_init(void(*f)(int), int v)
> +{
> + static int c = 0;
> + if (v > c) {
> + delay_func = f;
> + c = v;
> + }
> +}
> diff --git sys/arch/i386/include/cpu.h sys/arch/i386/include/cpu.h
> index 5f300710562..211ee475678 100644
> --- sys/arch/i386/include/cpu.h
> +++ sys/arch/i386/include/cpu.h
> @@ -302,6 +302,7 @@ void signotify(struct proc *);
>   * We need a machine-independent name for this.
>   */
>  extern void (*delay_func)(int);
> +void delay_init(void(*)(int), int);
>  struct timeval;
>
>  #define  DELAY(x)(*delay_func)(x)
> diff --git sys/dev/acpi/acpihpet.c sys/dev/acpi/acpihpet.c
> index 6dc595e50ab..4332b4dbc0e 100644
> --- sys/dev/acpi/acpihpet.c
> +++ sys/dev/acpi/acpihpet.c
> @@ -27,8 +27,6 @@
>  #include 
>  #include 
>
> -#include "acpitimer.h"
> -
>  int acpihpet_attached;
>
>  int acpihpet_match(struct device *, void *, void *);
> @@ -270,15 +268,7 @@ acpihpet_attach(struct device *parent, struct device 
> *self, void *aux)
>   hpet_timecounter.tc_name = sc->sc_dev.dv_xname;
>   tc_init(_timecounter);
>
> -#if defined(__amd64__) || defined(__i386__)
> - if (delay_func == i8254_delay)
> - delay_func = acpihpet_delay;
> -#if NACPITIMER > 1
> - extern void acpitimer_delay(int);
> - if (delay_func == acpitimer_delay)
> - delay_func = acpihpet_delay;
> -#endif
> -#endif /* defined(__amd64__) || defined(__i386__) */
> + delay_init(acpihpet_delay, 200);
>
>  #if defined(__amd64__)
>   extern void cpu_recalibrate_tsc(struct timecounter *);
> diff --git sys/dev/acpi/acpitimer.c sys/dev/acpi/acpitimer.c
> index 0c4c7b71a01..e2757a40f3d 100644
> --- sys/dev/acpi/acpitimer.c
> +++ sys/dev/acpi/acpitimer.c
> @@ -103,10 +103,8 @@ acpitimerattach(struct device *parent, struct device 
> *self, void *aux)
>   acpi_timecounter.tc_name = sc->sc_dev.dv_xname;
>   tc_init(_timecounter);
>
> -#if defined(__amd64__) || defined(__i386__)
> - if (delay_func == i8254_delay)
> - delay_func = acpitimer_delay;
> -#endif
> + delay_init(acpitimer_delay, 100);
> +
>  #if defined(__amd64__)
>   extern void cpu_recalibrate_tsc(struct timecounter *);
>   cpu_recalibrate_tsc(_timecounter);
> diff --git sys/dev/acpi/files.acpi sys/dev/acpi/files.acpi
> index 8ec3ec2f8b3..f97eb6d4e3e 100644
> --- sys/dev/acpi/files.acpi
> +++ sys/dev/acpi/files.acpi
> @@ -13,7 +13,7 @@ filedev/acpi/acpidebug.cacpi & ddb
>  # ACPI timer
>  device   acpitimer
>  attach   acpitimer at acpi
> -file dev/acpi/acpitimer.cacpitimer needs-flag
> +file dev/acpi/acpitimer.cacpitimer
>
>  # AC device
>  device   acpiac
> diff --git sys/dev/pv/pvbus.c sys/dev/pv/pvbus.c
> index cbe543ac312..ee53afe2138 100644
> --- 

Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-17 Thread Jonathan Gray
On Wed, Aug 17, 2022 at 04:53:20PM +1000, Jonathan Gray wrote:
> 
> It seems to me it would be cleaner if the decision of what to use for
> delay could be moved into an md file.
> 
> Or abstract it by having a numeric weight like timecounters or driver
> match return numbers.

diff against your previous, does not change lapic_delay

diff --git sys/arch/amd64/amd64/machdep.c sys/arch/amd64/amd64/machdep.c
index 932b1dfeb47..c4645b6a6fd 100644
--- sys/arch/amd64/amd64/machdep.c
+++ sys/arch/amd64/amd64/machdep.c
@@ -2069,3 +2069,13 @@ check_context(const struct reg *regs, struct trapframe 
*tf)
 
return 0;
 }
+
+void
+delay_init(void(*f)(int), int v)
+{
+   static int c = 0;
+   if (v > c) {
+   delay_func = f;
+   c = v;
+   }
+}
diff --git sys/arch/amd64/amd64/tsc.c sys/arch/amd64/amd64/tsc.c
index fd38dc6359d..8c839357dd2 100644
--- sys/arch/amd64/amd64/tsc.c
+++ sys/arch/amd64/amd64/tsc.c
@@ -109,7 +109,7 @@ tsc_identify(struct cpu_info *ci)
 
tsc_frequency = tsc_freq_cpuid(ci);
if (tsc_frequency > 0)
-   delay_func = tsc_delay;
+   delay_init(tsc_delay, 300);
 }
 
 static inline int
diff --git sys/arch/amd64/include/cpu.h sys/arch/amd64/include/cpu.h
index b8db48f2714..a82af172452 100644
--- sys/arch/amd64/include/cpu.h
+++ sys/arch/amd64/include/cpu.h
@@ -359,6 +359,7 @@ void signotify(struct proc *);
  * We need a machine-independent name for this.
  */
 extern void (*delay_func)(int);
+void delay_init(void(*)(int), int);
 struct timeval;
 
 #define DELAY(x)   (*delay_func)(x)
diff --git sys/arch/i386/i386/machdep.c sys/arch/i386/i386/machdep.c
index e4cb15b4dc1..7da5c26e240 100644
--- sys/arch/i386/i386/machdep.c
+++ sys/arch/i386/i386/machdep.c
@@ -3996,3 +3996,13 @@ cpu_rnd_messybits(void)
nanotime();
return (ts.tv_nsec ^ (ts.tv_sec << 20));
 }
+
+void
+delay_init(void(*f)(int), int v)
+{
+   static int c = 0;
+   if (v > c) {
+   delay_func = f;
+   c = v;
+   }
+}
diff --git sys/arch/i386/include/cpu.h sys/arch/i386/include/cpu.h
index 5f300710562..211ee475678 100644
--- sys/arch/i386/include/cpu.h
+++ sys/arch/i386/include/cpu.h
@@ -302,6 +302,7 @@ void signotify(struct proc *);
  * We need a machine-independent name for this.
  */
 extern void (*delay_func)(int);
+void delay_init(void(*)(int), int);
 struct timeval;
 
 #defineDELAY(x)(*delay_func)(x)
diff --git sys/dev/acpi/acpihpet.c sys/dev/acpi/acpihpet.c
index 6dc595e50ab..4332b4dbc0e 100644
--- sys/dev/acpi/acpihpet.c
+++ sys/dev/acpi/acpihpet.c
@@ -27,8 +27,6 @@
 #include 
 #include 
 
-#include "acpitimer.h"
-
 int acpihpet_attached;
 
 int acpihpet_match(struct device *, void *, void *);
@@ -270,15 +268,7 @@ acpihpet_attach(struct device *parent, struct device 
*self, void *aux)
hpet_timecounter.tc_name = sc->sc_dev.dv_xname;
tc_init(_timecounter);
 
-#if defined(__amd64__) || defined(__i386__)
-   if (delay_func == i8254_delay)
-   delay_func = acpihpet_delay;
-#if NACPITIMER > 1
-   extern void acpitimer_delay(int);
-   if (delay_func == acpitimer_delay)
-   delay_func = acpihpet_delay;
-#endif
-#endif /* defined(__amd64__) || defined(__i386__) */
+   delay_init(acpihpet_delay, 200);
 
 #if defined(__amd64__)
extern void cpu_recalibrate_tsc(struct timecounter *);
diff --git sys/dev/acpi/acpitimer.c sys/dev/acpi/acpitimer.c
index 0c4c7b71a01..e2757a40f3d 100644
--- sys/dev/acpi/acpitimer.c
+++ sys/dev/acpi/acpitimer.c
@@ -103,10 +103,8 @@ acpitimerattach(struct device *parent, struct device 
*self, void *aux)
acpi_timecounter.tc_name = sc->sc_dev.dv_xname;
tc_init(_timecounter);
 
-#if defined(__amd64__) || defined(__i386__)
-   if (delay_func == i8254_delay)
-   delay_func = acpitimer_delay;
-#endif
+   delay_init(acpitimer_delay, 100);
+
 #if defined(__amd64__)
extern void cpu_recalibrate_tsc(struct timecounter *);
cpu_recalibrate_tsc(_timecounter);
diff --git sys/dev/acpi/files.acpi sys/dev/acpi/files.acpi
index 8ec3ec2f8b3..f97eb6d4e3e 100644
--- sys/dev/acpi/files.acpi
+++ sys/dev/acpi/files.acpi
@@ -13,7 +13,7 @@ file  dev/acpi/acpidebug.cacpi & ddb
 # ACPI timer
 device acpitimer
 attach acpitimer at acpi
-file   dev/acpi/acpitimer.cacpitimer needs-flag
+file   dev/acpi/acpitimer.cacpitimer
 
 # AC device
 device acpiac
diff --git sys/dev/pv/pvbus.c sys/dev/pv/pvbus.c
index cbe543ac312..ee53afe2138 100644
--- sys/dev/pv/pvbus.c
+++ sys/dev/pv/pvbus.c
@@ -319,9 +319,8 @@ pvbus_hyperv(struct pvbus_hv *hv)
HYPERV_VERSION_EBX_MINOR_S;
 
 #if NHYPERV > 0
-   if (hv->hv_features & CPUID_HV_MSR_TIME_REFCNT &&
-   delay_func == i8254_delay)
-   delay_func = hv_delay;
+   if (hv->hv_features & CPUID_HV_MSR_TIME_REFCNT)
+   delay_init(hv_delay, 250);
 #endif
 }
 



Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-17 Thread Jonathan Gray
On Wed, Aug 17, 2022 at 12:37:52AM -0500, Scott Cheloha wrote:
> On Wed, Aug 17, 2022 at 02:28:14PM +1000, Jonathan Gray wrote:
> > On Tue, Aug 16, 2022 at 11:53:51AM -0500, Scott Cheloha wrote:
> > > On Sun, Aug 14, 2022 at 11:24:37PM -0500, Scott Cheloha wrote:
> > > > 
> > > > In the future when the LAPIC timer is run in oneshot mode there will
> > > > be no lapic_delay().
> > > > 
> > > > [...]
> > > > 
> > > > This is *very* bad for older amd64 machines, because you are left with
> > > > i8254_delay().
> > > > 
> > > > I would like to offer a less awful delay(9) implementation for this
> > > > class of hardware.  Otherwise we may trip over bizarre phantom bugs on
> > > > MP kernels because only one CPU can read the i8254 at a time.
> > > > 
> > > > [...]
> > > > 
> > > > Real i386 hardware should be fine.  Later models with an ACPI PM timer
> > > > will be fine using acpitimer_delay() instead of i8254_delay().
> > > > 
> > > > [...]
> > > > 
> > > > Here are the sample measurements from my 2017 laptop (kaby lake
> > > > refresh) running the attached patch.  It takes longer than a
> > > > microsecond to read either of the ACPI timers.  The PM timer is better
> > > > than the HPET.  The HPET is a bit better than the i8254.  I hope the
> > > > numbers are a little better on older hardware.
> > > > 
> > > > acpitimer_test_delay:  expected  0.01000  actual  0.10638  
> > > > error  0.09638
> > > > acpitimer_test_delay:  expected  0.1  actual  0.15464  
> > > > error  0.05464
> > > > acpitimer_test_delay:  expected  0.00010  actual  0.000107619  
> > > > error  0.07619
> > > > acpitimer_test_delay:  expected  0.00100  actual  0.001007275  
> > > > error  0.07275
> > > > acpitimer_test_delay:  expected  0.01000  actual  0.010007891  
> > > > error  0.07891
> > > > 
> > > > acpihpet_test_delay:   expected  0.01000  actual  0.22208  
> > > > error  0.21208
> > > > acpihpet_test_delay:   expected  0.1  actual  0.31690  
> > > > error  0.21690
> > > > acpihpet_test_delay:   expected  0.00010  actual  0.000112647  
> > > > error  0.12647
> > > > acpihpet_test_delay:   expected  0.00100  actual  0.001021480  
> > > > error  0.21480
> > > > acpihpet_test_delay:   expected  0.01000  actual  0.010013736  
> > > > error  0.13736
> > > > 
> > > > i8254_test_delay:  expected  0.01000  actual  0.40110  
> > > > error  0.39110
> > > > i8254_test_delay:  expected  0.1  actual  0.39471  
> > > > error  0.29471
> > > > i8254_test_delay:  expected  0.00010  actual  0.000128031  
> > > > error  0.28031
> > > > i8254_test_delay:  expected  0.00100  actual  0.001024586  
> > > > error  0.24586
> > > > i8254_test_delay:  expected  0.01000  actual  0.010021859  
> > > > error  0.21859
> > > 
> > > Attched is an updated patch.  I left the test measurement code in
> > > place because I would like to see a test on a real i386 machine, just
> > > to make sure it works as expected.  I can't imagine why it wouldn't
> > > work, but we should never assume anything.
> > > 
> > > Changes from v1:
> > > 
> > > - Actually set delay_func from acpitimerattach() and
> > >   acpihpet_attach().
> > > 
> > >   I think it's safe to assume, on real hardware, that the ACPI PMT is
> > >   preferable to the i8254 and the HPET is preferable to both of them.
> > > 
> > >   This is not *always* true, but it is true on the older machines that
> > >   can't use tsc_delay(), so the assumption works in practice.
> > > 
> > >   Outside of those three timers, the hierarchy gets murky.  There are
> > >   other timers that are better than the HPET, but they aren't always
> > >   available.  If those timers are already providing delay_func this
> > >   code does not usurp them.
> > 
> > As I understand it, you want lapic to be in one-shot mode for something
> > along the lines of tickless.
> 
> Yes.
> 
> Although "tickless" is a misnomer.
> 
> > So you are trying to find MP machines
> > where TSC is not useable for delay?
> 
> Right.  Those are the only machines where it's relevant to consider
> the accuracy of acpitimer_delay() or acpihpet_delay()... unless I've
> forgotten something.
> 
> > TSC is only considered for delay if the invariant and constant flags
> > are set.
> > invariant:
> > "In the Core i7 and future processor generations, the TSC will continue
> > to run in the deepest C-states. Therefore, the TSC will run at a
> > constant rate in all ACPI P-, C-. and T-states. Support for this feature
> > is indicated by CPUID.0x8000_0007.EDX[8]. On processors with invariant
> > TSC support, the OS may use the TSC for wall clock timer services
> > (instead of ACPI or HPET timers). TSC reads are much more efficient and
> > do not incur the overhead associated with a ring transition or access to
> > a platform resource."
> > 
> > constant:
> > runs at a constant rate across frequency/P 

Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-16 Thread Scott Cheloha
On Wed, Aug 17, 2022 at 02:28:14PM +1000, Jonathan Gray wrote:
> On Tue, Aug 16, 2022 at 11:53:51AM -0500, Scott Cheloha wrote:
> > On Sun, Aug 14, 2022 at 11:24:37PM -0500, Scott Cheloha wrote:
> > > 
> > > In the future when the LAPIC timer is run in oneshot mode there will
> > > be no lapic_delay().
> > > 
> > > [...]
> > > 
> > > This is *very* bad for older amd64 machines, because you are left with
> > > i8254_delay().
> > > 
> > > I would like to offer a less awful delay(9) implementation for this
> > > class of hardware.  Otherwise we may trip over bizarre phantom bugs on
> > > MP kernels because only one CPU can read the i8254 at a time.
> > > 
> > > [...]
> > > 
> > > Real i386 hardware should be fine.  Later models with an ACPI PM timer
> > > will be fine using acpitimer_delay() instead of i8254_delay().
> > > 
> > > [...]
> > > 
> > > Here are the sample measurements from my 2017 laptop (kaby lake
> > > refresh) running the attached patch.  It takes longer than a
> > > microsecond to read either of the ACPI timers.  The PM timer is better
> > > than the HPET.  The HPET is a bit better than the i8254.  I hope the
> > > numbers are a little better on older hardware.
> > > 
> > > acpitimer_test_delay:  expected  0.01000  actual  0.10638  error  
> > > 0.09638
> > > acpitimer_test_delay:  expected  0.1  actual  0.15464  error  
> > > 0.05464
> > > acpitimer_test_delay:  expected  0.00010  actual  0.000107619  error  
> > > 0.07619
> > > acpitimer_test_delay:  expected  0.00100  actual  0.001007275  error  
> > > 0.07275
> > > acpitimer_test_delay:  expected  0.01000  actual  0.010007891  error  
> > > 0.07891
> > > 
> > > acpihpet_test_delay:   expected  0.01000  actual  0.22208  error  
> > > 0.21208
> > > acpihpet_test_delay:   expected  0.1  actual  0.31690  error  
> > > 0.21690
> > > acpihpet_test_delay:   expected  0.00010  actual  0.000112647  error  
> > > 0.12647
> > > acpihpet_test_delay:   expected  0.00100  actual  0.001021480  error  
> > > 0.21480
> > > acpihpet_test_delay:   expected  0.01000  actual  0.010013736  error  
> > > 0.13736
> > > 
> > > i8254_test_delay:  expected  0.01000  actual  0.40110  error  
> > > 0.39110
> > > i8254_test_delay:  expected  0.1  actual  0.39471  error  
> > > 0.29471
> > > i8254_test_delay:  expected  0.00010  actual  0.000128031  error  
> > > 0.28031
> > > i8254_test_delay:  expected  0.00100  actual  0.001024586  error  
> > > 0.24586
> > > i8254_test_delay:  expected  0.01000  actual  0.010021859  error  
> > > 0.21859
> > 
> > Attched is an updated patch.  I left the test measurement code in
> > place because I would like to see a test on a real i386 machine, just
> > to make sure it works as expected.  I can't imagine why it wouldn't
> > work, but we should never assume anything.
> > 
> > Changes from v1:
> > 
> > - Actually set delay_func from acpitimerattach() and
> >   acpihpet_attach().
> > 
> >   I think it's safe to assume, on real hardware, that the ACPI PMT is
> >   preferable to the i8254 and the HPET is preferable to both of them.
> > 
> >   This is not *always* true, but it is true on the older machines that
> >   can't use tsc_delay(), so the assumption works in practice.
> > 
> >   Outside of those three timers, the hierarchy gets murky.  There are
> >   other timers that are better than the HPET, but they aren't always
> >   available.  If those timers are already providing delay_func this
> >   code does not usurp them.
> 
> As I understand it, you want lapic to be in one-shot mode for something
> along the lines of tickless.

Yes.

Although "tickless" is a misnomer.

> So you are trying to find MP machines
> where TSC is not useable for delay?

Right.  Those are the only machines where it's relevant to consider
the accuracy of acpitimer_delay() or acpihpet_delay()... unless I've
forgotten something.

> TSC is only considered for delay if the invariant and constant flags
> are set.
> invariant:
> "In the Core i7 and future processor generations, the TSC will continue
> to run in the deepest C-states. Therefore, the TSC will run at a
> constant rate in all ACPI P-, C-. and T-states. Support for this feature
> is indicated by CPUID.0x8000_0007.EDX[8]. On processors with invariant
> TSC support, the OS may use the TSC for wall clock timer services
> (instead of ACPI or HPET timers). TSC reads are much more efficient and
> do not incur the overhead associated with a ring transition or access to
> a platform resource."
> 
> constant:
> runs at a constant rate across frequency/P state changes
> 
> Intel constant
> (family == 0x0f && model >= 0x03) || (family == 0x06 && model >= 0x0e)
> 
> family 0x06 model 0x0e is yonah, core solo/duo
> Intel CPUID doc has
> "Intel Core Duo processor, Intel Core Solo processor, model 0Eh.
> All processors are manufactured using 

Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-16 Thread Jonathan Gray
On Tue, Aug 16, 2022 at 11:53:51AM -0500, Scott Cheloha wrote:
> On Sun, Aug 14, 2022 at 11:24:37PM -0500, Scott Cheloha wrote:
> > 
> > In the future when the LAPIC timer is run in oneshot mode there will
> > be no lapic_delay().
> > 
> > [...]
> > 
> > This is *very* bad for older amd64 machines, because you are left with
> > i8254_delay().
> > 
> > I would like to offer a less awful delay(9) implementation for this
> > class of hardware.  Otherwise we may trip over bizarre phantom bugs on
> > MP kernels because only one CPU can read the i8254 at a time.
> > 
> > [...]
> > 
> > Real i386 hardware should be fine.  Later models with an ACPI PM timer
> > will be fine using acpitimer_delay() instead of i8254_delay().
> > 
> > [...]
> > 
> > Here are the sample measurements from my 2017 laptop (kaby lake
> > refresh) running the attached patch.  It takes longer than a
> > microsecond to read either of the ACPI timers.  The PM timer is better
> > than the HPET.  The HPET is a bit better than the i8254.  I hope the
> > numbers are a little better on older hardware.
> > 
> > acpitimer_test_delay:  expected  0.01000  actual  0.10638  error  
> > 0.09638
> > acpitimer_test_delay:  expected  0.1  actual  0.15464  error  
> > 0.05464
> > acpitimer_test_delay:  expected  0.00010  actual  0.000107619  error  
> > 0.07619
> > acpitimer_test_delay:  expected  0.00100  actual  0.001007275  error  
> > 0.07275
> > acpitimer_test_delay:  expected  0.01000  actual  0.010007891  error  
> > 0.07891
> > 
> > acpihpet_test_delay:   expected  0.01000  actual  0.22208  error  
> > 0.21208
> > acpihpet_test_delay:   expected  0.1  actual  0.31690  error  
> > 0.21690
> > acpihpet_test_delay:   expected  0.00010  actual  0.000112647  error  
> > 0.12647
> > acpihpet_test_delay:   expected  0.00100  actual  0.001021480  error  
> > 0.21480
> > acpihpet_test_delay:   expected  0.01000  actual  0.010013736  error  
> > 0.13736
> > 
> > i8254_test_delay:  expected  0.01000  actual  0.40110  error  
> > 0.39110
> > i8254_test_delay:  expected  0.1  actual  0.39471  error  
> > 0.29471
> > i8254_test_delay:  expected  0.00010  actual  0.000128031  error  
> > 0.28031
> > i8254_test_delay:  expected  0.00100  actual  0.001024586  error  
> > 0.24586
> > i8254_test_delay:  expected  0.01000  actual  0.010021859  error  
> > 0.21859
> 
> Attched is an updated patch.  I left the test measurement code in
> place because I would like to see a test on a real i386 machine, just
> to make sure it works as expected.  I can't imagine why it wouldn't
> work, but we should never assume anything.
> 
> Changes from v1:
> 
> - Actually set delay_func from acpitimerattach() and
>   acpihpet_attach().
> 
>   I think it's safe to assume, on real hardware, that the ACPI PMT is
>   preferable to the i8254 and the HPET is preferable to both of them.
> 
>   This is not *always* true, but it is true on the older machines that
>   can't use tsc_delay(), so the assumption works in practice.
> 
>   Outside of those three timers, the hierarchy gets murky.  There are
>   other timers that are better than the HPET, but they aren't always
>   available.  If those timers are already providing delay_func this
>   code does not usurp them.

As I understand it, you want lapic to be in one-shot mode for something
along the lines of tickless.  So you are trying to find MP machines
where TSC is not useable for delay?

TSC is only considered for delay if the invariant and constant flags
are set.
invariant:
"In the Core i7 and future processor generations, the TSC will continue
to run in the deepest C-states. Therefore, the TSC will run at a
constant rate in all ACPI P-, C-. and T-states. Support for this feature
is indicated by CPUID.0x8000_0007.EDX[8]. On processors with invariant
TSC support, the OS may use the TSC for wall clock timer services
(instead of ACPI or HPET timers). TSC reads are much more efficient and
do not incur the overhead associated with a ring transition or access to
a platform resource."

constant:
runs at a constant rate across frequency/P state changes

Intel constant
(family == 0x0f && model >= 0x03) || (family == 0x06 && model >= 0x0e)

family 0x06 model 0x0e is yonah, core solo/duo
Intel CPUID doc has
"Intel Core Duo processor, Intel Core Solo processor, model 0Eh.
All processors are manufactured using the 65 nm process."

family 0x0f model 0x03
"Pentium 4 processor, Intel Xeon processor, Intel Celeron D processor.
All processors are model 03h and manufactured using the 90 nm process."

VIA constant
model >= 0x0f
model 0x0f is Nano

AMD constant
CPUIDEDX_ITSC set

invariant
CPUIDEDX_ITSC set

> 
> - Duplicate test measurement code from amd64/lapic.c into i386/lapic.c.
>   Will be removed in the committed version.
> 
> - Use bus_space_read_8() in acpihpet.c if it's 

Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-16 Thread Scott Cheloha
On Tue, Aug 16, 2022 at 11:53:51AM -0500, Scott Cheloha wrote:
> On Sun, Aug 14, 2022 at 11:24:37PM -0500, Scott Cheloha wrote:
> > 
> > In the future when the LAPIC timer is run in oneshot mode there will
> > be no lapic_delay().
> > 
> > [...]
> > 
> > This is *very* bad for older amd64 machines, because you are left with
> > i8254_delay().
> > 
> > I would like to offer a less awful delay(9) implementation for this
> > class of hardware.  Otherwise we may trip over bizarre phantom bugs on
> > MP kernels because only one CPU can read the i8254 at a time.
> > 
> > [...]
> > 
> > Real i386 hardware should be fine.  Later models with an ACPI PM timer
> > will be fine using acpitimer_delay() instead of i8254_delay().
> > 
> > [...]
> 
> Attched is an updated patch.  I left the test measurement code in
> place because I would like to see a test on a real i386 machine, just
> to make sure it works as expected.  I can't imagine why it wouldn't
> work, but we should never assume anything.
> 
> [...]
> 
> One remaining question I have:
> 
> Is there a nice way to test whether ACPI PMT support is compiled into
> the kernel?  We can assume the existence of i8254_delay() because
> clock.c is required on amd64 and i386.  However, acpitimer.c is a
> optional, so acpitimer_delay() isn't necessarily there.
> 
> I would rather not introduce a hard requirement on acpitimer.c into
> acpihpet.c if there's an easy way to check for the latter.
> 
> Any ideas?

And here's the cleaned up patch.  Just in case nobody tests i386.
Pretty straightforward.  acpitimer is preferable to i8254, hpet is
preferable to acpitimer and i8254.

The only obvious problem I see is the hard dependency this creates in
acpihpet.c on acpitimer.c.

Index: acpitimer.c
===
RCS file: /cvs/src/sys/dev/acpi/acpitimer.c,v
retrieving revision 1.15
diff -u -p -r1.15 acpitimer.c
--- acpitimer.c 6 Apr 2022 18:59:27 -   1.15
+++ acpitimer.c 17 Aug 2022 02:56:10 -
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -25,10 +26,13 @@
 #include 
 #include 
 
+struct acpitimer_softc;
+
 int acpitimermatch(struct device *, void *, void *);
 void acpitimerattach(struct device *, struct device *, void *);
-
+void acpitimer_delay(int);
 u_int acpi_get_timecount(struct timecounter *tc);
+uint32_t acpitimer_read(struct acpitimer_softc *);
 
 static struct timecounter acpi_timecounter = {
.tc_get_timecount = acpi_get_timecount,
@@ -98,18 +102,45 @@ acpitimerattach(struct device *parent, s
acpi_timecounter.tc_priv = sc;
acpi_timecounter.tc_name = sc->sc_dev.dv_xname;
tc_init(_timecounter);
+
+#if defined(__amd64__) || defined(__i386__)
+   if (delay_func == i8254_delay)
+   delay_func = acpitimer_delay;
+#endif
 #if defined(__amd64__)
extern void cpu_recalibrate_tsc(struct timecounter *);
cpu_recalibrate_tsc(_timecounter);
 #endif
 }
 
+void
+acpitimer_delay(int usecs)
+{
+   uint64_t count = 0, cycles;
+   struct acpitimer_softc *sc = acpi_timecounter.tc_priv;
+   uint32_t mask = acpi_timecounter.tc_counter_mask;
+   uint32_t val1, val2;
+
+   val2 = acpitimer_read(sc);
+   cycles = usecs * acpi_timecounter.tc_frequency / 100;
+   while (count < cycles) {
+   CPU_BUSY_CYCLE();
+   val1 = val2;
+   val2 = acpitimer_read(sc);
+   count += (val2 - val1) & mask;
+   }
+}
 
 u_int
 acpi_get_timecount(struct timecounter *tc)
 {
-   struct acpitimer_softc *sc = tc->tc_priv;
-   u_int u1, u2, u3;
+   return acpitimer_read(tc->tc_priv);
+}
+
+uint32_t
+acpitimer_read(struct acpitimer_softc *sc)
+{
+   uint32_t u1, u2, u3;
 
u2 = bus_space_read_4(sc->sc_iot, sc->sc_ioh, 0);
u3 = bus_space_read_4(sc->sc_iot, sc->sc_ioh, 0);
Index: acpihpet.c
===
RCS file: /cvs/src/sys/dev/acpi/acpihpet.c,v
retrieving revision 1.26
diff -u -p -r1.26 acpihpet.c
--- acpihpet.c  6 Apr 2022 18:59:27 -   1.26
+++ acpihpet.c  17 Aug 2022 02:56:10 -
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -31,7 +32,7 @@ int acpihpet_attached;
 int acpihpet_match(struct device *, void *, void *);
 void acpihpet_attach(struct device *, struct device *, void *);
 int acpihpet_activate(struct device *, int);
-
+void acpihpet_delay(int);
 u_int acpihpet_gettime(struct timecounter *tc);
 
 uint64_t   acpihpet_r(bus_space_tag_t _iot, bus_space_handle_t _ioh,
@@ -262,15 +263,37 @@ acpihpet_attach(struct device *parent, s
freq = 1000ull / period;
printf(": %lld Hz\n", freq);
 
-   hpet_timecounter.tc_frequency = (uint32_t)freq;
+   hpet_timecounter.tc_frequency = freq;
hpet_timecounter.tc_priv = sc;
hpet_timecounter.tc_name = sc->sc_dev.dv_xname;

Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-16 Thread Scott Cheloha
On Sun, Aug 14, 2022 at 11:24:37PM -0500, Scott Cheloha wrote:
> 
> In the future when the LAPIC timer is run in oneshot mode there will
> be no lapic_delay().
> 
> [...]
> 
> This is *very* bad for older amd64 machines, because you are left with
> i8254_delay().
> 
> I would like to offer a less awful delay(9) implementation for this
> class of hardware.  Otherwise we may trip over bizarre phantom bugs on
> MP kernels because only one CPU can read the i8254 at a time.
> 
> [...]
> 
> Real i386 hardware should be fine.  Later models with an ACPI PM timer
> will be fine using acpitimer_delay() instead of i8254_delay().
> 
> [...]
> 
> Here are the sample measurements from my 2017 laptop (kaby lake
> refresh) running the attached patch.  It takes longer than a
> microsecond to read either of the ACPI timers.  The PM timer is better
> than the HPET.  The HPET is a bit better than the i8254.  I hope the
> numbers are a little better on older hardware.
> 
> acpitimer_test_delay:  expected  0.01000  actual  0.10638  error  
> 0.09638
> acpitimer_test_delay:  expected  0.1  actual  0.15464  error  
> 0.05464
> acpitimer_test_delay:  expected  0.00010  actual  0.000107619  error  
> 0.07619
> acpitimer_test_delay:  expected  0.00100  actual  0.001007275  error  
> 0.07275
> acpitimer_test_delay:  expected  0.01000  actual  0.010007891  error  
> 0.07891
> 
> acpihpet_test_delay:   expected  0.01000  actual  0.22208  error  
> 0.21208
> acpihpet_test_delay:   expected  0.1  actual  0.31690  error  
> 0.21690
> acpihpet_test_delay:   expected  0.00010  actual  0.000112647  error  
> 0.12647
> acpihpet_test_delay:   expected  0.00100  actual  0.001021480  error  
> 0.21480
> acpihpet_test_delay:   expected  0.01000  actual  0.010013736  error  
> 0.13736
> 
> i8254_test_delay:  expected  0.01000  actual  0.40110  error  
> 0.39110
> i8254_test_delay:  expected  0.1  actual  0.39471  error  
> 0.29471
> i8254_test_delay:  expected  0.00010  actual  0.000128031  error  
> 0.28031
> i8254_test_delay:  expected  0.00100  actual  0.001024586  error  
> 0.24586
> i8254_test_delay:  expected  0.01000  actual  0.010021859  error  
> 0.21859

Attched is an updated patch.  I left the test measurement code in
place because I would like to see a test on a real i386 machine, just
to make sure it works as expected.  I can't imagine why it wouldn't
work, but we should never assume anything.

Changes from v1:

- Actually set delay_func from acpitimerattach() and
  acpihpet_attach().

  I think it's safe to assume, on real hardware, that the ACPI PMT is
  preferable to the i8254 and the HPET is preferable to both of them.

  This is not *always* true, but it is true on the older machines that
  can't use tsc_delay(), so the assumption works in practice.

  Outside of those three timers, the hierarchy gets murky.  There are
  other timers that are better than the HPET, but they aren't always
  available.  If those timers are already providing delay_func this
  code does not usurp them.

- Duplicate test measurement code from amd64/lapic.c into i386/lapic.c.
  Will be removed in the committed version.

- Use bus_space_read_8() in acpihpet.c if it's available.  The HPET is
  a 64-bit counter and the spec permits 32-bit or 64-bit aligned access.

  As one might predict, this cuts the overhead in half because we're
  doing half as many reads.

  This part can go into a separate commit, but I thought it was neat
  so I'm including it here.

One remaining question I have:

Is there a nice way to test whether ACPI PMT support is compiled into
the kernel?  We can assume the existence of i8254_delay() because
clock.c is required on amd64 and i386.  However, acpitimer.c is a
optional, so acpitimer_delay() isn't necessarily there.

I would rather not introduce a hard requirement on acpitimer.c into
acpihpet.c if there's an easy way to check for the latter.

Any ideas?

Anyone have i386 hardware results?  If I'm reading the timeline right,
most P6 machines and beyond (NetBurst, etc) will have an ACPI PMT.  I
don't know if any real x86 motherboards shipped with an HPET, but it's
possible.

Here are my updated results with the bus_space_read_8 change:

acpitimer_test_delay:  expected  0.01000  actual  0.10607  error  
0.09607
acpitimer_test_delay:  expected  0.1  actual  0.15491  error  
0.05491
acpitimer_test_delay:  expected  0.00010  actual  0.000107734  error  
0.07734
acpitimer_test_delay:  expected  0.00100  actual  0.001008006  error  
0.08006
acpitimer_test_delay:  expected  0.01000  actual  0.010007042  error  
0.07042

acpihpet_test_delay:   expected  0.01000  actual  0.13282  error  
0.12282
acpihpet_test_delay:   expected  0.1  actual  0.22743  error  
0.12743
acpihpet_test_delay:   

Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-15 Thread Mike Larkin
On Sun, Aug 14, 2022 at 11:24:37PM -0500, Scott Cheloha wrote:
> Hi,
>
> In the future when the LAPIC timer is run in oneshot mode there will
> be no lapic_delay().
>
> This is fine if you have a constant TSC, because we have tsc_delay().
>
> This is *very* bad for older amd64 machines, because you are left with
> i8254_delay().
>
> I would like to offer a less awful delay(9) implementation for this
> class of hardware.  Otherwise we may trip over bizarre phantom bugs on
> MP kernels because only one CPU can read the i8254 at a time.
>
> I think patrick@ was struggling with some version of that problem last
> year, but in a VM.
>
> Real i386 hardware should be fine.  Later models with an ACPI PM timer
> will be fine using acpitimer_delay() instead of i8254_delay().
>
> If this seems reasonable to people I will come back with a cleaned up
> patch for testing.
>
> Thoughts?  Preferences?
>

Seems reasonable to me. Would be interested to see the revised diff once you're
done.

-ml

> -Scott
>
> Here are the sample measurements from my 2017 laptop (kaby lake
> refresh) running the attached patch.  It takes longer than a
> microsecond to read either of the ACPI timers.  The PM timer is better
> than the HPET.  The HPET is a bit better than the i8254.  I hope the
> numbers are a little better on older hardware.
>
> acpitimer_test_delay:  expected  0.01000  actual  0.10638  error  
> 0.09638
> acpitimer_test_delay:  expected  0.1  actual  0.15464  error  
> 0.05464
> acpitimer_test_delay:  expected  0.00010  actual  0.000107619  error  
> 0.07619
> acpitimer_test_delay:  expected  0.00100  actual  0.001007275  error  
> 0.07275
> acpitimer_test_delay:  expected  0.01000  actual  0.010007891  error  
> 0.07891
>
> acpihpet_test_delay:   expected  0.01000  actual  0.22208  error  
> 0.21208
> acpihpet_test_delay:   expected  0.1  actual  0.31690  error  
> 0.21690
> acpihpet_test_delay:   expected  0.00010  actual  0.000112647  error  
> 0.12647
> acpihpet_test_delay:   expected  0.00100  actual  0.001021480  error  
> 0.21480
> acpihpet_test_delay:   expected  0.01000  actual  0.010013736  error  
> 0.13736
>
> i8254_test_delay:  expected  0.01000  actual  0.40110  error  
> 0.39110
> i8254_test_delay:  expected  0.1  actual  0.39471  error  
> 0.29471
> i8254_test_delay:  expected  0.00010  actual  0.000128031  error  
> 0.28031
> i8254_test_delay:  expected  0.00100  actual  0.001024586  error  
> 0.24586
> i8254_test_delay:  expected  0.01000  actual  0.010021859  error  
> 0.21859
>
> Index: dev/acpi/acpihpet.c
> ===
> RCS file: /cvs/src/sys/dev/acpi/acpihpet.c,v
> retrieving revision 1.26
> diff -u -p -r1.26 acpihpet.c
> --- dev/acpi/acpihpet.c   6 Apr 2022 18:59:27 -   1.26
> +++ dev/acpi/acpihpet.c   15 Aug 2022 04:21:58 -
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>
>  #include 
> @@ -31,8 +32,9 @@ int acpihpet_attached;
>  int acpihpet_match(struct device *, void *, void *);
>  void acpihpet_attach(struct device *, struct device *, void *);
>  int acpihpet_activate(struct device *, int);
> -
> +void acpiphet_delay(u_int);
>  u_int acpihpet_gettime(struct timecounter *tc);
> +void acpihpet_test_delay(u_int);
>
>  uint64_t acpihpet_r(bus_space_tag_t _iot, bus_space_handle_t _ioh,
>   bus_size_t _ioa);
> @@ -262,7 +264,7 @@ acpihpet_attach(struct device *parent, s
>   freq = 1000ull / period;
>   printf(": %lld Hz\n", freq);
>
> - hpet_timecounter.tc_frequency = (uint32_t)freq;
> + hpet_timecounter.tc_frequency = freq;
>   hpet_timecounter.tc_priv = sc;
>   hpet_timecounter.tc_name = sc->sc_dev.dv_xname;
>   tc_init(_timecounter);
> @@ -273,10 +275,43 @@ acpihpet_attach(struct device *parent, s
>   acpihpet_attached++;
>  }
>
> +void
> +acpihpet_delay(u_int usecs)
> +{
> + uint64_t d, s;
> + struct acpihpet_softc *sc = hpet_timecounter.tc_priv;
> +
> + d = usecs * hpet_timecounter.tc_frequency / 100;
> + s = acpihpet_r(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER);
> + while (acpihpet_r(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER) - s < d)
> + CPU_BUSY_CYCLE();
> +}
> +
>  u_int
>  acpihpet_gettime(struct timecounter *tc)
>  {
>   struct acpihpet_softc *sc = tc->tc_priv;
>
>   return (bus_space_read_4(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER));
> +}
> +
> +void
> +acpihpet_test_delay(u_int usecs)
> +{
> + struct timespec ac, er, ex, t0, t1;
> +
> + if (!acpihpet_attached) {
> + printf("%s: (no hpet attached)\n", __func__);
> + return;
> + }
> +
> + nanouptime();
> + acpihpet_delay(usecs);
> + nanouptime();
> + timespecsub(, , );
> + 

Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-15 Thread Jonathan Gray
On Sun, Aug 14, 2022 at 11:24:37PM -0500, Scott Cheloha wrote:
> Hi,
> 
> In the future when the LAPIC timer is run in oneshot mode there will
> be no lapic_delay().
> 
> This is fine if you have a constant TSC, because we have tsc_delay().
> 
> This is *very* bad for older amd64 machines, because you are left with
> i8254_delay().
> 
> I would like to offer a less awful delay(9) implementation for this
> class of hardware.  Otherwise we may trip over bizarre phantom bugs on
> MP kernels because only one CPU can read the i8254 at a time.

i8254 is often gated on Intel since around Comet Lake
which is why we started replacing delay_func with tsc_delay earlier

when delay was still 8254 in early boot the cpu_freq_ctr() which
gets the cpu speed with performance counter msrs and delay gave
a wrong result as delay didn't work as expected

sys/arch/amd64/amd64/identcpu.c
revision 1.119
date: 2021/08/31 15:11:54;  author: kettenis;  state: Exp;  lines: +4 -1;  
commitid: fyhbiR533m0v3XO9;
Use the TSC delay(9) backend earlier on machines where we can.  Also use
the TSC for delays even if there is a skew between the TSCs of the cores
as this doesn't matter for delay(9).

Gets rid of te unreasonable clock speed reports on Intel Tiget Lake CPUs
where the i8254 behaves in weird ways.

ok patrick@, deraadt@, mlarkin@


sys/arch/amd64/amd64/lapic.c
revision 1.55
date: 2019/08/03 14:57:51;  author: jcs;  state: Exp;  lines: +8 -4;  commitid: 
NTF41FCHEhE2NllV;
If the CPU frequency is available during TSC init, make it available
for lapic timer init to avoid calibrating against the 8254 clock.
Some newer machines are shipping with the 8254 clock gated for power
saving, so it may not be usable.

ok mlarkin
discussed with deraadt


On some Intel hardware getting the frequency out of cpuid for the tsc
isn't possible.  tsc.c 1.22 describes some of the mess

> 
> I think patrick@ was struggling with some version of that problem last
> year, but in a VM.

patrick moved pvbus probing so hv_delay would be used earlier

sys/arch/amd64/amd64/mainbus.c

revision 1.51
date: 2021/08/31 15:52:59;  author: patrick;  state: Exp;  lines: +1 -7;  
commitid: reA7Lsk1mKRhapKD;
Identify the paravirtual bus earlier, as we need to make sure that we have
a working delay func ready before the first occurence of delay().  This is
necessary on Hyper-V Gen 2 VMs where we don't use the TSC.

Discussed with the hackroom
ok kettenis@


> 
> Real i386 hardware should be fine.  Later models with an ACPI PM timer
> will be fine using acpitimer_delay() instead of i8254_delay().
> 
> If this seems reasonable to people I will come back with a cleaned up
> patch for testing.
> 
> Thoughts?  Preferences?
> 
> -Scott
> 
> Here are the sample measurements from my 2017 laptop (kaby lake
> refresh) running the attached patch.  It takes longer than a
> microsecond to read either of the ACPI timers.  The PM timer is better
> than the HPET.  The HPET is a bit better than the i8254.  I hope the
> numbers are a little better on older hardware.
> 
> acpitimer_test_delay:  expected  0.01000  actual  0.10638  error  
> 0.09638
> acpitimer_test_delay:  expected  0.1  actual  0.15464  error  
> 0.05464
> acpitimer_test_delay:  expected  0.00010  actual  0.000107619  error  
> 0.07619
> acpitimer_test_delay:  expected  0.00100  actual  0.001007275  error  
> 0.07275
> acpitimer_test_delay:  expected  0.01000  actual  0.010007891  error  
> 0.07891
> 
> acpihpet_test_delay:   expected  0.01000  actual  0.22208  error  
> 0.21208
> acpihpet_test_delay:   expected  0.1  actual  0.31690  error  
> 0.21690
> acpihpet_test_delay:   expected  0.00010  actual  0.000112647  error  
> 0.12647
> acpihpet_test_delay:   expected  0.00100  actual  0.001021480  error  
> 0.21480
> acpihpet_test_delay:   expected  0.01000  actual  0.010013736  error  
> 0.13736
> 
> i8254_test_delay:  expected  0.01000  actual  0.40110  error  
> 0.39110
> i8254_test_delay:  expected  0.1  actual  0.39471  error  
> 0.29471
> i8254_test_delay:  expected  0.00010  actual  0.000128031  error  
> 0.28031
> i8254_test_delay:  expected  0.00100  actual  0.001024586  error  
> 0.24586
> i8254_test_delay:  expected  0.01000  actual  0.010021859  error  
> 0.21859

cpu0: Intel(R) Core(TM)2 Duo CPU T7250 @ 2.00GHz, 2194.88 MHz, 06-0f-0d
cpu0: apic clock running at 199MHz
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpitimer_test_delay: expected 0.01000 actual 0.10267 error 0.09267
acpitimer_test_delay: expected 0.1 actual 0.18648 error 0.08648
acpitimer_test_delay: expected 0.00010 actual 0.000108254 error 0.08254
acpitimer_test_delay: expected 0.00100 actual