tsc: AMD Family 17h, 19h: compute frequency from MSRs

2022-09-29 Thread Scott Cheloha
This patch computes the TSC frequency for AMD family 17h and 19h CPUs
(Zen microarchitecture and up) from AMD-specific MSRs.  Computing the
TSC frequency is faster than calibrating with a separate timer and
introduces no error.

We already do this for Intel CPUs in tsc_freq_cpuid().

I got several successful test reports on family 17h and 19h CPUs in
response to this mail:

https://marc.info/?l=openbsd-tech&m=166394236029484&w=2

The details for computing the frequency are in the PPR for 17h and
19h, found here (page numbers are cited in the patch):

https://www.amd.com/system/files/TechDocs/55570-B1-PUB.zip
https://www.amd.com/system/files/TechDocs/56214-B0-PUB.zip

The process is slightly more complicated on AMD CPU families 10h-16h.
I will deal with them in a separate commit.

ok?

Index: include/specialreg.h
===
RCS file: /cvs/src/sys/arch/amd64/include/specialreg.h,v
retrieving revision 1.94
diff -u -p -r1.94 specialreg.h
--- include/specialreg.h30 Aug 2022 17:09:21 -  1.94
+++ include/specialreg.h29 Sep 2022 14:12:53 -
@@ -540,6 +540,10 @@
  */
 #define	MSR_HWCR	0xc0010015
 #define	HWCR_FFDIS	0x00000040
+#define	HWCR_TSCFREQSEL	0x01000000
+
+#define	MSR_PSTATEDEF(_n)	(0xc0010064 + (_n))
+#define	PSTATEDEF_EN	0x8000000000000000ULL
 
 #define	MSR_NB_CFG	0xc001001f
 #define	NB_CFG_DISIOREQLOCK	0x0000000000000004ULL
Index: amd64/tsc.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.29
diff -u -p -r1.29 tsc.c
--- amd64/tsc.c 22 Sep 2022 04:57:08 -  1.29
+++ amd64/tsc.c 29 Sep 2022 14:12:53 -
@@ -100,6 +100,67 @@ tsc_freq_cpuid(struct cpu_info *ci)
return (0);
 }
 
+uint64_t
+tsc_freq_msr(struct cpu_info *ci)
+{
+   uint64_t base, def, divisor, multiplier;
+
+   if (strcmp(cpu_vendor, "AuthenticAMD") != 0)
+   return 0;
+
+   /*
+* All 10h+ CPUs have Core::X86::Msr::HWCR and the TscFreqSel
+* bit.  If TscFreqSel hasn't been set, the TSC isn't advancing
+* at the core P0 frequency and we need to calibrate by hand.
+*/
+   if (ci->ci_family < 0x10)
+   return 0;
+   if (!ISSET(rdmsr(MSR_HWCR), HWCR_TSCFREQSEL))
+   return 0;
+
+   /*
+* In 10h+ CPUs, Core::X86::Msr::PStateDef defines the voltage
+* and frequency for each core P-state.  We want the P0 frequency.
+* If the En bit isn't set, the register doesn't define a valid
+* P-state.
+*/
+   def = rdmsr(MSR_PSTATEDEF(0));
+   if (!ISSET(def, PSTATEDEF_EN))
+   return 0;
+
+   switch (ci->ci_family) {
+   case 0x17:
+   case 0x19:
+   /*
+* PPR for AMD Family 17h [...]:
+* Models 01h,08h B2, Rev 3.03, pp. 33, 139-140
+* Model 18h B1, Rev 3.16, pp. 36, 143-144
+* Model 60h A1, Rev 3.06, pp. 33, 155-157
+* Model 71h B0, Rev 3.06, pp. 28, 150-151
+*
+* PPR for AMD Family 19h [...]:
+* Model 21h B0, Rev 3.05, pp. 33, 166-167
+*
+* OSRR for AMD Family 17h processors,
+* Models 00h-2Fh, Rev 3.03, pp. 130-131
+*/
+   base = 200000000;   /* 200.0 MHz */
+   divisor = (def >> 8) & 0x3f;
+   if (divisor <= 0x07 || divisor >= 0x2d)
+   return 0;   /* reserved */
+   if (divisor >= 0x1b && divisor % 2 == 1)
+   return 0;   /* reserved */
+   multiplier = def & 0xff;
+   if (multiplier <= 0x0f)
+   return 0;   /* reserved */
+   break;
+   default:
+   return 0;
+   }
+
+   return base * multiplier / divisor;
+}
+
 void
 tsc_identify(struct cpu_info *ci)
 {
@@ -118,6 +179,8 @@ tsc_identify(struct cpu_info *ci)
tsc_is_invariant = 1;
 
tsc_frequency = tsc_freq_cpuid(ci);
+   if (tsc_frequency == 0)
+   tsc_frequency = tsc_freq_msr(ci);
if (tsc_frequency > 0)
delay_init(tsc_delay, 5000);
 }



Re: [please test] tsc: derive frequency on AMD CPUs from MSRs

2022-09-23 Thread Scott Cheloha
On Fri, Sep 23, 2022 at 07:46:55PM -0600, Theo de Raadt wrote:
> > And it is the wrong time in the release cycle for this.
> 
> No kidding.
> 
> As this makes absolutely no difference for any existing code in 7.2,
> except the strong hazard of accidentally breaking a machine.

It does not need to make release.



Re: [please test] tsc: derive frequency on AMD CPUs from MSRs

2022-09-23 Thread Scott Cheloha
On Sat, Sep 24, 2022 at 11:06:24AM +1000, Jonathan Gray wrote:
> On Fri, Sep 23, 2022 at 09:16:25AM -0500, Scott Cheloha wrote:
> > [...]
> > 
> > The only missing piece is code to read the configuration space on
> > family 10h-16h CPUs to determine how many boosted P-states we need to
> > skip to get to the MSR describing the software P0 state.  I would
> > really appreciate it if someone could explain how to do this at this
> > very early point in boot.  jsg@ pointed me to pci_conf_read(9), but
> > I'm a little confused about how I get the needed pci* inputs at this
> > point in boot.
> 
> I also said you shouldn't be looking at pci devices for this.

Right, but the manual says that's where the information I want is
located.

I might be wrong, of course.  Can't know until I get a test on a CPU
in one of the relevant families.

> I remain unconvinced that all of this is worth it compared to
> calibrating off a timer with a known rate.

For Intel CPUs we use CPUID to determine the TSC frequency where the
leaf is available.  It seems "fair," for lack of a better word, to
make an effort to do the same for AMD CPUs.

The other available timers with known frequencies are not great and
they might be getting worse.  The ISA timer is heavily gated out of
the box on many contemporary machines where it is available.  You can
toggle the gating in the BIOS for now.  The PM Timer and the HPET have
been slow to read for years.

I doubt these timers will improve.  At minimum, I think it's safe to
say that they are not a priority.  They are considered "legacy"
hardware, and you know what happens to the legacy stuff.

Calibrating the TSC with one of these other timers introduces error:

1. jmc@'s machine (-0.187% error):

cpu0: MSR C001_0064: en 1 base 200000000 mul 100 div 10 freq 2000000000 Hz
tsc: calibrating with acpihpet0: 1996260074 Hz

2. robert@'s machine (-0.187% error):

cpu0: MSR C001_0064: en 1 base 200000000 mul 156 div 8 freq 3900000000 Hz
tsc: calibrating with acpihpet0: 3892696616 Hz

3. Timo Myrra's machine (-0.187% error):

cpu0: MSR C001_0064: en 1 base 200000000 mul 100 div 10 freq 2000000000 Hz
tsc: calibrating with acpihpet0: 1996264149 Hz

The calibration code can be improved, and I have a patch waiting in
the wings which does so, but you can't beat just *knowing* the
frequency.

... I think we need to make the TSC "just work" in as many contexts as
possible, especially on newer machines.

> And it is the wrong time in the release cycle for this.

This doesn't need to make release, I'm just gauging interest and
testing code.

> Boost could be disabled for the measurement if need by.
> 
> AMD64 Architecture Programmer's Manual
> Volume 2: System Programming
> Publication No. 24593
> Revision 3.38
> 
> "17.2 Core Performance Boost
> ...
> CPB can be disabled using the CPBDis field of the Hardware Configuration
> Register (HWCR MSR) on the appropriate core. When CPB is disabled,
> hardware limits the frequency and voltage of the core to those defined
> by P0.
> 
> Support for core performance boost is indicated by
> CPUID Fn8000_0007_EDX[CPB] = 1."
> 
> "3.2.10 Hardware Configuration Register (HWCR)
> ...
> CpbDis. Bit 25. Core performance boost disable. When set to 1, core 
> performance boost is disabled.
> "
> 
> Processor Programming Reference (PPR)
> for AMD Family 17h Model 01h, Revision B1 Processors
> 54945 Rev 1.14 - April 15, 2017
> 
> "MSRC001_0015 [Hardware Configuration] (HWCR)
> 
> 25 CpbDis: core performance boost disable. Read-write.
> Reset: 0.  0=CPB is requested to be enabled.  1=CPB is disabled.
> Specifies whether core performance boost is requested to be enabled or
> disabled. If core performance boost is disabled while a core is in a
> boosted P-state, the core automatically transitions to the highest
> performance non-boosted P-state."
> 
> [...]

(Caveat: I might be wrong.)

I believe this is only a toggle for whether the CPU can enter or
remain in a boosted P-state.  I do not think that toggling the feature
on or off rewrites the P-state voltage/frequency MSRs on the fly.
Toggled on or off, we will still need a way to
programmatically decide whether a given MSR describes a boosted
P-state or P0.

I have a line on a Sempron machine (family 10h) south of Austin, TX,
$100.  If it works when I pick it up I will probably have it set up to
test within a few days.



Re: [please test] tsc: derive frequency on AMD CPUs from MSRs

2022-09-23 Thread Scott Cheloha
On Fri, Sep 23, 2022 at 10:40:19PM +0300, Timo Myyrä wrote:
> Scott Cheloha  [2022-09-23, 09:16 -0500]:
> 
> > [...]
> >
> > Test results?  Clues on reading the configuration space?
> >
> > [...]
> 
> Hi,
> 
> Here's a dmesg from thinkpad e485:

Thanks for testing.

> Does these timers affect the booting of kernel? Once I select the kernel
> to boot by pressing enter on "bsd>" line, the boot process takes about
> 18s to proceed from the "booting sr0a:/bsd".

The patch reads a couple MSRs and prints ~10 additional lines during
boot from the primary CPU.  The computed TSC frequency is not used by
the kernel, only printed so I can check whether my code is correct.

It should have zero impact on the length of the boot.  It should not
change any runtime behavior whatsoever.

Your boot probably should not be taking that long, but I can't imagine
how my patch would cause such a dramatic change.

If you reverse the patch, what happens?

> OpenBSD 7.2 (GENERIC.MP) #20: Fri Sep 23 22:27:31 EEST 2022
> t...@asteroid.bittivirhe.fi:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> [...]
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: MSR C001_0064: en 1 base 200000000 mul 100 div 10 freq 2000000000 Hz
> cpu0: MSR C001_0065: en 1 base 200000000 mul 102 div 12 freq 1700000000 Hz
> cpu0: MSR C001_0066: en 1 base 200000000 mul 96 div 12 freq 1600000000 Hz
> cpu0: MSR C001_0067: en 0
> cpu0: MSR C001_0068: en 0
> cpu0: MSR C001_0069: en 0
> cpu0: MSR C001_006A: en 0
> cpu0: MSR C001_006B: en 0
> cpu0: AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx, 1996.30 MHz, 17-11-00
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu0: 32KB 64b/line 8-way D-cache, 64KB 64b/line 4-way I-cache, 512KB 
> 64b/line 8-way L2 cache, 4MB 64b/line 16-way L3 cache
> tsc: calibrating with acpihpet0: 1996264149 Hz

Your family 17h CPU has a computed P0 frequency of 2000MHz.  The
calibrated TSC frequency is 1996264149 Hz.

That seems right to me, thank you for testing.



[please test] tsc: derive frequency on AMD CPUs from MSRs

2022-09-23 Thread Scott Cheloha
Hi,

TL;DR:

I want to compute the TSC frequency on AMD CPUs using the methods laid
out in the AMD manuals instead of calibrating the TSC by hand.

If you have an AMD CPU with an invariant TSC, please apply this patch,
recompile/boot the resulting kernel, and send me the resulting dmesg.

Family 10h-16h CPUs are especially interesting.  If you've got one,
don't be shy!

Long explanation:

On AMD CPUs we calibrate the TSC with a separate timer.  This is slow
and introduces error.  I also worry about a future where legacy timers
are absent or heavily gated (read: useless).

This patch adds most of the code needed to compute the TSC frequency
on AMD family 10h+ CPUs.  CPUs prior to family 10h did not support an
invariant TSC so they are irrelevant.

I have riddled the code with printf(9) calls so I can work out what's
wrong by hand if a test result makes no sense.

The only missing piece is code to read the configuration space on
family 10h-16h CPUs to determine how many boosted P-states we need to
skip to get to the MSR describing the software P0 state.  I would
really appreciate it if someone could explain how to do this at this
very early point in boot.  jsg@ pointed me to pci_conf_read(9), but
I'm a little confused about how I get the needed pci* inputs at this
point in boot.

--

Test results?  Clues on reading the configuration space?

-Scott

Index: tsc.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.29
diff -u -p -r1.29 tsc.c
--- tsc.c   22 Sep 2022 04:57:08 -  1.29
+++ tsc.c   23 Sep 2022 14:04:22 -
@@ -100,6 +100,253 @@ tsc_freq_cpuid(struct cpu_info *ci)
return (0);
 }
 
+uint64_t
+tsc_freq_msr(struct cpu_info *ci)
+{
+   uint64_t base, def, did, did_lsd, did_msd, divisor, fid, multiplier;
+   uint32_t msr, off = 0;
+
+   if (strcmp(cpu_vendor, "AuthenticAMD") != 0)
+   return 0;
+
+   /*
+* All family 10h+ CPUs have MSR_HWCR and the TscFreqSel bit.
+* If TscFreqSel is not set the TSC does not advance at the P0
+* frequency, in which case something is wrong and we need to
+* calibrate by hand.
+*/
+#define HWCR_TSCFREQSEL (1 << 24)
+   if (!ISSET(rdmsr(MSR_HWCR), HWCR_TSCFREQSEL))   /* XXX specialreg.h */
+   return 0;
+#undef HWCR_TSCFREQSEL
+
+   /*
+* For families 10h, 12h, 14h, 15h, and 16h, we need to skip past
+* the boosted P-states (Pb0, Pb1, etc.) to find the MSR describing
+* P0, i.e. the highest performance unboosted P-state.  The number
+* of boosted states is kept in the "Core Performance Boost Control"
+* configuration space register.
+*/
+#ifdef __not_yet__
+   uint32_t reg;
+   switch (ci->ci_family) {
+   case 0x10:
+   /* XXX How do I read config space at this point in boot? */
+   reg = read_config_space(F4x15C);
+   off = (reg >> 2) & 0x1;
+   break;
+   case 0x12:
+   case 0x14:
+   case 0x15:
+   case 0x16:
+   /* XXX How do I read config space at this point in boot? */
+   reg = read_config_space(D18F4x15C);
+   off = (reg >> 2) & 0x7;
+   break;
+   default:
+   break;
+   }
+#endif
+
+/* DEBUG Let's look at all the MSRs to check my math. */
+for (; off < 8; off++) {
+
+   /*
+* In family 10h+, core P-state voltage/frequency definitions
+* are kept in MSRs C001_006[4:B] (eight registers in total).
+* All MSRs in the range are readable, but if the EN bit isn't
+* set the register doesn't define a valid P-state.
+*/
+   msr = 0xc0010064 + off; /* XXX specialreg.h */
+   def = rdmsr(msr);
+   printf("%s: MSR %04X_%04X: en %d",
+   ci->ci_dev->dv_xname, msr >> 16, msr & 0xffff,
+   !!ISSET(def, 1ULL << 63));
+   if (!ISSET(def, 1ULL << 63)) {  /* XXX specialreg.h */
+   printf("\n");
+   continue;
+   }
+   switch (ci->ci_family) {
+   case 0x10:
+   /* AMD Family 10h Processor BKDG, Rev 3.62, p. 429 */
+   base = 100000000;   /* 100.0 MHz */
+   did = (def >> 6) & 0x7;
+   divisor = 1ULL << did;
+   fid = def & 0x1f;
+   multiplier = fid + 0x10;
+   printf(" base %llu did %llu div %llu fid %llu mul %llu",
+   base, did, divisor, fid, multiplier);
+   break;
+   case 0x11:
+   /* AMD Family 11h Processor BKDG, Rev 3.62, p. 236 */
+   base = 100000000;   /* 100.0 MHz */
+   did = (def >> 6) & 0x7;
+   divisor = 1ULL << did;
+   fid = def & 0x1f;
+   multiplier = fid + 0x8;
+   printf(" base %llu did %llu div %llu fid %llu mul %llu",
+   base, did, 

Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-09-12 Thread Scott Cheloha
On Thu, Sep 08, 2022 at 08:24:11AM -0500, Scott Cheloha wrote:
> On Thu, Sep 08, 2022 at 05:52:43AM +0300, Pavel Korovin wrote:
> > On 09/07, Scott Cheloha wrote:
> > > Just to make sure that my changes to acpihpet(4) actually caused
> > > the problem, I have a few more questions:
> > > 
> > > 1. When did you change the OS type?
> > 
> > 03 August, after that, there was a local snpashot built from sources
> > fetched on 17 Aug 2022 22:12:57 +0300 which wasn't affected.
> > 
> > > 2. What was the compilation date of the kernel where you first saw the
> > >problem?
> > 
> > Locally built snapshot from sources fetched on Wed, 31 Aug 2022 02:05:34
> > +0300.
> >  
> > > 3. If you boot an unpatched kernel does the problem manifest in
> > >exactly the same way?
> >  
> > As I said, I mistakenly changed the OS type not to "FreeBSD Pre-11
> > versions (32-bit)", but to "FreeBSD 11 (32-bit)".
> > The problem affects only VMs which have Guest OS Version set to "FreeBSD
> > Pre-11 versions (32-bit)" on snapshots built after 31 Aug 2022.
> > 
> > Sample outputs from the machine running older snapshot which is not
> > affected:
> > 
> > $ sysctl kern.version | head -n1
> > kern.version=OpenBSD 7.2-beta (GENERIC.MP) #1: Thu Aug 18 15:15:13 MSK
> > 2022
> > 
> > $ sysctl | grep tsc
> > kern.timecounter.hardware=tsc
> > kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000)
> > acpitimer0(1000)
> > machdep.tscfreq=1500017850
> > machdep.invarianttsc=1
> 
> Please send a full bug report to b...@openbsd.org.
> 
> Include full a dmesg for the affected machine.
> 
> Something else with VMWare is involved here and I'm not enough of an
> expert to say what.  More eyes will be helpful.
> 
> Your problem is *probably* caused by my changes, but there are too
> many moving parts with all the different VMWare configurations for me
> to narrow down what the issue is definitively.

I have committed the acpihpet(4) fix.

Please send a full bug report to b...@openbsd.org so we can continue
poking at this.  I have some test code I'd like you to try out so we
can get a better look at the TSC calibration process on your machine.



sparc64: 32-bit compatibility cleanup

2022-09-11 Thread Scott Cheloha
kettenis@ suggested in a different thread that we ought to clean up
the 32-bit compatibility cruft in the sparc64 machine headers before
it would be safe to move the clockframe definition into frame.h:

https://marc.info/?l=openbsd-tech&m=166179164008301&w=2

> We really should be getting rid of the xxx32 stuff and rename the
> xxx64 ones to xxx.  And move trapframe (and possibly rwindow) to
> frame.h.

miod@ came forward in private and offered the attached patch to do so.

I don't have a sparc64 machine so I can't test it.  But if this
cleanup is indeed a necessary step to consolidating the clockframe
definitions I guess I can just ask:

Does this patch work for everyone?  Can we go ahead with this?

Index: dev/creator.c
===
RCS file: /OpenBSD/src/sys/arch/sparc64/dev/creator.c,v
retrieving revision 1.55
diff -u -p -r1.55 creator.c
--- dev/creator.c   15 Jul 2022 17:57:26 -  1.55
+++ dev/creator.c   30 Aug 2022 18:33:27 -
@@ -33,8 +33,9 @@
 #include 
 #include 
 
-#include 
 #include 
+#include 
+#include 
 #include 
 
 #include 
Index: fpu/fpu.c
===
RCS file: /OpenBSD/src/sys/arch/sparc64/fpu/fpu.c,v
retrieving revision 1.21
diff -u -p -r1.21 fpu.c
--- fpu/fpu.c   19 Aug 2020 10:10:58 -  1.21
+++ fpu/fpu.c   30 Aug 2022 18:33:27 -
@@ -81,22 +81,22 @@
 #include 
 
 int fpu_regoffset(int, int);
-int fpu_insn_fmov(struct fpstate64 *, struct fpemu *, union instr);
-int fpu_insn_fabs(struct fpstate64 *, struct fpemu *, union instr);
-int fpu_insn_fneg(struct fpstate64 *, struct fpemu *, union instr);
+int fpu_insn_fmov(struct fpstate *, struct fpemu *, union instr);
+int fpu_insn_fabs(struct fpstate *, struct fpemu *, union instr);
+int fpu_insn_fneg(struct fpstate *, struct fpemu *, union instr);
 int fpu_insn_itof(struct fpemu *, union instr, int, int *,
 int *, u_int *);
 int fpu_insn_ftoi(struct fpemu *, union instr, int *, int, u_int *);
 int fpu_insn_ftof(struct fpemu *, union instr, int *, int *, u_int *);
 int fpu_insn_fsqrt(struct fpemu *, union instr, int *, int *, u_int *);
-int fpu_insn_fcmp(struct fpstate64 *, struct fpemu *, union instr, int);
+int fpu_insn_fcmp(struct fpstate *, struct fpemu *, union instr, int);
 int fpu_insn_fmul(struct fpemu *, union instr, int *, int *, u_int *);
 int fpu_insn_fmulx(struct fpemu *, union instr, int *, int *, u_int *);
 int fpu_insn_fdiv(struct fpemu *, union instr, int *, int *, u_int *);
 int fpu_insn_fadd(struct fpemu *, union instr, int *, int *, u_int *);
 int fpu_insn_fsub(struct fpemu *, union instr, int *, int *, u_int *);
-int fpu_insn_fmovcc(struct proc *, struct fpstate64 *, union instr);
-int fpu_insn_fmovr(struct proc *, struct fpstate64 *, union instr);
+int fpu_insn_fmovcc(struct proc *, struct fpstate *, union instr);
+int fpu_insn_fmovr(struct proc *, struct fpstate *, union instr);
 void fpu_fcopy(u_int *, u_int *, int);
 
 #ifdef DEBUG
@@ -115,7 +115,7 @@ fpu_dumpfpn(struct fpn *fp)
fp->fp_mant[2], fp->fp_mant[3], fp->fp_exp);
 }
 void
-fpu_dumpstate(struct fpstate64 *fs)
+fpu_dumpstate(struct fpstate *fs)
 {
int i;
 
@@ -189,7 +189,7 @@ fpu_fcopy(src, dst, type)
 void
 fpu_cleanup(p, fs)
register struct proc *p;
-   register struct fpstate64 *fs;
+   register struct fpstate *fs;
 {
register int i, fsr = fs->fs_fsr, error;
union instr instr;
@@ -455,7 +455,7 @@ fpu_execute(p, fe, instr)
  */
 int
 fpu_insn_fmov(fs, fe, instr)
-   struct fpstate64 *fs;
+   struct fpstate *fs;
struct fpemu *fe;
union instr instr;
 {
@@ -478,7 +478,7 @@ fpu_insn_fmov(fs, fe, instr)
  */
 int
 fpu_insn_fabs(fs, fe, instr)
-   struct fpstate64 *fs;
+   struct fpstate *fs;
struct fpemu *fe;
union instr instr;
 {
@@ -502,7 +502,7 @@ fpu_insn_fabs(fs, fe, instr)
  */
 int
 fpu_insn_fneg(fs, fe, instr)
-   struct fpstate64 *fs;
+   struct fpstate *fs;
struct fpemu *fe;
union instr instr;
 {
@@ -644,7 +644,7 @@ fpu_insn_fsqrt(fe, instr, rdp, rdtypep, 
  */
 int
 fpu_insn_fcmp(fs, fe, instr, cmpe)
-   struct fpstate64 *fs;
+   struct fpstate *fs;
struct fpemu *fe;
union instr instr;
int cmpe;
@@ -848,7 +848,7 @@ fpu_insn_fsub(fe, instr, rdp, rdtypep, s
 int
 fpu_insn_fmovcc(p, fs, instr)
struct proc *p;
-   struct fpstate64 *fs;
+   struct fpstate *fs;
union instr instr;
 {
int rtype, rd, rs, cond;
@@ -900,7 +900,7 @@ fpu_insn_fmovcc(p, fs, instr)
 int
 fpu_insn_fmovr(p, fs, instr)
struct proc *p;
-   struct fpstate64 *fs;
+   struct fpstate *fs;
union instr instr;
 {
int rtype, rd, rs2, rs1;
Index: fpu/fpu_emu.h
===
RCS file: /OpenBSD/src/sys/arch/sparc64/fpu/fpu_emu.h,v
retrieving revision 1.5
diff -u -p -r1.5 fpu_emu.h
--- 

amd64, i386: lapic_calibrate_timer: panic if timer calibration fails

2022-09-10 Thread Scott Cheloha
Hi,

In lapic_calibrate_timer() we only conditionally decide to use the
lapic timer as our interrupt clock.  That is, lapic timer calibration
can fail and the system will boot anyway.

If after measuring the lapic timer frequency we somehow come up with
zero hertz, we do *not* set initclock_func to lapic_initclocks().
Here's the relevant bits from amd64/lapic.c:

   554  skip_calibration:
   555  printf("%s: apic clock running at %dMHz\n",
   556  ci->ci_dev->dv_xname, lapic_per_second / (1000 * 1000));
   557  
   558  if (lapic_per_second != 0) {

  [...] /* (skip ahead a bit...) */

   588  /*
   589   * Now that the timer's calibrated, use the apic timer 
routines
   590   * for all our timing needs..
   591   */
   592  delay_init(lapic_delay, 3000);
   593  initclock_func = lapic_initclocks;
   594  }
   595  }

Line 558.  The corresponding code is identical in i386/lapic.c.

I went ahead and tried it on amd64.  If you force lapic_per_second to
zero the system still boots, but the secondary CPUs just sit idle.
lapic_tval is zero, so when they call lapic_startclock() from
cpu_hatch(), nothing happens.  The i8254 still sends clock interrupts
to CPU0, though, so the system runs in a oddball state where one
processor is doing all the work.

I don't think that this is the intended behavior.  I think this is
just an oversight left over from some older code.  It would be a lot
more sensible to just panic if lapic_per_second is zero here.  Patch
attached.

If a bunch of you prefer to develop a more elaborate fallback scheme
where we don't hatch the secondary CPUs in the event that lapic timer
calibration fails, we could explore that later.  But for now I would
prefer to panic and try to spotlight the problem if it ever occurs in
the wild.

If this change is too risky -- maybe I am breaking someone's weird
setup? -- I can wait until after release.

Thoughts?  Preferences?

Index: amd64/amd64/lapic.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/lapic.c,v
retrieving revision 1.63
diff -u -p -r1.63 lapic.c
--- amd64/amd64/lapic.c 10 Sep 2022 01:30:14 -  1.63
+++ amd64/amd64/lapic.c 10 Sep 2022 01:59:52 -
@@ -555,43 +555,44 @@ skip_calibration:
printf("%s: apic clock running at %dMHz\n",
ci->ci_dev->dv_xname, lapic_per_second / (1000 * 1000));
 
-   if (lapic_per_second != 0) {
-   /*
-* reprogram the apic timer to run in periodic mode.
-* XXX need to program timer on other cpu's, too.
-*/
-   lapic_tval = (lapic_per_second * 2) / hz;
-   lapic_tval = (lapic_tval / 2) + (lapic_tval & 0x1);
-
-   lapic_timer_periodic(LAPIC_LVTT_M, lapic_tval);
-
-   /*
-* Compute fixed-point ratios between cycles and
-* microseconds to avoid having to do any division
-* in lapic_delay.
-*/
-
-   tmp = (1000000 * (u_int64_t)1 << 32) / lapic_per_second;
-   lapic_frac_usec_per_cycle = tmp;
-
-   tmp = (lapic_per_second * (u_int64_t)1 << 32) / 1000000;
-
-   lapic_frac_cycle_per_usec = tmp;
-
-   /*
-* Compute delay in cycles for likely short delays in usec.
-*/
-   for (i = 0; i < 26; i++)
-   lapic_delaytab[i] = (lapic_frac_cycle_per_usec * i) >>
-   32;
-
-   /*
-* Now that the timer's calibrated, use the apic timer routines
-* for all our timing needs..
-*/
-   delay_init(lapic_delay, 3000);
-   initclock_func = lapic_initclocks;
-   }
+   if (lapic_per_second == 0)
+   panic("%s: apic timer calibration failed", __func__);
+
+   /*
+* reprogram the apic timer to run in periodic mode.
+* XXX need to program timer on other cpu's, too.
+*/
+   lapic_tval = (lapic_per_second * 2) / hz;
+   lapic_tval = (lapic_tval / 2) + (lapic_tval & 0x1);
+
+   lapic_timer_periodic(LAPIC_LVTT_M, lapic_tval);
+
+   /*
+* Compute fixed-point ratios between cycles and
+* microseconds to avoid having to do any division
+* in lapic_delay.
+*/
+
+   tmp = (1000000 * (u_int64_t)1 << 32) / lapic_per_second;
+   lapic_frac_usec_per_cycle = tmp;
+
+   tmp = (lapic_per_second * (u_int64_t)1 << 32) / 1000000;
+
+   lapic_frac_cycle_per_usec = tmp;
+
+   /*
+* Compute delay in cycles for likely short delays in usec.
+*/
+   for (i = 0; i < 26; i++)
+   lapic_delaytab[i] = (lapic_frac_cycle_per_usec * i) >>
+   32;
+
+   /*
+* Now that the timer's 

top(1): remove last vestiges of "last pid"

2022-09-09 Thread Scott Cheloha
millert@ removed most of the "last pid" pieces from top(1) in 1997:

http://cvsweb.openbsd.org/src/usr.bin/top/machine.c?rev=1.7&content-type=text/x-cvsweb-markup

Some small bits remain, though.  Can we remove the rest?

Index: display.c
===
RCS file: /cvs/src/usr.bin/top/display.c,v
retrieving revision 1.66
diff -u -p -r1.66 display.c
--- display.c   8 Aug 2022 16:54:09 -   1.66
+++ display.c   10 Sep 2022 01:09:21 -
@@ -234,7 +234,7 @@ format_uptime(char *buf, size_t buflen)
 
 
 void
-i_loadave(pid_t mpid, double *avenrun)
+i_loadave(double *avenrun)
 {
if (screen_length > 1 || !smart_terminal) {
int i;
@@ -243,10 +243,6 @@ i_loadave(pid_t mpid, double *avenrun)
clrtoeol();
 
addstrp("load averages");
-   /* mpid == -1 implies this system doesn't have an _mpid */
-   if (mpid != -1)
-   printwp("last pid: %5ld;  ", (long) mpid);
-
for (i = 0; i < 3; i++)
printwp("%c %5.2f", i == 0 ? ':' : ',', avenrun[i]);
}
Index: display.h
===
RCS file: /cvs/src/usr.bin/top/display.h,v
retrieving revision 1.15
diff -u -p -r1.15 display.h
--- display.h   17 Nov 2018 23:10:08 -  1.15
+++ display.h   10 Sep 2022 01:09:21 -
@@ -35,7 +35,7 @@
 
 /* prototypes */
 int display_resize(void);
-void i_loadave(int, double *);
+void i_loadave(double *);
 void u_loadave(int, double *);
 void i_timeofday(time_t *);
 void i_procstates(int, int *, int);
Index: machine.c
===
RCS file: /cvs/src/usr.bin/top/machine.c,v
retrieving revision 1.111
diff -u -p -r1.111 machine.c
--- machine.c   22 Feb 2022 17:35:01 -  1.111
+++ machine.c   10 Sep 2022 01:09:21 -
@@ -306,7 +306,6 @@ get_system_info(struct system_info *si)
si->cpustates = cpu_states;
si->cpuonline = cpu_online;
si->memory = memory_stats;
-   si->last_pid = -1;
 }
 
 static struct handle handle;
Index: machine.h
===
RCS file: /cvs/src/usr.bin/top/machine.h,v
retrieving revision 1.31
diff -u -p -r1.31 machine.h
--- machine.h   26 Aug 2020 16:21:28 -  1.31
+++ machine.h   10 Sep 2022 01:09:21 -
@@ -49,7 +49,6 @@ struct statics {
  */
 
 struct system_info {
-   pid_t   last_pid;
double  load_avg[NUM_AVERAGES];
int p_total;
int p_active;   /* number of procs considered
Index: top.c
===
RCS file: /cvs/src/usr.bin/top/top.c,v
retrieving revision 1.106
diff -u -p -r1.106 top.c
--- top.c   26 Aug 2020 16:21:28 -  1.106
+++ top.c   10 Sep 2022 01:09:24 -
@@ -560,7 +560,7 @@ restart:
proc_compares[order_index]);
 
/* display the load averages */
-   i_loadave(system_info.last_pid, system_info.load_avg);
+   i_loadave(system_info.load_avg);
 
/* display the current time */
/* this method of getting the time SHOULD be fairly portable */



init(8): signal handler boolean needs to be "volatile"

2022-09-09 Thread Scott Cheloha
The variable "clang" is modified from a signal handler.  It should be
of type sig_atomic_t and it needs to be volatile.

ok?

Index: init.c
===
RCS file: /cvs/src/sbin/init/init.c,v
retrieving revision 1.71
diff -u -p -r1.71 init.c
--- init.c  24 Oct 2021 21:24:21 -  1.71
+++ init.c  9 Sep 2022 19:03:08 -
@@ -176,7 +176,8 @@ void setsecuritylevel(int);
 void setprocresources(char *);
 int getsecuritylevel(void);
 int setupargv(session_t *, struct ttyent *);
-int clang;
+
+volatile sig_atomic_t clang;
 
 void clear_session_logs(session_t *);
 



Re: [please test] pvclock(4): fix several bugs

2022-09-09 Thread Scott Cheloha
On Thu, Sep 08, 2022 at 10:17:12AM -0500, Scott Cheloha wrote:
> > On Sep 8, 2022, at 9:05 AM, Mike Larkin  wrote:
> > [...]
> > 
> > You could compile this and then objdump -D it and see for yourself...
> 
> I can't make heads or tails of it.  Please explain what I am looking
> at and why it is, or is not, atomic.

Or I guess I can just wing it.

Okay, so this C code:

volatile uint64_t pvclock_lastcount;

/* [...] */

uint64_t
pvclock_get_timecount(struct timecounter *tc)
{
uint64_t ctr, last;

/* [...] */

do {
last = pvclock_lastcount;
if (ctr < last)
return (last);
} while (atomic_cas_64(&pvclock_lastcount, last, ctr) != last);

return (ctr);
}

... yields this amd64 disassembly (from the linked bsd binary):

81de242b:   75 2b   jne81de2458 

81de242d:   eb 01   jmp81de2430 

81de242f:   cc  int3   
81de2430:   48 8b 0d 91 fc 6c 00mov7142545(%rip),%rcx   
 # 824b20c8 
81de2437:   48 87 d1xchg   %rdx,%rcx
81de243a:   48 39 d1cmp%rdx,%rcx
81de243d:   48 87 d1xchg   %rdx,%rcx
81de2440:   72 13   jb 81de2455 

81de2442:   48 89 c8mov%rcx,%rax
81de2445:   f0 48 0f b1 15 7a fclock cmpxchg %rdx,7142522(%rip) 
   # 824b20c8 
81de244c:   6c 00 
81de244e:   48 39 c8cmp%rcx,%rax
81de2451:   75 dd   jne81de2430 

81de2453:   eb 03   jmp81de2458 

81de2455:   48 8b d1mov%rcx,%rdx
81de2458:   89 d0   mov%edx,%eax
81de245a:   48 83 c4 08 add$0x8,%rsp
81de245e:   41 5e   pop%r14
81de2460:   c9  leaveq 
81de2461:   c3  retq

... and also yields this i386 disassembly (from the pvclock.o object):

 2c7:   75 2e   jne2f7 
 2c9:   eb 05   jmp2d0 
 2cb:   cc  int3   
 2cc:   cc  int3   
 2cd:   cc  int3   
 2ce:   cc  int3   
 2cf:   cc  int3   
 2d0:   8b 15 04 00 00 00   mov0x4,%edx
 2d6:   a1 00 00 00 00  mov0x0,%eax
 2db:   87 d8   xchg   %ebx,%eax
 2dd:   39 d8   cmp%ebx,%eax
 2df:   87 d8   xchg   %ebx,%eax
 2e1:   89 f1   mov%esi,%ecx
 2e3:   19 d1   sbb%edx,%ecx
 2e5:   72 0e   jb 2f5 
 2e7:   89 f1   mov%esi,%ecx
 2e9:   f0 0f c7 0d 00 00 00lock cmpxchg8b 0x0
 2f0:   00 
 2f1:   75 dd   jne2d0 
 2f3:   eb 02   jmp2f7 
 2f5:   8b d8   mov%eax,%ebx
 2f7:   89 d8   mov%ebx,%eax
 2f9:   83 c4 1cadd$0x1c,%esp
 2fc:   5e  pop%esi
 2fd:   5f  pop%edi
 2fe:   5b  pop%ebx
 2ff:   5d  pop%ebp
 300:   c3  ret

If we isolate the pvclock_lastcount loads, on amd64 we have:

81de2430:   48 8b 0d 91 fc 6c 00mov7142545(%rip),%rcx   
 # 824b20c8 

and on i386 we have:

 2d0:   8b 15 04 00 00 00   mov    0x4,%edx
 2d6:   a1 00 00 00 00  mov    0x0,%eax

so the 8-byte load is atomic on amd64 (one load) and non-atomic on
i386 (two loads).

I don't know what jsg@ meant when he said the ifdefs "seemed
unnecessary", but near as I can tell they are necessary.  I need an
atomic 8-byte load and i386 can't, or won't, do it.

So I guess we go back to my original patch.

This resolves kettenis@'s atomic_cas_64() objections because we no
longer need it.

So, once again, the patch in brief:

- Add rdtsc_lfence() to i386/include/cpufunc.h

- Make pvclock_lastcount volatile uint64_t to fix the
  non-PVCLOCK_FLAG_TSC_STABLE case (see sub.).

- Check for SSE2 support in pvclock_match(), we need it for LFENCE
  in pvclock_get_timecount().

- Do RDTSC as soon as possible in the lockless read loop to get
  a better timestamp.

- Use rdtsc_lfence() instead of rdtsc() to get a better timestamp.

- Check whether our TSC lags ti->ti_tsc_timestamp so we don't
  produce a bogus delta.

- Fix the non-PVCLOCK_FLAG_TSC_STABLE case:

  + On amd64 we can do this with an atomic_cas_ulong(9) loop.  We need
to cast the pointer to (unsigned long *) or the compiler complains.
This is safe because sizeof(long) equals sizeof(uint64_t) on amd64.

  + 

Re: acpihpet(4): acpihpet_delay: only use lower 32 bits of counter

2022-09-09 Thread Scott Cheloha
On Fri, Sep 09, 2022 at 03:59:01PM +1000, Jonathan Gray wrote:
> On Thu, Sep 08, 2022 at 08:31:21PM -0500, Scott Cheloha wrote:
> > On Sat, Aug 27, 2022 at 09:28:06PM -0500, Scott Cheloha wrote:
> > > Whoops, forgot about the split read problem.  My mistake.
> > > 
> > > Because 32-bit platforms cannot do bus_space_read_8 atomically, and
> > > i386 can use acpihpet(4), we can only safely use the lower 32 bits of
> > > the counter in acpihpet_delay() (unless we want two versions of
> > > acpihpet_delay()... which I don't).
> > > 
> > > Switch from acpihpet_r() to bus_space_read_4(9) and accumulate cycles
> > > as we do in acpihpet_delay().  Unlike acpitimer(4), the HPET is a
> > > 64-bit counter so we don't need to mask the difference between val1
> > > and val2.
> > > 
> > > [...]
> > 
> > 12 day ping.
> > 
> > This needs fixing before it causes problems.
> 
> the hpet spec says to set a bit to force a 32-bit counter on
> 32-bit platforms
> 
> see 2.4.7 Issues related to 64-bit Timers with 32-bit CPUs, in
> https://www.intel.com/content/dam/www/public/us/en/documents/technical-specifications/software-developers-hpet-spec-1-0a.pdf

I don't follow your meaning.  Putting the HPET in 32-bit mode doesn't
help us here, it would just break acpihpet_delay() in a different way.

The problem is that acpihpet_delay() is written as a 64-bit delay(9)
and there's no way to do that safely on i386 without introducing extra
overhead.

The easiest and cheapest fix is to rewrite acpihpet_delay() as a
32-bit delay(9), i.e. we count cycles until we pass a threshold.
acpitimer_delay() in acpi/acpitimer.c is a 32-bit delay(9) and it
works great, let's just do the same thing again here.

ok?

Index: acpihpet.c
===
RCS file: /cvs/src/sys/dev/acpi/acpihpet.c,v
retrieving revision 1.28
diff -u -p -r1.28 acpihpet.c
--- acpihpet.c  25 Aug 2022 18:01:54 -  1.28
+++ acpihpet.c  9 Sep 2022 12:29:41 -
@@ -281,13 +281,19 @@ acpihpet_attach(struct device *parent, s
 void
 acpihpet_delay(int usecs)
 {
-   uint64_t c, s;
+   uint64_t count = 0, cycles;
struct acpihpet_softc *sc = hpet_timecounter.tc_priv;
+   uint32_t val1, val2;
 
-   s = acpihpet_r(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER);
-   c = usecs * hpet_timecounter.tc_frequency / 1000000;
-   while (acpihpet_r(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER) - s < c)
+   val2 = bus_space_read_4(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER);
+   cycles = usecs * hpet_timecounter.tc_frequency / 1000000;
+   while (count < cycles) {
CPU_BUSY_CYCLE();
+   val1 = val2;
+   val2 = bus_space_read_4(sc->sc_iot, sc->sc_ioh,
+   HPET_MAIN_COUNTER);
+   count += val2 - val1;
+   }
 }
 
 u_int



Re: acpihpet(4): acpihpet_delay: only use lower 32 bits of counter

2022-09-08 Thread Scott Cheloha
On Sat, Aug 27, 2022 at 09:28:06PM -0500, Scott Cheloha wrote:
> Whoops, forgot about the split read problem.  My mistake.
> 
> Because 32-bit platforms cannot do bus_space_read_8 atomically, and
> i386 can use acpihpet(4), we can only safely use the lower 32 bits of
> the counter in acpihpet_delay() (unless we want two versions of
> acpihpet_delay()... which I don't).
> 
> Switch from acpihpet_r() to bus_space_read_4(9) and accumulate cycles
> as we do in acpihpet_delay().  Unlike acpitimer(4), the HPET is a
> 64-bit counter so we don't need to mask the difference between val1
> and val2.
> 
> [...]

12 day ping.

This needs fixing before it causes problems.

ok?

Index: acpihpet.c
===
RCS file: /cvs/src/sys/dev/acpi/acpihpet.c,v
retrieving revision 1.28
diff -u -p -r1.28 acpihpet.c
--- acpihpet.c  25 Aug 2022 18:01:54 -  1.28
+++ acpihpet.c  9 Sep 2022 01:30:26 -
@@ -281,13 +281,19 @@ acpihpet_attach(struct device *parent, s
 void
 acpihpet_delay(int usecs)
 {
-   uint64_t c, s;
+   uint64_t count = 0, cycles;
struct acpihpet_softc *sc = hpet_timecounter.tc_priv;
+   uint32_t val1, val2;
 
-   s = acpihpet_r(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER);
-   c = usecs * hpet_timecounter.tc_frequency / 1000000;
-   while (acpihpet_r(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER) - s < c)
+   val2 = bus_space_read_4(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER);
+   cycles = usecs * hpet_timecounter.tc_frequency / 1000000;
+   while (count < cycles) {
CPU_BUSY_CYCLE();
+   val1 = val2;
+   val2 = bus_space_read_4(sc->sc_iot, sc->sc_ioh,
+   HPET_MAIN_COUNTER);
+   count += val2 - val1;
+   }
 }
 
 u_int



Re: [please test] pvclock(4): fix several bugs

2022-09-08 Thread Scott Cheloha
> On Sep 8, 2022, at 9:05 AM, Mike Larkin  wrote:
> 
> On Thu, Sep 08, 2022 at 08:32:27AM -0500, Scott Cheloha wrote:
>> On Tue, Sep 06, 2022 at 03:30:44AM -0700, Mike Larkin wrote:
>>> On Sun, Sep 04, 2022 at 02:50:10PM +1000, Jonathan Gray wrote:
>>>> On Sat, Sep 03, 2022 at 05:33:01PM -0500, Scott Cheloha wrote:
>>>>> On Sat, Sep 03, 2022 at 10:37:31PM +1000, Jonathan Gray wrote:
>>>>>> On Sat, Sep 03, 2022 at 06:52:20AM -0500, Scott Cheloha wrote:
>>>>>>>> On Sep 3, 2022, at 02:22, Jonathan Gray  wrote:
>>>>>>>> 
>>>>>>>> On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote:
>>>>>>>>> dv@ suggested coming to the list to request testing for the pvclock(4)
>>>>>>>>> driver.  Attached is a patch that corrects several bugs.  Most of
>>>>>>>>> these changes will only matter in the non-TSC_STABLE case on a
>>>>>>>>> multiprocessor VM.
>>>>>>>>> 
>>>>>>>>> Ideally, nothing should break.
>>>>>>>>> 
>>>>>>>>> - pvclock yields a 64-bit value.  The BSD timecounter layer can only
>>>>>>>>> use the lower 32 bits, but internally we need to track the full
>>>>>>>>> 64-bit value to allow comparisons with the full value in the
>>>>>>>>> non-TSC_STABLE case.  So make pvclock_lastcount a 64-bit quantity.
>>>>>>>>> 
>>>>>>>>> - In pvclock_get_timecount(), move rdtsc() up into the lockless read
>>>>>>>>> loop to get a more accurate timestamp.
>>>>>>>>> 
>>>>>>>>> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc().
>>>>>>>>> 
>>>>>>>>> - In pvclock_get_timecount(), check that our TSC value doesn't predate
>>>>>>>>> ti->ti_tsc_timestamp, otherwise we will produce an enormous value.
>>>>>>>>> 
>>>>>>>>> - In pvclock_get_timecount(), update pvclock_lastcount in the
>>>>>>>>> non-TSC_STABLE case with more care.  On amd64 we can do this with an
>>>>>>>>> atomic_cas_ulong(9) loop because u_long is 64 bits.  On i386 we need
>>>>>>>>> to introduce a mutex to protect our comparison and read/write.
>>>>>>>> 
>>>>>>>> i386 has cmpxchg8b, no need to disable interrupts
>>>>>>>> the ifdefs seem excessive
>>>>>>> 
>>>>>>> How do I make use of CMPXCHG8B on i386
>>>>>>> in this context?
>>>>>>> 
>>>>>>> atomic_cas_ulong(9) is a 32-bit CAS on
>>>>>>> i386.
>>>>>> 
>>>>>> static inline uint64_t
>>>>>> atomic_cas_64(volatile uint64_t *p, uint64_t o, uint64_t n)
>>>>>> {
>>>>>>  return __sync_val_compare_and_swap(p, o, n);
>>>>>> }
>>>>>> 
>>>>>> Or md atomic.h files could have an equivalent.
>>>>>> Not possible on all 32-bit archs.
>>>>>> 
>>>>>>> 
>>>>>>> We can't use FP registers in the kernel, no?
>>>>>> 
>>>>>> What do FP registers have to do with it?
>>>>>> 
>>>>>>> 
>>>>>>> Am I missing some other avenue?
>>>>>> 
>>>>>> There is no rdtsc_lfence() on i386.  Initial diff doesn't build.
>>>>> 
>>>>> LFENCE is an SSE2 extension.  As is MFENCE.  I don't think I can just
>>>>> drop rdtsc_lfence() into cpufunc.h and proceed without causing some
>>>>> kind of fault on an older CPU.
>>>>> 
>>>>> What are my options on a 586-class CPU for forcing RDTSC to complete
>>>>> before later instructions?
>>>> 
>>>> "3.3.2. Serializing Operations
>>>> After executing certain instructions the Pentium processor serializes
>>>> instruction execution. This means that any modifications to flags,
>>>> registers, and memory for previous instructions are completed before
>>>> the next instruction is fetched and executed. The prefetch queue
>>>> is flushed as a result of serializing operations.
>>>> 
>>

Re: [please test] pvclock(4): fix several bugs

2022-09-08 Thread Scott Cheloha
> On Sep 8, 2022, at 9:27 AM, Mark Kettenis  wrote:
> 
>> Date: Thu, 8 Sep 2022 08:32:27 -0500
>> From: Scott Cheloha 
>> 
>> On Tue, Sep 06, 2022 at 03:30:44AM -0700, Mike Larkin wrote:
>>> On Sun, Sep 04, 2022 at 02:50:10PM +1000, Jonathan Gray wrote:
>>>> On Sat, Sep 03, 2022 at 05:33:01PM -0500, Scott Cheloha wrote:
>>>>> On Sat, Sep 03, 2022 at 10:37:31PM +1000, Jonathan Gray wrote:
>>>>>> On Sat, Sep 03, 2022 at 06:52:20AM -0500, Scott Cheloha wrote:
>>>>>>>> On Sep 3, 2022, at 02:22, Jonathan Gray  wrote:
>>>>>>>> 
>>>>>>>> On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote:
>>>>>>>>> dv@ suggested coming to the list to request testing for the pvclock(4)
>>>>>>>>> driver.  Attached is a patch that corrects several bugs.  Most of
>>>>>>>>> these changes will only matter in the non-TSC_STABLE case on a
>>>>>>>>> multiprocessor VM.
>>>>>>>>> 
>>>>>>>>> Ideally, nothing should break.
>>>>>>>>> 
>>>>>>>>> - pvclock yields a 64-bit value.  The BSD timecounter layer can only
>>>>>>>>> use the lower 32 bits, but internally we need to track the full
>>>>>>>>> 64-bit value to allow comparisons with the full value in the
>>>>>>>>> non-TSC_STABLE case.  So make pvclock_lastcount a 64-bit quantity.
>>>>>>>>> 
>>>>>>>>> - In pvclock_get_timecount(), move rdtsc() up into the lockless read
>>>>>>>>> loop to get a more accurate timestamp.
>>>>>>>>> 
>>>>>>>>> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc().
>>>>>>>>> 
>>>>>>>>> - In pvclock_get_timecount(), check that our TSC value doesn't predate
>>>>>>>>> ti->ti_tsc_timestamp, otherwise we will produce an enormous value.
>>>>>>>>> 
>>>>>>>>> - In pvclock_get_timecount(), update pvclock_lastcount in the
>>>>>>>>> non-TSC_STABLE case with more care.  On amd64 we can do this with an
>>>>>>>>> atomic_cas_ulong(9) loop because u_long is 64 bits.  On i386 we need
>>>>>>>>> to introduce a mutex to protect our comparison and read/write.
>>>>>>>> 
>>>>>>>> i386 has cmpxchg8b, no need to disable interrupts
>>>>>>>> the ifdefs seem excessive
>>>>>>> 
>>>>>>> How do I make use of CMPXCHG8B on i386
>>>>>>> in this context?
>>>>>>> 
>>>>>>> atomic_cas_ulong(9) is a 32-bit CAS on
>>>>>>> i386.
>>>>>> 
>>>>>> static inline uint64_t
>>>>>> atomic_cas_64(volatile uint64_t *p, uint64_t o, uint64_t n)
>>>>>> {
>>>>>>  return __sync_val_compare_and_swap(p, o, n);
>>>>>> }
>>>>>> 
>>>>>> Or md atomic.h files could have an equivalent.
>>>>>> Not possible on all 32-bit archs.
>>>>>> 
>>>>>>> 
>>>>>>> We can't use FP registers in the kernel, no?
>>>>>> 
>>>>>> What do FP registers have to do with it?
>>>>>> 
>>>>>>> 
>>>>>>> Am I missing some other avenue?
>>>>>> 
>>>>>> There is no rdtsc_lfence() on i386.  Initial diff doesn't build.
>>>>> 
>>>>> LFENCE is an SSE2 extension.  As is MFENCE.  I don't think I can just
>>>>> drop rdtsc_lfence() into cpufunc.h and proceed without causing some
>>>>> kind of fault on an older CPU.
>>>>> 
>>>>> What are my options on a 586-class CPU for forcing RDTSC to complete
>>>>> before later instructions?
>>>> 
>>>> "3.3.2. Serializing Operations
>>>> After executing certain instructions the Pentium processor serializes
>>>> instruction execution. This means that any modifications to flags,
>>>> registers, and memory for previous instructions are completed before
>>>> the next instruction is fetched and executed. The prefetch queue
>>>> is flushed as a result of serializing operations.
>>

Re: [please test] pvclock(4): fix several bugs

2022-09-08 Thread Scott Cheloha
On Tue, Sep 06, 2022 at 03:30:44AM -0700, Mike Larkin wrote:
> On Sun, Sep 04, 2022 at 02:50:10PM +1000, Jonathan Gray wrote:
> > On Sat, Sep 03, 2022 at 05:33:01PM -0500, Scott Cheloha wrote:
> > > On Sat, Sep 03, 2022 at 10:37:31PM +1000, Jonathan Gray wrote:
> > > > On Sat, Sep 03, 2022 at 06:52:20AM -0500, Scott Cheloha wrote:
> > > > > > On Sep 3, 2022, at 02:22, Jonathan Gray  wrote:
> > > > > >
> > > > > > On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote:
> > > > > >> dv@ suggested coming to the list to request testing for the 
> > > > > >> pvclock(4)
> > > > > >> driver.  Attached is a patch that corrects several bugs.  Most of
> > > > > >> these changes will only matter in the non-TSC_STABLE case on a
> > > > > >> multiprocessor VM.
> > > > > >>
> > > > > >> Ideally, nothing should break.
> > > > > >>
> > > > > >> - pvclock yields a 64-bit value.  The BSD timecounter layer can 
> > > > > >> only
> > > > > >>  use the lower 32 bits, but internally we need to track the full
> > > > > >>  64-bit value to allow comparisons with the full value in the
> > > > > >>  non-TSC_STABLE case.  So make pvclock_lastcount a 64-bit quantity.
> > > > > >>
> > > > > >> - In pvclock_get_timecount(), move rdtsc() up into the lockless 
> > > > > >> read
> > > > > >>  loop to get a more accurate timestamp.
> > > > > >>
> > > > > >> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc().
> > > > > >>
> > > > > >> - In pvclock_get_timecount(), check that our TSC value doesn't 
> > > > > >> predate
> > > > > >>  ti->ti_tsc_timestamp, otherwise we will produce an enormous value.
> > > > > >>
> > > > > >> - In pvclock_get_timecount(), update pvclock_lastcount in the
> > > > > >>  non-TSC_STABLE case with more care.  On amd64 we can do this with 
> > > > > >> an
> > > > > >>  atomic_cas_ulong(9) loop because u_long is 64 bits.  On i386 we 
> > > > > >> need
> > > > > >>  to introduce a mutex to protect our comparison and read/write.
> > > > > >
> > > > > > i386 has cmpxchg8b, no need to disable interrupts
> > > > > > the ifdefs seem excessive
> > > > >
> > > > > How do I make use of CMPXCHG8B on i386
> > > > > in this context?
> > > > >
> > > > > atomic_cas_ulong(9) is a 32-bit CAS on
> > > > > i386.
> > > >
> > > > static inline uint64_t
> > > > atomic_cas_64(volatile uint64_t *p, uint64_t o, uint64_t n)
> > > > {
> > > > return __sync_val_compare_and_swap(p, o, n);
> > > > }
> > > >
> > > > Or md atomic.h files could have an equivalent.
> > > > Not possible on all 32-bit archs.
> > > >
> > > > >
> > > > > We can't use FP registers in the kernel, no?
> > > >
> > > > What do FP registers have to do with it?
> > > >
> > > > >
> > > > > Am I missing some other avenue?
> > > >
> > > > There is no rdtsc_lfence() on i386.  Initial diff doesn't build.
> > >
> > > LFENCE is an SSE2 extension.  As is MFENCE.  I don't think I can just
> > > drop rdtsc_lfence() into cpufunc.h and proceed without causing some
> > > kind of fault on an older CPU.
> > >
> > > What are my options on a 586-class CPU for forcing RDTSC to complete
> > > before later instructions?
> >
> > "3.3.2. Serializing Operations
> > After executing certain instructions the Pentium processor serializes
> > instruction execution. This means that any modifications to flags,
> > registers, and memory for previous instructions are completed before
> > the next instruction is fetched and executed. The prefetch queue
> > is flushed as a result of serializing operations.
> >
> > The Pentium processor serializes instruction execution after executing
> > one of the following instructions: Move to Special Register (except
> > CRO), INVD, INVLPG, IRET, IRETD, LGDT, LLDT, LIDT, LTR, WBINVD,
> > CPUID, RSM and WRMSR."
>

Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-09-08 Thread Scott Cheloha
On Thu, Sep 08, 2022 at 05:52:43AM +0300, Pavel Korovin wrote:
> On 09/07, Scott Cheloha wrote:
> > Just to make sure that my changes to acpihpet(4) actually caused
> > the problem, I have a few more questions:
> > 
> > 1. When did you change the OS type?
> 
> 03 August, after that, there was a local snapshot built from sources
> fetched on 17 Aug 2022 22:12:57 +0300 which wasn't affected.
> 
> > 2. What was the compilation date of the kernel where you first saw the
> >problem?
> 
> Locally built snapshot from sources fetched on Wed, 31 Aug 2022 02:05:34
> +0300.
>  
> > 3. If you boot an unpatched kernel does the problem manifest in
> >exactly the same way?
>  
> As I said, I mistakenly changed the OS type not to "FreeBSD Pre-11
> versions (32-bit)", but to "FreeBSD 11 (32-bit)".
> The problem affects only VMs which have Guest OS Version set to "FreeBSD
> Pre-11 versions (32-bit)" on snapshots built after 31 Aug 2022.
> 
> Sample outputs from the machine running older snapshot which is not
> affected:
> 
> $ sysctl kern.version | head -n1
> kern.version=OpenBSD 7.2-beta (GENERIC.MP) #1: Thu Aug 18 15:15:13 MSK
> 2022
> 
> $ sysctl | grep tsc
> kern.timecounter.hardware=tsc
> kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000)
> acpitimer0(1000)
> machdep.tscfreq=1500017850
> machdep.invarianttsc=1

Please send a full bug report to b...@openbsd.org.

Include a full dmesg for the affected machine.

Something else with VMWare is involved here and I'm not enough of an
expert to say what.  More eyes will be helpful.

Your problem is *probably* caused by my changes, but there are too
many moving parts with all the different VMWare configurations for me
to narrow down what the issue is definitively.



Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-09-07 Thread Scott Cheloha
On Thu, Sep 08, 2022 at 05:01:33AM +0300, Pavel Korovin wrote:
> Hi Scott,
> 
> Thank you for the fix!
> 
> I found what triggered this behaviour: it was the change in Guest OS
> Version in VM Options.
> 
> I deploy VMs with sysutils/packer, some time ago I noticed that VM type in
> my templates is specified as freebsd11_64Guest, which isn't consistent with
> vmt(4), as it presents itself as "FreeBSD Pre-11 versions (32-bit)".
> 
> After changing OS type to "FreeBSD Pre-11 versions (32-bit)", I've got this
> problem with tsc.
> 
> 
> The provided diff fixes it:
> 
> $ sysctl -a | grep tsc
> kern.timecounter.hardware=tsc
> kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000)
> acpitimer0(1000)
> machdep.tscfreq=158129
> machdep.invarianttsc=1

I'm glad you are not seeing the issue anymore.

Just to make sure that my changes to acpihpet(4) actually caused
the problem, I have a few more questions:

1. When did you change the OS type?

2. What was the compilation date of the kernel where you first saw the
   problem?

3. If you boot an unpatched kernel does the problem manifest in
   exactly the same way?

-Scott



Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-09-06 Thread Scott Cheloha
On Sat, Sep 03, 2022 at 01:50:28PM +0300, Pavel Korovin wrote:
> After these changes, OpenBSD VMware guest's clock is galloping into the
> future like this:
> Aug 31 02:42:18 build ntpd[55904]: adjusting local clock by -27.085360s
> Aug 31 02:44:26 build ntpd[55904]: adjusting local clock by -116.270573s
> Aug 31 02:47:40 build ntpd[55904]: adjusting local clock by -281.085430s
> Aug 31 02:52:01 build ntpd[55904]: adjusting local clock by -320.064639s
> Aug 31 02:53:09 build ntpd[55904]: adjusting local clock by -385.095886s
> Aug 31 02:54:47 build ntpd[55904]: adjusting local clock by -532.542486s
> Aug 31 02:58:33 build ntpd[55904]: adjusting local clock by -572.363323s
> Aug 31 02:59:38 build ntpd[55904]: adjusting local clock by -655.253598s
> Aug 31 03:01:54 build ntpd[55904]: adjusting local clock by -823.653978s
> Aug 31 03:06:14 build ntpd[55904]: adjusting local clock by -926.705093s
> Aug 31 03:09:00 build ntpd[55904]: adjusting local clock by -1071.837887s
> 
> VM time right after boot:
> rdate -pn $ntp; date
> Sat Sep  3 13:39:43 MSK 2022
> Sat Sep  3 13:43:24 MSK 2022
> 
> $ sysctl -a | grep tsc
> kern.timecounter.hardware=tsc
> kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000)
> acpitimer0(1000)
> machdep.tscfreq=580245275

This frequency looks wrong.

My first guess is that you are hitting a split-read problem in
acpihpet_delay() when recalibrating the TSC.

Does this patch fix it?

If you can't build a kernel for testing I can just commit this and you
can try the snapshot in a day or two.

Index: acpihpet.c
===
RCS file: /cvs/src/sys/dev/acpi/acpihpet.c,v
retrieving revision 1.28
diff -u -p -r1.28 acpihpet.c
--- acpihpet.c  25 Aug 2022 18:01:54 -  1.28
+++ acpihpet.c  6 Sep 2022 16:12:23 -
@@ -281,13 +281,19 @@ acpihpet_attach(struct device *parent, s
 void
 acpihpet_delay(int usecs)
 {
-   uint64_t c, s;
+   uint64_t count = 0, cycles;
struct acpihpet_softc *sc = hpet_timecounter.tc_priv;
+   uint32_t val1, val2;
 
-   s = acpihpet_r(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER);
-   c = usecs * hpet_timecounter.tc_frequency / 1000000;
-   while (acpihpet_r(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER) - s < c)
+   val2 = bus_space_read_4(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER);
+   cycles = usecs * hpet_timecounter.tc_frequency / 1000000;
+   while (count < cycles) {
CPU_BUSY_CYCLE();
+   val1 = val2;
+   val2 = bus_space_read_4(sc->sc_iot, sc->sc_ioh,
+   HPET_MAIN_COUNTER);
+   count += val2 - val1;
+   }
 }
 
 u_int



Re: [please test] pvclock(4): fix several bugs

2022-09-03 Thread Scott Cheloha
On Sat, Sep 03, 2022 at 10:37:31PM +1000, Jonathan Gray wrote:
> On Sat, Sep 03, 2022 at 06:52:20AM -0500, Scott Cheloha wrote:
> > > On Sep 3, 2022, at 02:22, Jonathan Gray  wrote:
> > > 
> > > On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote:
> > >> dv@ suggested coming to the list to request testing for the pvclock(4)
> > >> driver.  Attached is a patch that corrects several bugs.  Most of
> > >> these changes will only matter in the non-TSC_STABLE case on a
> > >> multiprocessor VM.
> > >> 
> > >> Ideally, nothing should break.
> > >> 
> > >> - pvclock yields a 64-bit value.  The BSD timecounter layer can only
> > >>  use the lower 32 bits, but internally we need to track the full
> > >>  64-bit value to allow comparisons with the full value in the
> > >>  non-TSC_STABLE case.  So make pvclock_lastcount a 64-bit quantity.
> > >> 
> > >> - In pvclock_get_timecount(), move rdtsc() up into the lockless read
> > >>  loop to get a more accurate timestamp.
> > >> 
> > >> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc().
> > >> 
> > >> - In pvclock_get_timecount(), check that our TSC value doesn't predate
> > >>  ti->ti_tsc_timestamp, otherwise we will produce an enormous value.
> > >> 
> > >> - In pvclock_get_timecount(), update pvclock_lastcount in the
> > >>  non-TSC_STABLE case with more care.  On amd64 we can do this with an
> > >>  atomic_cas_ulong(9) loop because u_long is 64 bits.  On i386 we need
> > >>  to introduce a mutex to protect our comparison and read/write.
> > > 
> > > i386 has cmpxchg8b, no need to disable interrupts
> > > the ifdefs seem excessive
> > 
> > How do I make use of CMPXCHG8B on i386
> > in this context?
> > 
> > atomic_cas_ulong(9) is a 32-bit CAS on
> > i386.
> 
> static inline uint64_t
> atomic_cas_64(volatile uint64_t *p, uint64_t o, uint64_t n)
> {
>   return __sync_val_compare_and_swap(p, o, n);
> }
> 
> Or md atomic.h files could have an equivalent.
> Not possible on all 32-bit archs.
> 
> > 
> > We can't use FP registers in the kernel, no?
> 
> What do FP registers have to do with it?
> 
> > 
> > Am I missing some other avenue?
> 
> There is no rdtsc_lfence() on i386.  Initial diff doesn't build.

LFENCE is an SSE2 extension.  As is MFENCE.  I don't think I can just
drop rdtsc_lfence() into cpufunc.h and proceed without causing some
kind of fault on an older CPU.

What are my options on a 586-class CPU for forcing RDTSC to complete
before later instructions?



Re: [please test] pvclock(4): fix several bugs

2022-09-03 Thread Scott Cheloha
> On Sep 3, 2022, at 07:37, Jonathan Gray  wrote:
> 
> On Sat, Sep 03, 2022 at 06:52:20AM -0500, Scott Cheloha wrote:
>>>> On Sep 3, 2022, at 02:22, Jonathan Gray  wrote:
>>> 
>>> On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote:
>>>> dv@ suggested coming to the list to request testing for the pvclock(4)
>>>> driver.  Attached is a patch that corrects several bugs.  Most of
>>>> these changes will only matter in the non-TSC_STABLE case on a
>>>> multiprocessor VM.
>>>> 
>>>> Ideally, nothing should break.
>>>> 
>>>> - pvclock yields a 64-bit value.  The BSD timecounter layer can only
>>>> use the lower 32 bits, but internally we need to track the full
>>>> 64-bit value to allow comparisons with the full value in the
>>>> non-TSC_STABLE case.  So make pvclock_lastcount a 64-bit quantity.
>>>> 
>>>> - In pvclock_get_timecount(), move rdtsc() up into the lockless read
>>>> loop to get a more accurate timestamp.
>>>> 
>>>> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc().
>>>> 
>>>> - In pvclock_get_timecount(), check that our TSC value doesn't predate
>>>> ti->ti_tsc_timestamp, otherwise we will produce an enormous value.
>>>> 
>>>> - In pvclock_get_timecount(), update pvclock_lastcount in the
>>>> non-TSC_STABLE case with more care.  On amd64 we can do this with an
>>>> atomic_cas_ulong(9) loop because u_long is 64 bits.  On i386 we need
>>>> to introduce a mutex to protect our comparison and read/write.
>>> 
>>> i386 has cmpxchg8b, no need to disable interrupts
>>> the ifdefs seem excessive
>> 
>> How do I make use of CMPXCHG8B on i386
>> in this context?
>> 
>> atomic_cas_ulong(9) is a 32-bit CAS on
>> i386.
> 
> static inline uint64_t
> atomic_cas_64(volatile uint64_t *p, uint64_t o, uint64_t n)
> {
>return __sync_val_compare_and_swap(p, o, n);
> }
> 
> Or md atomic.h files could have an equivalent.
> Not possible on all 32-bit archs.

Gotcha.

>> We can't use FP registers in the kernel, no?
> 
> What do FP registers have to do with it?

I read someplace that using FP registers was
a quick and dirty way to take advantage of
the 64-bit-aligned atomic access guarantees
of the Pentium.

>> Am I missing some other avenue?
> 
> There is no rdtsc_lfence() on i386.  Initial diff doesn't build.

I will come back with a fuller patch in a bit.



Re: [please test] pvclock(4): fix several bugs

2022-09-03 Thread Scott Cheloha
> On Sep 3, 2022, at 02:22, Jonathan Gray  wrote:
> 
> On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote:
>> dv@ suggested coming to the list to request testing for the pvclock(4)
>> driver.  Attached is a patch that corrects several bugs.  Most of
>> these changes will only matter in the non-TSC_STABLE case on a
>> multiprocessor VM.
>> 
>> Ideally, nothing should break.
>> 
>> - pvclock yields a 64-bit value.  The BSD timecounter layer can only
>>  use the lower 32 bits, but internally we need to track the full
>>  64-bit value to allow comparisons with the full value in the
>>  non-TSC_STABLE case.  So make pvclock_lastcount a 64-bit quantity.
>> 
>> - In pvclock_get_timecount(), move rdtsc() up into the lockless read
>>  loop to get a more accurate timestamp.
>> 
>> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc().
>> 
>> - In pvclock_get_timecount(), check that our TSC value doesn't predate
>>  ti->ti_tsc_timestamp, otherwise we will produce an enormous value.
>> 
>> - In pvclock_get_timecount(), update pvclock_lastcount in the
>>  non-TSC_STABLE case with more care.  On amd64 we can do this with an
>>  atomic_cas_ulong(9) loop because u_long is 64 bits.  On i386 we need
>>  to introduce a mutex to protect our comparison and read/write.
> 
> i386 has cmpxchg8b, no need to disable interrupts
> the ifdefs seem excessive

How do I make use of CMPXCHG8B on i386
in this context?

atomic_cas_ulong(9) is a 32-bit CAS on
i386.

We can't use FP registers in the kernel, no?

Am I missing some other avenue?



[please test] pvclock(4): fix several bugs

2022-09-02 Thread Scott Cheloha
dv@ suggested coming to the list to request testing for the pvclock(4)
driver.  Attached is a patch that corrects several bugs.  Most of
these changes will only matter in the non-TSC_STABLE case on a
multiprocessor VM.

Ideally, nothing should break.

- pvclock yields a 64-bit value.  The BSD timecounter layer can only
  use the lower 32 bits, but internally we need to track the full
  64-bit value to allow comparisons with the full value in the
  non-TSC_STABLE case.  So make pvclock_lastcount a 64-bit quantity.

- In pvclock_get_timecount(), move rdtsc() up into the lockless read
  loop to get a more accurate timestamp.

- In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc().

- In pvclock_get_timecount(), check that our TSC value doesn't predate
  ti->ti_tsc_timestamp, otherwise we will produce an enormous value.

- In pvclock_get_timecount(), update pvclock_lastcount in the
  non-TSC_STABLE case with more care.  On amd64 we can do this with an
  atomic_cas_ulong(9) loop because u_long is 64 bits.  On i386 we need
  to introduce a mutex to protect our comparison and read/write.

Index: pvclock.c
===
RCS file: /cvs/src/sys/dev/pv/pvclock.c,v
retrieving revision 1.8
diff -u -p -r1.8 pvclock.c
--- pvclock.c   5 Nov 2021 11:38:29 -   1.8
+++ pvclock.c   2 Sep 2022 22:54:08 -
@@ -27,6 +27,10 @@
 #include 
 #include 
 #include 
+#include 
+#if defined(__i386__)
+#include 
+#endif
 
 #include 
 #include 
@@ -35,7 +39,12 @@
 #include 
 #include 
 
-uint pvclock_lastcount;
+#if defined(__amd64__)
+volatile u_long pvclock_lastcount;
+#elif defined(__i386__)
+struct mutex pvclock_mtx = MUTEX_INITIALIZER(IPL_HIGH);
+uint64_t pvclock_lastcount;
+#endif
 
 struct pvclock_softc {
 struct device    sc_dev;
@@ -212,7 +221,7 @@ pvclock_get_timecount(struct timecounter
 {
 struct pvclock_softc    *sc = tc->tc_priv;
 struct pvclock_time_info    *ti;
-   uint64_t tsc_timestamp, system_time, delta, ctr;
+   uint64_t system_time, delta, ctr, tsc;
uint32_t version, mul_frac;
int8_t   shift;
uint8_t  flags;
@@ -220,8 +229,12 @@ pvclock_get_timecount(struct timecounter
ti = sc->sc_time;
do {
version = pvclock_read_begin(ti);
+   tsc = rdtsc_lfence();
+   if (ti->ti_tsc_timestamp < tsc)
+   delta = tsc - ti->ti_tsc_timestamp;
+   else
+   delta = 0;
system_time = ti->ti_system_time;
-   tsc_timestamp = ti->ti_tsc_timestamp;
mul_frac = ti->ti_tsc_to_system_mul;
shift = ti->ti_tsc_shift;
flags = ti->ti_flags;
@@ -231,7 +244,6 @@ pvclock_get_timecount(struct timecounter
 * The algorithm is described in
 * linux/Documentation/virtual/kvm/msr.txt
 */
-   delta = rdtsc() - tsc_timestamp;
if (shift < 0)
delta >>= -shift;
else
@@ -241,10 +253,20 @@ pvclock_get_timecount(struct timecounter
if ((flags & PVCLOCK_FLAG_TSC_STABLE) != 0)
return (ctr);
 
-   if (ctr < pvclock_lastcount)
-   return (pvclock_lastcount);
-
-   atomic_swap_uint(&pvclock_lastcount, ctr);
-
+#if defined(__amd64__)
+   u_long last;
+   do {
+   last = pvclock_lastcount;
+   if (ctr < last)
+   return last;
+   } while (atomic_cas_ulong(&pvclock_lastcount, last, ctr) != last);
+#elif defined(__i386__)
+   mtx_enter(&pvclock_mtx);
+   if (pvclock_lastcount < ctr)
+   pvclock_lastcount = ctr;
+   else
+   ctr = pvclock_lastcount;
+   mtx_leave(&pvclock_mtx);
+#endif
return (ctr);
 }
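For reference, the TSC-to-nanosecond scaling that the unchanged lines above perform (described in linux/Documentation/virtual/kvm/msr.txt) can be sketched as below.  The helper name pvclock_scale is made up for illustration; ti_tsc_to_system_mul is a 32.32 fixed-point multiplier, so the product is shifted right by 32:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring pvclock's TSC delta scaling:
 * pre-shift the delta by ti_tsc_shift, then apply the 32.32
 * fixed-point multiplier ti_tsc_to_system_mul.  (The real code
 * keeps all arithmetic in 64 bits and tolerates truncation.)
 */
static inline uint64_t
pvclock_scale(uint64_t delta, uint32_t mul_frac, int8_t shift)
{
	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;
	return (delta * mul_frac) >> 32;
}
```

A mul_frac of 1U << 31 corresponds to a host where one TSC cycle is half a nanosecond, so a delta of 1000 cycles scales to 500 ns.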

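The amd64 branch of the patch above implements a lock-free "monotonic maximum": pvclock_lastcount only ever moves forward, and a CPU that loses the compare-and-swap race retries against the newer value.  A standalone sketch using C11 atomics, where atomic_compare_exchange_weak plays the role OpenBSD's atomic_cas_ulong plays in the diff (the function name monotonic_max is illustrative):

```c
#include <stdatomic.h>

static _Atomic unsigned long lastcount;

/*
 * Advance the shared high-water mark to ctr unless another CPU
 * has already published a newer value; never go backwards.
 */
static unsigned long
monotonic_max(unsigned long ctr)
{
	unsigned long last = atomic_load(&lastcount);

	do {
		if (ctr < last)
			return last;	/* stale read: return the newer value */
	} while (!atomic_compare_exchange_weak(&lastcount, &last, ctr));
	return ctr;
}
```

On failure atomic_compare_exchange_weak reloads last, so the loop re-checks against whatever value won the race.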


Re: vmd(8): compute i8254 Read-Back latch from singular timestamp

2022-09-02 Thread Scott Cheloha
On Fri, Sep 02, 2022 at 01:13:00PM -0400, Dave Voutila wrote:
> 
> Scott Cheloha  writes:
> 
> > The 8254 data sheet [1] says this about the Read-Back command:
> >
> >> The read-back command may be used to latch multi-
> >> ple counter output latches (OL) by setting the
> >> COUNT bit D5 = 0 and selecting the desired coun-
> >> ter(s).  This single command is functionally equiva-
> >> lent to several counter latch commands, one for
> >> each counter latched. [...]
> >
> > This is a little ambiguous.  But my hunch is that the intent here is
> > "you can latch multiple counters all at once".  Simultaneously.
> > Otherwise the utility of the read-back command is suspect.
> >
> > To simulate a simultaneous latch, we should only call clock_gettime(2)
> > once and use that singular timestamp to compute olatch for each
> > counter.
> >
> > ok?
> >
> 
> I'm not an expert on the i825{3,4} but have a question below. I did
> quickly test this diff and see no noticeable difference from the point
> of view of my local guests.
> 
> > [1] 8254 Programmable Interval Timer, p. 8
> > https://www.scs.stanford.edu/10wi-cs140/pintos/specs/8254.pdf
> >
> > Index: i8253.c
> > ===
> > RCS file: /cvs/src/usr.sbin/vmd/i8253.c,v
> > retrieving revision 1.34
> > diff -u -p -r1.34 i8253.c
> > --- i8253.c 16 Jun 2021 16:55:02 -  1.34
> > +++ i8253.c 2 Sep 2022 16:25:02 -
> > @@ -128,6 +128,8 @@ i8253_do_readback(uint32_t data)
> > int readback_channel[3] = { TIMER_RB_C0, TIMER_RB_C1, TIMER_RB_C2 };
> > int i;
> >
> > +   clock_gettime(CLOCK_MONOTONIC, &now);
> > +
> 
> Why make this call to clock_gettime here ^
> 
> > /* bits are inverted here - !TIMER_RB_STATUS == enable chan readback */
> > if (data & ~TIMER_RB_STATUS) {
> > i8253_channel[0].rbs = (data & TIMER_RB_C0) ? 1 : 0;
> > @@ -139,7 +141,6 @@ i8253_do_readback(uint32_t data)
> > if (data & ~TIMER_RB_COUNT) {
> 
> ...instead of here where we can save a possible syscall?

Yes, that's better, I'll go with that.



vmd(8): compute i8254 Read-Back latch from singular timestamp

2022-09-02 Thread Scott Cheloha
The 8254 data sheet [1] says this about the Read-Back command:

> The read-back command may be used to latch multi-
> ple counter output latches (OL) by setting the
> COUNT bit D5 = 0 and selecting the desired coun-
> ter(s).  This single command is functionally equiva-
> lent to several counter latch commands, one for
> each counter latched. [...]

This is a little ambiguous.  But my hunch is that the intent here is
"you can latch multiple counters all at once".  Simultaneously.
Otherwise the utility of the read-back command is suspect.

To simulate a simultaneous latch, we should only call clock_gettime(2)
once and use that singular timestamp to compute olatch for each
counter.

ok?

[1] 8254 Programmable Interval Timer, p. 8
https://www.scs.stanford.edu/10wi-cs140/pintos/specs/8254.pdf

Index: i8253.c
===
RCS file: /cvs/src/usr.sbin/vmd/i8253.c,v
retrieving revision 1.34
diff -u -p -r1.34 i8253.c
--- i8253.c 16 Jun 2021 16:55:02 -  1.34
+++ i8253.c 2 Sep 2022 16:25:02 -
@@ -128,6 +128,8 @@ i8253_do_readback(uint32_t data)
int readback_channel[3] = { TIMER_RB_C0, TIMER_RB_C1, TIMER_RB_C2 };
int i;
 
+   clock_gettime(CLOCK_MONOTONIC, &now);
+
/* bits are inverted here - !TIMER_RB_STATUS == enable chan readback */
if (data & ~TIMER_RB_STATUS) {
i8253_channel[0].rbs = (data & TIMER_RB_C0) ? 1 : 0;
@@ -139,7 +141,6 @@ i8253_do_readback(uint32_t data)
if (data & ~TIMER_RB_COUNT) {
for (i = 0; i < 3; i++) {
if (data & readback_channel[i]) {
-   clock_gettime(CLOCK_MONOTONIC, &now);
timespecsub(&now, &i8253_channel[i].ts, &delta);
ns = delta.tv_sec * 1000000000 + delta.tv_nsec;
ticks = ns / NS_PER_TICK;

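The computation in the loop above reduces to: take one timestamp, subtract each channel's start time, convert to nanoseconds, and divide by the i8254 tick period.  A sketch, where NS_PER_TICK is assumed to be the integer nanosecond period of the 1.193182 MHz input clock as in vmd:

```c
#include <stdint.h>
#include <time.h>

#define NS_PER_TICK	(1000000000 / 1193182)	/* ~838 ns per i8254 tick */

/* Elapsed i8254 ticks between two CLOCK_MONOTONIC timestamps. */
static uint64_t
elapsed_ticks(const struct timespec *now, const struct timespec *then)
{
	uint64_t ns;

	ns = (uint64_t)(now->tv_sec - then->tv_sec) * 1000000000 +
	    (now->tv_nsec - then->tv_nsec);
	return ns / NS_PER_TICK;
}
```

Because the single timestamp is shared, all latched channels see a mutually consistent notion of "now", which is the point of the patch.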


Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-29 Thread Scott Cheloha
> On Aug 29, 2022, at 22:54, Jonathan Gray  wrote:
> 
> On Mon, Aug 29, 2022 at 12:02:42PM -0500, Scott Cheloha wrote:
>> If hv_delay() never causes a vm exit, but tsc_delay() *might* cause a
>> vm exit, and both have microsecond or better resolution, then
>> hv_delay() is the preferable delay(9) implementation where it is
>> available because vm exits have ambiguous overhead.
> 
> with hv_delay() currently doing rdmsr I wouldn't say never
> 
>> 
>> If that seems sensible to you, I'll commit this switch.
> 
> There is an MSR to allow cpuid to report invariant tsc (0x40000118).
> Used by linux but not documented in the hyper-v tlfs.
> 
> Without that, tsc delay is never used on hyper-v.  So leave it
> as is until someone running hyper-v/azure would like it changed?

Works for me.



Re: all architectures: put clockframe definition in frame.h?

2022-08-29 Thread Scott Cheloha
On Mon, Aug 29, 2022 at 06:50:08PM +0200, Mark Kettenis wrote:
> > Date: Mon, 29 Aug 2022 11:33:19 -0500
> > From: Scott Cheloha 
> > 
> > On Fri, Aug 19, 2022 at 01:24:47PM +0200, Mark Kettenis wrote:
> > >
> > > This is one of those annoying corners where there is too much
> > > unnecessary MD variation. Currently travelling without a laptop, so I
> > > can't easily check the tree. But one note I wanted to make is that the
> > > definition of struct clockframe and the CLKF_XXX macros should stay
> > > together. 
> > 
> > Sure.  Here's a version that consolidates the CLKF macros into frame.h
> > alongside the clockframe definitions.
> > 
> > Notes by arch:
> > 
> > alpha, amd64, hppa, i386, m88k, mips64, powerpc64, sh, sparc64:
> > 
> > - clockframe is defined in cpu.h with CLKF macros.
> > 
> > - Move clockframe definition and CLKF macros from cpu.h to frame.h.
> > 
> > arm, powerpc:
> > 
> > - clockframe is defined in frame.h.
> > 
> > - CLKF macros are defined in cpu.h.
> > 
> > - Move CLKF macros from cpu.h to frame.h.
> > 
> > arm64, riscv64:
> > 
> > - clockframe is defined in cpu.h with CLKF macros.
> > 
> > - clockframe is *also* defined in frame.h.
> > 
> > - Delete clockframe definition from frame.h
> > 
> > - Move (other) clockframe definition and CLKF macros from cpu.h to frame.h.
> > 
> > sparc64 remains the only one that looks not-quite-right because
> > trapframe64 is defined in reg.h, not frame.h.
> 
> Yes, that is not going to work.
> 
> We really should be getting rid of the xxx32 stuff and rename the xxx64
> ones to xxx.  And move trapframe (and possibly rwindow) to frame.h.

So we would get rid of all the 32-bit compat stuff from arch/sparc64?

That's a pretty big change.

Index: fpu/fpu.c
===
RCS file: /cvs/src/sys/arch/sparc64/fpu/fpu.c,v
retrieving revision 1.21
diff -u -p -r1.21 fpu.c
--- fpu/fpu.c   19 Aug 2020 10:10:58 -  1.21
+++ fpu/fpu.c   29 Aug 2022 18:44:50 -
@@ -81,22 +81,22 @@
 #include 
 
 int fpu_regoffset(int, int);
-int fpu_insn_fmov(struct fpstate64 *, struct fpemu *, union instr);
-int fpu_insn_fabs(struct fpstate64 *, struct fpemu *, union instr);
-int fpu_insn_fneg(struct fpstate64 *, struct fpemu *, union instr);
+int fpu_insn_fmov(struct fpstate *, struct fpemu *, union instr);
+int fpu_insn_fabs(struct fpstate *, struct fpemu *, union instr);
+int fpu_insn_fneg(struct fpstate *, struct fpemu *, union instr);
 int fpu_insn_itof(struct fpemu *, union instr, int, int *,
 int *, u_int *);
 int fpu_insn_ftoi(struct fpemu *, union instr, int *, int, u_int *);
 int fpu_insn_ftof(struct fpemu *, union instr, int *, int *, u_int *);
 int fpu_insn_fsqrt(struct fpemu *, union instr, int *, int *, u_int *);
-int fpu_insn_fcmp(struct fpstate64 *, struct fpemu *, union instr, int);
+int fpu_insn_fcmp(struct fpstate *, struct fpemu *, union instr, int);
 int fpu_insn_fmul(struct fpemu *, union instr, int *, int *, u_int *);
 int fpu_insn_fmulx(struct fpemu *, union instr, int *, int *, u_int *);
 int fpu_insn_fdiv(struct fpemu *, union instr, int *, int *, u_int *);
 int fpu_insn_fadd(struct fpemu *, union instr, int *, int *, u_int *);
 int fpu_insn_fsub(struct fpemu *, union instr, int *, int *, u_int *);
-int fpu_insn_fmovcc(struct proc *, struct fpstate64 *, union instr);
-int fpu_insn_fmovr(struct proc *, struct fpstate64 *, union instr);
+int fpu_insn_fmovcc(struct proc *, struct fpstate *, union instr);
+int fpu_insn_fmovr(struct proc *, struct fpstate *, union instr);
 void fpu_fcopy(u_int *, u_int *, int);
 
 #ifdef DEBUG
@@ -115,7 +115,7 @@ fpu_dumpfpn(struct fpn *fp)
fp->fp_mant[2], fp->fp_mant[3], fp->fp_exp);
 }
 void
-fpu_dumpstate(struct fpstate64 *fs)
+fpu_dumpstate(struct fpstate *fs)
 {
int i;
 
@@ -189,7 +189,7 @@ fpu_fcopy(src, dst, type)
 void
 fpu_cleanup(p, fs)
register struct proc *p;
-   register struct fpstate64 *fs;
+   register struct fpstate *fs;
 {
register int i, fsr = fs->fs_fsr, error;
union instr instr;
@@ -455,7 +455,7 @@ fpu_execute(p, fe, instr)
  */
 int
 fpu_insn_fmov(fs, fe, instr)
-   struct fpstate64 *fs;
+   struct fpstate *fs;
struct fpemu *fe;
union instr instr;
 {
@@ -478,7 +478,7 @@ fpu_insn_fmov(fs, fe, instr)
  */
 int
 fpu_insn_fabs(fs, fe, instr)
-   struct fpstate64 *fs;
+   struct fpstate *fs;
struct fpemu *fe;
union instr instr;
 {
@@ -502,7 +502,7 @@ fpu_insn_fabs(fs, fe, instr)
  */
 int
 fpu_insn_fneg(fs, fe, instr)
-   struct fpstate64 *fs;
+   struct fpstate *fs;
struct fpemu *fe;
union instr instr;
 {
@@ -644,7 +644,7 @@ fpu

Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-29 Thread Scott Cheloha
On Thu, Aug 25, 2022 at 03:57:48PM +1000, Jonathan Gray wrote:
> On Wed, Aug 24, 2022 at 11:05:30PM -0500, Scott Cheloha wrote:
> > On Wed, Aug 24, 2022 at 05:51:14PM +1000, Jonathan Gray wrote:
> > > On Tue, Aug 23, 2022 at 12:20:39PM -0500, Scott Cheloha wrote:
> > > > > Hyper-V generation 1 VMs are bios boot with emulation of the usual
> > > > > devices.  32-bit and 64-bit guests.
> > > > > 
> > > > > Hyper-V generation 2 VMs are 64-bit uefi with paravirtualised devices.
> > > > > 64-bit guests only.
> > > > > 
> > > > > There is no 8254 in generation 2.
> > > > > No HPET in either generation.
> > > > > 
> > > > > hv_delay uses the "Partition Reference Counter MSR" described in
> > > > > https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers
> > > > > It seems it is available in both generations and could be used from 
> > > > > i386?
> > > > > 
> > > > > From reading that page hv_delay() should be preferred over 
> > > > > lapic_delay()
> > > > 
> > > > Alright, I have nudged hv_delay's quality up over lapic_delay's
> > > > quality.
> > > 
> > > Before these changes, tsc is probed before pvbus.  Do the tsc sanity
> > > checks result in it not being considered an option on hyper-v?  I think
> > > the tsc_delay and hv_delay numbers should be swapped in a later commit.
> > > It is unclear if that would change the final delay_func setting.
> > 
> > Why would we prefer hv_delay() to tsc_delay() if we had a
> > constant/invariant TSC available in our Hyper-V guest?
> > 
> > When patrick@ emailed me last year about issues with delay(9) on
> > Hyper-V, he started by saying that the root of the problem was that
> > the OpenBSD guest was not opting to use tsc_delay() because the host
> > wasn't reporting a constant/invariant TSC.  So the guest was trying to
> > use i8254_delay(), which was impossible because Hyper-V Gen2 guests
> > don't have an i8254.  Hence, hv_delay() was added to the tree.
> > 
> > So, my understanding is that the addition of hv_delay() does not mean
> > tsc_delay() is worse than hv_delay().  hv_delay() was added because
> > tsc_delay() isn't always an option and (to our surprise) neither is
> > i8254_delay().
> 
> I'm not clear on when rdtsc and rdmsr would cause a vm exit.
> Presumably the reference tsc page is provided to avoid that,
> but we don't use it.  rdtsc and rdmsr don't always cause an exit.
> 
> The wording of Microsoft's "Hypervisor Top Level Functional
> Specification" reads as the interface is only available when
> the underlying machine has a constant frequency tsc.  It also
> makes the point that the interface being in time not cycles avoids
> problems when the tsc frequency changes on live migration.
> 
> "12.3 Partition Reference Time Enlightenment
> 
> The partition reference time enlightenment presents a reference
> time source to a partition which does not require an intercept into
> the hypervisor. This enlightenment is available only when the
> underlying platform provides support of an invariant processor Time
> Stamp Counter (TSC), or iTSC. In such platforms, the processor TSC
> frequency remains constant irrespective of changes in the processor's
> clock frequency due to the use of power management states such as
> ACPI processor performance states, processor idle sleep states (ACPI
> C-states), etc.
> 
> The partition reference time enlightenment uses a virtual TSC value,
> an offset and a multiplier to enable a guest partition to compute
> the normalized reference time since partition creation, in 100nS
> units. The mechanism also allows a guest partition to atomically
> compute the reference time when the guest partition is migrated to
> a platform with a different TSC rate, and provides a fallback
> mechanism to support migration to platforms without the constant
> rate TSC feature.
> 
> This facility is not intended to be used as a source of wall clock
> time, since the reference time computed using this facility will
> appear to stop during the time that a guest partition is saved until
> the subsequent restore."

If hv_delay() never causes a vm exit, but tsc_delay() *might* cause a
vm exit, and both have microsecond or better resolution, then
hv_delay() is the preferable delay(9) implementation where it is
available because vm exits have ambiguous overhead.

If that seems sensible to you, I'll commit this switch.

Index: arch/amd64/amd6

Re: all architectures: put clockframe definition in frame.h?

2022-08-29 Thread Scott Cheloha
On Fri, Aug 19, 2022 at 01:24:47PM +0200, Mark Kettenis wrote:
>
> This is one of those annoying corners where there is too much
> unnecessary MD variation. Currently travelling without a laptop, so I
> can't easily check the tree. But one note I wanted to make is that the
> definition of struct clockframe and the CLKF_XXX macros should stay
> together. 

Sure.  Here's a version that consolidates the CLKF macros into frame.h
alongside the clockframe definitions.

Notes by arch:

alpha, amd64, hppa, i386, m88k, mips64, powerpc64, sh, sparc64:

- clockframe is defined in cpu.h with CLKF macros.

- Move clockframe definition and CLKF macros from cpu.h to frame.h.

arm, powerpc:

- clockframe is defined in frame.h.

- CLKF macros are defined in cpu.h.

- Move CLKF macros from cpu.h to frame.h.

arm64, riscv64:

- clockframe is defined in cpu.h with CLKF macros.

- clockframe is *also* defined in frame.h.

- Delete clockframe definition from frame.h

- Move (other) clockframe definition and CLKF macros from cpu.h to frame.h.

sparc64 remains the only one that looks not-quite-right because
trapframe64 is defined in reg.h, not frame.h.

Index: alpha/include/cpu.h
===
RCS file: /cvs/src/sys/arch/alpha/include/cpu.h,v
retrieving revision 1.66
diff -u -p -r1.66 cpu.h
--- alpha/include/cpu.h 10 Aug 2022 10:41:35 -  1.66
+++ alpha/include/cpu.h 29 Aug 2022 16:28:56 -
@@ -297,25 +297,6 @@ cpu_rnd_messybits(void)
 }
 
 /*
- * Arguments to hardclock and gatherstats encapsulate the previous
- * machine state in an opaque clockframe.  On the Alpha, we use
- * what we push on an interrupt (a trapframe).
- */
-struct clockframe {
-   struct trapframecf_tf;
-};
-#define	CLKF_USERMODE(framep)						\
-	(((framep)->cf_tf.tf_regs[FRAME_PS] & ALPHA_PSL_USERMODE) != 0)
-#define	CLKF_PC(framep)		((framep)->cf_tf.tf_regs[FRAME_PC])
-
-/*
- * This isn't perfect; if the clock interrupt comes in before the
- * r/m/w cycle is complete, we won't be counted... but it's not
- * like this statistic has to be extremely accurate.
- */
-#define	CLKF_INTR(framep)	(curcpu()->ci_intrdepth)
-
-/*
  * This is used during profiling to integrate system time.
  */
 #definePROC_PC(p)  ((p)->p_md.md_tf->tf_regs[FRAME_PC])
Index: alpha/include/frame.h
===
RCS file: /cvs/src/sys/arch/alpha/include/frame.h,v
retrieving revision 1.4
diff -u -p -r1.4 frame.h
--- alpha/include/frame.h   23 Mar 2011 16:54:34 -  1.4
+++ alpha/include/frame.h   29 Aug 2022 16:28:56 -
@@ -92,4 +92,23 @@ struct trapframe {
unsigned long   tf_regs[FRAME_SIZE];/* See above */
 };
 
+/*
+ * Arguments to hardclock and gatherstats encapsulate the previous
+ * machine state in an opaque clockframe.  On the Alpha, we use
+ * what we push on an interrupt (a trapframe).
+ */
+struct clockframe {
+   struct trapframecf_tf;
+};
+#define	CLKF_USERMODE(framep)						\
+	(((framep)->cf_tf.tf_regs[FRAME_PS] & ALPHA_PSL_USERMODE) != 0)
+#define	CLKF_PC(framep)		((framep)->cf_tf.tf_regs[FRAME_PC])
+
+/*
+ * This isn't perfect; if the clock interrupt comes in before the
+ * r/m/w cycle is complete, we won't be counted... but it's not
+ * like this statistic has to be extremely accurate.
+ */
+#define	CLKF_INTR(framep)	(curcpu()->ci_intrdepth)
+
 #endif /* _MACHINE_FRAME_H_ */
Index: amd64/include/cpu.h
===
RCS file: /cvs/src/sys/arch/amd64/include/cpu.h,v
retrieving revision 1.149
diff -u -p -r1.149 cpu.h
--- amd64/include/cpu.h 25 Aug 2022 17:25:25 -  1.149
+++ amd64/include/cpu.h 29 Aug 2022 16:28:56 -
@@ -336,17 +336,6 @@ cpu_rnd_messybits(void)
 #define curpcb curcpu()->ci_curpcb
 
 /*
- * Arguments to hardclock, softclock and statclock
- * encapsulate the previous machine state in an opaque
- * clockframe; for now, use generic intrframe.
- */
-#define clockframe intrframe
-
-#define	CLKF_USERMODE(frame)	USERMODE((frame)->if_cs, (frame)->if_rflags)
-#define CLKF_PC(frame) ((frame)->if_rip)
-#define CLKF_INTR(frame)   (curcpu()->ci_idepth > 1)
-
-/*
  * Give a profiling tick to the current process when the user profiling
  * buffer pages are invalid.  On the i386, request an ast to send us
  * through usertrap(), marking the proc as needing a profiling tick.
Index: amd64/include/frame.h
===
RCS file: /cvs/src/sys/arch/amd64/include/frame.h,v
retrieving revision 1.10
diff -u -p -r1.10 frame.h
--- amd64/include/frame.h   10 Jul 2018 08:57:44 -  1.10
+++ amd64/include/frame.h   29 Aug 2022 16:28:56 -
@@ -171,4 +171,14 @@ struct callframe {
long 

i386/lapic.c: sync with amd64/lapic.c

2022-08-28 Thread Scott Cheloha
As promised off-list: in anticipation of merging the clock interrupt
code, let's sync up the lapic timer parts of i386/lapic.c with the
corresponding parts in amd64/lapic.c.  They will need identical
changes to use the new code, so the more alike they are the better.

Notable differences remaining in the timer code:

- We use i82489_readreg() and i82489_writereg() on i386 instead of
  lapic_readreg() and lapic_writereg().

- lapic_clockintr() is just plain different on i386, I'm not
  touching it yet.

- No way to skip_calibration on i386.

We can do synchronized cleanup in a later patch.

Does this compile and boot on i386?  If so, ok?

Index: i386/i386/lapic.c
===
RCS file: /cvs/src/sys/arch/i386/i386/lapic.c,v
retrieving revision 1.50
diff -u -p -r1.50 lapic.c
--- i386/i386/lapic.c   25 Aug 2022 17:38:16 -  1.50
+++ i386/i386/lapic.c   28 Aug 2022 20:24:55 -
@@ -244,11 +244,41 @@ u_int32_t lapic_tval;
 /*
  * this gets us up to a 4GHz busclock
  */
-u_int32_t lapic_per_second;
+u_int32_t lapic_per_second = 0;
 u_int32_t lapic_frac_usec_per_cycle;
 u_int64_t lapic_frac_cycle_per_usec;
 u_int32_t lapic_delaytab[26];
 
+void lapic_timer_oneshot(uint32_t, uint32_t);
+void lapic_timer_periodic(uint32_t, uint32_t);
+
+/*
+ * Start the local apic countdown timer.
+ *
+ * First set the mode, mask, and vector.  Then set the
+ * divisor.  Last, set the cycle count: this restarts
+ * the countdown.
+ */
+static inline void
+lapic_timer_start(uint32_t mode, uint32_t mask, uint32_t cycles)
+{
+   i82489_writereg(LAPIC_LVTT, mode | mask | LAPIC_TIMER_VECTOR);
+   i82489_writereg(LAPIC_DCR_TIMER, LAPIC_DCRT_DIV1);
+   i82489_writereg(LAPIC_ICR_TIMER, cycles);
+}
+
+void
+lapic_timer_oneshot(uint32_t mask, uint32_t cycles)
+{
+   lapic_timer_start(LAPIC_LVTT_TM_ONESHOT, mask, cycles);
+}
+
+void
+lapic_timer_periodic(uint32_t mask, uint32_t cycles)
+{
+   lapic_timer_start(LAPIC_LVTT_TM_PERIODIC, mask, cycles);
+}
+
 void
 lapic_clockintr(void *arg)
 {
@@ -262,17 +292,7 @@ lapic_clockintr(void *arg)
 void
 lapic_startclock(void)
 {
-   /*
-* Start local apic countdown timer running, in repeated mode.
-*
-* Mask the clock interrupt and set mode,
-* then set divisor,
-* then unmask and set the vector.
-*/
-   i82489_writereg(LAPIC_LVTT, LAPIC_LVTT_TM|LAPIC_LVTT_M);
-   i82489_writereg(LAPIC_DCR_TIMER, LAPIC_DCRT_DIV1);
-   i82489_writereg(LAPIC_ICR_TIMER, lapic_tval);
-   i82489_writereg(LAPIC_LVTT, LAPIC_LVTT_TM|LAPIC_TIMER_VECTOR);
+   lapic_timer_periodic(0, lapic_tval);
 }
 
 void
@@ -284,6 +304,7 @@ lapic_initclocks(void)
 }
 
 extern int gettick(void);  /* XXX put in header file */
+extern u_long rtclock_tval; /* XXX put in header file */
 
 static __inline void
 wait_next_cycle(void)
@@ -325,38 +346,45 @@ lapic_calibrate_timer(struct cpu_info *c
 * Configure timer to one-shot, interrupt masked,
 * large positive number.
 */
-   i82489_writereg(LAPIC_LVTT, LAPIC_LVTT_M);
-   i82489_writereg(LAPIC_DCR_TIMER, LAPIC_DCRT_DIV1);
-   i82489_writereg(LAPIC_ICR_TIMER, 0x80000000);
+   lapic_timer_oneshot(LAPIC_LVTT_M, 0x80000000);
 
-   s = intr_disable();
+   if (delay_func == i8254_delay) {
+   s = intr_disable();
 
-   /* wait for current cycle to finish */
-   wait_next_cycle();
+   /* wait for current cycle to finish */
+   wait_next_cycle();
 
-   startapic = lapic_gettick();
+   startapic = lapic_gettick();
 
-   /* wait the next hz cycles */
-   for (i = 0; i < hz; i++)
-   wait_next_cycle();
+   /* wait the next hz cycles */
+   for (i = 0; i < hz; i++)
+   wait_next_cycle();
 
-   endapic = lapic_gettick();
+   endapic = lapic_gettick();
 
-   intr_restore(s);
+   intr_restore(s);
 
-   dtick = hz * TIMER_DIV(hz);
-   dapic = startapic-endapic;
+   dtick = hz * rtclock_tval;
+   dapic = startapic-endapic;
 
-   /*
-* there are TIMER_FREQ ticks per second.
-* in dtick ticks, there are dapic bus clocks.
-*/
-   tmp = (TIMER_FREQ * dapic) / dtick;
+   /*
+* there are TIMER_FREQ ticks per second.
+* in dtick ticks, there are dapic bus clocks.
+*/
+   tmp = (TIMER_FREQ * dapic) / dtick;
 
-   lapic_per_second = tmp;
+   lapic_per_second = tmp;
+   } else {
+   s = intr_disable();
+   startapic = lapic_gettick();
+   delay(1 * 1000 * 1000);
+   endapic = lapic_gettick();
+   intr_restore(s);
+   lapic_per_second = startapic - endapic;
+   }
 
-   printf("%s: apic clock running at %lldMHz\n",
-   

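The calibration arithmetic in the i8254 branch above boils down to a single ratio: dapic lapic cycles were counted across dtick i8254 ticks, and the i8254 runs at TIMER_FREQ ticks per second.  A sketch of just that ratio, widened to 64 bits to avoid overflow as the kernel code does with its u_int64_t tmp (the function name is illustrative):

```c
#include <stdint.h>

#define TIMER_FREQ	1193182		/* i8254 input clock, Hz */

/*
 * dapic lapic timer cycles elapsed during dtick i8254 ticks;
 * since there are TIMER_FREQ i8254 ticks per second, this is
 * the lapic bus clock frequency in Hz.
 */
static uint32_t
lapic_cycles_per_second(uint64_t dapic, uint64_t dtick)
{
	return dapic * TIMER_FREQ / dtick;
}
```

The non-i8254 branch needs no ratio at all: delaying exactly one second and subtracting the two lapic reads yields the frequency directly.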
acpihpet(4): acpihpet_delay: only use lower 32 bits of counter

2022-08-27 Thread Scott Cheloha
Whoops, forgot about the split read problem.  My mistake.

Because 32-bit platforms cannot do bus_space_read_8 atomically, and
i386 can use acpihpet(4), we can only safely use the lower 32 bits of
the counter in acpihpet_delay() (unless we want two versions of
acpihpet_delay()... which I don't).

Switch from acpihpet_r() to bus_space_read_4(9) and accumulate cycles
as we do in acpitimer_delay().  Unlike acpitimer(4)'s 24-bit counter,
the low word of the HPET counter is a full 32 bits wide, so we don't
need to mask the difference between val1 and val2.

ok?

Index: acpihpet.c
===
RCS file: /cvs/src/sys/dev/acpi/acpihpet.c,v
retrieving revision 1.28
diff -u -p -r1.28 acpihpet.c
--- acpihpet.c  25 Aug 2022 18:01:54 -  1.28
+++ acpihpet.c  28 Aug 2022 02:26:07 -
@@ -281,13 +281,19 @@ acpihpet_attach(struct device *parent, s
 void
 acpihpet_delay(int usecs)
 {
-   uint64_t c, s;
+   uint64_t count = 0, cycles;
struct acpihpet_softc *sc = hpet_timecounter.tc_priv;
+   uint32_t val1, val2;
 
-   s = acpihpet_r(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER);
-   c = usecs * hpet_timecounter.tc_frequency / 1000000;
-   while (acpihpet_r(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER) - s < c)
+   val2 = bus_space_read_4(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER);
+   cycles = usecs * hpet_timecounter.tc_frequency / 1000000;
+   while (count < cycles) {
CPU_BUSY_CYCLE();
+   val1 = val2;
+   val2 = bus_space_read_4(sc->sc_iot, sc->sc_ioh,
+   HPET_MAIN_COUNTER);
+   count += val2 - val1;
+   }
 }
 
 u_int

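The accumulation pattern in the patch above, adding uint32_t differences into a 64-bit total, stays correct across counter rollover because unsigned subtraction is modulo 2^32, provided at most one wrap occurs between successive reads.  A minimal sketch with illustrative names:

```c
#include <stdint.h>

/*
 * Add the cycles elapsed between two successive 32-bit counter
 * reads to a running 64-bit total.  val2 - val1 is computed
 * modulo 2^32, so a single rollover between reads is handled
 * transparently.
 */
static uint64_t
accumulate_cycles(uint64_t count, uint32_t val1, uint32_t val2)
{
	return count + (uint32_t)(val2 - val1);
}
```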


Re: When did PCs stop using ISA Timer 1?

2022-08-26 Thread Scott Cheloha
On Sat, Aug 27, 2022 at 11:33:58AM +1000, Jonathan Gray wrote:
> On Fri, Aug 26, 2022 at 11:09:19AM -0500, Scott Cheloha wrote:
> > Hi,
> > 
> > TLDR:
> > 
> > 1. When did PCs stop using ISA Timer 1 to trigger DRAM refresh?
> > 
> > 2. Are any PCs that rely on ISA Timer 1 for DRAM refresh capable of
> >running OpenBSD as it exists today?
> > 
> > Long version:
> > 
> > I have a history question for the list.  Maybe one of you hardware
> > jocks or history buffs can help me out.
> > 
> > So, in the IBM AT/PC and, later, all ISA-compatible systems, the ISA
> > timer (an i8253 or compatible clock) has 3 independent 16-bit
> > counters.
> > 
> > The first, Timer 0, is available for use by the operating system.
> > 
> > The second, Timer 1, was traditionally programmed by the BIOS at a
> > particular rate to trigger DRAM refresh.
> > 
> > The third, Timer 2, is usually wired up to the PC speaker and may be
> > used by the operating system to produce primitive sound effects.
> > 
> > I found a more detailed explanation of what Timer 1 actually did in
> > this book:
> > 
> > https://ia601901.us.archive.org/12/items/ISA_System_Architecture/ISA_System_Architecture.pdf
> > 
> > > ISA System Architecture Third Edition (1995)
> > > Chapter 24: ISA Timers
> > > p. 471
> > >
> > > Refresh Timer (Timer 1)
> > > 
> > > The refresh timer, or timer 1, is a programmable frequency source. The
> > > same 1.19318MHz signal (used by timer 0) provides the refresh timer's
> > > clock input.  The programmer specifies a divisor to be divided into
> > > the input clock to yield the desired output frequency.  During the
> > > POST, a divisor of 0012h, or a decimal 18, is written to the refresh
> > > timer at I/O address 0041h.  The input clock frequency of 1.19318MHz is
> > > therefore divided by 18 to yield an output frequency of 66287.77Hz,
> > > or a pulse every 15.09 microseconds.
> > > 
> > > This is the refresh request signal that triggers the DRAM refresh
> > > logic to become bus master once every 15.09 microseconds so it can
> > > refresh another row in DRAM memory throughout the system.  For more
> > > information on DRAM refresh, refer to the chapter entitled "RAM
> > > Memory: Theory of Operation."
> > 
> > This is fascinating.
> > 
> > But obviously this is no longer true in modern PCs.  The ISA bus is
> > still emulated in modern PCs, and DRAM in modern PCs still needs
> > refreshing, but they don't rely on the emulated ISA timer to make it
> > happen.
> > 
> > So, when did PCs stop using ISA Timer 1 for DRAM refresh?
> > 
> > The IBM AT/PC was built around the 80286.  Was it with the advent of
> > the 80386 (1985)?  The 80486 (1989)?  P5 (1993)?  P6 (1995)?  Later?
> > 
> > Was the change independent of a particular processor generation jump?
> > Like, maybe a technological advance in the state of the art in DRAM
> > obsoleted the use of ISA Timer 1 for refresh?
> > 
> > And then, more importantly, are any machines that rely on ISA Timer 1
> > for DRAM refresh actually capable of running OpenBSD as it exists
> > today?
> 
> What difference does it make?  We don't use counter 1.

I noticed that on non-LAPIC systems we program channel 0 in periodic
mode with an initial count of 11932 to effect a 100hz clock interrupt.
And then we also use that same channel to count time, but because we
aren't using the full 16-bit range we need to do all this checking and
incrementing to handle premature overflow to make it appear as though
the full counter is being used.

And I had this whimsical idea: gee, wouldn't it be so much easier to
use channel 0 for clock interrupts and a different channel for
counting time?

But then I started reading and saw that channel 1 had a dedicated
purpose in the bad old days.

So I was left wondering when channel 1 stopped performing that task,
and whether those systems (a) predate the APIC and (b) can even run
OpenBSD at all.

What is the minimum chipset?  486? 586?  You've been doing some
sprucing, so I am unsure.  I know the 80386 is out.

> The PCH datasheets from 100 series and later only document counter 0
> and counter 2.
> 
> 9 series and earlier datasheet has
> "The PCH contains three counters that have fixed uses."
> 100 series and later
> "The PCH contains two counters that have fixed uses."

What does the PCH 9 series and earlier pertain to?  What socket would
have it?

(I didn't even know Intel had documented this, thanks.)



When did PCs stop using ISA Timer 1?

2022-08-26 Thread Scott Cheloha
Hi,

TLDR:

1. When did PCs stop using ISA Timer 1 to trigger DRAM refresh?

2. Are any PCs that rely on ISA Timer 1 for DRAM refresh capable of
   running OpenBSD as it exists today?

Long version:

I have a history question for the list.  Maybe one of you hardware
jocks or history buffs can help me out.

So, in the IBM AT/PC and, later, all ISA-compatible systems, the ISA
timer (an i8253 or compatible clock) has 3 independent 16-bit
counters.

The first, Timer 0, is available for use by the operating system.

The second, Timer 1, was traditionally programmed by the BIOS at a
particular rate to trigger DRAM refresh.

The third, Timer 2, is usually wired up to the PC speaker and may be
used by the operating system to produce primitive sound effects.

I found a more detailed explanation of what Timer 1 actually did in
this book:

https://ia601901.us.archive.org/12/items/ISA_System_Architecture/ISA_System_Architecture.pdf

> ISA System Architecture Third Edition (1995)
> Chapter 24: ISA Timers
> p. 471
>
> Refresh Timer (Timer 1)
> 
> The refresh timer, or timer 1, is a programmable frequency source. The
> same 1.19318MHz signal (used by timer 0) provides the refresh timer's
> clock input.  The programmer specifies a divisor to be divided into
> the input clock to yield the desired output frequency.  During the
> POST, a divisor of 0012h, or a decimal 18, is written to the refresh
> timer at I/O address 0041h.  The input clock frequency of 1.19318MHz is
> therefore divided by 18 to yield an output frequency of 66287.77Hz,
> or a pulse every 15.09 microseconds.
> 
> This is the refresh request signal that triggers the DRAM refresh
> logic to become bus master once every 15.09 microseconds so it can
> refresh another row in DRAM memory throughout the system.  For more
> information on DRAM refresh, refer to the chapter entitled "RAM
> Memory: Theory of Operation."

This is fascinating.

But obviously this is no longer true in modern PCs.  The ISA bus is
still emulated in modern PCs, and DRAM in modern PCs still needs
refreshing, but they don't rely on the emulated ISA timer to make it
happen.

So, when did PCs stop using ISA Timer 1 for DRAM refresh?

The IBM AT/PC was built around the 80286.  Was it with the advent of
the 80386 (1985)?  The 80486 (1989)?  P5 (1993)?  P6 (1995)?  Later?

Was the change independent of a particular processor generation jump?
Like, maybe a technological advance in the state of the art in DRAM
obsoleted the use of ISA Timer 1 for refresh?

And then, more importantly, are any machines that rely on ISA Timer 1
for DRAM refresh actually capable of running OpenBSD as it exists
today?

-Scott



acpihpet(4): use bus_space_{read,write}_8() where available

2022-08-25 Thread Scott Cheloha
The HPET is a 64-bit counter.  The spec permits both 32-bit and 64-bit
aligned access.  We should use bus_space_read_8() in acpihpet_r()
where it is available to improve the accuracy of acpihpet_delay().
The math is obvious: one read is faster than two.

Switching acpihpet_w() to bus_space_write_8() is not strictly
necessary, but it does shrink the object file a bit and also keeps the
two functions symmetrical.

-current:

-rw-r--r--  1 ssc  wobj 53512 Aug 25 13:28 obj/acpihpet.o

patched:

-rw-r--r--  1 ssc  wobj 50040 Aug 25 13:29 obj/acpihpet.o

So we shave 3472 bytes off the module on amd64.

As suggested by jsg@ in the big ACPI delay thread, I am using __LP64__
to decide between 4-byte and 8-byte bus access.

ok?

Index: acpihpet.c
===
RCS file: /cvs/src/sys/dev/acpi/acpihpet.c,v
retrieving revision 1.28
diff -u -p -r1.28 acpihpet.c
--- acpihpet.c  25 Aug 2022 18:01:54 -  1.28
+++ acpihpet.c  25 Aug 2022 18:34:11 -
@@ -86,20 +86,28 @@ struct cfdriver acpihpet_cd = {
 uint64_t
 acpihpet_r(bus_space_tag_t iot, bus_space_handle_t ioh, bus_size_t ioa)
 {
+#if defined(__LP64__)
+   return bus_space_read_8(iot, ioh, ioa);
+#else
uint64_t val;
 
val = bus_space_read_4(iot, ioh, ioa + 4);
val = val << 32;
val |= bus_space_read_4(iot, ioh, ioa);
return (val);
+#endif
 }
 
 void
 acpihpet_w(bus_space_tag_t iot, bus_space_handle_t ioh, bus_size_t ioa,
 uint64_t val)
 {
+#if defined(__LP64__)
+   bus_space_write_8(iot, ioh, ioa, val);
+#else
bus_space_write_4(iot, ioh, ioa + 4, val >> 32);
bus_space_write_4(iot, ioh, ioa, val & 0xffffffff);
+#endif
 }
 
 int



Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-24 Thread Scott Cheloha
On Wed, Aug 24, 2022 at 05:51:14PM +1000, Jonathan Gray wrote:
> On Tue, Aug 23, 2022 at 12:20:39PM -0500, Scott Cheloha wrote:
> > > Hyper-V generation 1 VMs are bios boot with emulation of the usual
> > > devices.  32-bit and 64-bit guests.
> > > 
> > > Hyper-V generation 2 VMs are 64-bit uefi with paravirtualised devices.
> > > 64-bit guests only.
> > > 
> > > There is no 8254 in generation 2.
> > > No HPET in either generation.
> > > 
> > > hv_delay uses the "Partition Reference Counter MSR" described in
> > > https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers
> > > It seems it is available in both generations and could be used from i386?
> > > 
> > > From reading that page hv_delay() should be preferred over lapic_delay()
> > 
> > Alright, I have nudged hv_delay's quality up over lapic_delay's
> > quality.
> 
> Before these changes, tsc is probed before pvbus.  Do the tsc sanity
> checks result in it not being considered an option on hyper-v?  I think
> the tsc_delay and hv_delay numbers should be swapped in a later commit.
> It is unclear if that would change the final delay_func setting.

Why would we prefer hv_delay() to tsc_delay() if we had a
constant/invariant TSC available in our Hyper-V guest?

When patrick@ emailed me last year about issues with delay(9) on
Hyper-V, he started by saying that the root of the problem was that
the OpenBSD guest was not opting to use tsc_delay() because the host
wasn't reporting a constant/invariant TSC.  So the guest was trying to
use i8254_delay(), which was impossible because Hyper-V Gen2 guests
don't have an i8254.  Hence, hv_delay() was added to the tree.

So, my understanding is that the addition of hv_delay() does not mean
tsc_delay() is worse than hv_delay().  hv_delay() was added because
tsc_delay() isn't always an option and (to our surprise) neither is
i8254_delay().

> It would be a good idea to have different commits for the places new
> delay callbacks are introduced.
> 
> - add delay_init()
> - use delay_init() in lapic, tsc, hv_delay
> - commit acpihpet
> - commit acpitimer

I had planned to do separate commits.  This ordering seems right.

> - swap tsc and hv_delay numbers

See above.

> > How are we looking now?
> 
> some minor suggestions inline
> 
> have you built a release with this?

Just finished building a release and upgrading with it from physical
media.  I think we are good to go.  I incorporated your suggestions
below and I'm going to do the first four suggested commits tomorrow
unless I hear otherwise.

Current combined patch is attached.

> > Index: sys/arch/amd64/amd64/lapic.c
> > ===
> > RCS file: /cvs/src/sys/arch/amd64/amd64/lapic.c,v
> > retrieving revision 1.60
> > diff -u -p -r1.60 lapic.c
> > --- sys/arch/amd64/amd64/lapic.c15 Aug 2022 04:17:50 -  1.60
> > +++ sys/arch/amd64/amd64/lapic.c23 Aug 2022 17:18:30 -
> > @@ -486,8 +486,6 @@ wait_next_cycle(void)
> > }
> >  }
> >  
> > -extern void tsc_delay(int);
> > -
> 
> this cleanup is unrelated and should be a different diff/commit

Ack, will do it separately.

> >  /*
> >   * Calibrate the local apic count-down timer (which is running at
> >   * bus-clock speed) vs. the i8254 counter/timer (which is running at
> > @@ -592,8 +590,7 @@ skip_calibration:
> >  * Now that the timer's calibrated, use the apic timer routines
> >  * for all our timing needs..
> >  */
> > -   if (delay_func == i8254_delay)
> > -   delay_func = lapic_delay;
> > +   delay_init(lapic_delay, 3000);
> > initclock_func = lapic_initclocks;
> > }
> >  }
> > Index: sys/arch/amd64/amd64/machdep.c
> > ===
> > RCS file: /cvs/src/sys/arch/amd64/amd64/machdep.c,v
> > retrieving revision 1.279
> > diff -u -p -r1.279 machdep.c
> > --- sys/arch/amd64/amd64/machdep.c  7 Aug 2022 23:56:06 -   1.279
> > +++ sys/arch/amd64/amd64/machdep.c  23 Aug 2022 17:18:31 -
> > @@ -2069,3 +2069,13 @@ check_context(const struct reg *regs, st
> >  
> > return 0;
> >  }
> > +
> > +void
> > +delay_init(void(*fn)(int), int fn_quality)
> > +{
> > +   static int cur_quality = 0;
> > +   if (fn_quality > cur_quality) {
> > +   delay_func = fn;
> > +   cur_quality = fn_quality;
> > +   }
> > +}
> > Index: sys/arch/amd64/amd64/tsc.c
> > =

Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-23 Thread Scott Cheloha
On Tue, Aug 23, 2022 at 04:04:39PM +1000, Jonathan Gray wrote:
> On Mon, Aug 22, 2022 at 09:37:02AM -0500, Scott Cheloha wrote:
> > On Wed, Aug 17, 2022 at 09:00:12PM +1000, Jonathan Gray wrote:
> > > On Wed, Aug 17, 2022 at 04:53:20PM +1000, Jonathan Gray wrote:
> > > > 
> > > > It seems to me it would be cleaner if the decision of what to use for
> > > > delay could be moved into an md file.
> > > > 
> > > > Or abstract it by having a numeric weight like timecounters or driver
> > > > match return numbers.
> > > 
> > > diff against your previous, does not change lapic_delay
> > 
> > Sorry for the delay.
> > 
> > :)
> > 
> > I was out of town.
> > 
> > I slept on this and you're right, this is better.
> > 
> > Couple tweaks:
> > 
> > - Move the quality numbers into cpu.h and give them names.  That way,
> >   the next time Intel, or AMD, or Microsoft or [...] does something
> >   foolish we don't need to rototill all these files to juggle the
> >   quality hierarchy.
> 
> While that would allow arch specific differences, removing one would
> require changing at least three places in the tree.

Okay, never mind.  Patch updated below.

> > - Update both amd64 and i386's lapic.c to use delay_init().  While we
> >   have a lapic_delay() in the tree it should cooperate with everything
> >   else.
> > 
> > - Include  in any files calling delay_init() where it
> >   isn't already included.
> > 
> > - Give the variables in delay_init() real names.
> 
> hmm
> 
> > I'm unsure about two small things:
> > 
> > - Can i386 use hv_delay()?  The i386 GENERIC config does not list
> >   hyperv(4) support so my guess is "no" and I have excluded
> >   HV_DELAY_QUALITY from i386's cpu.h.
> > 
> > - If a Hyper-V guest could choose between hv_delay() and
> >   lapic_delay(), which would be preferable?  Right now I
> >   have hv_delay() scored lower than lapic_delay().
> 
> Hyper-V generation 1 VMs are bios boot with emulation of the usual
> devices.  32-bit and 64-bit guests.
> 
> Hyper-V generation 2 VMs are 64-bit uefi with paravirtualised devices.
> 64-bit guests only.
> 
> There is no 8254 in generation 2.
> No HPET in either generation.
> 
> hv_delay uses the "Partition Reference Counter MSR" described in
> https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers
> It seems it is available in both generations and could be used from i386?
> 
> From reading that page hv_delay() should be preferred over lapic_delay()

Alright, I have nudged hv_delay's quality up over lapic_delay's
quality.

How are we looking now?

Index: sys/arch/amd64/amd64/lapic.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/lapic.c,v
retrieving revision 1.60
diff -u -p -r1.60 lapic.c
--- sys/arch/amd64/amd64/lapic.c15 Aug 2022 04:17:50 -  1.60
+++ sys/arch/amd64/amd64/lapic.c23 Aug 2022 17:18:30 -
@@ -486,8 +486,6 @@ wait_next_cycle(void)
}
 }
 
-extern void tsc_delay(int);
-
 /*
  * Calibrate the local apic count-down timer (which is running at
  * bus-clock speed) vs. the i8254 counter/timer (which is running at
@@ -592,8 +590,7 @@ skip_calibration:
 * Now that the timer's calibrated, use the apic timer routines
 * for all our timing needs..
 */
-   if (delay_func == i8254_delay)
-   delay_func = lapic_delay;
+   delay_init(lapic_delay, 3000);
initclock_func = lapic_initclocks;
}
 }
Index: sys/arch/amd64/amd64/machdep.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/machdep.c,v
retrieving revision 1.279
diff -u -p -r1.279 machdep.c
--- sys/arch/amd64/amd64/machdep.c  7 Aug 2022 23:56:06 -   1.279
+++ sys/arch/amd64/amd64/machdep.c  23 Aug 2022 17:18:31 -
@@ -2069,3 +2069,13 @@ check_context(const struct reg *regs, st
 
return 0;
 }
+
+void
+delay_init(void(*fn)(int), int fn_quality)
+{
+   static int cur_quality = 0;
+   if (fn_quality > cur_quality) {
+   delay_func = fn;
+   cur_quality = fn_quality;
+   }
+}
Index: sys/arch/amd64/amd64/tsc.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.25
diff -u -p -r1.25 tsc.c
--- sys/arch/amd64/amd64/tsc.c  12 Aug 2022 02:20:36 -  1.25
+++ sys/arch/amd64/amd64/tsc.c  23 Aug 2022 17:18:31 -
@@ -109,7 +109,7 @@ tsc_identify(struct cpu_info *ci)
 
tsc_frequency = tsc_f

Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-22 Thread Scott Cheloha
On Wed, Aug 17, 2022 at 09:00:12PM +1000, Jonathan Gray wrote:
> On Wed, Aug 17, 2022 at 04:53:20PM +1000, Jonathan Gray wrote:
> > 
> > It seems to me it would be cleaner if the decision of what to use for
> > delay could be moved into an md file.
> > 
> > Or abstract it by having a numeric weight like timecounters or driver
> > match return numbers.
> 
> diff against your previous, does not change lapic_delay

Sorry for the delay.

:)

I was out of town.

I slept on this and you're right, this is better.

Couple tweaks:

- Move the quality numbers into cpu.h and give them names.  That way,
  the next time Intel, or AMD, or Microsoft or [...] does something
  foolish we don't need to rototill all these files to juggle the
  quality hierarchy.

- Update both amd64 and i386's lapic.c to use delay_init().  While we
  have a lapic_delay() in the tree it should cooperate with everything
  else.

- Include  in any files calling delay_init() where it
  isn't already included.

- Give the variables in delay_init() real names.

I'm unsure about two small things:

- Can i386 use hv_delay()?  The i386 GENERIC config does not list
  hyperv(4) support so my guess is "no" and I have excluded
  HV_DELAY_QUALITY from i386's cpu.h.

- If a Hyper-V guest could choose between hv_delay() and
  lapic_delay(), which would be preferable?  Right now I
  have hv_delay() scored lower than lapic_delay().

Once we've sorted those out, are you OK with the attached patch?

mlarkin: still ok?

Index: sys/arch/amd64/amd64/lapic.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/lapic.c,v
retrieving revision 1.60
diff -u -p -r1.60 lapic.c
--- sys/arch/amd64/amd64/lapic.c15 Aug 2022 04:17:50 -  1.60
+++ sys/arch/amd64/amd64/lapic.c22 Aug 2022 14:33:30 -
@@ -486,8 +486,6 @@ wait_next_cycle(void)
}
 }
 
-extern void tsc_delay(int);
-
 /*
  * Calibrate the local apic count-down timer (which is running at
  * bus-clock speed) vs. the i8254 counter/timer (which is running at
@@ -592,8 +590,7 @@ skip_calibration:
 * Now that the timer's calibrated, use the apic timer routines
 * for all our timing needs..
 */
-   if (delay_func == i8254_delay)
-   delay_func = lapic_delay;
+   delay_init(lapic_delay, LAPIC_DELAY_QUALITY);
initclock_func = lapic_initclocks;
}
 }
Index: sys/arch/amd64/amd64/machdep.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/machdep.c,v
retrieving revision 1.279
diff -u -p -r1.279 machdep.c
--- sys/arch/amd64/amd64/machdep.c  7 Aug 2022 23:56:06 -   1.279
+++ sys/arch/amd64/amd64/machdep.c  22 Aug 2022 14:33:30 -
@@ -2069,3 +2069,13 @@ check_context(const struct reg *regs, st
 
return 0;
 }
+
+void
+delay_init(void(*fn)(int), int fn_quality)
+{
+   static int cur_quality = 0;
+   if (fn_quality > cur_quality) {
+   delay_func = fn;
+   cur_quality = fn_quality;
+   }
+}
Index: sys/arch/amd64/amd64/tsc.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.25
diff -u -p -r1.25 tsc.c
--- sys/arch/amd64/amd64/tsc.c  12 Aug 2022 02:20:36 -  1.25
+++ sys/arch/amd64/amd64/tsc.c  22 Aug 2022 14:33:30 -
@@ -109,7 +109,7 @@ tsc_identify(struct cpu_info *ci)
 
tsc_frequency = tsc_freq_cpuid(ci);
if (tsc_frequency > 0)
-   delay_func = tsc_delay;
+   delay_init(tsc_delay, TSC_DELAY_QUALITY);
 }
 
 static inline int
Index: sys/arch/amd64/include/cpu.h
===
RCS file: /cvs/src/sys/arch/amd64/include/cpu.h,v
retrieving revision 1.148
diff -u -p -r1.148 cpu.h
--- sys/arch/amd64/include/cpu.h22 Aug 2022 08:57:54 -  1.148
+++ sys/arch/amd64/include/cpu.h22 Aug 2022 14:33:30 -
@@ -364,6 +364,13 @@ struct timeval;
 #define DELAY(x)   (*delay_func)(x)
 #define delay(x)   (*delay_func)(x)
 
+void delay_init(void(*)(int), int);
+
+#define ACPITIMER_DELAY_QUALITY100
+#define ACPIHPET_DELAY_QUALITY 200
+#define HV_DELAY_QUALITY   250
+#define LAPIC_DELAY_QUALITY300
+#define TSC_DELAY_QUALITY  400
 
 #ifdef _KERNEL
 /* locore.S */
Index: sys/arch/i386/i386/lapic.c
===
RCS file: /cvs/src/sys/arch/i386/i386/lapic.c,v
retrieving revision 1.49
diff -u -p -r1.49 lapic.c
--- sys/arch/i386/i386/lapic.c  15 Aug 2022 04:17:50 -  1.49
+++ sys/arch/i386/i386/lapic.c  22 Aug 2022 14:33:30 -
@@ -395,7 +395,7 @@ lapic_calibrate_timer(struct cpu_info *c
 * Now that the timer's calibrated, use the apic timer routines
 * for all our timing needs..
 

Re: all architectures: put clockframe definition in frame.h?

2022-08-19 Thread Scott Cheloha
> On Aug 19, 2022, at 01:52, Claudio Jeker  wrote:
> 
> On Thu, Aug 18, 2022 at 10:32:36PM -0500, Scott Cheloha wrote:
>> Hi,
>> 
>> clockframe is sometimes defined in cpu.h, sometimes in frame.h, and
>> sometimes defined once each in both header files.
>> 
>> Can we put the clockframe definitions in frame.h?  Always?  It is, at
>> least ostensibly, a "frame".
>> 
>> I do not want to consolidate the clockframe definitions in cpu.h
>> because this is creating a circular dependency problem for my clock
>> interrupt patch.
>> 
>> In particular, cpu.h needs a data structure defined in a new header
>> file to add it to struct cpu_info on all architectures, like this:
>> 
>> /* cpu.h */
>> 
>> #include 
>> 
>> struct cpu_info {
>>/* ... */
>>struct clockintr_state;
>> };
>> 
>> ... but the header clockintr.h needs the clockframe definition so it
>> can prototype functions accepting a clockframe pointer, like this:
>> 
>> /* clockintr.h */
>> 
>> #include /* this works fine */
>> 
>> #ifdef this_does_not_work
>> #include 
>> #endif
>> 
>> int clockintr_foo(struct clockframe *, int, short);
>> int clockintr_bar(struct clockframe *, char *, long);
> 
> You can also just do:
> 
> struct clockframe;
> int clockintr_foo(struct clockframe *, int, short);
> int clockintr_bar(struct clockframe *, char *, long);
> 
> There is no need to have the full struct definition for a pointer.
> With that there is no need to include machine/frame.h or machine/cpu.h
> In my opinion this may be the preferred way of handling this but unifying
> the definitions into one place still makes sense to me.

That was the first thing I tried.

It doesn't work if clockframe is not a real struct.  On some platforms
it's just a preprocessor macro, so you end up with a redefinition and
compilation fails, hence this patch.

That particular landmine is already in the tree, in systm.h.  It seems
to have lain dormant due to #include ordering and luck.

I think there is also a discussion to be had about whether we could
just throw away the "clockframe" abstraction, put a
trapframe/intrframe/whatever pointer into cpu_info, and rewrite the
CLKF macros to use a cpu_info pointer...

... but that seemed like more work than just making the
place-of-definition consistent.

Does that sound more appealing than "fixing" clockframe?



all architectures: put clockframe definition in frame.h?

2022-08-18 Thread Scott Cheloha
Hi,

clockframe is sometimes defined in cpu.h, sometimes in frame.h, and
sometimes defined once each in both header files.

Can we put the clockframe definitions in frame.h?  Always?  It is, at
least ostensibly, a "frame".

I do not want to consolidate the clockframe definitions in cpu.h
because this is creating a circular dependency problem for my clock
interrupt patch.

In particular, cpu.h needs a data structure defined in a new header
file to add it to struct cpu_info on all architectures, like this:

/* cpu.h */

#include 

struct cpu_info {
/* ... */
struct clockintr_state;
};

... but the header clockintr.h needs the clockframe definition so it
can prototype functions accepting a clockframe pointer, like this:

/* clockintr.h */

#include   /* this works fine */

#ifdef this_does_not_work
#include 
#endif

int clockintr_foo(struct clockframe *, int, short);
int clockintr_bar(struct clockframe *, char *, long);

struct clockintr_state {
char *cs_foo;
int cs_bar;
};

--

Hopefully I have illustrated the problem.

The only architecture where this might be a problem is sparc64.
There, clockframe is defined in terms of trapframe64, which is defined
in reg.h, not frame.h.

kettenis: can we put clockframe in frame.h on sparc64 or am I buying
trouble?

I can't compile-test this everywhere, but because every architecture's
cpu.h includes frame.h I don't think this can break anything (except
on sparc64).

The CLKF macros can remain in cpu.h.  They are not data structures so
putting them in frame.h looks odd on most architectures.

Index: alpha/include/cpu.h
===
RCS file: /cvs/src/sys/arch/alpha/include/cpu.h,v
retrieving revision 1.66
diff -u -p -r1.66 cpu.h
--- alpha/include/cpu.h 10 Aug 2022 10:41:35 -  1.66
+++ alpha/include/cpu.h 19 Aug 2022 03:27:06 -
@@ -296,14 +296,6 @@ cpu_rnd_messybits(void)
return alpha_rpcc();
 }
 
-/*
- * Arguments to hardclock and gatherstats encapsulate the previous
- * machine state in an opaque clockframe.  On the Alpha, we use
- * what we push on an interrupt (a trapframe).
- */
-struct clockframe {
-   struct trapframecf_tf;
-};
 #defineCLKF_USERMODE(framep)   
\
(((framep)->cf_tf.tf_regs[FRAME_PS] & ALPHA_PSL_USERMODE) != 0)
 #defineCLKF_PC(framep) ((framep)->cf_tf.tf_regs[FRAME_PC])
Index: alpha/include/frame.h
===
RCS file: /cvs/src/sys/arch/alpha/include/frame.h,v
retrieving revision 1.4
diff -u -p -r1.4 frame.h
--- alpha/include/frame.h   23 Mar 2011 16:54:34 -  1.4
+++ alpha/include/frame.h   19 Aug 2022 03:27:08 -
@@ -92,4 +92,13 @@ struct trapframe {
unsigned long   tf_regs[FRAME_SIZE];/* See above */
 };
 
+/*
+ * Arguments to hardclock and gatherstats encapsulate the previous
+ * machine state in an opaque clockframe.  On the Alpha, we use
+ * what we push on an interrupt (a trapframe).
+ */
+struct clockframe {
+   struct trapframecf_tf;
+};
+
 #endif /* _MACHINE_FRAME_H_ */
Index: amd64/include/cpu.h
===
RCS file: /cvs/src/sys/arch/amd64/include/cpu.h,v
retrieving revision 1.147
diff -u -p -r1.147 cpu.h
--- amd64/include/cpu.h 12 Aug 2022 02:20:36 -  1.147
+++ amd64/include/cpu.h 19 Aug 2022 03:27:08 -
@@ -335,13 +335,6 @@ cpu_rnd_messybits(void)
 
 #define curpcb curcpu()->ci_curpcb
 
-/*
- * Arguments to hardclock, softclock and statclock
- * encapsulate the previous machine state in an opaque
- * clockframe; for now, use generic intrframe.
- */
-#define clockframe intrframe
-
 #defineCLKF_USERMODE(frame)USERMODE((frame)->if_cs, 
(frame)->if_rflags)
 #define CLKF_PC(frame) ((frame)->if_rip)
 #define CLKF_INTR(frame)   (curcpu()->ci_idepth > 1)
Index: amd64/include/frame.h
===
RCS file: /cvs/src/sys/arch/amd64/include/frame.h,v
retrieving revision 1.10
diff -u -p -r1.10 frame.h
--- amd64/include/frame.h   10 Jul 2018 08:57:44 -  1.10
+++ amd64/include/frame.h   19 Aug 2022 03:27:08 -
@@ -138,6 +138,12 @@ struct intrframe {
int64_t if_ss;
 };
 
+/*
+ * Arguments to hardclock, softclock and statclock
+ * encapsulate the previous machine state in an opaque
+ * clockframe; for now, use generic intrframe.
+ */
+#define clockframe intrframe
 
 /*
  * The trampoline frame used on the kernel stack page which is present
Index: arm64/include/cpu.h
===
RCS file: /cvs/src/sys/arch/arm64/include/cpu.h,v
retrieving revision 1.27
diff -u -p -r1.27 cpu.h
--- arm64/include/cpu.h 13 Jul 2022 09:28:19 -  1.27
+++ arm64/include/cpu.h 19 Aug 2022 03:27:08 -
@@ -49,7 +49,6 @@
 
 /* All the CLKF_* macros 

Re: mips64: trigger deferred timer interrupt from splx(9)

2022-08-17 Thread Scott Cheloha
On Wed, Aug 17, 2022 at 01:30:29PM +, Visa Hankala wrote:
> On Tue, Aug 09, 2022 at 09:54:02AM -0500, Scott Cheloha wrote:
> > On Tue, Aug 09, 2022 at 02:03:31PM +, Visa Hankala wrote:
> > > On Mon, Aug 08, 2022 at 02:52:37AM -0500, Scott Cheloha wrote:
> > > > One thing I'm still uncertain about is how glxclk fits into the
> > > > loongson picture.  It's an interrupt clock that runs hardclock() and
> > > > statclock(), but the code doesn't do any logical masking, so I don't
> > > > know whether or not I need to adjust anything in that code or account
> > > > for it at all.  If there's no logical masking there's no deferral, so
> > > > it would never call need to call md_triggerclock() from splx(9).
> > > 
> > > I think the masking of glxclk interrupts are handled by the ISA
> > > interrupt code.
> > 
> > Do those machines not have Coprocessor 0?  If they do, why would you
> > prefer glxclk over CP0?
> > 
> > > The patch misses md_triggerclock definition in mips64_machdep.c.
> > 
> > Whoops, forgot that file.  Fuller patch below.
> > 
> > > I have put this to the test on the mips64 ports builder machines.
> 
> The machines completed a build with this patch without problems.
> I tested with the debug counters removed from cp0_trigger_int5().
> 
> OK visa@

Thank you for testing!

There was a loongson portion to this patch.  Is this OK on loongson or
just octeon?

Also, what did the debug counters look like when you yanked them?  If
cp0_raise_miss was non-zero I will double the initial offset to 32
cycles.



Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-16 Thread Scott Cheloha
On Wed, Aug 17, 2022 at 02:28:14PM +1000, Jonathan Gray wrote:
> On Tue, Aug 16, 2022 at 11:53:51AM -0500, Scott Cheloha wrote:
> > On Sun, Aug 14, 2022 at 11:24:37PM -0500, Scott Cheloha wrote:
> > > 
> > > In the future when the LAPIC timer is run in oneshot mode there will
> > > be no lapic_delay().
> > > 
> > > [...]
> > > 
> > > This is *very* bad for older amd64 machines, because you are left with
> > > i8254_delay().
> > > 
> > > I would like to offer a less awful delay(9) implementation for this
> > > class of hardware.  Otherwise we may trip over bizarre phantom bugs on
> > > MP kernels because only one CPU can read the i8254 at a time.
> > > 
> > > [...]
> > > 
> > > Real i386 hardware should be fine.  Later models with an ACPI PM timer
> > > will be fine using acpitimer_delay() instead of i8254_delay().
> > > 
> > > [...]
> > > 
> > > Here are the sample measurements from my 2017 laptop (kaby lake
> > > refresh) running the attached patch.  It takes longer than a
> > > microsecond to read either of the ACPI timers.  The PM timer is better
> > > than the HPET.  The HPET is a bit better than the i8254.  I hope the
> > > numbers are a little better on older hardware.
> > > 
> > > acpitimer_test_delay:  expected  0.01000  actual  0.10638  error  
> > > 0.09638
> > > acpitimer_test_delay:  expected  0.1  actual  0.15464  error  
> > > 0.05464
> > > acpitimer_test_delay:  expected  0.00010  actual  0.000107619  error  
> > > 0.07619
> > > acpitimer_test_delay:  expected  0.00100  actual  0.001007275  error  
> > > 0.07275
> > > acpitimer_test_delay:  expected  0.01000  actual  0.010007891  error  
> > > 0.07891
> > > 
> > > acpihpet_test_delay:   expected  0.01000  actual  0.22208  error  
> > > 0.21208
> > > acpihpet_test_delay:   expected  0.1  actual  0.31690  error  
> > > 0.21690
> > > acpihpet_test_delay:   expected  0.00010  actual  0.000112647  error  
> > > 0.12647
> > > acpihpet_test_delay:   expected  0.00100  actual  0.001021480  error  
> > > 0.21480
> > > acpihpet_test_delay:   expected  0.01000  actual  0.010013736  error  
> > > 0.13736
> > > 
> > > i8254_test_delay:  expected  0.01000  actual  0.40110  error  
> > > 0.39110
> > > i8254_test_delay:  expected  0.1  actual  0.39471  error  
> > > 0.29471
> > > i8254_test_delay:  expected  0.00010  actual  0.000128031  error  
> > > 0.28031
> > > i8254_test_delay:  expected  0.00100  actual  0.001024586  error  
> > > 0.24586
> > > i8254_test_delay:  expected  0.01000  actual  0.010021859  error  
> > > 0.21859
> > 
> > Attached is an updated patch.  I left the test measurement code in
> > place because I would like to see a test on a real i386 machine, just
> > to make sure it works as expected.  I can't imagine why it wouldn't
> > work, but we should never assume anything.
> > 
> > Changes from v1:
> > 
> > - Actually set delay_func from acpitimerattach() and
> >   acpihpet_attach().
> > 
> >   I think it's safe to assume, on real hardware, that the ACPI PMT is
> >   preferable to the i8254 and the HPET is preferable to both of them.
> > 
> >   This is not *always* true, but it is true on the older machines that
> >   can't use tsc_delay(), so the assumption works in practice.
> > 
> >   Outside of those three timers, the hierarchy gets murky.  There are
> >   other timers that are better than the HPET, but they aren't always
> >   available.  If those timers are already providing delay_func this
> >   code does not usurp them.
> 
> As I understand it, you want lapic to be in one-shot mode for something
> along the lines of tickless.

Yes.

Although "tickless" is a misnomer.

> So you are trying to find MP machines
> where TSC is not useable for delay?

Right.  Those are the only machines where it's relevant to consider
the accuracy of acpitimer_delay() or acpihpet_delay()... unless I've
forgotten something.

> TSC is only considered for delay if the invariant and constant flags
> are set.
> invariant:
> "In the Core i7 and future processor generations, the TSC will continue
> to run in the deepest C-states. Therefore, the TSC will run at a
> constant rate in all ACPI P-,

Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-16 Thread Scott Cheloha
On Tue, Aug 16, 2022 at 11:53:51AM -0500, Scott Cheloha wrote:
> On Sun, Aug 14, 2022 at 11:24:37PM -0500, Scott Cheloha wrote:
> > 
> > In the future when the LAPIC timer is run in oneshot mode there will
> > be no lapic_delay().
> > 
> > [...]
> > 
> > This is *very* bad for older amd64 machines, because you are left with
> > i8254_delay().
> > 
> > I would like to offer a less awful delay(9) implementation for this
> > class of hardware.  Otherwise we may trip over bizarre phantom bugs on
> > MP kernels because only one CPU can read the i8254 at a time.
> > 
> > [...]
> > 
> > Real i386 hardware should be fine.  Later models with an ACPI PM timer
> > will be fine using acpitimer_delay() instead of i8254_delay().
> > 
> > [...]
> 
> Attched is an updated patch.  I left the test measurement code in
> place because I would like to see a test on a real i386 machine, just
> to make sure it works as expected.  I can't imagine why it wouldn't
> work, but we should never assume anything.
> 
> [...]
> 
> One remaining question I have:
> 
> Is there a nice way to test whether ACPI PMT support is compiled into
> the kernel?  We can assume the existence of i8254_delay() because
> clock.c is required on amd64 and i386.  However, acpitimer.c is
> optional, so acpitimer_delay() isn't necessarily there.
> 
> I would rather not introduce a hard requirement on acpitimer.c into
> acpihpet.c if there's an easy way to check for the latter.
> 
> Any ideas?

And here's the cleaned up patch.  Just in case nobody tests i386.
Pretty straightforward.  acpitimer is preferable to i8254, hpet is
preferable to acpitimer and i8254.

The only obvious problem I see is the hard dependency this creates in
acpihpet.c on acpitimer.c.

Index: acpitimer.c
===
RCS file: /cvs/src/sys/dev/acpi/acpitimer.c,v
retrieving revision 1.15
diff -u -p -r1.15 acpitimer.c
--- acpitimer.c 6 Apr 2022 18:59:27 -   1.15
+++ acpitimer.c 17 Aug 2022 02:56:10 -
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -25,10 +26,13 @@
 #include 
 #include 
 
+struct acpitimer_softc;
+
 int acpitimermatch(struct device *, void *, void *);
 void acpitimerattach(struct device *, struct device *, void *);
-
+void acpitimer_delay(int);
 u_int acpi_get_timecount(struct timecounter *tc);
+uint32_t acpitimer_read(struct acpitimer_softc *);
 
 static struct timecounter acpi_timecounter = {
.tc_get_timecount = acpi_get_timecount,
@@ -98,18 +102,45 @@ acpitimerattach(struct device *parent, s
acpi_timecounter.tc_priv = sc;
acpi_timecounter.tc_name = sc->sc_dev.dv_xname;
tc_init(_timecounter);
+
+#if defined(__amd64__) || defined(__i386__)
+   if (delay_func == i8254_delay)
+   delay_func = acpitimer_delay;
+#endif
 #if defined(__amd64__)
extern void cpu_recalibrate_tsc(struct timecounter *);
cpu_recalibrate_tsc(_timecounter);
 #endif
 }
 
+void
+acpitimer_delay(int usecs)
+{
+   uint64_t count = 0, cycles;
+   struct acpitimer_softc *sc = acpi_timecounter.tc_priv;
+   uint32_t mask = acpi_timecounter.tc_counter_mask;
+   uint32_t val1, val2;
+
+   val2 = acpitimer_read(sc);
+   cycles = usecs * acpi_timecounter.tc_frequency / 1000000;
+   while (count < cycles) {
+   CPU_BUSY_CYCLE();
+   val1 = val2;
+   val2 = acpitimer_read(sc);
+   count += (val2 - val1) & mask;
+   }
+}
 
 u_int
 acpi_get_timecount(struct timecounter *tc)
 {
-   struct acpitimer_softc *sc = tc->tc_priv;
-   u_int u1, u2, u3;
+   return acpitimer_read(tc->tc_priv);
+}
+
+uint32_t
+acpitimer_read(struct acpitimer_softc *sc)
+{
+   uint32_t u1, u2, u3;
 
u2 = bus_space_read_4(sc->sc_iot, sc->sc_ioh, 0);
u3 = bus_space_read_4(sc->sc_iot, sc->sc_ioh, 0);
Index: acpihpet.c
===
RCS file: /cvs/src/sys/dev/acpi/acpihpet.c,v
retrieving revision 1.26
diff -u -p -r1.26 acpihpet.c
--- acpihpet.c  6 Apr 2022 18:59:27 -   1.26
+++ acpihpet.c  17 Aug 2022 02:56:10 -
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -31,7 +32,7 @@ int acpihpet_attached;
 int acpihpet_match(struct device *, void *, void *);
 void acpihpet_attach(struct device *, struct device *, void *);
 int acpihpet_activate(struct device *, int);
-
+void acpihpet_delay(int);
 u_int acpihpet_gettime(struct timecounter *tc);
 
 uint64_t   acpihpet_r(bus_space_tag_t _iot, bus_space_handle_t _ioh,
@@ -262,15 +263,37 @@ acpihpet_attach(struct device *parent, s
freq = 1000000000000000ull / period;
printf(": %lld Hz\n", freq);
 

Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-16 Thread Scott Cheloha
On Sun, Aug 14, 2022 at 11:24:37PM -0500, Scott Cheloha wrote:
> 
> In the future when the LAPIC timer is run in oneshot mode there will
> be no lapic_delay().
> 
> [...]
> 
> This is *very* bad for older amd64 machines, because you are left with
> i8254_delay().
> 
> I would like to offer a less awful delay(9) implementation for this
> class of hardware.  Otherwise we may trip over bizarre phantom bugs on
> MP kernels because only one CPU can read the i8254 at a time.
> 
> [...]
> 
> Real i386 hardware should be fine.  Later models with an ACPI PM timer
> will be fine using acpitimer_delay() instead of i8254_delay().
> 
> [...]
> 
> Here are the sample measurements from my 2017 laptop (kaby lake
> refresh) running the attached patch.  It takes longer than a
> microsecond to read either of the ACPI timers.  The PM timer is better
> than the HPET.  The HPET is a bit better than the i8254.  I hope the
> numbers are a little better on older hardware.
> 
> acpitimer_test_delay:  expected  0.01000  actual  0.10638  error  
> 0.09638
> acpitimer_test_delay:  expected  0.1  actual  0.15464  error  
> 0.05464
> acpitimer_test_delay:  expected  0.00010  actual  0.000107619  error  
> 0.07619
> acpitimer_test_delay:  expected  0.00100  actual  0.001007275  error  
> 0.07275
> acpitimer_test_delay:  expected  0.01000  actual  0.010007891  error  
> 0.07891
> 
> acpihpet_test_delay:   expected  0.01000  actual  0.22208  error  
> 0.21208
> acpihpet_test_delay:   expected  0.1  actual  0.31690  error  
> 0.21690
> acpihpet_test_delay:   expected  0.00010  actual  0.000112647  error  
> 0.12647
> acpihpet_test_delay:   expected  0.00100  actual  0.001021480  error  
> 0.21480
> acpihpet_test_delay:   expected  0.01000  actual  0.010013736  error  
> 0.13736
> 
> i8254_test_delay:  expected  0.01000  actual  0.40110  error  
> 0.39110
> i8254_test_delay:  expected  0.1  actual  0.39471  error  
> 0.29471
> i8254_test_delay:  expected  0.00010  actual  0.000128031  error  
> 0.28031
> i8254_test_delay:  expected  0.00100  actual  0.001024586  error  
> 0.24586
> i8254_test_delay:  expected  0.01000  actual  0.010021859  error  
> 0.21859

Attached is an updated patch.  I left the test measurement code in
place because I would like to see a test on a real i386 machine, just
to make sure it works as expected.  I can't imagine why it wouldn't
work, but we should never assume anything.

Changes from v1:

- Actually set delay_func from acpitimerattach() and
  acpihpet_attach().

  I think it's safe to assume, on real hardware, that the ACPI PMT is
  preferable to the i8254 and the HPET is preferable to both of them.

  This is not *always* true, but it is true on the older machines that
  can't use tsc_delay(), so the assumption works in practice.

  Outside of those three timers, the hierarchy gets murky.  There are
  other timers that are better than the HPET, but they aren't always
  available.  If those timers are already providing delay_func this
  code does not usurp them.

- Duplicate test measurement code from amd64/lapic.c into i386/lapic.c.
  Will be removed in the committed version.

- Use bus_space_read_8() in acpihpet.c if it's available.  The HPET is
  a 64-bit counter and the spec permits 32-bit or 64-bit aligned access.

  As one might predict, this cuts the overhead in half because we're
  doing half as many reads.

  This part can go into a separate commit, but I thought it was neat
  so I'm including it here.
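The acpihpet_r() implementation isn't shown here, but the reason one 64-bit read beats two 32-bit reads is more than just bus traffic: stitching a 64-bit counter together from two 32-bit reads also needs a retry loop in case the high word ticks between the reads.  A standalone sketch of that technique, with fake_hi/fake_lo standing in for the real MMIO register reads (illustrative names, not kernel API):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for 32-bit MMIO reads of a 64-bit free-running counter. */
static uint32_t fake_hi, fake_lo;

static uint32_t read_lo(void) { return fake_lo; }
static uint32_t read_hi(void) { return fake_hi; }

/*
 * Compose a consistent 64-bit value from two 32-bit reads.  If the
 * high word changes while we're reading, the lo/hi pair is torn and
 * we must retry.  A single bus_space_read_8() avoids all of this.
 */
static uint64_t
read_counter_2x32(void)
{
	uint32_t hi, lo;

	do {
		hi = read_hi();
		lo = read_lo();
	} while (hi != read_hi());	/* high word moved: try again */
	return ((uint64_t)hi << 32) | lo;
}
```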

One remaining question I have:

Is there a nice way to test whether ACPI PMT support is compiled into
the kernel?  We can assume the existence of i8254_delay() because
clock.c is required on amd64 and i386.  However, acpitimer.c is
optional, so acpitimer_delay() isn't necessarily there.

I would rather not introduce a hard requirement on acpitimer.c into
acpihpet.c if there's an easy way to check for the latter.

Any ideas?

Anyone have i386 hardware results?  If I'm reading the timeline right,
most P6 machines and beyond (NetBurst, etc) will have an ACPI PMT.  I
don't know if any real x86 motherboards shipped with an HPET, but it's
possible.
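One hypothetical way to express the ordering described above (i8254 < ACPI PMT < HPET, and "don't usurp a better timer") is a quality-ranked installer.  delay_init() and the quality values below are illustrative only, not existing kernel API:

```c
#include <assert.h>

typedef void (*delay_fn)(int);

/* Stub delay implementations standing in for the real routines. */
static void i8254_delay(int usecs) { (void)usecs; }
static void acpitimer_delay(int usecs) { (void)usecs; }
static void acpihpet_delay(int usecs) { (void)usecs; }

static delay_fn delay_func = i8254_delay;
static int delay_quality;		/* rank of the current delay_func */

/*
 * Install fn as the system delay routine only if it outranks the
 * incumbent.  A worse timer attaching later cannot usurp a better one.
 */
static void
delay_init(delay_fn fn, int quality)
{
	if (quality > delay_quality) {
		delay_func = fn;
		delay_quality = quality;
	}
}
```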

Here are my updated results with the bus_space_read_8 change:

acpitimer_test_delay:  expected  0.01000  actual  0.10607  error  
0.09607
acpitimer_test_delay:  expected  0.1  actual  0.15491  error  
0.05491
acpitimer_test_delay:  expected  0.00010  actual  0.000107734  error  
0.07734
acpitimer_test_delay:  expected  0.00100  actual  0.001008006  error  
0.08006
acpitimer_test_delay:  expected  0.01000  actual  0.010007042  error  
0.07042

acpihpet_test_delay

[RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-14 Thread Scott Cheloha
Hi,

In the future when the LAPIC timer is run in oneshot mode there will
be no lapic_delay().

This is fine if you have a constant TSC, because we have tsc_delay().

This is *very* bad for older amd64 machines, because you are left with
i8254_delay().

I would like to offer a less awful delay(9) implementation for this
class of hardware.  Otherwise we may trip over bizarre phantom bugs on
MP kernels because only one CPU can read the i8254 at a time.

I think patrick@ was struggling with some version of that problem last
year, but in a VM.

Real i386 hardware should be fine.  Later models with an ACPI PM timer
will be fine using acpitimer_delay() instead of i8254_delay().

If this seems reasonable to people I will come back with a cleaned up
patch for testing.

Thoughts?  Preferences?

-Scott

Here are the sample measurements from my 2017 laptop (kaby lake
refresh) running the attached patch.  It takes longer than a
microsecond to read either of the ACPI timers.  The PM timer is better
than the HPET.  The HPET is a bit better than the i8254.  I hope the
numbers are a little better on older hardware.

acpitimer_test_delay:  expected  0.01000  actual  0.10638  error  
0.09638
acpitimer_test_delay:  expected  0.1  actual  0.15464  error  
0.05464
acpitimer_test_delay:  expected  0.00010  actual  0.000107619  error  
0.07619
acpitimer_test_delay:  expected  0.00100  actual  0.001007275  error  
0.07275
acpitimer_test_delay:  expected  0.01000  actual  0.010007891  error  
0.07891

acpihpet_test_delay:   expected  0.01000  actual  0.22208  error  
0.21208
acpihpet_test_delay:   expected  0.1  actual  0.31690  error  
0.21690
acpihpet_test_delay:   expected  0.00010  actual  0.000112647  error  
0.12647
acpihpet_test_delay:   expected  0.00100  actual  0.001021480  error  
0.21480
acpihpet_test_delay:   expected  0.01000  actual  0.010013736  error  
0.13736

i8254_test_delay:  expected  0.01000  actual  0.40110  error  
0.39110
i8254_test_delay:  expected  0.1  actual  0.39471  error  
0.29471
i8254_test_delay:  expected  0.00010  actual  0.000128031  error  
0.28031
i8254_test_delay:  expected  0.00100  actual  0.001024586  error  
0.24586
i8254_test_delay:  expected  0.01000  actual  0.010021859  error  
0.21859

Index: dev/acpi/acpihpet.c
===
RCS file: /cvs/src/sys/dev/acpi/acpihpet.c,v
retrieving revision 1.26
diff -u -p -r1.26 acpihpet.c
--- dev/acpi/acpihpet.c 6 Apr 2022 18:59:27 -   1.26
+++ dev/acpi/acpihpet.c 15 Aug 2022 04:21:58 -
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -31,8 +32,9 @@ int acpihpet_attached;
 int acpihpet_match(struct device *, void *, void *);
 void acpihpet_attach(struct device *, struct device *, void *);
 int acpihpet_activate(struct device *, int);
-
+void acpihpet_delay(u_int);
 u_int acpihpet_gettime(struct timecounter *tc);
+void acpihpet_test_delay(u_int);
 
 uint64_t   acpihpet_r(bus_space_tag_t _iot, bus_space_handle_t _ioh,
bus_size_t _ioa);
@@ -262,7 +264,7 @@ acpihpet_attach(struct device *parent, s
freq = 1000000000000000ull / period;
printf(": %lld Hz\n", freq);
 
-   hpet_timecounter.tc_frequency = (uint32_t)freq;
+   hpet_timecounter.tc_frequency = freq;
hpet_timecounter.tc_priv = sc;
hpet_timecounter.tc_name = sc->sc_dev.dv_xname;
tc_init(&hpet_timecounter);
@@ -273,10 +275,43 @@ acpihpet_attach(struct device *parent, s
acpihpet_attached++;
 }
 
+void
+acpihpet_delay(u_int usecs)
+{
+   uint64_t d, s;
+   struct acpihpet_softc *sc = hpet_timecounter.tc_priv;
+
+   d = usecs * hpet_timecounter.tc_frequency / 1000000;
+   s = acpihpet_r(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER);
+   while (acpihpet_r(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER) - s < d)
+   CPU_BUSY_CYCLE();
+}
+
 u_int
 acpihpet_gettime(struct timecounter *tc)
 {
struct acpihpet_softc *sc = tc->tc_priv;
 
return (bus_space_read_4(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER));
+}
+
+void
+acpihpet_test_delay(u_int usecs)
+{
+   struct timespec ac, er, ex, t0, t1;
+
+   if (!acpihpet_attached) {
+   printf("%s: (no hpet attached)\n", __func__);
+   return;
+   }
+
+   nanouptime(&t0);
+   acpihpet_delay(usecs);
+   nanouptime(&t1);
+   timespecsub(&t1, &t0, &ac);
+   NSEC_TO_TIMESPEC(usecs * 1000ULL, &ex);
+   timespecsub(&ac, &ex, &er);
+   printf("%s: expected %lld.%09ld actual %lld.%09ld error %lld.%09ld\n",
+   __func__, ex.tv_sec, ex.tv_nsec, ac.tv_sec, ac.tv_nsec,
+   er.tv_sec, er.tv_nsec);
 }
Index: dev/acpi/acpitimer.c
===
RCS file: 

renice(8): don't succeed after 256 errors

2022-08-11 Thread Scott Cheloha
This is a good one.

$ renice -n -1 -p 1 ; echo $?
renice: setpriority: 1: Operation not permitted
1
$ renice -n -1 -p 1 1 ; echo $?
renice: setpriority: 1: Operation not permitted
renice: setpriority: 1: Operation not permitted
2
$ renice -n -1 -p 1 1 1 ; echo $?
renice: setpriority: 1: Operation not permitted
renice: setpriority: 1: Operation not permitted
renice: setpriority: 1: Operation not permitted
3
$ renice -n -1 -p $(jot -b 1 256) 2>/dev/null; echo $?
0

Fix is to just set error instead of incrementing it.
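The underlying failure mode is that wait statuses are reported modulo 256, so returning an error *count* from main() wraps around to 0 ("success") at exactly 256 failures.  A minimal demonstration:

```shell
# Exit statuses are truncated to their low 8 bits, so a count of 256
# is indistinguishable from success.
sh -c 'exit 1'; echo $?      # 1
sh -c 'exit 255'; echo $?    # 255
sh -c 'exit 256'; echo $?    # 0 -- the renice(8) bug in miniature
```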

ok?

Index: renice.c
===
RCS file: /cvs/src/usr.bin/renice/renice.c,v
retrieving revision 1.21
diff -u -p -r1.21 renice.c
--- renice.c25 Jan 2019 00:19:26 -  1.21
+++ renice.c11 Aug 2022 22:49:23 -
@@ -155,14 +155,14 @@ main(int argc, char **argv)
 static int
 renice(struct renice_param *p, struct renice_param *end)
 {
-   int new, old, errors = 0;
+   int new, old, error = 0;
 
for (; p < end; p++) {
errno = 0;
old = getpriority(p->id_type, p->id);
if (errno) {
warn("getpriority: %d", p->id);
-   errors++;
+   error = 1;
continue;
}
if (p->pri_type == RENICE_INCREMENT)
@@ -171,13 +171,13 @@ renice(struct renice_param *p, struct re
p->pri < PRIO_MIN ? PRIO_MIN : p->pri;
if (setpriority(p->id_type, p->id, new) == -1) {
warn("setpriority: %d", p->id);
-   errors++;
+   error = 1;
continue;
}
printf("%d: old priority %d, new priority %d\n",
p->id, old, new);
}
-   return (errors);
+   return error;
 }
 
 __dead void



Re: echo(1): check for stdio errors

2022-08-10 Thread Scott Cheloha
On Thu, Aug 11, 2022 at 02:22:08AM +0200, Jeremie Courreges-Anglas wrote:
> On Wed, Aug 10 2022, Scott Cheloha  wrote:
> > [...]
> >
> > 1. Our ksh(1) already checks for stdout errors in the echo builtin.
> 
> So do any of the scripts in our source tree use /bin/echo for whatever
> reason?  If so, could one of these scripts be broken if /bin/echo
> started to report an error?  Shouldn't those scripts be reviewed?

I didn't look.

There are... hundreds of files that look like shell scripts in src.

$ cd /usr/src
$ find . -exec egrep -l '\#.*\!.*sh' {} + > ~/src-shell-script-paths
$ wc -l ~/src-shell-script-paths
1118 /home/ssc/src-shell-script-paths

A lot of them are in regress/.

I guess I better start looking.



Re: echo(1): check for stdio errors

2022-08-10 Thread Scott Cheloha
On Wed, Aug 10, 2022 at 02:23:08PM -0600, Theo de Raadt wrote:
> Scott Cheloha  wrote:
> 
> > On Wed, Aug 10, 2022 at 12:26:17PM -0600, Theo de Raadt wrote:
> > > Scott Cheloha  wrote:
> > > 
> > > > We're sorta-kinda circling around adding the missing (?) stdio error
> > > > checking to other utilities in bin/ and usr.bin/, no?  I want to be
> > > > sure I understand how to do the next patch, because if we do that it
> > > > will probably be a bunch of programs all at once.
> > > 
> > > This specific program has not checked for this condition since at least
2 AT&T UNIX.
> > > 
> > > Your change does not just add a new warning.  It adds a new exit code
> > > condition.
> > > 
> > > Some scripts using echo, which accepted the condition because echo would
> > > exit 0 and not check for this condition, will now see this exit 1.  Some
> > > scripts will abort, because they use "set -o errexit" or similar.
> > > 
> > > You are changing the exit code for a command which is used a lot.
> > > 
> > > POSIX does not require or specify exit 1 for this condition.  If you
> > > disagree, please show where it says so.
> > 
> > It's the usual thing.  >0 if "an error occurred".
> 
> The 40 year old code base says otherwise.
> 
> > Here is my thinking:
> > 
> > echo(1) has ONE job: print the arguments given.
> > 
> > If it fails to print those arguments, shouldn't we signal that to the
> > program that forked echo(1)?
> 
> Only if you validate all callers can handle this change in behaviour.
> 
> > How is echo(1) supposed to differentiate between a write(2) that is
> > allowed to fail, e.g. a diagnostic printout from fw_update to the
> > user's stderr, and one that is not allowed to fail?
> 
> Perhaps it is not supposed to validate this problem  in 2022, because it
> didn't validate it for 40 years.
> 
> > Consider this scenario:
> > 
> > 1.  A shell script uses echo(1) to write something to a file.
> > 
> > /bin/echo foo.dat >> /var/workerd/data-processing-list
> > 
> > 2.  The bytes don't arrive on disk because the file system is full.
> > 
> > 3.  The shell script succeeds because echo(1) can't fail, even if
> > it fails to do what it was told to do.
> > 
> > Isn't that bad?
> > 
> > And it isn't necessarily true that some other thing will fail later
> > and the whole interlocking system will fail.  ENOSPC is a transient
> > condition.  One write(2) can fail and the next write(2) can succeed.
> 
> Yet, for 40 years noone complained.
> 
> Consider the situation you break and change the behaviour of 1000's of
> shell scripts, and haven't lifted your finger once to review all the
> shell scripts that call echo.
> 
> Have you even compared this behaviour to the echo built-ins in all
> the shells?

I assume what you mean to say is, roughly:

Gee, this seems risky.

What do other echo implementations do?

1. Our ksh(1) already checks for stdout errors in the echo builtin.

2. FreeBSD's /bin/echo has checked for writev(2) errors in /bin/echo
   since 2003:

https://cgit.freebsd.org/src/commit/bin/echo/echo.c?id=91b7d6dc5871f532b1a86ee76389a9bc348bdf58

3. NetBSD's /bin/echo has checked for stdout errors with ferror(3)
   since 2008:

http://cvsweb.netbsd.org/bsdweb.cgi/src/bin/echo/echo.c?rev=1.18&content-type=text/x-cvsweb-markup&only_with_tag=MAIN

4. NetBSD's /bin/sh echo builtin has checked for write errors since
   2008:

http://cvsweb.netbsd.org/bsdweb.cgi/src/bin/sh/bltin/echo.c?rev=1.14&content-type=text/x-cvsweb-markup&only_with_tag=MAIN

5. OpenSolaris has checked for fflush(3) errors in /usr/bin/echo since
   2005 (OpenSolaris launch):

https://github.com/illumos/illumos-gate/blob/7c478bd95313f5f23a4c958a745db2134aa03244/usr/src/cmd/echo/echo.c#L144

6. Looking forward, illumos inherited and retains the behavior in
   their /usr/bin/echo.

7. Extrapolating backward, we can assume Solaris did that checking in
   /usr/bin/echo prior to 2005.

8. GNU Coreutils echo has checked for fflush(3) and fclose(3) errors on
   stdout since 2000:

https://git.savannah.gnu.org/cgit/coreutils.git/commit/src/echo.c?id=d3683509b3953beb014e540f6d6194658ede1dea

   They use close_stdout() in an atexit(3) hook.  close_stdout() is a
   convenience function provided by gnulib since 1998 that does what I
   described:

https://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=23928550db5d400f27fa67de29c738ca324a31ea;hp=f76477e515b36a1e10f7734aac3c5478ccf75989

   Maybe of note is that they do this atexit(3) stdout flush/close
   error checking for many of their utilities.

9. The GNU Bash echo builtin has checked for write errors since v2.04,
   in 2000:

https://git.savannah.gnu.org/cgit/bash.git/commit/builtins/echo.def?id=bb70624e964126b7ac4ff085ba163a9c35ffa18f

   They even noted it in the CHANGES file for that release:

https://git.savannah.gnu.org/cgit/bash.git/commit/CHANGES?id=bb70624e964126b7ac4ff085ba163a9c35ffa18f

--

I don't think that we are first movers in this case.
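The gnulib close_stdout() pattern from item 8 boils down to a small amount of code.  This is an illustrative sketch, not gnulib's implementation; checked_close() is a made-up helper name:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Flush fp and report any write error that stdio recorded along the
 * way.  fflush(3) catches buffered data that won't reach the kernel;
 * ferror(3) catches an earlier write failure; fclose(3) catches the
 * final flush.
 */
static int
checked_close(FILE *fp)
{
	if (fflush(fp) == EOF || ferror(fp) || fclose(fp) == EOF)
		return -1;
	return 0;
}

/*
 * atexit(3) hook: turn a silent stdout write failure into exit
 * status 1, no matter where the program called exit(3) from.
 */
static void
close_stdout(void)
{
	if (checked_close(stdout) == -1)
		_exit(1);
}
```

A utility would call atexit(close_stdout) once at startup, which is why the pattern scales to "many of their utilities" with no per-call-site checks.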



Re: echo(1): check for stdio errors

2022-08-10 Thread Scott Cheloha
On Wed, Aug 10, 2022 at 12:26:17PM -0600, Theo de Raadt wrote:
> Scott Cheloha  wrote:
> 
> > We're sorta-kinda circling around adding the missing (?) stdio error
> > checking to other utilities in bin/ and usr.bin/, no?  I want to be
> > sure I understand how to do the next patch, because if we do that it
> > will probably be a bunch of programs all at once.
> 
> This specific program has not checked for this condition since at least
> 2 AT&T UNIX.
> 
> Your change does not just add a new warning.  It adds a new exit code
> condition.
> 
> Some scripts using echo, which accepted the condition because echo would
> exit 0 and not check for this condition, will now see this exit 1.  Some
> scripts will abort, because they use "set -o errexit" or similar.
> 
> You are changing the exit code for a command which is used a lot.
> 
> POSIX does not require or specify exit 1 for this condition.  If you
> disagree, please show where it says so.

It's the usual thing.  >0 if "an error occurred".

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/echo.html

EXIT STATUS

The following exit values shall be returned:

 0
Successful completion.
>0
An error occurred.

CONSEQUENCES OF ERRORS

Default.

> So my question is:  What will be broken by this change?
> 
> Nothing isn't an answer.  I can write a 5 line shell script that will
> observe the change in behaviour.  Many large shell scripts could break
> from this.  I am thinking of fw_update and the installer, but it could
> also be a problem in Makefiles.

Here is my thinking:

echo(1) has ONE job: print the arguments given.

If it fails to print those arguments, shouldn't we signal that to the
program that forked echo(1)?

How is echo(1) supposed to differentiate between a write(2) that is
allowed to fail, e.g. a diagnostic printout from fw_update to the
user's stderr, and one that is not allowed to fail?

> > I want to be sure I understand how to do the next patch, because if we
> > do that it will probably be a bunch of programs all at once.
> 
> If you cannot speak to the exit code command changing for this one
> simple program, I think there is no case for adding to to hundreds of
> other programs.  Unless POSIX specifies the requirement, I'd like to see
> some justification.
> 
> There will always be situations that UNIX didn't anticipate or handle,
> and then POSIX failed to specify.  Such things are now unhandled, probably
> forever, and have become de facto standards.
> 
> On the balance, is your diff improving on some dangerous problem, or is
> it introducing a vast number of dangerous new risks which cannot be
> identified (and which would require an audit of every known script
> calling echo).  Has such an audit been started?

Consider this scenario:

1.  A shell script uses echo(1) to write something to a file.

/bin/echo foo.dat >> /var/workerd/data-processing-list

2.  The bytes don't arrive on disk because the file system is full.

3.  The shell script succeeds because echo(1) can't fail, even if
it fails to do what it was told to do.

Isn't that bad?

And it isn't necessarily true that some other thing will fail later
and the whole interlocking system will fail.  ENOSPC is a transient
condition.  One write(2) can fail and the next write(2) can succeed.



Re: echo(1): check for stdio errors

2022-08-10 Thread Scott Cheloha
On Sat, Jul 30, 2022 at 05:23:37PM -0600, Todd C. Miller wrote:
> On Sat, 30 Jul 2022 18:19:02 -0500, Scott Cheloha wrote:
> 
> > Bump.  The standard's error cases for fflush(3) are identical to those
> > for fclose(3):
> >
> > https://pubs.opengroup.org/onlinepubs/9699919799/functions/fflush.html
> > https://pubs.opengroup.org/onlinepubs/9699919799/functions/fclose.html
> >
> > Is the fact that our fclose(3) can succeed even if the error flag is
> > set a bug?
> 
> As far as I can tell, neither fflush() nor fclose() check the status
> of the error flag, though they may set it of course.  That is why
> I was suggesting an explicit ferror() call at the end.

I'm sorry, I'm having a dumb moment, I don't quite understand what
you're looking for.

Please tweak my patch so it's the way you want it, with the ferror(3)
call in the right spot.

We're sorta-kinda circling around adding the missing (?) stdio error
checking to other utilities in bin/ and usr.bin/, no?  I want to be
sure I understand how to do the next patch, because if we do that it
will probably be a bunch of programs all at once.

Index: echo.c
===
RCS file: /cvs/src/bin/echo/echo.c,v
retrieving revision 1.10
diff -u -p -r1.10 echo.c
--- echo.c  9 Oct 2015 01:37:06 -   1.10
+++ echo.c  10 Aug 2022 18:00:12 -
@@ -53,12 +53,15 @@ main(int argc, char *argv[])
nflag = 0;
 
while (*argv) {
-   (void)fputs(*argv, stdout);
-   if (*++argv)
-   putchar(' ');
+   if (fputs(*argv, stdout) == EOF)
+   err(1, "stdout");
+   if (*++argv && putchar(' ') == EOF)
+   err(1, "stdout");
}
-   if (!nflag)
-   putchar('\n');
+   if (!nflag && putchar('\n') == EOF)
+   err(1, "stdout");
+   if (fflush(stdout) == EOF || ferror(stdout) || fclose(stdout) == EOF)
+   err(1, "stdout");
 
return 0;
 }



Re: ts(1): parse input format string only once

2022-08-10 Thread Scott Cheloha
On Fri, Jul 29, 2022 at 08:13:14AM -0500, Scott Cheloha wrote:
> On Wed, Jul 13, 2022 at 12:50:24AM -0500, Scott Cheloha wrote:
> > We reduce overhead if we only parse the user's format string once.  To
> > achieve that, this patch does the following:
> > 
> > [...]
> > 
> > - When parsing the user format string in fmtfmt(), keep a list of
> >   where each microsecond substring lands in buf.  We'll need it later.
> > 
> > - Move the printing part of fmtfmt() into a new function, fmtprint().
> >   fmtprint() is now called from the main loop instead of fmtfmt().
> > 
> > - In fmtprint(), before calling strftime(3), update any microsecond
> >   substrings in buf using the list we built earlier in fmtfmt().  Note
> >   that if there aren't any such substrings we don't call snprintf(3)
> >   at all.
> > 
> > [...]
> 
> Two week bump.
> 
> Here is a stripped-down patch with only the above changes.  Hopefully
> this makes the intent of the patch more obvious.

Four week bump + rebase.
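The parse-once idea can be sketched in isolation: expand each marker to a fixed-width placeholder once, record the offsets, then overwrite only those bytes for every output line.  "%f" and the helper names below are illustrative, not ts(1)'s actual format syntax; no bounds checking, sketch only:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char buf[128];		/* pre-expanded format buffer */
static size_t pos[16];		/* offsets of microsecond placeholders */
static int npos;

/*
 * One-time pass: widen every "%f" marker to six placeholder bytes
 * and remember where each one landed in buf.
 */
static void
parse_once(const char *fmt)
{
	char *p;

	npos = 0;
	snprintf(buf, sizeof(buf), "%s", fmt);
	for (p = strstr(buf, "%f"); p != NULL; p = strstr(p + 6, "%f")) {
		memmove(p + 6, p + 2, strlen(p + 2) + 1);
		memset(p, '?', 6);
		pos[npos++] = p - buf;
	}
}

/*
 * Per-line pass: rewrite only the placeholder bytes.  The format
 * string is never re-parsed.
 */
static void
patch_usec(long usec)
{
	char us[8];
	int i;

	snprintf(us, sizeof(us), "%06ld", usec);
	for (i = 0; i < npos; i++)
		memcpy(buf + pos[i], us, 6);
}
```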

Index: ts.c
===
RCS file: /cvs/src/usr.bin/ts/ts.c,v
retrieving revision 1.9
diff -u -p -r1.9 ts.c
--- ts.c3 Aug 2022 16:54:30 -   1.9
+++ ts.c10 Aug 2022 17:49:53 -
@@ -17,6 +17,7 @@
  */
 
 #include 
+#include 
 #include 
 
 #include 
@@ -27,13 +28,20 @@
 #include 
 #include 
 
+SIMPLEQ_HEAD(, usec) usec_queue = SIMPLEQ_HEAD_INITIALIZER(usec_queue);
+struct usec {
+   SIMPLEQ_ENTRY(usec) next;
+   char *pos;
+};
+
 static char*format = "%b %d %H:%M:%S";
 static char*buf;
 static char*outbuf;
 static size_t   bufsize;
 static size_t   obsize;
 
-static void fmtfmt(const struct timespec *);
+static void fmtfmt(void);
+static void fmtprint(const struct timespec *);
 static void __dead  usage(void);
 
 int
@@ -90,6 +98,8 @@ main(int argc, char *argv[])
if ((outbuf = calloc(1, obsize)) == NULL)
err(1, NULL);
 
+   fmtfmt();
+
/* force UTC for interval calculations */
if (iflag || sflag)
if (setenv("TZ", "UTC", 1) == -1)
@@ -108,7 +118,7 @@ main(int argc, char *argv[])
timespecadd(, _offset, );
else
ts = now;
-   fmtfmt();
+   fmtprint();
if (iflag)
start = now;
}
@@ -134,15 +144,11 @@ usage(void)
  * so you can format while you format
  */
 static void
-fmtfmt(const struct timespec *ts)
+fmtfmt(void)
 {
-   struct tm *tm;
-   char *f, us[7];
-
-   if ((tm = localtime(&ts->tv_sec)) == NULL)
-   err(1, "localtime");
+   char *f;
+   struct usec *u;
 
-   snprintf(us, sizeof(us), "%06ld", ts->tv_nsec / 1000);
strlcpy(buf, format, bufsize);
f = buf;
 
@@ -161,12 +167,34 @@ fmtfmt(const struct timespec *ts)
f[0] = f[1];
f[1] = '.';
f += 2;
+   u = malloc(sizeof *u);
+   if (u == NULL)
+   err(1, NULL);
+   u->pos = f;
+   SIMPLEQ_INSERT_TAIL(&usec_queue, u, next);
l = strlen(f);
memmove(f + 6, f, l + 1);
-   memcpy(f, us, 6);
f += 6;
}
} while (*f != '\0');
+}
+
+static void
+fmtprint(const struct timespec *ts)
+{
+   char us[8];
+   struct tm *tm;
+   struct usec *u;
+
+   if ((tm = localtime(&ts->tv_sec)) == NULL)
+   err(1, "localtime");
+
+   /* Update any microsecond substrings in the format buffer. */
+   if (!SIMPLEQ_EMPTY(&usec_queue)) {
+   snprintf(us, sizeof(us), "%06ld", ts->tv_nsec / 1000);
+   SIMPLEQ_FOREACH(u, &usec_queue, next)
+   memcpy(u->pos, us, 6);
+   }
 
*outbuf = '\0';
if (*buf != '\0') {



Re: mips64: trigger deferred timer interrupt from splx(9)

2022-08-09 Thread Scott Cheloha
On Tue, Aug 09, 2022 at 06:02:10PM +, Miod Vallat wrote:
> > Other platforms (architectures?) (powerpc, powerpc64, arm64, riscv64)
> > multiplex their singular interrupt clock to schedule both a
> > fixed-period hardclock and a pseudorandom statclock.
> > 
> > This is the direction I intend to take every platform, mips64
> > included, after the next release.
> > 
> > In that context, would there be any reason to prefer glxclk to
> > CP0.count?
> 
> No. The cop0 timer is supposed to be the most reliable timer available.
> (although one may argue that, on sgi, the xbow timer on some systems is
> even better quality)

Alright, got it.  If glxclk provides no other utility aside from an
interrupt clock on loongson, then you and I can coordinate unhooking
it when we switch loongson to the new clockintr code in the Fall.

If I'm missing something and it does other work, then nevermind.

Does the latest patch work on any loongson machines you have?

I didn't see any other splx(9) implementations aside from bonito and
the one for loongson3.

Index: mips64/mips64/clock.c
===
RCS file: /cvs/src/sys/arch/mips64/mips64/clock.c,v
retrieving revision 1.45
diff -u -p -r1.45 clock.c
--- mips64/mips64/clock.c   6 Apr 2022 18:59:26 -   1.45
+++ mips64/mips64/clock.c   9 Aug 2022 14:48:47 -
@@ -60,6 +60,7 @@ const struct cfattach clock_ca = {
 };
 
 void   cp0_startclock(struct cpu_info *);
+void   cp0_trigger_int5(void);
 uint32_t cp0_int5(uint32_t, struct trapframe *);
 
 int
@@ -86,19 +87,20 @@ clockattach(struct device *parent, struc
cp0_set_compare(cp0_get_count() - 1);
 
md_startclock = cp0_startclock;
+   md_triggerclock = cp0_trigger_int5;
 }
 
 /*
  *  Interrupt handler for targets using the internal count register
  *  as interval clock. Normally the system is run with the clock
  *  interrupt always enabled. Masking is done here and if the clock
- *  can not be run the tick is just counted and handled later when
- *  the clock is logically unmasked again.
+ *  cannot be run the tick is handled later when the clock is logically
+ *  unmasked again.
  */
 uint32_t
 cp0_int5(uint32_t mask, struct trapframe *tf)
 {
-   u_int32_t clkdiff;
+   u_int32_t clkdiff, pendingticks = 0;
struct cpu_info *ci = curcpu();
 
/*
@@ -113,15 +115,26 @@ cp0_int5(uint32_t mask, struct trapframe
}
 
/*
+* If the clock interrupt is masked, defer any work until it
+* is unmasked from splx(9).
+*/
+   if (tf->ipl >= IPL_CLOCK) {
+   ci->ci_clock_deferred = 1;
+   cp0_set_compare(cp0_get_count() - 1);
+   return CR_INT_5;
+   }
+   ci->ci_clock_deferred = 0;
+
+   /*
 * Count how many ticks have passed since the last clock interrupt...
 */
clkdiff = cp0_get_count() - ci->ci_cpu_counter_last;
while (clkdiff >= ci->ci_cpu_counter_interval) {
ci->ci_cpu_counter_last += ci->ci_cpu_counter_interval;
clkdiff = cp0_get_count() - ci->ci_cpu_counter_last;
-   ci->ci_pendingticks++;
+   pendingticks++;
}
-   ci->ci_pendingticks++;
+   pendingticks++;
ci->ci_cpu_counter_last += ci->ci_cpu_counter_interval;
 
/*
@@ -132,32 +145,64 @@ cp0_int5(uint32_t mask, struct trapframe
clkdiff = cp0_get_count() - ci->ci_cpu_counter_last;
if ((int)clkdiff >= 0) {
ci->ci_cpu_counter_last += ci->ci_cpu_counter_interval;
-   ci->ci_pendingticks++;
+   pendingticks++;
cp0_set_compare(ci->ci_cpu_counter_last);
}
 
/*
-* Process clock interrupt unless it is currently masked.
+* Process clock interrupt.
 */
-   if (tf->ipl < IPL_CLOCK) {
 #ifdef MULTIPROCESSOR
-   register_t sr;
+   register_t sr;
 
-   sr = getsr();
-   ENABLEIPI();
+   sr = getsr();
+   ENABLEIPI();
 #endif
-   while (ci->ci_pendingticks) {
-   atomic_inc_long(
-   (unsigned long *)&cp0_clock_count.ec_count);
-   hardclock(tf);
-   ci->ci_pendingticks--;
-   }
+   while (pendingticks) {
+   atomic_inc_long((unsigned long *)&cp0_clock_count.ec_count);
+   hardclock(tf);
+   pendingticks--;
+   }
 #ifdef MULTIPROCESSOR
-   setsr(sr);
+   setsr(sr);
 #endif
-   }
 
return CR_INT_5;/* Clock is always on 5 */
+}
+
+unsigned long cp0_raise_calls, cp0_raise_miss;
+
+/*
+ * Trigger the clock interrupt.
+ * 
+ * We need to spin until either (a) INT5 is pending or (b) the compare
+ * register leads the count register, i.e. we know INT5 will be pending
+ * very soon.
+ *
+ * To ensure we don't spin forever, double the compensatory offset
+ 
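The "compare register leads the count register" condition in the comment above is a wraparound-safe comparison of two free-running 32-bit register values.  A standalone sketch of that check (illustrative, not the kernel code):

```c
#include <assert.h>
#include <stdint.h>

/*
 * With a free-running 32-bit count register, "compare leads count"
 * must be tested with signed wraparound arithmetic.  A plain
 * `compare > count` gives the wrong answer near the 2^32 rollover.
 */
static int
compare_leads_count(uint32_t compare, uint32_t count)
{
	return (int32_t)(compare - count) > 0;
}
```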

Re: mips64: trigger deferred timer interrupt from splx(9)

2022-08-09 Thread Scott Cheloha
On Tue, Aug 09, 2022 at 02:56:54PM +, Miod Vallat wrote:
> > Do those machines not have Coprocessor 0?  If they do, why would you
> > prefer glxclk over CP0?
> 
> cop0 only provides one timer, from which both the scheduling clock and
> statclk are derived. glxclk allows two timers to be used, and thus can
> provide a more reliable statclk (see the Torek paper, etc - it is even
> mentioned in the glxclk manual page).

Other platforms (architectures?) (powerpc, powerpc64, arm64, riscv64)
multiplex their singular interrupt clock to schedule both a
fixed-period hardclock and a pseudorandom statclock.

This is the direction I intend to take every platform, mips64
included, after the next release.

In that context, would there be any reason to prefer glxclk to
CP0.count?



Re: mips64: trigger deferred timer interrupt from splx(9)

2022-08-09 Thread Scott Cheloha
On Tue, Aug 09, 2022 at 02:03:31PM +, Visa Hankala wrote:
> On Mon, Aug 08, 2022 at 02:52:37AM -0500, Scott Cheloha wrote:
> > One thing I'm still uncertain about is how glxclk fits into the
> > loongson picture.  It's an interrupt clock that runs hardclock() and
> > statclock(), but the code doesn't do any logical masking, so I don't
> > know whether or not I need to adjust anything in that code or account
> > for it at all.  If there's no logical masking there's no deferral, so
> > it would never need to call md_triggerclock() from splx(9).
> 
> I think the masking of glxclk interrupts are handled by the ISA
> interrupt code.

Do those machines not have Coprocessor 0?  If they do, why would you
prefer glxclk over CP0?

> The patch misses md_triggerclock definition in mips64_machdep.c.

Whoops, forgot that file.  Fuller patch below.

> I have put this to the test on the mips64 ports builder machines.

Cool, thank you for testing.

Index: mips64/mips64/clock.c
===
RCS file: /cvs/src/sys/arch/mips64/mips64/clock.c,v
retrieving revision 1.45
diff -u -p -r1.45 clock.c
--- mips64/mips64/clock.c   6 Apr 2022 18:59:26 -   1.45
+++ mips64/mips64/clock.c   9 Aug 2022 14:48:47 -
@@ -60,6 +60,7 @@ const struct cfattach clock_ca = {
 };
 
 void   cp0_startclock(struct cpu_info *);
+void   cp0_trigger_int5(void);
 uint32_t cp0_int5(uint32_t, struct trapframe *);
 
 int
@@ -86,19 +87,20 @@ clockattach(struct device *parent, struc
cp0_set_compare(cp0_get_count() - 1);
 
md_startclock = cp0_startclock;
+   md_triggerclock = cp0_trigger_int5;
 }
 
 /*
  *  Interrupt handler for targets using the internal count register
  *  as interval clock. Normally the system is run with the clock
  *  interrupt always enabled. Masking is done here and if the clock
- *  can not be run the tick is just counted and handled later when
- *  the clock is logically unmasked again.
+ *  cannot be run the tick is handled later when the clock is logically
+ *  unmasked again.
  */
 uint32_t
 cp0_int5(uint32_t mask, struct trapframe *tf)
 {
-   u_int32_t clkdiff;
+   u_int32_t clkdiff, pendingticks = 0;
struct cpu_info *ci = curcpu();
 
/*
@@ -113,15 +115,26 @@ cp0_int5(uint32_t mask, struct trapframe
}
 
/*
+* If the clock interrupt is masked, defer any work until it
+* is unmasked from splx(9).
+*/
+   if (tf->ipl >= IPL_CLOCK) {
+   ci->ci_clock_deferred = 1;
+   cp0_set_compare(cp0_get_count() - 1);
+   return CR_INT_5;
+   }
+   ci->ci_clock_deferred = 0;
+
+   /*
 * Count how many ticks have passed since the last clock interrupt...
 */
clkdiff = cp0_get_count() - ci->ci_cpu_counter_last;
while (clkdiff >= ci->ci_cpu_counter_interval) {
ci->ci_cpu_counter_last += ci->ci_cpu_counter_interval;
clkdiff = cp0_get_count() - ci->ci_cpu_counter_last;
-   ci->ci_pendingticks++;
+   pendingticks++;
}
-   ci->ci_pendingticks++;
+   pendingticks++;
ci->ci_cpu_counter_last += ci->ci_cpu_counter_interval;
 
/*
@@ -132,32 +145,64 @@ cp0_int5(uint32_t mask, struct trapframe
clkdiff = cp0_get_count() - ci->ci_cpu_counter_last;
if ((int)clkdiff >= 0) {
ci->ci_cpu_counter_last += ci->ci_cpu_counter_interval;
-   ci->ci_pendingticks++;
+   pendingticks++;
cp0_set_compare(ci->ci_cpu_counter_last);
}
 
/*
-* Process clock interrupt unless it is currently masked.
+* Process clock interrupt.
 */
-   if (tf->ipl < IPL_CLOCK) {
 #ifdef MULTIPROCESSOR
-   register_t sr;
+   register_t sr;
 
-   sr = getsr();
-   ENABLEIPI();
+   sr = getsr();
+   ENABLEIPI();
 #endif
-   while (ci->ci_pendingticks) {
-   atomic_inc_long(
-   (unsigned long *)&cp0_clock_count.ec_count);
-   hardclock(tf);
-   ci->ci_pendingticks--;
-   }
+   while (pendingticks) {
+   atomic_inc_long((unsigned long *)&cp0_clock_count.ec_count);
+   hardclock(tf);
+   pendingticks--;
+   }
 #ifdef MULTIPROCESSOR
-   setsr(sr);
+   setsr(sr);
 #endif
-   }
 
return CR_INT_5;/* Clock is always on 5 */
+}
+
+unsigned long cp0_raise_calls, cp0_raise_miss;
+
+/*
+ * Trigger the clock interrupt.
+ * 
+ * We need to spin until either (a) INT5 is pending or (b) the compare
+ * register leads the count register, i.e. we know INT5 will be pending
+ * very soon.
+ *
+ * To ensure we don't spin forever, dou

Re: mips64: trigger deferred timer interrupt from splx(9)

2022-08-08 Thread Scott Cheloha
On Sun, Aug 07, 2022 at 11:05:37AM +, Visa Hankala wrote:
> On Sun, Jul 31, 2022 at 01:28:18PM -0500, Scott Cheloha wrote:
> > Apparently mips64, i.e. octeon and loongson, has the same problem as
> > powerpc/macppc and powerpc64.  The timer interrupt is normally only
> > logically masked, not physically masked in the hardware, when we're
> > running at or above IPL_CLOCK.  If we arrive at cp0_int5() when the
> > clock interrupt is logically masked we postpone all work until the
> > next tick.  This is a problem for my WIP clock interrupt work.
> 
> I think the use of logical masking has been a design choice, not
> something dictated by the hardware. Physical masking should be possible,
> but some extra care would be needed to implement it, as the mips64
> interrupt code is a bit clunky.

That would be cleaner, but from the sound of it, it's easier to start
with this.

> > So, this patch is basically the same as what I did for macppc and what
> > I have proposed for powerpc64.
> > 
> > - Add a new member, ci_timer_deferred, to mips64's cpu_info struct.
> > 
> >   While here, remove ci_pendingticks.  We don't need it anymore.
> > 
> > - If we get to cp0_int5() and our IPL is too high, set
> >   cpu_info.ci_timer_deferred and return.
> > 
> > - If we get to cp0_int5() and our IPL is low enough, clear
> >   cpu_info.ci_timer_deferred and do clock interrupt work.
> > 
> > - In splx(9), if the new IPL is low enough and cpu_info.ci_timer_deferred
> >   is set, trigger the clock interrupt.
> > 
> > The only big difference is that mips64 uses an equality comparison
> > when deciding whether to arm the timer interrupt, so it's really easy
> > to "miss" CP0.count when you're setting CP0.compare.
> > 
> > To address this I've written a function, cp0_raise_int5(), that spins
> > until it is sure the timer interrupt will go off.  The function needed
> > a bit of new code for reading CP0.cause, which I've added to
> > cp0access.S.  I am using an initial offset of 16 cycles based on
> > experimentation with the machine I have access to, a 500 MHz CN50xx.
> > Values lower than 16 require more than one loop to arm the timer.  If
> > that value is insufficient for other machines we can try passing the
> > initial offset as an argument to the function.
> 
> It should not be necessary to make the initial offset variable. The
> offset is primarily a function of the length and content of the
> instruction sequence. Some unpredictability comes from cache misses
> and maybe branch prediction failures.

Gotcha.  So it mostly depends on the number of instructions between
loading CP0.count and storing CP0.compare.
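The retry logic in question can be modeled in plain C.  This is a hypothetical model, not the actual cp0_trigger_int5() from the patch: the counter simulation, the function names, and the offset-doubling policy on a miss are all assumptions for illustration.

```c
#include <stdint.h>

/*
 * Model of the "spin until compare leads count" idea.  CP0.count is a
 * free-running counter that advances a few ticks between accesses; the
 * MIPS timer only fires on an exact count == compare match, so if the
 * counter passes the value we wrote, we must retry with a larger lead.
 */
static uint32_t model_count;
static unsigned int ticks_per_read;

static uint32_t
read_count(void)
{
	model_count += ticks_per_read;	/* time passes on every access */
	return model_count;
}

/* Arm the timer; return how many attempts it took. */
unsigned int
arm_timer(uint32_t first_offset, uint32_t start, unsigned int cost_per_read)
{
	uint32_t compare, offset = first_offset;
	unsigned int tries = 0;

	model_count = start;
	ticks_per_read = cost_per_read;
	do {
		tries++;
		compare = read_count() + offset;
		/* On real hardware: cp0_set_compare(compare). */
		offset *= 2;	/* missed: double the lead and retry */
	} while ((int32_t)(compare - read_count()) <= 0);
	return tries;
}
```

With a small per-read cost (few instructions between the count read and the compare write) one attempt suffices; a large cost forces a second loop, which is what the cp0_raise_miss counter would record.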

> > I wasn't sure where to put the prototype for cp0_raise_int5() so I
> > stuck it in mips64/cpu.h.  If there's a better place for it, just say
> > so.
> 
> Currently, mips64 clock.c is formulated as a proper driver. I think
> callers should not invoke its functions directly but use a hook instead.
> The MI mips64 code starts the clock through the md_startclock function
> pointer. Maybe there could be md_triggerclock.
> 
> To reduce risk of confusion, I would rename cp0_raise_int5 to
> cp0_trigger_int5, as `raise' overlaps with the spl API. Also,
> ci_clock_deferred instead of ci_timer_deferred would look more
> consistent with the surrounding code.

Okay, I took all these suggestions and incorporated them.  Updated
patch attached.

One thing I'm still uncertain about is how glxclk fits into the
loongson picture.  It's an interrupt clock that runs hardclock() and
statclock(), but the code doesn't do any logical masking, so I don't
know whether or not I need to adjust anything in that code or account
for it at all.  If there's no logical masking there's no deferral, so
it would never need to call md_triggerclock() from splx(9).

Also:

My EdgeRouter PoE just finished a serial `make build`.  Took almost 12
days.  Which is a good sign!  Lots of opportunity for the patch to
fail and the clock to die.

In that time, under what I assume is relatively heavy load, the clock
interrupt deferral counters look like this:

cp0_raise_calls at 0x81701308: 133049
cp0_raise_miss at 0x81701300: 0

So 16 cycles as the initial offset works great.  We never ran the loop
more than once, i.e. we never "missed" CP0.count.

The machine has been up a little more than a million seconds.  So, at
100hz, with no separate statclock, and 2 CPUs, we'd expect ~200 clock
interrupts a second, or 200 million in total.

In ~200,000,000 cp0_int5() calls, we deferred ~133,000 of them, or
~0.0665%.
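The arithmetic above checks out; as a sketch (the helper name is made up):

```c
/*
 * Deferral rate from the figures in this thread: ~1,000,000 s of
 * uptime, 100 Hz hardclock, no separate statclock, 2 CPUs gives
 * ~200,000,000 cp0_int5() calls, of which 133,049 were deferred.
 */
double
deferral_rate_pct(unsigned long deferred, unsigned long total)
{
	return 100.0 * (double)deferred / (double)total;
}
```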

Index: mips64/mips64/clock.c
===
RCS file: /cvs/src/sys/arch/mips

Re: top(1): display uptime in HH:MM:SS format

2022-08-07 Thread Scott Cheloha
On Fri, Sep 18, 2020 at 03:59:05PM -0500, Scott Cheloha wrote:
> 
> [...]
> 
> - An HH:MM:SS format uptime is useful in top(1).  It's also more
>   visually consistent with the local timestamp printed on the line
>   above it, so it is easier to read at a glance.
> 
> - The variable printing of "days" is annoying.  I would rather it
>   just told me "0 days" if it had been less than one day.  It sucks
>   when the information you want moves around or isn't shown at all.
>   It's clever, sure, but I'd rather it be consistent.
> 
> This patch changes the uptime format string to "up D days HH:MM:SS".
> The format string does not vary with the elapsed uptime.  There is no
> inclusion/omission of the plural suffix depending on whether days is
> equal to one.
> 
> [...]

Whoops, forgot about this one.  September 18, 2020.  What a time to be
alive.

Let's try this again.  98 week bump.

To recap, this patch makes the uptime formatting in top(1) produce
more constant-width results.  The formatting is now always:

up D days HH:MM:SS

so only the day-count changes size.  The day-count is also always
printed: if the machine has not been up for a full day it prints

up 0 days HH:MM:SS

For example, the upper lines on the top(1) running on my machine
currently look like this:

load averages:  0.29,  0.29,  0.27 jetsam.attlocal.net 18:12:16
82 processes: 81 idle, 1 on processor      up 3 days 07:14:01
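The fixed-width decomposition can be sketched as a standalone function.  This mirrors the approach in the patch, but the function name and signature here are made up for illustration:

```c
#include <stdio.h>
#include <time.h>

/*
 * Format an uptime as "up D days HH:MM:SS".  Only the day count
 * varies in width; hours, minutes, and seconds are zero-padded, and
 * "days" is always printed, even when the count is zero.
 */
void
format_uptime_fixed(char *buf, size_t buflen, time_t uptime)
{
	unsigned int days, hrs, mins, secs;

	days = uptime / (3600 * 24);
	uptime %= 3600 * 24;
	hrs = uptime / 3600;
	uptime %= 3600;
	mins = uptime / 60;
	secs = uptime % 60;
	snprintf(buf, buflen, "up %u days %02u:%02u:%02u",
	    days, hrs, mins, secs);
}
```

For example, an uptime of 285241 seconds formats as "up 3 days 07:14:01", matching the top(1) output shown above.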

I have been running with this for almost two years and I love it.
I would like to commit it.

The only feedback I got when I originally posted this was that the
output formatting would no longer be the same as uptime(1)'s.  I don't
think that matters very much.  The person who offered the feedback
didn't think it mattered either, they were just hypothesizing
objections.

ok?

Index: display.c
===
RCS file: /cvs/src/usr.bin/top/display.c,v
retrieving revision 1.65
diff -u -p -r1.65 display.c
--- display.c   26 Aug 2020 16:21:28 -  1.65
+++ display.c   7 Aug 2022 23:14:25 -
@@ -208,31 +208,28 @@ display_init(struct statics * statics)
return (display_lines);
 }
 
+/*
+ * Print the time elapsed since the system booted.
+ */
 static void
 format_uptime(char *buf, size_t buflen)
 {
-   time_t uptime;
-   int days, hrs, mins;
struct timespec boottime;
+   time_t uptime;
+   unsigned int days, hrs, mins, secs;
+
+   if (clock_gettime(CLOCK_BOOTTIME, &boottime) == -1)
+   err(1, "clock_gettime");
 
-   /*
-* Print how long system has been up.
-*/
-   if (clock_gettime(CLOCK_BOOTTIME, &boottime) != -1) {
-   uptime = boottime.tv_sec;
-   uptime += 30;
-   days = uptime / (3600 * 24);
-   uptime %= (3600 * 24);
-   hrs = uptime / 3600;
-   uptime %= 3600;
-   mins = uptime / 60;
-   if (days > 0)
-   snprintf(buf, buflen, "up %d day%s, %2d:%02d",
-   days, days > 1 ? "s" : "", hrs, mins);
-   else
-   snprintf(buf, buflen, "up %2d:%02d",
-   hrs, mins);
-   }
+   uptime = boottime.tv_sec;
+   days = uptime / (3600 * 24);
+   uptime %= (3600 * 24);
+   hrs = uptime / 3600;
+   uptime %= 3600;
+   mins = uptime / 60;
+   secs = uptime % 60;
+   snprintf(buf, buflen, "up %u days %02u:%02u:%02u",
+   days, hrs, mins, secs);
 }
 
 



Re: riscv64: trigger deferred timer interrupts from splx(9)

2022-08-04 Thread Scott Cheloha
On Fri, Aug 05, 2022 at 12:34:59AM +0200, Jeremie Courreges-Anglas wrote:
> >> [...]
> >> 
> >> You're adding the timer reset to plic_setipl() but the latter is called
> >> after softintr processing in plic_splx().
> >> 
> >>/* Pending software intr is handled here */
> >>if (ci->ci_ipending & riscv_smask[new])
> >>riscv_do_pending_intr(new);
> >> 
> >>plic_setipl(new);
> >
> > Yes, but plic_setipl() is also called from the softintr loop in
> > riscv_do_pending_intr() (riscv64/intr.c) right *before* we dispatch
> > any pending soft interrupts:
> >
> >594  void
> >595  riscv_do_pending_intr(int pcpl)
> >596  {
> >597  struct cpu_info *ci = curcpu();
> >598  u_long sie;
> >599
> >600  sie = intr_disable();
> >601
> >602  #define DO_SOFTINT(si, ipl) \
> >603  if ((ci->ci_ipending & riscv_smask[pcpl]) & \
> >604  SI_TO_IRQBIT(si)) { \
> >605  ci->ci_ipending &= ~SI_TO_IRQBIT(si);   \
> > *  606  riscv_intr_func.setipl(ipl);\
> >607  intr_restore(sie);  \
> >608  softintr_dispatch(si);  \
> >609  sie = intr_disable();   \
> >610  }
> >611
> >612  do {
> >613  DO_SOFTINT(SIR_TTY, IPL_SOFTTTY);
> >614  DO_SOFTINT(SIR_NET, IPL_SOFTNET);
> >615  DO_SOFTINT(SIR_CLOCK, IPL_SOFTCLOCK);
> >616  DO_SOFTINT(SIR_SOFT, IPL_SOFT);
> >617  } while (ci->ci_ipending & riscv_smask[pcpl]);
> >
> > We might be fine doing it just once in plic_splx() before we do any
> > soft interrupt stuff.  That's closer to what we're doing on other
> > platforms.
> >
> > I just figured it'd be safer to do it in plic_setipl() because we're
> > already disabling interrupts there.  It seems I guessed correctly
> > because the patch didn't hang your machine.
> 
> Ugh, I had missed that setipl call, thanks for pointing it out.

Np.

> Since I don't wander into this code on a casual basis I won't object,
> but this looks very unobvious to me.  :)

I kind of agree.

I think it would be cleaner -- logically cleaner, not necessarily
cleaner in the code -- to mask timer interrupts when we raise the IPL
to or beyond IPL_CLOCK and unmask timer interrupts when we drop the
IPL below IPL_CLOCK.

... but doing it this way is a lot faster than taking the time to read
and understand the RISC-V privileged architecture spec and how the SBI
interacts with it.

At a glance I see that there are separate Interrupt-Enable bits for
External, Timer, and Software interrupts at the supervisor level.  So
what I'm imagining might be possible.  I just don't know how to get
the current code to do what I've described.
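In that scheme, raising the IPL to IPL_CLOCK would clear the supervisor timer-interrupt enable bit and splx(9) would restore it.  A minimal model of the sie bit twiddling (bit positions per the RISC-V privileged spec; the helper names are invented, and a real implementation would use csrc/csrs on the live CSR):

```c
#include <stdint.h>

/* Supervisor interrupt-enable bits in the sie CSR. */
#define SIE_SSIE	(1UL << 1)	/* software interrupt enable */
#define SIE_STIE	(1UL << 5)	/* timer interrupt enable */
#define SIE_SEIE	(1UL << 9)	/* external interrupt enable */

/* Mask timer interrupts, leaving software/external interrupts alone. */
uint64_t
sie_mask_timer(uint64_t sie)
{
	return sie & ~SIE_STIE;
}

/* Unmask timer interrupts again when dropping below IPL_CLOCK. */
uint64_t
sie_unmask_timer(uint64_t sie)
{
	return sie | SIE_STIE;
}
```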



Re: riscv64: trigger deferred timer interrupts from splx(9)

2022-08-04 Thread Scott Cheloha
On Thu, Aug 04, 2022 at 09:39:13AM +0200, Jeremie Courreges-Anglas wrote:
> On Mon, Aug 01 2022, Scott Cheloha  wrote:
> > On Mon, Aug 01, 2022 at 07:15:33PM +0200, Jeremie Courreges-Anglas wrote:
> >> On Sun, Jul 31 2022, Scott Cheloha  wrote:
> >> > Hi,
> >> >
> >> > I am unsure how to properly mask RISC-V timer interrupts in hardware
> >> > at or above IPL_CLOCK.  I think that would be cleaner than doing
> >> > another timer interrupt deferral thing.
> >> >
> >> > But, just to get the ball rolling, here is a first attempt at the timer
> >> > interrupt deferral thing for riscv64.  The motivation, as with every
> >> > other platform, is to eventually make it unnecessary for the machine
> >> > dependent code to know anything about the clock interrupt schedule.
> >> >
> >> > The thing I'm most unsure about is where to retrigger the timer in the
> >> > PLIC code.  It seems right to do it from plic_setipl() because I want
> >> > to retrigger it before doing soft interrupt work, but I'm not sure.
> 
> You're adding the timer reset to plic_setipl() but the latter is called
> after softintr processing in plic_splx().
> 
>   /* Pending software intr is handled here */
>   if (ci->ci_ipending & riscv_smask[new])
>   riscv_do_pending_intr(new);
> 
>   plic_setipl(new);

Yes, but plic_setipl() is also called from the softintr loop in
riscv_do_pending_intr() (riscv64/intr.c) right *before* we dispatch
any pending soft interrupts:

   594  void
   595  riscv_do_pending_intr(int pcpl)
   596  {
   597  struct cpu_info *ci = curcpu();
   598  u_long sie;
   599
   600  sie = intr_disable();
   601
   602  #define DO_SOFTINT(si, ipl) \
   603  if ((ci->ci_ipending & riscv_smask[pcpl]) & \
   604  SI_TO_IRQBIT(si)) { \
   605  ci->ci_ipending &= ~SI_TO_IRQBIT(si);   \
*  606  riscv_intr_func.setipl(ipl);\
   607  intr_restore(sie);  \
   608  softintr_dispatch(si);  \
   609  sie = intr_disable();   \
   610  }
   611
   612  do {
   613  DO_SOFTINT(SIR_TTY, IPL_SOFTTTY);
   614  DO_SOFTINT(SIR_NET, IPL_SOFTNET);
   615  DO_SOFTINT(SIR_CLOCK, IPL_SOFTCLOCK);
   616  DO_SOFTINT(SIR_SOFT, IPL_SOFT);
   617  } while (ci->ci_ipending & riscv_smask[pcpl]);

We might be fine doing it just once in plic_splx() before we do any
soft interrupt stuff.  That's closer to what we're doing on other
platforms.

I just figured it'd be safer to do it in plic_setipl() because we're
already disabling interrupts there.  It seems I guessed correctly
because the patch didn't hang your machine.

> >> > Unless I'm missing something, I don't think I need to do anything in
> >> > the default interrupt handler code, i.e. riscv64_dflt_setipl(), right?
> >>
> >> No idea about the items above, but...
> >> 
> >> > I have no riscv64 machine, so this is untested.  Would appreciate
> >> > tests and feedback.
> >> 
> >> There's an #include  missing in plic.c,
> >
> > Whoops, corrected patch attached below.
> >
> >> with that added your diff builds and GENERIC.MP seems to behave
> >> (currently running make -j4 build), but I don't know exactly which
> >> problems I should look for.
> >
> > Thank you for trying it out.
> >
> > The patch changes how clock interrupt work is deferred on riscv64.
> >
> > If the code is wrong, the hardclock and statclock should eventually
> > die on every CPU.  The death of the hardclock in particular would
> > manifest to the user as livelock.  The scheduler would stop preempting
> > userspace and it would be impossible to use the machine interactively.
> >
> > There isn't really a direct way to exercise this code change.
> >
> > The best we can do is make the machine busy.  If the machine is busy
> > we can expect more spl(9) calls and more deferred clock interrupt
> > work, which leaves more opportunities for the bug to surface.
> >
> > So, a parallel `make build` is fine.  It's our gold standard for
> > making the machine really busy.
> 
> The diff survived three make -j4 build/release in a row, the clock seems
> stable.

Awesome!  Thank you for hammering on it.

kettenis, mlarkin, drahn:

Is this code fine or do you want to go about this in a different way?

Index: dev/plic.c
==

Re: wc(1): accelerate word counting

2022-08-03 Thread Scott Cheloha
On Wed, Nov 17, 2021 at 08:37:53AM -0600, Scott Cheloha wrote:
> In wc(1) we currently count words, both ASCII and multibyte, in a
> getline(3) loop.
> 
> This makes sense in the multibyte case because stdio handles all the
> nasty buffer resizing for us.  We avoid splitting a multibyte between
> two read(2) calls and the resulting code is simpler.
> 
> However, for ASCII input we don't have the split-character problem.
> Using getline(3) doesn't really buy us anything.  We can count words
> in a big buffer (as we do in the ASCII byte- and line-counting modes)
> just fine.
> 
> [...]

37 week bump.

Counting words in a big buffer is faster than doing it with
getline(3).  We don't need the convenience of getline(3) except
in the multibyte case.

The state machine for counting words doesn't need to change because
word transitions still happen within a single byte.  We just move the
logic out of the getline(3) loop and into a read(2) loop.

As for "faster", consider The Adventures of Sherlock Holmes:

$ ftp -o sherlock-holmes.txt https://www.gutenberg.org/files/1661/1661-0.txt
Trying 152.19.134.47...
Requesting https://www.gutenberg.org/files/1661/1661-0.txt
100% |**|   593 KB00:01
607430 bytes received in 1.05 seconds (563.58 KB/s)
$ ls -lh sherlock-holmes.txt
-rw-r--r--  1 ssc  ssc   593K Jun  9  2021 sherlock-holmes.txt

-current:

$ command time /usr/bin/wc $(jot -b ~/sherlock-holmes.txt 200) | tail -n 1
2.081 real 2.730 user 0.080 sys
 2460800 21512000 121486000 total

Patched:

$ command time obj/wc $(jot -b /home/ssc/sherlock-holmes.txt 200) | tail -n 1
1.093 real 1.910 user 0.030 sys
 2460800 21512000 121486000 total

So, twice as fast on an input with normal-ish line lengths.
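The buffer-oriented state machine can be pulled out into a self-contained sketch.  The struct and function names here are made up; the state logic matches the patch below, which keeps the word/line counting inside cnt():

```c
#include <ctype.h>
#include <stddef.h>

struct wc_counts {
	unsigned long lines;
	unsigned long words;
};

/*
 * POSIX word counting: a word is a maximal string of characters
 * delimited by whitespace.  A word begins whenever a non-space byte
 * follows whitespace (or the start of input).  The gotsp flag lives
 * in the caller so it persists across read(2) chunks: a word split
 * between two buffers is still counted exactly once.
 */
void
count_buf(struct wc_counts *wc, const unsigned char *buf, size_t len,
    int *gotsp)
{
	for (; len--; buf++) {
		if (isspace(*buf)) {
			*gotsp = 1;
			if (*buf == '\n')
				wc->lines++;
		} else if (*gotsp) {
			*gotsp = 0;
			wc->words++;
		}
	}
}
```

The caller initializes gotsp to 1 (start of input counts as "after whitespace") and feeds each read(2) buffer to count_buf() in turn.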

ok?

Index: wc.c
===
RCS file: /cvs/src/usr.bin/wc/wc.c,v
retrieving revision 1.29
diff -u -p -r1.29 wc.c
--- wc.c28 Nov 2021 19:28:42 -  1.29
+++ wc.c3 Aug 2022 23:11:45 -
@@ -145,16 +145,42 @@ cnt(const char *path)
fd = STDIN_FILENO;
}
 
-   if (!doword && !multibyte) {
+   if (!multibyte) {
if (bufsz < _MAXBSIZE &&
(buf = realloc(buf, _MAXBSIZE)) == NULL)
err(1, NULL);
+
+   /*
+* According to POSIX, a word is a "maximal string of
+* characters delimited by whitespace."  Nothing is said
+* about a character being printing or non-printing.
+*/
+   if (doword) {
+   gotsp = 1;
+   while ((len = read(fd, buf, _MAXBSIZE)) > 0) {
+   charct += len;
+   for (C = buf; len--; ++C) {
+   if (isspace((unsigned char)*C)) {
+   gotsp = 1;
+   if (*C == '\n')
+   ++linect;
+   } else if (gotsp) {
+   gotsp = 0;
+   ++wordct;
+   }
+   }
+   }
+   if (len == -1) {
+   warn("%s", file);
+   rval = 1;
+   }
+   }
/*
 * Line counting is split out because it's a lot
 * faster to get lines than to get words, since
 * the word count requires some logic.
 */
-   if (doline) {
+   else if (doline) {
while ((len = read(fd, buf, _MAXBSIZE)) > 0) {
charct += len;
for (C = buf; len--; ++C)
@@ -204,46 +230,26 @@ cnt(const char *path)
return;
}
 
-   /*
-* Do it the hard way.
-* According to POSIX, a word is a "maximal string of
-* characters delimited by whitespace."  Nothing is said
-* about a character being printing or non-printing.
-*/
gotsp = 1;
while ((len = getline(, , stream)) > 0) {
-   if (multibyte) {
-   const char *end = buf + len;
-   for (C = buf; C < end; C += len) {
-   ++charct;
-   len = mbtowc(&wc, C, MB_CUR_MAX);
-   

Re: powerpc64: retrigger deferred DEC interrupts from splx(9)

2022-08-02 Thread Scott Cheloha
On Mon, Jul 25, 2022 at 06:44:31PM -0500, Scott Cheloha wrote:
> On Mon, Jul 25, 2022 at 01:52:36PM +0200, Mark Kettenis wrote:
> > > Date: Sun, 24 Jul 2022 19:33:57 -0500
> > > From: Scott Cheloha 
> > > 
> > > On Sat, Jul 23, 2022 at 08:14:32PM -0500, Scott Cheloha wrote:
> > > > 
> > > > [...]
> > > > 
> > > > I don't have a powerpc64 machine, so this is untested.  [...]
> > > 
> > > gkoehler@ has pointed out two dumb typos in the prior patch.  My bad.
> > > 
> > > Here is a corrected patch that, according to gkoehler@, actually
> > > compiles.
> > 
> > Thanks.  I already figured that bit out myself.  Did some limited
> > testing, but it seems to work correctly.  No noticable effect on the
> > timekeeping even when building clang on all the (4) cores.
> 
> I wouldn't expect this patch to impact timekeeping.  All we're doing
> is calling hardclock(9) a bit sooner than we normally would a few
> times every second.
> 
> I would expect to see slightly more distinct interrupts (uvmexp.intrs)
> per second because we aren't actively batching hardclock(9) and
> statclock calls.
> 
> ... by the way, uvmexp.intrs should probably be incremented
> atomically, no?
> 
> > Regarding the diff, I think it would be better to avoid changing
> > trap.c.  That function is complicated enough and splitting the logic
> > for this over three files makes it a bit harder to understand.  So you
> > could have:
> > 
> > void
> > decr_intr(struct trapframe *frame)
> > {
> > struct cpu_info *ci = curcpu();
> > ...
> > int s;
> > 
> > if (ci->ci_cpl >= IPL_CLOCK) {
> > ci->ci_dec_deferred = 1;
> > mtdec(UINT32_MAX >> 1); /* clear DEC exception */
> > return;
> > }
> > 
> > ci->ci_dec_deferred = 0;
> > 
> > ...
> > }
> > 
> > That has the downside of course that it will be slightly less
> > efficient if we're at IPL_CLOCK or above, but that really shouldn't
> > happen often enough for it to matter.
> 
> Yep.  It's an extra function call, the overhead is small.
> 
> Updated patch below.

At what point do we consider the patch safe?  Have you seen any hangs?

Wanna run with it another week?

Index: include/cpu.h
===
RCS file: /cvs/src/sys/arch/powerpc64/include/cpu.h,v
retrieving revision 1.31
diff -u -p -r1.31 cpu.h
--- include/cpu.h   6 Jul 2021 09:34:07 -   1.31
+++ include/cpu.h   25 Jul 2022 23:43:47 -
@@ -74,9 +74,9 @@ struct cpu_info {
uint64_tci_lasttb;
uint64_tci_nexttimerevent;
uint64_tci_nextstatevent;
-   int ci_statspending;

volatile intci_cpl;
+   volatile intci_dec_deferred;
uint32_tci_ipending;
uint32_tci_idepth;
 #ifdef DIAGNOSTIC
Index: powerpc64/clock.c
===
RCS file: /cvs/src/sys/arch/powerpc64/powerpc64/clock.c,v
retrieving revision 1.3
diff -u -p -r1.3 clock.c
--- powerpc64/clock.c   23 Feb 2021 04:44:31 -  1.3
+++ powerpc64/clock.c   25 Jul 2022 23:43:47 -
@@ -98,6 +98,17 @@ decr_intr(struct trapframe *frame)
int s;
 
/*
+* If the clock interrupt is masked, postpone all work until
+* it is unmasked in splx(9).
+*/
+   if (ci->ci_cpl >= IPL_CLOCK) {
+   ci->ci_dec_deferred = 1;
+   mtdec(UINT32_MAX >> 1); /* clear DEC exception */
+   return;
+   }
+   ci->ci_dec_deferred = 0;
+
+   /*
 * Based on the actual time delay since the last decrementer reload,
 * we arrange for earlier interrupt next time.
 */
@@ -130,30 +141,23 @@ decr_intr(struct trapframe *frame)
mtdec(nextevent - tb);
mtdec(nextevent - mftb());
 
-   if (ci->ci_cpl >= IPL_CLOCK) {
-   ci->ci_statspending += nstats;
-   } else {
-   nstats += ci->ci_statspending;
-   ci->ci_statspending = 0;
-
-   s = splclock();
-   intr_enable();
-
-   /*
-* Do standard timer interrupt stuff.
-*/
-   while (ci->ci_lasttb < prevtb) {
-   ci->ci_lasttb += tick_increment;
-   clock_count.ec_count++;
-   hardclock((struct clockframe *)frame);
-   }
+   s = splclock();
+   intr_enable();
 
-   while (nstats-- > 0)
-   statclock((struct cloc

dmesg(8): fail if given positional arguments

2022-08-02 Thread Scott Cheloha
dmesg(8) doesn't use any positional arguments.  It's a usage error if
any are present.

ok?

Index: dmesg.c
===
RCS file: /cvs/src/sbin/dmesg/dmesg.c,v
retrieving revision 1.31
diff -u -p -r1.31 dmesg.c
--- dmesg.c 24 Dec 2019 13:20:44 -  1.31
+++ dmesg.c 2 Aug 2022 16:48:13 -
@@ -89,6 +89,9 @@ main(int argc, char *argv[])
argc -= optind;
argv += optind;
 
+   if (argc != 0)
+   usage();
+
if (memf == NULL && nlistf == NULL) {
int mib[2], msgbufsize;
size_t len;



Re: riscv64: trigger deferred timer interrupts from splx(9)

2022-08-01 Thread Scott Cheloha
On Mon, Aug 01, 2022 at 07:15:33PM +0200, Jeremie Courreges-Anglas wrote:
> On Sun, Jul 31 2022, Scott Cheloha  wrote:
> > Hi,
> >
> > I am unsure how to properly mask RISC-V timer interrupts in hardware
> > at or above IPL_CLOCK.  I think that would be cleaner than doing
> > another timer interrupt deferral thing.
> >
> > But, just to get the ball rolling, here is a first attempt at the timer
> > interrupt deferral thing for riscv64.  The motivation, as with every
> > other platform, is to eventually make it unnecessary for the machine
> > dependent code to know anything about the clock interrupt schedule.
> >
> > The thing I'm most unsure about is where to retrigger the timer in the
> > PLIC code.  It seems right to do it from plic_setipl() because I want
> > to retrigger it before doing soft interrupt work, but I'm not sure.
> >
> > Unless I'm missing something, I don't think I need to do anything in
> > the default interrupt handler code, i.e. riscv64_dflt_setipl(), right?
> 
> No idea about the items above, but...
> 
> > I have no riscv64 machine, so this is untested.  Would appreciate
> > tests and feedback.
> 
> There's an #include  missing in plic.c,

Whoops, corrected patch attached below.

> with that added your diff builds and GENERIC.MP seems to behave
> (currently running make -j4 build), but I don't know exactly which
> problems I should look for.

Thank you for trying it out.

The patch changes how clock interrupt work is deferred on riscv64.

If the code is wrong, the hardclock and statclock should eventually
die on every CPU.  The death of the hardclock in particular would
manifest to the user as livelock.  The scheduler would stop preempting
userspace and it would be impossible to use the machine interactively.

There isn't really a direct way to exercise this code change.

The best we can do is make the machine busy.  If the machine is busy
we can expect more spl(9) calls and more deferred clock interrupt
work, which leaves more opportunities for the bug to surface.

So, a parallel `make build` is fine.  It's our gold standard for
making the machine really busy.

Index: include/cpu.h
===
RCS file: /cvs/src/sys/arch/riscv64/include/cpu.h,v
retrieving revision 1.12
diff -u -p -r1.12 cpu.h
--- include/cpu.h   10 Jun 2022 21:34:15 -  1.12
+++ include/cpu.h   1 Aug 2022 17:36:41 -
@@ -92,7 +92,7 @@ struct cpu_info {
uint64_tci_lasttb;
uint64_tci_nexttimerevent;
uint64_tci_nextstatevent;
-   int ci_statspending;
+   volatile intci_timer_deferred;
 
uint32_tci_cpl;
uint32_tci_ipending;
Index: riscv64/clock.c
===
RCS file: /cvs/src/sys/arch/riscv64/riscv64/clock.c,v
retrieving revision 1.3
diff -u -p -r1.3 clock.c
--- riscv64/clock.c 24 Jul 2021 22:41:09 -  1.3
+++ riscv64/clock.c 1 Aug 2022 17:36:41 -
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -106,6 +107,17 @@ clock_intr(void *frame)
int s;
 
/*
+* If the clock interrupt is masked, defer all clock interrupt
+* work until the clock interrupt is unmasked from splx(9).
+*/
+   if (ci->ci_cpl >= IPL_CLOCK) {
+   ci->ci_timer_deferred = 1;
+   sbi_set_timer(UINT64_MAX);
+   return 0;
+   }
+   ci->ci_timer_deferred = 0;
+
+   /*
 * Based on the actual time delay since the last clock interrupt,
 * we arrange for earlier interrupt next time.
 */
@@ -132,30 +144,23 @@ clock_intr(void *frame)
 
sbi_set_timer(nextevent);
 
-   if (ci->ci_cpl >= IPL_CLOCK) {
-   ci->ci_statspending += nstats;
-   } else {
-   nstats += ci->ci_statspending;
-   ci->ci_statspending = 0;
-
-   s = splclock();
-   intr_enable();
-
-   /*
-* Do standard timer interrupt stuff.
-*/
-   while (ci->ci_lasttb < prevtb) {
-   ci->ci_lasttb += tick_increment;
-   clock_count.ec_count++;
-   hardclock((struct clockframe *)frame);
-   }
+   s = splclock();
+   intr_enable();
 
-   while (nstats-- > 0)
-   statclock((struct clockframe *)frame);
-
-   intr_disable();
-   splx(s);
+   /*
+* Do standard timer interrupt stuff.
+*/
+   while (ci->ci_lasttb < prevtb) {
+   ci->ci_lasttb += tick_increment;

Re: [v5] amd64: simplify TSC sync testing

2022-08-01 Thread Scott Cheloha
On Mon, Aug 01, 2022 at 03:03:36PM +0900, Masato Asou wrote:
> Hi, Scott.
> 
> I tested v5 patch on my ESXi on Ryzen7.
> It works fine for me.

Is this the same Ryzen7 box as in the prior message?

Or do you have two different boxes, one running OpenBSD on the bare
metal, and this one running ESXi?



Re: [v4] amd64: simplify TSC sync testing

2022-07-31 Thread Scott Cheloha
> On Jul 31, 2022, at 23:48, Masato Asou  wrote:
> 
> Hi, Scott
> 
> I tested your patch on my Ryzen7 box.
> And I got failed message:
> 
> $ sysctl -a | grep tsc
> kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000)
> acpitimer0(1000)
> machdep.tscfreq=3593244667
> machdep.invarianttsc=1
> $ sysctl kern.timecounter
> kern.timecounter.tick=1
> kern.timecounter.timestepwarnings=0
> kern.timecounter.hardware=acpihpet0
> kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000)
> acpitimer0(1000)
> $ dmesg | grep failed
> tsc: cpu0/cpu2: sync test round 1/2 failed
> tsc: cpu0/cpu4: sync test round 1/2 failed
> tsc: cpu0/cpu5: sync test round 1/2 failed
> tsc: cpu0/cpu6: sync test round 1/2 failed
> tsc: cpu0/cpu7: sync test round 1/2 failed
> $ 

Thank you for testing.

Please try with the latest patch.  v5 is posted
on tech@ now.

> dmesg:
> 
> OpenBSD 7.2-beta (GENERIC.MP) #10: Mon Aug  1 13:12:06 JST 2022
>a...@g2-obsd.my.domain:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 34256752640 (32669MB)
> avail mem = 33201152000 (31663MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xdb64 (63 entries)
> bios0: vendor American Megatrends Inc. version "9015" date 03/03/2020
> bios0: MouseComputer Co.,Ltd. LM-AG400

You may also want to try updating your BIOS.



riscv64: trigger deferred timer interrupts from splx(9)

2022-07-31 Thread Scott Cheloha
Hi,

I am unsure how to properly mask RISC-V timer interrupts in hardware
at or above IPL_CLOCK.  I think that would be cleaner than doing
another timer interrupt deferral thing.

But, just to get the ball rolling, here is a first attempt at the timer
interrupt deferral thing for riscv64.  The motivation, as with every
other platform, is to eventually make it unnecessary for the machine
dependent code to know anything about the clock interrupt schedule.

The thing I'm most unsure about is where to retrigger the timer in the
PLIC code.  It seems right to do it from plic_setipl() because I want
to retrigger it before doing soft interrupt work, but I'm not sure.

Unless I'm missing something, I don't think I need to do anything in
the default interrupt handler code, i.e. riscv64_dflt_setipl(), right?

I have no riscv64 machine, so this is untested.  Would appreciate
tests and feedback.

-Scott

Index: include/cpu.h
===
RCS file: /cvs/src/sys/arch/riscv64/include/cpu.h,v
retrieving revision 1.12
diff -u -p -r1.12 cpu.h
--- include/cpu.h   10 Jun 2022 21:34:15 -  1.12
+++ include/cpu.h   1 Aug 2022 01:13:38 -
@@ -92,7 +92,7 @@ struct cpu_info {
uint64_tci_lasttb;
uint64_tci_nexttimerevent;
uint64_tci_nextstatevent;
-   int ci_statspending;
+   volatile intci_timer_deferred;
 
uint32_tci_cpl;
uint32_tci_ipending;
Index: riscv64/clock.c
===
RCS file: /cvs/src/sys/arch/riscv64/riscv64/clock.c,v
retrieving revision 1.3
diff -u -p -r1.3 clock.c
--- riscv64/clock.c 24 Jul 2021 22:41:09 -  1.3
+++ riscv64/clock.c 1 Aug 2022 01:13:38 -
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -106,6 +107,17 @@ clock_intr(void *frame)
int s;
 
/*
+* If the clock interrupt is masked, defer all clock interrupt
+* work until the clock interrupt is unmasked from splx(9).
+*/
+   if (ci->ci_cpl >= IPL_CLOCK) {
+   ci->ci_timer_deferred = 1;
+   sbi_set_timer(UINT64_MAX);
+   return 0;
+   }
+   ci->ci_timer_deferred = 0;
+
+   /*
 * Based on the actual time delay since the last clock interrupt,
 * we arrange for earlier interrupt next time.
 */
@@ -132,30 +144,23 @@ clock_intr(void *frame)
 
sbi_set_timer(nextevent);
 
-   if (ci->ci_cpl >= IPL_CLOCK) {
-   ci->ci_statspending += nstats;
-   } else {
-   nstats += ci->ci_statspending;
-   ci->ci_statspending = 0;
-
-   s = splclock();
-   intr_enable();
-
-   /*
-* Do standard timer interrupt stuff.
-*/
-   while (ci->ci_lasttb < prevtb) {
-   ci->ci_lasttb += tick_increment;
-   clock_count.ec_count++;
-   hardclock((struct clockframe *)frame);
-   }
+   s = splclock();
+   intr_enable();
 
-   while (nstats-- > 0)
-   statclock((struct clockframe *)frame);
-
-   intr_disable();
-   splx(s);
+   /*
+* Do standard timer interrupt stuff.
+*/
+   while (ci->ci_lasttb < prevtb) {
+   ci->ci_lasttb += tick_increment;
+   clock_count.ec_count++;
+   hardclock((struct clockframe *)frame);
}
+
+   while (nstats-- > 0)
+   statclock((struct clockframe *)frame);
+
+   intr_disable();
+   splx(s);
 
return 0;
 }
Index: dev/plic.c
===
RCS file: /cvs/src/sys/arch/riscv64/dev/plic.c,v
retrieving revision 1.10
diff -u -p -r1.10 plic.c
--- dev/plic.c  6 Apr 2022 18:59:27 -   1.10
+++ dev/plic.c  1 Aug 2022 01:13:38 -
@@ -557,6 +557,10 @@ plic_setipl(int new)
/* higher values are higher priority */
plic_set_threshold(ci->ci_cpuid, new);
 
+   /* trigger deferred timer interrupt if cpl is now low enough */
+   if (ci->ci_timer_deferred && new < IPL_CLOCK)
+   sbi_set_timer(0);
+
intr_restore(sie);
 }
 



mips64: trigger deferred timer interrupt from splx(9)

2022-07-31 Thread Scott Cheloha
Hi,

Apparently mips64, i.e. octeon and loongson, has the same problem as
powerpc/macppc and powerpc64.  The timer interrupt is normally only
logically masked, not physically masked in the hardware, when we're
running at or above IPL_CLOCK.  If we arrive at cp0_int5() when the
clock interrupt is logically masked we postpone all work until the
next tick.  This is a problem for my WIP clock interrupt work.

So, this patch is basically the same as what I did for macppc and what
I have proposed for powerpc64.

- Add a new member, ci_timer_deferred, to mips64's cpu_info struct.

  While here, remove ci_pendingticks.  We don't need it anymore.

- If we get to cp0_int5() and our IPL is too high, set
  cpu_info.ci_timer_deferred and return.

- If we get to cp0_int5() and our IPL is low enough, clear
  cpu_info.ci_timer_deferred and do clock interrupt work.

- In splx(9), if the new IPL is low enough and cpu_info.ci_timer_deferred
  is set, trigger the clock interrupt.

The only big difference is that mips64 uses an equality comparison
when deciding whether to arm the timer interrupt, so it's really easy
to "miss" CP0.count when you're setting CP0.compare.

To address this I've written a function, cp0_raise_int5(), that spins
until it is sure the timer interrupt will go off.  The function needed
a bit of new code for reading CP0.cause, which I've added to
cp0access.S.  I am using an initial offset of 16 cycles based on
experimentation with the machine I have access to, a 500MHz CN50xx.
Values lower than 16 require more than one loop to arm the timer.  If
that value is insufficient for other machines we can try passing the
initial offset as an argument to the function.
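
cp0_raise_int5() itself is not shown in the (truncated) diff below, so here is a userland model of the arming problem it solves.  Everything here is hypothetical: the fake count register advances a fixed amount per read, and the check against `compare` stands in for the real function's re-check of CP0.cause for a pending INT5.

```c
#include <stdint.h>

static uint32_t fake_count;
static uint32_t fake_compare;

static uint32_t
cp0_get_count(void)
{
	return fake_count += 20;	/* pretend each read costs 20 cycles */
}

static void
cp0_set_compare(uint32_t v)
{
	fake_compare = v;
}

/* An equality-match timer fires only when count == compare exactly, so
 * if count races past compare before the write lands, the interrupt is
 * missed.  Spin, doubling the lead, until compare is safely ahead. */
static uint32_t
raise_timer(uint32_t offset)
{
	for (;;) {
		cp0_set_compare(cp0_get_count() + offset);
		if ((int32_t)(fake_compare - cp0_get_count()) > 0)
			return offset;	/* compare still ahead: armed */
		offset *= 2;		/* raced past it: bigger lead */
	}
}
```

With a 20-cycle read cost the initial 16-cycle offset is insufficient and the loop doubles once, which mirrors the observation that offsets below 16 need more than one loop on the 500MHz CN50xx.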

I wasn't sure where to put the prototype for cp0_raise_int5() so I
stuck it in mips64/cpu.h.  If there's a better place for it, just say
so.

I also left some atomic counters for you to poke at with pstat(8) if
you want to see what the machine is doing in cp0_raise_int5(), i.e.
how often we defer clock interrupt work and how many loops you take to
arm the timer interrupt.  Those will be removed before commit.

I'm running a `make build` on my EdgeRouter PoE.  It only has 512MB of
RAM, so I can't do a parallel build without hanging the machine when
attempting to compile LLVM.  The build has been running for four days
and the machine has not yet hung, so I think this patch is correct-ish.
I will holler if it hangs.

visa: Assuming this code looks right, could you test this on a
  beefier octeon machine?  Preferably a parallel build?

miod: I'm unclear whether loongson uses cp0_int5().  Am I missing
  code here, or are my changes in arch/loongson sufficient?
  If it's sufficient, could you test this?

  I have no loongson hardware, so this is uncompiled there.
  Sorry in advance if it does not compile.

Thoughts?

Index: mips64/mips64/clock.c
===
RCS file: /cvs/src/sys/arch/mips64/mips64/clock.c,v
retrieving revision 1.45
diff -u -p -r1.45 clock.c
--- mips64/mips64/clock.c   6 Apr 2022 18:59:26 -   1.45
+++ mips64/mips64/clock.c   31 Jul 2022 18:18:05 -
@@ -92,13 +92,13 @@ clockattach(struct device *parent, struc
  *  Interrupt handler for targets using the internal count register
  *  as interval clock. Normally the system is run with the clock
  *  interrupt always enabled. Masking is done here and if the clock
- *  can not be run the tick is just counted and handled later when
- *  the clock is logically unmasked again.
+ *  can not be run the tick is handled later when the clock is logically
+ *  unmasked again.
  */
 uint32_t
 cp0_int5(uint32_t mask, struct trapframe *tf)
 {
-   u_int32_t clkdiff;
+   u_int32_t clkdiff, pendingticks = 0;
struct cpu_info *ci = curcpu();
 
/*
@@ -113,15 +113,26 @@ cp0_int5(uint32_t mask, struct trapframe
}
 
/*
+* If the clock interrupt is masked we can't do any work until
+* it is unmasked.
+*/
+   if (tf->ipl >= IPL_CLOCK) {
+   ci->ci_timer_deferred = 1;
+   cp0_set_compare(cp0_get_count() - 1);
+   return CR_INT_5;
+   }
+   ci->ci_timer_deferred = 0;
+
+   /*
 * Count how many ticks have passed since the last clock interrupt...
 */
clkdiff = cp0_get_count() - ci->ci_cpu_counter_last;
while (clkdiff >= ci->ci_cpu_counter_interval) {
ci->ci_cpu_counter_last += ci->ci_cpu_counter_interval;
clkdiff = cp0_get_count() - ci->ci_cpu_counter_last;
-   ci->ci_pendingticks++;
+   pendingticks++;
}
-   ci->ci_pendingticks++;
+   pendingticks++;
ci->ci_cpu_counter_last += ci->ci_cpu_counter_interval;
 
/*
@@ -132,32 +143,64 @@ cp0_int5(uint32_t mask, struct trapframe
clkdiff = cp0_get_count() - ci->ci_cpu_counter_last;
if ((int)clkdiff >= 0) {

[v5] amd64: simplify TSC sync testing

2022-07-30 Thread Scott Cheloha
Hi,

At the urging of sthen@ and dv@, here is v5.

Two major changes from v4:

- Add the function tc_reset_quality() to kern_tc.c and use it
  to lower the quality of the TSC timecounter if we fail the
  sync test.

  tc_reset_quality() will choose a new active timecounter if,
  after the quality change, the given timecounter is no longer
  the best timecounter.

  The upshot is: if you fail the TSC sync test you should boot
  with the HPET as your active timecounter.  If you don't have
  an HPET you'll be using something else.

- Drop the SMT accommodation from the hot loop.  It hasn't been
  necessary since last year when I rewrote the test to run without
  a mutex.  In the rewritten test, the two CPUs in the hot loop
  are not competing for any resources so they should not be able
  to starve one another.

dv: Could you double-check that this still chooses the right
timecounter on your machine?  If so, I will ask deraadt@ to
put this into snaps to replace v4.

Additional test reports are welcome.  Include your dmesg.

--

I do not see much more I can do to improve this patch.

I am seeking patch review and OKs.

I am especially interested in whether my assumptions in tsc_ap_test()
and tsc_bp_test() are correct.  The whole patch depends on those
assumptions.  Is this a valid way to test for TSC desync?  Or am I
missing membar_producer()/membar_consumer() calls?

Here is the long version of "what" and "why" for this patch.

The patch is attached at the end.

- Computing a per-CPU TSC skew value is error-prone, especially
  on multisocket machines and VMs.  My best guess is that larger
  latencies appear to the skew measurement test as TSC desync,
  and so the TSC is demoted to a kernel timecounter on these
  machines or marked non-monotonic.

  This patch eliminates per-CPU TSC skew values.  Instead of trying
  to measure and correct for TSC desync we only try to detect desync,
  which is less error-prone.  This approach should allow a wider
  variety of machines to use the TSC as a timecounter when running
  OpenBSD.

- In the new sync test, both CPUs repeatedly try to detect whether
  their TSC is trailing the other CPU's TSC.  The upside to this
  approach is that it yields no false positives (if my assumptions
  about AMD64 memory access and instruction serialization are correct).
  The downside to this approach is that it takes more time than the
  current skew measurement test.  Each test round takes 1ms, and
  we run up to two rounds per CPU, so this patch slows boot down
  by 2ms per AP.

- If any CPU fails the sync test, the TSC is marked non-monotonic
  and a different timecounter is activated.  The TC_USER flag
  remains intact.  There is no "middle ground" where we fall back
  to only using the TSC in the kernel.

- Because there is no per-CPU skew value, there is also no concept
  of TSC drift anymore.

- Before running the test, we check for the IA32_TSC_ADJUST
  register and reset it if necessary.  This is a trivial way
  to work around firmware bugs that desync the TSC before we
  reach the kernel.

  Unfortunately, at the moment this register appears to only
  be available on Intel processors and I cannot find an equivalent
  but differently-named MSR for AMD processors.
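
The "detect a trailing counter" idea from the sync test can be modeled in userland with two threads.  This is a sketch of the concept only, not the kernel patch below: a shared atomic counter stands in for the TSC (so a correct setup reports no desync), and seq_cst atomics stand in for the memory-ordering assumptions the mail asks about.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t fake_tsc;	/* stands in for rdtsc_lfence() */
static _Atomic uint64_t latest[2];	/* per-side published readings */
static _Atomic int desync;

static uint64_t
read_counter(void)
{
	return atomic_fetch_add(&fake_tsc, 1);
}

/* Each side repeatedly publishes its latest reading and checks that a
 * fresh reading taken *after* observing the peer's value is not
 * smaller: if it is, this side's counter is trailing the peer's. */
static void *
test_side(void *arg)
{
	int me = *(int *)arg, peer = !me;
	int i;

	for (i = 0; i < 100000; i++) {
		atomic_store(&latest[me], read_counter());
		if (atomic_load(&latest[peer]) > read_counter())
			atomic_store(&desync, 1);
	}
	return NULL;
}
```

Because both sides only read and publish, neither competes with the other for a lock, matching the claim that the hot loop cannot let one CPU starve the other.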

--

Index: sys/arch/amd64/amd64/tsc.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.24
diff -u -p -r1.24 tsc.c
--- sys/arch/amd64/amd64/tsc.c  31 Aug 2021 15:11:54 -  1.24
+++ sys/arch/amd64/amd64/tsc.c  31 Jul 2022 03:06:39 -
@@ -36,13 +36,6 @@ int  tsc_recalibrate;
 uint64_t   tsc_frequency;
 inttsc_is_invariant;
 
-#defineTSC_DRIFT_MAX   250
-#define TSC_SKEW_MAX   100
-int64_ttsc_drift_observed;
-
-volatile int64_t   tsc_sync_val;
-volatile struct cpu_info   *tsc_sync_cpu;
-
 u_int  tsc_get_timecount(struct timecounter *tc);
 void   tsc_delay(int usecs);
 
@@ -236,22 +229,12 @@ cpu_recalibrate_tsc(struct timecounter *
 u_int
 tsc_get_timecount(struct timecounter *tc)
 {
-   return rdtsc_lfence() + curcpu()->ci_tsc_skew;
+   return rdtsc_lfence();
 }
 
 void
 tsc_timecounter_init(struct cpu_info *ci, uint64_t cpufreq)
 {
-#ifdef TSC_DEBUG
-   printf("%s: TSC skew=%lld observed drift=%lld\n", ci->ci_dev->dv_xname,
-   (long long)ci->ci_tsc_skew, (long long)tsc_drift_observed);
-#endif
-   if (ci->ci_tsc_skew < -TSC_SKEW_MAX || ci->ci_tsc_skew > TSC_SKEW_MAX) {
-   printf("%s: disabling user TSC (skew=%lld)\n",
-   ci->ci_dev->dv_xname, (long long)ci->ci_tsc_skew);
-   tsc_timecounter.tc_user = 0;
-   }
-
if (!(ci->ci_flags & CPUF_PRIMARY) ||
!(ci->ci_flags & CPUF_CONST_TSC) ||
!(ci->ci_flags & CPUF_INVAR_TSC))
@@ -268,111 +251,264 @@ tsc_timecounter_init(struct cpu_info *ci
calibrate_tsc_freq();
  

Re: echo(1): check for stdio errors

2022-07-30 Thread Scott Cheloha
On Mon, Jul 11, 2022 at 01:27:23PM -0500, Scott Cheloha wrote:
> On Mon, Jul 11, 2022 at 08:31:04AM -0600, Todd C. Miller wrote:
> > On Sun, 10 Jul 2022 20:58:35 -0900, Philip Guenther wrote:
> > 
> > > Three thoughts:
> > > 1) Since stdio errors are sticky, is there any real advantage to checking
> > > each call instead of just checking the final fclose()?
> 
> My thinking was that we have no idea how many arguments we're going to
> print, so we may as well fail as soon as possible.
> 
> Maybe in more complex programs there would be a code-length or
> complexity-reducing upside to deferring the ferror(3) check until,
> say, the end of a subroutine or something.
> 
> > > [...]
> > 
> > Will that really catch all errors?  From what I can tell, fclose(3)
> > can succeed even if the error flag was set.  The pattern I prefer
> > is to use a final fflush(3) followed by a call to ferror(3) before
> > the fclose(3).
> 
> [...]

Bump.  The standard's error cases for fflush(3) are identical to those
for fclose(3):

https://pubs.opengroup.org/onlinepubs/9699919799/functions/fflush.html
https://pubs.opengroup.org/onlinepubs/9699919799/functions/fclose.html

Is the fact that our fclose(3) can succeed even if the error flag is
set a bug?
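
For reference, the flush-then-check pattern under discussion looks like this in isolation (a generic sketch, not part of the echo.c patch below):

```c
#include <stdio.h>

/* Return 0 on success, -1 if any stdio error is pending.  Flush and
 * check ferror(3) before fclose(3), because fclose(3) can report
 * success even when the stream's error flag is already set. */
static int
flush_and_close(FILE *fp)
{
	int error = 0;

	if (fflush(fp) == EOF || ferror(fp))
		error = -1;
	if (fclose(fp) == EOF)
		error = -1;
	return error;
}
```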

Also, can I go ahead with this?  With this patch, echo(1) fails if we
(for example) try to write to a full file system.  So we are certainly
catching more stdio failures:

$ /bin/echo test > /tmp/myfile

/tmp: write failed, file system is full
$ echo $?
0

$ obj/echo test > /tmp/myfile

/tmp: write failed, file system is full
echo: stdout: No space left on device
$ echo $?
1

Progress!  Note that the shell builtin already fails in this case:

$ type echo
echo is a shell builtin
$ echo test > /tmp/myfile

/tmp: write failed, file system is full
jetsam$ echo $?
1

Index: echo.c
===
RCS file: /cvs/src/bin/echo/echo.c,v
retrieving revision 1.10
diff -u -p -r1.10 echo.c
--- echo.c  9 Oct 2015 01:37:06 -   1.10
+++ echo.c  30 Jul 2022 23:10:24 -
@@ -53,12 +53,15 @@ main(int argc, char *argv[])
nflag = 0;
 
while (*argv) {
-   (void)fputs(*argv, stdout);
-   if (*++argv)
-   putchar(' ');
+   if (fputs(*argv, stdout) == EOF)
+   err(1, "stdout");
+   if (*++argv && putchar(' ') == EOF)
+   err(1, "stdout");
}
-   if (!nflag)
-   putchar('\n');
+   if (!nflag && putchar('\n') == EOF)
+   err(1, "stdout");
+   if (fflush(stdout) == EOF || fclose(stdout) == EOF)
+   err(1, "stdout");
 
return 0;
 }



rc(8): reorder_libs(): print names of relinked libraries

2022-07-29 Thread Scott Cheloha
Recently I've been doing some MIPS64 stuff on my EdgeRouter PoE.  It
has a USB disk, two 500MHz processors, and 512MB of RAM.

So, every time I reboot to test the next iteration of my kernel
patch, I get to here:

reordering libraries: 

and I sit there for half a minute or more and wonder what the hell
it's doing.

And, in my intellectual brain, I know it's relinking the libraries
and that this is slow because it needs to link a bunch of object files
and my machine is slow and my disk is slow and I have almost no RAM.

But!  My animal brain wishes I could see some indication of progress.
Because the script has told me it is linking more than one library.
So, as with daemon startup, I am curious which library it is working
on at any given moment.

Can we print the library names as they are being relinked?

With the attached patch the boot now looks like this:

reordering libraries: ld.so libc.so.96.1 libcrypto.so.49.1.

We print the library name before it is relinked, so you can know which
library it is linking.

If for some reason we fail on a particular library, it instead looks
like this:

reordering libraries: ld.so(failed).

... which is me trying to imitate what we do for daemon startup.

Thoughts?

I know this makes rc(8) a bit noisier but it really does improve my
(for want of a better term) "user experience" as I wait for my machine
to boot.

Index: rc
===
RCS file: /cvs/src/etc/rc,v
retrieving revision 1.563
diff -u -p -r1.563 rc
--- rc  28 Jul 2022 16:06:04 -  1.563
+++ rc  30 Jul 2022 00:15:26 -
@@ -193,7 +193,7 @@ reorder_libs() {
# Remount the (read-only) filesystems in _ro_list as read-write.
for _mp in $_ro_list; do
if ! mount -u -w $_mp; then
-   echo ' failed.'
+   echo '(failed).'
return
fi
done
@@ -210,6 +210,7 @@ reorder_libs() {
_install='install -F -o root -g bin -m 0444'
_lib=${_liba##*/}
_lib=${_lib%.a}
+   echo -n " $_lib"
_lib_dir=${_liba#$_relink}
_lib_dir=${_lib_dir%/*}
cd $_tmpdir
@@ -243,9 +244,9 @@ reorder_libs() {
done
 
if $_error; then
-   echo ' failed.'
+   echo '(failed).'
else
-   echo ' done.'
+   echo '.'
fi
 }
 



Re: ts(1): parse input format string only once

2022-07-29 Thread Scott Cheloha
On Wed, Jul 13, 2022 at 12:50:24AM -0500, Scott Cheloha wrote:
> We reduce overhead if we only parse the user's format string once.  To
> achieve that, this patch does the following:
> 
> [...]
> 
> - When parsing the user format string in fmtfmt(), keep a list of
>   where each microsecond substring lands in buf.  We'll need it later.
> 
> - Move the printing part of fmtfmt() into a new function, fmtprint().
>   fmtprint() is now called from the main loop instead of fmtfmt().
> 
> - In fmtprint(), before calling strftime(3), update any microsecond
>   substrings in buf using the list we built earlier in fmtfmt().  Note
>   that if there aren't any such substrings we don't call snprintf(3)
>   at all.
> 
> [...]

Two week bump.

Here is a stripped-down patch with only the above changes.  Hopefully
this makes the intent of the patch more obvious.

In short, parse the user format string only once, and then only update
the microsecond parts (if any) when we print each new timestamp.
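
The idea can be shown in miniature (hypothetical buffer and names, not the ts.c code): expand the format once, remember where the microsecond field lives, then per-timestamp overwrite just those six digits in place.

```c
#include <stdio.h>
#include <string.h>

static char buf[64] = "time %H:%M:%S.000000";	/* pre-expanded once */
static char *uspos;				/* -> the six digits */

static void
fmt_once(void)
{
	/* one-time scan: record the microsecond field's position */
	uspos = strstr(buf, ".000000") + 1;
}

static void
fmt_update(long usec)
{
	char us[7];

	/* patch the digits without re-parsing the whole format */
	snprintf(us, sizeof(us), "%06ld", usec);
	memcpy(uspos, us, 6);
}
```

The patch generalizes this to a queue of positions, since a format string may contain more than one microsecond field.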

Index: ts.c
===
RCS file: /cvs/src/usr.bin/ts/ts.c,v
retrieving revision 1.8
diff -u -p -r1.8 ts.c
--- ts.c7 Jul 2022 10:40:25 -   1.8
+++ ts.c29 Jul 2022 13:12:07 -
@@ -17,6 +17,7 @@
  */
 
 #include 
+#include 
 #include 
 
 #include 
@@ -27,13 +28,20 @@
 #include 
 #include 
 
+SIMPLEQ_HEAD(, usec) usec_queue = SIMPLEQ_HEAD_INITIALIZER(usec_queue);
+struct usec {
+   SIMPLEQ_ENTRY(usec) next;
+   char *pos;
+};
+
 static char*format = "%b %d %H:%M:%S";
 static char*buf;
 static char*outbuf;
 static size_t   bufsize;
 static size_t   obsize;
 
-static void fmtfmt(const struct timespec *);
+static void fmtfmt(void);
+static void fmtprint(const struct timespec *);
 static void __dead  usage(void);
 
 int
@@ -88,6 +96,8 @@ main(int argc, char *argv[])
if ((outbuf = calloc(1, obsize)) == NULL)
err(1, NULL);
 
+   fmtfmt();
+
/* force UTC for interval calculations */
if (iflag || sflag)
if (setenv("TZ", "UTC", 1) == -1)
@@ -106,7 +116,7 @@ main(int argc, char *argv[])
timespecadd(&now, &start_offset, &ts);
else
ts = now;
-   fmtfmt();
+   fmtprint();
if (iflag)
start = now;
}
@@ -132,15 +142,11 @@ usage(void)
  * so you can format while you format
  */
 static void
-fmtfmt(const struct timespec *ts)
+fmtfmt(void)
 {
-   struct tm *tm;
-   char *f, us[7];
-
-   if ((tm = localtime(&ts->tv_sec)) == NULL)
-   err(1, "localtime");
+   char *f;
+   struct usec *u;
 
-   snprintf(us, sizeof(us), "%06ld", ts->tv_nsec / 1000);
strlcpy(buf, format, bufsize);
f = buf;
 
@@ -159,12 +165,34 @@ fmtfmt(const struct timespec *ts)
f[0] = f[1];
f[1] = '.';
f += 2;
+   u = malloc(sizeof *u);
+   if (u == NULL)
+   err(1, NULL);
+   u->pos = f;
+   SIMPLEQ_INSERT_TAIL(&usec_queue, u, next);
l = strlen(f);
memmove(f + 6, f, l + 1);
-   memcpy(f, us, 6);
f += 6;
}
} while (*f != '\0');
+}
+
+static void
+fmtprint(const struct timespec *ts)
+{
+   char us[8];
+   struct tm *tm;
+   struct usec *u;
+
+   if ((tm = localtime(&ts->tv_sec)) == NULL)
+   err(1, "localtime");
+
+   /* Update any microsecond substrings in the format buffer. */
+   if (!SIMPLEQ_EMPTY(&usec_queue)) {
+   snprintf(us, sizeof(us), "%06ld", ts->tv_nsec / 1000);
+   SIMPLEQ_FOREACH(u, &usec_queue, next)
+   memcpy(u->pos, us, 6);
+   }
 
*outbuf = '\0';
if (*buf != '\0') {



Re: [v4] amd64: simplify TSC sync testing

2022-07-28 Thread Scott Cheloha
On Thu, Jul 28, 2022 at 04:57:41PM -0400, Dave Voutila wrote:
> 
> Stuart Henderson  writes:
> 
> > On 2022/07/28 12:57, Scott Cheloha wrote:
> >> On Thu, Jul 28, 2022 at 07:55:40AM -0400, Dave Voutila wrote:
> >> >
> >> > This is breaking timecounter selection on my x13 Ryzen 5 Pro laptop
> >> > running the latest kernel from snaps.
> >>
> >> Define "breaking".
> >
> > That's clear from the output:
> >
> > : On 2022/07/28 07:55, Dave Voutila wrote:
> > : > $ sysctl -a | grep tsc
> > : > kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000)
> > : > acpitimer0(1000)
> > : > machdep.tscfreq=2096064730
> > : > machdep.invarianttsc=1
> > : >
> > : > $ sysctl kern.timecounter
> > : > kern.timecounter.tick=1
> > : > kern.timecounter.timestepwarnings=0
> > : > kern.timecounter.hardware=i8254
> > : > kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000)
> > : > acpitimer0(1000)
> >
> >> The code detects TSC desync and marks the timecounter non-monotonic.
> >
> > That's good (and I think as would have happened before)
> >
> >> So it uses the i8254 instead.
> >
> > But that's not so good, there are higher prio timecounters available,
> > acpihpet0 and acpitimer0, which would be better choices than i8254.
> 
> Exactly my point. Thanks Stuart.

Okay, please try this patch on the machine in question.

It adds a tc_detach() function to kern_tc.c.  The first time we fail
the sync test, the BP calls tc_detach(), changes the TSC's tc_quality
to a negative value to tell everyone "this is not monotonic", then
reinstalls the TSC timecounter again with tc_init().

Because we are making this call *once*, from one place, I do not think
the O(n) removal time matters, so I have not switched the tc_list from
SLIST to TAILQ.

It is possible for a thread to be asleep in sysctl_tc_hardware()
during resume, but the thread would be done iterating through the list
if it had reached rw_enter_write(), so removing/adding tsc_timecounter
to the list during resume cannot break list traversal.

Switching the active timecounter during resume is also fine.  The only
race is with tc_adjfreq().  If a thread is asleep in adjfreq(2) when
the system suspends, and we change the active timecounter during
resume, the frequency change may be applied to the "wrong" timecounter.

... but this is always a race, because adjfreq(2) only operates on the
active timecounter, and root can change it at any time via sysctl(2).
So it's not a new problem.

...

It might be simpler to just change tc_lock from a rwlock to a mutex.
Then the MP analysis is much simpler across a suspend/resume.

Index: sys/arch/amd64/amd64/tsc.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.24
diff -u -p -r1.24 tsc.c
--- sys/arch/amd64/amd64/tsc.c  31 Aug 2021 15:11:54 -  1.24
+++ sys/arch/amd64/amd64/tsc.c  29 Jul 2022 01:06:17 -
@@ -36,13 +36,6 @@ int  tsc_recalibrate;
 uint64_t   tsc_frequency;
 inttsc_is_invariant;
 
-#defineTSC_DRIFT_MAX   250
-#define TSC_SKEW_MAX   100
-int64_ttsc_drift_observed;
-
-volatile int64_t   tsc_sync_val;
-volatile struct cpu_info   *tsc_sync_cpu;
-
 u_int  tsc_get_timecount(struct timecounter *tc);
 void   tsc_delay(int usecs);
 
@@ -236,22 +229,12 @@ cpu_recalibrate_tsc(struct timecounter *
 u_int
 tsc_get_timecount(struct timecounter *tc)
 {
-   return rdtsc_lfence() + curcpu()->ci_tsc_skew;
+   return rdtsc_lfence();
 }
 
 void
 tsc_timecounter_init(struct cpu_info *ci, uint64_t cpufreq)
 {
-#ifdef TSC_DEBUG
-   printf("%s: TSC skew=%lld observed drift=%lld\n", ci->ci_dev->dv_xname,
-   (long long)ci->ci_tsc_skew, (long long)tsc_drift_observed);
-#endif
-   if (ci->ci_tsc_skew < -TSC_SKEW_MAX || ci->ci_tsc_skew > TSC_SKEW_MAX) {
-   printf("%s: disabling user TSC (skew=%lld)\n",
-   ci->ci_dev->dv_xname, (long long)ci->ci_tsc_skew);
-   tsc_timecounter.tc_user = 0;
-   }
-
if (!(ci->ci_flags & CPUF_PRIMARY) ||
!(ci->ci_flags & CPUF_CONST_TSC) ||
!(ci->ci_flags & CPUF_INVAR_TSC))
@@ -268,111 +251,276 @@ tsc_timecounter_init(struct cpu_info *ci
calibrate_tsc_freq();
}
 
-   if (tsc_drift_observed > TSC_DRIFT_MAX) {
-   printf("ERROR: %lld cycle TSC drift observed\n",
-   (long long)tsc_drift_observed);
-   tsc_timecounter.tc_quality = -1000;
-   tsc_timecounter.tc_user = 0

Re: [v4] amd64: simplify TSC sync testing

2022-07-28 Thread Scott Cheloha
> On Jul 28, 2022, at 13:41, Stuart Henderson  wrote:
> 
> On 2022/07/28 12:57, Scott Cheloha wrote:
>>> On Thu, Jul 28, 2022 at 07:55:40AM -0400, Dave Voutila wrote:
>>> 
>>> This is breaking timecounter selection on my x13 Ryzen 5 Pro laptop
>>> running the latest kernel from snaps.
>> 
>> Define "breaking".
> 
> That's clear from the output:
> 
> : On 2022/07/28 07:55, Dave Voutila wrote:
> : > $ sysctl -a | grep tsc
> : > kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000)
> : > acpitimer0(1000)
> : > machdep.tscfreq=2096064730
> : > machdep.invarianttsc=1
> : > 
> : > $ sysctl kern.timecounter
> : > kern.timecounter.tick=1
> : > kern.timecounter.timestepwarnings=0
> : > kern.timecounter.hardware=i8254
> : > kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000)
> : > acpitimer0(1000)
> 
>> The code detects TSC desync and marks the timecounter non-monotonic.
> 
> That's good (and I think as would have happened before)
> 
>> So it uses the i8254 instead.
> 
> But that's not so good, there are higher prio timecounters available,
> acpihpet0 and acpitimer0, which would be better choices than i8254.

Okay that was my second guess.

I will send out a patch addressing this in
a bit.



Re: [v4] amd64: simplify TSC sync testing

2022-07-28 Thread Scott Cheloha
On Thu, Jul 28, 2022 at 07:55:40AM -0400, Dave Voutila wrote:
> 
> Scott Cheloha  writes:
> 
> > Hi,
> >
> > Thanks to everyone who tested v3.
> >
> > Attached is v4.  I would like to put this into snaps (bcc: deraadt@).
> >
> > If you've been following along and testing these patches, feel free to
> > continue testing.  If your results change from v3 to v4, please reply
> > with what happened and your dmesg.
> >
> > I made a few small changes from v3:
> >
> > - Only run the sync test after failing it on TSC_DEBUG kernels.
> >   For example, it would be a waste of time to run the sync test
> >   for 62 other CPU pairs if the CPU0/CPU1 sync test failed.
> >
> > - Pad the tsc_test_status struct by hand.  Try to keep
> >   tsc_test_status.val onto its own cache line and try to prevent one
> >   instance of the struct from sharing a cache line with another
> >   instance.
> >
> > I am looking for OKs.
> >
> > Assuming the results from snaps testing aren't catastrophic, and this
> > version is OK'd, I hope to commit this after a couple weeks in snaps.
> 
> This is breaking timecounter selection on my x13 Ryzen 5 Pro laptop
> running the latest kernel from snaps.

Define "breaking".

The code detects TSC desync and marks the timecounter non-monotonic.
So it uses the i8254 instead.

This is the intended behavior of the patch.

The latest news on the desync we're seeing on certain Ryzen CPUs is
that an engineer at AMD has said it might be a bug in AGESA and that
if/when BIOS vendors pull in a fix from AMD and distribute it to
customers it may solve the problem:

https://bugzilla.kernel.org/show_bug.cgi?id=216146#c7

--

Or do you mean "breaking" in some other way?



Re: sleep.1: misc. cleanup

2022-07-27 Thread Scott Cheloha
On Wed, Jul 27, 2022 at 07:31:11AM +0100, Jason McIntyre wrote:
> On Tue, Jul 26, 2022 at 09:18:47PM -0500, Scott Cheloha wrote:
> > A few improvements I want to make to the sleep(1) manpage.
> > 
> > DESCRIPTION
> > 
> > - "for a minimum of" is better said "for at least".
> > 
> 
> hi.
> 
> i can't really distinguish between one form being better than the other.
> "until at least" is the posix wording; "for a minimum" the text in
> net/free/open etc.

I am confident "until at least" sounds more natural than "for a
minimum of".

> 
> > - The seconds argument can be zero, so say "non-negative".
> > 
> > - Specify that the number (the whole thing) is decimal to exclude
> >   e.g. hex numbers.  It then follows that the optional fraction
> >   must also be decimal.
> > 
> > - I don't think we need to inspire the reader to use sleep(1) in any
> >   particular way.  We can just demonstrate these patterns in the
> >   Examples provided later.
> > 
> > ASYNCHRONOUS EVENTS
> > 
> > - Note that SIGALRM wakes sleep(1) up "early".
> > 
> > EXAMPLES
> > 
> > - Simplify the first example.  I think parenthetically pointing the
> >   reader to at(1) muddies what ought to be the simplest possible
> >   example.  Scheduling jobs is a way more advanced topic, sleep(1)
> >   is more like a shell primitive.
> > 
> > - Shorten the interval in the first example.  A half hour is not
> >   interactive.
> > 
> > - Get rid of the entire csh(1) example.  It's extremely complex and
> >   the bulk of the text is spent explaining things that aren't about
> >   sleep(1) at all.
> > 
> >   Maybe also of note is that very few other manpages offer csh(1)
> >   examples.  Is there a rule about that?
> > 
> 
> i suppose the dominance of sh has led to examples getting written in this
> style. but that doesn't mean we have to rewrite all csh examples. i
> think we usually use sh for script examples, but try to make sure that
> all examples work regardless of the user shell.
> 
> you're right that the current section is a bit wordy though.

Alright, I'm leaving it out then.

> > - Tweak the third example to show the reader that you can sleep
> >   for a fraction of a second, as mentioned in the Description.
> > 
> > STANDARDS
> > 
> > - Prefer active voice.
> > 
> >   "The handling of fractional arguments" is better said
> >   "Support for fractional seconds".
> > 
> >   Shorten "is provided as" to "is".
> > 
> > SEE ALSO
> > 
> > - Seems logical to point back to nanosleep(2) and sleep(3).
> > 
> 
> normally we'd try to avoid sending the reader of section1 pages to
> sections 2/3/9. but if there's stuff there that will help the user (not
> code writer) then it'd make sense. is there?

Nope, removed.

> > - Add echo(1) and ls(1) from the EXAMPLES.
> > 
> 
> that's not needed. we don't add every command listed in the page to SEE
> ALSO. just really pages we think will help people better understand the
> subject they're reading about. so echo(1) does not really help you
> understand sleep(1). however you should leave the reference to at(1) -
> in this case it shows you how to do something like sleep, but better
> suited to some specific tasks.

Okay, I have dropped echo(1) and ls(1) and restored at(1).

> >   ... unsure if we actually need to reference these or if it's
> >   a distraction.  The existing examples make use of awk(1) but
> >   do not Xr it in this section, unsure if there is a rule about
> >   this.
> > 
> > - Add signal(3) because we talk about SIGALRM.
> > 
> 
> again, i think that's outside the scope of sleep(1).

We explicitly mention that sleep(1) has non-standard behavior when
receiving SIGALRM.  It's a feature.  You can use it to, for example,
manually intervene and abbreviate a long delay in a script.

... should I cook up an example with kill(1)?

--

Here's an updated diff.

I have also noted in History that sleep(1) was rewritten for 4.4BSD
(to avoid issues with the AT&T copyright, I assume).  Keith Bostic
committed it, but I don't know if he actually rewrote it.

Index: sleep.1
===
RCS file: /cvs/src/bin/sleep/sleep.1,v
retrieving revision 1.22
diff -u -p -r1.22 sleep.1
--- sleep.1 16 Aug 2016 18:51:25 -  1.22
+++ sleep.1 27 Jul 2022 18:35:02 -
@@ -45,58 +45,27 @@
 .Sh DESCRIPTION
 The
 .Nm
-utility
-suspends execution for a minimum of the specified number of
-.Ar seconds .

sleep.1: misc. cleanup

2022-07-26 Thread Scott Cheloha
A few improvements I want to make to the sleep(1) manpage.

DESCRIPTION

- "for a minimum of" is better said "for at least".

- The seconds argument can be zero, so say "non-negative".

- Specify that the number (the whole thing) is decimal to exclude
  e.g. hex numbers.  It then follows that the optional fraction
  must also be decimal.

- I don't think we need to inspire the reader to use sleep(1) in any
  particular way.  We can just demonstrate these patterns in the
  Examples provided later.

ASYNCHRONOUS EVENTS

- Note that SIGALRM wakes sleep(1) up "early".

EXAMPLES

- Simplify the first example.  I think parenthetically pointing the
  reader to at(1) muddies what ought to be the simplest possible
  example.  Scheduling jobs is a way more advanced topic, sleep(1)
  is more like a shell primitive.

- Shorten the interval in the first example.  A half hour is not
  interactive.

- Get rid of the entire csh(1) example.  It's extremely complex and
  the bulk of the text is spent explaining things that aren't about
  sleep(1) at all.

  Maybe also of note is that very few other manpages offer csh(1)
  examples.  Is there a rule about that?

- Tweak the third example to show the reader that you can sleep
  for a fraction of a second, as mentioned in the Description.

STANDARDS

- Prefer active voice.

  "The handling of fractional arguments" is better said
  "Support for fractional seconds".

  Shorten "is provided as" to "is".

SEE ALSO

- Seems logical to point back to nanosleep(2) and sleep(3).

- Add echo(1) and ls(1) from the EXAMPLES.

  ... unsure if we actually need to reference these or if it's
  a distraction.  The existing examples make use of awk(1) but
  do not Xr it in this section, unsure if there is a rule about
  this.

- Add signal(3) because we talk about SIGALRM.

HISTORY

- Not merely "appeared": "first appeared".

--

Tweaks?  ok?

Index: sleep.1
===
RCS file: /cvs/src/bin/sleep/sleep.1,v
retrieving revision 1.22
diff -u -p -r1.22 sleep.1
--- sleep.1 16 Aug 2016 18:51:25 -  1.22
+++ sleep.1 27 Jul 2022 02:16:18 -
@@ -45,62 +45,35 @@
 .Sh DESCRIPTION
 The
 .Nm
-utility
-suspends execution for a minimum of the specified number of
-.Ar seconds .
-This number must be positive and may contain a decimal fraction.
-.Nm
-is commonly used to schedule the execution of other commands (see below).
+utility suspends execution until at least the given number of
+.Ar seconds
+have elapsed.
+.Ar seconds
+must be a non-negative decimal value and may contain a fraction.
 .Sh ASYNCHRONOUS EVENTS
 .Bl -tag -width "SIGALRMXXX"
 .It Dv SIGALRM
-Terminate normally, with a zero exit status.
+Terminate early, with a zero exit status.
 .El
 .Sh EXIT STATUS
 .Ex -std sleep
 .Sh EXAMPLES
-Wait a half hour before running the script
-.Pa command_file
-(see also the
-.Xr at 1
-utility):
-.Pp
-.Dl (sleep 1800; sh command_file >& errors)&
-.Pp
-To repetitively run a command (with
-.Xr csh 1 ) :
-.Bd -literal -offset indent
-while (! -r zzz.rawdata)
-   sleep 300
-end
-foreach i (*.rawdata)
-   sleep 70
-   awk -f collapse_data $i >> results
-end
-.Ed
+Wait five seconds before running a command:
 .Pp
-The scenario for such a script might be: a program currently
-running is taking longer than expected to process a series of
-files, and it would be nice to have another program start
-processing the files created by the first program as soon as it is finished
-(when
-.Pa zzz.rawdata
-is created).
-The script checks every five minutes for this file.
-When it is found, processing is done in several steps
-by sleeping 70 seconds between each
-.Xr awk 1
-job.
+.Dl $ sleep 5 ; echo Hello, World!
 .Pp
-To monitor the growth of a file without consuming too many resources:
+List a file twice per second:
 .Bd -literal -offset indent
-while true; do
-   ls -l file
-   sleep 5
+while ls -l file; do
+   sleep 0.5
 done
 .Ed
 .Sh SEE ALSO
-.Xr at 1
+.Xr echo 1 ,
+.Xr ls 1 ,
+.Xr nanosleep 2 ,
+.Xr signal 3 ,
+.Xr sleep 3
 .Sh STANDARDS
 The
 .Nm
@@ -108,10 +81,11 @@ utility is compliant with the
 .St -p1003.1-2008
 specification.
 .Pp
-The handling of fractional arguments is provided as an extension to that
-specification.
+Support for fractional
+.Ar seconds
+is an extension to that specification.
 .Sh HISTORY
 A
 .Nm
-utility appeared in
+utility first appeared in
 .At v4 .



Re: powerpc64: retrigger deferred DEC interrupts from splx(9)

2022-07-25 Thread Scott Cheloha
On Mon, Jul 25, 2022 at 01:52:36PM +0200, Mark Kettenis wrote:
> > Date: Sun, 24 Jul 2022 19:33:57 -0500
> > From: Scott Cheloha 
> > 
> > On Sat, Jul 23, 2022 at 08:14:32PM -0500, Scott Cheloha wrote:
> > > 
> > > [...]
> > > 
> > > I don't have a powerpc64 machine, so this is untested.  [...]
> > 
> > gkoehler@ has pointed out two dumb typos in the prior patch.  My bad.
> > 
> > Here is a corrected patch that, according to gkoehler@, actually
> > compiles.
> 
> Thanks.  I already figured that bit out myself.  Did some limited
> testing, but it seems to work correctly.  No noticable effect on the
> timekeeping even when building clang on all the (4) cores.

I wouldn't expect this patch to impact timekeeping.  All we're doing
is calling hardclock(9) a bit sooner than we normally would a few
times every second.

I would expect to see slightly more distinct interrupts (uvmexp.intrs)
per second because we aren't actively batching hardclock(9) and
statclock calls.

... by the way, uvmexp.intrs should probably be incremented
atomically, no?

> Regarding the diff, I think it would be better to avoid changing
> trap.c.  That function is complicated enough and splitting the logic
> for this over three files makes it a bit harder to understand.  So you
> could have:
> 
> void
> decr_intr(struct trapframe *frame)
> {
>   struct cpu_info *ci = curcpu();
>   ...
>   int s;
> 
>   if (ci->ci_cpl >= IPL_CLOCK) {
>   ci->ci_dec_deferred = 1;
>   mtdec(UINT32_MAX >> 1); /* clear DEC exception */
>   return;
>   }
> 
>   ci->ci_dec_deferred = 0;
> 
>   ...
> }
> 
> That has the downside of course that it will be slightly less
> efficient if we're at IPL_CLOCK or above, but that really shouldn't
> happen often enough for it to matter.

Yep.  It's an extra function call, the overhead is small.

Updated patch below.

Index: include/cpu.h
===
RCS file: /cvs/src/sys/arch/powerpc64/include/cpu.h,v
retrieving revision 1.31
diff -u -p -r1.31 cpu.h
--- include/cpu.h   6 Jul 2021 09:34:07 -   1.31
+++ include/cpu.h   25 Jul 2022 23:43:47 -
@@ -74,9 +74,9 @@ struct cpu_info {
uint64_tci_lasttb;
uint64_tci_nexttimerevent;
uint64_tci_nextstatevent;
-   int ci_statspending;

volatile intci_cpl;
+   volatile intci_dec_deferred;
uint32_tci_ipending;
uint32_tci_idepth;
 #ifdef DIAGNOSTIC
Index: powerpc64/clock.c
===
RCS file: /cvs/src/sys/arch/powerpc64/powerpc64/clock.c,v
retrieving revision 1.3
diff -u -p -r1.3 clock.c
--- powerpc64/clock.c   23 Feb 2021 04:44:31 -  1.3
+++ powerpc64/clock.c   25 Jul 2022 23:43:47 -
@@ -98,6 +98,17 @@ decr_intr(struct trapframe *frame)
int s;
 
/*
+* If the clock interrupt is masked, postpone all work until
+* it is unmasked in splx(9).
+*/
+   if (ci->ci_cpl >= IPL_CLOCK) {
+   ci->ci_dec_deferred = 1;
+   mtdec(UINT32_MAX >> 1); /* clear DEC exception */
+   return;
+   }
+   ci->ci_dec_deferred = 0;
+
+   /*
 * Based on the actual time delay since the last decrementer reload,
 * we arrange for earlier interrupt next time.
 */
@@ -130,30 +141,23 @@ decr_intr(struct trapframe *frame)
mtdec(nextevent - tb);
mtdec(nextevent - mftb());
 
-   if (ci->ci_cpl >= IPL_CLOCK) {
-   ci->ci_statspending += nstats;
-   } else {
-   nstats += ci->ci_statspending;
-   ci->ci_statspending = 0;
-
-   s = splclock();
-   intr_enable();
-
-   /*
-* Do standard timer interrupt stuff.
-*/
-   while (ci->ci_lasttb < prevtb) {
-   ci->ci_lasttb += tick_increment;
-   clock_count.ec_count++;
-   hardclock((struct clockframe *)frame);
-   }
+   s = splclock();
+   intr_enable();
 
-   while (nstats-- > 0)
-   statclock((struct clockframe *)frame);
-
-   intr_disable();
-   splx(s);
+   /*
+* Do standard timer interrupt stuff.
+*/
+   while (ci->ci_lasttb < prevtb) {
+   ci->ci_lasttb += tick_increment;
+   clock_count.ec_count++;
+   hardclock((struct clockframe *)frame);
}
+
+   while (nstats-- > 0)
+   statclock((struct clockframe *)frame);
+
+   intr_d

Re: powerpc64: retrigger deferred DEC interrupts from splx(9)

2022-07-24 Thread Scott Cheloha
On Sat, Jul 23, 2022 at 08:14:32PM -0500, Scott Cheloha wrote:
> 
> [...]
> 
> I don't have a powerpc64 machine, so this is untested.  [...]

gkoehler@ has pointed out two dumb typos in the prior patch.  My bad.

Here is a corrected patch that, according to gkoehler@, actually
compiles.

Index: include/cpu.h
===
RCS file: /cvs/src/sys/arch/powerpc64/include/cpu.h,v
retrieving revision 1.31
diff -u -p -r1.31 cpu.h
--- include/cpu.h   6 Jul 2021 09:34:07 -   1.31
+++ include/cpu.h   25 Jul 2022 00:30:33 -
@@ -74,9 +74,9 @@ struct cpu_info {
uint64_tci_lasttb;
uint64_tci_nexttimerevent;
uint64_tci_nextstatevent;
-   int ci_statspending;

volatile intci_cpl;
+   volatile intci_dec_deferred;
uint32_tci_ipending;
uint32_tci_idepth;
 #ifdef DIAGNOSTIC
Index: powerpc64/clock.c
===
RCS file: /cvs/src/sys/arch/powerpc64/powerpc64/clock.c,v
retrieving revision 1.3
diff -u -p -r1.3 clock.c
--- powerpc64/clock.c   23 Feb 2021 04:44:31 -  1.3
+++ powerpc64/clock.c   25 Jul 2022 00:30:33 -
@@ -130,30 +130,23 @@ decr_intr(struct trapframe *frame)
mtdec(nextevent - tb);
mtdec(nextevent - mftb());
 
-   if (ci->ci_cpl >= IPL_CLOCK) {
-   ci->ci_statspending += nstats;
-   } else {
-   nstats += ci->ci_statspending;
-   ci->ci_statspending = 0;
+   s = splclock();
+   intr_enable();
 
-   s = splclock();
-   intr_enable();
-
-   /*
-* Do standard timer interrupt stuff.
-*/
-   while (ci->ci_lasttb < prevtb) {
-   ci->ci_lasttb += tick_increment;
-   clock_count.ec_count++;
-   hardclock((struct clockframe *)frame);
-   }
+   /*
+* Do standard timer interrupt stuff.
+*/
+   while (ci->ci_lasttb < prevtb) {
+   ci->ci_lasttb += tick_increment;
+   clock_count.ec_count++;
+   hardclock((struct clockframe *)frame);
+   }
 
-   while (nstats-- > 0)
-   statclock((struct clockframe *)frame);
+   while (nstats-- > 0)
+   statclock((struct clockframe *)frame);
 
-   intr_disable();
-   splx(s);
-   }
+   intr_disable();
+   splx(s);
 }
 
 void
Index: powerpc64/intr.c
===
RCS file: /cvs/src/sys/arch/powerpc64/powerpc64/intr.c,v
retrieving revision 1.9
diff -u -p -r1.9 intr.c
--- powerpc64/intr.c26 Sep 2020 17:56:54 -  1.9
+++ powerpc64/intr.c25 Jul 2022 00:30:33 -
@@ -139,6 +139,11 @@ splx(int new)
 {
struct cpu_info *ci = curcpu();
 
+   if (ci->ci_dec_deferred && new < IPL_CLOCK) {
+   mtdec(0);
+   mtdec(UINT32_MAX);  /* raise DEC exception */
+   }
+
if (ci->ci_ipending & intr_smask[new])
intr_do_pending(new);
 
Index: powerpc64/trap.c
===
RCS file: /cvs/src/sys/arch/powerpc64/powerpc64/trap.c,v
retrieving revision 1.51
diff -u -p -r1.51 trap.c
--- powerpc64/trap.c11 May 2021 18:21:12 -  1.51
+++ powerpc64/trap.c25 Jul 2022 00:30:33 -
@@ -65,9 +65,15 @@ trap(struct trapframe *frame)
switch (type) {
case EXC_DECR:
uvmexp.intrs++;
-   ci->ci_idepth++;
-   decr_intr(frame);
-   ci->ci_idepth--;
+   if (ci->ci_cpl < IPL_CLOCK) {
+   ci->ci_dec_deferred = 0;
+   ci->ci_idepth++;
+   decr_intr(frame);
+   ci->ci_idepth--;
+   } else {
+   ci->ci_dec_deferred = 1;
+   mtdec(UINT32_MAX >> 1); /* clear DEC exception */
+   }
return;
case EXC_EXI:
uvmexp.intrs++;



powerpc64: retrigger deferred DEC interrupts from splx(9)

2022-07-23 Thread Scott Cheloha
Okay, we did this for powerpc/macppc, on to powerpc64.

It's roughly the same problem as before:

- On powerpc64 we need to leave the DEC unmasked at or above
  IPL_CLOCK.

- Currently we defer clock interrupt work to the next tick if a DEC
  interrupt arrives when the CPU's priority level is at or above
  IPL_CLOCK.

- This is a problem because the MD code needs to know about
  when the next clock interrupt event is scheduled and I intend
  to make that information machine-independent and handle it
  in machine-independent code in the future.

- This patch instead defers clock interrupt work to the next splx(9)
  call where the CPU's priority level is dropping below IPL_CLOCK.
  This requires no knowledge of when the next clock interrupt
  event is scheduled.

The code is almost identical to what we did for powerpc/macppc,
except that:

- We can do the ci_dec_deferred handling in trap(), which is a
  bit cleaner.

- There is only one splx() function that needs modifying.

Unless I'm missing something, we no longer need the struct member
cpu_info.ci_statspending.

I don't have a powerpc64 machine, so this is untested.  I would
appreciate tests and review.  If you're copied on this, I'm under the
impression you have a powerpc64 machine or know someone who might.

Thoughts?  Test results?

I'm really sorry if this doesn't work out of the box and your machine
hangs.

Index: include/cpu.h
===
RCS file: /cvs/src/sys/arch/powerpc64/include/cpu.h,v
retrieving revision 1.31
diff -u -p -r1.31 cpu.h
--- include/cpu.h   6 Jul 2021 09:34:07 -   1.31
+++ include/cpu.h   24 Jul 2022 01:08:22 -
@@ -74,9 +74,9 @@ struct cpu_info {
uint64_tci_lasttb;
uint64_tci_nexttimerevent;
uint64_tci_nextstatevent;
-   int ci_statspending;

volatile intci_cpl;
+   volatile intci_dec_deferred;
uint32_tci_ipending;
uint32_tci_idepth;
 #ifdef DIAGNOSTIC
Index: powerpc64/clock.c
===
RCS file: /cvs/src/sys/arch/powerpc64/powerpc64/clock.c,v
retrieving revision 1.3
diff -u -p -r1.3 clock.c
--- powerpc64/clock.c   23 Feb 2021 04:44:31 -  1.3
+++ powerpc64/clock.c   24 Jul 2022 01:08:22 -
@@ -130,30 +130,23 @@ decr_intr(struct trapframe *frame)
mtdec(nextevent - tb);
mtdec(nextevent - mftb());
 
-   if (ci->ci_cpl >= IPL_CLOCK) {
-   ci->ci_statspending += nstats;
-   } else {
-   nstats += ci->ci_statspending;
-   ci->ci_statspending = 0;
+   s = splclock();
+   intr_enable();
 
-   s = splclock();
-   intr_enable();
-
-   /*
-* Do standard timer interrupt stuff.
-*/
-   while (ci->ci_lasttb < prevtb) {
-   ci->ci_lasttb += tick_increment;
-   clock_count.ec_count++;
-   hardclock((struct clockframe *)frame);
-   }
+   /*
+* Do standard timer interrupt stuff.
+*/
+   while (ci->ci_lasttb < prevtb) {
+   ci->ci_lasttb += tick_increment;
+   clock_count.ec_count++;
+   hardclock((struct clockframe *)frame);
+   }
 
-   while (nstats-- > 0)
-   statclock((struct clockframe *)frame);
+   while (nstats-- > 0)
+   statclock((struct clockframe *)frame);
 
-   intr_disable();
-   splx(s);
-   }
+   intr_disable();
+   splx(s);
 }
 
 void
Index: powerpc64/intr.c
===
RCS file: /cvs/src/sys/arch/powerpc64/powerpc64/intr.c,v
retrieving revision 1.9
diff -u -p -r1.9 intr.c
--- powerpc64/intr.c26 Sep 2020 17:56:54 -  1.9
+++ powerpc64/intr.c24 Jul 2022 01:08:22 -
@@ -139,6 +139,11 @@ splx(int new)
 {
struct cpu_info *ci = curcpu();
 
+   if (ci->ci_dec_deferred && new < IPL_CLOCK) {
+   mtdec(0);
+   mtdec(UINT32_MAX);  /* raise DEC exception */
+   }
+
if (ci->ci_ipending & intr_smask[new])
intr_do_pending(new);
 
Index: powerpc64/trap.c
===
RCS file: /cvs/src/sys/arch/powerpc64/powerpc64/trap.c,v
retrieving revision 1.51
diff -u -p -r1.51 trap.c
--- powerpc64/trap.c11 May 2021 18:21:12 -  1.51
+++ powerpc64/trap.c24 Jul 2022 01:08:22 -
@@ -65,9 +65,15 @@ trap(struct trapframe *frame)
switch (type) {
case EXC_DECR:
uvmexp.intrs++;
-   ci->ci_idepth++;
-   decr_intr(frame);
-   ci->ci_idepth--;
+   if (ci->ci_cpl < IPL_CLOCK) {
+   ci->ci_decr_deferred = 0;
+   

[v2] timeout.9: rewrite

2022-07-22 Thread Scott Cheloha
Hi,

As promised, here is the timeout.9 manpage rewrite I've been sitting
on.  I am pretty sure jmc@ (and maybe schwarze@) read an earlier
version of this.  It has drifted a bit since then, but not much.

My main goal here is to make all the "gotchas" in the timeout API more
explicit.  The API is large, so the manpage is necessarily longer than
the average manpage.

We're also stuck in the midst of an API transition, so there is some
overlap in the API coverage.  Hopefully most of that redundancy can be
consolidated in the future after I finish the clock interrupt work.

-Scott

Index: share/man/man9/timeout.9
===
RCS file: /cvs/src/share/man/man9/timeout.9,v
retrieving revision 1.55
diff -u -p -r1.55 timeout.9
--- share/man/man9/timeout.922 Jun 2022 14:10:49 -  1.55
+++ share/man/man9/timeout.922 Jul 2022 18:34:14 -
@@ -1,6 +1,7 @@
 .\"$OpenBSD: timeout.9,v 1.55 2022/06/22 14:10:49 visa Exp $
 .\"
 .\" Copyright (c) 2000 Artur Grabowski 
+.\" Copyright (c) 2021, 2022 Scott Cheloha 
 .\" All rights reserved.
 .\"
 .\" Redistribution and use in source and binary forms, with or without
@@ -36,6 +37,8 @@
 .Nm timeout_add_nsec ,
 .Nm timeout_add_usec ,
 .Nm timeout_add_tv ,
+.Nm timeout_rel_nsec ,
+.Nm timeout_abs_ts ,
 .Nm timeout_del ,
 .Nm timeout_del_barrier ,
 .Nm timeout_barrier ,
@@ -44,281 +47,375 @@
 .Nm timeout_triggered ,
 .Nm TIMEOUT_INITIALIZER ,
 .Nm TIMEOUT_INITIALIZER_FLAGS
-.Nd execute a function after a specified period of time
+.Nd execute a function in the future
 .Sh SYNOPSIS
 .In sys/types.h
 .In sys/timeout.h
 .Ft void
-.Fn timeout_set "struct timeout *to" "void (*fn)(void *)" "void *arg"
+.Fo timeout_set
+.Fa "struct timeout *to"
+.Fa "void (*fn)(void *)"
+.Fa "void *arg"
+.Fc
 .Ft void
 .Fo timeout_set_flags
 .Fa "struct timeout *to"
 .Fa "void (*fn)(void *)"
 .Fa "void *arg"
+.Fa "int kclock"
 .Fa "int flags"
 .Fc
 .Ft void
-.Fn timeout_set_proc "struct timeout *to" "void (*fn)(void *)" "void *arg"
+.Fo timeout_set_proc
+.Fa "struct timeout *to"
+.Fa "void (*fn)(void *)"
+.Fa "void *arg"
+.Fc
 .Ft int
-.Fn timeout_add "struct timeout *to" "int ticks"
+.Fo timeout_add
+.Fa "struct timeout *to"
+.Fa "int nticks"
+.Fc
 .Ft int
-.Fn timeout_del "struct timeout *to"
+.Fo timeout_add_sec
+.Fa "struct timeout *to"
+.Fa "int secs"
+.Fc
 .Ft int
-.Fn timeout_del_barrier "struct timeout *to"
-.Ft void
-.Fn timeout_barrier "struct timeout *to"
+.Fo timeout_add_msec
+.Fa "struct timeout *to"
+.Fa "int msecs"
+.Fc
 .Ft int
-.Fn timeout_pending "struct timeout *to"
+.Fo timeout_add_usec
+.Fa "struct timeout *to"
+.Fa "int usecs"
+.Fc
 .Ft int
-.Fn timeout_initialized "struct timeout *to"
+.Fo timeout_add_nsec
+.Fa "struct timeout *to"
+.Fa "int nsecs"
+.Fc
 .Ft int
-.Fn timeout_triggered "struct timeout *to"
+.Fo timeout_add_tv
+.Fa "struct timeout *to"
+.Fa "struct timeval *tv"
+.Fc
 .Ft int
-.Fn timeout_add_tv "struct timeout *to" "struct timeval *"
+.Fo timeout_rel_nsec
+.Fa "struct timeout *to"
+.Fa "uint64_t nsecs"
+.Fc
 .Ft int
-.Fn timeout_add_sec "struct timeout *to" "int sec"
+.Fo timeout_abs_ts
+.Fa "struct timeout *to"
+.Fa "const struct timespec *abs"
+.Fc
 .Ft int
-.Fn timeout_add_msec "struct timeout *to" "int msec"
+.Fo timeout_del
+.Fa "struct timeout *to"
+.Fc
+.Ft int
+.Fo timeout_del_barrier
+.Fa "struct timeout *to"
+.Fc
+.Ft void
+.Fo timeout_barrier
+.Fa "struct timeout *to"
+.Fc
+.Ft int
+.Fo timeout_pending
+.Fa "struct timeout *to"
+.Fc
 .Ft int
-.Fn timeout_add_usec "struct timeout *to" "int usec"
+.Fo timeout_initialized
+.Fa "struct timeout *to"
+.Fc
 .Ft int
-.Fn timeout_add_nsec "struct timeout *to" "int nsec"
-.Fn TIMEOUT_INITIALIZER "void (*fn)(void *)" "void *arg"
-.Fn TIMEOUT_INITIALIZER_FLAGS "void (*fn)(void *)" "void *arg" "int flags"
+.Fo timeout_triggered
+.Fa "struct timeout *to"
+.Fc
+.Fo TIMEOUT_INITIALIZER
+.Fa "void (*fn)(void *)"
+.Fa "void *arg"
+.Fc
+.Fo TIMEOUT_INITIALIZER_FLAGS
+.Fa "void (*fn)(void *)"
+.Fa "void *arg"
+.Fa "int kclock"
+.Fa "int flags"
+.Fc
 .Sh DESCRIPTION
 The
 .Nm timeout
-API provides a mechanism to execute a function at a given time.
-The granularity of the time is limited by the granularity of the
-.Xr hardclock 9
-timer which

Re: timeout.9: fix description

2022-07-22 Thread Scott Cheloha
> On Jul 22, 2022, at 05:50, Klemens Nanni  wrote:
> 
> NAME has it right:
>... – execute a function after a specified period of time
> 
> but DESCRIPTION says something else:
>The timeout API provides a mechanism to execute a function
>at a given time.
> 
> The latter reads as if I could pass a specific point in time, e.g.
> Fri Jul 22 16:00:00 UTC 2022, at which a function should be run.
> 
> But this API is about timeouts, i.e. a duration of time like 10s,
> which does not match my understanding of "at a given time".
> 
> 
> So reuse NAME's wording and make it a proper sentence.
> 
> Feedback? OK?

I rewrote this page a year or so ago but I
think I dropped the patch due to lack of
developer input.  If you give me 12 hours I
will send it out for your consideration.

I would send it sooner, but I accidentally cut
the fiber-optic cable with a shovel last night,
so we all need to wait until this afternoon for
AT to replace it.



[v4] amd64: simplify TSC sync testing

2022-07-20 Thread Scott Cheloha
Hi,

Thanks to everyone who tested v3.

Attached is v4.  I would like to put this into snaps (bcc: deraadt@).

If you've been following along and testing these patches, feel free to
continue testing.  If your results change from v3 to v4, please reply
with what happened and your dmesg.

I made a few small changes from v3:

- Only run the sync test after failing it on TSC_DEBUG kernels.
  For example, it would be a waste of time to run the sync test
  for 62 other CPU pairs if the CPU0/CPU1 sync test failed.

- Pad the tsc_test_status struct by hand.  Try to keep
  tsc_test_status.val onto its own cache line and try to prevent one
  instance of the struct from sharing a cache line with another
  instance.

I am looking for OKs.

Assuming the results from snaps testing aren't catastrophic, and this
version is OK'd, I hope to commit this after a couple weeks in snaps.

There are two things I'm unsure about that I hope a reviewer will
comment on:

- Do we need to keep the double-test?  IIUC the purpose of the
  double-test is to check for drift.  But with this change we no
  longer have a concept of drift.

- Is the LFENCE in tsc_test_ap()/tst_test_bp() sufficient
  to ensure one TSC value predates the other?  Or do I need
  to insert membar_consumer()/membar_producer() calls to
  provide that guarantee?

-Scott

Index: sys/arch/amd64/amd64/tsc.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.24
diff -u -p -r1.24 tsc.c
--- sys/arch/amd64/amd64/tsc.c  31 Aug 2021 15:11:54 -  1.24
+++ sys/arch/amd64/amd64/tsc.c  20 Jul 2022 21:58:40 -
@@ -36,13 +36,6 @@ int  tsc_recalibrate;
 uint64_t   tsc_frequency;
 inttsc_is_invariant;
 
-#defineTSC_DRIFT_MAX   250
-#define TSC_SKEW_MAX   100
-int64_ttsc_drift_observed;
-
-volatile int64_t   tsc_sync_val;
-volatile struct cpu_info   *tsc_sync_cpu;
-
 u_int  tsc_get_timecount(struct timecounter *tc);
 void   tsc_delay(int usecs);
 
@@ -236,22 +229,12 @@ cpu_recalibrate_tsc(struct timecounter *
 u_int
 tsc_get_timecount(struct timecounter *tc)
 {
-   return rdtsc_lfence() + curcpu()->ci_tsc_skew;
+   return rdtsc_lfence();
 }
 
 void
 tsc_timecounter_init(struct cpu_info *ci, uint64_t cpufreq)
 {
-#ifdef TSC_DEBUG
-   printf("%s: TSC skew=%lld observed drift=%lld\n", ci->ci_dev->dv_xname,
-   (long long)ci->ci_tsc_skew, (long long)tsc_drift_observed);
-#endif
-   if (ci->ci_tsc_skew < -TSC_SKEW_MAX || ci->ci_tsc_skew > TSC_SKEW_MAX) {
-   printf("%s: disabling user TSC (skew=%lld)\n",
-   ci->ci_dev->dv_xname, (long long)ci->ci_tsc_skew);
-   tsc_timecounter.tc_user = 0;
-   }
-
if (!(ci->ci_flags & CPUF_PRIMARY) ||
!(ci->ci_flags & CPUF_CONST_TSC) ||
!(ci->ci_flags & CPUF_INVAR_TSC))
@@ -268,111 +251,276 @@ tsc_timecounter_init(struct cpu_info *ci
calibrate_tsc_freq();
}
 
-   if (tsc_drift_observed > TSC_DRIFT_MAX) {
-   printf("ERROR: %lld cycle TSC drift observed\n",
-   (long long)tsc_drift_observed);
-   tsc_timecounter.tc_quality = -1000;
-   tsc_timecounter.tc_user = 0;
-   tsc_is_invariant = 0;
-   }
-
	tc_init(&tsc_timecounter);
 }
 
-/*
- * Record drift (in clock cycles).  Called during AP startup.
- */
 void
-tsc_sync_drift(int64_t drift)
+tsc_delay(int usecs)
 {
-   if (drift < 0)
-   drift = -drift;
-   if (drift > tsc_drift_observed)
-   tsc_drift_observed = drift;
+   uint64_t interval, start;
+
+   interval = (uint64_t)usecs * tsc_frequency / 100;
+   start = rdtsc_lfence();
+   while (rdtsc_lfence() - start < interval)
+   CPU_BUSY_CYCLE();
 }
 
+#ifdef MULTIPROCESSOR
+
+#define TSC_DEBUG 1
+
+/*
+ * Protections for global variables in this code:
+ *
+ * a   Modified atomically
+ * b   Protected by a barrier
+ * p   Only modified by the primary CPU
+ */
+
+#define TSC_TEST_MS		1	/* Test round duration */
+#define TSC_TEST_ROUNDS		2	/* Number of test rounds */
+
 /*
- * Called during startup of APs, by the boot processor.  Interrupts
- * are disabled on entry.
+ * tsc_test_status.val is cacheline-aligned (64-byte) to limit
+ * false sharing during the test and reduce our margin of error.
  */
+struct tsc_test_status {
+   volatile uint64_t val;  /* [b] latest RDTSC value */
+   uint64_t pad1[7];
+   uint64_t lag_count; /* [b] number of lags seen by CPU */
+   uint64_t lag_max;   /* [b] Biggest lag seen */
+   int64_t adj;/* [b] initial IA32_TSC_ADJUST value */
+   uint64_t pad2[5];
+} __aligned(64);
+struct tsc_test_status tsc_ap_status;  /* [b] Test results from AP */

Re: [v3] amd64: simplify TSC sync testing

2022-07-20 Thread Scott Cheloha
> On Jul 20, 2022, at 01:48, Masato Asou  wrote:
> 
> Sorry, my latest reply.
> 
> I tested your patch on my Proxmox Virtual Environment on Ryzen7 box.
> It works fine for me.

This VM doesn't have the ITSC CPU flag,
how is it using the TSC as a timecounter?

> OpenBSD 7.1-current (GENERIC.MP) #1: Wed Jul 20 14:15:23 JST 2022
>a...@pve-obsd.my.domain:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 17162952704 (16367MB)
> avail mem = 16625430528 (15855MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf59c0 (10 entries)
> bios0: vendor SeaBIOS version "rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org" 
> date 04/01/2014
> bios0: QEMU Standard PC (i440FX + PIIX, 1996)
> acpi0 at bios0: ACPI 1.0
> acpi0: sleep states S3 S4 S5
> acpi0: tables DSDT FACP APIC SSDT HPET WAET
> acpi0: wakeup devices
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Common KVM processor, 3593.56 MHz, 0f-06-01
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,CX16,x2APIC,HV,NXE,LONG,LAHF,CMPLEG

Here, no "ITSC".


ts(1): parse input format string only once

2022-07-12 Thread Scott Cheloha
We reduce overhead if we only parse the user's format string once.  To
achieve that, this patch does the following:

- Move "format" into main().  We don't need it as a global anymore.

- Move buffer allocation into fmtfmt().  As claudio@ mentioned in a
  different thread, we need at most (3 * strlen(format) + 1) bytes for
  buf, the parsed format string.  I have added a comment explaining the
  allocation.  I also left an assert(3) to confirm my math.  Unsure
  whether or not to leave the assert(3) in... we only run the assert
  once, so it isn't very costly.

- In fmtfmt(), preallocate a flat 512 bytes for outbuf.  We aren't using
  the 10x allocation for buf anymore, so keeping it for outbuf seems
  arbitrary. If we're going to use a magic number I figure it may as
  well be large enough for practical timestamps and a power of two.
  Feel free to suggest something else.

- Call fmtfmt() where we used to do buffer allocation in main().

- When parsing the user format string in fmtfmt(), keep a list of
  where each microsecond substring lands in buf.  We'll need it later.

- Move the printing part of fmtfmt() into a new function, fmtprint().
  fmtprint() is now called from the main loop instead of fmtfmt().

- In fmtprint(), before calling strftime(3), update any microsecond
  substrings in buf using the list we built earlier in fmtfmt().  Note
  that if there aren't any such substrings we don't call snprintf(3)
  at all.

--

Okay, on to the numbers.  My benchmark input is a million newlines:

$ yes '' | head -n 1000000 > newline-1M.txt

The benchmark is "real time taken to timestamp the input."

Patched ts(1) is about 45% faster using the empty format string.
N=100.

x ts-head.dat1
+ ts-patch.dat1
N Min Max  Median AvgStddev
x 100   1.7420306   1.820921   1.7468652   1.7504513   0.010192689
+ 100  0.96225744  0.98482864  0.96404194  0.96658115  0.0052265094
Difference at 99.5% confidence
-0.78387 +/- 0.00353946
-44.781% +/- 0.202203%
(Student's t, pooled s = 0.00809961)

Patched ts(1) is about 25% faster using the default format string,
i.e. '%b %d %H:%M:%S'.  N=100.

x ts-head.dat2
+ ts-patch.dat2
NMinMax MedianAvg   Stddev
x 100  4.7128656  4.9162049  4.7212578  4.7313946  0.026306241
+ 100  3.5083849  3.7382005  3.5126801  3.5271755   0.03256854
Difference at 99.5% confidence
-1.20422 +/- 0.0129365
-25.4517% +/- 0.273418%
(Student's t, pooled s = 0.0296034)

Patched ts(1) is about 10% faster using the format string '%FT%.TZ'.
This format is similar to the ISO 8601 timestamp format but with added
microsecond granularity.  N=100.

x ts-head.dat4
+ ts-patch.dat4
NMinMax MedianAvg   Stddev
x 100  6.5432762  7.0483151  6.5909038  6.6034806  0.065535466
+ 100  5.9177588  6.5244303  5.9288786  5.9535684  0.074405632
Difference at 99.5% confidence
-0.649912 +/- 0.0306379
-9.84196% +/- 0.463966%
(Student's t, pooled s = 0.070111)

All differences are statistically significant at a 99.5 CI.

--

Thoughts?  Tweaks?  ok?

Index: ts.c
===
RCS file: /cvs/src/usr.bin/ts/ts.c,v
retrieving revision 1.8
diff -u -p -r1.8 ts.c
--- ts.c7 Jul 2022 10:40:25 -   1.8
+++ ts.c13 Jul 2022 05:41:55 -
@@ -17,8 +17,10 @@
  */
 
 #include 
+#include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -27,18 +29,25 @@
 #include 
 #include 
 
-static char*format = "%b %d %H:%M:%S";
+SIMPLEQ_HEAD(, usec) usec_queue = SIMPLEQ_HEAD_INITIALIZER(usec_queue);
+struct usec {
+   SIMPLEQ_ENTRY(usec) next;
+   char *pos;
+};
+
 static char*buf;
 static char*outbuf;
 static size_t   bufsize;
 static size_t   obsize;
 
-static void fmtfmt(const struct timespec *);
+static void fmtfmt(const char *);
+static void fmtprint(const struct timespec *);
 static void __dead  usage(void);
 
 int
 main(int argc, char *argv[])
 {
+   char *format = "%b %d %H:%M:%S";
int iflag, mflag, sflag;
int ch, prev;
struct timespec start, now, utc_offset, ts;
@@ -75,18 +84,7 @@ main(int argc, char *argv[])
if ((iflag && sflag) || argc > 1)
usage();
 
-   if (argc == 1)
-   format = *argv;
-
-   bufsize = strlen(format) + 1;
-   if (bufsize > SIZE_MAX / 10)
-   errx(1, "format string too big");
-   bufsize *= 10;
-   obsize = bufsize;
-   if ((buf = calloc(1, bufsize)) == NULL)
-   err(1, NULL);
-   if ((outbuf = calloc(1, obsize)) == NULL)
-   err(1, NULL);
+   fmtfmt(argc == 1 ? *argv : format);
 
/* force UTC for interval calculations */
if (iflag || sflag)
@@ -106,7 +104,7 @@ main(int argc, char *argv[])

Re: echo(1): check for stdio errors

2022-07-11 Thread Scott Cheloha
On Mon, Jul 11, 2022 at 08:31:04AM -0600, Todd C. Miller wrote:
> On Sun, 10 Jul 2022 20:58:35 -0900, Philip Guenther wrote:
> 
> > Three thoughts:
> > 1) Since stdio errors are sticky, is there any real advantage to checking
> > each call instead of just checking the final fclose()?

My thinking was that we have no idea how many arguments we're going to
print, so we may as well fail as soon as possible.

Maybe in more complex programs there would be a code-length or
complexity-reducing upside to deferring the ferror(3) check until,
say, the end of a subroutine or something.

> > [...]
> 
> Will that really catch all errors?  From what I can tell, fclose(3)
> can succeed even if the error flag was set.  The pattern I prefer
> is to use a final fflush(3) followed by a call to ferror(3) before
> the fclose(3).

That's weird, I was under the impression POSIX mandated an error case
for the implicit fflush(3) done by fclose(3).  But I'm looking at the
standard and seeing nothing specific.

So, yes?  It is probably more portable to check fflush(3) explicitly?

This feels redundant though.  Like, obviously I want to flush the
descriptor when we close the stream, and obviously I would want to
know if the flush failed.  That's why I'm using stdio.

Index: echo.c
===
RCS file: /cvs/src/bin/echo/echo.c,v
retrieving revision 1.10
diff -u -p -r1.10 echo.c
--- echo.c  9 Oct 2015 01:37:06 -   1.10
+++ echo.c  11 Jul 2022 18:19:39 -
@@ -53,12 +53,15 @@ main(int argc, char *argv[])
nflag = 0;
 
while (*argv) {
-   (void)fputs(*argv, stdout);
-   if (*++argv)
-   putchar(' ');
+   if (fputs(*argv, stdout) == EOF)
+   err(1, "stdout");
+   if (*++argv && putchar(' ') == EOF)
+   err(1, "stdout");
}
-   if (!nflag)
-   putchar('\n');
+   if (!nflag && putchar('\n') == EOF)
+   err(1, "stdout");
+   if (fflush(stdout) == EOF || fclose(stdout) == EOF)
+   err(1, "stdout");
 
return 0;
 }



echo(1): check for stdio errors

2022-07-10 Thread Scott Cheloha
ok?

Index: echo.c
===
RCS file: /cvs/src/bin/echo/echo.c,v
retrieving revision 1.10
diff -u -p -r1.10 echo.c
--- echo.c  9 Oct 2015 01:37:06 -   1.10
+++ echo.c  10 Jul 2022 22:00:18 -
@@ -53,12 +53,15 @@ main(int argc, char *argv[])
nflag = 0;
 
while (*argv) {
-   (void)fputs(*argv, stdout);
-   if (*++argv)
-   putchar(' ');
+   if (fputs(*argv, stdout) == EOF)
+   err(1, "stdout");
+   if (*++argv && putchar(' ') == EOF)
+   err(1, "stdout");
}
if (!nflag)
putchar('\n');
+   if (fclose(stdout) == EOF)
+   err(1, "stdout");
 
return 0;
 }



Re: [v3] amd64: simplify TSC sync testing

2022-07-06 Thread Scott Cheloha
; pvbus0 at mainbus0: bhyve
> pci0 at mainbus0 bus 0
> pchb0 at pci0 dev 0 function 0 unknown vendor 0x1275 product 0x1275 rev 0x00
> pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00
> virtio0 at pci0 dev 2 function 0 "Qumranet Virtio Storage" rev 0x00
> vioblk0 at virtio0
> scsibus1 at vioblk0: 1 targets
> sd0 at scsibus1 targ 0 lun 0: 
> sd0: 20480MB, 512 bytes/sector, 41943040 sectors
> virtio0: msix shared
> ahci0 at pci0 dev 3 function 0 "Intel 82801H AHCI" rev 0x00: msi, AHCI 1.3
> ahci0: port 0: 6.0Gb/s
> scsibus2 at ahci0: 32 targets
> cd0 at scsibus2 targ 0 lun 0:  removable
> virtio1 at pci0 dev 4 function 0 "Qumranet Virtio Network" rev 0x00
> vio0 at virtio1: address 00:a0:98:db:89:86
> virtio1: msix shared
> isa0 at pcib0
> isadma0 at isa0
> pckbc0 at isa0 port 0x60/5 irq 1 irq 12
> pckbd0 at pckbc0 (kbd slot)
> wskbd0 at pckbd0 mux 1
> pms0 at pckbc0 (aux slot)
> wsmouse0 at pms0 mux 0
> /dev/ksyms: Symbol table not valid.
> vscsi0 at root
> scsibus3 at vscsi0: 256 targets
> softraid0 at root
> scsibus4 at softraid0: 256 targets
> root on sd0a (31879798ea82ad23.a) swap on sd0b dump on sd0b
> 
> $ sysctl kern.timecounter
> kern.timecounter.tick=1
> kern.timecounter.timestepwarnings=0
> kern.timecounter.hardware=tsc
> kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000) acpitimer0(1000)
> 
> 
> On 7/5/22 11:06, Scott Cheloha wrote:
> > Hi,
> > 
> > Once again, I am trying to change our approach to TSC sync testing to
> > eliminate false positive results.  Instead of trying to repair the TSC
> > by measuring skew, we just spin in a lockless loop looking for skew
> > and mark the TSC as broken if we detect any.
> > 
> > This is motivated in part by some multisocket machines that do not use
> > the TSC as a timecounter because the current sync test confuses NUMA
> > latency for TSC skew.
> > 
> > The only difference between this version and the prior version (v2) is
> > that we check whether we have the IA32_TSC_ADJUST register by hand in
> > tsc_reset_adjust().  If someone wants to help me rearrange cpu_hatch()
> > so we do CPU identification before TSC sync testing we can remove the
> > workaround later.
> > 
> > If you have the IA32_TSC_ADJUST register and it is non-zero going into
> > the test, you will see something on the console like this:
> > 
> > tsc: cpu5: IA32_TSC_ADJUST: -150 -> 0
> > 
> > This does *not* mean you failed the test.  It just means you probably
> > have a bug in your BIOS (or some other firmware) and you should report
> > it to your vendor.
> > 
> > If you fail the test you will see something like this:
> > 
> > tsc: cpu0/cpu2: sync test round 1/2 failed
> > tsc: cpu0/cpu2: cpu2: 13043 lags 438 cycles
> > 
> > A printout like this would mean that the sync test for cpu2 failed.
> > In particular, cpu2's TSC trails cpu0's TSC by at least 438 cycles.
> > If this happens for *any* CPU we mark the TSC timecounter as
> > defective.
> > 
> > --
> > 
> > Please test!  Send your dmesg, pass or fail.
> > 
> > I am especially interested in:
> > 
> > 1. A test from dv@.  Your dual-socket machine has the IA32_TSC_ADJUST
> > register but it failed the test running patch v2.  Maybe it will pass
> > with this version?
> > 
> > 2. Other multisocket machines.
> > 
> > 3. There were reports of TSC issues with OpenBSD VMs running on ESXi.
> > What happens when you run with this patch?
> > 
> > 4. OpenBSD VMs on other hypervisors.
> > 
> > 5. Non-Lenovo machines, non-Intel machines.
> > 
> > -Scott
> > 
> > Index: amd64/tsc.c
> > ===
> > RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
> > retrieving revision 1.24
> > diff -u -p -r1.24 tsc.c
> > --- amd64/tsc.c 31 Aug 2021 15:11:54 -  1.24
> > +++ amd64/tsc.c 5 Jul 2022 01:52:10 -
> > @@ -36,13 +36,6 @@ int  tsc_recalibrate;
> >   uint64_t  tsc_frequency;
> >   int   tsc_is_invariant;
> > -#defineTSC_DRIFT_MAX   250
> > -#define TSC_SKEW_MAX   100
> > -int64_ttsc_drift_observed;
> > -
> > -volatile int64_t   tsc_sync_val;
> > -volatile struct cpu_info   *tsc_sync_cpu;
> > -
> >   u_int tsc_get_timecount(struct timecounter *tc);
> >   void  tsc_delay(int usecs);
> > @@ -236,22 +229,12 @@ cpu_recalibrate_tsc(struct timec

Re: [v3] amd64: simplify TSC sync testing

2022-07-06 Thread Scott Cheloha
On Wed, Jul 06, 2022 at 01:58:51PM -0700, Mike Larkin wrote:
> On Wed, Jul 06, 2022 at 11:48:41AM -0500, Scott Cheloha wrote:
> > > On Jul 6, 2022, at 11:36 AM, Mike Larkin  wrote:
> > >
> > > On Tue, Jul 05, 2022 at 07:16:26PM -0500, Scott Cheloha wrote:
> > >> On Tue, Jul 05, 2022 at 01:38:32PM -0700, Mike Larkin wrote:
> > >>> On Mon, Jul 04, 2022 at 09:06:55PM -0500, Scott Cheloha wrote:
> > >>>>
> > >>>> [...]
> > >>>
> > >>> Here's the output from a 4 socket 80 thread machine.
> > >>
> > >> Oh nice.  I think this is the biggest machine we've tried so far.
> > >>
> > >>> kern.timecounter reports tsc after boot.
> > >>
> > >> Excellent.
> > >>
> > >>> Looks like this machine doesn't have the adjust MSR?
> > >>
> > >> IA32_TSC_ADJUST first appears in the Intel SDM Vol. 3 some time in
> > >> 2011 or 2012.  I can't find the exact revision.
> > >>
> > >> (I really wish there was a comprehensive version history for this sort
> > >> of thing, i.e. this MSR first appeared in the blah-blah uarch, this
> > >> instruction is available on all uarchs after yada-yada, etc.)
> > >>
> > >> There are apparently several versions of the E7-4870 in the E7
> > >> "family".  If your CPU predates that, or launched 2012-2014, the MSR
> > >> may not have made the cut.
> > >>
> > >> An aside: I cannot find any evidence of AMD supporting this MSR in any
> > >> processor.  It would be really, really nice if they did.  If you (or
> > >> anyone reading) knows anything about this, or whether they have an
> > >> equivalent MSR, shout it out.
> > >>
> > >>> Other than that, machine seems stable.
> > >>
> > >> Good, glad to hear it.  Thank you for testing.
> > >>
> > >> Has this machine had issues using the TSC on -current in the past?
> > >>
> > >> (If you have the time) what does the dmesg look like on the -current
> > >> kernel with TSC_DEBUG enabled?
> > >
> > > Looks like you enabled TSC_DEBUG in your diff, so what I sent you is what 
> > > you
> > > are asking for...?
> >
> > No, I mean on the -current *unpatched* kernel.  Sorry if that wasn't
> > clear.
> >
> > Our -current kernel prints more detailed information if TSC_DEBUG
> > is enabled.  In particular, I'm curious if the unpatched kernel
> > detects any skew or drift on your machine, and if so, how much.
> >
> 
> here you go. I didn't run with all 80 cpus since -current doesn't have my
> " > 64 cpus" diff, but I think this is what you're after in any case.

Yes!  This is what I was looking for, thanks.

> cpu0: TSC skew=0 observed drift=0
> cpu1: TSC skew=112 observed drift=0
> cpu2: TSC skew=102 observed drift=0
> cpu3: TSC skew=-134 observed drift=0
> cpu4: TSC skew=4 observed drift=0
> cpu5: TSC skew=68 observed drift=0
> cpu6: TSC skew=22 observed drift=0
> cpu7: TSC skew=-52 observed drift=0
> cpu8: TSC skew=8 observed drift=0
> cpu9: TSC skew=-18 observed drift=0
> cpu10: TSC skew=10 observed drift=0
> cpu11: TSC skew=76 observed drift=0
> cpu12: TSC skew=-2 observed drift=0
> cpu13: TSC skew=-4 observed drift=0
> cpu14: TSC skew=-2 observed drift=0
> cpu15: TSC skew=-28 observed drift=0
> cpu16: TSC skew=6 observed drift=0
> cpu17: TSC skew=-8 observed drift=0
> cpu18: TSC skew=0 observed drift=0
> cpu19: TSC skew=-32 observed drift=0
> cpu20: TSC skew=0 observed drift=0
> cpu21: TSC skew=-26 observed drift=0
> cpu22: TSC skew=0 observed drift=0
> cpu23: TSC skew=22 observed drift=0
> cpu24: TSC skew=-12 observed drift=0
> cpu25: TSC skew=-14 observed drift=0
> cpu26: TSC skew=76 observed drift=0
> cpu27: TSC skew=-64 observed drift=0
> cpu28: TSC skew=-2 observed drift=0
> cpu29: TSC skew=34 observed drift=0
> cpu30: TSC skew=22 observed drift=0
> cpu31: TSC skew=-58 observed drift=0
> cpu32: TSC skew=-2 observed drift=0
> cpu33: TSC skew=6 observed drift=0
> cpu34: TSC skew=46 observed drift=0
> cpu35: TSC skew=20 observed drift=0
> cpu36: TSC skew=34 observed drift=0
> cpu37: TSC skew=-8 observed drift=0
> cpu38: TSC skew=48 observed drift=0
> cpu39: TSC skew=-10 observed drift=0
> cpu40: TSC skew=0 observed drift=0
> cpu41: TSC skew=72 observed drift=0
> cpu42: TSC skew=2 observed drift=0
> cpu43: TSC skew=-46 observed drift=0
> cpu44: TSC skew=-2 observed drift=0
> cpu45: TSC skew=-14 observed drift=0
> cpu46: TSC skew=-2 observed drift=0
> cpu47: TSC skew=-32 observed drift=0
> cpu48: TSC skew=12 observed drift=0
> cpu49: TSC skew=-16 observed drift=0
> cpu50: TSC skew=84 observed drift=0
> cpu51: TSC skew=-44 observed drift=0
> cpu52: TSC skew=-4 observed drift=0
> cpu53: TSC skew=4 observed drift=0
> cpu54: TSC skew=16 observed drift=0
> cpu55: TSC skew=-56 observed drift=0
> cpu56: TSC skew=-10 observed drift=0
> cpu57: TSC skew=6 observed drift=0
> cpu58: TSC skew=6 observed drift=0
> cpu59: TSC skew=-40 observed drift=0
> cpu60: TSC skew=-4 observed drift=0
> cpu61: TSC skew=-6 observed drift=0
> cpu62: TSC skew=74 observed drift=0
> cpu63: TSC skew=-48 observed drift=0



Re: [v3] amd64: simplify TSC sync testing

2022-07-06 Thread Scott Cheloha
On Wed, Jul 06, 2022 at 08:20:05PM -0400, Mohamed Aslan wrote:
> > First, you need to update to the latest firmware.  Maybe they already
> > fixed the problem.  I don't see any mention of the TSC in the BIOS
> > changelog for the e495 but maybe you'll get lucky.
> > 
> > Second, if they haven't fixed the problem with the latest firmware, I
> > recommend you reach out to Lenovo and report the problem.
> > 
> > Lenovo seem to have been sympathetic to reports about TSC desync in
> > the past on other models and issued firmware fixes.  For example,
> > the v1.28 firmware for the ThinkPad A485 contained a fix for what
> > I assume is a very similar problem to the one you're having:
> > 
> > https://download.lenovo.com/pccbbs/mobiles/r0wuj65wd.txt
> > 
> > And this forum post, for example, got some response from Lenovo staff:
> > 
> > https://forums.lenovo.com/t5/ThinkPad-T400-T500-and-newer-T-series-Laptops/T14s-G1-AMD-TSC-clock-unusable/m-p/5070296?page=1
> > 
> > So, open a post for your model and cite the other posts.
> > 
> > They might not be sympathetic to the fact that you're seeing the issue
> > on OpenBSD.  If that's a problem you should be able to reproduce the
> > problem with a recent Linux kernel.  The Linux kernel runs a similar
> > sync test during boot and will complain if the TSCs are not
> > synchronized.
> > 
> > Honestly, to save time you may want to just boot up a supported Linux
> > distribution and grab the error message before you ask for support.
> > 
> 
> I can confirm that this is also the case with Linux. This is the
> output of dmesg on Void Linux:
> 
> [0.00] tsc: Fast TSC calibration using PIT  
> [0.00] tsc: Detected 2096.114 MHz processor
> ...
> ...
> [1.314252] TSC synchronization [CPU#0 -> CPU#1]:
> [1.314252] Measured 6615806646 cycles TSC warp between CPUs, turning off 
> TSC clock.
> [1.314252] tsc: Marking TSC unstable due to check_tsc_sync_source failed
> [1.314397]   #2  #3  #4  #5  #6  #7

This is good news.  My code isn't the only code finding a problem :)

> 
> Not sure if Void is a Lenovo supported Linux distribution, still
> though I think it's worth reporting.

Probably not.  Your laptop may not even be "Linux certified",
but it's worth reporting all the same.



Re: [v3] amd64: simplify TSC sync testing

2022-07-06 Thread Scott Cheloha
> On Jul 6, 2022, at 10:04 AM, Christian Weisgerber  wrote:
> 
> Scott Cheloha:
> 
>>> kern.timecounter.tick=1
>>> kern.timecounter.timestepwarnings=0
>>> kern.timecounter.hardware=i8254
>>> kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000) acpitimer0(1000)
>> 
>> This is expected behavior with the patch.
>> 
>> cpu0's TSC is way out of sync with every
>> other CPU's TSC, so the TSC is marked
>> as a bad timecounter and a different one is
>> chosen.
> 
> Shouldn't it pick acpihpet0 then?

It depends on the order the timecounters are installed.
If acpihpet0 is already installed before we degrade the
TSC's .quality value, the timecounter subsystem won't
switch to it when we install the next counter, because it
assumes .quality values cannot change on the fly (a
reasonable assumption).

We don't yet have a tc_detach(9) function that uninstalls
a timecounter cleanly and chooses the next best counter
available.

This is something I want to add in a future patch.  FreeBSD
has something similar.  It may be called "tc_ban", iirc.

The alternative is to wait until we've tested synchronization
for every CPU before calling tc_init(9).  This approach is
more annoying, though, as it requires additional state.  We
would also still have the same problem when we resume from
suspend.

The dream is to be able to do something like this during the
sync test:

if (tsc_sync_test_failed) {
	tc_detach(&tsc_timecounter);
	tsc_timecounter.quality = -2000;
	tc_init(&tsc_timecounter);
}

When we call tc_detach(9) the timecounter code would pick
the next best counter automagically.



Re: [v3] amd64: simplify TSC sync testing

2022-07-06 Thread Scott Cheloha
> On Jul 6, 2022, at 11:36 AM, Mike Larkin  wrote:
> 
> On Tue, Jul 05, 2022 at 07:16:26PM -0500, Scott Cheloha wrote:
>> On Tue, Jul 05, 2022 at 01:38:32PM -0700, Mike Larkin wrote:
>>> On Mon, Jul 04, 2022 at 09:06:55PM -0500, Scott Cheloha wrote:
>>>> 
>>>> [...]
>>> 
>>> Here's the output from a 4 socket 80 thread machine.
>> 
>> Oh nice.  I think this is the biggest machine we've tried so far.
>> 
>>> kern.timecounter reports tsc after boot.
>> 
>> Excellent.
>> 
>>> Looks like this machine doesn't have the adjust MSR?
>> 
>> IA32_TSC_ADJUST first appears in the Intel SDM Vol. 3 some time in
>> 2011 or 2012.  I can't find the exact revision.
>> 
>> (I really wish there was a comprehensive version history for this sort
>> of thing, i.e. this MSR first appeared in the blah-blah uarch, this
>> instruction is available on all uarchs after yada-yada, etc.)
>> 
>> There are apparently several versions of the E7-4870 in the E7
>> "family".  If your CPU predates that, or launched 2012-2014, the MSR
>> may not have made the cut.
>> 
>> An aside: I cannot find any evidence of AMD supporting this MSR in any
>> processor.  It would be really, really nice if they did.  If you (or
>> anyone reading) knows anything about this, or whether they have an
>> equivalent MSR, shout it out.
>> 
>>> Other than that, machine seems stable.
>> 
>> Good, glad to hear it.  Thank you for testing.
>> 
>> Has this machine had issues using the TSC on -current in the past?
>> 
>> (If you have the time) what does the dmesg look like on the -current
>> kernel with TSC_DEBUG enabled?
> 
> Looks like you enabled TSC_DEBUG in your diff, so what I sent you is what you
> are asking for...?

No, I mean on the -current *unpatched* kernel.  Sorry if that wasn't
clear.

Our -current kernel prints more detailed information if TSC_DEBUG
is enabled.  In particular, I'm curious if the unpatched kernel
detects any skew or drift on your machine, and if so, how much.



Re: [v3] amd64: simplify TSC sync testing

2022-07-06 Thread Scott Cheloha
On Wed, Jul 06, 2022 at 01:48:39AM -0400, Mohamed Aslan wrote:
> > This is expected behavior with the patch.
> > 
> > cpu0's TSC is way out of sync with every
> > other CPU's TSC, so the TSC is marked
> > as a bad timecounter and a different one is
> > chosen.
> 
> Yes, I can see. Just want to add that without your latest patch the
> kernel chooses the TSC as clocksource, however only the *user* TSC
> was disabled (cpu1: disabling user TSC (skew=-5028216492)).
> 
> > Are you running the latest BIOS available
> > for your machine?
> 
> No, I don't think I am.

First, you need to update to the latest firmware.  Maybe they already
fixed the problem.  I don't see any mention of the TSC in the BIOS
changelog for the e495 but maybe you'll get lucky.

Second, if they haven't fixed the problem with the latest firmware, I
recommend you reach out to Lenovo and report the problem.

Lenovo seem to have been sympathetic to reports about TSC desync in
the past on other models and issued firmware fixes.  For example,
the v1.28 firmware for the ThinkPad A485 contained a fix for what
I assume is a very similar problem to the one you're having:

https://download.lenovo.com/pccbbs/mobiles/r0wuj65wd.txt

And this forum post, for example, got some response from Lenovo staff:

https://forums.lenovo.com/t5/ThinkPad-T400-T500-and-newer-T-series-Laptops/T14s-G1-AMD-TSC-clock-unusable/m-p/5070296?page=1

So, open a post for your model and cite the other posts.

They might not be sympathetic to the fact that you're seeing the issue
on OpenBSD.  If that's a problem you should be able to reproduce the
problem with a recent Linux kernel.  The Linux kernel runs a similar
sync test during boot and will complain if the TSCs are not
synchronized.

Honestly, to save time you may want to just boot up a supported Linux
distribution and grab the error message before you ask for support.



Re: [v3] amd64: simplify TSC sync testing

2022-07-05 Thread Scott Cheloha
> On Jul 5, 2022, at 23:02, Mohamed Aslan  wrote:
> 
> Hi,
> 
> Apologies. My bad, I applied the latest patch but booted into another
> kernel with an earlier patch!
> 
> Here's what I got with your latest patch:
> 
> $ dmesg | grep 'tsc'
> tsc: cpu0/cpu1: sync test round 1/2 failed
> tsc: cpu0/cpu1: cpu0: 40162 lags 5112675666 cycles
> tsc: cpu0/cpu2: sync test round 1/2 failed
> tsc: cpu0/cpu2: cpu0: 18995 lags 5112675645 cycles
> tsc: cpu0/cpu3: sync test round 1/2 failed
> tsc: cpu0/cpu3: cpu0: 19136 lags 5112675645 cycles
> tsc: cpu0/cpu4: sync test round 1/2 failed
> tsc: cpu0/cpu4: cpu0: 19451 lags 5112675645 cycles
> tsc: cpu0/cpu5: sync test round 1/2 failed
> tsc: cpu0/cpu5: cpu0: 18625 lags 5112675645 cycles
> tsc: cpu0/cpu6: sync test round 1/2 failed
> tsc: cpu0/cpu6: cpu0: 18208 lags 5112675645 cycles
> tsc: cpu0/cpu7: sync test round 1/2 failed
> tsc: cpu0/cpu7: cpu0: 17739 lags 5112675645 cycles
> 
> $ sysctl kern.timecounter
> kern.timecounter.tick=1
> kern.timecounter.timestepwarnings=0
> kern.timecounter.hardware=i8254
> kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000) acpitimer0(1000)

This is expected behavior with the patch.

cpu0's TSC is way out of sync with every
other CPU's TSC, so the TSC is marked
as a bad timecounter and a different one is
chosen.

Are you running the latest BIOS available
for your machine?



Re: [v3] amd64: simplify TSC sync testing

2022-07-05 Thread Scott Cheloha
> On Jul 5, 2022, at 21:31, Mohamed Aslan  wrote:
> 
> Hello,
> 
> I just tested your patch on my lenovo e495 laptop, unfortunately
> still no tsc.
> 
> $ dmesg | grep 'tsc:'
> tsc: cpu0/cpu1 sync round 1: 20468 regressions
> tsc: cpu0/cpu1 sync round 1: cpu0 lags cpu1 by 5351060292 cycles
> tsc: cpu0/cpu1 sync round 1: cpu1 lags cpu0 by 0 cycles
> tsc: cpu0/cpu2 sync round 1: 10272 regressions
> tsc: cpu0/cpu2 sync round 1: cpu0 lags cpu2 by 5351060271 cycles
> tsc: cpu0/cpu2 sync round 1: cpu2 lags cpu0 by 0 cycles
> tsc: cpu0/cpu3 sync round 1: 9709 regressions
> tsc: cpu0/cpu3 sync round 1: cpu0 lags cpu3 by 5351060271 cycles
> tsc: cpu0/cpu3 sync round 1: cpu3 lags cpu0 by 0 cycles
> tsc: cpu0/cpu4 sync round 1: 10386 regressions
> tsc: cpu0/cpu4 sync round 1: cpu0 lags cpu4 by 5351060271 cycles
> tsc: cpu0/cpu4 sync round 1: cpu4 lags cpu0 by 0 cycles
> tsc: cpu0/cpu5 sync round 1: 10408 regressions
> tsc: cpu0/cpu5 sync round 1: cpu0 lags cpu5 by 5351060271 cycles
> tsc: cpu0/cpu5 sync round 1: cpu5 lags cpu0 by 0 cycles
> tsc: cpu0/cpu6 sync round 1: 10102 regressions
> tsc: cpu0/cpu6 sync round 1: cpu0 lags cpu6 by 5351060271 cycles
> tsc: cpu0/cpu6 sync round 1: cpu6 lags cpu0 by 0 cycles
> tsc: cpu0/cpu7 sync round 1: 9336 regressions
> tsc: cpu0/cpu7 sync round 1: cpu0 lags cpu7 by 5351060271 cycles
> tsc: cpu0/cpu7 sync round 1: cpu7 lags cpu0 by 0 cycles

This is not the latest patch.

Please apply the latest patch and try again.

If possible, please also include your dmesg
from a -current kernel with the TSC_DEBUG
option set.



Re: powerpc, macppc: retrigger deferred DEC interrupts from splx(9)

2022-07-05 Thread Scott Cheloha
On Thu, Jun 23, 2022 at 09:58:48PM -0500, Scott Cheloha wrote:
> 
> [...]
> 
> Thoughts?  Tweaks?
> 
> [...]

miod: Any issues?

kettenis:  Anything to add?  ok?

drahn:  Anything to add?  ok?

--

It would be nice (but not strictly necessary) to test this on a
machine doing "real work".

Who does the macppc package builds?

Index: macppc/macppc/clock.c
===
RCS file: /cvs/src/sys/arch/macppc/macppc/clock.c,v
retrieving revision 1.48
diff -u -p -r1.48 clock.c
--- macppc/macppc/clock.c   23 Feb 2021 04:44:30 -  1.48
+++ macppc/macppc/clock.c   24 Jun 2022 02:49:58 -
@@ -128,6 +128,20 @@ decr_intr(struct clockframe *frame)
return;
 
/*
+* We can't actually mask DEC interrupts, i.e. mask MSR(EE),
+* at or above IPL_CLOCK without masking other essential
+* interrupts.  To simulate masking, we retrigger the DEC
+* by hand from splx(9) the next time our IPL drops below
+* IPL_CLOCK.
+*/
+   if (ci->ci_cpl >= IPL_CLOCK) {
+   ci->ci_dec_deferred = 1;
+   ppc_mtdec(UINT32_MAX >> 1); /* clear DEC exception */
+   return;
+   }
+   ci->ci_dec_deferred = 0;
+
+   /*
 * Based on the actual time delay since the last decrementer reload,
 * we arrange for earlier interrupt next time.
 */
@@ -160,39 +174,35 @@ decr_intr(struct clockframe *frame)
 */
ppc_mtdec(nextevent - tb);
 
-   if (ci->ci_cpl >= IPL_CLOCK) {
-   ci->ci_statspending += nstats;
-   } else {
-   nstats += ci->ci_statspending;
-   ci->ci_statspending = 0;
-
-   s = splclock();
-
-   /*
-* Reenable interrupts
-*/
-   ppc_intr_enable(1);
-
-   /*
-* Do standard timer interrupt stuff.
-*/
-   while (ci->ci_lasttb < ci->ci_prevtb) {
-   /* sync lasttb with hardclock */
-   ci->ci_lasttb += ticks_per_intr;
-   clk_count.ec_count++;
-   hardclock(frame);
-   }
-
-   while (nstats-- > 0)
-   statclock(frame);
-
-   splx(s);
-   (void) ppc_intr_disable();
-
-   /* if a tick has occurred while dealing with these,
-* dont service it now, delay until the next tick.
-*/
+   nstats += ci->ci_statspending;
+   ci->ci_statspending = 0;
+
+   s = splclock();
+
+   /*
+* Reenable interrupts
+*/
+   ppc_intr_enable(1);
+
+   /*
+* Do standard timer interrupt stuff.
+*/
+   while (ci->ci_lasttb < ci->ci_prevtb) {
+   /* sync lasttb with hardclock */
+   ci->ci_lasttb += ticks_per_intr;
+   clk_count.ec_count++;
+   hardclock(frame);
}
+
+   while (nstats-- > 0)
+   statclock(frame);
+
+   splx(s);
+   (void) ppc_intr_disable();
+
+   /* if a tick has occurred while dealing with these,
+* dont service it now, delay until the next tick.
+*/
 }
 
 void cpu_startclock(void);
Index: macppc/dev/openpic.c
===
RCS file: /cvs/src/sys/arch/macppc/dev/openpic.c,v
retrieving revision 1.89
diff -u -p -r1.89 openpic.c
--- macppc/dev/openpic.c21 Feb 2022 10:38:50 -  1.89
+++ macppc/dev/openpic.c24 Jun 2022 02:49:59 -
@@ -382,6 +382,10 @@ openpic_splx(int newcpl)
 
intr = ppc_intr_disable();
openpic_setipl(newcpl);
+   if (ci->ci_dec_deferred && newcpl < IPL_CLOCK) {
+   ppc_mtdec(0);
+   ppc_mtdec(UINT32_MAX);  /* raise DEC exception */
+   }
if (newcpl < IPL_SOFTTTY && (ci->ci_ipending & ppc_smask[newcpl])) {
s = splsofttty();
dosoftint(newcpl);
Index: macppc/dev/macintr.c
===
RCS file: /cvs/src/sys/arch/macppc/dev/macintr.c,v
retrieving revision 1.56
diff -u -p -r1.56 macintr.c
--- macppc/dev/macintr.c13 Mar 2022 12:33:01 -  1.56
+++ macppc/dev/macintr.c24 Jun 2022 02:49:59 -
@@ -170,6 +170,10 @@ macintr_splx(int newcpl)
 
intr = ppc_intr_disable();
macintr_setipl(newcpl);
+   if (ci->ci_dec_deferred && newcpl < IPL_CLOCK) {
+   ppc_mtdec(0);
+   ppc_mtdec(UINT32_MAX);  /* raise DEC exception */
+   }
if ((newcpl < IPL_SOFTTTY && ci->ci_ipending & ppc_smask[newcpl])) {
s = splsofttty();
dosoftint(newcpl);
Index: powerp

Re: [v3] amd64: simplify TSC sync testing

2022-07-05 Thread Scott Cheloha
On Wed, Jul 06, 2022 at 09:14:03AM +0900, Yuichiro NAITO wrote:
> Hi, Scott.
> 
> I tested your patch on my OpenBSD running on ESXi.
> It works fine for me and I never see monotonic clock going backward.
> There is nothing extra messages in my dmesg.

Great!  Thanks for testing.



Re: [v3] amd64: simplify TSC sync testing

2022-07-05 Thread Scott Cheloha
On Tue, Jul 05, 2022 at 01:38:32PM -0700, Mike Larkin wrote:
> On Mon, Jul 04, 2022 at 09:06:55PM -0500, Scott Cheloha wrote:
> > 
> > [...]
> 
> Here's the output from a 4 socket 80 thread machine.

Oh nice.  I think this is the biggest machine we've tried so far.

> kern.timecounter reports tsc after boot.

Excellent.

> Looks like this machine doesn't have the adjust MSR?

IA32_TSC_ADJUST first appears in the Intel SDM Vol. 3 some time in
2011 or 2012.  I can't find the exact revision.

(I really wish there was a comprehensive version history for this sort
of thing, i.e. this MSR first appeared in the blah-blah uarch, this
instruction is available on all uarchs after yada-yada, etc.)

There are apparently several versions of the E7-4870 in the E7
"family".  If your CPU predates that, or launched 2012-2014, the MSR
may not have made the cut.

An aside: I cannot find any evidence of AMD supporting this MSR in any
processor.  It would be really, really nice if they did.  If you (or
anyone reading) knows anything about this, or whether they have an
equivalent MSR, shout it out.

> Other than that, machine seems stable.

Good, glad to hear it.  Thank you for testing.

Has this machine had issues using the TSC on -current in the past?

(If you have the time) what does the dmesg look like on the -current
kernel with TSC_DEBUG enabled?



Re: [v3] amd64: simplify TSC sync testing

2022-07-05 Thread Scott Cheloha
On Tue, Jul 05, 2022 at 06:40:26PM +0200, Stuart Henderson wrote:
> On 2022/07/05 11:22, Scott Cheloha wrote:
> > On Tue, Jul 05, 2022 at 05:47:51PM +0200, Stuart Henderson wrote:
> > > On 2022/07/04 21:06, Scott Cheloha wrote:
> > > > 4. OpenBSD VMs on other hypervisors.
> > > 
> > > KVM on proxmox VE 7.1-12
> > > 
> > > I force acpihpet0 on this; it defaults to pvclock which results in
> > > timekeeping so bad that ntpd can't correct
> > 
> > That is an interesting problem.  Probably worth looking at pvclock(4)
> > separately.
> > 
> > > $ sysctl kern.timecounter
> > > kern.timecounter.tick=1
> > > kern.timecounter.timestepwarnings=0
> > > kern.timecounter.hardware=acpihpet0
> > > kern.timecounter.choice=i8254(0) pvclock0(1500) acpihpet0(1000) 
> > > acpitimer0(1000)
> > > 
> > > OpenBSD 7.1-current (GENERIC.MP) #45: Tue Jul  5 16:11:00 BST 2022
> > > st...@bamboo.spacehopper.org:/sys/arch/amd64/compile/GENERIC.MP
> > > real mem = 8573001728 (8175MB)
> > > avail mem = 8295833600 (7911MB)
> > > random: good seed from bootblocks
> > > mpath0 at root
> > > scsibus0 at mpath0: 256 targets
> > > mainbus0 at root
> > > bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf58c0 (10 entries)
> > > bios0: vendor SeaBIOS version 
> > > "rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org" date 04/01/2014
> > > bios0: QEMU Standard PC (i440FX + PIIX, 1996)
> > > acpi0 at bios0: ACPI 1.0
> > > acpi0: sleep states S3 S4 S5
> > > acpi0: tables DSDT FACP APIC SSDT HPET WAET
> > > acpi0: wakeup devices
> > > acpitimer0 at acpi0: 3579545 Hz, 24 bits
> > > acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> > > cpu0 at mainbus0: apid 0 (boot processor)
> > > cpu0: AMD Ryzen 5 PRO 5650G with Radeon Graphics, 3893.04 MHz, 19-50-00
> > > cpu0: 
> > > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,CPCTR,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,PKU,IBRS,IBPB,STIBP,SSBD,IBPB,IBRS,STIBP,SSBD,VIRTSSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> > 
> > This machine doesn't have the ITSC flag, so we would never consider
> > using the TSC as a timecounter.  The sync test is not run, but that
> > makes sense.
> > 
> > ... is that expected?  Should the machine have the ITSC flag?
> > 
> > (I'm not familiar with Proxmox.)
> > 
> 
> No idea to be honest. The cpu type is set to "host" so it should pass
> things through, but perhaps it deliberately filters out ITSC. Mostly
> wanted to point it out as a "doesn't make things worse" (and because
> you specifically wanted tests on other VMs :)

Gotcha, that's okay then.



Re: [v3] amd64: simplify TSC sync testing

2022-07-05 Thread Scott Cheloha
On Tue, Jul 05, 2022 at 05:47:51PM +0200, Stuart Henderson wrote:
> On 2022/07/04 21:06, Scott Cheloha wrote:
> > 4. OpenBSD VMs on other hypervisors.
> 
> KVM on proxmox VE 7.1-12
> 
> I force acpihpet0 on this; it defaults to pvclock which results in
> timekeeping so bad that ntpd can't correct

That is an interesting problem.  Probably worth looking at pvclock(4)
separately.

> $ sysctl kern.timecounter
> kern.timecounter.tick=1
> kern.timecounter.timestepwarnings=0
> kern.timecounter.hardware=acpihpet0
> kern.timecounter.choice=i8254(0) pvclock0(1500) acpihpet0(1000) 
> acpitimer0(1000)
> 
> OpenBSD 7.1-current (GENERIC.MP) #45: Tue Jul  5 16:11:00 BST 2022
> st...@bamboo.spacehopper.org:/sys/arch/amd64/compile/GENERIC.MP
> real mem = 8573001728 (8175MB)
> avail mem = 8295833600 (7911MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf58c0 (10 entries)
> bios0: vendor SeaBIOS version "rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org" 
> date 04/01/2014
> bios0: QEMU Standard PC (i440FX + PIIX, 1996)
> acpi0 at bios0: ACPI 1.0
> acpi0: sleep states S3 S4 S5
> acpi0: tables DSDT FACP APIC SSDT HPET WAET
> acpi0: wakeup devices
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: AMD Ryzen 5 PRO 5650G with Radeon Graphics, 3893.04 MHz, 19-50-00
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,CPCTR,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,PKU,IBRS,IBPB,STIBP,SSBD,IBPB,IBRS,STIBP,SSBD,VIRTSSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES

This machine doesn't have the ITSC flag, so we would never consider
using the TSC as a timecounter.  The sync test is not run, but that
makes sense.

... is that expected?  Should the machine have the ITSC flag?

(I'm not familiar with Proxmox.)



Re: [v3] amd64: simplify TSC sync testing

2022-07-05 Thread Scott Cheloha
On Tue, Jul 05, 2022 at 05:38:04PM +0200, Stuart Henderson wrote:
> On 2022/07/04 21:06, Scott Cheloha wrote:
> > 2. Other multisocket machines.
> 
> This is from the R620 where I originally discovered the problems
> with SMP with the previous TSC test:
> 
> $ dmesg|grep tsc
> $ sysctl kern.timecounter
> kern.timecounter.tick=1
> kern.timecounter.timestepwarnings=0
> kern.timecounter.hardware=tsc
> kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000) acpitimer0(1000)
> 
> --- old   Tue Jul  5 15:34:06 2022
> +++ new   Tue Jul  5 15:34:08 2022
> @@ -1,7 +1,7 @@
> [snip]

Okay, so on the -current kernel the TSC is marked defective, but with
this patch (v3) the TSC is fine: you get no printouts on the console
from the TSC module.

Good, excellent.

Thank you for testing again.



Re: [v3] amd64: simplify TSC sync testing

2022-07-05 Thread Scott Cheloha
On Tue, Jul 05, 2022 at 10:53:43AM -0400, Dave Voutila wrote:
> 
> Scott Cheloha  writes:
> 
> > On Tue, Jul 05, 2022 at 07:15:31AM -0400, Dave Voutila wrote:
> >>
> >> Scott Cheloha  writes:
> >>
> >> > [...]
> >> >
> >> > If you fail the test you will see something like this:
> >> >
> >> >  tsc: cpu0/cpu2: sync test round 1/2 failed
> >> >  tsc: cpu0/cpu2: cpu2: 13043 lags 438 cycles
> >> >
> >> > A printout like this would mean that the sync test for cpu2 failed.
> >> > In particular, cpu2's TSC trails cpu0's TSC by at least 438 cycles.
> >> > If this happens for *any* CPU we mark the TSC timecounter as
> >> > defective.
> >>
> >> I think this passes now on my dual-socket Xeon box?
> >
> > Yes, it passes.  The timecounter on your machine should still have a
> > quality of 2000, i.e. we didn't mark it defective.
> >
> >> Full dmesg at the end of the email[1], but just the `tsc:' lines look
> >> like:
> >>
> >> $ grep tsc dmesg.txt
> >> tsc: cpu0: IA32_TSC_ADJUST: -5774382067215574 -> 0
> >> tsc: cpu1: IA32_TSC_ADJUST: -5774382076335870 -> 0
> >> tsc: cpu2: IA32_TSC_ADJUST: -5774382073829798 -> 0
> >> tsc: cpu3: IA32_TSC_ADJUST: -5774382071913818 -> 0
> >> tsc: cpu4: IA32_TSC_ADJUST: -5774382075956770 -> 0
> >> tsc: cpu5: IA32_TSC_ADJUST: -5774382074583181 -> 0
> >> tsc: cpu6: IA32_TSC_ADJUST: -5774382073199574 -> 0
> >> tsc: cpu7: IA32_TSC_ADJUST: -5774382076500135 -> 0
> >> tsc: cpu8: IA32_TSC_ADJUST: -5774382074705354 -> 0
> >> tsc: cpu9: IA32_TSC_ADJUST: -5774382075954945 -> 0
> >> tsc: cpu10: IA32_TSC_ADJUST: -5774382070567294 -> 0
> >> tsc: cpu11: IA32_TSC_ADJUST: -5774382075968443 -> 0
> >> tsc: cpu12: IA32_TSC_ADJUST: -5774382067353478 -> 0
> >> tsc: cpu13: IA32_TSC_ADJUST: -5774382071926523 -> 0
> >> tsc: cpu14: IA32_TSC_ADJUST: -5774382074619890 -> 0
> >> tsc: cpu15: IA32_TSC_ADJUST: -5774382070107058 -> 0
> >> tsc: cpu16: IA32_TSC_ADJUST: -5774382076196640 -> 0
> >> tsc: cpu17: IA32_TSC_ADJUST: -5774382075090665 -> 0
> >> tsc: cpu18: IA32_TSC_ADJUST: -5774382073529646 -> 0
> >> tsc: cpu19: IA32_TSC_ADJUST: -5774382076443616 -> 0
> >> tsc: cpu20: IA32_TSC_ADJUST: -5774382074994536 -> 0
> >> tsc: cpu21: IA32_TSC_ADJUST: -5774382076309520 -> 0
> >> tsc: cpu22: IA32_TSC_ADJUST: -5774382070947686 -> 0
> >> tsc: cpu23: IA32_TSC_ADJUST: -5774382073056320 -> 0
> >
> > Fascinating.  Wonder what the heck it's doing down there.
> >
> >> It does look like there's a newer BIOS version for this machine, so I'll
> >> try updating it later today and repeating the test to see if anything
> >> changes.
> 
> After a BIOS update, still similar output.
> 
> "new" bios:
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xec0f0 (105 entries)
> bios0: vendor Dell Inc. version "A34" date 10/19/2020
> bios0: Dell Inc. Precision Tower 7810
> 
> $ dmesg | grep tsc
> tsc: cpu0: IA32_TSC_ADJUST: -4070378216 -> 0
> tsc: cpu1: IA32_TSC_ADJUST: -4081094631 -> 0
> tsc: cpu2: IA32_TSC_ADJUST: -4078853396 -> 0
> tsc: cpu3: IA32_TSC_ADJUST: -4074362824 -> 0
> tsc: cpu4: IA32_TSC_ADJUST: -4080872645 -> 0
> tsc: cpu5: IA32_TSC_ADJUST: -4075673830 -> 0
> tsc: cpu6: IA32_TSC_ADJUST: -4081906959 -> 0
> tsc: cpu7: IA32_TSC_ADJUST: -4073006269 -> 0
> tsc: cpu8: IA32_TSC_ADJUST: -4081803214 -> 0
> tsc: cpu9: IA32_TSC_ADJUST: -4081294540 -> 0
> tsc: cpu10: IA32_TSC_ADJUST: -4079817920 -> 0
> tsc: cpu11: IA32_TSC_ADJUST: -4079871039 -> 0
> tsc: cpu12: IA32_TSC_ADJUST: -4070522580 -> 0
> tsc: cpu13: IA32_TSC_ADJUST: -4077205405 -> 0
> tsc: cpu14: IA32_TSC_ADJUST: -4081797309 -> 0
> tsc: cpu15: IA32_TSC_ADJUST: -4078574630 -> 0
> tsc: cpu16: IA32_TSC_ADJUST: -4081539272 -> 0
> tsc: cpu17: IA32_TSC_ADJUST: -4079657247 -> 0
> tsc: cpu18: IA32_TSC_ADJUST: -4080469326 -> 0
> tsc: cpu19: IA32_TSC_ADJUST: -4073404194 -> 0
> tsc: cpu20: IA32_TSC_ADJUST: -4081473720 -> 0
> tsc: cpu21: IA32_TSC_ADJUST: -4076195877 -> 0
> tsc: cpu22: IA32_TSC_ADJUST: -4077876814 -> 0
> tsc: cpu23: IA32_TSC_ADJUST: -4081863303 -> 0
> 
> And still a quality tsc :) :
> 
> $ sysctl kern.timecounter
> kern.timecounter.tick=1
> kern.timecounter.timestepwarnings=0
> kern.timecounter.hardware=tsc
> kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000) acpitimer0(1000)

Alrighty, that's [...]

Re: ts(1): make timespec-handling code more obvious

2022-07-05 Thread Scott Cheloha
On Tue, Jul 05, 2022 at 11:53:26AM +0200, Claudio Jeker wrote:
> On Tue, Jul 05, 2022 at 11:34:21AM +, Job Snijders wrote:
> > On Tue, Jul 05, 2022 at 11:08:13AM +0200, Claudio Jeker wrote:
> > > On Mon, Jul 04, 2022 at 05:10:05PM -0500, Scott Cheloha wrote:
> > > > On Mon, Jul 04, 2022 at 11:15:24PM +0200, Claudio Jeker wrote:
> > > > > On Mon, Jul 04, 2022 at 01:28:12PM -0500, Scott Cheloha wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > Couple things:
> > > > > > 
> > > > > > [...]
> > > > > 
> > > > > I don't like the introduction of all these local variables that are 
> > > > > just
> > > > > hard to follow and need extra code pathes. Happy to rename roff to 
> > > > > offset,
> > > > > start_offset or something similar. Also moving the localtime call into
> > > > > fmtfmt() is fine.
> > > > 
> > > > You need an "elapsed" variable to avoid overwriting "now" in the
> > > > -i flag case to avoid calling clock_gettime(2) twice.
> > > > 
> > > > We can get rid of "utc_start" and just reuse "now" for the initial
> > > > value of CLOCK_REALTIME.
> > > > 
> > > > How is this?
> > > 
> > > How about this instead?
> > 
> > Looks like an improvement
> > 
> > The suggestion to change 'ms' to 'us' might be a good one to roll into
> > this changeset too.
> 
> Ah right, we print us not ms.
>  
> > nitpick: the changeset doesn't apply cleanly:
> 
> Forgot to update that tree :)
> 
> Updated diff below

This is fine by me, you took most of what I wanted, and even
the "ms" -> "us" name change :)

One nit below, otherwise: ok cheloha@

> Index: ts.c
> ===
> RCS file: /cvs/src/usr.bin/ts/ts.c,v
> retrieving revision 1.6
> diff -u -p -r1.6 ts.c
> --- ts.c  4 Jul 2022 17:29:03 -   1.6
> +++ ts.c  5 Jul 2022 09:51:38 -
> @@ -32,7 +32,7 @@ static char *buf;
>  static char  *outbuf;
>  static size_t bufsize;
>  
> -static void   fmtfmt(struct tm *, long);
> +static void   fmtfmt(const struct timespec *);
>  static void __dead usage(void);
>  
>  int
> @@ -40,8 +40,7 @@ main(int argc, char *argv[])
>  {
>   int iflag, mflag, sflag;
>   int ch, prev;
> - struct timespec roff, start, now;
> - struct tm *tm;
> + struct timespec start, now, utc_offset, ts;
>   clockid_t clock = CLOCK_REALTIME;
>  
>   if (pledge("stdio", NULL) == -1)
> @@ -93,22 +92,22 @@ main(int argc, char *argv[])
>   if (setenv("TZ", "UTC", 1) == -1)
>   err(1, "setenv UTC");
>  
> - clock_gettime(CLOCK_REALTIME, &roff);
>   clock_gettime(clock, &start);
> - timespecsub(&start, &roff, &roff);
> + clock_gettime(CLOCK_REALTIME, &utc_offset);
> + timespecsub(&utc_offset, &start, &utc_offset);

You don't need to initialize utc_offset except in the -m flag case.


