all architectures: put clockframe definition in frame.h?

2022-08-18 Thread Scott Cheloha
Hi,

clockframe is sometimes defined in cpu.h, sometimes in frame.h, and
sometimes defined once each in both header files.

Can we put the clockframe definitions in frame.h?  Always?  It is, at
least ostensibly, a "frame".

I do not want to consolidate the clockframe definitions in cpu.h
because this is creating a circular dependency problem for my clock
interrupt patch.

In particular, cpu.h needs a data structure defined in a new header
file to add it to struct cpu_info on all architectures, like this:

/* cpu.h */

#include <sys/clockintr.h>

struct cpu_info {
	/* ... */
	struct clockintr_state ci_state;	/* member name illustrative */
};

... but the header clockintr.h needs the clockframe definition so it
can prototype functions accepting a clockframe pointer, like this:

/* clockintr.h */

#include <machine/frame.h>	/* this works fine */

#ifdef this_does_not_work
#include <machine/cpu.h>
#endif

int clockintr_foo(struct clockframe *, int, short);
int clockintr_bar(struct clockframe *, char *, long);

struct clockintr_state {
	char *cs_foo;
	int cs_bar;
};

--

Hopefully I have illustrated the problem.

The only architecture where this might be a problem is sparc64.
There, clockframe is defined in terms of trapframe64, which is defined
in reg.h, not frame.h.

kettenis: can we put clockframe in frame.h on sparc64 or am I buying
trouble?
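Concretely, the sparc64 move would look something like this (just a sketch, not a tested diff; it assumes frame.h may include reg.h for trapframe64, and that the existing member name carries over):

```c
/* sparc64/include/frame.h -- sketch */
#include <machine/reg.h>	/* struct trapframe64 lives here */

struct clockframe {
	struct trapframe64 t;
};
```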

I can't compile-test this everywhere, but because every architecture's
cpu.h includes frame.h I don't think this can break anything (except
on sparc64).

The CLKF macros can remain in cpu.h.  They are not data structures so
putting them in frame.h looks odd on most architectures.

Index: alpha/include/cpu.h
===================================================================
RCS file: /cvs/src/sys/arch/alpha/include/cpu.h,v
retrieving revision 1.66
diff -u -p -r1.66 cpu.h
--- alpha/include/cpu.h 10 Aug 2022 10:41:35 -  1.66
+++ alpha/include/cpu.h 19 Aug 2022 03:27:06 -
@@ -296,14 +296,6 @@ cpu_rnd_messybits(void)
return alpha_rpcc();
 }
 
-/*
- * Arguments to hardclock and gatherstats encapsulate the previous
- * machine state in an opaque clockframe.  On the Alpha, we use
- * what we push on an interrupt (a trapframe).
- */
-struct clockframe {
-	struct trapframe	cf_tf;
-};
 #define	CLKF_USERMODE(framep)					\
	(((framep)->cf_tf.tf_regs[FRAME_PS] & ALPHA_PSL_USERMODE) != 0)
 #define	CLKF_PC(framep)	((framep)->cf_tf.tf_regs[FRAME_PC])
Index: alpha/include/frame.h
===================================================================
RCS file: /cvs/src/sys/arch/alpha/include/frame.h,v
retrieving revision 1.4
diff -u -p -r1.4 frame.h
--- alpha/include/frame.h   23 Mar 2011 16:54:34 -  1.4
+++ alpha/include/frame.h   19 Aug 2022 03:27:08 -
@@ -92,4 +92,13 @@ struct trapframe {
unsigned long   tf_regs[FRAME_SIZE];/* See above */
 };
 
+/*
+ * Arguments to hardclock and gatherstats encapsulate the previous
+ * machine state in an opaque clockframe.  On the Alpha, we use
+ * what we push on an interrupt (a trapframe).
+ */
+struct clockframe {
+	struct trapframe	cf_tf;
+};
+
 #endif /* _MACHINE_FRAME_H_ */
Index: amd64/include/cpu.h
===================================================================
RCS file: /cvs/src/sys/arch/amd64/include/cpu.h,v
retrieving revision 1.147
diff -u -p -r1.147 cpu.h
--- amd64/include/cpu.h 12 Aug 2022 02:20:36 -  1.147
+++ amd64/include/cpu.h 19 Aug 2022 03:27:08 -
@@ -335,13 +335,6 @@ cpu_rnd_messybits(void)
 
 #define curpcb curcpu()->ci_curpcb
 
-/*
- * Arguments to hardclock, softclock and statclock
- * encapsulate the previous machine state in an opaque
- * clockframe; for now, use generic intrframe.
- */
-#define clockframe intrframe
-
 #define	CLKF_USERMODE(frame)	USERMODE((frame)->if_cs, (frame)->if_rflags)
 #define CLKF_PC(frame) ((frame)->if_rip)
 #define CLKF_INTR(frame)   (curcpu()->ci_idepth > 1)
Index: amd64/include/frame.h
===================================================================
RCS file: /cvs/src/sys/arch/amd64/include/frame.h,v
retrieving revision 1.10
diff -u -p -r1.10 frame.h
--- amd64/include/frame.h   10 Jul 2018 08:57:44 -  1.10
+++ amd64/include/frame.h   19 Aug 2022 03:27:08 -
@@ -138,6 +138,12 @@ struct intrframe {
int64_t if_ss;
 };
 
+/*
+ * Arguments to hardclock, softclock and statclock
+ * encapsulate the previous machine state in an opaque
+ * clockframe; for now, use generic intrframe.
+ */
+#define clockframe intrframe
 
 /*
  * The trampoline frame used on the kernel stack page which is present
Index: arm64/include/cpu.h
===================================================================
RCS file: /cvs/src/sys/arch/arm64/include/cpu.h,v
retrieving revision 1.27
diff -u -p -r1.27 cpu.h
--- arm64/include/cpu.h 13 Jul 2022 09:28:19 -  1.27
+++ arm64/include/cpu.h 19 Aug 2022 03:27:08 -
@@ -49,7 +49,6 @@
 
 /* All the CLKF_* macros 

Re: mips64: trigger deferred timer interrupt from splx(9)

2022-08-17 Thread Scott Cheloha
On Wed, Aug 17, 2022 at 01:30:29PM +, Visa Hankala wrote:
> On Tue, Aug 09, 2022 at 09:54:02AM -0500, Scott Cheloha wrote:
> > On Tue, Aug 09, 2022 at 02:03:31PM +, Visa Hankala wrote:
> > > On Mon, Aug 08, 2022 at 02:52:37AM -0500, Scott Cheloha wrote:
> > > > One thing I'm still uncertain about is how glxclk fits into the
> > > > loongson picture.  It's an interrupt clock that runs hardclock() and
> > > > statclock(), but the code doesn't do any logical masking, so I don't
> > > > know whether or not I need to adjust anything in that code or account
> > > > for it at all.  If there's no logical masking there's no deferral, so
> > > > it would never need to call md_triggerclock() from splx(9).
> > > 
> > > I think the masking of glxclk interrupts are handled by the ISA
> > > interrupt code.
> > 
> > Do those machines not have Coprocessor 0?  If they do, why would you
> > prefer glxclk over CP0?
> > 
> > > The patch misses md_triggerclock definition in mips64_machdep.c.
> > 
> > Whoops, forgot that file.  Fuller patch below.
> > 
> > > I have put this to the test on the mips64 ports builder machines.
> 
> The machines completed a build with this patch without problems.
> I tested with the debug counters removed from cp0_trigger_int5().
> 
> OK visa@

Thank you for testing!

There was a loongson portion to this patch.  Is this OK on loongson or
just octeon?

Also, what did the debug counters look like when you yanked them?  If
cp0_raise_miss was non-zero I will double the initial offset to 32
cycles.



Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-16 Thread Scott Cheloha
On Wed, Aug 17, 2022 at 02:28:14PM +1000, Jonathan Gray wrote:
> On Tue, Aug 16, 2022 at 11:53:51AM -0500, Scott Cheloha wrote:
> > On Sun, Aug 14, 2022 at 11:24:37PM -0500, Scott Cheloha wrote:
> > > 
> > > In the future when the LAPIC timer is run in oneshot mode there will
> > > be no lapic_delay().
> > > 
> > > [...]
> > > 
> > > This is *very* bad for older amd64 machines, because you are left with
> > > i8254_delay().
> > > 
> > > I would like to offer a less awful delay(9) implementation for this
> > > class of hardware.  Otherwise we may trip over bizarre phantom bugs on
> > > MP kernels because only one CPU can read the i8254 at a time.
> > > 
> > > [...]
> > > 
> > > Real i386 hardware should be fine.  Later models with an ACPI PM timer
> > > will be fine using acpitimer_delay() instead of i8254_delay().
> > > 
> > > [...]
> > > 
> > > Here are the sample measurements from my 2017 laptop (kaby lake
> > > refresh) running the attached patch.  It takes longer than a
> > > microsecond to read either of the ACPI timers.  The PM timer is better
> > > than the HPET.  The HPET is a bit better than the i8254.  I hope the
> > > numbers are a little better on older hardware.
> > > 
> > > acpitimer_test_delay:  expected  0.000001000  actual  0.000010638  error  0.000009638
> > > acpitimer_test_delay:  expected  0.000010000  actual  0.000015464  error  0.000005464
> > > acpitimer_test_delay:  expected  0.000100000  actual  0.000107619  error  0.000007619
> > > acpitimer_test_delay:  expected  0.001000000  actual  0.001007275  error  0.000007275
> > > acpitimer_test_delay:  expected  0.010000000  actual  0.010007891  error  0.000007891
> > > 
> > > acpihpet_test_delay:   expected  0.000001000  actual  0.000022208  error  0.000021208
> > > acpihpet_test_delay:   expected  0.000010000  actual  0.000031690  error  0.000021690
> > > acpihpet_test_delay:   expected  0.000100000  actual  0.000112647  error  0.000012647
> > > acpihpet_test_delay:   expected  0.001000000  actual  0.001021480  error  0.000021480
> > > acpihpet_test_delay:   expected  0.010000000  actual  0.010013736  error  0.000013736
> > > 
> > > i8254_test_delay:  expected  0.000001000  actual  0.000040110  error  0.000039110
> > > i8254_test_delay:  expected  0.000010000  actual  0.000039471  error  0.000029471
> > > i8254_test_delay:  expected  0.000100000  actual  0.000128031  error  0.000028031
> > > i8254_test_delay:  expected  0.001000000  actual  0.001024586  error  0.000024586
> > > i8254_test_delay:  expected  0.010000000  actual  0.010021859  error  0.000021859
> > 
> > Attached is an updated patch.  I left the test measurement code in
> > place because I would like to see a test on a real i386 machine, just
> > to make sure it works as expected.  I can't imagine why it wouldn't
> > work, but we should never assume anything.
> > 
> > Changes from v1:
> > 
> > - Actually set delay_func from acpitimerattach() and
> >   acpihpet_attach().
> > 
> >   I think it's safe to assume, on real hardware, that the ACPI PMT is
> >   preferable to the i8254 and the HPET is preferable to both of them.
> > 
> >   This is not *always* true, but it is true on the older machines that
> >   can't use tsc_delay(), so the assumption works in practice.
> > 
> >   Outside of those three timers, the hierarchy gets murky.  There are
> >   other timers that are better than the HPET, but they aren't always
> >   available.  If those timers are already providing delay_func this
> >   code does not usurp them.
> 
> As I understand it, you want lapic to be in one-shot mode for something
> along the lines of tickless.

Yes.

Although "tickless" is a misnomer.

> So you are trying to find MP machines
> where TSC is not useable for delay?

Right.  Those are the only machines where it's relevant to consider
the accuracy of acpitimer_delay() or acpihpet_delay()... unless I've
forgotten something.

> TSC is only considered for delay if the invariant and constant flags
> are set.
> invariant:
> "In the Core i7 and future processor generations, the TSC will continue
> to run in the deepest C-states. Therefore, the TSC will run at a
> constant rate in all ACPI P-,

Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-16 Thread Scott Cheloha
On Tue, Aug 16, 2022 at 11:53:51AM -0500, Scott Cheloha wrote:
> On Sun, Aug 14, 2022 at 11:24:37PM -0500, Scott Cheloha wrote:
> > 
> > In the future when the LAPIC timer is run in oneshot mode there will
> > be no lapic_delay().
> > 
> > [...]
> > 
> > This is *very* bad for older amd64 machines, because you are left with
> > i8254_delay().
> > 
> > I would like to offer a less awful delay(9) implementation for this
> > class of hardware.  Otherwise we may trip over bizarre phantom bugs on
> > MP kernels because only one CPU can read the i8254 at a time.
> > 
> > [...]
> > 
> > Real i386 hardware should be fine.  Later models with an ACPI PM timer
> > will be fine using acpitimer_delay() instead of i8254_delay().
> > 
> > [...]
> 
> Attached is an updated patch.  I left the test measurement code in
> place because I would like to see a test on a real i386 machine, just
> to make sure it works as expected.  I can't imagine why it wouldn't
> work, but we should never assume anything.
> 
> [...]
> 
> One remaining question I have:
> 
> Is there a nice way to test whether ACPI PMT support is compiled into
> the kernel?  We can assume the existence of i8254_delay() because
> clock.c is required on amd64 and i386.  However, acpitimer.c is
> optional, so acpitimer_delay() isn't necessarily there.
> 
> I would rather not introduce a hard requirement on acpitimer.c into
> acpihpet.c if there's an easy way to check for the latter.
> 
> Any ideas?

And here's the cleaned up patch.  Just in case nobody tests i386.
Pretty straightforward.  acpitimer is preferable to i8254, hpet is
preferable to acpitimer and i8254.

The only obvious problem I see is the hard dependency this creates in
acpihpet.c on acpitimer.c.

Index: acpitimer.c
===================================================================
RCS file: /cvs/src/sys/dev/acpi/acpitimer.c,v
retrieving revision 1.15
diff -u -p -r1.15 acpitimer.c
--- acpitimer.c 6 Apr 2022 18:59:27 -   1.15
+++ acpitimer.c 17 Aug 2022 02:56:10 -
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -25,10 +26,13 @@
 #include 
 #include 
 
+struct acpitimer_softc;
+
 int acpitimermatch(struct device *, void *, void *);
 void acpitimerattach(struct device *, struct device *, void *);
-
+void acpitimer_delay(int);
 u_int acpi_get_timecount(struct timecounter *tc);
+uint32_t acpitimer_read(struct acpitimer_softc *);
 
 static struct timecounter acpi_timecounter = {
.tc_get_timecount = acpi_get_timecount,
@@ -98,18 +102,45 @@ acpitimerattach(struct device *parent, s
acpi_timecounter.tc_priv = sc;
acpi_timecounter.tc_name = sc->sc_dev.dv_xname;
	tc_init(&acpi_timecounter);
+
+#if defined(__amd64__) || defined(__i386__)
+   if (delay_func == i8254_delay)
+   delay_func = acpitimer_delay;
+#endif
 #if defined(__amd64__)
extern void cpu_recalibrate_tsc(struct timecounter *);
	cpu_recalibrate_tsc(&acpi_timecounter);
 #endif
 }
 
+void
+acpitimer_delay(int usecs)
+{
+   uint64_t count = 0, cycles;
+   struct acpitimer_softc *sc = acpi_timecounter.tc_priv;
+   uint32_t mask = acpi_timecounter.tc_counter_mask;
+   uint32_t val1, val2;
+
+   val2 = acpitimer_read(sc);
+   cycles = usecs * acpi_timecounter.tc_frequency / 1000000;
+   while (count < cycles) {
+   CPU_BUSY_CYCLE();
+   val1 = val2;
+   val2 = acpitimer_read(sc);
+   count += (val2 - val1) & mask;
+   }
+}
 
 u_int
 acpi_get_timecount(struct timecounter *tc)
 {
-   struct acpitimer_softc *sc = tc->tc_priv;
-   u_int u1, u2, u3;
+   return acpitimer_read(tc->tc_priv);
+}
+
+uint32_t
+acpitimer_read(struct acpitimer_softc *sc)
+{
+   uint32_t u1, u2, u3;
 
u2 = bus_space_read_4(sc->sc_iot, sc->sc_ioh, 0);
u3 = bus_space_read_4(sc->sc_iot, sc->sc_ioh, 0);
Index: acpihpet.c
===================================================================
RCS file: /cvs/src/sys/dev/acpi/acpihpet.c,v
retrieving revision 1.26
diff -u -p -r1.26 acpihpet.c
--- acpihpet.c  6 Apr 2022 18:59:27 -   1.26
+++ acpihpet.c  17 Aug 2022 02:56:10 -
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -31,7 +32,7 @@ int acpihpet_attached;
 int acpihpet_match(struct device *, void *, void *);
 void acpihpet_attach(struct device *, struct device *, void *);
 int acpihpet_activate(struct device *, int);
-
+void acpihpet_delay(int);
 u_int acpihpet_gettime(struct timecounter *tc);
 
 uint64_t   acpihpet_r(bus_space_tag_t _iot, bus_space_handle_t _ioh,
@@ -262,15 +263,37 @@ acpihpet_attach(struct device *parent, s
	freq = 1000000000000000ull / period;
printf(": %lld Hz\n", freq);
 

Re: [RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-16 Thread Scott Cheloha
On Sun, Aug 14, 2022 at 11:24:37PM -0500, Scott Cheloha wrote:
> 
> In the future when the LAPIC timer is run in oneshot mode there will
> be no lapic_delay().
> 
> [...]
> 
> This is *very* bad for older amd64 machines, because you are left with
> i8254_delay().
> 
> I would like to offer a less awful delay(9) implementation for this
> class of hardware.  Otherwise we may trip over bizarre phantom bugs on
> MP kernels because only one CPU can read the i8254 at a time.
> 
> [...]
> 
> Real i386 hardware should be fine.  Later models with an ACPI PM timer
> will be fine using acpitimer_delay() instead of i8254_delay().
> 
> [...]
> 
> Here are the sample measurements from my 2017 laptop (kaby lake
> refresh) running the attached patch.  It takes longer than a
> microsecond to read either of the ACPI timers.  The PM timer is better
> than the HPET.  The HPET is a bit better than the i8254.  I hope the
> numbers are a little better on older hardware.
> 
> acpitimer_test_delay:  expected  0.000001000  actual  0.000010638  error  0.000009638
> acpitimer_test_delay:  expected  0.000010000  actual  0.000015464  error  0.000005464
> acpitimer_test_delay:  expected  0.000100000  actual  0.000107619  error  0.000007619
> acpitimer_test_delay:  expected  0.001000000  actual  0.001007275  error  0.000007275
> acpitimer_test_delay:  expected  0.010000000  actual  0.010007891  error  0.000007891
> 
> acpihpet_test_delay:   expected  0.000001000  actual  0.000022208  error  0.000021208
> acpihpet_test_delay:   expected  0.000010000  actual  0.000031690  error  0.000021690
> acpihpet_test_delay:   expected  0.000100000  actual  0.000112647  error  0.000012647
> acpihpet_test_delay:   expected  0.001000000  actual  0.001021480  error  0.000021480
> acpihpet_test_delay:   expected  0.010000000  actual  0.010013736  error  0.000013736
> 
> i8254_test_delay:  expected  0.000001000  actual  0.000040110  error  0.000039110
> i8254_test_delay:  expected  0.000010000  actual  0.000039471  error  0.000029471
> i8254_test_delay:  expected  0.000100000  actual  0.000128031  error  0.000028031
> i8254_test_delay:  expected  0.001000000  actual  0.001024586  error  0.000024586
> i8254_test_delay:  expected  0.010000000  actual  0.010021859  error  0.000021859

Attached is an updated patch.  I left the test measurement code in
place because I would like to see a test on a real i386 machine, just
to make sure it works as expected.  I can't imagine why it wouldn't
work, but we should never assume anything.

Changes from v1:

- Actually set delay_func from acpitimerattach() and
  acpihpet_attach().

  I think it's safe to assume, on real hardware, that the ACPI PMT is
  preferable to the i8254 and the HPET is preferable to both of them.

  This is not *always* true, but it is true on the older machines that
  can't use tsc_delay(), so the assumption works in practice.

  Outside of those three timers, the hierarchy gets murky.  There are
  other timers that are better than the HPET, but they aren't always
  available.  If those timers are already providing delay_func this
  code does not usurp them.

- Duplicate test measurement code from amd64/lapic.c into i386/lapic.c.
  Will be removed in the committed version.

- Use bus_space_read_8() in acpihpet.c if it's available.  The HPET is
  a 64-bit counter and the spec permits 32-bit or 64-bit aligned access.

  As one might predict, this cuts the overhead in half because we're
  doing half as many reads.

  This part can go into a separate commit, but I thought it was neat
  so I'm including it here.

One remaining question I have:

Is there a nice way to test whether ACPI PMT support is compiled into
the kernel?  We can assume the existence of i8254_delay() because
clock.c is required on amd64 and i386.  However, acpitimer.c is
optional, so acpitimer_delay() isn't necessarily there.

I would rather not introduce a hard requirement on acpitimer.c into
acpihpet.c if there's an easy way to check for the latter.

Any ideas?
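A sketch of one possible answer (assuming acpitimer is a counted device, so config(8) emits an "acpitimer.h" header defining NACPITIMER; the wrapper function name here is hypothetical):

```c
/* acpihpet.c -- sketch, not a tested diff */
#include "acpitimer.h"	/* config(8)-generated; defines NACPITIMER */

void
acpihpet_pick_delay(void)
{
#if NACPITIMER > 0
	/* acpitimer.c is compiled in: take over from it or the i8254. */
	if (delay_func == i8254_delay || delay_func == acpitimer_delay)
		delay_func = acpihpet_delay;
#else
	/* No ACPI PM timer support: only the i8254 can be replaced. */
	if (delay_func == i8254_delay)
		delay_func = acpihpet_delay;
#endif
}
```

This keeps acpihpet.c buildable without acpitimer.c, at the cost of a preprocessor conditional.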

Anyone have i386 hardware results?  If I'm reading the timeline right,
most P6 machines and beyond (NetBurst, etc) will have an ACPI PMT.  I
don't know if any real x86 motherboards shipped with an HPET, but it's
possible.

Here are my updated results with the bus_space_read_8 change:

acpitimer_test_delay:  expected  0.000001000  actual  0.000010607  error  0.000009607
acpitimer_test_delay:  expected  0.000010000  actual  0.000015491  error  0.000005491
acpitimer_test_delay:  expected  0.000100000  actual  0.000107734  error  0.000007734
acpitimer_test_delay:  expected  0.001000000  actual  0.001008006  error  0.000008006
acpitimer_test_delay:  expected  0.010000000  actual  0.010007042  error  0.000007042

acpihpet_test_delay

[RFC] acpi: add acpitimer_delay(), acpihpet_delay()

2022-08-14 Thread Scott Cheloha
Hi,

In the future when the LAPIC timer is run in oneshot mode there will
be no lapic_delay().

This is fine if you have a constant TSC, because we have tsc_delay().

This is *very* bad for older amd64 machines, because you are left with
i8254_delay().

I would like to offer a less awful delay(9) implementation for this
class of hardware.  Otherwise we may trip over bizarre phantom bugs on
MP kernels because only one CPU can read the i8254 at a time.

I think patrick@ was struggling with some version of that problem last
year, but in a VM.

Real i386 hardware should be fine.  Later models with an ACPI PM timer
will be fine using acpitimer_delay() instead of i8254_delay().

If this seems reasonable to people I will come back with a cleaned up
patch for testing.

Thoughts?  Preferences?

-Scott

Here are the sample measurements from my 2017 laptop (kaby lake
refresh) running the attached patch.  It takes longer than a
microsecond to read either of the ACPI timers.  The PM timer is better
than the HPET.  The HPET is a bit better than the i8254.  I hope the
numbers are a little better on older hardware.

acpitimer_test_delay:  expected  0.000001000  actual  0.000010638  error  0.000009638
acpitimer_test_delay:  expected  0.000010000  actual  0.000015464  error  0.000005464
acpitimer_test_delay:  expected  0.000100000  actual  0.000107619  error  0.000007619
acpitimer_test_delay:  expected  0.001000000  actual  0.001007275  error  0.000007275
acpitimer_test_delay:  expected  0.010000000  actual  0.010007891  error  0.000007891

acpihpet_test_delay:   expected  0.000001000  actual  0.000022208  error  0.000021208
acpihpet_test_delay:   expected  0.000010000  actual  0.000031690  error  0.000021690
acpihpet_test_delay:   expected  0.000100000  actual  0.000112647  error  0.000012647
acpihpet_test_delay:   expected  0.001000000  actual  0.001021480  error  0.000021480
acpihpet_test_delay:   expected  0.010000000  actual  0.010013736  error  0.000013736

i8254_test_delay:  expected  0.000001000  actual  0.000040110  error  0.000039110
i8254_test_delay:  expected  0.000010000  actual  0.000039471  error  0.000029471
i8254_test_delay:  expected  0.000100000  actual  0.000128031  error  0.000028031
i8254_test_delay:  expected  0.001000000  actual  0.001024586  error  0.000024586
i8254_test_delay:  expected  0.010000000  actual  0.010021859  error  0.000021859

Index: dev/acpi/acpihpet.c
===================================================================
RCS file: /cvs/src/sys/dev/acpi/acpihpet.c,v
retrieving revision 1.26
diff -u -p -r1.26 acpihpet.c
--- dev/acpi/acpihpet.c 6 Apr 2022 18:59:27 -   1.26
+++ dev/acpi/acpihpet.c 15 Aug 2022 04:21:58 -
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -31,8 +32,9 @@ int acpihpet_attached;
 int acpihpet_match(struct device *, void *, void *);
 void acpihpet_attach(struct device *, struct device *, void *);
 int acpihpet_activate(struct device *, int);
-
+void acpihpet_delay(u_int);
 u_int acpihpet_gettime(struct timecounter *tc);
+void acpihpet_test_delay(u_int);
 
 uint64_t   acpihpet_r(bus_space_tag_t _iot, bus_space_handle_t _ioh,
bus_size_t _ioa);
@@ -262,7 +264,7 @@ acpihpet_attach(struct device *parent, s
	freq = 1000000000000000ull / period;
printf(": %lld Hz\n", freq);
 
-   hpet_timecounter.tc_frequency = (uint32_t)freq;
+   hpet_timecounter.tc_frequency = freq;
hpet_timecounter.tc_priv = sc;
hpet_timecounter.tc_name = sc->sc_dev.dv_xname;
	tc_init(&hpet_timecounter);
@@ -273,10 +275,43 @@ acpihpet_attach(struct device *parent, s
acpihpet_attached++;
 }
 
+void
+acpihpet_delay(u_int usecs)
+{
+   uint64_t d, s;
+   struct acpihpet_softc *sc = hpet_timecounter.tc_priv;
+
+   d = usecs * hpet_timecounter.tc_frequency / 1000000;
+   s = acpihpet_r(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER);
+   while (acpihpet_r(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER) - s < d)
+   CPU_BUSY_CYCLE();
+}
+
 u_int
 acpihpet_gettime(struct timecounter *tc)
 {
struct acpihpet_softc *sc = tc->tc_priv;
 
return (bus_space_read_4(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER));
+}
+
+void
+acpihpet_test_delay(u_int usecs)
+{
+   struct timespec ac, er, ex, t0, t1;
+
+   if (!acpihpet_attached) {
+   printf("%s: (no hpet attached)\n", __func__);
+   return;
+   }
+
+   nanouptime(&t0);
+   acpihpet_delay(usecs);
+   nanouptime(&t1);
+   timespecsub(&t1, &t0, &ac);
+   NSEC_TO_TIMESPEC(usecs * 1000ULL, &ex);
+   timespecsub(&ac, &ex, &er);
+   printf("%s: expected %lld.%09ld actual %lld.%09ld error %lld.%09ld\n",
+   __func__, ex.tv_sec, ex.tv_nsec, ac.tv_sec, ac.tv_nsec,
+   er.tv_sec, er.tv_nsec);
 }
Index: dev/acpi/acpitimer.c
===================================================================
RCS file: 

renice(8): don't succeed after 256 errors

2022-08-11 Thread Scott Cheloha
This is a good one.

$ renice -n -1 -p 1 ; echo $?
renice: setpriority: 1: Operation not permitted
1
$ renice -n -1 -p 1 1 ; echo $?
renice: setpriority: 1: Operation not permitted
renice: setpriority: 1: Operation not permitted
2
$ renice -n -1 -p 1 1 1 ; echo $?
renice: setpriority: 1: Operation not permitted
renice: setpriority: 1: Operation not permitted
renice: setpriority: 1: Operation not permitted
3
$ renice -n -1 -p $(jot -b 1 256) 2>/dev/null; echo $?
0

Fix is to just set error instead of incrementing it.

ok?

Index: renice.c
===================================================================
RCS file: /cvs/src/usr.bin/renice/renice.c,v
retrieving revision 1.21
diff -u -p -r1.21 renice.c
--- renice.c25 Jan 2019 00:19:26 -  1.21
+++ renice.c11 Aug 2022 22:49:23 -
@@ -155,14 +155,14 @@ main(int argc, char **argv)
 static int
 renice(struct renice_param *p, struct renice_param *end)
 {
-   int new, old, errors = 0;
+   int new, old, error = 0;
 
for (; p < end; p++) {
errno = 0;
old = getpriority(p->id_type, p->id);
if (errno) {
warn("getpriority: %d", p->id);
-   errors++;
+   error = 1;
continue;
}
if (p->pri_type == RENICE_INCREMENT)
@@ -171,13 +171,13 @@ renice(struct renice_param *p, struct re
p->pri < PRIO_MIN ? PRIO_MIN : p->pri;
if (setpriority(p->id_type, p->id, new) == -1) {
warn("setpriority: %d", p->id);
-   errors++;
+   error = 1;
continue;
}
printf("%d: old priority %d, new priority %d\n",
p->id, old, new);
}
-   return (errors);
+   return error;
 }
 
 __dead void



Re: echo(1): check for stdio errors

2022-08-10 Thread Scott Cheloha
On Thu, Aug 11, 2022 at 02:22:08AM +0200, Jeremie Courreges-Anglas wrote:
> On Wed, Aug 10 2022, Scott Cheloha  wrote:
> > [...]
> >
> > 1. Our ksh(1) already checks for stdout errors in the echo builtin.
> 
> So do any of the scripts in our source tree use /bin/echo for whatever
> reason?  If so, could one of these scripts be broken if /bin/echo
> started to report an error?  Shouldn't those scripts be reviewed?

I didn't look.

There are... hundreds files that look like shell scripts in src.

$ cd /usr/src
$ find . -type f -exec egrep -l '\#.*\!.*sh' {} + > ~/src-shell-script-paths
$ wc -l ~/src-shell-script-paths
1118 /home/ssc/src-shell-script-paths

A lot of them are in regress/.

I guess I better start looking.



Re: echo(1): check for stdio errors

2022-08-10 Thread Scott Cheloha
On Wed, Aug 10, 2022 at 02:23:08PM -0600, Theo de Raadt wrote:
> Scott Cheloha  wrote:
> 
> > On Wed, Aug 10, 2022 at 12:26:17PM -0600, Theo de Raadt wrote:
> > > Scott Cheloha  wrote:
> > > 
> > > > We're sorta-kinda circling around adding the missing (?) stdio error
> > > > checking to other utilities in bin/ and usr.bin/, no?  I want to be
> > > > sure I understand how to do the next patch, because if we do that it
> > > > will probably be a bunch of programs all at once.
> > > 
> > > This specific program has not checked for this condition since at least
> 2 AT&T UNIX.
> > > 
> > > Your change does not just add a new warning.  It adds a new exit code
> > > condition.
> > > 
> > > Some scripts using echo, which accepted the condition because echo would
> > > exit 0 and not check for this condition, will now see this exit 1.  Some
> > > scripts will abort, because they use "set -o errexit" or similar.
> > > 
> > > You are changing the exit code for a command which is used a lot.
> > > 
> > > POSIX does not require or specify exit 1 for this condition.  If you
> > > disagree, please show where it says so.
> > 
> > It's the usual thing.  >0 if "an error occurred".
> 
> The 40 year old code base says otherwise.
> 
> > Here is my thinking:
> > 
> > echo(1) has ONE job: print the arguments given.
> > 
> > If it fails to print those arguments, shouldn't we signal that to the
> > program that forked echo(1)?
> 
> Only if you validate all callers can handle this change in behaviour.
> 
> > How is echo(1) supposed to differentiate between a write(2) that is
> > allowed to fail, e.g. a diagnostic printout from fw_update to the
> > user's stderr, and one that is not allowed to fail?
> 
> Perhaps it is not supposed to validate this problem  in 2022, because it
> didn't validate it for 40 years.
> 
> > Consider this scenario:
> > 
> > 1.  A shell script uses echo(1) to write something to a file.
> > 
> > /bin/echo foo.dat >> /var/workerd/data-processing-list
> > 
> > 2.  The bytes don't arrive on disk because the file system is full.
> > 
> > 3.  The shell script succeeds because echo(1) can't fail, even if
> > it fails to do what it was told to do.
> > 
> > Isn't that bad?
> > 
> > And it isn't necessarily true that some other thing will fail later
> > and the whole interlocking system will fail.  ENOSPC is a transient
> > condition.  One write(2) can fail and the next write(2) can succeed.
> 
> Yet, for 40 years noone complained.
> 
> Consider the situation you break and change the behaviour of 1000's of
> shell scripts, and haven'd lifted your finger once to review all the
> shell scripts that call echo.
> 
> Have you even compared this behaviour to the echo built-ins in all
> the shells?

I assume what you mean to say is, roughly:

Gee, this seems risky.

What do other echo implementations do?

1. Our ksh(1) already checks for stdout errors in the echo builtin.

2. FreeBSD's /bin/echo has checked for writev(2) errors in /bin/echo
   since 2003:

https://cgit.freebsd.org/src/commit/bin/echo/echo.c?id=91b7d6dc5871f532b1a86ee76389a9bc348bdf58

3. NetBSD's /bin/echo has checked for stdout errors with ferror(3)
   since 2008:

http://cvsweb.netbsd.org/bsdweb.cgi/src/bin/echo/echo.c?rev=1.18&content-type=text/x-cvsweb-markup&only_with_tag=MAIN

4. NetBSD's /bin/sh echo builtin has checked for write errors since
   2008:

http://cvsweb.netbsd.org/bsdweb.cgi/src/bin/sh/bltin/echo.c?rev=1.14&content-type=text/x-cvsweb-markup&only_with_tag=MAIN

5. OpenSolaris has checked for fflush(3) errors in /usr/bin/echo since
   2005 (OpenSolaris launch):

https://github.com/illumos/illumos-gate/blob/7c478bd95313f5f23a4c958a745db2134aa03244/usr/src/cmd/echo/echo.c#L144

6. Looking forward, illumos inherited and retains the behavior in
   their /usr/bin/echo.

7. Extrapolating backward, we can assume Solaris did that checking in
   /usr/bin/echo prior to 2005.

8. GNU Coreutils echo has checked for fflush(3) and fclose(3) errors on
   stdout since 2000:

https://git.savannah.gnu.org/cgit/coreutils.git/commit/src/echo.c?id=d3683509b3953beb014e540f6d6194658ede1dea

   They use close_stdout() in an atexit(3) hook.  close_stdout() is a
   convenience function provided by gnulib since 1998 that does what I
   described:

https://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=23928550db5d400f27fa67de29c738ca324a31ea;hp=f76477e515b36a1e10f7734aac3c5478ccf75989

   Maybe of note is that they do this atexit(3) stdout flush/close
   error checking for many of their utilities.

9. The GNU Bash echo builtin has checked for write errors since v2.04,
   in 2000:

https://git.savannah.gnu.org/cgit/bash.git/commit/builtins/echo.def?id=bb70624e964126b7ac4ff085ba163a9c35ffa18f

   They even noted it in the CHANGES file for that release:

https://git.savannah.gnu.org/cgit/bash.git/commit/CHANGES?id=bb70624e964126b7ac4ff085ba163a9c35ffa18f

--

I don't think that we are first movers in this case.



Re: echo(1): check for stdio errors

2022-08-10 Thread Scott Cheloha
On Wed, Aug 10, 2022 at 12:26:17PM -0600, Theo de Raadt wrote:
> Scott Cheloha  wrote:
> 
> > We're sorta-kinda circling around adding the missing (?) stdio error
> > checking to other utilities in bin/ and usr.bin/, no?  I want to be
> > sure I understand how to do the next patch, because if we do that it
> > will probably be a bunch of programs all at once.
> 
> This specific program has not checked for this condition since at least
> 2 AT&T UNIX.
> 
> Your change does not just add a new warning.  It adds a new exit code
> condition.
> 
> Some scripts using echo, which accepted the condition because echo would
> exit 0 and not check for this condition, will now see this exit 1.  Some
> scripts will abort, because they use "set -o errexit" or similar.
> 
> You are changing the exit code for a command which is used a lot.
> 
> POSIX does not require or specify exit 1 for this condition.  If you
> disagree, please show where it says so.

It's the usual thing.  >0 if "an error occurred".

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/echo.html

EXIT STATUS

The following exit values shall be returned:

 0
Successful completion.
>0
An error occurred.

CONSEQUENCES OF ERRORS

Default.

> So my question is:  What will be broken by this change?
> 
> Nothing isn't an answer.  I can write a 5 line shell script that will
> observe the change in behaviour.  Many large shell scripts could break
> from this.  I am thinking of fw_update and the installer, but it could
> also be a problem in Makefiles.

Here is my thinking:

echo(1) has ONE job: print the arguments given.

If it fails to print those arguments, shouldn't we signal that to the
program that forked echo(1)?

How is echo(1) supposed to differentiate between a write(2) that is
allowed to fail, e.g. a diagnostic printout from fw_update to the
user's stderr, and one that is not allowed to fail?

> > I want to be sure I understand how to do the next patch, because if we
> > do that it will probably be a bunch of programs all at once.
> 
> If you cannot speak to the exit code command changing for this one
> simple program, I think there is no case for adding it to hundreds of
> other programs.  Unless POSIX specifies the requirement, I'd like to see
> some justification.
> 
> There will always be situations that UNIX didn't anticipate or handle,
> and then POSIX failed to specify.  Such things are now unhandled, probably
> forever, and have become de facto standards.
> 
> On the balance, is your diff improving on some dangerous problem, or is
> it introducing a vast number of dangerous new risks which cannot be
> identified (and which would require an audit of every known script
> calling echo).  Has such an audit been started?

Consider this scenario:

1.  A shell script uses echo(1) to write something to a file.

/bin/echo foo.dat >> /var/workerd/data-processing-list

2.  The bytes don't arrive on disk because the file system is full.

3.  The shell script succeeds because echo(1) can't fail, even if
it fails to do what it was told to do.

Isn't that bad?

And it isn't necessarily true that some other thing will fail later
and the whole interlocking system will fail.  ENOSPC is a transient
condition.  One write(2) can fail and the next write(2) can succeed.



Re: echo(1): check for stdio errors

2022-08-10 Thread Scott Cheloha
On Sat, Jul 30, 2022 at 05:23:37PM -0600, Todd C. Miller wrote:
> On Sat, 30 Jul 2022 18:19:02 -0500, Scott Cheloha wrote:
> 
> > Bump.  The standard's error cases for fflush(3) are identical to those
> > for fclose(3):
> >
> > https://pubs.opengroup.org/onlinepubs/9699919799/functions/fflush.html
> > https://pubs.opengroup.org/onlinepubs/9699919799/functions/fclose.html
> >
> > Is the fact that our fclose(3) can succeed even if the error flag is
> > set a bug?
> 
> As far as I can tell, neither fflush() nor fclose() check the status
> of the error flag, though they may set it of course.  That is why
> I was suggesting an explicit ferror() call at the end.

I'm sorry, I'm having a dumb moment, I don't quite understand what
you're looking for.

Please tweak my patch so it's the way you want it, with the ferror(3)
call in the right spot.

We're sorta-kinda circling around adding the missing (?) stdio error
checking to other utilities in bin/ and usr.bin/, no?  I want to be
sure I understand how to do the next patch, because if we do that it
will probably be a bunch of programs all at once.

Index: echo.c
===
RCS file: /cvs/src/bin/echo/echo.c,v
retrieving revision 1.10
diff -u -p -r1.10 echo.c
--- echo.c  9 Oct 2015 01:37:06 -   1.10
+++ echo.c  10 Aug 2022 18:00:12 -
@@ -53,12 +53,15 @@ main(int argc, char *argv[])
nflag = 0;
 
while (*argv) {
-   (void)fputs(*argv, stdout);
-   if (*++argv)
-   putchar(' ');
+   if (fputs(*argv, stdout) == EOF)
+   err(1, "stdout");
+   if (*++argv && putchar(' ') == EOF)
+   err(1, "stdout");
}
-   if (!nflag)
-   putchar('\n');
+   if (!nflag && putchar('\n') == EOF)
+   err(1, "stdout");
+   if (fflush(stdout) == EOF || ferror(stdout) || fclose(stdout) == EOF)
+   err(1, "stdout");
 
return 0;
 }



Re: ts(1): parse input format string only once

2022-08-10 Thread Scott Cheloha
On Fri, Jul 29, 2022 at 08:13:14AM -0500, Scott Cheloha wrote:
> On Wed, Jul 13, 2022 at 12:50:24AM -0500, Scott Cheloha wrote:
> > We reduce overhead if we only parse the user's format string once.  To
> > achieve that, this patch does the following:
> > 
> > [...]
> > 
> > - When parsing the user format string in fmtfmt(), keep a list of
> >   where each microsecond substring lands in buf.  We'll need it later.
> > 
> > - Move the printing part of fmtfmt() into a new function, fmtprint().
> >   fmtprint() is now called from the main loop instead of fmtfmt().
> > 
> > - In fmtprint(), before calling strftime(3), update any microsecond
> >   substrings in buf using the list we built earlier in fmtfmt().  Note
> >   that if there aren't any such substrings we don't call snprintf(3)
> >   at all.
> > 
> > [...]
> 
> Two week bump.
> 
> Here is a stripped-down patch with only the above changes.  Hopefully
> this makes the intent of the patch more obvious.

Four week bump + rebase.

Index: ts.c
===
RCS file: /cvs/src/usr.bin/ts/ts.c,v
retrieving revision 1.9
diff -u -p -r1.9 ts.c
--- ts.c3 Aug 2022 16:54:30 -   1.9
+++ ts.c10 Aug 2022 17:49:53 -
@@ -17,6 +17,7 @@
  */
 
 #include 
+#include 
 #include 
 
 #include 
@@ -27,13 +28,20 @@
 #include 
 #include 
 
+SIMPLEQ_HEAD(, usec) usec_queue = SIMPLEQ_HEAD_INITIALIZER(usec_queue);
+struct usec {
+   SIMPLEQ_ENTRY(usec) next;
+   char *pos;
+};
+
 static char*format = "%b %d %H:%M:%S";
 static char*buf;
 static char*outbuf;
 static size_t   bufsize;
 static size_t   obsize;
 
-static void fmtfmt(const struct timespec *);
+static void fmtfmt(void);
+static void fmtprint(const struct timespec *);
 static void __dead  usage(void);
 
 int
@@ -90,6 +98,8 @@ main(int argc, char *argv[])
if ((outbuf = calloc(1, obsize)) == NULL)
err(1, NULL);
 
+   fmtfmt();
+
/* force UTC for interval calculations */
if (iflag || sflag)
if (setenv("TZ", "UTC", 1) == -1)
@@ -108,7 +118,7 @@ main(int argc, char *argv[])
timespecadd(, _offset, );
else
ts = now;
-   fmtfmt();
+   fmtprint();
if (iflag)
start = now;
}
@@ -134,15 +144,11 @@ usage(void)
  * so you can format while you format
  */
 static void
-fmtfmt(const struct timespec *ts)
+fmtfmt(void)
 {
-   struct tm *tm;
-   char *f, us[7];
-
-   if ((tm = localtime(>tv_sec)) == NULL)
-   err(1, "localtime");
+   char *f;
+   struct usec *u;
 
-   snprintf(us, sizeof(us), "%06ld", ts->tv_nsec / 1000);
strlcpy(buf, format, bufsize);
f = buf;
 
@@ -161,12 +167,34 @@ fmtfmt(const struct timespec *ts)
f[0] = f[1];
f[1] = '.';
f += 2;
+   u = malloc(sizeof u);
+   if (u == NULL)
+   err(1, NULL);
+   u->pos = f;
+   SIMPLEQ_INSERT_TAIL(_queue, u, next);
l = strlen(f);
memmove(f + 6, f, l + 1);
-   memcpy(f, us, 6);
f += 6;
}
} while (*f != '\0');
+}
+
+static void
+fmtprint(const struct timespec *ts)
+{
+   char us[8];
+   struct tm *tm;
+   struct usec *u;
+
+   if ((tm = localtime(>tv_sec)) == NULL)
+   err(1, "localtime");
+
+   /* Update any microsecond substrings in the format buffer. */
+   if (!SIMPLEQ_EMPTY(_queue)) {
+   snprintf(us, sizeof(us), "%06ld", ts->tv_nsec / 1000);
+   SIMPLEQ_FOREACH(u, _queue, next)
+   memcpy(u->pos, us, 6);
+   }
 
*outbuf = '\0';
if (*buf != '\0') {



Re: mips64: trigger deferred timer interrupt from splx(9)

2022-08-09 Thread Scott Cheloha
On Tue, Aug 09, 2022 at 06:02:10PM +, Miod Vallat wrote:
> > Other platforms (architectures?) (powerpc, powerpc64, arm64, riscv64)
> > multiplex their singular interrupt clock to schedule both a
> > fixed-period hardclock and a pseudorandom statclock.
> > 
> > This is the direction I intend to take every platform, mips64
> > included, after the next release.
> > 
> > In that context, would there be any reason to prefer glxclk to
> > CP0.count?
> 
> No. The cop0 timer is supposed to be the most reliable timer available.
> (although one may argue that, on sgi, the xbow timer on some systems is
> even better quality)

Alright, got it.  If glxclk provides no other utility aside from an
interrupt clock on loongson, then you and I can coordinate unhooking
it when we switch loongson to the new clockintr code in the Fall.

If I'm missing something and it does other work, then nevermind.

Does the latest patch work on any loongson machines you have?

I didn't see any other splx(9) implementations aside from bonito and
the one for loongson3.

Index: mips64/mips64/clock.c
===
RCS file: /cvs/src/sys/arch/mips64/mips64/clock.c,v
retrieving revision 1.45
diff -u -p -r1.45 clock.c
--- mips64/mips64/clock.c   6 Apr 2022 18:59:26 -   1.45
+++ mips64/mips64/clock.c   9 Aug 2022 14:48:47 -
@@ -60,6 +60,7 @@ const struct cfattach clock_ca = {
 };
 
 void   cp0_startclock(struct cpu_info *);
+void   cp0_trigger_int5(void);
 uint32_t cp0_int5(uint32_t, struct trapframe *);
 
 int
@@ -86,19 +87,20 @@ clockattach(struct device *parent, struc
cp0_set_compare(cp0_get_count() - 1);
 
md_startclock = cp0_startclock;
+   md_triggerclock = cp0_trigger_int5;
 }
 
 /*
  *  Interrupt handler for targets using the internal count register
  *  as interval clock. Normally the system is run with the clock
  *  interrupt always enabled. Masking is done here and if the clock
- *  can not be run the tick is just counted and handled later when
- *  the clock is logically unmasked again.
+ *  cannot be run the tick is handled later when the clock is logically
+ *  unmasked again.
  */
 uint32_t
 cp0_int5(uint32_t mask, struct trapframe *tf)
 {
-   u_int32_t clkdiff;
+   u_int32_t clkdiff, pendingticks = 0;
struct cpu_info *ci = curcpu();
 
/*
@@ -113,15 +115,26 @@ cp0_int5(uint32_t mask, struct trapframe
}
 
/*
+* If the clock interrupt is masked, defer any work until it
+* is unmasked from splx(9).
+*/
+   if (tf->ipl >= IPL_CLOCK) {
+   ci->ci_clock_deferred = 1;
+   cp0_set_compare(cp0_get_count() - 1);
+   return CR_INT_5;
+   }
+   ci->ci_clock_deferred = 0;
+
+   /*
 * Count how many ticks have passed since the last clock interrupt...
 */
clkdiff = cp0_get_count() - ci->ci_cpu_counter_last;
while (clkdiff >= ci->ci_cpu_counter_interval) {
ci->ci_cpu_counter_last += ci->ci_cpu_counter_interval;
clkdiff = cp0_get_count() - ci->ci_cpu_counter_last;
-   ci->ci_pendingticks++;
+   pendingticks++;
}
-   ci->ci_pendingticks++;
+   pendingticks++;
ci->ci_cpu_counter_last += ci->ci_cpu_counter_interval;
 
/*
@@ -132,32 +145,64 @@ cp0_int5(uint32_t mask, struct trapframe
clkdiff = cp0_get_count() - ci->ci_cpu_counter_last;
if ((int)clkdiff >= 0) {
ci->ci_cpu_counter_last += ci->ci_cpu_counter_interval;
-   ci->ci_pendingticks++;
+   pendingticks++;
cp0_set_compare(ci->ci_cpu_counter_last);
}
 
/*
-* Process clock interrupt unless it is currently masked.
+* Process clock interrupt.
 */
-   if (tf->ipl < IPL_CLOCK) {
 #ifdef MULTIPROCESSOR
-   register_t sr;
+   register_t sr;
 
-   sr = getsr();
-   ENABLEIPI();
+   sr = getsr();
+   ENABLEIPI();
 #endif
-   while (ci->ci_pendingticks) {
-   atomic_inc_long(
-   (unsigned long *)_clock_count.ec_count);
-   hardclock(tf);
-   ci->ci_pendingticks--;
-   }
+   while (pendingticks) {
+   atomic_inc_long((unsigned long *)_clock_count.ec_count);
+   hardclock(tf);
+   pendingticks--;
+   }
 #ifdef MULTIPROCESSOR
-   setsr(sr);
+   setsr(sr);
 #endif
-   }
 
return CR_INT_5;/* Clock is always on 5 */
+}
+
+unsigned long cp0_raise_calls, cp0_raise_miss;
+
+/*
+ * Trigger the clock interrupt.
+ * 
+ * We need to spin until either (a) INT5 is pending or (b) the compare
+ * register leads the count register, i.e. we know INT5 will be pending
+ * very soon.
+ *
+ * To ensure we don't spin forever, double the compensatory offset
+ 

Re: mips64: trigger deferred timer interrupt from splx(9)

2022-08-09 Thread Scott Cheloha
On Tue, Aug 09, 2022 at 02:56:54PM +, Miod Vallat wrote:
> > Do those machines not have Coprocessor 0?  If they do, why would you
> > prefer glxclk over CP0?
> 
> cop0 only provides one timer, from which both the scheduling clock and
> statclk are derived. glxclk allows two timers to be used, and thus can
> provide a more reliable statclk (see the Torek paper, etc - it is even
> mentioned in the glxclk manual page).

Other platforms (architectures?) (powerpc, powerpc64, arm64, riscv64)
multiplex their singular interrupt clock to schedule both a
fixed-period hardclock and a pseudorandom statclock.

This is the direction I intend to take every platform, mips64
included, after the next release.

In that context, would there be any reason to prefer glxclk to
CP0.count?



Re: mips64: trigger deferred timer interrupt from splx(9)

2022-08-09 Thread Scott Cheloha
On Tue, Aug 09, 2022 at 02:03:31PM +, Visa Hankala wrote:
> On Mon, Aug 08, 2022 at 02:52:37AM -0500, Scott Cheloha wrote:
> > One thing I'm still uncertain about is how glxclk fits into the
> > loongson picture.  It's an interrupt clock that runs hardclock() and
> > statclock(), but the code doesn't do any logical masking, so I don't
> > know whether or not I need to adjust anything in that code or account
> > for it at all.  If there's no logical masking there's no deferral, so
> > it would never call need to call md_triggerclock() from splx(9).
> 
> I think the masking of glxclk interrupts are handled by the ISA
> interrupt code.

Do those machines not have Coprocessor 0?  If they do, why would you
prefer glxclk over CP0?

> The patch misses md_triggerclock definition in mips64_machdep.c.

Whoops, forgot that file.  Fuller patch below.

> I have put this to the test on the mips64 ports builder machines.

Cool, thank you for testing.

Index: mips64/mips64/clock.c
===
RCS file: /cvs/src/sys/arch/mips64/mips64/clock.c,v
retrieving revision 1.45
diff -u -p -r1.45 clock.c
--- mips64/mips64/clock.c   6 Apr 2022 18:59:26 -   1.45
+++ mips64/mips64/clock.c   9 Aug 2022 14:48:47 -
@@ -60,6 +60,7 @@ const struct cfattach clock_ca = {
 };
 
 void   cp0_startclock(struct cpu_info *);
+void   cp0_trigger_int5(void);
 uint32_t cp0_int5(uint32_t, struct trapframe *);
 
 int
@@ -86,19 +87,20 @@ clockattach(struct device *parent, struc
cp0_set_compare(cp0_get_count() - 1);
 
md_startclock = cp0_startclock;
+   md_triggerclock = cp0_trigger_int5;
 }
 
 /*
  *  Interrupt handler for targets using the internal count register
  *  as interval clock. Normally the system is run with the clock
  *  interrupt always enabled. Masking is done here and if the clock
- *  can not be run the tick is just counted and handled later when
- *  the clock is logically unmasked again.
+ *  cannot be run the tick is handled later when the clock is logically
+ *  unmasked again.
  */
 uint32_t
 cp0_int5(uint32_t mask, struct trapframe *tf)
 {
-   u_int32_t clkdiff;
+   u_int32_t clkdiff, pendingticks = 0;
struct cpu_info *ci = curcpu();
 
/*
@@ -113,15 +115,26 @@ cp0_int5(uint32_t mask, struct trapframe
}
 
/*
+* If the clock interrupt is masked, defer any work until it
+* is unmasked from splx(9).
+*/
+   if (tf->ipl >= IPL_CLOCK) {
+   ci->ci_clock_deferred = 1;
+   cp0_set_compare(cp0_get_count() - 1);
+   return CR_INT_5;
+   }
+   ci->ci_clock_deferred = 0;
+
+   /*
 * Count how many ticks have passed since the last clock interrupt...
 */
clkdiff = cp0_get_count() - ci->ci_cpu_counter_last;
while (clkdiff >= ci->ci_cpu_counter_interval) {
ci->ci_cpu_counter_last += ci->ci_cpu_counter_interval;
clkdiff = cp0_get_count() - ci->ci_cpu_counter_last;
-   ci->ci_pendingticks++;
+   pendingticks++;
}
-   ci->ci_pendingticks++;
+   pendingticks++;
ci->ci_cpu_counter_last += ci->ci_cpu_counter_interval;
 
/*
@@ -132,32 +145,64 @@ cp0_int5(uint32_t mask, struct trapframe
clkdiff = cp0_get_count() - ci->ci_cpu_counter_last;
if ((int)clkdiff >= 0) {
ci->ci_cpu_counter_last += ci->ci_cpu_counter_interval;
-   ci->ci_pendingticks++;
+   pendingticks++;
cp0_set_compare(ci->ci_cpu_counter_last);
}
 
/*
-* Process clock interrupt unless it is currently masked.
+* Process clock interrupt.
 */
-   if (tf->ipl < IPL_CLOCK) {
 #ifdef MULTIPROCESSOR
-   register_t sr;
+   register_t sr;
 
-   sr = getsr();
-   ENABLEIPI();
+   sr = getsr();
+   ENABLEIPI();
 #endif
-   while (ci->ci_pendingticks) {
-   atomic_inc_long(
-   (unsigned long *)_clock_count.ec_count);
-   hardclock(tf);
-   ci->ci_pendingticks--;
-   }
+   while (pendingticks) {
+   atomic_inc_long((unsigned long *)_clock_count.ec_count);
+   hardclock(tf);
+   pendingticks--;
+   }
 #ifdef MULTIPROCESSOR
-   setsr(sr);
+   setsr(sr);
 #endif
-   }
 
return CR_INT_5;/* Clock is always on 5 */
+}
+
+unsigned long cp0_raise_calls, cp0_raise_miss;
+
+/*
+ * Trigger the clock interrupt.
+ * 
+ * We need to spin until either (a) INT5 is pending or (b) the compare
+ * register leads the count register, i.e. we know INT5 will be pending
+ * very soon.
+ *
+ * To ensure we don't spin forever, dou

Re: mips64: trigger deferred timer interrupt from splx(9)

2022-08-08 Thread Scott Cheloha
On Sun, Aug 07, 2022 at 11:05:37AM +, Visa Hankala wrote:
> On Sun, Jul 31, 2022 at 01:28:18PM -0500, Scott Cheloha wrote:
> > Apparently mips64, i.e. octeon and loongson, has the same problem as
> > powerpc/macppc and powerpc64.  The timer interrupt is normally only
> > logically masked, not physically masked in the hardware, when we're
> > running at or above IPL_CLOCK.  If we arrive at cp0_int5() when the
> > clock interrupt is logically masked we postpone all work until the
> > next tick.  This is a problem for my WIP clock interrupt work.
> 
> I think the use of logical masking has been a design choice, not
> something dictated by the hardware. Physical masking should be possible,
> but some extra care would be needed to implement it, as the mips64
> interrupt code is a bit clunky.

That would be cleaner, but from the sound of it, it's easier to start
with this.

> > So, this patch is basically the same as what I did for macppc and what
> > I have proposed for powerpc64.
> > 
> > - Add a new member, ci_timer_deferred, to mips64's cpu_info struct.
> > 
> >   While here, remove ci_pendingticks.  We don't need it anymore.
> > 
> > - If we get to cp0_int5() and our IPL is too high, set
> >   cpu_info.ci_timer_deferred and return.
> > 
> > - If we get to cp0_int5() and our IPL is low enough, clear
> >   cpu_info.ci_timer_deferred and do clock interrupt work.
> > 
> > - In splx(9), if the new IPL is low enough and cpu_info.ci_timer_deferred
> >   is set, trigger the clock interrupt.
> > 
> > The only big difference is that mips64 uses an equality comparison
> > when deciding whether to arm the timer interrupt, so it's really easy
> > to "miss" CP0.count when you're setting CP0.compare.
> > 
> > To address this I've written a function, cp0_raise_int5(), that spins
> > until it is sure the timer interrupt will go off.  The function needed
> > a bit of new code for reading CP0.cause, which I've added to
> > cp0access.S.  I am using an initial offset of 16 cycles based on
> > experimentation with the machine I have access to, a 500Mhz CN50xx.
> > Values lower than 16 require more than one loop to arm the timer.  If
> > that value is insufficient for other machines we can try passing the
> > initial offset as an argument to the function.
> 
> It should not be necessary to make the initial offset variable. The
> offset is primarily a function of the length and content of the
> instruction sequence. Some unpredictability comes from cache misses
> and maybe branch prediction failures.

Gotcha.  So it mostly depends on the number of instructions between
loading CP0.count and storing CP0.compare.

> > I wasn't sure where to put the prototype for cp0_raise_int5() so I
> > stuck it in mips64/cpu.h.  If there's a better place for it, just say
> > so.
> 
> Currently, mips64 clock.c is formulated as a proper driver. I think
> callers should not invoke its functions directly but use a hook instead.
> The MI mips64 code starts the clock through the md_startclock function
> pointer. Maybe there could be md_triggerclock.
> 
> To reduce risk of confusion, I would rename cp0_raise_int5 to
> cp0_trigger_int5, as `raise' overlaps with the spl API. Also,
> ci_clock_deferred instead of ci_timer_deferred would look more
> consistent with the surrounding code.

Okay, I took all these suggestions and incorporated them.  Updated
patch attached.

One thing I'm still uncertain about is how glxclk fits into the
loongson picture.  It's an interrupt clock that runs hardclock() and
statclock(), but the code doesn't do any logical masking, so I don't
know whether or not I need to adjust anything in that code or account
for it at all.  If there's no logical masking there's no deferral, so
it would never need to call md_triggerclock() from splx(9).

Also:

My EdgeRouter PoE just finished a serial `make build`.  Took almost 12
days.  Which is a good sign!  Lots of opportunity for the patch to
fail and the clock to die.

In that time, under what I assume is relatively heavy load, the clock
interrupt deferral counters look like this:

cp0_raise_calls at 0x81701308: 133049
cp0_raise_miss at 0x81701300: 0

So 16 cycles as the initial offset works great.  We never ran the loop
more than once, i.e. we never "missed" CP0.count.

The machine has been up a little more than a million seconds.  So, at
100hz, with no separate statclock, and 2 CPUs, we'd expect ~200 clock
interrupts a second, or 200 million in total.

In ~200,000,000 cp0_int5() calls, we deferred ~133,000 of them, or
~0.0665%.

Index: mips64/mips64/clock.c
===
RCS file: /cvs/src/sys/arch/mips

Re: top(1): display uptime in HH:MM:SS format

2022-08-07 Thread Scott Cheloha
On Fri, Sep 18, 2020 at 03:59:05PM -0500, Scott Cheloha wrote:
> 
> [...]
> 
> - An HH:MM:SS format uptime is useful in top(1).  It's also more
>   visually consistent with the local timestamp printed on the line
>   above it, so it is easier to read at a glance.
> 
> - The variable printing of "days" is annoying.  I would rather it
>   just told me "0 days" if it had been less than one day.  It sucks
>   when the information you want moves around or isn't shown at all.
>   It's clever, sure, but I'd rather it be consistent.
> 
> This patch changes the uptime format string to "up D days HH:MM:SS".
> The format string does not vary with the elapsed uptime.  There is no
> inclusion/omission of the plural suffix depending on whether days is
> equal to one.
> 
> [...]

Whoops, forgot about this one.  September 18, 2020.  What a time to be
alive.

Let's try this again.  98 week bump.

To recap, this patch makes the uptime formatting in top(1) produce
more constant-width results.  The formatting is now always:

up D days HH:MM:SS

so only the day-count changes size.  The day-count is also always
printed: if the machine has not been up for a full day it prints

up 0 days HH:MM:SS

For example, the upper lines on the top(1) running on my machine
currently look like this:

load averages:  0.29,  0.29,  0.27        jetsam.attlocal.net 18:12:16
82 processes: 81 idle, 1 on processor             up 3 days 07:14:01

I have been running with this for almost two years and I love it.
I would like to commit it.

The only feedback I got when I originally posted this was that the
output formatting would no longer be the same as uptime(1)'s.  I don't
think that matters very much.  The person who offered the feedback
didn't think it mattered either; they were just hypothesizing
objections.

ok?

Index: display.c
===
RCS file: /cvs/src/usr.bin/top/display.c,v
retrieving revision 1.65
diff -u -p -r1.65 display.c
--- display.c   26 Aug 2020 16:21:28 -  1.65
+++ display.c   7 Aug 2022 23:14:25 -
@@ -208,31 +208,28 @@ display_init(struct statics * statics)
return (display_lines);
 }
 
+/*
+ * Print the time elapsed since the system booted.
+ */
 static void
 format_uptime(char *buf, size_t buflen)
 {
-   time_t uptime;
-   int days, hrs, mins;
struct timespec boottime;
+   time_t uptime;
+   unsigned int days, hrs, mins, secs;
+
+   if (clock_gettime(CLOCK_BOOTTIME, ) == -1)
+   err(1, "clock_gettime");
 
-   /*
-* Print how long system has been up.
-*/
-   if (clock_gettime(CLOCK_BOOTTIME, ) != -1) {
-   uptime = boottime.tv_sec;
-   uptime += 30;
-   days = uptime / (3600 * 24);
-   uptime %= (3600 * 24);
-   hrs = uptime / 3600;
-   uptime %= 3600;
-   mins = uptime / 60;
-   if (days > 0)
-   snprintf(buf, buflen, "up %d day%s, %2d:%02d",
-   days, days > 1 ? "s" : "", hrs, mins);
-   else
-   snprintf(buf, buflen, "up %2d:%02d",
-   hrs, mins);
-   }
+   uptime = boottime.tv_sec;
+   days = uptime / (3600 * 24);
+   uptime %= (3600 * 24);
+   hrs = uptime / 3600;
+   uptime %= 3600;
+   mins = uptime / 60;
+   secs = uptime % 60;
+   snprintf(buf, buflen, "up %u days %02u:%02u:%02u",
+   days, hrs, mins, secs);
 }
 
 



Re: riscv64: trigger deferred timer interrupts from splx(9)

2022-08-04 Thread Scott Cheloha
On Fri, Aug 05, 2022 at 12:34:59AM +0200, Jeremie Courreges-Anglas wrote:
> >> [...]
> >> 
> >> You're adding the timer reset to plic_setipl() but the latter is called
> >> after softintr processing in plic_splx().
> >> 
> >>/* Pending software intr is handled here */
> >>if (ci->ci_ipending & riscv_smask[new])
> >>riscv_do_pending_intr(new);
> >> 
> >>plic_setipl(new);
> >
> > Yes, but plic_setipl() is also called from the softintr loop in
> > riscv_do_pending_intr() (riscv64/intr.c) right *before* we dispatch
> > any pending soft interrupts:
> >
> >594  void
> >595  riscv_do_pending_intr(int pcpl)
> >596  {
> >597  struct cpu_info *ci = curcpu();
> >598  u_long sie;
> >599
> >600  sie = intr_disable();
> >601
> >602  #define DO_SOFTINT(si, ipl) \
> >603  if ((ci->ci_ipending & riscv_smask[pcpl]) & \
> >604  SI_TO_IRQBIT(si)) { \
> >605  ci->ci_ipending &= ~SI_TO_IRQBIT(si);   \
> > *  606  riscv_intr_func.setipl(ipl);\
> >607  intr_restore(sie);  \
> >608  softintr_dispatch(si);  \
> >609  sie = intr_disable();   \
> >610  }
> >611
> >612  do {
> >613  DO_SOFTINT(SIR_TTY, IPL_SOFTTTY);
> >614  DO_SOFTINT(SIR_NET, IPL_SOFTNET);
> >615  DO_SOFTINT(SIR_CLOCK, IPL_SOFTCLOCK);
> >616  DO_SOFTINT(SIR_SOFT, IPL_SOFT);
> >617  } while (ci->ci_ipending & riscv_smask[pcpl]);
> >
> > We might be fine doing it just once in plic_splx() before we do any
> > soft interrupt stuff.  That's closer to what we're doing on other
> > platforms.
> >
> > I just figured it'd be safer to do it in plic_setipl() because we're
> > already disabling interrupts there.  It seems I guessed correctly
> > because the patch didn't hang your machine.
> 
> Ugh, I had missed that setipl call, thanks for pointing it out.

Np.

> Since I don't wander into this code on a casual basis I won't object,
> but this looks very unobvious to me.  :)

I kind of agree.

I think it would be cleaner -- logically cleaner, not necessarily
cleaner in the code -- to mask timer interrupts when we raise the IPL
to or beyond IPL_CLOCK and unmask timer interrupts when we drop the
IPL below IPL_CLOCK.

... but doing it this way is a lot faster than taking the time to read
and understand the RISC-V privileged architecture spec and how the SBI
interacts with it.

At a glance I see that there are separate Interrupt-Enable bits for
External, Timer, and Software interrupts at the supervisor level.  So
what I'm imagining might be possible.  I just don't know how to get
the current code to do what I've described.



Re: riscv64: trigger deferred timer interrupts from splx(9)

2022-08-04 Thread Scott Cheloha
On Thu, Aug 04, 2022 at 09:39:13AM +0200, Jeremie Courreges-Anglas wrote:
> On Mon, Aug 01 2022, Scott Cheloha  wrote:
> > On Mon, Aug 01, 2022 at 07:15:33PM +0200, Jeremie Courreges-Anglas wrote:
> >> On Sun, Jul 31 2022, Scott Cheloha  wrote:
> >> > Hi,
> >> >
> >> > I am unsure how to properly mask RISC-V timer interrupts in hardware
> >> > at or above IPL_CLOCK.  I think that would be cleaner than doing
> >> > another timer interrupt deferral thing.
> >> >
> >> > But, just to get the ball rolling, here a first attempt at the timer
> >> > interrupt deferral thing for riscv64.  The motivation, as with every
> >> > other platform, is to eventually make it unnecessary for the machine
> >> > dependent code to know anything about the clock interrupt schedule.
> >> >
> >> > The thing I'm most unsure about is where to retrigger the timer in the
> >> > PLIC code.  It seems right to do it from plic_setipl() because I want
> >> > to retrigger it before doing soft interrupt work, but I'm not sure.
> 
> You're adding the timer reset to plic_setipl() but the latter is called
> after softintr processing in plic_splx().
> 
>   /* Pending software intr is handled here */
>   if (ci->ci_ipending & riscv_smask[new])
>   riscv_do_pending_intr(new);
> 
>   plic_setipl(new);

Yes, but plic_setipl() is also called from the softintr loop in
riscv_do_pending_intr() (riscv64/intr.c) right *before* we dispatch
any pending soft interrupts:

   594  void
   595  riscv_do_pending_intr(int pcpl)
   596  {
   597  struct cpu_info *ci = curcpu();
   598  u_long sie;
   599
   600  sie = intr_disable();
   601
   602  #define DO_SOFTINT(si, ipl) \
   603  if ((ci->ci_ipending & riscv_smask[pcpl]) & \
   604  SI_TO_IRQBIT(si)) { \
   605  ci->ci_ipending &= ~SI_TO_IRQBIT(si);   \
*  606  riscv_intr_func.setipl(ipl);\
   607  intr_restore(sie);  \
   608  softintr_dispatch(si);  \
   609  sie = intr_disable();   \
   610  }
   611
   612  do {
   613  DO_SOFTINT(SIR_TTY, IPL_SOFTTTY);
   614  DO_SOFTINT(SIR_NET, IPL_SOFTNET);
   615  DO_SOFTINT(SIR_CLOCK, IPL_SOFTCLOCK);
   616  DO_SOFTINT(SIR_SOFT, IPL_SOFT);
   617  } while (ci->ci_ipending & riscv_smask[pcpl]);

We might be fine doing it just once in plic_splx() before we do any
soft interrupt stuff.  That's closer to what we're doing on other
platforms.

I just figured it'd be safer to do it in plic_setipl() because we're
already disabling interrupts there.  It seems I guessed correctly
because the patch didn't hang your machine.

> >> > Unless I'm missing something, I don't think I need to do anything in
> >> > the default interrupt handler code, i.e. riscv64_dflt_setipl(), right?
> >>
> >> No idea about about the items above, but...
> >> 
> >> > I have no riscv64 machine, so this is untested.  Would appreciate
> >> > tests and feedback.
> >> 
> >> There's an #include  missing in plic.c,
> >
> > Whoops, corrected patch attached below.
> >
> >> with that added your diff builds and GENERIC.MP seems to behave
> >> (currently running make -j4 build), but I don't know exactly which
> >> problems I should look for.
> >
> > Thank you for trying it out.
> >
> > The patch changes how clock interrupt work is deferred on riscv64.
> >
> > If the code is wrong, the hardclock and statclock should eventually
> > die on every CPU.  The death of the hardclock in particular would
> > manifest to the user as livelock.  The scheduler would stop preempting
> > userspace and it would be impossible to use the machine interactively.
> >
> > There isn't really a direct way to exercise this code change.
> >
> > The best we can do is make the machine busy.  If the machine is busy
> > we can expect more spl(9) calls and more deferred clock interrupt
> > work, which leaves more opportunities for the bug to surface.
> >
> > So, a parallel `make build` is fine.  It's our gold standard for
> > making the machine really busy.
> 
> The diff survived three make -j4 build/release in a row, the clock seems
> stable.

Awesome!  Thank you for hammering on it.

kettenis, mlarkin, drahn:

Is this code fine or do you want to go about this in a different way?

Index: dev/plic.c
==

Re: wc(1): accelerate word counting

2022-08-03 Thread Scott Cheloha
On Wed, Nov 17, 2021 at 08:37:53AM -0600, Scott Cheloha wrote:
> In wc(1) we currently count words, both ASCII and multibyte, in a
> getline(3) loop.
> 
> This makes sense in the multibyte case because stdio handles all the
> nasty buffer resizing for us.  We avoid splitting a multibyte
> character between two read(2) calls and the resulting code is simpler.
> 
> However, for ASCII input we don't have the split-character problem.
> Using getline(3) doesn't really buy us anything.  We can count words
> in a big buffer (as we do in the ASCII byte- and line-counting modes)
> just fine.
> 
> [...]

37 week bump.

Counting words in a big buffer is faster than doing it with
getline(3).  We don't need the convenience of getline(3) except
in the multibyte case.

The state machine for counting words doesn't need to change because
word transitions still happen within a single byte.  We just move the
logic out of the getline(3) loop and into a read(2) loop.

As for "faster", consider The Adventures of Sherlock Holmes:

$ ftp -o sherlock-holmes.txt https://www.gutenberg.org/files/1661/1661-0.txt
Trying 152.19.134.47...
Requesting https://www.gutenberg.org/files/1661/1661-0.txt
100% |**|   593 KB00:01
607430 bytes received in 1.05 seconds (563.58 KB/s)
$ ls -lh sherlock-holmes.txt
-rw-r--r--  1 ssc  ssc   593K Jun  9  2021 sherlock-holmes.txt

-current:

$ command time /usr/bin/wc $(jot -b ~/sherlock-holmes.txt 200) | tail -n 1
2.081 real 2.730 user 0.080 sys
 2460800 21512000 121486000 total

Patched:

$ command time obj/wc $(jot -b /home/ssc/sherlock-holmes.txt 200) | tail -n 1
1.093 real 1.910 user 0.030 sys
 2460800 21512000 121486000 total

So, twice as fast on an input with normal-ish line lengths.

ok?

Index: wc.c
===
RCS file: /cvs/src/usr.bin/wc/wc.c,v
retrieving revision 1.29
diff -u -p -r1.29 wc.c
--- wc.c28 Nov 2021 19:28:42 -  1.29
+++ wc.c3 Aug 2022 23:11:45 -
@@ -145,16 +145,42 @@ cnt(const char *path)
fd = STDIN_FILENO;
}
 
-   if (!doword && !multibyte) {
+   if (!multibyte) {
if (bufsz < _MAXBSIZE &&
(buf = realloc(buf, _MAXBSIZE)) == NULL)
err(1, NULL);
+
+   /*
+* According to POSIX, a word is a "maximal string of
+* characters delimited by whitespace."  Nothing is said
+* about a character being printing or non-printing.
+*/
+   if (doword) {
+   gotsp = 1;
+   while ((len = read(fd, buf, _MAXBSIZE)) > 0) {
+   charct += len;
+   for (C = buf; len--; ++C) {
+   if (isspace((unsigned char)*C)) {
+   gotsp = 1;
+   if (*C == '\n')
+   ++linect;
+   } else if (gotsp) {
+   gotsp = 0;
+   ++wordct;
+   }
+   }
+   }
+   if (len == -1) {
+   warn("%s", file);
+   rval = 1;
+   }
+   }
/*
 * Line counting is split out because it's a lot
 * faster to get lines than to get words, since
 * the word count requires some logic.
 */
-   if (doline) {
+   else if (doline) {
while ((len = read(fd, buf, _MAXBSIZE)) > 0) {
charct += len;
for (C = buf; len--; ++C)
@@ -204,46 +230,26 @@ cnt(const char *path)
return;
}
 
-   /*
-* Do it the hard way.
-* According to POSIX, a word is a "maximal string of
-* characters delimited by whitespace."  Nothing is said
-* about a character being printing or non-printing.
-*/
gotsp = 1;
while ((len = getline(&buf, &bufsz, stream)) > 0) {
-   if (multibyte) {
-   const char *end = buf + len;
-   for (C = buf; C < end; C += len) {
-   ++charct;
-   len = mbtowc(, C, MB_CUR_MAX);
-   

Re: powerpc64: retrigger deferred DEC interrupts from splx(9)

2022-08-02 Thread Scott Cheloha
On Mon, Jul 25, 2022 at 06:44:31PM -0500, Scott Cheloha wrote:
> On Mon, Jul 25, 2022 at 01:52:36PM +0200, Mark Kettenis wrote:
> > > Date: Sun, 24 Jul 2022 19:33:57 -0500
> > > From: Scott Cheloha 
> > > 
> > > On Sat, Jul 23, 2022 at 08:14:32PM -0500, Scott Cheloha wrote:
> > > > 
> > > > [...]
> > > > 
> > > > I don't have a powerpc64 machine, so this is untested.  [...]
> > > 
> > > gkoehler@ has pointed out two dumb typos in the prior patch.  My bad.
> > > 
> > > Here is a corrected patch that, according to gkoehler@, actually
> > > compiles.
> > 
> > Thanks.  I already figured that bit out myself.  Did some limited
> > testing, but it seems to work correctly.  No noticeable effect on the
> > timekeeping even when building clang on all the (4) cores.
> 
> I wouldn't expect this patch to impact timekeeping.  All we're doing
> is calling hardclock(9) a bit sooner than we normally would a few
> times every second.
> 
> I would expect to see slightly more distinct interrupts (uvmexp.intrs)
> per second because we aren't actively batching hardclock(9) and
> statclock calls.
> 
> ... by the way, uvmexp.intrs should probably be incremented
> atomically, no?
> 
> > Regarding the diff, I think it would be better to avoid changing
> > trap.c.  That function is complicated enough and splitting the logic
> > for this over three files makes it a bit harder to understand.  So you
> > could have:
> > 
> > void
> > decr_intr(struct trapframe *frame)
> > {
> > struct cpu_info *ci = curcpu();
> > ...
> > int s;
> > 
> > if (ci->ci_cpl >= IPL_CLOCK) {
> > ci->ci_dec_deferred = 1;
> > mtdec(UINT32_MAX >> 1); /* clear DEC exception */
> > return;
> > }
> > 
> > ci->ci_dec_deferred = 0;
> > 
> > ...
> > }
> > 
> > That has the downside of course that it will be slightly less
> > efficient if we're at IPL_CLOCK or above, but that really shouldn't
> > happen often enough for it to matter.
> 
> Yep.  It's an extra function call, the overhead is small.
> 
> Updated patch below.

At what point do we consider the patch safe?  Have you seen any hangs?

Wanna run with it another week?

Index: include/cpu.h
===
RCS file: /cvs/src/sys/arch/powerpc64/include/cpu.h,v
retrieving revision 1.31
diff -u -p -r1.31 cpu.h
--- include/cpu.h   6 Jul 2021 09:34:07 -   1.31
+++ include/cpu.h   25 Jul 2022 23:43:47 -
@@ -74,9 +74,9 @@ struct cpu_info {
uint64_tci_lasttb;
uint64_tci_nexttimerevent;
uint64_tci_nextstatevent;
-   int ci_statspending;

volatile intci_cpl;
+   volatile intci_dec_deferred;
uint32_tci_ipending;
uint32_tci_idepth;
 #ifdef DIAGNOSTIC
Index: powerpc64/clock.c
===
RCS file: /cvs/src/sys/arch/powerpc64/powerpc64/clock.c,v
retrieving revision 1.3
diff -u -p -r1.3 clock.c
--- powerpc64/clock.c   23 Feb 2021 04:44:31 -  1.3
+++ powerpc64/clock.c   25 Jul 2022 23:43:47 -
@@ -98,6 +98,17 @@ decr_intr(struct trapframe *frame)
int s;
 
/*
+* If the clock interrupt is masked, postpone all work until
+* it is unmasked in splx(9).
+*/
+   if (ci->ci_cpl >= IPL_CLOCK) {
+   ci->ci_dec_deferred = 1;
+   mtdec(UINT32_MAX >> 1); /* clear DEC exception */
+   return;
+   }
+   ci->ci_dec_deferred = 0;
+
+   /*
 * Based on the actual time delay since the last decrementer reload,
 * we arrange for earlier interrupt next time.
 */
@@ -130,30 +141,23 @@ decr_intr(struct trapframe *frame)
mtdec(nextevent - tb);
mtdec(nextevent - mftb());
 
-   if (ci->ci_cpl >= IPL_CLOCK) {
-   ci->ci_statspending += nstats;
-   } else {
-   nstats += ci->ci_statspending;
-   ci->ci_statspending = 0;
-
-   s = splclock();
-   intr_enable();
-
-   /*
-* Do standard timer interrupt stuff.
-*/
-   while (ci->ci_lasttb < prevtb) {
-   ci->ci_lasttb += tick_increment;
-   clock_count.ec_count++;
-   hardclock((struct clockframe *)frame);
-   }
+   s = splclock();
+   intr_enable();
 
-   while (nstats-- > 0)
-   statclock((struct cloc

dmesg(8): fail if given positional arguments

2022-08-02 Thread Scott Cheloha
dmesg(8) doesn't use any positional arguments.  It's a usage error if
any are present.

ok?

Index: dmesg.c
===
RCS file: /cvs/src/sbin/dmesg/dmesg.c,v
retrieving revision 1.31
diff -u -p -r1.31 dmesg.c
--- dmesg.c 24 Dec 2019 13:20:44 -  1.31
+++ dmesg.c 2 Aug 2022 16:48:13 -
@@ -89,6 +89,9 @@ main(int argc, char *argv[])
argc -= optind;
argv += optind;
 
+   if (argc != 0)
+   usage();
+
if (memf == NULL && nlistf == NULL) {
int mib[2], msgbufsize;
size_t len;



Re: riscv64: trigger deferred timer interrupts from splx(9)

2022-08-01 Thread Scott Cheloha
On Mon, Aug 01, 2022 at 07:15:33PM +0200, Jeremie Courreges-Anglas wrote:
> On Sun, Jul 31 2022, Scott Cheloha  wrote:
> > Hi,
> >
> > I am unsure how to properly mask RISC-V timer interrupts in hardware
> > at or above IPL_CLOCK.  I think that would be cleaner than doing
> > another timer interrupt deferral thing.
> >
> > But, just to get the ball rolling, here is a first attempt at the timer
> > interrupt deferral thing for riscv64.  The motivation, as with every
> > other platform, is to eventually make it unnecessary for the machine
> > dependent code to know anything about the clock interrupt schedule.
> >
> > The thing I'm most unsure about is where to retrigger the timer in the
> > PLIC code.  It seems right to do it from plic_setipl() because I want
> > to retrigger it before doing soft interrupt work, but I'm not sure.
> >
> > Unless I'm missing something, I don't think I need to do anything in
> > the default interrupt handler code, i.e. riscv64_dflt_setipl(), right?
> 
> No idea about the items above, but...
> 
> > I have no riscv64 machine, so this is untested.  Would appreciate
> > tests and feedback.
> 
> There's an #include  missing in plic.c,

Whoops, corrected patch attached below.

> with that added your diff builds and GENERIC.MP seems to behave
> (currently running make -j4 build), but I don't know exactly which
> problems I should look for.

Thank you for trying it out.

The patch changes how clock interrupt work is deferred on riscv64.

If the code is wrong, the hardclock and statclock should eventually
die on every CPU.  The death of the hardclock in particular would
manifest to the user as livelock.  The scheduler would stop preempting
userspace and it would be impossible to use the machine interactively.

There isn't really a direct way to exercise this code change.

The best we can do is make the machine busy.  If the machine is busy
we can expect more spl(9) calls and more deferred clock interrupt
work, which leaves more opportunities for the bug to surface.

So, a parallel `make build` is fine.  It's our gold standard for
making the machine really busy.

Index: include/cpu.h
===
RCS file: /cvs/src/sys/arch/riscv64/include/cpu.h,v
retrieving revision 1.12
diff -u -p -r1.12 cpu.h
--- include/cpu.h   10 Jun 2022 21:34:15 -  1.12
+++ include/cpu.h   1 Aug 2022 17:36:41 -
@@ -92,7 +92,7 @@ struct cpu_info {
uint64_tci_lasttb;
uint64_tci_nexttimerevent;
uint64_tci_nextstatevent;
-   int ci_statspending;
+   volatile intci_timer_deferred;
 
uint32_tci_cpl;
uint32_tci_ipending;
Index: riscv64/clock.c
===
RCS file: /cvs/src/sys/arch/riscv64/riscv64/clock.c,v
retrieving revision 1.3
diff -u -p -r1.3 clock.c
--- riscv64/clock.c 24 Jul 2021 22:41:09 -  1.3
+++ riscv64/clock.c 1 Aug 2022 17:36:41 -
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -106,6 +107,17 @@ clock_intr(void *frame)
int s;
 
/*
+* If the clock interrupt is masked, defer all clock interrupt
+* work until the clock interrupt is unmasked from splx(9).
+*/
+   if (ci->ci_cpl >= IPL_CLOCK) {
+   ci->ci_timer_deferred = 1;
+   sbi_set_timer(UINT64_MAX);
+   return 0;
+   }
+   ci->ci_timer_deferred = 0;
+
+   /*
 * Based on the actual time delay since the last clock interrupt,
 * we arrange for earlier interrupt next time.
 */
@@ -132,30 +144,23 @@ clock_intr(void *frame)
 
sbi_set_timer(nextevent);
 
-   if (ci->ci_cpl >= IPL_CLOCK) {
-   ci->ci_statspending += nstats;
-   } else {
-   nstats += ci->ci_statspending;
-   ci->ci_statspending = 0;
-
-   s = splclock();
-   intr_enable();
-
-   /*
-* Do standard timer interrupt stuff.
-*/
-   while (ci->ci_lasttb < prevtb) {
-   ci->ci_lasttb += tick_increment;
-   clock_count.ec_count++;
-   hardclock((struct clockframe *)frame);
-   }
+   s = splclock();
+   intr_enable();
 
-   while (nstats-- > 0)
-   statclock((struct clockframe *)frame);
-
-   intr_disable();
-   splx(s);
+   /*
+* Do standard timer interrupt stuff.
+*/
+   while (ci->ci_lasttb < prevtb) {
+   ci->ci_lasttb += tick_increment;

Re: [v5] amd64: simplify TSC sync testing

2022-08-01 Thread Scott Cheloha
On Mon, Aug 01, 2022 at 03:03:36PM +0900, Masato Asou wrote:
> Hi, Scott.
> 
> I tested v5 patch on my ESXi on Ryzen7.
> It works fine for me.

Is this the same Ryzen7 box as in the prior message?

Or do you have two different boxes, one running OpenBSD on the bare
metal, and this one running ESXi?



Re: [v4] amd64: simplify TSC sync testing

2022-07-31 Thread Scott Cheloha
> On Jul 31, 2022, at 23:48, Masato Asou  wrote:
> 
> Hi, Scott
> 
> I tested your patch on my Ryzen7 box.
> And I got failed message:
> 
> $ sysctl -a | grep tsc
> kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000)
> acpitimer0(1000)
> machdep.tscfreq=3593244667
> machdep.invarianttsc=1
> $ sysctl kern.timecounter
> kern.timecounter.tick=1
> kern.timecounter.timestepwarnings=0
> kern.timecounter.hardware=acpihpet0
> kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000)
> acpitimer0(1000)
> $ dmesg | grep failed
> tsc: cpu0/cpu2: sync test round 1/2 failed
> tsc: cpu0/cpu4: sync test round 1/2 failed
> tsc: cpu0/cpu5: sync test round 1/2 failed
> tsc: cpu0/cpu6: sync test round 1/2 failed
> tsc: cpu0/cpu7: sync test round 1/2 failed
> $ 

Thank you for testing.

Please try with the latest patch.  v5 is posted
on tech@ now.

> dmesg:
> 
> OpenBSD 7.2-beta (GENERIC.MP) #10: Mon Aug  1 13:12:06 JST 2022
>a...@g2-obsd.my.domain:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 34256752640 (32669MB)
> avail mem = 33201152000 (31663MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xdb64 (63 entries)
> bios0: vendor American Megatrends Inc. version "9015" date 03/03/2020
> bios0: MouseComputer Co.,Ltd. LM-AG400

You may also want to try updating your BIOS.



riscv64: trigger deferred timer interrupts from splx(9)

2022-07-31 Thread Scott Cheloha
Hi,

I am unsure how to properly mask RISC-V timer interrupts in hardware
at or above IPL_CLOCK.  I think that would be cleaner than doing
another timer interrupt deferral thing.

But, just to get the ball rolling, here is a first attempt at the timer
interrupt deferral thing for riscv64.  The motivation, as with every
other platform, is to eventually make it unnecessary for the machine
dependent code to know anything about the clock interrupt schedule.

The thing I'm most unsure about is where to retrigger the timer in the
PLIC code.  It seems right to do it from plic_setipl() because I want
to retrigger it before doing soft interrupt work, but I'm not sure.

Unless I'm missing something, I don't think I need to do anything in
the default interrupt handler code, i.e. riscv64_dflt_setipl(), right?

I have no riscv64 machine, so this is untested.  Would appreciate
tests and feedback.

-Scott

Index: include/cpu.h
===
RCS file: /cvs/src/sys/arch/riscv64/include/cpu.h,v
retrieving revision 1.12
diff -u -p -r1.12 cpu.h
--- include/cpu.h   10 Jun 2022 21:34:15 -  1.12
+++ include/cpu.h   1 Aug 2022 01:13:38 -
@@ -92,7 +92,7 @@ struct cpu_info {
uint64_tci_lasttb;
uint64_tci_nexttimerevent;
uint64_tci_nextstatevent;
-   int ci_statspending;
+   volatile intci_timer_deferred;
 
uint32_tci_cpl;
uint32_tci_ipending;
Index: riscv64/clock.c
===
RCS file: /cvs/src/sys/arch/riscv64/riscv64/clock.c,v
retrieving revision 1.3
diff -u -p -r1.3 clock.c
--- riscv64/clock.c 24 Jul 2021 22:41:09 -  1.3
+++ riscv64/clock.c 1 Aug 2022 01:13:38 -
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -106,6 +107,17 @@ clock_intr(void *frame)
int s;
 
/*
+* If the clock interrupt is masked, defer all clock interrupt
+* work until the clock interrupt is unmasked from splx(9).
+*/
+   if (ci->ci_cpl >= IPL_CLOCK) {
+   ci->ci_timer_deferred = 1;
+   sbi_set_timer(UINT64_MAX);
+   return 0;
+   }
+   ci->ci_timer_deferred = 0;
+
+   /*
 * Based on the actual time delay since the last clock interrupt,
 * we arrange for earlier interrupt next time.
 */
@@ -132,30 +144,23 @@ clock_intr(void *frame)
 
sbi_set_timer(nextevent);
 
-   if (ci->ci_cpl >= IPL_CLOCK) {
-   ci->ci_statspending += nstats;
-   } else {
-   nstats += ci->ci_statspending;
-   ci->ci_statspending = 0;
-
-   s = splclock();
-   intr_enable();
-
-   /*
-* Do standard timer interrupt stuff.
-*/
-   while (ci->ci_lasttb < prevtb) {
-   ci->ci_lasttb += tick_increment;
-   clock_count.ec_count++;
-   hardclock((struct clockframe *)frame);
-   }
+   s = splclock();
+   intr_enable();
 
-   while (nstats-- > 0)
-   statclock((struct clockframe *)frame);
-
-   intr_disable();
-   splx(s);
+   /*
+* Do standard timer interrupt stuff.
+*/
+   while (ci->ci_lasttb < prevtb) {
+   ci->ci_lasttb += tick_increment;
+   clock_count.ec_count++;
+   hardclock((struct clockframe *)frame);
}
+
+   while (nstats-- > 0)
+   statclock((struct clockframe *)frame);
+
+   intr_disable();
+   splx(s);
 
return 0;
 }
Index: dev/plic.c
===
RCS file: /cvs/src/sys/arch/riscv64/dev/plic.c,v
retrieving revision 1.10
diff -u -p -r1.10 plic.c
--- dev/plic.c  6 Apr 2022 18:59:27 -   1.10
+++ dev/plic.c  1 Aug 2022 01:13:38 -
@@ -557,6 +557,10 @@ plic_setipl(int new)
/* higher values are higher priority */
plic_set_threshold(ci->ci_cpuid, new);
 
+   /* trigger deferred timer interrupt if cpl is now low enough */
+   if (ci->ci_timer_deferred && new < IPL_CLOCK)
+   sbi_set_timer(0);
+
intr_restore(sie);
 }
 



mips64: trigger deferred timer interrupt from splx(9)

2022-07-31 Thread Scott Cheloha
Hi,

Apparently mips64, i.e. octeon and loongson, has the same problem as
powerpc/macppc and powerpc64.  The timer interrupt is normally only
logically masked, not physically masked in the hardware, when we're
running at or above IPL_CLOCK.  If we arrive at cp0_int5() when the
clock interrupt is logically masked we postpone all work until the
next tick.  This is a problem for my WIP clock interrupt work.

So, this patch is basically the same as what I did for macppc and what
I have proposed for powerpc64.

- Add a new member, ci_timer_deferred, to mips64's cpu_info struct.

  While here, remove ci_pendingticks.  We don't need it anymore.

- If we get to cp0_int5() and our IPL is too high, set
  cpu_info.ci_timer_deferred and return.

- If we get to cp0_int5() and our IPL is low enough, clear
  cpu_info.ci_timer_deferred and do clock interrupt work.

- In splx(9), if the new IPL is low enough and cpu_info.ci_timer_deferred
  is set, trigger the clock interrupt.

The only big difference is that mips64 uses an equality comparison
when deciding whether to arm the timer interrupt, so it's really easy
to "miss" CP0.count when you're setting CP0.compare.

To address this I've written a function, cp0_raise_int5(), that spins
until it is sure the timer interrupt will go off.  The function needed
a bit of new code for reading CP0.cause, which I've added to
cp0access.S.  I am using an initial offset of 16 cycles based on
experimentation with the machine I have access to, a 500Mhz CN50xx.
Values lower than 16 require more than one loop to arm the timer.  If
that value is insufficient for other machines we can try passing the
initial offset as an argument to the function.

I wasn't sure where to put the prototype for cp0_raise_int5() so I
stuck it in mips64/cpu.h.  If there's a better place for it, just say
so.

I also left some atomic counters for you to poke at with pstat(8) if
you want to see what the machine is doing in cp0_raise_int5(), i.e.
how often we defer clock interrupt work and how many loops you take to
arm the timer interrupt.  Those will be removed before commit.

I'm running a `make build` on my EdgeRouter PoE.  It only has 512MB of
RAM, so I can't do a parallel build without hanging the machine when
attempting to compile LLVM.  The build has been running for four days
and the machine has not yet hung, so I think this patch is correct-ish.
I will holler if it hangs.

visa: Assuming this code looks right, could you test this on a
  beefier octeon machine?  Preferably a parallel build?

miod: I'm unclear whether loongson uses cp0_int5().  Am I missing
  code here, or are my changes in arch/loongson sufficient?
  If it's sufficient, could you test this?

  I have no loongson hardware, so this is uncompiled there.
  Sorry in advance if it does not compile.

Thoughts?

Index: mips64/mips64/clock.c
===
RCS file: /cvs/src/sys/arch/mips64/mips64/clock.c,v
retrieving revision 1.45
diff -u -p -r1.45 clock.c
--- mips64/mips64/clock.c   6 Apr 2022 18:59:26 -   1.45
+++ mips64/mips64/clock.c   31 Jul 2022 18:18:05 -
@@ -92,13 +92,13 @@ clockattach(struct device *parent, struc
  *  Interrupt handler for targets using the internal count register
  *  as interval clock. Normally the system is run with the clock
  *  interrupt always enabled. Masking is done here and if the clock
- *  can not be run the tick is just counted and handled later when
- *  the clock is logically unmasked again.
+ *  can not be run, the tick is handled later when the clock is logically
+ *  unmasked again.
  */
 uint32_t
 cp0_int5(uint32_t mask, struct trapframe *tf)
 {
-   u_int32_t clkdiff;
+   u_int32_t clkdiff, pendingticks = 0;
struct cpu_info *ci = curcpu();
 
/*
@@ -113,15 +113,26 @@ cp0_int5(uint32_t mask, struct trapframe
}
 
/*
+* If the clock interrupt is masked we can't do any work until
+* it is unmasked.
+*/
+   if (tf->ipl >= IPL_CLOCK) {
+   ci->ci_timer_deferred = 1;
+   cp0_set_compare(cp0_get_count() - 1);
+   return CR_INT_5;
+   }
+   ci->ci_timer_deferred = 0;
+
+   /*
 * Count how many ticks have passed since the last clock interrupt...
 */
clkdiff = cp0_get_count() - ci->ci_cpu_counter_last;
while (clkdiff >= ci->ci_cpu_counter_interval) {
ci->ci_cpu_counter_last += ci->ci_cpu_counter_interval;
clkdiff = cp0_get_count() - ci->ci_cpu_counter_last;
-   ci->ci_pendingticks++;
+   pendingticks++;
}
-   ci->ci_pendingticks++;
+   pendingticks++;
ci->ci_cpu_counter_last += ci->ci_cpu_counter_interval;
 
/*
@@ -132,32 +143,64 @@ cp0_int5(uint32_t mask, struct trapframe
clkdiff = cp0_get_count() - ci->ci_cpu_counter_last;
if ((int)clkdiff >= 0) {

[v5] amd64: simplify TSC sync testing

2022-07-30 Thread Scott Cheloha
Hi,

At the urging of sthen@ and dv@, here is v5.

Two major changes from v4:

- Add the function tc_reset_quality() to kern_tc.c and use it
  to lower the quality of the TSC timecounter if we fail the
  sync test.

  tc_reset_quality() will choose a new active timecounter if,
  after the quality change, the given timecounter is no longer
  the best timecounter.

  The upshot is: if you fail the TSC sync test you should boot
  with the HPET as your active timecounter.  If you don't have
  an HPET you'll be using something else.

- Drop the SMT accommodation from the hot loop.  It hasn't been
  necessary since last year when I rewrote the test to run without
  a mutex.  In the rewritten test, the two CPUs in the hot loop
  are not competing for any resources so they should not be able
  to starve one another.

dv: Could you double-check that this still chooses the right
timecounter on your machine?  If so, I will ask deraadt@ to
put this into snaps to replace v4.

Additional test reports are welcome.  Include your dmesg.

--

I do not see much more I can do to improve this patch.

I am seeking patch review and OKs.

I am especially interested in whether my assumptions in tsc_ap_test()
and tsc_bp_test() are correct.  The whole patch depends on those
assumptions.  Is this a valid way to test for TSC desync?  Or am I
missing membar_producer()/membar_consumer() calls?

Here is the long version of "what" and "why" for this patch.

The patch is attached at the end.

- Computing a per-CPU TSC skew value is error-prone, especially
  on multisocket machines and VMs.  My best guess is that larger
  latencies appear to the skew measurement test as TSC desync,
  and so the TSC is demoted to a kernel timecounter on these
  machines or marked non-monotonic.

  This patch eliminates per-CPU TSC skew values.  Instead of trying
  to measure and correct for TSC desync we only try to detect desync,
  which is less error-prone.  This approach should allow a wider
  variety of machines to use the TSC as a timecounter when running
  OpenBSD.

- In the new sync test, both CPUs repeatedly try to detect whether
  their TSC is trailing the other CPU's TSC.  The upside to this
  approach is that it yields no false positives (if my assumptions
  about AMD64 memory access and instruction serialization are correct).
  The downside to this approach is that it takes more time than the
  current skew measurement test.  Each test round takes 1ms, and
  we run up to two rounds per CPU, so this patch slows boot down
  by 2ms per AP.

- If any CPU fails the sync test, the TSC is marked non-monotonic
  and a different timecounter is activated.  The TC_USER flag
  remains intact.  There is no "middle ground" where we fall back
  to only using the TSC in the kernel.

- Because there is no per-CPU skew value, there is also no concept
  of TSC drift anymore.

- Before running the test, we check for the IA32_TSC_ADJUST
  register and reset it if necessary.  This is a trivial way
  to work around firmware bugs that desync the TSC before we
  reach the kernel.

  Unfortunately, at the moment this register appears to only
  be available on Intel processors and I cannot find an equivalent
  but differently-named MSR for AMD processors.

--

Index: sys/arch/amd64/amd64/tsc.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.24
diff -u -p -r1.24 tsc.c
--- sys/arch/amd64/amd64/tsc.c  31 Aug 2021 15:11:54 -  1.24
+++ sys/arch/amd64/amd64/tsc.c  31 Jul 2022 03:06:39 -
@@ -36,13 +36,6 @@ int  tsc_recalibrate;
 uint64_t   tsc_frequency;
 inttsc_is_invariant;
 
-#defineTSC_DRIFT_MAX   250
-#define TSC_SKEW_MAX   100
-int64_ttsc_drift_observed;
-
-volatile int64_t   tsc_sync_val;
-volatile struct cpu_info   *tsc_sync_cpu;
-
 u_int  tsc_get_timecount(struct timecounter *tc);
 void   tsc_delay(int usecs);
 
@@ -236,22 +229,12 @@ cpu_recalibrate_tsc(struct timecounter *
 u_int
 tsc_get_timecount(struct timecounter *tc)
 {
-   return rdtsc_lfence() + curcpu()->ci_tsc_skew;
+   return rdtsc_lfence();
 }
 
 void
 tsc_timecounter_init(struct cpu_info *ci, uint64_t cpufreq)
 {
-#ifdef TSC_DEBUG
-   printf("%s: TSC skew=%lld observed drift=%lld\n", ci->ci_dev->dv_xname,
-   (long long)ci->ci_tsc_skew, (long long)tsc_drift_observed);
-#endif
-   if (ci->ci_tsc_skew < -TSC_SKEW_MAX || ci->ci_tsc_skew > TSC_SKEW_MAX) {
-   printf("%s: disabling user TSC (skew=%lld)\n",
-   ci->ci_dev->dv_xname, (long long)ci->ci_tsc_skew);
-   tsc_timecounter.tc_user = 0;
-   }
-
if (!(ci->ci_flags & CPUF_PRIMARY) ||
!(ci->ci_flags & CPUF_CONST_TSC) ||
!(ci->ci_flags & CPUF_INVAR_TSC))
@@ -268,111 +251,264 @@ tsc_timecounter_init(struct cpu_info *ci
calibrate_tsc_freq();
  

Re: echo(1): check for stdio errors

2022-07-30 Thread Scott Cheloha
On Mon, Jul 11, 2022 at 01:27:23PM -0500, Scott Cheloha wrote:
> On Mon, Jul 11, 2022 at 08:31:04AM -0600, Todd C. Miller wrote:
> > On Sun, 10 Jul 2022 20:58:35 -0900, Philip Guenther wrote:
> > 
> > > Three thoughts:
> > > 1) Since stdio errors are sticky, is there any real advantage to checking
> > > each call instead of just checking the final fclose()?
> 
> My thinking was that we have no idea how many arguments we're going to
> print, so we may as well fail as soon as possible.
> 
> Maybe in more complex programs there would be a code-length or
> complexity-reducing upside to deferring the ferror(3) check until,
> say, the end of a subroutine or something.
> 
> > > [...]
> > 
> > Will that really catch all errors?  From what I can tell, fclose(3)
> > can succeed even if the error flag was set.  The pattern I prefer
> > is to use a final fflush(3) followed by a call to ferror(3) before
> > the fclose(3).
> 
> [...]

Bump.  The standard's error cases for fflush(3) are identical to those
for fclose(3):

https://pubs.opengroup.org/onlinepubs/9699919799/functions/fflush.html
https://pubs.opengroup.org/onlinepubs/9699919799/functions/fclose.html

Is the fact that our fclose(3) can succeed even if the error flag is
set a bug?

Also, can I go ahead with this?  With this patch, echo(1) fails if we
(for example) try to write to a full file system.  So we are certainly
catching more stdio failures:

$ /bin/echo test > /tmp/myfile

/tmp: write failed, file system is full
$ echo $?
0

$ obj/echo test > /tmp/myfile

/tmp: write failed, file system is full
echo: stdout: No space left on device
$ echo $?
1

Progress!  Note that the shell builtin already fails in this case:

$ type echo
echo is a shell builtin
$ echo test > /tmp/myfile

/tmp: write failed, file system is full
jetsam$ echo $?
1

Index: echo.c
===
RCS file: /cvs/src/bin/echo/echo.c,v
retrieving revision 1.10
diff -u -p -r1.10 echo.c
--- echo.c  9 Oct 2015 01:37:06 -   1.10
+++ echo.c  30 Jul 2022 23:10:24 -
@@ -53,12 +53,15 @@ main(int argc, char *argv[])
nflag = 0;
 
while (*argv) {
-   (void)fputs(*argv, stdout);
-   if (*++argv)
-   putchar(' ');
+   if (fputs(*argv, stdout) == EOF)
+   err(1, "stdout");
+   if (*++argv && putchar(' ') == EOF)
+   err(1, "stdout");
}
-   if (!nflag)
-   putchar('\n');
+   if (!nflag && putchar('\n') == EOF)
+   err(1, "stdout");
+   if (fflush(stdout) == EOF || fclose(stdout) == EOF)
+   err(1, "stdout");
 
return 0;
 }



rc(8): reorder_libs(): print names of relinked libraries

2022-07-29 Thread Scott Cheloha
Recently I've been doing some MIPS64 stuff on my EdgeRouter PoE.  It
has a USB disk, two 500MHz processors, and 512MB of RAM.

So, every time I reboot to test the next iteration of my kernel
patch, I get to here:

reordering libraries: 

and I sit there for half a minute or more and wonder what the hell
it's doing.

And, in my intellectual brain, I know it's relinking the libraries
and that this is slow because it needs to link a bunch of object files
and my machine is slow and my disk is slow and I have almost no RAM.

But!  My animal brain wishes I could see some indication of progress.
Because the script has told me it is linking more than one library.
So, as with daemon startup, I am curious which library it is working
on at any given moment.

Can we print the library names as they are being relinked?

With the attached patch the boot now looks like this:

reordering libraries: ld.so libc.so.96.1 libcrypto.so.49.1.

We print the library name before it is relinked, so you can know which
library it is linking.

If for some reason we fail on a particular library, it instead looks
like this:

reordering libraries: ld.so(failed).

... which is me trying to imitate what we do for daemon startup.

Thoughts?

I know this makes rc(8) a bit noisier but it really does improve my
(for want of a better term) "user experience" as I wait for my machine
to boot.

Index: rc
===
RCS file: /cvs/src/etc/rc,v
retrieving revision 1.563
diff -u -p -r1.563 rc
--- rc  28 Jul 2022 16:06:04 -  1.563
+++ rc  30 Jul 2022 00:15:26 -
@@ -193,7 +193,7 @@ reorder_libs() {
# Remount the (read-only) filesystems in _ro_list as read-write.
for _mp in $_ro_list; do
if ! mount -u -w $_mp; then
-   echo ' failed.'
+   echo '(failed).'
return
fi
done
@@ -210,6 +210,7 @@ reorder_libs() {
_install='install -F -o root -g bin -m 0444'
_lib=${_liba##*/}
_lib=${_lib%.a}
+   echo -n " $_lib"
_lib_dir=${_liba#$_relink}
_lib_dir=${_lib_dir%/*}
cd $_tmpdir
@@ -243,9 +244,9 @@ reorder_libs() {
done
 
if $_error; then
-   echo ' failed.'
+   echo '(failed).'
else
-   echo ' done.'
+   echo '.'
fi
 }
 



Re: ts(1): parse input format string only once

2022-07-29 Thread Scott Cheloha
On Wed, Jul 13, 2022 at 12:50:24AM -0500, Scott Cheloha wrote:
> We reduce overhead if we only parse the user's format string once.  To
> achieve that, this patch does the following:
> 
> [...]
> 
> - When parsing the user format string in fmtfmt(), keep a list of
>   where each microsecond substring lands in buf.  We'll need it later.
> 
> - Move the printing part of fmtfmt() into a new function, fmtprint().
>   fmtprint() is now called from the main loop instead of fmtfmt().
> 
> - In fmtprint(), before calling strftime(3), update any microsecond
>   substrings in buf using the list we built earlier in fmtfmt().  Note
>   that if there aren't any such substrings we don't call snprintf(3)
>   at all.
> 
> [...]

Two week bump.

Here is a stripped-down patch with only the above changes.  Hopefully
this makes the intent of the patch more obvious.

In short, parse the user format string only once, and then only update
the microsecond parts (if any) when we print each new timestamp.

Index: ts.c
===
RCS file: /cvs/src/usr.bin/ts/ts.c,v
retrieving revision 1.8
diff -u -p -r1.8 ts.c
--- ts.c7 Jul 2022 10:40:25 -   1.8
+++ ts.c29 Jul 2022 13:12:07 -
@@ -17,6 +17,7 @@
  */
 
 #include 
+#include 
 #include 
 
 #include 
@@ -27,13 +28,20 @@
 #include 
 #include 
 
+SIMPLEQ_HEAD(, usec) usec_queue = SIMPLEQ_HEAD_INITIALIZER(usec_queue);
+struct usec {
+   SIMPLEQ_ENTRY(usec) next;
+   char *pos;
+};
+
 static char*format = "%b %d %H:%M:%S";
 static char*buf;
 static char*outbuf;
 static size_t   bufsize;
 static size_t   obsize;
 
-static void fmtfmt(const struct timespec *);
+static void fmtfmt(void);
+static void fmtprint(const struct timespec *);
 static void __dead  usage(void);
 
 int
@@ -88,6 +96,8 @@ main(int argc, char *argv[])
if ((outbuf = calloc(1, obsize)) == NULL)
err(1, NULL);
 
+   fmtfmt();
+
/* force UTC for interval calculations */
if (iflag || sflag)
if (setenv("TZ", "UTC", 1) == -1)
@@ -106,7 +116,7 @@ main(int argc, char *argv[])
timespecadd(, _offset, );
else
ts = now;
-   fmtfmt();
+   fmtprint();
if (iflag)
start = now;
}
@@ -132,15 +142,11 @@ usage(void)
  * so you can format while you format
  */
 static void
-fmtfmt(const struct timespec *ts)
+fmtfmt(void)
 {
-   struct tm *tm;
-   char *f, us[7];
-
-   if ((tm = localtime(>tv_sec)) == NULL)
-   err(1, "localtime");
+   char *f;
+   struct usec *u;
 
-   snprintf(us, sizeof(us), "%06ld", ts->tv_nsec / 1000);
strlcpy(buf, format, bufsize);
f = buf;
 
@@ -159,12 +165,34 @@ fmtfmt(const struct timespec *ts)
f[0] = f[1];
f[1] = '.';
f += 2;
+   u = malloc(sizeof *u);
+   if (u == NULL)
+   err(1, NULL);
+   u->pos = f;
+   SIMPLEQ_INSERT_TAIL(_queue, u, next);
l = strlen(f);
memmove(f + 6, f, l + 1);
-   memcpy(f, us, 6);
f += 6;
}
} while (*f != '\0');
+}
+
+static void
+fmtprint(const struct timespec *ts)
+{
+   char us[8];
+   struct tm *tm;
+   struct usec *u;
+
+   if ((tm = localtime(>tv_sec)) == NULL)
+   err(1, "localtime");
+
+   /* Update any microsecond substrings in the format buffer. */
+   if (!SIMPLEQ_EMPTY(_queue)) {
+   snprintf(us, sizeof(us), "%06ld", ts->tv_nsec / 1000);
+   SIMPLEQ_FOREACH(u, _queue, next)
+   memcpy(u->pos, us, 6);
+   }
 
*outbuf = '\0';
if (*buf != '\0') {



Re: [v4] amd64: simplify TSC sync testing

2022-07-28 Thread Scott Cheloha
On Thu, Jul 28, 2022 at 04:57:41PM -0400, Dave Voutila wrote:
> 
> Stuart Henderson  writes:
> 
> > On 2022/07/28 12:57, Scott Cheloha wrote:
> >> On Thu, Jul 28, 2022 at 07:55:40AM -0400, Dave Voutila wrote:
> >> >
> >> > This is breaking timecounter selection on my x13 Ryzen 5 Pro laptop
> >> > running the latest kernel from snaps.
> >>
> >> Define "breaking".
> >
> > That's clear from the output:
> >
> > : On 2022/07/28 07:55, Dave Voutila wrote:
> > : > $ sysctl -a | grep tsc
> > : > kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000)
> > : > acpitimer0(1000)
> > : > machdep.tscfreq=2096064730
> > : > machdep.invarianttsc=1
> > : >
> > : > $ sysctl kern.timecounter
> > : > kern.timecounter.tick=1
> > : > kern.timecounter.timestepwarnings=0
> > : > kern.timecounter.hardware=i8254
> > : > kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000)
> > : > acpitimer0(1000)
> >
> >> The code detects TSC desync and marks the timecounter non-monotonic.
> >
> > That's good (and I think as would have happened before)
> >
> >> So it uses the i8254 instead.
> >
> > But that's not so good, there are higher prio timecounters available,
> > acpihpet0 and acpitimer0, which would be better choices than i8254.
> 
> Exactly my point. Thanks Stuart.

Okay, please try this patch on the machine in question.

It adds a tc_detach() function to kern_tc.c.  The first time we fail
the sync test, the BP calls tc_detach(), changes the TSC's tc_quality
to a negative value to tell everyone "this is not monotonic", then
reinstalls the TSC timecounter again with tc_init().

Because we are making this call *once*, from one place, I do not think
the O(n) removal time matters, so I have not switched the tc_list from
SLIST to TAILQ.

It is possible for a thread to be asleep in sysctl_tc_hardware()
during resume, but the thread would be done iterating through the list
if it had reached rw_enter_write(), so removing/adding tsc_timecounter
to the list during resume cannot break list traversal.

Switching the active timecounter during resume is also fine.  The only
race is with tc_adjfreq().  If a thread is asleep in adjfreq(2) when
the system suspends, and we change the active timecounter during
resume, the frequency change may be applied to the "wrong" timecounter.

... but this is always a race, because adjfreq(2) only operates on the
active timecounter, and root can change it at any time via sysctl(2).
So it's not a new problem.

...

It might be simpler to just change tc_lock from a rwlock to a mutex.
Then the MP analysis is much simpler across a suspend/resume.

Index: sys/arch/amd64/amd64/tsc.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.24
diff -u -p -r1.24 tsc.c
--- sys/arch/amd64/amd64/tsc.c  31 Aug 2021 15:11:54 -  1.24
+++ sys/arch/amd64/amd64/tsc.c  29 Jul 2022 01:06:17 -
@@ -36,13 +36,6 @@ int  tsc_recalibrate;
 uint64_t   tsc_frequency;
 inttsc_is_invariant;
 
-#defineTSC_DRIFT_MAX   250
-#define TSC_SKEW_MAX   100
-int64_ttsc_drift_observed;
-
-volatile int64_t   tsc_sync_val;
-volatile struct cpu_info   *tsc_sync_cpu;
-
 u_int  tsc_get_timecount(struct timecounter *tc);
 void   tsc_delay(int usecs);
 
@@ -236,22 +229,12 @@ cpu_recalibrate_tsc(struct timecounter *
 u_int
 tsc_get_timecount(struct timecounter *tc)
 {
-   return rdtsc_lfence() + curcpu()->ci_tsc_skew;
+   return rdtsc_lfence();
 }
 
 void
 tsc_timecounter_init(struct cpu_info *ci, uint64_t cpufreq)
 {
-#ifdef TSC_DEBUG
-   printf("%s: TSC skew=%lld observed drift=%lld\n", ci->ci_dev->dv_xname,
-   (long long)ci->ci_tsc_skew, (long long)tsc_drift_observed);
-#endif
-   if (ci->ci_tsc_skew < -TSC_SKEW_MAX || ci->ci_tsc_skew > TSC_SKEW_MAX) {
-   printf("%s: disabling user TSC (skew=%lld)\n",
-   ci->ci_dev->dv_xname, (long long)ci->ci_tsc_skew);
-   tsc_timecounter.tc_user = 0;
-   }
-
if (!(ci->ci_flags & CPUF_PRIMARY) ||
!(ci->ci_flags & CPUF_CONST_TSC) ||
!(ci->ci_flags & CPUF_INVAR_TSC))
@@ -268,111 +251,276 @@ tsc_timecounter_init(struct cpu_info *ci
calibrate_tsc_freq();
}
 
-   if (tsc_drift_observed > TSC_DRIFT_MAX) {
-   printf("ERROR: %lld cycle TSC drift observed\n",
-   (long long)tsc_drift_observed);
-   tsc_timecounter.tc_quality = -1000;
-   tsc_timecounter.tc_user = 0

Re: [v4] amd64: simplify TSC sync testing

2022-07-28 Thread Scott Cheloha
> On Jul 28, 2022, at 13:41, Stuart Henderson  wrote:
> 
> On 2022/07/28 12:57, Scott Cheloha wrote:
>>> On Thu, Jul 28, 2022 at 07:55:40AM -0400, Dave Voutila wrote:
>>> 
>>> This is breaking timecounter selection on my x13 Ryzen 5 Pro laptop
>>> running the latest kernel from snaps.
>> 
>> Define "breaking".
> 
> That's clear from the output:
> 
> : On 2022/07/28 07:55, Dave Voutila wrote:
> : > $ sysctl -a | grep tsc
> : > kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000)
> : > acpitimer0(1000)
> : > machdep.tscfreq=2096064730
> : > machdep.invarianttsc=1
> : > 
> : > $ sysctl kern.timecounter
> : > kern.timecounter.tick=1
> : > kern.timecounter.timestepwarnings=0
> : > kern.timecounter.hardware=i8254
> : > kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000)
> : > acpitimer0(1000)
> 
>> The code detects TSC desync and marks the timecounter non-monotonic.
> 
> That's good (and I think as would have happened before)
> 
>> So it uses the i8254 instead.
> 
> But that's not so good, there are higher prio timecounters available,
> acpihpet0 and acpitimer0, which would be better choices than i8254.

Okay that was my second guess.

I will send out a patch addressing this in a bit.



Re: [v4] amd64: simplify TSC sync testing

2022-07-28 Thread Scott Cheloha
On Thu, Jul 28, 2022 at 07:55:40AM -0400, Dave Voutila wrote:
> 
> Scott Cheloha  writes:
> 
> > Hi,
> >
> > Thanks to everyone who tested v3.
> >
> > Attached is v4.  I would like to put this into snaps (bcc: deraadt@).
> >
> > If you've been following along and testing these patches, feel free to
> > continue testing.  If your results change from v3 to v4, please reply
> > with what happened and your dmesg.
> >
> > I made a few small changes from v3:
> >
> > - Only run the sync test after failing it on TSC_DEBUG kernels.
> >   For example, it would be a waste of time to run the sync test
> >   for 62 other CPU pairs if the CPU0/CPU1 sync test failed.
> >
> > - Pad the tsc_test_status struct by hand.  Try to keep
> >   tsc_test_status.val onto its own cache line and try to prevent one
> >   instance of the struct from sharing a cache line with another
> >   instance.
> >
> > I am looking for OKs.
> >
> > Assuming the results from snaps testing aren't catastrophic, and this
> > version is OK'd, I hope to commit this after a couple weeks in snaps.
> 
> This is breaking timecounter selection on my x13 Ryzen 5 Pro laptop
> running the latest kernel from snaps.

Define "breaking".

The code detects TSC desync and marks the timecounter non-monotonic.
So it uses the i8254 instead.

This is the intended behavior of the patch.

The latest news on the desync we're seeing on certain Ryzen CPUs is
that an engineer at AMD has said it might be a bug in AGESA and that
if/when BIOS vendors pull in a fix from AMD and distribute it to
customers it may solve the problem:

https://bugzilla.kernel.org/show_bug.cgi?id=216146#c7

--

Or do you mean "breaking" in some other way?



Re: sleep.1: misc. cleanup

2022-07-27 Thread Scott Cheloha
On Wed, Jul 27, 2022 at 07:31:11AM +0100, Jason McIntyre wrote:
> On Tue, Jul 26, 2022 at 09:18:47PM -0500, Scott Cheloha wrote:
> > A few improvements I want to make to the sleep(1) manpage.
> > 
> > DESCRIPTION
> > 
> > - "for a minimum of" is better said "for at least".
> > 
> 
> hi.
> 
> i can't really distinguish between one form being better than the other.
> "until at least" is the posix wording; "for a minimum" the text in
> net/free/open etc.

I am confident "until at least" sounds more natural than "for a
minimum of".

> 
> > - The seconds argument can be zero, so say "non-negative".
> > 
> > - Specify that the number (the whole thing) is decimal to exclude
> >   e.g. hex numbers.  It then follows that the optional fraction
> >   must also be decimal.
> > 
> > - I don't think we need to inspire the reader to use sleep(1) in any
> >   particular way.  We can just demonstrate these patterns in the
> >   Examples provided later.
> > 
> > ASYNCHRONOUS EVENTS
> > 
> > - Note that SIGALRM wakes sleep(1) up "early".
> > 
> > EXAMPLES
> > 
> > - Simplify the first example.  I think parenthetically pointing the
> >   reader to at(1) muddies what ought to be the simplest possible
> >   example.  Scheduling jobs is a way more advanced topic, sleep(1)
> >   is more like a shell primitive.
> > 
> > - Shorten the interval in the first example.  A half hour is not
> >   interactive.
> > 
> > - Get rid of the entire csh(1) example.  It's extremely complex and
> >   the bulk of the text is spent explaining things that aren't about
> >   sleep(1) at all.
> > 
> >   Maybe also of note is that very few other manpages offer csh(1)
> >   examples.  Is there a rule about that?
> > 
> 
> i suppose the dominance of sh has led to examples getting written in this
> style. but that doesn't mean we have to rewrite all csh examples. i
> think we usually use sh for script examples, but try to make sure that
> all examples work regardless of the user shell.
> 
> you're right that the current section is a bit wordy though.

Alright, I'm leaving it out then.

> > - Tweak the third example to show the reader that you can sleep
> >   for a fraction of a second, as mentioned in the Description.
> > 
> > STANDARDS
> > 
> > - Prefer active voice.
> > 
> >   "The handling of fractional arguments" is better said
> >   "Support for fractional seconds".
> > 
> >   Shorten "is provided as" to "is".
> > 
> > SEE ALSO
> > 
> > - Seems logical to point back to nanosleep(2) and sleep(3).
> > 
> 
> normally we'd try to avoid sending the reader of section1 pages to
> sections 2/3/9. but if there's stuff there that will help the user (not
> code writer) then it'd make sense. is there?

Nope, removed.

> > - Add echo(1) and ls(1) from the EXAMPLES.
> > 
> 
> that's not needed. we don't add every command listed in the page to SEE
> ALSO. just really pages we think will help people better understand the
> subject they're reading about. so echo(1) does not really help you
> understand sleep(1). however you should leave the reference to at(1) -
> in this case it shows you how to do something like sleep, but better
> suited to some specific tasks.

Okay, I have dropped echo(1) and ls(1) and restored at(1).

> >   ... unsure if we actually need to reference these or if it's
> >   a distraction.  The existing examples make use of awk(1) but
> >   do not Xr it in this section, unsure if there is a rule about
> >   this.
> > 
> > - Add signal(3) because we talk about SIGALRM.
> > 
> 
> again, i think that's outside the scope of sleep(1).

We explicitly mention that sleep(1) has non-standard behavior when
receiving SIGALRM.  It's a feature.  You can use it to, for example,
manually intervene and abbreviate a long delay in a script.

... should I cook up an example with kill(1)?

--

Here's an updated diff.

I have also noted in History that sleep(1) was rewritten for 4.4BSD
(to avoid issues with the AT&T copyright, I assume).  Keith Bostic
committed it, but I don't know if he actually rewrote it.

Index: sleep.1
===
RCS file: /cvs/src/bin/sleep/sleep.1,v
retrieving revision 1.22
diff -u -p -r1.22 sleep.1
--- sleep.1 16 Aug 2016 18:51:25 -  1.22
+++ sleep.1 27 Jul 2022 18:35:02 -
@@ -45,58 +45,27 @@
 .Sh DESCRIPTION
 The
 .Nm
-utility
-suspends execution for a minimum of the specified number of
-.Ar seconds .

sleep.1: misc. cleanup

2022-07-26 Thread Scott Cheloha
A few improvements I want to make to the sleep(1) manpage.

DESCRIPTION

- "for a minimum of" is better said "for at least".

- The seconds argument can be zero, so say "non-negative".

- Specify that the number (the whole thing) is decimal to exclude
  e.g. hex numbers.  It then follows that the optional fraction
  must also be decimal.

- I don't think we need to inspire the reader to use sleep(1) in any
  particular way.  We can just demonstrate these patterns in the
  Examples provided later.

ASYNCHRONOUS EVENTS

- Note that SIGALRM wakes sleep(1) up "early".

EXAMPLES

- Simplify the first example.  I think parenthetically pointing the
  reader to at(1) muddies what ought to be the simplest possible
  example.  Scheduling jobs is a way more advanced topic, sleep(1)
  is more like a shell primitive.

- Shorten the interval in the first example.  A half hour is not
  interactive.

- Get rid of the entire csh(1) example.  It's extremely complex and
  the bulk of the text is spent explaining things that aren't about
  sleep(1) at all.

  Maybe also of note is that very few other manpages offer csh(1)
  examples.  Is there a rule about that?

- Tweak the third example to show the reader that you can sleep
  for a fraction of a second, as mentioned in the Description.

STANDARDS

- Prefer active voice.

  "The handling of fractional arguments" is better said
  "Support for fractional seconds".

  Shorten "is provided as" to "is".

SEE ALSO

- Seems logical to point back to nanosleep(2) and sleep(3).

- Add echo(1) and ls(1) from the EXAMPLES.

  ... unsure if we actually need to reference these or if it's
  a distraction.  The existing examples make use of awk(1) but
  do not Xr it in this section, unsure if there is a rule about
  this.

- Add signal(3) because we talk about SIGALRM.

HISTORY

- Not merely "appeared": "first appeared".

--

Tweaks?  ok?

Index: sleep.1
===
RCS file: /cvs/src/bin/sleep/sleep.1,v
retrieving revision 1.22
diff -u -p -r1.22 sleep.1
--- sleep.1 16 Aug 2016 18:51:25 -  1.22
+++ sleep.1 27 Jul 2022 02:16:18 -
@@ -45,62 +45,35 @@
 .Sh DESCRIPTION
 The
 .Nm
-utility
-suspends execution for a minimum of the specified number of
-.Ar seconds .
-This number must be positive and may contain a decimal fraction.
-.Nm
-is commonly used to schedule the execution of other commands (see below).
+utility suspends execution until at least the given number of
+.Ar seconds
+have elapsed.
+.Ar seconds
+must be a non-negative decimal value and may contain a fraction.
 .Sh ASYNCHRONOUS EVENTS
 .Bl -tag -width "SIGALRMXXX"
 .It Dv SIGALRM
-Terminate normally, with a zero exit status.
+Terminate early, with a zero exit status.
 .El
 .Sh EXIT STATUS
 .Ex -std sleep
 .Sh EXAMPLES
-Wait a half hour before running the script
-.Pa command_file
-(see also the
-.Xr at 1
-utility):
-.Pp
-.Dl (sleep 1800; sh command_file >& errors)&
-.Pp
-To repetitively run a command (with
-.Xr csh 1 ) :
-.Bd -literal -offset indent
-while (! -r zzz.rawdata)
-   sleep 300
-end
-foreach i (*.rawdata)
-   sleep 70
-   awk -f collapse_data $i >> results
-end
-.Ed
+Wait five seconds before running a command:
 .Pp
-The scenario for such a script might be: a program currently
-running is taking longer than expected to process a series of
-files, and it would be nice to have another program start
-processing the files created by the first program as soon as it is finished
-(when
-.Pa zzz.rawdata
-is created).
-The script checks every five minutes for this file.
-When it is found, processing is done in several steps
-by sleeping 70 seconds between each
-.Xr awk 1
-job.
+.Dl $ sleep 5 ; echo Hello, World!
 .Pp
-To monitor the growth of a file without consuming too many resources:
+List a file twice per second:
 .Bd -literal -offset indent
-while true; do
-   ls -l file
-   sleep 5
+while ls -l file; do
+   sleep 0.5
 done
 .Ed
 .Sh SEE ALSO
-.Xr at 1
+.Xr echo 1 ,
+.Xr ls 1 ,
+.Xr nanosleep 2 ,
+.Xr signal 3 ,
+.Xr sleep 3
 .Sh STANDARDS
 The
 .Nm
@@ -108,10 +81,11 @@ utility is compliant with the
 .St -p1003.1-2008
 specification.
 .Pp
-The handling of fractional arguments is provided as an extension to that
-specification.
+Support for fractional
+.Ar seconds
+is an extension to that specification.
 .Sh HISTORY
 A
 .Nm
-utility appeared in
+utility first appeared in
 .At v4 .



Re: powerpc64: retrigger deferred DEC interrupts from splx(9)

2022-07-25 Thread Scott Cheloha
On Mon, Jul 25, 2022 at 01:52:36PM +0200, Mark Kettenis wrote:
> > Date: Sun, 24 Jul 2022 19:33:57 -0500
> > From: Scott Cheloha 
> > 
> > On Sat, Jul 23, 2022 at 08:14:32PM -0500, Scott Cheloha wrote:
> > > 
> > > [...]
> > > 
> > > I don't have a powerpc64 machine, so this is untested.  [...]
> > 
> > gkoehler@ has pointed out two dumb typos in the prior patch.  My bad.
> > 
> > Here is a corrected patch that, according to gkoehler@, actually
> > compiles.
> 
> Thanks.  I already figured that bit out myself.  Did some limited
> testing, but it seems to work correctly.  No noticeable effect on the
> timekeeping even when building clang on all the (4) cores.

I wouldn't expect this patch to impact timekeeping.  All we're doing
is calling hardclock(9) a bit sooner than we normally would a few
times every second.

I would expect to see slightly more distinct interrupts (uvmexp.intrs)
per second because we aren't actively batching hardclock(9) and
statclock calls.

... by the way, uvmexp.intrs should probably be incremented
atomically, no?

> Regarding the diff, I think it would be better to avoid changing
> trap.c.  That function is complicated enough and splitting the logic
> for this over three files makes it a bit harder to understand.  So you
> could have:
> 
> void
> decr_intr(struct trapframe *frame)
> {
>   struct cpu_info *ci = curcpu();
>   ...
>   int s;
> 
>   if (ci->ci_cpl >= IPL_CLOCK) {
>   ci->ci_dec_deferred = 1;
>   mtdec(UINT32_MAX >> 1); /* clear DEC exception */
>   return;
>   }
> 
>   ci->ci_dec_deferred = 0;
> 
>   ...
> }
> 
> That has the downside of course that it will be slightly less
> efficient if we're at IPL_CLOCK or above, but that really shouldn't
> happen often enough for it to matter.

Yep.  It's an extra function call; the overhead is small.

Updated patch below.

Index: include/cpu.h
===
RCS file: /cvs/src/sys/arch/powerpc64/include/cpu.h,v
retrieving revision 1.31
diff -u -p -r1.31 cpu.h
--- include/cpu.h   6 Jul 2021 09:34:07 -   1.31
+++ include/cpu.h   25 Jul 2022 23:43:47 -
@@ -74,9 +74,9 @@ struct cpu_info {
uint64_tci_lasttb;
uint64_tci_nexttimerevent;
uint64_tci_nextstatevent;
-   int ci_statspending;

volatile intci_cpl;
+   volatile intci_dec_deferred;
uint32_tci_ipending;
uint32_tci_idepth;
 #ifdef DIAGNOSTIC
Index: powerpc64/clock.c
===
RCS file: /cvs/src/sys/arch/powerpc64/powerpc64/clock.c,v
retrieving revision 1.3
diff -u -p -r1.3 clock.c
--- powerpc64/clock.c   23 Feb 2021 04:44:31 -  1.3
+++ powerpc64/clock.c   25 Jul 2022 23:43:47 -
@@ -98,6 +98,17 @@ decr_intr(struct trapframe *frame)
int s;
 
/*
+* If the clock interrupt is masked, postpone all work until
+* it is unmasked in splx(9).
+*/
+   if (ci->ci_cpl >= IPL_CLOCK) {
+   ci->ci_dec_deferred = 1;
+   mtdec(UINT32_MAX >> 1); /* clear DEC exception */
+   return;
+   }
+   ci->ci_dec_deferred = 0;
+
+   /*
 * Based on the actual time delay since the last decrementer reload,
 * we arrange for earlier interrupt next time.
 */
@@ -130,30 +141,23 @@ decr_intr(struct trapframe *frame)
mtdec(nextevent - tb);
mtdec(nextevent - mftb());
 
-   if (ci->ci_cpl >= IPL_CLOCK) {
-   ci->ci_statspending += nstats;
-   } else {
-   nstats += ci->ci_statspending;
-   ci->ci_statspending = 0;
-
-   s = splclock();
-   intr_enable();
-
-   /*
-* Do standard timer interrupt stuff.
-*/
-   while (ci->ci_lasttb < prevtb) {
-   ci->ci_lasttb += tick_increment;
-   clock_count.ec_count++;
-   hardclock((struct clockframe *)frame);
-   }
+   s = splclock();
+   intr_enable();
 
-   while (nstats-- > 0)
-   statclock((struct clockframe *)frame);
-
-   intr_disable();
-   splx(s);
+   /*
+* Do standard timer interrupt stuff.
+*/
+   while (ci->ci_lasttb < prevtb) {
+   ci->ci_lasttb += tick_increment;
+   clock_count.ec_count++;
+   hardclock((struct clockframe *)frame);
}
+
+   while (nstats-- > 0)
+   statclock((struct clockframe *)frame);
+
+   intr_d

Re: powerpc64: retrigger deferred DEC interrupts from splx(9)

2022-07-24 Thread Scott Cheloha
On Sat, Jul 23, 2022 at 08:14:32PM -0500, Scott Cheloha wrote:
> 
> [...]
> 
> I don't have a powerpc64 machine, so this is untested.  [...]

gkoehler@ has pointed out two dumb typos in the prior patch.  My bad.

Here is a corrected patch that, according to gkoehler@, actually
compiles.

Index: include/cpu.h
===
RCS file: /cvs/src/sys/arch/powerpc64/include/cpu.h,v
retrieving revision 1.31
diff -u -p -r1.31 cpu.h
--- include/cpu.h   6 Jul 2021 09:34:07 -   1.31
+++ include/cpu.h   25 Jul 2022 00:30:33 -
@@ -74,9 +74,9 @@ struct cpu_info {
uint64_tci_lasttb;
uint64_tci_nexttimerevent;
uint64_tci_nextstatevent;
-   int ci_statspending;

volatile intci_cpl;
+   volatile intci_dec_deferred;
uint32_tci_ipending;
uint32_tci_idepth;
 #ifdef DIAGNOSTIC
Index: powerpc64/clock.c
===
RCS file: /cvs/src/sys/arch/powerpc64/powerpc64/clock.c,v
retrieving revision 1.3
diff -u -p -r1.3 clock.c
--- powerpc64/clock.c   23 Feb 2021 04:44:31 -  1.3
+++ powerpc64/clock.c   25 Jul 2022 00:30:33 -
@@ -130,30 +130,23 @@ decr_intr(struct trapframe *frame)
mtdec(nextevent - tb);
mtdec(nextevent - mftb());
 
-   if (ci->ci_cpl >= IPL_CLOCK) {
-   ci->ci_statspending += nstats;
-   } else {
-   nstats += ci->ci_statspending;
-   ci->ci_statspending = 0;
+   s = splclock();
+   intr_enable();
 
-   s = splclock();
-   intr_enable();
-
-   /*
-* Do standard timer interrupt stuff.
-*/
-   while (ci->ci_lasttb < prevtb) {
-   ci->ci_lasttb += tick_increment;
-   clock_count.ec_count++;
-   hardclock((struct clockframe *)frame);
-   }
+   /*
+* Do standard timer interrupt stuff.
+*/
+   while (ci->ci_lasttb < prevtb) {
+   ci->ci_lasttb += tick_increment;
+   clock_count.ec_count++;
+   hardclock((struct clockframe *)frame);
+   }
 
-   while (nstats-- > 0)
-   statclock((struct clockframe *)frame);
+   while (nstats-- > 0)
+   statclock((struct clockframe *)frame);
 
-   intr_disable();
-   splx(s);
-   }
+   intr_disable();
+   splx(s);
 }
 
 void
Index: powerpc64/intr.c
===
RCS file: /cvs/src/sys/arch/powerpc64/powerpc64/intr.c,v
retrieving revision 1.9
diff -u -p -r1.9 intr.c
--- powerpc64/intr.c26 Sep 2020 17:56:54 -  1.9
+++ powerpc64/intr.c25 Jul 2022 00:30:33 -
@@ -139,6 +139,11 @@ splx(int new)
 {
struct cpu_info *ci = curcpu();
 
+   if (ci->ci_dec_deferred && new < IPL_CLOCK) {
+   mtdec(0);
+   mtdec(UINT32_MAX);  /* raise DEC exception */
+   }
+
if (ci->ci_ipending & intr_smask[new])
intr_do_pending(new);
 
Index: powerpc64/trap.c
===
RCS file: /cvs/src/sys/arch/powerpc64/powerpc64/trap.c,v
retrieving revision 1.51
diff -u -p -r1.51 trap.c
--- powerpc64/trap.c11 May 2021 18:21:12 -  1.51
+++ powerpc64/trap.c25 Jul 2022 00:30:33 -
@@ -65,9 +65,15 @@ trap(struct trapframe *frame)
switch (type) {
case EXC_DECR:
uvmexp.intrs++;
-   ci->ci_idepth++;
-   decr_intr(frame);
-   ci->ci_idepth--;
+   if (ci->ci_cpl < IPL_CLOCK) {
+   ci->ci_dec_deferred = 0;
+   ci->ci_idepth++;
+   decr_intr(frame);
+   ci->ci_idepth--;
+   } else {
+   ci->ci_dec_deferred = 1;
+   mtdec(UINT32_MAX >> 1); /* clear DEC exception */
+   }
return;
case EXC_EXI:
uvmexp.intrs++;



powerpc64: retrigger deferred DEC interrupts from splx(9)

2022-07-23 Thread Scott Cheloha
Okay, we did this for powerpc/macppc, on to powerpc64.

It's roughly the same problem as before:

- On powerpc64 we need to leave the DEC unmasked at or above
  IPL_CLOCK.

- Currently we defer clock interrupt work to the next tick if a DEC
  interrupt arrives when the CPU's priority level is at or above
  IPL_CLOCK.

- This is a problem because the MD code needs to know about
  when the next clock interrupt event is scheduled and I intend
  to make that information machine-independent and handle it
  in machine-independent code in the future.

- This patch instead defers clock interrupt work to the next splx(9)
  call where the CPU's priority level is dropping below IPL_CLOCK.
  This requires no knowledge of when the next clock interrupt
  event is scheduled.

The code is almost identical to what we did for powerpc/macppc,
except that:

- We can do the ci_dec_deferred handling in trap(), which is a
  bit cleaner.

- There is only one splx() function that needs modifying.

Unless I'm missing something, we no longer need the struct member
cpu_info.ci_statspending.

I don't have a powerpc64 machine, so this is untested.  I would
appreciate tests and review.  If you're copied on this, I'm under the
impression you have a powerpc64 machine or know someone who might.

Thoughts?  Test results?

I'm really sorry if this doesn't work out of the box and your machine
hangs.

Index: include/cpu.h
===
RCS file: /cvs/src/sys/arch/powerpc64/include/cpu.h,v
retrieving revision 1.31
diff -u -p -r1.31 cpu.h
--- include/cpu.h   6 Jul 2021 09:34:07 -   1.31
+++ include/cpu.h   24 Jul 2022 01:08:22 -
@@ -74,9 +74,9 @@ struct cpu_info {
uint64_tci_lasttb;
uint64_tci_nexttimerevent;
uint64_tci_nextstatevent;
-   int ci_statspending;

volatile intci_cpl;
+   volatile intci_dec_deferred;
uint32_tci_ipending;
uint32_tci_idepth;
 #ifdef DIAGNOSTIC
Index: powerpc64/clock.c
===
RCS file: /cvs/src/sys/arch/powerpc64/powerpc64/clock.c,v
retrieving revision 1.3
diff -u -p -r1.3 clock.c
--- powerpc64/clock.c   23 Feb 2021 04:44:31 -  1.3
+++ powerpc64/clock.c   24 Jul 2022 01:08:22 -
@@ -130,30 +130,23 @@ decr_intr(struct trapframe *frame)
mtdec(nextevent - tb);
mtdec(nextevent - mftb());
 
-   if (ci->ci_cpl >= IPL_CLOCK) {
-   ci->ci_statspending += nstats;
-   } else {
-   nstats += ci->ci_statspending;
-   ci->ci_statspending = 0;
+   s = splclock();
+   intr_enable();
 
-   s = splclock();
-   intr_enable();
-
-   /*
-* Do standard timer interrupt stuff.
-*/
-   while (ci->ci_lasttb < prevtb) {
-   ci->ci_lasttb += tick_increment;
-   clock_count.ec_count++;
-   hardclock((struct clockframe *)frame);
-   }
+   /*
+* Do standard timer interrupt stuff.
+*/
+   while (ci->ci_lasttb < prevtb) {
+   ci->ci_lasttb += tick_increment;
+   clock_count.ec_count++;
+   hardclock((struct clockframe *)frame);
+   }
 
-   while (nstats-- > 0)
-   statclock((struct clockframe *)frame);
+   while (nstats-- > 0)
+   statclock((struct clockframe *)frame);
 
-   intr_disable();
-   splx(s);
-   }
+   intr_disable();
+   splx(s);
 }
 
 void
Index: powerpc64/intr.c
===
RCS file: /cvs/src/sys/arch/powerpc64/powerpc64/intr.c,v
retrieving revision 1.9
diff -u -p -r1.9 intr.c
--- powerpc64/intr.c26 Sep 2020 17:56:54 -  1.9
+++ powerpc64/intr.c24 Jul 2022 01:08:22 -
@@ -139,6 +139,11 @@ splx(int new)
 {
struct cpu_info *ci = curcpu();
 
+   if (ci->ci_dec_deferred && new < IPL_CLOCK) {
+   mtdec(0);
+   mtdec(UINT32_MAX);  /* raise DEC exception */
+   }
+
if (ci->ci_ipending & intr_smask[new])
intr_do_pending(new);
 
Index: powerpc64/trap.c
===
RCS file: /cvs/src/sys/arch/powerpc64/powerpc64/trap.c,v
retrieving revision 1.51
diff -u -p -r1.51 trap.c
--- powerpc64/trap.c11 May 2021 18:21:12 -  1.51
+++ powerpc64/trap.c24 Jul 2022 01:08:22 -
@@ -65,9 +65,15 @@ trap(struct trapframe *frame)
switch (type) {
case EXC_DECR:
uvmexp.intrs++;
-   ci->ci_idepth++;
-   decr_intr(frame);
-   ci->ci_idepth--;
+   if (ci->ci_cpl < IPL_CLOCK) {
+   ci->ci_decr_deferred = 0;
+   

[v2] timeout.9: rewrite

2022-07-22 Thread Scott Cheloha
Hi,

As promised, here is the timeout.9 manpage rewrite I've been sitting
on.  I am pretty sure jmc@ (and maybe schwarze@) read an earlier
version of this.  It has drifted a bit since then, but not much.

My main goal here is to make all the "gotchas" in the timeout API more
explicit.  The API is large, so the manpage is necessarily longer than
the average manpage.

We're also stuck in the midst of an API transition, so there is some
overlap in the API coverage.  Hopefully most of that redundancy can be
consolidated in the future after I finish the clock interrupt work.

-Scott

Index: share/man/man9/timeout.9
===
RCS file: /cvs/src/share/man/man9/timeout.9,v
retrieving revision 1.55
diff -u -p -r1.55 timeout.9
--- share/man/man9/timeout.922 Jun 2022 14:10:49 -  1.55
+++ share/man/man9/timeout.922 Jul 2022 18:34:14 -
@@ -1,6 +1,7 @@
 .\"$OpenBSD: timeout.9,v 1.55 2022/06/22 14:10:49 visa Exp $
 .\"
 .\" Copyright (c) 2000 Artur Grabowski 
+.\" Copyright (c) 2021, 2022 Scott Cheloha 
 .\" All rights reserved.
 .\"
 .\" Redistribution and use in source and binary forms, with or without
@@ -36,6 +37,8 @@
 .Nm timeout_add_nsec ,
 .Nm timeout_add_usec ,
 .Nm timeout_add_tv ,
+.Nm timeout_rel_nsec ,
+.Nm timeout_abs_ts ,
 .Nm timeout_del ,
 .Nm timeout_del_barrier ,
 .Nm timeout_barrier ,
@@ -44,281 +47,375 @@
 .Nm timeout_triggered ,
 .Nm TIMEOUT_INITIALIZER ,
 .Nm TIMEOUT_INITIALIZER_FLAGS
-.Nd execute a function after a specified period of time
+.Nd execute a function in the future
 .Sh SYNOPSIS
 .In sys/types.h
 .In sys/timeout.h
 .Ft void
-.Fn timeout_set "struct timeout *to" "void (*fn)(void *)" "void *arg"
+.Fo timeout_set
+.Fa "struct timeout *to"
+.Fa "void (*fn)(void *)"
+.Fa "void *arg"
+.Fc
 .Ft void
 .Fo timeout_set_flags
 .Fa "struct timeout *to"
 .Fa "void (*fn)(void *)"
 .Fa "void *arg"
+.Fa "int kclock"
 .Fa "int flags"
 .Fc
 .Ft void
-.Fn timeout_set_proc "struct timeout *to" "void (*fn)(void *)" "void *arg"
+.Fo timeout_set_proc
+.Fa "struct timeout *to"
+.Fa "void (*fn)(void *)"
+.Fa "void *arg"
+.Fc
 .Ft int
-.Fn timeout_add "struct timeout *to" "int ticks"
+.Fo timeout_add
+.Fa "struct timeout *to"
+.Fa "int nticks"
+.Fc
 .Ft int
-.Fn timeout_del "struct timeout *to"
+.Fo timeout_add_sec
+.Fa "struct timeout *to"
+.Fa "int secs"
+.Fc
 .Ft int
-.Fn timeout_del_barrier "struct timeout *to"
-.Ft void
-.Fn timeout_barrier "struct timeout *to"
+.Fo timeout_add_msec
+.Fa "struct timeout *to"
+.Fa "int msecs"
+.Fc
 .Ft int
-.Fn timeout_pending "struct timeout *to"
+.Fo timeout_add_usec
+.Fa "struct timeout *to"
+.Fa "int usecs"
+.Fc
 .Ft int
-.Fn timeout_initialized "struct timeout *to"
+.Fo timeout_add_nsec
+.Fa "struct timeout *to"
+.Fa "int nsecs"
+.Fc
 .Ft int
-.Fn timeout_triggered "struct timeout *to"
+.Fo timeout_add_tv
+.Fa "struct timeout *to"
+.Fa "struct timeval *tv"
+.Fc
 .Ft int
-.Fn timeout_add_tv "struct timeout *to" "struct timeval *"
+.Fo timeout_rel_nsec
+.Fa "struct timeout *to"
+.Fa "uint64_t nsecs"
+.Fc
 .Ft int
-.Fn timeout_add_sec "struct timeout *to" "int sec"
+.Fo timeout_abs_ts
+.Fa "struct timeout *to"
+.Fa "const struct timespec *abs"
+.Fc
 .Ft int
-.Fn timeout_add_msec "struct timeout *to" "int msec"
+.Fo timeout_del
+.Fa "struct timeout *to"
+.Fc
+.Ft int
+.Fo timeout_del_barrier
+.Fa "struct timeout *to"
+.Fc
+.Ft void
+.Fo timeout_barrier
+.Fa "struct timeout *to"
+.Fc
+.Ft int
+.Fo timeout_pending
+.Fa "struct timeout *to"
+.Fc
 .Ft int
-.Fn timeout_add_usec "struct timeout *to" "int usec"
+.Fo timeout_initialized
+.Fa "struct timeout *to"
+.Fc
 .Ft int
-.Fn timeout_add_nsec "struct timeout *to" "int nsec"
-.Fn TIMEOUT_INITIALIZER "void (*fn)(void *)" "void *arg"
-.Fn TIMEOUT_INITIALIZER_FLAGS "void (*fn)(void *)" "void *arg" "int flags"
+.Fo timeout_triggered
+.Fa "struct timeout *to"
+.Fc
+.Fo TIMEOUT_INITIALIZER
+.Fa "void (*fn)(void *)"
+.Fa "void *arg"
+.Fc
+.Fo TIMEOUT_INITIALIZER_FLAGS
+.Fa "void (*fn)(void *)"
+.Fa "void *arg"
+.Fa "int kclock"
+.Fa "int flags"
+.Fc
 .Sh DESCRIPTION
 The
 .Nm timeout
-API provides a mechanism to execute a function at a given time.
-The granularity of the time is limited by the granularity of the
-.Xr hardclock 9
-timer which

Re: timeout.9: fix description

2022-07-22 Thread Scott Cheloha
> On Jul 22, 2022, at 05:50, Klemens Nanni  wrote:
> 
> NAME has it right:
>... – execute a function after a specified period of time
> 
> but DESCRIPTION says something else:
>The timeout API provides a mechanism to execute a function
>at a given time.
> 
> The latter reads as if I could pass a specific point in time, e.g.
> Fri Jul 22 16:00:00 UTC 2022, at which a function should be run.
> 
> But this API is about timeouts, i.e. a duration of time like 10s,
> which does not match my understanding of "at a given time".
> 
> 
> So reuse NAME's wording and make it a proper sentence.
> 
> Feedback? OK?

I rewrote this page a year or so ago but I think I dropped the patch
due to lack of developer input.  If you give me 12 hours I will send
it out for your consideration.

I would send it sooner, but I accidentally cut the fiber-optic cable
with a shovel last night, so we all need to wait until this afternoon
for AT&T to replace it.



[v4] amd64: simplify TSC sync testing

2022-07-20 Thread Scott Cheloha
Hi,

Thanks to everyone who tested v3.

Attached is v4.  I would like to put this into snaps (bcc: deraadt@).

If you've been following along and testing these patches, feel free to
continue testing.  If your results change from v3 to v4, please reply
with what happened and your dmesg.

I made a few small changes from v3:

- Only run the sync test after failing it on TSC_DEBUG kernels.
  For example, it would be a waste of time to run the sync test
  for 62 other CPU pairs if the CPU0/CPU1 sync test failed.

- Pad the tsc_test_status struct by hand.  Try to keep
  tsc_test_status.val onto its own cache line and try to prevent one
  instance of the struct from sharing a cache line with another
  instance.

I am looking for OKs.

Assuming the results from snaps testing aren't catastrophic, and this
version is OK'd, I hope to commit this after a couple weeks in snaps.

There are two things I'm unsure about that I hope a reviewer will
comment on:

- Do we need to keep the double-test?  IIUC the purpose of the
  double-test is to check for drift.  But with this change we no
  longer have a concept of drift.

- Is the LFENCE in tsc_test_ap()/tst_test_bp() sufficient
  to ensure one TSC value predates the other?  Or do I need
  to insert membar_consumer()/membar_producer() calls to
  provide that guarantee?

-Scott

Index: sys/arch/amd64/amd64/tsc.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.24
diff -u -p -r1.24 tsc.c
--- sys/arch/amd64/amd64/tsc.c  31 Aug 2021 15:11:54 -  1.24
+++ sys/arch/amd64/amd64/tsc.c  20 Jul 2022 21:58:40 -
@@ -36,13 +36,6 @@ int  tsc_recalibrate;
 uint64_t   tsc_frequency;
 inttsc_is_invariant;
 
-#defineTSC_DRIFT_MAX   250
-#define TSC_SKEW_MAX   100
-int64_ttsc_drift_observed;
-
-volatile int64_t   tsc_sync_val;
-volatile struct cpu_info   *tsc_sync_cpu;
-
 u_int  tsc_get_timecount(struct timecounter *tc);
 void   tsc_delay(int usecs);
 
@@ -236,22 +229,12 @@ cpu_recalibrate_tsc(struct timecounter *
 u_int
 tsc_get_timecount(struct timecounter *tc)
 {
-   return rdtsc_lfence() + curcpu()->ci_tsc_skew;
+   return rdtsc_lfence();
 }
 
 void
 tsc_timecounter_init(struct cpu_info *ci, uint64_t cpufreq)
 {
-#ifdef TSC_DEBUG
-   printf("%s: TSC skew=%lld observed drift=%lld\n", ci->ci_dev->dv_xname,
-   (long long)ci->ci_tsc_skew, (long long)tsc_drift_observed);
-#endif
-   if (ci->ci_tsc_skew < -TSC_SKEW_MAX || ci->ci_tsc_skew > TSC_SKEW_MAX) {
-   printf("%s: disabling user TSC (skew=%lld)\n",
-   ci->ci_dev->dv_xname, (long long)ci->ci_tsc_skew);
-   tsc_timecounter.tc_user = 0;
-   }
-
if (!(ci->ci_flags & CPUF_PRIMARY) ||
!(ci->ci_flags & CPUF_CONST_TSC) ||
!(ci->ci_flags & CPUF_INVAR_TSC))
@@ -268,111 +251,276 @@ tsc_timecounter_init(struct cpu_info *ci
calibrate_tsc_freq();
}
 
-   if (tsc_drift_observed > TSC_DRIFT_MAX) {
-   printf("ERROR: %lld cycle TSC drift observed\n",
-   (long long)tsc_drift_observed);
-   tsc_timecounter.tc_quality = -1000;
-   tsc_timecounter.tc_user = 0;
-   tsc_is_invariant = 0;
-   }
-
	tc_init(&tsc_timecounter);
 }
 
-/*
- * Record drift (in clock cycles).  Called during AP startup.
- */
 void
-tsc_sync_drift(int64_t drift)
+tsc_delay(int usecs)
 {
-   if (drift < 0)
-   drift = -drift;
-   if (drift > tsc_drift_observed)
-   tsc_drift_observed = drift;
+   uint64_t interval, start;
+
+   interval = (uint64_t)usecs * tsc_frequency / 1000000;
+   start = rdtsc_lfence();
+   while (rdtsc_lfence() - start < interval)
+   CPU_BUSY_CYCLE();
 }
 
+#ifdef MULTIPROCESSOR
+
+#define TSC_DEBUG 1
+
+/*
+ * Protections for global variables in this code:
+ *
+ * a   Modified atomically
+ * b   Protected by a barrier
+ * p   Only modified by the primary CPU
+ */
+
+#define TSC_TEST_MS		1	/* Test round duration */
+#define TSC_TEST_ROUNDS		2	/* Number of test rounds */
+
 /*
- * Called during startup of APs, by the boot processor.  Interrupts
- * are disabled on entry.
+ * tsc_test_status.val is cacheline-aligned (64-byte) to limit
+ * false sharing during the test and reduce our margin of error.
  */
+struct tsc_test_status {
+   volatile uint64_t val;  /* [b] latest RDTSC value */
+   uint64_t pad1[7];
+   uint64_t lag_count; /* [b] number of lags seen by CPU */
+   uint64_t lag_max;   /* [b] Biggest lag seen */
+   int64_t adj;/* [b] initial IA32_TSC_ADJUST value */
+   uint64_t pad2[5];
+} __aligned(64);
+struct tsc_test_status tsc_ap_status;  /* [b] Test results from AP */

Re: [v3] amd64: simplify TSC sync testing

2022-07-20 Thread Scott Cheloha
> On Jul 20, 2022, at 01:48, Masato Asou  wrote:
> 
> Sorry, my latest reply.
> 
> I tested your patch on my Proxmox Virtual Environment on Ryzen7 box.
> It works fine for me.

This VM doesn't have the ITSC CPU flag,
how is it using the TSC as a timecounter?

> OpenBSD 7.1-current (GENERIC.MP) #1: Wed Jul 20 14:15:23 JST 2022
>a...@pve-obsd.my.domain:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 17162952704 (16367MB)
> avail mem = 16625430528 (15855MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf59c0 (10 entries)
> bios0: vendor SeaBIOS version "rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org" date 04/01/2014
> bios0: QEMU Standard PC (i440FX + PIIX, 1996)
> acpi0 at bios0: ACPI 1.0
> acpi0: sleep states S3 S4 S5
> acpi0: tables DSDT FACP APIC SSDT HPET WAET
> acpi0: wakeup devices
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Common KVM processor, 3593.56 MHz, 0f-06-01
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,CX16,x2APIC,HV,NXE,LONG,LAHF,CMPLEG

Here, no "ITSC".


ts(1): parse input format string only once

2022-07-12 Thread Scott Cheloha
We reduce overhead if we only parse the user's format string once.  To
achieve that, this patch does the following:

- Move "format" into main().  We don't need it as a global anymore.

- Move buffer allocation into fmtfmt().  As claudio@ mentioned in a
  different thread, we need at most (3 * strlen(format) + 1) bytes for
  buf, the parsed format string.  I have added a comment explaining the
  allocation.  I also left an assert(3) to confirm my math.  Unsure
  whether or not to leave the assert(3) in... we only run the assert
  once, so it isn't very costly.

- In fmtfmt(), preallocate a flat 512 bytes for outbuf.  We aren't using
  the 10x allocation for buf anymore, so keeping it for outbuf seems
  arbitrary. If we're going to use a magic number I figure it may as
  well be large enough for practical timestamps and a power of two.
  Feel free to suggest something else.

- Call fmtfmt() where we used to do buffer allocation in main().

- When parsing the user format string in fmtfmt(), keep a list of
  where each microsecond substring lands in buf.  We'll need it later.

- Move the printing part of fmtfmt() into a new function, fmtprint().
  fmtprint() is now called from the main loop instead of fmtfmt().

- In fmtprint(), before calling strftime(3), update any microsecond
  substrings in buf using the list we built earlier in fmtfmt().  Note
  that if there aren't any such substrings we don't call snprintf(3)
  at all.

--

Okay, on to the numbers.  My benchmark input is a million newlines:

$ yes '' | head -n 1000000 > newline-1M.txt

The benchmark is "real time taken to timestamp the input."

Patched ts(1) is about 45% faster using the empty format string.
N=100.

x ts-head.dat1
+ ts-patch.dat1
    N           Min           Max        Median           Avg        Stddev
x 100     1.7420306      1.820921     1.7468652     1.7504513   0.010192689
+ 100    0.96225744    0.98482864    0.96404194    0.96658115  0.0052265094
Difference at 99.5% confidence
-0.78387 +/- 0.00353946
-44.781% +/- 0.202203%
(Student's t, pooled s = 0.00809961)

Patched ts(1) is about 25% faster using the default format string,
i.e. '%b %d %H:%M:%S'.  N=100.

x ts-head.dat2
+ ts-patch.dat2
    N           Min           Max        Median           Avg        Stddev
x 100  4.7128656  4.9162049  4.7212578  4.7313946  0.026306241
+ 100  3.5083849  3.7382005  3.5126801  3.5271755   0.03256854
Difference at 99.5% confidence
-1.20422 +/- 0.0129365
-25.4517% +/- 0.273418%
(Student's t, pooled s = 0.0296034)

Patched ts(1) is about 10% faster using the format string '%FT%.TZ'.
This format is similar to the ISO 8601 timestamp format but with added
microsecond granularity.  N=100.

x ts-head.dat4
+ ts-patch.dat4
    N           Min           Max        Median           Avg        Stddev
x 100  6.5432762  7.0483151  6.5909038  6.6034806  0.065535466
+ 100  5.9177588  6.5244303  5.9288786  5.9535684  0.074405632
Difference at 99.5% confidence
-0.649912 +/- 0.0306379
-9.84196% +/- 0.463966%
(Student's t, pooled s = 0.070111)

All differences are statistically significant at a 99.5% CI.

--

Thoughts?  Tweaks?  ok?

Index: ts.c
===
RCS file: /cvs/src/usr.bin/ts/ts.c,v
retrieving revision 1.8
diff -u -p -r1.8 ts.c
--- ts.c	7 Jul 2022 10:40:25 -	1.8
+++ ts.c	13 Jul 2022 05:41:55 -
@@ -17,8 +17,10 @@
  */
 
 #include 
+#include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -27,18 +29,25 @@
 #include 
 #include 
 
-static char*format = "%b %d %H:%M:%S";
+SIMPLEQ_HEAD(, usec) usec_queue = SIMPLEQ_HEAD_INITIALIZER(usec_queue);
+struct usec {
+   SIMPLEQ_ENTRY(usec) next;
+   char *pos;
+};
+
 static char*buf;
 static char*outbuf;
 static size_t   bufsize;
 static size_t   obsize;
 
-static void fmtfmt(const struct timespec *);
+static void fmtfmt(const char *);
+static void fmtprint(const struct timespec *);
 static void __dead  usage(void);
 
 int
 main(int argc, char *argv[])
 {
+   char *format = "%b %d %H:%M:%S";
int iflag, mflag, sflag;
int ch, prev;
struct timespec start, now, utc_offset, ts;
@@ -75,18 +84,7 @@ main(int argc, char *argv[])
if ((iflag && sflag) || argc > 1)
usage();
 
-   if (argc == 1)
-   format = *argv;
-
-   bufsize = strlen(format) + 1;
-   if (bufsize > SIZE_MAX / 10)
-   errx(1, "format string too big");
-   bufsize *= 10;
-   obsize = bufsize;
-   if ((buf = calloc(1, bufsize)) == NULL)
-   err(1, NULL);
-   if ((outbuf = calloc(1, obsize)) == NULL)
-   err(1, NULL);
+   fmtfmt(argc == 1 ? *argv : format);
 
/* force UTC for interval calculations */
if (iflag || sflag)
@@ -106,7 +104,7 @@ main(int argc, char *argv[])

Re: echo(1): check for stdio errors

2022-07-11 Thread Scott Cheloha
On Mon, Jul 11, 2022 at 08:31:04AM -0600, Todd C. Miller wrote:
> On Sun, 10 Jul 2022 20:58:35 -0900, Philip Guenther wrote:
> 
> > Three thoughts:
> > 1) Since stdio errors are sticky, is there any real advantage to checking
> > each call instead of just checking the final fclose()?

My thinking was that we have no idea how many arguments we're going to
print, so we may as well fail as soon as possible.

Maybe in more complex programs there would be a code-length or
complexity-reducing upside to deferring the ferror(3) check until,
say, the end of a subroutine or something.

> > [...]
> 
> Will that really catch all errors?  From what I can tell, fclose(3)
> can succeed even if the error flag was set.  The pattern I prefer
> is to use a final fflush(3) followed by a call to ferror(3) before
> the fclose(3).

That's weird, I was under the impression POSIX mandated an error case
for the implicit fflush(3) done by fclose(3).  But I'm looking at the
standard and seeing nothing specific.

So, yes?  It is probably more portable to check fflush(3) explicitly?

This feels redundant though.  Like, obviously I want to flush the
descriptor when we close the stream, and obviously I would want to
know if the flush failed.  That's why I'm using stdio.

Index: echo.c
===
RCS file: /cvs/src/bin/echo/echo.c,v
retrieving revision 1.10
diff -u -p -r1.10 echo.c
--- echo.c  9 Oct 2015 01:37:06 -   1.10
+++ echo.c  11 Jul 2022 18:19:39 -
@@ -53,12 +53,15 @@ main(int argc, char *argv[])
nflag = 0;
 
while (*argv) {
-   (void)fputs(*argv, stdout);
-   if (*++argv)
-   putchar(' ');
+   if (fputs(*argv, stdout) == EOF)
+   err(1, "stdout");
+   if (*++argv && putchar(' ') == EOF)
+   err(1, "stdout");
}
-   if (!nflag)
-   putchar('\n');
+   if (!nflag && putchar('\n') == EOF)
+   err(1, "stdout");
+   if (fflush(stdout) == EOF || fclose(stdout) == EOF)
+   err(1, "stdout");
 
return 0;
 }



echo(1): check for stdio errors

2022-07-10 Thread Scott Cheloha
ok?

Index: echo.c
===
RCS file: /cvs/src/bin/echo/echo.c,v
retrieving revision 1.10
diff -u -p -r1.10 echo.c
--- echo.c  9 Oct 2015 01:37:06 -   1.10
+++ echo.c  10 Jul 2022 22:00:18 -
@@ -53,12 +53,15 @@ main(int argc, char *argv[])
nflag = 0;
 
while (*argv) {
-   (void)fputs(*argv, stdout);
-   if (*++argv)
-   putchar(' ');
+   if (fputs(*argv, stdout) == EOF)
+   err(1, "stdout");
+   if (*++argv && putchar(' ') == EOF)
+   err(1, "stdout");
}
if (!nflag)
putchar('\n');
+   if (fclose(stdout) == EOF)
+   err(1, "stdout");
 
return 0;
 }



Re: [v3] amd64: simplify TSC sync testing

2022-07-06 Thread Scott Cheloha
> pvbus0 at mainbus0: bhyve
> pci0 at mainbus0 bus 0
> pchb0 at pci0 dev 0 function 0 unknown vendor 0x1275 product 0x1275 rev 0x00
> pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00
> virtio0 at pci0 dev 2 function 0 "Qumranet Virtio Storage" rev 0x00
> vioblk0 at virtio0
> scsibus1 at vioblk0: 1 targets
> sd0 at scsibus1 targ 0 lun 0: 
> sd0: 20480MB, 512 bytes/sector, 41943040 sectors
> virtio0: msix shared
> ahci0 at pci0 dev 3 function 0 "Intel 82801H AHCI" rev 0x00: msi, AHCI 1.3
> ahci0: port 0: 6.0Gb/s
> scsibus2 at ahci0: 32 targets
> cd0 at scsibus2 targ 0 lun 0:  removable
> virtio1 at pci0 dev 4 function 0 "Qumranet Virtio Network" rev 0x00
> vio0 at virtio1: address 00:a0:98:db:89:86
> virtio1: msix shared
> isa0 at pcib0
> isadma0 at isa0
> pckbc0 at isa0 port 0x60/5 irq 1 irq 12
> pckbd0 at pckbc0 (kbd slot)
> wskbd0 at pckbd0 mux 1
> pms0 at pckbc0 (aux slot)
> wsmouse0 at pms0 mux 0
> /dev/ksyms: Symbol table not valid.
> vscsi0 at root
> scsibus3 at vscsi0: 256 targets
> softraid0 at root
> scsibus4 at softraid0: 256 targets
> root on sd0a (31879798ea82ad23.a) swap on sd0b dump on sd0b
> 
> $ sysctl kern.timecounter
> kern.timecounter.tick=1
> kern.timecounter.timestepwarnings=0
> kern.timecounter.hardware=tsc
> kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000) acpitimer0(1000)
> 
> 
> On 7/5/22 11:06, Scott Cheloha wrote:
> > Hi,
> > 
> > Once again, I am trying to change our approach to TSC sync testing to
> > eliminate false positive results.  Instead of trying to repair the TSC
> > by measuring skew, we just spin in a lockless loop looking for skew
> > and mark the TSC as broken if we detect any.
> > 
> > This is motivated in part by some multisocket machines that do not use
> > the TSC as a timecounter because the current sync test confuses NUMA
> > latency for TSC skew.
> > 
> > The only difference between this version and the prior version (v2) is
> > that we check whether we have the IA32_TSC_ADJUST register by hand in
> > tsc_reset_adjust().  If someone wants to help me rearrange cpu_hatch()
> > so we do CPU identification before TSC sync testing we can remove the
> > workaround later.
> > 
> > If you have the IA32_TSC_ADJUST register and it is non-zero going into
> > the test, you will see something on the console like this:
> > 
> > tsc: cpu5: IA32_TSC_ADJUST: -150 -> 0
> > 
> > This does *not* mean you failed the test.  It just means you probably
> > have a bug in your BIOS (or some other firmware) and you should report
> > it to your vendor.
> > 
> > If you fail the test you will see something like this:
> > 
> > tsc: cpu0/cpu2: sync test round 1/2 failed
> > tsc: cpu0/cpu2: cpu2: 13043 lags 438 cycles
> > 
> > A printout like this would mean that the sync test for cpu2 failed.
> > In particular, cpu2's TSC trails cpu0's TSC by at least 438 cycles.
> > If this happens for *any* CPU we mark the TSC timecounter as
> > defective.
> > 
> > --
> > 
> > Please test!  Send your dmesg, pass or fail.
> > 
> > I am especially interested in:
> > 
> > 1. A test from dv@.  Your dual-socket machine has the IA32_TSC_ADJUST
> > register but it failed the test running patch v2.  Maybe it will pass
> > with this version?
> > 
> > 2. Other multisocket machines.
> > 
> > 3. There were reports of TSC issues with OpenBSD VMs running on ESXi.
> > What happens when you run with this patch?
> > 
> > 4. OpenBSD VMs on other hypervisors.
> > 
> > 5. Non-Lenovo machines, non-Intel machines.
> > 
> > -Scott
> > 
> > Index: amd64/tsc.c
> > ===
> > RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
> > retrieving revision 1.24
> > diff -u -p -r1.24 tsc.c
> > --- amd64/tsc.c 31 Aug 2021 15:11:54 -  1.24
> > +++ amd64/tsc.c 5 Jul 2022 01:52:10 -
> > @@ -36,13 +36,6 @@ int  tsc_recalibrate;
> >   uint64_t  tsc_frequency;
> >   int   tsc_is_invariant;
> > -#defineTSC_DRIFT_MAX   250
> > -#define TSC_SKEW_MAX   100
> > -int64_ttsc_drift_observed;
> > -
> > -volatile int64_t   tsc_sync_val;
> > -volatile struct cpu_info   *tsc_sync_cpu;
> > -
> >   u_int tsc_get_timecount(struct timecounter *tc);
> >   void  tsc_delay(int usecs);
> > @@ -236,22 +229,12 @@ cpu_recalibrate_tsc(struct timec

Re: [v3] amd64: simplify TSC sync testing

2022-07-06 Thread Scott Cheloha
On Wed, Jul 06, 2022 at 01:58:51PM -0700, Mike Larkin wrote:
> On Wed, Jul 06, 2022 at 11:48:41AM -0500, Scott Cheloha wrote:
> > > On Jul 6, 2022, at 11:36 AM, Mike Larkin  wrote:
> > >
> > > On Tue, Jul 05, 2022 at 07:16:26PM -0500, Scott Cheloha wrote:
> > >> On Tue, Jul 05, 2022 at 01:38:32PM -0700, Mike Larkin wrote:
> > >>> On Mon, Jul 04, 2022 at 09:06:55PM -0500, Scott Cheloha wrote:
> > >>>>
> > >>>> [...]
> > >>>
> > >>> Here's the output from a 4 socket 80 thread machine.
> > >>
> > >> Oh nice.  I think this is the biggest machine we've tried so far.
> > >>
> > >>> kern.timecounter reports tsc after boot.
> > >>
> > >> Excellent.
> > >>
> > >>> Looks like this machine doesn't have the adjust MSR?
> > >>
> > >> IA32_TSC_ADJUST first appears in the Intel SDM Vol. 3 some time in
> > >> 2011 or 2012.  I can't find the exact revision.
> > >>
> > >> (I really wish there was a comprehensive version history for this sort
> > >> of thing, i.e. this MSR first appeared in the blah-blah uarch, this
> > >> instruction is available on all uarchs after yada-yada, etc.)
> > >>
> > >> There are apparently several versions of the E7-4870 in the E7
> > >> "family".  If your CPU predates that, or launched 2012-2014, the MSR
> > >> may not have made the cut.
> > >>
> > >> An aside: I cannot find any evidence of AMD supporting this MSR in any
> > >> processor.  It would be really, really nice if they did.  If you (or
> > >> anyone reading) knows anything about this, or whether they have an
> > >> equivalent MSR, shout it out.
> > >>
> > >>> Other than that, machine seems stable.
> > >>
> > >> Good, glad to hear it.  Thank you for testing.
> > >>
> > >> Has this machine had issues using the TSC on -current in the past?
> > >>
> > >> (If you have the time) what does the dmesg look like on the -current
> > >> kernel with TSC_DEBUG enabled?
> > >
> Looks like you enabled TSC_DEBUG in your diff, so what I sent you is what you
> are asking for...?
> >
> > No, I mean on the -current *unpatched* kernel.  Sorry if that wasn't
> > clear.
> >
> > Our -current kernel prints more detailed information if TSC_DEBUG
> > is enabled.  In particular, I'm curious if the unpatched kernel
> > detects any skew or drift on your machine, and if so, how much.
> >
> 
> here you go. I didn't run with all 80 cpus since -current doesn't have my
> " > 64 cpus" diff, but I think this is what you're after in any case.

Yes!  This is what I was looking for, thanks.

> cpu0: TSC skew=0 observed drift=0
> cpu1: TSC skew=112 observed drift=0
> cpu2: TSC skew=102 observed drift=0
> cpu3: TSC skew=-134 observed drift=0
> cpu4: TSC skew=4 observed drift=0
> cpu5: TSC skew=68 observed drift=0
> cpu6: TSC skew=22 observed drift=0
> cpu7: TSC skew=-52 observed drift=0
> cpu8: TSC skew=8 observed drift=0
> cpu9: TSC skew=-18 observed drift=0
> cpu10: TSC skew=10 observed drift=0
> cpu11: TSC skew=76 observed drift=0
> cpu12: TSC skew=-2 observed drift=0
> cpu13: TSC skew=-4 observed drift=0
> cpu14: TSC skew=-2 observed drift=0
> cpu15: TSC skew=-28 observed drift=0
> cpu16: TSC skew=6 observed drift=0
> cpu17: TSC skew=-8 observed drift=0
> cpu18: TSC skew=0 observed drift=0
> cpu19: TSC skew=-32 observed drift=0
> cpu20: TSC skew=0 observed drift=0
> cpu21: TSC skew=-26 observed drift=0
> cpu22: TSC skew=0 observed drift=0
> cpu23: TSC skew=22 observed drift=0
> cpu24: TSC skew=-12 observed drift=0
> cpu25: TSC skew=-14 observed drift=0
> cpu26: TSC skew=76 observed drift=0
> cpu27: TSC skew=-64 observed drift=0
> cpu28: TSC skew=-2 observed drift=0
> cpu29: TSC skew=34 observed drift=0
> cpu30: TSC skew=22 observed drift=0
> cpu31: TSC skew=-58 observed drift=0
> cpu32: TSC skew=-2 observed drift=0
> cpu33: TSC skew=6 observed drift=0
> cpu34: TSC skew=46 observed drift=0
> cpu35: TSC skew=20 observed drift=0
> cpu36: TSC skew=34 observed drift=0
> cpu37: TSC skew=-8 observed drift=0
> cpu38: TSC skew=48 observed drift=0
> cpu39: TSC skew=-10 observed drift=0
> cpu40: TSC skew=0 observed drift=0
> cpu41: TSC skew=72 observed drift=0
> cpu42: TSC skew=2 observed drift=0
> cpu43: TSC skew=-46 observed drift=0
> cpu44: TSC skew=-2 observed drift=0
> cpu45: TSC skew=-14 observed drift=0
> cpu46: TSC skew=-2 observed drift=0
> cpu47: TSC skew=-32 observed drift=0
> cpu48: TSC skew=12 observed drift=0
> cpu49: TSC skew=-16 observed drift=0
> cpu50: TSC skew=84 observed drift=0
> cpu51: TSC skew=-44 observed drift=0
> cpu52: TSC skew=-4 observed drift=0
> cpu53: TSC skew=4 observed drift=0
> cpu54: TSC skew=16 observed drift=0
> cpu55: TSC skew=-56 observed drift=0
> cpu56: TSC skew=-10 observed drift=0
> cpu57: TSC skew=6 observed drift=0
> cpu58: TSC skew=6 observed drift=0
> cpu59: TSC skew=-40 observed drift=0
> cpu60: TSC skew=-4 observed drift=0
> cpu61: TSC skew=-6 observed drift=0
> cpu62: TSC skew=74 observed drift=0
> cpu63: TSC skew=-48 observed drift=0



Re: [v3] amd64: simplify TSC sync testing

2022-07-06 Thread Scott Cheloha
On Wed, Jul 06, 2022 at 08:20:05PM -0400, Mohamed Aslan wrote:
> > First, you need to update to the latest firmware.  Maybe they already
> > fixed the problem.  I don't see any mention of the TSC in the BIOS
> > changelog for the e495 but maybe you'll get lucky.
> > 
> > Second, if they haven't fixed the problem with the latest firmware, I
> > recommend you reach out to Lenovo and report the problem.
> > 
> > Lenovo seem to have been sympathetic to reports about TSC desync in
> > the past on other models and issued firmware fixes.  For example,
> > the v1.28 firmware for the ThinkPad A485 contained a fix for what
> > I assume is a very similar problem to the one you're having:
> > 
> > https://download.lenovo.com/pccbbs/mobiles/r0wuj65wd.txt
> > 
> > And this forum post, for example, got some response from Lenovo staff:
> > 
> > https://forums.lenovo.com/t5/ThinkPad-T400-T500-and-newer-T-series-Laptops/T14s-G1-AMD-TSC-clock-unusable/m-p/5070296?page=1
> > 
> > So, open a post for your model and cite the other posts.
> > 
> > They might not be sympathetic to the fact that you're seeing the issue
> > on OpenBSD.  If that's a problem you should be able to reproduce the
> > problem with a recent Linux kernel.  The Linux kernel runs a similar
> > sync test during boot and will complain if the TSCs are not
> > synchronized.
> > 
> > Honestly, to save time you may want to just boot up a supported Linux
> > distribution and grab the error message before you ask for support.
> > 
> 
> I can confirm that this is also the case with Linux. This is the
> output of dmesg on Void Linux:
> 
> [0.00] tsc: Fast TSC calibration using PIT  
> [0.00] tsc: Detected 2096.114 MHz processor
> ...
> ...
> [1.314252] TSC synchronization [CPU#0 -> CPU#1]:
> [1.314252] Measured 6615806646 cycles TSC warp between CPUs, turning off TSC clock.
> [1.314252] tsc: Marking TSC unstable due to check_tsc_sync_source failed
> [1.314397]   #2  #3  #4  #5  #6  #7

This is good news.  My code isn't the only code finding a problem :)

> 
> Not sure if Void is a Lenovo supported Linux distribution, still
> though I think it's worth reporting.

Probably not.  Your laptop may not even be "Linux certified",
but it's worth reporting all the same.



Re: [v3] amd64: simplify TSC sync testing

2022-07-06 Thread Scott Cheloha
> On Jul 6, 2022, at 10:04 AM, Christian Weisgerber  wrote:
> 
> Scott Cheloha:
> 
>>> kern.timecounter.tick=1
>>> kern.timecounter.timestepwarnings=0
>>> kern.timecounter.hardware=i8254
>>> kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000) acpitimer0(1000)
>> 
>> This is expected behavior with the patch.
>> 
>> cpu0's TSC is way out of sync with every
>> other CPU's TSC, so the TSC is marked
>> as a bad timecounter and a different one is
>> chosen.
> 
> Shouldn't it pick acpihpet0 then?

It depends on the order the timecounters are installed.
If acpihpet0 is already installed before we degrade the
TSC's .quality value then the timecounter subsystem won't
switch to it when we install the next counter because it
assumes .quality values cannot change on the fly (a
reasonable assumption).

We don't yet have a tc_detach(9) function that uninstalls
a timecounter cleanly and chooses the next best counter
available.

This is something I want to add in a future patch.  FreeBSD
has something similar.  It may be called "tc_ban", iirc.

The alternative is to wait until we've tested synchronization
for every CPU before calling tc_init(9).  This approach is
more annoying, though, as it requires additional state.  We
would also still have the same problem when we resume from
suspend.

The dream is to be able to do something like this during the
sync test:

if (tsc_sync_test_failed) {
	tc_detach(&tsc_timecounter);
	tsc_timecounter.tc_quality = -2000;
	tc_init(&tsc_timecounter);
}

When we call tc_detach(9) the timecounter code would pick
the next best counter automagically.



Re: [v3] amd64: simplify TSC sync testing

2022-07-06 Thread Scott Cheloha
> On Jul 6, 2022, at 11:36 AM, Mike Larkin  wrote:
> 
> On Tue, Jul 05, 2022 at 07:16:26PM -0500, Scott Cheloha wrote:
>> On Tue, Jul 05, 2022 at 01:38:32PM -0700, Mike Larkin wrote:
>>> On Mon, Jul 04, 2022 at 09:06:55PM -0500, Scott Cheloha wrote:
>>>> 
>>>> [...]
>>> 
>>> Here's the output from a 4 socket 80 thread machine.
>> 
>> Oh nice.  I think this is the biggest machine we've tried so far.
>> 
>>> kern.timecounter reports tsc after boot.
>> 
>> Excellent.
>> 
>>> Looks like this machine doesn't have the adjust MSR?
>> 
>> IA32_TSC_ADJUST first appears in the Intel SDM Vol. 3 some time in
>> 2011 or 2012.  I can't find the exact revision.
>> 
>> (I really wish there was a comprehensive version history for this sort
>> of thing, i.e. this MSR first appeared in the blah-blah uarch, this
>> instruction is available on all uarchs after yada-yada, etc.)
>> 
>> There are apparently several versions of the E7-4870 in the E7
>> "family".  If your CPU predates that, or launched 2012-2014, the MSR
>> may not have made the cut.
>> 
>> An aside: I cannot find any evidence of AMD supporting this MSR in any
>> processor.  It would be really, really nice if they did.  If you (or
>> anyone reading) knows anything about this, or whether they have an
>> equivalent MSR, shout it out.
>> 
>>> Other than that, machine seems stable.
>> 
>> Good, glad to hear it.  Thank you for testing.
>> 
>> Has this machine had issues using the TSC on -current in the past?
>> 
>> (If you have the time) what does the dmesg look like on the -current
>> kernel with TSC_DEBUG enabled?
> 
> Looks like you enabled TSC_DEBUG in your diff, so what I sent you is what you
> are asking for...?

No, I mean on the -current *unpatched* kernel.  Sorry if that wasn't
clear.

Our -current kernel prints more detailed information if TSC_DEBUG
is enabled.  In particular, I'm curious if the unpatched kernel
detects any skew or drift on your machine, and if so, how much.



Re: [v3] amd64: simplify TSC sync testing

2022-07-06 Thread Scott Cheloha
On Wed, Jul 06, 2022 at 01:48:39AM -0400, Mohamed Aslan wrote:
> > This is expected behavior with the patch.
> > 
> > cpu0's TSC is way out of sync with every
> > other CPU's TSC, so the TSC is marked
> > as a bad timecounter and a different one is
> > chosen.
> 
> Yes, I can see. Just want to add that without your latest patch the
> kernel chooses the TSC as clocksource, however only the *user* TSC
> was disabled (cpu1: disabling user TSC (skew=-5028216492)).
> 
> > Are you running the latest BIOS available
> > for your machine?
> 
> No, I don't think I am.

First, you need to update to the latest firmware.  Maybe they already
fixed the problem.  I don't see any mention of the TSC in the BIOS
changelog for the e495 but maybe you'll get lucky.

Second, if they haven't fixed the problem with the latest firmware, I
recommend you reach out to Lenovo and report the problem.

Lenovo seem to have been sympathetic to reports about TSC desync in
the past on other models and issued firmware fixes.  For example,
the v1.28 firmware for the ThinkPad A485 contained a fix for what
I assume is a very similar problem to the one you're having:

https://download.lenovo.com/pccbbs/mobiles/r0wuj65wd.txt

And this forum post, for example, got some response from Lenovo staff:

https://forums.lenovo.com/t5/ThinkPad-T400-T500-and-newer-T-series-Laptops/T14s-G1-AMD-TSC-clock-unusable/m-p/5070296?page=1

So, open a post for your model and cite the other posts.

They might not be sympathetic to the fact that you're seeing the issue
on OpenBSD.  If that's a problem you should be able to reproduce the
problem with a recent Linux kernel.  The Linux kernel runs a similar
sync test during boot and will complain if the TSCs are not
synchronized.

Honestly, to save time you may want to just boot up a supported Linux
distribution and grab the error message before you ask for support.



Re: [v3] amd64: simplify TSC sync testing

2022-07-05 Thread Scott Cheloha
> On Jul 5, 2022, at 23:02, Mohamed Aslan  wrote:
> 
> Hi,
> 
> Apologies. My bad, I applied the latest patch but booted into another
> kernel with an earlier patch!
> 
> Here's what I got with your latest patch:
> 
> $ dmesg | grep 'tsc'
> tsc: cpu0/cpu1: sync test round 1/2 failed
> tsc: cpu0/cpu1: cpu0: 40162 lags 5112675666 cycles
> tsc: cpu0/cpu2: sync test round 1/2 failed
> tsc: cpu0/cpu2: cpu0: 18995 lags 5112675645 cycles
> tsc: cpu0/cpu3: sync test round 1/2 failed
> tsc: cpu0/cpu3: cpu0: 19136 lags 5112675645 cycles
> tsc: cpu0/cpu4: sync test round 1/2 failed
> tsc: cpu0/cpu4: cpu0: 19451 lags 5112675645 cycles
> tsc: cpu0/cpu5: sync test round 1/2 failed
> tsc: cpu0/cpu5: cpu0: 18625 lags 5112675645 cycles
> tsc: cpu0/cpu6: sync test round 1/2 failed
> tsc: cpu0/cpu6: cpu0: 18208 lags 5112675645 cycles
> tsc: cpu0/cpu7: sync test round 1/2 failed
> tsc: cpu0/cpu7: cpu0: 17739 lags 5112675645 cycles
> 
> $ sysctl kern.timecounter
> kern.timecounter.tick=1
> kern.timecounter.timestepwarnings=0
> kern.timecounter.hardware=i8254
> kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000) acpitimer0(1000)

This is expected behavior with the patch.

cpu0's TSC is way out of sync with every
other CPU's TSC, so the TSC is marked
as a bad timecounter and a different one is
chosen.

Are you running the latest BIOS available
for your machine?



Re: [v3] amd64: simplify TSC sync testing

2022-07-05 Thread Scott Cheloha
> On Jul 5, 2022, at 21:31, Mohamed Aslan  wrote:
> 
> Hello,
> 
> I just tested your patch on my lenovo e495 laptop, unfortunately
> still no tsc.
> 
> $ dmesg | grep 'tsc:'
> tsc: cpu0/cpu1 sync round 1: 20468 regressions
> tsc: cpu0/cpu1 sync round 1: cpu0 lags cpu1 by 5351060292 cycles
> tsc: cpu0/cpu1 sync round 1: cpu1 lags cpu0 by 0 cycles
> tsc: cpu0/cpu2 sync round 1: 10272 regressions
> tsc: cpu0/cpu2 sync round 1: cpu0 lags cpu2 by 5351060271 cycles
> tsc: cpu0/cpu2 sync round 1: cpu2 lags cpu0 by 0 cycles
> tsc: cpu0/cpu3 sync round 1: 9709 regressions
> tsc: cpu0/cpu3 sync round 1: cpu0 lags cpu3 by 5351060271 cycles
> tsc: cpu0/cpu3 sync round 1: cpu3 lags cpu0 by 0 cycles
> tsc: cpu0/cpu4 sync round 1: 10386 regressions
> tsc: cpu0/cpu4 sync round 1: cpu0 lags cpu4 by 5351060271 cycles
> tsc: cpu0/cpu4 sync round 1: cpu4 lags cpu0 by 0 cycles
> tsc: cpu0/cpu5 sync round 1: 10408 regressions
> tsc: cpu0/cpu5 sync round 1: cpu0 lags cpu5 by 5351060271 cycles
> tsc: cpu0/cpu5 sync round 1: cpu5 lags cpu0 by 0 cycles
> tsc: cpu0/cpu6 sync round 1: 10102 regressions
> tsc: cpu0/cpu6 sync round 1: cpu0 lags cpu6 by 5351060271 cycles
> tsc: cpu0/cpu6 sync round 1: cpu6 lags cpu0 by 0 cycles
> tsc: cpu0/cpu7 sync round 1: 9336 regressions
> tsc: cpu0/cpu7 sync round 1: cpu0 lags cpu7 by 5351060271 cycles
> tsc: cpu0/cpu7 sync round 1: cpu7 lags cpu0 by 0 cycles

This is not the latest patch.

Please apply the latest patch and try again.

If possible, please also include your dmesg
from a -current kernel with the TSC_DEBUG
option set.



Re: powerpc, macppc: retrigger deferred DEC interrupts from splx(9)

2022-07-05 Thread Scott Cheloha
On Thu, Jun 23, 2022 at 09:58:48PM -0500, Scott Cheloha wrote:
> 
> [...]
> 
> Thoughts?  Tweaks?
> 
> [...]

miod: Any issues?

kettenis:  Anything to add?  ok?

drahn:  Anything to add?  ok?

--

It would be nice (but not strictly necessary) to test this on a
machine doing "real work".

Who runs the macppc package builds?

Index: macppc/macppc/clock.c
===
RCS file: /cvs/src/sys/arch/macppc/macppc/clock.c,v
retrieving revision 1.48
diff -u -p -r1.48 clock.c
--- macppc/macppc/clock.c   23 Feb 2021 04:44:30 -  1.48
+++ macppc/macppc/clock.c   24 Jun 2022 02:49:58 -
@@ -128,6 +128,20 @@ decr_intr(struct clockframe *frame)
return;
 
/*
+* We can't actually mask DEC interrupts, i.e. mask MSR(EE),
+* at or above IPL_CLOCK without masking other essential
+* interrupts.  To simulate masking, we retrigger the DEC
+* by hand from splx(9) the next time our IPL drops below
+* IPL_CLOCK.
+*/
+   if (ci->ci_cpl >= IPL_CLOCK) {
+   ci->ci_dec_deferred = 1;
+   ppc_mtdec(UINT32_MAX >> 1); /* clear DEC exception */
+   return;
+   }
+   ci->ci_dec_deferred = 0;
+
+   /*
 * Based on the actual time delay since the last decrementer reload,
 * we arrange for earlier interrupt next time.
 */
@@ -160,39 +174,35 @@ decr_intr(struct clockframe *frame)
 */
ppc_mtdec(nextevent - tb);
 
-   if (ci->ci_cpl >= IPL_CLOCK) {
-   ci->ci_statspending += nstats;
-   } else {
-   nstats += ci->ci_statspending;
-   ci->ci_statspending = 0;
-
-   s = splclock();
-
-   /*
-* Reenable interrupts
-*/
-   ppc_intr_enable(1);
-
-   /*
-* Do standard timer interrupt stuff.
-*/
-   while (ci->ci_lasttb < ci->ci_prevtb) {
-   /* sync lasttb with hardclock */
-   ci->ci_lasttb += ticks_per_intr;
-   clk_count.ec_count++;
-   hardclock(frame);
-   }
-
-   while (nstats-- > 0)
-   statclock(frame);
-
-   splx(s);
-   (void) ppc_intr_disable();
-
-   /* if a tick has occurred while dealing with these,
-* dont service it now, delay until the next tick.
-*/
+   nstats += ci->ci_statspending;
+   ci->ci_statspending = 0;
+
+   s = splclock();
+
+   /*
+* Reenable interrupts
+*/
+   ppc_intr_enable(1);
+
+   /*
+* Do standard timer interrupt stuff.
+*/
+   while (ci->ci_lasttb < ci->ci_prevtb) {
+   /* sync lasttb with hardclock */
+   ci->ci_lasttb += ticks_per_intr;
+   clk_count.ec_count++;
+   hardclock(frame);
}
+
+   while (nstats-- > 0)
+   statclock(frame);
+
+   splx(s);
+   (void) ppc_intr_disable();
+
+   /* if a tick has occurred while dealing with these,
+* dont service it now, delay until the next tick.
+*/
 }
 
 void cpu_startclock(void);
Index: macppc/dev/openpic.c
===
RCS file: /cvs/src/sys/arch/macppc/dev/openpic.c,v
retrieving revision 1.89
diff -u -p -r1.89 openpic.c
--- macppc/dev/openpic.c21 Feb 2022 10:38:50 -  1.89
+++ macppc/dev/openpic.c24 Jun 2022 02:49:59 -
@@ -382,6 +382,10 @@ openpic_splx(int newcpl)
 
intr = ppc_intr_disable();
openpic_setipl(newcpl);
+   if (ci->ci_dec_deferred && newcpl < IPL_CLOCK) {
+   ppc_mtdec(0);
+   ppc_mtdec(UINT32_MAX);  /* raise DEC exception */
+   }
if (newcpl < IPL_SOFTTTY && (ci->ci_ipending & ppc_smask[newcpl])) {
s = splsofttty();
dosoftint(newcpl);
Index: macppc/dev/macintr.c
===
RCS file: /cvs/src/sys/arch/macppc/dev/macintr.c,v
retrieving revision 1.56
diff -u -p -r1.56 macintr.c
--- macppc/dev/macintr.c13 Mar 2022 12:33:01 -  1.56
+++ macppc/dev/macintr.c24 Jun 2022 02:49:59 -
@@ -170,6 +170,10 @@ macintr_splx(int newcpl)
 
intr = ppc_intr_disable();
macintr_setipl(newcpl);
+   if (ci->ci_dec_deferred && newcpl < IPL_CLOCK) {
+   ppc_mtdec(0);
+   ppc_mtdec(UINT32_MAX);  /* raise DEC exception */
+   }
if ((newcpl < IPL_SOFTTTY && ci->ci_ipending & ppc_smask[newcpl])) {
s = splsofttty();
dosoftint(newcpl);
Index: powerp

Re: [v3] amd64: simplify TSC sync testing

2022-07-05 Thread Scott Cheloha
On Wed, Jul 06, 2022 at 09:14:03AM +0900, Yuichiro NAITO wrote:
> Hi, Scott.
> 
> I tested your patch on my OpenBSD running on ESXi.
> It works fine for me and I never see monotonic clock going backward.
> There is nothing extra messages in my dmesg.

Great!  Thanks for testing.



Re: [v3] amd64: simplify TSC sync testing

2022-07-05 Thread Scott Cheloha
On Tue, Jul 05, 2022 at 01:38:32PM -0700, Mike Larkin wrote:
> On Mon, Jul 04, 2022 at 09:06:55PM -0500, Scott Cheloha wrote:
> > 
> > [...]
> 
> Here's the output from a 4 socket 80 thread machine.

Oh nice.  I think this is the biggest machine we've tried so far.

> kern.timecounter reports tsc after boot.

Excellent.

> Looks like this machine doesn't have the adjust MSR?

IA32_TSC_ADJUST first appears in the Intel SDM Vol. 3 some time in
2011 or 2012.  I can't find the exact revision.

(I really wish there was a comprehensive version history for this sort
of thing, i.e. this MSR first appeared in the blah-blah uarch, this
instruction is available on all uarchs after yada-yada, etc.)

There are apparently several versions of the E7-4870 in the E7
"family".  If your CPU predates that, or launched 2012-2014, the MSR
may not have made the cut.

An aside: I cannot find any evidence of AMD supporting this MSR in any
processor.  It would be really, really nice if they did.  If you (or
anyone reading) knows anything about this, or whether they have an
equivalent MSR, shout it out.

> Other than that, machine seems stable.

Good, glad to hear it.  Thank you for testing.

Has this machine had issues using the TSC on -current in the past?

(If you have the time) what does the dmesg look like on the -current
kernel with TSC_DEBUG enabled?



Re: [v3] amd64: simplify TSC sync testing

2022-07-05 Thread Scott Cheloha
On Tue, Jul 05, 2022 at 06:40:26PM +0200, Stuart Henderson wrote:
> On 2022/07/05 11:22, Scott Cheloha wrote:
> > On Tue, Jul 05, 2022 at 05:47:51PM +0200, Stuart Henderson wrote:
> > > On 2022/07/04 21:06, Scott Cheloha wrote:
> > > > 4. OpenBSD VMs on other hypervisors.
> > > 
> > > KVM on proxmox VE 7.1-12
> > > 
> > > I force acpihpet0 on this; it defaults to pvclock which results in
> > > timekeeping so bad that ntpd can't correct
> > 
> > That is an interesting problem.  Probably worth looking at pvclock(4)
> > separately.
> > 
> > > $ sysctl kern.timecounter
> > > kern.timecounter.tick=1
> > > kern.timecounter.timestepwarnings=0
> > > kern.timecounter.hardware=acpihpet0
> > > kern.timecounter.choice=i8254(0) pvclock0(1500) acpihpet0(1000) 
> > > acpitimer0(1000)
> > > 
> > > OpenBSD 7.1-current (GENERIC.MP) #45: Tue Jul  5 16:11:00 BST 2022
> > > st...@bamboo.spacehopper.org:/sys/arch/amd64/compile/GENERIC.MP
> > > real mem = 8573001728 (8175MB)
> > > avail mem = 8295833600 (7911MB)
> > > random: good seed from bootblocks
> > > mpath0 at root
> > > scsibus0 at mpath0: 256 targets
> > > mainbus0 at root
> > > bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf58c0 (10 entries)
> > > bios0: vendor SeaBIOS version 
> > > "rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org" date 04/01/2014
> > > bios0: QEMU Standard PC (i440FX + PIIX, 1996)
> > > acpi0 at bios0: ACPI 1.0
> > > acpi0: sleep states S3 S4 S5
> > > acpi0: tables DSDT FACP APIC SSDT HPET WAET
> > > acpi0: wakeup devices
> > > acpitimer0 at acpi0: 3579545 Hz, 24 bits
> > > acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> > > cpu0 at mainbus0: apid 0 (boot processor)
> > > cpu0: AMD Ryzen 5 PRO 5650G with Radeon Graphics, 3893.04 MHz, 19-50-00
> > > cpu0: 
> > > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,CPCTR,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,PKU,IBRS,IBPB,STIBP,SSBD,IBPB,IBRS,STIBP,SSBD,VIRTSSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> > 
> > This machine doesn't have the ITSC flag, so we would never consider
> > using the TSC as a timecounter.  The sync test is not run, but that
> > makes sense.
> > 
> > ... is that expected?  Should the machine have the ITSC flag?
> > 
> > (I'm not familiar with Proxmox.)
> > 
> 
> No idea to be honest. The cpu type is set to "host" so it should pass
> things through, but perhaps it deliberately filters out ITSC. Mostly
> wanted to point it out as a "doesn't make things worse" (and because
> you specifically wanted tests on other VMs :)

Gotcha, that's okay then.



Re: [v3] amd64: simplify TSC sync testing

2022-07-05 Thread Scott Cheloha
On Tue, Jul 05, 2022 at 05:47:51PM +0200, Stuart Henderson wrote:
> On 2022/07/04 21:06, Scott Cheloha wrote:
> > 4. OpenBSD VMs on other hypervisors.
> 
> KVM on proxmox VE 7.1-12
> 
> I force acpihpet0 on this; it defaults to pvclock which results in
> timekeeping so bad that ntpd can't correct

That is an interesting problem.  Probably worth looking at pvclock(4)
separately.

> $ sysctl kern.timecounter
> kern.timecounter.tick=1
> kern.timecounter.timestepwarnings=0
> kern.timecounter.hardware=acpihpet0
> kern.timecounter.choice=i8254(0) pvclock0(1500) acpihpet0(1000) 
> acpitimer0(1000)
> 
> OpenBSD 7.1-current (GENERIC.MP) #45: Tue Jul  5 16:11:00 BST 2022
> st...@bamboo.spacehopper.org:/sys/arch/amd64/compile/GENERIC.MP
> real mem = 8573001728 (8175MB)
> avail mem = 8295833600 (7911MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf58c0 (10 entries)
> bios0: vendor SeaBIOS version "rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org" 
> date 04/01/2014
> bios0: QEMU Standard PC (i440FX + PIIX, 1996)
> acpi0 at bios0: ACPI 1.0
> acpi0: sleep states S3 S4 S5
> acpi0: tables DSDT FACP APIC SSDT HPET WAET
> acpi0: wakeup devices
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: AMD Ryzen 5 PRO 5650G with Radeon Graphics, 3893.04 MHz, 19-50-00
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,CPCTR,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,PKU,IBRS,IBPB,STIBP,SSBD,IBPB,IBRS,STIBP,SSBD,VIRTSSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES

This machine doesn't have the ITSC flag, so we would never consider
using the TSC as a timecounter.  The sync test is not run, but that
makes sense.

... is that expected?  Should the machine have the ITSC flag?

(I'm not familiar with Proxmox.)



Re: [v3] amd64: simplify TSC sync testing

2022-07-05 Thread Scott Cheloha
On Tue, Jul 05, 2022 at 05:38:04PM +0200, Stuart Henderson wrote:
> On 2022/07/04 21:06, Scott Cheloha wrote:
> > 2. Other multisocket machines.
> 
> This is from the R620 where I originally discovered the problems
> with SMP with the previous TSC test:
> 
> $ dmesg|grep tsc
> $ sysctl kern.timecounter
> kern.timecounter.tick=1
> kern.timecounter.timestepwarnings=0
> kern.timecounter.hardware=tsc
> kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000) acpitimer0(1000)
> 
> --- old   Tue Jul  5 15:34:06 2022
> +++ new   Tue Jul  5 15:34:08 2022
> @@ -1,7 +1,7 @@
> [snip]

Okay, so on the -current kernel the TSC is marked defective, but with
this patch (v3) the TSC is fine: you get no printouts on the console
from the TSC module.

Good, excellent.

Thank you for testing again.



Re: [v3] amd64: simplify TSC sync testing

2022-07-05 Thread Scott Cheloha
On Tue, Jul 05, 2022 at 10:53:43AM -0400, Dave Voutila wrote:
> 
> Scott Cheloha  writes:
> 
> > On Tue, Jul 05, 2022 at 07:15:31AM -0400, Dave Voutila wrote:
> >>
> >> Scott Cheloha  writes:
> >>
> >> > [...]
> >> >
> >> > If you fail the test you will see something like this:
> >> >
> >> >  tsc: cpu0/cpu2: sync test round 1/2 failed
> >> >  tsc: cpu0/cpu2: cpu2: 13043 lags 438 cycles
> >> >
> >> > A printout like this would mean that the sync test for cpu2 failed.
> >> > In particular, cpu2's TSC trails cpu0's TSC by at least 438 cycles.
> >> > If this happens for *any* CPU we mark the TSC timecounter as
> >> > defective.
> >>
> >> I think this passes now on my dual-socket Xeon box?
> >
> > Yes, it passes.  The timecounter on your machine should still have a
> > quality of 2000, i.e. we didn't mark it defective.
> >
> >> Full dmesg at the end of the email[1], but just the `tsc:' lines look
> >> like:
> >>
> >> $ grep tsc dmesg.txt
> >> tsc: cpu0: IA32_TSC_ADJUST: -5774382067215574 -> 0
> >> tsc: cpu1: IA32_TSC_ADJUST: -5774382076335870 -> 0
> >> tsc: cpu2: IA32_TSC_ADJUST: -5774382073829798 -> 0
> >> tsc: cpu3: IA32_TSC_ADJUST: -5774382071913818 -> 0
> >> tsc: cpu4: IA32_TSC_ADJUST: -5774382075956770 -> 0
> >> tsc: cpu5: IA32_TSC_ADJUST: -5774382074583181 -> 0
> >> tsc: cpu6: IA32_TSC_ADJUST: -5774382073199574 -> 0
> >> tsc: cpu7: IA32_TSC_ADJUST: -5774382076500135 -> 0
> >> tsc: cpu8: IA32_TSC_ADJUST: -5774382074705354 -> 0
> >> tsc: cpu9: IA32_TSC_ADJUST: -5774382075954945 -> 0
> >> tsc: cpu10: IA32_TSC_ADJUST: -5774382070567294 -> 0
> >> tsc: cpu11: IA32_TSC_ADJUST: -5774382075968443 -> 0
> >> tsc: cpu12: IA32_TSC_ADJUST: -5774382067353478 -> 0
> >> tsc: cpu13: IA32_TSC_ADJUST: -5774382071926523 -> 0
> >> tsc: cpu14: IA32_TSC_ADJUST: -5774382074619890 -> 0
> >> tsc: cpu15: IA32_TSC_ADJUST: -5774382070107058 -> 0
> >> tsc: cpu16: IA32_TSC_ADJUST: -5774382076196640 -> 0
> >> tsc: cpu17: IA32_TSC_ADJUST: -5774382075090665 -> 0
> >> tsc: cpu18: IA32_TSC_ADJUST: -5774382073529646 -> 0
> >> tsc: cpu19: IA32_TSC_ADJUST: -5774382076443616 -> 0
> >> tsc: cpu20: IA32_TSC_ADJUST: -5774382074994536 -> 0
> >> tsc: cpu21: IA32_TSC_ADJUST: -5774382076309520 -> 0
> >> tsc: cpu22: IA32_TSC_ADJUST: -5774382070947686 -> 0
> >> tsc: cpu23: IA32_TSC_ADJUST: -5774382073056320 -> 0
> >
> > Fascinating.  Wonder what the heck it's doing down there.
> >
> >> It does look like there's a newer BIOS version for this machine, so I'll
> >> try updating it later today and repeating the test to see if anything
> >> changes.
> 
> After a BIOS update, still similar output.
> 
> "new" bios:
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xec0f0 (105 entries)
> bios0: vendor Dell Inc. version "A34" date 10/19/2020
> bios0: Dell Inc. Precision Tower 7810
> 
> $ dmesg | grep tsc
> tsc: cpu0: IA32_TSC_ADJUST: -4070378216 -> 0
> tsc: cpu1: IA32_TSC_ADJUST: -4081094631 -> 0
> tsc: cpu2: IA32_TSC_ADJUST: -4078853396 -> 0
> tsc: cpu3: IA32_TSC_ADJUST: -4074362824 -> 0
> tsc: cpu4: IA32_TSC_ADJUST: -4080872645 -> 0
> tsc: cpu5: IA32_TSC_ADJUST: -4075673830 -> 0
> tsc: cpu6: IA32_TSC_ADJUST: -4081906959 -> 0
> tsc: cpu7: IA32_TSC_ADJUST: -4073006269 -> 0
> tsc: cpu8: IA32_TSC_ADJUST: -4081803214 -> 0
> tsc: cpu9: IA32_TSC_ADJUST: -4081294540 -> 0
> tsc: cpu10: IA32_TSC_ADJUST: -4079817920 -> 0
> tsc: cpu11: IA32_TSC_ADJUST: -4079871039 -> 0
> tsc: cpu12: IA32_TSC_ADJUST: -4070522580 -> 0
> tsc: cpu13: IA32_TSC_ADJUST: -4077205405 -> 0
> tsc: cpu14: IA32_TSC_ADJUST: -4081797309 -> 0
> tsc: cpu15: IA32_TSC_ADJUST: -4078574630 -> 0
> tsc: cpu16: IA32_TSC_ADJUST: -4081539272 -> 0
> tsc: cpu17: IA32_TSC_ADJUST: -4079657247 -> 0
> tsc: cpu18: IA32_TSC_ADJUST: -4080469326 -> 0
> tsc: cpu19: IA32_TSC_ADJUST: -4073404194 -> 0
> tsc: cpu20: IA32_TSC_ADJUST: -4081473720 -> 0
> tsc: cpu21: IA32_TSC_ADJUST: -4076195877 -> 0
> tsc: cpu22: IA32_TSC_ADJUST: -4077876814 -> 0
> tsc: cpu23: IA32_TSC_ADJUST: -4081863303 -> 0
> 
> And still a quality tsc :) :
> 
> $ sysctl kern.timecounter
> kern.timecounter.tick=1
> kern.timecounter.timestepwarnings=0
> kern.timecounter.hardware=tsc
> kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000) acpitimer0(1000)

Alrighty, that's "

Re: ts(1): make timespec-handling code more obvious

2022-07-05 Thread Scott Cheloha
On Tue, Jul 05, 2022 at 11:53:26AM +0200, Claudio Jeker wrote:
> On Tue, Jul 05, 2022 at 11:34:21AM +, Job Snijders wrote:
> > On Tue, Jul 05, 2022 at 11:08:13AM +0200, Claudio Jeker wrote:
> > > On Mon, Jul 04, 2022 at 05:10:05PM -0500, Scott Cheloha wrote:
> > > > On Mon, Jul 04, 2022 at 11:15:24PM +0200, Claudio Jeker wrote:
> > > > > On Mon, Jul 04, 2022 at 01:28:12PM -0500, Scott Cheloha wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > Couple things:
> > > > > > 
> > > > > > [...]
> > > > > 
> > > > > I don't like the introduction of all these local variables that are 
> > > > > just
> > > > > hard to follow and need extra code pathes. Happy to rename roff to 
> > > > > offset,
> > > > > start_offset or something similar. Also moving the localtime call into
> > > > > fmtfmt() is fine.
> > > > 
> > > > You need an "elapsed" variable to avoid overwriting "now" in the
> > > > -i flag case to avoid calling clock_gettime(2) twice.
> > > > 
> > > > We can get rid of "utc_start" and just reuse "now" for the initial
> > > > value of CLOCK_REALTIME.
> > > > 
> > > > How is this?
> > > 
> > > How about this instead?
> > 
> > Looks like an improvement
> > 
> > The suggestion to change 'ms' to 'us' might be a good one to roll into
> > this changeset too.
> 
> Ah right, we print us not ms.
>  
> > nitpick: the changeset doesn't apply cleanly:
> 
> Forgot to update that tree :)
> 
> Updated diff below

This is fine by me, you took most of what I wanted, and even
the "ms" -> "us" name change :)

One nit below, otherwise: ok cheloha@

> Index: ts.c
> ===
> RCS file: /cvs/src/usr.bin/ts/ts.c,v
> retrieving revision 1.6
> diff -u -p -r1.6 ts.c
> --- ts.c  4 Jul 2022 17:29:03 -   1.6
> +++ ts.c  5 Jul 2022 09:51:38 -
> @@ -32,7 +32,7 @@ static char *buf;
>  static char  *outbuf;
>  static size_t bufsize;
>  
> -static void   fmtfmt(struct tm *, long);
> +static void   fmtfmt(const struct timespec *);
>  static void __dead  usage(void);
>  
>  int
> @@ -40,8 +40,7 @@ main(int argc, char *argv[])
>  {
>   int iflag, mflag, sflag;
>   int ch, prev;
> - struct timespec roff, start, now;
> - struct tm *tm;
> + struct timespec start, now, utc_offset, ts;
>   clockid_t clock = CLOCK_REALTIME;
>  
>   if (pledge("stdio", NULL) == -1)
> @@ -93,22 +92,22 @@ main(int argc, char *argv[])
>   if (setenv("TZ", "UTC", 1) == -1)
>   err(1, "setenv UTC");
>  
> - clock_gettime(CLOCK_REALTIME, &roff);
>   clock_gettime(clock, &start);
> - timespecsub(&roff, &start, &roff);
> + clock_gettime(CLOCK_REALTIME, &utc_offset);
> + timespecsub(&utc_offset, &start, &utc_offset);

You don't need to initialize utc_offset except in the -m flag case.



Re: [v3] amd64: simplify TSC sync testing

2022-07-05 Thread Scott Cheloha
On Tue, Jul 05, 2022 at 07:15:31AM -0400, Dave Voutila wrote:
> 
> Scott Cheloha  writes:
> 
> > [...]
> >
> > If you fail the test you will see something like this:
> >
> > tsc: cpu0/cpu2: sync test round 1/2 failed
> > tsc: cpu0/cpu2: cpu2: 13043 lags 438 cycles
> >
> > A printout like this would mean that the sync test for cpu2 failed.
> > In particular, cpu2's TSC trails cpu0's TSC by at least 438 cycles.
> > If this happens for *any* CPU we mark the TSC timecounter as
> > defective.
> 
> I think this passes now on my dual-socket Xeon box?

Yes, it passes.  The timecounter on your machine should still have a
quality of 2000, i.e. we didn't mark it defective.

> Full dmesg at the end of the email[1], but just the `tsc:' lines look
> like:
> 
> $ grep tsc dmesg.txt
> tsc: cpu0: IA32_TSC_ADJUST: -5774382067215574 -> 0
> tsc: cpu1: IA32_TSC_ADJUST: -5774382076335870 -> 0
> tsc: cpu2: IA32_TSC_ADJUST: -5774382073829798 -> 0
> tsc: cpu3: IA32_TSC_ADJUST: -5774382071913818 -> 0
> tsc: cpu4: IA32_TSC_ADJUST: -5774382075956770 -> 0
> tsc: cpu5: IA32_TSC_ADJUST: -5774382074583181 -> 0
> tsc: cpu6: IA32_TSC_ADJUST: -5774382073199574 -> 0
> tsc: cpu7: IA32_TSC_ADJUST: -5774382076500135 -> 0
> tsc: cpu8: IA32_TSC_ADJUST: -5774382074705354 -> 0
> tsc: cpu9: IA32_TSC_ADJUST: -5774382075954945 -> 0
> tsc: cpu10: IA32_TSC_ADJUST: -5774382070567294 -> 0
> tsc: cpu11: IA32_TSC_ADJUST: -5774382075968443 -> 0
> tsc: cpu12: IA32_TSC_ADJUST: -5774382067353478 -> 0
> tsc: cpu13: IA32_TSC_ADJUST: -5774382071926523 -> 0
> tsc: cpu14: IA32_TSC_ADJUST: -5774382074619890 -> 0
> tsc: cpu15: IA32_TSC_ADJUST: -5774382070107058 -> 0
> tsc: cpu16: IA32_TSC_ADJUST: -5774382076196640 -> 0
> tsc: cpu17: IA32_TSC_ADJUST: -5774382075090665 -> 0
> tsc: cpu18: IA32_TSC_ADJUST: -5774382073529646 -> 0
> tsc: cpu19: IA32_TSC_ADJUST: -5774382076443616 -> 0
> tsc: cpu20: IA32_TSC_ADJUST: -5774382074994536 -> 0
> tsc: cpu21: IA32_TSC_ADJUST: -5774382076309520 -> 0
> tsc: cpu22: IA32_TSC_ADJUST: -5774382070947686 -> 0
> tsc: cpu23: IA32_TSC_ADJUST: -5774382073056320 -> 0

Fascinating.  Wonder what the heck it's doing down there.

> It does look like there's a newer BIOS version for this machine, so I'll
> try updating it later today and repeating the test to see if anything
> changes.

Sure thing, thanks for testing.



[v3] amd64: simplify TSC sync testing

2022-07-04 Thread Scott Cheloha
Hi,

Once again, I am trying to change our approach to TSC sync testing to
eliminate false positive results.  Instead of trying to repair the TSC
by measuring skew, we just spin in a lockless loop looking for skew
and mark the TSC as broken if we detect any.

This is motivated in part by some multisocket machines that do not use
the TSC as a timecounter because the current sync test confuses NUMA
latency for TSC skew.

The only difference between this version and the prior version (v2) is
that we check whether we have the IA32_TSC_ADJUST register by hand in
tsc_reset_adjust().  If someone wants to help me rearrange cpu_hatch()
so we do CPU identification before TSC sync testing we can remove the
workaround later.

If you have the IA32_TSC_ADJUST register and it is non-zero going into
the test, you will see something on the console like this:

tsc: cpu5: IA32_TSC_ADJUST: -150 -> 0

This does *not* mean you failed the test.  It just means you probably
have a bug in your BIOS (or some other firmware) and you should report
it to your vendor.

If you fail the test you will see something like this:

tsc: cpu0/cpu2: sync test round 1/2 failed
tsc: cpu0/cpu2: cpu2: 13043 lags 438 cycles

A printout like this would mean that the sync test for cpu2 failed.
In particular, cpu2's TSC trails cpu0's TSC by at least 438 cycles.
If this happens for *any* CPU we mark the TSC timecounter as
defective.

--

Please test!  Send your dmesg, pass or fail.

I am especially interested in:

1. A test from dv@.  Your dual-socket machine has the IA32_TSC_ADJUST
   register but it failed the test running patch v2.  Maybe it will pass
   with this version?

2. Other multisocket machines.

3. There were reports of TSC issues with OpenBSD VMs running on ESXi.
   What happens when you run with this patch?

4. OpenBSD VMs on other hypervisors.

5. Non-Lenovo machines, non-Intel machines.

-Scott

Index: amd64/tsc.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.24
diff -u -p -r1.24 tsc.c
--- amd64/tsc.c 31 Aug 2021 15:11:54 -  1.24
+++ amd64/tsc.c 5 Jul 2022 01:52:10 -
@@ -36,13 +36,6 @@ int  tsc_recalibrate;
 uint64_t   tsc_frequency;
 int tsc_is_invariant;
 
-#define TSC_DRIFT_MAX   250
-#define TSC_SKEW_MAX   100
-int64_t tsc_drift_observed;
-
-volatile int64_t   tsc_sync_val;
-volatile struct cpu_info   *tsc_sync_cpu;
-
 u_int  tsc_get_timecount(struct timecounter *tc);
 void   tsc_delay(int usecs);
 
@@ -236,22 +229,12 @@ cpu_recalibrate_tsc(struct timecounter *
 u_int
 tsc_get_timecount(struct timecounter *tc)
 {
-   return rdtsc_lfence() + curcpu()->ci_tsc_skew;
+   return rdtsc_lfence();
 }
 
 void
 tsc_timecounter_init(struct cpu_info *ci, uint64_t cpufreq)
 {
-#ifdef TSC_DEBUG
-   printf("%s: TSC skew=%lld observed drift=%lld\n", ci->ci_dev->dv_xname,
-   (long long)ci->ci_tsc_skew, (long long)tsc_drift_observed);
-#endif
-   if (ci->ci_tsc_skew < -TSC_SKEW_MAX || ci->ci_tsc_skew > TSC_SKEW_MAX) {
-   printf("%s: disabling user TSC (skew=%lld)\n",
-   ci->ci_dev->dv_xname, (long long)ci->ci_tsc_skew);
-   tsc_timecounter.tc_user = 0;
-   }
-
if (!(ci->ci_flags & CPUF_PRIMARY) ||
!(ci->ci_flags & CPUF_CONST_TSC) ||
!(ci->ci_flags & CPUF_INVAR_TSC))
@@ -268,111 +251,267 @@ tsc_timecounter_init(struct cpu_info *ci
calibrate_tsc_freq();
}
 
-   if (tsc_drift_observed > TSC_DRIFT_MAX) {
-   printf("ERROR: %lld cycle TSC drift observed\n",
-   (long long)tsc_drift_observed);
-   tsc_timecounter.tc_quality = -1000;
-   tsc_timecounter.tc_user = 0;
-   tsc_is_invariant = 0;
-   }
-
	tc_init(&tsc_timecounter);
 }
 
-/*
- * Record drift (in clock cycles).  Called during AP startup.
- */
 void
-tsc_sync_drift(int64_t drift)
+tsc_delay(int usecs)
 {
-   if (drift < 0)
-   drift = -drift;
-   if (drift > tsc_drift_observed)
-   tsc_drift_observed = drift;
+   uint64_t interval, start;
+
+   interval = (uint64_t)usecs * tsc_frequency / 1000000;
+   start = rdtsc_lfence();
+   while (rdtsc_lfence() - start < interval)
+   CPU_BUSY_CYCLE();
 }
 
+#ifdef MULTIPROCESSOR
+
+#define TSC_DEBUG 1
+
 /*
- * Called during startup of APs, by the boot processor.  Interrupts
- * are disabled on entry.
+ * Protections for global variables in this code:
+ *
+ * a   Modified atomically
+ * b   Protected by a barrier
+ * p   Only modified by the primary CPU
  */
+
+#define TSC_TEST_MS 1  /* Test round duration */
+#define TSC_TEST_ROUNDS 2  /* Number of test rounds */
+
+struct tsc_test_status {
+   volatile uint64_t val 

Re: ts(1): make timespec-handling code more obvious

2022-07-04 Thread Scott Cheloha
On Mon, Jul 04, 2022 at 11:15:24PM +0200, Claudio Jeker wrote:
> On Mon, Jul 04, 2022 at 01:28:12PM -0500, Scott Cheloha wrote:
> > Hi,
> > 
> > Couple things:
> > 
> > [...]
> 
> I don't like the introduction of all these local variables that are just
> hard to follow and need extra code pathes. Happy to rename roff to offset,
> start_offset or something similar. Also moving the localtime call into
> fmtfmt() is fine.

You need an "elapsed" variable to avoid overwriting "now" in the
-i flag case to avoid calling clock_gettime(2) twice.

We can get rid of "utc_start" and just reuse "now" for the initial
value of CLOCK_REALTIME.

How is this?

Index: ts.c
===
RCS file: /cvs/src/usr.bin/ts/ts.c,v
retrieving revision 1.6
diff -u -p -r1.6 ts.c
--- ts.c4 Jul 2022 17:29:03 -   1.6
+++ ts.c4 Jul 2022 22:06:56 -
@@ -32,7 +32,7 @@ static char   *buf;
 static char*outbuf;
 static size_t   bufsize;
 
-static void fmtfmt(struct tm *, long);
+static void fmtfmt(const struct timespec *);
 static void __dead  usage(void);
 
 int
@@ -40,8 +40,7 @@ main(int argc, char *argv[])
 {
int iflag, mflag, sflag;
int ch, prev;
-   struct timespec roff, start, now;
-   struct tm *tm;
+   struct timespec elapsed, now, start, utc_offset;
clockid_t clock = CLOCK_REALTIME;
 
if (pledge("stdio", NULL) == -1)
@@ -93,22 +92,25 @@ main(int argc, char *argv[])
if (setenv("TZ", "UTC", 1) == -1)
err(1, "setenv UTC");
 
-   clock_gettime(CLOCK_REALTIME, &roff);
    clock_gettime(clock, &start);
-   timespecsub(&roff, &start, &roff);
+   if (mflag) {
+       clock_gettime(CLOCK_REALTIME, &now);
+       timespecsub(&now, &start, &utc_offset);
+   }
 
for (prev = '\n'; (ch = getchar()) != EOF; prev = ch) {
if (prev == '\n') {
 			clock_gettime(clock, &now);
-			if (iflag || sflag)
-				timespecsub(&now, &start, &now);
-			else if (mflag)
-				timespecadd(&now, &roff, &now);
-			if (iflag)
-				clock_gettime(clock, &start);
-			if ((tm = localtime(&now.tv_sec)) == NULL)
-				err(1, "localtime");
-			fmtfmt(tm, now.tv_nsec);
+			if (iflag || sflag) {
+				timespecsub(&now, &start, &elapsed);
+				if (iflag)
+					start = now;
+				fmtfmt(&elapsed);
+			} else {
+				if (mflag)
+					timespecadd(&now, &utc_offset, &now);
+				fmtfmt(&now);
+			}
}
if (putchar(ch) == EOF)
break;
@@ -132,11 +134,15 @@ usage(void)
  * so you can format while you format
  */
 static void
-fmtfmt(struct tm *tm, long tv_nsec)
+fmtfmt(const struct timespec *ts)
 {
+   struct tm *tm;
char *f, ms[7];
 
-   snprintf(ms, sizeof(ms), "%06ld", tv_nsec / 1000);
+	if ((tm = localtime(&ts->tv_sec)) == NULL)
+   err(1, "localtime");
+
+   snprintf(ms, sizeof(ms), "%06ld", ts->tv_nsec / 1000);
strlcpy(buf, format, bufsize);
f = buf;
 



ts(1): make timespec-handling code more obvious

2022-07-04 Thread Scott Cheloha
Hi,

Couple things:

- Use additional timespec variables to make our intent more obvious.

  Add "elapsed", "utc_offset", and "utc_start".

  "roff" is a confusing name, "utc_offset" is better.

  Yes, I know the clock is called CLOCK_REALTIME, but that's a
  historical accident.  Ideally they would have called it CLOCK_UTC
  or CLOCK_UNIX.  Sigh.

- Before the loop, we only need to compute utc_offset in the -m flag
  case.

- Before the loop, we only need to do two clock_gettime(2) calls in
  the -m flag case.

- In the loop, we can use the new variables to help clarify what we're
  doing:

  + In the -i and -s flag cases we're using an elapsed time for the
timestamp, so compute "elapsed".

  + In the default and -m cases we're using an absolute time for the
timestamp, so (if necessary) compute "now".

- We don't need to call clock_gettime(2) twice in the -i flag case.
  Just compute "elapsed" and then assign "now" to "start".  Easy.

- I think the pimp my ride joke is cute, but calling the function
  "print_timestamp()" is a bit more obvious.  It also tells the
  reader that there's a side effect.

- Because we always call localtime(3), we can move that call into
  print_timestamp() and just pass the timespec as argument.

  This makes it clearer where the nanosecond value is coming from,
  which in turn makes it clearer that the string "ms" will be exactly
  6 bytes in length.

  I think "ms" stands for "microseconds", in which case a better name
  is "us" or "usecs", but that's outside the scope of this patch.

--

ok?

Index: ts.c
===
RCS file: /cvs/src/usr.bin/ts/ts.c,v
retrieving revision 1.6
diff -u -p -r1.6 ts.c
--- ts.c	4 Jul 2022 17:29:03 -0000	1.6
+++ ts.c	4 Jul 2022 18:17:20 -0000
@@ -32,7 +32,7 @@ static char   *buf;
 static char*outbuf;
 static size_t   bufsize;
 
-static void fmtfmt(struct tm *, long);
+static void print_timestamp(const struct timespec *);
 static void __dead  usage(void);
 
 int
@@ -40,8 +40,7 @@ main(int argc, char *argv[])
 {
int iflag, mflag, sflag;
int ch, prev;
-   struct timespec roff, start, now;
-   struct tm *tm;
+   struct timespec elapsed, now, start, utc_offset, utc_start;
clockid_t clock = CLOCK_REALTIME;
 
if (pledge("stdio", NULL) == -1)
@@ -93,22 +92,28 @@ main(int argc, char *argv[])
if (setenv("TZ", "UTC", 1) == -1)
err(1, "setenv UTC");
 
-	clock_gettime(CLOCK_REALTIME, &roff);
-	clock_gettime(clock, &start);
-	timespecsub(&roff, &start, &roff);
+	if (clock != CLOCK_REALTIME) {
+		clock_gettime(clock, &start);
+		if (mflag) {
+			clock_gettime(CLOCK_REALTIME, &utc_start);
+			timespecsub(&utc_start, &start, &utc_offset);
+		}
+	} else
+		clock_gettime(CLOCK_REALTIME, &start);
 
for (prev = '\n'; (ch = getchar()) != EOF; prev = ch) {
if (prev == '\n') {
 			clock_gettime(clock, &now);
-			if (iflag || sflag)
-				timespecsub(&now, &start, &now);
-			else if (mflag)
-				timespecadd(&now, &roff, &now);
-			if (iflag)
-				clock_gettime(clock, &start);
-			if ((tm = localtime(&now.tv_sec)) == NULL)
-				err(1, "localtime");
-			fmtfmt(tm, now.tv_nsec);
+			if (iflag || sflag) {
+				timespecsub(&now, &start, &elapsed);
+				if (iflag)
+					start = now;
+				print_timestamp(&elapsed);
+			} else {
+				if (mflag)
+					timespecadd(&now, &utc_offset, &now);
+				print_timestamp(&now);
+			}
}
if (putchar(ch) == EOF)
break;
@@ -132,11 +137,15 @@ usage(void)
  * so you can format while you format
  */
 static void
-fmtfmt(struct tm *tm, long tv_nsec)
+print_timestamp(const struct timespec *ts)
 {
+   struct tm *tm;
char *f, ms[7];
 
-   snprintf(ms, sizeof(ms), "%06ld", tv_nsec / 1000);
+	if ((tm = localtime(&ts->tv_sec)) == NULL)
+   err(1, "localtime");
+
+   snprintf(ms, sizeof(ms), "%06ld", ts->tv_nsec / 1000);
strlcpy(buf, format, bufsize);
f = buf;
 



Re: powerpc, macppc: retrigger deferred DEC interrupts from splx(9)

2022-06-29 Thread Scott Cheloha
On Wed, Jun 29, 2022 at 10:34:53PM -0400, George Koehler wrote:
> Hi.  I have a question about splx, below.
> 
> On Thu, 23 Jun 2022 21:58:48 -0500
> Scott Cheloha  wrote:
> 
> > My machine uses openpic(4).  I would appreciate tests on a
> > non-openpic(4) Mac, though all tests are appreciated.
> 
> We only run on New World Macs, and the only ones without openpic(4)
> might be the oldest models of iMac G3 from 1998; these would attach
> macintr0 and not openpic0 in dmesg.  I don't know anyone who might
> have such an iMac.  The iMac model PowerMac2,1 from 1999 (with the
> (slot-loading cd drive) does have openpic(4).

If it's imperative we test it on a non-openpic(4) machine I might be
able to scrounge one on craigslist.

... they can't be completely extinct, right?

> > Index: macppc/dev/openpic.c
> > ===
> > RCS file: /cvs/src/sys/arch/macppc/dev/openpic.c,v
> > retrieving revision 1.89
> > diff -u -p -r1.89 openpic.c
> > --- macppc/dev/openpic.c	21 Feb 2022 10:38:50 -0000	1.89
> > +++ macppc/dev/openpic.c	24 Jun 2022 02:49:59 -0000
> > @@ -382,6 +382,10 @@ openpic_splx(int newcpl)
> >  
> > intr = ppc_intr_disable();
> > openpic_setipl(newcpl);
> > +   if (ci->ci_dec_deferred && newcpl < IPL_CLOCK) {
> > +   ppc_mtdec(0);
> > +   ppc_mtdec(UINT32_MAX);  /* raise DEC exception */
> > +   }
> > if (newcpl < IPL_SOFTTTY && (ci->ci_ipending & ppc_smask[newcpl])) {
> > s = splsofttty();
> > dosoftint(newcpl);
> 
> The 2nd mtdec tries to raise dec_intr by changing bit 1 << 31 of the
> decrementer register from 0 to 1.

Yes, exactly.  My read of PowerPC 2.01 is that the DEC exception
is raised when the DEC's MSB goes from 0 to 1.

> I suspect the decrementer can
> also decrement itself from 0 to UINT32_MAX, and raise dec_intr early,
> before we reach the 2nd mtdec.  This would be bad, because this
> ppc_mtdec(UINT32_MAX) would override the ppc_mtdec(nextevent - tb) in
> dec_intr and lose the next scheduled clock interrupt.

To be perfectly clear, you are concerned about this scenario:

> > +   if (ci->ci_dec_deferred && newcpl < IPL_CLOCK) {
> > +   ppc_mtdec(0);

/* DEC interrupt fires *here*. */
/* We jump to decrint() and then call decr_intr(). */

> > +   ppc_mtdec(UINT32_MAX);  /* raise DEC exception */
> > +   }

I think it's possible for the DEC exception to occur in that spot.
However, external/DEC *interrupts* are explicitly disabled, so I don't
think that we will jump to decrint() until the next time we do

ppc_intr_enable(1);

That first happens in dosoftint().  If we don't call dosoftint(), it
may happen at the end of splx(), provided that interrupts weren't
already disabled when we called splx().

> Testing might miss this problem.  For example, a randomly reordered
> kernel might place the 2 mtdec instructions in different pages, which
> has a small chance of a page fault on a Mac G5.
> 
> Would this be better?
> 
>   ppc_mtdec(1 >> UINT32_MAX);
>   ppc_mtdec(UINT32_MAX);

I assume you meant to type

ppc_mtdec(UINT32_MAX >> 1);

I will tweak the code and try this out.  My reading of PowerPC 2.01
suggests that this will do the job just fine.

But again, I'm unsure whether we need this.  External and DEC
interrupts should be masked when we run this code unless I'm
misunderstanding what ppc_intr_disable() actually does.



Re: start unlocking kbind(2)

2022-06-24 Thread Scott Cheloha
On Wed, Jun 15, 2022 at 10:40:35PM -0500, Scott Cheloha wrote:
> On Wed, Jun 15, 2022 at 06:17:07PM -0600, Theo de Raadt wrote:
> > Mark Kettenis  wrote:
> > 
> > > Well, I believe that Scott was trying to fix a race condition that can
> > > only happen if code is using kbind(2) incorrectly, i.e. when the
> > > threads deliberately pass different cookies to kbind(2) or execute
> > > kbind(2) from different "text" addresses.
> > > 
> > > I still think the solution is simply to accept that race condition.
> > 
> > Right.
> > 
> > People are not calling kbind.  They are calling syscall(SYS_kbind
> > 
> > The man page says "don't do that".  No user serviceable parts inside.
> > Do not provide to children.
> > 
> > That said, Scott is about to share a diff he and I did a few cycles
> > around, to at least make the call-in transaction be a lock.
> 
> [...]
> 
> This patch reorganizes the logic in sys_kbind() preceding the
> copyin(9) so that we only need to take the kernel lock in a single
> spot if something goes wrong and we need to raise SIGILL.
> 
> It also puts the per-process mutex, ps_mtx, around that logic.  This
> guarantees that the first thread to reach sys_kbind() sets
> ps_kbind_addr and ps_kbind_cookie, even in oddball situations where
> the program isn't using ld.so(1) correctly.
> 
> [...]

10 days, no replies public or private.  I trust that means the patch
is fine.

Here is a tweaked patch with the binding loop wrapped with the kernel
lock.  We can more carefully determine whether uvm_unmap_remove(),
uvm_map_extract(), and uvm_unmap_detach() are MP-safe in a subsequent
patch.  They *look* safe but I can't be sure and nobody volunteered
any thoughts or review for the prior patch so we'll do the
conservative thing and push the kernel lock down, just as dlg@ was
going to do.

If something comes of guenther@'s PT_OPENBSD_KBIND idea we can
trivially remove ps_mtx from this code.

OK?

Index: kern/syscalls.master
===
RCS file: /cvs/src/sys/kern/syscalls.master,v
retrieving revision 1.224
diff -u -p -r1.224 syscalls.master
--- kern/syscalls.master	16 May 2022 07:36:04 -0000	1.224
+++ kern/syscalls.master	24 Jun 2022 14:43:26 -0000
@@ -194,7 +194,7 @@
const struct timespec *times, int flag); }
 85 STD { int sys_futimens(int fd, \
const struct timespec *times); }
-86 STD { int sys_kbind(const struct __kbind *param, \
+86 STD NOLOCK  { int sys_kbind(const struct __kbind *param, \
size_t psize, int64_t proc_cookie); }
 87 STD NOLOCK  { int sys_clock_gettime(clockid_t clock_id, \
struct timespec *tp); }
Index: kern/init_sysent.c
===
RCS file: /cvs/src/sys/kern/init_sysent.c,v
retrieving revision 1.237
diff -u -p -r1.237 init_sysent.c
--- kern/init_sysent.c	16 May 2022 07:38:10 -0000	1.237
+++ kern/init_sysent.c	24 Jun 2022 14:43:26 -0000
@@ -1,4 +1,4 @@
-/* $OpenBSD: init_sysent.c,v 1.237 2022/05/16 07:38:10 mvs Exp $   */
+/* $OpenBSD$   */
 
 /*
  * System call switch table.
@@ -204,7 +204,7 @@ const struct sysent sysent[] = {
sys_utimensat },/* 84 = utimensat */
{ 2, s(struct sys_futimens_args), 0,
sys_futimens }, /* 85 = futimens */
-   { 3, s(struct sys_kbind_args), 0,
+   { 3, s(struct sys_kbind_args), SY_NOLOCK | 0,
sys_kbind },/* 86 = kbind */
{ 2, s(struct sys_clock_gettime_args), SY_NOLOCK | 0,
sys_clock_gettime },/* 87 = clock_gettime */
Index: sys/proc.h
===
RCS file: /cvs/src/sys/sys/proc.h,v
retrieving revision 1.330
diff -u -p -r1.330 proc.h
--- sys/proc.h	13 May 2022 15:32:00 -0000	1.330
+++ sys/proc.h	24 Jun 2022 14:43:27 -0000
@@ -234,8 +234,8 @@ struct process {
uint64_t ps_pledge;
uint64_t ps_execpledge;
 
-   int64_t ps_kbind_cookie;
-   u_long  ps_kbind_addr;
+   int64_t ps_kbind_cookie;/* [m] */
+   u_long  ps_kbind_addr;  /* [m] */
 
 /* End area that is copied on creation. */
 #define ps_endcopy ps_refcnt
Index: sys/syscall.h
===
RCS file: /cvs/src/sys/sys/syscall.h,v
retrieving revision 1.234
diff -u -p -r1.234 syscall.h
--- sys/syscall.h	16 May 2022 07:38:10 -0000	1.234
+++ sys/syscall.h	24 Jun 2022 14:43:27 -0000
@@ -1,4 +1,4 @@
-/* $OpenBSD: syscall.h,v 1.234 2022/05/16 07:38:10 mvs Exp $   */
+/* $OpenBSD

powerpc, macppc: retrigger deferred DEC interrupts from splx(9)

2022-06-23 Thread Scott Cheloha
Hi,

One of the problems obstructing my dynamic clock interrupt patch is
that clock interrupts on powerpc don't (can't?) behave the same as
clock interrupts on amd64, arm64, and sparc64.

In particular, for historical reasons, on powerpc you cannot mask
decrementer (DEC) interrupts without *also* masking other interrupts
that we need to (generally) leave unmasked.

The upshot is that the DEC is unmasked at IPL_CLOCK and IPL_HIGH on
powerpc.  It's always running, it can arrive at any time.  We work
around the obvious problem this poses by postponing clock interrupt
work to a later tick if a DEC interrupt arrives when the CPU is at
IPL_CLOCK or IPL_HIGH.

This solution is insufficient for a machine-independent clock
interrupt subsystem like the one in my patch.

The only way forward I can see is to instead postpone clock interrupt
work until the next splx(9) call wherein the CPU's IPL is dropping
below IPL_CLOCK.

This patch does that.  We need to raise the DEC exception immediately
after we change the IPL, so there is a little bit of code duplicated
across the various splx(9) implementations for powerpc and macppc.
The changes in macppc/clock.c are hopefully straightforward.

This boots on my PowerMac G5 ("PowerMac7,3") and has survived two
`make build` runs and several more kernel builds.

Caveat: this machine is not quite stable.  I can't run parallel
userland builds, i.e.  `make -j2 build` without eventually hanging the
machine.  Also, sometimes it hangs at boot while /etc/netstart is
running.  These problems existed before this patch and remain after
the patch is applied.

However, this patch does not seem to have made the machine *more*
unstable, which is a good sign, I think.

My machine uses openpic(4).  I would appreciate tests on a
non-openpic(4) Mac, though all tests are appreciated.

Thoughts?  Tweaks?

If we merge this change I plan make an equivalent change on powerpc64.

dmesg below, patch attached at the end.

-Scott

[ using 1321372 bytes of bsd ELF symbol table ]
console out [ATY,Whelk_A] console in [keyboard], using USB
using parent ATY,WhelkParent:: memaddr a000, size 1000 : consaddr 
a0008000 : ioaddr 9002, size 2: width 1152 linebytes 1280 height 870 
depth 8
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2022 OpenBSD. All rights reserved.  https://www.OpenBSD.org

OpenBSD 7.1-current (GENERIC.MP) #4: Thu Jun 23 10:55:15 CDT 2022
ssc@peanut.local:/usr/src/sys/arch/macppc/compile/GENERIC.MP
real mem = 2147483648 (2048MB)
avail mem = 2051514368 (1956MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root: model PowerMac7,3
cpu0 at mainbus0: 970FX (Revision 0x300): 1800 MHz
cpu1 at mainbus0: 970FX (Revision 0x300): 1800 MHz
mem0 at mainbus0
spdmem0 at mem0: 1GB DDR SDRAM non-parity PC3200CL3.0
spdmem1 at mem0: 1GB DDR SDRAM non-parity PC3200CL3.0
spdmem2 at mem0: 1GB DDR SDRAM non-parity PC3200CL3.0
spdmem3 at mem0: 1GB DDR SDRAM non-parity PC3200CL3.0
memc0 at mainbus0: u3 rev 0xb3
kiic0 at memc0 offset 0xf8001000
iic0 at kiic0
lmtemp0 at iic0 addr 0x4a: ds1775
maxtmp0 at iic0 addr 0x4c: max6690
maxtmp1 at iic0 addr 0x4e: max6690
"cy28508" at iic0 addr 0x69 not configured
"cy2213" at iic0 addr 0x65 not configured
fcu0 at iic0 addr 0xaf
"pca9556" at iic0 addr 0x18 not configured
adc0 at iic0 addr 0x2c: ad7417
"24256" at iic0 addr 0x50 not configured
"pca9556" at iic0 addr 0x19 not configured
adc1 at iic0 addr 0x2d: ad7417
"24256" at iic0 addr 0x51 not configured
"dart" at memc0 offset 0xf8033000 not configured
"mpic" at memc0 offset 0xf804 not configured
mpcpcibr0 at mainbus0 pci: u3-agp
pci0 at mpcpcibr0 bus 0
pchb0 at pci0 dev 11 function 0 "Apple U3 AGP" rev 0x00
appleagp0 at pchb0
agp0 at appleagp0: aperture at 0x0, size 0x1000
radeondrm0 at pci0 dev 16 function 0 "ATI Radeon 9600" rev 0x00
drm0 at radeondrm0
radeondrm0: irq 48
ht0 at mainbus0: u3-ht, 6 devices
pci1 at ht0 bus 0
hpb0 at pci1 dev 1 function 0 "Apple U3" rev 0x00: 85 sources
pci2 at hpb0 bus 1
macobio0 at pci2 dev 7 function 0 "Apple K2 Macio" rev 0x60
openpic0 at macobio0 offset 0x4: version 0x4614 feature 770302 LE
macgpio0 at macobio0 offset 0x50
"pmu-interrupt" at macgpio0 offset 0x9 not configured
"programmer-switch" at macgpio0 offset 0x11 not configured
"modem-reset" at macgpio0 offset 0x1d not configured
"modem-power" at macgpio0 offset 0x1e not configured
"fcu-interrupt" at macgpio0 offset 0x15 not configured
"fcu-hw-reset" at macgpio0 offset 0x3a not configured
"slewing-done" at macgpio0 offset 0x23 not configured
"codec-input-data-mux" at macgpio0 offset 0xb not configured
"line-input-detect" at macgpio0 offset 0xc not configured
"codec-error-irq" at macgpio0 offset 0xd not configured
"dig-hw-reset" at macgpio0 offset 0x14 not configured
"line-output-detect" at macgpio0 offset 0x16 not configured
"headphone-detect" at macgpio0 offset 0x17 not 

kernel: remove global "randompid" toggle

2022-06-16 Thread Scott Cheloha
All PIDs after we fork init(8) are random.  This has been the case for
over 8 years:

https://cvsweb.openbsd.org/src/sys/kern/init_main.c?rev=1.193&content-type=text/x-cvsweb-markup

Are we keeping this "randompid" global around to make it possible to
disable random PIDs by toggling it in ddb(4)?

Maybe we need it because some future platform might have difficulty
gathering the necessary entropy before this point in the kernel
main()?

... or can we just remove it like this?

ok?

Index: sys/proc.h
===
RCS file: /cvs/src/sys/sys/proc.h,v
retrieving revision 1.330
diff -u -p -r1.330 proc.h
--- sys/proc.h	13 May 2022 15:32:00 -0000	1.330
+++ sys/proc.h	16 Jun 2022 00:18:13 -0000
@@ -499,7 +499,6 @@ extern struct proc proc0;   /* Process sl
 extern struct process process0;/* Process slot for kernel 
threads. */
 extern int nprocesses, maxprocess; /* Cur and max number of processes. */
 extern int nthreads, maxthread;/* Cur and max number of 
threads. */
-extern int randompid;  /* fork() should create random pid's */
 
 LIST_HEAD(proclist, proc);
 LIST_HEAD(processlist, process);
Index: kern/init_main.c
===
RCS file: /cvs/src/sys/kern/init_main.c,v
retrieving revision 1.315
diff -u -p -r1.315 init_main.c
--- kern/init_main.c	22 Feb 2022 01:15:01 -0000	1.315
+++ kern/init_main.c	16 Jun 2022 00:18:13 -0000
@@ -431,8 +431,6 @@ main(void *framep)
initprocess = initproc->p_p;
}
 
-   randompid = 1;
-
/*
 * Create any kernel threads whose creation was deferred because
 * initprocess had not yet been created.
Index: kern/kern_fork.c
===
RCS file: /cvs/src/sys/kern/kern_fork.c,v
retrieving revision 1.240
diff -u -p -r1.240 kern_fork.c
--- kern/kern_fork.c	13 May 2022 15:32:00 -0000	1.240
+++ kern/kern_fork.c	16 Jun 2022 00:18:13 -0000
@@ -67,7 +67,6 @@
 
 int	nprocesses = 1;		/* process 0 */
 int	nthreads = 1;		/* proc 0 */
-int	randompid;		/* when set to 1, pid's go random */
 struct forkstat forkstat;
 
 void fork_return(void *);
@@ -638,20 +637,22 @@ ispidtaken(pid_t pid)
 pid_t
 allocpid(void)
 {
-   static pid_t lastpid;
+   static int first = 1;
pid_t pid;
 
-   if (!randompid) {
-   /* only used early on for system processes */
-   pid = ++lastpid;
-   } else {
-   /* Find an unused pid satisfying lastpid < pid <= PID_MAX */
-   do {
-   pid = arc4random_uniform(PID_MAX - lastpid) + 1 +
-   lastpid;
-   } while (ispidtaken(pid));
+   /* The first PID allocated is always 1. */
+   if (__predict_false(first)) {
+   first = 0;
+   return 1;
}
 
+   /*
+* All subsequent PIDs are chosen randomly.  We need to
+* find an unused PID in the range [2, PID_MAX].
+*/
+   do {
+   pid = arc4random_uniform(PID_MAX - 1) + 2;
+   } while (ispidtaken(pid));
return pid;
 }
 



Re: start unlocking kbind(2)

2022-06-15 Thread Scott Cheloha
On Wed, Jun 15, 2022 at 06:17:07PM -0600, Theo de Raadt wrote:
> Mark Kettenis  wrote:
> 
> > Well, I believe that Scott was trying to fix a race condition that can
> > only happen if code is using kbind(2) incorrectly, i.e. when the
> > threads deliberately pass different cookies to kbind(2) or execute
> > kbind(2) from different "text" addresses.
> > 
> > I still think the solution is simply to accept that race condition.
> 
> Right.
> 
> People are not calling kbind.  They are calling syscall(SYS_kbind
> 
> The man page says "don't do that".  No user serviceable parts inside.
> Do not provide to children.
> 
> That said, Scott is about to share a diff he and I did a few cycles
> around, to at least make the call-in transaction be a lock.

Okay, here it is.

This patch reorganizes the logic in sys_kbind() preceding the
copyin(9) so that we only need to take the kernel lock in a single
spot if something goes wrong and we need to raise SIGILL.

It also puts the per-process mutex, ps_mtx, around that logic.  This
guarantees that the first thread to reach sys_kbind() sets
ps_kbind_addr and ps_kbind_cookie, even in oddball situations where
the program isn't using ld.so(1) correctly.

I am aware that this is "unsupported", but on balance I would prefer
that the initialization of those two variables was atomic.  I think
taking the mutex is a very small price to pay for the guarantee that
the "security check" logic always happens the way it was intended to
happen in the order it was intended to happen.  Note that we are not
wrapping the binding loop up in a lock, just the ps_kbind_* variable
logic.

... if Mark and Philip want to push back on that, I think I would
yield and just drop the mutex.

... also, if Philip goes ahead with the PT_OPENBSD_KBIND thing we no
longer need the mutex because, as I understand it, we would no longer
need the ps_kbind_cookie.

Even in either of those cases I still think the logic refactoring
shown here is a good thing.

--

I have been running with this now for a day and haven't hit any
panics.  The last time I tried unlocking kbind(2) I hit panics pretty
quickly due to KERNEL_ASSERT_LOCKED() calls down in the binding loop.

Because I'm not seeing that I assume this means something has changed
down in UVM and we no longer need to wrap the binding loop with the
kernel lock.

Thoughts?

Index: uvm/uvm_mmap.c
===
RCS file: /cvs/src/sys/uvm/uvm_mmap.c,v
retrieving revision 1.169
diff -u -p -r1.169 uvm_mmap.c
--- uvm/uvm_mmap.c	19 Jan 2022 10:43:48 -0000	1.169
+++ uvm/uvm_mmap.c	16 Jun 2022 03:36:53 -0000
@@ -1127,7 +1127,7 @@ sys_kbind(struct proc *p, void *v, regis
size_t psize, s;
u_long pc;
int count, i, extra;
-   int error;
+   int error, sigill = 0;
 
/*
 * extract syscall args from uap
@@ -1135,23 +1135,41 @@ sys_kbind(struct proc *p, void *v, regis
paramp = SCARG(uap, param);
psize = SCARG(uap, psize);
 
-   /* a NULL paramp disables the syscall for the process */
-   if (paramp == NULL) {
-   if (pr->ps_kbind_addr != 0)
-   sigexit(p, SIGILL);
-   pr->ps_kbind_addr = BOGO_PC;
-   return 0;
-   }
-
-   /* security checks */
+	/*
+	 * If paramp is NULL and we're uninitialized, disable the syscall
+	 * for the process.  Raise SIGILL if we're already initialized.
+	 *
+	 * If paramp is non-NULL and we're uninitialized, do initialization.
+	 * Otherwise, do security checks and raise SIGILL on failure.
+	 */
pc = PROC_PC(p);
-   if (pr->ps_kbind_addr == 0) {
+	mtx_enter(&pr->ps_mtx);
+   if (paramp == NULL) {
+   if (pr->ps_kbind_addr == 0)
+   pr->ps_kbind_addr = BOGO_PC;
+   else
+   sigill = 1;
+   } else if (pr->ps_kbind_addr == 0) {
pr->ps_kbind_addr = pc;
pr->ps_kbind_cookie = SCARG(uap, proc_cookie);
-   } else if (pc != pr->ps_kbind_addr || pc == BOGO_PC)
-   sigexit(p, SIGILL);
-   else if (pr->ps_kbind_cookie != SCARG(uap, proc_cookie))
+   } else if (pc != pr->ps_kbind_addr || pc == BOGO_PC ||
+   pr->ps_kbind_cookie != SCARG(uap, proc_cookie)) {
+   sigill = 1;
+   }
+	mtx_leave(&pr->ps_mtx);
+
+   /* Raise SIGILL if something is off. */
+   if (sigill) {
+   KERNEL_LOCK();
sigexit(p, SIGILL);
+   /* NOTREACHED */
+   KERNEL_UNLOCK();
+   }
+
+   /* We're done if we were disabling the syscall. */
+   if (paramp == NULL)
+   return 0;
+
if (psize < sizeof(struct __kbind) || psize > sizeof(param))
return EINVAL;
 	if ((error = copyin(paramp, &param, psize)))
Index: kern/syscalls.master
===

Re: start unlocking kbind(2)

2022-06-13 Thread Scott Cheloha
On Sun, Jun 12, 2022 at 12:12:33AM -0600, Theo de Raadt wrote:
> David Gwynne  wrote:
> 
> > On Wed, May 18, 2022 at 07:42:32PM -0600, Theo de Raadt wrote:
> > > Mark Kettenis  wrote:
> > > 
> > > > > Isn't the vm_map_lock enough?
> > > > 
> > > > Could be.  The fast path is going to take that lock anyway.  This
> > > > would require a bit of surgery to uvm_map_extract() to make sure we
> > > > don't take the vm_map_lock twice.  Worth exploring I'd say.
> > > 
> > > I think the vm_map_lock can be dropped before it reaches that code,
> > > because of 3 cases: (1) new kbind lock, (2) a repeated kbind lock and
> > > return, or (3) violation and process termination.
> > > 
> > > So before doing the copyin() and updates, simply vm_map_unlock()
> > > 
> > > Will that work and isn't it simpler than David's proposal?
> > 
> > I'm not super familiar with uvm so it's hard for me to say it would or
> > wouldn't be simpler, but my initial impressions are that it would be a
> > lot more surgery and I wouldn't be confident I did such changes right.
> 
> i don't understand.  As a rule, using fewer locks and mutexes is simpler
> to understand.
> 
> i do not understand the purpose for your diff, and think you have simply
> brushed my questions aside.
> 
> > Releasing a lock and then taking it again quickly can have undesirable
> > consequences too. If something else is waiting on the rwlock,
> > releasing it will wake them up, but they will probably lose because
> > we've already taken it again.
> 
> Then don't release the vm_map_lock and regrab it, but keep it active.
> 
> I think there is a lot of fuss going on here for a system call which, as
> far as I can tell, is never called in a threaded program.  Static libc or
> ld.so always call kbind early on, before threads, precisely ONCE, and any
> later call to it kills the program but why do we need a mutex for the
> simple action of observing that ps_kbind_addr is not 0?
> 
> Am I wrong that kbind is never called twice in the same address space?

Isn't this exactly what happened the last time we tried this?

> Can someone describe a synthetic sequence where racing sys_kbind calls
> perform the wrong action?

This is highly synthetic, but:

If Thread1 sets ps_kbind_addr before Thread2, but Thread2 compares
ps_kbind_cookie with SCARG(uap, proc_cookie) before Thread1 can change
ps_kbind_cookie from 0 to its own SCARG(uap, proc_cookie), it is
possible that Thread2 will pass the security check and proceed, even
though it should have caused a SIGILL.

Thread2 knows ps_kbind_cookie is initially zero, so there is a
theoretical race.

If you wanted to make this statistically impossible you could
initialize ps_kbind_cookie to a random value during fork(2) so that
Thread2 has a 1 in 2^64 chance of guessing the initial
ps_kbind_cookie value and bypassing the security check as
described above.

If you do that, then we can do the security check locklessly with
atomic_cas_ulong(9).



Re: powerpc64: do tc_init(9) before cpu_startclock()

2022-05-26 Thread Scott Cheloha
> On May 24, 2022, at 7:12 PM, Scott Cheloha  wrote:
> 
> In the future, the clock interrupt will need a working timecounter to
> accurately reschedule itself.
> 
> Move tc_init(9) up before cpu_startclock().
> 
> (I can't test this but it seems correct.)
> 
> ok?

Ping.

This is trivial, can someone with powerpc64 hardware confirm this
boots?

> Index: clock.c
> ===
> RCS file: /cvs/src/sys/arch/powerpc64/powerpc64/clock.c,v
> retrieving revision 1.3
> diff -u -p -r1.3 clock.c
> --- clock.c	23 Feb 2021 04:44:31 -0000	1.3
> +++ clock.c	25 May 2022 00:05:59 -0000
> @@ -57,6 +57,9 @@ tb_get_timecount(struct timecounter *tc)
> void
> cpu_initclocks(void)
> {
> + tb_timecounter.tc_frequency = tb_freq;
> +	tc_init(&tb_timecounter);
> +
>   tick_increment = tb_freq / hz;
> 
>   stathz = 100;
> @@ -68,9 +71,6 @@ cpu_initclocks(void)
> 	evcount_attach(&stat_count, "stat", NULL);
> 
>   cpu_startclock();
> -
> - tb_timecounter.tc_frequency = tb_freq;
> -	tc_init(&tb_timecounter);
> }
> 
> void



powerpc64: do tc_init(9) before cpu_startclock()

2022-05-24 Thread Scott Cheloha
In the future, the clock interrupt will need a working timecounter to
accurately reschedule itself.

Move tc_init(9) up before cpu_startclock().

(I can't test this but it seems correct.)

ok?

Index: clock.c
===
RCS file: /cvs/src/sys/arch/powerpc64/powerpc64/clock.c,v
retrieving revision 1.3
diff -u -p -r1.3 clock.c
--- clock.c	23 Feb 2021 04:44:31 -0000	1.3
+++ clock.c	25 May 2022 00:05:59 -0000
@@ -57,6 +57,9 @@ tb_get_timecount(struct timecounter *tc)
 void
 cpu_initclocks(void)
 {
+   tb_timecounter.tc_frequency = tb_freq;
+	tc_init(&tb_timecounter);
+
tick_increment = tb_freq / hz;
 
stathz = 100;
@@ -68,9 +71,6 @@ cpu_initclocks(void)
 	evcount_attach(&stat_count, "stat", NULL);
 
cpu_startclock();
-
-   tb_timecounter.tc_frequency = tb_freq;
-	tc_init(&tb_timecounter);
 }
 
 void



librthread: validate timespec inputs with timespecisvalid(3)

2022-05-14 Thread Scott Cheloha
ok?

Index: rthread_rwlock_compat.c
===
RCS file: /cvs/src/lib/librthread/rthread_rwlock_compat.c,v
retrieving revision 1.1
diff -u -p -r1.1 rthread_rwlock_compat.c
--- rthread_rwlock_compat.c	13 Feb 2019 13:15:39 -0000	1.1
+++ rthread_rwlock_compat.c	14 May 2022 14:29:27 -0000
@@ -143,8 +143,7 @@ int
 pthread_rwlock_timedrdlock(pthread_rwlock_t *lockp,
 const struct timespec *abstime)
 {
-	if (abstime == NULL || abstime->tv_nsec < 0 ||
-	    abstime->tv_nsec >= 1000000000)
+   if (abstime == NULL || !timespecisvalid(abstime))
return (EINVAL);
return (_rthread_rwlock_rdlock(lockp, abstime, 0));
 }
@@ -210,8 +209,7 @@ int
 pthread_rwlock_timedwrlock(pthread_rwlock_t *lockp,
 const struct timespec *abstime)
 {
-	if (abstime == NULL || abstime->tv_nsec < 0 ||
-	    abstime->tv_nsec >= 1000000000)
+   if (abstime == NULL || !timespecisvalid(abstime))
return (EINVAL);
return (_rthread_rwlock_wrlock(lockp, abstime, 0));
 }
Index: rthread_sem.c
===
RCS file: /cvs/src/lib/librthread/rthread_sem.c,v
retrieving revision 1.32
diff -u -p -r1.32 rthread_sem.c
--- rthread_sem.c	6 Apr 2020 00:01:08 -0000	1.32
+++ rthread_sem.c	14 May 2022 14:29:27 -0000
@@ -254,8 +254,7 @@ sem_timedwait(sem_t *semp, const struct 
int error;
PREP_CANCEL_POINT(tib);
 
-	if (!semp || !(sem = *semp) || abstime == NULL ||
-	    abstime->tv_nsec < 0 || abstime->tv_nsec >= 1000000000) {
+   if (!semp || !(sem = *semp) || !abstime || !timespecisvalid(abstime)) {
errno = EINVAL;
return (-1);
}
Index: rthread_sem_compat.c
===
RCS file: /cvs/src/lib/librthread/rthread_sem_compat.c,v
retrieving revision 1.1
diff -u -p -r1.1 rthread_sem_compat.c
--- rthread_sem_compat.c	8 Jun 2018 13:53:01 -0000	1.1
+++ rthread_sem_compat.c	14 May 2022 14:29:27 -0000
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include <sys/time.h>
 
 #include 
 #include 
@@ -266,8 +267,7 @@ sem_timedwait(sem_t *semp, const struct 
int r;
PREP_CANCEL_POINT(tib);
 
-	if (!semp || !(sem = *semp) || abstime == NULL ||
-	    abstime->tv_nsec < 0 || abstime->tv_nsec >= 1000000000) {
+   if (!semp || !(sem = *semp) || !abstime || !timespecisvalid(abstime)) {
errno = EINVAL;
return (-1);
}



Re: amd64: do CPU identification before TSC sync test

2022-05-11 Thread Scott Cheloha
> On May 11, 2022, at 2:51 AM, Yuichiro NAITO  wrote:
> 
> On 5/11/22 14:34, Yuichiro NAITO wrote:
>> After applying your patch, cpu1 is not identified on my current kernel.
>> Dmesg shows as follows. I'll see it further more.
> 
> I found that the LAPIC is necessary for the `delay` function that is used
> in `identifycpu` and for the loop that waits for the CPUF_IDENTIFY flag to
> be set.

Both lapic_delay() and tsc_delay() are plenty fast.

i8254_delay() might cause issues on a VM.

I need to think a bit, but your rearrangement of the patch is a step in
the right direction.

I will reply later with something better.

Thank you for testing!



Re: [v2] amd64: simplify TSC sync testing

2022-05-10 Thread Scott Cheloha
On Wed, May 11, 2022 at 10:52:55AM +0900, Yuichiro NAITO wrote:
> Hi, Scott.
> 
> Recently I started running OpenBSD on ESXi.
> I'm facing the same monotonic time going backwards problem that
> Yasuoka-san reported.
> 
> https://marc.info/?l=openbsd-tech&m=161657532610882&w=2
> 
> I've tried your v2 patch.  It seems the problem has been solved in my
> environment.
> But I'm a little bit confused about the patch.  May I ask the goal of
> the patch?

The primary goal of the patch is to eliminate false positives when
testing for TSC skew.

NUMA lag seems to sometimes fool the current test.  It would be nice
to be able to use the TSC in userland on multisocket machines.

> Is it intended to fix the problem, or is it for collecting test cases
> and the results?

The intention of the patch was not to fix the problem Yasuoka is
describing.

If it fixes that problem it is unintentional.

> Here is the dmesg shown in my patched environment.
> 
> [...]



Re: amd64: do CPU identification before TSC sync test

2022-05-10 Thread Scott Cheloha
On Tue, Mar 29, 2022 at 10:24:03AM -0500, Scott Cheloha wrote:
> On Tue, Mar 29, 2022 at 03:26:49PM +1100, Jonathan Gray wrote:
> > On Mon, Mar 28, 2022 at 10:52:09PM -0500, Scott Cheloha wrote:
> > > I want to use the IA32_TSC_ADJUST MSR where available when testing TSC
> > > synchronization.  We note if it's available during CPU identification.
> > > 
> > > Can we do CPU identification earlier in cpu_hatch() and
> > > cpu_start_secondary(), before we do the TSC sync testing?
> > > 
> > > This can wait until after release.  I'm just trying to suss out
> > > whether there is an order dependency I'm not seeing.  My laptop
> > > appears to boot and resume no differently with this patch.
> > > 
> > > Thoughts?
> > 
> > The rest aside, moving the cpu_ucode_apply() call to after the
> > identifycpu() call is wrong as microcode can add cpuid bits.
> > I would keep cpu_tsx_disable() before it as well.
> 
> Okay, moved them up.
> 
> > I'm sure I've had problems trying to change the sequencing
> > of lapic, tsc freq and identify in the past.  It caused problems
> > only on certain machines.
> 
> [...]

6 week bump + rebase.

Once again, I want to do CPU identification before the TSC sync test
so we can check for and use the IA32_TSC_ADJUST MSR during the sync
test.

Does anyone understand amd64 CPU startup well enough to say whether
this rearrangement is going to break something?

Is this ok?

Index: cpu.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/cpu.c,v
retrieving revision 1.156
diff -u -p -r1.156 cpu.c
--- cpu.c   26 Apr 2022 08:35:30 -  1.156
+++ cpu.c   11 May 2022 02:00:37 -
@@ -852,20 +852,7 @@ cpu_start_secondary(struct cpu_info *ci)
printf("dropping into debugger; continue from here to resume boot\n");
db_enter();
 #endif
-   } else {
-   /*
-* Synchronize time stamp counters. Invalidate cache and
-* synchronize twice (in tsc_sync_bp) to minimize possible
-* cache effects. Disable interrupts to try and rule out any
-* external interference.
-*/
-   s = intr_disable();
-   wbinvd();
-   tsc_sync_bp(ci);
-   intr_restore(s);
-#ifdef TSC_DEBUG
-   printf("TSC skew=%lld\n", (long long)ci->ci_tsc_skew);
-#endif
+   goto cleanup;
}
 
if ((ci->ci_flags & CPUF_IDENTIFIED) == 0) {
@@ -875,11 +862,28 @@ cpu_start_secondary(struct cpu_info *ci)
for (i = 200; (ci->ci_flags & CPUF_IDENTIFY) && i > 0; i--)
delay(10);
 
-   if (ci->ci_flags & CPUF_IDENTIFY)
+   if (ci->ci_flags & CPUF_IDENTIFY) {
printf("%s: failed to identify\n",
ci->ci_dev->dv_xname);
+   goto cleanup;
+   }
}
 
+   /*
+* Synchronize time stamp counters. Invalidate cache and
+* synchronize twice (in tsc_sync_bp) to minimize possible
+* cache effects. Disable interrupts to try and rule out any
+* external interference.
+*/
+   s = intr_disable();
+   wbinvd();
+   tsc_sync_bp(ci);
+   intr_restore(s);
+#ifdef TSC_DEBUG
+   printf("TSC skew=%lld\n", (long long)ci->ci_tsc_skew);
+#endif
+
+cleanup:
CPU_START_CLEANUP(ci);
 
pmap_kremove(MP_TRAMPOLINE, PAGE_SIZE);
@@ -940,18 +944,8 @@ cpu_hatch(void *v)
if (ci->ci_flags & CPUF_PRESENT)
panic("%s: already running!?", ci->ci_dev->dv_xname);
 #endif
-
-   /*
-* Synchronize the TSC for the first time. Note that interrupts are
-* off at this point.
-*/
-   wbinvd();
ci->ci_flags |= CPUF_PRESENT;
-   ci->ci_tsc_skew = 0;/* reset on resume */
-   tsc_sync_ap(ci);
 
-   lapic_enable();
-   lapic_startclock();
cpu_ucode_apply(ci);
cpu_tsx_disable(ci);
 
@@ -970,6 +964,17 @@ cpu_hatch(void *v)
/* Prevent identifycpu() from running again */
atomic_setbits_int(&ci->ci_flags, CPUF_IDENTIFIED);
}
+
+   /*
+* Synchronize the TSC for the first time. Note that interrupts are
+* off at this point.
+*/
+   wbinvd();
+   ci->ci_tsc_skew = 0;/* reset on resume */
+   tsc_sync_ap(ci);
+
+   lapic_enable();
+   lapic_startclock();
 
while ((ci->ci_flags & CPUF_GO) == 0)
delay(10);



Re: ratecheck mutex

2022-05-04 Thread Scott Cheloha
> On May 3, 2022, at 17:16, Alexander Bluhm  wrote:
> 
> Hi,
> 
> We have one comment that locking for ratecheck(9) is missing.  In
> all other places the locking status of the struct timeval *lasttime
> is unclear.
> 
> The easiest fix is a global mutex for all lasttime in ratecheck().
> This covers the usual usecase of the function.

Why not declare a struct ratecheck with
a per-struct mutex?

It seems odd to be heading toward more
parallel processing in e.g. the networking
stack while introducing a global point of
contention.



Re: kstat(1): implement wait with setitimer(2)

2022-05-02 Thread Scott Cheloha
On Sat, Apr 30, 2022 at 01:27:44AM +0200, Alexander Bluhm wrote:
> On Thu, Apr 28, 2022 at 08:54:02PM -0500, Scott Cheloha wrote:
> > On Thu, Sep 17, 2020 at 06:29:48PM -0500, Scott Cheloha wrote:
> > > [...]
> > >
> > > Using nanosleep(2) to print the stats periodically causes the period
> > > to drift.  If you use setitimer(2) it won't drift.
> > >
> > > ok?
> >
> > 19 month bump and rebase.
> >
> > I have updated the patch according to input from kn@.
> >
> > Once again, using nanosleep(2) here to print the stats periodically is
> > flawed.  The period will drift.  Using setitimer(2)/sigsuspend(2) is
> > better.
> >
> > While here:
> >
> > - We don't need the hundred million second upper bound anymore.  Just
> >   cap the wait at UINT_MAX seconds.
> >
> > - Use the idiomatic strtonum(3) error message format, it works here.
> >
> > ok?
> 
> I would prefer to block the alarm signal with sigprocmask(2) and
> only catch it during sigsuspend(2).  Although the timeout should
> only happen while we sleep, blocking signals while we don't expect
> them, gives me a better feeling.

Whenever I block signals, deraadt@ rises up out of the floorboards and
says "I hate masking signals, don't do that."

... but if millert@ is still fine with the attached patch, which does
sigprocmask(2), I'll go ahead with it.

> Please check the error code of signal(3).

Sure.

> otherwise diff looks good to me

Still look good?

Index: kstat.c
===================================================================
RCS file: /cvs/src/usr.bin/kstat/kstat.c,v
retrieving revision 1.9
diff -u -p -r1.9 kstat.c
--- kstat.c 22 Apr 2022 00:29:20 -  1.9
+++ kstat.c 3 May 2022 01:37:36 -
@@ -15,6 +15,8 @@
  */
 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -104,6 +106,7 @@ kstat_cmp(const struct kstat_entry *ea, 
 RBT_PROTOTYPE(kstat_tree, kstat_entry, entry, kstat_cmp);
 RBT_GENERATE(kstat_tree, kstat_entry, entry, kstat_cmp);
 
+static void handle_alrm(int);
 static struct kstat_filter *
kstat_filter_parse(char *);
 static int kstat_filter_entry(struct kstat_filters *,
@@ -134,16 +137,17 @@ main(int argc, char *argv[])
int fd;
const char *errstr;
int ch;
-   struct timespec interval = { 0, 0 };
+   struct itimerval itv;
+   sigset_t empty, mask;
int i;
+   unsigned int wait = 0;
 
while ((ch = getopt(argc, argv, "w:")) != -1) {
switch (ch) {
case 'w':
-   interval.tv_sec = strtonum(optarg, 1, 100000000,
-   &errstr);
+   wait = strtonum(optarg, 1, UINT_MAX, &errstr);
if (errstr != NULL)
-   errx(1, "wait %s: %s", optarg, errstr);
+   errx(1, "wait is %s: %s", errstr, optarg);
break;
default:
usage();
@@ -168,12 +172,25 @@ main(int argc, char *argv[])
kstat_list(&kt, fd, version, &kfs);
kstat_print(&kt);
 
-   if (interval.tv_sec == 0)
+   if (wait == 0)
return (0);
 
-   for (;;) {
-   nanosleep(&interval, NULL);
+   if (signal(SIGALRM, handle_alrm) == SIG_ERR)
+   err(1, "signal");
+   sigemptyset(&empty);
+   sigemptyset(&mask);
+   sigaddset(&mask, SIGALRM);
+   if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
+   err(1, "sigprocmask");
+
+   itv.it_value.tv_sec = wait;
+   itv.it_value.tv_usec = 0;
+   itv.it_interval = itv.it_value;
+   if (setitimer(ITIMER_REAL, &itv, NULL) == -1)
+   err(1, "setitimer");
 
+   for (;;) {
+   sigsuspend(&empty);
kstat_read(&kt, fd);
kstat_print();
}
@@ -547,4 +564,9 @@ kstat_read(struct kstat_tree *kt, int fd
if (ioctl(fd, KSTATIOC_FIND_ID, ksreq) == -1)
err(1, "update id %llu", ksreq->ks_id);
}
+}
+
+static void
+handle_alrm(int signo)
+{
 }



Re: speaker(4): unhook driver and manpage from build

2022-04-29 Thread Scott Cheloha
> On Apr 29, 2022, at 10:40, Jeremie Courreges-Anglas  wrote:
> 
> On Thu, Apr 28 2022, Scott Cheloha  wrote:
>> speaker(4) is a whimsical thing, but I don't think we should have a
>> dedicated chiptune interpreter in the kernel.
> 
>> This patch unhooks the driver and the manpage from the build.  The
>> driver is built for alpha, amd64, and i386.
>> 
>> A subsequent patch will move all relevant files to the attic and clean
>> up manpage cross references.
>> 
>> Nothing in base or xenocara includes <dev/isa/spkrio.h>.
>> 
>> I see a couple SPKRTONE and SPKRTUNE symbols in ports, but I imagine
>> those ports don't use the symbols if they are missing.
>> 
>> ok?
> 
> People seem to find this useful, for real world use cases.  Is there
> a technical reason to delete it besides "this doesn't belong here"?

We're often looking for ways to shrink the
kernel.

The code is a bit dusty.

I guess if the SPKRTUNE interpreter were
in userspace I would be a lot happier, but
that seems unlikely at this late date.



Re: speaker(4): unhook driver and manpage from build

2022-04-29 Thread Scott Cheloha
> On Apr 29, 2022, at 09:33, Angelo  wrote:
> 
> Hello
> 
>> On Thu, Apr 28, 2022 at 06:34:00AM -0500, Scott Cheloha wrote:
>> speaker(4) is a whimsical thing, but I don't think we should have a
>> dedicated chiptune interpreter in the kernel.
>> 
>> This patch unhooks the driver and the manpage from the build.  The
>> driver is built for alpha, amd64, and i386.
> 
> If this patch is applied, I can probably no longer do something like
> this?
> 
>$ echo -n 'L30cbdcbdcbdP2cbdcbdcbd' > /dev/speaker
> 
> If that's the case, then it would be good if we can keep speaker(4).
> I have multiple headless systems at home which start to beep under
> certain conditions, to alert me.

This is a perfectly valid use case. Crude, but
absent a real audio device you're just using
what you've got.

If the systems in question have *no* other
audio output then I am less inclined to
remove it.



Re: speaker(4): unhook driver and manpage from build

2022-04-29 Thread Scott Cheloha
> On Apr 29, 2022, at 10:06, j...@entropicblur.com wrote:
> 
> On 2022-04-29 08:31, Angelo wrote:
>> Hello
>>> On Thu, Apr 28, 2022 at 06:34:00AM -0500, Scott Cheloha wrote:
>>> speaker(4) is a whimsical thing, but I don't think we should have a
>>> dedicated chiptune interpreter in the kernel.
>>> This patch unhooks the driver and the manpage from the build.  The
>>> driver is built for alpha, amd64, and i386.
>> If this patch is applied, I can probably no longer do something like
>> this?
>>$ echo -n 'L30cbdcbdcbdP2cbdcbdcbd' > /dev/speaker
>> If that's the case, then it would be good if we can keep speaker(4).
>> I have multiple headless systems at home which start to beep under
>> certain conditions, to alert me.
> 
> Does this change mean the system bell will no longer be available, ie:
> 
> echo ^G
> 
> I would really prefer to NOT lose this, until/unless an alternate mechanism 
> to route the system bell through the regular sound output could be added.

No.  Beeping is handled by pcppi(4).  The
speaker(4) device is different, it provides an
interface to userspace for playing melodies
out of pcppi(4).



Re: timecounting: use full 96-bit product when computing high-res time

2022-04-29 Thread Scott Cheloha
On Thu, Oct 14, 2021 at 04:13:18PM -0500, Scott Cheloha wrote:
>
> [...]
> 
> When we compute high resolution time, both in the kernel and in libc,
> we get a 32-bit (or smaller) value from the active timecounter and
> scale it up into a 128-bit bintime.
> 
> The scaling math currently looks like this in the kernel:
> 
> [...]
> 
> The problem with this code is that if the product
> 
>   th->tc_scale * tc_delta(th)
> 
> exceeds UINT64_MAX, the result overflows and we lose time.
> 
> [...]
> 
> The solution to this problem is to use the full 96-bit product when we
> scale the count up into a bintime.  We're multiplying a u_int
> (32-bit), the count, by a uint64_t, the scale, but we're not capturing
> the upper 32 bits of that product.  If we did, we would have a longer
> grace period between clock interrupts before we lost time.
> 
> The attached patch adds a TIMECOUNT_TO_BINTIME() function to sys/time.h
> and puts it to use in sys/kern/kern_tc.c and lib/libc/sys/microtime.c.
> The math is a bit boring, see the patch if you are curious.
> 
> As for the cost, there is a small but significant increase in overhead
> when reading the clock with the TSC.  Slower timecounters (HPET, ACPI
> timer) are so slow the extra overhead is noise.
> 
> [...]
> 
> It looks to me that on amd64, userspace clock_gettime(2) is up to ~10%
> slower with the patch.  But there is a lot of variation between the
> comparisons, so I don't think it's a consistent 10%.  I'd say 10% is
> an upper bound.

6 month bump.  Absent some sort of discussion I'm going to commit this
in a week, I'm confident the math is correct.

As before, the only potentially contentious aspect to this change is
the additional overhead that we will carry on every gettimeofday(2)
call, clock_gettime(2) call, and high-res time call in the kernel.

Profiling the overhead for this is very difficult, at least on amd64.
I am still confident about my 10% upper bound on amd64 with the
userspace TSC, but the actual overhead in a given a/b test varies a
lot within that range.  I've seen everything from 2% to 9%.  It's all
over the place, just never more than 10%.

I think 10% is tolerable.  If you object to that, PLEASE speak up.
There are more complex ways to fix this problem that might eliminate
the overhead, but they are uglier.

If you are an evangelist for a non-amd64 platform I would really
appreciate an a/b test for the overhead in userspace.  If you need
test code please ask on-list and I'll provide it.

Index: lib/libc/sys/microtime.c
===================================================================
RCS file: /cvs/src/lib/libc/sys/microtime.c,v
retrieving revision 1.1
diff -u -p -r1.1 microtime.c
--- lib/libc/sys/microtime.c6 Jul 2020 13:33:06 -   1.1
+++ lib/libc/sys/microtime.c29 Apr 2022 01:07:26 -
@@ -45,10 +45,10 @@ binuptime(struct bintime *bt, struct tim
do {
gen = tk->tk_generation;
membar_consumer();
-   *bt = tk->tk_offset;
if (tc_delta(tk, &delta))
return -1;
-   bintimeaddfrac(bt, tk->tk_scale * delta, bt);
+   TIMECOUNT_TO_BINTIME(delta, tk->tk_scale, bt);
+   bintimeadd(bt, &tk->tk_offset, bt);
membar_consumer();
} while (gen == 0 || gen != tk->tk_generation);
 
@@ -65,7 +65,8 @@ binruntime(struct bintime *bt, struct ti
membar_consumer();
if (tc_delta(tk, &delta))
return -1;
-   bintimeaddfrac(&tk->tk_offset, tk->tk_scale * delta, bt);
+   TIMECOUNT_TO_BINTIME(delta, tk->tk_scale, bt);
+   bintimeadd(bt, &tk->tk_offset, bt);
bintimesub(bt, &tk->tk_naptime, bt);
membar_consumer();
} while (gen == 0 || gen != tk->tk_generation);
@@ -81,10 +82,10 @@ bintime(struct bintime *bt, struct timek
do {
gen = tk->tk_generation;
membar_consumer();
-   *bt = tk->tk_offset;
if (tc_delta(tk, &delta))
return -1;
-   bintimeaddfrac(bt, tk->tk_scale * delta, bt);
+   TIMECOUNT_TO_BINTIME(delta, tk->tk_scale, bt);
+   bintimeadd(bt, &tk->tk_offset, bt);
bintimeadd(bt, &tk->tk_boottime, bt);
membar_consumer();
} while (gen == 0 || gen != tk->tk_generation);
Index: sys/kern/kern_tc.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_tc.c,v
retrieving revision 1.75
diff -u -p -r1.75 kern_tc.c
--- sys/kern/kern_tc.c  24 Oct 2021 00:02:25 -  1.75
+++ sys/kern/kern_tc.c  29 Apr 2022 01:07:26 -
@@ -189,8 +189,8 @@ binuptime(struct bintime *bt)
th = timehands;
  

Re: kstat(1): implement wait with setitimer(2)

2022-04-28 Thread Scott Cheloha
On Thu, Sep 17, 2020 at 06:29:48PM -0500, Scott Cheloha wrote:
> [...]
> 
> Using nanosleep(2) to print the stats periodically causes the period
> to drift.  If you use setitimer(2) it won't drift.
> 
> ok?

19 month bump and rebase.

I have updated the patch according to input from kn@.

Once again, using nanosleep(2) here to print the stats periodically is
flawed.  The period will drift.  Using setitimer(2)/sigsuspend(2) is
better.

While here:

- We don't need the hundred million second upper bound anymore.  Just
  cap the wait at UINT_MAX seconds.

- Use the idiomatic strtonum(3) error message format, it works here.

ok?

Index: kstat.c
===================================================================
RCS file: /cvs/src/usr.bin/kstat/kstat.c,v
retrieving revision 1.9
diff -u -p -r1.9 kstat.c
--- kstat.c 22 Apr 2022 00:29:20 -  1.9
+++ kstat.c 29 Apr 2022 01:43:31 -
@@ -15,6 +15,8 @@
  */
 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -104,6 +106,7 @@ kstat_cmp(const struct kstat_entry *ea, 
 RBT_PROTOTYPE(kstat_tree, kstat_entry, entry, kstat_cmp);
 RBT_GENERATE(kstat_tree, kstat_entry, entry, kstat_cmp);
 
+static void handle_alrm(int);
 static struct kstat_filter *
kstat_filter_parse(char *);
 static int kstat_filter_entry(struct kstat_filters *,
@@ -134,16 +137,17 @@ main(int argc, char *argv[])
int fd;
const char *errstr;
int ch;
-   struct timespec interval = { 0, 0 };
+   struct itimerval itv;
+   unsigned int wait = 0;
+   sigset_t empty;
int i;
 
while ((ch = getopt(argc, argv, "w:")) != -1) {
switch (ch) {
case 'w':
-   interval.tv_sec = strtonum(optarg, 1, 100000000,
-   &errstr);
+   wait = strtonum(optarg, 1, UINT_MAX, &errstr);
if (errstr != NULL)
-   errx(1, "wait %s: %s", optarg, errstr);
+   errx(1, "wait is %s: %s", errstr, optarg);
break;
default:
usage();
@@ -165,15 +169,21 @@ main(int argc, char *argv[])
if (ioctl(fd, KSTATIOC_VERSION, &version) == -1)
err(1, "kstat version");
 
-   kstat_list(&kt, fd, version, &kfs);
-   kstat_print(&kt);
-
-   if (interval.tv_sec == 0)
+   if (wait == 0) {
+   kstat_list(&kt, fd, version, &kfs);
+   kstat_print(&kt);
return (0);
+   }
 
+   sigemptyset(&empty);
+   signal(SIGALRM, handle_alrm);
+   itv.it_value.tv_sec = wait;
+   itv.it_value.tv_usec = 0;
+   itv.it_interval = itv.it_value;
+   if (setitimer(ITIMER_REAL, &itv, NULL) == -1)
+   err(1, "setitimer");
for (;;) {
-   nanosleep(&interval, NULL);
-
+   sigsuspend(&empty);
kstat_read(&kt, fd);
kstat_print();
}
@@ -547,4 +557,9 @@ kstat_read(struct kstat_tree *kt, int fd
if (ioctl(fd, KSTATIOC_FIND_ID, ksreq) == -1)
err(1, "update id %llu", ksreq->ks_id);
}
+}
+
+static void
+handle_alrm(int signo)
+{
 }



speaker(4): unhook driver and manpage from build

2022-04-28 Thread Scott Cheloha
speaker(4) is a whimsical thing, but I don't think we should have a
dedicated chiptune interpreter in the kernel.

This patch unhooks the driver and the manpage from the build.  The
driver is built for alpha, amd64, and i386.

A subsequent patch will move all relevant files to the attic and clean
up manpage cross references.

Nothing in base or xenocara includes <dev/isa/spkrio.h>.

I see a couple SPKRTONE and SPKRTUNE symbols in ports, but I imagine
those ports don't use the symbols if they are missing.

ok?

Index: distrib/sets/lists/comp/mi
===================================================================
RCS file: /cvs/src/distrib/sets/lists/comp/mi,v
retrieving revision 1.1597
diff -u -p -r1.1597 mi
--- distrib/sets/lists/comp/mi  20 Mar 2022 10:54:43 -  1.1597
+++ distrib/sets/lists/comp/mi  28 Apr 2022 11:29:15 -
@@ -439,7 +439,6 @@
 ./usr/include/dev/isa/sbdspvar.h
 ./usr/include/dev/isa/sbreg.h
 ./usr/include/dev/isa/sbvar.h
-./usr/include/dev/isa/spkrio.h
 ./usr/include/dev/isa/vga_isavar.h
 ./usr/include/dev/isa/viasioreg.h
 ./usr/include/dev/isa/wbsioreg.h
Index: distrib/sets/lists/man/mi
===================================================================
RCS file: /cvs/src/distrib/sets/lists/man/mi,v
retrieving revision 1.1664
diff -u -p -r1.1664 mi
--- distrib/sets/lists/man/mi   20 Apr 2022 01:39:49 -  1.1664
+++ distrib/sets/lists/man/mi   28 Apr 2022 11:29:15 -
@@ -1956,7 +1956,6 @@
 ./usr/share/man/man4/sparc64/zs.4
 ./usr/share/man/man4/sparc64/zx.4
 ./usr/share/man/man4/spdmem.4
-./usr/share/man/man4/speaker.4
 ./usr/share/man/man4/sppp.4
 ./usr/share/man/man4/sqphy.4
 ./usr/share/man/man4/ssdfb.4
Index: etc/MAKEDEV.common
===================================================================
RCS file: /cvs/src/etc/MAKEDEV.common,v
retrieving revision 1.115
diff -u -p -r1.115 MAKEDEV.common
--- etc/MAKEDEV.common  7 Jan 2022 01:13:15 -   1.115
+++ etc/MAKEDEV.common  28 Apr 2022 11:29:15 -
@@ -148,7 +148,6 @@ target(all, joy, 0, 1)dnl
 twrget(all, rnd, random)dnl
 target(all, uk, 0)dnl
 twrget(all, vi, video, 0, 1)dnl
-twrget(all, speak, speaker)dnl
 target(all, asc, 0)dnl
 target(all, radio, 0)dnl
 target(all, tuner, 0)dnl
@@ -462,8 +461,6 @@ _mkdev(bpf, bpf, {-M bpf c major_bpf_c 0
M bpf0 c major_bpf_c 0 600-})dnl
 _mkdev(tun, {-tun*-}, {-M tun$U c major_tun_c $U 600-}, 600)dnl
 _mkdev(tap, {-tap*-}, {-M tap$U c major_tap_c $U 600-}, 600)dnl
-__devitem(speak, speaker, PC speaker,spkr)dnl
-_mkdev(speak, speaker, {-M speaker c major_speak_c 0 600-})dnl
 __devitem(tun, tun*, Network tunnel driver)dnl
 __devitem(tap, tap*, Ethernet tunnel driver)dnl
 __devitem(rnd, *random, In-kernel random data source,random)dnl
Index: etc/etc.alpha/MAKEDEV.md
===================================================================
RCS file: /cvs/src/etc/etc.alpha/MAKEDEV.md,v
retrieving revision 1.78
diff -u -p -r1.78 MAKEDEV.md
--- etc/etc.alpha/MAKEDEV.md11 Nov 2021 09:47:32 -  1.78
+++ etc/etc.alpha/MAKEDEV.md28 Apr 2022 11:29:15 -
@@ -76,7 +76,6 @@ _DEV(pppac, 71)
 _DEV(radio, 59)
 _DEV(rnd, 34)
 _DEV(rmidi, 41)
-_DEV(speak, 40)
 _DEV(tun, 7)
 _DEV(tap, 68)
 _DEV(tuner, 58)
Index: etc/etc.amd64/MAKEDEV.md
===================================================================
RCS file: /cvs/src/etc/etc.amd64/MAKEDEV.md,v
retrieving revision 1.80
diff -u -p -r1.80 MAKEDEV.md
--- etc/etc.amd64/MAKEDEV.md7 Jan 2022 01:13:15 -   1.80
+++ etc/etc.amd64/MAKEDEV.md28 Apr 2022 11:29:15 -
@@ -88,7 +88,6 @@ _DEV(pppac, 99)
 _DEV(radio, 76)
 _DEV(rnd, 45)
 _DEV(rmidi, 52)
-_DEV(speak, 27)
 _DEV(tun, 40)
 _DEV(tap, 93)
 _DEV(tuner, 49)
Index: etc/etc.i386/MAKEDEV.md
===================================================================
RCS file: /cvs/src/etc/etc.i386/MAKEDEV.md,v
retrieving revision 1.95
diff -u -p -r1.95 MAKEDEV.md
--- etc/etc.i386/MAKEDEV.md 7 Jan 2022 01:13:15 -   1.95
+++ etc/etc.i386/MAKEDEV.md 28 Apr 2022 11:29:15 -
@@ -90,7 +90,6 @@ _DEV(pppac, 99)
 _DEV(radio, 76)
 _DEV(rnd, 45)
 _DEV(rmidi, 52)
-_DEV(speak, 27)
 _DEV(tun, 40)
 _DEV(tap, 94)
 _DEV(tuner, 49)
Index: share/man/man4/Makefile
===================================================================
RCS file: /cvs/src/share/man/man4/Makefile,v
retrieving revision 1.817
diff -u -p -r1.817 Makefile
--- share/man/man4/Makefile 18 Jan 2022 07:53:39 -  1.817
+++ share/man/man4/Makefile 28 Apr 2022 11:29:15 -
@@ -80,7 +80,7 @@ MAN=  aac.4 abcrtc.4 abl.4 ac97.4 acphy.4
safte.4 sbus.4 schsio.4 scsi.4 sd.4 \
sdmmc.4 sdhc.4 se.4 ses.4 sf.4 sili.4 \
simpleamp.4 simpleaudio.4 simplefb.4 simplepanel.4 siop.4 sis.4 sk.4 \
-   sm.4 smsc.4 softraid.4 spdmem.4 sdtemp.4 speaker.4 sppp.4 sqphy.4 \
+   sm.4 smsc.4 softraid.4 spdmem.4 sdtemp.4 sppp.4 sqphy.4 \
ssdfb.4 st.4 ste.4 stge.4 sti.4 stp.4 sv.4 sxiccmu.4 \
sxidog.4 sximmc.4 sxipio.4 sxipwm.4 sxirsb.4 sxirtc.4 sxisid.4 \
sxisyscon.4 sxitemp.4 

Re: amd64: do CPU identification before TSC sync test

2022-03-29 Thread Scott Cheloha
On Tue, Mar 29, 2022 at 03:26:49PM +1100, Jonathan Gray wrote:
> On Mon, Mar 28, 2022 at 10:52:09PM -0500, Scott Cheloha wrote:
> > I want to use the IA32_TSC_ADJUST MSR where available when testing TSC
> > synchronization.  We note if it's available during CPU identification.
> > 
> > Can we do CPU identification earlier in cpu_hatch() and
> > cpu_start_secondary(), before we do the TSC sync testing?
> > 
> > This can wait until after release.  I'm just trying to suss out
> > whether there is an order dependency I'm not seeing.  My laptop
> > appears to boot and resume no differently with this patch.
> > 
> > Thoughts?
> 
> The rest aside, moving the cpu_ucode_apply() call to after the
> identifycpu() call is wrong as microcode can add cpuid bits.
> I would keep cpu_tsx_disable() before it as well.

Okay, moved them up.

> I'm sure I've had problems trying to change the sequencing
> of lapic, tsc freq and identify in the past.  It caused problems
> only on certain machines.

Uh, identifycpu() calls tsc_identify() calls tsc_freq_cpuid(), which
quietly sets lapic_per_second (a variable in amd64/lapic.c) based on
the core crystal frequency reported via CPUID.

If lapic_per_second is non-zero when lapic_calibrate_timer() is
called, the BP skips manual calibration of the LAPIC counter with
the i8254.

It might be less surprising if lapic_calibrate_timer() grabbed the
core crystal frequency via a function in amd64/tsc.c.  Then the order
of TSC-sync-test and LAPIC setup wouldn't matter.  We disable
interrupts when doing TSC sync anyway, so even if the LAPIC were
running we wouldn't see the interrupts.

...

But maybe you're talking about some other issue :)

Index: cpu.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/cpu.c,v
retrieving revision 1.155
diff -u -p -r1.155 cpu.c
--- cpu.c   21 Feb 2022 11:03:39 -  1.155
+++ cpu.c   29 Mar 2022 15:19:54 -
@@ -842,20 +842,7 @@ cpu_start_secondary(struct cpu_info *ci)
printf("dropping into debugger; continue from here to resume boot\n");
db_enter();
 #endif
-   } else {
-   /*
-* Synchronize time stamp counters. Invalidate cache and
-* synchronize twice (in tsc_sync_bp) to minimize possible
-* cache effects. Disable interrupts to try and rule out any
-* external interference.
-*/
-   s = intr_disable();
-   wbinvd();
-   tsc_sync_bp(ci);
-   intr_restore(s);
-#ifdef TSC_DEBUG
-   printf("TSC skew=%lld\n", (long long)ci->ci_tsc_skew);
-#endif
+   goto cleanup;
}
 
if ((ci->ci_flags & CPUF_IDENTIFIED) == 0) {
@@ -865,11 +852,28 @@ cpu_start_secondary(struct cpu_info *ci)
for (i = 200; (ci->ci_flags & CPUF_IDENTIFY) && i > 0; i--)
delay(10);
 
-   if (ci->ci_flags & CPUF_IDENTIFY)
+   if (ci->ci_flags & CPUF_IDENTIFY) {
printf("%s: failed to identify\n",
ci->ci_dev->dv_xname);
+   goto cleanup;
+   }
}
 
+   /*
+* Synchronize time stamp counters. Invalidate cache and
+* synchronize twice (in tsc_sync_bp) to minimize possible
+* cache effects. Disable interrupts to try and rule out any
+* external interference.
+*/
+   s = intr_disable();
+   wbinvd();
+   tsc_sync_bp(ci);
+   intr_restore(s);
+#ifdef TSC_DEBUG
+   printf("TSC skew=%lld\n", (long long)ci->ci_tsc_skew);
+#endif
+
+cleanup:
CPU_START_CLEANUP(ci);
 
pmap_kremove(MP_TRAMPOLINE, PAGE_SIZE);
@@ -930,18 +934,8 @@ cpu_hatch(void *v)
if (ci->ci_flags & CPUF_PRESENT)
panic("%s: already running!?", ci->ci_dev->dv_xname);
 #endif
-
-   /*
-* Synchronize the TSC for the first time. Note that interrupts are
-* off at this point.
-*/
-   wbinvd();
ci->ci_flags |= CPUF_PRESENT;
-   ci->ci_tsc_skew = 0;/* reset on resume */
-   tsc_sync_ap(ci);
 
-   lapic_enable();
-   lapic_startclock();
cpu_ucode_apply(ci);
cpu_tsx_disable(ci);
 
@@ -960,6 +954,17 @@ cpu_hatch(void *v)
/* Prevent identifycpu() from running again */
atomic_setbits_int(&ci->ci_flags, CPUF_IDENTIFIED);
}
+
+   /*
+* Synchronize the TSC for the first time. Note that interrupts are
+* off at this point.
+*/
+   wbinvd();
+   ci->ci_tsc_skew = 0;/* reset on resume */
+   tsc_sync_ap(ci);
+
+   lapic_enable();
+   lapic_startclock();
 
while ((ci->ci_flags & CPUF_GO) == 0)
delay(10);



amd64: do CPU identification before TSC sync test

2022-03-28 Thread Scott Cheloha
I want to use the IA32_TSC_ADJUST MSR where available when testing TSC
synchronization.  We note if it's available during CPU identification.

Can we do CPU identification earlier in cpu_hatch() and
cpu_start_secondary(), before we do the TSC sync testing?

This can wait until after release.  I'm just trying to suss out
whether there is an order dependency I'm not seeing.  My laptop
appears to boot and resume no differently with this patch.

Thoughts?

Index: cpu.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/cpu.c,v
retrieving revision 1.155
diff -u -p -r1.155 cpu.c
--- cpu.c   21 Feb 2022 11:03:39 -  1.155
+++ cpu.c   29 Mar 2022 03:49:31 -
@@ -842,20 +842,7 @@ cpu_start_secondary(struct cpu_info *ci)
printf("dropping into debugger; continue from here to resume boot\n");
db_enter();
 #endif
-   } else {
-   /*
-* Synchronize time stamp counters. Invalidate cache and
-* synchronize twice (in tsc_sync_bp) to minimize possible
-* cache effects. Disable interrupts to try and rule out any
-* external interference.
-*/
-   s = intr_disable();
-   wbinvd();
-   tsc_sync_bp(ci);
-   intr_restore(s);
-#ifdef TSC_DEBUG
-   printf("TSC skew=%lld\n", (long long)ci->ci_tsc_skew);
-#endif
+   goto cleanup;
}
 
if ((ci->ci_flags & CPUF_IDENTIFIED) == 0) {
@@ -865,11 +852,28 @@ cpu_start_secondary(struct cpu_info *ci)
for (i = 200; (ci->ci_flags & CPUF_IDENTIFY) && i > 0; i--)
delay(10);
 
-   if (ci->ci_flags & CPUF_IDENTIFY)
+   if (ci->ci_flags & CPUF_IDENTIFY) {
printf("%s: failed to identify\n",
ci->ci_dev->dv_xname);
+   goto cleanup;
+   }
}
 
+   /*
+* Synchronize time stamp counters. Invalidate cache and
+* synchronize twice (in tsc_sync_bp) to minimize possible
+* cache effects. Disable interrupts to try and rule out any
+* external interference.
+*/
+   s = intr_disable();
+   wbinvd();
+   tsc_sync_bp(ci);
+   intr_restore(s);
+#ifdef TSC_DEBUG
+   printf("TSC skew=%lld\n", (long long)ci->ci_tsc_skew);
+#endif
+
+cleanup:
CPU_START_CLEANUP(ci);
 
pmap_kremove(MP_TRAMPOLINE, PAGE_SIZE);
@@ -930,20 +934,7 @@ cpu_hatch(void *v)
if (ci->ci_flags & CPUF_PRESENT)
panic("%s: already running!?", ci->ci_dev->dv_xname);
 #endif
-
-   /*
-* Synchronize the TSC for the first time. Note that interrupts are
-* off at this point.
-*/
-   wbinvd();
ci->ci_flags |= CPUF_PRESENT;
-   ci->ci_tsc_skew = 0;/* reset on resume */
-   tsc_sync_ap(ci);
-
-   lapic_enable();
-   lapic_startclock();
-   cpu_ucode_apply(ci);
-   cpu_tsx_disable(ci);
 
if ((ci->ci_flags & CPUF_IDENTIFIED) == 0) {
/*
@@ -960,6 +951,19 @@ cpu_hatch(void *v)
/* Prevent identifycpu() from running again */
atomic_setbits_int(&ci->ci_flags, CPUF_IDENTIFIED);
}
+
+   /*
+* Synchronize the TSC for the first time. Note that interrupts are
+* off at this point.
+*/
+   wbinvd();
+   ci->ci_tsc_skew = 0;/* reset on resume */
+   tsc_sync_ap(ci);
+
+   lapic_enable();
+   lapic_startclock();
+   cpu_ucode_apply(ci);
+   cpu_tsx_disable(ci);
 
while ((ci->ci_flags & CPUF_GO) == 0)
delay(10);



ssh: xstrdup(): use memcpy(3)

2022-03-09 Thread Scott Cheloha
The strdup(3) implementation in libc uses memcpy(3), not strlcpy(3).

There is no upside to using strlcpy(3) here if we know the length of
str before we copy it to the destination buffer.

... unless we're worried the length of str will change?  Which would
be very paranoid.  But if that's the case we should be checking that
the return value of strlcpy(3) equals len and calling fatal() if it
isn't.

ok?

Index: xmalloc.c
===================================================================
RCS file: /cvs/src/usr.bin/ssh/xmalloc.c,v
retrieving revision 1.36
diff -u -p -r1.36 xmalloc.c
--- xmalloc.c   12 Nov 2019 22:32:48 -  1.36
+++ xmalloc.c   10 Mar 2022 01:06:54 -
@@ -85,8 +85,7 @@ xstrdup(const char *str)
 
len = strlen(str) + 1;
cp = xmalloc(len);
-   strlcpy(cp, str, len);
-   return cp;
+   return memcpy(cp, str, len);
 }
 
 int



[v2] amd64: simplify TSC sync testing

2022-02-22 Thread Scott Cheloha
Hi,

Here is a second draft patch for changing our approach to TSC
synchronization.

With this patch, instead of trying to fix desync with a handshake we
test for desync with a (more) foolproof loop and then don't attempt to
correct for desync if we detect it.
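The detection idea can be sketched as pure arithmetic (a hypothetical helper, not the code in this patch): if the reference CPU reads its own counter immediately before and after the other CPU takes a sample, any sample falling outside that window proves the counters disagree, and the distance is a lower bound on the lag.

```c
#include <stdint.h>

/*
 * Illustrative desync predicate.  ref_before/ref_after bracket the
 * moment ap_sample was taken on the other CPU.  Returns a positive
 * value if the AP's counter lags, negative if it leads, 0 if no
 * desync can be proven from this observation.
 */
static int64_t
tsc_lag(uint64_t ref_before, uint64_t ref_after, uint64_t ap_sample)
{
	if (ap_sample < ref_before)
		return (int64_t)(ref_before - ap_sample);	/* AP lags */
	if (ap_sample > ref_after)
		return -(int64_t)(ap_sample - ref_after);	/* AP leads */
	return 0;	/* inside the window: inconclusive */
}
```

Because communication latency widens the [before, after] window, the measured lag understates the true lag, which is why the dmesg output below reports ~30 cycles for a forced 150-cycle skew.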

The motivation for a more foolproof loop is to eliminate the false
positive results seen on multisocket CPUs using the current handshake.
The handshake seems to interpret NUMA lag as desync and disables the
TSC on some multisocket systems.  Ideally this should not happen.

The motivation for not attempting to fix desync in the kernel with a
per-CPU skew value is, well, I think it's too error-prone.  I think
reliably correcting TSC desync in software without the IA32_TSC_ADJUST
register is basically impossible given the speed of the TSC.  If it
were a slower clock it would be more feasible, but this is not the
case.

One thing that doesn't work correctly yet is resetting
IA32_TSC_ADJUST.  The relevant feature flag is missing during the
first boot.  Could we move CPU identification up to an earlier point
in cpu_hatch() so the flag is set when we do the sync test?

--

I asked for review on the actual meat of the previous patch and got
nothing.  So I have dumbed the code down to weed out confounding
factors.

I am relatively confident that if this test detects desync that
something is off with your TSC.  Not totally confident, because I
haven't gotten any review yet, but I'm getting more confident.

When I forcibly desync the TSC on my APs during boot by 150 cycles
using the IA32_TSC_ADJUST register I get output like this:

Feb 21 10:05:14 jetsam /bsd: tsc: cpu0/cpu1: sync test round 1/2 failed
Feb 21 10:05:14 jetsam /bsd: tsc: cpu0/cpu1: cpu1: 947 lags 32 cycles
Feb 21 10:05:14 jetsam /bsd: tsc: cpu0/cpu2: sync test round 1/2 failed
Feb 21 10:05:14 jetsam /bsd: tsc: cpu0/cpu2: cpu2: 1215 lags 30 cycles
Feb 21 10:05:14 jetsam /bsd: tsc: cpu0/cpu3: sync test round 1/2 failed
Feb 21 10:05:14 jetsam /bsd: tsc: cpu0/cpu3: cpu3: 1085 lags 28 cycles
Feb 21 10:05:14 jetsam /bsd: tsc: cpu0/cpu4: sync test round 1/2 failed
Feb 21 10:05:14 jetsam /bsd: tsc: cpu0/cpu4: cpu4: 18667 lags 94 cycles
Feb 21 10:05:14 jetsam /bsd: tsc: cpu0/cpu5: sync test round 1/2 failed
Feb 21 10:05:14 jetsam /bsd: tsc: cpu0/cpu5: cpu5: 771 lags 34 cycles
Feb 21 10:05:14 jetsam /bsd: tsc: cpu0/cpu6: sync test round 1/2 failed
Feb 21 10:05:14 jetsam /bsd: tsc: cpu0/cpu6: cpu6: 842 lags 30 cycles
Feb 21 10:05:14 jetsam /bsd: tsc: cpu0/cpu7: sync test round 1/2 failed
Feb 21 10:05:14 jetsam /bsd: tsc: cpu0/cpu7: cpu7: 895 lags 30 cycles

The CPUs are not running at full speed during boot, so we only measure
~30 cycles of lag when the true lag is 150 cycles.  We get ~94 cycles
on CPU4 because it is an SMT twin with CPU0... for some reason this
reduces the margin of error.  In general, the margin of error shrinks
with higher clock rates.

Please test!  In particular:

- I'd love retests on systems that failed the test using the previous
  patch.  Almost all of these were AMD Ryzen CPUs.  It's hard to say
  what the issue is there.  My vague guess is a firmware bug.

  One would hope that AMD's QA would catch an issue with the #RESET
  signal, which is supposed to start all TSCs on all CPUs from zero
  simultaneously.  I am unsure how you would diagnose that it was,
  in fact, a firmware bug though.

- Multisocket systems

- Multiprocessor VMs

Please include your dmesg.

Thanks!

-Scott

Index: amd64/tsc.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.24
diff -u -p -r1.24 tsc.c
--- amd64/tsc.c 31 Aug 2021 15:11:54 -  1.24
+++ amd64/tsc.c 23 Feb 2022 02:23:24 -
@@ -36,13 +36,6 @@ int  tsc_recalibrate;
 uint64_t   tsc_frequency;
 inttsc_is_invariant;
 
-#defineTSC_DRIFT_MAX   250
-#define TSC_SKEW_MAX   100
-int64_ttsc_drift_observed;
-
-volatile int64_t   tsc_sync_val;
-volatile struct cpu_info   *tsc_sync_cpu;
-
 u_int  tsc_get_timecount(struct timecounter *tc);
 void   tsc_delay(int usecs);
 
@@ -236,22 +229,12 @@ cpu_recalibrate_tsc(struct timecounter *
 u_int
 tsc_get_timecount(struct timecounter *tc)
 {
-   return rdtsc_lfence() + curcpu()->ci_tsc_skew;
+   return rdtsc_lfence();
 }
 
 void
 tsc_timecounter_init(struct cpu_info *ci, uint64_t cpufreq)
 {
-#ifdef TSC_DEBUG
-   printf("%s: TSC skew=%lld observed drift=%lld\n", ci->ci_dev->dv_xname,
-   (long long)ci->ci_tsc_skew, (long long)tsc_drift_observed);
-#endif
-   if (ci->ci_tsc_skew < -TSC_SKEW_MAX || ci->ci_tsc_skew > TSC_SKEW_MAX) {
-   printf("%s: disabling user TSC (skew=%lld)\n",
-   ci->ci_dev->dv_xname, (long long)ci->ci_tsc_skew);
-   tsc_timecounter.tc_user = 0;
-   }
-
if (!(ci->ci_flags & CPUF_PRIMARY) ||
!(ci->ci_flags & 

look(1): drop "rpath" promise after open(2)/fstat(2)

2022-02-08 Thread Scott Cheloha
The look(1) program needs to open(2) and fstat(2) exactly one file
during its runtime.  Using unveil(2) seems like overkill here.

This seems closer to what we want:

- pledge(2) initially with "stdio rpath" at the top of main().
  We know we need to read a file at this point but don't yet
  know which one.

- pledge(2) down to "stdio" after we have opened the file
  in question and called fstat(2) to get its size.  The rest
  of the program is computation and stdio.

- Remove the unveil(2) call.  We don't need it if we're only
  working with one file and it's already open.

Unless I have misunderstood something, we don't need "rpath" to
mmap(2) the descriptor into memory after opening it, so drop "rpath"
before the mmap(2) call.

ok?

Index: look.c
===
RCS file: /cvs/src/usr.bin/look/look.c,v
retrieving revision 1.25
diff -u -p -r1.25 look.c
--- look.c  24 Oct 2021 21:24:16 -  1.25
+++ look.c  9 Feb 2022 01:26:38 -
@@ -77,6 +77,9 @@ main(int argc, char *argv[])
int ch, fd, termchar;
char *back, *file, *front, *string, *p;
 
+   if (pledge("stdio rpath", NULL) == -1)
+   err(2, "pledge");
+
file = _PATH_WORDS;
termchar = '\0';
while ((ch = getopt(argc, argv, "dft:")) != -1)
@@ -110,11 +113,6 @@ main(int argc, char *argv[])
usage();
}
 
-   if (unveil(file, "r") == -1)
-   err(2, "unveil %s", file);
-   if (pledge("stdio rpath", NULL) == -1)
-   err(2, "pledge");
-
if (termchar != '\0' && (p = strchr(string, termchar)) != NULL)
*++p = '\0';
 
@@ -122,6 +120,10 @@ main(int argc, char *argv[])
err(2, "%s", file);
if (sb.st_size > SIZE_MAX)
errc(2, EFBIG, "%s", file);
+
+   if (pledge("stdio", NULL) == -1)
+   err(2, "pledge");
+
if ((front = mmap(NULL,
(size_t)sb.st_size, PROT_READ, MAP_PRIVATE, fd, (off_t)0)) == 
MAP_FAILED)
err(2, "%s", file);



Re: tr(1): improve table names

2022-02-08 Thread Scott Cheloha
On Sun, Jan 30, 2022 at 10:23:43AM -0600, Scott Cheloha wrote:
> In tr(1), we have these two global arrays, "string1" and "string2".
> 
> I have a few complaints:
> 
> 1. They are not strings.  They are lookup tables.  The names are
>misleading.
> 
> 2. The arguments given to tr(1) in argv[] are indeed called "string1"
>and "string2".  These are the names used in the standard, the manpage,
>and the usage printout.
> 
>However, the lookup tables are merely *described* by these arguments.
>They are not the arguments themselves.
> 
> 3. The meaning of the contents of these lookup tables changes depending
>on which of the five different operating modes tr(1) is running in.
> 
>string1[i] might mean "delete byte i" or "squeeze byte i" or
>"replace byte i with the value string1[i]" depending on how
>tr(1) was invoked.
> 
> Given this, I think it'd be a lot nicer if we named the tables to
> indicate which transformation operation they correspond to.
> 
> We have three such operations: "delete", "squeeze", and "translate".
> So we ought to have a table for each.  And in setup() we should call
> the table a "table", not a "string".
> 
> Now when you look at the loops in main() you can immediately
> understand which operation is happening without needing to consult the
> manpage or the comments.  (Seriously, look.)
> 
> I have more cleanup I want to do here in tr.c, but I think renaming
> these tables first is going to make the rest of it a lot easier to
> review.
> 
> ok?

1 week bump.

ok?

Index: tr.c
===
RCS file: /cvs/src/usr.bin/tr/tr.c,v
retrieving revision 1.20
diff -u -p -r1.20 tr.c
--- tr.c2 Nov 2021 15:45:52 -   1.20
+++ tr.c30 Jan 2022 16:14:21 -
@@ -40,7 +40,8 @@
 
 #include "extern.h"
 
-static int string1[NCHARS] = {
+int delete[NCHARS], squeeze[NCHARS];
+int translate[NCHARS] = {
0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* ASCII */
0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
@@ -73,7 +74,7 @@ static int string1[NCHARS] = {
0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef,
0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7,
0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff,
-}, string2[NCHARS];
+};
 
 STR s1 = { STRING1, NORMAL, 0, OOBCH, { 0, OOBCH }, NULL, NULL };
 STR s2 = { STRING2, NORMAL, 0, OOBCH, { 0, OOBCH }, NULL, NULL };
@@ -122,11 +123,11 @@ main(int argc, char *argv[])
if (argc != 2)
usage();
 
-   setup(string1, argv[0], &s1, cflag);
-   setup(string2, argv[1], &s2, 0);
+   setup(delete, argv[0], &s1, cflag);
+   setup(squeeze, argv[1], &s2, 0);
 
for (lastch = OOBCH; (ch = getchar()) != EOF;)
-   if (!string1[ch] && (!string2[ch] || lastch != ch)) {
+   if (!delete[ch] && (!squeeze[ch] || lastch != ch)) {
lastch = ch;
(void)putchar(ch);
}
@@ -141,10 +142,10 @@ main(int argc, char *argv[])
if (argc != 1)
usage();
 
-   setup(string1, argv[0], &s1, cflag);
+   setup(delete, argv[0], &s1, cflag);
 
while ((ch = getchar()) != EOF)
-   if (!string1[ch])
+   if (!delete[ch])
(void)putchar(ch);
exit(0);
}
@@ -154,10 +155,10 @@ main(int argc, char *argv[])
 * Squeeze all characters (or complemented characters) in string1.
 */
if (sflag && argc == 1) {
-   setup(string1, argv[0], &s1, cflag);
+   setup(squeeze, argv[0], &s1, cflag);
 
for (lastch = OOBCH; (ch = getchar()) != EOF;)
-   if (!string1[ch] || lastch != ch) {
+   if (!squeeze[ch] || lastch != ch) {
lastch = ch;
(void)putchar(ch);
}
@@ -177,7 +178,7 @@ main(int argc, char *argv[])
s2.str = (unsigned char *)argv[1];
 
if (cflag)
-   for (cnt = NCHARS, p = string1; cnt--;)
+   for (cnt = NCHARS, p = translate; cnt--;)
*p++ = OOBCH;
 
if (!next())
@@ -187,45 +188,45 @@ main(int argc, char *argv[])
ch = s2.lastch;
if (sflag)
while (next()) {
-   string1[s1.lastch] = ch = s2.lastch;
-   string2[ch] = 1;
+   translate[s1.l

rev(1): drop "rpath" promise in no-file branch

2022-02-08 Thread Scott Cheloha
We don't need "rpath" if we're only processing the standard input.

ok?

Index: rev.c
===
RCS file: /cvs/src/usr.bin/rev/rev.c,v
retrieving revision 1.15
diff -u -p -r1.15 rev.c
--- rev.c   29 Jan 2022 00:11:54 -  1.15
+++ rev.c   8 Feb 2022 16:38:46 -
@@ -68,6 +68,9 @@ main(int argc, char *argv[])
 
rval = 0;
if (argc == 0) {
+   if (pledge("stdio", NULL) == -1)
+   err(1, "pledge");
+
rval = rev_file(NULL);
} else {
for (; *argv != NULL; argv++)



Re: head(1): check for stdio errors

2022-02-06 Thread Scott Cheloha
> On Feb 6, 2022, at 20:07, Todd C. Miller  wrote:
> 
> Since the input is opened read-only I don't see the point in checking
> the fclose() return value.  However, if you are going to do so, you
> might as well combine it with the ferror() check.  E.g.
> 
>if (ferror(fp) || fclose(fp) == EOF) {
>warn("%s", name);
>status = 1;
>}

If we do that, we leak fp when there is an
input error.



head(1): check for stdio errors

2022-02-06 Thread Scott Cheloha
Add missing stdio error checks to head(1):

- Output errors are terminal.  The output is always stdout.

- Input errors yield a warning and cause the program to fail
  gracefully.

- Restructure the getc(3)/putchar(3) loop in head_file() to accommodate
  checking for errors.
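The restructured loop boils down to something like this (illustrative names and return codes, not the patch itself): one flat getc(3) loop that counts newlines, with the input and output error cases distinguished afterwards.

```c
#include <stdio.h>

/*
 * Copy the first `count` lines from fp to out.  Returns 0 on success,
 * -2 on an output error (fatal in head(1)) and -1 on an input error
 * (head(1) warns and fails gracefully).
 */
static int
copy_head(FILE *fp, FILE *out, long count)
{
	int ch;

	while (count > 0 && (ch = getc(fp)) != EOF) {
		if (putc(ch, out) == EOF)
			return -2;	/* output error */
		if (ch == '\n')
			--count;
	}
	if (ferror(fp))
		return -1;		/* input error */
	return 0;
}
```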

ok?

P.S. Restructuring the loop makes head(1) a bit faster on my machine
in certain contrived benchmarks.  For example, reading the words file
five hundred times:

/usr/bin/time head -n $((1 << 30)) $(jot -b /usr/share/dict/words 500) 
>/dev/null

Not exactly a win, but at least it isn't slower.

Index: head.c
===
RCS file: /cvs/src/usr.bin/head/head.c,v
retrieving revision 1.23
diff -u -p -r1.23 head.c
--- head.c  29 Jan 2022 00:19:04 -  1.23
+++ head.c  6 Feb 2022 23:44:01 -
@@ -96,30 +96,46 @@ main(int argc, char *argv[])
 int
 head_file(const char *path, long count, int need_header)
 {
+   const char *name;
FILE *fp;
-   int ch;
+   int ch, status = 0;
static int first = 1;
 
if (path != NULL) {
-   fp = fopen(path, "r");
+   name = path;
+   fp = fopen(name, "r");
if (fp == NULL) {
-   warn("%s", path);
+   warn("%s", name);
return 1;
}
if (need_header) {
-   printf("%s==> %s <==\n", first ? "" : "\n", path);
+   printf("%s==> %s <==\n", first ? "" : "\n", name);
+   if (ferror(stdout))
+   err(1, "stdout");
first = 0;
}
-   } else
+   } else {
+   name = "stdin";
fp = stdin;
+   }
 
-   for (; count > 0 && !feof(fp); --count)
-   while ((ch = getc(fp)) != EOF)
-   if (putchar(ch) == '\n')
-   break;
-   fclose(fp);
+   while ((ch = getc(fp)) != EOF) {
+   if (putchar(ch) == EOF)
+   err(1, "stdout");
+   if (ch == '\n' && --count == 0)
+   break;
+   }
+   if (ferror(fp)) {
+   warn("%s", name);
+   status = 1;
+   }
 
-   return 0;
+   if (fclose(fp) == EOF) {
+   warn("%s", name);
+   status = 1;
+   }
+
+   return status;
 }
 
 



Re: amd64: simplify TSC sync testing

2022-02-02 Thread Scott Cheloha
> On Feb 2, 2022, at 13:29, Stuart Henderson  wrote:
> 
> Thanks for testing.
> 
>> On 2022/02/02 13:51, Dave Voutila wrote:
>> 
>> Jason McIntyre  writes:
>> 
>>> On Wed, Feb 02, 2022 at 04:52:40PM +, Stuart Henderson wrote:
 This definitely wants testing on Ryzen ThinkPads (e.g. 
 E485/E585/X395/T495s)
 or Inspiron 5505, I see user TSC disabled on a lot of those in dmesglog.
 
 
>>> 
>>> hi.
>>> 
>>> here are the results from a 5505. was the timecounter meant to switch
>>> from tsc?
>>> 
>>> jmc
>>> 
>>> $ sysctl kern.timecounter
>>> kern.timecounter.tick=1
>>> kern.timecounter.timestepwarnings=0
>>> kern.timecounter.hardware=i8254
>>> kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000) acpitimer0(1000)
>>> 
>> 
>> I'm seeing the same issue...switching to i8254 pit where before it was
>> using tsc. :(
> 
> There are two separate related things, one is the kernel choice, and the
> other is whether TSC can be used directly from userland for gettimeofday
> and friends without a syscall. Does the dmesg without the diff say "user
> TSC disabled"? If so then it was only using it in the kernel.
> 
> From reading the diff, I do expect that tsc priority is dropped if the
> measurements indicate problems, but I wonder why it falls back to i8254
> even though acpihpet/acpitimer are available and higher priority..

Because we drop the TSC quality after
adding it as the active timecounter.

This violates assumptions in kern_tc.c.

If i8254 is added last, i8254 has a higher
quality than the TSC and is made the active
counter.  The other counters don't factor
in because the code assumed the active
counter is the highest quality counter.

... which would be true if we weren't changing
the quality after calling tc_init().
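A toy model of the assumption (hypothetical names, not the real kern_tc.c code): the active counter is only ever replaced at registration time by a counter of strictly higher quality, so demoting the installed counter afterwards is never re-evaluated against the other registered counters.

```c
#include <stddef.h>

/* Minimal stand-in for a timecounter: name plus quality. */
struct tc {
	const char *name;
	int quality;
};

static struct tc *active;

/* Registration-time selection, as in tc_init(): higher quality wins. */
static void
tc_register(struct tc *tc)
{
	if (active == NULL || tc->quality > active->quality)
		active = tc;
}
```

With this model, demoting the TSC after it is installed and then registering i8254 last makes i8254 active even though a better counter (acpihpet0) was registered earlier, which is exactly the fallback seen in the reports above.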



Re: cat(1): drop "rpath" promise in no-file case

2022-02-01 Thread Scott Cheloha
On Tue, Feb 01, 2022 at 01:33:19PM -0700, Todd C. Miller wrote:
> On Tue, 01 Feb 2022 09:59:51 -0600, Scott Cheloha wrote:
> 
> > To recap:
> >
> >  - Refactor the open/close portions of cook_args() and raw_args() into a
> >single function, cat_file().
> >
> >  - Push the flag-check branch in main() down into cat_file().
> >
> >  - Pull the argv loop in cat_file() up into main().
> >
> > Once we've done that, we can then:
> >
> >  - Drop the "rpath" promise in the no-file case in main().
> >
> > Any objections?
> 
> No objection from me.

Whoops, introduced a bug in that patch.  Matthew Martin points out
off-list that STDIN_FILENO isn't necessarily the standard input.  Like
this:

$ cat <&- foo

So, I guess we just double up on the calls to cook_buf() and
raw_cat().  This is closer to how the code was before.  Not thrilled,
but I don't see a simpler way to do it.
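Matthew's point follows from the POSIX rule that open(2) returns the lowest free descriptor: once the shell has closed descriptor 0, the very next open(2) lands on fd 0, so a hardcoded STDIN_FILENO would silently read the named file. A minimal demonstration (hypothetical helper name):

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Simulate `cat <&- foo`: close descriptor 0, then open a file.
 * POSIX guarantees open(2) returns the lowest available descriptor,
 * so the returned fd is 0 -- aliasing STDIN_FILENO.
 */
static int
lowest_fd_after_closing_stdin(const char *path)
{
	close(STDIN_FILENO);
	return open(path, O_RDONLY);
}
```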

Index: cat.c
===
RCS file: /cvs/src/bin/cat/cat.c,v
retrieving revision 1.32
diff -u -p -r1.32 cat.c
--- cat.c   24 Oct 2021 21:24:21 -  1.32
+++ cat.c   1 Feb 2022 23:22:19 -
@@ -50,9 +50,8 @@
 int bflag, eflag, nflag, sflag, tflag, vflag;
 int rval;
 
-void cook_args(char *argv[]);
+void cat_file(const char *);
 void cook_buf(FILE *, const char *);
-void raw_args(char *argv[]);
 void raw_cat(int, const char *);
 
 int
@@ -92,40 +91,54 @@ main(int argc, char *argv[])
return 1;
}
}
+   argc -= optind;
argv += optind;
 
-   if (bflag || eflag || nflag || sflag || tflag || vflag)
-   cook_args(argv);
-   else
-   raw_args(argv);
+   if (argc == 0) {
+   if (pledge("stdio", NULL) == -1)
+   err(1, "pledge");
+
+   cat_file(NULL);
+   } else {
+   for (; *argv != NULL; argv++)
+   cat_file(*argv);
+   }
if (fclose(stdout))
err(1, "stdout");
return rval;
 }
 
 void
-cook_args(char **argv)
+cat_file(const char *path)
 {
FILE *fp;
+   int fd;
 
-   if (*argv == NULL) {
-   cook_buf(stdin, "stdin");
-   return;
-   }
-
-   for (; *argv != NULL; argv++) {
-   if (!strcmp(*argv, "-")) {
+   if (bflag || eflag || nflag || sflag || tflag || vflag) {
+   if (path == NULL || strcmp(path, "-") == 0) {
cook_buf(stdin, "stdin");
clearerr(stdin);
-   continue;
+   } else {
+   if ((fp = fopen(path, "r")) == NULL) {
+   warn("%s", path);
+   rval = 1;
+   return;
+   }
+   cook_buf(fp, path);
+   fclose(fp);
}
-   if ((fp = fopen(*argv, "r")) == NULL) {
-   warn("%s", *argv);
-   rval = 1;
-   continue;
+   } else {
+   if (path == NULL || strcmp(path, "-") == 0) {
+   raw_cat(STDIN_FILENO, "stdin");
+   } else {
+   if ((fd = open(path, O_RDONLY)) == -1) {
+   warn("%s", path);
+   rval = 1;
+   return;
+   }
+   raw_cat(fd, path);
+   close(fd);
}
-   cook_buf(fp, *argv);
-   fclose(fp);
}
 }
 
@@ -191,31 +204,6 @@ cook_buf(FILE *fp, const char *filename)
}
if (ferror(stdout))
err(1, "stdout");
-}
-
-void
-raw_args(char **argv)
-{
-   int fd;
-
-   if (*argv == NULL) {
-   raw_cat(fileno(stdin), "stdin");
-   return;
-   }
-
-   for (; *argv != NULL; argv++) {
-   if (!strcmp(*argv, "-")) {
-   raw_cat(fileno(stdin), "stdin");
-   continue;
-   }
-   if ((fd = open(*argv, O_RDONLY)) == -1) {
-   warn("%s", *argv);
-   rval = 1;
-   continue;
-   }
-   raw_cat(fd, *argv);
-   close(fd);
-   }
 }
 
 void



Re: cat(1): drop "rpath" promise in no-file case

2022-02-01 Thread Scott Cheloha
On Wed, Dec 15, 2021 at 11:51:48AM +, Ricardo Mestre wrote:
> 
> [...]
> 
> the filename parameter on cook_buf is just used to show a warning, but it 
> should
> be "stdin" instead of "(stdin)" to keep the same old behaviour.

Yep, fixed.

> additionally, please add a blank line after the pledge/err lines you added to
> help grepability.

I don't know why this would help, but sure, added a blank line.

> with those changed I have no objections to the diff, check if there's anyone
> else complaining :)

6 week bump.

To recap:

 - Refactor the open/close portions of cook_args() and raw_args() into a
   single function, cat_file().

 - Push the flag-check branch in main() down into cat_file().

 - Pull the argv loop in cat_file() up into main().

Once we've done that, we can then:

 - Drop the "rpath" promise in the no-file case in main().

Any objections?

Index: cat.c
===
RCS file: /cvs/src/bin/cat/cat.c,v
retrieving revision 1.32
diff -u -p -r1.32 cat.c
--- cat.c   24 Oct 2021 21:24:21 -  1.32
+++ cat.c   1 Feb 2022 15:58:25 -
@@ -50,9 +50,8 @@
 int bflag, eflag, nflag, sflag, tflag, vflag;
 int rval;
 
-void cook_args(char *argv[]);
+void cat_file(const char *);
 void cook_buf(FILE *, const char *);
-void raw_args(char *argv[]);
 void raw_cat(int, const char *);
 
 int
@@ -92,40 +91,62 @@ main(int argc, char *argv[])
return 1;
}
}
+   argc -= optind;
argv += optind;
 
-   if (bflag || eflag || nflag || sflag || tflag || vflag)
-   cook_args(argv);
-   else
-   raw_args(argv);
+   if (argc == 0) {
+   if (pledge("stdio", NULL) == -1)
+   err(1, "pledge");
+
+   cat_file(NULL);
+   } else {
+   for (; *argv != NULL; argv++)
+   cat_file(*argv);
+   }
if (fclose(stdout))
err(1, "stdout");
return rval;
 }
 
 void
-cook_args(char **argv)
+cat_file(const char *path)
 {
+   const char *name;
FILE *fp;
+   int fd;
 
-   if (*argv == NULL) {
-   cook_buf(stdin, "stdin");
-   return;
-   }
-
-   for (; *argv != NULL; argv++) {
-   if (!strcmp(*argv, "-")) {
-   cook_buf(stdin, "stdin");
-   clearerr(stdin);
-   continue;
+   if (bflag || eflag || nflag || sflag || tflag || vflag) {
+   if (path == NULL || strcmp(path, "-") == 0) {
+   name = "stdin";
+   fp = stdin;
+   } else {
+   name = path;
+   if ((fp = fopen(name, "r")) == NULL) {
+   warn("%s", name);
+   rval = 1;
+   return;
+   }
}
-   if ((fp = fopen(*argv, "r")) == NULL) {
-   warn("%s", *argv);
-   rval = 1;
-   continue;
+   cook_buf(fp, name);
+   if (fp == stdin)
+   clearerr(stdin);
+   else
+   fclose(fp);
+   } else {
+   if (path == NULL || strcmp(path, "-") == 0) {
+   name = "stdin";
+   fd = STDIN_FILENO;
+   } else {
+   name = path;
+   if ((fd = open(name, O_RDONLY)) == -1) {
+   warn("%s", name);
+   rval = 1;
+   return;
+   }
}
-   cook_buf(fp, *argv);
-   fclose(fp);
+   raw_cat(fd, name);
+   if (fd != STDIN_FILENO)
+   close(fd);
}
 }
 
@@ -191,31 +212,6 @@ cook_buf(FILE *fp, const char *filename)
}
if (ferror(stdout))
err(1, "stdout");
-}
-
-void
-raw_args(char **argv)
-{
-   int fd;
-
-   if (*argv == NULL) {
-   raw_cat(fileno(stdin), "stdin");
-   return;
-   }
-
-   for (; *argv != NULL; argv++) {
-   if (!strcmp(*argv, "-")) {
-   raw_cat(fileno(stdin), "stdin");
-   continue;
-   }
-   if ((fd = open(*argv, O_RDONLY)) == -1) {
-   warn("%s", *argv);
-   rval = 1;
-   continue;
-   }
-   raw_cat(fd, *argv);
-   close(fd);
-   }
 }
 
 void



tr(1): improve table names

2022-01-30 Thread Scott Cheloha
In tr(1), we have these two global arrays, "string1" and "string2".

I have a few complaints:

1. They are not strings.  They are lookup tables.  The names are
   misleading.

2. The arguments given to tr(1) in argv[] are indeed called "string1"
   and "string2".  These are the names used in the standard, the manpage,
   and the usage printout.

   However, the lookup tables are merely *described* by these arguments.
   They are not the arguments themselves.

3. The meaning of the contents of these lookup tables changes depending
   on which of the five different operating modes tr(1) is running in.

   string1[i] might mean "delete byte i" or "squeeze byte i" or
   "replace byte i with the value string1[i]" depending on how
   tr(1) was invoked.

Given this, I think it'd be a lot nicer if we named the tables to
indicate which transformation operation they correspond to.

We have three such operations: "delete", "squeeze", and "translate".
So we ought to have a table for each.  And in setup() we should call
the table a "table", not a "string".

Now when you look at the loops in main() you can immediately
understand which operation is happening without needing to consult the
manpage or the comments.  (Seriously, look.)

I have more cleanup I want to do here in tr.c, but I think renaming
these tables first is going to make the rest of it a lot easier to
review.
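For illustration, here is roughly what the squeeze-and-delete loop reads like once the tables are named for their operation (a simplified string-based sketch, not the tr.c code):

```c
#define NCHARS 256

/* Per-byte operation tables: nonzero means the operation applies. */
static int delete[NCHARS], squeeze[NCHARS];

/*
 * Apply delete and squeeze to the bytes of `in`, writing the result
 * to `out` (NUL-terminated).  The condition now reads directly as
 * "skip deleted bytes, collapse squeezed repeats".
 */
static void
tr_ds(const unsigned char *in, char *out)
{
	int lastch = -1;

	for (; *in != '\0'; in++) {
		if (!delete[*in] && (!squeeze[*in] || lastch != *in)) {
			lastch = *in;
			*out++ = *in;
		}
	}
	*out = '\0';
}
```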

ok?

Index: tr.c
===
RCS file: /cvs/src/usr.bin/tr/tr.c,v
retrieving revision 1.20
diff -u -p -r1.20 tr.c
--- tr.c2 Nov 2021 15:45:52 -   1.20
+++ tr.c30 Jan 2022 16:14:21 -
@@ -40,7 +40,8 @@
 
 #include "extern.h"
 
-static int string1[NCHARS] = {
+int delete[NCHARS], squeeze[NCHARS];
+int translate[NCHARS] = {
0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* ASCII */
0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
@@ -73,7 +74,7 @@ static int string1[NCHARS] = {
0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef,
0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7,
0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff,
-}, string2[NCHARS];
+};
 
 STR s1 = { STRING1, NORMAL, 0, OOBCH, { 0, OOBCH }, NULL, NULL };
 STR s2 = { STRING2, NORMAL, 0, OOBCH, { 0, OOBCH }, NULL, NULL };
@@ -122,11 +123,11 @@ main(int argc, char *argv[])
if (argc != 2)
usage();
 
-   setup(string1, argv[0], &s1, cflag);
-   setup(string2, argv[1], &s2, 0);
+   setup(delete, argv[0], &s1, cflag);
+   setup(squeeze, argv[1], &s2, 0);
 
for (lastch = OOBCH; (ch = getchar()) != EOF;)
-   if (!string1[ch] && (!string2[ch] || lastch != ch)) {
+   if (!delete[ch] && (!squeeze[ch] || lastch != ch)) {
lastch = ch;
(void)putchar(ch);
}
@@ -141,10 +142,10 @@ main(int argc, char *argv[])
if (argc != 1)
usage();
 
-   setup(string1, argv[0], &s1, cflag);
+   setup(delete, argv[0], &s1, cflag);
 
while ((ch = getchar()) != EOF)
-   if (!string1[ch])
+   if (!delete[ch])
(void)putchar(ch);
exit(0);
}
@@ -154,10 +155,10 @@ main(int argc, char *argv[])
 * Squeeze all characters (or complemented characters) in string1.
 */
if (sflag && argc == 1) {
-   setup(string1, argv[0], &s1, cflag);
+   setup(squeeze, argv[0], &s1, cflag);
 
for (lastch = OOBCH; (ch = getchar()) != EOF;)
-   if (!string1[ch] || lastch != ch) {
+   if (!squeeze[ch] || lastch != ch) {
lastch = ch;
(void)putchar(ch);
}
@@ -177,7 +178,7 @@ main(int argc, char *argv[])
s2.str = (unsigned char *)argv[1];
 
if (cflag)
-   for (cnt = NCHARS, p = string1; cnt--;)
+   for (cnt = NCHARS, p = translate; cnt--;)
*p++ = OOBCH;
 
if (!next())
@@ -187,45 +188,45 @@ main(int argc, char *argv[])
ch = s2.lastch;
if (sflag)
while (next()) {
-   string1[s1.lastch] = ch = s2.lastch;
-   string2[ch] = 1;
+   translate[s1.lastch] = ch = s2.lastch;
+   squeeze[ch] = 1;
(void)next();
}
else
while (next()) {
-   string1[s1.lastch] = ch = s2.lastch;
+   translate[s1.lastch] = ch = s2.lastch;
(void)next();
}
 
if (cflag)
-   for (cnt = 0, p = string1; cnt < 

Re: touch(1): don't leak descriptor if futimens(2) fails

2022-01-28 Thread Scott Cheloha
On Fri, Jan 28, 2022 at 07:28:40AM -0700, Todd C. Miller wrote:
> On Thu, 27 Jan 2022 20:02:18 -0800, Philip Guenther wrote:
> 
> > > I think futimens(2) and close(2) failures are exotic enough to warrant
> > > printing the system call name.
> >
> > I don't understand this.  Can you give an example of an error message that
> > touch currently might emit where knowing that the failed call was
> > futimens() or close() would affect the analysis of how to deal with it?  I
> > mean, it looks like the only errors that futimens() could really return are
> > EROFS, EIO, and EPERM (implies a race by different users to create the
> > file), and close() could only return EIO.  For any of those errors, you're
> > going to handle them the same whether they're from open, futimens, or
> > close, no?
> 
> I agree.  The actual syscall in this case is pretty much irrelevant.
> The mostly likely failure is due to an I/O error of some kind.

Alright, you have both convinced me.

We'll go with this patch?

Index: touch.c
===
RCS file: /cvs/src/usr.bin/touch/touch.c,v
retrieving revision 1.26
diff -u -p -r1.26 touch.c
--- touch.c 10 Mar 2019 15:11:52 -  1.26
+++ touch.c 28 Jan 2022 15:35:07 -
@@ -137,9 +137,18 @@ main(int argc, char *argv[])
 
/* Create the file. */
fd = open(*argv, O_WRONLY | O_CREAT, DEFFILEMODE);
-   if (fd == -1 || futimens(fd, ts) || close(fd)) {
+   if (fd == -1) {
rval = 1;
warn("%s", *argv);
+   continue;
+   }
+   if (futimens(fd, ts) == -1) {
+   warn("%s", *argv);
+   rval = 1;
+   }
+   if (close(fd) == -1) {
+   warn("%s", *argv);
+   rval = 1;
}
}
return rval;


