Re: get public key as non-root

2020-07-03 Thread Jason A. Donenfeld
On Fri, Jul 3, 2020 at 11:47 AM Klemens Nanni  wrote:
> Is there any particular reason why an interface's *public* key is only
> shown to the root user in ifconfig?

Yes, there is a reason for this.

The WireGuard protocol has a property called "identity hiding". See
section 3.4 and 4.3.4 lemma 7 of

or section 7.8 of
. The mac1
value also relies on this identity hiding property. In other words,
public keys should not be easily broadcasted and should not be
accessible to unprivileged users.



amd64: lapic: refactor lapic timer programming

2020-07-03 Thread Scott Cheloha
Hi,

I want to run the lapic timer in one-shot mode on amd64 as we do with
other interrupt clocks on other platforms.  I aim to make the clock
interrupt code MD where possible.

However, nobody is going to test my MD clock interrupt work unless
amd64 is ready to use it.  amd64 doesn't run in oneshot mode so there
is preliminary work to do first.

--

Before we can run the lapic timer in one-shot mode we need to simplify
the process of actually programming it.

This patch refactors all lapic timer programming into a single
routine.  We don't use any divisor other than 1 so I don't see a need
to make it a parameter to lapic_timer_arm().  We can add TSC deadline
support later if someone wants it.

The way we program the timer differs from how e.g. Darwin and FreeBSD
and Linux do it.  They write:

 - lvtt (mode + vector + (maybe) mask)
 - dcr
 - icr

while we do:

 - lvtt (mode + mask)
 - dcr
 - icr
 - (maybe) lvtt (mode + vector)

I don't see a reason to arm the timer with four writes instead of
three, so in this patch I use the three-write ordering.

Am I missing something?  Do I need to disable interrupts before I
reprogram the timer?

-Scott

Index: lapic.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/lapic.c,v
retrieving revision 1.55
diff -u -p -r1.55 lapic.c
--- lapic.c 3 Aug 2019 14:57:51 -   1.55
+++ lapic.c 4 Jul 2020 00:40:26 -
@@ -413,6 +413,42 @@ u_int32_t lapic_frac_usec_per_cycle;
 u_int64_t lapic_frac_cycle_per_usec;
 u_int32_t lapic_delaytab[26];
 
+void lapic_timer_arm(uint32_t, int, uint32_t);
+void lapic_timer_arm_once(int, uint32_t);
+void lapic_timer_arm_period(int, uint32_t);
+
+/*
+ * Start the local apic countdown timer.
+ *
+ * First set the mode, vector, and (maybe) the mask.
+ * then set the divisor,
+ * and finally set the cycle count.
+ */
+void
+lapic_timer_arm(uint32_t mode, int masked, uint32_t cycles)
+{
+   uint32_t lvtt;
+
+   lvtt = mode | LAPIC_TIMER_VECTOR;
+   lvtt |= (masked) ? LAPIC_LVTT_M : 0;
+
+   lapic_writereg(LAPIC_LVTT, lvtt);
+   lapic_writereg(LAPIC_DCR_TIMER, LAPIC_DCRT_DIV1);
+   lapic_writereg(LAPIC_ICR_TIMER, cycles);
+}
+
+void
+lapic_timer_arm_once(int masked, uint32_t cycles)
+{
+   lapic_timer_arm(LAPIC_LVTT_TM_ONESHOT, masked, cycles);
+}
+
+void
+lapic_timer_arm_period(int masked, uint32_t cycles)
+{
+   lapic_timer_arm(LAPIC_LVTT_TM_PERIODIC, masked, cycles);
+}
+
 void
 lapic_clockintr(void *arg, struct intrframe frame)
 {
@@ -430,17 +466,7 @@ lapic_clockintr(void *arg, struct intrfr
 void
 lapic_startclock(void)
 {
-   /*
-* Start local apic countdown timer running, in repeated mode.
-*
-* Mask the clock interrupt and set mode,
-* then set divisor,
-* then unmask and set the vector.
-*/
-   lapic_writereg(LAPIC_LVTT, LAPIC_LVTT_TM|LAPIC_LVTT_M);
-   lapic_writereg(LAPIC_DCR_TIMER, LAPIC_DCRT_DIV1);
-   lapic_writereg(LAPIC_ICR_TIMER, lapic_tval);
-   lapic_writereg(LAPIC_LVTT, LAPIC_LVTT_TM|LAPIC_TIMER_VECTOR);
+   lapic_timer_arm_period(0, lapic_tval);
 }
 
 void
@@ -498,9 +524,7 @@ lapic_calibrate_timer(struct cpu_info *c
 * Configure timer to one-shot, interrupt masked,
 * large positive number.
 */
-   lapic_writereg(LAPIC_LVTT, LAPIC_LVTT_M);
-   lapic_writereg(LAPIC_DCR_TIMER, LAPIC_DCRT_DIV1);
-   lapic_writereg(LAPIC_ICR_TIMER, 0x8000);
+   lapic_timer_arm_once(1, 0x8000);
 
s = intr_disable();
 
@@ -540,10 +564,7 @@ skip_calibration:
lapic_tval = (lapic_per_second * 2) / hz;
lapic_tval = (lapic_tval / 2) + (lapic_tval & 0x1);
 
-   lapic_writereg(LAPIC_LVTT, LAPIC_LVTT_TM | LAPIC_LVTT_M |
-   LAPIC_TIMER_VECTOR);
-   lapic_writereg(LAPIC_DCR_TIMER, LAPIC_DCRT_DIV1);
-   lapic_writereg(LAPIC_ICR_TIMER, lapic_tval);
+   lapic_timer_arm_period(0, lapic_tval);
 
/*
 * Compute fixed-point ratios between cycles and



Re: userland clock_gettime proof of concept

2020-07-03 Thread Christian Weisgerber
On 2020-07-03, Scott Cheloha  wrote:

> Are we doing powerpc, arm64, and sparc64 separately?

hppa can de done as well, but somebody with access to a machine
needs to investigate/ensure that the S bit in the processor status
register is appropriately set to allow userland access to the
Interval Timer register.

-- 
Christian "naddy" Weisgerber  na...@mips.inka.de



Re: userland clock_gettime proof of concept

2020-07-03 Thread Theo de Raadt
Mark Kettenis  wrote:

> I did not add the LFENCE to tc_get_timecount() itself.  We probably
> should, but in the past some networking folks complained about slow
> timecounters affecting network performance, so I wanted confirmation
> that this didn't cause problems.

I don't see how a more precise time can cause such a problem.  If
we are worried about any possibility of negative deltas, which might
be put into unsigned variables, we should use lfence so it can't
appear to go backwards..



Re: userland clock_gettime proof of concept

2020-07-03 Thread Paul Irofti



În 3 iulie 2020 20:57:52 EEST, Mark Kettenis  a scris:
>> Date: Fri, 3 Jul 2020 12:42:58 -0500
>> From: Scott Cheloha 
>> 
>> On Fri, Jul 03, 2020 at 02:34:20PM +0300, Paul Irofti wrote:
>> > On 2020-07-03 00:40, Scott Cheloha wrote:
>> > > On Fri, Jun 26, 2020 at 04:53:14PM +0300, Paul Irofti wrote:
>> > > > On Wed, Jun 24, 2020 at 11:53:23AM +0200, Robert Nagy wrote:
>> > > > > On 22/06/20 19:12 +0300, Paul Irofti wrote:
>> > > > > > New iteration:
>> > > > > > 
>> > > > > >- ps_timekeep should not coredump, pointed by deraadt@
>> > > > > >- set ps_timekeep to 0 before user uvm_map for
>randomization
>> > > > > >- map timekeep before fixup. confirmed by naddy@ that it
>fixes NULL init
>> > > > > >- initialize va. clarified by kettenis@
>> > > > > > 
>> > > > > > How's the magical max skew value research going? Do we have
>a value yet?
>> > > > > > 
>> > > > > > Paul
>> > > > > 
>> > > > > I think we should pick 100 for now and then we can adjust it
>later if needed.
>> > > > > 
>> > > > > Of course this depends on kettenis' lfence diff so that amd
>ryzen tsc is sane.
>> > > > 
>> > > > I looked at dmesglog and the reported values are indeed small.
>99 was
>> > > > the highest on an Atom. I updated the diff to 100. I think we
>can adapt
>> > > > this as we get more reports (if ever).
>> > > > 
>> > > > OK?
>> > > 
>> > > One thing...
>> > > 
>> > > > diff --git lib/libc/arch/amd64/gen/usertc.c
>lib/libc/arch/amd64/gen/usertc.c
>> > > > new file mode 100644
>> > > > index 000..56016c8eca1
>> > > > --- /dev/null
>> > > > +++ lib/libc/arch/amd64/gen/usertc.c
>> > > > @@ -0,0 +1,41 @@
>> > > > +/*$OpenBSD$ */
>> > > > +/*
>> > > > + * Copyright (c) 2020 Paul Irofti 
>> > > > + *
>> > > > + * Permission to use, copy, modify, and distribute this
>software for any
>> > > > + * purpose with or without fee is hereby granted, provided
>that the above
>> > > > + * copyright notice and this permission notice appear in all
>copies.
>> > > > + *
>> > > > + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS
>ALL WARRANTIES
>> > > > + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED
>WARRANTIES OF
>> > > > + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR
>BE LIABLE FOR
>> > > > + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR
>ANY DAMAGES
>> > > > + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
>WHETHER IN AN
>> > > > + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
>ARISING OUT OF
>> > > > + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
>SOFTWARE.
>> > > > + */
>> > > > +
>> > > > +#include 
>> > > > +#include 
>> > > > +
>> > > > +static inline u_int
>> > > > +rdtsc(void)
>> > > > +{
>> > > > +  uint32_t hi, lo;
>> > > > +  asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
>> > > > +  return ((uint64_t)lo)|(((uint64_t)hi)<<32);
>> > > > +}
>> > > 
>> > > We need to lfence this.
>> > 
>> > In userland too? Why?
>> 
>> I was under the impression kettenis@ had added an lfence to the
>kernel
>> TSC's tc_get_timecount(), but I was mistaken.
>> 
>> We can deal with that separately.
>
>I just committed a diff that adds the LFENCE everywhere where we are
>measuring a time interval to do some sort of calibration.  I think
>that should get the skew down under 100 cycles on all reasonable
>amd64 machines.

Thank you very much for that, Mark!

>I did not add the LFENCE to tc_get_timecount() itself.  We probably
>should, but in the past some networking folks complained about slow
>timecounters affecting network performance, so I wanted confirmation
>that this didn't cause problems.

Right. I think we should decide that separately from this diff. As it is the 
diff is well tested. 

Paul



Re: userland clock_gettime proof of concept

2020-07-03 Thread Mark Kettenis
> Date: Fri, 3 Jul 2020 12:42:58 -0500
> From: Scott Cheloha 
> 
> On Fri, Jul 03, 2020 at 02:34:20PM +0300, Paul Irofti wrote:
> > On 2020-07-03 00:40, Scott Cheloha wrote:
> > > On Fri, Jun 26, 2020 at 04:53:14PM +0300, Paul Irofti wrote:
> > > > On Wed, Jun 24, 2020 at 11:53:23AM +0200, Robert Nagy wrote:
> > > > > On 22/06/20 19:12 +0300, Paul Irofti wrote:
> > > > > > New iteration:
> > > > > > 
> > > > > >- ps_timekeep should not coredump, pointed by deraadt@
> > > > > >- set ps_timekeep to 0 before user uvm_map for randomization
> > > > > >- map timekeep before fixup. confirmed by naddy@ that it fixes 
> > > > > > NULL init
> > > > > >- initialize va. clarified by kettenis@
> > > > > > 
> > > > > > How's the magical max skew value research going? Do we have a value 
> > > > > > yet?
> > > > > > 
> > > > > > Paul
> > > > > 
> > > > > I think we should pick 100 for now and then we can adjust it later if 
> > > > > needed.
> > > > > 
> > > > > Of course this depends on kettenis' lfence diff so that amd ryzen tsc 
> > > > > is sane.
> > > > 
> > > > I looked at dmesglog and the reported values are indeed small. 99 was
> > > > the highest on an Atom. I updated the diff to 100. I think we can adapt
> > > > this as we get more reports (if ever).
> > > > 
> > > > OK?
> > > 
> > > One thing...
> > > 
> > > > diff --git lib/libc/arch/amd64/gen/usertc.c 
> > > > lib/libc/arch/amd64/gen/usertc.c
> > > > new file mode 100644
> > > > index 000..56016c8eca1
> > > > --- /dev/null
> > > > +++ lib/libc/arch/amd64/gen/usertc.c
> > > > @@ -0,0 +1,41 @@
> > > > +/* $OpenBSD$ */
> > > > +/*
> > > > + * Copyright (c) 2020 Paul Irofti 
> > > > + *
> > > > + * Permission to use, copy, modify, and distribute this software for 
> > > > any
> > > > + * purpose with or without fee is hereby granted, provided that the 
> > > > above
> > > > + * copyright notice and this permission notice appear in all copies.
> > > > + *
> > > > + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL 
> > > > WARRANTIES
> > > > + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> > > > + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE 
> > > > FOR
> > > > + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY 
> > > > DAMAGES
> > > > + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN 
> > > > AN
> > > > + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING 
> > > > OUT OF
> > > > + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> > > > + */
> > > > +
> > > > +#include 
> > > > +#include 
> > > > +
> > > > +static inline u_int
> > > > +rdtsc(void)
> > > > +{
> > > > +   uint32_t hi, lo;
> > > > +   asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
> > > > +   return ((uint64_t)lo)|(((uint64_t)hi)<<32);
> > > > +}
> > > 
> > > We need to lfence this.
> > 
> > In userland too? Why?
> 
> I was under the impression kettenis@ had added an lfence to the kernel
> TSC's tc_get_timecount(), but I was mistaken.
> 
> We can deal with that separately.

I just committed a diff that adds the LFENCE everywhere where we are
measuring a time interval to do some sort of calibration.  I think
that should get the skew down under 100 cycles on all reasonable
amd64 machines.

I did not add the LFENCE to tc_get_timecount() itself.  We probably
should, but in the past some networking folks complained about slow
timecounters affecting network performance, so I wanted confirmation
that this didn't cause problems.



Re: userland clock_gettime proof of concept

2020-07-03 Thread Theo de Raadt
Scott Cheloha  wrote:

> I was under the impression kettenis@ had added an lfence to the kernel
> TSC's tc_get_timecount(), but I was mistaken.
> 
> We can deal with that separately.

the question was asked, but I guess not answered definitively

I think we should *always* lfence, because in the end the values do
get compared against others which are lfenced, and it would be a
shame if delta was ever negative.



wg: get public key as non-root

2020-07-03 Thread Klemens Nanni
Is there any particular reason why an interface's *public* key is only
shown to the root user in ifconfig?

Similar to `wgport', I'd like to see the public key as non-root user as
well for convenience:

$ ifconfig wg0
wg0: flags=80c3 mtu 1420
index 5 priority 0 llprio 3
wgport 1078
wgpubkey mrtNB07tzEJKyJDvhaov7QYt487BXLK3hnnZB+pDIhM=
groups: wg
inet6 ... prefixlen 126

This makes the SIOCGWG ioctl return the public key to unprivileged users
such ifconfig read it.

Feedback? Objections? OK?

diff --git a/sys/net/if_wg.c b/sys/net/if_wg.c
index 3f59681fe..f9a145b37 100644
--- a/sys/net/if_wg.c
+++ b/sys/net/if_wg.c
@@ -2374,12 +2374,14 @@ wg_ioctl_get(struct wg_softc *sc, struct wg_data_io 
*data)
iface_o.i_flags |= WG_INTERFACE_HAS_RTABLE;
}
 
+   if (noise_local_keys(>sc_local, iface_o.i_public, NULL) == 0) {
+   iface_o.i_flags |= WG_INTERFACE_HAS_PUBLIC;
+   }
+
if (!is_suser)
goto copy_out_iface;
 
-   if (noise_local_keys(>sc_local, iface_o.i_public,
-   iface_o.i_private) == 0) {
-   iface_o.i_flags |= WG_INTERFACE_HAS_PUBLIC;
+   if (noise_local_keys(>sc_local, NULL, iface_o.i_private) == 0) {
iface_o.i_flags |= WG_INTERFACE_HAS_PRIVATE;
}
 



Re: userland clock_gettime proof of concept

2020-07-03 Thread Scott Cheloha
On Fri, Jul 03, 2020 at 02:34:20PM +0300, Paul Irofti wrote:
> On 2020-07-03 00:40, Scott Cheloha wrote:
> > On Fri, Jun 26, 2020 at 04:53:14PM +0300, Paul Irofti wrote:
> > > On Wed, Jun 24, 2020 at 11:53:23AM +0200, Robert Nagy wrote:
> > > > On 22/06/20 19:12 +0300, Paul Irofti wrote:
> > > > > New iteration:
> > > > > 
> > > > >- ps_timekeep should not coredump, pointed by deraadt@
> > > > >- set ps_timekeep to 0 before user uvm_map for randomization
> > > > >- map timekeep before fixup. confirmed by naddy@ that it fixes 
> > > > > NULL init
> > > > >- initialize va. clarified by kettenis@
> > > > > 
> > > > > How's the magical max skew value research going? Do we have a value 
> > > > > yet?
> > > > > 
> > > > > Paul
> > > > 
> > > > I think we should pick 100 for now and then we can adjust it later if 
> > > > needed.
> > > > 
> > > > Of course this depends on kettenis' lfence diff so that amd ryzen tsc 
> > > > is sane.
> > > 
> > > I looked at dmesglog and the reported values are indeed small. 99 was
> > > the highest on an Atom. I updated the diff to 100. I think we can adapt
> > > this as we get more reports (if ever).
> > > 
> > > OK?
> > 
> > One thing...
> > 
> > > diff --git lib/libc/arch/amd64/gen/usertc.c 
> > > lib/libc/arch/amd64/gen/usertc.c
> > > new file mode 100644
> > > index 000..56016c8eca1
> > > --- /dev/null
> > > +++ lib/libc/arch/amd64/gen/usertc.c
> > > @@ -0,0 +1,41 @@
> > > +/*   $OpenBSD$ */
> > > +/*
> > > + * Copyright (c) 2020 Paul Irofti 
> > > + *
> > > + * Permission to use, copy, modify, and distribute this software for any
> > > + * purpose with or without fee is hereby granted, provided that the above
> > > + * copyright notice and this permission notice appear in all copies.
> > > + *
> > > + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL 
> > > WARRANTIES
> > > + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> > > + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE 
> > > FOR
> > > + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> > > + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> > > + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT 
> > > OF
> > > + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> > > + */
> > > +
> > > +#include 
> > > +#include 
> > > +
> > > +static inline u_int
> > > +rdtsc(void)
> > > +{
> > > + uint32_t hi, lo;
> > > + asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
> > > + return ((uint64_t)lo)|(((uint64_t)hi)<<32);
> > > +}
> > 
> > We need to lfence this.
> 
> In userland too? Why?

I was under the impression kettenis@ had added an lfence to the kernel
TSC's tc_get_timecount(), but I was mistaken.

We can deal with that separately.

> Anyway. I think this diff should be committed. It is too long since we
> have been dancing around it and these sorts of comments can always be
> addressed in tree.



Re: userland clock_gettime proof of concept

2020-07-03 Thread Paul Irofti



În 3 iulie 2020 18:45:29 EEST, Scott Cheloha  a scris:
>On Fri, Jul 03, 2020 at 06:36:39PM +0300, Paul Irofti wrote:
>> 
>> 
>> ??n 3 iulie 2020 17:55:25 EEST, Mark Kettenis
> a scris:
>> >> Date: Fri, 3 Jul 2020 15:13:22 +0200
>> >> From: Robert Nagy 
>> >> 
>> >> On 02/07/20 00:31 +0100, Stuart Henderson wrote:
>> >> > running on 38 of these, btw.
>> >> 
>> >> been running with this on all my workstations and laptops and on 3
>> >build
>> >> servers as well
>> >
>> >Are the issue that naddy@ saw solved?
>> >
>> >Did anybody do a *proper* test on anything besides amd64? 
>Especially
>> >on architectures where the optimized clock_gettime is *not*
>available?
>> 
>> Yes and yes. 
>
>I don't see any userland drivers for anything but amd64 in your diff.
>
>Are we doing powerpc, arm64, and sparc64 separately?

Search the thread. Others have written them, yes. And kettenis was asking about 
the oppsite: architectures without them.



Re: userland clock_gettime proof of concept

2020-07-03 Thread Christian Weisgerber
On 2020-07-03, Mark Kettenis  wrote:

> Are the issue that naddy@ saw solved?

Yes, they were understood and fixed.

I'll defer to you and Scott regarding the TSC synchronization issues;
aside from that I'm okaying the diff.

-- 
Christian "naddy" Weisgerber  na...@mips.inka.de



Re: userland clock_gettime proof of concept

2020-07-03 Thread Scott Cheloha
On Fri, Jul 03, 2020 at 06:36:39PM +0300, Paul Irofti wrote:
> 
> 
> ??n 3 iulie 2020 17:55:25 EEST, Mark Kettenis  a 
> scris:
> >> Date: Fri, 3 Jul 2020 15:13:22 +0200
> >> From: Robert Nagy 
> >> 
> >> On 02/07/20 00:31 +0100, Stuart Henderson wrote:
> >> > running on 38 of these, btw.
> >> 
> >> been running with this on all my workstations and laptops and on 3
> >build
> >> servers as well
> >
> >Are the issue that naddy@ saw solved?
> >
> >Did anybody do a *proper* test on anything besides amd64?  Especially
> >on architectures where the optimized clock_gettime is *not* available?
> 
> Yes and yes. 

I don't see any userland drivers for anything but amd64 in your diff.

Are we doing powerpc, arm64, and sparc64 separately?



Re: userland clock_gettime proof of concept

2020-07-03 Thread Paul Irofti



În 3 iulie 2020 17:55:25 EEST, Mark Kettenis  a scris:
>> Date: Fri, 3 Jul 2020 15:13:22 +0200
>> From: Robert Nagy 
>> 
>> On 02/07/20 00:31 +0100, Stuart Henderson wrote:
>> > running on 38 of these, btw.
>> 
>> been running with this on all my workstations and laptops and on 3
>build
>> servers as well
>
>Are the issue that naddy@ saw solved?
>
>Did anybody do a *proper* test on anything besides amd64?  Especially
>on architectures where the optimized clock_gettime is *not* available?

Yes and yes. 



Re: userland clock_gettime proof of concept

2020-07-03 Thread Mark Kettenis
> Date: Fri, 3 Jul 2020 15:13:22 +0200
> From: Robert Nagy 
> 
> On 02/07/20 00:31 +0100, Stuart Henderson wrote:
> > running on 38 of these, btw.
> 
> been running with this on all my workstations and laptops and on 3 build
> servers as well

Are the issue that naddy@ saw solved?

Did anybody do a *proper* test on anything besides amd64?  Especially
on architectures where the optimized clock_gettime is *not* available?



Re: pipex(4): kill pipexintr()

2020-07-03 Thread Vitaliy Makkoveev
ping?

> On 1 Jul 2020, at 22:42, Vitaliy Makkoveev  wrote:
> 
> pipex(4) has 2 mbuf queues: `pipexinq' and `pipexoutq'. When mbuf passed
> to pipex it goes to one of these queues and pipexintr() will be
> scheduled to process them. pipexintr() called from `netisr' context.
> 
> It's true for pppac(4) but for pppx(4) only incoming mbufs go to
> `pipexinq. Outgoing mbufs go directly to stack. pppx(4) enabled in
> npppd.conf(5) by default so I guess it's the common case of pipex(4)
> usage.
> 
> The code looks like there is no requirements to this delayed mbufs
> processing, we can pass it directly to stack as we do for pppx(4)
> outgoing traffic.
> 
> Also we have some troubles with pipexintr() as it was described in [1].
> It's protection of `ph_cookie'. We don't this protection this time and
> we can't because we should brake if_get(9) logic.
> 
> Diff below removes pipexintr(). Now all mbufs passed directly without
> enqueueing within pipex(4). We also can destroy sessions safe in all
> cases. We also can use if_get(9) instead using unreferenced pointers to
> `ifnet' within pipex(4). We also avoided context switch while we
> processing mbufs within pipex(4). We decreased latency.
> 
> I'm seeding debian torrents with this diff an all goes well.
> 
> 1. https://marc.info/?t=15930080902=1=2
> 
> Index: lib/libc/sys/sysctl.2
> ===
> RCS file: /cvs/src/lib/libc/sys/sysctl.2,v
> retrieving revision 1.40
> diff -u -p -r1.40 sysctl.2
> --- lib/libc/sys/sysctl.2 17 May 2020 05:48:39 -  1.40
> +++ lib/libc/sys/sysctl.2 1 Jul 2020 19:20:22 -
> @@ -2033,35 +2033,11 @@ The currently defined variable names are
> .Bl -column "Third level name" "integer" "Changeable" -offset indent
> .It Sy "Third level name" Ta Sy "Type" Ta Sy "Changeable"
> .It Dv PIPEXCTL_ENABLE Ta integer Ta yes
> -.It Dv PIPEXCTL_INQ Ta node Ta not applicable
> -.It Dv PIPEXCTL_OUTQ Ta node Ta not applicable
> .El
> .Bl -tag -width "123456"
> .It Dv PIPEXCTL_ENABLE
> If set to 1, enable PIPEX processing.
> The default is 0.
> -.It Dv PIPEXCTL_INQ Pq Va net.pipex.inq
> -Fourth level comprises an array of
> -.Vt struct ifqueue
> -structures containing information about the PIPEX packet input queue.
> -The forth level names for the elements of
> -.Vt struct ifqueue
> -are the same as described in
> -.Li ip.arpq
> -in the
> -.Dv PF_INET
> -section.
> -.It Dv PIPEXCTL_OUTQ Pq Va net.pipex.outq
> -Fourth level comprises an array of
> -.Vt struct ifqueue
> -structures containing information about PIPEX packet output queue.
> -The forth level names for the elements of
> -.Vt struct ifqueue
> -are the same as described in
> -.Li ip.arpq
> -in the
> -.Dv PF_INET
> -section.
> .El
> .El
> .Ss CTL_VFS
> Index: sys/net/if.c
> ===
> RCS file: /cvs/src/sys/net/if.c,v
> retrieving revision 1.611
> diff -u -p -r1.611 if.c
> --- sys/net/if.c  30 Jun 2020 09:31:38 -  1.611
> +++ sys/net/if.c  1 Jul 2020 19:20:27 -
> @@ -1012,13 +1012,6 @@ if_netisr(void *unused)
>   KERNEL_UNLOCK();
>   }
> #endif
> -#ifdef PIPEX
> - if (n & (1 << NETISR_PIPEX)) {
> - KERNEL_LOCK();
> - pipexintr();
> - KERNEL_UNLOCK();
> - }
> -#endif
>   t |= n;
>   }
> 
> Index: sys/net/netisr.h
> ===
> RCS file: /cvs/src/sys/net/netisr.h,v
> retrieving revision 1.51
> diff -u -p -r1.51 netisr.h
> --- sys/net/netisr.h  6 Aug 2019 22:57:54 -   1.51
> +++ sys/net/netisr.h  1 Jul 2020 19:20:27 -
> @@ -48,7 +48,6 @@
> #define   NETISR_IPV6 24  /* same as AF_INET6 */
> #define   NETISR_ISDN 26  /* same as AF_E164 */
> #define   NETISR_PPP  28  /* for PPP processing */
> -#define  NETISR_PIPEX27  /* for pipex processing */
> #define   NETISR_BRIDGE   29  /* for bridge processing */
> #define   NETISR_PPPOE30  /* for pppoe processing */
> #define   NETISR_SWITCH   31  /* for switch dataplane */
> @@ -68,7 +67,6 @@ voidbridgeintr(void);
> void  pppoeintr(void);
> void  switchintr(void);
> void  pfsyncintr(void);
> -void pipexintr(void);
> 
> #define   schednetisr(anisr)  
> \
> do {  \
> Index: sys/net/pipex.c
> ===
> RCS file: /cvs/src/sys/net/pipex.c,v
> retrieving revision 1.117
> diff -u -p -r1.117 pipex.c
> --- sys/net/pipex.c   30 Jun 2020 14:05:13 -  1.117
> +++ sys/net/pipex.c   1 Jul 2020 19:20:28 -
> @@ -97,10 +97,6 @@ struct radix_node_head *pipex_rd_head6 =
> struct timeout pipex_timer_ch; 

Re: fix races in if_clone_create()

2020-07-03 Thread Vitaliy Makkoveev
ping?

> On 1 Jul 2020, at 00:02, Vitaliy Makkoveev  wrote:
> 
> On Tue, Jun 30, 2020 at 03:48:22PM +0300, Vitaliy Makkoveev wrote:
>> On Tue, Jun 30, 2020 at 12:08:03PM +0200, Martin Pieuchot wrote:
>>> On 29/06/20(Mon) 11:59, Vitaliy Makkoveev wrote:
 [...] 
 I reworked tool for reproduce. Now I avoided fork()/exec() route and it
 takes couple of minutes to take panic on 4 cores. Also some screenshots
 attached.
>>> 
>>> Setting kern.pool_debug=2 makes the race reproducible in seconds.
> 
> Unfortunately you will catch splassert() caused by kern/sched_bsd.c:304.
> malloc() will call yield() while we are holding NET_LOCK(). I attached
> screenshot with splassertion to this mail.
> 
>>> 
>>> Could you turn this test into something committable in regress/?  We can
>>> link it to the build once a fix is committed.
>>> 
>> 
>> We have 3 races with cloned interfaces:
>> 1. if_clone_create() vs if_clone_create()
>> 2. if_clone_destroy() vs if_clone_destroy()
>> 3. if_clone_destroy() vs the rest of stack
>> 
>> It makes sences to commit unified test to regress/, so I suggest to wait
>> a little.
> 
> The another solution.
> 
> Diff below introduces per-`ifc' serialization for if_clone_create() and
> if_clone_destroy(). There is no index bitmap anymore.
> 
> Diff fixes the following races:
> 1. if_clone_create() vs if_clone_create()
> 2. if_clone_destroy() vs if_clone_destroy()
> 
> `ifc_create' will go the same lock path for each cloner, and
> `ifc_destroy' will go this path but in reverse order. It seems
> reasonable to allow simultaneous create/destroy for different cloners
> but since different instances of one cloner will block each other it's
> no reason have parallelism here.
> 
> Updated test tool
>  cut begin 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> 
> static struct ifreq ifr;
> 
> static void *clone_create(void *arg)
> {
>   int s;
> 
>   if((s=socket(AF_INET, SOCK_DGRAM, 0))<0)
>   err(1, "socket()");
>   while(1){
>   if(ioctl(s, SIOCIFCREATE, )<0)
>   if(errno==EINVAL)
>   exit(1);
>   }
> 
>   return NULL;
> }
> 
> static void *clone_destroy(void *arg)
> {
>   int s;
> 
>   if((s=socket(AF_INET, SOCK_DGRAM, 0))<0)
>   err(1, "socket()");
>   while(1){
>   if(ioctl(s, SIOCIFDESTROY, )<0)
>   if(errno==EINVAL)
>   exit(1);
>   }
> 
>   return NULL;
> }
> 
> int main(int argc, char *argv[])
> {
>   pthread_t thr;
>   int i;
> 
>   if(argc!=2){
>   fprintf(stderr, "usage: %s ifname\n", getprogname());
>   return 1;
>   }
> 
>   if(getuid()!=0){
>   fprintf(stderr, "should be root\n");
>   return 1;
>   }
> 
>   memset(, 0, sizeof(ifr));
>   strlcpy(ifr.ifr_name, argv[1], sizeof(ifr.ifr_name));
> 
>   for(i=0; i<8*4; ++i){
>   if(pthread_create(, NULL, clone_create, NULL)!=0)
>   errx(1, "pthread_create(clone_create)");
>   if(pthread_create(, NULL, clone_destroy, NULL)!=0)
>   errx(1, "pthread_create(clone_destroy)");
>   }
> 
>   select(0, NULL, NULL, NULL, NULL);
> 
>   return 0;
> }
>  cut end 
> 
> 
> 
> Index: sys/net/if.c
> ===
> RCS file: /cvs/src/sys/net/if.c,v
> retrieving revision 1.611
> diff -u -p -r1.611 if.c
> --- sys/net/if.c  30 Jun 2020 09:31:38 -  1.611
> +++ sys/net/if.c  30 Jun 2020 20:41:50 -
> @@ -155,6 +155,8 @@ int   if_getgrouplist(caddr_t);
> void  if_linkstate(struct ifnet *);
> void  if_linkstate_task(void *);
> 
> +int  if_clone_lock(struct if_clone *);
> +void if_clone_unlock(struct if_clone *);
> int   if_clone_list(struct if_clonereq *);
> struct if_clone   *if_clone_lookup(const char *, int *);
> 
> @@ -1244,27 +1246,35 @@ if_clone_create(const char *name, int rd
> {
>   struct if_clone *ifc;
>   struct ifnet *ifp;
> - int unit, ret;
> + int unit, error;
> 
>   ifc = if_clone_lookup(name, );
>   if (ifc == NULL)
>   return (EINVAL);
> 
> - if (ifunit(name) != NULL)
> - return (EEXIST);
> + error = if_clone_lock(ifc);
> + if (error != 0)
> + return (error);
> +
> + if (ifunit(name) != NULL) {
> + error = (EEXIST);
> + goto unlock;
> + }
> 
> - ret = (*ifc->ifc_create)(ifc, unit);
> + error = (*ifc->ifc_create)(ifc, unit);
> 
> - if (ret != 0 || (ifp = ifunit(name)) == NULL)
> - return (ret);
> + if (error != 0 || (ifp = ifunit(name)) == NULL)
> + goto unlock;
> 
>   NET_LOCK();
>   if_addgroup(ifp, ifc->ifc_name);
>   if (rdomain != 0)
>   

Re: userland clock_gettime proof of concept

2020-07-03 Thread Robert Nagy
On 02/07/20 00:31 +0100, Stuart Henderson wrote:
> running on 38 of these, btw.

been running with this on all my workstations and laptops and on 3 build
servers as well



Re: top: remove handle abstraction, use simpler process list

2020-07-03 Thread Klemens Nanni
On Fri, Jun 26, 2020 at 04:48:53PM +0200, Klemens Nanni wrote:
> The internal handle used to pass process information is a needless
> abstraction, after previously removing an unused member, it now only has
> one member pointing to a pointer to a process struct, i.e. a simple list
> of processes.
> 
> Remove the abstraction layer and (re)use the existing list of
> (pointers to) kinfo_proc structs.  The diff is straight forward and I
> tested it on amd64 and sparc64.
> 
> With this, scrolling does not even deserve its own helper, we can merely
> point at the process to format with an offset in the list now.
Updated diff that uses a `procp' variable in the for loop and does not
rename format_next_process() (separate diff).

Feedback? OK?


Index: commands.c
===
RCS file: /cvs/src/usr.bin/top/commands.c,v
retrieving revision 1.33
diff -u -p -r1.33 commands.c
--- commands.c  8 Oct 2019 07:26:59 -   1.33
+++ commands.c  3 Jul 2020 10:57:15 -
@@ -36,6 +36,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
Index: display.c
===
RCS file: /cvs/src/usr.bin/top/display.c,v
retrieving revision 1.62
diff -u -p -r1.62 display.c
--- display.c   27 Jun 2020 01:09:57 -  1.62
+++ display.c   3 Jul 2020 10:57:15 -
@@ -46,6 +46,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
Index: machine.c
===
RCS file: /cvs/src/usr.bin/top/machine.c,v
retrieving revision 1.106
diff -u -p -r1.106 machine.c
--- machine.c   26 Jun 2020 20:55:55 -  1.106
+++ machine.c   3 Jul 2020 11:28:51 -
@@ -60,12 +60,6 @@ static char  *format_comm(struct kinfo_pr
 static int cmd_matches(struct kinfo_proc *, char *);
 static char**get_proc_args(struct kinfo_proc *);
 
-/* get_process_info passes back a handle.  This is what it looks like: */
-
-struct handle {
-   struct kinfo_proc **next_proc;  /* points to next valid proc pointer */
-};
-
 /* what we consider to be process size: */
 #define PROCSIZE(pp) ((pp)->p_vm_tsize + (pp)->p_vm_dsize + (pp)->p_vm_ssize)
 
@@ -127,7 +121,7 @@ static int  nproc;
 static int onproc = -1;
 static int pref_len;
 static struct kinfo_proc *pbase;
-static struct kinfo_proc **pref;
+struct kinfo_proc **pref;
 
 /* these are for getting the memory statistics */
 static int pageshift;  /* log base 2 of the pagesize */
@@ -308,8 +302,6 @@ get_system_info(struct system_info *si)
si->last_pid = -1;
 }
 
-static struct handle handle;
-
 struct kinfo_proc *
 getprocs(int op, int arg, int *cnt)
 {
@@ -408,7 +400,7 @@ cmd_matches(struct kinfo_proc *proc, cha
return 0;
 }
 
-struct handle *
+void
 get_process_info(struct system_info *si, struct process_select *sel,
 int (*compare) (const void *, const void *))
 {
@@ -484,10 +476,6 @@ get_process_info(struct system_info *si,
/* remember active and total counts */
si->p_total = total_procs;
si->p_active = pref_len = active_procs;
-
-   /* pass back a handle */
-   handle.next_proc = pref;
-   return 
 }
 
 char fmt[MAX_COLS];/* static area where result is built */
@@ -531,24 +519,14 @@ format_comm(struct kinfo_proc *kp)
return (buf);
 }
 
-void
-skip_processes(struct handle *hndl, int n)
-{
-   hndl->next_proc += n;
-}
-
 char *
-format_next_process(struct handle *hndl, const char *(*get_userid)(uid_t, int),
+format_next_process(struct kinfo_proc *pp, const char *(*get_userid)(uid_t, 
int),
 pid_t *pid)
 {
char *p_wait;
-   struct kinfo_proc *pp;
int cputime;
double pct;
char buf[16];
-
-   /* find and remember the next proc structure */
-   pp = *(hndl->next_proc++);
 
cputime = pp->p_rtime_sec + ((pp->p_rtime_usec + 50) / 100);
 
Index: machine.h
===
RCS file: /cvs/src/usr.bin/top/machine.h,v
retrieving revision 1.29
diff -u -p -r1.29 machine.h
--- machine.h   25 Jun 2020 20:38:41 -  1.29
+++ machine.h   3 Jul 2020 11:28:54 -
@@ -87,11 +87,10 @@ extern int  display_init(struct stat
 extern int  machine_init(struct statics *);
 extern char*format_header(char *);
 extern void get_system_info(struct system_info *);
-extern struct handle
-*get_process_info(struct system_info *, struct process_select *,
+extern void
+get_process_info(struct system_info *, struct process_select *,
 int (*) (const void *, const void *));
-extern void skip_processes(struct handle *, int);
-extern char*format_next_process(struct handle *,
+extern char*format_next_process(struct kinfo_proc *,
 const char *(*)(uid_t, int), pid_t *);
 extern uid_tproc_owner(pid_t);
 
Index: top.c
===
RCS 

Re: userland clock_gettime proof of concept

2020-07-03 Thread Paul Irofti

On 2020-07-03 00:40, Scott Cheloha wrote:

On Fri, Jun 26, 2020 at 04:53:14PM +0300, Paul Irofti wrote:

On Wed, Jun 24, 2020 at 11:53:23AM +0200, Robert Nagy wrote:

On 22/06/20 19:12 +0300, Paul Irofti wrote:

New iteration:

   - ps_timekeep should not coredump, pointed by deraadt@
   - set ps_timekeep to 0 before user uvm_map for randomization
   - map timekeep before fixup. confirmed by naddy@ that it fixes NULL init
   - initialize va. clarified by kettenis@

How's the magical max skew value research going? Do we have a value yet?

Paul


I think we should pick 100 for now and then we can adjust it later if needed.

Of course this depends on kettenis' lfence diff so that amd ryzen tsc is sane.


I looked at dmesglog and the reported values are indeed small. 99 was
the highest on an Atom. I updated the diff to 100. I think we can adapt
this as we get more reports (if ever).

OK?


One thing...


diff --git lib/libc/arch/amd64/gen/usertc.c lib/libc/arch/amd64/gen/usertc.c
new file mode 100644
index 000..56016c8eca1
--- /dev/null
+++ lib/libc/arch/amd64/gen/usertc.c
@@ -0,0 +1,41 @@
+/* $OpenBSD$ */
+/*
+ * Copyright (c) 2020 Paul Irofti 
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include 
+#include 
+
+static inline u_int
+rdtsc(void)
+{
+   uint32_t hi, lo;
+   asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
+   return ((uint64_t)lo)|(((uint64_t)hi)<<32);
+}


We need to lfence this.


In userland too? Why? Anyway. I think this diff should be committed. It 
is too long since we have been dancing around it and these sorts of 
comments can always be addressed in tree.


Paul



Re: snmp(1) initial UTF-8 support

2020-07-03 Thread Ingo Schwarze
Hi Martijn,

sorry for the delay, now i finally looked at the function
smi_displayhint_os() from the line "if (MB_CUR_MAX > 1) {" to the
end of the corresponding else clause.  IIUC, what that code is
trying to do is iterate the input buffer "const char *buf" of length
"size_t buflen".

Before starting, i will consider the sizing of rbuf.  It is octetlength
plus strlen("STRING: ") plus 1 byte for the terminating NUL.  Then
optionally, "STRING: " is copied into it, so octetlength plus one
byte for the terminating NUL remains.

However, if the "STRING: " is copied in, j is intialized to
strlen("STRING: ") and then later compared to octetlength which
remains unchanged.  That feels odd.  I would expect that if j starts
at strlen("STRING: "), octetlength should be increased accordingly,
or alternatively, if octetlength remains as it is, j should always
start at 0.  Otherwise, the last eight bytes allocated always seem
to remain unused.  Is the octetlength input parameter supposed to
include strlen("STRING: ") or not?  [1]

What the else clause does (i.e. when the user selected the C locale)
seems to be this:

 * j is unconditionally re-initialized to 0, so if "STRING: "
   was written earlier, that gets overwritten.  Is that
   intentional?  [2]
 * UTF-8 starting bytes followed by another UTF-8 starting byte
   are totally ignored.  Is that intentional?  If feels odd.  [3]
 * UTF-8 starting bytes followed by an ascii byte
   are represented as ".".  That seems intentional.
 * UTF-8 starting bytes followed by at least one UTF-8 cont byte
   are represented by "?".  Up to FOUR cont bytes are totally
   ignored, which seems quite odd because an UTF-8 sequence
   can have at most THREE cont bytes (four bytes total including
   the starting byte).  [4]  From the FIFTH following cont byte
   onwards, "." is written for each cont byte.
   Note that this is *not* trying to check whether the UTF-8 sequence
   is valid (which wouldn't be trivial at all), but probably that
   isn't required.  It merely tries to check whether the sequence
   is longer than the absolute maximum (and seems to allow one byte
   too much unless i misread the code).
 * A sequence "start + cont + ASCII + cont" prints "?a",
   then totally ignores the second cont byte as if it were
   a continuation of the initial start byte, despite the
   intervening ASCII byte.  [5]
   The state machine doesn't appear to be fully thought out.
   Did you prepare a list for yourself containing all the
   possible states and all the edges of the machine?
 * An initial cont byte, or one after start + ASCII, or one after
   start + at least 4 cont + ASCII is represented by ".";
   that seems intentional.
 * In the for loop in the else clause, the "i < octetlength" feels
   confusing to me.  It cannot cause an overflow because in the
   else clause, we always have j < i (caveat, this might only be
   due to the possibly unintentional re-initializing of j to 0).
   But assume an input buf containing lots of UTF-8 characters.  In
   that case, you abort processing long before the output rbuf is
   full.  Is that intentional?  Either way, wouldn't "j < octetlength"
   be both clearer and safer?  [6]
 * The input "UTF-8 start + ASCII, end of input" is represented
   as "x." rather than ".x".  [7]

So the idea probably is to print

 * "." for clearly invalid bytes including non-printable ASCII bytes
 * "?" for UTF-8 start bytes including following cont bytes that
   _might_ form a valid UTF-8 sequence, accepting invalid sequences
   that would be non-trivial to identify.

But it doesn't quite do that.

For the "if (MB_CUR_MAX > 1)" clause (i.e., when the user selected
a UTF-8 lcale), it seems to do this:

 * Embedded NUL bytes terminate processing, returning what was
   decoded so far.  Note that differs from what the ASCII clause
   does, which represents NUL as "." and continues processing.
   I don't think it can be intentional that processing stops
   at different places depending on the user's locale.  [8]
 * The "case -1" clause obviously lacks a call
   (void)mbtowc(NULL, NULL, MB_CUR_MAX);
   see the mbtowc(3) manual page,
   /usr/src/usr.bin/fmt/fmt.c for simple examples, or
   /usr/src/usr.bin/ssh/utf8.c for a full-blown example.  [9]
 * In contrast to the ASCII clause, the UTF-8 clause does detect
   all invalid UTF-8 sequences and writes a replacement character
   for each invalid byte (rather than a replacement character
   for groups of invalid bytes as the ASCII clause does).
   I think this difference in behaviour makes sense because
   the UTF-8 clause can easily detect such conditions while
   the ASCII clause should better not try.
 * For non-printable UTF-8 characters, it prints ".".
   This feels inconsistent.  We have:

   v-- input ASCII   UTF-8  <-  locale
   invalid byte  "." REPLACEMENT
   non-printable char"." "."
   UTF-8 printable char  "?" itself
   ASCII printable char  itself  itself

   So judging from 

Re: adjfreq(2): limit adjustment to prevent overflow during tc_windup()

2020-07-03 Thread Otto Moerbeek
On Thu, Jul 02, 2020 at 08:27:58PM -0500, Scott Cheloha wrote:

> Hi,
> 
> When we recompute the scaling factor during tc_windup() there is an
> opportunity for arithmetic overflow/underflow when we add the NTP
> adjustment into the scale:
> 
>649  scale = (u_int64_t)1 << 63;
>650  scale += \
>651  ((th->th_adjustment + th->th_counter->tc_freq_adj) / 
> 1024) * 2199;
>652  scale /= th->th_counter->tc_frequency;
>653  th->th_scale = scale * 2;
> 
> At lines 650 and 651, you will overflow/underflow if
> th->th_counter->tc_freq_adj is sufficiently positive/negative.
> 
> I don't like the idea of checking for that overflow during
> tc_windup().  We can pick a reasonable adjustment range and check for
> it during adjfreq(2) and that should be good enough.
> 
> My strawman proposal is a range of -5 to 5 parts per
> billion.  We could push the limits a bit, but half a billion seems
> like a nice round number to me.
> 
> On a perfect clock, this means you can effect a 0.5x slowdown or a
> 1.5x speedup via adjfreq(2), but no slower/faster.
> 
> I don't *think* ntpd(8) would ever reach such extreme adjustments
> through its algorithm.  I don't think this will break anyone's working
> setup.
> 
> (Maybe I'm wrong, though.  otto@?)

Right, ntpd is pretty conversative and won't do big adjustments.

-Otto

> 
> Just so we're all clear that the math is sound, here's the result at
> the upper limit of the input range.  Note that adjtime(2) is capped at
> 5000PPM in ntp_update_second(), hence its value here.
> 
>   int64_t th_adjustment = (5000 * 1000) << 32;/* 2147483648000 */
>   int64_t tc_freq_adj = 5LL << 32;/* 21474836480 
> */
>   
> 
>   scale = (u_int64_t)1 << 63  /* 9223372036854775808 
> */
>   scale += (th_adjustment + tc_freq_adj) / 1024 * 2199;
>   /*+= (216895848448000) / 1024 * 2199; */
>   /*+= 465775362048000; */
> 
> 9223372036854775808 + 465775362048000 = 13881125657334775808,
> which less than 18446744073709551616, so we don't have overflow.
> 
> At the negative end of the input range, i.e.
> 
>   int64_t th_adjustment = (-5000 * 1000) << 32;
>   int64_t tc_freq_adj = -5LL << 32;
> 
> you have 9223372036854775808 - 465775362048000 = 4565618416374775808,
> so no underflow either.
> 
> Thoughts?
> 
> What is the best way to express this range in the documentation?  Do I
> say "parts per billion", or something else?
> 
> Index: sys/kern/kern_time.c
> ===
> RCS file: /cvs/src/sys/kern/kern_time.c,v
> retrieving revision 1.131
> diff -u -p -r1.131 kern_time.c
> --- sys/kern/kern_time.c  22 Jun 2020 18:25:57 -  1.131
> +++ sys/kern/kern_time.c  3 Jul 2020 00:57:49 -
> @@ -391,6 +391,9 @@ sys_settimeofday(struct proc *p, void *v
>   return (0);
>  }
>  
> +#define ADJFREQ_MAX (5LL << 32)
> +#define ADJFREQ_MIN (-5LL << 32)
> +
>  int
>  sys_adjfreq(struct proc *p, void *v, register_t *retval)
>  {
> @@ -408,6 +411,8 @@ sys_adjfreq(struct proc *p, void *v, reg
>   return (error);
>   if ((error = copyin(freq, , sizeof(f
>   return (error);
> + if (f < ADJFREQ_MIN || f > ADJFREQ_MAX)
> + return (EINVAL);
>   }
>  
>   rw_enter(_lock, (freq == NULL) ? RW_READ : RW_WRITE);
> Index: lib/libc/sys/adjfreq.2
> ===
> RCS file: /cvs/src/lib/libc/sys/adjfreq.2,v
> retrieving revision 1.7
> diff -u -p -r1.7 adjfreq.2
> --- lib/libc/sys/adjfreq.210 Sep 2015 17:55:21 -  1.7
> +++ lib/libc/sys/adjfreq.23 Jul 2020 00:57:49 -
> @@ -60,6 +60,10 @@ The
>  .Fa freq
>  argument is non-null and the process's effective user ID is not that
>  of the superuser.
> +.It Bq Er EINVAL
> +.Fa freq
> +is less than -5 parts-per-billion or greater than 5
> +parts-per-billion.
>  .El
>  .Sh SEE ALSO
>  .Xr date 1 ,