Re: userland clock_gettime proof of concept

2020-05-30 Thread Theo de Raadt
> PowerPC Mac OS X had a userland gettimeofday(2) using the cpu's
> timebase and a "common page" from the kernel.  Their common page also
> had executable code for gettimeofday, memcpy, pthread_self, and a few
> other functions.

We are desperately avoiding the model where such code is exported.
It becomes a target.



Re: userland clock_gettime proof of concept

2020-05-30 Thread George Koehler
On Sat, 30 May 2020 19:21:30 +0300
Paul Irofti  wrote:

> Here is an updated diff with no libc bump.  Please use this one for
> further testing.

Your diff does amd64.
Here is a diff to add macppc.  Apply after your diff.

I have only tested clock_gettime(2) with CLOCK_REALTIME,
by doing loops in Ruby like, $ ruby27 -e '1.times{p Time.now}'
The time increased steadily, and ktrace showed only a few system calls
to clock_gettime(2).

I copied ppc_mftb() from /sys/arch/powerpc/powerpc/cpu.h and renamed
it to tc_get_timecount_md(), because #include  doesn't
provide ppc_mftb() if not _KERNEL.  It would be better to edit the
kernel headers to give ppc_mftb() if _LIBC, but I haven't done that.

I changed __amd64 to __amd64__ because I didn't find __powerpc.  I'm
not sure, but one might move the list of arches to dlfcn/Makefile.inc
and do -DTIMEKEEP, like how thread/Makefile.inc does -DFUTEX.  One
might drop the tc_get_timecount function pointer and just always call
the function #ifdef TIMEKEEP.

PowerPC Mac OS X had a userland gettimeofday(2) using the cpu's
timebase and a "common page" from the kernel.  Their common page also
had executable code for gettimeofday, memcpy, pthread_self, and a few
other functions.  --George

Index: lib/libc/arch/powerpc/gen/Makefile.inc
===
RCS file: /cvs/src/lib/libc/arch/powerpc/gen/Makefile.inc,v
retrieving revision 1.14
diff -u -p -r1.14 Makefile.inc
--- lib/libc/arch/powerpc/gen/Makefile.inc  12 Apr 2012 16:14:09 -  
1.14
+++ lib/libc/arch/powerpc/gen/Makefile.inc  31 May 2020 03:20:58 -
@@ -3,3 +3,4 @@ SRCS+= fabs.c
 SRCS+= fpgetmask.c fpsetmask.c
 SRCS+= fpgetround.c fpsetround.c
 SRCS+= fpgetsticky.c fpsetsticky.c
+SRCS+= usertc.c
--- /dev/null   Sat May 30 23:21:20 2020
+++ lib/libc/arch/powerpc/gen/usertc.c  Sat May 30 19:37:59 2020
@@ -0,0 +1,44 @@
+/* $OpenBSD$ */
+/*
+ * Copyright (C) 1995, 1996 Wolfgang Solfrank.
+ * Copyright (C) 1995, 1996 TooLs GmbH.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in the
+ *documentation and/or other materials provided with the distribution.
+ * 3. All advertising materials mentioning features or use of this software
+ *must display the following acknowledgement:
+ * This product includes software developed by TooLs GmbH.
+ * 4. The name of TooLs GmbH may not be used to endorse or promote products
+ *derived from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY TOOLS GMBH ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL TOOLS GMBH BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
+ * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
+ * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+
+uint64_t
+tc_get_timecount_md(void)
+{
+   u_long scratch;
+   u_int64_t tb;
+
+   __asm volatile ("1: mftbu %0; mftb %L0; mftbu %1;"
+   " cmpw 0,%0,%1; bne 1b" : "=r"(tb), "=r"(scratch));
+   return tb;
+}
--- lib/libc/dlfcn/init.c.beforeSat May 30 23:26:35 2020
+++ lib/libc/dlfcn/init.c   Sat May 30 18:00:45 2020
@@ -70,7 +70,7 @@
 
 /* provide definitions for these */
 const dl_cb *_dl_cb __relro = NULL;
-#if defined(__amd64)
+#if defined(__amd64__) || defined(__powerpc__)
 uint64_t (*const tc_get_timecount)(void) = tc_get_timecount_md;
 #else
 uint64_t (*const tc_get_timecount)(void) = NULL;
--- sys/arch/macppc/macppc/clock.c.before   Sat May 30 23:28:00 2020
+++ sys/arch/macppc/macppc/clock.c  Sat May 30 20:35:47 2020
@@ -57,7 +57,7 @@
 static int32_t ticks_per_intr;
 
 static struct timecounter tb_timecounter = {
-   tb_get_timecount, NULL, 0x7fff, 0, "tb", 0, NULL, 0
+   tb_get_timecount, NULL, 0x7fff, 0, "tb", 0, NULL, 1
 };
 
 /* calibrate the timecounter frequency for the listed models */



Re: fix pppx(4) with net/ifq.c rev 1.38

2020-05-30 Thread David Gwynne



> On 30 May 2020, at 9:43 pm, Vitaliy Makkoveev  
> wrote:
> 
> 
>> On 30 May 2020, at 09:40, David Gwynne  wrote:
>> 
>> On Mon, May 25, 2020 at 09:44:22AM +0200, Martin Pieuchot wrote:
>>> On 23/05/20(Sat) 15:38, Vitaliy Makkoveev wrote:
> On 23 May 2020, at 12:54, Martin Pieuchot  wrote:
> On 22/05/20(Fri) 13:25, Vitaliy Makkoveev wrote:
>> On Fri, May 22, 2020 at 07:57:13AM +1000, David Gwynne wrote:
>>> [...] 
>>> can you try the following diff?
>>> 
>> 
>> I tested this diff and it works for me. But the problem I pointed is
>> about pipex(4) locking.
>> 
>> pipex(4) requires NET_LOCK() be grabbed not only for underlaying
>> ip{,6}_output() but for itself too. But since pppac_start() has
>> unpredictable behavior I suggested to make it predictable [1].
> 
> What needs the NET_LOCK() in their?  We're talking about
> pipex_ppp_output(), right?  Does it really need the NET_LOCK() or
> the KERNEL_LOCK() is what protects those data structures?
 
 Yes, about pipex_ppp_output() and pipex_output(). Except
 ip{,6}_output() nothing requires NET_LOCK(). As David Gwynne pointed,
 they can be replaced by ip{,6}_send().
>>> 
>>> Locks protect data structures, you're talking about functions, which
>>> data structures are serialized by this lock?  I'm questioning whether
>>> there is one.
>>> 
 [...]
> In case of pipex(4) is isn't clear that the NET_LOCK() is necessary.
 
 I guess, pipex(4) was wrapped by NET_LOCK() to protect it while it???s
 accessed through `pr_input???. Is NET_LOCK() required for this case?
>>> 
>>> pipex(4) like all the network stack has been wrapped in the NET_LOCK()
>>> because it was easy to do.  That means it isn't a concious decision or
>>> design.  The fact that pipex(4) code runs under the NET_LOCK() is a side
>>> effect of how the rest of the stack evolved.  I'm questioning whether
>>> this lock is required there.  In theory it shouldn't.  What is the
>>> reality?
>> 
>> pipex and pppx pre-date the NET_LOCK, which means you can assume
>> that any implicit locking was and is done by the KERNEL_LOCK. mpi is
>> asking the right questions here.
>> 
>> As for the ifq maxlen difference between pppx and pppac, that's more
>> about when and how quickly they were written more than anything else.
>> The IFQ_SET_MAXLEN(>if_snd, 1) in pppx is because that's a way to
>> bypass transmit mitigation for pseudo/virtual interfaces. That was the
>> only way to do it historically. It is not an elegant hack to keep
>> hold of the NET_LOCK over a call to a start routine.
>> 
>> As a rule of thumb, network interface drivers should not (maybe
>> cannot) rely on the NET_LOCK in their if_start handlers. To be
>> clear, they should not rely on it being held by the network stack
>> when if_start is called because sometimes the stack calls it without
>> holding NET_LOCK, and they should not take it because they might
>> be called by the stack when it is being held.
>> 
>> Also, be aware that the ifq machinery makes sure that the start
>> routine is not called concurrently or recursively. You can queue
>> packets for transmission on an ifq from anywhere in the kernel at
>> any time, but only one cpu will run the start routine. Other cpus
>> can queue packets while another one is running if_start, but the
>> first one ends up responsible for trying to transmit it.
>> 
>> ifqs also take the KERNEL_LOCK before calling if_start if the interface
>> is not marked as IFXF_MPSAFE.
>> 
>> The summary is that pppx and pppac are not marked as mpsafe so their
>> start routines are called with KERNEL_LOCK held. Currently pppx
>> accidentally gets NET_LOCK because of the IFQ_SET_MAXLEN, but shouldn't
>> rely on it.
>> 
>> Cheers,
>> dlg
>> 
> 
> Thanks for explanation.
> Will you commit diff you posted in this thread?

Yes, I'm doing that now.

Thanks for testing it btw.

dlg


Re: sparc64 boot issue on qemu

2020-05-30 Thread Patrick Wildt
On Sat, May 30, 2020 at 07:21:15PM +, Miod Vallat wrote:
> Yet another case where the emulator does not match the real hardware.
> 
> Why bother with them?
> 
> Get qemu to fix their shit so that the frame buffer metrics variable are
> aligned on 64-bit boundaries. There might not be a written specification
> for this requirement, but that's the way real hardware behaves, and it
> makes complete sense (the variables are OFW cells, which are 64-bit
> values and 64-bit aligned).

I'm not sure if sparc's OFW is different, but in the device trees as
used on arm and probably mips as well, 64-bit values are represented
using two 32-bit cells.  So I think a requirement of 64-bit would not
be correct anyway, and it should be 32-bit instead.

I saw some mailthread on the U-Boot lists regarding some alignment
requirements of some payloads.  Even though they confused the diff-
erence between "alignment of payload that will be put somewhere"
and "alignment of where the payload actually ends up with", it seems
like they also only require 32-bit alignment.  The tools that create
the image with the payloads, which they discussed, also makes sure
all payloads are aligned to 32-bit.



Re: sparc64 boot issue on qemu

2020-05-30 Thread Miod Vallat
Yet another case where the emulator does not match the real hardware.

Why bother with them?

Get qemu to fix their shit so that the frame buffer metrics variable are
aligned on 64-bit boundaries. There might not be a written specification
for this requirement, but that's the way real hardware behaves, and it
makes complete sense (the variables are OFW cells, which are 64-bit
values and 64-bit aligned).

If you really want to accomodate their broken firmware, you can avoid
using memcpy by relying upon sparc64 being big-endian and do something
like:

-   *fontwidth = (int)*(uint64_t *)romwidth;
-   *fontheight = (int)*(uint64_t *)romheight;
-   *wtop = (int)*(uint64_t *)windowtop;
-   *wleft = (int)*(uint64_t *)windowleft;
+   *fontwidth = *(uint32_t *)(romwidth + 4);
+   *fontheight = *(uint32_t *)(romheight + 4);
+   *wtop = *(uint32_t *)(windowtop + 4);
+   *wleft = *(uint32_t *)(windowleft + 4);

Miod



Re: umstc: Microsoft Surface Type Cover driver

2020-05-30 Thread Mark Kettenis
> Date: Sat, 30 May 2020 12:38:12 -0500
> From: joshua stein 
> 
> This makes the volume and screen brightness keys work, but more 
> importantly it keeps the USB data/interrupt pipes open at all times 
> because otherwise the Type Cover resets itself (or at least detaches 
> and reattaches) when these buttons are pressed.

I have no real objection to this going in.  I do wonder though whether
it makes sense to add a more general driver that attaches to all USB
HID devices that provide a HUP_CONSUMER, HUC_CONTROL collection.  And
attaching it as a wskbd(4) isn't such as strange idea.


> diff --git share/man/man4/Makefile share/man/man4/Makefile
> index 258399f2e7a..3c15cc14285 100644
> --- share/man/man4/Makefile
> +++ share/man/man4/Makefile
> @@ -84,8 +84,8 @@ MAN=aac.4 abcrtc.4 ac97.4 acphy.4 acrtc.4 \
>   uftdi.4 ugen.4 ugl.4 ugold.4 uguru.4 uhci.4 uhid.4 uhidev.4 uipaq.4 \
>   uk.4 ukbd.4 \
>   ukphy.4 ulpt.4 umass.4 umb.4 umbg.4 umcs.4 umct.4 umidi.4 umodem.4 \
> - ums.4 umsm.4 umt.4 unix.4 uonerng.4 uow.4 uoaklux.4 uoakrh.4 uoakv.4 \
> - upd.4 upgt.4 upl.4 uplcom.4 ural.4 ure.4 url.4 urlphy.4 \
> + ums.4 umsm.4 umstc.4 umt.4 unix.4 uonerng.4 uow.4 uoaklux.4 uoakrh.4 \
> + uoakv.4 upd.4 upgt.4 upl.4 uplcom.4 ural.4 ure.4 url.4 urlphy.4 \
>   urndis.4 urng.4 urtw.4 urtwn.4 usb.4 uscom.4 uslcom.4 usps.4 \
>   uthum.4 uticom.4 utpms.4 utwitch.4 utrh.4 uts.4 utvfu.4 uvideo.4 \
>   uvisor.4 uvscom.4 uwacom.4 uxrcom.4 \
> diff --git share/man/man4/umstc.4 share/man/man4/umstc.4
> new file mode 100644
> index 000..da557c59d1d
> --- /dev/null
> +++ share/man/man4/umstc.4
> @@ -0,0 +1,44 @@
> +.\"  $OpenBSD$
> +.\"
> +.\" Copyright (c) 2020 joshua stein 
> +.\"
> +.\" Permission to use, copy, modify, and distribute this software for any
> +.\" purpose with or without fee is hereby granted, provided that the above
> +.\" copyright notice and this permission notice appear in all copies.
> +.\"
> +.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> +.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> +.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> +.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> +.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> +.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> +.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> +.\"
> +.Dd $Mdocdate$
> +.Dt UMSTC 4
> +.Os
> +.Sh NAME
> +.Nm umstc
> +.Nd Microsoft Surface Type Cover driver
> +.Sh SYNOPSIS
> +.Cd "umstc* at uhidev?"
> +.Sh DESCRIPTION
> +The
> +.Nm
> +driver responds to some special keys on the Microsoft Type Cover
> +keyboard, such as volume and screen brightness keys.
> +.Sh SEE ALSO
> +.Xr uhidev 4 ,
> +.Xr usb 4 ,
> +.Xr sysctl 8
> +.Sh HISTORY
> +The
> +.Nm
> +driver first appeared in
> +.Ox 6.8 .
> +.Sh AUTHORS
> +.An -nosplit
> +The
> +.Nm
> +driver was written by
> +.An joshua stein Aq Mt j...@jcs.org .
> diff --git sys/arch/amd64/conf/GENERIC sys/arch/amd64/conf/GENERIC
> index b77a3b4ead5..624f73bb7cc 100644
> --- sys/arch/amd64/conf/GENERIC
> +++ sys/arch/amd64/conf/GENERIC
> @@ -284,6 +284,7 @@ ucom* at uslhcom?
>  uhid*at uhidev?  # USB generic HID support
>  fido*at uhidev?  # FIDO/U2F security key support
>  upd* at uhidev?  # USB Power Devices sensors
> +umstc*   at uhidev?  # Microsoft Surface Type Cover
>  aue* at uhub?# ADMtek AN986 Pegasus Ethernet
>  atu* at uhub?# Atmel AT76c50x based 802.11b
>  axe* at uhub?# ASIX Electronics AX88172 USB Ethernet
> diff --git sys/dev/hid/hid.h sys/dev/hid/hid.h
> index 17de4065e0c..c528ab9551b 100644
> --- sys/dev/hid/hid.h
> +++ sys/dev/hid/hid.h
> @@ -396,6 +396,11 @@ int  hid_is_collection(const void *, int, uint8_t, 
> int32_t);
>  #define HUL_KANA 0x0005
>  
>  /* Usages, Consumer */
> +#define HUC_CONTROL  0x0001
> +#define HUC_PLAY_PAUSE   0x00cd
> +#define HUC_MUTE 0x00e2
> +#define HUC_VOL_INC  0x00e9
> +#define HUC_VOL_DEC  0x00ea
>  #define HUC_AC_PAN   0x0238
>  
>  /* Usages, FIDO */
> diff --git sys/dev/usb/files.usb sys/dev/usb/files.usb
> index c3597c54775..fc5ff46ce76 100644
> --- sys/dev/usb/files.usb
> +++ sys/dev/usb/files.usb
> @@ -473,3 +473,8 @@ file  dev/usb/uwacom.cuwacom
>  
>  attach   bwfm at uhub with bwfm_usb: firmload
>  file dev/usb/if_bwfm_usb.c   bwfm_usb
> +
> +# Microsoft Surface Type Cover
> +device   umstc: hid
> +attach   umstc at uhidbus
> +file dev/usb/umstc.c umstc
> diff --git sys/dev/usb/umstc.c sys/dev/usb/umstc.c
> new file mode 100644
> index 000..df995181bf6
> --- /dev/null
> +++ sys/dev/usb/umstc.c
> @@ -0,0 +1,172 @@
> +/*   $OpenBSD$ */
> +

Re: official ports vs DEBUG_PACKAGES

2020-05-30 Thread Marc Espie
On Sat, May 30, 2020 at 05:10:49PM +0200, Jeremie Courreges-Anglas wrote:
> On Sat, May 30 2020, Marc Espie  wrote:
> >> - have some magic I don't know in ELF handling that would allow to either
> >> tweak the default location/introduce ${WRKOBJDIR} in that debug info.
> >
> > Thinking some more about it, that 3rd option is possibly not that
> > far-fetched
> >
> > we do pass most debug-info thru dwz  for shrinkage, so there's one tool
> > that does peek inside dwarf debug and does compaction, I have zero idea how
> > hard it is to tweak paths as well.
> 
> Take a look at -fdebug-prefix-map.
> 
>   https://reproducible-builds.org/docs/build-path/

Nice, thanks :)

So just adding this to CFLAGS/CXXFLAGS whenever DEBUG_PACKAGES is set
should be enough



Re: acpihid: INT33D5 driver

2020-05-30 Thread Mark Kettenis
> Date: Sat, 30 May 2020 12:26:00 -0500
> From: joshua stein 
> 
> A driver for tablet hardware buttons which seems to be a 
> standardized interface.  Used on the Surface Go.
> 
> When pressing the power button, the driver receives a button down 
> and then a button up event, unlike acpibtn.  Since there may be an 
> opportunity to do something different based on the duration of the 
> button press, it takes over shutdown/suspend from acpibtn(4).

That last bit is a bit surprising.

Are you saying that this machine has an acpibtn(4) as well?  And that
it works if this driver doesn't attach?

In any case, a dmesg + acpidump for this machine would be appreciated
since that would allow me to make a little bit more sense out of this.

> diff --git share/man/man4/Makefile share/man/man4/Makefile
> index 2d9a57d3814..258399f2e7a 100644
> --- share/man/man4/Makefile
> +++ share/man/man4/Makefile
> @@ -3,6 +3,7 @@
>  MAN= aac.4 abcrtc.4 ac97.4 acphy.4 acrtc.4 \
>   acpi.4 acpiac.4 acpials.4 acpiasus.4 acpibat.4 \
>   acpibtn.4 acpicbkbd.4 acpicpu.4 acpidock.4 acpihve.4 acpiec.4 \
> + acpihid.4 \
>   acpihpet.4 acpimadt.4 acpimcfg.4 acpipci.4 acpiprt.4 acpipwrres.4 \
>   acpisbs.4 acpisony.4 acpisurface.4 acpithinkpad.4 acpitoshiba.4 \
>   acpitimer.4 acpivideo.4 acpivout.4 acpitz.4 \
> diff --git share/man/man4/acpihid.4 share/man/man4/acpihid.4
> new file mode 100644
> index 000..8d54da962d1
> --- /dev/null
> +++ share/man/man4/acpihid.4
> @@ -0,0 +1,47 @@
> +.\"  $OpenBSD$
> +.\"
> +.\" Copyright (c) 2020 joshua stein 
> +.\"
> +.\" Permission to use, copy, modify, and distribute this software for any
> +.\" purpose with or without fee is hereby granted, provided that the above
> +.\" copyright notice and this permission notice appear in all copies.
> +.\"
> +.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> +.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> +.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> +.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> +.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> +.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> +.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> +.\"
> +.Dd $Mdocdate$
> +.Dt ACPIHID 4
> +.Os
> +.Sh NAME
> +.Nm acpihid
> +.Nd ACPI HID event and 5-button array driver
> +.Sh SYNOPSIS
> +.Cd "acpihid* at acpi?"
> +.Sh DESCRIPTION
> +The
> +.Nm
> +driver supports ACPI HID events and 5-button array devices found on some
> +tablet devices.
> +Power button events are processed according to the
> +.Va machdep.pwraction
> +sysctl value.
> +.Sh SEE ALSO
> +.Xr acpi 4 ,
> +.Xr acpibtn 4 ,
> +.Xr sysctl 8
> +.Sh HISTORY
> +The
> +.Nm
> +driver first appeared in
> +.Ox 6.8 .
> +.Sh AUTHORS
> +.An -nosplit
> +The
> +.Nm
> +driver was written by
> +.An joshua stein Aq Mt j...@jcs.org .
> diff --git sys/arch/amd64/conf/GENERIC sys/arch/amd64/conf/GENERIC
> index 87893bfafea..b77a3b4ead5 100644
> --- sys/arch/amd64/conf/GENERIC
> +++ sys/arch/amd64/conf/GENERIC
> @@ -71,6 +71,7 @@ acpials*at acpi?
>  tpm* at acpi?
>  acpihve* at acpi?
>  acpisurface* at acpi?
> +acpihid* at acpi?
>  ipmi0at acpi? disable
>  ccpmic*  at iic?
>  tipmic*  at iic?
> diff --git sys/dev/acpi/acpi.c sys/dev/acpi/acpi.c
> index 7c103347980..306419761c7 100644
> --- sys/dev/acpi/acpi.c
> +++ sys/dev/acpi/acpi.c
> @@ -72,6 +72,7 @@ int acpi_debug = 16;
>  int  acpi_poll_enabled;
>  int  acpi_hasprocfvs;
>  int  acpi_haspci;
> +int  acpi_hashidpower;
>  
>  #define ACPIEN_RETRIES 15
>  
> @@ -2025,6 +2026,9 @@ acpi_pbtn_task(void *arg0, int dummy)
>   en | ACPI_PM1_PWRBTN_EN);
>   splx(s);
>  
> + if (acpi_hashidpower)
> + return;
> +
>   switch (pwr_action) {
>   case 0:
>   break;
> diff --git sys/dev/acpi/acpibtn.c sys/dev/acpi/acpibtn.c
> index da3b59ef084..e0b759121ee 100644
> --- sys/dev/acpi/acpibtn.c
> +++ sys/dev/acpi/acpibtn.c
> @@ -270,6 +270,9 @@ sleep:
>  #endif /* SMALL_KERNEL */
>   break;
>   case ACPIBTN_POWER:
> + if (acpi_hashidpower)
> + break;
> +
>   if (notify_type == 0x80) {
>   switch (pwr_action) {
>   case 0:
> diff --git sys/dev/acpi/acpihid.c sys/dev/acpi/acpihid.c
> new file mode 100644
> index 000..5ec875a86e8
> --- /dev/null
> +++ sys/dev/acpi/acpihid.c
> @@ -0,0 +1,254 @@
> +/* $OpenBSD$ */
> +/*
> + * ACPI HID event and 5-button array driver
> + *
> + * Copyright (c) 2018 joshua stein 
> + *
> + * Permission to use, copy, modify, and distribute this software for any
> + * purpose with or without fee is hereby granted, provided that the above
> + * copyright notice and this permission notice appear in all copies.
> + *
> + * THE SOFTWARE IS 

ospf6d: enable reload

2020-05-30 Thread Denis Fondras
This diff provides a working 'ospf6ctl reload'.

Must be applied after https://marc.info/?l=openbsd-tech=159084971620177=2

Index: ospf6ctl/ospf6ctl.c
===
RCS file: /home/denis/dev/cvs/src/usr.sbin/ospf6ctl/ospf6ctl.c,v
retrieving revision 1.51
diff -u -p -r1.51 ospf6ctl.c
--- ospf6ctl/ospf6ctl.c 5 Apr 2020 18:19:04 -   1.51
+++ ospf6ctl/ospf6ctl.c 30 May 2020 18:02:41 -
@@ -235,14 +235,10 @@ main(int argc, char *argv[])
done = 1;
break;
case RELOAD:
-#ifdef notyet
imsg_compose(ibuf, IMSG_CTL_RELOAD, 0, 0, -1, NULL, 0);
printf("reload request sent.\n");
done = 1;
break;
-#else
-   errx(1, "reload not supported");
-#endif
}
 
while (ibuf->w.queued)
Index: ospf6d/ospf6d.c
===
RCS file: /home/denis/dev/cvs/src/usr.sbin/ospf6d/ospf6d.c,v
retrieving revision 1.47
diff -u -p -r1.47 ospf6d.c
--- ospf6d/ospf6d.c 30 May 2020 18:02:13 -  1.47
+++ ospf6d/ospf6d.c 30 May 2020 18:02:41 -
@@ -277,6 +277,8 @@ main(int argc, char *argv[])
fatalx("control socket setup failed");
main_imsg_compose_ospfe_fd(IMSG_CONTROLFD, 0, control_fd);
 
+   if (unveil("/", "r") == -1)
+   fatal("unveil");
if (unveil(ospfd_conf->csock, "c") == -1)
fatal("unveil");
if (unveil(NULL, NULL) == -1)
@@ -611,23 +613,37 @@ ospf_redistribute(struct kroute *kr, u_i
 int
 ospf_reload(void)
 {
-#ifdef notyet
struct area *area;
+   struct iface*iface;
struct ospfd_conf   *xconf;
 
if ((xconf = parse_config(conffile, ospfd_conf->opts)) == NULL)
return (-1);
 
-   /* XXX bail out if router-id changed */
+   /* No router-id was specified, keep existing value */
+   if (xconf->rtr_id.s_addr == 0)
+   xconf->rtr_id.s_addr = ospfd_conf->rtr_id.s_addr;
+
+   /* Abort the reload if rtr_id changed */
+   if (ospfd_conf->rtr_id.s_addr != xconf->rtr_id.s_addr) {
+   log_warnx("router-id changed: restart required");
+   return (-1);
+   }
 
/* send config to childs */
if (ospf_sendboth(IMSG_RECONF_CONF, xconf, sizeof(*xconf)) == -1)
return (-1);
 
-   /* send areas, interfaces happen out of band */
+   /* send areas & interfaces */
LIST_FOREACH(area, >area_list, entry) {
if (ospf_sendboth(IMSG_RECONF_AREA, area, sizeof(*area)) == -1)
return (-1);
+
+   LIST_FOREACH(iface, >iface_list, entry) {
+   if (ospf_sendboth(IMSG_RECONF_IFACE, iface,
+   sizeof(*iface)) == -1)
+   return (-1);
+   }
}
 
if (ospf_sendboth(IMSG_RECONF_END, NULL, 0) == -1)
@@ -639,9 +655,6 @@ ospf_reload(void)
/* update redistribute lists */
kr_reload(ospfd_conf->redist_label_or_prefix);
return (0);
-#else
-   return (-1);
-#endif
 }
 
 int
@@ -725,6 +738,22 @@ merge_config(struct ospfd_conf *conf, st
 * stub is not yet used but switching between stub and normal
 * will be another painful job.
 */
+   if (a->stub != xa->stub && ospfd_process == PROC_OSPF_ENGINE)
+   a->dirty = 1; /* force rtr LSA update */
+#if 0
+   if (xa->stub && ospfd_process == PROC_RDE_ENGINE) {
+   while ((r = SIMPLEQ_FIRST(>redist_list)) != NULL) {
+   SIMPLEQ_REMOVE_HEAD(>redist_list, entry);
+   free(r);
+   }
+
+   while ((r = SIMPLEQ_FIRST(>redist_list)) != NULL) {
+   SIMPLEQ_REMOVE_HEAD(>redist_list, entry);
+   SIMPLEQ_INSERT_TAIL(>redist_list, r, entry);
+   }
+   }
+#endif
+
a->stub = xa->stub;
a->stub_default_cost = xa->stub_default_cost;
if (ospfd_process == PROC_RDE_ENGINE)
@@ -746,7 +775,15 @@ merge_config(struct ospfd_conf *conf, st
}
if (a->dirty) {
a->dirty = 0;
-   orig_rtr_lsa(LIST_FIRST(>iface_list)->area);
+   orig_rtr_lsa(a);
+   }
+   }
+   }
+   if (ospfd_process == PROC_RDE_ENGINE) {
+   LIST_FOREACH(a, >area_list, entry) {
+   if (a->dirty) {
+   start_spf_timer();
+   break;
}
}
}
@@ -767,7 +804,7 @@ merge_interfaces(struct area *a, struct 

umstc: Microsoft Surface Type Cover driver

2020-05-30 Thread joshua stein
This makes the volume and screen brightness keys work, but more 
importantly it keeps the USB data/interrupt pipes open at all times 
because otherwise the Type Cover resets itself (or at least detaches 
and reattaches) when these buttons are pressed.


diff --git share/man/man4/Makefile share/man/man4/Makefile
index 258399f2e7a..3c15cc14285 100644
--- share/man/man4/Makefile
+++ share/man/man4/Makefile
@@ -84,8 +84,8 @@ MAN=  aac.4 abcrtc.4 ac97.4 acphy.4 acrtc.4 \
uftdi.4 ugen.4 ugl.4 ugold.4 uguru.4 uhci.4 uhid.4 uhidev.4 uipaq.4 \
uk.4 ukbd.4 \
ukphy.4 ulpt.4 umass.4 umb.4 umbg.4 umcs.4 umct.4 umidi.4 umodem.4 \
-   ums.4 umsm.4 umt.4 unix.4 uonerng.4 uow.4 uoaklux.4 uoakrh.4 uoakv.4 \
-   upd.4 upgt.4 upl.4 uplcom.4 ural.4 ure.4 url.4 urlphy.4 \
+   ums.4 umsm.4 umstc.4 umt.4 unix.4 uonerng.4 uow.4 uoaklux.4 uoakrh.4 \
+   uoakv.4 upd.4 upgt.4 upl.4 uplcom.4 ural.4 ure.4 url.4 urlphy.4 \
urndis.4 urng.4 urtw.4 urtwn.4 usb.4 uscom.4 uslcom.4 usps.4 \
uthum.4 uticom.4 utpms.4 utwitch.4 utrh.4 uts.4 utvfu.4 uvideo.4 \
uvisor.4 uvscom.4 uwacom.4 uxrcom.4 \
diff --git share/man/man4/umstc.4 share/man/man4/umstc.4
new file mode 100644
index 000..da557c59d1d
--- /dev/null
+++ share/man/man4/umstc.4
@@ -0,0 +1,44 @@
+.\"$OpenBSD$
+.\"
+.\" Copyright (c) 2020 joshua stein 
+.\"
+.\" Permission to use, copy, modify, and distribute this software for any
+.\" purpose with or without fee is hereby granted, provided that the above
+.\" copyright notice and this permission notice appear in all copies.
+.\"
+.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+.\"
+.Dd $Mdocdate$
+.Dt UMSTC 4
+.Os
+.Sh NAME
+.Nm umstc
+.Nd Microsoft Surface Type Cover driver
+.Sh SYNOPSIS
+.Cd "umstc* at uhidev?"
+.Sh DESCRIPTION
+The
+.Nm
+driver responds to some special keys on the Microsoft Type Cover
+keyboard, such as volume and screen brightness keys.
+.Sh SEE ALSO
+.Xr uhidev 4 ,
+.Xr usb 4 ,
+.Xr sysctl 8
+.Sh HISTORY
+The
+.Nm
+driver first appeared in
+.Ox 6.8 .
+.Sh AUTHORS
+.An -nosplit
+The
+.Nm
+driver was written by
+.An joshua stein Aq Mt j...@jcs.org .
diff --git sys/arch/amd64/conf/GENERIC sys/arch/amd64/conf/GENERIC
index b77a3b4ead5..624f73bb7cc 100644
--- sys/arch/amd64/conf/GENERIC
+++ sys/arch/amd64/conf/GENERIC
@@ -284,6 +284,7 @@ ucom*   at uslhcom?
 uhid*  at uhidev?  # USB generic HID support
 fido*  at uhidev?  # FIDO/U2F security key support
 upd*   at uhidev?  # USB Power Devices sensors
+umstc* at uhidev?  # Microsoft Surface Type Cover
 aue*   at uhub?# ADMtek AN986 Pegasus Ethernet
 atu*   at uhub?# Atmel AT76c50x based 802.11b
 axe*   at uhub?# ASIX Electronics AX88172 USB Ethernet
diff --git sys/dev/hid/hid.h sys/dev/hid/hid.h
index 17de4065e0c..c528ab9551b 100644
--- sys/dev/hid/hid.h
+++ sys/dev/hid/hid.h
@@ -396,6 +396,11 @@ inthid_is_collection(const void *, int, uint8_t, 
int32_t);
 #define HUL_KANA   0x0005
 
 /* Usages, Consumer */
+#define HUC_CONTROL0x0001
+#define HUC_PLAY_PAUSE 0x00cd
+#define HUC_MUTE   0x00e2
+#define HUC_VOL_INC0x00e9
+#define HUC_VOL_DEC0x00ea
 #define HUC_AC_PAN 0x0238
 
 /* Usages, FIDO */
diff --git sys/dev/usb/files.usb sys/dev/usb/files.usb
index c3597c54775..fc5ff46ce76 100644
--- sys/dev/usb/files.usb
+++ sys/dev/usb/files.usb
@@ -473,3 +473,8 @@ filedev/usb/uwacom.cuwacom
 
 attach bwfm at uhub with bwfm_usb: firmload
 file   dev/usb/if_bwfm_usb.c   bwfm_usb
+
+# Microsoft Surface Type Cover
+device umstc: hid
+attach umstc at uhidbus
+file   dev/usb/umstc.c umstc
diff --git sys/dev/usb/umstc.c sys/dev/usb/umstc.c
new file mode 100644
index 000..df995181bf6
--- /dev/null
+++ sys/dev/usb/umstc.c
@@ -0,0 +1,172 @@
+/* $OpenBSD$ */
+
+/*
+ * Copyright (c) 2020 joshua stein 
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY 

acpihid: INT33D5 driver

2020-05-30 Thread joshua stein
A driver for tablet hardware buttons which seems to be a 
standardized interface.  Used on the Surface Go.

When pressing the power button, the driver receives a button down 
and then a button up event, unlike acpibtn.  Since there may be an 
opportunity to do something different based on the duration of the 
button press, it takes over shutdown/suspend from acpibtn(4).


diff --git share/man/man4/Makefile share/man/man4/Makefile
index 2d9a57d3814..258399f2e7a 100644
--- share/man/man4/Makefile
+++ share/man/man4/Makefile
@@ -3,6 +3,7 @@
 MAN=   aac.4 abcrtc.4 ac97.4 acphy.4 acrtc.4 \
acpi.4 acpiac.4 acpials.4 acpiasus.4 acpibat.4 \
acpibtn.4 acpicbkbd.4 acpicpu.4 acpidock.4 acpihve.4 acpiec.4 \
+   acpihid.4 \
acpihpet.4 acpimadt.4 acpimcfg.4 acpipci.4 acpiprt.4 acpipwrres.4 \
acpisbs.4 acpisony.4 acpisurface.4 acpithinkpad.4 acpitoshiba.4 \
acpitimer.4 acpivideo.4 acpivout.4 acpitz.4 \
diff --git share/man/man4/acpihid.4 share/man/man4/acpihid.4
new file mode 100644
index 000..8d54da962d1
--- /dev/null
+++ share/man/man4/acpihid.4
@@ -0,0 +1,47 @@
+.\"$OpenBSD$
+.\"
+.\" Copyright (c) 2020 joshua stein 
+.\"
+.\" Permission to use, copy, modify, and distribute this software for any
+.\" purpose with or without fee is hereby granted, provided that the above
+.\" copyright notice and this permission notice appear in all copies.
+.\"
+.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+.\"
+.Dd $Mdocdate$
+.Dt ACPIHID 4
+.Os
+.Sh NAME
+.Nm acpihid
+.Nd ACPI HID event and 5-button array driver
+.Sh SYNOPSIS
+.Cd "acpihid* at acpi?"
+.Sh DESCRIPTION
+The
+.Nm
+driver supports ACPI HID events and 5-button array devices found on some
+tablet devices.
+Power button events are processed according to the
+.Va machdep.pwraction
+sysctl value.
+.Sh SEE ALSO
+.Xr acpi 4 ,
+.Xr acpibtn 4 ,
+.Xr sysctl 8
+.Sh HISTORY
+The
+.Nm
+driver first appeared in
+.Ox 6.8 .
+.Sh AUTHORS
+.An -nosplit
+The
+.Nm
+driver was written by
+.An joshua stein Aq Mt j...@jcs.org .
diff --git sys/arch/amd64/conf/GENERIC sys/arch/amd64/conf/GENERIC
index 87893bfafea..b77a3b4ead5 100644
--- sys/arch/amd64/conf/GENERIC
+++ sys/arch/amd64/conf/GENERIC
@@ -71,6 +71,7 @@ acpials*  at acpi?
 tpm*   at acpi?
 acpihve*   at acpi?
 acpisurface*   at acpi?
+acpihid*   at acpi?
 ipmi0  at acpi? disable
 ccpmic*at iic?
 tipmic*at iic?
diff --git sys/dev/acpi/acpi.c sys/dev/acpi/acpi.c
index 7c103347980..306419761c7 100644
--- sys/dev/acpi/acpi.c
+++ sys/dev/acpi/acpi.c
@@ -72,6 +72,7 @@ int   acpi_debug = 16;
 intacpi_poll_enabled;
 intacpi_hasprocfvs;
 intacpi_haspci;
+intacpi_hashidpower;
 
 #define ACPIEN_RETRIES 15
 
@@ -2025,6 +2026,9 @@ acpi_pbtn_task(void *arg0, int dummy)
en | ACPI_PM1_PWRBTN_EN);
splx(s);
 
+   if (acpi_hashidpower)
+   return;
+
switch (pwr_action) {
case 0:
break;
diff --git sys/dev/acpi/acpibtn.c sys/dev/acpi/acpibtn.c
index da3b59ef084..e0b759121ee 100644
--- sys/dev/acpi/acpibtn.c
+++ sys/dev/acpi/acpibtn.c
@@ -270,6 +270,9 @@ sleep:
 #endif /* SMALL_KERNEL */
break;
case ACPIBTN_POWER:
+   if (acpi_hashidpower)
+   break;
+
if (notify_type == 0x80) {
switch (pwr_action) {
case 0:
diff --git sys/dev/acpi/acpihid.c sys/dev/acpi/acpihid.c
new file mode 100644
index 000..5ec875a86e8
--- /dev/null
+++ sys/dev/acpi/acpihid.c
@@ -0,0 +1,254 @@
+/* $OpenBSD$ */
+/*
+ * ACPI HID event and 5-button array driver
+ *
+ * Copyright (c) 2018 joshua stein 
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+

Re: userland clock_gettime proof of concept

2020-05-30 Thread Paul Irofti
Here is an updated diff with no libc bump.  Please use this one for
further testing.

diff --git lib/libc/arch/amd64/gen/Makefile.inc 
lib/libc/arch/amd64/gen/Makefile.inc
index e995309ed71..c80f5cf671a 100644
--- lib/libc/arch/amd64/gen/Makefile.inc
+++ lib/libc/arch/amd64/gen/Makefile.inc
@@ -2,6 +2,6 @@
 
 SRCS+= _setjmp.S fabs.S infinity.c ldexp.c modf.S nan.c setjmp.S \
sigsetjmp.S
-SRCS+= fpclassifyl.c isfinitel.c isinfl.c isnanl.c isnormall.c signbitl.c
+SRCS+= fpclassifyl.c usertc.c isfinitel.c isinfl.c isnanl.c isnormall.c 
signbitl.c
 SRCS+= flt_rounds.S fpgetmask.S fpgetround.S fpgetsticky.S fpsetmask.S \
fpsetround.S fpsetsticky.S
diff --git lib/libc/arch/amd64/gen/usertc.c lib/libc/arch/amd64/gen/usertc.c
new file mode 100644
index 000..b14c862c61a
--- /dev/null
+++ lib/libc/arch/amd64/gen/usertc.c
@@ -0,0 +1,26 @@
+/* $OpenBSD$ */
+/*
+ * Copyright (c) 2020 Paul Irofti 
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include 
+
+uint64_t
+tc_get_timecount_md(void)
+{
+   uint32_t hi, lo;
+   asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
+   return ((uint64_t)lo)|(((uint64_t)hi)<<32);
+}
diff --git lib/libc/asr/asr.c lib/libc/asr/asr.c
index cd056c85719..2b25d49f32a 100644
--- lib/libc/asr/asr.c
+++ lib/libc/asr/asr.c
@@ -196,11 +196,11 @@ poll_intrsafe(struct pollfd *fds, nfds_t nfds, int 
timeout)
struct timespec pollstart, pollend, elapsed;
int r;
 
-   if (clock_gettime(CLOCK_MONOTONIC, ))
+   if (WRAP(clock_gettime)(CLOCK_MONOTONIC, ))
return -1;
 
while ((r = poll(fds, 1, timeout)) == -1 && errno == EINTR) {
-   if (clock_gettime(CLOCK_MONOTONIC, ))
+   if (WRAP(clock_gettime)(CLOCK_MONOTONIC, ))
return -1;
timespecsub(, , );
timeout -= elapsed.tv_sec * 1000 + elapsed.tv_nsec / 100;
@@ -418,7 +418,7 @@ asr_check_reload(struct asr *asr)
asr->a_rtime = 0;
}
 
-   if (clock_gettime(CLOCK_MONOTONIC, ) == -1)
+   if (WRAP(clock_gettime)(CLOCK_MONOTONIC, ) == -1)
return;
 
if ((ts.tv_sec - asr->a_rtime) < RELOAD_DELAY && asr->a_rtime != 0)
diff --git lib/libc/crypt/bcrypt.c lib/libc/crypt/bcrypt.c
index 82de8fa33b7..02fd3013cc1 100644
--- lib/libc/crypt/bcrypt.c
+++ lib/libc/crypt/bcrypt.c
@@ -248,9 +248,9 @@ _bcrypt_autorounds(void)
char buf[_PASSWORD_LEN];
int duration;
 
-   clock_gettime(CLOCK_THREAD_CPUTIME_ID, );
+   WRAP(clock_gettime)(CLOCK_THREAD_CPUTIME_ID, );
bcrypt_newhash("testpassword", r, buf, sizeof(buf));
-   clock_gettime(CLOCK_THREAD_CPUTIME_ID, );
+   WRAP(clock_gettime)(CLOCK_THREAD_CPUTIME_ID, );
 
duration = after.tv_sec - before.tv_sec;
duration *= 100;
diff --git lib/libc/dlfcn/init.c lib/libc/dlfcn/init.c
index 270f54aada5..860ae2b8698 100644
--- lib/libc/dlfcn/init.c
+++ lib/libc/dlfcn/init.c
@@ -20,6 +20,7 @@
 
 #include 
 #include 
+#include /* timekeep */
 
 #ifndef PIC
 #include 
@@ -45,8 +46,9 @@
 /* XXX should be in an include file shared with csu */
 char   ***_csu_finish(char **_argv, char **_envp, void (*_cleanup)(void));
 
-/* provide definition for this */
+/* provide definitions for these */
 int_pagesize = 0;
+void   *_timekeep = NULL;
 
 /*
  * In dynamicly linked binaries environ and __progname are overriden by
@@ -68,6 +70,12 @@ extern Elf_Ehdr __executable_start[] __attribute__((weak));
 
 /* provide definitions for these */
 const dl_cb *_dl_cb __relro = NULL;
+#if defined(__amd64)
+uint64_t (*const tc_get_timecount)(void) = tc_get_timecount_md;
+#else
+uint64_t (*const tc_get_timecount)(void) = NULL;
+#endif
+
 
 void _libc_preinit(int, char **, char **, dl_cb_cb *) __dso_hidden;
 void
@@ -105,6 +113,10 @@ _libc_preinit(int argc, char **argv, char **envp, dl_cb_cb 
*cb)
phnum = aux->au_v;
break;
 #endif /* !PIC */
+   case AUX_openbsd_timekeep:
+   if (tc_get_timecount)
+   _timekeep = (void *)aux->au_v;
+   break;
}
}
 
diff --git lib/libc/gen/auth_subr.c lib/libc/gen/auth_subr.c
index 

Re: smtpd: make smarthost to use SNI when relaying

2020-05-30 Thread Bob Beck
On Sat, May 30, 2020 at 05:40:43PM +0200, Sebastien Marie wrote:
> Hi,
> 
> I am looking to make smtpd to set SNI (SSL_set_tlsext_host_name) when 
> connecting
> to smarthost when relaying mail.
> 
> After digging a bit in libtls (to stole the right code) and smtpd (to see 
> where
> to put the stolen code), I have the following diff:
> 
> 
> diff 73b535ef4537e8454483912fc3420bc304759e96 /home/semarie/repos/openbsd/src
> blob - d384692a0e43de47d645142a6b99e72b7d83b687
> file + usr.sbin/smtpd/mta_session.c
> --- usr.sbin/smtpd/mta_session.c
> +++ usr.sbin/smtpd/mta_session.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1604,6 +1605,8 @@ mta_cert_init_cb(void *arg, int status, const char *na
>   struct mta_session *s = arg;
>   void *ssl;
>   char *xname = NULL, *xcert = NULL;
> + struct in_addr addrbuf4;
> + struct in6_addr addrbuf6;
>  
>   if (s->flags & MTA_WAIT)
>   mta_tree_pop(_tls_init, s->id);
> @@ -1623,6 +1626,24 @@ mta_cert_init_cb(void *arg, int status, const char *na
>   free(xcert);
>   if (ssl == NULL)
>   fatal("mta: ssl_mta_init");
> +
> + /*
> +  * RFC4366 (SNI): Literal IPv4 and IPv6 addresses are not
> +  * permitted in "HostName".
> +  */
> + if (s->relay->domain->as_host == 1) {
> + xname = xstrdup(s->relay->domain->name);

This allocation appears to be unnecessary.  I believe you should be
able to simply check with inet_pton, and then use
s->relay->domain->name

On a strictly smtpd front, it seems odd to have somthing called
domain->name possibly being an ip address in text form. It would seem
prudent to keep these things separate as we resolve things. (domain->ip, or
domain->mxip if we have resolved down to that might be better) but
that's a separate issue and larger change.

> + if (inet_pton(AF_INET, xname, ) != 1 &&
> + inet_pton(AF_INET6, xname, ) != 1) {
> + log_info("%016"PRIx64" mta setting SNI name=%s",
> + s->id, xname);
> + if (SSL_set_tlsext_host_name(ssl, xname) == 0)
> + log_warnx("%016"PRIx64" mta setting SNI failed",
> +s->id);
> + }
> + free(xname);
> + }
> +
>   io_start_tls(s->io, ssl);
>  }
>  
> 
> 
> For what I understood:
> 
> mta_cert_init_cb() function is responsable to prepare a connection. the SSL
> initialization (SSL_new() call) occured in ssl_mta_init() which was just 
> called,
> so it seems it is the right place to call SSL_set_tlsext_host_name().
> 
> We just need the hostname to configure it.
> 
> Regarding mta_session structure, relay->domain->as_host is set to 1 when the
> domain is linked to smarthost configuration (or when the mx is ip address I
> think). And in smarthost case, the domain->name is the hostname. For SNI, we 
> are
> excluding ip, so I assume it should copte with domain->name as ip.
> 
> Does someone with better understanding of smtpd code source could confirm the
> approch is right and comment ?
> 
> Please note I have only tested it on simple configuration.
> 
> Thanks.
> -- 
> Sebastien Marie
> 



Re: userland clock_gettime proof of concept

2020-05-30 Thread Paul Irofti
On Sat, May 30, 2020 at 10:11:10AM -0600, Theo de Raadt wrote:
> Paul Irofti  wrote:
> 
> > > > The libc bump is there because it helps me switch more easily between
> > > > versions.
> > > 
> > > That is bogus.  Minors are used for visible ABI additions, majors are
> > > used for ABI deletions or API changes visible as ABI.  Please don't
> > > argue for a vague extension of the rules again.
> > 
> > I do not know what you are taking about here. I am not looking at any
> > extension of the rules, nor was I in the past. The whole issue of
> > bumping I leave it up to you and whoever understands these rules. Some
> > developers said this is not required, including kettenis@, and this is
> > why I justified the bump in my diff. That and it might also help others
> > quickly test the diff.
> 
> Repeatedly you were told this wasn't needed, but you kept shipping diffs
> which do it.  And now there are developers who have a future-numbered libc
> on their system, which doesn't do future things.
> 
> It is not justifiable.
> 
> It does NOT help people quickly test the diff, as such an approach
> requires making assumptions which are more complicated then the diff.
> This is not the purpose of major and minor numbers!

Oh, I see. You are correct. My appologies for that. I did not fully
understand the consequences. I will send out a new diff w/o the bump.



Re: userland clock_gettime proof of concept

2020-05-30 Thread Theo de Raadt
Paul Irofti  wrote:

> > > The libc bump is there because it helps me switch more easily between
> > > versions.
> > 
> > That is bogus.  Minors are used for visible ABI additions, majors are
> > used for ABI deletions or API changes visible as ABI.  Please don't
> > argue for a vague extension of the rules again.
> 
> I do not know what you are taking about here. I am not looking at any
> extension of the rules, nor was I in the past. The whole issue of
> bumping I leave it up to you and whoever understands these rules. Some
> developers said this is not required, including kettenis@, and this is
> why I justified the bump in my diff. That and it might also help others
> quickly test the diff.

Repeatedly you were told this wasn't needed, but you kept shipping diffs
which do it.  And now there are developers who have a future-numbered libc
on their system, which doesn't do future things.

It is not justifiable.

It does NOT help people quickly test the diff, as such an approach
requires making assumptions which are more complicated then the diff.
This is not the purpose of major and minor numbers!



Re: userland clock_gettime proof of concept

2020-05-30 Thread Paul Irofti
On Sat, May 30, 2020 at 09:59:41AM -0600, Theo de Raadt wrote:
> Paul Irofti  wrote:
> 
> > > A few more notes below.
> > 
> > I addressed all the comments. Here is the updated diff. This includes
> > the rename to usertc that I suggested.
> 
> I want to see support for quite a few more architectures, especially
> those which are very different, because changing format of the shared
> page later will be very painful.

Sure. Your call. This last diff will help that as discussed with
kettenis@. We now have usertc.c which should be the only place that
needs to be touched by each arch. Let's see. I am currently looking at
doing this on an octeon or a loongson. Kettenis said he will do arm64.

> > The libc bump is there because it helps me switch more easily between
> > versions.
> 
> That is bogus.  Minors are used for visible ABI additions, majors are
> used for ABI deletions or API changes visible as ABI.  Please don't
> argue for a vague extension of the rules again.

I do not know what you are taking about here. I am not looking at any
extension of the rules, nor was I in the past. The whole issue of
bumping I leave it up to you and whoever understands these rules. Some
developers said this is not required, including kettenis@, and this is
why I justified the bump in my diff. That and it might also help others
quickly test the diff.

> In essence this introduction requires no major or minor crank becuase
> it just starts selecting a different backend which is newly supplied.
> But as soon as the back-end is changed, the version number will barely
> help, since code which can't match it has to revert to the non-optimized
> path.
> 
> I don't believe you can shortcut this by supporting 1 architecture and
> casting a prayer it's going to be fine.

I am not trying to shortcut anything. I am in no rush for anything.



Re: userland clock_gettime proof of concept

2020-05-30 Thread Theo de Raadt
Additional question:

What happens during suspend/resume, over over a hibernate.



Re: userland clock_gettime proof of concept

2020-05-30 Thread Theo de Raadt
Paul Irofti  wrote:

> > A few more notes below.
> 
> I addressed all the comments. Here is the updated diff. This includes
> the rename to usertc that I suggested.

I want to see support for quite a few more architectures, especially
those which are very different, because changing format of the shared
page later will be very painful.

> The libc bump is there because it helps me switch more easily between
> versions.

That is bogus.  Minors are used for visible ABI additions, majors are
used for ABI deletions or API changes visible as ABI.  Please don't
argue for a vague extension of the rules again.


In essence this introduction requires no major or minor crank becuase
it just starts selecting a different backend which is newly supplied.
But as soon as the back-end is changed, the version number will barely
help, since code which can't match it has to revert to the non-optimized
path.

I don't believe you can shortcut this by supporting 1 architecture and
casting a prayer it's going to be fine.



Re: userland clock_gettime proof of concept

2020-05-30 Thread Paul Irofti
> A few more notes below.

I addressed all the comments. Here is the updated diff. This includes
the rename to usertc that I suggested.

The libc bump is there because it helps me switch more easily between
versions. A lot of our developers tested and reported no issues with
eluding the bump. So we can remove it in the final step if you think
that's OK.

Paul


diff --git lib/libc/arch/amd64/gen/Makefile.inc 
lib/libc/arch/amd64/gen/Makefile.inc
index e995309ed71..c80f5cf671a 100644
--- lib/libc/arch/amd64/gen/Makefile.inc
+++ lib/libc/arch/amd64/gen/Makefile.inc
@@ -2,6 +2,6 @@
 
 SRCS+= _setjmp.S fabs.S infinity.c ldexp.c modf.S nan.c setjmp.S \
sigsetjmp.S
-SRCS+= fpclassifyl.c isfinitel.c isinfl.c isnanl.c isnormall.c signbitl.c
+SRCS+= fpclassifyl.c usertc.c isfinitel.c isinfl.c isnanl.c isnormall.c 
signbitl.c
 SRCS+= flt_rounds.S fpgetmask.S fpgetround.S fpgetsticky.S fpsetmask.S \
fpsetround.S fpsetsticky.S
diff --git lib/libc/arch/amd64/gen/usertc.c lib/libc/arch/amd64/gen/usertc.c
new file mode 100644
index 000..b14c862c61a
--- /dev/null
+++ lib/libc/arch/amd64/gen/usertc.c
@@ -0,0 +1,26 @@
+/* $OpenBSD$ */
+/*
+ * Copyright (c) 2020 Paul Irofti 
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include 
+
+uint64_t
+tc_get_timecount_md(void)
+{
+   uint32_t hi, lo;
+   asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
+   return ((uint64_t)lo)|(((uint64_t)hi)<<32);
+}
diff --git lib/libc/asr/asr.c lib/libc/asr/asr.c
index cd056c85719..2b25d49f32a 100644
--- lib/libc/asr/asr.c
+++ lib/libc/asr/asr.c
@@ -196,11 +196,11 @@ poll_intrsafe(struct pollfd *fds, nfds_t nfds, int 
timeout)
struct timespec pollstart, pollend, elapsed;
int r;
 
-   if (clock_gettime(CLOCK_MONOTONIC, ))
+   if (WRAP(clock_gettime)(CLOCK_MONOTONIC, ))
return -1;
 
while ((r = poll(fds, 1, timeout)) == -1 && errno == EINTR) {
-   if (clock_gettime(CLOCK_MONOTONIC, ))
+   if (WRAP(clock_gettime)(CLOCK_MONOTONIC, ))
return -1;
timespecsub(, , );
timeout -= elapsed.tv_sec * 1000 + elapsed.tv_nsec / 100;
@@ -418,7 +418,7 @@ asr_check_reload(struct asr *asr)
asr->a_rtime = 0;
}
 
-   if (clock_gettime(CLOCK_MONOTONIC, ) == -1)
+   if (WRAP(clock_gettime)(CLOCK_MONOTONIC, ) == -1)
return;
 
if ((ts.tv_sec - asr->a_rtime) < RELOAD_DELAY && asr->a_rtime != 0)
diff --git lib/libc/crypt/bcrypt.c lib/libc/crypt/bcrypt.c
index 82de8fa33b7..02fd3013cc1 100644
--- lib/libc/crypt/bcrypt.c
+++ lib/libc/crypt/bcrypt.c
@@ -248,9 +248,9 @@ _bcrypt_autorounds(void)
char buf[_PASSWORD_LEN];
int duration;
 
-   clock_gettime(CLOCK_THREAD_CPUTIME_ID, );
+   WRAP(clock_gettime)(CLOCK_THREAD_CPUTIME_ID, );
bcrypt_newhash("testpassword", r, buf, sizeof(buf));
-   clock_gettime(CLOCK_THREAD_CPUTIME_ID, );
+   WRAP(clock_gettime)(CLOCK_THREAD_CPUTIME_ID, );
 
duration = after.tv_sec - before.tv_sec;
duration *= 100;
diff --git lib/libc/dlfcn/init.c lib/libc/dlfcn/init.c
index 270f54aada5..860ae2b8698 100644
--- lib/libc/dlfcn/init.c
+++ lib/libc/dlfcn/init.c
@@ -20,6 +20,7 @@
 
 #include 
 #include 
+#include /* timekeep */
 
 #ifndef PIC
 #include 
@@ -45,8 +46,9 @@
 /* XXX should be in an include file shared with csu */
 char   ***_csu_finish(char **_argv, char **_envp, void (*_cleanup)(void));
 
-/* provide definition for this */
+/* provide definitions for these */
 int_pagesize = 0;
+void   *_timekeep = NULL;
 
 /*
  * In dynamicly linked binaries environ and __progname are overriden by
@@ -68,6 +70,12 @@ extern Elf_Ehdr __executable_start[] __attribute__((weak));
 
 /* provide definitions for these */
 const dl_cb *_dl_cb __relro = NULL;
+#if defined(__amd64)
+uint64_t (*const tc_get_timecount)(void) = tc_get_timecount_md;
+#else
+uint64_t (*const tc_get_timecount)(void) = NULL;
+#endif
+
 
 void _libc_preinit(int, char **, char **, dl_cb_cb *) __dso_hidden;
 void
@@ -105,6 +113,10 @@ _libc_preinit(int argc, char **argv, char **envp, dl_cb_cb 
*cb)
phnum = aux->au_v;
break;
 #endif /* !PIC */
+   case 

Re: Kill NFS-only kqueue poller thread

2020-05-30 Thread Visa Hankala
On Sat, May 30, 2020 at 03:34:06PM +0200, Martin Pieuchot wrote:
> On 30/05/20(Sat) 09:22, Visa Hankala wrote:
> > On Thu, May 28, 2020 at 12:11:20PM +0200, Martin Pieuchot wrote:
> > > When it comes to kqueue filters NFS is special.  A custom thread is
> > > created when the first event is registered.  Its purpose is to poll
> > > for changes every 2.5sec.  This logic has been inherited from NetBSD
> > > and is not present in FreeBSD.
> > > 
> > > Since filt_nfsread() only check `n_size' of a given nfsnode having a
> > > different context to emulate stat(2) behavior in kernel is questionable.
> > > So the diff below gets rid of this logic.  The rationals are KISS,
> > > coherency with nfs_poll() and match what other FS kqueue filters do.
> > > 
> > > While here add a missing write filter.
> > > 
> > > Matching the nfs_poll() logic and the write filter are necessary to keep
> > > the current behavior of poll(2) & select(2) once they start querying
> > > kqfilter handlers.
> > 
> > I think it is not good to remove nfs_kqpoll(). The function does more
> > than just supports the read filter. It generates various vnode events.
> > If the poller is removed, it becomes impossible for example to monitor
> > an NFS directory with kevent(2).
> 
> What do you mean with "impossible to monitor a NFS directory"?  Are you
> saying that NFS being is the only FS to have a specific thread to implement
> a poller is wanted feature?  If it is wanted, why is it NFS only?  Which
> use case involve monitoring a NFS directory with kevent(2)?  When are we
> aware of the existence of a "nfskqpoll" kernel thread?

Local filesystems can observe changes at the source, which makes polling
unnecessary. NFS clients do not have that benefit. The NFSv3 protocol
lacks a mechanism to notify clients of changes.

The NFS polling mechanism is in use for example when running tail -f
on a remote file. Directory monitoring can be utilized for example
by a directory browser application or a distributed batch processing
system.

> > It is true that the code is not perfect, but having even basic and
> > best-effort functionality with kevent(2) and NFS is better than nothing
> > in my opinion. The kqueue subsystem should not dumb down for emulating
> > poll(2) and select(2).
> 
> In that case we've to accept or work-around with the fact that a thread
> is allocated, an operation that can fail, in the kqfilter().  Do you have
> an opinion on that?  What should we do for other FSs is uniformity wanted?

Can't the emulation of poll(2) and select(2) operate without the poller?
These system calls are supposed to return true for regular files for
reading and writing. No polling of remote state is needed for that.

The read behaviour of poll(2) and select(2) differs from kevent(2)'s
EVFILT_READ. Consequently, there has to be an indicator that makes
f_event work in poll(2)/select(2) mode. I guess the same indicator can
control nfs_kqfilter().



Re: smtpd: make smarthost to use SNI when relaying

2020-05-30 Thread Todd C . Miller
On Sat, 30 May 2020 17:40:43 +0200, Sebastien Marie wrote:

> I am looking to make smtpd to set SNI (SSL_set_tlsext_host_name)
> when connecting to smarthost when relaying mail.
>
> After digging a bit in libtls (to stole the right code) and smtpd (to
> see where to put the stolen code), I have the following diff:

Consider using a union for addrbuf, e.g.

union {
struct in_addr addr4;
struct in6_addr addr6;
} addrbuf;

There's no need to have both on the stack.  Otherwise looks reasonable
to me.

 - todd



Re: fix pppx(4) with net/ifq.c rev 1.38

2020-05-30 Thread Vitaliy Makkoveev


> On 30 May 2020, at 09:40, David Gwynne  wrote:
> 
> On Mon, May 25, 2020 at 09:44:22AM +0200, Martin Pieuchot wrote:
>> On 23/05/20(Sat) 15:38, Vitaliy Makkoveev wrote:
 On 23 May 2020, at 12:54, Martin Pieuchot  wrote:
 On 22/05/20(Fri) 13:25, Vitaliy Makkoveev wrote:
> On Fri, May 22, 2020 at 07:57:13AM +1000, David Gwynne wrote:
>> [...] 
>> can you try the following diff?
>> 
> 
> I tested this diff and it works for me. But the problem I pointed is
> about pipex(4) locking.
> 
> pipex(4) requires NET_LOCK() be grabbed not only for underlaying
> ip{,6}_output() but for itself too. But since pppac_start() has
> unpredictable behavior I suggested to make it predictable [1].
 
 What needs the NET_LOCK() in their?  We're talking about
 pipex_ppp_output(), right?  Does it really need the NET_LOCK() or
 the KERNEL_LOCK() is what protects those data structures?
>>> 
>>> Yes, about pipex_ppp_output() and pipex_output(). Except
>>> ip{,6}_output() nothing requires NET_LOCK(). As David Gwynne pointed,
>>> they can be replaced by ip{,6}_send().
>> 
>> Locks protect data structures, you're talking about functions, which
>> data structures are serialized by this lock?  I'm questioning whether
>> there is one.
>> 
>>> [...]
 In case of pipex(4) is isn't clear that the NET_LOCK() is necessary.
>>> 
>>> I guess, pipex(4) was wrapped by NET_LOCK() to protect it while it???s
>>> accessed through `pr_input???. Is NET_LOCK() required for this case?
>> 
>> pipex(4) like all the network stack has been wrapped in the NET_LOCK()
>> because it was easy to do.  That means it isn't a concious decision or
>> design.  The fact that pipex(4) code runs under the NET_LOCK() is a side
>> effect of how the rest of the stack evolved.  I'm questioning whether
>> this lock is required there.  In theory it shouldn't.  What is the
>> reality?
> 
> pipex and pppx pre-date the NET_LOCK, which means you can assume
> that any implicit locking was and is done by the KERNEL_LOCK. mpi is
> asking the right questions here.
> 
> As for the ifq maxlen difference between pppx and pppac, that's more
> about when and how quickly they were written more than anything else.
> The IFQ_SET_MAXLEN(>if_snd, 1) in pppx is because that's a way to
> bypass transmit mitigation for pseudo/virtual interfaces. That was the
> only way to do it historically. It is not an elegant hack to keep
> hold of the NET_LOCK over a call to a start routine.
> 
> As a rule of thumb, network interface drivers should not (maybe
> cannot) rely on the NET_LOCK in their if_start handlers. To be
> clear, they should not rely on it being held by the network stack
> when if_start is called because sometimes the stack calls it without
> holding NET_LOCK, and they should not take it because they might
> be called by the stack when it is being held.
> 
> Also, be aware that the ifq machinery makes sure that the start
> routine is not called concurrently or recursively. You can queue
> packets for transmission on an ifq from anywhere in the kernel at
> any time, but only one cpu will run the start routine. Other cpus
> can queue packets while another one is running if_start, but the
> first one ends up responsible for trying to transmit it.
> 
> ifqs also take the KERNEL_LOCK before calling if_start if the interface
> is not marked as IFXF_MPSAFE.
> 
> The summary is that pppx and pppac are not marked as mpsafe so their
> start routines are called with KERNEL_LOCK held. Currently pppx
> accidentally gets NET_LOCK because of the IFQ_SET_MAXLEN, but shouldn't
> rely on it.
> 
> Cheers,
> dlg
> 

Thanks for explanation.
Will you commit diff you posted in this thread?



smtpd: make smarthost to use SNI when relaying

2020-05-30 Thread Sebastien Marie
Hi,

I am looking to make smtpd to set SNI (SSL_set_tlsext_host_name) when connecting
to smarthost when relaying mail.

After digging a bit in libtls (to stole the right code) and smtpd (to see where
to put the stolen code), I have the following diff:


diff 73b535ef4537e8454483912fc3420bc304759e96 /home/semarie/repos/openbsd/src
blob - d384692a0e43de47d645142a6b99e72b7d83b687
file + usr.sbin/smtpd/mta_session.c
--- usr.sbin/smtpd/mta_session.c
+++ usr.sbin/smtpd/mta_session.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -1604,6 +1605,8 @@ mta_cert_init_cb(void *arg, int status, const char *na
struct mta_session *s = arg;
void *ssl;
char *xname = NULL, *xcert = NULL;
+   struct in_addr addrbuf4;
+   struct in6_addr addrbuf6;
 
if (s->flags & MTA_WAIT)
mta_tree_pop(_tls_init, s->id);
@@ -1623,6 +1626,24 @@ mta_cert_init_cb(void *arg, int status, const char *na
free(xcert);
if (ssl == NULL)
fatal("mta: ssl_mta_init");
+
+   /*
+* RFC4366 (SNI): Literal IPv4 and IPv6 addresses are not
+* permitted in "HostName".
+*/
+   if (s->relay->domain->as_host == 1) {
+   xname = xstrdup(s->relay->domain->name);
+   if (inet_pton(AF_INET, xname, ) != 1 &&
+   inet_pton(AF_INET6, xname, ) != 1) {
+   log_info("%016"PRIx64" mta setting SNI name=%s",
+   s->id, xname);
+   if (SSL_set_tlsext_host_name(ssl, xname) == 0)
+   log_warnx("%016"PRIx64" mta setting SNI failed",
+  s->id);
+   }
+   free(xname);
+   }
+
io_start_tls(s->io, ssl);
 }
 


For what I understood:

mta_cert_init_cb() function is responsable to prepare a connection. the SSL
initialization (SSL_new() call) occured in ssl_mta_init() which was just called,
so it seems it is the right place to call SSL_set_tlsext_host_name().

We just need the hostname to configure it.

Regarding mta_session structure, relay->domain->as_host is set to 1 when the
domain is linked to smarthost configuration (or when the mx is ip address I
think). And in smarthost case, the domain->name is the hostname. For SNI, we are
excluding ip, so I assume it should copte with domain->name as ip.

Does someone with better understanding of smtpd code source could confirm the
approch is right and comment ?

Please note I have only tested it on simple configuration.

Thanks.
-- 
Sebastien Marie



Re: official ports vs DEBUG_PACKAGES

2020-05-30 Thread Jeremie Courreges-Anglas
On Sat, May 30 2020, Marc Espie  wrote:
>> - have some magic I don't know in ELF handling that would allow to either
>> tweak the default location/introduce ${WRKOBJDIR} in that debug info.
>
> Thinking some more about it, that 3rd option is possibly not that
> far-fetched
>
> we do pass most debug-info thru dwz  for shrinkage, so there's one tool
> that does peek inside dwarf debug and does compaction, I have zero idea how
> hard it is to tweak paths as well.

Take a look at -fdebug-prefix-map.

  https://reproducible-builds.org/docs/build-path/

-- 
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE



ospf6d: change the way interfaces are handled

2020-05-30 Thread Denis Fondras
This diff updates how ospf6d(8) handles interfaces.
It is now in line with what ospfd(8) does.

Last step before enabling reload.

Tested against Mikrotik and Zebra implementations.

Warning: it changes the default behaviour. No prefix is announced if no
"redistribute" statement is present in config file. Is this a showstopper ?

Index: hello.c
===
RCS file: /cvs/src/usr.sbin/ospf6d/hello.c,v
retrieving revision 1.22
diff -u -p -r1.22 hello.c
--- hello.c 3 Jan 2020 17:25:48 -   1.22
+++ hello.c 30 May 2020 14:19:09 -
@@ -175,12 +175,16 @@ recv_hello(struct iface *iface, struct i
nbr->priority = LSA_24_GETHI(ntohl(hello.opts));
/* XXX neighbor address shouldn't be stored on virtual links */
nbr->addr = *src;
+   ospfe_imsg_compose_rde(IMSG_NEIGHBOR_ADDR, nbr->peerid, 0,
+   src, sizeof(struct in6_addr));
}
 
if (!IN6_ARE_ADDR_EQUAL(>addr, src)) {
log_warnx("%s: neighbor ID %s changed its address to %s",
__func__, inet_ntoa(nbr->id), log_in6addr(src));
nbr->addr = *src;
+   ospfe_imsg_compose_rde(IMSG_NEIGHBOR_ADDR, nbr->peerid, 0,
+   src, sizeof(struct in6_addr));
}
 
nbr->options = opts;
Index: interface.c
===
RCS file: /cvs/src/usr.sbin/ospf6d/interface.c,v
retrieving revision 1.29
diff -u -p -r1.29 interface.c
--- interface.c 27 May 2020 09:03:56 -  1.29
+++ interface.c 30 May 2020 14:19:09 -
@@ -72,8 +72,6 @@ struct {
 static int vlink_cnt = 0;
 #endif
 
-TAILQ_HEAD(, iface)iflist;
-
 const char * const if_event_names[] = {
"NOTHING",
"UP",
@@ -145,10 +143,6 @@ if_fsm(struct iface *iface, enum iface_e
area_track(iface->area);
orig_rtr_lsa(iface->area);
orig_link_lsa(iface);
-
-   /* state change inform RDE */
-   ospfe_imsg_compose_rde(IMSG_IFINFO, iface->self->peerid, 0,
-   >state, sizeof(iface->state));
}
 
if (old_state & (IF_STA_MULTI | IF_STA_POINTTOPOINT) &&
@@ -166,41 +160,8 @@ if_fsm(struct iface *iface, enum iface_e
return (ret);
 }
 
-int
-if_init(void)
-{
-   TAILQ_INIT();
-
-   return (fetchifs(0));
-}
-
-/* XXX using a linked list should be OK for now */
 struct iface *
-if_find(unsigned int ifindex)
-{
-   struct iface*iface;
-
-   TAILQ_FOREACH(iface, , list) {
-   if (ifindex == iface->ifindex)
-   return (iface);
-   }
-   return (NULL);
-}
-
-struct iface *
-if_findname(char *name)
-{
-   struct iface*iface;
-
-   TAILQ_FOREACH(iface, , list) {
-   if (!strcmp(name, iface->name))
-   return (iface);
-   }
-   return (NULL);
-}
-
-struct iface *
-if_new(u_short ifindex, char *ifname)
+if_new(struct kif *kif, struct kif_addr *ka)
 {
struct iface*iface;
 
@@ -210,7 +171,6 @@ if_new(u_short ifindex, char *ifname)
iface->state = IF_STA_DOWN;
 
LIST_INIT(>nbr_list);
-   TAILQ_INIT(>ifa_list);
TAILQ_INIT(>ls_ack_list);
RB_INIT(>lsa_tree);
 
@@ -225,34 +185,36 @@ if_new(u_short ifindex, char *ifname)
return (iface);
}
 #endif
-   strlcpy(iface->name, ifname, sizeof(iface->name));
-   iface->ifindex = ifindex;
-
-   TAILQ_INSERT_TAIL(, iface, list);
-
-   return (iface);
-}
 
-void
-if_update(struct iface *iface, int mtu, int flags, u_int8_t type,
-u_int8_t state, u_int64_t rate, u_int32_t rdomain)
-{
-   iface->mtu = mtu;
-   iface->flags = flags;
-   iface->if_type = type;
-   iface->linkstate = state;
-   iface->baudrate = rate;
-   iface->rdomain = rdomain;
+   strlcpy(iface->name, kif->ifname, sizeof(iface->name));
 
-   /* set type */
-   if (flags & IFF_POINTOPOINT)
+   /* get type */
+   if (kif->flags & IFF_POINTOPOINT)
iface->type = IF_TYPE_POINTOPOINT;
-   if (flags & IFF_BROADCAST && flags & IFF_MULTICAST)
+   if (kif->flags & IFF_BROADCAST && kif->flags & IFF_MULTICAST)
iface->type = IF_TYPE_BROADCAST;
-   if (flags & IFF_LOOPBACK) {
+   if (kif->flags & IFF_LOOPBACK) {
iface->type = IF_TYPE_POINTOPOINT;
-   iface->cflags |= F_IFACE_PASSIVE;
+   iface->passive = 1;
}
+
+   /* get mtu, index and flags */
+   iface->mtu = kif->mtu;
+   iface->ifindex = kif->ifindex;
+   iface->rdomain = kif->rdomain;
+   iface->flags = kif->flags;
+   iface->linkstate = kif->link_state;
+   iface->if_type = kif->if_type;
+   iface->baudrate = kif->baudrate;
+
+   /* set address, mask and p2p addr */
+   iface->addr = ka->addr;
+   

Re: userland clock_gettime proof of concept

2020-05-30 Thread Theo de Raadt
Paul Irofti  wrote:

> Exposing the skews to the user is easy. The hard bit is figuring out
> on which CPU you are to pick the proper skew without doing a system
> call. If you do a syscall then all of this is for nothing :)

That can't work right.  When you figure out which cpu you are on, you
context switch and now it is untrue.

If other systems are trying to handle this, it should be looked at how
they try to handle it.



Re: userland clock_gettime proof of concept

2020-05-30 Thread Theo de Raadt
Robert Nagy  wrote:

> > I also would like to see at least one non-amd64 platform supported
> > before we settle on this approach.
> 
> 
> Which one would you prefer? arm64?

As many as possible.

If a design issue is encountered, cranking the major on this *might not
be easy*.  If the mechanism changes, the code may need to support TWO
methods to complete the cross-over.

That's a bunch of BS which can be avoided writing support for maximum
number of diverse platforms FIRST.




Re: Kill NFS-only kqueue poller thread

2020-05-30 Thread Martin Pieuchot
On 30/05/20(Sat) 09:22, Visa Hankala wrote:
> On Thu, May 28, 2020 at 12:11:20PM +0200, Martin Pieuchot wrote:
> > When it comes to kqueue filters NFS is special.  A custom thread is
> > created when the first event is registered.  Its purpose is to poll
> > for changes every 2.5sec.  This logic has been inherited from NetBSD
> > and is not present in FreeBSD.
> > 
> > Since filt_nfsread() only check `n_size' of a given nfsnode having a
> > different context to emulate stat(2) behavior in kernel is questionable.
> > So the diff below gets rid of this logic.  The rationals are KISS,
> > coherency with nfs_poll() and match what other FS kqueue filters do.
> > 
> > While here add a missing write filter.
> > 
> > Matching the nfs_poll() logic and the write filter are necessary to keep
> > the current behavior of poll(2) & select(2) once they start querying
> > kqfilter handlers.
> 
> I think it is not good to remove nfs_kqpoll(). The function does more
> than just supports the read filter. It generates various vnode events.
> If the poller is removed, it becomes impossible for example to monitor
> an NFS directory with kevent(2).

What do you mean with "impossible to monitor a NFS directory"?  Are you
saying that NFS being is the only FS to have a specific thread to implement
a poller is wanted feature?  If it is wanted, why is it NFS only?  Which
use case involve monitoring a NFS directory with kevent(2)?  When are we
aware of the existence of a "nfskqpoll" kernel thread?

> It is true that the code is not perfect, but having even basic and
> best-effort functionality with kevent(2) and NFS is better than nothing
> in my opinion. The kqueue subsystem should not dumb down for emulating
> poll(2) and select(2).

In that case we've to accept or work-around with the fact that a thread
is allocated, an operation that can fail, in the kqfilter().  Do you have
an opinion on that?  What should we do for other FSs is uniformity wanted?



Re: official ports vs DEBUG_PACKAGES

2020-05-30 Thread Marc Espie
> - have some magic I don't know in ELF handling that would allow to either
> tweak the default location/introduce ${WRKOBJDIR} in that debug info.

Thinking some more about it, that 3rd option is possibly not that
far-fetched

we do pass most debug-info thru dwz  for shrinkage, so there's one tool
that does peek inside dwarf debug and does compaction, I have zero idea how
hard it is to tweak paths as well.



Re: [patch] azalia: Intel 300 Series HD Audio

2020-05-30 Thread Bruno Flueckiger
On 29.05., Jonathan Gray wrote:
> On Fri, May 29, 2020 at 11:25:44AM +0200, Bruno Flueckiger wrote:
> > Hi,
> >
> > My brand new laptop HP EliteBook 850 G6 comes with an Intel 300 Series
> > HD Audio device rev 0x11. The device shows up as not configured in the
> > dmesg. The PCI config space of the device identifies its subclass as
> > PCI_SUBCLASS_MULTIMEDIA_AUDIO instead of PCI_SUBCLASS_MULTIMEDIA_HDAUDIO
> >
> > The patch below makes the device work just fine on my laptop.
> >
> > Cheers,
> > Bruno
>
> I think it would be better to match on the id in that case as non-azalia
> devices use the audio class.
>
> Index: azalia.c
> ===
> RCS file: /cvs/src/sys/dev/pci/azalia.c,v
> retrieving revision 1.255
> diff -u -p -r1.255 azalia.c
> --- azalia.c  18 Apr 2020 21:55:56 -  1.255
> +++ azalia.c  29 May 2020 09:51:06 -
> @@ -474,6 +474,10 @@ azalia_configure_pci(azalia_t *az)
>   }
>  }
>
> +const struct pci_matchid azalia_pci_devices[] = {
> + { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_300SERIES_U_HDA }
> +};
> +
>  int
>  azalia_pci_match(struct device *parent, void *match, void *aux)
>  {
> @@ -483,7 +487,8 @@ azalia_pci_match(struct device *parent,
>   if (PCI_CLASS(pa->pa_class) == PCI_CLASS_MULTIMEDIA
>   && PCI_SUBCLASS(pa->pa_class) == PCI_SUBCLASS_MULTIMEDIA_HDAUDIO)
>   return 1;
> - return 0;
> + return pci_matchbyid((struct pci_attach_args *)aux, azalia_pci_devices,
> + nitems(azalia_pci_devices));
>  }
>
>  void
>

Your patch works just fine on my laptop.

Cheers,
Bruno



Re: userland clock_gettime proof of concept

2020-05-30 Thread Paul Irofti

On 2020-05-30 12:30, Mark Kettenis wrote:

Date: Fri, 29 May 2020 17:51:50 +0300
From: Paul Irofti 

On Fri, May 29, 2020 at 03:00:50PM +0200, Mark Kettenis wrote:

Date: Fri, 29 May 2020 13:45:37 +0100
From: Stuart Henderson 

On 2020/05/29 13:50, Paul Irofti wrote:

+struct __timekeep {
+   uint32_t major; /* version major number */
+   uint32_t minor; /* version minor number */
+
+   u_int64_t   th_scale;
+   unsigned intth_offset_count;
+   struct bintime  th_offset;
+   struct bintime  th_naptime;
+   struct bintime  th_boottime;
+   volatile unsigned int   th_generation;
+
+   unsigned inttc_user;
+   unsigned inttc_counter_mask;
+};


Ah good, you got rid of u_int, that was causing problems with port builds.


That in itself is a problem.  This means  is the wrong place
for this struct.  We need to find a better place for this.

Since this is now closely linked to the timecounter stuff
 would be an obvious place.  Now that file has:

#ifndef _KERNEL
#error "no user-serviceable parts inside"
#endif

you could change that into

#if !defined(_KERNEL) && !defined(_LIBC)
#error "no user-serviceable parts inside"
#endif

and make sure you #define _LIBC brefore uncluding this file where it
is needed.  As few places as possible obviously.


Done. Also includes claudio@'s observation.


What are your plans to deal with the potential "skew" between the TSCs
on different processors?  We can probably tolerate a small skew
without having to worry about it un userland as long as the skew is
smaller than the time it takes to do a context switch.  If you want to
handle the skew in userland, you need to export the skews somewhere on
the timekeep page and we'd need to use rdtscp to read the TSC and
associate it with the right skew.


The results I got from last years work on fixing TSC and adding per CPU 
skew, indicated that the skew has small values (two digit numbers 
usually). So indeed this does not seem an issue for userland.


Exposing the skews to the user is easy. The hard bit is figuring out on 
which CPU you are to pick the proper skew without doing a system call. 
If you do a syscall then all of this is for nothing :)


One option is to use a hard-thresholding strategy as you describe.

if (timekeep->maxskew > TK_MAXSKEW_THRESHOLD)
  return clock_gettime();

Another is to add support in libc to figure out on what CPU it is 
running. I don't have a plan for that yet. You mention associating the 
right skew for the RDTSCP call, do you have an example of how to do that?


I will also probably add support for HPET clocks (if this diff goes in) 
as some machines do not have a proper, invariant, TSC (like solene@'s) 
and, perhaps, others might want to switch for other reasons.



A few more notes below.


I will fix these later and come back with a diff. Thank you for the review!



Re: userland clock_gettime proof of concept

2020-05-30 Thread Mark Kettenis
> From: Paul Irofti 
> Date: Sat, 30 May 2020 13:53:18 +0300
> 
> On 2020-05-30 12:40, Mark Kettenis wrote:
> >> Date: Sat, 30 May 2020 10:49:07 +0200
> >> From: Robert Nagy 
> >>
> >> On 30/05/20 10:40 +0200, Mark Kettenis wrote:
>  Date: Sat, 30 May 2020 10:32:15 +0200
>  From: Robert Nagy 
> 
>  On 29/05/20 17:51 +0300, Paul Irofti wrote:
> > On Fri, May 29, 2020 at 03:00:50PM +0200, Mark Kettenis wrote:
> >>> Date: Fri, 29 May 2020 13:45:37 +0100
> >>> From: Stuart Henderson 
> >>>
> >>> On 2020/05/29 13:50, Paul Irofti wrote:
>  +struct __timekeep {
>  +uint32_t major; /* version major number */
>  +uint32_t minor; /* version minor number */
>  +
>  +u_int64_t   th_scale;
>  +unsigned intth_offset_count;
>  +struct bintime  th_offset;
>  +struct bintime  th_naptime;
>  +struct bintime  th_boottime;
>  +volatile unsigned int   th_generation;
>  +
>  +unsigned inttc_user;
>  +unsigned inttc_counter_mask;
>  +};
> >>>
> >>> Ah good, you got rid of u_int, that was causing problems with port 
> >>> builds.
> >>
> >> That in itself is a problem.  This means  is the wrong place
> >> for this struct.  We need to find a better place for this.
> >>
> >> Since this is now closely linked to the timecounter stuff
> >>  would be an obvious place.  Now that file has:
> >>
> >> #ifndef _KERNEL
> >> #error "no user-serviceable parts inside"
> >> #endif
> >>
> >> you could change that into
> >>
> >> #if !defined(_KERNEL) && !defined(_LIBC)
> >> #error "no user-serviceable parts inside"
> >> #endif
> >>
> >> and make sure you #define _LIBC brefore uncluding this file where it
> >> is needed.  As few places as possible obviously.
> >
> > Done. Also includes claudio@'s observation.
> 
>  I think if there are no more header changes, this should be commited to
>  have wider testing. We are also just after tree unlock so it feels like
>  the right time, and since there is no library bump we can easily revert
>  if there is a need for that.
> >>>
> >>> Not ready yet.
> >>>
> >>> I also would like to see at least one non-amd64 platform supported
> >>> before we settle on this approach.
> >>
> >>
> >> Which one would you prefer? arm64?
> > 
> > yes, arm64 would be good; I can probably give it a go later this weekend
> 
> I was thinking we could have a common name for the MD (arch) files. In 
> my diff it is rdtsc.c, but I think we can switch to have all the arches 
> have a file named usertc.c. What do you think?
> 
>arch/amd64/gen/rdtsc.c -> arch/amd64/gen/usertc.c

Yes, that would be better.



Re: userland clock_gettime proof of concept

2020-05-30 Thread Paul Irofti

On 2020-05-30 12:40, Mark Kettenis wrote:

Date: Sat, 30 May 2020 10:49:07 +0200
From: Robert Nagy 

On 30/05/20 10:40 +0200, Mark Kettenis wrote:

Date: Sat, 30 May 2020 10:32:15 +0200
From: Robert Nagy 

On 29/05/20 17:51 +0300, Paul Irofti wrote:

On Fri, May 29, 2020 at 03:00:50PM +0200, Mark Kettenis wrote:

Date: Fri, 29 May 2020 13:45:37 +0100
From: Stuart Henderson 

On 2020/05/29 13:50, Paul Irofti wrote:

+struct __timekeep {
+   uint32_t major; /* version major number */
+   uint32_t minor; /* version minor number */
+
+   u_int64_t   th_scale;
+   unsigned intth_offset_count;
+   struct bintime  th_offset;
+   struct bintime  th_naptime;
+   struct bintime  th_boottime;
+   volatile unsigned int   th_generation;
+
+   unsigned inttc_user;
+   unsigned inttc_counter_mask;
+};


Ah good, you got rid of u_int, that was causing problems with port builds.


That in itself is a problem.  This means  is the wrong place
for this struct.  We need to find a better place for this.

Since this is now closely linked to the timecounter stuff
 would be an obvious place.  Now that file has:

#ifndef _KERNEL
#error "no user-serviceable parts inside"
#endif

you could change that into

#if !defined(_KERNEL) && !defined(_LIBC)
#error "no user-serviceable parts inside"
#endif

and make sure you #define _LIBC brefore uncluding this file where it
is needed.  As few places as possible obviously.


Done. Also includes claudio@'s observation.


I think if there are no more header changes, this should be commited to
have wider testing. We are also just after tree unlock so it feels like
the right time, and since there is no library bump we can easily revert
if there is a need for that.


Not ready yet.

I also would like to see at least one non-amd64 platform supported
before we settle on this approach.



Which one would you prefer? arm64?


yes, arm64 would be good; I can probably give it a go later this weekend


I was thinking we could have a common name for the MD (arch) files. In 
my diff it is rdtsc.c, but I think we can switch to have all the arches 
have a file named usertc.c. What do you think?


  arch/amd64/gen/rdtsc.c -> arch/amd64/gen/usertc.c


Paul, do you have some sort of regression test for this stuff?


If you use the minor bump you can switch between libc's easily and 
that's what I do now. My main regress test is Firefox.


I also have a few hand written smoke tests that I wrote in the beginning 
with which I test with when I do major changes. I placed them on 
cvs:~pirofti/timekeep/.


Another batch that I run is the posixtestsuite (that is available as a 
package now). Example: 
/usr/local/libexec/posixtestsuite/conformance/interfaces/clock_gettime/1-1.test




Re: sparc64 boot issue on qemu

2020-05-30 Thread Otto Moerbeek
On Sat, May 30, 2020 at 10:11:08AM +0100, Mark Cave-Ayland wrote:

> On 30/05/2020 10:03, Otto Moerbeek wrote:
> 
> > Hi,
> > 
> > thanks for the hints, but an unpatched 6.7 miniroot still fails to
> > boot for me
> > 
> > qemu-system-sparc64 -machine sun4u -m 1024 -drive \
> > file=miniroot67.img,format=raw -nographic -serial stdio -monitor none
> > 
> > OpenBIOS for Sparc64
> > Configuration device id QEMU version 1 machine id 0
> > kernel cmdline 
> > CPUs: 1 x SUNW,UltraSPARC-IIi
> > UUID: ----
> > Welcome to OpenBIOS v1.1 built on Oct 28 2019 17:08
> >   Type 'help' for detailed information
> > Trying disk:a...
> > Not a bootable ELF image
> > Not a bootable a.out image
> > 
> > Loading FCode image...
> > Loaded 6882 bytes
> > entry point is 0x4000
> > Evaluating FCode...
> > OpenBSD IEEE 1275 Bootblock 2.0
> > ..
> > 
> > And then hangs
> > 
> > While the patched bootblocks do boot (but hang later after
> > 
> > scsibus1 at softraid0: 256 targets
> > 
> > 
> > as before,
> > 
> > -Otto
> 
> Hmmm odd. Is it possible for you to upload your miniroot somewhere for me to 
> take a
> quick look? I don't have a great deal of time right now, but I can run it 
> through a
> debugger to see if anything obvious shows up.
> 
> 
> ATB,
> 
> Mark.

https://cdn.openbsd.org/pub/OpenBSD/snapshots/sparc64/
contains the unpatched miniroot.

https://www.drijf.net/openbsd/disk.qcow2

is the disk image based on the miniroot containing the patch in the
firts post in this thread.

Thanks for looking into this.

Note that we did *not* observe boot failure on any real sparc64
hardware. The bootblock changes I did for the 6.7 release were tested
on many different machines.

-Otto




Outdated BUGS section in kqueue.2

2020-05-30 Thread Visa Hankala
The BUGS section in kqueue.2 is outdated. The FIFO limitation does not
seem to exist anymore, AIO is not implemented at all in the kernel,
and watching vnodes looks possible on all filesystem types that
allow writing.

OK to remove the section?

Index: kqueue.2
===
RCS file: src/lib/libc/sys/kqueue.2,v
retrieving revision 1.39
diff -u -p -r1.39 kqueue.2
--- kqueue.21 Jul 2019 16:52:02 -   1.39
+++ kqueue.230 May 2020 09:30:12 -
@@ -565,7 +565,3 @@ The
 .Fn kqueue
 system and this manual page were written by
 .An Jonathan Lemon Aq Mt jle...@freebsd.org .
-.Sh BUGS
-It is currently not possible to watch FIFOs or AIO that reside
-on anything but a UFS file system.
-Watching a vnode is possible on UFS, NFS and MS-DOS file systems.



Re: userland clock_gettime proof of concept

2020-05-30 Thread Mark Kettenis
> Date: Sat, 30 May 2020 10:49:07 +0200
> From: Robert Nagy 
> 
> On 30/05/20 10:40 +0200, Mark Kettenis wrote:
> > > Date: Sat, 30 May 2020 10:32:15 +0200
> > > From: Robert Nagy 
> > > 
> > > On 29/05/20 17:51 +0300, Paul Irofti wrote:
> > > > On Fri, May 29, 2020 at 03:00:50PM +0200, Mark Kettenis wrote:
> > > > > > Date: Fri, 29 May 2020 13:45:37 +0100
> > > > > > From: Stuart Henderson 
> > > > > > 
> > > > > > On 2020/05/29 13:50, Paul Irofti wrote:
> > > > > > > +struct __timekeep {
> > > > > > > + uint32_t major; /* version major number */
> > > > > > > + uint32_t minor; /* version minor number */
> > > > > > > +
> > > > > > > + u_int64_t   th_scale;
> > > > > > > + unsigned intth_offset_count;
> > > > > > > + struct bintime  th_offset;
> > > > > > > + struct bintime  th_naptime;
> > > > > > > + struct bintime  th_boottime;
> > > > > > > + volatile unsigned int   th_generation;
> > > > > > > +
> > > > > > > + unsigned inttc_user;
> > > > > > > + unsigned inttc_counter_mask;
> > > > > > > +};
> > > > > > 
> > > > > > Ah good, you got rid of u_int, that was causing problems with port 
> > > > > > builds.
> > > > > 
> > > > > That in itself is a problem.  This means  is the wrong place
> > > > > for this struct.  We need to find a better place for this.
> > > > > 
> > > > > Since this is now closely linked to the timecounter stuff
> > > > >  would be an obvious place.  Now that file has:
> > > > > 
> > > > > #ifndef _KERNEL
> > > > > #error "no user-serviceable parts inside"
> > > > > #endif
> > > > > 
> > > > > you could change that into
> > > > > 
> > > > > #if !defined(_KERNEL) && !defined(_LIBC)
> > > > > #error "no user-serviceable parts inside"
> > > > > #endif
> > > > > 
> > > > > and make sure you #define _LIBC brefore uncluding this file where it
> > > > > is needed.  As few places as possible obviously.
> > > > 
> > > > Done. Also includes claudio@'s observation.
> > > 
> > > I think if there are no more header changes, this should be commited to
> > > have wider testing. We are also just after tree unlock so it feels like
> > > the right time, and since there is no library bump we can easily revert
> > > if there is a need for that.
> > 
> > Not ready yet.
> > 
> > I also would like to see at least one non-amd64 platform supported
> > before we settle on this approach.
> 
> 
> Which one would you prefer? arm64?

yes, arm64 would be good; I can probably give it a go later this weekend

Paul, do you have some sort of regression test for this stuff?



Re: sparc64 boot issue on qemu

2020-05-30 Thread Mark Cave-Ayland
On 29/05/2020 23:56, Jason A. Donenfeld wrote:

> Oh that's a nice observation about `boot disk -V`. Doing so actually
> got me booting up entirely:
> 
> $ qemu-img convert -O qcow2 miniroot66.fs disk.qcow2
> $ qemu-img resize disk.qcow2 20G
> $ qemu-system-sparc64 -m 1024 -drive file=disk.qcow2,if=ide -net
> nic,model=ne2k_pci -net user -boot a -nographic -monitor none -serial
> stdio

I think the problem here is that you're asking OpenBIOS to boot from the (empty)
floppy disk with "-boot a" rather than the qcow2 image which is normally 
attached to
the first hard disk "-boot c". As this is the default, then I would expect the
command line above to work if you simply drop "-boot a".

Also is there a particular reason for using the ne2k_pci NIC instead of the 
default
in-built sunhme device? I try and keep the documentation at
https://wiki.qemu.org/Documentation/Platforms/SPARC as accurate as I can, so do 
look
there for latest best practices and command line examples.

Finally the version of qemu-system-sparc64 you are running can also boot from a
virtio-blk-pci device (again see the above wiki page for details) if you are 
looking
for the best emulated disk performance.


ATB,

Mark.



Re: userland clock_gettime proof of concept

2020-05-30 Thread Mark Kettenis
> Date: Fri, 29 May 2020 17:51:50 +0300
> From: Paul Irofti 
> 
> On Fri, May 29, 2020 at 03:00:50PM +0200, Mark Kettenis wrote:
> > > Date: Fri, 29 May 2020 13:45:37 +0100
> > > From: Stuart Henderson 
> > > 
> > > On 2020/05/29 13:50, Paul Irofti wrote:
> > > > +struct __timekeep {
> > > > +   uint32_t major; /* version major number */
> > > > +   uint32_t minor; /* version minor number */
> > > > +
> > > > +   u_int64_t   th_scale;
> > > > +   unsigned intth_offset_count;
> > > > +   struct bintime  th_offset;
> > > > +   struct bintime  th_naptime;
> > > > +   struct bintime  th_boottime;
> > > > +   volatile unsigned int   th_generation;
> > > > +
> > > > +   unsigned inttc_user;
> > > > +   unsigned inttc_counter_mask;
> > > > +};
> > > 
> > > Ah good, you got rid of u_int, that was causing problems with port builds.
> > 
> > That in itself is a problem.  This means  is the wrong place
> > for this struct.  We need to find a better place for this.
> > 
> > Since this is now closely linked to the timecounter stuff
> >  would be an obvious place.  Now that file has:
> > 
> > #ifndef _KERNEL
> > #error "no user-serviceable parts inside"
> > #endif
> > 
> > you could change that into
> > 
> > #if !defined(_KERNEL) && !defined(_LIBC)
> > #error "no user-serviceable parts inside"
> > #endif
> > 
> > and make sure you #define _LIBC brefore uncluding this file where it
> > is needed.  As few places as possible obviously.
> 
> Done. Also includes claudio@'s observation.

What are your plans to deal with the potential "skew" between the TSCs
on different processors?  We can probably tolerate a small skew
without having to worry about it un userland as long as the skew is
smaller than the time it takes to do a context switch.  If you want to
handle the skew in userland, you need to export the skews somewhere on
the timekeep page and we'd need to use rdtscp to read the TSC and
associate it with the right skew.

A few more notes below.
 
> diff --git lib/libc/arch/amd64/gen/Makefile.inc 
> lib/libc/arch/amd64/gen/Makefile.inc
> index e995309ed71..caa4452a3d9 100644
> --- lib/libc/arch/amd64/gen/Makefile.inc
> +++ lib/libc/arch/amd64/gen/Makefile.inc
> @@ -2,6 +2,6 @@
>  
>  SRCS+=   _setjmp.S fabs.S infinity.c ldexp.c modf.S nan.c setjmp.S \
>   sigsetjmp.S
> -SRCS+=   fpclassifyl.c isfinitel.c isinfl.c isnanl.c isnormall.c 
> signbitl.c
> +SRCS+=   fpclassifyl.c rdtsc.c isfinitel.c isinfl.c isnanl.c isnormall.c 
> signbitl.c
>  SRCS+=   flt_rounds.S fpgetmask.S fpgetround.S fpgetsticky.S fpsetmask.S 
> \
>   fpsetround.S fpsetsticky.S
> diff --git lib/libc/arch/amd64/gen/rdtsc.c lib/libc/arch/amd64/gen/rdtsc.c
> new file mode 100644
> index 000..b14c862c61a
> --- /dev/null
> +++ lib/libc/arch/amd64/gen/rdtsc.c
> @@ -0,0 +1,26 @@
> +/*   $OpenBSD$ */
> +/*
> + * Copyright (c) 2020 Paul Irofti 
> + *
> + * Permission to use, copy, modify, and distribute this software for any
> + * purpose with or without fee is hereby granted, provided that the above
> + * copyright notice and this permission notice appear in all copies.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> + */
> +
> +#include 
> +
> +uint64_t
> +tc_get_timecount_md(void)
> +{
> + uint32_t hi, lo;
> + asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
> + return ((uint64_t)lo)|(((uint64_t)hi)<<32);
> +}
> diff --git lib/libc/asr/asr.c lib/libc/asr/asr.c
> index cd056c85719..2b25d49f32a 100644
> --- lib/libc/asr/asr.c
> +++ lib/libc/asr/asr.c
> @@ -196,11 +196,11 @@ poll_intrsafe(struct pollfd *fds, nfds_t nfds, int 
> timeout)
>   struct timespec pollstart, pollend, elapsed;
>   int r;
>  
> - if (clock_gettime(CLOCK_MONOTONIC, ))
> + if (WRAP(clock_gettime)(CLOCK_MONOTONIC, ))
>   return -1;
>  
>   while ((r = poll(fds, 1, timeout)) == -1 && errno == EINTR) {
> - if (clock_gettime(CLOCK_MONOTONIC, ))
> + if (WRAP(clock_gettime)(CLOCK_MONOTONIC, ))
>   return -1;
>   timespecsub(, , );
>   timeout -= elapsed.tv_sec * 1000 + elapsed.tv_nsec / 100;
> @@ -418,7 +418,7 @@ asr_check_reload(struct asr *asr)
>   asr->a_rtime = 0;
>   }
>  
> - if (clock_gettime(CLOCK_MONOTONIC, ) == -1)
> + if (WRAP(clock_gettime)(CLOCK_MONOTONIC, ) == -1)
>   return;
>  
>   if ((ts.tv_sec - 

Re: Kill NFS-only kqueue poller thread

2020-05-30 Thread Visa Hankala
On Thu, May 28, 2020 at 12:11:20PM +0200, Martin Pieuchot wrote:
> When it comes to kqueue filters NFS is special.  A custom thread is
> created when the first event is registered.  Its purpose is to poll
> for changes every 2.5sec.  This logic has been inherited from NetBSD
> and is not present in FreeBSD.
> 
> Since filt_nfsread() only check `n_size' of a given nfsnode having a
> different context to emulate stat(2) behavior in kernel is questionable.
> So the diff below gets rid of this logic.  The rationals are KISS,
> coherency with nfs_poll() and match what other FS kqueue filters do.
> 
> While here add a missing write filter.
> 
> Matching the nfs_poll() logic and the write filter are necessary to keep
> the current behavior of poll(2) & select(2) once they start querying
> kqfilter handlers.

I think it is not good to remove nfs_kqpoll(). The function does more
than just supports the read filter. It generates various vnode events.
If the poller is removed, it becomes impossible for example to monitor
an NFS directory with kevent(2).

It is true that the code is not perfect, but having even basic and
best-effort functionality with kevent(2) and NFS is better than nothing
in my opinion. The kqueue subsystem should not dumb down for emulating
poll(2) and select(2).



Re: sparc64 boot issue on qemu

2020-05-30 Thread Mark Cave-Ayland
On 30/05/2020 10:03, Otto Moerbeek wrote:

> Hi,
> 
> thanks for the hints, but an unpatched 6.7 miniroot still fails to
> boot for me
> 
> qemu-system-sparc64 -machine sun4u -m 1024 -drive \
>   file=miniroot67.img,format=raw -nographic -serial stdio -monitor none
> 
> OpenBIOS for Sparc64
> Configuration device id QEMU version 1 machine id 0
> kernel cmdline 
> CPUs: 1 x SUNW,UltraSPARC-IIi
> UUID: ----
> Welcome to OpenBIOS v1.1 built on Oct 28 2019 17:08
>   Type 'help' for detailed information
> Trying disk:a...
> Not a bootable ELF image
> Not a bootable a.out image
> 
> Loading FCode image...
> Loaded 6882 bytes
> entry point is 0x4000
> Evaluating FCode...
> OpenBSD IEEE 1275 Bootblock 2.0
> ..
> 
> And then hangs
> 
> While the patched bootblocks do boot (but hang later after
> 
> scsibus1 at softraid0: 256 targets
> 
> 
> as before,
> 
>   -Otto

Hmmm odd. Is it possible for you to upload your miniroot somewhere for me to 
take a
quick look? I don't have a great deal of time right now, but I can run it 
through a
debugger to see if anything obvious shows up.


ATB,

Mark.



Re: sparc64 boot issue on qemu

2020-05-30 Thread Otto Moerbeek
On Sat, May 30, 2020 at 09:29:36AM +0100, Mark Cave-Ayland wrote:

> On 29/05/2020 23:56, Jason A. Donenfeld wrote:
> 
> > Oh that's a nice observation about `boot disk -V`. Doing so actually
> > got me booting up entirely:
> > 
> > $ qemu-img convert -O qcow2 miniroot66.fs disk.qcow2
> > $ qemu-img resize disk.qcow2 20G
> > $ qemu-system-sparc64 -m 1024 -drive file=disk.qcow2,if=ide -net
> > nic,model=ne2k_pci -net user -boot a -nographic -monitor none -serial
> > stdio
> 
> I think the problem here is that you're asking OpenBIOS to boot from the 
> (empty)
> floppy disk with "-boot a" rather than the qcow2 image which is normally 
> attached to
> the first hard disk "-boot c". As this is the default, then I would expect the
> command line above to work if you simply drop "-boot a".
> 
> Also is there a particular reason for using the ne2k_pci NIC instead of the 
> default
> in-built sunhme device? I try and keep the documentation at
> https://wiki.qemu.org/Documentation/Platforms/SPARC as accurate as I can, so 
> do look
> there for latest best practices and command line examples.
> 
> Finally the version of qemu-system-sparc64 you are running can also boot from 
> a
> virtio-blk-pci device (again see the above wiki page for details) if you are 
> looking
> for the best emulated disk performance.
> 
> 
> ATB,
> 
> Mark.

Hi,

thanks for the hints, but an unpatched 6.7 miniroot still fails to
boot for me

qemu-system-sparc64 -machine sun4u -m 1024 -drive \
file=miniroot67.img,format=raw -nographic -serial stdio -monitor none

OpenBIOS for Sparc64
Configuration device id QEMU version 1 machine id 0
kernel cmdline 
CPUs: 1 x SUNW,UltraSPARC-IIi
UUID: ----
Welcome to OpenBIOS v1.1 built on Oct 28 2019 17:08
  Type 'help' for detailed information
Trying disk:a...
Not a bootable ELF image
Not a bootable a.out image

Loading FCode image...
Loaded 6882 bytes
entry point is 0x4000
Evaluating FCode...
OpenBSD IEEE 1275 Bootblock 2.0
..

And then hangs

While the patched bootblocks do boot (but hang later after

scsibus1 at softraid0: 256 targets


as before,

-Otto



Re: userland clock_gettime proof of concept

2020-05-30 Thread Robert Nagy
On 30/05/20 10:40 +0200, Mark Kettenis wrote:
> > Date: Sat, 30 May 2020 10:32:15 +0200
> > From: Robert Nagy 
> > 
> > On 29/05/20 17:51 +0300, Paul Irofti wrote:
> > > On Fri, May 29, 2020 at 03:00:50PM +0200, Mark Kettenis wrote:
> > > > > Date: Fri, 29 May 2020 13:45:37 +0100
> > > > > From: Stuart Henderson 
> > > > > 
> > > > > On 2020/05/29 13:50, Paul Irofti wrote:
> > > > > > +struct __timekeep {
> > > > > > +   uint32_t major; /* version major number */
> > > > > > +   uint32_t minor; /* version minor number */
> > > > > > +
> > > > > > +   u_int64_t   th_scale;
> > > > > > +   unsigned intth_offset_count;
> > > > > > +   struct bintime  th_offset;
> > > > > > +   struct bintime  th_naptime;
> > > > > > +   struct bintime  th_boottime;
> > > > > > +   volatile unsigned int   th_generation;
> > > > > > +
> > > > > > +   unsigned inttc_user;
> > > > > > +   unsigned inttc_counter_mask;
> > > > > > +};
> > > > > 
> > > > > Ah good, you got rid of u_int, that was causing problems with port 
> > > > > builds.
> > > > 
> > > > That in itself is a problem.  This means  is the wrong place
> > > > for this struct.  We need to find a better place for this.
> > > > 
> > > > Since this is now closely linked to the timecounter stuff
> > > >  would be an obvious place.  Now that file has:
> > > > 
> > > > #ifndef _KERNEL
> > > > #error "no user-serviceable parts inside"
> > > > #endif
> > > > 
> > > > you could change that into
> > > > 
> > > > #if !defined(_KERNEL) && !defined(_LIBC)
> > > > #error "no user-serviceable parts inside"
> > > > #endif
> > > > 
> > > > and make sure you #define _LIBC brefore uncluding this file where it
> > > > is needed.  As few places as possible obviously.
> > > 
> > > Done. Also includes claudio@'s observation.
> > 
> > I think if there are no more header changes, this should be commited to
> > have wider testing. We are also just after tree unlock so it feels like
> > the right time, and since there is no library bump we can easily revert
> > if there is a need for that.
> 
> Not ready yet.
> 
> I also would like to see at least one non-amd64 platform supported
> before we settle on this approach.


Which one would you prefer? arm64?



Re: userland clock_gettime proof of concept

2020-05-30 Thread Mark Kettenis
> Date: Sat, 30 May 2020 10:32:15 +0200
> From: Robert Nagy 
> 
> On 29/05/20 17:51 +0300, Paul Irofti wrote:
> > On Fri, May 29, 2020 at 03:00:50PM +0200, Mark Kettenis wrote:
> > > > Date: Fri, 29 May 2020 13:45:37 +0100
> > > > From: Stuart Henderson 
> > > > 
> > > > On 2020/05/29 13:50, Paul Irofti wrote:
> > > > > +struct __timekeep {
> > > > > + uint32_t major; /* version major number */
> > > > > + uint32_t minor; /* version minor number */
> > > > > +
> > > > > + u_int64_t   th_scale;
> > > > > + unsigned intth_offset_count;
> > > > > + struct bintime  th_offset;
> > > > > + struct bintime  th_naptime;
> > > > > + struct bintime  th_boottime;
> > > > > + volatile unsigned int   th_generation;
> > > > > +
> > > > > + unsigned inttc_user;
> > > > > + unsigned inttc_counter_mask;
> > > > > +};
> > > > 
> > > > Ah good, you got rid of u_int, that was causing problems with port 
> > > > builds.
> > > 
> > > That in itself is a problem.  This means  is the wrong place
> > > for this struct.  We need to find a better place for this.
> > > 
> > > Since this is now closely linked to the timecounter stuff
> > >  would be an obvious place.  Now that file has:
> > > 
> > > #ifndef _KERNEL
> > > #error "no user-serviceable parts inside"
> > > #endif
> > > 
> > > you could change that into
> > > 
> > > #if !defined(_KERNEL) && !defined(_LIBC)
> > > #error "no user-serviceable parts inside"
> > > #endif
> > > 
> > > and make sure you #define _LIBC brefore uncluding this file where it
> > > is needed.  As few places as possible obviously.
> > 
> > Done. Also includes claudio@'s observation.
> 
> I think if there are no more header changes, this should be commited to
> have wider testing. We are also just after tree unlock so it feels like
> the right time, and since there is no library bump we can easily revert
> if there is a need for that.

Not ready yet.

I also would like to see at least one non-amd64 platform supported
before we settle on this approach.



Re: userland clock_gettime proof of concept

2020-05-30 Thread Robert Nagy
On 29/05/20 17:51 +0300, Paul Irofti wrote:
> On Fri, May 29, 2020 at 03:00:50PM +0200, Mark Kettenis wrote:
> > > Date: Fri, 29 May 2020 13:45:37 +0100
> > > From: Stuart Henderson 
> > > 
> > > On 2020/05/29 13:50, Paul Irofti wrote:
> > > > +struct __timekeep {
> > > > +   uint32_t major; /* version major number */
> > > > +   uint32_t minor; /* version minor number */
> > > > +
> > > > +   u_int64_t   th_scale;
> > > > +   unsigned intth_offset_count;
> > > > +   struct bintime  th_offset;
> > > > +   struct bintime  th_naptime;
> > > > +   struct bintime  th_boottime;
> > > > +   volatile unsigned int   th_generation;
> > > > +
> > > > +   unsigned inttc_user;
> > > > +   unsigned inttc_counter_mask;
> > > > +};
> > > 
> > > Ah good, you got rid of u_int, that was causing problems with port builds.
> > 
> > That in itself is a problem.  This means  is the wrong place
> > for this struct.  We need to find a better place for this.
> > 
> > Since this is now closely linked to the timecounter stuff
> >  would be an obvious place.  Now that file has:
> > 
> > #ifndef _KERNEL
> > #error "no user-serviceable parts inside"
> > #endif
> > 
> > you could change that into
> > 
> > #if !defined(_KERNEL) && !defined(_LIBC)
> > #error "no user-serviceable parts inside"
> > #endif
> > 
> > and make sure you #define _LIBC brefore uncluding this file where it
> > is needed.  As few places as possible obviously.
> 
> Done. Also includes claudio@'s observation.

I think if there are no more header changes, this should be commited to
have wider testing. We are also just after tree unlock so it feels like
the right time, and since there is no library bump we can easily revert
if there is a need for that.



Re: fix pppx(4) with net/ifq.c rev 1.38

2020-05-30 Thread David Gwynne
On Mon, May 25, 2020 at 09:44:22AM +0200, Martin Pieuchot wrote:
> On 23/05/20(Sat) 15:38, Vitaliy Makkoveev wrote:
> > > On 23 May 2020, at 12:54, Martin Pieuchot  wrote:
> > > On 22/05/20(Fri) 13:25, Vitaliy Makkoveev wrote:
> > >> On Fri, May 22, 2020 at 07:57:13AM +1000, David Gwynne wrote:
> > >>> [...] 
> > >>> can you try the following diff?
> > >>> 
> > >> 
> > >> I tested this diff and it works for me. But the problem I pointed is
> > >> about pipex(4) locking.
> > >> 
> > >> pipex(4) requires NET_LOCK() be grabbed not only for underlaying
> > >> ip{,6}_output() but for itself too. But since pppac_start() has
> > >> unpredictable behavior I suggested to make it predictable [1].
> > > 
> > > What needs the NET_LOCK() in their?  We're talking about
> > > pipex_ppp_output(), right?  Does it really need the NET_LOCK() or
> > > the KERNEL_LOCK() is what protects those data structures?
> > 
> > Yes, about pipex_ppp_output() and pipex_output(). Except
> > ip{,6}_output() nothing requires NET_LOCK(). As David Gwynne pointed,
> > they can be replaced by ip{,6}_send().
> 
> Locks protect data structures, you're talking about functions, which
> data structures are serialized by this lock?  I'm questioning whether
> there is one.
> 
> > [...]
> > > In case of pipex(4) is isn't clear that the NET_LOCK() is necessary.
> > 
> > I guess, pipex(4) was wrapped by NET_LOCK() to protect it while it???s
> > accessed through `pr_input???. Is NET_LOCK() required for this case?
> 
> pipex(4) like all the network stack has been wrapped in the NET_LOCK()
> because it was easy to do.  That means it isn't a concious decision or
> design.  The fact that pipex(4) code runs under the NET_LOCK() is a side
> effect of how the rest of the stack evolved.  I'm questioning whether
> this lock is required there.  In theory it shouldn't.  What is the
> reality?

pipex and pppx pre-date the NET_LOCK, which means you can assume
that any implicit locking was and is done by the KERNEL_LOCK. mpi is
asking the right questions here.

As for the ifq maxlen difference between pppx and pppac, that's more
about when and how quickly they were written more than anything else.
The IFQ_SET_MAXLEN(>if_snd, 1) in pppx is because that's a way to
bypass transmit mitigation for pseudo/virtual interfaces. That was the
only way to do it historically. It is not an elegant hack to keep
hold of the NET_LOCK over a call to a start routine.

As a rule of thumb, network interface drivers should not (maybe
cannot) rely on the NET_LOCK in their if_start handlers. To be
clear, they should not rely on it being held by the network stack
when if_start is called because sometimes the stack calls it without
holding NET_LOCK, and they should not take it because they might
be called by the stack when it is being held.

Also, be aware that the ifq machinery makes sure that the start
routine is not called concurrently or recursively. You can queue
packets for transmission on an ifq from anywhere in the kernel at
any time, but only one cpu will run the start routine. Other cpus
can queue packets while another one is running if_start, but the
first one ends up responsible for trying to transmit it.

ifqs also take the KERNEL_LOCK before calling if_start if the interface
is not marked as IFXF_MPSAFE.

The summary is that pppx and pppac are not marked as mpsafe so their
start routines are called with KERNEL_LOCK held. Currently pppx
accidentally gets NET_LOCK because of the IFQ_SET_MAXLEN, but shouldn't
rely on it.

Cheers,
dlg