pkg_add.1

2020-06-23 Thread sven falempin
Dear readers,

It may not be very obvious that the 'dry run' mode of pkg_add
actually downloads packages.
It is a good feature, and maybe the pkg_add manual could use an
EXAMPLES section.

Index: pkg_add.1
===
RCS file: /cvs/src/usr.sbin/pkg_add/pkg_add.1,v
retrieving revision 1.163
diff -u -p -r1.163 pkg_add.1
--- pkg_add.1   24 Jan 2020 21:10:46 -  1.163
+++ pkg_add.1   23 Jun 2020 23:25:12 -
@@ -836,6 +836,18 @@ information about individual packages
 database of installed
 .Xr packages 7
 .El
+.Sh EXAMPLES
+It is possible to download packages before installing them, separating
+the download and installation steps.
+.Pp
+.Dl PKG_CACHE=/tmp/pkgs pkg_add -I -u -n
+.Pp
+will download the package(s) to be updated into the
+.Dq /tmp/pkgs
+directory, and they can be installed later with
+.Pp
+.Dl pkg_add -I /tmp/pkgs/*
+.Pp
 .Sh SEE ALSO
 .Xr ftp 1 ,
 .Xr pkg_create 1 ,

Best,


Re: [PATCH] net: prevent if_clone_destroy from racing with rest of stack

2020-06-23 Thread Jason A. Donenfeld
On Tue, Jun 23, 2020 at 8:21 AM Martin Pieuchot  wrote:
> I'd argue this is a related problem but a different one.  The diff I
> sent serializes cloning/destroying pseudo-interfaces.  It has value on
> its own because *all* if_clone_*() operations are now serialized.
>
> Now you correctly point out that it doesn't fix all the races in the
> ioctl path.  That's a fact.  However, without the information from the
> crashes it is hard to reason about the uncounted reference returned by
> ifunit().
>
> Taking a rwlock around all ioctl(2) can have effects that are hard to
> debug.

We're talking about the same resource and lookup structure, so
generally it makes sense to protect them the same way, right? Adding
and removing ifps, and inserting them into and removing them from the
list of ifps, all need to be synchronized with the read-only users of
those ifps. The other way to fix it would be to avoid a critical
section entirely by incrementing a refcount during the if_list lookup.

Either way, it sounds like the big problem you're pointing out with my
patch is that you fear some of those ioctls need to be callable from
contexts that cannot sleep, which means using an rwlock is wrong. It
doesn't look like the mutex spinlock has a r/w variant though. Or am I
missing that?

Jason



xhci(4): acknowledge interrupts before calling usb_schedsoftintr()

2020-06-23 Thread Patrick Wildt
Hi,

I had issues with a machine hanging on powerdown.  The issue is caused
by sd(4)'s suspend method trying to "power down" my umass(4) USB stick.

The symptom was that during powerdown, when running in "polling mode",
the first transaction (send command to power down to USB stick) works:
We enqueue a transfer, and then poll for an event in xhci(4).  On the
first transfer, we see an event.  On the second transfer, which is to
read the status from the USB stick, we get a timeout.  We poll for an
event, but we never see it.

"Polling" for an event in xhci(4) means checking its interrupt status
for *any* bit.  But, the interrupt status register never had one set
for the second transaction.  Using a USB debugger, one could see that
the second transaction actually completed, but we just did not get an
interrupt for that completed transfer.

The issue is actually in xhci(4)'s interrupt handler, which is also
called for the polling mode.  There we first acknowledge the pending
interrupts in the USB status register, then we call usb_schedsoftintr(),
and afterwards we acknowledge the interrupt manager regarding
"level-triggered" interrupts.

In polling mode, usb_schedsoftintr() calls the xhci_softintr() method
right away, which will dequeue an event from the event queue and thus
complete transfers.  The important aspect there is that dequeuing an
event also means touching xhci's registers to inform it that we have
dequeued an event.

In non-polling mode, usb_schedsoftintr() will only schedule a soft
interrupt, which means that with regard to "touching" the xhci hardware,
the first thing that happens is acknowledging the interrupt manager
bits.

Moving the call to usb_schedsoftintr() to be after the interrupt ACKs
resolves my problem.  With this change, the first thing that happens,
polling or not, is acknowledging the interrupts, with no other
register touching.  And that's also what Linux does: ACK first,
handle events later.

With this, the next xhci_poll() actually sees an interrupt and the
second transfer can succeed.  Thus my machine finally shuts down and
no longer hangs indefinitely.

Comments?

Patrick

diff --git a/sys/dev/usb/xhci.c b/sys/dev/usb/xhci.c
index 2d65208f3db..ba5ee56502c 100644
--- a/sys/dev/usb/xhci.c
+++ b/sys/dev/usb/xhci.c
@@ -624,13 +624,13 @@ xhci_intr1(struct xhci_softc *sc)
return (1);
}
 
-   XOWRITE4(sc, XHCI_USBSTS, intrs); /* Acknowledge */
-   usb_schedsoftintr(&sc->sc_bus);
-
-   /* Acknowledge PCI interrupt */
+   /* Acknowledge interrupts */
+   XOWRITE4(sc, XHCI_USBSTS, intrs);
intrs = XRREAD4(sc, XHCI_IMAN(0));
XRWRITE4(sc, XHCI_IMAN(0), intrs | XHCI_IMAN_INTR_PEND);
 
+   usb_schedsoftintr(&sc->sc_bus);
+
return (1);
 }
 



Re: top.1: Fix COMMAND description

2020-06-23 Thread Jason McIntyre
On Tue, Jun 23, 2020 at 09:42:06PM +0200, Klemens Nanni wrote:
> There simply is no code that adds angle brackets to swapped-out
> processes in the COMMAND column.
> 
> I double checked with a tiny VMM instance using 64M of RAM where
> ld(1) from the library_aslr script immediately hits swap: no <> around.
> 
> While here, mention that -C appends arguments.
> 
> Feedback? OK?
> 

that seems correct:

RCS file: /cvs/src/usr.bin/top/machine.c,v

revision 1.54
date: 2006/11/29 12:34:22;  author: miod;  state: Exp;  lines: +1 -10;
Do not test for processes being swapped out since this can't happen anymore.

=
Index: machine.c
===
RCS file: /cvs/src/usr.bin/top/machine.c,v
retrieving revision 1.53
retrieving revision 1.54
diff -u -r1.53 -r1.54
--- machine.c   20 Sep 2006 21:26:20 -  1.53
+++ machine.c   29 Nov 2006 12:34:22 -  1.54
@@ -1,4 +1,4 @@
-/* $OpenBSD: machine.c,v 1.53 2006/09/20 21:26:20 ray Exp $ */
+/* $OpenBSD: machine.c,v 1.54 2006/11/29 12:34:22 miod Exp $*/
 
 /*-
  * Copyright (c) 1994 Thorsten Lockert 
@@ -475,15 +475,6 @@
pp = *(hp->next_proc++);
hp->remaining--;
 
-   if ((pp->p_flag & P_INMEM) == 0) {
-   /*
-* Print swapped processes as <pname>
-*/
-   char buf[sizeof(pp->p_comm)];
-
-   (void) strlcpy(buf, pp->p_comm, sizeof(buf));
-   (void) snprintf(pp->p_comm, sizeof(pp->p_comm), "<%s>", buf);
-   }
cputime = (pp->p_uticks + pp->p_sticks + pp->p_iticks) / stathz;
 
/* calculate the base for cpu percentages */

ok for the (amended) diff.
jmc

> 
> Index: top.1
> ===
> RCS file: /cvs/src/usr.bin/top/top.1,v
> retrieving revision 1.73
> diff -u -p -r1.73 top.1
> --- top.1 7 Jan 2020 13:30:43 -   1.73
> +++ top.1 23 Jun 2020 19:33:04 -
> @@ -450,9 +450,9 @@ The number of system and user CPU second
>  The raw percentage of CPU usage and the default field on which the
>  display is sorted.
>  .It COMMAND
> -The name of the command that the process is currently running.
> -(If the process is swapped out, this column is enclosed by angle
> -brackets.)
> +The name (and arguments if
> +.Fl H
> +is specified) of the command that the process is currently running.
>  .El
>  .Sh ENVIRONMENT
>  .Bl -tag -width Ev
> 



Re: top.1: Fix COMMAND description

2020-06-23 Thread Klemens Nanni
On Tue, Jun 23, 2020 at 09:42:06PM +0200, Klemens Nanni wrote:
> There simply is no code that adds angle brackets to swapped-out
> processes in the COMMAND column.
> 
> I double checked with a tiny VMM instance using 64M of RAM where
> ld(1) from the library_aslr script immediately hits swap: no <> around.
> 
> While here, mention that -C appends arguments.
> 
> Feedback? OK?
Without fat fingering the Fl macro this time, thanks Matthew Martin!



Index: top.1
===
RCS file: /cvs/src/usr.bin/top/top.1,v
retrieving revision 1.73
diff -u -p -r1.73 top.1
--- top.1   7 Jan 2020 13:30:43 -   1.73
+++ top.1   23 Jun 2020 20:12:44 -
@@ -450,9 +450,9 @@ The number of system and user CPU second
 The raw percentage of CPU usage and the default field on which the
 display is sorted.
 .It COMMAND
-The name of the command that the process is currently running.
-(If the process is swapped out, this column is enclosed by angle
-brackets.)
+The name (and arguments if
+.Fl C
+is specified) of the command that the process is currently running.
 .El
 .Sh ENVIRONMENT
 .Bl -tag -width Ev



Re: SSE in kernel?

2020-06-23 Thread Philip Guenther
On Mon, Jun 22, 2020 at 9:12 PM  wrote:

> Are SSE instructions allowed in the AMD64 kernel?  Is #ifdef __SSE__
> a sufficient guard?
>
> I have a rasops32 putchar with SSE that is 2x faster.
>

As Bryan and Patrick noted: it's possible, but there are restrictions and
costs.

The main restriction is that the code must not permit a context-switch
between the fpu_kernel_enter() and fpu_kernel_exit() calls.  No taking an
rwlock or calling any of the sleep functions, for example.

If you're using more than the minimal level of SSE which is already
required by the kernel (for lfence, etc) then you should also check whether
the necessary extension bits are present in curcpu()->ci_feature_* and fall
back to the current code if not present.

The cost is that if the thread doing this isn't a system thread, then the
first fpu_kernel_enter() call after the userspace->kernel transition has to
save and reset the FPU registers (XSAVEOPT + XRSTOR on newish CPUs).  Every
fpu_kernel_exit(), regardless of thread type, resets them again (XRSTOR).

If the restriction isn't a problem and the cost of those is worth the gain,
then sure, go for it.  We already do it for AES stuff in the kernel, for
example.  c.f. /usr/src/sys/arch/amd64/amd64/aesni.c


Philip Guenther


top.1: Fix COMMAND description

2020-06-23 Thread Klemens Nanni
There simply is no code that adds angle brackets to swapped-out
processes in the COMMAND column.

I double checked with a tiny VMM instance using 64M of RAM where
ld(1) from the library_aslr script immediately hits swap: no <> around.

While here, mention that -C appends arguments.

Feedback? OK?


Index: top.1
===
RCS file: /cvs/src/usr.bin/top/top.1,v
retrieving revision 1.73
diff -u -p -r1.73 top.1
--- top.1   7 Jan 2020 13:30:43 -   1.73
+++ top.1   23 Jun 2020 19:33:04 -
@@ -450,9 +450,9 @@ The number of system and user CPU second
 The raw percentage of CPU usage and the default field on which the
 display is sorted.
 .It COMMAND
-The name of the command that the process is currently running.
-(If the process is swapped out, this column is enclosed by angle
-brackets.)
+The name (and arguments if
+.Fl H
+is specified) of the command that the process is currently running.
 .El
 .Sh ENVIRONMENT
 .Bl -tag -width Ev



Re: top: remove redundant NULL check

2020-06-23 Thread Todd C . Miller
On Tue, 23 Jun 2020 18:31:03 +0200, Klemens Nanni wrote:

> I'd like to remove a NULL check in get_process_info() for the sake of
> simplicity and to reflect that the process list is *always* sorted
> (default is "cpu"), even if not explicitly requested;  this makes it
> easier to argue about the code, imho.

Makes sense.  OK millert@

 - todd



top: remove redundant NULL check

2020-06-23 Thread Klemens Nanni
I'd like to remove a NULL check in get_process_info() for the sake of
simplicity and to reflect that the process list is *always* sorted
(default is "cpu"), even if not explicitly requested;  this makes it
easier to argue about the code, imho.

Details on why this check is never true:

get_process_info() is the function that obtains the process list and
sorts it using the compare function passed to get_process_info().

The function pointer is always picked from the proc_compares[] array,
indexed via the getorder() helper:

335 int
336 main(int argc, char *argv[])
337 {
...
400 /* determine sorting order index, if necessary */
401 if (order_name != NULL) {
402 if ((order_index = getorder(order_name)) == -1) {
403 new_message(MT_delayed,
404 " %s: unrecognized sorting order", order_name);
405 order_index = 0;
406 }
407 }
...
472 /*
473  *  main loop -- repeat while display count is positive or while it
474  *  indicates infinity (by being -1)
475  */
476 while ((displays == -1) || (displays-- > 0)) {
477 if (winchflag) {
...
513 /* get the current set of processes */
514 processes = get_process_info(_info, ,
515 proc_compares[order_index]);

This is the only call path to get_process_info() and thus qsort().

order_name is NULL initialized and only set during getopt() parsing iff
-o is used.

order_index is zero initialized and only updated through CMD_order,
the interactive `o' prompt, which uses getorder() to only update it iff
the given order is valid.

getorder() never yields an invalid index for order_names[], hence
proc_compares[order_index] is never NULL and order_index is never out of
bounds.


Feedback? OK?

Index: machine.c
===
RCS file: /cvs/src/usr.bin/top/machine.c,v
retrieving revision 1.102
diff -u -p -r1.102 machine.c
--- machine.c   6 Jan 2020 20:05:10 -   1.102
+++ machine.c   23 Jun 2020 15:27:48 -
@@ -491,10 +491,7 @@ get_process_info(struct system_info *si,
}
}
 
-   /* if requested, sort the "interesting" processes */
-   if (compare != NULL)
-   qsort((char *) pref, active_procs,
-   sizeof(struct kinfo_proc *), compare);
+   qsort((char *)pref, active_procs, sizeof(struct kinfo_proc *), compare);
/* remember active and total counts */
si->p_total = total_procs;
si->p_active = pref_len = active_procs;



Re: [PATCH] net: prevent if_clone_destroy from racing with rest of stack

2020-06-23 Thread Martin Pieuchot
On 23/06/20(Tue) 04:53, Jason A. Donenfeld wrote:
> On 6/23/20, Martin Pieuchot  wrote:
> > On 23/06/20(Tue) 01:00, Jason A. Donenfeld wrote:
> >> You can crash a system by running something like:
> >>
> >> for i in 1 2 3; do while true; do ifconfig bridge0 create& ifconfig
> >> bridge0 destroy& done& done
> >>
> >> This works with every type of interface I've tried. It appears that
> >> if_clone_destroy and if_clone_create race with other ioctls, which
> >> causes a variety of different UaFs or just general logic errors.
> >
> > Thanks for the report.  This is a long standing & known issue.
> >
> >> One common root cause appears to be that most ifioctl functions use
> >> ifunit() to find an interface by name, which traverses if_list. Writes
> >> to if_list are protected by a lock, but reads are apparently
> >> unprotected. There's also the question of the lifetime of the object
> >> returned from ifunit(). Most things that access 's if_list are
> >> done without locking, and even if those accesses were to be locked, the
> >> lock would be released before the object is no longer used, causing the
> >> UaF in that case as well.
> >
> > Any sleeping point between the first ifunit() and the end of `ifc_create'
> > or `ifc_destroy' respectively can lead to races.
> >
> >> This patch fixes the issue by making if_clone_{create,destroy} exclusive
> >> with all other ifioctls.
> >
> > Diff below achieves the same but moves the locking inside the if_clone*
> > functions such that consumers like tun(4), bridge(4) and switch(4) which
> > clone interfaces upon open(2) are also serialized.
> >
> > I also added a NET_ASSERT_UNLOCKED() in both functions because the new
> > lock must be grabbed before the NET_LOCK() in order to allow ifc_create
> > and ifc_destroy to manipulate data structures protected by this lock.
> >
> > Comments, ok?
> 
> Not okay. This is the first thing I tried, and it still races with
> ifioctl_get, causing UaF crashes. It's harder to trigger, but still
> possible after a few minutes of races. This structure here needs to be
> a read/write lock, which is what my original patch provides. (I guess
> I forgot to add the "ok?" epilogue to it.)

I'd argue this is a related problem but a different one.  The diff I
sent serializes cloning/destroying pseudo-interfaces.  It has value on
its own because *all* if_clone_*() operations are now serialized.

Now you correctly point out that it doesn't fix all the races in the
ioctl path.  That's a fact.  However, without the information from the
crashes it is hard to reason about the uncounted reference returned by
ifunit().

Taking a rwlock around all ioctl(2) can have effects that are hard to
debug.

> > Index: net/if.c
> > ===
> > RCS file: /cvs/src/sys/net/if.c,v
> > retrieving revision 1.610
> > diff -u -p -r1.610 if.c
> > --- net/if.c22 Jun 2020 09:45:13 -  1.610
> > +++ net/if.c23 Jun 2020 10:25:41 -
> > @@ -224,6 +224,7 @@ voidif_idxmap_remove(struct ifnet *);
> >
> >  TAILQ_HEAD(, ifg_group) ifg_head = TAILQ_HEAD_INITIALIZER(ifg_head);
> >
> > +struct rwlock if_clone_lock = RWLOCK_INITIALIZER("if_clone_lock");
> >  LIST_HEAD(, if_clone) if_cloners = LIST_HEAD_INITIALIZER(if_cloners);
> >  int if_cloners_count;
> >
> > @@ -1246,24 +1247,36 @@ if_clone_create(const char *name, int rd
> > struct ifnet *ifp;
> > int unit, ret;
> >
> > -   ifc = if_clone_lookup(name, &unit);
> > -   if (ifc == NULL)
> > -   return (EINVAL);
> > +   NET_ASSERT_UNLOCKED();
> > +   rw_enter_write(&if_clone_lock);
> >
> > -   if (ifunit(name) != NULL)
> > -   return (EEXIST);
> > +   ifc = if_clone_lookup(name, &unit);
> > +   if (ifc == NULL) {
> > +   ret = EINVAL;
> > +   goto out;
> > +   }
> > +   if (ifunit(name) != NULL) {
> > +   ret = EEXIST;
> > +   goto out;
> > +   }
> >
> > ret = (*ifc->ifc_create)(ifc, unit);
> > +   if (ret != 0)
> > +   goto out;
> >
> > -   if (ret != 0 || (ifp = ifunit(name)) == NULL)
> > -   return (ret);
> > +   ifp = ifunit(name);
> > +   if (ifp == NULL) {
> > +   ret = EAGAIN;
> > +   goto out;
> > +   }
> >
> > NET_LOCK();
> > if_addgroup(ifp, ifc->ifc_name);
> > if (rdomain != 0)
> > if_setrdomain(ifp, rdomain);
> > NET_UNLOCK();
> > -
> > +out:
> > +   rw_exit_write(&if_clone_lock);
> > return (ret);
> >  }
> >
> > @@ -1277,16 +1290,23 @@ if_clone_destroy(const char *name)
> > struct ifnet *ifp;
> > int ret;
> >
> > -   ifc = if_clone_lookup(name, NULL);
> > -   if (ifc == NULL)
> > -   return (EINVAL);
> > +   NET_ASSERT_UNLOCKED();
> > +   rw_enter_write(&if_clone_lock);
> >
> > +   ifc = if_clone_lookup(name, NULL);
> > +   if (ifc == NULL) {
> > +   ret = EINVAL;
> > +   goto out;
> > +   }
> > ifp = ifunit(name);
> > -   if (ifp == NULL)
> > -   return (ENXIO);
> > -
> > -   if 

Re: systat.1: Trim ":" description, support line kill character

2020-06-23 Thread Klemens Nanni
On Mon, Jun 22, 2020 at 08:57:43PM -0600, Theo de Raadt wrote:
> In OpenBSD, the erase character is ^?
> 
> ^H is accepted in a few places, like here (because of CTRL_H) but
> it is absolutely not the canonical tty 'character erase' character,
> which is implied in your text by placing it next to kill and the
> canonical tty werase default ^U
Right, ^H works here and there but erase is in fact ^?.

^U is usually line kill, not word erase, no?

> It was vague, and I guess OK.  But your increasing precision is making
> it wrong.
Just add ^U as canonical line kill and leave the manual as is.

Feedback? OK?


Index: engine.c
===
RCS file: /cvs/src/usr.bin/systat/engine.c,v
retrieving revision 1.25
diff -u -p -r1.25 engine.c
--- engine.c12 Jan 2020 20:51:08 -  1.25
+++ engine.c22 Jun 2020 13:39:51 -
@@ -1204,6 +1204,7 @@ cmd_keyboard(int ch)
break;
case 0x1b:
case CTRL_G:
+   case CTRL_U:
if (cmd_len > 0) {
cmdbuf[0] = '\0';
cmd_len = 0;
Index: engine.h
===
RCS file: /cvs/src/usr.bin/systat/engine.h,v
retrieving revision 1.12
diff -u -p -r1.12 engine.h
--- engine.h12 Jan 2020 20:51:08 -  1.12
+++ engine.h22 Jun 2020 13:39:52 -
@@ -36,6 +36,7 @@
 #define CTRL_L  12
 #define CTRL_N  14
 #define CTRL_P  16
+#define CTRL_U  21
 #define CTRL_V  22
 
 #define META_V  246



Adapt usbhidaction.1 example to sndio changes

2020-06-23 Thread Benjamin Baier
Hi,

adapt usbhidaction.1 example to sndio changes.

Greetings Ben

Index: usbhidaction.1
===
RCS file: /var/cvs/src/usr.bin/usbhidaction/usbhidaction.1,v
retrieving revision 1.14
diff -u -p -r1.14 usbhidaction.1
--- usbhidaction.1  20 Apr 2020 20:54:31 -  1.14
+++ usbhidaction.1  23 Jun 2020 12:06:01 -
@@ -105,20 +105,20 @@ master volume and muting of an
 .Xr azalia 4
 device using the multimedia keys on a Belkin USB keyboard.
 .Bd -literal -offset indent
-# The volume range is 0..255. Moving 8 volume steps each keypress
+# The volume range is 0..1. Moving 0.05 volume steps each keypress
 # moves quickly through the volume range but still has decent
 # granularity.
 Consumer:Volume_Increment 1
-   mixerctl -f $1 outputs.master=+8
+   sndioctl -f $1 output.level=+0.05
 Consumer:Volume_Decrement 1
-   mixerctl -f $1 outputs.master=-8
+   sndioctl -f $1 output.level=-0.05
 Consumer:Mute 1
-   mixerctl -f $1 outputs.master.mute=toggle
+   sndioctl -f $1 output.mute=!
 .Ed
 .Pp
 A sample invocation using this configuration would be
 .Bd -literal -offset indent
-$ usbhidaction -f /dev/uhid1 -c conf /dev/audioctl0
+$ usbhidaction -f /dev/uhid1 -c conf snd/0
 .Ed
 .Sh SEE ALSO
 .Xr usbhidctl 1 ,



umass(4): consistently use sc_xfer_flags for polling mode

2020-06-23 Thread Patrick Wildt
Hi,

when powering down, sd(4) will trigger a powerdown on its umass(4)
USB stick.  If the device fails to respond, for whatever reason, the
umass(4) code will try multiple reset mechanisms, and one of those uses
a control transfer.  Unfortunately the control transfer is not passed
the sc_xfer_flags, which are *only* used to supply USBD_SYNCHRONOUS
to allow the polling mode to work.

Without USBD_SYNCHRONOUS, umass_polled_transfer()'s call to
usbd_transfer() will immediately return, and it will never complete.

The code will return to scsi, where it will wait until the "cookie"
is cleared.  Since this is polling mode, there's no asynchronous
callback, and the cookie will never be cleared.  Thus we will msleep and
wait forever.

By also using sc->sc_xfer_flags on the control transfer, it will run
synchronously.  There's still another bug that happens when even more
transfers fail, since then umass_bbb_transfer()'s call to
umass_bbb_reset() will cause the SCSI done handler to be called a second
time, resulting in a panic.

But that's a bug in the state machine for error handling, which can be
fixed later on.  Also a panic during powerdown is better than hanging
indefinitely.

ok?

Patrick

diff --git a/sys/dev/usb/umass.c b/sys/dev/usb/umass.c
index 53d783ff396..f871a3d9c41 100644
--- a/sys/dev/usb/umass.c
+++ b/sys/dev/usb/umass.c
@@ -789,18 +789,18 @@ umass_setup_ctrl_transfer(struct umass_softc *sc, usb_device_request_t *req,
/* Initialise a USB control transfer and then schedule it */
 
usbd_setup_default_xfer(xfer, sc->sc_udev, (void *) sc,
-   USBD_DEFAULT_TIMEOUT, req, buffer, buflen, flags,
-   sc->sc_methods->wire_state);
+   USBD_DEFAULT_TIMEOUT, req, buffer, buflen,
+   flags | sc->sc_xfer_flags, sc->sc_methods->wire_state);
 
if (sc->sc_udev->bus->use_polling) {
DPRINTF(UDMASS_XFER,("%s: start polled ctrl xfer buffer=%p "
"buflen=%d flags=0x%x\n", sc->sc_dev.dv_xname, buffer,
-   buflen, flags));
+   buflen, flags | sc->sc_xfer_flags));
err = umass_polled_transfer(sc, xfer);
} else {
DPRINTF(UDMASS_XFER,("%s: start ctrl xfer buffer=%p buflen=%d "
"flags=0x%x\n", sc->sc_dev.dv_xname, buffer, buflen,
-   flags));
+   flags | sc->sc_xfer_flags));
err = usbd_transfer(xfer);
}
if (err && err != USBD_IN_PROGRESS) {



Re: SSE in kernel?

2020-06-23 Thread Bryan Steele
On Tue, Jun 23, 2020 at 01:03:18PM +0200, Patrick Wildt wrote:
> On Tue, Jun 23, 2020 at 06:51:20AM -0400, Bryan Steele wrote:
> > On Mon, Jun 22, 2020 at 11:10:10PM -0700, jo...@armadilloaerospace.com 
> > wrote:
> > > Are SSE instructions allowed in the AMD64 kernel?  Is #ifdef __SSE__
> > > a sufficient guard?
> > > 
> > > I have a rasops32 putchar with SSE that is 2x faster.
> > 
> > No, in general you cannot use FP instructions in the kernel; also, the
> > kernel is often compiled with -msoft-float on platforms that support it.
> 
> Exceptions are being made for amdgpu drm, where some of the files are
> compiled with sse enabled, though the code is guarded with
> fpu_kernel_enter().  DC_FP_START() and DC_FP_END() to be precise, which
> are macros pointing to the kernel functions.

^ patrick got there first. :-)



Re: SSE in kernel?

2020-06-23 Thread Bryan Steele
On Tue, Jun 23, 2020 at 06:51:22AM -0400, Bryan Steele wrote:
> On Mon, Jun 22, 2020 at 11:10:10PM -0700, jo...@armadilloaerospace.com wrote:
> > Are SSE instructions allowed in the AMD64 kernel?  Is #ifdef __SSE__
> > a sufficient guard?
> > 
> > I have a rasops32 putchar with SSE that is 2x faster.
> 
> No, in general you cannot use FP instructions in the kernel; also, the
> kernel is often compiled with -msoft-float on platforms that support it.

More specifically, using FP instructions in the kernel incurs a
performance penalty from having to save and restore the FP registers.
There are some helper functions for that in the very limited cases
where it's done; see fpu_kernel_enter/exit() in arch/amd64/amd64/fpu.c.
But these are i386/amd64-only.



Re: SSE in kernel?

2020-06-23 Thread Patrick Wildt
On Tue, Jun 23, 2020 at 06:51:20AM -0400, Bryan Steele wrote:
> On Mon, Jun 22, 2020 at 11:10:10PM -0700, jo...@armadilloaerospace.com wrote:
> > Are SSE instructions allowed in the AMD64 kernel?  Is #ifdef __SSE__
> > a sufficient guard?
> > 
> > I have a rasops32 putchar with SSE that is 2x faster.
> 
> No, in general you cannot use FP instructions in the kernel; also, the
> kernel is often compiled with -msoft-float on platforms that support it.

Exceptions are being made for amdgpu drm, where some of the files are
compiled with sse enabled, though the code is guarded with
fpu_kernel_enter().  DC_FP_START() and DC_FP_END() to be precise, which
are macros pointing to the kernel functions.



Re: [PATCH] net: prevent if_clone_destroy from racing with rest of stack

2020-06-23 Thread Jason A. Donenfeld
On 6/23/20, Martin Pieuchot  wrote:
> On 23/06/20(Tue) 01:00, Jason A. Donenfeld wrote:
>> You can crash a system by running something like:
>>
>> for i in 1 2 3; do while true; do ifconfig bridge0 create& ifconfig
>> bridge0 destroy& done& done
>>
>> This works with every type of interface I've tried. It appears that
>> if_clone_destroy and if_clone_create race with other ioctls, which
>> causes a variety of different UaFs or just general logic errors.
>
> Thanks for the report.  This is a long standing & known issue.
>
>> One common root cause appears to be that most ifioctl functions use
>> ifunit() to find an interface by name, which traverses if_list. Writes
>> to if_list are protected by a lock, but reads are apparently
> unprotected. There's also the question of the lifetime of the object
>> returned from ifunit(). Most things that access 's if_list are
>> done without locking, and even if those accesses were to be locked, the
>> lock would be released before the object is no longer used, causing the
>> UaF in that case as well.
>
> Any sleeping point between the first ifunit() and the end of `ifc_create'
> or `ifc_destroy' respectively can lead to races.
>
>> This patch fixes the issue by making if_clone_{create,destroy} exclusive
>> with all other ifioctls.
>
> Diff below achieves the same but moves the locking inside the if_clone*
> functions such that consumers like tun(4), bridge(4) and switch(4) which
> clone interfaces upon open(2) are also serialized.
>
> I also added a NET_ASSERT_UNLOCKED() in both functions because the new
> lock must be grabbed before the NET_LOCK() in order to allow ifc_create
> and ifc_destroy to manipulate data structures protected by this lock.
>
> Comments, ok?

Not okay. This is the first thing I tried, and it still races with
ifioctl_get, causing UaF crashes. It's harder to trigger, but still
possible after a few minutes of races. This structure here needs to be
a read/write lock, which is what my original patch provides. (I guess
I forgot to add the "ok?" epilogue to it.)


>
> Index: net/if.c
> ===
> RCS file: /cvs/src/sys/net/if.c,v
> retrieving revision 1.610
> diff -u -p -r1.610 if.c
> --- net/if.c  22 Jun 2020 09:45:13 -  1.610
> +++ net/if.c  23 Jun 2020 10:25:41 -
> @@ -224,6 +224,7 @@ void  if_idxmap_remove(struct ifnet *);
>
>  TAILQ_HEAD(, ifg_group) ifg_head = TAILQ_HEAD_INITIALIZER(ifg_head);
>
> +struct rwlock if_clone_lock = RWLOCK_INITIALIZER("if_clone_lock");
>  LIST_HEAD(, if_clone) if_cloners = LIST_HEAD_INITIALIZER(if_cloners);
>  int if_cloners_count;
>
> @@ -1246,24 +1247,36 @@ if_clone_create(const char *name, int rd
>   struct ifnet *ifp;
>   int unit, ret;
>
> - ifc = if_clone_lookup(name, &unit);
> - if (ifc == NULL)
> - return (EINVAL);
> + NET_ASSERT_UNLOCKED();
> + rw_enter_write(&if_clone_lock);
>
> - if (ifunit(name) != NULL)
> - return (EEXIST);
> + ifc = if_clone_lookup(name, &unit);
> + if (ifc == NULL) {
> + ret = EINVAL;
> + goto out;
> + }
> + if (ifunit(name) != NULL) {
> + ret = EEXIST;
> + goto out;
> + }
>
>   ret = (*ifc->ifc_create)(ifc, unit);
> + if (ret != 0)
> + goto out;
>
> - if (ret != 0 || (ifp = ifunit(name)) == NULL)
> - return (ret);
> + ifp = ifunit(name);
> + if (ifp == NULL) {
> + ret = EAGAIN;
> + goto out;
> + }
>
>   NET_LOCK();
>   if_addgroup(ifp, ifc->ifc_name);
>   if (rdomain != 0)
>   if_setrdomain(ifp, rdomain);
>   NET_UNLOCK();
> -
> +out:
> + rw_exit_write(&if_clone_lock);
>   return (ret);
>  }
>
> @@ -1277,16 +1290,23 @@ if_clone_destroy(const char *name)
>   struct ifnet *ifp;
>   int ret;
>
> - ifc = if_clone_lookup(name, NULL);
> - if (ifc == NULL)
> - return (EINVAL);
> + NET_ASSERT_UNLOCKED();
> + rw_enter_write(&if_clone_lock);
>
> + ifc = if_clone_lookup(name, NULL);
> + if (ifc == NULL) {
> + ret = EINVAL;
> + goto out;
> + }
>   ifp = ifunit(name);
> - if (ifp == NULL)
> - return (ENXIO);
> -
> - if (ifc->ifc_destroy == NULL)
> - return (EOPNOTSUPP);
> + if (ifp == NULL) {
> + ret = ENXIO;
> + goto out;
> + }
> + if (ifc->ifc_destroy == NULL) {
> + ret = EOPNOTSUPP;
> + goto out;
> + }
>
>   NET_LOCK();
>   if (ifp->if_flags & IFF_UP) {
> @@ -1297,7 +1317,8 @@ if_clone_destroy(const char *name)
>   }
>   NET_UNLOCK();
>   ret = (*ifc->ifc_destroy)(ifp);
> -
> +out:
> + rw_exit_write(&if_clone_lock);
>   return (ret);
>  }
>
>



Re: SSE in kernel?

2020-06-23 Thread Bryan Steele
On Mon, Jun 22, 2020 at 11:10:10PM -0700, jo...@armadilloaerospace.com wrote:
> Are SSE instructions allowed in the AMD64 kernel?  Is #ifdef __SSE__
> a sufficient guard?
> 
> I have a rasops32 putchar with SSE that is 2x faster.

No, in general you cannot use FP instructions in the kernel; also, the
kernel is often compiled with -msoft-float on platforms that support it.



Re: [PATCH] net: prevent if_clone_destroy from racing with rest of stack

2020-06-23 Thread Martin Pieuchot
On 23/06/20(Tue) 01:00, Jason A. Donenfeld wrote:
> You can crash a system by running something like:
> 
> for i in 1 2 3; do while true; do ifconfig bridge0 create& ifconfig 
> bridge0 destroy& done& done
> 
> This works with every type of interface I've tried. It appears that
> if_clone_destroy and if_clone_create race with other ioctls, which
> causes a variety of different UaFs or just general logic errors.

Thanks for the report.  This is a long standing & known issue.
 
> One common root cause appears to be that most ifioctl functions use
> ifunit() to find an interface by name, which traverses if_list. Writes
> to if_list are protected by a lock, but reads are apparently
> unprotected. There's also the question of the lifetime of the object
> returned from ifunit(). Most things that access 's if_list are
> done without locking, and even if those accesses were to be locked, the
> lock would be released before the object is no longer used, causing the
> UaF in that case as well.

Any sleeping point between the first ifunit() and the end of `ifc_create'
or `ifc_destroy' respectively can lead to races.

> This patch fixes the issue by making if_clone_{create,destroy} exclusive
> with all other ifioctls.

Diff below achieves the same but moves the locking inside the if_clone*
functions such that consumers like tun(4), bridge(4) and switch(4) which
clone interfaces upon open(2) are also serialized.

I also added a NET_ASSERT_UNLOCKED() in both functions because the new
lock must be grabbed before the NET_LOCK() in order to allow ifc_create
and ifc_destroy to manipulate data structures protected by this lock.

Comments, ok?

Index: net/if.c
===
RCS file: /cvs/src/sys/net/if.c,v
retrieving revision 1.610
diff -u -p -r1.610 if.c
--- net/if.c22 Jun 2020 09:45:13 -  1.610
+++ net/if.c23 Jun 2020 10:25:41 -
@@ -224,6 +224,7 @@ voidif_idxmap_remove(struct ifnet *);
 
 TAILQ_HEAD(, ifg_group) ifg_head = TAILQ_HEAD_INITIALIZER(ifg_head);
 
+struct rwlock if_clone_lock = RWLOCK_INITIALIZER("if_clone_lock");
 LIST_HEAD(, if_clone) if_cloners = LIST_HEAD_INITIALIZER(if_cloners);
 int if_cloners_count;
 
@@ -1246,24 +1247,36 @@ if_clone_create(const char *name, int rd
struct ifnet *ifp;
int unit, ret;
 
-   ifc = if_clone_lookup(name, &unit);
-   if (ifc == NULL)
-   return (EINVAL);
+   NET_ASSERT_UNLOCKED();
+   rw_enter_write(&if_clone_lock);
 
-   if (ifunit(name) != NULL)
-   return (EEXIST);
+   ifc = if_clone_lookup(name, &unit);
+   if (ifc == NULL) {
+   ret = EINVAL;
+   goto out;
+   }
+   if (ifunit(name) != NULL) {
+   ret = EEXIST;
+   goto out;
+   }
 
ret = (*ifc->ifc_create)(ifc, unit);
+   if (ret != 0)
+   goto out;
 
-   if (ret != 0 || (ifp = ifunit(name)) == NULL)
-   return (ret);
+   ifp = ifunit(name);
+   if (ifp == NULL) {
+   ret = EAGAIN;
+   goto out;
+   }
 
NET_LOCK();
if_addgroup(ifp, ifc->ifc_name);
if (rdomain != 0)
if_setrdomain(ifp, rdomain);
NET_UNLOCK();
-
+out:
+   rw_exit_write(&if_clone_lock);
return (ret);
 }
 
@@ -1277,16 +1290,23 @@ if_clone_destroy(const char *name)
struct ifnet *ifp;
int ret;
 
-   ifc = if_clone_lookup(name, NULL);
-   if (ifc == NULL)
-   return (EINVAL);
+   NET_ASSERT_UNLOCKED();
+   rw_enter_write(&if_clone_lock);
 
+   ifc = if_clone_lookup(name, NULL);
+   if (ifc == NULL) {
+   ret = EINVAL;
+   goto out;
+   }
ifp = ifunit(name);
-   if (ifp == NULL)
-   return (ENXIO);
-
-   if (ifc->ifc_destroy == NULL)
-   return (EOPNOTSUPP);
+   if (ifp == NULL) {
+   ret = ENXIO;
+   goto out;
+   }
+   if (ifc->ifc_destroy == NULL) {
+   ret = EOPNOTSUPP;
+   goto out;
+   }
 
NET_LOCK();
if (ifp->if_flags & IFF_UP) {
@@ -1297,7 +1317,8 @@ if_clone_destroy(const char *name)
}
NET_UNLOCK();
ret = (*ifc->ifc_destroy)(ifp);
-
+out:
+   rw_exit_write(&if_clone_lock);
return (ret);
 }
 



Make *select(2) use kqfilter handlers

2020-06-23 Thread Martin Pieuchot
Diff below can be seen as 3 logical parts that together change the
current *select(2) implementation:

  - Create & destroy a per-thread kqueue in fork1() and exit1().

  - Change the kqueue_scan() interface to keep track of the end point of
a scan, this is mostly from visa@.

  - Change dopselect() to translate inputs "fd_set" into "struct kevent",
register them via kqueue_register(), wait for events via kqueue_scan()
then translate them back.

A visible change is that programs waiting in *select(2) will now sleep
on the "kqread" or "kqsel" channel.

Please test, comment and report back.

Index: kern/kern_event.c
===
RCS file: /cvs/src/sys/kern/kern_event.c,v
retrieving revision 1.140
diff -u -p -r1.140 kern_event.c
--- kern/kern_event.c   22 Jun 2020 13:14:32 -  1.140
+++ kern/kern_event.c   23 Jun 2020 08:44:47 -
@@ -64,9 +64,6 @@ void  KQREF(struct kqueue *);
 void   KQRELE(struct kqueue *);
 
int	kqueue_sleep(struct kqueue *, struct timespec *);
-int	kqueue_scan(struct kqueue *kq, int maxevents,
-   struct kevent *ulistp, struct timespec *timeout,
-   struct proc *p, int *retval);
 
int	kqueue_read(struct file *, struct uio *, int);
int	kqueue_write(struct file *, struct uio *, int);
@@ -521,6 +518,14 @@ kqueue_alloc(struct filedesc *fdp)
return (kq);
 }
 
+void
+kqueue_exit(struct proc *p)
+{
+   kqueue_terminate(p, p->p_kq);
+   kqueue_free(p->p_kq);
+   p->p_kq = NULL;
+}
+
 int
 sys_kqueue(struct proc *p, void *v, register_t *retval)
 {
@@ -554,6 +559,7 @@ out:
 int
 sys_kevent(struct proc *p, void *v, register_t *retval)
 {
+   struct kqueue_scan_state scan;
struct filedesc* fdp = p->p_fd;
struct sys_kevent_args /* {
syscallarg(int) fd;
@@ -569,6 +575,7 @@ sys_kevent(struct proc *p, void *v, regi
struct timespec ts;
struct timespec *tsp = NULL;
int i, n, nerrors, error;
+   int ready, total;
struct kevent kev[KQ_NEVENTS];
 
if ((fp = fd_getfile(fdp, SCARG(uap, fd))) == NULL)
@@ -597,9 +604,9 @@ sys_kevent(struct proc *p, void *v, regi
kq = fp->f_data;
nerrors = 0;
 
-   while (SCARG(uap, nchanges) > 0) {
-   n = SCARG(uap, nchanges) > KQ_NEVENTS ?
-   KQ_NEVENTS : SCARG(uap, nchanges);
+   while ((n = SCARG(uap, nchanges)) > 0) {
+   if (n > nitems(kev))
+   n = nitems(kev);
error = copyin(SCARG(uap, changelist), kev,
n * sizeof(struct kevent));
if (error)
@@ -635,12 +642,39 @@ sys_kevent(struct proc *p, void *v, regi
goto done;
}
 
+
KQREF(kq);
FRELE(fp, p);
-   error = kqueue_scan(kq, SCARG(uap, nevents), SCARG(uap, eventlist),
-   tsp, p, &n);
+   /*
+* Collect as many events as we can.  The timeout on successive
+* loops is disabled (kqueue_scan() becomes non-blocking).
+*/
+   total = 0;
+   error = 0;
+   kqueue_scan_setup(&scan, kq);
+   while ((n = SCARG(uap, nevents) - total) > 0) {
+   if (n > nitems(kev))
+   n = nitems(kev);
+   ready = kqueue_scan(&scan, n, kev, tsp, p, &error);
+   if (ready == 0)
+   break;
+   error = copyout(kev, SCARG(uap, eventlist) + total,
+   sizeof(struct kevent) * ready);
+#ifdef KTRACE
+   if (KTRPOINT(p, KTR_STRUCT))
+   ktrevent(p, kev, ready);
+#endif
+   total += ready;
+   if (error || ready < n)
+   break;
+   tsp = &ts;  /* successive loops non-blocking */
+   timespecclear(tsp);
+   }
+   kqueue_scan_finish(&scan);
KQRELE(kq);
-   *retval = n;
+   if (error == EWOULDBLOCK)
+   error = 0;
+   *retval = total;
return (error);
 
  done:
@@ -894,32 +928,32 @@ kqueue_sleep(struct kqueue *kq, struct t
return (error);
 }
 
+/*
+ * Scan the kqueue, blocking if necessary until the target time is reached.
+ * If tsp is NULL we block indefinitely.  If tsp->ts_secs/nsecs are both
+ * 0 we do not block at all.
+ */
 int
-kqueue_scan(struct kqueue *kq, int maxevents, struct kevent *ulistp,
-struct timespec *tsp, struct proc *p, int *retval)
+kqueue_scan(struct kqueue_scan_state *scan, int maxevents,
+struct kevent *kevp, struct timespec *tsp, struct proc *p, int *errorp)
 {
-   struct kevent *kevp;
-   struct knote mend, mstart, *kn;
+   struct knote *kn;
+   struct kqueue *kq = scan->kqs_kq;
int s, count, nkev = 0, error = 0;
-   struct kevent kev[KQ_NEVENTS];
 
count = maxevents;
if (count == 0)
goto done;
-
-   memset(&mstart, 0, sizeof(mstart));
-   memset(&mend, 0, sizeof(mend));
-
 retry:
 

Rename iev_ldpe to iev_lde in ldpd(8)'s lde.c

2020-06-23 Thread wlund


Dear Tech,

By applying this patch, you might save someone from confusing the renamed
variable with iev_ldpe in ldpd.c.

diff refs/remotes/origin/master 8c512ccc39fae85e26c6bf0b4b62aa7980809163
blob - 1bfd701a39082259d0913467704bee5191aa8714
blob + b284033f891ef1bdb1d4d5fe23129e252e1726ec
--- usr.sbin/ldpd/lde.c
+++ usr.sbin/ldpd/lde.c
@@ -62,7 +62,7 @@ RB_GENERATE(nbr_tree, lde_nbr, entry, lde_nbr_compare)
 struct ldpd_conf   *ldeconf;
struct nbr_tree lde_nbrs = RB_INITIALIZER(&lde_nbrs);
 
-static struct imsgev   *iev_ldpe;
+static struct imsgev   *iev_lde;
 static struct imsgev   *iev_main;
 
 /* ARGSUSED */
@@ -152,8 +152,8 @@ static __dead void
 lde_shutdown(void)
 {
/* close pipes */
-   msgbuf_clear(&iev_ldpe->ibuf.w);
-   close(iev_ldpe->ibuf.fd);
+   msgbuf_clear(&iev_lde->ibuf.w);
+   close(iev_lde->ibuf.fd);
msgbuf_clear(&iev_main->ibuf.w);
close(iev_main->ibuf.fd);
 
@@ -163,7 +163,7 @@ lde_shutdown(void)
 
config_clear(ldeconf);
 
-   free(iev_ldpe);
+   free(iev_lde);
free(iev_main);
 
log_info("label decision engine exiting");
@@ -181,7 +181,7 @@ int
 lde_imsg_compose_ldpe(int type, uint32_t peerid, pid_t pid, void *data,
 uint16_t datalen)
 {
-   return (imsg_compose_event(iev_ldpe, type, peerid, pid,
+   return (imsg_compose_event(iev_lde, type, peerid, pid,
 -1, data, datalen));
 }
 
@@ -450,7 +450,7 @@ lde_dispatch_parent(int fd, short event, void *bula)
}
break;
case IMSG_SOCKET_IPC:
-   if (iev_ldpe) {
+   if (iev_lde) {
log_warnx("%s: received unexpected imsg fd "
"to ldpe", __func__);
break;
@@ -461,14 +461,14 @@ lde_dispatch_parent(int fd, short event, void *bula)
break;
}
 
-   if ((iev_ldpe = malloc(sizeof(struct imsgev))) == NULL)
+   if ((iev_lde = malloc(sizeof(struct imsgev))) == NULL)
fatal(NULL);
-   imsg_init(&iev_ldpe->ibuf, fd);
-   iev_ldpe->handler = lde_dispatch_imsg;
-   iev_ldpe->events = EV_READ;
-   event_set(&iev_ldpe->ev, iev_ldpe->ibuf.fd,
-   iev_ldpe->events, iev_ldpe->handler, iev_ldpe);
-   event_add(&iev_ldpe->ev, NULL);
+   imsg_init(&iev_lde->ibuf, fd);
+   iev_lde->handler = lde_dispatch_imsg;
+   iev_lde->events = EV_READ;
+   event_set(&iev_lde->ev, iev_lde->ibuf.fd,
+   iev_lde->events, iev_lde->handler, iev_lde);
+   event_add(&iev_lde->ev, NULL);
break;
case IMSG_RECONF_CONF:
if ((nconf = malloc(sizeof(struct ldpd_conf))) ==



Re: pipe: reduce number of allocations

2020-06-23 Thread Anton Lindqvist
On Tue, Jun 16, 2020 at 09:10:54PM +0200, Anton Lindqvist wrote:
> Hi,
> Instead of performing three distinct allocations per created pipe,
> reduce it to a single one. Not only should this be more performant, it
> also solves a kqueue related issue found by visa@ who also requested
> this change:
> 
> > If you attach an EVFILT_WRITE filter to a pipe fd, the knote gets added
> > to the peer's klist. This is a problem for kqueue  because if you close
> > the peer's fd, the knote is left in the list whose head is about to be
> > freed. knote_fdclose() is not able to clear the knote because it is not
> > registered with the peer's fd.
> 
> FreeBSD also takes a similar approach to pipe allocations.
> 
> Comments? OK?

This was backed out since it contained a bug. Updated diff below in which
pipe_buffer_free() is called while holding the pipe lock in
pipe_destroy(). This prevents one thread from freeing the pipe_pair
structure while another thread is sleeping inside pipe_buffer_free().

Note that pipe_buffer_free() is also called from pipe_buffer_realloc().
The pipe is then either being created, i.e. we have exclusive access at
this point or the pipe lock is already acquired.

Comments? OK?

Index: kern/sys_pipe.c
===
RCS file: /cvs/src/sys/kern/sys_pipe.c,v
retrieving revision 1.122
diff -u -p -r1.122 sys_pipe.c
--- kern/sys_pipe.c 19 Jun 2020 02:08:48 -  1.122
+++ kern/sys_pipe.c 22 Jun 2020 19:23:56 -
@@ -49,6 +49,12 @@
 
 #include 
 
+struct pipe_pair {
+   struct pipe pp_wpipe;
+   struct pipe pp_rpipe;
+   struct rwlock pp_lock;
+};
+
 /*
  * interfaces to the outside world
  */
@@ -103,13 +109,12 @@ const struct filterops pipe_wfiltops = {
 unsigned int nbigpipe;
 static unsigned int amountpipekva;
 
-struct pool pipe_pool;
-struct pool pipe_lock_pool;
+struct pool pipe_pair_pool;
 
int	dopipe(struct proc *, int *, int);
 void   pipeselwakeup(struct pipe *);
 
-struct pipe *pipe_create(void);
+int	pipe_create(struct pipe *);
 void   pipe_destroy(struct pipe *);
int	pipe_rundown(struct pipe *);
 struct pipe *pipe_peer(struct pipe *);
@@ -120,6 +125,8 @@ int pipe_iolock(struct pipe *);
 void   pipe_iounlock(struct pipe *);
int	pipe_iosleep(struct pipe *, const char *);
 
+struct pipe_pair *pipe_pair_create(void);
+
 /*
  * The pipe system call for the DTYPE_PIPE type of pipes
  */
@@ -153,33 +160,17 @@ dopipe(struct proc *p, int *ufds, int fl
 {
struct filedesc *fdp = p->p_fd;
struct file *rf, *wf;
+   struct pipe_pair *pp;
struct pipe *rpipe, *wpipe = NULL;
-   struct rwlock *lock;
int fds[2], cloexec, error;
 
cloexec = (flags & O_CLOEXEC) ? UF_EXCLOSE : 0;
 
-   if ((rpipe = pipe_create()) == NULL) {
-   error = ENOMEM;
-   goto free1;
-   }
-
-   /*
-* One lock is used per pipe pair in order to obtain exclusive access to
-* the pipe pair.
-*/
-   lock = pool_get(&pipe_lock_pool, PR_WAITOK);
-   rw_init(lock, "pipelk");
-   rpipe->pipe_lock = lock;
-
-   if ((wpipe = pipe_create()) == NULL) {
-   error = ENOMEM;
-   goto free1;
-   }
-   wpipe->pipe_lock = lock;
-
-   rpipe->pipe_peer = wpipe;
-   wpipe->pipe_peer = rpipe;
+   pp = pipe_pair_create();
+   if (pp == NULL)
+   return (ENOMEM);
+   wpipe = &pp->pp_wpipe;
+   rpipe = &pp->pp_rpipe;
 
fdplock(fdp);
 
@@ -226,7 +217,6 @@ free3:
rpipe = NULL;
 free2:
fdpunlock(fdp);
-free1:
pipe_destroy(wpipe);
pipe_destroy(rpipe);
return (error);
@@ -272,19 +262,14 @@ pipe_buffer_realloc(struct pipe *cpipe, 
 /*
  * initialize and allocate VM and memory for pipe
  */
-struct pipe *
-pipe_create(void)
+int
+pipe_create(struct pipe *cpipe)
 {
-   struct pipe *cpipe;
int error;
 
-   cpipe = pool_get(&pipe_pool, PR_WAITOK | PR_ZERO);
-
error = pipe_buffer_realloc(cpipe, PIPE_SIZE);
-   if (error != 0) {
-   pool_put(&pipe_pool, cpipe);
-   return (NULL);
-   }
+   if (error != 0)
+   return (error);
 
sigio_init(&cpipe->pipe_sigio);
 
@@ -292,7 +277,7 @@ pipe_create(void)
cpipe->pipe_atime = cpipe->pipe_ctime;
cpipe->pipe_mtime = cpipe->pipe_ctime;
 
-   return (cpipe);
+   return (0);
 }
 
 struct pipe *
@@ -834,7 +819,6 @@ void
 pipe_destroy(struct pipe *cpipe)
 {
struct pipe *ppipe;
-   struct rwlock *lock = NULL;
 
if (cpipe == NULL)
return;
@@ -862,20 +846,14 @@ pipe_destroy(struct pipe *cpipe)
ppipe->pipe_state |= PIPE_EOF;
wakeup(ppipe);
ppipe->pipe_peer = NULL;
-   } else {
-   /*
-* Peer already gone. This is last reference to the pipe lock
-* and it must therefore be freed below.
-*/
-   

[PATCH] net: prevent if_clone_destroy from racing with rest of stack

2020-06-23 Thread Jason A. Donenfeld
You can crash a system by running something like:

for i in 1 2 3; do while true; do ifconfig bridge0 create& ifconfig bridge0 destroy& done& done

This works with every type of interface I've tried. It appears that
if_clone_destroy and if_clone_create race with other ioctls, which
causes a variety of different UaFs or just general logic errors.

One common root cause appears to be that most ifioctl functions use
ifunit() to find an interface by name, which traverses if_list. Writes
to if_list are protected by a lock, but reads are apparently
unprotected. There's also the question of the lifetime of the object
returned from ifunit(). Most things that access &ifnet's if_list are
done without locking, and even if those accesses were to be locked, the
lock would be released before the object is no longer used, causing the
UaF in that case as well.

This patch fixes the issue by making if_clone_{create,destroy} exclusive
with all other ifioctls.
---
 sys/net/if.c | 36 +++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git sys/net/if.c sys/net/if.c
index fb2f86f4a7c..9eedea83d32 100644
--- sys/net/if.c
+++ sys/net/if.c
@@ -246,6 +246,11 @@ struct task if_input_task_locked = TASK_INITIALIZER(if_netisr, NULL);
  */
 struct rwlock netlock = RWLOCK_INITIALIZER("netlock");
 
+/*
+ * Protect modifications to objects owned by ifnet's if_list
+ */
+struct rwlock iflist_obj_lock = RWLOCK_INITIALIZER("iflist_obj_lock");
+
 /*
  * Network interface utility routines.
  */
@@ -1905,7 +1910,7 @@ if_setrdomain(struct ifnet *ifp, int rdomain)
  * Interface ioctls.
  */
 int
-ifioctl(struct socket *so, u_long cmd, caddr_t data, struct proc *p)
+ifioctl_unlocked(struct socket *so, u_long cmd, caddr_t data, struct proc *p)
 {
struct ifnet *ifp;
struct ifreq *ifr = (struct ifreq *)data;
@@ -2432,6 +2437,35 @@ ifioctl_get(u_long cmd, caddr_t data)
return (error);
 }
 
+int
+ifioctl(struct socket *so, u_long cmd, caddr_t data, struct proc *p)
+{
+   int ret;
+
+   switch (cmd) {
+   case SIOCIFCREATE:
+   case SIOCIFDESTROY:
+   rw_enter_write(&iflist_obj_lock);
+   break;
+   default:
+   rw_enter_read(&iflist_obj_lock);
+   }
+
+   ret = ifioctl_unlocked(so, cmd, data, p);
+
+   switch (cmd) {
+   case SIOCIFCREATE:
+   case SIOCIFDESTROY:
+   rw_exit_write(&iflist_obj_lock);
+   break;
+   default:
+   rw_exit_read(&iflist_obj_lock);
+   }
+
+   return (ret);
+}
+
+
 static int
 if_sffpage_check(const caddr_t data)
 {
-- 
2.27.0



SSE in kernel?

2020-06-23 Thread johnc
Are SSE instructions allowed in the AMD64 kernel?  Is #ifdef __SSE__
a sufficient guard?

I have a rasops32 putchar with SSE that is 2x faster.