Re: bpf(4): remove ticks

2020-12-28 Thread David Gwynne
On Mon, Dec 28, 2020 at 06:56:08PM -0600, Scott Cheloha wrote:
> On Mon, Dec 28, 2020 at 10:49:59AM +1000, David Gwynne wrote:
> > On Sat, Dec 26, 2020 at 04:48:23PM -0600, Scott Cheloha wrote:
> > > Now that we've removed bd_rdStart from the bpf_d struct, removing
> > > ticks from bpf(4) itself is straightforward.
> > > 
> > > - bd_rtout becomes a timespec; update bpfioctl() accordingly.
> > >   Cap it at MAXTSLP nanoseconds to avoid arithmetic overflow
> > >   in bpfread().
> > > 
> > > - At the start of bpfread(), if a timeout is set, find the end
> > >   of the read as an absolute uptime.  This is the point where
> > >   we want to avoid overflow: if bd_rtout is only MAXTSLP
> > >   nanoseconds the timespecadd(3) will effectively never overflow.
> > > 
> > > - Before going to sleep, if we have a timeout set, compute how
> > >   much longer to sleep in nanoseconds.
> > > 
> > >   Here's a spot where an absolute timeout sleep would save a
> > >   little code, but we don't have such an interface yet.  Worth
> > >   keeping in mind for the future, though.
> > 
> > Are there any other places that would be useful though? bpf is pretty
> > special.
> 
> kqueue_scan() in kern_event.c can have a spurious wakeup, so an
> absolute sleep would be useful there.  doppoll() in sys_generic.c is
> the same, though I think mpi@/visa@ intend to refactor it to use
> kqueue_scan().
> 
> In general, if you have a thread that wants to do something on a
> strict period you need to use an absolute sleep to avoid drift.

True, something to keep in mind then.

> This code drifts:
> 
>   for (;;) {
>   do_work();
>   tsleep_nsec(, PPAUSE, "worker", SEC_TO_NSEC(1));
>   }
> 
> While this code will not:
> 
>   uint64_t deadline;
> 
>   deadline = nsecuptime();
>   for (;;) {
>   do_work();
>   deadline = nsec_advance(deadline, SEC_TO_NSEC(1));
>   tsleep_abs_nsec(, PPAUSE, "worker", deadline);
>   }
> 
> (Some of those interfaces don't actually exist, but they are easy to
> write and you can infer how they work.)
> 
> Most developers probably do not care about maintaining a strict period
> for periodic workloads, but I have a suspicion that it would keep
> system performance more deterministic because you don't have various
> periodic workloads drifting into one another, overlapping, and
> momentarily causing utilization spikes.
> 
> I know that probably sounds far-fetched... it's an idea I've been
> fussing with.
> 
> > > dlg@: You said you wanted to simplify this loop.  Unsure what shape
> > > that cleanup would take.  Should we clean it up before this change?
> > 
> > I wanted to have a single msleep_nsec call and pass INFSLP when it
> > should sleep forever.. You saw my first attempt at that. It had
> > issues.
> > 
> > > Thoughts?  ok?
> > 
> > How would this look if you used a uint64_t for nsecs for bd_rtout,
> > and the nsec uptime thing from your pool diff instead of timespecs
> > and nanouptime?
> 
> See the attached patch.  It is shorter because we can do more inline
> stuff with a uint64_t than with a timespec.

I like it.

> 
> > What's the thinking behind nanouptime instead of getnanouptime?
> 
> In general we should prefer high resolution time unless there is a
> compelling performance reason to use low-res time.
> 
> In particular, we should prefer high res time whenever userspace
> timeouts are involved as userspace can only use high-res time.
> 
> For instance, when tsleep_nsec/etc. are reimplemented with kclock
> timeouts (soon!) I will remove the use of low-res time from
> nanosleep(2), select(2)/pselect(2), poll(2)/ppoll(2), and kevent(2).
> The use of low-res time in these interfaces can cause undersleep right
> now.  They're buggy.
> 
> > More generally, what will getnanouptime do when ticks go away?
> 
> Ticks will probably never "go away" entirely.
> 
> In general, some CPU is always going to need to call tc_windup()
> regularly to keep time moving forward.  When tc_windup() is called we
> update the low-res timestamps.  So getnanouptime(9)/etc. will continue
> to work as they do today.  I also imagine that the CPU responsible for
> calling tc_windup() will continue to increment `ticks' and `jiffies'.

Only one CPU needs to do that though, not all CPUs need to tick.

> In the near-ish future I want to add support for dynamic clock
> interrupts.  This would permit a CPU to stay idle, or drop into a
> deeper power-saving state, for longer than 1 tick if it has no work to
> do.  This is not strictly-speaking "tickless" operation, as the clock
> interrupt is not disabled.  But it is nice, so it is a near-term goal.
> 
> In the Distant Future we could add support for disabling the clock
> interrupt on select CPUs.  This might be useful for certain realtime
> applications.  This would be "true tickless" operation.  But for this
> to be useful we'd need to add support for realtime scheduling policies
> (e.g. SCHED_FIFO) and support 

Re: bpf_catchpacket and bpf_wakeup optimisations

2020-12-28 Thread David Gwynne
On Mon, Dec 28, 2020 at 02:45:06PM +1000, David Gwynne wrote:
> now that bpf read timeouts are only handled on the bpfread() side,
> there's a simplification that can be made in bpf_catchpacket. the chunk
> in bpf_catchpacket that rotates the buffers when one gets full already
> does a wakeup, so we don't have to check if we have any waiting readers
> and wake them up when a buffer gets full.
> 
> we can use bd_nreaders to omgoptimise bpf_wakeup though. wakeup(9) is
> mpsafe, so we don't have to defer the call to a task. however, we can
> avoid calling wakeup() and therefore trying to take the sched lock and
> all that stuff when we know there's nothing sleeping.
> 
> this also avoids scheduling the task if there's no async stuff set up.
> it's a bit magical because it knows what's inside selwakeup.
> 
> tests? ok?

visa pointed out that i missed a bit relating to kq.

Index: bpf.c
===
RCS file: /cvs/src/sys/net/bpf.c,v
retrieving revision 1.199
diff -u -p -r1.199 bpf.c
--- bpf.c   26 Dec 2020 16:30:58 -  1.199
+++ bpf.c   29 Dec 2020 06:04:58 -
@@ -554,7 +554,6 @@ out:
return (error);
 }
 
-
 /*
  * If there are processes sleeping on this descriptor, wake them up.
  */
@@ -563,14 +562,20 @@ bpf_wakeup(struct bpf_d *d)
 {
MUTEX_ASSERT_LOCKED(>bd_mtx);
 
+   if (d->bd_nreaders)
+   wakeup(d);
+
/*
 * As long as pgsigio() and selwakeup() need to be protected
 * by the KERNEL_LOCK() we have to delay the wakeup to
 * another context to keep the hot path KERNEL_LOCK()-free.
 */
-   bpf_get(d);
-   if (!task_add(systq, >bd_wake_task))
-   bpf_put(d);
+   if ((d->bd_async && d->bd_sig) ||
+   (!klist_empty(>bd_sel.si_note) || d->bd_sel.si_seltid != 0)) {
+   bpf_get(d);
+   if (!task_add(systq, >bd_wake_task))
+   bpf_put(d);
+   }
 }
 
 void
@@ -578,7 +583,6 @@ bpf_wakeup_cb(void *xd)
 {
struct bpf_d *d = xd;
 
-   wakeup(d);
if (d->bd_async && d->bd_sig)
pgsigio(>bd_sigio, d->bd_sig, 0);
 
@@ -1542,17 +1546,6 @@ bpf_catchpacket(struct bpf_d *d, u_char 
 * reads should be woken up.
 */
do_wakeup = 1;
-   }
-
-   if (d->bd_nreaders > 0) {
-   /*
-* We have one or more threads sleeping in bpfread().
-* We got a packet, so wake up all readers.
-*/
-   if (d->bd_fbuf != NULL) {
-   ROTATE_BUFFERS(d);
-   do_wakeup = 1;
-   }
}
 
if (do_wakeup)



Re: [PATCH] Reduce case duplication in kern_sysctl

2020-12-28 Thread Greg Steuck
> I tested this by diff'ing sysctl output before/after on amd64. Since
> there's a bunch of ifdef'ness I verified RAMDISK still builds.
>
> I deliberately didn't fix the indentation to keep this diff a pure line
> motion (would run over 80 chars otherwise). I can either fix that it in
> a separate commit or in this one before submitting.
>
> OK?

Here's an updated version to account from an intervening change since I
posted this first.

Marcus, since my change conflicted with yours, maybe you could review?

Thanks
Greg

Subject: [PATCH] Reduce case duplication in kern_sysctl

This changes amd64 GENERIC.MP .text size of kern_sysctl.o from 6440 to 6400.
Surprisingly, RAMDISK grows from 1645 to 1678.
---
 sys/kern/kern_sysctl.c | 191 ++---
 1 file changed, 84 insertions(+), 107 deletions(-)

diff --git sys/kern/kern_sysctl.c sys/kern/kern_sysctl.c
index d462926fea2..7b164d3d32a 100644
--- sys/kern/kern_sysctl.c
+++ sys/kern/kern_sysctl.c
@@ -363,32 +363,92 @@ kern_sysctl(int *name, u_int namelen, void *oldp, size_t 
*oldlenp, void *newp,
dev_t dev;
extern int pool_debug;
 
-   /* all sysctl names at this level are terminal except a ton of them */
+   /* dispatch the non-terminal nodes first */
if (namelen != 1) {
switch (name[0]) {
-   case KERN_PROC:
-   case KERN_PROF:
-   case KERN_MALLOCSTATS:
-   case KERN_TTY:
-   case KERN_POOL:
-   case KERN_PROC_ARGS:
-   case KERN_PROC_CWD:
-   case KERN_PROC_NOBROADCASTKILL:
-   case KERN_PROC_VMMAP:
-   case KERN_SYSVIPC_INFO:
-   case KERN_SEMINFO:
-   case KERN_SHMINFO:
-   case KERN_INTRCNT:
-   case KERN_WATCHDOG:
-   case KERN_EVCOUNT:
-   case KERN_TIMECOUNTER:
-   case KERN_CPTIME2:
-   case KERN_FILE:
-   case KERN_WITNESS:
-   case KERN_AUDIO:
-   case KERN_VIDEO:
-   case KERN_CPUSTATS:
-   break;
+#ifndef SMALL_KERNEL
+   case KERN_PROC:
+   return (sysctl_doproc(name + 1, namelen - 1, oldp, oldlenp));
+   case KERN_PROC_ARGS:
+   return (sysctl_proc_args(name + 1, namelen - 1, oldp, oldlenp,
+p));
+   case KERN_PROC_CWD:
+   return (sysctl_proc_cwd(name + 1, namelen - 1, oldp, oldlenp,
+p));
+   case KERN_PROC_NOBROADCASTKILL:
+   return (sysctl_proc_nobroadcastkill(name + 1, namelen - 1,
+newp, newlen, oldp, oldlenp, p));
+   case KERN_PROC_VMMAP:
+   return (sysctl_proc_vmmap(name + 1, namelen - 1, oldp, oldlenp,
+p));
+   case KERN_FILE:
+   return (sysctl_file(name + 1, namelen - 1, oldp, oldlenp, p));
+#endif
+#if defined(GPROF) || defined(DDBPROF)
+   case KERN_PROF:
+   return (sysctl_doprof(name + 1, namelen - 1, oldp, oldlenp,
+   newp, newlen));
+#endif
+   case KERN_MALLOCSTATS:
+   return (sysctl_malloc(name + 1, namelen - 1, oldp, oldlenp,
+   newp, newlen, p));
+   case KERN_TTY:
+   return (sysctl_tty(name + 1, namelen - 1, oldp, oldlenp,
+   newp, newlen));
+   case KERN_POOL:
+   return (sysctl_dopool(name + 1, namelen - 1, oldp, oldlenp));
+#if defined(SYSVMSG) || defined(SYSVSEM) || defined(SYSVSHM)
+   case KERN_SYSVIPC_INFO:
+   return (sysctl_sysvipc(name + 1, namelen - 1, oldp, oldlenp));
+#endif
+#ifdef SYSVSEM
+   case KERN_SEMINFO:
+   return (sysctl_sysvsem(name + 1, namelen - 1, oldp, oldlenp,
+   newp, newlen));
+#endif
+#ifdef SYSVSHM
+   case KERN_SHMINFO:
+   return (sysctl_sysvshm(name + 1, namelen - 1, oldp, oldlenp,
+   newp, newlen));
+#endif
+#ifndef SMALL_KERNEL
+   case KERN_INTRCNT:
+   return (sysctl_intrcnt(name + 1, namelen - 1, oldp, oldlenp));
+   case KERN_WATCHDOG:
+   return (sysctl_wdog(name + 1, namelen - 1, oldp, oldlenp,
+   newp, newlen));
+#endif
+#ifndef SMALL_KERNEL
+   case KERN_EVCOUNT:
+   return (evcount_sysctl(name + 1, namelen - 1, oldp, oldlenp,
+   newp, newlen));
+#endif
+   case KERN_TIMECOUNTER:
+   return (sysctl_tc(name + 1, namelen - 1, oldp, oldlenp,
+   newp, newlen));
+   case KERN_CPTIME2:
+   return (sysctl_cptime2(name + 1, namelen -1, oldp, oldlenp,
+   newp, newlen));
+#ifdef WITNESS
+   case KERN_WITNESSWATCH:
+   return witness_sysctl_watch(oldp, oldlenp, newp, newlen);
+   case KERN_WITNESS:
+   return witness_sysctl(name + 1, namelen - 1, oldp, oldlenp,
+   

Re: remove double call of ttyopen()

2020-12-28 Thread Jan Klemkow
On Mon, Dec 28, 2020 at 07:59:23PM +0100, Klemens Nanni wrote:
> On Mon, Dec 28, 2020 at 03:49:35PM +0100, Jan Klemkow wrote:
> > The following diff removes useless double calls of ttyopen.  l_open is
> > a pointer to ttyopen().  All other serial drivers also just use l_open,
> > as it is the general API for this.
> I'm not familiar with the subsystem/API, but one thing looks off to me.
> 
> > Index: dev/pci/cz.c
> > ===
> > RCS file: /cvs/src/sys/dev/pci/cz.c,v
> > retrieving revision 1.24
> > diff -u -p -r1.24 cz.c
> > --- dev/pci/cz.c21 May 2020 09:31:59 -  1.24
> > +++ dev/pci/cz.c28 Dec 2020 14:41:30 -
> > @@ -1034,10 +1034,6 @@ czttyopen(dev_t dev, int flags, int mode
> >  
> > splx(s);
> >  
> > -   error = ttyopen(CZTTY_DIALOUT(dev), tp, p);
> > -   if (error)
> > -   goto bad;
> > -
> > error = (*linesw[tp->t_line].l_open)(dev, tp, p);
> 
> Is `CZTTY_DIALOUT(dev)' the same as `dev'?

No, is not the same in all cases.  The macro converts cuaXX to
corresponding ttyXX.  But, it doesn't matter, would be overwritten by
l_open anyway:
...
tp->t_dev = device;
...

> Either way, the `CZTTY_DIALOUT' macro becomes unused with this diff.

The updated diff below also removes the dead CZTTY_DIALOUT macro.

OK?

Thanks,
Jan

Index: arch/luna88k/dev/siotty.c
===
RCS file: /cvs/src/sys/arch/luna88k/dev/siotty.c,v
retrieving revision 1.23
diff -u -p -r1.23 siotty.c
--- arch/luna88k/dev/siotty.c   25 Feb 2019 11:29:30 -  1.23
+++ arch/luna88k/dev/siotty.c   28 Dec 2020 14:40:29 -
@@ -503,9 +503,6 @@ sioopen(dev_t dev, int flag, int mode, s
splx(s);
}
 
-   error = ttyopen(dev, tp, p);
-   if (error > 0)
-   return error;
return (*linesw[tp->t_line].l_open)(dev, tp, p);
 }
  
Index: arch/sh/dev/scif.c
===
RCS file: /cvs/src/sys/arch/sh/dev/scif.c,v
retrieving revision 1.19
diff -u -p -r1.19 scif.c
--- arch/sh/dev/scif.c  19 Feb 2018 08:59:52 -  1.19
+++ arch/sh/dev/scif.c  28 Dec 2020 14:41:05 -
@@ -756,10 +756,6 @@ scifopen(dev_t dev, int flag, int mode, 
 
splx(s);
 
-   error = ttyopen(dev, tp, p);
-   if (error)
-   goto bad;
-
error = (*linesw[tp->t_line].l_open)(dev, tp, p);
if (error)
goto bad;
Index: dev/pci/cz.c
===
RCS file: /cvs/src/sys/dev/pci/cz.c,v
retrieving revision 1.24
diff -u -p -r1.24 cz.c
--- dev/pci/cz.c21 May 2020 09:31:59 -  1.24
+++ dev/pci/cz.c29 Dec 2020 00:48:52 -
@@ -835,7 +835,6 @@ cz_wait_pci_doorbell(struct cz_softc *cz
 
 #define CZTTYDIALOUT_MASK  0x80
 
-#defineCZTTY_DIALOUT(dev)  (minor((dev)) & CZTTYDIALOUT_MASK)
 #defineCZTTY_CZ(sc)((sc)->sc_parent)
 
 #defineCZTTY_SOFTC(dev)cztty_getttysoftc(dev)
@@ -1033,10 +1032,6 @@ czttyopen(dev_t dev, int flags, int mode
}
 
splx(s);
-
-   error = ttyopen(CZTTY_DIALOUT(dev), tp, p);
-   if (error)
-   goto bad;
 
error = (*linesw[tp->t_line].l_open)(dev, tp, p);
if (error)



Re: New ujoy(4) device for USB gamecontrollers

2020-12-28 Thread Bryan Steele
On Mon, Dec 28, 2020 at 05:03:14PM -0700, Thomas Frohwein wrote:
> Hi,
> 
> This is a diff to propose a new device type for USB gamecontrollers,
> 'ujoy'.
> 
> Rationale
> -
> 
> Since the tightening of security around USB devices, USB
> gamecontrollers that generally attach to the kitchen-sink uhid device
> don't work out of the box anymore since read permissions are absent. As
> a consequence, USB gamecontrollers don't work out of the box anymore;
> requiring a root user to find what uhid device the controller is
> attached to and chmod +r that one. This poses a barrier to using those
> devices, as well as in my opinion a security risk by enticing users to
> indiscriminately do '# chmod +r /dev/uhid*' in order to not have to
> re-apply permissions on different uhid devices multiple times.
> 
> The proposed solution is inspired by the implementation of fido(4) [1];
> a similarly narrow use case for a certain type of USB devices. It
> creates a device for gamecontrollers that has the minimum required
> permissions, but won't require chmod(1)'ing device files to get the
> controller working.
> 
> Implementation in this Diff
> ---
> 
> This diff is largely based on reyk's commit for the fido(4) device from
> December 2019 [1]. It creates a ujoy device and attaches USB devices
> identified as HUG_JOYSTICK or HUG_GAME_PAD.
> 
> ujoyopen() is set up such that it only allows read-only access. If it is
> opened with write permissions, it should error with EPERM. Only a
> subset of ioctls is allowed in ujoyioctl(), based on what I could find 
> that is used in devel/sdl2 (USB_GET_DEVICEINFO,USB_GET_REPORT,
> USB_GET_REPORT_DESC,USB_GET_REPORT_ID), as well as FIONBIO and
> FIOASYNC.
> 
> This contains conf.c and MAKEDEV.md bits for the same architectures as
> fido(4). A small man page, as well as updates to other pertinent man
> pages (usb(4), uhidev(4)) are included, again following the example of
> the commits for the fido(4) device.
> 
> Credit to brynet@ who helped me extensively with this diff.
> 
> Testing
> ---
> 
> A simple way to test this without needing to do a full release is
> building a kernel with the diff and booting into it, then creating the
> updated MAKEDEV script with:
> 
> $ cd /usr/src/etc/etc.amd64/
> $ make# this will create MAKEDEV in etc.amd64
> $ cd /dev
> # sh /usr/src/etc/etc.amd64/MAKEDEV ujoy
> 
> This creates 2 device ujoy/0 and ujoy/1 by default. The device is
> read-only by default (444).
> 
> Plug in your USB gamecontroller and check in dmesg that it attaches as
> ujoy, not uhid.
> 
> The simplest way to test is with usbhidctl(1):
> 
> $ usbhidctl -f /dev/ujoy/0 -l
> 
> This will loop through printing the state of buttons, sticks etc. until
> exited with Crtl-C.
> 
> Another way to test is with games/sdl-jstest. As the SDL backends use
> /dev/uhid* to find gamecontrollers, the way I tested this is by
> repurposing /dev/uhid0 with mknod(8) (note major 100 is for amd64):
> 
> $ cd /dev
> # rm uhid0
> # mknod -m 444 uhid0 c 100 0
> 
> I put the diff through a release(8) without xenocara on amd64, and it
> built base without problems and /dev/MAKEDEV and the /dev/ujoy directory
> all look correct.
> 
> I have tested usbhidctl and sdl-jstest with an XBox 360 gamepad and the
> Logitech F310 gamepad, the latter both in DInput and XInput mode. All
> of those work as expected.
> 
> Issues/Follow-up Tasks
> --
> 
> As ujoy devices don't attach to /dev/uhid* anymore, ports that use
> gamecontrollers will need to be patched. This seems to be mostly (if
> not exclusively) sdl/sdl2-based ports, so I anticipate that patching
> these 2 ports will take care of the bulk of the use cases.
> 
> On other OS's, writing to gamecontrollers is used at times for the
> "rumble" functionality of the controller, or setting LEDs (like the
> circle LED on XBox 360 controllers that can reflect what player number
> the gamepad is for). Those have to my knowledge never worked on OpenBSD
> and are not of any immediate interest as far as I can tell.
> 
> ok? comments?
> 
> [1] https://marc.info/?l=openbsd-cvs=157658815523208=mbox

Nice work!

I think this makes a lot of sense (slightly biased), I agree that it
felt very wrong advising users to change permissions for uhid* when
they plugged in a gamepad, exposing a bunch of kernel functionality to
games that don't need it.

Having them Just-Work (tm) by default, while doing so in a clean and
safe way will help the experience a lot for gamers (and slacking
developers) alike! :-)

Some ports patches are still required if this goes in, as mentioned
devel/sdl2 at the very least, but the current situation as it stands
controllers already don't work out of the box.

thfr@ already has my ok and some general "good idea" from others, but
feedback is appreciated, as well as any tips for getting this ready to
go in.

-Bryan.

> Index: etc/MAKEDEV.common
> 

Re: bpf(4): remove ticks

2020-12-28 Thread Scott Cheloha
On Mon, Dec 28, 2020 at 10:49:59AM +1000, David Gwynne wrote:
> On Sat, Dec 26, 2020 at 04:48:23PM -0600, Scott Cheloha wrote:
> > Now that we've removed bd_rdStart from the bpf_d struct, removing
> > ticks from bpf(4) itself is straightforward.
> > 
> > - bd_rtout becomes a timespec; update bpfioctl() accordingly.
> >   Cap it at MAXTSLP nanoseconds to avoid arithmetic overflow
> >   in bpfread().
> > 
> > - At the start of bpfread(), if a timeout is set, find the end
> >   of the read as an absolute uptime.  This is the point where
> >   we want to avoid overflow: if bd_rtout is only MAXTSLP
> >   nanoseconds the timespecadd(3) will effectively never overflow.
> > 
> > - Before going to sleep, if we have a timeout set, compute how
> >   much longer to sleep in nanoseconds.
> > 
> >   Here's a spot where an absolute timeout sleep would save a
> >   little code, but we don't have such an interface yet.  Worth
> >   keeping in mind for the future, though.
> 
> Are there any other places that would be useful though? bpf is pretty
> special.

kqueue_scan() in kern_event.c can have a spurious wakeup, so an
absolute sleep would be useful there.  doppoll() in sys_generic.c is
the same, though I think mpi@/visa@ intend to refactor it to use
kqueue_scan().

In general, if you have a thread that wants to do something on a
strict period you need to use an absolute sleep to avoid drift.

This code drifts:

for (;;) {
do_work();
tsleep_nsec(, PPAUSE, "worker", SEC_TO_NSEC(1));
}

While this code will not:

uint64_t deadline;

deadline = nsecuptime();
for (;;) {
do_work();
deadline = nsec_advance(deadline, SEC_TO_NSEC(1));
tsleep_abs_nsec(, PPAUSE, "worker", deadline);
}

(Some of those interfaces don't actually exist, but they are easy to
write and you can infer how they work.)

Most developers probably do not care about maintaining a strict period
for periodic workloads, but I have a suspicion that it would keep
system performance more deterministic because you don't have various
periodic workloads drifting into one another, overlapping, and
momentarily causing utilization spikes.

I know that probably sounds far-fetched... it's an idea I've been
fussing with.

> > dlg@: You said you wanted to simplify this loop.  Unsure what shape
> > that cleanup would take.  Should we clean it up before this change?
> 
> I wanted to have a single msleep_nsec call and pass INFSLP when it
> should sleep forever.. You saw my first attempt at that. It had
> issues.
> 
> > Thoughts?  ok?
> 
> How would this look if you used a uint64_t for nsecs for bd_rtout,
> and the nsec uptime thing from your pool diff instead of timespecs
> and nanouptime?

See the attached patch.  It is shorter because we can do more inline
stuff with a uint64_t than with a timespec.

> What's the thinking behind nanouptime instead of getnanouptime?

In general we should prefer high resolution time unless there is a
compelling performance reason to use low-res time.

In particular, we should prefer high res time whenever userspace
timeouts are involved as userspace can only use high-res time.

For instance, when tsleep_nsec/etc. are reimplemented with kclock
timeouts (soon!) I will remove the use of low-res time from
nanosleep(2), select(2)/pselect(2), poll(2)/ppoll(2), and kevent(2).
The use of low-res time in these interfaces can cause undersleep right
now.  They're buggy.

> More generally, what will getnanouptime do when ticks go away?

Ticks will probably never "go away" entirely.

In general, some CPU is always going to need to call tc_windup()
regularly to keep time moving forward.  When tc_windup() is called we
update the low-res timestamps.  So getnanouptime(9)/etc. will continue
to work as they do today.  I also imagine that the CPU responsible for
calling tc_windup() will continue to increment `ticks' and `jiffies'.

In the near-ish future I want to add support for dynamic clock
interrupts.  This would permit a CPU to stay idle, or drop into a
deeper power-saving state, for longer than 1 tick if it has no work to
do.  This is not strictly-speaking "tickless" operation, as the clock
interrupt is not disabled.  But it is nice, so it is a near-term goal.

In the Distant Future we could add support for disabling the clock
interrupt on select CPUs.  This might be useful for certain realtime
applications.  This would be "true tickless" operation.  But for this
to be useful we'd need to add support for realtime scheduling policies
(e.g. SCHED_FIFO) and support for binding threads to (and excluding
threads from) particular CPUs.  So think Distant Future, if ever.

--

Anyway, updated patch here.

Index: bpf.c
===
RCS file: /cvs/src/sys/net/bpf.c,v
retrieving revision 1.199
diff -u -p -r1.199 bpf.c
--- bpf.c   26 Dec 2020 16:30:58 -  1.199
+++ bpf.c   29 

New ujoy(4) device for USB gamecontrollers

2020-12-28 Thread Thomas Frohwein
Hi,

This is a diff to propose a new device type for USB gamecontrollers,
'ujoy'.

Rationale
-

Since the tightening of security around USB devices, USB
gamecontrollers that generally attach to the kitchen-sink uhid device
don't work out of the box anymore since read permissions are absent. As
a consequence, USB gamecontrollers don't work out of the box anymore;
requiring a root user to find what uhid device the controller is
attached to and chmod +r that one. This poses a barrier to using those
devices, as well as in my opinion a security risk by enticing users to
indiscriminately do '# chmod +r /dev/uhid*' in order to not have to
re-apply permissions on different uhid devices multiple times.

The proposed solution is inspired by the implementation of fido(4) [1];
a similarly narrow use case for a certain type of USB devices. It
creates a device for gamecontrollers that has the minimum required
permissions, but won't require chmod(1)'ing device files to get the
controller working.

Implementation in this Diff
---

This diff is largely based on reyk's commit for the fido(4) device from
December 2019 [1]. It creates a ujoy device and attaches USB devices
identified as HUG_JOYSTICK or HUG_GAME_PAD.

ujoyopen() is set up such that it only allows read-only access. If it is
opened with write permissions, it should error with EPERM. Only a
subset of ioctls is allowed in ujoyioctl(), based on what I could find 
that is used in devel/sdl2 (USB_GET_DEVICEINFO,USB_GET_REPORT,
USB_GET_REPORT_DESC,USB_GET_REPORT_ID), as well as FIONBIO and
FIOASYNC.

This contains conf.c and MAKEDEV.md bits for the same architectures as
fido(4). A small man page, as well as updates to other pertinent man
pages (usb(4), uhidev(4)) are included, again following the example of
the commits for the fido(4) device.

Credit to brynet@ who helped me extensively with this diff.

Testing
---

A simple way to test this without needing to do a full release is
building a kernel with the diff and booting into it, then creating the
updated MAKEDEV script with:

$ cd /usr/src/etc/etc.amd64/
$ make  # this will create MAKEDEV in etc.amd64
$ cd /dev
# sh /usr/src/etc/etc.amd64/MAKEDEV ujoy

This creates 2 device ujoy/0 and ujoy/1 by default. The device is
read-only by default (444).

Plug in your USB gamecontroller and check in dmesg that it attaches as
ujoy, not uhid.

The simplest way to test is with usbhidctl(1):

$ usbhidctl -f /dev/ujoy/0 -l

This will loop through printing the state of buttons, sticks etc. until
exited with Crtl-C.

Another way to test is with games/sdl-jstest. As the SDL backends use
/dev/uhid* to find gamecontrollers, the way I tested this is by
repurposing /dev/uhid0 with mknod(8) (note major 100 is for amd64):

$ cd /dev
# rm uhid0
# mknod -m 444 uhid0 c 100 0

I put the diff through a release(8) without xenocara on amd64, and it
built base without problems and /dev/MAKEDEV and the /dev/ujoy directory
all look correct.

I have tested usbhidctl and sdl-jstest with an XBox 360 gamepad and the
Logitech F310 gamepad, the latter both in DInput and XInput mode. All
of those work as expected.

Issues/Follow-up Tasks
--

As ujoy devices don't attach to /dev/uhid* anymore, ports that use
gamecontrollers will need to be patched. This seems to be mostly (if
not exclusively) sdl/sdl2-based ports, so I anticipate that patching
these 2 ports will take care of the bulk of the use cases.

On other OS's, writing to gamecontrollers is used at times for the
"rumble" functionality of the controller, or setting LEDs (like the
circle LED on XBox 360 controllers that can reflect what player number
the gamepad is for). Those have to my knowledge never worked on OpenBSD
and are not of any immediate interest as far as I can tell.

ok? comments?

[1] https://marc.info/?l=openbsd-cvs=157658815523208=mbox

Index: etc/MAKEDEV.common
===
RCS file: /cvs/src/etc/MAKEDEV.common,v
retrieving revision 1.111
diff -u -p -r1.111 MAKEDEV.common
--- etc/MAKEDEV.common  6 Jul 2020 06:11:26 -   1.111
+++ etc/MAKEDEV.common  28 Dec 2020 03:25:04 -
@@ -181,6 +181,7 @@ dnl
 target(usb, usb, 0, 1, 2, 3, 4, 5, 6, 7)dnl
 target(usb, uhid, 0, 1, 2, 3, 4, 5, 6, 7)dnl
 twrget(usb, fido, fido)dnl
+twrget(usb, ujoy, ujoy)dnl
 target(usb, ulpt, 0, 1)dnl
 target(usb, ugen, 0, 1, 2, 3, 4, 5, 6, 7)dnl
 target(usb, ttyU, 0, 1, 2, 3)dnl
@@ -365,6 +366,10 @@ __devitem(fido, fido, fido/* nodes, fido
 _mkdev(fido, fido, {-RMlist[${#RMlist[*]}]=";mkdir -p fido;rm -f" n=0
while [ $n -lt 4 ];do M fido/$n c major_fido_c $n 666;n=Add($n, 1);done
MKlist[${#MKlist[*]}]=";chmod 555 fido"-})dnl
+__devitem(ujoy, ujoy, ujoy/* nodes, ujoy)dnl
+_mkdev(ujoy, ujoy, {-RMlist[${#RMlist[*]}]=";mkdir -p ujoy;rm -f" n=0
+   while [ $n -lt 2 ];do M ujoy/$n c major_ujoy_c $n 444;n=Add($n, 1);done
+   MKlist[${#MKlist[*]}]=";chmod 

Re: Rename SIMPLEQ_ to STAILQ_, diff 1/7

2020-12-28 Thread Christian Weisgerber
Denis Fondras:

> This diff renames SIMPLEQ_* to STAILQ_* in /usr/src/sys/sys to unify with 
> FreeBSD and Linux.
> 
> I added aliases at the end of queue.h to avoid breaking base too much. they 
> will
> be removed as soon as diff 2,3,4,5,6,7 are commited.
> 
> net/sniproxy has a patch to define STAILQ_*, it may be removed later.

I applied diffs 1 and 2 and did a "make includes" to ensure that
_all_ header files were switched over to STAILQ.  I then ran a
package bulk build on amd64.

There were seven build failures.  Except for sniproxy those are all
programs developed on OpenBSD that use SIMPLEQ:

audio/morseplayer
devel/got
mail/pop3d
net/adsuck
net/oicb
net/sniproxy
x11/spectrwm

sniproxy breaks because the renamed s/SIMPLEQ/STAILQ/ macros lack
STAILQ_REMOVE().

-- 
Christian "naddy" Weisgerber  na...@mips.inka.de



Re: remove double call of ttyopen()

2020-12-28 Thread Klemens Nanni
On Mon, Dec 28, 2020 at 03:49:35PM +0100, Jan Klemkow wrote:
> The following diff removes useless double calls of ttyopen.  l_open is
> a pointer to ttyopen().  All other serial drivers also just use l_open,
> as it is the general API for this.
I'm not familiar with the subsystem/API, but one thing looks off to me.

> Index: dev/pci/cz.c
> ===
> RCS file: /cvs/src/sys/dev/pci/cz.c,v
> retrieving revision 1.24
> diff -u -p -r1.24 cz.c
> --- dev/pci/cz.c  21 May 2020 09:31:59 -  1.24
> +++ dev/pci/cz.c  28 Dec 2020 14:41:30 -
> @@ -1034,10 +1034,6 @@ czttyopen(dev_t dev, int flags, int mode
>  
>   splx(s);
>  
> - error = ttyopen(CZTTY_DIALOUT(dev), tp, p);
> - if (error)
> - goto bad;
> -
>   error = (*linesw[tp->t_line].l_open)(dev, tp, p);

Is `CZTTY_DIALOUT(dev)' the same as `dev'?

Either way, the `CZTTY_DIALOUT' macro becomes unused with this diff.

>   if (error)
>   goto bad;
> 



remove double call of ttyopen()

2020-12-28 Thread Jan Klemkow
Hi,

The following diff removes useless double calls of ttyopen.  l_open is
a pointer to ttyopen().  All other serial drivers also just use l_open,
as it is the general API for this.

OK?

Bye,
Jan

Index: arch/luna88k/dev/siotty.c
===
RCS file: /cvs/src/sys/arch/luna88k/dev/siotty.c,v
retrieving revision 1.23
diff -u -p -r1.23 siotty.c
--- arch/luna88k/dev/siotty.c   25 Feb 2019 11:29:30 -  1.23
+++ arch/luna88k/dev/siotty.c   28 Dec 2020 14:40:29 -
@@ -503,9 +503,6 @@ sioopen(dev_t dev, int flag, int mode, s
splx(s);
}
 
-   error = ttyopen(dev, tp, p);
-   if (error > 0)
-   return error;
return (*linesw[tp->t_line].l_open)(dev, tp, p);
 }
  
Index: arch/sh/dev/scif.c
===
RCS file: /cvs/src/sys/arch/sh/dev/scif.c,v
retrieving revision 1.19
diff -u -p -r1.19 scif.c
--- arch/sh/dev/scif.c  19 Feb 2018 08:59:52 -  1.19
+++ arch/sh/dev/scif.c  28 Dec 2020 14:41:05 -
@@ -756,10 +756,6 @@ scifopen(dev_t dev, int flag, int mode, 
 
splx(s);
 
-   error = ttyopen(dev, tp, p);
-   if (error)
-   goto bad;
-
error = (*linesw[tp->t_line].l_open)(dev, tp, p);
if (error)
goto bad;
Index: dev/pci/cz.c
===
RCS file: /cvs/src/sys/dev/pci/cz.c,v
retrieving revision 1.24
diff -u -p -r1.24 cz.c
--- dev/pci/cz.c21 May 2020 09:31:59 -  1.24
+++ dev/pci/cz.c28 Dec 2020 14:41:30 -
@@ -1034,10 +1034,6 @@ czttyopen(dev_t dev, int flags, int mode
 
splx(s);
 
-   error = ttyopen(CZTTY_DIALOUT(dev), tp, p);
-   if (error)
-   goto bad;
-
error = (*linesw[tp->t_line].l_open)(dev, tp, p);
if (error)
goto bad;



Re: sleep_setup/finish simplification

2020-12-28 Thread Martin Pieuchot
On 08/12/20(Tue) 10:06, Martin Pieuchot wrote:
> Diff below aims to simplify the API to put a thread on a sleep queue and
> reduce it to the following:
> 
>   sleep_setup();
>   /* check condition or release lock */
>   sleep_finish();
> 
> It is motivated by my work to sleep the SCHED_LOCK() but might as well
> prevent/fix some bugs.
> 
> The tricky part of the current implementation is that sleep_setup_signal()
> can already park/stop the current thread resulting in a context change.
> Should any custom accounting / lock check happen before that?  At least
> two lock primitives do so currently:  drm's schedule_timeout() and
> rwlock's rw_enter().
> 
> As a result of this diff various states can be removed and sleep_finish()
> contains the following magic:
> 
>   1. check for signal/parking
>   2. context switch or remove from sleep queue
>   3. check for signal/parking
> 
> Note that sleep_finish() could be simplified even further but I left
> that for later to ease the review.
> 
> Comments?  Oks?

Anyone?

> Index: dev/dt/dt_dev.c
> ===
> RCS file: /cvs/src/sys/dev/dt/dt_dev.c,v
> retrieving revision 1.10
> diff -u -p -r1.10 dt_dev.c
> --- dev/dt/dt_dev.c   28 Sep 2020 13:16:58 -  1.10
> +++ dev/dt/dt_dev.c   7 Dec 2020 17:19:15 -
> @@ -225,10 +225,8 @@ dtread(dev_t dev, struct uio *uio, int f
>   return (EMSGSIZE);
>  
>   while (!sc->ds_evtcnt) {
> - sleep_setup(, sc, PWAIT | PCATCH, "dtread");
> - sleep_setup_signal();
> - sleep_finish(, !sc->ds_evtcnt);
> - error = sleep_finish_signal();
> + sleep_setup(, sc, PWAIT | PCATCH, "dtread", 0);
> + error = sleep_finish(, !sc->ds_evtcnt);
>   if (error == EINTR || error == ERESTART)
>   break;
>   }
> Index: dev/pci/drm/drm_linux.c
> ===
> RCS file: /cvs/src/sys/dev/pci/drm/drm_linux.c,v
> retrieving revision 1.70
> diff -u -p -r1.70 drm_linux.c
> --- dev/pci/drm/drm_linux.c   14 Nov 2020 23:08:47 -  1.70
> +++ dev/pci/drm/drm_linux.c   7 Dec 2020 17:19:15 -
> @@ -110,26 +110,23 @@ schedule_timeout(long timeout)
>  {
>   struct sleep_state sls;
>   long deadline;
> - int wait, spl;
> + int wait, spl, timo = 0;
>  
>   MUTEX_ASSERT_LOCKED(_mtx);
>   KASSERT(!cold);
>  
> - sleep_setup(, sch_ident, sch_priority, "schto");
>   if (timeout != MAX_SCHEDULE_TIMEOUT)
> - sleep_setup_timeout(, timeout);
> + timo = timeout;
> + sleep_setup(, sch_ident, sch_priority, "schto", timo);
>  
>   wait = (sch_proc == curproc && timeout > 0);
>  
>   spl = MUTEX_OLDIPL(_mtx);
>   MUTEX_OLDIPL(_mtx) = splsched();
>   mtx_leave(_mtx);
> -
> - sleep_setup_signal();
> -
>   if (timeout != MAX_SCHEDULE_TIMEOUT)
>   deadline = ticks + timeout;
> - sleep_finish_all(, wait);
> + sleep_finish(, wait);
>   if (timeout != MAX_SCHEDULE_TIMEOUT)
>   timeout = deadline - ticks;
>  
> Index: dev/pci/if_myx.c
> ===
> RCS file: /cvs/src/sys/dev/pci/if_myx.c,v
> retrieving revision 1.112
> diff -u -p -r1.112 if_myx.c
> --- dev/pci/if_myx.c  27 Nov 2020 00:13:15 -  1.112
> +++ dev/pci/if_myx.c  7 Dec 2020 17:19:15 -
> @@ -1396,7 +1396,7 @@ myx_down(struct myx_softc *sc)
>   (void)myx_cmd(sc, MYXCMD_SET_IFDOWN, , NULL);
>  
>   while (sc->sc_state != MYX_S_OFF) {
> - sleep_setup(, sts, PWAIT, "myxdown");
> + sleep_setup(, sts, PWAIT, "myxdown", 0);
>   membar_consumer();
>   sleep_finish(, sc->sc_state != MYX_S_OFF);
>   }
> Index: kern/kern_rwlock.c
> ===
> RCS file: /cvs/src/sys/kern/kern_rwlock.c,v
> retrieving revision 1.45
> diff -u -p -r1.45 kern_rwlock.c
> --- kern/kern_rwlock.c2 Mar 2020 17:07:49 -   1.45
> +++ kern/kern_rwlock.c7 Dec 2020 17:19:15 -
> @@ -278,15 +278,13 @@ retry:
>   prio = op->wait_prio;
>   if (flags & RW_INTR)
>   prio |= PCATCH;
> - sleep_setup(, rwl, prio, rwl->rwl_name);
> - if (flags & RW_INTR)
> - sleep_setup_signal();
> + sleep_setup(, rwl, prio, rwl->rwl_name, 0);
>  
>   do_sleep = !rw_cas(>rwl_owner, o, set);
>  
> - sleep_finish(, do_sleep);
> + error = sleep_finish(, do_sleep);
>   if ((flags & RW_INTR) &&
> - (error = sleep_finish_signal()) != 0)
> + (error != 0))
>   return (error);
>   if (flags & RW_SLEEPFAIL)
>   return (EAGAIN);
> Index: kern/kern_sched.c
> 

Re: i386 pmap diff

2020-12-28 Thread Martin Pieuchot
On 23/12/20(Wed) 18:24, Mark Kettenis wrote:
> Diff below switches the i386 pmap to use the modern km_alloc(9)
> functions and uses IPL_VM for the pmap pool, following the example of
> amd64.

Diff below is the one I sent you last year.  It has an "#if notyet"
around the allocation that generates the following fault:

panic: uvm_fault(0xd0e39af8, 0xf1dfc000, 0, 1) -> e
Stopped at  db_enter+0x4:   popl%ebp
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
* 0  0  0 0x1  0x2000K swapper
db_enter(d0e53909,d10c5df4,0,f1dfc000,d0ecca7c) at db_enter+0x4
panic(d0c38a96,d0e39af8,f1dfc000,1,e) at panic+0xd3
kpageflttrap(d10c5e60,f1dfc000,f1dfc000,,d0f78b00) at kpageflttrap+0x14d
trap(d10c5e60) at trap+0x26a
calltrap(8,10006,d1d91cc0,f1ee2000,d083107c) at calltrap+0xc
docopyf(d1d91cc0) at docopyf+0x5
pmap_create(1,1000,61c1cc4d,d1da2ea4,d0f7af34) at pmap_create+0xa8
uvmspace_fork(d0f7ab0c,d1d94ca0,d0f7ab0c,1,d10c5f70) at uvmspace_fork+0x56
process_new(d1d94ca0,d0f7ab0c,1) at process_new+0xeb
fork1(d0ecca7c,1,d08c8d40,0,0,d10c5f90) at fork1+0x1ba

> Don't have easy access to an i386 machine right now, so this has only
> been compile tested.

This can be reproduced in vmm(4) in case you'd like to debug it.

Index: arch/i386/i386/pmap.c
===
RCS file: /cvs/src/sys/arch/i386/i386/pmap.c,v
retrieving revision 1.210
diff -u -p -r1.210 pmap.c
--- arch/i386/i386/pmap.c   28 Dec 2020 14:02:08 -  1.210
+++ arch/i386/i386/pmap.c   28 Dec 2020 14:17:45 -
@@ -1365,7 +1365,7 @@ void
 pmap_pinit_pd_86(struct pmap *pmap)
 {
/* allocate PDP */
-   pmap->pm_pdir = uvm_km_alloc(kernel_map, NBPG);
+   pmap->pm_pdir = (vaddr_t)km_alloc(NBPG, _any, _dirty, _waitok);
if (pmap->pm_pdir == 0)
panic("pmap_pinit_pd_86: kernel_map out of virtual space!");
pmap_extract(pmap_kernel(), (vaddr_t)pmap->pm_pdir,
@@ -1397,7 +1397,8 @@ pmap_pinit_pd_86(struct pmap *pmap)
 * execution, one that lacks all kernel mappings.
 */
if (cpu_meltdown) {
-   pmap->pm_pdir_intel = uvm_km_zalloc(kernel_map, NBPG);
+   pmap->pm_pdir_intel = (vaddr_t)km_alloc(NBPG, _any, _zero,
+   _waitok);
if (pmap->pm_pdir_intel == 0)
panic("%s: kernel_map out of virtual space!", __func__);
 
@@ -1449,11 +1450,12 @@ pmap_destroy(struct pmap *pmap)
uvm_pagefree(pg);
}
 
-   uvm_km_free(kernel_map, pmap->pm_pdir, pmap->pm_pdirsize);
+   km_free((void *)pmap->pm_pdir, pmap->pm_pdirsize, _any, _dirty);
pmap->pm_pdir = 0;
 
if (pmap->pm_pdir_intel) {
-   uvm_km_free(kernel_map, pmap->pm_pdir_intel, pmap->pm_pdirsize);
+   km_free((void *)pmap->pm_pdir_intel, pmap->pm_pdirsize,
+   _any, _dirty);
pmap->pm_pdir_intel = 0;
}
 
@@ -2522,8 +2524,9 @@ pmap_enter_special_86(vaddr_t va, paddr_
__func__, va);
 
if (!pmap->pm_pdir_intel) {
-   if ((pmap->pm_pdir_intel = uvm_km_zalloc(kernel_map, NBPG))
-   == 0)
+   pmap->pm_pdir_intel = (vaddr_t)km_alloc(NBPG, _any, _zero,
+   _waitok);
+   if (pmap->pm_pdir_intel == 0)
panic("%s: kernel_map out of virtual space!", __func__);
if (!pmap_extract(pmap, pmap->pm_pdir_intel,
>pm_pdirpa_intel))
Index: arch/i386/i386/pmapae.c
===
RCS file: /cvs/src/sys/arch/i386/i386/pmapae.c,v
retrieving revision 1.60
diff -u -p -r1.60 pmapae.c
--- arch/i386/i386/pmapae.c 23 Sep 2020 15:13:26 -  1.60
+++ arch/i386/i386/pmapae.c 28 Dec 2020 14:17:45 -
@@ -738,7 +738,7 @@ pmap_bootstrap_pae(void)
(uint32_t)VM_PAGE_TO_PHYS(ptppg));
}
}
-   uvm_km_free(kernel_map, (vaddr_t)pd, NBPG);
+   km_free(pd, NBPG, _any, _dirty);
DPRINTF("%s: freeing PDP 0x%x\n", __func__, (uint32_t)pd);
}
 
@@ -944,7 +944,8 @@ pmap_pinit_pd_pae(struct pmap *pmap)
paddr_t pdidx[4];
 
/* allocate PDP */
-   pmap->pm_pdir = uvm_km_alloc(kernel_map, 4 * NBPG);
+   pmap->pm_pdir = (vaddr_t)km_alloc(4 * NBPG, _any, _dirty,
+   _waitok);
if (pmap->pm_pdir == 0)
panic("pmap_pinit_pd_pae: kernel_map out of virtual space!");
/* page index is in the pmap! */
@@ -997,7 +998,8 @@ pmap_pinit_pd_pae(struct pmap *pmap)
if (cpu_meltdown) {
int i;
 
-   if ((va = uvm_km_zalloc(kernel_map, 4 * NBPG)) == 0)
+   va = (vaddr_t)km_alloc(4 * NBPG, _any, _zero, _waitok);
+   if (va == 0)
panic("%s: kernel_map out of virtual 

Re: Revise fd close notification for kqueue-based select(2) and poll(2)

2020-12-28 Thread Visa Hankala
On Sun, Dec 27, 2020 at 05:09:46PM +, Visa Hankala wrote:
> This patch revises the way how kqueue notifies select(2) about the
> closing of monitored file descriptors. Instead of returning EBADF through
> kqueue_scan(), the error is conveyed in struct kevent. This is excessive
> for select(2) but should be useful with kqueue-based poll(2).

Actually, I think it is possible to implement kqueue-based poll(2)
with fd close handling using the existing kqueue functionality.
In the rare case where kqueue_scan() returns EBADF, which tells that
a monitored file descriptor has been closed, kqueue-based poll(2)
could rerun its kqueue_register() calls without the EV_ADD flag.
This lets the code check the status of the initial event registrations
without interference from quick fd reuse.



Re: Fwd: gre(4): mgre

2020-12-28 Thread David Gwynne
On Sun, Nov 29, 2020 at 08:30:23PM +0100, Pierre Emeriaud wrote:
> Le sam. 28 nov. 2020 ?? 21:46, Jason McIntyre  a ??crit :
> > > > > +.Bd -literal
> > > > add "-offset indent" to match the other examples
> > > Done, although I copied this block from gre example, so there's
> > > another occurrence here which I didn't touch.
> > >
> >
> > yes, sorry, that's my mistake. i think the width of the gre example
> > probably caused that. so i think you should keep your original text
> > (i.e. no indent for the artwork; indent for commands).
> 
> There you are. David, does this look sound to you?

yes, ok by me.

> Index: share/man/man4/gre.4
> ===
> RCS file: /cvs/src/share/man/man4/gre.4,v
> retrieving revision 1.79
> diff -u -p -u -r1.79 gre.4
> --- share/man/man4/gre.418 Nov 2020 16:19:54 -  1.79
> +++ share/man/man4/gre.429 Nov 2020 19:26:28 -
> @@ -455,6 +455,67 @@ In most cases the following should work:
>  .Bd -literal -offset indent
>  pass quick on gre proto gre no state
>  .Ed
> +.Ss Point-to-Multipoint Layer 3 GRE tunnel interfaces (mgre) example
> +.Nm mgre
> +can be used to build a point-to-multipoint tunnel network to several
> +hosts using a single
> +.Nm mgre
> +interface.
> +.Pp
> +In this example the host A has an outer IP of 198.51.100.12, host
> +B has 203.0.113.27, and host C has 203.0.113.254.
> +.Pp
> +Addressing within the tunnel is done using 192.0.2.0/24:
> +.Bd -literal
> ++--- Host B
> +   /
> +  /
> +Host A --- tunnel ---+
> +  \e
> +   \e
> ++--- Host C
> +.Ed
> +.Pp
> +On Host A:
> +.Bd -literal -offset indent
> +# ifconfig mgreN create
> +# ifconfig mgreN tunneladdr 198.51.100.12
> +# ifconfig mgreN inet 192.0.2.1 netmask 0xff00 up
> +.Ed
> +.Pp
> +On Host B:
> +.Bd -literal -offset indent
> +# ifconfig mgreN create
> +# ifconfig mgreN tunneladdr 203.0.113.27
> +# ifconfig mgreN inet 192.0.2.2 netmask 0xff00 up
> +.Ed
> +.Pp
> +On Host C:
> +.Bd -literal -offset indent
> +# ifconfig mgreN create
> +# ifconfig mgreN tunneladdr 203.0.113.254
> +# ifconfig mgreN inet 192.0.2.3 netmask 0xff00 up
> +.Ed
> +.Pp
> +To reach Host B over the tunnel (from Host A), there has to be a
> +route on Host A specifying the next-hop:
> +.Pp
> +.Dl # route add -host 192.0.2.2 203.0.113.27 -iface -ifp mgreN
> +.Pp
> +Similarly, to reach Host A over the tunnel from Host B, a route must
> +be present on B with A's outer IP as next-hop:
> +.Pp
> +.Dl # route add -host 192.0.2.1 198.51.100.12 -iface -ifp mgreN
> +.Pp
> +The same tunnel interface can then be used between host B and C by
> +adding the appropriate routes, making the network any-to-any instead
> +of hub-and-spoke:
> +.Pp
> +On Host B:
> +.Dl # route add -host 192.0.2.3 203.0.113.254 -iface -ifp mgreN
> +.Pp
> +On Host C:
> +.Dl # route add -host 192.0.2.2 203.0.113.27 -iface -ifp mgreN
>  .Ss Point-to-Point Ethernet over GRE tunnel interfaces (egre) example
>  .Nm egre
>  can be used to carry Ethernet traffic between two endpoints over