Re: Update to pcap-filter.5/tcpdump.8 (was: update to tcpdump(8))

2021-09-06 Thread Theo de Raadt
I think this is a good enough start, as-is.  A big improvement.

Further iterations can refine the few funny sentences where tcpdump and
pcap-filter diverge.  I'm not worried about them, they just come off
as strange wording, or irrelevancies.

Jason McIntyre  wrote:

> On Sun, Sep 05, 2021 at 04:43:34PM +0200, Denis Fondras wrote:
> > Le Sat, Sep 04, 2021 at 09:57:10PM +0100, Jason McIntyre a ?crit :
> > > the diff looks ok to me. but run any doc changes through "mandoc
> > > -Tlint", and look at any issues your diff may have introduced. in this
> > > case it's just trailing whitespace, but it's super helpful to check your
> > > work.
> > > 
> > 
> > Thank you Jason. There is still a warning in tcpdump.8.
> > 
> > Here is a new version including changes to pcap-filter.5 and tcpdump.8
> > I did not change the examples though as tcpdump examples are broader than
> > filters.
> > 
> 
> hi.
> 
> the warning in tcpdump is fine.
> 
> the diff reads ok to me, but let's wait for a technical ok ;)
> 
> jmc
> 
> > Index: lib/libpcap/pcap-filter.5
> > ===
> > RCS file: /cvs/src/lib/libpcap/pcap-filter.5,v
> > retrieving revision 1.9
> > diff -u -p -r1.9 pcap-filter.5
> > --- lib/libpcap/pcap-filter.5   2 Sep 2021 10:59:13 -   1.9
> > +++ lib/libpcap/pcap-filter.5   5 Sep 2021 13:35:41 -
> > @@ -40,27 +40,31 @@ or
> >  .Pp
> >  The filter expression consists of one or more
> >  .Em primitives .
> > -Primitives usually consist of an ID (name or number)
> > +Primitives usually consist of an
> > +.Ar id
> > +.Pq name or number
> >  preceded by one or more qualifiers.
> >  There are three different kinds of qualifier:
> >  .Bl -tag -width "proto"
> > -.It type
> > -Type qualifiers say what kind of thing the ID name or number refers to.
> > +.It Ar type
> > +Specify which kind of address component the
> > +.Ar id
> > +name or number refers to.
> >  Possible types are
> >  .Cm host ,
> > -.Cm net ,
> > +.Cm net
> >  and
> >  .Cm port .
> > -For example,
> > +E.g.,
> >  .Dq host foo ,
> >  .Dq net 128.3 ,
> > -and
> >  .Dq port 20 .
> >  If there is no type qualifier,
> >  .Cm host
> >  is assumed.
> > -.It dir
> > -Dir qualifiers specify a particular transfer direction to and/or from an 
> > ID.
> > +.It Ar dir
> > +Specify a particular transfer direction to and/or from
> > +.Ar id .
> >  Possible directions are
> >  .Cm src ,
> >  .Cm dst ,
> > @@ -73,11 +77,13 @@ Possible directions are
> >  .Cm addr3 ,
> >  and
> >  .Cm addr4 .
> > -For example,
> > -.Cm src foo ,
> > -.Cm dst net 128.3 ,
> > -.Cm src or dst port ftp-data .
> > -If there is no dir qualifier,
> > +E.g.,
> > +.Dq src foo ,
> > +.Dq dst net 128.3 ,
> > +.Dq src or dst port ftp-data .
> > +If there is no
> > +.Ar dir
> > +qualifier,
> >  .Cm src or dst
> >  is assumed.
> >  The
> > @@ -89,57 +95,85 @@ The
> >  and
> >  .Cm addr4
> >  qualifiers are only valid for IEEE 802.11 Wireless LAN link layers.
> > -For some link layers, such as SLIP and the "cooked" Linux capture mode
> > -used for the "any" device and for some other device types, the
> > +For null link layers (i.e., point-to-point protocols such as SLIP
> > +.Pq Serial Line Internet Protocol
> > +or the
> > +.Xr pflog 4
> > +header), the
> >  .Cm inbound
> >  and
> >  .Cm outbound
> >  qualifiers can be used to specify a desired direction.
> > -.It proto
> > -Proto qualifiers restrict the match to a particular protocol.
> > -Possible
> > -protos are:
> > +.It Ar proto
> > +Restrict the match to a particular protocol.
> > +Possible protocols are:
> > +.Cm ah ,
> > +.Cm arp ,
> > +.Cm atalk ,
> > +.Cm decnet ,
> > +.Cm esp ,
> >  .Cm ether ,
> >  .Cm fddi ,
> > -.Cm tr ,
> > -.Cm wlan ,
> > +.Cm icmp ,
> > +.Cm icmp6 ,
> > +.Cm igmp ,
> > +.Cm igrp ,
> >  .Cm ip ,
> >  .Cm ip6 ,
> > -.Cm arp ,
> > +.Cm lat ,
> > +.Cm mopdl ,
> > +.Cm moprc ,
> > +.Cm pim ,
> >  .Cm rarp ,
> > -.Cm decnet ,
> > +.Cm sca ,
> > +.Cm stp ,
> >  .Cm tcp ,
> > +.Cm udp ,
> >  and
> > -.Cm udp .
> > -For example,
> > +.Cm wlan .
> > +E.g.,
> >  .Dq ether src foo ,
> >  .Dq arp net 128.3 ,
> >  .Dq tcp port 21 ,
> >  and
> >  .Dq wlan addr2 0:2:3:4:5:6 .
> > -If there is no proto qualifier,
> > +If there is no protocol qualifier,
> >  all protocols consistent with the type are assumed.
> > -For example,
> > +E.g.,
> >  .Dq src foo
> >  means
> > -.Dq (ip or arp or rarp) src foo
> > -(except the latter is not legal syntax);
> > +.Do
> > +.Pq ip or arp or rarp
> > +src foo
> > +.Dc
> > +.Pq except the latter is not legal syntax ;
> >  .Dq net bar
> >  means
> > -.Dq (ip or arp or rarp) net bar ;
> > +.Do
> > +.Pq ip or arp or rarp
> > +net bar
> > +.Dc ;
> >  and
> >  .Dq port 53
> >  means
> > -.Dq (tcp or udp) port 53 .
> > +.Do
> > +.Pq TCP or UDP
> > +port 53
> > +.Dc .
> >  .Pp
> >  .Cm fddi
> >  is actually an alias for
> >  .Cm ether ;
> >  the parser treats them identically as meaning
> > -"the data link level used on the specified netwo

Re: [please test] amd64: schedule clock interrupts against system clock

2021-09-06 Thread Patrick Wildt
Am Fri, Jul 30, 2021 at 07:55:29PM +0200 schrieb Alexander Bluhm:
> On Mon, Jul 26, 2021 at 08:12:39AM -0500, Scott Cheloha wrote:
> > On Fri, Jun 25, 2021 at 06:09:27PM -0500, Scott Cheloha wrote:
> > 1 month bump.  I really appreciate the tests I've gotten so far, thank
> > you.
> 
> On my Xeon machine it works and all regress tests pass.
> 
> But it fails on my old Opteron machine.  It hangs after attaching
> cpu1.

This seems to be caused by contention on the mutex in i8254's gettick().

With Scott's diff, delay_func is i8254_delay() on that old AMD machine.
Its gettick() implementation uses a mutex to protect I/O access to the
i8254.

When secondary CPUs come up, they will wait for CPU0 to let them boot up
further by checking for a flag:

/*
 * We need to wait until we can identify, otherwise dmesg
 * output will be messy.
 */
while ((ci->ci_flags & CPUF_IDENTIFY) == 0)
delay(10);

Now that machine has 3 secondary cores that are spinning like that.  At
the same time CPU0 waits for the core to come up:

/* wait for it to identify */
for (i = 200; (ci->ci_flags & CPUF_IDENTIFY) && i > 0; i--)
delay(10);

That means we have 3-4 cores spinning just to be able to delay().  Our
mutex implementation isn't fair, which means whoever manages to claim
the free mutex wins.  Now if CPU2 and CPU3 are spinning all the time,
CPU1 identifies and needs delay() and CPU0 waits for CPU1, maybe the
one that needs to make progress never gets it.

I changed those delay(10) in cpu_hatch() to CPU_BUSY_CYCLE() and it went
ahead a bit better instead of hanging forever.

Then I remembered an idea something from years ago: fair kernel mutexes,
so basically mutexes implemented as ticket lock, like our kerne lock.

I did a quick diff, which probably contains a million bugs, but with
this bluhm's machine boots up well.

I'm not saying this is the solution, but it might be.

Patrick

diff --git a/sys/kern/kern_lock.c b/sys/kern/kern_lock.c
index 5cc55bb256a..c6a284beb51 100644
--- a/sys/kern/kern_lock.c
+++ b/sys/kern/kern_lock.c
@@ -248,6 +248,8 @@ __mtx_init(struct mutex *mtx, int wantipl)
mtx->mtx_owner = NULL;
mtx->mtx_wantipl = wantipl;
mtx->mtx_oldipl = IPL_NONE;
+   mtx->mtx_ticket = 0;
+   mtx->mtx_cur = 0;
 }
 
 #ifdef MULTIPROCESSOR
@@ -255,15 +257,26 @@ void
 mtx_enter(struct mutex *mtx)
 {
struct schedstate_percpu *spc = &curcpu()->ci_schedstate;
+   struct cpu_info *ci = curcpu();
+   unsigned int t;
 #ifdef MP_LOCKDEBUG
int nticks = __mp_lock_spinout;
 #endif
+   int s;
+
+   /* Avoid deadlocks after panic or in DDB */
+   if (panicstr || db_active)
+   return;
 
WITNESS_CHECKORDER(MUTEX_LOCK_OBJECT(mtx),
LOP_EXCLUSIVE | LOP_NEWORDER, NULL);
 
+   if (mtx->mtx_wantipl != IPL_NONE)
+   s = splraise(mtx->mtx_wantipl);
+
spc->spc_spinning++;
-   while (mtx_enter_try(mtx) == 0) {
+   t = atomic_inc_int_nv(&mtx->mtx_ticket) - 1;
+   while (READ_ONCE(mtx->mtx_cur) != t) {
CPU_BUSY_CYCLE();
 
 #ifdef MP_LOCKDEBUG
@@ -275,12 +288,21 @@ mtx_enter(struct mutex *mtx)
 #endif
}
spc->spc_spinning--;
+
+   mtx->mtx_owner = curcpu();
+   if (mtx->mtx_wantipl != IPL_NONE)
+   mtx->mtx_oldipl = s;
+#ifdef DIAGNOSTIC
+   ci->ci_mutex_level++;
+#endif
+   WITNESS_LOCK(MUTEX_LOCK_OBJECT(mtx), LOP_EXCLUSIVE);
 }
 
 int
 mtx_enter_try(struct mutex *mtx)
 {
-   struct cpu_info *owner, *ci = curcpu();
+   struct cpu_info *ci = curcpu();
+   unsigned int t;
int s;
 
/* Avoid deadlocks after panic or in DDB */
@@ -290,13 +312,15 @@ mtx_enter_try(struct mutex *mtx)
if (mtx->mtx_wantipl != IPL_NONE)
s = splraise(mtx->mtx_wantipl);
 
-   owner = atomic_cas_ptr(&mtx->mtx_owner, NULL, ci);
 #ifdef DIAGNOSTIC
-   if (__predict_false(owner == ci))
+   if (__predict_false(mtx->mtx_owner == ci))
panic("mtx %p: locking against myself", mtx);
 #endif
-   if (owner == NULL) {
+
+   t = READ_ONCE(mtx->mtx_cur);
+   if (atomic_cas_uint(&mtx->mtx_ticket, t, t + 1) == t) {
membar_enter_after_atomic();
+   mtx->mtx_owner = curcpu();
if (mtx->mtx_wantipl != IPL_NONE)
mtx->mtx_oldipl = s;
 #ifdef DIAGNOSTIC
@@ -369,6 +393,9 @@ mtx_leave(struct mutex *mtx)
membar_exit();
 #endif
mtx->mtx_owner = NULL;
+#ifdef MULTIPROCESSOR
+   atomic_inc_int_nv(&mtx->mtx_cur);
+#endif
if (mtx->mtx_wantipl != IPL_NONE)
splx(s);
 }



Change vm_dsize to vsize_t

2021-09-06 Thread Greg Steuck
In the course of making ASan work on OpenBSD I ran into an accounting
limitation. struct vmspace declares vm_dsize as segsz_t (aka int32_t).
This effectively limits it to 2^31 pages (2^43 bytes on amd64). This
would be enough if didn't also count sparse allocation.

ASan allocates 1/8th of the process address space as shadow memory. It
is very sparsely populated, still given VM_MAXUSER_ADDRESS value of
0x7f7fc000, it goes up to 2^47 bytes which then requires 2^44
bytes of shadow. So, it won't fit.

Hence the following unfinished patch which allows simple ASan'd programs
to detect memory errors. If people don't see an alternative solution,
I'll fix up the users of kinfo_proc.p_vm_dsize and we can decide
when/how this should land.

>From 42c776531620e9baa6735da71349c3c045fb64d8 Mon Sep 17 00:00:00 2001
From: Greg Steuck 
Date: Sun, 5 Sep 2021 13:28:43 -0700
Subject: [PATCH] Change struct vmspace to use vsize_t vm_dused

This was overflowing given high MAXDSIZ. This is very appropriate given
that the field is usually incremented by a value returned by
uvmspace_dused which returns vsize_t.

The change is not finished, only kernel is fixed so far. Userspace
tools consuming p_vm_dsize from kinfo_proc are likely not correct.
---
 sys/sys/sysctl.h | 2 +-
 sys/uvm/uvm_extern.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/sys/sys/sysctl.h b/sys/sys/sysctl.h
index afdc0689dee..868ef82c696 100644
--- a/sys/sys/sysctl.h
+++ b/sys/sys/sysctl.h
@@ -443,7 +443,7 @@ struct kinfo_proc {
 
int32_t p_vm_rssize;/* SEGSZ_T: current resident set size 
in pages */
int32_t p_vm_tsize; /* SEGSZ_T: text size (pages) */
-   int32_t p_vm_dsize; /* SEGSZ_T: data size (pages) */
+   u_int64_t   p_vm_dsize; /* VSIZE_T: data size (pages) */
int32_t p_vm_ssize; /* SEGSZ_T: stack size (pages) */
 
int64_t p_uvalid;   /* CHAR: following p_u* members from 
struct user are valid */
diff --git a/sys/uvm/uvm_extern.h b/sys/uvm/uvm_extern.h
index faa4a2e5449..ebc74d97917 100644
--- a/sys/uvm/uvm_extern.h
+++ b/sys/uvm/uvm_extern.h
@@ -207,7 +207,7 @@ struct vmspace {
segsz_t vm_swrss;   /* resident set size before last swap */
segsz_t vm_tsize;   /* text size (pages) XXX */
segsz_t vm_dsize;   /* data size (pages) XXX */
-   segsz_t vm_dused;   /* data segment length (pages) XXX */
+   vsize_t vm_dused;   /* data segment length (pages) XXX */
segsz_t vm_ssize;   /* [v] stack size (pages) */
caddr_t vm_taddr;   /* [I] user virtual address of text */
caddr_t vm_daddr;   /* [I] user virtual address of data */
-- 
2.33.0



Re: [Patch] Document /upgrade.site in sysupgrade(8) man page

2021-09-06 Thread Aaron Poffenberger
Ping.

Will someone commit this? Seems like no one knows about /upgrade.site and it
fits well with sysupgrade(8).

--Aaron

On 2021-09-02 10:18 -0500, Aaron Poffenberger  wrote:
> Any further thoughts on this patch to the man page?
> 
> Cheers,
> 
> --Aaron
> 
> On 2021-08-28 12:53 -0500, Aaron Poffenberger  wrote:
> > On 2021-08-28 19:45 +0200, Sebastien Marie  wrote:
> > > On Sat, Aug 28, 2021 at 05:05:18PM +, Klemens Nanni wrote:
> > > > On Sat, Aug 28, 2021 at 10:44:48AM -0500, Aaron Poffenberger wrote:
> > > > > Based on conversations in another thread, here's a patch documenting
> > > > > use of /upgrade.site in the sysupgrade(8) man page.
> > > > > 
> > > > > The revised doc references /upgrade.site and includes examples for
> > > > > updating packages from Sebastien Marie.
> > > > 
> > > > Documenting is the right approach, imho (I didn't even know about
> > > > $MODE.site) but this should probably be done in autoinstall(8).
> > > > 
> > > > This feature has nothing to do with sysupgrade per se and next to
> > > > upgrade.site there's also install.site.
> > > 
> > > $MODE.site isn't specially related to autoinstall(8) too :)
> > > 
> > > > I'd amend autoinstall(8) and briefly mention it in sysupgrade(8), just
> > > > via EXAMPLES or so to avoid duplication but showing a neat usecase.
> > > 
> > > Currently, these scripts seems to be only documented in the FAQ
> > > (https://www.openbsd.org/faq/faq4.html#site). so having some
> > > additionnal references at them in few man pages would be good.
> > > 
> > > Having examples in sysupgrade(8) and in autoinstall(8) makes sense to
> > > me.
> > > 
> > > FAQ could be expanded a bit too.
> > > 
> > > Thanks.
> > > -- 
> > > Sebastien Marie
> > > 
> > 
> > I agree that /install.site needs explaining, but I don't think it fits
> > well in autoinstall(8). siteXX.tgz isn't touched on there and would have
> > to be addressed as well.
> > 
> > I wouldn't mine working on that, but I'd prefer to put it where it belongs,
> > or in a separate man page.
> > 
> > New diff attached. I see I put the wrong file name in the FILES section. 
> > Also,
> > I simplified the example back to Sebastien Marie's original.
> > 
> > --Aaron
> > 
> > 
> > Index: sysupgrade.8
> > ===
> > RCS file: /cvs/src/usr.sbin/sysupgrade/sysupgrade.8,v
> > retrieving revision 1.10
> > diff -u -p -r1.10 sysupgrade.8
> > --- sysupgrade.83 Oct 2019 12:43:58 -   1.10
> > +++ sysupgrade.828 Aug 2021 17:48:18 -
> > @@ -46,6 +46,11 @@ The bootloader will automatically choose
> >  triggering a one-shot upgrade using the files in
> >  .Pa /home/_sysupgrade .
> >  .Pp
> > +If
> > +.Pa /upgrade.site
> > +exists and is executable, it is executed at the end of the upgrade
> > +process, prior to rebooting.
> > +.Pp
> >  The options are as follows:
> >  .Bl -tag -width Ds
> >  .It Fl f
> > @@ -73,16 +78,39 @@ This is the default if the system is cur
> >  Response file for the ramdisk kernel.
> >  .It Pa /bsd.upgrade
> >  The ramdisk kernel to trigger an unattended upgrade.
> > +.It Pa /upgrade.site
> > +Executable file of actions to run after upgrade.
> >  .It Pa /etc/installurl
> >  .Ox
> >  mirror top-level URL for fetching an upgrade.
> >  .It Pa /home/_sysupgrade
> >  Directory the upgrade is downloaded to.
> >  .El
> > +.Sh EXAMPLES
> > +.Pa /upgrade.site
> > +script to upgrade packages and check sysclean when
> > +.Pa /etc/rc.firsttime
> > +runs:
> > +.Bd -literal
> > +   #!/bin/sh
> > +   PATH=/sbin:/bin:/usr/sbin:/usr/bin
> > +
> > +   # upgrade packages
> > +   echo 'pkg_add -Iu' >>/etc/rc.firsttime
> > +
> > +   # run sysclean (if installed)
> > +   echo '[ -x /usr/local/sbin/sysclean ] && \\
> > +   /usr/local/sbin/sysclean | mail -Es sysclean \\
> > +   root &' >>/etc/rc.firsttime
> > +
> > +   exit 0
> > +   #
> > +.Ed
> >  .Sh SEE ALSO
> >  .Xr signify 1 ,
> >  .Xr installurl 5 ,
> >  .Xr autoinstall 8 ,
> > +.Xr rc 8 ,
> >  .Xr release 8
> >  .Sh HISTORY
> >  .Nm