[CAN IGNORE] Proposal for new bc(1) and dc(1)

2021-06-16 Thread Gavin Howard
Hello,

My name is Gavin Howard. I have developed a new bc(1) and dc(1)
implementation. [0]

I propose replacing the current implementations with mine.

Advantages:

* Performance. [1]
* Extensions for ease of use.
* With build options to remove most extensions, if desired.
* Compatible with GNU bc.
* Already used by default in FreeBSD.
* Fuzzed thoroughly.
* No exec needed for bc(1) (both programs are contained in the same
  multi-call binary).

Expectations met:

* Already uses pledge(2) and unveil(2).
* No dependencies beyond C99, POSIX `make`, and POSIX `sh`.
* This includes no dependency on editline(3), even though my bc(1)
  and dc(1) have a history implementation.
* Thorough test suite.
* Comprehensive man pages.
* Locale support.

Disadvantages:

* There are incompatibilities with the current bc(1) and dc(1), which
  are listed below. All users would need to be made aware of these
  incompatibilities, so they can update scripts, and scripts in `src/`
  will also need to be updated.

Incompatibilities (failing tests from `regress/usr.bin`):

1. Current bc(1) and dc(1) return 0 for length(a) where a is 0. Mine
   return 1. This causes my dc(1) to fail `dc/t1.in` and `dc/t28.in`.
2. Current dc(1) implements arrays as part of registers. Mine keeps
   arrays and registers separate. This causes my dc(1) to fail
   `dc/t1.in` and `dc/t8.in`.
3. Current dc(1) does not print a `nul` byte if given the `P` or `a`
   commands with 0 on the top of the stack. My dc(1) does (because it
   considers 0 to have one digit, see #1). This causes my dc(1) to fail
   `dc/t3.in` and `dc/t13.in`.
4. Current dc(1) starts with empty registers, and allows the user to pop
   all items off the register stack. My dc(1) starts with a single item
   in the register and does not allow the user to remove it.
5. Current dc(1) will push an item onto a register stack for the `s`
   command. My dc(1) does not since one already exists. This, combined
   with #4, causes my dc(1) to fail `dc/t5.in`.
6. Current bc(1) and dc(1) have a larger maximum `obase` than mine. This
   causes my dc(1) to fail `dc/t9.in`.
7. Current dc(1) does not reset on errors. My dc(1) does, so it fails
   `dc/t12.in`.
8. Current dc(1) allows register names with any character. My dc(1)
   requires non-control characters and has a different way of doing
   extended registers. This causes my dc(1) to fail `dc/t15.in`,
   `dc/t16.in`, `dc/t19.in`, `dc/t21.in`, and `dc/t23.in`.
9. Current bc(1) is a frontend to dc(1). Mine are combined into the same
   binary and generate and run bytecode. This means that my bc(1) fails
   all of the bc(1) regression tests (which generate dc(1) code) and
   does not have the `-c` option.

Notes:

My dc(1) also fails `dc/t10.in` because it doesn't have the `!` command,
but https://github.com/openbsd/src/commit/dc405aa075 makes it appear as
though the current dc(1) does not have the `!` command either.

In https://youtu.be/gvmGfpMgny4?t=1277 , Bob Beck said that unveil(2)
must not be used on command-line arguments, so I use unveil(2) after all
command-line files are executed.

The current version is 4.0.2. I am planning to release version 4.1.0
soon, but have held off in case you are interested and have feedback
that might help.

I am willing to help maintain them if they are put into OpenBSD, but I
am also willing to pass them off to whoever you wish, should you wish to
do so.

I do have a mirror on GitHub.

If you are not interested, feel free to ignore this email.

Regardless, thank you for your time.

Gavin Howard

[0]: https://git.yzena.com/gavin/bc
[1]: https://git.yzena.com/gavin/bc/src/branch/master/manuals/benchmarks.md



vmd(8): add barebones vioblk GET_ID support

2021-06-16 Thread Dave Voutila
The virtio spec has had a de facto command for drivers to read the
serial number off a virtual block device. QEMU introduced this feature
years ago. Last November, the virtio governing group voted in favor of
adopting it officially into v1.2 (the next virtio spec) [1].

The diff below adds basic handling of the request, returning an
empty serial number. (Serial numbers are limited to 20 bytes.) This
stops vmd from complaining about "unsupported command 0x8" when guests
send this command type.
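
As a rough illustration of what returning an empty serial number amounts
to, here is a small user-space sketch (the helper name is made up; only
the 20-byte VIRTIO_BLK_ID_BYTES limit comes from the text above, the rest
is not vmd code):

#include <stdint.h>
#include <string.h>

#define VIRTIO_BLK_ID_BYTES	20	/* serial numbers are limited to 20 bytes */

/*
 * Sketch only: copy an all-zero (empty) serial number into the buffer
 * backing the guest's data descriptor, truncated to the buffer length.
 */
static size_t
vioblk_fill_empty_id(uint8_t *buf, size_t buflen)
{
	char blkid[VIRTIO_BLK_ID_BYTES];
	size_t n = buflen < sizeof(blkid) ? buflen : sizeof(blkid);

	memset(blkid, 0, sizeof(blkid));
	memcpy(buf, blkid, n);
	return n;
}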

The secdata_desc{,_idx} variables are renamed to just data_desc{,_idx}
to match the change semantically, since they're now used for more than
sector data.

This is primarily part of my work to clean up vmd's virtio
implementation, bring it more up to date, and align it with the current
capabilities of our own v{io,ioblk,ioscsi,etc.}(4) drivers. (vioblk(4)
doesn't support this yet, but Linux guests use it frequently.)

OK?

-dv

[1] https://www.oasis-open.org/committees/ballot.php?id=3536


Index: virtio.c
===
RCS file: /cvs/src/usr.sbin/vmd/virtio.c,v
retrieving revision 1.89
diff -u -p -r1.89 virtio.c
--- virtio.c16 Jun 2021 16:55:02 -  1.89
+++ virtio.c17 Jun 2021 00:10:15 -
@@ -460,15 +460,16 @@ vioblk_notifyq(struct vioblk_dev *dev)
 {
uint64_t q_gpa;
uint32_t vr_sz;
-   uint16_t idx, cmd_desc_idx, secdata_desc_idx, ds_desc_idx;
+   uint16_t idx, cmd_desc_idx, data_desc_idx, ds_desc_idx;
uint8_t ds;
int cnt, ret;
off_t secbias;
char *vr;
-   struct vring_desc *desc, *cmd_desc, *secdata_desc, *ds_desc;
+   struct vring_desc *desc, *cmd_desc, *data_desc, *ds_desc;
struct vring_avail *avail;
struct vring_used *used;
struct virtio_blk_req_hdr cmd;
+   char blkid[VIRTIO_BLK_ID_BYTES];

ret = 0;

@@ -526,10 +527,10 @@ vioblk_notifyq(struct vioblk_dev *dev)
switch (cmd.type) {
case VIRTIO_BLK_T_IN:
/* first descriptor */
-   secdata_desc_idx = cmd_desc->next & VIOBLK_QUEUE_MASK;
-   secdata_desc = &desc[secdata_desc_idx];
+   data_desc_idx = cmd_desc->next & VIOBLK_QUEUE_MASK;
+   data_desc = &desc[data_desc_idx];

-   if ((secdata_desc->flags & VRING_DESC_F_NEXT) == 0) {
+   if ((data_desc->flags & VRING_DESC_F_NEXT) == 0) {
log_warnx("unchained vioblk data descriptor "
"received (idx %d)", cmd_desc_idx);
goto out;
@@ -542,7 +543,7 @@ vioblk_notifyq(struct vioblk_dev *dev)
const uint8_t *secdata;

info = vioblk_start_read(dev,
-   cmd.sector + secbias, secdata_desc->len);
+   cmd.sector + secbias, data_desc->len);

/* read the data, use current data descriptor */
secdata = vioblk_finish_read(info);
@@ -553,11 +554,11 @@ vioblk_notifyq(struct vioblk_dev *dev)
goto out;
}

-   if (write_mem(secdata_desc->addr, secdata,
-   secdata_desc->len)) {
+   if (write_mem(data_desc->addr, secdata,
+   data_desc->len)) {
log_warnx("can't write sector "
"data to gpa @ 0x%llx",
-   secdata_desc->addr);
+   data_desc->addr);
dump_descriptor_chain(desc,
cmd_desc_idx);
vioblk_free_info(info);
@@ -566,11 +567,11 @@ vioblk_notifyq(struct vioblk_dev *dev)

vioblk_free_info(info);

-   secbias += (secdata_desc->len /
+   secbias += (data_desc->len /
VIRTIO_BLK_SECTOR_SIZE);
-   secdata_desc_idx = secdata_desc->next &
+   data_desc_idx = data_desc->next &
VIOBLK_QUEUE_MASK;
-   secdata_desc = &desc[secdata_desc_idx];
+   data_desc = &desc[data_desc_idx];

/* Guard against infinite chains */
if (++cnt >= VIOBLK_QUEUE_SIZE) {
@@ -578,27 +579,27 @@ vioblk_notifyq(struct vioblk_dev *dev)
"invalid", __func__);
goto out;
}
-  

Re: ifnewlladdr spl

2021-06-16 Thread Alexander Bluhm
Thanks dlg@ for the hint with the ixl(4) fixes in current.  I will
try to solve my 6.6 problem with that.

On Wed, Jun 16, 2021 at 10:19:03AM +0200, Martin Pieuchot wrote:
> The diff discussed in this thread reduces the scope of the splnet/splx()
> dance to only surround the modification of `if_flags'.   How is this
> related to what you said?  Is it because `if_flags' is read in interrupt
> handlers and it isn't modified atomically?  Does that imply that every
> modification of `if_flags' should be done at IPL_NET?  Does that mean
> some love is needed to ensure reading `if_flags' is coherent?

I have checked some drivers: hme, gem, em, ix.

They read if_flags in the interrupt handler, but never set it there.
So it seems to be safe to change it without splnet().  We can remove
it from ifnewlladdr().

Most changes seem to happen from the ioctl path, where we have the net
lock.  But I also see places where if_flags is changed with the kernel
lock, e.g. in the carp_carpdev_state() task and in ixgbe_watchdog().

As we also hold the kernel lock in the ioctl path, I think this is the
lock that protects if_flags modifications.  Look at if_watchdog_task()
and carp_carpdev_state().

> Does that change also imply that it is safe to issue a SIOCSIFFLAGS on
> a legacy drivers without blocking interrupts?  If the IPL needs to be
> raised, this is left to the driver, right?

The drivers I have checked do splnet() in their ioctl function.
ifioctl() calls driver ioctl with net lock, but without splnet().
ixl does not use splnet() at all.

> Was the current splnet/splx() dance an easy way to block packet reception
> between multiple SIOCSIFFLAGS ioctls?  This might have been a way to not
> receive packets on a DOWN interface.  This is no longer the case on MP
> kernels as there's a window between the UP and DOWN ioctls.  Do we care?
> Is this down/up/down dance fine for legacy and modern drivers?

I guess that if we allow interrupts to happen, all packets should have
been processed by the time we return from setting the interface down.

Is this the correct version of the diff?  Not tested yet.

bluhm


Index: net/if.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/net/if.c,v
retrieving revision 1.641
diff -u -p -r1.641 if.c
--- net/if.c25 May 2021 22:45:09 -  1.641
+++ net/if.c16 Jun 2021 23:46:42 -
@@ -3107,9 +3107,10 @@ ifnewlladdr(struct ifnet *ifp)
 #endif
struct ifreq ifrq;
short up;
-   int s;
 
-   s = splnet();
+   NET_ASSERT_LOCKED();/* for ioctl and in6 */
+   KERNEL_ASSERT_LOCKED(); /* for if_flags */
+
up = ifp->if_flags & IFF_UP;
 
if (up) {
@@ -3143,7 +3144,6 @@ ifnewlladdr(struct ifnet *ifp)
ifrq.ifr_flags = ifp->if_flags;
(*ifp->if_ioctl)(ifp, SIOCSIFFLAGS, (caddr_t)&ifrq);
}
-   splx(s);
 }
 
 void



Re: [External] : Re: parallel forwarding vs. bridges

2021-06-16 Thread Alexandr Nedvedicky
Hello,

Re-sending the same diff, updated to the current if_pfsync.c.

The diff avoids a recursion on PF_LOCK(), which is indicated
by the panic stack here:


> > production firewalls panicked with the pf lock trying to lock against
> > itself a couple of nights ago:
> > 
> > db_enter() at db_enter+0x5
> > panic(81d47212) at panic+0x12a
> > rw_enter(82060e70,1) at rw_enter+0x261
> > pf_test(18,2,8158f000,800024d03db8) at pf_test+0x118c
> > ip6_output(fd80049a3a00,0,0,0,800024d03eb0,0) at
> > ip6_output+0xd33
> > nd6_ns_output(8158f000,0,800024d04228,fd8279b3b420,0) at
> > nd6_ns
> > _output+0x3e2
> > nd6_resolve(8158f000,fd816c6b2ae0,fd80659d8300,800024d04220
> > ,800024d040d8) at nd6_resolve+0x29d
> > ether_resolve(8158f000,fd80659d8300,800024d04220,fd816c6b2a
> > e0,800024d040d8) at ether_resolve+0x127
> > ether_output(8158f000,fd80659d8300,800024d04220,fd816c6b2ae
> > 0) at ether_output+0x2a
> > ip6_output(fd80659d8300,0,0,0,0,0) at ip6_output+0x1180
> > pfsync_undefer_notify(fd841dbec7b8) at pfsync_undefer_notify+0xac
> > pfsync_undefer(fd841dbec7b8,0) at pfsync_undefer+0x8d
> > pfsync_defer(fd82303cb310,fd8065a2) at pfsync_defer+0xfe
> > pf_test_rule(800024d04600,800024d045e8,800024d045e0,800024d045f
> > 0,800024d045d8,800024d045fe) at pf_test_rule+0x693
> > pf_test(2,3,815a9000,800024d04798) at pf_test+0x10f1
> > ip_output(fd8065a2,0,800024d04850,1,0,0) at ip_output+0x829
> > ip_forward(fd8065a2,81541800,fd817a7722d0,0) at
> > ip_forward+
> > 0x27a  
> > ip_input_if(800024d04a80,800024d04a7c,4,0,81541800) at
> > ip_input
> > _if+0x5fd
> > ipv4_input(81541800,fd8065a2) at ipv4_input+0x37
> > carp_input(8158f000,fd8065a2,5e000158) at
> > carp_input+0x1ac
> > ether_input(8158f000,fd8065a2) at ether_input+0x1c0
> > vlan_input(8152f000,fd8065a2) at vlan_input+0x19a
> > ether_input(8152f000,fd8065a2) at ether_input+0x76
> > if_input_process(801df048,800024d04ca8) at
> > if_input_process+0x5a
> > ifiq_process(801dbe00) at ifiq_process+0x6f
> > taskq_thread(8002b080) at taskq_thread+0x7b
> > end trace frame: 0x0, count: -26
> > 
> 

The recursion on PF_LOCK() is avoided by postponing the call to
pfsync_undefer() to the very last phase of pf_test(), where we run
without holding PF_LOCK().

I believe we will be able to revert the diff below once the scope of
PF_LOCK() in pf_test() is reduced. That future is not here yet: to reduce
the PF_LOCK() scope in pf_test() we must first give some love to pf's tables.
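
To make the intended call order concrete, a rough caller-side sketch
(illustrative only; the pf_test() side is not part of the diff below,
the names just mirror the new pfsync_defer() signature):

	struct pfsync_deferral *pd = NULL;

	PF_LOCK();
	/* ... rule matching; pfsync_defer(st, m, &pd) may hand back a
	 * deferral it could not dispatch while PF_LOCK() is held ... */
	PF_UNLOCK();

	/* very last phase of pf_test(): now it is safe to re-enter the stack */
	if (pd != NULL)
		pfsync_undefer(pd, 0);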

comments? OKs?

thanks and
regards
sashan

8<---8<---8<--8<

diff --git a/sys/net/if_pfsync.c b/sys/net/if_pfsync.c
index f1d292d67ff..9e7dc5064b9 100644
--- a/sys/net/if_pfsync.c
+++ b/sys/net/if_pfsync.c
@@ -1931,7 +1931,7 @@ pfsync_insert_state(struct pf_state *st)
 }
 
 int
-pfsync_defer(struct pf_state *st, struct mbuf *m)
+pfsync_defer(struct pf_state *st, struct mbuf *m, struct pfsync_deferral **ppd)
 {
struct pfsync_softc *sc = pfsyncif;
struct pfsync_deferral *pd;
@@ -1944,21 +1944,30 @@ pfsync_defer(struct pf_state *st, struct mbuf *m)
m->m_flags & (M_BCAST|M_MCAST))
return (0);
 
+   pd = pool_get(&sc->sc_pool, M_NOWAIT);
+   if (pd == NULL)
+   return (0);
+
+   /*
+* deferral queue grows faster, than timeout can consume,
+* we have to ask packet (caller) to help timer and dispatch
+* one deferral for us.
+*
+* We wish to call pfsync_undefer() here. Unfortunately we can't,
+* because pfsync_undefer() will be calling to ip_output(),
+* which in turn will call to pf_test(), which would then attempt
+* to grab PF_LOCK() we currently hold.
+*/
if (sc->sc_deferred >= 128) {
mtx_enter(&sc->sc_deferrals_mtx);
-   pd = TAILQ_FIRST(&sc->sc_deferrals);
-   if (pd != NULL) {
-   TAILQ_REMOVE(&sc->sc_deferrals, pd, pd_entry);
+   *ppd = TAILQ_FIRST(&sc->sc_deferrals);
+   if (*ppd != NULL) {
+   TAILQ_REMOVE(&sc->sc_deferrals, *ppd, pd_entry);
sc->sc_deferred--;
}
mtx_leave(&sc->sc_deferrals_mtx);
-   if (pd != NULL)
-   pfsync_undefer(pd, 0);
-   }
-
-   pd = pool_get(&sc->sc_pool, M_NOWAIT);
-   if (pd == NULL)
-   return (0);
+   } else
+   *ppd = NULL;
 
m->m_pkthdr.pf.flags |= PF_TAG_GENERATED;
SET(st->state_flags, PFSTATE_ACK);
diff --git a/sys/net/if_pfsync.h b/sys/net/if_pfsync.h
index 2520c87a36c..037219537e8 100644
--- a/sys/net/if_pfsync.h
+++ 

Re: ipsec crypto kernel lock

2021-06-16 Thread Vitaliy Makkoveev
ok mvs@

> On 17 Jun 2021, at 01:06, Alexander Bluhm  wrote:
> 
> On Wed, Jun 16, 2021 at 11:58:48PM +0300, Vitaliy Makkoveev wrote:
>> crypto_getreq() and crypto_freereq() don't require kernel lock.
> 
> Indeed, new diff.
> 
> ok?
> 
> bluhm
> 
> Index: netinet/ip_ah.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ip_ah.c,v
> retrieving revision 1.146
> diff -u -p -r1.146 ip_ah.c
> --- netinet/ip_ah.c   25 Feb 2021 02:48:21 -  1.146
> +++ netinet/ip_ah.c   16 Jun 2021 21:59:38 -
> @@ -98,6 +98,7 @@ ah_init(struct tdb *tdbp, struct xformsw
> {
>   struct auth_hash *thash = NULL;
>   struct cryptoini cria, crin;
> + int error;
> 
>   /* Authentication operation. */
>   switch (ii->ii_authalg) {
> @@ -162,7 +163,10 @@ ah_init(struct tdb *tdbp, struct xformsw
>   cria.cri_next = &crin;
>   }
> 
> - return crypto_newsession(&tdbp->tdb_cryptoid, &cria, 0);
> + KERNEL_LOCK();
> + error = crypto_newsession(&tdbp->tdb_cryptoid, &cria, 0);
> + KERNEL_UNLOCK();
> + return error;
> }
> 
> /*
> @@ -171,7 +175,7 @@ ah_init(struct tdb *tdbp, struct xformsw
> int
> ah_zeroize(struct tdb *tdbp)
> {
> - int err;
> + int error;
> 
>   if (tdbp->tdb_amxkey) {
>   explicit_bzero(tdbp->tdb_amxkey, tdbp->tdb_amxkeylen);
> @@ -179,9 +183,11 @@ ah_zeroize(struct tdb *tdbp)
>   tdbp->tdb_amxkey = NULL;
>   }
> 
> - err = crypto_freesession(tdbp->tdb_cryptoid);
> + KERNEL_LOCK();
> + error = crypto_freesession(tdbp->tdb_cryptoid);
> + KERNEL_UNLOCK();
>   tdbp->tdb_cryptoid = 0;
> - return err;
> + return error;
> }
> 
> /*
> @@ -696,7 +702,10 @@ ah_input(struct mbuf *m, struct tdb *tdb
>   tc->tc_rdomain = tdb->tdb_rdomain;
>   memcpy(&tc->tc_dst, &tdb->tdb_dst, sizeof(union sockaddr_union));
> 
> - return crypto_dispatch(crp);
> + KERNEL_LOCK();
> + error = crypto_dispatch(crp);
> + KERNEL_UNLOCK();
> + return error;
> 
>  drop:
>   m_freem(m);
> @@ -1144,7 +1153,10 @@ ah_output(struct mbuf *m, struct tdb *td
>   tc->tc_rdomain = tdb->tdb_rdomain;
>   memcpy(&tc->tc_dst, &tdb->tdb_dst, sizeof(union sockaddr_union));
> 
> - return crypto_dispatch(crp);
> + KERNEL_LOCK();
> + error = crypto_dispatch(crp);
> + KERNEL_UNLOCK();
> + return error;
> 
>  drop:
>   m_freem(m);
> Index: netinet/ip_esp.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ip_esp.c,v
> retrieving revision 1.162
> diff -u -p -r1.162 ip_esp.c
> --- netinet/ip_esp.c  25 Feb 2021 02:48:21 -  1.162
> +++ netinet/ip_esp.c  16 Jun 2021 22:00:08 -
> @@ -93,6 +93,7 @@ esp_init(struct tdb *tdbp, struct xforms
>   struct enc_xform *txform = NULL;
>   struct auth_hash *thash = NULL;
>   struct cryptoini cria, crie, crin;
> + int error;
> 
>   if (!ii->ii_encalg && !ii->ii_authalg) {
>   DPRINTF(("esp_init(): neither authentication nor encryption "
> @@ -294,8 +295,11 @@ esp_init(struct tdb *tdbp, struct xforms
>   cria.cri_key = ii->ii_authkey;
>   }
> 
> - return crypto_newsession(&tdbp->tdb_cryptoid,
> + KERNEL_LOCK();
> + error = crypto_newsession(&tdbp->tdb_cryptoid,
>   (tdbp->tdb_encalgxform ? &crie : &cria), 0);
> + KERNEL_UNLOCK();
> + return error;
> }
> 
> /*
> @@ -304,7 +308,7 @@ esp_init(struct tdb *tdbp, struct xforms
> int
> esp_zeroize(struct tdb *tdbp)
> {
> - int err;
> + int error;
> 
>   if (tdbp->tdb_amxkey) {
>   explicit_bzero(tdbp->tdb_amxkey, tdbp->tdb_amxkeylen);
> @@ -318,9 +322,11 @@ esp_zeroize(struct tdb *tdbp)
>   tdbp->tdb_emxkey = NULL;
>   }
> 
> - err = crypto_freesession(tdbp->tdb_cryptoid);
> + KERNEL_LOCK();
> + error = crypto_freesession(tdbp->tdb_cryptoid);
> + KERNEL_UNLOCK();
>   tdbp->tdb_cryptoid = 0;
> - return err;
> + return error;
> }
> 
> #define MAXBUFSIZ (AH_ALEN_MAX > ESP_MAX_IVS ? AH_ALEN_MAX : ESP_MAX_IVS)
> @@ -519,7 +525,10 @@ esp_input(struct mbuf *m, struct tdb *td
>   crde->crd_len = m->m_pkthdr.len - (skip + hlen + alen);
>   }
> 
> - return crypto_dispatch(crp);
> + KERNEL_LOCK();
> + error = crypto_dispatch(crp);
> + KERNEL_UNLOCK();
> + return error;
> 
>  drop:
>   m_freem(m);
> @@ -1006,7 +1015,10 @@ esp_output(struct mbuf *m, struct tdb *t
>   crda->crd_len = m->m_pkthdr.len - (skip + alen);
>   }
> 
> - return crypto_dispatch(crp);
> + KERNEL_LOCK();
> + error = crypto_dispatch(crp);
> + KERNEL_UNLOCK();
> + return error;
> 
>  drop:
>   m_freem(m);
> Index: netinet/ip_ipcomp.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ip_ipcomp.c,v
> retrieving revision 1.67
> diff -u -p -r1.67 ip_ipcomp.c
> --- 

Re: ipsec crypto kernel lock

2021-06-16 Thread Alexander Bluhm
On Wed, Jun 16, 2021 at 11:58:48PM +0300, Vitaliy Makkoveev wrote:
> crypto_getreq() and crypto_freereq() don't require kernel lock.

Indeed, new diff.

ok?

bluhm

Index: netinet/ip_ah.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ip_ah.c,v
retrieving revision 1.146
diff -u -p -r1.146 ip_ah.c
--- netinet/ip_ah.c 25 Feb 2021 02:48:21 -  1.146
+++ netinet/ip_ah.c 16 Jun 2021 21:59:38 -
@@ -98,6 +98,7 @@ ah_init(struct tdb *tdbp, struct xformsw
 {
struct auth_hash *thash = NULL;
struct cryptoini cria, crin;
+   int error;
 
/* Authentication operation. */
switch (ii->ii_authalg) {
@@ -162,7 +163,10 @@ ah_init(struct tdb *tdbp, struct xformsw
cria.cri_next = &crin;
}
 
-   return crypto_newsession(&tdbp->tdb_cryptoid, &cria, 0);
+   KERNEL_LOCK();
+   error = crypto_newsession(&tdbp->tdb_cryptoid, &cria, 0);
+   KERNEL_UNLOCK();
+   return error;
 }
 
 /*
@@ -171,7 +175,7 @@ ah_init(struct tdb *tdbp, struct xformsw
 int
 ah_zeroize(struct tdb *tdbp)
 {
-   int err;
+   int error;
 
if (tdbp->tdb_amxkey) {
explicit_bzero(tdbp->tdb_amxkey, tdbp->tdb_amxkeylen);
@@ -179,9 +183,11 @@ ah_zeroize(struct tdb *tdbp)
tdbp->tdb_amxkey = NULL;
}
 
-   err = crypto_freesession(tdbp->tdb_cryptoid);
+   KERNEL_LOCK();
+   error = crypto_freesession(tdbp->tdb_cryptoid);
+   KERNEL_UNLOCK();
tdbp->tdb_cryptoid = 0;
-   return err;
+   return error;
 }
 
 /*
@@ -696,7 +702,10 @@ ah_input(struct mbuf *m, struct tdb *tdb
tc->tc_rdomain = tdb->tdb_rdomain;
memcpy(&tc->tc_dst, &tdb->tdb_dst, sizeof(union sockaddr_union));
 
-   return crypto_dispatch(crp);
+   KERNEL_LOCK();
+   error = crypto_dispatch(crp);
+   KERNEL_UNLOCK();
+   return error;
 
  drop:
m_freem(m);
@@ -1144,7 +1153,10 @@ ah_output(struct mbuf *m, struct tdb *td
tc->tc_rdomain = tdb->tdb_rdomain;
memcpy(&tc->tc_dst, &tdb->tdb_dst, sizeof(union sockaddr_union));
 
-   return crypto_dispatch(crp);
+   KERNEL_LOCK();
+   error = crypto_dispatch(crp);
+   KERNEL_UNLOCK();
+   return error;
 
  drop:
m_freem(m);
Index: netinet/ip_esp.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ip_esp.c,v
retrieving revision 1.162
diff -u -p -r1.162 ip_esp.c
--- netinet/ip_esp.c25 Feb 2021 02:48:21 -  1.162
+++ netinet/ip_esp.c16 Jun 2021 22:00:08 -
@@ -93,6 +93,7 @@ esp_init(struct tdb *tdbp, struct xforms
struct enc_xform *txform = NULL;
struct auth_hash *thash = NULL;
struct cryptoini cria, crie, crin;
+   int error;
 
if (!ii->ii_encalg && !ii->ii_authalg) {
DPRINTF(("esp_init(): neither authentication nor encryption "
@@ -294,8 +295,11 @@ esp_init(struct tdb *tdbp, struct xforms
cria.cri_key = ii->ii_authkey;
}
 
-   return crypto_newsession(&tdbp->tdb_cryptoid,
+   KERNEL_LOCK();
+   error = crypto_newsession(&tdbp->tdb_cryptoid,
(tdbp->tdb_encalgxform ? &crie : &cria), 0);
+   KERNEL_UNLOCK();
+   return error;
 }
 
 /*
@@ -304,7 +308,7 @@ esp_init(struct tdb *tdbp, struct xforms
 int
 esp_zeroize(struct tdb *tdbp)
 {
-   int err;
+   int error;
 
if (tdbp->tdb_amxkey) {
explicit_bzero(tdbp->tdb_amxkey, tdbp->tdb_amxkeylen);
@@ -318,9 +322,11 @@ esp_zeroize(struct tdb *tdbp)
tdbp->tdb_emxkey = NULL;
}
 
-   err = crypto_freesession(tdbp->tdb_cryptoid);
+   KERNEL_LOCK();
+   error = crypto_freesession(tdbp->tdb_cryptoid);
+   KERNEL_UNLOCK();
tdbp->tdb_cryptoid = 0;
-   return err;
+   return error;
 }
 
 #define MAXBUFSIZ (AH_ALEN_MAX > ESP_MAX_IVS ? AH_ALEN_MAX : ESP_MAX_IVS)
@@ -519,7 +525,10 @@ esp_input(struct mbuf *m, struct tdb *td
crde->crd_len = m->m_pkthdr.len - (skip + hlen + alen);
}
 
-   return crypto_dispatch(crp);
+   KERNEL_LOCK();
+   error = crypto_dispatch(crp);
+   KERNEL_UNLOCK();
+   return error;
 
  drop:
m_freem(m);
@@ -1006,7 +1015,10 @@ esp_output(struct mbuf *m, struct tdb *t
crda->crd_len = m->m_pkthdr.len - (skip + alen);
}
 
-   return crypto_dispatch(crp);
+   KERNEL_LOCK();
+   error = crypto_dispatch(crp);
+   KERNEL_UNLOCK();
+   return error;
 
  drop:
m_freem(m);
Index: netinet/ip_ipcomp.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ip_ipcomp.c,v
retrieving revision 1.67
diff -u -p -r1.67 ip_ipcomp.c
--- netinet/ip_ipcomp.c 30 Sep 2019 01:53:05 -  1.67
+++ netinet/ip_ipcomp.c 16 Jun 2021 22:00:45 -
@@ -79,6 +79,7 @@ ipcomp_init(struct tdb *tdbp, struct xfo
 {

Re: ipsec crypto kernel lock

2021-06-16 Thread Vitaliy Makkoveev
Hi,

crypto_getreq() and crypto_freereq() don’t require kernel lock.

> On 16 Jun 2021, at 23:05, Alexander Bluhm  wrote:
> 
> Hi,
> 
> I have seen a kernel crash while forwarding and encrypting
> much traffic through OpenBSD IPsec gateways running iked.
> 
> kernel: protection fault trap, code=0
> Stopped at  aes_ctr_crypt+0x1e: addb$0x1,0x2e3(%rdi)
> 
> ddb{2}> trace
> aes_ctr_crypt(16367ed4be021a53,8000246e1db0) at aes_ctr_crypt+0x1e
> swcr_authenc(fd8132a21b08) at swcr_authenc+0x5c3
> swcr_process(fd8132a21b08) at swcr_process+0x1e8
> crypto_invoke(fd8132a21b08) at crypto_invoke+0xde
> taskq_thread(80200500) at taskq_thread+0x81
> end trace frame: 0x0, count: -5
> 
> *64926  109760  0  0  7 0x14200crypto
> 
> swcr_authenc() passes swe->sw_kschedule to aes_ctr_crypt(), but swe has
> been freed and contains deaf006cdeaf4152, which looks like some
> sort of poison.  I suspect a use after free.
> 
> The swe value comes from the swcr_sessions global pointers.  Its
> content looks sane in ddb.  No one touches it in swcr_authenc().  So
> I guess that another CPU changes the global structures while
> swcr_authenc() is working with it.
> 
> The crypto thread is protected by kernel lock, both network stack
> and pfkey use net lock.  The kernel lock has been recently removed
> from pfkey.
> 
> I think the required lock for the crypto framework is the kernel
> lock.  If crypto_ functions are called, IPsec must grab the kernel
> lock.  pfkey accesses crypto only via tdb_ functions, so this diff
> also covers that case.
> 
> ok?
> 
> bluhm
> 
> Index: netinet/ip_ah.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ip_ah.c,v
> retrieving revision 1.146
> diff -u -p -r1.146 ip_ah.c
> --- netinet/ip_ah.c   25 Feb 2021 02:48:21 -  1.146
> +++ netinet/ip_ah.c   16 Jun 2021 17:29:37 -
> @@ -98,6 +98,7 @@ ah_init(struct tdb *tdbp, struct xformsw
> {
>   struct auth_hash *thash = NULL;
>   struct cryptoini cria, crin;
> + int error;
> 
>   /* Authentication operation. */
>   switch (ii->ii_authalg) {
> @@ -162,7 +163,10 @@ ah_init(struct tdb *tdbp, struct xformsw
>   cria.cri_next = &crin;
>   }
> 
> - return crypto_newsession(&tdbp->tdb_cryptoid, &cria, 0);
> + KERNEL_LOCK();
> + error = crypto_newsession(&tdbp->tdb_cryptoid, &cria, 0);
> + KERNEL_UNLOCK();
> + return error;
> }
> 
> /*
> @@ -171,7 +175,7 @@ ah_init(struct tdb *tdbp, struct xformsw
> int
> ah_zeroize(struct tdb *tdbp)
> {
> - int err;
> + int error;
> 
>   if (tdbp->tdb_amxkey) {
>   explicit_bzero(tdbp->tdb_amxkey, tdbp->tdb_amxkeylen);
> @@ -179,9 +183,11 @@ ah_zeroize(struct tdb *tdbp)
>   tdbp->tdb_amxkey = NULL;
>   }
> 
> - err = crypto_freesession(tdbp->tdb_cryptoid);
> + KERNEL_LOCK();
> + error = crypto_freesession(tdbp->tdb_cryptoid);
> + KERNEL_UNLOCK();
>   tdbp->tdb_cryptoid = 0;
> - return err;
> + return error;
> }
> 
> /*
> @@ -626,7 +632,9 @@ ah_input(struct mbuf *m, struct tdb *tdb
>   }
> 
>   /* Get crypto descriptors. */
> + KERNEL_LOCK();
>   crp = crypto_getreq(1);
> + KERNEL_UNLOCK();
>   if (crp == NULL) {
>   DPRINTF(("%s: failed to acquire crypto descriptors\n",
>   __func__));
> @@ -696,11 +704,16 @@ ah_input(struct mbuf *m, struct tdb *tdb
>   tc->tc_rdomain = tdb->tdb_rdomain;
>   memcpy(&tc->tc_dst, &tdb->tdb_dst, sizeof(union sockaddr_union));
> 
> - return crypto_dispatch(crp);
> + KERNEL_LOCK();
> + error = crypto_dispatch(crp);
> + KERNEL_UNLOCK();
> + return error;
> 
>  drop:
>   m_freem(m);
> + KERNEL_LOCK();
>   crypto_freereq(crp);
> + KERNEL_UNLOCK();
>   free(tc, M_XDATA, 0);
>   return error;
> }
> @@ -1047,7 +1060,9 @@ ah_output(struct mbuf *m, struct tdb *td
> #endif
> 
>   /* Get crypto descriptors. */
> + KERNEL_LOCK();
>   crp = crypto_getreq(1);
> + KERNEL_UNLOCK();
>   if (crp == NULL) {
>   DPRINTF(("%s: failed to acquire crypto descriptors\n",
>   __func__));
> @@ -1144,11 +1159,16 @@ ah_output(struct mbuf *m, struct tdb *td
>   tc->tc_rdomain = tdb->tdb_rdomain;
>   memcpy(&tc->tc_dst, &tdb->tdb_dst, sizeof(union sockaddr_union));
> 
> - return crypto_dispatch(crp);
> + KERNEL_LOCK();
> + error = crypto_dispatch(crp);
> + KERNEL_UNLOCK();
> + return error;
> 
>  drop:
>   m_freem(m);
> + KERNEL_LOCK();
>   crypto_freereq(crp);
> + KERNEL_UNLOCK();
>   free(tc, M_XDATA, 0);
>   return error;
> }
> Index: netinet/ip_esp.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ip_esp.c,v
> retrieving revision 1.162
> diff -u -p -r1.162 ip_esp.c
> --- netinet/ip_esp.c  25 Feb 2021 02:48:21 -  1.162
> +++ 

ipsec crypto kernel lock

2021-06-16 Thread Alexander Bluhm
Hi,

I have seen a kernel crash while forwarding and encrypting
much traffic through OpenBSD IPsec gateways running iked.

kernel: protection fault trap, code=0
Stopped at  aes_ctr_crypt+0x1e: addb$0x1,0x2e3(%rdi)

ddb{2}> trace
aes_ctr_crypt(16367ed4be021a53,8000246e1db0) at aes_ctr_crypt+0x1e
swcr_authenc(fd8132a21b08) at swcr_authenc+0x5c3
swcr_process(fd8132a21b08) at swcr_process+0x1e8
crypto_invoke(fd8132a21b08) at crypto_invoke+0xde
taskq_thread(80200500) at taskq_thread+0x81
end trace frame: 0x0, count: -5

*64926  109760  0  0  7 0x14200crypto

swcr_authenc() passes swe->sw_kschedule to aes_ctr_crypt(), but swe has
been freed and contains deaf006cdeaf4152, which looks like some
sort of poison.  I suspect a use after free.

The swe value comes from the swcr_sessions global pointers.  Its
content looks sane in ddb.  No one touches it in swcr_authenc().  So
I guess that another CPU changes the global structures while
swcr_authenc() is working with it.

The crypto thread is protected by kernel lock, both network stack
and pfkey use net lock.  The kernel lock has been recently removed
from pfkey.

I think the required lock for the crypto framework is the kernel
lock.  If crypto_ functions are called, IPsec must grab the kernel
lock.  pfkey accesses crypto only via tdb_ functions, so this diff
also covers that case.
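
The pattern applied throughout the diff is mechanical; as a minimal
sketch (the wrapper name is mine, not something in the tree):

int
crypto_dispatch_klock(struct cryptop *crp)
{
	int error;

	/* the crypto taskq runs under the kernel lock, so take it
	 * around every crypto_*() call made from IPsec */
	KERNEL_LOCK();
	error = crypto_dispatch(crp);
	KERNEL_UNLOCK();
	return error;
}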

ok?

bluhm

Index: netinet/ip_ah.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ip_ah.c,v
retrieving revision 1.146
diff -u -p -r1.146 ip_ah.c
--- netinet/ip_ah.c 25 Feb 2021 02:48:21 -  1.146
+++ netinet/ip_ah.c 16 Jun 2021 17:29:37 -
@@ -98,6 +98,7 @@ ah_init(struct tdb *tdbp, struct xformsw
 {
struct auth_hash *thash = NULL;
struct cryptoini cria, crin;
+   int error;
 
/* Authentication operation. */
switch (ii->ii_authalg) {
@@ -162,7 +163,10 @@ ah_init(struct tdb *tdbp, struct xformsw
cria.cri_next = &crin;
}
 
-   return crypto_newsession(&tdbp->tdb_cryptoid, &cria, 0);
+   KERNEL_LOCK();
+   error = crypto_newsession(&tdbp->tdb_cryptoid, &cria, 0);
+   KERNEL_UNLOCK();
+   return error;
 }
 
 /*
@@ -171,7 +175,7 @@ ah_init(struct tdb *tdbp, struct xformsw
 int
 ah_zeroize(struct tdb *tdbp)
 {
-   int err;
+   int error;
 
if (tdbp->tdb_amxkey) {
explicit_bzero(tdbp->tdb_amxkey, tdbp->tdb_amxkeylen);
@@ -179,9 +183,11 @@ ah_zeroize(struct tdb *tdbp)
tdbp->tdb_amxkey = NULL;
}
 
-   err = crypto_freesession(tdbp->tdb_cryptoid);
+   KERNEL_LOCK();
+   error = crypto_freesession(tdbp->tdb_cryptoid);
+   KERNEL_UNLOCK();
tdbp->tdb_cryptoid = 0;
-   return err;
+   return error;
 }
 
 /*
@@ -626,7 +632,9 @@ ah_input(struct mbuf *m, struct tdb *tdb
}
 
/* Get crypto descriptors. */
+   KERNEL_LOCK();
crp = crypto_getreq(1);
+   KERNEL_UNLOCK();
if (crp == NULL) {
DPRINTF(("%s: failed to acquire crypto descriptors\n",
__func__));
@@ -696,11 +704,16 @@ ah_input(struct mbuf *m, struct tdb *tdb
tc->tc_rdomain = tdb->tdb_rdomain;
memcpy(&tc->tc_dst, &tdb->tdb_dst, sizeof(union sockaddr_union));
 
-   return crypto_dispatch(crp);
+   KERNEL_LOCK();
+   error = crypto_dispatch(crp);
+   KERNEL_UNLOCK();
+   return error;
 
  drop:
m_freem(m);
+   KERNEL_LOCK();
crypto_freereq(crp);
+   KERNEL_UNLOCK();
free(tc, M_XDATA, 0);
return error;
 }
@@ -1047,7 +1060,9 @@ ah_output(struct mbuf *m, struct tdb *td
 #endif
 
/* Get crypto descriptors. */
+   KERNEL_LOCK();
crp = crypto_getreq(1);
+   KERNEL_UNLOCK();
if (crp == NULL) {
DPRINTF(("%s: failed to acquire crypto descriptors\n",
__func__));
@@ -1144,11 +1159,16 @@ ah_output(struct mbuf *m, struct tdb *td
tc->tc_rdomain = tdb->tdb_rdomain;
memcpy(&tc->tc_dst, &tdb->tdb_dst, sizeof(union sockaddr_union));
 
-   return crypto_dispatch(crp);
+   KERNEL_LOCK();
+   error = crypto_dispatch(crp);
+   KERNEL_UNLOCK();
+   return error;
 
  drop:
m_freem(m);
+   KERNEL_LOCK();
crypto_freereq(crp);
+   KERNEL_UNLOCK();
free(tc, M_XDATA, 0);
return error;
 }
Index: netinet/ip_esp.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ip_esp.c,v
retrieving revision 1.162
diff -u -p -r1.162 ip_esp.c
--- netinet/ip_esp.c25 Feb 2021 02:48:21 -  1.162
+++ netinet/ip_esp.c16 Jun 2021 17:29:55 -
@@ -93,6 +93,7 @@ esp_init(struct tdb *tdbp, struct xforms
struct enc_xform *txform = NULL;
struct auth_hash *thash = NULL;
struct cryptoini cria, crie, crin;
+   int error;
 
if 

Sony UWA-BR100 device support patch with AR9280+AR7010 onboard

2021-06-16 Thread Martin
Hi tech,

Please add the Sony UWA-BR100 device ID, with the well-known
AR9280+AR7010 combination, to the source tree (confirmed working).

Martin





if_athn_usb.c.patch
Description: Binary data


usbdevs.h.patch
Description: Binary data


tcpdump(8): improve dhcp6

2021-06-16 Thread Martijn van Duren
According to the last commit message to print-dhcp6.c by dlg:
if someone is interested in making this easier to read, it would
be a straightforward and  well contained project to better handle
option printing.

I'm playing around a little with dhcp6 and I heeded the call.

This diff does a few things:
- Extract the 24 bits as specified by the spec instead of stripping them
  inside the printf
- Add the defines of all the options inside RFC8415.  This one can be
  expanded by iana's official list[0], but that's too much for this
  diff.
- Print the human readable name of options, or print the numeric value
  if unknown.
  I also opted for the OPTION_* syntax, instead of the full name,
  because "Identity Association for Non-temporary Addresses" seemed a
  bit much and it keeps the code simpler.
- Pretty print the clientid, serverid, and elapsed_time options. In my
  minimal test setup these were the first ones to come by. Values of
  other options continue to be printed as they were and might follow if
  people are interested in this diff.

OK?

martijn@

[0] https://www.iana.org/assignments/dhcpv6-parameters/dhcpv6-parameters.xhtml
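
Before the diff itself, a tiny standalone demonstration of the
CASEEXPAND stringification trick it introduces (compile-and-run sketch,
not tcpdump code):

#include <stdio.h>

#define OPTION_CLIENTID	1
#define OPTION_SERVERID	2

#define CASEEXPAND(var, OPTION)	\
case OPTION:			\
	var = #OPTION;		\
	break;

static const char *
optname(int code)
{
	const char *codename = "OPTION (unknown)";

	switch (code) {
	CASEEXPAND(codename, OPTION_CLIENTID)
	CASEEXPAND(codename, OPTION_SERVERID)
	}
	return codename;
}

int
main(void)
{
	printf("%s\n", optname(OPTION_CLIENTID));	/* prints OPTION_CLIENTID */
	printf("%s\n", optname(42));			/* prints OPTION (unknown) */
	return 0;
}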

Index: print-dhcp6.c
===
RCS file: /cvs/src/usr.sbin/tcpdump/print-dhcp6.c,v
retrieving revision 1.12
diff -u -p -r1.12 print-dhcp6.c
--- print-dhcp6.c   2 Dec 2019 22:07:20 -   1.12
+++ print-dhcp6.c   16 Jun 2021 14:55:42 -
@@ -49,10 +49,44 @@ struct rtentry;
 #define DH6_RELAY_FORW 12
 #define DH6_RELAY_REPL 13
 
+#define XIDLENGTH  3
+
+#define OPTION_CLIENTID1
+#define OPTION_SERVERID2
+#define OPTION_IA_NA   3
+#define OPTION_IA_TA   4
+#define OPTION_IAADDR  5
+#define OPTION_ORO 6
+#define OPTION_PREFERENCE  7
+#define OPTION_ELAPSED_TIME8
+#define OPTION_RELAY_MSG   9
+#define OPTION_AUTH11
+#define OPTION_UNICAST 12
+#define OPTION_STATUS_CODE 13
+#define OPTION_RAPID_COMMIT14
+#define OPTION_USER_CLASS  15
+#define OPTION_VENDOR_CLASS16
+#define OPTION_VENDOR_OPTS 17
+#define OPTION_INTERFACE_ID18
+#define OPTION_RECONF_MSG  19
+#define OPTION_RECONF_ACCEPT   20
+#define OPTION_IA_PD   25
+#define OPTION_IAPREFIX26
+#define OPTION_INFORMATION_REFRESH_TIME32
+#define OPTION_SOL_MAX_RT  82
+#define OPTION_INF_MAX_RT  83
+
+#define CASEEXPAND(var, OPTION) \
+case OPTION:   \
+   var = #OPTION;  \
+   break;
+
 static void
 dhcp6opt_print(const u_char *cp, u_int length)
 {
uint16_t code, len;
+   const char *codename;
+   char codenameunknown[sizeof("OPTION (65535)")];
u_int i;
int l = snapend - cp;
 
@@ -77,22 +111,96 @@ dhcp6opt_print(const u_char *cp, u_int l
length -= sizeof(len);
l -= sizeof(len);
 
-   printf("\n\toption %u len %u", code, len);
+   switch (code) {
+   CASEEXPAND(codename, OPTION_CLIENTID)
+   CASEEXPAND(codename, OPTION_SERVERID)
+   CASEEXPAND(codename, OPTION_IA_NA)
+   CASEEXPAND(codename, OPTION_IA_TA)
+   CASEEXPAND(codename, OPTION_IAADDR)
+   CASEEXPAND(codename, OPTION_ORO)
+   CASEEXPAND(codename, OPTION_PREFERENCE)
+   CASEEXPAND(codename, OPTION_ELAPSED_TIME)
+   CASEEXPAND(codename, OPTION_RELAY_MSG)
+   CASEEXPAND(codename, OPTION_AUTH)
+   CASEEXPAND(codename, OPTION_UNICAST)
+   CASEEXPAND(codename, OPTION_STATUS_CODE)
+   CASEEXPAND(codename, OPTION_RAPID_COMMIT)
+   CASEEXPAND(codename, OPTION_USER_CLASS)
+   CASEEXPAND(codename, OPTION_VENDOR_CLASS)
+   CASEEXPAND(codename, OPTION_VENDOR_OPTS)
+   CASEEXPAND(codename, OPTION_INTERFACE_ID)
+   CASEEXPAND(codename, OPTION_RECONF_MSG)
+   CASEEXPAND(codename, OPTION_RECONF_ACCEPT)
+   CASEEXPAND(codename, OPTION_IA_PD)
+   CASEEXPAND(codename, OPTION_IAPREFIX)
+   CASEEXPAND(codename, OPTION_INFORMATION_REFRESH_TIME)
+   CASEEXPAND(codename, OPTION_SOL_MAX_RT)
+   CASEEXPAND(codename, OPTION_INF_MAX_RT)
+   default:
+   snprintf(codenameunknown, sizeof(codenameunknown),
+   "OPTION (%hd)", code);
+   codename = codenameunknown;
+   break;
+   }
 
-   if (len > 0) {
-   if (l < len)
-   goto trunc;
-   if (length < len)
-   goto iptrunc;
-
-   printf(" ");
-   for (i = 0; i < len; i++)
-   printf("%02x", cp[4 + i] & 0xff);
-
- 

Re: bgpd show proper info in Adj-RIB-Out

2021-06-16 Thread Claudio Jeker
On Tue, Jun 15, 2021 at 06:14:38PM +0200, Claudio Jeker wrote:
> The Adj-RIB-Out should show what is sent to the peer. bgpd did not fully
> do that since it adjusted the ASPATH and the nexthop afterwards when
> building the actual UPDATE.
> 
> This diff changes that and moves the ASPATH prepend for ebgp sessions and
> the selection of the nexthop. Thanks to this the output of `bgpctl show
> rib out` should now show what is actually sent.
> 
> Please test.
> -- 
> :wq Claudio
> 
> Index: rde.c
> ===
> RCS file: /cvs/src/usr.sbin/bgpd/rde.c,v
> retrieving revision 1.524
> diff -u -p -r1.524 rde.c
> --- rde.c 27 May 2021 16:32:13 -  1.524
> +++ rde.c 15 Jun 2021 12:44:32 -
> @@ -3567,6 +3567,14 @@ rde_softreconfig_out(struct rib_entry *r
>   return;
>  
>   LIST_FOREACH(peer, &peerlist, peer_l) {
> + /* skip ourself */
> + if (peer == peerself)
> + continue;
> + if (peer->state != PEER_UP)
> + continue;
> + /* check if peer actually supports the address family */
> + if (peer->capa.mp[re->prefix->aid] == 0)
> + continue;
>   if (peer->loc_rib_id == re->rib_id && peer->reconf_out)
>   /* Regenerate all updates. */
>   up_generate_updates(out_rules, peer, p, p);

This part of the diff fixes a few other issues, mainly errors on startup
because wrong prefixes are sent out. Errors about an unknown peer with
id 1 are also gone after adding this.
So people who noticed IPv6 updates on IPv4 sessions and vice versa should
try this out.

> Index: rde_rib.c
> ===
> RCS file: /cvs/src/usr.sbin/bgpd/rde_rib.c,v
> retrieving revision 1.221
> diff -u -p -r1.221 rde_rib.c
> --- rde_rib.c 4 May 2021 09:27:09 -   1.221
> +++ rde_rib.c 15 Jun 2021 16:04:19 -
> @@ -1940,7 +1940,7 @@ nexthop_hash(struct bgpd_addr *nexthop)
>   sizeof(struct in6_addr));
>   break;
>   default:
> - fatalx("nexthop_hash: unsupported AF");
> + fatalx("nexthop_hash: unsupported AID %d", nexthop->aid);
>   }
>   return (&nexthoptable.nexthop_hashtbl[h & 
> nexthoptable.nexthop_hashmask]);
>  }
> Index: rde_update.c
> ===
> RCS file: /cvs/src/usr.sbin/bgpd/rde_update.c,v
> retrieving revision 1.129
> diff -u -p -r1.129 rde_update.c
> --- rde_update.c  27 May 2021 14:32:08 -  1.129
> +++ rde_update.c  15 Jun 2021 16:02:50 -
> @@ -45,6 +45,8 @@ static struct community comm_no_expsubco
>   .data2 = COMMUNITY_NO_EXPSUBCONFED
>  };
>  
> +static void up_prep_adjout(struct rde_peer *, struct filterstate *, 
> u_int8_t);
> +
>  static int
>  up_test_update(struct rde_peer *peer, struct prefix *p)
>  {
> @@ -166,6 +168,8 @@ again:
>   goto again;
>   }
>  
> + up_prep_adjout(peer, &state, addr.aid);
> +
>   /* only send update if path changed */
>   if (prefix_adjout_update(peer, &state, &addr,
>   new->pt->prefixlen, prefix_vstate(new)) == 1) {
> @@ -225,6 +229,8 @@ up_generate_default(struct filter_head *
>   return;
>   }
>  
> + up_prep_adjout(peer, &state, addr.aid);
> +
>   if (prefix_adjout_update(peer, &state, &addr, 0, ROA_NOTFOUND) == 1) {
>   peer->prefix_out_cnt++;
>   peer->up_nlricnt++;
> @@ -244,7 +250,6 @@ up_generate_default(struct filter_head *
>   }
>  }
>  
> -/* only for IPv4 */
>  static struct bgpd_addr *
>  up_get_nexthop(struct rde_peer *peer, struct filterstate *state, u_int8_t 
> aid)
>  {
> @@ -325,6 +330,32 @@ up_get_nexthop(struct rde_peer *peer, st
>   }
>  }
>  
> +static void
> +up_prep_adjout(struct rde_peer *peer, struct filterstate *state, u_int8_t 
> aid)
> +{
> + struct bgpd_addr *nexthop;
> + struct nexthop *nh;
> + u_char *np;
> + u_int16_t nl;
> +
> + /* prepend local AS number for eBGP sessions. */
> + if (peer->conf.ebgp && (peer->flags & PEERFLAG_TRANS_AS) == 0) {
> + u_int32_t prep_as = peer->conf.local_as;
> + np = aspath_prepend(state->aspath.aspath, prep_as, 1, &nl);
> + aspath_put(state->aspath.aspath);
> + state->aspath.aspath = aspath_get(np, nl);
> + free(np);
> + }
> +
> + /* update nexthop */
> + nexthop = up_get_nexthop(peer, state, aid);
> + nh = nexthop_get(nexthop);
> + nexthop_unref(state->nexthop);
> + state->nexthop = nh;
> + state->nhflags = 0;
> +}
> +
> +
>  static int
>  up_generate_attr(u_char *buf, int len, struct rde_peer *peer,
>  struct filterstate *state, u_int8_t aid)
> @@ -334,7 +365,7 @@ up_generate_attr(u_char *buf, int len, s
>   struct attr *oa = NULL, *newaggr = NULL;
>   u_char  *pdata;

uao references & uao_swap_off() cleanup

2021-06-16 Thread Martin Pieuchot
The diff below does two things:

- Use atomic operations for incrementing/decrementing references of
  anonymous objects.  This allows us to manipulate them without holding
  the KERNEL_LOCK().

- Rewrite the loop from uao_swap_off() to only keep a reference to the
  next item in the list.  This is imported from NetBSD and is necessary
  to introduce locking around uao_pagein().

ok?
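
The second point is easier to see on a toy example.  Below is a
standalone sketch of the "pin the next entry before dropping the list
lock" pattern, with made-up obj/objlist names and user-space locking;
it is not the uvm code, just the shape of the loop:

#include <stddef.h>
#include <pthread.h>
#include <sys/queue.h>

struct obj {
	LIST_ENTRY(obj)	entry;
};

LIST_HEAD(, obj) objlist = LIST_HEAD_INITIALIZER(objlist);
pthread_mutex_t objlist_mtx = PTHREAD_MUTEX_INITIALIZER;

void obj_ref(struct obj *);	/* safe without the list lock */
void obj_rele(struct obj *);	/* may free; may take the list lock */
int  obj_work(struct obj *);	/* the expensive part, done unlocked */

int
walk_objects(void)
{
	struct obj *o, *next;
	int rv;

	pthread_mutex_lock(&objlist_mtx);
	if ((o = LIST_FIRST(&objlist)) == NULL) {
		pthread_mutex_unlock(&objlist_mtx);
		return 0;
	}
	obj_ref(o);

	do {
		/* Pin the next entry too, so neither the current nor the
		 * next one can disappear while the lock is dropped. */
		if ((next = LIST_NEXT(o, entry)) != NULL)
			obj_ref(next);
		pthread_mutex_unlock(&objlist_mtx);

		rv = obj_work(o);
		obj_rele(o);		/* may free o, lock is not held */
		if (rv) {
			if (next != NULL)
				obj_rele(next);
			return rv;
		}

		o = next;
		pthread_mutex_lock(&objlist_mtx);
	} while (o != NULL);
	pthread_mutex_unlock(&objlist_mtx);

	return 0;
}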

Index: uvm/uvm_aobj.c
===
RCS file: /cvs/src/sys/uvm/uvm_aobj.c,v
retrieving revision 1.98
diff -u -p -r1.98 uvm_aobj.c
--- uvm/uvm_aobj.c  16 Jun 2021 09:02:21 -  1.98
+++ uvm/uvm_aobj.c  16 Jun 2021 09:20:26 -
@@ -779,19 +779,11 @@ uao_init(void)
 void
 uao_reference(struct uvm_object *uobj)
 {
-   KERNEL_ASSERT_LOCKED();
-   uao_reference_locked(uobj);
-}
-
-void
-uao_reference_locked(struct uvm_object *uobj)
-{
-
/* Kernel object is persistent. */
if (UVM_OBJ_IS_KERN_OBJECT(uobj))
return;
 
-   uobj->uo_refs++;
+   atomic_inc_int(&uobj->uo_refs);
 }
 
 
@@ -801,34 +793,19 @@ uao_reference_locked(struct uvm_object *
 void
 uao_detach(struct uvm_object *uobj)
 {
-   KERNEL_ASSERT_LOCKED();
-   uao_detach_locked(uobj);
-}
-
-
-/*
- * uao_detach_locked: drop a reference to an aobj
- *
- * => aobj may freed upon return.
- */
-void
-uao_detach_locked(struct uvm_object *uobj)
-{
struct uvm_aobj *aobj = (struct uvm_aobj *)uobj;
struct vm_page *pg;
 
/*
 * Detaching from kernel_object is a NOP.
 */
-   if (UVM_OBJ_IS_KERN_OBJECT(uobj)) {
+   if (UVM_OBJ_IS_KERN_OBJECT(uobj))
return;
-   }
 
/*
 * Drop the reference.  If it was the last one, destroy the object.
 */
-   uobj->uo_refs--;
-   if (uobj->uo_refs) {
+   if (atomic_dec_int_nv(&uobj->uo_refs) > 0) {
return;
}
 
@@ -1265,68 +1242,54 @@ uao_dropswap(struct uvm_object *uobj, in
 boolean_t
 uao_swap_off(int startslot, int endslot)
 {
-   struct uvm_aobj *aobj, *nextaobj, *prevaobj = NULL;
+   struct uvm_aobj *aobj;
 
/*
-* Walk the list of all anonymous UVM objects.
+* Walk the list of all anonymous UVM objects.  Grab the first.
 */
mtx_enter(&uao_list_lock);
+   if ((aobj = LIST_FIRST(&uao_list)) == NULL) {
+   mtx_leave(&uao_list_lock);
+   return FALSE;
+   }
+   uao_reference(&aobj->u_obj);
 
-   for (aobj = LIST_FIRST(&uao_list);
-aobj != NULL;
-aobj = nextaobj) {
+   do {
+   struct uvm_aobj *nextaobj;
boolean_t rv;
 
/*
-* add a ref to the aobj so it doesn't disappear
-* while we're working.
-*/
-   uao_reference_locked(&aobj->u_obj);
-
-   /*
-* now it's safe to unlock the uao list.
-* note that lock interleaving is alright with IPL_NONE mutexes.
+* Prefetch the next object and immediately hold a reference
+* on it, so neither the current nor the next entry could
+* disappear while we are iterating.
 */
-   mtx_leave(&uao_list_lock);
-
-   if (prevaobj) {
-   uao_detach_locked(&prevaobj->u_obj);
-   prevaobj = NULL;
+   if ((nextaobj = LIST_NEXT(aobj, u_list)) != NULL) {
+   uao_reference(&nextaobj->u_obj);
}
+   mtx_leave(&uao_list_lock);
 
/*
-* page in any pages in the swslot range.
-* if there's an error, abort and return the error.
+* Page in all pages in the swap slot range.
 */
rv = uao_pagein(aobj, startslot, endslot);
+
+   /* Drop the reference of the current object. */
+   uao_detach(&aobj->u_obj);
if (rv) {
-   uao_detach_locked(&aobj->u_obj);
+   if (nextaobj) {
+   uao_detach(&nextaobj->u_obj);
+   }
return rv;
}
 
-   /*
-* we're done with this aobj.
-* relock the list and drop our ref on the aobj.
-*/
+   aobj = nextaobj;
mtx_enter(&uao_list_lock);
-   nextaobj = LIST_NEXT(aobj, u_list);
-   /*
-* prevaobj means that we have an object that we need
-* to drop a reference for. We can't just drop it now with
-* the list locked since that could cause lock recursion in
-* the case where we reduce the refcount to 0. It will be
-* released the next time we drop the list lock.
-*/
-   prevaobj = aobj;
-   }
+   } while (aobj);
 
/*
 * done with 

Re: ifnewlladdr spl

2021-06-16 Thread Martin Pieuchot
On 16/06/21(Wed) 14:26, David Gwynne wrote:
> 
> 
> > On 16 Jun 2021, at 00:39, Martin Pieuchot  wrote:
> > 
> > On 15/06/21(Tue) 22:52, David Gwynne wrote:
> >> On Mon, Jun 14, 2021 at 10:07:58AM +0200, Martin Pieuchot wrote:
> >>> On 10/06/21(Thu) 19:17, Alexander Bluhm wrote:
> >> [...] 
>  The in6_ functions need netlock.  And driver SIOCSIFFLAGS ioctl
>  must not have splnet().
> >>> 
> >>> Why not?  This is new since the introduction of intr_barrier() or this
> >>> is an old issue?
> >>> 
>  Is reducing splnet() the correct aproach?
> >> 
> >> yes.
> >> 
> >>> I doubt it is possible to answer this question without defining who owns
> >>> `if_flags' and how can it be read/written to.
> >> 
> >> NET_LOCK is what "owns" updates to if_flags.
> > 
> > Why is reducing splnet() the correct approach?  It isn't clear to
> > me.  What's splnet() protecting then?
> 
> splnet() and all the other splraise() variants only raise the IPL on the 
> current CPU. Unless you have some other lock to coordinate with other CPUs 
> (eg KERNEL_LOCK) it doesn't really prevent other code running. ixl in 
> particular has mpsafe interrupts, so unless your ioctl code is running on the 
> same CPU that ixl is interrupting, it's not helping.
> 
> splnet() with KERNEL_LOCK provides backward compat with legacy drivers. 
> The reason it doesn't really help with the network stack is because the stack 
> runs from nettq under NET_LOCK without KERNEL_LOCK, it's no longer a softint 
> at an IPL lower than net.

The diff discussed in this thread reduces the scope of the splnet/splx()
dance to only surround the modification of `if_flags'.   How is this
related to what you said?  Is it because `if_flags' is read in interrupt
handlers and it isn't modified atomically?  Does that imply that every
modification of `if_flags' should be done at IPL_NET?  Does that mean
some love is needed to ensure reading `if_flags' is coherent?

Does that change also imply that it is safe to issue a SIOCSIFFLAGS on
a legacy drivers without blocking interrupts?  If the IPL needs to be
raised, this is left to the driver, right?

Was the current splnet/splx() dance an easy way to block packet reception
between multiple SIOCSIFFLAGS ioctls?  This might have been a way to not
receive packets on a DOWN interface.  This is no longer the case on MP
kernels as there's a window between the UP and DOWN ioctls.  Do we care?
Is this down/up/down dance fine for legacy and modern drivers?

>  RCS file: /data/mirror/openbsd/cvs/src/sys/net/if.c,v
>  retrieving revision 1.641
>  diff -u -p -r1.641 if.c
>  --- net/if.c 25 May 2021 22:45:09 -  1.641
>  +++ net/if.c 10 Jun 2021 14:32:12 -
>  @@ -3109,6 +3109,8 @@ ifnewlladdr(struct ifnet *ifp)
>   short up;
>   int s;
>  
>  +NET_ASSERT_LOCKED();
>  +
>   s = splnet();
>   up = ifp->if_flags & IFF_UP;
>  
>  @@ -3116,11 +3118,14 @@ ifnewlladdr(struct ifnet *ifp)
>   /* go down for a moment... */
>   ifp->if_flags &= ~IFF_UP;
>   ifrq.ifr_flags = ifp->if_flags;
>  +splx(s);
>   (*ifp->if_ioctl)(ifp, SIOCSIFFLAGS, (caddr_t)&ifrq);
>  +s = splnet();
>   }
>  
>   ifp->if_flags |= IFF_UP;
>   ifrq.ifr_flags = ifp->if_flags;
>  +splx(s);
>   (*ifp->if_ioctl)(ifp, SIOCSIFFLAGS, (caddr_t)&ifrq);
>  
>  #ifdef INET6
>  @@ -3139,11 +3144,12 @@ ifnewlladdr(struct ifnet *ifp)
>  #endif
>   if (!up) {
>   /* go back down */
>  +s = splnet();
>   ifp->if_flags &= ~IFF_UP;
>   ifrq.ifr_flags = ifp->if_flags;
>  +splx(s);
>   (*ifp->if_ioctl)(ifp, SIOCSIFFLAGS, (caddr_t)&ifrq);
>   }
>  -splx(s);
>  }
>  
>  void
>  
> >>> 
> >> 
> 



gprof: Profiling a multi-threaded application

2021-06-16 Thread Yuichiro NAITO
When I compile a multi-threaded application with the '-pg' option, I always
get no results in gmon.out. With bad luck, my application gets SIGSEGV while
running. This is because the buffer that holds the caller/callee counts is
the only one in the process and gets corrupted by multi-threaded access.

I got the idea to solve this problem from NetBSD, which has individual buffers
for each thread and merges them at the end of profiling.

NetBSD stores the reference to the individual buffer with pthread_setspecific(3).
I think this causes an infinite recursive call if the whole libc library (except
mcount.c) is compiled with -pg.

The compiler generates a '_mcount' call at the beginning of every
function. If '_mcount' calls pthread_getspecific(3) to get the individual
buffer, pthread_getspecific() calls '_mcount' again and causes infinite
recursion.

NetBSD prevents the infinite recursion by checking a global variable. But
this approach also prevents simultaneous calls of '_mcount' in a multi-threaded
application, which makes the profiling results slightly wrong.

So I added a pointer to the buffer in `struct pthread` that can be accessed
via a macro, the same way as pthread_self(3). This approach prevents the
infinite recursion in '_mcount'.
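
A rough sketch of the _mcount() entry this layout allows (the accessor
name below is an assumption of mine for illustration, not a name from
the patch):

void
_mcount(u_long frompc, u_long selfpc)
{
	/*
	 * Hypothetical: read the per-thread buffer straight out of
	 * struct pthread, like pthread_self() does, so no instrumented
	 * libc function is called on the way.
	 */
	struct gmonparam *p = _THREAD_GMONPARAM();	/* assumed macro */

	if (p == NULL || p->state != GMON_PROF_ON)
		return;

	/* ... record the frompc -> selfpc arc in p->froms/p->tos ... */
}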

I took the merging function from NetBSD; it is called from the '_mcleanup'
function. Because the merging function needs to walk through all the individual
buffers, I added an SLIST_ENTRY member to 'struct gmonparam' to keep the buffers
on a list, and wrapped it in '#ifndef _KERNEL' so the SLIST_ENTRY member is not
used by the kernel.

But I still use pthread-specific data (pthread_setspecific(3)), because that
can call a destructor when a thread is terminated. The individual buffer is
then moved to a free list to be reused for a new thread.

Here is a patch for this approach.

diff --git a/lib/libc/gmon/gmon.c b/lib/libc/gmon/gmon.c
index 1ce0a1c289e..6887f4f5987 100644
--- a/lib/libc/gmon/gmon.c
+++ b/lib/libc/gmon/gmon.c
@@ -42,6 +42,16 @@
 
 struct gmonparam _gmonparam = { GMON_PROF_OFF };
 
+#ifndef _KERNEL
+#include 
+
+SLIST_HEAD(, gmonparam) _gmonfree = SLIST_HEAD_INITIALIZER(_gmonfree);
+SLIST_HEAD(, gmonparam) _gmoninuse = SLIST_HEAD_INITIALIZER(_gmoninuse);
+pthread_mutex_t _gmonlock = PTHREAD_MUTEX_INITIALIZER;
+pthread_key_t _gmonkey;
+struct gmonparam _gmondummy;
+#endif
+
 static int s_scale;
 /* see profil(2) where this is describe (incorrectly) */
 #defineSCALE_1_TO_10x1L
@@ -52,6 +62,13 @@ PROTO_NORMAL(moncontrol);
 PROTO_DEPRECATED(monstartup);
 static int hertz(void);
 
+#ifndef _KERNEL
+static void _gmon_destructor(void *);
+struct gmonparam *_gmon_alloc(void);
+static void _gmon_merge(void);
+static void _gmon_merge_two(struct gmonparam *, struct gmonparam *);
+#endif
+
 void
 monstartup(u_long lowpc, u_long highpc)
 {
@@ -114,6 +131,11 @@ monstartup(u_long lowpc, u_long highpc)
} else
s_scale = SCALE_1_TO_1;
 
+#ifndef _KERNEL
+   _gmondummy.state = GMON_PROF_BUSY;
+   pthread_key_create(&_gmonkey, _gmon_destructor);
+#endif
+
moncontrol(1);
return;
 
@@ -134,6 +156,194 @@ mapfailed:
 }
 __strong_alias(_monstartup,monstartup);
 
+#ifndef _KERNEL
+static void
+_gmon_destructor(void *arg)
+{
+   struct gmonparam *p = arg, *q, **prev;
+
+   if (p == &_gmondummy)
+   return;
+
+   pthread_setspecific(_gmonkey, &_gmondummy);
+
+   pthread_mutex_lock(&_gmonlock);
+   SLIST_REMOVE(&_gmoninuse, p, gmonparam, next);
+   SLIST_INSERT_HEAD(&_gmonfree, p, next);
+   pthread_mutex_unlock(&_gmonlock);
+
+   pthread_setspecific(_gmonkey, NULL);
+}
+
+struct gmonparam *
+_gmon_alloc(void)
+{
+   void *addr;
+   struct gmonparam *p;
+
+   pthread_mutex_lock(&_gmonlock);
+   p = SLIST_FIRST(&_gmonfree);
+   if (p != NULL) {
+   SLIST_REMOVE_HEAD(&_gmonfree, next);
+   SLIST_INSERT_HEAD(&_gmoninuse, p ,next);
+   } else {
+   pthread_mutex_unlock(&_gmonlock);
+   p = mmap(NULL, sizeof (struct gmonparam),
+PROT_READ|PROT_WRITE, MAP_ANON|MAP_PRIVATE, -1, 0);
+   if (p == MAP_FAILED)
+   goto mapfailed_2;
+   *p = _gmonparam;
+   p->kcount = NULL;
+   p->kcountsize = 0;
+   p->froms = NULL;
+   p->tos = NULL;
+   addr = mmap(NULL, p->fromssize, PROT_READ|PROT_WRITE,
+   MAP_ANON|MAP_PRIVATE, -1, 0);
+   if (addr == MAP_FAILED)
+   goto mapfailed;
+   p->froms = addr;
+
+   addr = mmap(NULL, p->tossize, PROT_READ|PROT_WRITE,
+   MAP_ANON|MAP_PRIVATE, -1, 0);
+   if (addr == MAP_FAILED)
+   goto mapfailed;
+   p->tos = addr;
+   pthread_mutex_lock(&_gmonlock);
+   SLIST_INSERT_HEAD(&_gmoninuse, p ,next);
+   }
+   

Re: [External] : rework pfsync deferral timeout handling

2021-06-16 Thread Alexandr Nedvedicky
Hello,

On Wed, Jun 16, 2021 at 02:19:24PM +1000, David Gwynne wrote:
> 
> 
> > On 14 Jun 2021, at 19:12, Alexandr Nedvedicky 
> >  wrote:
> > 
> > Hello,
> > 
> > looks good to me. I think this should be committed
> > as-is. I have just one question,
> > 
> > On Mon, Jun 14, 2021 at 01:58:06PM +1000, David Gwynne wrote:
> > 
> >> @@ -1931,6 +1933,9 @@ pfsync_defer(struct pf_state *st, struct
> >> {
> >>struct pfsync_softc *sc = pfsyncif;
> >>struct pfsync_deferral *pd;
> >> +  struct timeval now;
> >> +  unsigned int sched;
> >> +  static const struct timeval defer = { 0, 2 };
> >^^^
> >I'm just curious, why there is a static?
> 
> so it can exist in the ro data section rather than get set up on the stack 
> every call.
> 

I see. Thanks for clarifying. I took a closer look at how `defer`
is being used in pfsync_defer():

timeradd(&now, &defer, &pd->pd_deadline);

yes, it makes sense to put the `defer` constant into the ro data section,
so it does not need to be set up on the stack with every call to
pfsync_defer().
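
A minimal standalone illustration of the point (names and the timeout
value are only for the example):

#include <sys/time.h>

void
set_deadline(const struct timeval *now, struct timeval *deadline)
{
	/* set up once, lives in .rodata */
	static const struct timeval defer = { 0, 2 };

	/* without "static", the initializer would be rebuilt on the
	 * stack on every call before the timeradd() */
	timeradd(now, &defer, deadline);
}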

thanks and
regards
sashan