Re: Add rtable capability to login.conf

2022-02-24 Thread Matthew Martin
On Fri, Feb 18, 2022 at 03:25:51PM -0500, Ted Unangst wrote:
> On 2022-02-06, Ted Unangst wrote:
> > On 2022-02-05, Matthew Martin wrote:
> > > On Sat, Jan 29, 2022 at 06:25:32PM -0600, Matthew Martin wrote:
> > > > On Sat, Jan 29, 2022 at 07:10:00PM -0500, Ted Unangst wrote:
> > > > > I believe it would be better to add setrtable to id pledge.
> > > 
> > > ping
> > > 
> > > Also are there any opinions on adding LOGIN_SETRTABLE to doas?
> > 
> > I think this diff looks fine.
> > 
> > For doas, we can use setall with an extra note in the man page.
> 
> Final auction for oks. I think all the login.conf.d changes are in now.
> 
> Plan is add setrtable to pledge first so people don't get caught, then libc.

ping?

> > Index: doas.1
> > ===
> > RCS file: /home/cvs/src/usr.bin/doas/doas.1,v
> > retrieving revision 1.25
> > diff -u -p -r1.25 doas.1
> > --- doas.1  16 Jan 2021 09:18:41 -  1.25
> > +++ doas.1  6 Feb 2022 18:41:53 -
> > @@ -54,6 +54,8 @@ and
> >  and the
> >  .Xr umask 2
> >  are set to values appropriate for the target user.
> > +Other values may also be set as specified in
> > +.Pa /etc/login.conf .
> >  .Ev DOAS_USER
> >  is set to the name of the user executing
> >  .Nm .
> > Index: doas.c
> > ===
> > RCS file: /home/cvs/src/usr.bin/doas/doas.c,v
> > retrieving revision 1.93
> > diff -u -p -r1.93 doas.c
> > --- doas.c  30 Nov 2021 20:08:15 -  1.93
> > +++ doas.c  6 Feb 2022 18:39:38 -
> > @@ -450,10 +450,7 @@ main(int argc, char **argv)
> > if (targpw == NULL)
> > errx(1, "no passwd entry for target");
> >  
> > -   if (setusercontext(NULL, targpw, target, LOGIN_SETGROUP |
> > -   LOGIN_SETPATH |
> > -   LOGIN_SETPRIORITY | LOGIN_SETRESOURCES | LOGIN_SETUMASK |
> > -   LOGIN_SETUSER) != 0)
> > +   if (setusercontext(NULL, targpw, target, LOGIN_SETALL) == -1)
> > errx(1, "failed to set user context for target");
> >  
> > if (pledge("stdio rpath exec", NULL) == -1)
> > 
> > 



Re: [External] : pfsync mutex

2022-02-24 Thread Alexandr Nedvedicky
Hello,

On Thu, Feb 24, 2022 at 10:42:58PM +0100, Alexander Bluhm wrote:
> Hi,
> 
> Hrvoje reported some crashes with pfsync, IPsec and parallel
> forwarding.  Some locks were missing around the tdb flags in pfsync.
> 
> ok?
> 

change looks good to me. I just have a single concern
about pfsync_update_tdb() which I think deserves some
attention:

2474 void
2475 pfsync_update_tdb(struct tdb *t, int output)
2476 {
2477 struct pfsync_softc *sc = pfsyncif;
2478 size_t nlen, sc_len;
2479 
2480 if (sc == NULL)
2481 return;
2482 
2483 if (!ISSET(t->tdb_flags, TDBF_PFSYNC)) {
2484 do {
2485 mtx_enter(>sc_tdb_mtx);
2486 nlen = sizeof(struct pfsync_tdb);
2487 
2488 if (TAILQ_EMPTY(>sc_tdb_q))
2489 nlen += sizeof(struct pfsync_subheader);
2490 
2491 sc_len = atomic_add_long_nv(>sc_len, nlen);
2492 if (sc_len > sc->sc_if.if_mtu) {
2493 atomic_sub_long(>sc_len, nlen);
2494 mtx_leave(>sc_tdb_mtx);
2495 pfsync_sendout();
2496 continue;
2497 }

my concern is we check ->tdb_flags at line 2483 _without_ holding
->tdb_mtx. I think we should re-check the flag as soon as we
grab ->tdb_mtx in the 'do {} while ()' loop. diff below does that.
it applies on top of your diff. with tweak suggested below the change
is OK sashan@

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/if_pfsync.c b/sys/net/if_pfsync.c
index c934392af0f..620aa73060b 100644
--- a/sys/net/if_pfsync.c
+++ b/sys/net/if_pfsync.c
@@ -2485,19 +2485,27 @@ pfsync_update_tdb(struct tdb *t, int output)
mtx_enter(>sc_tdb_mtx);
nlen = sizeof(struct pfsync_tdb);
 
+   mtx_enter(>tdb_mtx);
+   if (ISSET(t->tdb_flags, TDBF_PFSYNC)) {
+   /* we've lost race, no action for us then */
+   mtx_leave(>tdb_mtx);
+   mtx_leave(>sc_tdb_mtx);
+   break;
+   }
+
if (TAILQ_EMPTY(>sc_tdb_q))
nlen += sizeof(struct pfsync_subheader);
 
sc_len = atomic_add_long_nv(>sc_len, nlen);
if (sc_len > sc->sc_if.if_mtu) {
atomic_sub_long(>sc_len, nlen);
+   mtx_leave(>tdb_mtx);
mtx_leave(>sc_tdb_mtx);
pfsync_sendout();
continue;
}
 
TAILQ_INSERT_TAIL(>sc_tdb_q, t, tdb_sync_entry);
-   mtx_enter(>tdb_mtx);
SET(t->tdb_flags, TDBF_PFSYNC);
mtx_leave(>tdb_mtx);
 



pfsync mutex

2022-02-24 Thread Alexander Bluhm
Hi,

Hrvoje reported some crashes with pfsync, IPsec and parallel
forwarding.  Some locks were missing around the tdb flags in pfsync.

ok?

bluhm

Index: net/if_pfsync.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_pfsync.c,v
retrieving revision 1.299
diff -u -p -r1.299 if_pfsync.c
--- net/if_pfsync.c 25 Nov 2021 13:46:02 -  1.299
+++ net/if_pfsync.c 24 Feb 2022 20:59:11 -
@@ -295,7 +295,7 @@ voidpfsync_bulk_update(void *);
 void   pfsync_bulk_fail(void *);
 
 void   pfsync_grab_snapshot(struct pfsync_snapshot *, struct pfsync_softc *);
-void   pfsync_drop_snapshot(struct pfsync_snapshot *);
+void   pfsync_drop_snapshot(struct pfsync_snapshot *, struct pfsync_softc *);
 
 void   pfsync_send_dispatch(void *);
 void   pfsync_send_pkt(struct mbuf *);
@@ -1618,7 +1618,7 @@ pfsync_grab_snapshot(struct pfsync_snaps
 }
 
 void
-pfsync_drop_snapshot(struct pfsync_snapshot *sn)
+pfsync_drop_snapshot(struct pfsync_snapshot *sn, struct pfsync_softc * sc)
 {
struct pf_state *st;
struct pfsync_upd_req_item *ur;
@@ -1645,10 +1645,14 @@ pfsync_drop_snapshot(struct pfsync_snaps
pool_put(>sn_sc->sc_pool, ur);
}
 
+   mtx_enter(>sc_tdb_mtx);
while ((t = TAILQ_FIRST(>sn_tdb_q)) != NULL) {
TAILQ_REMOVE(>sn_tdb_q, t, tdb_sync_entry);
+   mtx_enter(>tdb_mtx);
CLR(t->tdb_flags, TDBF_PFSYNC);
+   mtx_leave(>tdb_mtx);
}
+   mtx_leave(>sc_tdb_mtx);
 }
 
 int
@@ -1675,7 +1679,7 @@ pfsync_drop(struct pfsync_softc *sc)
struct pfsync_snapshot  sn;
 
pfsync_grab_snapshot(, sc);
-   pfsync_drop_snapshot();
+   pfsync_drop_snapshot(, sc);
 }
 
 void
@@ -1767,7 +1771,7 @@ pfsync_sendout(void)
if (m == NULL) {
sc->sc_if.if_oerrors++;
pfsyncstat_inc(pfsyncs_onomem);
-   pfsync_drop_snapshot();
+   pfsync_drop_snapshot(, sc);
return;
}
 
@@ -1777,7 +1781,7 @@ pfsync_sendout(void)
m_free(m);
sc->sc_if.if_oerrors++;
pfsyncstat_inc(pfsyncs_onomem);
-   pfsync_drop_snapshot();
+   pfsync_drop_snapshot(, sc);
return;
}
}
@@ -1835,14 +1839,18 @@ pfsync_sendout(void)
subh = (struct pfsync_subheader *)(m->m_data + offset);
offset += sizeof(*subh);
 
+   mtx_enter(>sc_tdb_mtx);
count = 0;
while ((t = TAILQ_FIRST(_tdb_q)) != NULL) {
TAILQ_REMOVE(_tdb_q, t, tdb_sync_entry);
pfsync_out_tdb(t, m->m_data + offset);
offset += sizeof(struct pfsync_tdb);
+   mtx_enter(>tdb_mtx);
CLR(t->tdb_flags, TDBF_PFSYNC);
+   mtx_leave(>tdb_mtx);
count++;
}
+   mtx_leave(>sc_tdb_mtx);
 
bzero(subh, sizeof(*subh));
subh->action = PFSYNC_ACT_TDB;
@@ -2489,9 +2497,11 @@ pfsync_update_tdb(struct tdb *t, int out
}
 
TAILQ_INSERT_TAIL(>sc_tdb_q, t, tdb_sync_entry);
-   mtx_leave(>sc_tdb_mtx);
-
+   mtx_enter(>tdb_mtx);
SET(t->tdb_flags, TDBF_PFSYNC);
+   mtx_leave(>tdb_mtx);
+
+   mtx_leave(>sc_tdb_mtx);
t->tdb_updates = 0;
} while (0);
} else {
@@ -2499,10 +2509,12 @@ pfsync_update_tdb(struct tdb *t, int out
schednetisr(NETISR_PFSYNC);
}
 
+   mtx_enter(>tdb_mtx);
if (output)
SET(t->tdb_flags, TDBF_PFSYNC_RPL);
else
CLR(t->tdb_flags, TDBF_PFSYNC_RPL);
+   mtx_leave(>tdb_mtx);
 }
 
 void
@@ -2517,7 +2529,9 @@ pfsync_delete_tdb(struct tdb *t)
mtx_enter(>sc_tdb_mtx);
 
TAILQ_REMOVE(>sc_tdb_q, t, tdb_sync_entry);
+   mtx_enter(>tdb_mtx);
CLR(t->tdb_flags, TDBF_PFSYNC);
+   mtx_leave(>tdb_mtx);
 
nlen = sizeof(struct pfsync_tdb);
if (TAILQ_EMPTY(>sc_tdb_q))



unwind(8): use parse_edns_from_pkt

2022-02-24 Thread Florian Obser


Upstream renamed parse_extract_edns to
parse_extract_edns_from_response_msg and parse_edns_from_pkt to
parse_edns_from_query_pkt in the upcomming libunbound 1.15.0
update. Both funktions work equally well for us but it would look weird
to use the "from_response_msg" function on the query so switch to
parse_edns_from_pkt in preparation for the libunbound update.

OK?

diff --git frontend.c frontend.c
index 1b6333da22c..3a2ee87dea8 100644
--- frontend.c
+++ frontend.c
@@ -734,6 +734,7 @@ void
 handle_query(struct pending_query *pq)
 {
struct query_imsgquery_imsg;
+   struct query_infoskip;
struct bl_node   find;
int  rcode;
char*str;
@@ -773,7 +774,12 @@ handle_query(struct pending_query *pq)
goto send_answer;
}
 
-   rcode = parse_extract_edns(pq->qmsg, >edns, pq->region);
+   sldns_buffer_rewind(pq->qbuf);
+   if (!query_info_parse(, pq->qbuf)) {
+   error_answer(pq, LDNS_RCODE_SERVFAIL);
+   goto send_answer;
+   }
+   rcode = parse_edns_from_pkt(pq->qbuf, >edns, pq->region);
if (rcode != LDNS_RCODE_NOERROR) {
error_answer(pq, rcode);
goto send_answer;
@@ -999,6 +1005,7 @@ void
 resend_dns64_query(struct pending_query *opq) {
struct pending_query*pq;
struct query_imsgquery_imsg;
+   struct query_infoskip;
int  rcode;
char dname[LDNS_MAX_DOMAINLEN + 1];
 
@@ -1059,7 +1066,12 @@ resend_dns64_query(struct pending_query *opq) {
goto drop;
}
 
-   rcode = parse_extract_edns(pq->qmsg, >edns, pq->region);
+   sldns_buffer_rewind(pq->qbuf);
+   if (!query_info_parse(, pq->qbuf)) {
+   error_answer(pq, LDNS_RCODE_SERVFAIL);
+   goto send_answer;
+   }
+   rcode = parse_edns_from_pkt(pq->qbuf, >edns, pq->region);
if (rcode != LDNS_RCODE_NOERROR) {
error_answer(pq, rcode);
goto send_answer;

-- 
I'm not entirely sure you are real.



Re: assert "sc->sc_dev == NUM" failed in if_tun.c (2)

2022-02-24 Thread Visa Hankala
On Thu, Feb 24, 2022 at 07:39:54PM +1000, David Gwynne wrote:
> claudio and i came up with the following, which is to have tun_dev_open
> check the state of the vnode associated with the current open call
> after all the sleeping and potential tun_clone_destroy and
> tun_clone_create calls. if the vnode has been made bad/dead after
> all the sleeping, it returns with ENXIO.
> 
> this appears to fix the issue. i had a test program that would trigger
> the KASSERT in a matter of seconds and it's been running for an hour
> now.
> 
> here's the diff.

The diff looks reasonable.

OK visa@

I guess a similar issue can arise with some other detachable devices
as well. However, the problem is most acute with things like tun(4)
that can be created and destroyed programmatically on a whim at high
speed.

spec_open() could check for vnode revocation after completing d_open.
Of course, that would not fix the problem with tun. The check would
allow error reporting to happen earlier, but there would be no
fundamental shift in behaviour, I think.

> Index: if_tun.c
> ===
> RCS file: /cvs/src/sys/net/if_tun.c,v
> retrieving revision 1.234
> diff -u -p -r1.234 if_tun.c
> --- if_tun.c  16 Feb 2022 02:22:39 -  1.234
> +++ if_tun.c  24 Feb 2022 08:08:38 -
> @@ -374,10 +374,19 @@ tun_dev_open(dev_t dev, const struct if_
>   struct ifnet *ifp;
>   int error;
>   u_short stayup = 0;
> + struct vnode *vp;
>  
>   char name[IFNAMSIZ];
>   unsigned int rdomain;
>  
> + /*
> +  * Find the vnode associated with this open before we sleep
> +  * and let something else revoke it. Our caller has a reference
> +  * to it so we don't need to account for it.
> +  */
> + if (!vfinddev(dev, VCHR, ))
> + panic("%s vfinddev failed", __func__);
> +
>   snprintf(name, sizeof(name), "%s%u", ifc->ifc_name, minor(dev));
>   rdomain = rtable_l2(p->p_p->ps_rtableid);
>  
> @@ -412,6 +421,12 @@ tun_dev_open(dev_t dev, const struct if_
>   /* XXX if_clone_destroy if stayup? */
>   goto done;
>   }
> + }
> +
> + /* Has tun_clone_destroy torn the rug out under us? */
> + if (vp->v_type == VBAD) {
> + error = ENXIO;
> + goto done;
>   }
>  
>   if (sc->sc_dev != 0) {
> 



Re: lldb(1) libedit support

2022-02-24 Thread Jeremie Courreges-Anglas
On Tue, Feb 15 2022, Greg Steuck  wrote:
> Andrei  writes:
>
>> The lldb(1) debugger was recently added in base and as I was playing around 
>> with it I noticed the lack of line editing functionality.
>>
>> This is because currently lldb is built without support for
>> libedit. It would be nice to have libedit in base such to be able to
>> link lldb against it, as the lack of line editing functionality makes
>> it difficult to use.
>
> libedit is in base and gnu/usr.bin/clang/lldb/Makefile even links
> lldb against libedit:
> : LDADD+= -lcurses -ledit -lpanel
>
> Curiously, trying to enable libedit support in lldb with the following
> patch didn't seem to help (arrow keys still show ^[[A and such).
>
> modified   gnu/usr.bin/clang/include/lldb/Host/Config.h
> @@ -41,7 +41,7 @@
>  
>  #define CURSES_HAVE_NCURSES_CURSES_H 0
>  
> -#define LLDB_ENABLE_LIBEDIT 0
> +#define LLDB_ENABLE_LIBEDIT 1
>  
>  #define LLDB_ENABLE_LIBXML2 0

Here's what the devel/llvm cmake build system defines.  Editing in emacs
mode (default) seems to work but I have tested very lightly so far.

Works for you?  ok?


diff --git gnu/usr.bin/clang/include/lldb/Host/Config.h 
gnu/usr.bin/clang/include/lldb/Host/Config.h
index 184c61330a6..3a07fd60667 100644
--- gnu/usr.bin/clang/include/lldb/Host/Config.h
+++ gnu/usr.bin/clang/include/lldb/Host/Config.h
@@ -9,9 +9,9 @@
 #ifndef LLDB_HOST_CONFIG_H
 #define LLDB_HOST_CONFIG_H
 
-#define LLDB_EDITLINE_USE_WCHAR 0
+#define LLDB_EDITLINE_USE_WCHAR 1
 
-#define LLDB_HAVE_EL_RFUNC_T 0
+#define LLDB_HAVE_EL_RFUNC_T 1
 
 #define HAVE_SYS_TYPES_H 1
 
@@ -37,11 +37,11 @@
 
 #define LLDB_ENABLE_LZMA 0
 
-#define LLDB_ENABLE_CURSES 0
+#define LLDB_ENABLE_CURSES 1
 
 #define CURSES_HAVE_NCURSES_CURSES_H 0
 
-#define LLDB_ENABLE_LIBEDIT 0
+#define LLDB_ENABLE_LIBEDIT 1
 
 #define LLDB_ENABLE_LIBXML2 0
 

-- 
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE



Re: bgpd start using path_id_tx

2022-02-24 Thread Theo Buehler
On Thu, Feb 24, 2022 at 02:08:13PM +0100, Claudio Jeker wrote:
> This is one small step closer to support add-path send side.
> We store the path_id_tx on the prefix and we can adjust a few places to
> make use of that field. Now it is always 0 so nothing changes in the end
> apart from removing some XXX comments.

ok tb

> 
> -- 
> :wq Claudio
> 
> Index: rde.c
> ===
> RCS file: /cvs/src/usr.sbin/bgpd/rde.c,v
> retrieving revision 1.534
> diff -u -p -r1.534 rde.c
> --- rde.c 6 Feb 2022 09:51:19 -   1.534
> +++ rde.c 24 Feb 2022 13:01:13 -
> @@ -2422,7 +2422,7 @@ rde_dump_rib_as(struct prefix *p, struct
>   }
>   } else {
>   if (peer_has_add_path(peer, p->pt->aid, CAPA_AP_SEND)) {
> - rib.path_id = 0;/* XXX add-path send */
> + rib.path_id = p->path_id_tx;
>   rib.flags |= F_PREF_PATH_ID;
>   }
>   }
> @@ -2507,12 +2507,16 @@ rde_dump_filter(struct prefix *p, struct
>   if ((req->flags & F_CTL_INVALID) &&
>   (asp->flags & F_ATTR_PARSE_ERR) == 0)
>   return;
> - /*
> -  * XXX handle out specially since then we want to match against our
> -  * path ids.
> -  */
> - if ((req->flags & F_CTL_HAS_PATHID) && req->path_id != p->path_id)
> - return;
> + if ((req->flags & F_CTL_HAS_PATHID)) {
> + /* Match against the transmit path id if adjout is used.  */
> + if (adjout) {
> + if (req->path_id != p->path_id_tx)
> + return;
> + } else {
> + if (req->path_id != p->path_id)
> + return;
> + }
> + }
>   if (req->as.type != AS_UNDEF &&
>   !aspath_match(asp->aspath, >as, 0))
>   return;
> Index: rde_update.c
> ===
> RCS file: /cvs/src/usr.sbin/bgpd/rde_update.c,v
> retrieving revision 1.132
> diff -u -p -r1.132 rde_update.c
> --- rde_update.c  6 Feb 2022 09:51:19 -   1.132
> +++ rde_update.c  24 Feb 2022 13:01:13 -
> @@ -632,8 +632,7 @@ up_dump_prefix(u_char *buf, int len, str
>   if (peer_has_add_path(peer, p->pt->aid, CAPA_AP_SEND)) {
>   if (len <= wpos + (int)sizeof(pathid))
>   break;
> - /* XXX add-path send side */
> - pathid = 0;
> + pathid = htonl(p->path_id_tx);
>   memcpy(buf + wpos, , sizeof(pathid));
>   wpos += sizeof(pathid);
>   }
> 



bgpd start using path_id_tx

2022-02-24 Thread Claudio Jeker
This is one small step closer to support add-path send side.
We store the path_id_tx on the prefix and we can adjust a few places to
make use of that field. Now it is always 0 so nothing changes in the end
apart from removing some XXX comments.

-- 
:wq Claudio

Index: rde.c
===
RCS file: /cvs/src/usr.sbin/bgpd/rde.c,v
retrieving revision 1.534
diff -u -p -r1.534 rde.c
--- rde.c   6 Feb 2022 09:51:19 -   1.534
+++ rde.c   24 Feb 2022 13:01:13 -
@@ -2422,7 +2422,7 @@ rde_dump_rib_as(struct prefix *p, struct
}
} else {
if (peer_has_add_path(peer, p->pt->aid, CAPA_AP_SEND)) {
-   rib.path_id = 0;/* XXX add-path send */
+   rib.path_id = p->path_id_tx;
rib.flags |= F_PREF_PATH_ID;
}
}
@@ -2507,12 +2507,16 @@ rde_dump_filter(struct prefix *p, struct
if ((req->flags & F_CTL_INVALID) &&
(asp->flags & F_ATTR_PARSE_ERR) == 0)
return;
-   /*
-* XXX handle out specially since then we want to match against our
-* path ids.
-*/
-   if ((req->flags & F_CTL_HAS_PATHID) && req->path_id != p->path_id)
-   return;
+   if ((req->flags & F_CTL_HAS_PATHID)) {
+   /* Match against the transmit path id if adjout is used.  */
+   if (adjout) {
+   if (req->path_id != p->path_id_tx)
+   return;
+   } else {
+   if (req->path_id != p->path_id)
+   return;
+   }
+   }
if (req->as.type != AS_UNDEF &&
!aspath_match(asp->aspath, >as, 0))
return;
Index: rde_update.c
===
RCS file: /cvs/src/usr.sbin/bgpd/rde_update.c,v
retrieving revision 1.132
diff -u -p -r1.132 rde_update.c
--- rde_update.c6 Feb 2022 09:51:19 -   1.132
+++ rde_update.c24 Feb 2022 13:01:13 -
@@ -632,8 +632,7 @@ up_dump_prefix(u_char *buf, int len, str
if (peer_has_add_path(peer, p->pt->aid, CAPA_AP_SEND)) {
if (len <= wpos + (int)sizeof(pathid))
break;
-   /* XXX add-path send side */
-   pathid = 0;
+   pathid = htonl(p->path_id_tx);
memcpy(buf + wpos, , sizeof(pathid));
wpos += sizeof(pathid);
}



Re: assert "sc->sc_dev == NUM" failed in if_tun.c (2)

2022-02-24 Thread Claudio Jeker
On Thu, Feb 24, 2022 at 08:56:59PM +1000, David Gwynne wrote:
> On Thu, Feb 24, 2022 at 11:13:48AM +0100, Claudio Jeker wrote:
> > On Thu, Feb 24, 2022 at 07:39:54PM +1000, David Gwynne wrote:
> > > 
> > > here's the diff.
> > > 
> > > Index: if_tun.c
> > > ===
> > > RCS file: /cvs/src/sys/net/if_tun.c,v
> > > retrieving revision 1.234
> > > diff -u -p -r1.234 if_tun.c
> > > --- if_tun.c  16 Feb 2022 02:22:39 -  1.234
> > > +++ if_tun.c  24 Feb 2022 08:08:38 -
> > > @@ -374,10 +374,19 @@ tun_dev_open(dev_t dev, const struct if_
> > >   struct ifnet *ifp;
> > >   int error;
> > >   u_short stayup = 0;
> > > + struct vnode *vp;
> > >  
> > 
> > Why is there this empty line? It was there before but still wondering.
> 
> feng shui? laziness? i'll fix it later.
> 
> > >   char name[IFNAMSIZ];
> > >   unsigned int rdomain;
> > >  
> > > + /*
> > > +  * Find the vnode associated with this open before we sleep
> > > +  * and let something else revoke it. Our caller has a reference
> > > +  * to it so we don't need to account for it.
> > > +  */
> > > + if (!vfinddev(dev, VCHR, ))
> > > + panic("%s vfinddev failed", __func__);
> > > +
> > >   snprintf(name, sizeof(name), "%s%u", ifc->ifc_name, minor(dev));
> > >   rdomain = rtable_l2(p->p_p->ps_rtableid);
> > >  
> > > @@ -412,6 +421,12 @@ tun_dev_open(dev_t dev, const struct if_
> > >   /* XXX if_clone_destroy if stayup? */
> > >   goto done;
> > >   }
> > > + }
> > > +
> > > + /* Has tun_clone_destroy torn the rug out under us? */
> > > + if (vp->v_type == VBAD) {
> > > + error = ENXIO;
> > > + goto done;
> > >   }
> > >  
> > >   if (sc->sc_dev != 0) {
> > > 
> > 
> > OK claudio@
> > 
> > After sleeping over this I think this is the cleanest and simplest way
> > around this problem. A bit ugly that tun needs to peek into the vnode.
> > 
> > Another option is to split the clone destroy from the softc / device node.
> > Remove the VOP_REVOKE and actually allow tun to be destroyed and recreated
> > while open and in that case the device remains open and only the network
> > bits are destroyed and later recreated. So in your example from above:
> > 
> > > > - ifconfig tun0 create
> > > > - open /dev/tun0 -> fd 3
> > > > - ifconfig tun0 destroy
> > > > - ifconfig tun0 create
> > > > - write to fd 3
> > 
> > The write would be perfectly fine since the destroy did not distroy this
> > connection (only close(2) would do that). Actually a call to open
> > /dev/tun0 after the 2nd create would fail because the device is still
> > open. Doing this seems a lot more complex.
> 
> we talked about this a lot around the time of src/sys/net/if_tun.c
> r1.210. there seemed to be more weight on the side of the argument for
> VOP_REVOKE than against, and i still think that's the case now. tun
> going away and coming back in between open and write could go either
> way, but what about these:
> 
> - open() /dev/tun0, tun0 is destroyed, write() to tun0
> 
> should the write error? if we run with "only close can destroy the
> connection" does this mean the write will create tun0 again? in which
> rdomain should it be?

Normally, yes you would generate an error like EPIPE in such a case.
You may even need to send SIGPIPE but that is one of those nasty details.
 
> - open() /dev/tun0, tun0 is destroyed, read() from tun0
> 
> same as above?

Idealy you would signal EOF with a return of 0 (again similar to how pipes
behave).
 
> - begin blocking read of tun0, tun0 is destroyed, let's go shopping!
> 
> should the read wake up and return an error, or does it just block?

Again return 0.
 
> - poll on tun0, tun0 is destroyed
> 
> same as above?

poll should signal POLLHUP | POLLIN (iirc that's what pipes do).
 
> if we leave the /dev side of things operational if the interface goes
> away, then this would be inconsistent with something workign with bpf on
> the same interfaces. wouldnt this be inconsistent with hotplug devices
> and their /dev things?

Maybe, on the other hand it is how pipes, socketpairs and unix sockets
behave. As I said going down this road is possible but much more complex
because all these edgecases need to be defined. Also there may be bugs
triggered in userland apps with a tun(4) behaving like this.
 
> maybe destroy should be blocked if the /dev entry is open?
> 
> i dunno. it feels more natural that detaching or destroying an interface
> should push userland off it.

I think one can argue about benefits from either mode. The big problem is
that because of how if_clone locking works you can not block in the
destroy function or people will become very unhappy. You may be even able
to cause deadlocks.

As I said above I think your diff is the best solution to fix this race and
while implementing this without VOP_REVOKE is possible it is too complex.
I'm perfectly fine with the current behaviour and think it actually fits

Re: assert "sc->sc_dev == NUM" failed in if_tun.c (2)

2022-02-24 Thread David Gwynne
On Thu, Feb 24, 2022 at 11:13:48AM +0100, Claudio Jeker wrote:
> On Thu, Feb 24, 2022 at 07:39:54PM +1000, David Gwynne wrote:
> > 
> > here's the diff.
> > 
> > Index: if_tun.c
> > ===
> > RCS file: /cvs/src/sys/net/if_tun.c,v
> > retrieving revision 1.234
> > diff -u -p -r1.234 if_tun.c
> > --- if_tun.c16 Feb 2022 02:22:39 -  1.234
> > +++ if_tun.c24 Feb 2022 08:08:38 -
> > @@ -374,10 +374,19 @@ tun_dev_open(dev_t dev, const struct if_
> > struct ifnet *ifp;
> > int error;
> > u_short stayup = 0;
> > +   struct vnode *vp;
> >  
> 
> Why is there this empty line? It was there before but still wondering.

feng shui? laziness? i'll fix it later.

> > char name[IFNAMSIZ];
> > unsigned int rdomain;
> >  
> > +   /*
> > +* Find the vnode associated with this open before we sleep
> > +* and let something else revoke it. Our caller has a reference
> > +* to it so we don't need to account for it.
> > +*/
> > +   if (!vfinddev(dev, VCHR, ))
> > +   panic("%s vfinddev failed", __func__);
> > +
> > snprintf(name, sizeof(name), "%s%u", ifc->ifc_name, minor(dev));
> > rdomain = rtable_l2(p->p_p->ps_rtableid);
> >  
> > @@ -412,6 +421,12 @@ tun_dev_open(dev_t dev, const struct if_
> > /* XXX if_clone_destroy if stayup? */
> > goto done;
> > }
> > +   }
> > +
> > +   /* Has tun_clone_destroy torn the rug out under us? */
> > +   if (vp->v_type == VBAD) {
> > +   error = ENXIO;
> > +   goto done;
> > }
> >  
> > if (sc->sc_dev != 0) {
> > 
> 
> OK claudio@
> 
> After sleeping over this I think this is the cleanest and simplest way
> around this problem. A bit ugly that tun needs to peek into the vnode.
> 
> Another option is to split the clone destroy from the softc / device node.
> Remove the VOP_REVOKE and actually allow tun to be destroyed and recreated
> while open and in that case the device remains open and only the network
> bits are destroyed and later recreated. So in your example from above:
> 
> > > - ifconfig tun0 create
> > > - open /dev/tun0 -> fd 3
> > > - ifconfig tun0 destroy
> > > - ifconfig tun0 create
> > > - write to fd 3
> 
> The write would be perfectly fine since the destroy did not distroy this
> connection (only close(2) would do that). Actually a call to open
> /dev/tun0 after the 2nd create would fail because the device is still
> open. Doing this seems a lot more complex.

we talked about this a lot around the time of src/sys/net/if_tun.c
r1.210. there seemed to be more weight on the side of the argument for
VOP_REVOKE than against, and i still think that's the case now. tun
going away and coming back in between open and write could go either
way, but what about these:

- open() /dev/tun0, tun0 is destroyed, write() to tun0

should the write error? if we run with "only close can destroy the
connection" does this mean the write will create tun0 again? in which
rdomain should it be?

- open() /dev/tun0, tun0 is destroyed, read() from tun0

same as above?

- begin blocking read of tun0, tun0 is destroyed, let's go shopping!

should the read wake up and return an error, or does it just block?

- poll on tun0, tun0 is destroyed

same as above?

if we leave the /dev side of things operational if the interface goes
away, then this would be inconsistent with something workign with bpf on
the same interfaces. wouldnt this be inconsistent with hotplug devices
and their /dev things?

maybe destroy should be blocked if the /dev entry is open?

i dunno. it feels more natural that detaching or destroying an interface
should push userland off it.



Re: assert "sc->sc_dev == NUM" failed in if_tun.c (2)

2022-02-24 Thread Claudio Jeker
On Thu, Feb 24, 2022 at 07:39:54PM +1000, David Gwynne wrote:
> On Mon, Feb 21, 2022 at 03:00:01PM +1000, David Gwynne wrote:
> > On Sun, Feb 20, 2022 at 10:30:22AM +1000, David Gwynne wrote:
> > > 
> > > 
> > > > On 20 Feb 2022, at 09:46, David Gwynne  wrote:
> > > > 
> > > > On Sat, Feb 19, 2022 at 02:58:08PM -0800, Greg Steuck wrote:
> > > >> There's no reproducer, but maybe this race is approachable without one?
> > > >> 
> > > >> dev = sc->sc_dev;
> > > >> if (dev) {
> > > >> struct vnode *vp;
> > > >> 
> > > >> if (vfinddev(dev, VCHR, ))
> > > >> VOP_REVOKE(vp, REVOKEALL);
> > > >> 
> > > >> KASSERT(sc->sc_dev == 0);
> > > >> }
> > > > 
> > > > this was my last run at it:
> > > > https://marc.info/?l=openbsd-tech=164489981621957=2
> > > > 
> > > > maybe i need another dvthing thread to push it a bit harder...
> > > 
> > > adding another dvthing thread or two made it blow up pretty quickly :(
> > 
> > so it is this section:
> > 
> > /* kick userland off the device */
> > dev = sc->sc_dev;
> > if (dev) {
> > struct vnode *vp;
> > 
> > if (vfinddev(dev, VCHR, ))
> > VOP_REVOKE(vp, REVOKEALL);
> > 
> > KASSERT(sc->sc_dev == 0);
> > }
> > 
> > my assumption was/is that VOP_REVOKE would call tunclose (or tapclose)
> > on the currently open tun (or tap) device, and swap the vfs backend out
> > behind the scenes with deadfs.
> > 
> > the context behind this is that isnt really a strong binding between an
> > open /dev entry (eg, /dev/tun0) and an instance of an interface (eg,
> > tun0). all the device entrypoints (eg, tunopen, tunread, tunwrite,
> > etc) pass a dev_t, and that's used to look up an interface instance to
> > work with.
> > 
> > the problem with this is an interface could be destroyed and recreated
> > in between calls to the device entrypoints. ie, you could do the
> > following high level steps:
> > 
> > - ifconfig tun0 create
> > - open /dev/tun0 -> fd 3
> > - ifconfig tun0 destroy
> > - ifconfig tun0 create
> > - write to fd 3
> > 
> > and that would send a packet on the newly created tun0 because it had
> > the same minor number as the previous one.
> > 
> > there was a lot of consensus that this was Not The Best(tm), and that if
> > a tun interface was destroyed while the /dev entry was open, it should
> > act like the interface was detached and the open /dev entry should stop
> > working. this is what VOP_REVOKE helps with. or it is supposed to help
> > with.
> > 
> > when we create a tun interface, it's added to a global list of tun
> > interfaces. when a tun device is opened, we look for the interface on
> > that list (and create it if it doesnt exist), and then check to see if
> > it is already open by looking at sc_dev. if sc_dev is 0, it's not open
> > and tunopen can set sc_dev to claim ownership of it. if sc_dev is
> > non-zero, then the device is busy and open fails.
> > 
> > tunclose clears sc_dev to say ownership is given up.
> > 
> > tun destroy checks sc_dev, and if it is != 0 then it knows something has
> > it open and will call VOP_REVOKE. VOP_REVOKE is supposed to do what i
> > described above, which is call tunclose on the programs behalf and swap
> > the vfs ops out.
> > 
> > what i'm seeing is that sometimes VOP_REVOKE gets called, it happily
> > returns 0, but tunclose is not called. this means sc_dev is not cleared,
> > and then this KASSERT fires.
> > 
> > ive tried changing it something like this in the destroy path:
> > 
> > -   KASSERT(sc->sc_dev == 0);
> > +   while (sc->sc_dev != 0)
> > +   tsleep_nsec(>sc_dev, PWAIT, "tunclose", INFSLP);
> > 
> > with tunclose calling wakeup(>dev) when it's finished, but this ends
> > up getting stuck in the tsleep.
> > 
> > however, if i cut the KASSERT out and let destroy keep going, i do see
> > tunclose get called against the now non-existent interface. this would
> > be fine, but now we're back where we started. if someone recreates tun0
> > after it's been destroyed, tunclose will find the new interface and try
> > to close it.
> > 
> > ive tried to follow what VOP_REVOKE actually does and how it finds
> > tunclose to call it, but it's pretty twisty and i got tired.
> > 
> > i guess my question at this point is are my assumptions about how
> > VOP_REVOKE works valid? specifically, should it reliably be calling
> > tunclose? if not, what causes tunclose to be called in the future and
> > why can't i wait for it in tun_clone_destroy?
> 
> claudio figured it out. his clue was that multiple concurrent calls
> to tunopen (or tapopen) will share a vnode. because tunopen can sleep,
> multiple programs can be inside tunopen for the same tun interface at
> the same time, all with references against the same vnode.
> 
> at the same time as this another thread/program can call VOP_REVOKE
> via tun_clone_destroy (eg, ifconfig tun1 destroy does this).
> VOP_REVOKE marks a vnode as bad, which in turn 

Re: assert "sc->sc_dev == NUM" failed in if_tun.c (2)

2022-02-24 Thread David Gwynne
On Mon, Feb 21, 2022 at 03:00:01PM +1000, David Gwynne wrote:
> On Sun, Feb 20, 2022 at 10:30:22AM +1000, David Gwynne wrote:
> > 
> > 
> > > On 20 Feb 2022, at 09:46, David Gwynne  wrote:
> > > 
> > > On Sat, Feb 19, 2022 at 02:58:08PM -0800, Greg Steuck wrote:
> > >> There's no reproducer, but maybe this race is approachable without one?
> > >> 
> > >> dev = sc->sc_dev;
> > >> if (dev) {
> > >> struct vnode *vp;
> > >> 
> > >> if (vfinddev(dev, VCHR, ))
> > >> VOP_REVOKE(vp, REVOKEALL);
> > >> 
> > >> KASSERT(sc->sc_dev == 0);
> > >> }
> > > 
> > > this was my last run at it:
> > > https://marc.info/?l=openbsd-tech=164489981621957=2
> > > 
> > > maybe i need another dvthing thread to push it a bit harder...
> > 
> > adding another dvthing thread or two made it blow up pretty quickly :(
> 
> so it is this section:
> 
> /* kick userland off the device */
> dev = sc->sc_dev;
> if (dev) {
> struct vnode *vp;
> 
> if (vfinddev(dev, VCHR, ))
> VOP_REVOKE(vp, REVOKEALL);
> 
> KASSERT(sc->sc_dev == 0);
> }
> 
> my assumption was/is that VOP_REVOKE would call tunclose (or tapclose)
> on the currently open tun (or tap) device, and swap the vfs backend out
> behind the scenes with deadfs.
> 
> the context behind this is that isnt really a strong binding between an
> open /dev entry (eg, /dev/tun0) and an instance of an interface (eg,
> tun0). all the device entrypoints (eg, tunopen, tunread, tunwrite,
> etc) pass a dev_t, and that's used to look up an interface instance to
> work with.
> 
> the problem with this is an interface could be destroyed and recreated
> in between calls to the device entrypoints. ie, you could do the
> following high level steps:
> 
> - ifconfig tun0 create
> - open /dev/tun0 -> fd 3
> - ifconfig tun0 destroy
> - ifconfig tun0 create
> - write to fd 3
> 
> and that would send a packet on the newly created tun0 because it had
> the same minor number as the previous one.
> 
> there was a lot of consensus that this was Not The Best(tm), and that if
> a tun interface was destroyed while the /dev entry was open, it should
> act like the interface was detached and the open /dev entry should stop
> working. this is what VOP_REVOKE helps with. or it is supposed to help
> with.
> 
> when we create a tun interface, it's added to a global list of tun
> interfaces. when a tun device is opened, we look for the interface on
> that list (and create it if it doesnt exist), and then check to see if
> it is already open by looking at sc_dev. if sc_dev is 0, it's not open
> and tunopen can set sc_dev to claim ownership of it. if sc_dev is
> non-zero, then the device is busy and open fails.
> 
> tunclose clears sc_dev to say ownership is given up.
> 
> tun destroy checks sc_dev, and if it is != 0 then it knows something has
> it open and will call VOP_REVOKE. VOP_REVOKE is supposed to do what i
> described above, which is call tunclose on the programs behalf and swap
> the vfs ops out.
> 
> what i'm seeing is that sometimes VOP_REVOKE gets called, it happily
> returns 0, but tunclose is not called. this means sc_dev is not cleared,
> and then this KASSERT fires.
> 
> ive tried changing it something like this in the destroy path:
> 
> - KASSERT(sc->sc_dev == 0);
> + while (sc->sc_dev != 0)
> + tsleep_nsec(>sc_dev, PWAIT, "tunclose", INFSLP);
> 
> with tunclose calling wakeup(>dev) when it's finished, but this ends
> up getting stuck in the tsleep.
> 
> however, if i cut the KASSERT out and let destroy keep going, i do see
> tunclose get called against the now non-existent interface. this would
> be fine, but now we're back where we started. if someone recreates tun0
> after it's been destroyed, tunclose will find the new interface and try
> to close it.
> 
> ive tried to follow what VOP_REVOKE actually does and how it finds
> tunclose to call it, but it's pretty twisty and i got tired.
> 
> i guess my question at this point is are my assumptions about how
> VOP_REVOKE works valid? specifically, should it reliably be calling
> tunclose? if not, what causes tunclose to be called in the future and
> why can't i wait for it in tun_clone_destroy?

claudio figured it out. his clue was that multiple concurrent calls
to tunopen (or tapopen) will share a vnode. because tunopen can sleep,
multiple programs can be inside tunopen for the same tun interface at
the same time, all with references against the same vnode.

at the same time as this another thread/program can call VOP_REVOKE
via tun_clone_destroy (eg, ifconfig tun1 destroy does this).
VOP_REVOKE marks a vnode as bad, which in turn means that subsequent
open()s of a tun interface will get a brand new vnode.

so multiple threads holding references to a vnode can be sleeping in
tun_dev_open on the interface cloner lock. one thread wins and takes
ownership of the tun interface, then another thread can destroy that tun