Re: switch: allow datapath_id and maxflow ioctls for non-root
On Fri, Jul 31, 2020 at 06:28:32AM +0200, Klemens Nanni wrote: > ifconfig(8) detects switch(4) through its unique SIOCSWSDPID ioctl and > further does another switch specific ioctl for the default output > regardless of configuration and/or members: > > SIOCSWSDPID struct ifbrparam > Set the datapath_id in the OpenFlow protocol of the switch named > in ifbrp_name to the value in the ifbrpu_datapath field. Sorry, stupid copy/pasta mistake; ifconfig uses the read-only "get" ioctl of course for regular output which my diff addresses: SIOCSWGDPID Retrieve the datapath_id in the OpenFlow protocol of the switch named in ifbrp_name into the ifbrpu_datapath field.
Re: pipex(4): kill pipexintr()
On Thu, 30 Jul 2020 22:43:10 +0300 Vitaliy Makkoveev wrote: > On Thu, Jul 30, 2020 at 10:05:13PM +0900, YASUOKA Masahiko wrote: >> On Thu, 30 Jul 2020 15:34:09 +0300 >> Vitaliy Makkoveev wrote: >> > On Thu, Jul 30, 2020 at 09:13:46PM +0900, YASUOKA Masahiko wrote: >> >> If the diff removes the queue, then the pipex input routine is >> >> executed by the NIC's interrupt handler. >> >> >> >> The queues had been made to avoid that kind of situations. >> > >> > It's not enqueued in pppoe case. According pipex_pppoe_input() code we >> > call pipex_common_input() with `useq' argument set to '0', so we don't >> > enqueue mbuf(9) but pass it to pipex_ppp_input() which will pass it to >> > ipv{4,6}_input(). >> >> You are right. Sorry, I forgot about this which I did that by myself. > > I'm interesting the reason why you did that. I remembered, it was first step of MP steps for pipex. At that time, I discussed with mpi, he suggested like below. 1. stop enqueueing packets for PPPoE 2. try not take a kernel lock before calling gre_input(), then we can also stop enqueueing packets for PPTP(GRE) 3. for L2TP, keep the queue and change the netisr to an unlocked task
switch: allow datapath_id and maxflow ioctls for non-root
ifconfig(8) detects switch(4) through its unique SIOCSWSDPID ioctl and further does another switch specific ioctl for the default output regardless of configuration and/or members: SIOCSWSDPID struct ifbrparam Set the datapath_id in the OpenFlow protocol of the switch named in ifbrp_name to the value in the ifbrpu_datapath field. SIOCSWGMAXFLOW struct ifbrparam Retrieve the maximum number of flows in the OpenFlow protocol of the switch named in ifbrp_name into the ifbrp_maxflow field. This is how it should look like: # ifconfig switch0 create # ifconfig switch0 switch0: flags=0<> index 29 llprio 3 groups: switch datapath 0x5bea2b5b8e2456cf maxflow 1 maxgroup 1000 But using ifconfig as unprivileged user makes it fail switch(4) interfaces as such and thus interprets them as bridge(4) instead: $ ifconfig switch0 switch0: flags=0<> index 29 llprio 3 groups: switch priority 32768 hellotime 2 fwddelay 15 maxage 20 holdcnt 6 proto rstp designated: id 00:00:00:00:00:00 priority 0 This is because the above mentioned ioctls are listed together with all other bridge and switch related ioctls that set or write things. Getting datapath_id and maxflow values however is read-only and crucial for ifconfig as demonstrated above, so I'd like to move them out of the root check to fix ifconfig. Feedback? OK? Index: sys/net/if.c === RCS file: /cvs/src/sys/net/if.c,v retrieving revision 1.616 diff -u -p -r1.616 if.c --- sys/net/if.c24 Jul 2020 18:17:14 - 1.616 +++ sys/net/if.c31 Jul 2020 04:13:40 - @@ -2170,13 +2170,15 @@ ifioctl(struct socket *so, u_long cmd, c case SIOCBRDGSIFCOST: case SIOCBRDGSTXHC: case SIOCBRDGSPROTO: - case SIOCSWGDPID: case SIOCSWSPORTNO: - case SIOCSWGMAXFLOW: #endif if ((error = suser(p)) != 0) break; /* FALLTHROUGH */ +#if NBRIDGE > 0 + case SIOCSWGDPID: + case SIOCSWGMAXFLOW: +#endif default: error = ((*so->so_proto->pr_usrreq)(so, PRU_CONTROL, (struct mbuf *) cmd, (struct mbuf *) data,
cat(1): add more restrictive pledge(2)
I have to say I'm only a beginner to C but hopefully my patch is good. This patch adds a second and more restrictive pledge (only "stdio" instead of "stdio rpath") after the getopt loop if there is no input file or if the input file is "-" (stdin) or a sequence of repeated instances of "-". It doesn't move argv past the last "-", and doesn't pledge if it runs into an input file other than "-". I've compiled it and tested it with ktrace(1) and kdump(1) and it appears to work as expected and pledge correctly with at least these invocations: $ echo test | ./cat -uv # pledge("stdio", NULL); $ echo test | ./cat -uv - # pledge("stdio", NULL); $ echo test | ./cat # pledge("stdio", NULL); $ echo test | ./cat - - - # pledge("stdio", NULL); $ echo test | ./cat - # pledge("stdio", NULL); $ echo test | ./cat - cat.c # pledge("stdio rpath", NULL); $ echo test | ./cat cat.c - # pledge("stdio rpath", NULL); Index: bin/cat/cat.c === RCS file: /cvs/src/bin/cat/cat.c,v retrieving revision 1.27 diff -u -p -u -p -r1.27 cat.c --- bin/cat/cat.c 28 Jun 2019 13:34:58 - 1.27 +++ bin/cat/cat.c 30 Jul 2020 23:21:14 - @@ -94,7 +94,26 @@ main(int argc, char *argv[]) "usage: %s [-benstuv] [file ...]\n", __progname); return 1; } + argc -= optind; argv += optind; + + if (argc) { + if (!strcmp(*argv, "-")) { + do { + if (argc == 1) { + if (pledge("stdio", NULL) == -1) + err(1, "pledge"); + argc--, argv++; + break; + } else + argc--, argv++; + } while (argc && !strcmp(*argv, "-")); + argc++, argv--; + } + } else { + if (pledge("stdio", NULL) == -1) + err(1, "pledge"); + } if (bflag || eflag || nflag || sflag || tflag || vflag) cook_args(argv);
Re: pipex(4): kill pipexintr()
On Thu, 30 Jul 2020 22:43:10 +0300 Vitaliy Makkoveev wrote: > On Thu, Jul 30, 2020 at 10:05:13PM +0900, YASUOKA Masahiko wrote: >> On Thu, 30 Jul 2020 15:34:09 +0300 >> Vitaliy Makkoveev wrote: >> > On Thu, Jul 30, 2020 at 09:13:46PM +0900, YASUOKA Masahiko wrote: >> >> Hi, >> >> >> >> sys/net/if_ethersubr.c: >> >> 372 void >> >> 373 ether_input(struct ifnet *ifp, struct mbuf *m) >> >> (snip) >> >> 519 #if NPPPOE > 0 || defined(PIPEX) >> >> 520 case ETHERTYPE_PPPOEDISC: >> >> 521 case ETHERTYPE_PPPOE: >> >> 522 if (m->m_flags & (M_MCAST | M_BCAST)) >> >> 523 goto dropanyway; >> >> 524 #ifdef PIPEX >> >> 525 if (pipex_enable) { >> >> 526 struct pipex_session *session; >> >> 527 >> >> 528 if ((session = pipex_pppoe_lookup_session(m)) >> >> != NULL) { >> >> 529 pipex_pppoe_input(m, session); >> >> 530 return; >> >> 531 } >> >> 532 } >> >> 533 #endif >> >> >> >> previously a packet which branchces to #529 is enqueued. >> >> >> >> If the diff removes the queue, then the pipex input routine is >> >> executed by the NIC's interrupt handler. >> >> >> >> The queues had been made to avoid that kind of situations. >> > >> > It's not enqueued in pppoe case. According pipex_pppoe_input() code we >> > call pipex_common_input() with `useq' argument set to '0', so we don't >> > enqueue mbuf(9) but pass it to pipex_ppp_input() which will pass it to >> > ipv{4,6}_input(). >> >> You are right. Sorry, I forgot about this which I did that by myself. >> > > I'm interesting the reason why you did that. > >> >> Also I don't see a relation of the use-after-free problem and killing >> >> queues. Can't we fix the problem unless we kill the queues? >> > >> > Yes we can. Reference counters allow us to keep orphan sessions in these >> > queues without use after free issue. >> > >> > I will wait your commentaries current enqueuing before to do something. >> >> I have another concern. >> >> You might know, when L2TP/IPsec is used heavily, the crypto thread >> uses 100% of 1 CPU core. In that case, that thread becomes like >> below: >> >> crypto thread -> udp_userreq -> pipex_l2tp_input >> >> some clients are using MPPE(RC4 encryption) on CCP. It's not so >> light. >> >> How do we offload this for CPUs? I am thinking that "pipex" can have >> a dedicated thread. Do we have another scenario? >> > > I suppose you mean udp_input(). What is you call "crypto thread"? I did > a little backtrace but I didn't find this thread. > > ether_resolve > if_input_local > ipv4_input > ip_input_if > ip_ours > ip_deliver > udp_input (through pr_input) > pipex_l2tp_input > > ipi{,6}_mloopback > if_input_local > ipv4_input > ... > udp_input (through pr_input) > pipex_l2tp_input > > loinput > if_input_local > ipv4_input > ... > udp_input (through pr_input) > pipex_l2tp_input > > Also various pseudo drivers call ipv{4,6}_input() and underlay > udp_unput() too. > > Except nfs, we call udp_usrreq() through socket layer only. Do you mean > userland as "crypto thread"? Sorry, udp_usrreq() should be usr_input() and crypto thread meant a kthread for crypto_taskq_mp_safe, whose name is "crynlk" (see crypto_init()). A packet of L2TP/IPsec (encapsulated IP/PPP/L2TP/UDP/ESP/UDP/IP) is processed like: ipv4_input ... udp_input ipsec_common_input esp_input crypto_dispatch => crypto_taskq_mp_safe kthread "crynlk" crypto_invoke ... (*1) crypto_done esp_input_cb ipsec_common_input_cb ip_deliver udp_input pipex_l2tp_input pipex_common_input (*2) pipex_ppp_input pipex_mppe_input (*3) pipex_ppp_input pipex_ip_input ipv4_input ... At *2 there was a queue. "crynlk" is a busy thread, since it is doing decryption at *1. I think it's better pipex input is be done by another thread than crypto since it also has decryption at *3. > But upd_input(), udp_usrreq() and pipexintr() are serialized by > NET_LOCK(). We should move pipex(4) under it's own rwlock to allow them > simultaneous execution. Also, should we move outgoing pppx(4) mbufs to > queue too? pppx has a dedicated queue for each pppx interface. I think that is a reason why it skips pipexoutq. Also I noticed there is a generic queue for IP. pipex_ppp_output pipex_l2tp_output ipsend() => net_tq For PPPoE, pipex_ppp_output pipex_pppoe_output
Re: pipex(4): kill pipexintr()
On Thu, Jul 30, 2020 at 10:05:13PM +0900, YASUOKA Masahiko wrote: > On Thu, 30 Jul 2020 15:34:09 +0300 > Vitaliy Makkoveev wrote: > > On Thu, Jul 30, 2020 at 09:13:46PM +0900, YASUOKA Masahiko wrote: > >> Hi, > >> > >> sys/net/if_ethersubr.c: > >> 372 void > >> 373 ether_input(struct ifnet *ifp, struct mbuf *m) > >> (snip) > >> 519 #if NPPPOE > 0 || defined(PIPEX) > >> 520 case ETHERTYPE_PPPOEDISC: > >> 521 case ETHERTYPE_PPPOE: > >> 522 if (m->m_flags & (M_MCAST | M_BCAST)) > >> 523 goto dropanyway; > >> 524 #ifdef PIPEX > >> 525 if (pipex_enable) { > >> 526 struct pipex_session *session; > >> 527 > >> 528 if ((session = pipex_pppoe_lookup_session(m)) > >> != NULL) { > >> 529 pipex_pppoe_input(m, session); > >> 530 return; > >> 531 } > >> 532 } > >> 533 #endif > >> > >> previously a packet which branchces to #529 is enqueued. > >> > >> If the diff removes the queue, then the pipex input routine is > >> executed by the NIC's interrupt handler. > >> > >> The queues had been made to avoid that kind of situations. > > > > It's not enqueued in pppoe case. According pipex_pppoe_input() code we > > call pipex_common_input() with `useq' argument set to '0', so we don't > > enqueue mbuf(9) but pass it to pipex_ppp_input() which will pass it to > > ipv{4,6}_input(). > > You are right. Sorry, I forgot about this which I did that by myself. > I'm interesting the reason why you did that. > >> Also I don't see a relation of the use-after-free problem and killing > >> queues. Can't we fix the problem unless we kill the queues? > > > > Yes we can. Reference counters allow us to keep orphan sessions in these > > queues without use after free issue. > > > > I will wait your commentaries current enqueuing before to do something. > > I have another concern. > > You might know, when L2TP/IPsec is used heavily, the crypto thread > uses 100% of 1 CPU core. In that case, that thread becomes like > below: > > crypto thread -> udp_userreq -> pipex_l2tp_input > > some clients are using MPPE(RC4 encryption) on CCP. It's not so > light. > > How do we offload this for CPUs? I am thinking that "pipex" can have > a dedicated thread. Do we have another scenario? > I suppose you mean udp_input(). What is you call "crypto thread"? I did a little backtrace but I didn't find this thread. ether_resolve if_input_local ipv4_input ip_input_if ip_ours ip_deliver udp_input (through pr_input) pipex_l2tp_input ipi{,6}_mloopback if_input_local ipv4_input ... udp_input (through pr_input) pipex_l2tp_input loinput if_input_local ipv4_input ... udp_input (through pr_input) pipex_l2tp_input Also various pseudo drivers call ipv{4,6}_input() and underlay udp_unput() too. Except nfs, we call udp_usrreq() through socket layer only. Do you mean userland as "crypto thread"? But upd_input(), udp_usrreq() and pipexintr() are serialized by NET_LOCK(). We should move pipex(4) under it's own rwlock to allow them simultaneous execution. Also, should we move outgoing pppx(4) mbufs to queue too?
Re: Improve ure(4) performance
On Thu, Jul 30, 2020 at 07:58:23AM -0700, Jonathon Fletcher wrote: > > Mikolaj, > > I hope the patch has been stable for you. I do have an update - it appears > that a splx(s) got lost along the way (from the end of ure_txeof). This > patch adds that back in and has some minor cleanup (variable name, cleanup > defines, adjust the setting of flags and buffer sizes based on device type). > I have been running this for a couple of days now. Yes, no issues so far. I managed to have on Google Meet call, audio only and one Zoom call (was surprised it actually worked on OpenBSD, but needed to fake user agent to Linux). Network card didn't fail like before your diff during VoIP calls. That's really great. I just compiled new version of the kernel with your below patch, will reboot my laptop tonight to switch to that new kernel. Thank you! > Jonathon > > > Index: sys/dev/usb/if_urereg.h > === > RCS file: /cvs/src/sys/dev/usb/if_urereg.h,v > retrieving revision 1.8 > diff -u -p -u -r1.8 if_urereg.h > --- sys/dev/usb/if_urereg.h 7 Dec 2019 08:45:28 - 1.8 > +++ sys/dev/usb/if_urereg.h 28 Jul 2020 15:30:41 - > @@ -494,28 +494,30 @@ struct ure_txpkt { > #define URE_ENDPT_TX 1 > #define URE_ENDPT_MAX2 > > -#define URE_TX_LIST_CNT 8 > +#define URE_TX_LIST_CNT 1 > #define URE_RX_LIST_CNT 1 > -#define URE_RX_BUF_ALIGNsizeof(uint64_t) > +#define URE_TX_BUF_ALIGN4 > +#define URE_RX_BUF_ALIGN8 > > -#define URE_TXBUFSZ 16384 > -#define URE_8152_RXBUFSZ16384 > -#define URE_8153_RXBUFSZ32768 > +#define URE_TX_BUFSZ16384 > +#define URE_8152_RX_BUFSZ 16384 > +#define URE_8153_RX_BUFSZ 32768 > > struct ure_chain { > struct ure_softc*uc_sc; > struct usbd_xfer*uc_xfer; > char*uc_buf; > - struct mbuf *uc_mbuf; > - int uc_accum; > - int uc_idx; > + uint32_tuc_buflen; > + uint32_tuc_bufmax; > + struct mbuf_listuc_mbuf_list; > + SLIST_ENTRY(ure_chain) uc_list; > + uint8_t uc_idx; > }; > > struct ure_cdata { > - struct ure_chaintx_chain[URE_TX_LIST_CNT]; > - struct ure_chainrx_chain[URE_RX_LIST_CNT]; > - int tx_prod; > - int tx_cnt; > + struct ure_chainure_rx_chain[URE_RX_LIST_CNT]; > + struct ure_chainure_tx_chain[URE_TX_LIST_CNT]; > + SLIST_HEAD(ure_list_head, ure_chain)ure_tx_free; > }; > > struct ure_softc { > @@ -541,7 +543,7 @@ struct ure_softc { > > struct timeval ure_rx_notice; > int ure_rxbufsz; > - int ure_tx_list_cnt; > + int ure_txbufsz; > > int ure_phyno; > > Index: sys/dev/usb/if_ure.c > === > RCS file: /cvs/src/sys/dev/usb/if_ure.c,v > retrieving revision 1.16 > diff -u -p -u -r1.16 if_ure.c > --- sys/dev/usb/if_ure.c 10 Jul 2020 13:26:41 - 1.16 > +++ sys/dev/usb/if_ure.c 28 Jul 2020 22:56:29 - > @@ -117,11 +117,13 @@ voidure_miibus_writereg(struct device > void ure_lock_mii(struct ure_softc *); > void ure_unlock_mii(struct ure_softc *); > > -int ure_encap(struct ure_softc *, struct mbuf *); > +int ure_encap_txpkt(struct mbuf *, char *, uint32_t); > +int ure_encap_xfer(struct ifnet *, struct ure_softc *, struct > ure_chain *); > void ure_rxeof(struct usbd_xfer *, void *, usbd_status); > void ure_txeof(struct usbd_xfer *, void *, usbd_status); > -int ure_rx_list_init(struct ure_softc *); > -int ure_tx_list_init(struct ure_softc *); > +int ure_xfer_list_init(struct ure_softc *, struct ure_chain *, > + uint32_t, int); > +void ure_xfer_list_free(struct ure_softc *, struct ure_chain *, int); > > void ure_tick_task(void *); > void ure_tick(void *); > @@ -625,12 +627,36 @@ void > ure_watchdog(struct ifnet *ifp) > { > struct ure_softc*sc = ifp->if_softc; > + struct ure_chain*c; > + usbd_status err; > + int i, s; > + > + ifp->if_timer = 0; > > if (usbd_is_dying(sc->ure_udev)) > return; > > + if ((ifp->if_flags & (IFF_RUNNING|IFF_UP)) != (IFF_RUNNING|IFF_UP)) > + return; > + > + sc = ifp->if_softc; > + s = splnet(); > + > ifp->if_oerrors++; > - printf("%s: watchdog timeout\n", sc->ure_dev.dv_xname); > + DPRINTF(("%s: watchdog timeout\n", sc->ure_dev.dv_xname)); > + > +
Re: Improve ure(4) performance
On Mon, Jul 27, 2020 at 07:13:33PM +, Mikolaj Kucharski wrote: > On Mon, Jul 27, 2020 at 11:58:02AM -0700, Jonathon Fletcher wrote: > > > > Are you ok using patch's -l option for this patch? > > > > Sure, can do. New kernel compiled, will switch my machine tonight. Will > let you know how things go. For meetings I would need couple of days, as > I don't have Google Meet calls that often. Mikolaj, I hope the patch has been stable for you. I do have an update - it appears that a splx(s) got lost along the way (from the end of ure_txeof). This patch adds that back in and has some minor cleanup (variable name, cleanup defines, adjust the setting of flags and buffer sizes based on device type). I have been running this for a couple of days now. Jonathon Index: sys/dev/usb/if_urereg.h === RCS file: /cvs/src/sys/dev/usb/if_urereg.h,v retrieving revision 1.8 diff -u -p -u -r1.8 if_urereg.h --- sys/dev/usb/if_urereg.h 7 Dec 2019 08:45:28 - 1.8 +++ sys/dev/usb/if_urereg.h 28 Jul 2020 15:30:41 - @@ -494,28 +494,30 @@ struct ure_txpkt { #define URE_ENDPT_TX 1 #define URE_ENDPT_MAX 2 -#defineURE_TX_LIST_CNT 8 +#defineURE_TX_LIST_CNT 1 #defineURE_RX_LIST_CNT 1 -#defineURE_RX_BUF_ALIGNsizeof(uint64_t) +#defineURE_TX_BUF_ALIGN4 +#defineURE_RX_BUF_ALIGN8 -#defineURE_TXBUFSZ 16384 -#defineURE_8152_RXBUFSZ16384 -#defineURE_8153_RXBUFSZ32768 +#defineURE_TX_BUFSZ16384 +#defineURE_8152_RX_BUFSZ 16384 +#defineURE_8153_RX_BUFSZ 32768 struct ure_chain { struct ure_softc*uc_sc; struct usbd_xfer*uc_xfer; char*uc_buf; - struct mbuf *uc_mbuf; - int uc_accum; - int uc_idx; + uint32_tuc_buflen; + uint32_tuc_bufmax; + struct mbuf_listuc_mbuf_list; + SLIST_ENTRY(ure_chain) uc_list; + uint8_t uc_idx; }; struct ure_cdata { - struct ure_chaintx_chain[URE_TX_LIST_CNT]; - struct ure_chainrx_chain[URE_RX_LIST_CNT]; - int tx_prod; - int tx_cnt; + struct ure_chainure_rx_chain[URE_RX_LIST_CNT]; + struct ure_chainure_tx_chain[URE_TX_LIST_CNT]; + SLIST_HEAD(ure_list_head, ure_chain)ure_tx_free; }; struct ure_softc { @@ -541,7 +543,7 @@ struct ure_softc { struct timeval ure_rx_notice; int ure_rxbufsz; - int ure_tx_list_cnt; + int ure_txbufsz; int ure_phyno; Index: sys/dev/usb/if_ure.c === RCS file: /cvs/src/sys/dev/usb/if_ure.c,v retrieving revision 1.16 diff -u -p -u -r1.16 if_ure.c --- sys/dev/usb/if_ure.c10 Jul 2020 13:26:41 - 1.16 +++ sys/dev/usb/if_ure.c28 Jul 2020 22:56:29 - @@ -117,11 +117,13 @@ void ure_miibus_writereg(struct device void ure_lock_mii(struct ure_softc *); void ure_unlock_mii(struct ure_softc *); -inture_encap(struct ure_softc *, struct mbuf *); +inture_encap_txpkt(struct mbuf *, char *, uint32_t); +inture_encap_xfer(struct ifnet *, struct ure_softc *, struct ure_chain *); void ure_rxeof(struct usbd_xfer *, void *, usbd_status); void ure_txeof(struct usbd_xfer *, void *, usbd_status); -inture_rx_list_init(struct ure_softc *); -inture_tx_list_init(struct ure_softc *); +inture_xfer_list_init(struct ure_softc *, struct ure_chain *, + uint32_t, int); +void ure_xfer_list_free(struct ure_softc *, struct ure_chain *, int); void ure_tick_task(void *); void ure_tick(void *); @@ -625,12 +627,36 @@ void ure_watchdog(struct ifnet *ifp) { struct ure_softc*sc = ifp->if_softc; + struct ure_chain*c; + usbd_status err; + int i, s; + + ifp->if_timer = 0; if (usbd_is_dying(sc->ure_udev)) return; + if ((ifp->if_flags & (IFF_RUNNING|IFF_UP)) != (IFF_RUNNING|IFF_UP)) + return; + + sc = ifp->if_softc; + s = splnet(); + ifp->if_oerrors++; - printf("%s: watchdog timeout\n", sc->ure_dev.dv_xname); + DPRINTF(("%s: watchdog timeout\n", sc->ure_dev.dv_xname)); + + for (i = 0; i < URE_TX_LIST_CNT; i++) { + c = >ure_cdata.ure_tx_chain[i]; + if (ml_len(>uc_mbuf_list) > 0) { +
dhclient requested ip address in decline message
Hi, If the dhclient receives an OFFER or ACK, that does not contain all required parameters, a DECLINE is send. This DECLINE has the 'Requested IP Address' (DHO_DHCP_REQUESTET_ADDRESS) set to 0 instead of using the client IP address (yiaddr) from the packet. As far as I see it, the 'Requested IP Address' is the address the dhclient is declining, so the yiaddr would make more sense than 0. This happens in dhclient.c::packet_to_lease() and only if not all required parameters are in the packet. For all other cases the 'Requested IP Address' of the DECLINE is set to the yiaddr from the OFFER / ACK packet. My fix would be to store (and check) the yiaddr from the packet before the first jump to the decline label occurs. I have included a diff below. Best Regards, Dominik diff --git a/sbin/dhclient/dhclient.c b/sbin/dhclient/dhclient.c index 007358c5008..b287c19a3b2 100644 --- a/sbin/dhclient/dhclient.c +++ b/sbin/dhclient/dhclient.c @@ -1257,18 +1257,6 @@ packet_to_lease(struct interface_info *ifi, struct option_data *options) options[i].len = 0; } - /* -* If this lease doesn't supply a required parameter, decline it. -*/ - for (i = 0; i < config->required_option_count; i++) { - if (lease->options[config->required_options[i]].len == 0) { - name = code_to_name(config->required_options[i]); - log_warnx("%s: %s required but missing", log_procname, - name); - goto decline; - } - } - /* * If this lease is trying to sell us an address we are already * using, decline it. @@ -1282,6 +1270,18 @@ packet_to_lease(struct interface_info *ifi, struct option_data *options) goto decline; } + /* +* If this lease doesn't supply a required parameter, decline it. +*/ + for (i = 0; i < config->required_option_count; i++) { + if (lease->options[config->required_options[i]].len == 0) { + name = code_to_name(config->required_options[i]); + log_warnx("%s: %s required but missing", log_procname, + name); + goto decline; + } + } + /* Save the siaddr (a.k.a. next-server) info. */ lease->next_server.s_addr = packet->siaddr.s_addr;
Re: pipex(4): kill pipexintr()
On Thu, 30 Jul 2020 15:34:09 +0300 Vitaliy Makkoveev wrote: > On Thu, Jul 30, 2020 at 09:13:46PM +0900, YASUOKA Masahiko wrote: >> Hi, >> >> sys/net/if_ethersubr.c: >> 372 void >> 373 ether_input(struct ifnet *ifp, struct mbuf *m) >> (snip) >> 519 #if NPPPOE > 0 || defined(PIPEX) >> 520 case ETHERTYPE_PPPOEDISC: >> 521 case ETHERTYPE_PPPOE: >> 522 if (m->m_flags & (M_MCAST | M_BCAST)) >> 523 goto dropanyway; >> 524 #ifdef PIPEX >> 525 if (pipex_enable) { >> 526 struct pipex_session *session; >> 527 >> 528 if ((session = pipex_pppoe_lookup_session(m)) != >> NULL) { >> 529 pipex_pppoe_input(m, session); >> 530 return; >> 531 } >> 532 } >> 533 #endif >> >> previously a packet which branchces to #529 is enqueued. >> >> If the diff removes the queue, then the pipex input routine is >> executed by the NIC's interrupt handler. >> >> The queues had been made to avoid that kind of situations. > > It's not enqueued in pppoe case. According pipex_pppoe_input() code we > call pipex_common_input() with `useq' argument set to '0', so we don't > enqueue mbuf(9) but pass it to pipex_ppp_input() which will pass it to > ipv{4,6}_input(). You are right. Sorry, I forgot about this which I did that by myself. >> Also I don't see a relation of the use-after-free problem and killing >> queues. Can't we fix the problem unless we kill the queues? > > Yes we can. Reference counters allow us to keep orphan sessions in these > queues without use after free issue. > > I will wait your commentaries current enqueuing before to do something. I have another concern. You might know, when L2TP/IPsec is used heavily, the crypto thread uses 100% of 1 CPU core. In that case, that thread becomes like below: crypto thread -> udp_userreq -> pipex_l2tp_input some clients are using MPPE(RC4 encryption) on CCP. It's not so light. How do we offload this for CPUs? I am thinking that "pipex" can have a dedicated thread. Do we have another scenario? --yasuoka
Re: pipex(4): kill pipexintr()
On Thu, Jul 30, 2020 at 09:13:46PM +0900, YASUOKA Masahiko wrote: > Hi, > > sys/net/if_ethersubr.c: > 372 void > 373 ether_input(struct ifnet *ifp, struct mbuf *m) > (snip) > 519 #if NPPPOE > 0 || defined(PIPEX) > 520 case ETHERTYPE_PPPOEDISC: > 521 case ETHERTYPE_PPPOE: > 522 if (m->m_flags & (M_MCAST | M_BCAST)) > 523 goto dropanyway; > 524 #ifdef PIPEX > 525 if (pipex_enable) { > 526 struct pipex_session *session; > 527 > 528 if ((session = pipex_pppoe_lookup_session(m)) != > NULL) { > 529 pipex_pppoe_input(m, session); > 530 return; > 531 } > 532 } > 533 #endif > > previously a packet which branchces to #529 is enqueued. > > If the diff removes the queue, then the pipex input routine is > executed by the NIC's interrupt handler. > > The queues had been made to avoid that kind of situations. It's not enqueued in pppoe case. According pipex_pppoe_input() code we call pipex_common_input() with `useq' argument set to '0', so we don't enqueue mbuf(9) but pass it to pipex_ppp_input() which will pass it to ipv{4,6}_input(). cut begin pipex_pppoe_input(struct mbuf *m0, struct pipex_session *session) { int hlen; struct pipex_pppoe_header pppoe; NET_ASSERT_LOCKED(); /* already checked at pipex_pppoe_lookup_session */ KASSERT(m0->m_pkthdr.len >= (sizeof(struct ether_header) + sizeof(pppoe))); m_copydata(m0, sizeof(struct ether_header), sizeof(struct pipex_pppoe_header), (caddr_t)); hlen = sizeof(struct ether_header) + sizeof(struct pipex_pppoe_header); if ((m0 = pipex_common_input(session, m0, hlen, ntohs(pppoe.length), 0)) == NULL) return (NULL); m_freem(m0); session->stat.ierrors++; return (NULL); } pipex_common_input(struct pipex_session *session, struct mbuf *m0, int hlen, int plen, int useq) { /* skip */ if (!useq) { pipex_ppp_input(m0, session, 0); return (NULL); } /* input ppp packets to kernel session */ if (pipex_ppp_enqueue(m0, session, ) != 0) goto dropped; else return (NULL); cut end We enqueue pppac(4) related mbufs, except incoming pppoe. We enqueue pppx(4) related incoming mbufs except pppoe. We don't enqueue pppx(4) outgoing mbufs, we don't enqueue pppoe incoming mbufs. > > Also I don't see a relation of the use-after-free problem and killing > queues. Can't we fix the problem unless we kill the queues? > Yes we can. Reference counters allow us to keep orphan sessions in these queues without use after free issue. I will wait your commentaries current enqueuing before to do something.
Re: pipex(4): kill pipexintr()
Hi, sys/net/if_ethersubr.c: 372 void 373 ether_input(struct ifnet *ifp, struct mbuf *m) (snip) 519 #if NPPPOE > 0 || defined(PIPEX) 520 case ETHERTYPE_PPPOEDISC: 521 case ETHERTYPE_PPPOE: 522 if (m->m_flags & (M_MCAST | M_BCAST)) 523 goto dropanyway; 524 #ifdef PIPEX 525 if (pipex_enable) { 526 struct pipex_session *session; 527 528 if ((session = pipex_pppoe_lookup_session(m)) != NULL) { 529 pipex_pppoe_input(m, session); 530 return; 531 } 532 } 533 #endif previously a packet which branchces to #529 is enqueued. If the diff removes the queue, then the pipex input routine is executed by the NIC's interrupt handler. The queues had been made to avoid that kind of situations. Also I don't see a relation of the use-after-free problem and killing queues. Can't we fix the problem unless we kill the queues? On Wed, 29 Jul 2020 23:04:36 +0300 Vitaliy Makkoveev wrote: > Now pipex(4) is fully covered by NET_LOCK() and this is documented. But > we still have an issue with pipex(4) session itself and I guess it's > time to fix it. > > We have `pipexinq' and `pipexoutq' mbuf(9) queues to store mbufs. Each > mbuf(9) passed to these queues stores the pointer to corresponding > session referenced as `m_pkthdr.ph_cookie'. We enqueue incoming mbufs for > pppx(4) and incoming and outgoing mbufs for pppac(4). But we don't > enqueue pppoe related mbufs. After packet was enqueued to corresponding > queue we call schednetisr() which just schedules netisr() to run: > > cut begin > > 780 pipex_ppp_enqueue(struct mbuf *m0, struct pipex_session *session, > 781 struct mbuf_queue *mq) > 782 { > 783 m0->m_pkthdr.ph_cookie = session; > 784 /* XXX need to support other protocols */ > 785 m0->m_pkthdr.ph_ppp_proto = PPP_IP; > 786 > 787 if (mq_enqueue(mq, m0) != 0) > 788 return (1); > 789 > 790 schednetisr(NETISR_PIPEX); > 791 > 792 return (0); > 793 } > > cut end > > Also we have pipex_timer() which should destroy session in safe way, but > it does this only for pppac(4) and only for sessions closed by > `PIPEXDSESSION' command: > > cut begin > > 812 pipex_timer(void *ignored_arg) > 813 { > /* skip */ > 846 case PIPEX_STATE_CLOSED: > 847 /* > 848 * mbuf queued in pipexinq or pipexoutq may have a > 849* refererce to this session. > 850 */ > 851 if (!mq_empty() || !mq_empty()) > 852 continue; > 853 > 854 pipex_destroy_session(session); > 855 break; > > cut end > > While we destroy sessions through pipex_rele_session() or through > pipex_iface_fini() or through `PIPEXSMODE' command we don't check > `pipexinq' and `pipexoutq' state. This means we can break them. > > It's not guaranteed that netisr() will start just after schednetisr() > call. This means we can destroy session, but corresponding mbuf(9) is > stored within `pipexinq' or `pipexoutq'. It's `m_pkthdr.ph_cookie' still > stores pointer to destroyed session and we have use after free issue. I > wonder why we didn't caught panic yet. > > I propose to kill `pipexinq', `pipexoutq' and pipexintr(). There is > absolutely no reason them to exist. This should not only fix issue > described above but simplifies code too. > > Other ways are to implement reference counters for session or walk > through mbuf(9) queues and kill corresponding mbufs. It doesn't make > sense to go these ways. > > Index: lib/libc/sys/sysctl.2 > === > RCS file: /cvs/src/lib/libc/sys/sysctl.2,v > retrieving revision 1.40 > diff -u -p -r1.40 sysctl.2 > --- lib/libc/sys/sysctl.2 17 May 2020 05:48:39 - 1.40 > +++ lib/libc/sys/sysctl.2 29 Jul 2020 13:47:40 - > @@ -2033,35 +2033,11 @@ The currently defined variable names are > .Bl -column "Third level name" "integer" "Changeable" -offset indent > .It Sy "Third level name" Ta Sy "Type" Ta Sy "Changeable" > .It Dv PIPEXCTL_ENABLE Ta integer Ta yes > -.It Dv PIPEXCTL_INQ Ta node Ta not applicable > -.It Dv PIPEXCTL_OUTQ Ta node Ta not applicable > .El > .Bl -tag -width "123456" > .It Dv PIPEXCTL_ENABLE > If set to 1, enable PIPEX processing. > The default is 0. > -.It Dv PIPEXCTL_INQ Pq Va net.pipex.inq > -Fourth level comprises an array of > -.Vt struct ifqueue > -structures containing information about the PIPEX packet input queue. > -The forth level names for the elements of > -.Vt struct ifqueue > -are the same as described in > -.Li ip.arpq > -in the > -.Dv PF_INET > -section. > -.It Dv PIPEXCTL_OUTQ Pq Va
Re: VFS vgone(l) manpage vs. code
> From: Jason McIntyre, Mittwoch, 29. Juli 2020 14:23 > > On Wed, Jul 29, 2020 at 11:37:03AM +, Schreilechner, Dominik wrote: > > Hi, > > > > I noticed, that the description of vgone/vgonel in the manpage does not > > match the \ > > code. The manpage (https://man.openbsd.org/vgone) states: > > The difference between vgone() and vgonel() is that vgone() locks the vnode > > \ > > interlock and then calls vgonel() while vgonel() expects the interlock to > > already \ > > be locked. > > However, vgone simply calls vgonel with curproc (see vfs_subr.c). > > void > > vgone(struct vnode *vp) > > { > > struct proc *p = curproc; > > vgonel(vp, p); > > } > > > > Best regards, > > Dominik > > > > hi. > > it would be easier to get things improved if you attached a diff > describing what you think is correct. even failing a diff, a piece of > text saying what you think it should be would increase your chances. > > jmc Sorry, I was not sure about that, but worst case I am wrong, I guess. Below is a diff of the man page, that describes the current difference between vgone and vgonel as it is in the code. Best regards, Dominik diff --git a/share/man/man9/vgone.9 b/share/man/man9/vgone.9 index 3a663b582df..b3436fd9148 100644 --- a/share/man/man9/vgone.9 +++ b/share/man/man9/vgone.9 @@ -49,17 +49,13 @@ prepare a vnode for reuse by another file system. The preparation includes the cleaning of all file system specific data and the removal from its mount point vnode list. .Pp -The difference between .Fn vgone -and -.Fn vgonel -is that -.Fn vgone -locks the vnode interlock and then calls -.Fn vgonel -while +is the same as .Fn vgonel -expects the interlock to already be locked. +with +.Fa p +set to the current process. Historically, vgonel was called with a locked vnode +interlock, but that lock was removed. .Sh SEE ALSO .Xr vclean 9 , .Xr vnode 9 ,
Re: Add ability to set control values with video(1)
Hi Laurie, Thanks for testing and feedback! On Thu, 30 Jul 2020 08:07:56 +0100 Laurence Tratt wrote: > On Wed, Jul 29, 2020 at 10:52:31PM +0200, Marcus Glocker wrote: > > Hello Marcus, > > > Slightly adapted diff; Negative numbers can happen on controls. > > This looks really good to me, and a significant improvement on my > original. I've tested everything I can think of and it all looks > good, with one mild exception. > > UVC has separate controls for white_balance_temperature [int] and > white_balance_temperature_auto [bool]. The former is only active if > the latter is false. This probably makes devices about 30 seconds > easier to develop, but it's not a nice UI for end users, so we've > conflated them into one control. However, I now think we might need > to display this conflated control slightly differently to others. For > example consider this: > > $ video -d > $ video -c > brightness=128 > contrast=128 > saturation=128 > gain=0 > sharpness=128 > white_balance_temperature=4000 > > That last line, I think, should probably be: > > white_balance_temperature=auto > > because although the device's white_balance_temperature is set to > 4000K, we've turned auto white balance back on, so the device will > choose whatever white_balance_temperature it feels like. If/when we > get controls like zoom/exposure, they'll also probably want to be > treated in the same way, so if there's a generic way of handling > these in the code, it might be a nice bit of future proofing? Yes, I was shortly thinking about the right way of how to handle the white balance temperature control as well. And I agree, probably we should display 'auto' when auto white balance is turned on, but the current value when it's turned off. > However, this is not a huge deal: IMHO your diff is already good > enough to go in tree! Right, I also would prefer to get this diff in as is, and then implement the white balance change on top of that. As you already mentioned, that should be no big deal. > Laurie Marcus
Re: pipex(4): kill pipexintr()
On 29/07/20(Wed) 23:04, Vitaliy Makkoveev wrote: > Now pipex(4) is fully covered by NET_LOCK() and this is documented. But > we still have an issue with pipex(4) session itself and I guess it's > time to fix it. > > We have `pipexinq' and `pipexoutq' mbuf(9) queues to store mbufs. Each > mbuf(9) passed to these queues stores the pointer to corresponding > session referenced as `m_pkthdr.ph_cookie'. We enqueue incoming mbufs for > pppx(4) and incoming and outgoing mbufs for pppac(4). But we don't > enqueue pppoe related mbufs. After packet was enqueued to corresponding > queue we call schednetisr() which just schedules netisr() to run: > > cut begin > > 780 pipex_ppp_enqueue(struct mbuf *m0, struct pipex_session *session, > 781 struct mbuf_queue *mq) > 782 { > 783 m0->m_pkthdr.ph_cookie = session; > 784 /* XXX need to support other protocols */ > 785 m0->m_pkthdr.ph_ppp_proto = PPP_IP; > 786 > 787 if (mq_enqueue(mq, m0) != 0) > 788 return (1); > 789 > 790 schednetisr(NETISR_PIPEX); > 791 > 792 return (0); > 793 } > > cut end > > Also we have pipex_timer() which should destroy session in safe way, but > it does this only for pppac(4) and only for sessions closed by > `PIPEXDSESSION' command: > > cut begin > > 812 pipex_timer(void *ignored_arg) > 813 { > /* skip */ > 846 case PIPEX_STATE_CLOSED: > 847 /* > 848 * mbuf queued in pipexinq or pipexoutq may have a > 849* refererce to this session. > 850 */ > 851 if (!mq_empty() || !mq_empty()) > 852 continue; > 853 > 854 pipex_destroy_session(session); > 855 break; > > cut end > > While we destroy sessions through pipex_rele_session() or through > pipex_iface_fini() or through `PIPEXSMODE' command we don't check > `pipexinq' and `pipexoutq' state. This means we can break them. > > It's not guaranteed that netisr() will start just after schednetisr() > call. This means we can destroy session, but corresponding mbuf(9) is > stored within `pipexinq' or `pipexoutq'. It's `m_pkthdr.ph_cookie' still > stores pointer to destroyed session and we have use after free issue. I > wonder why we didn't caught panic yet. > > I propose to kill `pipexinq', `pipexoutq' and pipexintr(). There is > absolutely no reason them to exist. This should not only fix issue > described above but simplifies code too. Time is fine. Make sure you watch for possible fallouts. If you're curious you can generate Flamegraphs or profile the kernel with and without this diff to get an idea of what get executed. This change should improve latency by reducing contention on the KERNEL_LOCK(). ok mpi@ Note that as a next step it could be beneficial to pass an `ifp' pointer to pipex_ip{,6}_input() and maybe even pipex_ppp_input() to reduce the number of if_get(9). This will make an interesting pattern appear ;) > Index: lib/libc/sys/sysctl.2 > === > RCS file: /cvs/src/lib/libc/sys/sysctl.2,v > retrieving revision 1.40 > diff -u -p -r1.40 sysctl.2 > --- lib/libc/sys/sysctl.2 17 May 2020 05:48:39 - 1.40 > +++ lib/libc/sys/sysctl.2 29 Jul 2020 13:47:40 - > @@ -2033,35 +2033,11 @@ The currently defined variable names are > .Bl -column "Third level name" "integer" "Changeable" -offset indent > .It Sy "Third level name" Ta Sy "Type" Ta Sy "Changeable" > .It Dv PIPEXCTL_ENABLE Ta integer Ta yes > -.It Dv PIPEXCTL_INQ Ta node Ta not applicable > -.It Dv PIPEXCTL_OUTQ Ta node Ta not applicable > .El > .Bl -tag -width "123456" > .It Dv PIPEXCTL_ENABLE > If set to 1, enable PIPEX processing. > The default is 0. > -.It Dv PIPEXCTL_INQ Pq Va net.pipex.inq > -Fourth level comprises an array of > -.Vt struct ifqueue > -structures containing information about the PIPEX packet input queue. > -The forth level names for the elements of > -.Vt struct ifqueue > -are the same as described in > -.Li ip.arpq > -in the > -.Dv PF_INET > -section. > -.It Dv PIPEXCTL_OUTQ Pq Va net.pipex.outq > -Fourth level comprises an array of > -.Vt struct ifqueue > -structures containing information about PIPEX packet output queue. > -The forth level names for the elements of > -.Vt struct ifqueue > -are the same as described in > -.Li ip.arpq > -in the > -.Dv PF_INET > -section. > .El > .El > .Ss CTL_VFS > Index: sys/net/if.c > === > RCS file: /cvs/src/sys/net/if.c,v > retrieving revision 1.616 > diff -u -p -r1.616 if.c > --- sys/net/if.c 24 Jul 2020 18:17:14 - 1.616 > +++ sys/net/if.c 29 Jul 2020 13:47:44 - > @@ -909,13 +909,6 @@ if_netisr(void *unused) > KERNEL_UNLOCK(); > } >
Re: rdomain.4: route -T takes an rtable, not rdomain
On Thu, Jul 30, 2020 at 04:08:01AM +0200, Klemens Nanni wrote: > Multiple rtables may exist in the default rdomain (0), that is their > corresponding rdomains/lo(4) interfaces do not have to exist. > > This demonstrates it; first, nothing but default, so route(8) fails: > > # netstat -R > Rdomain 0 > Interfaces: lo0 vio0 enc0 > Routing table: 0 > > # route -T1 exec id -R > route: routing table 1: No such file or directory > > Then create an rdomain and with it an rtable: > > # ifconfig lo1 rdomain 1 > # netstat -R > Rdomain 0 > Interfaces: lo0 vio0 enc0 > Routing table: 0 > > Rdomain 1 > Interface: lo1 > Routing table: 1 > > This makes route(8) work, but it keeps working when we remove the > rdomain again since the rtable persits: > > # route -T1 exec id -R > 1 > # ifconfig lo1 destroy > # netstat -R > Rdomain 0 > Interfaces: lo0 vio0 enc0 > Routing tables: 0 1 > > # route -T1 exec id -R > 1 > > > I'm not sure yet, whether this is intentional or in fact a bug. > Either ways, the manual should be fixed - route(8)'s synopsis says the > same, just like ping(8)'s `-V rtable': > > $ man -hs8 route > route [-dnqtv] [-T rtable] command [[modifiers] args] > > Feedback? Objections? OK? > OK remi@ > > Index: share/man/man4/rdomain.4 > === > RCS file: /cvs/src/share/man/man4/rdomain.4,v > retrieving revision 1.13 > diff -u -p -r1.13 rdomain.4 > --- share/man/man4/rdomain.4 1 Feb 2020 15:00:20 - 1.13 > +++ share/man/man4/rdomain.4 30 Jul 2020 01:56:39 - > @@ -98,7 +98,7 @@ Put em0 and lo4 in rdomain 4: > # ifconfig em0 192.0.2.100/24 > .Ed > .Pp > -Set a default route and localhost reject route within rdomain 4: > +Set a default route and localhost reject route within rtable 4: > .Bd -literal -offset indent > # route -T4 -qn add -net 127 127.0.0.1 -reject > # route -T4 -n add default 192.0.2.1 > @@ -106,7 +106,7 @@ Set a default route and localhost reject > .Pp > Start > .Xr sshd 8 > -in rdomain 4: > +in rtable 4: > .Pp > .Dl # route -T4 exec /usr/sbin/sshd > .Pp >
Re: Add ability to set control values with video(1)
On Wed, Jul 29, 2020 at 10:52:31PM +0200, Marcus Glocker wrote: Hello Marcus, > Slightly adapted diff; Negative numbers can happen on controls. This looks really good to me, and a significant improvement on my original. I've tested everything I can think of and it all looks good, with one mild exception. UVC has separate controls for white_balance_temperature [int] and white_balance_temperature_auto [bool]. The former is only active if the latter is false. This probably makes devices about 30 seconds easier to develop, but it's not a nice UI for end users, so we've conflated them into one control. However, I now think we might need to display this conflated control slightly differently to others. For example consider this: $ video -d $ video -c brightness=128 contrast=128 saturation=128 gain=0 sharpness=128 white_balance_temperature=4000 That last line, I think, should probably be: white_balance_temperature=auto because although the device's white_balance_temperature is set to 4000K, we've turned auto white balance back on, so the device will choose whatever white_balance_temperature it feels like. If/when we get controls like zoom/exposure, they'll also probably want to be treated in the same way, so if there's a generic way of handling these in the code, it might be a nice bit of future proofing? However, this is not a huge deal: IMHO your diff is already good enough to go in tree! Laurie