Re: route add ::/0 ...
On Tue, 30 Jun 2020 02:42:02 +0200 Klemens Nanni wrote: > On Tue, Jun 30, 2020 at 09:00:30AM +0900, YASUOKA Masahiko wrote: >> inet_makenetandmask() had required another treatment. >> >> Also -prefixlen 0 for -inet has a bug >> >> % doas ./obj/route -T100 add -inet 0.0.0.0 -prefixlen 0 127.0.0.1 >> add net 0.0.0.0: gateway 127.0.0.1 >> % netstat -nrf inet -T 100 >> Routing tables >> >> Internet: >> DestinationGatewayFlags Refs Use Mtu Prio >> Iface >> 0.0.0.0/32 127.0.0.1 UGS00 32768 8 >> lo100 >> >> /0 becomes /32. The diff following also fixes the problem. > Yes, this looks correct to me; regress is also happy (again). > > OK kn Thanks, I'm going to commit the diff. ok or comments, are still welcome. Stop using make_addr() which trims trailing zeros of the netmask, set family and length field. This fixes route(8) to handle "::/0" properly. Also fix "route add -inet 0.0.0.0 -prefixlen 0 (gateway)" to work properly. Index: sbin/route/route.c === RCS file: /cvs/src/sbin/route/route.c,v retrieving revision 1.247 diff -u -p -r1.247 route.c --- sbin/route/route.c 15 Jan 2020 10:26:25 - 1.247 +++ sbin/route/route.c 6 Jul 2020 08:45:06 - @@ -107,7 +107,6 @@ void print_rtmsg(struct rt_msghdr *, in voidpmsg_common(struct rt_msghdr *); voidpmsg_addrs(char *, int); voidbprintf(FILE *, int, char *); -voidmask_addr(union sockunion *, union sockunion *, int); int getaddr(int, int, char *, struct hostent **); voidgetmplslabel(char *, int); int rtmsg(int, int, int, uint8_t); @@ -781,12 +780,9 @@ inet_makenetandmask(u_int32_t net, struc sin->sin_addr.s_addr = htonl(net); sin = _mask.sin; sin->sin_addr.s_addr = htonl(mask); - sin->sin_len = 0; - sin->sin_family = 0; + sin->sin_family = AF_INET; cp = (char *)(>sin_addr + 1); - while (*--cp == '\0' && cp > (char *)sin) - continue; - sin->sin_len = 1 + cp - (char *)sin; + sin->sin_len = sizeof(struct sockaddr_in); } /* @@ -1001,7 +997,8 @@ prefixlen(int af, char *s) memset(_mask, 0, sizeof(so_mask)); so_mask.sin.sin_family = AF_INET; so_mask.sin.sin_len = sizeof(struct sockaddr_in); - so_mask.sin.sin_addr.s_addr = htonl(0x << (32 - len)); + if (len != 0) + so_mask.sin.sin_addr.s_addr = htonl(0x << (32 - len)); break; case AF_INET6: so_mask.sin6.sin6_family = AF_INET6; @@ -1088,8 +1085,6 @@ rtmsg(int cmd, int flags, int fmask, uin rtm.rtm_mpls = mpls_flags; rtm.rtm_hdrlen = sizeof(rtm); - if (rtm_addrs & RTA_NETMASK) - mask_addr(_dst, _mask, RTA_DST); /* store addresses in ascending order of RTA values */ NEXTADDR(RTA_DST, so_dst); NEXTADDR(RTA_GATEWAY, so_gate); @@ -1118,34 +1113,6 @@ rtmsg(int cmd, int flags, int fmask, uin } #undef rtm return (0); -} - -void -mask_addr(union sockunion *addr, union sockunion *mask, int which) -{ - int olen = mask->sa.sa_len; - char *cp1 = olen + (char *)mask, *cp2; - - for (mask->sa.sa_len = 0; cp1 > (char *)mask; ) - if (*--cp1 != '\0') { - mask->sa.sa_len = 1 + cp1 - (char *)mask; - break; - } - if ((rtm_addrs & which) == 0) - return; - switch (addr->sa.sa_family) { - case AF_INET: - case AF_INET6: - case AF_UNSPEC: - return; - } - cp1 = mask->sa.sa_len + 1 + (char *)addr; - cp2 = addr->sa.sa_len + 1 + (char *)addr; - while (cp2 > cp1) - *--cp2 = '\0'; - cp2 = mask->sa.sa_len + 1 + (char *)mask; - while (cp1 > addr->sa.sa_data) - *--cp1 &= *--cp2; } char *msgtypes[] = {
Re: fix races in if_clone_create()
On 01/07/20(Wed) 00:02, Vitaliy Makkoveev wrote: > On Tue, Jun 30, 2020 at 03:48:22PM +0300, Vitaliy Makkoveev wrote: > > On Tue, Jun 30, 2020 at 12:08:03PM +0200, Martin Pieuchot wrote: > > > On 29/06/20(Mon) 11:59, Vitaliy Makkoveev wrote: > > > > [...] > > > > I reworked tool for reproduce. Now I avoided fork()/exec() route and it > > > > takes couple of minutes to take panic on 4 cores. Also some screenshots > > > > attached. > > > > > > Setting kern.pool_debug=2 makes the race reproducible in seconds. > > Unfortunately you will catch splassert() caused by kern/sched_bsd.c:304. > malloc() will call yield() while we are holding NET_LOCK(). I attached > screenshot with splassertion to this mail. With kern.splassert < 3 it is fine. > > > Could you turn this test into something committable in regress/? We can > > > link it to the build once a fix is committed. > > > > > > > We have 3 races with cloned interfaces: > > 1. if_clone_create() vs if_clone_create() > > 2. if_clone_destroy() vs if_clone_destroy() > > 3. if_clone_destroy() vs the rest of stack > > > > It makes sences to commit unified test to regress/, so I suggest to wait > > a little. > > The another solution. > > Diff below introduces per-`ifc' serialization for if_clone_create() and > if_clone_destroy(). There is no index bitmap anymore. I like the simplification. More comments below: > +/* > + * Lock a clone network interface. > + */ > +int > +if_clone_lock(struct if_clone *ifc) > +{ > + int error; > + > + rw_enter_write(>ifc_lock); > + > + while (ifc->ifc_flags & IFC_CREATE_LOCKED) { > + ifc->ifc_flags |= IFC_CREATE_LOCKWAIT; > + error = rwsleep_nsec(>ifc_flags, >ifc_lock, > + PWAIT|PCATCH, "ifclk", INFSLP); > + if(error != 0) { > + ifc->ifc_flags &= ~IFC_CREATE_LOCKWAIT; > + rw_exit_write(>ifc_lock); > + return error; > + } > + } > + ifc->ifc_flags |= IFC_CREATE_LOCKED; > + ifc->ifc_flags &= ~IFC_CREATE_LOCKWAIT; > + > + rw_exit_write(>ifc_lock); > + > + return 0; > +} This is like re-implementing a rwlock but loosing the debugging ability of WITNESS. I also don't see any reason for having a per-ifc lock. If, at least one of the problems, is a double insert in `ifnet' then we should be able to assert that a lock is held when doing such assertion. Assertions and documentation are more important than preventing races because they allow to build awareness and elegant solutions instead of hacking diffs until stuff work without knowing why. There are two cases where `ifp' are inserted into `ifnet': 1. by autoconf during boot or hotplug 2. by cloning ioctl In the second case it is always about pseudo-devices. So the assertion should be conditional like: if (ISSET(ifp->if_xflags, IFXF_CLONED)) rw_assert_wrlock(_lock); In other words this fixes serializes insertions/removal on the global list `ifnet', the KERNEL_LOCK() being still required for reading it. Is there any other data structure which ends up being protected by this approach and could be documented?
urtwn(4): Add support for D-Link DWA-121 rev B1
Hi, The patch at the end should add support for USB wifi dongle DWA-121 from D-Link [1]. The USB id of such device is 2001:331b. lykke$ usbdevs Controller /dev/usb0: addr 01: : Generic, EHCI root hub Controller /dev/usb1: addr 01: : Generic, EHCI root hub addr 02: 05e3:0608 Genesys Logic, USB2.0 Hub addr 03: 0c45:6321 Sonix Technology Co., Ltd., USB Camera Controller /dev/usb2: addr 01: : Generic, xHCI root hub Controller /dev/usb3: addr 01: : Generic, xHCI root hub addr 02: 2001:331b Realtek\r, Controller /dev/usb4: addr 01: : Generic, OHCI root hub addr 02: 258a:001e HAILUCK CO.,LTD, USB KEYBOARD Controller /dev/usb5: addr 01: : Generic, OHCI root hub Relevant dmesg message (full dmesg is attached): urtwn0 at uhub3 port 1 configuration 1 interface 0 "Realtek " rev 2.00/0.00 addr 2 urtwn0: MAC/BB RTL8188EU, RF 6052 1T1R, address 60:63:4c:xx:xx:xx Thanks, Miguel. 1. https://eu.dlink.com/uk/en/products/dwa-121-wireless-n-150-pico-usb-adapter Index: sys/dev/usb/if_urtwn.c === RCS file: /cvs/src/sys/dev/usb/if_urtwn.c,v retrieving revision 1.90 diff -u -p -r1.90 if_urtwn.c --- sys/dev/usb/if_urtwn.c 11 Jun 2020 00:56:12 - 1.90 +++ sys/dev/usb/if_urtwn.c 6 Jul 2020 08:26:30 - @@ -328,6 +328,7 @@ static const struct urtwn_type { URTWN_DEV_8188EU(REALTEK, RTL8188ETV), URTWN_DEV_8188EU(REALTEK, RTL8188EU), URTWN_DEV_8188EU(TPLINK,RTL8188EUS), + URTWN_DEV_8188EU(DLINK, DWA121B1), /* URTWN_RTL8192EU */ URTWN_DEV_8192EU(DLINK, DWA131E1), URTWN_DEV_8192EU(REALTEK, RTL8192EU), Index: sys/dev/usb/usbdevs === RCS file: /cvs/src/sys/dev/usb/usbdevs,v retrieving revision 1.717 diff -u -p -r1.717 usbdevs --- sys/dev/usb/usbdevs 22 Jun 2020 15:49:37 - 1.717 +++ sys/dev/usb/usbdevs 6 Jul 2020 08:26:30 - @@ -1565,6 +1565,7 @@ product DLINK DWA125D10x330f DWA-125 r product DLINK DWA123D1 0x3310 DWA-123 rev D1 product DLINK DWA137A1 0x3317 DWA-137 rev A1 product DLINK DWA131E1 0x3319 DWA-131 rev E1 +product DLINK DWA121B1 0x331b DWA-121 rev B1 product DLINK DWA182D1 0x331c DWA-182 rev D1 product DLINK DWA171C1 0x331d DWA-171 rev C1 product DLINK DWL122 0x3700 DWL-122 Index: sys/dev/usb/usbdevs.h === RCS file: /cvs/src/sys/dev/usb/usbdevs.h,v retrieving revision 1.729 diff -u -p -r1.729 usbdevs.h --- sys/dev/usb/usbdevs.h 22 Jun 2020 15:52:39 - 1.729 +++ sys/dev/usb/usbdevs.h 6 Jul 2020 08:26:30 - @@ -1570,6 +1570,7 @@ #defineUSB_PRODUCT_DLINK_DWA131B 0x330d /* DWA-131 rev B */ #defineUSB_PRODUCT_DLINK_DWA125D1 0x330f /* DWA-125 rev D1 */ #defineUSB_PRODUCT_DLINK_DWA123D1 0x3310 /* DWA-123 rev D1 */ +#defineUSB_PRODUCT_DLINK_DWA121B1 0x331b /* DWA-121 rev B1 */ #defineUSB_PRODUCT_DLINK_DWA137A1 0x3317 /* DWA-137 rev A1 */ #defineUSB_PRODUCT_DLINK_DWA131E1 0x3319 /* DWA-131 rev E1 */ #defineUSB_PRODUCT_DLINK_DWA182D1 0x331c /* DWA-182 rev D1 */ Index: sys/dev/usb/usbdevs_data.h === RCS file: /cvs/src/sys/dev/usb/usbdevs_data.h,v retrieving revision 1.723 diff -u -p -r1.723 usbdevs_data.h --- sys/dev/usb/usbdevs_data.h 22 Jun 2020 15:52:39 - 1.723 +++ sys/dev/usb/usbdevs_data.h 6 Jul 2020 08:26:31 - @@ -2618,6 +2618,10 @@ const struct usb_known_product usb_known "DWA-123 rev D1", }, { + USB_VENDOR_DLINK, USB_PRODUCT_DLINK_DWA121B1, + "DWA-121 rev B1", + }, + { USB_VENDOR_DLINK, USB_PRODUCT_DLINK_DWA137A1, "DWA-137 rev A1", }, -- Miguel Landaeta, nomadium at debian.org secure email with PGP 0x6E608B637D8967E9 available at http://miguel.cc/key. "Faith means not wanting to know what is true." -- Nietzsche OpenBSD 6.7-current (CUSTOM) #0: Fri Jul 3 14:18:45 IST 2020 mig...@lykke.nomadium.net:/sys/arch/arm64/compile/CUSTOM real mem = 4092612608 (3903MB) avail mem = 3891392512 (3711MB) random: good seed from bootblocks mainbus0 at root: Pine64 Pinebook Pro psci0 at mainbus0: PSCI 1.1, SMCCC 1.1 cpu0 at mainbus0 mpidr 0: ARM Cortex-A53 r0p4 cpu0: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache cpu0: 512KB 64b/line 16-way L2 cache cpu1 at mainbus0 mpidr 1: ARM Cortex-A53 r0p4 cpu1: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache cpu1: 512KB 64b/line 16-way L2 cache cpu2 at mainbus0 mpidr 2: ARM Cortex-A53 r0p4 cpu2: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1
Re: pipex(4): kill pipexintr()
On 01/07/20(Wed) 22:42, Vitaliy Makkoveev wrote: > pipex(4) has 2 mbuf queues: `pipexinq' and `pipexoutq'. When mbuf passed > to pipex it goes to one of these queues and pipexintr() will be > scheduled to process them. pipexintr() called from `netisr' context. > > It's true for pppac(4) but for pppx(4) only incoming mbufs go to > `pipexinq. Outgoing mbufs go directly to stack. pppx(4) enabled in > npppd.conf(5) by default so I guess it's the common case of pipex(4) > usage. > > The code looks like there is no requirements to this delayed mbufs > processing, we can pass it directly to stack as we do for pppx(4) > outgoing traffic. > > Also we have some troubles with pipexintr() as it was described in [1]. > It's protection of `ph_cookie'. We don't this protection this time and > we can't because we should brake if_get(9) logic. > > Diff below removes pipexintr(). Now all mbufs passed directly without > enqueueing within pipex(4). We also can destroy sessions safe in all > cases. We also can use if_get(9) instead using unreferenced pointers to > `ifnet' within pipex(4). We also avoided context switch while we > processing mbufs within pipex(4). We decreased latency. > > I'm seeding debian torrents with this diff an all goes well. With this diff the content of pipexintr() is no longer executed with the KERNEL_LOCK() held. This can be seen by following the code starting in ether_input(). Grabbing the KERNEL_LOCK() there is not a way forward. The whole idea of if_input_process() is to be free of KERNEL_LOCK() to not introduce latency delay. So this changes implies that `pipex_session_list' and possibly other global data structures as well as the elements linked in those are all protected by the NET_LOCK(). I believe this is the easiest way forward. That said I would be comfortable with this diff going in if an audit of the data structures accessed in the code path starting by pipex_pppoe_input() has been done. That implies annotating/documenting which data structures are now protected by the NET_LOCK() and adding the necessary NET_ASSERT_LOCK(). Such audit might lead to consider some ioctl code path changes to now serialize on the NET_LOCK() instead of the KERNEL_LOCK(). > 1. https://marc.info/?t=15930080902=1=2 > > Index: lib/libc/sys/sysctl.2 > === > RCS file: /cvs/src/lib/libc/sys/sysctl.2,v > retrieving revision 1.40 > diff -u -p -r1.40 sysctl.2 > --- lib/libc/sys/sysctl.2 17 May 2020 05:48:39 - 1.40 > +++ lib/libc/sys/sysctl.2 1 Jul 2020 19:20:22 - > @@ -2033,35 +2033,11 @@ The currently defined variable names are > .Bl -column "Third level name" "integer" "Changeable" -offset indent > .It Sy "Third level name" Ta Sy "Type" Ta Sy "Changeable" > .It Dv PIPEXCTL_ENABLE Ta integer Ta yes > -.It Dv PIPEXCTL_INQ Ta node Ta not applicable > -.It Dv PIPEXCTL_OUTQ Ta node Ta not applicable > .El > .Bl -tag -width "123456" > .It Dv PIPEXCTL_ENABLE > If set to 1, enable PIPEX processing. > The default is 0. > -.It Dv PIPEXCTL_INQ Pq Va net.pipex.inq > -Fourth level comprises an array of > -.Vt struct ifqueue > -structures containing information about the PIPEX packet input queue. > -The forth level names for the elements of > -.Vt struct ifqueue > -are the same as described in > -.Li ip.arpq > -in the > -.Dv PF_INET > -section. > -.It Dv PIPEXCTL_OUTQ Pq Va net.pipex.outq > -Fourth level comprises an array of > -.Vt struct ifqueue > -structures containing information about PIPEX packet output queue. > -The forth level names for the elements of > -.Vt struct ifqueue > -are the same as described in > -.Li ip.arpq > -in the > -.Dv PF_INET > -section. > .El > .El > .Ss CTL_VFS > Index: sys/net/if.c > === > RCS file: /cvs/src/sys/net/if.c,v > retrieving revision 1.611 > diff -u -p -r1.611 if.c > --- sys/net/if.c 30 Jun 2020 09:31:38 - 1.611 > +++ sys/net/if.c 1 Jul 2020 19:20:27 - > @@ -1012,13 +1012,6 @@ if_netisr(void *unused) > KERNEL_UNLOCK(); > } > #endif > -#ifdef PIPEX > - if (n & (1 << NETISR_PIPEX)) { > - KERNEL_LOCK(); > - pipexintr(); > - KERNEL_UNLOCK(); > - } > -#endif > t |= n; > } > > Index: sys/net/netisr.h > === > RCS file: /cvs/src/sys/net/netisr.h,v > retrieving revision 1.51 > diff -u -p -r1.51 netisr.h > --- sys/net/netisr.h 6 Aug 2019 22:57:54 - 1.51 > +++ sys/net/netisr.h 1 Jul 2020 19:20:27 - > @@ -48,7 +48,6 @@ > #define NETISR_IPV6 24 /* same as AF_INET6 */ > #define NETISR_ISDN 26 /* same as AF_E164 */ > #define NETISR_PPP 28 /* for PPP processing */ > -#define NETISR_PIPEX27 /* for
Re: route add ::/0 ...
Let me updated the diff. On Mon, 06 Jul 2020 17:54:30 +0900 (JST) YASUOKA Masahiko wrote: > On Tue, 30 Jun 2020 02:42:02 +0200 > Klemens Nanni wrote: >> On Tue, Jun 30, 2020 at 09:00:30AM +0900, YASUOKA Masahiko wrote: >>> inet_makenetandmask() had required another treatment. >>> >>> Also -prefixlen 0 for -inet has a bug >>> >>> % doas ./obj/route -T100 add -inet 0.0.0.0 -prefixlen 0 127.0.0.1 >>> add net 0.0.0.0: gateway 127.0.0.1 >>> % netstat -nrf inet -T 100 >>> Routing tables >>> >>> Internet: >>> DestinationGatewayFlags Refs Use Mtu Prio >>> Iface >>> 0.0.0.0/32 127.0.0.1 UGS00 32768 8 >>> lo100 >>> >>> /0 becomes /32. The diff following also fixes the problem. >> Yes, this looks correct to me; regress is also happy (again). >> >> OK kn > > Thanks, > > I'm going to commit the diff. ok or comments, are still welcome. > > > Stop using make_addr() which trims trailing zeros of the netmask, set > family and length field. This fixes route(8) to handle "::/0" > properly. Also fix "route add -inet 0.0.0.0 -prefixlen 0 (gateway)" > to work properly. > > Index: sbin/route/route.c > === > RCS file: /cvs/src/sbin/route/route.c,v > retrieving revision 1.247 > diff -u -p -r1.247 route.c > --- sbin/route/route.c15 Jan 2020 10:26:25 - 1.247 > +++ sbin/route/route.c6 Jul 2020 08:45:06 - (snip) > @@ -781,12 +780,9 @@ inet_makenetandmask(u_int32_t net, struc > sin->sin_addr.s_addr = htonl(net); > sin = _mask.sin; > sin->sin_addr.s_addr = htonl(mask); > - sin->sin_len = 0; > - sin->sin_family = 0; > + sin->sin_family = AF_INET; > cp = (char *)(>sin_addr + 1); > - while (*--cp == '\0' && cp > (char *)sin) > - continue; > - sin->sin_len = 1 + cp - (char *)sin; > + sin->sin_len = sizeof(struct sockaddr_in); > } > > /* "cp" becomes unused. The updated diff removes "cp" as well. Index: sbin/route/route.c === RCS file: /cvs/src/sbin/route/route.c,v retrieving revision 1.247 diff -u -p -r1.247 route.c --- sbin/route/route.c 15 Jan 2020 10:26:25 - 1.247 +++ sbin/route/route.c 6 Jul 2020 08:57:25 - @@ -107,7 +107,6 @@ void print_rtmsg(struct rt_msghdr *, in voidpmsg_common(struct rt_msghdr *); voidpmsg_addrs(char *, int); voidbprintf(FILE *, int, char *); -voidmask_addr(union sockunion *, union sockunion *, int); int getaddr(int, int, char *, struct hostent **); voidgetmplslabel(char *, int); int rtmsg(int, int, int, uint8_t); @@ -767,7 +766,6 @@ void inet_makenetandmask(u_int32_t net, struct sockaddr_in *sin, int bits) { u_int32_t mask; - char *cp; rtm_addrs |= RTA_NETMASK; if (bits == 0 && net == 0) @@ -781,12 +779,8 @@ inet_makenetandmask(u_int32_t net, struc sin->sin_addr.s_addr = htonl(net); sin = _mask.sin; sin->sin_addr.s_addr = htonl(mask); - sin->sin_len = 0; - sin->sin_family = 0; - cp = (char *)(>sin_addr + 1); - while (*--cp == '\0' && cp > (char *)sin) - continue; - sin->sin_len = 1 + cp - (char *)sin; + sin->sin_family = AF_INET; + sin->sin_len = sizeof(struct sockaddr_in); } /* @@ -1001,7 +995,8 @@ prefixlen(int af, char *s) memset(_mask, 0, sizeof(so_mask)); so_mask.sin.sin_family = AF_INET; so_mask.sin.sin_len = sizeof(struct sockaddr_in); - so_mask.sin.sin_addr.s_addr = htonl(0x << (32 - len)); + if (len != 0) + so_mask.sin.sin_addr.s_addr = htonl(0x << (32 - len)); break; case AF_INET6: so_mask.sin6.sin6_family = AF_INET6; @@ -1088,8 +1083,6 @@ rtmsg(int cmd, int flags, int fmask, uin rtm.rtm_mpls = mpls_flags; rtm.rtm_hdrlen = sizeof(rtm); - if (rtm_addrs & RTA_NETMASK) - mask_addr(_dst, _mask, RTA_DST); /* store addresses in ascending order of RTA values */ NEXTADDR(RTA_DST, so_dst); NEXTADDR(RTA_GATEWAY, so_gate); @@ -1118,34 +,6 @@ rtmsg(int cmd, int flags, int fmask, uin } #undef rtm return (0); -} - -void -mask_addr(union sockunion *addr, union sockunion *mask, int which) -{ - int olen = mask->sa.sa_len; - char *cp1 = olen + (char *)mask, *cp2; - - for (mask->sa.sa_len = 0; cp1 > (char *)mask; ) - if (*--cp1 != '\0') { - mask->sa.sa_len = 1 + cp1 - (char *)mask; - break; - } - if ((rtm_addrs & which) == 0) - return; - switch (addr->sa.sa_family) { - case AF_INET: - case AF_INET6: - case AF_UNSPEC: - return; - } - cp1 =
Re: adjfreq(2): limit adjustment to prevent overflow during tc_windup()
10^12 was the old definition of billion in the UK also, although in the last few decades it has become rare and 10^9 is now the norm. https://en.wikipedia.org/wiki/Long_and_short_scales has quite a bit about it. On Tue, 7 Jul 2020 at 00:27, Scott Cheloha wrote: > On Mon, Jul 06, 2020 at 11:57:32PM +0200, Mark Kettenis wrote: > > > Date: Mon, 6 Jul 2020 16:40:35 -0500 > > > From: Scott Cheloha > > > > > > On Fri, Jul 03, 2020 at 10:52:15AM +0200, Otto Moerbeek wrote: > > > > On Thu, Jul 02, 2020 at 08:27:58PM -0500, Scott Cheloha wrote: > > > > > > > > > Hi, > > > > > > > > > > When we recompute the scaling factor during tc_windup() there is an > > > > > opportunity for arithmetic overflow/underflow when we add the NTP > > > > > adjustment into the scale: > > > > > > > > > >649 scale = (u_int64_t)1 << 63; > > > > >650 scale += \ > > > > >651 ((th->th_adjustment + > th->th_counter->tc_freq_adj) / 1024) * 2199; > > > > >652 scale /= th->th_counter->tc_frequency; > > > > >653 th->th_scale = scale * 2; > > > > > > > > > > At lines 650 and 651, you will overflow/underflow if > > > > > th->th_counter->tc_freq_adj is sufficiently positive/negative. > > > > > > > > > > I don't like the idea of checking for that overflow during > > > > > tc_windup(). We can pick a reasonable adjustment range and check > for > > > > > it during adjfreq(2) and that should be good enough. > > > > > > > > > > My strawman proposal is a range of -5 to 5 parts > per > > > > > billion. We could push the limits a bit, but half a billion seems > > > > > like a nice round number to me. > > > > > > > > > > On a perfect clock, this means you can effect a 0.5x slowdown or a > > > > > 1.5x speedup via adjfreq(2), but no slower/faster. > > > > > > > > > > I don't *think* ntpd(8) would ever reach such extreme adjustments > > > > > through its algorithm. I don't think this will break anyone's > working > > > > > setup. > > > > > > > > > > (Maybe I'm wrong, though. otto@?) > > > > > > > > Right, ntpd is pretty conversative and won't do big adjustments. > > > > > > So, what is the right way to describe these limits? > > > > > > "Parts per billion"? Something else? > > > > The traditional way to express clock drift in the context of NTP is > > "parts per million" or "ppm" for short. There are good reasons to > > avoid using "billion" in documentation as a very similar word is used > > in Germanic languages for 10^12 where in English you mean 10^9. > > Huh. So from what I've gathered: > > German English ISO > > Million Million 10^6 > Milliarde Billion 10^9 > Billion Trillion10^12 > Billiarde Quadrillion 10^15 > TrillionQuintillion 10^18 > > Potentially confusing. > > The "German Connection" to NTP in particular eludes me, though. > > > Stripping three zeroes also makes it easier to read. [...] > > Yes, it always does. It looks nicer as "N ppm" in the mandoc(1) > output, too. > > -- > > otto@, from what you said I take it you're OK with the new limits so > I'm going commit it as follows in a day or two. > > Index: lib/libc/sys/adjfreq.2 > === > RCS file: /cvs/src/lib/libc/sys/adjfreq.2,v > retrieving revision 1.7 > diff -u -p -r1.7 adjfreq.2 > --- lib/libc/sys/adjfreq.2 10 Sep 2015 17:55:21 - 1.7 > +++ lib/libc/sys/adjfreq.2 6 Jul 2020 23:00:24 - > @@ -60,6 +60,9 @@ The > .Fa freq > argument is non-null and the process's effective user ID is not that > of the superuser. > +.It Bq Er EINVAL > +.Fa freq > +is less than -50 ppm or greater than 50 ppm. > .El > .Sh SEE ALSO > .Xr date 1 , > Index: sys/kern/kern_time.c > === > RCS file: /cvs/src/sys/kern/kern_time.c,v > retrieving revision 1.131 > diff -u -p -r1.131 kern_time.c > --- sys/kern/kern_time.c22 Jun 2020 18:25:57 - 1.131 > +++ sys/kern/kern_time.c6 Jul 2020 23:00:24 - > @@ -391,6 +391,9 @@ sys_settimeofday(struct proc *p, void *v > return (0); > } > > +#define ADJFREQ_MAX (5LL << 32) > +#define ADJFREQ_MIN (-5LL << 32) > + > int > sys_adjfreq(struct proc *p, void *v, register_t *retval) > { > @@ -408,6 +411,8 @@ sys_adjfreq(struct proc *p, void *v, reg > return (error); > if ((error = copyin(freq, , sizeof(f > return (error); > + if (f < ADJFREQ_MIN || f > ADJFREQ_MAX) > + return (EINVAL); > } > > rw_enter(_lock, (freq == NULL) ? RW_READ : RW_WRITE); > >
Re: urtwn(4): Add support for D-Link DWA-121 rev B1
On Mon, Jul 06, 2020 at 10:15:14AM +0100, Miguel Landaeta wrote: > Hi, > > The patch at the end should add support for USB wifi dongle > DWA-121 from D-Link [1]. > > The USB id of such device is 2001:331b. > > lykke$ usbdevs > Controller /dev/usb0: > addr 01: : Generic, EHCI root hub > Controller /dev/usb1: > addr 01: : Generic, EHCI root hub > addr 02: 05e3:0608 Genesys Logic, USB2.0 Hub > addr 03: 0c45:6321 Sonix Technology Co., Ltd., USB Camera > Controller /dev/usb2: > addr 01: : Generic, xHCI root hub > Controller /dev/usb3: > addr 01: : Generic, xHCI root hub > addr 02: 2001:331b Realtek\r, > Controller /dev/usb4: > addr 01: : Generic, OHCI root hub > addr 02: 258a:001e HAILUCK CO.,LTD, USB KEYBOARD > Controller /dev/usb5: > addr 01: : Generic, OHCI root hub > > Relevant dmesg message (full dmesg is attached): > > urtwn0 at uhub3 port 1 configuration 1 interface 0 "Realtek " rev 2.00/0.00 > addr 2 > urtwn0: MAC/BB RTL8188EU, RF 6052 1T1R, address 60:63:4c:xx:xx:xx > > Thanks, > Miguel. > > 1. https://eu.dlink.com/uk/en/products/dwa-121-wireless-n-150-pico-usb-adapter Thanks, committed with if_urtwn.c changed to be in sorted order. If you run 'fw_update bwfm' bwfm firmware will be installed and builtin 802.11 should work. > > > Index: sys/dev/usb/if_urtwn.c > === > RCS file: /cvs/src/sys/dev/usb/if_urtwn.c,v > retrieving revision 1.90 > diff -u -p -r1.90 if_urtwn.c > --- sys/dev/usb/if_urtwn.c11 Jun 2020 00:56:12 - 1.90 > +++ sys/dev/usb/if_urtwn.c6 Jul 2020 08:26:30 - > @@ -328,6 +328,7 @@ static const struct urtwn_type { > URTWN_DEV_8188EU(REALTEK, RTL8188ETV), > URTWN_DEV_8188EU(REALTEK, RTL8188EU), > URTWN_DEV_8188EU(TPLINK,RTL8188EUS), > + URTWN_DEV_8188EU(DLINK, DWA121B1), > /* URTWN_RTL8192EU */ > URTWN_DEV_8192EU(DLINK, DWA131E1), > URTWN_DEV_8192EU(REALTEK, RTL8192EU), > Index: sys/dev/usb/usbdevs > === > RCS file: /cvs/src/sys/dev/usb/usbdevs,v > retrieving revision 1.717 > diff -u -p -r1.717 usbdevs > --- sys/dev/usb/usbdevs 22 Jun 2020 15:49:37 - 1.717 > +++ sys/dev/usb/usbdevs 6 Jul 2020 08:26:30 - > @@ -1565,6 +1565,7 @@ product DLINK DWA125D1 0x330f DWA-125 r > product DLINK DWA123D1 0x3310 DWA-123 rev D1 > product DLINK DWA137A1 0x3317 DWA-137 rev A1 > product DLINK DWA131E1 0x3319 DWA-131 rev E1 > +product DLINK DWA121B1 0x331b DWA-121 rev B1 > product DLINK DWA182D1 0x331c DWA-182 rev D1 > product DLINK DWA171C1 0x331d DWA-171 rev C1 > product DLINK DWL122 0x3700 DWL-122 > Index: sys/dev/usb/usbdevs.h > === > RCS file: /cvs/src/sys/dev/usb/usbdevs.h,v > retrieving revision 1.729 > diff -u -p -r1.729 usbdevs.h > --- sys/dev/usb/usbdevs.h 22 Jun 2020 15:52:39 - 1.729 > +++ sys/dev/usb/usbdevs.h 6 Jul 2020 08:26:30 - > @@ -1570,6 +1570,7 @@ > #define USB_PRODUCT_DLINK_DWA131B 0x330d /* DWA-131 rev > B */ > #define USB_PRODUCT_DLINK_DWA125D1 0x330f /* DWA-125 rev > D1 */ > #define USB_PRODUCT_DLINK_DWA123D1 0x3310 /* DWA-123 rev > D1 */ > +#define USB_PRODUCT_DLINK_DWA121B1 0x331b /* DWA-121 rev > B1 */ > #define USB_PRODUCT_DLINK_DWA137A1 0x3317 /* DWA-137 rev > A1 */ > #define USB_PRODUCT_DLINK_DWA131E1 0x3319 /* DWA-131 rev > E1 */ > #define USB_PRODUCT_DLINK_DWA182D1 0x331c /* DWA-182 rev > D1 */ > Index: sys/dev/usb/usbdevs_data.h > === > RCS file: /cvs/src/sys/dev/usb/usbdevs_data.h,v > retrieving revision 1.723 > diff -u -p -r1.723 usbdevs_data.h > --- sys/dev/usb/usbdevs_data.h22 Jun 2020 15:52:39 - 1.723 > +++ sys/dev/usb/usbdevs_data.h6 Jul 2020 08:26:31 - > @@ -2618,6 +2618,10 @@ const struct usb_known_product usb_known > "DWA-123 rev D1", > }, > { > + USB_VENDOR_DLINK, USB_PRODUCT_DLINK_DWA121B1, > + "DWA-121 rev B1", > + }, > + { > USB_VENDOR_DLINK, USB_PRODUCT_DLINK_DWA137A1, > "DWA-137 rev A1", > }, > > > -- > Miguel Landaeta, nomadium at debian.org > secure email with PGP 0x6E608B637D8967E9 available at http://miguel.cc/key. > "Faith means not wanting to know what is true." -- Nietzsche > OpenBSD 6.7-current (CUSTOM) #0: Fri Jul 3 14:18:45 IST 2020 > mig...@lykke.nomadium.net:/sys/arch/arm64/compile/CUSTOM > real mem = 4092612608 (3903MB) > avail mem = 3891392512 (3711MB) > random: good seed from bootblocks > mainbus0 at root: Pine64 Pinebook Pro >
pipex(4) prevent `old_session_keys' memory leak
Before session freeing pipex_rele_session() will check and release `old_session_keys' if necessary. So use it instead of pool_put(9) within pipex_destroy_session(). Index: sys/net/pipex.c === RCS file: /cvs/src/sys/net/pipex.c,v retrieving revision 1.117 diff -u -p -r1.117 pipex.c --- sys/net/pipex.c 30 Jun 2020 14:05:13 - 1.117 +++ sys/net/pipex.c 6 Jul 2020 13:23:25 - @@ -652,7 +652,7 @@ pipex_destroy_session(struct pipex_sessi } pipex_unlink_session(session); - pool_put(_session_pool, session); + pipex_rele_session(session); return (0); }
Re: pipex(4): kill pipexintr()
On Mon, Jul 06, 2020 at 10:59:17AM +0200, Martin Pieuchot wrote: > On 01/07/20(Wed) 22:42, Vitaliy Makkoveev wrote: > > pipex(4) has 2 mbuf queues: `pipexinq' and `pipexoutq'. When mbuf passed > > to pipex it goes to one of these queues and pipexintr() will be > > scheduled to process them. pipexintr() called from `netisr' context. > > > > It's true for pppac(4) but for pppx(4) only incoming mbufs go to > > `pipexinq. Outgoing mbufs go directly to stack. pppx(4) enabled in > > npppd.conf(5) by default so I guess it's the common case of pipex(4) > > usage. > > > > The code looks like there is no requirements to this delayed mbufs > > processing, we can pass it directly to stack as we do for pppx(4) > > outgoing traffic. > > > > Also we have some troubles with pipexintr() as it was described in [1]. > > It's protection of `ph_cookie'. We don't this protection this time and > > we can't because we should brake if_get(9) logic. > > > > Diff below removes pipexintr(). Now all mbufs passed directly without > > enqueueing within pipex(4). We also can destroy sessions safe in all > > cases. We also can use if_get(9) instead using unreferenced pointers to > > `ifnet' within pipex(4). We also avoided context switch while we > > processing mbufs within pipex(4). We decreased latency. > > > > I'm seeding debian torrents with this diff an all goes well. > > With this diff the content of pipexintr() is no longer executed with the > KERNEL_LOCK() held. This can be seen by following the code starting in > ether_input(). > > Grabbing the KERNEL_LOCK() there is not a way forward. The whole idea > of if_input_process() is to be free of KERNEL_LOCK() to not introduce > latency delay. > > So this changes implies that `pipex_session_list' and possibly other > global data structures as well as the elements linked in those are all > protected by the NET_LOCK(). I believe this is the easiest way forward. > > That said I would be comfortable with this diff going in if an audit of > the data structures accessed in the code path starting by pipex_pppoe_input() > has been done. That implies annotating/documenting which data structures > are now protected by the NET_LOCK() and adding the necessary > NET_ASSERT_LOCK(). Such audit might lead to consider some ioctl code > path changes to now serialize on the NET_LOCK() instead of the > KERNEL_LOCK(). > pipex(4) is simultaneously locked by NET_LOCK() and KERNEL_LOCK() but with two exceptions: 1. As you pointed pipex_pppoe_input() called without KERNEL_LOCK() held. 2. pppac_start() called without NET_LOCK() held. Or with NET_LOCK() held. It depends on `if_snd' usage. Diff below enforces pppac_start() to be called with NET_LOCK() held. Also all externally called pipex(4) input and output routines have NET_ASSERT_LOCKED() assertion. Now pipex(4) is fully protected by NET_LOCK() so description of struct members chenget too. Index: sys/net/if_pppx.c === RCS file: /cvs/src/sys/net/if_pppx.c,v retrieving revision 1.90 diff -u -p -r1.90 if_pppx.c --- sys/net/if_pppx.c 24 Jun 2020 08:52:53 - 1.90 +++ sys/net/if_pppx.c 6 Jul 2020 11:10:17 - @@ -1117,6 +1117,8 @@ pppacopen(dev_t dev, int flags, int mode ifp->if_output = pppac_output; ifp->if_start = pppac_start; ifp->if_ioctl = pppac_ioctl; + /* XXXSMP: be sure pppac_start() called under NET_LOCK() */ + IFQ_SET_MAXLEN(>if_snd, 1); if_counters_alloc(ifp); if_attach(ifp); Index: sys/net/pipex.c === RCS file: /cvs/src/sys/net/pipex.c,v retrieving revision 1.117 diff -u -p -r1.117 pipex.c --- sys/net/pipex.c 30 Jun 2020 14:05:13 - 1.117 +++ sys/net/pipex.c 6 Jul 2020 11:10:17 - @@ -869,6 +869,7 @@ pipex_output(struct mbuf *m0, int af, in struct ip ip; struct mbuf *mret; + NET_ASSERT_LOCKED(); session = NULL; mret = NULL; switch (af) { @@ -962,6 +963,8 @@ pipex_ppp_output(struct mbuf *m0, struct { u_char *cp, hdr[16]; + NET_ASSERT_LOCKED(); + #ifdef PIPEX_MPPE if (pipex_session_is_mppe_enabled(session)) { if (proto == PPP_IP) { @@ -1355,6 +1358,7 @@ pipex_pppoe_input(struct mbuf *m0, struc int hlen; struct pipex_pppoe_header pppoe; + NET_ASSERT_LOCKED(); /* already checked at pipex_pppoe_lookup_session */ KASSERT(m0->m_pkthdr.len >= (sizeof(struct ether_header) + sizeof(pppoe))); @@ -1586,6 +1590,7 @@ pipex_pptp_input(struct mbuf *m0, struct struct pipex_pptp_session *pptp_session; int rewind = 0; + NET_ASSERT_LOCKED(); KASSERT(m0->m_pkthdr.len >= PIPEX_IPGRE_HDRLEN); pptp_session = >proto.pptp; @@ -2031,6 +2036,7 @@ pipex_l2tp_input(struct mbuf *m0, int of uint16_t flags, ns = 0, nr = 0; int rewind = 0; +
DNS options for sppp(4)
Hello, This is an old patch from Gerhard Roth, and mpf@ dating back to 2007. Please see: https://marc.info/?l=openbsd-tech=134943767022961=2 I contacted Gerhard who said instead of begging for this I should make it IPv6 capable. So I tried and nearly flooded my ISP off the net (sorry), it seems that there is no IPV6CP capabilities for this yet. I googled and found a few drafts but all said the protocol number is TBD. Also I looked on IANA.org https://www.iana.org/assignments/ppp-numbers/ppp-numbers.xhtml and found no official IPV6CP specification. I'm submitting this patch because I have a need for DNS servers as my ISP does not give these out via official support channels. So I'm relying on either using a public resolver (such as 8.8.8.8) or unwind. No offence to unwind but I'd rather use these ISP nameservers if I knew where to get them. I have also made a forwarding feature on my authoritative nameserver (with caching) and would write the code to automatically configure on startup to automatically determined pppoe(4) nameservers. So this is the self pitch. Patch is below with lots of #if 0, that I'd clean up if it passes the political hurdles. Also I have no -current on me currently so this is all against 6.7. I understand there is wireguard code in ifconfig and I'd have to bite the bullet and sysupgrade a machine here to make new new patch fit. I just need an OK from someone and I'll clean it up otherwise I won't do anymore work on this subject. Patch after my signature: Best Regards, -peter Index: ifconfig.c === RCS file: /cvs/src/sbin/ifconfig/ifconfig.c,v retrieving revision 1.421 diff -u -p -u -r1.421 ifconfig.c --- ifconfig.c 27 Feb 2020 08:28:35 - 1.421 +++ ifconfig.c 6 Jul 2020 17:09:49 - @@ -690,6 +690,7 @@ u_int getwpacipher(const char *); void print_cipherset(u_int32_t); void spppauthinfo(struct sauthreq *, int); +void spppdnsinfo(struct sdnsreq *); /* Known address families */ const struct afswtch { @@ -5401,6 +5402,17 @@ spppauthinfo(struct sauthreq *spa, int d } void +spppdnsinfo(struct sdnsreq *spd) +{ + bzero(spd, sizeof(struct sdnsreq)); + + ifr.ifr_data = (caddr_t)spd; + spd->cmd = SPPPIOGDNS; + if (ioctl(sock, SIOCGSARAMS, ) == -1) + err(1, "SIOCGSARAMS(SPPPIOGDNS)"); +} + +void setsroto(const char *val, int d) { struct sauthreq spa; @@ -5534,6 +5546,11 @@ sppp_status(void) { struct spppreq spr; struct sauthreq spa; + struct sdnsreq spd; + struct sockaddr_storage *ss = NULL; + + char buf[INET6_ADDRSTRLEN]; + int i; bzero(, sizeof(spr)); @@ -5573,6 +5590,39 @@ sppp_status(void) if (spa.flags & AUTHFLAG_NORECHALLENGE) printf("norechallenge "); putchar('\n'); + + spppdnsinfo(); + for (i = 0; i < 2; i++) { + ss = (struct sockaddr_storage *)_dns[i]; + if (ss->ss_family == AF_UNSPEC) + continue; + + switch (i) { + case 0: + inet_ntop(AF_INET, &((struct sockaddr_in *)ss)->sin_addr.s_addr, + buf, sizeof(buf)); + printf("\tipcp pri_dns: %s\n", buf); + break; + case 1: + inet_ntop(AF_INET, &((struct sockaddr_in *)ss)->sin_addr.s_addr, + buf, sizeof(buf)); + printf("\tipcp sec_dns: %s\n", buf); + break; +#if notyet + case 2: + inet_ntop(AF_INET6, &((struct sockaddr_in6 *)ss)->sin6_addr, + buf, sizeof(buf)); + printf("\tpri_dns6: %s\n", buf); + break; + case 3: + inet_ntop(AF_INET6, &((struct sockaddr_in6 *)ss)->sin6_addr, + buf, sizeof(buf)); + printf("\tsec_dns6: %s\n", buf); + break; +#endif + } + } + } void Index: if_sppp.h === RCS file: /cvs/src/sys/net/if_sppp.h,v retrieving revision 1.27 diff -u -p -u -r1.27 if_sppp.h --- if_sppp.h 24 Jun 2019 21:36:53 - 1.27 +++ if_sppp.h 6 Jul 2020 17:12:27 - @@ -82,12 +82,18 @@ struct spppreq { enum ppp_phase phase; /* phase we're currently in */ }; +struct sdnsreq { + int cmd; + struct sockaddr_storage ss_dns[4]; +}; + #define SPPPIOGDEFS ((int)(('S' << 24) + (1 << 16) + sizeof(struct spppreq))) #define SPPPIOSDEFS ((int)(('S' << 24) + (2 << 16) + sizeof(struct spppreq))) #define SPPPIOGMAUTH ((int)(('S' << 24) + (3 << 16) + sizeof(struct sauthreq))) #define SPPPIOSMAUTH ((int)(('S' << 24) + (4 << 16) + sizeof(struct sauthreq)))
Re: pipex(4): kill pipexintr()
On 06/07/20(Mon) 19:23, Vitaliy Makkoveev wrote: > > On 6 Jul 2020, at 17:36, Martin Pieuchot wrote: > [...] > Unfortunately you can’t be sure about NET_LOCK() status while you are > in pppac_start(). It was described at this thread [1]. > > We have two cases: > 1. pppac_start() called from pppac_output(). NET_LOCK() was inherited. Such recursions should be avoided. if_enqueue() should take care of that.
Re: fix races in if_clone_create()
> On 6 Jul 2020, at 12:17, Martin Pieuchot wrote: > > On 01/07/20(Wed) 00:02, Vitaliy Makkoveev wrote: >> On Tue, Jun 30, 2020 at 03:48:22PM +0300, Vitaliy Makkoveev wrote: >>> On Tue, Jun 30, 2020 at 12:08:03PM +0200, Martin Pieuchot wrote: On 29/06/20(Mon) 11:59, Vitaliy Makkoveev wrote: > [...] > I reworked tool for reproduce. Now I avoided fork()/exec() route and it > takes couple of minutes to take panic on 4 cores. Also some screenshots > attached. Setting kern.pool_debug=2 makes the race reproducible in seconds. >> >> Unfortunately you will catch splassert() caused by kern/sched_bsd.c:304. >> malloc() will call yield() while we are holding NET_LOCK(). I attached >> screenshot with splassertion to this mail. > > With kern.splassert < 3 it is fine. > Could you turn this test into something committable in regress/? We can link it to the build once a fix is committed. >>> >>> We have 3 races with cloned interfaces: >>> 1. if_clone_create() vs if_clone_create() >>> 2. if_clone_destroy() vs if_clone_destroy() >>> 3. if_clone_destroy() vs the rest of stack >>> >>> It makes sences to commit unified test to regress/, so I suggest to wait >>> a little. >> >> The another solution. >> >> Diff below introduces per-`ifc' serialization for if_clone_create() and >> if_clone_destroy(). There is no index bitmap anymore. > > I like the simplification. More comments below: > >> +/* >> + * Lock a clone network interface. >> + */ >> +int >> +if_clone_lock(struct if_clone *ifc) >> +{ >> +int error; >> + >> +rw_enter_write(>ifc_lock); >> + >> +while (ifc->ifc_flags & IFC_CREATE_LOCKED) { >> +ifc->ifc_flags |= IFC_CREATE_LOCKWAIT; >> +error = rwsleep_nsec(>ifc_flags, >ifc_lock, >> +PWAIT|PCATCH, "ifclk", INFSLP); >> +if(error != 0) { >> +ifc->ifc_flags &= ~IFC_CREATE_LOCKWAIT; >> +rw_exit_write(>ifc_lock); >> +return error; >> +} >> +} >> +ifc->ifc_flags |= IFC_CREATE_LOCKED; >> +ifc->ifc_flags &= ~IFC_CREATE_LOCKWAIT; >> + >> +rw_exit_write(>ifc_lock); >> + >> +return 0; >> +} > > This is like re-implementing a rwlock but loosing the debugging ability of > WITNESS. The reason to do this is to avoid call `ifc_create’ with rwlock held. We have unique sleep points for each underlaying routine for `ifc_create’, so this "rwlock reimplementation" looks better. Also this lock is used in one place only and impact of loosing debugging ability is not such big. > > I also don't see any reason for having a per-ifc lock. If, at least one > of the problems, is a double insert in `ifnet' then we should be able to > assert that a lock is held when doing such assertion. This race breaks ifunit() not `if_list’. I mean LIST_*() operation are not broken because `le_{prev,next}’ are valid, but list is inconsistent of course. Since only "ifconfig clonerA0 create& ifconfig clonerA0 create” will break, I see no reason to deny simultaneous execution of “ifconfig clonerA0 create& ifconfig clonerB0 create”. Let this lock be per `ifc’ not global. > > Assertions and documentation are more important than preventing races > because they allow to build awareness and elegant solutions instead of > hacking diffs until stuff work without knowing why. > > There are two cases where `ifp' are inserted into `ifnet': > 1. by autoconf during boot or hotplug > 2. by cloning ioctl > > In the second case it is always about pseudo-devices. So the assertion > should be conditional like: > > if (ISSET(ifp->if_xflags, IFXF_CLONED)) > rw_assert_wrlock(_lock); > > In other words this fixes serializes insertions/removal on the global > list `ifnet', the KERNEL_LOCK() being still required for reading it. > > Is there any other data structure which ends up being protected by this > approach and could be documented? We should be sure there is no multiple `ifnet’s in `if_list’ with the same `if_xname’. And the assertion you proposed looks not obvious here. Assertion like below looks more reasonable but introduces performance impact. cut begin void if_attach(struct ifnet *ifp) { if_attach_common(ifp); NET_LOCK(); KASSERT(ifunit(ifp->if_xname) == NULL); TAILQ_INSERT_TAIL(, ifp, if_list); if_attachsetup(ifp); NET_UNLOCK(); } cut end I guess the commentary within if_clone_create() is the best solution. Something like this: “Deny simultaneous execution to prevent multiple creation of interfaces with the same name”.
Re: pipex(4): kill pipexintr()
On 06/07/20(Mon) 16:42, Vitaliy Makkoveev wrote: > [...] > pipex(4) is simultaneously locked by NET_LOCK() and KERNEL_LOCK() but > with two exceptions: > > 1. As you pointed pipex_pppoe_input() called without KERNEL_LOCK() held. > 2. pppac_start() called without NET_LOCK() held. Or with NET_LOCK() >held. It depends on `if_snd' usage. > > Diff below enforces pppac_start() to be called with NET_LOCK() held. > Also all externally called pipex(4) input and output routines have > NET_ASSERT_LOCKED() assertion. > > Now pipex(4) is fully protected by NET_LOCK() so description of struct > members chenget too. > > Index: sys/net/if_pppx.c > === > RCS file: /cvs/src/sys/net/if_pppx.c,v > retrieving revision 1.90 > diff -u -p -r1.90 if_pppx.c > --- sys/net/if_pppx.c 24 Jun 2020 08:52:53 - 1.90 > +++ sys/net/if_pppx.c 6 Jul 2020 11:10:17 - > @@ -1117,6 +1117,8 @@ pppacopen(dev_t dev, int flags, int mode > ifp->if_output = pppac_output; > ifp->if_start = pppac_start; > ifp->if_ioctl = pppac_ioctl; > + /* XXXSMP: be sure pppac_start() called under NET_LOCK() */ > + IFQ_SET_MAXLEN(>if_snd, 1); Is it possible to grab the NET_LOCK() inside pppac_start() instead of grabbing it outside? This should allow *start() routine to be called from any context. It might be interesting to see that as a difference between the NET_LOCK() used to protect the network stack internals and the NET_LOCK() used to protect pipex(4) internals. Such distinction might help to convert the latter into a different lock or primitive. > Index: sys/net/pipex.c > === > RCS file: /cvs/src/sys/net/pipex.c,v > retrieving revision 1.117 > diff -u -p -r1.117 pipex.c > --- sys/net/pipex.c 30 Jun 2020 14:05:13 - 1.117 > +++ sys/net/pipex.c 6 Jul 2020 11:10:17 - > @@ -869,6 +869,7 @@ pipex_output(struct mbuf *m0, int af, in > struct ip ip; > struct mbuf *mret; > > + NET_ASSERT_LOCKED(); This function doesn't touch any shared data structure, we'd better move the NET_ASSERT_LOCKED() above rn_lookuo() in pipex_lookup_by_ip_address(). Note that `pipex_rd_head4' and `pipex_rd_head6' are, with this diff, also protected by the NET_LOCK() and should be annotated as such. > session = NULL; > mret = NULL; > switch (af) { > @@ -962,6 +963,8 @@ pipex_ppp_output(struct mbuf *m0, struct > { > u_char *cp, hdr[16]; > > + NET_ASSERT_LOCKED(); Same here, it seems that the only reason the NET_LOCK() is necessary in the output path is to prevent corruption of the `session' descriptor being used. So we'd rather put the assertion above the LIST_FOREACH(). Anyway all of those can be addressed later, your diff is ok mpi@
Re: adjfreq(2): limit adjustment to prevent overflow during tc_windup()
On Mon, Jul 06, 2020 at 11:57:32PM +0200, Mark Kettenis wrote: > > Date: Mon, 6 Jul 2020 16:40:35 -0500 > > From: Scott Cheloha > > > > On Fri, Jul 03, 2020 at 10:52:15AM +0200, Otto Moerbeek wrote: > > > On Thu, Jul 02, 2020 at 08:27:58PM -0500, Scott Cheloha wrote: > > > > > > > Hi, > > > > > > > > When we recompute the scaling factor during tc_windup() there is an > > > > opportunity for arithmetic overflow/underflow when we add the NTP > > > > adjustment into the scale: > > > > > > > >649 scale = (u_int64_t)1 << 63; > > > >650 scale += \ > > > >651 ((th->th_adjustment + th->th_counter->tc_freq_adj) > > > > / 1024) * 2199; > > > >652 scale /= th->th_counter->tc_frequency; > > > >653 th->th_scale = scale * 2; > > > > > > > > At lines 650 and 651, you will overflow/underflow if > > > > th->th_counter->tc_freq_adj is sufficiently positive/negative. > > > > > > > > I don't like the idea of checking for that overflow during > > > > tc_windup(). We can pick a reasonable adjustment range and check for > > > > it during adjfreq(2) and that should be good enough. > > > > > > > > My strawman proposal is a range of -5 to 5 parts per > > > > billion. We could push the limits a bit, but half a billion seems > > > > like a nice round number to me. > > > > > > > > On a perfect clock, this means you can effect a 0.5x slowdown or a > > > > 1.5x speedup via adjfreq(2), but no slower/faster. > > > > > > > > I don't *think* ntpd(8) would ever reach such extreme adjustments > > > > through its algorithm. I don't think this will break anyone's working > > > > setup. > > > > > > > > (Maybe I'm wrong, though. otto@?) > > > > > > Right, ntpd is pretty conversative and won't do big adjustments. > > > > So, what is the right way to describe these limits? > > > > "Parts per billion"? Something else? > > The traditional way to express clock drift in the context of NTP is > "parts per million" or "ppm" for short. There are good reasons to > avoid using "billion" in documentation as a very similar word is used > in Germanic languages for 10^12 where in English you mean 10^9. Huh. So from what I've gathered: German English ISO Million Million 10^6 Milliarde Billion 10^9 Billion Trillion10^12 Billiarde Quadrillion 10^15 TrillionQuintillion 10^18 Potentially confusing. The "German Connection" to NTP in particular eludes me, though. > Stripping three zeroes also makes it easier to read. [...] Yes, it always does. It looks nicer as "N ppm" in the mandoc(1) output, too. -- otto@, from what you said I take it you're OK with the new limits so I'm going commit it as follows in a day or two. Index: lib/libc/sys/adjfreq.2 === RCS file: /cvs/src/lib/libc/sys/adjfreq.2,v retrieving revision 1.7 diff -u -p -r1.7 adjfreq.2 --- lib/libc/sys/adjfreq.2 10 Sep 2015 17:55:21 - 1.7 +++ lib/libc/sys/adjfreq.2 6 Jul 2020 23:00:24 - @@ -60,6 +60,9 @@ The .Fa freq argument is non-null and the process's effective user ID is not that of the superuser. +.It Bq Er EINVAL +.Fa freq +is less than -50 ppm or greater than 50 ppm. .El .Sh SEE ALSO .Xr date 1 , Index: sys/kern/kern_time.c === RCS file: /cvs/src/sys/kern/kern_time.c,v retrieving revision 1.131 diff -u -p -r1.131 kern_time.c --- sys/kern/kern_time.c22 Jun 2020 18:25:57 - 1.131 +++ sys/kern/kern_time.c6 Jul 2020 23:00:24 - @@ -391,6 +391,9 @@ sys_settimeofday(struct proc *p, void *v return (0); } +#define ADJFREQ_MAX (5LL << 32) +#define ADJFREQ_MIN (-5LL << 32) + int sys_adjfreq(struct proc *p, void *v, register_t *retval) { @@ -408,6 +411,8 @@ sys_adjfreq(struct proc *p, void *v, reg return (error); if ((error = copyin(freq, , sizeof(f return (error); + if (f < ADJFREQ_MIN || f > ADJFREQ_MAX) + return (EINVAL); } rw_enter(_lock, (freq == NULL) ? RW_READ : RW_WRITE);
Re: userland clock_gettime proof of concept
Ah, it was seen. But everyone has too much memory these days. Vitaliy Makkoveev wrote: > Sorry for late reaction. At least VirtualBox based virtual machines > started to panic with the recent kernel. I reverted your diff and panics > stopped. > Screenshot attached.
Re: pipex(4): kill pipexintr()
> On 6 Jul 2020, at 17:36, Martin Pieuchot wrote: > > On 06/07/20(Mon) 16:42, Vitaliy Makkoveev wrote: >> [...] >> pipex(4) is simultaneously locked by NET_LOCK() and KERNEL_LOCK() but >> with two exceptions: >> >> 1. As you pointed pipex_pppoe_input() called without KERNEL_LOCK() held. >> 2. pppac_start() called without NET_LOCK() held. Or with NET_LOCK() >> held. It depends on `if_snd' usage. >> >> Diff below enforces pppac_start() to be called with NET_LOCK() held. >> Also all externally called pipex(4) input and output routines have >> NET_ASSERT_LOCKED() assertion. >> >> Now pipex(4) is fully protected by NET_LOCK() so description of struct >> members chenget too. >> >> Index: sys/net/if_pppx.c >> === >> RCS file: /cvs/src/sys/net/if_pppx.c,v >> retrieving revision 1.90 >> diff -u -p -r1.90 if_pppx.c >> --- sys/net/if_pppx.c24 Jun 2020 08:52:53 - 1.90 >> +++ sys/net/if_pppx.c6 Jul 2020 11:10:17 - >> @@ -1117,6 +1117,8 @@ pppacopen(dev_t dev, int flags, int mode >> ifp->if_output = pppac_output; >> ifp->if_start = pppac_start; >> ifp->if_ioctl = pppac_ioctl; >> +/* XXXSMP: be sure pppac_start() called under NET_LOCK() */ >> +IFQ_SET_MAXLEN(>if_snd, 1); > > Is it possible to grab the NET_LOCK() inside pppac_start() instead of > grabbing it outside? This should allow *start() routine to be called > from any context. Unfortunately you can’t be sure about NET_LOCK() status while you are in pppac_start(). It was described at this thread [1]. We have two cases: 1. pppac_start() called from pppac_output(). NET_LOCK() was inherited. 2. pppac_start() called from `systq’. There is no NET_LOCK() held. NET_LOCK() is not required for pipex_{,ppp}_output() because underlay routines were switched to ip{,6}_send(). I intentionally want to made all pipex(4) locked by one lock and be sure it’s locking is consistent. I hope to start implementing fine graining locks just after pipex(4) be switched to if_get(9). > > It might be interesting to see that as a difference between the NET_LOCK() > used to protect the network stack internals and the NET_LOCK() used to > protect pipex(4) internals. Such distinction might help to convert the > latter into a different lock or primitive. > >> Index: sys/net/pipex.c >> === >> RCS file: /cvs/src/sys/net/pipex.c,v >> retrieving revision 1.117 >> diff -u -p -r1.117 pipex.c >> --- sys/net/pipex.c 30 Jun 2020 14:05:13 - 1.117 >> +++ sys/net/pipex.c 6 Jul 2020 11:10:17 - >> @@ -869,6 +869,7 @@ pipex_output(struct mbuf *m0, int af, in >> struct ip ip; >> struct mbuf *mret; >> >> +NET_ASSERT_LOCKED(); > > This function doesn't touch any shared data structure, we'd better move > the NET_ASSERT_LOCKED() above rn_lookuo() in pipex_lookup_by_ip_address(). > > Note that `pipex_rd_head4' and `pipex_rd_head6' are, with this diff, > also protected by the NET_LOCK() and should be annotated as such. > >> session = NULL; >> mret = NULL; >> switch (af) { >> @@ -962,6 +963,8 @@ pipex_ppp_output(struct mbuf *m0, struct >> { >> u_char *cp, hdr[16]; >> >> +NET_ASSERT_LOCKED(); > > Same here, it seems that the only reason the NET_LOCK() is necessary in > the output path is to prevent corruption of the `session' descriptor being > used. So we'd rather put the assertion above the LIST_FOREACH(). They touch `session->stat’ :) We can switch pipex(4) to percpu counters and output will be lockless. But I guess this should be done later. > > Anyway all of those can be addressed later, your diff is ok mpi@ > Thanks. I will commit it if no one has objections. 1. https://marc.info/?t=15899861152=1=2
ksh(1) set -o pipefail
Requested by ajacoutot@, here's an attempt at implementing set -o pipefail. As pointed by sthen@ this option should be included in the next POSIX standard: https://www.austingroupbugs.net/view.php?id=789 There are several ways to implement it, the diff I showed to Antoine was based on option 2) in https://www.austingroupbugs.net/view.php?id=789#c4102 whereas the diff below implements option 1) described by kre@NetBSD, which looks like the most sensible approach. No, this doesn't add support for a PIPESTATUS array, I'm not sure we need it. This diff includes manpage and regress bits. Thoughts, oks? Index: bin/ksh/jobs.c === RCS file: /d/cvs/src/bin/ksh/jobs.c,v retrieving revision 1.61 diff -u -p -r1.61 jobs.c --- bin/ksh/jobs.c 28 Jun 2019 13:34:59 - 1.61 +++ bin/ksh/jobs.c 6 Jul 2020 18:10:43 - @@ -70,6 +70,7 @@ struct proc { #define JF_REMOVE 0x200 /* flagged for removal (j_jobs()/j_notify()) */ #define JF_USETTYMODE 0x400 /* tty mode saved if process exits normally */ #define JF_SAVEDTTYPGRP0x800 /* j->saved_ttypgrp is valid */ +#define JF_PIPEFAIL0x1000 /* pipefail on when job was started */ typedef struct job Job; struct job { @@ -421,6 +422,8 @@ exchild(struct op *t, int flags, volatil */ j->flags = (flags & XXCOM) ? JF_XXCOM : ((flags & XBGND) ? 0 : (JF_FG|JF_USETTYMODE)); + if (Flag(FPIPEFAIL)) + j->flags |= JF_PIPEFAIL; timerclear(>usrtime); timerclear(>systime); j->state = PRUNNING; @@ -1084,7 +1087,30 @@ j_waitj(Job *j, j_usrtime = j->usrtime; j_systime = j->systime; - rv = j->status; + + if (j->flags & JF_PIPEFAIL) { + Proc *p; + int status; + + rv = 0; + for (p = j->proc_list; p != NULL; p = p->next) { + switch (p->state) { + case PEXITED: + status = WEXITSTATUS(p->status); + break; + case PSIGNALLED: + status = 128 + WTERMSIG(p->status); + break; + default: + status = 0; + break; + } + if (status) + rv = status; + } + } else + rv = j->status; + if (!(flags & JW_ASYNCNOTIFY) && (!Flag(FMONITOR) || j->state != PSTOPPED)) { Index: bin/ksh/ksh.1 === RCS file: /d/cvs/src/bin/ksh/ksh.1,v retrieving revision 1.208 diff -u -p -r1.208 ksh.1 --- bin/ksh/ksh.1 26 Nov 2019 22:49:01 - 1.208 +++ bin/ksh/ksh.1 6 Jul 2020 16:04:34 - @@ -361,7 +361,9 @@ token to form pipelines, in which the st last is piped (see .Xr pipe 2 ) to the standard input of the following command. -The exit status of a pipeline is that of its last command. +The exit status of a pipeline is that of its last command, unless the +.Ic pipefail +option is set. A pipeline may be prefixed by the .Ql \&! reserved word, which causes the exit status of the pipeline to be logically @@ -3664,6 +3666,10 @@ See the and .Ic pwd commands above for more details. +.It Ic pipefail +The exit status of a pipeline is the exit status of the rightmost +command in the pipeline that doesn't return 0, or 0 if all commands +returned a 0 exit status. .It Ic posix Enable POSIX mode. See Index: bin/ksh/misc.c === RCS file: /d/cvs/src/bin/ksh/misc.c,v retrieving revision 1.73 diff -u -p -r1.73 misc.c --- bin/ksh/misc.c 28 Jun 2019 13:34:59 - 1.73 +++ bin/ksh/misc.c 6 Jul 2020 16:04:34 - @@ -147,6 +147,7 @@ const struct option sh_options[] = { { "notify", 'b',OF_ANY }, { "nounset",'u',OF_ANY }, { "physical", 0,OF_ANY }, /* non-standard */ + { "pipefail", 0,OF_ANY }, /* non-standard */ { "posix",0,OF_ANY }, /* non-standard */ { "privileged", 'p',OF_ANY }, { "restricted", 'r',OF_CMDLINE }, Index: bin/ksh/sh.h === RCS file: /d/cvs/src/bin/ksh/sh.h,v retrieving revision 1.75 diff -u -p -r1.75 sh.h --- bin/ksh/sh.h20 Feb 2019 23:59:17 - 1.75 +++ bin/ksh/sh.h6 Jul 2020 16:13:59 - @@ -158,6 +158,7 @@ enum sh_flag { FNOTIFY,/* -b: asynchronous job completion notification */ FNOUNSET, /* -u: using an unset var is an error */ FPHYSICAL, /* -o physical: don't do
Re: amd64: lapic: refactor lapic timer programming
On Fri, Jul 03, 2020 at 07:41:45PM -0500, Scott Cheloha wrote: > Hi, > > I want to run the lapic timer in one-shot mode on amd64 as we do with > other interrupt clocks on other platforms. I aim to make the clock > interrupt code MD where possible. > > However, nobody is going to test my MD clock interrupt work unless > amd64 is ready to use it. amd64 doesn't run in oneshot mode so there > is preliminary work to do first. > > -- > > Before we can run the lapic timer in one-shot mode we need to simplify > the process of actually programming it. > > This patch refactors all lapic timer programming into a single > routine. We don't use any divisor other than 1 so I don't see a need > to make it a parameter to lapic_timer_arm(). We can add TSC deadline > support later if someone wants it. > > The way we program the timer differs from how e.g. Darwin and FreeBSD > and Linux do it. They write: > > - lvtt (mode + vector + (maybe) mask) > - dcr > - icr > > while we do: > > - lvtt (mode + mask) > - dcr > - icr > - (maybe) lvtt (mode + vector) > > I don't see a reason to arm the timer with four writes instead of > three, so in this patch I use the three-write ordering. > > Am I missing something? Do I need to disable interrupts before I > reprogram the timer? > This reads ok to me. I am not aware of any requirements to disable interrupts while reprogramming the timer. -ml > -Scott > > Index: lapic.c > === > RCS file: /cvs/src/sys/arch/amd64/amd64/lapic.c,v > retrieving revision 1.55 > diff -u -p -r1.55 lapic.c > --- lapic.c 3 Aug 2019 14:57:51 - 1.55 > +++ lapic.c 4 Jul 2020 00:40:26 - > @@ -413,6 +413,42 @@ u_int32_t lapic_frac_usec_per_cycle; > u_int64_t lapic_frac_cycle_per_usec; > u_int32_t lapic_delaytab[26]; > > +void lapic_timer_arm(uint32_t, int, uint32_t); > +void lapic_timer_arm_once(int, uint32_t); > +void lapic_timer_arm_period(int, uint32_t); > + > +/* > + * Start the local apic countdown timer. > + * > + * First set the mode, vector, and (maybe) the mask. > + * then set the divisor, > + * and finally set the cycle count. > + */ > +void > +lapic_timer_arm(uint32_t mode, int masked, uint32_t cycles) > +{ > + uint32_t lvtt; > + > + lvtt = mode | LAPIC_TIMER_VECTOR; > + lvtt |= (masked) ? LAPIC_LVTT_M : 0; > + > + lapic_writereg(LAPIC_LVTT, lvtt); > + lapic_writereg(LAPIC_DCR_TIMER, LAPIC_DCRT_DIV1); > + lapic_writereg(LAPIC_ICR_TIMER, cycles); > +} > + > +void > +lapic_timer_arm_once(int masked, uint32_t cycles) > +{ > + lapic_timer_arm(LAPIC_LVTT_TM_ONESHOT, masked, cycles); > +} > + > +void > +lapic_timer_arm_period(int masked, uint32_t cycles) > +{ > + lapic_timer_arm(LAPIC_LVTT_TM_PERIODIC, masked, cycles); > +} > + > void > lapic_clockintr(void *arg, struct intrframe frame) > { > @@ -430,17 +466,7 @@ lapic_clockintr(void *arg, struct intrfr > void > lapic_startclock(void) > { > - /* > - * Start local apic countdown timer running, in repeated mode. > - * > - * Mask the clock interrupt and set mode, > - * then set divisor, > - * then unmask and set the vector. > - */ > - lapic_writereg(LAPIC_LVTT, LAPIC_LVTT_TM|LAPIC_LVTT_M); > - lapic_writereg(LAPIC_DCR_TIMER, LAPIC_DCRT_DIV1); > - lapic_writereg(LAPIC_ICR_TIMER, lapic_tval); > - lapic_writereg(LAPIC_LVTT, LAPIC_LVTT_TM|LAPIC_TIMER_VECTOR); > + lapic_timer_arm_period(0, lapic_tval); > } > > void > @@ -498,9 +524,7 @@ lapic_calibrate_timer(struct cpu_info *c >* Configure timer to one-shot, interrupt masked, >* large positive number. >*/ > - lapic_writereg(LAPIC_LVTT, LAPIC_LVTT_M); > - lapic_writereg(LAPIC_DCR_TIMER, LAPIC_DCRT_DIV1); > - lapic_writereg(LAPIC_ICR_TIMER, 0x8000); > + lapic_timer_arm_once(1, 0x8000); > > s = intr_disable(); > > @@ -540,10 +564,7 @@ skip_calibration: > lapic_tval = (lapic_per_second * 2) / hz; > lapic_tval = (lapic_tval / 2) + (lapic_tval & 0x1); > > - lapic_writereg(LAPIC_LVTT, LAPIC_LVTT_TM | LAPIC_LVTT_M | > - LAPIC_TIMER_VECTOR); > - lapic_writereg(LAPIC_DCR_TIMER, LAPIC_DCRT_DIV1); > - lapic_writereg(LAPIC_ICR_TIMER, lapic_tval); > + lapic_timer_arm_period(0, lapic_tval); > > /* >* Compute fixed-point ratios between cycles and >
Re: pipex(4): kill pipexintr()
On Mon, Jul 06, 2020 at 08:47:23PM +0200, Martin Pieuchot wrote: > On 06/07/20(Mon) 19:23, Vitaliy Makkoveev wrote: > > > On 6 Jul 2020, at 17:36, Martin Pieuchot wrote: > > [...] > > Unfortunately you can’t be sure about NET_LOCK() status while you are > > in pppac_start(). It was described at this thread [1]. > > > > We have two cases: > > 1. pppac_start() called from pppac_output(). NET_LOCK() was inherited. > > Such recursions should be avoided. if_enqueue() should take care of > that. I suggest to finish the route to if_get(9) before. Updated diff which removes pipexintr() below. Just against the most resent source tree. Index: lib/libc/sys/sysctl.2 === RCS file: /cvs/src/lib/libc/sys/sysctl.2,v retrieving revision 1.40 diff -u -p -r1.40 sysctl.2 --- lib/libc/sys/sysctl.2 17 May 2020 05:48:39 - 1.40 +++ lib/libc/sys/sysctl.2 6 Jul 2020 21:55:16 - @@ -2033,35 +2033,11 @@ The currently defined variable names are .Bl -column "Third level name" "integer" "Changeable" -offset indent .It Sy "Third level name" Ta Sy "Type" Ta Sy "Changeable" .It Dv PIPEXCTL_ENABLE Ta integer Ta yes -.It Dv PIPEXCTL_INQ Ta node Ta not applicable -.It Dv PIPEXCTL_OUTQ Ta node Ta not applicable .El .Bl -tag -width "123456" .It Dv PIPEXCTL_ENABLE If set to 1, enable PIPEX processing. The default is 0. -.It Dv PIPEXCTL_INQ Pq Va net.pipex.inq -Fourth level comprises an array of -.Vt struct ifqueue -structures containing information about the PIPEX packet input queue. -The forth level names for the elements of -.Vt struct ifqueue -are the same as described in -.Li ip.arpq -in the -.Dv PF_INET -section. -.It Dv PIPEXCTL_OUTQ Pq Va net.pipex.outq -Fourth level comprises an array of -.Vt struct ifqueue -structures containing information about PIPEX packet output queue. -The forth level names for the elements of -.Vt struct ifqueue -are the same as described in -.Li ip.arpq -in the -.Dv PF_INET -section. .El .El .Ss CTL_VFS Index: sys/net/if.c === RCS file: /cvs/src/sys/net/if.c,v retrieving revision 1.611 diff -u -p -r1.611 if.c --- sys/net/if.c30 Jun 2020 09:31:38 - 1.611 +++ sys/net/if.c6 Jul 2020 21:55:17 - @@ -1012,13 +1012,6 @@ if_netisr(void *unused) KERNEL_UNLOCK(); } #endif -#ifdef PIPEX - if (n & (1 << NETISR_PIPEX)) { - KERNEL_LOCK(); - pipexintr(); - KERNEL_UNLOCK(); - } -#endif t |= n; } Index: sys/net/netisr.h === RCS file: /cvs/src/sys/net/netisr.h,v retrieving revision 1.51 diff -u -p -r1.51 netisr.h --- sys/net/netisr.h6 Aug 2019 22:57:54 - 1.51 +++ sys/net/netisr.h6 Jul 2020 21:55:17 - @@ -48,7 +48,6 @@ #defineNETISR_IPV6 24 /* same as AF_INET6 */ #defineNETISR_ISDN 26 /* same as AF_E164 */ #defineNETISR_PPP 28 /* for PPP processing */ -#defineNETISR_PIPEX27 /* for pipex processing */ #defineNETISR_BRIDGE 29 /* for bridge processing */ #defineNETISR_PPPOE30 /* for pppoe processing */ #defineNETISR_SWITCH 31 /* for switch dataplane */ @@ -68,7 +67,6 @@ void bridgeintr(void); void pppoeintr(void); void switchintr(void); void pfsyncintr(void); -void pipexintr(void); #defineschednetisr(anisr) \ do { \ Index: sys/net/pipex.c === RCS file: /cvs/src/sys/net/pipex.c,v retrieving revision 1.119 diff -u -p -r1.119 pipex.c --- sys/net/pipex.c 6 Jul 2020 20:37:51 - 1.119 +++ sys/net/pipex.c 6 Jul 2020 21:55:17 - @@ -97,10 +97,6 @@ struct radix_node_head *pipex_rd_head6 = struct timeout pipex_timer_ch; /* callout timer context */ int pipex_prune = 1; /* walk list every seconds */ -/* pipex traffic queue */ -struct mbuf_queue pipexinq = MBUF_QUEUE_INITIALIZER(IFQ_MAXLEN, IPL_NET); -struct mbuf_queue pipexoutq = MBUF_QUEUE_INITIALIZER(IFQ_MAXLEN, IPL_NET); - /* borrow an mbuf pkthdr field */ #define ph_ppp_proto ether_vtag @@ -713,82 +709,6 @@ pipex_lookup_by_session_id(int protocol, } /*** - * Queue and Software Interrupt Handler - ***/ -void -pipexintr(void) -{ - struct pipex_session *pkt_session; - u_int16_t proto; - struct mbuf *m; - struct mbuf_list ml; - - /* ppp output */ -
Re: [patch] dhclient(8) crashes with vm.malloc_conf=J
> On 7 Jul 2020, at 7:57 am, Jesper Wallin wrote: > > Hi all, > > I received a segmentation fault from dhclient(8) upon boot and decided > to investigate... My system is running with vm.malloc_conf=CFGJUR and > figured one of those options was the cause of the crash. I noticed that > the buffer which holds my config options contained a lot of junk at the > end and learned that 'J' is to blame together with a missing \0. > > > How to reproduce: > # sysctl vm.malloc_conf=J > # cp /etc/dhclient.conf /etc/dhclient.conf.backup > # echo 'supersede domain-name "ifconfig.se";' > /etc/dhclient.conf > > Then run 'dhclient if0' a lot of times until it crashes, sometimes it > takes more than 100 attempts. Using vm.malloc_conf=CFGJUR might trigger > it faster. > > > In clparse.c:916, malloc(3) is used to get a buffer of the same length > as the option in the config file. But with 'J' in vm.malloc_conf, the > buffer is bigger and contains junk. I wouldn't say that my fix is the > prettiest, but I get an extra byte and zero out the buffer. Maybe > someone has a more elegant fix for this. > > > Yours, > Jesper Wallin you might want to put the memset after the check to see if the malloc failed... > Index: clparse.c > === > RCS file: /cvs/src/sbin/dhclient/clparse.c,v > retrieving revision 1.199 > diff -u -p -r1.199 clparse.c > --- clparse.c 13 May 2020 20:55:41 - 1.199 > +++ clparse.c 6 Jul 2020 21:25:54 - > @@ -913,7 +913,8 @@ parse_option(FILE *cfile, int *code, str > } while (*fmt == 'A' && token == ','); > > free(options[i].data); > - options[i].data = malloc(hunkix); > + options[i].data = malloc(hunkix+1); > + memset(options[i].data, 0, hunkix+1); > if (options[i].data == NULL) > fatal("option data"); > memcpy(options[i].data, hunkbuf, hunkix); >
Re: userland clock_gettime proof of concept
> On 5 Jul 2020, at 20:31, Paul Irofti wrote: > > On Fri, Jul 03, 2020 at 06:36:39PM +0300, Paul Irofti wrote: >> >> >> În 3 iulie 2020 17:55:25 EEST, Mark Kettenis a >> scris: Date: Fri, 3 Jul 2020 15:13:22 +0200 From: Robert Nagy On 02/07/20 00:31 +0100, Stuart Henderson wrote: > running on 38 of these, btw. been running with this on all my workstations and laptops and on 3 >>> build servers as well >>> >>> Are the issue that naddy@ saw solved? >>> >>> Did anybody do a *proper* test on anything besides amd64? Especially >>> on architectures where the optimized clock_gettime is *not* available? >> >> Yes and yes. > > So, can we go ahead with this? > Sorry for late reaction. At least VirtualBox based virtual machines started to panic with the recent kernel. I reverted your diff and panics stopped. Screenshot attached.
Re: userland clock_gettime proof of concept
Sorry for late reaction. At least VirtualBox based virtual machines started to panic with the recent kernel. I reverted your diff and panics stopped. Screenshot attached.
Re: ksh(1) set -o pipefail
On Mon, 06 Jul 2020 22:22:36 +0200, Jeremie Courreges-Anglas wrote: > Requested by ajacoutot@, here's an attempt at implementing set -o > pipefail. As pointed by sthen@ this option should be included in the > next POSIX standard: > > https://www.austingroupbugs.net/view.php?id=789 > > There are several ways to implement it, the diff I showed to Antoine was > based on option 2) in > > https://www.austingroupbugs.net/view.php?id=789#c4102 > > whereas the diff below implements option 1) described by kre@NetBSD, > which looks like the most sensible approach. Looks good to me. OK millert@ - todd
Re: adjfreq(2): limit adjustment to prevent overflow during tc_windup()
On Fri, Jul 03, 2020 at 10:52:15AM +0200, Otto Moerbeek wrote: > On Thu, Jul 02, 2020 at 08:27:58PM -0500, Scott Cheloha wrote: > > > Hi, > > > > When we recompute the scaling factor during tc_windup() there is an > > opportunity for arithmetic overflow/underflow when we add the NTP > > adjustment into the scale: > > > >649 scale = (u_int64_t)1 << 63; > >650 scale += \ > >651 ((th->th_adjustment + th->th_counter->tc_freq_adj) / > > 1024) * 2199; > >652 scale /= th->th_counter->tc_frequency; > >653 th->th_scale = scale * 2; > > > > At lines 650 and 651, you will overflow/underflow if > > th->th_counter->tc_freq_adj is sufficiently positive/negative. > > > > I don't like the idea of checking for that overflow during > > tc_windup(). We can pick a reasonable adjustment range and check for > > it during adjfreq(2) and that should be good enough. > > > > My strawman proposal is a range of -5 to 5 parts per > > billion. We could push the limits a bit, but half a billion seems > > like a nice round number to me. > > > > On a perfect clock, this means you can effect a 0.5x slowdown or a > > 1.5x speedup via adjfreq(2), but no slower/faster. > > > > I don't *think* ntpd(8) would ever reach such extreme adjustments > > through its algorithm. I don't think this will break anyone's working > > setup. > > > > (Maybe I'm wrong, though. otto@?) > > Right, ntpd is pretty conversative and won't do big adjustments. So, what is the right way to describe these limits? "Parts per billion"? Something else? Index: lib/libc/sys/adjfreq.2 === RCS file: /cvs/src/lib/libc/sys/adjfreq.2,v retrieving revision 1.7 diff -u -p -r1.7 adjfreq.2 --- lib/libc/sys/adjfreq.2 10 Sep 2015 17:55:21 - 1.7 +++ lib/libc/sys/adjfreq.2 6 Jul 2020 21:39:37 - @@ -60,6 +60,10 @@ The .Fa freq argument is non-null and the process's effective user ID is not that of the superuser. +.It Bq Er EINVAL +.Fa freq +is less than -5 parts-per-billion or greater than 5 +parts-per-billion. .El .Sh SEE ALSO .Xr date 1 , Index: sys/kern/kern_time.c === RCS file: /cvs/src/sys/kern/kern_time.c,v retrieving revision 1.131 diff -u -p -r1.131 kern_time.c --- sys/kern/kern_time.c22 Jun 2020 18:25:57 - 1.131 +++ sys/kern/kern_time.c6 Jul 2020 21:39:38 - @@ -391,6 +391,9 @@ sys_settimeofday(struct proc *p, void *v return (0); } +#define ADJFREQ_MAX (5LL << 32) +#define ADJFREQ_MIN (-5LL << 32) + int sys_adjfreq(struct proc *p, void *v, register_t *retval) { @@ -408,6 +411,8 @@ sys_adjfreq(struct proc *p, void *v, reg return (error); if ((error = copyin(freq, , sizeof(f return (error); + if (f < ADJFREQ_MIN || f > ADJFREQ_MAX) + return (EINVAL); } rw_enter(_lock, (freq == NULL) ? RW_READ : RW_WRITE);
Re: adjfreq(2): limit adjustment to prevent overflow during tc_windup()
> Date: Mon, 6 Jul 2020 16:40:35 -0500 > From: Scott Cheloha > > On Fri, Jul 03, 2020 at 10:52:15AM +0200, Otto Moerbeek wrote: > > On Thu, Jul 02, 2020 at 08:27:58PM -0500, Scott Cheloha wrote: > > > > > Hi, > > > > > > When we recompute the scaling factor during tc_windup() there is an > > > opportunity for arithmetic overflow/underflow when we add the NTP > > > adjustment into the scale: > > > > > >649 scale = (u_int64_t)1 << 63; > > >650 scale += \ > > >651 ((th->th_adjustment + th->th_counter->tc_freq_adj) / > > > 1024) * 2199; > > >652 scale /= th->th_counter->tc_frequency; > > >653 th->th_scale = scale * 2; > > > > > > At lines 650 and 651, you will overflow/underflow if > > > th->th_counter->tc_freq_adj is sufficiently positive/negative. > > > > > > I don't like the idea of checking for that overflow during > > > tc_windup(). We can pick a reasonable adjustment range and check for > > > it during adjfreq(2) and that should be good enough. > > > > > > My strawman proposal is a range of -5 to 5 parts per > > > billion. We could push the limits a bit, but half a billion seems > > > like a nice round number to me. > > > > > > On a perfect clock, this means you can effect a 0.5x slowdown or a > > > 1.5x speedup via adjfreq(2), but no slower/faster. > > > > > > I don't *think* ntpd(8) would ever reach such extreme adjustments > > > through its algorithm. I don't think this will break anyone's working > > > setup. > > > > > > (Maybe I'm wrong, though. otto@?) > > > > Right, ntpd is pretty conversative and won't do big adjustments. > > So, what is the right way to describe these limits? > > "Parts per billion"? Something else? The traditional way to express clock drift in the context of NTP is "parts per million" or "ppm" for short. There are good reasons to avoid using "billion" in documentation as a very similar word is used in Germanic languages for 10^12 where in English you mean 10^9. Stripping three zeroes also makes it easier to read. I don't think the fact that this is expressed in the code as "parts per billion" is an issue. Cheers, Mark > Index: lib/libc/sys/adjfreq.2 > === > RCS file: /cvs/src/lib/libc/sys/adjfreq.2,v > retrieving revision 1.7 > diff -u -p -r1.7 adjfreq.2 > --- lib/libc/sys/adjfreq.210 Sep 2015 17:55:21 - 1.7 > +++ lib/libc/sys/adjfreq.26 Jul 2020 21:39:37 - > @@ -60,6 +60,10 @@ The > .Fa freq > argument is non-null and the process's effective user ID is not that > of the superuser. > +.It Bq Er EINVAL > +.Fa freq > +is less than -5 parts-per-billion or greater than 5 > +parts-per-billion. > .El > .Sh SEE ALSO > .Xr date 1 , > Index: sys/kern/kern_time.c > === > RCS file: /cvs/src/sys/kern/kern_time.c,v > retrieving revision 1.131 > diff -u -p -r1.131 kern_time.c > --- sys/kern/kern_time.c 22 Jun 2020 18:25:57 - 1.131 > +++ sys/kern/kern_time.c 6 Jul 2020 21:39:38 - > @@ -391,6 +391,9 @@ sys_settimeofday(struct proc *p, void *v > return (0); > } > > +#define ADJFREQ_MAX (5LL << 32) > +#define ADJFREQ_MIN (-5LL << 32) > + > int > sys_adjfreq(struct proc *p, void *v, register_t *retval) > { > @@ -408,6 +411,8 @@ sys_adjfreq(struct proc *p, void *v, reg > return (error); > if ((error = copyin(freq, , sizeof(f > return (error); > + if (f < ADJFREQ_MIN || f > ADJFREQ_MAX) > + return (EINVAL); > } > > rw_enter(_lock, (freq == NULL) ? RW_READ : RW_WRITE); > >
[patch] dhclient(8) crashes with vm.malloc_conf=J
Hi all, I received a segmentation fault from dhclient(8) upon boot and decided to investigate... My system is running with vm.malloc_conf=CFGJUR and figured one of those options was the cause of the crash. I noticed that the buffer which holds my config options contained a lot of junk at the end and learned that 'J' is to blame together with a missing \0. How to reproduce: # sysctl vm.malloc_conf=J # cp /etc/dhclient.conf /etc/dhclient.conf.backup # echo 'supersede domain-name "ifconfig.se";' > /etc/dhclient.conf Then run 'dhclient if0' a lot of times until it crashes, sometimes it takes more than 100 attempts. Using vm.malloc_conf=CFGJUR might trigger it faster. In clparse.c:916, malloc(3) is used to get a buffer of the same length as the option in the config file. But with 'J' in vm.malloc_conf, the buffer is bigger and contains junk. I wouldn't say that my fix is the prettiest, but I get an extra byte and zero out the buffer. Maybe someone has a more elegant fix for this. Yours, Jesper Wallin Index: clparse.c === RCS file: /cvs/src/sbin/dhclient/clparse.c,v retrieving revision 1.199 diff -u -p -r1.199 clparse.c --- clparse.c 13 May 2020 20:55:41 - 1.199 +++ clparse.c 6 Jul 2020 21:25:54 - @@ -913,7 +913,8 @@ parse_option(FILE *cfile, int *code, str } while (*fmt == 'A' && token == ','); free(options[i].data); - options[i].data = malloc(hunkix); + options[i].data = malloc(hunkix+1); + memset(options[i].data, 0, hunkix+1); if (options[i].data == NULL) fatal("option data"); memcpy(options[i].data, hunkbuf, hunkix);
Use CPU_IS_PRIMARY macro in identifycpu() on amd64
Hi tech@, While having a glance at identcpu.c to see how CPU flags were printed for each CPU, I noticed we have the following macro in cpu.h: #define CPU_IS_PRIMARY(ci) ((ci)->ci_flags & CPUF_PRIMARY) Here is a diff to use it in identifycpu() on amd64. Comments? OK? Index: sys/arch/amd64/amd64/identcpu.c === RCS file: /cvs/src/sys/arch/amd64/amd64/identcpu.c,v retrieving revision 1.115 diff -u -p -r1.115 identcpu.c --- sys/arch/amd64/amd64/identcpu.c 27 May 2020 05:08:53 - 1.115 +++ sys/arch/amd64/amd64/identcpu.c 6 Jul 2020 21:04:13 - @@ -480,7 +480,7 @@ identifycpu(struct cpu_info *ci) ci->ci_efeature_ecx, ci->ci_feature_eflags); /* Other bits may clash */ ci->ci_feature_flags |= (ci->ci_feature_eflags & CPUID_NXE); - if (ci->ci_flags & CPUF_PRIMARY) + if (CPU_IS_PRIMARY(ci)) ecpu_ecxfeature = ci->ci_efeature_ecx; /* Let cpu_feature be the common bits */ cpu_feature &= ci->ci_feature_flags; @@ -515,7 +515,7 @@ identifycpu(struct cpu_info *ci) sizeof(mycpu_model)); /* If primary cpu, fill in the global cpu_model used by sysctl */ - if (ci->ci_flags & CPUF_PRIMARY) + if (CPU_IS_PRIMARY(ci)) strlcpy(cpu_model, mycpu_model, sizeof(cpu_model)); ci->ci_family = (ci->ci_signature >> 8) & 0x0f; @@ -561,7 +561,7 @@ identifycpu(struct cpu_info *ci) printf(", %llu.%02llu MHz", (freq + 4999) / 100, ((freq + 4999) / 1) % 100); - if (ci->ci_flags & CPUF_PRIMARY) { + if (CPU_IS_PRIMARY(ci)) { cpuspeed = (freq + 4999) / 100; cpu_cpuspeed = cpu_amd64speed; } @@ -689,7 +689,7 @@ identifycpu(struct cpu_info *ci) ci->ci_dev->dv_xname); } - if (ci->ci_flags & CPUF_PRIMARY) { + if (CPU_IS_PRIMARY(ci)) { #ifndef SMALL_KERNEL if (!strcmp(cpu_vendor, "AuthenticAMD") && ci->ci_pnfeatset >= 0x8007) { @@ -737,7 +737,7 @@ identifycpu(struct cpu_info *ci) #endif #ifdef CRYPTO - if (ci->ci_flags & CPUF_PRIMARY) { + if (CPU_IS_PRIMARY(ci)) { if (cpu_ecxfeature & CPUIDECX_PCLMUL) amd64_has_pclmul = 1;
Re: userland clock_gettime proof of concept
> From: Vitaliy Makkoveev > Date: Tue, 7 Jul 2020 01:34:33 +0300 > > > On 5 Jul 2020, at 20:31, Paul Irofti wrote: > > > > On Fri, Jul 03, 2020 at 06:36:39PM +0300, Paul Irofti wrote: > >> > >> > >> În 3 iulie 2020 17:55:25 EEST, Mark Kettenis a > >> scris: > Date: Fri, 3 Jul 2020 15:13:22 +0200 > From: Robert Nagy > > On 02/07/20 00:31 +0100, Stuart Henderson wrote: > > running on 38 of these, btw. > > been running with this on all my workstations and laptops and on 3 > >>> build > servers as well > >>> > >>> Are the issue that naddy@ saw solved? > >>> > >>> Did anybody do a *proper* test on anything besides amd64? Especially > >>> on architectures where the optimized clock_gettime is *not* available? > >> > >> Yes and yes. > > > > So, can we go ahead with this? > > > > Sorry for late reaction. At least VirtualBox based virtual machines > started to panic with the recent kernel. I reverted your diff and > panics stopped. Should already be fixed.
ws(4): Wheel emulation touch scroll on touchscreens
My first diff... This patch has the ws driver update mouse cursor absolute position before scrolling begins on a touchscreen when wheel emulation is enabled. This change should not impact other uses of wheel emulation. Is this ok? Index: driver/xf86-input-ws/src/ws.c === RCS file: /cvs/xenocara/driver/xf86-input-ws/src/ws.c,v retrieving revision 1.64 diff -u -p -u -p -r1.64 driver/xf86-input-ws/src/ws.c --- driver/xf86-input-ws/src/ws.c 24 Mar 2019 17:59:19 - 1.64 +++ driver/xf86-input-ws/src/ws.c 6 Jul 2020 21:57:03 - @@ -679,7 +679,8 @@ wsReadInput(InputInfoPtr pInfo) int ydelta = hw.ay - priv->old_ay; priv->old_ax = hw.ax; priv->old_ay = hw.ay; - if (wsWheelEmuFilterMotion(pInfo, xdelta, ydelta)) + if (wsWheelEmuFilterMotion(pInfo, xdelta, ydelta) + && priv->type != WSMOUSE_TYPE_TPANEL) return; /* absolute position event */