Re: CVS commit: src/sys/net
Thanks for kind words! Apparently, we need some test to detect kernel and userland mismatch for bpf(4) header... I'd also like to fix problems by which ATF does not complete on ERL3. Some CPU/memory consuming tests, e.g., lib/libc/regex/t_exhaust, trigger watchdog. I guess something wrong in interrupt handler for mips (IIRC, there are similar kind of behaviors for hpcmips machines). Disabling __HAVE_FAST_SOFTINT by hand does not work around the problem... For n32 userland, most rump-based tests in /usr/tests/net crash. But for n64 userland, on the other hand, they seem to work just fine. This may be due to bugs for n32 kernel. IIUC, most kernel codes are compiled for o32 or n64, and only the exception is n32 rump kernel for mips64e[bl] userland. Thanks, rin On 2023/08/26 9:10, David H. Gutteridge wrote: Module Name: src Committed By: rin Date: Wed Aug 23 13:21:17 UTC 2023 Modified Files: src/sys/net: bpf.h Log Message: bpf: Fix SIZEOF_BPF_HDR (for LP64 userland) on mips64 It cannot fit within 18 bytes, of course ;) As we had never provided working bpf(4) implementation for LP64 userland on mips, just use natural structure size here. Thanks for fixing this. This means dhcpcd now works on Octeon hardware using the n64 userland. Previously, it would fail with "ps_bpf_recvbpf: Resource temporarily unavailable". (Perhaps you'd noticed this?) This was reported a while ago[1] by an end user on port-mips@, so we should presumably pull this up. Where you say "we had never provided working bpf(4) implementation", I had been looking into the dhcpcd issue myself, but hadn't thought to look here. That is, I had run the tests under net/bpf and net/t_ip_reass (the latter modified to open /dev/bpf directly, not via rump) on my ERL3, and these all passed, giving me the impression we did indeed have a working bpf(4) implementation on n64. Since we didn't, it seems we need more test coverage here. Thanks, Dave 1. http://mail-index.netbsd.org/port-mips/2021/07/04/msg001112.html
Re: CVS commit: src/sys/net
Module Name:src Committed By: rin Date: Wed Aug 23 13:21:17 UTC 2023 Modified Files: src/sys/net: bpf.h Log Message: bpf: Fix SIZEOF_BPF_HDR (for LP64 userland) on mips64 It cannot fit within 18 bytes, of course ;) As we had never provided working bpf(4) implementation for LP64 userland on mips, just use natural structure size here. Thanks for fixing this. This means dhcpcd now works on Octeon hardware using the n64 userland. Previously, it would fail with "ps_bpf_recvbpf: Resource temporarily unavailable". (Perhaps you'd noticed this?) This was reported a while ago[1] by an end user on port-mips@, so we should presumably pull this up. Where you say "we had never provided working bpf(4) implementation", I had been looking into the dhcpcd issue myself, but hadn't thought to look here. That is, I had run the tests under net/bpf and net/t_ip_reass (the latter modified to open /dev/bpf directly, not via rump) on my ERL3, and these all passed, giving me the impression we did indeed have a working bpf(4) implementation on n64. Since we didn't, it seems we need more test coverage here. Thanks, Dave 1. http://mail-index.netbsd.org/port-mips/2021/07/04/msg001112.html
re: CVS commit: src/sys/net
"Jonathan A. Kollasch" writes: > Module Name: src > Committed By: jakllsch > Date: Thu Jan 5 02:38:51 UTC 2023 > > Modified Files: > src/sys/net: if_wg.c > > Log Message: > Check for authorization for SIOCSDRVSPEC and SIOCGDRVSPEC ioctls for wg(4). > > Addresses PR 57161. might be nice to push this down for SIOCGDRVSPEC. it sure seems right for *set* operation, but perhaps for *get*, it can just elide the sensitive portion in the output ioctl (either make it empty or make it not present at all?) it doesn't seem too hard, just moving the check into wg_ioctl_get() for the problematic parts... the idea being to match "ifconfig" on eg, wifi, only showing the configured passwrds to root. thanks. .mrg.
Re: CVS commit: src/sys/net
On 2022/10/25 14:55, Masanobu SAITOH wrote: > > > On 2022/10/25 14:51, matthew green wrote: >> "SAITOH Masanobu" writes: >>> Module Name:src >>> Committed By: msaitoh >>> Date: Mon Oct 24 07:45:44 UTC 2022 >>> >>> Modified Files: >>> src/sys/net: if_dl.h >>> >>> Log Message: >>> Increase sdl_data so that more then IFNAMSIZ bytes are available. >>> >>> - Increase the size of dl_data[] from 12 to 24. >>> - Same as OpenBSD. >> >> isn't this a binary compat issue? eg, 'struct sockaddr_dl' changes, >> and that, and things based upon it, are in user interfaces. i had >> a look and i believe it's a problem, but maybe i missed something. >> >> thanks. >> >> >> .mrg. > > struct dl_addr is at the end of struct sockaddr_dl. > dl_data is at the end of struct dl_addr. > So I think it's has no problem for old binaries. I've roughly tested with old arp, ndp, ifconfig, ifmcstat and all /usr/tests on a new kernel and I didn't saw any problem. I think getifaddrs(3) has no problem. The routing message has no problem because struct rtm_msglen has rtm_msglen and we can get the next message using with it. I can't see any kernel data structure which has struct sockaddr_dl foobadr[xxx] array. One problem might be exist. An old userland program allocate buffer with sizeof(struct sockaddr_dl), the size is smaller than the newer. The old program might copy with memcpy(old_sized_buf, newdata, dla->sdl_len). Should I backout the change until the COMPAT code is written? -- --- SAITOH Masanobu (msai...@execsw.org msai...@netbsd.org)
Re: CVS commit: src/sys/net
On 2022/10/25 14:51, matthew green wrote: > "SAITOH Masanobu" writes: >> Module Name: src >> Committed By:msaitoh >> Date:Mon Oct 24 07:45:44 UTC 2022 >> >> Modified Files: >> src/sys/net: if_dl.h >> >> Log Message: >> Increase sdl_data so that more then IFNAMSIZ bytes are available. >> >> - Increase the size of dl_data[] from 12 to 24. >> - Same as OpenBSD. > > isn't this a binary compat issue? eg, 'struct sockaddr_dl' changes, > and that, and things based upon it, are in user interfaces. i had > a look and i believe it's a problem, but maybe i missed something. > > thanks. > > > .mrg. struct dl_addr is at the end of struct sockaddr_dl. dl_data is at the end of struct dl_addr. So I think it's has no problem for old binaries. -- --- SAITOH Masanobu (msai...@execsw.org msai...@netbsd.org)
re: CVS commit: src/sys/net
"SAITOH Masanobu" writes: > Module Name: src > Committed By: msaitoh > Date: Mon Oct 24 07:45:44 UTC 2022 > > Modified Files: > src/sys/net: if_dl.h > > Log Message: > Increase sdl_data so that more then IFNAMSIZ bytes are available. > > - Increase the size of dl_data[] from 12 to 24. > - Same as OpenBSD. isn't this a binary compat issue? eg, 'struct sockaddr_dl' changes, and that, and things based upon it, are in user interfaces. i had a look and i believe it's a problem, but maybe i missed something. thanks. .mrg.
Re: CVS commit: src/sys/net
On Mon, 15 Nov 2021 07:07:06 + Shoichi YAMAGUCHI wrote: > Date: Mon Nov 15 07:07:06 UTC 2021 > > Modified Files: > src/sys/net: if_ether.h if_ethersubr.c if_vlan.c > src/sys/net/lagg: if_lagg.c > > Log Message: > introduced APIs to configure VLAN TAG to ethernet devices Hello, This change seems to have broke something MTU related when more than two VLANs are configured on the same NIC. The first VLAN inherits the parent's MTU but the subsequent ones have 4 byte too small MTU causing network problems. Kernel from 2021-11-15: # ifconfig |grep mtu wm0: flags=0x8843 mtu 1500 vlan100: flags=0x8843 mtu 1500 vlan101: flags=0x8843 mtu 1500 vlan102: flags=0x8843 mtu 1500 vlan103: flags=0x8843 mtu 1500 vlan104: flags=0x8843 mtu 1500 vlan105: flags=0x8843 mtu 1500 vlan106: flags=0x8843 mtu 1500 Kernel from 2021-11-16: wm0: flags=0x8843 mtu 1500 vlan100: flags=0x8843 mtu 1500 vlan101: flags=0x8843 mtu 1496 vlan102: flags=0x8843 mtu 1496 vlan103: flags=0x8843 mtu 1496 vlan104: flags=0x8843 mtu 1496 vlan105: flags=0x8843 mtu 1496 vlan106: flags=0x8843 mtu 1496 I could not force increase the MTU either: root@tinfoilhat-b:log> ifconfig vlan100 mtu 1500 root@tinfoilhat-b:log> ifconfig vlan101 mtu 1500 ifconfig: SIOCSIFMTU: Invalid argument Kind regards, -Tobias
Re: CVS commit: src/sys/net
In article <20210506061816.94bb1f...@cvs.netbsd.org>, Shoichi YAMAGUCHI wrote: >-=-=-=-=-=- > >Module Name: src >Committed By: yamaguchi >Date: Thu May 6 06:18:16 UTC 2021 > >Modified Files: > src/sys/net: if_spppsubr.c > >Log Message: >do not clear destination address if there is no saved address >and add initialization of saved_hisaddr for safety You don't need to byte-swap in order to compare with 0... :-) christos
re: CVS commit: src/sys/net
> (I haven't investigated exactly what's going on leading to ~5 KB-byte > stack frames, but this shuts gcc up, at least, and the hypotheses > sound plausible to me!) this matches what i saw when looking at it. i was going to test the exact same idea (noinline). .mrg.
Re: CVS commit: src/sys/net
On 13/02/2021 14:19, Jonathan A. Kollasch wrote: On Sat, Feb 13, 2021 at 07:28:05AM +, Roy Marples wrote: Module Name:src Committed By: roy Date: Sat Feb 13 07:28:05 UTC 2021 Modified Files: src/sys/net: if_ether.h if_ethersubr.c Log Message: if_ether: Ensure that ether_header is aligned To generate a diff of this commit: cvs rdiff -u -r1.84 -r1.85 src/sys/net/if_ether.h cvs rdiff -u -r1.289 -r1.290 src/sys/net/if_ethersubr.c This appears to ensure the Ethernet header is naturally aligned on a 32-bit boundary. The 16-bit ether_type field is the only thing in ether_header that is wider than a uint8_t. Many drivers will place the start of the receive buffer at an ETHER_ALIGN (which is 2) offset to ensure the layer three header is 32-bit aligned after the 14-byte Ethernet header. Thus this will result in always calling m_copyup() in ether_input() on strict alignment platforms. Reverted
Re: CVS commit: src/sys/net
On Sat, Feb 13, 2021 at 07:28:05AM +, Roy Marples wrote: > Module Name: src > Committed By: roy > Date: Sat Feb 13 07:28:05 UTC 2021 > > Modified Files: > src/sys/net: if_ether.h if_ethersubr.c > > Log Message: > if_ether: Ensure that ether_header is aligned > > > To generate a diff of this commit: > cvs rdiff -u -r1.84 -r1.85 src/sys/net/if_ether.h > cvs rdiff -u -r1.289 -r1.290 src/sys/net/if_ethersubr.c This appears to ensure the Ethernet header is naturally aligned on a 32-bit boundary. The 16-bit ether_type field is the only thing in ether_header that is wider than a uint8_t. Many drivers will place the start of the receive buffer at an ETHER_ALIGN (which is 2) offset to ensure the layer three header is 32-bit aligned after the 14-byte Ethernet header. Thus this will result in always calling m_copyup() in ether_input() on strict alignment platforms.
Re: CVS commit: src/sys/net
Hi, I have one question about the following commit. Why stoeplitz_hash_ip4() and stoeplitz_hash_ip4port() argument types are different between toeplitz.c and toeplitz.h? They have in_addr_t and in_port_t argument types in toeplitz.c, howerver uint32_t and uint16_t in toeplitz.h. On 2021/01/31 6:23, Jared D. McNeill wrote: Module Name:src Committed By: jmcneill Date: Sat Jan 30 21:23:08 UTC 2021 Modified Files: src/sys/net: files.net Added Files: src/sys/net: toeplitz.c toeplitz.h Log Message: Add symmetric toeplitz implementation with integration for NICs, from OpenBSD. To generate a diff of this commit: cvs rdiff -u -r1.29 -r1.30 src/sys/net/files.net cvs rdiff -u -r0 -r1.1 src/sys/net/toeplitz.c src/sys/net/toeplitz.h Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files. Thanks, -- // Internet Initiative Japan Inc. Device Engineering Section, Product Development Department, Product Division, Technology Unit Kengo NAKAHARA
Re: CVS commit: src/sys/net
On 29.01.2020 07:40, Jason Thorpe wrote: > >> On Jan 28, 2020, at 10:25 PM, Kamil Rytarowski wrote: >> >> The distribution build breaks for me with: > > Should be fixed with: > > /cvsroot/src/sys/rump/net/lib/libnet/Makefile,v <-- Makefile > new revision: 1.33; previous revision: 1.32 > >> /public/netbsd-root/tooldir.NetBSD-9.99.38-amd64/lib/gcc/x86_64--netbsd/8.3.0/../../../../x86_64--netbsd/bin/ld: >> /public/netbsd-root/destdir.amd64/usr/lib/librumpnet_net.so: undefined >> reference to `rumpns_if_stats_init' >> /public/netbsd-root/tooldir.NetBSD-9.99.38-amd64/lib/gcc/x86_64--netbsd/8.3.0/../../../../x86_64--netbsd/bin/ld: >> /public/netbsd-root/destdir.amd64/usr/lib/librumpnet_net.so: undefined >> reference to `rumpns_if_stats_fini' >> /public/netbsd-root/tooldir.NetBSD-9.99.38-amd64/lib/gcc/x86_64--netbsd/8.3.0/../../../../x86_64--netbsd/bin/ld: >> /public/netbsd-root/destdir.amd64/usr/lib/librumpnet_net.so: undefined >> reference to `rumpns_if_stats_to_if_data' >> > > -- thorpej > It works now! signature.asc Description: OpenPGP digital signature
Re: CVS commit: src/sys/net
> On Jan 28, 2020, at 10:25 PM, Kamil Rytarowski wrote: > > The distribution build breaks for me with: Should be fixed with: /cvsroot/src/sys/rump/net/lib/libnet/Makefile,v <-- Makefile new revision: 1.33; previous revision: 1.32 > /public/netbsd-root/tooldir.NetBSD-9.99.38-amd64/lib/gcc/x86_64--netbsd/8.3.0/../../../../x86_64--netbsd/bin/ld: > /public/netbsd-root/destdir.amd64/usr/lib/librumpnet_net.so: undefined > reference to `rumpns_if_stats_init' > /public/netbsd-root/tooldir.NetBSD-9.99.38-amd64/lib/gcc/x86_64--netbsd/8.3.0/../../../../x86_64--netbsd/bin/ld: > /public/netbsd-root/destdir.amd64/usr/lib/librumpnet_net.so: undefined > reference to `rumpns_if_stats_fini' > /public/netbsd-root/tooldir.NetBSD-9.99.38-amd64/lib/gcc/x86_64--netbsd/8.3.0/../../../../x86_64--netbsd/bin/ld: > /public/netbsd-root/destdir.amd64/usr/lib/librumpnet_net.so: undefined > reference to `rumpns_if_stats_to_if_data' > -- thorpej
Re: CVS commit: src/sys/net
On 29.01.2020 04:16, Jason R Thorpe wrote: > Module Name: src > Committed By: thorpej > Date: Wed Jan 29 03:16:28 UTC 2020 > > Modified Files: > src/sys/net: Makefile files.net if.c if.h > Added Files: > src/sys/net: if_stats.c if_stats.h > > Log Message: > Add support for MP-safe network interface statistics by maintaining them > in per-cpu storage, and collecting them for export in an if_data structure > when user-space wants them. > > The new if_stat API is structured to make a gradual transition to the > new way in network drivers possible, and per-cpu stats are currently > disabled (thus there is no kernel ABI change). Once all drivers have > been converted, the old ABI will be removed, and per-cpu stats will be > enabled universally. > The distribution build breaks for me with: /public/netbsd-root/tooldir.NetBSD-9.99.38-amd64/lib/gcc/x86_64--netbsd/8.3.0/../../../../x86_64--netbsd/bin/ld: /public/netbsd-root/destdir.amd64/usr/lib/librumpnet_net.so: undefined reference to `rumpns_if_stats_init' /public/netbsd-root/tooldir.NetBSD-9.99.38-amd64/lib/gcc/x86_64--netbsd/8.3.0/../../../../x86_64--netbsd/bin/ld: /public/netbsd-root/destdir.amd64/usr/lib/librumpnet_net.so: undefined reference to `rumpns_if_stats_fini' /public/netbsd-root/tooldir.NetBSD-9.99.38-amd64/lib/gcc/x86_64--netbsd/8.3.0/../../../../x86_64--netbsd/bin/ld: /public/netbsd-root/destdir.amd64/usr/lib/librumpnet_net.so: undefined reference to `rumpns_if_stats_to_if_data' signature.asc Description: OpenPGP digital signature
Re: CVS commit: src/sys/net
On 2019/12/05 14:29, SAITOH Masanobu wrote: > Module Name: src > Committed By: msaitoh > Date: Thu Dec 5 05:29:28 UTC 2019 > > Modified Files: > src/sys/net: if_media.h > > Log Message: > Fix previous comment change for ifm_media. It was correct. > > The real problem is that some driver misuse ifm_media as the current active > media. struct mii_data has the current active media(mii_media_active). If a > driver use mii(4), it can be use mii->mii_media_active for this purpose. > struct ifmedia has no entry for this purpose. Some drivers have an entry s/Some drivers/Some drivers not using mii(4)/ > in their own softc to keep the value, but some other's don't have it and > they mistakenly use ifm_media. > > We might add a new entry to struct ifmedia in future to avoid this confusion > and for simplify. > > > To generate a diff of this commit: > cvs rdiff -u -r1.67 -r1.68 src/sys/net/if_media.h > > Please note that diffs are not public domain; they are subject to the > copyright notices on the relevant files. > -- --- SAITOH Masanobu (msai...@execsw.org msai...@netbsd.org)
Re: CVS commit: src/sys/net
On Mon, Sep 23, 2019 at 4:14 PM Kamil Rytarowski wrote: > > On 23.09.2019 06:53, Rin Okuyama wrote: > > Hi, > > > > On 2019/09/22 18:30, Kamil Rytarowski wrote: > >> On 12.04.2018 06:38, Ryota Ozaki wrote: > >>> Module Name:src > >>> Committed By:ozaki-r > >>> Date:Thu Apr 12 04:38:13 UTC 2018 > >>> > >>> Modified Files: > >>> src/sys/net: if.h route.c route.h rtsock.c > >>> > >>> Log Message: > >>> Resolve tangled lock dependencies in route.c > >>> > >>> This change sweeps remaining lock decisions based on if locked or not > >>> by moving > >>> utility functions of rtentry updates from rtsock.c and ensuring > >>> holding the > >>> rt_lock. It also improves the atomicity of a update of a rtentry. > >>> > >> > >>> +static struct ifaddr * > >>> +rt_update_get_ifa(const struct rt_addrinfo info, const struct > >>> rtentry *rt, > >>> +struct ifnet **ifp, struct psref *psref_ifp, struct psref *psref) > >>> +{ > >> > >> > >> Do we need to pass info as a value? It is pretty large here (1024 bytes). > > > > Yeah, we were just discussing on this alert of LGTM bot. > > > > We can use const pointer here. I will commit the fix soon. > > > > Thanks, > > rin Thank you for the commit! > > Thanks for addressing it! I wonder whether there is performance impact > here (is this hot-path code?). The function is not used in any packet processing (hot paths) and used only for route updates that are uncommon for most users. ozaki-r
Re: CVS commit: src/sys/net
On 23.09.2019 06:53, Rin Okuyama wrote: > Hi, > > On 2019/09/22 18:30, Kamil Rytarowski wrote: >> On 12.04.2018 06:38, Ryota Ozaki wrote: >>> Module Name: src >>> Committed By: ozaki-r >>> Date: Thu Apr 12 04:38:13 UTC 2018 >>> >>> Modified Files: >>> src/sys/net: if.h route.c route.h rtsock.c >>> >>> Log Message: >>> Resolve tangled lock dependencies in route.c >>> >>> This change sweeps remaining lock decisions based on if locked or not >>> by moving >>> utility functions of rtentry updates from rtsock.c and ensuring >>> holding the >>> rt_lock. It also improves the atomicity of a update of a rtentry. >>> >> >>> +static struct ifaddr * >>> +rt_update_get_ifa(const struct rt_addrinfo info, const struct >>> rtentry *rt, >>> + struct ifnet **ifp, struct psref *psref_ifp, struct psref *psref) >>> +{ >> >> >> Do we need to pass info as a value? It is pretty large here (1024 bytes). > > Yeah, we were just discussing on this alert of LGTM bot. > > We can use const pointer here. I will commit the fix soon. > > Thanks, > rin Thanks for addressing it! I wonder whether there is performance impact here (is this hot-path code?). signature.asc Description: OpenPGP digital signature
Re: CVS commit: src/sys/net
Hi, On 2019/09/22 18:30, Kamil Rytarowski wrote: On 12.04.2018 06:38, Ryota Ozaki wrote: Module Name:src Committed By: ozaki-r Date: Thu Apr 12 04:38:13 UTC 2018 Modified Files: src/sys/net: if.h route.c route.h rtsock.c Log Message: Resolve tangled lock dependencies in route.c This change sweeps remaining lock decisions based on if locked or not by moving utility functions of rtentry updates from rtsock.c and ensuring holding the rt_lock. It also improves the atomicity of a update of a rtentry. +static struct ifaddr * +rt_update_get_ifa(const struct rt_addrinfo info, const struct rtentry *rt, +struct ifnet **ifp, struct psref *psref_ifp, struct psref *psref) +{ Do we need to pass info as a value? It is pretty large here (1024 bytes). Yeah, we were just discussing on this alert of LGTM bot. We can use const pointer here. I will commit the fix soon. Thanks, rin
Re: CVS commit: src/sys/net
On 12.04.2018 06:38, Ryota Ozaki wrote: > Module Name: src > Committed By: ozaki-r > Date: Thu Apr 12 04:38:13 UTC 2018 > > Modified Files: > src/sys/net: if.h route.c route.h rtsock.c > > Log Message: > Resolve tangled lock dependencies in route.c > > This change sweeps remaining lock decisions based on if locked or not by > moving > utility functions of rtentry updates from rtsock.c and ensuring holding the > rt_lock. It also improves the atomicity of a update of a rtentry. > > +static struct ifaddr * > +rt_update_get_ifa(const struct rt_addrinfo info, const struct rtentry *rt, > +struct ifnet **ifp, struct psref *psref_ifp, struct psref *psref) > +{ Do we need to pass info as a value? It is pretty large here (1024 bytes). signature.asc Description: OpenPGP digital signature
Re: CVS commit: src/sys/net/npf
Le 06/08/2019 à 12:31, Christos Zoulas a écrit : I did not see any messages about it, and the fix is fine until rmind comes up with something better. Yes turns out it was an off-list email It is not nice to have HEAD unusable for 2 weeks now (since July 22nd). Given your second commit, I should understand that rmind is ok with this change right? christos On Aug 6, 2019, at 1:26 PM, Maxime Villard wrote: Le 06/08/2019 à 12:25, Christos Zoulas a écrit : Module Name:src Committed By: christos Date: Tue Aug 6 10:25:13 UTC 2019 Modified Files: src/sys/net/npf: npf_conn.c Log Message: Introduce an npf_conn_destroy_idx() that can handle partially constructed conn structures. To generate a diff of this commit: cvs rdiff -u -r1.27 -r1.28 src/sys/net/npf/npf_conn.c Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files. Rmind said he had a fix and was testing it. Please revert this.
Re: CVS commit: src/sys/net/npf
Yes, rmind asked me to commit the change in private email. christos > On Aug 6, 2019, at 3:59 PM, Maxime Villard wrote: > > Le 06/08/2019 à 12:31, Christos Zoulas a écrit : >> I did not see any messages about it, and the fix is fine until rmind comes >> up with something better. > > Yes turns out it was an off-list email > >> It is not nice to have HEAD unusable for 2 weeks now (since July 22nd). > > Given your second commit, I should understand that rmind is ok with this > change right? > >> christos >>> On Aug 6, 2019, at 1:26 PM, Maxime Villard wrote: >>> >>> Le 06/08/2019 à 12:25, Christos Zoulas a écrit : Module Name: src Committed By: christos Date: Tue Aug 6 10:25:13 UTC 2019 Modified Files: src/sys/net/npf: npf_conn.c Log Message: Introduce an npf_conn_destroy_idx() that can handle partially constructed conn structures. To generate a diff of this commit: cvs rdiff -u -r1.27 -r1.28 src/sys/net/npf/npf_conn.c Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files. >>> >>> Rmind said he had a fix and was testing it. Please revert this.
Re: CVS commit: src/sys/net/npf
I did not see any messages about it, and the fix is fine until rmind comes up with something better. It is not nice to have HEAD unusable for 2 weeks now (since July 22nd). christos > On Aug 6, 2019, at 1:26 PM, Maxime Villard wrote: > > Le 06/08/2019 à 12:25, Christos Zoulas a écrit : >> Module Name: src >> Committed By:christos >> Date:Tue Aug 6 10:25:13 UTC 2019 >> Modified Files: >> src/sys/net/npf: npf_conn.c >> Log Message: >> Introduce an npf_conn_destroy_idx() that can handle partially constructed >> conn structures. >> To generate a diff of this commit: >> cvs rdiff -u -r1.27 -r1.28 src/sys/net/npf/npf_conn.c >> Please note that diffs are not public domain; they are subject to the >> copyright notices on the relevant files. > > Rmind said he had a fix and was testing it. Please revert this.
Re: CVS commit: src/sys/net/npf
Le 06/08/2019 à 12:25, Christos Zoulas a écrit : Module Name:src Committed By: christos Date: Tue Aug 6 10:25:13 UTC 2019 Modified Files: src/sys/net/npf: npf_conn.c Log Message: Introduce an npf_conn_destroy_idx() that can handle partially constructed conn structures. To generate a diff of this commit: cvs rdiff -u -r1.27 -r1.28 src/sys/net/npf/npf_conn.c Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files. Rmind said he had a fix and was testing it. Please revert this.
Re: CVS commit: src/sys/net
In article , Ryota Ozaki wrote: >On Tue, Apr 16, 2019 at 5:51 AM Christos Zoulas wrote: >> >> Module Name:src >> Committed By: christos >> Date: Mon Apr 15 20:51:46 UTC 2019 >> >> Modified Files: >> src/sys/net: if.c >> >> Log Message: >> Zero out the ifreq struct for SIOCGIFCONF to avoid up to 127 bytes of stack >> disclosure. From Andy Nguyen, many thanks! > >Should we apply the same fix to all compat variants of ifconf? Yup, good catch! Also pullups :-) christos
Re: CVS commit: src/sys/net
On Tue, Apr 16, 2019 at 5:51 AM Christos Zoulas wrote: > > Module Name:src > Committed By: christos > Date: Mon Apr 15 20:51:46 UTC 2019 > > Modified Files: > src/sys/net: if.c > > Log Message: > Zero out the ifreq struct for SIOCGIFCONF to avoid up to 127 bytes of stack > disclosure. From Andy Nguyen, many thanks! Should we apply the same fix to all compat variants of ifconf? ozaki-r
Re: CVS commit: src/sys/net
Date:Tue, 26 Mar 2019 00:23:32 + From:"Paul Goyette" Message-ID: <20190326002332.c0241f...@cvs.netbsd.org> | XXX Someone(tm) needs to update MAKEDEV to create the /dev/srtN device | nodes (with device-major 179)! Yes, I know ... That's easy - but as I mentioned in off list mail, my private pre-requisite for doing it is making srt(4) exist first. kre ps: thanks for all the modularisation fixes. I wonder how much more exists that appears as if it could be made a module, but actually cannot?
re: CVS commit: src/sys/net
"Christos Zoulas" writes: > Module Name: src > Committed By: christos > Date: Mon Feb 18 23:13:14 UTC 2019 > > Modified Files: > src/sys/net: zlib.c > > Log Message: > add fallthrough's i specific avoided changing upstream code where possible by adding the relevant cflags. where was this failing? same with most of the drm changes (commited) and (posted) patch, too. can you either revert these as unnecessary or go remove the flags for them? they're duplicated in multiple places (module/kernel). thanks. .mrg.
Re: CVS commit: src/sys/net
Thank you for finding it out. I removed it in accident. rin On 2018/12/14 21:27, Martin Husemann wrote: Module Name:src Committed By: martin Date: Fri Dec 14 12:27:22 UTC 2018 Modified Files: src/sys/net: if_bridge.c Log Message: Need for ip6_statinc() prototype. To generate a diff of this commit: cvs rdiff -u -r1.161 -r1.162 src/sys/net/if_bridge.c Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files.
Re: CVS commit: src/sys/net
Panic seems not optimal. A rate limited log message at LOG_ERROR seems better. -- thorpej Sent from my iPhone. > On Dec 13, 2018, at 12:54 PM, Rin Okuyama wrote: > > Module Name:src > Committed By:rin > Date:Thu Dec 13 20:54:50 UTC 2018 > > Modified Files: >src/sys/net: ether_sw_offload.c > > Log Message: > Panic rather than silently dropping packets when TX offload options are > enabled for unsupported frame types. > > > To generate a diff of this commit: > cvs rdiff -u -r1.3 -r1.4 src/sys/net/ether_sw_offload.c > > Please note that diffs are not public domain; they are subject to the > copyright notices on the relevant files. >
Re: CVS commit: src/sys/net
On Tue, Jul 10, 2018 at 05:02:27PM +, Christos Zoulas wrote: > Should be pulled up to -8, with the other patch. Already done ;-)
Re: CVS commit: src/sys/net
In article <20180710110040.e80fcf...@cvs.netbsd.org>, Robert Elz wrote: >-=-=-=-=-=- > >Module Name: src >Committed By: kre >Date: Tue Jul 10 11:00:40 UTC 2018 > >Modified Files: > src/sys/net: if_llatbl.c > >Log Message: >Avoid attempting to call arp related functions if there is no >arp in the kernel. Should be pulled up to -8, with the other patch. christos
Re: CVS commit: src/sys/net/npf
Le 07/04/2018 à 23:28, Christos Zoulas a écrit : In article <20180407090627.20058f...@cvs.netbsd.org>, Maxime Villard wrote: -=-=-=-=-=- Module Name:src Committed By: maxv Date: Sat Apr 7 09:06:27 UTC 2018 Modified Files: src/sys/net/npf: npf_inet.c Log Message: Rewrite npf_fetch_tcpopts: * Instead of doing several nbuf_advance/nbuf_ensure_contig and playing with gotos, fetch the TCP options only once, and iterate over the (safe) area. The code is similar to tcp_dooptions. * When handling TCPOPT_MAXSEG and TCPOPT_WINDOW, ensure the length is the one we're expecting. If it isn't, then skip the option. This wasn't done before, and not doing it allowed a packet to bypass the max-mss clamping procedure. Discussed on tech-net@. This seems to break cvs -d cvs.netbsd.org:/cvsroot diff, with write via ssh returning ENETUNREACH. christos My bad (again). Seems like the TCP code is getting me confused all the time.
Re: CVS commit: src/sys/net/npf
In article <20180407090627.20058f...@cvs.netbsd.org>, Maxime Villard wrote: >-=-=-=-=-=- > >Module Name: src >Committed By: maxv >Date: Sat Apr 7 09:06:27 UTC 2018 > >Modified Files: > src/sys/net/npf: npf_inet.c > >Log Message: >Rewrite npf_fetch_tcpopts: > > * Instead of doing several nbuf_advance/nbuf_ensure_contig and > playing with gotos, fetch the TCP options only once, and iterate over > the (safe) area. The code is similar to tcp_dooptions. > > * When handling TCPOPT_MAXSEG and TCPOPT_WINDOW, ensure the length is > the one we're expecting. If it isn't, then skip the option. This > wasn't done before, and not doing it allowed a packet to bypass the > max-mss clamping procedure. Discussed on tech-net@. > This seems to break cvs -d cvs.netbsd.org:/cvsroot diff, with write via ssh returning ENETUNREACH. christos
Re: CVS commit: src/sys/net/npf
Le 31/01/2018 à 00:18, Mindaugas Rasiukevicius a écrit : [...] Fix this by using uint32_t. While here, it seems to me there is also a memory overflow: still in npf_cache_ip, npc_hlen may be incremented with a value that goes beyond the mbuf. [...] If the npc_hlen value is beyond the packet length, NPF's nbuf interface will catch that, since it performs the bounds check. I meant to say that the IPv6 loop in npf_cache_ip seems suspicious to me. while (nbuf_advance(nbuf, hlen, 0) != NULL) { [...] hlen = (ip6e->ip6e_len + 1) << 3; [...] npc->npc_hlen += hlen; } [continue execution...] Here, if you have a 'hlen' that goes beyond the mbuf, nbuf_advance will fail, and we're not handling this case. npc_hlen got incremented along the way, and it now points past the end of the mbuf. Perhaps that's handled properly later, but in all cases, we ought to handle the error right here instead of processing the packet any further. Note however that NPF is rather at the end of my TODO list, and I'll come back to it later. Maxime
Re: CVS commit: src/sys/net/npf
"Maxime Villard" wrote: > Module Name: src > Committed By: maxv > Date: Fri Dec 15 21:00:26 UTC 2017 > > Modified Files: > src/sys/net/npf: npf.h > > Log Message: > Fix a vulnerability in NPF, that allows whatever incoming IPv6 packet to > bypass a certain number of filtering rules. > > Basically there is an integer overflow in npf_cache_ip: npc_hlen is a > 8bit unsigned int, and can wrap to zero if the IPv6 packet being processed > has large extensions. Thanks for discovering and fixing this. I think this is the first serious remote vulnerability in NPF, although limited to IPv6 only. > Fix this by using uint32_t. While here, it seems to me there is also a > memory overflow: still in npf_cache_ip, npc_hlen may be incremented with > a value that goes beyond the mbuf. A minor aspect, but promoting npf_hlen to uint32_t results in wasteful padding in the struct, so it is better to re-order the struct members in this case. If the npc_hlen value is beyond the packet length, NPF's nbuf interface will catch that, since it performs the bounds check. However, I think we should implement some sanity check for the npc_hlen value. RFC 7112 suggests that the IPv6 header chain should not exceed the MTU size (and thus fit in the first fragment, in case of fragmentation). Some value along these lines could be the basis for a sanity check.. -- Mindaugas
Re: CVS commit: src/sys/net
Le 19/01/2018 à 13:31, Takeshi Nakayama a écrit : Module Name:src Committed By: nakayama Date: Fri Jan 19 12:31:28 UTC 2018 Modified Files: src/sys/net: if_ethersubr.c Log Message: Fix inverted logic. To generate a diff of this commit: cvs rdiff -u -r1.256 -r1.257 src/sys/net/if_ethersubr.c Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files. Ah yes, my bad, sorry about that. When I saw this ugly #ifdef DIAGNOSTIC and panic I just machinally replaced it by KASSERTMSG. Maxime
Re: CVS commit: src/sys/net
On Tue, Jan 09, 2018 at 01:07:50AM +, m...@netbsd.org wrote: > This is causing PR kern/52914 (on netbsd-8, not current) Copied wrong number kern/52895 it's in -current but also -8
Re: CVS commit: src/sys/net
This is causing PR kern/52914 (on netbsd-8, not current) On Fri, Dec 15, 2017 at 04:04:59AM +, Ryota Ozaki wrote: > Module Name: src > Committed By: ozaki-r > Date: Fri Dec 15 04:04:59 UTC 2017 > > Modified Files: > src/sys/net: if.c > > Log Message: > Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held > > > To generate a diff of this commit: > cvs rdiff -u -r1.415 -r1.416 src/sys/net/if.c > > Please note that diffs are not public domain; they are subject to the > copyright notices on the relevant files. > > Modified files: > > Index: src/sys/net/if.c > diff -u src/sys/net/if.c:1.415 src/sys/net/if.c:1.416 > --- src/sys/net/if.c:1.415Fri Dec 15 04:03:46 2017 > +++ src/sys/net/if.c Fri Dec 15 04:04:58 2017 > @@ -1,4 +1,4 @@ > -/* $NetBSD: if.c,v 1.415 2017/12/15 04:03:46 ozaki-r Exp $ */ > +/* $NetBSD: if.c,v 1.416 2017/12/15 04:04:58 ozaki-r Exp $ */ > > /*- > * Copyright (c) 1999, 2000, 2001, 2008 The NetBSD Foundation, Inc. > @@ -90,7 +90,7 @@ > */ > > #include > -__KERNEL_RCSID(0, "$NetBSD: if.c,v 1.415 2017/12/15 04:03:46 ozaki-r Exp $"); > +__KERNEL_RCSID(0, "$NetBSD: if.c,v 1.416 2017/12/15 04:04:58 ozaki-r Exp $"); > > #if defined(_KERNEL_OPT) > #include "opt_inet.h" > @@ -1792,11 +1792,18 @@ ifa_insert(struct ifnet *ifp, struct ifa > > ifa->ifa_ifp = ifp; > > - IFNET_GLOBAL_LOCK(); > + /* > + * Check !IFF_RUNNING for initialization routines that normally don't > + * take IFNET_LOCK but it's safe because there is no competitor. > + * XXX there are false positive cases because IFF_RUNNING can be off on > + * if_stop. > + */ > + KASSERT(!ISSET(ifp->if_flags, IFF_RUNNING) || > + IFNET_LOCKED(ifp)); > + > TAILQ_INSERT_TAIL(&ifp->if_addrlist, ifa, ifa_list); > IFADDR_ENTRY_INIT(ifa); > IFADDR_WRITER_INSERT_TAIL(ifp, ifa); > - IFNET_GLOBAL_UNLOCK(); > > ifaref(ifa); > } > @@ -1806,14 +1813,19 @@ ifa_remove(struct ifnet *ifp, struct ifa > { > > KASSERT(ifa->ifa_ifp == ifp); > + /* > + * if_is_deactivated indicates ifa_remove is called form if_detach > + * where is safe even if IFNET_LOCK isn't held. > + */ > + KASSERT(if_is_deactivated(ifp) || IFNET_LOCKED(ifp)); > > - IFNET_GLOBAL_LOCK(); > TAILQ_REMOVE(&ifp->if_addrlist, ifa, ifa_list); > IFADDR_WRITER_REMOVE(ifa); > #ifdef NET_MPSAFE > + IFNET_GLOBAL_LOCK(); > pserialize_perform(ifnet_psz); > -#endif > IFNET_GLOBAL_UNLOCK(); > +#endif > > #ifdef NET_MPSAFE > psref_target_destroy(&ifa->ifa_psref, ifa_psref_class); >
Re: CVS commit: src/sys/net
2017/12/18 22:09 "Christos Zoulas" : On Dec 18, 2:42pm, ozak...@netbsd.org (Ryota Ozaki) wrote: -- Subject: Re: CVS commit: src/sys/net | Oh cool. I didn't know that technique. Great, can you undo the change then? I already did it :) ozaki-r Thanks, christos
Re: CVS commit: src/sys/net
On Dec 18, 2:42pm, ozak...@netbsd.org (Ryota Ozaki) wrote: -- Subject: Re: CVS commit: src/sys/net | Oh cool. I didn't know that technique. Great, can you undo the change then? Thanks, christos
Re: CVS commit: src/sys/net
On Thu, Dec 14, 2017 at 10:03 PM, Christos Zoulas wrote: > On Dec 14, 5:42pm, ozak...@netbsd.org (Ryota Ozaki) wrote: > -- Subject: Re: CVS commit: src/sys/net > > | Is there a flag to suppress auto inlining? COPTS=-fno-inline build.sh? > | Anyway no objection to revert the change that is not quite important > | (but makes my life easy). > > just do: > > cat << _EOF >> /etc/mk.conf > .if make(netbsd) > COPTS.route.c += -O0 > .endif > _EOF > > That (or something similar) should work. Oh cool. I didn't know that technique. Thanks, ozaki-r
Re: CVS commit: src/sys/net
On Dec 14, 5:42pm, ozak...@netbsd.org (Ryota Ozaki) wrote: -- Subject: Re: CVS commit: src/sys/net | Is there a flag to suppress auto inlining? COPTS=-fno-inline build.sh? | Anyway no objection to revert the change that is not quite important | (but makes my life easy). just do: cat << _EOF >> /etc/mk.conf .if make(netbsd) COPTS.route.c += -O0 .endif _EOF That (or something similar) should work. christos
Re: CVS commit: src/sys/net
On Thu, Dec 14, 2017 at 3:27 PM, Christos Zoulas wrote: > In article <20171214054314.e76edf...@cvs.netbsd.org>, > Ryota Ozaki wrote: >>-=-=-=-=-=- >> >>Module Name: src >>Committed By: ozaki-r >>Date: Thu Dec 14 05:43:14 UTC 2017 >> >>Modified Files: >> src/sys/net: rtsock.c >> >>Log Message: >>Spinkle __noinline to some non-performance-sensitive functions for debugging > > Well, "for debugging" we can always compile with different flags rather > than penalize everyone/everytime however that small might be. Is there a flag to suppress auto inlining? COPTS=-fno-inline build.sh? Anyway no objection to revert the change that is not quite important (but makes my life easy). ozaki-r
Re: CVS commit: src/sys/net
In article <20171214054314.e76edf...@cvs.netbsd.org>, Ryota Ozaki wrote: >-=-=-=-=-=- > >Module Name: src >Committed By: ozaki-r >Date: Thu Dec 14 05:43:14 UTC 2017 > >Modified Files: > src/sys/net: rtsock.c > >Log Message: >Spinkle __noinline to some non-performance-sensitive functions for debugging Well, "for debugging" we can always compile with different flags rather than penalize everyone/everytime however that small might be. christos
Re: CVS commit: src/sys/net
On 2017/11/22 14:40, Robert Elz wrote: Date:Wed, 22 Nov 2017 04:56:52 + From:"SAITOH Masanobu" Message-ID: <20171122045652.2ee75f...@cvs.netbsd.org> | Return EINVAL in vlan_config() when a VLAN ID is 0 or 65535. | The spec states 0 and 65535 are reserved. Tags are 12 bigs, so 0..4095 (65535 won't fit). The code that was added in this commit looks correct (0xfff - it would be better as a named const defined somewhere) but the comment that accompanies it (and this commit log entry) are not. Yes, you're correct. I fixed the comment. Thanks. kre -- --- SAITOH Masanobu (msai...@execsw.org msai...@netbsd.org)
Re: CVS commit: src/sys/net
Date:Wed, 22 Nov 2017 04:56:52 + From:"SAITOH Masanobu" Message-ID: <20171122045652.2ee75f...@cvs.netbsd.org> | Return EINVAL in vlan_config() when a VLAN ID is 0 or 65535. | The spec states 0 and 65535 are reserved. Tags are 12 bigs, so 0..4095 (65535 won't fit). The code that was added in this commit looks correct (0xfff - it would be better as a named const defined somewhere) but the comment that accompanies it (and this commit log entry) are not. kre
Re: CVS commit: src/sys/net
On Fri, Sep 22, 2017 at 5:53 PM, Joerg Sonnenberger wrote: > On Fri, Sep 22, 2017 at 05:05:32AM +, Ryota Ozaki wrote: >> Module Name: src >> Committed By: ozaki-r >> Date: Fri Sep 22 05:05:32 UTC 2017 >> >> Modified Files: >> src/sys/net: route.c >> >> Log Message: >> Remove the global lock for rtcache >> >> Thanks to removal of LIST_ENTRY of struct route, rtcaches are accessed only >> by >> their users. And in existing usages a rtcache is guranteed to be not accessed >> simultaneously. So the rtcache framework doesn't need any exclusion controls >> in itself. > > I'm a bit suspicous about this since the generation counter is not > atomic. So what guarantees that route manipulations of different > subsystems don't race? If !NET_MPSAFE, it's softnet_lock, however, if NET_MPSAFE, nothing guarantees that as you suspect. I've fixed the issue. Thanks! ozaki-r
Re: CVS commit: src/sys/net
On Fri, Sep 22, 2017 at 05:05:32AM +, Ryota Ozaki wrote: > Module Name: src > Committed By: ozaki-r > Date: Fri Sep 22 05:05:32 UTC 2017 > > Modified Files: > src/sys/net: route.c > > Log Message: > Remove the global lock for rtcache > > Thanks to removal of LIST_ENTRY of struct route, rtcaches are accessed only by > their users. And in existing usages a rtcache is guranteed to be not accessed > simultaneously. So the rtcache framework doesn't need any exclusion controls > in itself. I'm a bit suspicous about this since the generation counter is not atomic. So what guarantees that route manipulations of different subsystems don't race? Joerg
Re: CVS commit: src/sys/net
On Thu, Apr 6, 2017 at 3:34 AM, Taylor R Campbell wrote: >> Date: Wed, 5 Apr 2017 15:46:38 +0800 (+08) >> From: Paul Goyette >> >> @@ -3385,6 +3428,13 @@ if_mcast_op(ifnet_t *ifp, const unsigned >> { >> int rc; >> struct ifreq ifr; >> + bool need_unlock = false; >> + >> + /* XXX if_ioctl_lock may or may not be held here */ >> + if (ifp->if_ioctl_lock != NULL && !mutex_owned(ifp->if_ioctl_lock)) { >> + mutex_enter(ifp->if_ioctl_lock); >> + need_unlock = true; >> + } >> >> It is my understanding that using mutex_owned() to effect locking >> decisions is forbidden. >> >> I understand the intent of doing it this way, but perhaps we should >> revisit the callers and enforce that they always take the lock? > > Quite right. Please revert this change and post a note to tech-net@ > about how to resolve the issue properly if you need help, including a > statement of what code paths do or don't hold the ioctl lock and why > it's hard to address them. It tried to put the ioctl lock at upper callers and gave up doing so because that's too complex (ioctl is easily called recursively) and the ioctl lock for an interface doesn't fit to upper callers that don't related to ioctl at all. So I don't think it's worthwhile to investigate where we should lock correctly at upper callers. If we correctly fix the issue, we should make ioctl callees MP-safe somehow. Anyway I know the change isn't the best approach so I'll revert the commit. > > From what I recall (haven't looked recently, though) the ifioctl > locking may be broken, and may be better written with localcount(9). That part now uses psref and if_ioctl_lock. I'm not sure localcount suits more for that. ozaki-r
Re: CVS commit: src/sys/net
> Date: Wed, 5 Apr 2017 15:46:38 +0800 (+08) > From: Paul Goyette > > @@ -3385,6 +3428,13 @@ if_mcast_op(ifnet_t *ifp, const unsigned > { > int rc; > struct ifreq ifr; > + bool need_unlock = false; > + > + /* XXX if_ioctl_lock may or may not be held here */ > + if (ifp->if_ioctl_lock != NULL && !mutex_owned(ifp->if_ioctl_lock)) { > + mutex_enter(ifp->if_ioctl_lock); > + need_unlock = true; > + } > > It is my understanding that using mutex_owned() to effect locking > decisions is forbidden. > > I understand the intent of doing it this way, but perhaps we should > revisit the callers and enforce that they always take the lock? Quite right. Please revert this change and post a note to tech-net@ about how to resolve the issue properly if you need help, including a statement of what code paths do or don't hold the ioctl lock and why it's hard to address them. >From what I recall (haven't looked recently, though) the ifioctl locking may be broken, and may be better written with localcount(9).
Re: CVS commit: src/sys/net
@@ -3385,6 +3428,13 @@ if_mcast_op(ifnet_t *ifp, const unsigned { int rc; struct ifreq ifr; + bool need_unlock = false; + + /* XXX if_ioctl_lock may or may not be held here */ + if (ifp->if_ioctl_lock != NULL && !mutex_owned(ifp->if_ioctl_lock)) { + mutex_enter(ifp->if_ioctl_lock); + need_unlock = true; + } It is my understanding that using mutex_owned() to effect locking decisions is forbidden. I understand the intent of doing it this way, but perhaps we should revisit the callers and enforce that they always take the lock? On Wed, 5 Apr 2017, Ryota Ozaki wrote: Module Name:src Committed By: ozaki-r Date: Wed Apr 5 03:47:51 UTC 2017 Modified Files: src/sys/net: if.c if.h if_ethersubr.c link_proto.c Log Message: Make sure to hold if_ioctl_lock when calling ifp->if_ioctl Unfortunately callers of ifp->if_ioctl (if_addr_init, if_flags_set and if_mcast_op) may or may not hold if_ioctl_lock, so we have to hold the lock only if it's not held. To generate a diff of this commit: cvs rdiff -u -r1.389 -r1.390 src/sys/net/if.c cvs rdiff -u -r1.236 -r1.237 src/sys/net/if.h cvs rdiff -u -r1.240 -r1.241 src/sys/net/if_ethersubr.c cvs rdiff -u -r1.34 -r1.35 src/sys/net/link_proto.c Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files. !DSPAM:58e49ead273461966420671! +--+--++ | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com | | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org | +--+--++
Re: CVS commit: src/sys/net
oops, I was wrong here, please disregard :-)
Re: CVS commit: src/sys/net
On Thu, Jan 26, 2017 at 09:13:19PM +, Nick Hudson wrote: > @@ -803,32 +790,14 @@ tunread(dev_t dev, struct uio *uio, int > goto out; > } > tp->tun_flags |= TUN_RWAIT; > - if (mtsleep((void *)tp, PZERO|PCATCH|PNORELOCK, > - "tunread", 0, &tp->tun_lock) != 0) { > + if (cv_wait_sig(&tp->tun_cv, &tp->tun_lock)) { > error = EINTR; > - goto out_nolock; > - } else { > - /* > - * Maybe the interface was destroyed while > - * we were sleeping, so let's ensure that > - * we're looking at the same (valid) tun > - * interface before looping. > - */ > - tp = tun_find_unit(dev); > - if (tp == NULL) { > - error = ENXIO; > - goto out_nolock; > - } > - if (tp->tun_if.if_index != index) { > - error = ENXIO; > - goto out; > - } > + goto out; > } > } If you goto out if tp is NULL, it will dereference it trying to mutex_exit(&tp->tun_lock);
Re: CVS commit: src/sys/net
Hi, On Thu, Jan 26, 2017 at 09:13:19PM +, Nick Hudson wrote: > @@ -534,14 +527,12 @@ tun_output(struct ifnet *ifp, struct mbu > const struct rtentry *rt) > { > struct tun_softc *tp = ifp->if_softc; > - int s; > int error; > #if defined(INET) || defined(INET6) > int mlen; > uint32_t*af; > #endif > > - s = splnet(); > mutex_enter(&tp->tun_lock); > TUNDEBUG ("%s: tun_output\n", ifp->if_xname); > > @@ -551,6 +542,8 @@ tun_output(struct ifnet *ifp, struct mbu > error = EHOSTDOWN; > goto out; > } > + // XXXrmind > + mutex_exit(&tp->tun_lock); > > /* >* if the queueing discipline needs packet classification, > @@ -576,7 +569,7 @@ tun_output(struct ifnet *ifp, struct mbu > error = ENOBUFS; > goto out; > } > - bcopy(dst, mtod(m0, char *), dst->sa_len); > + memcpy(mtod(m0, char *), dst, dst->sa_len); > } > > if (tp->tun_flags & TUN_IFHEAD) { > @@ -617,9 +610,10 @@ tun_output(struct ifnet *ifp, struct mbu > goto out; > } > > + mutex_enter(&tp->tun_lock); you call goto out; with &tp->tun-lock not held, it then mutex_exits it.
Re: CVS commit: src/sys/net
Hi, On 2016/06/27 18:55, Roy Marples wrote: > On 27/06/2016 09:58, Kengo NAKAHARA wrote: >> Module Name: src >> Committed By:knakahara >> Date:Mon Jun 27 08:58:50 UTC 2016 >> >> Modified Files: >> src/sys/net: if.c if.h >> >> Log Message: >> reduce link state changing softint if it is not required >> >> ok by ozaki-r@n.o > > There is a spelling mistake, an n is missing. > > if_is_link_state_chageable(ifp) > > should be > > if_is_link_state_changeable(ifp) Thank you for your point out. I fix it if.c:r1.347 and if.h:r1.217. Thanks, -- // Internet Initiative Japan Inc. Device Engineering Section, IoT Platform Development Department, Network Division, Technology Unit Kengo NAKAHARA
Re: CVS commit: src/sys/net
On 27/06/2016 09:58, Kengo NAKAHARA wrote: > Module Name: src > Committed By: knakahara > Date: Mon Jun 27 08:58:50 UTC 2016 > > Modified Files: > src/sys/net: if.c if.h > > Log Message: > reduce link state changing softint if it is not required > > ok by ozaki-r@n.o There is a spelling mistake, an n is missing. if_is_link_state_chageable(ifp) should be if_is_link_state_changeable(ifp) Roy
Re: CVS commit: src/sys/net
Date: Mon, 11 Apr 2016 14:33:41 +0900 From: Ryota Ozaki On Mon, Apr 11, 2016 at 12:31 PM, Taylor R Campbell wrote: > Nice, thanks for doing this! Have you tried subjecting it to load, > with options DIAGNOSTIC? Yes. It passes ATF tests (that enable DIAGNOSTIC). OK, great! However, I wonder how much load they subject it to, if they failed to catch the missing membar_producer in the list code bfeore. >- LIST_REMOVE(bif, bif_next); >+ PSLIST_WRITER_REMOVE(bif, bif_next); >+ PSLIST_ENTRY_DESTROY(bif, bif_next); > >BRIDGE_PSZ_PERFORM(sc); > > This order is incorrect. You cannot destroy the entry until you have > confirmed all readers are done with it. You must pserialize_perform > before doing PSLIST_ENTRY_DESTROY. See the note in the pslist(9) man > page about this: > > [...] I see. (I should be claimed RTFM :-/) I clarified some wording early on in the DESCRIPTION section of the man page. Please let me know if there's any verbiage in there that's confusing or hard to follow! >@@ -972,7 +984,8 @@ retry: >} > >i = 0; >- LIST_FOREACH(bif, &sc->sc_iflist, bif_next) { >+ PSLIST_READER_FOREACH(bif, &sc->sc_iflist, struct bridge_iflist, >+ bif_next) { > > PSLIST_WRITER_FOREACH Hm, so we need to choose a macro not what we do in the loop but how the loop is protected. Indeed in terms of use of membar. Correct. This is why PSLIST_READER_FOREACH is in the section `READER OPERATIONS' of the man page, which is opened by `The following operations may be performed on list heads and entries when the caller is in a passively serialized read section.'.
Re: CVS commit: src/sys/net
On Mon, Apr 11, 2016 at 12:31 PM, Taylor R Campbell wrote: >Date: Mon, 11 Apr 2016 02:04:14 + >From: Ryota Ozaki > >Module Name:src >Committed By: ozaki-r >Date: Mon Apr 11 02:04:14 UTC 2016 > >Modified Files: >src/sys/net: bridgestp.c if_bridge.c if_bridgevar.h > >Log Message: >Use pslist(9) in bridge(4) > >This adds missing memory barriers to list operations for pserialize. > > Nice, thanks for doing this! Have you tried subjecting it to load, > with options DIAGNOSTIC? Yes. It passes ATF tests (that enable DIAGNOSTIC). > >Index: src/sys/net/bridgestp.c >diff -u src/sys/net/bridgestp.c:1.19 src/sys/net/bridgestp.c:1.20 >--- src/sys/net/bridgestp.c:1.19Mon Feb 15 01:11:41 2016 >+++ src/sys/net/bridgestp.c Mon Apr 11 02:04:14 2016 >@@ -341,7 +341,8 @@ bstp_config_bpdu_generation(struct bridg > { >struct bridge_iflist *bif; > >- LIST_FOREACH(bif, &sc->sc_iflist, bif_next) { >+ PSLIST_READER_FOREACH(bif, &sc->sc_iflist, struct bridge_iflist, >+ bif_next) { > > I'm pretty sure everything in bridgestp.c runs under a writer lock, so > you should use PSLIST_WRITER_FOREACH here, not PSLIST_READER_FOREACH. > > (PSLIST_WRITER_FOREACH is faster on alpha -- but more important is the > statement of intent that this is in an exclusive write section, not a > shared read section.) > >@@ -828,7 +833,8 @@ bstp_initialization(struct bridge_softc > >BRIDGE_LOCK(sc); > >- LIST_FOREACH(bif, &sc->sc_iflist, bif_next) { >+ PSLIST_READER_FOREACH(bif, &sc->sc_iflist, struct bridge_iflist, >+ bif_next) { > > For example, this one is definitely done under a writer lock! Oh yes. I think I already forgot bridgestp.c at all! > >Index: src/sys/net/if_bridge.c >diff -u src/sys/net/if_bridge.c:1.111 src/sys/net/if_bridge.c:1.112 >--- src/sys/net/if_bridge.c:1.111 Mon Mar 28 04:38:04 2016 >+++ src/sys/net/if_bridge.c Mon Apr 11 02:04:14 2016 >@@ -447,8 +447,14 @@ bridge_clone_destroy(struct ifnet *ifp) >bridge_stop(ifp, 1); > >BRIDGE_LOCK(sc); >- while ((bif = LIST_FIRST(&sc->sc_iflist)) != NULL) >+ for (;;) { >+ bif = PSLIST_WRITER_FIRST(&sc->sc_iflist, struct > bridge_iflist, >+ bif_next); >+ if (bif == NULL) >+ break; >bridge_delete_member(sc, bif); >+ } >+ PSLIST_DESTROY(&sc->sc_iflist); > > Any particular reason you switched from `while ((bif = ...) != NULL)' > to `for (;;) { bif = ...; if (bif == NULL) break;'? I find the while > construct clearer, myself. Just because it's too long (for me) to put it in the condition. > >@@ -705,7 +712,8 @@ bridge_delete_member(struct bridge_softc >ifs->if_bridge = NULL; >ifs->if_bridgeif = NULL; > >- LIST_REMOVE(bif, bif_next); >+ PSLIST_WRITER_REMOVE(bif, bif_next); >+ PSLIST_ENTRY_DESTROY(bif, bif_next); > >BRIDGE_PSZ_PERFORM(sc); > > This order is incorrect. You cannot destroy the entry until you have > confirmed all readers are done with it. You must pserialize_perform > before doing PSLIST_ENTRY_DESTROY. See the note in the pslist(9) man > page about this: > > `Either _element_ must never have been inserted into a list, or it > must have been inserted and removed, and the caller must have waited > for all parallel readers to finish reading it first.' > > The reason is that PSLIST_WRITER_REMOVE only changes the next pointer > of the *previous* element so that no new readers can discover the > current element, but leaves the next pointer of the *current* element > intact so that readers that are still active can get to the next > element. Then PSLIST_ENTRY_DESTROY nullifies the next pointer of the > current element and kasserts that you removed the entry. I see. (I should be claimed RTFM :-/) > > Maybe PSLIST_ENTRY_DESTROY should really set it to some non-null > poison pointer like QUEUEDEBUG does, so that anyone trying to use it > afterward will actually crash instead of just thinking they hit the > end of the list. > >@@ -939,7 +949,8 @@ bridge_ioctl_gifs(struct bridge_softc *s > retry: >BRIDGE_LOCK(sc); >count = 0; >- LIST_FOREACH(bif, &sc->sc_iflist, bif_next) >+ PSLIST_READER_FOREACH(bif, &sc->sc_iflist, struct bridge_iflist, >+ bif_next) > > Use PSLIST_WRITER_FOREACH here too. > >@@ -959,7 +970,8 @@ retry: >BRIDGE_LOCK(sc); > >i = 0; >- LIST_FOREACH(bif, &sc->sc_iflist, bif_next) >+ PSLIST_READER_FOREACH(bif, &sc->sc_iflist, struct bridge_iflist, >+ bif_next) > > PSLIST_WRITER_FOREACH > >i++; >if (i > count) { >/* >@@ -972,7 +984,8
Re: CVS commit: src/sys/net
Date: Mon, 11 Apr 2016 02:04:14 + From: Ryota Ozaki Module Name:src Committed By: ozaki-r Date: Mon Apr 11 02:04:14 UTC 2016 Modified Files: src/sys/net: bridgestp.c if_bridge.c if_bridgevar.h Log Message: Use pslist(9) in bridge(4) This adds missing memory barriers to list operations for pserialize. Nice, thanks for doing this! Have you tried subjecting it to load, with options DIAGNOSTIC? Index: src/sys/net/bridgestp.c diff -u src/sys/net/bridgestp.c:1.19 src/sys/net/bridgestp.c:1.20 --- src/sys/net/bridgestp.c:1.19Mon Feb 15 01:11:41 2016 +++ src/sys/net/bridgestp.c Mon Apr 11 02:04:14 2016 @@ -341,7 +341,8 @@ bstp_config_bpdu_generation(struct bridg { struct bridge_iflist *bif; - LIST_FOREACH(bif, &sc->sc_iflist, bif_next) { + PSLIST_READER_FOREACH(bif, &sc->sc_iflist, struct bridge_iflist, + bif_next) { I'm pretty sure everything in bridgestp.c runs under a writer lock, so you should use PSLIST_WRITER_FOREACH here, not PSLIST_READER_FOREACH. (PSLIST_WRITER_FOREACH is faster on alpha -- but more important is the statement of intent that this is in an exclusive write section, not a shared read section.) @@ -828,7 +833,8 @@ bstp_initialization(struct bridge_softc BRIDGE_LOCK(sc); - LIST_FOREACH(bif, &sc->sc_iflist, bif_next) { + PSLIST_READER_FOREACH(bif, &sc->sc_iflist, struct bridge_iflist, + bif_next) { For example, this one is definitely done under a writer lock! Index: src/sys/net/if_bridge.c diff -u src/sys/net/if_bridge.c:1.111 src/sys/net/if_bridge.c:1.112 --- src/sys/net/if_bridge.c:1.111 Mon Mar 28 04:38:04 2016 +++ src/sys/net/if_bridge.c Mon Apr 11 02:04:14 2016 @@ -447,8 +447,14 @@ bridge_clone_destroy(struct ifnet *ifp) bridge_stop(ifp, 1); BRIDGE_LOCK(sc); - while ((bif = LIST_FIRST(&sc->sc_iflist)) != NULL) + for (;;) { + bif = PSLIST_WRITER_FIRST(&sc->sc_iflist, struct bridge_iflist, + bif_next); + if (bif == NULL) + break; bridge_delete_member(sc, bif); + } + PSLIST_DESTROY(&sc->sc_iflist); Any particular reason you switched from `while ((bif = ...) != NULL)' to `for (;;) { bif = ...; if (bif == NULL) break;'? I find the while construct clearer, myself. @@ -705,7 +712,8 @@ bridge_delete_member(struct bridge_softc ifs->if_bridge = NULL; ifs->if_bridgeif = NULL; - LIST_REMOVE(bif, bif_next); + PSLIST_WRITER_REMOVE(bif, bif_next); + PSLIST_ENTRY_DESTROY(bif, bif_next); BRIDGE_PSZ_PERFORM(sc); This order is incorrect. You cannot destroy the entry until you have confirmed all readers are done with it. You must pserialize_perform before doing PSLIST_ENTRY_DESTROY. See the note in the pslist(9) man page about this: `Either _element_ must never have been inserted into a list, or it must have been inserted and removed, and the caller must have waited for all parallel readers to finish reading it first.' The reason is that PSLIST_WRITER_REMOVE only changes the next pointer of the *previous* element so that no new readers can discover the current element, but leaves the next pointer of the *current* element intact so that readers that are still active can get to the next element. Then PSLIST_ENTRY_DESTROY nullifies the next pointer of the current element and kasserts that you removed the entry. Maybe PSLIST_ENTRY_DESTROY should really set it to some non-null poison pointer like QUEUEDEBUG does, so that anyone trying to use it afterward will actually crash instead of just thinking they hit the end of the list. @@ -939,7 +949,8 @@ bridge_ioctl_gifs(struct bridge_softc *s retry: BRIDGE_LOCK(sc); count = 0; - LIST_FOREACH(bif, &sc->sc_iflist, bif_next) + PSLIST_READER_FOREACH(bif, &sc->sc_iflist, struct bridge_iflist, + bif_next) Use PSLIST_WRITER_FOREACH here too. @@ -959,7 +970,8 @@ retry: BRIDGE_LOCK(sc); i = 0; - LIST_FOREACH(bif, &sc->sc_iflist, bif_next) + PSLIST_READER_FOREACH(bif, &sc->sc_iflist, struct bridge_iflist, + bif_next) PSLIST_WRITER_FOREACH i++; if (i > count) { /* @@ -972,7 +984,8 @@ retry: } i = 0; - LIST_FOREACH(bif, &sc->sc_iflist, bif_next) { + PSLIST_READER_FOREACH(bif, &sc->sc_iflist, struct bridge_iflist, + bif_next) { PSLIST_WRITER_FOREACH
re: CVS commit: src/sys/net
> And add an assertion of if_addrlen and ll_addr. thanks. i almost did this myself :-)
Re: CVS commit: src/sys/net
On Fri, Feb 19, 2016 at 12:16 PM, Roy Marples wrote: > On 18/02/2016 11:29, Ryota Ozaki wrote: >> Okay, I've changed my mind. Let's commit your patch anyway and >> keep an eye on the lossage messages to know how often and how >> many lossage happens. If there are issues, we can improve then. >> It's not too late. > > The irony is that overnight I changed my mind also! > The challenge was to achieve a longer queue in the same size or less. > According to size(1), the end result of this patch is exactly the same > size as my previous one :) > > This has been achieved with bitmask array macros to densely pack a 2 bit > integer into a larger integer as a backing store. > I decided to use a uint16_t which allows for 8 events in the queue. > Simply changing the integral type will magically expand the queue, so it > could be changed to a uint64_t to allow 32 events in the queue but this > is probably overkill. > For really size challenged systems, this could be changed to a uint8_t > to allow 4 events in the queue ... which should still be big enough > normally, but you wanted 10 so the closest match was 8. > > I've managed to keep the assignment of the link state to the interface > within the initial if_link_state_change() call which my prior patch > moved out. > > Also, only one queued event is processed at a time, subsequent ones have > to wait for another softint(9) to occur. > After all, the host may have >1 interface and we don't want to hog the > network processing a larger queue. > >> I'm sorry for disturbing you. > > No, no. Thankyou for challenging me :) > > Maybe the macros here could be put to a more generic use (or there are > already generic macros for this which I missed?) but unsure how right now. > > All being said, I learned a lot about bit shifting doing this and I > might have made a mistake, but sample test cases for my macros appear to > work fine. I would appreciate an OK from someone before I commit this. Wow, I must be convinced by the new design :) I don't understand the macros very much though, coverity would warn us if there are issues. Thank you for your effort! ozaki-r
Re: CVS commit: src/sys/net
On 18/02/2016 11:29, Ryota Ozaki wrote: > Okay, I've changed my mind. Let's commit your patch anyway and > keep an eye on the lossage messages to know how often and how > many lossage happens. If there are issues, we can improve then. > It's not too late. The irony is that overnight I changed my mind also! The challenge was to achieve a longer queue in the same size or less. According to size(1), the end result of this patch is exactly the same size as my previous one :) This has been achieved with bitmask array macros to densely pack a 2 bit integer into a larger integer as a backing store. I decided to use a uint16_t which allows for 8 events in the queue. Simply changing the integral type will magically expand the queue, so it could be changed to a uint64_t to allow 32 events in the queue but this is probably overkill. For really size challenged systems, this could be changed to a uint8_t to allow 4 events in the queue ... which should still be big enough normally, but you wanted 10 so the closest match was 8. I've managed to keep the assignment of the link state to the interface within the initial if_link_state_change() call which my prior patch moved out. Also, only one queued event is processed at a time, subsequent ones have to wait for another softint(9) to occur. After all, the host may have >1 interface and we don't want to hog the network processing a larger queue. > I'm sorry for disturbing you. No, no. Thankyou for challenging me :) Maybe the macros here could be put to a more generic use (or there are already generic macros for this which I missed?) but unsure how right now. All being said, I learned a lot about bit shifting doing this and I might have made a mistake, but sample test cases for my macros appear to work fine. I would appreciate an OK from someone before I commit this. Roy Index: sys/net/if.c === RCS file: /cvsroot/src/sys/net/if.c,v retrieving revision 1.324 diff -u -r1.324 if.c --- sys/net/if.c15 Feb 2016 08:08:04 - 1.324 +++ sys/net/if.c19 Feb 2016 03:13:45 - @@ -603,7 +603,8 @@ ifp->if_broadcastaddr = 0; /* reliably crash if used uninitialized */ ifp->if_link_state = LINK_STATE_UNKNOWN; - ifp->if_old_link_state = LINK_STATE_UNKNOWN; + ifp->if_link_queue = -1; /* all bits set, see link_state_change() */ + ifp->if_link_queue_state = LINK_STATE_UNKNOWN; ifp->if_capenable = 0; ifp->if_csum_flags_tx = 0; @@ -1563,51 +1564,141 @@ } /* - * Handle a change in the interface link state. - * XXX: We should listen to the routing socket in-kernel rather - * than calling in6_if_link_* functions directly from here. + * bitmask macros to manage a densely packed link_state change queue. + * Because we need to store LINK_STATE_UNKNOWN(0), LINK_STATE_DOWN(1) and + * LINK_STATE_UP(2) we need 2 bits for each state change. + * As a state change to store is 0, treat all bits set as an unset item. + */ +#define LQ_ITEM_BITS 2 +#define LQ_ITEM_MASK ((1 << LQ_ITEM_BITS) - 1) +#define LQ_MASK(i) (LQ_ITEM_MASK << (i) * LQ_ITEM_BITS) +#define LINK_STATE_UNSET LQ_ITEM_MASK +#define LQ_ITEM(q, i) (((q) & LQ_MASK((i))) >> (i) * LQ_ITEM_BITS) +#define LQ_STORE(q, i, v)\ + do { \ + (q) &= ~LQ_MASK((i)); \ + (q) |= (v) << (i) * LQ_ITEM_BITS; \ + } while (0 /* CONSTCOND */) +#define LQ_MAX(q) ((sizeof((q)) * NBBY) / LQ_ITEM_BITS) +#define LQ_POP(q, v) \ + do { \ + (v) = LQ_ITEM((q), 0);\ + (q) >>= LQ_ITEM_BITS; \ + (q) |= LINK_STATE_UNSET << (LQ_MAX((q)) - 1) * LQ_ITEM_BITS; \ + } while (0 /* CONSTCOND */) +#define LQ_PUSH(q, v)\ + do { \ + (q) >>= LQ_ITEM_BITS; \ + (q) |= (v) << (LQ_MAX((q)) - 1) * LQ_ITEM_BITS; \ + } while (0 /* CONSTCOND */) +#define LQ_FIND_UNSET(q, i) \ + for ((i) = 0; i < LQ_MAX((q)); (i)++) { \ + if (LQ_ITEM((q), (i)) == LINK_STATE_UNSET)\ + break;\ + } +/* + * Handle a change in the interface link state and + * queue notifications. */ void if_link_state_change(struct ifnet *ifp, int link_
Re: CVS commit: src/sys/net
On Thu, Feb 18, 2016 at 12:08 PM, Ryota Ozaki wrote: > On Thu, Feb 18, 2016 at 12:18 AM, Roy Marples wrote: >> On 17/02/2016 01:46, Roy Marples wrote: >>> Patch attached to solve change from a priority array into a bit mask >>> approach where we swallow the whole queue in a softint. >> >> I did the same tests as noted in PR 50602 - put the system under load by >> building clang from src and qt5 from pkgsrc at the same time, both with >> -j3 on my Core2 @2.4Ghz dev machine. >> The load was measured at about 9 by top, the machine itself was pretty >> un-useable. >> >> It also has a wm(4) interface, which the reporter had. >> Also an iwn(4) interface. >> Toggling both as fast as possible - my daughter power cycled the AP >> while I unplugged / replugged the ethernet cable. >> >> Doing this for about 10 minutes resulted in no lossage reported with my >> patch and like state transitions happened perfectly. > > Good. > >> >> I would imagine any lossage would occur as a result of a driver or >> hardware defect, where more important information is likely found on the >> console as well. > > IIUC amd64/i386 have the fast softint facility and it tries to schedule > a softint just after hwintr finishes, and so more than one events are > unlikely to come before the softint starts? Please someone correct > me if my guess is wrong. Okay, I've changed my mind. Let's commit your patch anyway and keep an eye on the lossage messages to know how often and how many lossage happens. If there are issues, we can improve then. It's not too late. I'm sorry for disturbing you. ozaki-r
Re: CVS commit: src/sys/net
On 2016/02/18 11:54, Ryota Ozaki wrote: On Wed, Feb 17, 2016 at 9:12 PM, Roy Marples wrote: On 17/02/2016 09:02, Ryota Ozaki wrote: So what events would you choose to skip, if not the scheme that Roy described? (I think I confused you, sorry...) I rather want to not skip anything as much as possible (except for repeating same events (e.g., up/up/up) because keeping them all changes the original behavior). I intend to skip/eliminate events only if there are too many events happen in a short period (i.e., need queuing) to protect the system from overloading. In that case (it's a very rare case I think), we just drop an earliest event first. How much is too many and what is a short period? We can choose a number that applications unlikely to handle the events (10 or so). A short period means a period between the first interrupt for a link state event happens and a softint for link state changes starts running. Once you start skipping/eliminating events, how is your solution any better? How do you measure some lossage vs some lossage? Mine doesn't drop events if there are only a few events while yours drops one event even if there are just two events. I suppose that a few or several events can happen in "a short period" easier than a dozen of events (or more) and the latter implies some hardware troubles (or VMM defects?) and needs a special care to protect the system, for example we give up delivering all events. For the former, we shouldn't skip/eliminate events. Also, we can't just drop the earliest event first - we have to ensure that each state is left in the queue. Consider starting in UP: DOWN/UNKNOWN/UP/UNKNOWN/UP/UNKNOWN/UP We cannot just discard the fact it went down because important events attached to DOWN won't trigger. We can preserve DOWN specially if we need. Lastly, have we considered the system could be overloaded due to so many link state change events? A longer queue or more complicated would only make this worse. Of course, I care and so accept dropping events, but do you really think just two events cause overload? From an earlier post of yours: Even if a UP state is transient, it's an event that may provide us a hint of network conditions for diagnostic. We may be able to get it from the console output, but it's not so convenient; we need to track events via two different facilities. If you're skipping/eliminating events as well then you would also need a second facility to record this. Other than scribbling on the console, what did you have in mind? Could this be used elsewhere in the system where equvialent network assertations are recorded? I don't plan to provide another facility to notify events (even if we provide something, nobody wants to use it, I think). Yes, it's a limitation that we cannot always provide full events, no objection on that. But we can still tell that something bad is happening by sending a bunch of events at once. ozaki-r This discussion reminds me cardbus's Chattering suppression: http://nxr.netbsd.org/xref/src/sys/dev/cardbus/cardslot.c#309 http://nxr.netbsd.org/xref/src/sys/dev/pci/pccbb.c#1053 -- --- SAITOH Masanobu (msai...@execsw.org msai...@netbsd.org)
Re: CVS commit: src/sys/net
On Thu, Feb 18, 2016 at 12:18 AM, Roy Marples wrote: > On 17/02/2016 01:46, Roy Marples wrote: >> Patch attached to solve change from a priority array into a bit mask >> approach where we swallow the whole queue in a softint. > > I did the same tests as noted in PR 50602 - put the system under load by > building clang from src and qt5 from pkgsrc at the same time, both with > -j3 on my Core2 @2.4Ghz dev machine. > The load was measured at about 9 by top, the machine itself was pretty > un-useable. > > It also has a wm(4) interface, which the reporter had. > Also an iwn(4) interface. > Toggling both as fast as possible - my daughter power cycled the AP > while I unplugged / replugged the ethernet cable. > > Doing this for about 10 minutes resulted in no lossage reported with my > patch and like state transitions happened perfectly. Good. > > I would imagine any lossage would occur as a result of a driver or > hardware defect, where more important information is likely found on the > console as well. IIUC amd64/i386 have the fast softint facility and it tries to schedule a softint just after hwintr finishes, and so more than one events are unlikely to come before the softint starts? Please someone correct me if my guess is wrong. ozaki-r
Re: CVS commit: src/sys/net
On Wed, Feb 17, 2016 at 9:12 PM, Roy Marples wrote: > On 17/02/2016 09:02, Ryota Ozaki wrote: >>> So what events would you choose to skip, if not the scheme that Roy >>> described? >> >> (I think I confused you, sorry...) >> >> I rather want to not skip anything as much as possible >> (except for repeating same events (e.g., up/up/up) because >> keeping them all changes the original behavior). >> >> I intend to skip/eliminate events only if there are too many >> events happen in a short period (i.e., need queuing) to protect >> the system from overloading. In that case (it's a very rare case >> I think), we just drop an earliest event first. > > How much is too many and what is a short period? We can choose a number that applications unlikely to handle the events (10 or so). A short period means a period between the first interrupt for a link state event happens and a softint for link state changes starts running. > Once you start skipping/eliminating events, how is your solution any > better? How do you measure some lossage vs some lossage? Mine doesn't drop events if there are only a few events while yours drops one event even if there are just two events. I suppose that a few or several events can happen in "a short period" easier than a dozen of events (or more) and the latter implies some hardware troubles (or VMM defects?) and needs a special care to protect the system, for example we give up delivering all events. For the former, we shouldn't skip/eliminate events. > > Also, we can't just drop the earliest event first - we have to ensure > that each state is left in the queue. > Consider starting in UP: > DOWN/UNKNOWN/UP/UNKNOWN/UP/UNKNOWN/UP > > We cannot just discard the fact it went down because important events > attached to DOWN won't trigger. We can preserve DOWN specially if we need. > > Lastly, have we considered the system could be overloaded due to so many > link state change events? A longer queue or more complicated would only > make this worse. Of course, I care and so accept dropping events, but do you really think just two events cause overload? > > From an earlier post of yours: >> Even if a UP state is transient, it's an event that may provide us a >> hint of network conditions for diagnostic. We may be able to get it >> from the console output, but it's not so convenient; we need to >> track events via two different facilities. > > If you're skipping/eliminating events as well then you would also need a > second facility to record this. Other than scribbling on the console, > what did you have in mind? Could this be used elsewhere in the system > where equvialent network assertations are recorded? I don't plan to provide another facility to notify events (even if we provide something, nobody wants to use it, I think). Yes, it's a limitation that we cannot always provide full events, no objection on that. But we can still tell that something bad is happening by sending a bunch of events at once. ozaki-r
Re: CVS commit: src/sys/net
On 17/02/2016 01:46, Roy Marples wrote: > Patch attached to solve change from a priority array into a bit mask > approach where we swallow the whole queue in a softint. I did the same tests as noted in PR 50602 - put the system under load by building clang from src and qt5 from pkgsrc at the same time, both with -j3 on my Core2 @2.4Ghz dev machine. The load was measured at about 9 by top, the machine itself was pretty un-useable. It also has a wm(4) interface, which the reporter had. Also an iwn(4) interface. Toggling both as fast as possible - my daughter power cycled the AP while I unplugged / replugged the ethernet cable. Doing this for about 10 minutes resulted in no lossage reported with my patch and like state transitions happened perfectly. I would imagine any lossage would occur as a result of a driver or hardware defect, where more important information is likely found on the console as well. Is there any more testing I can do here? Roy
Re: CVS commit: src/sys/net
On 17/02/2016 09:02, Ryota Ozaki wrote: >> So what events would you choose to skip, if not the scheme that Roy >> described? > > (I think I confused you, sorry...) > > I rather want to not skip anything as much as possible > (except for repeating same events (e.g., up/up/up) because > keeping them all changes the original behavior). > > I intend to skip/eliminate events only if there are too many > events happen in a short period (i.e., need queuing) to protect > the system from overloading. In that case (it's a very rare case > I think), we just drop an earliest event first. How much is too many and what is a short period? Once you start skipping/eliminating events, how is your solution any better? How do you measure some lossage vs some lossage? Also, we can't just drop the earliest event first - we have to ensure that each state is left in the queue. Consider starting in UP: DOWN/UNKNOWN/UP/UNKNOWN/UP/UNKNOWN/UP We cannot just discard the fact it went down because important events attached to DOWN won't trigger. Lastly, have we considered the system could be overloaded due to so many link state change events? A longer queue or more complicated would only make this worse. >From an earlier post of yours: > Even if a UP state is transient, it's an event that may provide us a > hint of network conditions for diagnostic. We may be able to get it > from the console output, but it's not so convenient; we need to > track events via two different facilities. If you're skipping/eliminating events as well then you would also need a second facility to record this. Other than scribbling on the console, what did you have in mind? Could this be used elsewhere in the system where equvialent network assertations are recorded? Roy
Re: CVS commit: src/sys/net
On Wed, Feb 17, 2016 at 4:52 PM, Taylor R Campbell wrote: >Date: Wed, 17 Feb 2016 15:12:31 +0900 >From: Ryota Ozaki > >On Wed, Feb 17, 2016 at 2:13 PM, Taylor R Campbell > wrote: >> Note that this queueing takes effect only if the link state changes >> multiple times within maybe a few microseconds, before the softint can >> run. If your link state is changing that many times so quickly, >> losing an event or two is probably the least of your worries > >Yeah, it's nitpicking, but for that reason, I think it's better to pass >events as they are to userland. > >> -- but >> you're probably more interested in seeing something like ...down...up >> than ...up/up/up. > >Yes. (up/up/up events are eliminated in the first place though.) > > So what events would you choose to skip, if not the scheme that Roy > described? (I think I confused you, sorry...) I rather want to not skip anything as much as possible (except for repeating same events (e.g., up/up/up) because keeping them all changes the original behavior). I intend to skip/eliminate events only if there are too many events happen in a short period (i.e., need queuing) to protect the system from overloading. In that case (it's a very rare case I think), we just drop an earliest event first. > > I bet that, whatever events you would choose to skip, we can still > prove that the resulting queues need be no longer than, say, three > elements, and we'd still usefully report link flapping to userland -- > as long as we can make enough progress to run softints and userland > processes at all. > > Here are some example reductions that intuitively sound reasonable to > me: > > down/down = down > up/up = up > down/unknown/up = down/up > down/up/down = down > > What other sequences of events would you simplify? As I said, I want to keep the sequences (except for repeating same events) so the last two examples will be sent as they are. ozaki-r
Re: CVS commit: src/sys/net
Date: Wed, 17 Feb 2016 15:12:31 +0900 From: Ryota Ozaki On Wed, Feb 17, 2016 at 2:13 PM, Taylor R Campbell wrote: > Note that this queueing takes effect only if the link state changes > multiple times within maybe a few microseconds, before the softint can > run. If your link state is changing that many times so quickly, > losing an event or two is probably the least of your worries Yeah, it's nitpicking, but for that reason, I think it's better to pass events as they are to userland. > -- but > you're probably more interested in seeing something like ...down...up > than ...up/up/up. Yes. (up/up/up events are eliminated in the first place though.) So what events would you choose to skip, if not the scheme that Roy described? I bet that, whatever events you would choose to skip, we can still prove that the resulting queues need be no longer than, say, three elements, and we'd still usefully report link flapping to userland -- as long as we can make enough progress to run softints and userland processes at all. Here are some example reductions that intuitively sound reasonable to me: down/down = down up/up = up down/unknown/up = down/up down/up/down = down What other sequences of events would you simplify?
Re: CVS commit: src/sys/net
On Wed, Feb 17, 2016 at 2:13 PM, Taylor R Campbell wrote: >Date: Wed, 17 Feb 2016 11:49:48 +0900 >From: Ryota Ozaki > >And the priority provides asymmetric event deliveries; when the state >repeats up and down, a down event is delivered if the final state is down >while a down event and a up event are delivered if the final state is up. >It's confusable to me. > > This is the part I'm still not sure about: Roy's patch reduces up/down > to just down, but leaves down/up alone. I'm guessing that Roy expects > applications like dhcpcd to want to clear some state if the link ever > goes down, but to be uninterested in knowing that the link went up for > a microsecond and then back down again. > >Can we pass events as they are as much as possible? I don't complain that >event reductions in the kernel, but I think it should be down based on time >series manner (e.g., pick latest three events), not based on some priority >things. If we accept event reductions, we can do it with bit-encoding >(w/o a linked list (memory allocations)), for example represent the state >as 2 bits and encode event series into a variable (say 16 events in int). > > Note that this queueing takes effect only if the link state changes > multiple times within maybe a few microseconds, before the softint can > run. If your link state is changing that many times so quickly, > losing an event or two is probably the least of your worries Yeah, it's nitpicking, but for that reason, I think it's better to pass events as they are to userland. > -- but > you're probably more interested in seeing something like ...down...up > than ...up/up/up. Yes. (up/up/up events are eliminated in the first place though.) ozaki-r
Re: CVS commit: src/sys/net
On Wed, Feb 17, 2016 at 1:40 PM, Roy Marples wrote: > On 2016-02-17 02:49, Ryota Ozaki wrote: >>> >>> If you don't read the patch, here is the comment I added to the >>> if_link_state_change() function: >>> >>> * Queue a change in the interface link state. >>> * >>> * The queue itself is very limited: >>> * no state can be queued more than once >>> * a higher priority state will remove lower priority states >>> * >>> * The queue state priority in descending order is DOWN, UNKNOWN, UP. >>> * Rationale is that if the kernel can't process the link state change >>> queue >>> * fast enough then userland has no chance of catching up. >>> * Any lossage is logged to the console. >>> * The queue state priority is ordered so that link state aware programs >>> * will still have the correct end result regardless of any lossage. >>> >>> Any comments or objections? >> >> >> I worry about the priority order, where it comes from? I feel that >> the kernel does too much decision; which event is important. I think >> it depends on applications. > > > Fair enough. > Can you state a use case for an application needing the full ordering? Event tracking of link states for example. > > Let us be more specific > UP: DOWN - UP > Is passed fully > DOWN: UP - DOWN > UP is not passed to userland. > What do you want to achieve in userland during the UP when the kernel > already considers the link DOWN? Even if a UP state is transient, it's an event that may provide us a hint of network conditions for diagnostic. We may be able to get it from the console output, but it's not so convenient; we need to track events via two different facilities. > >> And the priority provides asymmetric event deliveries; when the state >> repeats up and down, a down event is delivered if the final state is down >> while a down event and a up event are delivered if the final state is up. >> It's confusable to me. > > > It's not that confusing :) > If you're an application, what benefit do you have of processing an UP state > on the link when a DOWN state is about to follow? > Aside from logging it (which we still do, just not via a route message just > a console diagnostic) what would you do you with it? > Because a DOWN state is coming, it cannot possibly do anything over the > interface, so why bother? Applications can ignore the UP event. IMO we should delegate such a decision to applications because we don't know all requirements of all applications. ozaki-r
Re: CVS commit: src/sys/net
Date: Wed, 17 Feb 2016 11:49:48 +0900 From: Ryota Ozaki And the priority provides asymmetric event deliveries; when the state repeats up and down, a down event is delivered if the final state is down while a down event and a up event are delivered if the final state is up. It's confusable to me. This is the part I'm still not sure about: Roy's patch reduces up/down to just down, but leaves down/up alone. I'm guessing that Roy expects applications like dhcpcd to want to clear some state if the link ever goes down, but to be uninterested in knowing that the link went up for a microsecond and then back down again. Can we pass events as they are as much as possible? I don't complain that event reductions in the kernel, but I think it should be down based on time series manner (e.g., pick latest three events), not based on some priority things. If we accept event reductions, we can do it with bit-encoding (w/o a linked list (memory allocations)), for example represent the state as 2 bits and encode event series into a variable (say 16 events in int). Note that this queueing takes effect only if the link state changes multiple times within maybe a few microseconds, before the softint can run. If your link state is changing that many times so quickly, losing an event or two is probably the least of your worries -- but you're probably more interested in seeing something like ...down...up than ...up/up/up.
Re: CVS commit: src/sys/net
On 2016-02-17 02:49, Ryota Ozaki wrote: If you don't read the patch, here is the comment I added to the if_link_state_change() function: * Queue a change in the interface link state. * * The queue itself is very limited: * no state can be queued more than once * a higher priority state will remove lower priority states * * The queue state priority in descending order is DOWN, UNKNOWN, UP. * Rationale is that if the kernel can't process the link state change queue * fast enough then userland has no chance of catching up. * Any lossage is logged to the console. * The queue state priority is ordered so that link state aware programs * will still have the correct end result regardless of any lossage. Any comments or objections? I worry about the priority order, where it comes from? I feel that the kernel does too much decision; which event is important. I think it depends on applications. Fair enough. Can you state a use case for an application needing the full ordering? Let us be more specific UP: DOWN - UP Is passed fully DOWN: UP - DOWN UP is not passed to userland. What do you want to achieve in userland during the UP when the kernel already considers the link DOWN? And the priority provides asymmetric event deliveries; when the state repeats up and down, a down event is delivered if the final state is down while a down event and a up event are delivered if the final state is up. It's confusable to me. It's not that confusing :) If you're an application, what benefit do you have of processing an UP state on the link when a DOWN state is about to follow? Aside from logging it (which we still do, just not via a route message just a console diagnostic) what would you do you with it? Because a DOWN state is coming, it cannot possibly do anything over the interface, so why bother? Can we pass events as they are as much as possible? I don't complain that event reductions in the kernel, but I think it should be down based on time series manner (e.g., pick latest three events), not based on some priority things. If we accept event reductions, we can do it with bit-encoding (w/o a linked list (memory allocations)), for example represent the state as 2 bits and encode event series into a variable (say 16 events in int). Of course we could. But until we understand a reason why this is needed why should we? Roy
Re: CVS commit: src/sys/net
On Wed, Feb 17, 2016 at 10:46 AM, Roy Marples wrote: > On Tuesday 16 February 2016 22:20:37 Taylor R Campbell wrote: >> Except for an issue with queueing discussed privately (scheduling a >> softint that is already scheduled won't cause it to run again, so >> if_link_state_change_si needs to process the whole queue in one go), >> that approach looks fine to me, although as we also discussed >> privately we can easily compact it into a three-bit mask with a >> trivial update instead of a whole array of states. > > Patch attached to solve change from a priority array into a bit mask approach > where we swallow the whole queue in a softint. > Thanks to Taylor for some reviews and suggestions. > >> This has the consequence that if the link goes up/down in quick >> succession, and then up/down in quick succession, &c., with the queue >> processed after each up/down transition, userland will never be >> notified. However, if the link goes down/up, then down/up, &c., the >> userland will be notified of all the transitions. Roy claims that >> that's OK, and I'm inclined to believe the author of dhcpcd about >> this. > > If you don't read the patch, here is the comment I added to the > if_link_state_change() function: > > * Queue a change in the interface link state. > * > * The queue itself is very limited: > * no state can be queued more than once > * a higher priority state will remove lower priority states > * > * The queue state priority in descending order is DOWN, UNKNOWN, UP. > * Rationale is that if the kernel can't process the link state change queue > * fast enough then userland has no chance of catching up. > * Any lossage is logged to the console. > * The queue state priority is ordered so that link state aware programs > * will still have the correct end result regardless of any lossage. > > Any comments or objections? I worry about the priority order, where it comes from? I feel that the kernel does too much decision; which event is important. I think it depends on applications. And the priority provides asymmetric event deliveries; when the state repeats up and down, a down event is delivered if the final state is down while a down event and a up event are delivered if the final state is up. It's confusable to me. Can we pass events as they are as much as possible? I don't complain that event reductions in the kernel, but I think it should be down based on time series manner (e.g., pick latest three events), not based on some priority things. If we accept event reductions, we can do it with bit-encoding (w/o a linked list (memory allocations)), for example represent the state as 2 bits and encode event series into a variable (say 16 events in int). ozaki-r
Re: CVS commit: src/sys/net
On Tuesday 16 February 2016 22:20:37 Taylor R Campbell wrote: > Except for an issue with queueing discussed privately (scheduling a > softint that is already scheduled won't cause it to run again, so > if_link_state_change_si needs to process the whole queue in one go), > that approach looks fine to me, although as we also discussed > privately we can easily compact it into a three-bit mask with a > trivial update instead of a whole array of states. Patch attached to solve change from a priority array into a bit mask approach where we swallow the whole queue in a softint. Thanks to Taylor for some reviews and suggestions. > This has the consequence that if the link goes up/down in quick > succession, and then up/down in quick succession, &c., with the queue > processed after each up/down transition, userland will never be > notified. However, if the link goes down/up, then down/up, &c., the > userland will be notified of all the transitions. Roy claims that > that's OK, and I'm inclined to believe the author of dhcpcd about > this. If you don't read the patch, here is the comment I added to the if_link_state_change() function: * Queue a change in the interface link state. * * The queue itself is very limited: * no state can be queued more than once * a higher priority state will remove lower priority states * * The queue state priority in descending order is DOWN, UNKNOWN, UP. * Rationale is that if the kernel can't process the link state change queue * fast enough then userland has no chance of catching up. * Any lossage is logged to the console. * The queue state priority is ordered so that link state aware programs * will still have the correct end result regardless of any lossage. Any comments or objections? RoyIndex: sys/net/if.c === RCS file: /cvsroot/src/sys/net/if.c,v retrieving revision 1.324 diff -u -r1.324 if.c --- sys/net/if.c 15 Feb 2016 08:08:04 - 1.324 +++ sys/net/if.c 17 Feb 2016 01:36:07 - @@ -603,7 +603,7 @@ ifp->if_broadcastaddr = 0; /* reliably crash if used uninitialized */ ifp->if_link_state = LINK_STATE_UNKNOWN; - ifp->if_old_link_state = LINK_STATE_UNKNOWN; + ifp->if_link_queue = 0; ifp->if_capenable = 0; ifp->if_csum_flags_tx = 0; @@ -1563,9 +1563,18 @@ } /* - * Handle a change in the interface link state. - * XXX: We should listen to the routing socket in-kernel rather - * than calling in6_if_link_* functions directly from here. + * Queue a change in the interface link state. + * + * The queue itself is very limited: + * no state can be queued more than once + * a higher priority state will remove lower priority states + * + * The queue state priority in descending order is DOWN, UNKNOWN, UP. + * Rationale is that if the kernel can't process the link state change queue + * fast enough then userland has no chance of catching up. + * The queue state priority is ordered so that link state aware programs + * will still have the correct end result regardless of any lossage. + * Any lossage is logged to the console. */ void if_link_state_change(struct ifnet *ifp, int link_state) @@ -1573,30 +1582,68 @@ int s; s = splnet(); - if (ifp->if_link_state == link_state) { - splx(s); - return; + + /* If state is the same and there is no current queue, do nothing. */ + if (ifp->if_link_state == link_state && ifp->if_link_queue == 0x00) + goto out; + +#define LINK_STATE_LOST "%s: lost %s notification\n" + switch (link_state) { + case LINK_STATE_DOWN: + if (isset(&ifp->if_link_queue, LINK_STATE_UP)) { + printf(LINK_STATE_LOST, + ifp->if_xname, "LINK_STATE_UP"); + clrbit(&ifp->if_link_queue, LINK_STATE_UP); + } + if (isset(&ifp->if_link_queue, LINK_STATE_UNKNOWN)) { + printf(LINK_STATE_LOST, + ifp->if_xname, "LINK_STATE_UNKNOWN"); + clrbit(&ifp->if_link_queue, LINK_STATE_UNKNOWN); + } + setbit(&ifp->if_link_queue, LINK_STATE_DOWN); + break; + case LINK_STATE_UNKNOWN: + if (isset(&ifp->if_link_queue, LINK_STATE_UP)) { + printf(LINK_STATE_LOST, + ifp->if_xname, "LINK_STATE_UP"); + clrbit(&ifp->if_link_queue, LINK_STATE_UP); + } + setbit(&ifp->if_link_queue, LINK_STATE_UNKNOWN); + break; + case LINK_STATE_UP: + setbit(&ifp->if_link_queue, LINK_STATE_UP); + break; + default: +#ifdef DEBUG + printf("%s: invalid link state %d\n", + ifp->if_xname, link_state); +#endif + goto out; } +#undef LINK_STATE_LOST - ifp->if_link_state = link_state; softint_schedule(ifp->if_link_si); +out: splx(s); } - +/* + * Handle a change in the interface link state. + * Must be called at splnet(). + */ static void -if_link_state_change_si(void *arg) +if_link_state_change0(struct ifnet *ifp, int link_state) { - struct ifnet *ifp = arg; - int s; - int link_state, old_link_state; + int old_link_state; struct domain *dp; - s = splnet(); - link_state = ifp->if_link_state; - old_link_state = ifp->if_old_link_state; - ifp->if_old_link_state = ifp->if_l
Re: CVS commit: src/sys/net
On Tuesday 16 February 2016 22:20:37 Taylor R Campbell wrote: > This has the consequence that if the link goes up/down in quick > succession, and then up/down in quick succession, &c., with the queue > processed after each up/down transition, userland will never be > notified. However, if the link goes down/up, then down/up, &c., the > userland will be notified of all the transitions. Roy claims that > that's OK, and I'm inclined to believe the author of dhcpcd about > this. For the up/down - process - up/down - process in quick succession we can print a diagnostic to the console if the up is trimmed. Roy
Re: CVS commit: src/sys/net
Date: Tue, 16 Feb 2016 13:24:15 + From: Roy Marples I found the time to work on this, here is the patch I've been running for an hour or so upping and downing interfaces. The rule in Roy's patch is that each new state changes kicks out all higher-priority transitions, and is queued after all lower-priority transitions, with priorities DOWN = -1, UNKNOWN = 0, UP = 1. This has the effect that each queue has the regular expression down? unknown? up? with the following equivalences: down/down = down down/unknown = down/unknown down/up = down/up unknown/down = down unknown/unknown = unknown unknown/up = unknown/up up/down = down up/unknown = unknown up/up = up Except for an issue with queueing discussed privately (scheduling a softint that is already scheduled won't cause it to run again, so if_link_state_change_si needs to process the whole queue in one go), that approach looks fine to me, although as we also discussed privately we can easily compact it into a three-bit mask with a trivial update instead of a whole array of states. This has the consequence that if the link goes up/down in quick succession, and then up/down in quick succession, &c., with the queue processed after each up/down transition, userland will never be notified. However, if the link goes down/up, then down/up, &c., the userland will be notified of all the transitions. Roy claims that that's OK, and I'm inclined to believe the author of dhcpcd about this.
Re: CVS commit: src/sys/net
Date: Tue, 16 Feb 2016 08:38:55 + From: Roy Marples The queue can be kept quite short because we don't care about any prior entries each time state changes to LINK_STATE_DOWN. Also, we don't care about any prior entries (other than LINK_STATE_DOWN) when setting LINK_STATE_UNKNOWN. Finally we set LINK_STATE_UP. So the queue should only ever hold a maximum of 3 items. Could even work it like so [...] Thoughts? Sounds better than my suggestion, from someone who knows better how the link state events are used! Maybe I should have looked at the list of valid states, too -- all three of them. We don't even need an array, really. If I understand correctly, we obviously have the following equivalences of queues: (1) unknown/unknown = unknown (2) unknown/down= down (3) unknown/up = down/up (4)down/unknown = down/unknown (5)down/down= down (6)down/up = down/up (7) up/unknown = up/unknown (8) up/down= up/down (9) up/up = up I can reasonably believe (10) up/down/up = down/up and (11) down/up/down = down, too. From these, we can reduce almost all 3-queues to 2-queues: unknown/unknown/unknown =(1) unknown/unknown =(1) unknown unknown/unknown/down =(1) unknown/down =(2) down unknown/unknown/up =(3) unknown/up =(2) down/up unknown/down/unknown =(2) down/unknown unknown/down/down =(2) down/down =(5) down unknown/down/up =(2) down/down/up =(5) down/up unknown/up/unknown =(3) down/up/unknown ? unknown/up/down =(3) down/up/down =(11) down unknown/up/up =(3) down/up/up =(9) down/up down/unknown/unknown =(1) down/unknown down/unknown/down =(2) down/down =(5) down down/unknown/up =(3) down/down/up =(5) down/up down/down/unknown =(5) down/unknown down/down/down =(5) down/down =(5) down down/down/up =(5) down/up down/up/unknown ? down/up/down =(11) down down/up/up =(9) down/up up/unknown/unknown =(1) up/unknown up/unknown/down =(2) up/down up/unknown/up = up/down/up =(10) down/up up/down/unknown ? up/down/down =(5) up/down up/down/up =(10) down/up up/up/unknown = up/unknown up/up/down = up/down up/up/up = up/up = up That leaves us with only two cases where a 3-queue stays a 3-queue and doesn't reduce to a 2-queue. For those cases, the possible 4-queues are: down/up/unknown/unknown =(1) down/up/unknown down/up/unknown/down =(2) down/up/down =(11) down down/up/unknown/up =(3) down/up/down/up =(10) down/up up/down/unknown/unknown =(1) up/down/unknown up/down/unknown/down =(2) up/down/down =(5) up/down up/down/unknown/up =(3) up/down/down/up =(5) up/down/up =(10) down/up Collecting all the possible queues, we get only nine possibilities: down down/unknown down/up down/up/unknown unknown up up/down up/down/unknown up/unknown We could store it as a single int (numbered densely, or by 1 << LINK_STATE_*) and write the queue-append operation all as one switch. Perhaps there are more equivalences I didn't capture, or cases that are never going to occur -- for example, do we ever actually transition from up/down to unknown, or does unknown only happen when the interface is first attached? (I'm not sure.)
Re: CVS commit: src/sys/net
Date: Tue, 16 Feb 2016 16:30:54 +0900 From: Ryota Ozaki On Tue, Feb 16, 2016 at 6:16 AM, Taylor R Campbell wrote: > void > if_link_state_change(struct ifnet *ifp, int link_state) > { > int s; > > s = splnet(); > if (ifp->if_link_state_pending == link_state) > goto out; > > if (ifp->if_link_state_pending != ifp->if_link_state) { What does the conditional intend? I don't remember! It doesn't make sense now that I look at it again. This was just a sketch -- obviously it was missing some parts, such as pool_cache_put, and some parts seem to make no sense. I would just take it out. Anyway I made a patch based on the above code: http://www.netbsd.org/~ozaki-r/link_state_change.diff What I did are to make your code compilable and workable, and limit the number of pending events. Looks reasonable to me. We should have some automatic tests for all this, though!
Re: CVS commit: src/sys/net
On 16/02/2016 08:38, Roy Marples wrote: > On Tuesday 16 February 2016 13:37:28 you wrote: >>> Oh, I see. We shouldn't drop any events of link state changes. >> >> i don't see any point in keeping a huge series of link changes >> if they can be collapsed into an equivalent sequence. with VMs >> and other user-controlled systems it's possible to flood such a >> link state checking engine. > > The queue can be kept quite short because we don't care about any prior > entries each time state changes to LINK_STATE_DOWN. > Also, we don't care about any prior entries (other than LINK_STATE_DOWN) when > setting LINK_STATE_UNKNOWN. > Finally we set LINK_STATE_UP. > > So the queue should only ever hold a maximum of 3 items. Could even work it > like so I found the time to work on this, here is the patch I've been running for an hour or so upping and downing interfaces. OK to commit? Roy Index: sys/net/if.c === RCS file: /cvsroot/src/sys/net/if.c,v retrieving revision 1.324 diff -u -r1.324 if.c --- sys/net/if.c15 Feb 2016 08:08:04 - 1.324 +++ sys/net/if.c16 Feb 2016 13:22:43 - @@ -603,7 +603,6 @@ ifp->if_broadcastaddr = 0; /* reliably crash if used uninitialized */ ifp->if_link_state = LINK_STATE_UNKNOWN; - ifp->if_old_link_state = LINK_STATE_UNKNOWN; ifp->if_capenable = 0; ifp->if_csum_flags_tx = 0; @@ -1562,41 +1561,73 @@ } } +/* Priority list for link state changes */ +static const int +if_link_state_prio[LINK_STATE_MAX] = { + 0, /* LINK_STATE_UNKNOWN */ + -1, /* LINK_STATE_DOWN */ + 1, /* LINK_STATE_UP */ +}; + /* * Handle a change in the interface link state. - * XXX: We should listen to the routing socket in-kernel rather - * than calling in6_if_link_* functions directly from here. */ void if_link_state_change(struct ifnet *ifp, int link_state) { - int s; + int s, i; + + KASSERT(link_state >= 0 && link_state < LINK_STATE_MAX); s = splnet(); - if (ifp->if_link_state == link_state) { - splx(s); - return; + if (ifp->if_link_state == link_state && ifp->if_link_queue_len == 0) + goto out; + + for (i = 0; i < ifp->if_link_queue_len; i++) { + if (if_link_state_prio[link_state] <= + if_link_state_prio[ifp->if_link_queue[i]]) + { + ifp->if_link_queue[i] = link_state; + ifp->if_link_queue_len = i + 1; + /* Because we are trimming the queue, +* there is no need to call softint_schedule again. */ + goto out; + } } - ifp->if_link_state = link_state; + KASSERT(ifp->if_link_queue_len + 1 < __arraycount(ifp->if_link_queue)); + ifp->if_link_queue[ifp->if_link_queue_len++] = link_state; softint_schedule(ifp->if_link_si); +out: splx(s); } - static void if_link_state_change_si(void *arg) { struct ifnet *ifp = arg; - int s; + int s, i; int link_state, old_link_state; struct domain *dp; s = splnet(); - link_state = ifp->if_link_state; - old_link_state = ifp->if_old_link_state; - ifp->if_old_link_state = ifp->if_link_state; + KASSERT(ifp->if_link_queue_len >= 0); /* unlikely */ + if (ifp->if_link_queue_len == 0) + goto out; + + /* Pop the link state change from the queue */ + link_state = ifp->if_link_queue[0]; + ifp->if_link_queue_len--; + for (i = 0; i < ifp->if_link_queue_len; i++) + ifp->if_link_queue[i] = ifp->if_link_queue[i + 1]; + + /* Ensure link state is actually changing */ + if (ifp->if_link_state == link_state) + goto out; + + old_link_state = ifp->if_link_state; + ifp->if_link_state = link_state; #ifdef DEBUG log(LOG_DEBUG, "%s: link state %s (was %s)\n", ifp->if_xname, @@ -1640,6 +1671,7 @@ dp->dom_if_link_state_change(ifp, link_state); } +out: splx(s); } Index: sys/net/if.h === RCS file: /cvsroot/src/sys/net/if.h,v retrieving revision 1.197 diff -u -r1.197 if.h --- sys/net/if.h16 Feb 2016 01:31:26 - 1.197 +++ sys/net/if.h16 Feb 2016 13:22:43 - @@ -191,10 +191,12 @@ /* * Values for if_link_state. + * When changing these values, consider if_link_state_prio in if.c. */ #defineLINK_STATE_UNKNOWN 0 /* link invalid/unknown */ #defineLINK_STATE_DOWN 1 /* link is down */ #defineLINK_STATE_UP 2 /* link is up */ +#defineLINK_STATE_MAX 3 /* size of link states */ /* * Structure defining a queue for a network interface. @@
Re: CVS commit: src/sys/net
On 16/02/2016 08:38, Roy Marples wrote: > for (i = 0; i < ifp->if_link_queue_len; i++) { > if (link_state <= ifp->if_link_queue[i]) { > ifp->if_link_queue[i] = link_state; > ifp->if_link_queue_len = i; This should be ifp->if_link_queue_len = i + 1; > /* Because we are trimming the queue, >* there is no need to call softint_schedule again. */ > goto out; > } > } Roy
Re: CVS commit: src/sys/net
On Tuesday 16 February 2016 13:37:28 you wrote: > > Oh, I see. We shouldn't drop any events of link state changes. > > i don't see any point in keeping a huge series of link changes > if they can be collapsed into an equivalent sequence. with VMs > and other user-controlled systems it's possible to flood such a > link state checking engine. The queue can be kept quite short because we don't care about any prior entries each time state changes to LINK_STATE_DOWN. Also, we don't care about any prior entries (other than LINK_STATE_DOWN) when setting LINK_STATE_UNKNOWN. Finally we set LINK_STATE_UP. So the queue should only ever hold a maximum of 3 items. Could even work it like so void if_link_state_change(struct ifnet *ifp, int link_state) { int s, i; s = splnet(); if (ifp->if_link_queue_len == 0 && ifp->link_state == link_state) goto out; for (i = 0; i < ifp->if_link_queue_len; i++) { if (link_state <= ifp->if_link_queue[i]) { ifp->if_link_queue[i] = link_state; ifp->if_link_queue_len = i; /* Because we are trimming the queue, * there is no need to call softint_schedule again. */ goto out; } } /* Add bounds check here for sanity */ ifp->if_link_queue[ifp->if_link_queue_len++] = link_state; softint_schedule(ifp->if_link_state_softint); out: splx(s); } static void if_link_state_change_si(void *arg) { int s, link_state, i; s = splnet(); if (ifp->if_link_queue_len == 0) goto out; link_state = ifp->if_link_queue[0]; ifp->if_link_queue_len--; for (i = 0; i < ifp->link_queue_len; i++) ifp->if_link_queue[i] = ifp->if_link_queue[i + 1]; /* process state change as normal */ out: splx(s); } Thoughts? Roy
Re: CVS commit: src/sys/net
On Tue, Feb 16, 2016 at 6:16 AM, Taylor R Campbell wrote: >Date: Mon, 15 Feb 2016 11:06:52 + >From: Roy Marples > >On 15/02/2016 10:32, Ryota Ozaki wrote: >> I think we can fix by returning from if_link_state_change_si >> if ifp->if_link_state == ifp->if_old_link_state. Is it correct? > >Then we're not doing what we should because the protocol actions may not >fire. > >Can we not extend softint_schedule to take some user data? That way, >each call (which I'm assuming would be sequential) can carry the link >state change in the user data which can then be applied to the interface. > > Changing softint is not really a good idea -- it is a significant > semantic change that most current users don't require, because they > already have existing queue mechanisms, e.g. pktq. > > If it is necessary to report every link state transition, then we need > to create a queueing mechanism -- and possibly drop some transitions > in the case of memory shortage, as we already do in route_enqueue. > > Maybe something like this, with a queue and a link_state_pending field > to avoid creating extra queue entries unnecessarily but guaranteeing > that the most recent link state change will take effect, and be > preceded by LINK_STATE_DOWN if anything was lost: Oops. Our (IIJ) local changes have a similar code (softint + queuing) and it works for long time. So I concur the design. > > struct link_state_change { > SIMPLEQ_ENTRY(link_state_change)lsc_entry; > int lsc_state; > boollsc_lost; > }; > > static pool_cache_t link_state_change_pc __read_mostly; > > void > if_link_state_change(struct ifnet *ifp, int link_state) > { > int s; > > s = splnet(); > if (ifp->if_link_state_pending == link_state) > goto out; > > if (ifp->if_link_state_pending != ifp->if_link_state) { What does the conditional intend? > struct link_state_change *lsc; > > lsc = pool_cache_get(link_state_change_pc, PR_NOWAIT); > if (lsc == NULL) { > ifp->if_link_state_lost = true; > goto change; > } > > lsc->lsc_state = link_state; > lsc->lsc_lost = ifp->if_link_state_lost; > SIMPLEQ_INSERT_TAIL(&ifp->if_link_state_changes, lsc, > lsc_entry); > } > > ifp->if_link_state_lost = false; > change: ifp->if_link_state_pending = link_state; > softint_schedule(ifp->if_link_state_softint); > out:splx(s); > } > > static void > if_link_state_change_softintr(void *cookie) > { > struct ifnet *ifp = cookie; > struct link_state_change *lsc; > int s; > > s = splnet(); > while (!SIMPLEQ_EMPTY(&ifp->if_link_state_changes)) { > lsc = SIMPLEQ_HEAD(&ifp->if_link_state_changes); > SIMPLEQ_REMOVE_HEAD(&ifp->if_link_state_changes, lsc_entry); > if_link_state_change_1(ifp, lsc->lsc_state, lsc->lsc_lost); > } > > if (ifp->if_link_state_pending != ifp->if_link_state) > if_link_state_change_1(ifp, ifp->if_link_state_pending, > ifp->if_link_state_lost); > ifp->if_link_state_lost = false; > splx(s); > } > > static void > if_link_state_change_1(struct ifnet *ifp, int link_state, bool lost) > { > > if (lost || > (ifp->if_link_state == LINK_STATE_UP && > link_state == LINK_STATE_UNKNOWN)) { > /* pretend LINK_STATE_DOWN first */ > } > > ifp->if_link_state = link_state; > /* call domain dom_if_link_state_change callbacks */ > /* call rt_ifmsg */ > } Anyway I made a patch based on the above code: http://www.netbsd.org/~ozaki-r/link_state_change.diff What I did are to make your code compilable and workable, and limit the number of pending events. Thanks, ozaki-r
Re: CVS commit: src/sys/net
On Tue, Feb 16, 2016 at 11:37 AM, matthew green wrote: >> Oh, I see. We shouldn't drop any events of link state changes. > > i don't see any point in keeping a huge series of link changes > if they can be collapsed into an equivalent sequence. with VMs > and other user-controlled systems it's possible to flood such a > link state checking engine. Sorry for my bad wording. I meant we shouldn't drop events _in normal cases_; the number of events should be several. If a huge number of events happen in a short period (e.g., due to hardware troubles, abnormal VMMs, etc.), we should eliminate events to protect the system. ozaki-r
re: CVS commit: src/sys/net
> Oh, I see. We shouldn't drop any events of link state changes. i don't see any point in keeping a huge series of link changes if they can be collapsed into an equivalent sequence. with VMs and other user-controlled systems it's possible to flood such a link state checking engine. .mrg.
Re: CVS commit: src/sys/net
On Mon, Feb 15, 2016 at 8:06 PM, Roy Marples wrote: > On 15/02/2016 10:32, Ryota Ozaki wrote: >> On Mon, Feb 15, 2016 at 6:35 PM, Roy Marples wrote: >>> On 15/02/2016 08:08, Ryota Ozaki wrote: Module Name: src Committed By: ozaki-r Date: Mon Feb 15 08:08:04 UTC 2016 Modified Files: src/sys/net: if.c if.h Log Message: Run if_link_state_change in softint if_link_state_change can execute the network stack that is expected to not run in hardware interrupt (at least now), however network drivers may call it in hardware interrupt. Avoid that by introducing a new softint for if_link_state_change. >>> >>> I don't know anything about softints, so this could be a silly question, >>> but this change looks racey to me. Is there a guarantee anywhere that >>> if_link_state_change() won't be called again before >>> if_link_state_change_si() has been called? >> >> Not guaranteed. Hmm, you're right. There is a race condition; >> ifp->if_link_state can be overwritten before calling >> if_link_state_change_si and it does matter if ifp->if_link_state >> has been back to ifp->if_old_link_state. (If ifp->if_link_state >> is different from ifp->if_old_link_state, it doesn't matter >> because it's just that the later state wins.) >> >> I think we can fix by returning from if_link_state_change_si >> if ifp->if_link_state == ifp->if_old_link_state. Is it correct? > > Then we're not doing what we should because the protocol actions may not > fire. Oh, I see. We shouldn't drop any events of link state changes. > > Can we not extend softint_schedule to take some user data? That way, > each call (which I'm assuming would be sequential) can carry the link > state change in the user data which can then be applied to the interface. I sometimes want that feature but I don't think changing the API is a good idea. ozaki-r
Re: CVS commit: src/sys/net
Date: Mon, 15 Feb 2016 11:06:52 + From: Roy Marples On 15/02/2016 10:32, Ryota Ozaki wrote: > I think we can fix by returning from if_link_state_change_si > if ifp->if_link_state == ifp->if_old_link_state. Is it correct? Then we're not doing what we should because the protocol actions may not fire. Can we not extend softint_schedule to take some user data? That way, each call (which I'm assuming would be sequential) can carry the link state change in the user data which can then be applied to the interface. Changing softint is not really a good idea -- it is a significant semantic change that most current users don't require, because they already have existing queue mechanisms, e.g. pktq. If it is necessary to report every link state transition, then we need to create a queueing mechanism -- and possibly drop some transitions in the case of memory shortage, as we already do in route_enqueue. Maybe something like this, with a queue and a link_state_pending field to avoid creating extra queue entries unnecessarily but guaranteeing that the most recent link state change will take effect, and be preceded by LINK_STATE_DOWN if anything was lost: struct link_state_change { SIMPLEQ_ENTRY(link_state_change)lsc_entry; int lsc_state; boollsc_lost; }; static pool_cache_t link_state_change_pc __read_mostly; void if_link_state_change(struct ifnet *ifp, int link_state) { int s; s = splnet(); if (ifp->if_link_state_pending == link_state) goto out; if (ifp->if_link_state_pending != ifp->if_link_state) { struct link_state_change *lsc; lsc = pool_cache_get(link_state_change_pc, PR_NOWAIT); if (lsc == NULL) { ifp->if_link_state_lost = true; goto change; } lsc->lsc_state = link_state; lsc->lsc_lost = ifp->if_link_state_lost; SIMPLEQ_INSERT_TAIL(&ifp->if_link_state_changes, lsc, lsc_entry); } ifp->if_link_state_lost = false; change: ifp->if_link_state_pending = link_state; softint_schedule(ifp->if_link_state_softint); out:splx(s); } static void if_link_state_change_softintr(void *cookie) { struct ifnet *ifp = cookie; struct link_state_change *lsc; int s; s = splnet(); while (!SIMPLEQ_EMPTY(&ifp->if_link_state_changes)) { lsc = SIMPLEQ_HEAD(&ifp->if_link_state_changes); SIMPLEQ_REMOVE_HEAD(&ifp->if_link_state_changes, lsc_entry); if_link_state_change_1(ifp, lsc->lsc_state, lsc->lsc_lost); } if (ifp->if_link_state_pending != ifp->if_link_state) if_link_state_change_1(ifp, ifp->if_link_state_pending, ifp->if_link_state_lost); ifp->if_link_state_lost = false; splx(s); } static void if_link_state_change_1(struct ifnet *ifp, int link_state, bool lost) { if (lost || (ifp->if_link_state == LINK_STATE_UP && link_state == LINK_STATE_UNKNOWN)) { /* pretend LINK_STATE_DOWN first */ } ifp->if_link_state = link_state; /* call domain dom_if_link_state_change callbacks */ /* call rt_ifmsg */ }
Re: CVS commit: src/sys/net
On Mon, Feb 15, 2016 at 11:06:52AM +, Roy Marples wrote: > Can we not extend softint_schedule to take some user data? That way, > each call (which I'm assuming would be sequential) can carry the link > state change in the user data which can then be applied to the interface. Sounds like a workqueue(9) then... Martin
Re: CVS commit: src/sys/net
On 15/02/2016 10:32, Ryota Ozaki wrote: > On Mon, Feb 15, 2016 at 6:35 PM, Roy Marples wrote: >> On 15/02/2016 08:08, Ryota Ozaki wrote: >>> Module Name: src >>> Committed By: ozaki-r >>> Date: Mon Feb 15 08:08:04 UTC 2016 >>> >>> Modified Files: >>> src/sys/net: if.c if.h >>> >>> Log Message: >>> Run if_link_state_change in softint >>> >>> if_link_state_change can execute the network stack that is expected to >>> not run in hardware interrupt (at least now), however network drivers >>> may call it in hardware interrupt. Avoid that by introducing a new >>> softint for if_link_state_change. >> >> I don't know anything about softints, so this could be a silly question, >> but this change looks racey to me. Is there a guarantee anywhere that >> if_link_state_change() won't be called again before >> if_link_state_change_si() has been called? > > Not guaranteed. Hmm, you're right. There is a race condition; > ifp->if_link_state can be overwritten before calling > if_link_state_change_si and it does matter if ifp->if_link_state > has been back to ifp->if_old_link_state. (If ifp->if_link_state > is different from ifp->if_old_link_state, it doesn't matter > because it's just that the later state wins.) > > I think we can fix by returning from if_link_state_change_si > if ifp->if_link_state == ifp->if_old_link_state. Is it correct? Then we're not doing what we should because the protocol actions may not fire. Can we not extend softint_schedule to take some user data? That way, each call (which I'm assuming would be sequential) can carry the link state change in the user data which can then be applied to the interface. Roy
Re: CVS commit: src/sys/net
On Mon, Feb 15, 2016 at 6:35 PM, Roy Marples wrote: > On 15/02/2016 08:08, Ryota Ozaki wrote: >> Module Name: src >> Committed By: ozaki-r >> Date: Mon Feb 15 08:08:04 UTC 2016 >> >> Modified Files: >> src/sys/net: if.c if.h >> >> Log Message: >> Run if_link_state_change in softint >> >> if_link_state_change can execute the network stack that is expected to >> not run in hardware interrupt (at least now), however network drivers >> may call it in hardware interrupt. Avoid that by introducing a new >> softint for if_link_state_change. > > I don't know anything about softints, so this could be a silly question, > but this change looks racey to me. Is there a guarantee anywhere that > if_link_state_change() won't be called again before > if_link_state_change_si() has been called? Not guaranteed. Hmm, you're right. There is a race condition; ifp->if_link_state can be overwritten before calling if_link_state_change_si and it does matter if ifp->if_link_state has been back to ifp->if_old_link_state. (If ifp->if_link_state is different from ifp->if_old_link_state, it doesn't matter because it's just that the later state wins.) I think we can fix by returning from if_link_state_change_si if ifp->if_link_state == ifp->if_old_link_state. Is it correct? Thanks, ozaki-r
Re: CVS commit: src/sys/net
On 15/02/2016 08:08, Ryota Ozaki wrote: > Module Name: src > Committed By: ozaki-r > Date: Mon Feb 15 08:08:04 UTC 2016 > > Modified Files: > src/sys/net: if.c if.h > > Log Message: > Run if_link_state_change in softint > > if_link_state_change can execute the network stack that is expected to > not run in hardware interrupt (at least now), however network drivers > may call it in hardware interrupt. Avoid that by introducing a new > softint for if_link_state_change. I don't know anything about softints, so this could be a silly question, but this change looks racey to me. Is there a guarantee anywhere that if_link_state_change() won't be called again before if_link_state_change_si() has been called? Roy
Re: CVS commit: src/sys/net
Hi, On 2016/01/13 7:10, David Laight wrote: > On Fri, Dec 11, 2015 at 11:37:29AM +0900, Kengo NAKAHARA wrote: >> # Sorry, I forgot to subscribe source-changes-d ml, I reply as >> # a new mail. >> >> Hi, >> >>> In article <20151210081103.E0FBBFB83%cvs.NetBSD.org@localhost>, >>> Kengo NAKAHARA wrote: -=-=-=-=-=- Module Name: src Committed By: knakahara Date: Thu Dec 10 08:11:03 UTC 2015 Modified Files: src/sys/net: if_gif.c Log Message: kmem_zalloc(, KM_SLEEP) must not return NULL. >>> >>> I would like to solicit opinions about this change and form a general >>> policy. >>> >>> 1. I would like to reduce the use of KASSERT in the kernel, specially >>> in situations like thee above where the test can be centralized (inside >>> kmem_alloc) and avoided without being fatal. >> >> OK, this kmem_zalloc() is not fatal. I should avoid KASSERT here. > > There is no real point using KASSERT() when the immediate > dereference of NULL will have the same effect and is about as > easy to debug. > > Whether kmem_alloc(KM_SLEEP) can return NULL is another matter. > Seems wrong to me - maybe even for impossible allications. > ISTR a problem waiting for KVA and phys-mem being 'difficult'. > I know that the Linux equivalent will return NULL, not sure when. > It would be useful to define that allocation for non-huge > (maybe even several MB) amounts will always sleep. > > David Have you read this thread? http://mail-index.netbsd.org/tech-kern/2015/12/11/msg019762.html I think Mr. Chuck Silvers points out a similar issue. http://mail-index.netbsd.org/tech-kern/2015/12/11/msg019772.html Thanks, -- // Internet Initiative Japan Inc. Device Engineering Section, Core Product Development Department, Product Division, Technology Unit Kengo NAKAHARA
Re: CVS commit: src/sys/net
On Fri, Dec 11, 2015 at 11:37:29AM +0900, Kengo NAKAHARA wrote: > # Sorry, I forgot to subscribe source-changes-d ml, I reply as > # a new mail. > > Hi, > > > In article <20151210081103.E0FBBFB83%cvs.NetBSD.org@localhost>, > > Kengo NAKAHARA wrote: > >>-=-=-=-=-=- > >> > >>Module Name:src > >>Committed By: knakahara > >>Date: Thu Dec 10 08:11:03 UTC 2015 > >> > >>Modified Files: > >>src/sys/net: if_gif.c > >> > >>Log Message: > >>kmem_zalloc(, KM_SLEEP) must not return NULL. > > > > I would like to solicit opinions about this change and form a general > > policy. > > > > 1. I would like to reduce the use of KASSERT in the kernel, specially > > in situations like thee above where the test can be centralized (inside > > kmem_alloc) and avoided without being fatal. > > OK, this kmem_zalloc() is not fatal. I should avoid KASSERT here. There is no real point using KASSERT() when the immediate dereference of NULL will have the same effect and is about as easy to debug. Whether kmem_alloc(KM_SLEEP) can return NULL is another matter. Seems wrong to me - maybe even for impossible allications. ISTR a problem waiting for KVA and phys-mem being 'difficult'. I know that the Linux equivalent will return NULL, not sure when. It would be useful to define that allocation for non-huge (maybe even several MB) amounts will always sleep. David -- David Laight: da...@l8s.co.uk
Re: CVS commit: src/sys/net
Hi, On 2015/12/11 13:00, Christos Zoulas wrote: > On Dec 11, 11:37am, k-nakah...@iij.ad.jp (Kengo NAKAHARA) wrote: > -- Subject: Re: CVS commit: src/sys/net > > | Hi, > | > | > In article <20151210081103.E0FBBFB83%cvs.NetBSD.org@localhost>, > | > Kengo NAKAHARA wrote: > | >>-=-=-=-=-=- > | >> > | >>Module Name: src > | >>Committed By: knakahara > | >>Date: Thu Dec 10 08:11:03 UTC 2015 > | >> > | >>Modified Files: > | >> src/sys/net: if_gif.c > | >> > | >>Log Message: > | >>kmem_zalloc(, KM_SLEEP) must not return NULL. > | > > | > I would like to solicit opinions about this change and form a general > | > policy. > | > > | > 1. I would like to reduce the use of KASSERT in the kernel, specially > | > in situations like thee above where the test can be centralized (inside > | > kmem_alloc) and avoided without being fatal. > | > | OK, this kmem_zalloc() is not fatal. I should avoid KASSERT here. > | > | > | > 2. Static analyzer models understand allocators, but they are not > | > smart enough to determine under which situations they can fail. I > | > believe even kmem_alloc with KM_SLEEP can fail when the size is > | > large enough. > | > | I have a question. The man of kmem(9) says: > | > | kmflags Either of the following: > | > | KM_SLEEPIf the allocation cannot be satisfied immediately, > | sleep until enough memory is available. > > > int ret = uvm_km_kmem_alloc(kmem_va_arena, > (vsize_t)round_page(size), > ((kmflags & KM_SLEEP) ? VM_SLEEP : VM_NOSLEEP) > | VM_INSTANTFIT, (vmem_addr_t *)&p); > if (ret) { > return NULL; > } > > > | > | Is this manual incorrect? > | I'm confused... Could you tell me easily comprehensible manual? > > It is documented that it does not fail if you sleep because this > is the usual scenario. Follow the path to the rebbit hole and you > go the uvm_km_kmem_alloc -> vmem_alloc -> bt_refill etc. and you > see that under certain conditions it will fail even when you sleep. I see. Thank you for your explanation. > Note that bt_refill return values are not even always checked, so > this can fail in other mysterious ways. Oh... > | > So I propose to always check the return value of allocators with > | > an 'if' and not a KASSERT. > | > | There are some codes like "foo = kmem_alloc(size, KM_SLEEP); > | KASSERT(foo != NULL)". > | Should the codes be unified to use not KASSERT' but if'? > > Yes (when it is possible), and the man page for kmem_alloc should be > changed to reflect that. I revert KASSERT in if_gif.c, and I use 'if' instead of KASSERT from now. > Log Message: > Spell out that KM_SLEEP allocations can fail. > > > To generate a diff of this commit: > cvs rdiff -u -r1.17 -r1.18 src/share/man/man9/kmem.9 Thank you for your quickly update! Thanks, -- // Internet Initiative Japan Inc. Device Engineering Section, Core Product Development Department, Product Division, Technology Unit Kengo NAKAHARA
Re: CVS commit: src/sys/net
On Dec 11, 1:25pm, k-nakah...@iij.ad.jp (Kengo NAKAHARA) wrote: -- Subject: Re: CVS commit: src/sys/net | I revert KASSERT in if_gif.c, and I use 'if' instead of KASSERT from now. Thanks. | Thank you for your quickly update! Thanks for bringing this up. I've been meaning to do something about it for a long time ;-) christos
Re: CVS commit: src/sys/net
On Dec 11, 11:37am, k-nakah...@iij.ad.jp (Kengo NAKAHARA) wrote: -- Subject: Re: CVS commit: src/sys/net | Hi, | | > In article <20151210081103.E0FBBFB83%cvs.NetBSD.org@localhost>, | > Kengo NAKAHARA wrote: | >>-=-=-=-=-=- | >> | >>Module Name:src | >>Committed By: knakahara | >>Date: Thu Dec 10 08:11:03 UTC 2015 | >> | >>Modified Files: | >>src/sys/net: if_gif.c | >> | >>Log Message: | >>kmem_zalloc(, KM_SLEEP) must not return NULL. | > | > I would like to solicit opinions about this change and form a general | > policy. | > | > 1. I would like to reduce the use of KASSERT in the kernel, specially | > in situations like thee above where the test can be centralized (inside | > kmem_alloc) and avoided without being fatal. | | OK, this kmem_zalloc() is not fatal. I should avoid KASSERT here. | | | > 2. Static analyzer models understand allocators, but they are not | > smart enough to determine under which situations they can fail. I | > believe even kmem_alloc with KM_SLEEP can fail when the size is | > large enough. | | I have a question. The man of kmem(9) says: | | kmflags Either of the following: | | KM_SLEEPIf the allocation cannot be satisfied immediately, | sleep until enough memory is available. int ret = uvm_km_kmem_alloc(kmem_va_arena, (vsize_t)round_page(size), ((kmflags & KM_SLEEP) ? VM_SLEEP : VM_NOSLEEP) | VM_INSTANTFIT, (vmem_addr_t *)&p); if (ret) { return NULL; } | | Is this manual incorrect? | I'm confused... Could you tell me easily comprehensible manual? It is documented that it does not fail if you sleep because this is the usual scenario. Follow the path to the rebbit hole and you go the uvm_km_kmem_alloc -> vmem_alloc -> bt_refill etc. and you see that under certain conditions it will fail even when you sleep. Note that bt_refill return values are not even always checked, so this can fail in other mysterious ways. | > So I propose to always check the return value of allocators with | > an 'if' and not a KASSERT. | | There are some codes like "foo = kmem_alloc(size, KM_SLEEP); | KASSERT(foo != NULL)". | Should the codes be unified to use not KASSERT' but if'? Yes (when it is possible), and the man page for kmem_alloc should be changed to reflect that. christos
Re: CVS commit: src/sys/net
# Sorry, I forgot to subscribe source-changes-d ml, I reply as # a new mail. Hi, > In article <20151210081103.E0FBBFB83%cvs.NetBSD.org@localhost>, > Kengo NAKAHARA wrote: >>-=-=-=-=-=- >> >>Module Name: src >>Committed By: knakahara >>Date: Thu Dec 10 08:11:03 UTC 2015 >> >>Modified Files: >> src/sys/net: if_gif.c >> >>Log Message: >>kmem_zalloc(, KM_SLEEP) must not return NULL. > > I would like to solicit opinions about this change and form a general > policy. > > 1. I would like to reduce the use of KASSERT in the kernel, specially > in situations like thee above where the test can be centralized (inside > kmem_alloc) and avoided without being fatal. OK, this kmem_zalloc() is not fatal. I should avoid KASSERT here. > 2. Static analyzer models understand allocators, but they are not > smart enough to determine under which situations they can fail. I > believe even kmem_alloc with KM_SLEEP can fail when the size is > large enough. I have a question. The man of kmem(9) says: kmflags Either of the following: KM_SLEEPIf the allocation cannot be satisfied immediately, sleep until enough memory is available. Is this manual incorrect? I'm confused... Could you tell me easily comprehensible manual? > So I propose to always check the return value of allocators with > an 'if' and not a KASSERT. There are some codes like "foo = kmem_alloc(size, KM_SLEEP); KASSERT(foo != NULL)". Should the codes be unified to use not KASSERT' but if'? Thanks, -- // Internet Initiative Japan Inc. Device Engineering Section, Core Product Development Department, Product Division, Technology Unit Kengo NAKAHARA
Re: CVS commit: src/sys/net
chris...@astron.com (Christos Zoulas) writes: > In article <20151210081103.e0fbbf...@cvs.netbsd.org>, > Kengo NAKAHARA wrote: >>-=-=-=-=-=- >> >>Module Name: src >>Committed By: knakahara >>Date: Thu Dec 10 08:11:03 UTC 2015 >> >>Modified Files: >> src/sys/net: if_gif.c >> >>Log Message: >>kmem_zalloc(, KM_SLEEP) must not return NULL. > > I would like to solicit opinions about this change and form a general > policy. > > 1. I would like to reduce the use of KASSERT in the kernel, specially > in situations like thee above where the test can be centralized (inside > kmem_alloc) and avoided without being fatal. > > 2. Static analyzer models understand allocators, but they are not > smart enough to determine under which situations they can fail. I > believe even kmem_alloc with KM_SLEEP can fail when the size is > large enough. > > So I propose to always check the return value of allocators with > an 'if' and not a KASSERT. I think that a function needs to have a contract specified in a contract (and perhaps static analyzer markup). Code should never KASSERT for anything that can not be argued (statically shown) to be always true given contracts. So I agree with you. signature.asc Description: PGP signature