Re: kernel panic when removing interface
On 27/11/20(Fri) 15:47, Denis Fondras wrote:
> > It is, I guess a fix should go in net/rtsock.c to prevent adding "-link"
> > entry on routing table different from ifp->if_rdomain.
> >
>
> I came up with this, which is more radical.

Which is not exactly what we want.  This will prevent adding any route to a
routing table different from the interface's rdomain.

What needs to be enforced is a check on requests coming from userland that
try to insert a "-link" route.  Such a check would have the benefit of
documenting that L2 entries should only be inserted in the rdomain table of
an interface.

> Index: route.c
> ===
> RCS file: /cvs/src/sys/net/route.c,v
> retrieving revision 1.397
> diff -u -p -r1.397 route.c
> --- route.c 29 Oct 2020 21:15:27 - 1.397
> +++ route.c 27 Nov 2020 09:39:53 -
> @@ -865,6 +865,8 @@ rtrequest(int req, struct rt_addrinfo *i
>  			return (EINVAL);
>  		ifa = info->rti_ifa;
>  		ifp = ifa->ifa_ifp;
> +		if (tableid != ifp->if_rdomain)
> +			return (EINVAL);
>  		if (prio == 0)
>  			prio = ifp->if_priority + RTP_STATIC;
>
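The narrower check described in this message can be sketched in isolation. This is only an illustration under stated assumptions, not the change that went into the tree: struct ifnet_sketch, struct rt_request_sketch and link_route_check() are hypothetical stand-ins invented here, and the only thing taken from the thread is the comparison of the requested table against ifp->if_rdomain for userland "-link" requests.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for the kernel's struct ifnet. */
struct ifnet_sketch {
	unsigned int if_rdomain;	/* routing domain the interface lives in */
};

/* Hypothetical summary of a routing request; not a kernel structure. */
struct rt_request_sketch {
	unsigned int tableid;		/* table named with route(8) -T */
	bool is_link_route;		/* request asks for an L2 ("-link") entry */
	bool from_userland;		/* request arrived over the routing socket */
};

/*
 * Return 0 if the request may proceed, or EINVAL if a userland request
 * would place an L2 entry in a table other than the interface's rdomain
 * table.  Other routes, and kernel-internal requests, are left alone,
 * which is what makes this less radical than rejecting every mismatch.
 */
int
link_route_check(const struct rt_request_sketch *req,
    const struct ifnet_sketch *ifp)
{
	assert(req != NULL && ifp != NULL);

	if (req->from_userland && req->is_link_route &&
	    req->tableid != ifp->if_rdomain)
		return (EINVAL);
	return (0);
}
```

With an interface in rdomain 0, a request shaped like "route -T1 add ... -link" would be refused by this sketch, while the same request without -T1, or a non-link route in table 1, would still be accepted.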
Re: kernel panic when removing interface
> It is, I guess a fix should go in net/rtsock.c to prevent adding "-link"
> entry on routing table different from ifp->if_rdomain.
>

I came up with this, which is more radical.

Index: route.c
===
RCS file: /cvs/src/sys/net/route.c,v
retrieving revision 1.397
diff -u -p -r1.397 route.c
--- route.c 29 Oct 2020 21:15:27 - 1.397
+++ route.c 27 Nov 2020 09:39:53 -
@@ -865,6 +865,8 @@ rtrequest(int req, struct rt_addrinfo *i
 			return (EINVAL);
 		ifa = info->rti_ifa;
 		ifp = ifa->ifa_ifp;
+		if (tableid != ifp->if_rdomain)
+			return (EINVAL);
 		if (prio == 0)
 			prio = ifp->if_priority + RTP_STATIC;
Re: kernel panic when removing interface
On 26/11/20(Thu) 20:38, Pierre Emeriaud wrote:
> Hello Martin
>
> On Thu, 26 Nov 2020 at 14:27, Martin Pieuchot wrote:
> > >
> > > $ doas route -T1 add 192.0.2.2/32 -link -iface vlan12
> >
> > I wonder if the problem isn't in the validation of these parameters.
> >
> > Should we accept a L2 (-link) entry on a routing table which isn't the
> > routing domain?  If so why does the entry persist in the ARP cache?
>
> Which arp entry are you referring to? The one from the route I added?

Yes.  In the kernel, ARP entries are represented as route entries.  So when
you add a "-link" route, it is an ARP entry.

> > Can you reproduce the problem if you don't specify T1?
>
> No. The routes are correctly removed when the interface is destroyed.
> It only crashes when the routes are added to another (non-empty if
> that matters) rdomain, but again, this was a silly mistake on my side.

Still, silly mistakes should be prevented and not crash the kernel ;)

> I reported it as it might be of interest to fix this for the sake of
> it, but it causes almost no harm.

It is.  I guess a fix should go in net/rtsock.c to prevent adding a "-link"
entry to a routing table different from ifp->if_rdomain.

> PS: I've managed to crash my first router just by waiting a few
> seconds - no need to remove the route - same thing as the second
> router:
> ddb> show panic
> kernel diagnostic assertion "ifp != NULL" failed: file
> "/usr/src/sys/netinet/if_ether.c", line 718
>
> ddb> trace
> db_enter() at db_enter+0x10
> panic(81dc761f) at panic+0x12a
> __assert(81e321c2,81db9f2b,2ce,81d9e429) at __assert+0x2b
> arp_rtrequest(fd800baa10a8,fd800baa10a8,fd801aa63dc0) at arp_rtrequest
> arptimer(8216a090) at arptimer+0x67
> softclock_thread(8000ea40) at softclock_thread+0x13f
> end trace frame: 0x0, count: -6
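The failure mode discussed here, an L2 route entry outliving its interface, can be modelled with a small toy program. It is a sketch under loud assumptions: every name below is hypothetical, the two fixed-size tables merely stand in for routing tables 0 and 1, and the premise that interface teardown only purges entries from the interface's own rdomain table is inferred from the route(8) output in this thread rather than taken from the kernel sources.

```c
#include <stddef.h>

#define NTABLES	2	/* toy model: routing tables 0 and 1 */
#define NROUTES	4	/* slots per table */
#define NIFS	16	/* size of the interface index map */

/* Hypothetical stand-ins; none of these are the kernel's structures. */
struct ifnet_sketch {
	unsigned int if_index;
	unsigned int if_rdomain;
};

struct route_sketch {
	int in_use;
	unsigned int rt_ifidx;	/* interface is referenced by index only */
};

static struct ifnet_sketch *iftable[NIFS];
static struct route_sketch tables[NTABLES][NROUTES];

/* Resolve an interface index; NULL once the interface is gone. */
struct ifnet_sketch *
if_lookup_sketch(unsigned int idx)
{
	return (idx < NIFS ? iftable[idx] : NULL);
}

void
if_attach_sketch(struct ifnet_sketch *ifp)
{
	if (ifp->if_index < NIFS)
		iftable[ifp->if_index] = ifp;
}

/* Add an entry for interface ifidx to table tableid; -1 on failure. */
int
route_add_sketch(unsigned int tableid, unsigned int ifidx)
{
	if (tableid >= NTABLES)
		return (-1);
	for (size_t i = 0; i < NROUTES; i++) {
		if (!tables[tableid][i].in_use) {
			tables[tableid][i].in_use = 1;
			tables[tableid][i].rt_ifidx = ifidx;
			return (0);
		}
	}
	return (-1);
}

/* ASSUMPTION: teardown purges only the interface's own rdomain table. */
void
if_destroy_sketch(struct ifnet_sketch *ifp)
{
	if (ifp->if_rdomain < NTABLES) {
		for (size_t i = 0; i < NROUTES; i++) {
			struct route_sketch *rt = &tables[ifp->if_rdomain][i];

			if (rt->in_use && rt->rt_ifidx == ifp->if_index)
				rt->in_use = 0;
		}
	}
	if (ifp->if_index < NIFS)
		iftable[ifp->if_index] = NULL;
}

/* Count entries whose interface can no longer be resolved. */
int
count_stale_routes(unsigned int tableid)
{
	int n = 0;

	if (tableid >= NTABLES)
		return (0);
	for (size_t i = 0; i < NROUTES; i++) {
		if (tables[tableid][i].in_use &&
		    if_lookup_sketch(tables[tableid][i].rt_ifidx) == NULL)
			n++;
	}
	return (n);
}
```

Attaching an interface with index 8 in rdomain 0, adding one entry to each table and then destroying the interface leaves table 0 clean and one unresolvable entry in table 1. Any later code path that asserts the lookup is non-NULL, as the timer and the delete request do in the traces, would then trip on that entry.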
Re: kernel panic when removing interface
Hello Martin

On Thu, 26 Nov 2020 at 14:27, Martin Pieuchot wrote:
> >
> > $ doas route -T1 add 192.0.2.2/32 -link -iface vlan12
>
> I wonder if the problem isn't in the validation of these parameters.
>
> Should we accept a L2 (-link) entry on a routing table which isn't the
> routing domain?  If so why does the entry persist in the ARP cache?

Which arp entry are you referring to? The one from the route I added?

> Can you reproduce the problem if you don't specify T1?

No. The routes are correctly removed when the interface is destroyed.
It only crashes when the routes are added to another (non-empty if
that matters) rdomain, but again, this was a silly mistake on my side.

I reported it as it might be of interest to fix this for the sake of
it, but it causes almost no harm.

PS: I've managed to crash my first router just by waiting a few
seconds - no need to remove the route - same thing as the second
router:

ddb> show panic
kernel diagnostic assertion "ifp != NULL" failed: file
"/usr/src/sys/netinet/if_ether.c", line 718

ddb> trace
db_enter() at db_enter+0x10
panic(81dc761f) at panic+0x12a
__assert(81e321c2,81db9f2b,2ce,81d9e429) at __assert+0x2b
arp_rtrequest(fd800baa10a8,fd800baa10a8,fd801aa63dc0) at arp_rtrequest
arptimer(8216a090) at arptimer+0x67
softclock_thread(8000ea40) at softclock_thread+0x13f
end trace frame: 0x0, count: -6
Re: kernel panic when removing interface
On Wed, Nov 25, 2020 at 08:31:14PM +0100, Pierre Emeriaud wrote:
> On Wed, 25 Nov 2020 at 10:50, Denis Fondras wrote:
> >
> > Cannot reproduce with latest snapshot on ESX or VMM. Could this be QEMU
> > related
>
> You made me test with my home router, which also crashed, although not
> at the same point: I didn't even have to remove the route, just wait a
> few seconds and the box crashed.
>

I had missed a crucial point in your first message: "rdomain 1".
I can reproduce with this in place.
Re: kernel panic when removing interface
On 24/11/20(Tue) 09:23, Pierre Emeriaud wrote:
> > Trying to use mgre(4), I found what looks like a reliable way to crash
> > the kernel which might be of interest.
> >
> > This machine is a one-month-old-current fairly light router, with inet
> > default within rdomain 1. I will upgrade to a more recent snap
> > shortly.
>
> I just upgraded to OpenBSD 6.8-current (GENERIC) #181: Mon Nov 23
> 20:55:15 MST 2020 and the same thing happens with vlan(4):
>
> $ doas ifconfig vlan12 inet 192.0.2.1/24 parent vio0 vnetid 12
> $ ifconfig vlan
> vlan12: flags=8843 mtu 1500
>         lladdr 02:00:00:ef:3d:d7
>         index 8 priority 0 llprio 3
>         encap: vnetid 12 parent vio0 txprio packet rxprio outer
>         groups: vlan
>         media: Ethernet autoselect
>         status: active
>         inet 192.0.2.1 netmask 0xff00 broadcast 192.0.2.255
>
> $ doas route -T1 add 192.0.2.2/32 -link -iface vlan12

I wonder if the problem isn't in the validation of these parameters.

Should we accept a L2 (-link) entry on a routing table which isn't the
routing domain?  If so why does the entry persist in the ARP cache?

Can you reproduce the problem if you don't specify T1?

> add host 192.0.2.2/32: gateway vlan12
>
> $ route -T1 -n show -inet
> Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
> 192.0.2.2          link#8             UHLS       0        0     -     8 vlan12
>
> $ route -n show -inet
> Internet:
> Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
> 192.0.2/24         192.0.2.1          UCn        0        0     -     4 vlan12
> 192.0.2.1          02:00:00:ef:3d:d7  UHLl       0        0     -     1 vlan12
> 192.0.2.255        192.0.2.1          UHb        0        0     -     1 vlan12
>
> $ doas ifconfig vlan12 down
> $ doas ifconfig vlan12 destroy
>
> $ route -T1 -n show -inet
> Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
> 192.0.2.2          link#8             UHLS       0        0     -     8 (null)
>
> $ doas route -T1 del 192.0.2.2/32
>
> login: panic: kernel diagnostic assertion "ifp != NULL" failed: file
> "/usr/src/sys/net/rtsock.c", line 975
> Stopped at      db_enter+0x10:  popq    %rbp
>     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> *189431  84402      0        0x13          0    0  route
> db_enter() at db_enter+0x10
> panic(81dcc1d7) at panic+0x12a
> __assert(81e32678,81e40e69,3cf,81d9f5fd) at __assert+0x2b
> rtm_output(80071480,8e77ce80,8e77cdd8,40,1) at rtm_output+0x7ee
> route_output(fd801ef36c00,fd801af0d698,0,0) at route_output+0x3c3
> route_usrreq(fd801af0d698,9,fd801ef36c00,0,0,8e720540) at route_usrreq+0x21a
> sosend(fd801af0d698,0,8e77d0d8,0,0,0) at sosend+0x35b
> dofilewritev(8e720540,3,8e77d0d8,0,8e77d1b0) at dofilewritev+0x14d
> sys_write(8e720540,8e77d150,8e77d1b0) at sys_write+0x51
> syscall(8e77d220) at syscall+0x315
> Xsyscall() at Xsyscall+0x128
> end of kernel
> end trace frame: 0x7f7d35b0, count: 4
> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb>
Re: kernel panic when removing interface
Le mer. 25 nov. 2020 à 10:50, Denis Fondras a écrit : > > Cannot reproduce with latest snapshop on ESX or VMM. Could this be QEMU > related You made me test with my home router, which also crashed, although not at the same point: I didn't had to even remove the route, just wait a few seconds and the box crashed. # ifconfig vlan12 down # route -T1 -n show DestinationGatewayFlags Refs Use Mtu Prio Iface default193.253.160.3 UGS622280 - 8 pppoe0 92.135.7.9092.135.7.90UHl0 223749 - 1 pppoe0 127.0.0.1 127.0.0.1 UHl00 32768 1 lo1 192.0.2.2 link#22UHLS 00 - 8 vlan12 193.253.160.3 92.135.7.90UHh11 - 8 pppoe0 # ifconfig vlan12 destroy # route -T1 -n show DestinationGatewayFlags Refs Use Mtu Prio Iface default193.253.160.3 UGS522310 - 8 pppoe0 92.135.7.9092.135.7.90UHl0 223875 - 1 pppoe0 127.0.0.1 127.0.0.1 UHl00 32768 1 lo1 192.0.2.2 link#22UHLS 00 - 8 (null) 193.253.160.3 92.135.7.90UHh11 - 8 pppoe0 ddb> show panic kernel diagnostic assertion "ifp != NULL" failed: file "/usr/src/sys/netinet/if _ether.c", line 718 ddb> trace db_enter() at db_enter+0x10 panic(81dc9e6e) at panic+0x12a __assert(81e32005,81dbe77c,2ce,81d9dd78) at __assert+0x2b arp_rtrequest(fd81442890b0,fd81442890b0,fd81706c8a40) at arp_rtrequest arptimer(82196768) at arptimer+0x67 softclock_thread(8000225ab710) at softclock_thread+0x13f end trace frame: 0x0, count: -6 ddb> ps PID TID PPIDUID S FLAGS WAIT COMMAND 84391 112017 2522 0 30x100083 ttyin ksh 2522 267220 97397 1000 30x10008b pause ksh 97397 178345 76323 1000 30x80 selectscreen 76323 376443 87989 1000 30x8b pause screen 87989 313219 94843 1000 30x10008b pause ksh 94843 363201 25500 1000 30x90 selectsshd 25500 449411 99289 0 30x92 poll sshd 48468 475629 1 0 30x100083 ttyin getty 8259 260779 1 0 30x100098 poll cron 4447 58568 1 53 30x90 kqreadunbound 64735 119756 70626 95 30x100092 kqreadsmtpd 59418 344450 70626103 30x100092 kqreadsmtpd 48863 84545 70626 95 30x100092 kqreadsmtpd 78574 41062 70626 95 30x100092 kqreadsmtpd 7276 148319 70626 95 30x100092 
kqreadsmtpd 36099 298643 70626 95 30x100092 kqreadsmtpd 70626 391238 1 0 30x100080 kqreadsmtpd 81939 483058 21064 94 30x100092 kqreadrad 98404 338566 21064 94 30x100092 kqreadrad 21064 103877 1 0 30x100080 kqreadrad 43922 499246 1 77 30x100090 poll dhcpd 85345 419592 82134 75 30x100092 poll bgpd 88323 140875 82134 75 30x100092 poll bgpd 82134 372758 1 0 30x80 poll bgpd 99289 392141 1 0 30x80 selectsshd 38786 475213 1 0 30x100080 poll ntpd 51963 385665 15776 83 30x100092 poll ntpd 15776 518047 1 83 30x100092 poll ntpd 46470 299929 1 53 30x90 kqreadunbound 5857 76 1628 74 30x100092 bpf pflogd 1628 282883 1 0 30x80 netio pflogd 62819 289053 19138 73 30x100090 kqreadsyslogd 191387172 1 0 30x100082 netio syslogd 89886 469345 1 0 30x80 selecttincd 85583 41922 1 0 30x80 selecttincd 67213 483596 1 77 30x100090 poll dhclient 91175 124773 1 0 30x80 poll dhclient 34408 387824 0 0 3 0x14200 bored wg_crypt 20742 307442 0 0 3 0x14200 bored wg_handshake 31732 301833 0 0 3 0x14200 bored wg_handshake 69836 338401 14508115 30x100092 kqreadslaacd 50454 69378 14508115 30x100092 kqreadslaacd 14508 337980 1 0 30x100080 kqreadslaacd 57366 360388 0 0 3 0x14200 bored smr 93625 78369 0 0 3 0x14200 pgzerozerothread 58286 441845 0 0 3 0x14200 aiodoned aiodoned 89780 325541 0 0 3 0x14200 syncer
Re: kernel panic when removing interface
On Tue, Nov 24, 2020 at 09:23:05AM +0100, Pierre Emeriaud wrote:
> > Trying to use mgre(4), I found what looks like a reliable way to crash
> > the kernel which might be of interest.
> >
> > This machine is a one-month-old-current fairly light router, with inet
> > default within rdomain 1. I will upgrade to a more recent snap
> > shortly.
>
> I just upgraded to OpenBSD 6.8-current (GENERIC) #181: Mon Nov 23
> 20:55:15 MST 2020 and the same thing happens with vlan(4):
>

Cannot reproduce with latest snapshot on ESX or VMM. Could this be QEMU
related?

OpenBSD 6.8-current (GENERIC) #182: Tue Nov 24 18:46:05 MST 2020

test# ifconfig vlan12 inet 192.0.2.1/24 parent vio0 vnetid 12
test# route -T1 add 192.0.2.2/32 -link -iface vlan12
add host 192.0.2.2/32: gateway vlan12
test# route -T1 -n show -inet
Routing tables

Internet:
Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
192.0.2.2          link#5             UHLS       0        0     -     8 vlan12
test# route -n show -inet
Routing tables

Internet:
Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
224/4              127.0.0.1          URS        0        0 32768     8 lo0
127/8              127.0.0.1          UGRS       0        0 32768     8 lo0
127.0.0.1          127.0.0.1          UHhl       1        2 32768     1 lo0
192.0.2/24         192.0.2.1          UCn        0        0     -     4 vlan12
192.0.2.1          fe:e1:bb:d1:64:89  UHLl       0        0     -     1 vlan12
192.0.2.255        192.0.2.1          UHb        0        0     -     1 vlan12
test# ifconfig vlan12 down
test# ifconfig vlan12 destroy
test# route -T1 -n show -inet
Routing tables
test# route -T1 del 192.0.2.2/32
del host 192.0.2.2/32: not in table
test#
Re: kernel panic when removing interface
> Trying to use mgre(4), I found what looks like a reliable way to crash
> the kernel which might be of interest.
>
> This machine is a one-month-old-current fairly light router, with inet
> default within rdomain 1. I will upgrade to a more recent snap
> shortly.

I just upgraded to OpenBSD 6.8-current (GENERIC) #181: Mon Nov 23
20:55:15 MST 2020 and the same thing happens with vlan(4):

$ doas ifconfig vlan12 inet 192.0.2.1/24 parent vio0 vnetid 12
$ ifconfig vlan
vlan12: flags=8843 mtu 1500
        lladdr 02:00:00:ef:3d:d7
        index 8 priority 0 llprio 3
        encap: vnetid 12 parent vio0 txprio packet rxprio outer
        groups: vlan
        media: Ethernet autoselect
        status: active
        inet 192.0.2.1 netmask 0xff00 broadcast 192.0.2.255

$ doas route -T1 add 192.0.2.2/32 -link -iface vlan12
add host 192.0.2.2/32: gateway vlan12

$ route -T1 -n show -inet
Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
192.0.2.2          link#8             UHLS       0        0     -     8 vlan12

$ route -n show -inet
Internet:
Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
192.0.2/24         192.0.2.1          UCn        0        0     -     4 vlan12
192.0.2.1          02:00:00:ef:3d:d7  UHLl       0        0     -     1 vlan12
192.0.2.255        192.0.2.1          UHb        0        0     -     1 vlan12

$ doas ifconfig vlan12 down
$ doas ifconfig vlan12 destroy

$ route -T1 -n show -inet
Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
192.0.2.2          link#8             UHLS       0        0     -     8 (null)

$ doas route -T1 del 192.0.2.2/32

login: panic: kernel diagnostic assertion "ifp != NULL" failed: file
"/usr/src/sys/net/rtsock.c", line 975
Stopped at      db_enter+0x10:  popq    %rbp
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
*189431  84402      0        0x13          0    0  route
db_enter() at db_enter+0x10
panic(81dcc1d7) at panic+0x12a
__assert(81e32678,81e40e69,3cf,81d9f5fd) at __assert+0x2b
rtm_output(80071480,8e77ce80,8e77cdd8,40,1) at rtm_output+0x7ee
route_output(fd801ef36c00,fd801af0d698,0,0) at route_output+0x3c3
route_usrreq(fd801af0d698,9,fd801ef36c00,0,0,8e720540) at route_usrreq+0x21a
sosend(fd801af0d698,0,8e77d0d8,0,0,0) at sosend+0x35b
dofilewritev(8e720540,3,8e77d0d8,0,8e77d1b0) at dofilewritev+0x14d
sys_write(8e720540,8e77d150,8e77d1b0) at sys_write+0x51
syscall(8e77d220) at syscall+0x315
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7d35b0, count: 4
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.
ddb>
kernel panic when removing interface
Hello bugs@ Trying to use mgre(4), I found what looks like a reliable way to crash the kernel which might be of interest. This machine is a one-month-old-current fairly light router, with inet default within rdomain 1. I will upgrade to a more recent snap shortly. *** Setup First I created an mgre interface: # ifconfig mgre0 create # ifconfig mgre0 tunneldomain 1 # ifconfig mgre0 tunneladdr 198.51.100.162 # ifconfig mgre0 inet 192.0.2.1/24 # ifconfig mgre0 up # ifconfig mgre0 mgre0: flags=8841 mtu 1476 index 10 priority 0 llprio 3 encap: vnetid none txprio payload rxprio packet groups: mgre tunnel: inet 198.50.250.162 ttl 64 nodf ecn rdomain 1 inet 192.0.2.1 netmask 0xff00 So far, so good. Then I added a route towards the destination, although in the wrong table (I know... silly me): # route -T1 add -host 192.0.2.2 212.129.29.29 -iface -ifp mgre0 # route -T1 -n show -inet DestinationGatewayFlags Refs Use Mtu Prio Iface default158.69.55.254 UGS514957 - 8 vio0 158.69.55.254 00:ff:ff:ff:ff:ff UHLSh 1 17 - 8 vio0 192.0.2.2 212.129.29.29 UHS00 - 8 mgre0 198.50.250.162 02:00:00:ef:3d:d7 UHLl 0 4445 - 1 vio0 198.50.250.162/32 198.50.250.162 UCn00 - 4 vio0 Adding the correct route worked as expected: # route add -host 192.0.2.2 212.129.29.29 -iface -ifp mgre0 add host 192.0.2.2: gateway 212.129.29.29 $ route -n show -inet DestinationGatewayFlags Refs Use Mtu Prio Iface 192.0.2/24 192.0.2.1 UCn00 - 4 mgre0 192.0.2.1 mgre0 UHl00 - 1 mgre0 192.0.2.2 212.129.29.29 UHS00 - 8 mgre0 And instead of removing the route first (dumb me again), I first downed the interface then destroyed it: # ifconfig mgre0 down # ifconfig mgre0 destroy The route was correctly removed from rdomain 0, but not rdomain 1: $ route -T1 -n show -inet DestinationGatewayFlags Refs Use Mtu Prio Iface default158.69.55.254 UGS8 18400300 - 8 vio0 158.69.55.254 00:ff:ff:ff:ff:ff UHLSh 118558 - 8 vio0 192.0.2.2 212.129.29.29 UHS00 - 8 (null) 198.50.250.162 02:00:00:ef:3d:d7 UHLl 0 2567768 - 1 vio0 198.50.250.162/32 
198.50.250.162 UCn00 - 4 vio0 And then here the host crashes when the following command is entered: $ doas route -T1 del 192.0.2.2 *** Fix: Don't do that. Delete the route before destroying the interface. *** ddb output: ddb> show panic kernel diagnostic assertion "ifp != NULL" failed: file "/usr/src/sys/net/rtsock.c", line 973 ddb> trace db_enter() at db_enter+0x10 panic(81dca15b) at panic+0x12a __assert(81e32a47,81e453a8,3cd,81d9f3ec) at __assert+0x 2b rtm_output(80077780,8e80f410,8e80f368,40,1) at rtm_outp ut+0x7ee route_output(fd801ab0c400,fd800bc8d688,0,0) at route_output+0x3c3 route_usrreq(fd800bc8d688,9,fd801ab0c400,0,0,8e7165a8) at route _usrreq+0x21a sosend(fd800bc8d688,0,8e80f668,0,0,0) at sosend+0x35b dofilewritev(8e7165a8,3,8e80f668,0,8e80f740) at dofilew ritev+0x14d sys_write(8e7165a8,8e80f6e0,8e80f740) at sys_write+0x51 syscall(8e80f7b0) at syscall+0x315 Xsyscall() at Xsyscall+0x128 end of kernel end trace frame: 0x7f7d7830, count: -11 ddb> ps PID TID PPIDUID S FLAGS WAIT COMMAND *24152 141518 73869 0 70x13route 49518 188379 45656 1000 30x100083 ttyin ksh 94287 357692 57872 1000 30x8b pause screen 57872 185593 92296 1000 30x10008b pause ksh 92296 127811 4690 1000 30x90 selectsshd 4690 197172 85507 0 30x92 poll sshd 29860 469114 1 0 30x100083 ttyin getty 73869 393393 45656 1000 30x10008b pause ksh 45656 405711 1 1000 30x80 selectscreen 85507 417107 1 0 30x80 selectsshd 1937 376184 70354 1000 30x100083 ttyin ksh 70354 95367 21602 1000 30x90 selectsshd 21602 505612 1 0 30x92 poll sshd 76106 521289 1 0 30x100098 poll cron 57436 208740 77558 95 30x100092 kqreadsmtpd 48005 93137 77558103 30x100092 kqreadsmtpd 98080 297758 77558 95 30x100092 kqreadsmtpd 31269 34 77558 95 30x100092 kqreadsmtpd 28729 170519 77558 95 30x100092 kqreadsmtpd 35108