Re: possible stack corruption in icmp_send (__stack_chk_fail)
On Wed, Feb 17, 2021 at 6:18 PM Jason A. Donenfeld wrote: > > On 2/18/21, Willem de Bruijn wrote: > > On Wed, Feb 17, 2021 at 5:56 PM Jason A. Donenfeld wrote: > >> > >> Hi Willem, > >> > >> On Wed, Feb 17, 2021 at 11:27 PM Willem de Bruijn > >> wrote: > >> > A vmlinux image might help. I couldn't find one for this kernel. > >> > >> https://data.zx2c4.com/icmp_send-crash-e03b4a42-706a-43bf-bc40-1f15966b3216.tar.xz > >> has .debs with vmlinuz in there, which you can extract to vmlinux, as > >> well as my own vmlinux elf construction with the symbols added back in > >> by extracting them from kallsyms. That's the best I've been able to > >> do, as all of this is coming from somebody random emailing me. > >> > >> > But could it be > >> > that the forwarded packet is not sensible IPv4? The skb->protocol is > >> > inferred in wg_packet_consume_data_done->ip_tunnel_parse_protocol. > >> > >> The wg calls to icmp_ndo_send are gated by checking skb->protocol: > >> > >> if (skb->protocol == htons(ETH_P_IP)) > >>icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, > >> 0); > >>else if (skb->protocol == htons(ETH_P_IPV6)) > >>icmpv6_ndo_send(skb, ICMPV6_DEST_UNREACH, > >> ICMPV6_ADDR_UNREACH, 0); > >> > >> On the other hand, that code is hit on an error path when > >> wg_check_packet_protocol returns false: > >> > >> static inline bool wg_check_packet_protocol(struct sk_buff *skb) > >> { > >>__be16 real_protocol = ip_tunnel_parse_protocol(skb); > >>return real_protocol && skb->protocol == real_protocol; > >> } > >> > >> So that means, at least in theory, icmp_ndo_send could be called with > >> skb->protocol != ip_tunnel_parse_protocol(skb). I guess I can address > >> that. But... is it actually a problem? > > > > For this forwarded packet that arrived on a wireguard tunnel, > > skb->protocol was originally also set by ip_tunnel_parse_protocol. > > So likely not. > > > > The other issue seems more like a real bug. wg_xmit calling > > icmp_ndo_send without clearing IPCB first. > > > > Bingo! Nice eye! I confirmed the crash by just memsetting 0x41 to cb > before the call. Clearly this should be zeroed by icmp_ndo_xmit. Will > send a patch for icmp_ndo_xmit momentarily and will CC you. Great, let's hope that's it. gtp_build_skb_ip4 zeroes before calling. The fix will be most obviously correct if wg_xmit does the same. But it is quite likely that the other callers, xfrmi_xmit2 and sunvnet_start_xmit_common should zero, too. If so, then icmp_ndo_xmit is the more robust location to do this. Then the Fixes tag will likely go quite a bit farther back, too. Whichever variant of the patch you prefer.
Re: possible stack corruption in icmp_send (__stack_chk_fail)
On 2/18/21, Willem de Bruijn wrote: > On Wed, Feb 17, 2021 at 5:56 PM Jason A. Donenfeld wrote: >> >> Hi Willem, >> >> On Wed, Feb 17, 2021 at 11:27 PM Willem de Bruijn >> wrote: >> > A vmlinux image might help. I couldn't find one for this kernel. >> >> https://data.zx2c4.com/icmp_send-crash-e03b4a42-706a-43bf-bc40-1f15966b3216.tar.xz >> has .debs with vmlinuz in there, which you can extract to vmlinux, as >> well as my own vmlinux elf construction with the symbols added back in >> by extracting them from kallsyms. That's the best I've been able to >> do, as all of this is coming from somebody random emailing me. >> >> > But could it be >> > that the forwarded packet is not sensible IPv4? The skb->protocol is >> > inferred in wg_packet_consume_data_done->ip_tunnel_parse_protocol. >> >> The wg calls to icmp_ndo_send are gated by checking skb->protocol: >> >> if (skb->protocol == htons(ETH_P_IP)) >>icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, >> 0); >>else if (skb->protocol == htons(ETH_P_IPV6)) >>icmpv6_ndo_send(skb, ICMPV6_DEST_UNREACH, >> ICMPV6_ADDR_UNREACH, 0); >> >> On the other hand, that code is hit on an error path when >> wg_check_packet_protocol returns false: >> >> static inline bool wg_check_packet_protocol(struct sk_buff *skb) >> { >>__be16 real_protocol = ip_tunnel_parse_protocol(skb); >>return real_protocol && skb->protocol == real_protocol; >> } >> >> So that means, at least in theory, icmp_ndo_send could be called with >> skb->protocol != ip_tunnel_parse_protocol(skb). I guess I can address >> that. But... is it actually a problem? > > For this forwarded packet that arrived on a wireguard tunnel, > skb->protocol was originally also set by ip_tunnel_parse_protocol. > So likely not. > > The other issue seems more like a real bug. wg_xmit calling > icmp_ndo_send without clearing IPCB first. > Bingo! Nice eye! I confirmed the crash by just memsetting 0x41 to cb before the call. Clearly this should be zeroed by icmp_ndo_xmit. Will send a patch for icmp_ndo_xmit momentarily and will CC you. Jason
Re: possible stack corruption in icmp_send (__stack_chk_fail)
On Wed, Feb 17, 2021 at 5:56 PM Jason A. Donenfeld wrote: > > Hi Willem, > > On Wed, Feb 17, 2021 at 11:27 PM Willem de Bruijn > wrote: > > A vmlinux image might help. I couldn't find one for this kernel. > > https://data.zx2c4.com/icmp_send-crash-e03b4a42-706a-43bf-bc40-1f15966b3216.tar.xz > has .debs with vmlinuz in there, which you can extract to vmlinux, as > well as my own vmlinux elf construction with the symbols added back in > by extracting them from kallsyms. That's the best I've been able to > do, as all of this is coming from somebody random emailing me. > > > But could it be > > that the forwarded packet is not sensible IPv4? The skb->protocol is > > inferred in wg_packet_consume_data_done->ip_tunnel_parse_protocol. > > The wg calls to icmp_ndo_send are gated by checking skb->protocol: > > if (skb->protocol == htons(ETH_P_IP)) >icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0); >else if (skb->protocol == htons(ETH_P_IPV6)) >icmpv6_ndo_send(skb, ICMPV6_DEST_UNREACH, > ICMPV6_ADDR_UNREACH, 0); > > On the other hand, that code is hit on an error path when > wg_check_packet_protocol returns false: > > static inline bool wg_check_packet_protocol(struct sk_buff *skb) > { >__be16 real_protocol = ip_tunnel_parse_protocol(skb); >return real_protocol && skb->protocol == real_protocol; > } > > So that means, at least in theory, icmp_ndo_send could be called with > skb->protocol != ip_tunnel_parse_protocol(skb). I guess I can address > that. But... is it actually a problem? For this forwarded packet that arrived on a wireguard tunnel, skb->protocol was originally also set by ip_tunnel_parse_protocol. So likely not. The other issue seems more like a real bug. wg_xmit calling icmp_ndo_send without clearing IPCB first.
Re: possible stack corruption in icmp_send (__stack_chk_fail)
Hi Willem, On Wed, Feb 17, 2021 at 11:27 PM Willem de Bruijn wrote: > A vmlinux image might help. I couldn't find one for this kernel. https://data.zx2c4.com/icmp_send-crash-e03b4a42-706a-43bf-bc40-1f15966b3216.tar.xz has .debs with vmlinuz in there, which you can extract to vmlinux, as well as my own vmlinux elf construction with the symbols added back in by extracting them from kallsyms. That's the best I've been able to do, as all of this is coming from somebody random emailing me. > But could it be > that the forwarded packet is not sensible IPv4? The skb->protocol is > inferred in wg_packet_consume_data_done->ip_tunnel_parse_protocol. The wg calls to icmp_ndo_send are gated by checking skb->protocol: if (skb->protocol == htons(ETH_P_IP)) icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0); else if (skb->protocol == htons(ETH_P_IPV6)) icmpv6_ndo_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_ADDR_UNREACH, 0); On the other hand, that code is hit on an error path when wg_check_packet_protocol returns false: static inline bool wg_check_packet_protocol(struct sk_buff *skb) { __be16 real_protocol = ip_tunnel_parse_protocol(skb); return real_protocol && skb->protocol == real_protocol; } So that means, at least in theory, icmp_ndo_send could be called with skb->protocol != ip_tunnel_parse_protocol(skb). I guess I can address that. But... is it actually a problem? Jason
Re: possible stack corruption in icmp_send (__stack_chk_fail)
On Wed, Feb 17, 2021 at 1:12 PM Jason A. Donenfeld wrote: > > Hi Netdev & Willem, > > I've received a report of stack corruption -- via the stack protector > check -- in icmp_send. I was sent a vmcore, and was able to extract > the OOPS from there. However, I've been unable to produce the bug and > I don't see where it'd be in the code. That might point to a more > sinister problem, or I'm simply just not seeing it. Apparently the > reporter reproduces it every 40 or so minutes, and has seen it happen > since at least ~5.10. Willem - I'm emailing you because it seems like > you were making a lot of changes to the icmp code around then, and > perhaps you have an intuition. For example, some of the error handling > code takes a pointer to a stack buffer (_objh and such), and maybe > that's problematic? I'm not quite sure. The vmcore, along with the > various kernel binaries I hunted down are here: > https://data.zx2c4.com/icmp_send-crash-e03b4a42-706a-43bf-bc40-1f15966b3216.tar.xz > . The extracted dmesg follows below, in case you or anyone has a > pointer. I've been staring at this for a while and don't see it. > > Jason Sorry, I also don't immediately see a cause. The _objh is a fairly standard approach to accessing skb data with skb_header_pointer. More importantly, that codepath is in the icmp receive path and then guarded by a socket option (inet_sk(sk)->recverr_rfc4884), so unlikely to be exercised here. This is an icmp send in response to a forwarded packet (assuming __qdisc_run dequeued the packet that triggered it). The icmp send code is quite robust against, e.g., undersized packets. But could it be that the forwarded packet is not sensible IPv4? The skb->protocol is inferred in wg_packet_consume_data_done->ip_tunnel_parse_protocol. As for on-stack variable overflow, __ip_options_echo parses the (untrusted) input to write into stack allocated icmp_param. But that is fairly well tested, rarely touched, code by now. Perhaps relevant, though, the opt passed is in skb->cb[], which at should probably not be interpreted as inet_skb_parm (IPCB). static inline void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info) { __icmp_send(skb_in, type, code, info, &IPCB(skb_in)->opt); } A vmlinux image might help. I couldn't find one for this kernel. Or if the kernel can be modified and this path is rarely taken, logging the packet, e.g., with skb_dump.
possible stack corruption in icmp_send (__stack_chk_fail)
Hi Netdev & Willem, I've received a report of stack corruption -- via the stack protector check -- in icmp_send. I was sent a vmcore, and was able to extract the OOPS from there. However, I've been unable to produce the bug and I don't see where it'd be in the code. That might point to a more sinister problem, or I'm simply just not seeing it. Apparently the reporter reproduces it every 40 or so minutes, and has seen it happen since at least ~5.10. Willem - I'm emailing you because it seems like you were making a lot of changes to the icmp code around then, and perhaps you have an intuition. For example, some of the error handling code takes a pointer to a stack buffer (_objh and such), and maybe that's problematic? I'm not quite sure. The vmcore, along with the various kernel binaries I hunted down are here: https://data.zx2c4.com/icmp_send-crash-e03b4a42-706a-43bf-bc40-1f15966b3216.tar.xz . The extracted dmesg follows below, in case you or anyone has a pointer. I've been staring at this for a while and don't see it. Jason Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: __icmp_send+0x5bd/0x5c0 CPU: 0 PID: 959 Comm: kworker/0:2 Kdump: loaded Not tainted 5.11.0-051100-lowlatency #202102142330 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014 Workqueue: wg-crypt-wg0 wg_packet_decrypt_worker [wireguard] Call Trace: show_stack+0x52/0x58 dump_stack+0x70/0x8b panic+0x108/0x2ea ? ip_push_pending_frames+0x42/0x90 ? __icmp_send+0x5bd/0x5c0 __stack_chk_fail+0x14/0x20 __icmp_send+0x5bd/0x5c0 icmp_ndo_send+0x148/0x160 wg_xmit+0x359/0x450 [wireguard] ? harmonize_features+0x19/0x80 xmit_one.constprop.0+0x9f/0x190 dev_hard_start_xmit+0x43/0x90 sch_direct_xmit+0x11d/0x340 __qdisc_run+0x66/0xc0 __dev_xmit_skb+0xd5/0x340 __dev_queue_xmit+0x32b/0x4d0 ? nf_conntrack_double_lock.constprop.0+0x97/0x140 [nf_conntrack] dev_queue_xmit+0x10/0x20 neigh_connected_output+0xcb/0xf0 ip_finish_output2+0x17f/0x470 __ip_finish_output+0x9b/0x140 ? ipv4_confirm+0x4a/0x80 [nf_conntrack] ip_finish_output+0x2d/0xb0 ip_output+0x78/0x110 ? __ip_finish_output+0x140/0x140 ip_forward_finish+0x58/0x90 ip_forward+0x40a/0x4d0 ? ip4_key_hashfn+0xb0/0xb0 ip_sublist_rcv_finish+0x3d/0x50 ip_list_rcv_finish.constprop.0+0x163/0x190 ip_sublist_rcv+0x37/0xb0 ? ip_rcv_finish_core.constprop.0+0x310/0x310 ip_list_rcv+0xf5/0x120 __netif_receive_skb_list_core+0x228/0x250 __netif_receive_skb_list+0x102/0x170 ? dev_gro_receive+0x1b5/0x370 netif_receive_skb_list_internal+0xca/0x190 napi_complete_done+0x7a/0x1a0 wg_packet_rx_poll+0x384/0x400 [wireguard] napi_poll+0x92/0x200 net_rx_action+0xb8/0x1c0 __do_softirq+0xce/0x2b3 asm_call_irq_on_stack+0x12/0x20 do_softirq_own_stack+0x3d/0x50 do_softirq+0x66/0x80 __local_bh_enable_ip+0x62/0x70 _raw_spin_unlock_bh+0x1e/0x20 wg_packet_decrypt_worker+0xf6/0x190 [wireguard] process_one_work+0x217/0x3e0 worker_thread+0x4d/0x350 ? rescuer_thread+0x390/0x390 kthread+0x145/0x170 ? __kthread_bind_mask+0x70/0x70 ret_from_fork+0x22/0x30 Kernel Offset: 0x200 from 0x8100 (relocation range: 0x8000-0xbfff)