[PATCH] batman-adv: Avoid infinite loop trying to resize local TT
If the MTU of one of the attached interfaces becomes too small to transmit
the local translation table, then the table must be resized to fit inside all
fragments (when fragmentation is enabled) or a single packet.

But if the MTU becomes too low to transmit even the header + the VLAN
specific part, then the resizing of the local TT will never succeed. This
can for example happen when the usable space is 110 bytes and 11 VLANs are
on top of batman-adv. In this case, at least 116 bytes would be needed.
There will just be an endless spam of

   batman_adv: batadv0: Forced to purge local tt entries to fit new maximum fragment MTU (110)

in the log, but the function will never finish. The problem here is that the
timeout is halved in each step until it stagnates at 0, after which purging
can no longer reduce the table any further.

There are other scenarios possible with a similar result. The number of
BATADV_TT_CLIENT_NOPURGE entries in the local TT can for example be too
high to fit inside a packet. Such a scenario can therefore also happen with
only a single VLAN + 7 non-purgeable addresses, requiring at least 120
bytes.

While this should be handled proactively when:

* an interface with too low MTU is added
* a VLAN is added
* a non-purgeable local MAC is added
* the MTU of an attached interface is reduced
* the fragmentation setting gets disabled (which most likely requires
  dropping attached interfaces)

not all of these scenarios can be prevented, because batman-adv only
consumes events without the possibility to prevent these actions
(non-purgeable MAC address added, MTU of an attached interface is reduced).
It is therefore necessary to also make sure that the code is able to handle
situations where an incompatible system configuration is already present.
Cc:
Fixes: f7f2fe494388 ("batman-adv: limit local translation table max size")
Reported-by:
Signed-off-by: Sven Eckelmann
---
 net/batman-adv/translation-table.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index b95c3676..2243cec1 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -3948,7 +3948,7 @@ void batadv_tt_local_resize_to_mtu(struct net_device *soft_iface)
 
 	spin_lock_bh(&bat_priv->tt.commit_lock);
 
-	while (true) {
+	while (timeout) {
 		table_size = batadv_tt_local_table_transmit_size(bat_priv);
 		if (packet_size_max >= table_size)
 			break;

---
base-commit: 7d30b5a06020e8c4e53968e4086a0fa6e9fdd947
change-id: 20240212-infinite-tt-resize-285a0d33fff8

Best regards,
-- 
Sven Eckelmann
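To illustrate why changing the loop condition is enough to guarantee termination, here is a minimal user-space sketch of the resize loop. It is not the kernel code: struct toy_table, toy_purge() and toy_resize() are hypothetical names, and the purge model (each purge frees half of the purgeable bytes, a purge with timeout 0 frees all of them) is an assumption made only for the illustration. The max_iter guard exists solely so that the unbounded variant can be observed without actually hanging.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these names exist in batman-adv. */
struct toy_table {
	int fixed_size;     /* bytes purging can never free (header, VLAN part, NOPURGE entries) */
	int purgeable_size; /* bytes that a purge may free */
};

/* Assumed purge model: timeout 0 drops everything purgeable, otherwise half. */
static void toy_purge(struct toy_table *t, int timeout)
{
	if (timeout == 0)
		t->purgeable_size = 0;
	else
		t->purgeable_size /= 2;
}

/*
 * Mimics the shape of the resize loop. With bounded == false the loop
 * condition is "true" (old behaviour), with bounded == true it is
 * "timeout" (patched behaviour). Returns the number of purge rounds, or
 * -1 if max_iter rounds were not enough (stand-in for "never finishes").
 */
static int toy_resize(struct toy_table t, int packet_size_max, int timeout,
		      bool bounded, int max_iter)
{
	int iterations = 0;

	assert(max_iter > 0);

	while (bounded ? timeout != 0 : true) {
		if (packet_size_max >= t.fixed_size + t.purgeable_size)
			break;

		if (iterations == max_iter)
			return -1;

		toy_purge(&t, timeout);
		timeout /= 2; /* stagnates at 0 after enough rounds */
		iterations++;
	}

	return iterations;
}
```

With a table whose fixed part alone (116 bytes) exceeds the 110 usable bytes, the unbounded variant runs into the max_iter guard, while the bounded variant stops once the timeout has been halved down to 0. A table that can be shrunk enough behaves identically in both variants.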
Re: [syzbot] [batman?] BUG: soft lockup in sys_sendmsg
> On Monday, 12 February 2024 11:26:24 CET syzbot wrote:
> [...]
>
> #syz test

This crash does not have a reproducer. I cannot test it.

> From 5984ace8f8df7cf8d6f98ded0eebe7d962028992 Mon Sep 17 00:00:00 2001
> From: Sven Eckelmann
> Date: Mon, 12 Feb 2024 13:10:33 +0100
> Subject: [PATCH] batman-adv: Avoid infinite loop trying to resize local TT
> [...]
Re: [syzbot] [batman?] BUG: soft lockup in sys_sendmsg
On Monday, 12 February 2024 11:26:24 CET syzbot wrote:
> syzbot found the following issue on:
>
> HEAD commit:    41bccc98fb79 Linux 6.8-rc2
> git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
> console output: https://syzkaller.appspot.com/x/log.txt?x=1420011818
> kernel config:  https://syzkaller.appspot.com/x/.config?x=451a1e62b11ea4a6
> dashboard link: https://syzkaller.appspot.com/bug?extid=a6a4b5bb3da165594cff
> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> userspace arch: arm64
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/0772069e29cf/disk-41bccc98.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/659d3f0755b7/vmlinux-41bccc98.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/7780a45c3e51/Image-41bccc98.gz.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+a6a4b5bb3da165594...@syzkaller.appspotmail.com

#syz test

From 5984ace8f8df7cf8d6f98ded0eebe7d962028992 Mon Sep 17 00:00:00 2001
From: Sven Eckelmann
Date: Mon, 12 Feb 2024 13:10:33 +0100
Subject: [PATCH] batman-adv: Avoid infinite loop trying to resize local TT

If the MTU of one of the attached interfaces becomes too small to transmit
the local translation table, then the table must be resized to fit inside all
fragments (when fragmentation is enabled) or a single packet.

But if the MTU becomes too low to transmit even the header + the VLAN
specific part, then the resizing of the local TT will never succeed. This
can for example happen when the usable space is 110 bytes and 11 VLANs are
on top of batman-adv. In this case, at least 116 bytes would be needed.
There will just be an endless spam of

   batman_adv: batadv0: Forced to purge local tt entries to fit new maximum fragment MTU (110)

in the log, but the function will never finish. The problem here is that the
timeout is halved in each step until it stagnates at 0, after which purging
can no longer reduce the table any further.

There are other scenarios possible with a similar result. The number of
BATADV_TT_CLIENT_NOPURGE entries in the local TT can for example be too
high to fit inside a packet. Such a scenario can therefore also happen with
only a single VLAN + 7 non-purgeable addresses, requiring at least 120
bytes.

While this should be handled proactively when:

* an interface with too low MTU is added
* a VLAN is added
* a non-purgeable local MAC is added
* the MTU of an attached interface is reduced
* the fragmentation setting gets disabled (which most likely requires
  dropping attached interfaces)

not all of these scenarios can be prevented, because batman-adv only
consumes events without the possibility to prevent these actions
(non-purgeable MAC address added, MTU of an attached interface is reduced).
It is therefore necessary to also make sure that the code is able to handle
situations where incompatible system configurations are already present.

Cc: sta...@vger.kernel.org
Fixes: a19d3d85e1b8 ("batman-adv: limit local translation table max size")
Reported-by: syzbot+a6a4b5bb3da165594...@syzkaller.appspotmail.com
Signed-off-by: Sven Eckelmann
---
 net/batman-adv/translation-table.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index b95c36765d04..2243cec18ecc 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -3948,7 +3948,7 @@ void batadv_tt_local_resize_to_mtu(struct net_device *soft_iface)
 
 	spin_lock_bh(&bat_priv->tt.commit_lock);
 
-	while (true) {
+	while (timeout) {
 		table_size = batadv_tt_local_table_transmit_size(bat_priv);
 		if (packet_size_max >= table_size)
 			break;
-- 
2.39.2
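The 116 and 120 byte figures in the commit message can be reproduced with a small size calculation. The sketch below is user-space C, not the kernel function: toy_transmit_size() and toy_fits() are hypothetical names, and the per-structure byte counts are assumptions about the on-wire layout (20 byte unicast TVLV packet header, 4 byte TVLV header, 4 byte TT data header, 8 bytes per VLAN, 12 bytes per TT entry). They match the numbers quoted above, but should be checked against the batman-adv packet definitions before relying on them.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed on-wire sizes in bytes; verify against the kernel headers. */
enum {
	TOY_UNICAST_TVLV_LEN = 20, /* unicast TVLV packet header */
	TOY_TVLV_HDR_LEN     = 4,  /* generic TVLV header */
	TOY_TT_DATA_LEN      = 4,  /* TT data header */
	TOY_TT_VLAN_LEN      = 8,  /* per-VLAN part */
	TOY_TT_CHANGE_LEN    = 12, /* per local TT entry */
};

/* Bytes needed to transmit a local TT with the given VLAN and entry counts. */
static int toy_transmit_size(int num_vlan, int num_entries)
{
	assert(num_vlan >= 0 && num_entries >= 0);

	return TOY_UNICAST_TVLV_LEN + TOY_TVLV_HDR_LEN + TOY_TT_DATA_LEN +
	       num_vlan * TOY_TT_VLAN_LEN +
	       num_entries * TOY_TT_CHANGE_LEN;
}

/* True if the table fits into the usable packet space. */
static bool toy_fits(int packet_size_max, int num_vlan, int num_entries)
{
	return packet_size_max >= toy_transmit_size(num_vlan, num_entries);
}
```

Under these assumptions, 11 VLANs need 28 + 11 * 8 = 116 bytes before a single entry is counted, and one VLAN with 7 non-purgeable entries needs 28 + 8 + 7 * 12 = 120 bytes. Neither fits into 110 bytes, no matter how many purgeable entries are dropped.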
Re: [syzbot] [batman?] BUG: soft lockup in sys_sendmsg
On Mon, Feb 12, 2024 at 11:26 AM syzbot wrote:
>
> Hello,
>
> syzbot found the following issue on:
> [...]
> watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [syz-executor.0:28718]
> [...]
Re: [syzbot] [batman?] BUG: soft lockup in sys_sendmsg
On Monday, 12 February 2024 11:41:38 CET Eric Dumazet wrote:
> This patch [1] looks suspicious

Shouldn't be caused by this - but this might be another way to trigger the
problem. The problem would be visible even without it when an MTU is
explicitly set. But the reproducer is not available, so I can't actually
check what is going on.

> I think batman-adv should reject too small MTU values.

You are referring to the size calculated by
batadv_tt_local_table_transmit_size(), right? And yes, I would agree that it
looks suspicious and might not have been correctly integrated in
batadv_max_header_len() when commit a19d3d85e1b8 ("batman-adv: limit local
translation table max size") introduced the code.

But I think we also need to remove interfaces again when receiving
NETDEV_CHANGEMTU and an interface no longer has the correct size. So I have
to check how to do this the best way.

Kind regards,
	Sven
[syzbot] [batman?] BUG: soft lockup in sys_sendmsg
Hello,

syzbot found the following issue on:

HEAD commit:    41bccc98fb79 Linux 6.8-rc2
git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
console output: https://syzkaller.appspot.com/x/log.txt?x=1420011818
kernel config:  https://syzkaller.appspot.com/x/.config?x=451a1e62b11ea4a6
dashboard link: https://syzkaller.appspot.com/bug?extid=a6a4b5bb3da165594cff
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: arm64

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/0772069e29cf/disk-41bccc98.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/659d3f0755b7/vmlinux-41bccc98.xz
kernel image: https://storage.googleapis.com/syzbot-assets/7780a45c3e51/Image-41bccc98.gz.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+a6a4b5bb3da165594...@syzkaller.appspotmail.com

watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [syz-executor.0:28718]
Modules linked in:
irq event stamp: 45929391
hardirqs last enabled at (45929390): [] __local_bh_enable_ip+0x224/0x44c kernel/softirq.c:386
hardirqs last disabled at (45929391): [] __el1_irq arch/arm64/kernel/entry-common.c:499 [inline]
hardirqs last disabled at (45929391): [] el1_interrupt+0x24/0x68 arch/arm64/kernel/entry-common.c:517
softirqs last enabled at (2040): [] softirq_handle_end kernel/softirq.c:399 [inline]
softirqs last enabled at (2040): [] __do_softirq+0xac8/0xce4 kernel/softirq.c:582
softirqs last disabled at (2052): [] spin_lock_bh include/linux/spinlock.h:356 [inline]
softirqs last disabled at (2052): [] batadv_tt_local_resize_to_mtu+0x60/0x154 net/batman-adv/translation-table.c:3949
CPU: 1 PID: 28718 Comm: syz-executor.0 Not tainted 6.8.0-rc2-syzkaller-g41bccc98fb79 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
pstate: 8045 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : should_resched arch/arm64/include/asm/preempt.h:79 [inline]
pc : __local_bh_enable_ip+0x228/0x44c kernel/softirq.c:388
lr : __local_bh_enable_ip+0x224/0x44c kernel/softirq.c:386
sp : 80009a0670b0
x29: 80009a0670c0 x28: 70001340ce60 x27: 80009a0673d0
x26: 00011e860290 x25: d08a9f08 x24: 0001
x23: 1fffe00023d4d3c1 x22: dfff8000 x21: 80008aacbf98
x20: 0202 x19: 00011ea69e08 x18: 80009a066800
x17: 77656e2074696620 x16: 80008031ffc8 x15: 0001
x14: 1fffe0001ba5a290 x13: x12: 0003
x11: 0004 x10: 0003 x9 :
x8 : 02bcd3ae x7 : 80008aacbe30 x6 :
x5 : x4 : 0001 x3 :
x2 : 0002 x1 : 80008aecd7e0 x0 : 80012545c000
Call trace:
 __daif_local_irq_enable arch/arm64/include/asm/irqflags.h:27 [inline]
 arch_local_irq_enable arch/arm64/include/asm/irqflags.h:49 [inline]
 __local_bh_enable_ip+0x228/0x44c kernel/softirq.c:386
 __raw_spin_unlock_bh include/linux/spinlock_api_smp.h:167 [inline]
 _raw_spin_unlock_bh+0x3c/0x4c kernel/locking/spinlock.c:210
 spin_unlock_bh include/linux/spinlock.h:396 [inline]
 batadv_tt_local_purge+0x264/0x2e8 net/batman-adv/translation-table.c:1356
 batadv_tt_local_resize_to_mtu+0xa0/0x154 net/batman-adv/translation-table.c:3956
 batadv_update_min_mtu+0x74/0xa4 net/batman-adv/hard-interface.c:651
 batadv_netlink_set_mesh+0x50c/0x1078 net/batman-adv/netlink.c:500
 genl_family_rcv_msg_doit net/netlink/genetlink.c:1113 [inline]
 genl_family_rcv_msg net/netlink/genetlink.c:1193 [inline]
 genl_rcv_msg+0x874/0xb6c net/netlink/genetlink.c:1208
 netlink_rcv_skb+0x214/0x3c4 net/netlink/af_netlink.c:2543
 genl_rcv+0x38/0x50 net/netlink/genetlink.c:1217
 netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
 netlink_unicast+0x65c/0x898 net/netlink/af_netlink.c:1367
 netlink_sendmsg+0x83c/0xb20 net/netlink/af_netlink.c:1908
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg net/socket.c:745 [inline]
 sys_sendmsg+0x56c/0x840 net/socket.c:2584
 ___sys_sendmsg net/socket.c:2638 [inline]
 __sys_sendmsg+0x26c/0x33c net/socket.c:2667
 __do_sys_sendmsg net/socket.c:2676 [inline]
 __se_sys_sendmsg net/socket.c:2674 [inline]
 __arm64_sys_sendmsg+0x80/0x94 net/socket.c:2674
 __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline]
 invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:51
 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:136
 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:155
 el0_svc+0x54/0x158 arch/arm64/kernel/entry-common.c:678
 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:696
 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0-rc2-syzkaller-g41bccc98fb79 #0
Hardware name: