I have a patch for this accepted upstream that I'll send to the Ubuntu
kernel team in short order. It has been merged to Linus's tree and is
tagged for Stable, but the robots haven't picked it up there yet. It
affects all kernel releases from 5.17 onward, which should put it in
scope for Noble, Oracular, and Plucky.
** Description changed:
[Impact]
If mptcp endpoints are configured on a host using an address that is
external to the host, then the kernel will create an implicit endpoint
with the host's local address when mptcp receives its first flow. If
multiple packets for these local interfaces arrive in parallel, more
than one caller may end up in mptcp_pm_nl_append_new_local_addr because
none found the address in local_addr_list during their call to
mptcp_pm_nl_get_local_id. In this case, the concurrent new_local_addr
calls may delete the address entry created by the previous caller. These
deletes use synchronize_rcu, which can sleep, and sleeping is not
permitted in some of the contexts where this function may be called.
During packet recv, the caller may be in an RCU read-side critical
section and have preemption disabled.
This can lead to a BUG / panic because synchronize_rcu is called in
softirq context.
An example stack:
+ BUG: scheduling while atomic: swapper/2/0/0x00000302
+ Call Trace:
+ <IRQ>
+ dump_stack_lvl (lib/dump_stack.c:117 (discriminator 1))
+ dump_stack (lib/dump_stack.c:124)
+ __schedule_bug (kernel/sched/core.c:5943)
+ schedule_debug.constprop.0 (arch/x86/include/asm/preempt.h:33
kernel/sched/core.c:5970)
+ __schedule (arch/x86/include/asm/jump_label.h:27
include/linux/jump_label.h:207 kernel/sched/features.h:29
kernel/sched/core.c:6621)
+ schedule (arch/x86/include/asm/preempt.h:84 kernel/sched/core.c:6804
kernel/sched/core.c:6818)
+ schedule_timeout (kernel/time/timer.c:2160)
+ wait_for_completion (kernel/sched/completion.c:96
kernel/sched/completion.c:116 kernel/sched/completion.c:127
kernel/sched/completion.c:148)
+ __wait_rcu_gp (include/linux/rcupdate.h:311 kernel/rcu/update.c:444)
+ synchronize_rcu (kernel/rcu/tree.c:3609)
+ mptcp_pm_nl_append_new_local_addr (net/mptcp/pm_netlink.c:966
net/mptcp/pm_netlink.c:1061)
+ mptcp_pm_nl_get_local_id (net/mptcp/pm_netlink.c:1164)
+ mptcp_pm_get_local_id (net/mptcp/pm.c:420)
+ subflow_check_req (net/mptcp/subflow.c:98 net/mptcp/subflow.c:213)
+ subflow_v4_route_req (net/mptcp/subflow.c:305)
+ tcp_conn_request (net/ipv4/tcp_input.c:7216)
+ subflow_v4_conn_request (net/mptcp/subflow.c:651)
+ tcp_rcv_state_process (net/ipv4/tcp_input.c:6709)
+ tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1934)
+ tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2334)
+ ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205 (discriminator 1))
+ ip_local_deliver (include/linux/netfilter.h:314
include/linux/netfilter.h:308 net/ipv4/ip_input.c:254)
+ ip_sublist_rcv_finish (include/net/dst.h:461 net/ipv4/ip_input.c:580)
+ ip_sublist_rcv (net/ipv4/ip_input.c:640)
+ ip_list_rcv (net/ipv4/ip_input.c:675)
+ __netif_receive_skb_list_core (net/core/dev.c:5583 net/core/dev.c:5631)
+ netif_receive_skb_list_internal (net/core/dev.c:5685 net/core/dev.c:5774)
+ napi_complete_done (include/linux/list.h:37 include/net/gro.h:449
include/net/gro.h:444 net/core/dev.c:6114)
+ igb_poll (drivers/net/ethernet/intel/igb/igb_main.c:8244) igb
+ __napi_poll (net/core/dev.c:6582)
+ net_rx_action (net/core/dev.c:6653 net/core/dev.c:6787)
+ handle_softirqs (kernel/softirq.c:553)
+ __irq_exit_rcu (kernel/softirq.c:588 kernel/softirq.c:427
kernel/softirq.c:636)
+ irq_exit_rcu (kernel/softirq.c:651)
+ common_interrupt (arch/x86/kernel/irq.c:247 (discriminator 14))
+ </IRQ>
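The trailing hex value on the BUG line above is the task's preempt count. As a rough aid for reading it, here is a small decoder sketch. The field layout (8 preempt bits, 8 softirq bits, 4 hardirq bits, 4 NMI bits) is an assumption taken from mainline include/linux/preempt.h and may differ on other kernels; the struct and function names are hypothetical.

```c
#include <stdio.h>

/* Assumed layout, per mainline include/linux/preempt.h. */
#define PREEMPT_MASK 0x000000ffu
#define SOFTIRQ_MASK 0x0000ff00u
#define HARDIRQ_MASK 0x000f0000u
#define NMI_MASK     0x00f00000u

/* Hypothetical container for the decoded fields. */
struct preempt_fields {
    unsigned int preempt;  /* preempt_disable() nesting depth */
    unsigned int softirq;  /* softirq / bh-disable field */
    unsigned int hardirq;  /* hardirq nesting */
    unsigned int nmi;      /* NMI nesting */
};

/* Split a raw preempt count into its per-context fields. */
static struct preempt_fields decode_preempt_count(unsigned int pc)
{
    struct preempt_fields f;

    f.preempt = pc & PREEMPT_MASK;
    f.softirq = (pc & SOFTIRQ_MASK) >> 8;
    f.hardirq = (pc & HARDIRQ_MASK) >> 16;
    f.nmi = (pc & NMI_MASK) >> 20;
    return f;
}

/* Print the fields in a compact, human-readable form. */
static void print_preempt_count(unsigned int pc)
{
    struct preempt_fields f = decode_preempt_count(pc);

    printf("preempt=%u softirq=%u hardirq=%u nmi=%u\n",
           f.preempt, f.softirq, f.hardirq, f.nmi);
}
```

Under that assumed layout, 0x00000302 has a non-zero softirq field and a non-zero preempt depth, which is consistent with the trace entering through handle_softirqs with preemption disabled.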
[Backport]
Cherry-pick the following patch from upstream:
022bfe24aad8 ("mptcp: fix 'scheduling while atomic' in
+ mptcp_pm_nl_append_new_local_addr")
This patch fixes the problem by skipping the replacement operation in
mptcp_pm_nl_append_new_local_addr, so the duplicate is discarded before
it is inserted into local_addr_list. Instead of the last implicit
endpoint replacing the previous one, the new entry is discarded without
a synchronize_rcu and the old copy is kept. This mode is only selected
when the caller is mptcp_pm_nl_get_local_id.
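The keep-the-old-copy behavior described above can be sketched in user space. This is only an illustrative model, not the kernel code: the struct, the helper names, the plain linked list, and the `replace` flag as written here are all assumptions made for the example, and there is no RCU or locking in it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an endpoint entry; not the kernel struct. */
struct addr_entry {
    unsigned int addr;        /* simplified address key */
    int id;                   /* local address id */
    bool implicit;            /* true if created on the receive path */
    struct addr_entry *next;
};

static struct addr_entry *local_addr_list;
static int next_id = 1;

static struct addr_entry *new_entry(unsigned int addr, bool implicit)
{
    struct addr_entry *e = calloc(1, sizeof(*e));

    assert(e != NULL);
    e->addr = addr;
    e->implicit = implicit;
    return e;
}

/*
 * Add 'entry' to the list, taking ownership of it. Returns the id in use
 * for the address, or -1 if a non-implicit entry already owns it.
 */
static int append_new_local_addr(struct addr_entry *entry, bool replace)
{
    struct addr_entry **pp;

    for (pp = &local_addr_list; *pp; pp = &(*pp)->next) {
        struct addr_entry *cur = *pp;
        int id;

        if (cur->addr != entry->addr)
            continue;
        if (!cur->implicit) {
            free(entry);          /* explicit endpoint already exists */
            return -1;
        }
        if (!replace) {
            id = cur->id;         /* receive path: keep the old copy */
            free(entry);          /* nothing published, nothing to wait for */
            return id;
        }
        /* User-initiated path: swap in the new entry, inheriting the id. */
        entry->id = cur->id;
        entry->next = cur->next;
        *pp = entry;
        free(cur);                /* the kernel must wait for readers first */
        return entry->id;
    }
    entry->id = next_id++;
    entry->next = local_addr_list;
    local_addr_list = entry;
    return entry->id;
}

/* Count the entries currently in the list. */
static int list_len(void)
{
    int n = 0;

    for (struct addr_entry *e = local_addr_list; e; e = e->next)
        n++;
    return n;
}
```

In this model, two receive-path callers that both pass `replace = false` for the same address end up sharing one entry and one id, and only the path that passes `replace = true` ever frees an entry that was already on the list.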
[Test]
+
+ This patch has passed the upstream mptcp test suites and has also been
+ tested against the reproducer that triggered the panic. (Add and remove
+ mptcp endpoints with an external address that differs from the internal
+ address). Prior to this patch the problem would trigger in less than a
+ minute. With this patch applied, the test has run for hours without
+ incident.
[Potential Regression]
The regression potential is low since the behavior change is small.
Implicit endpoints still get created and deleted, but they are only
replaced when a user adds an endpoint with the same local address as an
existing implicit address. No replacements via mptcp_pm_nl_get_local_id
will occur anymore.
--
https://bugs.launchpad.net/bugs/2101120
Title:
mptcp BUG 'scheduling while atomic' in
mptcp_pm_nl_append_new_local_addr