Public bug reported:

Various users have reported hangs during network namespace
creation. This mostly shows up as failures when starting lxc containers
and, once triggered, can be seen clearly by running `unshare -n`, which
simply hangs forever. This has been happening randomly for quite a
few kernel versions now. It has been confirmed on 4.13 by Proxmox
users (Proxmox uses an Ubuntu-based kernel with a few patches), and on
various other older and newer kernels according to the reports in links
[1][2][3] below. [2] in particular shows the same symptoms across
multiple distributions and kernel versions. The stack traces posted
there also have copy_net_ns() on top.
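
To check for the hang without blocking a terminal, the `unshare -n` test can be wrapped in a timeout. The following is only a minimal sketch: the helper name `probe` and the 10 second default are assumptions for illustration, and running `unshare -n` itself needs root.

```python
import subprocess


def probe(cmd, timeout=10.0):
    """Run cmd and classify the result.

    Returns "ok" (exit status 0), "failed" (non-zero exit status) or
    "hang" (still running after `timeout` seconds).
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        rc = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Send SIGKILL but do not wait for the child afterwards: a task
        # blocked inside the kernel may not go away promptly, and the
        # probe itself should never hang.
        proc.kill()
        return "hang"
    return "ok" if rc == 0 else "failed"


# Example (as root); `true` is just a no-op to run in the new namespace:
#   probe(["unshare", "-n", "true"])
```

A "hang" result only says that the command did not finish in time; the stack of the stuck process still has to be inspected to confirm that it is sitting in copy_net_ns().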

There are races in the network code causing copy_net_ns() to hang
(seemingly permanently). Some of these are caused by specific types of
interfaces being in use and have been addressed (various refcount leak
fixes), but that's not all of them. We've received yet another report
with the current version 4.15.0-22.24 / 4.15.17 with the same symptoms.

Processes in this state always have copy_net_ns() at the top of their
/proc/$pid/stack, for example:

~/ cat /proc/5228/stack 
[<0>] copy_net_ns+0xab/0x220
[<0>] create_new_namespaces+0x11b/0x1e0
[<0>] unshare_nsproxy_namespaces+0x5a/0xb0
[<0>] SyS_unshare+0x201/0x3a0
[<0>] do_syscall_64+0x73/0x130
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<0>] 0xffffffffffffffff

or

cat /proc/23900/stack 
[<0>] copy_net_ns+0xab/0x220
[<0>] create_new_namespaces+0x11b/0x1e0
[<0>] copy_namespaces+0x6d/0xa0
[<0>] copy_process.part.35+0x941/0x1ab0
[<0>] _do_fork+0xdf/0x3f0
[<0>] SyS_clone+0x19/0x20
[<0>] do_syscall_64+0x73/0x130
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<0>] 0xffffffffffffffff
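
To find affected processes on a running system, the check above can be automated by reading every /proc/<pid>/stack and looking at the first frame. This is a rough sketch, not a polished tool: the function names are made up for illustration, the frame format is assumed to match the traces shown above, and reading /proc/<pid>/stack normally requires root.

```python
import os
import re

# Matches lines like "[<0>] copy_net_ns+0xab/0x220" and captures the symbol.
FRAME_RE = re.compile(
    r"^\[<[0-9a-fx]+>\]\s+(\S+?)(?:\+0x[0-9a-f]+/0x[0-9a-f]+)?$"
)


def top_symbol(stack_text):
    """Return the symbol of the first frame in a /proc/<pid>/stack dump."""
    for line in stack_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            return match.group(1)
    return None


def find_stuck_pids(proc_root="/proc"):
    """Return sorted PIDs whose kernel stack has copy_net_ns on top."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stack")) as f:
                text = f.read()
        except OSError:
            # Process exited in the meantime, or we lack permission.
            continue
        if top_symbol(text) == "copy_net_ns":
            pids.append(int(entry))
    return sorted(pids)
```

A process that is merely passing through copy_net_ns() at the moment of the scan would also be listed, so a PID should only be treated as stuck if it shows up repeatedly.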

This randomly affects users of network namespaces (LXC, LXD, Docker,
PVE, service units using systemd's PrivateNetwork option, and various
others).

Upstream there have been a lot of changes to the involved locking
mechanism since 4.16 and we should try to backport these patches.
This includes most of Kirill Tkhai's network patches and some others.

I've been going through the following patches, collected via various
`git log` calls on net/ and drivers/net/ (initially limited to
`--author='Kirill Tkhai'` as a starting point).
There's also a long list of patches we don't need to pick, as they're
made obsolete by one later change, provided we include all the
necessary patches. They are still worth reviewing, since they form a
progressive change: first a flag about async-safety is introduced, then
all the affected areas are gone through with commit messages detailing
why/if/how they're safe, and finally, once all of them are converted, a
commit removes the flag again.
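
For reference, the kind of `git log` call described above could be assembled as follows. This is a hedged sketch: the helper name `git_log_cmd` is hypothetical and the revision range in the example is an assumption, not the exact range used for the list below.

```python
def git_log_cmd(rev_range, paths=("net/", "drivers/net/"), author=None):
    """Build a `git log` argv listing one-line commits touching `paths`."""
    cmd = ["git", "log", "--oneline", "--abbrev=12", "--no-merges"]
    if author:
        # git matches --author as a pattern against the author name/email.
        cmd.append("--author=" + author)
    cmd.append(rev_range)
    cmd.append("--")  # everything after this is treated as a path
    cmd.extend(paths)
    return cmd


# Example (the range is illustrative only), to be run inside a kernel tree:
#   subprocess.run(git_log_cmd("v4.15..v4.17", author="Kirill Tkhai"))
```

Dropping the `author` argument widens the search to all commits in the given paths, which is how the non-Tkhai patches in the list would be found.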

Ordered newest to oldest:
U .. already in the Ubuntu kernel, included due to its order when viewing
     related patches
P .. should be cherry-picked
Q .. (just 1) included for completeness; will conflict if the patches
     adding NETDEV_{C,S}VLAN_FILTER_PUSH_INFO are backported, which is
     probably good as a reminder for verification?
- .. if all other patches are applied, they're made obsolete by
     2f635ceeb22b ("net: Drop pernet_operations::async")

Q 3f5ecd8a90dd net: Fix coccinelle warning
P eb7f54b90bd8 kcm: Fix use-after-free caused by clonned sockets
P 554873e51711 net: Do not take net_rwsem in __rtnl_link_unregister()
P fc1dd36992bb net: Remove net_rwsem from {, un}register_netdevice_notifier()
P 328fbe747ad4 net: Close race between {un, }register_netdevice_notifier() and setup_net()/cleanup_net()
P 9e2f6c5d78db netfilter: Rework xt_TEE netdevice notifier
P e9a441b6e729 xfrm: Register xfrm_dev_notifier in appropriate place
P 152f253152cc net: Remove rtnl_lock() in nf_ct_iterate_destroy()
P ec9c780925c5 ovs: Remove rtnl_lock() from ovs_exit_net()
P 350311aab4c0 security: Remove rtnl_lock() in selinux_xfrm_notify_policyload()
P 10256debb918 net: Don't take rtnl_lock() in wireless_nlevent_flush()
P f0b07bb151b0 net: Introduce net_rwsem to protect net_namespace_list
d 8518e9bb98b6 net: Add more comments
P 4420bf21fb6c net: Rename net_sem to pernet_ops_rwsem
P 2f635ceeb22b net: Drop pernet_operations::async
P 094374e5e173 net: Reflect all pernet_operations are converted
- 67441c2472dd net: Convert nfsd_net_ops
- dbf7bb443726 net: Convert nfs4blocklayout_net_ops
- 436de500948e net: Convert nfs4_dns_resolver_ops
- 5e804a6077dc net: Convert sunrpc_net_ops
- 855aeba34047 net: Convert rpcsec_gss_net_ops
P 070f2d7e264a net: Drop NETDEV_UNREGISTER_FINAL
P 3e0c2dbfea28 infiniband: Replace usnic_ib_netdev_event_to_string() with netdev_cmd_to_name()
P ede2762d93ff net: Make NETDEV_XXX commands enum { }
- b2864fbdc5ab net: Convert rxrpc_net_ops
- fc18999ed2a2 net: Convert udp_sysctl_ops
P d9ff3049739e net: Replace ip_ra_lock with per-net mutex
P 5796ef75ec7b net: Make ip_ra_chain per struct net
P 128aaa98ad14 net: Revert "ipv4: fix a deadlock in ip_ra_control"
P 0526947f9dd0 net: Move IP_ROUTER_ALERT out of lock_sock(sk)
P 76d3e153d0d1 net: Revert "ipv4: get rid of ip_ra_lock"
P bdf5bd7f2132 rds: tcp: remove register_netdevice_notifier infrastructure.
- aa65f6365405 net: Convert nf_ct_net_ops
- 08012631d627 net: Convert lowpan_frags_ops
- 1ae776276073 net: Convert can_pernet_ops
- 6c77e79557ac net: Convert ip_vs_ftp_ops
- d0edfbb4ba4a net: Convert ipvs_core_dev_ops
- 554855ccdf37 net: Convert ipvs_core_ops
- ec716650a750 net: Convert ovs_net_ops
- 8cec2f49dc41 net: Convert mpls_net_ops
- 489b30b53f05 net: Convert l2tp_net_ops
P b0f3debc9a12 net: Use rtnl_lock_killable() in register_netdev()
P 79ffdfc6522a net: Add rtnl_lock_killable()
P 6056415d3a51 net: Add comment about pernet_operations methods and synchronization
- c939a5e4d597 net: Convert rds_tcp_net_ops
- afbbc374ab12 net: Convert tipc_net_ops
- bfdfa38ff0e2 net: Convert sctp_ctrlsock_ops
- 2e01ae0ef2db net: Convert sctp_defaults_ops
- 1fd2c55705ae net: Convet ipv6_net_ops
- e8a95ad46378 net: Convert ipv4_net_ops
- 8dbc6e2eaecc net: Convert iptable_security_net_ops
- 65f828c35261 net: Convert iptable_raw_net_ops
- 06a8a67b5dac net: Convert iptable_nat_net_ops
- 7ba81869d1f6 net: Convert iptable_mangle_net_ops
- 93623f2b0029 net: Convert arptable_filter_net_ops
- 59d269731e2b net: Convert pg_net_ops
- bd54dce07965 net: Convert nfnl_queue_net_ops
- 74f26bbf505a net: Convert nfnl_log_net_ops
- ffdf72bc1eed net: Convert cttimeout_ops
- cf51503a03f7 net: Convert nfnl_acct_ops
- 5a8e9be69d16 net: Convert nfnetlink_net_ops
- c7c5e435e44e net: Convert nf_tables_net_ops
- 649b9826cc73 net: Convert xfrm_user_net_ops
- 997266a4a02d net: Convert ip6 tables pernet_operations
P 30855ffc29b9 net: Make account struct net to memcg
- c29babb7fe26 net: Convert proto_gre_net_ops
- b04a3d098c4c net: Convert ctnetlink_net_ops
- 467d14b30739 net: Convert nf_conntrack_net_ops
- a5a179b6dff1 net: Convert ip_set_net_ops
- 6c6c566e6dd8 net: Convert fou_net_ops
- 16b0c0c4d9a7 net: Convert dccp_v6_ops
- 5368bd72cd7a net: Convert dccp_v4_ops
- 111da7adc127 net: Convert cangw_pernet_ops
- d217472410a0 net: Convert caif_net_ops
- c60a246cd366 net: Convert arp_tables_net_ops and ip6_tables_net_ops
- 3822034569ac net: Convert log pernet_operations
- ec012f3b8515 net: Convert broute_net_ops, frame_filter_net_ops and frame_nat_net_ops
- 2e75bb2f8b89 net: Convert hwsim_net_ops
- 3edbccf96d2d net: Convert smack_net_ops
- 79a4fb084326 net: Convert selinux_net_ops
- 9532ce17f7d5 net: Convert defrag6_net_ops
- afd7b3eb1346 net: Convert ila_net_ops
- e5b2ae93b523 net: Convert defrag4_net_ops
- f95978b7ad09 net: Convert clusterip_net_ops
- 7ca9e67febb1 net: Convert brnf_net_ops
- 68eabe8b660c net: Convert ipvlan_net_ops
- f17c9bf07f9c net: Convert cfg802154_pernet_ops
- 989d9812b7ca net: Convert sit_net_ops
- 5ecc29550add net: Convert vti6_net_ops
- 66997ba0834a net: Convert ip6_tnl_net_ops
- 5c155c50244a net: Convert ip6gre_net_ops
- 31502104b301 net: Convert ipgre_net_ops, ipgre_tap_net_ops, erspan_net_ops, vti_net_ops and ipip_net_ops
- 3cec5fb3476e net: Convert br_net_ops
- ef74c07cf179 net: Convert vxlan_net_ops
- cd59b28ce949 net: Convert ppp_net_ops
- 9e7674519151 net: Convert gtp_net_ops
- f60f33460a8c net: Convert geneve_net_ops
- 6963ad69cee2 net: Convert bond_net_ops
- 685ecfb19888 net: Convert tc_action_net_init() and tc_action_net_exit() based pernet_operations
- 5fcc85843d94 net: Convert sysctl creating and destroying pernet_operations
- 25354866e03d net: Convert cma_pernet_operations
- 02df428ca291 net: Convert simple pernet_operations
- 7300bd94e622 net: Convert nfs_net_ops
- f0aad8e340ea net: Convert synproxy_net_ops
- 47d63a01797b net: Convert hashlimit_net_ops and recent_net_ops
- c80afa026a7f net: Convert /proc creating and destroying pernet_operations
P 8349efd90339 net: Queue net_cleanup_work only if there is first net added
P 65b7b5b90fcd net: Make cleanup_list and net::cleanup_list of llist type
P 19efbd93e6fb net: Kill net_mutex
- da349fad8045 net: Convert iptable_filter_net_ops
- 4d6b80762b93 net: Convert ip_tables_net_ops, udplite6_net_ops and xt_net_ops
- 5fc094f5b8c9 net: Convert ip6_frags_ops
- d16784d9fb2a net: Convert fib6_net_ops, ipv6_addr_label_ops and ip6_segments_ops
- b489141369f7 net: Convert xfrm6_net_ops
- a7852a76f414 net: Convert ip6_flowlabel_net_ops
- ac34cb6c0c4d net: Convert ping_v6_net_ops
- 58708caef56b net: Convert ipv6_sysctl_net_ops
- fef65a2c6c34 net: Convert tcpv6_net_ops
- 7b7dd180b85b net: Convert fib6_rules_net_ops
- 85ca51b2a239 net: Convert ipv6_inetpeer_ops
- 509114112d0b net: Convert raw6_net_ops, udplite6_net_ops, ipv6_proc_ops, if6_proc_net_ops and ip6_route_net_late_ops
- 1a2e93329dd4 net: Convert icmpv6_sk_ops, ndisc_net_ops and igmp6_net_ops
- b01a59a4884e net: Convert ip6mr_net_ops
- 9c537ca1554e net: Convert cfg80211_pernet_ops
- 753d525a08f2 net: Convert inet6_net_ops
P d8d211a2a0c3 net: Make extern and export get_net_ns()
- b86b47a39598 net: Convert netlink_tap_net_ops
- 59a513587ac0 net: Convert diag_net_ops
- 2608e6b7adc8 net: Convert default_device_ops
- 9a4d105de784 net: Convert loopback_net_ops
- 0bc9be67185e net: Convert addrconf_ops
- 22769a2a6e93 net: Convert ipv4_sysctl_ops
- cb5e3400e785 net: Convert packet_net_ops
- 167f7ac723e5 net: Convert unix_net_ops
- f84c6821aa54 net: Convert pernet_subsys, registered from inet_init()
- 232cf06c611f net: Convert sysctl_core_ops
- 6c0075d0f6cc net: Convert wext_pernet_ops
- 83caf62c867b net: Convert genl_pernet_ops
- 13da199c38ee net: Convert subsys_initcall() registered pernet_operations from net/sched
- 86b63418fd38 net: Convert fib_* pernet_operations, registered via subsys_initcall
- 88b8ffebdb4d net: Convert pernet_subsys ops, registered via net_dev_init()
- 36b0068e6c98 net: Convert proto_net_ops
- 15898a011b3d net: Convert uevent_net_ops
- 906f63ec1d1b net: Convert audit_net_ops
- 46456675ec1b net: Convert rtnetlink_net_ops
- 194b95d21666 net: Convert netlink_net_ops
- ff291d005a98 net: Convert net_defaults_ops
- 604da74e4fc1 net: Convert net_inuse_ops
- c9d8fb91351f net: Convert nf_log_net_ops
- 954992992373 net: Convert netfilter_net_ops
- 93d230fe0762 net: Convert sysctl_pernet_ops
- 3fc3b827f0c4 net: Convert net_ns_ops methods
- f039e184bc45 net: Convert proc_net_ns_ops
P 447cd7a0d7d1 net: Allow pernet_operations to be executed in parallel
P bcab1ddd9b2b net: Move mutex_unlock() in cleanup_net() up
P 1a57feb847c5 net: Introduce net_sem for protection of pernet_list
P 5ba049a5cc8e net: Cleanup in copy_net_ns()
P 98f6c533a3e9 net: Assign net to net_namespace_list in setup_net()
U a560002437d3 net: Fix hlist corruptions in inet_evict_bucket()
P ed4ffdfec26d tipc: Fix missing RTNL lock protection during setting link properties
    This one either needs to be adapted, or we can include e5d1a1eec0f4
    ("tipc: Refactor __tipc_nl_compat_doit") together with the patches in
    between, but there are later fixup commits for crashes which we'd
    need to pick as well:

        c6404122cb18 tipc: fix possible crash in __tipc_nl_net_set()
        (...) perhaps more
        ed4ffdfec26d tipc: Fix missing RTNL lock protection during setting link properties
        5631f65decf3 tipc: Introduce __tipc_nl_net_set
        07ffb2235732 tipc: Introduce __tipc_nl_media_set
        93532bb1d436 tipc: Introduce __tipc_nl_bearer_set
        45cf7edfbc07 tipc: Introduce __tipc_nl_bearer_enable
        d59d8b77abf4 tipc: Introduce __tipc_nl_bearer_disable
        e5d1a1eec0f4 tipc: Refactor __tipc_nl_compat_doit

P fb07a820fe3f net: Move net:netns_ids destruction out of rtnl_lock() and document locking scheme
P 42157277af17 net: Remove spinlock from get_net_ns_by_id()
P 0c06bea919f3 net: Fix possible race in peernet2id_alloc()
P 273c28bc57ca net: Convert atomic_t net::count to refcount_t
U 11bf284f81b4 net: Protect iterations over net::fib_notifier_ops in fib_seq_sum()
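
Since the list above is ordered newest to oldest, the `P` entries have to be applied in reverse. The following sketch extracts them in cherry-pick order; the function names are hypothetical, and it assumes each entry starts at column 0 with a one-character marker followed by a 12-character abbreviated hash, so the indented tipc sub-list is deliberately ignored.

```python
import re

# "<marker> <12-hex-digit sha> <subject>", marker in the first column.
ENTRY_RE = re.compile(r"^([A-Za-z-]) ([0-9a-f]{12}) (.+)$")


def parse_entries(text):
    """Return (marker, sha, subject) tuples for the top-level list lines."""
    entries = []
    for line in text.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            entries.append(match.groups())
    return entries


def cherry_pick_order(text, markers=("P",)):
    """Return the hashes with the given markers, oldest first."""
    picks = [sha for marker, sha, _ in parse_entries(text) if marker in markers]
    picks.reverse()  # the list is newest-first; apply oldest-first
    return picks
```

The resulting hashes can be fed one by one to `git cherry-pick -x`; conflicts still need to be resolved by hand, and the tipc patch and its dependencies need separate handling as noted above.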

(While going through the patches I also noticed an unrelated patch
* 7a107c0f55a3 fasync: Fix deadlock between task-context and interrupt-context kill_fasync()
which might be of interest generally...)

I have attached my current backport branch WIP (`git format-patch`-ed
off of the master branch of the bionic kernel
git://kernel.ubuntu.com/ubuntu/ubuntu-bionic.git) in case whoever goes
through this wants to diff my conflict resolution (there aren't many
conflicts anyway) for reference.

Links:
    [1] https://github.com/lxc/lxd/issues/4468
    [2] https://github.com/moby/moby/issues/5618
    [3] https://forum.proxmox.com/threads/lxc-container-reboot-fails-lxc-becomes-unusable.41264/ [wrongly marked as solved]

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Incomplete

** Attachment added: "current cherry-pick work"
   https://bugs.launchpad.net/bugs/1779678/+attachment/5158678/+files/netns-locking-backport.tar.xz

https://bugs.launchpad.net/bugs/1779678

Title:
  deadlocks in copy_net_ns