Re: [PATCH net] ipvs: fix rtnl_lock lockups caused by start_sync_thread
On Mon, Apr 09, 2018 at 04:53:22PM +0200, Pablo Neira Ayuso wrote: > On Mon, Apr 09, 2018 at 10:20:18AM +0300, Simon Horman wrote: > > On Sat, Apr 07, 2018 at 03:50:47PM +0300, Julian Anastasov wrote: > > > syzkaller reports for wrong rtnl_lock usage in sync code [1] and [2] > > > > > > We have 2 problems in start_sync_thread if error path is > > > taken, eg. on memory allocation error or failure to configure > > > sockets for mcast group or addr/port binding: > > > > > > 1. recursive locking: holding rtnl_lock while calling sock_release > > > which in turn calls again rtnl_lock in ip_mc_drop_socket to leave > > > the mcast group, as noticed by Florian Westphal. Additionally, > > > sock_release can not be called while holding sync_mutex (ABBA > > > deadlock). > > > > > > 2. task hung: holding rtnl_lock while calling kthread_stop to > > > stop the running kthreads. As the kthreads do the same to leave > > > the mcast group (sock_release -> ip_mc_drop_socket -> rtnl_lock) > > > they hang. > > > > > > Fix the problems by calling rtnl_unlock early in the error path, > > > now sock_release is called after unlocking both mutexes. > > > > > > Problem 3 (task hung reported by syzkaller [2]) is variant of > > > problem 2: use _trylock to prevent one user to call rtnl_lock and > > > then while waiting for sync_mutex to block kthreads that execute > > > sock_release when they are stopped by stop_sync_thread. > > > > ... > > > > > Reported-and-tested-by: > > > syzbot+a46d6abf9d56b1365...@syzkaller.appspotmail.com > > > Reported-and-tested-by: > > > syzbot+5fe074c01b2032ce9...@syzkaller.appspotmail.com > > > Fixes: e0b26cc997d5 ("ipvs: call rtnl_lock early") > > > Signed-off-by: Julian Anastasov > > > > Signed-off-by: Simon Horman > > > > Pablo, could you take this directly into nf and then onwards to net? Great, thanks! -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net] ipvs: fix rtnl_lock lockups caused by start_sync_thread
On Mon, Apr 09, 2018 at 10:20:18AM +0300, Simon Horman wrote: > On Sat, Apr 07, 2018 at 03:50:47PM +0300, Julian Anastasov wrote: > > syzkaller reports for wrong rtnl_lock usage in sync code [1] and [2] > > > > We have 2 problems in start_sync_thread if error path is > > taken, eg. on memory allocation error or failure to configure > > sockets for mcast group or addr/port binding: > > > > 1. recursive locking: holding rtnl_lock while calling sock_release > > which in turn calls again rtnl_lock in ip_mc_drop_socket to leave > > the mcast group, as noticed by Florian Westphal. Additionally, > > sock_release can not be called while holding sync_mutex (ABBA > > deadlock). > > > > 2. task hung: holding rtnl_lock while calling kthread_stop to > > stop the running kthreads. As the kthreads do the same to leave > > the mcast group (sock_release -> ip_mc_drop_socket -> rtnl_lock) > > they hang. > > > > Fix the problems by calling rtnl_unlock early in the error path, > > now sock_release is called after unlocking both mutexes. > > > > Problem 3 (task hung reported by syzkaller [2]) is variant of > > problem 2: use _trylock to prevent one user to call rtnl_lock and > > then while waiting for sync_mutex to block kthreads that execute > > sock_release when they are stopped by stop_sync_thread. > > ... > > > Reported-and-tested-by: > > syzbot+a46d6abf9d56b1365...@syzkaller.appspotmail.com > > Reported-and-tested-by: > > syzbot+5fe074c01b2032ce9...@syzkaller.appspotmail.com > > Fixes: e0b26cc997d5 ("ipvs: call rtnl_lock early") > > Signed-off-by: Julian Anastasov > > Signed-off-by: Simon Horman > > Pablo, could you take this directly into nf and then onwards to net? Applied, thanks. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net] ipvs: fix rtnl_lock lockups caused by start_sync_thread
On Sat, Apr 07, 2018 at 03:50:47PM +0300, Julian Anastasov wrote: > syzkaller reports for wrong rtnl_lock usage in sync code [1] and [2] > > We have 2 problems in start_sync_thread if error path is > taken, eg. on memory allocation error or failure to configure > sockets for mcast group or addr/port binding: > > 1. recursive locking: holding rtnl_lock while calling sock_release > which in turn calls again rtnl_lock in ip_mc_drop_socket to leave > the mcast group, as noticed by Florian Westphal. Additionally, > sock_release can not be called while holding sync_mutex (ABBA > deadlock). > > 2. task hung: holding rtnl_lock while calling kthread_stop to > stop the running kthreads. As the kthreads do the same to leave > the mcast group (sock_release -> ip_mc_drop_socket -> rtnl_lock) > they hang. > > Fix the problems by calling rtnl_unlock early in the error path, > now sock_release is called after unlocking both mutexes. > > Problem 3 (task hung reported by syzkaller [2]) is variant of > problem 2: use _trylock to prevent one user to call rtnl_lock and > then while waiting for sync_mutex to block kthreads that execute > sock_release when they are stopped by stop_sync_thread. ... > Reported-and-tested-by: syzbot+a46d6abf9d56b1365...@syzkaller.appspotmail.com > Reported-and-tested-by: syzbot+5fe074c01b2032ce9...@syzkaller.appspotmail.com > Fixes: e0b26cc997d5 ("ipvs: call rtnl_lock early") > Signed-off-by: Julian Anastasov Signed-off-by: Simon Horman Pablo, could you take this directly into nf and then onwards to net? ... -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net] ipvs: fix rtnl_lock lockups caused by start_sync_thread
syzkaller reports for wrong rtnl_lock usage in sync code [1] and [2] We have 2 problems in start_sync_thread if error path is taken, eg. on memory allocation error or failure to configure sockets for mcast group or addr/port binding: 1. recursive locking: holding rtnl_lock while calling sock_release which in turn calls again rtnl_lock in ip_mc_drop_socket to leave the mcast group, as noticed by Florian Westphal. Additionally, sock_release can not be called while holding sync_mutex (ABBA deadlock). 2. task hung: holding rtnl_lock while calling kthread_stop to stop the running kthreads. As the kthreads do the same to leave the mcast group (sock_release -> ip_mc_drop_socket -> rtnl_lock) they hang. Fix the problems by calling rtnl_unlock early in the error path, now sock_release is called after unlocking both mutexes. Problem 3 (task hung reported by syzkaller [2]) is variant of problem 2: use _trylock to prevent one user to call rtnl_lock and then while waiting for sync_mutex to block kthreads that execute sock_release when they are stopped by stop_sync_thread. [1] IPVS: stopping backup sync thread 4500 ... WARNING: possible recursive locking detected 4.16.0-rc7+ #3 Not tainted syzkaller688027/4497 is trying to acquire lock: (rtnl_mutex){+.+.}, at: [] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 but task is already holding lock: IPVS: stopping backup sync thread 4495 ... (rtnl_mutex){+.+.}, at: [ ] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 other info that might help us debug this: Possible unsafe locking scenario: CPU0 lock(rtnl_mutex); lock(rtnl_mutex); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by syzkaller688027/4497: #0: (rtnl_mutex){+.+.}, at: [ ] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 #1: (ipvs->sync_mutex){+.+.}, at: [<703f78e3>] do_ip_vs_set_ctl+0x10f8/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2388 stack backtrace: CPU: 1 PID: 4497 Comm: syzkaller688027 Not tainted 4.16.0-rc7+ #3 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x24d lib/dump_stack.c:53 print_deadlock_bug kernel/locking/lockdep.c:1761 [inline] check_deadlock kernel/locking/lockdep.c:1805 [inline] validate_chain kernel/locking/lockdep.c:2401 [inline] __lock_acquire+0xe8f/0x3e00 kernel/locking/lockdep.c:3431 lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920 __mutex_lock_common kernel/locking/mutex.c:756 [inline] __mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908 rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 ip_mc_drop_socket+0x88/0x230 net/ipv4/igmp.c:2643 inet_release+0x4e/0x1c0 net/ipv4/af_inet.c:413 sock_release+0x8d/0x1e0 net/socket.c:595 start_sync_thread+0x2213/0x2b70 net/netfilter/ipvs/ip_vs_sync.c:1924 do_ip_vs_set_ctl+0x1139/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2389 nf_sockopt net/netfilter/nf_sockopt.c:106 [inline] nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115 ip_setsockopt+0x97/0xa0 net/ipv4/ip_sockglue.c:1261 udp_setsockopt+0x45/0x80 net/ipv4/udp.c:2406 sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2975 SYSC_setsockopt net/socket.c:1849 [inline] SyS_setsockopt+0x189/0x360 net/socket.c:1828 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 RIP: 0033:0x446a69 RSP: 002b:7fa1c3a64da8 EFLAGS: 0246 ORIG_RAX: 0036 RAX: ffda RBX: RCX: 00446a69 RDX: 048b RSI: RDI: 0003 RBP: 006e29fc R08: 0018 R09: R10: 20c0 R11: 0246 R12: 006e29f8 R13: 00676e697279656b R14: 7fa1c3a659c0 R15: 006e2b60 [2] IPVS: sync thread started: state = BACKUP, mcast_ifn = syz_tun, syncid = 4, id = 0 IPVS: stopping backup sync thread 25415 ... INFO: task syz-executor7:25421 blocked for more than 120 seconds. Not tainted 4.16.0-rc6+ #284 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. syz-executor7 D23688 25421 4408 0x0004 Call Trace: context_switch kernel/sched/core.c:2862 [inline] __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440 schedule+0xf5/0x430 kernel/sched/core.c:3499 schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777 do_wait_for_common kernel/sched/completion.c:86 [inline] __wait_for_common kernel/sched/completion.c:107 [inline] wait_for_common kernel/sched/completion.c:118 [inline] wait_for_completion+0x415/0x770 kernel/sched/completion.c:139 kthread_stop+0x14a/0x7a0 kernel/kthread.c:530 stop_sync_thread+0x3d9/0x740 net/netfilter/ipvs/ip_vs_sync.c:1996 do_ip_vs_set_ctl+0x2b1/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2394 nf_sockopt net/netfilter/nf_soc