Re: WARNING in strp_data_ready
On Thu, Dec 28, 2017 at 7:21 PM, Ozgur wrote: > > > 28.12.2017, 19:33, "Dmitry Vyukov" : >> On Thu, Dec 28, 2017 at 5:14 PM, Tom Herbert wrote: >>> On Thu, Dec 28, 2017 at 12:59 AM, Ozgur wrote: 28.12.2017, 04:19, "Tom Herbert" : > On Wed, Dec 27, 2017 at 12:20 PM, Ozgur wrote: >> 27.12.2017, 23:14, "Dmitry Vyukov" : >>> On Wed, Dec 27, 2017 at 9:08 PM, Ozgur wrote: 27.12.2017, 22:21, "Dmitry Vyukov" : >On Wed, Dec 27, 2017 at 8:09 PM, Tom Herbert > wrote: >> Did you try the patch I posted? > >Hi Tom, Hello Dmitry, >No. And I didn't know I need to. Why? >If you think the patch needs additional testing, you can ask > syzbot to >test it. See > https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot >Otherwise proceed with committing it. Or what are we waiting for? > >Thanks I think we need to fixed patch for crash, in fact check to patch code and test solve the bug. How do test it because there is no patch in the following bug? >>> >>> Hi Ozgur, >>> >>> I am not sure I completely understand what you mean. But the >>> reproducer for this bug (which one can use for testing) is here: >>> https://groups.google.com/forum/#!topic/syzkaller-bugs/Kxs05ziCpgY >>> Tom also mentions there is some patch for this, but I don't know where >>> it is, it doesn't seem to be referenced from this thread. >> >> Hello Dmitry, >> >> Ah, I'm sorry I don't seen Tom mail and I don't have a patch not >> tested :) >> I think Tom send patch to only you and are you tested? >> >> kcmsock.c will change and strp_data_ready I think locked. >> >> Tom, please send a patch for me? I can test and inform you. > > Hi Ozgur, > > I reposted the patches as RFC "kcm: Fix lockdep issue". Please test if > you can! > > Thanks, > Tom Hello Tom, Which are you use the repos? I pulled but I don't seen this patches. >>> They are not in any public repo yet. I posted the patches to netdev >>> list so they can be reviewed and tested by third parties. Posting >>> patches to the list a normal path to get patches into the kernel >>> >>> (http://nickdesaulniers.github.io/blog/2017/05/16/submitting-your-first-patch-to-the-linux-kernel-and-responding-to-feedback/). >>> >>> These patches were applied to net-next but are simple enough that they >>> should apply to other branches. I will repost and target to net per >>> Dave's directive once they are verified to fix the issue. > > Hello, > > thanks Tom and I have tested the fixed patch for linux-next builds and don't > have to kernel panic. when nocheck funcs call sk_lock.owned and kernel > doesn't give a panic. I have compiled and uploaded next-kernel. > > Dmitry, > could you test it on linux-next? If you are trying to test how many times I can repeat this, I can repeat this lots of times: If you think the patch needs additional testing, you can ask syzbot to test it. See https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot
Re: WARNING in strp_data_ready
On Thu, Dec 28, 2017 at 5:14 PM, Tom Herbert wrote: > On Thu, Dec 28, 2017 at 12:59 AM, Ozgur wrote: >> >> >> 28.12.2017, 04:19, "Tom Herbert" : >>> On Wed, Dec 27, 2017 at 12:20 PM, Ozgur wrote: 27.12.2017, 23:14, "Dmitry Vyukov" : > On Wed, Dec 27, 2017 at 9:08 PM, Ozgur wrote: >> 27.12.2017, 22:21, "Dmitry Vyukov" : >>> On Wed, Dec 27, 2017 at 8:09 PM, Tom Herbert >>> wrote: Did you try the patch I posted? >>> >>> Hi Tom, >> >> Hello Dmitry, >> >>> No. And I didn't know I need to. Why? >>> If you think the patch needs additional testing, you can ask syzbot to >>> test it. See >>> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot >>> Otherwise proceed with committing it. Or what are we waiting for? >>> >>> Thanks >> >> I think we need to fixed patch for crash, in fact check to patch code >> and test solve the bug. >> How do test it because there is no patch in the following bug? > > Hi Ozgur, > > I am not sure I completely understand what you mean. But the > reproducer for this bug (which one can use for testing) is here: > https://groups.google.com/forum/#!topic/syzkaller-bugs/Kxs05ziCpgY > Tom also mentions there is some patch for this, but I don't know where > it is, it doesn't seem to be referenced from this thread. Hello Dmitry, Ah, I'm sorry I don't seen Tom mail and I don't have a patch not tested :) I think Tom send patch to only you and are you tested? kcmsock.c will change and strp_data_ready I think locked. Tom, please send a patch for me? I can test and inform you. >>> >>> Hi Ozgur, >>> >>> I reposted the patches as RFC "kcm: Fix lockdep issue". Please test if you >>> can! >>> >>> Thanks, >>> Tom >> >> Hello Tom, >> >> Which are you use the repos? I pulled but I don't seen this patches. >> > They are not in any public repo yet. I posted the patches to netdev > list so they can be reviewed and tested by third parties. Posting > patches to the list a normal path to get patches into the kernel > (http://nickdesaulniers.github.io/blog/2017/05/16/submitting-your-first-patch-to-the-linux-kernel-and-responding-to-feedback/). > > These patches were applied to net-next but are simple enough that they > should apply to other branches. I will repost and target to net per > Dave's directive once they are verified to fix the issue. FWIW they are already verified to fix the issue, see few emails up, also here: https://groups.google.com/d/msg/syzkaller-bugs/Kxs05ziCpgY/fPdZcO_GAwAJ and don't forget this: https://groups.google.com/d/msg/syzkaller-bugs/Kxs05ziCpgY/uGjsrA3HAwAJ
Re: WARNING in strp_data_ready
On Thu, Dec 28, 2017 at 12:59 AM, Ozgur wrote: > > > 28.12.2017, 04:19, "Tom Herbert" : >> On Wed, Dec 27, 2017 at 12:20 PM, Ozgur wrote: >>> 27.12.2017, 23:14, "Dmitry Vyukov" : On Wed, Dec 27, 2017 at 9:08 PM, Ozgur wrote: > 27.12.2017, 22:21, "Dmitry Vyukov" : >> On Wed, Dec 27, 2017 at 8:09 PM, Tom Herbert >> wrote: >>>Did you try the patch I posted? >> >> Hi Tom, > > Hello Dmitry, > >> No. And I didn't know I need to. Why? >> If you think the patch needs additional testing, you can ask syzbot to >> test it. See >> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot >> Otherwise proceed with committing it. Or what are we waiting for? >> >> Thanks > > I think we need to fixed patch for crash, in fact check to patch code > and test solve the bug. > How do test it because there is no patch in the following bug? Hi Ozgur, I am not sure I completely understand what you mean. But the reproducer for this bug (which one can use for testing) is here: https://groups.google.com/forum/#!topic/syzkaller-bugs/Kxs05ziCpgY Tom also mentions there is some patch for this, but I don't know where it is, it doesn't seem to be referenced from this thread. >>> >>> Hello Dmitry, >>> >>> Ah, I'm sorry I don't seen Tom mail and I don't have a patch not tested :) >>> I think Tom send patch to only you and are you tested? >>> >>> kcmsock.c will change and strp_data_ready I think locked. >>> >>> Tom, please send a patch for me? I can test and inform you. >> >> Hi Ozgur, >> >> I reposted the patches as RFC "kcm: Fix lockdep issue". Please test if you >> can! >> >> Thanks, >> Tom > > Hello Tom, > > Which are you use the repos? I pulled but I don't seen this patches. > They are not in any public repo yet. I posted the patches to netdev list so they can be reviewed and tested by third parties. Posting patches to the list a normal path to get patches into the kernel (http://nickdesaulniers.github.io/blog/2017/05/16/submitting-your-first-patch-to-the-linux-kernel-and-responding-to-feedback/). These patches were applied to net-next but are simple enough that they should apply to other branches. I will repost and target to net per Dave's directive once they are verified to fix the issue. Tom
Re: WARNING in strp_data_ready
On Thu, Dec 28, 2017 at 9:12 AM, syzbot wrote: > Hello, > > syzbot has tested the proposed patch and the reproducer did not trigger > crash: > > Reported-and-tested-by: > syzbot+c91c53af67f9ebe599a337d2e70950366153b...@syzkaller.appspotmail.com /\/\/\/\/\/\/\/\/\/\/\/\/\ Tom, please don't miss this part! > Note: the tag will also help syzbot to understand when the bug is fixed. > > Tested on commit 5f520fc318764df800789edd202b5e3b55130613 > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master > compiler: gcc (GCC) 7.1.1 20170620 > Patch is attached. > Kernel config is attached. > > > --- > There is no WARRANTY for the result, to the extent permitted by applicable > law. > Except when otherwise stated in writing syzbot provides the result "AS IS" > without warranty of any kind, either expressed or implied, but not limited > to, > the implied warranties of merchantability and fittness for a particular > purpose. > The entire risk as to the quality of the result is with you. Should the > result > prove defective, you assume the cost of all necessary servicing, repair or > correction.
Re: WARNING in strp_data_ready
On Thu, Dec 28, 2017 at 2:19 AM, Tom Herbert wrote: > On Wed, Dec 27, 2017 at 12:20 PM, Ozgur wrote: >> >> >> 27.12.2017, 23:14, "Dmitry Vyukov" : >>> On Wed, Dec 27, 2017 at 9:08 PM, Ozgur wrote: 27.12.2017, 22:21, "Dmitry Vyukov" : > On Wed, Dec 27, 2017 at 8:09 PM, Tom Herbert > wrote: >> Did you try the patch I posted? > > Hi Tom, Hello Dmitry, > No. And I didn't know I need to. Why? > If you think the patch needs additional testing, you can ask syzbot to > test it. See > https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot > Otherwise proceed with committing it. Or what are we waiting for? > > Thanks I think we need to fixed patch for crash, in fact check to patch code and test solve the bug. How do test it because there is no patch in the following bug? >>> >>> Hi Ozgur, >>> >>> I am not sure I completely understand what you mean. But the >>> reproducer for this bug (which one can use for testing) is here: >>> https://groups.google.com/forum/#!topic/syzkaller-bugs/Kxs05ziCpgY >>> Tom also mentions there is some patch for this, but I don't know where >>> it is, it doesn't seem to be referenced from this thread. >> >> Hello Dmitry, >> >> Ah, I'm sorry I don't seen Tom mail and I don't have a patch not tested :) >> I think Tom send patch to only you and are you tested? >> >> kcmsock.c will change and strp_data_ready I think locked. >> >> Tom, please send a patch for me? I can test and inform you. >> > Hi Ozgur, > > I reposted the patches as RFC "kcm: Fix lockdep issue". Please test if you > can! OK, I will work as your typist this time: #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master But I wonder what part of the following you don't understand? Do we need to improve wording or something? > If you think the patch needs additional testing, you can ask syzbot to test > it. > See > https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot Also I don't know what git repo/branch you have in mind. Kernel patches don't generally apply just to any tree. Fingers crossed that I guessed correctly and it will apply. diff --git a/include/net/sock.h b/include/net/sock.h index 9155da422692..7a7b14e9628a 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1514,6 +1514,11 @@ static inline bool sock_owned_by_user(const struct sock *sk) return sk->sk_lock.owned; } +static inline bool sock_owned_by_user_nocheck(const struct sock *sk) +{ + return sk->sk_lock.owned; +} + /* no reclassification while locks are held */ static inline bool sock_allow_reclassification(const struct sock *csk) { diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c index c5fda15ba319..1fdab5c4eda8 100644 --- a/net/strparser/strparser.c +++ b/net/strparser/strparser.c @@ -401,7 +401,7 @@ void strp_data_ready(struct strparser *strp) * allows a thread in BH context to safely check if the process * lock is held. In this case, if the lock is held, queue work. */ - if (sock_owned_by_user(strp->sk)) { + if (sock_owned_by_user_nocheck(strp->sk)) { queue_work(strp_wq, &strp->work); return; }
Re: WARNING in strp_data_ready
On Wed, Dec 27, 2017 at 12:20 PM, Ozgur wrote: > > > 27.12.2017, 23:14, "Dmitry Vyukov" : >> On Wed, Dec 27, 2017 at 9:08 PM, Ozgur wrote: >>> 27.12.2017, 22:21, "Dmitry Vyukov" : On Wed, Dec 27, 2017 at 8:09 PM, Tom Herbert wrote: > Did you try the patch I posted? Hi Tom, >>> >>> Hello Dmitry, >>> No. And I didn't know I need to. Why? If you think the patch needs additional testing, you can ask syzbot to test it. See https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot Otherwise proceed with committing it. Or what are we waiting for? Thanks >>> >>> I think we need to fixed patch for crash, in fact check to patch code and >>> test solve the bug. >>> How do test it because there is no patch in the following bug? >> >> Hi Ozgur, >> >> I am not sure I completely understand what you mean. But the >> reproducer for this bug (which one can use for testing) is here: >> https://groups.google.com/forum/#!topic/syzkaller-bugs/Kxs05ziCpgY >> Tom also mentions there is some patch for this, but I don't know where >> it is, it doesn't seem to be referenced from this thread. > > Hello Dmitry, > > Ah, I'm sorry I don't seen Tom mail and I don't have a patch not tested :) > I think Tom send patch to only you and are you tested? > > kcmsock.c will change and strp_data_ready I think locked. > > Tom, please send a patch for me? I can test and inform you. > Hi Ozgur, I reposted the patches as RFC "kcm: Fix lockdep issue". Please test if you can! Thanks, Tom > Regards > > Ozgur > >>> The fix patch should be for this net/kcm/kcmsock.c file and lock functions >>> must be added calling sk_data_ready (). >>> Regards >>> >>> Ozgur >>> > On Wed, Dec 27, 2017 at 10:25 AM, Dmitry Vyukov > wrote: >> On Wed, Dec 6, 2017 at 4:44 PM, Dmitry Vyukov >> wrote: wrote: > On 10/24/2017 08:20 AM, syzbot wrote: >> Hello, >> >> syzkaller hit the following crash on >> 73d3393ada4f70fa3df5639c8d438f2f034c0ecb >> >> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master >> compiler: gcc (GCC) 7.1.1 20170620 >> .config is attached >> Raw console output is attached. >> C reproducer is attached >> syzkaller reproducer is attached. See https://goo.gl/kgGztJ >> for information about syzkaller reproducers >> >> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 >> sock_owned_by_me include/net/sock.h:1505 [inline] >> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 >> sock_owned_by_user include/net/sock.h:1511 [inline] >> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 >> strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404 >> Kernel panic - not syncing: panic_on_warn set ... >> >> CPU: 0 PID: 2996 Comm: syzkaller142210 Not tainted 4.14.0-rc5+ #138 >> Hardware name: Google Google Compute Engine/Google Compute Engine, >> BIOS Google 01/01/2011 >> Call Trace: >> >>__dump_stack lib/dump_stack.c:16 [inline] >>dump_stack+0x194/0x257 lib/dump_stack.c:52 >>panic+0x1e4/0x417 kernel/panic.c:181 >>__warn+0x1c4/0x1d9 kernel/panic.c:542 >>report_bug+0x211/0x2d0 lib/bug.c:183 >>fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178 >>do_trap_no_signal arch/x86/kernel/traps.c:212 [inline] >>do_trap+0x260/0x390 arch/x86/kernel/traps.c:261 >>do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298 >>do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311 >>invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905 >> RIP: 0010:sock_owned_by_me include/net/sock.h:1505 [inline] >> RIP: 0010:sock_owned_by_user include/net/sock.h:1511 [inline] >> RIP: 0010:strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404 >> RSP: 0018:8801db206b18 EFLAGS: 00010206 >> RAX: 8801d1e02080 RBX: 8801dad74c48 RCX: >> RDX: 0100 RSI: 8801d29fa0a0 RDI: 85cbede0 >> RBP: 8801db206b38 R08: 0005 R09: 10ce0bcd >> R10: 8801db206a00 R11: dc00 R12: 8801d29fa000 >> R13: 8801dad74c50 R14: 8801d4350a92 R15: 0001 >>psock_data_ready+0x56/0x70 net/kcm/kcmsock.c:353 > > Looks like KCM is calling sk_data_ready() without first taking the > sock lock. > > /* Called with lower sock held */ > static void kcm_rcv_strparser(struct strparser *strp, struct > sk_buff *skb) > { >[...] > if (kcm_queue_rcv_skb(&kcm->sk, skb)) { > > In this
Re: WARNING in strp_data_ready
On Wed, Dec 27, 2017 at 9:08 PM, Ozgur wrote: > > > 27.12.2017, 22:21, "Dmitry Vyukov" : >> On Wed, Dec 27, 2017 at 8:09 PM, Tom Herbert wrote: >>> Did you try the patch I posted? >> >> Hi Tom, > > Hello Dmitry, > >> No. And I didn't know I need to. Why? >> If you think the patch needs additional testing, you can ask syzbot to >> test it. See >> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot >> Otherwise proceed with committing it. Or what are we waiting for? >> >> Thanks > > I think we need to fixed patch for crash, in fact check to patch code and > test solve the bug. > How do test it because there is no patch in the following bug? Hi Ozgur, I am not sure I completely understand what you mean. But the reproducer for this bug (which one can use for testing) is here: https://groups.google.com/forum/#!topic/syzkaller-bugs/Kxs05ziCpgY Tom also mentions there is some patch for this, but I don't know where it is, it doesn't seem to be referenced from this thread. > The fix patch should be for this net/kcm/kcmsock.c file and lock functions > must be added calling sk_data_ready (). > Regards > > Ozgur > >>> On Wed, Dec 27, 2017 at 10:25 AM, Dmitry Vyukov wrote: On Wed, Dec 6, 2017 at 4:44 PM, Dmitry Vyukov wrote: >> wrote: >>> On 10/24/2017 08:20 AM, syzbot wrote: Hello, syzkaller hit the following crash on 73d3393ada4f70fa3df5639c8d438f2f034c0ecb git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master compiler: gcc (GCC) 7.1.1 20170620 .config is attached Raw console output is attached. C reproducer is attached syzkaller reproducer is attached. See https://goo.gl/kgGztJ for information about syzkaller reproducers WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_me include/net/sock.h:1505 [inline] WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_user include/net/sock.h:1511 [inline] WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 2996 Comm: syzkaller142210 Not tainted 4.14.0-rc5+ #138 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 panic+0x1e4/0x417 kernel/panic.c:181 __warn+0x1c4/0x1d9 kernel/panic.c:542 report_bug+0x211/0x2d0 lib/bug.c:183 fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178 do_trap_no_signal arch/x86/kernel/traps.c:212 [inline] do_trap+0x260/0x390 arch/x86/kernel/traps.c:261 do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311 invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905 RIP: 0010:sock_owned_by_me include/net/sock.h:1505 [inline] RIP: 0010:sock_owned_by_user include/net/sock.h:1511 [inline] RIP: 0010:strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404 RSP: 0018:8801db206b18 EFLAGS: 00010206 RAX: 8801d1e02080 RBX: 8801dad74c48 RCX: RDX: 0100 RSI: 8801d29fa0a0 RDI: 85cbede0 RBP: 8801db206b38 R08: 0005 R09: 10ce0bcd R10: 8801db206a00 R11: dc00 R12: 8801d29fa000 R13: 8801dad74c50 R14: 8801d4350a92 R15: 0001 psock_data_ready+0x56/0x70 net/kcm/kcmsock.c:353 >>> >>> Looks like KCM is calling sk_data_ready() without first taking the >>> sock lock. >>> >>> /* Called with lower sock held */ >>> static void kcm_rcv_strparser(struct strparser *strp, struct sk_buff >>> *skb) >>> { >>> [...] >>> if (kcm_queue_rcv_skb(&kcm->sk, skb)) { >>> >>> In this case kcm->sk is not the same lock the comment is referring to. >>> And kcm_queue_rcv_skb() will eventually call sk_data_ready(). >>> >>> @Tom, how about wrapping the sk_data_ready call in {lock|release}_sock? >>> I don't have anything better in mind immediately. >> The sock locks are taken in reverse order in the send path so so >> grabbing kcm sock lock with lower lock held to call sk_data_ready may >> lead to deadlock like I think. >> >> It might be possible to change the order in the send path to do this. >> Something like: >> >> trylock on lower socket lock >> -if trylock fails >>- release kcm sock lock >>- lock lower sock >>- lock kcm sock >> - call sendpage locked fun
Re: WARNING in strp_data_ready
On Wed, Dec 27, 2017 at 8:09 PM, Tom Herbert wrote: > Did you try the patch I posted? Hi Tom, No. And I didn't know I need to. Why? If you think the patch needs additional testing, you can ask syzbot to test it. See https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot Otherwise proceed with committing it. Or what are we waiting for? Thanks > On Wed, Dec 27, 2017 at 10:25 AM, Dmitry Vyukov wrote: >> On Wed, Dec 6, 2017 at 4:44 PM, Dmitry Vyukov wrote: wrote: > On 10/24/2017 08:20 AM, syzbot wrote: >> Hello, >> >> syzkaller hit the following crash on >> 73d3393ada4f70fa3df5639c8d438f2f034c0ecb >> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master >> compiler: gcc (GCC) 7.1.1 20170620 >> .config is attached >> Raw console output is attached. >> C reproducer is attached >> syzkaller reproducer is attached. See https://goo.gl/kgGztJ >> for information about syzkaller reproducers >> >> >> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_me >> include/net/sock.h:1505 [inline] >> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 >> sock_owned_by_user include/net/sock.h:1511 [inline] >> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 >> strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404 >> Kernel panic - not syncing: panic_on_warn set ... >> >> CPU: 0 PID: 2996 Comm: syzkaller142210 Not tainted 4.14.0-rc5+ #138 >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS >> Google 01/01/2011 >> Call Trace: >> >> __dump_stack lib/dump_stack.c:16 [inline] >> dump_stack+0x194/0x257 lib/dump_stack.c:52 >> panic+0x1e4/0x417 kernel/panic.c:181 >> __warn+0x1c4/0x1d9 kernel/panic.c:542 >> report_bug+0x211/0x2d0 lib/bug.c:183 >> fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178 >> do_trap_no_signal arch/x86/kernel/traps.c:212 [inline] >> do_trap+0x260/0x390 arch/x86/kernel/traps.c:261 >> do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298 >> do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311 >> invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905 >> RIP: 0010:sock_owned_by_me include/net/sock.h:1505 [inline] >> RIP: 0010:sock_owned_by_user include/net/sock.h:1511 [inline] >> RIP: 0010:strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404 >> RSP: 0018:8801db206b18 EFLAGS: 00010206 >> RAX: 8801d1e02080 RBX: 8801dad74c48 RCX: >> RDX: 0100 RSI: 8801d29fa0a0 RDI: 85cbede0 >> RBP: 8801db206b38 R08: 0005 R09: 10ce0bcd >> R10: 8801db206a00 R11: dc00 R12: 8801d29fa000 >> R13: 8801dad74c50 R14: 8801d4350a92 R15: 0001 >> psock_data_ready+0x56/0x70 net/kcm/kcmsock.c:353 > > Looks like KCM is calling sk_data_ready() without first taking the > sock lock. > > /* Called with lower sock held */ > static void kcm_rcv_strparser(struct strparser *strp, struct sk_buff *skb) > { > [...] > if (kcm_queue_rcv_skb(&kcm->sk, skb)) { > > In this case kcm->sk is not the same lock the comment is referring to. > And kcm_queue_rcv_skb() will eventually call sk_data_ready(). > > @Tom, how about wrapping the sk_data_ready call in {lock|release}_sock? > I don't have anything better in mind immediately. > The sock locks are taken in reverse order in the send path so so grabbing kcm sock lock with lower lock held to call sk_data_ready may lead to deadlock like I think. It might be possible to change the order in the send path to do this. Something like: trylock on lower socket lock -if trylock fails - release kcm sock lock - lock lower sock - lock kcm sock - call sendpage locked function I admit that dealing with two levels of socket locks in the data path is quite a pain :-) >>> >>> up >>> >>> still happening and we've lost 50K+ test VMs on this >> >> up >> >> Still happens and number of crashes crossed 60K, can we do something >> with this please?
Re: WARNING in strp_data_ready
Did you try the patch I posted? On Wed, Dec 27, 2017 at 10:25 AM, Dmitry Vyukov wrote: > On Wed, Dec 6, 2017 at 4:44 PM, Dmitry Vyukov wrote: >>> wrote: On 10/24/2017 08:20 AM, syzbot wrote: > Hello, > > syzkaller hit the following crash on > 73d3393ada4f70fa3df5639c8d438f2f034c0ecb > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master > compiler: gcc (GCC) 7.1.1 20170620 > .config is attached > Raw console output is attached. > C reproducer is attached > syzkaller reproducer is attached. See https://goo.gl/kgGztJ > for information about syzkaller reproducers > > > WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_me > include/net/sock.h:1505 [inline] > WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_user > include/net/sock.h:1511 [inline] > WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 > strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404 > Kernel panic - not syncing: panic_on_warn set ... > > CPU: 0 PID: 2996 Comm: syzkaller142210 Not tainted 4.14.0-rc5+ #138 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > Google 01/01/2011 > Call Trace: > > __dump_stack lib/dump_stack.c:16 [inline] > dump_stack+0x194/0x257 lib/dump_stack.c:52 > panic+0x1e4/0x417 kernel/panic.c:181 > __warn+0x1c4/0x1d9 kernel/panic.c:542 > report_bug+0x211/0x2d0 lib/bug.c:183 > fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178 > do_trap_no_signal arch/x86/kernel/traps.c:212 [inline] > do_trap+0x260/0x390 arch/x86/kernel/traps.c:261 > do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298 > do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311 > invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905 > RIP: 0010:sock_owned_by_me include/net/sock.h:1505 [inline] > RIP: 0010:sock_owned_by_user include/net/sock.h:1511 [inline] > RIP: 0010:strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404 > RSP: 0018:8801db206b18 EFLAGS: 00010206 > RAX: 8801d1e02080 RBX: 8801dad74c48 RCX: > RDX: 0100 RSI: 8801d29fa0a0 RDI: 85cbede0 > RBP: 8801db206b38 R08: 0005 R09: 10ce0bcd > R10: 8801db206a00 R11: dc00 R12: 8801d29fa000 > R13: 8801dad74c50 R14: 8801d4350a92 R15: 0001 > psock_data_ready+0x56/0x70 net/kcm/kcmsock.c:353 Looks like KCM is calling sk_data_ready() without first taking the sock lock. /* Called with lower sock held */ static void kcm_rcv_strparser(struct strparser *strp, struct sk_buff *skb) { [...] if (kcm_queue_rcv_skb(&kcm->sk, skb)) { In this case kcm->sk is not the same lock the comment is referring to. And kcm_queue_rcv_skb() will eventually call sk_data_ready(). @Tom, how about wrapping the sk_data_ready call in {lock|release}_sock? I don't have anything better in mind immediately. >>> The sock locks are taken in reverse order in the send path so so >>> grabbing kcm sock lock with lower lock held to call sk_data_ready may >>> lead to deadlock like I think. >>> >>> It might be possible to change the order in the send path to do this. >>> Something like: >>> >>> trylock on lower socket lock >>> -if trylock fails >>> - release kcm sock lock >>> - lock lower sock >>> - lock kcm sock >>> - call sendpage locked function >>> >>> I admit that dealing with two levels of socket locks in the data path >>> is quite a pain :-) >> >> up >> >> still happening and we've lost 50K+ test VMs on this > > up > > Still happens and number of crashes crossed 60K, can we do something > with this please?
Re: WARNING in strp_data_ready
On Wed, Dec 6, 2017 at 4:44 PM, Dmitry Vyukov wrote: >> wrote: >>> On 10/24/2017 08:20 AM, syzbot wrote: Hello, syzkaller hit the following crash on 73d3393ada4f70fa3df5639c8d438f2f034c0ecb git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master compiler: gcc (GCC) 7.1.1 20170620 .config is attached Raw console output is attached. C reproducer is attached syzkaller reproducer is attached. See https://goo.gl/kgGztJ for information about syzkaller reproducers WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_me include/net/sock.h:1505 [inline] WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_user include/net/sock.h:1511 [inline] WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 2996 Comm: syzkaller142210 Not tainted 4.14.0-rc5+ #138 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 panic+0x1e4/0x417 kernel/panic.c:181 __warn+0x1c4/0x1d9 kernel/panic.c:542 report_bug+0x211/0x2d0 lib/bug.c:183 fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178 do_trap_no_signal arch/x86/kernel/traps.c:212 [inline] do_trap+0x260/0x390 arch/x86/kernel/traps.c:261 do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311 invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905 RIP: 0010:sock_owned_by_me include/net/sock.h:1505 [inline] RIP: 0010:sock_owned_by_user include/net/sock.h:1511 [inline] RIP: 0010:strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404 RSP: 0018:8801db206b18 EFLAGS: 00010206 RAX: 8801d1e02080 RBX: 8801dad74c48 RCX: RDX: 0100 RSI: 8801d29fa0a0 RDI: 85cbede0 RBP: 8801db206b38 R08: 0005 R09: 10ce0bcd R10: 8801db206a00 R11: dc00 R12: 8801d29fa000 R13: 8801dad74c50 R14: 8801d4350a92 R15: 0001 psock_data_ready+0x56/0x70 net/kcm/kcmsock.c:353 >>> >>> Looks like KCM is calling sk_data_ready() without first taking the >>> sock lock. >>> >>> /* Called with lower sock held */ >>> static void kcm_rcv_strparser(struct strparser *strp, struct sk_buff *skb) >>> { >>> [...] >>> if (kcm_queue_rcv_skb(&kcm->sk, skb)) { >>> >>> In this case kcm->sk is not the same lock the comment is referring to. >>> And kcm_queue_rcv_skb() will eventually call sk_data_ready(). >>> >>> @Tom, how about wrapping the sk_data_ready call in {lock|release}_sock? >>> I don't have anything better in mind immediately. >>> >> The sock locks are taken in reverse order in the send path so so >> grabbing kcm sock lock with lower lock held to call sk_data_ready may >> lead to deadlock like I think. >> >> It might be possible to change the order in the send path to do this. >> Something like: >> >> trylock on lower socket lock >> -if trylock fails >> - release kcm sock lock >> - lock lower sock >> - lock kcm sock >> - call sendpage locked function >> >> I admit that dealing with two levels of socket locks in the data path >> is quite a pain :-) > > up > > still happening and we've lost 50K+ test VMs on this up Still happens and number of crashes crossed 60K, can we do something with this please?
Re: WARNING in strp_data_ready
On Mon, Oct 30, 2017 at 11:06 PM, Tom Herbert wrote: > On Mon, Oct 30, 2017 at 2:44 PM, John Fastabend > wrote: >> On 10/24/2017 08:20 AM, syzbot wrote: >>> Hello, >>> >>> syzkaller hit the following crash on >>> 73d3393ada4f70fa3df5639c8d438f2f034c0ecb >>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master >>> compiler: gcc (GCC) 7.1.1 20170620 >>> .config is attached >>> Raw console output is attached. >>> C reproducer is attached >>> syzkaller reproducer is attached. See https://goo.gl/kgGztJ >>> for information about syzkaller reproducers >>> >>> >>> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_me >>> include/net/sock.h:1505 [inline] >>> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_user >>> include/net/sock.h:1511 [inline] >>> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 >>> strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404 >>> Kernel panic - not syncing: panic_on_warn set ... >>> >>> CPU: 0 PID: 2996 Comm: syzkaller142210 Not tainted 4.14.0-rc5+ #138 >>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS >>> Google 01/01/2011 >>> Call Trace: >>> >>> __dump_stack lib/dump_stack.c:16 [inline] >>> dump_stack+0x194/0x257 lib/dump_stack.c:52 >>> panic+0x1e4/0x417 kernel/panic.c:181 >>> __warn+0x1c4/0x1d9 kernel/panic.c:542 >>> report_bug+0x211/0x2d0 lib/bug.c:183 >>> fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178 >>> do_trap_no_signal arch/x86/kernel/traps.c:212 [inline] >>> do_trap+0x260/0x390 arch/x86/kernel/traps.c:261 >>> do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298 >>> do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311 >>> invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905 >>> RIP: 0010:sock_owned_by_me include/net/sock.h:1505 [inline] >>> RIP: 0010:sock_owned_by_user include/net/sock.h:1511 [inline] >>> RIP: 0010:strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404 >>> RSP: 0018:8801db206b18 EFLAGS: 00010206 >>> RAX: 8801d1e02080 RBX: 8801dad74c48 RCX: >>> RDX: 0100 RSI: 8801d29fa0a0 RDI: 85cbede0 >>> RBP: 8801db206b38 R08: 0005 R09: 10ce0bcd >>> R10: 8801db206a00 R11: dc00 R12: 8801d29fa000 >>> R13: 8801dad74c50 R14: 8801d4350a92 R15: 0001 >>> psock_data_ready+0x56/0x70 net/kcm/kcmsock.c:353 >> >> Looks like KCM is calling sk_data_ready() without first taking the >> sock lock. >> >> /* Called with lower sock held */ >> static void kcm_rcv_strparser(struct strparser *strp, struct sk_buff *skb) >> { >> [...] >> if (kcm_queue_rcv_skb(&kcm->sk, skb)) { >> >> In this case kcm->sk is not the same lock the comment is referring to. >> And kcm_queue_rcv_skb() will eventually call sk_data_ready(). >> >> @Tom, how about wrapping the sk_data_ready call in {lock|release}_sock? >> I don't have anything better in mind immediately. >> > The sock locks are taken in reverse order in the send path so so > grabbing kcm sock lock with lower lock held to call sk_data_ready may > lead to deadlock like I think. > > It might be possible to change the order in the send path to do this. > Something like: > > trylock on lower socket lock > -if trylock fails > - release kcm sock lock > - lock lower sock > - lock kcm sock > - call sendpage locked function > > I admit that dealing with two levels of socket locks in the data path > is quite a pain :-) up still happening and we've lost 50K+ test VMs on this
Re: WARNING in strp_data_ready
On Mon, Oct 30, 2017 at 2:44 PM, John Fastabend wrote: > On 10/24/2017 08:20 AM, syzbot wrote: >> Hello, >> >> syzkaller hit the following crash on 73d3393ada4f70fa3df5639c8d438f2f034c0ecb >> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master >> compiler: gcc (GCC) 7.1.1 20170620 >> .config is attached >> Raw console output is attached. >> C reproducer is attached >> syzkaller reproducer is attached. See https://goo.gl/kgGztJ >> for information about syzkaller reproducers >> >> >> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_me >> include/net/sock.h:1505 [inline] >> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_user >> include/net/sock.h:1511 [inline] >> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 >> strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404 >> Kernel panic - not syncing: panic_on_warn set ... >> >> CPU: 0 PID: 2996 Comm: syzkaller142210 Not tainted 4.14.0-rc5+ #138 >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS >> Google 01/01/2011 >> Call Trace: >> >> __dump_stack lib/dump_stack.c:16 [inline] >> dump_stack+0x194/0x257 lib/dump_stack.c:52 >> panic+0x1e4/0x417 kernel/panic.c:181 >> __warn+0x1c4/0x1d9 kernel/panic.c:542 >> report_bug+0x211/0x2d0 lib/bug.c:183 >> fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178 >> do_trap_no_signal arch/x86/kernel/traps.c:212 [inline] >> do_trap+0x260/0x390 arch/x86/kernel/traps.c:261 >> do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298 >> do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311 >> invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905 >> RIP: 0010:sock_owned_by_me include/net/sock.h:1505 [inline] >> RIP: 0010:sock_owned_by_user include/net/sock.h:1511 [inline] >> RIP: 0010:strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404 >> RSP: 0018:8801db206b18 EFLAGS: 00010206 >> RAX: 8801d1e02080 RBX: 8801dad74c48 RCX: >> RDX: 0100 RSI: 8801d29fa0a0 RDI: 85cbede0 >> RBP: 8801db206b38 R08: 0005 R09: 10ce0bcd >> R10: 8801db206a00 R11: dc00 R12: 8801d29fa000 >> R13: 8801dad74c50 R14: 8801d4350a92 R15: 0001 >> psock_data_ready+0x56/0x70 net/kcm/kcmsock.c:353 > > Looks like KCM is calling sk_data_ready() without first taking the > sock lock. > > /* Called with lower sock held */ > static void kcm_rcv_strparser(struct strparser *strp, struct sk_buff *skb) > { > [...] > if (kcm_queue_rcv_skb(&kcm->sk, skb)) { > > In this case kcm->sk is not the same lock the comment is referring to. > And kcm_queue_rcv_skb() will eventually call sk_data_ready(). > > @Tom, how about wrapping the sk_data_ready call in {lock|release}_sock? > I don't have anything better in mind immediately. > The sock locks are taken in reverse order in the send path so so grabbing kcm sock lock with lower lock held to call sk_data_ready may lead to deadlock like I think. It might be possible to change the order in the send path to do this. Something like: trylock on lower socket lock -if trylock fails - release kcm sock lock - lock lower sock - lock kcm sock - call sendpage locked function I admit that dealing with two levels of socket locks in the data path is quite a pain :-) Tom > Thanks, > John
Re: WARNING in strp_data_ready
On 10/24/2017 08:20 AM, syzbot wrote: > Hello, > > syzkaller hit the following crash on 73d3393ada4f70fa3df5639c8d438f2f034c0ecb > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master > compiler: gcc (GCC) 7.1.1 20170620 > .config is attached > Raw console output is attached. > C reproducer is attached > syzkaller reproducer is attached. See https://goo.gl/kgGztJ > for information about syzkaller reproducers > > > WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_me > include/net/sock.h:1505 [inline] > WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_user > include/net/sock.h:1511 [inline] > WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 > strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404 > Kernel panic - not syncing: panic_on_warn set ... > > CPU: 0 PID: 2996 Comm: syzkaller142210 Not tainted 4.14.0-rc5+ #138 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > Google 01/01/2011 > Call Trace: > > __dump_stack lib/dump_stack.c:16 [inline] > dump_stack+0x194/0x257 lib/dump_stack.c:52 > panic+0x1e4/0x417 kernel/panic.c:181 > __warn+0x1c4/0x1d9 kernel/panic.c:542 > report_bug+0x211/0x2d0 lib/bug.c:183 > fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178 > do_trap_no_signal arch/x86/kernel/traps.c:212 [inline] > do_trap+0x260/0x390 arch/x86/kernel/traps.c:261 > do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298 > do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311 > invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905 > RIP: 0010:sock_owned_by_me include/net/sock.h:1505 [inline] > RIP: 0010:sock_owned_by_user include/net/sock.h:1511 [inline] > RIP: 0010:strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404 > RSP: 0018:8801db206b18 EFLAGS: 00010206 > RAX: 8801d1e02080 RBX: 8801dad74c48 RCX: > RDX: 0100 RSI: 8801d29fa0a0 RDI: 85cbede0 > RBP: 8801db206b38 R08: 0005 R09: 10ce0bcd > R10: 8801db206a00 R11: dc00 R12: 8801d29fa000 > R13: 8801dad74c50 R14: 8801d4350a92 R15: 0001 > psock_data_ready+0x56/0x70 net/kcm/kcmsock.c:353 Looks like KCM is calling sk_data_ready() without first taking the sock lock. /* Called with lower sock held */ static void kcm_rcv_strparser(struct strparser *strp, struct sk_buff *skb) { [...] if (kcm_queue_rcv_skb(&kcm->sk, skb)) { In this case kcm->sk is not the same lock the comment is referring to. And kcm_queue_rcv_skb() will eventually call sk_data_ready(). @Tom, how about wrapping the sk_data_ready call in {lock|release}_sock? I don't have anything better in mind immediately. Thanks, John