Re: WARNING in strp_data_ready

2017-12-28 Thread Dmitry Vyukov
On Thu, Dec 28, 2017 at 7:21 PM, Ozgur  wrote:
>
>
> 28.12.2017, 19:33, "Dmitry Vyukov" :
>> On Thu, Dec 28, 2017 at 5:14 PM, Tom Herbert  wrote:
>>>  On Thu, Dec 28, 2017 at 12:59 AM, Ozgur  wrote:
  28.12.2017, 04:19, "Tom Herbert" :
>  On Wed, Dec 27, 2017 at 12:20 PM, Ozgur  wrote:
>>   27.12.2017, 23:14, "Dmitry Vyukov" :
>>>   On Wed, Dec 27, 2017 at 9:08 PM, Ozgur  wrote:
27.12.2017, 22:21, "Dmitry Vyukov" :
>On Wed, Dec 27, 2017 at 8:09 PM, Tom Herbert 
>  wrote:
>> Did you try the patch I posted?
>
>Hi Tom,

Hello Dmitry,

>No. And I didn't know I need to. Why?
>If you think the patch needs additional testing, you can ask 
> syzbot to
>test it. See 
> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot
>Otherwise proceed with committing it. Or what are we waiting for?
>
>Thanks

I think we need to fixed patch for crash, in fact check to patch 
 code and test solve the bug.
How do test it because there is no patch in the following bug?
>>>
>>>   Hi Ozgur,
>>>
>>>   I am not sure I completely understand what you mean. But the
>>>   reproducer for this bug (which one can use for testing) is here:
>>>   https://groups.google.com/forum/#!topic/syzkaller-bugs/Kxs05ziCpgY
>>>   Tom also mentions there is some patch for this, but I don't know where
>>>   it is, it doesn't seem to be referenced from this thread.
>>
>>   Hello Dmitry,
>>
>>   Ah, I'm sorry I don't seen Tom mail and I don't have a patch not 
>> tested :)
>>   I think Tom send patch to only you and are you tested?
>>
>>   kcmsock.c will change and strp_data_ready I think locked.
>>
>>   Tom, please send a patch for me? I can test and inform you.
>
>  Hi Ozgur,
>
>  I reposted the patches as RFC "kcm: Fix lockdep issue". Please test if 
> you can!
>
>  Thanks,
>  Tom

  Hello Tom,

  Which are you use the repos? I pulled but I don't seen this patches.
>>>  They are not in any public repo yet. I posted the patches to netdev
>>>  list so they can be reviewed and tested by third parties. Posting
>>>  patches to the list a normal path to get patches into the kernel
>>>  
>>> (http://nickdesaulniers.github.io/blog/2017/05/16/submitting-your-first-patch-to-the-linux-kernel-and-responding-to-feedback/).
>>>
>>>  These patches were applied to net-next but are simple enough that they
>>>  should apply to other branches. I will repost and target to net per
>>>  Dave's directive once they are verified to fix the issue.
>
> Hello,
>
> thanks Tom and I have tested the fixed patch for linux-next builds and don't 
> have to kernel panic. when nocheck funcs call sk_lock.owned and kernel 
> doesn't give a panic.  I have compiled and uploaded next-kernel.
>
> Dmitry,
> could you test it on linux-next?

If you are trying to test how many times I can repeat this, I can
repeat this lots of times:

If you think the patch needs additional testing, you can ask syzbot to
test it. See 
https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot


Re: WARNING in strp_data_ready

2017-12-28 Thread Dmitry Vyukov
On Thu, Dec 28, 2017 at 5:14 PM, Tom Herbert  wrote:
> On Thu, Dec 28, 2017 at 12:59 AM, Ozgur  wrote:
>>
>>
>> 28.12.2017, 04:19, "Tom Herbert" :
>>> On Wed, Dec 27, 2017 at 12:20 PM, Ozgur  wrote:
  27.12.2017, 23:14, "Dmitry Vyukov" :
>  On Wed, Dec 27, 2017 at 9:08 PM, Ozgur  wrote:
>>   27.12.2017, 22:21, "Dmitry Vyukov" :
>>>   On Wed, Dec 27, 2017 at 8:09 PM, Tom Herbert  
>>> wrote:
Did you try the patch I posted?
>>>
>>>   Hi Tom,
>>
>>   Hello Dmitry,
>>
>>>   No. And I didn't know I need to. Why?
>>>   If you think the patch needs additional testing, you can ask syzbot to
>>>   test it. See 
>>> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot
>>>   Otherwise proceed with committing it. Or what are we waiting for?
>>>
>>>   Thanks
>>
>>   I think we need to fixed patch for crash, in fact check to patch code 
>> and test solve the bug.
>>   How do test it because there is no patch in the following bug?
>
>  Hi Ozgur,
>
>  I am not sure I completely understand what you mean. But the
>  reproducer for this bug (which one can use for testing) is here:
>  https://groups.google.com/forum/#!topic/syzkaller-bugs/Kxs05ziCpgY
>  Tom also mentions there is some patch for this, but I don't know where
>  it is, it doesn't seem to be referenced from this thread.

  Hello Dmitry,

  Ah, I'm sorry I don't seen Tom mail and I don't have a patch not tested :)
  I think Tom send patch to only you and are you tested?

  kcmsock.c will change and strp_data_ready I think locked.

  Tom, please send a patch for me? I can test and inform you.
>>>
>>> Hi Ozgur,
>>>
>>> I reposted the patches as RFC "kcm: Fix lockdep issue". Please test if you 
>>> can!
>>>
>>> Thanks,
>>> Tom
>>
>> Hello Tom,
>>
>> Which are you use the repos? I pulled but I don't seen this patches.
>>
> They are not in any public repo yet. I posted the patches to netdev
> list so they can be reviewed and tested by third parties. Posting
> patches to the list a normal path to get patches into the kernel
> (http://nickdesaulniers.github.io/blog/2017/05/16/submitting-your-first-patch-to-the-linux-kernel-and-responding-to-feedback/).
>
> These patches were applied to net-next but are simple enough that they
> should apply to other branches. I will repost and target to net per
> Dave's directive once they are verified to fix the issue.

FWIW they are already verified to fix the issue, see few emails up, also here:
https://groups.google.com/d/msg/syzkaller-bugs/Kxs05ziCpgY/fPdZcO_GAwAJ
and don't forget this:
https://groups.google.com/d/msg/syzkaller-bugs/Kxs05ziCpgY/uGjsrA3HAwAJ


Re: WARNING in strp_data_ready

2017-12-28 Thread Tom Herbert
On Thu, Dec 28, 2017 at 12:59 AM, Ozgur  wrote:
>
>
> 28.12.2017, 04:19, "Tom Herbert" :
>> On Wed, Dec 27, 2017 at 12:20 PM, Ozgur  wrote:
>>>  27.12.2017, 23:14, "Dmitry Vyukov" :
  On Wed, Dec 27, 2017 at 9:08 PM, Ozgur  wrote:
>   27.12.2017, 22:21, "Dmitry Vyukov" :
>>   On Wed, Dec 27, 2017 at 8:09 PM, Tom Herbert  
>> wrote:
>>>Did you try the patch I posted?
>>
>>   Hi Tom,
>
>   Hello Dmitry,
>
>>   No. And I didn't know I need to. Why?
>>   If you think the patch needs additional testing, you can ask syzbot to
>>   test it. See 
>> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot
>>   Otherwise proceed with committing it. Or what are we waiting for?
>>
>>   Thanks
>
>   I think we need to fixed patch for crash, in fact check to patch code 
> and test solve the bug.
>   How do test it because there is no patch in the following bug?

  Hi Ozgur,

  I am not sure I completely understand what you mean. But the
  reproducer for this bug (which one can use for testing) is here:
  https://groups.google.com/forum/#!topic/syzkaller-bugs/Kxs05ziCpgY
  Tom also mentions there is some patch for this, but I don't know where
  it is, it doesn't seem to be referenced from this thread.
>>>
>>>  Hello Dmitry,
>>>
>>>  Ah, I'm sorry I don't seen Tom mail and I don't have a patch not tested :)
>>>  I think Tom send patch to only you and are you tested?
>>>
>>>  kcmsock.c will change and strp_data_ready I think locked.
>>>
>>>  Tom, please send a patch for me? I can test and inform you.
>>
>> Hi Ozgur,
>>
>> I reposted the patches as RFC "kcm: Fix lockdep issue". Please test if you 
>> can!
>>
>> Thanks,
>> Tom
>
> Hello Tom,
>
> Which are you use the repos? I pulled but I don't seen this patches.
>
They are not in any public repo yet. I posted the patches to netdev
list so they can be reviewed and tested by third parties. Posting
patches to the list a normal path to get patches into the kernel
(http://nickdesaulniers.github.io/blog/2017/05/16/submitting-your-first-patch-to-the-linux-kernel-and-responding-to-feedback/).

These patches were applied to net-next but are simple enough that they
should apply to other branches. I will repost and target to net per
Dave's directive once they are verified to fix the issue.

Tom


Re: WARNING in strp_data_ready

2017-12-28 Thread Dmitry Vyukov
On Thu, Dec 28, 2017 at 9:12 AM, syzbot

wrote:
> Hello,
>
> syzbot has tested the proposed patch and the reproducer did not trigger
> crash:
>
> Reported-and-tested-by:
> syzbot+c91c53af67f9ebe599a337d2e70950366153b...@syzkaller.appspotmail.com

/\/\/\/\/\/\/\/\/\/\/\/\/\

Tom, please don't miss this part!

> Note: the tag will also help syzbot to understand when the bug is fixed.
>
> Tested on commit 5f520fc318764df800789edd202b5e3b55130613
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> Patch is attached.
> Kernel config is attached.
>
>
> ---
> There is no WARRANTY for the result, to the extent permitted by applicable
> law.
> Except when otherwise stated in writing syzbot provides the result "AS IS"
> without warranty of any kind, either expressed or implied, but not limited
> to,
> the implied warranties of merchantability and fittness for a particular
> purpose.
> The entire risk as to the quality of the result is with you. Should the
> result
> prove defective, you assume the cost of all necessary servicing, repair or
> correction.


Re: WARNING in strp_data_ready

2017-12-27 Thread Dmitry Vyukov
On Thu, Dec 28, 2017 at 2:19 AM, Tom Herbert  wrote:
> On Wed, Dec 27, 2017 at 12:20 PM, Ozgur  wrote:
>>
>>
>> 27.12.2017, 23:14, "Dmitry Vyukov" :
>>> On Wed, Dec 27, 2017 at 9:08 PM, Ozgur  wrote:
  27.12.2017, 22:21, "Dmitry Vyukov" :
>  On Wed, Dec 27, 2017 at 8:09 PM, Tom Herbert  
> wrote:
>>   Did you try the patch I posted?
>
>  Hi Tom,

  Hello Dmitry,

>  No. And I didn't know I need to. Why?
>  If you think the patch needs additional testing, you can ask syzbot to
>  test it. See 
> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot
>  Otherwise proceed with committing it. Or what are we waiting for?
>
>  Thanks

  I think we need to fixed patch for crash, in fact check to patch code and 
 test solve the bug.
  How do test it because there is no patch in the following bug?
>>>
>>> Hi Ozgur,
>>>
>>> I am not sure I completely understand what you mean. But the
>>> reproducer for this bug (which one can use for testing) is here:
>>> https://groups.google.com/forum/#!topic/syzkaller-bugs/Kxs05ziCpgY
>>> Tom also mentions there is some patch for this, but I don't know where
>>> it is, it doesn't seem to be referenced from this thread.
>>
>> Hello Dmitry,
>>
>> Ah, I'm sorry I don't seen Tom mail and I don't have a patch not tested :)
>> I think Tom send patch to only you and are you tested?
>>
>> kcmsock.c will change and strp_data_ready I think locked.
>>
>> Tom, please send a patch for me? I can test and inform you.
>>
> Hi Ozgur,
>
> I reposted the patches as RFC "kcm: Fix lockdep issue". Please test if you 
> can!

OK, I will work as your typist this time:

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
master

But I wonder what part of the following you don't understand? Do we
need to improve wording or something?

> If you think the patch needs additional testing, you can ask syzbot to test 
> it.
> See 
> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot

Also I don't know what git repo/branch you have in mind. Kernel
patches don't generally apply just to any tree. Fingers crossed that I
guessed correctly and it will apply.
diff --git a/include/net/sock.h b/include/net/sock.h
index 9155da422692..7a7b14e9628a 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1514,6 +1514,11 @@ static inline bool sock_owned_by_user(const struct sock *sk)
 	return sk->sk_lock.owned;
 }
 
+static inline bool sock_owned_by_user_nocheck(const struct sock *sk)
+{
+	return sk->sk_lock.owned;
+}
+
 /* no reclassification while locks are held */
 static inline bool sock_allow_reclassification(const struct sock *csk)
 {
diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
index c5fda15ba319..1fdab5c4eda8 100644
--- a/net/strparser/strparser.c
+++ b/net/strparser/strparser.c
@@ -401,7 +401,7 @@ void strp_data_ready(struct strparser *strp)
 	 * allows a thread in BH context to safely check if the process
 	 * lock is held. In this case, if the lock is held, queue work.
 	 */
-	if (sock_owned_by_user(strp->sk)) {
+	if (sock_owned_by_user_nocheck(strp->sk)) {
 		queue_work(strp_wq, &strp->work);
 		return;
 	}


Re: WARNING in strp_data_ready

2017-12-27 Thread Tom Herbert
On Wed, Dec 27, 2017 at 12:20 PM, Ozgur  wrote:
>
>
> 27.12.2017, 23:14, "Dmitry Vyukov" :
>> On Wed, Dec 27, 2017 at 9:08 PM, Ozgur  wrote:
>>>  27.12.2017, 22:21, "Dmitry Vyukov" :
  On Wed, Dec 27, 2017 at 8:09 PM, Tom Herbert  wrote:
>   Did you try the patch I posted?

  Hi Tom,
>>>
>>>  Hello Dmitry,
>>>
  No. And I didn't know I need to. Why?
  If you think the patch needs additional testing, you can ask syzbot to
  test it. See 
 https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot
  Otherwise proceed with committing it. Or what are we waiting for?

  Thanks
>>>
>>>  I think we need to fixed patch for crash, in fact check to patch code and 
>>> test solve the bug.
>>>  How do test it because there is no patch in the following bug?
>>
>> Hi Ozgur,
>>
>> I am not sure I completely understand what you mean. But the
>> reproducer for this bug (which one can use for testing) is here:
>> https://groups.google.com/forum/#!topic/syzkaller-bugs/Kxs05ziCpgY
>> Tom also mentions there is some patch for this, but I don't know where
>> it is, it doesn't seem to be referenced from this thread.
>
> Hello Dmitry,
>
> Ah, I'm sorry I don't seen Tom mail and I don't have a patch not tested :)
> I think Tom send patch to only you and are you tested?
>
> kcmsock.c will change and strp_data_ready I think locked.
>
> Tom, please send a patch for me? I can test and inform you.
>
Hi Ozgur,

I reposted the patches as RFC "kcm: Fix lockdep issue". Please test if you can!

Thanks,
Tom

> Regards
>
> Ozgur
>
>>>  The fix patch should be for this net/kcm/kcmsock.c file and lock functions 
>>> must be added calling sk_data_ready ().
>>>  Regards
>>>
>>>  Ozgur
>>>
>   On Wed, Dec 27, 2017 at 10:25 AM, Dmitry Vyukov  
> wrote:
>>   On Wed, Dec 6, 2017 at 4:44 PM, Dmitry Vyukov  
>> wrote:
wrote:
>   On 10/24/2017 08:20 AM, syzbot wrote:
>>   Hello,
>>
>>   syzkaller hit the following crash on 
>> 73d3393ada4f70fa3df5639c8d438f2f034c0ecb
>>   
>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
>>   compiler: gcc (GCC) 7.1.1 20170620
>>   .config is attached
>>   Raw console output is attached.
>>   C reproducer is attached
>>   syzkaller reproducer is attached. See https://goo.gl/kgGztJ
>>   for information about syzkaller reproducers
>>
>>   WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 
>> sock_owned_by_me include/net/sock.h:1505 [inline]
>>   WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 
>> sock_owned_by_user include/net/sock.h:1511 [inline]
>>   WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 
>> strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404
>>   Kernel panic - not syncing: panic_on_warn set ...
>>
>>   CPU: 0 PID: 2996 Comm: syzkaller142210 Not tainted 4.14.0-rc5+ #138
>>   Hardware name: Google Google Compute Engine/Google Compute Engine, 
>> BIOS Google 01/01/2011
>>   Call Trace:
>>
>>__dump_stack lib/dump_stack.c:16 [inline]
>>dump_stack+0x194/0x257 lib/dump_stack.c:52
>>panic+0x1e4/0x417 kernel/panic.c:181
>>__warn+0x1c4/0x1d9 kernel/panic.c:542
>>report_bug+0x211/0x2d0 lib/bug.c:183
>>fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178
>>do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
>>do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
>>do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298
>>do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
>>invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
>>   RIP: 0010:sock_owned_by_me include/net/sock.h:1505 [inline]
>>   RIP: 0010:sock_owned_by_user include/net/sock.h:1511 [inline]
>>   RIP: 0010:strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404
>>   RSP: 0018:8801db206b18 EFLAGS: 00010206
>>   RAX: 8801d1e02080 RBX: 8801dad74c48 RCX: 
>>   RDX: 0100 RSI: 8801d29fa0a0 RDI: 85cbede0
>>   RBP: 8801db206b38 R08: 0005 R09: 10ce0bcd
>>   R10: 8801db206a00 R11: dc00 R12: 8801d29fa000
>>   R13: 8801dad74c50 R14: 8801d4350a92 R15: 0001
>>psock_data_ready+0x56/0x70 net/kcm/kcmsock.c:353
>
>   Looks like KCM is calling sk_data_ready() without first taking the
>   sock lock.
>
>   /* Called with lower sock held */
>   static void kcm_rcv_strparser(struct strparser *strp, struct 
> sk_buff *skb)
>   {
>[...]
>   if (kcm_queue_rcv_skb(&kcm->sk, skb)) {
>
>   In this

Re: WARNING in strp_data_ready

2017-12-27 Thread Dmitry Vyukov
On Wed, Dec 27, 2017 at 9:08 PM, Ozgur  wrote:
>
>
> 27.12.2017, 22:21, "Dmitry Vyukov" :
>> On Wed, Dec 27, 2017 at 8:09 PM, Tom Herbert  wrote:
>>>  Did you try the patch I posted?
>>
>> Hi Tom,
>
> Hello Dmitry,
>
>> No. And I didn't know I need to. Why?
>> If you think the patch needs additional testing, you can ask syzbot to
>> test it. See 
>> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot
>> Otherwise proceed with committing it. Or what are we waiting for?
>>
>> Thanks
>
> I think we need to fixed patch for crash, in fact check to patch code and 
> test solve the bug.
> How do test it because there is no patch in the following bug?

Hi Ozgur,

I am not sure I completely understand what you mean. But the
reproducer for this bug (which one can use for testing) is here:
https://groups.google.com/forum/#!topic/syzkaller-bugs/Kxs05ziCpgY
Tom also mentions there is some patch for this, but I don't know where
it is, it doesn't seem to be referenced from this thread.


> The fix patch should be for this net/kcm/kcmsock.c file and  lock functions 
> must be added calling sk_data_ready ().
> Regards
>
> Ozgur
>
>>>  On Wed, Dec 27, 2017 at 10:25 AM, Dmitry Vyukov  wrote:
  On Wed, Dec 6, 2017 at 4:44 PM, Dmitry Vyukov  wrote:
>>   wrote:
>>>  On 10/24/2017 08:20 AM, syzbot wrote:
  Hello,

  syzkaller hit the following crash on 
 73d3393ada4f70fa3df5639c8d438f2f034c0ecb
  
 git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
  compiler: gcc (GCC) 7.1.1 20170620
  .config is attached
  Raw console output is attached.
  C reproducer is attached
  syzkaller reproducer is attached. See https://goo.gl/kgGztJ
  for information about syzkaller reproducers

  WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 
 sock_owned_by_me include/net/sock.h:1505 [inline]
  WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 
 sock_owned_by_user include/net/sock.h:1511 [inline]
  WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 
 strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404
  Kernel panic - not syncing: panic_on_warn set ...

  CPU: 0 PID: 2996 Comm: syzkaller142210 Not tainted 4.14.0-rc5+ #138
  Hardware name: Google Google Compute Engine/Google Compute Engine, 
 BIOS Google 01/01/2011
  Call Trace:
   
   __dump_stack lib/dump_stack.c:16 [inline]
   dump_stack+0x194/0x257 lib/dump_stack.c:52
   panic+0x1e4/0x417 kernel/panic.c:181
   __warn+0x1c4/0x1d9 kernel/panic.c:542
   report_bug+0x211/0x2d0 lib/bug.c:183
   fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178
   do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
   do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
   do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298
   do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
   invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
  RIP: 0010:sock_owned_by_me include/net/sock.h:1505 [inline]
  RIP: 0010:sock_owned_by_user include/net/sock.h:1511 [inline]
  RIP: 0010:strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404
  RSP: 0018:8801db206b18 EFLAGS: 00010206
  RAX: 8801d1e02080 RBX: 8801dad74c48 RCX: 
  RDX: 0100 RSI: 8801d29fa0a0 RDI: 85cbede0
  RBP: 8801db206b38 R08: 0005 R09: 10ce0bcd
  R10: 8801db206a00 R11: dc00 R12: 8801d29fa000
  R13: 8801dad74c50 R14: 8801d4350a92 R15: 0001
   psock_data_ready+0x56/0x70 net/kcm/kcmsock.c:353
>>>
>>>  Looks like KCM is calling sk_data_ready() without first taking the
>>>  sock lock.
>>>
>>>  /* Called with lower sock held */
>>>  static void kcm_rcv_strparser(struct strparser *strp, struct sk_buff 
>>> *skb)
>>>  {
>>>   [...]
>>>  if (kcm_queue_rcv_skb(&kcm->sk, skb)) {
>>>
>>>  In this case kcm->sk is not the same lock the comment is referring to.
>>>  And kcm_queue_rcv_skb() will eventually call sk_data_ready().
>>>
>>>  @Tom, how about wrapping the sk_data_ready call in {lock|release}_sock?
>>>  I don't have anything better in mind immediately.
>>  The sock locks are taken in reverse order in the send path so so
>>  grabbing kcm sock lock with lower lock held to call sk_data_ready may
>>  lead to deadlock like I think.
>>
>>  It might be possible to change the order in the send path to do this.
>>  Something like:
>>
>>  trylock on lower socket lock
>>  -if trylock fails
>>- release kcm sock lock
>>- lock lower sock
>>- lock kcm sock
>>  - call sendpage locked fun

Re: WARNING in strp_data_ready

2017-12-27 Thread Dmitry Vyukov
On Wed, Dec 27, 2017 at 8:09 PM, Tom Herbert  wrote:
> Did you try the patch I posted?

Hi Tom,

No. And I didn't know I need to. Why?
If you think the patch needs additional testing, you can ask syzbot to
test it. See 
https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot
Otherwise proceed with committing it. Or what are we waiting for?

Thanks



> On Wed, Dec 27, 2017 at 10:25 AM, Dmitry Vyukov  wrote:
>> On Wed, Dec 6, 2017 at 4:44 PM, Dmitry Vyukov  wrote:
  wrote:
> On 10/24/2017 08:20 AM, syzbot wrote:
>> Hello,
>>
>> syzkaller hit the following crash on 
>> 73d3393ada4f70fa3df5639c8d438f2f034c0ecb
>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
>> compiler: gcc (GCC) 7.1.1 20170620
>> .config is attached
>> Raw console output is attached.
>> C reproducer is attached
>> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
>> for information about syzkaller reproducers
>>
>>
>> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_me 
>> include/net/sock.h:1505 [inline]
>> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 
>> sock_owned_by_user include/net/sock.h:1511 [inline]
>> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 
>> strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404
>> Kernel panic - not syncing: panic_on_warn set ...
>>
>> CPU: 0 PID: 2996 Comm: syzkaller142210 Not tainted 4.14.0-rc5+ #138
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
>> Google 01/01/2011
>> Call Trace:
>>  
>>  __dump_stack lib/dump_stack.c:16 [inline]
>>  dump_stack+0x194/0x257 lib/dump_stack.c:52
>>  panic+0x1e4/0x417 kernel/panic.c:181
>>  __warn+0x1c4/0x1d9 kernel/panic.c:542
>>  report_bug+0x211/0x2d0 lib/bug.c:183
>>  fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178
>>  do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
>>  do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
>>  do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298
>>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
>>  invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
>> RIP: 0010:sock_owned_by_me include/net/sock.h:1505 [inline]
>> RIP: 0010:sock_owned_by_user include/net/sock.h:1511 [inline]
>> RIP: 0010:strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404
>> RSP: 0018:8801db206b18 EFLAGS: 00010206
>> RAX: 8801d1e02080 RBX: 8801dad74c48 RCX: 
>> RDX: 0100 RSI: 8801d29fa0a0 RDI: 85cbede0
>> RBP: 8801db206b38 R08: 0005 R09: 10ce0bcd
>> R10: 8801db206a00 R11: dc00 R12: 8801d29fa000
>> R13: 8801dad74c50 R14: 8801d4350a92 R15: 0001
>>  psock_data_ready+0x56/0x70 net/kcm/kcmsock.c:353
>
> Looks like KCM is calling sk_data_ready() without first taking the
> sock lock.
>
> /* Called with lower sock held */
> static void kcm_rcv_strparser(struct strparser *strp, struct sk_buff *skb)
> {
>  [...]
> if (kcm_queue_rcv_skb(&kcm->sk, skb)) {
>
> In this case kcm->sk is not the same lock the comment is referring to.
> And kcm_queue_rcv_skb() will eventually call sk_data_ready().
>
> @Tom, how about wrapping the sk_data_ready call in {lock|release}_sock?
> I don't have anything better in mind immediately.
>
 The sock locks are taken in reverse order in the send path so so
 grabbing kcm sock lock with lower lock held to call sk_data_ready may
 lead to deadlock like I think.

 It might be possible to change the order in the send path to do this.
 Something like:

 trylock on lower socket lock
 -if trylock fails
   - release kcm sock lock
   - lock lower sock
   - lock kcm sock
 - call sendpage locked function

 I admit that dealing with two levels of socket locks in the data path
 is quite a pain :-)
>>>
>>> up
>>>
>>> still happening and we've lost 50K+ test VMs on this
>>
>> up
>>
>> Still happens and number of crashes crossed 60K, can we do something
>> with this please?


Re: WARNING in strp_data_ready

2017-12-27 Thread Tom Herbert
Did you try the patch I posted?


On Wed, Dec 27, 2017 at 10:25 AM, Dmitry Vyukov  wrote:
> On Wed, Dec 6, 2017 at 4:44 PM, Dmitry Vyukov  wrote:
>>>  wrote:
 On 10/24/2017 08:20 AM, syzbot wrote:
> Hello,
>
> syzkaller hit the following crash on 
> 73d3393ada4f70fa3df5639c8d438f2f034c0ecb
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> C reproducer is attached
> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> for information about syzkaller reproducers
>
>
> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_me 
> include/net/sock.h:1505 [inline]
> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_user 
> include/net/sock.h:1511 [inline]
> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 
> strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404
> Kernel panic - not syncing: panic_on_warn set ...
>
> CPU: 0 PID: 2996 Comm: syzkaller142210 Not tainted 4.14.0-rc5+ #138
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> Call Trace:
>  
>  __dump_stack lib/dump_stack.c:16 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:52
>  panic+0x1e4/0x417 kernel/panic.c:181
>  __warn+0x1c4/0x1d9 kernel/panic.c:542
>  report_bug+0x211/0x2d0 lib/bug.c:183
>  fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178
>  do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
>  do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
>  do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298
>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
>  invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
> RIP: 0010:sock_owned_by_me include/net/sock.h:1505 [inline]
> RIP: 0010:sock_owned_by_user include/net/sock.h:1511 [inline]
> RIP: 0010:strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404
> RSP: 0018:8801db206b18 EFLAGS: 00010206
> RAX: 8801d1e02080 RBX: 8801dad74c48 RCX: 
> RDX: 0100 RSI: 8801d29fa0a0 RDI: 85cbede0
> RBP: 8801db206b38 R08: 0005 R09: 10ce0bcd
> R10: 8801db206a00 R11: dc00 R12: 8801d29fa000
> R13: 8801dad74c50 R14: 8801d4350a92 R15: 0001
>  psock_data_ready+0x56/0x70 net/kcm/kcmsock.c:353

 Looks like KCM is calling sk_data_ready() without first taking the
 sock lock.

 /* Called with lower sock held */
 static void kcm_rcv_strparser(struct strparser *strp, struct sk_buff *skb)
 {
  [...]
 if (kcm_queue_rcv_skb(&kcm->sk, skb)) {

 In this case kcm->sk is not the same lock the comment is referring to.
 And kcm_queue_rcv_skb() will eventually call sk_data_ready().

 @Tom, how about wrapping the sk_data_ready call in {lock|release}_sock?
 I don't have anything better in mind immediately.

>>> The sock locks are taken in reverse order in the send path so so
>>> grabbing kcm sock lock with lower lock held to call sk_data_ready may
>>> lead to deadlock like I think.
>>>
>>> It might be possible to change the order in the send path to do this.
>>> Something like:
>>>
>>> trylock on lower socket lock
>>> -if trylock fails
>>>   - release kcm sock lock
>>>   - lock lower sock
>>>   - lock kcm sock
>>> - call sendpage locked function
>>>
>>> I admit that dealing with two levels of socket locks in the data path
>>> is quite a pain :-)
>>
>> up
>>
>> still happening and we've lost 50K+ test VMs on this
>
> up
>
> Still happens and number of crashes crossed 60K, can we do something
> with this please?


Re: WARNING in strp_data_ready

2017-12-27 Thread Dmitry Vyukov
On Wed, Dec 6, 2017 at 4:44 PM, Dmitry Vyukov  wrote:
>>  wrote:
>>> On 10/24/2017 08:20 AM, syzbot wrote:
 Hello,

 syzkaller hit the following crash on 
 73d3393ada4f70fa3df5639c8d438f2f034c0ecb
 git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
 compiler: gcc (GCC) 7.1.1 20170620
 .config is attached
 Raw console output is attached.
 C reproducer is attached
 syzkaller reproducer is attached. See https://goo.gl/kgGztJ
 for information about syzkaller reproducers


 WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_me 
 include/net/sock.h:1505 [inline]
 WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_user 
 include/net/sock.h:1511 [inline]
 WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 
 strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404
 Kernel panic - not syncing: panic_on_warn set ...

 CPU: 0 PID: 2996 Comm: syzkaller142210 Not tainted 4.14.0-rc5+ #138
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
 Google 01/01/2011
 Call Trace:
  
  __dump_stack lib/dump_stack.c:16 [inline]
  dump_stack+0x194/0x257 lib/dump_stack.c:52
  panic+0x1e4/0x417 kernel/panic.c:181
  __warn+0x1c4/0x1d9 kernel/panic.c:542
  report_bug+0x211/0x2d0 lib/bug.c:183
  fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178
  do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
  do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
  do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298
  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
  invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
 RIP: 0010:sock_owned_by_me include/net/sock.h:1505 [inline]
 RIP: 0010:sock_owned_by_user include/net/sock.h:1511 [inline]
 RIP: 0010:strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404
 RSP: 0018:8801db206b18 EFLAGS: 00010206
 RAX: 8801d1e02080 RBX: 8801dad74c48 RCX: 
 RDX: 0100 RSI: 8801d29fa0a0 RDI: 85cbede0
 RBP: 8801db206b38 R08: 0005 R09: 10ce0bcd
 R10: 8801db206a00 R11: dc00 R12: 8801d29fa000
 R13: 8801dad74c50 R14: 8801d4350a92 R15: 0001
  psock_data_ready+0x56/0x70 net/kcm/kcmsock.c:353
>>>
>>> Looks like KCM is calling sk_data_ready() without first taking the
>>> sock lock.
>>>
>>> /* Called with lower sock held */
>>> static void kcm_rcv_strparser(struct strparser *strp, struct sk_buff *skb)
>>> {
>>>  [...]
>>> if (kcm_queue_rcv_skb(&kcm->sk, skb)) {
>>>
>>> In this case kcm->sk is not the same lock the comment is referring to.
>>> And kcm_queue_rcv_skb() will eventually call sk_data_ready().
>>>
>>> @Tom, how about wrapping the sk_data_ready call in {lock|release}_sock?
>>> I don't have anything better in mind immediately.
>>>
>> The sock locks are taken in reverse order in the send path so so
>> grabbing kcm sock lock with lower lock held to call sk_data_ready may
>> lead to deadlock like I think.
>>
>> It might be possible to change the order in the send path to do this.
>> Something like:
>>
>> trylock on lower socket lock
>> -if trylock fails
>>   - release kcm sock lock
>>   - lock lower sock
>>   - lock kcm sock
>> - call sendpage locked function
>>
>> I admit that dealing with two levels of socket locks in the data path
>> is quite a pain :-)
>
> up
>
> still happening and we've lost 50K+ test VMs on this

up

Still happens and number of crashes crossed 60K, can we do something
with this please?


Re: WARNING in strp_data_ready

2017-12-06 Thread Dmitry Vyukov
On Mon, Oct 30, 2017 at 11:06 PM, Tom Herbert  wrote:
> On Mon, Oct 30, 2017 at 2:44 PM, John Fastabend
>  wrote:
>> On 10/24/2017 08:20 AM, syzbot wrote:
>>> Hello,
>>>
>>> syzkaller hit the following crash on 
>>> 73d3393ada4f70fa3df5639c8d438f2f034c0ecb
>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
>>> compiler: gcc (GCC) 7.1.1 20170620
>>> .config is attached
>>> Raw console output is attached.
>>> C reproducer is attached
>>> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
>>> for information about syzkaller reproducers
>>>
>>>
>>> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_me 
>>> include/net/sock.h:1505 [inline]
>>> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_user 
>>> include/net/sock.h:1511 [inline]
>>> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 
>>> strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404
>>> Kernel panic - not syncing: panic_on_warn set ...
>>>
>>> CPU: 0 PID: 2996 Comm: syzkaller142210 Not tainted 4.14.0-rc5+ #138
>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
>>> Google 01/01/2011
>>> Call Trace:
>>>  
>>>  __dump_stack lib/dump_stack.c:16 [inline]
>>>  dump_stack+0x194/0x257 lib/dump_stack.c:52
>>>  panic+0x1e4/0x417 kernel/panic.c:181
>>>  __warn+0x1c4/0x1d9 kernel/panic.c:542
>>>  report_bug+0x211/0x2d0 lib/bug.c:183
>>>  fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178
>>>  do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
>>>  do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
>>>  do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298
>>>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
>>>  invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
>>> RIP: 0010:sock_owned_by_me include/net/sock.h:1505 [inline]
>>> RIP: 0010:sock_owned_by_user include/net/sock.h:1511 [inline]
>>> RIP: 0010:strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404
>>> RSP: 0018:8801db206b18 EFLAGS: 00010206
>>> RAX: 8801d1e02080 RBX: 8801dad74c48 RCX: 
>>> RDX: 0100 RSI: 8801d29fa0a0 RDI: 85cbede0
>>> RBP: 8801db206b38 R08: 0005 R09: 10ce0bcd
>>> R10: 8801db206a00 R11: dc00 R12: 8801d29fa000
>>> R13: 8801dad74c50 R14: 8801d4350a92 R15: 0001
>>>  psock_data_ready+0x56/0x70 net/kcm/kcmsock.c:353
>>
>> Looks like KCM is calling sk_data_ready() without first taking the
>> sock lock.
>>
>> /* Called with lower sock held */
>> static void kcm_rcv_strparser(struct strparser *strp, struct sk_buff *skb)
>> {
>>  [...]
>> if (kcm_queue_rcv_skb(&kcm->sk, skb)) {
>>
>> In this case kcm->sk is not the same lock the comment is referring to.
>> And kcm_queue_rcv_skb() will eventually call sk_data_ready().
>>
>> @Tom, how about wrapping the sk_data_ready call in {lock|release}_sock?
>> I don't have anything better in mind immediately.
>>
> The sock locks are taken in reverse order in the send path so so
> grabbing kcm sock lock with lower lock held to call sk_data_ready may
> lead to deadlock like I think.
>
> It might be possible to change the order in the send path to do this.
> Something like:
>
> trylock on lower socket lock
> -if trylock fails
>   - release kcm sock lock
>   - lock lower sock
>   - lock kcm sock
> - call sendpage locked function
>
> I admit that dealing with two levels of socket locks in the data path
> is quite a pain :-)

up

still happening and we've lost 50K+ test VMs on this


Re: WARNING in strp_data_ready

2017-10-30 Thread Tom Herbert
On Mon, Oct 30, 2017 at 2:44 PM, John Fastabend
 wrote:
> On 10/24/2017 08:20 AM, syzbot wrote:
>> Hello,
>>
>> syzkaller hit the following crash on 73d3393ada4f70fa3df5639c8d438f2f034c0ecb
>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
>> compiler: gcc (GCC) 7.1.1 20170620
>> .config is attached
>> Raw console output is attached.
>> C reproducer is attached
>> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
>> for information about syzkaller reproducers
>>
>>
>> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_me 
>> include/net/sock.h:1505 [inline]
>> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_user 
>> include/net/sock.h:1511 [inline]
>> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 
>> strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404
>> Kernel panic - not syncing: panic_on_warn set ...
>>
>> CPU: 0 PID: 2996 Comm: syzkaller142210 Not tainted 4.14.0-rc5+ #138
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
>> Google 01/01/2011
>> Call Trace:
>>  
>>  __dump_stack lib/dump_stack.c:16 [inline]
>>  dump_stack+0x194/0x257 lib/dump_stack.c:52
>>  panic+0x1e4/0x417 kernel/panic.c:181
>>  __warn+0x1c4/0x1d9 kernel/panic.c:542
>>  report_bug+0x211/0x2d0 lib/bug.c:183
>>  fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178
>>  do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
>>  do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
>>  do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298
>>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
>>  invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
>> RIP: 0010:sock_owned_by_me include/net/sock.h:1505 [inline]
>> RIP: 0010:sock_owned_by_user include/net/sock.h:1511 [inline]
>> RIP: 0010:strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404
>> RSP: 0018:8801db206b18 EFLAGS: 00010206
>> RAX: 8801d1e02080 RBX: 8801dad74c48 RCX: 
>> RDX: 0100 RSI: 8801d29fa0a0 RDI: 85cbede0
>> RBP: 8801db206b38 R08: 0005 R09: 10ce0bcd
>> R10: 8801db206a00 R11: dc00 R12: 8801d29fa000
>> R13: 8801dad74c50 R14: 8801d4350a92 R15: 0001
>>  psock_data_ready+0x56/0x70 net/kcm/kcmsock.c:353
>
> Looks like KCM is calling sk_data_ready() without first taking the
> sock lock.
>
> /* Called with lower sock held */
> static void kcm_rcv_strparser(struct strparser *strp, struct sk_buff *skb)
> {
>  [...]
> if (kcm_queue_rcv_skb(&kcm->sk, skb)) {
>
> In this case kcm->sk is not the same lock the comment is referring to.
> And kcm_queue_rcv_skb() will eventually call sk_data_ready().
>
> @Tom, how about wrapping the sk_data_ready call in {lock|release}_sock?
> I don't have anything better in mind immediately.
>
The sock locks are taken in reverse order in the send path so so
grabbing kcm sock lock with lower lock held to call sk_data_ready may
lead to deadlock like I think.

It might be possible to change the order in the send path to do this.
Something like:

trylock on lower socket lock
-if trylock fails
  - release kcm sock lock
  - lock lower sock
  - lock kcm sock
- call sendpage locked function

I admit that dealing with two levels of socket locks in the data path
is quite a pain :-)

Tom

> Thanks,
> John


Re: WARNING in strp_data_ready

2017-10-30 Thread John Fastabend
On 10/24/2017 08:20 AM, syzbot wrote:
> Hello,
> 
> syzkaller hit the following crash on 73d3393ada4f70fa3df5639c8d438f2f034c0ecb
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> C reproducer is attached
> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> for information about syzkaller reproducers
> 
> 
> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_me 
> include/net/sock.h:1505 [inline]
> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_user 
> include/net/sock.h:1511 [inline]
> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 
> strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404
> Kernel panic - not syncing: panic_on_warn set ...
> 
> CPU: 0 PID: 2996 Comm: syzkaller142210 Not tainted 4.14.0-rc5+ #138
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> Call Trace:
>  
>  __dump_stack lib/dump_stack.c:16 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:52
>  panic+0x1e4/0x417 kernel/panic.c:181
>  __warn+0x1c4/0x1d9 kernel/panic.c:542
>  report_bug+0x211/0x2d0 lib/bug.c:183
>  fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178
>  do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
>  do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
>  do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298
>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
>  invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
> RIP: 0010:sock_owned_by_me include/net/sock.h:1505 [inline]
> RIP: 0010:sock_owned_by_user include/net/sock.h:1511 [inline]
> RIP: 0010:strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404
> RSP: 0018:8801db206b18 EFLAGS: 00010206
> RAX: 8801d1e02080 RBX: 8801dad74c48 RCX: 
> RDX: 0100 RSI: 8801d29fa0a0 RDI: 85cbede0
> RBP: 8801db206b38 R08: 0005 R09: 10ce0bcd
> R10: 8801db206a00 R11: dc00 R12: 8801d29fa000
> R13: 8801dad74c50 R14: 8801d4350a92 R15: 0001
>  psock_data_ready+0x56/0x70 net/kcm/kcmsock.c:353

Looks like KCM is calling sk_data_ready() without first taking the
sock lock.

/* Called with lower sock held */
static void kcm_rcv_strparser(struct strparser *strp, struct sk_buff *skb)
{
 [...]
if (kcm_queue_rcv_skb(&kcm->sk, skb)) {

In this case kcm->sk is not the same lock the comment is referring to.
And kcm_queue_rcv_skb() will eventually call sk_data_ready().

@Tom, how about wrapping the sk_data_ready call in {lock|release}_sock?
I don't have anything better in mind immediately.

Thanks,
John