Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)

2016-10-14 Thread Ard Biesheuvel
On 14 October 2016 at 11:00, Johannes Berg  wrote:
>
>> So why is the performance hit acceptable for ESP but not for WPA? We
>> could easily implement the same thing, i.e.,
>> kmalloc(GFP_ATOMIC)/kfree the aead_req struct rather than allocate it
>> on the stack
>
> Yeah, maybe we should. It's likely a much bigger allocation, but I
> don't actually know if that affects speed.
>
> In most cases where you want high performance we never hit this anyway
> since we'll have hardware crypto. I know for our (Intel's) devices we
> normally never hit these code paths.
>
> But on the other hand, you also did your changes for a reason, and the
> only reason I can see of that is performance. So you'd be the one with
> most "skin in the game", I guess?
>

Well, what sucks here is that the accelerated driver I implemented for
arm64 does not actually need this, as long as we take aad[] off the
stack. And note that the API was changed since my patch, to add aad[]
to the scatterlist: prior to this change, it used
aead_request_set_assoc() to set the associated data separately.


Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)

2016-10-14 Thread Johannes Berg

> So why is the performance hit acceptable for ESP but not for WPA? We
> could easily implement the same thing, i.e.,
> kmalloc(GFP_ATOMIC)/kfree the aead_req struct rather than allocate it
> on the stack

Yeah, maybe we should. It's likely a much bigger allocation, but I
don't actually know if that affects speed.

In most cases where you want high performance we never hit this anyway
since we'll have hardware crypto. I know for our (Intel's) devices we
normally never hit these code paths.

But on the other hand, you also did your changes for a reason, and the
only reason I can see of that is performance. So you'd be the one with
most "skin in the game", I guess?

johannes


Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)

2016-10-14 Thread Ard Biesheuvel
On 14 October 2016 at 09:39, Ard Biesheuvel  wrote:
> On 14 October 2016 at 09:28, Johannes Berg  wrote:
>>
>>>1. revert that patch (doing so would need some major adjustments now,
>>>   since it's pretty old and a number of new things were added in the
>>>   meantime)
>>
>> This it will have to be, I guess.
>>
>>>2. allocate a per-CPU buffer for all the things that we put on the
>>>   stack and use in SG lists, those are:
>>>* CCM/GCM: AAD (32B), B_0/J_0 (16B)
>>>* GMAC: AAD (20B), zero (16B)
>>>* (not sure why CMAC isn't using this API, but it would be like GMAC)
>>
>> This doesn't work - I tried to move the mac80211 buffers, but because
>> we also put the struct aead_request on the stack, and crypto_ccm has
>> the "odata" in there, and we can't separate the odata from that struct,
>> we'd have to also put that into a per-CPU buffer, but it's very big -
>> 456 bytes for CCM, didn't measure the others but I'd expect them to be
>> larger, if different.
>>
>> I don't think we can allocate half a kb for each CPU just to be able to
>> possibly use the acceleration here. We can't even make that conditional
>> on not having hardware crypto in the wifi NIC because drivers are
>> always allowed to pass undecrypted frames, regardless of whether or not
>> HW crypto was attempted, so we don't know upfront if we'll have to
>> decrypt anything in software...
>>
>> Given that, I think we have had a bug in here basically since Ard's
>> patch, we never should've put these structs on the stack. Herbert, you
>> also touched this later and converted the API usage, did you see the
>> way the stack is used here and think it should be OK, or did you simply
>> not realize that?
>>
>> Ard, are you able to help out working on a revert of your patch? That
>> would require also reverting a number of other patches (various fixes,
>> API adjustments, etc. to the AEAD usage), but the more complicated part
>> is that in the meantime Jouni introduced GCMP and CCMP-256, both of
>> which we of course need to retain.
>>
>
> I am missing some context here, but could you explain what exactly is
> the problem here?
>
> Look at this code
>
> """
> struct scatterlist sg[3];
>
> char aead_req_data[sizeof(struct aead_request) +
> crypto_aead_reqsize(tfm)]
> __aligned(__alignof__(struct aead_request));
> struct aead_request *aead_req = (void *) aead_req_data;
>
> memset(aead_req, 0, sizeof(aead_req_data));
>
> sg_init_table(sg, 3);
> sg_set_buf([0], [2], be16_to_cpup((__be16 *)aad));
> sg_set_buf([1], data, data_len);
> sg_set_buf([2], mic, mic_len);
>
> aead_request_set_tfm(aead_req, tfm);
> aead_request_set_crypt(aead_req, sg, sg, data_len, b_0);
> aead_request_set_ad(aead_req, sg[0].length);
> """
>
> I assume the stack buffer itself is not the problem here, but aad,
> which is allocated on the stack one frame up.
> Do we really need to revert the whole patch to fix that?

Ah never mind, this is about 'odata'. Apologies, should have read first


Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)

2016-10-14 Thread Sergey Senozhatsky
On (10/13/16 14:49), Andy Lutomirski wrote:
[..]
> > >  FAIL: 412cba02 > c900802cba02 || 1 -> (412cba02
> > > >> 39) == 130
> >
> > Yeah, we already know that in this function the aad variable is on the
> > stack, it explicitly is.
> >
> > The question, though, is why precisely that fails in the crypto code.
> > Can you send the Oops report itself?
> >
> 
> It's failing before that.  With CONFIG_VMAP_STACK=y, the stack may not
> be physically contiguous and can't be used for DMA, so putting it in a
> scatterlist is bogus in general, and the crypto code mostly wants a
> scatterlist.
> 
> There are a couple (faster!) APIs for crypto that don't use
> scatterlists, but I don't think AEAD works with them.

given that we have a known issue shouldn't VMAP_STACK be
disabled for now, or would you rather prefer to mark MAC80211
as incompatible: "depends on CFG80211 && !VMAP_STACK"?

-ss


Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)

2016-10-14 Thread Ard Biesheuvel
On 14 October 2016 at 09:28, Johannes Berg  wrote:
>
>>1. revert that patch (doing so would need some major adjustments now,
>>   since it's pretty old and a number of new things were added in the
>>   meantime)
>
> This it will have to be, I guess.
>
>>2. allocate a per-CPU buffer for all the things that we put on the
>>   stack and use in SG lists, those are:
>>* CCM/GCM: AAD (32B), B_0/J_0 (16B)
>>* GMAC: AAD (20B), zero (16B)
>>* (not sure why CMAC isn't using this API, but it would be like GMAC)
>
> This doesn't work - I tried to move the mac80211 buffers, but because
> we also put the struct aead_request on the stack, and crypto_ccm has
> the "odata" in there, and we can't separate the odata from that struct,
> we'd have to also put that into a per-CPU buffer, but it's very big -
> 456 bytes for CCM, didn't measure the others but I'd expect them to be
> larger, if different.
>
> I don't think we can allocate half a kb for each CPU just to be able to
> possibly use the acceleration here. We can't even make that conditional
> on not having hardware crypto in the wifi NIC because drivers are
> always allowed to pass undecrypted frames, regardless of whether or not
> HW crypto was attempted, so we don't know upfront if we'll have to
> decrypt anything in software...
>
> Given that, I think we have had a bug in here basically since Ard's
> patch, we never should've put these structs on the stack. Herbert, you
> also touched this later and converted the API usage, did you see the
> way the stack is used here and think it should be OK, or did you simply
> not realize that?
>
> Ard, are you able to help out working on a revert of your patch? That
> would require also reverting a number of other patches (various fixes,
> API adjustments, etc. to the AEAD usage), but the more complicated part
> is that in the meantime Jouni introduced GCMP and CCMP-256, both of
> which we of course need to retain.
>

I am missing some context here, but could you explain what exactly is
the problem here?

Look at this code

"""
struct scatterlist sg[3];

char aead_req_data[sizeof(struct aead_request) +
crypto_aead_reqsize(tfm)]
__aligned(__alignof__(struct aead_request));
struct aead_request *aead_req = (void *) aead_req_data;

memset(aead_req, 0, sizeof(aead_req_data));

sg_init_table(sg, 3);
sg_set_buf([0], [2], be16_to_cpup((__be16 *)aad));
sg_set_buf([1], data, data_len);
sg_set_buf([2], mic, mic_len);

aead_request_set_tfm(aead_req, tfm);
aead_request_set_crypt(aead_req, sg, sg, data_len, b_0);
aead_request_set_ad(aead_req, sg[0].length);
"""

I assume the stack buffer itself is not the problem here, but aad,
which is allocated on the stack one frame up.
Do we really need to revert the whole patch to fix that?


Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)

2016-10-14 Thread Johannes Berg

>    1. revert that patch (doing so would need some major adjustments now,
>   since it's pretty old and a number of new things were added in the
>   meantime)

This it will have to be, I guess.

>    2. allocate a per-CPU buffer for all the things that we put on the
>   stack and use in SG lists, those are:
>    * CCM/GCM: AAD (32B), B_0/J_0 (16B)
>    * GMAC: AAD (20B), zero (16B)
>    * (not sure why CMAC isn't using this API, but it would be like GMAC)

This doesn't work - I tried to move the mac80211 buffers, but because
we also put the struct aead_request on the stack, and crypto_ccm has
the "odata" in there, and we can't separate the odata from that struct,
we'd have to also put that into a per-CPU buffer, but it's very big -
456 bytes for CCM, didn't measure the others but I'd expect them to be
larger, if different.

I don't think we can allocate half a kb for each CPU just to be able to
possibly use the acceleration here. We can't even make that conditional
on not having hardware crypto in the wifi NIC because drivers are
always allowed to pass undecrypted frames, regardless of whether or not
HW crypto was attempted, so we don't know upfront if we'll have to
decrypt anything in software...

Given that, I think we have had a bug in here basically since Ard's
patch, we never should've put these structs on the stack. Herbert, you
also touched this later and converted the API usage, did you see the
way the stack is used here and think it should be OK, or did you simply
not realize that?

Ard, are you able to help out working on a revert of your patch? That
would require also reverting a number of other patches (various fixes,
API adjustments, etc. to the AEAD usage), but the more complicated part
is that in the meantime Jouni introduced GCMP and CCMP-256, both of
which we of course need to retain.

johannes


Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)

2016-10-14 Thread Johannes Berg
On Thu, 2016-10-13 at 14:49 -0700, Andy Lutomirski wrote:
> 
> It's failing before that.  With CONFIG_VMAP_STACK=y, the stack may
> not be physically contiguous and can't be used for DMA, so putting it
> in a scatterlist is bogus in general, and the crypto code mostly
> wants a scatterlist.

I see, so all this stuff is getting inlined, and we crash in
sg_set_buf() because it does sg_set_page() and that obviously needs to
do virt_to_page(), which is invalid on this address now.
With CONFIG_DEBUG_SG we'd have hit the BUG_ON there instead.

It does indeed look like AEAD doesn't have any non-SG API.

So ultimately, the bug already goes back to Ard's commit 7ec7c4a9a686
("mac80211: port CCMP to cryptoapi's CCM driver") since that already
potentially used stack space for DMA.

Since we don't have any space in the SKB or anywhere else at this point
(other than the stack that we can't use), I see two ways out of this:

   1. revert that patch (doing so would need some major adjustments now,
  since it's pretty old and a number of new things were added in the
  meantime)
   2. allocate a per-CPU buffer for all the things that we put on the
  stack and use in SG lists, those are:
   * CCM/GCM: AAD (32B), B_0/J_0 (16B)
   * GMAC: AAD (20B), zero (16B)
   * (not sure why CMAC isn't using this API, but it would be like
  GMAC)

Thoughts?

johannes


Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)

2016-10-13 Thread Sergey Senozhatsky
On (10/14/16 00:00), Sergey Senozhatsky wrote:
> kernel:  [] ieee80211_crypto_ccmp_decrypt+0x204/0x298
> kernel:  [] ieee80211_rx_handlers+0x7df/0x1c1d
> kernel:  [] ieee80211_prepare_and_rx_handle+0xdc2/0xe79
> kernel:  [] ? ieee80211_rx_napi+0x168/0x7b6
> kernel:  [] ieee80211_rx_napi+0x48b/0x7b6
> kernel:  [] ? debug_smp_processor_id+0x17/0x19
> kernel:  [] iwl_mvm_rx_rx_mpdu+0x6e6/0x751 [iwlmvm]
> kernel:  [] iwl_mvm_rx+0x7e/0x98 [iwlmvm]
> kernel:  [] iwl_pcie_rx_handle+0x523/0x698 [iwlwifi]
> kernel:  [] iwl_pcie_irq_handler+0x46f/0x65f [iwlwifi]
> kernel:  [] ? irq_finalize_oneshot+0xd4/0xd4
> kernel:  [] irq_thread_fn+0x1d/0x34
> kernel:  [] irq_thread+0xe6/0x1bb
> kernel:  [] ? wake_threads_waitq+0x2c/0x2c
> kernel:  [] ? irq_thread_dtor+0x95/0x95
> kernel:  [] kthread+0xfc/0x104
> kernel:  [] ? put_lock_stats.isra.9+0xe/0x20
> kernel:  [] ? kthread_create_on_node+0x3f/0x3f
> kernel:  [] ret_from_fork+0x22/0x30
> kernel: Code: 01 ca 49 89 d1 48 89 d1 48 c1 ea 23 48 8b 14 d5 80 23 63 82 49 
> c1 e9 0c 48 c1 e9 1b 48 85 d2 74 0a 0f b6 c9 48 c1 e1 04 48 01 ca <48> 8b 12 
> 49 c1 e1 06 b9 00 00 00 80 89 7d 80 89 75 84 48 8b 3d 
> kernel: RIP  [] ieee80211_aes_ccm_decrypt+0x107/0x27f

8146d1ed :
8146d1ed:   e8 9e 67 04 00  callq  814b3990 
<__fentry__>
8146d1f2:   55  push   %rbp
8146d1f3:   48 89 e5mov%rsp,%rbp
8146d1f6:   41 57   push   %r15
8146d1f8:   41 56   push   %r14
8146d1fa:   49 89 cemov%rcx,%r14
8146d1fd:   41 55   push   %r13
8146d1ff:   41 54   push   %r12
8146d201:   53  push   %rbx
8146d202:   48 83 c4 80 add$0xff80,%rsp
8146d206:   8b 47 04mov0x4(%rdi),%eax
8146d209:   48 8d 48 50 lea0x50(%rax),%rcx
8146d20d:   48 83 c0 5e add$0x5e,%rax
8146d211:   48 c1 e8 03 shr$0x3,%rax
8146d215:   48 c1 e0 03 shl$0x3,%rax
8146d219:   48 29 c4sub%rax,%rsp
8146d21c:   4c 8d 7c 24 07  lea0x7(%rsp),%r15
8146d221:   49 c1 ef 03 shr$0x3,%r15
8146d225:   4d 85 c0test   %r8,%r8
8146d228:   4a 8d 04 fd 00 00 00lea0x0(,%r15,8),%rax
8146d22f:   00 
8146d230:   48 89 85 70 ff ff ffmov%rax,-0x90(%rbp)
8146d237:   75 0a   jne8146d243 

8146d239:   b8 ea ff ff ff  mov$0xffea,%eax
8146d23e:   e9 1a 02 00 00  jmpq   8146d45d 

8146d243:   31 c0   xor%eax,%eax
8146d245:   49 89 fcmov%rdi,%r12
8146d248:   49 89 f5mov%rsi,%r13
8146d24b:   4c 89 85 58 ff ff ffmov%r8,-0xa8(%rbp)
8146d252:   4a 8d 3c fd 00 00 00lea0x0(,%r15,8),%rdi
8146d259:   00 
8146d25a:   be 03 00 00 00  mov$0x3,%esi
8146d25f:   4c 89 cbmov%r9,%rbx
8146d262:   48 89 95 60 ff ff ffmov%rdx,-0xa0(%rbp)
8146d269:   f3 aa   rep stos %al,%es:(%rdi)
8146d26b:   48 8d 85 78 ff ff fflea-0x88(%rbp),%rax
8146d272:   48 89 c7mov%rax,%rdi
8146d275:   48 89 85 68 ff ff ffmov%rax,-0x98(%rbp)
8146d27c:   e8 46 06 dc ff  callq  8122d8c7 

8146d281:   48 8b 95 60 ff ff ffmov-0xa0(%rbp),%rdx
8146d288:   41 b9 00 00 00 80   mov$0x8000,%r9d
8146d28e:   48 8b 0d 7b cd 39 00mov0x39cd7b(%rip),%rcx  
  # 8180a010 
8146d295:   48 8b 85 68 ff ff ffmov-0x98(%rbp),%rax
8146d29c:   4c 8b 85 58 ff ff ffmov-0xa8(%rbp),%r8
8146d2a3:   0f b7 32movzwl (%rdx),%esi
8146d2a6:   48 83 c2 02 add$0x2,%rdx
8146d2aa:   89 d7   mov%edx,%edi
8146d2ac:   81 e7 ff 0f 00 00   and$0xfff,%edi
8146d2b2:   66 c1 c6 08 rol$0x8,%si
8146d2b6:   4c 01 caadd%r9,%rdx
8146d2b9:   0f b7 f6movzwl %si,%esi
8146d2bc:   72 0a   jb 8146d2c8 

8146d2be:   48 b9 00 00 00 80 ffmovabs $0x77ff8000,%rcx
8146d2c5:   77 00 00 
8146d2c8:   48 01 caadd%rcx,%rdx

Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)

2016-10-13 Thread Sergey Senozhatsky
On (10/13/16 15:45), Johannes Berg wrote:
> On Thu, 2016-10-13 at 22:42 +0900, Sergey Senozhatsky wrote:
> > 
> > > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commi
> > > > t/?h=x86/vmap_stack=0a39cfa6fbb5d5635c85253cc7d6b44b54822afd
> > > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commi
> > > > t/?h=x86/vmap_stack=bf8cfa200b5a01383ea39fc8ce2f32909767baa8
> > > 
> > > That truly sounds like something we'd rather avoid in the TX/RX
> > > paths though, which should perform well.
> > 
> > didn't fix.
> 
> It couldn't, since the new helpers weren't used in mac80211 in those
> patches yet.

indeed. I thought they were.

> >  FAIL: 412cba02 > c900802cba02 || 1 -> (412cba02
> > >> 39) == 130
> 
> The question, though, is why precisely that fails in the crypto code.
> Can you send the Oops report itself?

kernel: BUG: unable to handle kernel NULL pointer dereference at   
(null)
kernel: IP: [] ieee80211_aes_ccm_decrypt+0x107/0x27f
kernel: PGD 0 
kernel: 
kernel: Oops:  [#1] PREEMPT SMP
kernel: Modules linked in: nls_iso8859_1 nls_cp437 vfat fat mousedev psmouse 
serio_raw atkbd libps2 i915 coretemp i2c_algo_bit hwmon crc32c_intel mxm_wmi 
drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect iwlmvm sysimgblt 
fb_sys_fops i2c_i801 cfbcopyarea ie31200_edac drm iwlwifi i2c
kernel: CPU: 3 PID: 245 Comm: irq/28-iwlwifi Not tainted 
4.8.0-next-20161013-dbg-2-ge789862-dirty #112
kernel: task: 88041bf01800 task.stack: c92d
kernel: RIP: 0010:[]  [] 
ieee80211_aes_ccm_decrypt+0x107/0x27f
kernel: RSP: 0018:c92d3770  EFLAGS: 00010246
kernel: RAX: c92d3930 RBX: 8804133cf606 RCX: 00082000
kernel: RDX:  RSI: 0018 RDI: 0a02
kernel: RBP: c92d39b8 R08: 05e4 R09: 000412d3
kernel: R10: 001c R11: 8803e66d2d20 R12: 8804191c2780
kernel: R13: c92d39f0 R14: 8804133cf022 R15: 1925a6ee
kernel: FS:  () GS:88041ea0() 
knlGS:
kernel: CS:  0010 DS:  ES:  CR0: 80050033
kernel: CR2:  CR3: 01805000 CR4: 001406e0
kernel: Stack:
kernel:     
kernel:     
kernel:     
kernel: Call Trace:
kernel:  [] ieee80211_crypto_ccmp_decrypt+0x204/0x298
kernel:  [] ieee80211_rx_handlers+0x7df/0x1c1d
kernel:  [] ieee80211_prepare_and_rx_handle+0xdc2/0xe79
kernel:  [] ? ieee80211_rx_napi+0x168/0x7b6
kernel:  [] ieee80211_rx_napi+0x48b/0x7b6
kernel:  [] ? debug_smp_processor_id+0x17/0x19
kernel:  [] iwl_mvm_rx_rx_mpdu+0x6e6/0x751 [iwlmvm]
kernel:  [] iwl_mvm_rx+0x7e/0x98 [iwlmvm]
kernel:  [] iwl_pcie_rx_handle+0x523/0x698 [iwlwifi]
kernel:  [] iwl_pcie_irq_handler+0x46f/0x65f [iwlwifi]
kernel:  [] ? irq_finalize_oneshot+0xd4/0xd4
kernel:  [] irq_thread_fn+0x1d/0x34
kernel:  [] irq_thread+0xe6/0x1bb
kernel:  [] ? wake_threads_waitq+0x2c/0x2c
kernel:  [] ? irq_thread_dtor+0x95/0x95
kernel:  [] kthread+0xfc/0x104
kernel:  [] ? put_lock_stats.isra.9+0xe/0x20
kernel:  [] ? kthread_create_on_node+0x3f/0x3f
kernel:  [] ret_from_fork+0x22/0x30
kernel: Code: 01 ca 49 89 d1 48 89 d1 48 c1 ea 23 48 8b 14 d5 80 23 63 82 49 c1 
e9 0c 48 c1 e9 1b 48 85 d2 74 0a 0f b6 c9 48 c1 e1 04 48 01 ca <48> 8b 12 49 c1 
e1 06 b9 00 00 00 80 89 7d 80 89 75 84 48 8b 3d 
kernel: RIP  [] ieee80211_aes_ccm_decrypt+0x107/0x27f
kernel:  RSP 
kernel: CR2: 
kernel: ---[ end trace 3cd1fcd496516f72 ]---

-ss


Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)

2016-10-13 Thread Johannes Berg
On Thu, 2016-10-13 at 22:42 +0900, Sergey Senozhatsky wrote:
> 
> > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commi
> > > t/?h=x86/vmap_stack=0a39cfa6fbb5d5635c85253cc7d6b44b54822afd
> > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commi
> > > t/?h=x86/vmap_stack=bf8cfa200b5a01383ea39fc8ce2f32909767baa8
> > 
> > That truly sounds like something we'd rather avoid in the TX/RX
> > paths though, which should perform well.
> 
> didn't fix.

It couldn't, since the new helpers weren't used in mac80211 in those
patches yet.

> so I finally had some time to do a better bug-reporter job.
> 
> I added a bunch of printk-s and several virt_addr_valid()-s
> to ieee80211_aes_ccm_encrypt().
> 
> and right befoe the Oops I see the following report from
> virt_addr_valid()
> 
> 
>  FAIL: 412cba02 > c900802cba02 || 1 -> (412cba02
> >> 39) == 130

Yeah, we already know that in this function the aad variable is on the
stack, it explicitly is.

The question, though, is why precisely that fails in the crypto code.
Can you send the Oops report itself?

johannes


Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)

2016-10-13 Thread Sergey Senozhatsky
On (10/13/16 22:42), Sergey Senozhatsky wrote:
> 
> On (10/13/16 08:02), Johannes Berg wrote:
> > On Wed, 2016-10-12 at 22:39 -0700, Andy Lutomirski wrote:
> > 
> > > In a pinch, I have these patches sitting around:
> > > 
> > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vmap_stack=0a39cfa6fbb5d5635c85253cc7d6b44b54822afd
> > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vmap_stack=bf8cfa200b5a01383ea39fc8ce2f32909767baa8
> > 
> > That truly sounds like something we'd rather avoid in the TX/RX paths
> > though, which should perform well.
> 
> didn't fix.
> 
> so I finally had some time to do a better bug-reporter job.
> 
> I added a bunch of printk-s and several virt_addr_valid()-s
> to ieee80211_aes_ccm_encrypt().
> 
> and right befoe the Oops I see the following report from
> virt_addr_valid()
> 
> 
>  FAIL: 412cba02 > c900802cba02 || 1 -> (412cba02 >> 39) 
> == 130

that  `(412cba02 >> 39) == 130'   part is

phys_addr_valid()
{
(addr >> boot_cpu_data.x86_phys_bits)
}

-ss


Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)

2016-10-13 Thread Sergey Senozhatsky
On (10/13/16 08:02), Johannes Berg wrote:
> On Wed, 2016-10-12 at 22:39 -0700, Andy Lutomirski wrote:
> 
> > In a pinch, I have these patches sitting around:
> > 
> > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vmap_stack=0a39cfa6fbb5d5635c85253cc7d6b44b54822afd
> > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vmap_stack=bf8cfa200b5a01383ea39fc8ce2f32909767baa8
> 
> That truly sounds like something we'd rather avoid in the TX/RX paths
> though, which should perform well.

didn't fix.

so I finally had some time to do a better bug-reporter job.

I added a bunch of printk-s and several virt_addr_valid()-s
to ieee80211_aes_ccm_encrypt().

and right befoe the Oops I see the following report from
virt_addr_valid()


 FAIL: 412cba02 > c900802cba02 || 1 -> (412cba02 >> 39) == 
130


which is basically failed '!phys_addr_valid(x)' in __virt_addr_valid()

/* carry flag will be set if starting x was >= PAGE_OFFSET */
if ((x > y) || !phys_addr_valid(x))
return false;

backtrace

 [ cut here ]
 WARNING: CPU: 7 PID: 246 at arch/x86/mm/physaddr.c:68 
__virt_addr_valid+0xab/0xed
  c92cb6f0 8122168c  
  c92cb730 810428d8 00440198 88041bd21022
  c92cba02 192596ed 88041932d1e0 c92cba00
 Call Trace:
  [] dump_stack+0x4f/0x65
  [] __warn+0xc2/0xdd
  [] warn_slowpath_null+0x1d/0x1f
  [] __virt_addr_valid+0xab/0xed
  [] ieee80211_aes_ccm_decrypt+0x8f/0x2da
  [] ? debug_smp_processor_id+0x17/0x19
  [] ? __put_page+0x3c/0x3f
  [] ? ccmp_special_blocks.isra.1+0x51/0x12d
  [] ieee80211_crypto_ccmp_decrypt+0x204/0x298
  [] ieee80211_rx_handlers+0x7df/0x1c1d
  [] ieee80211_prepare_and_rx_handle+0xdc2/0xe79
  [] ? ieee80211_rx_napi+0x154/0x7a5
  [] ieee80211_rx_napi+0x474/0x7a5
  [] iwl_mvm_rx_rx_mpdu+0x6e6/0x751 [iwlmvm]
  [] iwl_mvm_rx+0x7e/0x98 [iwlmvm]
  [] iwl_pcie_rx_handle+0x523/0x698 [iwlwifi]
  [] iwl_pcie_irq_handler+0x45d/0x64d [iwlwifi]
  [] ? irq_finalize_oneshot+0xd4/0xd4
  [] irq_thread_fn+0x1d/0x34
  [] irq_thread+0xe6/0x1bb
  [] ? wake_threads_waitq+0x2c/0x2c
  [] ? irq_thread_dtor+0x95/0x95
  [] kthread+0xfc/0x104
  [] ? put_lock_stats.isra.9+0xe/0x20
  [] ? kthread_create_on_node+0x3f/0x3f
  [] ? kthread_create_on_node+0x3f/0x3f
  [] ? kthread_create_on_node+0x3f/0x3f
  [] ret_from_fork+0x22/0x30

-ss


Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)

2016-10-13 Thread Johannes Berg
On Wed, 2016-10-12 at 22:39 -0700, Andy Lutomirski wrote:

> In a pinch, I have these patches sitting around:
> 
> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vmap_stack=0a39cfa6fbb5d5635c85253cc7d6b44b54822afd
> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vmap_stack=bf8cfa200b5a01383ea39fc8ce2f32909767baa8

That truly sounds like something we'd rather avoid in the TX/RX paths
though, which should perform well.

> I don't like them, though.  I think it's rather silly that we can't
> just pass virtual addresses to the crypto code.

I don't really understand it either, hence my question about the actual
crash. I'll try to reproduce it in a VM.

johannes


Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)

2016-10-12 Thread Johannes Berg

> > Can you elaborate on how exactly it kills your system?
> 
> the last time I saw it it was a NULL deref at
> ieee80211_aes_ccm_decrypt.

Hm. I was expecting something within the crypto code would cause the
crash, this seems strange.

Anyway, I'm surely out of my depth wrt. the actual cause. Something
like the patch below probably works around it, but it's horribly
inefficient due to the locking and doesn't cover CMAC/GMAC either.

johannes

diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 103187ca9474..e820f437f02e 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1224,6 +1225,10 @@ struct ieee80211_local {
 
spinlock_t rx_path_lock;
 
+   /* temporary buffers for software crypto */
+   u8 aad[2 * AES_BLOCK_SIZE];
+   u8 b_0[AES_BLOCK_SIZE];
+
/* Station data */
/*
 * The mutex only protects the list, hash table and
diff --git a/net/mac80211/wpa.c b/net/mac80211/wpa.c
index b48c1e13e281..a3f17a710b85 100644
--- a/net/mac80211/wpa.c
+++ b/net/mac80211/wpa.c
@@ -405,8 +405,8 @@ static int ccmp_encrypt_skb(struct ieee80211_tx_data *tx, 
struct sk_buff *skb,
u8 *pos;
u8 pn[6];
u64 pn64;
-   u8 aad[2 * AES_BLOCK_SIZE];
-   u8 b_0[AES_BLOCK_SIZE];
+   u8 *aad = tx->local->aad;
+   u8 *b_0 = tx->local->b_0;
 
if (info->control.hw_key &&
!(info->control.hw_key->flags & IEEE80211_KEY_FLAG_GENERATE_IV) &&
@@ -460,9 +460,11 @@ static int ccmp_encrypt_skb(struct ieee80211_tx_data *tx, 
struct sk_buff *skb,
return 0;
 
pos += IEEE80211_CCMP_HDR_LEN;
+   spin_lock_bh(>local->rx_path_lock);
ccmp_special_blocks(skb, pn, b_0, aad);
ieee80211_aes_ccm_encrypt(key->u.ccmp.tfm, b_0, aad, pos, len,
  skb_put(skb, mic_len), mic_len);
+   spin_unlock_bh(>local->rx_path_lock);
 
return 0;
 }
@@ -534,8 +536,9 @@ ieee80211_crypto_ccmp_decrypt(struct ieee80211_rx_data *rx,
}
 
if (!(status->flag & RX_FLAG_DECRYPTED)) {
-   u8 aad[2 * AES_BLOCK_SIZE];
-   u8 b_0[AES_BLOCK_SIZE];
+   u8 *aad = rx->local->aad;
+   u8 *b_0 = rx->local->b_0;
+
/* hardware didn't decrypt/verify MIC */
ccmp_special_blocks(skb, pn, b_0, aad);
 
@@ -639,8 +642,8 @@ static int gcmp_encrypt_skb(struct ieee80211_tx_data *tx, 
struct sk_buff *skb)
u8 *pos;
u8 pn[6];
u64 pn64;
-   u8 aad[2 * AES_BLOCK_SIZE];
-   u8 j_0[AES_BLOCK_SIZE];
+   u8 *aad = tx->local->aad;
+   u8 *j_0 = tx->local->b_0;
 
if (info->control.hw_key &&
!(info->control.hw_key->flags & IEEE80211_KEY_FLAG_GENERATE_IV) &&
@@ -695,9 +698,11 @@ static int gcmp_encrypt_skb(struct ieee80211_tx_data *tx, 
struct sk_buff *skb)
return 0;
 
pos += IEEE80211_GCMP_HDR_LEN;
+   spin_lock_bh(>local->rx_path_lock);
gcmp_special_blocks(skb, pn, j_0, aad);
ieee80211_aes_gcm_encrypt(key->u.gcmp.tfm, j_0, aad, pos, len,
  skb_put(skb, IEEE80211_GCMP_MIC_LEN));
+   spin_unlock_bh(>local->rx_path_lock);
 
return 0;
 }
@@ -764,8 +769,9 @@ ieee80211_crypto_gcmp_decrypt(struct ieee80211_rx_data *rx)
}
 
if (!(status->flag & RX_FLAG_DECRYPTED)) {
-   u8 aad[2 * AES_BLOCK_SIZE];
-   u8 j_0[AES_BLOCK_SIZE];
+   u8 *aad = rx->local->aad;
+   u8 *j_0 = rx->local->b_0;
+
/* hardware didn't decrypt/verify MIC */
gcmp_special_blocks(skb, pn, j_0, aad);
 


Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)

2016-10-12 Thread Sergey Senozhatsky
Hello,

On (10/12/16 11:05), Johannes Berg wrote:
> Sorry - I meant to look into this yesterday but forgot.
> 
> > Andy, can this be related to CONFIG_VMAP_STACK?
> 
> I think it is.

yeah, the system works fine with !CONFIG_VMAP_STACK.

> > > current -git kills my system.
> 
> Can you elaborate on how exactly it kills your system?

the last time I saw it it was a NULL deref at ieee80211_aes_ccm_decrypt.

-ss


Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)

2016-10-12 Thread Johannes Berg
Hi,

Sorry - I meant to look into this yesterday but forgot.

> Andy, can this be related to CONFIG_VMAP_STACK?

I think it is.

> > current -git kills my system. 

Can you elaborate on how exactly it kills your system?

> > adding
> > 
> > if (!virt_addr_valid([2])) {
> > WARN_ON(1);
> > return -EINVAL;
> > }

That's pretty obviously false with VMAP_STACK, since the caller
(ieee80211_crypto_ccmp_decrypt) puts the aad on the stack. b_0 is also
on the stack, but maybe that doesn't matter.

Herbert, do you know what could cause this, and how we should fix it?

We can't really afford to do an allocation here, and we don't have
space in the skb (not even in skb->cb at that point), so if we really
have no way to continue using the stack we'd ... not sure, use a per-
CPU buffer perhaps.
We need 32 bytes for aad and 16 bytes for b_0, if that also can't be on
the stack any more.

johannes


Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)

2016-10-10 Thread Sergey Senozhatsky
Cc Andy

Andy, can this be related to CONFIG_VMAP_STACK?

On (10/11/16 00:03), Sergey Senozhatsky wrote:
> Hello,
> 
> current -git kills my system. adding
> 
>   if (!virt_addr_valid([2])) {
>   WARN_ON(1);
>   return -EINVAL;
>   }
> 
> to ieee80211_aes_ccm_decrypt() given the following backtrace
> 
>  WARNING: CPU: 5 PID: 252 at net/mac80211/aes_ccm.c:77 
> ieee80211_aes_ccm_decrypt+0xc8/0x197
>  CPU: 5 PID: 252 Comm: irq/29-iwlwifi Tainted: GW   
> 4.8.0-next-20161010-dbg-7-g79797e9-dirty #88
>   c9413638 811ff0e3  
>   c9413678 8103fe91 004d01c8 192826d3
>   88040fc526d8 0008 c9413978 c941397a
>  Call Trace:
>   [] dump_stack+0x4f/0x65
>   [] __warn+0xc2/0xdd
>   [] warn_slowpath_null+0x1d/0x1f
>   [] ieee80211_aes_ccm_decrypt+0xc8/0x197
>   [] ? __put_page+0x3c/0x3f
>   [] ? put_page+0x4a/0x62
>   [] ? __pskb_pull_tail+0x1e8/0x279
>   [] ? ccmp_special_blocks.isra.5+0x51/0x12d
>   [] ieee80211_crypto_ccmp_decrypt+0x1ba/0x221
>   [] ieee80211_rx_handlers+0x52a/0x19c2
>   [] ? start_dl_timer+0xa8/0xb4
>   [] ? put_lock_stats.isra.24+0xe/0x20
>   [] ? del_timer+0x57/0x61
>   [] ieee80211_prepare_and_rx_handle+0xcd6/0xd2a
>   [] ? local_clock+0x10/0x12
>   [] ? __lock_acquire.isra.31+0x202/0x57e
>   [] ? rcu_read_unlock+0x23/0x23
>   [] ? sched_clock_cpu+0x17/0xc6
>   [] ieee80211_rx_napi+0x5af/0x698
>   [] ? get_lock_stats+0x19/0x50
>   [] ? put_lock_stats.isra.24+0xe/0x20
>   [] iwl_mvm_rx_rx_mpdu+0x5ab/0x60c [iwlmvm]
>   [] ? get_lock_stats+0x19/0x50
>   [] iwl_mvm_rx+0x45/0x69 [iwlmvm]
>   [] iwl_pcie_rx_handle+0x478/0x584 [iwlwifi]
>   [] iwl_pcie_irq_handler+0x39c/0x52d [iwlwifi]
>   [] ? irq_finalize_oneshot+0xa7/0xa7
>   [] irq_thread_fn+0x1d/0x34
>   [] irq_thread+0xe6/0x1bb
>   [] ? wake_threads_waitq+0x2c/0x2c
>   [] ? irq_thread_dtor+0x95/0x95
>   [] kthread+0xc6/0xce
>   [] ? put_lock_stats.isra.24+0xe/0x20
>   [] ? __list_del_entry+0x22/0x22
>   [] ret_from_fork+0x22/0x30
>  ---[ end trace 94da6d4698b938b2 ]---

-ss