Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)
On 14 October 2016 at 11:00, Johannes Bergwrote: > >> So why is the performance hit acceptable for ESP but not for WPA? We >> could easily implement the same thing, i.e., >> kmalloc(GFP_ATOMIC)/kfree the aead_req struct rather than allocate it >> on the stack > > Yeah, maybe we should. It's likely a much bigger allocation, but I > don't actually know if that affects speed. > > In most cases where you want high performance we never hit this anyway > since we'll have hardware crypto. I know for our (Intel's) devices we > normally never hit these code paths. > > But on the other hand, you also did your changes for a reason, and the > only reason I can see of that is performance. So you'd be the one with > most "skin in the game", I guess? > Well, what sucks here is that the accelerated driver I implemented for arm64 does not actually need this, as long as we take aad[] off the stack. And note that the API was changed since my patch, to add aad[] to the scatterlist: prior to this change, it used aead_request_set_assoc() to set the associated data separately.
Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)
> So why is the performance hit acceptable for ESP but not for WPA? We > could easily implement the same thing, i.e., > kmalloc(GFP_ATOMIC)/kfree the aead_req struct rather than allocate it > on the stack Yeah, maybe we should. It's likely a much bigger allocation, but I don't actually know if that affects speed. In most cases where you want high performance we never hit this anyway since we'll have hardware crypto. I know for our (Intel's) devices we normally never hit these code paths. But on the other hand, you also did your changes for a reason, and the only reason I can see of that is performance. So you'd be the one with most "skin in the game", I guess? johannes
Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)
On 14 October 2016 at 09:39, Ard Biesheuvelwrote: > On 14 October 2016 at 09:28, Johannes Berg wrote: >> >>>1. revert that patch (doing so would need some major adjustments now, >>> since it's pretty old and a number of new things were added in the >>> meantime) >> >> This it will have to be, I guess. >> >>>2. allocate a per-CPU buffer for all the things that we put on the >>> stack and use in SG lists, those are: >>>* CCM/GCM: AAD (32B), B_0/J_0 (16B) >>>* GMAC: AAD (20B), zero (16B) >>>* (not sure why CMAC isn't using this API, but it would be like GMAC) >> >> This doesn't work - I tried to move the mac80211 buffers, but because >> we also put the struct aead_request on the stack, and crypto_ccm has >> the "odata" in there, and we can't separate the odata from that struct, >> we'd have to also put that into a per-CPU buffer, but it's very big - >> 456 bytes for CCM, didn't measure the others but I'd expect them to be >> larger, if different. >> >> I don't think we can allocate half a kb for each CPU just to be able to >> possibly use the acceleration here. We can't even make that conditional >> on not having hardware crypto in the wifi NIC because drivers are >> always allowed to pass undecrypted frames, regardless of whether or not >> HW crypto was attempted, so we don't know upfront if we'll have to >> decrypt anything in software... >> >> Given that, I think we have had a bug in here basically since Ard's >> patch, we never should've put these structs on the stack. Herbert, you >> also touched this later and converted the API usage, did you see the >> way the stack is used here and think it should be OK, or did you simply >> not realize that? >> >> Ard, are you able to help out working on a revert of your patch? That >> would require also reverting a number of other patches (various fixes, >> API adjustments, etc. to the AEAD usage), but the more complicated part >> is that in the meantime Jouni introduced GCMP and CCMP-256, both of >> which we of course need to retain. >> > > I am missing some context here, but could you explain what exactly is > the problem here? > > Look at this code > > """ > struct scatterlist sg[3]; > > char aead_req_data[sizeof(struct aead_request) + > crypto_aead_reqsize(tfm)] > __aligned(__alignof__(struct aead_request)); > struct aead_request *aead_req = (void *) aead_req_data; > > memset(aead_req, 0, sizeof(aead_req_data)); > > sg_init_table(sg, 3); > sg_set_buf([0], [2], be16_to_cpup((__be16 *)aad)); > sg_set_buf([1], data, data_len); > sg_set_buf([2], mic, mic_len); > > aead_request_set_tfm(aead_req, tfm); > aead_request_set_crypt(aead_req, sg, sg, data_len, b_0); > aead_request_set_ad(aead_req, sg[0].length); > """ > > I assume the stack buffer itself is not the problem here, but aad, > which is allocated on the stack one frame up. > Do we really need to revert the whole patch to fix that? Ah never mind, this is about 'odata'. Apologies, should have read first
Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)
On (10/13/16 14:49), Andy Lutomirski wrote: [..] > > > FAIL: 412cba02 > c900802cba02 || 1 -> (412cba02 > > > >> 39) == 130 > > > > Yeah, we already know that in this function the aad variable is on the > > stack, it explicitly is. > > > > The question, though, is why precisely that fails in the crypto code. > > Can you send the Oops report itself? > > > > It's failing before that. With CONFIG_VMAP_STACK=y, the stack may not > be physically contiguous and can't be used for DMA, so putting it in a > scatterlist is bogus in general, and the crypto code mostly wants a > scatterlist. > > There are a couple (faster!) APIs for crypto that don't use > scatterlists, but I don't think AEAD works with them. given that we have a known issue shouldn't VMAP_STACK be disabled for now, or would you rather prefer to mark MAC80211 as incompatible: "depends on CFG80211 && !VMAP_STACK"? -ss
Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)
On 14 October 2016 at 09:28, Johannes Bergwrote: > >>1. revert that patch (doing so would need some major adjustments now, >> since it's pretty old and a number of new things were added in the >> meantime) > > This it will have to be, I guess. > >>2. allocate a per-CPU buffer for all the things that we put on the >> stack and use in SG lists, those are: >>* CCM/GCM: AAD (32B), B_0/J_0 (16B) >>* GMAC: AAD (20B), zero (16B) >>* (not sure why CMAC isn't using this API, but it would be like GMAC) > > This doesn't work - I tried to move the mac80211 buffers, but because > we also put the struct aead_request on the stack, and crypto_ccm has > the "odata" in there, and we can't separate the odata from that struct, > we'd have to also put that into a per-CPU buffer, but it's very big - > 456 bytes for CCM, didn't measure the others but I'd expect them to be > larger, if different. > > I don't think we can allocate half a kb for each CPU just to be able to > possibly use the acceleration here. We can't even make that conditional > on not having hardware crypto in the wifi NIC because drivers are > always allowed to pass undecrypted frames, regardless of whether or not > HW crypto was attempted, so we don't know upfront if we'll have to > decrypt anything in software... > > Given that, I think we have had a bug in here basically since Ard's > patch, we never should've put these structs on the stack. Herbert, you > also touched this later and converted the API usage, did you see the > way the stack is used here and think it should be OK, or did you simply > not realize that? > > Ard, are you able to help out working on a revert of your patch? That > would require also reverting a number of other patches (various fixes, > API adjustments, etc. to the AEAD usage), but the more complicated part > is that in the meantime Jouni introduced GCMP and CCMP-256, both of > which we of course need to retain. > I am missing some context here, but could you explain what exactly is the problem here? Look at this code """ struct scatterlist sg[3]; char aead_req_data[sizeof(struct aead_request) + crypto_aead_reqsize(tfm)] __aligned(__alignof__(struct aead_request)); struct aead_request *aead_req = (void *) aead_req_data; memset(aead_req, 0, sizeof(aead_req_data)); sg_init_table(sg, 3); sg_set_buf([0], [2], be16_to_cpup((__be16 *)aad)); sg_set_buf([1], data, data_len); sg_set_buf([2], mic, mic_len); aead_request_set_tfm(aead_req, tfm); aead_request_set_crypt(aead_req, sg, sg, data_len, b_0); aead_request_set_ad(aead_req, sg[0].length); """ I assume the stack buffer itself is not the problem here, but aad, which is allocated on the stack one frame up. Do we really need to revert the whole patch to fix that?
Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)
> 1. revert that patch (doing so would need some major adjustments now, > since it's pretty old and a number of new things were added in the > meantime) This it will have to be, I guess. > 2. allocate a per-CPU buffer for all the things that we put on the > stack and use in SG lists, those are: > * CCM/GCM: AAD (32B), B_0/J_0 (16B) > * GMAC: AAD (20B), zero (16B) > * (not sure why CMAC isn't using this API, but it would be like GMAC) This doesn't work - I tried to move the mac80211 buffers, but because we also put the struct aead_request on the stack, and crypto_ccm has the "odata" in there, and we can't separate the odata from that struct, we'd have to also put that into a per-CPU buffer, but it's very big - 456 bytes for CCM, didn't measure the others but I'd expect them to be larger, if different. I don't think we can allocate half a kb for each CPU just to be able to possibly use the acceleration here. We can't even make that conditional on not having hardware crypto in the wifi NIC because drivers are always allowed to pass undecrypted frames, regardless of whether or not HW crypto was attempted, so we don't know upfront if we'll have to decrypt anything in software... Given that, I think we have had a bug in here basically since Ard's patch, we never should've put these structs on the stack. Herbert, you also touched this later and converted the API usage, did you see the way the stack is used here and think it should be OK, or did you simply not realize that? Ard, are you able to help out working on a revert of your patch? That would require also reverting a number of other patches (various fixes, API adjustments, etc. to the AEAD usage), but the more complicated part is that in the meantime Jouni introduced GCMP and CCMP-256, both of which we of course need to retain. johannes
Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)
On Thu, 2016-10-13 at 14:49 -0700, Andy Lutomirski wrote: > > It's failing before that. With CONFIG_VMAP_STACK=y, the stack may > not be physically contiguous and can't be used for DMA, so putting it > in a scatterlist is bogus in general, and the crypto code mostly > wants a scatterlist. I see, so all this stuff is getting inlined, and we crash in sg_set_buf() because it does sg_set_page() and that obviously needs to do virt_to_page(), which is invalid on this address now. With CONFIG_DEBUG_SG we'd have hit the BUG_ON there instead. It does indeed look like AEAD doesn't have any non-SG API. So ultimately, the bug already goes back to Ard's commit 7ec7c4a9a686 ("mac80211: port CCMP to cryptoapi's CCM driver") since that already potentially used stack space for DMA. Since we don't have any space in the SKB or anywhere else at this point (other than the stack that we can't use), I see two ways out of this: 1. revert that patch (doing so would need some major adjustments now, since it's pretty old and a number of new things were added in the meantime) 2. allocate a per-CPU buffer for all the things that we put on the stack and use in SG lists, those are: * CCM/GCM: AAD (32B), B_0/J_0 (16B) * GMAC: AAD (20B), zero (16B) * (not sure why CMAC isn't using this API, but it would be like GMAC) Thoughts? johannes
Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)
On (10/14/16 00:00), Sergey Senozhatsky wrote: > kernel: [] ieee80211_crypto_ccmp_decrypt+0x204/0x298 > kernel: [] ieee80211_rx_handlers+0x7df/0x1c1d > kernel: [] ieee80211_prepare_and_rx_handle+0xdc2/0xe79 > kernel: [] ? ieee80211_rx_napi+0x168/0x7b6 > kernel: [] ieee80211_rx_napi+0x48b/0x7b6 > kernel: [] ? debug_smp_processor_id+0x17/0x19 > kernel: [] iwl_mvm_rx_rx_mpdu+0x6e6/0x751 [iwlmvm] > kernel: [] iwl_mvm_rx+0x7e/0x98 [iwlmvm] > kernel: [] iwl_pcie_rx_handle+0x523/0x698 [iwlwifi] > kernel: [] iwl_pcie_irq_handler+0x46f/0x65f [iwlwifi] > kernel: [] ? irq_finalize_oneshot+0xd4/0xd4 > kernel: [] irq_thread_fn+0x1d/0x34 > kernel: [] irq_thread+0xe6/0x1bb > kernel: [] ? wake_threads_waitq+0x2c/0x2c > kernel: [] ? irq_thread_dtor+0x95/0x95 > kernel: [] kthread+0xfc/0x104 > kernel: [] ? put_lock_stats.isra.9+0xe/0x20 > kernel: [] ? kthread_create_on_node+0x3f/0x3f > kernel: [] ret_from_fork+0x22/0x30 > kernel: Code: 01 ca 49 89 d1 48 89 d1 48 c1 ea 23 48 8b 14 d5 80 23 63 82 49 > c1 e9 0c 48 c1 e9 1b 48 85 d2 74 0a 0f b6 c9 48 c1 e1 04 48 01 ca <48> 8b 12 > 49 c1 e1 06 b9 00 00 00 80 89 7d 80 89 75 84 48 8b 3d > kernel: RIP [] ieee80211_aes_ccm_decrypt+0x107/0x27f 8146d1ed : 8146d1ed: e8 9e 67 04 00 callq 814b3990 <__fentry__> 8146d1f2: 55 push %rbp 8146d1f3: 48 89 e5mov%rsp,%rbp 8146d1f6: 41 57 push %r15 8146d1f8: 41 56 push %r14 8146d1fa: 49 89 cemov%rcx,%r14 8146d1fd: 41 55 push %r13 8146d1ff: 41 54 push %r12 8146d201: 53 push %rbx 8146d202: 48 83 c4 80 add$0xff80,%rsp 8146d206: 8b 47 04mov0x4(%rdi),%eax 8146d209: 48 8d 48 50 lea0x50(%rax),%rcx 8146d20d: 48 83 c0 5e add$0x5e,%rax 8146d211: 48 c1 e8 03 shr$0x3,%rax 8146d215: 48 c1 e0 03 shl$0x3,%rax 8146d219: 48 29 c4sub%rax,%rsp 8146d21c: 4c 8d 7c 24 07 lea0x7(%rsp),%r15 8146d221: 49 c1 ef 03 shr$0x3,%r15 8146d225: 4d 85 c0test %r8,%r8 8146d228: 4a 8d 04 fd 00 00 00lea0x0(,%r15,8),%rax 8146d22f: 00 8146d230: 48 89 85 70 ff ff ffmov%rax,-0x90(%rbp) 8146d237: 75 0a jne8146d2438146d239: b8 ea ff ff ff mov$0xffea,%eax 8146d23e: e9 1a 02 00 00 jmpq 8146d45d 8146d243: 31 c0 xor%eax,%eax 8146d245: 49 89 fcmov%rdi,%r12 8146d248: 49 89 f5mov%rsi,%r13 8146d24b: 4c 89 85 58 ff ff ffmov%r8,-0xa8(%rbp) 8146d252: 4a 8d 3c fd 00 00 00lea0x0(,%r15,8),%rdi 8146d259: 00 8146d25a: be 03 00 00 00 mov$0x3,%esi 8146d25f: 4c 89 cbmov%r9,%rbx 8146d262: 48 89 95 60 ff ff ffmov%rdx,-0xa0(%rbp) 8146d269: f3 aa rep stos %al,%es:(%rdi) 8146d26b: 48 8d 85 78 ff ff fflea-0x88(%rbp),%rax 8146d272: 48 89 c7mov%rax,%rdi 8146d275: 48 89 85 68 ff ff ffmov%rax,-0x98(%rbp) 8146d27c: e8 46 06 dc ff callq 8122d8c7 8146d281: 48 8b 95 60 ff ff ffmov-0xa0(%rbp),%rdx 8146d288: 41 b9 00 00 00 80 mov$0x8000,%r9d 8146d28e: 48 8b 0d 7b cd 39 00mov0x39cd7b(%rip),%rcx # 8180a010 8146d295: 48 8b 85 68 ff ff ffmov-0x98(%rbp),%rax 8146d29c: 4c 8b 85 58 ff ff ffmov-0xa8(%rbp),%r8 8146d2a3: 0f b7 32movzwl (%rdx),%esi 8146d2a6: 48 83 c2 02 add$0x2,%rdx 8146d2aa: 89 d7 mov%edx,%edi 8146d2ac: 81 e7 ff 0f 00 00 and$0xfff,%edi 8146d2b2: 66 c1 c6 08 rol$0x8,%si 8146d2b6: 4c 01 caadd%r9,%rdx 8146d2b9: 0f b7 f6movzwl %si,%esi 8146d2bc: 72 0a jb 8146d2c8 8146d2be: 48 b9 00 00 00 80 ffmovabs $0x77ff8000,%rcx 8146d2c5: 77 00 00 8146d2c8: 48 01 caadd%rcx,%rdx
Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)
On (10/13/16 15:45), Johannes Berg wrote: > On Thu, 2016-10-13 at 22:42 +0900, Sergey Senozhatsky wrote: > > > > > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commi > > > > t/?h=x86/vmap_stack=0a39cfa6fbb5d5635c85253cc7d6b44b54822afd > > > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commi > > > > t/?h=x86/vmap_stack=bf8cfa200b5a01383ea39fc8ce2f32909767baa8 > > > > > > That truly sounds like something we'd rather avoid in the TX/RX > > > paths though, which should perform well. > > > > didn't fix. > > It couldn't, since the new helpers weren't used in mac80211 in those > patches yet. indeed. I thought they were. > > FAIL: 412cba02 > c900802cba02 || 1 -> (412cba02 > > >> 39) == 130 > > The question, though, is why precisely that fails in the crypto code. > Can you send the Oops report itself? kernel: BUG: unable to handle kernel NULL pointer dereference at (null) kernel: IP: [] ieee80211_aes_ccm_decrypt+0x107/0x27f kernel: PGD 0 kernel: kernel: Oops: [#1] PREEMPT SMP kernel: Modules linked in: nls_iso8859_1 nls_cp437 vfat fat mousedev psmouse serio_raw atkbd libps2 i915 coretemp i2c_algo_bit hwmon crc32c_intel mxm_wmi drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect iwlmvm sysimgblt fb_sys_fops i2c_i801 cfbcopyarea ie31200_edac drm iwlwifi i2c kernel: CPU: 3 PID: 245 Comm: irq/28-iwlwifi Not tainted 4.8.0-next-20161013-dbg-2-ge789862-dirty #112 kernel: task: 88041bf01800 task.stack: c92d kernel: RIP: 0010:[] [] ieee80211_aes_ccm_decrypt+0x107/0x27f kernel: RSP: 0018:c92d3770 EFLAGS: 00010246 kernel: RAX: c92d3930 RBX: 8804133cf606 RCX: 00082000 kernel: RDX: RSI: 0018 RDI: 0a02 kernel: RBP: c92d39b8 R08: 05e4 R09: 000412d3 kernel: R10: 001c R11: 8803e66d2d20 R12: 8804191c2780 kernel: R13: c92d39f0 R14: 8804133cf022 R15: 1925a6ee kernel: FS: () GS:88041ea0() knlGS: kernel: CS: 0010 DS: ES: CR0: 80050033 kernel: CR2: CR3: 01805000 CR4: 001406e0 kernel: Stack: kernel: kernel: kernel: kernel: Call Trace: kernel: [] ieee80211_crypto_ccmp_decrypt+0x204/0x298 kernel: [] ieee80211_rx_handlers+0x7df/0x1c1d kernel: [] ieee80211_prepare_and_rx_handle+0xdc2/0xe79 kernel: [] ? ieee80211_rx_napi+0x168/0x7b6 kernel: [] ieee80211_rx_napi+0x48b/0x7b6 kernel: [] ? debug_smp_processor_id+0x17/0x19 kernel: [] iwl_mvm_rx_rx_mpdu+0x6e6/0x751 [iwlmvm] kernel: [] iwl_mvm_rx+0x7e/0x98 [iwlmvm] kernel: [] iwl_pcie_rx_handle+0x523/0x698 [iwlwifi] kernel: [] iwl_pcie_irq_handler+0x46f/0x65f [iwlwifi] kernel: [] ? irq_finalize_oneshot+0xd4/0xd4 kernel: [] irq_thread_fn+0x1d/0x34 kernel: [] irq_thread+0xe6/0x1bb kernel: [] ? wake_threads_waitq+0x2c/0x2c kernel: [] ? irq_thread_dtor+0x95/0x95 kernel: [] kthread+0xfc/0x104 kernel: [] ? put_lock_stats.isra.9+0xe/0x20 kernel: [] ? kthread_create_on_node+0x3f/0x3f kernel: [] ret_from_fork+0x22/0x30 kernel: Code: 01 ca 49 89 d1 48 89 d1 48 c1 ea 23 48 8b 14 d5 80 23 63 82 49 c1 e9 0c 48 c1 e9 1b 48 85 d2 74 0a 0f b6 c9 48 c1 e1 04 48 01 ca <48> 8b 12 49 c1 e1 06 b9 00 00 00 80 89 7d 80 89 75 84 48 8b 3d kernel: RIP [] ieee80211_aes_ccm_decrypt+0x107/0x27f kernel: RSP kernel: CR2: kernel: ---[ end trace 3cd1fcd496516f72 ]--- -ss
Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)
On Thu, 2016-10-13 at 22:42 +0900, Sergey Senozhatsky wrote: > > > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commi > > > t/?h=x86/vmap_stack=0a39cfa6fbb5d5635c85253cc7d6b44b54822afd > > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commi > > > t/?h=x86/vmap_stack=bf8cfa200b5a01383ea39fc8ce2f32909767baa8 > > > > That truly sounds like something we'd rather avoid in the TX/RX > > paths though, which should perform well. > > didn't fix. It couldn't, since the new helpers weren't used in mac80211 in those patches yet. > so I finally had some time to do a better bug-reporter job. > > I added a bunch of printk-s and several virt_addr_valid()-s > to ieee80211_aes_ccm_encrypt(). > > and right befoe the Oops I see the following report from > virt_addr_valid() > > > FAIL: 412cba02 > c900802cba02 || 1 -> (412cba02 > >> 39) == 130 Yeah, we already know that in this function the aad variable is on the stack, it explicitly is. The question, though, is why precisely that fails in the crypto code. Can you send the Oops report itself? johannes
Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)
On (10/13/16 22:42), Sergey Senozhatsky wrote: > > On (10/13/16 08:02), Johannes Berg wrote: > > On Wed, 2016-10-12 at 22:39 -0700, Andy Lutomirski wrote: > > > > > In a pinch, I have these patches sitting around: > > > > > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vmap_stack=0a39cfa6fbb5d5635c85253cc7d6b44b54822afd > > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vmap_stack=bf8cfa200b5a01383ea39fc8ce2f32909767baa8 > > > > That truly sounds like something we'd rather avoid in the TX/RX paths > > though, which should perform well. > > didn't fix. > > so I finally had some time to do a better bug-reporter job. > > I added a bunch of printk-s and several virt_addr_valid()-s > to ieee80211_aes_ccm_encrypt(). > > and right befoe the Oops I see the following report from > virt_addr_valid() > > > FAIL: 412cba02 > c900802cba02 || 1 -> (412cba02 >> 39) > == 130 that `(412cba02 >> 39) == 130' part is phys_addr_valid() { (addr >> boot_cpu_data.x86_phys_bits) } -ss
Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)
On (10/13/16 08:02), Johannes Berg wrote: > On Wed, 2016-10-12 at 22:39 -0700, Andy Lutomirski wrote: > > > In a pinch, I have these patches sitting around: > > > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vmap_stack=0a39cfa6fbb5d5635c85253cc7d6b44b54822afd > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vmap_stack=bf8cfa200b5a01383ea39fc8ce2f32909767baa8 > > That truly sounds like something we'd rather avoid in the TX/RX paths > though, which should perform well. didn't fix. so I finally had some time to do a better bug-reporter job. I added a bunch of printk-s and several virt_addr_valid()-s to ieee80211_aes_ccm_encrypt(). and right befoe the Oops I see the following report from virt_addr_valid() FAIL: 412cba02 > c900802cba02 || 1 -> (412cba02 >> 39) == 130 which is basically failed '!phys_addr_valid(x)' in __virt_addr_valid() /* carry flag will be set if starting x was >= PAGE_OFFSET */ if ((x > y) || !phys_addr_valid(x)) return false; backtrace [ cut here ] WARNING: CPU: 7 PID: 246 at arch/x86/mm/physaddr.c:68 __virt_addr_valid+0xab/0xed c92cb6f0 8122168c c92cb730 810428d8 00440198 88041bd21022 c92cba02 192596ed 88041932d1e0 c92cba00 Call Trace: [] dump_stack+0x4f/0x65 [] __warn+0xc2/0xdd [] warn_slowpath_null+0x1d/0x1f [] __virt_addr_valid+0xab/0xed [] ieee80211_aes_ccm_decrypt+0x8f/0x2da [] ? debug_smp_processor_id+0x17/0x19 [] ? __put_page+0x3c/0x3f [] ? ccmp_special_blocks.isra.1+0x51/0x12d [] ieee80211_crypto_ccmp_decrypt+0x204/0x298 [] ieee80211_rx_handlers+0x7df/0x1c1d [] ieee80211_prepare_and_rx_handle+0xdc2/0xe79 [] ? ieee80211_rx_napi+0x154/0x7a5 [] ieee80211_rx_napi+0x474/0x7a5 [] iwl_mvm_rx_rx_mpdu+0x6e6/0x751 [iwlmvm] [] iwl_mvm_rx+0x7e/0x98 [iwlmvm] [] iwl_pcie_rx_handle+0x523/0x698 [iwlwifi] [] iwl_pcie_irq_handler+0x45d/0x64d [iwlwifi] [] ? irq_finalize_oneshot+0xd4/0xd4 [] irq_thread_fn+0x1d/0x34 [] irq_thread+0xe6/0x1bb [] ? wake_threads_waitq+0x2c/0x2c [] ? irq_thread_dtor+0x95/0x95 [] kthread+0xfc/0x104 [] ? put_lock_stats.isra.9+0xe/0x20 [] ? kthread_create_on_node+0x3f/0x3f [] ? kthread_create_on_node+0x3f/0x3f [] ? kthread_create_on_node+0x3f/0x3f [] ret_from_fork+0x22/0x30 -ss
Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)
On Wed, 2016-10-12 at 22:39 -0700, Andy Lutomirski wrote: > In a pinch, I have these patches sitting around: > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vmap_stack=0a39cfa6fbb5d5635c85253cc7d6b44b54822afd > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vmap_stack=bf8cfa200b5a01383ea39fc8ce2f32909767baa8 That truly sounds like something we'd rather avoid in the TX/RX paths though, which should perform well. > I don't like them, though. I think it's rather silly that we can't > just pass virtual addresses to the crypto code. I don't really understand it either, hence my question about the actual crash. I'll try to reproduce it in a VM. johannes
Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)
> > Can you elaborate on how exactly it kills your system? > > the last time I saw it it was a NULL deref at > ieee80211_aes_ccm_decrypt. Hm. I was expecting something within the crypto code would cause the crash, this seems strange. Anyway, I'm surely out of my depth wrt. the actual cause. Something like the patch below probably works around it, but it's horribly inefficient due to the locking and doesn't cover CMAC/GMAC either. johannes diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h index 103187ca9474..e820f437f02e 100644 --- a/net/mac80211/ieee80211_i.h +++ b/net/mac80211/ieee80211_i.h @@ -27,6 +27,7 @@ #include #include #include +#include #include #include #include @@ -1224,6 +1225,10 @@ struct ieee80211_local { spinlock_t rx_path_lock; + /* temporary buffers for software crypto */ + u8 aad[2 * AES_BLOCK_SIZE]; + u8 b_0[AES_BLOCK_SIZE]; + /* Station data */ /* * The mutex only protects the list, hash table and diff --git a/net/mac80211/wpa.c b/net/mac80211/wpa.c index b48c1e13e281..a3f17a710b85 100644 --- a/net/mac80211/wpa.c +++ b/net/mac80211/wpa.c @@ -405,8 +405,8 @@ static int ccmp_encrypt_skb(struct ieee80211_tx_data *tx, struct sk_buff *skb, u8 *pos; u8 pn[6]; u64 pn64; - u8 aad[2 * AES_BLOCK_SIZE]; - u8 b_0[AES_BLOCK_SIZE]; + u8 *aad = tx->local->aad; + u8 *b_0 = tx->local->b_0; if (info->control.hw_key && !(info->control.hw_key->flags & IEEE80211_KEY_FLAG_GENERATE_IV) && @@ -460,9 +460,11 @@ static int ccmp_encrypt_skb(struct ieee80211_tx_data *tx, struct sk_buff *skb, return 0; pos += IEEE80211_CCMP_HDR_LEN; + spin_lock_bh(>local->rx_path_lock); ccmp_special_blocks(skb, pn, b_0, aad); ieee80211_aes_ccm_encrypt(key->u.ccmp.tfm, b_0, aad, pos, len, skb_put(skb, mic_len), mic_len); + spin_unlock_bh(>local->rx_path_lock); return 0; } @@ -534,8 +536,9 @@ ieee80211_crypto_ccmp_decrypt(struct ieee80211_rx_data *rx, } if (!(status->flag & RX_FLAG_DECRYPTED)) { - u8 aad[2 * AES_BLOCK_SIZE]; - u8 b_0[AES_BLOCK_SIZE]; + u8 *aad = rx->local->aad; + u8 *b_0 = rx->local->b_0; + /* hardware didn't decrypt/verify MIC */ ccmp_special_blocks(skb, pn, b_0, aad); @@ -639,8 +642,8 @@ static int gcmp_encrypt_skb(struct ieee80211_tx_data *tx, struct sk_buff *skb) u8 *pos; u8 pn[6]; u64 pn64; - u8 aad[2 * AES_BLOCK_SIZE]; - u8 j_0[AES_BLOCK_SIZE]; + u8 *aad = tx->local->aad; + u8 *j_0 = tx->local->b_0; if (info->control.hw_key && !(info->control.hw_key->flags & IEEE80211_KEY_FLAG_GENERATE_IV) && @@ -695,9 +698,11 @@ static int gcmp_encrypt_skb(struct ieee80211_tx_data *tx, struct sk_buff *skb) return 0; pos += IEEE80211_GCMP_HDR_LEN; + spin_lock_bh(>local->rx_path_lock); gcmp_special_blocks(skb, pn, j_0, aad); ieee80211_aes_gcm_encrypt(key->u.gcmp.tfm, j_0, aad, pos, len, skb_put(skb, IEEE80211_GCMP_MIC_LEN)); + spin_unlock_bh(>local->rx_path_lock); return 0; } @@ -764,8 +769,9 @@ ieee80211_crypto_gcmp_decrypt(struct ieee80211_rx_data *rx) } if (!(status->flag & RX_FLAG_DECRYPTED)) { - u8 aad[2 * AES_BLOCK_SIZE]; - u8 j_0[AES_BLOCK_SIZE]; + u8 *aad = rx->local->aad; + u8 *j_0 = rx->local->b_0; + /* hardware didn't decrypt/verify MIC */ gcmp_special_blocks(skb, pn, j_0, aad);
Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)
Hello, On (10/12/16 11:05), Johannes Berg wrote: > Sorry - I meant to look into this yesterday but forgot. > > > Andy, can this be related to CONFIG_VMAP_STACK? > > I think it is. yeah, the system works fine with !CONFIG_VMAP_STACK. > > > current -git kills my system. > > Can you elaborate on how exactly it kills your system? the last time I saw it it was a NULL deref at ieee80211_aes_ccm_decrypt. -ss
Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)
Hi, Sorry - I meant to look into this yesterday but forgot. > Andy, can this be related to CONFIG_VMAP_STACK? I think it is. > > current -git kills my system. Can you elaborate on how exactly it kills your system? > > adding > > > > if (!virt_addr_valid([2])) { > > WARN_ON(1); > > return -EINVAL; > > } That's pretty obviously false with VMAP_STACK, since the caller (ieee80211_crypto_ccmp_decrypt) puts the aad on the stack. b_0 is also on the stack, but maybe that doesn't matter. Herbert, do you know what could cause this, and how we should fix it? We can't really afford to do an allocation here, and we don't have space in the skb (not even in skb->cb at that point), so if we really have no way to continue using the stack we'd ... not sure, use a per- CPU buffer perhaps. We need 32 bytes for aad and 16 bytes for b_0, if that also can't be on the stack any more. johannes
Re: [mac80211] BUG_ON with current -git (4.8.0-11417-g24532f7)
Cc Andy Andy, can this be related to CONFIG_VMAP_STACK? On (10/11/16 00:03), Sergey Senozhatsky wrote: > Hello, > > current -git kills my system. adding > > if (!virt_addr_valid([2])) { > WARN_ON(1); > return -EINVAL; > } > > to ieee80211_aes_ccm_decrypt() given the following backtrace > > WARNING: CPU: 5 PID: 252 at net/mac80211/aes_ccm.c:77 > ieee80211_aes_ccm_decrypt+0xc8/0x197 > CPU: 5 PID: 252 Comm: irq/29-iwlwifi Tainted: GW > 4.8.0-next-20161010-dbg-7-g79797e9-dirty #88 > c9413638 811ff0e3 > c9413678 8103fe91 004d01c8 192826d3 > 88040fc526d8 0008 c9413978 c941397a > Call Trace: > [] dump_stack+0x4f/0x65 > [] __warn+0xc2/0xdd > [] warn_slowpath_null+0x1d/0x1f > [] ieee80211_aes_ccm_decrypt+0xc8/0x197 > [] ? __put_page+0x3c/0x3f > [] ? put_page+0x4a/0x62 > [] ? __pskb_pull_tail+0x1e8/0x279 > [] ? ccmp_special_blocks.isra.5+0x51/0x12d > [] ieee80211_crypto_ccmp_decrypt+0x1ba/0x221 > [] ieee80211_rx_handlers+0x52a/0x19c2 > [] ? start_dl_timer+0xa8/0xb4 > [] ? put_lock_stats.isra.24+0xe/0x20 > [] ? del_timer+0x57/0x61 > [] ieee80211_prepare_and_rx_handle+0xcd6/0xd2a > [] ? local_clock+0x10/0x12 > [] ? __lock_acquire.isra.31+0x202/0x57e > [] ? rcu_read_unlock+0x23/0x23 > [] ? sched_clock_cpu+0x17/0xc6 > [] ieee80211_rx_napi+0x5af/0x698 > [] ? get_lock_stats+0x19/0x50 > [] ? put_lock_stats.isra.24+0xe/0x20 > [] iwl_mvm_rx_rx_mpdu+0x5ab/0x60c [iwlmvm] > [] ? get_lock_stats+0x19/0x50 > [] iwl_mvm_rx+0x45/0x69 [iwlmvm] > [] iwl_pcie_rx_handle+0x478/0x584 [iwlwifi] > [] iwl_pcie_irq_handler+0x39c/0x52d [iwlwifi] > [] ? irq_finalize_oneshot+0xa7/0xa7 > [] irq_thread_fn+0x1d/0x34 > [] irq_thread+0xe6/0x1bb > [] ? wake_threads_waitq+0x2c/0x2c > [] ? irq_thread_dtor+0x95/0x95 > [] kthread+0xc6/0xce > [] ? put_lock_stats.isra.24+0xe/0x20 > [] ? __list_del_entry+0x22/0x22 > [] ret_from_fork+0x22/0x30 > ---[ end trace 94da6d4698b938b2 ]--- -ss