Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements
Hi Jason, > I'd be inclined to roll with your implementation if it can eventually > become competitive with Andy Polyakov's, [...] I think for the SSSE3/AVX2 code paths it is competitive; especially for small sizes it is faster, which is not that unimportant when implementing layer 3 VPNs. > there are still no AVX-512 paths, which means it's considerably > slower on all newer generation Intel chips. Andy's has the AVX-512VL > implementation for Skylake (using ymm, so as not to hit throttling) > and AVX-512F for Cannon Lake and beyond (using zmm). I don't think that having AVX-512F is that important until it is really usable on CPUs in the market. Adding AVX-512VL support is relatively simple. I have a patchset mostly ready that is more than competitive with the code from Zinc. I'll clean that up and do more testing before posting it later this week. Best regards Martin
Re: [RFC PATCH] zinc chacha20 generic implementation using crypto API code
Hi Jason: On Mon, Nov 19, 2018 at 07:13:07AM +0100, Jason A. Donenfeld wrote: > > Sorry, but there isn't a chance I'll agree to this. The whole idea is > to have a clean place to focus on synchronous software > implementations, and not keep it tangled up with the asynchronous API. There is nothing asynchronous in the relevant crypto code at all. The underlying CPU-based software implementations are all completely synchronous, including the architecture-specific ones such as x86/arm. So I don't understand why you are rejecting it. Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [RFC PATCH] zinc chacha20 generic implementation using crypto API code
Hi Herbert, On Mon, Nov 19, 2018 at 6:25 AM Herbert Xu wrote: > My proposal is to merge the zinc interface as is, but to invert > how we place the algorithm implementations. IOW the implementations > should stay where they are now, with in the crypto API. However, > we will provide direct access to them for zinc without going through > the crypto API. IOW we'll just export the functions directly. Sorry, but there isn't a chance I'll agree to this. The whole idea is to have a clean place to focus on synchronous software implementations, and not keep it tangled up with the asynchronous API. If WireGuard and Zinc go in, it's going to be done right, not in a way that introduces more problems and organizational complexity. Avoiding the solution proposed here is kind of the point. And given that cryptographic code is so security sensitive, I want to make sure this is done right, as opposed to rushing it with some half-baked concoction that will be maybe possibly be replaced "later". From talking to folks at Plumbers, I believe most of the remaining issues are taken care of by the upcoming v9, and that you're overstating the status of discussions regarding that. I think the remaining issue right now is how to order my series and Eric's series. If Eric's goes in first, it means that I can either a) spend some time developing Zinc further _now_ to support chacha12 and keep the top two conversion patches, or b) just drop the top two conversion patches and work that out carefully with Eric after. I think (b) would be better, but I'm happy to do (a) if you disagree. And as I mentioned in the last email, I'd prefer Eric's work to go in after Zinc, but I don't think the reasoning for that is particularly strong, so it seems fine to merge Eric's work first. I'll post v9 pretty soon and you can see how things are shaping up. Hopefully before then Eric/Ard/you can provide some feedback on whether you'd prefer (a) or (b) above. Regards, Jason
[RFC PATCH] zinc chacha20 generic implementation using crypto API code
On Sun, Nov 18, 2018 at 02:46:30PM +0100, Jason A. Donenfeld wrote: > > Personally I'd prefer this be merged after Zinc, since there's work to > be done on adjusting the 20->12 in chacha20. That's not really much of > a reason though. But maybe we can just sidestep the ordering concern > all together: In response to Martin's patch-set which I merged last week, I think here is quick way out for the zinc interface. Going through the past zinc discussions it would appear that everybody is quite happy with the zinc interface per se. The most contentious areas are in fact the algorithm implementations under zinc, as well as the conversion of the crypto API algorithms over to using the zinc interface (e.g., you can no longer access specific implementations). My proposal is to merge the zinc interface as is, but to invert how we place the algorithm implementations. IOW the implementations should stay where they are now, with in the crypto API. However, we will provide direct access to them for zinc without going through the crypto API. IOW we'll just export the functions directly. Here is a proof of concept patch to do it for chacha20-generic. It basically replaces patch 3 in the October zinc patch series. Actually this patch also exports the arm/x86 chacha20 functions too so they are ready for use by zinc. If everyone is happy with this then we can immediately add the zinc interface and expose the existing crypto API algorithms through it. This would allow wireguard to be merged right away. In parallel, the discussions over replacing the implementations underneath can carry on without stalling the whole project. PS This patch is totally untested :) Thanks, diff --git a/arch/arm/crypto/chacha20-neon-glue.c b/arch/arm/crypto/chacha20-neon-glue.c index 59a7be08e80c..1f6239ff41ae 100644 --- a/arch/arm/crypto/chacha20-neon-glue.c +++ b/arch/arm/crypto/chacha20-neon-glue.c @@ -31,7 +31,7 @@ asmlinkage void chacha20_block_xor_neon(u32 *state, u8 *dst, const u8 *src); asmlinkage void chacha20_4block_xor_neon(u32 *state, u8 *dst, const u8 *src); -static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src, +void crypto_chacha20_doneon(u32 *state, u8 *dst, const u8 *src, unsigned int bytes) { u8 buf[CHACHA20_BLOCK_SIZE]; @@ -56,11 +56,12 @@ static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src, memcpy(dst, buf, bytes); } } +EXPORT_SYMBOL_GPL(crypto_chacha20_doneon); static int chacha20_neon(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); - struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm); + struct crypto_chacha20_ctx *ctx = crypto_skcipher_ctx(tfm); struct skcipher_walk walk; u32 state[16]; int err; @@ -79,8 +80,8 @@ static int chacha20_neon(struct skcipher_request *req) if (nbytes < walk.total) nbytes = round_down(nbytes, walk.stride); - chacha20_doneon(state, walk.dst.virt.addr, walk.src.virt.addr, - nbytes); + crypto_chacha20_doneon(state, walk.dst.virt.addr, + walk.src.virt.addr, nbytes); err = skcipher_walk_done(&walk, walk.nbytes - nbytes); } kernel_neon_end(); @@ -93,7 +94,7 @@ static struct skcipher_alg alg = { .base.cra_driver_name = "chacha20-neon", .base.cra_priority = 300, .base.cra_blocksize = 1, - .base.cra_ctxsize = sizeof(struct chacha20_ctx), + .base.cra_ctxsize = sizeof(struct crypto_chacha20_ctx), .base.cra_module= THIS_MODULE, .min_keysize= CHACHA20_KEY_SIZE, diff --git a/arch/x86/crypto/chacha20_glue.c b/arch/x86/crypto/chacha20_glue.c index 9fd84fe6ec09..519bbbf8c477 100644 --- a/arch/x86/crypto/chacha20_glue.c +++ b/arch/x86/crypto/chacha20_glue.c @@ -39,7 +39,7 @@ static unsigned int chacha20_advance(unsigned int len, unsigned int maxblocks) return round_up(len, CHACHA20_BLOCK_SIZE) / CHACHA20_BLOCK_SIZE; } -static void chacha20_dosimd(u32 *state, u8 *dst, const u8 *src, +void crypto_chacha20_dosimd(u32 *state, u8 *dst, const u8 *src, unsigned int bytes) { #ifdef CONFIG_AS_AVX2 @@ -85,11 +85,12 @@ static void chacha20_dosimd(u32 *state, u8 *dst, const u8 *src, state[12]++; } } +EXPORT_SYMBOL_GPL(crypto_chacha20_dosimd); static int chacha20_simd(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); - struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm); + struct crypto_chacha20_ctx *ctx = crypto_skcipher_ctx(tfm); u32 *state, state_buf[16 + 2] __aligned(8); struct skcipher_walk walk; int err; @@ -112,8 +113,8 @@ static int chacha20_simd(struct skcipher_request *req) if (nbytes < walk.tota
Important
Hello, My name is ms. Reem Al-Hashimi. The UAE minister of state for international cooparation. I got your contact from a certain email database from your country while i was looking for someone to handle a huge financial transaction for me in confidence. Can you receive and invest on behalf of my only son. Please reply to reem2...@daum.net, for more details if you are interested. Regards, Ms. Reem Al-Hashimy
BUG: unable to handle kernel paging request in aead_geniv_alloc
Hello, syzbot found the following crash on: HEAD commit:9c549a6b0573 selftests: add explicit test for multiple con.. git tree: net-next console output: https://syzkaller.appspot.com/x/log.txt?x=13a46f7b40 kernel config: https://syzkaller.appspot.com/x/.config?x=d86f24333880b605 dashboard link: https://syzkaller.appspot.com/bug?extid=cd50d1708db841432d8a compiler: gcc (GCC) 8.0.1 20180413 (experimental) Unfortunately, I don't have any reproducer for this crash yet. IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+cd50d1708db841432...@syzkaller.appspotmail.com BUG: unable to handle kernel paging request at fff0 PGD 946d067 P4D 946d067 PUD 946f067 PMD 0 Oops: [#1] PREEMPT SMP KASAN CPU: 1 PID: 9032 Comm: cryptomgr_probe Not tainted 4.20.0-rc2+ #300 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 kobject: 'loop1' (f780ed70): kobject_uevent_env RIP: 0010:aead_geniv_alloc+0x2de/0x780 crypto/aead.c:240 Code: 00 00 fc ff df 49 8d 7c 24 f0 48 89 fa 48 c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ea 03 00 00 <45> 8b 74 24 f0 bf 07 00 00 00 48 c7 c3 ea ff ff ff 44 89 f6 e8 59 RSP: 0018:8881946b7e40 EFLAGS: 00010246 kobject: 'loop1' (f780ed70): fill_kobj_path: path = '/devices/virtual/block/loop1' RAX: 0003 RBX: RCX: 835c1711 RDX: RSI: 835c171e RDI: fff0 RBP: 8881946b7e80 R08: 8881975da540 R09: ed10328d6fa3 R10: ed10328d6fa3 R11: 0003 R12: R13: 8881d3ee66b0 R14: R15: FS: () GS:8881daf0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: fff0 CR3: 00019a8d CR4: 001406e0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 Call Trace: seqiv_aead_create crypto/seqiv.c:149 [inline] seqiv_create+0x99/0x290 crypto/seqiv.c:190 cryptomgr_probe+0x6d/0x280 crypto/algboss.c:75 kthread+0x35a/0x440 kernel/kthread.c:246 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352 Modules linked in: CR2: fff0 ---[ end trace e6ac52e6f5849122 ]--- RIP: 0010:aead_geniv_alloc+0x2de/0x780 crypto/aead.c:240 Code: 00 00 fc ff df 49 8d 7c 24 f0 48 89 fa 48 c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ea 03 00 00 <45> 8b 74 24 f0 bf 07 00 00 00 48 c7 c3 ea ff ff ff 44 89 f6 e8 59 RSP: 0018:8881946b7e40 EFLAGS: 00010246 RAX: 0003 RBX: RCX: 835c1711 RDX: RSI: 835c171e RDI: fff0 RBP: 8881946b7e80 R08: 8881975da540 R09: ed10328d6fa3 R10: ed10328d6fa3 R11: 0003 R12: R13: 8881d3ee66b0 R14: R15: FS: () GS:8881daf0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: fff0 CR3: 00019a8d CR4: 001406e0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 --- This bug is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkal...@googlegroups.com. syzbot will keep track of this bug report. See: https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.
Re: [RFC PATCH v3 10/15] crypto: poly1305 - use structures for key and accumulator
Hi Eric, On Sat, Nov 17, 2018 at 1:17 AM Eric Biggers wrote: > Do you prefer that this be merged before or after Zinc? It seems it may still > be a while before the community is satisfied with Zinc (and Wireguard which is > in the same patchset), and I don't want this blocked unnecessarily... So on > my > part I'd prefer to just have this merged as-is. Personally I'd prefer this be merged after Zinc, since there's work to be done on adjusting the 20->12 in chacha20. That's not really much of a reason though. But maybe we can just sidestep the ordering concern all together: What I suspect we should do is make the initial Zinc merge _without_ those top two patches that replace the crypto api's implementations with Zinc, and defer those until after the initial merges. Those commits are already written -- so there's no chance it won't happen due to laziness or something -- and I think the general merge will go a bit more smoothly if we wait until after. (Somebody suggested we do it this way at Plumbers, and was actually a bit surprised I had already ported the crypto API stuff to Zinc.) This way, Adiantum and Zinc aren't potentially co-dependent in their initial merges and we can work on the details carefully with each other after both have landed. I figure this might make things a little bit less stressful for both of us. How would you feel about doing it that? Jason