Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements

2018-11-18 Thread Martin Willi
Hi Jason,

> I'd be inclined to roll with your implementation if it can eventually
> become competitive with Andy Polyakov's, [...]

I think for the SSSE3/AVX2 code paths it is competitive; especially for
small sizes it is faster, which is not that unimportant when
implementing layer 3 VPNs.

> there are still no AVX-512 paths, which means it's considerably
> slower on all newer generation Intel chips. Andy's has the AVX-512VL
> implementation for Skylake (using ymm, so as not to hit throttling)
> and AVX-512F for Cannon Lake and beyond (using zmm).

I don't think that having AVX-512F is that important until it is really
usable on CPUs in the market.

Adding AVX-512VL support is relatively simple. I have a patchset mostly
ready that is more than competitive with the code from Zinc. I'll clean
that up and do more testing before posting it later this week.

Best regards
Martin



Re: [RFC PATCH] zinc chacha20 generic implementation using crypto API code

2018-11-18 Thread Herbert Xu
Hi Jason:

On Mon, Nov 19, 2018 at 07:13:07AM +0100, Jason A. Donenfeld wrote:
>
> Sorry, but there isn't a chance I'll agree to this. The whole idea is
> to have a clean place to focus on synchronous software
> implementations, and not keep it tangled up with the asynchronous API.

There is nothing asynchronous in the relevant crypto code at all.
The underlying CPU-based software implementations are all completely
synchronous, including the architecture-specific ones such as
x86/arm.

So I don't understand why you are rejecting it.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [RFC PATCH] zinc chacha20 generic implementation using crypto API code

2018-11-18 Thread Jason A. Donenfeld
Hi Herbert,

On Mon, Nov 19, 2018 at 6:25 AM Herbert Xu  wrote:
> My proposal is to merge the zinc interface as is, but to invert
> how we place the algorithm implementations.  IOW the implementations
> should stay where they are now, with in the crypto API.  However,
> we will provide direct access to them for zinc without going through
> the crypto API.  IOW we'll just export the functions directly.

Sorry, but there isn't a chance I'll agree to this. The whole idea is
to have a clean place to focus on synchronous software
implementations, and not keep it tangled up with the asynchronous API.
If WireGuard and Zinc go in, it's going to be done right, not in a way
that introduces more problems and organizational complexity. Avoiding
the solution proposed here is kind of the point. And given that
cryptographic code is so security sensitive, I want to make sure this
is done right, as opposed to rushing it with some half-baked
concoction that will be maybe possibly be replaced "later". From
talking to folks at Plumbers, I believe most of the remaining issues
are taken care of by the upcoming v9, and that you're overstating the
status of discussions regarding that.

I think the remaining issue right now is how to order my series and
Eric's series. If Eric's goes in first, it means that I can either a)
spend some time developing Zinc further _now_ to support chacha12 and
keep the top two conversion patches, or b) just drop the top two
conversion patches and work that out carefully with Eric after. I
think (b) would be better, but I'm happy to do (a) if you disagree.
And as I mentioned in the last email, I'd prefer Eric's work to go in
after Zinc, but I don't think the reasoning for that is particularly
strong, so it seems fine to merge Eric's work first.

I'll post v9 pretty soon and you can see how things are shaping up.
Hopefully before then Eric/Ard/you can provide some feedback on
whether you'd prefer (a) or (b) above.

Regards,
Jason


[RFC PATCH] zinc chacha20 generic implementation using crypto API code

2018-11-18 Thread Herbert Xu
On Sun, Nov 18, 2018 at 02:46:30PM +0100, Jason A. Donenfeld wrote:
> 
> Personally I'd prefer this be merged after Zinc, since there's work to
> be done on adjusting the 20->12 in chacha20. That's not really much of
> a reason though. But maybe we can just sidestep the ordering concern
> all together:

In response to Martin's patch-set which I merged last week, I think
here is quick way out for the zinc interface.

Going through the past zinc discussions it would appear that
everybody is quite happy with the zinc interface per se.  The
most contentious areas are in fact the algorithm implementations
under zinc, as well as the conversion of the crypto API algorithms
over to using the zinc interface (e.g., you can no longer access
specific implementations).

My proposal is to merge the zinc interface as is, but to invert
how we place the algorithm implementations.  IOW the implementations
should stay where they are now, with in the crypto API.  However,
we will provide direct access to them for zinc without going through
the crypto API.  IOW we'll just export the functions directly.

Here is a proof of concept patch to do it for chacha20-generic.
It basically replaces patch 3 in the October zinc patch series.
Actually this patch also exports the arm/x86 chacha20 functions
too so they are ready for use by zinc.

If everyone is happy with this then we can immediately add the
zinc interface and expose the existing crypto API algorithms
through it.  This would allow wireguard to be merged right away.

In parallel, the discussions over replacing the implementations
underneath can carry on without stalling the whole project.

PS This patch is totally untested :)

Thanks,

diff --git a/arch/arm/crypto/chacha20-neon-glue.c 
b/arch/arm/crypto/chacha20-neon-glue.c
index 59a7be08e80c..1f6239ff41ae 100644
--- a/arch/arm/crypto/chacha20-neon-glue.c
+++ b/arch/arm/crypto/chacha20-neon-glue.c
@@ -31,7 +31,7 @@
 asmlinkage void chacha20_block_xor_neon(u32 *state, u8 *dst, const u8 *src);
 asmlinkage void chacha20_4block_xor_neon(u32 *state, u8 *dst, const u8 *src);
 
-static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src,
+void crypto_chacha20_doneon(u32 *state, u8 *dst, const u8 *src,
unsigned int bytes)
 {
u8 buf[CHACHA20_BLOCK_SIZE];
@@ -56,11 +56,12 @@ static void chacha20_doneon(u32 *state, u8 *dst, const u8 
*src,
memcpy(dst, buf, bytes);
}
 }
+EXPORT_SYMBOL_GPL(crypto_chacha20_doneon);
 
 static int chacha20_neon(struct skcipher_request *req)
 {
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-   struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
+   struct crypto_chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
struct skcipher_walk walk;
u32 state[16];
int err;
@@ -79,8 +80,8 @@ static int chacha20_neon(struct skcipher_request *req)
if (nbytes < walk.total)
nbytes = round_down(nbytes, walk.stride);
 
-   chacha20_doneon(state, walk.dst.virt.addr, walk.src.virt.addr,
-   nbytes);
+   crypto_chacha20_doneon(state, walk.dst.virt.addr,
+  walk.src.virt.addr, nbytes);
err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
}
kernel_neon_end();
@@ -93,7 +94,7 @@ static struct skcipher_alg alg = {
.base.cra_driver_name   = "chacha20-neon",
.base.cra_priority  = 300,
.base.cra_blocksize = 1,
-   .base.cra_ctxsize   = sizeof(struct chacha20_ctx),
+   .base.cra_ctxsize   = sizeof(struct crypto_chacha20_ctx),
.base.cra_module= THIS_MODULE,
 
.min_keysize= CHACHA20_KEY_SIZE,
diff --git a/arch/x86/crypto/chacha20_glue.c b/arch/x86/crypto/chacha20_glue.c
index 9fd84fe6ec09..519bbbf8c477 100644
--- a/arch/x86/crypto/chacha20_glue.c
+++ b/arch/x86/crypto/chacha20_glue.c
@@ -39,7 +39,7 @@ static unsigned int chacha20_advance(unsigned int len, 
unsigned int maxblocks)
return round_up(len, CHACHA20_BLOCK_SIZE) / CHACHA20_BLOCK_SIZE;
 }
 
-static void chacha20_dosimd(u32 *state, u8 *dst, const u8 *src,
+void crypto_chacha20_dosimd(u32 *state, u8 *dst, const u8 *src,
unsigned int bytes)
 {
 #ifdef CONFIG_AS_AVX2
@@ -85,11 +85,12 @@ static void chacha20_dosimd(u32 *state, u8 *dst, const u8 
*src,
state[12]++;
}
 }
+EXPORT_SYMBOL_GPL(crypto_chacha20_dosimd);
 
 static int chacha20_simd(struct skcipher_request *req)
 {
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-   struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
+   struct crypto_chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
u32 *state, state_buf[16 + 2] __aligned(8);
struct skcipher_walk walk;
int err;
@@ -112,8 +113,8 @@ static int chacha20_simd(struct skcipher_request *req)
if (nbytes < walk.tota

Important

2018-11-18 Thread Reem Al-Hashimi
Hello,

My name is ms. Reem Al-Hashimi. The UAE minister of state for international 
cooparation. I got your contact from a certain email database from your country 
while i was looking for someone to handle a huge financial transaction for me 
in confidence. Can you receive and invest on behalf of my only son. Please 
reply to reem2...@daum.net, for more details if you are interested.

Regards,

Ms. Reem Al-Hashimy


BUG: unable to handle kernel paging request in aead_geniv_alloc

2018-11-18 Thread syzbot

Hello,

syzbot found the following crash on:

HEAD commit:9c549a6b0573 selftests: add explicit test for multiple con..
git tree:   net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=13a46f7b40
kernel config:  https://syzkaller.appspot.com/x/.config?x=d86f24333880b605
dashboard link: https://syzkaller.appspot.com/bug?extid=cd50d1708db841432d8a
compiler:   gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+cd50d1708db841432...@syzkaller.appspotmail.com

BUG: unable to handle kernel paging request at fff0
PGD 946d067 P4D 946d067 PUD 946f067 PMD 0
Oops:  [#1] PREEMPT SMP KASAN
CPU: 1 PID: 9032 Comm: cryptomgr_probe Not tainted 4.20.0-rc2+ #300
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

kobject: 'loop1' (f780ed70): kobject_uevent_env
RIP: 0010:aead_geniv_alloc+0x2de/0x780 crypto/aead.c:240
Code: 00 00 fc ff df 49 8d 7c 24 f0 48 89 fa 48 c1 ea 03 0f b6 14 02 48 89  
f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ea 03 00 00 <45> 8b 74 24 f0  
bf 07 00 00 00 48 c7 c3 ea ff ff ff 44 89 f6 e8 59

RSP: 0018:8881946b7e40 EFLAGS: 00010246
kobject: 'loop1' (f780ed70): fill_kobj_path: path  
= '/devices/virtual/block/loop1'

RAX: 0003 RBX:  RCX: 835c1711
RDX:  RSI: 835c171e RDI: fff0
RBP: 8881946b7e80 R08: 8881975da540 R09: ed10328d6fa3
R10: ed10328d6fa3 R11: 0003 R12: 
R13: 8881d3ee66b0 R14:  R15: 
FS:  () GS:8881daf0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: fff0 CR3: 00019a8d CR4: 001406e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 seqiv_aead_create crypto/seqiv.c:149 [inline]
 seqiv_create+0x99/0x290 crypto/seqiv.c:190
 cryptomgr_probe+0x6d/0x280 crypto/algboss.c:75
 kthread+0x35a/0x440 kernel/kthread.c:246
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
Modules linked in:
CR2: fff0
---[ end trace e6ac52e6f5849122 ]---
RIP: 0010:aead_geniv_alloc+0x2de/0x780 crypto/aead.c:240
Code: 00 00 fc ff df 49 8d 7c 24 f0 48 89 fa 48 c1 ea 03 0f b6 14 02 48 89  
f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ea 03 00 00 <45> 8b 74 24 f0  
bf 07 00 00 00 48 c7 c3 ea ff ff ff 44 89 f6 e8 59

RSP: 0018:8881946b7e40 EFLAGS: 00010246
RAX: 0003 RBX:  RCX: 835c1711
RDX:  RSI: 835c171e RDI: fff0
RBP: 8881946b7e80 R08: 8881975da540 R09: ed10328d6fa3
R10: ed10328d6fa3 R11: 0003 R12: 
R13: 8881d3ee66b0 R14:  R15: 
FS:  () GS:8881daf0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: fff0 CR3: 00019a8d CR4: 001406e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.


Re: [RFC PATCH v3 10/15] crypto: poly1305 - use structures for key and accumulator

2018-11-18 Thread Jason A. Donenfeld
Hi Eric,

On Sat, Nov 17, 2018 at 1:17 AM Eric Biggers  wrote:
> Do you prefer that this be merged before or after Zinc?  It seems it may still
> be a while before the community is satisfied with Zinc (and Wireguard which is
> in the same patchset), and I don't want this blocked unnecessarily...  So on 
> my
> part I'd prefer to just have this merged as-is.

Personally I'd prefer this be merged after Zinc, since there's work to
be done on adjusting the 20->12 in chacha20. That's not really much of
a reason though. But maybe we can just sidestep the ordering concern
all together:

What I suspect we should do is make the initial Zinc merge _without_
those top two patches that replace the crypto api's implementations
with Zinc, and defer those until after the initial merges. Those
commits are already written -- so there's no chance it won't happen
due to laziness or something -- and I think the general merge will go
a bit more smoothly if we wait until after. (Somebody suggested we do
it this way at Plumbers, and was actually a bit surprised I had
already ported the crypto API stuff to Zinc.) This way, Adiantum and
Zinc aren't potentially co-dependent in their initial merges and we
can work on the details carefully with each other after both have
landed. I figure this might make things a little bit less stressful
for both of us. How would you feel about doing it that?

Jason