Re: [RFC PATCH v2 0/4] Exporting existing crypto API code through zinc

2018-11-20 Thread Herbert Xu
On Tue, Nov 20, 2018 at 05:24:41PM +0100, Jason A. Donenfeld wrote:
> On Tue, Nov 20, 2018 at 3:19 PM Herbert Xu wrote:
> > Yes.  In fact it's used for FIPS certification testing.
> > Sure, nobody sane should be doing it.  But when it comes to
> > government certification... :)
> 
> The kernel does not aim toward any FIPS certification, and we're not
> going to start bloating our designs to fulfill this. It's never been a
> goal. Maybe ask Ted to add a FIPS mode to random.c and see what
> happens... When you start arguing "because FIPS!" as your
> justification, you really hit a head scratcher.

FIPS is not the issue.  The issue is that we need to keep the
existing user-space ABI.  We can't break that.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[PATCH 1/1] crypto: cavium/nitrox - crypto request format changes

2018-11-20 Thread Nagadheeraj, Rottela
nitrox_skcipher_crypt() now does the necessary formatting/ordering of the
input and output sglists based on the algorithm requirements. It also
accommodates the mandatory output buffers required by the NITROX hardware,
namely the output request header (ORH) and the completion header.

Signed-off-by: Nagadheeraj Rottela 
Reviewed-by: Srikanth Jampala 
---
 drivers/crypto/cavium/nitrox/nitrox_algs.c   | 111 ++-
 drivers/crypto/cavium/nitrox/nitrox_req.h    |  94 ++
 drivers/crypto/cavium/nitrox/nitrox_reqmgr.c | 266 ++-
 3 files changed, 227 insertions(+), 244 deletions(-)

diff --git a/drivers/crypto/cavium/nitrox/nitrox_algs.c b/drivers/crypto/cavium/nitrox/nitrox_algs.c
index 5d54ebc20cb3..10075a97ff0d 100644
--- a/drivers/crypto/cavium/nitrox/nitrox_algs.c
+++ b/drivers/crypto/cavium/nitrox/nitrox_algs.c
@@ -155,13 +155,109 @@ static int nitrox_aes_setkey(struct crypto_skcipher *cipher, const u8 *key,
return nitrox_skcipher_setkey(cipher, aes_keylen, key, keylen);
 }
 
+static int alloc_src_sglist(struct skcipher_request *skreq, int ivsize)
+{
+   struct nitrox_kcrypt_request *nkreq = skcipher_request_ctx(skreq);
+   int nents = sg_nents(skreq->src) + 1;
+   struct se_crypto_request *creq = &nkreq->creq;
+   char *iv;
+   struct scatterlist *sg;
+
+   /* Allocate buffer to hold IV and input scatterlist array */
+   nkreq->src = alloc_req_buf(nents, ivsize, creq->gfp);
+   if (!nkreq->src)
+   return -ENOMEM;
+
+   /* copy iv */
+   iv = nkreq->src;
+   memcpy(iv, skreq->iv, ivsize);
+
+   sg = (struct scatterlist *)(iv + ivsize);
+   creq->src = sg;
+   sg_init_table(sg, nents);
+
+   /* Input format:
+* +++
+* | IV | SRC sg entries |
+* +++
+*/
+
+   /* IV */
+   sg = create_single_sg(sg, iv, ivsize);
+   /* SRC entries */
+   create_multi_sg(sg, skreq->src);
+
+   return 0;
+}
+
+static int alloc_dst_sglist(struct skcipher_request *skreq, int ivsize)
+{
+   struct nitrox_kcrypt_request *nkreq = skcipher_request_ctx(skreq);
+   int nents = sg_nents(skreq->dst) + 3;
+   int extralen = ORH_HLEN + COMP_HLEN;
+   struct se_crypto_request *creq = &nkreq->creq;
+   struct scatterlist *sg;
+   char *iv = nkreq->src;
+
+   /* Allocate buffer to hold ORH, COMPLETION and output scatterlist
+* array
+*/
+   nkreq->dst = alloc_req_buf(nents, extralen, creq->gfp);
+   if (!nkreq->dst)
+   return -ENOMEM;
+
+   creq->orh = (u64 *)(nkreq->dst);
+   set_orh_value(creq->orh);
+
+   creq->comp = (u64 *)(nkreq->dst + ORH_HLEN);
+   set_comp_value(creq->comp);
+
+   sg = (struct scatterlist *)(nkreq->dst + ORH_HLEN + COMP_HLEN);
+   creq->dst = sg;
+   sg_init_table(sg, nents);
+
+   /* Output format:
+* +-+++-+
+* | ORH | IV | DST sg entries | COMPLETION Bytes|
+* +-+++-+
+*/
+
+   /* ORH */
+   sg = create_single_sg(sg, creq->orh, ORH_HLEN);
+   /* IV */
+   sg = create_single_sg(sg, iv, ivsize);
+   /* DST entries */
+   sg = create_multi_sg(sg, skreq->dst);
+   /* COMPLETION Bytes */
+   create_single_sg(sg, creq->comp, COMP_HLEN);
+
+   return 0;
+}
+
+static void free_src_sglist(struct skcipher_request *skreq)
+{
+   struct nitrox_kcrypt_request *nkreq = skcipher_request_ctx(skreq);
+
+   kfree(nkreq->src);
+}
+
+static void free_dst_sglist(struct skcipher_request *skreq)
+{
+   struct nitrox_kcrypt_request *nkreq = skcipher_request_ctx(skreq);
+
+   kfree(nkreq->dst);
+}
+
 static void nitrox_skcipher_callback(struct skcipher_request *skreq,
 int err)
 {
+   free_src_sglist(skreq);
+   free_dst_sglist(skreq);
if (err) {
pr_err_ratelimited("request failed status 0x%0x\n", err);
err = -EINVAL;
}
+
skcipher_request_complete(skreq, err);
 }
 
@@ -172,6 +268,7 @@ static int nitrox_skcipher_crypt(struct skcipher_request *skreq, bool enc)
struct nitrox_kcrypt_request *nkreq = skcipher_request_ctx(skreq);
int ivsize = crypto_skcipher_ivsize(cipher);
struct se_crypto_request *creq;
+   int ret;
 
creq = &nkreq->creq;
creq->flags = skreq->base.flags;
@@ -192,11 +289,15 @@ static int nitrox_skcipher_crypt(struct skcipher_request *skreq, bool enc)
creq->ctx_handle = nctx->u.ctx_handle;
creq->ctrl.s.ctxl = sizeof(struct flexi_crypto_context);
 
-   /* copy the iv */
-   memcpy(creq->iv, skreq->iv, ivsize);
-   creq->ivsize = ivsize;
-   creq->src = skreq->src;
-   creq->dst = skreq->dst;
+   ret = alloc_src_sglist(skreq, ivsize);
+   if (ret)
+   return ret;
+
+   ret = alloc_dst_sglist(skreq, ivsize);
+   if (ret) {
+   free_src_sglist(skreq);
+   return ret;
+   }

Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

2018-11-20 Thread Kenneth Lee
On Mon, Nov 19, 2018 at 08:29:39PM -0700, Jason Gunthorpe wrote:
> On Tue, Nov 20, 2018 at 11:07:02AM +0800, Kenneth Lee wrote:
> > On Mon, Nov 19, 2018 at 11:49:54AM -0700, Jason Gunthorpe wrote:
> > > On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
> > >  
> > > > If the hardware cannot share a page table with the CPU, we then need
> > > > some way to change the device page table. This is what happens in ODP:
> > > > it invalidates the device page table upon an mmu_notifier callback.
> > > > But this cannot solve the COW problem: if a user process A shares a
> > > > page P with the device, forks a new process B, and continues to write
> > > > to the page, then by COW process B will keep page P while A gets a new
> > > > page P'. There is no way to let the device know it should use P'
> > > > rather than P.
> > > 
> > > Is this true? I thought mmu_notifiers covered all these cases.
> > > 
> > > The mmu_notifier for A should fire if B causes the physical address of
> > > A's pages to change via COW.
> > > 
> > > And this causes the device page tables to re-synchronize.
> > 
> > I don't see such code. The current do_cow_fault() implementation has
> > nothing to do with mmu_notifier.
> 
> Well, that sure sounds like it would be a bug in mmu_notifiers..

Yes, it can be taken that way :) But it is going to be a tough bug.
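For context, the mechanism being debated works roughly as follows: a driver
registers an mmu_notifier against the process mm, and CPU page-table changes
(including COW breakage after fork) invalidate the device's copy of the
mapping. A minimal sketch, assuming the ~4.19-era callback signature and a
hypothetical dev_mirror_invalidate() helper; this is illustrative, not code
from either patch set:

#include <linux/kernel.h>
#include <linux/mmu_notifier.h>

struct dev_mirror {
        struct mmu_notifier mn;
        /* device page-table handle, invalidation machinery, ... */
};

/* Hypothetical driver helper: tear down the device PTEs covering
 * [start, end) and force the device to re-fault before its next
 * access, so a COW'ed page P is replaced by P' on the device too. */
static void dev_mirror_invalidate(struct dev_mirror *m,
                                  unsigned long start, unsigned long end)
{
        /* driver-specific invalidation */
}

static int dev_invalidate_range_start(struct mmu_notifier *mn,
                                      struct mm_struct *mm,
                                      unsigned long start,
                                      unsigned long end, bool blockable)
{
        struct dev_mirror *m = container_of(mn, struct dev_mirror, mn);

        dev_mirror_invalidate(m, start, end);
        return 0;
}

static const struct mmu_notifier_ops dev_mn_ops = {
        .invalidate_range_start = dev_invalidate_range_start,
};

The structure would be registered with mmu_notifier_register(&m->mn,
current->mm); whether this callback is actually invoked on the
do_cow_fault() path is exactly the point of disagreement above.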

> 
> But considering Jean's SVA stuff seems based on mmu notifiers, I have
> a hard time believing that it has any different behavior from RDMA's
> ODP, and if it does have different behavior, then it is probably just
> a bug in the ODP implementation.

As Jean has explained, his solution is based on page table sharing. I think ODP
should also consider this new feature.

> 
> > > > In WarpDrive/uacce, we make this simple: if you have an IOMMU and it
> > > > supports SVM/SVA, everything will be fine, just like ODP implicit
> > > > mode, and you don't need to write any code for that, because it has
> > > > already been done by the IOMMU framework. If it
> > > 
> > > Looks like the IOMMU code uses mmu_notifier, so it is identical to
> > > IB's ODP. The only difference is that IB tends to have the IOMMU page
> > > table in the device, not in the CPU.
> > > 
> > > The only case I know of that is different is the new-fangled CAPI
> > > stuff where the IOMMU can directly use the CPU's page table and the
> > > IOMMU page table (in device or CPU) is eliminated.
> >
> > Yes. We are not focusing on the current implementation. As mentioned in the
> > cover letter, we are expecting Jean-Philippe's SVA patches:
> > git://linux-arm.org/linux-jpb.
> 
> This SVA stuff does not look comparable to CAPI as it still requires
> maintaining separate IOMMU page tables.
> 
> Also, those patches from Jean have a lot of references to
> mmu_notifiers (ie look at iommu_mmu_notifier).
> 
> Are you really sure it is actually any different at all?
> 
> > > Anyhow, I don't think a single instance of hardware should justify an
> > > entire new subsystem. Subsystems are hard to make and without multiple
> > > hardware examples there is no way to expect that it would cover any
> > > future use cases.

Re: [RFC PATCH v2 0/4] Exporting existing crypto API code through zinc

2018-11-20 Thread Herbert Xu
On Tue, Nov 20, 2018 at 05:18:24PM +0100, Jason A. Donenfeld wrote:
>
> N'ack. As I mentioned in the last email, this really isn't okay, and
> mostly defeats the purpose of Zinc in the first place, adding
> complexity in a discombobulated half-baked way. Your proposal seems to
> be the worst of all worlds. Having talked to lots of people about the
design of Zinc at this point, to very positive reception, I'm going to
> move forward with submitting v9, once I rebase with the required
> changes for Eric's latest merge.

Do you have any actual technical reasons for rejecting this apart
from your apparent hatred of the existing crypto API?

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

2018-11-20 Thread Kenneth Lee
On Tue, Nov 20, 2018 at 07:17:44AM +0200, Leon Romanovsky wrote:
> On Tue, Nov 20, 2018 at 11:07:02AM +0800, Kenneth Lee wrote:
> > On Mon, Nov 19, 2018 at 11:49:54AM -0700, Jason Gunthorpe wrote:
> > > On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
> > >
> > > > If the hardware cannot share a page table with the CPU, we then need
> > > > some way to change the device page table. This is what happens in ODP:
> > > > it invalidates the device page table upon an mmu_notifier callback.
> > > > But this cannot solve the COW problem: if a user process A shares a
> > > > page P with the device, forks a new process B, and continues to write
> > > > to the page, then by COW process B will keep page P while A gets a new
> > > > page P'. There is no way to let the device know it should use P'
> > > > rather than P.
> > >
> > > Is this true? I thought mmu_notifiers covered all these cases.
> > >
> > > The mmu_notifier for A should fire if B causes the physical address of
> > > A's pages to change via COW.
> > >
> > > And this causes the device page tables to re-synchronize.
> >
> > I don't see such code. The current do_cow_fault() implementation has
> > nothing to do with mmu_notifier.
> >
> > >
> > > > In WarpDrive/uacce, we make this simple: if you have an IOMMU and it
> > > > supports SVM/SVA, everything will be fine, just like ODP implicit
> > > > mode, and you don't need to write any code for that, because it has
> > > > already been done by the IOMMU framework. If it
> > >
> > > Looks like the IOMMU code uses mmu_notifier, so it is identical to
> > > IB's ODP. The only difference is that IB tends to have the IOMMU page
> > > table in the device, not in the CPU.
> > >
> > > The only case I know of that is different is the new-fangled CAPI
> > > stuff where the IOMMU can directly use the CPU's page table and the
> > > IOMMU page table (in device or CPU) is eliminated.
> > >
> >
> > Yes. We are not focusing on the current implementation. As mentioned in the
> > cover letter, we are expecting Jean-Philippe's SVA patches:
> > git://linux-arm.org/linux-jpb.
> >
> > > Anyhow, I don't think a single instance of hardware should justify an
> > > entire new subsystem. Subsystems are hard to make and without multiple
> > > hardware examples there is no way to expect that it would cover any
> > > future use cases.
> >
> > Yes. That was our first expectation: we could keep it with our driver. But
> > there is no user-space driver support for any accelerator in the mainline
> > kernel; even the well-known QuickAssist has to be maintained out of tree.
> > So we are trying to see if people are interested in working together to
> > solve the problem.
> >
> > >
> > > If all your driver needs is to mmap some PCI bar space, route
> > > interrupts and do DMA mapping then mediated VFIO is probably a good
> > > choice.
> >
> > Yes. That is what is done in our RFCv1/v2. But we accepted Jerome's opinion
> > and are trying not to add complexity to the mm subsystem.
> >
> > >
> > > If it needs to do a bunch of other stuff, not rel

Re: [RFC PATCH v2 0/4] Exporting existing crypto API code through zinc

2018-11-20 Thread Theodore Y. Ts'o
On Tue, Nov 20, 2018 at 05:24:41PM +0100, Jason A. Donenfeld wrote:
> On Tue, Nov 20, 2018 at 3:19 PM Herbert Xu wrote:
> > Yes.  In fact it's used for FIPS certification testing.
> > Sure, nobody sane should be doing it.  But when it comes to
> > government certification... :)
> 
> The kernel does not aim toward any FIPS certification, and we're not
> going to start bloating our designs to fulfill this. It's never been a
> goal. Maybe ask Ted to add a FIPS mode to random.c and see what
> happens... When you start arguing "because FIPS!" as your
> justification, you really hit a head scratcher.

There are crazy people who go for FIPS certification for the kernel.
That's why crypto/drbg.c exists.  There is a crazy fips mode in
drivers/char/random.c which causes the kernel to panic with a 1 in
2**80 probability each time _extract_entropy is called.  It's not the
default, and I have no idea if anyone uses it, or if it's like the
NIST OSI mandate, which forced everyone to buy an OSI networking stack
--- and then put it on the shelf and use TCP/IP instead.

Press release from March 2018:

https://www.redhat.com/en/about/press-releases/red-hat-completes-fips-140-2-re-certification-red-hat-enterprise-linux-7

- Ted


Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements

2018-11-20 Thread Jason A. Donenfeld
Hi Martin,

On Tue, Nov 20, 2018 at 5:29 PM Martin Willi wrote:
> Thanks for the offer, no need at this time. But I certainly would
> welcome if you could do some (Wireguard) benching with that code to see
> if it works for you.

I certainly will test it in a few different network circumstances,
especially since real testing like this is sometimes more telling than
busy-loop benchmarks.

> > Actually, similarly here, a 10nm Cannon Lake machine should be
> > arriving at my house this week, which should make for some
> > interesting testing ground for non-throttled zmm, if you'd like to
> > play with it.
>
> Maybe in a future iteration, thanks. In fact, it would be interesting to
> know whether Cannon Lake handles that throttling better.

Everything I've read on the Internet seems to indicate that's the
case, so one of the first things I'll be doing is seeing if that's
true. There are also the AVX512 IFMA instructions to play with!

Jason


[PATCH 3/3] crypto: x86/chacha20 - Add a 4-block AVX-512VL variant

2018-11-20 Thread Martin Willi
This version uses the same principle as the AVX2 version by scheduling the
operations for two block pairs in parallel. It benefits from the AVX-512VL
rotate instructions and the more efficient partial block handling using
"vmovdqu8", resulting in a speedup of the raw block function of ~20%.

Signed-off-by: Martin Willi 
---
 arch/x86/crypto/chacha20-avx512vl-x86_64.S | 272 +
 arch/x86/crypto/chacha20_glue.c            |   7 +
 2 files changed, 279 insertions(+)

diff --git a/arch/x86/crypto/chacha20-avx512vl-x86_64.S b/arch/x86/crypto/chacha20-avx512vl-x86_64.S
index 261097578715..55d34de29e3e 100644
--- a/arch/x86/crypto/chacha20-avx512vl-x86_64.S
+++ b/arch/x86/crypto/chacha20-avx512vl-x86_64.S
@@ -12,6 +12,11 @@
 CTR2BL: .octa 0x00000000000000000000000000000000
         .octa 0x00000000000000000000000000000001
 
+.section   .rodata.cst32.CTR4BL, "aM", @progbits, 32
+.align 32
+CTR4BL: .octa 0x00000000000000000000000000000002
+        .octa 0x00000000000000000000000000000003
+
 .section   .rodata.cst32.CTR8BL, "aM", @progbits, 32
 .align 32
 CTR8BL: .octa 0x00000003000000020000000100000000
@@ -185,6 +190,273 @@ ENTRY(chacha20_2block_xor_avx512vl)
 
 ENDPROC(chacha20_2block_xor_avx512vl)
 
+ENTRY(chacha20_4block_xor_avx512vl)
+   # %rdi: Input state matrix, s
+   # %rsi: up to 4 data blocks output, o
+   # %rdx: up to 4 data blocks input, i
+   # %rcx: input/output length in bytes
+
+   # This function encrypts four ChaCha20 blocks by loading the state
+   # matrix four times across eight AVX registers. It performs matrix
+   # operations on four words in two matrices in parallel, sequentially
+   # to the operations on the four words of the other two matrices. Since
+   # the required word shuffling has a rather high latency, we can do the
+   # arithmetic on two matrix-pairs without much slowdown.
+
+   vzeroupper
+
+   # x0..3[0-4] = s0..3
+   vbroadcasti128  0x00(%rdi),%ymm0
+   vbroadcasti128  0x10(%rdi),%ymm1
+   vbroadcasti128  0x20(%rdi),%ymm2
+   vbroadcasti128  0x30(%rdi),%ymm3
+
+   vmovdqa %ymm0,%ymm4
+   vmovdqa %ymm1,%ymm5
+   vmovdqa %ymm2,%ymm6
+   vmovdqa %ymm3,%ymm7
+
+   vpaddd  CTR2BL(%rip),%ymm3,%ymm3
+   vpaddd  CTR4BL(%rip),%ymm7,%ymm7
+
+   vmovdqa %ymm0,%ymm11
+   vmovdqa %ymm1,%ymm12
+   vmovdqa %ymm2,%ymm13
+   vmovdqa %ymm3,%ymm14
+   vmovdqa %ymm7,%ymm15
+
+   mov $10,%rax
+
+.Ldoubleround4:
+
+   # x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+   vpaddd  %ymm1,%ymm0,%ymm0
+   vpxord  %ymm0,%ymm3,%ymm3
+   vprold  $16,%ymm3,%ymm3
+
+   vpaddd  %ymm5,%ymm4,%ymm4
+   vpxord  %ymm4,%ymm7,%ymm7
+   vprold  $16,%ymm7,%ymm7
+
+   # x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+   vpaddd  %ymm3,%ymm2,%ymm2
+   vpxord  %ymm2,%ymm1,%ymm1
+   vprold  $12,%ymm1,%ymm1
+
+   vpaddd  %ymm7,%ymm6,%ymm6
+   vpxord  %ymm6,%ymm5,%ymm5
+   vprold  $12,%ymm5,%ymm5
+
+   # x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+   vpaddd  %ymm1,%ymm0,%ymm0
+   vpxord  %ymm0,%ymm3,%ymm3
+   vprold  $8,%ymm3,%ymm3
+
+   vpaddd  %ymm5,%ymm4,%ymm4
+   vpxord  %ymm4,%ymm7,%ymm7
+   vprold  $8,%ymm7,%ymm7
+
+   # x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+   vpaddd  %ymm3,%ymm2,%ymm2
+   vpxord  %ymm2,%ymm1,%ymm1
+   vprold  $7,%ymm1,%ymm1
+
+   vpaddd  %ymm7,%ymm6,%ymm6
+   vpxord  %ymm6,%ymm5,%ymm5
+   vprold  $7,%ymm5,%ymm5
+
+   # x1 = shuffle32(x1, MASK(0, 3, 2, 1))
+   vpshufd $0x39,%ymm1,%ymm1
+   vpshufd $0x39,%ymm5,%ymm5
+   # x2 = shuffle32(x2, MASK(1, 0, 3, 2))
+   vpshufd $0x4e,%ymm2,%ymm2
+   vpshufd $0x4e,%ymm6,%ymm6
+   # x3 = shuffle32(x3, MASK(2, 1, 0, 3))
+   vpshufd $0x93,%ymm3,%ymm3
+   vpshufd $0x93,%ymm7,%ymm7
+
+   # x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+   vpaddd  %ymm1,%ymm0,%ymm0
+   vpxord  %ymm0,%ymm3,%ymm3
+   vprold  $16,%ymm3,%ymm3
+
+   vpaddd  %ymm5,%ymm4,%ymm4
+   vpxord  %ymm4,%ymm7,%ymm7
+   vprold  $16,%ymm7,%ymm7
+
+   # x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+   vpaddd  %ymm3,%ymm2,%ymm2
+   vpxord  %ymm2,%ymm1,%ymm1
+   vprold  $12,%ymm1,%ymm1
+
+   vpaddd  %ymm7,%ymm6,%ymm6
+   vpxord  %ymm6,%ymm5,%ymm5
+   vprold  $12,%ymm5,%ymm5
+
+   # x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+   vpaddd  %ymm1,%ymm0,%ymm0
+   vpxord  %ymm0,%ymm3,%ymm3
+   vprold  $8,%ymm3,%ymm3
+
+   vpaddd  %ymm5,%ymm4,%ymm4

[PATCH 0/3] crypto: x86/chacha20 - AVX-512VL block functions

2018-11-20 Thread Martin Willi
In the quest for pushing the limits of chacha20 encryption for both IPsec
and Wireguard, this small series adds AVX-512VL block functions. The VL
variant works on 256-bit ymm registers, but compared to AVX2 can benefit
from the new instructions.

Compared to the AVX2 version, these block functions bring an overall
speed improvement of ~20% across encryption lengths. Below are the
tcrypt results for additional block sizes, in kOps/s, for the current
crypto API AVX2 code path, the new AVX-512VL code path, and the
comparison to Zinc in AVX2 and AVX-512VL. All numbers are from a Xeon
Platinum 8168 (2.7GHz).

These numbers result in a very nice chart, available at:
  https://download.strongswan.org/misc/chacha-avx-512vl.svg

             crypto API      zinc
  len   avx2  512vl   avx2  512vl
   8   5719   5672   5468   5612
  16   5675   5627   5355   5621
  24   5687   5601   5322   5633
  32   5667   5622   5244   5564
  40   5603   5582   5337   5578
  48   5638   5539   5400   5556
  56   5624   5566   5375   5482
  64   5590   5573   5352   5531
  72   4841   5467   3365   3457
  80   5316   5761   3310   3381
  88   4798   5470   3239   3343
  96   5324   5723   3197   3281
 104   4819   5460   3155   3232
 112   5266   5749   3020   3195
 120   4776   5391   2959   3145
 128   5291   5723   3398   3489
 136   4122   4837   3321   3423
 144   4507   5057   3247   3389
 152   4139   4815   3233   3329
 160   4482   5043   3159   3256
 168   4142   4766   3131   3224
 176   4506   5028   3073   3162
 184   4119   4772   3010   3109
 192   4499   5016   3402   3502
 200   4127   4766   3329   3448
 208   4452   5012   3276   3371
 216   4128   4744   3243   3334
 224   4484   5008   3203   3298
 232   4103   4772   3141   3237
 240   4458   4963   3115   3217
 248   4121   4751   3085   3177
 256   4461   4987   3364   4046
 264   3406   4282   3270   4006
 272   3408   4287   3207   3961
 280   3371   4271   3203   3825
 288   3625   4301   3129   3751
 296   3402   4283   3093   3688
 304   3401   4247   3062   3637
 312   3382   4282   2995   3614
 320   3611   4279   3305   4070
 328   3386   4260   3276   3968
 336   3369   4288   3171   3929
 344   3389   4289   3134   3847
 352   3609   4266   3127   3720
 360   3355   4252   3076   3692
 368   3387   4264   3048   3650
 376   3387   4238   2967   3553
 384   3568   4265   3277   4035
 392   3369   4262   3299   3973
 400   3362   4235   3239   3899
 408   3352   4269   3196   3843
 416   3585   4243   3127   3736
 424   3364   4216   3092   3672
 432   3341   4246   3067   3628
 440   3353   4235   3018   3593
 448   3538   4245   3327   4035
 456   3322   4244   3275   3900
 464   3340   4237   3212   3880
 472   3330   4242   3054   3802
 480   3530   4234   3078   3707
 488   3337   4228   3094   3664
 496   3330   4223   3015   3591
 504   3317   4214   3002   3517
 512   3531   4197   3339   4016
 520   2511   3101   2030   2682
 528   2627   3087   2027   2641
 536   2508   3102   2001   2601
 544   2638   3090   1964   2564
 552   2494   3077   1962   2516
 560   2625   3064   1941   2515
 568   2500   3086   1922   2493
 576   2611   3074   2050   2689
 584   2482   3062   2041   2680
 592   2595   3074   2026   2644
 600   2470   3060   1985   2595
 608   2581   3039   1961   2555
 616   2478   3062   1956   2521
 624   2587   3066   1930   2493
 632   2457   3053   1923   2486
 640   2581   3050   2059   2712
 648   2296   2839   2024   2655
 656   2389   2845   2019   2642
 664   2292   2842   2002   2610
 672   2404   2838   1959   2537
 680   2273   2827   1956   2527
 688   2389   2840   1938   2510
 696   2280   2837   1911   2463
 704   2370   2819   2055   2702
 712   2277   2834   2029   2663
 720   2369   2829   2020   2625
 728   2255   2820   2001   2600
 736   2373   2819   1958   2543
 744   2269   2827   1956   2524
 752   2364   2817   1937   2492
 760   2270   2805   1909   2483
 768   2378   2820   2050   2696
 776   2053   2700   2002   2643
 784   2066   2693   1922   2640
 792   2065   2703   1928   2602
 800   2138   2706   1962   2535
 808   2065   2679   1938   2528
 816   2063   2699   1929   2500
 824   2053   2676   1915   2468
 832   2149   2692   2036   2693
 840   2055   2689   2024   2659
 848   2049   2689   2006   2610
 856   2057   2702   1979   2585
 864   2144   2703   1960   2547
 872   2047   2685   1945   2501
 880   2055   2683   1902   2497
 888   2060   2689   1897   2478
 896   2139   2693   2023   2663
 904   2049   2686   1970   2644
 912   2055   2688   1925   2621
 920   2047   2685   1911   2572
 928   2114   2695   1907   2545
 936   2055   2681   1927   2492
 944   2055   2693   1930   2478
 952   2042   2688   1909   2471
 960   2136   2682   2014   2672
 968   2054   2687   1999   2626
 976   2040   2682   1982   2598
 984   2055   2687   1943   2569
 992   2138   2694   1884   2522
1000   2036   2681   1929   2506
1008   2052   2676   1926   2475
1016   2050   2686   1889   2430
1024   2125   2670   2039   2656
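For reference when reading the assembly in this series: every code path
above, scalar or vector, is built from the same quarter-round (RFC 7539,
section 2.1); the SIMD variants just apply it to many words at once. In C:

#include <stdint.h>

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))

/* ChaCha20 quarter-round. A double round is four column quarter-rounds
 * followed by four diagonal ones, and the block function iterates the
 * double round 10 times. */
static void chacha20_qr(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
        *a += *b; *d ^= *a; *d = ROTL32(*d, 16);
        *c += *d; *b ^= *c; *b = ROTL32(*b, 12);
        *a += *b; *d ^= *a; *d = ROTL32(*d, 8);
        *c += *d; *b ^= *c; *b = ROTL32(*b, 7);
}

Each vpaddd/vpxord/vprold triple in the patches is one of these
add/xor/rotate lines, with a ymm register holding the same word position
across several blocks.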

[PATCH 2/3] crypto: x86/chacha20 - Add a 2-block AVX-512VL variant

2018-11-20 Thread Martin Willi
This version uses the same principle as the AVX2 version. It benefits
from the AVX-512VL rotate instructions and the more efficient partial
block handling using "vmovdqu8", resulting in a speedup of ~20%.

Unlike the AVX2 version, it is faster than the single-block SSSE3 version
at processing a single block. Hence we engage this function for (partial)
single-block lengths as well.
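The masked moves are easiest to see with intrinsics. A sketch of the
partial-block XOR that "vmovdqu8" with a kmov-loaded mask performs
(illustrative, not code from this patch; requires AVX-512VL plus
AVX-512BW):

#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* XOR len (< 32) bytes of keystream into dst without reading or
 * writing past the buffer, the job done via the stack and "rep movsb"
 * in the AVX2 version. Compile with -mavx512vl -mavx512bw. */
static void xor_partial_block(uint8_t *dst, const uint8_t *src,
                              __m256i keystream, size_t len)
{
        __mmask32 mask = ((uint32_t)1 << len) - 1; /* low len bits set */
        __m256i data = _mm256_maskz_loadu_epi8(mask, src);

        data = _mm256_xor_si256(data, keystream);
        _mm256_mask_storeu_epi8(dst, mask, data);
}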

Signed-off-by: Martin Willi 
---
 arch/x86/crypto/chacha20-avx512vl-x86_64.S | 171 +
 arch/x86/crypto/chacha20_glue.c            |   7 +
 2 files changed, 178 insertions(+)

diff --git a/arch/x86/crypto/chacha20-avx512vl-x86_64.S b/arch/x86/crypto/chacha20-avx512vl-x86_64.S
index e1877afcaa73..261097578715 100644
--- a/arch/x86/crypto/chacha20-avx512vl-x86_64.S
+++ b/arch/x86/crypto/chacha20-avx512vl-x86_64.S
@@ -7,6 +7,11 @@
 
 #include 
 
+.section   .rodata.cst32.CTR2BL, "aM", @progbits, 32
+.align 32
+CTR2BL: .octa 0x00000000000000000000000000000000
+        .octa 0x00000000000000000000000000000001
+
 .section   .rodata.cst32.CTR8BL, "aM", @progbits, 32
 .align 32
 CTR8BL: .octa 0x00000003000000020000000100000000
@@ -14,6 +19,172 @@ CTR8BL: .octa 0x00000003000000020000000100000000
 
 .text
 
+ENTRY(chacha20_2block_xor_avx512vl)
+   # %rdi: Input state matrix, s
+   # %rsi: up to 2 data blocks output, o
+   # %rdx: up to 2 data blocks input, i
+   # %rcx: input/output length in bytes
+
+   # This function encrypts two ChaCha20 blocks by loading the state
+   # matrix twice across four AVX registers. It performs matrix operations
+   # on four words in each matrix in parallel, but requires shuffling to
+   # rearrange the words after each round.
+
+   vzeroupper
+
+   # x0..3[0-2] = s0..3
+   vbroadcasti128  0x00(%rdi),%ymm0
+   vbroadcasti128  0x10(%rdi),%ymm1
+   vbroadcasti128  0x20(%rdi),%ymm2
+   vbroadcasti128  0x30(%rdi),%ymm3
+
+   vpaddd  CTR2BL(%rip),%ymm3,%ymm3
+
+   vmovdqa %ymm0,%ymm8
+   vmovdqa %ymm1,%ymm9
+   vmovdqa %ymm2,%ymm10
+   vmovdqa %ymm3,%ymm11
+
+   mov $10,%rax
+
+.Ldoubleround:
+
+   # x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+   vpaddd  %ymm1,%ymm0,%ymm0
+   vpxord  %ymm0,%ymm3,%ymm3
+   vprold  $16,%ymm3,%ymm3
+
+   # x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+   vpaddd  %ymm3,%ymm2,%ymm2
+   vpxord  %ymm2,%ymm1,%ymm1
+   vprold  $12,%ymm1,%ymm1
+
+   # x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+   vpaddd  %ymm1,%ymm0,%ymm0
+   vpxord  %ymm0,%ymm3,%ymm3
+   vprold  $8,%ymm3,%ymm3
+
+   # x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+   vpaddd  %ymm3,%ymm2,%ymm2
+   vpxord  %ymm2,%ymm1,%ymm1
+   vprold  $7,%ymm1,%ymm1
+
+   # x1 = shuffle32(x1, MASK(0, 3, 2, 1))
+   vpshufd $0x39,%ymm1,%ymm1
+   # x2 = shuffle32(x2, MASK(1, 0, 3, 2))
+   vpshufd $0x4e,%ymm2,%ymm2
+   # x3 = shuffle32(x3, MASK(2, 1, 0, 3))
+   vpshufd $0x93,%ymm3,%ymm3
+
+   # x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+   vpaddd  %ymm1,%ymm0,%ymm0
+   vpxord  %ymm0,%ymm3,%ymm3
+   vprold  $16,%ymm3,%ymm3
+
+   # x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+   vpaddd  %ymm3,%ymm2,%ymm2
+   vpxord  %ymm2,%ymm1,%ymm1
+   vprold  $12,%ymm1,%ymm1
+
+   # x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+   vpaddd  %ymm1,%ymm0,%ymm0
+   vpxord  %ymm0,%ymm3,%ymm3
+   vprold  $8,%ymm3,%ymm3
+
+   # x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+   vpaddd  %ymm3,%ymm2,%ymm2
+   vpxord  %ymm2,%ymm1,%ymm1
+   vprold  $7,%ymm1,%ymm1
+
+   # x1 = shuffle32(x1, MASK(2, 1, 0, 3))
+   vpshufd $0x93,%ymm1,%ymm1
+   # x2 = shuffle32(x2, MASK(1, 0, 3, 2))
+   vpshufd $0x4e,%ymm2,%ymm2
+   # x3 = shuffle32(x3, MASK(0, 3, 2, 1))
+   vpshufd $0x39,%ymm3,%ymm3
+
+   dec %rax
+   jnz .Ldoubleround
+
+   # o0 = i0 ^ (x0 + s0)
+   vpaddd  %ymm8,%ymm0,%ymm7
+   cmp $0x10,%rcx
+   jl  .Lxorpart2
+   vpxord  0x00(%rdx),%xmm7,%xmm6
+   vmovdqu %xmm6,0x00(%rsi)
+   vextracti128 $1,%ymm7,%xmm0
+   # o1 = i1 ^ (x1 + s1)
+   vpaddd  %ymm9,%ymm1,%ymm7
+   cmp $0x20,%rcx
+   jl  .Lxorpart2
+   vpxord  0x10(%rdx),%xmm7,%xmm6
+   vmovdqu %xmm6,0x10(%rsi)
+   vextracti128 $1,%ymm7,%xmm1
+   # o2 = i2 ^ (x2 + s2)
+   vpaddd  %ymm10,%ymm2,%ymm7
+   cmp $0x30,%rcx
+   jl  .Lxorpart2
+   vpxord  0x20(%rdx),%xmm7,%xmm6
+   vmovdqu %xmm6,0x20(%rsi)
+   vextracti128 $1,%ymm7,%xmm2
+   # o3 = i3 ^ (x3 + 

[PATCH 1/3] crypto: x86/chacha20 - Add a 8-block AVX-512VL variant

2018-11-20 Thread Martin Willi
This variant is similar to the AVX2 version, but benefits from the AVX-512
rotate instructions and the additional registers, so it can operate without
any data on the stack. It uses only ymm registers, to avoid the massive
core throttling that zmm usage triggers on Skylake-X platforms.
Nonetheless, it brings a ~30% speed improvement over the AVX2 variant
for random encryption lengths.

The AVX2 version uses "rep movsb" for partial block XORing via the stack.
With AVX-512, the new "vmovdqu8" can do this much more efficiently. The
associated "kmov" instructions needed to work with dynamic masks are not
part of the AVX-512VL instruction set, hence we depend on AVX-512BW as
well. Given that the major AVX-512VL architectures also provide AVX-512BW
and this extension does not affect core clocking, this seems to be no
problem, at least for now.
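The rotate benefit is easy to state with intrinsics. AVX2 has no 32-bit
vector rotate and emulates it with two shifts and an OR (the kernel's AVX2
code also uses a byte shuffle for the byte-aligned rotate counts), while
AVX-512VL provides vprold directly; a sketch, not code from this patch:

#include <immintrin.h>

/* AVX2: rotl32 by 16 emulated with shift/shift/or (three instructions). */
static __m256i rotl16_avx2(__m256i x)
{
        return _mm256_or_si256(_mm256_slli_epi32(x, 16),
                               _mm256_srli_epi32(x, 16));
}

/* AVX-512VL: a single vprold, no temporary register needed. */
static __m256i rotl16_avx512vl(__m256i x)
{
        return _mm256_rol_epi32(x, 16);
}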

Signed-off-by: Martin Willi 
---
 arch/x86/crypto/Makefile   |   5 +
 arch/x86/crypto/chacha20-avx512vl-x86_64.S | 396 +
 arch/x86/crypto/chacha20_glue.c            |  26 ++
 3 files changed, 427 insertions(+)
 create mode 100644 arch/x86/crypto/chacha20-avx512vl-x86_64.S

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index a4b0007a54e1..ce4e43642984 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -8,6 +8,7 @@ OBJECT_FILES_NON_STANDARD := y
 avx_supported := $(call as-instr,vpxor %xmm0$(comma)%xmm0$(comma)%xmm0,yes,no)
 avx2_supported := $(call as-instr,vpgatherdd %ymm0$(comma)(%eax$(comma)%ymm1\
$(comma)4)$(comma)%ymm2,yes,no)
+avx512_supported :=$(call as-instr,vpmovm2b %k1$(comma)%zmm5,yes,no)
 sha1_ni_supported :=$(call as-instr,sha1msg1 %xmm0$(comma)%xmm1,yes,no)
 sha256_ni_supported :=$(call as-instr,sha256msg1 %xmm0$(comma)%xmm1,yes,no)
 
@@ -103,6 +104,10 @@ ifeq ($(avx2_supported),yes)
morus1280-avx2-y := morus1280-avx2-asm.o morus1280-avx2-glue.o
 endif
 
+ifeq ($(avx512_supported),yes)
+   chacha20-x86_64-y += chacha20-avx512vl-x86_64.o
+endif
+
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o
 aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o aes_ctrby8_avx-x86_64.o
 ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
diff --git a/arch/x86/crypto/chacha20-avx512vl-x86_64.S b/arch/x86/crypto/chacha20-avx512vl-x86_64.S
new file mode 100644
index ..e1877afcaa73
--- /dev/null
+++ b/arch/x86/crypto/chacha20-avx512vl-x86_64.S
@@ -0,0 +1,396 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * ChaCha20 256-bit cipher algorithm, RFC7539, x64 AVX-512VL functions
+ *
+ * Copyright (C) 2018 Martin Willi
+ */
+
+#include 
+
+.section   .rodata.cst32.CTR8BL, "aM", @progbits, 32
+.align 32
+CTR8BL: .octa 0x00000003000000020000000100000000
+        .octa 0x00000007000000060000000500000004
+
+.text
+
+ENTRY(chacha20_8block_xor_avx512vl)
+   # %rdi: Input state matrix, s
+   # %rsi: up to 8 data blocks output, o
+   # %rdx: up to 8 data blocks input, i
+   # %rcx: input/output length in bytes
+
+   # This function encrypts eight consecutive ChaCha20 blocks by loading
+   # the state matrix in AVX registers eight times. Compared to AVX2, this
+   # mostly benefits from the new rotate instructions in VL and the
+   # additional registers.
+
+   vzeroupper
+
+   # x0..15[0-7] = s[0..15]
+   vpbroadcastd 0x00(%rdi),%ymm0
+   vpbroadcastd 0x04(%rdi),%ymm1
+   vpbroadcastd 0x08(%rdi),%ymm2
+   vpbroadcastd 0x0c(%rdi),%ymm3
+   vpbroadcastd 0x10(%rdi),%ymm4
+   vpbroadcastd 0x14(%rdi),%ymm5
+   vpbroadcastd 0x18(%rdi),%ymm6
+   vpbroadcastd 0x1c(%rdi),%ymm7
+   vpbroadcastd 0x20(%rdi),%ymm8
+   vpbroadcastd 0x24(%rdi),%ymm9
+   vpbroadcastd 0x28(%rdi),%ymm10
+   vpbroadcastd 0x2c(%rdi),%ymm11
+   vpbroadcastd 0x30(%rdi),%ymm12
+   vpbroadcastd 0x34(%rdi),%ymm13
+   vpbroadcastd 0x38(%rdi),%ymm14
+   vpbroadcastd 0x3c(%rdi),%ymm15
+
+   # x12 += counter values 0-7
+   vpaddd  CTR8BL(%rip),%ymm12,%ymm12
+
+   vmovdqa64   %ymm0,%ymm16
+   vmovdqa64   %ymm1,%ymm17
+   vmovdqa64   %ymm2,%ymm18
+   vmovdqa64   %ymm3,%ymm19
+   vmovdqa64   %ymm4,%ymm20
+   vmovdqa64   %ymm5,%ymm21
+   vmovdqa64   %ymm6,%ymm22
+   vmovdqa64   %ymm7,%ymm23
+   vmovdqa64   %ymm8,%ymm24
+   vmovdqa64   %ymm9,%ymm25
+   vmovdqa64   %ymm10,%ymm26
+   vmovdqa64   %ymm11,%ymm27
+   vmovdqa64   %ymm12,%ymm28
+   vmovdqa64   %ymm13,%ymm29
+   vmovdqa64   %ymm14,%ymm30
+   vmovdqa64   %ymm15,%ymm31
+
+   mov $10,%eax
+
+.Ldoubleround8:
+   # x0 += x4, x12 = rotl32(x12 ^ x0, 16)
+   vpaddd  %ymm0,%ymm4,%ymm0
+   vpxord  %ymm0,%ymm12,%ymm12
+   vprold  $16,%ymm12,%ymm12
+   # x1 += x5, x13 = rotl32(x13 ^ x1, 16)
+   

Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements

2018-11-20 Thread Martin Willi
Hi Jason,

> [...] I have a massive Xeon Gold 5120 machine that I can give you
> access to if you'd like to do some testing and benching.

Thanks for the offer, no need at this time. But I certainly would
welcome if you could do some (Wireguard) benching with that code to see
if it works for you.

> Actually, similarly here, a 10nm Cannon Lake machine should be
> arriving at my house this week, which should make for some
> interesting testing ground for non-throttled zmm, if you'd like to
> play with it.

Maybe in a future iteration, thanks. In fact, it would be interesting to
know whether Cannon Lake handles that throttling better.

Regards
Martin



Re: [RFC PATCH v2 0/4] Exporting existing crypto API code through zinc

2018-11-20 Thread Jason A. Donenfeld
On Tue, Nov 20, 2018 at 3:19 PM Herbert Xu wrote:
> Yes.  In fact it's used for FIPS certification testing.
> Sure, nobody sane should be doing it.  But when it comes to
> government certification... :)

The kernel does not aim toward any FIPS certification, and we're not
going to start bloating our designs to fulfill this. It's never been a
goal. Maybe ask Ted to add a FIPS mode to random.c and see what
happens... When you start arguing "because FIPS!" as your
justification, you really hit a head scratcher.

> They've already paid for the indirect
> function call so why make them go through yet another run-time
> branch?

The indirect function call in the crypto API is the performance hit.
The branch in Zinc is not, as the predictor does the correct thing
every single time. Looking at the performance measurements, I'm not
able to distinguish between the branch being there and the branch
being commented out.
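To make the comparison concrete, the dispatch pattern being defended has
roughly this shape (illustrative names, not the actual Zinc code):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool chacha20_use_simd; /* set once after CPU feature detection */

static void chacha20_generic(uint8_t *dst, const uint8_t *src, size_t len)
{
        /* portable scalar implementation */
}

static bool chacha20_arch(uint8_t *dst, const uint8_t *src, size_t len)
{
        if (!chacha20_use_simd)
                return false; /* tell the caller to fall back */
        /* vectorized implementation */
        return true;
}

void chacha20(uint8_t *dst, const uint8_t *src, size_t len)
{
        /* This branch resolves the same way on every call on a given
         * machine, so the predictor hides it; an indirect call through
         * an ops pointer is costlier, especially under retpolines. */
        if (!chacha20_arch(dst, src, len))
                chacha20_generic(dst, src, len);
}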

Give me a few more days to finish v9's latest required changes for
chacha12, and then I'll submit a revision that I think should address
the remaining technical objections raised over the last several months
we've been discussing this.

Regards,
Jason


Re: [RFC PATCH v2 0/4] Exporting existing crypto API code through zinc

2018-11-20 Thread Jason A. Donenfeld
Hi Herbert,

On Tue, Nov 20, 2018 at 7:02 AM Herbert Xu wrote:
> Here is an updated version demonstrating how we could access the
> accelerated versions of chacha20.  It also includes a final patch
> to deposit the new zinc version of x86-64 chacha20 into
> arch/x86/crypto where it can be used by both the crypto API as well
> as zinc.

N'ack. As I mentioned in the last email, this really isn't okay, and
mostly defeats the purpose of Zinc in the first place, adding
complexity in a discombobulated half-baked way. Your proposal seems to
be the worst of all worlds. Having talked to lots of people about the
design of Zinc at this point, to very positive reception, I'm going to
move forward with submitting v9, once I rebase with the required
changes for Eric's latest merge.

Jason


Re: [RFC PATCH v2 0/4] Exporting existing crypto API code through zinc

2018-11-20 Thread Herbert Xu
Hi Ard:

On Tue, Nov 20, 2018 at 11:32:05AM +0100, Ard Biesheuvel wrote:
>
> > 1. The crypto API algorithms remain individually accessible, this
> > is crucial as these algorithm names are exported to user-space so
> > changing the names to foo-zinc is not going to work.
> 
> Are you saying user space may use names like "ctr-aes-neon" directly
> rather than "ctr(aes)" for any implementation of the mode?

Yes.  In fact it's used for FIPS certification testing.

> If so, that is highly unfortunate, since it means we'd be breaking
> user space by wrapping a crypto library function with its own arch
> specific specializations into a generic crypto API wrapper.

You can never break things by introducing new algorithms.  The
problem is only when you remove existing ones.

> Note that I think that using AF_ALG to access software crypto routines
> (as opposed to accelerators) is rather pointless to begin with, but if
> it permits that today then we're stuck with it.

Sure, nobody sane should be doing it.  But when it comes to
government certification... :)
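For the record, the user-space dependency at stake looks like this: AF_ALG
lets a program bind a transform by exact name, including a driver-specific
name such as the "ctr-aes-neon" mentioned above. A minimal sketch (error
handling elided; the name must exist on the running kernel):

#include <linux/if_alg.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
        struct sockaddr_alg sa;
        int tfm;

        memset(&sa, 0, sizeof(sa));
        sa.salg_family = AF_ALG;
        strcpy((char *)sa.salg_type, "skcipher");
        /* Binding by driver name pins one specific implementation;
         * a generic name like "ctr(aes)" would instead select the
         * highest-priority provider. */
        strcpy((char *)sa.salg_name, "ctr-aes-neon");

        tfm = socket(AF_ALG, SOCK_SEQPACKET, 0);
        if (tfm < 0 || bind(tfm, (struct sockaddr *)&sa, sizeof(sa)) < 0)
                return 1; /* implementation not available */

        /* ALG_SET_KEY via setsockopt(), then accept()/send()/read() */
        close(tfm);
        return 0;
}

Renaming or removing such an algorithm makes that bind() fail, which is
the ABI break being pointed at.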

> > 2. The arch-specific algorithm code lives in their own module rather
> > than in a monolithic module.
> 
> This is one of the remaining issues I have with Zinc. However, modulo
> the above concerns, I think it does make sense to have a separate
> function API for synchronous software routines below the current
> crypto API. What I don't want is a separate Zinc kingdom with a
> different name, different maintainers and going through a different
> tree, and with its own approach to test vectors, arch specific code,
> etc etc

Even without the issue of replacing chacha20-generic with
chacha20-zinc, which breaks point 1 above, we simply don't want
or need to go through zinc's run-time implementation choice for
the crypto API algorithms.  They've already paid for the indirect
function call so why make them go through yet another run-time
branch?

If the optics of having the code in crypto is the issue, we could
move the actual algorithm code into a third location, perhaps
arch/x86/crypto/lib or arch/x86/lib/crypto.  Then both zinc and
the crypto API can sit on top of that.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v3 01/10] crypto: crypto_user_stat: made crypto_user_stat optional

2018-11-20 Thread Neil Horman
On Tue, Nov 20, 2018 at 12:32:17PM +, Corentin Labbe wrote:
> Even if CRYPTO_STATS is set to n, some parts of CRYPTO_STATS are still
> compiled. This patch makes all parts of crypto_user_stat uncompiled in
> that case.
> 
> Signed-off-by: Corentin Labbe 
> ---
>  crypto/Makefile  |  3 ++-
>  crypto/algapi.c  |  2 ++
>  include/crypto/internal/cryptouser.h | 17 +
>  include/linux/crypto.h   |  2 ++
>  4 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/crypto/Makefile b/crypto/Makefile
> index abbd86fdbad2..22f9f84f961d 100644
> --- a/crypto/Makefile
> +++ b/crypto/Makefile
> @@ -54,7 +54,8 @@ cryptomgr-y := algboss.o testmgr.o
>  
>  obj-$(CONFIG_CRYPTO_MANAGER2) += cryptomgr.o
>  obj-$(CONFIG_CRYPTO_USER) += crypto_user.o
> -crypto_user-y := crypto_user_base.o crypto_user_stat.o
> +crypto_user-y := crypto_user_base.o
> +crypto_user-$(CONFIG_CRYPTO_STATS) += crypto_user_stat.o
>  obj-$(CONFIG_CRYPTO_CMAC) += cmac.o
>  obj-$(CONFIG_CRYPTO_HMAC) += hmac.o
>  obj-$(CONFIG_CRYPTO_VMAC) += vmac.o
> diff --git a/crypto/algapi.c b/crypto/algapi.c
> index 2545c5f89c4c..f5396c88e8cd 100644
> --- a/crypto/algapi.c
> +++ b/crypto/algapi.c
> @@ -258,6 +258,7 @@ static struct crypto_larval *__crypto_register_alg(struct crypto_alg *alg)
>   list_add(&alg->cra_list, &crypto_alg_list);
>   list_add(&larval->alg.cra_list, &crypto_alg_list);
>  
> +#ifdef CONFIG_CRYPTO_STATS
>   atomic_set(&alg->encrypt_cnt, 0);
>   atomic_set(&alg->decrypt_cnt, 0);
>   atomic64_set(&alg->encrypt_tlen, 0);
> @@ -265,6 +266,7 @@ static struct crypto_larval *__crypto_register_alg(struct crypto_alg *alg)
>   atomic_set(&alg->verify_cnt, 0);
>   atomic_set(&alg->cipher_err_cnt, 0);
>   atomic_set(&alg->sign_cnt, 0);
> +#endif
>  
If you created a helper function in crypto_user_stat.c to initialize all the
stats, you could ifdef it in the cryptouser.h header if unconfigured, and avoid
the ifdef leakage to this file above.

Neil
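A sketch of the helper Neil describes (the name crypto_stats_init is
illustrative, not an existing symbol): the stub variant compiles away when
CRYPTO_STATS is off, keeping algapi.c free of #ifdefs:

/* include/crypto/internal/cryptouser.h (sketch) */
struct crypto_alg;

#ifdef CONFIG_CRYPTO_STATS
void crypto_stats_init(struct crypto_alg *alg); /* in crypto_user_stat.c */
#else
static inline void crypto_stats_init(struct crypto_alg *alg) { }
#endif

__crypto_register_alg() would then call crypto_stats_init(alg)
unconditionally.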

>  out:
>   return larval;
> diff --git a/include/crypto/internal/cryptouser.h b/include/crypto/internal/cryptouser.h
> index 8db299c25566..3492ab42eefb 100644
> --- a/include/crypto/internal/cryptouser.h
> +++ b/include/crypto/internal/cryptouser.h
> @@ -3,6 +3,23 @@
>  
>  struct crypto_alg *crypto_alg_match(struct crypto_user_alg *p, int exact);
>  
> +#ifdef CONFIG_CRYPTO_STATS
>  int crypto_dump_reportstat(struct sk_buff *skb, struct netlink_callback *cb);
>  int crypto_reportstat(struct sk_buff *in_skb, struct nlmsghdr *in_nlh, struct nlattr **attrs);
>  int crypto_dump_reportstat_done(struct netlink_callback *cb);
> +#else
> +static inline int crypto_dump_reportstat(struct sk_buff *skb, struct netlink_callback *cb)
> +{
> + return -ENOTSUPP;
> +}
> +
> +static inline int crypto_reportstat(struct sk_buff *in_skb, struct nlmsghdr *in_nlh, struct nlattr **attrs)
> +{
> + return -ENOTSUPP;
> +}
> +
> +static inline int crypto_dump_reportstat_done(struct netlink_callback *cb)
> +{
> + return -ENOTSUPP;
> +}
> +#endif
> diff --git a/include/linux/crypto.h b/include/linux/crypto.h
> index 3634ad6fe202..3e05053b8d57 100644
> --- a/include/linux/crypto.h
> +++ b/include/linux/crypto.h
> @@ -515,6 +515,7 @@ struct crypto_alg {
>   
>   struct module *cra_module;
>  
> +#ifdef CONFIG_CRYPTO_STATS
>   union {
>   atomic_t encrypt_cnt;
>   atomic_t compress_cnt;
> @@ -552,6 +553,7 @@ struct crypto_alg {
>   atomic_t compute_shared_secret_cnt;
>   };
>   atomic_t sign_cnt;
> +#endif /* CONFIG_CRYPTO_STATS */
>  
>  } CRYPTO_MINALIGN_ATTR;
>  
> -- 
> 2.18.1
> 
> 


[PATCH v3 03/10] crypto: crypto_user_stat: convert all stats from u32 to u64

2018-11-20 Thread Corentin Labbe
All the 32-bit counter fields need to be 64-bit: on fast hardware,
UINT32_MAX crypto operations can be performed in a matter of seconds,
overflowing a 32-bit counter.

Reported-by: Eric Biggers 
Signed-off-by: Corentin Labbe 
---
 crypto/algapi.c |  10 +--
 crypto/crypto_user_stat.c   | 114 +++-
 include/crypto/acompress.h  |   8 +--
 include/crypto/aead.h   |   8 +--
 include/crypto/akcipher.h   |  16 ++---
 include/crypto/hash.h   |   6 +-
 include/crypto/kpp.h|  12 ++--
 include/crypto/rng.h|   8 +--
 include/crypto/skcipher.h   |   8 +--
 include/linux/crypto.h  |  46 ++---
 include/uapi/linux/cryptouser.h |  38 +--
 11 files changed, 133 insertions(+), 141 deletions(-)

diff --git a/crypto/algapi.c b/crypto/algapi.c
index f5396c88e8cd..42fe316f80ee 100644
--- a/crypto/algapi.c
+++ b/crypto/algapi.c
@@ -259,13 +259,13 @@ static struct crypto_larval *__crypto_register_alg(struct crypto_alg *alg)
list_add(&larval->alg.cra_list, &crypto_alg_list);
 
 #ifdef CONFIG_CRYPTO_STATS
-   atomic_set(&alg->encrypt_cnt, 0);
-   atomic_set(&alg->decrypt_cnt, 0);
+   atomic64_set(&alg->encrypt_cnt, 0);
+   atomic64_set(&alg->decrypt_cnt, 0);
atomic64_set(&alg->encrypt_tlen, 0);
atomic64_set(&alg->decrypt_tlen, 0);
-   atomic_set(&alg->verify_cnt, 0);
-   atomic_set(&alg->cipher_err_cnt, 0);
-   atomic_set(&alg->sign_cnt, 0);
+   atomic64_set(&alg->verify_cnt, 0);
+   atomic64_set(&alg->cipher_err_cnt, 0);
+   atomic64_set(&alg->sign_cnt, 0);
 #endif
 
 out:
diff --git a/crypto/crypto_user_stat.c b/crypto/crypto_user_stat.c
index a6fb2e6f618d..352569f378a0 100644
--- a/crypto/crypto_user_stat.c
+++ b/crypto/crypto_user_stat.c
@@ -35,22 +35,21 @@ static int crypto_report_aead(struct sk_buff *skb, struct crypto_alg *alg)
 {
struct crypto_stat raead;
u64 v64;
-   u32 v32;
 
memset(&raead, 0, sizeof(raead));
 
strscpy(raead.type, "aead", sizeof(raead.type));
 
-   v32 = atomic_read(&alg->encrypt_cnt);
-   raead.stat_encrypt_cnt = v32;
+   v64 = atomic64_read(&alg->encrypt_cnt);
+   raead.stat_encrypt_cnt = v64;
v64 = atomic64_read(&alg->encrypt_tlen);
raead.stat_encrypt_tlen = v64;
-   v32 = atomic_read(&alg->decrypt_cnt);
-   raead.stat_decrypt_cnt = v32;
+   v64 = atomic64_read(&alg->decrypt_cnt);
+   raead.stat_decrypt_cnt = v64;
v64 = atomic64_read(&alg->decrypt_tlen);
raead.stat_decrypt_tlen = v64;
-   v32 = atomic_read(&alg->aead_err_cnt);
-   raead.stat_aead_err_cnt = v32;
+   v64 = atomic64_read(&alg->aead_err_cnt);
+   raead.stat_aead_err_cnt = v64;
 
return nla_put(skb, CRYPTOCFGA_STAT_AEAD, sizeof(raead), &raead);
 }
@@ -59,22 +58,21 @@ static int crypto_report_cipher(struct sk_buff *skb, struct crypto_alg *alg)
 {
struct crypto_stat rcipher;
u64 v64;
-   u32 v32;
 
memset(&rcipher, 0, sizeof(rcipher));
 
strscpy(rcipher.type, "cipher", sizeof(rcipher.type));
 
-   v32 = atomic_read(&alg->encrypt_cnt);
-   rcipher.stat_encrypt_cnt = v32;
+   v64 = atomic64_read(&alg->encrypt_cnt);
+   rcipher.stat_encrypt_cnt = v64;
v64 = atomic64_read(&alg->encrypt_tlen);
rcipher.stat_encrypt_tlen = v64;
-   v32 = atomic_read(&alg->decrypt_cnt);
-   rcipher.stat_decrypt_cnt = v32;
+   v64 = atomic64_read(&alg->decrypt_cnt);
+   rcipher.stat_decrypt_cnt = v64;
v64 = atomic64_read(&alg->decrypt_tlen);
rcipher.stat_decrypt_tlen = v64;
-   v32 = atomic_read(&alg->cipher_err_cnt);
-   rcipher.stat_cipher_err_cnt = v32;
+   v64 = atomic64_read(&alg->cipher_err_cnt);
+   rcipher.stat_cipher_err_cnt = v64;
 
return nla_put(skb, CRYPTOCFGA_STAT_CIPHER, sizeof(rcipher), &rcipher);
 }
@@ -83,21 +81,20 @@ static int crypto_report_comp(struct sk_buff *skb, struct crypto_alg *alg)
 {
struct crypto_stat rcomp;
u64 v64;
-   u32 v32;
 
memset(&rcomp, 0, sizeof(rcomp));
 
strscpy(rcomp.type, "compression", sizeof(rcomp.type));
-   v32 = atomic_read(&alg->compress_cnt);
-   rcomp.stat_compress_cnt = v32;
+   v64 = atomic64_read(&alg->compress_cnt);
+   rcomp.stat_compress_cnt = v64;
v64 = atomic64_read(&alg->compress_tlen);
rcomp.stat_compress_tlen = v64;
-   v32 = atomic_read(&alg->decompress_cnt);
-   rcomp.stat_decompress_cnt = v32;
+   v64 = atomic64_read(&alg->decompress_cnt);
+   rcomp.stat_decompress_cnt = v64;
v64 = atomic64_read(&alg->decompress_tlen);
rcomp.stat_decompress_tlen = v64;
-   v32 = atomic_read(&alg->cipher_err_cnt);
-   rcomp.stat_compress_err_cnt = v32;
+   v64 = atomic64_read(&alg->cipher_err_cnt);
+   rcomp.stat_compress_err_cnt = v64;
 
return nla_put(skb, CRYPTOCFGA_STAT_COMPRESS, sizeof(rcomp), &rcomp);
 }
@@ -106,21 +1

[PATCH v3 02/10] crypto: CRYPTO_STATS should depend on CRYPTO_USER

2018-11-20 Thread Corentin Labbe
CRYPTO_STATS uses CRYPTO_USER infrastructure, so it should depend on it.

Signed-off-by: Corentin Labbe 
---
 crypto/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 62dbd1a99fa3..a2f1b4a86b92 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1829,6 +1829,7 @@ config CRYPTO_USER_API_AEAD
 
 config CRYPTO_STATS
bool "Crypto usage statistics for User-space"
+   depends on CRYPTO_USER
help
  This option enables the gathering of crypto stats.
  This will collect:
-- 
2.18.1



[PATCH v3 01/10] crypto: crypto_user_stat: made crypto_user_stat optional

2018-11-20 Thread Corentin Labbe
Even if CRYPTO_STATS is set to n, some parts of CRYPTO_STATS are still
compiled. This patch makes all parts of crypto_user_stat uncompiled in
that case.

Signed-off-by: Corentin Labbe 
---
 crypto/Makefile  |  3 ++-
 crypto/algapi.c  |  2 ++
 include/crypto/internal/cryptouser.h | 17 +
 include/linux/crypto.h   |  2 ++
 4 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/crypto/Makefile b/crypto/Makefile
index abbd86fdbad2..22f9f84f961d 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -54,7 +54,8 @@ cryptomgr-y := algboss.o testmgr.o
 
 obj-$(CONFIG_CRYPTO_MANAGER2) += cryptomgr.o
 obj-$(CONFIG_CRYPTO_USER) += crypto_user.o
-crypto_user-y := crypto_user_base.o crypto_user_stat.o
+crypto_user-y := crypto_user_base.o
+crypto_user-$(CONFIG_CRYPTO_STATS) += crypto_user_stat.o
 obj-$(CONFIG_CRYPTO_CMAC) += cmac.o
 obj-$(CONFIG_CRYPTO_HMAC) += hmac.o
 obj-$(CONFIG_CRYPTO_VMAC) += vmac.o
diff --git a/crypto/algapi.c b/crypto/algapi.c
index 2545c5f89c4c..f5396c88e8cd 100644
--- a/crypto/algapi.c
+++ b/crypto/algapi.c
@@ -258,6 +258,7 @@ static struct crypto_larval *__crypto_register_alg(struct crypto_alg *alg)
list_add(&alg->cra_list, &crypto_alg_list);
list_add(&larval->alg.cra_list, &crypto_alg_list);
 
+#ifdef CONFIG_CRYPTO_STATS
atomic_set(&alg->encrypt_cnt, 0);
atomic_set(&alg->decrypt_cnt, 0);
atomic64_set(&alg->encrypt_tlen, 0);
@@ -265,6 +266,7 @@ static struct crypto_larval *__crypto_register_alg(struct crypto_alg *alg)
atomic_set(&alg->verify_cnt, 0);
atomic_set(&alg->cipher_err_cnt, 0);
atomic_set(&alg->sign_cnt, 0);
+#endif
 
 out:
return larval;
diff --git a/include/crypto/internal/cryptouser.h b/include/crypto/internal/cryptouser.h
index 8db299c25566..3492ab42eefb 100644
--- a/include/crypto/internal/cryptouser.h
+++ b/include/crypto/internal/cryptouser.h
@@ -3,6 +3,23 @@
 
 struct crypto_alg *crypto_alg_match(struct crypto_user_alg *p, int exact);
 
+#ifdef CONFIG_CRYPTO_STATS
 int crypto_dump_reportstat(struct sk_buff *skb, struct netlink_callback *cb);
 int crypto_reportstat(struct sk_buff *in_skb, struct nlmsghdr *in_nlh, struct nlattr **attrs);
 int crypto_dump_reportstat_done(struct netlink_callback *cb);
+#else
+static inline int crypto_dump_reportstat(struct sk_buff *skb, struct netlink_callback *cb)
+{
+   return -ENOTSUPP;
+}
+
+static inline int crypto_reportstat(struct sk_buff *in_skb, struct nlmsghdr *in_nlh, struct nlattr **attrs)
+{
+   return -ENOTSUPP;
+}
+
+static inline int crypto_dump_reportstat_done(struct netlink_callback *cb)
+{
+   return -ENOTSUPP;
+}
+#endif
diff --git a/include/linux/crypto.h b/include/linux/crypto.h
index 3634ad6fe202..3e05053b8d57 100644
--- a/include/linux/crypto.h
+++ b/include/linux/crypto.h
@@ -515,6 +515,7 @@ struct crypto_alg {

struct module *cra_module;
 
+#ifdef CONFIG_CRYPTO_STATS
union {
atomic_t encrypt_cnt;
atomic_t compress_cnt;
@@ -552,6 +553,7 @@ struct crypto_alg {
atomic_t compute_shared_secret_cnt;
};
atomic_t sign_cnt;
+#endif /* CONFIG_CRYPTO_STATS */
 
 } CRYPTO_MINALIGN_ATTR;
 
-- 
2.18.1



[PATCH v3 04/10] crypto: crypto_user_stat: split user space crypto stat structures

2018-11-20 Thread Corentin Labbe
It is cleaner to have each stat type in its own structure.

Signed-off-by: Corentin Labbe 
---
 crypto/crypto_user_stat.c   |  20 +++
 include/uapi/linux/cryptouser.h | 100 
 2 files changed, 72 insertions(+), 48 deletions(-)

diff --git a/crypto/crypto_user_stat.c b/crypto/crypto_user_stat.c
index 352569f378a0..3c14be2f7a1b 100644
--- a/crypto/crypto_user_stat.c
+++ b/crypto/crypto_user_stat.c
@@ -33,7 +33,7 @@ struct crypto_dump_info {
 
 static int crypto_report_aead(struct sk_buff *skb, struct crypto_alg *alg)
 {
-   struct crypto_stat raead;
+   struct crypto_stat_aead raead;
u64 v64;
 
memset(&raead, 0, sizeof(raead));
@@ -56,7 +56,7 @@ static int crypto_report_aead(struct sk_buff *skb, struct crypto_alg *alg)
 
 static int crypto_report_cipher(struct sk_buff *skb, struct crypto_alg *alg)
 {
-   struct crypto_stat rcipher;
+   struct crypto_stat_cipher rcipher;
u64 v64;
 
memset(&rcipher, 0, sizeof(rcipher));
@@ -79,7 +79,7 @@ static int crypto_report_cipher(struct sk_buff *skb, struct crypto_alg *alg)
 
 static int crypto_report_comp(struct sk_buff *skb, struct crypto_alg *alg)
 {
-   struct crypto_stat rcomp;
+   struct crypto_stat_compress rcomp;
u64 v64;
 
memset(&rcomp, 0, sizeof(rcomp));
@@ -101,7 +101,7 @@ static int crypto_report_comp(struct sk_buff *skb, struct crypto_alg *alg)
 
 static int crypto_report_acomp(struct sk_buff *skb, struct crypto_alg *alg)
 {
-   struct crypto_stat racomp;
+   struct crypto_stat_compress racomp;
u64 v64;
 
memset(&racomp, 0, sizeof(racomp));
@@ -123,7 +123,7 @@ static int crypto_report_acomp(struct sk_buff *skb, struct crypto_alg *alg)
 
 static int crypto_report_akcipher(struct sk_buff *skb, struct crypto_alg *alg)
 {
-   struct crypto_stat rakcipher;
+   struct crypto_stat_akcipher rakcipher;
u64 v64;
 
memset(&rakcipher, 0, sizeof(rakcipher));
@@ -150,7 +150,7 @@ static int crypto_report_akcipher(struct sk_buff *skb, struct crypto_alg *alg)
 
 static int crypto_report_kpp(struct sk_buff *skb, struct crypto_alg *alg)
 {
-   struct crypto_stat rkpp;
+   struct crypto_stat_kpp rkpp;
u64 v;
 
memset(&rkpp, 0, sizeof(rkpp));
@@ -171,7 +171,7 @@ static int crypto_report_kpp(struct sk_buff *skb, struct crypto_alg *alg)
 
 static int crypto_report_ahash(struct sk_buff *skb, struct crypto_alg *alg)
 {
-   struct crypto_stat rhash;
+   struct crypto_stat_hash rhash;
u64 v64;
 
memset(&rhash, 0, sizeof(rhash));
@@ -190,7 +190,7 @@ static int crypto_report_ahash(struct sk_buff *skb, struct crypto_alg *alg)
 
 static int crypto_report_shash(struct sk_buff *skb, struct crypto_alg *alg)
 {
-   struct crypto_stat rhash;
+   struct crypto_stat_hash rhash;
u64 v64;
 
memset(&rhash, 0, sizeof(rhash));
@@ -209,7 +209,7 @@ static int crypto_report_shash(struct sk_buff *skb, struct crypto_alg *alg)
 
 static int crypto_report_rng(struct sk_buff *skb, struct crypto_alg *alg)
 {
-   struct crypto_stat rrng;
+   struct crypto_stat_rng rrng;
u64 v64;
 
memset(&rrng, 0, sizeof(rrng));
@@ -248,7 +248,7 @@ static int crypto_reportstat_one(struct crypto_alg *alg,
if (nla_put_u32(skb, CRYPTOCFGA_PRIORITY_VAL, alg->cra_priority))
goto nla_put_failure;
if (alg->cra_flags & CRYPTO_ALG_LARVAL) {
-   struct crypto_stat rl;
+   struct crypto_stat_larval rl;
 
memset(&rl, 0, sizeof(rl));
strscpy(rl.type, "larval", sizeof(rl.type));
diff --git a/include/uapi/linux/cryptouser.h b/include/uapi/linux/cryptouser.h
index 9f8187077ce4..3a70f025e27d 100644
--- a/include/uapi/linux/cryptouser.h
+++ b/include/uapi/linux/cryptouser.h
@@ -76,45 +76,69 @@ struct crypto_user_alg {
__u32 cru_flags;
 };
 
-struct crypto_stat {
-   char type[CRYPTO_MAX_NAME];
-   union {
-   __u64 stat_encrypt_cnt;
-   __u64 stat_compress_cnt;
-   __u64 stat_generate_cnt;
-   __u64 stat_hash_cnt;
-   __u64 stat_setsecret_cnt;
-   };
-   union {
-   __u64 stat_encrypt_tlen;
-   __u64 stat_compress_tlen;
-   __u64 stat_generate_tlen;
-   __u64 stat_hash_tlen;
-   };
-   union {
-   __u64 stat_akcipher_err_cnt;
-   __u64 stat_cipher_err_cnt;
-   __u64 stat_compress_err_cnt;
-   __u64 stat_aead_err_cnt;
-   __u64 stat_hash_err_cnt;
-   __u64 stat_rng_err_cnt;
-   __u64 stat_kpp_err_cnt;
-   };
-   union {
-   __u64 stat_decrypt_cnt;
-   __u64 stat_decompress_cnt;
-   __u64 stat_seed_cnt;
-   __u64 stat_generate_public_key_cnt;
-   };
-   union {
-   __u64 stat_decrypt_tlen;
-   __u
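
For reference, the new per-algorithm structures on the '+' side take the
following shape -- a sketch of the AEAD one, reconstructed from the field
names used elsewhere in this series, not the verbatim hunk:

	struct crypto_stat_aead {
		char type[CRYPTO_MAX_NAME];
		__u64 stat_encrypt_cnt;
		__u64 stat_encrypt_tlen;
		__u64 stat_decrypt_cnt;
		__u64 stat_decrypt_tlen;
		__u64 stat_aead_err_cnt;
	};

The cipher, compress, hash, rng, kpp and akcipher variants follow the
same pattern, each carrying only the counters relevant to its class.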

[PATCH v3 08/10] crypto: crypto_user_stat: remove intermediate variable

2018-11-20 Thread Corentin Labbe
The v64 intermediate variable is useless; removing it makes the code
much more readable.

Signed-off-by: Corentin Labbe 
---
 crypto/crypto_user_stat.c | 132 --
 1 file changed, 41 insertions(+), 91 deletions(-)

diff --git a/crypto/crypto_user_stat.c b/crypto/crypto_user_stat.c
index 838123758423..7b668c659122 100644
--- a/crypto/crypto_user_stat.c
+++ b/crypto/crypto_user_stat.c
@@ -34,22 +34,16 @@ struct crypto_dump_info {
 static int crypto_report_aead(struct sk_buff *skb, struct crypto_alg *alg)
 {
struct crypto_stat_aead raead;
-   u64 v64;
 
memset(&raead, 0, sizeof(raead));
 
strscpy(raead.type, "aead", sizeof(raead.type));
 
-   v64 = atomic64_read(&alg->encrypt_cnt);
-   raead.stat_encrypt_cnt = v64;
-   v64 = atomic64_read(&alg->encrypt_tlen);
-   raead.stat_encrypt_tlen = v64;
-   v64 = atomic64_read(&alg->decrypt_cnt);
-   raead.stat_decrypt_cnt = v64;
-   v64 = atomic64_read(&alg->decrypt_tlen);
-   raead.stat_decrypt_tlen = v64;
-   v64 = atomic64_read(&alg->aead_err_cnt);
-   raead.stat_aead_err_cnt = v64;
+   raead.stat_encrypt_cnt = atomic64_read(&alg->encrypt_cnt);
+   raead.stat_encrypt_tlen = atomic64_read(&alg->encrypt_tlen);
+   raead.stat_decrypt_cnt = atomic64_read(&alg->decrypt_cnt);
+   raead.stat_decrypt_tlen = atomic64_read(&alg->decrypt_tlen);
+   raead.stat_aead_err_cnt = atomic64_read(&alg->aead_err_cnt);
 
return nla_put(skb, CRYPTOCFGA_STAT_AEAD, sizeof(raead), &raead);
 }
@@ -57,22 +51,16 @@ static int crypto_report_aead(struct sk_buff *skb, struct 
crypto_alg *alg)
 static int crypto_report_cipher(struct sk_buff *skb, struct crypto_alg *alg)
 {
struct crypto_stat_cipher rcipher;
-   u64 v64;
 
memset(&rcipher, 0, sizeof(rcipher));
 
strscpy(rcipher.type, "cipher", sizeof(rcipher.type));
 
-   v64 = atomic64_read(&alg->encrypt_cnt);
-   rcipher.stat_encrypt_cnt = v64;
-   v64 = atomic64_read(&alg->encrypt_tlen);
-   rcipher.stat_encrypt_tlen = v64;
-   v64 = atomic64_read(&alg->decrypt_cnt);
-   rcipher.stat_decrypt_cnt = v64;
-   v64 = atomic64_read(&alg->decrypt_tlen);
-   rcipher.stat_decrypt_tlen = v64;
-   v64 = atomic64_read(&alg->cipher_err_cnt);
-   rcipher.stat_cipher_err_cnt = v64;
+   rcipher.stat_encrypt_cnt = atomic64_read(&alg->encrypt_cnt);
+   rcipher.stat_encrypt_tlen = atomic64_read(&alg->encrypt_tlen);
+   rcipher.stat_decrypt_cnt = atomic64_read(&alg->decrypt_cnt);
+   rcipher.stat_decrypt_tlen = atomic64_read(&alg->decrypt_tlen);
+   rcipher.stat_cipher_err_cnt = atomic64_read(&alg->cipher_err_cnt);
 
return nla_put(skb, CRYPTOCFGA_STAT_CIPHER, sizeof(rcipher), &rcipher);
 }
@@ -80,21 +68,15 @@ static int crypto_report_cipher(struct sk_buff *skb, struct 
crypto_alg *alg)
 static int crypto_report_comp(struct sk_buff *skb, struct crypto_alg *alg)
 {
struct crypto_stat_compress rcomp;
-   u64 v64;
 
memset(&rcomp, 0, sizeof(rcomp));
 
strscpy(rcomp.type, "compression", sizeof(rcomp.type));
-   v64 = atomic64_read(&alg->compress_cnt);
-   rcomp.stat_compress_cnt = v64;
-   v64 = atomic64_read(&alg->compress_tlen);
-   rcomp.stat_compress_tlen = v64;
-   v64 = atomic64_read(&alg->decompress_cnt);
-   rcomp.stat_decompress_cnt = v64;
-   v64 = atomic64_read(&alg->decompress_tlen);
-   rcomp.stat_decompress_tlen = v64;
-   v64 = atomic64_read(&alg->compress_err_cnt);
-   rcomp.stat_compress_err_cnt = v64;
+   rcomp.stat_compress_cnt = atomic64_read(&alg->compress_cnt);
+   rcomp.stat_compress_tlen = atomic64_read(&alg->compress_tlen);
+   rcomp.stat_decompress_cnt = atomic64_read(&alg->decompress_cnt);
+   rcomp.stat_decompress_tlen = atomic64_read(&alg->decompress_tlen);
+   rcomp.stat_compress_err_cnt = atomic64_read(&alg->compress_err_cnt);
 
return nla_put(skb, CRYPTOCFGA_STAT_COMPRESS, sizeof(rcomp), &rcomp);
 }
@@ -102,21 +84,15 @@ static int crypto_report_comp(struct sk_buff *skb, struct 
crypto_alg *alg)
 static int crypto_report_acomp(struct sk_buff *skb, struct crypto_alg *alg)
 {
struct crypto_stat_compress racomp;
-   u64 v64;
 
memset(&racomp, 0, sizeof(racomp));
 
strscpy(racomp.type, "acomp", sizeof(racomp.type));
-   v64 = atomic64_read(&alg->compress_cnt);
-   racomp.stat_compress_cnt = v64;
-   v64 = atomic64_read(&alg->compress_tlen);
-   racomp.stat_compress_tlen = v64;
-   v64 = atomic64_read(&alg->decompress_cnt);
-   racomp.stat_decompress_cnt = v64;
-   v64 = atomic64_read(&alg->decompress_tlen);
-   racomp.stat_decompress_tlen = v64;
-   v64 = atomic64_read(&alg->compress_err_cnt);
-   racomp.stat_compress_err_cnt = v64;
+   racomp.stat_compress_cnt = atomic64_read(&alg->compress_cnt);
+   racomp.stat_compress_tlen = atomic64_rea

[PATCH v3 10/10] crypto: crypto_user_stat: rename err_cnt parameter

2018-11-20 Thread Corentin Labbe
Now that all crypto stats live in their own structures, it is useless
to keep the algorithm name in the err_cnt member.

Signed-off-by: Corentin Labbe 
---
 crypto/algapi.c | 38 -
 crypto/crypto_user_stat.c   | 18 
 include/linux/crypto.h  | 28 
 include/uapi/linux/cryptouser.h | 14 ++--
 tools/crypto/getstat.c  | 18 
 5 files changed, 58 insertions(+), 58 deletions(-)

diff --git a/crypto/algapi.c b/crypto/algapi.c
index 7ef812c4872f..ec3866d6ced0 100644
--- a/crypto/algapi.c
+++ b/crypto/algapi.c
@@ -1082,7 +1082,7 @@ void crypto_stats_ablkcipher_encrypt(unsigned int nbytes, 
int ret,
  struct crypto_alg *alg)
 {
if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
-   atomic64_inc(&alg->stats.cipher.cipher_err_cnt);
+   atomic64_inc(&alg->stats.cipher.err_cnt);
} else {
atomic64_inc(&alg->stats.cipher.encrypt_cnt);
atomic64_add(nbytes, &alg->stats.cipher.encrypt_tlen);
@@ -1094,7 +1094,7 @@ void crypto_stats_ablkcipher_decrypt(unsigned int nbytes, 
int ret,
  struct crypto_alg *alg)
 {
if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
-   atomic64_inc(&alg->stats.cipher.cipher_err_cnt);
+   atomic64_inc(&alg->stats.cipher.err_cnt);
} else {
atomic64_inc(&alg->stats.cipher.decrypt_cnt);
atomic64_add(nbytes, &alg->stats.cipher.decrypt_tlen);
@@ -1106,7 +1106,7 @@ void crypto_stats_aead_encrypt(unsigned int cryptlen,
struct crypto_alg *alg, int ret)
 {
if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
-   atomic64_inc(&alg->stats.aead.aead_err_cnt);
+   atomic64_inc(&alg->stats.aead.err_cnt);
} else {
atomic64_inc(&alg->stats.aead.encrypt_cnt);
atomic64_add(cryptlen, &alg->stats.aead.encrypt_tlen);
@@ -1118,7 +1118,7 @@ void crypto_stats_aead_decrypt(unsigned int cryptlen,
struct crypto_alg *alg, int ret)
 {
if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
-   atomic64_inc(&alg->stats.aead.aead_err_cnt);
+   atomic64_inc(&alg->stats.aead.err_cnt);
} else {
atomic64_inc(&alg->stats.aead.decrypt_cnt);
atomic64_add(cryptlen, &alg->stats.aead.decrypt_tlen);
@@ -1130,7 +1130,7 @@ void crypto_stats_akcipher_encrypt(unsigned int src_len, 
int ret,
struct crypto_alg *alg)
 {
if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
-   atomic64_inc(&alg->stats.akcipher.akcipher_err_cnt);
+   atomic64_inc(&alg->stats.akcipher.err_cnt);
} else {
atomic64_inc(&alg->stats.akcipher.encrypt_cnt);
atomic64_add(src_len, &alg->stats.akcipher.encrypt_tlen);
@@ -1142,7 +1142,7 @@ void crypto_stats_akcipher_decrypt(unsigned int src_len, 
int ret,
struct crypto_alg *alg)
 {
if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
-   atomic64_inc(&alg->stats.akcipher.akcipher_err_cnt);
+   atomic64_inc(&alg->stats.akcipher.err_cnt);
} else {
atomic64_inc(&alg->stats.akcipher.decrypt_cnt);
atomic64_add(src_len, &alg->stats.akcipher.decrypt_tlen);
@@ -1153,7 +1153,7 @@ void crypto_stats_akcipher_decrypt(unsigned int src_len, 
int ret,
 void crypto_stats_akcipher_sign(int ret, struct crypto_alg *alg)
 {
if (ret && ret != -EINPROGRESS && ret != -EBUSY)
-   atomic64_inc(&alg->stats.akcipher.akcipher_err_cnt);
+   atomic64_inc(&alg->stats.akcipher.err_cnt);
else
atomic64_inc(&alg->stats.akcipher.sign_cnt);
crypto_alg_put(alg);
@@ -1162,7 +1162,7 @@ void crypto_stats_akcipher_sign(int ret, struct 
crypto_alg *alg)
 void crypto_stats_akcipher_verify(int ret, struct crypto_alg *alg)
 {
if (ret && ret != -EINPROGRESS && ret != -EBUSY)
-   atomic64_inc(&alg->stats.akcipher.akcipher_err_cnt);
+   atomic64_inc(&alg->stats.akcipher.err_cnt);
else
atomic64_inc(&alg->stats.akcipher.verify_cnt);
crypto_alg_put(alg);
@@ -1172,7 +1172,7 @@ void crypto_stats_compress(unsigned int slen, int ret,
struct crypto_alg *alg)
 {
if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
-   atomic64_inc(&alg->stats.compress.compress_err_cnt);
+   atomic64_inc(&alg->stats.compress.err_cnt);
} else {
atomic64_inc(&alg->stats.compress.compress_cnt);
atomic64_add(slen, &alg->stats.compress.compress_tlen);
@@ -

[PATCH v3 06/10] crypto: crypto_user_stat: fix use_after_free of struct xxx_request

2018-11-20 Thread Corentin Labbe
All crypto_stats functions use the struct xxx_request for feeding stats,
but in some cases this structure could already have been freed.

To fix this, the needed parameters (len and alg) are stored before the
request is executed.
Fixes: cac5818c25d0 ("crypto: user - Implement a generic crypto statistics")
Reported-by: syzbot 

Signed-off-by: Corentin Labbe 
---
 crypto/ahash.c |  17 ++-
 crypto/algapi.c| 283 +
 crypto/rng.c   |   4 +-
 include/crypto/acompress.h |  38 ++---
 include/crypto/aead.h  |  38 ++---
 include/crypto/akcipher.h  |  74 ++
 include/crypto/hash.h  |  32 +
 include/crypto/kpp.h   |  48 ++-
 include/crypto/rng.h   |  27 +---
 include/crypto/skcipher.h  |  36 ++---
 include/linux/crypto.h |  63 -
 11 files changed, 384 insertions(+), 276 deletions(-)
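
The core pattern of the fix, condensed (a sketch, not a hunk from the
patch): everything the stats code needs is snapshotted, and a reference
on the alg is taken, while the request is still known to be alive.

	/* Before: req may already be freed by the completion handler. */
	ret = crypto_ahash_op(req, tfm->final);
	crypto_stat_ahash_final(req, ret);          /* potential use-after-free */

	/* After: snapshot len/alg first, pin alg across the operation. */
	unsigned int nbytes = req->nbytes;
	struct crypto_alg *alg = crypto_ahash_reqtfm(req)->base.__crt_alg;

	crypto_stats_get(alg);                      /* takes a ref on alg */
	ret = crypto_ahash_op(req, tfm->final);
	crypto_stats_ahash_final(nbytes, ret, alg); /* reads locals, puts alg */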

diff --git a/crypto/ahash.c b/crypto/ahash.c
index 3a348fbcf8f9..5d320a811f75 100644
--- a/crypto/ahash.c
+++ b/crypto/ahash.c
@@ -364,20 +364,28 @@ static int crypto_ahash_op(struct ahash_request *req,
 
 int crypto_ahash_final(struct ahash_request *req)
 {
+   struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+   struct crypto_alg *alg = tfm->base.__crt_alg;
+   unsigned int nbytes = req->nbytes;
int ret;
 
+   crypto_stats_get(alg);
ret = crypto_ahash_op(req, crypto_ahash_reqtfm(req)->final);
-   crypto_stat_ahash_final(req, ret);
+   crypto_stats_ahash_final(nbytes, ret, alg);
return ret;
 }
 EXPORT_SYMBOL_GPL(crypto_ahash_final);
 
 int crypto_ahash_finup(struct ahash_request *req)
 {
+   struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+   struct crypto_alg *alg = tfm->base.__crt_alg;
+   unsigned int nbytes = req->nbytes;
int ret;
 
+   crypto_stats_get(alg);
ret = crypto_ahash_op(req, crypto_ahash_reqtfm(req)->finup);
-   crypto_stat_ahash_final(req, ret);
+   crypto_stats_ahash_final(nbytes, ret, alg);
return ret;
 }
 EXPORT_SYMBOL_GPL(crypto_ahash_finup);
@@ -385,13 +393,16 @@ EXPORT_SYMBOL_GPL(crypto_ahash_finup);
 int crypto_ahash_digest(struct ahash_request *req)
 {
struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+   struct crypto_alg *alg = tfm->base.__crt_alg;
+   unsigned int nbytes = req->nbytes;
int ret;
 
+   crypto_stats_get(alg);
if (crypto_ahash_get_flags(tfm) & CRYPTO_TFM_NEED_KEY)
ret = -ENOKEY;
else
ret = crypto_ahash_op(req, tfm->digest);
-   crypto_stat_ahash_final(req, ret);
+   crypto_stats_ahash_final(nbytes, ret, alg);
return ret;
 }
 EXPORT_SYMBOL_GPL(crypto_ahash_digest);
diff --git a/crypto/algapi.c b/crypto/algapi.c
index 42fe316f80ee..6c98851f3a28 100644
--- a/crypto/algapi.c
+++ b/crypto/algapi.c
@@ -1078,6 +1078,289 @@ int crypto_type_has_alg(const char *name, const struct 
crypto_type *frontend,
 }
 EXPORT_SYMBOL_GPL(crypto_type_has_alg);
 
+#ifdef CONFIG_CRYPTO_STATS
+void crypto_stats_get(struct crypto_alg *alg)
+{
+   crypto_alg_get(alg);
+}
+
+void crypto_stats_ablkcipher_encrypt(unsigned int nbytes, int ret,
+ struct crypto_alg *alg)
+{
+   if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
+   atomic64_inc(&alg->cipher_err_cnt);
+   } else {
+   atomic64_inc(&alg->encrypt_cnt);
+   atomic64_add(nbytes, &alg->encrypt_tlen);
+   }
+   crypto_alg_put(alg);
+}
+
+void crypto_stats_ablkcipher_decrypt(unsigned int nbytes, int ret,
+ struct crypto_alg *alg)
+{
+   if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
+   atomic64_inc(&alg->cipher_err_cnt);
+   } else {
+   atomic64_inc(&alg->decrypt_cnt);
+   atomic64_add(nbytes, &alg->decrypt_tlen);
+   }
+   crypto_alg_put(alg);
+}
+
+void crypto_stats_aead_encrypt(unsigned int cryptlen,
+   struct crypto_alg *alg, int ret)
+{
+   if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
+   atomic64_inc(&alg->aead_err_cnt);
+   } else {
+   atomic64_inc(&alg->encrypt_cnt);
+   atomic64_add(cryptlen, &alg->encrypt_tlen);
+   }
+   crypto_alg_put(alg);
+}
+
+void crypto_stats_aead_decrypt(unsigned int cryptlen,
+   struct crypto_alg *alg, int ret)
+{
+   if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
+   atomic64_inc(&alg->aead_err_cnt);
+   } else {
+   atomic64_inc(&alg->decrypt_cnt);
+   atomic64_add(cryptlen, &alg->decrypt_tlen);
+   }
+   crypto_alg_put(alg);
+}
+
+void crypto_stats_akcipher_encrypt(unsigned int src_len, int ret,
+   struct crypto_alg *alg)
+{
+   if (ret && ret != -EINPROGRESS && ret != -EBUSY) 

[PATCH v3 07/10] crypto: crypto_user_stat: Fix invalid stat reporting

2018-11-20 Thread Corentin Labbe
Some error counts use the wrong member name when reading this data.
This had not caused any reporting problem, since all error counts are
shared in the same union.

Signed-off-by: Corentin Labbe 
---
 crypto/crypto_user_stat.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/crypto/crypto_user_stat.c b/crypto/crypto_user_stat.c
index 3c14be2f7a1b..838123758423 100644
--- a/crypto/crypto_user_stat.c
+++ b/crypto/crypto_user_stat.c
@@ -93,7 +93,7 @@ static int crypto_report_comp(struct sk_buff *skb, struct 
crypto_alg *alg)
rcomp.stat_decompress_cnt = v64;
v64 = atomic64_read(&alg->decompress_tlen);
rcomp.stat_decompress_tlen = v64;
-   v64 = atomic64_read(&alg->cipher_err_cnt);
+   v64 = atomic64_read(&alg->compress_err_cnt);
rcomp.stat_compress_err_cnt = v64;
 
return nla_put(skb, CRYPTOCFGA_STAT_COMPRESS, sizeof(rcomp), &rcomp);
@@ -115,7 +115,7 @@ static int crypto_report_acomp(struct sk_buff *skb, struct 
crypto_alg *alg)
racomp.stat_decompress_cnt = v64;
v64 = atomic64_read(&alg->decompress_tlen);
racomp.stat_decompress_tlen = v64;
-   v64 = atomic64_read(&alg->cipher_err_cnt);
+   v64 = atomic64_read(&alg->compress_err_cnt);
racomp.stat_compress_err_cnt = v64;
 
return nla_put(skb, CRYPTOCFGA_STAT_ACOMP, sizeof(racomp), &racomp);
@@ -222,7 +222,7 @@ static int crypto_report_rng(struct sk_buff *skb, struct 
crypto_alg *alg)
rrng.stat_generate_tlen = v64;
v64 = atomic64_read(&alg->seed_cnt);
rrng.stat_seed_cnt = v64;
-   v64 = atomic64_read(&alg->hash_err_cnt);
+   v64 = atomic64_read(&alg->rng_err_cnt);
rrng.stat_rng_err_cnt = v64;
 
return nla_put(skb, CRYPTOCFGA_STAT_RNG, sizeof(rrng), &rrng);
-- 
2.18.1



[PATCH v3 09/10] crypto: crypto_user_stat: Split stats in multiple structures

2018-11-20 Thread Corentin Labbe
Like for userspace, this patch splits stats into multiple structures,
one for each algorithm class.

Signed-off-by: Corentin Labbe 
---
 crypto/algapi.c   | 108 +++
 crypto/crypto_user_stat.c |  82 -
 include/linux/crypto.h| 180 +-
 3 files changed, 210 insertions(+), 160 deletions(-)
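
The include/linux/crypto.h side of the split is not quoted in full here;
judging from the accesses in the hunks below (alg->stats.cipher.encrypt_cnt,
alg->stats.aead.aead_err_cnt, ...), the flat atomic64_t fields move into
per-class structures grouped under a single stats member, roughly as
follows (struct tags and exact layout are illustrative):

	#ifdef CONFIG_CRYPTO_STATS
		union {
			struct {
				atomic64_t encrypt_cnt;
				atomic64_t encrypt_tlen;
				atomic64_t decrypt_cnt;
				atomic64_t decrypt_tlen;
				atomic64_t cipher_err_cnt;
			} cipher;
			struct {
				atomic64_t encrypt_cnt;
				atomic64_t encrypt_tlen;
				atomic64_t decrypt_cnt;
				atomic64_t decrypt_tlen;
				atomic64_t aead_err_cnt;
			} aead;
			/* ... akcipher, compress, hash, rng, kpp ... */
		} stats;
	#endif /* CONFIG_CRYPTO_STATS */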

diff --git a/crypto/algapi.c b/crypto/algapi.c
index 6c98851f3a28..7ef812c4872f 100644
--- a/crypto/algapi.c
+++ b/crypto/algapi.c
@@ -259,13 +259,7 @@ static struct crypto_larval *__crypto_register_alg(struct 
crypto_alg *alg)
list_add(&larval->alg.cra_list, &crypto_alg_list);
 
 #ifdef CONFIG_CRYPTO_STATS
-   atomic64_set(&alg->encrypt_cnt, 0);
-   atomic64_set(&alg->decrypt_cnt, 0);
-   atomic64_set(&alg->encrypt_tlen, 0);
-   atomic64_set(&alg->decrypt_tlen, 0);
-   atomic64_set(&alg->verify_cnt, 0);
-   atomic64_set(&alg->cipher_err_cnt, 0);
-   atomic64_set(&alg->sign_cnt, 0);
+   memset(&alg->stats, 0, sizeof(alg->stats));
 #endif
 
 out:
@@ -1088,10 +1082,10 @@ void crypto_stats_ablkcipher_encrypt(unsigned int 
nbytes, int ret,
  struct crypto_alg *alg)
 {
if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
-   atomic64_inc(&alg->cipher_err_cnt);
+   atomic64_inc(&alg->stats.cipher.cipher_err_cnt);
} else {
-   atomic64_inc(&alg->encrypt_cnt);
-   atomic64_add(nbytes, &alg->encrypt_tlen);
+   atomic64_inc(&alg->stats.cipher.encrypt_cnt);
+   atomic64_add(nbytes, &alg->stats.cipher.encrypt_tlen);
}
crypto_alg_put(alg);
 }
@@ -1100,10 +1094,10 @@ void crypto_stats_ablkcipher_decrypt(unsigned int 
nbytes, int ret,
  struct crypto_alg *alg)
 {
if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
-   atomic64_inc(&alg->cipher_err_cnt);
+   atomic64_inc(&alg->stats.cipher.cipher_err_cnt);
} else {
-   atomic64_inc(&alg->decrypt_cnt);
-   atomic64_add(nbytes, &alg->decrypt_tlen);
+   atomic64_inc(&alg->stats.cipher.decrypt_cnt);
+   atomic64_add(nbytes, &alg->stats.cipher.decrypt_tlen);
}
crypto_alg_put(alg);
 }
@@ -1112,10 +1106,10 @@ void crypto_stats_aead_encrypt(unsigned int cryptlen,
struct crypto_alg *alg, int ret)
 {
if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
-   atomic64_inc(&alg->aead_err_cnt);
+   atomic64_inc(&alg->stats.aead.aead_err_cnt);
} else {
-   atomic64_inc(&alg->encrypt_cnt);
-   atomic64_add(cryptlen, &alg->encrypt_tlen);
+   atomic64_inc(&alg->stats.aead.encrypt_cnt);
+   atomic64_add(cryptlen, &alg->stats.aead.encrypt_tlen);
}
crypto_alg_put(alg);
 }
@@ -1124,10 +1118,10 @@ void crypto_stats_aead_decrypt(unsigned int cryptlen,
struct crypto_alg *alg, int ret)
 {
if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
-   atomic64_inc(&alg->aead_err_cnt);
+   atomic64_inc(&alg->stats.aead.aead_err_cnt);
} else {
-   atomic64_inc(&alg->decrypt_cnt);
-   atomic64_add(cryptlen, &alg->decrypt_tlen);
+   atomic64_inc(&alg->stats.aead.decrypt_cnt);
+   atomic64_add(cryptlen, &alg->stats.aead.decrypt_tlen);
}
crypto_alg_put(alg);
 }
@@ -1136,10 +1130,10 @@ void crypto_stats_akcipher_encrypt(unsigned int 
src_len, int ret,
struct crypto_alg *alg)
 {
if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
-   atomic64_inc(&alg->akcipher_err_cnt);
+   atomic64_inc(&alg->stats.akcipher.akcipher_err_cnt);
} else {
-   atomic64_inc(&alg->encrypt_cnt);
-   atomic64_add(src_len, &alg->encrypt_tlen);
+   atomic64_inc(&alg->stats.akcipher.encrypt_cnt);
+   atomic64_add(src_len, &alg->stats.akcipher.encrypt_tlen);
}
crypto_alg_put(alg);
 }
@@ -1148,10 +1142,10 @@ void crypto_stats_akcipher_decrypt(unsigned int 
src_len, int ret,
struct crypto_alg *alg)
 {
if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
-   atomic64_inc(&alg->akcipher_err_cnt);
+   atomic64_inc(&alg->stats.akcipher.akcipher_err_cnt);
} else {
-   atomic64_inc(&alg->decrypt_cnt);
-   atomic64_add(src_len, &alg->decrypt_tlen);
+   atomic64_inc(&alg->stats.akcipher.decrypt_cnt);
+   atomic64_add(src_len, &alg->stats.akcipher.decrypt_tlen);
}
crypto_alg_put(alg);
 }
@@ -1159,18 +1153,18 @@ void crypto_stats_akcipher_decrypt(unsigned int 
src_len, int r

[PATCH v3 05/10] crypto: tool: getstat: convert user space example to the new crypto_user_stat uapi

2018-11-20 Thread Corentin Labbe
This patch converts the getstat example tool to the recent changes done
in crypto_user_stat:
- changed all stats to u64
- separated struct stats for each crypto alg

Signed-off-by: Corentin Labbe 
---
 tools/crypto/getstat.c | 54 +-
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/tools/crypto/getstat.c b/tools/crypto/getstat.c
index 24115173a483..57fbb94608d4 100644
--- a/tools/crypto/getstat.c
+++ b/tools/crypto/getstat.c
@@ -152,53 +152,53 @@ static int get_stat(const char *drivername)
 
if (tb[CRYPTOCFGA_STAT_HASH]) {
struct rtattr *rta = tb[CRYPTOCFGA_STAT_HASH];
-   struct crypto_stat *rhash =
-   (struct crypto_stat *)RTA_DATA(rta);
-   printf("%s\tHash\n\tHash: %u bytes: %llu\n\tErrors: %u\n",
+   struct crypto_stat_hash *rhash =
+   (struct crypto_stat_hash *)RTA_DATA(rta);
+   printf("%s\tHash\n\tHash: %llu bytes: %llu\n\tErrors: %llu\n",
drivername,
rhash->stat_hash_cnt, rhash->stat_hash_tlen,
rhash->stat_hash_err_cnt);
} else if (tb[CRYPTOCFGA_STAT_COMPRESS]) {
struct rtattr *rta = tb[CRYPTOCFGA_STAT_COMPRESS];
-   struct crypto_stat *rblk =
-   (struct crypto_stat *)RTA_DATA(rta);
-   printf("%s\tCompress\n\tCompress: %u bytes: %llu\n\tDecompress: 
%u bytes: %llu\n\tErrors: %u\n",
+   struct crypto_stat_compress *rblk =
+   (struct crypto_stat_compress *)RTA_DATA(rta);
+   printf("%s\tCompress\n\tCompress: %llu bytes: 
%llu\n\tDecompress: %llu bytes: %llu\n\tErrors: %llu\n",
drivername,
rblk->stat_compress_cnt, rblk->stat_compress_tlen,
rblk->stat_decompress_cnt, rblk->stat_decompress_tlen,
rblk->stat_compress_err_cnt);
} else if (tb[CRYPTOCFGA_STAT_ACOMP]) {
struct rtattr *rta = tb[CRYPTOCFGA_STAT_ACOMP];
-   struct crypto_stat *rcomp =
-   (struct crypto_stat *)RTA_DATA(rta);
-   printf("%s\tACompress\n\tCompress: %u bytes: 
%llu\n\tDecompress: %u bytes: %llu\n\tErrors: %u\n",
+   struct crypto_stat_compress *rcomp =
+   (struct crypto_stat_compress *)RTA_DATA(rta);
+   printf("%s\tACompress\n\tCompress: %llu bytes: 
%llu\n\tDecompress: %llu bytes: %llu\n\tErrors: %llu\n",
drivername,
rcomp->stat_compress_cnt, rcomp->stat_compress_tlen,
rcomp->stat_decompress_cnt, rcomp->stat_decompress_tlen,
rcomp->stat_compress_err_cnt);
} else if (tb[CRYPTOCFGA_STAT_AEAD]) {
struct rtattr *rta = tb[CRYPTOCFGA_STAT_AEAD];
-   struct crypto_stat *raead =
-   (struct crypto_stat *)RTA_DATA(rta);
-   printf("%s\tAEAD\n\tEncrypt: %u bytes: %llu\n\tDecrypt: %u 
bytes: %llu\n\tErrors: %u\n",
+   struct crypto_stat_aead *raead =
+   (struct crypto_stat_aead *)RTA_DATA(rta);
+   printf("%s\tAEAD\n\tEncrypt: %llu bytes: %llu\n\tDecrypt: %llu 
bytes: %llu\n\tErrors: %llu\n",
drivername,
raead->stat_encrypt_cnt, raead->stat_encrypt_tlen,
raead->stat_decrypt_cnt, raead->stat_decrypt_tlen,
raead->stat_aead_err_cnt);
} else if (tb[CRYPTOCFGA_STAT_BLKCIPHER]) {
struct rtattr *rta = tb[CRYPTOCFGA_STAT_BLKCIPHER];
-   struct crypto_stat *rblk =
-   (struct crypto_stat *)RTA_DATA(rta);
-   printf("%s\tCipher\n\tEncrypt: %u bytes: %llu\n\tDecrypt: %u 
bytes: %llu\n\tErrors: %u\n",
+   struct crypto_stat_cipher *rblk =
+   (struct crypto_stat_cipher *)RTA_DATA(rta);
+   printf("%s\tCipher\n\tEncrypt: %llu bytes: %llu\n\tDecrypt: 
%llu bytes: %llu\n\tErrors: %llu\n",
drivername,
rblk->stat_encrypt_cnt, rblk->stat_encrypt_tlen,
rblk->stat_decrypt_cnt, rblk->stat_decrypt_tlen,
rblk->stat_cipher_err_cnt);
} else if (tb[CRYPTOCFGA_STAT_AKCIPHER]) {
struct rtattr *rta = tb[CRYPTOCFGA_STAT_AKCIPHER];
-   struct crypto_stat *rblk =
-   (struct crypto_stat *)RTA_DATA(rta);
-   printf("%s\tAkcipher\n\tEncrypt: %u bytes: %llu\n\tDecrypt: %u 
bytes: %llu\n\tSign: %u\n\tVerify: %u\n\tErrors: %u\n",
+   struct crypto_stat_akcipher *rblk =
+   (struct crypto_stat_akcipher *)RTA_DATA(rta);
+   printf("%s\tAkcipher\n\tEncrypt: %llu bytes: %llu\n\tDecrypt: 
%llu bytes: %llu\n\tSign:

[PATCH v3 00/10] crypto: crypto_user_stat: misc enhancement

2018-11-20 Thread Corentin Labbe
Hello

This patchset fixes all the problems reported by Eric Biggers.

Regards

Changes since v2:
- moved all crypto_stats functions from header to algapi.c for using
  crypto_alg_get/put

Changes since v1:
- Better locking of crypto_alg via crypto_alg_get/crypto_alg_put
- remove all intermediate variables in crypto/crypto_user_stat.c
- splited all internal stats variables into different structures

Corentin Labbe (10):
  crypto: crypto_user_stat: made crypto_user_stat optional
  crypto: CRYPTO_STATS should depend on CRYPTO_USER
  crypto: crypto_user_stat: convert all stats from u32 to u64
  crypto: crypto_user_stat: split user space crypto stat structures
  crypto: tool: getstat: convert user space example to the new
crypto_user_stat uapi
  crypto: crypto_user_stat: fix use_after_free of struct xxx_request
  crypto: crypto_user_stat: Fix invalid stat reporting
  crypto: crypto_user_stat: remove intermediate variable
  crypto: crypto_user_stat: Split stats in multiple structures
  crypto: crypto_user_stat: rename err_cnt parameter

 crypto/Kconfig   |   1 +
 crypto/Makefile  |   3 +-
 crypto/ahash.c   |  17 +-
 crypto/algapi.c  | 293 ++-
 crypto/crypto_user_stat.c| 160 +--
 crypto/rng.c |   4 +-
 include/crypto/acompress.h   |  38 +---
 include/crypto/aead.h|  38 +---
 include/crypto/akcipher.h|  74 ++-
 include/crypto/hash.h|  32 +--
 include/crypto/internal/cryptouser.h |  17 ++
 include/crypto/kpp.h |  48 +
 include/crypto/rng.h |  27 +--
 include/crypto/skcipher.h|  36 +---
 include/linux/crypto.h   | 245 +-
 include/uapi/linux/cryptouser.h  | 102 ++
 tools/crypto/getstat.c   |  72 +++
 17 files changed, 677 insertions(+), 530 deletions(-)

-- 
2.18.1



Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

2018-11-20 Thread Jean-Philippe Brucker
On 20/11/2018 09:16, Jonathan Cameron wrote:
> +CC Jean-Philippe and iommu list.

Thanks for the Cc, sorry I don't have enough bandwidth to follow this
thread at the moment.

>>>>> In WarpDrive/uacce, we make this simple. If you support IOMMU and it
>>>>> support SVM/SVA. Everything will be fine just like ODP implicit mode.
>>>>> And you don't need to write any code for that. Because it has been
>>>>> done by IOMMU framework. If it

>>>> Looks like the IOMMU code uses mmu_notifier, so it is identical to
>>>> IB's ODP. The only difference is that IB tends to have the IOMMU page
>>>> table in the device, not in the CPU.
>>>>
>>>> The only case I know if that is different is the new-fangled CAPI
>>>> stuff where the IOMMU can directly use the CPU's page table and the
>>>> IOMMU page table (in device or CPU) is eliminated.
>>>
>>> Yes. We are not focusing on the current implementation. As mentioned in the
>>> cover letter. We are expecting Jean Philips' SVA patch:
>>> git://linux-arm.org/linux-jpb.  
>>
>> This SVA stuff does not look comparable to CAPI as it still requires
>> maintaining seperate IOMMU page tables.

With SVA, we use the same page tables in the IOMMU and CPU. It's the
same pgd pointer, there is no mirroring of mappings. We bind the process
page tables with the device using a PASID (Process Address Space ID).

After fork(), the child's mm is different from the parent's one, and is
not automatically bound to the device. The device driver will have to
issue a new bind() request, and the child mm will be bound with a
different PASID.

There could be a problem if the child inherits the parent's device
handle. Then depending on the device, the child could be able to program
DMA and possibly access the parent's address space. The parent needs to
be aware of that when using the bind() API, and close the device fd in
the child after fork().
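
As a concrete illustration of the bind() flow, here is a minimal
driver-side sketch. It assumes an API along the lines of
iommu_sva_bind_device() from the linux-jpb tree; the signature is
approximated from the RFC, and struct acc_queue is a made-up device
context:

	struct acc_queue {
		struct device *dev;	/* parent accelerator device */
		int pasid;		/* PASID tagging this queue's DMA */
	};

	static int acc_queue_bind(struct acc_queue *q)
	{
		int pasid, ret;

		/*
		 * Bind current->mm to the device. The IOMMU then walks
		 * the CPU page tables directly (same pgd pointer);
		 * nothing is mirrored.
		 */
		ret = iommu_sva_bind_device(q->dev, current->mm, &pasid,
					    0, q);
		if (ret)
			return ret;

		/*
		 * After fork() the child's mm is not bound: it must issue
		 * its own bind() and will get a different PASID.
		 */
		q->pasid = pasid;
		return 0;
	}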

We use MMU notifiers for some address space changes:

* The IOTLB needs to be invalidated after any unmap() to the process
address space. On Arm systems the SMMU IOTLBs can be invalidated by the
CPU TLBI instructions, but we still need to invalidate TLBs private to
devices that are arch-agnostic (Address Translation Cache in PCI ATS).

* When the process mm exits, we need to remove the associated PASID
configuration in the IOMMU and invalidate the TLBs.
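
In code terms, both points map onto mmu_notifier callbacks. A sketch of
what the IOMMU driver registers (the smmu_mn_* names are illustrative;
the ops signatures are the mainline ones):

	static void smmu_mn_invalidate_range(struct mmu_notifier *mn,
					     struct mm_struct *mm,
					     unsigned long start,
					     unsigned long end)
	{
		/*
		 * Called after unmap(): flush the translation caches the
		 * CPU TLBI instructions cannot reach, e.g. PCI ATS
		 * Address Translation Caches, for [start, end).
		 */
	}

	static void smmu_mn_release(struct mmu_notifier *mn,
				    struct mm_struct *mm)
	{
		/* mm is exiting: clear the PASID entry, then invalidate. */
	}

	static const struct mmu_notifier_ops smmu_mn_ops = {
		.invalidate_range	= smmu_mn_invalidate_range,
		.release		= smmu_mn_release,
	};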

Thanks,
Jean


Re: Guidance required for a crypto requirement

2018-11-20 Thread gre...@linuxfoundation.org
On Tue, Nov 20, 2018 at 10:30:56AM +, Kalyani Akula wrote:
> This email and any attachments are intended for the sole use of the named 
> recipient(s) and contain(s) confidential information that may be proprietary, 
> privileged or copyrighted under applicable law. If you are not the intended 
> recipient, do not read, copy, or forward this email message or any 
> attachments. Delete this email message and any attachments immediately.

Now deleted, sorry.


Re: [RFC PATCH v2 0/4] Exporting existing crypto API code through zinc

2018-11-20 Thread Ard Biesheuvel
On Tue, 20 Nov 2018 at 07:02, Herbert Xu  wrote:
>
> On Mon, Nov 19, 2018 at 01:24:51PM +0800, Herbert Xu wrote:
> >
> > In response to Martin's patch-set which I merged last week, I think
> > here is quick way out for the zinc interface.
> >
> > Going through the past zinc discussions it would appear that
> > everybody is quite happy with the zinc interface per se.  The
> > most contentious areas are in fact the algorithm implementations
> > under zinc, as well as the conversion of the crypto API algorithms
> > over to using the zinc interface (e.g., you can no longer access
> > specific implementations).
> >
> > My proposal is to merge the zinc interface as is, but to invert
> > how we place the algorithm implementations.  IOW the implementations
> > should stay where they are now, with in the crypto API.  However,
> > we will provide direct access to them for zinc without going through
> > the crypto API.  IOW we'll just export the functions directly.
> >
> > Here is a proof of concept patch to do it for chacha20-generic.
> > It basically replaces patch 3 in the October zinc patch series.
> > Actually this patch also exports the arm/x86 chacha20 functions
> > too so they are ready for use by zinc.
> >
> > If everyone is happy with this then we can immediately add the
> > zinc interface and expose the existing crypto API algorithms
> > through it.  This would allow wireguard to be merged right away.
> >
> > In parallel, the discussions over replacing the implementations
> > underneath can carry on without stalling the whole project.
> >
> > PS This patch is totally untested :)
>
> Here is an updated version demonstrating how we could access the
> accelerated versions of chacha20.  It also includes a final patch
> to deposit the new zinc version of x86-64 chacha20 into
> arch/x86/crypto where it can be used by both the crypto API as well
> as zinc.
>
> The key differences between this and the last zinc patch series:
>
> 1. The crypto API algorithms remain individually accessible, this
> is crucial as these algorithm names are exported to user-space so
> changing the names to foo-zinc is not going to work.
>

Are you saying user space may use names like "ctr-aes-neon" directly
rather than "ctr(aes)" for any implementation of the mode?

If so, that is highly unfortunate, since it means we'd be breaking
user space by wrapping a crypto library function with its own arch
specific specializations into a generic crypto API wrapper.

Note that I think that using AF_ALG to access software crypto routines
(as opposed to accelerators) is rather pointless to begin with, but if
it permits that today then we're stuck with it.
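
(For context, the ABI in question: AF_ALG resolves salg_name against
driver names as well as generic algorithm names, so both calls in the
sketch below can succeed today, assuming the named implementation is
registered.)

	#include <linux/if_alg.h>
	#include <string.h>
	#include <sys/socket.h>
	#include <unistd.h>

	/* Bind an AF_ALG skcipher socket by name; returns fd or -1. */
	static int alg_bind(const char *name)
	{
		struct sockaddr_alg sa = {
			.salg_family = AF_ALG,
			.salg_type = "skcipher",
		};
		int fd = socket(AF_ALG, SOCK_SEQPACKET, 0);

		if (fd < 0)
			return -1;
		strncpy((char *)sa.salg_name, name, sizeof(sa.salg_name) - 1);
		if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
			close(fd);
			return -1;
		}
		return fd;
	}

	/* alg_bind("ctr(aes)") picks the highest-priority driver;
	 * alg_bind("ctr-aes-neon") pins one specific implementation. */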

> 2. The arch-specific algorithm code lives in their own module rather
> than in a monolithic module.
>

This is one of the remaining issues I have with Zinc. However, modulo
the above concerns, I think it does make sense to have a separate
function API for synchronous software routines below the current
crypto API. What I don't want is a separate Zinc kingdom with a
different name, different maintainers and going through a different
tree, and with its own approach to test vectors, arch specific code,
etc etc


Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

2018-11-20 Thread Jonathan Cameron
+CC Jean-Philippe and iommu list.


On Mon, 19 Nov 2018 20:29:39 -0700
Jason Gunthorpe  wrote:

> On Tue, Nov 20, 2018 at 11:07:02AM +0800, Kenneth Lee wrote:
> > On Mon, Nov 19, 2018 at 11:49:54AM -0700, Jason Gunthorpe wrote:  
> > > 
> > > On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
> > >
> > > > If the hardware cannot share page table with the CPU, we then need to 
> > > > have
> > > > some way to change the device page table. This is what happens in ODP. It
> > > > invalidates the page table in device upon mmu_notifier call back. But 
> > > > this cannot
> > > > solve the COW problem: if the user process A share a page P with 
> > > > device, and A 
> > > > forks a new process B, and it continue to write to the page. By COW, the
> > > > process B will keep the page P, while A will get a new page P'. But you 
> > > > have
> > > > no way to let the device know it should use P' rather than P.  
> > > 
> > > Is this true? I thought mmu_notifiers covered all these cases.
> > > 
> > > The mm_notifier for A should fire if B causes the physical address of
> > > A's pages to change via COW. 
> > > 
> > > And this causes the device page tables to re-synchronize.  
> > 
> > I don't see such code. The current do_cow_fault() implementation has
> > nothing to do with mm_notifier.
> 
> Well, that sure sounds like it would be a bug in mmu_notifiers..
> 
> But considering Jean's SVA stuff seems based on mmu notifiers, I have
> a hard time believing that it has any different behavior from RDMA's
> ODP, and if it does have different behavior, then it is probably just
> a bug in the ODP implementation.
> 
> > > > In WarpDrive/uacce, we make this simple. If you support IOMMU and it 
> > > > support
> > > > SVM/SVA. Everything will be fine just like ODP implicit mode. And you 
> > > > don't need
> > > > to write any code for that. Because it has been done by IOMMU 
> > > > framework. If it  
> > > 
> > > Looks like the IOMMU code uses mmu_notifier, so it is identical to
> > > IB's ODP. The only difference is that IB tends to have the IOMMU page
> > > table in the device, not in the CPU.
> > > 
> > > The only case I know if that is different is the new-fangled CAPI
> > > stuff where the IOMMU can directly use the CPU's page table and the
> > > IOMMU page table (in device or CPU) is eliminated.  
> >
> > Yes. We are not focusing on the current implementation. As mentioned in the
> > cover letter. We are expecting Jean Philips' SVA patch:
> > git://linux-arm.org/linux-jpb.  
> 
> This SVA stuff does not look comparable to CAPI as it still requires
> maintaining seperate IOMMU page tables.
> 
> Also, those patches from Jean have a lot of references to
> mmu_notifiers (ie look at iommu_mmu_notifier).
> 
> Are you really sure it is actually any different at all?
> 
> > > Anyhow, I don't think a single instance of hardware should justify an
> > > entire new subsystem. Subsystems are hard to make and without multiple
> > > hardware examples there is no way to expect that it would cover any
> > > future use cases.  
> > 
> > Yes. That's our first expectation. We can keep it with our driver. But 
> > because
> > there is no user driver support for any accelerator in mainline kernel. 
> > Even the
> > well known QuickAssit has to be maintained out of tree. So we try to see if
> > people is interested in working together to solve the problem.  
> 
> Well, you should come with patches ack'ed by these other groups.
> 
> > > If all your driver needs is to mmap some PCI bar space, route
> > > interrupts and do DMA mapping then mediated VFIO is probably a good
> > > choice.   
> > 
> > Yes. That is what is done in our RFCv1/v2. But we accepted Jerome's opinion 
> > and
> > try not to add complexity to the mm subsystem.  
> 
> Why would a mediated VFIO driver touch the mm subsystem? Sounds like
> you don't have a VFIO driver if it needs to do stuff like that...
> 
> > > If it needs to do a bunch of other stuff, not related to PCI bar
> > > space, i