Re: [PATCH v2 2/4] crypto: skcipher - Enforce non-ASYNC for on-stack requests

2018-09-06 Thread Ard Biesheuvel
On 7 September 2018 at 05:42, Herbert Xu  wrote:
> On Thu, Sep 06, 2018 at 03:58:52PM -0700, Kees Cook wrote:
>>
>> @@ -437,6 +442,12 @@ static inline struct crypto_skcipher *crypto_skcipher_reqtfm_check(
>>  {
>>   struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
>>
>> + if (req->__onstack) {
>> + if (WARN_ON(crypto_skcipher_alg(tfm)->base.cra_flags &
>> + CRYPTO_ALG_ASYNC))
>> + return ERR_PTR(-EINVAL);
>> + }
>
> Sorry but I don't like imposing a run-time check on everybody when
> stack-based requests are the odd ones out.  If we're going to make
> this a run-time check (I'd much prefer a compile-time check, but I
> understand that this may involve too much churn), then please do it
> for stack-based request users only.
>

OK, so given that all SKCIPHER_REQUEST_ON_STACK occurrences are
updated in this series anyway, perhaps we should add
skcipher_[en|de]crypt_onstack() flavors that encapsulate the
additional check? The only question is how to enforce at compile time
that those are used instead of the ordinary ones when operating on a
stack-allocated request. Would you mind using some macro foo here
involving __builtin_types_compatible_p()?
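
As a rough illustration of that idea (every name below is hypothetical,
not something from this series), the on-stack variant could carry its own
type so that the helper can reject anything else at compile time:

    /* Sketch only: hypothetical types/helpers, not part of the posted patches. */
    struct skcipher_request_on_stack {
            struct skcipher_request req;
            u8 __ctx[MAX_SYNC_SKCIPHER_REQSIZE];
    };

    #define skcipher_encrypt_onstack(r)                                     \
    ({                                                                      \
            BUILD_BUG_ON(!__builtin_types_compatible_p(typeof(r),           \
                            struct skcipher_request_on_stack *));           \
            crypto_skcipher_encrypt(&(r)->req);                             \
    })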


[PATCH] crypto: Adds user space interface for ALG_SET_KEY_TYPE

2018-09-06 Thread Kalyani Akula
ALG_SET_KEY_TYPE requires the caller to pass the key type to be used
for AES encryption/decryption.

Sometimes the cipher key is stored in the device's hardware (eFuse,
BBRAM, etc.), so there needs to be a way to specify which key type to
use for encrypt or decrypt operations.

This patch implements the above requirement.
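
For illustration only (not part of this patch): from user space the new
option would be exercised roughly as below. The numeric fallback values
and the key-type encoding here are assumptions; the real definitions come
from the updated include/uapi/linux/if_alg.h and the driver that consumes
the option.

    #include <unistd.h>
    #include <sys/socket.h>
    #include <linux/if_alg.h>

    #ifndef SOL_ALG
    #define SOL_ALG 279
    #endif
    #ifndef ALG_SET_KEY_TYPE
    #define ALG_SET_KEY_TYPE 6          /* placeholder value for this sketch */
    #endif

    int main(void)
    {
            struct sockaddr_alg sa = {
                    .salg_family = AF_ALG,
                    .salg_type   = "skcipher",
                    .salg_name   = "cbc(aes)",
            };
            unsigned char key_type = 1; /* e.g. "use the device key" */
            int tfmfd = socket(AF_ALG, SOCK_SEQPACKET, 0);

            if (tfmfd < 0 || bind(tfmfd, (struct sockaddr *)&sa, sizeof(sa)))
                    return 1;

            /* Select the hardware-backed key before accept()/encrypt. */
            if (setsockopt(tfmfd, SOL_ALG, ALG_SET_KEY_TYPE,
                           &key_type, sizeof(key_type)))
                    return 1;

            close(tfmfd);
            return 0;
    }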


Signed-off-by: Kalyani Akula 
---
 crypto/af_alg.c |  7 +++
 crypto/algif_skcipher.c | 12 
 crypto/blkcipher.c  |  9 +
 crypto/skcipher.c   | 18 ++
 include/crypto/if_alg.h |  2 ++
 include/crypto/skcipher.h   | 12 +++-
 include/linux/crypto.h  | 12 
 include/uapi/linux/if_alg.h |  1 +
 8 files changed, 72 insertions(+), 1 deletion(-)

diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index b053179..9ea6b86 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -261,6 +261,13 @@ static int alg_setsockopt(struct socket *sock, int level, int optname,
if (!type->setauthsize)
goto unlock;
err = type->setauthsize(ask->private, optlen);
+   break;
+   case ALG_SET_KEY_TYPE:
+   if (sock->state == SS_CONNECTED)
+   goto unlock;
+   if (!type->setkeytype)
+   goto unlock;
+   err = type->setkeytype(ask->private, optval, optlen);
}

 unlock:
diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
index cfdaab2..9164465 100644
--- a/crypto/algif_skcipher.c
+++ b/crypto/algif_skcipher.c
@@ -320,6 +320,17 @@ static int skcipher_setkey(void *private, const u8 *key, unsigned int keylen)
return crypto_skcipher_setkey(private, key, keylen);
 }

+static int skcipher_setkeytype(void *private, const u8 *key,
+  unsigned int keylen)
+{
+   struct skcipher_tfm *tfm = private;
+   int err;
+
+   err = crypto_skcipher_setkeytype(tfm->skcipher, key, keylen);
+
+   return err;
+}
+
 static void skcipher_sock_destruct(struct sock *sk)
 {
struct alg_sock *ask = alg_sk(sk);
@@ -384,6 +395,7 @@ static int skcipher_accept_parent(void *private, struct sock *sk)
.bind   =   skcipher_bind,
.release=   skcipher_release,
.setkey =   skcipher_setkey,
+   .setkeytype =   skcipher_setkeytype,
.accept =   skcipher_accept_parent,
.accept_nokey   =   skcipher_accept_parent_nokey,
.ops=   &algif_skcipher_ops,
diff --git a/crypto/blkcipher.c b/crypto/blkcipher.c
index f93abf1..3ea0e7f 100644
--- a/crypto/blkcipher.c
+++ b/crypto/blkcipher.c
@@ -408,6 +408,14 @@ static int setkey(struct crypto_tfm *tfm, const u8 *key, unsigned int keylen)
return cipher->setkey(tfm, key, keylen);
 }

+static int setkeytype(struct crypto_tfm *tfm, const u8 *key,
+ unsigned int keylen)
+{
+   struct blkcipher_alg *cipher = &tfm->__crt_alg->cra_blkcipher;
+
+   return cipher->setkeytype(tfm, key, keylen);
+}
+
 static int async_setkey(struct crypto_ablkcipher *tfm, const u8 *key,
unsigned int keylen)
 {
@@ -478,6 +486,7 @@ static int crypto_init_blkcipher_ops_sync(struct crypto_tfm *tfm)
unsigned long addr;

crt->setkey = setkey;
+   crt->setkeytype = setkeytype;
crt->encrypt = alg->encrypt;
crt->decrypt = alg->decrypt;

diff --git a/crypto/skcipher.c b/crypto/skcipher.c
index 0bd8c6c..cb794dd 100644
--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -604,6 +604,23 @@ static int skcipher_setkey_blkcipher(struct crypto_skcipher *tfm,
return 0;
 }

+static int skcipher_setkeytype_blkcipher(struct crypto_skcipher *tfm,
+const u8 *key, unsigned int keylen)
+{
+   struct crypto_blkcipher **ctx = crypto_skcipher_ctx(tfm);
+   struct crypto_blkcipher *blkcipher = *ctx;
+   int err;
+
+   crypto_blkcipher_clear_flags(blkcipher, ~0);
+   crypto_blkcipher_set_flags(blkcipher, crypto_skcipher_get_flags(tfm) &
+   CRYPTO_TFM_REQ_MASK);
+   err = crypto_blkcipher_setkeytype(blkcipher, key, keylen);
+   crypto_skcipher_set_flags(tfm, crypto_blkcipher_get_flags(blkcipher) &
+   CRYPTO_TFM_RES_MASK);
+
+   return err;
+}
+
 static int skcipher_crypt_blkcipher(struct skcipher_request *req,
int (*crypt)(struct blkcipher_desc *,
 struct scatterlist *,
@@ -670,6 +687,7 @@ static int crypto_init_skcipher_ops_blkcipher(struct crypto_tfm *tfm)
tfm->exit = crypto_exit_skcipher_ops_blkcipher;

skcipher->setkey = skcipher_setkey_blkcipher;
+   skcipher->setkeytype = skcipher_setkeytype_blkcipher;
skcipher->encrypt = skcipher_encrypt_blkcipher;
skcipher->decrypt = skcipher_decrypt_blkcipher;

Re: [PATCH] net/sock: move memory_allocated over to percpu_counter variables

2018-09-06 Thread Olof Johansson
Hi,

On Thu, Sep 6, 2018 at 8:32 PM, Herbert Xu  wrote:
> On Thu, Sep 06, 2018 at 12:33:58PM -0700, Eric Dumazet wrote:
>> On Thu, Sep 6, 2018 at 12:21 PM Olof Johansson  wrote:
>> >
>> > Today these are all global shared variables per protocol, and in
>> > particular tcp_memory_allocated can get hot on a system with
>> > large number of CPUs and a substantial number of connections.
>> >
>> > Moving it over to a per-cpu variable makes it significantly cheaper,
>> > and the added overhead when summing up the percpu copies is still smaller
>> > than the cost of having a hot cacheline bouncing around.
>>
>> I am curious. We never noticed contention on this variable, at least for TCP.
>
> Yes these variables are heavily amortised so I'm surprised that
> they would cause much contention.
>
>> Please share some numbers with us.
>
> Indeed.

Certainly, just had to collect them again.

This is on a dual-Xeon box with ~150-200k TCP connections. I see
about 0.7% of CPU time spent in __sk_mem_{reduce,raise}_allocated in the
inlined atomic ops, most of it in reduce.

Call path for reduce is practically all from tcp_write_timer on softirq:

__sk_mem_reduce_allocated
tcp_write_timer
call_timer_fn
run_timer_softirq
__do_softirq
irq_exit
smp_apic_timer_interrupt
apic_timer_interrupt
cpuidle_enter_state

With this patch, I see about 0.18 + 0.11 + 0.07 = 0.36% in percpu-related
functions called from the same __sk_mem functions.

Now, that's a halving of cycle samples on that specific setup. The
real difference, though, is on another platform where atomics are more
expensive; there, this change makes a significant difference.
Unfortunately, I can't share specifics, but I think this change stands
on its own on the dual-Xeon setup as well, maybe with slightly less
strong wording on just how hot the variable/cacheline happens to be.


-Olof


Re: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive

2018-09-06 Thread Kenneth Lee
On Thu, Sep 06, 2018 at 09:31:33AM -0400, Jerome Glisse wrote:
> Date: Thu, 6 Sep 2018 09:31:33 -0400
> From: Jerome Glisse 
> To: Kenneth Lee 
> CC: Alex Williamson , Kenneth Lee
>  , Jonathan Corbet , Herbert Xu
>  , "David S . Miller" ,
>  Joerg Roedel , Hao Fang , Zhou Wang
>  , Zaibo Xu , Philippe
>  Ombredanne , Greg Kroah-Hartman
>  , Thomas Gleixner ,
>  linux-...@vger.kernel.org, linux-ker...@vger.kernel.org,
>  linux-crypto@vger.kernel.org, io...@lists.linux-foundation.org,
>  k...@vger.kernel.org, linux-accelerat...@lists.ozlabs.org, Lu Baolu
>  , Sanjay Kumar ,
>  linux...@huawei.com
> Subject: Re: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive
> User-Agent: Mutt/1.10.0 (2018-05-17)
> Message-ID: <20180906133133.ga3...@redhat.com>
> 
> On Thu, Sep 06, 2018 at 05:45:32PM +0800, Kenneth Lee wrote:
> > On Tue, Sep 04, 2018 at 10:15:09AM -0600, Alex Williamson wrote:
> > > Date: Tue, 4 Sep 2018 10:15:09 -0600
> > > From: Alex Williamson 
> > > To: Jerome Glisse 
> > > CC: Kenneth Lee , Jonathan Corbet ,
> > >  Herbert Xu , "David S . Miller"
> > >  , Joerg Roedel , Kenneth Lee
> > >  , Hao Fang , Zhou Wang
> > >  , Zaibo Xu , Philippe
> > >  Ombredanne , Greg Kroah-Hartman
> > >  , Thomas Gleixner ,
> > >  linux-...@vger.kernel.org, linux-ker...@vger.kernel.org,
> > >  linux-crypto@vger.kernel.org, io...@lists.linux-foundation.org,
> > >  k...@vger.kernel.org, linux-accelerat...@lists.ozlabs.org, Lu Baolu
> > >  , Sanjay Kumar ,
> > >  linux...@huawei.com
> > > Subject: Re: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive
> > > Message-ID: <20180904101509.62314...@t450s.home>
> > > 
> > > On Tue, 4 Sep 2018 11:00:19 -0400
> > > Jerome Glisse  wrote:
> > > 
> > > > On Mon, Sep 03, 2018 at 08:51:57AM +0800, Kenneth Lee wrote:
> > > > > From: Kenneth Lee 
> > > > > 
> > > > > WarpDrive is an accelerator framework to expose the hardware 
> > > > > capabilities
> > > > > directly to the user space. It makes use of the exist vfio and 
> > > > > vfio-mdev
> > > > > facilities. So the user application can send request and DMA to the
> > > > > hardware without interaction with the kernel. This removes the latency
> > > > > of syscall.
> > > > > 
> > > > > WarpDrive is the name for the whole framework. The component in kernel
> > > > > is called SDMDEV, Share Domain Mediated Device. Driver driver exposes 
> > > > > its
> > > > > hardware resource by registering to SDMDEV as a VFIO-Mdev. So the user
> > > > > library of WarpDrive can access it via VFIO interface.
> > > > > 
> > > > > The patchset contains document for the detail. Please refer to it for 
> > > > > more
> > > > > information.
> > > > > 
> > > > > This patchset is intended to be used with Jean Philippe Brucker's SVA
> > > > > patch [1], which enables not only IO side page fault, but also PASID
> > > > > support to IOMMU and VFIO.
> > > > > 
> > > > > With these features, WarpDrive can support non-pinned memory and
> > > > > multi-process in the same accelerator device.  We tested it in our SoC
> > > > > integrated Accelerator (board ID: D06, Chip ID: HIP08). A reference 
> > > > > work
> > > > > tree can be found here: [2].
> > > > > 
> > > > > But it is not mandatory. This patchset is tested in the latest 
> > > > > mainline
> > > > > kernel without the SVA patches.  So it supports only one process for 
> > > > > each
> > > > > accelerator.
> > > > > 
> > > > > We have noticed the IOMMU aware mdev RFC announced recently [3].
> > > > > 
> > > > > The IOMMU aware mdev has similar idea but different intention 
> > > > > comparing to
> > > > > WarpDrive. It intends to dedicate part of the hardware resource to a 
> > > > > VM.
> > > > > And the design is supposed to be used with Scalable I/O 
> > > > > Virtualization.
> > > > > While sdmdev is intended to share the hardware resource with a big 
> > > > > amount
> > > > > of processes.  It just requires the hardware supporting address
> > > > > translation per process (PCIE's PASID or ARM SMMU's substream ID).
> > > > > 
> > > > > But we don't see serious confliction on both design. We believe they 
> > > > > can be
> > > > > normalized as one.
> > > > >   
> > > > 
> > > > So once again i do not understand why you are trying to do things
> > > > this way. Kernel already have tons of example of everything you
> > > > want to do without a new framework. Moreover i believe you are
> > > > confuse by VFIO. To me VFIO is for VM not to create general device
> > > > driver frame work.
> > > 
> > > VFIO is a userspace driver framework, the VM use case just happens to
> > > be a rather prolific one.  VFIO was never intended to be solely a VM
> > > device interface and has several other userspace users, notably DPDK
> > > and SPDK, an NVMe backend in QEMU, a userspace NVMe driver, a ruby
> > > wrapper, and perhaps others that I'm not aware of.  Whether vfio is
> > > appropriate interface here might certainly still be a debatable topic,
> > > but I would strongly disagree with your la

Re: [PATCH v2 2/4] crypto: skcipher - Enforce non-ASYNC for on-stack requests

2018-09-06 Thread Herbert Xu
On Thu, Sep 06, 2018 at 03:58:52PM -0700, Kees Cook wrote:
>
> @@ -437,6 +442,12 @@ static inline struct crypto_skcipher *crypto_skcipher_reqtfm_check(
>  {
>   struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
>  
> + if (req->__onstack) {
> + if (WARN_ON(crypto_skcipher_alg(tfm)->base.cra_flags &
> + CRYPTO_ALG_ASYNC))
> + return ERR_PTR(-EINVAL);
> + }

Sorry but I don't like imposing a run-time check on everybody when
stack-based requests are the odd ones out.  If we're going to make
this a run-time check (I'd much prefer a compile-time check, but I
understand that this may involve too much churn), then please do it
for stack-based request users only.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] net/sock: move memory_allocated over to percpu_counter variables

2018-09-06 Thread Herbert Xu
On Thu, Sep 06, 2018 at 12:33:58PM -0700, Eric Dumazet wrote:
> On Thu, Sep 6, 2018 at 12:21 PM Olof Johansson  wrote:
> >
> > Today these are all global shared variables per protocol, and in
> > particular tcp_memory_allocated can get hot on a system with
> > large number of CPUs and a substantial number of connections.
> >
> > Moving it over to a per-cpu variable makes it significantly cheaper,
> > and the added overhead when summing up the percpu copies is still smaller
> > than the cost of having a hot cacheline bouncing around.
> 
> I am curious. We never noticed contention on this variable, at least for TCP.

Yes these variables are heavily amortised so I'm surprised that
they would cause much contention.

> Please share some numbers with us.

Indeed.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 1/7] vfio/sdmdev: Add documents for WarpDrive framework

2018-09-06 Thread Kenneth Lee
On Thu, Sep 06, 2018 at 11:36:36AM -0700, Randy Dunlap wrote:
> Date: Thu, 6 Sep 2018 11:36:36 -0700
> From: Randy Dunlap 
> To: Kenneth Lee , Jonathan Corbet ,
>  Herbert Xu , "David S . Miller"
>  , Joerg Roedel , Alex Williamson
>  , Kenneth Lee , Hao
>  Fang , Zhou Wang , Zaibo Xu
>  , Philippe Ombredanne , Greg
>  Kroah-Hartman , Thomas Gleixner
>  , linux-...@vger.kernel.org,
>  linux-ker...@vger.kernel.org, linux-crypto@vger.kernel.org,
>  io...@lists.linux-foundation.org, k...@vger.kernel.org,
>  linux-accelerat...@lists.ozlabs.org, Lu Baolu ,
>  Sanjay Kumar 
> CC: linux...@huawei.com
> Subject: Re: [PATCH 1/7] vfio/sdmdev: Add documents for WarpDrive framework
> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>  Thunderbird/52.9.1
> Message-ID: <56f5f66d-f6d9-f4fa-40ca-e4a8bad17...@infradead.org>
> 
> Hi,
> 
> On 09/02/2018 05:51 PM, Kenneth Lee wrote:
> > From: Kenneth Lee 
> > 
> > WarpDrive is a common user space accelerator framework.  Its main component
> > in Kernel is called sdmdev, Share Domain Mediated Device. It exposes
> > the hardware capabilities to the user space via vfio-mdev. So processes in
> > user land can obtain a "queue" by open the device and direct access the
> > hardware MMIO space or do DMA operation via VFIO interface.
> > 
> > WarpDrive is intended to be used with Jean Philippe Brucker's SVA
> > patchset to support multi-process. But This is not a must.  Without the
> > SVA patches, WarpDrive can still work for one process for every hardware
> > device.
> > 
> > This patch add detail documents for the framework.
> > 
> > Signed-off-by: Kenneth Lee 
> > ---
> >  Documentation/00-INDEX|   2 +
> >  Documentation/warpdrive/warpdrive.rst | 100 
> >  Documentation/warpdrive/wd-arch.svg   | 728 ++
> >  3 files changed, 830 insertions(+)
> >  create mode 100644 Documentation/warpdrive/warpdrive.rst
> >  create mode 100644 Documentation/warpdrive/wd-arch.svg
> 
> > diff --git a/Documentation/warpdrive/warpdrive.rst b/Documentation/warpdrive/warpdrive.rst
> > new file mode 100644
> > index ..6d2a5d1e08c4
> > --- /dev/null
> > +++ b/Documentation/warpdrive/warpdrive.rst
> > @@ -0,0 +1,100 @@
> > +Introduction of WarpDrive
> > +=
> > +
> > +*WarpDrive* is a general accelerator framework for user space. It intends 
> > to
> > +provide interface for the user process to send request to hardware
> > +accelerator without heavy user-kernel interaction cost.
> > +
> > +The *WarpDrive* user library is supposed to provide a pipe-based API, such 
> > as:
> 
> Do you say "is supposed to" because it doesn't do that (yet)?
> Or you could just change that to say:
> 
>The WarpDrive user library provides a pipe-based API, such as:
> 

Actually, I was trying to say that it can be defined like this, but people
can choose another implementation on top of the same kernel API.

I will state this explicitly in a future version. Thank you.

> 
> > +::
> > +int wd_request_queue(struct wd_queue *q);
> > +void wd_release_queue(struct wd_queue *q);
> > +
> > +int wd_send(struct wd_queue *q, void *req);
> > +int wd_recv(struct wd_queue *q, void **req);
> > +int wd_recv_sync(struct wd_queue *q, void **req);
> > +int wd_flush(struct wd_queue *q);
> > +
> > +*wd_request_queue* creates the pipe connection, *queue*, between the
> > +application and the hardware. The application sends request and pulls the
> > +answer back by asynchronized wd_send/wd_recv, which directly interact with 
> > the
> > +hardware (by MMIO or share memory) without syscall.
> > +
> > +*WarpDrive* maintains a unified application address space among all 
> > involved
> > +accelerators.  With the following APIs: ::
> 
> Seems like an extra '.' there.  How about:
> 
>   accelerators with the following APIs: ::
> 

Err, the "with..." clause belongs to the following sentence, "The referred
process space...".

> > +
> > +int wd_mem_share(struct wd_queue *q, const void *addr,
> > + size_t size, int flags);
> > +void wd_mem_unshare(struct wd_queue *q, const void *addr, size_t 
> > size);
> > +
> > +The referred process space shared by these APIs can be directly referred 
> > by the
> > +hardware. The process can also dedicate its whole process space with flags,
> > +*WD_SHARE_ALL* (not in this patch yet).
> > +
> > +The name *WarpDrive* is simply a cool and general name meaning the 
> > framework
> > +makes the application faster. As it will be explained in this text later, 
> > the
> > +facility in kernel is called *SDMDEV*, namely "Share Domain Mediated 
> > Device".
> > +
> > +
> > +How does it work
> > +
> > +
> > +*WarpDrive* is built upon *VFIO-MDEV*. The queue is wrapped as *mdev* in 
> > VFIO.
> > +So memory sharing can be done via standard VFIO standard DMA interface.
> > +
> > +The architecture is illustrated as follow figure:
> > +
> > +.. image:: wd-arch.svg

[PATCH] crypto: padlock-aes: Add ecx to outputs for rep instructions

2018-09-06 Thread Ben Hutchings
The current constraints for inline "rep xcrypt*" instructions mark ecx
as an input only.  The compiler can therefore assume wrongly that ecx
holds the same value afterward, but in reality it will contain 0.

This previously led to data corruption, which was worked around by
commit 46d8c4b28652 ("crypto: padlock-aes - Fix Nano workaround data
corruption").  But a future compiler or different optimisation
configuration could reintroduce the problem.  Update the constraints
to avoid this.
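
As a minimal illustration of the constraint issue, detached from the
padlock code (this snippet is not from the patch): any "rep"-prefixed
instruction counts ecx/rcx down to zero, so the count has to be an
input/output operand ("+c") rather than an input ("c"), otherwise the
compiler is entitled to reuse what it believes is still in the register.

    /* Hypothetical example: a rep-based byte fill with correct constraints. */
    static inline void fill_bytes(void *dst, unsigned char val, unsigned long count)
    {
            asm volatile ("rep stosb"
                          : "+D" (dst), "+c" (count)    /* both clobbered by the loop */
                          : "a" (val)
                          : "memory");
    }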

Fixes: a76c1c23d0c3 ("crypto: padlock-aes - work around Nano CPU ...")
Fixes: 46d8c4b28652 ("crypto: padlock-aes - Fix Nano workaround data ...")
Signed-off-by: Ben Hutchings 
---
This is totally untested, so don't assume I know what I'm talking
about. :-)

Ben.

 drivers/crypto/padlock-aes.c | 25 +
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/crypto/padlock-aes.c b/drivers/crypto/padlock-aes.c
index 09d823d36d3a..079b85bf657d 100644
--- a/drivers/crypto/padlock-aes.c
+++ b/drivers/crypto/padlock-aes.c
@@ -191,16 +191,16 @@ static inline void rep_xcrypt_ecb(const u8 *input, u8 *output, void *key,
  struct cword *control_word, int count)
 {
asm volatile (".byte 0xf3,0x0f,0xa7,0xc8"   /* rep xcryptecb */
- : "+S"(input), "+D"(output)
- : "d"(control_word), "b"(key), "c"(count));
+ : "+S"(input), "+D"(output), "+c"(count)
+ : "d"(control_word), "b"(key));
 }
 
 static inline u8 *rep_xcrypt_cbc(const u8 *input, u8 *output, void *key,
 u8 *iv, struct cword *control_word, int count)
 {
asm volatile (".byte 0xf3,0x0f,0xa7,0xd0"   /* rep xcryptcbc */
- : "+S" (input), "+D" (output), "+a" (iv)
- : "d" (control_word), "b" (key), "c" (count));
+ : "+S" (input), "+D" (output), "+a" (iv), "+c" (count)
+ : "d" (control_word), "b" (key));
return iv;
 }
 
@@ -270,12 +270,12 @@ static inline void padlock_xcrypt_ecb(const u8 *input, u8 *output, void *key,
 
if (initial)
asm volatile (".byte 0xf3,0x0f,0xa7,0xc8"   /* rep 
xcryptecb */
- : "+S"(input), "+D"(output)
- : "d"(control_word), "b"(key), "c"(initial));
+ : "+S"(input), "+D"(output), "+c"(initial)
+ : "d"(control_word), "b"(key));
 
asm volatile (".byte 0xf3,0x0f,0xa7,0xc8"   /* rep xcryptecb */
- : "+S"(input), "+D"(output)
- : "d"(control_word), "b"(key), "c"(count));
+ : "+S"(input), "+D"(output), "+c"(count)
+ : "d"(control_word), "b"(key));
 }
 
 static inline u8 *padlock_xcrypt_cbc(const u8 *input, u8 *output, void *key,
@@ -290,12 +290,13 @@ static inline u8 *padlock_xcrypt_cbc(const u8 *input, u8 *output, void *key,
 
if (initial)
asm volatile (".byte 0xf3,0x0f,0xa7,0xd0"   /* rep 
xcryptcbc */
- : "+S" (input), "+D" (output), "+a" (iv)
- : "d" (control_word), "b" (key), "c" (initial));
+ : "+S" (input), "+D" (output), "+a" (iv),
+   "+c" (initial)
+ : "d" (control_word), "b" (key));
 
asm volatile (".byte 0xf3,0x0f,0xa7,0xd0"   /* rep xcryptcbc */
- : "+S" (input), "+D" (output), "+a" (iv)
- : "d" (control_word), "b" (key), "c" (count));
+ : "+S" (input), "+D" (output), "+a" (iv), "+c" (count)
+ : "d" (control_word), "b" (key));
return iv;
 }
 
-- 
Ben Hutchings, Software Developer Codethink Ltd
https://www.codethink.co.uk/ Dale House, 35 Dale Street
 Manchester, M1 2HF, United Kingdom



[PATCH v2 1/4] crypto: skcipher - Consolidate encrypt/decrypt sanity check

2018-09-06 Thread Kees Cook
In preparation for adding additional sanity checks before running an
skcipher request, this consolidates the open-coded checks into a single
function. Instead of passing both req and tfm into the new check, this
just returns the tfm on success and an ERR_PTR on failure, keeping things
as clean as possible.

Signed-off-by: Kees Cook 
---
 include/crypto/skcipher.h | 33 +++--
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/include/crypto/skcipher.h b/include/crypto/skcipher.h
index 2f327f090c3e..6e954d398e0f 100644
--- a/include/crypto/skcipher.h
+++ b/include/crypto/skcipher.h
@@ -422,6 +422,27 @@ static inline struct crypto_skcipher *crypto_skcipher_reqtfm(
return __crypto_skcipher_cast(req->base.tfm);
 }
 
+/**
+ * crypto_skcipher_reqtfm_check() - obtain and check cipher handle from request
+ * @req: skcipher_request out of which the cipher handle is to be obtained
+ *
+ * Return the crypto_skcipher handle when furnishing an skcipher_request
+ * data structure. Check for error conditions that would make it unusable
+ * for an encrypt/decrypt call.
+ *
+ * Return: crypto_skcipher handle, or ERR_PTR on error.
+ */
+static inline struct crypto_skcipher *crypto_skcipher_reqtfm_check(
+   struct skcipher_request *req)
+{
+   struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+
+   if (crypto_skcipher_get_flags(tfm) & CRYPTO_TFM_NEED_KEY)
+   return ERR_PTR(-ENOKEY);
+
+   return tfm;
+}
+
 /**
  * crypto_skcipher_encrypt() - encrypt plaintext
  * @req: reference to the skcipher_request handle that holds all information
@@ -435,10 +456,10 @@ static inline struct crypto_skcipher *crypto_skcipher_reqtfm(
  */
 static inline int crypto_skcipher_encrypt(struct skcipher_request *req)
 {
-   struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+   struct crypto_skcipher *tfm = crypto_skcipher_reqtfm_check(req);
 
-   if (crypto_skcipher_get_flags(tfm) & CRYPTO_TFM_NEED_KEY)
-   return -ENOKEY;
+   if (IS_ERR(tfm))
+   return PTR_ERR(tfm);
 
return tfm->encrypt(req);
 }
@@ -456,10 +477,10 @@ static inline int crypto_skcipher_encrypt(struct skcipher_request *req)
  */
 static inline int crypto_skcipher_decrypt(struct skcipher_request *req)
 {
-   struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+   struct crypto_skcipher *tfm = crypto_skcipher_reqtfm_check(req);
 
-   if (crypto_skcipher_get_flags(tfm) & CRYPTO_TFM_NEED_KEY)
-   return -ENOKEY;
+   if (IS_ERR(tfm))
+   return PTR_ERR(tfm);
 
return tfm->decrypt(req);
 }
-- 
2.17.1



[PATCH v2 3/4] crypto: skcipher - Remove VLA usage for SKCIPHER_REQUEST_ON_STACK

2018-09-06 Thread Kees Cook
In the quest to remove all stack VLA usage from the kernel[1], this caps
the non-ASYNC skcipher request size similar to other limits and adds a
sanity check at usage. After a review of all non-ASYNC algorithm callers
of crypto_skcipher_set_reqsize(), the largest appears to be 384:

4   struct sun4i_cipher_req_ctx
96  struct crypto_rfc3686_req_ctx
375 cts:
160 crypto_skcipher_blocksize(cipher) (max)
152 struct crypto_cts_reqctx
63  align_mask (max)
384 struct rctx (lrw, xts)

[1] 
https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com

Signed-off-by: Kees Cook 
---
 include/crypto/skcipher.h | 28 
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/include/crypto/skcipher.h b/include/crypto/skcipher.h
index 3aabd5d098ed..cca216999bf1 100644
--- a/include/crypto/skcipher.h
+++ b/include/crypto/skcipher.h
@@ -144,9 +144,10 @@ struct skcipher_alg {
 /*
  * This must only ever be used with synchronous algorithms.
  */
+#define MAX_SYNC_SKCIPHER_REQSIZE  384
 #define SKCIPHER_REQUEST_ON_STACK(name, tfm) \
char __##name##_desc[sizeof(struct skcipher_request) + \
-   crypto_skcipher_reqsize(tfm)] CRYPTO_MINALIGN_ATTR = { 1 }; \
+   MAX_SYNC_SKCIPHER_REQSIZE] CRYPTO_MINALIGN_ATTR = { 1 }; \
struct skcipher_request *name = (void *)__##name##_desc
 
 /**
@@ -412,6 +413,17 @@ static inline unsigned int crypto_skcipher_default_keysize(
return tfm->keysize;
 }
 
+/**
+ * crypto_skcipher_reqsize() - obtain size of the request data structure
+ * @tfm: cipher handle
+ *
+ * Return: number of bytes
+ */
+static inline unsigned int crypto_skcipher_reqsize(struct crypto_skcipher *tfm)
+{
+   return tfm->reqsize;
+}
+
 /**
  * crypto_skcipher_reqtfm() - obtain cipher handle from request
  * @req: skcipher_request out of which the cipher handle is to be obtained
@@ -446,6 +458,9 @@ static inline struct crypto_skcipher *crypto_skcipher_reqtfm_check(
if (WARN_ON(crypto_skcipher_alg(tfm)->base.cra_flags &
CRYPTO_ALG_ASYNC))
return ERR_PTR(-EINVAL);
+   if (WARN_ON(crypto_skcipher_reqsize(tfm) >
+   MAX_SYNC_SKCIPHER_REQSIZE))
+   return ERR_PTR(-ENOSPC);
}
 
if (crypto_skcipher_get_flags(tfm) & CRYPTO_TFM_NEED_KEY)
@@ -507,17 +522,6 @@ static inline int crypto_skcipher_decrypt(struct skcipher_request *req)
  * skcipher handle to the crypto_skcipher_* API calls.
  */
 
-/**
- * crypto_skcipher_reqsize() - obtain size of the request data structure
- * @tfm: cipher handle
- *
- * Return: number of bytes
- */
-static inline unsigned int crypto_skcipher_reqsize(struct crypto_skcipher *tfm)
-{
-   return tfm->reqsize;
-}
-
 /**
  * skcipher_request_set_tfm() - update cipher handle reference in request
  * @req: request handle to be modified
-- 
2.17.1



[PATCH v2 0/4] crypto: skcipher - Remove VLA usage

2018-09-06 Thread Kees Cook
This removes VLAs[1] from SKCIPHER_REQUEST_ON_STACK by making sure that
on-stack requests are being used only on non-ASYNC algorithms and that
enough space has been reserved.

v2:
- Instead of globally failing large reqsizes, limit to only non-ASYNC users
  of the on-stack request.
- Remove unused tfm argument after VLA removal.

-Kees

[1] 
https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com

Kees Cook (4):
  crypto: skcipher - Consolidate encrypt/decrypt sanity check
  crypto: skcipher - Enforce non-ASYNC for on-stack requests
  crypto: skcipher - Remove VLA usage for SKCIPHER_REQUEST_ON_STACK
  crypto: skcipher - Remove unused argument to SKCIPHER_REQUEST_ON_STACK()

 arch/s390/crypto/aes_s390.c   |  8 +-
 arch/x86/crypto/fpu.c |  4 +-
 crypto/algif_aead.c   |  2 +-
 crypto/authenc.c  |  2 +-
 crypto/authencesn.c   |  2 +-
 crypto/cryptd.c   |  4 +-
 crypto/echainiv.c |  2 +-
 crypto/gcm.c  |  2 +-
 crypto/seqiv.c|  2 +-
 drivers/block/cryptoloop.c|  2 +-
 drivers/crypto/axis/artpec6_crypto.c  |  2 +-
 drivers/crypto/ccp/ccp-crypto-aes-xts.c   |  2 +-
 drivers/crypto/chelsio/chcr_algo.c|  2 +-
 drivers/crypto/mxs-dcp.c  |  2 +-
 drivers/crypto/omap-aes.c |  2 +-
 drivers/crypto/picoxcell_crypto.c |  2 +-
 drivers/crypto/qce/ablkcipher.c   |  2 +-
 drivers/crypto/sahara.c   |  8 +-
 drivers/crypto/vmx/aes_cbc.c  |  4 +-
 drivers/crypto/vmx/aes_ctr.c  |  2 +-
 drivers/crypto/vmx/aes_xts.c  |  2 +-
 drivers/net/ppp/ppp_mppe.c|  6 +-
 drivers/staging/rtl8192e/rtllib_crypt_tkip.c  |  4 +-
 drivers/staging/rtl8192e/rtllib_crypt_wep.c   |  4 +-
 .../rtl8192u/ieee80211/ieee80211_crypt_tkip.c |  4 +-
 .../rtl8192u/ieee80211/ieee80211_crypt_wep.c  |  4 +-
 drivers/usb/wusbcore/crypto.c |  2 +-
 include/crypto/skcipher.h | 74 ++-
 net/ceph/crypto.c |  2 +-
 net/mac802154/llsec.c |  4 +-
 net/rxrpc/rxkad.c | 10 +--
 net/sunrpc/auth_gss/gss_krb5_crypto.c | 14 ++--
 net/wireless/lib80211_crypt_tkip.c|  4 +-
 net/wireless/lib80211_crypt_wep.c |  4 +-
 34 files changed, 116 insertions(+), 80 deletions(-)

-- 
2.17.1



[PATCH v2 2/4] crypto: skcipher - Enforce non-ASYNC for on-stack requests

2018-09-06 Thread Kees Cook
Check at use-time whether an skcipher request is on the stack. If it
is, enforce that it must be backed by a synchronous algorithm, as is
required:

  https://www.redhat.com/archives/dm-devel/2018-January/msg00087.html

Co-developed-by: Ard Biesheuvel 
Signed-off-by: Kees Cook 
---
 include/crypto/skcipher.h | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/crypto/skcipher.h b/include/crypto/skcipher.h
index 6e954d398e0f..3aabd5d098ed 100644
--- a/include/crypto/skcipher.h
+++ b/include/crypto/skcipher.h
@@ -19,6 +19,7 @@
 
 /**
  * struct skcipher_request - Symmetric key cipher request
+ * @__onstack: 1 if the request was allocated by SKCIPHER_REQUEST_ON_STACK
  * @cryptlen: Number of bytes to encrypt or decrypt
  * @iv: Initialisation Vector
  * @src: Source SG list
@@ -27,6 +28,7 @@
  * @__ctx: Start of private context data
  */
 struct skcipher_request {
+   unsigned char __onstack;
unsigned int cryptlen;
 
u8 *iv;
@@ -139,9 +141,12 @@ struct skcipher_alg {
struct crypto_alg base;
 };
 
+/*
+ * This must only ever be used with synchronous algorithms.
+ */
 #define SKCIPHER_REQUEST_ON_STACK(name, tfm) \
char __##name##_desc[sizeof(struct skcipher_request) + \
-   crypto_skcipher_reqsize(tfm)] CRYPTO_MINALIGN_ATTR; \
+   crypto_skcipher_reqsize(tfm)] CRYPTO_MINALIGN_ATTR = { 1 }; \
struct skcipher_request *name = (void *)__##name##_desc
 
 /**
@@ -437,6 +442,12 @@ static inline struct crypto_skcipher *crypto_skcipher_reqtfm_check(
 {
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
 
+   if (req->__onstack) {
+   if (WARN_ON(crypto_skcipher_alg(tfm)->base.cra_flags &
+   CRYPTO_ALG_ASYNC))
+   return ERR_PTR(-EINVAL);
+   }
+
if (crypto_skcipher_get_flags(tfm) & CRYPTO_TFM_NEED_KEY)
return ERR_PTR(-ENOKEY);
 
-- 
2.17.1



[PATCH 4/4] crypto: skcipher - Remove unused argument to SKCIPHER_REQUEST_ON_STACK()

2018-09-06 Thread Kees Cook
Since the size is now fixed, there is no need to include the tfm
argument. This removes it from the definition and callers.

Suggested-by: Alexander Stein 
Signed-off-by: Kees Cook 
---
 arch/s390/crypto/aes_s390.c|  8 
 arch/x86/crypto/fpu.c  |  4 ++--
 crypto/algif_aead.c|  2 +-
 crypto/authenc.c   |  2 +-
 crypto/authencesn.c|  2 +-
 crypto/cryptd.c|  4 ++--
 crypto/echainiv.c  |  2 +-
 crypto/gcm.c   |  2 +-
 crypto/seqiv.c |  2 +-
 drivers/block/cryptoloop.c |  2 +-
 drivers/crypto/axis/artpec6_crypto.c   |  2 +-
 drivers/crypto/ccp/ccp-crypto-aes-xts.c|  2 +-
 drivers/crypto/chelsio/chcr_algo.c |  2 +-
 drivers/crypto/mxs-dcp.c   |  2 +-
 drivers/crypto/omap-aes.c  |  2 +-
 drivers/crypto/picoxcell_crypto.c  |  2 +-
 drivers/crypto/qce/ablkcipher.c|  2 +-
 drivers/crypto/sahara.c|  8 
 drivers/crypto/vmx/aes_cbc.c   |  4 ++--
 drivers/crypto/vmx/aes_ctr.c   |  2 +-
 drivers/crypto/vmx/aes_xts.c   |  2 +-
 drivers/net/ppp/ppp_mppe.c |  6 +++---
 drivers/staging/rtl8192e/rtllib_crypt_tkip.c   |  4 ++--
 drivers/staging/rtl8192e/rtllib_crypt_wep.c|  4 ++--
 .../rtl8192u/ieee80211/ieee80211_crypt_tkip.c  |  4 ++--
 .../rtl8192u/ieee80211/ieee80211_crypt_wep.c   |  4 ++--
 drivers/usb/wusbcore/crypto.c  |  2 +-
 include/crypto/skcipher.h  |  2 +-
 net/ceph/crypto.c  |  2 +-
 net/mac802154/llsec.c  |  4 ++--
 net/rxrpc/rxkad.c  | 10 +-
 net/sunrpc/auth_gss/gss_krb5_crypto.c  | 14 +++---
 net/wireless/lib80211_crypt_tkip.c |  4 ++--
 net/wireless/lib80211_crypt_wep.c  |  4 ++--
 34 files changed, 62 insertions(+), 62 deletions(-)

diff --git a/arch/s390/crypto/aes_s390.c b/arch/s390/crypto/aes_s390.c
index c54cb26eb7f5..212c076d36a7 100644
--- a/arch/s390/crypto/aes_s390.c
+++ b/arch/s390/crypto/aes_s390.c
@@ -204,7 +204,7 @@ static int fallback_blk_dec(struct blkcipher_desc *desc,
unsigned int ret;
struct crypto_blkcipher *tfm = desc->tfm;
struct s390_aes_ctx *sctx = crypto_blkcipher_ctx(tfm);
-   SKCIPHER_REQUEST_ON_STACK(req, sctx->fallback.blk);
+   SKCIPHER_REQUEST_ON_STACK(req);
 
skcipher_request_set_tfm(req, sctx->fallback.blk);
skcipher_request_set_callback(req, desc->flags, NULL, NULL);
@@ -223,7 +223,7 @@ static int fallback_blk_enc(struct blkcipher_desc *desc,
unsigned int ret;
struct crypto_blkcipher *tfm = desc->tfm;
struct s390_aes_ctx *sctx = crypto_blkcipher_ctx(tfm);
-   SKCIPHER_REQUEST_ON_STACK(req, sctx->fallback.blk);
+   SKCIPHER_REQUEST_ON_STACK(req);
 
skcipher_request_set_tfm(req, sctx->fallback.blk);
skcipher_request_set_callback(req, desc->flags, NULL, NULL);
@@ -472,7 +472,7 @@ static int xts_fallback_decrypt(struct blkcipher_desc *desc,
 {
struct crypto_blkcipher *tfm = desc->tfm;
struct s390_xts_ctx *xts_ctx = crypto_blkcipher_ctx(tfm);
-   SKCIPHER_REQUEST_ON_STACK(req, xts_ctx->fallback);
+   SKCIPHER_REQUEST_ON_STACK(req);
unsigned int ret;
 
skcipher_request_set_tfm(req, xts_ctx->fallback);
@@ -491,7 +491,7 @@ static int xts_fallback_encrypt(struct blkcipher_desc *desc,
 {
struct crypto_blkcipher *tfm = desc->tfm;
struct s390_xts_ctx *xts_ctx = crypto_blkcipher_ctx(tfm);
-   SKCIPHER_REQUEST_ON_STACK(req, xts_ctx->fallback);
+   SKCIPHER_REQUEST_ON_STACK(req);
unsigned int ret;
 
skcipher_request_set_tfm(req, xts_ctx->fallback);
diff --git a/arch/x86/crypto/fpu.c b/arch/x86/crypto/fpu.c
index 406680476c52..746cea05c07e 100644
--- a/arch/x86/crypto/fpu.c
+++ b/arch/x86/crypto/fpu.c
@@ -44,7 +44,7 @@ static int crypto_fpu_encrypt(struct skcipher_request *req)
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct crypto_fpu_ctx *ctx = crypto_skcipher_ctx(tfm);
struct crypto_skcipher *child = ctx->child;
-   SKCIPHER_REQUEST_ON_STACK(subreq, child);
+   SKCIPHER_REQUEST_ON_STACK(subreq);
int err;
 
skcipher_request_set_tfm(subreq, child);
@@ -65,7 +65,7 @@ static int crypto_fpu_decrypt(struct skcipher_request *req)
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct crypto_fpu_ctx *ctx = crypto_skcipher_ctx(tfm);
struct crypto_skcipher *child = ctx->child;

Re: [PATCH 2/2] crypto: skcipher: Remove VLA usage for SKCIPHER_REQUEST_ON_STACK

2018-09-06 Thread Kees Cook
On Thu, Sep 6, 2018 at 1:22 PM, Kees Cook  wrote:
> On Wed, Sep 5, 2018 at 5:43 PM, Kees Cook  wrote:
>> On Wed, Sep 5, 2018 at 3:49 PM, Ard Biesheuvel
>>  wrote:
>>> On 5 September 2018 at 23:05, Kees Cook  wrote:
 On Wed, Sep 5, 2018 at 2:18 AM, Ard Biesheuvel
  wrote:
> On 4 September 2018 at 20:16, Kees Cook  wrote:
>> In the quest to remove all stack VLA usage from the kernel[1], this
>> caps the skcipher request size similar to other limits and adds a sanity
>> check at registration. Looking at instrumented tcrypt output, the largest
>> is for lrw:
>>
>> crypt: testing lrw(aes)
>> crypto_skcipher_set_reqsize: 8
>> crypto_skcipher_set_reqsize: 88
>> crypto_skcipher_set_reqsize: 472
>>
>
> Are you sure this is a representative sampling? I haven't double
> checked myself, but we have plenty of drivers for peripherals in
> drivers/crypto that implement block ciphers, and they would not turn
> up in tcrypt unless you are running on a platform that provides the
> hardware in question.

 Hrm, excellent point. Looking at this again:
 [...]
 And of the crt_ablkcipher.reqsize assignments/initializers, I found:

 ablkcipher reqsize:
 1   struct dcp_aes_req_ctx
 8   struct atmel_tdes_reqctx
 8   struct cryptd_blkcipher_request_ctx
 8   struct mtk_aes_reqctx
 8   struct omap_des_reqctx
 8   struct s5p_aes_reqctx
 8   struct sahara_aes_reqctx
 8   struct stm32_cryp_reqctx
 8   struct stm32_cryp_reqctx
 16  struct ablk_ctx
 24  struct atmel_aes_reqctx
 48  struct omap_aes_reqctx
 48  struct omap_aes_reqctx
 48  struct qat_crypto_request
 56  struct artpec6_crypto_request_context
 64  struct chcr_blkcipher_req_ctx
 80  struct spacc_req
 80  struct virtio_crypto_sym_request
 136 struct qce_cipher_reqctx
 168 struct n2_request_context
 328 struct ccp_des3_req_ctx
 400 struct ccp_aes_req_ctx
 536 struct hifn_request_context
 992 struct cvm_req_ctx
2456    struct iproc_reqctx_s
>
> All of these are ASYNC (they're all crt_ablkcipher), so IIUC, I can ignore 
> them.
>
 The base ablkcipher wrapper is:
 80  struct ablkcipher_request

 And in my earlier skcipher wrapper analysis, lrw was the largest
 skcipher wrapper:
 384 struct rctx

 iproc_reqctx_s is an extreme outlier, with cvm_req_ctx at a bit less than 
 half.

 Making this a 2920 byte fixed array doesn't seem sensible at all
 (though that's what's already possible to use with existing
 SKCIPHER_REQUEST_ON_STACK users).

 What's the right path forward here?

>>>
>>> The skcipher implementations based on crypto IP blocks are typically
>>> asynchronous, and I wouldn't be surprised if a fair number of
>>> SKCIPHER_REQUEST_ON_STACK() users are limited to synchronous
>>> skciphers.
>>
>> Looks similar to ahash vs shash. :) Yes, so nearly all
>> crypto_alloc_skcipher() users explicitly mask away ASYNC. What's left
>> appears to be:
>>
>> crypto/drbg.c:  sk_tfm = crypto_alloc_skcipher(ctr_name, 0, 0);
>> crypto/tcrypt.c:tfm = crypto_alloc_skcipher(algo, 0, async ? 0
>> : CRYPTO_ALG_ASYNC);
>> drivers/crypto/omap-aes.c:  ctx->ctr =
>> crypto_alloc_skcipher("ecb(aes)", 0, 0);
>> drivers/md/dm-crypt.c:  cc->cipher_tfm.tfms[i] =
>> crypto_alloc_skcipher(ciphermode, 0, 0);
>> drivers/md/dm-integrity.c:  ic->journal_crypt =
>> crypto_alloc_skcipher(ic->journal_crypt_alg.alg_string, 0, 0);
>> fs/crypto/keyinfo.c:struct crypto_skcipher *tfm =
>> crypto_alloc_skcipher("ecb(aes)", 0, 0);
>> fs/crypto/keyinfo.c:ctfm = crypto_alloc_skcipher(mode->cipher_str, 0, 0);
>> fs/ecryptfs/crypto.c:   crypt_stat->tfm =
>> crypto_alloc_skcipher(full_alg_name, 0, 0);
>>
>> I'll cross-reference this with SKCIPHER_REQUEST_ON_STACK...
>
> None of these use SKCIPHER_REQUEST_ON_STACK that I can find.
>
>>> So we could formalize this and limit SKCIPHER_REQUEST_ON_STACK() to
>>> synchronous skciphers, which implies that the reqsize limit only has
>>> to apply synchronous skciphers as well. But before we can do this, we
>>> have to identify the remaining occurrences that allow asynchronous
>>> skciphers to be used, and replace them with heap allocations.
>>
>> Sounds good; thanks!
>
> crypto_init_skcipher_ops_blkcipher() doesn't touch reqsize at all, so
> the only places I can find it gets changed are with direct callers of
> crypto_skcipher_set_reqsize(), which, when wrapping a sync blkcipher
> start with a reqsize == 0. So, the remaining non-ASYNC callers ask
> for:
>
> 4   struct sun4i_cipher_req_ctx
> 96  struct crypto_rfc3686_req_ctx
> 375 sum:
> 160 crypto_skcipher_blocksize(cipher) (max)
> 152 struct crypto_cts_reqctx

Re: [PATCH 2/2] crypto: skcipher: Remove VLA usage for SKCIPHER_REQUEST_ON_STACK

2018-09-06 Thread Kees Cook
On Wed, Sep 5, 2018 at 5:43 PM, Kees Cook  wrote:
> On Wed, Sep 5, 2018 at 3:49 PM, Ard Biesheuvel
>  wrote:
>> On 5 September 2018 at 23:05, Kees Cook  wrote:
>>> On Wed, Sep 5, 2018 at 2:18 AM, Ard Biesheuvel
>>>  wrote:
 On 4 September 2018 at 20:16, Kees Cook  wrote:
> In the quest to remove all stack VLA usage from the kernel[1], this
> caps the skcipher request size similar to other limits and adds a sanity
> check at registration. Looking at instrumented tcrypt output, the largest
> is for lrw:
>
> crypt: testing lrw(aes)
> crypto_skcipher_set_reqsize: 8
> crypto_skcipher_set_reqsize: 88
> crypto_skcipher_set_reqsize: 472
>

 Are you sure this is a representative sampling? I haven't double
 checked myself, but we have plenty of drivers for peripherals in
 drivers/crypto that implement block ciphers, and they would not turn
 up in tcrypt unless you are running on a platform that provides the
 hardware in question.
>>>
>>> Hrm, excellent point. Looking at this again:
>>> [...]
>>> And of the crt_ablkcipher.reqsize assignments/initializers, I found:
>>>
>>> ablkcipher reqsize:
>>> 1   struct dcp_aes_req_ctx
>>> 8   struct atmel_tdes_reqctx
>>> 8   struct cryptd_blkcipher_request_ctx
>>> 8   struct mtk_aes_reqctx
>>> 8   struct omap_des_reqctx
>>> 8   struct s5p_aes_reqctx
>>> 8   struct sahara_aes_reqctx
>>> 8   struct stm32_cryp_reqctx
>>> 8   struct stm32_cryp_reqctx
>>> 16  struct ablk_ctx
>>> 24  struct atmel_aes_reqctx
>>> 48  struct omap_aes_reqctx
>>> 48  struct omap_aes_reqctx
>>> 48  struct qat_crypto_request
>>> 56  struct artpec6_crypto_request_context
>>> 64  struct chcr_blkcipher_req_ctx
>>> 80  struct spacc_req
>>> 80  struct virtio_crypto_sym_request
>>> 136 struct qce_cipher_reqctx
>>> 168 struct n2_request_context
>>> 328 struct ccp_des3_req_ctx
>>> 400 struct ccp_aes_req_ctx
>>> 536 struct hifn_request_context
>>> 992 struct cvm_req_ctx
>>> 2456    struct iproc_reqctx_s

All of these are ASYNC (they're all crt_ablkcipher), so IIUC, I can ignore them.

>>> The base ablkcipher wrapper is:
>>> 80  struct ablkcipher_request
>>>
>>> And in my earlier skcipher wrapper analysis, lrw was the largest
>>> skcipher wrapper:
>>> 384 struct rctx
>>>
>>> iproc_reqctx_s is an extreme outlier, with cvm_req_ctx at a bit less than 
>>> half.
>>>
>>> Making this a 2920 byte fixed array doesn't seem sensible at all
>>> (though that's what's already possible to use with existing
>>> SKCIPHER_REQUEST_ON_STACK users).
>>>
>>> What's the right path forward here?
>>>
>>
>> The skcipher implementations based on crypto IP blocks are typically
>> asynchronous, and I wouldn't be surprised if a fair number of
>> SKCIPHER_REQUEST_ON_STACK() users are limited to synchronous
>> skciphers.
>
> Looks similar to ahash vs shash. :) Yes, so nearly all
> crypto_alloc_skcipher() users explicitly mask away ASYNC. What's left
> appears to be:
>
> crypto/drbg.c:  sk_tfm = crypto_alloc_skcipher(ctr_name, 0, 0);
> crypto/tcrypt.c:tfm = crypto_alloc_skcipher(algo, 0, async ? 0
> : CRYPTO_ALG_ASYNC);
> drivers/crypto/omap-aes.c:  ctx->ctr =
> crypto_alloc_skcipher("ecb(aes)", 0, 0);
> drivers/md/dm-crypt.c:  cc->cipher_tfm.tfms[i] =
> crypto_alloc_skcipher(ciphermode, 0, 0);
> drivers/md/dm-integrity.c:  ic->journal_crypt =
> crypto_alloc_skcipher(ic->journal_crypt_alg.alg_string, 0, 0);
> fs/crypto/keyinfo.c:struct crypto_skcipher *tfm =
> crypto_alloc_skcipher("ecb(aes)", 0, 0);
> fs/crypto/keyinfo.c:ctfm = crypto_alloc_skcipher(mode->cipher_str, 0, 0);
> fs/ecryptfs/crypto.c:   crypt_stat->tfm =
> crypto_alloc_skcipher(full_alg_name, 0, 0);
>
> I'll cross-reference this with SKCIPHER_REQUEST_ON_STACK...

None of these use SKCIPHER_REQUEST_ON_STACK that I can find.

>> So we could formalize this and limit SKCIPHER_REQUEST_ON_STACK() to
>> synchronous skciphers, which implies that the reqsize limit only has
>> to apply synchronous skciphers as well. But before we can do this, we
>> have to identify the remaining occurrences that allow asynchronous
>> skciphers to be used, and replace them with heap allocations.
>
> Sounds good; thanks!

crypto_init_skcipher_ops_blkcipher() doesn't touch reqsize at all, so
the only places I can find it being changed are direct callers of
crypto_skcipher_set_reqsize(), which, when wrapping a sync blkcipher,
start with a reqsize == 0. So, the remaining non-ASYNC callers ask
for:

4   struct sun4i_cipher_req_ctx
96  struct crypto_rfc3686_req_ctx
375 sum:
160 crypto_skcipher_blocksize(cipher) (max)
152 struct crypto_cts_reqctx
63  align_mask (max)
384 struct rctx

So, following your patch to encrypt/decrypt, I can add the reqsize check
there. How does this look, on top of your patch?

---

Re: [PATCH] net/sock: move memory_allocated over to percpu_counter variables

2018-09-06 Thread Eric Dumazet
On Thu, Sep 6, 2018 at 12:21 PM Olof Johansson  wrote:
>
> Today these are all global shared variables per protocol, and in
> particular tcp_memory_allocated can get hot on a system with
> large number of CPUs and a substantial number of connections.
>
> Moving it over to a per-cpu variable makes it significantly cheaper,
> and the added overhead when summing up the percpu copies is still smaller
> than the cost of having a hot cacheline bouncing around.

I am curious. We never noticed contention on this variable, at least for TCP.

Please share some numbers with us.


[PATCH] net/sock: move memory_allocated over to percpu_counter variables

2018-09-06 Thread Olof Johansson
Today these are all global shared variables, one per protocol, and in
particular tcp_memory_allocated can get hot on a system with a large
number of CPUs and a substantial number of connections.

Moving it over to a per-cpu variable makes it significantly cheaper,
and the added overhead when summing up the percpu copies is still smaller
than the cost of having a hot cacheline bouncing around.
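
As a minimal sketch of the trade-off (illustrative only, not part of this
patch): percpu_counter updates only touch a CPU-local slot, while reads
have to sum across all CPUs, so the summing side is the comparatively
expensive operation and belongs off the hot path.

    #include <linux/init.h>
    #include <linux/gfp.h>
    #include <linux/percpu_counter.h>

    static struct percpu_counter example_allocated;

    static int __init example_counter_init(void)
    {
            return percpu_counter_init(&example_allocated, 0, GFP_KERNEL);
    }

    static void example_charge(int amt)
    {
            percpu_counter_add(&example_allocated, amt);    /* cheap, per-cpu */
    }

    static long example_total(void)
    {
            /* walks every CPU's slot; keep this off the fast path */
            return percpu_counter_sum_positive(&example_allocated);
    }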

Signed-off-by: Olof Johansson 
---
 crypto/af_alg.c | 10 --
 include/net/sctp/sctp.h |  3 ++-
 include/net/sock.h  | 12 ++--
 include/net/tcp.h   |  2 +-
 include/net/udp.h   |  2 +-
 net/core/sock.c |  5 -
 net/decnet/af_decnet.c  |  3 ++-
 net/ipv4/tcp.c  |  3 ++-
 net/ipv4/udp.c  |  4 +++-
 net/sctp/protocol.c |  6 ++
 net/sctp/socket.c   |  2 +-
 11 files changed, 36 insertions(+), 16 deletions(-)

diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index b053179e0bc5..1fd75a709d7b 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -29,7 +29,7 @@ struct alg_type_list {
struct list_head list;
 };
 
-static atomic_long_t alg_memory_allocated;
+static struct percpu_counter alg_memory_allocated;
 
 static struct proto alg_proto = {
.name   = "ALG",
@@ -1183,13 +1183,19 @@ static int __init af_alg_init(void)
if (err)
goto out;
 
-   err = sock_register(&alg_family);
+   err = percpu_counter_init(&alg_memory_allocated, 0, GFP_KERNEL);
if (err != 0)
goto out_unregister_proto;
 
+   err = sock_register(&alg_family);
+   if (err != 0)
+   goto out_free_percpu;
+
 out:
return err;
 
+out_free_percpu:
+   percpu_counter_destroy(&alg_memory_allocated);
 out_unregister_proto:
proto_unregister(&alg_proto);
goto out;
diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 8c2caa370e0f..270579cf310b 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -36,7 +36,7 @@
  *Sridhar Samudrala 
  *Ardelle Fan   
  *Ryan Layer
- *Kevin Gao  
+ *Kevin Gao 
  */
 
 #ifndef __net_sctp_h__
@@ -114,6 +114,7 @@ __poll_t sctp_poll(struct file *file, struct socket *sock,
 void sctp_sock_rfree(struct sk_buff *skb);
 void sctp_copy_sock(struct sock *newsk, struct sock *sk,
struct sctp_association *asoc);
+extern struct percpu_counter sctp_memory_allocated;
 extern struct percpu_counter sctp_sockets_allocated;
 int sctp_asconf_mgmt(struct sctp_sock *, struct sctp_sockaddr_entry *);
 struct sk_buff *sctp_skb_recv_datagram(struct sock *, int, int, int *);
diff --git a/include/net/sock.h b/include/net/sock.h
index 433f45fc2d68..45aed5e84b5d 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1114,7 +1114,7 @@ struct proto {
/* Memory pressure */
void(*enter_memory_pressure)(struct sock *sk);
void(*leave_memory_pressure)(struct sock *sk);
-   atomic_long_t   *memory_allocated;  /* Current allocated memory. */
+   struct percpu_counter   *memory_allocated;  /* Current allocated memory. */
struct percpu_counter   *sockets_allocated; /* Current number of sockets. */
/*
 * Pressure flag: try to collapse.
@@ -1237,19 +1237,19 @@ static inline bool sk_under_memory_pressure(const struct sock *sk)
 static inline long
 sk_memory_allocated(const struct sock *sk)
 {
-   return atomic_long_read(sk->sk_prot->memory_allocated);
+   return percpu_counter_sum_positive(sk->sk_prot->memory_allocated);
 }
 
-static inline long
+static inline void
 sk_memory_allocated_add(struct sock *sk, int amt)
 {
-   return atomic_long_add_return(amt, sk->sk_prot->memory_allocated);
+   percpu_counter_add(sk->sk_prot->memory_allocated, amt);
 }
 
 static inline void
 sk_memory_allocated_sub(struct sock *sk, int amt)
 {
-   atomic_long_sub(amt, sk->sk_prot->memory_allocated);
+   percpu_counter_sub(sk->sk_prot->memory_allocated, amt);
 }
 
 static inline void sk_sockets_allocated_dec(struct sock *sk)
@@ -1277,7 +1277,7 @@ proto_sockets_allocated_sum_positive(struct proto *prot)
 static inline long
 proto_memory_allocated(struct proto *prot)
 {
-   return atomic_long_read(prot->memory_allocated);
+   return percpu_counter_sum_positive(prot->memory_allocated);
 }
 
 static inline bool
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 770917d0caa7..2df1754cf3ab 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -248,7 +248,7 @@ extern long sysctl_tcp_mem[3];
 #define TCP_RACK_STATIC_REO_WND  0x2 /* Use static RACK reo wnd */
 #define TCP_RACK_NO_DUPTHRESH0x4 /* Do not use DUPACK threshold in RACK */
 
-extern atomic_long_t tcp_memory_allocated;
+extern struct percpu_counter tcp_memory_allocated;
 extern struct percpu_counter tcp_sockets_allocated;
 extern unsigned long tcp_memory_pressure;
 
diff 

Re: [PATCH 2/2] crypto: skcipher: Remove VLA usage for SKCIPHER_REQUEST_ON_STACK

2018-09-06 Thread Kees Cook
On Thu, Sep 6, 2018 at 7:49 AM, Ard Biesheuvel
 wrote:
> On 6 September 2018 at 15:11, Herbert Xu  wrote:
>> On Thu, Sep 06, 2018 at 11:29:41AM +0200, Ard Biesheuvel wrote:
>>>
>>> Perhaps not, but it is not enforced atm.
>>>
>>> In any case, limiting the reqsize is going to break things, so that
>>> needs to occur based on the sync/async nature of the algo. That also
>>> means we'll corrupt the stack if we ever end up using
>>> SKCIPHER_REQUEST_ON_STACK() with an async algo whose reqsize is
>>> greater than the sync reqsize limit, so I do think some additional
>>> sanity check is appropriate.
>>
>> I'd prefer compile-time based checks.  Perhaps we can introduce
>> a wrapper around crypto_skcipher, say crypto_skcipher_sync which
>> could then be used by SKCIPHER_REQUEST_ON_STACK to ensure that
>> only sync algorithms can use this construct.
>>
>
> That would require lots of changes in the callers, including ones that
> already take care to use sync algos only.
>
> How about we do something like the below instead?

Oh, I like this, thanks!

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH 1/7] vfio/sdmdev: Add documents for WarpDrive framework

2018-09-06 Thread Randy Dunlap
Hi,

On 09/02/2018 05:51 PM, Kenneth Lee wrote:
> From: Kenneth Lee 
> 
> WarpDrive is a common user space accelerator framework.  Its main component
> in Kernel is called sdmdev, Share Domain Mediated Device. It exposes
> the hardware capabilities to the user space via vfio-mdev. So processes in
> user land can obtain a "queue" by open the device and direct access the
> hardware MMIO space or do DMA operation via VFIO interface.
> 
> WarpDrive is intended to be used with Jean Philippe Brucker's SVA
> patchset to support multi-process. But This is not a must.  Without the
> SVA patches, WarpDrive can still work for one process for every hardware
> device.
> 
> This patch add detail documents for the framework.
> 
> Signed-off-by: Kenneth Lee 
> ---
>  Documentation/00-INDEX|   2 +
>  Documentation/warpdrive/warpdrive.rst | 100 
>  Documentation/warpdrive/wd-arch.svg   | 728 ++
>  3 files changed, 830 insertions(+)
>  create mode 100644 Documentation/warpdrive/warpdrive.rst
>  create mode 100644 Documentation/warpdrive/wd-arch.svg

> diff --git a/Documentation/warpdrive/warpdrive.rst b/Documentation/warpdrive/warpdrive.rst
> new file mode 100644
> index ..6d2a5d1e08c4
> --- /dev/null
> +++ b/Documentation/warpdrive/warpdrive.rst
> @@ -0,0 +1,100 @@
> +Introduction of WarpDrive
> +=
> +
> +*WarpDrive* is a general accelerator framework for user space. It intends to
> +provide interface for the user process to send request to hardware
> +accelerator without heavy user-kernel interaction cost.
> +
> +The *WarpDrive* user library is supposed to provide a pipe-based API, such 
> as:

Do you say "is supposed to" because it doesn't do that (yet)?
Or you could just change that to say:

   The WarpDrive user library provides a pipe-based API, such as:


> +::
> +int wd_request_queue(struct wd_queue *q);
> +void wd_release_queue(struct wd_queue *q);
> +
> +int wd_send(struct wd_queue *q, void *req);
> +int wd_recv(struct wd_queue *q, void **req);
> +int wd_recv_sync(struct wd_queue *q, void **req);
> +int wd_flush(struct wd_queue *q);
> +
> +*wd_request_queue* creates the pipe connection, *queue*, between the
> +application and the hardware. The application sends request and pulls the
> +answer back by asynchronized wd_send/wd_recv, which directly interact with 
> the
> +hardware (by MMIO or share memory) without syscall.
> +
> +*WarpDrive* maintains a unified application address space among all involved
> +accelerators.  With the following APIs: ::

Seems like an extra '.' there.  How about:

  accelerators with the following APIs: ::

> +
> +int wd_mem_share(struct wd_queue *q, const void *addr,
> + size_t size, int flags);
> +void wd_mem_unshare(struct wd_queue *q, const void *addr, size_t 
> size);
> +
> +The referred process space shared by these APIs can be directly referred by 
> the
> +hardware. The process can also dedicate its whole process space with flags,
> +*WD_SHARE_ALL* (not in this patch yet).
> +
> +The name *WarpDrive* is simply a cool and general name meaning the framework
> +makes the application faster. As it will be explained in this text later, the
> +facility in kernel is called *SDMDEV*, namely "Share Domain Mediated Device".
> +
> +
> +How does it work
> +
> +
> +*WarpDrive* is built upon *VFIO-MDEV*. The queue is wrapped as *mdev* in 
> VFIO.
> +So memory sharing can be done via standard VFIO standard DMA interface.
> +
> +The architecture is illustrated as follow figure:
> +
> +.. image:: wd-arch.svg
> +:alt: WarpDrive Architecture
> +
> +Accelerator driver shares its capability via *SDMDEV* API: ::
> +
> +vfio_sdmdev_register(struct vfio_sdmdev *sdmdev);
> +vfio_sdmdev_unregister(struct vfio_sdmdev *sdmdev);
> +vfio_sdmdev_wake_up(struct spimdev_queue *q);
> +
> +*vfio_sdmdev_register* is a helper function to register the hardware to the
> +*VFIO_MDEV* framework. The queue creation is done by *mdev* creation 
> interface.
> +
> +*WarpDrive* User library mmap the mdev to access its mmio space and shared

s/mmio/MMIO/

> +memory. Request can be sent to, or receive from, hardware in this mmap-ed
> +space until the queue is full or empty.
> +
> +The user library can wait on the queue by ioctl(VFIO_SDMDEV_CMD_WAIT) the 
> mdev
> +if the queue is full or empty. If the queue status is changed, the hardware
> +driver use *vfio_sdmdev_wake_up* to wake up the waiting process.
> +
> +
> +Multiple processes support
> +==
> +
> +In the latest mainline kernel (4.18) when this document is written,
> +multi-process is not supported in VFIO yet.
> +
> +Jean Philippe Brucker has a patchset to enable it[1]_. We have tested it
> +with our hardware (which is known as *D06*). It works well. *WarpDrive* relies
> +on them to support multiple processes. If it is not e

Re: KASAN: use-after-free Read in sha512_mb_flusher

2018-09-06 Thread Tim Chen
On 08/20/2018 07:46 PM, Eric Biggers wrote:
> On Wed, Aug 15, 2018 at 09:35:03AM -0700, syzbot wrote:
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:5ed5da74de9e Add linux-next specific files for 20180813
>> git tree:   linux-next
>> console output: https://syzkaller.appspot.com/x/log.txt?x=13fd82c440
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=18edf0289d1b5ab
>> dashboard link: https://syzkaller.appspot.com/bug?extid=94ee41ce442afe333d2e
>> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
>>
>> Unfortunately, I don't have any reproducer for this crash yet.
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+94ee41ce442afe333...@syzkaller.appspotmail.com
>>
>> RDX: 22c0 RSI: 0005 RDI: 0004
>> RBP: 009300a0 R08:  R09: 
>> R10: 0f6a R11: 0246 R12: 0006
>> R13: 004d3840 R14: 004c85ae R15: 0003
>> ==
>> BUG: KASAN: use-after-free in sha512_mb_flusher+0x572/0x610
>> arch/x86/crypto/sha512-mb/sha512_mb.c:940
>> Read of size 8 at addr 8801b35669d8 by task kworker/0:2/1879
>>
>> CPU: 0 PID: 1879 Comm: kworker/0:2 Not tainted 4.18.0-next-20180813+ #37
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> Google 01/01/2011
>> Workqueue: crypto mcryptd_flusher
>> Call Trace:
>>  __dump_stack lib/dump_stack.c:77 [inline]
>>  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
>>  print_address_description+0x6c/0x20b mm/kasan/report.c:256
>>  kasan_report_error mm/kasan/report.c:354 [inline]
>>  kasan_report.cold.7+0x242/0x30d mm/kasan/report.c:412
>>  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
>>  sha512_mb_flusher+0x572/0x610 arch/x86/crypto/sha512-mb/sha512_mb.c:940
>>  mcryptd_flusher+0x342/0x4b0 crypto/mcryptd.c:208
>>  process_one_work+0xc73/0x1aa0 kernel/workqueue.c:2153
>>  worker_thread+0x189/0x13c0 kernel/workqueue.c:2296
>>  kthread+0x35a/0x420 kernel/kthread.c:246
>>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415
>>
>> Allocated by task 18983:
>>  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
>>  set_track mm/kasan/kasan.c:460 [inline]
>>  kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
>>  __do_kmalloc mm/slab.c:3718 [inline]
>>  __kmalloc+0x14e/0x720 mm/slab.c:3727
>>  kmalloc include/linux/slab.h:518 [inline]
>>  sock_kmalloc+0x156/0x1f0 net/core/sock.c:1983
>>  hash_accept_parent_nokey+0x58/0x2e0 crypto/algif_hash.c:438
>>  hash_accept_parent+0x5b/0x80 crypto/algif_hash.c:465
>>  af_alg_accept+0x127/0x7d0 crypto/af_alg.c:296
>>  alg_accept+0x46/0x60 crypto/af_alg.c:332
>>  __sys_accept4+0x3b2/0x8a0 net/socket.c:1589
>>  __do_sys_accept4 net/socket.c:1624 [inline]
>>  __se_sys_accept4 net/socket.c:1621 [inline]
>>  __x64_sys_accept4+0x97/0xf0 net/socket.c:1621
>>  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>
>> Freed by task 18979:
>>  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
>>  set_track mm/kasan/kasan.c:460 [inline]
>>  __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
>>  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
>>  __cache_free mm/slab.c:3498 [inline]
>>  kfree+0xd9/0x210 mm/slab.c:3813
>>  __sock_kfree_s net/core/sock.c:2004 [inline]
>>  sock_kfree_s+0x29/0x60 net/core/sock.c:2010
>>  hash_sock_destruct+0x157/0x1c0 crypto/algif_hash.c:427
>>  __sk_destruct+0x107/0xa60 net/core/sock.c:1560
>>  sk_destruct+0x78/0x90 net/core/sock.c:1595
>>  __sk_free+0xcf/0x300 net/core/sock.c:1606
>>  sk_free+0x42/0x50 net/core/sock.c:1617
>>  sock_put include/net/sock.h:1691 [inline]
>>  af_alg_release+0x6e/0x90 crypto/af_alg.c:126
>>  __sock_release+0xd7/0x250 net/socket.c:580
>>  sock_close+0x19/0x20 net/socket.c:1140
>>  __fput+0x376/0x8a0 fs/file_table.c:279
>>  fput+0x15/0x20 fs/file_table.c:312
>>  task_work_run+0x1e8/0x2a0 kernel/task_work.c:113
>>  tracehook_notify_resume include/linux/tracehook.h:193 [inline]
>>  exit_to_usermode_loop+0x318/0x380 arch/x86/entry/common.c:166
>>  prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
>>  syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
>>  do_syscall_64+0x6be/0x820 arch/x86/entry/common.c:293
>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>
>> The buggy address belongs to the object at 8801b3566600
>>  which belongs to the cache kmalloc-2048 of size 2048
>> The buggy address is located 984 bytes inside of
>>  2048-byte region [8801b3566600, 8801b3566e00)
>> The buggy address belongs to the page:
>> page:ea0006cd5980 count:1 mapcount:0 mapping:8801dac00c40 index:0x0
>> compound_mapcount: 0
>> flags: 0x2fffc008100(slab|head)
>> raw: 02fffc008100 ea0006cf2c08 ea000764ef08 8801dac00c40
>> raw:  8801b3566600 00010003 
>> page dumped because: kasan: 

Re: [PATCH 2/2] crypto: skcipher: Remove VLA usage for SKCIPHER_REQUEST_ON_STACK

2018-09-06 Thread Ard Biesheuvel
On 6 September 2018 at 15:11, Herbert Xu  wrote:
> On Thu, Sep 06, 2018 at 11:29:41AM +0200, Ard Biesheuvel wrote:
>>
>> Perhaps not, but it is not enforced atm.
>>
>> In any case, limiting the reqsize is going to break things, so that
>> needs to occur based on the sync/async nature of the algo. That also
>> means we'll corrupt the stack if we ever end up using
>> SKCIPHER_REQUEST_ON_STACK() with an async algo whose reqsize is
>> greater than the sync reqsize limit, so I do think some additional
>> sanity check is appropriate.
>
> I'd prefer compile-time based checks.  Perhaps we can introduce
> a wrapper around crypto_skcipher, say crypto_skcipher_sync which
> could then be used by SKCIPHER_REQUEST_ON_STACK to ensure that
> only sync algorithms can use this construct.
>

That would require lots of changes in the callers, including ones that
already take care to use sync algos only.

How about we do something like the below instead?

diff --git a/include/crypto/skcipher.h b/include/crypto/skcipher.h
index 2f327f090c3e..ace707d59cd9 100644
--- a/include/crypto/skcipher.h
+++ b/include/crypto/skcipher.h
@@ -19,6 +19,7 @@

 /**
  * struct skcipher_request - Symmetric key cipher request
+ * @__onstack: 1 if the request was allocated by SKCIPHER_REQUEST_ON_STACK
  * @cryptlen: Number of bytes to encrypt or decrypt
  * @iv: Initialisation Vector
  * @src: Source SG list
@@ -27,6 +28,7 @@
  * @__ctx: Start of private context data
  */
 struct skcipher_request {
+   unsigned char __onstack;
unsigned int cryptlen;

u8 *iv;
@@ -141,7 +143,7 @@ struct skcipher_alg {

 #define SKCIPHER_REQUEST_ON_STACK(name, tfm) \
char __##name##_desc[sizeof(struct skcipher_request) + \
-   crypto_skcipher_reqsize(tfm)] CRYPTO_MINALIGN_ATTR; \
+   crypto_skcipher_reqsize(tfm)] CRYPTO_MINALIGN_ATTR = { 1 }; \
struct skcipher_request *name = (void *)__##name##_desc

 /**
@@ -437,6 +439,10 @@ static inline int crypto_skcipher_encrypt(struct
skcipher_request *req)
 {
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);

+   if (req->__onstack &&
+   (crypto_skcipher_alg(tfm)->base.cra_flags & CRYPTO_ALG_ASYNC))
+   return -EINVAL;
+
if (crypto_skcipher_get_flags(tfm) & CRYPTO_TFM_NEED_KEY)
return -ENOKEY;

@@ -458,6 +464,10 @@ static inline int crypto_skcipher_decrypt(struct
skcipher_request *req)
 {
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);

+   if (req->__onstack &&
+   (crypto_skcipher_alg(tfm)->base.cra_flags & CRYPTO_ALG_ASYNC))
+   return -EINVAL;
+
if (crypto_skcipher_get_flags(tfm) & CRYPTO_TFM_NEED_KEY)
return -ENOKEY;
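
For context, the pattern the check is meant to allow, an on-stack request
paired with an explicitly synchronous tfm, would look roughly like this
(illustrative sketch only, not part of the diff):

static int encrypt_one(struct scatterlist *src, struct scatterlist *dst,
                       unsigned int len, const u8 *key, u8 *iv)
{
        struct crypto_skcipher *tfm;
        int err;

        /*
         * Masking CRYPTO_ALG_ASYNC at allocation time guarantees a
         * synchronous implementation, so the on-stack request is safe
         * and the __onstack check above can never fire.
         */
        tfm = crypto_alloc_skcipher("cbc(aes)", 0, CRYPTO_ALG_ASYNC);
        if (IS_ERR(tfm))
                return PTR_ERR(tfm);

        err = crypto_skcipher_setkey(tfm, key, AES_KEYSIZE_128);
        if (!err) {
                SKCIPHER_REQUEST_ON_STACK(req, tfm);

                skcipher_request_set_tfm(req, tfm);
                skcipher_request_set_callback(req, 0, NULL, NULL);
                skcipher_request_set_crypt(req, src, dst, len, iv);
                err = crypto_skcipher_encrypt(req);
                skcipher_request_zero(req);
        }

        crypto_free_skcipher(tfm);
        return err;
}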


Re: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive

2018-09-06 Thread Jerome Glisse
On Thu, Sep 06, 2018 at 05:45:32PM +0800, Kenneth Lee wrote:
> On Tue, Sep 04, 2018 at 10:15:09AM -0600, Alex Williamson wrote:
> > Date: Tue, 4 Sep 2018 10:15:09 -0600
> > From: Alex Williamson 
> > To: Jerome Glisse 
> > CC: Kenneth Lee , Jonathan Corbet ,
> >  Herbert Xu , "David S . Miller"
> >  , Joerg Roedel , Kenneth Lee
> >  , Hao Fang , Zhou Wang
> >  , Zaibo Xu , Philippe
> >  Ombredanne , Greg Kroah-Hartman
> >  , Thomas Gleixner ,
> >  linux-...@vger.kernel.org, linux-ker...@vger.kernel.org,
> >  linux-crypto@vger.kernel.org, io...@lists.linux-foundation.org,
> >  k...@vger.kernel.org, linux-accelerat...@lists.ozlabs.org, Lu Baolu
> >  , Sanjay Kumar ,
> >  linux...@huawei.com
> > Subject: Re: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive
> > Message-ID: <20180904101509.62314...@t450s.home>
> > 
> > On Tue, 4 Sep 2018 11:00:19 -0400
> > Jerome Glisse  wrote:
> > 
> > > On Mon, Sep 03, 2018 at 08:51:57AM +0800, Kenneth Lee wrote:
> > > > From: Kenneth Lee 
> > > > 
> > > > WarpDrive is an accelerator framework to expose the hardware 
> > > > capabilities
> > > > directly to the user space. It makes use of the exist vfio and vfio-mdev
> > > > facilities. So the user application can send request and DMA to the
> > > > hardware without interaction with the kernel. This removes the latency
> > > > of syscall.
> > > > 
> > > > WarpDrive is the name for the whole framework. The component in kernel
> > > > is called SDMDEV, Share Domain Mediated Device. Driver driver exposes 
> > > > its
> > > > hardware resource by registering to SDMDEV as a VFIO-Mdev. So the user
> > > > library of WarpDrive can access it via VFIO interface.
> > > > 
> > > > The patchset contains document for the detail. Please refer to it for 
> > > > more
> > > > information.
> > > > 
> > > > This patchset is intended to be used with Jean Philippe Brucker's SVA
> > > > patch [1], which enables not only IO side page fault, but also PASID
> > > > support to IOMMU and VFIO.
> > > > 
> > > > With these features, WarpDrive can support non-pinned memory and
> > > > multi-process in the same accelerator device.  We tested it in our SoC
> > > > integrated Accelerator (board ID: D06, Chip ID: HIP08). A reference work
> > > > tree can be found here: [2].
> > > > 
> > > > But it is not mandatory. This patchset is tested in the latest mainline
> > > > kernel without the SVA patches.  So it supports only one process for 
> > > > each
> > > > accelerator.
> > > > 
> > > > We have noticed the IOMMU aware mdev RFC announced recently [3].
> > > > 
> > > > The IOMMU aware mdev has similar idea but different intention comparing 
> > > > to
> > > > WarpDrive. It intends to dedicate part of the hardware resource to a VM.
> > > > And the design is supposed to be used with Scalable I/O Virtualization.
> > > > While sdmdev is intended to share the hardware resource with a big 
> > > > amount
> > > > of processes.  It just requires the hardware supporting address
> > > > translation per process (PCIE's PASID or ARM SMMU's substream ID).
> > > > 
> > > > But we don't see serious confliction on both design. We believe they 
> > > > can be
> > > > normalized as one.
> > > >   
> > > 
> > > So once again i do not understand why you are trying to do things
> > > this way. Kernel already have tons of example of everything you
> > > want to do without a new framework. Moreover i believe you are
> > > confuse by VFIO. To me VFIO is for VM not to create general device
> > > driver frame work.
> > 
> > VFIO is a userspace driver framework, the VM use case just happens to
> > be a rather prolific one.  VFIO was never intended to be solely a VM
> > device interface and has several other userspace users, notably DPDK
> > and SPDK, an NVMe backend in QEMU, a userspace NVMe driver, a ruby
> > wrapper, and perhaps others that I'm not aware of.  Whether vfio is
> > appropriate interface here might certainly still be a debatable topic,
> > but I would strongly disagree with your last sentence above.  Thanks,
> > 
> > Alex
> > 
> 
> Yes, that is also my standpoint here.
> 
> > > So here is your use case as i understand it. You have a device
> > > with a limited number of command queues (can be just one) and in
> > > some case it can support SVA/SVM (when hardware support it and it
> > > is not disabled). Final requirement is being able to schedule cmds
> > > from userspace without ioctl. All of this exists already exists
> > > upstream in few device drivers.
> > > 
> > > 
> > > So here is how every body else is doing it. Please explain why
> > > this does not work.
> > > 
> > > 1 Userspace open device file driver. Kernel device driver create
> > >   a context and associate it with on open. This context can be
> > >   uniq to the process and can bind hardware resources (like a
> > >   command queue) to the process.
> > > 2 Userspace bind/acquire a commands queue and initialize it with
> > >   an ioctl on the device file. Through that ioctl userspace can
> > >  

Re: [PATCH 2/2] crypto: skcipher: Remove VLA usage for SKCIPHER_REQUEST_ON_STACK

2018-09-06 Thread Herbert Xu
On Thu, Sep 06, 2018 at 11:29:41AM +0200, Ard Biesheuvel wrote:
>
> Perhaps not, but it is not enforced atm.
> 
> In any case, limiting the reqsize is going to break things, so that
> needs to occur based on the sync/async nature of the algo. That also
> means we'll corrupt the stack if we ever end up using
> SKCIPHER_REQUEST_ON_STACK() with an async algo whose reqsize is
> greater than the sync reqsize limit, so I do think some additional
> sanity check is appropriate.

I'd prefer compile-time based checks.  Perhaps we can introduce
a wrapper around crypto_skcipher, say crypto_skcipher_sync which
could then be used by SKCIPHER_REQUEST_ON_STACK to ensure that
only sync algorithms can use this construct.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
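
One possible shape of such a wrapper, purely as a sketch (no such type exists
at this point in the thread; the names are illustrative):

struct crypto_sync_skcipher {
        struct crypto_skcipher base;
};

static inline struct crypto_sync_skcipher *
crypto_alloc_sync_skcipher(const char *alg_name, u32 type, u32 mask)
{
        struct crypto_skcipher *tfm;

        /* Restrict the lookup to synchronous implementations only. */
        tfm = crypto_alloc_skcipher(alg_name, type, mask | CRYPTO_ALG_ASYNC);
        if (IS_ERR(tfm))
                return ERR_CAST(tfm);

        return container_of(tfm, struct crypto_sync_skcipher, base);
}

SKCIPHER_REQUEST_ON_STACK() (or an _onstack variant of it) could then take a
struct crypto_sync_skcipher * and refuse to compile when handed a plain
crypto_skcipher, and the reqsize of synchronous algorithms is small enough to
cap.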


Re: [PATCH] fscrypt: remove CRYPTO_CTR dependency

2018-09-06 Thread Ard Biesheuvel
On 5 September 2018 at 21:24, Eric Biggers  wrote:
> From: Eric Biggers 
>
> fscrypt doesn't use the CTR mode of operation for anything, so there's
> no need to select CRYPTO_CTR.  It was added by commit 71dea01ea2ed
> ("ext4 crypto: require CONFIG_CRYPTO_CTR if ext4 encryption is
> enabled").  But, I've been unable to identify the arm64 crypto bug it
> was supposedly working around.
>
> I suspect the issue was seen only on some old Android device kernel
> (circa 3.10?).  So if the fix wasn't mistaken, the real bug is probably
> already fixed.  Or maybe it was actually a bug in a non-upstream crypto
> driver.
>
> So, remove the dependency.  If it turns out there's actually still a
> bug, we'll fix it properly.
>
> Signed-off-by: Eric Biggers 

Acked-by: Ard Biesheuvel 

This may be related to

11e3b725cfc2 crypto: arm64/aes-blk - honour iv_out requirement in CBC
and CTR modes

given that the commit in question mentions CTS. How it actually works
around the issue is unclear to me, though.




> ---
>  fs/crypto/Kconfig | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/fs/crypto/Kconfig b/fs/crypto/Kconfig
> index 02b7d91c92310..284b589b4774d 100644
> --- a/fs/crypto/Kconfig
> +++ b/fs/crypto/Kconfig
> @@ -6,7 +6,6 @@ config FS_ENCRYPTION
> select CRYPTO_ECB
> select CRYPTO_XTS
> select CRYPTO_CTS
> -   select CRYPTO_CTR
> select CRYPTO_SHA256
> select KEYS
> help
> --
> 2.19.0.rc2.392.g5ba43deb5a-goog
>


Re: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive

2018-09-06 Thread Kenneth Lee
On Tue, Sep 04, 2018 at 10:15:09AM -0600, Alex Williamson wrote:
> Date: Tue, 4 Sep 2018 10:15:09 -0600
> From: Alex Williamson 
> To: Jerome Glisse 
> CC: Kenneth Lee , Jonathan Corbet ,
>  Herbert Xu , "David S . Miller"
>  , Joerg Roedel , Kenneth Lee
>  , Hao Fang , Zhou Wang
>  , Zaibo Xu , Philippe
>  Ombredanne , Greg Kroah-Hartman
>  , Thomas Gleixner ,
>  linux-...@vger.kernel.org, linux-ker...@vger.kernel.org,
>  linux-crypto@vger.kernel.org, io...@lists.linux-foundation.org,
>  k...@vger.kernel.org, linux-accelerat...@lists.ozlabs.org, Lu Baolu
>  , Sanjay Kumar ,
>  linux...@huawei.com
> Subject: Re: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive
> Message-ID: <20180904101509.62314...@t450s.home>
> 
> On Tue, 4 Sep 2018 11:00:19 -0400
> Jerome Glisse  wrote:
> 
> > On Mon, Sep 03, 2018 at 08:51:57AM +0800, Kenneth Lee wrote:
> > > From: Kenneth Lee 
> > > 
> > > WarpDrive is an accelerator framework to expose the hardware capabilities
> > > directly to the user space. It makes use of the exist vfio and vfio-mdev
> > > facilities. So the user application can send request and DMA to the
> > > hardware without interaction with the kernel. This removes the latency
> > > of syscall.
> > > 
> > > WarpDrive is the name for the whole framework. The component in kernel
> > > is called SDMDEV, Share Domain Mediated Device. Driver driver exposes its
> > > hardware resource by registering to SDMDEV as a VFIO-Mdev. So the user
> > > library of WarpDrive can access it via VFIO interface.
> > > 
> > > The patchset contains document for the detail. Please refer to it for more
> > > information.
> > > 
> > > This patchset is intended to be used with Jean Philippe Brucker's SVA
> > > patch [1], which enables not only IO side page fault, but also PASID
> > > support to IOMMU and VFIO.
> > > 
> > > With these features, WarpDrive can support non-pinned memory and
> > > multi-process in the same accelerator device.  We tested it in our SoC
> > > integrated Accelerator (board ID: D06, Chip ID: HIP08). A reference work
> > > tree can be found here: [2].
> > > 
> > > But it is not mandatory. This patchset is tested in the latest mainline
> > > kernel without the SVA patches.  So it supports only one process for each
> > > accelerator.
> > > 
> > > We have noticed the IOMMU aware mdev RFC announced recently [3].
> > > 
> > > The IOMMU aware mdev has similar idea but different intention comparing to
> > > WarpDrive. It intends to dedicate part of the hardware resource to a VM.
> > > And the design is supposed to be used with Scalable I/O Virtualization.
> > > While sdmdev is intended to share the hardware resource with a big amount
> > > of processes.  It just requires the hardware supporting address
> > > translation per process (PCIE's PASID or ARM SMMU's substream ID).
> > > 
> > > But we don't see serious confliction on both design. We believe they can 
> > > be
> > > normalized as one.
> > >   
> > 
> > So once again i do not understand why you are trying to do things
> > this way. Kernel already have tons of example of everything you
> > want to do without a new framework. Moreover i believe you are
> > confuse by VFIO. To me VFIO is for VM not to create general device
> > driver frame work.
> 
> VFIO is a userspace driver framework, the VM use case just happens to
> be a rather prolific one.  VFIO was never intended to be solely a VM
> device interface and has several other userspace users, notably DPDK
> and SPDK, an NVMe backend in QEMU, a userspace NVMe driver, a ruby
> wrapper, and perhaps others that I'm not aware of.  Whether vfio is
> appropriate interface here might certainly still be a debatable topic,
> but I would strongly disagree with your last sentence above.  Thanks,
> 
> Alex
> 

Yes, that is also my standpoint here.

> > So here is your use case as i understand it. You have a device
> > with a limited number of command queues (can be just one) and in
> > some case it can support SVA/SVM (when hardware support it and it
> > is not disabled). Final requirement is being able to schedule cmds
> > from userspace without ioctl. All of this exists already exists
> > upstream in few device drivers.
> > 
> > 
> > So here is how every body else is doing it. Please explain why
> > this does not work.
> > 
> > 1 Userspace open device file driver. Kernel device driver create
> >   a context and associate it with on open. This context can be
> >   uniq to the process and can bind hardware resources (like a
> >   command queue) to the process.
> > 2 Userspace bind/acquire a commands queue and initialize it with
> >   an ioctl on the device file. Through that ioctl userspace can
> >   be inform wether either SVA/SVM works for the device. If SVA/
> >   SVM works then kernel device driver bind the process to the
> >   device as part of this ioctl.
> > 3 If SVM/SVA does not work userspace do an ioctl to create dma
> >   buffer or something that does exactly the same thing.
> > 4 Userspace mmap the comm
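
Put together, the conventional flow described in these steps looks roughly
like this from userspace (the device node, ioctl number, and mapping size
below are invented for illustration; no existing driver is being referenced):

#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

#define ACC_IOCTL_BIND_QUEUE    _IOR('A', 0, uint32_t)  /* invented */
#define ACC_QUEUE_REGION_SIZE   4096                    /* invented */

int acc_open_queue(void **cmd_area)
{
        uint32_t has_sva = 0;
        int fd;

        fd = open("/dev/accel0", O_RDWR);               /* step 1 */
        if (fd < 0)
                return -1;

        /* step 2: bind a command queue; the driver reports whether
         * SVA/SVM is usable for this process */
        if (ioctl(fd, ACC_IOCTL_BIND_QUEUE, &has_sva) < 0)
                goto fail;

        /* step 3 (no SVA): a further ioctl would create a DMA buffer here */

        /* step 4: map the command queue so requests can be issued from
         * userspace without further syscalls */
        *cmd_area = mmap(NULL, ACC_QUEUE_REGION_SIZE, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (*cmd_area == MAP_FAILED)
                goto fail;

        return fd;

fail:
        close(fd);
        return -1;
}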

Re: [PATCH 2/2] crypto: skcipher: Remove VLA usage for SKCIPHER_REQUEST_ON_STACK

2018-09-06 Thread Ard Biesheuvel
On 6 September 2018 at 10:51, Herbert Xu  wrote:
> On Thu, Sep 06, 2018 at 10:11:59AM +0200, Ard Biesheuvel wrote:
>>
>> That way, we will almost certainly oops on a NULL pointer dereference
>> right after, but we at least avoid the stack corruption.
>
> A crash is just as bad as a BUG_ON.
>
> Is this even a real problem? Do we have any users of this construct
> that is using it on async algorithms?
>

Perhaps not, but it is not enforced atm.

In any case, limiting the reqsize is going to break things, so that
needs to occur based on the sync/async nature of the algo. That also
means we'll corrupt the stack if we ever end up using
SKCIPHER_REQUEST_ON_STACK() with an async algo whose reqsize is
greater than the sync reqsize limit, so I do think some additional
sanity check is appropriate.


Re: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive

2018-09-06 Thread Kenneth Lee
On Mon, Sep 03, 2018 at 10:32:16AM +0800, Lu Baolu wrote:
> Date: Mon, 3 Sep 2018 10:32:16 +0800
> From: Lu Baolu 
> To: Kenneth Lee , Jonathan Corbet ,
>  Herbert Xu , "David S . Miller"
>  , Joerg Roedel , Alex Williamson
>  , Kenneth Lee , Hao
>  Fang , Zhou Wang , Zaibo Xu
>  , Philippe Ombredanne , Greg
>  Kroah-Hartman , Thomas Gleixner
>  , linux-...@vger.kernel.org,
>  linux-ker...@vger.kernel.org, linux-crypto@vger.kernel.org,
>  io...@lists.linux-foundation.org, k...@vger.kernel.org,
>  linux-accelerat...@lists.ozlabs.org, Sanjay Kumar
>  
> CC: baolu...@linux.intel.com, linux...@huawei.com
> Subject: Re: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive
> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>  Thunderbird/52.9.1
> Message-ID: <81edb8ff-d046-34e5-aee7-d8564e251...@linux.intel.com>
> 
> Hi,
> 
> On 09/03/2018 08:51 AM, Kenneth Lee wrote:
> >From: Kenneth Lee 
> >
> >WarpDrive is an accelerator framework to expose the hardware capabilities
> >directly to the user space. It makes use of the exist vfio and vfio-mdev
> >facilities. So the user application can send request and DMA to the
> >hardware without interaction with the kernel. This removes the latency
> >of syscall.
> >
> >WarpDrive is the name for the whole framework. The component in kernel
> >is called SDMDEV, Share Domain Mediated Device. Driver driver exposes its
> >hardware resource by registering to SDMDEV as a VFIO-Mdev. So the user
> >library of WarpDrive can access it via VFIO interface.
> >
> >The patchset contains document for the detail. Please refer to it for more
> >information.
> >
> >This patchset is intended to be used with Jean Philippe Brucker's SVA
> >patch [1], which enables not only IO side page fault, but also PASID
> >support to IOMMU and VFIO.
> >
> >With these features, WarpDrive can support non-pinned memory and
> >multi-process in the same accelerator device.  We tested it in our SoC
> >integrated Accelerator (board ID: D06, Chip ID: HIP08). A reference work
> >tree can be found here: [2].
> >
> >But it is not mandatory. This patchset is tested in the latest mainline
> >kernel without the SVA patches.  So it supports only one process for each
> >accelerator.
> >
> >We have noticed the IOMMU aware mdev RFC announced recently [3].
> >
> >The IOMMU aware mdev has similar idea but different intention comparing to
> >WarpDrive. It intends to dedicate part of the hardware resource to a VM.
> >And the design is supposed to be used with Scalable I/O Virtualization.
> >While sdmdev is intended to share the hardware resource with a big amount
> >of processes.  It just requires the hardware supporting address
> >translation per process (PCIE's PASID or ARM SMMU's substream ID).
> >
> >But we don't see serious confliction on both design. We believe they can be
> >normalized as one.
> >
> >The patch 1 is document of the framework. The patch 2 and 3 add sdmdev
> >support. The patch 4, 5 and 6 is drivers for Hislicon's ZIP Accelerator
> >which is registered to both crypto and warpdrive(sdmdev) and can be
> >used from kernel or user space at the same time. The patch 7 is a user
> >space sample demonstrating how WarpDrive works.
> >
> >
> >Change History:
> >V2 changed from V1:
> > 1. Change kernel framework name from SPIMDEV (Share Parent IOMMU
> >Mdev) to SDMDEV (Share Domain Mdev).
> > 2. Allocate Hardware Resource when a new mdev is created (While
> >it is allocated when the mdev is openned)
> > 3. Unmap pages from the shared domain when the sdmdev iommu group is
> >detached. (This procedure is necessary, but missed in V1)
> > 4. Update document accordingly.
> > 5. Rebase to the latest kernel (4.19.0-rc1)
> > 
> > According the review comment on RFCv1, We did try to use dma-buf
> > as back end of WarpDrive. It can work properly with the current
> > solution [4], but it cannot make use of process's
> > own memory address space directly. This is important to many
> > acceleration scenario. So dma-buf will be taken as a backup
> > alternative for noiommu scenario, it will be added in the future
> > version.
> >
> >
> >Refernces:
> >[1] https://www.spinics.net/lists/kernel/msg2651481.html
> >[2] 
> >https://github.com/Kenneth-Lee/linux-kernel-warpdrive/tree/warpdrive-sva-v0.5
> >[3] https://lkml.org/lkml/2018/7/22/34
> 
> Please refer to the latest version posted here for discussion.
> 
> https://lkml.org/lkml/2018/8/30/107

Sure. Thank you.

> 
> Best regards,
> Lu Baolu

-- 
-Kenneth(Hisilicon)



Re: [PATCH 7/7] vfio/sdmdev: add user sample

2018-09-06 Thread Kenneth Lee
On Sun, Sep 02, 2018 at 07:25:12PM -0700, Randy Dunlap wrote:
> Date: Sun, 2 Sep 2018 19:25:12 -0700
> From: Randy Dunlap 
> To: Kenneth Lee , Jonathan Corbet ,
>  Herbert Xu , "David S . Miller"
>  , Joerg Roedel , Alex Williamson
>  , Kenneth Lee , Hao
>  Fang , Zhou Wang , Zaibo Xu
>  , Philippe Ombredanne , Greg
>  Kroah-Hartman , Thomas Gleixner
>  , linux-...@vger.kernel.org,
>  linux-ker...@vger.kernel.org, linux-crypto@vger.kernel.org,
>  io...@lists.linux-foundation.org, k...@vger.kernel.org,
>  linux-accelerat...@lists.ozlabs.org, Lu Baolu ,
>  Sanjay Kumar 
> CC: linux...@huawei.com
> Subject: Re: [PATCH 7/7] vfio/sdmdev: add user sample
> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>  Thunderbird/52.9.1
> Message-ID: 
> 
> On 09/02/2018 05:52 PM, Kenneth Lee wrote:
> > From: Kenneth Lee 
> > 
> > This is the sample code to demostrate how WrapDrive user application
> > should be.
> > 
> > It contains:
> > 
> > 1. wd.[ch], the common library to provide WrapDrive interface.
> 
> WarpDrive
> 
> > 2. drv/*, the user driver to access the hardware upon spimdev
> > 3. test/*, the test application to use WrapDrive interface to access the
> 
>  WarpDrive
> 
> >hardware queue(s) of the accelerator.
> > 
> > The Hisilicon HIP08 ZIP accelerator is used in this sample.
> > 
> > Signed-off-by: Zaibo Xu 
> > Signed-off-by: Kenneth Lee 
> > Signed-off-by: Hao Fang 
> > Signed-off-by: Zhou Wang 
> > ---
> >  samples/warpdrive/AUTHORS  |   2 +
> >  samples/warpdrive/ChangeLog|   1 +
> >  samples/warpdrive/Makefile.am  |   9 +
> >  samples/warpdrive/NEWS |   1 +
> >  samples/warpdrive/README   |  32 +++
> >  samples/warpdrive/autogen.sh   |   3 +
> >  samples/warpdrive/cleanup.sh   |  13 ++
> >  samples/warpdrive/configure.ac |  52 +
> >  samples/warpdrive/drv/hisi_qm_udrv.c   | 223 ++
> >  samples/warpdrive/drv/hisi_qm_udrv.h   |  53 +
> >  samples/warpdrive/test/Makefile.am |   7 +
> >  samples/warpdrive/test/comp_hw.h   |  23 ++
> >  samples/warpdrive/test/test_hisi_zip.c | 206 +
> >  samples/warpdrive/wd.c | 309 +
> >  samples/warpdrive/wd.h | 154 
> >  samples/warpdrive/wd_adapter.c |  74 ++
> >  samples/warpdrive/wd_adapter.h |  43 
> >  17 files changed, 1205 insertions(+)
> >  create mode 100644 samples/warpdrive/AUTHORS
> >  create mode 100644 samples/warpdrive/ChangeLog
> >  create mode 100644 samples/warpdrive/Makefile.am
> >  create mode 100644 samples/warpdrive/NEWS
> >  create mode 100644 samples/warpdrive/README
> >  create mode 100755 samples/warpdrive/autogen.sh
> >  create mode 100755 samples/warpdrive/cleanup.sh
> >  create mode 100644 samples/warpdrive/configure.ac
> >  create mode 100644 samples/warpdrive/drv/hisi_qm_udrv.c
> >  create mode 100644 samples/warpdrive/drv/hisi_qm_udrv.h
> >  create mode 100644 samples/warpdrive/test/Makefile.am
> >  create mode 100644 samples/warpdrive/test/comp_hw.h
> >  create mode 100644 samples/warpdrive/test/test_hisi_zip.c
> >  create mode 100644 samples/warpdrive/wd.c
> >  create mode 100644 samples/warpdrive/wd.h
> >  create mode 100644 samples/warpdrive/wd_adapter.c
> >  create mode 100644 samples/warpdrive/wd_adapter.h
> 
> > diff --git a/samples/warpdrive/README b/samples/warpdrive/README
> > new file mode 100644
> > index ..3adf66b112fc
> > --- /dev/null
> > +++ b/samples/warpdrive/README
> > @@ -0,0 +1,32 @@
> > +WD User Land Demonstration
> > +==
> > +
> > +This directory contains some applications and libraries to demonstrate how 
> > a
> > +
> > +WrapDrive application can be constructed.
> 
>WarpDrive
> 
> > +
> > +
> > +As a demo, we try to make it simple and clear for understanding. It is not
> > +
> > +supposed to be used in business scenario.
> > +
> > +
> > +The directory contains the following elements:
> > +
> > +wd.[ch]
> > +   A demonstration WrapDrive fundamental library which wraps the basic
> 
>   WarpDrive
> 
> > +   operations to the WrapDrive-ed device.
> 
>   WarpDrive
> 
> > +
> > +wd_adapter.[ch]
> > +   User driver adaptor for wd.[ch]
> > +
> > +wd_utils.[ch]
> > +   Some utility functions used by WD and its drivers
> > +
> > +drv/*
> > +   User drivers. It helps to fulfill the semantic of wd.[ch] for
> > +   particular hardware
> > +
> > +test/*
> > +   Test applications to use the wrapdrive library
> 
>warpdrive
> 
> -- 
> ~Randy

Thank you, will change them all in the coming version.

-- 
-Kenneth(Hisilicon)



Re: [PATCH 6/7] crypto: add sdmdev support to Hisilicon QM

2018-09-06 Thread Kenneth Lee
On Sun, Sep 02, 2018 at 07:19:03PM -0700, Randy Dunlap wrote:
> Date: Sun, 2 Sep 2018 19:19:03 -0700
> From: Randy Dunlap 
> To: Kenneth Lee , Jonathan Corbet ,
>  Herbert Xu , "David S . Miller"
>  , Joerg Roedel , Alex Williamson
>  , Kenneth Lee , Hao
>  Fang , Zhou Wang , Zaibo Xu
>  , Philippe Ombredanne , Greg
>  Kroah-Hartman , Thomas Gleixner
>  , linux-...@vger.kernel.org,
>  linux-ker...@vger.kernel.org, linux-crypto@vger.kernel.org,
>  io...@lists.linux-foundation.org, k...@vger.kernel.org,
>  linux-accelerat...@lists.ozlabs.org, Lu Baolu ,
>  Sanjay Kumar 
> CC: linux...@huawei.com
> Subject: Re: [PATCH 6/7] crypto: add sdmdev support to Hisilicon QM
> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>  Thunderbird/52.9.1
> Message-ID: <46e57b2b-756e-e256-5b3c-30749d865...@infradead.org>
> 
> On 09/02/2018 05:52 PM, Kenneth Lee wrote:
> > diff --git a/drivers/crypto/hisilicon/Kconfig 
> > b/drivers/crypto/hisilicon/Kconfig
> > index 1d155708cd69..b85fab48fdab 100644
> > --- a/drivers/crypto/hisilicon/Kconfig
> > +++ b/drivers/crypto/hisilicon/Kconfig
> > @@ -17,6 +17,16 @@ config CRYPTO_DEV_HISI_SEC
> >   To compile this as a module, choose M here: the module
> >   will be called hisi_sec.
> >  
> > +config CRYPTO_DEV_HISI_SDMDEV
> > +   bool "Enable SDMDEV interface"
> > +   depends on CRYPTO_DEV_HISILICON
> > +   select VFIO_SDMDEV
> > +   help
> > + Enable this enable the SDMDEV, "shared IOMMU Domain Mediated Device"
> 
> At a minimum:
> Enable this to enable the SDMDEV,
> 
> although that could be done better.  Maybe just:
> Enable the SDMDEV "shared IOMMU Domain Mediated Device"
> 
> 
> > + interface for all Hisilicon accelerators if they can. The SDMDEV
> 
> probably drop "if they can":  accelerators.  The SDMDEV interface
> 
> > + enable the WarpDrive user space accelerator driver to access the
> 
> enables

Thank you, will change them all in the coming version.

> 
> > + hardware function directly.
> > +
> 
> 
> -- 
> ~Randy

-- 
-Kenneth(Hisilicon)





Re: [PATCH 4/7] crypto: add hisilicon Queue Manager driver

2018-09-06 Thread Kenneth Lee
On Sun, Sep 02, 2018 at 07:15:07PM -0700, Randy Dunlap wrote:
> Date: Sun, 2 Sep 2018 19:15:07 -0700
> From: Randy Dunlap 
> To: Kenneth Lee , Jonathan Corbet ,
>  Herbert Xu , "David S . Miller"
>  , Joerg Roedel , Alex Williamson
>  , Kenneth Lee , Hao
>  Fang , Zhou Wang , Zaibo Xu
>  , Philippe Ombredanne , Greg
>  Kroah-Hartman , Thomas Gleixner
>  , linux-...@vger.kernel.org,
>  linux-ker...@vger.kernel.org, linux-crypto@vger.kernel.org,
>  io...@lists.linux-foundation.org, k...@vger.kernel.org,
>  linux-accelerat...@lists.ozlabs.org, Lu Baolu ,
>  Sanjay Kumar 
> CC: linux...@huawei.com
> Subject: Re: [PATCH 4/7] crypto: add hisilicon Queue Manager driver
> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>  Thunderbird/52.9.1
> Message-ID: <4e46a451-d1cd-ac68-84b4-20792fdbc...@infradead.org>
> 
> On 09/02/2018 05:52 PM, Kenneth Lee wrote:
> > diff --git a/drivers/crypto/hisilicon/Kconfig 
> > b/drivers/crypto/hisilicon/Kconfig
> > index 8ca9c503bcb0..02a6eef84101 100644
> > --- a/drivers/crypto/hisilicon/Kconfig
> > +++ b/drivers/crypto/hisilicon/Kconfig
> > @@ -1,4 +1,8 @@
> >  # SPDX-License-Identifier: GPL-2.0
> > +config CRYPTO_DEV_HISILICON
> > +   tristate "Support for HISILICON CRYPTO ACCELERATOR"
> > +   help
> > + Enable this to use Hisilicon Hardware Accelerators
> 
>   Accelerators.

Thanks, will change it in next version.

> 
> 
> -- 
> ~Randy

-- 
-Kenneth(Hisilicon)





Re: [PATCH 3/7] vfio: add sdmdev support

2018-09-06 Thread Kenneth Lee
On Mon, Sep 03, 2018 at 10:55:57AM +0800, Lu Baolu wrote:
> Date: Mon, 3 Sep 2018 10:55:57 +0800
> From: Lu Baolu 
> To: Kenneth Lee , Jonathan Corbet ,
>  Herbert Xu , "David S . Miller"
>  , Joerg Roedel , Alex Williamson
>  , Kenneth Lee , Hao
>  Fang , Zhou Wang , Zaibo Xu
>  , Philippe Ombredanne , Greg
>  Kroah-Hartman , Thomas Gleixner
>  , linux-...@vger.kernel.org,
>  linux-ker...@vger.kernel.org, linux-crypto@vger.kernel.org,
>  io...@lists.linux-foundation.org, k...@vger.kernel.org,
>  linux-accelerat...@lists.ozlabs.org, Sanjay Kumar
>  
> CC: linux...@huawei.com, baolu...@linux.intel.com
> Subject: Re: [PATCH 3/7] vfio: add sdmdev support
> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>  Thunderbird/52.9.1
> Message-ID: <4ea51b20-dcc1-db32-18eb-24a004ab9...@linux.intel.com>
> 
> Hi,
> 
> On 09/03/2018 08:52 AM, Kenneth Lee wrote:
> >From: Kenneth Lee 
> >
> >SDMDEV is "Share Domain Mdev". It is a vfio-mdev. But differ from
> >the general vfio-mdev, it shares its parent's IOMMU. If Multi-PASID
> >support is enabled in the IOMMU (not yet in the current kernel HEAD),
> >multiple process can share the IOMMU by different PASID. If it is not
> >support, only one process can share the IOMMU with the kernel driver.
> >
> 
> If only for share domain purpose, I don't think it's necessary to create
> a new device type.
> 

Yes, if it were ONLY for the share domain purpose. But we also need to share the interrupt.

> >Currently only the vfio type-1 driver is updated to make it to be aware
> >of.
> >
> >Signed-off-by: Kenneth Lee 
> >Signed-off-by: Zaibo Xu 
> >Signed-off-by: Zhou Wang 
> >---
> >  drivers/vfio/Kconfig  |   1 +
> >  drivers/vfio/Makefile |   1 +
> >  drivers/vfio/sdmdev/Kconfig   |  10 +
> >  drivers/vfio/sdmdev/Makefile  |   3 +
> >  drivers/vfio/sdmdev/vfio_sdmdev.c | 363 ++
> >  drivers/vfio/vfio_iommu_type1.c   | 151 -
> >  include/linux/vfio_sdmdev.h   |  96 
> >  include/uapi/linux/vfio_sdmdev.h  |  29 +++
> >  8 files changed, 648 insertions(+), 6 deletions(-)
> >  create mode 100644 drivers/vfio/sdmdev/Kconfig
> >  create mode 100644 drivers/vfio/sdmdev/Makefile
> >  create mode 100644 drivers/vfio/sdmdev/vfio_sdmdev.c
> >  create mode 100644 include/linux/vfio_sdmdev.h
> >  create mode 100644 include/uapi/linux/vfio_sdmdev.h
> >
> 
> [--cut for short --]
> 
> >diff --git a/drivers/vfio/vfio_iommu_type1.c 
> >b/drivers/vfio/vfio_iommu_type1.c
> >index d9fd3188615d..ba73231d8692 100644
> >--- a/drivers/vfio/vfio_iommu_type1.c
> >+++ b/drivers/vfio/vfio_iommu_type1.c
> >@@ -41,6 +41,7 @@
> >  #include 
> >  #include 
> >  #include 
> >+#include 
> >  #define DRIVER_VERSION  "0.2"
> >  #define DRIVER_AUTHOR   "Alex Williamson "
> >@@ -89,6 +90,8 @@ struct vfio_dma {
> >  };
> >  struct vfio_group {
> >+/* iommu_group of mdev's parent device */
> >+struct iommu_group  *parent_group;
> > struct iommu_group  *iommu_group;
> > struct list_headnext;
> >  };
> >@@ -1327,6 +1330,109 @@ static bool vfio_iommu_has_sw_msi(struct iommu_group 
> >*group, phys_addr_t *base)
> > return ret;
> >  }
> >+/* return 0 if the device is not sdmdev.
> >+ * return 1 if the device is sdmdev, the data will be updated with parent
> >+ *  device's group.
> >+ * return -errno if other error.
> >+ */
> >+static int vfio_sdmdev_type(struct device *dev, void *data)
> >+{
> >+struct iommu_group **group = data;
> >+struct iommu_group *pgroup;
> >+int (*_is_sdmdev)(struct device *dev);
> >+struct device *pdev;
> >+int ret = 1;
> >+
> >+/* vfio_sdmdev module is not configurated */
> >+_is_sdmdev = symbol_get(vfio_sdmdev_is_sdmdev);
> >+if (!_is_sdmdev)
> >+return 0;
> >+
> >+/* check if it belongs to vfio_sdmdev device */
> >+if (!_is_sdmdev(dev)) {
> >+ret = 0;
> >+goto out;
> >+}
> >+
> >+pdev = dev->parent;
> >+pgroup = iommu_group_get(pdev);
> >+if (!pgroup) {
> >+ret = -ENODEV;
> >+goto out;
> >+}
> >+
> >+if (group) {
> >+/* check if all parent devices is the same */
> >+if (*group && *group != pgroup)
> >+ret = -ENODEV;
> >+else
> >+*group = pgroup;
> >+}
> >+
> >+iommu_group_put(pgroup);
> >+
> >+out:
> >+symbol_put(vfio_sdmdev_is_sdmdev);
> >+
> >+return ret;
> >+}
> >+
> >+/* return 0 or -errno */
> >+static int vfio_sdmdev_bus(struct device *dev, void *data)
> >+{
> >+struct bus_type **bus = data;
> >+
> >+if (!dev->bus)
> >+return -ENODEV;
> >+
> >+/* ensure all devices has the same bus_type */
> >+if (*bus && *bus != dev->bus)
> >+return -EINVAL;
> >+
> >+*bus = dev->bus;
> >+return 0;
> >+}
> >+
> >+/* return 0 means it is not sd group, 1 means it is, or -EXXX for error */
> >+static int vfio_iommu_type1_attach_sdgrou

Re: [PATCH 2/2] crypto: skcipher: Remove VLA usage for SKCIPHER_REQUEST_ON_STACK

2018-09-06 Thread Herbert Xu
On Thu, Sep 06, 2018 at 10:11:59AM +0200, Ard Biesheuvel wrote:
>
> That way, we will almost certainly oops on a NULL pointer dereference
> right after, but we at least avoid the stack corruption.

A crash is just as bad as a BUG_ON.

Is this even a real problem? Do we have any users of this construct
that is using it on async algorithms?

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 2/2] crypto: skcipher: Remove VLA usage for SKCIPHER_REQUEST_ON_STACK

2018-09-06 Thread Gilad Ben-Yossef
On Thu, Sep 6, 2018 at 10:21 AM, Ard Biesheuvel
 wrote:

>>> The skcipher implementations based on crypto IP blocks are typically
>>> asynchronous, and I wouldn't be surprised if a fair number of
>>> SKCIPHER_REQUEST_ON_STACK() users are limited to synchronous
>>> skciphers.
>>
>> According to Herbert, SKCIPHER_REQUEST_ON_STACK() may only be used
>> for invoking synchronous ciphers.
>>
>> In fact, due to the way the crypto API is built, if you try using it
>> with any transformation that uses DMA
>> you would most probably end up trying to DMA to/from the stack which
>> as we all know is not a great idea.
>>
>
> Ah yes, I found [0] which contains that quote.
>
> So that means that Kees can disregard the occurrences that are async
> only, but it still implies that we cannot limit the reqsize like he
> proposes unless we take the sync/async nature into account.
> It also means we should probably BUG() or WARN() in
> SKCIPHER_REQUEST_ON_STACK() when used with an async algo.
>
>>>
>>> So we could formalize this and limit SKCIPHER_REQUEST_ON_STACK() to
>>> synchronous skciphers, which implies that the reqsize limit only has
>>> to apply synchronous skciphers as well. But before we can do this, we
>>> have to identify the remaining occurrences that allow asynchronous
>>> skciphers to be used, and replace them with heap allocations.
>>
>> Any such occurrences are almost for sure broken already due to the DMA
>> issue I've mentioned.
>>
>
> I am not convinced of this. The skcipher request struct does not
> contain any payload buffers, and whether the algo specific ctx struct
> is used for DMA is completely up to the driver. So I am quite sure
> there are plenty of async algos that work fine with
> SKCIPHER_REQUEST_ON_STACK() and vmapped stacks.


You are right that it is up to the driver, but the cost is an extra memory
allocation and release *per request* for any per-request data that needs to
be DMAable beyond the actual plaintext and ciphertext buffers, such as the
IV. So driver writers have an incentive against doing that :-)
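
Concretely, the kind of per-request bounce being described looks roughly like
this inside a driver (sketch only; the function and buffer names are invented):

static int drv_handle_one(struct device *dev, struct skcipher_request *req)
{
        gfp_t gfp = (req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP) ?
                        GFP_KERNEL : GFP_ATOMIC;
        dma_addr_t iv_dma;
        u8 *iv_buf;

        /*
         * req->iv may point into the caller's stack, which must never be
         * handed to the DMA API, so bounce it through a heap copy for
         * every single request.
         */
        iv_buf = kmemdup(req->iv, AES_BLOCK_SIZE, gfp);
        if (!iv_buf)
                return -ENOMEM;

        iv_dma = dma_map_single(dev, iv_buf, AES_BLOCK_SIZE, DMA_TO_DEVICE);
        if (dma_mapping_error(dev, iv_dma)) {
                kfree(iv_buf);
                return -ENOMEM;
        }

        /* ... program the engine with iv_dma and the src/dst SG lists ... */

        dma_unmap_single(dev, iv_dma, AES_BLOCK_SIZE, DMA_TO_DEVICE);
        kfree(iv_buf);
        return 0;
}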

Gilad


-- 
Gilad Ben-Yossef
Chief Coffee Drinker

values of β will give rise to dom!


Re: [PATCH 2/2] crypto: skcipher: Remove VLA usage for SKCIPHER_REQUEST_ON_STACK

2018-09-06 Thread Ard Biesheuvel
On 6 September 2018 at 09:21, Ard Biesheuvel  wrote:
> On 6 September 2018 at 06:53, Gilad Ben-Yossef  wrote:
>> On Thu, Sep 6, 2018 at 1:49 AM, Ard Biesheuvel
>>  wrote:
>>> On 5 September 2018 at 23:05, Kees Cook  wrote:
 On Wed, Sep 5, 2018 at 2:18 AM, Ard Biesheuvel
  wrote:
> On 4 September 2018 at 20:16, Kees Cook  wrote:
>> In the quest to remove all stack VLA usage from the kernel[1], this
>> caps the skcipher request size similar to other limits and adds a sanity
>> check at registration. Looking at instrumented tcrypt output, the largest
>> is for lrw:
>>
>> crypt: testing lrw(aes)
>> crypto_skcipher_set_reqsize: 8
>> crypto_skcipher_set_reqsize: 88
>> crypto_skcipher_set_reqsize: 472
>>
>
> Are you sure this is a representative sampling? I haven't double
> checked myself, but we have plenty of drivers for peripherals in
> drivers/crypto that implement block ciphers, and they would not turn
> up in tcrypt unless you are running on a platform that provides the
> hardware in question.

 Hrm, excellent point. Looking at this again:

 The core part of the VLA is using this in the ON_STACK macro:

 static inline unsigned int crypto_skcipher_reqsize(struct crypto_skcipher 
 *tfm)
 {
 return tfm->reqsize;
 }

 I don't find any struct crypto_skcipher .reqsize static initializers,
 and the initial reqsize is here:

 static int crypto_init_skcipher_ops_ablkcipher(struct crypto_tfm *tfm)
 {
 ...
 skcipher->reqsize = crypto_ablkcipher_reqsize(ablkcipher) +
 sizeof(struct ablkcipher_request);

 with updates via crypto_skcipher_set_reqsize().

 So I have to examine ablkcipher reqsize too:

 static inline unsigned int crypto_ablkcipher_reqsize(
 struct crypto_ablkcipher *tfm)
 {
 return crypto_ablkcipher_crt(tfm)->reqsize;
 }

 And of the crt_ablkcipher.reqsize assignments/initializers, I found:

 ablkcipher reqsize:
 1   struct dcp_aes_req_ctx
 8   struct atmel_tdes_reqctx
 8   struct cryptd_blkcipher_request_ctx
 8   struct mtk_aes_reqctx
 8   struct omap_des_reqctx
 8   struct s5p_aes_reqctx
 8   struct sahara_aes_reqctx
 8   struct stm32_cryp_reqctx
 8   struct stm32_cryp_reqctx
 16  struct ablk_ctx
 24  struct atmel_aes_reqctx
 48  struct omap_aes_reqctx
 48  struct omap_aes_reqctx
 48  struct qat_crypto_request
 56  struct artpec6_crypto_request_context
 64  struct chcr_blkcipher_req_ctx
 80  struct spacc_req
 80  struct virtio_crypto_sym_request
 136 struct qce_cipher_reqctx
 168 struct n2_request_context
 328 struct ccp_des3_req_ctx
 400 struct ccp_aes_req_ctx
 536 struct hifn_request_context
 992 struct cvm_req_ctx
 2456struct iproc_reqctx_s

 The base ablkcipher wrapper is:
 80  struct ablkcipher_request

 And in my earlier skcipher wrapper analysis, lrw was the largest
 skcipher wrapper:
 384 struct rctx

 iproc_reqctx_s is an extreme outlier, with cvm_req_ctx at a bit less than 
 half.

 Making this a 2920 byte fixed array doesn't seem sensible at all
 (though that's what's already possible to use with existing
 SKCIPHER_REQUEST_ON_STACK users).

 What's the right path forward here?

>>>
>>> The skcipher implementations based on crypto IP blocks are typically
>>> asynchronous, and I wouldn't be surprised if a fair number of
>>> SKCIPHER_REQUEST_ON_STACK() users are limited to synchronous
>>> skciphers.
>>
>> According to Herbert, SKCIPHER_REQUEST_ON_STACK() may only be used
>> for invoking synchronous ciphers.
>>
>> In fact, due to the way the crypto API is built, if you try using it
>> with any transformation that uses DMA
>> you would most probably end up trying to DMA to/from the stack which
>> as we all know is not a great idea.
>>
>
> Ah yes, I found [0] which contains that quote.
>
> So that means that Kees can disregard the occurrences that are async
> only, but it still implies that we cannot limit the reqsize like he
> proposes unless we take the sync/async nature into account.
> It also means we should probably BUG() or WARN() in
> SKCIPHER_REQUEST_ON_STACK() when used with an async algo.
>

Something like this should do the trick:

diff --git a/include/crypto/skcipher.h b/include/crypto/skcipher.h
index 2f327f090c3e..70584e0f26bc 100644
--- a/include/crypto/skcipher.h
+++ b/include/crypto/skcipher.h
@@ -142,7 +142,9 @@ struct skcipher_alg {
 #define SKCIPHER_REQUEST_ON_STACK(name, tfm) \
char __##name##_desc[sizeof(struct skcipher_request) + \
crypto_skcipher_reqsize(tfm)] CRYPTO_MINALIGN_ATTR; \
-  

Re: [PATCH 3/7] vfio: add sdmdev support

2018-09-06 Thread Kenneth Lee
On Sun, Sep 02, 2018 at 07:11:12PM -0700, Randy Dunlap wrote:
> Date: Sun, 2 Sep 2018 19:11:12 -0700
> From: Randy Dunlap 
> To: Kenneth Lee , Jonathan Corbet ,
>  Herbert Xu , "David S . Miller"
>  , Joerg Roedel , Alex Williamson
>  , Kenneth Lee , Hao
>  Fang , Zhou Wang , Zaibo Xu
>  , Philippe Ombredanne , Greg
>  Kroah-Hartman , Thomas Gleixner
>  , linux-...@vger.kernel.org,
>  linux-ker...@vger.kernel.org, linux-crypto@vger.kernel.org,
>  io...@lists.linux-foundation.org, k...@vger.kernel.org,
>  linux-accelerat...@lists.ozlabs.org, Lu Baolu ,
>  Sanjay Kumar 
> CC: linux...@huawei.com
> Subject: Re: [PATCH 3/7] vfio: add sdmdev support
> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>  Thunderbird/52.9.1
> Message-ID: <0574e7dc-bed2-d2d0-3aa6-93590d54c...@infradead.org>
> 
> On 09/02/2018 05:52 PM, Kenneth Lee wrote:
> > diff --git a/drivers/vfio/sdmdev/Kconfig b/drivers/vfio/sdmdev/Kconfig
> > new file mode 100644
> > index ..51474272870d
> > --- /dev/null
> > +++ b/drivers/vfio/sdmdev/Kconfig
> > @@ -0,0 +1,10 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +config VFIO_SDMDEV
> > +   tristate "Support for Share Domain MDEV"
> > +   depends on VFIO_MDEV_DEVICE
> > +   help
> > + Support for VFIO Share Domain MDEV, which enables the kernel to
> > + support light weight hardware accelerator framework, WarpDrive.
> 
> lightweight
> 
Thank you, will fix it.
> > +
> > + To compile this as a module, choose M here: the module will be called
> > + sdmdev.
> 
> 
> thanks,
> -- 
> ~Randy

-- 
-Kenneth(Hisilicon)



Re: [PATCH 2/2] crypto: skcipher: Remove VLA usage for SKCIPHER_REQUEST_ON_STACK

2018-09-06 Thread Ard Biesheuvel
On 6 September 2018 at 06:53, Gilad Ben-Yossef  wrote:
> On Thu, Sep 6, 2018 at 1:49 AM, Ard Biesheuvel
>  wrote:
>> On 5 September 2018 at 23:05, Kees Cook  wrote:
>>> On Wed, Sep 5, 2018 at 2:18 AM, Ard Biesheuvel
>>>  wrote:
 On 4 September 2018 at 20:16, Kees Cook  wrote:
> In the quest to remove all stack VLA usage from the kernel[1], this
> caps the skcipher request size similar to other limits and adds a sanity
> check at registration. Looking at instrumented tcrypt output, the largest
> is for lrw:
>
> crypt: testing lrw(aes)
> crypto_skcipher_set_reqsize: 8
> crypto_skcipher_set_reqsize: 88
> crypto_skcipher_set_reqsize: 472
>

 Are you sure this is a representative sampling? I haven't double
 checked myself, but we have plenty of drivers for peripherals in
 drivers/crypto that implement block ciphers, and they would not turn
 up in tcrypt unless you are running on a platform that provides the
 hardware in question.
>>>
>>> Hrm, excellent point. Looking at this again:
>>>
>>> The core part of the VLA is using this in the ON_STACK macro:
>>>
>>> static inline unsigned int crypto_skcipher_reqsize(struct crypto_skcipher 
>>> *tfm)
>>> {
>>> return tfm->reqsize;
>>> }
>>>
>>> I don't find any struct crypto_skcipher .reqsize static initializers,
>>> and the initial reqsize is here:
>>>
>>> static int crypto_init_skcipher_ops_ablkcipher(struct crypto_tfm *tfm)
>>> {
>>> ...
>>> skcipher->reqsize = crypto_ablkcipher_reqsize(ablkcipher) +
>>> sizeof(struct ablkcipher_request);
>>>
>>> with updates via crypto_skcipher_set_reqsize().
>>>
>>> So I have to examine ablkcipher reqsize too:
>>>
>>> static inline unsigned int crypto_ablkcipher_reqsize(
>>> struct crypto_ablkcipher *tfm)
>>> {
>>> return crypto_ablkcipher_crt(tfm)->reqsize;
>>> }
>>>
>>> And of the crt_ablkcipher.reqsize assignments/initializers, I found:
>>>
>>> ablkcipher reqsize:
>>> 1   struct dcp_aes_req_ctx
>>> 8   struct atmel_tdes_reqctx
>>> 8   struct cryptd_blkcipher_request_ctx
>>> 8   struct mtk_aes_reqctx
>>> 8   struct omap_des_reqctx
>>> 8   struct s5p_aes_reqctx
>>> 8   struct sahara_aes_reqctx
>>> 8   struct stm32_cryp_reqctx
>>> 8   struct stm32_cryp_reqctx
>>> 16  struct ablk_ctx
>>> 24  struct atmel_aes_reqctx
>>> 48  struct omap_aes_reqctx
>>> 48  struct omap_aes_reqctx
>>> 48  struct qat_crypto_request
>>> 56  struct artpec6_crypto_request_context
>>> 64  struct chcr_blkcipher_req_ctx
>>> 80  struct spacc_req
>>> 80  struct virtio_crypto_sym_request
>>> 136 struct qce_cipher_reqctx
>>> 168 struct n2_request_context
>>> 328 struct ccp_des3_req_ctx
>>> 400 struct ccp_aes_req_ctx
>>> 536 struct hifn_request_context
>>> 992 struct cvm_req_ctx
>>> 2456struct iproc_reqctx_s
>>>
>>> The base ablkcipher wrapper is:
>>> 80  struct ablkcipher_request
>>>
>>> And in my earlier skcipher wrapper analysis, lrw was the largest
>>> skcipher wrapper:
>>> 384 struct rctx
>>>
>>> iproc_reqctx_s is an extreme outlier, with cvm_req_ctx at a bit less than 
>>> half.
>>>
>>> Making this a 2920 byte fixed array doesn't seem sensible at all
>>> (though that's what's already possible to use with existing
>>> SKCIPHER_REQUEST_ON_STACK users).
>>>
>>> What's the right path forward here?
>>>
>>
>> The skcipher implementations based on crypto IP blocks are typically
>> asynchronous, and I wouldn't be surprised if a fair number of
>> SKCIPHER_REQUEST_ON_STACK() users are limited to synchronous
>> skciphers.
>
> According to Herbert, SKCIPHER_REQUEST_ON_STACK() may only be used
> for invoking synchronous ciphers.
>
> In fact, due to the way the crypto API is built, if you try using it
> with any transformation that uses DMA
> you would most probably end up trying to DMA to/from the stack which
> as we all know is not a great idea.
>

Ah yes, I found [0] which contains that quote.

So that means that Kees can disregard the occurrences that are async
only, but it still implies that we cannot limit the reqsize like he
proposes unless we take the sync/async nature into account.
It also means we should probably BUG() or WARN() in
SKCIPHER_REQUEST_ON_STACK() when used with an async algo.

>>
>> So we could formalize this and limit SKCIPHER_REQUEST_ON_STACK() to
>> synchronous skciphers, which implies that the reqsize limit only has
>> to apply synchronous skciphers as well. But before we can do this, we
>> have to identify the remaining occurrences that allow asynchronous
>> skciphers to be used, and replace them with heap allocations.
>
> Any such occurrences are almost for sure broken already due to the DMA
> issue I've mentioned.
>

I am not convinced of this. The skcipher request struct does not
contain any payload buffers, and whether the algo specific ctx struct
is used for DMA is completely up