Re: [PATCH V3 6/6] crypto/nx: Add P9 NX support for 842 compression engine

2017-08-31 Thread Haren Myneni
On 08/31/2017 06:31 AM, Dan Streetman wrote:
> On Tue, Aug 29, 2017 at 5:54 PM, Haren Myneni  
> wrote:
>> On 08/29/2017 02:23 PM, Benjamin Herrenschmidt wrote:
>>> On Tue, 2017-08-29 at 09:58 -0400, Dan Streetman wrote:
> +
> +   ret = -EINVAL;
> +   if (coproc && coproc->vas.rxwin) {
> +   wmem->txwin = nx842_alloc_txwin(coproc);

 this is wrong.  the workmem is scratch memory that's valid only for
 the duration of a single operation.
>>
>> Correct, workmem is used until crypto_free is called.
> 
> that's not a 'single operation'.  a single operation is compress() or
> decompress().

workmem is allocated in nx842_crypto_init (called from crypto_alloc) and freed 
in crypto_free(). We can have single compression / decompression() operation or 
multiple within this crypto session. In case of single operation, we will end 
up workmem, ctx->sbounce/dbounce alloc/free for each request.  

> 

 do you actually need a txwin per crypto transform?  or do you need a
 txwin per coprocessor?  or txwin per processor?  either per-coproc or
 per-cpu should be created at driver init and held separately
 (globally) instead of a per-transform txwin.  I really don't see why
 you would need a txwin per transform, because the coproc should not
 care how many different transforms there are.
>>>
>>> We should only need a single window for the whole kernel really, plus
>>> one per user process who wants direct access but that's not relevant
>>> here.
>>
>> Opening send window for each crypto transform (crypto_alloc,
>> compression/decompression, ..., crypto_free) so that does not
>> have to wait for the previous copy/paste complete.
>> VAS will map send and receive windows, and can cache in send
>> windows (up to 128). So I thought using the same send window
>> (per chip) for more requests (say 1000) may be adding overhead.
>>
>> I will make changes if you prefer using 1 send window per chip.
> 
> i don't have the spec, so i shouldn't be making the decision on it,
> but i do know putting a persistent field into the workmem is the wrong
> location.  If it's valid for the life of the transform, put it into
> the transform context.  The workmem buffer is intended to be used only
> during a single operation - it's "working memory" to perform each
> individual crypto transformation.
> 
>>
>>>
>>> Cheers,
>>> Ben.
>>>
>>
> 



Re: [PATCH V3 6/6] crypto/nx: Add P9 NX support for 842 compression engine

2017-08-31 Thread Haren Myneni
On 08/31/2017 06:40 AM, Dan Streetman wrote:
> On Thu, Aug 31, 2017 at 3:44 AM, Haren Myneni  
> wrote:
>> Thanks MIchael and Dan for your review comments.
>>
>>
>> On 08/29/2017 06:32 AM, Dan Streetman wrote:
>>> On Mon, Aug 28, 2017 at 7:25 PM, Michael Ellerman  
>>> wrote:
 Hi Haren,

 Some comments inline ...

 Haren Myneni  writes:

> diff --git a/drivers/crypto/nx/nx-842-powernv.c 
> b/drivers/crypto/nx/nx-842-powernv.c
> index c0dd4c7e17d3..13089a0b9dfa 100644
> --- a/drivers/crypto/nx/nx-842-powernv.c
> +++ b/drivers/crypto/nx/nx-842-powernv.c
> @@ -32,6 +33,9 @@ MODULE_ALIAS_CRYPTO("842-nx");
>
>  #define WORKMEM_ALIGN(CRB_ALIGN)
>  #define CSB_WAIT_MAX (5000) /* ms */
> +#define VAS_RETRIES  (10)

 Where does that number come from?
>>
>> Sometimes HW returns copy/paste failures.
> 
> why?  what is causing the failure?

Vas can return RMA_busy for several reasons - receive / send windows does not 
have credits or cached and etc.
 
> 
>> So we should retry the request again. With 10 retries, Test running
>> 12 hours was successful for repeated compression/decompression
>> requests with 1024 threads.
>>

 Do we have any idea what the trade off is between retrying vs just
 falling back to doing the request in software?
>>
>> Not checked the overhead with falling back to SW compression.
> 
> SW is very, very, very slow, due to 842 being an unaligned compression format.
> 
>>

> +/* # of requests allowed per RxFIFO at a time. 0 for unlimited */
> +#define MAX_CREDITS_PER_RXFIFO   (1024)
>
>  struct nx842_workmem {
>   /* Below fields must be properly aligned */
> @@ -42,16 +46,27 @@ struct nx842_workmem {
>
>   ktime_t start;
>
> + struct vas_window *txwin;   /* Used with VAS function */

 I don't understand how it makes sense to put txwin and start between the
 fields above, and the padding.
>>>
>>> workmem is a scratch buffer and shouldn't be used for something
>>> persistent like this.
>>>

 If the workmem pointer you receive is not aligned, then PTR_ALIGN() will
 advance it and mean you end up writing over start and txwin.
>>
>> We always access workmem with PTR_ALIGN even when assigning txwin 
>> (nx842_powernv_crypto_init/exit_vas).
>> So we should not overwrite start and txwin,
>>
>> We can add txwin in nx842_crypto_ctx instead of workmem. But 
>> nx842_crypto_ctx is used for both powernv and pseries. Hence used workmem. 
>> But if nx842_crypto_ctx is preferred, I will send new patch soon.
>>

 That's probably not your bug, the code is already like that.
>>>
>>> no, it's a bug in this patch, because workmem is scratch whose
>>> contents are only valid for the duration of each operation (compress
>>> or decompress).
>>>

>   char padding[WORKMEM_ALIGN]; /* unused, to allow alignment */
>  } __packed __aligned(WORKMEM_ALIGN);

>>
>> Thanks
>> Haren
>>
> 



Re: [PATCH] crypto: MPI - kunmap after finishing accessing buffer

2017-08-31 Thread Herbert Xu
On Wed, Aug 30, 2017 at 06:33:41PM +0200, Stephan Mueller wrote:
> Am Dienstag, 22. August 2017, 08:33:15 CEST schrieb Herbert Xu:
> 
> Hi Herbert,
> 
> > On Thu, Aug 10, 2017 at 08:06:18AM +0200, Stephan Müller wrote:
> > > Hi Herbert,
> > > 
> > > I found that issue while playing around with edge conditions in my
> > > algif_akcipher implementation. This issue only manifests in a
> > > segmentation violation on 32 bit machines and with an SGL where each
> > > SG points to one byte. SGLs with larger buffers seem to be not
> > > affected by this issue.
> > > 
> > > Yet this access-after-unmap should be a candidate for stable, IMHO.
> > 
> > Good catch.  Thanks!
> > 
> > Fixes: 4816c9406430 ("lib/mpi: Fix SG miter leak")
> > Cc: 
> 
> Just to confirm: stable is not CCed in this email -- did you send it 
> separately to stable? (dto for "[PATCH v4] crypto: only call put_page on 
> referenced and used pages")

The cc is meant to go into the commit and when the commit hits
mainline it'll automatically get picked up by stable.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH V3 6/6] crypto/nx: Add P9 NX support for 842 compression engine

2017-08-31 Thread Dan Streetman
On Thu, Aug 31, 2017 at 3:44 AM, Haren Myneni  wrote:
> Thanks MIchael and Dan for your review comments.
>
>
> On 08/29/2017 06:32 AM, Dan Streetman wrote:
>> On Mon, Aug 28, 2017 at 7:25 PM, Michael Ellerman  
>> wrote:
>>> Hi Haren,
>>>
>>> Some comments inline ...
>>>
>>> Haren Myneni  writes:
>>>
 diff --git a/drivers/crypto/nx/nx-842-powernv.c 
 b/drivers/crypto/nx/nx-842-powernv.c
 index c0dd4c7e17d3..13089a0b9dfa 100644
 --- a/drivers/crypto/nx/nx-842-powernv.c
 +++ b/drivers/crypto/nx/nx-842-powernv.c
 @@ -32,6 +33,9 @@ MODULE_ALIAS_CRYPTO("842-nx");

  #define WORKMEM_ALIGN(CRB_ALIGN)
  #define CSB_WAIT_MAX (5000) /* ms */
 +#define VAS_RETRIES  (10)
>>>
>>> Where does that number come from?
>
> Sometimes HW returns copy/paste failures.

why?  what is causing the failure?

> So we should retry the request again. With 10 retries, Test running
> 12 hours was successful for repeated compression/decompression
> requests with 1024 threads.
>
>>>
>>> Do we have any idea what the trade off is between retrying vs just
>>> falling back to doing the request in software?
>
> Not checked the overhead with falling back to SW compression.

SW is very, very, very slow, due to 842 being an unaligned compression format.

>
>>>
 +/* # of requests allowed per RxFIFO at a time. 0 for unlimited */
 +#define MAX_CREDITS_PER_RXFIFO   (1024)

  struct nx842_workmem {
   /* Below fields must be properly aligned */
 @@ -42,16 +46,27 @@ struct nx842_workmem {

   ktime_t start;

 + struct vas_window *txwin;   /* Used with VAS function */
>>>
>>> I don't understand how it makes sense to put txwin and start between the
>>> fields above, and the padding.
>>
>> workmem is a scratch buffer and shouldn't be used for something
>> persistent like this.
>>
>>>
>>> If the workmem pointer you receive is not aligned, then PTR_ALIGN() will
>>> advance it and mean you end up writing over start and txwin.
>
> We always access workmem with PTR_ALIGN even when assigning txwin 
> (nx842_powernv_crypto_init/exit_vas).
> So we should not overwrite start and txwin,
>
> We can add txwin in nx842_crypto_ctx instead of workmem. But nx842_crypto_ctx 
> is used for both powernv and pseries. Hence used workmem. But if 
> nx842_crypto_ctx is preferred, I will send new patch soon.
>
>>>
>>> That's probably not your bug, the code is already like that.
>>
>> no, it's a bug in this patch, because workmem is scratch whose
>> contents are only valid for the duration of each operation (compress
>> or decompress).
>>
>>>
   char padding[WORKMEM_ALIGN]; /* unused, to allow alignment */
  } __packed __aligned(WORKMEM_ALIGN);
>>>
>
> Thanks
> Haren
>


Re: [PATCH V3 6/6] crypto/nx: Add P9 NX support for 842 compression engine

2017-08-31 Thread Dan Streetman
On Tue, Aug 29, 2017 at 5:54 PM, Haren Myneni  wrote:
> On 08/29/2017 02:23 PM, Benjamin Herrenschmidt wrote:
>> On Tue, 2017-08-29 at 09:58 -0400, Dan Streetman wrote:
 +
 +   ret = -EINVAL;
 +   if (coproc && coproc->vas.rxwin) {
 +   wmem->txwin = nx842_alloc_txwin(coproc);
>>>
>>> this is wrong.  the workmem is scratch memory that's valid only for
>>> the duration of a single operation.
>
> Correct, workmem is used until crypto_free is called.

that's not a 'single operation'.  a single operation is compress() or
decompress().

>>>
>>> do you actually need a txwin per crypto transform?  or do you need a
>>> txwin per coprocessor?  or txwin per processor?  either per-coproc or
>>> per-cpu should be created at driver init and held separately
>>> (globally) instead of a per-transform txwin.  I really don't see why
>>> you would need a txwin per transform, because the coproc should not
>>> care how many different transforms there are.
>>
>> We should only need a single window for the whole kernel really, plus
>> one per user process who wants direct access but that's not relevant
>> here.
>
> Opening send window for each crypto transform (crypto_alloc,
> compression/decompression, ..., crypto_free) so that does not
> have to wait for the previous copy/paste complete.
> VAS will map send and receive windows, and can cache in send
> windows (up to 128). So I thought using the same send window
> (per chip) for more requests (say 1000) may be adding overhead.
>
> I will make changes if you prefer using 1 send window per chip.

i don't have the spec, so i shouldn't be making the decision on it,
but i do know putting a persistent field into the workmem is the wrong
location.  If it's valid for the life of the transform, put it into
the transform context.  The workmem buffer is intended to be used only
during a single operation - it's "working memory" to perform each
individual crypto transformation.

>
>>
>> Cheers,
>> Ben.
>>
>


Re: [PATCH v7 00/19] simplify crypto wait for async op

2017-08-31 Thread Harsh Jain
HI Gilad,

I think we need an update in ESP also. Now EBUSY return means driver
has accepted, Packet should not be dropped in

esp_output_tail() function.

.

Regards
Harsh Jain



On Thu, Aug 24, 2017 at 7:48 PM, Gilad Ben-Yossef  wrote:
> Many users of kernel async. crypto services have a pattern of
> starting an async. crypto op and than using a completion
> to wait for it to end.
>
> This patch set simplifies this common use case in two ways:
>
> First, by separating the return codes of the case where a
> request is queued to a backlog due to the provider being
> busy (-EBUSY) from the case the request has failed due
> to the provider being busy and backlogging is not enabled
> (-EAGAIN).
>
> Next, this change is than built on to create a generic API
> to wait for a async. crypto operation to complete.
>
> The end result is a smaller code base and an API that is
> easier to use and more difficult to get wrong.
>
> The patch set was boot tested on x86_64 and arm64 which
> at the very least tests the crypto users via testmgr and
> tcrypt but I do note that I do not have access to some
> of the HW whose drivers are modified nor do I claim I was
> able to test all of the corner cases.
>
> The patch set is based upon linux-next release tagged
> next-20170824.
>
> Changes from v6:
> - Fix brown paper bag compile error on marvell/cesa
>   code.
>
> Changes from v5:
> - Remove redundant new line as spotted by Jonathan
>   Cameron.
> - Reworded dm-verity change commit message to better
>   clarify potential issue averted by change as
>   pointed out by Mikulas Patocka.
>
> Changes from v4:
> - Rebase on top of latest algif changes from Stephan
>   Mueller.
> - Fix typo in ccp patch title.
>
> Changes from v3:
> - Instead of changing the return code to indicate
>   backlog queueing, change the return code to indicate
>   transient busy state, as suggested by Herbert Xu.
>
> Changes from v2:
> - Patch title changed from "introduce crypto wait for
>   async op" to better reflect the current state.
> - Rebase on top of latest linux-next.
> - Add a new return code of -EIOCBQUEUED for backlog
>   queueing, as suggested by Herbert Xu.
> - Transform more users to the new API.
> - Update the drbg change to account for new init as
>   indicated by Stephan Muller.
>
> Changes from v1:
> - Address review comments from Eric Biggers.
> - Separated out bug fixes of existing code and rebase
>   on top of that patch set.
> - Rename 'ecr' to 'wait' in fscrypto code.
> - Split patch introducing the new API from the change
>   moving over the algif code which it originated from
>   to the new API.
> - Inline crypto_wait_req().
> - Some code indentation fixes.
>
> Gilad Ben-Yossef (19):
>   crypto: change transient busy return code to -EAGAIN
>   crypto: ccp: use -EAGAIN for transient busy indication
>   crypto: remove redundant backlog checks on EBUSY
>   crypto: marvell/cesa: remove redundant backlog checks on EBUSY
>   crypto: introduce crypto wait for async op
>   crypto: move algif to generic async completion
>   crypto: move pub key to generic async completion
>   crypto: move drbg to generic async completion
>   crypto: move gcm to generic async completion
>   crypto: move testmgr to generic async completion
>   fscrypt: move to generic async completion
>   dm: move dm-verity to generic async completion
>   cifs: move to generic async completion
>   ima: move to generic async completion
>   crypto: tcrypt: move to generic async completion
>   crypto: talitos: move to generic async completion
>   crypto: qce: move to generic async completion
>   crypto: mediatek: move to generic async completion
>   crypto: adapt api sample to use async. op wait
>
>  Documentation/crypto/api-samples.rst |  52 ++---
>  crypto/af_alg.c  |  27 -
>  crypto/ahash.c   |  12 +--
>  crypto/algapi.c  |   6 +-
>  crypto/algif_aead.c  |   8 +-
>  crypto/algif_hash.c  |  50 +
>  crypto/algif_skcipher.c  |   9 +-
>  crypto/api.c |  13 +++
>  crypto/asymmetric_keys/public_key.c  |  28 +
>  crypto/cryptd.c  |   4 +-
>  crypto/cts.c |   6 +-
>  crypto/drbg.c|  36 ++-
>  crypto/gcm.c |  32 ++
>  crypto/lrw.c |   8 +-
>  crypto/rsa-pkcs1pad.c|  16 +--
>  crypto/tcrypt.c  |  84 +--
>  crypto/testmgr.c | 204 
> ---
>  crypto/xts.c |   8 +-
>  drivers/crypto/ccp/ccp-crypto-main.c |   8 +-
>  drivers/crypto/ccp/ccp-dev.c |   7 +-
>  drivers/crypto/marvell/cesa.c|   3 +-
>  drivers/crypto/marvell/cesa.h|   2 +-
>  drivers/crypto/mediatek/mtk-aes.c|  31 +-
>  drivers/crypto/qce/sha.c |  30 +-
>  

Re: [PATCH V3 6/6] crypto/nx: Add P9 NX support for 842 compression engine

2017-08-31 Thread Haren Myneni
Thanks MIchael and Dan for your review comments.


On 08/29/2017 06:32 AM, Dan Streetman wrote:
> On Mon, Aug 28, 2017 at 7:25 PM, Michael Ellerman  wrote:
>> Hi Haren,
>>
>> Some comments inline ...
>>
>> Haren Myneni  writes:
>>
>>> diff --git a/drivers/crypto/nx/nx-842-powernv.c 
>>> b/drivers/crypto/nx/nx-842-powernv.c
>>> index c0dd4c7e17d3..13089a0b9dfa 100644
>>> --- a/drivers/crypto/nx/nx-842-powernv.c
>>> +++ b/drivers/crypto/nx/nx-842-powernv.c
>>> @@ -32,6 +33,9 @@ MODULE_ALIAS_CRYPTO("842-nx");
>>>
>>>  #define WORKMEM_ALIGN(CRB_ALIGN)
>>>  #define CSB_WAIT_MAX (5000) /* ms */
>>> +#define VAS_RETRIES  (10)
>>
>> Where does that number come from?

Sometimes HW returns copy/paste failures. So we should retry the request again. 
With 10 retries, Test running 12 hours was successful for repeated 
compression/decompression requests with 1024 threads. 

>>
>> Do we have any idea what the trade off is between retrying vs just
>> falling back to doing the request in software?

Not checked the overhead with falling back to SW compression. 

>>
>>> +/* # of requests allowed per RxFIFO at a time. 0 for unlimited */
>>> +#define MAX_CREDITS_PER_RXFIFO   (1024)
>>>
>>>  struct nx842_workmem {
>>>   /* Below fields must be properly aligned */
>>> @@ -42,16 +46,27 @@ struct nx842_workmem {
>>>
>>>   ktime_t start;
>>>
>>> + struct vas_window *txwin;   /* Used with VAS function */
>>
>> I don't understand how it makes sense to put txwin and start between the
>> fields above, and the padding.
> 
> workmem is a scratch buffer and shouldn't be used for something
> persistent like this.
> 
>>
>> If the workmem pointer you receive is not aligned, then PTR_ALIGN() will
>> advance it and mean you end up writing over start and txwin.

We always access workmem with PTR_ALIGN even when assigning txwin 
(nx842_powernv_crypto_init/exit_vas).
So we should not overwrite start and txwin,

We can add txwin in nx842_crypto_ctx instead of workmem. But nx842_crypto_ctx 
is used for both powernv and pseries. Hence used workmem. But if 
nx842_crypto_ctx is preferred, I will send new patch soon. 

>>
>> That's probably not your bug, the code is already like that.
> 
> no, it's a bug in this patch, because workmem is scratch whose
> contents are only valid for the duration of each operation (compress
> or decompress).
> 
>>
>>>   char padding[WORKMEM_ALIGN]; /* unused, to allow alignment */
>>>  } __packed __aligned(WORKMEM_ALIGN);
>>

Thanks
Haren



[PATCH V4 7/7] crypto/nx: Add P9 NX support for 842 compression engine

2017-08-31 Thread Haren Myneni

This patch adds P9 NX support for 842 compression engine. Virtual
Accelerator Switchboard (VAS) is used to access 842 engine on P9.

For each NX engine per chip, setup receive window using
vas_rx_win_open() which configures RxFIFo with FIFO address, lpid,
pid and tid values. This unique (lpid, pid, tid) combination will
be used to identify the target engine.

For crypto open request, open send window on the NX engine for
the corresponding chip / cpu where the open request is executed.
This send window will be closed upon crypto close request.

NX provides high and normal priority FIFOs. For compression /
decompression requests, we use only hight priority FIFOs in kernel.

Each NX request will be communicated to VAS using copy/paste
instructions with vas_copy_crb() / vas_paste_crb() functions.

Signed-off-by: Haren Myneni 
Reviewed-by: Ram Pai 

---
 drivers/crypto/nx/Kconfig  |   1 +
 drivers/crypto/nx/nx-842-powernv.c | 377 -
 2 files changed, 372 insertions(+), 6 deletions(-)

diff --git a/drivers/crypto/nx/Kconfig b/drivers/crypto/nx/Kconfig
index ad7552a6998c..cd5dda9c48f4 100644
--- a/drivers/crypto/nx/Kconfig
+++ b/drivers/crypto/nx/Kconfig
@@ -38,6 +38,7 @@ config CRYPTO_DEV_NX_COMPRESS_PSERIES
 config CRYPTO_DEV_NX_COMPRESS_POWERNV
tristate "Compression acceleration support on PowerNV platform"
depends on PPC_POWERNV
+   depends on PPC_VAS
default y
help
  Support for PowerPC Nest (NX) compression acceleration. This
diff --git a/drivers/crypto/nx/nx-842-powernv.c 
b/drivers/crypto/nx/nx-842-powernv.c
index c0dd4c7e17d3..874ddf5e9087 100644
--- a/drivers/crypto/nx/nx-842-powernv.c
+++ b/drivers/crypto/nx/nx-842-powernv.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Dan Streetman ");
@@ -32,6 +33,9 @@ MODULE_ALIAS_CRYPTO("842-nx");
 
 #define WORKMEM_ALIGN  (CRB_ALIGN)
 #define CSB_WAIT_MAX   (5000) /* ms */
+#define VAS_RETRIES(10)
+/* # of requests allowed per RxFIFO at a time. 0 for unlimited */
+#define MAX_CREDITS_PER_RXFIFO (1024)
 
 struct nx842_workmem {
/* Below fields must be properly aligned */
@@ -42,16 +46,27 @@ struct nx842_workmem {
 
ktime_t start;
 
+   struct vas_window *txwin;   /* Used with VAS function */
char padding[WORKMEM_ALIGN]; /* unused, to allow alignment */
 } __packed __aligned(WORKMEM_ALIGN);
 
 struct nx842_coproc {
unsigned int chip_id;
unsigned int ct;
-   unsigned int ci;
+   unsigned int ci;/* Coprocessor instance, used with icswx */
+   struct {
+   struct vas_window *rxwin;
+   int id;
+   } vas;
struct list_head list;
 };
 
+/*
+ * Send the request to NX engine on the chip for the corresponding CPU
+ * where the process is executing. Use with VAS function.
+ */
+static DEFINE_PER_CPU(struct nx842_coproc *, coproc_inst);
+
 /* no cpu hotplug on powernv, so this list never changes after init */
 static LIST_HEAD(nx842_coprocs);
 static unsigned int nx842_ct;  /* used in icswx function */
@@ -513,6 +528,104 @@ static int nx842_exec_icswx(const unsigned char *in, 
unsigned int inlen,
 }
 
 /**
+ * nx842_exec_vas - compress/decompress data using the 842 algorithm
+ *
+ * (De)compression provided by the NX842 coprocessor on IBM PowerNV systems.
+ * This compresses or decompresses the provided input buffer into the provided
+ * output buffer.
+ *
+ * Upon return from this function @outlen contains the length of the
+ * output data.  If there is an error then @outlen will be 0 and an
+ * error will be specified by the return code from this function.
+ *
+ * The @workmem buffer should only be used by one function call at a time.
+ *
+ * @in: input buffer pointer
+ * @inlen: input buffer size
+ * @out: output buffer pointer
+ * @outlenp: output buffer size pointer
+ * @workmem: working memory buffer pointer, size determined by
+ *   nx842_powernv_driver.workmem_size
+ * @fc: function code, see CCW Function Codes in nx-842.h
+ *
+ * Returns:
+ *   0 Success, output of length @outlenp stored in the buffer
+ * at @out
+ *   -ENODEV   Hardware unavailable
+ *   -ENOSPC   Output buffer is to small
+ *   -EMSGSIZE Input buffer too large
+ *   -EINVAL   buffer constraints do not fix nx842_constraints
+ *   -EPROTO   hardware error during operation
+ *   -ETIMEDOUThardware did not complete operation in reasonable time
+ *   -EINTRoperation was aborted
+ */
+static int nx842_exec_vas(const unsigned char *in, unsigned int inlen,
+ unsigned char *out, unsigned int *outlenp,
+ void *workmem, int fc)
+{
+   struct coprocessor_request_block *crb;
+   struct coprocessor_status_block *csb;
+   struct nx842_workmem *wmem;
+   struct vas_window *txwin;
+   

[PATCH V4 6/7] crypto/nx: Add P9 NX specific error codes for 842 engine

2017-08-31 Thread Haren Myneni

This patch adds changes for checking P9 specific 842 engine
error codes. These errros are reported in coprocessor status
block (CSB) for failures.

Signed-off-by: Haren Myneni 
---
 arch/powerpc/include/asm/icswx.h   |  3 +++
 drivers/crypto/nx/nx-842-powernv.c | 18 ++
 drivers/crypto/nx/nx-842.h |  8 
 3 files changed, 29 insertions(+)

diff --git a/arch/powerpc/include/asm/icswx.h b/arch/powerpc/include/asm/icswx.h
index 27e588f6c72e..6a2c87577541 100644
--- a/arch/powerpc/include/asm/icswx.h
+++ b/arch/powerpc/include/asm/icswx.h
@@ -69,7 +69,10 @@ struct coprocessor_completion_block {
 #define CSB_CC_WR_PROTECTION   (16)
 #define CSB_CC_UNKNOWN_CODE(17)
 #define CSB_CC_ABORT   (18)
+#define CSB_CC_EXCEED_BYTE_COUNT   (19)/* P9 or later */
 #define CSB_CC_TRANSPORT   (20)
+#define CSB_CC_INVALID_CRB (21)/* P9 or later */
+#define CSB_CC_INVALID_DDE (30)/* P9 or later */
 #define CSB_CC_SEGMENTED_DDL   (31)
 #define CSB_CC_PROGRESS_POINT  (32)
 #define CSB_CC_DDE_OVERFLOW(33)
diff --git a/drivers/crypto/nx/nx-842-powernv.c 
b/drivers/crypto/nx/nx-842-powernv.c
index 829b5cad0043..c0dd4c7e17d3 100644
--- a/drivers/crypto/nx/nx-842-powernv.c
+++ b/drivers/crypto/nx/nx-842-powernv.c
@@ -243,6 +243,13 @@ static int wait_for_csb(struct nx842_workmem *wmem,
case CSB_CC_TEMPL_OVERFLOW:
CSB_ERR(csb, "Compressed data template shows data past end");
return -EINVAL;
+   case CSB_CC_EXCEED_BYTE_COUNT:  /* P9 or later */
+   /*
+* DDE byte count exceeds the limit specified in Maximum
+* byte count register.
+*/
+   CSB_ERR(csb, "DDE byte count exceeds the limit");
+   return -EINVAL;
 
/* these should not happen */
case CSB_CC_INVALID_ALIGN:
@@ -284,9 +291,17 @@ static int wait_for_csb(struct nx842_workmem *wmem,
CSB_ERR(csb, "Too many DDEs in DDL");
return -EINVAL;
case CSB_CC_TRANSPORT:
+   case CSB_CC_INVALID_CRB:/* P9 or later */
/* shouldn't happen, we setup CRB correctly */
CSB_ERR(csb, "Invalid CRB");
return -EINVAL;
+   case CSB_CC_INVALID_DDE:/* P9 or later */
+   /*
+* shouldn't happen, setup_direct/indirect_dde creates
+* DDE right
+*/
+   CSB_ERR(csb, "Invalid DDE");
+   return -EINVAL;
case CSB_CC_SEGMENTED_DDL:
/* shouldn't happen, setup_ddl creates DDL right */
CSB_ERR(csb, "Segmented DDL error");
@@ -330,6 +345,9 @@ static int wait_for_csb(struct nx842_workmem *wmem,
case CSB_CC_HW:
CSB_ERR(csb, "Correctable hardware error");
return -EPROTO;
+   case CSB_CC_HW_EXPIRED_TIMER:   /* P9 or later */
+   CSB_ERR(csb, "Job did not finish within allowed time");
+   return -EPROTO;
 
default:
CSB_ERR(csb, "Invalid CC %d", csb->cc);
diff --git a/drivers/crypto/nx/nx-842.h b/drivers/crypto/nx/nx-842.h
index 30929bd7d1a9..bb2f31792683 100644
--- a/drivers/crypto/nx/nx-842.h
+++ b/drivers/crypto/nx/nx-842.h
@@ -76,9 +76,17 @@
 #define CSB_CC_DECRYPT_OVERFLOW(64)
 /* asym crypt codes */
 #define CSB_CC_MINV_OVERFLOW   (128)
+/*
+ * HW error - Job did not finish in the maximum time allowed.
+ * Job terminated.
+ */
+#define CSB_CC_HW_EXPIRED_TIMER(224)
 /* These are reserved for hypervisor use */
 #define CSB_CC_HYP_RESERVE_START   (240)
 #define CSB_CC_HYP_RESERVE_END (253)
+#define CSB_CC_HYP_RESERVE_P9_END  (251)
+/* No valid interrupt server (P9 or later). */
+#define CSB_CC_HYP_RESERVE_NO_INTR_SERVER  (252)
 #define CSB_CC_HYP_NO_HW   (254)
 #define CSB_CC_HYP_HANG_ABORTED(255)
 
-- 
2.11.0





[PATCH V4 5/7] crypto/nx: Use kzalloc for workmem allocation

2017-08-31 Thread Haren Myneni

Send window is opened / closed for each crypto session.
So initializes txwin in workmem.

Signed-off-by: Haren Myneni 
---
 drivers/crypto/nx/nx-842.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/nx/nx-842.c b/drivers/crypto/nx/nx-842.c
index d94e25df503b..da3cb8c35ec7 100644
--- a/drivers/crypto/nx/nx-842.c
+++ b/drivers/crypto/nx/nx-842.c
@@ -116,7 +116,7 @@ int nx842_crypto_init(struct crypto_tfm *tfm, struct 
nx842_driver *driver)
 
spin_lock_init(>lock);
ctx->driver = driver;
-   ctx->wmem = kmalloc(driver->workmem_size, GFP_KERNEL);
+   ctx->wmem = kzalloc(driver->workmem_size, GFP_KERNEL);
ctx->sbounce = (u8 *)__get_free_pages(GFP_KERNEL, BOUNCE_BUFFER_ORDER);
ctx->dbounce = (u8 *)__get_free_pages(GFP_KERNEL, BOUNCE_BUFFER_ORDER);
if (!ctx->wmem || !ctx->sbounce || !ctx->dbounce) {
-- 
2.11.0





[PATCH V4 4/7] crypto/nx: Add nx842_add_coprocs_list function

2017-08-31 Thread Haren Myneni

Updating coprocessor list is moved to nx842_add_coprocs_list().
This function will be used for both icswx and VAS functions.

Signed-off-by: Haren Myneni 
---
 drivers/crypto/nx/nx-842-powernv.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/nx/nx-842-powernv.c 
b/drivers/crypto/nx/nx-842-powernv.c
index 67dc06f9b557..829b5cad0043 100644
--- a/drivers/crypto/nx/nx-842-powernv.c
+++ b/drivers/crypto/nx/nx-842-powernv.c
@@ -550,6 +550,14 @@ static int nx842_powernv_decompress(const unsigned char 
*in, unsigned int inlen,
  wmem, CCW_FC_842_DECOMP_CRC);
 }
 
+static inline void nx842_add_coprocs_list(struct nx842_coproc *coproc,
+   int chipid)
+{
+   coproc->chip_id = chipid;
+   INIT_LIST_HEAD(>list);
+   list_add(>list, _coprocs);
+}
+
 static int __init nx842_powernv_probe(struct device_node *dn)
 {
struct nx842_coproc *coproc;
@@ -576,11 +584,9 @@ static int __init nx842_powernv_probe(struct device_node 
*dn)
if (!coproc)
return -ENOMEM;
 
-   coproc->chip_id = chip_id;
coproc->ct = ct;
coproc->ci = ci;
-   INIT_LIST_HEAD(>list);
-   list_add(>list, _coprocs);
+   nx842_add_coprocs_list(coproc, chip_id);
 
pr_info("coprocessor found on chip %d, CT %d CI %d\n", chip_id, ct, ci);
 
-- 
2.11.0





[PATCH V4 2/7] crypto/nx: Create nx842_configure_crb function

2017-08-31 Thread Haren Myneni

Configure CRB is moved to nx842_configure_crb() so that it can
be used for icswx and VAS exec functions. VAS function will be
added later with P9 support.

Signed-off-by: Haren Myneni 
---
 drivers/crypto/nx/nx-842-powernv.c | 57 +-
 1 file changed, 38 insertions(+), 19 deletions(-)

diff --git a/drivers/crypto/nx/nx-842-powernv.c 
b/drivers/crypto/nx/nx-842-powernv.c
index 161987698bbc..1bd19e03eb7d 100644
--- a/drivers/crypto/nx/nx-842-powernv.c
+++ b/drivers/crypto/nx/nx-842-powernv.c
@@ -358,6 +358,40 @@ static int wait_for_csb(struct nx842_workmem *wmem,
return 0;
 }
 
+static int nx842_config_crb(const unsigned char *in, unsigned int inlen,
+   unsigned char *out, unsigned int outlen,
+   struct nx842_workmem *wmem)
+{
+   struct coprocessor_request_block *crb;
+   struct coprocessor_status_block *csb;
+   u64 csb_addr;
+   int ret;
+
+   crb = >crb;
+   csb = >csb;
+
+   /* Clear any previous values */
+   memset(crb, 0, sizeof(*crb));
+
+   /* set up DDLs */
+   ret = setup_ddl(>source, wmem->ddl_in,
+   (unsigned char *)in, inlen, true);
+   if (ret)
+   return ret;
+
+   ret = setup_ddl(>target, wmem->ddl_out,
+   out, outlen, false);
+   if (ret)
+   return ret;
+
+   /* set up CRB's CSB addr */
+   csb_addr = nx842_get_pa(csb) & CRB_CSB_ADDRESS;
+   csb_addr |= CRB_CSB_AT; /* Addrs are phys */
+   crb->csb_addr = cpu_to_be64(csb_addr);
+
+   return 0;
+}
+
 /**
  * nx842_exec_icswx - compress/decompress data using the 842 algorithm
  *
@@ -397,7 +431,6 @@ static int nx842_exec_icswx(const unsigned char *in, 
unsigned int inlen,
struct coprocessor_status_block *csb;
struct nx842_workmem *wmem;
int ret;
-   u64 csb_addr;
u32 ccw;
unsigned int outlen = *outlenp;
 
@@ -411,33 +444,19 @@ static int nx842_exec_icswx(const unsigned char *in, 
unsigned int inlen,
return -ENODEV;
}
 
-   crb = >crb;
-   csb = >csb;
-
-   /* Clear any previous values */
-   memset(crb, 0, sizeof(*crb));
-
-   /* set up DDLs */
-   ret = setup_ddl(>source, wmem->ddl_in,
-   (unsigned char *)in, inlen, true);
-   if (ret)
-   return ret;
-   ret = setup_ddl(>target, wmem->ddl_out,
-   out, outlen, false);
+   ret = nx842_config_crb(in, inlen, out, outlen, wmem);
if (ret)
return ret;
 
+   crb = >crb;
+   csb = >csb;
+
/* set up CCW */
ccw = 0;
ccw = SET_FIELD(CCW_CT, ccw, nx842_ct);
ccw = SET_FIELD(CCW_CI_842, ccw, 0); /* use 0 for hw auto-selection */
ccw = SET_FIELD(CCW_FC_842, ccw, fc);
 
-   /* set up CRB's CSB addr */
-   csb_addr = nx842_get_pa(csb) & CRB_CSB_ADDRESS;
-   csb_addr |= CRB_CSB_AT; /* Addrs are phys */
-   crb->csb_addr = cpu_to_be64(csb_addr);
-
wmem->start = ktime_get();
 
/* do ICSWX */
-- 
2.11.0





[PATCH V4 3/7] crypto/nx: Create nx842_delete_coprocs function

2017-08-31 Thread Haren Myneni

Move deleting coprocessors info upon exit or failure to
nx842_delete_coprocs().

Signed-off-by: Haren Myneni 
---
 drivers/crypto/nx/nx-842-powernv.c | 25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/crypto/nx/nx-842-powernv.c 
b/drivers/crypto/nx/nx-842-powernv.c
index 1bd19e03eb7d..67dc06f9b557 100644
--- a/drivers/crypto/nx/nx-842-powernv.c
+++ b/drivers/crypto/nx/nx-842-powernv.c
@@ -593,6 +593,16 @@ static int __init nx842_powernv_probe(struct device_node 
*dn)
return 0;
 }
 
+static void nx842_delete_coprocs(void)
+{
+   struct nx842_coproc *coproc, *n;
+
+   list_for_each_entry_safe(coproc, n, _coprocs, list) {
+   list_del(>list);
+   kfree(coproc);
+   }
+}
+
 static struct nx842_constraints nx842_powernv_constraints = {
.alignment =DDE_BUFFER_ALIGN,
.multiple = DDE_BUFFER_LAST_MULT,
@@ -652,13 +662,7 @@ static __init int nx842_powernv_init(void)
 
ret = crypto_register_alg(_powernv_alg);
if (ret) {
-   struct nx842_coproc *coproc, *n;
-
-   list_for_each_entry_safe(coproc, n, _coprocs, list) {
-   list_del(>list);
-   kfree(coproc);
-   }
-
+   nx842_delete_coprocs();
return ret;
}
 
@@ -668,13 +672,8 @@ module_init(nx842_powernv_init);
 
 static void __exit nx842_powernv_exit(void)
 {
-   struct nx842_coproc *coproc, *n;
-
crypto_unregister_alg(_powernv_alg);
 
-   list_for_each_entry_safe(coproc, n, _coprocs, list) {
-   list_del(>list);
-   kfree(coproc);
-   }
+   nx842_delete_coprocs();
 }
 module_exit(nx842_powernv_exit);
-- 
2.11.0





[PATCH V4 1/7] crypto/nx: Rename nx842_powernv_function as icswx function

2017-08-31 Thread Haren Myneni

Rename nx842_powernv_function to nx842_powernv_exec.
nx842_powernv_exec points to nx842_exec_icswx and
will be point to VAS exec function which will be added later
for P9 NX support.

Signed-off-by: Haren Myneni 
---
 drivers/crypto/nx/nx-842-powernv.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/crypto/nx/nx-842-powernv.c 
b/drivers/crypto/nx/nx-842-powernv.c
index 3abb045cdba7..161987698bbc 100644
--- a/drivers/crypto/nx/nx-842-powernv.c
+++ b/drivers/crypto/nx/nx-842-powernv.c
@@ -54,7 +54,11 @@ struct nx842_coproc {
 
 /* no cpu hotplug on powernv, so this list never changes after init */
 static LIST_HEAD(nx842_coprocs);
-static unsigned int nx842_ct;
+static unsigned int nx842_ct;  /* used in icswx function */
+
+static int (*nx842_powernv_exec)(const unsigned char *in,
+   unsigned int inlen, unsigned char *out,
+   unsigned int *outlenp, void *workmem, int fc);
 
 /**
  * setup_indirect_dde - Setup an indirect DDE
@@ -355,7 +359,7 @@ static int wait_for_csb(struct nx842_workmem *wmem,
 }
 
 /**
- * nx842_powernv_function - compress/decompress data using the 842 algorithm
+ * nx842_exec_icswx - compress/decompress data using the 842 algorithm
  *
  * (De)compression provided by the NX842 coprocessor on IBM PowerNV systems.
  * This compresses or decompresses the provided input buffer into the provided
@@ -385,7 +389,7 @@ static int wait_for_csb(struct nx842_workmem *wmem,
  *   -ETIMEDOUThardware did not complete operation in reasonable time
  *   -EINTRoperation was aborted
  */
-static int nx842_powernv_function(const unsigned char *in, unsigned int inlen,
+static int nx842_exec_icswx(const unsigned char *in, unsigned int inlen,
  unsigned char *out, unsigned int *outlenp,
  void *workmem, int fc)
 {
@@ -489,13 +493,13 @@ static int nx842_powernv_function(const unsigned char 
*in, unsigned int inlen,
  * @workmem: working memory buffer pointer, size determined by
  *   nx842_powernv_driver.workmem_size
  *
- * Returns: see @nx842_powernv_function()
+ * Returns: see @nx842_powernv_exec()
  */
 static int nx842_powernv_compress(const unsigned char *in, unsigned int inlen,
  unsigned char *out, unsigned int *outlenp,
  void *wmem)
 {
-   return nx842_powernv_function(in, inlen, out, outlenp,
+   return nx842_powernv_exec(in, inlen, out, outlenp,
  wmem, CCW_FC_842_COMP_CRC);
 }
 
@@ -517,13 +521,13 @@ static int nx842_powernv_compress(const unsigned char 
*in, unsigned int inlen,
  * @workmem: working memory buffer pointer, size determined by
  *   nx842_powernv_driver.workmem_size
  *
- * Returns: see @nx842_powernv_function()
+ * Returns: see @nx842_powernv_exec()
  */
 static int nx842_powernv_decompress(const unsigned char *in, unsigned int 
inlen,
unsigned char *out, unsigned int *outlenp,
void *wmem)
 {
-   return nx842_powernv_function(in, inlen, out, outlenp,
+   return nx842_powernv_exec(in, inlen, out, outlenp,
  wmem, CCW_FC_842_DECOMP_CRC);
 }
 
@@ -625,6 +629,8 @@ static __init int nx842_powernv_init(void)
if (!nx842_ct)
return -ENODEV;
 
+   nx842_powernv_exec = nx842_exec_icswx;
+
ret = crypto_register_alg(_powernv_alg);
if (ret) {
struct nx842_coproc *coproc, *n;
-- 
2.11.0





[PATCH V4 0/7] Enable NX 842 compression engine on Power9

2017-08-31 Thread Haren Myneni

P9 introduces Virtual Accelerator Switchboard (VAS) to communicate
with NX 842 engine. icswx function is used to access NX before.
On powerNV systems, NX-842 driver invokes VAS functions for
configuring RxFIFO (receive window) per each NX engine. VAS uses
this FIFO to communicate the request to NX. The kernel opens send
window which is used to transfer compression/decompression requests
to VAS. It maps the send window to the corresponding RxFIFO.
copy/paste instructions are used to pass the CRB to VAS.

This patch series adds P9 NX support for 842 compression engine.
First 5 patches reorganize the current code so that VAS function
can be added.
- nx842_powernv_function points to VAS function if VAS feature is
  available. Otherwise icswx function is used.
- Move configure CRB code nx842_cfg_crb() 
- In addition to freeing co-processor structs for initialization 
  failures and exit, both send and receive windows have to closed
  for VAS.
- Move updating coprocessor info list to nx842_add_coprocs_list().
- Initialize wormem allocation.

The last 2 patches adds configuring and invoking VAS, and also
checking P9 NX specific errors that are provided in co-processor
status block (CSB) for failures.

Patches have been tested on P9 DD1 system using VAS changes and
on P8 HW to make sure no regression.

This patchset depends on VAS kernel changes:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-August/162883.html

Thanks to Sukadev Bhattiprolu for his review, input and testing with
VAS changes. Also thanks to Michael Ellerman and Benjamin Herrenschmidt
for their valuable guidance and comments.

Changelog[V4]
- New VAS copy/paste API changes
- Proper device tree parsing for VAS ID as Michael Ellerman suggested. 

Changelog[V3]
- preemption disable for copy/paste as Nichalos Piggin suggested.
- PTR_ALIGN for workmem buffer based on Ram Pai's comemnt.

Changelog[v2]
- Open/close send windows in nx842_poernv_crypto_init/exit_vas().
- Changes for the new device-tree NX properties such as priority
  and compatible properties.
- Incorporated review comments from Michael Ellerman.
- Other minor issues found during HW testing.

Haren Myneni (7):
  crypto/nx: Rename nx842_powernv_function as icswx function
  crypto/nx: Create nx842_configure_crb function
  crypto/nx: Create nx842_delete_coprocs function
  crypto/nx: Add nx842_add_coprocs_list function
  crypto/nx: Use kzalloc for workmem allocation
  crypto/nx: Add P9 NX specific error codes for 842 engine
  crypto/nx: Add P9 NX support for 842 compression engine.

 arch/powerpc/include/asm/icswx.h   |   3 +
 drivers/crypto/nx/Kconfig  |   1 +
 drivers/crypto/nx/nx-842-powernv.c | 507 +
 drivers/crypto/nx/nx-842.c |   2 +-
 drivers/crypto/nx/nx-842.h |   8 +
 5 files changed, 473 insertions(+), 48 deletions(-)

-- 
2.11.0