Re: [PATCH v2 net-next 3/3] crypto: caam: Replace in_irq() usage.

2020-11-02 Thread Horia Geantă
On 11/2/2020 1:23 AM, Sebastian Andrzej Siewior wrote:
> The driver uses in_irq() + in_serving_softirq() magic to decide if NAPI
> scheduling is required or packet processing.
> 
> The usage of in_*() in drivers is phased out and Linus clearly requested
> that code which changes behaviour depending on context should either be
> separated or the context be conveyed in an argument passed by the caller,
> which usually knows the context.
> 
> Use the `sched_napi' argument passed by the callback. It is set true if
> called from the interrupt handler and NAPI should be scheduled.
> 
> Signed-off-by: Sebastian Andrzej Siewior 
> Cc: "Horia Geantă" 
> Cc: Aymen Sghaier 
> Cc: Herbert Xu 
> Cc: "David S. Miller" 
> Cc: Madalin Bucur 
> Cc: Jakub Kicinski 
> Cc: Li Yang 
> Cc: linux-cry...@vger.kernel.org
> Cc: net...@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-arm-ker...@lists.infradead.org
Reviewed-by: Horia Geantă 

Thanks,
Horia
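
For reference, the resulting pattern — the caller conveys the context instead of the callee guessing it — looks roughly like the sketch below (abbreviated from the driver; helper and field names simplified):

	/* sched_napi is set true only when called from the interrupt
	 * handler, replacing the in_irq() / in_serving_softirq() magic
	 */
	static void caam_qi_napi_schedule(struct qman_portal *p,
					  struct caam_napi *np,
					  bool sched_napi)
	{
		if (sched_napi) {
			/* disable the portal IRQ source and defer to NAPI */
			np->p = p;
			qman_p_irqsource_remove(p, QM_PIRQ_DQRI);
			napi_schedule(&np->irqtask);
		}
	}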


Re: [PATCH v2 net-next 1/3] soc/fsl/qbman: Add an argument to signal if NAPI processing is required.

2020-11-02 Thread Horia Geantă
On 11/2/2020 1:23 AM, Sebastian Andrzej Siewior wrote:
> dpaa_eth_napi_schedule() and caam_qi_napi_schedule() schedule NAPI if
> invoked from:
> 
>  - Hard interrupt context
>  - Any context which is not serving soft interrupts
> 
> Any context which is not serving soft interrupts includes hard interrupts
> so the in_irq() check is redundant. caam_qi_napi_schedule() has a comment
> about this:
> 
> /*
>  * In case of threaded ISR, for RT kernels in_irq() does not return
>  * appropriate value, so use in_serving_softirq to distinguish between
>  * softirq and irq contexts.
>  */
>  if (in_irq() || !in_serving_softirq())
> 
> This has nothing to do with RT. Even on a non RT kernel force threaded
> interrupts run obviously in thread context and therefore in_irq() returns
> false when invoked from the handler.
> 
> The extension of the in_irq() check with !in_serving_softirq() was there
> when the drivers were added, but in the out of tree FSL BSP the original
> condition was in_irq() which got extended due to failures on RT.
> 
Looks like the initial FSL BSP commit adding this check is:
edca0b7a448a ("dpaa_eth: Fix Rx-stall issue in threaded ISR")
https://source.codeaurora.org/external/qoriq/qoriq-yocto-sdk/linux/commit/?h=fsl-sdk-v1.2&id=edca0b7a448ac18ef0a9b1238209b7595d511e19

This was done for dpaa_eth and the same logic was reused in caam.
In the process of upstreaming the development history got lost and
the comment in dpaa_eth was removed.

This was back in 2012 on a v3.0.34 kernel.
Not sure if/how things changed in the meantime, i.e. whether in_irq()
behaviour when called from softirq changed on -rt kernels (assuming this was
the problem Priyanka tried solving).

> The usage of in_xxx() in drivers is phased out and Linus clearly requested
> that code which changes behaviour depending on context should either be
> separated or the context be conveyed in an argument passed by the caller,
> which usually knows the context. Right he is, the above construct is
> clearly showing why.
> 
> The following callchains have been analyzed to end up in
> dpaa_eth_napi_schedule():
> 
> qman_p_poll_dqrr()
>   __poll_portal_fast()
> fq->cb.dqrr()
>dpaa_eth_napi_schedule()
> 
> portal_isr()
>   __poll_portal_fast()
> fq->cb.dqrr()
>dpaa_eth_napi_schedule()
> 
> Both need to schedule NAPI.
Only the call from interrupt context.

> The crypto part has another code path leading up to this:
>   kill_fq()
>  empty_retired_fq()
>qman_p_poll_dqrr()
>  __poll_portal_fast()
> fq->cb.dqrr()
>dpaa_eth_napi_schedule()
> 
> kill_fq() is called from task context and ends up scheduling NAPI, but
> that's pointless and an unintended side effect of the !in_serving_softirq()
> check.
> 
Correct.

> The code path:
>   caam_qi_poll() -> qman_p_poll_dqrr()
> 
> is invoked from NAPI and I *assume* from crypto's NAPI device and not
> from qbman's NAPI device. I *guess* it is okay to skip scheduling NAPI
> (because this is what happens now) but could be changed if it is wrong
> due to `budget' handling.
> 
Looks good to me.

> Add an argument to __poll_portal_fast() which is true if NAPI needs to be
> scheduled. This requires propagating the value to the caller including
> `qman_cb_dqrr' typedef which is used by the dpaa and the crypto driver.
> 
> Signed-off-by: Sebastian Andrzej Siewior 
> Cc: "Horia Geantă" 
> Cc: Aymen Sghaier 
> Cc: Herbert Xu 
> Cc: "David S. Miller" 
> Cc: Madalin Bucur 
> Cc: Jakub Kicinski 
> Cc: Li Yang 
> Cc: linux-cry...@vger.kernel.org
> Cc: net...@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-arm-ker...@lists.infradead.org
Reviewed-by: Horia Geantă 

Thanks,
Horia
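
For reference, the shape of the propagated argument (abbreviated from the patch; call sites are illustrative):

	/* `qman_cb_dqrr' gains a bool so each caller states whether NAPI
	 * scheduling is wanted
	 */
	typedef enum qman_cb_dqrr_result (*qman_cb_dqrr)(struct qman_portal *qm,
							 struct qman_fq *fq,
							 const struct qm_dqrr_entry *dqrr,
							 bool sched_napi);

	/* interrupt handler path: NAPI must be scheduled */
	__poll_portal_fast(p, poll_limit, true);

	/* qman_p_poll_dqrr() path (NAPI poll / task context) */
	__poll_portal_fast(p, limit, false);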


Re: [PATCH 19/22] crypto: inside-secure - add check for xts input length equal to zero

2020-08-11 Thread Horia Geantă
On 8/10/2020 8:03 PM, Eric Biggers wrote:
> On Mon, Aug 10, 2020 at 05:33:39PM +0300, Horia Geantă wrote:
>> On 8/10/2020 4:45 PM, Herbert Xu wrote:
>>> On Mon, Aug 10, 2020 at 10:20:20AM +, Van Leeuwen, Pascal wrote:
>>>>
>>>> With all due respect, but this makes no sense.
>>>
>>> I agree.  This is a lot of churn for no gain.
>>>
>> I would say the gain is that all skcipher algorithms would behave the same
>> when input length equals zero - i.e. treat the request as a no-op.
>>
>> We can't say "no input" has any meaning to the other skcipher algorithms,
>> but the convention is to accept this case and just return 0.
>> I don't see why XTS has to be handled differently.
>>
> 
> CTS also rejects empty inputs.
> 
> The rule it follows is just that all input lengths >= blocksize are allowed.
> Input lengths < blocksize aren't allowed.
> 
Indeed, thanks.

What about, for example, CBC?
AFAICT cbc(aes) with input length = 0 is valid.

Same for CTR (with the note that blocksize = 1) and several other algorithms
mentioned in the cover letter.

What's the rule in these cases?

Thanks,
Horia


Re: [PATCH 19/22] crypto: inside-secure - add check for xts input length equal to zero

2020-08-10 Thread Horia Geantă
On 8/10/2020 4:45 PM, Herbert Xu wrote:
> On Mon, Aug 10, 2020 at 10:20:20AM +, Van Leeuwen, Pascal wrote:
>>
>> With all due respect, but this makes no sense.
> 
> I agree.  This is a lot of churn for no gain.
> 
I would say the gain is that all skcipher algorithms would behave the same
when input length equals zero - i.e. treat the request as a no-op.

We can't say "no input" has any meaning to the other skcipher algorithms,
but the convention is to accept this case and just return 0.
I don't see why XTS has to be handled differently.

Thanks,
Horia
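
For illustration, the convention being argued for amounts to an early return in a driver's encrypt/decrypt entry points — a hedged sketch, not the actual inside-secure code:

	static int skcipher_do_crypt(struct skcipher_request *req)
	{
		struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);

		if (!req->cryptlen)
			return 0;	/* no input: treat as a no-op (CBC, CTR, ...) */
		if (req->cryptlen < crypto_skcipher_blocksize(tfm))
			return -EINVAL;	/* the XTS/CTS rule quoted above */

		return 0;	/* ... hand the request to the engine ... */
	}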


[PATCH] powerpc/dts/fsl: add crypto node alias for B4

2019-03-20 Thread Horia Geantă
crypto node alias is needed by U-boot to identify the node and
perform fix-ups, like adding "fsl,sec-era" property.

Signed-off-by: Horia Geantă 
---
 arch/powerpc/boot/dts/fsl/b4qds.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/boot/dts/fsl/b4qds.dtsi b/arch/powerpc/boot/dts/fsl/b4qds.dtsi
index 999efd3bc167..05be919f3545 100644
--- a/arch/powerpc/boot/dts/fsl/b4qds.dtsi
+++ b/arch/powerpc/boot/dts/fsl/b4qds.dtsi
@@ -40,6 +40,7 @@
 	interrupt-parent = <&mpic>;
 
 	aliases {
+		crypto = &crypto;
 		phy_sgmii_10 = &phy_sgmii_10;
 		phy_sgmii_11 = &phy_sgmii_11;
 		phy_sgmii_1c = &phy_sgmii_1c;
-- 
2.17.1
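
As background, the alias lets a boot loader locate the node without hard-coding its path; a hedged sketch of such a fix-up using plain libfdt (hypothetical helper, not U-Boot's actual code):

	#include <libfdt.h>

	static int fixup_sec_era(void *fdt, uint32_t era)
	{
		/* a path that does not start with '/' is resolved via /aliases */
		int off = fdt_path_offset(fdt, "crypto");

		if (off < 0)
			return off;	/* no "crypto" alias present */

		return fdt_setprop_u32(fdt, off, "fsl,sec-era", era);
	}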



[PATCH] soc: fsl: guts: make fsl_guts_get_svr() static

2019-02-21 Thread Horia Geantă
The export of fsl_guts_get_svr() is a left-over, it's currently used
only internally and users needing SoC information should use the generic
soc_device infrastructure.

Signed-off-by: Horia Geantă 
---
 drivers/soc/fsl/guts.c   | 3 +--
 include/linux/fsl/guts.h | 2 --
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/soc/fsl/guts.c b/drivers/soc/fsl/guts.c
index 4f9655087bd7..63f6df86f9e5 100644
--- a/drivers/soc/fsl/guts.c
+++ b/drivers/soc/fsl/guts.c
@@ -115,7 +115,7 @@ static const struct fsl_soc_die_attr *fsl_soc_die_match(
return NULL;
 }
 
-u32 fsl_guts_get_svr(void)
+static u32 fsl_guts_get_svr(void)
 {
u32 svr = 0;
 
@@ -129,7 +129,6 @@ u32 fsl_guts_get_svr(void)
 
return svr;
 }
-EXPORT_SYMBOL(fsl_guts_get_svr);
 
 static int fsl_guts_probe(struct platform_device *pdev)
 {
diff --git a/include/linux/fsl/guts.h b/include/linux/fsl/guts.h
index 941b11811f85..1fc0edd71c52 100644
--- a/include/linux/fsl/guts.h
+++ b/include/linux/fsl/guts.h
@@ -135,8 +135,6 @@ struct ccsr_guts {
u32 srds2cr1;   /* 0x.0f44 - SerDes2 Control Register 0 */
 } __attribute__ ((packed));
 
-u32 fsl_guts_get_svr(void);
-
 /* Alternate function signal multiplex control */
 #define MPC85xx_PMUXCR_QE(x) (0x8000 >> (x))
 
-- 
2.16.2
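
For callers that did need SoC information, the generic route looks roughly like this (a sketch; the soc_id string is illustrative):

	#include <linux/sys_soc.h>

	static const struct soc_device_attribute needs_quirk[] = {
		{ .soc_id = "T4240", .revision = "1.0", },	/* example entry */
		{ /* sentinel */ }
	};

	static bool quirk_applies(void)
	{
		/* matches against the attributes the guts driver registers */
		return soc_device_match(needs_quirk) != NULL;
	}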



Re: [PATCH v2] crypto: talitos - fix IPsec cipher in length

2018-03-23 Thread Horia Geantă
On 3/22/2018 11:57 AM, Christophe Leroy wrote:
> For SEC 2.x+, cipher in length must contain only the ciphertext length.
          ^^^^ turns out this should be 3.x+

> In case of using hardware ICV checking, the ICV length is provided via
> the "extent" field of the descriptor pointer.
> 
> Cc: <sta...@vger.kernel.org> # 4.8+
> Fixes: 549bd8bc5987 ("crypto: talitos - Implement AEAD for SEC1 using HMAC_SNOOP_NO_AFEU")
> Reported-by: Horia Geantă <horia.gea...@nxp.com>
> Signed-off-by: Christophe Leroy <christophe.le...@c-s.fr>
Tested-by: Horia Geantă <horia.gea...@nxp.com>

Thanks,
Horia



Re: [PATCH 1/2] crypto: talitos - don't persistently map req_ctx->hw_context and req_ctx->buf

2018-03-02 Thread Horia Geantă
On 2/26/2018 6:40 PM, Christophe Leroy wrote:
> Commit 49f9783b0cea ("crypto: talitos - do hw_context DMA mapping
> outside the requests") introduced a persistent dma mapping of
> req_ctx->hw_context
> Commit 37b5e8897eb5 ("crypto: talitos - chain in buffered data for ahash
> on SEC1") introduced a persistent dma mapping of req_ctx->buf
> 
> As there is no destructor for req_ctx (the request context), the
> associated dma handlers where set in ctx (the tfm context). This is
> wrong as several hash operations can run with the same ctx.
> 
> This patch removes this persistent mapping.
> 
> Reported-by: Horia Geanta <horia.gea...@nxp.com>
> Fixes: 49f9783b0cea ("crypto: talitos - do hw_context DMA mapping outside the requests")
> Fixes: 37b5e8897eb5 ("crypto: talitos - chain in buffered data for ahash on SEC1")
> Signed-off-by: Christophe Leroy <christophe.le...@c-s.fr>
Tested-by: Horia Geantă <horia.gea...@nxp.com>

Please add this to 4.15.y -stable tree.

Thanks,
Horia


Re: [PATCH 17/18] crypto: talitos - chain in buffered data for ahash on SEC1

2018-03-02 Thread Horia Geantă
On 10/6/2017 4:05 PM, Christophe Leroy wrote:
[...]
> @@ -1778,6 +1814,36 @@ static int common_nonsnoop_hash(struct talitos_edesc *edesc,
> 	if (is_sec1 && from_talitos_ptr_len(&desc->ptr[3], true) == 0)
> 		talitos_handle_buggy_hash(ctx, edesc, &desc->ptr[3]);
>  
> + if (is_sec1 && req_ctx->nbuf && length) {
> + struct talitos_desc *desc2 = desc + 1;
> + dma_addr_t next_desc;
[...]
> +	next_desc = dma_map_single(dev, &desc2->hdr1, TALITOS_DESC_SIZE,
> +				   DMA_BIDIRECTIONAL);
> +	desc->next_desc = cpu_to_be32(next_desc);
Where is desc->next_desc initialized for the !is_sec1 case?
Memory allocation is done using kmalloc(), and since desc->next_desc is checked
in some cases also for SEC 2.x+, it should be initialized to 0.

Thanks,
Horia
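
The initialization being asked for would be a one-liner where the descriptor is built — a sketch (placement illustrative):

	/* kmalloc() does not zero the descriptor, and SEC 2.x+ paths also
	 * inspect next_desc, so clear it explicitly:
	 */
	desc->next_desc = 0;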



Re: [PATCH 16/18] crypto: talitos - do hw_context DMA mapping outside the requests

2018-02-22 Thread Horia Geantă
On 2/22/2018 1:47 PM, Herbert Xu wrote:
> On Tue, Feb 20, 2018 at 11:32:25AM +0000, Horia Geantă wrote:
>>
>> If final/finup is optional, how is the final hash supposed to be retrieved?
> 
> Sometimes the computation ends with a partial hash, that's what
> export is for.  Also it is completely legal to abandon the hash
> state entirely.
> 
Thanks for the explanation.
It's unintuitive to call .init() -> .update() and then not to call any of
.final(), .finup(), .export().

Christophe,

IIUC this means that there is no room for improvement.
This patch needs to be reverted, to restore previous behaviour when the
hw_context was mapped / unmapped for every request.

Thanks,
Horia


Re: [PATCH 4.14, 4.9] crypto: talitos - fix Kernel Oops on hashing an empty file

2018-02-22 Thread Horia Geantă
On 2/22/2018 9:08 AM, Christophe Leroy wrote:
> Upstream 87a81dce53b1ea61acaeefa5191a0376a2d1d721
> 
> Performing the hash of an empty file leads to a kernel Oops
> 
> [   44.504600] Unable to handle kernel paging request for data at address 0x000c
> [   44.512819] Faulting instruction address: 0xc02d2be8
> [   44.524088] Oops: Kernel access of bad area, sig: 11 [#1]
> [   44.529171] BE PREEMPT CMPC885
> [   44.532232] CPU: 0 PID: 491 Comm: md5sum Not tainted 4.15.0-rc8-00211-g3a968610b6ea #81
> [   44.540814] NIP:  c02d2be8 LR: c02d2984 CTR: 
> [   44.545812] REGS: c6813c90 TRAP: 0300   Not tainted  (4.15.0-rc8-00211-g3a968610b6ea)
> [   44.554223] MSR:  9032   CR: 48222822  XER: 2000
> [   44.560855] DAR: 000c DSISR: c000
> [   44.560855] GPR00: c02d28fc c6813d40 c6828000 c646fa40 0001 0001 0001 
> [   44.560855] GPR08: 004c  c000bfcc  28222822 100280d4  10020008
> [   44.560855] GPR16:  0020   10024008  c646f9f0 c6179a10
> [   44.560855] GPR24:  0001 c62f0018 c6179a10  c6367a30 c62f c646f9c0
> [   44.598542] NIP [c02d2be8] ahash_process_req+0x448/0x700
> [   44.603751] LR [c02d2984] ahash_process_req+0x1e4/0x700
> [   44.608868] Call Trace:
> [   44.611329] [c6813d40] [c02d28fc] ahash_process_req+0x15c/0x700 (unreliable)
> [   44.618302] [c6813d90] [c02060c4] hash_recvmsg+0x11c/0x210
> [   44.623716] [c6813db0] [c0331354] ___sys_recvmsg+0x98/0x138
> [   44.629226] [c6813eb0] [c03332c0] __sys_recvmsg+0x40/0x84
> [   44.634562] [c6813f10] [c03336c0] SyS_socketcall+0xb8/0x1d4
> [   44.640073] [c6813f40] [c000d1ac] ret_from_syscall+0x0/0x38
> [   44.645530] Instruction dump:
> [   44.648465] 38c1 7f63db78 4e800421 7c791b78 54690ffe 0f09 80ff0190 2f87
> [   44.656122] 40befe50 2f990001 409e0210 813f01bc <8129000c> b39e003a 7d29c214 913e003c
> 
> This patch fixes that Oops by checking if src is NULL.
> 
> Fixes: 6a1e8d14156d4 ("crypto: talitos - making mapping helpers more generic")
> Cc: 
> Signed-off-by: Christophe Leroy 

Isn't this needed also in 4.15.y?

Thanks,
Horia


Re: [PATCH 16/18] crypto: talitos - do hw_context DMA mapping outside the requests

2018-02-20 Thread Horia Geantă
On 2/20/2018 12:34 PM, Herbert Xu wrote:
> On Mon, Feb 19, 2018 at 01:16:30PM +0000, Horia Geantă wrote:
>>
>>> And what about ALGIF path from user space ?
>>> What if the user never calls the last sendmsg() which will call 
>>> hash_finup() ?
>>>
>> User is expected to follow the rules of the crypto API.
>> Of course, kernel won't (or at least shouldn't) crash in case of misuse.
>> However, in these cases some resources might not be freed - it's unavoidable.
> 
> the crypto API does not require the presence of a finalisation.
> It is entirely optional.  So leaving resources pinned down until
> final/finup occurs is unacceptable, both from user-space and the
> kernel.
> 
If final/finup is optional, how is the final hash supposed to be retrieved?

According to documentation, these are the accepted flows (with the option to
export/import a partial hash b/w update and final/finup):

.init() -> .update() -> .final()
             ^     |        |
             '-----'        '---> HASH

.init() -> .update() -> .finup()
             ^     |        |
             '-----'        '---> HASH

.digest()
    |
    '---> HASH

Note that digest() is not an issue in the case we are discussing, since resource
allocation happens only in init().

Thanks,
Horia
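
Expressed against the ahash API, the two stateful flows above look roughly like this (error handling omitted):

	struct ahash_request *req = ahash_request_alloc(tfm, GFP_KERNEL);

	ahash_request_set_crypt(req, sg, out, nbytes);
	crypto_ahash_init(req);		/* set up the hash state */
	crypto_ahash_update(req);	/* may be called repeatedly */
	crypto_ahash_final(req);	/* or crypto_ahash_finup() / export */
	ahash_request_free(req);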


Re: [PATCH 16/18] crypto: talitos - do hw_context DMA mapping outside the requests

2018-02-19 Thread Horia Geantă
On 2/19/2018 11:14 AM, Christophe LEROY wrote:
> Le 19/02/2018 à 09:30, Horia Geantă a écrit :
>> On 2/19/2018 9:58 AM, Christophe LEROY wrote:
>>> Le 18/02/2018 à 18:14, Horia Geantă a écrit :
>>>> There is no ahash_exit() callback mirroring ahash_init().
>>>>
>>>> The clean-up of request ctx should be done in the last states of the hash 
>>>> flows
>>>> described here:
>>>> https://www.kernel.org/doc/html/latest/crypto/devel-algos.html#cipher-definition-with-struct-shash-alg-and-ahash-alg
>>>> for e.g. in the final() callback.
>>>
>>> Unfortunatly it seems that we can't rely on those finalising functions
>>> being called all the time.
>>> If you look into test_ahash_jiffies() for instance, in case of error the
>>> call of crypto_hash_final() is skipped.
>>
>> If test_ahash_jiffies() errors before calling crypto_ahash_final(req), this
>> means a previous callback failed.
>> Accordingly, DMA unmapping should be performed also on the corresponding
>> error paths in the driver.
>>
> 
> And what about ALGIF path from user space ?
> What if the user never calls the last sendmsg() which will call 
> hash_finup() ?
> 
User is expected to follow the rules of the crypto API.
Of course, kernel won't (or at least shouldn't) crash in case of misuse.
However, in these cases some resources might not be freed - it's unavoidable.

Horia


Re: [PATCH 16/18] crypto: talitos - do hw_context DMA mapping outside the requests

2018-02-19 Thread Horia Geantă
On 2/19/2018 9:58 AM, Christophe LEROY wrote:
> Le 18/02/2018 à 18:14, Horia Geantă a écrit :
>> There is no ahash_exit() callback mirroring ahash_init().
>>
>> The clean-up of request ctx should be done in the last states of the hash 
>> flows
>> described here:
>> https://www.kernel.org/doc/html/latest/crypto/devel-algos.html#cipher-definition-with-struct-shash-alg-and-ahash-alg
>> for e.g. in the final() callback.
> 
> Unfortunatly it seems that we can't rely on those finalising functions 
> being called all the time.
> If you look into test_ahash_jiffies() for instance, in case of error the 
> call of crypto_hash_final() is skipped.

If test_ahash_jiffies() errors before calling crypto_ahash_final(req), this
means a previous callback failed.
Accordingly, DMA unmapping should be performed also on the corresponding error
paths in the driver.

Horia


Re: [PATCH 16/18] crypto: talitos - do hw_context DMA mapping outside the requests

2018-02-18 Thread Horia Geantă
On 2/17/2018 6:32 PM, Christophe LEROY wrote:
> 
> 
> Le 07/02/2018 à 15:39, Horia Geantă a écrit :
>> On 10/6/2017 4:06 PM, Christophe Leroy wrote:
>>> At every request, we map and unmap the same hash hw_context.
>>>
>>> This patch moves the dma mapping/unmapping in functions ahash_init()
>>> and ahash_import().
>>>
>>> Signed-off-by: Christophe Leroy <christophe.le...@c-s.fr>
>>> ---
>>>   drivers/crypto/talitos.c | 80 
>>> ++--
>>>   1 file changed, 57 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
>>> index ebfd6d982ed6..d495649d5267 100644
>>> --- a/drivers/crypto/talitos.c
>>> +++ b/drivers/crypto/talitos.c
>>> @@ -819,6 +819,7 @@ struct talitos_ctx {
>>> unsigned int keylen;
>>> unsigned int enckeylen;
>>> unsigned int authkeylen;
>>> +   dma_addr_t dma_hw_context;
>> This doesn't look correct.
>>
>> talitos_ctx structure is the tfm context.
>> dma_hw_context is the IOVA of hw_context, located in talitos_ahash_req_ctx
>> structure (request context).
> 
> Yes but I have not found how I can know that the request context is
> being released in order to unmap() dma at that time.
> It is tricky to use the tfm context I agree, but at least I know when
> the tfm context gets destroyed, i.e. in talitos_cra_exit_ahash().
> The request context is created by ahash_request_alloc() and released by
> ahash_request_free(). I have not found the way to call dma_unmap() 
> before ahash_request_free() gets called.
> 
>>
>> If there are multiple requests in flight for the same tfm, dma_hw_context 
>> will
>> be overwritten.
> 
> Before overwriting dma_hw_context, it is always released, see
> talitos_cra_exit_ahash(), ahash_init(), ahash_import()
> 
The problem is not the unmapping.
If there are two requests for the same tfm, then given the following sequence
1. tfm->ahash_init(req1)
tfm_ctx->dma_hw_context points to req1_ctx->hw_context
2. tfm->ahash_init(req2)
tfm_ctx->dma_hw_context [unmapped, then] points to req2_ctx->hw_context
i.e. req1 will use the hw_context of req2.

>>
>> dma_hw_context needs to be moved in request context (talitos_ahash_req_ctx 
>> struct).
> 
> Any suggestion then on how to handle the issue explained above ?
> 
There is no ahash_exit() callback mirroring ahash_init().

The clean-up of request ctx should be done in the last states of the hash flows
described here:
https://www.kernel.org/doc/html/latest/crypto/devel-algos.html#cipher-definition-with-struct-shash-alg-and-ahash-alg
for e.g. in the final() callback.

Hope this helps,
Horia
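
The direction implied by the race above is to keep the handle per-request; roughly (a sketch, layout illustrative):

	struct talitos_ahash_req_ctx {
		u8 hw_context[TALITOS_MDEU_MAX_CONTEXT_SIZE];
		dma_addr_t dma_hw_context;	/* per-request, so concurrent
						 * requests on the same tfm
						 * cannot clobber each other */
		/* ... */
	};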


Re: [PATCH 16/18] crypto: talitos - do hw_context DMA mapping outside the requests

2018-02-07 Thread Horia Geantă
On 10/6/2017 4:06 PM, Christophe Leroy wrote:
> At every request, we map and unmap the same hash hw_context.
> 
> This patch moves the dma mapping/unmapping in functions ahash_init()
> and ahash_import().
> 
> Signed-off-by: Christophe Leroy 
> ---
>  drivers/crypto/talitos.c | 80 
> ++--
>  1 file changed, 57 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
> index ebfd6d982ed6..d495649d5267 100644
> --- a/drivers/crypto/talitos.c
> +++ b/drivers/crypto/talitos.c
> @@ -819,6 +819,7 @@ struct talitos_ctx {
>   unsigned int keylen;
>   unsigned int enckeylen;
>   unsigned int authkeylen;
> + dma_addr_t dma_hw_context;
This doesn't look correct.

talitos_ctx structure is the tfm context.
dma_hw_context is the IOVA of hw_context, located in talitos_ahash_req_ctx
structure (request context).

If there are multiple requests in flight for the same tfm, dma_hw_context will
be overwritten.

dma_hw_context needs to be moved in request context (talitos_ahash_req_ctx 
struct).

Thanks,
Horia


Re: [PATCH 00/18] crypto: talitos - fixes and performance improvement

2017-12-08 Thread Horia Geantă
On 10/12/2017 6:20 PM, Herbert Xu wrote:
> On Fri, Oct 06, 2017 at 03:04:31PM +0200, Christophe Leroy wrote:
>> This serie fixes and improves the talitos crypto driver.
>>
>> First 6 patchs are fixes of failures reported by the new tests in the
>> kernel crypto test manager.
>>
Looks like these fixes are required also on older 4.9+ -stable kernels.
(I haven't seen them on latest 4.9.68-stable mail from Greg, even though
they are in main tree.)

In case you agree, what would be the recommended way to add the patches
to -stable?

Thanks,
Horia

>> The 8 following patches are cleanups and simplifications.
>>
>> The last 4 ones are performance improvement. The main improvement is
>> in the one before the last, it divides by 2 the time needed for a md5
>> hash on the SEC1.
>>
>> Christophe Leroy (18):
>>   crypto: talitos - fix AEAD test failures
>>   crypto: talitos - fix memory corruption on SEC2
>>   crypto: talitos - fix setkey to check key weakness
>>   crypto: talitos - fix AEAD for sha224 on non sha224 capable chips
>>   crypto: talitos - fix use of sg_link_tbl_len
>>   crypto: talitos - fix ctr-aes-talitos
>>   crypto: talitos - zeroize the descriptor with memset()
>>   crypto: talitos - declare local functions static
>>   crypto: talitos - use devm_kmalloc()
>>   crypto: talitos - use of_property_read_u32()
>>   crypto: talitos - use devm_ioremap()
>>   crypto: talitos - don't check the number of channels at each interrupt
>>   crypto: talitos - remove to_talitos_ptr_len()
>>   crypto: talitos - simplify tests in ipsec_esp()
>>   crypto: talitos - DMA map key in setkey()
>>   crypto: talitos - do hw_context DMA mapping outside the requests
>>   crypto: talitos - chain in buffered data for ahash on SEC1
>>   crypto: talitos - avoid useless copy
> 
> All applied.  Thanks.
> 


[PATCH v2 7/7] crypto: caam: cleanup CONFIG_64BIT ifdefs when using io{read|write}64

2017-06-24 Thread Horia Geantă
Now that ioread64 and iowrite64 are always available we don't
need the ugly ifdefs to change their implementation when they
are not.

Signed-off-by: Logan Gunthorpe <log...@deltatee.com>
Cc: Horia Geantă <horia.gea...@nxp.com>
Cc: Dan Douglass <dan.dougl...@nxp.com>
Cc: Herbert Xu <herb...@gondor.apana.org.au>
Cc: "David S. Miller" <da...@davemloft.net>

Updated patch such that behaviour does not change
from i.MX workaround point of view.

Signed-off-by: Horia Geantă <horia.gea...@nxp.com>
---
 drivers/crypto/caam/regs.h | 33 -
 1 file changed, 4 insertions(+), 29 deletions(-)

diff --git a/drivers/crypto/caam/regs.h b/drivers/crypto/caam/regs.h
index 84d2f838a063..b893ebb24e65 100644
--- a/drivers/crypto/caam/regs.h
+++ b/drivers/crypto/caam/regs.h
@@ -134,50 +134,25 @@ static inline void clrsetbits_32(void __iomem *reg, u32 clear, u32 set)
  * base + 0x0000 : least-significant 32 bits
  * base + 0x0004 : most-significant 32 bits
  */
-#ifdef CONFIG_64BIT
 static inline void wr_reg64(void __iomem *reg, u64 data)
 {
+#ifndef CONFIG_CRYPTO_DEV_FSL_CAAM_IMX
 	if (caam_little_end)
 		iowrite64(data, reg);
 	else
-		iowrite64be(data, reg);
-}
-
-static inline u64 rd_reg64(void __iomem *reg)
-{
-	if (caam_little_end)
-		return ioread64(reg);
-	else
-		return ioread64be(reg);
-}
-
-#else /* CONFIG_64BIT */
-static inline void wr_reg64(void __iomem *reg, u64 data)
-{
-#ifndef CONFIG_CRYPTO_DEV_FSL_CAAM_IMX
-	if (caam_little_end) {
-		wr_reg32((u32 __iomem *)(reg) + 1, data >> 32);
-		wr_reg32((u32 __iomem *)(reg), data);
-	} else
 #endif
-	{
-		wr_reg32((u32 __iomem *)(reg), data >> 32);
-		wr_reg32((u32 __iomem *)(reg) + 1, data);
-	}
+		iowrite64be(data, reg);
 }
 
 static inline u64 rd_reg64(void __iomem *reg)
 {
 #ifndef CONFIG_CRYPTO_DEV_FSL_CAAM_IMX
 	if (caam_little_end)
-		return ((u64)rd_reg32((u32 __iomem *)(reg) + 1) << 32 |
-			(u64)rd_reg32((u32 __iomem *)(reg)));
+		return ioread64(reg);
 	else
 #endif
-	return ((u64)rd_reg32((u32 __iomem *)(reg)) << 32 |
-		(u64)rd_reg32((u32 __iomem *)(reg) + 1));
+		return ioread64be(reg);
 }
-#endif /* CONFIG_64BIT */
 
 #ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
 #ifdef CONFIG_SOC_IMX7D
-- 
2.12.0.264.gd6db3f216544



Re: [PATCH 7/7] crypto: caam: cleanup CONFIG_64BIT ifdefs when using io{read|write}64

2017-06-23 Thread Horia Geantă
On 6/22/2017 7:49 PM, Logan Gunthorpe wrote:
> Now that ioread64 and iowrite64 are always available we don't
> need the ugly ifdefs to change their implementation when they
> are not.
> 
Thanks Logan.

Note however this is not equivalent - it changes the behaviour, since
CAAM engine on i.MX6S/SL/D/Q platforms is broken in terms of 64-bit
register endianness - see CONFIG_CRYPTO_DEV_FSL_CAAM_IMX usage in code
you are removing.

[Yes, current code has its problems, as it does not differentiate b/w
i.MX platforms with and without the (unofficial) erratum, but this
should be fixed separately.]

Below is the change that would keep current logic - still forcing i.MX
to write CAAM 64-bit registers in BE even if the engine is LE (yes, diff
is doing a poor job).

Horia

diff --git a/drivers/crypto/caam/regs.h b/drivers/crypto/caam/regs.h
index 84d2f838a063..b893ebb24e65 100644
--- a/drivers/crypto/caam/regs.h
+++ b/drivers/crypto/caam/regs.h
@@ -134,50 +134,25 @@ static inline void clrsetbits_32(void __iomem *reg, u32 clear, u32 set)
  * base + 0x0000 : least-significant 32 bits
  * base + 0x0004 : most-significant 32 bits
  */
-#ifdef CONFIG_64BIT
 static inline void wr_reg64(void __iomem *reg, u64 data)
 {
+#ifndef CONFIG_CRYPTO_DEV_FSL_CAAM_IMX
 	if (caam_little_end)
 		iowrite64(data, reg);
 	else
-		iowrite64be(data, reg);
-}
-
-static inline u64 rd_reg64(void __iomem *reg)
-{
-	if (caam_little_end)
-		return ioread64(reg);
-	else
-		return ioread64be(reg);
-}
-
-#else /* CONFIG_64BIT */
-static inline void wr_reg64(void __iomem *reg, u64 data)
-{
-#ifndef CONFIG_CRYPTO_DEV_FSL_CAAM_IMX
-	if (caam_little_end) {
-		wr_reg32((u32 __iomem *)(reg) + 1, data >> 32);
-		wr_reg32((u32 __iomem *)(reg), data);
-	} else
 #endif
-	{
-		wr_reg32((u32 __iomem *)(reg), data >> 32);
-		wr_reg32((u32 __iomem *)(reg) + 1, data);
-	}
+		iowrite64be(data, reg);
 }
 
 static inline u64 rd_reg64(void __iomem *reg)
 {
 #ifndef CONFIG_CRYPTO_DEV_FSL_CAAM_IMX
 	if (caam_little_end)
-		return ((u64)rd_reg32((u32 __iomem *)(reg) + 1) << 32 |
-			(u64)rd_reg32((u32 __iomem *)(reg)));
+		return ioread64(reg);
 	else
 #endif
-	return ((u64)rd_reg32((u32 __iomem *)(reg)) << 32 |
-		(u64)rd_reg32((u32 __iomem *)(reg) + 1));
+		return ioread64be(reg);
 }
-#endif /* CONFIG_64BIT */

 #ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
 #ifdef CONFIG_SOC_IMX7D


> Signed-off-by: Logan Gunthorpe <log...@deltatee.com>
> Cc: "Horia Geantă" <horia.gea...@nxp.com>
> Cc: Dan Douglass <dan.dougl...@nxp.com>
> Cc: Herbert Xu <herb...@gondor.apana.org.au>
> Cc: "David S. Miller" <da...@davemloft.net>
> ---
>  drivers/crypto/caam/regs.h | 29 -
>  1 file changed, 29 deletions(-)
> 
> diff --git a/drivers/crypto/caam/regs.h b/drivers/crypto/caam/regs.h
> index 84d2f838a063..26fc19dd0c39 100644
> --- a/drivers/crypto/caam/regs.h
> +++ b/drivers/crypto/caam/regs.h
> @@ -134,7 +134,6 @@ static inline void clrsetbits_32(void __iomem *reg, u32 clear, u32 set)
>   * base + 0x0000 : least-significant 32 bits
>   * base + 0x0004 : most-significant 32 bits
>   */
> -#ifdef CONFIG_64BIT
>  static inline void wr_reg64(void __iomem *reg, u64 data)
>  {
>   if (caam_little_end)
> @@ -151,34 +150,6 @@ static inline u64 rd_reg64(void __iomem *reg)
>   return ioread64be(reg);
>  }
>  
> -#else /* CONFIG_64BIT */
> -static inline void wr_reg64(void __iomem *reg, u64 data)
> -{
> -#ifndef CONFIG_CRYPTO_DEV_FSL_CAAM_IMX
> - if (caam_little_end) {
> - wr_reg32((u32 __iomem *)(reg) + 1, data >> 32);
> - wr_reg32((u32 __iomem *)(reg), data);
> - } else
> -#endif
> - {
> - wr_reg32((u32 __iomem *)(reg), data >> 32);
> - wr_reg32((u32 __iomem *)(reg) + 1, data);
> - }
> -}
> -
> -static inline u64 rd_reg64(void __iomem *reg)
> -{
> -#ifndef CONFIG_CRYPTO_DEV_FSL_CAAM_IMX
> - if (caam_little_end)
> - return ((u64)rd_reg32((u32 __iomem *)(reg) + 1) << 32 |
> - (u64)rd_reg32((u32 __iomem *)(reg)));
> - else
> -#endif
> - return ((u64)rd_reg32((u32 __iomem *)(reg)) << 32 |
> - (u64)rd_reg32((u32 __iomem *)(reg) + 1));
> -}
> -#endif /* CONFIG_64BIT  */
> -
>  #ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
>  #ifdef CONFIG_SOC_IMX7D
>  #define cpu_to_caam_dma(value) \
> 


Re: [PATCH] powerpc: fix distclean with Makefile.postlink

2017-05-08 Thread Horia Geantă
On 5/8/2017 2:57 PM, Michael Ellerman wrote:
> Horia Geantă <horia.gea...@nxp.com> writes:
> 
>> Makefile.postlink always includes include/config/auto.conf, however
>> this file is not present in a clean kernel tree, causing make to fail:
>>
>> arch/powerpc/Makefile.postlink:10: include/config/auto.conf: No such file or directory
>> make[1]: *** No rule to make target `include/config/auto.conf'.  Stop.
>> make: *** [vmlinuxclean] Error 2
>>
>> Change the inclusion such that file not being found does not trigger
>> an error.
>>
>> Fixes: f188d0524d7e ("powerpc: Use the new post-link pass to check relocations")
> 
> I can't reproduce this. What exact steps are you doing? And what version
> of Make?
> 
Start with a clean kernel tree and then
make distclean
arch/powerpc/Makefile.postlink:10: include/config/auto.conf: No such file or directory
make[1]: *** No rule to make target `include/config/auto.conf'.  Stop.
make: *** [vmlinuxclean] Error 2

make --version
GNU Make 3.82
Built for x86_64-redhat-linux-gnu
Copyright (C) 2010  Free Software Foundation, Inc.
[...]

The fix is basically the same as:
6e5b95cdbd0e MIPS: Fix distclean with Makefile.postlink

Regards,
Horia



[PATCH] powerpc: fix distclean with Makefile.postlink

2017-05-08 Thread Horia Geantă
Makefile.postlink always includes include/config/auto.conf, however
this file is not present in a clean kernel tree, causing make to fail:

arch/powerpc/Makefile.postlink:10: include/config/auto.conf: No such file or directory
make[1]: *** No rule to make target `include/config/auto.conf'.  Stop.
make: *** [vmlinuxclean] Error 2

Change the inclusion such that file not being found does not trigger
an error.

Fixes: f188d0524d7e ("powerpc: Use the new post-link pass to check relocations")
Reported-by: Mircea Pop <mircea@nxp.com>
Signed-off-by: Horia Geantă <horia.gea...@nxp.com>
---
 arch/powerpc/Makefile.postlink | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Makefile.postlink b/arch/powerpc/Makefile.postlink
index 3c22d64b2de9..eccfcc88afae 100644
--- a/arch/powerpc/Makefile.postlink
+++ b/arch/powerpc/Makefile.postlink
@@ -7,7 +7,7 @@
 PHONY := __archpost
 __archpost:
 
-include include/config/auto.conf
+-include include/config/auto.conf
 include scripts/Kbuild.include
 
 quiet_cmd_relocs_check = CHKREL  $@
-- 
2.12.0.264.gd6db3f216544



Re: [PATCH v2] crypto: talitos: Extend max key length for SHA384/512-HMAC and AEAD

2017-05-02 Thread Horia Geantă
On 5/2/2017 4:38 PM, Martin Hicks wrote:
> 
> An updated patch that also handles the additional key length requirements
> for the AEAD algorithms.
> 
> The max keysize is not 96.  For SHA384/512 it's 128, and for the AEAD
> algorithms it's longer still.  Extend the max keysize to cover the
> largest AEAD case, AES256 + HMAC(SHA512).
> 
> Cc: <sta...@vger.kernel.org> # 3.6+
> Fixes: 357fb60502ede ("crypto: talitos - add sha224, sha384 and sha512 to existing AEAD algorithms")
> Signed-off-by: Martin Hicks <m...@bork.org>
Acked-by: Horia Geantă <horia.gea...@nxp.com>

Thanks,
Horia



Re: [PATCH] crypto: talitos: Extend max key length for SHA384/512-HMAC

2017-04-28 Thread Horia Geantă
On 4/27/2017 6:46 PM, Martin Hicks wrote:
> 
> The max keysize for both of these is 128, not 96.  Before, with keysizes
> over 96, the memcpy in ahash_setkey() would overwrite memory beyond the
> key field.
> 
While here, what about aead_setkey()?
AFAICT, TALITOS_MAX_KEY_SIZE value has been incorrect since forever; it
does not cover the maximum values of encryption key + authentication key
for authenc algorithms.

So, based on algorithms currently registered, I think this should be:
#define TALITOS_MAX_KEY_SIZE	(AES_MAX_KEY_SIZE + \
				 SHA512_BLOCK_SIZE)
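
A hedged sketch of the corresponding guard in the setkey path (illustrative; mirrors the memcpy mentioned above):

	static int ahash_setkey(struct crypto_ahash *tfm, const u8 *key,
				unsigned int keylen)
	{
		struct talitos_ctx *ctx = crypto_ahash_ctx(tfm);

		if (keylen > TALITOS_MAX_KEY_SIZE)
			return -EINVAL;	/* reject instead of overflowing ctx->key */

		memcpy(ctx->key, key, keylen);
		ctx->keylen = keylen;
		return 0;
	}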


> Signed-off-by: Martin Hicks 
Cc: sta...@vger.kernel.org # 3.6+
Fixes: 357fb60502ede ("crypto: talitos - add sha224, sha384 and sha512 to existing AEAD algorithms")

> ---
>  drivers/crypto/talitos.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
> index 0bba6a1..97dc85e 100644
> --- a/drivers/crypto/talitos.c
> +++ b/drivers/crypto/talitos.c
> @@ -816,7 +816,7 @@ static void talitos_unregister_rng(struct device *dev)
>   * HMAC_SNOOP_NO_AFEA (HSNA) instead of type IPSEC_ESP
>   */
>  #define TALITOS_CRA_PRIORITY_AEAD_HSNA   (TALITOS_CRA_PRIORITY - 1)
> -#define TALITOS_MAX_KEY_SIZE		96
> +#define TALITOS_MAX_KEY_SIZE		SHA512_BLOCK_SIZE /* SHA512 has the largest keysize input */
>  #define TALITOS_MAX_IV_LENGTH		16 /* max of AES_BLOCK_SIZE, DES3_EDE_BLOCK_SIZE */
>  
>  struct talitos_ctx {
> 



[PATCH v3 4/8] powerpc: add io{read,write}64 accessors

2016-05-19 Thread Horia Geantă
This will allow device drivers to consistently use io{read,write}XX
also for 64-bit accesses.

Acked-by: Michael Ellerman <m...@ellerman.id.au>
Signed-off-by: Horia Geantă <horia.gea...@nxp.com>
---
 arch/powerpc/kernel/iomap.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/arch/powerpc/kernel/iomap.c b/arch/powerpc/kernel/iomap.c
index 12e48d56f771..3963f0b68d52 100644
--- a/arch/powerpc/kernel/iomap.c
+++ b/arch/powerpc/kernel/iomap.c
@@ -38,6 +38,18 @@ EXPORT_SYMBOL(ioread16);
 EXPORT_SYMBOL(ioread16be);
 EXPORT_SYMBOL(ioread32);
 EXPORT_SYMBOL(ioread32be);
+#ifdef __powerpc64__
+u64 ioread64(void __iomem *addr)
+{
+   return readq(addr);
+}
+u64 ioread64be(void __iomem *addr)
+{
+   return readq_be(addr);
+}
+EXPORT_SYMBOL(ioread64);
+EXPORT_SYMBOL(ioread64be);
+#endif /* __powerpc64__ */
 
 void iowrite8(u8 val, void __iomem *addr)
 {
@@ -64,6 +76,18 @@ EXPORT_SYMBOL(iowrite16);
 EXPORT_SYMBOL(iowrite16be);
 EXPORT_SYMBOL(iowrite32);
 EXPORT_SYMBOL(iowrite32be);
+#ifdef __powerpc64__
+void iowrite64(u64 val, void __iomem *addr)
+{
+   writeq(val, addr);
+}
+void iowrite64be(u64 val, void __iomem *addr)
+{
+   writeq_be(val, addr);
+}
+EXPORT_SYMBOL(iowrite64);
+EXPORT_SYMBOL(iowrite64be);
+#endif /* __powerpc64__ */
 
 /*
  * These are the "repeat read/write" functions. Note the
-- 
2.4.4
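
A short usage sketch of what the accessors enable in drivers (the register offset is hypothetical):

	void __iomem *base = ioremap(res->start, resource_size(res));
	u64 v;

	v = ioread64be(base + 0x100);		/* one 64-bit BE MMIO read */
	iowrite64be(v | 1, base + 0x100);	/* and write, no CONFIG_64BIT ifdefs */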


[PATCH v2 4/8] powerpc: add io{read,write}64 accessors

2016-05-05 Thread Horia Geantă
This will allow device drivers to consistently use io{read,write}XX
also for 64-bit accesses.

Signed-off-by: Horia Geantă <horia.gea...@nxp.com>
---
 arch/powerpc/kernel/iomap.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/arch/powerpc/kernel/iomap.c b/arch/powerpc/kernel/iomap.c
index 12e48d56f771..3963f0b68d52 100644
--- a/arch/powerpc/kernel/iomap.c
+++ b/arch/powerpc/kernel/iomap.c
@@ -38,6 +38,18 @@ EXPORT_SYMBOL(ioread16);
 EXPORT_SYMBOL(ioread16be);
 EXPORT_SYMBOL(ioread32);
 EXPORT_SYMBOL(ioread32be);
+#ifdef __powerpc64__
+u64 ioread64(void __iomem *addr)
+{
+   return readq(addr);
+}
+u64 ioread64be(void __iomem *addr)
+{
+   return readq_be(addr);
+}
+EXPORT_SYMBOL(ioread64);
+EXPORT_SYMBOL(ioread64be);
+#endif /* __powerpc64__ */
 
 void iowrite8(u8 val, void __iomem *addr)
 {
@@ -64,6 +76,18 @@ EXPORT_SYMBOL(iowrite16);
 EXPORT_SYMBOL(iowrite16be);
 EXPORT_SYMBOL(iowrite32);
 EXPORT_SYMBOL(iowrite32be);
+#ifdef __powerpc64__
+void iowrite64(u64 val, void __iomem *addr)
+{
+   writeq(val, addr);
+}
+void iowrite64be(u64 val, void __iomem *addr)
+{
+   writeq_be(val, addr);
+}
+EXPORT_SYMBOL(iowrite64);
+EXPORT_SYMBOL(iowrite64be);
+#endif /* __powerpc64__ */
 
 /*
  * These are the "repeat read/write" functions. Note the
-- 
2.4.4


[PATCH 3/7] powerpc: add io{read,write}64 accessors

2016-05-04 Thread Horia Geantă
This will allow device drivers to consistently use io{read,write}XX
also for 64-bit accesses.

Signed-off-by: Horia Geantă <horia.gea...@nxp.com>
---
 arch/powerpc/kernel/iomap.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/arch/powerpc/kernel/iomap.c b/arch/powerpc/kernel/iomap.c
index 12e48d56f771..3963f0b68d52 100644
--- a/arch/powerpc/kernel/iomap.c
+++ b/arch/powerpc/kernel/iomap.c
@@ -38,6 +38,18 @@ EXPORT_SYMBOL(ioread16);
 EXPORT_SYMBOL(ioread16be);
 EXPORT_SYMBOL(ioread32);
 EXPORT_SYMBOL(ioread32be);
+#ifdef __powerpc64__
+u64 ioread64(void __iomem *addr)
+{
+   return readq(addr);
+}
+u64 ioread64be(void __iomem *addr)
+{
+   return readq_be(addr);
+}
+EXPORT_SYMBOL(ioread64);
+EXPORT_SYMBOL(ioread64be);
+#endif /* __powerpc64__ */
 
 void iowrite8(u8 val, void __iomem *addr)
 {
@@ -64,6 +76,18 @@ EXPORT_SYMBOL(iowrite16);
 EXPORT_SYMBOL(iowrite16be);
 EXPORT_SYMBOL(iowrite32);
 EXPORT_SYMBOL(iowrite32be);
+#ifdef __powerpc64__
+void iowrite64(u64 val, void __iomem *addr)
+{
+   writeq(val, addr);
+}
+void iowrite64be(u64 val, void __iomem *addr)
+{
+   writeq_be(val, addr);
+}
+EXPORT_SYMBOL(iowrite64);
+EXPORT_SYMBOL(iowrite64be);
+#endif /* __powerpc64__ */
 
 /*
  * These are the "repeat read/write" functions. Note the
-- 
2.4.4


Re: [PATCH v2 5/5] crypto: talitos: Add software backlog queue handling

2015-03-19 Thread Horia Geantă
On 3/18/2015 12:03 AM, Kim Phillips wrote:
 On Tue, 17 Mar 2015 19:58:55 +0200
 Horia Geantă horia.gea...@freescale.com wrote:
 
 On 3/17/2015 2:19 AM, Kim Phillips wrote:
 On Mon, 16 Mar 2015 12:02:51 +0200
 Horia Geantă horia.gea...@freescale.com wrote:

 On 3/4/2015 2:23 AM, Kim Phillips wrote:
 Only potential problem is getting the crypto API to set the GFP_DMA
 flag in the allocation request, but presumably a
 CRYPTO_TFM_REQ_DMA crt_flag can be made to handle that.

 Seems there are quite a few places that do not use the
 {aead,ablkcipher,ahash}_request_alloc() API to allocate crypto requests.
 Among them, IPsec and dm-crypt.
 I've looked at the code and I don't think it can be converted to use
 crypto API.

 why not?

 It would imply having 2 memory allocations, one for crypto request and
 the other for the rest of the data bundled with the request (for IPsec
 that would be ESN + space for IV + sg entries for authenticated-only
 data and sk_buff extension, if needed).

 Trying to have a single allocation by making ESN, IV etc. part of the
 request private context requires modifying tfm.reqsize on the fly.
 This won't work without adding some kind of locking for the tfm.
 
 can't a common minimum tfm.reqsize be co-established up front, at
 least for the fast path?

Indeed, for IPsec at tfm allocation time - esp_init_state() -
tfm.reqsize could be increased to account for what is known for a given
flow: ESN, IV and asg (S/G entries for authenticated-only data).
The layout would be:
aead request (fixed part)
private ctx of backend algorithm
seq_no_hi (if ESN)
IV
asg
sg -- S/G table for skb_to_sgvec; how many entries is the question

Do you have a suggestion for how many S/G entries to preallocate for
representing the sk_buff data to be encrypted?
An ancient esp4.c used ESP_NUM_FAST_SG, set to 4.
Btw, currently maximum number of fragments supported by the net stack
(MAX_SKB_FRAGS) is 16 or more.

 This means that the CRYPTO_TFM_REQ_DMA would be visible to all of these
 places. Some of the maintainers do not agree, as you've seen.

 would modifying the crypto API to either have a different
 *_request_alloc() API, and/or adding calls to negotiate the GFP mask
 between crypto users and drivers, e.g., get/set_gfp_mask, work?

 I think what DaveM asked for was the change to be transparent.

 Besides converting to *_request_alloc(), seems that all other options
 require some extra awareness from the user.
 Could you elaborate on the idea above?
 
 was merely suggesting communicating GFP flags anonymously across the
 API, i.e., GFP_DMA wouldn't appear in user code.

Meaning user would have to get_gfp_mask before allocating a crypto
request - i.e. instead of kmalloc(..., GFP_ATOMIC) to have
kmalloc(GFP_ATOMIC | get_gfp_mask(aead))?

 An alternative would be for talitos to use the page allocator to get 1 /
 2 pages at probe time (4 channels x 32 entries/channel x 64B/descriptor
 = 8 kB), dma_map_page the area and manage it internally for talitos_desc
 hw descriptors.
 What do you think?

 There's a comment in esp_alloc_tmp(): Use spare space in skb for
 this where possible, which is ideally where we'd want to be (esp.

 Ok, I'll check that. But note the where possible - finding room in the
 skb to avoid the allocation won't always be the case, and then we're
 back to square one.

So the skb cb is out of the question, being too small (48B).
Any idea what was the intention of the TODO - maybe to use the
tailroom in the skb data area?

 because that memory could already be DMA-able).  Your above
 suggestion would be in the opposite direction of that.

 The proposal:
 -removes dma (un)mapping on the fast path
 
 sure, but at the expense of additional complexity.

Right, there's no free lunch. But it's cheaper.

 -avoids requesting dma mappable memory for more than it's actually
 needed (CRYPTO_TFM_REQ_DMA forces entire request to be mappable, not
 only its private context)
 
 compared to the payload?  Plus, we have plenty of DMA space these
 days.
 
 -for caam it has the added benefit of speeding the below search for the
 offending descriptor in the SW ring from O(n) to O(1):
 for (i = 0; CIRC_CNT(head, tail + i, JOBR_DEPTH) >= 1; i++) {
 	sw_idx = (tail + i) & (JOBR_DEPTH - 1);
 
 	if (jrp->outring[hw_idx].desc ==
 	    jrp->entinfo[sw_idx].desc_addr_dma)
 		break; /* found */
 }
 (drivers/crypto/caam/jr.c - caam_dequeue)
 
 how?  The job ring h/w will still be spitting things out
 out-of-order.

jrp->outring[hw_idx].desc bus address can be used to find the sw_idx in
O(1):

dma_addr_t desc_base = dma_map_page(alloc_page(GFP_DMA),...);
[...]
sw_idx = (jrp->outring[hw_idx].desc - desc_base) / JD_SIZE;

JD_SIZE would be 16 words (64B) - 13 words used for the h/w job
descriptor, 3 words can be used for something else.
Basically all JDs would be filled at a 64B-aligned offset in the memory
page.

 Plus, like I said, it's taking the problem in the wrong direction:
 we need to strive to merge the allocation

Re: [PATCH v2 5/5] crypto: talitos: Add software backlog queue handling

2015-03-17 Thread Horia Geantă
On 3/17/2015 2:19 AM, Kim Phillips wrote:
 On Mon, 16 Mar 2015 12:02:51 +0200
 Horia Geantă horia.gea...@freescale.com wrote:
 
 On 3/4/2015 2:23 AM, Kim Phillips wrote:
 Only potential problem is getting the crypto API to set the GFP_DMA
 flag in the allocation request, but presumably a
 CRYPTO_TFM_REQ_DMA crt_flag can be made to handle that.

 Seems there are quite a few places that do not use the
 {aead,ablkcipher,ahash}_request_alloc() API to allocate crypto requests.
 Among them, IPsec and dm-crypt.
 I've looked at the code and I don't think it can be converted to use
 crypto API.
 
 why not?

It would imply having 2 memory allocations, one for crypto request and
the other for the rest of the data bundled with the request (for IPsec
that would be ESN + space for IV + sg entries for authenticated-only
data and sk_buff extension, if needed).

Trying to have a single allocation by making ESN, IV etc. part of the
request private context requires modifying tfm.reqsize on the fly.
This won't work without adding some kind of locking for the tfm.

 
 This means that the CRYPTO_TFM_REQ_DMA would be visible to all of these
 places. Some of the maintainers do not agree, as you've seen.
 
 would modifying the crypto API to either have a different
 *_request_alloc() API, and/or adding calls to negotiate the GFP mask
 between crypto users and drivers, e.g., get/set_gfp_mask, work?

I think what DaveM asked for was the change to be transparent.

Besides converting to *_request_alloc(), seems that all other options
require some extra awareness from the user.
Could you elaborate on the idea above?

 
 An alternative would be for talitos to use the page allocator to get 1 /
 2 pages at probe time (4 channels x 32 entries/channel x 64B/descriptor
 = 8 kB), dma_map_page the area and manage it internally for talitos_desc
 hw descriptors.
 What do you think?
 
 There's a comment in esp_alloc_tmp(): Use spare space in skb for
 this where possible, which is ideally where we'd want to be (esp.

Ok, I'll check that. But note the where possible - finding room in the
skb to avoid the allocation won't always be the case, and then we're
back to square one.

 because that memory could already be DMA-able).  Your above
 suggestion would be in the opposite direction of that.

The proposal:
-removes dma (un)mapping on the fast path
-avoids requesting dma mappable memory for more than it's actually
needed (CRYPTO_TFM_REQ_DMA forces entire request to be mappable, not
only its private context)
-for caam it has the added benefit of speeding the below search for the
offending descriptor in the SW ring from O(n) to O(1):
for (i = 0; CIRC_CNT(head, tail + i, JOBR_DEPTH) >= 1; i++) {
	sw_idx = (tail + i) & (JOBR_DEPTH - 1);

	if (jrp->outring[hw_idx].desc ==
	    jrp->entinfo[sw_idx].desc_addr_dma)
		break; /* found */
}
(drivers/crypto/caam/jr.c - caam_dequeue)

Horia



Re: [PATCH v2 5/5] crypto: talitos: Add software backlog queue handling

2015-03-16 Thread Horia Geantă
On 3/4/2015 2:23 AM, Kim Phillips wrote:
 Only potential problem is getting the crypto API to set the GFP_DMA
 flag in the allocation request, but presumably a
 CRYPTO_TFM_REQ_DMA crt_flag can be made to handle that.

Seems there are quite a few places that do not use the
{aead,ablkcipher,ahash}_request_alloc() API to allocate crypto requests.
Among them, IPsec and dm-crypt.
I've looked at the code and I don't think it can be converted to use
crypto API.

This means that the CRYPTO_TFM_REQ_DMA would be visible to all of these
places. Some of the maintainers do not agree, as you've seen.

An alternative would be for talitos to use the page allocator to get 1 /
2 pages at probe time (4 channels x 32 entries/channel x 64B/descriptor
= 8 kB), dma_map_page the area and manage it internally for talitos_desc
hw descriptors.
What do you think?

Thanks,
Horia
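
A hedged sketch of that proposal (sizes from the paragraph above; names hypothetical):

	/* 4 channels x 32 entries/channel x 64 B/descriptor = 8 kB (2 pages) */
	struct page *pg = alloc_pages(GFP_KERNEL | GFP_DMA, 1);
	dma_addr_t base = dma_map_page(dev, pg, 0, 2 * PAGE_SIZE,
				       DMA_BIDIRECTIONAL);

	/* hand out 64 B talitos_desc slots from this area internally */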



Re: [PATCH 0/2] crypto: talitos: Add AES-XTS mode

2015-03-11 Thread Horia Geantă
On 3/9/2015 5:08 PM, Martin Hicks wrote:
 On Mon, Mar 9, 2015 at 6:16 AM, Horia Geantă horia.gea...@freescale.com 
 wrote:
 On 3/3/2015 7:44 PM, Martin Hicks wrote:
 On Tue, Mar 3, 2015 at 10:44 AM, Horia Geantă
 horia.gea...@freescale.com wrote:

 For talitos, there are two cases:

 1. request data size is <= data unit / sector size
 talitos can handle any IV / tweak scheme

 2. request data size > sector size
 since talitos internally generates the IV for the next sector by
 incrementing the previous IV, only IV schemes that allocate consecutive
 IV to consecutive sectors will function correctly.


 it's not clear to me that #1 is right.  I guess it could be, but the
 IV length would be limited to 8 bytes.

 Yes, there's a limitation in talitos wrt. XTS IV / tweak size - it's up
 to 8 bytes.
 So I guess ESSIV won't work with talitos-xts, since the encrypted IV
 output is 16 bytes.
 But as previously said, ESSIV breaks the XTS standard requirement for
 having a consecutive IV for consecutive blocks. ESSIV should really be
 used only with disk-level encryption schemes that require an
 unpredictable IV.
 
 Ok.  I'll verify that the second half of the IV is zeroed.
 
 One last thing that I'm not sure of is what string to place in
 cra_ablkcipher.geniv field.   eseqiv seems wrong if plain/plain64
 are the IVs that XTS is designed for.

Right. But since currently dm-crypt does not use .givencrypt and deals
with IV generation by itself, we're safe.
When dm-crypt IV generation will be moved to crypto, we'll have to
revisit this.

While here: note that xts-talitos supports only two key lengths - 256
and 512 bits. There are tcrypt speed tests that check also for 384-bit
keys (which is out-of-spec, but still...), leading to a Key Size Error
- see below (KSE bit in AESU Interrupt Status Register is set)

testing speed of async xts(aes) (xts-aes-talitos) encryption
[...]
test 5 (384 bit key, 16 byte blocks):
talitos ffe3.crypto: CDPR is NULL, giving up search for offending
descriptor
talitos ffe3.crypto: AESUISR 0x_0200
talitos ffe3.crypto: DESCBUF 0x64300011_
talitos ffe3.crypto: DESCBUF 0x_
talitos ffe3.crypto: DESCBUF 0x0010_
talitos ffe3.crypto: DESCBUF 0x0030_
talitos ffe3.crypto: DESCBUF 0x0010_
talitos ffe3.crypto: DESCBUF 0x0010_
talitos ffe3.crypto: DESCBUF 0x0010_
talitos ffe3.crypto: DESCBUF 0x_
encryption() failed flags=0

So for xts, driver must enforce 256/512 bit key lengths and return
CRYPTO_TFM_RES_BAD_KEY_LEN in all other cases.
Or a SW fallback could be used for the other cases, but I don't think
it's worth the effort since these are non-standard.

Horia
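
A hedged sketch of the enforcement suggested (illustrative, against the ablkcipher API of that era):

	static int xts_aes_setkey(struct crypto_ablkcipher *tfm, const u8 *key,
				  unsigned int keylen)
	{
		/* accept only 2 x AES-128 or 2 x AES-256 keys */
		if (keylen != 2 * AES_KEYSIZE_128 &&
		    keylen != 2 * AES_KEYSIZE_256) {
			crypto_ablkcipher_set_flags(tfm, CRYPTO_TFM_RES_BAD_KEY_LEN);
			return -EINVAL;
		}
		/* ... store the key ... */
		return 0;
	}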



Re: [PATCH v2 5/5] crypto: talitos: Add software backlog queue handling

2015-03-09 Thread Horia Geantă
On 3/6/2015 6:48 AM, Herbert Xu wrote:
 On Thu, Mar 05, 2015 at 11:35:23AM +0200, Horia Geantă wrote:

 Only potential problem is getting the crypto API to set the GFP_DMA
 flag in the allocation request, but presumably a
 CRYPTO_TFM_REQ_DMA crt_flag can be made to handle that.

 Right. And this flag would apply only to request __ctx[].

 Herbert, would this be an acceptable addition to crypto API?
 
 How would such a flag work?

Hm, I thought that GFP_DMA memory could be allocated only for request
private ctx. This is obviously not the case.

*_request_alloc(tfm, gfp) crypto API functions would do:
if (crypto_tfm_get_flags(tfm) & CRYPTO_TFM_REQ_DMA)
gfp |= GFP_DMA;

Thanks,
Horia
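
Fleshed out, the proposal would look roughly like this (a sketch only — CRYPTO_TFM_REQ_DMA is the flag proposed in this thread, not an existing API):

	struct ahash_request *ahash_request_alloc(struct crypto_ahash *tfm,
						  gfp_t gfp)
	{
		if (crypto_tfm_get_flags(crypto_ahash_tfm(tfm)) & CRYPTO_TFM_REQ_DMA)
			gfp |= GFP_DMA;		/* request and __ctx[] DMA-able */

		return kmalloc(sizeof(struct ahash_request) +
			       crypto_ahash_reqsize(tfm), gfp);
	}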



Re: [PATCH 2/2] crypto: talitos: Add AES-XTS Support

2015-03-09 Thread Horia Geantă
On 3/7/2015 3:16 AM, Kim Phillips wrote:
 On Fri, 6 Mar 2015 11:49:43 -0500
 Martin Hicks m...@bork.org wrote:
 
 On Thu, Mar 5, 2015 at 7:16 PM, Kim Phillips kim.phill...@freescale.com 
 wrote:
 On Fri, 20 Feb 2015 12:00:10 -0500
 Martin Hicks m...@bork.org wrote:

 The newer talitos hardware has support for AES in XTS mode.

 Assuming it's the same thing, AES-XCBC gets added with SEC v3.0
 h/w.  Assuming hw_supports() doesn't already support this algorithm

 AES-XCBC isn't the same thing as AES-XTS.
 
 Thanks.
 
 combination (technically via the mode bit), this needs to be
 reflected in the patch so the driver doesn't think SEC 2.x devices
 can do XTS, too.

 Right.  I hadn't looked into how exactly hw_supports() works.  It only
 indicates which execution units are present (in this case the AES
 unit).  I actually think XTS gets introduced in SEC v3.3.2.  I also
 have an MPC8379 (sec3.3) and it does not have XTS.

 Can you look internally to find out in which hardware it was
 introduced?  Is there a SEC 3.3.1 that also has XTS?
 
 later MPC8315Es had a SEC v3.3.1, but AFAICT, it doesn't support
 XTS, so, yes, it's likely v3.3.2 and above (if any).

There's a public application note on freescale.com:
AN3645 - SEC 2x/3x Descriptor Programmer's Guide (Rev.3/2010)

Table 4 - EUs Supported in Each SEC Version summarizes which
algorithms / modes are supported for every talitos version.
Unfortunately this goes up to SEC 3.3.1.
Since XTS doesn't show up, 3.3.2 would be the first supporting it.

Horia



Re: [PATCH 0/2] crypto: talitos: Add AES-XTS mode

2015-03-09 Thread Horia Geantă
On 3/3/2015 7:44 PM, Martin Hicks wrote:
 On Tue, Mar 3, 2015 at 10:44 AM, Horia Geantă
 horia.gea...@freescale.com wrote:
 On 3/3/2015 12:09 AM, Martin Hicks wrote:

 On Mon, Mar 02, 2015 at 03:37:28PM +0100, Milan Broz wrote:

 If crypto API allows to encrypt more sectors in one run
 (handling IV internally) dmcrypt can be modified of course.

 But do not forget we can use another IV (not only sequential number)
 e.g. ESSIV with XTS as well (even if it doesn't make much sense, some 
 people
 are using it).

 Interesting, I'd not considered using XTS with an IV other than plain/64.
 The talitos hardware would not support aes/xts in any mode other than
 plain/plain64 I don't think...Although perhaps you could push in an 8-byte
 IV and the hardware would interpret it as the sector #.


 For talitos, there are two cases:

 1. request data size is <= data unit / sector size
 talitos can handle any IV / tweak scheme

 2. request data size > sector size
 since talitos internally generates the IV for the next sector by
 incrementing the previous IV, only IV schemes that allocate consecutive
 IV to consecutive sectors will function correctly.

 
 it's not clear to me that #1 is right.  I guess it could be, but the
 IV length would be limited to 8 bytes.

Yes, there's a limitation in talitos wrt. XTS IV / tweak size - it's up
to 8 bytes.
So I guess ESSIV won't work with talitos-xts, since the encrypted IV
output is 16 bytes.
But as previously said, ESSIV breaks the XTS standard requirement for
having a consecutive IV for consecutive blocks. ESSIV should really be
used only with disk-level encryption schemes that require an
unpredictable IV.

 
 This also points out that claiming that the XTS IV size is 16 bytes,
 as my current patch does, could be problematic.  It's handy because
 the first 8 bytes should contain a plain64 sector #, and the second
 u64 can be used to encode the sector size but it would be a mistake
 for someone to use the second 8 bytes for the rest of a 16byte IV.

XTS IV *is* 16 bytes. The fact that xts-talitos can handle only 8 bytes
is a problem indeed, but for plain and plain64 should not matter.

Horia



Re: [PATCH v2 5/5] crypto: talitos: Add software backlog queue handling

2015-03-05 Thread Horia Geantă
On 3/4/2015 2:23 AM, Kim Phillips wrote:
 On Tue,  3 Mar 2015 08:21:37 -0500
 Martin Hicks m...@bork.org wrote:
 
 @@ -1170,6 +1237,8 @@ static struct talitos_edesc *talitos_edesc_alloc(struct device *dev,
 						  edesc->dma_len,
 						  DMA_BIDIRECTIONAL);
 	edesc->req.desc = &edesc->desc;
 +	/* A copy of the crypto_async_request to use the crypto_queue backlog */
 +	memcpy(&edesc->req.base, areq, sizeof(struct crypto_async_request));
 
 this seems backward, or, at least can be done more efficiently IMO:
 talitos_cra_init should set the tfm's reqsize so the rest of
 the driver can wholly embed its talitos_edesc (which should also
 wholly encapsulate its talitos_request (i.e., not via a pointer))
 into the crypto API's request handle allocation.  This
 would absorb and eliminate the talitos_edesc kmalloc and frees, the
 above memcpy, and replace the container_of after the
  crypto_dequeue_request with an offsetof, right?
 
 When scatter-gather buffers are needed, we can assume a slower-path
 and make them do their own allocations, since their sizes vary
 depending on each request.  Of course, a pointer to those
 allocations would need to be retained somewhere in the request
 handle.
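
In code, the suggestion would boil down to something like this (a
sketch, under the assumption that reqsize was set at cra_init time):

	/* the edesc lives inside the request's private context area,
	 * so no separate kmalloc and no memcpy of req.base are needed */
	struct talitos_edesc *edesc = ablkcipher_request_ctx(areq);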

Unfortunately the talitos_edesc structure size is variable most of the
time. Its exact size can only be established at request time, and
not at tfm init time.

Fixed size would be sizeof(talitos_edesc).
Below are factors that influence the variable part, i.e. link_tbl in
talitos_edesc:
- whether any assoc / src / dst data is scattered
- icv_stashing (case when ICV checking is done in SW)

Still, we'd be better off with:
-crypto API allocates request + request context (i.e.
sizeof(talitos_edesc) + any alignment required)
-talitos driver allocates variable part of talitos_edesc (if needed)

instead of:
-crypto API allocates request
-talitos driver allocates talitos_edesc (fixed + variable)
-memcopy of the req.base (crypto_async_request) into talitos_edesc

both in terms of performance and readability.

At first look, the driver wouldn't change that much:
-talitos_cra_init() callback would have to set tfm.reqsize to
sizeof(talitos_edesc) + padding and also add the CRYPTO_TFM_REQ_DMA
indication in tfm.crt_flags
-talitos_edesc_alloc() logic would be pretty much the same, but would
allocate memory only for the link_tbl

I'm willing to do these changes if needed.
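
A minimal sketch of the first change (the function name is hypothetical,
and CRYPTO_TFM_REQ_DMA is the flag proposed below - it does not exist
yet):

	static int talitos_cra_init_ablkcipher(struct crypto_tfm *tfm)
	{
		/* reserve the fixed part of talitos_edesc in every request */
		tfm->crt_ablkcipher.reqsize = sizeof(struct talitos_edesc);
		/* ask the API for DMA-able request memory (proposed flag) */
		tfm->crt_flags |= CRYPTO_TFM_REQ_DMA;

		return 0;
	}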

 
 Only potential problem is getting the crypto API to set the GFP_DMA
 flag in the allocation request, but presumably a
 CRYPTO_TFM_REQ_DMA crt_flag can be made to handle that.

Right. And this flag would apply only to request __ctx[].
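
On the API side it could look like this inside ablkcipher_request_alloc()
(a sketch: only the flag handling is new, the kmalloc is what the helper
already does):

	gfp_t gfp = GFP_KERNEL;

	if (crypto_ablkcipher_tfm(tfm)->crt_flags & CRYPTO_TFM_REQ_DMA)
		gfp |= GFP_DMA;		/* proposed: make __ctx[] DMA-able */

	req = kmalloc(sizeof(struct ablkcipher_request) +
		      crypto_ablkcipher_reqsize(tfm), gfp);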

Herbert, would this be an acceptable addition to crypto API?

Thanks,
Horia



Re: [PATCH 0/2] crypto: talitos: Add AES-XTS mode

2015-03-03 Thread Horia Geantă
On 3/3/2015 12:09 AM, Martin Hicks wrote:
 
 On Mon, Mar 02, 2015 at 03:37:28PM +0100, Milan Broz wrote:

 If crypto API allows to encrypt more sectors in one run
 (handling IV internally) dmcrypt can be modified of course.

 But do not forget we can use another IV (not only sequential number)
 e.g. ESSIV with XTS as well (even if it doesn't make much sense, some people
 are using it).
 
 Interesting, I'd not considered using XTS with an IV other than plain/64.
 The talitos hardware would not support aes/xts in any mode other than
 plain/plain64 I don't think...Although perhaps you could push in an 8-byte
 IV and the hardware would interpret it as the sector #.
 

For talitos, there are two cases:

1. request data size is <= data unit / sector size
talitos can handle any IV / tweak scheme

2. request data size > sector size
since talitos internally generates the IV for the next sector by
incrementing the previous IV, only IV schemes that allocate consecutive
IV to consecutive sectors will function correctly.

Let's not forget what XTS standard says about IVs / tweak values:
- each data unit (sector in this case) is assigned a non-negative tweak
value and
- tweak values are assigned *consecutively*, starting from an arbitrary
non-negative value
- there's no requirement for tweak values to be unpredictable

Thus, in theory ESSIV is not supposed to be used with XTS mode: the IVs
for consecutive sectors are not consecutive values.
In practice, as Milan said, the combination is sometimes used. It
functions correctly in SW (and also in talitos as long as req. data size
<= sector size).
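
For reference, IEEE P1619 computes the tweak for block j of data unit
(sector) i roughly as:

	T_j = E_K2(i) * alpha^j		(multiplication in GF(2^128))

so consecutive sectors feed consecutive inputs i, i+1, ... into E_K2 -
which is exactly what the talitos IV auto-increment reproduces. ESSIV
feeds E_Ks(i) instead, and those values are not consecutive.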

 Maybe the following question would be if the dmcrypt sector IV algorithms
 should moved into crypto API as well.
 (But because I misused dmcrypt IVs hooks for some additional operations
 for loopAES and old Truecrypt CBC mode, it is not so simple...)
 
 Speaking again with talitos in mind, there would be no advantage for this
 hardware.  Although larger requests are possible only a single IV can be
 provided per request, so for algorithms like AES-CBC and dm-crypt 512byte IOs
 are the only option (short of switching to 4kB block size).

Right, as explained above talitos does what the XTS mode standard
mandates. So it won't work properly in case of cbc-aes:essiv with
request sizes larger than sector size.

Still, in SW at least, XTS could be improved to process more sectors in
one shot, regardless of the IV scheme used - as long as there's an
IV.next() function and both data size and sector size are known.
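
E.g. (a sketch only - both helpers are hypothetical, nothing like this
exists in crypto/xts.c today):

	while (nbytes >= sector_size) {
		xts_crypt_sector(ctx, dst, src, sector_size, iv);
		iv_next(iv);		/* e.g. plain64: increment a le64 */
		src += sector_size;
		dst += sector_size;
		nbytes -= sector_size;
	}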

Horia


Re: [PATCH 0/2] crypto: talitos: Add AES-XTS mode

2015-03-02 Thread Horia Geantă
On 2/20/2015 7:00 PM, Martin Hicks wrote:
 This adds the AES-XTS mode, supported by the Freescale SEC 3.3.2.
 
 One of the nice things about this hardware is that it knows how to deal
 with encrypt/decrypt requests that are larger than sector size, but that 
 also requires that that the sector size be passed into the crypto engine
 as an XTS cipher context parameter.
 
 When a request is larger than the sector size the sector number is
 incremented by the talitos engine and the tweak key is re-calculated
 for the new sector.
 
 I've tested this with 256bit and 512bit keys (tweak and data keys of 128bit
 and 256bit) to ensure interoperability with the software AES-XTS
 implementation.  All testing was done using dm-crypt/LUKS with
 aes-xts-plain64.
 
 Is there a better solution than just hard-coding the sector size to
 (1 << SECTOR_SHIFT)?  Maybe dm-crypt should be modified to pass the
 sector size along with the plain/plain64 IV to an XTS algorithm?

AFAICT, SW implementation of xts mode in kernel (crypto/xts.c) is not
aware of a sector size (data unit size in IEEE P1619 terminology):
There's a hidden assumption that all the data sent to xts in one request
belongs to a single sector. Even more, it's assumed that the first
16-byte block in the request is block 0 of the sector. These can be
seen from the way the tweak (T) value is computed.
(Side note: there's no support of ciphertext stealing in crypto/xts.c -
i.e. sector sizes must be a multiple of underlying block cipher size -
that is 16B.)
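
Abridged, the per-request tweak handling in crypto/xts.c looks like this
(a heavily trimmed sketch, not the verbatim source):

	be128 t;

	/* T0 = E_K2(IV): the request's IV is taken as the data unit
	 * number and the request's first 16-byte block as block 0 */
	crypto_cipher_encrypt_one(ctx->tweak, (u8 *)&t, walk.iv);

	do {
		/* xor-encrypt-xor the next block with t, then ... */
		gf128mul_x_ble(&t, &t);	/* T_{j+1} = T_j * alpha */
	} while (blocks left);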

If dm-crypt were modified to pass the sector size somehow, all in-kernel
xts implementations would have to be made aware of the change.
I have nothing against this, but let's see what crypto maintainers have
to say...

BTW, there were some discussions back in 2013 wrt. being able to
configure / increase sector size, something crypto engines would benefit from:
http://www.saout.de/pipermail/dm-crypt/2013-January/003125.html
(experimental patch)
http://www.saout.de/pipermail/dm-crypt/2013-March/003202.html

The experimental patch sends the sector size as req->nbytes - hidden
assumption: the data size sent in an xts crypto request equals one sector.

I am not sure if there was a follow-up though...
Adding Milan - maybe he could shed some light.

Thanks,
Horia

 
 Martin Hicks (2):
   crypto: talitos: Clean ups and comment fixes for ablkcipher commands
   crypto: talitos: Add AES-XTS Support
 
  drivers/crypto/talitos.c |   45 +
  drivers/crypto/talitos.h |1 +
  2 files changed, 38 insertions(+), 8 deletions(-)
 



Re: [PATCH 2/5] crypto: talitos: Remove MD5_BLOCK_SIZE

2015-02-27 Thread Horia Geantă
On 2/20/2015 6:21 PM, Martin Hicks wrote:
 This is properly defined in the md5 header file.
 ---

Signed-off-by tag is missing.


Re: [PATCH 2/2] crypto: talitos: Add AES-XTS Support

2015-02-27 Thread Horia Geantă
On 2/20/2015 7:00 PM, Martin Hicks wrote:
 The newer talitos hardware has support for AES in XTS mode.
 
 Signed-off-by: Martin Hicks m...@bork.org
 ---

checkpatch complains about formatting, please check.

  drivers/crypto/talitos.c |   33 +
  drivers/crypto/talitos.h |1 +
  2 files changed, 34 insertions(+)
 
 diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
 index 6b2a19a..38cbde1 100644
 --- a/drivers/crypto/talitos.c
 +++ b/drivers/crypto/talitos.c
 @@ -40,9 +40,11 @@
  #include <linux/spinlock.h>
  #include <linux/rtnetlink.h>
  #include <linux/slab.h>
 +#include <linux/device-mapper.h>
  
  #include <crypto/algapi.h>
  #include <crypto/aes.h>
 +#include <crypto/xts.h>
  #include <crypto/des.h>
  #include <crypto/sha.h>
  #include <crypto/md5.h>
 @@ -1464,9 +1466,22 @@ static struct talitos_edesc *ablkcipher_edesc_alloc(struct ablkcipher_request *areq, bool encrypt)
  {
   struct crypto_ablkcipher *cipher = crypto_ablkcipher_reqtfm(areq);
 + struct crypto_tfm *tfm = (struct crypto_tfm *)cipher;
   struct talitos_ctx *ctx = crypto_ablkcipher_ctx(cipher);
   unsigned int ivsize = crypto_ablkcipher_ivsize(cipher);
  
 + /*
 +  * AES-XTS uses the first two AES Context registers for:
 +  *
 +  * Register 1:   Sector Number (Little Endian)
 +  * Register 2:   Sector Size   (Big Endian)
 +  *
 +  * Whereas AES-CBC uses registers 1/2 as a 16-byte IV.
 +  */
 + if (!strcmp(crypto_tfm_alg_name(tfm), "xts(aes)"))

I guess it would be better to use ctx->desc_hdr_template instead of
string comparison.
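
E.g. (a sketch - note DESC_HDR_MODE0_AESU_XTS includes the AESU_CBC bit,
hence the masked compare rather than a simple AND test):

	if ((ctx->desc_hdr_template & DESC_HDR_MODE0_AESU_XTS) ==
	    DESC_HDR_MODE0_AESU_XTS)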

 + /* Fixed sized sector */
 + *((u64 *)areq->info + 1) = cpu_to_be64((1 << SECTOR_SHIFT));
 +
  return talitos_edesc_alloc(ctx->dev, NULL, areq->src, areq->dst,
  areq->info, 0, areq->nbytes, 0, ivsize, 0,
  areq->base.flags, &areq->base, encrypt);
 @@ -2192,6 +2207,24 @@ static struct talitos_alg_template driver_algs[] = {
DESC_HDR_MODE0_DEU_CBC |
DESC_HDR_MODE0_DEU_3DES,
   },
 + {   .type = CRYPTO_ALG_TYPE_ABLKCIPHER,
 + .alg.crypto = {
 + .cra_name = "xts(aes)",
 + .cra_driver_name = "xts-aes-talitos",
 + .cra_blocksize = XTS_BLOCK_SIZE,
 + .cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER |
 + CRYPTO_ALG_ASYNC,
 + .cra_ablkcipher = {
 + .min_keysize = AES_MIN_KEY_SIZE * 2,
 + .max_keysize = AES_MAX_KEY_SIZE * 2,
 + .ivsize = XTS_BLOCK_SIZE,
 + }
 + },
 + .desc_hdr_template = DESC_HDR_TYPE_COMMON_NONSNOOP_NO_AFEU |
 + DESC_HDR_SEL0_AESU |
 + DESC_HDR_MODE0_AESU_XTS,
 + },
 +
   /* AHASH algorithms. */
   {   .type = CRYPTO_ALG_TYPE_AHASH,
   .alg.hash = {
 diff --git a/drivers/crypto/talitos.h b/drivers/crypto/talitos.h
 index a6f73e2..735da82 100644
 --- a/drivers/crypto/talitos.h
 +++ b/drivers/crypto/talitos.h
 @@ -316,6 +316,7 @@ extern int talitos_submit(struct device *dev, int ch, struct talitos_edesc *edes
  /* primary execution unit mode (MODE0) and derivatives */
  #define  DESC_HDR_MODE0_ENCRYPT  cpu_to_be32(0x00100000)
  #define  DESC_HDR_MODE0_AESU_CBC cpu_to_be32(0x00200000)
 +#define  DESC_HDR_MODE0_AESU_XTS cpu_to_be32(0x04200000)
  #define  DESC_HDR_MODE0_DEU_CBC  cpu_to_be32(0x00400000)
  #define  DESC_HDR_MODE0_DEU_3DES cpu_to_be32(0x00200000)
  #define  DESC_HDR_MODE0_MDEU_CONT cpu_to_be32(0x08000000)
 



Re: [PATCH 5/5] crypto: talitos: Add software backlog queue handling

2015-02-24 Thread Horia Geantă
On 2/20/2015 6:21 PM, Martin Hicks wrote:
 I was running into situations where the hardware FIFO was filling up, and
 the code was returning EAGAIN to dm-crypt and just dropping the submitted
 crypto request.
 
 This adds support in talitos for a software backlog queue.  When requests
 can't be queued to the hardware immediately EBUSY is returned.  The queued
 requests are dispatched to the hardware in received order as hardware FIFO
 slots become available.
 
 Signed-off-by: Martin Hicks m...@bork.org

Hi Martin,

Thanks for the effort!
Indeed we noticed that talitos (and caam) don't play nicely with
dm-crypt, lacking a backlog mechanism.

Please run checkpatch --strict and fix the errors, warnings.

 ---
  drivers/crypto/talitos.c |   92 +++---
  drivers/crypto/talitos.h |3 ++
  2 files changed, 74 insertions(+), 21 deletions(-)
 
 diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
 index d3472be..226654c 100644
 --- a/drivers/crypto/talitos.c
 +++ b/drivers/crypto/talitos.c
 @@ -183,43 +183,72 @@ static int init_device(struct device *dev)
  }
  
  /**
 - * talitos_submit - submits a descriptor to the device for processing
 + * talitos_handle_queue - performs submissions either of new descriptors
 + *or ones waiting in the queue backlog.
   * @dev: the SEC device to be used
   * @ch:  the SEC device channel to be used
 - * @edesc:   the descriptor to be processed by the device
 - * @context: a handle for use by caller (optional)

The context kernel-doc should have been removed in patch 4/5.

 + * @edesc:   the descriptor to be processed by the device (optional)
   *
   * desc must contain valid dma-mapped (bus physical) address pointers.
   * callback must check err and feedback in descriptor header
 - * for device processing status.
 + * for device processing status upon completion.
   */
 -int talitos_submit(struct device *dev, int ch, struct talitos_edesc *edesc)
 +int talitos_handle_queue(struct device *dev, int ch, struct talitos_edesc *edesc)
  {
   struct talitos_private *priv = dev_get_drvdata(dev);
 - struct talitos_request *request = &edesc->req;
 + struct talitos_request *request, *orig_request = NULL;
 + struct crypto_async_request *async_req;
  unsigned long flags;
  int head;
 + int ret = -EINPROGRESS;
 
  spin_lock_irqsave(&priv->chan[ch].head_lock, flags);
 
 + if (edesc) {
 + orig_request = &edesc->req;
 + crypto_enqueue_request(&priv->chan[ch].queue, &orig_request->base);
 + }

The request goes through the SW queue even if there are empty slots in
the HW queue, doing unnecessary crypto_enqueue_request() and
crypto_dequeue_request() work. Trying to use the HW queue first would be better.
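
I.e. something along these lines (only a sketch):

	/* submit directly while nothing is backlogged and the HW FIFO
	 * has a free slot; fall back to the SW queue otherwise */
	if (edesc && priv->chan[ch].queue.qlen == 0 &&
	    atomic_inc_not_zero(&priv->chan[ch].submit_count)) {
		/* write edesc->req to the channel FIFO */
	} else if (edesc) {
		crypto_enqueue_request(&priv->chan[ch].queue,
				       &edesc->req.base);
	}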

 +
 +flush_another:
 + if (priv->chan[ch].queue.qlen == 0) {
 + spin_unlock_irqrestore(&priv->chan[ch].head_lock, flags);
 + return 0;
 + }
 +
  if (!atomic_inc_not_zero(&priv->chan[ch].submit_count)) {
  /* h/w fifo is full */
  spin_unlock_irqrestore(&priv->chan[ch].head_lock, flags);
 - return -EAGAIN;
 + return -EBUSY;
  }
 
 - head = priv->chan[ch].head;
 + /* Dequeue the oldest request */
 + async_req = crypto_dequeue_request(&priv->chan[ch].queue);
 +
 + request = container_of(async_req, struct talitos_request, base);
  request->dma_desc = dma_map_single(dev, request->desc,
  sizeof(*request->desc),
  DMA_BIDIRECTIONAL);
 
  /* increment fifo head */
 + head = priv->chan[ch].head;
  priv->chan[ch].head = (priv->chan[ch].head + 1) & (priv->fifo_len - 1);
 
 - smp_wmb();
 - priv->chan[ch].fifo[head] = request;
 + spin_unlock_irqrestore(&priv->chan[ch].head_lock, flags);
 +
 + /*
 +  * Mark a backlogged request as in-progress, return EBUSY because
 +  * the original request that was submitted is backlogged.

s/is backlogged/is backlogged or dropped
Original request will not be enqueued by crypto_enqueue_request() if the
CRYPTO_TFM_REQ_MAY_BACKLOG flag is not set (since SW queue is for
backlog only) - that's the case for IPsec requests.
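
For context, a dm-crypt-style caller relies on this contract (sketched;
the completion object and its name are the caller's own):

	ret = crypto_ablkcipher_encrypt(req);
	if (ret == -EBUSY) {
		/* backlogged: ->complete(-EINPROGRESS) fires once the
		 * request has moved to the HW queue */
		wait_for_completion(&moved_to_hw);
		ret = -EINPROGRESS;
	}
	/* -EINPROGRESS: the final ->complete() call reports the result */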

 +  */
 + if (request != orig_request) {
 + struct crypto_async_request *areq = request->context;
 + areq->complete(areq, -EINPROGRESS);
 + ret = -EBUSY;
 + }
 +
 + spin_lock_irqsave(&priv->chan[ch].head_lock, flags);
  
   /* GO! */
 + priv->chan[ch].fifo[head] = request;
  wmb();
  out_be32(priv->chan[ch].reg + TALITOS_FF,
 upper_32_bits(request->dma_desc));
 @@ -228,9 +257,18 @@ int talitos_submit(struct device *dev, int ch, struct talitos_edesc *edesc)
  
  spin_unlock_irqrestore(&priv->chan[ch].head_lock, flags);
  
 - return -EINPROGRESS;
 + /*
 +  * When handling the queue via the completion path, queue