Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash

2016-12-27 Thread Andy Lutomirski
On Tue, Dec 27, 2016 at 6:16 AM, Daniel Borkmann  wrote:
> On 12/27/2016 10:58 AM, Herbert Xu wrote:
>>
>> On Mon, Dec 26, 2016 at 10:08:48AM -0800, Andy Lutomirski wrote:
>>>
>>>
>>> According to Daniel, the networking folks want to let embedded systems
>>> include BPF without requiring the crypto core.
>>
>>
>> Last I checked the IPv4 stack depended on the crypto API so this
>> sounds bogus.
>
>
> I think there's a bit of a mixup here with what I said. To clarify,
> requirement back then from tracing folks was that bpf engine and
> therefore bpf syscall can be build w/o networking enabled for small
> devices, so dependencies preferably need to be kept on a absolute
> minimum, same counts for either making it suddenly a depend on
> CRYPTO or a select CRYPTO for just those few lines that can be
> pulled in from lib/ code instead.

Somehow I had that in my head as "networking" not "tracing", probably
because of the TCA stuff.  Whoops.

Anyway, I'm rewriting the crypto part of the patch completely based on
Ard's feedback.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash

2016-12-27 Thread Daniel Borkmann

On 12/27/2016 10:58 AM, Herbert Xu wrote:

On Mon, Dec 26, 2016 at 10:08:48AM -0800, Andy Lutomirski wrote:


According to Daniel, the networking folks want to let embedded systems
include BPF without requiring the crypto core.


Last I checked the IPv4 stack depended on the crypto API so this
sounds bogus.


I think there's a bit of a mixup here with what I said. To clarify,
requirement back then from tracing folks was that bpf engine and
therefore bpf syscall can be build w/o networking enabled for small
devices, so dependencies preferably need to be kept on a absolute
minimum, same counts for either making it suddenly a depend on
CRYPTO or a select CRYPTO for just those few lines that can be
pulled in from lib/ code instead.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash

2016-12-27 Thread Herbert Xu
On Mon, Dec 26, 2016 at 10:08:48AM -0800, Andy Lutomirski wrote:
>
> According to Daniel, the networking folks want to let embedded systems
> include BPF without requiring the crypto core.

Last I checked the IPv4 stack depended on the crypto API so this
sounds bogus.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash

2016-12-26 Thread Andy Lutomirski
On Mon, Dec 26, 2016 at 9:51 AM, Ard Biesheuvel
 wrote:
> On 26 December 2016 at 07:57, Herbert Xu  wrote:
>> On Sat, Dec 24, 2016 at 09:57:53AM -0800, Andy Lutomirski wrote:
>>>
>>> I actually do use incremental hashing later on.   BPF currently
>>> vmallocs() a big temporary buffer just so it can fill it and hash it.
>>> I change it to hash as it goes.
>>
>> How much data is this supposed to hash on average? If it's a large
>> amount then perhaps using the existing crypto API would be a better
>> option than adding this.
>>
>
> This is a good point actually: you didn't explain *why* BPF shouldn't
> depend on the crypto API.

According to Daniel, the networking folks want to let embedded systems
include BPF without requiring the crypto core.

At some point, I'd also like to use modern hash functions for module
verification.  If doing so would require the crypto core to be
available when modules are loaded, then the crypto core couldn't be
modular.  (Although it occurs to me that my patches get that wrong --
if this change happens, I need to split the code so that the library
functions can be built in even if CRYPTO=m.)

Daniel, would you be okay with BPF selecting CRYPTO and CRYPTO_HASH?

Also, as a bikeshed thought: I could call the functions
sha256_init_direct(), etc.  Then there wouldn't be namespace
collisions and the fact that they bypass accelerated drivers would be
more obvious.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash

2016-12-26 Thread Ard Biesheuvel
On 26 December 2016 at 07:57, Herbert Xu  wrote:
> On Sat, Dec 24, 2016 at 09:57:53AM -0800, Andy Lutomirski wrote:
>>
>> I actually do use incremental hashing later on.   BPF currently
>> vmallocs() a big temporary buffer just so it can fill it and hash it.
>> I change it to hash as it goes.
>
> How much data is this supposed to hash on average? If it's a large
> amount then perhaps using the existing crypto API would be a better
> option than adding this.
>

This is a good point actually: you didn't explain *why* BPF shouldn't
depend on the crypto API.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash

2016-12-25 Thread Herbert Xu
On Sat, Dec 24, 2016 at 09:57:53AM -0800, Andy Lutomirski wrote:
> 
> I actually do use incremental hashing later on.   BPF currently
> vmallocs() a big temporary buffer just so it can fill it and hash it.
> I change it to hash as it goes.

How much data is this supposed to hash on average? If it's a large
amount then perhaps using the existing crypto API would be a better
option than adding this.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash

2016-12-24 Thread Andy Lutomirski
On Sat, Dec 24, 2016 at 2:33 AM, Ard Biesheuvel
 wrote:
> Hi Andy,
>
> On 24 December 2016 at 02:22, Andy Lutomirski  wrote:
>> There are some pieecs of kernel code that want to compute SHA256
>> directly without going through the crypto core.  Adjust the exported
>> API to decouple it from the crypto core.
>>
>
> There are a bunch of things happening at the same time in this patch,
> i.e., unnecessary renames of functions with static linkage, return
> type changes to the base prototypes (int (*)(...) to void (*)(...))
> and the change for the base functions to take a struct sha256_state
> ctx rather than a shash_desc. I suppose you are mainly after the
> latter, so could we please drop the other changes?
>
> For the name clashes, could we simply use the crypto_ prefix for the
> globally visible functions rather than using names that are already in
> use? (and having to go around clean up the conflicts)
> As for the return type changes, the base functions intentionally
> return int to allow tail calls from the functions exposed by the
> crypto API (whose prototypes cannot be changed). Unlikely to matter in
> the grand scheme of things (especially now that the base layer
> consists of static inline functions primarily), but it is equally
> pointless to go around and change them to return void IMO.
>
> So what remains is the struct shash_desc to struct sha256_state
> change, which makes sense given that you are after a sha256_digest()
> function that does not require the crypto API. But it seems your use
> case does not rely on incremental hashing, and so there is no reason
> for the state to be exposed outside of the implementation, and we
> could simply expose a crypto_sha256_digest() routine from the
> sha256_generic.c implementation instead.

I actually do use incremental hashing later on.   BPF currently
vmallocs() a big temporary buffer just so it can fill it and hash it.
I change it to hash as it goes.

I painted the bike shed the other way because I thought that crypto_
names should indicate that they're the versions compatible with the
crypto API, but I take your point about churn.  Part of the reason I
didn't want to use crypto_sha256_update is because that function is
currently like this:

int crypto_sha256_update(struct shash_desc *desc, const u8 *data,
  unsigned int len)

and I wanted to avoid churn.  The sha256_update() functions scattered
all over were static, so I didn't worry about them.

I'm going to give this another try as a split-up series that tries to
avoid making any changes beyond simple function renames to the
drivers.

>
> Also, I strongly feel that crypto and other security related patches
> should be tested before being posted, even if they are only RFC,
> especially when they are posted by high profile kernel devs like
> yourself. (Your code incorrectly calls crypto_sha2_final() in a couple
> of places, resulting in the finalization being performed twice, once
> with the accelerated block transform and again with the generic
> transform)
>

I tested it, albeit poorly.  I wanted feedback on the API (thanks!)
and I figured I could more carefully check the implementation once the
API survives a bit of review.  Since it looks like I have to rework
this, I'd need to re-test anyway.

>> I suspect this will very slightly speed up the SHA256 shash operations
>> as well by reducing the amount of indirection involved.
>>
>
> I think you have a valid point when it comes to the complexity of the
> crypto API in general. But the struct sha256_state is embedded in the
> shash_desc rather than referred to via a pointer, so the level of
> indirection does not appear to change. And given how 99.9% of the
> SHA256 execution time is spent in the block transform routine anyway,
> I expect the performance delta to be in the noise tbh.

s/very slightly/negligibly?  There's an extra speedup from avoiding a
variable-length stack allocation, but that probably doesn't matter
much either.

>
> Finally, another thing to keep in mind is that the base layers of
> SHA-1, SHA-256 and SHA-512 are intentionally structured in the same
> way. If there is a need for a digest() entry point, I'd prefer to add
> them for all flavours.

I want to get sha256 right first.  Once it's in good shape, making the
same changes to the other variants should be easy.

>
> Whether this still belongs under crypto or under lib/sha256.c as a
> library function (allowing archs to override it) is open for debate.
> If hashing BPF programs becomes a hot spot, we probably have bigger
> problems.
>
> Regards,
> Ard.
>
> P.S. I do take your point regarding the arch_sha256_block_transform()
> proposed in your follow up email, but there are some details (SIMD,
> availability of the instructions etc) that would make it only suitable
> for the generic implementation anyway, and the base layer was already
> a huge improvement compared to the open coded implementations of the

Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash

2016-12-24 Thread Ard Biesheuvel
Hi Andy,

On 24 December 2016 at 02:22, Andy Lutomirski  wrote:
> There are some pieecs of kernel code that want to compute SHA256
> directly without going through the crypto core.  Adjust the exported
> API to decouple it from the crypto core.
>

There are a bunch of things happening at the same time in this patch,
i.e., unnecessary renames of functions with static linkage, return
type changes to the base prototypes (int (*)(...) to void (*)(...))
and the change for the base functions to take a struct sha256_state
ctx rather than a shash_desc. I suppose you are mainly after the
latter, so could we please drop the other changes?

For the name clashes, could we simply use the crypto_ prefix for the
globally visible functions rather than using names that are already in
use? (and having to go around clean up the conflicts)
As for the return type changes, the base functions intentionally
return int to allow tail calls from the functions exposed by the
crypto API (whose prototypes cannot be changed). Unlikely to matter in
the grand scheme of things (especially now that the base layer
consists of static inline functions primarily), but it is equally
pointless to go around and change them to return void IMO.

So what remains is the struct shash_desc to struct sha256_state
change, which makes sense given that you are after a sha256_digest()
function that does not require the crypto API. But it seems your use
case does not rely on incremental hashing, and so there is no reason
for the state to be exposed outside of the implementation, and we
could simply expose a crypto_sha256_digest() routine from the
sha256_generic.c implementation instead.

Also, I strongly feel that crypto and other security related patches
should be tested before being posted, even if they are only RFC,
especially when they are posted by high profile kernel devs like
yourself. (Your code incorrectly calls crypto_sha2_final() in a couple
of places, resulting in the finalization being performed twice, once
with the accelerated block transform and again with the generic
transform)

> I suspect this will very slightly speed up the SHA256 shash operations
> as well by reducing the amount of indirection involved.
>

I think you have a valid point when it comes to the complexity of the
crypto API in general. But the struct sha256_state is embedded in the
shash_desc rather than referred to via a pointer, so the level of
indirection does not appear to change. And given how 99.9% of the
SHA256 execution time is spent in the block transform routine anyway,
I expect the performance delta to be in the noise tbh.

Finally, another thing to keep in mind is that the base layers of
SHA-1, SHA-256 and SHA-512 are intentionally structured in the same
way. If there is a need for a digest() entry point, I'd prefer to add
them for all flavours.

E.g, something like

"""
@@ -126,3 +128,23 @@ static inline int sha256_base_finish(struct
shash_desc *desc, u8 *out)
*sctx = (struct sha256_state){};
return 0;
 }
+
+static inline int sha256_base_finish(struct shash_desc *desc, u8 *out)
+{
+   unsigned int digest_size = crypto_shash_digestsize(desc->tfm);
+   struct sha256_state *sctx = shash_desc_ctx(desc);
+
+   return __sha256_base_finish(sctx, out, digest_size);
+}
+
+static inline int sha256_base_digest(const u8 *data, unsigned int len, u8 *out,
+sha256_block_fn *block_fn)
+{
+   struct sha256_state sctx;
+
+   __sha256_base_init();
+   sha256_base_do_update(, data, len, block_fn);
+   sha256_base_do_finalize(, block_fn);
+
+   return __sha256_base_finish(, out, SHA256_DIGEST_SIZE);
+}
"""

(where __sha256_base_init() and __sha256_base_finish() are the
existing functions modified to take a struct sha256_state rather than
a shash_desc) should be sufficient to allow a generic
crypto_sha256_digest() to be composed that does not rely on the crypto
API.

Whether this still belongs under crypto or under lib/sha256.c as a
library function (allowing archs to override it) is open for debate.
If hashing BPF programs becomes a hot spot, we probably have bigger
problems.

Regards,
Ard.

P.S. I do take your point regarding the arch_sha256_block_transform()
proposed in your follow up email, but there are some details (SIMD,
availability of the instructions etc) that would make it only suitable
for the generic implementation anyway, and the base layer was already
a huge improvement compared to the open coded implementations of the
SHA boilerplate.


> Cc: Ard Biesheuvel 
> Cc: Herbert Xu 
> Signed-off-by: Andy Lutomirski 
> ---
>  arch/arm/crypto/sha2-ce-glue.c  | 10 ---
>  arch/arm/crypto/sha256_glue.c   | 23 ++-
>  arch/arm/crypto/sha256_neon_glue.c  | 34 +++--
>  arch/arm64/crypto/sha2-ce-glue.c| 13 
>  arch/arm64/crypto/sha256-glue.c | 59 
> 

Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash

2016-12-23 Thread Andy Lutomirski
On Fri, Dec 23, 2016 at 6:22 PM, Andy Lutomirski  wrote:
> There are some pieecs of kernel code that want to compute SHA256
> directly without going through the crypto core.  Adjust the exported
> API to decouple it from the crypto core.
>
> I suspect this will very slightly speed up the SHA256 shash operations
> as well by reducing the amount of indirection involved.
>

I should also mention: there's a nice potential cleanup that's
possible on top of this.  Currently, most of the accelerated SHA256
implementations just swap out the block function.  Another approach to
enabling this would be to restructure sha256_update along the lines
of:

sha256_block_fn_t fn = arch_sha256_block_fn(len);
sha256_base_do_update(sctx, data, len, arch_sha256_block_fn(len));

The idea being that arch code can decide whether to use an accelerated
block function based on context (x86, for example, can't always use
xmm regs) and length (on x86, using the accelerated versions for short
digests is very slow due to the state save/restore that happens) and
then the core code can just use it.

This would allow a lot of the boilerplate that this patch was forced
to modify to be deleted outright.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash

2016-12-23 Thread Andy Lutomirski
There are some pieecs of kernel code that want to compute SHA256
directly without going through the crypto core.  Adjust the exported
API to decouple it from the crypto core.

I suspect this will very slightly speed up the SHA256 shash operations
as well by reducing the amount of indirection involved.

Cc: Ard Biesheuvel 
Cc: Herbert Xu 
Signed-off-by: Andy Lutomirski 
---
 arch/arm/crypto/sha2-ce-glue.c  | 10 ---
 arch/arm/crypto/sha256_glue.c   | 23 ++-
 arch/arm/crypto/sha256_neon_glue.c  | 34 +++--
 arch/arm64/crypto/sha2-ce-glue.c| 13 
 arch/arm64/crypto/sha256-glue.c | 59 +
 arch/x86/crypto/sha256_ssse3_glue.c | 46 +
 arch/x86/purgatory/purgatory.c  |  2 +-
 arch/x86/purgatory/sha256.c | 25 ++--
 arch/x86/purgatory/sha256.h | 22 --
 crypto/sha256_generic.c | 50 +++
 include/crypto/sha.h| 29 ++
 include/crypto/sha256_base.h| 40 -
 12 files changed, 184 insertions(+), 169 deletions(-)
 delete mode 100644 arch/x86/purgatory/sha256.h

diff --git a/arch/arm/crypto/sha2-ce-glue.c b/arch/arm/crypto/sha2-ce-glue.c
index 0755b2d657f3..8832c2f85591 100644
--- a/arch/arm/crypto/sha2-ce-glue.c
+++ b/arch/arm/crypto/sha2-ce-glue.c
@@ -38,7 +38,7 @@ static int sha2_ce_update(struct shash_desc *desc, const u8 
*data,
return crypto_sha256_arm_update(desc, data, len);
 
kernel_neon_begin();
-   sha256_base_do_update(desc, data, len,
+   sha256_base_do_update(sctx, data, len,
  (sha256_block_fn *)sha2_ce_transform);
kernel_neon_end();
 
@@ -48,17 +48,19 @@ static int sha2_ce_update(struct shash_desc *desc, const u8 
*data,
 static int sha2_ce_finup(struct shash_desc *desc, const u8 *data,
 unsigned int len, u8 *out)
 {
+   struct sha256_state *sctx = shash_desc_ctx(desc);
+
if (!may_use_simd())
return crypto_sha256_arm_finup(desc, data, len, out);
 
kernel_neon_begin();
if (len)
-   sha256_base_do_update(desc, data, len,
+   sha256_base_do_update(sctx, data, len,
  (sha256_block_fn *)sha2_ce_transform);
-   sha256_base_do_finalize(desc, (sha256_block_fn *)sha2_ce_transform);
+   sha256_base_do_finalize(sctx, (sha256_block_fn *)sha2_ce_transform);
kernel_neon_end();
 
-   return sha256_base_finish(desc, out);
+   return crypto_sha2_final(desc, out);
 }
 
 static int sha2_ce_final(struct shash_desc *desc, u8 *out)
diff --git a/arch/arm/crypto/sha256_glue.c b/arch/arm/crypto/sha256_glue.c
index a84e869ef900..405a29a9a9d3 100644
--- a/arch/arm/crypto/sha256_glue.c
+++ b/arch/arm/crypto/sha256_glue.c
@@ -36,27 +36,34 @@ asmlinkage void sha256_block_data_order(u32 *digest, const 
void *data,
 int crypto_sha256_arm_update(struct shash_desc *desc, const u8 *data,
 unsigned int len)
 {
+   struct sha256_state *sctx = shash_desc_ctx(desc);
+
/* make sure casting to sha256_block_fn() is safe */
BUILD_BUG_ON(offsetof(struct sha256_state, state) != 0);
 
-   return sha256_base_do_update(desc, data, len,
+   sha256_base_do_update(sctx, data, len,
(sha256_block_fn *)sha256_block_data_order);
+   return 0;
 }
 EXPORT_SYMBOL(crypto_sha256_arm_update);
 
-static int sha256_final(struct shash_desc *desc, u8 *out)
+static int sha256_arm_final(struct shash_desc *desc, u8 *out)
 {
-   sha256_base_do_finalize(desc,
+   struct sha256_state *sctx = shash_desc_ctx(desc);
+
+   sha256_base_do_finalize(sctx,
(sha256_block_fn *)sha256_block_data_order);
-   return sha256_base_finish(desc, out);
+   return crypto_sha2_final(desc, out);
 }
 
 int crypto_sha256_arm_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
 {
-   sha256_base_do_update(desc, data, len,
+   struct sha256_state *sctx = shash_desc_ctx(desc);
+
+   sha256_base_do_update(sctx, data, len,
  (sha256_block_fn *)sha256_block_data_order);
-   return sha256_final(desc, out);
+   return crypto_sha2_final(desc, out);
 }
 EXPORT_SYMBOL(crypto_sha256_arm_finup);
 
@@ -64,7 +71,7 @@ static struct shash_alg algs[] = { {
.digestsize =   SHA256_DIGEST_SIZE,
.init   =   sha256_base_init,
.update =   crypto_sha256_arm_update,
-   .final  =   sha256_final,
+   .final  =   sha256_arm_final,
.finup  =   crypto_sha256_arm_finup,
.descsize   =   sizeof(struct sha256_state),
.base