Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash
On Tue, Dec 27, 2016 at 6:16 AM, Daniel Borkmannwrote: > On 12/27/2016 10:58 AM, Herbert Xu wrote: >> >> On Mon, Dec 26, 2016 at 10:08:48AM -0800, Andy Lutomirski wrote: >>> >>> >>> According to Daniel, the networking folks want to let embedded systems >>> include BPF without requiring the crypto core. >> >> >> Last I checked the IPv4 stack depended on the crypto API so this >> sounds bogus. > > > I think there's a bit of a mixup here with what I said. To clarify, > requirement back then from tracing folks was that bpf engine and > therefore bpf syscall can be build w/o networking enabled for small > devices, so dependencies preferably need to be kept on a absolute > minimum, same counts for either making it suddenly a depend on > CRYPTO or a select CRYPTO for just those few lines that can be > pulled in from lib/ code instead. Somehow I had that in my head as "networking" not "tracing", probably because of the TCA stuff. Whoops. Anyway, I'm rewriting the crypto part of the patch completely based on Ard's feedback. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash
On 12/27/2016 10:58 AM, Herbert Xu wrote: On Mon, Dec 26, 2016 at 10:08:48AM -0800, Andy Lutomirski wrote: According to Daniel, the networking folks want to let embedded systems include BPF without requiring the crypto core. Last I checked the IPv4 stack depended on the crypto API so this sounds bogus. I think there's a bit of a mixup here with what I said. To clarify, requirement back then from tracing folks was that bpf engine and therefore bpf syscall can be build w/o networking enabled for small devices, so dependencies preferably need to be kept on a absolute minimum, same counts for either making it suddenly a depend on CRYPTO or a select CRYPTO for just those few lines that can be pulled in from lib/ code instead. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash
On Mon, Dec 26, 2016 at 10:08:48AM -0800, Andy Lutomirski wrote: > > According to Daniel, the networking folks want to let embedded systems > include BPF without requiring the crypto core. Last I checked the IPv4 stack depended on the crypto API so this sounds bogus. Cheers, -- Email: Herbert XuHome Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash
On Mon, Dec 26, 2016 at 9:51 AM, Ard Biesheuvelwrote: > On 26 December 2016 at 07:57, Herbert Xu wrote: >> On Sat, Dec 24, 2016 at 09:57:53AM -0800, Andy Lutomirski wrote: >>> >>> I actually do use incremental hashing later on. BPF currently >>> vmallocs() a big temporary buffer just so it can fill it and hash it. >>> I change it to hash as it goes. >> >> How much data is this supposed to hash on average? If it's a large >> amount then perhaps using the existing crypto API would be a better >> option than adding this. >> > > This is a good point actually: you didn't explain *why* BPF shouldn't > depend on the crypto API. According to Daniel, the networking folks want to let embedded systems include BPF without requiring the crypto core. At some point, I'd also like to use modern hash functions for module verification. If doing so would require the crypto core to be available when modules are loaded, then the crypto core couldn't be modular. (Although it occurs to me that my patches get that wrong -- if this change happens, I need to split the code so that the library functions can be built in even if CRYPTO=m.) Daniel, would you be okay with BPF selecting CRYPTO and CRYPTO_HASH? Also, as a bikeshed thought: I could call the functions sha256_init_direct(), etc. Then there wouldn't be namespace collisions and the fact that they bypass accelerated drivers would be more obvious. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash
On 26 December 2016 at 07:57, Herbert Xuwrote: > On Sat, Dec 24, 2016 at 09:57:53AM -0800, Andy Lutomirski wrote: >> >> I actually do use incremental hashing later on. BPF currently >> vmallocs() a big temporary buffer just so it can fill it and hash it. >> I change it to hash as it goes. > > How much data is this supposed to hash on average? If it's a large > amount then perhaps using the existing crypto API would be a better > option than adding this. > This is a good point actually: you didn't explain *why* BPF shouldn't depend on the crypto API. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash
On Sat, Dec 24, 2016 at 09:57:53AM -0800, Andy Lutomirski wrote: > > I actually do use incremental hashing later on. BPF currently > vmallocs() a big temporary buffer just so it can fill it and hash it. > I change it to hash as it goes. How much data is this supposed to hash on average? If it's a large amount then perhaps using the existing crypto API would be a better option than adding this. Cheers, -- Email: Herbert XuHome Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash
On Sat, Dec 24, 2016 at 2:33 AM, Ard Biesheuvelwrote: > Hi Andy, > > On 24 December 2016 at 02:22, Andy Lutomirski wrote: >> There are some pieecs of kernel code that want to compute SHA256 >> directly without going through the crypto core. Adjust the exported >> API to decouple it from the crypto core. >> > > There are a bunch of things happening at the same time in this patch, > i.e., unnecessary renames of functions with static linkage, return > type changes to the base prototypes (int (*)(...) to void (*)(...)) > and the change for the base functions to take a struct sha256_state > ctx rather than a shash_desc. I suppose you are mainly after the > latter, so could we please drop the other changes? > > For the name clashes, could we simply use the crypto_ prefix for the > globally visible functions rather than using names that are already in > use? (and having to go around clean up the conflicts) > As for the return type changes, the base functions intentionally > return int to allow tail calls from the functions exposed by the > crypto API (whose prototypes cannot be changed). Unlikely to matter in > the grand scheme of things (especially now that the base layer > consists of static inline functions primarily), but it is equally > pointless to go around and change them to return void IMO. > > So what remains is the struct shash_desc to struct sha256_state > change, which makes sense given that you are after a sha256_digest() > function that does not require the crypto API. But it seems your use > case does not rely on incremental hashing, and so there is no reason > for the state to be exposed outside of the implementation, and we > could simply expose a crypto_sha256_digest() routine from the > sha256_generic.c implementation instead. I actually do use incremental hashing later on. BPF currently vmallocs() a big temporary buffer just so it can fill it and hash it. I change it to hash as it goes. I painted the bike shed the other way because I thought that crypto_ names should indicate that they're the versions compatible with the crypto API, but I take your point about churn. Part of the reason I didn't want to use crypto_sha256_update is because that function is currently like this: int crypto_sha256_update(struct shash_desc *desc, const u8 *data, unsigned int len) and I wanted to avoid churn. The sha256_update() functions scattered all over were static, so I didn't worry about them. I'm going to give this another try as a split-up series that tries to avoid making any changes beyond simple function renames to the drivers. > > Also, I strongly feel that crypto and other security related patches > should be tested before being posted, even if they are only RFC, > especially when they are posted by high profile kernel devs like > yourself. (Your code incorrectly calls crypto_sha2_final() in a couple > of places, resulting in the finalization being performed twice, once > with the accelerated block transform and again with the generic > transform) > I tested it, albeit poorly. I wanted feedback on the API (thanks!) and I figured I could more carefully check the implementation once the API survives a bit of review. Since it looks like I have to rework this, I'd need to re-test anyway. >> I suspect this will very slightly speed up the SHA256 shash operations >> as well by reducing the amount of indirection involved. >> > > I think you have a valid point when it comes to the complexity of the > crypto API in general. But the struct sha256_state is embedded in the > shash_desc rather than referred to via a pointer, so the level of > indirection does not appear to change. And given how 99.9% of the > SHA256 execution time is spent in the block transform routine anyway, > I expect the performance delta to be in the noise tbh. s/very slightly/negligibly? There's an extra speedup from avoiding a variable-length stack allocation, but that probably doesn't matter much either. > > Finally, another thing to keep in mind is that the base layers of > SHA-1, SHA-256 and SHA-512 are intentionally structured in the same > way. If there is a need for a digest() entry point, I'd prefer to add > them for all flavours. I want to get sha256 right first. Once it's in good shape, making the same changes to the other variants should be easy. > > Whether this still belongs under crypto or under lib/sha256.c as a > library function (allowing archs to override it) is open for debate. > If hashing BPF programs becomes a hot spot, we probably have bigger > problems. > > Regards, > Ard. > > P.S. I do take your point regarding the arch_sha256_block_transform() > proposed in your follow up email, but there are some details (SIMD, > availability of the instructions etc) that would make it only suitable > for the generic implementation anyway, and the base layer was already > a huge improvement compared to the open coded implementations of the
Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash
Hi Andy, On 24 December 2016 at 02:22, Andy Lutomirskiwrote: > There are some pieecs of kernel code that want to compute SHA256 > directly without going through the crypto core. Adjust the exported > API to decouple it from the crypto core. > There are a bunch of things happening at the same time in this patch, i.e., unnecessary renames of functions with static linkage, return type changes to the base prototypes (int (*)(...) to void (*)(...)) and the change for the base functions to take a struct sha256_state ctx rather than a shash_desc. I suppose you are mainly after the latter, so could we please drop the other changes? For the name clashes, could we simply use the crypto_ prefix for the globally visible functions rather than using names that are already in use? (and having to go around clean up the conflicts) As for the return type changes, the base functions intentionally return int to allow tail calls from the functions exposed by the crypto API (whose prototypes cannot be changed). Unlikely to matter in the grand scheme of things (especially now that the base layer consists of static inline functions primarily), but it is equally pointless to go around and change them to return void IMO. So what remains is the struct shash_desc to struct sha256_state change, which makes sense given that you are after a sha256_digest() function that does not require the crypto API. But it seems your use case does not rely on incremental hashing, and so there is no reason for the state to be exposed outside of the implementation, and we could simply expose a crypto_sha256_digest() routine from the sha256_generic.c implementation instead. Also, I strongly feel that crypto and other security related patches should be tested before being posted, even if they are only RFC, especially when they are posted by high profile kernel devs like yourself. (Your code incorrectly calls crypto_sha2_final() in a couple of places, resulting in the finalization being performed twice, once with the accelerated block transform and again with the generic transform) > I suspect this will very slightly speed up the SHA256 shash operations > as well by reducing the amount of indirection involved. > I think you have a valid point when it comes to the complexity of the crypto API in general. But the struct sha256_state is embedded in the shash_desc rather than referred to via a pointer, so the level of indirection does not appear to change. And given how 99.9% of the SHA256 execution time is spent in the block transform routine anyway, I expect the performance delta to be in the noise tbh. Finally, another thing to keep in mind is that the base layers of SHA-1, SHA-256 and SHA-512 are intentionally structured in the same way. If there is a need for a digest() entry point, I'd prefer to add them for all flavours. E.g, something like """ @@ -126,3 +128,23 @@ static inline int sha256_base_finish(struct shash_desc *desc, u8 *out) *sctx = (struct sha256_state){}; return 0; } + +static inline int sha256_base_finish(struct shash_desc *desc, u8 *out) +{ + unsigned int digest_size = crypto_shash_digestsize(desc->tfm); + struct sha256_state *sctx = shash_desc_ctx(desc); + + return __sha256_base_finish(sctx, out, digest_size); +} + +static inline int sha256_base_digest(const u8 *data, unsigned int len, u8 *out, +sha256_block_fn *block_fn) +{ + struct sha256_state sctx; + + __sha256_base_init(); + sha256_base_do_update(, data, len, block_fn); + sha256_base_do_finalize(, block_fn); + + return __sha256_base_finish(, out, SHA256_DIGEST_SIZE); +} """ (where __sha256_base_init() and __sha256_base_finish() are the existing functions modified to take a struct sha256_state rather than a shash_desc) should be sufficient to allow a generic crypto_sha256_digest() to be composed that does not rely on the crypto API. Whether this still belongs under crypto or under lib/sha256.c as a library function (allowing archs to override it) is open for debate. If hashing BPF programs becomes a hot spot, we probably have bigger problems. Regards, Ard. P.S. I do take your point regarding the arch_sha256_block_transform() proposed in your follow up email, but there are some details (SIMD, availability of the instructions etc) that would make it only suitable for the generic implementation anyway, and the base layer was already a huge improvement compared to the open coded implementations of the SHA boilerplate. > Cc: Ard Biesheuvel > Cc: Herbert Xu > Signed-off-by: Andy Lutomirski > --- > arch/arm/crypto/sha2-ce-glue.c | 10 --- > arch/arm/crypto/sha256_glue.c | 23 ++- > arch/arm/crypto/sha256_neon_glue.c | 34 +++-- > arch/arm64/crypto/sha2-ce-glue.c| 13 > arch/arm64/crypto/sha256-glue.c | 59 >
Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash
On Fri, Dec 23, 2016 at 6:22 PM, Andy Lutomirskiwrote: > There are some pieecs of kernel code that want to compute SHA256 > directly without going through the crypto core. Adjust the exported > API to decouple it from the crypto core. > > I suspect this will very slightly speed up the SHA256 shash operations > as well by reducing the amount of indirection involved. > I should also mention: there's a nice potential cleanup that's possible on top of this. Currently, most of the accelerated SHA256 implementations just swap out the block function. Another approach to enabling this would be to restructure sha256_update along the lines of: sha256_block_fn_t fn = arch_sha256_block_fn(len); sha256_base_do_update(sctx, data, len, arch_sha256_block_fn(len)); The idea being that arch code can decide whether to use an accelerated block function based on context (x86, for example, can't always use xmm regs) and length (on x86, using the accelerated versions for short digests is very slow due to the state save/restore that happens) and then the core code can just use it. This would allow a lot of the boilerplate that this patch was forced to modify to be deleted outright. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash
There are some pieecs of kernel code that want to compute SHA256 directly without going through the crypto core. Adjust the exported API to decouple it from the crypto core. I suspect this will very slightly speed up the SHA256 shash operations as well by reducing the amount of indirection involved. Cc: Ard BiesheuvelCc: Herbert Xu Signed-off-by: Andy Lutomirski --- arch/arm/crypto/sha2-ce-glue.c | 10 --- arch/arm/crypto/sha256_glue.c | 23 ++- arch/arm/crypto/sha256_neon_glue.c | 34 +++-- arch/arm64/crypto/sha2-ce-glue.c| 13 arch/arm64/crypto/sha256-glue.c | 59 + arch/x86/crypto/sha256_ssse3_glue.c | 46 + arch/x86/purgatory/purgatory.c | 2 +- arch/x86/purgatory/sha256.c | 25 ++-- arch/x86/purgatory/sha256.h | 22 -- crypto/sha256_generic.c | 50 +++ include/crypto/sha.h| 29 ++ include/crypto/sha256_base.h| 40 - 12 files changed, 184 insertions(+), 169 deletions(-) delete mode 100644 arch/x86/purgatory/sha256.h diff --git a/arch/arm/crypto/sha2-ce-glue.c b/arch/arm/crypto/sha2-ce-glue.c index 0755b2d657f3..8832c2f85591 100644 --- a/arch/arm/crypto/sha2-ce-glue.c +++ b/arch/arm/crypto/sha2-ce-glue.c @@ -38,7 +38,7 @@ static int sha2_ce_update(struct shash_desc *desc, const u8 *data, return crypto_sha256_arm_update(desc, data, len); kernel_neon_begin(); - sha256_base_do_update(desc, data, len, + sha256_base_do_update(sctx, data, len, (sha256_block_fn *)sha2_ce_transform); kernel_neon_end(); @@ -48,17 +48,19 @@ static int sha2_ce_update(struct shash_desc *desc, const u8 *data, static int sha2_ce_finup(struct shash_desc *desc, const u8 *data, unsigned int len, u8 *out) { + struct sha256_state *sctx = shash_desc_ctx(desc); + if (!may_use_simd()) return crypto_sha256_arm_finup(desc, data, len, out); kernel_neon_begin(); if (len) - sha256_base_do_update(desc, data, len, + sha256_base_do_update(sctx, data, len, (sha256_block_fn *)sha2_ce_transform); - sha256_base_do_finalize(desc, (sha256_block_fn *)sha2_ce_transform); + sha256_base_do_finalize(sctx, (sha256_block_fn *)sha2_ce_transform); kernel_neon_end(); - return sha256_base_finish(desc, out); + return crypto_sha2_final(desc, out); } static int sha2_ce_final(struct shash_desc *desc, u8 *out) diff --git a/arch/arm/crypto/sha256_glue.c b/arch/arm/crypto/sha256_glue.c index a84e869ef900..405a29a9a9d3 100644 --- a/arch/arm/crypto/sha256_glue.c +++ b/arch/arm/crypto/sha256_glue.c @@ -36,27 +36,34 @@ asmlinkage void sha256_block_data_order(u32 *digest, const void *data, int crypto_sha256_arm_update(struct shash_desc *desc, const u8 *data, unsigned int len) { + struct sha256_state *sctx = shash_desc_ctx(desc); + /* make sure casting to sha256_block_fn() is safe */ BUILD_BUG_ON(offsetof(struct sha256_state, state) != 0); - return sha256_base_do_update(desc, data, len, + sha256_base_do_update(sctx, data, len, (sha256_block_fn *)sha256_block_data_order); + return 0; } EXPORT_SYMBOL(crypto_sha256_arm_update); -static int sha256_final(struct shash_desc *desc, u8 *out) +static int sha256_arm_final(struct shash_desc *desc, u8 *out) { - sha256_base_do_finalize(desc, + struct sha256_state *sctx = shash_desc_ctx(desc); + + sha256_base_do_finalize(sctx, (sha256_block_fn *)sha256_block_data_order); - return sha256_base_finish(desc, out); + return crypto_sha2_final(desc, out); } int crypto_sha256_arm_finup(struct shash_desc *desc, const u8 *data, unsigned int len, u8 *out) { - sha256_base_do_update(desc, data, len, + struct sha256_state *sctx = shash_desc_ctx(desc); + + sha256_base_do_update(sctx, data, len, (sha256_block_fn *)sha256_block_data_order); - return sha256_final(desc, out); + return crypto_sha2_final(desc, out); } EXPORT_SYMBOL(crypto_sha256_arm_finup); @@ -64,7 +71,7 @@ static struct shash_alg algs[] = { { .digestsize = SHA256_DIGEST_SIZE, .init = sha256_base_init, .update = crypto_sha256_arm_update, - .final = sha256_final, + .final = sha256_arm_final, .finup = crypto_sha256_arm_finup, .descsize = sizeof(struct sha256_state), .base