Re: sha-512...
On Wed, Feb 15, 2012 at 12:23:52AM -0500, David Miller wrote:
From: Herbert Xu <herb...@gondor.hengli.com.au>
Date: Wed, 15 Feb 2012 16:16:08 +1100

OK, so we grew by 1136 - 888 = 248. Keep in mind that 128 of that is
expected since we moved W onto the stack.

Right. I guess we could go back to the percpu solution, what do you think?

I'm not entirely sure, we might have to. sha512 is notorious for
generating terrible code with gcc on 32-bit targets, so... The sha512
test in the glibc testsuite tends to timeout on 32-bit sparc. :-)

Cherrypicking the ror64() commit largely fixes the issue (on sparc defconfig):

sha512_transform:
   0:	9d e3 bc 78	save  %sp, -904, %sp

git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
b85a088f15f2070b7180735a231012843a5ac96c
crypto: sha512 - use standard ror64()
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/3] sha512: reduce stack usage even on i386
On 1/28/12, Herbert Xu <herb...@gondor.apana.org.au> wrote:
> On Fri, Jan 27, 2012 at 08:51:30PM +0300, Alexey Dobriyan wrote:
> > I think this is because your tree contained %16 code instead of & 15.
> > Now that it contains & 15 it should become applicable.

OK.

-- [PATCH] sha512: reduce stack usage even on i386

> Can you try the approach that git takes with using asm to read and
> write W (see previous email from Linus in response to my push request)?
> As it stands your patch is simply relying on gcc's ability to optimise.
> At least with asm volatile we know that gcc will leave it alone.

For some reason it doesn't. :-( I've also tried full barriers.
With this patch, stack usage is still ~900 bytes.

diff --git a/crypto/sha512_generic.c b/crypto/sha512_generic.c
index dd0439d..35e7ae7 100644
--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -66,16 +66,6 @@ static const u64 sha512_K[80] = {
 #define s0(x)       (ror64(x, 1) ^ ror64(x, 8) ^ (x >> 7))
 #define s1(x)       (ror64(x,19) ^ ror64(x,61) ^ (x >> 6))

-static inline void LOAD_OP(int I, u64 *W, const u8 *input)
-{
-	W[I] = __be64_to_cpu( ((__be64*)(input))[I] );
-}
-
-static inline void BLEND_OP(int I, u64 *W)
-{
-	W[I & 15] += s1(W[(I-2) & 15]) + W[(I-7) & 15] + s0(W[(I-15) & 15]);
-}
-
 static void
 sha512_transform(u64 *state, const u8 *input)
 {
@@ -84,26 +74,29 @@ sha512_transform(u64 *state, const u8 *input)
 	int i;
 	u64 W[16];

-	/* load the input */
-	for (i = 0; i < 16; i++)
-		LOAD_OP(i, W, input);
-
 	/* load the state into our registers */
 	a=state[0];   b=state[1];   c=state[2];   d=state[3];
 	e=state[4];   f=state[5];   g=state[6];   h=state[7];

 #define SHA512_0_15(i, a, b, c, d, e, f, g, h)			\
-	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i];	\
+	{							\
+	u64 tmp = be64_to_cpu(*((__be64 *)input + (i)));	\
+	*(volatile u64 *)&W[i] = tmp;				\
+	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + tmp;	\
 	t2 = e0(a) + Maj(a, b, c);				\
 	d += t1;						\
-	h = t1 + t2
+	h = t1 + t2;						\
+	}

 #define SHA512_16_79(i, a, b, c, d, e, f, g, h)			\
-	BLEND_OP(i, W);						\
-	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[(i) & 15]; \
+	{							\
+	u64 tmp = W[(i) & 15] + s1(W[(i-2) & 15]) + W[(i-7) & 15] + s0(W[(i-15) & 15]); \
+	*(volatile u64 *)&W[(i) & 15] = tmp;			\
+	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + tmp;	\
 	t2 = e0(a) + Maj(a, b, c);				\
 	d += t1;						\
-	h = t1 + t2
+	h = t1 + t2;						\
+	}

 	for (i = 0; i < 16; i += 8) {
 		SHA512_0_15(i, a, b, c, d, e, f, g, h);
Re: [PATCH 4/3] sha512: reduce stack usage even on i386
On Thu, Jan 26, 2012 at 01:35:02PM +1100, Herbert Xu wrote:
> On Wed, Jan 18, 2012 at 09:02:10PM +0300, Alexey Dobriyan wrote:
> > Fix still excessive stack usage on i386. There is too much loop
> > unrolling going on, despite W[16] being used, gcc screws up this for
> > some reason. So, don't be smart, use simple code from SHA-512
> > definition, this keeps code size _and_ stack usage back under control
> > even on i386:
> >
> > -14b:	81 ec 9c 03 00 00	sub    $0x39c,%esp
> > +149:	81 ec 64 01 00 00	sub    $0x164,%esp
> >
> > $ size ../sha512_generic-i386-00*
> >    text	   data	    bss	    dec	    hex	filename
> >   15521	    712	      0	  16233	   3f69	../sha512_generic-i386-000.o
> >    4225	    712	      0	   4937	   1349	../sha512_generic-i386-001.o
> >
> > Signed-off-by: Alexey Dobriyan <adobri...@gmail.com>
> > Cc: sta...@vger.kernel.org
>
> Hmm, your patch doesn't apply against my crypto tree. Please regenerate.

I think this is because your tree contained %16 code instead of & 15.
Now that it contains & 15 it should become applicable. Anyway.

-- [PATCH] sha512: reduce stack usage even on i386

Fix still excessive stack usage on i386. There is too much loop unrolling
going on, despite W[16] being used, gcc screws up this for some reason.

So, don't be smart, use simple code from SHA-512 definition, this keeps
code size _and_ stack usage back under control even on i386:

-14b:	81 ec 9c 03 00 00	sub    $0x39c,%esp
+149:	81 ec 64 01 00 00	sub    $0x164,%esp

$ size ../sha512_generic-i386-00*
   text	   data	    bss	    dec	    hex	filename
  15521	    712	      0	  16233	   3f69	../sha512_generic-i386-000.o
   4225	    712	      0	   4937	   1349	../sha512_generic-i386-001.o

Signed-off-by: Alexey Dobriyan <adobri...@gmail.com>
Cc: sta...@vger.kernel.org
---
 crypto/sha512_generic.c | 42 ++++++++++++++++++------------------------
 1 file changed, 20 insertions(+), 22 deletions(-)

--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -100,35 +100,33 @@ sha512_transform(u64 *state, const u8 *input)
 #define SHA512_0_15(i, a, b, c, d, e, f, g, h)			\
 	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i];	\
 	t2 = e0(a) + Maj(a, b, c);				\
-	d += t1;						\
-	h = t1 + t2
+	h = g;							\
+	g = f;							\
+	f = e;							\
+	e = d + t1;						\
+	d = c;							\
+	c = b;							\
+	b = a;							\
+	a = t1 + t2

 #define SHA512_16_79(i, a, b, c, d, e, f, g, h)			\
 	BLEND_OP(i, W);						\
-	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[(i) & 15]; \
+	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i & 15];	\
 	t2 = e0(a) + Maj(a, b, c);				\
-	d += t1;						\
-	h = t1 + t2
-
-	for (i = 0; i < 16; i += 8) {
+	h = g;							\
+	g = f;							\
+	f = e;							\
+	e = d + t1;						\
+	d = c;							\
+	c = b;							\
+	b = a;							\
+	a = t1 + t2
+
+	for (i = 0; i < 16; i++) {
 		SHA512_0_15(i, a, b, c, d, e, f, g, h);
-		SHA512_0_15(i + 1, h, a, b, c, d, e, f, g);
-		SHA512_0_15(i + 2, g, h, a, b, c, d, e, f);
-		SHA512_0_15(i + 3, f, g, h, a, b, c, d, e);
-		SHA512_0_15(i + 4, e, f, g, h, a, b, c, d);
-		SHA512_0_15(i + 5, d, e, f, g, h, a, b, c);
-		SHA512_0_15(i + 6, c, d, e, f, g, h, a, b);
-		SHA512_0_15(i + 7, b, c, d, e, f, g, h, a);
 	}
-	for (i = 16; i < 80; i += 8) {
+	for (i = 16; i < 80; i++) {
 		SHA512_16_79(i, a, b, c, d, e, f, g, h);
-		SHA512_16_79(i + 1, h, a, b, c, d, e, f, g);
-		SHA512_16_79(i + 2, g, h, a, b, c, d, e, f);
-		SHA512_16_79(i + 3, f, g, h, a, b, c, d, e);
-		SHA512_16_79(i + 4, e, f, g, h, a, b, c, d);
-		SHA512_16_79(i + 5, d, e, f, g, h, a, b, c);
-		SHA512_16_79(i + 6, c, d, e, f, g, h, a, b);
-		SHA512_16_79(i + 7, b, c, d, e, f, g, h
[PATCH 4/3] sha512: reduce stack usage even on i386
Fix still excessive stack usage on i386. There is too much loop unrolling
going on, despite W[16] being used, gcc screws up this for some reason.

So, don't be smart, use simple code from SHA-512 definition, this keeps
code size _and_ stack usage back under control even on i386:

-14b:	81 ec 9c 03 00 00	sub    $0x39c,%esp
+149:	81 ec 64 01 00 00	sub    $0x164,%esp

$ size ../sha512_generic-i386-00*
   text	   data	    bss	    dec	    hex	filename
  15521	    712	      0	  16233	   3f69	../sha512_generic-i386-000.o
   4225	    712	      0	   4937	   1349	../sha512_generic-i386-001.o

Signed-off-by: Alexey Dobriyan <adobri...@gmail.com>
Cc: sta...@vger.kernel.org
---
 crypto/sha512_generic.c | 42 ++++++++++++++++++------------------------
 1 file changed, 20 insertions(+), 22 deletions(-)

--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -95,35 +95,33 @@ sha512_transform(u64 *state, const u8 *input)
 #define SHA512_0_15(i, a, b, c, d, e, f, g, h)			\
 	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i];	\
 	t2 = e0(a) + Maj(a, b, c);				\
-	d += t1;						\
-	h = t1 + t2
+	h = g;							\
+	g = f;							\
+	f = e;							\
+	e = d + t1;						\
+	d = c;							\
+	c = b;							\
+	b = a;							\
+	a = t1 + t2

 #define SHA512_16_79(i, a, b, c, d, e, f, g, h)			\
 	BLEND_OP(i, W);						\
-	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[(i) & 15]; \
+	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i & 15];	\
 	t2 = e0(a) + Maj(a, b, c);				\
-	d += t1;						\
-	h = t1 + t2
-
-	for (i = 0; i < 16; i += 8) {
+	h = g;							\
+	g = f;							\
+	f = e;							\
+	e = d + t1;						\
+	d = c;							\
+	c = b;							\
+	b = a;							\
+	a = t1 + t2
+
+	for (i = 0; i < 16; i++) {
 		SHA512_0_15(i, a, b, c, d, e, f, g, h);
-		SHA512_0_15(i + 1, h, a, b, c, d, e, f, g);
-		SHA512_0_15(i + 2, g, h, a, b, c, d, e, f);
-		SHA512_0_15(i + 3, f, g, h, a, b, c, d, e);
-		SHA512_0_15(i + 4, e, f, g, h, a, b, c, d);
-		SHA512_0_15(i + 5, d, e, f, g, h, a, b, c);
-		SHA512_0_15(i + 6, c, d, e, f, g, h, a, b);
-		SHA512_0_15(i + 7, b, c, d, e, f, g, h, a);
 	}
-	for (i = 16; i < 80; i += 8) {
+	for (i = 16; i < 80; i++) {
 		SHA512_16_79(i, a, b, c, d, e, f, g, h);
-		SHA512_16_79(i + 1, h, a, b, c, d, e, f, g);
-		SHA512_16_79(i + 2, g, h, a, b, c, d, e, f);
-		SHA512_16_79(i + 3, f, g, h, a, b, c, d, e);
-		SHA512_16_79(i + 4, e, f, g, h, a, b, c, d);
-		SHA512_16_79(i + 5, d, e, f, g, h, a, b, c);
-		SHA512_16_79(i + 6, c, d, e, f, g, h, a, b);
-		SHA512_16_79(i + 7, b, c, d, e, f, g, h, a);
 	}

 	state[0] += a; state[1] += b; state[2] += c; state[3] += d;
Re: [PATCH 2/3] sha512: reduce stack usage to safe number
On 1/16/12, Eric Dumazet <eric.duma...@gmail.com> wrote:
> Le lundi 16 janvier 2012 à 09:56 +0000, David Laight a écrit :
> > Doesn't this badly overflow W[] ..
> >
> > +#define SHA512_0_15(i, a, b, c, d, e, f, g, h)		\
> > +	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i];	\
> > ...
> > +	for (i = 0; i < 16; i += 8) {
> > ...
> > +		SHA512_0_15(i + 7, b, c, d, e, f, g, h, a);
> > +	}
> >
> > 	David
>
> No overflow since loop is done only for i = 0 and i = 8.
>
> By the way, I suspect previous code was chosen years ago because this
> version uses less stack but adds much more code bloat.

I think W[80] was used because it's the most straightforward way to write
this code by following the spec. All SHA definitions have the full message
schedule pseudocoded before hash computation.

$ size crypto/sha512_generic.o crypto/sha512_generic_old.o
   text	   data	    bss	    dec	    hex	filename
  17369	    704	      0	  18073	   4699	crypto/sha512_generic.o
   8249	    704	      0	   8953	   22f9	crypto/sha512_generic_old.o

This is because SHA-512 is a fundamentally 64-bit algorithm multiplied by
excessive unrolling. Surprisingly, doing variable renaming by hand like in
the spec:

	t1 = ...
	t2 = ...
	h = g;
	g = f;
	f = e;
	e = d + t1;
	d = c;
	c = b;
	b = a;
	a = t1 + t2;

brings stack space on i386 under control too.
Re: [PATCH 2/3] sha512: reduce stack usage to safe number
On 1/16/12, David Laight <david.lai...@aculab.com> wrote:
> Doesn't this badly overflow W[] ..
>
> +#define SHA512_0_15(i, a, b, c, d, e, f, g, h)		\
> +	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i];	\
> ...
> +	for (i = 0; i < 16; i += 8) {
> ...
> +		SHA512_0_15(i + 7, b, c, d, e, f, g, h, a);
> +	}

No, why should it? i can be only 0 and 8.
Re: sha512: make it work, undo percpu message schedule
On Fri, Jan 13, 2012 at 01:34:13PM +0100, Eric Dumazet wrote:
> Le vendredi 13 janvier 2012 à 13:33 +0200, Alexey Dobriyan a écrit :
> > On 1/13/12, Eric Dumazet <eric.duma...@gmail.com> wrote:
> > > +	static u64 msg_schedule[80];
> > > +	static DEFINE_SPINLOCK(msg_schedule_lock);
> >
> > No guys, no. SHA-512 only needs a u64[16] running window for message
> > scheduling. I'm sending a whitespace-mangled patch which is only tested
> > with selfcryptotest passed, so you won't apply something complex.
> >
> > Stackspace usage drops down to like this:
> >
> > -139:	48 81 ec c8 02 00 00	sub    $0x2c8,%rsp
> > +136:	48 81 ec 18 01 00 00	sub    $0x118,%rsp

--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -21,8 +21,6 @@
 #include <linux/percpu.h>
 #include <asm/byteorder.h>

-static DEFINE_PER_CPU(u64[80], msg_schedule);
-
 static inline u64 Ch(u64 x, u64 y, u64 z)
 {
         return z ^ (x & (y ^ z));
@@ -80,7 +78,7 @@ static inline void LOAD_OP(int I, u64 *W, const u8 *input)

 static inline void BLEND_OP(int I, u64 *W)
 {
-	W[I] = s1(W[I-2]) + W[I-7] + s0(W[I-15]) + W[I-16];
+	W[I%16] = s1(W[(I-2)%16]) + W[(I-7)%16] + s0(W[(I-15)%16]) + W[I%16];
 }

 static void
@@ -89,38 +87,48 @@ sha512_transform(u64 *state, const u8 *input)
 	u64 a, b, c, d, e, f, g, h, t1, t2;
 	int i;
-	u64 *W = get_cpu_var(msg_schedule);
+	u64 W[16];

 	/* load the input */
 	for (i = 0; i < 16; i++)
 		LOAD_OP(i, W, input);

-	for (i = 16; i < 80; i++) {
-		BLEND_OP(i, W);
-	}
-
 	/* load the state into our registers */
 	a=state[0];   b=state[1];   c=state[2];   d=state[3];
 	e=state[4];   f=state[5];   g=state[6];   h=state[7];

-	/* now iterate */
-	for (i=0; i<80; i+=8) {
-		t1 = h + e1(e) + Ch(e,f,g) + sha512_K[i  ] + W[i  ];
-		t2 = e0(a) + Maj(a,b,c);    d+=t1;    h=t1+t2;
-		t1 = g + e1(d) + Ch(d,e,f) + sha512_K[i+1] + W[i+1];
-		t2 = e0(h) + Maj(h,a,b);    c+=t1;    g=t1+t2;
-		t1 = f + e1(c) + Ch(c,d,e) + sha512_K[i+2] + W[i+2];
-		t2 = e0(g) + Maj(g,h,a);    b+=t1;    f=t1+t2;
-		t1 = e + e1(b) + Ch(b,c,d) + sha512_K[i+3] + W[i+3];
-		t2 = e0(f) + Maj(f,g,h);    a+=t1;    e=t1+t2;
-		t1 = d + e1(a) + Ch(a,b,c) + sha512_K[i+4] + W[i+4];
-		t2 = e0(e) + Maj(e,f,g);    h+=t1;    d=t1+t2;
-		t1 = c + e1(h) + Ch(h,a,b) + sha512_K[i+5] + W[i+5];
-		t2 = e0(d) + Maj(d,e,f);    g+=t1;    c=t1+t2;
-		t1 = b + e1(g) + Ch(g,h,a) + sha512_K[i+6] + W[i+6];
-		t2 = e0(c) + Maj(c,d,e);    f+=t1;    b=t1+t2;
-		t1 = a + e1(f) + Ch(f,g,h) + sha512_K[i+7] + W[i+7];
-		t2 = e0(b) + Maj(b,c,d);    e+=t1;    a=t1+t2;
+#define SHA512_0_15(i, a, b, c, d, e, f, g, h)			\
+	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i];	\
+	t2 = e0(a) + Maj(a,b,c);				\
+	d += t1;						\
+	h = t1 + t2
+
+#define SHA512_16_79(i, a, b, c, d, e, f, g, h)			\
+	BLEND_OP(i, W);						\
+	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[(i)%16];	\
+	t2 = e0(a) + Maj(a,b,c);				\
+	d += t1;						\
+	h = t1 + t2
+
+	for (i = 0; i < 16; i += 8) {
+		SHA512_0_15(i, a, b, c, d, e, f, g, h);
+		SHA512_0_15(i + 1, h, a, b, c, d, e, f, g);
+		SHA512_0_15(i + 2, g, h, a, b, c, d, e, f);
+		SHA512_0_15(i + 3, f, g, h, a, b, c, d, e);
+		SHA512_0_15(i + 4, e, f, g, h, a, b, c, d);
+		SHA512_0_15(i + 5, d, e, f, g, h, a, b, c);
+		SHA512_0_15(i + 6, c, d, e, f, g, h, a, b);
+		SHA512_0_15(i + 7, b, c, d, e, f, g, h, a);
+	}
+	for (i = 16; i < 80; i += 8) {
+		SHA512_16_79(i, a, b, c, d, e, f, g, h);
+		SHA512_16_79(i + 1, h, a, b, c, d, e, f, g);
+		SHA512_16_79(i + 2, g, h, a, b, c, d, e, f);
+		SHA512_16_79(i + 3, f, g, h, a, b, c, d, e);
+		SHA512_16_79(i + 4, e, f, g, h, a, b, c, d);
+		SHA512_16_79(i + 5, d, e, f, g, h, a, b, c);
+		SHA512_16_79(i + 6, c, d, e, f, g, h, a, b);
+		SHA512_16_79(i + 7, b, c, d, e, f, g, h, a);
 	}

 	state[0] += a; state[1] += b; state[2] += c; state[3] += d;
@@ -128,8 +136,6 @@ sha512_transform(u64 *state, const u8 *input)

 	/* erase our data */
 	a = b = c = d = e = f = g = h = t1 = t2 = 0;
-	memset(W, 0, sizeof(__get_cpu_var(msg_schedule)));
-	put_cpu_var(msg_schedule);
 }

 static int

> Even if it's true, it's not stable material. Stable teams want obvious
> patches.

I understand that. But it _is_ obvious if you see what macro
[PATCH 1/3] sha512: make it work, undo percpu message schedule
commit f9e2bca6c22d75a289a349f869701214d63b5060 aka "crypto: sha512 - Move
message schedule W[80] to static percpu area" created a global message
schedule area.

If sha512_update will ever be entered twice, hash will be silently
calculated incorrectly.

Probably the easiest way to notice incorrect hashes being calculated is
to run 2 ping floods over AH with hmac(sha512):

#!/usr/sbin/setkey -f
flush;
spdflush;
add IP1 IP2 ah 25 -A hmac-sha512 0x0025;
add IP2 IP1 ah 52 -A hmac-sha512 0x0052;
spdadd IP1 IP2 any -P out ipsec ah/transport//require;
spdadd IP2 IP1 any -P in ipsec ah/transport//require;

XfrmInStateProtoError will start ticking with -EBADMSG being returned from
ah_input(). This never happens with, say, hmac(sha1).

With patch applied (on BOTH sides), XfrmInStateProtoError does not tick
with multiple bidirectional ping flood streams like it doesn't tick with
SHA-1.

After this patch sha512_transform() will start using ~750 bytes of stack
on x86_64. This is OK for simple loads; for something more heavy, stack
reduction will be done separately.

Signed-off-by: Alexey Dobriyan <adobri...@gmail.com>
Cc: sta...@vger.kernel.org
---
 crypto/sha512_generic.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -21,8 +21,6 @@
 #include <linux/percpu.h>
 #include <asm/byteorder.h>

-static DEFINE_PER_CPU(u64[80], msg_schedule);
-
 static inline u64 Ch(u64 x, u64 y, u64 z)
 {
         return z ^ (x & (y ^ z));
@@ -89,7 +87,7 @@ sha512_transform(u64 *state, const u8 *input)
 	u64 a, b, c, d, e, f, g, h, t1, t2;
 	int i;
-	u64 *W = get_cpu_var(msg_schedule);
+	u64 W[80];

 	/* load the input */
 	for (i = 0; i < 16; i++)
@@ -128,8 +126,6 @@ sha512_transform(u64 *state, const u8 *input)

 	/* erase our data */
 	a = b = c = d = e = f = g = h = t1 = t2 = 0;
-	memset(W, 0, sizeof(__get_cpu_var(msg_schedule)));
-	put_cpu_var(msg_schedule);
 }

 static int
[PATCH 3/3] sha512: use standard ror64()
Use standard ror64() instead of hand-written.

There is no standard ror64, so create it. The difference is the shift
value being unsigned int instead of uint64_t (for which there is no
reason).

gcc starts to emit native ROR instructions which it doesn't do for some
reason currently. This should make the code faster.

Patch survives in-tree crypto test and ping flood with hmac(sha512) on.

Signed-off-by: Alexey Dobriyan <adobri...@gmail.com>
---
 crypto/sha512_generic.c | 13 ++++---------
 include/linux/bitops.h  | 20 ++++++++++++++++++++
 2 files changed, 24 insertions(+), 9 deletions(-)

--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -31,11 +31,6 @@ static inline u64 Maj(u64 x, u64 y, u64 z)
 	return (x & y) | (z & (x | y));
 }

-static inline u64 RORu64(u64 x, u64 y)
-{
-	return (x >> y) | (x << (64 - y));
-}
-
 static const u64 sha512_K[80] = {
 	0x428a2f98d728ae22ULL, 0x7137449123ef65cdULL, 0xb5c0fbcfec4d3b2fULL,
 	0xe9b5dba58189dbbcULL, 0x3956c25bf348b538ULL, 0x59f111f1b605d019ULL,
@@ -66,10 +61,10 @@ static const u64 sha512_K[80] = {
 	0x5fcb6fab3ad6faecULL, 0x6c44198c4a475817ULL,
 };

-#define e0(x)       (RORu64(x,28) ^ RORu64(x,34) ^ RORu64(x,39))
-#define e1(x)       (RORu64(x,14) ^ RORu64(x,18) ^ RORu64(x,41))
-#define s0(x)       (RORu64(x, 1) ^ RORu64(x, 8) ^ (x >> 7))
-#define s1(x)       (RORu64(x,19) ^ RORu64(x,61) ^ (x >> 6))
+#define e0(x)       (ror64(x,28) ^ ror64(x,34) ^ ror64(x,39))
+#define e1(x)       (ror64(x,14) ^ ror64(x,18) ^ ror64(x,41))
+#define s0(x)       (ror64(x, 1) ^ ror64(x, 8) ^ (x >> 7))
+#define s1(x)       (ror64(x,19) ^ ror64(x,61) ^ (x >> 6))

 static inline void LOAD_OP(int I, u64 *W, const u8 *input)
 {
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -56,6 +56,26 @@ static inline unsigned long hweight_long(unsigned long w)
 }

 /**
+ * rol64 - rotate a 64-bit value left
+ * @word: value to rotate
+ * @shift: bits to roll
+ */
+static inline __u64 rol64(__u64 word, unsigned int shift)
+{
+	return (word << shift) | (word >> (64 - shift));
+}
+
+/**
+ * ror64 - rotate a 64-bit value right
+ * @word: value to rotate
+ * @shift: bits to roll
+ */
+static inline __u64 ror64(__u64 word, unsigned int shift)
+{
+	return (word >> shift) | (word << (64 - shift));
+}
+
+/**
  * rol32 - rotate a 32-bit value left
  * @word: value to rotate
  * @shift: bits to roll
[PATCH 2/3] sha512: reduce stack usage to safe number
For rounds 16--79, W[i] only depends on W[i - 2], W[i - 7], W[i - 15] and
W[i - 16]. Consequently, keeping the whole W[80] array on stack is
unnecessary, only 16 values are really needed.

Using W[16] instead of W[80] greatly reduces stack usage (~750 bytes down
to ~340 bytes on x86_64).

Line by line explanation:
* BLEND_OP array is circular now, all indexes have to be modulo 16.
  Round number is positive, so the remainder operation should be without
  surprises.
* initial full message scheduling is trimmed to the first 16 values which
  come from the data block, the rest is calculated before it's needed.
* original loop body is an unrolled version of the new SHA512_0_15 and
  SHA512_16_79 macros, unrolling was done to not do explicit variable
  renaming. Otherwise it's the very same code after preprocessing.
  See sha1_transform() code which does the same trick.

Patch survives in-tree crypto test and original bugreport test (ping
flood with hmac(sha512)).

See FIPS 180-2 for SHA-512 definition:
http://csrc.nist.gov/publications/fips/fips180-2/fips180-2withchangenotice.pdf

Signed-off-by: Alexey Dobriyan <adobri...@gmail.com>
Cc: sta...@vger.kernel.org
---
This patch is for stable if 750 byte stack usage is not considered safe.

 crypto/sha512_generic.c | 58 ++++++++++++++++++++++++++----------------
 1 file changed, 34 insertions(+), 24 deletions(-)

--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -78,7 +78,7 @@ static inline void LOAD_OP(int I, u64 *W, const u8 *input)

 static inline void BLEND_OP(int I, u64 *W)
 {
-	W[I] = s1(W[I-2]) + W[I-7] + s0(W[I-15]) + W[I-16];
+	W[I % 16] += s1(W[(I-2) % 16]) + W[(I-7) % 16] + s0(W[(I-15) % 16]);
 }

 static void
@@ -87,38 +87,48 @@ sha512_transform(u64 *state, const u8 *input)
 	u64 a, b, c, d, e, f, g, h, t1, t2;
 	int i;
-	u64 W[80];
+	u64 W[16];

 	/* load the input */
 	for (i = 0; i < 16; i++)
 		LOAD_OP(i, W, input);

-	for (i = 16; i < 80; i++) {
-		BLEND_OP(i, W);
-	}
-
 	/* load the state into our registers */
 	a=state[0];   b=state[1];   c=state[2];   d=state[3];
 	e=state[4];   f=state[5];   g=state[6];   h=state[7];

-	/* now iterate */
-	for (i=0; i<80; i+=8) {
-		t1 = h + e1(e) + Ch(e,f,g) + sha512_K[i  ] + W[i  ];
-		t2 = e0(a) + Maj(a,b,c);    d+=t1;    h=t1+t2;
-		t1 = g + e1(d) + Ch(d,e,f) + sha512_K[i+1] + W[i+1];
-		t2 = e0(h) + Maj(h,a,b);    c+=t1;    g=t1+t2;
-		t1 = f + e1(c) + Ch(c,d,e) + sha512_K[i+2] + W[i+2];
-		t2 = e0(g) + Maj(g,h,a);    b+=t1;    f=t1+t2;
-		t1 = e + e1(b) + Ch(b,c,d) + sha512_K[i+3] + W[i+3];
-		t2 = e0(f) + Maj(f,g,h);    a+=t1;    e=t1+t2;
-		t1 = d + e1(a) + Ch(a,b,c) + sha512_K[i+4] + W[i+4];
-		t2 = e0(e) + Maj(e,f,g);    h+=t1;    d=t1+t2;
-		t1 = c + e1(h) + Ch(h,a,b) + sha512_K[i+5] + W[i+5];
-		t2 = e0(d) + Maj(d,e,f);    g+=t1;    c=t1+t2;
-		t1 = b + e1(g) + Ch(g,h,a) + sha512_K[i+6] + W[i+6];
-		t2 = e0(c) + Maj(c,d,e);    f+=t1;    b=t1+t2;
-		t1 = a + e1(f) + Ch(f,g,h) + sha512_K[i+7] + W[i+7];
-		t2 = e0(b) + Maj(b,c,d);    e+=t1;    a=t1+t2;
+#define SHA512_0_15(i, a, b, c, d, e, f, g, h)			\
+	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i];	\
+	t2 = e0(a) + Maj(a, b, c);				\
+	d += t1;						\
+	h = t1 + t2
+
+#define SHA512_16_79(i, a, b, c, d, e, f, g, h)			\
+	BLEND_OP(i, W);						\
+	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[(i)%16];	\
+	t2 = e0(a) + Maj(a, b, c);				\
+	d += t1;						\
+	h = t1 + t2
+
+	for (i = 0; i < 16; i += 8) {
+		SHA512_0_15(i, a, b, c, d, e, f, g, h);
+		SHA512_0_15(i + 1, h, a, b, c, d, e, f, g);
+		SHA512_0_15(i + 2, g, h, a, b, c, d, e, f);
+		SHA512_0_15(i + 3, f, g, h, a, b, c, d, e);
+		SHA512_0_15(i + 4, e, f, g, h, a, b, c, d);
+		SHA512_0_15(i + 5, d, e, f, g, h, a, b, c);
+		SHA512_0_15(i + 6, c, d, e, f, g, h, a, b);
+		SHA512_0_15(i + 7, b, c, d, e, f, g, h, a);
+	}
+	for (i = 16; i < 80; i += 8) {
+		SHA512_16_79(i, a, b, c, d, e, f, g, h);
+		SHA512_16_79(i + 1, h, a, b, c, d, e, f, g);
+		SHA512_16_79(i + 2, g, h, a, b, c, d, e, f);
+		SHA512_16_79(i + 3, f, g, h, a, b, c, d, e);
+		SHA512_16_79(i + 4, e, f, g, h, a, b, c, d);
+		SHA512_16_79(i + 5, d, e, f, g, h, a, b, c);
+		SHA512_16_79(i + 6, c, d, e, f, g, h, a, b
Re: [PATCH 2/3] sha512: reduce stack usage to safe number
On Sat, Jan 14, 2012 at 11:08:45AM -0800, Linus Torvalds wrote:
> On Sat, Jan 14, 2012 at 10:40 AM, Alexey Dobriyan <adobri...@gmail.com> wrote:
> > Line by line explanation:
> > * BLEND_OP array is circular now, all indexes have to be modulo 16.
> >   Round number is positive, so remainder operation should be without
> >   surprises.
>
> Don't use % except on unsigned values. Even if it's positive, if it's a
> signed number and the compiler doesn't *see* that it is absolutely
> positive, division is nontrivial. Even when you divide by a constant.
>
> For example, % 16 on an 'int' on x86-64 will generate
>
>	movl	%edi, %edx
>	sarl	$31, %edx
>	shrl	$28, %edx
>	leal	(%rdi,%rdx), %eax
>	andl	$15, %eax
>	subl	%edx, %eax
>
> in order to get the signed case right. The fact that the end result is
> correct for unsigned numbers is irrelevant: it's still stupid and slow.
>
> With an unsigned int, '% 16' will generate the obvious
>
>	andl	$15, %eax
>
> instead.
>
> Quite frankly, stop using division in the first place. Dividing by
> powers-of-two and expecting the compiler to fix things up is just
> stupid, *exactly* because of issues like these: you either have to
> think about it carefully, or the compiler may end up creating crap code.

For the record, it generates andl $15 here.

> So just use & 15 instead. That doesn't have these kinds of issues. It
> is a *good* thing when the C code is close to the end result you want
> to generate. It is *not* a good thing to write code that looks nothing
> like the end result and just expect the compiler to do the right thing.
> Even if the compiler does do the right thing, what was the advantage?

Here is an updated patch which explicitly uses & 15 (equally tested):

---
For rounds 16--79, W[i] only depends on W[i - 2], W[i - 7], W[i - 15] and
W[i - 16]. Consequently, keeping the whole W[80] array on stack is
unnecessary, only 16 values are really needed.

Using W[16] instead of W[80] greatly reduces stack usage (~750 bytes down
to ~340 bytes on x86_64).

Line by line explanation:
* BLEND_OP array is circular now, all indexes have to be modulo 16.
* initial full message scheduling is trimmed to the first 16 values which
  come from the data block, the rest is calculated right before it's
  needed.
* original loop body is an unrolled version of the new SHA512_0_15 and
  SHA512_16_79 macros, unrolling was done to not do explicit variable
  renaming. Otherwise it's the very same code after preprocessing.
  See sha1_transform() code which does the same trick.

Patch survives in-tree crypto test and original bugreport test (ping
flood with hmac(sha512)).

See FIPS 180-2 for SHA-512 definition:
http://csrc.nist.gov/publications/fips/fips180-2/fips180-2withchangenotice.pdf

Signed-off-by: Alexey Dobriyan <adobri...@gmail.com>
Cc: sta...@vger.kernel.org
---
This patch is for stable if 750 byte stack usage is not considered safe.

 crypto/sha512_generic.c | 58 ++++++++++++++++++++++++++----------------
 1 file changed, 34 insertions(+), 24 deletions(-)

--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -78,7 +78,7 @@ static inline void LOAD_OP(int I, u64 *W, const u8 *input)

 static inline void BLEND_OP(int I, u64 *W)
 {
-	W[I] = s1(W[I-2]) + W[I-7] + s0(W[I-15]) + W[I-16];
+	W[I & 15] += s1(W[(I-2) & 15]) + W[(I-7) & 15] + s0(W[(I-15) & 15]);
 }

 static void
@@ -87,38 +87,48 @@ sha512_transform(u64 *state, const u8 *input)
 	u64 a, b, c, d, e, f, g, h, t1, t2;
 	int i;
-	u64 W[80];
+	u64 W[16];

 	/* load the input */
 	for (i = 0; i < 16; i++)
 		LOAD_OP(i, W, input);

-	for (i = 16; i < 80; i++) {
-		BLEND_OP(i, W);
-	}
-
 	/* load the state into our registers */
 	a=state[0];   b=state[1];   c=state[2];   d=state[3];
 	e=state[4];   f=state[5];   g=state[6];   h=state[7];

-	/* now iterate */
-	for (i=0; i<80; i+=8) {
-		t1 = h + e1(e) + Ch(e,f,g) + sha512_K[i  ] + W[i  ];
-		t2 = e0(a) + Maj(a,b,c);    d+=t1;    h=t1+t2;
-		t1 = g + e1(d) + Ch(d,e,f) + sha512_K[i+1] + W[i+1];
-		t2 = e0(h) + Maj(h,a,b);    c+=t1;    g=t1+t2;
-		t1 = f + e1(c) + Ch(c,d,e) + sha512_K[i+2] + W[i+2];
-		t2 = e0(g) + Maj(g,h,a);    b+=t1;    f=t1+t2;
-		t1 = e + e1(b) + Ch(b,c,d) + sha512_K[i+3] + W[i+3];
-		t2 = e0(f) + Maj(f,g,h);    a+=t1;    e=t1+t2;
-		t1 = d + e1(a) + Ch(a,b,c) + sha512_K[i+4] + W[i+4];
-		t2 = e0(e) + Maj(e,f,g);    h+=t1;    d=t1+t2;
-		t1 = c + e1(h) + Ch(h,a,b) + sha512_K[i+5] + W[i+5];
-		t2 = e0(d) + Maj(d,e,f);    g+=t1;    c=t1+t2;
-		t1 = b + e1(g) + Ch(g,h,a) + sha512_K[i+6] + W[i+6];
-		t2 = e0(c) + Maj(c,d,e);    f+=t1;    b=t1+t2;
-		t1 = a + e1(f) + Ch(f,g,h
Re: sha512: make it work, undo percpu message schedule
On 1/13/12, Eric Dumazet <eric.duma...@gmail.com> wrote:
> +	static u64 msg_schedule[80];
> +	static DEFINE_SPINLOCK(msg_schedule_lock);

No guys, no. SHA-512 only needs a u64[16] running window for message
scheduling. I'm sending a whitespace-mangled patch which is only tested
with selfcryptotest passed, so you won't apply something complex.

Stackspace usage drops down to like this:

-139:	48 81 ec c8 02 00 00	sub    $0x2c8,%rsp
+136:	48 81 ec 18 01 00 00	sub    $0x118,%rsp

--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -21,8 +21,6 @@
 #include <linux/percpu.h>
 #include <asm/byteorder.h>

-static DEFINE_PER_CPU(u64[80], msg_schedule);
-
 static inline u64 Ch(u64 x, u64 y, u64 z)
 {
         return z ^ (x & (y ^ z));
@@ -80,7 +78,7 @@ static inline void LOAD_OP(int I, u64 *W, const u8 *input)

 static inline void BLEND_OP(int I, u64 *W)
 {
-	W[I] = s1(W[I-2]) + W[I-7] + s0(W[I-15]) + W[I-16];
+	W[I%16] = s1(W[(I-2)%16]) + W[(I-7)%16] + s0(W[(I-15)%16]) + W[I%16];
 }

 static void
@@ -89,38 +87,48 @@ sha512_transform(u64 *state, const u8 *input)
 	u64 a, b, c, d, e, f, g, h, t1, t2;
 	int i;
-	u64 *W = get_cpu_var(msg_schedule);
+	u64 W[16];

 	/* load the input */
 	for (i = 0; i < 16; i++)
 		LOAD_OP(i, W, input);

-	for (i = 16; i < 80; i++) {
-		BLEND_OP(i, W);
-	}
-
 	/* load the state into our registers */
 	a=state[0];   b=state[1];   c=state[2];   d=state[3];
 	e=state[4];   f=state[5];   g=state[6];   h=state[7];

-	/* now iterate */
-	for (i=0; i<80; i+=8) {
-		t1 = h + e1(e) + Ch(e,f,g) + sha512_K[i  ] + W[i  ];
-		t2 = e0(a) + Maj(a,b,c);    d+=t1;    h=t1+t2;
-		t1 = g + e1(d) + Ch(d,e,f) + sha512_K[i+1] + W[i+1];
-		t2 = e0(h) + Maj(h,a,b);    c+=t1;    g=t1+t2;
-		t1 = f + e1(c) + Ch(c,d,e) + sha512_K[i+2] + W[i+2];
-		t2 = e0(g) + Maj(g,h,a);    b+=t1;    f=t1+t2;
-		t1 = e + e1(b) + Ch(b,c,d) + sha512_K[i+3] + W[i+3];
-		t2 = e0(f) + Maj(f,g,h);    a+=t1;    e=t1+t2;
-		t1 = d + e1(a) + Ch(a,b,c) + sha512_K[i+4] + W[i+4];
-		t2 = e0(e) + Maj(e,f,g);    h+=t1;    d=t1+t2;
-		t1 = c + e1(h) + Ch(h,a,b) + sha512_K[i+5] + W[i+5];
-		t2 = e0(d) + Maj(d,e,f);    g+=t1;    c=t1+t2;
-		t1 = b + e1(g) + Ch(g,h,a) + sha512_K[i+6] + W[i+6];
-		t2 = e0(c) + Maj(c,d,e);    f+=t1;    b=t1+t2;
-		t1 = a + e1(f) + Ch(f,g,h) + sha512_K[i+7] + W[i+7];
-		t2 = e0(b) + Maj(b,c,d);    e+=t1;    a=t1+t2;
+#define SHA512_0_15(i, a, b, c, d, e, f, g, h)			\
+	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i];	\
+	t2 = e0(a) + Maj(a,b,c);				\
+	d += t1;						\
+	h = t1 + t2
+
+#define SHA512_16_79(i, a, b, c, d, e, f, g, h)			\
+	BLEND_OP(i, W);						\
+	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[(i)%16];	\
+	t2 = e0(a) + Maj(a,b,c);				\
+	d += t1;						\
+	h = t1 + t2
+
+	for (i = 0; i < 16; i += 8) {
+		SHA512_0_15(i, a, b, c, d, e, f, g, h);
+		SHA512_0_15(i + 1, h, a, b, c, d, e, f, g);
+		SHA512_0_15(i + 2, g, h, a, b, c, d, e, f);
+		SHA512_0_15(i + 3, f, g, h, a, b, c, d, e);
+		SHA512_0_15(i + 4, e, f, g, h, a, b, c, d);
+		SHA512_0_15(i + 5, d, e, f, g, h, a, b, c);
+		SHA512_0_15(i + 6, c, d, e, f, g, h, a, b);
+		SHA512_0_15(i + 7, b, c, d, e, f, g, h, a);
+	}
+	for (i = 16; i < 80; i += 8) {
+		SHA512_16_79(i, a, b, c, d, e, f, g, h);
+		SHA512_16_79(i + 1, h, a, b, c, d, e, f, g);
+		SHA512_16_79(i + 2, g, h, a, b, c, d, e, f);
+		SHA512_16_79(i + 3, f, g, h, a, b, c, d, e);
+		SHA512_16_79(i + 4, e, f, g, h, a, b, c, d);
+		SHA512_16_79(i + 5, d, e, f, g, h, a, b, c);
+		SHA512_16_79(i + 6, c, d, e, f, g, h, a, b);
+		SHA512_16_79(i + 7, b, c, d, e, f, g, h, a);
 	}

 	state[0] += a; state[1] += b; state[2] += c; state[3] += d;
@@ -128,8 +136,6 @@ sha512_transform(u64 *state, const u8 *input)

 	/* erase our data */
 	a = b = c = d = e = f = g = h = t1 = t2 = 0;
-	memset(W, 0, sizeof(__get_cpu_var(msg_schedule)));
-	put_cpu_var(msg_schedule);
 }

 static int
Re: sha512: make it work, undo percpu message schedule
On Wed, Jan 11, 2012 at 11:36:11AM +1100, Herbert Xu wrote:
> On Wed, Jan 11, 2012 at 03:00:40AM +0300, Alexey Dobriyan wrote:
>> commit f9e2bca6c22d75a289a349f869701214d63b5060 aka "crypto: sha512 - Move message schedule W[80] to static percpu area" created a global message schedule area. If sha512_update will ever be entered twice, hilarity ensues.
>
> Hmm, do you know why this happens? On the face of it this shouldn't be possible as preemption is disabled.

Herbert, I couldn't come up with a single scenario. :-( But the bug is easy to reproduce.
Re: sha512: make it work, undo percpu message schedule
On Wed, Jan 11, 2012 at 03:00:40AM +0300, Alexey Dobriyan wrote:
> -	memset(W, 0, sizeof(__get_cpu_var(msg_schedule)));

And, yes, this is intentional -- modern gcc pisses on stone age data clearing.
HMAC and stuff
The aalg_list array contains the list of approved HMAC algorithms. Do I understand correctly that to update this list some sort of official document like an RFC has to exist?

For example, it contains an hmac(rmd160) entry, but doesn't contain hmac(rmd128) and the other RIPEMD functions (there is even a test vector for hmac(rmd128)).

Also, the kernel has more cryptographic hash functions than the ipsec code allows, like the Tiger hashes, Whirlpool etc. They are dead code if the IPsec code doesn't let users use them.
Re: Add IPSec IP Range in Linux kernel
On Tue, Nov 8, 2011 at 8:24 AM, Peter P Waskiewicz Jr peter.p.waskiewicz...@intel.com wrote:
> On Mon, 2011-11-07 at 19:10 -0800, Daniil Stolnikov wrote:
>> Hello! I found that the IPsec stack in Linux does not support IP ranges. Many people ask this question. The archives say strongswan stated that their daemon supports ranges, but the Linux IPsec stack supports only subnets. I am writing to ask for support for IP ranges in Linux. I think a lot of people will appreciate this.
>
> It'd be even better if you could write a patch for us to review.

Oh, come on! Changing addr_match() is trivial for ipv4 and easy for ipv6. :-)
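The difference the thread is arguing about is small in code terms. A hypothetical sketch (host-order IPv4, names invented here, not the kernel's addr_match()) of prefix matching, which is all the xfrm selectors can express today, versus an arbitrary range check:

```c
#include <stdint.h>

/* build a host-order IPv4 address from dotted-quad components */
#define IP4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* prefix (subnet) match: compare only the top plen bits */
static int prefix_match(uint32_t addr, uint32_t net, unsigned plen)
{
	uint32_t mask = plen ? ~0u << (32 - plen) : 0;

	return (addr & mask) == (net & mask);
}

/* range match: any contiguous span, not just power-of-two aligned blocks */
static int range_match(uint32_t addr, uint32_t lo, uint32_t hi)
{
	return addr >= lo && addr <= hi;
}

int check_matching(void)
{
	/* 10.0.0.0/24 contains 10.0.0.5 but not 10.0.1.5 */
	if (!prefix_match(IP4(10,0,0,5), IP4(10,0,0,0), 24))
		return 0;
	if (prefix_match(IP4(10,0,1,5), IP4(10,0,0,0), 24))
		return 0;
	/* 10.0.0.5 - 10.0.0.9 is a span no single prefix can express */
	if (!range_match(IP4(10,0,0,7), IP4(10,0,0,5), IP4(10,0,0,9)))
		return 0;
	if (range_match(IP4(10,0,0,4), IP4(10,0,0,5), IP4(10,0,0,9)))
		return 0;
	return 1;
}
```

A range needs two stored endpoints instead of an address plus a prefix length, which is the real cost: the uAPI selector structures would have to grow, not just the comparison.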
[PATCH] whirlpool: count rounds from 0
rc[0] is unused because rounds are counted from 1. Save a u64!

Signed-off-by: Alexey Dobriyan adobri...@gmail.com
---
 crypto/wp512.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

--- a/crypto/wp512.c
+++ b/crypto/wp512.c
@@ -762,11 +762,17 @@ static const u64 C7[256] = {
 	0x86228644a411c286ULL,
 };
 
-static const u64 rc[WHIRLPOOL_ROUNDS + 1] = {
-	0x0000000000000000ULL, 0x1823c6e887b8014fULL, 0x36a6d2f5796f9152ULL,
-	0x60bc9b8ea30c7b35ULL, 0x1de0d7c22e4bfe57ULL, 0x157737e59ff04adaULL,
-	0x58c9290ab1a06b85ULL, 0xbd5d10f4cb3e0567ULL, 0xe427418ba77d95d8ULL,
-	0xfbee7c66dd17479eULL, 0xca2dbf07ad5a8333ULL,
+static const u64 rc[WHIRLPOOL_ROUNDS] = {
+	0x1823c6e887b8014fULL,
+	0x36a6d2f5796f9152ULL,
+	0x60bc9b8ea30c7b35ULL,
+	0x1de0d7c22e4bfe57ULL,
+	0x157737e59ff04adaULL,
+	0x58c9290ab1a06b85ULL,
+	0xbd5d10f4cb3e0567ULL,
+	0xe427418ba77d95d8ULL,
+	0xfbee7c66dd17479eULL,
+	0xca2dbf07ad5a8333ULL,
 };
 
 /**
@@ -793,7 +799,7 @@ static void wp512_process_buffer(struct wp512_ctx *wctx) {
 	state[6] = block[6] ^ (K[6] = wctx->hash[6]);
 	state[7] = block[7] ^ (K[7] = wctx->hash[7]);
 
-	for (r = 1; r <= WHIRLPOOL_ROUNDS; r++) {
+	for (r = 0; r < WHIRLPOOL_ROUNDS; r++) {
 		L[0] = C0[(int)(K[0] >> 56)       ] ^
 		       C1[(int)(K[7] >> 48) & 0xff] ^
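The patch is a pure re-indexing: dropping the dummy rc[0] and shifting the loop bounds leaves the sequence of constants consumed per round unchanged. A tiny sketch with made-up constants (not the Whirlpool values) showing the two loops are equivalent:

```c
#include <string.h>

enum { ROUNDS = 3 };

/* 1-based table with a dummy slot 0, as in the old code */
static const int rc_old[ROUNDS + 1] = { 0, 11, 22, 33 };
/* 0-based table, one element smaller, as in the patch */
static const int rc_new[ROUNDS] = { 11, 22, 33 };

int check_reindex(void)
{
	int seq_old[ROUNDS], seq_new[ROUNDS], r;

	for (r = 1; r <= ROUNDS; r++)	/* old loop: counts from 1 */
		seq_old[r - 1] = rc_old[r];
	for (r = 0; r < ROUNDS; r++)	/* new loop: counts from 0 */
		seq_new[r] = rc_new[r];

	/* both loops must feed the rounds the same constants in order */
	return memcmp(seq_old, seq_new, sizeof(seq_old)) == 0;
}
```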
Re: [PATCH v1] compiler: prevent dead store elimination
On Mon, Mar 1, 2010 at 11:32 AM, Mikael Pettersson mi...@it.uu.se wrote:
> Arjan van de Ven writes:
>> On Sat, 27 Feb 2010 21:47:42 +0100 Roel Kluin roel.kl...@gmail.com wrote:
>>> +void secure_bzero(void *p, size_t n)
>>> +{
>>> +	memset(p, 0, n);
>>> +	ARRAY_PREVENT_DSE(p, n);
>>> +}
>>> +EXPORT_SYMBOL(secure_bzero);
>>
>> please don't introduce bzero again to the kernel; make it secure_memset() please.

What's so secure in this function? :^)
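The "secure" part is the dead-store-elimination barrier: without something like ARRAY_PREVENT_DSE, the compiler is allowed to delete a memset() of a buffer that is about to go out of scope. One common way to pin the stores is an empty asm that claims to read memory -- the same idea the kernel later adopted for memzero_explicit(). This is a gcc-style sketch, not the patch's actual ARRAY_PREVENT_DSE definition:

```c
#include <string.h>

/* Zeroing that the compiler cannot elide: the empty asm takes the
 * pointer and declares a "memory" clobber, so as far as the optimizer
 * is concerned the zeroed bytes may be observed afterwards and the
 * stores must be performed. (gcc/clang extended-asm syntax.) */
static void secure_memset_sketch(void *p, size_t n)
{
	memset(p, 0, n);
	__asm__ __volatile__("" : : "r"(p) : "memory");
}

int check_zeroed(void)
{
	unsigned char buf[32];
	size_t i;

	memset(buf, 0xaa, sizeof(buf));
	secure_memset_sketch(buf, sizeof(buf));
	for (i = 0; i < sizeof(buf); i++)
		if (buf[i] != 0)
			return 0;
	return 1;
}
```

A functional test can only confirm the buffer is zeroed; whether the store survives optimization has to be checked in the generated assembly, which is exactly what makes this class of helpers hard to get right.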
Re: crypto_remove_spawns: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
On Mon, Feb 15, 2010 at 10:11 AM, Herbert Xu herb...@gondor.apana.org.au wrote:
> On Mon, Feb 15, 2010 at 09:47:25AM +0200, Alexey Dobriyan wrote:
>> On Mon, Feb 15, 2010 at 7:27 AM, Herbert Xu herb...@gondor.apana.org.au wrote:
>>> Is this reproducible every time you unload aes_x86_64 after boot?
>>
>> No, what I do is
>> 1. setup ipcomp in tunnel mode _in fresh netns_ and immediately exit
>> 2. modprobe/rmmod all modules (not much)
>>
>> ~1 hour of this workload and it hits sometimes with aes_x86_64, sometimes with aes_generic.
>
> Was this with that IPCOMP bug fixed?

Yes, the ipcomp bug triggers almost immediately. Anyway, this is just a description of what I do.
crypto_remove_spawns: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: [<ffffffff81145bf4>] crypto_remove_spawns+0xd4/0x340
PGD bdc48067 PUD bc954067 PMD 0
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/uevent
CPU 0
Pid: 16500, comm: rmmod Not tainted 2.6.33-rc7-next-20100212+ #9 P5E/P5E
RIP: 0010:[<ffffffff81145bf4>]  [<ffffffff81145bf4>] crypto_remove_spawns+0xd4/0x340
RSP: 0018:ffff8800bc9dfde8  EFLAGS: 00010282
RAX: ffff8800bc901498 RBX: 0000000000000000 RCX: ffff8800ba859610
RDX: ffff8800bc900380 RSI: ffff8800bc9dfe18 RDI: ffff8800bc9015c0
RBP: ffff8800bc9dfe68 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800bc901488
R13: ffff8800bc9dfe18 R14: ffffffffa05817e0 R15: 0000000000000000
FS:  00007fdd2ec1c6f0(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000018 CR3: 00000000bca34000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rmmod (pid: 16500, threadinfo ffff8800bc9de000, task ffff8800bd53ad90)
Stack:
 8800bc9dfe08 8800bc9dfe28 8800bc9dfe98 042181636020 0
 8800bc9dfe08 8800bc9dfe08 8800bc9015c0 8800bc900380 0
 8800ba859808 8800ba859610 8800bc9dfe98 a05817e0
Call Trace:
 [<ffffffff81145eb1>] crypto_remove_alg+0x51/0x60
 [<ffffffff81145ef3>] crypto_unregister_alg+0x33/0x90
 [<ffffffffa058175c>] aes_fini+0x10/0x12 [aes_x86_64]
 [<ffffffff8107266c>] sys_delete_module+0x19c/0x250
 [<ffffffff8100256b>] system_call_fastpath+0x16/0x1b
Code: 02 00 eb c3 0f 1f 00 48 8b 47 08 48 8d 75 c0 4c 89 28 49 89 45 08 48 8b 55 c0 e8 a8 fa 02 00 48 8d 45 a0 48 8b 18 48 39 d8 74 44 <4c> 8b 63 18 4d 39 f4 0f 84 4e 02 00 00 48 8b 13 48 8b 43 08 4c
RIP [<ffffffff81145bf4>] crypto_remove_spawns+0xd4/0x340
 RSP <ffff8800bc9dfde8>
CR2: 0000000000000018

crypto_remove_spawns:

	spawn = list_first_entry(spawns, struct crypto_spawn, list);
	inst = spawn->inst;

spawn is NULL here.
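A faulting address of 0x18 with RBX == 0 is consistent with reading a struct member at offset 0x18 through a NULL pointer, matching the analysis above. A related hazard worth keeping in mind with list_first_entry() is that on an empty list it returns a pointer derived from the list head rather than a real entry, which is why callers must check list_empty() first. A minimal sketch (hypothetical struct layout, not the real struct crypto_spawn):

```c
#include <stddef.h>

/* Minimal Linux-style intrusive list, just enough to show the hazard. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_first_entry(head, type, member) \
	container_of((head)->next, type, member)

struct spawn_like {
	void *inst;			/* hypothetical member */
	struct list_head list;
};

int check_empty_list(void)
{
	struct list_head spawns = { &spawns, &spawns };	/* empty list */
	struct spawn_like *spawn;

	/* On an empty list, "first entry" is computed from the head
	 * itself: spawn points offsetof(list) bytes *before* &spawns,
	 * not at a real entry, so spawn->inst would read garbage. */
	spawn = list_first_entry(&spawns, struct spawn_like, list);

	return (char *)spawn ==
	       (char *)&spawns - offsetof(struct spawn_like, list);
}
```

In the oops, spawn is NULL outright rather than head-derived, pointing at a use-after-removal or corrupted list rather than a missing emptiness check, but the pattern of "deref small offset off a bad spawn pointer" is the same.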
Re: crypto_remove_spawns: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
On Mon, Feb 15, 2010 at 7:27 AM, Herbert Xu herb...@gondor.apana.org.au wrote:
> Is this reproducible every time you unload aes_x86_64 after boot?

No, what I do is
1. setup ipcomp in tunnel mode _in fresh netns_ and immediately exit
2. modprobe/rmmod all modules (not much)

~1 hour of this workload and it hits sometimes with aes_x86_64, sometimes with aes_generic.

> Please attach your config file?

Full config later; for now: ipv4 only, XFRM stuff as modules, crypto modules as modules, almost all debugging on.