Re: sha-512...

2012-02-15 Thread Alexey Dobriyan
On Wed, Feb 15, 2012 at 12:23:52AM -0500, David Miller wrote:
 From: Herbert Xu herb...@gondor.hengli.com.au
 Date: Wed, 15 Feb 2012 16:16:08 +1100
 
  OK, so we grew by 1136 - 888 = 248.  Keep in mind that 128 of
  that is expected since we moved W onto the stack.
 
 Right.
 
  I guess we could go back to the percpu solution, what do you
  think?
 
 I'm not entirely sure, we might have to.
 
 sha512 is notorious for generating terrible code with gcc on 32-bit
 targets, so...  The sha512 test in the glibc testsuite tends to
 time out on 32-bit sparc. :-)

Cherry-picking the ror64() commit largely fixes the issue (on sparc-defconfig):

 sha512_transform:
   0:   9d e3 bc 78 save  %sp, -904, %sp

git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
b85a088f15f2070b7180735a231012843a5ac96c
crypto: sha512 - use standard ror64()
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/3] sha512: reduce stack usage even on i386

2012-01-30 Thread Alexey Dobriyan
On 1/28/12, Herbert Xu herb...@gondor.apana.org.au wrote:
 On Fri, Jan 27, 2012 at 08:51:30PM +0300, Alexey Dobriyan wrote:

 I think this is because your tree contained %16 code instead of & 15.
 Now that it contains & 15 it should become applicable.

 OK.

 --
 [PATCH] sha512: reduce stack usage even on i386

 Can you try the approach that git takes with using asm to read
 and write W (see previous email from Linus in response to my push
 request)? As it stands your patch is simply relying on gcc's
 ability to optimise.  At least with asm volatile we know that
 gcc will leave it alone.

For some reason it doesn't. :-( I've also tried full barriers.

With this patch, stack usage is still ~900 bytes.

diff --git a/crypto/sha512_generic.c b/crypto/sha512_generic.c
index dd0439d..35e7ae7 100644
--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -66,16 +66,6 @@ static const u64 sha512_K[80] = {
 #define s0(x)   (ror64(x, 1) ^ ror64(x, 8) ^ (x >> 7))
 #define s1(x)   (ror64(x,19) ^ ror64(x,61) ^ (x >> 6))

-static inline void LOAD_OP(int I, u64 *W, const u8 *input)
-{
-   W[I] = __be64_to_cpu( ((__be64*)(input))[I] );
-}
-
-static inline void BLEND_OP(int I, u64 *W)
-{
-   W[I & 15] += s1(W[(I-2) & 15]) + W[(I-7) & 15] + s0(W[(I-15) & 15]);
-}
-
 static void
 sha512_transform(u64 *state, const u8 *input)
 {
@@ -84,26 +74,29 @@ sha512_transform(u64 *state, const u8 *input)
int i;
u64 W[16];

-   /* load the input */
-for (i = 0; i < 16; i++)
-LOAD_OP(i, W, input);
-
/* load the state into our registers */
a=state[0];   b=state[1];   c=state[2];   d=state[3];
e=state[4];   f=state[5];   g=state[6];   h=state[7];

 #define SHA512_0_15(i, a, b, c, d, e, f, g, h) \
-   t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i];  \
+   {   \
+   u64 tmp = be64_to_cpu(*((__be64 *)input + (i)));\
+   *(volatile u64 *)&W[i] = tmp;  \
+   t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + tmp;   \
t2 = e0(a) + Maj(a, b, c);  \
d += t1;\
-   h = t1 + t2
+   h = t1 + t2;\
+   }

 #define SHA512_16_79(i, a, b, c, d, e, f, g, h)\
-   BLEND_OP(i, W); \
-   t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[(i) & 15];\
+   {   \
+   u64 tmp = W[(i) & 15] + s1(W[(i-2) & 15]) + W[(i-7) & 15] + s0(W[(i-15) & 15]); \
+   *(volatile u64 *)&W[(i) & 15] = tmp;\
+   t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + tmp;   \
t2 = e0(a) + Maj(a, b, c);  \
d += t1;\
-   h = t1 + t2
+   h = t1 + t2;\
+   }

 for (i = 0; i < 16; i += 8) {
SHA512_0_15(i, a, b, c, d, e, f, g, h);


Re: [PATCH 4/3] sha512: reduce stack usage even on i386

2012-01-27 Thread Alexey Dobriyan
On Thu, Jan 26, 2012 at 01:35:02PM +1100, Herbert Xu wrote:
 On Wed, Jan 18, 2012 at 09:02:10PM +0300, Alexey Dobriyan wrote:
  Fix still excessive stack usage on i386.
  
  There is too much loop unrolling going on; despite W[16] being used,
  gcc screws this up for some reason. So don't be smart: use the simple
  code from the SHA-512 definition. This keeps code size _and_ stack
  usage back under control even on i386:
  
  -14b:   81 ec 9c 03 00 00      sub    $0x39c,%esp
  +149:   81 ec 64 01 00 00      sub    $0x164,%esp
  
  $ size ../sha512_generic-i386-00*
      text  data  bss    dec   hex  filename
     15521   712    0  16233  3f69  ../sha512_generic-i386-000.o
      4225   712    0   4937  1349  ../sha512_generic-i386-001.o
  
  Signed-off-by: Alexey Dobriyan adobri...@gmail.com
  Cc: sta...@vger.kernel.org
 
 Hmm, your patch doesn't apply against my crypto tree.  Please
 regenerate.

I think this is because your tree contained %16 code instead of & 15.
Now that it contains & 15 it should become applicable.

Anyway.
--
[PATCH] sha512: reduce stack usage even on i386

Fix still excessive stack usage on i386.

There is too much loop unrolling going on; despite W[16] being used,
gcc screws this up for some reason. So don't be smart: use the simple
code from the SHA-512 definition. This keeps code size _and_ stack
usage back under control even on i386:

-14b:   81 ec 9c 03 00 00      sub    $0x39c,%esp
+149:   81 ec 64 01 00 00      sub    $0x164,%esp

$ size ../sha512_generic-i386-00*
    text  data  bss    dec   hex  filename
   15521   712    0  16233  3f69  ../sha512_generic-i386-000.o
    4225   712    0   4937  1349  ../sha512_generic-i386-001.o

Signed-off-by: Alexey Dobriyan adobri...@gmail.com
Cc: sta...@vger.kernel.org
---

 crypto/sha512_generic.c |   42 --
 1 file changed, 20 insertions(+), 22 deletions(-)

--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -100,35 +100,33 @@ sha512_transform(u64 *state, const u8 *input)
 #define SHA512_0_15(i, a, b, c, d, e, f, g, h) \
t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i];  \
t2 = e0(a) + Maj(a, b, c);  \
-   d += t1;\
-   h = t1 + t2
+   h = g;  \
+   g = f;  \
+   f = e;  \
+   e = d + t1; \
+   d = c;  \
+   c = b;  \
+   b = a;  \
+   a = t1 + t2
 
 #define SHA512_16_79(i, a, b, c, d, e, f, g, h)\
BLEND_OP(i, W); \
-   t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[(i) & 15]; \
+   t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i & 15]; \
t2 = e0(a) + Maj(a, b, c);  \
-   d += t1;\
-   h = t1 + t2
-
-   for (i = 0; i < 16; i += 8) {
+   h = g;  \
+   g = f;  \
+   f = e;  \
+   e = d + t1; \
+   d = c;  \
+   c = b;  \
+   b = a;  \
+   a = t1 + t2
+
+   for (i = 0; i < 16; i++) {
SHA512_0_15(i, a, b, c, d, e, f, g, h);
-   SHA512_0_15(i + 1, h, a, b, c, d, e, f, g);
-   SHA512_0_15(i + 2, g, h, a, b, c, d, e, f);
-   SHA512_0_15(i + 3, f, g, h, a, b, c, d, e);
-   SHA512_0_15(i + 4, e, f, g, h, a, b, c, d);
-   SHA512_0_15(i + 5, d, e, f, g, h, a, b, c);
-   SHA512_0_15(i + 6, c, d, e, f, g, h, a, b);
-   SHA512_0_15(i + 7, b, c, d, e, f, g, h, a);
}
-   for (i = 16; i < 80; i += 8) {
+   for (i = 16; i < 80; i++) {
SHA512_16_79(i, a, b, c, d, e, f, g, h);
-   SHA512_16_79(i + 1, h, a, b, c, d, e, f, g);
-   SHA512_16_79(i + 2, g, h, a, b, c, d, e, f);
-   SHA512_16_79(i + 3, f, g, h, a, b, c, d, e);
-   SHA512_16_79(i + 4, e, f, g, h, a, b, c, d);
-   SHA512_16_79(i + 5, d, e, f, g, h, a, b, c);
-   SHA512_16_79(i + 6, c, d, e, f, g, h, a, b);
-   SHA512_16_79(i + 7, b, c, d, e, f, g, h

[PATCH 4/3] sha512: reduce stack usage even on i386

2012-01-18 Thread Alexey Dobriyan
Fix still excessive stack usage on i386.

There is too much loop unrolling going on; despite W[16] being used,
gcc screws this up for some reason. So don't be smart: use the simple
code from the SHA-512 definition. This keeps code size _and_ stack
usage back under control even on i386:

-14b:   81 ec 9c 03 00 00      sub    $0x39c,%esp
+149:   81 ec 64 01 00 00      sub    $0x164,%esp

$ size ../sha512_generic-i386-00*
    text  data  bss    dec   hex  filename
   15521   712    0  16233  3f69  ../sha512_generic-i386-000.o
    4225   712    0   4937  1349  ../sha512_generic-i386-001.o

Signed-off-by: Alexey Dobriyan adobri...@gmail.com
Cc: sta...@vger.kernel.org
---

 crypto/sha512_generic.c |   42 --
 1 file changed, 20 insertions(+), 22 deletions(-)

--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -95,35 +95,33 @@ sha512_transform(u64 *state, const u8 *input)
 #define SHA512_0_15(i, a, b, c, d, e, f, g, h) \
t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i];  \
t2 = e0(a) + Maj(a, b, c);  \
-   d += t1;\
-   h = t1 + t2
+   h = g;  \
+   g = f;  \
+   f = e;  \
+   e = d + t1; \
+   d = c;  \
+   c = b;  \
+   b = a;  \
+   a = t1 + t2
 
 #define SHA512_16_79(i, a, b, c, d, e, f, g, h)\
BLEND_OP(i, W); \
-   t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[(i) & 15]; \
+   t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i & 15]; \
t2 = e0(a) + Maj(a, b, c);  \
-   d += t1;\
-   h = t1 + t2
-
-   for (i = 0; i < 16; i += 8) {
+   h = g;  \
+   g = f;  \
+   f = e;  \
+   e = d + t1; \
+   d = c;  \
+   c = b;  \
+   b = a;  \
+   a = t1 + t2
+
+   for (i = 0; i < 16; i++) {
SHA512_0_15(i, a, b, c, d, e, f, g, h);
-   SHA512_0_15(i + 1, h, a, b, c, d, e, f, g);
-   SHA512_0_15(i + 2, g, h, a, b, c, d, e, f);
-   SHA512_0_15(i + 3, f, g, h, a, b, c, d, e);
-   SHA512_0_15(i + 4, e, f, g, h, a, b, c, d);
-   SHA512_0_15(i + 5, d, e, f, g, h, a, b, c);
-   SHA512_0_15(i + 6, c, d, e, f, g, h, a, b);
-   SHA512_0_15(i + 7, b, c, d, e, f, g, h, a);
}
-   for (i = 16; i < 80; i += 8) {
+   for (i = 16; i < 80; i++) {
SHA512_16_79(i, a, b, c, d, e, f, g, h);
-   SHA512_16_79(i + 1, h, a, b, c, d, e, f, g);
-   SHA512_16_79(i + 2, g, h, a, b, c, d, e, f);
-   SHA512_16_79(i + 3, f, g, h, a, b, c, d, e);
-   SHA512_16_79(i + 4, e, f, g, h, a, b, c, d);
-   SHA512_16_79(i + 5, d, e, f, g, h, a, b, c);
-   SHA512_16_79(i + 6, c, d, e, f, g, h, a, b);
-   SHA512_16_79(i + 7, b, c, d, e, f, g, h, a);
}
 
state[0] += a; state[1] += b; state[2] += c; state[3] += d;


Re: [PATCH 2/3] sha512: reduce stack usage to safe number

2012-01-17 Thread Alexey Dobriyan
On 1/16/12, Eric Dumazet eric.duma...@gmail.com wrote:
 Le lundi 16 janvier 2012 à 09:56 +, David Laight a écrit :
 Doesn't this badly overflow W[] ..

  +#define SHA512_0_15(i, a, b, c, d, e, f, g, h) \
  +  t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i];  \
 ...
  +  for (i = 0; i < 16; i += 8) {
 ...
  +  SHA512_0_15(i + 7, b, c, d, e, f, g, h, a);
  +  }

  David



 No overflow since loop is done for only i=0 and i=8

 By the way, I suspect previous code was chosen years ago because this
 version uses less stack but adds much more code bloat.

I think W[80] was used because it's the most straightforward way to
write this code by following the spec.

All SHA definitions have full message schedule pseudocoded
before hash computation.

 size crypto/sha512_generic.o crypto/sha512_generic_old.o
    text  data  bss    dec   hex  filename
   17369   704    0  18073  4699  crypto/sha512_generic.o
    8249   704    0   8953  22f9  crypto/sha512_generic_old.o

This is because SHA-512 is a fundamentally 64-bit algorithm, multiplied
by excessive unrolling.

Surprisingly, doing the variable renaming by hand like in the spec:
t1 = ...
t2 = ...
h = g;
g = f;
f = e;
e = d + t1;
d = c;
c = b;
b = a;
a = t1 + t2;

brings stack space on i386 under control too.
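The equivalence of the two styles can be checked with a small standalone sketch (the mix() below is a hypothetical stand-in for the real e0/e1/Ch/Maj arithmetic; only the a..h shuffle matches the spec):

```c
#include <stdint.h>

/* Toy round: mix() replaces the real e0/e1/Ch/Maj arithmetic (an
 * assumption for illustration); the a..h shuffle is the spec's. */
struct sha_regs { uint64_t a, b, c, d, e, f, g, h; };

static uint64_t mix(uint64_t x)
{
	return x * 0x9e3779b97f4a7c15ULL + 1;
}

/* Spec-style round: compute t1/t2, then rename every working variable. */
static void round_renamed(struct sha_regs *s, uint64_t w)
{
	uint64_t t1 = s->h + mix(s->e) + w;
	uint64_t t2 = mix(s->a);

	s->h = s->g; s->g = s->f; s->f = s->e;
	s->e = s->d + t1;
	s->d = s->c; s->c = s->b; s->b = s->a;
	s->a = t1 + t2;
}

/* Unrolled style: only d and h are written; the caller rotates the
 * argument order one slot per round, as the SHA512_* macros do. */
static void round_rotated(uint64_t *a, uint64_t *b, uint64_t *c,
			  uint64_t *d, uint64_t *e, uint64_t *f,
			  uint64_t *g, uint64_t *h, uint64_t w)
{
	uint64_t t1 = *h + mix(*e) + w;
	uint64_t t2 = mix(*a);

	(void)b; (void)c; (void)f; (void)g;	/* roles untouched this round */
	*d += t1;
	*h = t1 + t2;
}
```

Two renamed rounds leave the same logical state as two rotated-argument rounds; after each rotated round, the logical `a` lives in whichever variable was passed as `h`.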


Re: [PATCH 2/3] sha512: reduce stack usage to safe number

2012-01-16 Thread Alexey Dobriyan
On 1/16/12, David Laight david.lai...@aculab.com wrote:
 Doesn't this badly overflow W[] ..

 +#define SHA512_0_15(i, a, b, c, d, e, f, g, h) \
 +t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i];  \
 ...
  +for (i = 0; i < 16; i += 8) {
 ...
 +SHA512_0_15(i + 7, b, c, d, e, f, g, h, a);
 +}

No, why should it?
i can be only 0 and 8.


Re: sha512: make it work, undo percpu message schedule

2012-01-14 Thread Alexey Dobriyan
On Fri, Jan 13, 2012 at 01:34:13PM +0100, Eric Dumazet wrote:
 Le vendredi 13 janvier 2012 à 13:33 +0200, Alexey Dobriyan a écrit :
  On 1/13/12, Eric Dumazet eric.duma...@gmail.com wrote:
  
   + static u64 msg_schedule[80];
   + static DEFINE_SPINLOCK(msg_schedule_lock);
  
  No guys, no.
  
   SHA-512 only needs a u64[16] running window for message scheduling.
  
  I'm sending whitespace mangled patch which is only tested
  with selfcryptotest passed, so you won't apply something complex.
  
  Stack usage drops down to this:
  
  -139:   48 81 ec c8 02 00 00   sub    $0x2c8,%rsp
  +136:   48 81 ec 18 01 00 00   sub    $0x118,%rsp
  
  --- a/crypto/sha512_generic.c
  +++ b/crypto/sha512_generic.c
  @@ -21,8 +21,6 @@
    #include <linux/percpu.h>
    #include <asm/byteorder.h>
  
  -static DEFINE_PER_CPU(u64[80], msg_schedule);
  -
   static inline u64 Ch(u64 x, u64 y, u64 z)
   {
    return z ^ (x & (y ^ z));
  @@ -80,7 +78,7 @@ static inline void LOAD_OP(int I, u64 *W, const u8 *input)
  
   static inline void BLEND_OP(int I, u64 *W)
   {
  -   W[I] = s1(W[I-2]) + W[I-7] + s0(W[I-15]) + W[I-16];
  +   W[I%16] = s1(W[(I-2)%16]) + W[(I-7)%16] + s0(W[(I-15)%16]) + W[I%16];
   }
  
   static void
  @@ -89,38 +87,48 @@ sha512_transform(u64 *state, const u8 *input)
  u64 a, b, c, d, e, f, g, h, t1, t2;
  
  int i;
  -   u64 *W = get_cpu_var(msg_schedule);
  +   u64 W[16];
  
  /* load the input */
    for (i = 0; i < 16; i++)
   LOAD_OP(i, W, input);
  
   -for (i = 16; i < 80; i++) {
  -BLEND_OP(i, W);
  -}
  -
  /* load the state into our registers */
  a=state[0];   b=state[1];   c=state[2];   d=state[3];
  e=state[4];   f=state[5];   g=state[6];   h=state[7];
  
  -   /* now iterate */
   -   for (i=0; i<80; i+=8) {
  -   t1 = h + e1(e) + Ch(e,f,g) + sha512_K[i  ] + W[i  ];
  -   t2 = e0(a) + Maj(a,b,c);d+=t1;h=t1+t2;
  -   t1 = g + e1(d) + Ch(d,e,f) + sha512_K[i+1] + W[i+1];
  -   t2 = e0(h) + Maj(h,a,b);c+=t1;g=t1+t2;
  -   t1 = f + e1(c) + Ch(c,d,e) + sha512_K[i+2] + W[i+2];
  -   t2 = e0(g) + Maj(g,h,a);b+=t1;f=t1+t2;
  -   t1 = e + e1(b) + Ch(b,c,d) + sha512_K[i+3] + W[i+3];
  -   t2 = e0(f) + Maj(f,g,h);a+=t1;e=t1+t2;
  -   t1 = d + e1(a) + Ch(a,b,c) + sha512_K[i+4] + W[i+4];
  -   t2 = e0(e) + Maj(e,f,g);h+=t1;d=t1+t2;
  -   t1 = c + e1(h) + Ch(h,a,b) + sha512_K[i+5] + W[i+5];
  -   t2 = e0(d) + Maj(d,e,f);g+=t1;c=t1+t2;
  -   t1 = b + e1(g) + Ch(g,h,a) + sha512_K[i+6] + W[i+6];
  -   t2 = e0(c) + Maj(c,d,e);f+=t1;b=t1+t2;
  -   t1 = a + e1(f) + Ch(f,g,h) + sha512_K[i+7] + W[i+7];
  -   t2 = e0(b) + Maj(b,c,d);e+=t1;a=t1+t2;
  +#define SHA512_0_15(i, a, b, c, d, e, f, g, h) \
  +   t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i];  \
  +   t2 = e0(a) + Maj(a,b,c);\
  +   d += t1;\
  +   h = t1 + t2
  +
  +#define SHA512_16_79(i, a, b, c, d, e, f, g, h)\
  +   BLEND_OP(i, W); \
  +   t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[(i)%16]; \
  +   t2 = e0(a) + Maj(a,b,c);\
  +   d += t1;\
  +   h = t1 + t2
  +
   +   for (i = 0; i < 16; i += 8) {
  +   SHA512_0_15(i, a, b, c, d, e, f, g, h);
  +   SHA512_0_15(i + 1, h, a, b, c, d, e, f, g);
  +   SHA512_0_15(i + 2, g, h, a, b, c, d, e, f);
  +   SHA512_0_15(i + 3, f, g, h, a, b, c, d, e);
  +   SHA512_0_15(i + 4, e, f, g, h, a, b, c, d);
  +   SHA512_0_15(i + 5, d, e, f, g, h, a, b, c);
  +   SHA512_0_15(i + 6, c, d, e, f, g, h, a, b);
  +   SHA512_0_15(i + 7, b, c, d, e, f, g, h, a);
  +   }
   +   for (i = 16; i < 80; i += 8) {
  +   SHA512_16_79(i, a, b, c, d, e, f, g, h);
  +   SHA512_16_79(i + 1, h, a, b, c, d, e, f, g);
  +   SHA512_16_79(i + 2, g, h, a, b, c, d, e, f);
  +   SHA512_16_79(i + 3, f, g, h, a, b, c, d, e);
  +   SHA512_16_79(i + 4, e, f, g, h, a, b, c, d);
  +   SHA512_16_79(i + 5, d, e, f, g, h, a, b, c);
  +   SHA512_16_79(i + 6, c, d, e, f, g, h, a, b);
  +   SHA512_16_79(i + 7, b, c, d, e, f, g, h, a);
  }
  
  state[0] += a; state[1] += b; state[2] += c; state[3] += d;
  @@ -128,8 +136,6 @@ sha512_transform(u64 *state, const u8 *input)
  
  /* erase our data */
  a = b = c = d = e = f = g = h = t1 = t2 = 0;
  -   memset(W, 0, sizeof(__get_cpu_var(msg_schedule)));
  -   put_cpu_var(msg_schedule);
   }
  
   static int
 
 
 Even if its true, its not stable material.
 
 stable teams want obvious patches.

I understand that.

But it _is_ obvious if you see what macro

[PATCH 1/3] sha512: make it work, undo percpu message schedule

2012-01-14 Thread Alexey Dobriyan
commit f9e2bca6c22d75a289a349f869701214d63b5060
("crypto: sha512 - Move message schedule W[80] to static percpu area")
created a global message schedule area.

If sha512_update() is ever entered twice concurrently, the hash will be
silently calculated incorrectly.

Probably the easiest way to notice incorrect hashes being calculated is
to run 2 ping floods over AH with hmac(sha512):

#!/usr/sbin/setkey -f
flush;
spdflush;
add IP1 IP2 ah 25 -A hmac-sha512 
0x0025;
add IP2 IP1 ah 52 -A hmac-sha512 
0x0052;
spdadd IP1 IP2 any -P out ipsec ah/transport//require;
spdadd IP2 IP1 any -P in  ipsec ah/transport//require;

XfrmInStateProtoError will start ticking with -EBADMSG being returned
from ah_input(). This never happens with, say, hmac(sha1).

With patch applied (on BOTH sides), XfrmInStateProtoError does not tick
with multiple bidirectional ping flood streams like it doesn't tick
with SHA-1.

After this patch sha512_transform() will start using ~750 bytes of stack
on x86_64. This is OK for simple loads; for something more heavy, stack
reduction will be done separately.

Signed-off-by: Alexey Dobriyan adobri...@gmail.com
Cc: sta...@vger.kernel.org
---

 crypto/sha512_generic.c |6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -21,8 +21,6 @@
 #include <linux/percpu.h>
 #include <asm/byteorder.h>
 
-static DEFINE_PER_CPU(u64[80], msg_schedule);
-
 static inline u64 Ch(u64 x, u64 y, u64 z)
 {
 return z ^ (x & (y ^ z));
@@ -89,7 +87,7 @@ sha512_transform(u64 *state, const u8 *input)
u64 a, b, c, d, e, f, g, h, t1, t2;
 
int i;
-   u64 *W = get_cpu_var(msg_schedule);
+   u64 W[80];
 
/* load the input */
 for (i = 0; i < 16; i++)
@@ -128,8 +126,6 @@ sha512_transform(u64 *state, const u8 *input)
 
/* erase our data */
a = b = c = d = e = f = g = h = t1 = t2 = 0;
-   memset(W, 0, sizeof(__get_cpu_var(msg_schedule)));
-   put_cpu_var(msg_schedule);
 }
 
 static int


[PATCH 3/3] sha512: use standard ror64()

2012-01-14 Thread Alexey Dobriyan
Use the standard ror64() instead of a hand-written one.
There is no standard ror64() yet, so create it.

The difference is the shift value being unsigned int instead of uint64_t
(for which there is no reason). gcc then starts to emit native ROR
instructions, which it currently doesn't for some reason. This should
make the code faster.

Patch survives in-tree crypto test and ping flood with hmac(sha512) on.
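A minimal standalone sketch of the two helpers this patch adds (shift count deliberately `unsigned int`; valid for shifts 1..63, since a shift of 0 would make the complementary 64-bit shift undefined):

```c
#include <stdint.h>

/* Rotate right: valid for shift in 1..63. */
static inline uint64_t ror64(uint64_t word, unsigned int shift)
{
	return (word >> shift) | (word << (64 - shift));
}

/* Rotate left: valid for shift in 1..63. */
static inline uint64_t rol64(uint64_t word, unsigned int shift)
{
	return (word << shift) | (word >> (64 - shift));
}
```

With a constant unsigned shift, gcc recognises this pattern and can emit a single rotate instruction on targets that have one.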

Signed-off-by: Alexey Dobriyan adobri...@gmail.com
---

 crypto/sha512_generic.c |   13 -
 include/linux/bitops.h  |   20 
 2 files changed, 24 insertions(+), 9 deletions(-)

--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -31,11 +31,6 @@ static inline u64 Maj(u64 x, u64 y, u64 z)
 return (x & y) | (z & (x | y));
 }
 
-static inline u64 RORu64(u64 x, u64 y)
-{
-return (x >> y) | (x << (64 - y));
-}
-
 static const u64 sha512_K[80] = {
 0x428a2f98d728ae22ULL, 0x7137449123ef65cdULL, 0xb5c0fbcfec4d3b2fULL,
 0xe9b5dba58189dbbcULL, 0x3956c25bf348b538ULL, 0x59f111f1b605d019ULL,
@@ -66,10 +61,10 @@ static const u64 sha512_K[80] = {
 0x5fcb6fab3ad6faecULL, 0x6c44198c4a475817ULL,
 };
 
-#define e0(x)   (RORu64(x,28) ^ RORu64(x,34) ^ RORu64(x,39))
-#define e1(x)   (RORu64(x,14) ^ RORu64(x,18) ^ RORu64(x,41))
-#define s0(x)   (RORu64(x, 1) ^ RORu64(x, 8) ^ (x >> 7))
-#define s1(x)   (RORu64(x,19) ^ RORu64(x,61) ^ (x >> 6))
+#define e0(x)   (ror64(x,28) ^ ror64(x,34) ^ ror64(x,39))
+#define e1(x)   (ror64(x,14) ^ ror64(x,18) ^ ror64(x,41))
+#define s0(x)   (ror64(x, 1) ^ ror64(x, 8) ^ (x >> 7))
+#define s1(x)   (ror64(x,19) ^ ror64(x,61) ^ (x >> 6))
 
 static inline void LOAD_OP(int I, u64 *W, const u8 *input)
 {
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -56,6 +56,26 @@ static inline unsigned long hweight_long(unsigned long w)
 }
 
 /**
+ * rol64 - rotate a 64-bit value left
+ * @word: value to rotate
+ * @shift: bits to roll
+ */
+static inline __u64 rol64(__u64 word, unsigned int shift)
+{
+   return (word << shift) | (word >> (64 - shift));
+}
+
+/**
+ * ror64 - rotate a 64-bit value right
+ * @word: value to rotate
+ * @shift: bits to roll
+ */
+static inline __u64 ror64(__u64 word, unsigned int shift)
+{
+   return (word >> shift) | (word << (64 - shift));
+}
+
+/**
  * rol32 - rotate a 32-bit value left
  * @word: value to rotate
  * @shift: bits to roll


[PATCH 2/3] sha512: reduce stack usage to safe number

2012-01-14 Thread Alexey Dobriyan
For rounds 16--79, W[i] only depends on W[i - 2], W[i - 7], W[i - 15]
and W[i - 16]. Consequently, keeping the whole W[80] array on stack is
unnecessary; only 16 values are really needed.

Using W[16] instead of W[80] greatly reduces stack usage
(~750 bytes to ~340 bytes on x86_64).

Line by line explanation:
* BLEND_OP
  array is circular now, all indexes have to be modulo 16.
  Round number is positive, so remainder operation should be
  without surprises.

* initial full message scheduling is trimmed to first 16 values which
  come from data block, the rest is calculated before it's needed.

* original loop body is unrolled version of new SHA512_0_15 and
  SHA512_16_79 macros, unrolling was done to not do explicit variable
  renaming. Otherwise it's the very same code after preprocessing.
  See sha1_transform() code which does the same trick.
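The circular window can be sanity-checked against the full spec schedule with a standalone sketch (the mask form `& 15` is used here; for non-negative round numbers it is equivalent to `% 16`):

```c
#include <stdint.h>

static inline uint64_t ror64(uint64_t w, unsigned int s)
{
	return (w >> s) | (w << (64 - s));
}

#define s0(x) (ror64((x), 1) ^ ror64((x), 8) ^ ((x) >> 7))
#define s1(x) (ror64((x), 19) ^ ror64((x), 61) ^ ((x) >> 6))

/* Full 80-entry message schedule, straight from FIPS 180-2. */
static void schedule_full(uint64_t W[80])
{
	int i;

	for (i = 16; i < 80; i++)
		W[i] = s1(W[i - 2]) + W[i - 7] + s0(W[i - 15]) + W[i - 16];
}

/* Circular 16-entry window: at round i, slot i & 15 still holds
 * W[i - 16], so "+=" folds that term in for free. */
static uint64_t blend(int i, uint64_t W[16])
{
	W[i & 15] += s1(W[(i - 2) & 15]) + W[(i - 7) & 15] +
		     s0(W[(i - 15) & 15]);
	return W[i & 15];
}
```

Seeded with the same 16 input words, blend(i, ...) reproduces W[i] of the full schedule for every i in 16..79.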

Patch survives in-tree crypto test and original bugreport test
(ping flood with hmac(sha512)).

See FIPS 180-2 for SHA-512 definition
http://csrc.nist.gov/publications/fips/fips180-2/fips180-2withchangenotice.pdf

Signed-off-by: Alexey Dobriyan adobri...@gmail.com
Cc: sta...@vger.kernel.org
---

 This patch is for stable if 750-byte stack usage is not
 considered safe.

 crypto/sha512_generic.c |   58 
 1 file changed, 34 insertions(+), 24 deletions(-)

--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -78,7 +78,7 @@ static inline void LOAD_OP(int I, u64 *W, const u8 *input)
 
 static inline void BLEND_OP(int I, u64 *W)
 {
-   W[I] = s1(W[I-2]) + W[I-7] + s0(W[I-15]) + W[I-16];
+   W[I % 16] += s1(W[(I-2) % 16]) + W[(I-7) % 16] + s0(W[(I-15) % 16]);
 }
 
 static void
@@ -87,38 +87,48 @@ sha512_transform(u64 *state, const u8 *input)
u64 a, b, c, d, e, f, g, h, t1, t2;
 
int i;
-   u64 W[80];
+   u64 W[16];
 
/* load the input */
 for (i = 0; i < 16; i++)
 LOAD_OP(i, W, input);
 
-for (i = 16; i < 80; i++) {
-BLEND_OP(i, W);
-}
-
/* load the state into our registers */
a=state[0];   b=state[1];   c=state[2];   d=state[3];
e=state[4];   f=state[5];   g=state[6];   h=state[7];
 
-   /* now iterate */
-   for (i=0; i<80; i+=8) {
-   t1 = h + e1(e) + Ch(e,f,g) + sha512_K[i  ] + W[i  ];
-   t2 = e0(a) + Maj(a,b,c);d+=t1;h=t1+t2;
-   t1 = g + e1(d) + Ch(d,e,f) + sha512_K[i+1] + W[i+1];
-   t2 = e0(h) + Maj(h,a,b);c+=t1;g=t1+t2;
-   t1 = f + e1(c) + Ch(c,d,e) + sha512_K[i+2] + W[i+2];
-   t2 = e0(g) + Maj(g,h,a);b+=t1;f=t1+t2;
-   t1 = e + e1(b) + Ch(b,c,d) + sha512_K[i+3] + W[i+3];
-   t2 = e0(f) + Maj(f,g,h);a+=t1;e=t1+t2;
-   t1 = d + e1(a) + Ch(a,b,c) + sha512_K[i+4] + W[i+4];
-   t2 = e0(e) + Maj(e,f,g);h+=t1;d=t1+t2;
-   t1 = c + e1(h) + Ch(h,a,b) + sha512_K[i+5] + W[i+5];
-   t2 = e0(d) + Maj(d,e,f);g+=t1;c=t1+t2;
-   t1 = b + e1(g) + Ch(g,h,a) + sha512_K[i+6] + W[i+6];
-   t2 = e0(c) + Maj(c,d,e);f+=t1;b=t1+t2;
-   t1 = a + e1(f) + Ch(f,g,h) + sha512_K[i+7] + W[i+7];
-   t2 = e0(b) + Maj(b,c,d);e+=t1;a=t1+t2;
+#define SHA512_0_15(i, a, b, c, d, e, f, g, h) \
+   t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i];  \
+   t2 = e0(a) + Maj(a, b, c);  \
+   d += t1;\
+   h = t1 + t2
+
+#define SHA512_16_79(i, a, b, c, d, e, f, g, h)\
+   BLEND_OP(i, W); \
+   t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[(i)%16]; \
+   t2 = e0(a) + Maj(a, b, c);  \
+   d += t1;\
+   h = t1 + t2
+
+   for (i = 0; i < 16; i += 8) {
+   SHA512_0_15(i, a, b, c, d, e, f, g, h);
+   SHA512_0_15(i + 1, h, a, b, c, d, e, f, g);
+   SHA512_0_15(i + 2, g, h, a, b, c, d, e, f);
+   SHA512_0_15(i + 3, f, g, h, a, b, c, d, e);
+   SHA512_0_15(i + 4, e, f, g, h, a, b, c, d);
+   SHA512_0_15(i + 5, d, e, f, g, h, a, b, c);
+   SHA512_0_15(i + 6, c, d, e, f, g, h, a, b);
+   SHA512_0_15(i + 7, b, c, d, e, f, g, h, a);
+   }
+   for (i = 16; i < 80; i += 8) {
+   SHA512_16_79(i, a, b, c, d, e, f, g, h);
+   SHA512_16_79(i + 1, h, a, b, c, d, e, f, g);
+   SHA512_16_79(i + 2, g, h, a, b, c, d, e, f);
+   SHA512_16_79(i + 3, f, g, h, a, b, c, d, e);
+   SHA512_16_79(i + 4, e, f, g, h, a, b, c, d);
+   SHA512_16_79(i + 5, d, e, f, g, h, a, b, c);
+   SHA512_16_79(i + 6, c, d, e, f, g, h, a, b

Re: [PATCH 2/3] sha512: reduce stack usage to safe number

2012-01-14 Thread Alexey Dobriyan
On Sat, Jan 14, 2012 at 11:08:45AM -0800, Linus Torvalds wrote:
 On Sat, Jan 14, 2012 at 10:40 AM, Alexey Dobriyan adobri...@gmail.com wrote:
 
  Line by line explanation:
  * BLEND_OP
   array is circular now, all indexes have to be modulo 16.
   Round number is positive, so remainder operation should be
   without surprises.
 
 Don't use % except on unsigned values. Even if it's positive, if
 it's a signed number and the compiler doesn't *see* that it is
 absolutely positive, division is nontrivial. Even when you divide by a
 constant.
 
 For example, % 16 on an 'int' on x86-64 will generate
 
    movl    %edi, %edx
    sarl    $31, %edx
    shrl    $28, %edx
    leal    (%rdi,%rdx), %eax
    andl    $15, %eax
    subl    %edx, %eax
 
 in order to get the signed case right. The fact that the end result is
 correct for unsigned numbers is irrelevant: it's still stupid and
 slow.
 
 With an unsigned int, '% 16' will generate the obvious
 
    andl    $15, %eax
 
 instead.
 
 Quite frankly, stop using division in the first place. Dividing by
 powers-of-two and expecting the compiler to fix things up is just
 stupid, *exactly* because of issues like these: you either have to
 think about it carefully, or the compiler may end up creating crap
 code.

For the record, it generates andl $15 here.

 So just use & 15 instead. That doesn't have these kinds of issues.
 It is a *good* thing when the C code is close to the end result you
 want to generate. It is *not* a good thing to write code that looks
 nothing like the end result and just expect the compiler to do the
 right thing. Even if the compiler does do the right thing, what was
 the advantage?
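The semantic reason the compiler cannot always fold a signed `% 16` into a mask can be shown with a small sketch (not from the thread; C's `%` truncates toward zero, so it disagrees with `& 15` on negative values):

```c
/* Signed % rounds toward zero: (-2) % 16 is -2 in C, not 14, so gcc
 * may only emit a plain "and" when it can prove the operand is
 * non-negative. */
static int mod16_signed(int i)
{
	return i % 16;
}

static unsigned int mod16_unsigned(unsigned int i)
{
	return i % 16;	/* always reducible to i & 15 */
}

static int and15(int i)
{
	return i & 15;	/* a single mask, whatever the sign */
}
```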

Here is updated patch which explicitly uses & (equally tested):
---

For rounds 16--79, W[i] only depends on W[i - 2], W[i - 7], W[i - 15]
and W[i - 16]. Consequently, keeping the whole W[80] array on stack is
unnecessary, only 16 values are really needed.

Using W[16] instead of W[80] greatly reduces stack usage
(~750 bytes to ~340 bytes on x86_64).

Line by line explanation:
* BLEND_OP
  array is circular now, all indexes have to be modulo 16.

* initial full message scheduling is trimmed to first 16 values which
  come from data block, the rest is calculated right before it's needed.

* original loop body is unrolled version of new SHA512_0_15 and
  SHA512_16_79 macros, unrolling was done to not do explicit variable
  renaming. Otherwise it's the very same code after preprocessing.
  See sha1_transform() code which does the same trick.

Patch survives in-tree crypto test and original bugreport test
(ping flood with hmac(sha512)).

See FIPS 180-2 for SHA-512 definition
http://csrc.nist.gov/publications/fips/fips180-2/fips180-2withchangenotice.pdf

Signed-off-by: Alexey Dobriyan adobri...@gmail.com
Cc: sta...@vger.kernel.org
---

 This patch is for stable if 750-byte stack usage is not
 considered safe.

 crypto/sha512_generic.c |   58 
 1 file changed, 34 insertions(+), 24 deletions(-)

--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -78,7 +78,7 @@ static inline void LOAD_OP(int I, u64 *W, const u8 *input)
 
 static inline void BLEND_OP(int I, u64 *W)
 {
-   W[I] = s1(W[I-2]) + W[I-7] + s0(W[I-15]) + W[I-16];
+   W[I & 15] += s1(W[(I-2) & 15]) + W[(I-7) & 15] + s0(W[(I-15) & 15]);
 }
 
 static void
@@ -87,38 +87,48 @@ sha512_transform(u64 *state, const u8 *input)
u64 a, b, c, d, e, f, g, h, t1, t2;
 
int i;
-   u64 W[80];
+   u64 W[16];
 
/* load the input */
 for (i = 0; i < 16; i++)
 LOAD_OP(i, W, input);
 
-for (i = 16; i < 80; i++) {
-BLEND_OP(i, W);
-}
-
/* load the state into our registers */
a=state[0];   b=state[1];   c=state[2];   d=state[3];
e=state[4];   f=state[5];   g=state[6];   h=state[7];
 
-   /* now iterate */
-   for (i=0; i<80; i+=8) {
-   t1 = h + e1(e) + Ch(e,f,g) + sha512_K[i  ] + W[i  ];
-   t2 = e0(a) + Maj(a,b,c);d+=t1;h=t1+t2;
-   t1 = g + e1(d) + Ch(d,e,f) + sha512_K[i+1] + W[i+1];
-   t2 = e0(h) + Maj(h,a,b);c+=t1;g=t1+t2;
-   t1 = f + e1(c) + Ch(c,d,e) + sha512_K[i+2] + W[i+2];
-   t2 = e0(g) + Maj(g,h,a);b+=t1;f=t1+t2;
-   t1 = e + e1(b) + Ch(b,c,d) + sha512_K[i+3] + W[i+3];
-   t2 = e0(f) + Maj(f,g,h);a+=t1;e=t1+t2;
-   t1 = d + e1(a) + Ch(a,b,c) + sha512_K[i+4] + W[i+4];
-   t2 = e0(e) + Maj(e,f,g);h+=t1;d=t1+t2;
-   t1 = c + e1(h) + Ch(h,a,b) + sha512_K[i+5] + W[i+5];
-   t2 = e0(d) + Maj(d,e,f);g+=t1;c=t1+t2;
-   t1 = b + e1(g) + Ch(g,h,a) + sha512_K[i+6] + W[i+6];
-   t2 = e0(c) + Maj(c,d,e);f+=t1;b=t1+t2;
-   t1 = a + e1(f) + Ch(f,g,h

Re: sha512: make it work, undo percpu message schedule

2012-01-13 Thread Alexey Dobriyan
On 1/13/12, Eric Dumazet eric.duma...@gmail.com wrote:

 + static u64 msg_schedule[80];
 + static DEFINE_SPINLOCK(msg_schedule_lock);

No guys, no.

SHA-512 only needs a u64[16] running window for message scheduling.

I'm sending whitespace mangled patch which is only tested
with selfcryptotest passed, so you won't apply something complex.

Stack usage drops down to this:

-139:   48 81 ec c8 02 00 00   sub    $0x2c8,%rsp
+136:   48 81 ec 18 01 00 00   sub    $0x118,%rsp

--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -21,8 +21,6 @@
 #include <linux/percpu.h>
 #include <asm/byteorder.h>

-static DEFINE_PER_CPU(u64[80], msg_schedule);
-
 static inline u64 Ch(u64 x, u64 y, u64 z)
 {
 return z ^ (x & (y ^ z));
@@ -80,7 +78,7 @@ static inline void LOAD_OP(int I, u64 *W, const u8 *input)

 static inline void BLEND_OP(int I, u64 *W)
 {
-   W[I] = s1(W[I-2]) + W[I-7] + s0(W[I-15]) + W[I-16];
+   W[I%16] = s1(W[(I-2)%16]) + W[(I-7)%16] + s0(W[(I-15)%16]) + W[I%16];
 }

 static void
@@ -89,38 +87,48 @@ sha512_transform(u64 *state, const u8 *input)
u64 a, b, c, d, e, f, g, h, t1, t2;

int i;
-   u64 *W = get_cpu_var(msg_schedule);
+   u64 W[16];

/* load the input */
 for (i = 0; i < 16; i++)
 LOAD_OP(i, W, input);

-for (i = 16; i < 80; i++) {
-BLEND_OP(i, W);
-}
-
/* load the state into our registers */
a=state[0];   b=state[1];   c=state[2];   d=state[3];
e=state[4];   f=state[5];   g=state[6];   h=state[7];

-   /* now iterate */
-   for (i=0; i<80; i+=8) {
-   t1 = h + e1(e) + Ch(e,f,g) + sha512_K[i  ] + W[i  ];
-   t2 = e0(a) + Maj(a,b,c);d+=t1;h=t1+t2;
-   t1 = g + e1(d) + Ch(d,e,f) + sha512_K[i+1] + W[i+1];
-   t2 = e0(h) + Maj(h,a,b);c+=t1;g=t1+t2;
-   t1 = f + e1(c) + Ch(c,d,e) + sha512_K[i+2] + W[i+2];
-   t2 = e0(g) + Maj(g,h,a);b+=t1;f=t1+t2;
-   t1 = e + e1(b) + Ch(b,c,d) + sha512_K[i+3] + W[i+3];
-   t2 = e0(f) + Maj(f,g,h);a+=t1;e=t1+t2;
-   t1 = d + e1(a) + Ch(a,b,c) + sha512_K[i+4] + W[i+4];
-   t2 = e0(e) + Maj(e,f,g);h+=t1;d=t1+t2;
-   t1 = c + e1(h) + Ch(h,a,b) + sha512_K[i+5] + W[i+5];
-   t2 = e0(d) + Maj(d,e,f);g+=t1;c=t1+t2;
-   t1 = b + e1(g) + Ch(g,h,a) + sha512_K[i+6] + W[i+6];
-   t2 = e0(c) + Maj(c,d,e);f+=t1;b=t1+t2;
-   t1 = a + e1(f) + Ch(f,g,h) + sha512_K[i+7] + W[i+7];
-   t2 = e0(b) + Maj(b,c,d);e+=t1;a=t1+t2;
+#define SHA512_0_15(i, a, b, c, d, e, f, g, h) \
+   t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i];  \
+   t2 = e0(a) + Maj(a,b,c);\
+   d += t1;\
+   h = t1 + t2
+
+#define SHA512_16_79(i, a, b, c, d, e, f, g, h)\
+   BLEND_OP(i, W); \
+   t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[(i)%16]; \
+   t2 = e0(a) + Maj(a,b,c);\
+   d += t1;\
+   h = t1 + t2
+
+   for (i = 0; i < 16; i += 8) {
+   SHA512_0_15(i, a, b, c, d, e, f, g, h);
+   SHA512_0_15(i + 1, h, a, b, c, d, e, f, g);
+   SHA512_0_15(i + 2, g, h, a, b, c, d, e, f);
+   SHA512_0_15(i + 3, f, g, h, a, b, c, d, e);
+   SHA512_0_15(i + 4, e, f, g, h, a, b, c, d);
+   SHA512_0_15(i + 5, d, e, f, g, h, a, b, c);
+   SHA512_0_15(i + 6, c, d, e, f, g, h, a, b);
+   SHA512_0_15(i + 7, b, c, d, e, f, g, h, a);
+   }
+   for (i = 16; i < 80; i += 8) {
+   SHA512_16_79(i, a, b, c, d, e, f, g, h);
+   SHA512_16_79(i + 1, h, a, b, c, d, e, f, g);
+   SHA512_16_79(i + 2, g, h, a, b, c, d, e, f);
+   SHA512_16_79(i + 3, f, g, h, a, b, c, d, e);
+   SHA512_16_79(i + 4, e, f, g, h, a, b, c, d);
+   SHA512_16_79(i + 5, d, e, f, g, h, a, b, c);
+   SHA512_16_79(i + 6, c, d, e, f, g, h, a, b);
+   SHA512_16_79(i + 7, b, c, d, e, f, g, h, a);
}

state[0] += a; state[1] += b; state[2] += c; state[3] += d;
@@ -128,8 +136,6 @@ sha512_transform(u64 *state, const u8 *input)

/* erase our data */
a = b = c = d = e = f = g = h = t1 = t2 = 0;
-   memset(W, 0, sizeof(__get_cpu_var(msg_schedule)));
-   put_cpu_var(msg_schedule);
 }

 static int
--
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: sha512: make it work, undo percpu message schedule

2012-01-12 Thread Alexey Dobriyan
On Wed, Jan 11, 2012 at 11:36:11AM +1100, Herbert Xu wrote:
 On Wed, Jan 11, 2012 at 03:00:40AM +0300, Alexey Dobriyan wrote:
  commit f9e2bca6c22d75a289a349f869701214d63b5060
  aka crypto: sha512 - Move message schedule W[80] to static percpu area
  created global message schedule area.
  
  If sha512_update will ever be entered twice, hilarity ensues.
 
 Hmm, do you know why this happens? On the face of it this shouldn't
 be possible as preemption is disabled.

Herbert, I couldn't come up with a single scenario. :-(
But the bug is easy to reproduce.


Re: sha512: make it work, undo percpu message schedule

2012-01-10 Thread Alexey Dobriyan
On Wed, Jan 11, 2012 at 03:00:40AM +0300, Alexey Dobriyan wrote:
 - memset(W, 0, sizeof(__get_cpu_var(msg_schedule)));

And, yes, this is intentional -- modern gcc pisses on stone age data clearing.


HMAC and stuff

2011-12-29 Thread Alexey Dobriyan
The aalg_list array contains the list of approved HMAC algorithms.
Do I understand correctly that to update this list some sort of
official document, like an RFC, has to exist?

For example, it contains an hmac(rmd160) entry, but doesn't contain hmac(rmd128)
or the other RIPEMD functions (there is even a test for hmac(rmd128)).

Also, the kernel has more cryptographic hash functions than the IPsec code
allows, such as the Tiger hashes, Whirlpool etc. They are dead code if the
IPsec code doesn't let users use them.


Re: Add IPSec IP Range in Linux kernel

2011-11-08 Thread Alexey Dobriyan
On Tue, Nov 8, 2011 at 8:24 AM, Peter P Waskiewicz Jr peter.p.waskiewicz...@intel.com wrote:
 On Mon, 2011-11-07 at 19:10 -0800, Daniil Stolnikov wrote:
 Hello!

 I found that the IPsec stack in Linux does not support IP ranges. Many
 people ask this question. The archives say strongswan stated that their daemon
 supports ranges, but the Linux IPsec stack supports only subnets. I am
 writing to ask you to implement support for IP ranges in Linux. I think that a
 lot more people would appreciate this addition.

 It'd be even better if you could write a patch for us to review.

oh, come on!
changing addr_match() is trivial for ipv4 and easy for ipv6. :-)


[PATCH] whirlpool: count rounds from 0

2011-09-26 Thread Alexey Dobriyan
rc[0] is unused because rounds are counted from 1.
Save a u64!

Signed-off-by: Alexey Dobriyan adobri...@gmail.com
---

 crypto/wp512.c |   18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

--- a/crypto/wp512.c
+++ b/crypto/wp512.c
@@ -762,11 +762,17 @@ static const u64 C7[256] = {
0x86228644a411c286ULL,
 };
 
-static const u64 rc[WHIRLPOOL_ROUNDS + 1] = {
-   0xULL, 0x1823c6e887b8014fULL, 0x36a6d2f5796f9152ULL,
-   0x60bc9b8ea30c7b35ULL, 0x1de0d7c22e4bfe57ULL, 0x157737e59ff04adaULL,
-   0x58c9290ab1a06b85ULL, 0xbd5d10f4cb3e0567ULL, 0xe427418ba77d95d8ULL,
-   0xfbee7c66dd17479eULL, 0xca2dbf07ad5a8333ULL,
+static const u64 rc[WHIRLPOOL_ROUNDS] = {
+   0x1823c6e887b8014fULL,
+   0x36a6d2f5796f9152ULL,
+   0x60bc9b8ea30c7b35ULL,
+   0x1de0d7c22e4bfe57ULL,
+   0x157737e59ff04adaULL,
+   0x58c9290ab1a06b85ULL,
+   0xbd5d10f4cb3e0567ULL,
+   0xe427418ba77d95d8ULL,
+   0xfbee7c66dd17479eULL,
+   0xca2dbf07ad5a8333ULL,
 };
 
 /**
@@ -793,7 +799,7 @@ static void wp512_process_buffer(struct wp512_ctx *wctx) {
	state[6] = block[6] ^ (K[6] = wctx->hash[6]);
	state[7] = block[7] ^ (K[7] = wctx->hash[7]);
 
-   for (r = 1; r <= WHIRLPOOL_ROUNDS; r++) {
+   for (r = 0; r < WHIRLPOOL_ROUNDS; r++) {
 
	L[0] = C0[(int)(K[0] >> 56)       ] ^
	       C1[(int)(K[7] >> 48) & 0xff] ^


Re: [PATCH v1] compiler: prevent dead store elimination

2010-03-01 Thread Alexey Dobriyan
On Mon, Mar 1, 2010 at 11:32 AM, Mikael Pettersson mi...@it.uu.se wrote:
 Arjan van de Ven writes:
   On Sat, 27 Feb 2010 21:47:42 +0100
   Roel Kluin roel.kl...@gmail.com wrote:
    +void secure_bzero(void *p, size_t n)
    +{
    +  memset(p, 0, n);
    +  ARRAY_PREVENT_DSE(p, n);
    +}
    +EXPORT_SYMBOL(secure_bzero);
  
  
   please don't introduce bzero again to the kernel;
  
   make it secure_memset() please.

What's so secure in this function? :^)


Re: crypto_remove_spawns: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018

2010-02-15 Thread Alexey Dobriyan
On Mon, Feb 15, 2010 at 10:11 AM, Herbert Xu
herb...@gondor.apana.org.au wrote:
 On Mon, Feb 15, 2010 at 09:47:25AM +0200, Alexey Dobriyan wrote:
 On Mon, Feb 15, 2010 at 7:27 AM, Herbert Xu herb...@gondor.apana.org.au wrote:
  Is this reproducible every time you unload aes_x86_64 after boot?

 No, what I do is

 1. setup ipcomp in tunnel mode _in fresh netns_ and immediately exit
 2. modprobe/rmmod all modules (not much)

 ~1 hour of this workload and it hits sometimes with aes_x86_64,
 sometimes with aes_generic.

 Was this with that IPCOMP bug fixed?

Yes, ipcomp bug triggers almost immediately.
Anyway, this is just description of what I do.


crypto_remove_spawns: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018

2010-02-14 Thread Alexey Dobriyan
BUG: unable to handle kernel NULL pointer dereference at 0018
IP: [81145bf4] crypto_remove_spawns+0xd4/0x340
PGD bdc48067 PUD bc954067 PMD 0 
Oops:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: 
/sys/devices/pci:00/:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/uevent
CPU 0 
Pid: 16500, comm: rmmod Not tainted 2.6.33-rc7-next-20100212+ #9 P5E/P5E
RIP: 0010:[81145bf4]  [81145bf4] crypto_remove_spawns+0xd4/0x340
RSP: 0018:8800bc9dfde8  EFLAGS: 00010282
RAX: 8800bc901498 RBX:  RCX: 8800ba859610
RDX: 8800bc900380 RSI: 8800bc9dfe18 RDI: 8800bc9015c0
RBP: 8800bc9dfe68 R08:  R09: 
R10:  R11:  R12: 8800bc901488
R13: 8800bc9dfe18 R14: a05817e0 R15: 
FS:  7fdd2ec1c6f0() GS:88000220() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 0018 CR3: bca34000 CR4: 06f0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process rmmod (pid: 16500, threadinfo 8800bc9de000, task 8800bd53ad90)
Stack:
 8800bc9dfe08 8800bc9dfe28 8800bc9dfe98 042181636020
0 8800bc9dfe08 8800bc9dfe08 8800bc9015c0 8800bc900380
0 8800ba859808 8800ba859610 8800bc9dfe98 a05817e0
Call Trace:
 [81145eb1] crypto_remove_alg+0x51/0x60
 [81145ef3] crypto_unregister_alg+0x33/0x90
 [a058175c] aes_fini+0x10/0x12 [aes_x86_64]
 [8107266c] sys_delete_module+0x19c/0x250
 [8100256b] system_call_fastpath+0x16/0x1b
Code: 02 00 eb c3 0f 1f 00 48 8b 47 08 48 8d 75 c0 4c 89 28 49 89 45 08 48 8b 
55 c0 e8 a8 fa 02 00 48 8d 45 a0 48 8b 18 48 39 d8 74 44 4c 8b 63 18 4d 39 f4 
0f 84 4e 02 00 00 48 8b 13 48 8b 43 08 4c 
RIP  [81145bf4] crypto_remove_spawns+0xd4/0x340
 RSP 8800bc9dfde8
CR2: 0018


crypto_remove_spawns:

spawn = list_first_entry(spawns, struct crypto_spawn, list);
inst = spawn->inst;

spawn is NULL here.


Re: crypto_remove_spawns: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018

2010-02-14 Thread Alexey Dobriyan
On Mon, Feb 15, 2010 at 7:27 AM, Herbert Xu herb...@gondor.apana.org.au wrote:
 Is this reproducible every time you unload aes_x86_64 after boot?

No, what I do is

1. setup ipcomp in tunnel mode _in fresh netns_ and immediately exit
2. modprobe/rmmod all modules (not much)

~1 hour of this workload and it hits sometimes with aes_x86_64,
sometimes with aes_generic.

 Please attach your config file?

Full config later, for now it's ipv4 only, XFRM stuff as modules,
crypto modules as modules, almost all debugging on.