Re: [PATCH 2/3] [eSTREAM] stream: Wrapper for eSTREAM ciphers

2007-11-13 Thread Tan Swee Heng
Hi Herbert,

On Nov 13, 2007 9:43 AM, Herbert Xu [EMAIL PROTECTED] wrote:
 Why couldn't stream ciphers that require an IV just use the
 blkcipher interface? Please enlighten me :)

From what I understand, the blkcipher interface provides functions
like crypto_blkcipher_set_iv() for the caller to set the IV. What it
does is set *iv in blkcipher_tfm to point to the IV buffer. Later this
pointer is passed to desc->info and walk->iv. (Some callers, such as
dm-crypt.c, set desc->info = iv directly, though.) Subsequently,
templates like cbc and ctr pick up the IV pointer from walk->iv.
For cbc, the IV is XORed into the input block before calling the
underlying cipher. For ctr, the IV is used to form a counter block
before calling the underlying cipher.
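
To make the cbc/ctr handling described above concrete, here is a minimal userspace sketch. It is not the kernel code: the toy_* names, the 8-byte block size and the XOR-only "cipher" are all assumptions chosen so that the use of the IV is easy to follow.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLK 8  /* toy block size in bytes (assumption for this sketch) */

/* Toy stand-in for the underlying single-block cipher: XOR with the key.
 * It is NOT secure; it only makes the data flow visible. */
static void toy_encrypt_block(const uint8_t key[BLK], const uint8_t in[BLK],
                              uint8_t out[BLK])
{
    for (int i = 0; i < BLK; i++)
        out[i] = in[i] ^ key[i];
}

/* CBC: the IV is XORed into the input block before the cipher call, and
 * the resulting ciphertext block becomes the IV for the next block. */
static void toy_cbc_encrypt(const uint8_t key[BLK], uint8_t iv[BLK],
                            const uint8_t *in, uint8_t *out, size_t len)
{
    assert(len % BLK == 0);
    for (size_t off = 0; off < len; off += BLK) {
        uint8_t tmp[BLK];
        for (int i = 0; i < BLK; i++)
            tmp[i] = in[off + i] ^ iv[i];
        toy_encrypt_block(key, tmp, out + off);
        memcpy(iv, out + off, BLK);
    }
}

/* CTR: the IV forms the counter block. The cipher encrypts the counter,
 * the result is XORed with the data, and the counter is incremented. */
static void toy_ctr_crypt(const uint8_t key[BLK], uint8_t ctr[BLK],
                          const uint8_t *in, uint8_t *out, size_t len)
{
    assert(len % BLK == 0);
    for (size_t off = 0; off < len; off += BLK) {
        uint8_t ks[BLK];
        toy_encrypt_block(key, ctr, ks);
        for (int i = 0; i < BLK; i++)
            out[off + i] = in[off + i] ^ ks[i];
        for (int i = BLK - 1; i >= 0; i--)  /* big-endian increment */
            if (++ctr[i] != 0)
                break;
    }
}
```

In both cases the template itself interprets the IV, which is exactly what a generic wrapper cannot do for ciphers that each define their own IV setup.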

In fact, my stream template patch uses blkcipher in the same way.
However unlike cbc and ctr, stream cannot process the IV. It
must pass it to the underlying eSTREAM cipher's setiv() because each
cipher's setiv() manipulates the IV differently. (Salsa20 uses it in a
counter block; other eSTREAM ciphers mix the IV with the key in their
key expansion.)

So blkcipher is indeed fine for stream ciphers as you stated - I even
use it in stream. The problem is that cipher_alg and cipher_tfm do
not have callbacks for eSTREAM ciphers to expose setiv(). The
estream patch tries to address this issue by introducing
crypto_estream_type, estream_alg and estream_tfm.
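
As a rough illustration of why a setiv() callback is wanted, the sketch below shows a descriptor that carries setiv() next to setkey(), plus a wrapper that forwards the IV without interpreting it. Everything here is hypothetical: toy_estream_alg and its fields are invented for the example and are not the estream_alg from the patch, and the one-byte "cipher" is deliberately trivial.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified descriptor (NOT the patch's estream_alg).
 * The point is only that setiv() sits beside setkey(), so a wrapper can
 * hand the IV over unchanged. */
struct toy_estream_alg {
    size_t ivsize;
    int  (*setkey)(void *ctx, const uint8_t *key, size_t keylen);
    void (*setiv)(void *ctx, const uint8_t *iv);
    void (*crypt)(void *ctx, uint8_t *dst, const uint8_t *src, size_t len);
};

/* Toy cipher state: keystream byte = key ^ iv ^ position. Insecure. */
struct toy_ctx { uint8_t key; uint8_t iv; uint8_t pos; };

static int toy_setkey(void *ctx, const uint8_t *key, size_t keylen)
{
    struct toy_ctx *c = ctx;
    if (keylen != 1)
        return -1;
    c->key = key[0];
    return 0;
}

static void toy_setiv(void *ctx, const uint8_t *iv)
{
    struct toy_ctx *c = ctx;
    c->iv = iv[0];
    c->pos = 0;  /* a new IV restarts the keystream */
}

static void toy_crypt(void *ctx, uint8_t *dst, const uint8_t *src, size_t len)
{
    struct toy_ctx *c = ctx;
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i] ^ (uint8_t)(c->key ^ c->iv ^ c->pos++);
}

static const struct toy_estream_alg toy_alg = {
    .ivsize = 1, .setkey = toy_setkey, .setiv = toy_setiv, .crypt = toy_crypt,
};

/* Wrapper in the spirit of the "stream" template: it never looks inside
 * the IV, it just passes it to the cipher's own setiv(). */
static void toy_stream_crypt(const struct toy_estream_alg *alg, void *ctx,
                             const uint8_t *iv, uint8_t *dst,
                             const uint8_t *src, size_t len)
{
    assert(alg && alg->setiv && alg->crypt);
    alg->setiv(ctx, iv);
    alg->crypt(ctx, dst, src, len);
}
```

Calling toy_stream_crypt() twice with the same key and IV round-trips the data, since the toy cipher is a plain XOR keystream.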

I hope my explanation is clear. The difference in set IV semantics
for block modes and eSTREAM ciphers can be confusing. The patches I've
submitted recently are my solution to this problem. It is probably not
the best solution. If you or any other expert on this list have other
ideas, please discuss and I will try to implement them. (Although the
patches pass tcrypt and seem to embody eSTREAM ciphers rather well, I
just realized they are not usable in dm-crypt as dm-crypt.c explicitly
uses crypto_cipher. Bummer!)

Swee Heng
-
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] [crypto] S390-AES add fallback driver.

2007-11-13 Thread Sebastian Siewior
* Jan Glauber | 2007-11-12 18:04:29 [+]:

Sebastian, thanks for working on this! Do you know if I need other
posted patches that are not yet in cryptodev-2.6 for this to work?
Nope, it should work. I tested it on Herbert's cryptodev tree.

I'm asking because I'm getting the following crash using tcrypt (aes
192-bit key, ecb-mode) :(
Too bad it doesn't work out of the box :D

Call Trace:
([<02ee5680>] 0x2ee5680)
 [<0001008292ae>] crypto_ecb_setkey+0x52/0x74 [ecb]
 [<00010082316e>] setkey_fallback_blk+0x5e/0x98 [aes_s390]
 [<000100886d76>] test_cipher+0x2da/0x8f0 [tcrypt]
 [<00010080570e>] init+0x70e/0x1808 [tcrypt]
 [<000674f4>] sys_init_module+0x148/0x1e64
 [<000222f8>] sysc_noemu+0x10/0x16
 [<0211ff6e>] 0x211ff6e

From my limited understanding of the internals of crypto API I think
this is because crypto_ecb_setkey() calls crypto_cipher_setkey() instead
of crypto_blkcipher_setkey() and the layout of struct blkcipher_tfm
has the *iv where cipher_tfm has the setkey(). And oops, since the *iv
is zero we have a null pointer call. But maybe I'm just missing another 
patch...
Please send me (private if you prefer) a full log and I'll look into it.
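
The layout hypothesis above can be illustrated with a small userspace sketch. The struct definitions are simplified stand-ins with assumed field lists, not the kernel's actual struct blkcipher_tfm and struct cipher_tfm; the only property they share with the description is that an IV pointer occupies the slot where the other view expects a setkey callback.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*setkey_fn)(void *tfm, const unsigned char *key, unsigned int len);

/* Simplified stand-ins; the field lists are assumptions for this sketch. */
struct toy_blkcipher_tfm {
    void     *iv;      /* first slot: IV pointer */
    setkey_fn setkey;
};

struct toy_cipher_tfm {
    setkey_fn setkey;  /* first slot: setkey callback */
};

/* Both views share the same storage. */
union toy_crt {
    struct toy_blkcipher_tfm blkcipher;
    struct toy_cipher_tfm    cipher;
};

static int toy_blk_setkey(void *tfm, const unsigned char *key, unsigned int len)
{
    (void)tfm;
    (void)key;
    return len == 24 ? 0 : -1;  /* accept only a 192-bit key in this toy */
}

/* Returns 1 when the "cipher" view of the storage holds a NULL callback.
 * On common platforms a NULL iv pointer in the blkcipher view reads back
 * as a NULL function pointer here, so calling it would crash. */
static int toy_cipher_view_is_null(const union toy_crt *crt)
{
    assert(crt != NULL);
    return crt->cipher.setkey == NULL;
}
```

Going through the matching view (crt.blkcipher.setkey) reaches toy_blk_setkey() as intended, which mirrors the suggestion that the blkcipher setkey should have been used.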

thanks, Jan

Sebastian


Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-13 Thread David Miller
From: Denys Vlasenko [EMAIL PROTECTED]
Date: Tue, 13 Nov 2007 15:34:33 -0700

 My preferred solution is to make loop unrolling conditional on
 CONFIG_CC_OPTIMIZE_FOR_SIZE - and this is what is done in my
 (first) patch (see attached). This part:

The default build is going to be CONFIG_CC_OPTIMIZE_FOR_SIZE
basically for everyone, this is what people get by default
and this is what every distribution uses.

Therefore %99. of folks will get the slowdown.

So in my book this is not an acceptable way to deal with
this problem.


Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-13 Thread Denys Vlasenko
On Tuesday 13 November 2007 18:41, David Miller wrote:
 From: Denys Vlasenko [EMAIL PROTECTED]
 Date: Tue, 13 Nov 2007 15:34:33 -0700

  My preferred solution is to make loop unrolling conditional on
  CONFIG_CC_OPTIMIZE_FOR_SIZE - and this is what is done in my
  (first) patch (see attached). This part:

 The default build is going to be CONFIG_CC_OPTIMIZE_FOR_SIZE
 basically for everyone, this is what people get by default
 and this is what every distribution uses.

 Therefore %99. of folks will get the slowdown.

 So in my book this is not an acceptable way to deal with
 this problem.

Loop unrolling here amounts to 25% code growth:

   text    data     bss     dec     hex filename
  21714       0       0   21714    54d2 camellia5.o
  15906       0       0   15906    3e22 camellia5_Os.o

Saving 25% of code size and going 5% slower is a perfectly acceptable
tradeoff for some users. NB: I'm not saying all, but some significant
part of users would like to be able to have this choice.
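
For reference, the two text sizes quoted above (21714 and 15906 bytes) can be turned into percentages either way round. The helper names below are made up for this sketch, and it assumes the two object files differ only in loop unrolling.

```c
#include <assert.h>

/* Percentage by which `big` exceeds `small`, relative to `small`. */
static double growth_pct(unsigned long small, unsigned long big)
{
    assert(small > 0 && big >= small);
    return 100.0 * (double)(big - small) / (double)small;
}

/* Percentage saved by going from `big` to `small`, relative to `big`. */
static double saving_pct(unsigned long small, unsigned long big)
{
    assert(big > 0 && big >= small);
    return 100.0 * (double)(big - small) / (double)big;
}
```

With these numbers, saving_pct(15906, 21714) is about 26.7 and growth_pct(15906, 21714) is about 36.5, so the rounded 25% figure fits the saving better than the growth.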

If CONFIG_CC_OPTIMIZE_FOR_SIZE is not an acceptable method,
do you have other ideas?
--
vda


Re: [PATCH 2/3] [eSTREAM] stream: Wrapper for eSTREAM ciphers

2007-11-13 Thread Herbert Xu
On Wed, Nov 14, 2007 at 01:25:37AM +0800, Tan Swee Heng wrote:

 In fact, my stream template patch uses blkcipher in the same way.
 However unlike cbc and ctr, stream cannot process the IV. It
 must pass it to the underlying eSTREAM cipher's setiv() because each
 cipher's setiv() manipulates the IV differently. (Salsa20 uses it in a
 counter block; other eSTREAM ciphers mix the IV with the key in their
 key expansion.)

I think we're talking past each other :)

What I'm suggesting is that you implement the stream ciphers that
use an IV directly using the blkcipher interface, and not the cipher
interface.  That way you can do whatever you want with the IV.

 So blkcipher is indeed fine for stream ciphers as you stated - I even
 use it in stream. The problem is that cipher_alg and cipher_tfm do
 not have callbacks for eSTREAM ciphers to expose setiv(). The
 estream patch tries to address this issue by introducing
 crypto_estream_type, estream_alg and estream_tfm.

That's right.  Apart from Salsa you shouldn't have to use the cipher
interface at all.  Which means that what the cipher interface lacks
is not a problem :)

Salsa can use the cipher interface because deep down it's a block
cipher.  It's just being used in counter mode.
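
To show what "used in counter mode" means for Salsa20, the sketch below fills the 16-word input block of the Salsa20 specification for a 256-bit key: four constant words, eight key words, the 64-bit IV and a 64-bit block counter. The function name is made up for this example, and the hashing of the block into keystream is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian load of four bytes. */
static uint32_t load32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Fill the 16-word Salsa20 input block for a 256-bit key.
 * Words 0, 5, 10, 15 hold the constant "expand 32-byte k",
 * words 6-7 the 64-bit IV and words 8-9 the block counter. */
static void salsa20_input_block(uint32_t x[16], const uint8_t key[32],
                                const uint8_t iv[8], uint64_t counter)
{
    assert(x && key && iv);
    x[0]  = 0x61707865;
    x[5]  = 0x3320646e;
    x[10] = 0x79622d32;
    x[15] = 0x6b206574;
    for (int i = 0; i < 4; i++) {
        x[1 + i]  = load32_le(key + 4 * i);       /* first key half  */
        x[11 + i] = load32_le(key + 16 + 4 * i);  /* second key half */
    }
    x[6] = load32_le(iv);
    x[7] = load32_le(iv + 4);
    x[8] = (uint32_t)counter;
    x[9] = (uint32_t)(counter >> 32);
}
```

Each keystream block comes from applying the Salsa20 core to this block and then incrementing the counter, which is the sense in which the IV ends up inside a counter block.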

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-13 Thread David Miller
From: Denys Vlasenko [EMAIL PROTECTED]
Date: Tue, 13 Nov 2007 19:47:08 -0700

 If CONFIG_CC_OPTIMIZE_FOR_SIZE is not an acceptable method,
 do you have other ideas?

Look at ways to make the code run faster without loop unrolling?


Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-13 Thread Noriaki TAKAMIYA
Hi,

 Tue, 13 Nov 2007 19:47:08 -0700
 [Subject: Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 
 64bit-ization]
 Denys Vlasenko [EMAIL PROTECTED] wrote...

 On Tuesday 13 November 2007 18:41, David Miller wrote:
  From: Denys Vlasenko [EMAIL PROTECTED]
  Date: Tue, 13 Nov 2007 15:34:33 -0700
 
   My preferred solution is to make loop unrolling conditional on
   CONFIG_CC_OPTIMIZE_FOR_SIZE - and this is what is done in my
   (first) patch (see attached). This part:
 
  The default build is going to be CONFIG_CC_OPTIMIZE_FOR_SIZE
  basically for everyone, this is what people get by default
  and this is what every distribution uses.
 
  Therefore %99. of folks will get the slowdown.
 
  So in my book this is not an acceptable way to deal with
  this problem.
 
 Loop unrolling here amounts to 25% code growth:
 
    text    data     bss     dec     hex filename
   21714       0       0   21714    54d2 camellia5.o
   15906       0       0   15906    3e22 camellia5_Os.o
 
 Saving 25% of code size and going 5% slower is a perfectly acceptable
 tradeoff for some users. NB: I'm not saying all, but some significant
 part of users would like to be able to have this choice.

  IMHO, if you are going to use camellia on an embedded system, code
  size will be important.

  On the other hand, I think CPU performance is typically limited
  on an embedded system, so the performance of the code will be
  important too...

  I'm not sure whether a 5% slowdown is important or not. It will
  depend on the system.

  Regards,

--
Noriaki TAKAMIYA


Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-13 Thread Denys Vlasenko
On Tuesday 13 November 2007 20:49, David Miller wrote:
 From: Denys Vlasenko [EMAIL PROTECTED]
 Date: Tue, 13 Nov 2007 19:47:08 -0700

  If CONFIG_CC_OPTIMIZE_FOR_SIZE is not an acceptable method,
  do you have other ideas?

 Look at ways to make the code run faster without loop unrolling?

I did it. I noticed that key setup is mostly operating on 64-bit
quantities, and provided an alternative implementation which
exploits that fact. It's smaller and faster.

However, after I've done that, the question still stands:
should I unroll the loop or not?

The situation we are in now is exactly the situation I want to
avoid:

On Wednesday 07 November 2007 06:22, Denys Vlasenko wrote:
  Having two versions of the code is unmaintainable.  So please
  either decide that 5% is worth it or isn't.

 *I* am happy with 5% speed sacrifice. I'm afraid other people won't be.

 I just want to escape vicious cycle of -Os people arguing with
 -O2 people to no end. I don't want somebody to come later
 and unroll the loop again. And then me to come
 and de-unroll it again...

 It's better for everybody to recognize that both POVs are valid,
 and have provisions for the user (the person who builds the binary)
 to tune the size/speed tradeoff.

That's why I made a patch where unrolling can be enabled by CONFIG_xxx.

I will resubmit the patch without de-unrolling.
Meanwhile, I'd like to ask you guys to think about ways
to make size/speed tradeoffs selectable at build time.
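
One possible shape for a build-time selectable tradeoff is sketched below. TOY_UNROLL is a hypothetical switch standing in for a CONFIG_xxx option, and toy_round() is a made-up round function, not anything from camellia.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up round function: XOR the key word in, then rotate left by one. */
static uint32_t toy_round(uint32_t x, uint32_t k)
{
    x ^= k;
    return (x << 1) | (x >> 31);
}

/* Four rounds, unrolled or looped depending on a build-time switch.
 * TOY_UNROLL is hypothetical; build with -DTOY_UNROLL to select the
 * larger, unrolled body. Both variants compute the same result. */
static uint32_t toy_rounds4(uint32_t x, const uint32_t k[4])
{
    assert(k != NULL);
#ifdef TOY_UNROLL
    x = toy_round(x, k[0]);
    x = toy_round(x, k[1]);
    x = toy_round(x, k[2]);
    x = toy_round(x, k[3]);
#else
    for (int i = 0; i < 4; i++)
        x = toy_round(x, k[i]);
#endif
    return x;
}
```

The cost of this approach is the one raised earlier in the thread: two code paths have to be kept correct, even if only one is compiled in any given build.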
--
vda


Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-13 Thread David Miller
From: Denys Vlasenko [EMAIL PROTECTED]
Date: Tue, 13 Nov 2007 22:30:47 -0700

 On Tuesday 13 November 2007 20:49, David Miller wrote:
  From: Denys Vlasenko [EMAIL PROTECTED]
  Date: Tue, 13 Nov 2007 19:47:08 -0700
 
   If CONFIG_CC_OPTIMIZE_FOR_SIZE is not an acceptable method,
   do you have other ideas?
 
  Look at ways to make the code run faster without loop unrolling?
 
 I did it. I noticed that key setup is mostly operating on 64-bit
 quantities, and provided alternative implementation which
 exploits that fact. It's smaller and faster.

Great, then you don't have to unroll the loop and performance
is at least as good as before _and_ you save code space.

It's perfect, you don't need compile time checks or anything
silly like that.

Please submit this new version :-)


Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-13 Thread Denys Vlasenko
On Tuesday 13 November 2007 22:30, Denys Vlasenko wrote:
 I will resubmit the patch without de-unrolling.
 Meanwhile, I'd like to ask you guys to think about ways
 to make size/speed tradeoffs selectable at build time.

Here is the patch which has loops still unrolled,
but otherwise unchanged.

Description:

Use alternative key setup implementation with mostly 64-bit ops
if BITS_PER_LONG >= 64. Both much smaller and much faster.

Unify camellia_en/decrypt128/256 into camellia_do_en/decrypt.
Code was similar, with just one additional if() we can use same code.

Replace (x & 0xff) with (u8)x, gcc is not smart enough to realize
that it can do (x & 0xff) this way (which is smaller at least on i386).

Don't do (x & 0xff) in a few places where x cannot be > 255 anyway:
t0 = il >> 16; v = camellia_sp0222[(t1 >> 8) & 0xff];
il >> 16 is u32, thus (t1 >> 8) is one byte!
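
The two byte-extraction points in the description can be checked with a small userspace sketch. The u8/u32 typedefs stand in for the kernel types, and the function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t  u8;   /* userspace stand-ins for the kernel typedefs */
typedef uint32_t u32;

/* Low byte via a mask and via a narrowing cast: same value for unsigned x. */
static u32 low_byte_mask(u32 x) { return x & 0xff; }
static u32 low_byte_cast(u32 x) { return (u8)x; }

/* After t = il >> 16 only 16 bits remain, so t >> 8 is at most 0xff and
 * an extra "& 0xff" cannot change it. */
static u32 top_byte(u32 il)
{
    u32 t = il >> 16;
    assert((t >> 8) <= 0xff);
    return t >> 8;
}
```

Whether the cast form actually produces smaller code depends on the compiler and target, so that part is best confirmed by comparing the generated object sizes.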

Signed-off-by: Denys Vlasenko [EMAIL PROTECTED]
--
vda
diff -urpN linux-2.6.23.1.camellia/crypto/camellia.c linux-2.6.23.1.camellia5/crypto/camellia.c
--- linux-2.6.23.1.camellia/crypto/camellia.c	2007-11-13 22:47:28.0 -0700
+++ linux-2.6.23.1.camellia5/crypto/camellia.c	2007-11-13 22:57:54.0 -0700
@@ -36,6 +36,13 @@
 #include <linux/kernel.h>
 #include <linux/module.h>
 
+#if BITS_PER_LONG >= 64
+
+/* Use alternative implementation with mostly 64-bit ops */
+#include "camellia_64.c"
+
+#else
+
 static const u32 camellia_sp1110[256] = {
 	0x70707000,0x82828200,0x2c2c2c00,0xececec00,
 	0xb3b3b300,0x27272700,0xc0c0c000,0xe5e5e500,
@@ -329,7 +336,6 @@ static const u32 camellia_sp4404[256] = 
 /*
  *  macros
  */
-
 # define GETU32(v, pt) \
 do { \
 	/* latest breed of gcc is clever enough to use move */ \
@@ -364,63 +370,28 @@ static const u32 camellia_sp4404[256] = 
 } while(0)
 
 
+/*
+ * Key setup
+ */
 #define CAMELLIA_F(xl, xr, kl, kr, yl, yr, il, ir, t0, t1)	\
 do {			\
 	il = xl ^ kl;		\
 	ir = xr ^ kr;		\
 	t0 = il >> 16;		\
 	t1 = ir >> 16;		\
-	yl = camellia_sp1110[ir & 0xff]\
-	   ^ camellia_sp0222[(t1 >> 8) & 0xff]			\
-	   ^ camellia_sp3033[t1 & 0xff]\
-	   ^ camellia_sp4404[(ir >> 8) & 0xff];			\
-	yr = camellia_sp1110[(t0 >> 8) & 0xff]			\
-	   ^ camellia_sp0222[t0 & 0xff]\
-	   ^ camellia_sp3033[(il >> 8) & 0xff]			\
-	   ^ camellia_sp4404[il & 0xff];			\
+	yl = camellia_sp1110[(u8)(ir )]			\
+	   ^ camellia_sp0222[(t1 >> 8)]			\
+	   ^ camellia_sp3033[(u8)(t1 )]			\
+	   ^ camellia_sp4404[(u8)(ir >> 8)];			\
+	yr = camellia_sp1110[(t0 >> 8)]			\
+	   ^ camellia_sp0222[(u8)(t0 )]			\
+	   ^ camellia_sp3033[(u8)(il >> 8)]			\
+	   ^ camellia_sp4404[(u8)(il )];			\
 	yl ^= yr;		\
 	yr = ROR8(yr);		\
 	yr ^= yl;		\
 } while(0)
 
-
-/*
- * for speed up
- *
- */
-#define CAMELLIA_FLS(ll, lr, rl, rr, kll, klr, krl, krr, t0, t1, t2, t3) \
-do {\
-	t0 = kll;			\
-	t2 = krr;			\
-	t0 &= ll;			\
-	t2 |= rr;			\
-	rl ^= t2;			\
-	lr ^= ROL1(t0);			\
-	t3 = krl;			\
-	t1 = klr;			\
-	t3 &= rl;			\
-	t1 |= lr;			\
-	ll ^= t1;			\
-	rr ^= ROL1(t3);			\
-} while(0)
-
-#define CAMELLIA_ROUNDSM(xl, xr, kl, kr, yl, yr, il, ir, t0, t1)	\
-do {\
-	ir =  camellia_sp1110[xr & 0xff];\
-	il =  camellia_sp1110[(xl>>24) & 0xff];\
-	ir ^= camellia_sp0222[(xr>>24) & 0xff];\
-	il ^= camellia_sp0222[(xl>>16) & 0xff];\
-	ir ^= camellia_sp3033[(xr>>16) & 0xff];\
-	il ^= camellia_sp3033[(xl>>8) & 0xff];\
-	ir ^= camellia_sp4404[(xr>>8) & 0xff];\
-	il ^= camellia_sp4404[xl & 0xff];\
-	il ^= kl;			\
-	ir ^= il ^ kr;			\
-	yl ^= ir;			\
-	yr ^= ROR8(il) ^ ir;		\
-} while(0)
-
-
 #define SUBKEY_L(INDEX) (subkey[(INDEX)*2])
 #define SUBKEY_R(INDEX) (subkey[(INDEX)*2 + 1])
 
@@ -622,7 +593,7 @@ static void camellia_setup128(const unsi
 	SUBKEY_L(6) = subL[5] ^ subL[7]; /* round 5 */
 	SUBKEY_R(6) = subR[5] ^ subR[7];
 	tl = subL[10] ^ (subR[10] & ~subR[8]);
-	dw = tl & subL[8],  /* FL(kl1) */
+	dw = tl & subL[8];  /* FL(kl1) */
 		tr = subR[10] ^ ROL1(dw);
 	SUBKEY_L(7) = subL[6] ^ tl; /* round 6 */
 	SUBKEY_R(7) = subR[6] ^ tr;
@@ -1000,400 +971,150 @@ static void camellia_setup192(const unsi
 }
 
 
-static void camellia_encrypt128(const u32 *subkey, u32 *io_text)
-{
-	u32 il,ir,t0,t1;   /* temporary variables */
-
-	u32 io[4];
-
-	/* pre whitening but absorb kw2 */
-	io[0] = io_text[0] ^ SUBKEY_L(0);
-	io[1] = io_text[1] ^ SUBKEY_R(0);
-	io[2] = io_text[2];
-	io[3] = io_text[3];
-
-	/* main iteration */
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[2],io[3],il,ir,t0,t1);
-