Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-21 Thread Denys Vlasenko
On Wednesday 21 November 2007 00:12, Herbert Xu wrote:
> On Wed, Nov 21, 2007 at 12:08:57AM -0800, Denys Vlasenko wrote:
> > Yes, with minor modifications "64-bit" version
> > can be compiled and will work correctly on 32-bit CPU.
> > But it will be larger. This is what I got on i386:
> >
> >    text    data     bss     dec     hex filename
> >   18230     224       0   18454    4816 t/crypto/camellia.o
> >   20198     224       0   20422    4fc6 t_fake64/crypto/camellia.o
>
> What are the size differences on x86-64?

The above sizes were: final code (with all patches applied)
built for i386
versus the same code with #if BITS_PER_LONG >= 64 replaced by #if 1
(and a few fixes for "integer is too big for long" warnings).

For 64-bit, replacing that #if is a no-op, sizes
will be the same.
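
A minimal sketch of the kind of fix such warnings presumably call for (an illustration, not the actual change): an unsuffixed 64-bit constant does not fit in a 32-bit long, while an explicit ULL suffix gives it the same type on every target. The value is SIGMA1 from the patch; the names SIGMA1, sigma1_left and sigma1_right are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical name. The ULL suffix makes the constant unsigned long long
 * on every target, including those where long is only 32 bits wide.
 */
#define SIGMA1 0xA09E667F3BCC908BULL

/* Left (most significant) and right (least significant) 32-bit halves. */
static uint32_t sigma1_left(void)
{
	return (uint32_t)(SIGMA1 >> 32);
}

static uint32_t sigma1_right(void)
{
	return (uint32_t)(SIGMA1 & 0xffffffffULL);
}
```

With the suffix in place the same source builds for both 32-bit and 64-bit targets; only the generated code size differs, as in the i386 numbers above.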

If you are asking about 64-bit size comparison *across patches*
5..8, here they are:

64-bit:
    dec     hex filename
  22786    5902 2.6.23.1.camellia4.t64/crypto/camellia.o
  21422    53ae 2.6.23.1.camellia5.t64/crypto/camellia.o
  16355    3fe3 2.6.23.1.camellia6.t64/crypto/camellia.o
  15813    3dc5 2.6.23.1.camellia7.t64/crypto/camellia.o
  15670    3d36 2.6.23.1.camellia8.t64/crypto/camellia.o

--
vda
-
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-21 Thread Herbert Xu
On Wed, Nov 21, 2007 at 12:08:57AM -0800, Denys Vlasenko wrote:
>
> Yes, with minor modifications "64-bit" version
> can be compiled and will work correctly on 32-bit CPU.
> But it will be larger. This is what I got on i386:
> 
>    text    data     bss     dec     hex filename
>   18230     224       0   18454    4816 t/crypto/camellia.o
>   20198     224       0   20422    4fc6 t_fake64/crypto/camellia.o

What are the size differences on x86-64?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-21 Thread Denys Vlasenko
On Tuesday 20 November 2007 19:53, Herbert Xu wrote:
> On Sun, Nov 18, 2007 at 08:30:16PM -0800, Denys Vlasenko wrote:
> > Oh, Herbert, have heart, my camellia.c source file is smaller
> > than the one I started from. It's not like it's twice as big.
> > It's smaller already.
> >
> > 64-bit key setup is not just faster, it is also smaller
> > by ~4k, and this benefit is always there, not only when
> > key setup is performed.
>
> The key setup path is the slow path so I don't see why we can't
> just switch to the 64-bit version if it's better.

Yes, with minor modifications "64-bit" version
can be compiled and will work correctly on 32-bit CPU.
But it will be larger. This is what I got on i386:

   text    data     bss     dec     hex filename
  18230     224       0   18454    4816 t/crypto/camellia.o
  20198     224       0   20422    4fc6 t_fake64/crypto/camellia.o

> BTW I tried to apply your -6 patch but it doesn't apply against
> cryptodev-2.6 so I had to drop it.

Will correct this and re-post.
--
vda


Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-20 Thread Herbert Xu
On Sun, Nov 18, 2007 at 08:30:16PM -0800, Denys Vlasenko wrote:
>
> Oh, Herbert, have heart, my camellia.c source file is smaller
> than the one I started from. It's not like it's twice as big.
> It's smaller already.
> 
> 64-bit key setup is not just faster, it is also smaller
> by ~4k, and this benefit is always there, not only when
> key setup is performed.

The key setup path is the slow path so I don't see why we can't
just switch to the 64-bit version if it's better.

BTW I tried to apply your -6 patch but it doesn't apply against
cryptodev-2.6 so I had to drop it.

Please make sure that you only send one patch per email and that
they apply against cryptodev-2.6.  If the patches have dependencies
then please make that clear as well.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-19 Thread Noriaki TAKAMIYA
Hi,

>> Sun, 18 Nov 2007 20:30:16 -0800
>> [Subject: Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 
>> 64bit-ization]
>> Denys Vlasenko <[EMAIL PROTECTED]> wrote...

> > > camellia6:
> > > unifies encrypt/decrypt routines for different key lengths.
> > > This reduces module size by ~25%, with tiny (less than 1%)
> > > speed impact.
> > > Also collapses encrypt/decrypt into more readable
> > > (visually shorter) form using macros.
> 
> And here is
> 
> camellia7:
> Move "key XOR is end of F-function" code part into
> camellia_setup_tail(), it is sufficiently similar
> between camellia_setup128 and camellia_setup256.
> This shaves off another ~1k:
>     dec     hex filename
>   21414    53a6 2.6.23.1.camellia6.t/crypto/camellia.o
>   20518    5026 2.6.23.1.camellia7.t/crypto/camellia.o
>   16355    3fe3 2.6.23.1.camellia6.t64/crypto/camellia.o
>   15813    3dc5 2.6.23.1.camellia7.t64/crypto/camellia.o
> 
> 
> At the moment I cannot run-test it; I will try to do it ASAP.
> 
> Takamiya-san, can you review attached patch please?

  Sorry for late reply.

  I think you're testing now:-), and if speed impact is less than 1%
  as you say, I think it is acceptable.

  The smaller the code size is, the easier it is to enable camellia on
  embedded systems.

  Regards,

Acked-by: Noriaki TAKAMIYA <[EMAIL PROTECTED]>

--
Noriaki TAKAMIYA


Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-18 Thread Denys Vlasenko
Hi Herbert,

On Sunday 18 November 2007 05:21, Herbert Xu wrote:
> On Wed, Nov 14, 2007 at 02:28:25PM -0700, Denys Vlasenko wrote:
> > I also split this patch into two parts for easier review:
> > camellia5:
> > adds 64-bit key setup
>
> Sorry but this still duplicates way too much code.  Also key
> setup is the slow path relatively speaking so it's even less
> justifiable.

Oh, Herbert, have heart, my camellia.c source file is smaller
than the one I started from. It's not like it's twice as big.
It's smaller already.

64-bit key setup is not just faster, it is also smaller
by ~4k, and this benefit is always there, not only when
key setup is performed.

With attached camellia7 patch, I further reduce the size
of key setup routines by reusing a bit of the code
at the end of them. 2 screenfuls of code less.

I hope it makes code duplication a bit more tolerable.

> > camellia6:
> > unifies encrypt/decrypt routines for different key lengths.
> > This reduces module size by ~25%, with tiny (less than 1%)
> > speed impact.
> > Also collapses encrypt/decrypt into more readable
> > (visually shorter) form using macros.

And here is

camellia7:
Move "key XOR is end of F-function" code part into
camellia_setup_tail(), it is sufficiently similar
between camellia_setup128 and camellia_setup256.
This shaves off another ~1k:
    dec     hex filename
  21414    53a6 2.6.23.1.camellia6.t/crypto/camellia.o
  20518    5026 2.6.23.1.camellia7.t/crypto/camellia.o
  16355    3fe3 2.6.23.1.camellia6.t64/crypto/camellia.o
  15813    3dc5 2.6.23.1.camellia7.t64/crypto/camellia.o


At the moment I cannot run-test it; I will try to do it ASAP.

Takamiya-san, can you review attached patch please?

Signed-off-by: Denys Vlasenko <[EMAIL PROTECTED]>
-- 
vda
diff -urpN linux-2.6.23.1.camellia6/crypto/camellia.c linux-2.6.23.1.camellia7/crypto/camellia.c
--- linux-2.6.23.1.camellia6/crypto/camellia.c	2007-11-14 11:30:27.0 -0800
+++ linux-2.6.23.1.camellia7/crypto/camellia.c	2007-11-18 20:15:19.0 -0800
@@ -380,15 +380,80 @@ static const u32 camellia_sp4404[256] = 
 #ifdef __BIG_ENDIAN
 #define SUBKEY_L(INDEX) (((u32*)subkey)[(INDEX)*2])
 #define SUBKEY_R(INDEX) (((u32*)subkey)[(INDEX)*2 + 1])
+#define subL(INDEX) (((u32*)sub)[(INDEX)*2])
+#define subR(INDEX) (((u32*)sub)[(INDEX)*2 + 1])
 #else
 #define SUBKEY_L(INDEX) (((u32*)subkey)[(INDEX)*2 + 1])
 #define SUBKEY_R(INDEX) (((u32*)subkey)[(INDEX)*2])
+#define subL(INDEX) (((u32*)sub)[(INDEX)*2 + 1])
+#define subR(INDEX) (((u32*)sub)[(INDEX)*2])
 #endif
 
-static void camellia_setup_tail(u64 *subkey, int max)
+static void camellia_setup_tail(u64 *subkey, u64 *sub, int max)
 {
+	u64 t;
 	u32 dw;
-	int i = 2;
+	int i;
+
+	/* key XOR is end of F-function */
+	SUBKEY(0) = sub[0] ^ sub[2];/* kw1 */
+	SUBKEY(2) = sub[3];   /* round 1 */
+	SUBKEY(3) = sub[2] ^ sub[4]; /* round 2 */
+	SUBKEY(4) = sub[3] ^ sub[5]; /* round 3 */
+	SUBKEY(5) = sub[4] ^ sub[6]; /* round 4 */
+	SUBKEY(6) = sub[5] ^ sub[7]; /* round 5 */
+	t = subL(10) ^ (subR(10) & ~subR(8)); // tl = subL[10] ^ (subR[10] & ~subR[8]);
+	dw = (u32)t & subL(8);  /* FL(kl1) */
+	t = (t << 32) | (subR(10) ^ ROL1(dw)); // tr = subR[10] ^ ROL1(dw);
+	SUBKEY(7) = sub[6] ^ t;   /* round 6 */
+	SUBKEY(8) = sub[8];   /* FL(kl1) */
+	SUBKEY(9) = sub[9];   /* FLinv(kl2) */
+	t = subL(7) ^ (subR(7) & ~subR(9));
+	dw = (u32)t & subL(9);  /* FLinv(kl2) */
+	t = (t << 32) | (subR(7) ^ ROL1(dw));
+	SUBKEY(10) = t ^ sub[11]; /* round 7 */
+	SUBKEY(11) = sub[10] ^ sub[12]; /* round 8 */
+	SUBKEY(12) = sub[11] ^ sub[13]; /* round 9 */
+	SUBKEY(13) = sub[12] ^ sub[14]; /* round 10 */
+	SUBKEY(14) = sub[13] ^ sub[15]; /* round 11 */
+	t = subL(18) ^ (subR(18) & ~subR(16));
+	dw = (u32)t & subL(16); /* FL(kl3) */
+	t = (t << 32) | (subR(18) ^ ROL1(dw));
+	SUBKEY(15) = sub[14] ^ t; /* round 12 */
+	SUBKEY(16) = sub[16]; /* FL(kl3) */
+	SUBKEY(17) = sub[17]; /* FLinv(kl4) */
+	t = subL(15) ^ (subR(15) & ~subR(17));
+	dw = (u32)t & subL(17); /* FLinv(kl4) */
+	t = (t << 32) | (subR(15) ^ ROL1(dw));
+	SUBKEY(18) = t ^ sub[19]; /* round 13 */
+	SUBKEY(19) = sub[18] ^ sub[20]; /* round 14 */
+	SUBKEY(20) = sub[19] ^ sub[21]; /* round 15 */
+	SUBKEY(21) = sub[20] ^ sub[22]; /* round 16 */
+	SUBKEY(22) = sub[21] ^ sub[23]; /* round 17 */
+	if (max == 24) {
+		SUBKEY(23) = sub[22]; /* round 18 */
+		SUBKEY(24) = sub[24] ^ sub[23]; /* kw3 */
+	} else { 
+		t = subL(26) ^ (subR(26) & ~subR(24));
+		dw = (u32)t & subL(24); /* FL(kl5) */
+		t = (t << 32) | (subR(26) ^ ROL1(dw));
+		SUBKEY(23) = sub[22] ^ t; /* round 18 */
+		SUBKEY(24) = sub[24]; /* FL(kl5) */
+		SUBKEY(25) = sub[25]; /* FLinv(kl6) */
+		t = subL(23) ^ (subR(23) & ~subR(25));
+		dw = (u32)t & subL(25); /* FLinv(kl6) */
+		t = (t << 32) | (subR(23) ^ ROL1(dw));
+		SUBKEY(26) = t ^ sub[27]; /* round 19 */
+		SUBKEY(27) = sub[26] ^ sub[28]; /* round 20 */
+	

Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-18 Thread Herbert Xu
On Wed, Nov 14, 2007 at 02:28:25PM -0700, Denys Vlasenko wrote:
>
> I also split this patch into two parts for easier review:
> camellia5:
> adds 64-bit key setup

Sorry but this still duplicates way too much code.  Also key
setup is the slow path relatively speaking so it's even less
justifiable.

> camellia6:
> unifies encrypt/decrypt routines for different key lengths.
> This reduces module size by ~25%, with tiny (less than 1%)
> speed impact.
> Also collapses encrypt/decrypt into more readable
> (visually shorter) form using macros.

This looks pretty neat though.  I'll merge it unless I hear any
objections.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-14 Thread Denys Vlasenko
On Wednesday 14 November 2007 07:14, Herbert Xu wrote:
> On Wed, Nov 14, 2007 at 12:15:19AM -0700, Denys Vlasenko wrote:
> > Use alternative key setup implementation with mostly 64-bit ops
> > if BITS_PER_LONG >= 64. Both much smaller and much faster.
>
> Can we please not have two versions of the same algorithm in C?
> They're a pain to maintain and test.
>
> Where performance is paramount you could look at doing an assembly
> version.  Unlike two C versions at least that can be easily tested
> by someone who has access to the platform in question.

Having two versions, one in C and another in assembly cannot be easier
than two C versions. Moreover, asm version will be arch specific -
one needs to write separate amd64/ppc64/sparc64/etc versions.
It means even more versions to maintain.

It would be faster too, though, and I think it makes sense to do it
for most popular arches sometime in future.

What I have now is a generic 64-bit C implementation which is
likely to be much faster and a bit smaller than 32-bit one
on _all_ 64-bit arches. For i386 it's 33% faster.

I think this win is big enough to justify having two versions.

I think that you are right that having separate camellia_64.c
with substantial duplication is bad. I reworked it so that
both 32-bit and 64-bit code is now in camellia.c,
and I removed (merged) all duplicated stuff (constants, macros,
and whole encryption/decryption part).

I also split this patch into two parts for easier review:
camellia5:
adds 64-bit key setup
camellia6:
unifies encrypt/decrypt routines for different key lengths.
This reduces module size by ~25%, with tiny (less than 1%)
speed impact.
Also collapses encrypt/decrypt into more readable
(visually shorter) form using macros.

Compiled it on i386 and amd64:

   text    data     bss     dec     hex filename
  29724     224       0   29948    74fc 2.6.23.1.camellia.t/crypto/camellia.o
  29233     224       0   29457    7311 2.6.23.1.camellia5.t/crypto/camellia.o
  21190     224       0   21414    53a6 2.6.23.1.camellia6.t/crypto/camellia.o

  22498     288       0   22786    5902 2.6.23.1.camellia.t64/crypto/camellia.o
  21134     288       0   21422    53ae 2.6.23.1.camellia5.t64/crypto/camellia.o
  16067     288       0   16355    3fe3 2.6.23.1.camellia6.t64/crypto/camellia.o
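
As a worked check of the "~25%" figure against the dec column above (a sketch; the helper name size_reduction_percent is hypothetical): going from camellia5 to camellia6 is 29457 -> 21414 bytes on i386, about 27.3% smaller, and 21422 -> 16355 bytes on amd64, about 23.7% smaller.

```c
/* Percentage by which `after` is smaller than `before`. */
static double size_reduction_percent(unsigned before, unsigned after)
{
	return 100.0 * (double)(before - after) / (double)before;
}
```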

Takamiya-san, can you review attached patches please?

Signed-off-by: Denys Vlasenko <[EMAIL PROTECTED]>
--
vda
diff -urpN linux-2.6.23.1.camellia/crypto/camellia.c linux-2.6.23.1.camellia5/crypto/camellia.c
--- linux-2.6.23.1.camellia/crypto/camellia.c	2007-11-14 12:30:27.0 -0700
+++ linux-2.6.23.1.camellia5/crypto/camellia.c	2007-11-14 12:30:27.0 -0700
@@ -310,6 +310,589 @@ static const u32 camellia_sp4404[256] = 
 #define CAMELLIA_BLOCK_SIZE  16
 #define CAMELLIA_TABLE_BYTE_LEN 272
 
+/*
+ * NB: L and R below stand for 'left' and 'right' as in written numbers.
+ * That is, in (xxxL,xxxR) pair xxxL holds most significant digits,
+ * _not_ least significant ones!
+ */
+
+
+
+#if BITS_PER_LONG >= 64
+
+/*
+ * Key setup implementation with mostly 64-bit ops
+ */
+
+/* key constants */
+
+#define CAMELLIA_SIGMA1 (0xA09E667F3BCC908B)
+#define CAMELLIA_SIGMA2 (0xB67AE8584CAA73B2)
+#define CAMELLIA_SIGMA3 (0xC6EF372FE94F82BE)
+#define CAMELLIA_SIGMA4 (0x54FF53A5F1D36F1C)
+#define CAMELLIA_SIGMA5 (0x10E527FADE682D1D)
+#define CAMELLIA_SIGMA6 (0xB05688C2B3E6C1FD)
+
+/*
+ *  macros
+ */
+#define GETU64(v, pt) \
+do { \
+	/* latest breed of gcc is clever enough to use move */ \
+	memcpy(&(v), (pt), 8); \
+	(v) = be64_to_cpu(v); \
+} while(0)
+
+/* rotation right shift 1byte */
+#define ROR8(x) (((x) >> 8) + ((x) << (sizeof(x)*8 - 8)))
+/* rotation left shift 1bit */
+#define ROL1(x) (((x) << 1) + ((x) >> (sizeof(x)*8 - 1)))
+/* rotation left shift 1byte */
+#define ROL8(x) (((x) << 8) + ((x) >> (sizeof(x)*8 - 8)))
+
+#define ROLDQ(l, r, w, bits)\
+do {		\
+	w = l;		\
+	l = (l << bits) + (r >> (64 - bits));		\
+	r = (r << bits) + (w >> (64 - bits));		\
+} while(0)
+
+#define CAMELLIA_F(x, k, y, i)	\
+do {			\
+	u32 yl, yr;		\
+	i = x ^ k;		\
+	yl = camellia_sp1110[(u8)i]\
+	   ^ camellia_sp0222[(u8)(i >> 24)]			\
+	   ^ camellia_sp3033[(u8)(i >> 16)]			\
+	   ^ camellia_sp4404[(u8)(i >> 8)];			\
+	yr = camellia_sp1110[(i >> 56)]			\
+	   ^ camellia_sp0222[(u8)(i >> 48)]			\
+	   ^ camellia_sp3033[(u8)(i >> 40)]			\
+	   ^ camellia_sp4404[(u8)(i >> 32)];			\
+	yl ^= yr;		\
+	yr = ROR8(yr);		\
+	yr ^= yl;		\
+	y = ((u64)yl << 32) + yr;\
+} while(0)
+
+#define SUBKEY(INDEX) (subkey[(INDEX)])
+
+#ifdef __BIG_ENDIAN
+#define SUBKEY_L(INDEX) (((u32*)subkey)[(INDEX)*2])
+#define SUBKEY_R(INDEX) (((u32*)subkey)[(INDEX)*2 + 1])
+#else
+#define SUBKEY_L(INDEX) (((u32*)subkey)[(INDEX)*2 + 1])
+#define SUBKEY_R(INDEX) (((u32*)subkey)[(INDEX)*2])
+#endif
+
+static void camellia_setup_tail(u64 *subkey, int max)

Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-14 Thread Herbert Xu
On Wed, Nov 14, 2007 at 12:15:19AM -0700, Denys Vlasenko wrote:
>
> Use alternative key setup implementation with mostly 64-bit ops
> if BITS_PER_LONG >= 64. Both much smaller and much faster.

Can we please not have two versions of the same algorithm in C?
They're a pain to maintain and test.

Where performance is paramount you could look at doing an assembly
version.  Unlike two C versions at least that can be easily tested
by someone who has access to the platform in question.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-13 Thread Denys Vlasenko
On Tuesday 13 November 2007 23:10, David Miller wrote:
> From: Denys Vlasenko <[EMAIL PROTECTED]>
> Date: Tue, 13 Nov 2007 22:30:47 -0700
>
> > On Tuesday 13 November 2007 20:49, David Miller wrote:
> > > From: Denys Vlasenko <[EMAIL PROTECTED]>
> > > Date: Tue, 13 Nov 2007 19:47:08 -0700
> > >
> > > > If CONFIG_CC_OPTIMIZE_FOR_SIZE is not an acceptable method,
> > > > do you have other ideas?
> > >
> > > Look at ways to make the code run faster without loop unrolling?
> >
> > I did it. I noticed that key setup is mostly operating on 64-bit
> > quantities, and provided alternative implementation which
> > exploits that fact. It's smaller and faster.
>
> Great, then you don't have to unroll the loop and performance
> is at least as good as before _and_ you save code space.

Unfortunately, it's applicable only to key setup,
and unrolling happens in actual encryption.

But the point still stands: irrespective of other optimizations,
unrolled and non-unrolled forms will still have different sizes
and speeds, and in some cases (like this one) you can't
pick one form which fits all.

> Please submit this new version :-)

Just did it. It's linux-2.6.23.1.camellia5.diff
--
vda


Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-13 Thread Denys Vlasenko
On Tuesday 13 November 2007 22:30, Denys Vlasenko wrote:
> I will resubmit the patch without de-unrolling.
> Meanwhile, I'd like to ask you guys to think about ways
> to make size/speed tradeoffs selectable at build time.

Here is the patch which has loops still unrolled,
but otherwise unchanged.

Description:

Use alternative key setup implementation with mostly 64-bit ops
if BITS_PER_LONG >= 64. Both much smaller and much faster.

Unify camellia_en/decrypt128/256 into camellia_do_en/decrypt.
The code was similar; with just one additional if() we can use the same code.

Replace (x & 0xff) with (u8)x, gcc is not smart enough to realize
that it can do (x & 0xff) this way (which is smaller at least on i386).

Don't do (x & 0xff) in a few places where x cannot be > 255 anyway:
t1 = ir >> 16; v = camellia_sp0222[(t1 >> 8) & 0xff];
ir is u32, thus (t1 >> 8) is one byte!
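
A small sketch of the two points above, using local typedefs in place of the kernel's u8/u32 (the function names are hypothetical): the cast and the mask yield the same value, and the top byte of a u32 needs neither.

```c
#include <stdint.h>

typedef uint8_t u8;
typedef uint32_t u32;

/* Masking form, as in the old code. */
static u32 low_byte_mask(u32 x)
{
	return x & 0xff;
}

/* Cast form: conversion to u8 keeps only the low 8 bits, so the value is the same. */
static u32 low_byte_cast(u32 x)
{
	return (u8)x;
}

/* Top byte of a u32: after a total shift of 24 bits nothing above bit 7 remains. */
static u32 top_byte(u32 ir)
{
	u32 t1 = ir >> 16;	/* t1 < 0x10000 */

	return t1 >> 8;		/* < 0x100, no mask or cast needed */
}
```

Whether the cast form actually produces smaller code depends on the compiler and target; the sizes quoted in this thread are for gcc on i386.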

Signed-off-by: Denys Vlasenko <[EMAIL PROTECTED]>
--
vda
diff -urpN linux-2.6.23.1.camellia/crypto/camellia.c linux-2.6.23.1.camellia5/crypto/camellia.c
--- linux-2.6.23.1.camellia/crypto/camellia.c	2007-11-13 22:47:28.0 -0700
+++ linux-2.6.23.1.camellia5/crypto/camellia.c	2007-11-13 22:57:54.0 -0700
@@ -36,6 +36,13 @@
 #include 
 #include 
 
+#if BITS_PER_LONG >= 64
+
+/* Use alternative implementation with mostly 64-bit ops */
+#include "camellia_64.c"
+
+#else
+
 static const u32 camellia_sp1110[256] = {
 	0x70707000,0x82828200,0x2c2c2c00,0xececec00,
 	0xb3b3b300,0x27272700,0xc0c0c000,0xe5e5e500,
@@ -329,7 +336,6 @@ static const u32 camellia_sp4404[256] = 
 /*
  *  macros
  */
-
 # define GETU32(v, pt) \
 do { \
 	/* latest breed of gcc is clever enough to use move */ \
@@ -364,63 +370,28 @@ static const u32 camellia_sp4404[256] = 
 } while(0)
 
 
+/*
+ * Key setup
+ */
 #define CAMELLIA_F(xl, xr, kl, kr, yl, yr, il, ir, t0, t1)	\
 do {			\
 	il = xl ^ kl;		\
 	ir = xr ^ kr;		\
 	t0 = il >> 16;		\
 	t1 = ir >> 16;		\
-	yl = camellia_sp1110[ir & 0xff]\
-	   ^ camellia_sp0222[(t1 >> 8) & 0xff]			\
-	   ^ camellia_sp3033[t1 & 0xff]\
-	   ^ camellia_sp4404[(ir >> 8) & 0xff];			\
-	yr = camellia_sp1110[(t0 >> 8) & 0xff]			\
-	   ^ camellia_sp0222[t0 & 0xff]\
-	   ^ camellia_sp3033[(il >> 8) & 0xff]			\
-	   ^ camellia_sp4404[il & 0xff];			\
+	yl = camellia_sp1110[(u8)(ir )]			\
+	   ^ camellia_sp0222[(t1 >> 8)]			\
+	   ^ camellia_sp3033[(u8)(t1 )]			\
+	   ^ camellia_sp4404[(u8)(ir >> 8)];			\
+	yr = camellia_sp1110[(t0 >> 8)]			\
+	   ^ camellia_sp0222[(u8)(t0 )]			\
+	   ^ camellia_sp3033[(u8)(il >> 8)]			\
+	   ^ camellia_sp4404[(u8)(il )];			\
 	yl ^= yr;		\
 	yr = ROR8(yr);		\
 	yr ^= yl;		\
 } while(0)
 
-
-/*
- * for speed up
- *
- */
-#define CAMELLIA_FLS(ll, lr, rl, rr, kll, klr, krl, krr, t0, t1, t2, t3) \
-do {\
-	t0 = kll;			\
-	t2 = krr;			\
-	t0 &= ll;			\
-	t2 |= rr;			\
-	rl ^= t2;			\
-	lr ^= ROL1(t0);			\
-	t3 = krl;			\
-	t1 = klr;			\
-	t3 &= rl;			\
-	t1 |= lr;			\
-	ll ^= t1;			\
-	rr ^= ROL1(t3);			\
-} while(0)
-
-#define CAMELLIA_ROUNDSM(xl, xr, kl, kr, yl, yr, il, ir, t0, t1)	\
-do {\
-	ir =  camellia_sp1110[xr & 0xff];\
-	il =  camellia_sp1110[(xl>>24) & 0xff];\
-	ir ^= camellia_sp0222[(xr>>24) & 0xff];\
-	il ^= camellia_sp0222[(xl>>16) & 0xff];\
-	ir ^= camellia_sp3033[(xr>>16) & 0xff];\
-	il ^= camellia_sp3033[(xl>>8) & 0xff];\
-	ir ^= camellia_sp4404[(xr>>8) & 0xff];\
-	il ^= camellia_sp4404[xl & 0xff];\
-	il ^= kl;			\
-	ir ^= il ^ kr;			\
-	yl ^= ir;			\
-	yr ^= ROR8(il) ^ ir;		\
-} while(0)
-
-
 #define SUBKEY_L(INDEX) (subkey[(INDEX)*2])
 #define SUBKEY_R(INDEX) (subkey[(INDEX)*2 + 1])
 
@@ -622,7 +593,7 @@ static void camellia_setup128(const unsi
 	SUBKEY_L(6) = subL[5] ^ subL[7]; /* round 5 */
 	SUBKEY_R(6) = subR[5] ^ subR[7];
 	tl = subL[10] ^ (subR[10] & ~subR[8]);
-	dw = tl & subL[8],  /* FL(kl1) */
+	dw = tl & subL[8];  /* FL(kl1) */
 		tr = subR[10] ^ ROL1(dw);
 	SUBKEY_L(7) = subL[6] ^ tl; /* round 6 */
 	SUBKEY_R(7) = subR[6] ^ tr;
@@ -1000,400 +971,150 @@ static void camellia_setup192(const unsi
 }
 
 
-static void camellia_encrypt128(const u32 *subkey, u32 *io_text)
-{
-	u32 il,ir,t0,t1;   /* temporary variables */
-
-	u32 io[4];
-
-	/* pre whitening but absorb kw2 */
-	io[0] = io_text[0] ^ SUBKEY_L(0);
-	io[1] = io_text[1] ^ SUBKEY_R(0);
-	io[2] = io_text[2];
-	io[3] = io_text[3];
-
-	/* main iteration */
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(6),SUBKEY_R(6),

Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-13 Thread David Miller
From: Denys Vlasenko <[EMAIL PROTECTED]>
Date: Tue, 13 Nov 2007 22:30:47 -0700

> On Tuesday 13 November 2007 20:49, David Miller wrote:
> > From: Denys Vlasenko <[EMAIL PROTECTED]>
> > Date: Tue, 13 Nov 2007 19:47:08 -0700
> >
> > > If CONFIG_CC_OPTIMIZE_FOR_SIZE is not an acceptable method,
> > > do you have other ideas?
> >
> > Look at ways to make the code run faster without loop unrolling?
> 
> I did it. I noticed that key setup is mostly operating on 64-bit
> quantities, and provided alternative implementation which
> exploits that fact. It's smaller and faster.

Great, then you don't have to unroll the loop and performance
is at least as good as before _and_ you save code space.

It's perfect, you don't need compile time checks or anything
silly like that.

Please submit this new version :-)


Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-13 Thread Denys Vlasenko
On Tuesday 13 November 2007 20:49, David Miller wrote:
> From: Denys Vlasenko <[EMAIL PROTECTED]>
> Date: Tue, 13 Nov 2007 19:47:08 -0700
>
> > If CONFIG_CC_OPTIMIZE_FOR_SIZE is not an acceptable method,
> > do you have other ideas?
>
> Look at ways to make the code run faster without loop unrolling?

I did it. I noticed that key setup is mostly operating on 64-bit
quantities, and provided alternative implementation which
exploits that fact. It's smaller and faster.
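
To illustrate why 64-bit operations help here, a sketch of a 128-bit left rotation (the kind of operation ROLDQ performs in key setup) written both ways. The function names roldq64 and roldq32 are hypothetical and this is not the code from the patch.

```c
#include <stdint.h>

/* Rotate the 128-bit value (l:r) left by bits, 0 < bits < 64, using two 64-bit halves. */
static void roldq64(uint64_t *l, uint64_t *r, unsigned bits)
{
	uint64_t w = *l;

	*l = (*l << bits) | (*r >> (64 - bits));
	*r = (*r << bits) | (w >> (64 - bits));
}

/* Same rotation on four 32-bit words, v[0] most significant; 0 < bits < 32. */
static void roldq32(uint32_t v[4], unsigned bits)
{
	uint32_t w = v[0];

	v[0] = (v[0] << bits) | (v[1] >> (32 - bits));
	v[1] = (v[1] << bits) | (v[2] >> (32 - bits));
	v[2] = (v[2] << bits) | (v[3] >> (32 - bits));
	v[3] = (v[3] << bits) | (w >> (32 - bits));
}
```

The 64-bit form needs half as many shift/or pairs and half as many temporaries, which is consistent with the smaller key setup reported for 64-bit builds.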

However, after I've done that, the question still stands:
should I unroll the loop or not?

The situation we are in now is exactly the situation I want to
avoid:

On Wednesday 07 November 2007 06:22, Denys Vlasenko wrote:
> > Having two versions of the code is unmaintainable.  So please
> > either decide that 5% is worth it or isn't.
>
> *I* am happy with 5% speed sacrifice. I'm afraid other people won't be.
>
> I just want to escape vicious cycle of -Os people arguing with
> -O2 people to no end. I don't want somebody to come later
> and unroll the loop again. And then me to come
> and de-unroll it again...
>
> It's better for everybody to recognize that both POVs are valid,
> and have provisions for tuning size/speed tradeoff by the user
> (person which builds the binary).

That's why I made a patch where unrolling can be enabled by CONFIG_xxx.
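
A minimal sketch of what build-time selection could look like, assuming the CONFIG_CC_OPTIMIZE_FOR_SIZE symbol discussed in this thread. mix_round is a hypothetical stand-in for a cipher round, not the Camellia round function, and six_rounds is not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for one cipher round; not the Camellia round function. */
static uint32_t mix_round(uint32_t x, uint32_t k)
{
	x ^= k;
	return (x << 5) | (x >> 27);
}

/* Apply six rounds with subkeys k[0..5]; the form is chosen at build time. */
static uint32_t six_rounds(uint32_t x, const uint32_t k[6])
{
#ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
	/* Compact form: a single copy of the round body. */
	int i;

	for (i = 0; i < 6; i++)
		x = mix_round(x, k[i]);
#else
	/* Unrolled form: six copies of the round body, no loop overhead. */
	x = mix_round(x, k[0]);
	x = mix_round(x, k[1]);
	x = mix_round(x, k[2]);
	x = mix_round(x, k[3]);
	x = mix_round(x, k[4]);
	x = mix_round(x, k[5]);
#endif
	return x;
}
```

Both branches compute the same result, so only one algorithm has to be verified; the cost is that both forms must be kept in sync, which is the maintenance concern raised in the replies.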

I will resubmit the patch without de-unrolling.
Meanwhile, I'd like to ask you guys to think about ways
to make size/speed tradeoffs selectable at build time.
--
vda


Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-13 Thread Noriaki TAKAMIYA
Hi,

>> Tue, 13 Nov 2007 19:47:08 -0700
>> [Subject: Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 
>> 64bit-ization]
>> Denys Vlasenko <[EMAIL PROTECTED]> wrote...

> On Tuesday 13 November 2007 18:41, David Miller wrote:
> > From: Denys Vlasenko <[EMAIL PROTECTED]>
> > Date: Tue, 13 Nov 2007 15:34:33 -0700
> >
> > > My preferred solution is to make loop unrolling conditional on
> > > CONFIG_CC_OPTIMIZE_FOR_SIZE - and this is what is done in my
> > > (first) patch (see attached). This part:
> >
> > The default build is going to be CONFIG_CC_OPTIMIZE_FOR_SIZE
> > basically for everyone, this is what people get by default
> > and this is what every distribution uses.
> >
> > Therefore %99. of folks will get the slowdown.
> >
> > So in my book this is not an acceptable way to deal with
> > this problem.
> 
> Loop unrolling here amounts to 25% code growth:
> 
>    text    data     bss     dec     hex filename
>   21714       0       0   21714    54d2 camellia5.o
>   15906       0       0   15906    3e22 camellia5_Os.o
> 
> Saving 25% of code size and going 5% slower is a perfectly acceptable
> tradeoff for some users. NB: I'm not saying all, but some significant
> part of users would like to be able to have this choice.

  IMHO, if you are going to use camellia on the embedded system, size
  of code will be important.

  On the other hand, I think typically the CPU performance is
  restricted on the embedded system, so the performance of code will
  be important...

  I'm not sure whether a 5% slowdown is important or not. It will depend
  on the system.

  Regards,

--
Noriaki TAKAMIYA


Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-13 Thread David Miller
From: Denys Vlasenko <[EMAIL PROTECTED]>
Date: Tue, 13 Nov 2007 19:47:08 -0700

> If CONFIG_CC_OPTIMIZE_FOR_SIZE is not an acceptable method,
> do you have other ideas?

Look at ways to make the code run faster without loop unrolling?


Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-13 Thread Denys Vlasenko
On Tuesday 13 November 2007 18:41, David Miller wrote:
> From: Denys Vlasenko <[EMAIL PROTECTED]>
> Date: Tue, 13 Nov 2007 15:34:33 -0700
>
> > My preferred solution is to make loop unrolling conditional on
> > CONFIG_CC_OPTIMIZE_FOR_SIZE - and this is what is done in my
> > (first) patch (see attached). This part:
>
> The default build is going to be CONFIG_CC_OPTIMIZE_FOR_SIZE
> basically for everyone, this is what people get by default
> and this is what every distribution uses.
>
> Therefore %99. of folks will get the slowdown.
>
> So in my book this is not an acceptable way to deal with
> this problem.

Loop unrolling here amounts to 25% code growth:

   text    data     bss     dec     hex filename
  21714       0       0   21714    54d2 camellia5.o
  15906       0       0   15906    3e22 camellia5_Os.o

Saving 25% of code size and going 5% slower is a perfectly acceptable
tradeoff for some users. NB: I'm not saying all, but some significant
part of users would like to be able to have this choice.

If CONFIG_CC_OPTIMIZE_FOR_SIZE is not an acceptable method,
do you have other ideas?
--
vda


Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-13 Thread David Miller
From: Denys Vlasenko <[EMAIL PROTECTED]>
Date: Tue, 13 Nov 2007 15:34:33 -0700

> My preferred solution is to make loop unrolling conditional on
> CONFIG_CC_OPTIMIZE_FOR_SIZE - and this is what is done in my
> (first) patch (see attached). This part:

The default build is going to be CONFIG_CC_OPTIMIZE_FOR_SIZE
basically for everyone, this is what people get by default
and this is what every distribution uses.

Therefore %99. of folks will get the slowdown.

So in my book this is not an acceptable way to deal with
this problem.


Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-12 Thread Noriaki TAKAMIYA
Hi,

  sorry, again.

>> Tue, 13 Nov 2007 15:07:02 +0900 (JST) 
>> [Subject: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 
>> 64bit-ization] 
>> Noriaki TAKAMIYA <[EMAIL PROTECTED]> wrote...

> > I'd like to hear the opinion of the author.
> > 
> > Takamiya-san, what do you think about this change?
> 
>   For IPsec processing, I think performance is important.
> 
>   If this fix improves the performance, it is acceptable.

  I misunderstood the meaning. If this fix decreases the performance,
  I would not prefer this patch (and the point below is also one of
  the reasons).

>   But, there are many duplicate declarations between camellia.c and
>   camellia_64.c...
>   (e.g., CAMELLIA_MIN_KEY_SIZE and so on)

  Regards,

--
Noriaki TAKAMIYA


Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-12 Thread Noriaki TAKAMIYA
Hi,

  Sorry for the late reply.

>> Thu, 8 Nov 2007 21:30:20 +0800
>> [Subject: Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization]
>> Herbert Xu <[EMAIL PROTECTED]> wrote...

> On Wed, Nov 07, 2007 at 01:22:52PM +, Denys Vlasenko wrote:
> >
> > *I* am happy with 5% speed sacrifice. I'm afraid other people won't be.
> 
> I'd like to hear the opinion of the author.
> 
> Takamiya-san, what do you think about this change?

  For IPsec processing, I think performance is important.

  If this fix improves the performance, it is acceptable.

  But, there are many duplicate declarations between camellia.c and
  camellia_64.c...
  (e.g., CAMELLIA_MIN_KEY_SIZE and so on)

  Regards,

--
Noriaki TAKAMIYA


Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-08 Thread Herbert Xu
On Wed, Nov 07, 2007 at 01:22:52PM +, Denys Vlasenko wrote:
>
> *I* am happy with 5% speed sacrifice. I'm afraid other people won't be.

I'd like to hear the opinion of the author.

Takamiya-san, what do you think about this change?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-07 Thread Denys Vlasenko
On Tuesday 06 November 2007 14:23, Herbert Xu wrote:
> On Thu, Oct 25, 2007 at 12:48:29PM +0100, Denys Vlasenko wrote:
> > On Thursday 25 October 2007 12:43, Denys Vlasenko wrote:
> > > Hi Herbert,
> > > 
> > > Please review and maybe propagate upstream following patches.
> > > 
> > > camellia5.diff
> > > Use alternative key setup implementation with mostly 64-bit ops
> > > if BITS_PER_LONG >= 64. Both much smaller and much faster.
> > > 
> > > Unify camellia_en/decrypt128/256 into camellia_do_en/decrypt.
> > > Code was similar, with just one additional if() we can use the same code.
> > > 
> > > If CONFIG_CC_OPTIMIZE_FOR_SIZE is defined,
> > > use loop in camellia_do_en/decrypt instead of unrolled code.
> > > ~5% encrypt/decrypt slowdown.
> 
> Having two versions of the code is unmaintainable.  So please
> either decide that 5% is worth it or isn't.

*I* am happy with 5% speed sacrifice. I'm afraid other people won't be.

I just want to escape the vicious cycle of -Os people arguing with
-O2 people to no end. I don't want somebody to come along later
and unroll the loop again, and then for me to come
and de-unroll it again...

It's better for everybody to recognize that both POVs are valid,
and to have provisions for tuning the size/speed tradeoff by the user
(the person who builds the binary).

That's why I made a patch where unrolling can be enabled by CONFIG_xxx.
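
As a rough illustration of that kind of selectability (a toy sketch, not
the attached patch): the same rounds can be written once as a loop and
once unrolled, with a config symbol picking one at build time. Here
toy_round(), TOY_ROUNDS and the toy_encrypt* names are hypothetical
stand-ins and do not implement Camellia. Only the
CONFIG_CC_OPTIMIZE_FOR_SIZE symbol comes from the discussion.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_ROUNDS 6

/* Hypothetical stand-in for a cipher round; NOT the real Camellia round. */
static uint32_t toy_round(uint32_t x, uint32_t k)
{
	x ^= k;
	return (x << 5) | (x >> 27);	/* rotate left by 5 bits */
}

/* Rolled variant: one copy of the round body, smallest code. */
static uint32_t toy_encrypt_rolled(uint32_t x, const uint32_t k[TOY_ROUNDS])
{
	int i;

	for (i = 0; i < TOY_ROUNDS; i++)
		x = toy_round(x, k[i]);
	return x;
}

/* Unrolled variant: no loop overhead, but six copies of the round body. */
static uint32_t toy_encrypt_unrolled(uint32_t x, const uint32_t k[TOY_ROUNDS])
{
	x = toy_round(x, k[0]);
	x = toy_round(x, k[1]);
	x = toy_round(x, k[2]);
	x = toy_round(x, k[3]);
	x = toy_round(x, k[4]);
	x = toy_round(x, k[5]);
	return x;
}

/* Build-time selection: whoever builds the binary picks size or speed. */
#ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
#define toy_encrypt toy_encrypt_rolled
#else
#define toy_encrypt toy_encrypt_unrolled
#endif

/* Sanity check: both variants must compute the same result. */
static void toy_selfcheck(uint32_t x, const uint32_t k[TOY_ROUNDS])
{
	assert(toy_encrypt_rolled(x, k) == toy_encrypt_unrolled(x, k));
}
```

The cost of this approach is the one raised in review: two code paths
that must be kept in sync.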

If you disagree with me and don't want this type of selectability,
the updated patch is attached.

Thanks!
--
vda
--- linux-2.6.23.src/crypto/camellia4.c	2007-10-24 19:03:57.0 +0100
+++ linux-2.6.23.src/crypto/camellia.c	2007-11-07 13:06:48.0 +
@@ -36,6 +36,13 @@
 #include 
 #include 
 
+#if BITS_PER_LONG >= 64
+
+/* Use alternative implementation with mostly 64-bit ops */
+#include "camellia_64.c"
+
+#else
+
 static const u32 camellia_sp1110[256] = {
 	0x70707000,0x82828200,0x2c2c2c00,0xececec00,
 	0xb3b3b300,0x27272700,0xc0c0c000,0xe5e5e500,
@@ -329,7 +336,6 @@ static const u32 camellia_sp4404[256] = 
 /*
  *  macros
  */
-
 # define GETU32(v, pt) \
 do { \
 	/* latest breed of gcc is clever enough to use move */ \
@@ -364,63 +370,28 @@ static const u32 camellia_sp4404[256] = 
 } while(0)
 
 
+/*
+ * Key setup
+ */
 #define CAMELLIA_F(xl, xr, kl, kr, yl, yr, il, ir, t0, t1)	\
 do {			\
 	il = xl ^ kl;		\
 	ir = xr ^ kr;		\
 	t0 = il >> 16;		\
 	t1 = ir >> 16;		\
-	yl = camellia_sp1110[ir & 0xff]\
-	   ^ camellia_sp0222[(t1 >> 8) & 0xff]			\
-	   ^ camellia_sp3033[t1 & 0xff]\
-	   ^ camellia_sp4404[(ir >> 8) & 0xff];			\
-	yr = camellia_sp1110[(t0 >> 8) & 0xff]			\
-	   ^ camellia_sp0222[t0 & 0xff]\
-	   ^ camellia_sp3033[(il >> 8) & 0xff]			\
-	   ^ camellia_sp4404[il & 0xff];			\
+	yl = camellia_sp1110[(u8)(ir )]			\
+	   ^ camellia_sp0222[(t1 >> 8)]			\
+	   ^ camellia_sp3033[(u8)(t1 )]			\
+	   ^ camellia_sp4404[(u8)(ir >> 8)];			\
+	yr = camellia_sp1110[(t0 >> 8)]			\
+	   ^ camellia_sp0222[(u8)(t0 )]			\
+	   ^ camellia_sp3033[(u8)(il >> 8)]			\
+	   ^ camellia_sp4404[(u8)(il )];			\
 	yl ^= yr;		\
 	yr = ROR8(yr);		\
 	yr ^= yl;		\
 } while(0)
 
-
-/*
- * for speed up
- *
- */
-#define CAMELLIA_FLS(ll, lr, rl, rr, kll, klr, krl, krr, t0, t1, t2, t3) \
-do {\
-	t0 = kll;			\
-	t2 = krr;			\
-	t0 &= ll;			\
-	t2 |= rr;			\
-	rl ^= t2;			\
-	lr ^= ROL1(t0);			\
-	t3 = krl;			\
-	t1 = klr;			\
-	t3 &= rl;			\
-	t1 |= lr;			\
-	ll ^= t1;			\
-	rr ^= ROL1(t3);			\
-} while(0)
-
-#define CAMELLIA_ROUNDSM(xl, xr, kl, kr, yl, yr, il, ir, t0, t1)	\
-do {\
-	ir =  camellia_sp1110[xr & 0xff];\
-	il =  camellia_sp1110[(xl>>24) & 0xff];\
-	ir ^= camellia_sp0222[(xr>>24) & 0xff];\
-	il ^= camellia_sp0222[(xl>>16) & 0xff];\
-	ir ^= camellia_sp3033[(xr>>16) & 0xff];\
-	il ^= camellia_sp3033[(xl>>8) & 0xff];\
-	ir ^= camellia_sp4404[(xr>>8) & 0xff];\
-	il ^= camellia_sp4404[xl & 0xff];\
-	il ^= kl;			\
-	ir ^= il ^ kr;			\
-	yl ^= ir;			\
-	yr ^= ROR8(il) ^ ir;		\
-} while(0)
-
-
 #define SUBKEY_L(INDEX) (subkey[(INDEX)*2])
 #define SUBKEY_R(INDEX) (subkey[(INDEX)*2 + 1])
 
@@ -622,7 +593,7 @@ static void camellia_setup128(const unsi
 	SUBKEY_L(6) = subL[5] ^ subL[7]; /* round 5 */
 	SUBKEY_R(6) = subR[5] ^ subR[7];
 	tl = subL[10] ^ (subR[10] & ~subR[8]);
-	dw = tl & subL[8],  /* FL(kl1) */
+	dw = tl & subL[8];  /* FL(kl1) */
 		tr = subR[10] ^ ROL1(dw);
 	SUBKEY_L(7) = subL[6] ^ tl; /* round 6 */
 	SUBKEY_R(7) = subR[6] ^ tr;
@@ -1000,400 +971,144 @@ static void camellia_setup192(const unsi
 }
 
 
-static void camellia_encrypt128(const u32 *subkey, u32 *io_text)
-{
-	u32 il,ir,t0,t1;   /* temporary variables */
-
-	u32 io[4];
-
-	/* pre whitening but absorb kw2 */
-	io[0] = io_text[0] ^ SUBKEY_L(0);
-	io[1] = io_text[1] ^ SUBKEY_R(0);
-	io[2] = io_text[2];
-	io[3] = io_text[3];
-
-	/* main iteration */
-	CAMEL

Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-11-06 Thread Herbert Xu
On Thu, Oct 25, 2007 at 12:48:29PM +0100, Denys Vlasenko wrote:
> On Thursday 25 October 2007 12:43, Denys Vlasenko wrote:
> > Hi Herbert,
> > 
> > Please review and maybe propagate upstream following patches.
> > 
> > camellia5.diff
> > Use alternative key setup implementation with mostly 64-bit ops
> > if BITS_PER_LONG >= 64. Both much smaller and much faster.
> > 
> > Unify camellia_en/decrypt128/256 into camellia_do_en/decrypt.
> > Code was similar, with just one additional if() we can use the same code.
> > 
> > If CONFIG_CC_OPTIMIZE_FOR_SIZE is defined,
> > use loop in camellia_do_en/decrypt instead of unrolled code.
> > ~5% encrypt/decrypt slowdown.

Having two versions of the code is unmaintainable.  So please
either decide that 5% is worth it or isn't.

The rest of this patch looks fine.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization

2007-10-26 Thread Noriaki TAKAMIYA
>> Thu, 25 Oct 2007 12:48:29 +0100
>> [Subject: [PATCH 5/5] camellia: de-unrolling, 64bit-ization]
>> Denys Vlasenko <[EMAIL PROTECTED]> wrote...

> On Thursday 25 October 2007 12:43, Denys Vlasenko wrote:
> > Hi Herbert,
> > 
> > Please review and maybe propagate upstream following patches.
> > 
> > camellia5.diff
> > Use alternative key setup implementation with mostly 64-bit ops
> > if BITS_PER_LONG >= 64. Both much smaller and much faster.
> > 
> > Unify camellia_en/decrypt128/256 into camellia_do_en/decrypt.
> > Code was similar, with just one additional if() we can use the same code.
> > 
> > If CONFIG_CC_OPTIMIZE_FOR_SIZE is defined,
> > use loop in camellia_do_en/decrypt instead of unrolled code.
> > ~5% encrypt/decrypt slowdown.
> > 
> > Replace (x & 0xff) with (u8)x, gcc is not smart enough to realize
> > that it can do (x & 0xff) this way (which is smaller at least on i386).
> > 
> > Don't do (x & 0xff) in a few places where x cannot be > 255 anyway:
> > t1 = ir >> 16; v = camellia_sp0222[(t1 >> 8) & 0xff];
> > ir is u32, thus (t1 >> 8) is one byte!
> 
> Signed-off-by: Denys Vlasenko <[EMAIL PROTECTED]>
> --
> vda
Acked-by: Noriaki TAKAMIYA <[EMAIL PROTECTED]>
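
The two byte-extraction points in the quoted description can be checked
with a small sketch. The typedefs stand in for the kernel's u8/u32, and
the function names are invented for this example. Whether the cast
actually produces smaller code depends on the compiler and target, and
that part is not shown here.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t  u8;	/* stand-ins for the kernel's u8/u32 types */
typedef uint32_t u32;

/* Low byte via an explicit mask, as in the pre-patch macros. */
static u32 low_byte_mask(u32 x)
{
	return x & 0xff;
}

/* Low byte via a truncating cast, as in the patched macros. */
static u32 low_byte_cast(u32 x)
{
	return (u8)x;
}

/*
 * Top byte of a u32: after ">> 16" only 16 significant bits remain,
 * so a further ">> 8" leaves at most 8 bits and needs no "& 0xff".
 */
static u32 top_byte(u32 x)
{
	u32 t = x >> 16;

	assert((t >> 8) <= 0xff);	/* always holds for a 32-bit input */
	return t >> 8;
}
```

For any u32 value the mask and the cast return the same result, so the
substitution in the table lookups does not change behaviour.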

--
Noriaki TAKAMIYA