Re: [U-Boot] [PATCH] crc32: Correct endianness of crc32 result

2013-04-18 Thread Albert ARIBAUD
Hi Simon,

On Wed, 17 Apr 2013 13:59:48 -0700, Simon Glass s...@chromium.org
wrote:

 Hi Albert,
 
 On Wed, Apr 17, 2013 at 12:23 PM, Albert ARIBAUD
 albert.u.b...@aribaud.net wrote:
  Hi Simon -- and sorry for the dupe.
 
  On Wed, 17 Apr 2013 11:28:07 -0700, Simon Glass s...@chromium.org
  wrote:
 
  I tried using:
 
  #ifdef USE_HOSTCC
 crc = htobe32(crc);
  #else
 crc = cpu_to_be32(crc);
  #endif
 memcpy(output, crc, sizeof(crc));
 
 
  This is one instruction (4 bytes, 16%) smaller, but I suspect quite a
  lot slower due to the overhead of a very small memcpy().
 
  43e2c1d8: e28d1008 add r1, sp, #8
  43e2c1dc: e3a02004 mov r2, #4
  43e2c1e0: e6bf0f30 rev r0, r0
  43e2c1e4: e5210004 str r0, [r1, #-4]!
  43e2c1e8: e1a4 mov r0, r4
  43e2c1ec: eb001af7 bl 43e32dd0 memcpy
 
  How about replacing the memcpy with an explicit put_unaligned(),
  similar to what was done in
 
  http://www.mail-archive.com/u-boot@lists.denx.de/msg109555.html
 
  with get_unaligned()? The code will be longer than above, but shorter
  than the above plus the memcpy(), and faster too -- actually, I'm
  surprised that the compiler does not unroll the memcpy() on its own,
  considering the size argument is a constant.
 
 Do you mean like this?
 
 #ifdef USE_HOSTCC
crc = htobe32(crc);
memcpy(output, crc, sizeof(crc));
 #else
crc = cpu_to_be32(crc);
put_unaligned(crc, (uint32_t *)output);
 #endif
 
 This produces the same code as my original patch. If this is
 acceptable then I will do that, although it doesn't really seem any
 better.

Wolfgang may not like it any more than put_unaligned_be32() as it
builds upon it (and the disk patch could have used the be32 version as
well). Personally, I think we should allow and use these...

... and work on optimizing their implementation for ARM; we should be
able to reduce such put()s and get()s to a few instructions. And to
avoid any misunderstanding, yes, I volunteer for the optimizing work. :)

 Regards,
 Simon
 
 [and sorry for my dup]

Actually I'm the culprit.

Amicalement,
-- 
Albert.
___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot


Re: [U-Boot] [PATCH] crc32: Correct endianness of crc32 result

2013-04-18 Thread Wolfgang Denk
Dear Albert ARIBAUD,

In message 20130418082027.4b5ea191@lilith you wrote:
 
  #ifdef USE_HOSTCC
 crc = htobe32(crc);
 memcpy(output, crc, sizeof(crc));
  #else
 crc = cpu_to_be32(crc);
 put_unaligned(crc, (uint32_t *)output);
  #endif
  
  This produces the same code as my original patch. If this is
  acceptable then I will do that, although it doesn't really seem any
  better.
 
 Wolfgang may not like it any more than put_unaligned_be32() as it
 builds upon it (and the disk patch could have used the be32 version as

Indeed.

 well). Personally, I think we should allow and use these...
 
 ... and work on optimizing their implementation for ARM; we should be
 able to reduce such put()s and get()s to a few instructions. And to

OK - and what about the other architectures that suffer from the same
issues?

 avoid any misunderstanding, yes, I volunteer for the optimizing work. :)

I really dislike introducing such custom functions when we have
standard functions available that implement the same purposes.

Checking Linux code (as U-Boot is not representative here):

- find * -name '*.c' | wc -l
18362
- find * -name '*.c' | xargs fgrep -l put_unaligned | wc -l
136

i. e. just 0.75% of the source files actually use any of the
put_unaligned*() variants - it is a highly exotic function.

htobe32() is even worse - just a single source file in the whole Lnux
tree uses it (arch/um/drivers/cow_user.c).


Can we not stick to standard functions?  Instead of htobe32() we
should use htonl() which is available both in U-Boot and in the host
environment.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH, MD: Wolfgang Denk  Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: w...@denx.de
While money can't buy happiness, it certainly lets  you  choose  your
own form of misery.
___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot


Re: [U-Boot] [PATCH] crc32: Correct endianness of crc32 result

2013-04-18 Thread Albert ARIBAUD
Hi Wolfgang,

On Thu, 18 Apr 2013 12:36:00 +0200, Wolfgang Denk w...@denx.de wrote:

 Dear Albert ARIBAUD,
 
 In message 20130418082027.4b5ea191@lilith you wrote:
  
   #ifdef USE_HOSTCC
  crc = htobe32(crc);
  memcpy(output, crc, sizeof(crc));
   #else
  crc = cpu_to_be32(crc);
  put_unaligned(crc, (uint32_t *)output);
   #endif
   
   This produces the same code as my original patch. If this is
   acceptable then I will do that, although it doesn't really seem any
   better.
  
  Wolfgang may not like it any more than put_unaligned_be32() as it
  builds upon it (and the disk patch could have used the be32 version as
 
 Indeed.
 
  well). Personally, I think we should allow and use these...
  
  ... and work on optimizing their implementation for ARM; we should be
  able to reduce such put()s and get()s to a few instructions. And to
 
 OK - and what about the other architectures that suffer from the same
 issues?

They should/could provide their own optimized versions, obviously.

  avoid any misunderstanding, yes, I volunteer for the optimizing work. :)
 
 I really dislike introducing such custom functions when we have
 standard functions available that implement the same purposes.

I understand the point; this is a classical opposition of generic vs
optimized.

 Checking Linux code (as U-Boot is not representative here):
 
   - find * -name '*.c' | wc -l
   18362
   - find * -name '*.c' | xargs fgrep -l put_unaligned | wc -l
   136
 
 i. e. just 0.75% of the source files actually use any of the
 put_unaligned*() variants - it is a highly exotic function.
 
 htobe32() is even worse - just a single source file in the whole Lnux
 tree uses it (arch/um/drivers/cow_user.c).
 
 
 Can we not stick to standard functions?  Instead of htobe32() we
 should use htonl() which is available both in U-Boot and in the host
 environment.

Note that there are two problems here: endianness conversion and
unaligned storage. We can indeed use htoln() instead of htobe32(), but
that only affects/solved endianness issues, not alignment ones, for
which there is no solution that is efficient across all ARM targets.

Note too that if the hash infrastructure mandated that the output
buffer be 4-byte-aligned, i.e. a u32* or similar, we would not even
have to worry about unalignment; such a constraint on the output
buffer makes all the more sense if we consider the fact that current
hash sizes are 4 (crc32), 20 (SHA1) and 32 (SHA256), all multiples of
4, and future hashes will most certainly not be less aligned.

So how about changing the element type of output in the definition of
hash_func_ws, adapting the corresponding implementations sha1_csum_wd,
sha256_csum_wd and crc32_wd_buf, and adapting the output argument
of the sole call to hash_func_ws, that is, the local u8
output[HASH_MAX_DIGEST_SIZE]; in hash.c? Then we'be done with
alignment.

Note finally that we *need* unaligned access macros/inline
functions/whatever, if only for exceptions laid out in
doc/README.arm-unaligned-accesses. If we avoid calling them too often,
we can live with generic.

 Best regards,
 
 Wolfgang Denk

Amicalement,
-- 
Albert.
___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot


Re: [U-Boot] [PATCH] crc32: Correct endianness of crc32 result

2013-04-18 Thread Wolfgang Denk
Dear Albert,

In message 20130418131810.38c916b8@lilith you wrote:
 
  OK - and what about the other architectures that suffer from the same
  issues?
 
 They should/could provide their own optimized versions, obviously.

Well, if this was a generally needed or even useful service, that
might make sense.  But it is highly exotic stuff...

  I really dislike introducing such custom functions when we have
  standard functions available that implement the same purposes.
 
 I understand the point; this is a classical opposition of generic vs
 optimized.

Not really.  It is more opposition against premature optimization at
the cost of non-standard code.

Is there any justification that we really need hightly optimized
versus generic code here?  Is this used anywhere in a place where
performance is critical?

  Can we not stick to standard functions?  Instead of htobe32() we
  should use htonl() which is available both in U-Boot and in the host
  environment.
 
 Note that there are two problems here: endianness conversion and
 unaligned storage. We can indeed use htoln() instead of htobe32(), but
 that only affects/solved endianness issues, not alignment ones, for
 which there is no solution that is efficient across all ARM targets.

Correct.  But the htobe32() approach was cobining this with the
generic memcpy(), too, so we would do the same with htoln().

 So how about changing the element type of output in the definition of
 hash_func_ws, adapting the corresponding implementations sha1_csum_wd,
 sha256_csum_wd and crc32_wd_buf, and adapting the output argument
 of the sole call to hash_func_ws, that is, the local u8
 output[HASH_MAX_DIGEST_SIZE]; in hash.c? Then we'be done with
 alignment.

OK, but that would be a totally different approach (which appears to
be a good one, here).

 Note finally that we *need* unaligned access macros/inline
 functions/whatever, if only for exceptions laid out in
 doc/README.arm-unaligned-accesses. If we avoid calling them too often,
 we can live with generic.

Agreed; the focus should always be on avoidance, though.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH, MD: Wolfgang Denk  Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: w...@denx.de
\|/  \|/ \|/  \|/
 @~/ ,. \~@   @~/ ,. \~@
/_( \__/ )_\ /_( \__/ )_\
   \__U_/   \__U_/
___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot


Re: [U-Boot] [PATCH] crc32: Correct endianness of crc32 result

2013-04-18 Thread Albert ARIBAUD
Hi Wolfgang,

On Thu, 18 Apr 2013 16:39:09 +0200, Wolfgang Denk w...@denx.de wrote:

  So how about changing the element type of output in the definition of
  hash_func_ws, adapting the corresponding implementations sha1_csum_wd,
  sha256_csum_wd and crc32_wd_buf, and adapting the output argument
  of the sole call to hash_func_ws, that is, the local u8
  output[HASH_MAX_DIGEST_SIZE]; in hash.c? Then we'be done with
  alignment.
 
 OK, but that would be a totally different approach (which appears to
 be a good one, here).

Indeed; I would suggest splitting this change in two independent ones:

- fix the endianness issue: change the endianness of crc through the
  use of htonl() but leave the existing memcpy() in place as it is,
  even though it is not speed-optimized. That's what Simon's patch
  does if the HOSTCC part is ignored;

- fix the unalignment issue by changing parameter 'output' of function
  type 'hash_func_ws' from u8* to u32* and adjusting the rest of the
  code accordingly -- which would lead to replacing the crc32 final
  memcpy() with a single indirect assignment.

These two changes could be submitted either separately, or as a series.

Amicalement,
-- 
Albert.
___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot


Re: [U-Boot] [PATCH] crc32: Correct endianness of crc32 result

2013-04-18 Thread Tom Rini
On Thu, Apr 18, 2013 at 06:39:29PM +0200, Albert ARIBAUD wrote:
 Hi Wolfgang,
 
 On Thu, 18 Apr 2013 16:39:09 +0200, Wolfgang Denk w...@denx.de wrote:
 
   So how about changing the element type of output in the definition of
   hash_func_ws, adapting the corresponding implementations sha1_csum_wd,
   sha256_csum_wd and crc32_wd_buf, and adapting the output argument
   of the sole call to hash_func_ws, that is, the local u8
   output[HASH_MAX_DIGEST_SIZE]; in hash.c? Then we'be done with
   alignment.
  
  OK, but that would be a totally different approach (which appears to
  be a good one, here).
 
 Indeed; I would suggest splitting this change in two independent ones:
 
 - fix the endianness issue: change the endianness of crc through the
   use of htonl() but leave the existing memcpy() in place as it is,
   even though it is not speed-optimized. That's what Simon's patch
   does if the HOSTCC part is ignored;
 
 - fix the unalignment issue by changing parameter 'output' of function
   type 'hash_func_ws' from u8* to u32* and adjusting the rest of the
   code accordingly -- which would lead to replacing the crc32 final
   memcpy() with a single indirect assignment.
 
 These two changes could be submitted either separately, or as a series.

Now so that I'm clear, if we don't do anything about the unaligned issue
today, it's just slow, not an unaligned access that leads to abort,
right?  So part one today for release, part two next week after release.

-- 
Tom


signature.asc
Description: Digital signature
___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot


Re: [U-Boot] [PATCH] crc32: Correct endianness of crc32 result

2013-04-18 Thread Albert ARIBAUD
Hi Tom,

On Thu, 18 Apr 2013 12:58:54 -0400, Tom Rini tr...@ti.com wrote:

 On Thu, Apr 18, 2013 at 06:39:29PM +0200, Albert ARIBAUD wrote:
  Hi Wolfgang,
  
  On Thu, 18 Apr 2013 16:39:09 +0200, Wolfgang Denk w...@denx.de wrote:
  
So how about changing the element type of output in the definition of
hash_func_ws, adapting the corresponding implementations sha1_csum_wd,
sha256_csum_wd and crc32_wd_buf, and adapting the output argument
of the sole call to hash_func_ws, that is, the local u8
output[HASH_MAX_DIGEST_SIZE]; in hash.c? Then we'be done with
alignment.
   
   OK, but that would be a totally different approach (which appears to
   be a good one, here).
  
  Indeed; I would suggest splitting this change in two independent ones:
  
  - fix the endianness issue: change the endianness of crc through the
use of htonl() but leave the existing memcpy() in place as it is,
even though it is not speed-optimized. That's what Simon's patch
does if the HOSTCC part is ignored;
  
  - fix the unalignment issue by changing parameter 'output' of function
type 'hash_func_ws' from u8* to u32* and adjusting the rest of the
code accordingly -- which would lead to replacing the crc32 final
memcpy() with a single indirect assignment.
  
  These two changes could be submitted either separately, or as a series.
 
 Now so that I'm clear, if we don't do anything about the unaligned issue
 today, it's just slow, not an unaligned access that leads to abort,
 right?  So part one today for release, part two next week after release.

Yes, the current code is just slower than it could be, but works, and
so would it with Simon's patch as long as it keeps the memcpy().

Amicalement,
-- 
Albert.
___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot


Re: [U-Boot] [PATCH] crc32: Correct endianness of crc32 result

2013-04-18 Thread Simon Glass
Hi,

On Thu, Apr 18, 2013 at 11:43 AM, Albert ARIBAUD
albert.u.b...@aribaud.net wrote:
 Hi Tom,

 On Thu, 18 Apr 2013 12:58:54 -0400, Tom Rini tr...@ti.com wrote:

 On Thu, Apr 18, 2013 at 06:39:29PM +0200, Albert ARIBAUD wrote:
  Hi Wolfgang,
 
  On Thu, 18 Apr 2013 16:39:09 +0200, Wolfgang Denk w...@denx.de wrote:
 
So how about changing the element type of output in the definition of
hash_func_ws, adapting the corresponding implementations sha1_csum_wd,
sha256_csum_wd and crc32_wd_buf, and adapting the output argument
of the sole call to hash_func_ws, that is, the local u8
output[HASH_MAX_DIGEST_SIZE]; in hash.c? Then we'be done with
alignment.
  
   OK, but that would be a totally different approach (which appears to
   be a good one, here).
 
  Indeed; I would suggest splitting this change in two independent ones:
 
  - fix the endianness issue: change the endianness of crc through the
use of htonl() but leave the existing memcpy() in place as it is,
even though it is not speed-optimized. That's what Simon's patch
does if the HOSTCC part is ignored;
 
  - fix the unalignment issue by changing parameter 'output' of function
type 'hash_func_ws' from u8* to u32* and adjusting the rest of the
code accordingly -- which would lead to replacing the crc32 final
memcpy() with a single indirect assignment.
 
  These two changes could be submitted either separately, or as a series.

 Now so that I'm clear, if we don't do anything about the unaligned issue
 today, it's just slow, not an unaligned access that leads to abort,
 right?  So part one today for release, part two next week after release.

 Yes, the current code is just slower than it could be, but works, and
 so would it with Simon's patch as long as it keeps the memcpy().

Yes the alignment issue is manufactured IMO by the char * used for the
function signature and I did not feel comfortable declaring that it
must be word aligned. To be it seems that the type should be naturally
aligned. What do people think about that?

Anyway, that's why I ended up with this patch, which I felt was small
enough to be accepted as a bug fix for the release.

Albert's email exactly mirrors my thinking on this - and yes I would
be much more comfortable not having to create part 2 for the release.

Please let me know what I should do with this patch for now.

Regards,
Simon
___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot


Re: [U-Boot] [PATCH] crc32: Correct endianness of crc32 result

2013-04-18 Thread Tom Rini
On Thu, Apr 18, 2013 at 12:06:47PM -0700, Simon Glass wrote:
 Hi,
 
 On Thu, Apr 18, 2013 at 11:43 AM, Albert ARIBAUD
 albert.u.b...@aribaud.net wrote:
  Hi Tom,
 
  On Thu, 18 Apr 2013 12:58:54 -0400, Tom Rini tr...@ti.com wrote:
 
  On Thu, Apr 18, 2013 at 06:39:29PM +0200, Albert ARIBAUD wrote:
   Hi Wolfgang,
  
   On Thu, 18 Apr 2013 16:39:09 +0200, Wolfgang Denk w...@denx.de wrote:
  
 So how about changing the element type of output in the definition of
 hash_func_ws, adapting the corresponding implementations 
 sha1_csum_wd,
 sha256_csum_wd and crc32_wd_buf, and adapting the output argument
 of the sole call to hash_func_ws, that is, the local u8
 output[HASH_MAX_DIGEST_SIZE]; in hash.c? Then we'be done with
 alignment.
   
OK, but that would be a totally different approach (which appears to
be a good one, here).
  
   Indeed; I would suggest splitting this change in two independent ones:
  
   - fix the endianness issue: change the endianness of crc through the
 use of htonl() but leave the existing memcpy() in place as it is,
 even though it is not speed-optimized. That's what Simon's patch
 does if the HOSTCC part is ignored;
  
   - fix the unalignment issue by changing parameter 'output' of function
 type 'hash_func_ws' from u8* to u32* and adjusting the rest of the
 code accordingly -- which would lead to replacing the crc32 final
 memcpy() with a single indirect assignment.
  
   These two changes could be submitted either separately, or as a series.
 
  Now so that I'm clear, if we don't do anything about the unaligned issue
  today, it's just slow, not an unaligned access that leads to abort,
  right?  So part one today for release, part two next week after release.
 
  Yes, the current code is just slower than it could be, but works, and
  so would it with Simon's patch as long as it keeps the memcpy().
 
 Yes the alignment issue is manufactured IMO by the char * used for the
 function signature and I did not feel comfortable declaring that it
 must be word aligned. To be it seems that the type should be naturally
 aligned. What do people think about that?
 
 Anyway, that's why I ended up with this patch, which I felt was small
 enough to be accepted as a bug fix for the release.
 
 Albert's email exactly mirrors my thinking on this - and yes I would
 be much more comfortable not having to create part 2 for the release.
 
 Please let me know what I should do with this patch for now.

Lets follow the outline Albert made above, two patches, #2 depends on #1
and #1 is now, #2 is next week.

-- 
Tom


signature.asc
Description: Digital signature
___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot


Re: [U-Boot] [PATCH] crc32: Correct endianness of crc32 result

2013-04-17 Thread Simon Glass
Hi Wolfgang,

On Tue, Apr 16, 2013 at 10:40 PM, Wolfgang Denk w...@denx.de wrote:
 Dear Simon Glass,

 In message 
 capnjgz2jrvpq56_epvkumh8e1l2lj0hgzncs-grn9bxfosx...@mail.gmail.com you 
 wrote:

  +#ifdef USE_HOSTCC
  + crc = htobe32(crc);
memcpy(output, crc, sizeof(crc));
  +#else
  + put_unaligned_be32(crc, output);
  +#endif
 
  Why is this depending on USE_HOSTCC, and not on the endianess?

 We always want big-endian in this case, since the bytes have to date
 been written that way by the crc32 command.

 Let me rephrase.  Why do we need an #ifdef here, and why depends this
 on USE_HOSTCC?

  And why do we need the #ifdef?  Can we not always use htobe32() and
  put_unaligned_be32() ?

 Well I don't think put_unaligned_be32 is available to user space,
 which is the environment that the tools are built under. It is
 available in the kernel, but that's not our environment.

 It appears put_unaligned_be32() is a widely unknown, pretty exotic
 function that so far has been used in ony two source files:

 drivers/usb/gadget/f_mass_storage.c
 lib/tpm.c

 The implementation (in include/linux/unaligned/generic.h) is ugly
 and pretty expensive in terms of run time and memory footprint.

 I would like to avoid it's usage all together.

It ends up producing this on ARMv7:

43e2c1d8: e1a03820 lsr r3, r0, #16
43e2c1dc: e5c40003 strb r0, [r4, #3]
43e2c1e0: e5c43001 strb r3, [r4, #1]
43e2c1e4: e1a02423 lsr r2, r3, #8
43e2c1e8: e7e73450 ubfx r3, r0, #8, #8
43e2c1ec: e5c42000 strb r2, [r4]
43e2c1f0: e5c43002 strb r3, [r4, #2]

In fact with unaligned support we could optimise this.


 I'm not happy with this solution and would be pleased to find a better
 way, but I'm not sure what it is.

 Does the htobe32() + memcpy() approach not work everywhere?

But htobe32() isn't available in U-Boot.

I tried using:

#ifdef USE_HOSTCC
   crc = htobe32(crc);
#else
   crc = cpu_to_be32(crc);
#endif
   memcpy(output, crc, sizeof(crc));


This is one instruction (4 bytes, 16%) smaller, but I suspect quite a
lot slower due to the overhead of a very small memcpy().

43e2c1d8: e28d1008 add r1, sp, #8
43e2c1dc: e3a02004 mov r2, #4
43e2c1e0: e6bf0f30 rev r0, r0
43e2c1e4: e5210004 str r0, [r1, #-4]!
43e2c1e8: e1a4 mov r0, r4
43e2c1ec: eb001af7 bl 43e32dd0 memcpy


 But this patch does fix a real bug which we should sort out before the
 release, one way or another.

 Agreed.

Let me know what you prefer and I will update the patch.

Regards,
Simon
___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot


Re: [U-Boot] [PATCH] crc32: Correct endianness of crc32 result

2013-04-17 Thread Albert ARIBAUD
Hi Simon -- and sorry for the dupe.

On Wed, 17 Apr 2013 11:28:07 -0700, Simon Glass s...@chromium.org
wrote:

 I tried using:
 
 #ifdef USE_HOSTCC
crc = htobe32(crc);
 #else
crc = cpu_to_be32(crc);
 #endif
memcpy(output, crc, sizeof(crc));
 
 
 This is one instruction (4 bytes, 16%) smaller, but I suspect quite a
 lot slower due to the overhead of a very small memcpy().
 
 43e2c1d8: e28d1008 add r1, sp, #8
 43e2c1dc: e3a02004 mov r2, #4
 43e2c1e0: e6bf0f30 rev r0, r0
 43e2c1e4: e5210004 str r0, [r1, #-4]!
 43e2c1e8: e1a4 mov r0, r4
 43e2c1ec: eb001af7 bl 43e32dd0 memcpy

How about replacing the memcpy with an explicit put_unaligned(),
similar to what was done in

http://www.mail-archive.com/u-boot@lists.denx.de/msg109555.html

with get_unaligned()? The code will be longer than above, but shorter
than the above plus the memcpy(), and faster too -- actually, I'm
surprised that the compiler does not unroll the memcpy() on its own,
considering the size argument is a constant.

 Regards,
 Simon

Amicalement,
-- 
Albert.
___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot


Re: [U-Boot] [PATCH] crc32: Correct endianness of crc32 result

2013-04-17 Thread Simon Glass
Hi Albert,

On Wed, Apr 17, 2013 at 12:23 PM, Albert ARIBAUD
albert.u.b...@aribaud.net wrote:
 Hi Simon -- and sorry for the dupe.

 On Wed, 17 Apr 2013 11:28:07 -0700, Simon Glass s...@chromium.org
 wrote:

 I tried using:

 #ifdef USE_HOSTCC
crc = htobe32(crc);
 #else
crc = cpu_to_be32(crc);
 #endif
memcpy(output, crc, sizeof(crc));


 This is one instruction (4 bytes, 16%) smaller, but I suspect quite a
 lot slower due to the overhead of a very small memcpy().

 43e2c1d8: e28d1008 add r1, sp, #8
 43e2c1dc: e3a02004 mov r2, #4
 43e2c1e0: e6bf0f30 rev r0, r0
 43e2c1e4: e5210004 str r0, [r1, #-4]!
 43e2c1e8: e1a4 mov r0, r4
 43e2c1ec: eb001af7 bl 43e32dd0 memcpy

 How about replacing the memcpy with an explicit put_unaligned(),
 similar to what was done in

 http://www.mail-archive.com/u-boot@lists.denx.de/msg109555.html

 with get_unaligned()? The code will be longer than above, but shorter
 than the above plus the memcpy(), and faster too -- actually, I'm
 surprised that the compiler does not unroll the memcpy() on its own,
 considering the size argument is a constant.

Do you mean like this?

#ifdef USE_HOSTCC
   crc = htobe32(crc);
   memcpy(output, crc, sizeof(crc));
#else
   crc = cpu_to_be32(crc);
   put_unaligned(crc, (uint32_t *)output);
#endif


This produces the same code as my original patch. If this is
acceptable then I will do that, although it doesn't really seem any
better.

Regards,
Simon

[and sorry for my dup]
___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot


Re: [U-Boot] [PATCH] crc32: Correct endianness of crc32 result

2013-04-16 Thread Simon Glass
Hi Wolfgang,

On Sat, Apr 6, 2013 at 12:04 AM, Wolfgang Denk w...@denx.de wrote:
 Dear Simon Glass,

 In message 1365203470-9099-1-git-send-email-...@chromium.org you wrote:
 When crc32 is handled by the hash library, it requires the data to be in
 big-endian format, since it reads it byte-wise. Thus at present the 'crc32'
 command reports incorrect data. For example, previously we might see:


 +#ifdef USE_HOSTCC
 + crc = htobe32(crc);
   memcpy(output, crc, sizeof(crc));
 +#else
 + put_unaligned_be32(crc, output);
 +#endif

 Why is this depending on USE_HOSTCC, and not on the endianess?

We always want big-endian in this case, since the bytes have to date
been written that way by the crc32 command.


 And why do we need the #ifdef?  Can we not always use htobe32() and
 put_unaligned_be32() ?

Well I don't think put_unaligned_be32 is available to user space,
which is the environment that the tools are built under. It is
available in the kernel, but that's not our environment.

I'm not happy with this solution and would be pleased to find a better
way, but I'm not sure what it is.

But this patch does fix a real bug which we should sort out before the
release, one way or another.


 Best regards,

 Wolfgang Denk

Regards,
Simon
___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot


Re: [U-Boot] [PATCH] crc32: Correct endianness of crc32 result

2013-04-16 Thread Tom Rini
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 04/16/2013 05:57 PM, Simon Glass wrote:
 Hi Wolfgang,
 
 On Sat, Apr 6, 2013 at 12:04 AM, Wolfgang Denk w...@denx.de wrote:
 Dear Simon Glass,
 
 In message 1365203470-9099-1-git-send-email-...@chromium.org
 you wrote:
 When crc32 is handled by the hash library, it requires the data
 to be in big-endian format, since it reads it byte-wise. Thus
 at present the 'crc32' command reports incorrect data. For
 example, previously we might see:
 
 
 +#ifdef USE_HOSTCC + crc = htobe32(crc); memcpy(output,
 crc, sizeof(crc)); +#else + put_unaligned_be32(crc,
 output); +#endif
 
 Why is this depending on USE_HOSTCC, and not on the endianess?
 
 We always want big-endian in this case, since the bytes have to
 date been written that way by the crc32 command.

In other words, this code is executed on both the host and the target,
and there's not a uniform endian sanitizer function.

- -- 
Tom
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJRbdghAAoJENk4IS6UOR1WzzsP/j+JhIIkp9EK4YwDQ7H6F06V
i60/Y/EaVg32HAqMjo66y2GCcu7ak9fobZo3wHnAjrFkaSJWq4s+eDNOtJiHbr/O
ZwQbDJ/t5Yf7vuZ1OhBTII+pYHXYcDW+30QZpBP3+ydY8Zkg7tsAcQ5rH7KdHWuD
83k7DLjv9obn85eeVikNkfB7ONNFVRug07Obcd+jbXdQamc/VWxWZyvUwDiKGCYH
eqmss/hQ7o343FWKqsVNSxd3/tF7z3PNevWm83xJxlc5xbtyJ/8kad6qkkILtOGd
G5OOXnL7BpSqL2mxt5ruW+cwqOnp74SvoQXuM6lNZzeAmlirAginYbnDSvjQamIy
DK1BQAjodXGU7nZxffw4vKZzbGzkgRl2KkuGvQfpMJzoJRyWvrbDzIVOeF/bXXQS
UEvASOFAdqKpMPNJnIm16GUtH6/OyEWK+8HInFse7K19ycM4M/TpM2dhmVZrHG0S
Q+Xq7ZI6pEq0SjMIfhuCwYJS6lqbtlQ5eOk+KoTYXOWneOzOgDGKO+El53wDwKQ5
0icIUwNKn4ZKMg7HLE25Docx3Ez6OVBD2Aelz0wlc5FMWlmLrHe9oYaufDN36bBH
D5XrLeBDjj89mTOHl4V0U3tZ/1iLLFqxo9RyNG6lPkLOQD62vLPxjyPYqY8Q6q85
kPdToR/o/YfFk3EsiYpD
=0O7d
-END PGP SIGNATURE-
___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot


Re: [U-Boot] [PATCH] crc32: Correct endianness of crc32 result

2013-04-16 Thread Simon Glass
Hi Tom,

On Tue, Apr 16, 2013 at 4:00 PM, Tom Rini tr...@ti.com wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 On 04/16/2013 05:57 PM, Simon Glass wrote:
 Hi Wolfgang,

 On Sat, Apr 6, 2013 at 12:04 AM, Wolfgang Denk w...@denx.de wrote:
 Dear Simon Glass,

 In message 1365203470-9099-1-git-send-email-...@chromium.org
 you wrote:
 When crc32 is handled by the hash library, it requires the data
 to be in big-endian format, since it reads it byte-wise. Thus
 at present the 'crc32' command reports incorrect data. For
 example, previously we might see:


 +#ifdef USE_HOSTCC + crc = htobe32(crc); memcpy(output,
 crc, sizeof(crc)); +#else + put_unaligned_be32(crc,
 output); +#endif

 Why is this depending on USE_HOSTCC, and not on the endianess?

 We always want big-endian in this case, since the bytes have to
 date been written that way by the crc32 command.

 In other words, this code is executed on both the host and the target,
 and there's not a uniform endian sanitizer function.

Well to be honest, I don't think the code is needed on the host, and I
could put the #ifdef around the whole function. I just thought that
might be a bit short-sighted.

Regards,
Simon
___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot


Re: [U-Boot] [PATCH] crc32: Correct endianness of crc32 result

2013-04-16 Thread Wolfgang Denk
Dear Simon Glass,

In message capnjgz2jrvpq56_epvkumh8e1l2lj0hgzncs-grn9bxfosx...@mail.gmail.com 
you wrote:
 
  +#ifdef USE_HOSTCC
  + crc = htobe32(crc);
memcpy(output, crc, sizeof(crc));
  +#else
  + put_unaligned_be32(crc, output);
  +#endif
 
  Why is this depending on USE_HOSTCC, and not on the endianess?
 
 We always want big-endian in this case, since the bytes have to date
 been written that way by the crc32 command.

Let me rephrase.  Why do we need an #ifdef here, and why depends this
on USE_HOSTCC?

  And why do we need the #ifdef?  Can we not always use htobe32() and
  put_unaligned_be32() ?
 
 Well I don't think put_unaligned_be32 is available to user space,
 which is the environment that the tools are built under. It is
 available in the kernel, but that's not our environment.

It appears put_unaligned_be32() is a widely unknown, pretty exotic
function that so far has been used in ony two source files:

drivers/usb/gadget/f_mass_storage.c
lib/tpm.c

The implementation (in include/linux/unaligned/generic.h) is ugly
and pretty expensive in terms of run time and memory footprint.

I would like to avoid it's usage all together.

 I'm not happy with this solution and would be pleased to find a better
 way, but I'm not sure what it is.

Does the htobe32() + memcpy() approach not work everywhere?

 But this patch does fix a real bug which we should sort out before the
 release, one way or another.

Agreed.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH, MD: Wolfgang Denk  Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: w...@denx.de
I find this a nice feature but it is not according to  the  documen-
tation. Or is it a BUG?   Let's call it an accidental feature. :-)
   - Larry Wall in 6...@jpl-devvax.jpl.nasa.gov
___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot


Re: [U-Boot] [PATCH] crc32: Correct endianness of crc32 result

2013-04-06 Thread Wolfgang Denk
Dear Simon Glass,

In message 1365203470-9099-1-git-send-email-...@chromium.org you wrote:
 When crc32 is handled by the hash library, it requires the data to be in
 big-endian format, since it reads it byte-wise. Thus at present the 'crc32'
 command reports incorrect data. For example, previously we might see:


 +#ifdef USE_HOSTCC
 + crc = htobe32(crc);
   memcpy(output, crc, sizeof(crc));
 +#else
 + put_unaligned_be32(crc, output);
 +#endif

Why is this depending on USE_HOSTCC, and not on the endianess?

And why do we need the #ifdef?  Can we not always use htobe32() and
put_unaligned_be32() ?

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH, MD: Wolfgang Denk  Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: w...@denx.de
UNIX was not designed to stop you from doing stupid things,  because
that would also stop you from doing clever things.   - Doug Gwyn
___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot


[U-Boot] [PATCH] crc32: Correct endianness of crc32 result

2013-04-05 Thread Simon Glass
When crc32 is handled by the hash library, it requires the data to be in
big-endian format, since it reads it byte-wise. Thus at present the 'crc32'
command reports incorrect data. For example, previously we might see:

Peach # crc32 4000 100
CRC32 for 4000 ... 40ff == 0d968558

but instead with the hash library we see:

Peach # crc32 4000 100
CRC32 for 4000 ... 40ff == 5885960d

Correct this.

Signed-off-by: Simon Glass s...@chromium.org
Reviewed-by: Vadim Bendebury vben...@google.com
---
 lib/crc32.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/lib/crc32.c b/lib/crc32.c
index 76205da..94720bf 100644
--- a/lib/crc32.c
+++ b/lib/crc32.c
@@ -10,6 +10,7 @@
 
 #ifndef USE_HOSTCC
 #include common.h
+#include asm/unaligned.h
 #endif
 #include compiler.h
 #include u-boot/crc.h
@@ -256,5 +257,10 @@ void crc32_wd_buf(const unsigned char *input, unsigned int 
ilen,
uint32_t crc;
 
crc = crc32_wd(0, input, ilen, chunk_sz);
+#ifdef USE_HOSTCC
+   crc = htobe32(crc);
memcpy(output, crc, sizeof(crc));
+#else
+   put_unaligned_be32(crc, output);
+#endif
 }
-- 
1.8.1.3

___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot


Re: [U-Boot] [PATCH] crc32: Correct endianness of crc32 result

2013-04-05 Thread Allen Martin
On Fri, Apr 05, 2013 at 04:11:10PM -0700, Simon Glass wrote:
 When crc32 is handled by the hash library, it requires the data to be in
 big-endian format, since it reads it byte-wise. Thus at present the 'crc32'
 command reports incorrect data. For example, previously we might see:
 
 Peach # crc32 4000 100
 CRC32 for 4000 ... 40ff == 0d968558
 
 but instead with the hash library we see:
 
 Peach # crc32 4000 100
 CRC32 for 4000 ... 40ff == 5885960d
 
 Correct this.
 
 Signed-off-by: Simon Glass s...@chromium.org
 Reviewed-by: Vadim Bendebury vben...@google.com
 ---
  lib/crc32.c | 6 ++
  1 file changed, 6 insertions(+)
 
 diff --git a/lib/crc32.c b/lib/crc32.c
 index 76205da..94720bf 100644
 --- a/lib/crc32.c
 +++ b/lib/crc32.c
 @@ -10,6 +10,7 @@
  
  #ifndef USE_HOSTCC
  #include common.h
 +#include asm/unaligned.h
  #endif
  #include compiler.h
  #include u-boot/crc.h
 @@ -256,5 +257,10 @@ void crc32_wd_buf(const unsigned char *input, unsigned 
 int ilen,
   uint32_t crc;
  
   crc = crc32_wd(0, input, ilen, chunk_sz);
 +#ifdef USE_HOSTCC
 + crc = htobe32(crc);
   memcpy(output, crc, sizeof(crc));
 +#else
 + put_unaligned_be32(crc, output);
 +#endif
  }
 -- 
 1.8.1.3
 

Tested on Tegra114 Dalmore, verified crc32 comes out as expected now.

Reviewed-by: Allen Martin amar...@nvidia.com
Tested-by: Allen Martin amar...@nvidia.com

-- 
nvpublic
___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot