Re: [PATCH maint-1.6.5 v2] block-sha1: avoid pointer conversion that violates alignment constraints

2012-07-16 Thread Michael Cree
On 16/07/12 09:27, Jonathan Nieder wrote:
> Michael Cree wrote:
> 
>> On 15/07/2012, at 8:50 AM, Jonathan Nieder wrote:
> 
>>>   gcc takes full advantage by converting the get_be32
>>> calls back to a load and bswap and producing a whole bunch of
>>> unaligned access traps.
>>
>> Alpha does not have a bswap (or similar) instruction.
> [...]
>>.  If the pointer is unaligned
>> then two neighbouring aligned 32bit loads are required to ensure
>> that all four bytes are loaded.  If the pointer is aligned then a
>> single 32bit load gets all the four bytes.  Having never looked at
>> the generated assembler code, I nevertheless suspect that is the
>> guts of the optimisation --- the compiler can eliminate an access to
>> memory if it knows the pointer is aligned.
> 
> How about:
> 
>   gcc takes full advantage by using a single 32-bit load,
>   resulting in a whole bunch of unaligned access traps.

Yes, that's good.

I just checked the generated assembler and I am correct, except that I
underestimated the number of memory accesses in the case where the data
is known a priori to be unaligned.  It is actually doing four long-word
memory accesses, i.e. a memory access for each individual byte, even
though it needs at most two, so the optimisation for data known a priori
to be aligned saves three memory accesses, not just one.  (It probably
also saves a load-load replay trap on EV6 and later CPUs, i.e. a
complete and very costly replay of the whole pipeline when the CPU
realises just in the nick of time that it has cocked up the
interrelationships between reordered load instructions.)
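
For the record, here is a rough sketch in C of what the a priori aligned
fast path amounts to (illustrative only, not code from git, and assuming
a little-endian host as on Alpha):

    #include <stdint.h>

    /* One aligned 32-bit load, then a shift-and-mask swap (Alpha has no
     * bswap instruction), instead of one long-word access per byte. */
    static uint32_t be32_aligned(const uint32_t *p)
    {
            uint32_t v = *p;                  /* single aligned load */
            return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
                   ((v << 8) & 0x00ff0000u) | (v << 24);
    }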

Cheers
Michael.


Re: [PATCH maint-1.6.5 v2] block-sha1: avoid pointer conversion that violates alignment constraints

2012-07-15 Thread Jonathan Nieder
Hi,

Michael Cree wrote:

> On 15/07/2012, at 8:50 AM, Jonathan Nieder wrote:

>>   gcc takes full advantage by converting the get_be32
>> calls back to a load and bswap and producing a whole bunch of
>> unaligned access traps.
>
> Alpha does not have a bswap (or similar) instruction.
[...]
>.  If the pointer is unaligned
> then two neighbouring aligned 32bit loads are required to ensure
> that all four bytes are loaded.  If the pointer is aligned then a
> single 32bit load gets all the four bytes.  Having never looked at
> the generated assembler code, I nevertheless suspect that is the
> guts of the optimisation --- the compiler can eliminate an access to
> memory if it knows the pointer is aligned.

How about:

gcc takes full advantage by using a single 32-bit load,
resulting in a whole bunch of unaligned access traps.

You can see the generated assembler by running "make block-sha1/sha1.s"
and reading the resulting sha1.s file in the current working directory
(oops).  I don't have an alpha cross-compiler installed or access to
an alpha to check for myself.

-- >8 --
Subject: Makefile: fix location of listing produced by "make subdir/foo.s"

When I invoke "make block-sha1/sha1.s", 'make' runs $(CC) -S without
specifying where it should put its output and the output ends up in
./sha1.s.  Confusing.

Add an -o option to the .s rule to fix this.  We were already doing
that for most compiler invocations but had forgotten it for the
assembler listings.

Signed-off-by: Jonathan Nieder 
---
Thanks,
Jonathan

 Makefile |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index dc709029..e8d2798f 100644
--- a/Makefile
+++ b/Makefile
@@ -2230,7 +2230,7 @@ $(ASM_OBJ): %.o: %.S GIT-CFLAGS $(missing_dep_dirs)
 endif
 
 %.s: %.c GIT-CFLAGS FORCE
-   $(QUIET_CC)$(CC) -S $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $<
+   $(QUIET_CC)$(CC) -o $@ -S $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $<
 
 ifdef USE_COMPUTED_HEADER_DEPENDENCIES
 # Take advantage of gcc's on-the-fly dependency generation
-- 
1.7.10.4



Re: [PATCH maint-1.6.5 v2] block-sha1: avoid pointer conversion that violates alignment constraints

2012-07-15 Thread Michael Cree

Jonathan,

Thanks for acting so promptly on this.  Just a minor point on the
commit message below.


On 15/07/2012, at 8:50 AM, Jonathan Nieder wrote:

> Unfortunately, Michael noticed on an Alpha machine that git was using
> plain 32-bit reads anyway.  As soon as we convert a pointer to int *,
> the compiler can assume that the object pointed to is correctly
> aligned as an int (C99 section 6.3.2.3 "pointer conversions"
> paragraph 7), and gcc takes full advantage by converting the get_be32
> calls back to a load and bswap and producing a whole bunch of
> unaligned access traps.


Alpha does not have a bswap (or similar) instruction.  I do recall  
reading somewhere that one can get halfway to swapping endianness by  
treating a number as a VAX float and converting to IEEE float, with  
only a couple or so more instructions to achieve the full swap.  But I  
don't think that is available to us anyway because on Debian we  
compile for generic Alpha. I suspect that the crux of the matter is  
that compiling for generic Alpha also means that byte access  
instructions are not permitted.  In other words, all memory accesses  
must be long words (32bits) or quad words (64bits).  In the get_be32  
routine the compiler _has_ to issue 32bit memory accesses whether the  
base pointer is char * or int *.  If the pointer is unaligned then two  
neighbouring aligned 32bit loads are required to ensure that all four  
bytes are loaded.  If the pointer is aligned then a single 32bit load  
gets all the four bytes.  Having never looked at the generated  
assembler code, I nevertheless suspect that is the guts of the  
optimisation --- the compiler can eliminate an access to memory if it  
knows the pointer is aligned.
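
To put that in concrete terms, here is roughly what the code generator
has to produce for the unaligned case, rendered as C (a sketch for
illustration only, not portable source code and not anything from git;
little-endian host assumed, as on Alpha):

    #include <stdint.h>

    /* Two neighbouring aligned loads straddling p, spliced together and
     * then swapped to a big-endian value.  The cast through uint32_t *
     * only mirrors the generated instructions; real source should not
     * do this. */
    static uint32_t be32_unaligned(const unsigned char *p)
    {
            uintptr_t a = (uintptr_t)p;
            unsigned shift = (unsigned)(a & 3) * 8;
            const uint32_t *w = (const uint32_t *)(a & ~(uintptr_t)3);
            uint32_t lo = w[0];                    /* first aligned load  */
            uint32_t hi = shift ? w[1] : 0;        /* second aligned load */
            uint32_t v = (lo >> shift) | (shift ? hi << (32 - shift) : 0);

            return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
                   ((v << 8) & 0x00ff0000u) | (v << 24);
    }

If the compiler can prove the pointer is 4-byte aligned, the second load
and the splicing drop away and only the single load plus the swap remain.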


Cheers
Michael.



Re: [PATCH maint-1.6.5 v2] block-sha1: avoid pointer conversion that violates alignment constraints

2012-07-14 Thread Linus Torvalds
Looks good to me. I'd suggest doing the macro argument expansion in

   #define SHA_SRC(t) get_be32((unsigned char *) block + t*4)

with parentheses around 't', but that's a fairly independent thing.
The old code didn't do that either.
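
Purely as illustration (the callers in sha1.c only ever pass literal
constants, so nothing actually misbehaves today):

    /* What goes wrong without the parentheses if the macro were ever
     * handed an expression rather than a literal constant. */
    #define SHA_SRC_NOPAREN(t) get_be32((unsigned char *) block + t*4)
    #define SHA_SRC_PAREN(t)   get_be32((unsigned char *) block + (t)*4)

    /*
     * SHA_SRC_NOPAREN(i + 1) expands to ... block + i + 1*4    (wrong offset)
     * SHA_SRC_PAREN(i + 1)   expands to ... block + (i + 1)*4  (as intended)
     */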

  Linus


[PATCH maint-1.6.5 v2] block-sha1: avoid pointer conversion that violates alignment constraints

2012-07-14 Thread Jonathan Nieder
With 660231aa (block-sha1: support for architectures with memory
alignment restrictions, 2009-08-12), blk_SHA1_Update was modified to
access 32-bit chunks of memory one byte at a time on arches that
prefer that:

#define get_be32(p) ( \
(*((unsigned char *)(p) + 0) << 24) | \
(*((unsigned char *)(p) + 1) << 16) | \
(*((unsigned char *)(p) + 2) <<  8) | \
(*((unsigned char *)(p) + 3) <<  0) )

The code previously accessed these values by just using htonl(*p).

Unfortunately, Michael noticed on an Alpha machine that git was using
plain 32-bit reads anyway.  As soon as we convert a pointer to int *,
the compiler can assume that the object pointed to is correctly
aligned as an int (C99 section 6.3.2.3 "pointer conversions"
paragraph 7), and gcc takes full advantage by converting the get_be32
calls back to a load and bswap and producing a whole bunch of
unaligned access traps.

So we need to obey the alignment constraints even when only dealing
with pointers instead of actual values.  Do so by changing the type
of 'data' to void *.  This patch renames 'data' to 'block' at the same
time to make sure all references are updated to reflect the new type.
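
For illustration only (this sketch is not part of the change; it relies
on the get_be32 definition quoted above):

    /* Once the pointer has type 'const unsigned int *', the compiler may
     * assume 4-byte alignment and collapse the byte accesses of
     * get_be32() into a single word load. */
    static unsigned int read_assumed_aligned(const unsigned int *data, int t)
    {
            return get_be32(data + t);
    }

    /* A void * (or byte) pointer makes no alignment promise, so the
     * byte-wise reads survive. */
    static unsigned int read_any_alignment(const void *block, int t)
    {
            return get_be32((const unsigned char *)block + t * 4);
    }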

Reported-and-tested-by: Michael Cree 
Signed-off-by: Jonathan Nieder 
---
Linus Torvalds wrote:

> Anyway, the whole "noticed on alpha" makes no sense, since alpha isn't
> even big-endian.

Thanks again for catching that.  Here's a reroll.

 block-sha1/sha1.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c
index d8934757..10fd94d1 100644
--- a/block-sha1/sha1.c
+++ b/block-sha1/sha1.c
@@ -100,7 +100,7 @@
  * Where do we get the source from? The first 16 iterations get it from
  * the input data, the next mix it from the 512-bit array.
  */
-#define SHA_SRC(t) get_be32(data + t)
+#define SHA_SRC(t) get_be32((unsigned char *) block + t*4)
 #define SHA_MIX(t) SHA_ROL(W(t+13) ^ W(t+8) ^ W(t+2) ^ W(t), 1)
 
 #define SHA_ROUND(t, input, fn, constant, A, B, C, D, E) do { \
@@ -114,7 +114,7 @@
 #define T_40_59(t, A, B, C, D, E) SHA_ROUND(t, SHA_MIX, ((B&C)+(D&(B^C))) , 0x8f1bbcdc, A, B, C, D, E )
 #define T_60_79(t, A, B, C, D, E) SHA_ROUND(t, SHA_MIX, (B^C^D) ,  0xca62c1d6, A, B, C, D, E )
 
-static void blk_SHA1_Block(blk_SHA_CTX *ctx, const unsigned int *data)
+static void blk_SHA1_Block(blk_SHA_CTX *ctx, const void *block)
 {
unsigned int A,B,C,D,E;
unsigned int array[16];
@@ -125,7 +125,7 @@ static void blk_SHA1_Block(blk_SHA_CTX *ctx, const unsigned int *data)
D = ctx->H[3];
E = ctx->H[4];
 
-   /* Round 1 - iterations 0-16 take their input from 'data' */
+   /* Round 1 - iterations 0-16 take their input from 'block' */
T_0_15( 0, A, B, C, D, E);
T_0_15( 1, E, A, B, C, D);
T_0_15( 2, D, E, A, B, C);
-- 
1.7.10.4
