Re: [PATCH maint-1.6.5 v2] block-sha1: avoid pointer conversion that violates alignment constraints
On 16/07/12 09:27, Jonathan Nieder wrote:
> Michael Cree wrote:
>> On 15/07/2012, at 8:50 AM, Jonathan Nieder wrote:
>>> gcc takes full advantage by converting the get_be32
>>> calls back to a load and bswap and producing a whole bunch of
>>> unaligned access traps.
>>
>> Alpha does not have a bswap (or similar) instruction.
> [...]
>> If the pointer is unaligned
>> then two neighbouring aligned 32bit loads are required to ensure
>> that all four bytes are loaded.  If the pointer is aligned then a
>> single 32bit load gets all the four bytes.
>
> How about:
>
>	gcc takes full advantage by using a single 32-bit load,
>	resulting in a whole bunch of unaligned access traps.

Yes, that's good.

I just checked the generated assembler, and I was correct except that I
underestimated the number of memory accesses in the case where the data
is known a priori to be unaligned.  It actually does four long-word
memory accesses, i.e. one memory access for each individual byte,
despite needing a maximum of two, so the optimisation for a priori
aligned data saves three memory accesses, not just one.  (It probably
also saves a load-load replay trap on EV6 and later CPUs, i.e., a
complete and very costly replay of the whole pipeline when the CPU
realises just in the nick of time that it has cocked up the
interrelationships between reordered load instructions.)

Cheers
Michael.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Re: [PATCH maint-1.6.5 v2] block-sha1: avoid pointer conversion that violates alignment constraints
Hi,

Michael Cree wrote:
> On 15/07/2012, at 8:50 AM, Jonathan Nieder wrote:
>> gcc takes full advantage by converting the get_be32
>> calls back to a load and bswap and producing a whole bunch of
>> unaligned access traps.
>
> Alpha does not have a bswap (or similar) instruction.
[...]
> If the pointer is unaligned
> then two neighbouring aligned 32bit loads are required to ensure
> that all four bytes are loaded.  If the pointer is aligned then a
> single 32bit load gets all the four bytes.  Having never looked at
> the generated assembler code, I nevertheless suspect that is the
> guts of the optimisation --- the compiler can eliminate an access to
> memory if it knows the pointer is aligned.

How about:

	gcc takes full advantage by using a single 32-bit load,
	resulting in a whole bunch of unaligned access traps.

You can see the generated assembler with "make block-sha1/sha1.s" and
reading the resulting sha1.s file in the current working directory
(oops).  I don't have an alpha cross-compiler installed or access to
an alpha to check for myself.

-- >8 --
Subject: Makefile: fix location of listing produced by "make subdir/foo.s"

When I invoke "make block-sha1/sha1.s", 'make' runs $(CC) -S without
specifying where it should put its output, and the output ends up in
./sha1.s.  Confusing.

Add an -o option to the .s rule to fix this.  We were already doing
that for most compiler invocations but had forgotten it for the
assembler listings.
Signed-off-by: Jonathan Nieder
---
Thanks,
Jonathan

 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index dc709029..e8d2798f 100644
--- a/Makefile
+++ b/Makefile
@@ -2230,7 +2230,7 @@ $(ASM_OBJ): %.o: %.S GIT-CFLAGS $(missing_dep_dirs)
 endif
 
 %.s: %.c GIT-CFLAGS FORCE
-	$(QUIET_CC)$(CC) -S $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $<
+	$(QUIET_CC)$(CC) -o $@ -S $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $<
 
 ifdef USE_COMPUTED_HEADER_DEPENDENCIES
 # Take advantage of gcc's on-the-fly dependency generation
--
1.7.10.4
Re: [PATCH maint-1.6.5 v2] block-sha1: avoid pointer conversion that violates alignment constraints
Jonathan,

Thanks for acting so promptly on this.  Just a minor point on the
commit message below.

On 15/07/2012, at 8:50 AM, Jonathan Nieder wrote:
> Unfortunately, Michael noticed on an Alpha machine that git was using
> plain 32-bit reads anyway.  As soon as we convert a pointer to int *,
> the compiler can assume that the object pointed to is correctly
> aligned as an int (C99 section 6.3.2.3 "pointer conversions"
> paragraph 7), and gcc takes full advantage by converting the get_be32
> calls back to a load and bswap and producing a whole bunch of
> unaligned access traps.

Alpha does not have a bswap (or similar) instruction.  I do recall
reading somewhere that one can get halfway to swapping endianness by
treating a number as a VAX float and converting to IEEE float, with
only a couple or so more instructions to achieve the full swap.  But I
don't think that is available to us anyway, because on Debian we
compile for generic Alpha.

I suspect that the crux of the matter is that compiling for generic
Alpha also means that byte access instructions are not permitted.  In
other words, all memory accesses must be long words (32 bits) or quad
words (64 bits).  In the get_be32 routine the compiler _has_ to issue
32-bit memory accesses whether the base pointer is char * or int *.
If the pointer is unaligned then two neighbouring aligned 32-bit loads
are required to ensure that all four bytes are loaded.  If the pointer
is aligned then a single 32-bit load gets all four bytes.  Having
never looked at the generated assembler code, I nevertheless suspect
that is the guts of the optimisation --- the compiler can eliminate an
access to memory if it knows the pointer is aligned.

Cheers
Michael.
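The two-neighbouring-aligned-loads technique Michael describes can be
sketched in portable C (a hypothetical illustration, not code from
git; the function name is invented, and it assumes a little-endian
host, as Alpha is):

```c
#include <stdint.h>

/* Read a 32-bit value from a possibly unaligned pointer using only
 * naturally aligned 32-bit loads, roughly what the compiler must emit
 * on a generic (pre-BWX) Alpha.  Assumes a little-endian host.
 * (The aliasing cast is for illustration; real code would use
 * compiler builtins or assembler.) */
static uint32_t load32_unaligned(const unsigned char *p)
{
	uintptr_t addr = (uintptr_t)p;
	const uint32_t *lo = (const uint32_t *)(addr & ~(uintptr_t)3);
	unsigned shift = (unsigned)(addr & 3) * 8;

	if (shift == 0)
		return lo[0];	/* aligned: a single load suffices */
	/* unaligned: combine the two neighbouring aligned words */
	return (lo[0] >> shift) | (lo[1] << (32 - shift));
}
```

When the compiler can prove the pointer is aligned, the whole second
load (and the shifting) disappears, which is the saving Michael
suspects the optimisation is after.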
Re: [PATCH maint-1.6.5 v2] block-sha1: avoid pointer conversion that violates alignment constraints
Looks good to me.

I'd suggest doing the macro argument expansion in

	#define SHA_SRC(t) get_be32((unsigned char *) block + t*4)

with parentheses around 't', but that's a fairly independent thing.
The old code didn't do that either.

		Linus
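The hazard Linus is pointing at only bites when an expression, rather
than a literal, is passed as 't' (which SHA_SRC's callers never do, so
the patch is still correct).  A hypothetical side-by-side sketch, with
invented macro names and a plain pointer computation standing in for
the get_be32 call:

```c
/* Unparenthesized, as in the patch: 't' binds wrongly to '*4'
 * when an expression is passed, e.g.
 *   SHA_SRC_ADDR(i + 1)  ->  block + i + 1*4   (offset i + 4)
 * Parenthesized, the multiplication groups as intended:
 *   SHA_SRC_ADDR_SAFE(i + 1)  ->  block + (i + 1)*4 */
#define SHA_SRC_ADDR(t)      ((unsigned char *) block + t*4)
#define SHA_SRC_ADDR_SAFE(t) ((unsigned char *) block + (t)*4)
```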
[PATCH maint-1.6.5 v2] block-sha1: avoid pointer conversion that violates alignment constraints
With 660231aa (block-sha1: support for architectures with memory
alignment restrictions, 2009-08-12), blk_SHA1_Update was modified to
access 32-bit chunks of memory one byte at a time on arches that
prefer that:

	#define get_be32(p)	( \
		(*((unsigned char *)(p) + 0) << 24) | \
		(*((unsigned char *)(p) + 1) << 16) | \
		(*((unsigned char *)(p) + 2) <<  8) | \
		(*((unsigned char *)(p) + 3) <<  0) )

The code previously accessed these values by just using htonl(*p).

Unfortunately, Michael noticed on an Alpha machine that git was using
plain 32-bit reads anyway.  As soon as we convert a pointer to int *,
the compiler can assume that the object pointed to is correctly
aligned as an int (C99 section 6.3.2.3 "pointer conversions"
paragraph 7), and gcc takes full advantage by converting the get_be32
calls back to a load and bswap and producing a whole bunch of
unaligned access traps.

So we need to obey the alignment constraints even when only dealing
with pointers instead of actual values.  Do so by changing the type of
'data' to void *.  This patch renames 'data' to 'block' at the same
time to make sure all references are updated to reflect the new type.

Reported-and-tested-by: Michael Cree
Signed-off-by: Jonathan Nieder
---
Linus Torvalds wrote:

> Anyway, the whole "noticed on alpha" makes no sense, since alpha isn't
> even big-endian.

Thanks again for catching that.  Here's a reroll.

 block-sha1/sha1.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c
index d8934757..10fd94d1 100644
--- a/block-sha1/sha1.c
+++ b/block-sha1/sha1.c
@@ -100,7 +100,7 @@
  * Where do we get the source from? The first 16 iterations get it from
  * the input data, the next mix it from the 512-bit array.
  */
-#define SHA_SRC(t) get_be32(data + t)
+#define SHA_SRC(t) get_be32((unsigned char *) block + t*4)
 #define SHA_MIX(t) SHA_ROL(W(t+13) ^ W(t+8) ^ W(t+2) ^ W(t), 1)
 
 #define SHA_ROUND(t, input, fn, constant, A, B, C, D, E) do { \
@@ -114,7 +114,7 @@
 #define T_40_59(t, A, B, C, D, E) SHA_ROUND(t, SHA_MIX, ((B&C)+(D&(B^C))) , 0x8f1bbcdc, A, B, C, D, E )
 #define T_60_79(t, A, B, C, D, E) SHA_ROUND(t, SHA_MIX, (B^C^D) , 0xca62c1d6, A, B, C, D, E )
 
-static void blk_SHA1_Block(blk_SHA_CTX *ctx, const unsigned int *data)
+static void blk_SHA1_Block(blk_SHA_CTX *ctx, const void *block)
 {
 	unsigned int A,B,C,D,E;
 	unsigned int array[16];
@@ -125,7 +125,7 @@ static void blk_SHA1_Block(blk_SHA_CTX *ctx, const unsigned int *data)
 	D = ctx->H[3];
 	E = ctx->H[4];
 
-	/* Round 1 - iterations 0-16 take their input from 'data' */
+	/* Round 1 - iterations 0-16 take their input from 'block' */
 	T_0_15( 0, A, B, C, D, E);
 	T_0_15( 1, E, A, B, C, D);
 	T_0_15( 2, D, E, A, B, C);
--
1.7.10.4
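The safe pattern the patch ends up with can be shown as a standalone
function (a hypothetical sketch with an invented name, not git's
actual macro): reading through unsigned char * makes no alignment
claim to the compiler, whereas the bug arose because converting the
pointer to unsigned int * first let gcc assume 4-byte alignment and
fuse the byte reads back into a single 32-bit load.

```c
#include <stdint.h>

/* Byte-at-a-time big-endian load, in the spirit of block-sha1's
 * fallback get_be32.  Taking void * (like the patched blk_SHA1_Block)
 * means no int * conversion ever happens, so the compiler may not
 * assume the data is aligned. */
static uint32_t get_be32_bytes(const void *p)
{
	const unsigned char *b = p;

	return ((uint32_t)b[0] << 24) |
	       ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] <<  8) |
	       ((uint32_t)b[3] <<  0);
}
```

The result is the same on any host endianness, since the bytes are
combined arithmetically rather than reinterpreted in place.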