Re: Bit-field struct member sign extension pattern results in redundant
> On 5 Sep 2017, at 9:35 AM, Michael Clarkwrote: > >> >> On 19 Aug 2017, at 4:10 AM, Richard Henderson wrote: >> >> On 08/17/2017 03:29 PM, Michael Clark wrote: >>> hand coded x86 asm (no worse because the sar depends on the lea) >>> >>> sx5(int): >>> shl edi, 27 >>> sar edi, 27 >>> movsx eax, dl >> >> Typo in the register, but I know what you mean. More interestingly, edi >> already has the sign-extended value, so "mov eax, edi" sufficies (saving one >> byte or perhaps allowing better register allocation). >> >> That said, if anyone is tweaking x86-64 patterns on this, consider >> >> sx5(int): >> rorxeax, edi, 5 >> sar eax, 27 >> >> where the (newish) bmi2 rorx instruction allows the data to be moved into >> place >> while rotating, avoiding the extra move entirely. > > The patterns might be hard to match unless we can get the gimple expansions > for bitfield access to use the mode of the enclosing type. > > For bitfield accesses to SI mode fields under 8 bits on RISC-V, gcc is > generating two shifts on QI mode sub-registers, each with a sign-extend. > > For bitfield accesses to SI mode fields under 16 bits on RISC-V, gcc is > generating two shifts on HI mode sub-registers, each with a sign-extend. > > The bitfield expansion logic is selecting the narrowest type that can hold > the bitwidth of the field, versus the bitwidth of the enclosing type and this > appears to be in the gimple to rtl transform, possibly in expr.c. > > Using the mode of the enclosing type for bitfield member access expansion > would likely solve this problem. I suspect that with the use of caches and > word sized register accesses, that this sort of change would be reasonable to > make for 32-bit and 64-bit targets. > > I also noticed something I didn’t spot earlier. On RV32 the sign extend and > shifts are correctly coalesced into 27 bit shifts in the combine stage. We > are presently looking into another redundant sign extension issue on RV64 > that could potentially be related. It could be that the shift coalescing > optimisation doesn’t happen unless the redundant sign extensions are > eliminated early in combine by simplify_rtx. i.e. the pattern is more complex > due to sign_extend ops that are not eliminated. > > - https://cx.rv8.io/g/2FxpNw > > RV64 and Aarch64 both appear to have the issue but with different expansions > for the shift and extend pattern due to the mode of access (QI or HI). Field > accesses above 16 bits create SI mode accesses and generate the expected > code. The RISC-V compiler has the #define SLOW_BYTE_ACCESS 1 patch however it > appears to make no difference in this case. SLOW_BYTE_ACCESS suppresses QI > mode and HI mode loads in some bitfield test cases when a struct is passed by > pointer but has no effect on this particular issue. This shows the codegen > for the fix to the SLOW_BYTE_ACCESS issue. i.e. proof that the compiler as > SLOW_BYTE_ACCESS defined to 1. > > - https://cx.rv8.io/g/TyXnoG > > A target independent fix that would solve the issue on ARM and RISC-V would > be to access bitfield members with the mode of the bitfield member's > enclosing type instead of the smallest mode that can hold the bitwidth of the > type. If we had a char or short member in the struct, I can understand the > use of QI and HI mode, as we would need narrorwer accesses due to alignment > issues, but in this case the member is in int and one would expect this to > expand to SI mode accesses if the enclosing type is SI mode. It appears that 64-bit targets which promote to an SI mode subregister of DI mode register prevents combine from coalescing the shift and sign_extend. The only difference I can spot between RV32 which coalesces the shift during combine, and RV64 and other 64-bit targets which do not coalesce the shifts during combine, is that the 64-bit targets have promoted the shift operand to a SI mode subregister of a DI mode register, whereas on RV 32 the shift operand is simply an SI mode register. I suspect there is some code in simplify-rtx.c that is missing the sub register case however I’m still trying to isolate the code that coalesces shift and sign_extend. I’ve found a similar, perhaps smaller, but related case where a shift is not coalesced however in this case it is also not coalesced on RV32 https://cx.rv8.io/g/fPdk2F > riscv64: > > sx5(int): > slliw a0,a0,3 > slliw a0,a0,24 > sraiw a0,a0,24 > sraiw a0,a0,3 > ret > > sx17(int): > slliw a0,a0,15 > sraiw a0,a0,15 > ret > > riscv32: > > sx5(int): > slli a0,a0,27 > srai a0,a0,27 > ret > > sx17(int): > slli a0,a0,15 > srai a0,a0,15 > ret > > aarch64: > > sx5(int): > sbfiz w0, w0, 3, 5 > asr w0, w0, 3 > ret > > sx17(int):
Re: Bit-field struct member sign extension pattern results in redundant
> On 19 Aug 2017, at 4:10 AM, Richard Hendersonwrote: > > On 08/17/2017 03:29 PM, Michael Clark wrote: >> hand coded x86 asm (no worse because the sar depends on the lea) >> >> sx5(int): >>shl edi, 27 >>sar edi, 27 >>movsx eax, dl > > Typo in the register, but I know what you mean. More interestingly, edi > already has the sign-extended value, so "mov eax, edi" sufficies (saving one > byte or perhaps allowing better register allocation). > > That said, if anyone is tweaking x86-64 patterns on this, consider > > sx5(int): > rorxeax, edi, 5 > sar eax, 27 > > where the (newish) bmi2 rorx instruction allows the data to be moved into > place > while rotating, avoiding the extra move entirely. The patterns might be hard to match unless we can get the gimple expansions for bitfield access to use the mode of the enclosing type. For bitfield accesses to SI mode fields under 8 bits on RISC-V, gcc is generating two shifts on QI mode sub-registers, each with a sign-extend. For bitfield accesses to SI mode fields under 16 bits on RISC-V, gcc is generating two shifts on HI mode sub-registers, each with a sign-extend. The bitfield expansion logic is selecting the narrowest type that can hold the bitwidth of the field, versus the bitwidth of the enclosing type and this appears to be in the gimple to rtl transform, possibly in expr.c. Using the mode of the enclosing type for bitfield member access expansion would likely solve this problem. I suspect that with the use of caches and word sized register accesses, that this sort of change would be reasonable to make for 32-bit and 64-bit targets. I also noticed something I didn’t spot earlier. On RV32 the sign extend and shifts are correctly coalesced into 27 bit shifts in the combine stage. We are presently looking into another redundant sign extension issue on RV64 that could potentially be related. It could be that the shift coalescing optimisation doesn’t happen unless the redundant sign extensions are eliminated early in combine by simplify_rtx. i.e. the pattern is more complex due to sign_extend ops that are not eliminated. - https://cx.rv8.io/g/2FxpNw RV64 and Aarch64 both appear to have the issue but with different expansions for the shift and extend pattern due to the mode of access (QI or HI). Field accesses above 16 bits create SI mode accesses and generate the expected code. The RISC-V compiler has the #define SLOW_BYTE_ACCESS 1 patch however it appears to make no difference in this case. SLOW_BYTE_ACCESS suppresses QI mode and HI mode loads in some bitfield test cases when a struct is passed by pointer but has no effect on this particular issue. This shows the codegen for the fix to the SLOW_BYTE_ACCESS issue. i.e. proof that the compiler as SLOW_BYTE_ACCESS defined to 1. - https://cx.rv8.io/g/TyXnoG A target independent fix that would solve the issue on ARM and RISC-V would be to access bitfield members with the mode of the bitfield member's enclosing type instead of the smallest mode that can hold the bitwidth of the type. If we had a char or short member in the struct, I can understand the use of QI and HI mode, as we would need narrorwer accesses due to alignment issues, but in this case the member is in int and one would expect this to expand to SI mode accesses if the enclosing type is SI mode. riscv64: sx5(int): slliw a0,a0,3 slliw a0,a0,24 sraiw a0,a0,24 sraiw a0,a0,3 ret sx17(int): slliw a0,a0,15 sraiw a0,a0,15 ret riscv32: sx5(int): slli a0,a0,27 srai a0,a0,27 ret sx17(int): slli a0,a0,15 srai a0,a0,15 ret aarch64: sx5(int): sbfiz w0, w0, 3, 5 asr w0, w0, 3 ret sx17(int): sbfx w0, w0, 0, 17 ret
Re: Bit-field struct member sign extension pattern results in redundant
On 08/17/2017 03:29 PM, Michael Clark wrote: > hand coded x86 asm (no worse because the sar depends on the lea) > > sx5(int): > shl edi, 27 > sar edi, 27 > movsx eax, dl Typo in the register, but I know what you mean. More interestingly, edi already has the sign-extended value, so "mov eax, edi" sufficies (saving one byte or perhaps allowing better register allocation). That said, if anyone is tweaking x86-64 patterns on this, consider sx5(int): rorxeax, edi, 5 sar eax, 27 where the (newish) bmi2 rorx instruction allows the data to be moved into place while rotating, avoiding the extra move entirely. r~
Re: Bit-field struct member sign extension pattern results in redundant
On Fri, 2017-08-18 at 10:29 +1200, Michael Clark wrote: > > This one is quite interesting: > > - https://cx.rv8.io/g/WXWMTG > > It’s another target independent bug. x86 is using some LEA followed > by SAR trick with a 3 bit shift. Surely SHL 27, SAR 27 would suffice. > In any case RISC-V seems like a nice target to try to fix this > codegen for, as its less risk than attempting a fix in x86 ;-) > > - https://github.com/riscv/riscv-gcc/issues/89 > > code: > > template > inline T signextend(const T x) > { > struct {T x:B;} s; > return s.x = x; > } > > int sx5(int x) { > return signextend(x); > } > > riscv asm: > > sx5(int): > slliw a0,a0,3 > slliw a0,a0,24 > sraiw a0,a0,24 > sraiw a0,a0,3 > ret > > hand coded riscv asm > > sx5(int): > slliw a0,a0,27 > sraiw a0,a0,27 > ret > Maybe related ... https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67644 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50521 Cheers, Oleg
[OT] Re: Bit-field struct member sign extension pattern results in redundant
On Fri, Aug 18, 2017 at 10:56:10PM +1200, Michael Clark wrote: > > > On 18 Aug 2017, at 10:41 PM, Gabriel Paubertwrote: > > > > On Fri, Aug 18, 2017 at 10:29:04AM +1200, Michael Clark wrote: > >> Sorry I had to send again as my Apple mailer is munging emails. I’ve > >> disabled RTF. > >> > >> > >> This one is quite interesting: > >> > >> - https://cx.rv8.io/g/WXWMTG > >> > >> It’s another target independent bug. x86 is using some LEA followed by SAR > >> trick with a 3 bit shift. Surely SHL 27, SAR 27 would suffice. In any case > >> RISC-V seems like a nice target to try to fix this codegen for, as its > >> less risk than attempting a fix in x86 ;-) > >> > >> - https://github.com/riscv/riscv-gcc/issues/89 > >> > >> code: > >> > >>template > >>inline T signextend(const T x) > >>{ > >>struct {T x:B;} s; > >>return s.x = x; > >>} > >> > >>int sx5(int x) { > >>return signextend(x); > >>} > >> > >> riscv asm: > >> > >>sx5(int): > >> slliw a0,a0,3 > >> slliw a0,a0,24 > >> sraiw a0,a0,24 > >> sraiw a0,a0,3 > >> ret > >> > >> hand coded riscv asm > >> > >>sx5(int): > >> slliw a0,a0,27 > >> sraiw a0,a0,27 > >> ret > >> > >> x86 asm: > >> > >>sx5(int): > >> lea eax, [0+rdi*8] > >> sar al, 3 > >> movsx eax, al > >> ret > >> > >> hand coded x86 asm (no worse because the sar depends on the lea) > >> > >>sx5(int): > >> shl edi, 27 > >> sar edi, 27 > >> movsx eax, dl > > > > Huh? dl is not a subreg of edi! > > > > s/edi/edx/ and it may work. > > > > dil can also be used, but only on 64 bit. > > Sorry I meant dil on x86-64. I was sure that it was possible to extend into > another register. It is, but only from the first 4 registers on 32 bit: AL, BL, CL, and DL can be accessed as 8 bit registers, as well as the high part of the 16 bit registers: AH, BH, CH and DH. This is essentially a left-over of the 16 bit processors (8086 to 80286), carried over without change in the 32 bit processors. 64 bit extensions made registers more orthogonal in this respect, but there is still some weirdness due to historical reasons. > I have not done much i386 asm so I am unaware of the constraints. Fortunate you are, especially since you hopefully did not have to live through the 16 bit nightmare of segments and near and far addresses. >Can the source and dest registers for movsx not differ on i386? I thought they >could. They can, but only from the first 4 registers. On the other hand movsx eax,dh (in Intel's syntax) is perfectly valid. > In any case, the plot thickens… > > I believe we have bugs on both RISC-V and Aarch64. Maybe, I don't know either arch well enough, but RISC-V looks intuitive enough to understand that it is bad both in terms of performance and code size. Another arch you could try is Power. > I found that it at least appears like it is transitioning to a char or short > as the break is at 24 and 16 depending on the width, and values over 16 work > as one would expect. > > Here is an updated test program: https://cx.rv8.io/g/M9ewNf > > template > inline T signextend(const T x) > { > struct {T x:B;} s; > return s.x = x; > } > > int sx3(int x) { return signextend(x); } > int sx5(int x) { return signextend(x); } > int sx11(int x) { return signextend(x); } > int sx14(int x) { return signextend(x); } > int sx19(int x) { return signextend(x); } > > I filed a bug on riscv-gcc but I think it is target independent code given > there appears to be an issue on Aarch64. AFAICT, Aarch64 should generate a > single sbfx for all of the test functions. > > - https://github.com/riscv/riscv-gcc/issues/89 > > Should I file a bug on GCC bugzilla given it looks to be target independent? I think so, but you should get confirmation from someone else. > > On RISC-V, the codegen is much more obviously wrong, but essentially the same > thing is happening on Aarch64 but there is only one additional instruction > instead of two. > > sx3(int): > slliw a0,a0,5 > slliw a0,a0,24 > sraiw a0,a0,24 > sraiw a0,a0,5 > ret > sx5(int): > slliw a0,a0,3 > slliw a0,a0,24 > sraiw a0,a0,24 > sraiw a0,a0,3 > ret > sx11(int): > slliw a0,a0,5 > slliw a0,a0,16 > sraiw a0,a0,16 > sraiw a0,a0,5 > ret > sx14(int): > slliw a0,a0,2 > slliw a0,a0,16 > sraiw a0,a0,16 > sraiw a0,a0,2 > ret > sx19(int): > slliw a0,a0,13 > sraiw a0,a0,13 > ret > >
Re: Bit-field struct member sign extension pattern results in redundant
> On 18 Aug 2017, at 10:56 PM, Michael Clarkwrote: > >> >> On 18 Aug 2017, at 10:41 PM, Gabriel Paubert wrote: >> >> On Fri, Aug 18, 2017 at 10:29:04AM +1200, Michael Clark wrote: >>> Sorry I had to send again as my Apple mailer is munging emails. I’ve >>> disabled RTF. >>> >>> >>> This one is quite interesting: >>> >>> - https://cx.rv8.io/g/WXWMTG >>> >>> It’s another target independent bug. x86 is using some LEA followed by SAR >>> trick with a 3 bit shift. Surely SHL 27, SAR 27 would suffice. In any case >>> RISC-V seems like a nice target to try to fix this codegen for, as its less >>> risk than attempting a fix in x86 ;-) >>> >>> - https://github.com/riscv/riscv-gcc/issues/89 >>> >>> code: >>> >>> template >>> inline T signextend(const T x) >>> { >>> struct {T x:B;} s; >>> return s.x = x; >>> } >>> >>> int sx5(int x) { >>> return signextend(x); >>> } >>> >>> riscv asm: >>> >>> sx5(int): >>> slliw a0,a0,3 >>> slliw a0,a0,24 >>> sraiw a0,a0,24 >>> sraiw a0,a0,3 >>> ret >>> >>> hand coded riscv asm >>> >>> sx5(int): >>> slliw a0,a0,27 >>> sraiw a0,a0,27 >>> ret >>> >>> x86 asm: >>> >>> sx5(int): >>> lea eax, [0+rdi*8] >>> sar al, 3 >>> movsx eax, al >>> ret >>> >>> hand coded x86 asm (no worse because the sar depends on the lea) >>> >>> sx5(int): >>> shl edi, 27 >>> sar edi, 27 >>> movsx eax, dl >> >> Huh? dl is not a subreg of edi! >> >> s/edi/edx/ and it may work. >> >> dil can also be used, but only on 64 bit. In any case it is more straightforward given we are just dealing with an int bitfield member in this case so a regular move would suffice. The mov gets retired in rename (movsx and movzx can both be retired in rename I think). The same pattern could be used for all 5 function examples, with predictable performance, versus 5 different solutions for the 5 different bitfield positions (that’s actually very amazing that the pattern matching on x86 finds 5 unique ways to lower this code pattern). Padon my Intel syntax. e.g. sx5(int): shl edi, 27 sar edi, 27 mov eax, edi However I’m more interested in what the fix would be for RISC-V and Aarch64. It’s less obvious how to fix it on x86-64 because there are too many different ways. On RISC-V there is only one obvious way and that is to not artificially narrow the type when we are extracting from an int aligned struct bit field member to an int. Likewise on Aarch64, one sbfiz instruction. > Sorry I meant dil on x86-64. I was sure that it was possible to extend into > another register. I have not done much i386 asm so I am unaware of the > constraints. Can the source and dest registers for movsx not differ on i386? > I thought they could. > > In any case, the plot thickens… > > I believe we have bugs on both RISC-V and Aarch64. > > I found that it at least appears like it is transitioning to a char or short > as the break is at 24 and 16 depending on the width, and values over 16 work > as one would expect. > > Here is an updated test program: https://cx.rv8.io/g/M9ewNf > > template > inline T signextend(const T x) > { > struct {T x:B;} s; > return s.x = x; > } > > int sx3(int x) { return signextend(x); } > int sx5(int x) { return signextend(x); } > int sx11(int x) { return signextend(x); } > int sx14(int x) { return signextend(x); } > int sx19(int x) { return signextend(x); } > > I filed a bug on riscv-gcc but I think it is target independent code given > there appears to be an issue on Aarch64. AFAICT, Aarch64 should generate a > single sbfx for all of the test functions. > > - https://github.com/riscv/riscv-gcc/issues/89 > > Should I file a bug on GCC bugzilla given it looks to be target independent? > > On RISC-V, the codegen is much more obviously wrong, but essentially the same > thing is happening on Aarch64 but there is only one additional instruction > instead of two. > > sx3(int): > slliw a0,a0,5 > slliw a0,a0,24 > sraiw a0,a0,24 > sraiw a0,a0,5 > ret > sx5(int): > slliw a0,a0,3 > slliw a0,a0,24 > sraiw a0,a0,24 > sraiw a0,a0,3 > ret > sx11(int): > slliw a0,a0,5 > slliw a0,a0,16 > sraiw a0,a0,16 > sraiw a0,a0,5 > ret > sx14(int): > slliw a0,a0,2 > slliw a0,a0,16 > sraiw a0,a0,16 > sraiw a0,a0,2 > ret > sx19(int): > slliw a0,a0,13 > sraiw a0,a0,13 > ret
Re: Bit-field struct member sign extension pattern results in redundant
> On 18 Aug 2017, at 10:41 PM, Gabriel Paubertwrote: > > On Fri, Aug 18, 2017 at 10:29:04AM +1200, Michael Clark wrote: >> Sorry I had to send again as my Apple mailer is munging emails. I’ve >> disabled RTF. >> >> >> This one is quite interesting: >> >> - https://cx.rv8.io/g/WXWMTG >> >> It’s another target independent bug. x86 is using some LEA followed by SAR >> trick with a 3 bit shift. Surely SHL 27, SAR 27 would suffice. In any case >> RISC-V seems like a nice target to try to fix this codegen for, as its less >> risk than attempting a fix in x86 ;-) >> >> - https://github.com/riscv/riscv-gcc/issues/89 >> >> code: >> >> template >> inline T signextend(const T x) >> { >> struct {T x:B;} s; >> return s.x = x; >> } >> >> int sx5(int x) { >> return signextend(x); >> } >> >> riscv asm: >> >> sx5(int): >>slliw a0,a0,3 >>slliw a0,a0,24 >>sraiw a0,a0,24 >>sraiw a0,a0,3 >>ret >> >> hand coded riscv asm >> >> sx5(int): >>slliw a0,a0,27 >>sraiw a0,a0,27 >>ret >> >> x86 asm: >> >> sx5(int): >>lea eax, [0+rdi*8] >>sar al, 3 >>movsx eax, al >>ret >> >> hand coded x86 asm (no worse because the sar depends on the lea) >> >> sx5(int): >>shl edi, 27 >>sar edi, 27 >>movsx eax, dl > > Huh? dl is not a subreg of edi! > > s/edi/edx/ and it may work. > > dil can also be used, but only on 64 bit. Sorry I meant dil on x86-64. I was sure that it was possible to extend into another register. I have not done much i386 asm so I am unaware of the constraints. Can the source and dest registers for movsx not differ on i386? I thought they could. In any case, the plot thickens… I believe we have bugs on both RISC-V and Aarch64. I found that it at least appears like it is transitioning to a char or short as the break is at 24 and 16 depending on the width, and values over 16 work as one would expect. Here is an updated test program: https://cx.rv8.io/g/M9ewNf template inline T signextend(const T x) { struct {T x:B;} s; return s.x = x; } int sx3(int x) { return signextend(x); } int sx5(int x) { return signextend(x); } int sx11(int x) { return signextend(x); } int sx14(int x) { return signextend(x); } int sx19(int x) { return signextend(x); } I filed a bug on riscv-gcc but I think it is target independent code given there appears to be an issue on Aarch64. AFAICT, Aarch64 should generate a single sbfx for all of the test functions. - https://github.com/riscv/riscv-gcc/issues/89 Should I file a bug on GCC bugzilla given it looks to be target independent? On RISC-V, the codegen is much more obviously wrong, but essentially the same thing is happening on Aarch64 but there is only one additional instruction instead of two. sx3(int): slliw a0,a0,5 slliw a0,a0,24 sraiw a0,a0,24 sraiw a0,a0,5 ret sx5(int): slliw a0,a0,3 slliw a0,a0,24 sraiw a0,a0,24 sraiw a0,a0,3 ret sx11(int): slliw a0,a0,5 slliw a0,a0,16 sraiw a0,a0,16 sraiw a0,a0,5 ret sx14(int): slliw a0,a0,2 slliw a0,a0,16 sraiw a0,a0,16 sraiw a0,a0,2 ret sx19(int): slliw a0,a0,13 sraiw a0,a0,13 ret
Re: Bit-field struct member sign extension pattern results in redundant
On Fri, Aug 18, 2017 at 10:29:04AM +1200, Michael Clark wrote: > Sorry I had to send again as my Apple mailer is munging emails. I’ve disabled > RTF. > > > This one is quite interesting: > > - https://cx.rv8.io/g/WXWMTG > > It’s another target independent bug. x86 is using some LEA followed by SAR > trick with a 3 bit shift. Surely SHL 27, SAR 27 would suffice. In any case > RISC-V seems like a nice target to try to fix this codegen for, as its less > risk than attempting a fix in x86 ;-) > > - https://github.com/riscv/riscv-gcc/issues/89 > > code: > > template > inline T signextend(const T x) > { > struct {T x:B;} s; > return s.x = x; > } > > int sx5(int x) { > return signextend(x); > } > > riscv asm: > > sx5(int): > slliw a0,a0,3 > slliw a0,a0,24 > sraiw a0,a0,24 > sraiw a0,a0,3 > ret > > hand coded riscv asm > > sx5(int): > slliw a0,a0,27 > sraiw a0,a0,27 > ret > > x86 asm: > > sx5(int): > lea eax, [0+rdi*8] > sar al, 3 > movsx eax, al > ret > > hand coded x86 asm (no worse because the sar depends on the lea) > > sx5(int): > shl edi, 27 > sar edi, 27 > movsx eax, dl Huh? dl is not a subreg of edi! s/edi/edx/ and it may work. dil can also be used, but only on 64 bit. Gabriel
Re: Bit-field struct member sign extension pattern results in redundant
> On 18 Aug 2017, at 11:13 AM, Michael Clarkwrote: > > So it is a bug on arm too? and can be done with one sbfiz instruction? > (assuming I’m understand sbfiz from my first reading) e.g. > > sbfiz w0, w0, 0, 2 >ret Getting my 3’s and 5’s swapped. Confused by gcc. sbfiz w0, w0, 0, 4 >> Which is: >> (insn 7 6 8 2 (set (reg:SI 80) >> (sign_extend:SI (ashift:QI (reg:QI 0 x0 [ x+3 ]) >> (const_int 3 [0x3] t.cc:5 557 {*extendsi_ashlqi} >>(expr_list:REG_DEAD (reg:SI 0 x0 [ x ]) >> (nil))) >> >> (insn 14 9 17 2 (set (reg/i:SI 0 x0) >> (ashiftrt:SI (reg:SI 80) >> (const_int 3 [0x3]))) t.cc:10 530 {*aarch64_ashr_sisd_or_int_si3} >>(expr_list:REG_DEAD (reg:SI 80) >> (nil))) >> >> What I suspect is we get a QImode register in there and that is what >> is happening to both x86 and riscv. >
Re: Bit-field struct member sign extension pattern results in redundant
> On 18 Aug 2017, at 10:41 AM, Andrew Pinskiwrote: > > On Thu, Aug 17, 2017 at 3:29 PM, Michael Clark wrote: >> Sorry I had to send again as my Apple mailer is munging emails. I’ve >> disabled RTF. >> >> >> This one is quite interesting: >> >> - https://cx.rv8.io/g/WXWMTG >> >> It’s another target independent bug. x86 is using some LEA followed by SAR >> trick with a 3 bit shift. Surely SHL 27, SAR 27 would suffice. In any case >> RISC-V seems like a nice target to try to fix this codegen for, as its less >> risk than attempting a fix in x86 ;-) >> >> - https://github.com/riscv/riscv-gcc/issues/89 >> >> code: >> >>template >>inline T signextend(const T x) >>{ >>struct {T x:B;} s; >>return s.x = x; >>} >> >>int sx5(int x) { >>return signextend(x); >>} > > on AARCH64 I get: >sbfiz w0, w0, 3, 5 >asr w0, w0, 3 >ret So it is a bug on arm too? and can be done with one sbfiz instruction? (assuming I’m understand sbfiz from my first reading) e.g. sbfiz w0, w0, 0, 2 ret > Which is: > (insn 7 6 8 2 (set (reg:SI 80) >(sign_extend:SI (ashift:QI (reg:QI 0 x0 [ x+3 ]) >(const_int 3 [0x3] t.cc:5 557 {*extendsi_ashlqi} > (expr_list:REG_DEAD (reg:SI 0 x0 [ x ]) >(nil))) > > (insn 14 9 17 2 (set (reg/i:SI 0 x0) >(ashiftrt:SI (reg:SI 80) >(const_int 3 [0x3]))) t.cc:10 530 {*aarch64_ashr_sisd_or_int_si3} > (expr_list:REG_DEAD (reg:SI 80) >(nil))) > > What I suspect is we get a QImode register in there and that is what > is happening to both x86 and riscv. Curious. Thanks. I still need to learn more about gcc pattern matching and lowering. I actually have this exact code pattern using a struct member to do arbitrary width sign extend as it’s the first method for arbitrary bit width extension, in Google and on the Stanford Bit Twiddling Hacks page. I understand i’m better off changing my pattern to an explicit shift left and shift right, given I’m now aware this is unambiguously the optimal lowering for ISAs with 1 cycle shifters and no field extract instructions, however given others are likely to stumble on the Stanford page, I wouldn’t be surprised if i’m not the only one using this pattern: - https://graphics.stanford.edu/~seander/bithacks.html#FixedSignExtend The x86 sequence is kind of trick and it’s no better off in instruction count by using two shifts. I’m also annoyed that BZHI in BMI2 is an arbitrary width zero extend versus a bit extend. I’d like a BEXT-like instruction that copies a bit at offset n, but given it is only two instruction saving it should also do a right shift so it can be used for signed and unsigned field extract (with a bit copy or bit zero operand bit). Thanks, Michael.
Re: Bit-field struct member sign extension pattern results in redundant
On Thu, Aug 17, 2017 at 3:29 PM, Michael Clarkwrote: > Sorry I had to send again as my Apple mailer is munging emails. I’ve disabled > RTF. > > > This one is quite interesting: > > - https://cx.rv8.io/g/WXWMTG > > It’s another target independent bug. x86 is using some LEA followed by SAR > trick with a 3 bit shift. Surely SHL 27, SAR 27 would suffice. In any case > RISC-V seems like a nice target to try to fix this codegen for, as its less > risk than attempting a fix in x86 ;-) > > - https://github.com/riscv/riscv-gcc/issues/89 > > code: > > template > inline T signextend(const T x) > { > struct {T x:B;} s; > return s.x = x; > } > > int sx5(int x) { > return signextend(x); > } on AARCH64 I get: sbfiz w0, w0, 3, 5 asr w0, w0, 3 ret Which is: (insn 7 6 8 2 (set (reg:SI 80) (sign_extend:SI (ashift:QI (reg:QI 0 x0 [ x+3 ]) (const_int 3 [0x3] t.cc:5 557 {*extendsi_ashlqi} (expr_list:REG_DEAD (reg:SI 0 x0 [ x ]) (nil))) (insn 14 9 17 2 (set (reg/i:SI 0 x0) (ashiftrt:SI (reg:SI 80) (const_int 3 [0x3]))) t.cc:10 530 {*aarch64_ashr_sisd_or_int_si3} (expr_list:REG_DEAD (reg:SI 80) (nil))) What I suspect is we get a QImode register in there and that is what is happening to both x86 and riscv. Thanks, Andrew > > riscv asm: > > sx5(int): > slliw a0,a0,3 > slliw a0,a0,24 > sraiw a0,a0,24 > sraiw a0,a0,3 > ret > > hand coded riscv asm > > sx5(int): > slliw a0,a0,27 > sraiw a0,a0,27 > ret > > x86 asm: > > sx5(int): > lea eax, [0+rdi*8] > sar al, 3 > movsx eax, al > ret > > hand coded x86 asm (no worse because the sar depends on the lea) > > sx5(int): > shl edi, 27 > sar edi, 27 > movsx eax, dl > ret