Re: Bit-field struct member sign extension pattern results in redundant

2017-09-14 Thread Michael Clark

> On 5 Sep 2017, at 9:35 AM, Michael Clark  wrote:
> 
>> 
>> On 19 Aug 2017, at 4:10 AM, Richard Henderson  wrote:
>> 
>> On 08/17/2017 03:29 PM, Michael Clark wrote:
>>> hand coded x86 asm (no worse because the sar depends on the lea)
>>> 
>>> sx5(int):
>>>   shl edi, 27
>>>   sar edi, 27
>>>   movsx eax, dl
>> 
>> Typo in the register, but I know what you mean.  More interestingly, edi
>> already has the sign-extended value, so "mov eax, edi" sufficies (saving one
>> byte or perhaps allowing better register allocation).
>> 
>> That said, if anyone is tweaking x86-64 patterns on this, consider
>> 
>> sx5(int):
>>  rorxeax, edi, 5
>>  sar eax, 27
>> 
>> where the (newish) bmi2 rorx instruction allows the data to be moved into 
>> place
>> while rotating, avoiding the extra move entirely.
> 
> The patterns might be hard to match unless we can get the gimple expansions 
> for bitfield access to use the mode of the enclosing type.
> 
> For bitfield accesses to SI mode fields under 8 bits on RISC-V, gcc is 
> generating two shifts on QI mode sub-registers, each with a sign-extend.
> 
> For bitfield accesses to SI mode fields under 16 bits on RISC-V, gcc is 
> generating two shifts on HI mode sub-registers, each with a sign-extend.
> 
> The bitfield expansion logic is selecting the narrowest type that can hold 
> the bitwidth of the field, versus the bitwidth of the enclosing type and this 
> appears to be in the gimple to rtl transform, possibly in expr.c.
> 
> Using the mode of the enclosing type for bitfield member access expansion 
> would likely solve this problem. I suspect that with the use of caches and 
> word sized register accesses, that this sort of change would be reasonable to 
> make for 32-bit and 64-bit targets.
> 
> I also noticed something I didn’t spot earlier. On RV32 the sign extend and 
> shifts are correctly coalesced into 27 bit shifts in the combine stage. We 
> are presently looking into another redundant sign extension issue on RV64 
> that could potentially be related. It could be that the shift coalescing 
> optimisation doesn’t happen unless the redundant sign extensions are 
> eliminated early in combine by simplify_rtx. i.e. the pattern is more complex 
> due to sign_extend ops that are not eliminated.
> 
> - https://cx.rv8.io/g/2FxpNw
> 
> RV64 and Aarch64 both appear to have the issue but with different expansions 
> for the shift and extend pattern due to the mode of access (QI or HI). Field 
> accesses above 16 bits create SI mode accesses and generate the expected 
> code. The RISC-V compiler has the #define SLOW_BYTE_ACCESS 1 patch however it 
> appears to make no difference in this case. SLOW_BYTE_ACCESS suppresses QI 
> mode and HI mode loads in some bitfield test cases when a struct is passed by 
> pointer but has no effect on this particular issue. This shows the codegen 
> for the fix to the SLOW_BYTE_ACCESS issue. i.e. proof that the compiler as 
> SLOW_BYTE_ACCESS defined to 1.
> 
> - https://cx.rv8.io/g/TyXnoG
> 
> A target independent fix that would solve the issue on ARM and RISC-V would 
> be to access bitfield members with the mode of the bitfield member's 
> enclosing type instead of the smallest mode that can hold the bitwidth of the 
> type. If we had a char or short member in the struct, I can understand the 
> use of QI and HI mode, as we would need narrorwer accesses due to alignment 
> issues, but in this case the member is in int and one would expect this to 
> expand to SI mode accesses if the enclosing type is SI mode.

It appears that 64-bit targets which promote to an SI mode subregister of DI 
mode register prevents combine from coalescing the shift and sign_extend. The 
only difference I can spot between RV32 which coalesces the shift during 
combine, and RV64 and other 64-bit targets which do not coalesce the shifts 
during combine, is that the 64-bit targets have promoted the shift operand to a 
SI mode subregister of a DI mode register, whereas on RV 32 the shift operand 
is simply an SI mode register. I suspect there is some code in simplify-rtx.c 
that is missing the sub register case however I’m still trying to isolate the 
code that coalesces shift and sign_extend.

I’ve found a similar, perhaps smaller, but related case where a shift is not 
coalesced however in this case it is also not coalesced on RV32

https://cx.rv8.io/g/fPdk2F

> riscv64:
> 
>   sx5(int):
> slliw a0,a0,3
> slliw a0,a0,24
> sraiw a0,a0,24
> sraiw a0,a0,3
> ret
> 
>   sx17(int):
> slliw a0,a0,15
> sraiw a0,a0,15
> ret
> 
> riscv32:
> 
>   sx5(int):
> slli a0,a0,27
> srai a0,a0,27
> ret
> 
>   sx17(int):
> slli a0,a0,15
> srai a0,a0,15
> ret
> 
> aarch64:
> 
>   sx5(int):
> sbfiz w0, w0, 3, 5
> asr w0, w0, 3
> ret
> 
>   sx17(int):

Re: Bit-field struct member sign extension pattern results in redundant

2017-09-04 Thread Michael Clark

> On 19 Aug 2017, at 4:10 AM, Richard Henderson  wrote:
> 
> On 08/17/2017 03:29 PM, Michael Clark wrote:
>> hand coded x86 asm (no worse because the sar depends on the lea)
>> 
>>  sx5(int):
>>shl edi, 27
>>sar edi, 27
>>movsx eax, dl
> 
> Typo in the register, but I know what you mean.  More interestingly, edi
> already has the sign-extended value, so "mov eax, edi" sufficies (saving one
> byte or perhaps allowing better register allocation).
> 
> That said, if anyone is tweaking x86-64 patterns on this, consider
> 
> sx5(int):
>   rorxeax, edi, 5
>   sar eax, 27
> 
> where the (newish) bmi2 rorx instruction allows the data to be moved into 
> place
> while rotating, avoiding the extra move entirely.

The patterns might be hard to match unless we can get the gimple expansions for 
bitfield access to use the mode of the enclosing type.

For bitfield accesses to SI mode fields under 8 bits on RISC-V, gcc is 
generating two shifts on QI mode sub-registers, each with a sign-extend.

For bitfield accesses to SI mode fields under 16 bits on RISC-V, gcc is 
generating two shifts on HI mode sub-registers, each with a sign-extend.

The bitfield expansion logic is selecting the narrowest type that can hold the 
bitwidth of the field, versus the bitwidth of the enclosing type and this 
appears to be in the gimple to rtl transform, possibly in expr.c.

Using the mode of the enclosing type for bitfield member access expansion would 
likely solve this problem. I suspect that with the use of caches and word sized 
register accesses, that this sort of change would be reasonable to make for 
32-bit and 64-bit targets.

I also noticed something I didn’t spot earlier. On RV32 the sign extend and 
shifts are correctly coalesced into 27 bit shifts in the combine stage. We are 
presently looking into another redundant sign extension issue on RV64 that 
could potentially be related. It could be that the shift coalescing 
optimisation doesn’t happen unless the redundant sign extensions are eliminated 
early in combine by simplify_rtx. i.e. the pattern is more complex due to 
sign_extend ops that are not eliminated.

- https://cx.rv8.io/g/2FxpNw

RV64 and Aarch64 both appear to have the issue but with different expansions 
for the shift and extend pattern due to the mode of access (QI or HI). Field 
accesses above 16 bits create SI mode accesses and generate the expected code. 
The RISC-V compiler has the #define SLOW_BYTE_ACCESS 1 patch however it appears 
to make no difference in this case. SLOW_BYTE_ACCESS suppresses QI mode and HI 
mode loads in some bitfield test cases when a struct is passed by pointer but 
has no effect on this particular issue. This shows the codegen for the fix to 
the SLOW_BYTE_ACCESS issue. i.e. proof that the compiler as SLOW_BYTE_ACCESS 
defined to 1.

- https://cx.rv8.io/g/TyXnoG

A target independent fix that would solve the issue on ARM and RISC-V would be 
to access bitfield members with the mode of the bitfield member's enclosing 
type instead of the smallest mode that can hold the bitwidth of the type. If we 
had a char or short member in the struct, I can understand the use of QI and HI 
mode, as we would need narrorwer accesses due to alignment issues, but in this 
case the member is in int and one would expect this to expand to SI mode 
accesses if the enclosing type is SI mode.


riscv64:

sx5(int):
  slliw a0,a0,3
  slliw a0,a0,24
  sraiw a0,a0,24
  sraiw a0,a0,3
  ret

sx17(int):
  slliw a0,a0,15
  sraiw a0,a0,15
  ret

riscv32:

sx5(int):
  slli a0,a0,27
  srai a0,a0,27
  ret

sx17(int):
  slli a0,a0,15
  srai a0,a0,15
  ret

aarch64:

sx5(int):
  sbfiz w0, w0, 3, 5
  asr w0, w0, 3
  ret

sx17(int):
  sbfx w0, w0, 0, 17
  ret




Re: Bit-field struct member sign extension pattern results in redundant

2017-08-18 Thread Richard Henderson
On 08/17/2017 03:29 PM, Michael Clark wrote:
> hand coded x86 asm (no worse because the sar depends on the lea)
> 
>   sx5(int):
> shl edi, 27
> sar edi, 27
> movsx eax, dl

Typo in the register, but I know what you mean.  More interestingly, edi
already has the sign-extended value, so "mov eax, edi" sufficies (saving one
byte or perhaps allowing better register allocation).

That said, if anyone is tweaking x86-64 patterns on this, consider

sx5(int):
rorxeax, edi, 5
sar eax, 27

where the (newish) bmi2 rorx instruction allows the data to be moved into place
while rotating, avoiding the extra move entirely.


r~


Re: Bit-field struct member sign extension pattern results in redundant

2017-08-18 Thread Oleg Endo
On Fri, 2017-08-18 at 10:29 +1200, Michael Clark wrote:
> 
> This one is quite interesting:
> 
> - https://cx.rv8.io/g/WXWMTG
> 
> It’s another target independent bug. x86 is using some LEA followed
> by SAR trick with a 3 bit shift. Surely SHL 27, SAR 27 would suffice.
> In any case RISC-V seems like a nice target to try to fix this
> codegen for, as its less risk than attempting a fix in x86 ;-)
> 
> - https://github.com/riscv/riscv-gcc/issues/89
> 
> code:
> 
>   template 
>   inline T signextend(const T x)
>   {
>   struct {T x:B;} s;
>   return s.x = x;
>   }
> 
>   int sx5(int x) {
>   return signextend(x);
>   }
> 
> riscv asm:
> 
>   sx5(int):
>     slliw a0,a0,3
>     slliw a0,a0,24
>     sraiw a0,a0,24
>     sraiw a0,a0,3
>     ret
> 
> hand coded riscv asm
> 
>   sx5(int):
>     slliw a0,a0,27
>     sraiw a0,a0,27
>     ret
> 

Maybe related ...

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67644
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50521


Cheers,
Oleg


[OT] Re: Bit-field struct member sign extension pattern results in redundant

2017-08-18 Thread Gabriel Paubert
On Fri, Aug 18, 2017 at 10:56:10PM +1200, Michael Clark wrote:
> 
> > On 18 Aug 2017, at 10:41 PM, Gabriel Paubert  wrote:
> > 
> > On Fri, Aug 18, 2017 at 10:29:04AM +1200, Michael Clark wrote:
> >> Sorry I had to send again as my Apple mailer is munging emails. I’ve 
> >> disabled RTF.
> >> 
> >> 
> >> This one is quite interesting:
> >> 
> >> - https://cx.rv8.io/g/WXWMTG
> >> 
> >> It’s another target independent bug. x86 is using some LEA followed by SAR 
> >> trick with a 3 bit shift. Surely SHL 27, SAR 27 would suffice. In any case 
> >> RISC-V seems like a nice target to try to fix this codegen for, as its 
> >> less risk than attempting a fix in x86 ;-)
> >> 
> >> - https://github.com/riscv/riscv-gcc/issues/89
> >> 
> >> code:
> >> 
> >>template 
> >>inline T signextend(const T x)
> >>{
> >>struct {T x:B;} s;
> >>return s.x = x;
> >>}
> >> 
> >>int sx5(int x) {
> >>return signextend(x);
> >>}
> >> 
> >> riscv asm:
> >> 
> >>sx5(int):
> >>  slliw a0,a0,3
> >>  slliw a0,a0,24
> >>  sraiw a0,a0,24
> >>  sraiw a0,a0,3
> >>  ret
> >> 
> >> hand coded riscv asm
> >> 
> >>sx5(int):
> >>  slliw a0,a0,27
> >>  sraiw a0,a0,27
> >>  ret
> >> 
> >> x86 asm:
> >> 
> >>sx5(int):
> >>  lea eax, [0+rdi*8]
> >>  sar al, 3
> >>  movsx eax, al
> >>  ret
> >> 
> >> hand coded x86 asm (no worse because the sar depends on the lea)
> >> 
> >>sx5(int):
> >>  shl edi, 27
> >>  sar edi, 27
> >>  movsx eax, dl
> > 
> > Huh? dl is not a subreg of edi!
> > 
> > s/edi/edx/ and it may work.
> > 
> > dil can also be used, but only on 64 bit.
> 
> Sorry I meant dil on x86-64. I was sure that it was possible to extend into 
> another register. 

It is, but only from the first 4 registers on 32 bit: AL, BL, CL, and
DL can be accessed as 8 bit registers, as well as the high part of the
16 bit registers: AH, BH, CH and DH.

This is essentially a left-over of the 16 bit processors (8086 to
80286), carried over without change in the 32 bit processors.

64 bit extensions made registers more orthogonal in this respect, but
there is still some weirdness due to historical reasons.

> I have not done much i386 asm so I am unaware of the constraints. 

Fortunate you are, especially since you hopefully did not have to live
through the 16 bit nightmare of segments and near and far addresses.

>Can the source and dest registers for movsx not differ on i386? I thought they 
>could.

They can, but only from the first 4 registers. On the other hand
movsx eax,dh (in Intel's syntax) is perfectly valid.

> In any case, the plot thickens…
> 
> I believe we have bugs on both RISC-V and Aarch64.

Maybe, I don't know either arch well enough, but RISC-V looks intuitive
enough to understand that it is bad both in terms of performance and
code size.

Another arch you could try is Power.

> I found that it at least appears like it is transitioning to a char or short 
> as the break is at 24 and 16 depending on the width, and values over 16 work 
> as one would expect.
> 
> Here is an updated test program: https://cx.rv8.io/g/M9ewNf
> 
>   template 
>   inline T signextend(const T x)
>   {
> struct {T x:B;} s;
> return s.x = x;
>   }
> 
>   int sx3(int x) { return signextend(x); }
>   int sx5(int x) { return signextend(x); }
>   int sx11(int x) { return signextend(x); }
>   int sx14(int x) { return signextend(x); }
>   int sx19(int x) { return signextend(x); }
> 
> I filed a bug on riscv-gcc but I think it is target independent code given 
> there appears to be an issue on Aarch64. AFAICT, Aarch64 should generate a 
> single sbfx for all of the test functions.
> 
> - https://github.com/riscv/riscv-gcc/issues/89
> 
> Should I file a bug on GCC bugzilla given it looks to be target independent?

I think so, but you should get confirmation from someone else.

> 
> On RISC-V, the codegen is much more obviously wrong, but essentially the same 
> thing is happening on Aarch64 but there is only one additional instruction 
> instead of two.
> 
>   sx3(int):
> slliw a0,a0,5
> slliw a0,a0,24
> sraiw a0,a0,24
> sraiw a0,a0,5
> ret
>   sx5(int):
> slliw a0,a0,3
> slliw a0,a0,24
> sraiw a0,a0,24
> sraiw a0,a0,3
> ret
>   sx11(int):
> slliw a0,a0,5
> slliw a0,a0,16
> sraiw a0,a0,16
> sraiw a0,a0,5
> ret
>   sx14(int):
> slliw a0,a0,2
> slliw a0,a0,16
> sraiw a0,a0,16
> sraiw a0,a0,2
> ret
>   sx19(int):
> slliw a0,a0,13
> sraiw a0,a0,13
> ret
> 
> 


Re: Bit-field struct member sign extension pattern results in redundant

2017-08-18 Thread Michael Clark

> On 18 Aug 2017, at 10:56 PM, Michael Clark  wrote:
> 
>> 
>> On 18 Aug 2017, at 10:41 PM, Gabriel Paubert  wrote:
>> 
>> On Fri, Aug 18, 2017 at 10:29:04AM +1200, Michael Clark wrote:
>>> Sorry I had to send again as my Apple mailer is munging emails. I’ve 
>>> disabled RTF.
>>> 
>>> 
>>> This one is quite interesting:
>>> 
>>> - https://cx.rv8.io/g/WXWMTG
>>> 
>>> It’s another target independent bug. x86 is using some LEA followed by SAR 
>>> trick with a 3 bit shift. Surely SHL 27, SAR 27 would suffice. In any case 
>>> RISC-V seems like a nice target to try to fix this codegen for, as its less 
>>> risk than attempting a fix in x86 ;-)
>>> 
>>> - https://github.com/riscv/riscv-gcc/issues/89
>>> 
>>> code:
>>> 
>>> template 
>>> inline T signextend(const T x)
>>> {
>>> struct {T x:B;} s;
>>> return s.x = x;
>>> }
>>> 
>>> int sx5(int x) {
>>> return signextend(x);
>>> }
>>> 
>>> riscv asm:
>>> 
>>> sx5(int):
>>>   slliw a0,a0,3
>>>   slliw a0,a0,24
>>>   sraiw a0,a0,24
>>>   sraiw a0,a0,3
>>>   ret
>>> 
>>> hand coded riscv asm
>>> 
>>> sx5(int):
>>>   slliw a0,a0,27
>>>   sraiw a0,a0,27
>>>   ret
>>> 
>>> x86 asm:
>>> 
>>> sx5(int):
>>>   lea eax, [0+rdi*8]
>>>   sar al, 3
>>>   movsx eax, al
>>>   ret
>>> 
>>> hand coded x86 asm (no worse because the sar depends on the lea)
>>> 
>>> sx5(int):
>>>   shl edi, 27
>>>   sar edi, 27
>>>   movsx eax, dl
>> 
>> Huh? dl is not a subreg of edi!
>> 
>> s/edi/edx/ and it may work.
>> 
>> dil can also be used, but only on 64 bit.

In any case it is more straightforward given we are just dealing with an int 
bitfield member in this case so a regular move would suffice. The mov gets 
retired in rename (movsx and movzx can both be retired in rename I think). The 
same pattern could be used for all 5 function examples, with predictable 
performance, versus 5 different solutions for the 5 different bitfield 
positions (that’s actually very amazing that the pattern matching on x86 finds 
5 unique ways to lower this code pattern).  Padon my Intel syntax. e.g.

sx5(int):
 shl edi, 27
 sar edi, 27
 mov eax, edi

However I’m more interested in what the fix would be for RISC-V and Aarch64. 
It’s less obvious how to fix it on x86-64 because there are too many different 
ways. On RISC-V there is only one obvious way and that is to not artificially 
narrow the type when we are extracting from an int aligned struct bit field 
member to an int. Likewise on Aarch64, one sbfiz instruction.

> Sorry I meant dil on x86-64. I was sure that it was possible to extend into 
> another register. I have not done much i386 asm so I am unaware of the 
> constraints. Can the source and dest registers for movsx not differ on i386? 
> I thought they could.
> 
> In any case, the plot thickens…
> 
> I believe we have bugs on both RISC-V and Aarch64.
> 
> I found that it at least appears like it is transitioning to a char or short 
> as the break is at 24 and 16 depending on the width, and values over 16 work 
> as one would expect.
> 
> Here is an updated test program: https://cx.rv8.io/g/M9ewNf
> 
>   template 
>   inline T signextend(const T x)
>   {
> struct {T x:B;} s;
> return s.x = x;
>   }
> 
>   int sx3(int x) { return signextend(x); }
>   int sx5(int x) { return signextend(x); }
>   int sx11(int x) { return signextend(x); }
>   int sx14(int x) { return signextend(x); }
>   int sx19(int x) { return signextend(x); }
> 
> I filed a bug on riscv-gcc but I think it is target independent code given 
> there appears to be an issue on Aarch64. AFAICT, Aarch64 should generate a 
> single sbfx for all of the test functions.
> 
> - https://github.com/riscv/riscv-gcc/issues/89
> 
> Should I file a bug on GCC bugzilla given it looks to be target independent?
> 
> On RISC-V, the codegen is much more obviously wrong, but essentially the same 
> thing is happening on Aarch64 but there is only one additional instruction 
> instead of two.
> 
>   sx3(int):
> slliw a0,a0,5
> slliw a0,a0,24
> sraiw a0,a0,24
> sraiw a0,a0,5
> ret
>   sx5(int):
> slliw a0,a0,3
> slliw a0,a0,24
> sraiw a0,a0,24
> sraiw a0,a0,3
> ret
>   sx11(int):
> slliw a0,a0,5
> slliw a0,a0,16
> sraiw a0,a0,16
> sraiw a0,a0,5
> ret
>   sx14(int):
> slliw a0,a0,2
> slliw a0,a0,16
> sraiw a0,a0,16
> sraiw a0,a0,2
> ret
>   sx19(int):
> slliw a0,a0,13
> sraiw a0,a0,13
> ret



Re: Bit-field struct member sign extension pattern results in redundant

2017-08-18 Thread Michael Clark

> On 18 Aug 2017, at 10:41 PM, Gabriel Paubert  wrote:
> 
> On Fri, Aug 18, 2017 at 10:29:04AM +1200, Michael Clark wrote:
>> Sorry I had to send again as my Apple mailer is munging emails. I’ve 
>> disabled RTF.
>> 
>> 
>> This one is quite interesting:
>> 
>> - https://cx.rv8.io/g/WXWMTG
>> 
>> It’s another target independent bug. x86 is using some LEA followed by SAR 
>> trick with a 3 bit shift. Surely SHL 27, SAR 27 would suffice. In any case 
>> RISC-V seems like a nice target to try to fix this codegen for, as its less 
>> risk than attempting a fix in x86 ;-)
>> 
>> - https://github.com/riscv/riscv-gcc/issues/89
>> 
>> code:
>> 
>>  template 
>>  inline T signextend(const T x)
>>  {
>>  struct {T x:B;} s;
>>  return s.x = x;
>>  }
>> 
>>  int sx5(int x) {
>>  return signextend(x);
>>  }
>> 
>> riscv asm:
>> 
>>  sx5(int):
>>slliw a0,a0,3
>>slliw a0,a0,24
>>sraiw a0,a0,24
>>sraiw a0,a0,3
>>ret
>> 
>> hand coded riscv asm
>> 
>>  sx5(int):
>>slliw a0,a0,27
>>sraiw a0,a0,27
>>ret
>> 
>> x86 asm:
>> 
>>  sx5(int):
>>lea eax, [0+rdi*8]
>>sar al, 3
>>movsx eax, al
>>ret
>> 
>> hand coded x86 asm (no worse because the sar depends on the lea)
>> 
>>  sx5(int):
>>shl edi, 27
>>sar edi, 27
>>movsx eax, dl
> 
> Huh? dl is not a subreg of edi!
> 
> s/edi/edx/ and it may work.
> 
> dil can also be used, but only on 64 bit.

Sorry I meant dil on x86-64. I was sure that it was possible to extend into 
another register. I have not done much i386 asm so I am unaware of the 
constraints. Can the source and dest registers for movsx not differ on i386? I 
thought they could.

In any case, the plot thickens…

I believe we have bugs on both RISC-V and Aarch64.

I found that it at least appears like it is transitioning to a char or short as 
the break is at 24 and 16 depending on the width, and values over 16 work as 
one would expect.

Here is an updated test program: https://cx.rv8.io/g/M9ewNf

template 
inline T signextend(const T x)
{
  struct {T x:B;} s;
  return s.x = x;
}

int sx3(int x) { return signextend(x); }
int sx5(int x) { return signextend(x); }
int sx11(int x) { return signextend(x); }
int sx14(int x) { return signextend(x); }
int sx19(int x) { return signextend(x); }

I filed a bug on riscv-gcc but I think it is target independent code given 
there appears to be an issue on Aarch64. AFAICT, Aarch64 should generate a 
single sbfx for all of the test functions.

- https://github.com/riscv/riscv-gcc/issues/89

Should I file a bug on GCC bugzilla given it looks to be target independent?

On RISC-V, the codegen is much more obviously wrong, but essentially the same 
thing is happening on Aarch64 but there is only one additional instruction 
instead of two.

sx3(int):
  slliw a0,a0,5
  slliw a0,a0,24
  sraiw a0,a0,24
  sraiw a0,a0,5
  ret
sx5(int):
  slliw a0,a0,3
  slliw a0,a0,24
  sraiw a0,a0,24
  sraiw a0,a0,3
  ret
sx11(int):
  slliw a0,a0,5
  slliw a0,a0,16
  sraiw a0,a0,16
  sraiw a0,a0,5
  ret
sx14(int):
  slliw a0,a0,2
  slliw a0,a0,16
  sraiw a0,a0,16
  sraiw a0,a0,2
  ret
sx19(int):
  slliw a0,a0,13
  sraiw a0,a0,13
  ret




Re: Bit-field struct member sign extension pattern results in redundant

2017-08-18 Thread Gabriel Paubert
On Fri, Aug 18, 2017 at 10:29:04AM +1200, Michael Clark wrote:
> Sorry I had to send again as my Apple mailer is munging emails. I’ve disabled 
> RTF.
> 
> 
> This one is quite interesting:
> 
> - https://cx.rv8.io/g/WXWMTG
> 
> It’s another target independent bug. x86 is using some LEA followed by SAR 
> trick with a 3 bit shift. Surely SHL 27, SAR 27 would suffice. In any case 
> RISC-V seems like a nice target to try to fix this codegen for, as its less 
> risk than attempting a fix in x86 ;-)
> 
> - https://github.com/riscv/riscv-gcc/issues/89
> 
> code:
> 
>   template 
>   inline T signextend(const T x)
>   {
>   struct {T x:B;} s;
>   return s.x = x;
>   }
> 
>   int sx5(int x) {
>   return signextend(x);
>   }
> 
> riscv asm:
> 
>   sx5(int):
> slliw a0,a0,3
> slliw a0,a0,24
> sraiw a0,a0,24
> sraiw a0,a0,3
> ret
> 
> hand coded riscv asm
> 
>   sx5(int):
> slliw a0,a0,27
> sraiw a0,a0,27
> ret
> 
> x86 asm:
> 
>   sx5(int):
> lea eax, [0+rdi*8]
> sar al, 3
> movsx eax, al
> ret
> 
> hand coded x86 asm (no worse because the sar depends on the lea)
> 
>   sx5(int):
> shl edi, 27
> sar edi, 27
> movsx eax, dl

Huh? dl is not a subreg of edi!

s/edi/edx/ and it may work.

dil can also be used, but only on 64 bit.

Gabriel


Re: Bit-field struct member sign extension pattern results in redundant

2017-08-17 Thread Michael Clark

> On 18 Aug 2017, at 11:13 AM, Michael Clark  wrote:
> 
> So it is a bug on arm too? and can be done with one sbfiz instruction? 
> (assuming I’m understand sbfiz from my first reading) e.g.
> 
>   sbfiz   w0, w0, 0, 2
>ret

Getting my 3’s and 5’s swapped. Confused by gcc.

sbfiz   w0, w0, 0, 4

>> Which is:
>> (insn 7 6 8 2 (set (reg:SI 80)
>>   (sign_extend:SI (ashift:QI (reg:QI 0 x0 [ x+3 ])
>>   (const_int 3 [0x3] t.cc:5 557 {*extendsi_ashlqi}
>>(expr_list:REG_DEAD (reg:SI 0 x0 [ x ])
>>   (nil)))
>> 
>> (insn 14 9 17 2 (set (reg/i:SI 0 x0)
>>   (ashiftrt:SI (reg:SI 80)
>>   (const_int 3 [0x3]))) t.cc:10 530 {*aarch64_ashr_sisd_or_int_si3}
>>(expr_list:REG_DEAD (reg:SI 80)
>>   (nil)))
>> 
>> What I suspect is we get a QImode register in there and that is what
>> is happening to both x86 and riscv.
> 



Re: Bit-field struct member sign extension pattern results in redundant

2017-08-17 Thread Michael Clark

> On 18 Aug 2017, at 10:41 AM, Andrew Pinski  wrote:
> 
> On Thu, Aug 17, 2017 at 3:29 PM, Michael Clark  wrote:
>> Sorry I had to send again as my Apple mailer is munging emails. I’ve 
>> disabled RTF.
>> 
>> 
>> This one is quite interesting:
>> 
>> - https://cx.rv8.io/g/WXWMTG
>> 
>> It’s another target independent bug. x86 is using some LEA followed by SAR 
>> trick with a 3 bit shift. Surely SHL 27, SAR 27 would suffice. In any case 
>> RISC-V seems like a nice target to try to fix this codegen for, as its less 
>> risk than attempting a fix in x86 ;-)
>> 
>> - https://github.com/riscv/riscv-gcc/issues/89
>> 
>> code:
>> 
>>template 
>>inline T signextend(const T x)
>>{
>>struct {T x:B;} s;
>>return s.x = x;
>>}
>> 
>>int sx5(int x) {
>>return signextend(x);
>>}
> 
> on AARCH64 I get:
>sbfiz   w0, w0, 3, 5
>asr w0, w0, 3
>ret

So it is a bug on arm too? and can be done with one sbfiz instruction? 
(assuming I’m understand sbfiz from my first reading) e.g.

sbfiz   w0, w0, 0, 2
ret

> Which is:
> (insn 7 6 8 2 (set (reg:SI 80)
>(sign_extend:SI (ashift:QI (reg:QI 0 x0 [ x+3 ])
>(const_int 3 [0x3] t.cc:5 557 {*extendsi_ashlqi}
> (expr_list:REG_DEAD (reg:SI 0 x0 [ x ])
>(nil)))
> 
> (insn 14 9 17 2 (set (reg/i:SI 0 x0)
>(ashiftrt:SI (reg:SI 80)
>(const_int 3 [0x3]))) t.cc:10 530 {*aarch64_ashr_sisd_or_int_si3}
> (expr_list:REG_DEAD (reg:SI 80)
>(nil)))
> 
> What I suspect is we get a QImode register in there and that is what
> is happening to both x86 and riscv.

Curious. Thanks.

I still need to learn more about gcc pattern matching and lowering.

I actually have this exact code pattern using a struct member to do arbitrary 
width sign extend as it’s the first method for arbitrary bit width extension, 
in Google and on the Stanford Bit Twiddling Hacks page. I understand i’m better 
off changing my pattern to an explicit shift left and shift right, given I’m 
now aware this is unambiguously the optimal lowering for ISAs with 1 cycle 
shifters and no field extract instructions, however given others are likely to 
stumble on the Stanford page, I wouldn’t be surprised if i’m not the only one 
using this pattern:

- https://graphics.stanford.edu/~seander/bithacks.html#FixedSignExtend

The x86 sequence is kind of trick and it’s no better off in instruction count 
by using two shifts. I’m also annoyed that BZHI in BMI2 is an arbitrary width 
zero extend versus a bit extend. I’d like a BEXT-like instruction that copies a 
bit at offset n, but given it is only two instruction saving it should also do 
a right shift so it can be used for signed and unsigned field extract (with a 
bit copy or bit zero operand bit).

Thanks,
Michael.



Re: Bit-field struct member sign extension pattern results in redundant

2017-08-17 Thread Andrew Pinski
On Thu, Aug 17, 2017 at 3:29 PM, Michael Clark  wrote:
> Sorry I had to send again as my Apple mailer is munging emails. I’ve disabled 
> RTF.
>
>
> This one is quite interesting:
>
> - https://cx.rv8.io/g/WXWMTG
>
> It’s another target independent bug. x86 is using some LEA followed by SAR 
> trick with a 3 bit shift. Surely SHL 27, SAR 27 would suffice. In any case 
> RISC-V seems like a nice target to try to fix this codegen for, as its less 
> risk than attempting a fix in x86 ;-)
>
> - https://github.com/riscv/riscv-gcc/issues/89
>
> code:
>
> template 
> inline T signextend(const T x)
> {
> struct {T x:B;} s;
> return s.x = x;
> }
>
> int sx5(int x) {
> return signextend(x);
> }

on AARCH64 I get:
sbfiz   w0, w0, 3, 5
asr w0, w0, 3
ret

Which is:
(insn 7 6 8 2 (set (reg:SI 80)
(sign_extend:SI (ashift:QI (reg:QI 0 x0 [ x+3 ])
(const_int 3 [0x3] t.cc:5 557 {*extendsi_ashlqi}
 (expr_list:REG_DEAD (reg:SI 0 x0 [ x ])
(nil)))

(insn 14 9 17 2 (set (reg/i:SI 0 x0)
(ashiftrt:SI (reg:SI 80)
(const_int 3 [0x3]))) t.cc:10 530 {*aarch64_ashr_sisd_or_int_si3}
 (expr_list:REG_DEAD (reg:SI 80)
(nil)))

What I suspect is we get a QImode register in there and that is what
is happening to both x86 and riscv.

Thanks,
Andrew

>
> riscv asm:
>
> sx5(int):
>   slliw a0,a0,3
>   slliw a0,a0,24
>   sraiw a0,a0,24
>   sraiw a0,a0,3
>   ret
>
> hand coded riscv asm
>
> sx5(int):
>   slliw a0,a0,27
>   sraiw a0,a0,27
>   ret
>
> x86 asm:
>
> sx5(int):
>   lea eax, [0+rdi*8]
>   sar al, 3
>   movsx eax, al
>   ret
>
> hand coded x86 asm (no worse because the sar depends on the lea)
>
> sx5(int):
>   shl edi, 27
>   sar edi, 27
>   movsx eax, dl
>   ret