[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2023-04-18 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

--- Comment #29 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Palmer Dabbelt
:

https://gcc.gnu.org/g:a252c73b1f2b920d8a2ff2e8ca59989e06652fbb

commit r13-7220-ga252c73b1f2b920d8a2ff2e8ca59989e06652fbb
Author: Palmer Dabbelt 
Date:   Mon Apr 17 11:20:42 2023 -0700

RISC-V: Clean up the pr106602.c testcase

The test case that was added is rv64i-specific, as there's better ways
to generate this code on rv32i (where the long/int cast is a NOP) and on
rv64i_zba (where we have word shifts).  This renames the original test
case and adds two more for those targets.

gcc/testsuite/ChangeLog:
PR target/106602
* gcc.target/riscv/pr106602.c: Moved to...
* gcc.target/riscv/pr106602-rv64i.c: ...here.
* gcc.target/riscv/pr106602-rv32i.c: New test.
* gcc.target/riscv/pr106602-rv64i_zba.c: New test.

(cherry picked from commit 8c010f6fe5ebe80d2e054b31e04ae0e9f12ae368)

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2023-04-17 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

--- Comment #28 from CVS Commits  ---
The trunk branch has been updated by Palmer Dabbelt :

https://gcc.gnu.org/g:8c010f6fe5ebe80d2e054b31e04ae0e9f12ae368

commit r14-5-g8c010f6fe5ebe80d2e054b31e04ae0e9f12ae368
Author: Palmer Dabbelt 
Date:   Tue Apr 11 11:04:56 2023 -0700

RISC-V: Clean up the pr106602.c testcase

The test case that was added is rv64i-specific, as there's better ways
to generate this code on rv32i (where the long/int cast is a NOP) and on
rv64i_zba (where we have word shifts).  This renames the original test
case and adds two more for those targets.

gcc/testsuite/ChangeLog:
PR target/106602
* gcc.target/riscv/pr106602.c: Moved to...
* gcc.target/riscv/pr106602-rv64i.c: ...here.
* gcc.target/riscv/pr106602-rv32i.c: New test.
* gcc.target/riscv/pr106602-rv64i_zba.c: New test.

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2022-12-27 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

Jeffrey A. Law  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #27 from Jeffrey A. Law  ---
Fixed on the trunk.

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2022-12-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

--- Comment #26 from CVS Commits  ---
The master branch has been updated by Jeff Law :

https://gcc.gnu.org/g:2e886eef7f2b5aadb00171af868f0895b647c3a4

commit r13-4907-g2e886eef7f2b5aadb00171af868f0895b647c3a4
Author: Raphael Moreira Zinsly 
Date:   Tue Dec 27 18:29:25 2022 -0500

RISC-V: Produce better code with complex constants [PR95632] [PR106602]

gcc/Changelog:
PR target/95632
PR target/106602
* config/riscv/riscv.md: New pattern to simulate complex
const_int loads.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr95632.c: New test.
* gcc.target/riscv/pr106602.c: New test.

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2022-11-16 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

--- Comment #25 from Jeffrey A. Law  ---
To outline what we were thinking.  Yes, it's possible that 4->3 combinations
aren't supported.  I'd have to sit down with the combine sources to be sure.

So the alternate approach we came up with was to mimick ZBS's ability to load
up single bit constants with a define_insn_and_split, obviously splitting it
into a constant load + shift unconditionally if it's still around post-combine.

That's going to lead into a bigger question about constant loads.  Right now we
tend to break them down during expansion.  The advantage of that is the
components become CSE candidates.  The downside is it will tend to inhibit
instruction combination as we've seen in this case.  I don't have a good answer
on the best approach.

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2022-11-16 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

--- Comment #24 from Jeffrey A. Law  ---
Just a note.  Raphael and I are going to poke at this from a different
direction.

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2022-11-03 Thread vineetg at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

--- Comment #23 from Vineet Gupta  ---
(In reply to Jeffrey A. Law from comment #20)
> Yea, I think so (3 shifts).  Two for masking, one to put the bits in the
> right position.  Then we just have to figure out how to combine the initial
> shift with the 3 for the masking and ultimately result with just two :-)

Does combine handle 3 split outputs ?

If I hack my define_split to only have 2, I can see it split + matching the 2
insn (final codegen is obviously bogus)

  Trying 7, 8, 9 -> 10:
7: r78:DI=0x1
8: r79:DI=r78:DI<<0x26
  REG_DEAD r78:DI
  REG_EQUAL 0x40
9: r77:DI=r79:DI-0x40
  REG_DEAD r79:DI
  REG_EQUAL 0x3fffc0
   10: r75:DI=r76:DI:DI
  REG_DEAD r77:DI
  REG_DEAD r76:DI
  Failed to match this instruction:
  (set (reg:DI 75)
(and:DI (reg:DI 76)
(const_int 274877906880 [0x3fffc0])))
  Splitting with gen_split_37 (riscv.md:2089)
  Successfully matched this instruction:
  (set (reg:DI 77)
(lshiftrt:DI (reg:DI 76) <---
(const_int 6 [0x6])))
  Successfully matched this instruction:
  (set (reg:DI 75)
(ashift:DI (reg:DI 77)   <---
(const_int 32 [0x20])))

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2022-11-03 Thread vineetg at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

--- Comment #22 from Vineet Gupta  ---
(In reply to Vineet Gupta from comment #21)


> However 
>  try_combine
> recog_for_combine
>   recog_for_combine_1
>  recog( )
> 
> is failing and we get "Failed to recognize..."

False alarm: In a different successful split case, recog() does fail initially
too. I see split_insns correctly returning the first shift-right.

  (insn 22 0 23 (set (reg:DI 77)
(lshiftrt:DI (reg:DI 76)
(const_int 6 [0x6]))) -1
 (nil))

debugging

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2022-11-02 Thread vineetg at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

--- Comment #21 from Vineet Gupta  ---

I've been experimenting with 

(define_predicate "consecutive_bits_operand"
  (match_code "const_int")
{
unsigned HOST_WIDE_INT val = UINTVAL (op);
if (exact_log2 ((val >> ctz_hwi (val)) + 1) < 0)
return false;

return true;
})

(define_split
  [(set (match_operand:DI 0 "register_operand")
(and:DI (match_operand:DI 1 "register_operand")
(match_operand:DI 2 "consecutive_bits_operand")))
   (clobber (match_operand:DI 3 "register_operand"))]
  "TARGET_64BIT"
 [(set (match_dup 3)
   (lshiftrt:DI (match_dup 1) (match_dup 4)))
  (set (match_dup 3)
   (ashift:DI (match_dup 3) (match_dup 5)))
  (set (match_dup 0)
   (lshiftrt:DI (match_dup 3) (match_dup 6)))]
{
  unsigned HOST_WIDE_INT mask = UINTVAL (operands[2]);
  int r = ctz_hwi (mask);
  int l = clz_hwi (mask);
  operands[4] = GEN_INT (r);
  operands[5] = GEN_INT (r+l);
  operands[6] = GEN_INT (l);
})

However 
 try_combine
recog_for_combine
  recog_for_combine_1
 recog( )

is failing and we get "Failed to recognize..."

(gdb) call debug_rtx(x1)
(set (reg:DI 75)
(and:DI (reg:DI 76)
(const_int 274877906880 [0x3fffc0])))

I can't step through recog() cpp code since gen* insert #line for md file and
gdb steps through unrelated md code. FWIW cc1 is being built with -g3 -O0

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2022-11-02 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

--- Comment #20 from Jeffrey A. Law  ---
Yea, I think so (3 shifts).  Two for masking, one to put the bits in the right
position.  Then we just have to figure out how to combine the initial shift
with the 3 for the masking and ultimately result with just two :-)

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2022-11-02 Thread vineetg at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

--- Comment #19 from Vineet Gupta  ---
(In reply to Jeffrey A. Law from comment #13)
> Trying 7, 8, 9 -> 10:
> 7: r140:DI=0x1
> 8: r141:DI=r140:DI<<0x26
>   REG_DEAD r140:DI
>   REG_EQUAL 0x40
> 9: r139:DI=r141:DI-0x40
>   REG_DEAD r141:DI
>   REG_EQUAL 0x3fffc0
>10: r137:DI=r138:DI:DI
>   REG_DEAD r139:DI
>   REG_DEAD r138:DI
> Failed to match this instruction:
> (set (reg:DI 137)
> (and:DI (reg:DI 138)
> (const_int 274877906880 [0x3fffc0])))
> 
> 
> That's what we're looking for.  I think I had a wrong switch somewhere. 
> Match that with a define_split and you should be good to go.

I experimented with a define_split and initially saw it generate 

sllia5,a0,32
srlia0,a5,6 

but realized for both leading and trailing zeroes (either of arbitrary length)
we'll need 3 shifts ?

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2022-11-02 Thread vineetg at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

--- Comment #18 from Vineet Gupta  ---
(In reply to Jeffrey A. Law from comment #17)
> Vineet: I don't know your priorities and such, but I've got a junior
> engineer starting in a bit under two weeks and this would actually be a good
> issue for them to tackle as part of learning about GCC.  So if you want to
> set it aside, I can have someone poking at it fairly soon.

Seem slike I've spent way too much time on this than warranted so might as well
complete it :-)

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2022-11-02 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

--- Comment #17 from Jeffrey A. Law  ---
Vineet: I don't know your priorities and such, but I've got a junior engineer
starting in a bit under two weeks and this would actually be a good issue for
them to tackle as part of learning about GCC.  So if you want to set it aside,
I can have someone poking at it fairly soon.

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2022-11-01 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2022-11-01 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

--- Comment #16 from Andrew Pinski  ---
(In reply to Vineet Gupta from comment #14)
> (In reply to Jeffrey A. Law from comment #13)
> > Trying 7, 8, 9 -> 10:
> > 7: r140:DI=0x1
> > 8: r141:DI=r140:DI<<0x26
> >   REG_DEAD r140:DI
> >   REG_EQUAL 0x40
> > 9: r139:DI=r141:DI-0x40
> >   REG_DEAD r141:DI
> >   REG_EQUAL 0x3fffc0
> >10: r137:DI=r138:DI:DI
> >   REG_DEAD r139:DI
> >   REG_DEAD r138:DI
> > Failed to match this instruction:
> > (set (reg:DI 137)
> > (and:DI (reg:DI 138)
> > (const_int 274877906880 [0x3fffc0])))
> > 
> > 
> > That's what we're looking for.  I think I had a wrong switch somewhere. 
> > Match that with a define_split and you should be good to go.
> 
> I think you are missing the original left shift 6.
> So insn 6 from my dumps is important if we are to match
> zero_extendsidi2_shifted which matches and+shift (iff 3>>2==0x).
> 
> But it feels like you agree on using REG_EQUAL note (if avail) to created
> merged pattern in try_combine() as opposed to my original thinking of only
> doing it if regular pattern match failed.

So what Jeff is saying is you just need a define_split which matches that final
set.

Something like:
(define_split
  [(set (match_operand:DI 0 "register_operand" "")
(and:SI (match_operand:SI 1 "register_operand" "")
(match_operand:SI 2 "shifted_mask_operand" "")))]
  ""
  [(set (match_operand:DI 0 

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2022-11-01 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

--- Comment #15 from Jeffrey A. Law  ---
THe hope is the shift 6 combines with the first shift you emit for

(set (reg:DI 137)
(and:DI (reg:DI 138)
(const_int 274877906880 [0x3fffc0])))

Conceptually this is similar to creating bridge patterns for the combiner.

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2022-11-01 Thread vineetg at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

--- Comment #14 from Vineet Gupta  ---
(In reply to Jeffrey A. Law from comment #13)
> Trying 7, 8, 9 -> 10:
> 7: r140:DI=0x1
> 8: r141:DI=r140:DI<<0x26
>   REG_DEAD r140:DI
>   REG_EQUAL 0x40
> 9: r139:DI=r141:DI-0x40
>   REG_DEAD r141:DI
>   REG_EQUAL 0x3fffc0
>10: r137:DI=r138:DI:DI
>   REG_DEAD r139:DI
>   REG_DEAD r138:DI
> Failed to match this instruction:
> (set (reg:DI 137)
> (and:DI (reg:DI 138)
> (const_int 274877906880 [0x3fffc0])))
> 
> 
> That's what we're looking for.  I think I had a wrong switch somewhere. 
> Match that with a define_split and you should be good to go.

I think you are missing the original left shift 6.
So insn 6 from my dumps is important if we are to match
zero_extendsidi2_shifted which matches and+shift (iff 3>>2==0x).

But it feels like you agree on using REG_EQUAL note (if avail) to created
merged pattern in try_combine() as opposed to my original thinking of only
doing it if regular pattern match failed.

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2022-11-01 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

--- Comment #13 from Jeffrey A. Law  ---

Trying 7, 8, 9 -> 10:
7: r140:DI=0x1
8: r141:DI=r140:DI<<0x26
  REG_DEAD r140:DI
  REG_EQUAL 0x40
9: r139:DI=r141:DI-0x40
  REG_DEAD r141:DI
  REG_EQUAL 0x3fffc0
   10: r137:DI=r138:DI:DI
  REG_DEAD r139:DI
  REG_DEAD r138:DI
Failed to match this instruction:
(set (reg:DI 137)
(and:DI (reg:DI 138)
(const_int 274877906880 [0x3fffc0])))


That's what we're looking for.  I think I had a wrong switch somewhere.  Match
that with a define_split and you should be good to go.

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2022-11-01 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

--- Comment #12 from Jeffrey A. Law  ---
But insns 6, 7 and 8 aren't important here.  We have a REG_EQUAL on insn 9
which indicates that (reg:DI 77) has the value 0xffc0.  So I would
have expected combine to substitute that into the use of (reg:DI 77) in insn 10
which would give you a shot a recognizing the result is just a pair of shifts. 
But the REG_EQUAL doesn't seem to be used in that case.

My insn #s are different, but here's the relevant part of the dump:

Trying 9 -> 15:
9: r139:DI=r141:DI-0x40
  REG_DEAD r141:DI   
  REG_EQUAL 0x3fffc0 
   15: a0:DI=r138:DI:DI
  REG_DEAD r138:DI   
  REG_DEAD r139:DI
Failed to match this instruction: 
(set (reg/i:DI 10 a0)
(and:DI (plus:DI (reg:DI 141)
(const_int -64 [0xffc0]))
(reg:DI 138)))

BUt we know the (and (plus ...))) is just 0x3fffc0, what's not clear is why
we don't use that information and try to recognize 

(set (reg:DI 10) (and (reg:DI 138) (const_int 0x3fffc0))

Which is just a pair of shifts.

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2022-11-01 Thread vineetg at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

--- Comment #11 from Vineet Gupta  ---

This definitely seems worth pursuing:

For bitmanip, following four insn seq matches

  Trying 7, 8, 6 -> 9:
7: r78:DI=0x40

8: r77:DI=r78:DI-0x40
  REG_EQUAL 0x3fffc0

6: r76:DI=r79:DI<<0x6

9: r75:DI=r76:DI:DI

  Successfully matched this instruction:
  (set (reg:DI 75)
 (and:DI (ashift:DI (reg:DI 79)

For non-bitmanip, a similar 3 insn seq doesn't

  Trying 9, 6 -> 10:
9: r77:DI=r79:DI-0x40
  REG_EQUAL 0x3fffc0

6: r76:DI=r80:DI<<0x6

   10: r75:DI=r76:DI:DI

  Failed to match this instruction:
  (set (reg:DI 75)
(and:DI (plus:DI (reg:DI 79)
(const_int -64 [0xffc0]))
(ashift:DI (reg:DI 80)
(const_int 6 [0x6]

If we re-ran last failing case with REQ_EQUAL note it would match.

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2022-11-01 Thread vineetg at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

--- Comment #10 from Vineet Gupta  ---
At expand time, RV backend was NOT creating REQ_EQUAL note for 1 << 38

  (insn 8 7 9 2 (set (reg:DI 79)
(ashift:DI (reg:DI 78)
(const_int 38 [0x26]))) "../../../shifter.c":10:40 -1

However cse1 adds it anyways, so that was a false alarm.

  (insn 8 7 9 2 (set (reg:DI 79)
(ashift:DI (reg:DI 78)
(const_int 38 [0x26]))) "../../../../shifter.c":10:40 159 {ashldi3}
(expr_list:REG_EQUAL (const_int 274877906944 [0x40])


So when entering combine, we have following

  (insn 6 3 7 2 (set (reg:DI 76)
(ashift:DI (reg/v:DI 74 [ a ])
(const_int 6 [0x6])))

  (insn 7 6 8 2 (set (reg:DI 78)
(const_int 1 [0x1]))

  (insn 8 7 9 2 (set (reg:DI 79)
(ashift:DI (reg:DI 78)
(const_int 38 [0x26])))
(expr_list:REG_EQUAL (const_int 274877906944 [0x40])

  (insn 9 8 10 2 (set (reg:DI 77)
(plus:DI (reg:DI 79)
(const_int -64 [0xffc0])))
(expr_list:REG_EQUAL (const_int 274877906880 [0x3fffc0])

  (insn 10 9 15 2 (set (reg:DI 75)
(and:DI (reg:DI 76)
(reg:DI 77)))

Obviously the regular code flow can't merge/match 5.

The trailing note processing in combine replaces the pattern with available set
REG_EQUAL but it only handles two instructions at a time. This issue needs it
to handle three: insn 9, 10, 6

Perhaps we need to do the REQ_EQUAL note handling not seperately in the end,
but as part of each of 2 insn combine / 3 insn combine / 4 insn combine (after
the regular try_combine() fails for each of them ?

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2022-11-01 Thread vineetg at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

--- Comment #9 from Vineet Gupta  ---
(In reply to Jeffrey A. Law from comment #7)
> There's some code in combine that's supposed to take advantage of REG_EQUAL
> notes which is supposed to help with this kind of scenario.  Digging into
> that might help.

Awesome, thx for the pointer Jeff. 

Indeed combine has a note processing fallback in the end. RV backend is got
generating a note for the const split case (1 << 38). Let me see if I can fix
that.

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2022-11-01 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||palmer at gcc dot gnu.org

--- Comment #8 from palmer at gcc dot gnu.org ---
(In reply to Jeffrey A. Law from comment #7)
> There's some code in combine that's supposed to take advantage of REG_EQUAL
> notes which is supposed to help with this kind of scenario.  Digging into
> that might help.

IMO that's the right way to go here.  I think anything we do in the RISC-V
backend would likely just push around the problem: splitting early seems like
generally the right thing to do, but we'll eventually trip up combine just by
virtue of making instruction sequences longer.  IIUC something like REG_EQUAL
would allow us to keep both flavors around so something can sort it out later.

That said, I've never really reached this deep into the middle end so it's all
a bit over my head and I decided it'd be saner to just close the file and say
nothing ;)

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2022-11-01 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

Jeffrey A. Law  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2022-11-01

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2022-11-01 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org

--- Comment #7 from Jeffrey A. Law  ---
There's some code in combine that's supposed to take advantage of REG_EQUAL
notes which is supposed to help with this kind of scenario.  Digging into that
might help.

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2022-10-31 Thread vineetg at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

--- Comment #6 from Vineet Gupta  ---
(In reply to Andrew Pinski from comment #4)
> (In reply to Vineet Gupta from comment #3)
> > Interestingly, if one builds for -march=rv64gc_zbs  # single bit extension
> > 
> > then the optimal code seq for bitmanip is generated, while no zbs
> > instructions are used.
> > 
> > foo2:
> > sllia5,a0,32
> > srlia0,a5,26
> > ret
> 
> Most likely because
> lia5,1
>   sllia5,a5,38
> Could be done using one instruction.
> 
> And then combine could do its thing. But with multiple instructions, it
> becomes harder to do.
> I have not looked into rtl dumps but this is just knowing what combine can
> do and such.

Looks like our updates collided, indeed that 1 << 38 is the issue here.

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2022-10-31 Thread vineetg at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

--- Comment #5 from Vineet Gupta  ---
Gimple for the test is

  _1 = a_2(D) << 6;
  _3 = _1 & 274877906880;   //  0x3f__ffc0

And 0x3f__ffc0 = 0x40__ - 0x40

For !TARGET_ZBA there's a combiner pattern to match the seq of instruction
generated normally:  define_insn_and_split "zero_extendsidi2_shifted"

Trying 7, 8, 6 -> 9:
7: r78:DI=0x40
8: r77:DI=r78:DI-0x40
  REG_DEAD r78:DI
  REG_EQUAL 0x3fffc0
6: r76:DI=r79:DI<<0x6
  REG_DEAD r79:DI
9: r75:DI=r76:DI:DI
Successfully matched this instruction:
(set (reg:DI 75)
(and:DI (ashift:DI (reg:DI 79)
(const_int 6 [0x6]))
(const_int 274877906880 [0x3fffc0])))


However for !bitmanip, RTL expansion splittable_const_int_operand() breaks up
0x40__ into 1 << 38

(insn 7 6 8 2 (set (reg:DI 78)
  (const_int 1 [0x1])) 

(insn 8 7 9 2 (set (reg:DI 79)
  (ashift:DI (reg:DI 78)
(const_int 38 [0x26]))) 

So we end up with 5 tot insn, which combine can't

And splittable_const_int_operand() has following check

  if (TARGET_64BIT && TARGET_ZBS && SINGLE_BIT_MASK_OPERAND (INTVAL (op)))
return false;

which explains why 

1. zba alone doesn't generate slli.uw
2. zbs generates optimal slli+srli although these have nothing to do with zbs

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2022-10-31 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

--- Comment #4 from Andrew Pinski  ---
(In reply to Vineet Gupta from comment #3)
> Interestingly, if one builds for -march=rv64gc_zbs  # single bit extension
> 
> then the optimal code seq for bitmanip is generated, while no zbs
> instructions are used.
> 
> foo2:
>   sllia5,a0,32
>   srlia0,a5,26
>   ret

Most likely because
li  a5,1
sllia5,a5,38
Could be done using one instruction.

And then combine could do its thing. But with multiple instructions, it becomes
harder to do.
I have not looked into rtl dumps but this is just knowing what combine can do
and such.

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2022-10-31 Thread vineetg at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

Vineet Gupta  changed:

   What|Removed |Added

 CC||jeffreyalaw at gmail dot com,
   ||vineetg at rivosinc dot com
Summary|riscv: suboptimal codegen   |riscv: suboptimal codegen
   |for shift left, right, left |for
   ||zero_extendsidi2_shifted
   ||w/o bitmanip

--- Comment #3 from Vineet Gupta  ---
Interestingly, if one builds for -march=rv64gc_zbs  # single bit extension

then the optimal code seq for bitmanip is generated, while no zbs instructions
are used.

foo2:
sllia5,a0,32
srlia0,a5,26
ret