[Bug target/99531] [11 Regression] Performance regression since gcc 9 (argument passing / register allocation)

2024-05-21 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99531

--- Comment #12 from Oleg Endo  ---
(In reply to Oleg Endo from comment #11)
> 
> This caused PR 115148

I absolutely lack the imagination to see the connection of the change in #c6
and PR 115148.  This is the change in the output code that results in the
problem on SH:

.LVL108:
bt/s.L178   !
mov #-1,r0  !, 
@@ -1832,36 +1830,39 @@
.byte   .L215-.L190
.byte   .L181-.L190
 .LVL109:
-   .align 1
-.L192:
 .LBE111:
 .LBE110:
.loc 1 234 9
.loc 1 234 14
+   .align 1
+.L287:
+   .align 1
+.L288:

Any ideas or suggestions are appreciated.

[Bug target/115148] [12/13/14/15 Regression][SH]: libcanberra fails with 'unaligned opcodes detected in executable segment'

2024-05-21 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115148

--- Comment #16 from Oleg Endo  ---
(In reply to John Paul Adrian Glaubitz from comment #15)
> Created attachment 58258 [details]
> Diff of generated assembly without and with changes from PR99531
> 
> I have generated a diff that shows the difference in the generated assembly
> without and with the patch a7acb6dca941db2b1c135107dac3a34a20650d5c.

That's great, thanks a lot!

This is the problematic hunks, which causes the wrong code alignment:

.LVL108:
bt/s.L178   !
mov #-1,r0  !, 
@@ -1832,36 +1830,39 @@
.byte   .L215-.L190
.byte   .L181-.L190
 .LVL109:
-   .align 1
-.L192:
 .LBE111:
 .LBE110:
.loc 1 234 9
.loc 1 234 14
+   .align 1
+.L287:
+   .align 1
+.L288:

[Bug target/115148] [SH] [12/13/14 Regression]: libcanberra fails with 'unaligned opcodes detected in executable segment'

2024-05-20 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115148

Oleg Endo  changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #14 from Oleg Endo  ---
(In reply to John Paul Adrian Glaubitz from comment #13)
> I can even confirm that reverting a7acb6dca941db2b1c135107dac3a34a20650d5c
> (with some minor merge adjustments) on current git master fixes the problem
> for me.

Great.  Thanks a lot!

Vladimir, do you have any idea what could be going wrong here?  It seems after
your change in ira-costs.c, the emitted .align directive that is emitted in in
sh.cc (barrier_align) gets moved around which results in wrongly aligned
labels.  It's difficult for me to imagine the connection of the two ...

[Bug target/99531] [11 Regression] Performance regression since gcc 9 (argument passing / register allocation)

2024-05-20 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99531

Oleg Endo  changed:

   What|Removed |Added

 CC||olegendo at gcc dot gnu.org

--- Comment #11 from Oleg Endo  ---
(In reply to GCC Commits from comment #5)
> The master branch has been updated by Vladimir Makarov
> :
> 
> https://gcc.gnu.org/g:a7acb6dca941db2b1c135107dac3a34a20650d5c
> 
> commit r12-5944-ga7acb6dca941db2b1c135107dac3a34a20650d5c
> Author: Vladimir N. Makarov 
> Date:   Mon Dec 13 13:48:12 2021 -0500
> 
> [PR99531] Modify pseudo class cost calculation when processing move
> involving the pseudo and a hard register
> 
> Pseudo class calculated on the 1st iteration should not have a
> special treatment in cost calculation when processing move involving
> the pseudo and a hard register.
> 
> gcc/ChangeLog:
> 
> PR target/99531
> * ira-costs.c (record_operand_costs): Do not take pseudo class
> calculated on the 1st iteration into account when processing move
> involving the pseudo and a hard register.
> 
> gcc/testsuite/ChangeLog:
> 
> PR target/99531
> * gcc.target/i386/pr99531.c: New test.


This caused PR 115148

[Bug target/115148] [SH] [12/13/14 Regression]: libcanberra fails with 'unaligned opcodes detected in executable segment'

2024-05-20 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115148

--- Comment #11 from Oleg Endo  ---
(In reply to John Paul Adrian Glaubitz from comment #10)
> 
> Yes, definitely. Will take a little longer though as I need to fix my setup
> first.

Thanks in advance.  Wait for your update.

[Bug target/115148] [SH] [12/13/14 Regression]: libcanberra fails with 'unaligned opcodes detected in executable segment'

2024-05-19 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115148

--- Comment #9 from Oleg Endo  ---
(In reply to Oleg Endo from comment #8)
> 
> It looks like something dpulicates the ".align 1" directive after the byte
> table and then also duplicates it.  Perhaps the directive is treated wrongly
> as an insn or something like that . 

Wanted to write: It looks like something duplicates the ".align 1" directive
after the byte table and then also sinks it further down in the code between
the other labels.

[Bug target/115148] [SH] [12/13/14 Regression]: libcanberra fails with 'unaligned opcodes detected in executable segment'

2024-05-19 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115148

--- Comment #8 from Oleg Endo  ---
(In reply to Oleg Endo from comment #7)
> (In reply to Oleg Endo from comment #5)
> > 
> > The following hunk seems to fix the ".align 1" following the short byte 
> > table
> > 
> > diff --git a/gcc/config/sh/sh.cc b/gcc/config/sh/sh.cc
> > index ef3c2e6791d..01349328171 100644
> > --- a/gcc/config/sh/sh.cc
> > +++ b/gcc/config/sh/sh.cc
> > @@ -5755,7 +5755,7 @@ barrier_align (rtx_insn *barrier_or_label)
> >return ((optimize_size
> >|| ((unsigned) XVECLEN (pat, 1) * GET_MODE_SIZE (GET_MODE
> > (pat))
> ><= (unsigned) 1 << (CACHE_LOG - 2)))
> > - ? 1 : align_jumps.levels[0].log);
> > + ? 2 : align_jumps.levels[0].log);
> >  }
> >  
> >rtx_insn *next = next_active_insn (barrier_or_label);
> > 
> > 
> 
> Sorry, I forgot that ".align 1" actually means alignment on 2-byte boundary.
> So that shouldn't be the issue there.

But indeed, there is something going on with the placement of the .align
directive after the short byte table.


GCC 11:

.L225:
.long   ov_read@PLT-(.LPCS29+2-.)
.align 2
.L188:
.byte   .L214-.L189
.byte   .L214-.L189
.byte   .L197-.L189
.LVL111:
.align 1
.L191:
.LBE111:
.LBE110:
.loc 1 234 9
.loc 1 234 14
! read-vorbis.c:234: ca_assert(v->size >= (off_t) n_read);



GCC 13:

.L225:
.long   ov_read@PLT-(.LPCS29+2-.)
.align 2
.L187:
.byte   .L213-.L188
.byte   .L213-.L188
.byte   .L179-.L188
.LVL110:
.LBE111:
.LBE110:
.loc 1 234 9
.loc 1 234 14
.align 1
.L285:
.align 1
.L286:
! read-vorbis.c:234: ca_assert(v->size >= (off_t) n_read);


This puts the labels LVL110, LBE111, LBE110 at the wrong odd byte alignment. 
Adding a '.align 1' manually after the byte table allows assembling the file
without errors.

It looks like something dpulicates the ".align 1" directive after the byte
table and then also duplicates it.  Perhaps the directive is treated wrongly as
an insn or something like that . 

Adrian, if you have the means, can you bisect it to pinpoint the commit where
this error starts occuring?

[Bug target/115148] [SH] [12/13/14 Regression]: libcanberra fails with 'unaligned opcodes detected in executable segment'

2024-05-19 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115148

--- Comment #7 from Oleg Endo  ---
(In reply to Oleg Endo from comment #5)
> 
> The following hunk seems to fix the ".align 1" following the short byte table
> 
> diff --git a/gcc/config/sh/sh.cc b/gcc/config/sh/sh.cc
> index ef3c2e6791d..01349328171 100644
> --- a/gcc/config/sh/sh.cc
> +++ b/gcc/config/sh/sh.cc
> @@ -5755,7 +5755,7 @@ barrier_align (rtx_insn *barrier_or_label)
>return ((optimize_size
>|| ((unsigned) XVECLEN (pat, 1) * GET_MODE_SIZE (GET_MODE
> (pat))
><= (unsigned) 1 << (CACHE_LOG - 2)))
> - ? 1 : align_jumps.levels[0].log);
> + ? 2 : align_jumps.levels[0].log);
>  }
>  
>rtx_insn *next = next_active_insn (barrier_or_label);
> 
> 

Sorry, I forgot that ".align 1" actually means alignment on 2-byte boundary. 
So that shouldn't be the issue there.

[Bug target/115148] [SH] [12/13/14 Regression]: libcanberra fails with 'unaligned opcodes detected in executable segment'

2024-05-19 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115148

--- Comment #5 from Oleg Endo  ---
(In reply to John Paul Adrian Glaubitz from comment #4)
> Created attachment 58244 [details]
> Preprocessed source from building read-vorbis.c with gcc-14 and -fverbose-asm
> 
> (In reply to Oleg Endo from comment #3)
> > Can you please try to compile with -fverbose-asm  maybe it will give a
> > first hint as to where the problematic code comes from.
> 
> Done. See attached pr115148-preprocessed-source-verbose.tgz.

Thanks!

I was able to reproduce it myself rather easily with my limited setup.
The issue seems to be with the function 'barrier_align' in sh.cc which
determines the alignment following the data table that is emitted in the code.

The following hunk seems to fix the ".align 1" following the short byte table

diff --git a/gcc/config/sh/sh.cc b/gcc/config/sh/sh.cc
index ef3c2e6791d..01349328171 100644
--- a/gcc/config/sh/sh.cc
+++ b/gcc/config/sh/sh.cc
@@ -5755,7 +5755,7 @@ barrier_align (rtx_insn *barrier_or_label)
   return ((optimize_size
   || ((unsigned) XVECLEN (pat, 1) * GET_MODE_SIZE (GET_MODE (pat))
   <= (unsigned) 1 << (CACHE_LOG - 2)))
- ? 1 : align_jumps.levels[0].log);
+ ? 2 : align_jumps.levels[0].log);
 }

   rtx_insn *next = next_active_insn (barrier_or_label);


However, I haven't checked for any advert side effects.


The line was modified last time in commit
e1fab8ba2337fd3bdd9c7112fae22d7ab001c2eb by myself, as part of the SH5 removal.

- ? 1 << TARGET_SHMEDIA : align_jumps_log);
+ ? 1 : align_jumps_log)

Before that, TARGET_SHMEDIA used to be defined as follows in sh.h:

#define TARGET_SHMEDIA (TARGET_SH5 && ! TARGET_SH1)

So for anything non-SH5 it should have evaluated to "0", which would produce "1
<< 0" which is "1" for the problematic line above.

Maybe it's just a latent bug that got exposed by some other SH independent
basic block rearrangement optimizations.


Can you please attach the .s file when compiled with GCC 11, just for
comparison/reference.

[Bug target/115148] [SH] [12/13/14 Regression]: libcanberra fails with 'unaligned opcodes detected in executable segment'

2024-05-18 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115148

Oleg Endo  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-05-19
 Ever confirmed|0   |1

--- Comment #3 from Oleg Endo  ---
In your attached libcanberra_la-read-vorbis.s I can spot the following
suspicious code:


.LBB70:
.loc 1 44 9 is_stmt 0
mova.L52,r0
.LVL48:
mov.b   @(r0,r2),r1
extu.b  r1,r1
.LVL49:
brafr1
nop

.

.L82:
.long   ca_vorbis_get_nchannels@PLT-(.LPCS10+2-.)
.L86:
.long   free@PLT-(.LPCS9+2-.)
.L87:
.long   ov_clear@PLT-(.LPCS11+2-.)
.align 2
.L52:
.byte   .L54-.L53
.byte   .L54-.L53
.byte   .L51-.L53
.align 1  (1)
.L54:
.LBE70:
.LBE72:   (2)
.loc 1 61 24
bra .L50
mov #-7,r9


In (1) a lookup table of 3 bytes is emitted.  Because of the following ".align
1" directive, the following code at (2) will be misaligned.

Can you please try to compile with -fverbose-asm  maybe it will give a
first hint as to where the problematic code comes from.

[Bug target/115102] [SH] GCC misunderstands swap.b instruction

2024-05-15 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115102

Oleg Endo  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2024-05-15
 Status|UNCONFIRMED |NEW

--- Comment #2 from Oleg Endo  ---
(In reply to Richard Biener from comment #1)
> I believe this might be the middle-end using bswap32 (you can try to confirm
> for SH by looking at the dump generated by -fdump-tree-optimized).
> 

Haven't checked the tree dump, but combine log shows patterns with
'bswap-something' in it.  And the patterns that it tries to match are some sort
of partial bswaps.

On SH there is already a pattern:

(define_insn "swapbsi2"
  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
(ior:SI (and:SI (match_operand:SI 1 "arith_reg_operand" "r")
(const_int -65536)) ;; 0x
(ior:SI (and:SI (ashift:SI (match_dup 1) (const_int 8))
(const_int 65280))
(and:SI (ashiftrt:SI (match_dup 1) (const_int 8))
(const_int 255)]

But combine doesn't arrive there and it stops at the intermediate combine
bridge pattern 

(define_insn_and_split "*swapbisi2_and_shl8"
  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
(ior:SI (and:SI (ashift:SI (match_operand:SI 1 "arith_reg_operand" "r")
   (const_int 8))
(const_int 65280))
(match_operand:SI 2 "arith_reg_operand" "r")))]


Adding another variant of the existing pattern, which combine is looking for,
helps in this case:

(define_insn "*swapbisi2"
  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
(ior:SI (ior:SI (and:SI (ashift:SI
   (match_operatnd:SI 1 "arith_reg_operand"
"r")
   (const_int 8))
(const_int 65280))
(and:SI (lshiftrt:SI (match_dup 1) (const_int 8))
(const_int 255)))
(and:SI (match_dup 1)
(const_int -65536]

I haven't checked if the original pattern is still used in other cases or not.

[Bug rtl-optimization/113533] Code generation regression after change for pr111267

2024-03-09 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113533

--- Comment #17 from Oleg Endo  ---
(In reply to Jeffrey A. Law from comment #16)
> Note that Jakub recently twiddled fwprop to throttle it back a little.  As a
> result the regressions seen in this BZ are no longer an issue.  I'm going to
> remove the regression marker.
> 
> Roger/Oleg, if y'all are going to continue on the costing work, we may want
> to keep this open.  Otherwise I wouldn't lose any sleep if we just closed it.

Thanks for the update on this, Jeff.  Even if the regression disappeared, the
cost calculations for memory accesses are clearly wrong in that spot.  So I'd
like to at least fix the core issues.  Just haven't had time to test it more
thoroughly for any side-effects yet.

[Bug target/111001] SH: ICE during RTL pass: sh_treg_combine2

2024-03-02 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111001

Oleg Endo  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #18 from Oleg Endo  ---
Should be fixed.

[Bug target/101737] SH4 -Os causes internal compiler error when building pixman

2024-03-02 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101737

Oleg Endo  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 CC||olegendo at gcc dot gnu.org
 Status|NEW |RESOLVED

--- Comment #9 from Oleg Endo  ---
Thanks everyone for staying on this and re-testing.  It should be fixed now on
the open branches.  If you want to use a version older than GCC 11, please
apply the committed patch to your GCC source.

[Bug target/111898] [12/13/14 Regression][SH] internal compiler error: Segmentation fault when building PostgreSQL 16

2024-02-29 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111898

Oleg Endo  changed:

   What|Removed |Added

 Resolution|--- |WORKSFORME
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Oleg Endo  ---
OK, let's close it.  Thanks for staying on it.

[Bug rtl-optimization/113533] [14 Regression] Code generation regression after change for pr111267

2024-01-28 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113533

--- Comment #15 from Oleg Endo  ---
(In reply to Roger Sayle from comment #14)
> My apologies for not keeping folks updated on my thinking. Following Oleg's
> feedback, I've decided to slim down my proposed fix to the bare minimum, and
> postpone the other rtx_costs improvements until GCC 15 (or later), when I'll
> have more time to use to CSiBE to demonstrate the benefits/tradeoffs for -Os
> and -Oz.

Alright, I think we've got a couple of issues here.  It's a good idea to split
out and address the actual issue of this PR.  I'm curious to see the effects of
-Oz.


> For example, with fwprop about to transition to insn_cost, it
> would be good for the SH backend to provide a sh_insn_cost target hook.

Nothing against that.  I just really wanted to understand your line of thought
of altering the address costs (i.e. treating all addressing modes with equal
cost 0) in case of optimizing for size.

> 
> The current minimal patch to fix this specific regression is:
> 
> diff --git a/gcc/config/sh/sh.cc b/gcc/config/sh/sh.cc
> index 2c411c3..fba6c0fd465 100644
> --- a/gcc/config/sh/sh.cc
> +++ b/gcc/config/sh/sh.cc
> @@ -3313,7 +3313,8 @@ sh_rtx_costs (rtx x, machine_mode mode
> ATTRIBUTE_UNUSED, i
> nt outer_code,
> {
>   *total = sh_address_cost (XEXP (XEXP (x, 0), 0),
> GET_MODE (XEXP (x, 0)),
> -   MEM_ADDR_SPACE (XEXP (x, 0)), true);
> +   MEM_ADDR_SPACE (XEXP (x, 0)), true)
> +  + COSTS_N_INSNS (1);
>   return true;
> }
>return false;
> 
> The minor complication is that as explained above this results in:
> PASS->FAIL: gcc.target/sh/pr59533-1.c scan-assembler-times addc 6
> PASS->FAIL: gcc.target/sh/pr59533-1.c scan-assembler-times cmp/pz 25
> PASS->FAIL: gcc.target/sh/pr59533-1.c scan-assembler-times shll 3
> PASS->FAIL: gcc.target/sh/pr59533-1.c scan-assembler-times subc 14
> 
> which were failures that were fixed (or silenced) by my solution to PR111267.

Yeah, I can imagine that the correction to mem-sign-extend costs will pop
something else.  It should be adapted of course as well.  I'll try to check
that out myself. Although it might take (a bit overloaded at the moment).


> I will note that although the scan-assembler-times complain, that this
> tweak to sh_rtx_costs reduces the total number of instructions in pr59533-1.c
> which (normally) indicates that its an improvement.
> 
> *** old.s   Thu Jan 25 22:54:11 2024
> --- new.s   Thu Jan 25 22:54:23 2024
> ***
> *** 15,23 
> .global test_01
> .type   test_01, @function
>   test_01:
> -   mov.b   @r4,r0
> -   extu.b  r0,r0
> mov.b   @r4,r1
> cmp/pz  r1
> mov #0,r1
> rts
> --- 15,22 
> .global test_01
> .type   test_01, @function
>   test_01:
> mov.b   @r4,r1
> +   extu.b  r1,r0
> cmp/pz  r1
> mov #0,r1
> rts
> ...

Which test is that exactly?  In pr59533-1.c 'test_01' is this:

int
test_01 (unsigned char* a)
{
  /* 1x cmp/pz, 1x addc  */
  return a[0] + (a[0] < 128);
}

... which is totally different from what you posted above.  Something seems to
got mixed up.



> Hence I'm looking into PR59533, which has separate tests for sh2a and !sh2a,
> and my latest discoveries are the -m2a isn't supported if I build gcc using
> --target=sh3-linux-gnu, and that --target=sh2a-linux-gnu doesn't
> automatically default to --target=sh2aeb-linux-gnu and instead gives a fatal
> error about "SH2A does not support little-endian" during the build.  All
> part (joy?) of the learning curve.

Yeah, every backend has its own flavor of setup and dealing with options.
You don't need to build for linux-gnu to check these issues or even run the
CSiBE set. Just configure the build it for --target=sh-elf and it should be
good to go.  By default it builds all multilibs and the default option is -m1
(= SH1).  You can then use -m2a -mb for SH2A.

[Bug rtl-optimization/113533] [14 Regression] Code generation regression after change for pr111267

2024-01-26 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113533

--- Comment #13 from Oleg Endo  ---
(In reply to Roger Sayle from comment #12)
> It should be mentioned that the fwprop fix for PR11267 also resolved several
> FAILs in gcc.target/sh/pr59533.c.  I mention this as tweaking the cost of
> SIGN_EXTEND in sh_rtx_costs interacts with the (redundant) extensions
> mentioned in the initial description of PR59533.

Good to know, thanks!  I'll try to look into it.


> It's still not entirely clear to me why we would want to squash the costs
> of addresses to 0 when optimizing for size?  What does effect does it have
> on the generated code?  I can't imagine how it would be possibly making
> any smaller code?

Roger, could you please comment on that?  I'm still somewhat puzzled...

[Bug rtl-optimization/113533] [14 Regression] Code generation regression after change for pr111267

2024-01-22 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113533

--- Comment #11 from Oleg Endo  ---
(In reply to Roger Sayle from comment #10)

> I've found an interesting table of SH cycle counts (for different CPUs) at
> http://www.shared-ptr.com/sh_insns.html

Yeah, I know.  I did that ;)

> In my proposed patch, the address cost (1) when optimizing for size attempts
> to return the additional size of an instruction based on the addressing
> mode.  For register, and reg+reg addressing modes there is no size increase
> (overhead), and for adressing modes with displacements, and displacements to
> address pointers, there is a cost.

AFAIR, I've added the 'sh_address_cost' function.  The intention was/is to
encourage/discourage usage of certain address modes based on the side effects
and impact on the surrounding code.  All insns/addr modes have the same length
and basically same execution time.  However, e.g. @(reg+reg) has a constraint
on 'r0' usage, so I weighted that heavier.  If there's anything that could use
@(reg+disp) as an alternative, that'd be better in some cases. (not sure if
such optimizations actually are done...)

> (2) when optimizing for speed, address
> cost remains between 0 and 3, and is used to prioritize between (equivalent
> numbers of) instructions.  Normally, rtx_costs are defined in terms of
> COST_N_INSNS, which multiplies by 4.  Hence on many platforms a single
> instruction that references memory may be encoded as COSTS_N_INSNS(1)+1 (or
> a more complex addressing mode as COSTS_N_INSNS(1)+2) to show that this is
> disfavored to a single instruction that doesn't reference memory,
> COSTS_N_INSNS(1)+0.

That's actually what sh_rtx_costs was supposed to do as well.  I think in usual
cases it does that, only that apparently I've screwed up the {SIGN|ZERO}_EXTEND
for the case of the mem load and it shows up only now, many years later.

It's still not entirely clear to me why we would want to squash the costs of
addresses to 0 when optimizing for size?  What does effect does it have on the
generated code?  I can't imagine how it would be possibly making any smaller
code?

With your patch, in case of the SIGN_EXTEND with mem operand, it would make the
address cost 0 with -Os, which would return COSTS_N_INSNS(1) for reg operand as
well as mem operand.  So both insns are equally weighted and could be
considered interchangeable.  And we might bump into this type of regression
again, if some (future) optimization decides that it can interchange/substitute
insns of the same cost... 


> For example, SH currently reports multiplications as a single cycle operation,

That doesn't seem to be the case.  It's supposed to be using the function
'multcosts' in sh.cc, which returns at least a cost of '2'.  Note that on SH1
and SH2 there is no dynamic (barrel) shift.  So actually some multiplications
could be faster than stitched shifts.


> sh_rtx_costs doesn't distinguish the machine mode, so the costs of SImode 
> multiplications are the same as DImode multiplications.

I guess this is because SH doesn't have real DImode multiplication (64 x 64 ->
64/128 bit).  It can only do 32 x 32 -> 64 bit widening multiplication.  Any
real DImode multiplication will result in either expanded sequence to calculate
sum of particial products or a libcall, AFAIR

[Bug rtl-optimization/113533] [14 Regression] Code generation regression after change for pr111267

2024-01-22 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113533

--- Comment #9 from Oleg Endo  ---
(In reply to Roger Sayle from comment #8)
> Created attachment 57190 [details]
> proposed patch
> 
> Proposed patch to provide a sane/saner set of rtx_costs for SH.  There's
> plenty more that could be done, but these changes are (more than) sufficient
> to resolve the code quality regression caused by improved fwprop.  If
> someone could try this out on SH, and report back the results, that would be
> great.


You've added differentiation for 'speed ?' in 'sh_address_cost'.  Like this
one.

   /* 'GBR + 0'.  Account one more because of R0 restriction.  */
   if (REG_P (x) && REGNO (x) == GBR_REG)
-return 2;
+return speed ? 2 : 0;

What's the intention here?  Why does the cost of the address computation
reduced when not optimizing for speed?  It distorts the address costs and makes
them all equal.

[Bug rtl-optimization/113533] [14 Regression] Code generation regression after change for pr111267

2024-01-22 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113533

--- Comment #7 from Oleg Endo  ---
(In reply to Roger Sayle from comment #6)
> To help diagnose the problem, I came up with this simple patch:

Thanks for looking into it!

> which then helps reveal that on sh3-linux-gnu with -O1 we see:

I think this will also happen on all sh-elf sub-targets, not necessarily
sh3-linux... if it helps anything ... 

> propagating insn 6 into insn 12, replacing:
> (set (reg:SI 174 [ _1 ])
> (sign_extend:SI (reg:QI 169 [ *a_7(D) ])))
> successfully matched this instruction to *extendqisi2_compact_snd:
> (set (reg:SI 174 [ _1 ])
> (sign_extend:SI (mem:QI (reg/v/f:SI 168 [ aD.1817 ]) [0 *a_7(D)+0 S1
> A8])))
> change is profitable (cost 4 -> cost 1)
> 
> which confirms Andrew's and Oleg's analyses above; the sh_rtx_costs function
> is a little odd... Reading from memory is four times faster than using a
> pseudo!?
> I'm investigating a "costs" patch for the SH backend.

Looks like sh_rtx_costs function assumes that the costs of the whole RTX are
summed up outside in the caller.

In sh_rtx_costs SIGN_EXTEND, ZERO_EXTEND, the 'sh_address_cost' is returned
directly for the MEM_P case. It should probably have went through COSTS_N_INSN
to get it into the same scale as for the arith_reg_operand case.

[Bug rtl-optimization/113533] [14 Regression] Code generation regression after change for pr111267

2024-01-21 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113533

--- Comment #5 from Oleg Endo  ---
(In reply to Andrew Pinski from comment #3)

> That seems to make the cost of a load/store if just an index, the same as the 
> cost
> as the index. Which definitely seems wrong. It should be the cost of the 
> load/store
> and the cost of the address formation. 

Yep, that doesn't seem to be accurate.

> The way aarch64 implements its _rtx_costs is that it cases SET and if the LHS 
> is a
> mem, then it is the cost of the store there and returns true (though you 
> might need
> to take into RHS if it can be more than just a register like). And then MEM
> includes the cost of the load.

That's what I'd also do intuitively.

> i386 does something similar too.

If every backend does that same thing, isn't it better to do it as a default
for everybody?  I can imagine the minimalist backends (like RX) will regress on
this case as well. (Haven't checked it though)

[Bug rtl-optimization/113533] [14 Regression] Code generation regression after change for pr111267

2024-01-21 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113533

--- Comment #2 from Oleg Endo  ---
(In reply to Andrew Pinski from comment #1)
> This is most likely a cost model issue on sh3.

You mean this one (sh.cc, sh_rtx_costs)?

/* The cost of a mem access is mainly the cost of the address mode.  */
case MEM:
  *total = sh_address_cost (XEXP (x, 0), GET_MODE (x), MEM_ADDR_SPACE (x),
true);
  return true;

[Bug target/113193] [SH] ICE in gen_reg_rtx, at emit-rtl.cc:1177 with -mfcsa -funsafe-math-operations

2024-01-04 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113193

Oleg Endo  changed:

   What|Removed |Added

  Known to fail||13.2.1
Version|14.0|13.0
 Status|UNCONFIRMED |NEW
   Keywords||ra
 Ever confirmed|0   |1
   Last reconfirmed||2024-01-05

--- Comment #1 from Oleg Endo  ---
I was able to reproduce the bug with the following compiler options on 13
branch:

-x c++ -std=c++17 -O3 -m4-single -ml -mfsca -mfsrra -funsafe-math-optimizations


combined.cpp: In function 'void transform(const model_transform&, const vec3*,
int, float*)':
combined.cpp:574:1: internal compiler error: in gen_reg_rtx, at
emit-rtl.cc:1171
0x61da53 gen_reg_rtx(machine_mode)
../../gcc/gcc/emit-rtl.cc:1171
0x9e8cc1 copy_to_reg(rtx_def*)
../../gcc/gcc/explow.cc:622
0x9d7727 operand_subword_force(rtx_def*, poly_int<1u, unsigned long>,
machine_mode)
../../gcc/gcc/emit-rtl.cc:1812
0xa0eddc emit_move_multi_word
../../gcc/gcc/expr.cc:4129
0xa12bdb gen_move_insn(rtx_def*, rtx_def*)
../../gcc/gcc/expr.cc:4359
0xccfcda gen_reload
../../gcc/gcc/reload1.cc:8614
0xcd8956 emit_output_reload_insns
../../gcc/gcc/reload1.cc:7667
0xcd8956 do_output_reload
../../gcc/gcc/reload1.cc:7939
0xcd8956 emit_reload_insns
../../gcc/gcc/reload1.cc:8003
0xcd8956 reload_as_needed
../../gcc/gcc/reload1.cc:4543
0xcdc460 reload(rtx_insn*, int)
../../gcc/gcc/reload1.cc:1047
0xb78004 do_reload
../../gcc/gcc/ira.cc:5975
0xb78004 execute
../../gcc/gcc/ira.cc:6149


With added -mlra:

ombined.cpp: In function 'void transform(const model_transform&, const vec3*,
int, float*)':
combined.cpp:574:1: internal compiler error: maximum number of generated reload
insns per insn achieved (90)
0xbd0699 lra_constraints(bool)
../../gcc/gcc/lra-constraints.cc:5258
0xbbc182 lra(_IO_FILE*)
../../gcc/gcc/lra.cc:2375
0xb77999 do_reload
../../gcc/gcc/ira.cc:5963
0xb77999 execute
../../gcc/gcc/ira.cc:6149

[Bug target/93808] [11/12/13/14 Regression] [SH] Ruby crashes with 'Illegal Instruction' with -fcrossjumping

2023-10-30 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93808

Oleg Endo  changed:

   What|Removed |Added

 CC||olegendo at gcc dot gnu.org

--- Comment #37 from Oleg Endo  ---
(In reply to John Paul Adrian Glaubitz from comment #32)
> (In reply to John Paul Adrian Glaubitz from comment #31)
> > Ah, I forgot to add -O1 and -fno-cross-jumping to CFLAGS.
> > 
> > Are the builtin_traps() optimized out for -O2?
> > 
> > I'm building with the correct flags now.
> 
> Traps also didn't trigger with -O1 and -fno-cross-jumping.

Adrian, what happened to this issue in the end?  Do you remember?

[Bug target/111001] SH: ICE during RTL pass: sh_treg_combine2

2023-10-23 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111001

--- Comment #17 from Oleg Endo  ---
(In reply to John Paul Adrian Glaubitz from comment #16)

> Just saw the branch updates, thanks! FWIW, I did observe this issue in
> gcc-13 only but not gcc-11 or gcc-12. But that might be owed to the fact
> that Debian's gcc-11 and gcc-12 packages had not received the latest branch
> updates yet.

Yes, I know it has been observed on the latest GCC-13 only.  It was an
oversight in the code from the beginning, which got triggered by another change
in the compiler.  I haven't checked which change exactly, but I guess it must
have been some backported change. So if the same/similar thing gets backported
to GCC-11 or GCC-12 it might trigger this bug there, too.  Hence the preventive
measure.  It's a rather obvious bug/fix, so should be safe.

[Bug target/111001] SH: ICE during RTL pass: sh_treg_combine2

2023-10-23 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111001

--- Comment #15 from Oleg Endo  ---
(In reply to John Paul Adrian Glaubitz from comment #10)
> 
> Do we need anything else before the fix can be merged?

No, should be fine.  I'll leave this PR open for a while in case anything else
related pops up.  Thanks for testing.

[Bug target/111001] SH: ICE during RTL pass: sh_treg_combine2

2023-10-22 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111001

--- Comment #9 from Oleg Endo  ---
(In reply to John Paul Adrian Glaubitz from comment #8)
> 
> I built gcc-13 natively with the patch applied with a full bootstrap
> including stage2 and stage3. Full build log available in [1].
> 
> > [1] https://people.debian.org/~glaubitz/gcc-13_13.2.0-2+sh4_sh4.build.gz

Perfect! Thanks!

[Bug target/111001] SH: ICE during RTL pass: sh_treg_combine2

2023-10-22 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111001

--- Comment #7 from Oleg Endo  ---
(In reply to John Paul Adrian Glaubitz from comment #5)
> 
> I can confirm that this patch fixes the build of e2fsprogs with gcc-13 for
> me.

Great, thanks!  Do you have an option to run a compiler bootstrap or other
runtime test on sh4-linux with this?

[Bug target/65250] [SH] Improve comparisons followed by a negated cstore

2023-10-21 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65250

--- Comment #2 from Oleg Endo  ---
Briefly checked this one on GCC-13.  It generates the optimal sequence.

[Bug target/111892] [SH] [13 Regression] internal compiler error: Illegal instruction when building e2fsprogs

2023-10-20 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111892

--- Comment #5 from Oleg Endo  ---
Adrian, can you please give it another go with the patch I've posted in PR
111001 ?

https://gcc.gnu.org/bugzilla/attachment.cgi?id=56164

[Bug target/111001] SH: ICE during RTL pass: sh_treg_combine2

2023-10-20 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111001

--- Comment #4 from Oleg Endo  ---
Created attachment 56164
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56164=edit
sh_pr11001_fix.patch

Can you please try this patch?  It should solve the problem, but not sure if
there could be any unexpected side effects.

[Bug target/51708] [SH] CSE constants after combine/split

2023-10-20 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51708

Oleg Endo  changed:

   What|Removed |Added

 CC||olegendo at gcc dot gnu.org

--- Comment #5 from Oleg Endo  ---
The latest patch https://gcc.gnu.org/bugzilla/attachment.cgi?id=55543 by Alex
in PR 54089 inserts some additional passes in order to hoist constant loads for
shift operations out of loops.  Perhaps that would fix this issue, too.

[Bug target/111892] [SH] [13 Regression] internal compiler error: Illegal instruction when building e2fsprogs

2023-10-20 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111892

Oleg Endo  changed:

   What|Removed |Added

 Status|WAITING |NEW

--- Comment #4 from Oleg Endo  ---
This seems to be a dup of PR 111001.

Compiling the pre processed source with GCC-13 sh-elf cross compiler with 

sh-elf-gcc -m4 -Wdate-time -D_FORTIFY_SOURCE=2  -g -O2 -fPIE
-fstack-protector-strong -Wformat -Werror=format-security  -DHAVE_CONFIG_H -c
-dp -fdump-rtl-all -fdump-rtl-all-details rw_bitmaps.i

shows that the sh_treg_combine2 pass is hitting an infinite loop.

[Bug target/101177] sh3: internal compiler error: Illegal instruction

2023-10-20 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101177

--- Comment #13 from Oleg Endo  ---
(In reply to John Paul Adrian Glaubitz from comment #12)
> 
> That was super-fast! Thanks a lot!

Not really. ~2 years from patch to commit.  Sorry about that.

[Bug target/101177] sh3: internal compiler error: Illegal instruction

2023-10-20 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101177

Oleg Endo  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #11 from Oleg Endo  ---
(In reply to John Paul Adrian Glaubitz from comment #5)
> (In reply to Segher Boessenkool from comment #4)
> > (In reply to Harold Gutch from comment #1)
> > > I cannot claim to understand the details of this patch, but note that in 
> > > the
> > > second part of the diff the condition gets inverted.
> > 
> > Yes, that looks like a clear mistake.  I don't know why I did that, perhaps
> > at first I flipped the control flow there?  Who knows.  Sorry for it anyway!
> 
> Was this ever fixed?

It should be now.

[Bug target/81426] [SH]: unable to find a register to spill in class 'R0_REGS' when building webkit2gtk

2023-10-16 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81426

--- Comment #12 from Oleg Endo  ---
(In reply to John Paul Adrian Glaubitz from comment #11)
> Created attachment 56123 [details]
> Preprocessed source from building GHC with gcc-13
> 
> This is still present in gcc-13, I just ran into it while cross-building the
> Haskell compiler GHC for sh4:
> 

Have you tried using the -mlra option for this build?

[Bug target/54089] [SH] Refactor shift patterns

2023-10-14 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #105 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #104)
> I've been thinking about something. I suspect that this patch could take
> work away from other patches. I'm sorry, I don't know how to express myself
> properly. I mean there's several patches that corrects shift patterns and
> 'tst' instruction generation (most of them are written by you, by the way).
> I suspect that some of them might not run anymore because this patch looks
> more general and should cover more cases, including yet unknown cases, I
> hope. And, in the end, dead code may appear because of it. I hope I was able
> to make my point clearly.

Yes, I understand what you're saying.  As other parts of the compiler evolve,
the RTL input that the backend code has to work with changes.  It's a normal
thing that happens during the course of development.  Some patterns might stop
working (especially those combine patterns are prone to that).  And sometimes
things magically start working because something got fixed somewhere else.

I've tried to add SH specific test cases to try and keep it in check.  Ideally
we'd have to go through all of the specific SH quirks in the backend
periodically, try to remove them one by one and run tests to see if the
patterns are still working/needed or whether they can be removed.

Let me know if you have more test cases (that work or don't work).

[Bug target/111001] SH: ICE during RTL pass: sh_treg_combine2

2023-10-13 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111001

--- Comment #3 from Oleg Endo  ---
(In reply to Oleg Endo from comment #2)
> I've briefly tried on a local gcc version 13.1.1 20230714 
> 
> While it doesn't crash, the sh_treg_combine2 pass seems to be stuck in an
> infinite loop.  It produces a log file > 200 MByte.

It trips on the following insn:

(insn 1431 1430 179 19 (set (reg/v:DI 264 [ blk_cnt ])
(reg/v:DI 264 [ blk_cnt ])) "rw_bitmaps.c":341:11 -1
 (nil))

... which is a reg-reg move on itself (i.e. a nop).  For some reason this insn
is emitted by the split1 pass, which runs before sh_treg_combine2.

[Bug target/111001] SH: ICE during RTL pass: sh_treg_combine2

2023-10-12 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111001

Oleg Endo  changed:

   What|Removed |Added

   Last reconfirmed||2023-10-13
 Ever confirmed|0   |1
 CC||olegendo at gcc dot gnu.org
 Status|UNCONFIRMED |NEW

--- Comment #2 from Oleg Endo  ---
I've briefly tried on a local gcc version 13.1.1 20230714 

While it doesn't crash, the sh_treg_combine2 pass seems to be stuck in an
infinite loop.  It produces a log file > 200 MByte.

[Bug target/54089] [SH] Refactor shift patterns

2023-10-12 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #103 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #102)
> Created attachment 55543 [details]
> Arithmetic right shift late expanding v2
> 
> Here's the patch. I hope I did not miss anything.
> 

Sorry, I've been busy with other things.  I think your patch is OK, but I
wanted to test it in more detail before committing it.

[Bug target/107704] [13/14 Regression] Testsuite regression after recent DCE changes

2023-10-12 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107704

--- Comment #5 from Oleg Endo  ---
(In reply to Jeffrey A. Law from comment #2)
> ACK.  And as I mentioned, the RTL form looks like it ought to be caught by
> the SH specific code to optimize T reg handling.  I don't care enough about
> the SH to try and debug a missed optimization though.

Haven't seen this one.  Will try to have a look at it, since I wrote those
parts.

[Bug target/111343] [SH] Including in C++23 causes an ICE with -m4-single-only

2023-09-26 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111343

Oleg Endo  changed:

   What|Removed |Added

 Target|SH4 |sh*-*-*
 CC||olegendo at gcc dot gnu.org

--- Comment #2 from Oleg Endo  ---
I think this would be a problem on all targets where DF is forced to be SF,
which I think is not a good thing at all.  It's also causing problems with
Fortan.

I'm not sure if it makes sense to add support for DF == SF in the libraries.

Perhaps at least on SH4 the -m4-single-only option shouldn't be used generally
to build whole projects, but rather only a few files where it's really needed.

[Bug target/101469] wrong code with "-O2 -fPIE" for SH

2023-07-17 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101469

--- Comment #17 from Oleg Endo  ---
(In reply to Rin Okuyama from comment #16)
> The fix has been cherry-picked for NetBSD:
> https://mail-index.netbsd.org/source-changes/2023/07/18/msg146078.html
> 
> Let me thank you again, Oleg!

Sure, you're welcome.  Let me know if there's anything else.  When you open an
SH related PR here in GCC's bugzilla, please add me to the CC list, or else I
might not see it timely.

[Bug target/101469] wrong code with "-O2 -fPIE" for SH

2023-07-13 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101469

Oleg Endo  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #15 from Oleg Endo  ---
Fixed on master, GCC 13, GCC 12, GCC 11.

Should also be applied to older release branches that are maintained elsewhere.

[Bug target/54089] [SH] Refactor shift patterns

2023-07-13 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #101 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #100)
> Created attachment 55513 [details]
> Arithmetic right shift late expanding
> 
> (In reply to Oleg Endo from comment #99)
> > Meanwhile, here's my iteration on your patch.
> 
> Thank you! I did all checks I did before, added testcases, and edited to fit
> the code style.

Looks OK.  Just 3 things:

> +++ gcc-13.1.0/gcc/testsuite/gcc.target/sh/pr54089-11.c   2023-07-07 
> 08:56:41.212345982 +0300
> @@ -0,0 +1,37 @@
> +/* Check that 'tst #64,r0' genrated.  */
> +/* { dg-do compile }  */
> +/* { dg-options "-O2" }  */

Please rename this test to pr49263... to have all tst #imm,r0 related tests in
one place.

Also:
  - 'genrated' -> 'generated'
  - space before opening paren of function args
  - 2 spaces indention
  - similarly, check code style of other added tests

> --- gcc-13.1.0.orig/gcc/testsuite/gcc.target/sh/pr54089-12.c  1970-01-01 
> 03:00:00.0 +0300

Can you try out Segher's suggestion in #c88 to make the regex look less
cluttered?

[Bug target/101469] wrong code with "-O2 -fPIE" for SH

2023-07-09 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101469

--- Comment #9 from Oleg Endo  ---
(In reply to Rin Okuyama from comment #8)
> (In reply to Oleg Endo from comment #7)
> 
> Hi Oleg. Thank you very much for your great work!
> 
> I've tested your patch with GCC 10.4.0 in this weekend, and it perfectly
> worked:
> - testcase attached to comment #0 compiled correctly
> - full regression tests on NetBSD/shle-current built by patched GCC
> successfully completed without any regression (compared with system built by
> GCC with that peephole optimization removed)
> - some 3rd party softwares (including perl 5.36.0, ruby 3.10.10, zsh 5.9,
> ...) self-built on shle host by GCC of same code base
> 
> Let me thank you again, Oleg!
> 
> rin

Thanks you so much for getting back!  Sorry for being late for 2 years with
this.  I will commit the patch to all open GCC versions.  Unfortunately GCC
10.5 has just been released recently and that was the last version 10.  So the
patch will be included from GCC version 11.

[Bug target/54089] [SH] Refactor shift patterns

2023-07-09 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

Oleg Endo  changed:

   What|Removed |Added

  Attachment #28207|0   |1
is obsolete||
  Attachment #28633|0   |1
is obsolete||

--- Comment #99 from Oleg Endo  ---
Created attachment 55506
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55506=edit
proposed patch

(In reply to Alexander Klepikov from comment #98)
> Created attachment 55503 [details]
> Testcase for SH specific loop optimization pass
> 

Thanks.  I'll check it out.
Meanwhile, here's my iteration on your patch.

[Bug target/54089] [SH] Refactor shift patterns

2023-07-07 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #97 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #96)
> 
> Not really. 'expand_ashiftrt' could emit
> 
> mov  #imm, r1
> shad r1,   r0
> 
> and 'mov' instruction would be loop invariant if there's a loop. It looks
> like 'ashrsi3_libcall_expanded' is a bad name. I think name
> 'ashrsi3_n_call_expanded' would be more appropriate.

Ah, yes, indeed.  I'm cleaning up your patch and will rename it.

> 
> > However, there is one case from  your previous posts in PR 49263:
> > 
> > int f_rshift(char v){
> > return v >> S;
> > }
> > 
> > This will not work on SH2 and expand to a libcall.
> 
> I think you mean SH2A, because the same is going on with SH4.
> 
> >  On SH4 the combine pass
> > converts it into:
> > 
> > Successfully matched this instruction:
> > (set (reg:SI 166)
> > (ashiftrt:SI (reg/v:SI 164 [ v ])
> > (const_int 31 [0x1f])))
> > 
> > But it doesn't even try to do so with the 'ashrsi3_n_call' pattern.  Not
> > sure why ...
> 
> Well, the same thing is going on when using vanilla GCC. It looks like it's
> happening due to char sign extension. Then instruction is catched by
> 'ashrsi2_31' pattern. In another words, it looks to me like an optimization.

It can be fixed by adding another pattern.  I'll make another patch for that
later.

[Bug target/101469] wrong code with "-O2 -fPIE" for SH

2023-07-07 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101469

--- Comment #7 from Oleg Endo  ---
Created attachment 55498
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55498=edit
also change address register when applying the peephole

The attached patch should fix the issue without removing the peephole
optimization.  If the eliminated register is in the memory address operand,
then the memory operand will be updated respectively.

Does anybody still has access to the system where this bug happened? I'd
appreciate any testing support.

[Bug target/101469] wrong code with "-O2 -fPIE" for SH

2023-07-07 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101469

Oleg Endo  changed:

   What|Removed |Added

   Last reconfirmed||2023-07-07
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

[Bug target/106161] Dubious choice of optimization strategy

2023-07-07 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106161

Oleg Endo  changed:

   What|Removed |Added

   Last reconfirmed||2023-07-07
 CC||olegendo at gcc dot gnu.org
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #4 from Oleg Endo  ---
I can confirm this on GCC 13.

[Bug rtl-optimization/55190] ivopts causes loop setup bloat

2023-07-07 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55190

--- Comment #12 from Oleg Endo  ---
(In reply to Oleg Endo from comment #0)
> The following code:
> 
> struct X
> {
>   int a, b, c, d, e;
> };
> 
> int test (X* x, unsigned int c)
> {
>   int s = 0;
>   unsigned int i;
>   for (i = 0; i < c; ++i)
> s += x[i].b;
>   return s;
> }
> 
> results in:
> tst r5,r5
> bt/s.L4
> mov r5,r1
> shll2   r1
> add r5,r1
> mov.l   .L9,r2
> shll2   r1
> add #-20,r1
> shlr2   r1
> mul.l   r2,r1
> mov.l   .L10,r2
> add #4,r4
> mov #0,r0
> sts macl,r1
> and r2,r1
> add #1,r1
> .L3:
> mov.l   @r4,r2
> dt  r1
> add #20,r4
> bf/s.L3
> add r2,r0
> rts
> nop
> .L4:
> rts
> mov #0,r0
> .L11:
> .align 2
> .L9:
> .long   214748365
> .L10:
> .long   1073741823
> 

As of GCC 13, this still produces the same code with loop header bloat.

[Bug target/62233] unnecessary shift instructions to prepare loop counter

2023-07-07 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62233

Oleg Endo  changed:

   What|Removed |Added

 CC||olegendo at gcc dot gnu.org

--- Comment #4 from Oleg Endo  ---
Looks like with GCC 13 it now converts the loop into a memset call.

typedef struct {
  int l;
  int b[258];
} S;

void clear2 (S* s )
{
   int i;
   int len = s->l;

   int* p = s->b;
   for (i = 0; i < len; i++)
   p[i] = 0;
}

compiling with GCC 13 -mlra -m2 -ml -O2:

mov.l   @r4+,r6 ! 6 [c=1 l=2]  movsi_i/5
cmp/pl  r6  ! 10[c=4 l=2]  cmpgtsi_t/0
bf.s.L1 ! 11[c=17 l=2]  *cbranch_t
shll2   r6  ! 16[c=4 l=2]  ashlsi3_k/1
mov.l   .L5,r0  ! 19[c=10 l=2]  movsi_i/0
jmp @r0 ! 20[c=5 l=2]  sibcall_valuei
mov #0,r5   ! 17[c=4 l=2]  movsi_i/2
.align 1
.L1:
rts 
nop ! 41[c=0 l=4]  *return_i
.L6:
.align 2
.L5:
.long   _memset


However, compiling with GCC 13 -mlra -m2 -ml -O2 -fno-builtin results in pretty
much the same code as in comment #2

[Bug target/65162] [10/11/12/13/14 Regression][SH] Redundant tests when storing bswapped T bit result in unaligned mem

2023-07-07 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65162

--- Comment #13 from Oleg Endo  ---
(In reply to Oleg Endo from comment #1)
> (In reply to Oleg Endo from comment #0)
> > The following example is taken from libav, which contains quite some uses of
> > this code pattern.
> > 
> > typedef unsigned short int uint16_t;
> > union unaligned_16 { uint16_t l; } __attribute__((packed));
> > 
> > int
> > test (unsigned char* buf, int bits_per_component)
> > {
> >   (((union unaligned_16 *)(buf))->l) =
> > __builtin_bswap16 (bits_per_component == 10 ? 1 : 0);
> > 
> >   return 0;
> > }
> > 
> 
> BTW, it should actually translate to something like:
> 
>   mov r6,r0
>   cmp/eq  #10,r0
>   movtr0
>   mov.b   r0,@(1,r4)
>   mov #0,r0
>   rts
>   mov.b   r0,@r4
> 

It's pretty much what GCC 13 produces (with -mlra -ml)

mov r5,r0   ! 51[c=4 l=2]  movsi_i/1
cmp/eq  #10,r0  ! 48[c=4 l=2]  cmpeqsi_t/1
movtr0  ! 49[c=4 l=2]  movt
mov #0,r1   ! 11[c=4 l=2]  movsi_i/2
mov.b   r0,@(1,r4)  ! 29[c=4 l=2]  *movqi_store_mem_disp04/0
mov #0,r0   ! 34[c=4 l=2]  movsi_i/2
rts ! 55[c=0 l=2]  *return_i
mov.b   r1,@r4  ! 18[c=4 l=2]  *movqi/5

[Bug target/109874] [SH] GCC 13's -Os code is 50% bigger than GCC 4's

2023-07-07 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109874

Oleg Endo  changed:

   What|Removed |Added

 CC||olegendo at gcc dot gnu.org

--- Comment #3 from Oleg Endo  ---
(In reply to Richard Biener from comment #2)
> It looks like the target cannot do arbitrary constant shifts so it benefits
> from shifting incrementally.  Even if that is exposed early enough for CSE
> the optimal sequences for shifting by 10, 11, 12 and 13 could prevent CSE
> here.

That's right.  SH1, SH2 doesn't have a barrel shifter and needs stitched
constant shifts.  In some cases we resort to a rt lib call to avoid code bloat.

There are a couple of opportunities when sharing intermediate results of
incremental / stitched shifts.  A while ago I had the idea of writing an RTL
pass that would try to figure that out...

In this case the shifts are expanded to RTL with the constant shift amounts
already propagated and the incremental shifts removed, so it's a bit harder to
undo this at the RTL level, but not impossible.

On SH3, SH4 dynamic shifts are available, but it requires another register +
constant load.  Incremental / stitched shifts would be always better on SH for
this test case.

[Bug target/54089] [SH] Refactor shift patterns

2023-07-06 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #95 from Oleg Endo  ---
(In reply to Oleg Endo from comment #93)
> (In reply to Alexander Klepikov from comment #92)
> > I remembered why I used two different insns - first to eliminate infinite
> > loop with help of marking insn with attribute, and second because I could
> > not set attribute when emitting insn from C code. Whe have 'get_attr_*'
> > functions but we have not 'set_attr_*'.
> 
> Yes, I thought so.  I'll give it a try myself with your patch ... please
> hold on.

Finally could have a look at it, sorry for the delay.

The infinite loop is in splitting of the 'ashrsi3_n_call' pattern with the
constant 1.  This is because 'ashrsi3_n_call' match overlaps with the 'shar'
pattern.

One quick fix is to put the condition into the first string.  A nicer way would
be to use a predicate which would constrain the operand[2] to "not 1".  But
it's the first and only use so quick version is OK.

In addition, that pattern (not only the split condition) should be matched only
before register allocation, so the 'can_create_pseudo_p' check is moved.

Another point is that you had the 'cfun->machine->ashrsi3_libcall_expanded++;'
in the wrong place.  It was counting up even if it wouldn't have emitted the
libcall.

This seems to work:

(define_insn_and_split "ashrsi3_n_call"
  [(set (match_operand:SI 0 "arith_reg_dest")
(ashiftrt:SI (match_operand:SI 1 "arith_reg_operand")
 (match_operand:SI 2 "const_int_operand")))
(clobber (reg:SI T_REG))]
  "TARGET_SH1 && can_create_pseudo_p () && operands[2] != const1_rtx"
  "#"
  "&& 1"
  [(const_int 0)]
{
  if (expand_ashiftrt (operands))
DONE;

  if (dump_file)
fprintf(dump_file, "ashrsi3_n_call: Expanding ashrsi3 libcall\n");

  cfun->machine->ashrsi3_libcall_expanded++;

  int value = INTVAL (operands[2]) & 31;

  char func[18];
  sprintf (func, "__ashiftrt_r4_%d", value);

  emit_move_insn (gen_rtx_REG (SImode, 4), operands[1]);

  rtx wrk = gen_reg_rtx (Pmode);
  rtx lab = function_symbol (wrk, func, SFUNC_STATIC).lab;
  emit_insn (gen_ashrsi3_n (GEN_INT (value), wrk, lab));
  emit_move_insn (operands[0], gen_rtx_REG (SImode, 4));
  DONE;
})


However, there is one case from  your previous posts in PR 49263:

int f_rshift(char v){
return v >> S;
}

This will not work on SH2 and expand to a libcall.  On SH4 the combine pass
converts it into:

Successfully matched this instruction:
(set (reg:SI 166)
(ashiftrt:SI (reg/v:SI 164 [ v ])
(const_int 31 [0x1f])))

But it doesn't even try to do so with the 'ashrsi3_n_call' pattern.  Not sure
why ...

[Bug target/54089] [SH] Refactor shift patterns

2023-06-23 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #93 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #92)
> I remembered why I used two different insns - first to eliminate infinite
> loop with help of marking insn with attribute, and second because I could
> not set attribute when emitting insn from C code. Whe have 'get_attr_*'
> functions but we have not 'set_attr_*'.

Yes, I thought so.  I'll give it a try myself with your patch ... please hold
on.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-22 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #90 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #89)
> Here's what I did
> ...

Can you attach it here as a patch please?

[Bug target/54089] [SH] Refactor shift patterns

2023-06-20 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #85 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #83)
> Created attachment 55367 [details]
> Collapsed libcall and additional loop move invariants patch v3

Thanks for staying on it!  I've looked through the latest version of your
patch.

There are still formatting issues.  I will not point out one by one at this
time (but will so later in the end).  For now, can you try to run the style
check script on your changes?

https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=contrib/check_GNU_style.sh

and also briefly cross check with https://gcc.gnu.org/codingconventions.html


As for the actual contents of the patch...


> bool
> expand_ashiftrt (rtx *operands)
> {
>   int value = INTVAL (operands[2]) & 31;
^^

Missing check for 'const_int' operand.  See other uses of 'CONST_INT_P'.

> if (dump_file)
>fprintf(dump_file, "ashrsi3: Emitting collapsed libcall\n");

This can be omitted.  It will be obvious in the RTL dump after the expand pass
because of the insn name, or not?


'expand_ashiftrt' is called only in the pattern 'define_expand "ashrsi3"', so
we can move all the code into the expander, like it's done in e.g.
'define_expand "lshrsi3"'.

Then the function 'expand_ashiftrt_individual' can be renamed back to
'expand_ashiftrt'.  

Actually, the 'define_expand "ashrsi3"' can just emit the
'ashrsi3_libcall_collapsed' directly.  In other words, change the original
'expand_ashiftrt' function tail:

>  wrk = gen_reg_rtx (Pmode);
>
> /* Load the value into an arg reg and call a helper.  */
>  emit_move_insn (gen_rtx_REG (SImode, 4), operands[1]);
>  sprintf (func, "__ashiftrt_r4_%d", value);
>  rtx lab = function_symbol (wrk, func, SFUNC_STATIC).lab;
>  emit_insn (gen_ashrsi3_n (GEN_INT (value), wrk, lab));
>  emit_move_insn (operands[0], gen_rtx_REG (SImode, 4));
>  return true;

to this:

>  emit_insn (gen_ashrsi3_libcall_collapsed (operands[0], operands[1], 
> GEN_INT(value)));
>  return true;

... but of course only if operand[2] is actually a 'const_int'.  If operand[2]
is not a constant, then the whole expander should fail and not emit anything. 
Which is what the original code was doing.  In case of shifts by non-constant
amounts, the middle-end will then expand the generic libcall for it (if I
remember correctly).

So basically, all we're doing here is adding a proxy pattern for the existing
'ashrsi3_n', that has a simpler structure and can be used by other passes like
combine easier.  Sounds plausible, but looking through the other shift
patterns, they would all need that treatment?

I think the 'define_insn "ashrsi3_libcall_collapsed"' and
'define_insn_and_split "ashrsi3_libcall_expand"' can be merged into a single
pattern 'define_insn_and_split "ashrsi3_n_call" and the function
'expand_ashrsi3_libcall' (the tail of the original 'expand_ashiftrt') can be
just put into the splitter code in the new "ashrsi3_n_call".

The splitter condition should include "&& can_create_pseudo_p ()" to make sure
it's ran before register allocation.

I think this will reduce the change set a bit and make the intention of the
patch a bit clearer.

> +/* { dg-final { scan-assembler 
> "_f_loop1_rshift:.*mov\.l\\t(\.L\[0-9\]+),(r\[0-9\]+).*sts.l\\tpr,@-r15.*(\.L\[0-9\]+):.*jsr\\t@\\2.*bf\.s\\t\\3.*\\1:\\n\\t\.long\\t___ashiftrt_r4_6.*_f_loop2_rshift:"
>  { target { ! has_dyn_shift } } } }  */

Can you try to somehow write this in a simpler way?  Maybe omit some of the
register number matches, as they don't matter etc.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-17 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #80 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #79)
> 
> I mean that if a user run GCC with -O1 and we don't do SH specific loop move
> invariants pass at -O1, a user could look at binary (or at .S file) and find
> that not all invariants are moved out of the loops, because library
> addresses are hoisted after split1 pass. That would confuse a user even if
> RTL dump is generated, and he can decide that it's a bug into loop2 pass. I
> suggest to generate sh_loop dumps if RTL dump is requested and place there a
> message that sh_loop hoisting is not done due to optimization level setting.

I don't think users would get surprised or anything by lack of certain
optimization. There is no official set of guaranteed optimizations performed at
a particular -O level.  It's OK to output a message into the RTL dump of
course.  But everything else seems a bit overthinking the issue.

Actually the hoisting might not be always done.  E.g. when register pressure in
a loop is high, it might be better to not hoist the function address and keep
it inside of the loop to reduce register pressure.  But I'm not sure the loop
optimization passes are smart enough to do that (yet).

> 
> I checked several cases and they are because ld cannot link little endian
> binary. When I run failed command taken from log file, it always fails for
> linking little endian binary. But sometimes logs say that executing of
> little endian binary is successful. I'm at a loss - how is that even
> possible?

Ugh, sounds weird. Sometimes a particular binutils version is also no good. 
Have you tried using an older binutils?  Maybe it's more stable?

[Bug target/54089] [SH] Refactor shift patterns

2023-06-17 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #78 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #77)
> > It'd be good if the newly added passes are ran only with -O2 or higher.
> 
> This can be confusing to users when they discover that not all invariants
> are moved out of loops. Then we should inform them about that at least.

I don't think the compiler reports any optimizations not done to the user at
lower optimization levels?  It's normal not to do certain optimizations at a
lower level, that's why we have -O0 -O1 -O2 ... or do you mean something else
by that?

> I'm thinking about this for some time. We know that we should potentially
> run additional loop optimization pass when we're splitting libcall. I did
> not find the way to know in what function we are splitting yet.

The compiler processes one function at a time, all passes at once.  It doesn't
mix passes of different functions.  So I think using global variable in sh.cc +
override 'set_current_function' should get the job done.  When the insn split
code is executed, just set the global flag in the SH specific function context.


> I see some strange new exec fails only at testsuite logs. Right now I'm
> trying to find the cause.

What's the configure line of your GCC build?

[Bug target/54089] [SH] Refactor shift patterns

2023-06-16 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #76 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #75)
> I found that patch incorrectly works when '-O0 -fmove-loop-invariants' flags
> are set. Stock loop optimization passes do not run when '-O0' is specified
> desipte '-fmove-loop-invariants' is set. I'll do the same and recheck again.

It'd be good if the newly added passes are ran only with -O2 or higher.  Or
even better, if we can find a way to enable them only when actually needed. 
E.g. when it's splitting a shift insn that will potentially need the loop
optimizations again, set a flag in the function.

One way to do that could be adding SH specific function context class/struct to
store some per-function data for the compilation process.  Right now we'd have
only one entry, that is whether to run the additional passes again or not. 
This per-function 'SH context' can be stored as a global variable in sh.cc. 
Then override 'set_current_function' in the backend to reset the SH specific
per-function context.

However, to get better test coverage, it's better first let the additional loop
passes run all the time to uncover any potential issues.  Later the above can
be added as a supplementary optimization.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-08 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #72 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #71)
> 
> > * Do we really need to add that new source file sh_loop.cc with the wrapper
> > classes?  Can't we just instantiate the passes that are needed in-place when
> > they are registered?
> 
> Unfortunately, no.
> [...]

That's too bad.  Thanks for digging.  I haven't checked myself recently, I
thought this infrastructure has been improved a little bit, but seems not so. 
Perhaps SH is the only backend that wants to do this kind of thing.  Let's
leave it at that for now.  Later, after everything else has been cleared out we
can ask other GCC developers' opinion on this.

> Yes, of course! I'll add some 'tst #imm,0' presence tests.

That'd be good.

> Should I add hoist test? I do not yet understand how hoisting check should be
> performed, but I hope to find examples.

Also not sure how to do that at the moment.  Maybe scanning RTL dumps of the
new extra passes?  Or perhaps just a simple example as a start:

void test_func (int* a_ptr, int* b_ptr, unsigned int count)
{
  for (unsigned int i = 0; i < count; ++i)
a_ptr[i] >> 5;


  for (unsigned int i = 0; i < count; ++i)
b_ptr[i] >> 5;
}

... and then just scan the asm output and check against the expected number of
insns.  In this case, mov.l insn to load the function call address.


> Maybe some other tests? If I'm missing something, please tell me.

Right now not that I can think of.


> And as I understand, I should name test file something like
> pr54089-next-free-number.c, right?

Yes, that should be fine.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-08 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #70 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #69)
> Created attachment 55284 [details]
> Collapsed libcall and additional loop move invariants patch

Thanks for sharing the patch.  I also don't have the GCC SH test environment
setup at the moment, so it will take me some time to get up to speed on that.

There are some formatting nits in your patch, please check again:
* don't add / remove empty lines for no reason
* '{' goes on new line (follow surrounding code style)
* in comments: two spaces after the period
* closing ')' and ']' in the RTL should go on the same line (follow surrounding
style)

But more interestingly:
* Do we really need to add that new source file sh_loop.cc with the wrapper
classes?  Can't we just instantiate the passes that are needed in-place when
they are registered?

* Can you add some tests to the SH specific tests?

[Bug target/54089] [SH] Refactor shift patterns

2023-06-06 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #67 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #66)
> (In reply to Alexander Klepikov from comment #65)
> > > I'm thinking of something else.
> > 
> > During kernel compile I got few internal errors in different passes. That
> > is, additional loop optimization pass patch is no good at all.
> 
> I am sorry, I am, as always, panicking ahead of time. I think I found what's
> wrong and do additional tests.

Don't worry.  I know what you're going through.  Been there myself ;)
Take your time.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-06 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #63 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #62)
> 
> My project is small and it compiles in under 1 second on both clean and
> patched GCC. sh.exp test suite mean run time is 117s on clean and 119s on
> patched. I did 1 warm-up run and then 3 main one-threaded runs for each
> task. I'm thinking of something else.

We have to consider that SH is also a linux target and it's also used to build
larger applications (and GCC itself ...).  It'd be good to not regress too much
in this regard.  One way to check it is the CSiBE test set.  I can help testing
your patch later.

> 
> Implementing features not supported by hardware will always be a tradeoff.

I'd say it's generally about how to find the best choice of
instructions/sequences.  With GCC's "waterfall" optimization it's impossible to
find the best solution because it doesn't have a way to compare the total cost
of one solution vs. another, for example.  We have to work around that ;)

[Bug target/54089] [SH] Refactor shift patterns

2023-06-06 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #61 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #60)
> > Maybe it's easier to add some shift specific passes.
> 
> Well, I couldn't think of anything better and ported the loop optimization
> pass. More precisely, all loop optimization passes, because they are tied to
> each other. They run after split1 pass. And it worked!
> 
> I want to beleive that another loop optimization pass won't break anything
> because loops are already optimized.

Theoretically it should't ... 

> 
> If that's redundant, I thought of expanding libcall if it's inside a loop
> before loop optimization passes.

I'm a bit concerned about the increased compile time.  Have you observed
anything (negative) in this regard?

Loop, hoist, constant propagation etc (+ all the good stuff) optimizations are
done before insn combine / split1.  We could add a simple SH specific pass that
goes over the RTL and does stuff to shifts before those other optimizations. 
However, it might miss insn combine opportunities later on.  I'm thinking about
your tst #imm,r0 case from before.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-04 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #59 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #58)
> 
> Ouch. That's a real problem. Short loops can become slower on about 10%. But
> is it possible to detect a loop during expand pass? It looks like basic
> blocks have loop depth info, but I still don't now how to access a basic
> block.

There is 'BLOCK_FOR_INSN'

But I'm not sure it will be helpful during the initial expand pass.


> I would try that. I think loop optimiztion pass should be repeated after
> split pass. Do you know how to do it?

I don't know if we can simply repeat the loop optimizations.  I think I've
tried doing something like that before -- repeating some of the RTL passes, but
without any useful results.  It could also result in oscillations (pass A does
a transformation, pass B undoes it, then pass A would do it again ...).  Maybe
you can get better results.

There are already 2 SH specific passes 'sh_optimize_sett_clrt' and
'sh_treg_combine'.  You can look at how they are instantiated and inserted into
the RTL passes chain in 'register_sh_passes'.

Maybe it's easier to add some shift specific passes.

[Bug rtl-optimization/58517] ifcvt (after combine) puts ccreg clobbering insn between ccset insn and ccreg use

2023-06-03 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58517

--- Comment #17 from Oleg Endo  ---
(In reply to Andrew Pinski from comment #16)
> Now I am curious if T_REG should be BImode rather than SImode ... Then
> ifcvt.cc would not have to be modified. I Know BImode is newer than when sh
> target was added but maybe if someone cares about the sh target could try to
> modify the backend to use BImode for T_REG here.

Yeah, I know. Back then I was entertaining the idea.  But T-bit being SImode is
so intertwined with all other things in the backend, it looks like a big job to
undo/redo that.  For example there are patterns that use the T bit in other
arithmetic insns patterns.  Would need to check what would happen to those. 
Sounds like a can of worms.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-03 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #57 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #56)
> 
> > In that test you can see the unnecessary push/pop of PR.  This is because
> > initially it wanted to expand as a library call, but then your patterns
> > decided to change the insns. This can or can't be avoided, depending on the
> > case.
> 
> That's an valuable observation, thank you! I found why. Sometimes 'collapsed
> libcall' instruction is emitted in combine pass and 'clobber PR' is set
> then. I commented this out and checked that file and it seems to be OK. I
> will rerun testsuite again to be sure.

Another thing ... the reason why it's desirable to expand into the libcall
earlier is to allow hoisting the function call address outside of loops and
things like that.  That happens only once at the RTL level during compilation
and that's before the combine pass.  Since there are no loops around
optimization passes in the compiler, it's always going to be a compromise.  But
it might be OK to repeat a particular optimization pass only for SH, if it
helps anything.

> 
> Now regarding tests that fail on vanilla GCC and pass with patch. It looks
> like something was changed in the middle of GCC and it started to produce
> other instruction patterns that cannot be catched by existing isns and thus
> expanded to library calls. The patch simply fixed it.

Yes, that happens every now and then, as folks work on target independent
optimizations in the middle-end.  SH backend hasn't been keeping up for a while
in this regard.

[Bug rtl-optimization/58517] ifcvt (after combine) puts ccreg clobbering insn between ccset insn and ccreg use

2023-06-03 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58517

--- Comment #13 from Oleg Endo  ---
(In reply to Andrew Pinski from comment #12)
> Created attachment 55239 [details]
> Patch which does work on this
> 
> Though, I need to double to make sure it works for other cases still.
> sh is the case where we have a non-CC mode register for the flags which is
> why this was harder than other targets.

Thanks for looking at this (after 10 years .. wow, time flies).


> diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
> index 4622dba0121..ae1a0f7eb7c 100644
> --- a/gcc/config/sh/sh.md
> +++ b/gcc/config/sh/sh.md
> @@ -1510,7 +1510,7 @@ (define_expand "movsicc"
>rtx op0 = XEXP (operands[1], 0);
>rtx op1 = XEXP (operands[1], 1);
>
> -  if (! currently_expanding_to_rtl)
> +  if (0 && ! currently_expanding_to_rtl)
>  FAIL;
>
>   switch (code)
> diff --git a/gcc/config/sh/t-sh b/gcc/config/sh/t-sh
> index 1cc8c39d2a8..8ffcc288cf3 100644
> --- a/gcc/config/sh/t-sh
> +++ b/gcc/config/sh/t-sh
> @@ -89,8 +89,8 @@ MULTILIB_OSDIRNAMES = \
>   m4a=!m4a $(OTHER_ENDIAN)/m4a=!$(OTHER_ENDIAN)/m4a \
>   m4al=!m4al $(OTHER_ENDIAN)/m4al=!$(OTHER_ENDIAN)/m4al
>  
> -$(out_object_file): gt-sh.h
> -gt-sh.h : s-gtype ; @true
> +#$(out_object_file): gt-sh.h
> +#gt-sh.h : s-gtype ; @true
>  
>  # Local Variables:
>  # mode: Makefile

Are these hunks intentional?

[Bug target/54089] [SH] Refactor shift patterns

2023-06-03 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #55 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #54)
> Regarding testsuite. There's execute fails, but this is due to lack of
> multilib. I'll rebuild and retest.
> 
> There's also fail in pr64345-1.c, in this function:
> [...]
> 
> But it looks more like it's not a fail, but an optimization.

Yeah, that looks like an improvement.  There might be some SH specific tests
that scan for particular assembler outputs like that one.  Those tests would
need to be adjusted of course.

In that test you can see the unnecessary push/pop of PR.  This is because
initially it wanted to expand as a library call, but then your patterns decided
to change the insns. This can or can't be avoided, depending on the case.

> 
> But also there's tests that pass on patched but fail on clean. I'll take a
> closer look on them later after GCC and multilibs rebuild.

Yes, there are some (well ... quite a lot actually) tests that will also fail
on vanilla GCC on SH.  Hence the need to look at the test result delta
before/after patch.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-01 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #53 from Oleg Endo  ---
(In reply to Segher Boessenkool from comment #52)
> 
> There is TARGET_LEGITIMATE_COMBINED_INSN though, which is a workaround for if
> you really do not want the instruction combiner to create particular
> instruction patterns (but it does nothing to prevent other parts of the
> compiler from doing the same!)

Thanks for pointing it out.  I knew I missed something recent ;)

[Bug target/54089] [SH] Refactor shift patterns

2023-06-01 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #51 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #50)
> 
> Ooh, my bad! You are absolutely right. A function is inlined and division is
> converted to 4 'shar's which at combine pass are catched by 'define_insn
> "ashrsi3_libcall_collapsed"' which is little unexpected to me. Of course, it
> is expanded at 'split' pass to lib call. 5 and less right shifts should not
> convert to a lib call, but that can be easily corrected.
> 
> But maybe there is a way to exclude particular insn from combine pass? (I
> guess not).
> 

I don't think there is a direct way to hide patterns from the combine pass,
although I could be wrong, since I haven't been tracking the combine pass
development closely for a while.

Perhaps you can look at how the left-shifts are done as a reference.


> Now concerning GCC testsuite. I ran it in such way:
> 
> make check RUNTESTFLAGS="CFLAGS='$CFLAGS -I/usr/local/sh-elf/include/'
> --target_board=sh-sim\{-m2e,-m4\}\{-ml,-mb\}"
> 
> There are lots of failed tests on non-patched GCC. Even if I narrow tests
> list to sh.exp, it still fails a lot of times. At last there's nothing in
> logs that produce ICE in RTL. I'm not sure I could find a crash due to the
> patch at all, even if it were there.

To test it, you first use a known good version of GCC without your patch, build
the whole toolchain from scratch and run it.  Then collect the test results.

Then apply your patch, and repeat the process.  Collect the test results again.

Then compare both test results.  If there are any new changes in the test
results your patch is hitting something.

Usually I'd run the testsuite as follows:

make -k check
RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

(running make with multiple jobs here is useful, and still it will take some
hours)

[Bug target/54089] [SH] Refactor shift patterns

2023-06-01 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #49 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #48)
> Let's continue discussion we started here:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
> 
> I've found that my patch catches integer division. In short, it appears to
> work unpredictable. It looks like there's no easy way to catch right shift.

What do you mean it catches integer division? There could be several places
during compilation, where the compiler will try to convert integer division to
right shifts.  But it will not do so unless it's wrong.  The patterns you have
added shouldn't match a division operation.

[Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction

2023-05-30 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #50 from Oleg Endo  ---
Actually, let's take any further discussion of shift patterns to PR 54089.

[Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction

2023-05-30 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #49 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #48)
> I made tests (including *.c files from GCC testsuite) and everything looks
> fine for now. But I'm still afraid that pattern for 'ashrsi3_libcall_expand'
> is too wide. It is possible to narrow it down as much as possible by adding
> distinct attribute and set when emitting 'ashrsi3_libcall_collapsed' and
> then check it and fail if not set:
> 

For this kind of change, the whole GCC test suite needs to be ran for at least
big/little -m2,-m4 variants.


+(define_insn_and_split "ashrsi3_libcall_expand"
+  [(parallel [(set (match_operand:SI 0 "arith_reg_dest")
+   (ashiftrt:SI (match_operand:SI 1 "arith_reg_operand")
+   (match_operand:SI 2 "const_int_operand"))
+   )(clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+  ])]

The 'parallel' construct looks strange.

[Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction

2023-05-29 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #47 from Oleg Endo  ---
(In reply to Eric Gallager from comment #46)
> Reminder that patches go to the gcc-patches mailing list

It's OK.  We're just throwing around ideas, nothing final

[Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction

2023-05-28 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #44 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #43)
> 
> Well, not really. Look what's happening during expand pass when 'ashrsi3' is
> expanding. Function 'expand_ashiftrt' is called and what it does at the end
> - it explicitly emits 3 insns:
> [...]

> 
> By the way, right shift for integers expands to only one 'lshiftrt' insn and
> that's why it can be catched and converted to 'tst'.
> 

Yeah, I might have dropped the ball on the right shift patterns back then and
only reworked the left shift patterns to do that. 


> 
> As far as I understand these insns could be catched later by a peephole and
> converted to 'tstsi_t' insn like it is done for other much simple insn
> sequences.

It's the combine RTL pass and split1 RTL pass that does most of this work here.
 Peephole pass in GCC is too simplistic for this.


> 
> Thank you for your time and detailed explanations! I agree with you on all
> points. Software cannot be perfect and it's OK for GCC not to be super
> optimized, so this part better sholud be left intact.

We can't have it perfect, but we can try ;)

> 
> By the way, I tried to link library to my project and I figured out that
> linker is smart enough to link only necessary library functions even without
> LTO. So increase in size is about 100 or 200 bytes, that is acceptable.
> Thank you very much for help!

You're welcome.

Yes, to strip out unused library functions it doesn't need LTO.  But using LTO
for embedded/MCU  firmware, I find it can reduce the code size by about 20%. 
For example, it can also inline small library functions in your code (if the
library was also compiled with LTO).

[Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction

2023-05-26 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #42 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #41)
> 
> Thank you! I have an idea. If it's impossible to defer initial optimization,
> maybe it's possible to emit some intermediate insn and catch it and optimize
> later?

This is basically what is supposed to be happening there already.

However, it's a bit of a dilemma.

1) If we don't have a dynamic shift insn and we smash the constant shift into
individual 
stitching shifts early, it might open some new optimization opportunities, e.g.
by sharing intermediate shift results.  Not sure though if that actually
happens in practice though.

2) Whether to use the dynamic shift insn or emit a function call or use
stitching shifts sequence, it all has an impact on register allocation and
other instruction use.  This can be problematic during the course of RTL
optimization passes.

3) Even if we have a dynamic shift, sometimes it's more beneficial to emit a
shorter stitching shift sequence.  Which one is better depends on the
surrounding code.  I don't think there is anything good there to make the
proper choice.

Some other shift related PRs: PR 54089, PR 65317, PR 67691, PR 67869, PR 52628,
PR 58017


> > BTW, have you tried it on a more recent GCC?  There have also been some
> > optimizations in the middle-end (a bit more backend independent) for this
> > kind of thing.
> 
> I tried 13.1 about week or two ago with the same result.
> 

Good to know.  Thanks for checking it!

[Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction

2023-05-25 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #40 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #39)
> 
> I'm sorry, but .md lang is too complicated for me.

Yeah, it looks alien at first sight.  But it's where a lot of things happen
w.r.t. instruction selection.

> It looks like with optimization enabled it converts bitwise AND to right
> shift and then optimizes again. But SH4 has 'shad' and 'shad' can be
> optimized to 'tst'. And SH2E has libcall instead of dynamic shift and libcll
> cannot be converted. It seems that very first optimization spoils things.
> 
> But when we have numerous 'shar' instructions, optimization joins the game
> again and converts them to 'tst'.

Yes, something like this is what I remember happening there.  I'll try to look
into the issue with your test cases and see if it's possible to add some
patterns to catch those.

BTW, have you tried it on a more recent GCC?  There have also been some
optimizations in the middle-end (a bit more backend independent) for this kind
of thing.

> You are absolutely right, the code will be larger when we do right shifts.
> But there's situations when you can't use library. For exmple, SH7055 engine
> control unit can barely contain the program. The library just won't fit.

Have you tried to use whole program optimization via -flto and
-ffunction-sections? It  should be able to strip out all unnecessary library
functions.

[Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction

2023-05-24 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #38 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #37)
> 
> As far as I understand from GCC sources, function I patched
> 'expand_ashiftrt' process only constant values of shift. As you can see
> earlier, I added your and other examples to tests.

OK, thanks for the additional test cases.  It really looks like the way the
constant shift is expanded (via ashrsi3_n insn) on SH1/SH2 is getting in the
way.

The tst insn is mainly formed by the combine pass, which relies on certain insn
patterns and combinations thereof.  See also sh.md, around line 530.

You can look at the debug output with the -fdump-rtl-all option to see what's
happening in the RTL passes.

What your patch is doing is to make it not emit the ashrsi3_n insn for constant
shifts altogether?  I guess it will make code that actually needs those real
shifts larger, as it will always emit the whole shift stitching sequence.  That
might be a good thing or not.


> It looks like really
> dynamic shifts translate to library calls.

So the option name '-mdisable-dynshift-libcall' doesn't make sense.
What it actually does is more like '-mdisable-constshift-libcall'.

> 
> Should I test more exotic situations? If so, could you please help me with
> really exotic or weired examples?

Have you had a look at the existing test cases for this in
gcc/testsuite/gcc.target/sh ?

[Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction

2023-05-24 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #36 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #35)
> 
> As I understand, you meant the following (I added new functions at the end
> of file):
> 
> $ cat f.c
> #define ADDR 0x
> #define P ((unsigned char *)ADDR)
> #define FLAG 0x40
> #define S 7
> 

Yes, that's what I meant, thanks.

Can you also compile for little endian, and most of all, use -O2 optimization
level.  Some optimizations are not done below -O2.

> 
> I choose that name because I wanted to disable dynamic shift instructions
> for all CPUs. I did not hope that it will affect SH-2E code in such way.
> 
> I can rewrite the patch so that it only affects CPUs that do not support
> dynamic shifts and disables library call for dynamic shifts. I'll do it
> anyway because I need it badly. How do you think, what name of option would
> be better: '-mdisable-dynshift-libcall' or '-mhw-shift'? Or if you want,
> please suggest another one. Thank you!

'-mdisable-dynshift-libcall' would be more appropriate for what it tries to do,
I think.  Although that is a whole different issue ... but what is it going to
do for real dynamic shifts on SH2?

What kind of code is it supposed to emit for things like

unsigned int dyn_shift (unsigned int x, unsigned int y)
{
  return x << (y & 31);
}

[Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction

2023-05-23 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #34 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #33)
> Created attachment 55142 [details]
> Disable dynamic shift instructions patch

First of all, thanks for digging into this.  This issue has been a can of
worms, due to all sorts of reasons.

As you have discovered, some code patterns take the shift instruction route,
which is basically decided earlier by the various middle-end optimizations. 
There have also been some changes to those parts recently, but I haven't been
watching what it does for SH.

> unsigned int f(char v){
> return (v & FLAG) == FLAG;
> }

Bit-tests of char and unsigned char should be covered by the test-suite and
should work -- at least originally.  However, what might be triggering this
problem is the '== FLAG' comparison.  When I was working on this issue I only
used '== 0' or '!= 0' comparison.  I can imagine that your test code triggers
some other middle end optimizations and hence we get this.

Can you try to rewrite your test code to something like this?

unsigned int f(char v){
return (v & FLAG) != 0;
}

... and see if it generates the tst instruction as expected?


> I also compiled my project with '-m2e' and new '-mdisable-dynshift'
> options and tested it in SH-2E mone on Renesas's emulator that comes
> with High-performance Embedded Workshop and all unit tests run as expected.

I'm not sure what the purpose of the '-mdisable-dynshift' option would be here
though.  For '-m2e' TARGET_DYNSHIFT is already 'false'.  So the option seems
misnamed.

[Bug target/55295] [SH] Add support for fipr instruction

2023-03-21 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295

--- Comment #17 from Oleg Endo  ---
(In reply to Luke Benstead from comment #16)
> OK so perhaps adding __builtin_sh_fipr is a good first step?

Yeah, you can try and see if it produces any useful results for you.

[Bug target/55295] [SH] Add support for fipr instruction

2023-03-21 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295

--- Comment #15 from Oleg Endo  ---
It's been too long since I've looked into it.  Maybe some middle-end parts got
more suitable over the time, but it was difficult to make it generate the fipr
instruction automatically due to the reasons stated above.

[Bug target/101469] wrong code with "-O2 -fPIE" for SH

2021-07-18 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101469

--- Comment #4 from Oleg Endo  ---
(In reply to Rin Okuyama from comment #3)
> Thank you very much for your analysis!
> 
> If that peephole is removed, GCC 10.3 generates working codes.
> 
> NetBSD/shle built by this compiler works fine as far as I can see.
> I'm carrying out full regression tests of NetBSD on this system.
> (It takes about a day to finish.)
> 
> Is there a better fix than mechanically removing that peephole?

Thanks for reporting and analyzing this issue.

I was suspecting that to be a broken peephole pattern.  Of course it's better
to fix the broken pattern instead of deleting it completely.

I'll try to come up with a patch during this week.

[Bug target/97431] [SH] Python crashes with 'Segmentation fault with -finline-small-functions

2020-10-15 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97431

--- Comment #6 from Oleg Endo  ---
(In reply to John Paul Adrian Glaubitz from comment #5)

So the difference seems to be only the -fPIC option?  Can you get the
preprocessed .i file with -save-temps ?

[Bug target/97431] [SH] Python crashes with 'Segmentation fault with -finline-small-functions

2020-10-15 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97431

Oleg Endo  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2020-10-15
 Status|UNCONFIRMED |NEW

--- Comment #4 from Oleg Endo  ---
(In reply to John Paul Adrian Glaubitz from comment #3)
> r110x5  5
> r120x299f367c   698300028
> r130x8b82232
> r140x2a15f4a0   706081952
> 
> Disassembled function:
> 
> Dump of assembler code for function long_richcompare:
>0x29631850 <+0>: mov.l   r8,@-r15
>0x29631852 <+2>: mova0x296318e8 ,r0
>0x29631854 <+4>: mov.l   r12,@-r15
>0x29631856 <+6>: mov.l   @(4,r4),r1
>0x29631858 <+8>: mov.l   0x296318e8 ,r12 ! 
> 0x3c1d94
>0x2963185a <+10>:add #64,r1
>0x2963185c <+12>:mov.l   @(20,r1),r2
>0x2963185e <+14>:mov.l   0x296318ec ,r1   ! 
> 0x100
>0x29631860 <+16>:tst r1,r2
>0x29631862 <+18>:bt.s0x296318d8 
>0x29631864 <+20>:add r0,r12
>0x29631866 <+22>:mov.l   @(4,r5),r2
>0x29631868 <+24>:add #64,r2
>0x2963186a <+26>:mov.l   @(20,r2),r2
>0x2963186c <+28>:tst r1,r2
>0x2963186e <+30>:bt.s0x296318d8 
>0x29631870 <+32>:cmp/eq  r5,r4
>0x29631872 <+34>:bt.s0x29631940 
>0x29631874 <+36>:mov #5,r1
>0x29631876 <+38>:mov.l   @(8,r4),r7
>0x29631878 <+40>:mov.l   @(8,r5),r1
>0x2963187a <+42>:mov r7,r8
>0x2963187c <+44>:cmp/eq  r1,r7
>0x2963187e <+46>:bf.s0x296318e8 
>0x29631880 <+48>:sub r1,r8
> 
>
>
>0x296318e6 <+150>:   mov.l   @r15+,r8
> => 0x296318e8 <+152>:   mov.l   r9,@(16,r13)


Just to point out the obvious, r13 is never initialized nor referenced by
anything else throughout the function. What are the compiler options?

[Bug target/81426] [SH]: unable to find a register to spill in class 'R0_REGS' when building webkit2gtk

2020-03-07 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81426

Oleg Endo  changed:

   What|Removed |Added

 CC||olegendo at gcc dot gnu.org
 Ever confirmed|0   |1
   Last reconfirmed||2020-03-08
 Status|UNCONFIRMED |NEW

[Bug target/93877] [9/10 Regression] [SH] webkit2gtk fails to build with "internal compiler error: in extract_constrain_insn, at recog.c:2211"

2020-02-26 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93877

Oleg Endo  changed:

   What|Removed |Added

 CC||vmakarov at gcc dot gnu.org

--- Comment #12 from Oleg Endo  ---
This looks to be an LRA related issue (checking with GCC 10).  Without -mlra it
actually compiles fine.  Adding -mlra causes some issues:

sh-elf-g++ -m4 -ml -matomic-model=soft-gusa -O2 -fno-strict-aliasing
-fno-exceptions -fno-rtti -fPIC -fstack-protector-strong -std=c++17 -mlra

../Source/ThirdParty/ANGLE/src/compiler/translator/IntermNode.cpp: In static
member function 'static sh::TConstantUnion*
sh::TIntermConstantUnion::FoldAggregateBuiltIn(sh::TIntermAggregate*,
sh::TDiagnostics*)':
../Source/ThirdParty/ANGLE/src/compiler/translator/IntermNode.cpp:3747:1:
error: insn does not satisfy its constraints:
(insn 9652 9651 3240 388 (parallel [
(set (reg:SF 66 fr2 [orig:662 _764 ] [662])
(reg:SF 0 r0 [3347]))
(use (reg:SI 154 fpscr0))
(clobber (scratch:SI))
])
"../Source/ThirdParty/ANGLE/src/compiler/translator/IntermNode.cpp":3568:60 214
{movsf_ie}
 (expr_list:REG_DEAD (reg:SF 0 r0 [3347])
(nil)))
during RTL pass: cprop_hardreg
../Source/ThirdParty/ANGLE/src/compiler/translator/IntermNode.cpp:3747:1:
internal compiler error: in extract_constrain_insn, at recog.c:2195
0x5dee99 _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
../../gcc/gcc/rtl-error.c:108
0x5deec2 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
../../gcc/gcc/rtl-error.c:118
0xe46907 extract_constrain_insn(rtx_insn*)
../../gcc/gcc/recog.c:2195
0xe4a204 copyprop_hardreg_forward_1
../../gcc/gcc/regcprop.c:802
0xe4b212 execute
../../gcc/gcc/regcprop.c:1367




sh-elf-g++ -m4 -ml -matomic-model=soft-gusa -O2  -std=c++17 -mlra

during RTL pass: reload
../Source/ThirdParty/ANGLE/src/compiler/translator/IntermNode.cpp: In member
function 'sh::TConstantUnion*
sh::TIntermConstantUnion::foldUnaryComponentWise(TOperator,
sh::TDiagnostics*)':
../Source/ThirdParty/ANGLE/src/compiler/translator/IntermNode.cpp:3047:1:
internal compiler error: in lra_set_insn_recog_data, at lra.c:1000
0xd1f5f1 lra_set_insn_recog_data(rtx_insn*)
../../gcc/gcc/lra.c:998
0xd2218f lra_get_insn_recog_data
../../gcc/gcc/lra-int.h:488
0xd2218f remove_scratches_1
../../gcc/gcc/lra.c:2058
0xd22247 lra_emit_move(rtx_def*, rtx_def*)
../../gcc/gcc/lra.c:503
0xd399bb curr_insn_transform
../../gcc/gcc/lra-constraints.c:4434
0xd3b20d lra_constraints(bool)
../../gcc/gcc/lra-constraints.c:5025
0xd227d4 lra(_IO_FILE*)
../../gcc/gcc/lra.c:2437
0xcd20c9 do_reload
../../gcc/gcc/ira.c:5523
0xcd20c9 execute
../../gcc/gcc/ira.c:5709


Vlad, maybe you have some hint where to look?

[Bug target/93877] [9/10 Regression] [SH] webkit2gtk fails to build with "internal compiler error: in extract_constrain_insn, at recog.c:2211"

2020-02-26 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93877

--- Comment #10 from Oleg Endo  ---
I can't reproduce the first case with a standalone sh-elf compiler (GCC 9).

The compile flags mention

  -specs=/usr/share/dpkg/pie-compile.specs

... what's in that specs file?

[Bug target/55212] [SH] Switch to LRA

2020-02-26 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212

--- Comment #125 from Oleg Endo  ---
(In reply to John Paul Adrian Glaubitz from comment #122)
> 
> The build process is. Fixing broken packages isn't.
> 
> Everything is around 13.000 source packages.
>

> And, finally, the buildd capacity is limited on sh4. If this was POWER or
> SPARC, we would have plenty of resources for a complete rebuild.

Can you at least please try to check if the kernel builds & runs OK?

As for the other packages, is there a way to let it just try to rebuild
everything and get a list of the packages that failed to build.  We don't need
to sit down and fix all issues one by one for now.  At least we can compare the
list against the current list of non-building packages.  If the difference is
not too big, we can consider turning on LRA by default from GCC 10 on.

[Bug target/93808] [9 Regression] [SH] Ruby crashes with 'Illegal Instruction' with -fcrossjumping

2020-02-26 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93808

--- Comment #24 from Oleg Endo  ---
Adrian, have you had a chance to apply the test patch in comment #22 and re-run
it?

[Bug target/55212] [SH] Switch to LRA

2020-02-25 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212

--- Comment #121 from Oleg Endo  ---
(In reply to John Paul Adrian Glaubitz from comment #120)
> 
> That's a huge task which is why I prefer fixing issues on the fly.

I thought this is almost fully automated?

You can apply this patch to GCC to enable LRA by default for SH

--- sh.opt.orig 2019-03-04 10:09:09.244521000 +0900
+++ sh.opt  2020-02-26 10:19:55.414340269 +0900
@@ -299,5 +299,5 @@
 Enable the use of the fsrra instruction.

 mlra
-Target Report Var(sh_lra_flag) Init(0) Save
+Target Report Var(sh_lra_flag) Init(1) Save
 Use LRA instead of reload (transitional).

Then let it rebuild everything.


> 
> We are going to switch over anyway with GCC-12 or GCC-13, so I'm not sure
> what we are gaining if we continue to wait.

I don't get it.  You want to enable LRA by default for your distro, but you
don't want to rebuild all the packages with that modification?  It's like ..
shipping a distro with a new compiler, which potentially can't compile the
distro packages (correctly), so instead we secretly use an older compiler to
build the packages  ?  Is that normal practice?  Sorry, sounds like a mess
to me.

[Bug target/93877] [9/10 Regression] [SH] webkit2gtk fails to build with "internal compiler error: in extract_constrain_insn, at recog.c:2211"

2020-02-25 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93877

Oleg Endo  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2020-02-26
 Ever confirmed|0   |1

--- Comment #9 from Oleg Endo  ---
The "movsf_ie" insn doesn't allow FP reg -> GP reg moves in its constraints,
but it allows it in its predicates "general_movdst_operand" and
"general_movsrc_operand".  I will have a look and try to correct it.

[Bug target/93877] [9/10 Regression] [SH] webkit2gtk fails to build with "internal compiler error: in extract_constrain_insn, at recog.c:2211"

2020-02-25 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93877

--- Comment #6 from Oleg Endo  ---
(In reply to John Paul Adrian Glaubitz from comment #5)
> Hmm, there is one other source code file within webkit2gtk where
> -fno-move-loop-invariants does not help. I can only get that particular
> source file to be compiled if I disable all optimizations.

As always, just because the error message seems to be the same, doesn't mean
it's the same issue/bug in the compiler.  Please post the other failing code,
too.

[Bug target/55212] [SH] Switch to LRA

2020-02-25 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212

--- Comment #119 from Oleg Endo  ---
(In reply to John Paul Adrian Glaubitz from comment #118)
> Is there anything that currently speaks against switching to LRA by default
> now?

There were a couple of code quality issues, which I would need to dig up and
re-check if it's still an issue.

> 
> I think, it would be a good idea for gcc-11 or even gcc-10 if possible. I
> will report all issues I'm running into since I am constantly monitoring
> package builds on Debian/sh4.

Last time this issue came up, I asked you to re-build all debian with -mlra
enabled

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00977.html

I'm still waiting for the results.

I've been telling you to turn on LRA for specific cases, but that doesn't mean
it's not gonna have regressions.  So please try using -mlra for *all* packages,
including kernel and let the system boostrap.

Please report your findings in this PR.

[Bug target/93876] [9 10 Regression] [SH] webkit2gtk fails to build with "error: unable to find a register to spill in class 'R0_REGS'"

2020-02-22 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93876

Oleg Endo  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |WONTFIX

--- Comment #8 from Oleg Endo  ---
Closing as 'won't fix'

[Bug target/93876] [9 10 Regression] [SH] webkit2gtk fails to build with "error: unable to find a register to spill in class 'R0_REGS'"

2020-02-22 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93876

--- Comment #6 from Oleg Endo  ---
If -mlra helps with this case, then let's close this one as 'WON'T FIX', ok?

  1   2   3   4   5   6   7   8   9   10   >