[Bug target/59401] [SH] GBR addressing mode optimization produces wrong code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59401 Oleg Endo changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #14 from Oleg Endo --- Fixed on 5, won't fix for 4.9.
[Bug target/59401] [SH] GBR addressing mode optimization produces wrong code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59401 --- Comment #13 from Oleg Endo olegendo at gcc dot gnu.org --- Author: olegendo Date: Thu Oct 16 12:21:29 2014 New Revision: 216314 URL: https://gcc.gnu.org/viewcvs?rev=216314root=gccview=rev Log: gcc/ PR target/59401 * config/sh/sh.h (CALL_REALLY_USED_REGISTERS): Expand macro and set GBR to 0. Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/sh.h
[Bug target/59401] [SH] GBR addressing mode optimization produces wrong code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59401 --- Comment #11 from Oleg Endo olegendo at gcc dot gnu.org --- Created attachment 33723 -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=33723action=edit Make GBR call preserved by default (In reply to Oleg Endo from comment #10) (In reply to Kazumoto Kojima from comment #9) (In reply to Oleg Endo from comment #8) change the value for gbr in sh.h CALL_USED_REGISTERS from '1' to '0' and confirm that everything is still OK? The comment and document about CALL_USED_REGISTERS say that it must be a superset of FIXED_REGISTERS. CALL_REALLY_USED_REGISTERS might be a macro for that purpose. Right. In this case it's probably better to do it in sh_conditional_register_usage. It would be nice if '-fcall-saved-gbr' and '-fcall-used-gbr' still remained functional afterwards ... I'll give it a try. sh_conditional_register_usage is invoked after the -fcall-*-* options are processed. Defining the CALL_REALLY_USED_REGISTERS macro with the default values works as expected, i.e. -fcall-used-gbr still works. Kaz, could you please add the attached patch to your test run?
[Bug target/59401] [SH] GBR addressing mode optimization produces wrong code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59401 --- Comment #12 from Kazumoto Kojima kkojima at gcc dot gnu.org --- (In reply to Oleg Endo from comment #11) Kaz, could you please add the attached patch to your test run? No new failures for the top level make -k check on sh4-unknown-linux-gnu.
[Bug target/59401] [SH] GBR addressing mode optimization produces wrong code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59401 --- Comment #5 from Oleg Endo olegendo at gcc dot gnu.org --- Ugh, there's actually another problem with this thing, I think: void other (void); int test0 (void) { int x = ((int*)__builtin_thread_pointer ())[2]; other (); return ((int*)__builtin_thread_pointer ())[5] + x; } compiles to: stcgbr,r8 mov.l.L3,r1 jsr@r1 mov.l@(8,r8),r9 mov.l@(20,r8),r0 use GBR value before the call addr9,r0 lds.l@r15+,pr mov.l@r15+,r9 rts mov.l@r15+,r8 By default, GBR is marked as call-clobbered. Currently, if the GBR mem optimization thing sees any calls in a function it will not form the GBR address. It's not optimal, but was supposed to produce at least correct code. Obviously the code above is wrong. The __builtin_thread_pointer insn is hoisted by something else before combine/split1. The patch from r216128 is incomplete. I'm checking it out.
[Bug target/59401] [SH] GBR addressing mode optimization produces wrong code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59401 --- Comment #6 from Oleg Endo olegendo at gcc dot gnu.org --- (In reply to Oleg Endo from comment #5) Ugh, there's actually another problem with this thing, I think: ... By default, GBR is marked as call-clobbered. Currently, if the GBR mem optimization thing sees any calls in a function it will not form the GBR address. It's not optimal, but was supposed to produce at least correct code. Obviously the code above is wrong. The __builtin_thread_pointer insn is hoisted by something else before combine/split1. The patch from r216128 is incomplete. I'm checking it out. No, hoisting __builtin_thread_pointer is OK. It only makes the following impossible to do: void foo (void* otherthread) { ... access current thread's context ... switch_thread (otherthread); ... access otherthread's context ... } The compiler doesn't know that switch_thread modifies the thread pointer. One workaround is: switch_thread (otherthread); void* newtp; __asm__ __volatile__ (stc gbr, %0 : =r (newtp)); __builtin_set_thread_pointer (newtp); However, this is just a hypothetical use case. I don't think there is an immediate use for that. Usually the thread pointer is not switched like that in the middle of a function, probably not even when implementing cooperative threading/fibers. Thus I think it would make sense to change the default behavior from GBR call clobbered to GBR call preserved. Actually making GBR call preserved is the only way for the GBR mem access optimization to make any sense because otherwise it will never form GBR mems if calls are present in a function, which defeats its purpose. It would be possible to make a better analysis of when GBR is used (before / after calls etc), but I think it's worth it in this case. Kaz, what's your opinion on making GBR to be call preserved by default?
[Bug target/59401] [SH] GBR addressing mode optimization produces wrong code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59401 --- Comment #7 from Kazumoto Kojima kkojima at gcc dot gnu.org --- (In reply to Oleg Endo from comment #6) Kaz, what's your opinion on making GBR to be call preserved by default? Looks OK to me for 5.0. It's clearly an ABI change but a change to the more robust direction and wouldn't be surprising to users.
[Bug target/59401] [SH] GBR addressing mode optimization produces wrong code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59401 --- Comment #8 from Oleg Endo olegendo at gcc dot gnu.org --- (In reply to Kazumoto Kojima from comment #7) (In reply to Oleg Endo from comment #6) Kaz, what's your opinion on making GBR to be call preserved by default? Looks OK to me for 5.0. It's clearly an ABI change but a change to the more robust direction and wouldn't be surprising to users. Yes, I was thinking to do that for 5.0, not for the released branches. Just to be on the safe side, for your next test run, could you please change the value for gbr in sh.h CALL_USED_REGISTERS from '1' to '0' and confirm that everything is still OK?
[Bug target/59401] [SH] GBR addressing mode optimization produces wrong code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59401 --- Comment #9 from Kazumoto Kojima kkojima at gcc dot gnu.org --- (In reply to Oleg Endo from comment #8) change the value for gbr in sh.h CALL_USED_REGISTERS from '1' to '0' and confirm that everything is still OK? The comment and document about CALL_USED_REGISTERS say that it must be a superset of FIXED_REGISTERS. CALL_REALLY_USED_REGISTERS might be a macro for that purpose.
[Bug target/59401] [SH] GBR addressing mode optimization produces wrong code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59401 --- Comment #10 from Oleg Endo olegendo at gcc dot gnu.org --- (In reply to Kazumoto Kojima from comment #9) (In reply to Oleg Endo from comment #8) change the value for gbr in sh.h CALL_USED_REGISTERS from '1' to '0' and confirm that everything is still OK? The comment and document about CALL_USED_REGISTERS say that it must be a superset of FIXED_REGISTERS. CALL_REALLY_USED_REGISTERS might be a macro for that purpose. Right. In this case it's probably better to do it in sh_conditional_register_usage. It would be nice if '-fcall-saved-gbr' and '-fcall-used-gbr' still remained functional afterwards ... I'll give it a try.
[Bug target/59401] [SH] GBR addressing mode optimization produces wrong code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59401 --- Comment #2 from Oleg Endo olegendo at gcc dot gnu.org --- Author: olegendo Date: Sun Oct 12 23:14:07 2014 New Revision: 216128 URL: https://gcc.gnu.org/viewcvs?rev=216128root=gccview=rev Log: gcc/ PR target/59401 * config/sh/sh-protos (sh_find_equiv_gbr_addr): Use rtx_insn* instead of rtx. * config/sh/sh.c (sh_find_equiv_gbr_addr): Use def chains instead of insn walking. (sh_find_equiv_gbr_addr): Do nothing if input mem is already a GBR address. Use def chains to handle GBR clobbering call insns. gcc/testsuite/ PR target/59401 PR target/54760 * gcc.target/pr54760-5.c: New. * gcc.target/pr54760-6.c: New. * gcc.target/sh/pr59401-1.c: New. Added: trunk/gcc/testsuite/gcc.target/sh/pr54760-5.c trunk/gcc/testsuite/gcc.target/sh/pr54760-6.c trunk/gcc/testsuite/gcc.target/sh/pr59401-1.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/sh-protos.h trunk/gcc/config/sh/sh.c trunk/gcc/testsuite/ChangeLog
[Bug target/59401] [SH] GBR addressing mode optimization produces wrong code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59401 Oleg Endo olegendo at gcc dot gnu.org changed: What|Removed |Added CC||kkojima at gcc dot gnu.org --- Comment #3 from Oleg Endo olegendo at gcc dot gnu.org --- Kaz, I'd like to backport this to 4.8 and 4.9, but I can't decide in which form. The patch from r216128 could be applied to the branches. Another option is to prevent the sh_find_equiv_gbr_addr function from crossing labels. This is just a few lines, but of course it produces not so good results in some cases. Do you have an opinion on that?
[Bug target/59401] [SH] GBR addressing mode optimization produces wrong code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59401 --- Comment #4 from Kazumoto Kojima kkojima at gcc dot gnu.org --- (In reply to Oleg Endo from comment #3) Do you have an opinion on that? Looks the latter is enough for the released branches. I'm OK with eather way, though.
[Bug target/59401] [SH] GBR addressing mode optimization produces wrong code
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59401 --- Comment #1 from Oleg Endo olegendo at gcc dot gnu.org --- An example where the base address is retrieved from the GBR in one basic block, but used in different basic blocks: struct tcb_t { int x, y, z, w; }; int test_01 (int a, tcb_t* b, int c) { tcb_t* tcb = (tcb_t*)__builtin_thread_pointer (); return (a 5) ? tcb-x : tcb-w; } By coincidence it produces correct code: mov r4,r0 tst #5,r0 bf .L7 mov.l @(12,gbr),r0 rts nop .L7: rts mov.l @(0,gbr),r0 A proper fix for this would be to collect all leaf values for the address registers in question from all predecessor basic blocks as it's done in the function sh_optimize_sett_clrt::find_last_ccreg_values in sh_optimize_sett_clrt.cc.