[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #105 from Oleg Endo --- (In reply to Alexander Klepikov from comment #104) > I've been thinking about something. I suspect that this patch could take > work away from other patches. I'm sorry, I don't know how to express myself > properly. I mean there's several patches that corrects shift patterns and > 'tst' instruction generation (most of them are written by you, by the way). > I suspect that some of them might not run anymore because this patch looks > more general and should cover more cases, including yet unknown cases, I > hope. And, in the end, dead code may appear because of it. I hope I was able > to make my point clearly. Yes, I understand what you're saying. As other parts of the compiler evolve, the RTL input that the backend code has to work with changes. It's a normal thing that happens during the course of development. Some patterns might stop working (especially those combine patterns are prone to that). And sometimes things magically start working because something got fixed somewhere else. I've tried to add SH specific test cases to try and keep it in check. Ideally we'd have to go through all of the specific SH quirks in the backend periodically, try to remove them one by one and run tests to see if the patterns are still working/needed or whether they can be removed. Let me know if you have more test cases (that work or don't work).
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #104 from Alexander Klepikov --- (In reply to Oleg Endo from comment #103) > Sorry, I've been busy with other things. I think your patch is OK, but I > wanted to test it in more detail before committing it. Oh, it's OK. By the way, your contribution is not less than mine. I've been thinking about something. I suspect that this patch could take work away from other patches. I'm sorry, I don't know how to express myself properly. I mean there's several patches that corrects shift patterns and 'tst' instruction generation (most of them are written by you, by the way). I suspect that some of them might not run anymore because this patch looks more general and should cover more cases, including yet unknown cases, I hope. And, in the end, dead code may appear because of it. I hope I was able to make my point clearly.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #103 from Oleg Endo --- (In reply to Alexander Klepikov from comment #102) > Created attachment 55543 [details] > Arithmetic right shift late expanding v2 > > Here's the patch. I hope I did not miss anything. > Sorry, I've been busy with other things. I think your patch is OK, but I wanted to test it in more detail before committing it.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 Alexander Klepikov changed: What|Removed |Added Attachment #55503|0 |1 is obsolete|| Attachment #55513|0 |1 is obsolete|| --- Comment #102 from Alexander Klepikov --- Created attachment 55543 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55543=edit Arithmetic right shift late expanding v2 Here's the patch. I hope I did not miss anything. Now considering regexp. I remade it using 'check-function-bodies' command and now it looks less confusing. I also found that in this testcase right shift expands to 'shad' instructions early on platforms that have dynamic shift support, so I deleted checks for those CPUs. I don't like that 'check-function-bodies' ignores asm labels but it's better than nothing.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #101 from Oleg Endo --- (In reply to Alexander Klepikov from comment #100) > Created attachment 55513 [details] > Arithmetic right shift late expanding > > (In reply to Oleg Endo from comment #99) > > Meanwhile, here's my iteration on your patch. > > Thank you! I did all checks I did before, added testcases, and edited to fit > the code style. Looks OK. Just 3 things: > +++ gcc-13.1.0/gcc/testsuite/gcc.target/sh/pr54089-11.c 2023-07-07 > 08:56:41.212345982 +0300 > @@ -0,0 +1,37 @@ > +/* Check that 'tst #64,r0' genrated. */ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ Please rename this test to pr49263... to have all tst #imm,r0 related tests in one place. Also: - 'genrated' -> 'generated' - space before opening paren of function args - 2 spaces indention - similarly, check code style of other added tests > --- gcc-13.1.0.orig/gcc/testsuite/gcc.target/sh/pr54089-12.c 1970-01-01 > 03:00:00.0 +0300 Can you try out Segher's suggestion in #c88 to make the regex look less cluttered?
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #100 from Alexander Klepikov --- Created attachment 55513 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55513=edit Arithmetic right shift late expanding (In reply to Oleg Endo from comment #99) > Meanwhile, here's my iteration on your patch. Thank you! I did all checks I did before, added testcases, and edited to fit the code style.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 Oleg Endo changed: What|Removed |Added Attachment #28207|0 |1 is obsolete|| Attachment #28633|0 |1 is obsolete|| --- Comment #99 from Oleg Endo --- Created attachment 55506 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55506=edit proposed patch (In reply to Alexander Klepikov from comment #98) > Created attachment 55503 [details] > Testcase for SH specific loop optimization pass > Thanks. I'll check it out. Meanwhile, here's my iteration on your patch.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 Alexander Klepikov changed: What|Removed |Added Attachment #55382|0 |1 is obsolete|| --- Comment #98 from Alexander Klepikov --- Created attachment 55503 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55503=edit Testcase for SH specific loop optimization pass I modified testcase for hoisting tests. It contains two different dg-final commands at once. First uses 'scan-assembler' command and second uses 'check-function-bodies'. The difference is that 'scan-assembler' uses "plain" regexp and is more powerfull but more complicated to understand. 'check-function-bodies' is simpler but less powerfull and requires C++, if I'm not mistaken. They both work, but only one is needed because they do the same. I hope this will help.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #97 from Oleg Endo --- (In reply to Alexander Klepikov from comment #96) > > Not really. 'expand_ashiftrt' could emit > > mov #imm, r1 > shad r1, r0 > > and 'mov' instruction would be loop invariant if there's a loop. It looks > like 'ashrsi3_libcall_expanded' is a bad name. I think name > 'ashrsi3_n_call_expanded' would be more appropriate. Ah, yes, indeed. I'm cleaning up your patch and will rename it. > > > However, there is one case from your previous posts in PR 49263: > > > > int f_rshift(char v){ > > return v >> S; > > } > > > > This will not work on SH2 and expand to a libcall. > > I think you mean SH2A, because the same is going on with SH4. > > > On SH4 the combine pass > > converts it into: > > > > Successfully matched this instruction: > > (set (reg:SI 166) > > (ashiftrt:SI (reg/v:SI 164 [ v ]) > > (const_int 31 [0x1f]))) > > > > But it doesn't even try to do so with the 'ashrsi3_n_call' pattern. Not > > sure why ... > > Well, the same thing is going on when using vanilla GCC. It looks like it's > happening due to char sign extension. Then instruction is catched by > 'ashrsi2_31' pattern. In another words, it looks to me like an optimization. It can be fixed by adding another pattern. I'll make another patch for that later.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #96 from Alexander Klepikov --- (In reply to Oleg Endo from comment #95) > The infinite loop is in splitting of the 'ashrsi3_n_call' pattern with the > constant 1. This is because 'ashrsi3_n_call' match overlaps with the 'shar' > pattern. Yes, I confirm that. 'operands[2] != const1_rtx' works well, thank you! > Another point is that you had the > 'cfun->machine->ashrsi3_libcall_expanded++;' in the wrong place. It was > counting up even if it wouldn't have emitted the libcall. Not really. 'expand_ashiftrt' could emit mov #imm, r1 shad r1, r0 and 'mov' instruction would be loop invariant if there's a loop. It looks like 'ashrsi3_libcall_expanded' is a bad name. I think name 'ashrsi3_n_call_expanded' would be more appropriate. > However, there is one case from your previous posts in PR 49263: > > int f_rshift(char v){ > return v >> S; > } > > This will not work on SH2 and expand to a libcall. I think you mean SH2A, because the same is going on with SH4. > On SH4 the combine pass > converts it into: > > Successfully matched this instruction: > (set (reg:SI 166) > (ashiftrt:SI (reg/v:SI 164 [ v ]) > (const_int 31 [0x1f]))) > > But it doesn't even try to do so with the 'ashrsi3_n_call' pattern. Not > sure why ... Well, the same thing is going on when using vanilla GCC. It looks like it's happening due to char sign extension. Then instruction is catched by 'ashrsi2_31' pattern. In another words, it looks to me like an optimization.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #95 from Oleg Endo --- (In reply to Oleg Endo from comment #93) > (In reply to Alexander Klepikov from comment #92) > > I remembered why I used two different insns - first to eliminate infinite > > loop with help of marking insn with attribute, and second because I could > > not set attribute when emitting insn from C code. Whe have 'get_attr_*' > > functions but we have not 'set_attr_*'. > > Yes, I thought so. I'll give it a try myself with your patch ... please > hold on. Finally could have a look at it, sorry for the delay. The infinite loop is in splitting of the 'ashrsi3_n_call' pattern with the constant 1. This is because 'ashrsi3_n_call' match overlaps with the 'shar' pattern. One quick fix is to put the condition into the first string. A nicer way would be to use a predicate which would constrain the operand[2] to "not 1". But it's the first and only use so quick version is OK. In addition, that pattern (not only the split condition) should be matched only before register allocation, so the 'can_create_pseudo_p' check is moved. Another point is that you had the 'cfun->machine->ashrsi3_libcall_expanded++;' in the wrong place. It was counting up even if it wouldn't have emitted the libcall. This seems to work: (define_insn_and_split "ashrsi3_n_call" [(set (match_operand:SI 0 "arith_reg_dest") (ashiftrt:SI (match_operand:SI 1 "arith_reg_operand") (match_operand:SI 2 "const_int_operand"))) (clobber (reg:SI T_REG))] "TARGET_SH1 && can_create_pseudo_p () && operands[2] != const1_rtx" "#" "&& 1" [(const_int 0)] { if (expand_ashiftrt (operands)) DONE; if (dump_file) fprintf(dump_file, "ashrsi3_n_call: Expanding ashrsi3 libcall\n"); cfun->machine->ashrsi3_libcall_expanded++; int value = INTVAL (operands[2]) & 31; char func[18]; sprintf (func, "__ashiftrt_r4_%d", value); emit_move_insn (gen_rtx_REG (SImode, 4), operands[1]); rtx wrk = gen_reg_rtx (Pmode); rtx lab = function_symbol (wrk, func, SFUNC_STATIC).lab; emit_insn (gen_ashrsi3_n (GEN_INT (value), wrk, lab)); emit_move_insn (operands[0], gen_rtx_REG (SImode, 4)); DONE; }) However, there is one case from your previous posts in PR 49263: int f_rshift(char v){ return v >> S; } This will not work on SH2 and expand to a libcall. On SH4 the combine pass converts it into: Successfully matched this instruction: (set (reg:SI 166) (ashiftrt:SI (reg/v:SI 164 [ v ]) (const_int 31 [0x1f]))) But it doesn't even try to do so with the 'ashrsi3_n_call' pattern. Not sure why ...
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #94 from Segher Boessenkool --- (In reply to Alexander Klepikov from comment #92) > I remembered why I used two different insns - first to eliminate infinite > loop with help of marking insn with attribute, and second because I could > not set attribute when emitting insn from C code. Whe have 'get_attr_*' > functions but we have not 'set_attr_*'. An attribute is part of the instruction *definition*, the define_insn; it isn't a property you put on one instance of it.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #93 from Oleg Endo --- (In reply to Alexander Klepikov from comment #92) > I remembered why I used two different insns - first to eliminate infinite > loop with help of marking insn with attribute, and second because I could > not set attribute when emitting insn from C code. Whe have 'get_attr_*' > functions but we have not 'set_attr_*'. Yes, I thought so. I'll give it a try myself with your patch ... please hold on.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #92 from Alexander Klepikov --- I remembered why I used two different insns - first to eliminate infinite loop with help of marking insn with attribute, and second because I could not set attribute when emitting insn from C code. Whe have 'get_attr_*' functions but we have not 'set_attr_*'.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 Alexander Klepikov changed: What|Removed |Added Attachment #55367|0 |1 is obsolete|| --- Comment #91 from Alexander Klepikov --- Created attachment 55382 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55382=edit Draft libcall/sh_loop patch, causes infinite loop Please patch with '-p2'.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #90 from Oleg Endo --- (In reply to Alexander Klepikov from comment #89) > Here's what I did > ... Can you attach it here as a patch please?
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #89 from Alexander Klepikov --- Here's what I did sh.md: (define_expand "ashrsi3" [(parallel [(set (match_operand:SI 0 "arith_reg_dest" "") (ashiftrt:SI (match_operand:SI 1 "arith_reg_operand" "") (match_operand:SI 2 "nonmemory_operand" ""))) (clobber (reg:SI T_REG))])] "" { if (expand_ashiftrt (operands)) DONE; if (!CONST_INT_P (operands[2])) FAIL; int value = INTVAL (operands[2]) & 31; emit_insn (gen_ashrsi3_n_call (operands[0], operands[1], GEN_INT(value))); DONE; }) (define_insn_and_split "ashrsi3_n_call" [(set (match_operand:SI 0 "arith_reg_dest" "=r") (ashiftrt:SI (match_operand:SI 1 "arith_reg_operand" "0") (match_operand:SI 2 "const_int_operand"))) (clobber (reg:SI T_REG))] "TARGET_SH1" "#" "&& can_create_pseudo_p ()" [(const_int 0)] { char func[18]; int value; rtx wrk; cfun->machine->ashrsi3_libcall_expanded++; if (expand_ashiftrt (operands)) DONE; if (!CONST_INT_P (operands[2])) FAIL; value = INTVAL (operands[2]) & 31; if (dump_file) fprintf(dump_file, "ashrsi3_n_call: Expanding ashrsi3 libcall\n"); wrk = gen_reg_rtx (Pmode); /* Load the value into an arg reg and call a helper. */ emit_move_insn (gen_rtx_REG (SImode, 4), operands[1]); sprintf (func, "__ashiftrt_r4_%d", value); rtx lab = function_symbol (wrk, func, SFUNC_STATIC).lab; emit_insn (gen_ashrsi3_n (GEN_INT (value), wrk, lab)); emit_move_insn (operands[0], gen_rtx_REG (SImode, 4)); DONE; }) sh.cc: bool expand_ashiftrt (rtx *operands) { rtx wrk; int value; if (TARGET_DYNSHIFT) { if (!CONST_INT_P (operands[2])) { rtx count = copy_to_mode_reg (SImode, operands[2]); emit_insn (gen_negsi2 (count, count)); emit_insn (gen_ashrsi3_d (operands[0], operands[1], count)); return true; } else if (ashiftrt_insns[INTVAL (operands[2]) & 31] > 1 + SH_DYNAMIC_SHIFT_COST) { rtx count = force_reg (SImode, GEN_INT (- (INTVAL (operands[2]) & 31))); emit_insn (gen_ashrsi3_d (operands[0], operands[1], count)); return true; } } if (!CONST_INT_P (operands[2])) return false; value = INTVAL (operands[2]) & 31; if (value == 31) { /* If we are called from abs expansion, arrange things so that we we can use a single MT instruction that doesn't clobber the source, if LICM can hoist out the load of the constant zero. */ if (currently_expanding_to_rtl) { emit_insn (gen_cmpgtsi_t (force_reg (SImode, CONST0_RTX (SImode)), operands[1])); emit_insn (gen_mov_neg_si_t (operands[0], get_t_reg_rtx ())); return true; } emit_insn (gen_ashrsi2_31 (operands[0], operands[1])); return true; } else if (value >= 16 && value <= 19) { wrk = gen_reg_rtx (SImode); emit_insn (gen_ashrsi2_16 (wrk, operands[1])); value -= 16; while (value--) gen_ashift (ASHIFTRT, 1, wrk); emit_move_insn (operands[0], wrk); return true; } /* Expand a short sequence inline, longer call a magic routine. */ else if (value <= 5) { wrk = gen_reg_rtx (SImode); emit_move_insn (wrk, operands[1]); while (value--) gen_ashift (ASHIFTRT, 1, wrk); emit_move_insn (operands[0], wrk); return true; } return false; } Now GCC falls into infinite loop when compiling this: int f_short_rshift_signed(char v){ return v >> 5; } It looks like it splits and then combines again.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #88 from Segher Boessenkool --- (In reply to Oleg Endo from comment #85) > > +/* { dg-final { scan-assembler > > "_f_loop1_rshift:.*mov\.l\\t(\.L\[0-9\]+),(r\[0-9\]+).*sts.l\\tpr,@-r15.*(\.L\[0-9\]+):.*jsr\\t@\\2.*bf\.s\\t\\3.*\\1:\\n\\t\.long\\t___ashiftrt_r4_6.*_f_loop2_rshift:" > > { target { ! has_dyn_shift } } } } */ > > Can you try to somehow write this in a simpler way? Maybe omit some of the > register number matches, as they don't matter etc. Do not use double-quoted strings unless you need interpolation? If you use {} around the string you do not need to backslash-quote (and double-quote) so much at all. [0-9] is \d whitespace is \s See the Tcl re_syntax manual page :-)
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #87 from Segher Boessenkool --- (In reply to Oleg Endo from comment #53) > (In reply to Segher Boessenkool from comment #52) > > There is TARGET_LEGITIMATE_COMBINED_INSN though, which is a workaround for > > if > > you really do not want the instruction combiner to create particular > > instruction patterns (but it does nothing to prevent other parts of the > > compiler from doing the same!) > > Thanks for pointing it out. I knew I missed something recent ;) g:78e4f1ad4e48, from 2012? Well, fairly recent, okay :-)
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #86 from Alexander Klepikov --- (In reply to Oleg Endo from comment #85) >For now, can you try to run the style check script on your changes? Thank you, that's what I've been missing! > > bool > > expand_ashiftrt (rtx *operands) > > { > > int value = INTVAL (operands[2]) & 31; > ^^ > > Missing check for 'const_int' operand. See other uses of 'CONST_INT_P'. I was completely sure that condition '(match_operand:SI 2 "const_int_operand")' in define_insn_and_split guarantees that 'CONST_INT_P (operands[2]) == true'. > I think the 'define_insn "ashrsi3_libcall_collapsed"' and > 'define_insn_and_split "ashrsi3_libcall_expand"' can be merged into a single > pattern 'define_insn_and_split "ashrsi3_n_call" and the function > 'expand_ashrsi3_libcall' (the tail of the original 'expand_ashiftrt') can be > just put into the splitter code in the new "ashrsi3_n_call". > > The splitter condition should include "&& can_create_pseudo_p ()" to make > sure it's ran before register allocation. > > I think this will reduce the change set a bit and make the intention of the > patch a bit clearer. > Yes, it looks like it works in general. But first tests show that all constant dynamic shifts expand to library call for all targets, even for those with dynshift instructions. That's because 'ashrsi3_n_call' and 'ashrsi3' should check whether right shift could be expanded to individual shifts. They both should execute the code that I separated into 'expand_ashiftrt_individual' function to expand to individual shifts properly. > > +/* { dg-final { scan-assembler > > "_f_loop1_rshift:.*mov\.l\\t(\.L\[0-9\]+),(r\[0-9\]+).*sts.l\\tpr,@-r15.*(\.L\[0-9\]+):.*jsr\\t@\\2.*bf\.s\\t\\3.*\\1:\\n\\t\.long\\t___ashiftrt_r4_6.*_f_loop2_rshift:" > > { target { ! has_dyn_shift } } } } */ > > Can you try to somehow write this in a simpler way? Maybe omit some of the > register number matches, as they don't matter etc. I took a note, I'll deal with it later.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #85 from Oleg Endo --- (In reply to Alexander Klepikov from comment #83) > Created attachment 55367 [details] > Collapsed libcall and additional loop move invariants patch v3 Thanks for staying on it! I've looked through the latest version of your patch. There are still formatting issues. I will not point out one by one at this time (but will so later in the end). For now, can you try to run the style check script on your changes? https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=contrib/check_GNU_style.sh and also briefly cross check with https://gcc.gnu.org/codingconventions.html As for the actual contents of the patch... > bool > expand_ashiftrt (rtx *operands) > { > int value = INTVAL (operands[2]) & 31; ^^ Missing check for 'const_int' operand. See other uses of 'CONST_INT_P'. > if (dump_file) >fprintf(dump_file, "ashrsi3: Emitting collapsed libcall\n"); This can be omitted. It will be obvious in the RTL dump after the expand pass because of the insn name, or not? 'expand_ashiftrt' is called only in the pattern 'define_expand "ashrsi3"', so we can move all the code into the expander, like it's done in e.g. 'define_expand "lshrsi3"'. Then the function 'expand_ashiftrt_individual' can be renamed back to 'expand_ashiftrt'. Actually, the 'define_expand "ashrsi3"' can just emit the 'ashrsi3_libcall_collapsed' directly. In other words, change the original 'expand_ashiftrt' function tail: > wrk = gen_reg_rtx (Pmode); > > /* Load the value into an arg reg and call a helper. */ > emit_move_insn (gen_rtx_REG (SImode, 4), operands[1]); > sprintf (func, "__ashiftrt_r4_%d", value); > rtx lab = function_symbol (wrk, func, SFUNC_STATIC).lab; > emit_insn (gen_ashrsi3_n (GEN_INT (value), wrk, lab)); > emit_move_insn (operands[0], gen_rtx_REG (SImode, 4)); > return true; to this: > emit_insn (gen_ashrsi3_libcall_collapsed (operands[0], operands[1], > GEN_INT(value))); > return true; ... but of course only if operand[2] is actually a 'const_int'. If operand[2] is not a constant, then the whole expander should fail and not emit anything. Which is what the original code was doing. In case of shifts by non-constant amounts, the middle-end will then expand the generic libcall for it (if I remember correctly). So basically, all we're doing here is adding a proxy pattern for the existing 'ashrsi3_n', that has a simpler structure and can be used by other passes like combine easier. Sounds plausible, but looking through the other shift patterns, they would all need that treatment? I think the 'define_insn "ashrsi3_libcall_collapsed"' and 'define_insn_and_split "ashrsi3_libcall_expand"' can be merged into a single pattern 'define_insn_and_split "ashrsi3_n_call" and the function 'expand_ashrsi3_libcall' (the tail of the original 'expand_ashiftrt') can be just put into the splitter code in the new "ashrsi3_n_call". The splitter condition should include "&& can_create_pseudo_p ()" to make sure it's ran before register allocation. I think this will reduce the change set a bit and make the intention of the patch a bit clearer. > +/* { dg-final { scan-assembler > "_f_loop1_rshift:.*mov\.l\\t(\.L\[0-9\]+),(r\[0-9\]+).*sts.l\\tpr,@-r15.*(\.L\[0-9\]+):.*jsr\\t@\\2.*bf\.s\\t\\3.*\\1:\\n\\t\.long\\t___ashiftrt_r4_6.*_f_loop2_rshift:" > { target { ! has_dyn_shift } } } } */ Can you try to somehow write this in a simpler way? Maybe omit some of the register number matches, as they don't matter etc.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #84 from Alexander Klepikov --- I've forgot to say that first I ran all tests with SH specific loop optimization enabled when condition 'optimize && flag_move_loop_invariants' is true. And only then I ran all tests with final (at the moment) version of patch once more.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #83 from Alexander Klepikov --- Created attachment 55367 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55367=edit Collapsed libcall and additional loop move invariants patch v3 I digged other targets and I found that cfun->machine can point to user-defined structure. So I implemented per-function data through it. Loop optimizer now runs when -O2 or higher is specified and only for functions where ashrsi3 libcall was expanded. I also ran all tests I did before a few times.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #82 from Alexander Klepikov --- I have read the docs and other targets' code, and the puzzle finally came together. A struct with additional current function context is a really good idea! I'll implement it in a couple of days.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #81 from Alexander Klepikov --- (In reply to Oleg Endo from comment #78) > The compiler processes one function at a time, all passes at once. It > doesn't mix passes of different functions. So I think using global variable > in sh.cc + override 'set_current_function' should get the job done. When > the insn split code is executed, just set the global flag in the SH specific > function context. I made a proof of concept using a global variable only. It is defined in sh.cc and it is defined as extern in sh_loop.cc. It is set when splitting is done and it is cleared when sh_loop_done is called, i.e. at the end of loop optimization. Do we really need 'set_current_function'? Because so far it seems a little excessive.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #80 from Oleg Endo --- (In reply to Alexander Klepikov from comment #79) > > I mean that if a user run GCC with -O1 and we don't do SH specific loop move > invariants pass at -O1, a user could look at binary (or at .S file) and find > that not all invariants are moved out of the loops, because library > addresses are hoisted after split1 pass. That would confuse a user even if > RTL dump is generated, and he can decide that it's a bug into loop2 pass. I > suggest to generate sh_loop dumps if RTL dump is requested and place there a > message that sh_loop hoisting is not done due to optimization level setting. I don't think users would get surprised or anything by lack of certain optimization. There is no official set of guaranteed optimizations performed at a particular -O level. It's OK to output a message into the RTL dump of course. But everything else seems a bit overthinking the issue. Actually the hoisting might not be always done. E.g. when register pressure in a loop is high, it might be better to not hoist the function address and keep it inside of the loop to reduce register pressure. But I'm not sure the loop optimization passes are smart enough to do that (yet). > > I checked several cases and they are because ld cannot link little endian > binary. When I run failed command taken from log file, it always fails for > linking little endian binary. But sometimes logs say that executing of > little endian binary is successful. I'm at a loss - how is that even > possible? Ugh, sounds weird. Sometimes a particular binutils version is also no good. Have you tried using an older binutils? Maybe it's more stable?
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #79 from Alexander Klepikov --- (In reply to Oleg Endo from comment #78) > (In reply to Alexander Klepikov from comment #77) > > > It'd be good if the newly added passes are ran only with -O2 or higher. > > > > This can be confusing to users when they discover that not all invariants > > are moved out of loops. Then we should inform them about that at least. > > I don't think the compiler reports any optimizations not done to the user at > lower optimization levels? It's normal not to do certain optimizations at a > lower level, that's why we have -O0 -O1 -O2 ... or do you mean something > else by that? > I mean that if a user run GCC with -O1 and we don't do SH specific loop move invariants pass at -O1, a user could look at binary (or at .S file) and find that not all invariants are moved out of the loops, because library addresses are hoisted after split1 pass. That would confuse a user even if RTL dump is generated, and he can decide that it's a bug into loop2 pass. I suggest to generate sh_loop dumps if RTL dump is requested and place there a message that sh_loop hoisting is not done due to optimization level setting. But, actually, it's not that important at the moment because I'm aiming to do the second option - run sh_loop only when it's potentially needed. > The compiler processes one function at a time, all passes at once. That's what I missed! Thank you for clarification! > > I see some strange new exec fails only at testsuite logs. Right now I'm > > trying to find the cause. > > What's the configure line of your GCC build? ../gcc-13.1.0/configure --target=sh-elf --with-endian=big,little --disable-bootstrap --enable-languages=c,c++ --disable-nls --with-gnu-as --with-gnu-ld --prefix=/usr/local/sh-toolchain/ --without-newlib --without-headers I checked several cases and they are because ld cannot link little endian binary. When I run failed command taken from log file, it always fails for linking little endian binary. But sometimes logs say that executing of little endian binary is successful. I'm at a loss - how is that even possible?
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #78 from Oleg Endo --- (In reply to Alexander Klepikov from comment #77) > > It'd be good if the newly added passes are ran only with -O2 or higher. > > This can be confusing to users when they discover that not all invariants > are moved out of loops. Then we should inform them about that at least. I don't think the compiler reports any optimizations not done to the user at lower optimization levels? It's normal not to do certain optimizations at a lower level, that's why we have -O0 -O1 -O2 ... or do you mean something else by that? > I'm thinking about this for some time. We know that we should potentially > run additional loop optimization pass when we're splitting libcall. I did > not find the way to know in what function we are splitting yet. The compiler processes one function at a time, all passes at once. It doesn't mix passes of different functions. So I think using global variable in sh.cc + override 'set_current_function' should get the job done. When the insn split code is executed, just set the global flag in the SH specific function context. > I see some strange new exec fails only at testsuite logs. Right now I'm > trying to find the cause. What's the configure line of your GCC build?
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #77 from Alexander Klepikov --- > It'd be good if the newly added passes are ran only with -O2 or higher. This can be confusing to users when they discover that not all invariants are moved out of loops. Then we should inform them about that at least. > Or even better, if we can find a way to enable them only when actually > needed. > E.g. when it's splitting a shift insn that will potentially need the loop > optimizations again, set a flag in the function. I'm thinking about this for some time. We know that we should potentially run additional loop optimization pass when we're splitting libcall. I did not find the way to know in what function we are splitting yet. > However, to get better test coverage, it's better first let the additional > loop passes run all the time to uncover any potential issues. Later the > above can be added as a supplementary optimization. I see some strange new exec fails only at testsuite logs. Right now I'm trying to find the cause.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #76 from Oleg Endo --- (In reply to Alexander Klepikov from comment #75) > I found that patch incorrectly works when '-O0 -fmove-loop-invariants' flags > are set. Stock loop optimization passes do not run when '-O0' is specified > desipte '-fmove-loop-invariants' is set. I'll do the same and recheck again. It'd be good if the newly added passes are ran only with -O2 or higher. Or even better, if we can find a way to enable them only when actually needed. E.g. when it's splitting a shift insn that will potentially need the loop optimizations again, set a flag in the function. One way to do that could be adding SH specific function context class/struct to store some per-function data for the compilation process. Right now we'd have only one entry, that is whether to run the additional passes again or not. This per-function 'SH context' can be stored as a global variable in sh.cc. Then override 'set_current_function' in the backend to reset the SH specific per-function context. However, to get better test coverage, it's better first let the additional loop passes run all the time to uncover any potential issues. Later the above can be added as a supplementary optimization.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #75 from Alexander Klepikov --- I found that patch incorrectly works when '-O0 -fmove-loop-invariants' flags are set. Stock loop optimization passes do not run when '-O0' is specified desipte '-fmove-loop-invariants' is set. I'll do the same and recheck again.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 Alexander Klepikov changed: What|Removed |Added Attachment #55284|0 |1 is obsolete|| --- Comment #74 from Alexander Klepikov --- Created attachment 55321 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55321=edit Collapsed libcall and additional loop move invariants patch v2
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #73 from Alexander Klepikov --- I had to check everything a few more times because I found a bug in my patch that caused the long shifts to expand incorrectly. I added a test that checks correct shifts expansion in this regard. I also managed to make a test that verifies hoisting. It can be done with 'scan-assembler' command and regexp. If you're interested, there's more powerful command 'check-function-bodies', but it looks like it works only when C++ cross compiler is built, so I decided to leave the regexp.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #72 from Oleg Endo --- (In reply to Alexander Klepikov from comment #71) > > > * Do we really need to add that new source file sh_loop.cc with the wrapper > > classes? Can't we just instantiate the passes that are needed in-place when > > they are registered? > > Unfortunately, no. > [...] That's too bad. Thanks for digging. I haven't checked myself recently, I thought this infrastructure has been improved a little bit, but seems not so. Perhaps SH is the only backend that wants to do this kind of thing. Let's leave it at that for now. Later, after everything else has been cleared out we can ask other GCC developers' opinion on this. > Yes, of course! I'll add some 'tst #imm,0' presence tests. That'd be good. > Should I add hoist test? I do not yet understand how hoisting check should be > performed, but I hope to find examples. Also not sure how to do that at the moment. Maybe scanning RTL dumps of the new extra passes? Or perhaps just a simple example as a start: void test_func (int* a_ptr, int* b_ptr, unsigned int count) { for (unsigned int i = 0; i < count; ++i) a_ptr[i] >> 5; for (unsigned int i = 0; i < count; ++i) b_ptr[i] >> 5; } ... and then just scan the asm output and check against the expected number of insns. In this case, mov.l insn to load the function call address. > Maybe some other tests? If I'm missing something, please tell me. Right now not that I can think of. > And as I understand, I should name test file something like > pr54089-next-free-number.c, right? Yes, that should be fine.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #71 from Alexander Klepikov --- > There are some formatting nits in your patch, please check again: Thanks for pointing that out, I'll check again. > But more interestingly: > * Do we really need to add that new source file sh_loop.cc with the wrapper > classes? Can't we just instantiate the passes that are needed in-place when > they are registered? Unfortunately, no. First of all, classes are defined in loop-init.cc and thus cannot be accessed directly. There are wrappers called 'make_pass_rtl_blah_bla_blah' which create class instances, but. Class .name field is hardcoded, there's no option to set it when initializing, and pass manager checks if new pass name differs from any already existing. Here's the code from passes.cc: if (pass->type == new_pass_info->pass->type && pass->name && !strcmp (pass->name, new_pass_info->reference_pass_name) //Name check ... Moreover, pass_rtl_loop_init::execute method calls a function that do the following check: gcc_assert (current_ir_type () == IR_RTL_CFGLAYOUT); And in our pass (that goes after split1 pass), current_ir_type()!=IR_RTL_CFGLAYOUT, so we will get ICE. And even if we could inherit the class, 'execute' method is final. I'm not so good at C++, so may be there's another way. I know that the less code is the better, but I'm afraid, not this time. > > * Can you add some tests to the SH specific tests? Yes, of course! I'll add some 'tst #imm,0' presence tests. Should I add hoist test? I do not yet understand how hoisting check should be performed, but I hope to find examples. Maybe some other tests? If I'm missing something, please tell me. And as I understand, I should name test file something like pr54089-next-free-number.c, right? Thank you.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #70 from Oleg Endo --- (In reply to Alexander Klepikov from comment #69) > Created attachment 55284 [details] > Collapsed libcall and additional loop move invariants patch Thanks for sharing the patch. I also don't have the GCC SH test environment setup at the moment, so it will take me some time to get up to speed on that. There are some formatting nits in your patch, please check again: * don't add / remove empty lines for no reason * '{' goes on new line (follow surrounding code style) * in comments: two spaces after the period * closing ')' and ']' in the RTL should go on the same line (follow surrounding style) But more interestingly: * Do we really need to add that new source file sh_loop.cc with the wrapper classes? Can't we just instantiate the passes that are needed in-place when they are registered? * Can you add some tests to the SH specific tests?
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #69 from Alexander Klepikov --- Created attachment 55284 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55284=edit Collapsed libcall and additional loop move invariants patch
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #68 from Alexander Klepikov --- OK, I finished my patch. Let me remind you briefly, what it does. First, it does not emit library call for ashrsi3 during expand pass. Instead it emits intermediate 'collapsed' libcall insn which is expanded later at split1 pass. Consecutive right arithmetic shifts can be catched at combine pass and converted to 'collapsed' libcall insn. Then 'collapsed' insn splits whether to short sequnce of individual right shifts or to __ashiftrt_r4_N library call during split1 pass. Finally, loop move invariants only pass executed right after split1 pass to hoist library addresses that potentially were emitted during split pass. I ran GCC testsuite and there were 1 new failed test that actually turned out to be an optimization, and 3 tests that failed on clean GCC and passed on patched. I could not set up my environment to test all little endian exec tests, so for those I don't have results. Also I successfully compiled (but not linked due to not fully set environment) Linux kenel 6.1. My homemade performance tests (I timed linux kernel build) showed a slowdown of less then 3%. To reduce slowdown it would be good to run loop move invariants pass only in functions that have expanded libcall, but to do so it is necessary to mark such funcnion when emitting insn and I'm not so sure if insn knows about in what function it is located.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #67 from Oleg Endo --- (In reply to Alexander Klepikov from comment #66) > (In reply to Alexander Klepikov from comment #65) > > > I'm thinking of something else. > > > > During kernel compile I got few internal errors in different passes. That > > is, additional loop optimization pass patch is no good at all. > > I am sorry, I am, as always, panicking ahead of time. I think I found what's > wrong and do additional tests. Don't worry. I know what you're going through. Been there myself ;) Take your time.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #66 from Alexander Klepikov --- (In reply to Alexander Klepikov from comment #65) > > I'm thinking of something else. > > During kernel compile I got few internal errors in different passes. That > is, additional loop optimization pass patch is no good at all. I am sorry, I am, as always, panicking ahead of time. I think I found what's wrong and do additional tests.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #65 from Alexander Klepikov --- > I'm thinking of something else. During kernel compile I got few internal errors in different passes. That is, additional loop optimization pass patch is no good at all.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #64 from Alexander Klepikov --- > We have to consider that SH is also a linux target and it's also used to > build larger applications (and GCC itself ...). It'd be good to not regress > too much in this regard. One way to check it is the CSiBE test set. I can > help testing your patch later. I agree with you. Thank you!
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #63 from Oleg Endo --- (In reply to Alexander Klepikov from comment #62) > > My project is small and it compiles in under 1 second on both clean and > patched GCC. sh.exp test suite mean run time is 117s on clean and 119s on > patched. I did 1 warm-up run and then 3 main one-threaded runs for each > task. I'm thinking of something else. We have to consider that SH is also a linux target and it's also used to build larger applications (and GCC itself ...). It'd be good to not regress too much in this regard. One way to check it is the CSiBE test set. I can help testing your patch later. > > Implementing features not supported by hardware will always be a tradeoff. I'd say it's generally about how to find the best choice of instructions/sequences. With GCC's "waterfall" optimization it's impossible to find the best solution because it doesn't have a way to compare the total cost of one solution vs. another, for example. We have to work around that ;)
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #62 from Alexander Klepikov --- > I'm a bit concerned about the increased compile time. Have you observed > anything (negative) in this regard? My project is small and it compiles in under 1 second on both clean and patched GCC. sh.exp test suite mean run time is 117s on clean and 119s on patched. I did 1 warm-up run and then 3 main one-threaded runs for each task. I'm thinking of something else. > > Loop, hoist, constant propagation etc (+ all the good stuff) optimizations > are done before insn combine / split1. We could add a simple SH specific > pass that goes over the RTL and does stuff to shifts before those other > optimizations. That's a good idea! > However, it might miss insn combine opportunities later on. Implementing features not supported by hardware will always be a tradeoff.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #61 from Oleg Endo --- (In reply to Alexander Klepikov from comment #60) > > Maybe it's easier to add some shift specific passes. > > Well, I couldn't think of anything better and ported the loop optimization > pass. More precisely, all loop optimization passes, because they are tied to > each other. They run after split1 pass. And it worked! > > I want to beleive that another loop optimization pass won't break anything > because loops are already optimized. Theoretically it should't ... > > If that's redundant, I thought of expanding libcall if it's inside a loop > before loop optimization passes. I'm a bit concerned about the increased compile time. Have you observed anything (negative) in this regard? Loop, hoist, constant propagation etc (+ all the good stuff) optimizations are done before insn combine / split1. We could add a simple SH specific pass that goes over the RTL and does stuff to shifts before those other optimizations. However, it might miss insn combine opportunities later on. I'm thinking about your tst #imm,r0 case from before.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #60 from Alexander Klepikov --- > Maybe it's easier to add some shift specific passes. Well, I couldn't think of anything better and ported the loop optimization pass. More precisely, all loop optimization passes, because they are tied to each other. They run after split1 pass. And it worked! I want to beleive that another loop optimization pass won't break anything because loops are already optimized. If that's redundant, I thought of expanding libcall if it's inside a loop before loop optimization passes.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #59 from Oleg Endo --- (In reply to Alexander Klepikov from comment #58) > > Ouch. That's a real problem. Short loops can become slower on about 10%. But > is it possible to detect a loop during expand pass? It looks like basic > blocks have loop depth info, but I still don't now how to access a basic > block. There is 'BLOCK_FOR_INSN' But I'm not sure it will be helpful during the initial expand pass. > I would try that. I think loop optimiztion pass should be repeated after > split pass. Do you know how to do it? I don't know if we can simply repeat the loop optimizations. I think I've tried doing something like that before -- repeating some of the RTL passes, but without any useful results. It could also result in oscillations (pass A does a transformation, pass B undoes it, then pass A would do it again ...). Maybe you can get better results. There are already 2 SH specific passes 'sh_optimize_sett_clrt' and 'sh_treg_combine'. You can look at how they are instantiated and inserted into the RTL passes chain in 'register_sh_passes'. Maybe it's easier to add some shift specific passes.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #58 from Alexander Klepikov --- > Another thing ... the reason why it's desirable to expand into the libcall > earlier is to allow hoisting the function call address outside of loops and > things like that. Ouch. That's a real problem. Short loops can become slower on about 10%. But is it possible to detect a loop during expand pass? It looks like basic blocks have loop depth info, but I still don't now how to access a basic block. > That happens only once at the RTL level during > compilation and that's before the combine pass. Since there are no loops > around optimization passes in the compiler, it's always going to be a > compromise. But it might be OK to repeat a particular optimization pass > only for SH, if it helps anything. I would try that. I think loop optimiztion pass should be repeated after split pass. Do you know how to do it?
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #57 from Oleg Endo --- (In reply to Alexander Klepikov from comment #56) > > > In that test you can see the unnecessary push/pop of PR. This is because > > initially it wanted to expand as a library call, but then your patterns > > decided to change the insns. This can or can't be avoided, depending on the > > case. > > That's an valuable observation, thank you! I found why. Sometimes 'collapsed > libcall' instruction is emitted in combine pass and 'clobber PR' is set > then. I commented this out and checked that file and it seems to be OK. I > will rerun testsuite again to be sure. Another thing ... the reason why it's desirable to expand into the libcall earlier is to allow hoisting the function call address outside of loops and things like that. That happens only once at the RTL level during compilation and that's before the combine pass. Since there are no loops around optimization passes in the compiler, it's always going to be a compromise. But it might be OK to repeat a particular optimization pass only for SH, if it helps anything. > > Now regarding tests that fail on vanilla GCC and pass with patch. It looks > like something was changed in the middle of GCC and it started to produce > other instruction patterns that cannot be catched by existing isns and thus > expanded to library calls. The patch simply fixed it. Yes, that happens every now and then, as folks work on target independent optimizations in the middle-end. SH backend hasn't been keeping up for a while in this regard.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #56 from Alexander Klepikov --- > > Regarding testsuite. There's execute fails, but this is due to lack of > > multilib. I'll rebuild and retest. > > > > There's also fail in pr64345-1.c, in this function: > > [...] > > > > But it looks more like it's not a fail, but an optimization. > > Yeah, that looks like an improvement. There might be some SH specific tests > that scan for particular assembler outputs like that one. Those tests would > need to be adjusted of course. I checked and that one was the only one. > In that test you can see the unnecessary push/pop of PR. This is because > initially it wanted to expand as a library call, but then your patterns > decided to change the insns. This can or can't be avoided, depending on the > case. That's an valuable observation, thank you! I found why. Sometimes 'collapsed libcall' instruction is emitted in combine pass and 'clobber PR' is set then. I commented this out and checked that file and it seems to be OK. I will rerun testsuite again to be sure. Now regarding tests that fail on vanilla GCC and pass with patch. It looks like something was changed in the middle of GCC and it started to produce other instruction patterns that cannot be catched by existing isns and thus expanded to library calls. The patch simply fixed it.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #55 from Oleg Endo --- (In reply to Alexander Klepikov from comment #54) > Regarding testsuite. There's execute fails, but this is due to lack of > multilib. I'll rebuild and retest. > > There's also fail in pr64345-1.c, in this function: > [...] > > But it looks more like it's not a fail, but an optimization. Yeah, that looks like an improvement. There might be some SH specific tests that scan for particular assembler outputs like that one. Those tests would need to be adjusted of course. In that test you can see the unnecessary push/pop of PR. This is because initially it wanted to expand as a library call, but then your patterns decided to change the insns. This can or can't be avoided, depending on the case. > > But also there's tests that pass on patched but fail on clean. I'll take a > closer look on them later after GCC and multilibs rebuild. Yes, there are some (well ... quite a lot actually) tests that will also fail on vanilla GCC on SH. Hence the need to look at the test result delta before/after patch.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #54 from Alexander Klepikov --- Regarding testsuite. There's execute fails, but this is due to lack of multilib. I'll rebuild and retest. There's also fail in pr64345-1.c, in this function: typedef signed char int8_t; int test_int8_t__shift_7_8 (int8_t* x) { return ((*x >> 7) ^ 1) & 1; } Clean GCC: .file "pr64345-1-test.c" .text .text .align 1 .align 2 .global _test_int8_t__shift_7_8 .type _test_int8_t__shift_7_8, @function _test_int8_t__shift_7_8: mov.l .L4,r1 sts.l pr,@-r15 jsr @r1 mov.b @r4,r4 mov r4,r0 tst #1,r0 movtr0 lds.l @r15+,pr rts nop .L5: .align 2 .L4: .long ___ashiftrt_r4_7 .size _test_int8_t__shift_7_8, .-_test_int8_t__shift_7_8 .ident "GCC: (GNU) 13.1.0" Patched GCC: .file "pr64345-1-test.c" .text .text .align 1 .align 2 .global _test_int8_t__shift_7_8 .type _test_int8_t__shift_7_8, @function _test_int8_t__shift_7_8: mov.b @r4,r1 sts.l pr,@-r15 cmp/pz r1 movtr0 lds.l @r15+,pr rts nop .size _test_int8_t__shift_7_8, .-_test_int8_t__shift_7_8 .ident "GCC: (GNU) 13.1.0" But it looks more like it's not a fail, but an optimization. But also there's tests that pass on patched but fail on clean. I'll take a closer look on them later after GCC and multilibs rebuild.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #53 from Oleg Endo --- (In reply to Segher Boessenkool from comment #52) > > There is TARGET_LEGITIMATE_COMBINED_INSN though, which is a workaround for if > you really do not want the instruction combiner to create particular > instruction patterns (but it does nothing to prevent other parts of the > compiler from doing the same!) Thanks for pointing it out. I knew I missed something recent ;)
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #52 from Segher Boessenkool --- (In reply to Alexander Klepikov from comment #50) > But maybe there is a way to exclude particular insn from combine pass? (I > guess not). In general, it is best to let combine just work on everything. It will not replace instructions if the replacement is more expensive, and it will only ever create instruction sequences with the same semantics as what it started with. There is TARGET_LEGITIMATE_COMBINED_INSN though, which is a workaround for if you really do not want the instruction combiner to create particular instruction patterns (but it does nothing to prevent other parts of the compiler from doing the same!)
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #51 from Oleg Endo --- (In reply to Alexander Klepikov from comment #50) > > Ooh, my bad! You are absolutely right. A function is inlined and division is > converted to 4 'shar's which at combine pass are catched by 'define_insn > "ashrsi3_libcall_collapsed"' which is little unexpected to me. Of course, it > is expanded at 'split' pass to lib call. 5 and less right shifts should not > convert to a lib call, but that can be easily corrected. > > But maybe there is a way to exclude particular insn from combine pass? (I > guess not). > I don't think there is a direct way to hide patterns from the combine pass, although I could be wrong, since I haven't been tracking the combine pass development closely for a while. Perhaps you can look at how the left-shifts are done as a reference. > Now concerning GCC testsuite. I ran it in such way: > > make check RUNTESTFLAGS="CFLAGS='$CFLAGS -I/usr/local/sh-elf/include/' > --target_board=sh-sim\{-m2e,-m4\}\{-ml,-mb\}" > > There are lots of failed tests on non-patched GCC. Even if I narrow tests > list to sh.exp, it still fails a lot of times. At last there's nothing in > logs that produce ICE in RTL. I'm not sure I could find a crash due to the > patch at all, even if it were there. To test it, you first use a known good version of GCC without your patch, build the whole toolchain from scratch and run it. Then collect the test results. Then apply your patch, and repeat the process. Collect the test results again. Then compare both test results. If there are any new changes in the test results your patch is hitting something. Usually I'd run the testsuite as follows: make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}" (running make with multiple jobs here is useful, and still it will take some hours)
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #50 from Alexander Klepikov --- > > I've found that my patch catches integer division. In short, it appears to > > work unpredictable. It looks like there's no easy way to catch right shift. > > What do you mean it catches integer division? There could be several places > during compilation, where the compiler will try to convert integer division > to right shifts. But it will not do so unless it's wrong. The patterns you > have added shouldn't match a division operation. Ooh, my bad! You are absolutely right. A function is inlined and division is converted to 4 'shar's which at combine pass are catched by 'define_insn "ashrsi3_libcall_collapsed"' which is little unexpected to me. Of course, it is expanded at 'split' pass to lib call. 5 and less right shifts should not convert to a lib call, but that can be easily corrected. But maybe there is a way to exclude particular insn from combine pass? (I guess not). Here are the files that confused me: $ cat pr49880-3.c /* Check that the option -mdiv=call-table works. */ /* { dg-do link } */ /* { dg-options "-mdiv=call-table" } */ int test00 (int a, int b) { return a / b; } unsigned int test01 (unsigned int a, unsigned b) { return a / b; } int main (int argc, char** argv) { return test00 (argc, 123) + test01 (argc, 123); } Not-patched GCC $ cat pr49880-3.s.clean .file "pr49880-3.c" .text .text .align 1 .align 2 .global _test00 .type _test00, @function _test00: mov.l .L4,r0 sts.l pr,@-r15 jsr @r0 nop lds.l @r15+,pr rts nop .L5: .align 2 .L4: .long ___sdivsi3 .size _test00, .-_test00 .align 1 .align 2 .global _test01 .type _test01, @function _test01: mov.l .L8,r0 sts.l pr,@-r15 jsr @r0 nop lds.l @r15+,pr rts nop .L9: .align 2 .L8: .long ___udivsi3 .size _test01, .-_test01 .section.text.startup,"ax",@progbits .align 1 .align 2 .global _main .type _main, @function _main: mov.l .L11,r1 dmuls.l r1,r4 sts mach,r0 sharr0 sharr0 sharr0 sharr0 mov r4,r1 shllr1 subcr1,r1 sub r1,r0 mov.l .L12,r1 dmulu.l r1,r4 sts mach,r1 sub r1,r4 shlrr4 add r4,r1 shlr2 r1 shlr2 r1 shlr2 r1 rts add r1,r0 .L13: .align 2 .L11: .long 558694933 .L12: .long 174592167 .size _main, .-_main .ident "GCC: (GNU) 13.1.0" Patched GCC $ cat pr49880-3.s .file "pr49880-3.c" .text .text .align 1 .align 2 .global _test00 .type _test00, @function _test00: mov.l .L4,r0 sts.l pr,@-r15 jsr @r0 nop lds.l @r15+,pr rts nop .L5: .align 2 .L4: .long ___sdivsi3 .size _test00, .-_test00 .align 1 .align 2 .global _test01 .type _test01, @function _test01: mov.l .L8,r0 sts.l pr,@-r15 jsr @r0 nop lds.l @r15+,pr rts nop .L9: .align 2 .L8: .long ___udivsi3 .size _test01, .-_test01 .section.text.startup,"ax",@progbits .align 1 .align 2 .global _main .type _main, @function _main: mov.l .L12,r2 mov r4,r1 sts.l pr,@-r15 dmuls.l r2,r4 mov.l .L13,r2 jsr @r2 sts mach,r4 mov r1,r2 shllr2 subcr2,r2 mov r4,r0 sub r2,r0 mov.l .L14,r2 dmulu.l r2,r1 sts mach,r2 sub r2,r1 shlrr1 add r1,r2 shlr2 r2 shlr2 r2 shlr2 r2 add r2,r0 lds.l @r15+,pr rts nop .L15: .align 2 .L12: .long 558694933 .L13: .long ___ashiftrt_r4_4 .L14: .long 174592167 .size _main, .-_main .ident "GCC: (GNU) 13.1.0" Now concerning GCC testsuite. I ran it in such way: make check RUNTESTFLAGS="CFLAGS='$CFLAGS -I/usr/local/sh-elf/include/' --target_board=sh-sim\{-m2e,-m4\}\{-ml,-mb\}" There are lots of failed tests on non-patched GCC. Even if I narrow tests list to sh.exp, it still fails a lot of times. At last there's nothing in logs that produce ICE in RTL. I'm not sure I could find a crash due to the patch at all, even if it were there. And finally, 'parallel' appears to be unnecessary, thank you for pointing.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #49 from Oleg Endo --- (In reply to Alexander Klepikov from comment #48) > Let's continue discussion we started here: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263 > > I've found that my patch catches integer division. In short, it appears to > work unpredictable. It looks like there's no easy way to catch right shift. What do you mean it catches integer division? There could be several places during compilation, where the compiler will try to convert integer division to right shifts. But it will not do so unless it's wrong. The patterns you have added shouldn't match a division operation.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 Alexander Klepikov changed: What|Removed |Added CC||klepikov.alex+bugs at gmail dot co ||m --- Comment #48 from Alexander Klepikov --- Let's continue discussion we started here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263 I've found that my patch catches integer division. In short, it appears to work unpredictable. It looks like there's no easy way to catch right shift.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #47 from Oleg Endo --- Thanks for the reminder. The issue hasn't been fully resolved. Please leave this PR open. This will be helpful when getting back to the issue.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 Martin Liška changed: What|Removed |Added CC||marxin at gcc dot gnu.org --- Comment #46 from Martin Liška --- Can the bug be marked as resolved?
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #45 from Oleg Endo --- Author: olegendo Date: Fri May 6 09:41:57 2016 New Revision: 235950 URL: https://gcc.gnu.org/viewcvs?rev=235950=gcc=rev Log: gcc/ PR target/54089 * config/sh/sh.md (*rotcr): Add another variant. gcc/testsuite/ PR target/54089 * gcc.target/sh/pr54089-1.c (test_24): Add new sub-test. Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/sh.md trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/sh/pr54089-1.c
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #44 from Oleg Endo --- Author: olegendo Date: Mon Feb 22 13:33:31 2016 New Revision: 233601 URL: https://gcc.gnu.org/viewcvs?rev=233601=gcc=rev Log: gcc/ PR target/69806 PR target/54089 * config/sh/sh.c (sh_lshrsi_clobbers_t_reg_p, sh_dynamicalize_shift_p): Handle negative shift counts. * config/sh/sh.md (ashlsi3, lshrsi3_n, lshrsi3_n_clobbers_t): Don't use force_reg on the shift constant. (lshrsi3): Likewise. Expand into lshrsi3_n* instead of lshrsi3_d. (lshrsi3_d): Handle negative shift counts. gcc/testsuite/ PR target/69806 PR target/54089 * gcc.target/sh/pr54089-10.c: New. Added: trunk/gcc/testsuite/gcc.target/sh/pr54089-10.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/sh.c trunk/gcc/config/sh/sh.md trunk/gcc/testsuite/ChangeLog
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 Segher Boessenkool segher at gcc dot gnu.org changed: What|Removed |Added CC||segher at gcc dot gnu.org --- Comment #41 from Segher Boessenkool segher at gcc dot gnu.org --- On the other hand, there are quite a few archs (more modern ones mostly) where the Linux kernel uses the real libgcc. And the SH kernel maintainers won't complain anyway (SH is orphaned).
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #40 from Oleg Endo olegendo at gcc dot gnu.org --- (In reply to Rich Felker from comment #39) Oleg, do you have rights under your copyright assignment agreement to dual-license the libgcc part of the patch under GPLv2 so it could be included in the kernel, and if so, would you be willing to do so? If I understand it correctly, it should be OK. Please contact me directly about this. If possible, please also add the other people who want this to the CC list. I agree it would be cleaner for the kernel not to duplicate libgcc code, but getting in a simple patch to update the code that's there would be a lot easier than the build-architectural change of using libgcc itself in the kernel, which is an issue I'd really rather not fight with the kernel developers on if they disagree. :-) A quick search for libgcc linux kernel tells the story about this. Unfortunately, you are right.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #42 from Oleg Endo olegendo at gcc dot gnu.org --- (In reply to Segher Boessenkool from comment #41) On the other hand, there are quite a few archs (more modern ones mostly) where the Linux kernel uses the real libgcc. If this is the case (could you please name a few of those archs just for reference?) ... And the SH kernel maintainers won't complain anyway (SH is orphaned). ... then this issue now is a very good opportunity to fix the actual problem (remove libgcc code from the kernel and link it against the proper libgcc).
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #43 from Segher Boessenkool segher at gcc dot gnu.org --- The archs that already use the real libgcc are arc, cris, hexagon, m32r, nios2, openrisc, parisc, xtensa. grep -w LIBGCC ~/src/kernel/arch/*/Makefile
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #39 from Rich Felker bugdal at aerifal dot cx --- Oleg, do you have rights under your copyright assignment agreement to dual-license the libgcc part of the patch under GPLv2 so it could be included in the kernel, and if so, would you be willing to do so? I agree it would be cleaner for the kernel not to duplicate libgcc code, but getting in a simple patch to update the code that's there would be a lot easier than the build-architectural change of using libgcc itself in the kernel, which is an issue I'd really rather not fight with the kernel developers on if they disagree. :-)
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 Rich Felker bugdal at aerifal dot cx changed: What|Removed |Added CC||bugdal at aerifal dot cx --- Comment #33 from Rich Felker bugdal at aerifal dot cx --- The commit in comment 16 broke Linux (the kernel) and nobody seems to have noticed since 2012...
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #34 from Oleg Endo olegendo at gcc dot gnu.org --- Thanks for spotting and pin pointing. Do you have a bit more info, maybe a reduced test case?
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #35 from Rich Felker bugdal at aerifal dot cx --- No, but the cause is simple -- the kernel doesn't use libgcc but defines its own versions of these functions, and has the old ones but not the new ones. I tried just removing the kernel's copies and using libgcc.a but it produced a non-bootable kernel and I didn't test further; my guess would be some relocations in the libgcc.a versions are incompatible with the way the kernel is built.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #36 from Oleg Endo olegendo at gcc dot gnu.org --- (In reply to Rich Felker from comment #35) No, but the cause is simple -- the kernel doesn't use libgcc but defines its own versions of these functions, and has the old ones but not the new ones. Yeah, that's what happens when other SW relies on the implementation details. I don't think it's a problem of GCC, but rather the kernel. You can try to copy the new functions from the GCC source to the kernel source.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #37 from Rich Felker bugdal at aerifal dot cx --- I expect this is going to be problematic from a license perspective unless they can be licensed under GPLv2...
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #38 from Oleg Endo olegendo at gcc dot gnu.org --- (In reply to Rich Felker from comment #33) The commit in comment 16 broke Linux (the kernel) and nobody seems to have noticed since 2012... There aren't that many (publicly known) users of Linux on SH2 or SH2A. If they don't speak up and stay in the underground, nobody will notice :) (In reply to Rich Felker from comment #37) I expect this is going to be problematic from a license perspective unless they can be licensed under GPLv2... In this case, adjust the functions in the kernel to match the interface of the compiler. However, maybe it's better to figure out how to have libgcc in the kernel to avoid this kind of situation in the future.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #32 from Oleg Endo olegendo at gcc dot gnu.org --- Author: olegendo Date: Tue Dec 16 21:37:42 2014 New Revision: 218795 URL: https://gcc.gnu.org/viewcvs?rev=218795root=gccview=rev Log: gcc/testsuite/ PR target/54089 * gcc.target/sh/pr54089-1.c: Change optimization level from -O1 to -O2. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/sh/pr54089-1.c
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #31 from Oleg Endo olegendo at gcc dot gnu.org --- Author: olegendo Date: Fri May 16 23:12:19 2014 New Revision: 210537 URL: http://gcc.gnu.org/viewcvs?rev=210537root=gccview=rev Log: gcc/ PR target/54089 * config/sh/predicates.md (negt_reg_shl31_operand): Match additional patterns. * config/sh/sh.md (*negt_msb): Merge SH2A and non-SH2A variants. Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/predicates.md trunk/gcc/config/sh/sh.md
[Bug target/54089] [SH] Refactor shift patterns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #30 from Oleg Endo olegendo at gcc dot gnu.org --- A case from libmpeg2/slice.c: mov.b @(1,r10),r0// load of shift amount shldr7,r6 add #1,r6 extu.b r0,r0 // zero extend shift amount shldr0,r1 // r1 = r0 mov r1,r0 The zero extension of the shift amount variable could be omitted because shift amounts 31 are undefined behavior. If the shift amount is in the valid range of 0...31 the zero extension won't do anything. A reduced test case: int test33 (unsigned char* x, int y) { return y x[4]; } results in: mov.b @(4,r4),r0 extu.b r0,r0 shldr0,r5 rts mov r5,r0
[Bug target/54089] [SH] Refactor shift patterns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #29 from Oleg Endo olegendo at gcc dot gnu.org 2013-02-16 11:36:37 UTC --- Another case taken from CSiBE / bzip2, where reusing the intermediate shift result would be better: void uInt64_from_UInt32s ( UInt64* n, UInt32 lo32, UInt32 hi32 ) { n-b[7] = (UChar)((hi32 24) 0xFF); n-b[6] = (UChar)((hi32 16) 0xFF); n-b[5] = (UChar)((hi32 8) 0xFF); n-b[4] = (UChar) (hi32 0xFF); /* n-b[3] = (UChar)((lo32 24) 0xFF); n-b[2] = (UChar)((lo32 16) 0xFF); n-b[1] = (UChar)((lo32 8) 0xFF); n-b[0] = (UChar) (lo32 0xFF); */ } on rev 196091 with -O2 -m4 compiles to: mov r6,r0 shlr16 r0 shlr8 r0 mov.b r0,@(7,r4) mov r6,r0 shlr16 r0 mov.b r0,@(6,r4) mov r6,r0 shlr8 r0 mov.b r0,@(5,r4) mov r6,r0 mov.b r0,@(4,r4) which would be better as: mov r6,r0 mov.b r0,@(4,r4) shlr8 r0 mov.b r0,@(5,r4) shlr8 r0 mov.b r0,@(6,r4) shlr8 r0 mov.b r0,@(7,r4) this would require reordering of the mem stores, which should be OK to do if the mem is not volatile. Reordering the stores manually: void uInt64_from_UInt32s ( UInt64* n, UInt32 lo32, UInt32 hi32 ) { n-b[4] = (UChar) (hi32 0xFF); n-b[5] = (UChar)((hi32 8) 0xFF); n-b[6] = (UChar)((hi32 16) 0xFF); n-b[7] = (UChar)((hi32 24) 0xFF); } still results in: mov r6,r0 mov.b r0,@(4,r4) mov r6,r0 shlr8 r0 mov.b r0,@(5,r4) mov r6,r0 shlr16 r0 mov.b r0,@(6,r4) mov r6,r0 shlr16 r0 shlr8 r0 mov.b r0,@(7,r4) ... at least this case should be handled, I think.
[Bug target/54089] [SH] Refactor shift patterns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #28 from Oleg Endo olegendo at gcc dot gnu.org 2012-11-13 01:13:55 UTC --- (In reply to comment #27) FWIW, while the ARC600/700 have a full set up static and dynamic shifts, the ARC 601 with the swap option has similar issues with shifts that should be stitched together from multiple instructions, and preferences of unsigned over signed shifts. Ah, good to know that SH is not the only case where it would make sense. I've created another PR for this issue: PR 55306
[Bug target/54089] [SH] Refactor shift patterns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #26 from Oleg Endo olegendo at gcc dot gnu.org 2012-11-09 10:48:17 UTC --- (In reply to comment #25) Maybe the better solution would indeed be to add a arith - logical shift conversion pass before combine, or try to convert arith shifts in the split pass after combine by looking at the data flow of the shifted values. Another idea would be to extend the combine pass, so that whenever it internally converts arith - logical shifts, it always checks the costs of the shift op, and if the logical shift is cheaper, always split out the logical shift.
[Bug target/54089] [SH] Refactor shift patterns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #27 from Jorn Wolfgang Rennecke amylaar at gcc dot gnu.org 2012-11-09 13:29:10 UTC --- (In reply to comment #26) (In reply to comment #25) Maybe the better solution would indeed be to add a arith - logical shift conversion pass before combine, or try to convert arith shifts in the split pass after combine by looking at the data flow of the shifted values. FWIW, while the ARC600/700 have a full set up static and dynamic shifts, the ARC 601 with the swap option has similar issues with shifts that should be stitched together from multiple instructions, and preferences of unsigned over signed shifts. Another idea would be to extend the combine pass, so that whenever it internally converts arith - logical shifts, it always checks the costs of the shift op, and if the logical shift is cheaper, always split out the logical shift. Indeed, combine's insistence that a computation has just one canonical form, which depends on how much it known about the input/output data, is in want of some correction. We got define_split for things that should become multiple instructions, but nothing for things that should just be differently-shaped instructions, and there are no predicates available to get to know about the (non)zero / signbits that combine knows about. E.g. the following peephole in arc.md is only necessary because combine does some unwanted 'simplifications' with known-zero bits: ;; - ;; Pattern 1 : r0 = r1 {i} ;; r3 = r4/INT + r0 ;;and commutative ;; || ;; \/ ;; add{i} r3,r4/INT,r1 ;; - ;; ??? This should be covered by combine, alas, at times combine gets ;; too clever for it's own good: when the shifted input is known to be ;; either 0 or 1, the operation will be made into an if-then-else, and ;; thus fail to match the add_n pattern. Example: _mktm_r, line 85 in ;; newlib/libc/time/mktm_r.c . (define_peephole2 [(set (match_operand:SI 0 dest_reg_operand ) (ashift:SI (match_operand:SI 1 register_operand ) (match_operand:SI 2 const_int_operand ))) (set (match_operand:SI 3 dest_reg_operand ) (plus:SI (match_operand:SI 4 nonmemory_operand ) (match_operand:SI 5 nonmemory_operand )))] (INTVAL (operands[2]) == 1 || INTVAL (operands[2]) == 2 || INTVAL (operands[2]) == 3) (true_regnum (operands[4]) == true_regnum (operands[0]) || true_regnum (operands[5]) == true_regnum (operands[0])) (peep2_reg_dead_p (2, operands[0]) || (true_regnum (operands[3]) == true_regnum (operands[0]))) ;; the preparation statements take care to put proper operand in operands[4] ;; operands[4] will always contain the correct operand. This is added to satisfy commutativity [(set (match_dup 3) (plus:SI (mult:SI (match_dup 1) (match_dup 2)) (match_dup 4)))] if (true_regnum (operands[4]) == true_regnum (operands[0])) operands[4] = operands[5]; operands[2] = GEN_INT (1 INTVAL (operands[2])); )
[Bug target/54089] [SH] Refactor shift patterns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #25 from Oleg Endo olegendo at gcc dot gnu.org 2012-11-07 23:31:20 UTC --- Created attachment 28633 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=28633 Arithmetic right shift rework 2 This could be an alternative approach for the arith right shifts. Applied to rev 193240: CSiBE -m4-single -O2 -ml -mpretend-cmove sum: 3159747 - 3160047+300 / +0.009494 % CSiBE -m2 -O2 -ml sum: 3314907 - 3313319-1588 / -0.047905 % The idea here is to let arith shifts initially go through a slightly more complex pattern 'ashrsi3_bridge_const', which allows the lib function address or shad constant to be CSE'ed early on. As mentioned in comment #18, the arith - logical shift conversion is done by combine only under certain circumstances. After some trial and error I arrived at the 'ashrsi3_bridge_const' pattern, which makes combine convert most of the arith shifts into logical shifts. In some cases it also finds more opportunities, but regressions like this one still remain: void MC_put_xy_16_c (uint8_t * dest, const uint8_t * ref, const int stride, int height) { // must see a shlr instead of 2x shar. // somehow the loop prevents this... do { dest[0] = (((ref[0]+ref[0 +1]+(ref+stride)[0]+(ref+stride)[0 +1]+2)2)); ref += stride; dest += stride; } while (--height); } In addition to that, some unlucky register allocations appear, which cause unneeded reg-reg copies (it seems something has trouble utilizing commutative insns such as 'add'). The other problem is that patterns such as: (define_insn_and_split *lshrsi3 [(set (match_operand:SI 0 arith_reg_dest) (lshiftrt:SI (match_operand:SI 1 arith_reg_operand) (match_operand:SI 2 shift_count_operand))) (clobber (reg:SI T_REG)) (use (match_operand:SI 3 general_operand)) (use (match_operand:SI 4 general_operand))] TARGET_SH1 can_create_pseudo_p () CONST_INT_P (operands[2]) # 1 [(const_int 0)] { emit_insn (gen_lshrsi3 (operands[0], operands[1], operands[2])); DONE; }) have to be added for combine to convert more arith - logical shifts. In this patch I've added only one such pattern. However, if for example zero_extract patterns are added, variations such as above would be required, too. This looks discomforting to me. Maybe the better solution would indeed be to add a arith - logical shift conversion pass before combine, or try to convert arith shifts in the split pass after combine by looking at the data flow of the shifted values.
[Bug target/54089] [SH] Refactor shift patterns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #24 from Oleg Endo olegendo at gcc dot gnu.org 2012-11-06 11:55:47 UTC --- Author: olegendo Date: Tue Nov 6 11:55:43 2012 New Revision: 193236 URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=193236 Log: PR target/54089 * config/sh/sh.c (and_xor_ior_costs, addsubcosts): Double the costs for ops larger than SImode. * config/sh/sh.md (rotcl, *rotcl): New insns and splits. (ashldi3_k): Convert to insn_and_split and use new rotcl insn. PR target/54089 * gcc.target/sh/pr54089-8.c: New. * gcc.target/sh/pr54089-9.c: New. Added: trunk/gcc/testsuite/gcc.target/sh/pr54089-8.c trunk/gcc/testsuite/gcc.target/sh/pr54089-9.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/sh.c trunk/gcc/config/sh/sh.md trunk/gcc/testsuite/ChangeLog
[Bug target/54089] [SH] Refactor shift patterns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 Jorn Wolfgang Rennecke amylaar at gcc dot gnu.org changed: What|Removed |Added CC||amylaar at gcc dot gnu.org --- Comment #22 from Jorn Wolfgang Rennecke amylaar at gcc dot gnu.org 2012-10-16 10:52:46 UTC --- (In reply to comment #0) The code related to shift patterns in sh.c / sh.md maybe could use some improvements here and there. In some places clobbers and scratch regs could be avoided. There is also a large part that deals with minimizing and-shift/shift-and sequences, but there are no test cases to verify that those actually work. It would also be interesting to see, whether some of the and-shift/shift-and combine code could be reduced due to improvements in the middle-end. Be careful with removing 'useless' clobbers, as they might be needed to facilitate instruction combinations into patterns that have these clobbers.
[Bug target/54089] [SH] Refactor shift patterns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #23 from Oleg Endo olegendo at gcc dot gnu.org 2012-10-16 11:49:14 UTC --- (In reply to comment #22) (In reply to comment #0) The code related to shift patterns in sh.c / sh.md maybe could use some improvements here and there. In some places clobbers and scratch regs could be avoided. There is also a large part that deals with minimizing and-shift/shift-and sequences, but there are no test cases to verify that those actually work. It would also be interesting to see, whether some of the and-shift/shift-and combine code could be reduced due to improvements in the middle-end. Be careful with removing 'useless' clobbers, as they might be needed to facilitate instruction combinations into patterns that have these clobbers. Yeah, I've noticed ;) Logical left/right shifts are now expanded into T-clobbering and non-T-clobbering pattern variations.
[Bug target/54089] [SH] Refactor shift patterns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #21 from Oleg Endo olegendo at gcc dot gnu.org 2012-09-30 18:45:56 UTC --- I've noticed that there seems to be a problem with register allocation related to shift insns. For example in the CSiBE set, I've seen sequences such as mov.w .L342,r2 mov #0,r7 .align 2 .L340: mov r7,r1 add r2,r1 sharr1 mov r1,r3 shll2 r3 mov r3,r0 mov.l @(r0,r5),r3 cmp/gt r4,r3 bf 0f quite often. Not sure why this happens. Maybe because 'gen_shifty_op' (which emits the shift sequence insns) is called after reload and not during the first split pass after combine...
[Bug target/54089] [SH] Refactor shift patterns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #20 from Oleg Endo olegendo at gcc dot gnu.org 2012-09-25 19:06:34 UTC --- Author: olegendo Date: Tue Sep 25 19:06:28 2012 New Revision: 191743 URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=191743 Log: PR target/54089 * config/sh/constraints.md (Jhb): New constraint. * config/sh/predicates.md (negt_reg_shl31_operand): New predicate. * config/sh/sh.md (rotrsi3): New expander. (rotrsi3_1, *rotrsi3_1, *rotlsi3_1): New insns. (rotlsi3, rotlhi3): Use const_int_operand predicate instead of immediate_operand and remove CONST_INT_P checks in expansion code. (*rotcr): Cleanup variable usage. Handle preceding nott insn. Add split with swapped operands. (*rotcr_neg_t, *movt_msb, *negt_msb): New insns and splits. PR target/54089 * gcc.target/sh/pr54089-1.c (test_15, test_16, test_17, test_18, test_19, test_20, test_21, test_22, test_23): New functions. * gcc.target/sh/pr54089-4.c: New. * gcc.target/sh/pr54089-5.c: New. * gcc.target/sh/pr54089-6.c: New. * gcc.target/sh/pr54089-7.c: New. Added: trunk/gcc/testsuite/gcc.target/sh/pr54089-4.c trunk/gcc/testsuite/gcc.target/sh/pr54089-5.c trunk/gcc/testsuite/gcc.target/sh/pr54089-6.c trunk/gcc/testsuite/gcc.target/sh/pr54089-7.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/constraints.md trunk/gcc/config/sh/predicates.md trunk/gcc/config/sh/sh.md trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/sh/pr54089-1.c
[Bug target/54089] [SH] Refactor shift patterns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #19 from Oleg Endo olegendo at gcc dot gnu.org 2012-09-19 17:48:31 UTC --- Author: olegendo Date: Wed Sep 19 17:48:25 2012 New Revision: 191490 URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=191490 Log: PR target/54089 * config/sh/predicates.md (arith_reg_or_t_reg_operand): New predicate. * config/sh/sh.md (*rotcr): Use arith_reg_or_t_reg_operand predicate. Handle the case where one of the operands is T_REG. Add new pattern to handle MSB extraction. PR target/54089 * gcc.target/sh/pr54089-1.c (test_11, test_12, test_13, test_14): New functions. Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/predicates.md trunk/gcc/config/sh/sh.md trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/sh/pr54089-1.c
[Bug target/54089] [SH] Refactor shift patterns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #18 from Oleg Endo olegendo at gcc dot gnu.org 2012-09-17 23:29:52 UTC --- Created attachment 28207 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=28207 Arithmetic right shift rework I have tried to apply the same strategies to the arithmetic right shift patterns as for logical right/left shift patterns. What happens then is that combine will see arithmetic right shift patterns (for shift amounts 1) and will fail to convert arithmetic right shifts to logical right shifts where it would actually be possible. This results in bigger code since arithmetic shifts on SH are more expensive than logical shifts. Unfortunately, combine will only do the shift type conversion if it can combine the logical shift with some other insn -- which is not very likely on SH. Thus, the logical shift is discarded. If however, no arithmetic shift pattern is there that matches the shift amount (which is currently the case), combine will split out the logical shift. Not having arithmetic right shift patterns makes it difficult to write combine patterns for things like ... char test (int x) { return x 24; } which ends up as: -m4: mov#-24,r1 shadr1,r4 rts exts.br4,r0 -m2: mov.l.L3,r1 sts.lpr,@-r15 jsr@r1 nop mov lds.l@r15+,pr rts nop .L4: .align 2 .L3: .long___ashiftrt_r4_24 On the other hand, the following is handled almost as expected: unsigned char test (int a) { return a 24; } -m4: movr4,r0 shlr16r0 rts shlr8r0 -m2: mov.l.L3,r1 sts.lpr,@-r15 jsr@r1 nop extu.br4,r0 lds.l@r15+,p rts nop .L4: .align 2 .L3: .long___ashiftrt_r4_24 According to my understanding, to fix those issues something like the attached patch would be required. However, it would also require a separate arithmetic right shift to logical right shift conversion pass.
[Bug target/54089] [SH] Refactor shift patterns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #16 from Oleg Endo olegendo at gcc dot gnu.org 2012-09-10 20:35:30 UTC --- Author: olegendo Date: Mon Sep 10 20:35:25 2012 New Revision: 191161 URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=191161 Log: PR target/54089 * config/sh/sh.h (SH_DYNAMIC_SHIFT_COST): Set always to 1 if dynamic shifts are available. (SHIFT_COUNT_TRUNCATED): Always define to 0. Correct comment. * config/sh/sh.c (ashl_lshr_seq, ext_ashl_lshr_seq): Add comments. * config/sh/predicates.md (shift_count_operand): Allow arith_reg_operand even if TARGET_DYNSHIFT is false. * config/sh/sh.md (ashlsi3, lshrsi3): Expand library call patterns if needed. (ashlsi3_d_call, lshrsi3_d_call): New insns. PR target/54089 * config/sh/lib1funcs.S (ashlsi3): Reimplement as ashlsi3_r0. (lshrsi3): Reimplement as lshrsi3_r0. PR target/54089 * gcc.target/sh/pr54089-3.c: New. Added: trunk/gcc/testsuite/gcc.target/sh/pr54089-3.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/predicates.md trunk/gcc/config/sh/sh.c trunk/gcc/config/sh/sh.h trunk/gcc/config/sh/sh.md trunk/gcc/testsuite/ChangeLog trunk/libgcc/ChangeLog trunk/libgcc/config/sh/lib1funcs.S
[Bug target/54089] [SH] Refactor shift patterns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #17 from Oleg Endo olegendo at gcc dot gnu.org 2012-09-10 21:27:30 UTC --- Created attachment 28163 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=28163 Alternative dropped software dynamic ashlsi3,lshrsi3 patch (In reply to comment #16) Author: olegendo Date: Mon Sep 10 20:35:25 2012 New Revision: 191161 URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=191161 Log: PR target/54089 * config/sh/sh.h (SH_DYNAMIC_SHIFT_COST): Set always to 1 if dynamic shifts are available. (SHIFT_COUNT_TRUNCATED): Always define to 0. Correct comment. * config/sh/sh.c (ashl_lshr_seq, ext_ashl_lshr_seq): Add comments. * config/sh/predicates.md (shift_count_operand): Allow arith_reg_operand even if TARGET_DYNSHIFT is false. * config/sh/sh.md (ashlsi3, lshrsi3): Expand library call patterns if needed. (ashlsi3_d_call, lshrsi3_d_call): New insns. PR target/54089 * config/sh/lib1funcs.S (ashlsi3): Reimplement as ashlsi3_r0. (lshrsi3): Reimplement as lshrsi3_r0. PR target/54089 * gcc.target/sh/pr54089-3.c: New. Added: trunk/gcc/testsuite/gcc.target/sh/pr54089-3.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/predicates.md trunk/gcc/config/sh/sh.c trunk/gcc/config/sh/sh.h trunk/gcc/config/sh/sh.md trunk/gcc/testsuite/ChangeLog trunk/libgcc/ChangeLog trunk/libgcc/config/sh/lib1funcs.S This patch is just for the record/reference. It is a little bit more complex than the committed patch. The main difference is the clobber list and the input/output regs of the shift insns. The committed version seemed more beneficial.
[Bug target/54089] [SH] Refactor shift patterns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #15 from Oleg Endo olegendo at gcc dot gnu.org 2012-08-22 22:52:25 UTC --- Author: olegendo Date: Wed Aug 22 22:52:17 2012 New Revision: 190603 URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=190603 Log: PR target/54089 * config/sh/predicates (p27_rshift_count_operand, not_p27_rshift_count_operand): New predicates. * config/sh/sh.c (sh_ashlsi_clobbers_t_reg_p, sh_lshrsi_clobbers_t_reg_p, sh_dynamicalize_shift_p): Handle special case when shift amount is 31. (gen_ashift): Emit gen_shlr instead of gen_lshrsi3_m. * config/sh/sh.md (ashlsi3_d): Set type to 'dyn_shift' instead of 'arith'. (ashlsi_c): Rename to shll. Adapt calls to gen_ashlsi_c throughout the file. (lshrsi3): Remove clobber from expander. Use shift_count_operand instead of nonmemory_operand predicate for second operand. Add handling of case lshrsi3_n_clobbers_t. (lshrsi3_k): Use p27_rshift_count_operand for second operand. (lshrsi3_d): Make insn_and_split. Split dynamic shift to constant shift sequences if beneficial. (lshrsi3_n): Make insn_and_split. Split constant shift sequence to dynamic shift if beneficial. (lshrsi3_n_clobbers_t): New insn_and_split. (lshrsi3_m): Delete. PR target/54089 * gcc.target/sh/pr54089-2.c: New. Added: trunk/gcc/testsuite/gcc.target/sh/pr54089-2.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/predicates.md trunk/gcc/config/sh/sh.c trunk/gcc/config/sh/sh.md trunk/gcc/testsuite/ChangeLog
[Bug target/54089] [SH] Refactor shift patterns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #14 from Oleg Endo olegendo at gcc dot gnu.org 2012-08-20 21:29:26 UTC --- For the record, I've committed the following under the wrong PR number by accident. Author: olegendo Date: Mon Aug 20 20:54:20 2012 New Revision: 190545 URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=190545 Log: PR target/50489 * config/sh/sh.md (rotcr, *rotcr, shar, shlr): New insns and splits. (ashrdi3_k, lshrdi3_k): Rewrite as insn_and_split. * config/sh/sh.c (sh_lshrsi_clobbers_t_reg_p): New function. * config/sh/sh-protos.h (sh_lshrsi_clobbers_t_reg_p): Declare it. PR target/50489 * gcc.target/sh/pr54089-1.c: New. Added: trunk/gcc/testsuite/gcc.target/sh/pr54089-1.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/sh-protos.h trunk/gcc/config/sh/sh.c trunk/gcc/config/sh/sh.md trunk/gcc/testsuite/ChangeLog
[Bug target/54089] [SH] Refactor shift patterns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #13 from Oleg Endo olegendo at gcc dot gnu.org 2012-08-16 23:13:18 UTC --- Author: olegendo Date: Thu Aug 16 23:13:11 2012 New Revision: 190457 URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=190457 Log: PR target/54089 * config/sh/sh.md (ashlsi3_d): Do not split if it would result in a T_REG clobber. Correct comment. (ashlsi3_n): Correct comment. Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/sh.md
[Bug target/54089] [SH] Refactor shift patterns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #12 from Oleg Endo olegendo at gcc dot gnu.org 2012-08-11 20:25:45 UTC --- (In reply to comment #9) (In reply to comment #8) #define SH_DYNAMIC_SHIFT_COST (TARGET_DYNSHIFT ? 1 : 20) Sounds reasonable. Perhaps some historical reason for the original one, though I don't know why. Could you run SCiBE tests with it? I've checked result-size for -m2a-single -mb -O2 ... In total there's a decrease of 1728 bytes, with a few cases where there are increases. I've briefly checked out why code gets bigger in those cases. As far as I can see at the moment, one reason is because right shifts are dynamicalized (converted to dynamic shift insns) although it's not really beneficial. I think I'll try to brush up the right shifts patterns first, then try again.
[Bug target/54089] [SH] Refactor shift patterns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #10 from Ryan Mansfield rmansfield at qnx dot com 2012-08-10 14:24:55 UTC --- Created attachment 27985 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=27985 preprocessed src ./xgcc -B. -w ~/ice2.i -c -Os /home/ryan/ice2.i: In function 'tg_extent': /home/ryan/ice2.i:81:2: error: unrecognizable insn: } ^ (insn 189 163 190 9 (set (reg:SI 227) (ashift:SI (reg:SI 147 t) (const_int 1 [0x1]))) /home/ryan/ice2.i:75 -1 (nil)) /home/ryan/ice2.i:81:2: internal compiler error: in extract_insn, at recog.c:2125 Please submit a full bug report, with preprocessed source if appropriate. See http://gcc.gnu.org/bugs.html for instructions.
[Bug target/54089] [SH] Refactor shift patterns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #11 from Oleg Endo olegendo at gcc dot gnu.org 2012-08-10 15:40:07 UTC --- (In reply to comment #10) Created attachment 27985 [details] preprocessed src ./xgcc -B. -w ~/ice2.i -c -Os /home/ryan/ice2.i: In function 'tg_extent': /home/ryan/ice2.i:81:2: error: unrecognizable insn: } ^ (insn 189 163 190 9 (set (reg:SI 227) (ashift:SI (reg:SI 147 t) (const_int 1 [0x1]))) /home/ryan/ice2.i:75 -1 (nil)) /home/ryan/ice2.i:81:2: internal compiler error: in extract_insn, at recog.c:2125 Please submit a full bug report, with preprocessed source if appropriate. See http://gcc.gnu.org/bugs.html for instructions. Thanks for the test case! It turns out that this is basically the same ICE as triggered by your test case in PR 39423. I will include this test case in the next patch for PR 39423.
[Bug target/54089] [SH] Refactor shift patterns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 Oleg Endo olegendo at gcc dot gnu.org changed: What|Removed |Added CC||kkojima at gcc dot gnu.org --- Comment #4 from Oleg Endo olegendo at gcc dot gnu.org 2012-08-09 21:54:42 UTC --- I'm currently playing around with the macro SHIFT_COUNT_TRUNCATED (sh.h) and the target hook TARGET_SHIFT_TRUNCATION_MASK (which is not implemented yet). Doing the following on rev 190259 (which is actually wrong): sh.h: -#define SHIFT_COUNT_TRUNCATED (! TARGET_SH3 ! TARGET_SH2A) +#define SHIFT_COUNT_TRUNCATED (1) sh.c: +/* Implement the TARGET_SHIFT_TRUNCATION_MASK target hook. */ + +#undef TARGET_SHIFT_TRUNCATION_MASK +#define TARGET_SHIFT_TRUNCATION_MASK sh_shift_truncation_mask + +static unsigned HOST_WIDE_INT +sh_shift_truncation_mask (enum machine_mode mode) +{ + int bitsize = GET_MODE_BIT_SIZE (mode); + + if (TARGET_SHMEDIA) +return bitsize - 1; + + return MAX (32, bitsize) - 1; +} ... and looking at some CSiBE size results, I see the a couple of weird cases similar to what happens to the set_bh_page function in linux-2.4.23-pre3-testplatform/fs/buffer.c. Without the changes: ... .L592: mov.l .L598,r1 !! mov r9,r3 mov.l @r1,r2 !! r2 = zone_table[0] add #124,r2 mov.l @(36,r2),r1 mov.l @(32,r2),r2 add r10,r1 mov.l r9,@(56,r8) sub r2,r3 mov r3,r2 mov.l .L599,r3 sharr2 sharr2 mul.l r3,r2 mov #12,r3 sts macl,r2 shldr3,r2 add r2,r1 mov.l r1,@(52,r8) With the changes: ... .L592: mov.l @(24,r8),r0 !! mov r8,r3 mov.l .L598,r1 shlr16 r0!! shlr8 r0!! shll2 r0!! mov.l @(r0,r1),r2 !! r2 = zone_table[page-flags ZONE_SHIFT] add #124,r2 mov.l @(36,r2),r1 mov.l @(32,r2),r2 add r10,r1 mov.l r8,@(56,r9) sub r2,r3 mov r3,r2 mov.l .L599,r3 sharr2 sharr2 mul.l r3,r2 mov #12,r3 sts macl,r2 shldr3,r2 add r2,r1 mov.l r1,@(52,r9) It seems that without the (wrong) patch, the index in the inline function 'page_zone' is reduced from 'page-flags ZONE_SHIFT' to '0', and thus the resulting code is wrong?! I've tried to reproduce this in an isolated test case but couldn't get it to do the same - the generated code seems always correct, with and without the changes. I'm confused... Kaz, do you have any idea what could be going on there?
[Bug target/54089] [SH] Refactor shift patterns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #5 from Oleg Endo olegendo at gcc dot gnu.org 2012-08-09 23:17:54 UTC --- OK, I checking out the preprocessed file reveals the following relevant pieces: typedef struct page { struct list_head list; struct address_space *mapping; unsigned long index; struct page *next_hash; atomic_t count; unsigned long flags;// struct list_head lru; struct page **pprev_hash; struct buffer_head * buffers; } mem_map_t; static inline zone_t *page_zone(struct page *page) { return zone_table[page-flags (64 - 8)]; // = page-flags 56 } Depending on whether SHIFT_COUNT_TRUNCATED evals to 1 or 0, the function above returns either zone_table[0] (dyn shifts) or zone_table[page-flags 24] (no dyn shifts). Both should be OK to do, since that's undefined behavior. In this case maybe fixing SHIFT_COUNT_TRUNCATED to '0' would be better, since that would produce faster undefined behavior code ;)
[Bug target/54089] [SH] Refactor shift patterns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #6 from Oleg Endo olegendo at gcc dot gnu.org 2012-08-09 23:27:55 UTC --- Author: olegendo Date: Thu Aug 9 23:27:51 2012 New Revision: 190273 URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=190273 Log: PR target/54089 * config/sh/sh-protos (shift_insns_rtx): Delete. (sh_ashlsi_clobbers_t_reg_p): Add. * config/sh/sh.c (shift_insns, shift_amounts, ext_shift_insns, ext_shift_amounts): Merge arrays of ints to array of structs. Adapt usage of arrays throughout the file. (shift_insns_rtx): Delete unused function. (sh_ashlsi_clobbers_t_reg_p): New function. * config/sh/sh.md (ashlsi3): Emit ashlsi3_n_clobbers_t insn if the final shift sequence will clobber T_REG. (ashlsi3_n): Split only if the final shift sequence will not clobber T_REG. (ashlsi3_n_clobbers_t): New insn_and_split. Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/sh-protos.h trunk/gcc/config/sh/sh.c trunk/gcc/config/sh/sh.md
[Bug target/54089] [SH] Refactor shift patterns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089 --- Comment #7 from Oleg Endo olegendo at gcc dot gnu.org 2012-08-09 23:36:10 UTC --- Kaz, another thing I'm a bit unsure about ... #define SH_DYNAMIC_SHIFT_COST \ (TARGET_HARD_SH4 ? 1 : TARGET_DYNSHIFT ? (optimize_size ? 1 : 2) : 20) Do you have any idea, why this is not simply #define SH_DYNAMIC_SHIFT_COST (TARGET_DYNSHIFT) ? I've checked the HW manuals for SH2A, SH3 and SH4 and all state that dynamic shifts take one cycle, i.e. are as fast/slow as the other constant shift insns...