[Bug target/63321] [SH] Unused T bit result of shll / shlr insns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63321

--- Comment #8 from Oleg Endo ---
(In reply to Oleg Endo from comment #1)
> void test2_2 (unsigned int x, unsigned int* y)
> {
>   unsigned int xx = x >> 1;
>   unsigned int p = x & 1;
>   if (p != 0)
>     foo (xx);
> }

And of course also in the opposite direction:

void test4_2 (unsigned int x)
{
  if (x & (1 << 31))
    ((void(*)(void))(x << 1)) ();
}
[Bug target/63321] [SH] Unused T bit result of shll / shlr insns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63321

--- Comment #7 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Another example:

unsigned int count_trailing_nonzero_bits (unsigned int v, unsigned int c)
{
  c += v & 1;  v >>= 1;
  c += v & 1;  v >>= 1;
  c += v & 1;  v >>= 1;
  c += v & 1;  v >>= 1;
  c += v & 1;  v >>= 1;
  c += v & 1;  v >>= 1;
  c += v & 1;  v >>= 1;
  c += v & 1;  v >>= 1;
  return c;
}

ideally should compile to:

        mov     #0,r1
        shlr    r4
        movt    r0
        shlr    r4
        addc    r1,r0
        shlr    r4
        addc    r1,r0
        shlr    r4
        addc    r1,r0
        shlr    r4
        addc    r1,r0
        shlr    r4
        addc    r1,r0
        shlr    r4
        addc    r1,r0
        shlr    r4
        addc    r1,r0
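To spell out the mapping, one step of the ideal sequence above can be
modelled in plain C like this (an illustrative sketch only; the helper
name count_step is made up and is not GCC source):

/* Illustrative only: one step of the ideal sequence above.  shlr puts
   bit 0 of *v into T and shifts; the following addc r1,r0 with
   r1 == 0 then adds just T into the running count.  */
static unsigned int
count_step (unsigned int *v, unsigned int c)
{
  unsigned int t = *v & 1;   /* shlr: T = bit 0 of v            */
  *v >>= 1;                  /* shlr: v = v >> 1                */
  return c + 0 + t;          /* addc r1,r0 (r1 == 0): c += T    */
}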
[Bug target/63321] [SH] Unused T bit result of shll / shlr insns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63321

--- Comment #6 from Oleg Endo <olegendo at gcc dot gnu.org> ---
The shll/shlr insns effectively perform two operations:

  T = zero_extract single bit 0 / 31 from reg
  reg = reg >> 1 / reg << 1

The other shift insns as in comment #5 perform only a single operation.
Thus those two things should probably be handled slightly differently.

With my current patchset for handling single bit zero_extract (PR 64345),
code like

void test4 (unsigned int x, unsigned int* y)
{
  y[0] = (x >> 0) & 1;
  y[1] = (x >> 1) & 1;
  y[2] = (x >> 2) & 1;
  y[3] = (x >> 3) & 1;
}

results in the following insns right after the combine pass:

(insn 7 4 8 2 (set (reg:SI 171 [ D.1733 ])
        (and:SI (reg/v:SI 169 [ x ])
            (const_int 1 [0x1]))) sh_tmp.cpp:432 115 {*andsi_compact}
     (nil))
...
(insn 10 9 11 2 (parallel [
            (set (reg:SI 173 [ D.1733 ])
                (zero_extract:SI (reg/v:SI 169 [ x ])
                    (const_int 1 [0x1])
                    (const_int 1 [0x1])))
            (clobber (reg:SI 147 t))
        ]) sh_tmp.cpp:433 409 {any_treg_expr_to_reg}
     (expr_list:REG_UNUSED (reg:SI 147 t)
        (nil)))
...
(insn 13 12 14 2 (parallel [
            (set (reg:SI 175 [ D.1733 ])
                (zero_extract:SI (reg/v:SI 169 [ x ])
                    (const_int 1 [0x1])
                    (const_int 2 [0x2])))
            (clobber (reg:SI 147 t))
        ]) sh_tmp.cpp:434 409 {any_treg_expr_to_reg}
     (expr_list:REG_UNUSED (reg:SI 147 t)
        (nil)))
...
(insn 16 15 17 2 (parallel [
            (set (reg:SI 177 [ D.1733 ])
                (zero_extract:SI (reg/v:SI 169 [ x ])
                    (const_int 1 [0x1])
                    (const_int 3 [0x3])))
            (clobber (reg:SI 147 t))
        ]) sh_tmp.cpp:435 409 {any_treg_expr_to_reg}
     (expr_list:REG_UNUSED (reg:SI 147 t)
        (expr_list:REG_DEAD (reg/v:SI 169 [ x ])
            (nil))))

Those pseudo-insns are then split into tst/bld/movt/movrt sequences in the
split1 pass.  If a special shll/shlr pass is done right after combine and
before split1, it's possible to identify potential good shll/shlr sequences
rather easily and rewrite the code to use shll/shlr instead.
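As a plain-C illustration of the two-operation behaviour described above
(a sketch under the stated semantics; the names model_shlr/model_shll are
made up and not GCC source):

/* Illustrative model of the SH shlr/shll semantics: each insn produces
   both the shifted register value and the shifted-out bit in T.  */
static unsigned int
model_shlr (unsigned int reg, unsigned int *t)
{
  *t = reg & 1;              /* T = zero_extract bit 0   */
  return reg >> 1;           /* reg = reg >> 1           */
}

static unsigned int
model_shll (unsigned int reg, unsigned int *t)
{
  *t = (reg >> 31) & 1;      /* T = zero_extract bit 31  */
  return reg << 1;           /* reg = reg << 1           */
}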
[Bug target/63321] [SH] Unused T bit result of shll / shlr insns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63321

Oleg Endo <olegendo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-12-02
                 CC|                            |segher at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Combine recently received some updates which improve handling of
multiple-set parallel insns.  Applying the following:

Index: gcc/config/sh/sh.md
===================================================================
--- gcc/config/sh/sh.md (revision 218250)
+++ gcc/config/sh/sh.md (working copy)
@@ -5156,6 +5156,12 @@
       DONE;
     }

+  if (operands[2] == const1_rtx)
+    {
+      emit_insn (gen_shlr (operands[0], operands[1]));
+      DONE;
+    }
+
   /* If the lshrsi3_* insn is going to clobber the T_REG it must be
      expanded here.  */
   if (CONST_INT_P (operands[2])

will always expand the multiple-set shlr insn and combine will be able to
utilize this.  The test case

void test2_1 (unsigned int x, unsigned int* y)
{
  y[0] = x >> 1;
  y[1] = x & 1;
}

will compile to the desired sequence:

        shlr    r4
        movt    r1
        mov.l   r4,@r5
        rts
        mov.l   r1,@(4,r5)

However, in the context of e.g. pointer tagging use cases, the tag bits are
usually used with conditional branches:

void test2_2 (unsigned int x, unsigned int* y)
{
  unsigned int xx = x >> 1;
  unsigned int p = x & 1;
  if (p != 0)
    foo (xx);
}

Combine can't handle this, because the shift and test insns end up in
different basic blocks.  Moreover, in order to utilize the shlr insn, the
branch condition needs to be inverted.  This could be done by emitting a
movt-tst sequence and letting the sh_treg_combine pass optimize it away by
inverting the branch condition.
[Bug target/63321] [SH] Unused T bit result of shll / shlr insns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63321

--- Comment #2 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Oleg Endo from comment #1)
> void test2_1 (unsigned int x, unsigned int* y)
> {
>   y[0] = x >> 1;
>   y[1] = x & 1;
> }
>
> will compile to the desired sequence:
>
>         shlr    r4
>         movt    r1
>         mov.l   r4,@r5
>         rts
>         mov.l   r1,@(4,r5)

Changing the order of the operations to:

void test2_1 (unsigned int x, unsigned int* y)
{
  y[0] = x & 1;
  y[1] = x >> 1;
}

will make it fail to combine the insns though.
[Bug target/63321] [SH] Unused T bit result of shll / shlr insns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63321

--- Comment #3 from Oleg Endo <olegendo at gcc dot gnu.org> ---
A more advanced example:

void test4 (unsigned int x, unsigned int* y)
{
  y[0] = (x >> 0) & 1;
  y[1] = (x >> 1) & 1;
  y[2] = x >> 2;
}

currently compiles to:

        mov     r4,r0
        and     #1,r0
        mov.l   r0,@r5
        mov     r4,r0
        shlr    r0
        and     #1,r0
        shlr2   r4
        mov.l   r0,@(4,r5)
        rts
        mov.l   r4,@(8,r5)

better:

        shlr    r4
        movt    r0
        shlr    r4
        mov.l   r0,@r5
        movt    r1
        mov.l   r4,@(8,r5)
        rts
        mov.l   r1,@(4,r5)
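The better sequence works because each shlr both delivers the next bit in
T and advances x.  In plain C the reassociated form would look roughly like
this (illustrative only; test4_reassoc is a made-up name, not from the PR):

/* Illustrative reassociation matching the "better" sequence above:
   each right shift of x also yields the bit just shifted out (via T),
   so no fresh extraction from the original x is needed.  */
void test4_reassoc (unsigned int x, unsigned int* y)
{
  y[0] = x & 1;  x >>= 1;   /* shlr + movt                   */
  y[1] = x & 1;  x >>= 1;   /* shlr + movt                   */
  y[2] = x;                 /* x already shifted right by 2  */
}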
[Bug target/63321] [SH] Unused T bit result of shll / shlr insns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63321

--- Comment #4 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Oleg Endo from comment #1)
> Combine recently received some updates which improve handling of
> multiple-set parallel insns.  Applying the following:
>
> Index: gcc/config/sh/sh.md
> ===================================================================
> --- gcc/config/sh/sh.md (revision 218250)
> +++ gcc/config/sh/sh.md (working copy)
> @@ -5156,6 +5156,12 @@
>        DONE;
>      }
>
> +  if (operands[2] == const1_rtx)
> +    {
> +      emit_insn (gen_shlr (operands[0], operands[1]));
> +      DONE;
> +    }
> +
>    /* If the lshrsi3_* insn is going to clobber the T_REG it must be
>       expanded here.  */
>    if (CONST_INT_P (operands[2])
>
> will always expand the multiple-set shlr insn and combine will be able to
> utilize this.

Doing that for the shlr insn is OK, since there is no other alternative to
do a 1 bit right shift without touching the T bit.  However, since there is
a non-T-bit-clobbering shll alternative (add x,x), doing the same for shll
might have negative side effects on other sequences.
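For illustration (a sketch, not from the PR; shift_left_1 is a made-up
name): the 1-bit left shift has the T-preserving alternative simply
because addition doubles the value:

/* Illustrative only: x << 1 equals x + x for unsigned x, which is why
   "add x,x" can do a 1-bit left shift on SH without writing T, while
   a 1-bit right shift has no such T-free twin.  */
static unsigned int
shift_left_1 (unsigned int x)
{
  return x + x;   /* same value as x << 1 */
}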
[Bug target/63321] [SH] Unused T bit result of shll / shlr insns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63321

--- Comment #5 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Oleg Endo from comment #3)
> A more advanced example:
>
> void test4 (unsigned int x, unsigned int* y)
> {
>   y[0] = (x >> 0) & 1;
>   y[1] = (x >> 1) & 1;
>   y[2] = x >> 2;
> }

Which is just another example of re-using intermediate results of stitched
shifts, only a bit more complex due to the multiple-set insns.

void test5 (unsigned int x, unsigned int* y)
{
  y[0] = x << (2);
  y[1] = x << (2 + 2);
  y[2] = x << (2 + 2 + 8);
}

currently compiles to:

        mov     r4,r1
        shll2   r1
        mov.l   r1,@r5
        mov     r4,r1
        shll2   r1
        shll2   r1
        mov.l   r1,@(4,r5)
        mov     #12,r1
        shld    r1,r4
        rts
        mov.l   r4,@(8,r5)

better:

        shll2   r4
        mov.l   r4,@r5
        shll2   r4
        mov.l   r4,@(4,r5)
        shll8   r4
        rts
        mov.l   r4,@(8,r5)

See also some examples in PR 54089.
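In plain C, the intermediate-result reuse of the better sequence
corresponds to this reassociated form (illustrative only; test5_reassoc
is a made-up name, not from the PR):

/* Illustrative reassociation matching the "better" sequence above:
   each store reuses the previously shifted value, so the shift counts
   become 2, 2 and 8 instead of 2, 4 and 12 from scratch.  */
void test5_reassoc (unsigned int x, unsigned int* y)
{
  x <<= 2;  y[0] = x;   /* shll2 */
  x <<= 2;  y[1] = x;   /* shll2 */
  x <<= 8;  y[2] = x;   /* shll8 */
}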