[Bug target/23302] [4.1 Regression] extra move generated on x86
--- Comment #6 from hubicka at gcc dot gnu dot org 2005-10-31 18:15 --- Actually the cited 4.0 sequence do not obey the const int x86_read_modify = ~(m_PENT | m_PPRO); setting -march=athlon or using -Os makes the move go away. Additionally I get following sequence out of 4.0 in SUSE distro: movlterm, %eax movl4012(%ebx), %ecx movl4028(%eax), %eax imull %ecx, %eax that also contains the extra memory move. So I am going to close this as invalid. Honza -- hubicka at gcc dot gnu dot org changed: What|Removed |Added Status|NEW |RESOLVED Resolution||INVALID http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23302
[Bug target/23302] [4.1 Regression] extra move generated on x86
--- Comment #5 from mmitchel at gcc dot gnu dot org 2005-10-31 04:43 --- I think we should look for some solution to this problem, without reverting the previous patch. If this problem is amenable to a peephole, let's solve it that way. That said, I'm going to downgrade this to P4; if we can't fix it for 4.1, we'll look again for 4.2. -- mmitchel at gcc dot gnu dot org changed: What|Removed |Added Priority|P2 |P4 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23302
[Bug target/23302] [4.1 Regression] extra move generated on x86
-- steven at gcc dot gnu dot org changed: What|Removed |Added Severity|normal |minor http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23302
[Bug target/23302] [4.1 Regression] extra move generated on x86
--- Additional Comments From dann at godzilla dot ics dot uci dot edu 2005-09-28 17:29 --- (In reply to comment #2) > While it might be probably possible to design peephole or combiner insn patter > I am tempted to close this and PR 23303 as WONTFIX as it seems to me we was > optimizing this by pure luck and the patch seems to have overall positive > effect > on code size... IMHO closing these bugs as WONTFIX is not the right thing to do. This is clearly a missed optimization opportunity. The fact that it worked by chance before your (overall good) patch does not make fixing this issue less desirable. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23302
[Bug target/23302] [4.1 Regression] extra move generated on x86
--- Additional Comments From hubicka at gcc dot gnu dot org 2005-09-28 16:52 --- The actual problem here is that from combine's point of view the two alternatives (lea preceeded by loads, or add with memory operand followed by shift) looks equivalent and previously the shorter sequence was purely choosed by luck because combine followed the right edge first. It is not quite possible to solve this by combine's splitting mechanizm as the number of instruction don't change before the read-modify instructions are broken up. While it might be probably possible to design peephole or combiner insn patter I am tempted to close this and PR 23303 as WONTFIX as it seems to me we was optimizing this by pure luck and the patch seems to have overall positive effect on code size... Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23302
[Bug target/23302] [4.1 Regression] extra move generated on x86
--- Additional Comments From hubicka at gcc dot gnu dot org 2005-09-28 16:51 --- The actual problem here is that from combine's point of view the two alternatives (lea preceeded by loads, or add with memory operand followed by shift) looks equivalent and previously the shorter sequence was purely choosed by luck because combine followed the right edge first. It is not quite possible to solve this by combine's splitting mechanizm as the number of instruction don't change before the read-modify instructions are broken up. While it might be probably possible to design peephole or combiner insn patter I am tempted to close this and PR 23303 as WONTFIX as it seems to me we was optimizing this by pure luck and the patch seems to have overall positive effect on code size... Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23302
[Bug target/23302] [4.1 Regression] extra move generated on x86
--- Additional Comments From pinskia at gcc dot gnu dot org 2005-08-10 00:05 --- Confirmed, the rtl dumps at .combinea are almost the same except for: -(insn 45 44 46 3 (set (reg:SI 70 [ .savelines ]) -(mem/s:SI (plus:SI (reg/v/f:SI 59 [ screen ]) -(const_int 4012 [0xfac])) [5 .savelines+0 S4 A32])) 35 {*movsi_1} (nil) -(nil)) +(insn 39 38 40 3 (set (reg:SI 65 [ .num_ptrs ]) +(mem/s:SI (plus:SI (reg/f:SI 63 [ term ]) +(const_int 4028 [0xfbc])) [4 .num_ptrs+0 S4 A32])) 34 {*movsi_1} (insn_list: REG_DEP_TRUE 38 (nil)) +(expr_list:REG_DEAD (reg/f:SI 63 [ term ]) +(nil))) -(insn 46 45 47 3 (parallel [ -(set (reg:SI 71) -(mult:SI (mem/s:SI (plus:SI (reg/f:SI 68 [ term ]) -(const_int 4028 [0xfbc])) [5 .num_ptrs+0 S4 A32]) -(reg:SI 70 [ .savelines ]))) +(insn 40 39 41 3 (parallel [ +(set (reg:SI 64) +(mult:SI (reg:SI 65 [ .num_ptrs ]) +(mem/s:SI (plus:SI (reg/v/f:SI 59 [ screen ]) +(const_int 4012 [0xfac])) [4 .savelines+0 S4 A32]))) (clobber (reg:CC 17 flags)) -]) 173 {*mulsi3_1} (insn_list:REG_DEP_TRUE 43 (insn_list:REG_DEP_TRUE 45 (nil))) -(expr_list:REG_DEAD (reg/f:SI 68 [ term ]) +]) 182 {*mulsi3_1} (insn_list:REG_DEP_TRUE 39 (nil)) +(expr_list:REG_DEAD (reg:SI 65 [ .num_ptrs ]) (expr_list:REG_UNUSED (reg:CC 17 flags) -(expr_list:REG_DEAD (reg:SI 70 [ .savelines ]) -(nil) +(nil Which is because we expand the multiply (from tree to rtl) as: (insn 38 36 39 (set (reg/f:SI 63) (mem/f/i:SI (symbol_ref:SI ("term") [flags 0x40] ) [9 term+0 S4 A32])) -1 (nil) (nil)) (insn 39 38 40 (set (reg:SI 65) (mem/s:SI (plus:SI (reg/f:SI 63) (const_int 4028 [0xfbc])) [4 .num_ptrs+0 S4 A32])) -1 (nil) (nil)) (insn 40 39 41 (parallel [ (set (reg:SI 64) (mult:SI (reg:SI 65) (mem/s:SI (plus:SI (reg/v/f:SI 59 [ screen ]) (const_int 4012 [0xfac])) [4 .savelines+0 S4 A32]))) (clobber (reg:CC 17 flags)) ]) -1 (nil) (nil)) before we actually expanded the load of the screen->savelines seperately: (insn 43 41 44 (set (reg/f:SI 68) (mem/f/i:SI (symbol_ref:SI ("term") [flags 0x40] ) [9 term+0 S4 A32])) -1 (nil) (nil)) (insn 44 43 45 (set (reg:SI 69) (mem/s:SI (plus:SI (reg/f:SI 68) (const_int 4028 [0xfbc])) [5 .num_ptrs+0 S4 A32])) -1 (nil) (nil)) (insn 45 44 46 (set (reg:SI 70) (mem/s:SI (plus:SI (reg/v/f:SI 59 [ screen ]) (const_int 4012 [0xfac])) [5 .savelines+0 S4 A32])) -1 (nil) (nil)) (insn 46 45 47 (parallel [ (set (reg:SI 71) (mult:SI (reg:SI 69) (reg:SI 70))) (clobber (reg:CC 17 flags)) ]) -1 (nil) (nil)) This is either a target problem for the expansion or a middle-end problem while expand, I am going to say a target issue. I think this was caused by: 2005-07-30 Jan Hubicka <[EMAIL PROTECTED]> * expr.c (expand_expr_real_1): Do not load mem targets into register. * i386.c (ix86_fixup_binary_operands): Likewise. (ix86_expand_unary_operator): Likewise. (ix86_expand_fp_absneg_operator): Likewise. * optabs.c (expand_vec_cond_expr): Validate dest. -- What|Removed |Added CC||hubicka at gcc dot gnu dot ||org Status|UNCONFIRMED |NEW Component|rtl-optimization|target Ever Confirmed||1 GCC build triplet|i686-pc-linux-gnu | GCC host triplet|i686-pc-linux-gnu | Keywords||missed-optimization Last reconfirmed|-00-00 00:00:00 |2005-08-10 00:05:55 date|| Summary|extra move generated on x86 |[4.1 Regression] extra move ||generated on x86 Target Milestone|--- |4.1.0 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23302