[Bug target/41505] GCC choosing poor code sequence for certain stores (x86)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41505 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |DUPLICATE --- Comment #8 from Andrew Pinski --- This is a dup of bug 11877 which is now fixed on the trunk. *** This bug has been marked as a duplicate of bug 11877 ***
[Bug target/41505] GCC choosing poor code sequence for certain stores (x86)
--- Comment #7 from law at redhat dot com 2009-09-30 14:47 --- Subject: Re: GCC choosing poor code sequence for certain stores (x86) On 09/30/09 03:22, jakub at gcc dot gnu dot org wrote: --- Comment #6 from jakub at gcc dot gnu dot org 2009-09-30 09:22 --- For x86-64 we perhaps want further checks for the size optimization - if the scratch register is %r8d through %r15d, 3 byte xorl %r8d, %r8d and e.g. 3 byte movl %r8d, (%rdx) won't be shorter than movl $0, (%rdx) which is 6 bytes). And likely the 2 insns will be slower. But if the address already needs rex prefix, it is still a win. Do we have any good way to test if the address needs a rex prefix? I see the rex_prefix attribute in i386.md, but that's for testing an entire insn and based on my quick reading of i386.md it's not complete as many insns set the attribute explicitly. Jeff -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41505
[Bug target/41505] GCC choosing poor code sequence for certain stores (x86)
--- Comment #1 from rguenth at gcc dot gnu dot org 2009-09-29 16:07 --- difficult -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41505
[Bug target/41505] GCC choosing poor code sequence for certain stores (x86)
--- Comment #2 from law at redhat dot com 2009-09-29 17:12 --- I don't understand your comment Richard. Isn't it just something like this? (define_peephole2 [(match_scratch:SI 2 r) (set (match_operand:SI 0 memory_operand ) (match_operand:SI 1 const_0_operand ))] [(set (match_dup 2) (match_dup 1)) (set (match_dup 0) (match_dup 2))] ) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41505
[Bug target/41505] GCC choosing poor code sequence for certain stores (x86)
--- Comment #3 from rth at gcc dot gnu dot org 2009-09-29 21:18 --- There are already peepholes for this, though the condition appears to be slightly wrong for -Os. See i386.md:21121 : (define_peephole2 [(match_scratch:SI 1 r) (set (match_operand:SI 0 memory_operand ) (const_int 0))] optimize_insn_for_speed_p () ! TARGET_USE_MOV0 TARGET_SPLIT_LONG_MOVES get_attr_length (insn) = ix86_cur_cost ()-large_insn peep2_regno_dead_p (0, FLAGS_REG) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41505
[Bug target/41505] GCC choosing poor code sequence for certain stores (x86)
--- Comment #4 from law at redhat dot com 2009-09-29 21:55 --- Subject: Re: GCC choosing poor code sequence for certain stores (x86) On 09/29/09 15:18, rth at gcc dot gnu dot org wrote: --- Comment #3 from rth at gcc dot gnu dot org 2009-09-29 21:18 --- There are already peepholes for this, though the condition appears to be slightly wrong for -Os. See i386.md:21121 : (define_peephole2 [(match_scratch:SI 1 r) (set (match_operand:SI 0 memory_operand ) (const_int 0))] optimize_insn_for_speed_p () ! TARGET_USE_MOV0 TARGET_SPLIT_LONG_MOVES get_attr_length (insn)= ix86_cur_cost ()-large_insn peep2_regno_dead_p (0, FLAGS_REG) Ah, yes, the flags register needs to be available. As for the condition, after reading optimization guides for the various x86 chips that mov $0, mem is generally going to be faster than xor temp, temp mov temp, mem So I was thinking we'd want something like this for the condition. ((optimize_insn_for_size_p () || (!TARGET_USE_MOV0 TARGET_SPLIT_LONG_MOVES get_attr_length (insn) = ix86_cur_cost()-large_insn)) peep2_regno_dead_p (0, FLAGS_REG) Which I think should always give us the xor sequence when optimizing for size or when optimizing for the odd x86 implementation where the xor sequence is faster. I can easily bundle that up as a patch if it looks right to you... Jeff -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41505
[Bug target/41505] GCC choosing poor code sequence for certain stores (x86)
--- Comment #5 from rth at gcc dot gnu dot org 2009-09-29 23:43 --- Yeah, that looks right. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41505