[Bug target/41505] GCC choosing poor code sequence for certain stores (x86)

2021-07-26 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41505

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #8 from Andrew Pinski  ---
This is a dup of bug 11877 which is now fixed on the trunk.

*** This bug has been marked as a duplicate of bug 11877 ***

[Bug target/41505] GCC choosing poor code sequence for certain stores (x86)

2009-09-30 Thread law at redhat dot com


--- Comment #7 from law at redhat dot com  2009-09-30 14:47 ---
Subject: Re: GCC choosing poor code sequence for certain stores (x86)

On 09/30/09 03:22, jakub at gcc dot gnu dot org wrote:
> --- Comment #6 from jakub at gcc dot gnu dot org  2009-09-30 09:22 ---
> For x86-64 we perhaps want further checks for the size optimization: if the
> scratch register is %r8d through %r15d, the 3-byte xorl %r8d, %r8d plus e.g. the
> 3-byte movl %r8d, (%rdx) won't be shorter than movl $0, (%rdx) (which is 6 bytes).
> And the 2 insns will likely be slower.
> But if the address already needs a rex prefix, it is still a win.



Do we have any good way to test if the address needs a rex prefix?  I 
see the rex_prefix attribute in i386.md, but that's for testing an 
entire insn and based on my quick reading of i386.md it's not complete 
as many insns set the attribute explicitly.

Jeff
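
The byte counts in the quoted comment #6 can be checked with a small model of the encodings. This is only a sketch in Python, not anything from i386.md: the function names are made up for illustration, it covers only plain (base) addressing with no index or displacement, and it assumes a REX prefix is needed exactly when some register number is 8 or higher.

```python
# Hardware register numbers: 0-7 are the legacy registers, 8-15 need REX.
RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI = range(8)
R8, R9, R10, R11, R12, R13, R14, R15 = range(8, 16)


def rex_len(*regs):
    """1 if any register needs a REX prefix (number >= 8), else 0."""
    return 1 if any(r >= 8 for r in regs) else 0


def modrm_mem_len(base):
    """ModRM byte plus any SIB/disp8 byte forced by the base register."""
    low = base & 7
    if low == 4:      # %rsp/%r12 as base require a SIB byte
        return 2
    if low == 5:      # %rbp/%r13 as base require a zero disp8
        return 2
    return 1


def len_xorl_reg_reg(reg):
    """xorl %reg, %reg: [REX] + opcode + ModRM."""
    return rex_len(reg) + 1 + 1


def len_movl_reg_to_mem(reg, base):
    """movl %reg, (%base): [REX] + opcode + ModRM (+ SIB/disp8)."""
    return rex_len(reg, base) + 1 + modrm_mem_len(base)


def len_movl_zero_to_mem(base):
    """movl $0, (%base): [REX] + opcode + ModRM (+ SIB/disp8) + imm32."""
    return rex_len(base) + 1 + modrm_mem_len(base) + 4


def xor_sequence_saves_bytes(scratch, base):
    """True if xor + store is strictly shorter than the immediate store."""
    two_insns = len_xorl_reg_reg(scratch) + len_movl_reg_to_mem(scratch, base)
    return two_insns < len_movl_zero_to_mem(base)
```

Under these assumptions, xor_sequence_saves_bytes(R8, RDX) is false (3 + 3 bytes against 6), xor_sequence_saves_bytes(RAX, RDX) is true (2 + 2 against 6), and xor_sequence_saves_bytes(R8, R9) is true (3 + 3 against 7), which matches the "still a win" case where the address already needs a rex prefix.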


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41505



[Bug target/41505] GCC choosing poor code sequence for certain stores (x86)

2009-09-29 Thread rguenth at gcc dot gnu dot org


--- Comment #1 from rguenth at gcc dot gnu dot org  2009-09-29 16:07 ---
difficult



[Bug target/41505] GCC choosing poor code sequence for certain stores (x86)

2009-09-29 Thread law at redhat dot com


--- Comment #2 from law at redhat dot com  2009-09-29 17:12 ---
I don't understand your comment Richard.  Isn't it just something like this?
(define_peephole2
  [(match_scratch:SI 2 "r")
   (set (match_operand:SI 0 "memory_operand" "")
        (match_operand:SI 1 "const_0_operand" ""))]
  ""
  [(set (match_dup 2) (match_dup 1))
   (set (match_dup 0) (match_dup 2))]
  "")



[Bug target/41505] GCC choosing poor code sequence for certain stores (x86)

2009-09-29 Thread rth at gcc dot gnu dot org


--- Comment #3 from rth at gcc dot gnu dot org  2009-09-29 21:18 ---
There are already peepholes for this, though the condition appears to be
slightly wrong for -Os.  See i386.md:21121:

(define_peephole2
  [(match_scratch:SI 1 "r")
   (set (match_operand:SI 0 "memory_operand" "")
        (const_int 0))]
  "optimize_insn_for_speed_p ()
   && ! TARGET_USE_MOV0
   && TARGET_SPLIT_LONG_MOVES
   && get_attr_length (insn) >= ix86_cur_cost ()->large_insn
   && peep2_regno_dead_p (0, FLAGS_REG)"



[Bug target/41505] GCC choosing poor code sequence for certain stores (x86)

2009-09-29 Thread law at redhat dot com


--- Comment #4 from law at redhat dot com  2009-09-29 21:55 ---
Subject: Re: GCC choosing poor code sequence for certain stores (x86)

On 09/29/09 15:18, rth at gcc dot gnu dot org wrote:
> --- Comment #3 from rth at gcc dot gnu dot org  2009-09-29 21:18 ---
> There are already peepholes for this, though the condition appears to be
> slightly wrong for -Os.  See i386.md:21121:
>
> (define_peephole2
>   [(match_scratch:SI 1 "r")
>    (set (match_operand:SI 0 "memory_operand" "")
>         (const_int 0))]
>   "optimize_insn_for_speed_p ()
>    && ! TARGET_USE_MOV0
>    && TARGET_SPLIT_LONG_MOVES
>    && get_attr_length (insn) >= ix86_cur_cost ()->large_insn
>    && peep2_regno_dead_p (0, FLAGS_REG)"



Ah, yes, the flags register needs to be available.

As for the condition, after reading optimization guides for the various
x86 chips, my understanding is that

 mov $0, mem

is generally going to be faster than

 xor  temp, temp
 mov temp, mem

So I was thinking we'd want something like this for the condition.

  ((optimize_insn_for_size_p ()
    || (!TARGET_USE_MOV0
        && TARGET_SPLIT_LONG_MOVES
        && get_attr_length (insn) >= ix86_cur_cost ()->large_insn))
   && peep2_regno_dead_p (0, FLAGS_REG))

Which I think should always give us the xor sequence when optimizing for 
size or when optimizing for the odd x86 implementation where the xor 
sequence is faster.


I can easily bundle that up as a patch if it looks right to you...

Jeff
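
To see how the proposed condition differs from the existing one, both can be modelled as plain boolean functions. This is a rough sketch in Python rather than GCC code: InsnContext and its field names are hypothetical stand-ins for the macros and predicates above, and it assumes optimize_insn_for_size_p () is simply the negation of optimize_insn_for_speed_p ().

```python
from dataclasses import dataclass


@dataclass
class InsnContext:
    """Hypothetical stand-ins for the values the peephole condition tests."""
    optimize_for_speed: bool  # optimize_insn_for_speed_p ()
    use_mov0: bool            # TARGET_USE_MOV0
    split_long_moves: bool    # TARGET_SPLIT_LONG_MOVES
    insn_length: int          # get_attr_length (insn)
    large_insn: int           # ix86_cur_cost ()->large_insn
    flags_dead: bool          # peep2_regno_dead_p (0, FLAGS_REG)


def target_prefers_xor(c):
    """The target-specific part shared by both conditions."""
    return (not c.use_mov0
            and c.split_long_moves
            and c.insn_length >= c.large_insn)


def existing_condition(c):
    """Condition quoted from i386.md: only fires when optimizing for speed."""
    return c.optimize_for_speed and target_prefers_xor(c) and c.flags_dead


def proposed_condition(c):
    """Proposed condition: always fires for size, else defers to the target."""
    optimize_for_size = not c.optimize_for_speed  # assumption, see lead-in
    return (optimize_for_size or target_prefers_xor(c)) and c.flags_dead


# -Os on a target that would not otherwise want the xor sequence.
size_ctx = InsnContext(False, True, False, 6, 8, True)
# Speed optimization on a target where the xor sequence is preferred.
speed_ctx = InsnContext(True, False, True, 8, 8, True)
```

In this model existing_condition(size_ctx) is false while proposed_condition(size_ctx) is true, and both agree on speed_ctx. Neither fires when the flags register is live, since the xor clobbers it.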



[Bug target/41505] GCC choosing poor code sequence for certain stores (x86)

2009-09-29 Thread rth at gcc dot gnu dot org


--- Comment #5 from rth at gcc dot gnu dot org  2009-09-29 23:43 ---
Yeah, that looks right.

