[Bug target/105034] [10/11/12 regression]Suboptimal codegen for min/max with -Os

2022-04-14 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105034

--- Comment #5 from Roger Sayle  ---
The latest CSiBE results on x86_64-pc-linux-gnu: with -Os the total size is
3696263, and with -Os -mno-stv the total size is 3696887, i.e. 624 bytes
larger.  The worst regression from -mno-stv is
teem-1.6.0-src/src/nrrd/parseNrrd, which is 402 bytes larger, and the best
improvement from -mno-stv is linux-2.4.23-pre3-testplatform/net/ipv4/route,
which is 134 bytes smaller.  So I think this is a fine-tuning problem.

cmp/cmov is much shorter than a pmax or a pmin, so SImode MAX/MIN should have
negative gain with -Os.  Likewise for const0_rtx.
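This excerpt does not include the report's original testcase. As a hypothetical reduction, the RTL pattern smax(r85:SI,0) in the STV dumps in Comment #2 corresponds to C such as the following; the function names are illustrative only. Compiling it on x86-64 once with `gcc -Os -S` and once with `gcc -Os -mno-stv -S` is one way to compare the cmp/cmov sequence against the converted SSE sequence. The exact output depends on the GCC version.

```c
/* Hypothetical reduced testcase: an SImode MAX against const0_rtx,
   i.e. the smax (x, 0) pattern seen in the STV dumps. */
int max0 (int x)
{
  return x > 0 ? x : 0;
}

/* Hypothetical SImode MIN of two registers, for comparison. */
int min2 (int a, int b)
{
  return a < b ? a : b;
}
```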

2022-04-14 Thread rguenth at gcc dot gnu.org via Gcc-bugs

--- Comment #4 from Richard Biener  ---
Example that we don't transform but could:

typedef int v4si __attribute__((vector_size(16)));

#define min(a,b) ((a)<(b)?(a):(b))

v4si foo (v4si a, v4si b)
{
  a[0] = min (a[0], b[0]);
  return a;
}

Here the scalar code is

movd    %xmm0, %edx
movd    %xmm1, %eax
cmpl    %edx, %eax
cmovg   %edx, %eax
pinsrd  $0, %eax, %xmm0

where we could use something like

movq %xmm0, %xmm2
pminsd %xmm2, %xmm1


A testcase variant could return the scalar minimum instead.  In both cases it's
likely a win even for -Os.
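A sketch of the scalar-returning variant mentioned above, reusing the v4si type and min macro from the example; the name foo_scalar is made up for illustration. It relies on GCC's vector_size extension and element subscripting, so it needs GCC or a compatible compiler.

```c
typedef int v4si __attribute__((vector_size(16)));

#define min(a,b) ((a)<(b)?(a):(b))

/* Hypothetical variant of foo: return the minimum of lane 0
   directly instead of inserting it back into the vector. */
int foo_scalar (v4si a, v4si b)
{
  return min (a[0], b[0]);
}
```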

2022-04-14 Thread roger at nextmovesoftware dot com via Gcc-bugs

Roger Sayle  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |roger at nextmovesoftware dot com

--- Comment #3 from Roger Sayle  ---
Hi Hongtao,
Note that -mstv is a net win on the code size benchmark CSiBE, so gating the
entire pass on optimize_size is not an ideal solution.  Instead, the gain
function needs to choose which chains to transform based on optimize_size-aware
costs.
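To illustrate the difference between the two options, here is a toy model. It is not GCC code: the struct chain type, its byte estimates, and both functions are hypothetical. Gating the whole pass discards every chain, including the ones that would shrink the code. A size-aware per-chain decision keeps only the chains with a positive size gain, so it never saves less than the gated version.

```c
#include <stddef.h>

/* Hypothetical per-chain size estimates in bytes; these are not
   GCC data structures, just a model of the decision. */
struct chain
{
  int scalar_bytes;   /* size of the integer-register sequence */
  int vector_bytes;   /* size after conversion, including GPR<->SSE moves */
};

/* Gating the whole pass on optimize_size: nothing is converted,
   so nothing is saved, even for chains that would shrink. */
static int bytes_saved_gated (const struct chain *chains, size_t n)
{
  (void) chains;
  (void) n;
  return 0;
}

/* Size-aware per-chain decision: convert a chain only when its
   size gain is positive, and sum the bytes saved. */
static int bytes_saved_per_chain (const struct chain *chains, size_t n)
{
  int saved = 0;
  for (size_t i = 0; i < n; i++)
    {
      int gain = chains[i].scalar_bytes - chains[i].vector_bytes;
      if (gain > 0)
        saved += gain;
    }
  return saved;
}
```

With made-up chains of {10, 6}, {5, 9} and {8, 8} bytes, the per-chain version converts only the first chain and saves 4 bytes, while the gated version saves nothing.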

2022-03-27 Thread wwwhhhyyy333 at gmail dot com via Gcc-bugs

--- Comment #2 from Hongyu Wang  ---
For -O2, STV doesn't do such a transform:
Computing gain for chain #1...
  Instruction gain 8 for 7: {r84:SI=smax(r85:SI,0);clobber flags:CC;}
  REG_DEAD r85:SI
  REG_UNUSED flags:CC
  Instruction conversion gain: 8
  Registers conversion cost: 12
  Total gain: -4

That is because the SSE->integer register move cost is 6 in the generic cost table.

But for -Os the cost is 3, so the conversion is considered profitable:
Computing gain for chain #1...
  Instruction gain 8 for 7: {r84:SI=smax(r85:SI,0);clobber flags:CC;}
  REG_DEAD r85:SI
  REG_UNUSED flags:CC
  Instruction conversion gain: 8
  Registers conversion cost: 6
  Total gain: 2
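The two dumps are consistent with the register conversion cost being two register moves charged at the move cost quoted above: 2 × 6 = 12 with generic tuning and 2 × 3 = 6 with -Os. The two-move breakdown is an assumption inferred from these numbers; the dump does not state it. The hypothetical helper below reproduces the arithmetic and is not GCC's implementation.

```c
/* Move costs quoted in the comment: generic tuning vs. -Os. */
enum { GENERIC_SSE_TO_INT_COST = 6, SIZE_SSE_TO_INT_COST = 3 };

/* Hypothetical model of the dumped numbers:
   total gain = instruction conversion gain
                - (number of register moves * per-move cost). */
static int stv_total_gain (int insn_gain, int n_moves, int move_cost)
{
  return insn_gain - n_moves * move_cost;
}
```

With an instruction gain of 8 and two moves, this yields -4 for generic costs and 2 for -Os, matching the "Total gain" lines.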

FWIW, the solution would be either to adjust the ix86_size_cost table, or to
block out optimize_size in the STV gate.

2022-03-23 Thread rguenth at gcc dot gnu.org via Gcc-bugs

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-03-23
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Target Milestone|---                         |10.4
           Priority|P3                          |P2

--- Comment #1 from Richard Biener  ---
With -mavx it gets worse, since vpxor is one byte larger than xorps.  Not sure
STV is tuned very well for -Os.