[Bug target/105034] [10/11/12 regression] Suboptimal codegen for min/max with -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105034

--- Comment #5 from Roger Sayle ---
The latest CSiBE results on x86_64-pc-linux-gnu:
With -Os the total size is 3696263, and with -Os -mno-stv the total size is 3696887, i.e. 624 bytes larger.
The worst regression from -mno-stv is teem-1.6.0-src/src/nrrd/parseNrrd, which is 402 bytes larger, and the best improvement from -mno-stv is linux-2.4.23-pre3-testplatform/net/ipv4/route, which is 134 bytes smaller.

So I think this is a fine-tuning problem. cmp/cmov is much shorter than a pmax or a pmin, so SImode MAX/MIN should have negative gain with -Os. Likewise for const0_rtx.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105034

--- Comment #4 from Richard Biener ---
Example that we don't transform but could:

typedef int v4si __attribute__((vector_size(16)));
#define min(a,b) ((a)<(b)?(a):(b))
v4si foo (v4si a, v4si b)
{
  a[0] = min (a[0], b[0]);
  return a;
}

Here the scalar code is

        movd    %xmm0, %edx
        movd    %xmm1, %eax
        cmpl    %edx, %eax
        cmovg   %edx, %eax
        pinsrd  $0, %eax, %xmm0

where we could use something like

        movq    %xmm0, %xmm2
        pminsd  %xmm2, %xmm1

A testcase variant could return the scalar minimum. For both cases it's likely a win even for -Os.
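
The variant mentioned at the end of comment #4 could look roughly like the sketch below. It is an illustration only: the function name foo_scalar is hypothetical and does not appear in the bug report, and no claim is made here about the code GCC actually emits for it.

```c
/* Sketch of the "return the scalar minimum" variant of the testcase.
   foo_scalar is a hypothetical name chosen for this example.  */
typedef int v4si __attribute__((vector_size(16)));
#define min(a,b) ((a)<(b)?(a):(b))

/* Same lane-0 minimum as foo, but the result is returned in an
   integer register instead of being inserted back into the vector.  */
int foo_scalar (v4si a, v4si b)
{
  return min (a[0], b[0]);
}
```

Compiling both foo and foo_scalar with something like `gcc -Os -msse4.1 -S` and comparing the output is one way to check whether the chain is converted.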
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105034

Roger Sayle changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |roger at nextmovesoftware dot com

--- Comment #3 from Roger Sayle ---
Hi Hongtao,
Note that -mstv is a net win on the code size benchmark CSiBE, so gating the entire pass on optimize_size is not an ideal solution. Instead, the gain function needs to choose which chains to transform based on optimize_size-aware costs.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105034

--- Comment #2 from Hongyu Wang ---
For -O2, stv doesn't do this transform:

Computing gain for chain #1...
  Instruction gain 8 for     7: {r84:SI=smax(r85:SI,0);clobber flags:CC;}
      REG_DEAD r85:SI
      REG_UNUSED flags:CC
  Instruction conversion gain: 8
  Registers conversion cost: 12
  Total gain: -4

since the sse->integer reg move cost is 6 in the generic cost table. But for -Os the cost is 3, so the conversion is considered profitable:

Computing gain for chain #1...
  Instruction gain 8 for     7: {r84:SI=smax(r85:SI,0);clobber flags:CC;}
      REG_DEAD r85:SI
      REG_UNUSED flags:CC
  Instruction conversion gain: 8
  Registers conversion cost: 6
  Total gain: 2

FWIW, the solution would be to either adjust the ix86_size cost or block optimize_size in the stv gate.
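
The arithmetic in the two dumps from comment #2 can be restated as a small sketch. This is not GCC's implementation: struct chain_cost, total_gain and is_profitable are hypothetical names, and the "convert when the total is positive" rule is an assumption inferred from the comment's wording. The register conversion costs of 12 and 6 are taken directly from the dumps; they happen to be twice the quoted move costs of 6 and 3.

```c
/* Hypothetical model of the totals printed in the STV dump.
   None of these names exist in GCC; they are for illustration.  */
struct chain_cost
{
  int insn_gain;      /* "Instruction conversion gain" from the dump */
  int reg_conv_cost;  /* "Registers conversion cost" from the dump */
};

/* Total gain = instruction conversion gain - registers conversion cost.  */
int total_gain (struct chain_cost c)
{
  return c.insn_gain - c.reg_conv_cost;
}

/* Assumption: a chain is converted only when the total gain is positive.  */
int is_profitable (struct chain_cost c)
{
  return total_gain (c) > 0;
}
```

With the dump values, { 8, 12 } gives 8 - 12 = -4 (not converted at -O2) and { 8, 6 } gives 8 - 6 = 2 (converted at -Os).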
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105034

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-03-23
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Target Milestone|---                         |10.4
           Priority|P3                          |P2

--- Comment #1 from Richard Biener ---
With -mavx it gets worse since vpxor is one byte larger than xorps. Not sure if STV is tuned for -Os very well.