[Bug target/61810] init-regs.c papers over issues elsewhere
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61810

--- Comment #10 from Richard Biener ---
(In reply to H.J. Lu from comment #9)
> Created attachment 53008 [details]
> A patch for pr104441-1a.c
>
> Does it help?

Yes, that fixes the issue.

[Bug target/61810] init-regs.c papers over issues elsewhere
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61810

--- Comment #9 from H.J. Lu ---
Created attachment 53008
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53008=edit
A patch for pr104441-1a.c

Does it help?

[Bug target/61810] init-regs.c papers over issues elsewhere
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61810

Alexander Monakov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #8 from Alexander Monakov ---
(In reply to Richard Biener from comment #7)
> But it looks like the testcase is broken:
>
> __attribute__((always_inline, target("avx2")))
> static __m256i
> load8bit_4x4_avx2(const uint8_t *const src, const uint32_t stride)
> {
>     __m128i src01, src23;
>     src01 = _mm_cvtsi32_si128(*(int32_t*)(src + 0 * stride));
>     src23 = _mm_insert_epi32(src23, *(int32_t *)(src + 3 * stride), 1);
>     return _mm256_setr_m128i(src01, src23);
> }
>
> it seems to expect that src23 is zero before inserting the data?

If you look in the original PR 104441 testcase, it has sensible code:

static __m256i __attribute__((always_inline))
load8bit_4x4_avx2(const uint8_t *const src, const uint32_t stride)
{
    __m128i src01, src23;
    src01 = _mm_cvtsi32_si128(*(int32_t*)(src + 0 * stride));
    src01 = _mm_insert_epi32(src01, *(int32_t *)(src + 1 * stride), 1);
    src23 = _mm_cvtsi32_si128(*(int32_t*)(src + 2 * stride));
    src23 = _mm_insert_epi32(src23, *(int32_t *)(src + 3 * stride), 1);
    return _mm256_setr_m128i(src01, src23);
}

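As a cross-check of why the PR 104441 version is well-defined, here is a portable scalar sketch of what it computes. It assumes the documented intrinsic semantics (`_mm_cvtsi32_si128` zeroes lanes 1-3, `_mm_insert_epi32(v, x, 1)` replaces only lane 1, and `_mm256_setr_m128i(lo, hi)` places `lo` in the low 128 bits). The helper names `v4si`, `cvtsi32`, `insert_lane`, `load32` and `load8bit_4x4_scalar` are hypothetical stand-ins, not real intrinsics, so the sketch runs on any target. Every output lane is written before it is read, which is exactly what the broken variant fails to do for `src23`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-in for a 128-bit vector of four int32 lanes. */
typedef struct { int32_t lane[4]; } v4si;

/* Models _mm_cvtsi32_si128: x goes into lane 0, lanes 1-3 are zeroed. */
static v4si cvtsi32(int32_t x)
{
  v4si r = { { x, 0, 0, 0 } };
  return r;
}

/* Models _mm_insert_epi32(v, x, idx): only lane idx is replaced. */
static v4si insert_lane(v4si v, int32_t x, int idx)
{
  assert(idx >= 0 && idx < 4);
  v.lane[idx] = x;
  return v;
}

/* Reads a 32-bit word without alignment or strict-aliasing problems. */
static int32_t load32(const uint8_t *p)
{
  int32_t x;
  memcpy(&x, p, sizeof x);
  return x;
}

/* Scalar model of the sensible load8bit_4x4_avx2: out[0..3] holds src01
   and out[4..7] holds src23, mirroring _mm256_setr_m128i(src01, src23). */
static void load8bit_4x4_scalar(const uint8_t *src, uint32_t stride,
                                int32_t out[8])
{
  v4si src01 = cvtsi32(load32(src + 0 * stride));
  src01 = insert_lane(src01, load32(src + 1 * stride), 1);
  v4si src23 = cvtsi32(load32(src + 2 * stride));
  src23 = insert_lane(src23, load32(src + 3 * stride), 1);
  memcpy(out, src01.lane, sizeof src01.lane);
  memcpy(out + 4, src23.lane, sizeof src23.lane);
}
```

With a stride of 4 over a 16-byte buffer, lanes 0, 1, 4 and 5 receive the four loaded words and lanes 2, 3, 6 and 7 are zero.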
[Bug target/61810] init-regs.c papers over issues elsewhere
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61810

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl.tools at gmail dot com

--- Comment #7 from Richard Biener ---
(In reply to Andrew Pinski from comment #6)
> https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577192.html

On current trunk x86_64 that gets

FAIL: gcc.target/i386/extract-insert-combining.c scan-assembler-times (?:vmovd|movd)[ \\t]+[^{\\n]*%xmm[0-9] 3
FAIL: gcc.target/i386/extract-insert-combining.c scan-assembler-times (?:vpinsrd|pinsrd)[ \\t]+[^{\\n]*%xmm[0-9] 1
FAIL: gcc.target/i386/pr104441-1b.c execution test
FAIL: gcc.target/i386/pr98335.c scan-assembler movzbl
FAIL: gcc.target/i386/pr98335.c scan-assembler-not movb
FAIL: gnat.dg/sso8.adb execution test
FAIL: libgomp.c/loop-19.c execution test

The FAILs can be reproduced in an unpatched tree by specifying -fdisable-rtl-init-regs.

Assembly difference for gcc.target/i386/pr104441-1b.c is (besides RA):

-       vpxor   %xmm1, %xmm1, %xmm1
+       vpinsrd $1, (%rax,%r10), %xmm5, %xmm1
+       vpinsrd $1, (%rdx,%r9), %xmm4, %xmm3
        vmovd   (%rax), %xmm0
-       vpxor   %xmm2, %xmm2, %xmm2
        addl    $4, %ecx
-       vpinsrd $1, (%rax,%r10), %xmm1, %xmm1
-       vpinsrd $1, (%rdx,%r9), %xmm2, %xmm2

adding initialization in compute4x_m_sad_avx2_intrin of reg 109 at in block 4 for insn 33.
adding initialization in compute4x_m_sad_avx2_intrin of reg 99 at in block 4 for insn 48.

where we have for example

-(insn 97 31 98 4 (clobber (reg/v:V2DI 109 [ src23 ])) "/home/rguenther/obj-gcc4-g/gcc/include/smmintrin.h":408:20 -1
-     (nil))
-(insn 98 97 33 4 (set (reg/v:V2DI 109 [ src23 ])
-        (const_vector:V2DI [
-                (const_int 0 [0]) repeated x2
-            ])) "/home/rguenther/obj-gcc4-g/gcc/include/smmintrin.h":408:20 -1
-     (nil))
-(insn 33 98 36 4 (set (reg:V4SI 138 [ src23 ])
+(insn 33 31 36 4 (set (reg:V4SI 138 [ src23 ])
         (vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 137 [ MEM[(int32_t *)src_62 + _41 * 1] ]))
             (subreg:V4SI (reg/v:V2DI 109 [ src23 ]) 0)
             (const_int 2 [0x2]))) "/home/rguenther/obj-gcc4-g/gcc/include/smmintrin.h":408:20 6925 {sse4_1_pinsrd}

where this produces { undef, MEM, undef, undef } without init-regs.

But it looks like the testcase is broken:

__attribute__((always_inline, target("avx2")))
static __m256i
load8bit_4x4_avx2(const uint8_t *const src, const uint32_t stride)
{
    __m128i src01, src23;
    src01 = _mm_cvtsi32_si128(*(int32_t*)(src + 0 * stride));
    src23 = _mm_insert_epi32(src23, *(int32_t *)(src + 3 * stride), 1);
    return _mm256_setr_m128i(src01, src23);
}

it seems to expect that src23 is zero before inserting the data?

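Insn 33 can be read with the `vec_merge` rule from the GCC internals manual: a set bit in the mask takes that lane from the first operand, a clear bit from the second. The sketch below is a portable scalar model of insn 33 under that rule. `v4si`, `vec_duplicate`, `vec_merge` and `pinsrd_lane1` are hypothetical C names, and since C cannot express an undefined lane, the "undef" input is modeled with arbitrary junk values. With mask 2 only lane 1 receives the loaded word, so lanes 0, 2 and 3 are whatever reg 109 held: zeros once init-regs inserts the set, undefined without it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for a V4SI value. */
typedef struct { int32_t lane[4]; } v4si;

/* Models (vec_duplicate:V4SI x): x replicated into all four lanes. */
static v4si vec_duplicate(int32_t x)
{
  v4si r = { { x, x, x, x } };
  return r;
}

/* Models (vec_merge:V4SI a b mask): lane i comes from a when bit i of
   mask is set and from b when it is clear. */
static v4si vec_merge(v4si a, v4si b, unsigned mask)
{
  assert(mask < 16u); /* a V4SI has only four lanes */
  v4si r;
  for (int i = 0; i < 4; i++)
    r.lane[i] = ((mask >> i) & 1u) ? a.lane[i] : b.lane[i];
  return r;
}

/* Models insn 33 (sse4_1_pinsrd): merge the loaded word into lane 1 of
   the previous contents of reg 109, using mask 2 (binary 0010). */
static v4si pinsrd_lane1(v4si prev, int32_t mem)
{
  return vec_merge(vec_duplicate(mem), prev, 2u);
}
```

Feeding it a zero vector gives { 0, MEM, 0, 0 }, the init-regs result; feeding it junk passes the junk straight through in lanes 0, 2 and 3.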
[Bug target/61810] init-regs.c papers over issues elsewhere
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61810

--- Comment #6 from Andrew Pinski ---
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577192.html

[Bug target/61810] init-regs.c papers over issues elsewhere
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61810

--- Comment #5 from Richard Biener ---
Re-running the experiment of disabling init-regs on x86_64 on trunk shows

+FAIL: gcc.dg/lto/pr48622 c_lto_pr48622_0.o-c_lto_pr48622_0.o link, -O -flto -finline-small-functions -fno-early-inlining
 FAIL: gcc.dg/tree-prof/20050826-2.c scan-tree-dump-not dom2 "Invalid sum"
+FAIL: gcc.target/i386/extract-insert-combining.c scan-assembler-times (?:vmovd|movd)[ \\t]+[^{\\n]*%xmm[0-9] 3
+FAIL: gcc.target/i386/extract-insert-combining.c scan-assembler-times (?:vpinsrd|pinsrd)[ \\t]+[^{\\n]*%xmm[0-9] 1
+FAIL: gnat.dg/sso8.adb execution test

with both -m64 and -m32.

The gcc.dg/lto/pr48622 failure is a link failure:

/usr/lib64/gcc/x86_64-suse-linux/7/../../../../x86_64-suse-linux/bin/ld: /tmp/cc8guozm.ltrans0.ltrans.o: in function `main':
:(.text+0x18): undefined reference to `ashift_qi_1'
collect2: error: ld returned 1 exit status
compiler exited with status 1

I think the testcase is broken: with init-regs, likely the

int main ()
{
  if (ashift_qi_0 (0xff) != (u8) ((u8) 0xff << 0))
    abort ();

test is directly resolved to abort (), leaving the rest of the code dead.

The gcc.target/i386/extract-insert-combining.c failure looks like a missed combine optimization when facing uninitialized regs as opposed to all-zero ones. We get

        pinsrd  $0, %esi, %xmm0
        pinsrd  $0, %edi, %xmm1
        movl    %esi, -12(%rsp)
        paddd   %xmm0, %xmm1
        pinsrd  $0, %esi, %xmm0
        paddd   %xmm1, %xmm0
        movd    %xmm0, %eax
        ret

preserving the "uninitialized" state of the high part of %xmm0; when init-regs explicitly zeros %xmm0, this is matched to movd instead.

I cannot assess what goes wrong with gnat.dg/sso8.adb, but it might be a testsuite bug as well.

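The extract-insert-combining case can be illustrated with a small scalar model of the two instructions. It assumes only the architectural behavior: `movd` writes lane 0 and zeroes lanes 1-3, while `pinsrd $idx` replaces one lane and keeps the rest. The names `v4si`, `movd`, `pinsrd` and `v4si_equal` are hypothetical C helpers. The model checks that `pinsrd $0` into a zeroed register is identical to `movd`, which is the rewrite combine finds once init-regs has zeroed %xmm0. With junk in the upper lanes the results differ. Since genuinely undefined lanes could legally be assumed zero, this is presumably why comment 5 calls it a missed optimization rather than a bug.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for the four 32-bit lanes of an %xmm reg. */
typedef struct { int32_t lane[4]; } v4si;

/* Models movd %r32, %xmm: the value goes to lane 0, lanes 1-3 are zeroed. */
static v4si movd(int32_t x)
{
  v4si r = { { x, 0, 0, 0 } };
  return r;
}

/* Models pinsrd $idx, %r32, %xmm: lane idx is replaced, the other three
   lanes keep whatever the register held before. */
static v4si pinsrd(v4si v, int32_t x, int idx)
{
  assert(idx >= 0 && idx < 4);
  v.lane[idx] = x;
  return v;
}

/* Lane-by-lane comparison of two modeled registers. */
static bool v4si_equal(v4si a, v4si b)
{
  for (int i = 0; i < 4; i++)
    if (a.lane[i] != b.lane[i])
      return false;
  return true;
}
```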
[Bug target/61810] init-regs.c papers over issues elsewhere
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61810

--- Comment #3 from Richard Biener ---
After the testsuite fixes, current trunk on x86_64 shows the following regressions (bootstrap and test for all languages including Ada, Go and Obj-C++, testing with {,-m32}):

                === gcc tests ===

Running target unix/
UNRESOLVED: gcc.dg/lto/pr48622 c_lto_pr48622_0.o-c_lto_pr48622_0.o execute -O -flto -finline-small-functions -fno-early-inlining
FAIL: gcc.dg/lto/pr48622 c_lto_pr48622_0.o-c_lto_pr48622_0.o link, -O -flto -finline-small-functions -fno-early-inlining
FAIL: gcc.target/i386/extract-insert-combining.c scan-assembler-times (?:vmovd|movd)[ \\t]+[^{\\n]*%xmm[0-9] 3
FAIL: gcc.target/i386/extract-insert-combining.c scan-assembler-times (?:vpinsrd|pinsrd)[ \\t]+[^{\\n]*%xmm[0-9] 1

Running target unix//-m32
UNRESOLVED: gcc.dg/lto/pr48622 c_lto_pr48622_0.o-c_lto_pr48622_0.o execute -O -flto -finline-small-functions -fno-early-inlining
FAIL: gcc.dg/lto/pr48622 c_lto_pr48622_0.o-c_lto_pr48622_0.o link, -O -flto -finline-small-functions -fno-early-inlining
FAIL: gcc.dg/torture/pr54098.c -Os (internal compiler error)
FAIL: gcc.dg/torture/pr54098.c -Os (test for excess errors)
FAIL: gcc.target/i386/extract-insert-combining.c scan-assembler-times (?:vmovd|movd)[ \\t]+[^{\\n]*%xmm[0-9] 3
FAIL: gcc.target/i386/extract-insert-combining.c scan-assembler-times (?:vpinsrd|pinsrd)[ \\t]+[^{\\n]*%xmm[0-9] 1
FAIL: gcc.target/i386/pr22152.c scan-assembler-times movq[ \\t]+[^\\n]*%mm 1

                === libgo Summary for unix ===

# of expected passes            133
Running target unix/-m32
FAIL: math

                === libgomp Summary for unix/ ===

Running target unix//-m32
FAIL: libgomp.c++/target-6.C execution test

                === libstdc++ tests ===

Running target unix/
FAIL: experimental/propagate_const/cons/default.cc execution test

Running target unix//-m32
FAIL: experimental/propagate_const/cons/default.cc execution test

[Bug target/61810] init-regs.c papers over issues elsewhere
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61810

Eric Botcazou changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-12-04
                 CC|                            |ebotcazou at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #4 from Eric Botcazou ---
Probably worth investigating indeed.

[Bug target/61810] init-regs.c papers over issues elsewhere
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61810

--- Comment #1 from Richard Biener ---
For

FAIL: gcc.dg/vect/vect-strided-a-u8-i8-gap7.c execution test

it indeed 'makes combine happy'; the combine dump with init-regs enabled vs. disabled differs as follows:

*** *** 912,935
  (plus:QI (reg:QI 264 [ ivtmp.50 ])
          (const_int -3 [0xfffd])))
! Trying 401 -> 186:
  Failed to match this instruction:
  (set (pc)
!     (pc))
! Successfully matched this instruction:
! (set (pc)
!     (label_ref 191))
! allowing combination of insns 401 and 186
! original costs 4 + 4 = 0
! replacement cost 4
! deferring deletion of insn with uid = 401.
! modifying other_insn 187: pc=L191
!       REG_BR_PROB 9996
! deferring rescan insn with uid = 187.
  Trying 193 -> 197:
  Failed to match this instruction:
--- 911,923
  (plus:QI (reg:QI 264 [ ivtmp.50 ])
          (const_int -3 [0xfffd])))
! Trying 186 -> 187:
  Failed to match this instruction:
  (set (pc)
!     (if_then_else (eq (reg/v:QI 255 [ y ])
!             (const_int 0 [0]))
!         (label_ref 191)
!         (pc)))
  Trying 193 -> 197:
  Failed to match this instruction:

where 'y' is otherwise unused and (with init-regs) has a set to zero. But this shows an issue with the testcase, which uses the uninitialized y to decide whether to abort (!):

__attribute__ ((noinline)) int
main1 ()
{
  int i;
  s arr[N];
  s *ptr = arr;
  s res[N];
  unsigned char u, t, s, x, y, z, w;

  for (i = 0; i < N; i++)
    {
      arr[i].a = i;
      arr[i].b = i * 2;
      arr[i].c = 17;
      arr[i].d = i+34;
      arr[i].e = i * 3 + 5;
      arr[i].f = i * 5;
      arr[i].g = i - 3;
      arr[i].h = 67;
      if (y) /* Avoid vectorization.  */
        abort ();
    }

The same problem is probably present in the other tests as well. I'm going to fix those.

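One way such an "avoid vectorization" guard can be written without reading an uninitialized variable is to test a volatile object that is always zero. The sketch below assumes that approach. The thread does not say how the tests were actually fixed, and `N`, the layout of `s`, `avoid_vect` and `init_arr` are hypothetical stand-ins modeled on the excerpt above. Reading a volatile object is well-defined, and the compiler must perform the load on every iteration, so it cannot fold the branch the way it could once init-regs had zeroed `y`.

```c
#include <assert.h>
#include <stdlib.h>

#define N 128 /* hypothetical; the real test defines its own N */

/* Hypothetical stand-in for the test's struct of eight byte fields. */
typedef struct
{
  unsigned char a, b, c, d, e, f, g, h;
} s;

/* Always zero. Being volatile, it is loaded on every iteration and its
   value cannot be assumed, unlike the uninitialized y in the original.
   That this also keeps the loop scalar is an assumption, not verified. */
static volatile int avoid_vect = 0;

__attribute__ ((noinline)) static void
init_arr (s *arr)
{
  assert (arr != NULL);
  for (int i = 0; i < N; i++)
    {
      /* Values are truncated to unsigned char, as in the original test. */
      arr[i].a = i;
      arr[i].b = i * 2;
      arr[i].c = 17;
      arr[i].d = i + 34;
      arr[i].e = i * 3 + 5;
      arr[i].f = i * 5;
      arr[i].g = i - 3;
      arr[i].h = 67;
      if (avoid_vect) /* well-defined, unlike testing an uninitialized y */
        abort ();
    }
}
```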
[Bug target/61810] init-regs.c papers over issues elsewhere
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61810

--- Comment #2 from Richard Biener ---
The vect testsuite is now clean on trunk.