[Bug target/61810] init-regs.c papers over issues elsewhere

2022-05-23 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61810

--- Comment #10 from Richard Biener  ---
(In reply to H.J. Lu from comment #9)
> Created attachment 53008 [details]
> A patch for pr104441-1a.c
> 
> Does it help?

Yes, that fixes the issue.

[Bug target/61810] init-regs.c papers over issues elsewhere

2022-05-20 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61810

--- Comment #9 from H.J. Lu  ---
Created attachment 53008
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53008&action=edit
A patch for pr104441-1a.c

Does it help?

[Bug target/61810] init-regs.c papers over issues elsewhere

2022-05-20 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61810

Alexander Monakov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #8 from Alexander Monakov  ---
(In reply to Richard Biener from comment #7)
> But it looks like the testcase is broken:
> 
> __attribute__((always_inline, target("avx2")))
> static __m256i
> load8bit_4x4_avx2(const uint8_t *const src, const uint32_t stride)
> { 
>   __m128i src01, src23;
>   src01 = _mm_cvtsi32_si128(*(int32_t*)(src + 0 * stride));
>   src23 = _mm_insert_epi32(src23, *(int32_t *)(src + 3 * stride), 1);
>   return _mm256_setr_m128i(src01, src23);
> }
> 
> it seems to expect that src23 is zero before inserting the data?

If you look in the original PR 104441 testcase, it has sensible code:

static __m256i __attribute__((always_inline))
load8bit_4x4_avx2(const uint8_t *const src, const uint32_t stride)
{
  __m128i src01, src23;
  src01 = _mm_cvtsi32_si128(*(int32_t*)(src + 0 * stride));
  src01 = _mm_insert_epi32(src01, *(int32_t *)(src + 1 * stride), 1);
  src23 = _mm_cvtsi32_si128(*(int32_t*)(src + 2 * stride));
  src23 = _mm_insert_epi32(src23, *(int32_t *)(src + 3 * stride), 1);
  return _mm256_setr_m128i(src01, src23);
}
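
To make the difference between the two versions concrete, here is a minimal sketch in portable C that models the two intrinsics on a plain four-lane struct. The names vec4i, model_cvtsi32, model_insert, load32 and load_rows23 are hypothetical stand-ins, not part of either testcase, and load32 uses memcpy where the testcase uses a pointer cast. The point is only that the first intrinsic defines every lane before the second one replaces lane 1, so no lane is ever read uninitialized.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __m128i: four 32-bit lanes.  */
typedef struct { int32_t lane[4]; } vec4i;

/* Models _mm_cvtsi32_si128: lane 0 = x, all other lanes zeroed.  */
static vec4i
model_cvtsi32 (int32_t x)
{
  vec4i v = { { x, 0, 0, 0 } };
  return v;
}

/* Models _mm_insert_epi32: copy V and replace lane IDX with X.
   Every other lane is read from V, so V must be fully defined.  */
static vec4i
model_insert (vec4i v, int32_t x, int idx)
{
  assert (idx >= 0 && idx < 4);
  v.lane[idx] = x;
  return v;
}

/* 32-bit load through memcpy (an assumption of this sketch; the
   testcase dereferences an int32_t pointer instead).  */
static int32_t
load32 (const uint8_t *p)
{
  int32_t x;
  memcpy (&x, p, sizeof x);
  return x;
}

/* Mirrors the src23 half of the original PR 104441 code: the first
   call defines all four lanes, the second only replaces lane 1.  */
static vec4i
load_rows23 (const uint8_t *src, uint32_t stride)
{
  vec4i src23 = model_cvtsi32 (load32 (src + 2 * stride));
  src23 = model_insert (src23, load32 (src + 3 * stride), 1);
  return src23;
}
```

In the reduced pr104441-1a.c variant the model_cvtsi32 step for src23 is missing, so model_insert would copy three lanes that were never written.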

[Bug target/61810] init-regs.c papers over issues elsewhere

2022-05-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61810

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl.tools at gmail dot com

--- Comment #7 from Richard Biener  ---
(In reply to Andrew Pinski from comment #6)
> https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577192.html

On current trunk x86_64 that gets

FAIL: gcc.target/i386/extract-insert-combining.c scan-assembler-times (?:vmovd|movd)[ t]+[^{\\n]*%xmm[0-9] 3
FAIL: gcc.target/i386/extract-insert-combining.c scan-assembler-times (?:vpinsrd|pinsrd)[ t]+[^{\\n]*%xmm[0-9] 1
FAIL: gcc.target/i386/pr104441-1b.c execution test
FAIL: gcc.target/i386/pr98335.c scan-assembler movzbl
FAIL: gcc.target/i386/pr98335.c scan-assembler-not movb

FAIL: gnat.dg/sso8.adb execution test

FAIL: libgomp.c/loop-19.c execution test


The FAILs can be reproduced in an unpatched tree by specifying
-fdisable-rtl-init-regs

Assembly difference for gcc.target/i386/pr104441-1b.c is (besides RA):

-       vpxor   %xmm1, %xmm1, %xmm1
+       vpinsrd $1, (%rax,%r10), %xmm5, %xmm1
+       vpinsrd $1, (%rdx,%r9), %xmm4, %xmm3
        vmovd   (%rax), %xmm0
-       vpxor   %xmm2, %xmm2, %xmm2
        addl    $4, %ecx
-       vpinsrd $1, (%rax,%r10), %xmm1, %xmm1
-       vpinsrd $1, (%rdx,%r9), %xmm2, %xmm2
adding initialization in compute4x_m_sad_avx2_intrin of reg 109 at in block 4 for insn 33.
adding initialization in compute4x_m_sad_avx2_intrin of reg 99 at in block 4 for insn 48.

where we have for example

-(insn 97 31 98 4 (clobber (reg/v:V2DI 109 [ src23 ])) "/home/rguenther/obj-gcc4-g/gcc/include/smmintrin.h":408:20 -1
-     (nil))
-(insn 98 97 33 4 (set (reg/v:V2DI 109 [ src23 ])
-        (const_vector:V2DI [
-                (const_int 0 [0]) repeated x2
-            ])) "/home/rguenther/obj-gcc4-g/gcc/include/smmintrin.h":408:20 -1
-     (nil))
-(insn 33 98 36 4 (set (reg:V4SI 138 [ src23 ])
+(insn 33 31 36 4 (set (reg:V4SI 138 [ src23 ])
         (vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 137 [ MEM[(int32_t *)src_62 + _41 * 1] ]))
             (subreg:V4SI (reg/v:V2DI 109 [ src23 ]) 0)
             (const_int 2 [0x2]))) "/home/rguenther/obj-gcc4-g/gcc/include/smmintrin.h":408:20 6925 {sse4_1_pinsrd}

where this produces { undef, MEM, undef, undef } without init-regs
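
As a rough illustration of why only one lane is defined, the following sketch models the vec_merge / vec_duplicate semantics of insn 33 on plain arrays. The helper names (model_vec_merge, model_vec_duplicate, model_pinsrd_lane1) and the LANES constant are hypothetical; this is not GCC code. In RTL, bit i of the vec_merge mask selects lane i from the first operand, otherwise from the second, so mask 2 takes lane 1 from the duplicated memory value and lanes 0, 2 and 3 from the old contents of reg 109.

```c
#include <assert.h>
#include <stdint.h>

enum { LANES = 4 };	/* V4SI: four 32-bit lanes.  */

/* Models RTL (vec_merge a b mask): lane i is taken from A when bit i
   of MASK is set, otherwise from B.  */
static void
model_vec_merge (int32_t out[LANES], const int32_t a[LANES],
		 const int32_t b[LANES], unsigned mask)
{
  assert (mask < (1u << LANES));
  for (int i = 0; i < LANES; i++)
    out[i] = ((mask >> i) & 1) ? a[i] : b[i];
}

/* Models RTL (vec_duplicate x): every lane holds X.  */
static void
model_vec_duplicate (int32_t out[LANES], int32_t x)
{
  for (int i = 0; i < LANES; i++)
    out[i] = x;
}

/* Models insn 33: merge the duplicated memory value MEM into OLD with
   mask 2, i.e. only lane 1 comes from MEM.  */
static void
model_pinsrd_lane1 (int32_t out[LANES], int32_t mem,
		    const int32_t old[LANES])
{
  int32_t dup[LANES];
  model_vec_duplicate (dup, mem);
  model_vec_merge (out, dup, old, 2u);
}
```

With old lanes { 10, 11, 12, 13 } and a memory value of 99 the model yields { 10, 99, 12, 13 }. With init-regs the old lanes are all zero; without it they are whatever the register happened to hold.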

But it looks like the testcase is broken:

__attribute__((always_inline, target("avx2")))
static __m256i
load8bit_4x4_avx2(const uint8_t *const src, const uint32_t stride)
{ 
  __m128i src01, src23;
  src01 = _mm_cvtsi32_si128(*(int32_t*)(src + 0 * stride));
  src23 = _mm_insert_epi32(src23, *(int32_t *)(src + 3 * stride), 1);
  return _mm256_setr_m128i(src01, src23);
}

it seems to expect that src23 is zero before inserting the data?

[Bug target/61810] init-regs.c papers over issues elsewhere

2021-11-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61810

--- Comment #6 from Andrew Pinski  ---
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577192.html

[Bug target/61810] init-regs.c papers over issues elsewhere

2021-08-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61810

--- Comment #5 from Richard Biener  ---
Re-running the experiment of disabling init-regs on x86_64 on trunk shows

+FAIL: gcc.dg/lto/pr48622 c_lto_pr48622_0.o-c_lto_pr48622_0.o link, -O -flto -finline-small-functions -fno-early-inlining
 FAIL: gcc.dg/tree-prof/20050826-2.c scan-tree-dump-not dom2 "Invalid sum"
+FAIL: gcc.target/i386/extract-insert-combining.c scan-assembler-times (?:vmovd|movd)[ t]+[^{\\n]*%xmm[0-9] 3
+FAIL: gcc.target/i386/extract-insert-combining.c scan-assembler-times (?:vpinsrd|pinsrd)[ t]+[^{\\n]*%xmm[0-9] 1

+FAIL: gnat.dg/sso8.adb execution test

with both -m64 and -m32

The gcc.dg/lto/pr48622 failure is a link-fail:

/usr/lib64/gcc/x86_64-suse-linux/7/../../../../x86_64-suse-linux/bin/ld: /tmp/cc8guozm.ltrans0.ltrans.o: in function `main':
:(.text+0x18): undefined reference to `ashift_qi_1'
collect2: error: ld returned 1 exit status
compiler exited with status 1

I think the testcase is broken. With init-regs, the test in

int
main ()
{
  if (ashift_qi_0 (0xff) != (u8) ((u8) 0xff << 0))
    abort ();

was likely resolved directly to abort (), leaving the rest of the code dead.

The gcc.target/i386/extract-insert-combining.c failure looks like a missed
optimization in combine when it faces uninitialized regs rather than all-zero
ones.  We get

pinsrd  $0, %esi, %xmm0
pinsrd  $0, %edi, %xmm1
movl    %esi, -12(%rsp)
paddd   %xmm0, %xmm1
pinsrd  $0, %esi, %xmm0
paddd   %xmm1, %xmm0
movd    %xmm0, %eax
ret

preserving the "uninitialized" state of the high part of %xmm0; when init-regs
explicitly zeros %xmm0, this is matched to movd instead.

I cannot assess what goes wrong with gnat.dg/sso8.adb, but it might be
a testsuite bug as well.

[Bug target/61810] init-regs.c papers over issues elsewhere

2015-12-04 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61810

--- Comment #3 from Richard Biener  ---
After the testsuite fixes, current trunk on x86_64 shows the following
regressions (bootstrap and test for all languages including Ada, Go and
Obj-C++, testing with {,-m32})

=== gcc tests ===

Running target unix/
UNRESOLVED: gcc.dg/lto/pr48622 c_lto_pr48622_0.o-c_lto_pr48622_0.o execute -O -flto -finline-small-functions -fno-early-inlining
FAIL: gcc.dg/lto/pr48622 c_lto_pr48622_0.o-c_lto_pr48622_0.o link, -O -flto -finline-small-functions -fno-early-inlining
FAIL: gcc.target/i386/extract-insert-combining.c scan-assembler-times (?:vmovd|movd)[ t]+[^{\\n]*%xmm[0-9] 3
FAIL: gcc.target/i386/extract-insert-combining.c scan-assembler-times (?:vpinsrd|pinsrd)[ t]+[^{\\n]*%xmm[0-9] 1

Running target unix//-m32
UNRESOLVED: gcc.dg/lto/pr48622 c_lto_pr48622_0.o-c_lto_pr48622_0.o execute -O -flto -finline-small-functions -fno-early-inlining
FAIL: gcc.dg/lto/pr48622 c_lto_pr48622_0.o-c_lto_pr48622_0.o link, -O -flto -finline-small-functions -fno-early-inlining
FAIL: gcc.dg/torture/pr54098.c   -Os  (internal compiler error)
FAIL: gcc.dg/torture/pr54098.c   -Os  (test for excess errors)
FAIL: gcc.target/i386/extract-insert-combining.c scan-assembler-times (?:vmovd|movd)[ t]+[^{\\n]*%xmm[0-9] 3
FAIL: gcc.target/i386/extract-insert-combining.c scan-assembler-times (?:vpinsrd|pinsrd)[ t]+[^{\\n]*%xmm[0-9] 1
FAIL: gcc.target/i386/pr22152.c scan-assembler-times movq[ t]+[^\\n]*%mm 1

=== libgo Summary for unix ===

# of expected passes            133

Running target unix/-m32
FAIL: math

=== libgomp Summary for unix/ ===

Running target unix//-m32
FAIL: libgomp.c++/target-6.C execution test

=== libstdc++ tests ===

Running target unix/
FAIL: experimental/propagate_const/cons/default.cc execution test

Running target unix//-m32
FAIL: experimental/propagate_const/cons/default.cc execution test

[Bug target/61810] init-regs.c papers over issues elsewhere

2015-12-04 Thread ebotcazou at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61810

Eric Botcazou  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-12-04
                 CC|                            |ebotcazou at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #4 from Eric Botcazou  ---
Probably worth investigating indeed.

[Bug target/61810] init-regs.c papers over issues elsewhere

2015-12-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61810

--- Comment #1 from Richard Biener  ---
For

FAIL: gcc.dg/vect/vect-strided-a-u8-i8-gap7.c execution test

init-regs indeed 'makes combine happy'; the diff of the combine dump with the
pass enabled vs. disabled is

***************
*** 912,935 ****
  (plus:QI (reg:QI 264 [ ivtmp.50 ])
  (const_int -3 [0xfffd])))

! Trying 401 -> 186:
  Failed to match this instruction:
  (set (pc)
! (pc))
! Successfully matched this instruction:
! (set (pc)
! (label_ref 191))
! allowing combination of insns 401 and 186
! original costs 4 + 4 = 0
! replacement cost 4
! deferring deletion of insn with uid = 401.
! modifying other_insn   187: pc=L191
!   REG_BR_PROB 9996
! deferring rescan insn with uid = 187.

  Trying 193 -> 197:
  Failed to match this instruction:
--- 911,923 ----
  (plus:QI (reg:QI 264 [ ivtmp.50 ])
  (const_int -3 [0xfffd])))

! Trying 186 -> 187:
  Failed to match this instruction:
  (set (pc)
! (if_then_else (eq (reg/v:QI 255 [ y ])
! (const_int 0 [0]))
! (label_ref 191)
! (pc)))

  Trying 193 -> 197:
  Failed to match this instruction:

where 'y' is unused and has a set to zero.  But this shows an issue with
the testcase which uses uninitialized y to abort(!)

__attribute__ ((noinline)) int
main1 ()
{
  int i;
  s arr[N];
  s *ptr = arr;
  s res[N];
  unsigned char u, t, s, x, y, z, w;

  for (i = 0; i < N; i++)
    {
      arr[i].a = i;
      arr[i].b = i * 2;
      arr[i].c = 17;
      arr[i].d = i+34;
      arr[i].e = i * 3 + 5;
      arr[i].f = i * 5;
      arr[i].g = i - 3;
      arr[i].h = 67;
      if (y) /* Avoid vectorization.  */
        abort ();
    }

probably in the others as well.  I'm going to fix those.
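
One possible way to keep the "avoid vectorization" guard without reading an uninitialized variable is sketched below. It is only an illustration and not necessarily the change that was committed. The struct is trimmed to two fields, and N, the struct layout and the function name fill are assumptions made for this sketch. The guard variable is volatile and explicitly zero, so the compiler has to perform the load on every iteration and cannot fold the branch, while the read itself is well defined.

```c
#include <assert.h>
#include <stdlib.h>

#define N 16	/* Hypothetical size, chosen for this sketch.  */

/* Trimmed version of the testcase's struct (assumption).  */
typedef struct { unsigned char a, b; } s;

/* Explicitly initialized and volatile: each read is a real load with a
   defined value, so the abort () path stays live but is never taken.  */
volatile int y = 0;

__attribute__ ((noinline)) int
fill (s *arr, int n)
{
  assert (n >= 0 && n <= N);
  for (int i = 0; i < n; i++)
    {
      arr[i].a = i;
      arr[i].b = i * 2;
      if (y) /* Avoid vectorization.  */
	abort ();
    }
  return n;
}
```

With this shape the outcome no longer depends on whether init-regs happens to zero the pseudo that holds y.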

[Bug target/61810] init-regs.c papers over issues elsewhere

2015-12-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61810

--- Comment #2 from Richard Biener  ---
vect testsuite is now clean on trunk.