[Bug middle-end/93919] [10 Regression] vectorization of 18 char to char16_t conversion is miscompiled

2020-02-27 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93919

Matthias Kretz (Vir)  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|FIXED                       |DUPLICATE

--- Comment #6 from Matthias Kretz (Vir)  ---


*** This bug has been marked as a duplicate of bug 93843 ***

[Bug middle-end/93919] [10 Regression] vectorization of 18 char to char16_t conversion is miscompiled

2020-02-27 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93919

Matthias Kretz (Vir)  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #5 from Matthias Kretz (Vir)  ---
This is fixed now that PR93843 has been fixed.

[Bug middle-end/93919] [10 Regression] vectorization of 18 char to char16_t conversion is miscompiled

2020-02-26 Thread rsandifo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93919
Bug 93919 depends on bug 93843, which changed state.

Bug 93843 Summary: [10 Regression] wrong code at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93843

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

[Bug middle-end/93919] [10 Regression] vectorization of 18 char to char16_t conversion is miscompiled

2020-02-25 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93919

--- Comment #4 from Matthias Kretz (Vir)  ---
Yes, this is the same issue.

FWIW, a vectorization with SSE4.1 could do:
  pxor xmm0, xmm0
  pinsrw xmm0, WORD PTR in[rip], 0
  pmovsxbw xmm0, xmm0
  movd DWORD PTR out[rip], xmm0

Whether that's faster than
  movsx eax, BYTE PTR in[rip]
  mov WORD PTR out[rip], ax
  movsx eax, BYTE PTR in[rip+1]
  mov WORD PTR out[rip+2], ax

probably depends on whether the load/store ports are limiting the performance
on this section of code. Without SSE4.1 I don't think it's worth vectorizing
this conversion.

In any case, my analysis that there's an out-of-bounds store was wrong. Please
disregard.
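
For readers who prefer C to assembly, the two sequences above can be sketched roughly as follows. This is only an illustration under assumptions: the function names are made up, `uint16_t` stands in for `char16_t`, `signed char` is used so the sign extension matches `movsx` on every target, and the vector path uses `movd` (`_mm_cvtsi32_si128`) in place of the `pxor` + `pinsrw` pair. When SSE4.1 is not enabled at compile time the vector variant simply falls back to the scalar one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE4_1__)
#include <smmintrin.h>
#endif

/* Scalar form: two sign-extending byte loads, two 16-bit stores. */
void widen2_scalar(const signed char in[2], uint16_t out[2])
{
    assert(in != NULL && out != NULL);
    out[0] = (uint16_t)(int16_t)in[0];
    out[1] = (uint16_t)(int16_t)in[1];
}

/* Vector form: one 16-bit load, pmovsxbw, one 32-bit store. */
void widen2_vector(const signed char in[2], uint16_t out[2])
{
    assert(in != NULL && out != NULL);
#if defined(__SSE4_1__)
    uint16_t pair;
    memcpy(&pair, in, sizeof pair);           /* load both chars at once */
    __m128i v = _mm_cvtsi32_si128((int)pair); /* upper lanes are zeroed */
    v = _mm_cvtepi8_epi16(v);                 /* pmovsxbw: per-byte sign extension */
    int32_t packed = _mm_cvtsi128_si32(v);    /* low two 16-bit lanes */
    memcpy(out, &packed, sizeof packed);      /* store both results at once */
#else
    widen2_scalar(in, out);                   /* fallback without SSE4.1 */
#endif
}
```

Both functions should produce the same two elements for any input; which one is faster is the open question raised above.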

[Bug middle-end/93919] [10 Regression] vectorization of 18 char to char16_t conversion is miscompiled

2020-02-25 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93919

--- Comment #3 from Andrew Pinski  ---
(In reply to Richard Biener from comment #2)
> Sounds like a dup of PR93843

Yes it does.

[Bug middle-end/93919] [10 Regression] vectorization of 18 char to char16_t conversion is miscompiled

2020-02-25 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93919

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1
                 CC|                            |rsandifo at gcc dot gnu.org
         Depends on|                            |93843

--- Comment #2 from Richard Biener  ---
Sounds like a dup of PR93843


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93843
[Bug 93843] [10 Regression] wrong code at -O3 on x86_64-linux-gnu

[Bug middle-end/93919] [10 Regression] vectorization of 18 char to char16_t conversion is miscompiled

2020-02-25 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93919

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|x86_64-*-*, i?86-*-*        |x86_64-*-*, i?86-*-*,
                   |                            |aarch64-linux-gnu
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-02-25
          Component|tree-optimization           |middle-end
   Target Milestone|---                         |10.0
     Ever confirmed|0                           |1

--- Comment #1 from Andrew Pinski  ---
  vector(8) char16_t vect__20.10; //8*2 = 16 bytes

  vect__19.9_55 = MEM  [(vector(32) char *)];
  vect__20.10_56 = [vec_unpack_lo_expr] vect__19.9_55;
  vect__20.10_57 = [vec_unpack_hi_expr] vect__19.9_55;
  MEM  [(char16_t *)] = vect__20.10_56;
  MEM  [(char16_t *) + 16B] = vect__20.10_57;

  vect__20.15_63 = MEM  [(vector(32) char *) + 16B];
  vect__33.16_58 = (vector(2) char16_t) vect__20.15_63;
  MEM  [(char16_t *) + 32B] = vect__33.16_58;


;; MEM  [(char16_t *) + 32B] = vect__33.16_58;

(insn 165 164 166 (set (reg:SI 251)
        (sign_extend:SI (mem/c:HI (plus:DI (reg/f:DI 77 virtual-stack-vars)
                    (const_int -80 [0xffb0])) [0 MEM  [(vector(32) char *) + 16B]+0 S2 A128]))) "t87656.c":14:14 -1
     (nil))

(insn 166 165 0 (set (mem/c:SI (plus:DI (reg/f:DI 77 virtual-stack-vars)
                (const_int -32 [0xffe0])) [1 MEM  [(char16_t *) + 32B]+0 S4 A256])
        (reg:SI 251)) "t87656.c":14:14 -1
     (nil))

Huh? We also get this bad code on aarch64-linux-gnu.
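
To make the problem concrete, here is a small C model of one reading of the dump above: the last two elements (bytes 16 and 17 of the 18) are loaded as a single 16-bit value, sign-extended to 32 bits as a scalar, and stored as one 32-bit word, instead of each char being extended separately. This is a sketch, not the original testcase: the function names are hypothetical, `uint16_t` stands in for `char16_t`, and the byte order is modelled explicitly as little-endian so the result does not depend on the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Intended semantics: sign-extend every element on its own. */
void widen_elementwise(const signed char *in, uint16_t *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i)
        out[i] = (uint16_t)(int16_t)in[i];
}

/* Model of insns 165/166 (assumed little-endian layout):
   HImode load, scalar sign_extend to SImode, SImode store. */
void widen_tail_as_scalar(const signed char in[2], uint16_t out[2])
{
    assert(in != NULL && out != NULL);
    /* mem:HI load: both chars fused into one 16-bit value */
    uint16_t half = (uint16_t)((uint8_t)in[0] | ((uint16_t)(uint8_t)in[1] << 8));
    /* sign_extend:SI of the fused value */
    int32_t word = (half & 0x8000u) ? (int32_t)half - 0x10000 : (int32_t)half;
    /* mem:SI store: the 32-bit word overwrites both output elements */
    uint32_t bits = (uint32_t)word;
    out[0] = (uint16_t)(bits & 0xFFFFu);
    out[1] = (uint16_t)(bits >> 16);
}
```

For the input pair {1, 2}, `widen_elementwise` gives {0x0001, 0x0002}, while the scalar model gives {0x0201, 0x0000}: the two chars end up fused into the first element and the second element only receives sign bits.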