[Bug c++/100261] New: [11/12 Regression] ICE: tree check: expected var_decl or type_decl, have error_mark in emit_tinfo_decl, at cp/rtti.c:1643

2021-04-25 Thread asolokha at gmx dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100261

Bug ID: 100261
   Summary: [11/12 Regression] ICE: tree check: expected var_decl
or type_decl, have error_mark in emit_tinfo_decl, at
cp/rtti.c:1643
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: asolokha at gmx dot com
  Target Milestone: ---

gcc-11.0.1-alpha20210418 snapshot (g:b412ce8e961052e6becea3bc783a53e1d5feaa0f)
ICEs when compiling the following testcase, reduced from
libstdc++-v3/testsuite/18_support/type_info/fundamental.cc:

#include <typeinfo>

namespace std {
  namespace decimal {
class decimal32 {
  float private__decfloat32;
};
  }
}

void
foo ()
{
  typeid (float);
  typeid (std::decimal::decimal32);
}

% g++-11.0.1 -c dvovnhjr.cc
dvovnhjr.cc:15:34: error: conflicting declaration 'const
__class_type_info_pseudo_8 _ZTIf'
   15 |   typeid (std::decimal::decimal32);
  |  ^
dvovnhjr.cc:14:16: note: previous declaration as 'const
__fundamental_type_info_pseudo_2 _ZTIf'
   14 |   typeid (float);
  |^
dvovnhjr.cc:16:1: internal compiler error: tree check: expected var_decl or
type_decl, have error_mark in emit_tinfo_decl, at cp/rtti.c:1643
   16 | }
  | ^
0x814248 tree_check_failed(tree_node const*, char const*, int, char const*,
...)
   
/var/tmp/portage/sys-devel/gcc-11.0.1_alpha20210418/work/gcc-11-20210418/gcc/tree.c:9816
0x6c3715 tree_check2(tree_node*, char const*, int, char const*, tree_code,
tree_code)
   
/var/tmp/portage/sys-devel/gcc-11.0.1_alpha20210418/work/gcc-11-20210418/gcc/tree.h:3372
0x6c3715 emit_tinfo_decl(tree_node*)
   
/var/tmp/portage/sys-devel/gcc-11.0.1_alpha20210418/work/gcc-11-20210418/gcc/cp/rtti.c:1643
0x99d5cc c_parse_final_cleanups()
   
/var/tmp/portage/sys-devel/gcc-11.0.1_alpha20210418/work/gcc-11-20210418/gcc/cp/decl2.c:4994

[Bug tree-optimization/100253] [10/11/12 Regression] wrong code with -O2 -fno-tree-bit-ccp -ftree-slp-vectorize (unaligned movdqa)

2021-04-25 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100253

--- Comment #3 from Hongtao.liu  ---
(In reply to Andrew Pinski from comment #2)
> The problem is right away in expand:
> ;; vect__36.383_12 = MEM  [(char *
> {ref-all})_10 + 16B];
> 
> (insn 23 22 0 (set (reg:V1TI 88 [ vect__36.383 ])
> (mem:V1TI (plus:DI (reg/f:DI 86 [ _10 ])
> (const_int 16 [0x10])) [0 MEM 
> [(char * {ref-all})_10 + 16B]+0 S16 A128])) -1
>  (nil))
> 
> 
> I think SLP did not mark the load as unaligned even though it knows it is
> one:
But gimple tree is marked as aligned.

 
unit-size 
align:128 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x7fffea300a80 precision:128 min  max

pointer_to_this >
unsigned V1TI size  unit-size

align:128 warn_if_not_align:0 symtab:0 alias-set 31 canonical-type
0x7fffe9a59150 nunits:1
pointer_to_this >

arg:0 
sizes-gimplified public unsigned type_6 DI
size 
unit-size 
align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x7fffea30c498>
visited
def_stmt _11 =  + _214;
version:11
ptr-info 0x7fffe9487330>
arg:1 
constant 16>>

[Bug c/100260] New: DSE: join stores

2021-04-25 Thread david.bolvansky at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100260

Bug ID: 100260
   Summary: DSE: join stores
   Product: gcc
   Version: tree-ssa
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david.bolvansky at gmail dot com
  Target Milestone: ---

#include <string.h>

struct pam {
  void *p1;
  void *p2;
  #ifdef LONG
  unsigned long size;
  #else
  unsigned int pad;
  unsigned int size;
  #endif
};

extern int use(struct pam *param);

unsigned int foo(void) {
  struct pam s_pam;
  memset(&s_pam, 0, sizeof(struct pam));
  s_pam.size = 1;
  return use(&s_pam);
}

INT

foo():
  sub rsp, 40
  pxor xmm0, xmm0
  mov rdi, rsp
  mov DWORD PTR [rsp+16], 0
  mov DWORD PTR [rsp+20], 1
  movaps XMMWORD PTR [rsp], xmm0
  call use(pam*)
  add rsp, 40
  ret

LONG

foo():
  sub rsp, 40
  pxor xmm0, xmm0
  mov rdi, rsp
  movaps XMMWORD PTR [rsp], xmm0
  mov QWORD PTR [rsp+16], 1
  call use(pam*)
  add rsp, 40
  ret

Stores
  mov DWORD PTR [rsp+16], 0
  mov DWORD PTR [rsp+20], 1
can be replaced with one mov QWORD..
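
To make the requested merge concrete, here is a minimal sketch, assuming a little-endian target such as x86-64: the two 32-bit stores (pad = 0, size = 1) leave the same bytes as one 64-bit store of 1 << 32. The helper names are made up for illustration and are not part of the report.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write pad = 0 and size = 1 as two 32-bit stores,
   mirroring the two "mov DWORD PTR" instructions.  */
void two_stores(unsigned char *buf)
{
  uint32_t pad = 0, size = 1;
  memcpy(buf, &pad, sizeof pad);
  memcpy(buf + 4, &size, sizeof size);
}

/* Hypothetical helper: the same bytes written as one 64-bit store.
   Assumes little-endian byte order, so the low half lands first.  */
void one_store(unsigned char *buf)
{
  uint64_t merged = (uint64_t)1 << 32;
  memcpy(buf, &merged, sizeof merged);
}

/* Asserts that both sequences leave identical memory contents.  */
void check_stores_match(void)
{
  unsigned char a[8], b[8];
  two_stores(a);
  one_store(b);
  assert(memcmp(a, b, sizeof a) == 0);
}
```

Calling check_stores_match() passes on a little-endian host; on a big-endian one the merged constant would have to be 1 instead.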

[Bug c++/100248] ICE with global "default" keyword

2021-04-25 Thread hewillk at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100248

--- Comment #1 from 康桓瑋  ---
Reduced to no header:

struct S {};
bool operator==(S&&, S&&) = default;

<source>:2:29: internal compiler error: Segmentation fault
2 | bool operator==(S&&, S&&) = default;
  | ^~~
0x1cff019 internal_error(char const*, ...)
???:0
0x1366c00 strip_array_types(tree_node*)
???:0
0x9c9778 cp_type_quals(tree_node const*)
???:0
0x9ba4c3 cp_build_qualified_type_real(tree_node*, int, int)
???:0
0x82f194 defaultable_fn_check(tree_node*)
???:0
0x7b2cc8 cp_finish_decl(tree_node*, tree_node*, bool, tree_node*, int)
???:0
0x8e1f6d c_parse_file()
???:0
0xa621b2 c_common_parse_file()
???:0
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.

[Bug libstdc++/100259] New: ODR violations in <experimental/internet>

2021-04-25 Thread aaron at aarongraham dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100259

Bug ID: 100259
   Summary: ODR violations in <experimental/internet>
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: aaron at aarongraham dot com
  Target Milestone: ---

The current implementation in <experimental/internet> has functions that
violate the ODR:

  std::experimental::net::ip::make_error_code
  std::experimental::net::ip::make_error_condition
  std::experimental::net::ip::make_network_v4

It seems these should be inline and/or constexpr. There are probably others.

[Bug middle-end/95922] Failure to optimize `((b ^ a) & c) ^ a` to `(a & ~c) | (b & c)` the right way on architectures with andnot

2021-04-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95922

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement
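
For reference, the two forms in the summary compute the same bitwise select (take b where c is set, a elsewhere). A minimal sketch that checks this exhaustively over 8-bit values; the function names are made up here:

```c
#include <assert.h>
#include <stdint.h>

/* Form from the bug title: xor-based select.  */
uint8_t select_xor(uint8_t a, uint8_t b, uint8_t c)
{
  return (uint8_t)(((b ^ a) & c) ^ a);
}

/* Target form, which maps onto andnot + and + or.  */
uint8_t select_andnot(uint8_t a, uint8_t b, uint8_t c)
{
  return (uint8_t)((a & ~c) | (b & c));
}

/* Asserts equality for every 8-bit (a, b, c) triple.  */
void check_select_identity(void)
{
  for (unsigned a = 0; a < 256; a++)
    for (unsigned b = 0; b < 256; b++)
      for (unsigned c = 0; c < 256; c++)
        assert(select_xor(a, b, c) == select_andnot(a, b, c));
}
```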

[Bug rtl-optimization/94806] Failure to optimize unary minus for 128-bit operand

2021-04-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94806

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug target/94863] Failure to use blendps over mov when possible

2021-04-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94863

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug target/94870] Failure to use movhlps instead of separated mov+unpckhpd

2021-04-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94870

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug tree-optimization/94930] Failure to optimize out subvsi in expansion of __builtin_memcmp with 1 as the operand with -ftrapv

2021-04-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94930

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug target/100257] poor codegen with vcvtph2ps / stride of 6

2021-04-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100257

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug tree-optimization/94898] Failure to optimize compare plus sub of same operands into compare

2021-04-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94898

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug tree-optimization/94916] Failure to optimize pattern into difference or zero selector

2021-04-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94916

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug tree-optimization/95404] Failure to optimize compare to power of 2 and bitwise and to more direct bitwise and

2021-04-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95404

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug tree-optimization/95408] Failure to optimize bitwise and with negated conditional using the same operand to conditional with decremented operand

2021-04-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95408

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug tree-optimization/95738] Failure to optimize comparison of vector after sign xor to unsigned comparison

2021-04-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95738

Andrew Pinski  changed:

   What|Removed |Added

Summary|Failure to optimize |Failure to optimize
   |comparison of float after   |comparison of vector after
   |sign xor to unsigned|sign xor to unsigned
   |comparison  |comparison
   Severity|normal  |enhancement

--- Comment #1 from Andrew Pinski  ---
Vector optimizations are less likely to be implemented, really.

[Bug middle-end/19987] [meta-bug] fold missing optimizations in general

2021-04-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19987
Bug 19987 depends on bug 95914, which changed state.

Bug 95914 Summary: Failure to optimize saturated add properly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95914

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

[Bug target/95914] Failure to optimize saturated add properly

2021-04-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95914

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement
   Target Milestone|--- |11.0
 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Andrew Pinski  ---
.ident  "GCC: (GNU) 11.0.1 20210228 (experimental) [master revision
5d9d6c1cd8d:fd96f7217ea:ec9dc4fa0803cb85ae0b981ca0d6a406e8f6669c]"

Produces:

        addw    %di, %si
        movl    $-1, %eax
        cmovnc  %esi, %eax

Which is exactly what you would have expected really.

This is because we find ADD_OVERFLOW now:
  _6 = .ADD_OVERFLOW (a_2(D), b_3(D));
  c_4 = REALPART_EXPR <_6>;
  _7 = IMAGPART_EXPR <_6>;
  if (_7 == 0)
goto ; [65.00%]
  else
goto ; [35.00%]

   [local count: 375809640]:

   [local count: 1073741824]:
  # iftmp.0_1 = PHI 

So fixed.
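
The source for this PR is not quoted in this message. As an assumption, a 16-bit unsigned saturating add written with a wraparound test looks like the first function below; the second spells out the same thing with __builtin_add_overflow, the GCC builtin that is lowered to .ADD_OVERFLOW. The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shape of the original: detect wraparound by comparing the
   truncated sum against an operand.  */
uint16_t sat_add_compare(uint16_t a, uint16_t b)
{
  uint16_t c = (uint16_t)(a + b);
  return c < a ? UINT16_MAX : c;
}

/* Same result via the GCC/Clang overflow builtin.  */
uint16_t sat_add_builtin(uint16_t a, uint16_t b)
{
  uint16_t c;
  return __builtin_add_overflow(a, b, &c) ? UINT16_MAX : c;
}

/* Asserts that both forms agree on a grid of inputs.  */
void check_sat_add(void)
{
  for (unsigned a = 0; a <= UINT16_MAX; a += 257)
    for (unsigned b = 0; b <= UINT16_MAX; b += 251)
      assert(sat_add_compare(a, b) == sat_add_builtin(a, b));
}
```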

[Bug tree-optimization/95923] Failure to optimize bool checks into and

2021-04-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95923

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug tree-optimization/96172] Failure to optimize direct assignment to bitfield through shifts

2021-04-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96172

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug tree-optimization/96702] Failure to optimize comparisons involving result of subtraction

2021-04-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96702

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug tree-optimization/94893] Sign function not getting optimized to simple compare

2021-04-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94893

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2021-04-26
 Status|UNCONFIRMED |NEW

--- Comment #2 from Andrew Pinski  ---
Confirmed.
Note the original code does have one slight undefined case, when x is INT_MIN,
but that can be fixed by casting to unsigned before doing the negate of x.
That is:
inline int sign(int x)
{
return (x >> 31) | (-(unsigned)x >> 31);
}

 CUT 
Matching this:
  _1 = x_5(D) >> 31;
  x.0_2 = (unsigned int) x_5(D);
  _3 = -x.0_2;
  _4 = _3 >> 31;
  _8 = (int) _4;
  _6 = _1 | _8;

Into:
t = (int)(x > 0);
result = x < 0 ? - 1 : t;
Might not be the best thing.
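
A minimal sketch to compare the two forms side by side, assuming 32-bit int and an arithmetic right shift for negative values (which is what GCC does; the C standard leaves it implementation-defined). The function names are made up here.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Shift-based form, with the unsigned cast so INT_MIN is well defined.  */
int sign_shift(int x)
{
  return (x >> 31) | (int)(-(unsigned)x >> 31);
}

/* Comparison-based form that the match would produce.  */
int sign_compare(int x)
{
  int t = (int)(x > 0);
  return x < 0 ? -1 : t;
}

/* Asserts that both forms agree on boundary values.  */
void check_sign(void)
{
  const int samples[] = { INT_MIN, INT_MIN + 1, -2, -1, 0, 1, 2, INT_MAX };
  for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert(sign_shift(samples[i]) == sign_compare(samples[i]));
}
```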

[Bug tree-optimization/94893] Sign function not getting optimized to simple compare

2021-04-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94893

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug middle-end/98710] missing optimization (x | c) & ~(y | c) -> x & ~(y | c)

2021-04-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98710

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement
   Last reconfirmed||2021-04-26
 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Andrew Pinski  ---
I will be implementing this for GCC 12.
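
The simplification holds because every bit set in c is cleared by the ~(y | c) mask anyway, so or-ing c into x first changes nothing. A small sketch that checks it; since the operators act on each bit independently, the eight single-bit combinations cover every case. The names before/after are illustrative.

```c
#include <assert.h>

/* Expression as written in the summary.  */
unsigned before(unsigned x, unsigned y, unsigned c)
{
  return (x | c) & ~(y | c);
}

/* Simplified expression.  */
unsigned after(unsigned x, unsigned y, unsigned c)
{
  return x & ~(y | c);
}

/* Asserts equality for all single-bit inputs.  */
void check_simplification(void)
{
  for (unsigned x = 0; x < 2; x++)
    for (unsigned y = 0; y < 2; y++)
      for (unsigned c = 0; c < 2; c++)
        assert(before(x, y, c) == after(x, y, c));
}
```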

[Bug c++/97952] Poor optimization of closure-like construct in C++ as compared to C

2021-04-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97952

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug c++/100252] Internal compiler error during template instantiation

2021-04-25 Thread sand at rifkin dot dev via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100252

--- Comment #2 from Jeremy R.  ---
Even more minimal case: https://godbolt.org/z/M3Tv9oqcn

[Bug target/100257] poor codegen with vcvtph2ps / stride of 6

2021-04-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100257

--- Comment #1 from Andrew Pinski  ---
Looks like a few missed optimizations at the tree level (and a target issue of
the store):
  memcpy (&pixel, src_33, 6);
  _1 = pixel.b;
  _2 = pixel.g;
  _3 = pixel.r;
  val_2.0_21 = (short int) _1;
  val_1.1_22 = (short int) _2;
  val_0.2_23 = (short int) _3;
  _24 = {val_0.2_23, val_1.1_22, val_2.0_21, 0, 0, 0, 0, 0};
  _25 = __builtin_ia32_vcvtph2ps (_24);
  _14 = BIT_FIELD_REF <_25, 64, 0>;
  _28 = BIT_FIELD_REF <_25, 32, 64>;
  MEM  [(float *)dst_34] = _14;
  MEM[(float *)dst_34 + 8B] = _28;
  MEM[(float *)dst_34 + 12B] = 1.0e+0;


The store issue is now PR 100258.
This is more about the missed optimization of the first part, the conversion.

[Bug target/100258] New: constant store pulled out of the loop causes an extra memory load

2021-04-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100258

Bug ID: 100258
   Summary: constant store pulled out of the loop causes an extra
memory load
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---
Target: x86_64-linux-gnu

Take:
void f(float *x, int t)
{
  for(int i = 0; i < t; i++)
    x[i*3] = 1.0;
}

Right now this produces for it at -O2:
        testl   %esi, %esi
        jle     .L5
        leal    -1(%rsi), %eax
        leaq    (%rax,%rax,2), %rax
        vmovss  .LC0(%rip), %xmm0
        leaq    12(%rdi,%rax,4), %rax
        .p2align 4,,10
        .p2align 3
.L3:
        vmovss  %xmm0, (%rdi)
        addq    $12, %rdi
        cmpq    %rax, %rdi
        jne     .L3
.L5:
        ret

- CUT 
If we don't have a loop, e.g. just a store to *x, we get:
        movl    $0x3f800000, (%rdi)
which is 100x more efficient. We just need a loop around that, without
doing the load of .LC0.

[Bug libstdc++/100017] error: 'fenv_t' has not been declared in '::' x86_64-w64-mingw32 host cross toolchain fails to build

2021-04-25 Thread davem at devkitpro dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100017

--- Comment #12 from Dave Murphy  ---
A naive patch based on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100017#c7
gets my Canadian crosses building.

diff --git a/libstdc++-v3/include/c_compatibility/fenv.h
b/libstdc++-v3/include/c_compatibility/fenv.h
index 0413e3b7c25..56cabaa3635 100644
--- a/libstdc++-v3/include/c_compatibility/fenv.h
+++ b/libstdc++-v3/include/c_compatibility/fenv.h
@@ -26,6 +26,10 @@
  *  This is a Standard C++ Library header.
  */

+#if !defined __cplusplus || defined _GLIBCXX_INCLUDE_NEXT_C_HEADERS
+# include_next <fenv.h>
+#else
+
 #ifndef _GLIBCXX_FENV_H
 #define _GLIBCXX_FENV_H 1

diff --git a/libstdc++-v3/include/c_global/cfenv
b/libstdc++-v3/include/c_global/cfenv
index 0b0ec35a837..d24cb1a3c81 100644
--- a/libstdc++-v3/include/c_global/cfenv
+++ b/libstdc++-v3/include/c_global/cfenv
@@ -37,9 +37,11 @@

 #include <bits/c++config.h>

-#if _GLIBCXX_HAVE_FENV_H
-# include <fenv.h>
-#endif
+// Need to ensure this finds the C library's <fenv.h> not a libstdc++
+// wrapper that might already be installed later in the include search path.
+#define _GLIBCXX_INCLUDE_NEXT_C_HEADERS
+#include_next <fenv.h>
+#undef _GLIBCXX_INCLUDE_NEXT_C_HEADERS

 #ifdef _GLIBCXX_USE_C99_FENV_TR1

[Bug c/100257] New: poor codegen with vcvtph2ps / stride of 6

2021-04-25 Thread witold.baryluk+gcc at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100257

Bug ID: 100257
   Summary: poor codegen with vcvtph2ps / stride of 6
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: witold.baryluk+gcc at gmail dot com
  Target Milestone: ---

gcc (Compiler-Explorer-Build) 12.0.0 20210424 (experimental)


https://godbolt.org/z/n6ooMdnz8


This C code:

```
#include <stdint.h>
#include <string.h>
#include <immintrin.h>

struct float3 {
float f1;
float f2;
float f3;
};

struct util_format_r16g16b16_float {
   uint16_t r;
   uint16_t g;
   uint16_t b;
};

static inline struct float3 _mesa_half3_to_float3(uint16_t val_0, uint16_t
val_1, uint16_t val_2) {
#if defined(__F16C__)
  //const __m128i in = {val_0, val_1, val_2};
  //__m128 out;
  //__asm volatile("vcvtph2ps %1, %0" : "=v"(out) : "v"(in));

  const __m128i in = _mm_setr_epi16(val_0, val_1, val_2, 0, 0, 0, 0, 0);
  const __m128 out = _mm_cvtph_ps(in);

  const struct float3 r = {out[0], out[1], out[2]};
  return r;
#endif
}


void
util_format_r16g16b16_float_unpack_rgba_float(void *restrict dst_row, const
uint8_t *restrict src, unsigned width)
{
   float *dst = dst_row;
   for (unsigned x = 0; x < width; x += 1) {
      const struct util_format_r16g16b16_float pixel;
      memcpy(&pixel, src, sizeof pixel);

      struct float3 r = _mesa_half3_to_float3(pixel.r, pixel.g, pixel.b);
      dst[0] = r.f1; /* r */
      dst[1] = r.f2; /* g */
      dst[2] = r.f3; /* b */
      dst[3] = 1; /* a */

      src += 6;
      dst += 4;
   }
}

```

is compiled "poorly" by gcc, and even worse on i386 (with -mf16c
enabled) when using -fPIE.

Example:


gcc -O3 -m32 -march=znver2 -mfpmath=sse -fPIE

util_format_r16g16b16_float_unpack_rgba_float:
        push    ebp
        push    edi
        push    esi
        push    ebx
        sub     esp, 28
        mov     ecx, DWORD PTR 56[esp]
        mov     edx, DWORD PTR 48[esp]
        call    __x86.get_pc_thunk.ax
        add     eax, OFFSET FLAT:_GLOBAL_OFFSET_TABLE_
        mov     ebx, DWORD PTR 52[esp]
        test    ecx, ecx
        je      .L8
        vmovss  xmm3, DWORD PTR .LC0@GOTOFF[eax]
        xor     esi, esi
        xor     ebp, ebp
        vpxor   xmm2, xmm2, xmm2
.L3:
        mov     eax, DWORD PTR [ebx]
        vmovss  DWORD PTR 12[edx], xmm3
        add     ebx, 6
        add     edx, 16
        inc     esi
        mov     ecx, eax
        vmovd   xmm0, eax
        shr     ecx, 16
        mov     edi, ecx
        movzx   ecx, WORD PTR -2[ebx]
        vpinsrw xmm0, xmm0, edi, 1
        vmovd   xmm1, ecx
        vpinsrw xmm1, xmm1, ebp, 1
        vpunpckldq      xmm0, xmm0, xmm1
        vpunpcklqdq     xmm0, xmm0, xmm2
        vcvtph2ps       xmm0, xmm0
        vmovss  DWORD PTR -16[edx], xmm0
        vextractps      DWORD PTR -12[edx], xmm0, 1
        vextractps      DWORD PTR -8[edx], xmm0, 2
        cmp     DWORD PTR 56[esp], esi
        jne     .L3
.L8:
        add     esp, 28
        pop     ebx
        pop     esi
        pop     edi
        pop     ebp
        ret
.LC0:
        .long   1065353216
__x86.get_pc_thunk.ax:
        mov     eax, DWORD PTR [esp]
        ret



clang:

util_format_r16g16b16_float_unpack_rgba_float: # @util_format_r16g16b16_float_unpack_rgba_float
        mov     eax, dword ptr [esp + 12]
        test    eax, eax
        je      .LBB0_3
        mov     ecx, dword ptr [esp + 8]
        mov     edx, dword ptr [esp + 4]
.LBB0_2:                                # =>This Inner Loop Header: Depth=1
        vmovd   xmm0, dword ptr [ecx]   # xmm0 = mem[0],zero,zero,zero
        vpinsrw xmm0, xmm0, word ptr [ecx + 4], 2
        add     ecx, 6
        vcvtph2ps       xmm0, xmm0
        vmovss  dword ptr [edx], xmm0
        vextractps      dword ptr [edx + 4], xmm0, 1
        vextractps      dword ptr [edx + 8], xmm0, 2
        mov     dword ptr [edx + 12], 1065353216
        add     edx, 16
        dec     eax
        jne     .LBB0_2
.LBB0_3:
        ret


clang code is essentially optimal.


The issue persists if I use `vcvtph2ps` directly via asm or via intrinsics.

The issue might be the src stride of 6 instead of 8, which is confusing gcc.

Additionally, the constant 1065353216 (which is weird, I would expect it to be 0)
is stored in the data section instead of inline as an immediate. This makes the
code larger, and in PIE mode requires extra pointer trickery and, with -m32,
even an extra function call.
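
On the value itself: assuming IEEE 754 single precision, 1065353216 is 0x3f800000, the bit pattern of 1.0f, which matches the alpha store `dst[3] = 1` in the loop. A small check, with `float_from_bits` as a made-up helper name:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float; memcpy avoids aliasing issues.  */
float float_from_bits(uint32_t bits)
{
  float f;
  assert(sizeof f == sizeof bits);
  memcpy(&f, &bits, sizeof f);
  return f;
}
```

With this helper, float_from_bits(1065353216u) compares equal to 1.0f on targets that use IEEE 754 floats.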

Even without -fPIE, the main loop has poor codegen on x86-64 / amd64
compared to clang or to what I would consider good code.

gcc -m64 -O3 -march=native

util_format_r16g16b16_float_unpack_rgba_float:
        test    edx, edx
        je      .L8
        mov     edx, edx
        sal     rdx, 4
        vmovss  xmm3, DWORD PTR .LC0[rip]
        lea     rcx, [rdi+rdx]
 

[Bug tree-optimization/100256] New: spurious stringop-overflow warning with memset(..., sizeof(dest)) on variable-length array at -O3

2021-04-25 Thread gandalf at winds dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100256

Bug ID: 100256
   Summary: spurious stringop-overflow warning with memset(...,
sizeof(dest)) on variable-length array at -O3
   Product: gcc
   Version: 10.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gandalf at winds dot org
  Target Milestone: ---

When 'j_degree' is unknown per the function below, -O3 causes a
stringop-overflow warning to be emitted on memset() with strange region sizes.
The code snapshot below is the result of trying to simplify/remove as many
lines as I could while still causing the warning to generate.

GCC 10.3.0 and GCC 11.0.1 commit a6f018fcc6ce9236ff37eac33b01a0a80103c9f6,
running on x86_64-pc-linux-gnu (Gentoo):

---

typedef long unsigned int size_t;

extern void *memset (void *__s, int __c, size_t __n) __attribute__
((__nothrow__ , __leaf__)) __attribute__ ((__nonnull__ (1)));

extern void *calloc (size_t __nmemb, size_t __size)
 __attribute__ ((__nothrow__ , __leaf__)) __attribute__ ((__malloc__))
__attribute__ ((__alloc_size__ (1, 2))) ;

static void setup_matrix(double **ppd_xx, double *pd_xy, int j_degree)
{
  int kk;
  double ad_xsum[j_degree*2 + 1];

  memset(ad_xsum,0,sizeof(ad_xsum));

  for(kk=0; kk < j_degree*2 + 1; kk++) {
ad_xsum[kk]++;
if(kk < j_degree + 1)
  pd_xy[kk]++;
  }
}

void polyfit(int j_degree, double ad_coef[], double *pd_xy, double **ppd_xx)
{
  int jj;

  for(jj=0;jj

[Bug debug/100255] Crosscompiler to ia64-hp-vms: vmsdbgout.c:368:20: error: ISO C++17 does not allow 'register' storage class specifier [-Werror=register]

2021-04-25 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100255

Jakub Jelinek  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 CC||jakub at gcc dot gnu.org
 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2021-04-25
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org

[Bug debug/100255] New: Crosscompiler to ia64-hp-vms: vmsdbgout.c:368:20: error: ISO C++17 does not allow 'register' storage class specifier [-Werror=register]

2021-04-25 Thread jbglaw--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100255

Bug ID: 100255
   Summary: Crosscompiler to ia64-hp-vms: vmsdbgout.c:368:20:
error: ISO C++17 does not allow 'register' storage
class specifier [-Werror=register]
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jbg...@lug-owl.de
  Target Milestone: ---

I'm revamping my testing efforts, building cross compilers based on targets
listed in ./contrib/config-list.mk.

With .../configure --target=ia64-hp-vms --enable-werror-always
--enable-languages=all --prefix=/tmp/gcc-ia64-hp-vms (using g++ (Debian
20210320-1) 11.0.1 20210320 (experimental) [master revision
3279a9a5a9a:6526c452d22:5f256a70a05fcfc5a1caf56678ceb12b4f87f781] as the host's
compiler), build breaks (as of ed16241c6db23013d70b792a64f29080ad48a414) with
this (cf. http://toolchain.lug-owl.de:8080/jobs/gcc-ia64-hp-vms/8):

make all-gcc
[...]
[all 2021-04-25 20:52:34.441260] /usr/lib/gcc-snapshot/bin/g++  -fno-PIE -c  
-g -O2 -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions -fno-rtti
-fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings
-Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros
-Wno-overlength-strings -Werror -fno-common  -DHAVE_CONFIG_H -I. -I.
-I../.././gcc -I../.././gcc/. -I../.././gcc/../include
-I../.././gcc/../libcpp/include -I../.././gcc/../libcody 
-I../.././gcc/../libdecnumber -I../.././gcc/../libdecnumber/dpd
-I../libdecnumber -I../.././gcc/../libbacktrace   -o vmsdbgout.o -MT
vmsdbgout.o -MMD -MP -MF ./.deps/vmsdbgout.TPo ../.././gcc/vmsdbgout.c
[all 2021-04-25 20:52:38.329848] ../.././gcc/vmsdbgout.c: In function 'int
write_debug_string(const char*, const char*, int)':
[all 2021-04-25 20:52:38.330047] ../.././gcc/vmsdbgout.c:368:20: error: ISO
C++17 does not allow 'register' storage class specifier [-Werror=register]
[all 2021-04-25 20:52:38.330139]   368 |   register int slen = strlen (P); 
 \
[all 2021-04-25 20:52:38.330220]   |^~~~
[all 2021-04-25 20:52:38.330296] ../.././gcc/vmsdbgout.c:549:7: note: in
expansion of macro 'ASM_OUTPUT_DEBUG_STRING'
[all 2021-04-25 20:52:38.330371]   549 |   ASM_OUTPUT_DEBUG_STRING
(asm_out_file, string);
[all 2021-04-25 20:52:38.330447]   |   ^~~
[all 2021-04-25 20:52:38.331123] ../.././gcc/vmsdbgout.c:369:28: error: ISO
C++17 does not allow 'register' storage class specifier [-Werror=register]
[all 2021-04-25 20:52:38.331292]   369 |   register const char *p = (P);   
 \
[all 2021-04-25 20:52:38.331370]   |^
[all 2021-04-25 20:52:38.331439] ../.././gcc/vmsdbgout.c:549:7: note: in
expansion of macro 'ASM_OUTPUT_DEBUG_STRING'
[all 2021-04-25 20:52:38.331505]   549 |   ASM_OUTPUT_DEBUG_STRING
(asm_out_file, string);
[all 2021-04-25 20:52:38.331577]   |   ^~~
[all 2021-04-25 20:52:38.331953] ../.././gcc/vmsdbgout.c:370:20: error: ISO
C++17 does not allow 'register' storage class specifier [-Werror=register]
[all 2021-04-25 20:52:38.332102]   370 |   register int i; 
 \
[all 2021-04-25 20:52:38.332181]   |^
[all 2021-04-25 20:52:38.332287] ../.././gcc/vmsdbgout.c:549:7: note: in
expansion of macro 'ASM_OUTPUT_DEBUG_STRING'
[all 2021-04-25 20:52:38.332364]   549 |   ASM_OUTPUT_DEBUG_STRING
(asm_out_file, string);
[all 2021-04-25 20:52:38.332437]   |   ^~~
[all 2021-04-25 20:52:38.333260] ../.././gcc/vmsdbgout.c:374:24: error: ISO
C++17 does not allow 'register' storage class specifier [-Werror=register]
[all 2021-04-25 20:52:38.333448]   374 |   register int c = p[i];  
 \
[all 2021-04-25 20:52:38.333528]   |^
[all 2021-04-25 20:52:38.333600] ../.././gcc/vmsdbgout.c:549:7: note: in
expansion of macro 'ASM_OUTPUT_DEBUG_STRING'
[all 2021-04-25 20:52:38.333668]   549 |   ASM_OUTPUT_DEBUG_STRING
(asm_out_file, string);
[all 2021-04-25 20:52:38.333734]   |   ^~~
[all 2021-04-25 20:52:38.386301] ../.././gcc/vmsdbgout.c: At global scope:
[all 2021-04-25 20:52:38.386507] ../.././gcc/vmsdbgout.c:1232:42: error: ISO
C++17 does not allow 'register' storage class specifier [-Werror=register]
[all 2021-04-25 20:52:38.386595]  1232 | vmsdbgout_begin_block (register
unsigned line, register unsigned blocknum)
[all 2021-04-25 20:52:38.386663]   |   
  ^~~~
[all 2021-04-25 20:52:38.387009] ../.././gcc/vmsdbgout.c:1232:66: error: ISO
C++17 does not allow 'register' storage class specifier [-Werror=register]
[all 2021-04-25 20:52:38.387164]  1232 | vmsdbgout_begin_block (register
unsigned 

[Bug debug/100254] New: [11/12 Regression] -fcompare-debug failure (length) with -O2 -fno-guess-branch-probability -fipa-pta -fnon-call-exceptions

2021-04-25 Thread zsojka at seznam dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100254

Bug ID: 100254
   Summary: [11/12 Regression] -fcompare-debug failure (length)
with -O2 -fno-guess-branch-probability -fipa-pta
-fnon-call-exceptions
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zsojka at seznam dot cz
CC: aoliva at gcc dot gnu.org
  Target Milestone: ---
  Host: x86_64-pc-linux-gnu

Created attachment 50673
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50673&action=edit
auto-reduced testcase (from OpenTTD sources)

Compiler output:
$ x86_64-pc-linux-gnu-g++ -O2 -mtune=goldmont -fno-guess-branch-probability
-fipa-pta -fnon-call-exceptions -fcompare-debug testcase.C 
testcase.C:20:38: warning: friend declaration 'bool
operator!=(_Rb_tree_const_iterator<_Tp>::_Self,
_Rb_tree_const_iterator<_Tp>::_Self)' declares a non-template function
[-Wnon-template-friend]
   20 |   friend bool operator!=(_Self, _Self);
  |  ^
testcase.C:20:38: note: (if this is not what you intended, make sure the
function template has already been declared and add '<>' after the function
name here)
testcase.C: In member function 'bool CargoSorter::operator()(const
CargoDataEntry*, const CargoDataEntry*) const':
testcase.C:82:61: warning: no return statement in function returning non-void
[-Wreturn-type]
   82 |  const CargoDataEntry *) const {}
  | ^
x86_64-pc-linux-gnu-g++: error: testcase.C: '-fcompare-debug' failure (length)

$ x86_64-pc-linux-gnu-g++ -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest/bin/x86_64-pc-linux-gnu-g++
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r12-100-20210424001429-gbcd77b7b9f3-checking-yes-rtl-df-extra-amd64/bin/../libexec/gcc/x86_64-pc-linux-gnu/12.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++
--enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra
--with-cloog --with-ppl --with-isl --build=x86_64-pc-linux-gnu
--host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu
--with-ld=/usr/bin/x86_64-pc-linux-gnu-ld
--with-as=/usr/bin/x86_64-pc-linux-gnu-as --disable-libstdcxx-pch
--prefix=/repo/gcc-trunk//binary-trunk-r12-100-20210424001429-gbcd77b7b9f3-checking-yes-rtl-df-extra-amd64
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.0.0 20210424 (experimental) (GCC)

[Bug c++/100252] Internal compiler error during template instantiation

2021-04-25 Thread sand at rifkin dot dev via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100252

--- Comment #1 from Jeremy R.  ---
A more minimal case: https://godbolt.org/z/jxP9e35bz

[Bug fortran/100245] ICE on automatic reallocation

2021-04-25 Thread jrfsousa at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100245

--- Comment #2 from José Rui Faustino de Sousa  ---
Patch posted:

https://gcc.gnu.org/pipermail/fortran/2021-April/055982.html

[Bug rtl-optimization/96796] [9 Regression] aarch64: ICE during RTL pass: reload

2021-04-25 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96796

rsandifo at gcc dot gnu.org  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #12 from rsandifo at gcc dot gnu.org ---
Fixed for GCC 9 and above.  Thanks for the bug report.

[Bug target/98302] [9 Regression] Wrong code on aarch64

2021-04-25 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98302

rsandifo at gcc dot gnu.org  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #19 from rsandifo at gcc dot gnu.org ---
Fixed.

[Bug tree-optimization/95694] [9 Regression] ICE in trunc_int_for_mode, at explow.c:59 since r9-7156-g33579b59aaf02eb7

2021-04-25 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95694

rsandifo at gcc dot gnu.org  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from rsandifo at gcc dot gnu.org ---
Fixed.

[Bug target/99929] [8/9 Backport] SVE: Wrong code at -O2 -ftree-vectorize

2021-04-25 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99929

--- Comment #6 from CVS Commits  ---
The releases/gcc-9 branch has been updated by Richard Sandiford:

https://gcc.gnu.org/g:1f3c550c9188e1afb30cb9d40c419c3e6ced5cb3

commit r9-9465-g1f3c550c9188e1afb30cb9d40c419c3e6ced5cb3
Author: Richard Sandiford 
Date:   Sun Apr 25 14:51:16 2021 +0100

Check for matching CONST_VECTOR encodings [PR99929]

PR99929 is one of those "how did we get away with this for so long"
bugs: the equality routines weren't checking whether two variable-length
CONST_VECTORs had the same encoding.  This meant that:

   { 1, 0, 0, 0, 0, 0, ... }

would appear to be equal to:

   { 1, 0, 1, 0, 1, 0, ... }

since both are represented using the elements { 1, 0 }.

gcc/
PR rtl-optimization/99929
* rtl.h (same_vector_encodings_p): New function.
* cse.c (exp_equiv_p): Check that CONST_VECTORs have the same
encoding.
* cselib.c (rtx_equal_for_cselib_1): Likewise.
* jump.c (rtx_renumbered_equal_p): Likewise.
* lra-constraints.c (operands_match_p): Likewise.
* reload.c (operands_match_p): Likewise.
* rtl.c (rtx_equal_p_cb, rtx_equal_p): Likewise.

(cherry picked from commit a87d3f964df31d4fbceb822c6d293e85c117d992)
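
The encoding problem described in the commit message above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: EncodedVector, element, encoded_elements_equal and vectors_equal are hypothetical names, not GCC's rtx API, and the model assumes each constant is kept in one canonical encoding and handles only one or two stored elements per pattern.

```cpp
#include <cstddef>

// Hypothetical, simplified model of a variable-length vector constant.
// Only the leading npatterns * nelts_per_pattern elements are stored;
// the full vector interleaves npatterns patterns.
struct EncodedVector {
    unsigned npatterns;          // number of interleaved patterns
    unsigned nelts_per_pattern;  // 1: one value repeated; 2: first value, then the second repeated
    int encoded[4];              // stored leading elements (this model allows at most 4)
};

// Element i of the conceptual (unbounded) vector.
constexpr int element(const EncodedVector &v, std::size_t i) {
    std::size_t pattern = i % v.npatterns;
    std::size_t index = i / v.npatterns;        // position within that pattern
    std::size_t last = v.nelts_per_pattern - 1;
    if (index > last)
        index = last;                           // later elements repeat the last stored one
    return v.encoded[index * v.npatterns + pattern];
}

// The incomplete comparison: looks only at the stored elements.
constexpr bool encoded_elements_equal(const EncodedVector &a, const EncodedVector &b) {
    unsigned na = a.npatterns * a.nelts_per_pattern;
    unsigned nb = b.npatterns * b.nelts_per_pattern;
    if (na != nb)
        return false;
    for (unsigned i = 0; i < na; ++i)
        if (a.encoded[i] != b.encoded[i])
            return false;
    return true;
}

// The stricter comparison: the encodings must match as well.
constexpr bool vectors_equal(const EncodedVector &a, const EncodedVector &b) {
    return a.npatterns == b.npatterns
           && a.nelts_per_pattern == b.nelts_per_pattern
           && encoded_elements_equal(a, b);
}

// { 1, 0, 0, 0, ... }: one pattern, two stored elements.
constexpr EncodedVector one_then_zeros{1, 2, {1, 0}};
// { 1, 0, 1, 0, ... }: two patterns, one stored element each.
constexpr EncodedVector alternating{2, 1, {1, 0}};
```

In this model both constants store the elements { 1, 0 }, so encoded_elements_equal treats them as the same vector even though element() shows they differ from index 2 onwards; vectors_equal tells them apart.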

[Bug target/98136] [8/9 Regression] [aarch64] Internal compiler error with large classes and virtual methods since r8-5967-gf5470a77425a54efebfe1732488c40f05ef176d0

2021-04-25 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98136

--- Comment #7 from CVS Commits  ---
The releases/gcc-9 branch has been updated by Richard Sandiford:

https://gcc.gnu.org/g:04aaa315db27726e090ca7c3ca3aed9dd5895701

commit r9-9464-g04aaa315db27726e090ca7c3ca3aed9dd5895701
Author: Richard Sandiford 
Date:   Sun Apr 25 14:51:15 2021 +0100

aarch64: Tweak post-RA handling of CONST_INT moves [PR98136]

This PR is a regression caused by r8-5967, where we replaced
a call to aarch64_internal_mov_immediate in aarch64_add_offset
with a call to aarch64_force_temporary, which in turn uses the
normal emit_move_insn{,_1} routines.

The problem is that aarch64_add_offset can be called while
outputting a thunk, where we require all instructions to be
valid without splitting.  However, the move expanders were
not splitting CONST_INT moves themselves.

I think the right fix is to make the move expanders work
even in this scenario, rather than require callers to handle
it as a special case.

gcc/
PR target/98136
* config/aarch64/aarch64.md (mov): Pass multi-instruction
CONST_INTs to aarch64_expand_mov_immediate when called after RA.

gcc/testsuite/
PR target/98136
* g++.dg/pr98136.C: New test.

(cherry picked from commit 48c79f054bf435051c95ee093c45a0f8c9de5b4e)
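
As a rough illustration of why some CONST_INT moves need more than one instruction, the sketch below counts the 16-bit chunks a MOVZ/MOVK sequence would have to set. It is a simplified model, not the backend's logic: nonzero_halfwords and movz_movk_length are hypothetical helpers, and the real aarch64 code can also use other immediate forms (for example MOVN or bitmask immediates), so this count is only an upper bound.

```cpp
#include <cstdint>

// Number of nonzero 16-bit chunks in a 64-bit immediate.
constexpr unsigned nonzero_halfwords(std::uint64_t imm) {
    unsigned count = 0;
    for (unsigned i = 0; i < 4; ++i)
        if ((imm >> (16 * i)) & 0xFFFF)
            ++count;
    return count;
}

// Length of a plain MOVZ + MOVK sequence: one instruction per nonzero
// chunk, and a single MOVZ for zero itself.
constexpr unsigned movz_movk_length(std::uint64_t imm) {
    return nonzero_halfwords(imm) == 0 ? 1 : nonzero_halfwords(imm);
}
```

Under this model 0x1234 fits in one instruction, while 0x123456789abcdef0 needs four, which is the kind of constant that has to be split into several valid instructions when no later split pass will run.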

[Bug rtl-optimization/96796] [9 Regression] aarch64: ICE during RTL pass: reload

2021-04-25 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96796

--- Comment #11 from CVS Commits  ---
The releases/gcc-9 branch has been updated by Richard Sandiford:

https://gcc.gnu.org/g:49cc1253d079bbefc18275f29adc526679422176

commit r9-9463-g49cc1253d079bbefc18275f29adc526679422176
Author: Richard Sandiford 
Date:   Sun Apr 25 14:51:14 2021 +0100

lra: Avoid cycling on certain subreg reloads [PR96796]

This PR is about LRA cycling for a reload of the form:

   

Changing pseudo 196 in operand 1 of insn 103 on equiv [r105:DI*0x8+r140:DI]
  Creating newreg=287, assigning class ALL_REGS to slow/invalid mem
r287
  Creating newreg=288, assigning class ALL_REGS to slow/invalid mem
r288
  103: r203:SI=r288:SI<<0x1+r196:DI#0
  REG_DEAD r196:DI
Inserting slow/invalid mem reload before:
  316: r287:DI=[r105:DI*0x8+r140:DI]
  317: r288:SI=r287:DI#0
   


The problem is with r287.  We rightly give it a broad starting class of
POINTER_AND_FP_REGS (reduced from ALL_REGS by preferred_reload_class).
However, we never make forward progress towards narrowing it down to
a specific choice of class (POINTER_REGS or FP_REGS).

I think in practice we rely on two things to narrow a reload pseudo's
class down to a specific choice:

(1) a restricted class is specified when the pseudo is created

This happens for input address reloads, where the class is taken
from the target's chosen base register class.  It also happens
for simple REG reloads, where the class is taken from the chosen
alternative's constraints.

(2) uses of the reload pseudo as a direct input operand

In this case get_reload_reg tries to reuse the existing register
and narrow its class, instead of creating a new reload pseudo.

However, neither occurs here.  As described above, r287 rightly
starts out with a wide choice of class, ultimately derived from
ALL_REGS, so we don't get (1).  And as the comments in the PR
explain, r287 is never used as an input reload, only the subreg is,
so we don't get (2):

   

 Choosing alt 13 in insn 317:  (0) r  (1) w {*movsi_aarch64}
  Creating newreg=291, assigning class FP_REGS to r291
  317: r288:SI=r291:SI
Inserting insn reload before:
  320: r291:SI=r287:DI#0
   


IMO, in this case we should rely on the reload of insn 316 to narrow
down the class of r287.  Currently we do:

   

 Choosing alt 7 in insn 316:  (0) r  (1) m {*movdi_aarch64}
  Creating newreg=289 from oldreg=287, assigning class GENERAL_REGS to
r289
  316: r289:DI=[r105:DI*0x8+r140:DI]
Inserting insn reload after:
  318: r287:DI=r289:DI
---

i.e. we create a new pseudo register r289 and give *that* pseudo
GENERAL_REGS instead.  This is because get_reload_reg only narrows
down the existing class for OP_IN and OP_INOUT, not OP_OUT.

But if we have a reload pseudo in a reload instruction and have chosen
a specific class for the reload pseudo, I think we should simply install
it for OP_OUT reloads too, if the class is a subset of the existing class.
We will need to pick such a register whatever happens (for r289 in the
example above).  And as explained in the PR, doing this actually avoids
an unnecessary move via the FP registers too.

This backport is less aggressive than the trunk version, in that the new
code reuses the test for a reload move from in_class_p.  We will therefore
only narrow OP_OUT classes if the instruction is a register move or memory
load that was generated by LRA itself.

gcc/
PR rtl-optimization/96796
* lra-constraints.c (in_class_p): Add a default-false
allow_all_reload_class_changes_p parameter.  Do not treat
reload moves specially when the parameter is true.
(get_reload_reg): Try to narrow the class of an existing OP_OUT
reload if we're reloading a reload pseudo in a reload instruction.

gcc/testsuite/
PR rtl-optimization/96796
* gcc.c-torture/compile/pr96796.c: New test.
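
The "install it if the class is a subset of the existing class" rule from the commit message above can be sketched with register classes modelled as bit sets. Everything here is an assumption made for illustration: the bit layout, the k-prefixed constants and the helpers is_subset and narrow_class are hypothetical and do not correspond to LRA's data structures.

```cpp
#include <cstdint>

// Hypothetical model: one bit per hard register.
using RegClass = std::uint32_t;

constexpr RegClass kGeneralRegs = 0x000000FF;           // toy general registers
constexpr RegClass kPointerRegs = kGeneralRegs | 0x100; // general registers plus a stack pointer
constexpr RegClass kFpRegs = 0x00FF0000;                // toy FP registers
constexpr RegClass kPointerAndFpRegs = kPointerRegs | kFpRegs;

// True if every register in a is also in b.
constexpr bool is_subset(RegClass a, RegClass b) {
    return (a & ~b) == 0;
}

// Narrow an existing reload pseudo's class to the class chosen for the
// instruction, but only when the chosen class is contained in it.
constexpr RegClass narrow_class(RegClass existing, RegClass chosen) {
    return is_subset(chosen, existing) ? chosen : existing;
}
```

With these toy sets, narrowing a POINTER_AND_FP_REGS-like class by a GENERAL_REGS-like choice yields the smaller class, whereas a choice that is not contained in the existing class leaves it unchanged.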

[Bug target/98302] [9 Regression] Wrong code on aarch64

2021-04-25 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98302

--- Comment #18 from CVS Commits  ---
The releases/gcc-9 branch has been updated by Richard Sandiford:

https://gcc.gnu.org/g:3fa4752e29a5b44219837ebad5bb09ec98af156e

commit r9-9462-g3fa4752e29a5b44219837ebad5bb09ec98af156e
Author: Richard Sandiford 
Date:   Sun Apr 25 14:51:14 2021 +0100

vect: Avoid generating out-of-range shifts [PR98302]

In this testcase we end up with:

  unsigned long long x = ...;
  char y = (char) (x << 37);

The overwidening pattern realised that only the low 8 bits
of x << 37 are needed, but then tried to turn that into:

  unsigned long long x = ...;
  char y = (char) x << 37;

which gives an out-of-range shift.  In this case y can simply
be replaced by zero, but as the comment in the patch says,
it's kind-of awkward to do that in the middle of vectorisation.

Most of the overwidening stuff is about keeping operations
as narrow as possible, which is important for vectorisation
but could be counter-productive for scalars (especially on
RISC targets).  In contrast, optimising y to zero in the above
feels like an independent optimisation that would benefit scalar
code and that should happen before vectorisation.

gcc/
PR tree-optimization/98302
* tree-vect-patterns.c (vect_determine_precisions_from_users): Make
sure that the precision remains greater than the shift count.

gcc/testsuite/
PR tree-optimization/98302
* gcc.dg/vect/pr98302.c: New test.

(cherry picked from commit 58a12b0eadac62e691fcf7325ab2bc2c93d46b61)
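
The arithmetic behind the out-of-range shift can be checked with a short sketch. The helper names low_byte_of_shift and narrowed_precision are hypothetical; the second one only mirrors the stated intent of the ChangeLog entry (keep the precision greater than the shift count) and is not the code from tree-vect-patterns.c.

```cpp
#include <cstdint>

// Low 8 bits of a 64-bit left shift, computed at full width, which is
// valid for any shift count below 64.
constexpr std::uint8_t low_byte_of_shift(std::uint64_t x, unsigned shift) {
    return static_cast<std::uint8_t>(x << shift);
}

// Hypothetical guard: if the users need only bits_needed bits, do not
// narrow the shift below shift_count + 1 bits.
constexpr unsigned narrowed_precision(unsigned bits_needed, unsigned shift_count) {
    return bits_needed > shift_count ? bits_needed : shift_count + 1;
}
```

For a shift count of 37 the low byte is zero whatever x is, which is why y could in principle be replaced by zero; the guard instead keeps the operation at 38 bits or more rather than narrowing it to 8 bits, where a shift by 37 would be out of range.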

[Bug tree-optimization/95694] [9 Regression] ICE in trunc_int_for_mode, at explow.c:59 since r9-7156-g33579b59aaf02eb7

2021-04-25 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95694

--- Comment #8 from CVS Commits  ---
The releases/gcc-9 branch has been updated by Richard Sandiford:

https://gcc.gnu.org/g:90ce58cf411f06292dc8f96aba61f3e3d07f22e8

commit r9-9461-g90ce58cf411f06292dc8f96aba61f3e3d07f22e8
Author: Richard Sandiford 
Date:   Sun Apr 25 14:51:13 2021 +0100

expr: Fix REDUCE_BIT_FIELD for constants [PR95694, PR96151]

This is yet another PR caused by constant integer rtxes not storing
a mode.  We were calling REDUCE_BIT_FIELD on a constant integer that
didn't fit in poly_int64, and then tripped the as_a
assert on VOIDmode.

AFAICT REDUCE_BIT_FIELD is always passed rtxes that have TYPE_MODE
(rather than some other mode) and it just fills in the redundant
sign bits of that TYPE_MODE value.  So it should be safe to get
the mode from the type instead of the rtx.  The patch does that
and asserts that the modes agree, where information is available.

That on its own is enough to fix the bug, but we might as well
extend the folding case to all constant integers, not just those
that fit poly_int64.

gcc/
PR middle-end/95694
* expr.c (expand_expr_real_2): Get the mode from the type rather
than the rtx, and assert that it is consistent with the mode of
the rtx (where known).  Optimize all constant integers, not just
those that can be represented in poly_int64.

gcc/testsuite/
PR middle-end/95694
* gcc.dg/pr95694.c: New test.

(cherry picked from commit 760df6d296b8fc59796f42dca5eb14012fbfa28b)
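
What "fills in the redundant sign bits" means for a constant can be shown with a small standalone function. This is a sketch on a fixed 64-bit container, not GCC's REDUCE_BIT_FIELD macro: reduce_bit_field_model is a hypothetical name, and it assumes the usual two's complement conversion from uint64_t to int64_t.

```cpp
#include <cstdint>

// Keep the low `precision` bits of value and fill the upper bits of the
// 64-bit container: zeros for an unsigned field, copies of the field's
// top bit for a signed one.
constexpr std::int64_t reduce_bit_field_model(std::int64_t value,
                                              unsigned precision,
                                              bool is_unsigned) {
    if (precision == 0)
        return 0;
    if (precision >= 64)
        return value;  // nothing to fill in
    const std::uint64_t mask = (std::uint64_t{1} << precision) - 1;
    std::uint64_t low = static_cast<std::uint64_t>(value) & mask;
    if (!is_unsigned && ((low >> (precision - 1)) & 1))
        low |= ~mask;  // replicate the sign bit upwards
    return static_cast<std::int64_t>(low);
}
```

For a 3-bit field holding the bit pattern 101, the unsigned reading is 5 and the signed reading is -3; the upper bits of the result carry no extra information, which is why they are described as redundant.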

[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2021-04-25 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735

Hongtao.liu  changed:

   What|Removed |Added

 CC||crazylht at gmail dot com

--- Comment #7 from Hongtao.liu  ---
Confirmed, let me fix this.

[Bug tree-optimization/100253] [10/11/12 Regression] wrong code with -O2 -fno-tree-bit-ccp -ftree-slp-vectorize (unaligned movdqa)

2021-04-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100253

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
  Component|rtl-optimization|tree-optimization
   Last reconfirmed||2021-04-25
 Ever confirmed|0   |1

--- Comment #2 from Andrew Pinski  ---
The problem is right away in expand:
;; vect__36.383_12 = MEM  [(char * {ref-all})_10 +
16B];

(insn 23 22 0 (set (reg:V1TI 88 [ vect__36.383 ])
(mem:V1TI (plus:DI (reg/f:DI 86 [ _10 ])
(const_int 16 [0x10])) [0 MEM 
[(char * {ref-all})_10 + 16B]+0 S16 A128])) -1
 (nil))


I think SLP did not mark the load as unaligned even though it knows it is one:
t.cc:7:8: note:   Vectorizing an unaligned access.
t.cc:7:8: note:   vect_model_load_cost: unaligned supported by hardware.
t.cc:7:8: note:   vect_model_load_cost: inside_cost = 24, prologue_cost = 0 .
t.cc:7:8: note:   ==> examining statement: MEM <__int128 unsigned> [(char *
{ref-all}) + 25B] = _36;
t.cc:7:8: note:   vect_is_simple_use: operand # VUSE <.MEM_30>
MEM <__int128 unsignedD.19> [(charD.10 * {ref-all})_10], type of def: internal
t.cc:7:8: note:   vect_is_simple_use: operand # VUSE <.MEM_35>
MEM <__int128 unsignedD.19> [(charD.10 * {ref-all})_19], type of def: internal
t.cc:7:8: note:   Vectorizing an unaligned access.
t.cc:7:8: note:   vect_model_store_cost: unaligned supported by hardware.

Confirmed.

When bit-CCP is turned off (-fno-tree-bit-ccp), the propagation of the
unalignedness does not happen.

[Bug rtl-optimization/100253] [10/11/12 Regression] wrong code with -O2 -fno-tree-bit-ccp -ftree-slp-vectorize (unaligned movdqa)

2021-04-25 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100253

Hongtao.liu  changed:

   What|Removed |Added

 CC||crazylht at gmail dot com

--- Comment #1 from Hongtao.liu  ---
For SSE loads/stores with an aligned address, movdqu is as fast as movdqa, so
I'm thinking that the backend could generate only unaligned load/store
instructions (which of course may cover up some problems).