[Bug c++/97376] New: Function type to function pointer type adjustment for non-type template parameter does not work when using decltype(auto)

2020-10-11 Thread anders.granlund.0 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97376

Bug ID: 97376
   Summary: Function type to function pointer type adjustment for
non-type template parameter does not work when using
decltype(auto)
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: anders.granlund.0 at gmail dot com
  Target Milestone: ---

Consider the following program:

  #include 
  #include 

  template<decltype(auto) X>
  void f1()
  {
  }

  template<auto X>
  void f2()
  {
  }

  void ff()
  {
  }

  int main()
  {
    f1<ff>();
    f2<ff>();
  }

When compiled with  -std=c++17 -pedantic-errors  it gives a compilation
error complaining that the non-type parameter  X  of the template  f1  has type
 void ()  and that this is not a valid type for a non-type template parameter.

The expected behaviour is that the type of  X  in  f1  should instead be
adjusted to  void (*)() . This is what happens in template  f2 .

Note that clang gives the expected behaviour (no compilation errors). Compiler
explorer link comparing clang and gcc:

  https://godbolt.org/z/PGbrYE
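
The adjustment the report expects can be observed with a plain `auto` parameter, which is the case that already works. A minimal sketch, assuming C++17; the names `target` and `param_is_function_pointer` are illustrative and do not come from the report:

```cpp
#include <type_traits>

// Illustrative function; it plays the role of ff() in the report.
void target() {}

// With a plain `auto` placeholder, a function name used as the template
// argument is deduced as a pointer to function, not as a function type.
template <auto X>
constexpr bool param_is_function_pointer()
{
  return std::is_same_v<decltype(X), void (*)()>;
}

static_assert(param_is_function_pointer<target>(),
              "X has type void (*)(), not void ()");
```

The report asks for the same result when the placeholder is `decltype(auto)`; that variant is left out of the sketch because it is the form GCC rejects here.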

[Bug c++/97375] New: Unexpected top-level const retainment when declaring non-type template parameter with decltype(auto)

2020-10-11 Thread anders.granlund.0 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97375

Bug ID: 97375
   Summary: Unexpected top-level const retainment when declaring
non-type template parameter with decltype(auto)
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: anders.granlund.0 at gmail dot com
  Target Milestone: ---

Consider the following program:

  #include <iostream>
  #include <type_traits>

  template<auto X>
  void f1()
  {
    std::cout << std::is_const_v<decltype(X)> << std::endl;
  }

  template<decltype(auto) X>
  void f2()
  {
    std::cout << std::is_const_v<decltype(X)> << std::endl;
  }

  int main()
  {
    const int i = 0;

    f1<i>();
    f2<i>();
  }

When compiling it with  -std=c++17 -pedantic-errors  it gives the following
output:

  0
  1

I expect it to give the following output instead:

  0
  0

So top-level const should be ignored in both cases.

Note that clang gives the correct output. Compiler explorer link showing the
difference between clang and gcc behaviour:

  https://godbolt.org/z/aefYKd

I tried to report this bug first to clang (with a different example program),
but it turned out from the discussion (comment #3) that the bug was actually in
gcc:

  https://bugs.llvm.org/show_bug.cgi?id=47792#c3
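
For comparison, here is how the two placeholders treat top-level const on ordinary variables, next to the `auto` template parameter case that behaves as expected. This is a sketch assuming C++17; all names are illustrative. As I read [temp.param], top-level cv-qualifiers on a template parameter are ignored when determining its type, which is why the report expects 0 in both cases:

```cpp
#include <type_traits>

// Ordinary variable declared with `auto`: top-level const is dropped.
constexpr bool auto_variable_is_const()
{
  const int i = 0;
  auto copy = i;             // deduced as int
  (void)copy;
  return std::is_const_v<decltype(copy)>;
}

// Ordinary variable declared with `decltype(auto)`: const is kept.
constexpr bool decltype_auto_variable_is_const()
{
  const int i = 0;
  decltype(auto) copy = i;   // deduced as const int
  (void)copy;
  return std::is_const_v<decltype(copy)>;
}

// Non-type template parameter declared with `auto`: its type is not const.
template <auto X>
constexpr bool auto_parameter_is_const()
{
  return std::is_const_v<decltype(X)>;
}

constexpr int zero = 0;      // illustrative; plays the role of `i` in the report

static_assert(!auto_variable_is_const(), "auto drops top-level const");
static_assert(decltype_auto_variable_is_const(), "decltype(auto) keeps const on variables");
static_assert(!auto_parameter_is_const<zero>(), "template parameter type is not const");
```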

[Bug target/97286] GCC sometimes uses an extra xmm register for the destination of _mm_blend_ps

2020-10-11 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97286

--- Comment #2 from Hongtao.liu  ---
This seems to be a similar issue to PR97366?

[Bug rtl-optimization/97249] Missing vec_select and subreg optimization

2020-10-11 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97249

--- Comment #4 from Hongtao.liu  ---
(In reply to Richard Biener from comment #3)
> Guess you want to figure what built the (vec_select:V8QI (V16QI)) and if
> it was appropriately simplified (and simplify_rtx would handle this case).
> In any case the vec_select is the same as (subreg:V8QI (V16QI)).

For this testcase, simplify_rtx will be omitted, since the case is handled in
---
  for (i = 0; i < GET_RTX_LENGTH (code); i++)
    switch (*format_ptr++)
      {
      case 'e':
        if (XEXP (orig, i) != NULL)
          {
            rtx result = cselib_expand_value_rtx_1 (XEXP (orig, i), evd,
                                                    max_depth - 1);
            if (!result)
              return NULL; <-return here.
            if (copy)
              XEXP (copy, i) = result;
          }
        break;
---

So could we handle it in cselib_expand_value_rtx_1?
---
diff --git a/gcc/cselib.c b/gcc/cselib.c
index 53e9603868d..8882ac60f1e 100644
--- a/gcc/cselib.c
+++ b/gcc/cselib.c
@@ -1864,6 +1864,18 @@ cselib_expand_value_rtx_1 (rtx orig, struct
expand_value_data *evd,
return scopy;
   }

+/* Handle cases like
+   (vec_select:V8QI (subreg:V16QI (value:V8QI) 0)
+   (parallel [(const_int 0) (const_int 1)
+   (const_int 2) (const_int 3)
+   (const_int 4) (const_int 5)
+   (const_int 6) (const_int 7)])),
+   it should be equal to (value:V8QI).  */
+case VEC_SELECT:
+  {
+   
+  }
+
---
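
The condition the proposed VEC_SELECT case would have to check can be sketched outside of GCC: the selector must pick lanes 0..N-1 in order, so that the selection is just the low part of the wider vector. The helper below is hypothetical and is not GCC code; it models the `parallel` as an array of lane indices and assumes lane 0 is the low-part lane:

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper, not part of cselib: returns true when the selector
// picks elements 0, 1, ..., N-1 in order, i.e. the low N lanes of the source.
template <std::size_t N>
constexpr bool selects_low_lanes(const std::array<int, N> &selector)
{
  for (std::size_t i = 0; i < N; ++i)
    if (selector[i] != static_cast<int>(i))
      return false;
  return true;
}

// The selector from the comment in the patch: lanes 0 through 7.
static_assert(selects_low_lanes(std::array<int, 8>{0, 1, 2, 3, 4, 5, 6, 7}),
              "identity prefix selects the low part");
```

Any permuted or offset selector fails the check, in which case the vec_select cannot be replaced by the inner value.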

[Bug middle-end/97374] missing essential detail in array parameter overflow warning

2020-10-11 Thread msebor at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97374

--- Comment #2 from Martin Sebor  ---
This was prompted by warnings like the one below in a build of the kernel:

drivers/gpu/drm/i915/intel_pm.c:3062:9: warning: ‘intel_print_wm_latency’
reading 16 bytes from a region of size 10 [-Wstringop-overread]
 3062 | intel_print_wm_latency(dev_priv, "Primary",
dev_priv->wm.pri_latency);
  |
^
drivers/gpu/drm/i915/intel_pm.c:3062:9: note: referencing argument 3 of type
‘const u16 *’ {aka ‘const short unsigned int *’}
drivers/gpu/drm/i915/intel_pm.c:2999:13: note: in a call to function
‘intel_print_wm_latency’
 2999 | static void intel_print_wm_latency(struct drm_i915_private *dev_priv,
  | ^~

[Bug middle-end/97374] missing essential detail in array parameter overflow warning

2020-10-11 Thread msebor at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97374

Martin Sebor  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |diagnostic

--- Comment #1 from Martin Sebor  ---
The second note could also be improved to print the form and type of the
relevant parameter, similar to the -Warray-parameter warning:

$ cat q.c && gcc -S -Wall q.c
typedef int A[3];
typedef int B[4];

void f (A);
void f (B);
q.c:5:9: warning: argument 1 of type ‘int[4]’ with mismatched bound
[-Warray-parameter=]
5 | void f (B);
  | ^
q.c:4:9: note: previously declared as ‘int[3]’
4 | void f (A);
  | ^

[Bug middle-end/97374] New: missing essential detail in array parameter overflow warning

2020-10-11 Thread msebor at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97374

Bug ID: 97374
   Summary: missing essential detail in array parameter overflow
warning
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: msebor at gcc dot gnu.org
  Target Milestone: ---

The warning below doesn't provide enough information to understand what the
problem is.  The first note should say something like

  note: referencing argument 1 of type ‘int[4]’

$ cat q.c && gcc -O2 -S -Wall q.c
typedef int A[3];
typedef int B[4];

void f (B);

struct S { A x; B y; };

void g (struct S *p)
{
  f (p->x);
}
q.c: In function ‘g’:
q.c:10:3: warning: ‘f’ accessing 16 bytes in a region of size 12
[-Wstringop-overflow=]
   10 |   f (p->x);
  |   ^~~~
q.c:10:3: note: referencing argument 1 of type ‘int *’
q.c:4:6: note: in a call to function ‘f’
4 | void f (B);
  |  ^
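
The reason the note can only say ‘int *’ today is that a parameter declared with an array type is adjusted to a pointer, so the bound is no longer part of the function type. A small sketch of that adjustment, written as C++ so it can be checked with static_assert (the typedefs and `f` mirror q.c; nothing else is from the report):

```cpp
#include <type_traits>

typedef int A[3];
typedef int B[4];

void f(B);  // same declaration as in q.c; never called in this sketch

// The declared parameter type int[4] is adjusted to int *.
static_assert(std::is_same_v<decltype(f), void(int *)>,
              "array parameter is adjusted to a pointer");

// The sizes behind "accessing 16 bytes in a region of size 12",
// expressed relative to sizeof (int).
static_assert(sizeof(A) == 3 * sizeof(int), "A holds three ints");
static_assert(sizeof(B) == 4 * sizeof(int), "B holds four ints");
```

Printing ‘int[4]’ in the note therefore requires the declared form of the parameter rather than its adjusted type, as the -Warray-parameter diagnostic already does.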

[Bug middle-end/97373] missing warning on sprintf into allocated destination

2020-10-11 Thread msebor at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97373

Martin Sebor  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |85741
           Keywords|                            |diagnostic

--- Comment #1 from Martin Sebor  ---
The simplest change to diagnose the overflow in comment #0 goes like this:

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index fff034fac4d..ed35eccebf3 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -4047,9 +4047,13 @@ get_destination_size (tree dest)
  use type-zero object size to determine the size of the enclosing
  object (the function fails without optimization in this type).  */
   int ost = optimize > 0;
-  unsigned HOST_WIDE_INT size;
-  if (compute_builtin_object_size (dest, ost, &size))
-    return size;
+  access_ref ref;
+  if (compute_objsize (dest, ost, &ref))
+    {
+      offset_int size = ref.size_remaining ();
+      if (wi::fits_uhwi_p (size))
+        return size.to_uhwi ();
+    }

   return HOST_WIDE_INT_MAX;
 }


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85741
[Bug 85741] [meta-bug] bogus/missing -Wformat-overflow

[Bug middle-end/97373] New: missing warning on sprintf into allocated destination

2020-10-11 Thread msebor at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97373

Bug ID: 97373
   Summary: missing warning on sprintf into allocated destination
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: msebor at gcc dot gnu.org
  Target Milestone: ---

-Wformat-overflow doesn't detect buffer overflow in sprintf call writing to
allocated objects with non-constant sizes.  The problem is that the warning
calls compute_builtin_object_size() instead of compute_objsize().

$ cat q.c && gcc -O2 -S -Wall q.c
void* f (int n)
{
  if (n < 5 || 7 < n)
    n = 5;

  char *p = __builtin_malloc (n);
  __builtin_strcpy (p, "1234567");   // warning (good)
  return p;
}

void* g (int n)
{
  if (n < 5 || 7 < n)
    n = 5;

  char *p = __builtin_malloc (n);
  __builtin_sprintf (p, "%i", 1234567);   // missing warning
  return p;
}
q.c: In function ‘f’:
q.c:7:3: warning: ‘__builtin_memcpy’ writing 8 bytes into a region of size
between 5 and 7 [-Wstringop-overflow=]
7 |   __builtin_strcpy (p, "1234567");   // warning (good)
  |   ^~~
q.c:6:13: note: at offset 0 to an object with size between 5 and 7 allocated by
‘__builtin_malloc’ here
6 |   char *p = __builtin_malloc (n);
  | ^~~~

[Bug c++/68288] botched floating-point UDL

2020-10-11 Thread solodon at mail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68288

--- Comment #4 from Yuriy Solodkyy  ---
P.S. I added my previous example to this bug as they seemed to be related, feel
free to split it into a separate bug if they are not.

P.P.S. Change that return expression to 42_sp-p and the parser seems to think
the entire _sp-p is a UDL suffix:

:9:12: error: unable to find numeric literal operator 'operator""_sp-p'

9 | return 42_sp-p;
  |^~~

[Bug tree-optimization/97360] ICE in range_on_exit

2020-10-11 Thread msebor at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97360

Martin Sebor  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
                 CC|                            |aldyh at gcc dot gnu.org,
                   |                            |msebor at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-10-11

--- Comment #1 from Martin Sebor  ---
Confirmed in an instrumented x86_64-linux Binutils/GDB build (the note is from
the instrumentation):

/src/binutils-gdb/bfd/elf32-arc.c:3160: internal compiler error: in
range_on_exit, at gimple-range.cc:931
mv -f .deps/coff-sh.Tpo .deps/coff-sh.Plo
/bin/sh ./libtool  --tag=CC   --mode=compile /build/gcc-master/gcc/xgcc -B
/build/gcc-master/gcc -DHAVE_CONFIG_H -I. -I/src/binutils-gdb/bfd 
-DBINDIR='"/usr/local/bin"' -DLIBDIR='"/usr/local/lib"' -I.
-I/src/binutils-gdb/bfd -I/src/binutils-gdb/bfd/../include  -DHAVE_all_vecs  
-W -Wall -Wstrict-prototypes -Wmissing-prototypes -Wshadow -Wstack-usage=262144
-I/src/binutils-gdb/bfd/../zlib -g -O2 -MT elf32-bfin.lo -MD -MP -MF
.deps/elf32-bfin.Tpo -c -o elf32-bfin.lo /src/binutils-gdb/bfd/elf32-bfin.c
libtool: compile:  /build/gcc-master/gcc/xgcc -B /build/gcc-master/gcc
-DHAVE_CONFIG_H -I. -I/src/binutils-gdb/bfd -DBINDIR=\"/usr/local/bin\"
-DLIBDIR=\"/usr/local/lib\" -I. -I/src/binutils-gdb/bfd
-I/src/binutils-gdb/bfd/../include -DHAVE_all_vecs -W -Wall -Wstrict-prototypes
-Wmissing-prototypes -Wshadow -Wstack-usage=262144
-I/src/binutils-gdb/bfd/../zlib -g -O2 -MT elf32-bfin.lo -MD -MP -MF
.deps/elf32-bfin.Tpo -c /src/binutils-gdb/bfd/elf32-bfin.c -o elf32-bfin.o
/src/binutils-gdb/bfd/ecofflink.c: In function ‘bfd_ecoff_debug_one_external’:
/src/binutils-gdb/bfd/ecofflink.c:1314:3: note: ‘’: VR_VARYING
 1314 |   strcpy (debug->ssext + symhdr->issExtMax, name);
  |   ^~~
0x24eb947 gimple_ranger::range_on_exit(irange&, basic_block_def*, tree_node*)
/src/gcc/master/gcc/gimple-range.cc:930
0x24eba6e gimple_ranger::range_on_edge(irange&, edge_def*, tree_node*)
/src/gcc/master/gcc/gimple-range.cc:949
0x17d35ee range_query::value_on_edge(edge_def*, tree_node*)
/src/gcc/master/gcc/value-query.cc:98
0x22cf6d4 hybrid_folder::value_on_edge(edge_def*, tree_node*)
/src/gcc/master/gcc/gimple-ssa-evrp.c:243
0x15cb023 substitute_and_fold_engine::propagate_into_phi_args(basic_block_def*)
/src/gcc/master/gcc/tree-ssa-propagate.c:1038
0x15cbb16 substitute_and_fold_dom_walker::before_dom_children(basic_block_def*)
/src/gcc/master/gcc/tree-ssa-propagate.c:1238
0x22789b9 dom_walker::walk(basic_block_def*)
/src/gcc/master/gcc/domwalk.c:309
0x15cbc37 substitute_and_fold_engine::substitute_and_fold(basic_block_def*)
/src/gcc/master/gcc/tree-ssa-propagate.c:1283
0x22cf9ed execute_early_vrp
/src/gcc/master/gcc/gimple-ssa-evrp.c:334
0x22cfafa execute
/src/gcc/master/gcc/gimple-ssa-evrp.c:381
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.
make[4]: *** [Makefile:1600: elf32-arc.lo] Error 1

[Bug c++/68288] botched floating-point UDL

2020-10-11 Thread solodon at mail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68288

Yuriy Solodkyy  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |solodon at mail dot com

--- Comment #3 from Yuriy Solodkyy  ---
This seems to be a genuine bug in GCC, not specific to floating-point UDLs. It
is still present in GCC 10.2. ICC rejects it as well, but Clang and MSVC
accept it.

Consider:

struct s_points { unsigned long long value; };
inline s_points operator"" _sp(unsigned long long v) { return {v}; }

s_points operator+(s_points, s_points);
s_points operator-(s_points, s_points);

s_points foo(s_points p)
{
    return p-42_sp+1_sp; // Put a space before + here and GCC will accept the code
}

I get the following error on the return statement line above:

:9:14: error: unable to find numeric literal operator
'operator""_sp+1_sp'

9 | return p-42_sp+1_sp;
  |  ^~

Since ud-suffix is just an identifier in the grammar, it should not grab +
while parsing, which, according to the error, is what it seems to be doing.

Here is this snippet on Compiler Explorer: https://godbolt.org/z/4bfs6P
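
A whitespace-separated variant that should be accepted by all of the compilers mentioned is sketched below. The operators are given constexpr bodies (an addition for illustration; the original only declares them) so the result can be checked at compile time. One possible explanation, offered tentatively: the preprocessing-number grammar lets `e` or `p` followed by a sign continue a number token, so a suffix ending in `p` directly followed by `+` or `-` may be lexed as one token.

```cpp
struct s_points { unsigned long long value; };

constexpr s_points operator""_sp(unsigned long long v) { return {v}; }

// Illustrative definitions; the report only declares these operators.
constexpr s_points operator+(s_points a, s_points b) { return {a.value + b.value}; }
constexpr s_points operator-(s_points a, s_points b) { return {a.value - b.value}; }

constexpr s_points foo(s_points p)
{
  // The spaces end each literal token before the following operator.
  return p - 42_sp + 1_sp;
}

static_assert(foo(s_points{50}).value == 9, "50 - 42 + 1");
```

The same workaround applies to the `42_sp-p` case from comment #4: writing `42_sp - p` keeps the suffix and the operator in separate tokens.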

[Bug c++/97372] Segmentation fault using Tracy 0.7.3 in template class

2020-10-11 Thread public at enkore dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97372

--- Comment #4 from marian  ---
Running gcc under valgrind with the original testcase produces some interesting
output. These binaries of course barely have any symbols at all, but it might
at least provide a hint:

==882380== Invalid read of size 2
==882380==at 0x75DCA1: ??? (in
/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/cc1plus)
==882380==by 0x2: ???
==882380==by 0x771307: ??? (in
/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/cc1plus)
==882380==by 0x76F4CD: ??? (in
/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/cc1plus)
==882380==by 0x76D86E: ??? (in
/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/cc1plus)
==882380==by 0x76CFBD: ??? (in
/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/cc1plus)
==882380==by 0x76CFBD: ??? (in
/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/cc1plus)
==882380==by 0x77206F: tsubst_lambda_expr(tree_node*, tree_node*, int,
tree_node*) (in /usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/cc1plus)
==882380==by 0x7748C3: ??? (in
/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/cc1plus)
==882380==by 0x773684: ??? (in
/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/cc1plus)
==882380==by 0x774CA4: ??? (in
/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/cc1plus)
==882380==by 0x6E8EE9: get_nsdmi(tree_node*, bool, int) (in
/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/cc1plus)
==882380==  Address 0x0 is not stack'd, malloc'd or (recently) free'd

[Bug c++/97372] Segmentation fault using Tracy 0.7.3 in template class

2020-10-11 Thread public at enkore dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97372

--- Comment #3 from marian  ---
Created attachment 49344
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49344&action=edit
CVise differently reduced reproducer (possibly a different bug as well)

The attached automatically reduced reproducer retains more of the structure of
the original test case, but also strikes me as suspiciously invalid C++, so it
is probably yet another bug.

$ g++ testcase_proper.ii   
testcase_proper.ii: In instantiation of ‘a<  >::a()
[with  = int]’:
testcase_proper.ii:6:19:   required from here
testcase_proper.ii:2:25: internal compiler error: Segmentation fault
2 |   int b{[] { static int c; }};
  | ^
$ cat testcase_proper.ii
template <typename> class a {
  int b{[] { static int c; }};
public:
  a() {}
};
void d() { a<int> e; }

[Bug c++/97372] Segmentation fault using Tracy 0.7.3 in template class

2020-10-11 Thread public at enkore dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97372

--- Comment #2 from marian  ---
Created attachment 49343
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49343&action=edit
CVise reduced reproducer (possibly a different bug)

Attached testcase.ii was produced with CVise from the original reproducer. It
also triggers an ICE, but I suspect it's an entirely different ICE (testcase.ii
is not even syntactically valid C++):

testcase.ii:2:24: internal compiler error: in splice_late_return_type, at
cp/pt.c:29152
2 | template a
  |^

$ cat testcase.ii
struct a;
template a

[Bug c++/97372] Segmentation fault using Tracy 0.7.3 in template class

2020-10-11 Thread public at enkore dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97372

--- Comment #1 from marian  ---
Created attachment 49342
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49342&action=edit
pre-processed reproducer

[Bug c++/97372] New: Segmentation fault using Tracy 0.7.3 in template class

2020-10-11 Thread public at enkore dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97372

Bug ID: 97372
   Summary: Segmentation fault using Tracy 0.7.3 in template class
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: public at enkore dot de
  Target Milestone: ---

Full reproducer attached:

$ g++ gcc_segfault_repro.ii  
gcc_segfault_repro.cpp: In instantiation of ‘ThreadTask2<T>::ThreadTask2()
[with T = int]’:
gcc_segfault_repro.cpp:15:22:   required from here
gcc_segfault_repro.cpp:7:135: internal compiler error: Segmentation fault
7 | TracyLockable(std::mutex, _state_mutex);
  |
  ^

$ gcc -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure --prefix=/usr --libdir=/usr/lib
--libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info
--with-bugurl=https://bugs.archlinux.org/
--enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --with-isl
--with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit
--enable-cet=auto --enable-checking=release --enable-clocale=gnu
--enable-default-pie --enable-default-ssp --enable-gnu-indirect-function
--enable-gnu-unique-object --enable-install-libiberty --enable-linker-build-id
--enable-lto --enable-multilib --enable-plugin --enable-shared
--enable-threads=posix --disable-libssp --disable-libstdcxx-pch
--disable-libunwind-exceptions --disable-werror
gdc_include_dir=/usr/include/dlang/gdc
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.2.0 (GCC)

Offending code looks like this (using Tracy from
https://github.com/wolfpld/tracy):

#include 
#include 

template<typename T>
class ThreadTask2 {
    TracyLockable(std::mutex, _state_mutex);

public:
    ThreadTask2() {
    }
};

void x() {
    ThreadTask2<int> foo;
}

When the template is removed from ThreadTask2, GCC stops segfaulting. In the
actual codebase GCC points out a slightly more specific source location, but
this may be an artifact and unrelated to the bug itself:

Tracy.hpp:141:159: internal compiler error: Segmentation fault
  141 | #define TracyLockable( type, varname ) tracy::Lockable<type> varname {
[] () -> const tracy::SourceLocationData* { static constexpr
tracy::SourceLocationData srcloc { nullptr, #type " " #varname, __FILE__,
__LINE__, 0 }; return &srcloc; }() };
  |
   
  ^~
common.h:76:13: note: in expansion of macro ‘TracyLockable’
   76 | mutable TracyLockable(std::mutex, _state_mutex);
  |

[Bug libfortran/97063] [ MATMUL intrinsic] The value of result is wrong when vector (step size is negative) * matrix

2020-10-11 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97063

--- Comment #6 from anlauf at gcc dot gnu.org ---
Patch: https://gcc.gnu.org/pipermail/fortran/2020-October/055169.html

[Bug c/97371] evrp problem with gcc.target/s390/pr77822-2.c and -O3

2020-10-11 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97371

--- Comment #1 from David Binderman  ---
Reduced C code is

int a, b;
void c() {
  if (b >> 38)
    a = b;
}
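
Note that where `int` is 32 bits wide, the shift count 38 in the reduced code is not smaller than the width of the operand, so the shift itself has undefined behaviour; the compiler should still not ICE on it. A small sketch of that range check, using a hypothetical helper name:

```cpp
#include <climits>
#include <cstdint>

// Hypothetical helper: true when `count` is a valid shift count for type T,
// i.e. non-negative and smaller than the width of T in bits.
template <typename T>
constexpr bool shift_count_in_range(int count)
{
  return count >= 0 && count < static_cast<int>(sizeof(T) * CHAR_BIT);
}

static_assert(!shift_count_in_range<std::int32_t>(38),
              "38 is out of range for a 32-bit operand");
static_assert(shift_count_in_range<std::int64_t>(38),
              "38 is fine for a 64-bit operand");
```

This would be consistent with the backtrace in the original report, which ends in operator_rshift::op1_range, although that connection is only a guess here.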

[Bug c/97371] New: evrp problem with gcc.target/s390/pr77822-2.c and -O3

2020-10-11 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97371

Bug ID: 97371
   Summary: evrp problem with gcc.target/s390/pr77822-2.c and -O3
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: dcb314 at hotmail dot com
  Target Milestone: ---

$ /home/dcb/gcc/results/bin/gcc -c -w -O3 gcc.target/s390/pr77822-2.c
during GIMPLE pass: evrp
./gcc.target/s390/pr77822-2.c: In function ‘pos_ll_129’:
./gcc.target/s390/pr77822-2.c:307:1: internal compiler error: in verify_range,
at value-range.cc:369
  307 | }
  | ^
0xfee18c irange::verify_range()
../../trunk.git/gcc/value-range.cc:369
0x18435ab int_range<255u>::int_range(tree_node*, tree_node*, value_range_kind)
../../trunk.git/gcc/value-range.h:414
0x18435ab operator_rshift::op1_range(irange&, tree_node*, irange const&, irange
const&) const
../../trunk.git/gcc/range-op.cc:1643

This is on x86_64, but it still shouldn't crash.
The bug started sometime between 20201006 and 20201007.
I'll have a go at reducing the code.

[Bug libfortran/97063] [ MATMUL intrinsic] The value of result is wrong when vector (step size is negative) * matrix

2020-10-11 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97063

anlauf at gcc dot gnu.org changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |anlauf at gcc dot gnu.org
             Status|NEW                           |ASSIGNED
          Component|fortran                       |libfortran

--- Comment #5 from anlauf at gcc dot gnu.org ---
I have a patch.

[Bug c/97370] comedy of boolean errors for '!a & (b|c)'

2020-10-11 Thread harald at gigawatt dot nl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97370

Harald van Dijk  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |harald at gigawatt dot nl

--- Comment #1 from Harald van Dijk  ---
> * 'f' is incorrectly diagnosed even though it's the same thing as 'i' after 
> commuting the operands of '&'. ('i' is correctly allowed.)

When an expression is written as !a & b, it is possible the user intended !(a &
b). If it is rewritten as b & !a, it is clear that the user did not intend !(b
& a).

> * The diagnostic for 'f' suggests 'g', but 'g' produces the same diagnostic.

Indeed, and that looks like a bad suggestion by GCC to me. The diagnostic for
'f' should be suggesting (!a) rather than !(a), which does manage to suppress
the diagnostic.

> * The diagnostic for 'f' suggests 'h', but 'h' produces a different
> diagnostic.

Although in general, informing the user that they may have wanted to use ~ may
be useful, I personally think that suggestion should be dropped if the operand
is of type _Bool/bool. You're correct that bool & ~bool will have the intended
result but my opinion is that that is overly clever code that hurts
readability, and GCC should not be offering that as a suggestion.
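
For reference, the equivalence being discussed can be checked exhaustively over all eight inputs. The sketch below is C++ with `bool` rather than C with `_Bool`, uses the `(!a)` spelling that suppresses the -Wparentheses diagnostic, and the function names are illustrative only:

```cpp
// The form from 'f', written with the parentheses suggested above.
constexpr bool f_form(bool a, bool b, bool c) { return (!a) & (b | c); }

// The form from 'i', with the operands of '&' commuted.
constexpr bool i_form(bool a, bool b, bool c) { return (b | c) & !a; }

// The same condition written with logical operators.
constexpr bool logical_form(bool a, bool b, bool c) { return !a && (b || c); }

// Compare the three forms for every combination of a, b and c.
constexpr bool all_forms_agree()
{
  for (int bits = 0; bits < 8; ++bits)
    {
      const bool a = bits & 1, b = bits & 2, c = bits & 4;
      if (f_form(a, b, c) != i_form(a, b, c)
          || f_form(a, b, c) != logical_form(a, b, c))
        return false;
    }
  return true;
}

static_assert(all_forms_agree(), "all three spellings compute the same value");
```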

[Bug libstdc++/97369] undefined reference to std::_***""

2020-10-11 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97369

--- Comment #2 from Jonathan Wakely  ---
If that's a linker error not a run time error, then it looks like you're not
using the right GCC to link. It could be that you're compiling with GCC 6.3.0
but then using a different GCC to link, which doesn't have the required __cxx11
symbols in its libstdc++.so
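
A quick way to see which string ABI a given compiler invocation selects is to inspect the libstdc++ macro directly. This is only a sketch: the function name is made up, and it reports the setting for the translation unit being compiled, not for prebuilt libraries such as MATLAB's, which still need the `std::__cxx11` symbols from a sufficiently new libstdc++.so at link time.

```cpp
#include <string>  // any libstdc++ header pulls in the configuration macros

// Illustrative helper: 1 for the new (std::__cxx11) ABI, 0 for the old ABI,
// -1 when the macro is not defined (not libstdc++, or a very old release).
constexpr int cxx11_abi_setting()
{
#if defined(_GLIBCXX_USE_CXX11_ABI)
  return _GLIBCXX_USE_CXX11_ABI;
#else
  return -1;
#endif
}

static_assert(cxx11_abi_setting() >= -1 && cxx11_abi_setting() <= 1,
              "the setting is one of -1, 0 or 1");
```

Compiling this with and without -D_GLIBCXX_USE_CXX11_ABI=0 shows whether the flag reaches the compiler, which is separate from the question of which libstdc++.so the linker finds.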

[Bug libstdc++/97369] undefined reference to std::_***""

2020-10-11 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97369

Jonathan Wakely  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2020-10-11
     Ever confirmed|0                           |1

--- Comment #1 from Jonathan Wakely  ---
You haven't provided sufficient information to tell, but I doubt there's a bug
here.

Please read these links and confirm if you're using the right libstdc++.so at
run time (the one from your self-compiled GCC 6.3.0, not the system compiler):
https://gcc.gnu.org/onlinedocs/libstdc++/faq.html#faq.how_to_set_paths
https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dynamic_or_shared.html#manual.intro.using.linkage.dynamic

I've said it before, but the fact that matlab 2020 only works with a GCC
version that has been unsupported for two years is stupid.

[Bug c/97370] New: comedy of boolean errors for '!a & (b|c)'

2020-10-11 Thread eggert at cs dot ucla.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97370

Bug ID: 97370
   Summary: comedy of boolean errors for '!a & (b|c)'
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: eggert at cs dot ucla.edu
  Target Milestone: ---

I ran into this problem while compiling a proposed patch for GNU grep.

For the following program a.c:

_Bool f (_Bool a, _Bool b, _Bool c) { return !a & (b|c); }
_Bool g (_Bool a, _Bool b, _Bool c) { return !(a) & (b|c); }
_Bool h (_Bool a, _Bool b, _Bool c) { return ~a & (b|c); }
_Bool i  (_Bool a, _Bool b, _Bool c) { return (b|c) & !a; }

The command 'gcc -Wall -S a.c' generates bogus diagnostics for 'f', 'g', and
'h' (see the diagnostics at the end of this comment).

* 'f' is incorrectly diagnosed even though it's the same thing as 'i' after
commuting the operands of '&'. ('i' is correctly allowed.)

* The diagnostic for 'f' suggests 'g', but 'g' produces the same diagnostic.

* The diagnostic for 'f' suggests 'h', but 'h' produces a different
diagnostic. I understand why 'bool = ~bool' should be diagnosed (bug#77490),
but 'h' should not be diagnosed since 'bool & ~bool' always has the usual
boolean interpretation.

I finally ended up using the equivalent of 'i' in GNU grep, but I should have
been able to use any of 'f', 'g', or 'h' without worrying about generating a
bogus warning.

Here are the bogus diagnostics in question:


a.c: In function 'f':
a.c:1:46: warning: suggest parentheses around operand of '!' or change '&' to
'&&' or '!' to '~' [-Wparentheses]
1 | _Bool f (_Bool a, _Bool b, _Bool c) { return !a & (b|c); }
  |  ^~
a.c: In function 'g':
a.c:2:46: warning: suggest parentheses around operand of '!' or change '&' to
'&&' or '!' to '~' [-Wparentheses]
2 | _Bool g (_Bool a, _Bool b, _Bool c) { return !(a) & (b|c); }
  |  ^~~~
a.c: In function 'h':
a.c:3:46: warning: '~' on a boolean expression [-Wbool-operation]
3 | _Bool h (_Bool a, _Bool b, _Bool c) { return ~a & (b|c); }
  |  ^
a.c:3:46: note: did you mean to use logical not?
3 | _Bool h (_Bool a, _Bool b, _Bool c) { return ~a & (b|c); }
  |  ^
  |  !

[Bug target/97366] [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics

2020-10-11 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97366

Alexander Monakov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #2 from Alexander Monakov  ---
Intrinsics being type-agnostic cause vector subregs to appear before register
allocation: the pseudo coming from the load has mode V2DI, the shift needs to
be done in mode V4SI, the bitwise-or and the store are done in mode V2DI again.
Subreg in the bitwise-or appears to be handled inefficiently. Didn't dig deeper
as to what happens during allocation.

FWIW, using generic vectors avoids introducing such mismatches, and
indeed the variant coded with generic vectors does not have extra loads. For
your original code you'll have to convert between generic vectors and __m128i
to use the shuffle intrinsic. The last paragraphs in the "Vector Extensions"
chapter [1] suggest using a union for that purpose in C; in C++ reinterpreting
via union is formally UB, so another approach could be used (probably simply
converting via assignment).

[1] https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html

#include <stdint.h>

typedef uint32_t u32v4 __attribute__((vector_size(16)));
void gcc_double_load_128(int8_t *__restrict out, const int8_t *__restrict input)
{
    u32v4 *vin = (u32v4 *)input;
    u32v4 *vout = (u32v4 *)out;
    for (unsigned i=0 ; i<1024; i+=16) {
        u32v4 in = *vin++;
        *vout++ = in | (in >> 4);
    }
}

Above code on Compiler Explorer: https://godbolt.org/z/MKPvxb

[Bug libstdc++/97369] New: undefined reference to std::_***""

2020-10-11 Thread xianping.du at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97369

Bug ID: 97369
   Summary: undefined reference to std::_***""
   Product: gcc
   Version: 6.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xianping.du at gmail dot com
  Target Milestone: ---

I compiled the code with a manually built GCC-6.3.0, but got the following
errors. The code is compiled against the Matlab Simulink API, so it needs to
link with Matlab. I did use
'-DCMAKE_CXX_FLAGS='-D_GLIBCXX_USE_CXX11_ABI=0'', but it did not help with this
issue. The code can be compiled with the system GCC 7.3.0, but that compiler is
not compatible with MATLAB 2020a. Do you have any suggestions on how to compile
this code with GCC-6.3.0? Thank you

"/cache/sw/packages/MATLAB/R2020a/bin/glnxa64/libmex.so: undefined reference to
`VTT for std::__cxx11::basic_ostringstream,
std::allocator >@GLIBCXX_3.4.21'
/cache/sw/packages/MATLAB/R2020a/bin/glnxa64/libmwmlutil.so: undefined
reference to `std::__cxx11::basic_stringstream,
std::allocator >::basic_stringstream(std::__cxx11::basic_string, std::allocator > const&,
std::_Ios_Openmode)@GLIBCXX_3.4.21'
/cache/sw/packages/MATLAB/R2020a/bin/glnxa64/libmwboost_log.so.1.70.0:
undefined reference to `typeinfo for std::codecvt@GLIBCXX_3.4.21'
/cache/sw/packages/MATLAB/R2020a/bin/glnxa64/libmwfl.so: undefined reference to
`std::ios_base::failure[abi:cxx11]::~failure()@GLIBCXX_3.4.21'
/cache/sw/packages/MATLAB/R2020a/bin/glnxa64/libmwflnetwork.so: undefined
reference to `std::__cxx11::basic_string,
std::allocator >::erase(unsigned long, unsigned long)@GLIBCXX_3.4.21'
/cache/sw/packages/MATLAB/R2020a/bin/glnxa64/libmwboost_regex.so.1.70.0:
undefined reference to `std::__cxx11::messages const&
std::use_facet >(std::locale
const&)@GLIBCXX_3.4.21'
/cache/sw/packages/MATLAB/R2020a/bin/glnxa64/libmwi18n.so: undefined reference
to `std::__cxx11::basic_string,
std::allocator >::_M_construct(unsigned long, wchar_t)@GLIBCXX_3.4.21'
/cache/sw/packages/MATLAB/R2020a/bin/glnxa64/libmwservices.so: undefined
reference to `std::out_of_range::out_of_range(std::__cxx11::basic_string, std::allocator > const&)@GLIBCXX_3.4.21'
/cache/sw/packages/MATLAB/R2020a/bin/glnxa64/libmex.so: undefined reference to
`std::__cxx11::basic_string, std::allocator
>::compare(std::__cxx11::basic_string,
std::allocator > const&) const@GLIBCXX_3.4.21'
/cache/sw/packages/MATLAB/R2020a/bin/glnxa64/libmex.so: undefined reference to
`std::__cxx11::basic_string, std::allocator
>::_M_create(unsigned long&, unsigned long)@GLIBCXX_3.4.21'
/cache/sw/packages/MATLAB/R2020a/bin/glnxa64/libmwflnetwork.so: undefined
reference to `std::__cxx11::basic_string,
std::allocator >::rfind(wchar_t, unsigned long) const@GLIBCXX_3.4.21'
/cache/sw/packages/MATLAB/R2020a/bin/glnxa64/libmex.so: undefined reference to
`std::logic_error::logic_error(std::__cxx11::basic_string, std::allocator > const&)@GLIBCXX_3.4.21'
/cache/sw/packages/MATLAB/R2020a/bin/glnxa64/libmwboost_regex.so.1.70.0:
undefined reference to `std::overflow_error::overflow_error(char
const*)@GLIBCXX_3.4.21'
/cache/sw/packages/MATLAB/R2020a/bin/glnxa64/libmwboost_log.so.1.70.0:
undefined reference to `std::__cxx11::basic_string, std::allocator
>::~basic_string()@GLIBCXX_3.4.21'
/cache/sw/packages/MATLAB/R2020a/bin/glnxa64/libmex.so: undefined reference to
`operator delete(void*, unsigned long)@CXXABI_1.3.9'
/cache/sw/packages/MATLAB/R2020a/bin/glnxa64/libmwboost_log.so.1.70.0:
undefined reference to `std::__cxx11::basic_string, std::allocator >::_M_append(wchar_t const*,
unsigned long)@GLIBCXX_3.4.21'
/cache/sw/packages/MATLAB/R2020a/bin/glnxa64/libCppMicroServices.so.3.3.5:
undefined reference to `std::__cxx11::basic_string, std::allocator >::find_last_not_of(char, unsigned
long) const@GLIBCXX_3.4.21'
/cache/sw/packages/MATLAB/R2020a/bin/glnxa64/libmwservices.so: undefined
reference to `std::__cxx11::basic_stringbuf,
std::allocator >::pbackfail(int)@GLIBCXX_3.4.21'"

[Bug tree-optimization/71414] 2x slower than clang summing small float array, GCC should consider larger vectorization factor for "unrolling" reductions

2020-10-11 Thread freddie at witherden dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71414

Freddie Witherden  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |freddie at witherden dot org

--- Comment #11 from Freddie Witherden  ---
I've been looking into this and the big difference appears to be that when
Clang unrolls the loop it does so using multiple accumulators (and indeed does
this without needing to be told to unroll).  Given:

double acc(double *x, int n)
{
    double a = 0;
#pragma omp simd
    for (int i = 0; i < n; i++)
        a += x[i];
    return a;
}


and compiling with clang -march=native -Ofast -fopenmp -S the core loop reads
as:

vaddpd  (%rdi,%rsi,8), %ymm0, %ymm0
vaddpd  32(%rdi,%rsi,8), %ymm1, %ymm1
vaddpd  64(%rdi,%rsi,8), %ymm2, %ymm2
vaddpd  96(%rdi,%rsi,8), %ymm3, %ymm3
vaddpd  128(%rdi,%rsi,8), %ymm0, %ymm0
vaddpd  160(%rdi,%rsi,8), %ymm1, %ymm1
vaddpd  192(%rdi,%rsi,8), %ymm2, %ymm2
vaddpd  224(%rdi,%rsi,8), %ymm3, %ymm3
vaddpd  256(%rdi,%rsi,8), %ymm0, %ymm0
vaddpd  288(%rdi,%rsi,8), %ymm1, %ymm1
vaddpd  320(%rdi,%rsi,8), %ymm2, %ymm2
vaddpd  352(%rdi,%rsi,8), %ymm3, %ymm3
vaddpd  384(%rdi,%rsi,8), %ymm0, %ymm0
vaddpd  416(%rdi,%rsi,8), %ymm1, %ymm1
vaddpd  448(%rdi,%rsi,8), %ymm2, %ymm2
vaddpd  480(%rdi,%rsi,8), %ymm3, %ymm3

which is heavily unrolled and uses four separate accumulators to hide the
latency of the vector adds.  Interestingly, one could argue that Clang is not
using enough registers given that Skylake can dual-issue adds and they have a
latency of 4 cycles (implying you want 8 separate accumulators).

GCC 10 with gcc -march=skylake -Ofast -fopenmp -S test.c -funroll-loops gives:

vaddpd  -224(%r8), %ymm1, %ymm2
vaddpd  -192(%r8), %ymm2, %ymm3
vaddpd  -160(%r8), %ymm3, %ymm4
vaddpd  -128(%r8), %ymm4, %ymm5
vaddpd  -96(%r8), %ymm5, %ymm6
vaddpd  -64(%r8), %ymm6, %ymm7
vaddpd  -32(%r8), %ymm7, %ymm0

which, although unrolled, is not a useful unrolling due to the dependency
chain.  Indeed, I would not be surprised if the performance is similar to the
unrolled code as the loop-related cruft can be hidden.
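
As a rough illustration (hand-written, not taken from either compiler's
output), the multiple-accumulator idea can be sketched in plain C.  The name
acc4 and the choice of four partial sums are assumptions made for this sketch;
by the latency argument above (2 adds per cycle times 4 cycles of latency),
eight might be the better number on Skylake.  Note that this reassociates the
floating-point additions, so the result can differ in the last bits from a
strict left-to-right sum, which is why a compiler only does it under -Ofast or
-ffast-math.

```c
/* Sum x[0..n-1] using four independent partial sums.  Each partial sum
   forms its own dependency chain, so consecutive adds do not have to
   wait for each other's latency. */
double acc4(const double *x, int n)
{
    double a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    int i = 0;

    /* Main loop: four independent additions per iteration. */
    for (; i <= n - 4; i += 4) {
        a0 += x[i];
        a1 += x[i + 1];
        a2 += x[i + 2];
        a3 += x[i + 3];
    }

    /* Combine the partial sums, then handle the remaining 0..3 elements. */
    double a = (a0 + a1) + (a2 + a3);
    for (; i < n; i++)
        a += x[i];
    return a;
}
```

In the GCC output above every vaddpd consumes the result of the previous one,
which corresponds to the single variable a in the original acc(); the sketch
splits that chain into four.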

[Bug fortran/96655] [OOP] CLASS dummy arguments: Bogus "Duplicate OPTIONAL attribute specified"

2020-10-11 Thread dominiq at lps dot ens.fr via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96655

Dominique d'Humieres  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P4
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-10-11
     Ever confirmed|0                           |1

--- Comment #1 from Dominique d'Humieres  ---
Confirmed since at least GCC7.

[Bug target/97323] [10/11 Regression] ICE 'verify_type' failed on arm-linux-gnueabihf

2020-10-11 Thread pedretti.fabio at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97323

Fabio  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pedretti.fabio at gmail dot com

--- Comment #3 from Fabio  ---
*** Bug 97368 has been marked as a duplicate of this bug. ***

[Bug c/97368] randomly build failure for mesa with lto on armhf

2020-10-11 Thread pedretti.fabio at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97368

Fabio  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #2 from Fabio  ---
OK, this is a known bug:
https://bugs.launchpad.net/ubuntu/+source/gcc-10/+bug/1890435
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97323

Closing as duplicate.

*** This bug has been marked as a duplicate of bug 97323 ***

[Bug c/97368] randomly build failure for mesa with lto on armhf

2020-10-11 Thread pedretti.fabio at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97368

Fabio  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|lto                         |c

--- Comment #1 from Fabio  ---
From the error message I supposed it was an LTO problem; however, even when
disabling LTO I still get the error.
Note the "The bug is not reproducible, so it is likely a hardware or OS
problem." which is part of the log.
Posting here anyway.

during RTL pass: reload
../src/gallium/drivers/radeonsi/si_shader_llvm_vs.c: In function
‘si_llvm_build_vs_exports’:
../src/gallium/drivers/radeonsi/si_shader_llvm_vs.c:692:1: internal compiler
error: Segmentation fault
  692 | }
  | ^
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
The bug is not reproducible, so it is likely a hardware or OS problem.

Full build log at:
https://launchpadlibrarian.net/501604133/buildlog_ubuntu-groovy-armhf.mesa_20.3~git2010111520.734693~oibaf~g_BUILDING.txt.gz

[Bug lto/97368] New: randomly build failure for mesa with lto on armhf

2020-10-11 Thread pedretti.fabio at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97368

Bug ID: 97368
   Summary: randomly build failure for mesa with lto on armhf
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pedretti.fabio at gmail dot com
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

When building mesa git (actually on b7d16a) I get random build failures on
armhf.
gcc is the one currently on Ubuntu groovy (10.2.0-13ubuntu1).

I don't get this build error with gcc 9, or on other architectures (amd64 arm64
i386 ppc64el s390x) even with the same gcc 10.

This is the actual error:

during RTL pass: reload
../src/compiler/nir/nir_loop_analyze.h: In function ‘contains_other_jump’:
../src/compiler/nir/nir_loop_analyze.h:71:1: internal compiler error:
Segmentation fault
   71 | }
  | ^
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
lto-wrapper: fatal error: c++ returned 1 exit status
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status

Full build log here:
https://launchpadlibrarian.net/501568575/buildlog_ubuntu-groovy-armhf.mesa_20.3~git2010110730.b7d16a~oibaf~g_BUILDING.txt.gz

[Bug libstdc++/70358] Several 26_numerics/random/binomial_distribution/operators etc. tests FAIL

2020-10-11 Thread ro at CeBiTec dot Uni-Bielefeld.DE via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70358

--- Comment #3 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
> --- Comment #2 from Jonathan Wakely  ---
> Rainer, what's the status of this one? Are those tests still UNSUPPORTED, or
> now PASSing?

Looking back at old testresults, the tests were FAILing on the gcc-5
branch until 20161028, on the gcc-4.9 branch until 20160408.  On current
master, they all seem to PASS just fine.

[Bug libstdc++/63332] problem with VERIFY in ext/random/k_distribution/operators/serialize.cc execution test

2020-10-11 Thread ro at CeBiTec dot Uni-Bielefeld.DE via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63332

--- Comment #11 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
> --- Comment #10 from Jonathan Wakely  ---
> Looks like this is still failing for solaris 11:
> https://gcc.gnu.org/pipermail/gcc-testresults/2020-October/610818.html

True.  However, the fact that the test only FAILs on Solaris 11.4/x86,
but not on 11.3 nor on sparc, makes me wonder if this might not be
another instance of PR fortran/94324.

[Bug target/97367] New: powerpc64 g5 and cell optimizations result in .machine power7

2020-10-11 Thread rene at exactcode dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97367

Bug ID: 97367
   Summary: powerpc64 g5 and cell optimizations result in .machine
power7
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rene at exactcode dot de
  Target Milestone: ---

Created attachment 49341
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49341&action=edit
Patch

Since the rework of the rs6000 .machine output selection in commit
e154242724b084380e3221df7c08fcdbd8460674 (22 May 2019), compiling glibc with
either G5 or cell causes the power7 assembly optimizations to be chosen, which
obviously crash with illegal instructions. This is because gcc's .machine
output was accidentally changed: OPTION_MASK_ALTIVEC, which G5 and cell also
set, is otherwise only present in IBM CPUs since power7.

powerpc64-t2-linux-gnu-gcc  test.c -S -o - -mcpu=G5
        .file   "test.c"
        .machine power7
        .abiversion 2
        .section        ".text"
        .ident  "GCC: (GNU) 10.2.0"
        .section        .note.GNU-stack,"",@progbits

Attached patch fixes this to filter out ALTIVEC just like GFXOPT and GPOPT.

[Bug target/97366] [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics

2020-10-11 Thread peter at cordes dot ca via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97366

--- Comment #1 from Peter Cordes  ---
Forgot to include https://godbolt.org/z/q44r13

[Bug target/97366] New: [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics

2020-10-11 Thread peter at cordes dot ca via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97366

Bug ID: 97366
   Summary: [8/9/10/11 Regression] Redundant load with SSE/AVX
vector intrinsics
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: peter at cordes dot ca
  Target Milestone: ---

When you use the same _mm_load_si128 or _mm256_load_si256 result twice,
sometimes GCC loads it *and* uses it as a memory source operand.

I'm not certain this is specific to x86 back-ends, please check bug tags if it
happens elsewhere.  (But it probably doesn't on 3-operand load/store RISC
machines; it looks like one operation chooses to load and then operate, the
other chooses to use the original source as a memory operand.)

#include <stdint.h>
#include <immintrin.h>
void gcc_double_load_128(int8_t *__restrict out, const int8_t *__restrict input)
{
    for (unsigned i=0 ; i<1024 ; i+=16){
        __m128i in = _mm_load_si128((__m128i*)&input[i]);
        __m128i high = _mm_srli_epi32(in, 4);
        _mm_store_si128((__m128i*)&out[i], _mm_or_si128(in,high));
    }
}

gcc 8 and later -O3 -mavx2, including 11.0.0 20200920, with 

gcc_double_load_128(signed char*, signed char const*):
        xorl    %eax, %eax
.L6:
        vmovdqa (%rsi,%rax), %xmm1         # load
        vpsrld  $4, %xmm1, %xmm0
        vpor    (%rsi,%rax), %xmm0, %xmm0  # reload as a memory operand
        vmovdqa %xmm0, (%rdi,%rax)
        addq    $16, %rax
        cmpq    $1024, %rax
        jne     .L6
        ret

GCC 7.5 and earlier use  vpor %xmm1, %xmm0, %xmm0  to reuse the copy of the
original that was already loaded.

`-march=haswell` happens to fix this for GCC trunk, for this 128-bit version
but not for a __m256i version.

restrict doesn't make a difference, and there's no overlapping anyway.  The two
redundant loads both happen between any other stores.

Using a memory source operand for vpsrld wasn't an option: the form with a
memory source takes the *count* from  memory, not the data. 
https://www.felixcloutier.com/x86/psllw:pslld:psllq
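
For reference, here is a scalar sketch of what the loop computes: for each
32-bit lane, out = in | (in >> 4), with a logical shift as in _mm_srli_epi32.
The name scalar_reference is made up for this illustration; it is only meant
as a model to check the intrinsic version against, and it says nothing about
the code GCC generates.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of gcc_double_load_128: process the 1024-byte buffers
   one 32-bit lane at a time. */
void scalar_reference(int8_t *out, const int8_t *input)
{
    for (unsigned i = 0; i < 1024; i += 4) {
        uint32_t in;
        memcpy(&in, &input[i], sizeof in);    /* load one 32-bit lane */
        uint32_t res = in | (in >> 4);        /* unsigned, so a logical shift */
        memcpy(&out[i], &res, sizeof res);    /* store the lane */
    }
}
```

The value in is needed twice (once shifted, once unshifted), which is the
same double use that makes GCC choose between keeping a register copy and
reloading from memory.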



Note that *without* AVX, the redundant load is a possible win, for code running
on Haswell and later Intel (and AMD) CPUs.  Possibly some heuristic is saving
instructions for the legacy-SSE case (in a way that's probably worse overall)
and hurting the AVX case.

GCC 7.5, -O3  without any -m options
gcc_double_load_128(signed char*, signed char const*):
        xorl    %eax, %eax
.L2:
        movdqa  (%rsi,%rax), %xmm0
        movdqa  %xmm0, %xmm1         # this instruction avoided
        psrld   $4, %xmm1
        por     %xmm1, %xmm0         # with a memory source reload, in GCC8 and later
        movaps  %xmm0, (%rdi,%rax)
        addq    $16, %rax
        cmpq    $1024, %rax
        jne     .L2
        rep ret


Using a memory-source POR saves 1 front-end uop by avoiding a register-copy, as
long as the indexed addressing mode can stay micro-fused on Intel.  (Requires
Haswell or later for that to happen, or any AMD.)  But in practice it's
probably worse.  Load-port pressure and space in the out-of-order scheduler,
as well as code size, are problems for using an extra memory-source operand in
the SSE version, with the upside being saving 1 uop for the front-end.  (And
thus in the ROB.)  mov-elimination on modern CPUs means the movdqa register
copy costs no back-end resources (ivybridge and bdver1).

I don't know if GCC trunk is using por  (%rsi,%rax), %xmm0  on purpose for that
reason, or if it's just a coincidence.
I don't think it's a good idea on most CPUs, even if alignment is guaranteed.

This is of course 100% a loss with AVX; we have to `vmovdqa/u` load for the
shift, and it can leave the original value in a register so we're not saving a
vmovdqa.  And it's a bigger loss because indexed memory-source operands
unlaminate from 3-operand instructions even on Haswell/Skylake:
https://stackoverflow.com/questions/26046634/micro-fusion-and-addressing-modes/31027695#31027695
so it hurts the front-end as well as wasting cycles on load ports, and taking
up space in the RS (scheduler).

The fact that -mtune=haswell fixes this for 128-bit vectors is interesting, but
it's clearly still a loss in the AVX version for all AVX CPUs.  2 memory ops /
cycle on Zen could become a bottleneck, and it's larger code size.  And
-mtune=haswell *doesn't* fix it for the -mavx2 __m256i version.

There is a possible real advantage in the SSE case, but it's very minor and
outweighed by disadvantages.  Especially for older CPUs like Nehalem that can
only do 1 load / 1 store per clock.  (Although this has so many uops in the
loop that it barely bottlenecks on that.)