[Bug target/96941] New: Initial PPC64LE transcendental auto-vectorization functionality

2020-09-04 Thread dje at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96941

Bug ID: 96941
   Summary: Initial PPC64LE transcendental auto-vectorization
functionality
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: dje at gcc dot gnu.org
  Target Milestone: ---
Target: powerpc64le-*-linux

Demonstrate basic auto-vectorization of single- and double-precision
transcendental functions using libmvec.

[Bug c++/95164] [9/10/11 Regression] ICE regression starting with 9.3

2020-09-04 Thread mpolacek at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95164

Marek Polacek  changed:

   What|Removed |Added

   Keywords||patch

--- Comment #4 from Marek Polacek  ---
Patch posted:
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553311.html

[Bug gcov-profile/96913] gcc-11: __gcov_merge_topn hangs

2020-09-04 Thread slyfox at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96913

--- Comment #3 from Sergei Trofimovich  ---
Specifically, I think the format is already wrong on disk:

> _json.gcda:01a7:   0:COUNTERS topn 0 counts
> _json.gcda:01a9:  48:COUNTERS indirect_call 24 counts
> _json.gcda:   0: 1 1 140325305737168 1 1 140325305737200 0 0
> _json.gcda:   8: 0 0 0 0 0 0 0 0
> _json.gcda:  16: 0 0 0 0 0 0 0 0
> ...

Assuming indirect_call is in a 'hist' value format, it should be in the form:

  [total_executions, N, value1, counter1, ..., valueN, counterN]

Main problem: we have more than one entry here (which might be ok):
- record1 (ok):  total_executions=1 N=1 value1=140325305737168 counter1=1
- record2 (bad): total_executions=1 N=140325305737200 counter=0 ...

This is where we trip over enormous N.
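
To make the misparse above concrete, here is a minimal sketch of a bounds-checked
walk over records in the [total_executions, N, value1, counter1, ...] layout.
count_hist_records is a hypothetical helper written for illustration only; it is
not part of libgcov and is not a proposed fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libgcov: walk a counter array laid out as
   [total_executions, N, value1, counter1, ..., valueN, counterN] records.
   Return the number of well-formed records, or -1 if some record's N claims
   more (value, counter) pairs than the remaining words can hold.  Trailing
   words too short to form a record header are ignored.  */
static int
count_hist_records (const int64_t *words, size_t n_words)
{
  size_t pos = 0;
  int records = 0;

  assert (words != NULL);

  while (pos + 2 <= n_words)
    {
      int64_t n = words[pos + 1];
      size_t remaining = n_words - (pos + 2);

      /* Each of the N entries occupies two words.  */
      if (n < 0 || (uint64_t) n > remaining / 2)
        return -1;

      pos += 2 + 2 * (size_t) n;
      records++;
    }
  return records;
}
```

Fed the 24 words from the dump above, the second record's N is 140325305737200,
far more than the 18 words that remain, so the walk is rejected instead of
looping.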

[Bug d/96924] d: ICE in create_tmp_var, at gimple-expr.c:482

2020-09-04 Thread ibuclaw at gdcproject dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96924

Iain Buclaw  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Iain Buclaw  ---
Fix committed.

[Bug d/96924] d: ICE in create_tmp_var, at gimple-expr.c:482

2020-09-04 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96924

--- Comment #2 from CVS Commits  ---
The releases/gcc-10 branch has been updated by Iain Buclaw:

https://gcc.gnu.org/g:40af8b2eff82f28d83b2a5fe153cbc53af665956

commit r10-8711-g40af8b2eff82f28d83b2a5fe153cbc53af665956
Author: Iain Buclaw 
Date:   Fri Sep 4 22:54:22 2020 +0200

d: Fix ICE in create_tmp_var, at gimple-expr.c:482

Array concatenate expressions were creating more SAVE_EXPRs than what
was necessary.  The internal error itself was the result of a forced
temporary being made on a TREE_ADDRESSABLE type.

gcc/d/ChangeLog:

PR d/96924
* expr.cc (ExprVisitor::visit (CatAssignExp *)): Don't force
temporaries needlessly.

gcc/testsuite/ChangeLog:

PR d/96924
* gdc.dg/pr96924.d: New test.

(cherry picked from commit 52908b8de15a1c762a73063f1162bcedfcc993b4)

[Bug d/96924] d: ICE in create_tmp_var, at gimple-expr.c:482

2020-09-04 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96924

--- Comment #1 from CVS Commits  ---
The master branch has been updated by Iain Buclaw :

https://gcc.gnu.org/g:f8eabd47ac5335ebab0d83ff61fb680a46888be8

commit r11-3015-gf8eabd47ac5335ebab0d83ff61fb680a46888be8
Author: Iain Buclaw 
Date:   Fri Sep 4 22:54:22 2020 +0200

d: Fix ICE in create_tmp_var, at gimple-expr.c:482

Array concatenate expressions were creating more SAVE_EXPRs than what
was necessary.  The internal error itself was the result of a forced
temporary being made on a TREE_ADDRESSABLE type.

gcc/d/ChangeLog:

PR d/96924
* expr.cc (ExprVisitor::visit (CatAssignExp *)): Don't force
temporaries needlessly.

gcc/testsuite/ChangeLog:

PR d/96924
* gdc.dg/simd13927b.d: Removed.
* gdc.dg/pr96924.d: New test.

[Bug preprocessor/96940] ICE in linemap_compare_locations, at libcpp/line-map.c:1359

2020-09-04 Thread jan.smets at nokia dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96940

--- Comment #2 from Jan Smets  ---
This is the workaround I currently have. It avoids calling min_location().

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 90111e4c786..f49019e81d0 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -11005,8 +11005,11 @@ grokdeclarator (const cp_declarator *declarator,
   if (initialized > 1)
 funcdef_flag = true;

-  location_t typespec_loc = smallest_type_location (type_quals,
+  location_t typespec_loc = smallest_type_quals_location (type_quals,
declspecs->locations);
+  // using smallest_type_quals_location() instead of smallest_type_location()
+  //  basically removes the usage of min_location on the result of smallest_type_quals_location().
+  // typespec_loc = min_location (typespec_loc, declspecs->locations[ds_type_spec]);
   if (typespec_loc == UNKNOWN_LOCATION)
 typespec_loc = input_location;

[Bug target/96939] LTO vs. different arm arch options

2020-09-04 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96939

--- Comment #7 from Jakub Jelinek  ---
AFAIK targetm.override_options_after_change is called at the end of switching
optimization (but not target) options.
So, that is a good hook to e.g. adjust something cached from those non-target
Optimization options.
targetm.target_option.save is called during cl_target_option_save,
targetm.target_option.restore during cl_target_option_restore.
Then there is the targetm.set_current_function hook that is called during
set_cfun, i.e. when switching functions.

So, just from a quick look, it seems wrong that arm
targetm.override_options_after_change calls arm_configure_build_target which is
a function which deals with target specific options (and furthermore it calls
it with the target_option_default_node node, so the command line options rather
than whatever options the function has).

I see that the rationale is probably that you want TARGET_THUMB to be reliable
for the arm_override_options_after_change_1, but I'd think you instead want to
just call the _1 and nothing else in there, and in arm_set_current_function
(perhaps at the end of it) call it again, so it is updated properly even if the
target options change.  And maybe also call it at the start of
arm_set_current_function even when nothing changed if that isn't sufficient.

I think it was a mistake to have separate OPTIMIZATION_NODE and
TARGET_OPTION_NODE, in retrospect I think that causes a lot of pain and I think
it would be better if there was just one that would hold both and would be
updated together, then have generic code deal with those changes and afterwards
a target hook.
Because when they are separate, at one time only one of them changes, some
hooks are run, then the other one changes and some options/cached variables
etc. can be dependent on both Optimization and Target options.
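
The ordering hazard described above can be sketched with a toy model. None of
the names below exist in GCC; toy_cached_align merely stands in for any cached
variable derived from both an Optimization and a Target option. The model
assumes, purely for illustration, that the cache is refreshed after an
optimization-option switch but not after a target-option switch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these types and functions are hypothetical and do not
   exist in GCC.  */
struct toy_opt_node { bool optimize_size; };
struct toy_target_node { bool thumb; };

static struct toy_opt_node toy_cur_opt;
static struct toy_target_node toy_cur_target;
static int toy_cached_align;	/* Derived from BOTH nodes.  */

static int
toy_compute_align (void)
{
  if (!toy_cur_target.thumb)
    return 4;
  return toy_cur_opt.optimize_size ? 1 : 2;
}

static bool
toy_cache_is_fresh (void)
{
  return toy_cached_align == toy_compute_align ();
}

/* Optimization options change; the cache is refreshed, but it still sees
   the OLD target options.  */
static void
toy_switch_opt (bool optimize_size)
{
  toy_cur_opt.optimize_size = optimize_size;
  toy_cached_align = toy_compute_align ();
}

/* Target options change afterwards; nothing refreshes the cache.  */
static void
toy_switch_target (bool thumb)
{
  toy_cur_target.thumb = thumb;
}

/* A single combined node: update both, then refresh once.  */
static void
toy_switch_both (bool optimize_size, bool thumb)
{
  toy_cur_opt.optimize_size = optimize_size;
  toy_cur_target.thumb = thumb;
  toy_cached_align = toy_compute_align ();
  assert (toy_cache_is_fresh ());
}
```

Calling toy_switch_opt (true) followed by toy_switch_target (true) leaves the
cache at the value computed for the old target, while toy_switch_both (true,
true) cannot get out of sync.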

[Bug target/85830] vec_popcntd is improperly defined in altivec.h

2020-09-04 Thread carll at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85830

Carl Love  changed:

   What|Removed |Added

 Status|RESOLVED|CLOSED

--- Comment #9 from Carl Love  ---
Issue fixed, closing.

[Bug target/85830] vec_popcntd is improperly defined in altivec.h

2020-09-04 Thread carll at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85830

Carl Love  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from Carl Love  ---
The fix has been applied to the current mainline and backported to GCC 10. 
Closing the bug as fixed.

[Bug target/85830] vec_popcntd is improperly defined in altivec.h

2020-09-04 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85830

--- Comment #7 from CVS Commits  ---
The releases/gcc-10 branch has been updated by Carl Love :

https://gcc.gnu.org/g:e86814328251ea7da83038605df01d8def8d873a

commit r10-8710-ge86814328251ea7da83038605df01d8def8d873a
Author: Carl Love 
Date:   Thu Aug 27 13:36:13 2020 -0500

rs6000, remove improperly defined and unsupported builtins.

gcc/ChangeLog

2020-08-31  Carl Love  

PR target/85830
* config/rs6000/altivec.h (vec_popcntb, vec_popcnth, vec_popcntw,
vec_popcntd): Remove defines.

[Bug target/96939] LTO vs. different arm arch options

2020-09-04 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96939

--- Comment #6 from Richard Earnshaw  ---
(In reply to Jakub Jelinek from comment #4)
> Doesn't seem to be related to me, in the other PR everything is compiled
> with one set of options and no target attribute is involved either.

No, that's a completely different problem.  The problem there is some calls to
the back-end pass a fntype and some a fndecl.  The ones with a fndecl can work
out if a function is local and pick a different ABI, but the ones with only a
type cannot.  So we get inconsistent results.

[Bug target/96939] LTO vs. different arm arch options

2020-09-04 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96939

--- Comment #5 from Richard Earnshaw  ---
I banged my head against this when reworking the command line options stuff a
couple of years back, but the documentation on how the different hooks should
interact (especially for LTO and streaming) is, quite frankly, woeful.  How any
back-end maintainer is supposed to support this is beyond me.

[Bug target/96939] LTO vs. different arm arch options

2020-09-04 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96939

--- Comment #4 from Jakub Jelinek  ---
Doesn't seem to be related to me, in the other PR everything is compiled with
one set of options and no target attribute is involved either.

[Bug preprocessor/96940] ICE in linemap_compare_locations, at libcpp/line-map.c:1359

2020-09-04 Thread jan.smets at nokia dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96940

--- Comment #1 from Jan Smets  ---
Likely duplicate of Bug 96391
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96391)
That one has a testcase for i686-w64-mingw32

[Bug target/96939] LTO vs. different arm arch options

2020-09-04 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96939

--- Comment #3 from Andrew Pinski  ---
I think this is related to or a dup of bug 96882.

[Bug preprocessor/96391] [10/11 Regression] internal compiler error: in linemap_compare_locations, at libcpp/line-map.c:1359

2020-09-04 Thread jan.smets at nokia dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96391

Jan Smets  changed:

   What|Removed |Added

 CC||jan.smets at nokia dot com

--- Comment #5 from Jan Smets  ---
Similar issue @ https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96935
(with bisect to the 'last known good' version)

[Bug target/96898] [nvptx] libatomic support

2020-09-04 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898

--- Comment #5 from Jakub Jelinek  ---
It wouldn't be a fallback.  omp-low.c just decides if it is going to use
GOMP_atomic_{start,end} synchronization, __atomic_* or __sync_* to perform the
reduction.  And whether that uses the same or different lock doesn't matter,
because for one reduction omp-low.c will only use one way.
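
For reference, a minimal sketch of the two builtin families being discussed,
applied to a simple sum reduction. It is single-threaded and only shows the
call shapes; it is not what omp-low.c emits, and reduce_sync and
reduce_atomic_relaxed are names made up for this example.

```c
#include <assert.h>

/* Accumulate v[0..n-1] into *acc with the legacy __sync builtin, which
   implies a full barrier.  */
static void
reduce_sync (long *acc, const long *v, int n)
{
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    __sync_fetch_and_add (acc, v[i]);
}

/* Same accumulation with the __atomic builtin and relaxed ordering: the
   update itself is atomic, but no other ordering is requested.  */
static void
reduce_atomic_relaxed (long *acc, const long *v, int n)
{
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    __atomic_fetch_add (acc, v[i], __ATOMIC_RELAXED);
}
```

Both produce the same sum; the difference is only in the memory ordering
requested around each update.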

[Bug c++/83591] -Wduplicated-branches fires in system headers in template instantiation

2020-09-04 Thread manu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83591

--- Comment #8 from Manuel López-Ibáñez  ---
(In reply to Tony E Lewis from comment #7)
> Manuel López-Ibáñez: are you happy that all underlying issues are resolved
> and this can be closed?

Sure.

[Bug target/96898] [nvptx] libatomic support

2020-09-04 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898

--- Comment #4 from Tom de Vries  ---
(In reply to Jakub Jelinek from comment #3)
> For OpenMP reductions, we really don't care what kind of mutex protects the
> updates, as long as it is the same for all updates of the same reduction.
> I believe we don't rely on any other synchronization effects.
> So, I think we should change omp-low.c so that it emits __atomic_* calls
> with __ATOMIC_RELAXED rather than __sync_* calls.

That sounds like a good idea.

> And could just use
> libatomic with its own locking if we didn't go the GOMP_atomic_{start,end}
> route (that one is done if there are multiple reductions or the atomics
> aren't available or there are user defined reductions we don't understand
> (or all?), perhaps we should consider also using atomics perhaps even for
> two simple reductions or similar.
> And nvptx certainly could just use libatomic...

If we use libatomic as fallback for openmp, shouldn't we then use the same lock
in both?

[Bug target/96939] LTO vs. different arm arch options

2020-09-04 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96939

Jakub Jelinek  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2020-09-04
 Status|UNCONFIRMED |NEW

--- Comment #2 from Jakub Jelinek  ---
Maybe the problem isn't that arm_option_reconfigure_globals isn't called (it
is), but that nothing has updated arm_active_target.
E.g. put a breakpoint on arm_option_reconfigure_globals and
arm_set_current_function and see what global_options.x_arm_arch_string,
arm_active_target and arm_arch_crc are at the end of each
arm_option_reconfigure_globals.

Breakpoint 8, arm_option_reconfigure_globals () at
../../gcc/config/arm/arm.c:3772
3772  arm_arch6kz = arm_arch6k && bitmap_bit_p (arm_active_target.isa,
1: arm_arch_crc = 0
2: arm_active_target = {core_name = 0x0, arch_name = 0x24fb770 "armv7-a",
arch_pp_name = 0x24fb778 "7A", base_arch = BASE_ARCH_7, 
  profile = 65 'A', isa = 0x2a6aaf0, tune_flags = 1, tune = 0x2213260
, tune_core = TARGET_CPU_genericv7a}
3: global_options.x_arm_arch_string = 0x2a6ba40 "armv7-a+fp"
(gdb) c
Continuing.

Breakpoint 5, arm_set_current_function (fndecl=) at ../../gcc/config/arm/arm.c:32315
32315 if (!fndecl || fndecl == arm_previous_fndecl)
1: arm_arch_crc = 0
2: arm_active_target = {core_name = 0x0, arch_name = 0x24fb770 "armv7-a",
arch_pp_name = 0x24fb778 "7A", base_arch = BASE_ARCH_7, 
  profile = 65 'A', isa = 0x2a6aaf0, tune_flags = 1, tune = 0x2213260
, tune_core = TARGET_CPU_genericv7a}
3: global_options.x_arm_arch_string = 0x2a6ba40 "armv7-a+fp"
(gdb) c
Continuing.

Breakpoint 8, arm_option_reconfigure_globals () at
../../gcc/config/arm/arm.c:3772
3772  arm_arch6kz = arm_arch6k && bitmap_bit_p (arm_active_target.isa,
1: arm_arch_crc = 1
2: arm_active_target = {core_name = 0x0, arch_name = 0x24fb7a6 "armv8-a",
arch_pp_name = 0x24fb7ae "8A", base_arch = BASE_ARCH_8A, 
  profile = 65 'A', isa = 0x2a6aaf0, tune_flags = 1, tune = 0x2213260
, tune_core = TARGET_CPU_genericv7a}
3: global_options.x_arm_arch_string = 0x2a38d50 "armv8-a+crc+simd"
(gdb) c
Continuing.

Breakpoint 8, arm_option_reconfigure_globals () at
../../gcc/config/arm/arm.c:3772
3772  arm_arch6kz = arm_arch6k && bitmap_bit_p (arm_active_target.isa,
1: arm_arch_crc = 0
2: arm_active_target = {core_name = 0x0, arch_name = 0x24fb770 "armv7-a",
arch_pp_name = 0x24fb778 "7A", base_arch = BASE_ARCH_7, 
  profile = 65 'A', isa = 0x2a6aaf0, tune_flags = 1, tune = 0x2213260
, tune_core = TARGET_CPU_genericv7a}
3: global_options.x_arm_arch_string = 0x2a38d50 "armv8-a+crc+simd"
(gdb) c
Continuing.

Breakpoint 8, arm_option_reconfigure_globals () at
../../gcc/config/arm/arm.c:3772
3772  arm_arch6kz = arm_arch6k && bitmap_bit_p (arm_active_target.isa,
1: arm_arch_crc = 0
2: arm_active_target = {core_name = 0x0, arch_name = 0x24fb770 "armv7-a",
arch_pp_name = 0x24fb778 "7A", base_arch = BASE_ARCH_7, 
  profile = 65 'A', isa = 0x2a6aaf0, tune_flags = 1, tune = 0x2213260
, tune_core = TARGET_CPU_genericv7a}
3: global_options.x_arm_arch_string = 0x2a38d50 "armv8-a+crc+simd"
(gdb) 
Continuing.

Breakpoint 8, arm_option_reconfigure_globals () at
../../gcc/config/arm/arm.c:3772
3772  arm_arch6kz = arm_arch6k && bitmap_bit_p (arm_active_target.isa,
1: arm_arch_crc = 0
2: arm_active_target = {core_name = 0x0, arch_name = 0x24fb770 "armv7-a",
arch_pp_name = 0x24fb778 "7A", base_arch = BASE_ARCH_7, 
  profile = 65 'A', isa = 0x2a6aaf0, tune_flags = 1, tune = 0x2213260
, tune_core = TARGET_CPU_genericv7a}
3: global_options.x_arm_arch_string = 0x2a38d50 "armv8-a+crc+simd"
(gdb) 
Continuing.

Breakpoint 5, arm_set_current_function (fndecl=) at
../../gcc/config/arm/arm.c:32315
32315 if (!fndecl || fndecl == arm_previous_fndecl)
1: arm_arch_crc = 0
2: arm_active_target = {core_name = 0x0, arch_name = 0x24fb770 "armv7-a",
arch_pp_name = 0x24fb778 "7A", base_arch = BASE_ARCH_7, 
  profile = 65 'A', isa = 0x2a6aaf0, tune_flags = 1, tune = 0x2213260
, tune_core = TARGET_CPU_genericv7a}
3: global_options.x_arm_arch_string = 0x2a38d50 "armv8-a+crc+simd"
(gdb) 
Continuing.

Breakpoint 8, arm_option_reconfigure_globals () at
../../gcc/config/arm/arm.c:3772
3772  arm_arch6kz = arm_arch6k && bitmap_bit_p (arm_active_target.isa,
1: arm_arch_crc = 0
2: arm_active_target = {core_name = 0x0, arch_name = 0x24fb770 "armv7-a",
arch_pp_name = 0x24fb778 "7A", base_arch = BASE_ARCH_7, 
  profile = 65 'A', isa = 0x2a6aaf0, tune_flags = 1, tune = 0x2213260
, tune_core = TARGET_CPU_genericv7a}
3: global_options.x_arm_arch_string = 0x2a38d50 "armv8-a+crc+simd"
(gdb) 
Continuing.

Breakpoint 5, arm_set_current_function (fndecl=) at ../../gcc/config/arm/arm.c:32315
32315 if (!fndecl || fndecl == arm_previous_fndecl)
1: arm_arch_crc = 0
2: arm_active_target = {core_name = 0x0, arch_name = 0x24fb770 "armv7-a",
arch_pp_name = 

[Bug tree-optimization/96938] Failure to optimize bit-setting pattern when not using temporary

2020-09-04 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96938

--- Comment #1 from Marc Glisse  ---
With "char tmp" instead of "int tmp", we get the same code as the first
function.

[Bug preprocessor/96940] New: ICE in linemap_compare_locations, at libcpp/line-map.c:1359

2020-09-04 Thread jan.smets at nokia dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96940

Bug ID: 96940
   Summary: ICE in linemap_compare_locations, at
libcpp/line-map.c:1359
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: preprocessor
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jan.smets at nokia dot com
  Target Milestone: ---

Target: x86_64-linux-gnu
Configured with: /usr/src/gcc/configure --build=x86_64-linux-gnu
--disable-multilib --enable-languages=c,c++,fortran,go

Reproduces with:
 10.2, 10.1

Works with:
 9.3, 9.1

Bisect traced it to

commit 4593483f15ca2a82049500b9434e736996bb0891
Author: Paolo Carlini 
Date:   Tue May 14 11:43:55 2019 +

Reapply r270597.

2019-05-14  Paolo Carlini  

PR preprocessor/90382
* decl.c (grokdeclarator): Fix value assigned to typespec_loc, use
min_location.
2019-05-14  Paolo Carlini  

PR preprocessor/90382
* g++.dg/diagnostic/trailing1.C: New test.

From-SVN: r271164




/x/bcm_sdk/sdk/include/shared/bitop.h:73: internal compiler error: in
linemap_compare_locations, at libcpp/line-map.c:1359
   73 |  CONST SHR_BITDCL *c,
  |
0x2233e01 linemap_compare_locations(line_maps*, unsigned int, unsigned int)
/jasmets/git/tools/gcc/libcpp/line-map.c:1359
0x9b9bbb linemap_location_before_p(line_maps*, unsigned int, unsigned int)
/jasmets/git/tools/gcc/gcc/../libcpp/include/line-map.h:1247
0x9a6998 min_location
/jasmets/git/tools/gcc/gcc/cp/decl.c:10641
0x9a6a67 smallest_type_location
/jasmets/git/tools/gcc/gcc/cp/decl.c:10673
0x9a759e grokdeclarator(cp_declarator const*, cp_decl_specifier_seq*,
decl_context, int, tree_node**)
/jasmets/git/tools/gcc/gcc/cp/decl.c:11009
0xa6115d cp_parser_parameter_declaration_list
/jasmets/git/tools/gcc/gcc/cp/parser.c:22618
0xa60fd2 cp_parser_parameter_declaration_clause
/jasmets/git/tools/gcc/gcc/cp/parser.c:22531
0xa5ea98 cp_parser_direct_declarator
/jasmets/git/tools/gcc/gcc/cp/parser.c:21203
0xa5e8a5 cp_parser_declarator
/jasmets/git/tools/gcc/gcc/cp/parser.c:21069
0xa5da7b cp_parser_init_declarator
/jasmets/git/tools/gcc/gcc/cp/parser.c:20570
0xa5281d cp_parser_simple_declaration
/jasmets/git/tools/gcc/gcc/cp/parser.c:13749
0xa5240a cp_parser_block_declaration
/jasmets/git/tools/gcc/gcc/cp/parser.c:13566
0xa52105 cp_parser_declaration
/jasmets/git/tools/gcc/gcc/cp/parser.c:13438
0xa521ea cp_parser_toplevel_declaration
/jasmets/git/tools/gcc/gcc/cp/parser.c:13466
0xa40e07 cp_parser_translation_unit
/jasmets/git/tools/gcc/gcc/cp/parser.c:4734
0xa926f8 c_parse_file()
/jasmets/git/tools/gcc/gcc/cp/parser.c:44001
0xbc9712 c_common_parse_file()
/jasmets/git/tools/gcc/gcc/c-family/c-opts.c:1190

[Bug target/87767] Missing AVX512 memory broadcast for constant vector

2020-09-04 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767

--- Comment #14 from Hongtao.liu  ---
(In reply to Jakub Jelinek from comment #12)
> What I mean is that we should try to simplify the md file, instead of adding
> hundreds of new *_bcst patterns.
> We have e.g.
> (define_insn "*3"
>   [(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
> (plusminus:VI_AVX2
>   (match_operand:VI_AVX2 1 "vector_operand" "0,v")
>   (match_operand:VI_AVX2 2 "vector_operand" "xBm,vm")))]
>   "TARGET_SSE2 && ix86_binary_operator_ok (, mode, operands)"
>   "@
>p\t{%2, %0|%0, %2}
>vp\t{%2, %1, %0|%0, %1, %2}"
>   [(set_attr "isa" "noavx,avx")
>(set_attr "type" "sseiadd")
>(set_attr "prefix_data16" "1,*")
>(set_attr "prefix" "orig,vex")
>(set_attr "mode" "")])
> 
> (define_insn "*sub3_bcst"
>   [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v")
> (minus:VI48_AVX512VL
>   (match_operand:VI48_AVX512VL 1 "register_operand" "v")
>   (vec_duplicate:VI48_AVX512VL
> (match_operand: 2 "memory_operand" "m"]
>   "TARGET_AVX512F && ix86_binary_operator_ok (MINUS, mode, operands)"
>   "vpsub\t{%2, %1, %0|%0, %1, %2}"
>   [(set_attr "type" "sseiadd")
>(set_attr "prefix" "evex")
>(set_attr "mode" "")])
> 
> What I meant is we could have just:
> (define_insn "*3"
>   [(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
> (plusminus:VI_AVX2
>   (match_operand:VI_AVX2 1 "vector_bcst_operand" "0,v")
>   (match_operand:VI_AVX2 2 "vector_bcst_operand" "xBm,vBb")))]
>   "TARGET_SSE2 && ix86_binary_operator_ok (, mode, operands)"
>   "@
>p\t{%2, %0|%0, %2}
>vp\t{%2, %1, %0|%0, %1, %2}"
>   [(set_attr "isa" "noavx,avx")
>(set_attr "type" "sseiadd")
>(set_attr "prefix_data16" "1,*")
>(set_attr "prefix" "orig,vex")
>(set_attr "mode" "")])
> where vector_bcst_operand is either vector_operand, or for TARGET_AVX512F
> a VEC_DUPLICATE of the right mode with a MEM inside of it with the element
> mode of the VEC_DUPLICATE mode, similarly Bb constraint is either m, or for
> TARGET_AVX512F also again the VEC_DUPLICATE with MEM inside of it, and that
> ix86_binary_operator_ok would treat a VEC_DUPLICATE wrapping MEM the same as
> MEM (in particular ensure one e.g. doesn't have one VEC_DUPLICATE and one
> MEM operand, or two VEC_DUPLICATE operands) and that the output code would
> handle emitting an operand with VEC_DUPLICATE of a MEM properly.
> Or perhaps the constraint there could be just for the broadcast and one
> could write vmBb.  Still, I think the predicate needs to be accurate, i.e.
> for some instructions we want e.g. vector_operand or TARGET_AVX512F and
> bcst_mem_operand,
> for others vector_operand or TARGET_AVX512VL and bcst_mem_operand etc.
> 
> Anyway, if we go down this route, might be best to handle just a couple of
> patterns, then ask for review and see what Kirill (or if Uros would be
> interested) think about it and only later convert more.

Is there any way to add a preference for constraint "Bb"? We always want to
choose "Bb" when a vec_duplicate exists, but sometimes pass_reload chooses
'v', which produces a redundant broadcast instruction.

E.g., with the attached patch, testcase avx512f-add-df-zmm-1.c fails to
generate an embedded broadcast with -m32.

[Bug target/87767] Missing AVX512 memory broadcast for constant vector

2020-09-04 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767

--- Comment #13 from Hongtao.liu  ---
Created attachment 49182
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49182&action=edit
bcst_vector_operand

[Bug c++/87530] copy elision in return statement doesn't check for rvalue reference to object type

2020-09-04 Thread mpolacek at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87530

--- Comment #3 from Marek Polacek  ---
No longer accepted since r11-2411.  The test should probably be added.

[Bug gcov-profile/96913] gcc-11: __gcov_merge_topn hangs

2020-09-04 Thread slyfox at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96913

--- Comment #2 from Sergei Trofimovich  ---
(In reply to Sergei Trofimovich from comment #0)
> The hang happens on real tauthon-2.8.2 interpreter from PR96394 (no nice
> reproducer yet).
> 
> In this instance I tried to build tauthon-2.8.2 against gcc-master. It hangs
> early when tries to merge topn entry:
> 
> """
> #0  0x7fd73865e0ce in __GI___libc_read (fd=3, buf=0x406284
> <__gcov_var+36>, nbytes=4096) at ../sysdeps/unix/sysv/linux/read.c:26
> #1  0x7fd7385f0090 in __GI__IO_file_xsgetn (fp=0x1295bc0,
> data=, n=4096) at libioP.h:948
> #2  0x7fd7385e4d1f in __GI__IO_fread (buf=0x406284 <__gcov_var+36>,
> size=1, count=4096, fp=0x1295bc0) at iofread.c:38
> #3  0x7fd72ab4237e in gcov_read_words (words=2) at
> ../../../gcc/libgcc/../gcc/gcov-io.c:491
> #4  0x7fd72ab42483 in __gcov_read_counter () at
> ../../../gcc/libgcc/../gcc/gcov-io.c:528
> #5  0x7fd72ab4190e in gcov_get_counter_target () at
> ../../../gcc/libgcc/libgcov.h:383
> #6  0x7fd72ab41c6b in __gcov_merge_topn (counters=0x7fd72ab4d320
> <__gcov4.encoder_clear>, n_counters=24) at
> ../../../gcc/libgcc/libgcov-merge.c:114
> #7  0x7fd72ab43569 in merge_one_data (
> filename=0x1154290
> "/tmp/portage/dev-lang/tauthon-2.8.2/work/x86_64-pc-linux-gnu/build/temp.
> linux-x86_64-2.8/tmp/portage/dev-lang/tauthon-2.8.2/work/tauthon-2.8.2/
> Modules/_json.gcda", gi_ptr=0x7fd72ab4a180, summary=0x7ffde7d6e9c0) at
> ../../../gcc/libgcc/libgcov-driver.c:314
> #8  0x7fd72ab43b1a in dump_one_gcov (gi_ptr=0x7fd72ab4a180,
> gf=0x7ffde7d6ea00, run_counted=0, run_max=125)
> at ../../../gcc/libgcc/libgcov-driver.c:492
> #9  0x7fd72ab43cba in gcov_do_dump (list=0x7fd72ab4a180, run_counted=0)
> at ../../../gcc/libgcc/libgcov-driver.c:555
> #10 0x7fd72ab43d28 in __gcov_dump_one (root=0x7fd72ab4e5c0
> <__gcov_root>) at ../../../gcc/libgcc/libgcov-driver.c:578
...
> 
> Looks like the problem is in decoding of some tag in __gcov_merge_topn().
> There count of entries is huge:
> 
> """
> (gdb) frame 6
> #6  0x7fd72ab41c6b in __gcov_merge_topn (counters=0x7fd72ab4d320
> <__gcov4.encoder_clear>, n_counters=24) at
> ../../../gcc/libgcc/libgcov-merge.c:114
> 114   gcov_type value = gcov_get_counter_target ();
> (gdb) list __gcov_merge_topn
> 96 -- counter
> 97 */
> 98
> 99  void
> 100 __gcov_merge_topn (gcov_type *counters, unsigned n_counters)
> 101 {
> 102   gcc_assert (!(n_counters % GCOV_TOPN_MEM_COUNTERS));
> 103
> 104   for (unsigned i = 0; i < (n_counters / GCOV_TOPN_MEM_COUNTERS);
> i++)
> 105 {
> (gdb)
> 106   /* First value is number of total executions of the profiler. 
> */
> 107   gcov_type all = gcov_get_counter_ignore_scaling (-1);
> 108   gcov_type n = gcov_get_counter_ignore_scaling (-1);
> 109
> 110   counters[GCOV_TOPN_MEM_COUNTERS * i] += all;
> 111
> 112   for (unsigned j = 0; j < n; j++)
> 113 {
> 114   gcov_type value = gcov_get_counter_target ();
> 115   gcov_type count = gcov_get_counter_ignore_scaling (-1);
> (gdb)
> 116
> 117   // TODO: we should use atomic here
> 118   gcov_topn_add_value (counters + GCOV_TOPN_MEM_COUNTERS *
> i, value,
> 119count, 0, 0);
> 120 }
> 121 }
> 122 }
> 123 #endif /* L_gcov_merge_topn */
> 124
> 125 #endif /* inhibit_libc */

> (gdb) print n
> $1 = 140325305737200

It looks like __gcov_merge_topn() is applied to the contents of the
'indirect_call' counters, but their content does not seem to match the dynamic
topn structure:

$ x86_64-pc-linux-gnu-gcov-dump -l _json.gcda

...
_json.gcda:01a7:   0:COUNTERS topn 0 counts
_json.gcda:01a9:  48:COUNTERS indirect_call 24 counts
_json.gcda:   0: 1 1 140325305737168 1 1 140325305737200 0 0
_json.gcda:   8: 0 0 0 0 0 0 0 0
_json.gcda:  16: 0 0 0 0 0 0 0 0
...

Note how 140325305737200 is in the middle of topn.

My wild guess is that before

  commit 871e5ada6d53d5eb495cc9f323983f347487c1b2
  Author: Martin Liska 
  Date:   Fri Jan 31 13:10:14 2020 +0100

Make TOPN counter dynamically allocated.

both indirect_call and topn had the same fixed n-value structure and were ~ok
to be merged with __gcov_merge_topn().

But now topn got a special 0-values case (why did we emit it at all?) that the
merger can't handle, so it ends up slightly past the beginning of the
'indirect_call' section.

[Bug target/96939] LTO vs. different arm arch options

2020-09-04 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96939

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at redhat dot com

--- Comment #1 from Jeffrey A. Law  ---
I suspect this is the same thing we're seeing with the dozen or so armv7/NEON
failures with LTO in Fedora.   It was on my list to reduce, but hadn't gotten
to it yet.

[Bug target/96939] New: LTO vs. different arm arch options

2020-09-04 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96939

Bug ID: 96939
   Summary: LTO vs. different arm arch options
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jakub at gcc dot gnu.org
  Target Milestone: ---

$ cat a1.c
extern unsigned crc (unsigned, const void *);
typedef unsigned (*fnptr) (unsigned, const void *);
volatile fnptr fn;

int
main ()
{
  fn = crc;
  return 0;
}
$ cat a2.c
#include 

unsigned
crc (unsigned x, const void *y)
{
  return __crc32cw (x, *(unsigned *) y);
}
$ ./xgcc -B ./ -O2 -march=armv7-a -mfpu=vfpv3-d16 -mtune=generic-armv7-a
-mabi=aapcs-linux -mfloat-abi=hard -c a1.c -flto
$ ./xgcc -B ./ -O2 -march=armv7-a -mfpu=vfpv3-d16 -mtune=generic-armv7-a
-mabi=aapcs-linux -mfloat-abi=hard -march=armv8-a+crc -c a2.c -flto
$ ./xgcc -B ./ -r -march=armv7-a -mfpu=vfpv3-d16 -mtune=generic-armv7-a
-mabi=aapcs-linux -mfloat-abi=hard -o a a1.o a2.o -flto

results in:
a2.c: In function ‘crc’:
a2.c:6:10: error: this builtin is not supported for this target
6 |   return __crc32cw (x, *(unsigned *) y);
  |  ^

Adding
__attribute__((target ("arch=armv8-a+crc"))) 
to crc function doesn't help.

In gdb I see
(gdb) p global_options.x_arm_arch_string 
$2 = 0x2a38d50 "armv8-a+crc+simd"
(gdb) p arm_arch_crc
$3 = 0
which means the function got the proper target attribute even if it didn't have
one, TARGET_OPTIONS and the like, but arm_option_reconfigure_globals wasn't
really called when changing the current function from the armv7-built one (or
the default) to the armv8-a+crc+simd one.
I'm afraid this makes LTO not work at all on arm when one mixes command line
options between TUs or uses target attribute.

[Bug tree-optimization/96938] New: Failure to optimize bit-setting pattern when not using temporary

2020-09-04 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96938

Bug ID: 96938
   Summary: Failure to optimize bit-setting pattern when not using
temporary
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

void g(char *f, int offset, char value)
{
*f = (int)(*f & ~(1 << (offset & 0x1F))) | (value << (offset & 0x1F));
}

This has much worse code generation than this:

void g(char *f, int offset, char value)
{
int tmp = *f & ~(1 << (offset & 0x1F));
*f = tmp | (value << (offset & 0x1F));
}

Which should be equivalent to the first example.

As an example of the worse code generation, on x86 the first example compiles
to this:

g(char*, int, char):
  movzx ecx, BYTE PTR [rdi]
  mov eax, 1
  movsx edx, dl
  shlx eax, eax, esi
  shlx edx, edx, esi
  andn eax, eax, ecx
  or eax, edx
  mov BYTE PTR [rdi], al
  ret

Whereas the second example compiles to this:

g(char*, int, char):
  movsx eax, BYTE PTR [rdi]
  movsx edx, dl
  shlx edx, edx, esi
  btr eax, esi
  or eax, edx
  mov BYTE PTR [rdi], al
  ret
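
The claim that the two forms are equivalent can be checked by brute force. The
following is only a sketch and not part of the report: the function names are
made up for illustration, and the offsets stop at 30 with value limited to 0 or
1 so that every shift is well defined in C.

```c
#include <assert.h>

/* Single-expression form from the report. */
static void set_bit_direct(char *f, int offset, char value)
{
    *f = (int)(*f & ~(1 << (offset & 0x1F))) | (value << (offset & 0x1F));
}

/* Form with an explicit int temporary. */
static void set_bit_tmp(char *f, int offset, char value)
{
    int tmp = *f & ~(1 << (offset & 0x1F));
    *f = tmp | (value << (offset & 0x1F));
}

/* Returns the number of inputs on which the two forms disagree.
   Offsets stop at 30 and value is 0 or 1 so every shift is defined. */
static int count_mismatches(void)
{
    int mismatches = 0;
    for (int byte = 0; byte < 256; ++byte)
        for (int offset = 0; offset <= 30; ++offset)
            for (int value = 0; value <= 1; ++value) {
                char a = (char)byte, b = (char)byte;
                set_bit_direct(&a, offset, (char)value);
                set_bit_tmp(&b, offset, (char)value);
                if (a != b)
                    ++mismatches;
            }
    return mismatches;
}
```

Calling count_mismatches() from a small main and checking for zero only
confirms the semantics; the difference in generated code still has to be
inspected in the assembly.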

[Bug debug/96937] Duplicate DW_TAG_formal_parameter in out-of-line DW_TAG_subprogram instance

2020-09-04 Thread simon.marchi at polymtl dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96937

--- Comment #2 from Simon Marchi  ---
Created attachment 49181
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49181&action=edit
Output from creduce

I compile the reproducer program with:

/opt/gcc/git/bin/g++ -x c++ -g3 -O2 -c bug.c

[Bug debug/96937] Duplicate DW_TAG_formal_parameter in out-of-line DW_TAG_subprogram instance

2020-09-04 Thread simon.marchi at polymtl dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96937

--- Comment #1 from Simon Marchi  ---
I ran the program through creduce. The result is not pretty, but it's not too
big and still reproduces the problem, so I'll attach it anyway.

[Bug debug/96937] New: Duplicate DW_TAG_formal_parameter in out-of-line DW_TAG_subprogram instance

2020-09-04 Thread simon.marchi at polymtl dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96937

Bug ID: 96937
   Summary: Duplicate DW_TAG_formal_parameter in out-of-line
DW_TAG_subprogram instance
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: simon.marchi at polymtl dot ca
  Target Milestone: ---

While debugging GDB (compiled with GCC master and -O2) with GDB, I get:

972 do_examine (struct format_data fmt, struct gdbarch *gdbarch, CORE_ADDR
addr)
(top-gdb) frame
#0  do_examine (gdbarch=0x249a6b0, addr=0x400636, fmt=..., fmt=...) at
/home/smarchi/src/binutils-gdb/gdb/printcmd.c:972
972 do_examine (struct format_data fmt, struct gdbarch *gdbarch, CORE_ADDR
addr)

Note that the parameters shown by GDB aren't in the same order as in the
function declaration, and that "fmt" is duplicated.

0x0004e0bd:   DW_TAG_subprogram
DW_AT_abstract_origin [DW_FORM_ref4](0x0004a912
"do_examine")
...

0x0004e0cd: DW_TAG_formal_parameter
  DW_AT_abstract_origin [DW_FORM_ref4]  (0x0004a92e "gdbarch")
  ...

0x0004e0d6: DW_TAG_formal_parameter
  DW_AT_abstract_origin [DW_FORM_ref4]  (0x0004a93b "addr")
  ...

0x0004e0df: DW_TAG_variable
  DW_AT_abstract_origin [DW_FORM_ref4]  (0x0004a948 "format")
  ...

0x0004e0e8: DW_TAG_variable
  DW_AT_abstract_origin [DW_FORM_ref4]  (0x0004a955 "size")
  ...

... some more variables ...


0x0004e130: DW_TAG_formal_parameter
  DW_AT_abstract_origin [DW_FORM_ref4]  (0x0004a921 "fmt")

0x0004e135: DW_TAG_formal_parameter
  DW_AT_abstract_origin [DW_FORM_ref4]  (0x0004a921 "fmt")


This matches what we see in GDB: the parameters are not in the same order as in
the function declaration and fmt is duplicated.  The abstract origin has them
correct:

0x0004a912:   DW_TAG_subprogram
DW_AT_name [DW_FORM_strp]   ("do_examine")
...

0x0004a921: DW_TAG_formal_parameter
  DW_AT_name [DW_FORM_string]   ("fmt")
  ...

0x0004a92e: DW_TAG_formal_parameter
  DW_AT_name [DW_FORM_strp] ("gdbarch")
  ...

0x0004a93b: DW_TAG_formal_parameter
  DW_AT_name [DW_FORM_strp] ("addr")
  ...

After reading bug #49828, I presume that the fact that the parameters are not
in the right order is not considered a bug.  GDB could cope with that by
sorting them to be in the same order as what's in the abstract origin.

However, having fmt there twice could maybe be considered a bug, hence I am
reporting it.

GDB commit (used as the debugged program):
c5cd900e4f197870812c2d3e2c194871c171ef42
GCC commit: 8ad3fc6ca46c603d9c3efe8e6d4a8f2ff1a893a4
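
To illustrate the workaround mentioned above (ordering the concrete parameters
like the abstract origin and ignoring the repeated entry), here is a minimal
sketch. It is not GDB code: struct param_die and normalize_params are
hypothetical names, and it assumes that the abstract-origin children appear in
declaration order at increasing DIE offsets, as they do in the dump above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record for one concrete DW_TAG_formal_parameter. */
struct param_die {
    unsigned long origin;   /* offset of the DW_AT_abstract_origin target */
    const char *name;
};

/* qsort comparator: order by abstract-origin offset. */
static int by_origin(const void *a, const void *b)
{
    const struct param_die *pa = a, *pb = b;
    return (pa->origin > pb->origin) - (pa->origin < pb->origin);
}

/* Sorts PARAMS by abstract-origin offset, drops repeated origins in place,
   and returns the number of entries kept. */
static size_t normalize_params(struct param_die *params, size_t count)
{
    if (count == 0)
        return 0;
    qsort(params, count, sizeof params[0], by_origin);
    size_t kept = 1;
    for (size_t i = 1; i < count; ++i)
        if (params[i].origin != params[kept - 1].origin)
            params[kept++] = params[i];
    return kept;
}
```

Fed the four concrete entries from the dump (gdbarch, addr, fmt, fmt), this
leaves fmt, gdbarch, addr, which matches the declaration.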

[Bug tree-optimization/96920] [10 Regression] ICE segmentation fault in tree-vectorizer at -O3

2020-09-04 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96920

Richard Biener  changed:

   What|Removed |Added

  Known to work||11.0
  Known to fail|11.0|
Summary|[10/11 Regression] ICE  |[10 Regression] ICE
   |segmentation fault in   |segmentation fault in
   |tree-vectorizer at -O3  |tree-vectorizer at -O3

--- Comment #6 from Richard Biener  ---
Fixed on trunk so far.

[Bug tree-optimization/96698] [10 Regression] ICE during GIMPLE pass:vect

2020-09-04 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96698

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:46a58c779af3055a4b10b285a1f4be28abe4351c

commit r11-3013-g46a58c779af3055a4b10b285a1f4be28abe4351c
Author: Richard Biener 
Date:   Fri Sep 4 14:35:39 2020 +0200

tree-optimization/96920 - another ICE when vectorizing nested cycles

This refines the previous fix for PR96698 by re-doing how and where
we arrange for setting vectorized cycle PHI backedge values.

2020-09-04  Richard Biener  

PR tree-optimization/96698
PR tree-optimization/96920
* tree-vectorizer.h (loop_vec_info::reduc_latch_defs): Remove.
(loop_vec_info::reduc_latch_slp_defs): Likewise.
* tree-vect-stmts.c (vect_transform_stmt): Remove vectorized
cycle PHI latch code.
* tree-vect-loop.c (maybe_set_vectorized_backedge_value): New
helper to set vectorized cycle PHI latch values.
(vect_transform_loop): Walk over all PHIs again after
vectorizing them, calling maybe_set_vectorized_backedge_value.
Call maybe_set_vectorized_backedge_value for each vectorized
stmt.  Remove delayed update code.
* tree-vect-slp.c (vect_analyze_slp_instance): Initialize
SLP instance reduc_phis member.
(vect_schedule_slp): Set vectorized cycle PHI latch values.

* gfortran.dg/vect/pr96920.f90: New testcase.
* gcc.dg/vect/pr96920.c: Likewise.

[Bug tree-optimization/96920] [10/11 Regression] ICE segmentation fault in tree-vectorizer at -O3

2020-09-04 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96920

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:46a58c779af3055a4b10b285a1f4be28abe4351c

commit r11-3013-g46a58c779af3055a4b10b285a1f4be28abe4351c
Author: Richard Biener 
Date:   Fri Sep 4 14:35:39 2020 +0200

tree-optimization/96920 - another ICE when vectorizing nested cycles

This refines the previous fix for PR96698 by re-doing how and where
we arrange for setting vectorized cycle PHI backedge values.

2020-09-04  Richard Biener  

PR tree-optimization/96698
PR tree-optimization/96920
* tree-vectorizer.h (loop_vec_info::reduc_latch_defs): Remove.
(loop_vec_info::reduc_latch_slp_defs): Likewise.
* tree-vect-stmts.c (vect_transform_stmt): Remove vectorized
cycle PHI latch code.
* tree-vect-loop.c (maybe_set_vectorized_backedge_value): New
helper to set vectorized cycle PHI latch values.
(vect_transform_loop): Walk over all PHIs again after
vectorizing them, calling maybe_set_vectorized_backedge_value.
Call maybe_set_vectorized_backedge_value for each vectorized
stmt.  Remove delayed update code.
* tree-vect-slp.c (vect_analyze_slp_instance): Initialize
SLP instance reduc_phis member.
(vect_schedule_slp): Set vectorized cycle PHI latch values.

* gfortran.dg/vect/pr96920.f90: New testcase.
* gcc.dg/vect/pr96920.c: Likewise.

[Bug target/96933] rs6000: inefficient code for char/short vec CTOR

2020-09-04 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933

--- Comment #4 from Segher Boessenkool  ---
Yes, timing suggests there is some SHL/LHS flush.

On p9 and later we can use mtvsrdd instead of mtvsrd (moving two
bytes into place at once), which reduces the number of moves from
16 to 8, and the number of merges from 15 to 7 (and reduces path
length by 1).  This sounds like a no-brainer win with that :-)

[Bug c++/96936] brace initialization of const char* from string literal in specific cases doesn't compile

2020-09-04 Thread mpolacek at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96936

Marek Polacek  changed:

   What|Removed |Added

 CC||mpolacek at gcc dot gnu.org
 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Marek Polacek  ---
Dup.  I should get to this soon.

*** This bug has been marked as a duplicate of bug 84930 ***

[Bug c++/84930] Brace-closed initialization of cstring (i.e."abcdefghi") to coresponding aggregate types fails in certain situation

2020-09-04 Thread mpolacek at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84930

Marek Polacek  changed:

   What|Removed |Added

 CC||kirshamir at gmail dot com

--- Comment #6 from Marek Polacek  ---
*** Bug 96936 has been marked as a duplicate of this bug. ***

[Bug preprocessor/96935] ICE in subspan, at input.h:69

2020-09-04 Thread jan.smets at nokia dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96935

--- Comment #3 from Jan Smets  ---
A bisect resulted in this commit :

commit 0d48e8779c6a9ac88f5efd1b4a2d40f43ef75faf
Author: David Malcolm 
Date:   Fri Oct 5 19:02:17 2018 +

Support string locations for C++ in -Wformat (PR c++/56856)

-Wformat in the C++ FE doesn't work as well as it could:
(a) it doesn't report precise locations within the string literal, and
(b) it doesn't underline arguments for those arguments
!CAN_HAVE_LOCATION_P,
despite having location wrapper nodes.



Your suggestion doesn't trigger it for me.
I've built GCC with -g -O0, but the standard provided backtrace didn't include
function call arguments.

A printf confirms your suspicion about start.column == 0 
 => start.column=0 literal_length=1

[Bug c++/96936] New: brace initialization of const char* from string literal in specific cases doesn't compile

2020-09-04 Thread kirshamir at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96936

Bug ID: 96936
   Summary: brace initialization of const char* from string
literal in specific cases doesn't compile
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kirshamir at gmail dot com
  Target Milestone: ---

template<typename T, typename U>
auto convert(U&& t) {
    // fails - see link to compiler explorer:
    return T{std::forward<U>(t)};
    // succeeds:
    // return T(std::forward<U>(t));
}

Code: https://godbolt.org/z/5q5sfb

Related to (seem to be the same issue):
https://stackoverflow.com/questions/63740618/need-for-stddecay-in-noexcept-operator

[Bug tree-optimization/96820] ICE in verify_sra_access_forest with array and out of bounds reference

2020-09-04 Thread jamborm at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96820

Martin Jambor  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #9 from Martin Jambor  ---
Fixed.

[Bug tree-optimization/96820] ICE in verify_sra_access_forest with array and out of bounds reference

2020-09-04 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96820

--- Comment #8 from CVS Commits  ---
The releases/gcc-10 branch has been updated by Martin Jambor
:

https://gcc.gnu.org/g:75f5776b3fc4dad7453f8b9cf1690bd2ad628991

commit r10-8709-g75f5776b3fc4dad7453f8b9cf1690bd2ad628991
Author: Martin Jambor 
Date:   Fri Sep 4 14:31:16 2020 +0200

sra: Avoid SRAing if there is an out-of-bounds access (PR 96820)

The testcase causes an ICE in the SRA verifier on x86_64 when
compiling with -m32 because build_user_friendly_ref_for_offset looks
at an out-of-bounds array_ref within an array_ref which accesses an
offset which does not fit into a signed 32bit integer and turns it
into an array-ref with a negative index.

The best thing is probably to bail out early when encountering an out
of bounds access to a local stack-allocated aggregate (and let the DSE
just delete such statements) which is what the patch does.

I also glanced over to the initial candidate vetting routine to make
sure the size would fit into HWI and noticed that it uses unsigned
variants whereas the rest of SRA operates on signed offsets and
sizes (because get_ref_and_extent does) and so changed that for the
sake of consistency.  These ancient checks operate on sizes of types
as opposed to DECLs but I hope that any issues potentially arising
from that are basically hypothetical.

gcc/ChangeLog:

2020-08-28  Martin Jambor  

PR tree-optimization/96820
* tree-sra.c (create_access): Disqualify candidates with accesses
beyond the end of the original aggregate.
(maybe_add_sra_candidate): Check that candidate type size fits
signed uhwi for the sake of consistency.

gcc/testsuite/ChangeLog:

2020-08-28  Martin Jambor  

PR tree-optimization/96820
* gcc.dg/tree-ssa/pr96820.c: New test.

(cherry picked from commit 8ad3fc6ca46c603d9c3efe8e6d4a8f2ff1a893a4)

[Bug tree-optimization/96920] [10/11 Regression] ICE segmentation fault in tree-vectorizer at -O3

2020-09-04 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96920

--- Comment #4 from Richard Biener  ---
Another example:

int a[1024];
int b[2048];

void foo (int x, int y)
{
  for (int i = 0; i < 1024; ++i)
{
  int tem0 = b[2*i];
  int tem1 = b[2*i+1];
  for (int j = 0; j < 32; ++j)
{
  int tem = tem0;
  tem0 = tem1;
  tem1 = tem;
  a[i] += tem0;
}
}
}

[Bug tree-optimization/96929] Failure to optimize right shift of -1 to -1

2020-09-04 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96929

Jakub Jelinek  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2020-09-04
 CC||jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek  ---
We already have a rule for this:
/* Optimize -1 >> x for arithmetic right shifts.  */
(simplify
 (rshift integer_all_onesp@0 @1)
 (if (!TYPE_UNSIGNED (type)
  && tree_expr_nonnegative_p (@1))
  @0))
but the rule requires that the shift count is non-negative.
That was added in PR38359 fix.
Even current wide_int_binop has:
case RSHIFT_EXPR:
case LSHIFT_EXPR:
  if (wi::neg_p (arg2))
{
  tmp = -arg2;
  if (code == RSHIFT_EXPR)
code = LSHIFT_EXPR;
  else
code = RSHIFT_EXPR;
}
  else
tmp = arg2;
and so it treats rshift shifts by negative values as left shifts.
So, if we wanted to fix this PR, we'd need to remove the
tree_expr_nonnegative_p and change wide_int_binop to perhaps best punt on
negative arg2 instead of trying to handle it.  Wonder what it will break.
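
For reference, the case the existing rule covers can be checked directly. This
is only a sketch with made-up function names. It relies on GCC's documented
behaviour that right-shifting a negative signed value is an arithmetic shift
(ISO C leaves this implementation-defined), and it only uses counts that are
non-negative and smaller than the width of int.

```c
#include <assert.h>
#include <limits.h>

/* -1 >> count for an in-range, non-negative count.  With an arithmetic
   (sign-propagating) shift, all bits stay set. */
static int shift_all_ones(int count)
{
    return -1 >> count;
}

/* Returns 1 if every valid count leaves the value at -1, else 0. */
static int all_ones_is_fixed_point(void)
{
    const int width = (int)(sizeof(int) * CHAR_BIT);
    for (int count = 0; count < width; ++count)
        if (shift_all_ones(count) != -1)
            return 0;
    return 1;
}
```

Negative counts are deliberately left out: they are undefined in C, and they
are exactly the case where the folder's "treat as a left shift" behaviour gets
in the way.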

[Bug target/96933] rs6000: inefficient code for char/short vec CTOR

2020-09-04 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933

--- Comment #3 from Richard Biener  ---
Very likely the byte stores and then the following vector load will also
trigger STLF issues.

[Bug preprocessor/96935] ICE in subspan, at input.h:69

2020-09-04 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96935

--- Comment #2 from Richard Biener  ---
Guess the error is simply that we fall back to no columns and thus start.column
== 0 and we do

  char_span literal = line.subspan (start.column - 1, literal_length);

which means input.c:1467 should check whether start.column is >= 1

Might also trigger with

printf ("\
..")

who knows.

Your backtrace doesn't contain function argument values to verify.  Maybe you
can build GCC with -O0 once and check?
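
The kind of guard suggested above can be sketched as follows. This is not the
code in input.c or input.h: struct text_span, text_subspan and literal_span are
hypothetical stand-ins, and the sketch assumes that columns are 1-based with 0
meaning "no column information".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a span of characters; not GCC's char_span. */
struct text_span {
    const char *ptr;
    size_t length;
};

/* Stores the sub-range [offset, offset + n) in *out and returns 1,
   or returns 0 when the range does not lie inside SRC. */
static int text_subspan(struct text_span src, long offset, long n,
                        struct text_span *out)
{
    if (offset < 0 || n < 0)
        return 0;
    if ((size_t)offset > src.length
        || (size_t)n > src.length - (size_t)offset)
        return 0;
    out->ptr = src.ptr + offset;
    out->length = (size_t)n;
    return 1;
}

/* Columns are 1-based; column 0 means no column information. */
static int literal_span(struct text_span line, int start_column,
                        int literal_length, struct text_span *out)
{
    if (start_column < 1)
        return 0;   /* bail out instead of asking for offset -1 */
    return text_subspan(line, start_column - 1, literal_length, out);
}
```

With start_column == 0 the caller gets a failure it can handle, rather than a
request for a span starting at offset -1.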

[Bug target/96769] -mpure-code produces suboptimal code for immediate generation for thumb-1

2020-09-04 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96769

Christophe Lyon  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #3 from Christophe Lyon  ---
Fixed on trunk.

[Bug target/96769] -mpure-code produces suboptimal code for immediate generation for thumb-1

2020-09-04 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96769

--- Comment #2 from CVS Commits  ---
The master branch has been updated by Christophe Lyon :

https://gcc.gnu.org/g:2033a63cbd0aab27d3a8450b4a4a5b371d583c85

commit r11-3011-g2033a63cbd0aab27d3a8450b4a4a5b371d583c85
Author: Christophe Lyon 
Date:   Fri Sep 4 11:48:36 2020 +

arm: Improve immediate generation for thumb-1 with -mpurecode [PR96769]

This patch moves the move-immediate splitter after the regular ones so
that it has lower precedence, and updates its constraints.

For
int f3 (void) { return 0x11000000; }
int f3_2 (void) { return 0x12345678; }

we now generate:
* with -O2 -mcpu=cortex-m0 -mpure-code:
f3:
        movs    r0, #136
        lsls    r0, r0, #21
        bx      lr
f3_2:
        movs    r0, #18
        lsls    r0, r0, #8
        adds    r0, r0, #52
        lsls    r0, r0, #8
        adds    r0, r0, #86
        lsls    r0, r0, #8
        adds    r0, r0, #121
        bx      lr

* with -O2 -mcpu=cortex-m23 -mpure-code:
f3:
        movs    r0, #136
        lsls    r0, r0, #21
        bx      lr
f3_2:
        movw    r0, #22136
        movt    r0, 4660
        bx      lr

2020-09-04  Christophe Lyon  

PR target/96769
gcc/
* config/arm/thumb1.md: Move movsi splitter for
arm_disable_literal_pool after the other movsi splitters.

gcc/testsuite/
* gcc.target/arm/pure-code/pr96769.c: New test.
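
The arithmetic behind these sequences can be modelled in plain C. This is a
sketch for illustration only, not code from the patch, and the helper names are
made up: a movs/lsls pair yields an 8-bit immediate shifted into place (136 <<
21 is 0x11000000), a movs followed by lsls #8 / adds pairs builds a constant
one byte at a time, and movw/movt set the low and high half-words.

```c
#include <assert.h>
#include <stdint.h>

/* movs r0, #imm8 ; lsls r0, r0, #shift */
static uint32_t shifted_byte(uint32_t imm8, unsigned shift)
{
    return (imm8 & 0xFFu) << shift;
}

/* movs with the top byte, then three lsls #8 / adds #byte pairs. */
static uint32_t build_bytewise(uint32_t constant)
{
    uint32_t r0 = constant >> 24;
    for (int shift = 16; shift >= 0; shift -= 8) {
        r0 <<= 8;
        r0 += (constant >> shift) & 0xFFu;
    }
    return r0;
}

/* movw sets the low half-word, movt the high one. */
static uint32_t movw_movt(uint32_t low16, uint32_t high16)
{
    return (high16 << 16) | (low16 & 0xFFFFu);
}
```

For example, movw_movt(22136, 4660) gives 0x12345678, since 22136 is 0x5678
and 4660 is 0x1234.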

[Bug preprocessor/96935] ICE in subspan, at input.h:69

2020-09-04 Thread jan.smets at nokia dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96935

--- Comment #1 from Jan Smets  ---
Proper backtrace (10.2)

x.cpp: In function ‘void a()’:
x.cpp:3: internal compiler error: in subspan, at input.h:69
3 | #define DB_PRINTF(str, fmt, args...) db_printf(indent_len, 50, fmt,
str, ##args)
  |
x.cpp:7: note: in expansion of macro ‘DB_PRINTF’
7 |   DB_PRINTF("", "%llu", 0);
  |
0x168ee7b char_span::subspan(int, int) const
/jasmets/git/tools/gcc/gcc/input.h:69
0x168ee7b get_substring_ranges_for_loc
/jasmets/git/tools/gcc/gcc/input.c:1467
0x168ee7b get_location_within_string(cpp_reader*, string_concat_db*, unsigned
int, cpp_ttype, int, int, int, unsigned int*)
/jasmets/git/tools/gcc/gcc/input.c:1553
0x8fad84 c_get_substring_location(substring_loc const&, unsigned int*)
/jasmets/git/tools/gcc/gcc/c-family/c-common.c:903
0x92e3ad get_corrected_substring
/jasmets/git/tools/gcc/gcc/c-family/c-format.c:4505
0x92e3ad format_type_warning
/jasmets/git/tools/gcc/gcc/c-family/c-format.c:4721
0x93142b check_format_types
/jasmets/git/tools/gcc/gcc/c-family/c-format.c:4266
0x93142b argument_parser::check_argument_type(format_char_info const*,
length_modifier const&, tree_node*&, char const*&, bool, unsigned long&,
tree_node*&, int, char const*, char const*, unsigned int, char)
/jasmets/git/tools/gcc/gcc/c-family/c-format.c:2859
0x9332e0 check_format_info_main
/jasmets/git/tools/gcc/gcc/c-family/c-format.c:3998
0x9332e0 check_format_arg
/jasmets/git/tools/gcc/gcc/c-family/c-format.c:1821
0x92f3a2 check_format_info
/jasmets/git/tools/gcc/gcc/c-family/c-format.c:1543
0x92f3a2 check_function_format(tree_node const*, tree_node*, int, tree_node**,
vec*)
/jasmets/git/tools/gcc/gcc/c-family/c-format.c:1197
0x922f09 check_function_arguments(unsigned int, tree_node const*, tree_node
const*, int, tree_node**, vec*)
/jasmets/git/tools/gcc/gcc/c-family/c-common.c:5730
0x77d86f build_over_call
/jasmets/git/tools/gcc/gcc/cp/call.c:8901
0x77f2ea build_new_function_call(tree_node*, vec**, int)
/jasmets/git/tools/gcc/gcc/cp/call.c:4613
0x8baac6 finish_call_expr(tree_node*, vec**, bool,
bool, int)
/jasmets/git/tools/gcc/gcc/cp/semantics.c:2672
0x864abf cp_parser_postfix_expression
/jasmets/git/tools/gcc/gcc/cp/parser.c:7468
0x84d261 cp_parser_unary_expression
/jasmets/git/tools/gcc/gcc/cp/parser.c:8563
0x846d11 cp_parser_cast_expression
/jasmets/git/tools/gcc/gcc/cp/parser.c:9459
0x8473e1 cp_parser_binary_expression
/jasmets/git/tools/gcc/gcc/cp/parser.c:9562

[Bug preprocessor/96935] New: ICE in subspan, at input.h:69

2020-09-04 Thread jan.smets at nokia dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96935

Bug ID: 96935
   Summary: ICE in subspan, at input.h:69
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: preprocessor
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jan.smets at nokia dot com
  Target Milestone: ---

The following ICE is seen:

x.cpp: In function 'void a()':
x.cpp:3: internal compiler error: in subspan, at input.h:69
3 | #define DB_PRINTF(str, fmt, args...) db_printf(indent_len, 50, fmt,
str, ##args)
  |
x.cpp:7: note: in expansion of macro 'DB_PRINTF'
7 |   DB_PRINTF("", "%llu", 0);
  |


Target: x86_64-linux-gnu
Configured with: /usr/src/gcc/configure --build=x86_64-linux-gnu
--disable-multilib --enable-languages=c,c++,fortran,go

Compiled with:  -O2 -Wformat 

Reproduces with 
 10.2, 10.1
 9.3, 9.1
Works with
 8.4
 7.5


The reduced testcase is :

#include "x.h"
#define DB_PRINTF(str, fmt, args...) db_printf(indent_len, 50, fmt, str,
##args)
extern "C" void db_printf(unsigned indent_len, unsigned column_split, const
char * fmt, const char * str, ...) __attribute__ ((format (printf, 3, 5)));
void a() {
  unsigned int indent_len = 0;
  DB_PRINTF("", "%llu", 0);
  // preprocesses to: db_printf(indent_len, 50, "%llu", "");
}

But I suspect the testcase is just garbage.

I tried various ways trying to reduce x.h, but even the slightest change makes
the problem go away. "x.h" recursively includes about 200 other header files.
A "flat" x.h (700k lines) (-fdirectives-only) does not reproduce the issue.

This ICE occurs on a couple of dozen files in my project. Some print
 "during GIMPLE pass: strlen".

Goes away with --enable-checking=no, but then other problems show up (ICE in
linemap_compare_locations, at libcpp/line-map.c:1359 - which may or may not be
related)

I suppose my next best option is to start a bisect between 8.x and 9.x?

Thanks

[Bug tree-optimization/96512] wrong code generated with avx512 intrinsics in some cases

2020-09-04 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96512

H.J. Lu  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
   See Also||https://sourceware.org/bugz
   ||illa/show_bug.cgi?id=23465
 Resolution|--- |MOVED

--- Comment #7 from H.J. Lu  ---
Binutils bug:

https://sourceware.org/bugzilla/show_bug.cgi?id=23465

[Bug tree-optimization/96512] wrong code generated with avx512 intrinsics in some cases

2020-09-04 Thread nathanael.schaeffer at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96512

--- Comment #6 from N Schaeffer  ---
Hello,

Working further on this, it seems to be a problem in the assembler step, but
only on some installations.

I have a system where gcc 8.3 to 9 and 10 are good (no bug), while another
system where gcc 8.3, 9.1 and 10.1 are NOT good (bug!)

On the buggy system, when doing:
   gcc -O1 -D_GCC_VEC_=1 -march=skylake-avx512 -c bug_gcc_avx512.c
and disassembling with gdb, one can see the offending instruction has been
generated:
vbroadcastsd 0x1(,%r8,8),%zmm

but when outputting assembly code like so:
   gcc -O1 -D_GCC_VEC_=1 -march=skylake-avx512 -S bug_gcc_avx512.c
the instruction in the bug_gcc_avx512.s file reads:
   vbroadcastsd 8(,%r8,8), %zmm0
When now invoking the assembler:
   as bug_gcc_avx512.s
the offending instruction is indeed generated:
   vbroadcastsd 0x1(,%r8,8),%zmm0


So here are the "as --version" on various systems:

GNU assembler version 2.27-41.base.el7_7.3  ==> NO BUG
GNU assembler (GNU Binutils for Debian) 2.28 ==> NO BUG

GNU assembler version 2.30-58.el8_1.2 ==> BUG!

Assembleur GNU (GNU Binutils) 2.34  ==> NO BUG
Assembleur GNU (GNU Binutils) 2.35  ==> NO BUG


Maybe I should post this bug report somewhere else?

[Bug middle-end/91490] [9 Regression] bogus argument missing terminating nul warning on strlen of a flexible array member

2020-09-04 Thread wielkiegie at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91490

Gustaw Smolarczyk  changed:

   What|Removed |Added

 CC||wielkiegie at gmail dot com

--- Comment #8 from Gustaw Smolarczyk  ---
Bug 96934 is potentially related to this issue, but I have also found a
miscompilation of strcmp call that is possibly related.

[Bug c++/96934] Copy initialization of struct involving aggregate array initialization miscompiles in GCC 9

2020-09-04 Thread wielkiegie at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96934

--- Comment #1 from Gustaw Smolarczyk  ---
It seems that part of this issue was already reported in another bug report
(though the report is about flexible array members, the comment does not
reference them):

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91490#c6

However, I don't see any mention of the miscompilation in the thread. Possibly
the issue didn't surface in comment 6 as the tested string has only a single
character (+ the terminating null character). Or strcmp is needed in order to
trigger it.

[Bug target/96933] rs6000: inefficient code for char/short vec CTOR

2020-09-04 Thread linkw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933

--- Comment #2 from Kewen Lin  ---
(In reply to Segher Boessenkool from comment #1)
> Is that actually faster though?  The original has shorter dependency
> chains.  Or is this to avoid some LHS/SHL?

Yes, I tested it with one constructed case, the original version takes 18.20s
while the optimized version takes 8.40s. And yes, I guess it's due to LHS/SHL
similar to the vec_insert issue xionghu is working on.

[Bug libstdc++/96731] uniform_int_distribution requirement that its type is_integral is too strict

2020-09-04 Thread TonyELewis at hotmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96731

--- Comment #5 from Tony E Lewis  ---
Thanks very much for your work on this.

That's a shame but I appreciate the problems you've highlighted.

> I don't plan to work on this any further for now.

Yes, fair enough.

[Bug target/96933] rs6000: inefficient code for char/short vec CTOR

2020-09-04 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933

--- Comment #1 from Segher Boessenkool  ---
Is that actually faster though?  The original has shorter dependency
chains.  Or is this to avoid some LHS/SHL?

[Bug tree-optimization/96931] [11 Regression] ICE in add_phi_arg, at tree-phinodes.c:359

2020-09-04 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96931

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:fab77644842869adc8871e133e4c3f4c35b2b245

commit r11-3009-gfab77644842869adc8871e133e4c3f4c35b2b245
Author: Richard Biener 
Date:   Fri Sep 4 12:18:38 2020 +0200

tree-optimization/96931 - clear ctrl-altering flag more aggressively

The testcase shows that we fail to clear gimple_call_ctrl_altering_p
when the last abnormal edge goes away, causing an edge insert to
a loop header edge when we have preheaders to split the edge
unnecessarily.

The following addresses this by more aggressively clearing the
flag in cleanup_call_ctrl_altering_flag.

2020-09-04  Richard Biener  

PR tree-optimization/96931
* tree-cfgcleanup.c (cleanup_call_ctrl_altering_flag): If
there's a fallthru edge and no abnormal edge the call is
no longer control-altering.
(cleanup_control_flow_bb): Pass down the BB to
cleanup_call_ctrl_altering_flag.

* gcc.dg/pr96931.c: New testcase.

[Bug tree-optimization/96931] [11 Regression] ICE in add_phi_arg, at tree-phinodes.c:359

2020-09-04 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96931

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #5 from Richard Biener  ---
Fixed.

[Bug c++/96934] New: Copy initialization of struct involving aggregate array initialization miscompiles in GCC 9

2020-09-04 Thread wielkiegie at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96934

Bug ID: 96934
   Summary: Copy initialization of struct involving aggregate
array initialization miscompiles in GCC 9
   Product: gcc
   Version: 9.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wielkiegie at gmail dot com
  Target Milestone: ---

GCC 9 (but not <= 8 and not >= 10, including trunk) miscompiles the following
piece of code. I have tested GCC 9.1 (which produces even worse code, as it
assumes the const char* has only a single character) and GCC 9.2, 9.3
(explained below). I was not able to test the current GCC 9 branch.

Test case: https://godbolt.org/z/jPq4h7

The Code struct holds an array of chars that is meant to be always
null-terminated. A simple constructor is provided to simulate how the array
should be initialized (this is a reduced real world scenario). It uses
aggregate initialization in order to store the "12" string.

There are two problems, probably originating from the same underlying issue:
1. Bogus -Wstringop-overflow warning saying the _buffer is unterminated, while
it most certainly is (as you can see in the assembly).
2. std::strcmp call miscompiled as if _buffer == "1" (and not "12").

Switching the TEST define on line 17 into T1, T2, T3, T4 doesn't change the
outcome. However, T5 and T6 fix the issue. What seems to be the difference is
the copy-initialization [1] being involved in T1 and T2 (and T3, T4
"inheriting" the buggy state). T5 doesn't do copy-initialization, and T6 just
copies it.

[1] https://en.cppreference.com/w/cpp/language/copy_initialization

[Bug tree-optimization/96920] [10/11 Regression] ICE segmentation fault in tree-vectorizer at -O3

2020-09-04 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96920

--- Comment #3 from Richard Biener  ---
It's similar to PR96698 where we had a nested cycle where a cycle PHI was fed
by an induction.  Here we're feeding the cycle PHI by another cycle PHI so the
fancy detection of computing a latch value doesn't work since the def is both
the PHI of a cycle _and_ the latch def (in PR96698 we were safe because the def
was an induction, not clashing with data structures).

I never liked the current code much but now have to think about sth more
reliable and workable for SLP.

[Bug target/96933] rs6000: inefficient code for char/short vec CTOR

2020-09-04 Thread linkw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933

Kewen Lin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
 CC||bergner at gcc dot gnu.org,
   ||linkw at gcc dot gnu.org,
   ||segher at gcc dot gnu.org,
   ||wschmidt at gcc dot gnu.org
Summary|inefficient code for|rs6000: inefficient code
   |char/short vec CTOR |for char/short vec CTOR
   Last reconfirmed||2020-09-04
 Target||powerpc
 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |linkw at gcc dot gnu.org

[Bug target/96933] New: inefficient code for char/short vec CTOR

2020-09-04 Thread linkw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933

Bug ID: 96933
   Summary: inefficient code for char/short vec CTOR
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linkw at gcc dot gnu.org
  Target Milestone: ---

While investigating the vectorization cost for vec_construct, I happened to
find that the generated code for vector construction is inefficient with
DIRECT_MOVE support.

The test case looks like:

vector unsigned char test_char(unsigned char f1, unsigned char f2,
   unsigned char f3, unsigned char f4,
   unsigned char f5, unsigned char f6,
   unsigned char f7, unsigned char f8,
   unsigned char f9, unsigned char f10,
   unsigned char f11, unsigned char f12,
   unsigned char f13, unsigned char f14,
   unsigned char f15, unsigned char f16) {

  vector unsigned char v = {f1, f2,  f3,  f4,  f5,  f6,  f7,  f8,
f9, f10, f11, f12, f13, f14, f15, f16};
  return v;
}

The generated code currently with -mcpu=power9:

 :
   0:   e8 ff a1 fb std r29,-24(r1)
   4:   f0 ff c1 fb std r30,-16(r1)
   8:   f8 ff e1 fb std r31,-8(r1)
   c:   60 00 a1 8b lbz r29,96(r1)
  10:   68 00 c1 8b lbz r30,104(r1)
  14:   70 00 e1 8b lbz r31,112(r1)
  18:   d1 ff 81 98 stb r4,-47(r1)
  1c:   d2 ff a1 98 stb r5,-46(r1)
  20:   78 00 81 89 lbz r12,120(r1)
  24:   80 00 01 88 lbz r0,128(r1)
  28:   88 00 61 89 lbz r11,136(r1)
  2c:   90 00 81 88 lbz r4,144(r1)
  30:   98 00 a1 88 lbz r5,152(r1)
  34:   d0 ff 61 98 stb r3,-48(r1)
  38:   d3 ff c1 98 stb r6,-45(r1)
  3c:   d4 ff e1 98 stb r7,-44(r1)
  40:   d8 ff a1 9b stb r29,-40(r1)
  44:   d5 ff 01 99 stb r8,-43(r1)
  48:   d6 ff 21 99 stb r9,-42(r1)
  4c:   d7 ff 41 99 stb r10,-41(r1)
  50:   d9 ff c1 9b stb r30,-39(r1)
  54:   da ff e1 9b stb r31,-38(r1)
  58:   db ff 81 99 stb r12,-37(r1)
  5c:   dc ff 01 98 stb r0,-36(r1)
  60:   dd ff 61 99 stb r11,-35(r1)
  64:   de ff 81 98 stb r4,-34(r1)
  68:   df ff a1 98 stb r5,-33(r1)
  6c:   e8 ff a1 eb ld  r29,-24(r1)
  70:   f0 ff c1 eb ld  r30,-16(r1)
  74:   f8 ff e1 eb ld  r31,-8(r1)
  78:   d9 ff 41 f4 lxv vs34,-48(r1)
  7c:   20 00 80 4e blr

But it can be more efficient with direct move and vector merge, such as:

   0:   67 01 43 7c mtvsrd  vs34,r3
   4:   68 00 61 80 lwz r3,104(r1)
   8:   60 00 61 81 lwz r11,96(r1)
   c:   67 01 64 7c mtvsrd  vs35,r4
  10:   70 00 81 80 lwz r4,112(r1)
  14:   67 01 03 7d mtvsrd  vs40,r3
  18:   78 00 61 80 lwz r3,120(r1)
  1c:   67 01 85 7c mtvsrd  vs36,r5
  20:   67 01 a6 7c mtvsrd  vs37,r6
  24:   67 01 07 7c mtvsrd  vs32,r7
  28:   67 01 28 7c mtvsrd  vs33,r8
  2c:   67 01 24 7d mtvsrd  vs41,r4
  30:   80 00 81 80 lwz r4,128(r1)
  34:   0c 10 43 10 vmrghb  v2,v3,v2
  38:   67 01 63 7c mtvsrd  vs35,r3
  3c:   88 00 61 80 lwz r3,136(r1)
  40:   67 01 eb 7c mtvsrd  vs39,r11
  44:   0c 20 85 10 vmrghb  v4,v5,v4
  48:   67 01 a4 7c mtvsrd  vs37,r4
  4c:   90 00 81 80 lwz r4,144(r1)
  50:   0c 00 01 10 vmrghb  v0,v1,v0
  54:   67 01 23 7c mtvsrd  vs33,r3
  58:   98 00 61 80 lwz r3,152(r1)
  5c:   67 01 c9 7c mtvsrd  vs38,r9
  60:   0c 38 e8 10 vmrghb  v7,v8,v7
  64:   67 01 04 7d mtvsrd  vs40,r4
  68:   0c 48 63 10 vmrghb  v3,v3,v9
  6c:   67 01 23 7d mtvsrd  vs41,r3
  70:   0c 28 a1 10 vmrghb  v5,v1,v5
  74:   67 01 2a 7c mtvsrd  vs33,r10
  78:   0c 40 09 11 vmrghb  v8,v9,v8
  7c:   0c 30 21 10 vmrghb  v1,v1,v6
  80:   4c 11 44 10 vmrglh  v2,v4,v2
  84:   4c 39 63 10 vmrglh  v3,v3,v7
  88:   4c 29 88 10 vmrglh  v4,v8,v5
  8c:   4c 01 a1 10 vmrglh  v5,v1,v0
  90:   8c 19 64 10 vmrglw  v3,v4,v3
  94:   8c 11 45 10 vmrglw  v2,v5,v2
  98:   57 13 43 f0 xxmrgld vs34,vs35,vs34
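
The shape of the second sequence (16 direct moves followed by 8 + 4 + 2 + 1
merges at byte, halfword, word and doubleword width) can be sketched as a
portable C model. This is only an illustration of the merge tree: the names
vreg_model, model_direct_move, model_merge and model_build_v16qi are
hypothetical, and the model does not try to reproduce the lane ordering or
endianness behaviour of mtvsrd, vmrghb, vmrglh, vmrglw or xxmrgld.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a vector register: 16 bytes, of which the first
   `valid` bytes hold already-assembled elements.  */
typedef struct {
  unsigned char b[16];
  size_t valid;
} vreg_model;

/* Models a "direct move": one scalar lands in its own register.  */
static vreg_model
model_direct_move (unsigned char x)
{
  vreg_model r;
  memset (r.b, 0, sizeof r.b);
  r.b[0] = x;
  r.valid = 1;
  return r;
}

/* Models a merge at the current width: the valid part of LO is followed
   by the valid part of HI.  Both inputs must have the same width.  */
static vreg_model
model_merge (vreg_model lo, vreg_model hi)
{
  vreg_model r;
  assert (lo.valid == hi.valid && lo.valid <= 8);
  memset (r.b, 0, sizeof r.b);
  memcpy (r.b, lo.b, lo.valid);
  memcpy (r.b + lo.valid, hi.b, hi.valid);
  r.valid = lo.valid + hi.valid;
  return r;
}

/* Builds a 16-byte vector from 16 scalars and returns the merge count.  */
static int
model_build_v16qi (const unsigned char in[16], unsigned char out[16])
{
  vreg_model regs[16];
  int merges = 0;

  for (size_t i = 0; i < 16; i++)
    regs[i] = model_direct_move (in[i]);

  /* Four rounds: byte, halfword, word and doubleword merges.  */
  for (size_t n = 16; n > 1; n /= 2)
    for (size_t i = 0; i < n / 2; i++)
      {
        regs[i] = model_merge (regs[2 * i], regs[2 * i + 1]);
        merges++;
      }

  memcpy (out, regs[0].b, 16);
  return merges;
}
```

In this model the 16 inputs are assembled with 15 merges and no stores to the
stack, which is the property the proposed sequence relies on.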

[Bug target/95535] Failure to optimize out cdqe after __bultin_ctz

2020-09-04 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95535

Gabriel Ravier  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Gabriel Ravier  ---
Looks like this is fixed but not marked as such, so I'll make it so.

[Bug c++/83591] -Wduplicated-branches fires in system headers in template instantiation

2020-09-04 Thread TonyELewis at hotmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83591

--- Comment #7 from Tony E Lewis  ---
Thanks for this comment T vd Sijs. Yes - I'm also able to compile this without
problem in 9.3 (and in 10.1).

Manuel López-Ibáñez: are you happy that all underlying issues are resolved and
this can be closed?

[Bug tree-optimization/94893] Sign function not getting optimized to simple compare

2020-09-04 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94893

Gabriel Ravier  changed:

   What|Removed |Added

 Target|x86_64-*-*  |
 Blocks||19987

--- Comment #1 from Gabriel Ravier  ---
No idea why it was marked as x86-specific, it isn't as far as I can see, so I
removed the target here. Do tell me if this is wrong.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19987
[Bug 19987] [meta-bug] fold missing optimizations in general

[Bug libstdc++/86419] codecvt::in() and out() incorrectly return ok in some cases.

2020-09-04 Thread dmjpp at hotmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86419

--- Comment #9 from Dimitrij Mijoski  ---
Ignore my last comment; here it is, fixed.

Looking again at my proposed fix in comment #7, I concluded it is not the best
fix. It will fix the testsuite in the same comment #7, but I discovered another
class of errors related to the lines I am touching in that proposed fix.

The error is when we have an incomplete sequence which is in the middle of the
from range, and not at the end. In such cases codecvt_base::error should be
returned. The bug exists in UTF8->UTF16, UTF8->UCS4 and UTF16->UCS4.

I guess some more tests need to be written about returning error.
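
The distinction described above (a truncated sequence at the end of the input
versus one in the middle of it) can be sketched in plain C. This is not the
libstdc++ code: conv_result, utf8_seq_len and utf8_scan are hypothetical names
that loosely mirror codecvt_base::ok, partial and error, and the scan is
simplified (it does not reject overlong forms or surrogates).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes, loosely mirroring codecvt_base.  */
enum conv_result { CONV_OK, CONV_PARTIAL, CONV_ERROR };

/* Number of bytes a UTF-8 sequence starting with LEAD should have,
   or 0 if LEAD cannot start a sequence.  */
static size_t
utf8_seq_len (unsigned char lead)
{
  if (lead < 0x80)
    return 1;
  if (lead >= 0xC2 && lead <= 0xDF)
    return 2;
  if (lead >= 0xE0 && lead <= 0xEF)
    return 3;
  if (lead >= 0xF0 && lead <= 0xF4)
    return 4;
  return 0;
}

/* Scans [FROM, FROM_END) and classifies the first problem found.  */
static enum conv_result
utf8_scan (const unsigned char *from, const unsigned char *from_end)
{
  assert (from <= from_end);
  while (from < from_end)
    {
      size_t len = utf8_seq_len (*from);
      if (len == 0)
        return CONV_ERROR;

      size_t avail = (size_t) (from_end - from);
      size_t n = len < avail ? len : avail;

      /* A non-continuation byte inside the sequence means the sequence
         was cut short in the middle of the input: that is an error.  */
      for (size_t i = 1; i < n; i++)
        if ((from[i] & 0xC0) != 0x80)
          return CONV_ERROR;

      /* Only running out of input makes the sequence merely partial.  */
      if (avail < len)
        return CONV_PARTIAL;

      from += len;
    }
  return CONV_OK;
}
```

A test for the missing case would feed an incomplete sequence followed by more
input and expect the error result rather than ok or partial.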

[Bug tree-optimization/94880] Failure to recognize andn pattern

2020-09-04 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94880

Gabriel Ravier  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Gabriel Ravier  ---
This seems to be fixed, but it isn't closed. I'll close it myself, but do tell
me if I somehow missed something.

[Bug libstdc++/86419] codecvt::in() and out() incorrectly return ok in some cases.

2020-09-04 Thread dmjpp at hotmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86419

--- Comment #8 from Dimitrij Mijoski  ---
Looking again at my proposed fix in comment #6, I concluded it is not the best
fix. It will fix the testsuite in the same comment #6, but I discovered another
class of errors related to the lines I am touching in that proposed fix.

The error is when we have an incomplete sequence which is in the middle of the
from range, and not at the end. In such cases codecvt_base::error should be
returned. The bug exists in UTF8->UTF16, UTF8->UCS4 and UTF16->UCS4.

I guess some more tests need to be written about returning error.

[Bug target/96898] [nvptx] libatomic support

2020-09-04 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
For OpenMP reductions, we really don't care what kind of mutex protects the
updates, as long as it is the same for all updates of the same reduction.
I believe we don't rely on any other synchronization effects.
So, I think we should change omp-low.c so that it emits __atomic_* calls with
__ATOMIC_RELAXED rather than __sync_* calls.  And we could just use libatomic
with its own locking if we didn't go the GOMP_atomic_{start,end} route (that
one is used if there are multiple reductions, or the atomics aren't available,
or there are user defined reductions we don't understand (or all?)).  Perhaps
we should also consider using atomics even for two simple reductions or
similar.
And nvptx certainly could just use libatomic...
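
For illustration, the difference between the two builtin families looks like
this at the source level. This is a minimal sketch, not what omp-low.c emits:
reduce_add_sync and reduce_add_relaxed are hypothetical helper names, and only
the GCC builtins __sync_fetch_and_add and __atomic_fetch_add are real.

```c
/* Reduction update through the legacy __sync builtin, which acts as a
   full barrier.  */
static void
reduce_add_sync (long *sum, long partial)
{
  __sync_fetch_and_add (sum, partial);
}

/* The same update with relaxed ordering: the addition itself is still
   atomic, but no ordering with other memory accesses is requested.  */
static void
reduce_add_relaxed (long *sum, long partial)
{
  __atomic_fetch_add (sum, partial, __ATOMIC_RELAXED);
}
```

Both helpers produce the same final sum; the relaxed form only drops the
ordering guarantees that, per the comment above, reductions do not rely on.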

[Bug tree-optimization/96931] [11 Regression] ICE in add_phi_arg, at tree-phinodes.c:359

2020-09-04 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96931

--- Comment #3 from Richard Biener  ---
So the testcase only triggers on trunk because store commoning is new there and
it transforms (interestingly!)

  <bb 2> [local count: 10631108]:
  p3 ();
  bl = 0;

  <bb 3> [local count: 1073741824]:
  bl.0_1 = bl;
  _2 = bl.0_1 + 1;
  bl = _2;
  goto <bb 3>; [100.00%]

to

  <bb 2> [local count: 10631108]:
  p3 ();

  <bb 3> [local count: 1073741824]:
  # _8 = PHI <0(2), _2(3)>
  bl = _8;
  bl.0_1 = bl;
  _2 = bl.0_1 + 1;
  goto <bb 3>; [100.00%]

which predcom tries to "improve" to

  <bb 2> [local count: 10631108]:
  p3 ();
  _10 = bl;

  <bb 3> [local count: 1073741824]:
  # _8 = PHI <0(2), _2(4)>
  # bl_lsm0.3_7 = PHI <_10(2), bl_lsm0.3_6(4)>
  bl = _8;
  bl_lsm0.3_6 = _8;
  bl.0_1 = bl_lsm0.3_6;
  _2 = bl.0_1 + 1;

  <bb 4> [local count: 1073741824]:
  goto <bb 3>; [100.00%]

so the situation is quite arcane and possibly not worth fixing on branches.

[Bug target/96932] New: [nvptx] atomic_exchange missing barrier

2020-09-04 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96932

Bug ID: 96932
   Summary: [nvptx] atomic_exchange missing barrier
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

After digging into GOMP_atomic_start/end I realized these also imply barrier
semantics.

And looking at the source code used for nvptx in libgomp/config/accel/mutex.h,
that should be fine:
...
static inline void
gomp_mutex_lock (gomp_mutex_t *mutex)
{
  while (__sync_lock_test_and_set (mutex, 1))
    /* spin */ ;
}

static inline void
gomp_mutex_unlock (gomp_mutex_t *mutex)
{
  __sync_lock_release (mutex);
}
...

However, when looking at the resulting code in libgomp.a we see there's no
barrier for GOMP_atomic_start:
...
.visible .func GOMP_atomic_start
{
.reg .u32 %r22;
.reg .pred %r23;
$L2:
.loc 1 51 10
atom.global.exch.b32 %r22,[atomic_lock],1;
.loc 1 51 9
setp.ne.u32 %r23,%r22,0;
@ %r23 bra $L2;
.loc 2 43 1
ret;
}
...

While there is for GOMP_atomic_end:
...
.visible .func GOMP_atomic_end
{
.reg .u32 %r22;
.loc 1 58 3
membar.sys;
mov.u32 %r22,0;
st.global.u32 [atomic_lock],%r22;
.loc 2 49 1
ret;
}
...
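
For reference, the GCC documentation describes __sync_lock_test_and_set as an
acquire barrier and __sync_lock_release as a release barrier. A sketch of the
same spin lock with those orderings spelled out through the __atomic builtins
follows; model_mutex_t, model_mutex_lock and model_mutex_unlock are
hypothetical names, not libgomp code.

```c
/* Hypothetical stand-in for gomp_mutex_t.  */
typedef int model_mutex_t;

static inline void
model_mutex_lock (model_mutex_t *mutex)
{
  /* Acquire ordering: accesses after the lock must not be moved before
     the exchange that takes it.  */
  while (__atomic_exchange_n (mutex, 1, __ATOMIC_ACQUIRE))
    /* spin */ ;
}

static inline void
model_mutex_unlock (model_mutex_t *mutex)
{
  /* Release ordering: accesses before the unlock must not be moved
     after the store that drops the lock.  */
  __atomic_store_n (mutex, 0, __ATOMIC_RELEASE);
}
```

Under this reading, the lock side needs some ordering after the exchange just
as the unlock side needs it before the store, which is the asymmetry visible
in the two PTX listings.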

[Bug tree-optimization/96931] [11 Regression] ICE in add_phi_arg, at tree-phinodes.c:359

2020-09-04 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96931

--- Comment #2 from Richard Biener  ---
diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
index b1d6e63559c..af71c269f4b 100644
--- a/gcc/tree-predcom.c
+++ b/gcc/tree-predcom.c
@@ -1960,7 +1960,8 @@ initialize_root_vars_lm (class loop *loop, dref root, bool written,

   init = force_gimple_operand (init, &stmts, written, NULL_TREE);
   if (stmts)
-gsi_insert_seq_on_edge_immediate (entry, stmts);
+if (gsi_insert_seq_on_edge_immediate (entry, stmts))
+  entry = loop_preheader_edge (loop);

   if (written)
 {


I guess that even with simple preheaders, passes must not assume that inserting
on the entry edge will not split it (the call in the preheader ends the BB
because we've had returns_twice functions).

Now the ie() call was removed and so was the abnormal edge from p3() so
likely gimple_call_ctrl_altering_p should have been cleared from it which
would be a missed optimization.  Thus an alternative fix is there.

[Bug target/96898] [nvptx] libatomic support

2020-09-04 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898

--- Comment #2 from Tom de Vries  ---
Hmm, I found this difference: 
- AFAIU, GOMP_atomic_start/end have barrier semantics
- libatomic's protect_start/end are always paired with explicit barriers, so
  presumably these don't have barrier semantics

So, using GOMP_atomic_start for protect_start in libatomic will have the
effect of issuing the barrier twice, which might be a performance problem.

[Bug tree-optimization/96931] [11 Regression] ICE in add_phi_arg, at tree-phinodes.c:359

2020-09-04 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96931

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
   Last reconfirmed||2020-09-04
 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED
   Target Milestone|--- |11.0

--- Comment #1 from Richard Biener  ---
Mine.

[Bug tree-optimization/96931] New: [11 Regression] ICE in add_phi_arg, at tree-phinodes.c:359

2020-09-04 Thread asolokha at gmx dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96931

Bug ID: 96931
   Summary: [11 Regression] ICE in add_phi_arg, at
tree-phinodes.c:359
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: asolokha at gmx dot com
  Target Milestone: ---

gcc-11.0.0-alpha20200830 snapshot (g:6ccadc4c0486ff011a32c74de1a31148acb3cbe2)
ICEs when compiling the following testcase w/ -O1 -fpredictive-commoning
-fno-tree-loop-im:

int bl;

void
p3 (void);

void __attribute__ ((returns_twice))
ie (void)
{
  p3 ();

  bl = 0;
  for (;;)
    ++bl;

  ie ();
}

% gcc-11.0.0 -O1 -fpredictive-commoning -fno-tree-loop-im -c d8kidanm.c
during GIMPLE pass: pcom
d8kidanm.c: In function 'ie':
d8kidanm.c:7:1: internal compiler error: in add_phi_arg, at tree-phinodes.c:359
7 | ie (void)
  | ^~
0x6b7058 add_phi_arg(gphi*, tree_node*, edge_def*, unsigned int)
        /var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200830/work/gcc-11-20200830/gcc/tree-phinodes.c:359
0xe41fd0 initialize_root_vars_lm
        /var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200830/work/gcc-11-20200830/gcc/tree-predcom.c:1969
0xe41fd0 execute_load_motion
        /var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200830/work/gcc-11-20200830/gcc/tree-predcom.c:2001
0xe41fd0 execute_pred_commoning
        /var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200830/work/gcc-11-20200830/gcc/tree-predcom.c:2265
0xe45897 tree_predictive_commoning_loop
        /var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200830/work/gcc-11-20200830/gcc/tree-predcom.c:3308
0xe45897 tree_predictive_commoning()
        /var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200830/work/gcc-11-20200830/gcc/tree-predcom.c:

[Bug target/96918] Failure to optimize vector shift left+shift right+or to pshuf

2020-09-04 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96918

--- Comment #7 from Richard Biener  ---
(In reply to Jakub Jelinek from comment #6)
> Or the generic code could try to expand vector rotates by multiplies of
> BITS_PER_UNIT as vector permutations, perhaps only if there is no optab for
> it.  Or trying to expand both permutation and rotate and determine at
> expansion time using costs which sequence is cheaper.

I guess for the specific case we could think of what is canonical for
GIMPLE and then deal with that during RTL expansion as you say.

The question is whether vector rotates or permutes are more likely to
be combined with earlier/later stmts.  I guess there's no clear answer to
that, which means special handling would be needed anyway, from which it
follows that we might stick to what the source did.
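
The equivalence being discussed (a rotate by a multiple of BITS_PER_UNIT is a
byte permutation) can be checked on a single 32-bit lane in portable C. This is
a scalar sketch, not GIMPLE or RTL: rotl32_by_bytes_shift and
rotl32_by_bytes_permute are hypothetical names, and byte 0 here means the least
significant byte regardless of host endianness.

```c
#include <stdint.h>

/* Rotate left by NBYTES bytes, written as the shift/shift/or pattern.  */
static uint32_t
rotl32_by_bytes_shift (uint32_t x, unsigned nbytes)
{
  unsigned s = (nbytes % 4) * 8;
  return s ? (x << s) | (x >> (32 - s)) : x;
}

/* The same rotate written as a permutation of the four bytes of X.  */
static uint32_t
rotl32_by_bytes_permute (uint32_t x, unsigned nbytes)
{
  uint32_t r = 0;
  for (unsigned i = 0; i < 4; i++)
    {
      uint32_t byte = (x >> (8 * i)) & 0xFF;   /* source byte i */
      unsigned dst = (i + nbytes) % 4;         /* where it ends up */
      r |= byte << (8 * dst);
    }
  return r;
}
```

Applied to every lane of a vector, the second form corresponds to one fixed
byte shuffle, which is what makes either representation a candidate for the
canonical one.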

[Bug tree-optimization/96930] Failure to optimize out arithmetic with bigger size when it can't matter with division transformed into right shift

2020-09-04 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96930

Richard Biener  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2020-09-04
 Status|UNCONFIRMED |NEW
 Blocks||19987

--- Comment #1 from Richard Biener  ---
Thanks for the report.  Note there's the PR19987 meta-bug all your expression
simplification reports should 'block' (not so much the target specific ones,
maybe case-by-case)


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19987
[Bug 19987] [meta-bug] fold missing optimizations in general

[Bug c++/96926] [9/10/11 Regression] Tuple element w/ member reference to incomplete template type rejected

2020-09-04 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96926

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |9.4

[Bug other/96927] [11 regression] ICE in libstdc++-v3/include/chrono:442

2020-09-04 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96927

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |11.0