[Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5

2022-08-09 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #38 from Kewen Lin  ---
Created attachment 53428
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53428&action=edit
untested patch

An untested patch which makes it pass.

[Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5

2022-08-09 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #37 from Kewen Lin  ---
(In reply to Andrew Pinski from comment #36)
> You might need to do -O2 -fPIE -pie to reproduce the issue as debian is
> configured with --enable-default-pie

Thanks for the hint! I can reproduce this, but it also needs an explicit CPU
type such as -mcpu=power4/5/6. The problem comes from slp1, so
-fno-tree-slp-vectorize makes it pass.

It seems to expose a latent issue in this code in vect_recog_mulhs_pattern:

  vect_pattern_detected ("vect_recog_mulhs_pattern", last_stmt);

  /* Check for target support.  */
  tree new_vectype = get_vectype_for_scalar_type (vinfo, new_type);
  if (!new_vectype
      || !direct_internal_fn_supported_p
            (ifn, new_vectype, OPTIMIZE_FOR_SPEED))
    return NULL;

At this time, the new_vectype is 

(gdb) pge new_vectype
vector(2) short unsigned int

The current target doesn't support the umul_highpart optab for V2HImode at all,
but the check doesn't fail, because of this code in direct_optab_supported_p:

static bool
direct_optab_supported_p (direct_optab optab, tree_pair types,
                          optimization_type opt_type)
{
  machine_mode mode = TYPE_MODE (types.first);
  gcc_checking_assert (mode == TYPE_MODE (types.second));
  return direct_optab_handler (optab, mode, opt_type) != CODE_FOR_nothing;
}

(gdb) pge types.first
vector(2) short unsigned int
(gdb) p mode
$12 = E_SImode

The current target does support the umul_highpart optab for SImode, so the
check passes. But we expect the query to use the vector mode for the given
type: using a scalar insn for a vector operation here is functionally wrong,
so this result is unexpected.
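To see why passing the SImode check is unsafe, here is a small C++ sketch comparing the two operations. It assumes two 16-bit lanes packed into one 32-bit word with lane 0 in the low half (lane order is target-dependent), and the function names are hypothetical. A V2HImode unsigned high-part multiply works per lane, while the SImode insn multiplies the whole word, so the results differ:

```cpp
#include <cstdint>

// Per-lane unsigned high-part multiply on two packed 16-bit lanes: what a
// V2HImode umul_highpart would compute (hypothetical helper).
static std::uint32_t v2hi_umul_highpart(std::uint32_t a, std::uint32_t b) {
  std::uint32_t r = 0;
  for (int lane = 0; lane < 2; ++lane) {
    std::uint32_t x = (a >> (16 * lane)) & 0xffffu;
    std::uint32_t y = (b >> (16 * lane)) & 0xffffu;
    // High 16 bits of the 32-bit product of the two lanes.
    r |= ((x * y) >> 16) << (16 * lane);
  }
  return r;
}

// Whole-word unsigned high-part multiply: what an SImode umul_highpart
// computes, i.e. the high 32 bits of the 64-bit product.
static std::uint32_t si_umul_highpart(std::uint32_t a, std::uint32_t b) {
  return static_cast<std::uint32_t>((static_cast<std::uint64_t>(a) * b) >> 32);
}
```

For a = 0x00020003 and b = 0x80008000, the per-lane version yields 0x00010001, while the whole-word version yields 0x00010002.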

[Bug bootstrap/106472] No rule to make target '../libbacktrace/libbacktrace.la', needed by 'libgo.la'.

2022-08-09 Thread sumbera at volny dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106472

--- Comment #14 from Petr Sumbera  ---
Sorry for the late response. Unfortunately, the above patch doesn't make any
difference. The problem is still there.

[Bug ipa/101839] [10/11/12/13 Regression] Hang in C++ code with -fdevirtualize

2022-08-09 Thread yinyuefengyi at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101839

--- Comment #8 from Xionghu Luo (luoxhu at gcc dot gnu.org)  ---
The relationship is:

A            A::type
|-- BA       BA::type
|   `-- CBA  CBA::type
`-- CA       CA::type

Classes CA and CBA are final, and the functions CA::type and BA::type are final
too. In possible_polymorphic_call_targets, for the target BA::type, the
"DECL_FINAL_P (target)" check is not accurate enough: there may be classes like
CBA, derived from BA, with instances for which we need to continue walking
recursively in possible_polymorphic_call_targets_1 to record_target_from_binfo.

  if (target)
    {
      /* In the case we get complete method, we don't need
         to walk derivations.  */
      if (DECL_FINAL_P (target))
        context.maybe_derived_type = false;
    }

So, fix this with the change below: only stop walking derivations when the
target is final and its class outer_type->type is also final?

diff --git a/gcc/ipa-devirt.cc b/gcc/ipa-devirt.cc
index 412ca14f66b..77f9b268e86 100644
--- a/gcc/ipa-devirt.cc
+++ b/gcc/ipa-devirt.cc
@@ -3188,7 +3188,9 @@ possible_polymorphic_call_targets (tree otr_type,

   /* In the case we get complete method, we don't need
 to walk derivations.  */
-  if (target && DECL_FINAL_P (target))
+  if (target && TREE_CODE (target) == FUNCTION_DECL && DECL_FINAL_P (target)
+      && RECORD_OR_UNION_TYPE_P (outer_type->type)
+      && TYPE_FINAL_P (outer_type->type))
context.speculative_maybe_derived_type = false;
   if (type_possibly_instantiated_p (speculative_outer_type->type))
maybe_record_node (nodes, target, , can_refer,
_complete);
@@ -3233,7 +3235,9 @@ possible_polymorphic_call_targets (tree otr_type,
{
  /* In the case we get complete method, we don't need
 to walk derivations.  */
- if (DECL_FINAL_P (target))
+ if (TREE_CODE (target) == FUNCTION_DECL && DECL_FINAL_P (target)
+ && RECORD_OR_UNION_TYPE_P (outer_type->type)
+ && TYPE_FINAL_P (outer_type->type))
context.maybe_derived_type = false;
}
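A simplified C++ sketch of the premise (hypothetical code, not the PR's testcase): a final method only says that no further override exists; it does not mean that no class derives from BA. Here CBA declares no type() of its own, since a final BA::type cannot be overridden, yet a CBA object still reaches BA::type through an A*. Whether the devirtualization walk really needs to visit such types is what the proposed patch is asking.

```cpp
// Hypothetical hierarchy mirroring the diagram above.
struct A {
  virtual ~A() = default;
  virtual int type() const { return 0; }
};

// BA::type is final, but BA itself is not a final class.
struct BA : A {
  int type() const final { return 1; }
};

// A final class deriving from BA; it cannot override BA::type.
struct CBA final : BA {};

// Another final branch with its own final method.
struct CA final : A {
  int type() const final { return 2; }
};

// Polymorphic call: the dynamic type behind p may be BA, CBA or CA.
static int call_type(const A* p) { return p->type(); }
```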

[Bug other/106575] New: new test case gcc.dg/fold-eqandshift-4.c fails

2022-08-09 Thread seurer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106575

Bug ID: 106575
   Summary: new test case gcc.dg/fold-eqandshift-4.c fails
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: seurer at gcc dot gnu.org
  Target Milestone: ---

g:6fc14f1963dfefead588a4cd8902d641ed69255c, r13-2005-g6fc14f1963dfef

make  -k check-gcc RUNTESTFLAGS="dg.exp=gcc.dg/fold-eqandshift-4.c"
FAIL: gcc.dg/fold-eqandshift-4.c scan-tree-dump-times optimized "return [01]" 14
FAIL: gcc.dg/fold-eqandshift-4.c scan-tree-dump-times optimized "x_[0-9]\\(D\\)" 18
# of expected passes            6
# of unexpected failures        2

commit 6fc14f1963dfefead588a4cd8902d641ed69255c (HEAD, refs/bisect/bad)
Author: Roger Sayle 
Date:   Tue Aug 9 18:54:43 2022 +0100

middle-end: Optimize ((X >> C1) & C2) != C3 for more cases.

[Bug target/106338] RISC-V static-chain register may be clobbered by PLT stubs

2022-08-09 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106338

--- Comment #6 from Kito Cheng  ---
My understanding is that the static chain is a compiler-internal implementation
detail: any register not used for passing arguments could be picked, so I
would also prefer to keep it out of the psABI spec for now.


And, just to record this for myself:

The x86-64 psABI does document the function's static chain pointer:

https://gitlab.com/x86-psABIs/x86-64-ABI/-/blob/master/x86-64-ABI/low-level-sys-info.tex#L701

[Bug analyzer/106573] Missing -Wanalyzer-use-of-uninitialized-value on calls handled by state machines

2022-08-09 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106573

David Malcolm  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from David Malcolm  ---
Should be fixed by the above patch.

[Bug analyzer/106573] Missing -Wanalyzer-use-of-uninitialized-value on calls handled by state machines

2022-08-09 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106573

--- Comment #2 from CVS Commits  ---
The master branch has been updated by David Malcolm :

https://gcc.gnu.org/g:bddd8d86e3036e480158ba9219ee3f290ba652ce

commit r13-2007-gbddd8d86e3036e480158ba9219ee3f290ba652ce
Author: David Malcolm 
Date:   Tue Aug 9 19:58:54 2022 -0400

analyzer: fix missing -Wanalyzer-use-of-uninitialized-value on
special-cased functions [PR106573]

We were missing checks for uninitialized params on calls to functions
that the analyzer has hardcoded knowledge of - both for those that are
handled just by state machines, and for those that are handled in
region-model-impl-calls.cc (for those arguments for which the svalue
wasn't accessed in handling the call).

Fixed thusly.

gcc/analyzer/ChangeLog:
PR analyzer/106573
* region-model.cc (region_model::on_call_pre): Ensure that we call
get_arg_svalue on all arguments.

gcc/testsuite/ChangeLog:
PR analyzer/106573
* gcc.dg/analyzer/error-uninit.c: New test.
* gcc.dg/analyzer/fd-uninit-1.c: New test.
* gcc.dg/analyzer/file-uninit-1.c: New test.

Signed-off-by: David Malcolm 

[Bug target/106574] gcc 12 with O3 leads to failures in glibc's y1f128 tests

2022-08-09 Thread michael.hudson at canonical dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106574

--- Comment #3 from Michael Hudson-Doyle  
---
Certainly this could be "handled" by bumping the tolerance, I guess. Not sure
how to tell whether that is appropriate, though...

[Bug target/106574] gcc 12 with O3 leads to failures in glibc's y1f128 tests

2022-08-09 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106574

--- Comment #2 from Andrew Pinski  ---
This is just a 2 ulp difference...

This could be a constant-folding difference between GCC and the software
implementation of _Float128, which could mean this is not a bug.

[Bug target/106574] gcc 12 with O3 leads to failures in glibc's y1f128 tests

2022-08-09 Thread michael.hudson at canonical dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106574

--- Comment #1 from Michael Hudson-Doyle  
---
Oops, forgot the link to my glibc bug:
https://sourceware.org/bugzilla/show_bug.cgi?id=29463

[Bug c/106574] New: gcc 12 with O3 leads to failures in glibc's y1f128 tests

2022-08-09 Thread michael.hudson at canonical dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106574

Bug ID: 106574
   Summary: gcc 12 with O3 leads to failures in glibc's y1f128
tests
   Product: gcc
   Version: 12.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: michael.hudson at canonical dot com
  Target Milestone: ---

Initially reported here, but more likely to be a gcc issue: if I build glibc
with gcc 12 and -O3 (as is the default in Debian/Ubuntu) I get this failure:

(kinetic-amd64)root@anduril:/build/glibc-EA2Jch/glibc-2.36/build-tree/amd64-libc# ./elf/ld-linux-x86-64.so.2 --library-path .:./elf:./math ./math/test-float128-y1
testing _Float128 (without inline functions)
Failure: Test: y1_downward (0x1.c1badep+0)
Result:
 is:          -2.49850711930108135145795303826944004e-01  -0x1.ffb1bae4fa20118544b142160f5fp-3
 should be:   -2.49850711930108135145795303826943836e-01  -0x1.ffb1bae4fa20118544b142160f58p-3
 difference:   1.68518870133883137142398069976181140e-34   0x1.c000p-113
 ulp       :  7.
 max.ulp   :  5.
Maximal error of `y1_downward'
 is      : 7 ulp
 accepted: 5 ulp

Test suite completed:
  216 test cases plus 212 tests for exception flags and
    212 tests for errno executed.
  2 errors occurred.
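A worked check of the reported 7 ulp, assuming the IEEE binary128 format (112 stored fraction bits). Both results print as 0x1.…p-3 with 28 hex digits after the point, i.e. 112 bits, so one unit in the last hex digit is exactly one ulp:

```latex
\mathrm{ulp} = 2^{-3-112} = 2^{-115},
\qquad
\texttt{0x\ldots 0f5f} - \texttt{0x\ldots 0f58} = \texttt{0x7} = 7\ \mathrm{ulp}

\texttt{0x1.c000p-113} = 1.75 \times 2^{-113} = 7 \times 2^{-115}
\approx 1.685 \times 10^{-34}
```

The measured error (7 ulp) therefore exceeds the accepted bound (5 ulp) by 2 ulp.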

Building the e_j1f128.os object with -O2 or with gcc-11 fixes the failure.

Not sure how to reduce this to a smaller test case, but I'm happy to try
things.

[Bug target/106338] RISC-V static-chain register may be clobbered by PLT stubs

2022-08-09 Thread andrew at sifive dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106338

Andrew Waterman  changed:

   What|Removed |Added

 CC||andrew at sifive dot com

--- Comment #5 from Andrew Waterman  ---
(I don't want to make the static chain register part of the RISC-V ABI; the
status quo seems fine.)

[Bug analyzer/106573] Missing -Wanalyzer-use-of-uninitialized-value on calls handled by state machines

2022-08-09 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106573

David Malcolm  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Last reconfirmed||2022-08-09

--- Comment #1 from David Malcolm  ---
I'm working on a fix for this.

[Bug c++/106207] [11/12/13 Regression] ICE in apply_fixit, at edit-context.cc:769

2022-08-09 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106207

--- Comment #2 from Marek Polacek  ---
Reduced:

#define FOO(no)  \
void f_##no() \
{ \
  int gen_##no(); \
}

#define GEN_FOO \
FOO(f##1) \
FOO(f##2)

GEN_FOO

[Bug analyzer/106573] New: Missing -Wanalyzer-use-of-uninitialized-value on calls handled by state machines

2022-08-09 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106573

Bug ID: 106573
   Summary: Missing -Wanalyzer-use-of-uninitialized-value on calls
handled by state machines
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: analyzer
  Assignee: dmalcolm at gcc dot gnu.org
  Reporter: dmalcolm at gcc dot gnu.org
CC: mir at gcc dot gnu.org
  Target Milestone: ---

Consider:

int dup (int old_fd);
int not_dup (int old_fd);

int
test_1 ()
{
  int m;
  return dup (m);
}

int
test_2 ()
{
  int m;
  return not_dup (m);
}

where in each function uninitialized local "m" is passed to an
externally-defined function.

-fanalyzer currently emits:

t.c: In function ‘test_1’:
t.c:8:10: warning: ‘dup’ on possibly invalid file descriptor ‘m’ [-Wanalyzer-fd-use-without-check]
    8 |   return dup (m);
      |          ^~~
  ‘test_1’: event 1
    |
    |    8 |   return dup (m);
    |      |          ^~~
    |      |          |
    |      |          (1) ‘m’ could be invalid
    |
t.c: In function ‘test_2’:
t.c:15:10: warning: use of uninitialized value ‘m’ [CWE-457] [-Wanalyzer-use-of-uninitialized-value]
   15 |   return not_dup (m);
      |          ^~~
  ‘test_2’: events 1-2
    |
    |   14 |   int m;
    |      |       ^
    |      |       |
    |      |       (1) region created on stack here
    |   15 |   return not_dup (m);
    |      |          ~~~
    |      |          |
    |      |          (2) use of uninitialized value ‘m’ here
    |

where it only complains about uninit m being passed to not_dup.

Looks like we're missing a check for poisoned svalues as params for the case
where one of the state machines recognizes the function in question.

[Bug target/106338] RISC-V static-chain register may be clobbered by PLT stubs

2022-08-09 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106338

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |WONTFIX

--- Comment #4 from Andrew Pinski  ---
The only reason why aarch64 changed their static chain register was because it
was used for TLS on darwin (and IIRC on VXWorks). 

The static chain is not part of the ABI (unless the RISC-V folks want to make
it so), and there are no PLTs between the function calls.

[Bug d/102765] [11 Regression] GDC11 stopped inlining library functions and lambdas used by a binary search one-liner code

2022-08-09 Thread ibuclaw at gdcproject dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102765

--- Comment #6 from Iain Buclaw  ---
r13-2002 (and r12-8673) is a start that sows the seeds to make the codegen
option -fno-weak-templates the default.  Should just be a case of extending the
forced emission to all instantiations too.

[Bug d/104317] D language: rt.config module doesn't work as expected in GDC 9/10 (multiple definition linker error)

2022-08-09 Thread ibuclaw at gdcproject dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104317

--- Comment #3 from Iain Buclaw  ---
(In reply to Siarhei Siamashka from comment #2)
> I first tried to toggle "flag_weak_templates" in "gcc/d/lang.opt" from 1 to
> 0 in GDC11 instead of reverting PR99914, but the resulting toolchain was
> unable to compile and link even the most simple applications due to missing
> symbols from Phobos.
> 

r13-2002 (and r12-8673) is a start that sows the seeds to make the codegen
option -fno-weak-templates the default.  Should just be a case of extending the
forced emission to all instantiations too.

[Bug c++/101421] ICE: in lookup_template_class_1, at cp/pt.c:10005

2022-08-09 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101421

Marek Polacek  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #3 from Marek Polacek  ---
Fixed by r13-1390-g07ac550393d00f.

[Bug c/77876] -Wbool-operation rejects useful code involving '~'

2022-08-09 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77876

Marek Polacek  changed:

   What|Removed |Added

 Status|ASSIGNED|NEW
   Assignee|mpolacek at gcc dot gnu.org|unassigned at gcc dot 
gnu.org

--- Comment #2 from Marek Polacek  ---
Clearly I never got to this PR.  clang also issues the warning, and I think
it'd be better to simply use '!' rather than '~'.  I have no plans to change
the warning, sorry.
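A short sketch of why '!' is the safer spelling: applied to a bool, '~' operates on the promoted int (0 or 1), so the result is -1 or -2 and converting it back to bool always yields true. The example is C++ (C's _Bool is promoted the same way), and the function names are hypothetical:

```cpp
// What '~b' evaluates to: b is promoted to int first, so the result is
// ~0 == -1 or ~1 == -2 (two's complement), never 0.
static int bitwise_not_value(bool b) {
  return ~static_cast<int>(b);
}

// What '!b' evaluates to: the logical negation.
static bool logical_not(bool b) {
  return !b;
}
```

Since both possible values of '~b' are nonzero, code such as `if (~flag)` takes the branch regardless of flag, whereas `if (!flag)` behaves as intended.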

[Bug c/106569] enhancement: use STL algorithm instead of a raw loop

2022-08-09 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106569

--- Comment #4 from David Binderman  ---
(In reply to Martin Liška from comment #3)
> > My best guess is that if gcc trunk is written in some recent version of C++,
> > then all that recent version can be used.
> 
> We are written in C++11, is std::find_if available in the given standard?

Yes, there is *a* version of std::find_if available. Some additional
work might be needed to verify exact match.
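The loop the PR refers to is not quoted here, so this is a hypothetical example of the kind of rewrite being discussed. std::find_if itself has been in the standard library since C++98; only the lambda used as the predicate needs C++11, which fits the C++11 baseline mentioned above.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical raw loop: index of the first negative element, or -1.
static int first_negative_loop(const std::vector<int>& v) {
  for (std::size_t i = 0; i < v.size(); ++i)
    if (v[i] < 0)
      return static_cast<int>(i);
  return -1;
}

// The same search expressed with std::find_if and a C++11 lambda.
static int first_negative_find_if(const std::vector<int>& v) {
  auto it = std::find_if(v.begin(), v.end(), [](int x) { return x < 0; });
  return it == v.end() ? -1 : static_cast<int>(it - v.begin());
}
```

Checking that the two versions agree on the same inputs is the kind of "exact match" verification the comment above alludes to.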

[Bug target/65372] -mprofile-kernel undocumented

2022-08-09 Thread ndesaulniers at google dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65372

Nick Desaulniers  changed:

   What|Removed |Added

 CC||ndesaulniers at google dot com,
   ||nemanja.i.ibm at gmail dot com

--- Comment #1 from Nick Desaulniers  ---
I filed a feature request to get this implemented in Clang, since the Linux
kernel uses it for the ppc port.  The immediate request was for documentation
about the change.
https://github.com/llvm/llvm-project/issues/57031

[Bug fortran/106566] [OpenMP] declare simd fails with with bogus "already been host associated" for module procedures

2022-08-09 Thread burnus at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106566

Tobias Burnus  changed:

   What|Removed |Added

   Keywords||accepts-invalid
Summary|[OpenMP]|[OpenMP] declare simd fails
   ||with with bogus "already
   ||been host associated" for
   ||module procedures

--- Comment #1 from Tobias Burnus  ---
Additionally, the following is not diagnosed – at least not for this example.

"For Fortran, a declarative directive must appear after any USE, IMPORT, and
IMPLICIT statements in a declarative context."

(The original example shows this issue. This is reported by other compilers and
is being fixed on the OpenMP examples side.)


Example that *FAILS* ("has already been host associated") but is *VALID*:

module m
   integer, parameter :: NN = 1023
   integer :: a(NN)

 contains
   subroutine add_one2(p)
   implicit none   ! valid: must come before the declare directive
   !$omp declare simd(add_one2) linear(p: ref) simdlen(8)
   integer :: p

   p = p + 1
   end subroutine
end module


The following *COMPILES*, as there is no MODULE:


   subroutine add_one2(p)
   !$omp declare simd(add_one2) linear(p: ref) simdlen(8)
   implicit none  ! invalid: comes after the declare directive
   integer :: p

   p = p + 1
   end subroutine

Note: This example is on purpose invalid as 'implicit none' has been moved
after 'omp declare'. Otherwise, it would be valid.

[Bug c/106569] enhancement: use STL algorithm instead of a raw loop

2022-08-09 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106569

--- Comment #3 from Martin Liška  ---
> My best guess is that if gcc trunk is written in some recent version of C++,
> then all that recent version can be used.

We are written in C++11, is std::find_if available in the given standard?

[Bug fortran/106565] Using a transposed matrix in matmul (GCC-10.3.0) is very slow

2022-08-09 Thread quanhua.liu at noaa dot gov via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565

--- Comment #9 from Quanhua Liu  ---
Hi Richard,

It seems that I cannot add a comment to the ticket online.
I tried
    gfortran -o z -O3 -march=native test_matrixCal.f90 -fexternal-blas -lblas -fdump-tree-optimized
    time a.out 1
and
    time a.out 2
Both are very slow (6 s, compared to 0.8 s previously using method 2).
I don't know which BLAS is on my machine.

On your machine, can you help test
    BB = transpose(B)
    C = matmul(A,BB)
using gfortran -O3 test_matrixCal.f90
    time a.out 2
against
    C = matmul(A, transpose(B))
using any option or BLAS timing?

The timing depends on the machine. It would be very helpful if you could
provide the timing for the two methods on your side.

Thank you!

Quanhua Liu

[Bug c++/81159] New warning idea: -Wself-move

2022-08-09 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81159

Marek Polacek  changed:

   What|Removed |Added

   Keywords||patch

--- Comment #8 from Marek Polacek  ---
Patch posted: https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599503.html

[Bug tree-optimization/98954] ((X << CST0) & CST1) == 0 is not optimized to 0 == (X & (CST1 >> CST0))

2022-08-09 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98954

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Roger Sayle :

https://gcc.gnu.org/g:6fc14f1963dfefead588a4cd8902d641ed69255c

commit r13-2005-g6fc14f1963dfefead588a4cd8902d641ed69255c
Author: Roger Sayle 
Date:   Tue Aug 9 18:54:43 2022 +0100

middle-end: Optimize ((X >> C1) & C2) != C3 for more cases.

Following my middle-end patch for PR tree-optimization/94026, I'd promised
Jeff Law that I'd clean up the dead-code in fold-const.cc now that these
optimizations are handled in match.pd.  Alas, I discovered things aren't
quite that simple, as the transformations I'd added avoided cases where
C2 overlapped with the new bits introduced by the shift, but the original
code handled any value of C2 provided that it had a single-bit set (under
the condition that C3 was always zero).

This patch upgrades the transformations supported by match.pd to cover
any values of C2 and C3, provided that C1 is a valid bit shift constant,
for all three shift types (logical right, arithmetic right and left).
This then makes the code in fold-const.cc fully redundant, and adds
support for some new (corner) cases not previously handled.  If the
constant C1 is valid for the type's precision, the shift is now always
eliminated (with C2 and C3 possibly updated to test the sign bit).

Interestingly, the fold-const.cc code that I'm now deleting was originally
added by me back in 2006 to resolve PR middle-end/21137.  I've confirmed
that those testcase(s) remain resolved with this patch (and I'll close
21137 in Bugzilla).  This patch also implements most (but not all) of the
examples mentioned in PR tree-optimization/98954, for which I have some
follow-up patches.

2022-08-09  Roger Sayle  
Richard Biener  

gcc/ChangeLog
PR middle-end/21137
PR tree-optimization/98954
* fold-const.cc (fold_binary_loc): Remove optimizations to
optimize ((X >> C1) & C2) ==/!= 0.
* match.pd (cmp (bit_and (lshift @0 @1) @2) @3): Remove wi::ctz
check, and handle all values of INTEGER_CSTs @2 and @3.
(cmp (bit_and (rshift @0 @1) @2) @3): Likewise, remove wi::clz
checks, and handle all values of INTEGER_CSTs @2 and @3.

gcc/testsuite/ChangeLog
PR middle-end/21137
PR tree-optimization/98954
* gcc.dg/fold-eqandshift-4.c: New test case.
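A small brute-force sketch of the identity behind the logical-right-shift case, assuming a hypothetical 5-bit unsigned width so every combination of X, C1, C2 and C3 can be enumerated. It is not match.pd's implementation, and it does not cover the arithmetic-right-shift or left-shift cases the commit also handles; the != form is simply the negation of the == form checked here.

```cpp
// Hypothetical W-bit unsigned width, small enough to enumerate exhaustively
// (the real fold works on full-width integer types).
constexpr unsigned W = 5;
constexpr unsigned MASK = (1u << W) - 1;

// Original form with a logical right shift: ((X >> C1) & C2) == C3.
static bool shifted_form(unsigned x, unsigned c1, unsigned c2, unsigned c3) {
  return ((x >> c1) & c2) == c3;
}

// Shift-free form: shift the mask and the constant left instead.
static bool unshifted_form(unsigned x, unsigned c1, unsigned c2, unsigned c3) {
  // Bits of C2 at position W - C1 or higher never survive the shift.
  unsigned m = c2 & (MASK >> c1);
  // If C3 needs a bit the effective mask clears, the test is always false.
  if ((c3 & ~m) != 0)
    return false;
  return (x & (m << c1)) == (c3 << c1);
}

// True if both forms agree for every W-bit X, every C1 < W, and all C2, C3.
static bool forms_agree() {
  for (unsigned c1 = 0; c1 < W; ++c1)
    for (unsigned c2 = 0; c2 <= MASK; ++c2)
      for (unsigned c3 = 0; c3 <= MASK; ++c3)
        for (unsigned x = 0; x <= MASK; ++x)
          if (shifted_form(x, c1, c2, c3) != unshifted_form(x, c1, c2, c3))
            return false;
  return true;
}
```

Folding the "C3 has a bit outside the mask" case to a constant is the kind of corner case that lets the comparison be decided without the shift at all.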

[Bug tree-optimization/21137] Convert (a >> 2) & 1 != 0 into a & 4 != 0

2022-08-09 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=21137

--- Comment #13 from CVS Commits  ---
The master branch has been updated by Roger Sayle :

https://gcc.gnu.org/g:6fc14f1963dfefead588a4cd8902d641ed69255c

commit r13-2005-g6fc14f1963dfefead588a4cd8902d641ed69255c
Author: Roger Sayle 
Date:   Tue Aug 9 18:54:43 2022 +0100

middle-end: Optimize ((X >> C1) & C2) != C3 for more cases.


[Bug tree-optimization/94026] combine missed opportunity to simplify comparisons with zero

2022-08-09 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026

--- Comment #14 from CVS Commits  ---
The master branch has been updated by Roger Sayle :

https://gcc.gnu.org/g:6fc14f1963dfefead588a4cd8902d641ed69255c

commit r13-2005-g6fc14f1963dfefead588a4cd8902d641ed69255c
Author: Roger Sayle 
Date:   Tue Aug 9 18:54:43 2022 +0100

middle-end: Optimize ((X >> C1) & C2) != C3 for more cases.


[Bug fortran/106565] Using a transposed matrix in matmul (GCC-10.3.0) is very slow

2022-08-09 Thread sgk at troutmask dot apl.washington.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565

--- Comment #8 from Steve Kargl  ---
On Tue, Aug 09, 2022 at 05:51:51PM +, sgk at troutmask dot
apl.washington.edu wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565
> 
> --- Comment #6 from Steve Kargl  ---
> On Tue, Aug 09, 2022 at 05:14:16PM +, quanhua.liu at noaa dot gov wrote:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565
> > 
> > --- Comment #4 from Quanhua Liu  ---
> > Using 
> > gfortran -O3 -fexternal-blas -L/. -lblas testmatrixCal.f90
> 
> Which BLAS are you using?  If you are using BLAS from
> Netlib, then of course you'll likely get poor results
> as the Netlib BLAS is not tuned. 
> 

Even netlib blas is ok.

 gfcx -o z -O3 -march=native a.f90 -fexternal-blas -lblas -fdump-tree-optimized && ./z
   1.41149306   1615.08020
   1.50036991   1615.08020

[Bug c/106560] ICE after conflicting types of redeclaration

2022-08-09 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106560

--- Comment #7 from Andrew Pinski  ---
(In reply to Richard Biener from comment #6)
> (In reply to Andrew Pinski from comment #3)
> > Here is the simple fix, I will submit it this weekend.
> > [apinski@xeond2 gcc]$ git diff
> > diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
> > index f0fbdb48012..d9ada8e0f9e 100644
> > --- a/gcc/gimplify.cc
> > +++ b/gcc/gimplify.cc
> > @@ -6012,6 +6012,11 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
> >gcc_assert (TREE_CODE (*expr_p) == MODIFY_EXPR
> >   || TREE_CODE (*expr_p) == INIT_EXPR);
> > 
> > +  if (TREE_TYPE (*from_p) == error_mark_node)
> 
>   if (error_operand_p (*from_p))

Oh, OK. There are a few places that check directly against error_mark_node,
gimplify_decl_expr and gimplify_save_expr for example. I will submit a patch
to fix those too.

[Bug fortran/106565] Using a transposed matrix in matmul (GCC-10.3.0) is very slow

2022-08-09 Thread sgk at troutmask dot apl.washington.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565

--- Comment #7 from Steve Kargl  ---
On Tue, Aug 09, 2022 at 05:17:57PM +, quanhua.liu at noaa dot gov wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565
> 
> --- Comment #5 from Quanhua Liu  ---
> Hi Richard,
> 
> Using -fexternal-blas for gfortran v10.3.0 is much slower than
> the method 2:
>    BB = transpose(B)
>    C = matmul(A, BB)
> 
> How about on your machine?
> 
> >
> > If you are doing a problem of this size or larger, you want to use the
> > -fexternal-blas option and link in OpenBLAS.


I wrote "and link in OpenBLAS".

> > I added timing code and replicated the loop to both in one go.
> >
> > % gfcx -o z -O3 -march=native a.f90 && ./z
> > 1.16500998   1615.08594
> > 5.32258606   1615.08020


> > % gfcx -o z -O3 -march=native a.f90 -fexternal-blas -lopenblas && ./z
> > 2.44668889   1615.08301
> > 1.99379802   1615.08301

Method 1 is faster with OpenBLAS.

[Bug fortran/106565] Using a transposed matrix in matmul (GCC-10.3.0) is very slow

2022-08-09 Thread sgk at troutmask dot apl.washington.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565

--- Comment #6 from Steve Kargl  ---
On Tue, Aug 09, 2022 at 05:14:16PM +, quanhua.liu at noaa dot gov wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565
> 
> --- Comment #4 from Quanhua Liu  ---
> Using 
> gfortran -O3 -fexternal-blas -L/. -lblas testmatrixCal.f90

Which BLAS are you using?  If you are using BLAS from
Netlib, then of course you'll likely get poor results
as the Netlib BLAS is not tuned. 

I specifically wrote "use OpenBLAS".

OpenBLAS is likely tuned for whatever hardware you have.

% gfcx -o z -O3 -march=native a.f90 -fexternal-blas -lopenblas \
   -fdump-tree-optimized && ./z
   2.44969702   1615.08301
   2.00995278   1615.08301

The use of matmul(..., transpose()) is the fastest on an AMD FX(tm)-8350:

% grep gemm z-a.f90.252t.optimized 
  sgemm (&"N"[1]{lb: 1 sz: 1}, &"N"[1]{lb: 1 sz: 1}, , , ,
, , , , , , , , 1, 1);
  sgemm (&"N"[1]{lb: 1 sz: 1}, &"T"[1]{lb: 1 sz: 1}, , , ,
, , , , , , , , 1, 1);
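To read the two sgemm calls above: in the reference BLAS interface, SGEMM(TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC) computes C := alpha*op(A)*op(B) + beta*C on column-major arrays, where 'N' selects the matrix as stored and 'T' its transpose. So the call with "N", "T" appears to hand transpose(B) to the library as a flag rather than as a copied array. The naive sketch below models only TRANSB; the function name is hypothetical and the loops say nothing about how OpenBLAS actually computes the product.

```c
#include <assert.h>
#include <stddef.h>

/* Naive sketch of C := alpha * A * op(B) + beta * C, column-major.
   A is m x k (leading dimension lda), C is m x n (leading dimension ldc).
   transb == 'N': op(B) = B, stored k x n with leading dimension ldb.
   transb == 'T': op(B) = B^T, so B is stored n x k with leading dimension ldb.
   As in BLAS, C is not read when beta is zero. op(A) is always A here. */
static void naive_gemm_opb (char transb, int m, int n, int k, float alpha,
                            const float *a, int lda, const float *b, int ldb,
                            float beta, float *c, int ldc)
{
  assert (transb == 'N' || transb == 'T');
  for (int j = 0; j < n; j++)
    for (int i = 0; i < m; i++)
      {
        float sum = 0.0f;
        for (int p = 0; p < k; p++)
          {
            /* Element (p, j) of op(B). */
            float bpj = (transb == 'N') ? b[p + (size_t) j * ldb]
                                        : b[j + (size_t) p * ldb];
            sum += a[i + (size_t) p * lda] * bpj;
          }
        float *cij = &c[i + (size_t) j * ldc];
        *cij = (beta == 0.0f) ? alpha * sum : alpha * sum + beta * *cij;
      }
}
```

With 'T' the transposed operand is read in place; with 'N' the caller has to build the transposed copy first, which corresponds to BB = transpose(B) in the Fortran code.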

[Bug c/106571] Implement -Wsection diag

2022-08-09 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106571

--- Comment #3 from Andrew Pinski  ---
(In reply to Boris from comment #2)
> How can you check a mismatch if only the definition has the section
> attribute?

You don't need to.

> 
> Here's the kernel commit which fixes this for clang:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=db886979683a8360ced9b24ab1125ad0c4d2cf76
> 
> there's the same example in the commit message.

Oh, I see: the section here has more semantics than the normal section
attribute does. There should instead be an enhancement request for a new
attribute which does more than the current section attribute.

[Bug fortran/106565] Using a transposed matrix in matmul (GCC-10.3.0) is very slow

2022-08-09 Thread quanhua.liu at noaa dot gov via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565

--- Comment #5 from Quanhua Liu  ---
Hi Richard,

Using -fexternal-blas with gfortran v10.3.0 is much slower than
method 2:
   BB = transpose(B)
   C = matmul(A, BB)

How about on your machine?

Thanks,

Quanhua Liu

[Bug fortran/106565] Using a transposed matrix in matmul (GCC-10.3.0) is very slow

2022-08-09 Thread quanhua.liu at noaa dot gov via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565

--- Comment #4 from Quanhua Liu  ---
Using 
gfortran -O3 -fexternal-blas -L/. -lblas testmatrixCal.f90
time a.out  1
real:  6.14 (s)
time a.out  2
real: 5.41

It is 6 times slower than
  BB = transpose(B)
  C = matmul(A, BB)

ifort doesn't have the problem.

[Bug c/106571] Implement -Wsection diag

2022-08-09 Thread bp at alien8 dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106571

--- Comment #2 from Boris  ---
How can you check a mismatch if only the definition has the section attribute?

Here's the kernel commit which fixes this for clang:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=db886979683a8360ced9b24ab1125ad0c4d2cf76

there's the same example in the commit message.

[Bug c/106571] Implement -Wsection diag

2022-08-09 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106571

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2022-08-09
 Ever confirmed|0   |1
   Severity|normal  |enhancement
 Status|UNCONFIRMED |WAITING
  Component|other   |c

--- Comment #1 from Andrew Pinski  ---
This example does not seem to be correct, as the section attribute is not
needed on the declaration, only on the definition.

If this is the example, I think the warning is incorrect and should not be
implemented in GCC.
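A minimal sketch of the case described above, assuming GCC on an ELF target: the declaration (for instance in a header) carries no section attribute, and only the definition places the function in a named section. The function and section names are hypothetical, not taken from the kernel commit.

```c
#include <assert.h>

/* Declaration as it might appear in a header: no section attribute. */
int hypothetical_setup (int x);

/* Only the definition names a section; GCC emits the function's code there.
   The section name is hypothetical (ELF-style naming assumed). */
__attribute__ ((section (".text.hypothetical_setup")))
int hypothetical_setup (int x)
{
  assert (x >= 0);
  return x + 1;
}
```

Running `objdump -h` on the resulting object file should list the named section.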

[Bug ipa/105360] Inlined lazy parameters / delegate literals, still emitted

2022-08-09 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105360

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #5 from Andrew Pinski  ---
Dup of bug 89139.

*** This bug has been marked as a duplicate of bug 89139 ***

[Bug ipa/89139] GCC emits code for static functions that aren't used by the optimized code

2022-08-09 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89139

Andrew Pinski  changed:

   What|Removed |Added

 CC||witold.baryluk+gcc at gmail dot com

--- Comment #8 from Andrew Pinski  ---
*** Bug 105360 has been marked as a duplicate of this bug. ***

[Bug ipa/105360] Inlined lazy parameters / delegate literals, still emitted

2022-08-09 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105360

Andrew Pinski  changed:

   What|Removed |Added

   Assignee|ibuclaw at gdcproject dot org  |unassigned at gcc dot gnu.org
 CC||marxin at gcc dot gnu.org
   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94818
   Severity|normal  |enhancement
  Component|d   |ipa

--- Comment #4 from Andrew Pinski  ---
Or PR 94818.

[Bug d/105360] Inlined lazy parameters / delegate literals, still emitted

2022-08-09 Thread ibuclaw at gdcproject dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105360

Iain Buclaw  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80680,
   ||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99373

--- Comment #3 from Iain Buclaw  ---
Possibly a duplicate of pr80680 or pr99373.

[Bug d/105360] Inlined lazy parameters / delegate literals, still emitted

2022-08-09 Thread ibuclaw at gdcproject dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105360

--- Comment #2 from Iain Buclaw  ---
Looks like it's a middle-end missed-optimization, not a D front-end one.

https://godbolt.org/z/5WWYEG4jW

Perhaps we need an extra DCE pass?

[Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5

2022-08-09 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #36 from Andrew Pinski  ---
You might need to use -O2 -fPIE -pie to reproduce the issue, as Debian's GCC is
configured with --enable-default-pie.

[Bug c++/106572] A programmatic list of all possible compiler warnings

2022-08-09 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106572

--- Comment #6 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #5)
> >which blows up the command line for the compilation. 
> 
> You can use a response file and that won't blow up the command line at all.
> 
> That is:
> g++ -Q --help=warnings | tail -n +2 | awk '{print $1}' | tr '\n' ' ' > cxxflags.opt
> 
> g++ @cxxflags.opt 

Oh, and you don't need the tr either: any whitespace in a response file will be
treated as a separator.
So just:
g++ -Q --help=warnings | tail -n +2 | awk '{print $1}' > cxxflags.opt
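A simplified sketch of the point about separators, written in C: splitting a response-file body on any run of whitespace gives the same argument list whether the options are one per line (the awk output) or joined onto one line (the tr output). The function name is hypothetical, and the quoting and backslash handling that GCC's real response-file reader also performs is deliberately left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Split BUF in place into whitespace-separated arguments, storing at most
   MAX_ARGS pointers into ARGV; returns how many were stored. Any run of
   spaces, tabs or newlines counts as one separator. Quotes and backslash
   escapes are NOT handled in this sketch. */
static size_t split_response_file (char *buf, char **argv, size_t max_args)
{
  assert (buf != NULL && argv != NULL);
  size_t argc = 0;
  char *p = buf;
  while (*p != '\0' && argc < max_args)
    {
      while (*p != '\0' && isspace ((unsigned char) *p))
        p++;                        /* skip the separator run */
      if (*p == '\0')
        break;
      argv[argc++] = p;             /* start of an argument */
      while (*p != '\0' && !isspace ((unsigned char) *p))
        p++;                        /* scan to its end */
      if (*p != '\0')
        *p++ = '\0';                /* terminate it in place */
    }
  return argc;
}
```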

[Bug c++/106572] A programmatic list of all possible compiler warnings

2022-08-09 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106572

--- Comment #5 from Andrew Pinski  ---
>which blows up the command line for the compilation. 

You can use a response file and that won't blow up the command line at all.

That is:
g++ -Q --help=warnings | tail -n +2 | awk '{print $1}' | tr '\n' ' ' > cxxflags.opt

g++ @cxxflags.opt 

[Bug fortran/106565] Using a transposed matrix in matmul (GCC-10.3.0) is very slow

2022-08-09 Thread kargl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565

kargl at gcc dot gnu.org changed:

   What|Removed |Added

   Priority|P3  |P4

[Bug fortran/106565] Using a transposed matrix in matmul (GCC-10.3.0) is very slow

2022-08-09 Thread kargl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565

kargl at gcc dot gnu.org changed:

   What|Removed |Added

 CC||kargl at gcc dot gnu.org

--- Comment #3 from kargl at gcc dot gnu.org ---

>   INTEGER, PARAMETER :: m = 200, n = 300, nn = 150
>   REAL :: A(m,n), B(nn,n), C(m,nn), BB(n,nn)
>   INTEGER :: i, j, k, L


If you are doing a problem of this size or larger, you want to use the
-fexternal-blas option and link in OpenBLAS.

I added timing code and replicated the loop to run both in one go.

% gfcx -o z -O3 -march=native a.f90 && ./z
   1.16500998   1615.08594
   5.32258606   1615.08020
% gfcx -o z -O3 -march=native a.f90 -fexternal-blas -lopenblas && ./z
   2.44668889   1615.08301
   1.99379802   1615.08301

[Bug c++/106572] A programmatic list of all possible compiler warnings

2022-08-09 Thread j.badwaik--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106572

--- Comment #4 from Jayesh Badwaik  ---
I don't think any of the previous bug reports address the requirements that
this bug report does. This is not about production runs, this is about
development workflow. Unless the position is that users should not use any
warnings apart from `-Wall -Wextra` ever, the user has to look at what warnings
the compiler offers. 

The current method is a very manual method where I have to browse through the
whole GCC page and get the list of warnings and then manually put them into my
command line to see if any of the code in my repository triggers those
warnings. It would save everyone's time and effort if there were a switch to do
that. It is therefore actually very useful.

[Bug c++/106572] A programmatic list of all possible compiler warnings

2022-08-09 Thread j.badwaik--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106572

--- Comment #3 from Jayesh Badwaik  ---
I don't think any of the previous bug reports address the requirements that
this bug report does. This is not about production runs, this is about
development workflow. Unless the position is that users should not use any
warnings apart from `-Wall -Wextra` ever, the user has to look at what warnings
the compiler offers. 

The current method is a very manual method where I have to browse through the
whole GCC page and get the list of warnings and then manually put them into my
command line to see if any of the code in my repository triggers those
warnings. It would save everyone's time and effort if there were a switch to do
that.

[Bug target/106554] -fstack-usage result too low for variadic function on Arm

2022-08-09 Thread ebotcazou at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106554

Eric Botcazou  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2022-08-09

--- Comment #2 from Eric Botcazou  ---
> IIRC there's target support code possibly missing for some targets.  The
> -fstack-usage documentation isn't clear how exact the result is supposed to
> be.

It must always be conservatively correct, so it's certainly a bug.
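A minimal variadic function of the kind the report is about, as a sketch. On many ABIs a variadic callee spills its incoming argument registers to the stack so that va_arg can walk them; one plausible but unconfirmed reason for a too-low -fstack-usage figure would be that this save area is not counted. The function is hypothetical, not the reporter's testcase.

```c
#include <assert.h>
#include <stdarg.h>

/* Sum COUNT int arguments passed after COUNT. */
int sum_ints (int count, ...)
{
  assert (count >= 0);
  va_list ap;
  va_start (ap, count);
  int total = 0;
  for (int i = 0; i < count; i++)
    total += va_arg (ap, int);
  va_end (ap);
  return total;
}
```

Compiling it with `gcc -O2 -c -fstack-usage` writes the reported figure to a `.su` file next to the object, which can then be compared with the frame set up in the generated assembly.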

[Bug c++/106572] A programmatic list of all possible compiler warnings

2022-08-09 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106572

Andrew Pinski  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31573

--- Comment #2 from Andrew Pinski  ---
-Weverything is useless, and it was decided years ago that GCC was not going to
add it. See PR 31573.

[Bug c++/106572] A programmatic list of all possible compiler warnings

2022-08-09 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106572

Marek Polacek  changed:

   What|Removed |Added

 CC||mpolacek at gcc dot gnu.org

--- Comment #1 from Marek Polacek  ---
Probably a dup of bug 31573.

[Bug c++/106572] New: A programmatic list of all possible compiler warnings

2022-08-09 Thread j.badwaik--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106572

Bug ID: 106572
   Summary: A programmatic list of all possible compiler warnings
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: j.badw...@fz-juelich.de
  Target Milestone: ---

It would be an excellent workflow to run your code with `-Weverything` once in
a while just to check which new warnings are triggered by your code. Then,
depending on whether they are useful or not, one can incorporate those warnings
in their normal workflow. 

The alternative is to go through the release notes of every GCC release
each time and then try to see if there is a warning which interests you. While
this is doable, it requires manual effort, with a possibility that you find no
warning which is useful. Also, depending on the code, it might not be possible
to play with all compiler warnings as soon as the compiler is released, since
you might want to wait for your code to be able to compile with the compiler
before you go there.
All of this makes for a very clumsy workflow with a lot of manual reminders
about what needs to be done.

`-Weverything` allows someone to schedule, say, a monthly CI job which
automatically runs the build with `-Weverything -Werror`. Any new compilers
added and any new warning which affects the current code will automatically be
detected. The user can then make a decision on whether the warning makes enough
sense for them to be used in their production runs.  

Currently, the way to get a list of all warnings is very cumbersome. One has to
do:
> g++ -Q --help=warnings | tail -n +2 | awk '{print $1}' | tr '\n' ' '

which blows up the command line for the compilation. 

The request would be to provide either a `-Weverything` flag like clang does or
a `g++ --list-every-warning` to list all warnings in a format which can then be
passed to the compiler.
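For reference, the filtering done by the `tail -n +2 | awk '{print $1}' | tr '\n' ' '` pipeline can be sketched in C. The function name is hypothetical, and the input format (one header line, then one option per line, indented and followed by its state) is assumed from typical `g++ -Q --help=warnings` output rather than taken from GCC documentation.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Skip the first line of TEXT, then copy the first whitespace-delimited
   field of every remaining line into OUT, separated by single spaces.
   Blank lines are skipped; fields that do not fit in OUT are dropped.
   OUT is always NUL-terminated. Returns the number of fields copied. */
static size_t first_fields_after_header (const char *text, char *out,
                                         size_t out_size)
{
  assert (text != NULL && out != NULL && out_size > 0);
  size_t used = 0, fields = 0;
  const char *p = text;

  while (*p != '\0' && *p != '\n')
    p++;                             /* skip the header line */

  while (*p != '\0')
    {
      if (*p == '\n')
        p++;                         /* start of the next line */
      while (*p == ' ' || *p == '\t')
        p++;                         /* leading indentation */
      const char *start = p;
      while (*p != '\0' && !isspace ((unsigned char) *p))
        p++;                         /* first field */
      size_t len = (size_t) (p - start);
      if (len > 0 && used + len + (fields ? 1 : 0) < out_size)
        {
          if (fields)
            out[used++] = ' ';
          for (size_t i = 0; i < len; i++)
            out[used++] = start[i];
          fields++;
        }
      while (*p != '\0' && *p != '\n')
        p++;                         /* rest of the line */
    }
  out[used] = '\0';
  return fields;
}
```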

[Bug modula2/106443] Many 32-bit tests FAIL to link on Solaris/sparcv9

2022-08-09 Thread ro at CeBiTec dot Uni-Bielefeld.DE via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106443

--- Comment #2 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
> --- Comment #1 from Gaius Mulley  ---
> I've pushed a fix to devel/modula2 to fix multilib install (seen on amd64).
> It now builds and installs multilib.  Prior to this fix the 32 bit libraries were
> installed over the 64 bit libraries when multilib was enabled.  Curious as to
> whether this fixes the linking bugs on Solaris.

It did indeed: I've tried both sparcv9-sun-solaris2.11 and
i386-pc-solaris2.11 builds and the results are fine (roughly 15 to 20
failures per multilib on both sparc and x86).

However, I still needed the gcc.cc patch from

https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598822.html

to allow the 32-bit-default build to succeed.

Thanks.
Rainer

[Bug tree-optimization/106457] array_at_struct_end_p returns TRUE for a two-dimension array which is not inside any structure

2022-08-09 Thread qinzhao at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106457

--- Comment #9 from qinzhao at gcc dot gnu.org ---
One more test case that fails with the current array_at_struct_end_p is
gcc/testsuite/gcc.dg/torture/pr50067-2.c:
  1 /* { dg-do run } */
  2 
  3 /* Make sure data-dependence analysis does not compute a bogus
  4   distance vector for the different sized accesses.  */
  5 
  6 extern int memcmp(const void *, const void *, __SIZE_TYPE__);
  7 extern void abort (void);
  8 short a[32] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 };
  9 short b[32] = { 4, 0, 5, 0, 6, 0, 7, 0, 8, 0, };
 10 int main()
 11 {
 12 #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
 13   int i;
 14   if (sizeof (short) == 2)
 15 {
 16   for (i = 0; i < 32; ++i)
 17 {
 18   a[i] = (*((char(*)[32])[0]))[i+8];
 19 }
 20   if (memcmp (, , sizeof (a)) != 0)
 21 abort ();
 22 }
 23 #endif
 24   return 0;
 25 }

In the above, the array ref at line 18, (*((char(*)[32])[0]))[i+8], is identified as TRUE:
Breakpoint 1, array_at_struct_end_p (ref=0xf57a2b18) at
../../latest_gcc/gcc/tree.cc:12690
12690 if (TREE_CODE (ref) == ARRAY_REF
(gdb) call debug_tree(ref)
 
unit-size 
align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0xf57d03f0 precision:8 min  max 
pointer_to_this >

arg:0 
BLK
size 
unit-size 
align:8 warn_if_not_align:0 symtab:0 alias-set 0 canonical-type
0xf59950b8 domain 
pointer_to_this >

arg:0 
constant arg:0 >
arg:1 
   
/home/opc/Work/GCC/latest_gcc/gcc/testsuite/gcc.dg/torture/pr50067-2.c:18:12
start:
/home/opc/Work/GCC/latest_gcc/gcc/testsuite/gcc.dg/torture/pr50067-2.c:18:11
finish:
/home/opc/Work/GCC/latest_gcc/gcc/testsuite/gcc.dg/torture/pr50067-2.c:18:33>
arg:1 
unit-size 
align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0xf57d05e8 precision:32 min  max

pointer_to_this >
visited
def_stmt _1 = i_5 + 8;
version:1>
   
/home/opc/Work/GCC/latest_gcc/gcc/testsuite/gcc.dg/torture/pr50067-2.c:18:34
start:
/home/opc/Work/GCC/latest_gcc/gcc/testsuite/gcc.dg/torture/pr50067-2.c:18:11
finish:
/home/opc/Work/GCC/latest_gcc/gcc/testsuite/gcc.dg/torture/pr50067-2.c:18:38>
...
(gdb) n
12801   return true;

[Bug tree-optimization/106457] array_at_struct_end_p returns TRUE for a two-dimension array which is not inside any structure

2022-08-09 Thread qinzhao at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106457

--- Comment #8 from qinzhao at gcc dot gnu.org ---
Another test case that fails with the current array_at_struct_end_p is
gcc/testsuite/gcc.dg/torture/pr50067-1.c:
  1 /* { dg-do run } */
  2 
  3 /* Make sure data-dependence analysis does not compute a bogus
  4distance vector for the different sized accesses.  */
  5 
  6 extern int memcmp(const void *, const void *, __SIZE_TYPE__);
  7 extern void abort (void);
  8 short a[32] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 };
  9 short b[32] = { 4, 0, 5, 0, 6, 0, 7, 0, 8, 0, };
 10 int main()
 11 {
 12 #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
 13   int i;
 14   if (sizeof (short) == 2)
 15 {
 16   for (i = 0; i < 32; ++i)
 17 (*((unsigned short(*)[32])[0]))[i] = (*((char(*)[32])[0]))[i+8];
 18   if (memcmp (, , sizeof (a)) != 0)
 19 abort ();
 20 }
 21 #endif
 22   return 0;
 23 }

In the above, the array ref at line 17: (*((char(*)[32])[0]))[i+8] was
identified as TRUE by the current array_at_struct_end_p:
12690 if (TREE_CODE (ref) == ARRAY_REF
(gdb) call debug_tree(ref)
 
unit-size 
align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0xf57d03f0 precision:8 min  max 
pointer_to_this >

arg:0 
BLK
size 
unit-size 
align:8 warn_if_not_align:0 symtab:0 alias-set 0 canonical-type
0xf5994d70 domain 
pointer_to_this >

arg:0 
constant arg:0 >
arg:1 
   
/home/opc/Work/GCC/latest_gcc/gcc/testsuite/gcc.dg/torture/pr50067-1.c:17:42
start:
/home/opc/Work/GCC/latest_gcc/gcc/testsuite/gcc.dg/torture/pr50067-1.c:17:41
finish:
/home/opc/Work/GCC/latest_gcc/gcc/testsuite/gcc.dg/torture/pr50067-1.c:17:63>
arg:1 
unit-size 
align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0xf57d05e8 precision:32 min  max

pointer_to_this >
visited
def_stmt _1 = i_13 + 8;
version:1
ptr-info 0xf59fee20>
   
/home/opc/Work/GCC/latest_gcc/gcc/testsuite/gcc.dg/torture/pr50067-1.c:17:64
start:
/home/opc/Work/GCC/latest_gcc/gcc/testsuite/gcc.dg/torture/pr50067-1.c:17:41
finish:
/home/opc/Work/GCC/latest_gcc/gcc/testsuite/gcc.dg/torture/pr50067-1.c:17:68>
(gdb) n
12801   return true;
(gdb)
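For contrast, here is a sketch of the situation the name array_at_struct_end_p refers to: an array that is the last member of a structure and may be allocated larger than its declared bound implies. The struct and helper are hypothetical; the two testcases above instead index the global arrays `a` and `b`, which are not inside any structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical struct whose last member is a C99 flexible array member:
   its usable length is decided by the allocation, not a declared bound. */
struct hypothetical_buf
{
  size_t len;
  short data[];
};

static struct hypothetical_buf *make_buf (size_t len)
{
  /* Guard against overflow in the size computation. */
  assert (len <= (SIZE_MAX - sizeof (struct hypothetical_buf)) / sizeof (short));
  struct hypothetical_buf *b
    = malloc (sizeof (struct hypothetical_buf) + len * sizeof (short));
  if (b == NULL)
    return NULL;
  b->len = len;
  for (size_t i = 0; i < len; i++)
    b->data[i] = (short) i;   /* valid for every i < len allocated above */
  return b;
}

/* By contrast, a plain global array like `a` in the testcases: it is not
   inside any structure and its extent is exactly its declared bound. */
static short plain[32];
```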

[Bug c/106569] enhancement: use STL algorithm instead of a raw loop

2022-08-09 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106569

--- Comment #2 from David Binderman  ---
(In reply to Richard Biener from comment #1)
> I find those less obvious, for example does std::any_of guarantee some
> evaluation order?

I also find any_of less obvious, but that's because my working knowledge
of C++ stopped about 20 years ago.

According to

https://cplusplus.com/reference/algorithm/any_of/

there is no guarantee of evaluation order.

My best guess is that if GCC trunk is written in some recent version of C++,
then all of that recent version can be used.
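Whether std::any_of guarantees an evaluation order is the open question here; what the raw loop it would replace guarantees can at least be made explicit. A C sketch with hypothetical names: the predicate is applied from first to last and the loop stops at the first match, so for a four-element array whose second element matches, the predicate runs exactly twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Raw-loop form: the predicate is applied strictly from first to last and
   the loop returns at the first element for which it is true. */
static bool any_of_ints (const int *first, const int *last,
                         bool (*pred) (int))
{
  assert (first <= last);
  for (const int *p = first; p != last; ++p)
    if (pred (*p))
      return true;
  return false;
}

/* Hypothetical example predicate that also counts how often it is called,
   which makes the early exit observable. */
static int pred_calls;

static bool is_negative_counted (int v)
{
  pred_calls++;
  return v < 0;
}
```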

[Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5

2022-08-09 Thread malat at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #35 from Mathieu Malaterre  ---
(In reply to Mathieu Malaterre from comment #33)
> (In reply to Kewen Lin from comment #32)
> > (In reply to Mathieu Malaterre from comment #30)
> > > (In reply to Martin Liška from comment #29)
> > > > (In reply to Kewen Lin from comment #28)
> > > > > Sorry for the breakage, I'll have a look tomorrow.
> > > > > 
> > > > > btw, is it able to reproduce the issue on ppc64 (or ppc64le) as well?
> > > > 
> > > > No for gcc112 machine (ppc64le). Seems to be related to 32-bit targets.
> > > 
> > > I could see unit-test failures of highway on most 32bits arch, as well as
> > > mips64el and ppc64be.
> > 
> > Thanks to both guys! I'll try with ppc64 32bit first.
> 
> Watch out that I've reduced the original test case on my local x86/32bits
> arch.
> 
> It appears that I've lifted way too much code to reproduce the issue on
> ppc32/be. Is it OK for you to use the reproducer from the previous comment instead:
> 
> * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322#c16

Never mind; I was using gcc-11.

I can reproduce the issue on ppc32/be using the (somewhat) reduced example:

* https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322#c19

For reference:

% g++ -O2 -fno-tree-vectorize *.cc && ./a.out && echo "ok"
ok

But:

% g++ --verbose -O2 *.cc && ./a.out && echo "ok"
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/powerpc-linux-gnu/12/lto-wrapper
Target: powerpc-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 12.1.0-7'
--with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --prefix=/usr
--with-gcc-major-version-only --program-suffix=-12
--program-prefix=powerpc-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new
--enable-gnu-unique-object --disable-libitm --disable-libquadmath
--disable-libquadmath-support --enable-plugin --enable-default-pie
--with-system-zlib --enable-libphobos-checking=release
--with-target-system-zlib=auto --with-libphobos-druntime-only=yes
--enable-objc-gc=auto --enable-secureplt --disable-softfloat
--with-cpu=default32 --disable-softfloat
--enable-targets=powerpc-linux,powerpc64-linux --enable-multiarch
--disable-werror --with-long-double-128 --enable-multilib
--enable-checking=release --build=powerpc-linux-gnu --host=powerpc-linux-gnu
--target=powerpc-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.1.0 (Debian 12.1.0-7)
COLLECT_GCC_OPTIONS='-v' '-O2' '-shared-libgcc' '-dumpdir' 'a-'
 /usr/lib/gcc/powerpc-linux-gnu/12/cc1plus -quiet -v -imultiarch
powerpc-linux-gnu -D_GNU_SOURCE bytes.cc -msecure-plt -quiet -dumpdir a-
-dumpbase bytes.cc -dumpbase-ext .cc -O2 -version -o /tmp/ccXa9nGd.s
GNU C++17 (Debian 12.1.0-7) version 12.1.0 (powerpc-linux-gnu)
compiled by GNU C version 12.1.0, GMP version 6.2.1, MPFR version
4.1.0, MPC version 1.2.1, isl version isl-0.25-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring duplicate directory "/usr/include/powerpc-linux-gnu/c++/12"
ignoring nonexistent directory "/usr/local/include/powerpc-linux-gnu"
ignoring nonexistent directory
"/usr/lib/gcc/powerpc-linux-gnu/12/include-fixed"
ignoring nonexistent directory
"/usr/lib/gcc/powerpc-linux-gnu/12/../../../../powerpc-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/include/c++/12
 /usr/include/powerpc-linux-gnu/c++/12
 /usr/include/c++/12/backward
 /usr/lib/gcc/powerpc-linux-gnu/12/include
 /usr/local/include
 /usr/include/powerpc-linux-gnu
 /usr/include
End of search list.
GNU C++17 (Debian 12.1.0-7) version 12.1.0 (powerpc-linux-gnu)
compiled by GNU C version 12.1.0, GMP version 6.2.1, MPFR version
4.1.0, MPC version 1.2.1, isl version isl-0.25-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 56cdbc606649bdc6108da73e5dd1af6f
COLLECT_GCC_OPTIONS='-v' '-O2' '-shared-libgcc' '-dumpdir' 'a-'
 as -v -a32 -K PIC -mppc -many -mbig -o /tmp/ccKx6rlb.o /tmp/ccXa9nGd.s
GNU assembler version 2.38.90 (powerpc-linux-gnu) using BFD version (GNU
Binutils for Debian) 2.38.90.20220713
COLLECT_GCC_OPTIONS='-v' '-O2' '-shared-libgcc' '-dumpdir' 'a-'
 /usr/lib/gcc/powerpc-linux-gnu/12/cc1plus -quiet -v -imultiarch
powerpc-linux-gnu -D_GNU_SOURCE demo.cc -msecure-plt -quiet -dumpdir a-
-dumpbase demo.cc -dumpbase-ext .cc -O2 -version -o /tmp/ccXa9nGd.s
GNU C++17 (Debian 12.1.0-7) version 12.1.0 (powerpc-linux-gnu)
compiled by GNU C version 12.1.0, GMP version 6.2.1, MPFR version
4.1.0, MPC version 1.2.1, isl version isl-0.25-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring duplicate 

[Bug fortran/106565] Using a transposed matrix in matmul (GCC-10.3.0) is very slow

2022-08-09 Thread quanhua.liu at noaa dot gov via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565

--- Comment #2 from Quanhua Liu  ---
I modified the application code (see below) to use "method" as a control
variable from the command line.
I use the same code for both gfortran 10.3.0 and ifort 19.0.5.281
  gfortran -O3 matrixCal.f90
  time a.out  1
  time a.out  2
  ifort -O3 matrixCal.f90
  time a.out  1
  time a.out  2
where method 1, C = matmul(A, transpose(B) )
 method 2, BB = transpose(B),  C = matmul(A, BB)
  The timing is given in the table below.
As you can see, using gfortran, method '2' is 6 times faster than method '1'.
Using ifort, method '2' is very similar to method '1'; '1' is slightly faster
because '2' may copy B to BB.

Timing (real time, s)
compiler   gfortran        ifort
method     1       2       1       2
real       6.28    0.79    0.80    0.83
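The two methods compared in the table can be written out in C as a naive sketch, using Fortran's column-major layout so that the index expressions match the arrays A(m,n), B(nn,n), BB(n,nn) and C(m,nn) from the testcase. The function names are hypothetical. In this particular loop order the innermost loop walks B with a stride of nn elements in method 1 but reads BB contiguously in method 2; that is one plausible reason a temporary can pay off, but it is an assumption, not a description of what libgfortran's matmul does internally.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major element (i, j) of a matrix with `rows` rows, as in Fortran. */
#define AT(mat, rows, i, j) ((mat)[(size_t) (j) * (rows) + (i)])

/* Method 1: C = matmul(A, transpose(B)) without a temporary.
   A is m x n, B is nn x n, C is m x nn; transpose(B)(k, j) is B(j, k). */
static void matmul_abt_direct (int m, int n, int nn, const float *a,
                               const float *b, float *c)
{
  assert (m >= 0 && n >= 0 && nn >= 0);
  for (int j = 0; j < nn; j++)
    for (int i = 0; i < m; i++)
      {
        float sum = 0.0f;
        for (int k = 0; k < n; k++)
          sum += AT (a, m, i, k) * AT (b, nn, j, k);   /* strided read of B */
        AT (c, m, i, j) = sum;
      }
}

/* Method 2: BB = transpose(B) (n x nn), then C = matmul(A, BB). */
static void matmul_abt_via_copy (int m, int n, int nn, const float *a,
                                 const float *b, float *bb, float *c)
{
  assert (m >= 0 && n >= 0 && nn >= 0);
  for (int j = 0; j < nn; j++)
    for (int k = 0; k < n; k++)
      AT (bb, n, k, j) = AT (b, nn, j, k);
  for (int j = 0; j < nn; j++)
    for (int i = 0; i < m; i++)
      {
        float sum = 0.0f;
        for (int k = 0; k < n; k++)
          sum += AT (a, m, i, k) * AT (bb, n, k, j);   /* contiguous read */
        AT (c, m, i, j) = sum;
      }
}
```

Both functions add the products in the same order, so they return bitwise-identical results; only the memory access pattern differs.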

[Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5

2022-08-09 Thread malat at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #34 from Mathieu Malaterre  ---
(In reply to Mathieu Malaterre from comment #33)
> (In reply to Kewen Lin from comment #32)
> > (In reply to Mathieu Malaterre from comment #30)
> > > (In reply to Martin Liška from comment #29)
> > > > (In reply to Kewen Lin from comment #28)
> > > > > Sorry for the breakage, I'll have a look tomorrow.
> > > > > 
> > > > > btw, is it able to reproduce the issue on ppc64 (or ppc64le) as well?
> > > > 
> > > > No for gcc112 machine (ppc64le). Seems to be related to 32-bit targets.
> > > 
> > > I could see unit-test failures of highway on most 32bits arch, as well as
> > > mips64el and ppc64be.
> > 
> > Thanks to both guys! I'll try with ppc64 32bit first.
> 
> Watch out that I've reduced the original test case on my local x86/32bits
> arch.
> 
> It appears that I've lifted way too much code to reproduce the issue on
> ppc32/be. Is it OK for you to use the reproducer from the previous comment instead:
> 
> * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322#c16

It appears this one is also way too much lifted for proper repro on ppc32/be.

[Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5

2022-08-09 Thread malat at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #33 from Mathieu Malaterre  ---
(In reply to Kewen Lin from comment #32)
> (In reply to Mathieu Malaterre from comment #30)
> > (In reply to Martin Liška from comment #29)
> > > (In reply to Kewen Lin from comment #28)
> > > > Sorry for the breakage, I'll have a look tomorrow.
> > > > 
> > > > btw, is it able to reproduce the issue on ppc64 (or ppc64le) as well?
> > > 
> > > No for gcc112 machine (ppc64le). Seems to be related to 32-bit targets.
> > 
> > I could see unit-test failures of highway on most 32bits arch, as well as
> > mips64el and ppc64be.
> 
> Thanks to both guys! I'll try with ppc64 32bit first.

Watch out that I've reduced the original test case on my local x86/32bits arch.

It appears that I've lifted way too much code to reproduce the issue on
ppc32/be. Is it OK for you to use the reproducer from the previous comment instead:

* https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322#c16

[Bug c/106569] enhancement: use STL algorithm instead of a raw loop

2022-08-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106569

--- Comment #1 from Richard Biener  ---
I find those less obvious, for example does std::any_of guarantee some
evaluation order?

[Bug tree-optimization/106570] [12/13 Regression] DCE sometimes fails with depending if statements since r12-2305-g398572c1544d8b75

2022-08-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106570

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |12.2

[Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5

2022-08-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2
   Target Milestone|--- |12.2
   Keywords||wrong-code

[Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5

2022-08-09 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #32 from Kewen Lin  ---
(In reply to Mathieu Malaterre from comment #30)
> (In reply to Martin Liška from comment #29)
> > (In reply to Kewen Lin from comment #28)
> > > Sorry for the breakage, I'll have a look tomorrow.
> > > 
> > > btw, is it able to reproduce the issue on ppc64 (or ppc64le) as well?
> > 
> > No for gcc112 machine (ppc64le). Seems to be related to 32-bit targets.
> 
> I could see unit-test failures of highway on most 32bits arch, as well as
> mips64el and ppc64be.

Thanks to both guys! I'll try with ppc64 32bit first.

[Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5

2022-08-09 Thread malat at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #31 from Mathieu Malaterre  ---
(In reply to Mathieu Malaterre from comment #30)
> (In reply to Martin Liška from comment #29)
> > (In reply to Kewen Lin from comment #28)
> > > Sorry for the breakage, I'll have a look tomorrow.
> > > 
> > > btw, is it able to reproduce the issue on ppc64 (or ppc64le) as well?
> > 
> > No for gcc112 machine (ppc64le). Seems to be related to 32-bit targets.
> 
> I could see unit-test failures of highway on most 32bits arch, as well as
> mips64el and ppc64be.

For reference complete list is:

* armel
* i386
* mips64el
* mipsel
* powerpc
* ppc64

See:

*
https://buildd.debian.org/status/logs.php?pkg=highway=1.0.1%7Egit20220802.5810c58-3=experimental


(riscv64 is unrelated IMHO).

[Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5

2022-08-09 Thread malat at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #30 from Mathieu Malaterre  ---
(In reply to Martin Liška from comment #29)
> (In reply to Kewen Lin from comment #28)
> > Sorry for the breakage, I'll have a look tomorrow.
> > 
> > btw, is it able to reproduce the issue on ppc64 (or ppc64le) as well?
> 
> No for gcc112 machine (ppc64le). Seems to be related to 32-bit targets.

I could see unit-test failures of highway on most 32bits arch, as well as
mips64el and ppc64be.

[Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5

2022-08-09 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #29 from Martin Liška  ---
(In reply to Kewen Lin from comment #28)
> Sorry for the breakage, I'll have a look tomorrow.
> 
> btw, is it able to reproduce the issue on ppc64 (or ppc64le) as well?

No for gcc112 machine (ppc64le). Seems to be related to 32-bit targets.

[Bug tree-optimization/106570] [12/13 Regression] DCE sometimes fails with depending if statements since r12-2305-g398572c1544d8b75

2022-08-09 Thread amacleod at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106570

--- Comment #2 from Andrew Macleod  ---
I think this is a duplicate of PR106379. At the VRP2 stage I see:

   [local count: 1073741824]:
  if (c_6(D) == s_7(D))
goto ; [34.00%]
  else
goto ; [66.00%]

   [local count: 365072224]:
  _1 = ~c_6(D);
  _2 = _1 & s_7(D);
  if (_2 != 0)
goto ; [75.00%]
  else
goto ; [25.00%]

   [local count: 628138969]:
  DCEMarker0_ ();

   [local count: 1073741824]:
  return;

Which is basically the identical sequence; it just took longer to get to it
:-)  We aren't removing this yet with ranger, as I still need to integrate
ranger's relation oracle with the simplifier so that it will see that
_2 = ~s_7 & s_7.

[Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5

2022-08-09 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

Kewen Lin  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |linkw at gcc dot gnu.org

--- Comment #28 from Kewen Lin  ---
Sorry for the breakage, I'll have a look tomorrow.

btw, is it able to reproduce the issue on ppc64 (or ppc64le) as well?

[Bug tree-optimization/106570] [12/13 Regression] DCE sometimes fails with depending if statements since r12-2305-g398572c1544d8b75

2022-08-09 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106570

Martin Liška  changed:

   What|Removed |Added

   Last reconfirmed||2022-08-09
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
Summary|DCE sometimes fails with|[12/13 Regression] DCE
   |depending if statements |sometimes fails with
   ||depending if statements
   ||since
   ||r12-2305-g398572c1544d8b75
 CC||aldyh at gcc dot gnu.org,
   ||amacleod at redhat dot com,
   ||marxin at gcc dot gnu.org

--- Comment #1 from Martin Liška  ---
Started with r12-2305-g398572c1544d8b75.

[Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5

2022-08-09 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #27 from Martin Liška  ---
Crashes also w/ -fno-strict-aliasing.

[Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5

2022-08-09 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

Martin Liška  changed:

   What|Removed |Added

 CC||linkw at gcc dot gnu.org
Summary|tree-vectorize: Wrong code  |[12/13 Regression]
   |at O2 level |tree-vectorize: Wrong code
   |(-fno-tree-vectorize is |at O2 level
   |working)|(-fno-tree-vectorize is
   ||working) since
   ||r12-2404-ga1d27560770818c5
 Status|WAITING |NEW

--- Comment #26 from Martin Liška  ---
Cool! I can reproduce it now with:

$ g++ *.cc -O3 -m32 -mtune=generic -march=i686 && ./a.out
Aborted (core dumped)


and it started with r12-2404-ga1d27560770818c5.

[Bug tree-optimization/106322] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)

2022-08-09 Thread malat at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #25 from Mathieu Malaterre  ---
(In reply to Martin Liška from comment #24)
> > sid64 %  g++ *.cc -O2 -m32 && ./a.out
> 
> Please provide output with --verbose.

% g++ --verbose *.cc -O2 -m32 && ./a.out
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 12.1.0-7'
--with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-12
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new
--enable-gnu-unique-object --disable-vtable-verify --enable-plugin
--enable-default-pie --with-system-zlib --enable-libphobos-checking=release
--with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch
--disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-12-aYRw0H/gcc-12-12.1.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-aYRw0H/gcc-12-12.1.0/debian/tmp-gcn/usr
--enable-offload-defaulted --without-cuda-driver --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.1.0 (Debian 12.1.0-7)
COLLECT_GCC_OPTIONS='-v' '-O2' '-m32' '-shared-libgcc' '-mtune=generic'
'-march=i686' '-dumpdir' 'a-'
 /usr/lib/gcc/x86_64-linux-gnu/12/cc1plus -quiet -v -imultilib 32 -imultiarch
i386-linux-gnu -D_GNU_SOURCE bytes.cc -quiet -dumpdir a- -dumpbase bytes.cc
-dumpbase-ext .cc -m32 -mtune=generic -march=i686 -O2 -version
-fasynchronous-unwind-tables -o /tmp/cccQJh1u.s
GNU C++17 (Debian 12.1.0-7) version 12.1.0 (x86_64-linux-gnu)
compiled by GNU C version 12.1.0, GMP version 6.2.1, MPFR version
4.1.0, MPC version 1.2.1, isl version isl-0.25-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory
"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/i386-linux-gnu/c++/12"
ignoring nonexistent directory "/usr/local/include/i386-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-linux-gnu/12/include-fixed"
ignoring nonexistent directory
"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include"
ignoring nonexistent directory "/usr/include/i386-linux-gnu"
#include "..." search starts here:
#include <...> search starts here:
 /usr/include/c++/12
 /usr/include/x86_64-linux-gnu/c++/12/32
 /usr/include/c++/12/backward
 /usr/lib/gcc/x86_64-linux-gnu/12/include
 /usr/local/include
 /usr/include
End of search list.
GNU C++17 (Debian 12.1.0-7) version 12.1.0 (x86_64-linux-gnu)
compiled by GNU C version 12.1.0, GMP version 6.2.1, MPFR version
4.1.0, MPC version 1.2.1, isl version isl-0.25-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 8a56007e6299a53b3d2bb12e46ecf480
COLLECT_GCC_OPTIONS='-v' '-O2' '-m32' '-shared-libgcc' '-mtune=generic'
'-march=i686' '-dumpdir' 'a-'
 as -v --32 -o /tmp/ccG1Wx1X.o /tmp/cccQJh1u.s
GNU assembler version 2.38.90 (x86_64-linux-gnu) using BFD version (GNU
Binutils for Debian) 2.38.90.20220713
COLLECT_GCC_OPTIONS='-v' '-O2' '-m32' '-shared-libgcc' '-mtune=generic'
'-march=i686' '-dumpdir' 'a-'
 /usr/lib/gcc/x86_64-linux-gnu/12/cc1plus -quiet -v -imultilib 32 -imultiarch
i386-linux-gnu -D_GNU_SOURCE demo.cc -quiet -dumpdir a- -dumpbase demo.cc
-dumpbase-ext .cc -m32 -mtune=generic -march=i686 -O2 -version
-fasynchronous-unwind-tables -o /tmp/cccQJh1u.s
GNU C++17 (Debian 12.1.0-7) version 12.1.0 (x86_64-linux-gnu)
compiled by GNU C version 12.1.0, GMP version 6.2.1, MPFR version
4.1.0, MPC version 1.2.1, isl version isl-0.25-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory
"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/i386-linux-gnu/c++/12"
ignoring nonexistent directory "/usr/local/include/i386-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-linux-gnu/12/include-fixed"
ignoring nonexistent directory
"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include"
ignoring nonexistent directory "/usr/include/i386-linux-gnu"
#include "..." search starts here:
#include <...> search starts here:
 /usr/include/c++/12
 /usr/include/x86_64-linux-gnu/c++/12/32
 /usr/include/c++/12/backward
 /usr/lib/gcc/x86_64-linux-gnu/12/include
 /usr/local/include
 /usr/include
End of search list.
GNU C++17 (Debian 12.1.0-7) 

[Bug target/106524] [12/13 Regression] ICE in extract_insn, at recog.cc:2791 (error: unrecognizable insn) since r12-4349-ge36206c9940d22.

2022-08-09 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106524

Martin Liška  changed:

   What|Removed |Added

Summary|[12/13 Regression] ICE in   |[12/13 Regression] ICE in
   |extract_insn, at|extract_insn, at
   |recog.cc:2791 (error:   |recog.cc:2791 (error:
   |unrecognizable insn)|unrecognizable insn) since
   ||r12-4349-ge36206c9940d22.
 Ever confirmed|0   |1
 CC||marxin at gcc dot gnu.org,
   ||tnfchris at gcc dot gnu.org
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2022-08-09

--- Comment #1 from Martin Liška  ---
Started with r12-4349-ge36206c9940d22.

[Bug tree-optimization/106322] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)

2022-08-09 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #24 from Martin Liška  ---
> sid64 %  g++ *.cc -O2 -m32 && ./a.out

Please provide output with --verbose.

[Bug tree-optimization/106322] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)

2022-08-09 Thread malat at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #23 from Mathieu Malaterre  ---
Never mind; I can reproduce the issue with a sid/amd64 chroot:

stable64 % schroot -c sid64
sid64 % g++ --version
g++ (Debian 12.1.0-7) 12.1.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


sid64 %  g++ *.cc -O2 -m32 && ./a.out
zsh: IOT instruction  ./a.out

I'll report against Debian bugtracker for now.

[Bug tree-optimization/106322] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)

2022-08-09 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #22 from Uroš Bizjak  ---
(In reply to Martin Liška from comment #20)
> Hmm, can't reproduce with x86_64 compiler with -m32:
> 
> $ g++ --version
> g++ (SUSE Linux) 12.1.1 20220721 [revision
> 4f15d2234608e82159d030dadb17af678cfad626
> ...
> $ g++ *.cc -O2 -m32 && ./a.out && echo Ok
> Ok

Do you need -msse2 to actually enable vectorization?

[Bug tree-optimization/106322] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)

2022-08-09 Thread malat at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #21 from Mathieu Malaterre  ---
(In reply to Martin Liška from comment #20)
> Hmm, can't reproduce with x86_64 compiler with -m32:
> 
> $ g++ --version
> g++ (SUSE Linux) 12.1.1 20220721 [revision
> 4f15d2234608e82159d030dadb17af678cfad626
> ...
> $ g++ *.cc -O2 -m32 && ./a.out && echo Ok
> Ok

I also confirm the behavior over here. However, my x86 binary produces the
expected 'abort' from my multi-arch amd64.

There is no point in attaching *.o here, right? A quick check seems to
indicate that the issue is:

schroot-32 $ g++ -O2 -c -o demo.o demo.cc
schroot-32 $ 
amd64 $ g++ -O2 -m32 -c -o bytes.o bytes.cc
amd64 $ g++ -O2 -m32 -o demo demo.o bytes.o
amd64 $ ./demo
zsh: abort  ./demo

[Bug d/106563] [12/13 Regression] d: undefined reference to pragma(inline) symbol

2022-08-09 Thread ibuclaw at gdcproject dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106563

Iain Buclaw  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #4 from Iain Buclaw  ---
Fix committed.

[Bug d/106563] [12/13 Regression] d: undefined reference to pragma(inline) symbol

2022-08-09 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106563

--- Comment #3 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Iain Buclaw
:

https://gcc.gnu.org/g:79a86a608691621659b3ce3a24a72aeea4054668

commit r12-8673-g79a86a608691621659b3ce3a24a72aeea4054668
Author: Iain Buclaw 
Date:   Tue Aug 9 12:48:14 2022 +0200

d: Fix undefined reference to pragma(inline) symbol (PR106563)

Functions that are declared `pragma(inline)' should be treated as if
they are defined in every translation unit they are referenced from,
regardless of visibility protection.  Ensure they always get
DECL_ONE_ONLY linkage, and start emitting them into other modules that
import them.

PR d/106563

gcc/d/ChangeLog:

* decl.cc (DeclVisitor::visit (FuncDeclaration *)): Set semanticRun
before generating its symbol.
(function_defined_in_root_p): New function.
(function_needs_inline_definition_p): New function.
(maybe_build_decl_tree): New function.
(get_symbol_decl): Call maybe_build_decl_tree before returning
symbol.
(start_function): Use function_defined_in_root_p instead of inline
test for locally defined symbols.
(set_linkage_for_decl): Check for inline functions before private
or protected symbols.

gcc/testsuite/ChangeLog:

* gdc.dg/torture/torture.exp (srcdir): New proc.
* gdc.dg/torture/imports/pr106563math.d: New test.
* gdc.dg/torture/imports/pr106563regex.d: New test.
* gdc.dg/torture/imports/pr106563uni.d: New test.
* gdc.dg/torture/pr106563.d: New test.

(cherry picked from commit 04284176d549ff2565406406a6d53ab4ba8e507d)

[Bug d/106563] [12/13 Regression] d: undefined reference to pragma(inline) symbol

2022-08-09 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106563

--- Comment #2 from CVS Commits  ---
The master branch has been updated by Iain Buclaw :

https://gcc.gnu.org/g:04284176d549ff2565406406a6d53ab4ba8e507d

commit r13-2002-g04284176d549ff2565406406a6d53ab4ba8e507d
Author: Iain Buclaw 
Date:   Tue Aug 9 12:48:14 2022 +0200

d: Fix undefined reference to pragma(inline) symbol (PR106563)

Functions that are declared `pragma(inline)' should be treated as if
they are defined in every translation unit they are referenced from,
regardless of visibility protection.  Ensure they always get
DECL_ONE_ONLY linkage, and start emitting them into other modules that
import them.

PR d/106563

gcc/d/ChangeLog:

* decl.cc (DeclVisitor::visit (FuncDeclaration *)): Set semanticRun
before generating its symbol.
(function_defined_in_root_p): New function.
(function_needs_inline_definition_p): New function.
(maybe_build_decl_tree): New function.
(get_symbol_decl): Call maybe_build_decl_tree before returning
symbol.
(start_function): Use function_defined_in_root_p instead of inline
test for locally defined symbols.
(set_linkage_for_decl): Check for inline functions before private
or protected symbols.

gcc/testsuite/ChangeLog:

* gdc.dg/torture/torture.exp (srcdir): New proc.
* gdc.dg/torture/imports/pr106563math.d: New test.
* gdc.dg/torture/imports/pr106563regex.d: New test.
* gdc.dg/torture/imports/pr106563uni.d: New test.
* gdc.dg/torture/pr106563.d: New test.

[Bug tree-optimization/106523] [10/11/12/13 Regression] forwprop miscompile

2022-08-09 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106523

Martin Liška  changed:

   What|Removed |Added

 CC||marxin at gcc dot gnu.org

--- Comment #2 from Martin Liška  ---
Started with 4.9.0.

[Bug tree-optimization/106322] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)

2022-08-09 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #20 from Martin Liška  ---
Hmm, can't reproduce with x86_64 compiler with -m32:

$ g++ --version
g++ (SUSE Linux) 12.1.1 20220721 [revision
4f15d2234608e82159d030dadb17af678cfad626
...
$ g++ *.cc -O2 -m32 && ./a.out && echo Ok
Ok

[Bug sanitizer/106558] ASan failed to detect a global-buffer-overflow

2022-08-09 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106558

Martin Liška  changed:

   What|Removed |Added

   Last reconfirmed||2022-08-09
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #2 from Martin Liška  ---
Might be related to PR 82501.

[Bug preprocessor/106426] UTF-8 character literals do not have unsigned type in the preprocessor in -fchar8_t mode

2022-08-09 Thread tom at honermann dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106426

--- Comment #3 from Tom Honermann  ---
I believe this issue can be resolved as fixed via commit
053876cdbe8057210e6f4da4eec2df58f92ccd4c for the gcc 13 release.

[Bug other/106571] New: Implement -Wsection diag

2022-08-09 Thread bp at alien8 dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106571

Bug ID: 106571
   Summary: Implement -Wsection diag
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bp at alien8 dot de
  Target Milestone: ---

Hi,

clang has this -Wsection diag which does:

https://clang.llvm.org/docs/DiagnosticsReference.html#wsection

It would be good to have it in gcc too so that declarations like

extern u64 x86_spec_ctrl_current;

for variable definitions which belong to a specific section:

__attribute__((section(".data..percpu" ""))) __typeof__(u64)
x86_spec_ctrl_current;

get caught:

arch/x86/kernel/cpu/bugs.c:58:21: error: section attribute is specified on
redeclared variable [-Werror,-Wsection]

Thx.
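A minimal userspace sketch of the pattern in question, assuming an ELF target. The names `spec_ctrl_current`, `read_spec_ctrl` and the section `.data.example` are hypothetical stand-ins for the kernel's per-CPU names: the first declaration carries no section attribute and the definition adds one. According to the report, clang diagnoses this under -Wsection; the request implies GCC currently accepts it without a diagnostic.

```c
#include <stdint.h>

/* Hypothetical header declaration: no section attribute here. */
extern uint64_t spec_ctrl_current;

/* Redeclaration that adds a section attribute. This is the mismatch clang's
   -Wsection reports ("section attribute is specified on redeclared variable").
   ".data.example" is an arbitrary ELF section name chosen for illustration. */
__attribute__((section(".data.example"))) uint64_t spec_ctrl_current = 1;

/* Users that only saw the header declaration still link to the same object. */
uint64_t read_spec_ctrl(void)
{
  return spec_ctrl_current;
}
```

The code is valid and links fine, which is exactly why a diagnostic is useful: nothing else flags that code seeing only the header does not know the variable lives in a special section.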

[Bug tree-optimization/106570] New: DCE sometimes fails with depending if statements

2022-08-09 Thread tmayerl at student dot ethz.ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106570

Bug ID: 106570
   Summary: DCE sometimes fails with depending if statements
   Product: gcc
   Version: 12.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tmayerl at student dot ethz.ch
  Target Milestone: ---

Sometimes, DCE fails when multiple if statements are used. 

For example, GCC detects that the following if statements always evaluate to
false and thus removes the dead code:

#include 
#include 

void DCEMarker0_();

void f(bool s, bool c) {
if (!c == !s) {
if (s && !c) {
DCEMarker0_();
}
}
}

In the next snippet, the if statements are used to set a variable. This
variable is then used in the next if statement. However, GCC now fails to
detect and eliminate the dead code:

#include 
#include 

void DCEMarker0_();

void f(bool s, bool c) {
int intermediate_result = 0;
if (!c == !s) {
if (s && !c) {
intermediate_result = 1;
}
}
if (((!c == !s) && (s && !c)) || intermediate_result) {
DCEMarker0_();
}
}

This is actually a regression: it worked fine up to GCC 11.3.

This can also be seen via the following Compiler Explorer link:
https://godbolt.org/z/n9dKMfqsd
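To confirm that the call in the second snippet really is dead, the condition logic can be evaluated for all four combinations of `s` and `c`. This is a sketch: `reaches_marker` and `marker_is_dead` are hypothetical helpers, and the body of `reaches_marker` copies the branch logic of the reporter's second `f()`, returning whether `DCEMarker0_()` would be called.

```c
#include <stdbool.h>

/* Same branch logic as the second f() above; returns true when the
   original code would call DCEMarker0_(). */
static bool reaches_marker(bool s, bool c)
{
  int intermediate_result = 0;
  if (!c == !s) {            /* equivalent to c == s */
    if (s && !c)             /* requires s != c, so never true here */
      intermediate_result = 1;
  }
  return ((!c == !s) && (s && !c)) || intermediate_result;
}

/* Exhaustively checks all four (s, c) inputs. */
static bool marker_is_dead(void)
{
  for (int s = 0; s <= 1; s++)
    for (int c = 0; c <= 1; c++)
      if (reaches_marker(s, c))
        return false;
  return true;
}
```

Because `!c == !s` forces `c == s` while `s && !c` requires `s != c`, both disjuncts are always false, so the call can be removed.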

[Bug tree-optimization/106514] [12/13 Regression] ranger slowness in path query

2022-08-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514

--- Comment #7 from Richard Biener  ---
For the testcase m_imports is so big because we have

...
   [local count: 1073741824]:
  # c_1198 = PHI 
  _599 = MEM[(unsigned int *)b_1201(D) + 2792B];
  d_2401 = _599 + d_2399;
  if (d_2399 > d_2401)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 536870913]:
  c_2402 = c_1198 + 1;

   [local count: 1073741824]:
  # c_1199 = PHI 
  _600 = MEM[(unsigned int *)b_1201(D) + 2796B];
  d_2403 = _600 + d_2401;
  if (d_2401 > d_2403)
goto ; [50.00%]
  else
goto ; [50.00%]

so when back_threader::find_paths does ->compute_imports (.., bb 1200) we
walk up the whole d_2403 definition chain unbounded (for PHIs we restrict
to edges on the path which is empty).  I realize that there's no good way
to pick up extra imports on the fly cheaply - we could handle it when
we prune local defs from the imports at which point we could add operands
but it's not clear to me that will be a good trade-off.  In fact
pruning imports looks suspicious as the final path-range query will
be limited there?  Likewise for any import we add via PHI-translation
we fail to add local def operands - we're only getting those from the
initial import compute which basically picks those from blocks dominating
the exit but no others.

I will experiment with re-wiring this.

[Bug target/103498] Spec 2017 imagick_r is 2.62% slower on Power10 with pc-relative addressing compared to not using pc-relative addressing

2022-08-09 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103498

--- Comment #2 from Segher Boessenkool  ---
Mike, do you still see this?

[Bug tree-optimization/106514] [12/13 Regression] ranger slowness in path query

2022-08-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514

--- Comment #6 from Richard Biener  ---
So one now needs to bump the limit to 60 to get enough samples for perf.  Then
we now see

Samples: 55K of event 'cycles:u', Event count (approx.): 49013411833
Overhead   Samples  Command  Shared Object  Symbol
  51.19%     28195  cc1      cc1            [.] path_range_query::compute_ranges_in_block
  11.67%      6427  cc1      cc1            [.] path_range_query::adjust_for_non_null_uses
   9.20%      5069  cc1      cc1            [.] path_range_query::range_defined_in_block
   3.39%      1869  cc1      cc1            [.] bitmap_set_bit
   1.95%      1072  cc1      cc1            [.] back_threader::find_paths_to_names
   1.93%      1066  cc1      cc1            [.] bitmap_bit_p

compute_ranges_in_block is also at the top with a limit of 30, but
adjust_for_non_null_uses only shows up with 60.

The compute_ranges_in_block slowness is attributed to

  // ...and then the rest of the imports.
  EXECUTE_IF_SET_IN_BITMAP (m_imports, 0, i, bi)
{
  tree name = ssa_name (i);
  Value_Range r (TREE_TYPE (name));

  if (gimple_code (SSA_NAME_DEF_STMT (name)) != GIMPLE_PHI
  && range_defined_in_block (r, name, bb))


plus

  gori_compute  = m_ranger->gori ();
  bitmap exports = g.exports (bb);
  EXECUTE_IF_AND_IN_BITMAP (m_imports, exports, 0, i, bi)
{
  tree name = ssa_name (i);
  Value_Range r (TREE_TYPE (name));
  if (g.outgoing_edge_range_p (r, e, name, *this))


For this testcase there seem to be a lot of imports but not many exports,
so range_defined_in_block is called many more times than
outgoing_edge_range_p, although the latter is comparatively more expensive.

For the path query I wonder why we are interested in computing (aka
updating the cache) for any but the exports?  When we
compute the exports, why is the cache not lazily computed just for
the interesting names?  AFAICS we invalidate all local defs (but even
then, why? We get to see a def exactly once, so why do we even have to
think about clearing something we should not have seen?)

That is, in path_range_query::compute_ranges

  while (1)
{
  basic_block bb = curr_bb ();

  compute_ranges_in_block (bb);
  adjust_for_non_null_uses (bb);

  if (at_exit ())
break;

  move_next ();
}

I'd expect only a small portion of the actual compute_ranges_in_block
work to be done for all blocks and the real resolving work only for
the block ending the path?  Maybe the backwards threader is just using
the wrong (expensive) API here?  It does

 m_solver->compute_ranges (path, m_imports);
 m_solver->range_of_stmt (r, cond);


--

Btw, I wondered if path-range-query can handle parts of the path being a
"black box", aka, skip to the immediate dominator instead of one of the
predecessor edges?  I _think_ analysis-wise this would be quite
straightforward, but of course we'd have to represent this somehow in the path.
Maybe it works by simply leaving out the intermediate blocks?  Thus,

   B
   |\
   A
  / \
 C   D
  \ /
   E
\

the path would be from B to E but we don't care whether we go the C or D
way, and when duplicating the path we'd simply duplicate the whole diamond
instead of duplicating only one branch, say A->D, and keeping the edge
A->C to the original block C, defeating the threading of E to its successor
if we happen to go that way.

[Bug tree-optimization/106514] [12/13 Regression] ranger slowness in path query

2022-08-09 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:409978d58dafa689c5b3f85013e2786526160f2c

commit r13-1998-g409978d58dafa689c5b3f85013e2786526160f2c
Author: Richard Biener 
Date:   Mon Aug 8 12:20:04 2022 +0200

tree-optimization/106514 - add --param max-jump-thread-paths

The following adds a limit for the exponential greedy search of
the backwards jump threader.  The idea is to limit the search
space in a way that the paths considered are the same if the search
were in BFS order rather than DFS.  In particular it stops considering
incoming edges into a block if the product of the in-degrees of
blocks on the path exceeds the specified limit.

When considering the low stmt copying limit of 7 (or 1 in the size
optimize case) this means the degenerate case with maximum search
space is a sequence of conditions with no actual code

  B1
   |\
   | empty
   |/
  B2
   |\
   ...
  Bn
   |\

GIMPLE_CONDs are costed 2, an equivalent GIMPLE_SWITCH already 4, so
we reach 7 already with 3 middle conditions (B1 and Bn do not count).
The search space would be 2^4 == 16 to reach this.  The FSM threads
historically allowed for a thread length of 10 but is really looking
for a single multiway branch threaded across the backedge.  I've
set the default of the new parameter to 64, which effectively
limits the outdegree of the switch statement (the cases reaching the
backedge) to that number (divided by 2 until I add some special
pruning for FSM threads due to the loop header indegree).  The
testcase ssa-dom-thread-7.c requires 56 at the moment (as said,
some special FSM thread pruning of considered edges would bring
it down to half of that), but we now get one more threading
and quite some more in later threadfull.  This testcase seems to
be difficult to check for expected transforms.

The new testcases add the degenerate case we currently thread
(without deciding whether that's a good idea ...) plus one with
an appropriate limit that should prevent the threading.

This obsoletes the mentioned --param max-fsm-thread-length but
I am not removing it as part of this patch.  When the search
space is limited the thread stmt size limit effectively provides
max-fsm-thread-length.

The param with its default does not help PR106514 enough to unleash
path searching with the higher FSM stmt count limit.

PR tree-optimization/106514
* params.opt (max-jump-thread-paths): New.
* doc/invoke.texi (max-jump-thread-paths): Document.
* tree-ssa-threadbackward.cc (back_threader::find_paths_to_names):
Honor max-jump-thread-paths, take overall_path argument.
(back_threader::find_paths): Pass 1 as initial overall_path.

* gcc.dg/tree-ssa/ssa-thread-16.c: New testcase.
* gcc.dg/tree-ssa/ssa-thread-17.c: Likewise.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust.
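The in-degree product cutoff described in the commit message can be illustrated on a toy CFG. This is a minimal sketch under simplifying assumptions, not GCC's back_threader::find_paths_to_names: `struct block`, `count_paths` and `build_chain` are hypothetical, the walk ignores SSA names and the statement-copy limit, and it only models the rule "stop considering incoming edges into a block once the product of the in-degrees on the path would exceed the limit", starting from an overall_path of 1.

```c
#define MAX_PREDS 4

/* Hypothetical minimal CFG node: only the predecessor list matters here. */
struct block {
  int npreds;
  int preds[MAX_PREDS];
};

/* Counts the backward paths explored from BB. OVERALL_PATH is the product of
   the in-degrees of the blocks already expanded on this path (starts at 1).
   If expanding BB's incoming edges would push the product past LIMIT, the
   path is cut at BB instead of forking again. */
static long count_paths(const struct block *cfg, int bb,
                        long overall_path, long limit)
{
  if (cfg[bb].npreds == 0)
    return 1;                   /* reached the entry block */
  long next = overall_path * cfg[bb].npreds;
  if (next > limit)
    return 1;                   /* limit hit: stop considering incoming edges */
  long paths = 0;
  for (int i = 0; i < cfg[bb].npreds; i++)
    paths += count_paths(cfg, cfg[bb].preds[i], next, limit);
  return paths;
}

/* Builds the degenerate chain from the commit message: K conditions, each
   followed by an empty block, so B(i+1) has predecessors B(i) and E(i).
   B(i) is stored at index 2*i and E(i) at 2*i + 1. Returns the index of B(K). */
static int build_chain(struct block *cfg, int k)
{
  cfg[0].npreds = 0;
  for (int i = 0; i < k; i++) {
    int b = 2 * i, e = 2 * i + 1, nb = 2 * i + 2;
    cfg[e].npreds = 1;
    cfg[e].preds[0] = b;
    cfg[nb].npreds = 2;
    cfg[nb].preds[0] = b;
    cfg[nb].preds[1] = e;
  }
  return 2 * k;
}
```

For a chain of 10 conditions, an effectively unbounded limit explores all 2^10 = 1024 paths, while a limit of 64 stops forking after six diamonds and explores 64 paths. The search space thus grows with the limit rather than exponentially with the chain length, which matches the BFS-like bound the commit message aims for.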

[Bug c++/106567] [13 Regression] An array with a dependent type and initializer-deduced bound is treated as an array of unknown bound when captured in a lambda

2022-08-09 Thread m.cencora at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106567

m.cencora at gmail dot com changed:

   What|Removed |Added

 CC||m.cencora at gmail dot com

--- Comment #4 from m.cencora at gmail dot com ---
Seems related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93259

[Bug rtl-optimization/106568] -freorder-blocks-algorithm appears to causes a crash in stable code, no way to disable it

2022-08-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106568

--- Comment #21 from Richard Biener  ---
Try -fsanitize=unreachable: when reordering BBs makes crashes appear or
disappear, the most likely culprit is that we run into a path deemed
unreachable, which means we fall through to random code.

You can also look at the -fdump-tree-optimized dump, find the function that
is not catching what it is supposed to catch, and check whether there are any
__builtin_unreachable () calls around.
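A small hypothetical example of the kind of code meant here (`classify` is illustrative, not taken from the reporter's program). The final `__builtin_unreachable ()` is a promise to the optimizer; if a caller ever passes another value, behaviour is undefined, and without instrumentation the generated code may simply fall into whatever block happens to be laid out next. This is why block reordering can make the crash appear or disappear.

```c
/* Hypothetical example: the author asserts only 0 and 1 are ever passed. */
int classify(int x)
{
  if (x == 0)
    return 10;
  if (x == 1)
    return 20;
  /* Reaching this point is undefined behaviour. With -fsanitize=unreachable
     the compiler instruments it so the violation is reported at run time
     instead of silently falling through to unrelated code. */
  __builtin_unreachable ();
}
```

Calls in the optimized dump that come from such annotations (or from paths GCC itself proved unreachable) are the places worth inspecting first.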

[Bug tree-optimization/106322] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)

2022-08-09 Thread malat at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #19 from Mathieu Malaterre  ---
Without hwy dependency:

 % more Makefile bytes.cc demo.cc
::
Makefile
::
CXXFLAGS := -O2

demo: demo.o bytes.o
$(CXX) $(CXXFLAGS) -o $@ $^

clean:
rm -f bytes.o demo.o
::
bytes.cc
::
#include 

bool BytesEqual(const void *bytes1, const void *bytes2, const size_t size) {
  return memcmp(bytes1, bytes2, size) == 0;
}
::
demo.cc
::
#include 
#include 
#include 
#include 
#include 
#include 

#define HWY_ALIGNMENT 64
constexpr size_t kAlignment = HWY_ALIGNMENT;
constexpr size_t kAlias = kAlignment * 4;

bool BytesEqual(const void *p1, const void *p2, const size_t size);

namespace hwy {
namespace N_EMU128 {
template  struct Vec128 {
  T raw[16 / sizeof(T)] = {};
};
} // namespace N_EMU128
} // namespace hwy

template 
static void Store(const hwy::N_EMU128::Vec128 v,
  T *__restrict__ aligned) {
  __builtin_memcpy(aligned, v.raw, sizeof(T) * N);
}

template 
static hwy::N_EMU128::Vec128 Load(const T *__restrict__ aligned) {
  hwy::N_EMU128::Vec128 v;
  __builtin_memcpy(v.raw, aligned, sizeof(T) * N);
  return v;
}

template 
static hwy::N_EMU128::Vec128
MulHigh(hwy::N_EMU128::Vec128 a,
        const hwy::N_EMU128::Vec128 b) {
  for (size_t i = 0; i < N; ++i) {
    // Cast to uint32_t first to prevent overflow. Otherwise the result of
    // uint16_t * uint16_t is in "int" which may overflow. In practice the
    // result is the same but this way it is also defined.
    a.raw[i] = static_cast(
        (static_cast(a.raw[i]) * static_cast(b.raw[i])) >>
        16);
  }
  return a;
}

#define HWY_ASSERT(condition) assert((condition))
#define HWY_ASSUME_ALIGNED(ptr, align) __builtin_assume_aligned((ptr), (align))

#pragma pack(push, 1)
struct AllocationHeader {
  void *allocated;
  size_t payload_size;
};
#pragma pack(pop)

static void FreeAlignedBytes(const void *aligned_pointer) {
  HWY_ASSERT(aligned_pointer != nullptr);
  if (aligned_pointer == nullptr)
    return;

  const uintptr_t payload = reinterpret_cast(aligned_pointer);
  HWY_ASSERT(payload % kAlignment == 0);
  const AllocationHeader *header =
      reinterpret_cast(payload) - 1;

  free(header->allocated);
}

class AlignedFreer {
public:
  template  void operator()(T *aligned_pointer) const {
    FreeAlignedBytes(aligned_pointer);
  }
};

template 
using AlignedFreeUniquePtr = std::unique_ptr;

static inline constexpr size_t ShiftCount(size_t n) {
  return (n <= 1) ? 0 : 1 + ShiftCount(n / 2);
}

namespace {
static size_t NextAlignedOffset() {
  static std::atomic next{0};
  constexpr uint32_t kGroups = kAlias / kAlignment;
  const uint32_t group = next.fetch_add(1, std::memory_order_relaxed) %
                         kGroups;
  const size_t offset = kAlignment * group;
  HWY_ASSERT((offset % kAlignment == 0) && offset <= kAlias);
  //  std::cerr << "O: " << offset << std::endl;
  return offset;
}
} // namespace

static void *AllocateAlignedBytes(const size_t payload_size) {
  HWY_ASSERT(payload_size != 0); // likely a bug in caller
  if (payload_size >= std::numeric_limits::max() / 2) {
    HWY_ASSERT(false && "payload_size too large");
    return nullptr;
  }

  size_t offset = NextAlignedOffset();

  // What: | misalign | unused | AllocationHeader |payload
  // Size: |<= kAlias | offset                    |payload_size
  //       ^allocated.^aligned.^header            ^payload
  // The header must immediately precede payload, which must remain aligned.
  // To avoid wasting space, the header resides at the end of `unused`,
  // which therefore cannot be empty (offset == 0).
  if (offset == 0) {
    offset = kAlignment; // = RoundUpTo(sizeof(AllocationHeader), kAlignment)
    static_assert(sizeof(AllocationHeader) <= kAlignment, "Else: round up");
  }

  const size_t allocated_size = kAlias + offset + payload_size;
  void *allocated = malloc(allocated_size);
  HWY_ASSERT(allocated != nullptr);
  if (allocated == nullptr)
    return nullptr;
  // Always round up even if already aligned - we already asked for kAlias
  // extra bytes and there's no way to give them back.
  uintptr_t aligned = reinterpret_cast(allocated) + kAlias;
  static_assert((kAlias & (kAlias - 1)) == 0, "kAlias must be a power of 2");
  static_assert(kAlias >= kAlignment, "Cannot align to more than kAlias");
  aligned &= ~(kAlias - 1);

  const uintptr_t payload = aligned + offset; // still aligned

  // Stash `allocated` and payload_size inside header for FreeAlignedBytes().
  // The allocated_size can be reconstructed from the payload_size.
  AllocationHeader *header = reinterpret_cast(payload) - 1;
  header->allocated = allocated;
  header->payload_size = payload_size;

  //printf("%d-byte aligned addr: %p\n", kAlignment,
  //       reinterpret_cast(payload));
  return HWY_ASSUME_ALIGNED(reinterpret_cast(payload), kAlignment);
}

template  static T *AllocateAlignedItems(size_t items) {
  constexpr size_t size = 
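
The `MulHigh` loop in the testcase computes, for each lane, the upper 16 bits of the 32-bit product of two `uint16_t` values. This widening multiply followed by `>> 16` is the high-part multiply idiom that the vectorizer's pattern recognition can turn into an `umul_highpart` operation. A standalone scalar reference can help check the optimized program's output lane by lane. The one below is a sketch: `MulHighScalar` and its plain-array interface are hypothetical and not part of hwy or the testcase.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference for the testcase's MulHigh. For each lane it
// widens both uint16_t inputs to uint32_t (so the product cannot overflow
// int), multiplies them, and keeps the upper 16 bits of the 32-bit product.
void MulHighScalar(uint16_t *a, const uint16_t *b, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    const uint32_t product =
        static_cast<uint32_t>(a[i]) * static_cast<uint32_t>(b[i]);
    a[i] = static_cast<uint16_t>(product >> 16);
  }
}
```

For example, 0xFFFF * 0xFFFF = 0xFFFE0001, so that lane becomes 0xFFFE. Any lane where a vectorized build disagrees with this reference points at the multiply-high sequence rather than at the surrounding allocation code.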

[Bug fortran/106565] Using a transposed matrix in matmul (GCC-10.3.0) is very slow

2022-08-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565

Richard Biener  changed:

            What|Removed      |Added
--------------------------------------------------
Last reconfirmed|             |2022-08-09
   Known to fail|             |12.1.0
          Status|UNCONFIRMED  |NEW
        Keywords|             |missed-optimization
  Ever confirmed|0            |1
         Version|unknown      |10.3.0

--- Comment #1 from Richard Biener  ---
Confirmed also with gfortran 12.  The issue is that with the combined
matmul+transpose we invoke matmul with an array descriptor representing the
transpose operation which results in suboptimal memory access patterns.

Can you check whether ifort does the transpose separately or whether its
matmul library routine simply special-cases the situation?
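
The access-pattern issue can be sketched in C++. This is a hypothetical illustration, not libgfortran's matmul code. `View`, `MatMul` and `Transpose` are invented names, and the storage is row-major, whereas Fortran arrays are column-major. A transposed operand can be passed without copying by swapping the strides in a descriptor-like view. A routine whose inner loop is contiguous for normal operands then walks the transposed one with a stride of a full row per step. Materializing the transpose first restores contiguous access. Both variants compute the same result, and only the memory access pattern differs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal "array descriptor": base pointer plus element strides.
// A transposed view is the same data with row and column strides swapped.
struct View {
  const double *data;
  std::size_t row_stride;  // elements between consecutive rows
  std::size_t col_stride;  // elements between consecutive columns
  double at(std::size_t i, std::size_t j) const {
    return data[i * row_stride + j * col_stride];
  }
};

// C = A * B for n x n matrices in i-k-j order. The innermost loop reads B
// along a row: contiguous when b.col_stride == 1, but jumping n elements per
// step when B is a transposed view.
std::vector<double> MatMul(View a, View b, std::size_t n) {
  std::vector<double> c(n * n, 0.0);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t k = 0; k < n; ++k) {
      const double aik = a.at(i, k);
      for (std::size_t j = 0; j < n; ++j)
        c[i * n + j] += aik * b.at(k, j);
    }
  return c;
}

// Copy transpose(m) into contiguous row-major storage.
std::vector<double> Transpose(const std::vector<double> &m, std::size_t n) {
  std::vector<double> t(n * n);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      t[j * n + i] = m[i * n + j];
  return t;
}
```

`MatMul(View{a.data(), n, 1}, View{b.data(), 1, n}, n)` multiplies by the transpose of `b` through a strided view. `MatMul(View{a.data(), n, 1}, View{bt.data(), n, 1}, n)` with `bt = Transpose(b, n)` does the same work on contiguous data at the cost of one extra copy. Which operand is transposed, and whether storage is row- or column-major, changes which loop becomes strided, but the trade-off has the same shape.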

[Bug analyzer/106551] [13 Regression] dup2 causes -fanalyzer ICE in valid_to_unchecked_state, at analyzer/sm-fd.cc:751

2022-08-09 Thread mir at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106551

--- Comment #2 from Immad Mir  ---
 Sergei Trofimovich: Thanks for bringing the issue to our attention.

Dave: I've sent a patch via gcc-patches.

[Bug c/106569] New: enhancement: use STL algorithm instead of a raw loop

2022-08-09 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106569

            Bug ID: 106569
           Summary: enhancement: use STL algorithm instead of a raw loop
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dcb314 at hotmail dot com
  Target Milestone: ---

The static analyser cppcheck produces these style messages for the gcc trunk
source code:

$ fgrep useStlAlgorithm cppcheck.20220809.out
trunk.git/gcc/analyzer/call-string.cc:169:9: style: Consider using
std::count_if algorithm instead of a raw loop. [useStlAlgorithm]
trunk.git/gcc/analyzer/constraint-manager.cc:2454:0: style: Consider using
std::find_if algorithm instead of a raw loop. [useStlAlgorithm]
trunk.git/gcc/analyzer/region-model-manager.cc:1230:0: style: Consider using
std::any_of algorithm instead of a raw loop. [useStlAlgorithm]
trunk.git/gcc/analyzer/region.cc:1245:0: style: Consider using std::any_of
algorithm instead of a raw loop. [useStlAlgorithm]
trunk.git/gcc/cp/constexpr.cc:348:0: style: Consider using std::any_of
algorithm instead of a raw loop. [useStlAlgorithm]
trunk.git/gcc/cp/constexpr.cc:5965:8: style: Consider using std::find_if
algorithm instead of a raw loop. [useStlAlgorithm]
trunk.git/gcc/cp/constexpr.cc:8991:0: style: Consider using std::any_of
algorithm instead of a raw loop. [useStlAlgorithm]
trunk.git/gcc/rtl-ssa/change-utils.h:28:0: style: Consider using std::any_of
algorithm instead of a raw loop. [useStlAlgorithm]
trunk.git/gcc/rtl-ssa/blocks.cc:347:0: style: Consider using std::any_of
algorithm instead of a raw loop. [useStlAlgorithm]
trunk.git/gcc/rtl-ssa/accesses.cc:1507:7: style: Consider using std::any_of
algorithm instead of a raw loop. [useStlAlgorithm]
trunk.git/gcc/rtl-ssa/member-fns.inl:854:0: style: Consider using std::any_of
algorithm instead of a raw loop. [useStlAlgorithm]
trunk.git/libsanitizer/hwasan/hwasan_thread_list.h:120:20: style: Consider
using std::find_if algorithm instead of a raw loop. [useStlAlgorithm]
trunk.git/libsanitizer/hwasan/hwasan_report.cpp:293:0: style: Consider using
std::find_if algorithm instead of a raw loop. [useStlAlgorithm]
$ 

None, some or all of these might be worth fixing.

I suspect it would not be worthwhile to implement this style warning in gcc.
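
For readers unfamiliar with cppcheck's useStlAlgorithm check, here is a hypothetical before/after pair. It is not taken from any of the GCC files listed above, and the function names are invented. It shows the kind of rewrite the check suggests: a loop that returns as soon as an element matches is equivalent to `std::any_of`, and a counting loop is equivalent to `std::count_if`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Raw loop of the kind cppcheck flags with useStlAlgorithm.
bool HasNegativeLoop(const std::vector<int> &v) {
  for (int x : v)
    if (x < 0)
      return true;
  return false;
}

// Equivalent rewrite using std::any_of.
bool HasNegativeAlgo(const std::vector<int> &v) {
  return std::any_of(v.begin(), v.end(), [](int x) { return x < 0; });
}

// Counting loop...
std::ptrdiff_t CountEvenLoop(const std::vector<int> &v) {
  std::ptrdiff_t n = 0;
  for (int x : v)
    if (x % 2 == 0)
      ++n;
  return n;
}

// ...and its std::count_if equivalent.
std::ptrdiff_t CountEvenAlgo(const std::vector<int> &v) {
  return std::count_if(v.begin(), v.end(), [](int x) { return x % 2 == 0; });
}
```

Both algorithms are available since C++11. The rewrite does not change behaviour; it only states the loop's intent in a standard name.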
