[Bug d/108842] New: Cannot use enum array with -fno-druntime

2023-02-17 Thread zach-gcc at cs dot stanford.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108842

Bug ID: 108842
   Summary: Cannot use enum array with -fno-druntime
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: d
  Assignee: ibuclaw at gdcproject dot org
  Reporter: zach-gcc at cs dot stanford.edu
  Target Milestone: ---

I have test.d:

```
enum int[] x = [0, 1, 2];
```

and an object.d:

```
module object;
```

I get an error when I try to compile:

```
$ gdc -c -fno-druntime test.d
test.d:1:16: error: expression '[0, 1, 2]' requires 'object.TypeInfo' and cannot be used with '-fno-rtti'
1 | enum int[] x = [0, 1, 2];
  |                ^
test.d:1:16: error: 'object.TypeInfo' could not be found, but is implicitly used
1 | enum int[] x = [0, 1, 2];
  |                ^
```

This compiles fine with DMD and LDC with `-betterC` and the same object.d
(custom runtime).

[Bug ipa/107931] [12/13 Regression] -Og causes always_inline to fail since r12-6677-gc952126870c92cf2

2023-02-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107931

--- Comment #13 from Andrew Pinski  ---
(In reply to ishikawa,chiaki from comment #11)
> What is exactly the compiler-defined macro when "-Og" is used on the command
> line?

There is not one ...

[Bug ipa/107931] [12/13 Regression] -Og causes always_inline to fail since r12-6677-gc952126870c92cf2

2023-02-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107931

Andrew Pinski  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=61782

--- Comment #12 from Andrew Pinski  ---
(In reply to rguent...@suse.de from comment #7)
> It's even documented (I've marked the relevant sentence with
> ---> ... <---):
> 
> @item always_inline
> @cindex @code{always_inline} function attribute
> Generally, functions are not inlined unless optimization is specified.
> For functions declared inline, this attribute inlines the function
> independent of any restrictions that otherwise apply to inlining.
> Failure to inline such a function is diagnosed as an error.
> ---> Note that if such a function is called indirectly the compiler may
> or may not inline it depending on optimization level and a failure
> to inline an indirect call may or may not be diagnosed. <---

And been documented that way since 2014 even (r5-1859-g3defdb14996a82 aka PR
61782)
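A minimal C sketch of the documented distinction (the function names are hypothetical, not taken from any attached test case): a direct call to an always_inline function must be inlined or GCC reports an error, while a call through a function pointer carries no such guarantee.

```c
/* always_inline governs direct calls; the documentation quoted above
   makes no promise for calls made through a pointer. */
static inline __attribute__((always_inline)) int add_one(int x)
{
    return x + 1;
}

/* Direct call: GCC must inline add_one here or diagnose an error. */
int call_direct(int x)
{
    return add_one(x);
}

/* Indirect call: whether the target gets inlined depends on the
   optimization level, and a failure may not be diagnosed at all. */
int call_indirect(int (*fp)(int), int x)
{
    return fp(x);
}
```

Passing `add_one` to `call_indirect` takes its address, so GCC also has to emit an out-of-line copy of the function.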

[Bug ipa/107931] [12/13 Regression] -Og causes always_inline to fail since r12-6677-gc952126870c92cf2

2023-02-17 Thread ishikawa at yk dot rim.or.jp via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107931

--- Comment #11 from ishikawa,chiaki  ---
Created attachment 54484
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54484&action=edit
Script to compile the previous source file.

The previous source file ought to be named
't-failure-always-inline-simplified.c'.
This script compiles it.
With -Og the compilation fails since always_inline functions do not get
inlined.
Without -Og, the compilation succeeds.

Of course, we can conditionalize the use of always_inline to avoid the issue.
What is exactly the compiler-defined macro when "-Og" is used on the command
line?
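Since GCC defines no macro that identifies -Og (see comment #13 above), one way to conditionalize always_inline is a project-defined switch. This is only a sketch: FORCE_INLINE and NO_FORCE_INLINE are hypothetical names that the build system would have to set, for example by passing -DNO_FORCE_INLINE together with -Og.

```c
/* Hypothetical opt-out switch: define NO_FORCE_INLINE (e.g. -DNO_FORCE_INLINE
   in -Og builds) to fall back to a plain inline hint. */
#if defined(NO_FORCE_INLINE)
#  define FORCE_INLINE static inline
#else
#  define FORCE_INLINE static inline __attribute__((always_inline))
#endif

FORCE_INLINE int square(int x)
{
    return x * x;
}

int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}
```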

[Bug ipa/107931] [12/13 Regression] -Og causes always_inline to fail since r12-6677-gc952126870c92cf2

2023-02-17 Thread ishikawa at yk dot rim.or.jp via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107931

ishikawa,chiaki  changed:

   What|Removed |Added

 CC||ishikawa at yk dot rim.or.jp

--- Comment #10 from ishikawa,chiaki  ---
Created attachment 54483
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54483&action=edit
The source file that exhibits the failure to inline always_inline functions.

I have a similar problem compiling Thunderbird mail client with GCC-12.
A couple of always_inline functions cannot be inlined and errors are diagnosed.

I am a bit perplexed that gcc-10 and gcc-11 did not seem to have the problem.
Also, I am not sure whether I am seeing the same issue discussed here,
because I think the functions are declared in the proper topological order
(that is, each callee is defined before its caller).
Yes, there is an indirect function call; that may be the reason for the failure.

I am attaching a preprocessed source file and the script to compile it to cause
the failure. 
The presence of -Og is essential. If we remove -Og, the compilation succeeds.

[Bug middle-end/83286] internal compiler error: Illegal instruction

2023-02-17 Thread nightstrike at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83286

nightstrike  changed:

   What|Removed |Added

 CC||nightstrike at gmail dot com

--- Comment #9 from nightstrike  ---
I compiled both of the preprocessed source attachments and could not reproduce
your failure using a recently built gcc 13 cross compiler from linux to
x86_64-w64-mingw32.  I used no options, "-O3 -Wall -fno-strict-aliasing
-fomit-frame-pointer -funroll-loops -fPIC", and "-O3 -Wall
-fno-strict-aliasing" based on your command lines.

Is it safe to assume that this is no longer an issue?

[Bug target/92953] Undesired if-conversion with overflow builtins

2023-02-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92953

--- Comment #5 from Andrew Pinski  ---
On x86_64 the flags get clobbered by almost every instruction, so either you
do the subtraction twice or you use a set instruction. GCC chooses the latter
... I suspect this is a general issue that shows up more often on x86_64 than
on any other target for that reason.
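For illustration only (the report's own test case is not reproduced in this message), a hypothetical function of the kind involved: the overflow flag produced by the subtraction has to be consumed before another flag-clobbering instruction runs, so turning the branch into a branchless select means either redoing the subtraction or saving the flag with a set instruction.

```c
/* Returns a - b, or fallback if the subtraction overflows.
   __builtin_sub_overflow is a GCC builtin (also supported by Clang). */
int sub_checked(int a, int b, int fallback)
{
    int r;
    if (__builtin_sub_overflow(a, b, &r))
        return fallback;
    return r;
}
```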

[Bug tree-optimization/10520] induction variable analysis not used to eliminate comparisons

2023-02-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10520

Andrew Pinski  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=108841

--- Comment #7 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #6)
> Also? To simplify things a little more?

I filed PR 108841 for that.

[Bug tree-optimization/108841] New: sometimes a < b && c < b is not optimized to MAX < b

2023-02-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108841

Bug ID: 108841
   Summary: sometimes a < b && c < b is not optimized to MAX < b
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---

Take:
```
int f1(int a0, int a1, int b, int c0, int c1)
{
  int a = a0 < a1 ? a1 : a0;
  if (a < b) {
    int c = c0 < c1 ? c1 : c0;
    if (c < b)
      return 0;
  }
  return 1;
}
int f2(int a0, int a1, int b, int c0, int c1)
{
  int a = a0 < a1 ? a1 : a0;
  int c = c0 < c1 ? c1 : c0;
  if (a < b) {
    if (c < b)
      return 0;
  }
  return 1;
}
```
These two functions should produce the same code; the only difference is that
the calculation of c is not conditional.
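The rewrite the summary asks for relies on a simple identity: for any ints, (a < b && c < b) holds exactly when MAX(a, c) < b. A small C sketch of both forms (the helper names are hypothetical) that can be checked exhaustively over a small range:

```c
/* Maximum of two ints, written the same way as in f1/f2 above. */
static int max_int(int x, int y)
{
    return x < y ? y : x;
}

/* Two separate comparisons, as in the original code. */
int both_less(int a, int c, int b)
{
    return a < b && c < b;
}

/* A single comparison against the maximum: the desired optimized form. */
int max_less(int a, int c, int b)
{
    return max_int(a, c) < b;
}
```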

[Bug tree-optimization/98966] Failure to optimize conditional or with 1 based on boolean condition to direct or

2023-02-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98966

Andrew Pinski  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |13.0

--- Comment #4 from Andrew Pinski  ---
Fixed by r13-4459-g6508d5e5a1a8c0 .

[Bug tree-optimization/98966] Failure to optimize conditional or with 1 based on boolean condition to direct or

2023-02-17 Thread gabravier at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98966

--- Comment #3 from Gabriel Ravier  ---
Appears to be fixed on trunk.
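Reading the summary literally, the pattern is an OR with 1 guarded by a boolean, which can become a plain OR with the boolean itself. A C sketch of the two forms (hypothetical function names; this is an interpretation of the title, not the report's test case):

```c
#include <stdbool.h>

/* Conditional form: set the low bit only when b is true. */
int cond_or(int x, bool b)
{
    return b ? (x | 1) : x;
}

/* Direct form: a bool converts to 0 or 1, so OR it in unconditionally. */
int direct_or(int x, bool b)
{
    return x | b;
}
```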

[Bug tree-optimization/96930] Failure to optimize out arithmetic with bigger size when it can't matter with division transformed into right shift

2023-02-17 Thread gabravier at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96930

--- Comment #11 from Gabriel Ravier  ---
It appears like this is fixed on trunk, I think ?

[Bug rtl-optimization/96692] Failure to optimize xor+or+xor to andnot+xor

2023-02-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96692

--- Comment #4 from Andrew Pinski  ---
(In reply to Gabriel Ravier from comment #3)
> This seems to be fixed on trunk now, I think ?

On x86_64-linux-gnu yes but on aarch64 it is not optimized just yet:
f(int, int, int):
eor w1, w0, w1
orr w0, w0, w2
eor w0, w1, w0
ret
f1(int, int, int):
bic w0, w2, w0
eor w0, w0, w1
ret
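The two aarch64 functions above compute the same value, because a ^ (a | c) == c & ~a. A C sketch reconstructed from the assembly (argument order inferred from the register usage, and unsigned used instead of int; neither is copied from the report):

```c
/* xor+or+xor form, matching f: (a ^ b) ^ (a | c). */
unsigned f(unsigned a, unsigned b, unsigned c)
{
    return (a ^ b) ^ (a | c);
}

/* andnot+xor form, matching f1: (c & ~a) ^ b, i.e. bic followed by eor. */
unsigned f1(unsigned a, unsigned b, unsigned c)
{
    return (c & ~a) ^ b;
}
```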

[Bug rtl-optimization/96692] Failure to optimize xor+or+xor to andnot+xor

2023-02-17 Thread gabravier at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96692

--- Comment #3 from Gabriel Ravier  ---
This seems to be fixed on trunk now, I think ?

[Bug target/95427] Failure to avoid emitting rbp initialization when doing 256-bit memory store

2023-02-17 Thread gabravier at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95427

--- Comment #2 from Gabriel Ravier  ---
Still appears to be fixed on trunk.

[Bug libstdc++/108836] std::mutex disappears in single-threaded libstdc++ builds

2023-02-17 Thread pdimov at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108836

--- Comment #4 from Peter Dimov  ---
A compromise between no mutex at all, and a mutex that is silently a no-op,
could be a no-op mutex with [[deprecated]] members, although the atomic_flag is
probably better.

[Bug target/94908] Failure to optimally optimize certain shuffle patterns

2023-02-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement
   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=53346,
   ||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=93720
  Component|tree-optimization   |target

--- Comment #4 from Andrew Pinski  ---
I think this was a target issue and maybe should be split into a couple
different bugs.

For GCC 8, aarch64 produces:
dup v0.4s, v0.s[1]
ldr q1, [sp, 16]
ldp x29, x30, [sp], 32
ins v0.s[1], v1.s[1]
ins v0.s[2], v1.s[2]
ins v0.s[3], v1.s[3]


GCC 9/10 produced the following (which is OK, though it could be improved,
and was in GCC 11):
adrp x0, .LC0
ldr q1, [sp, 16]
ldr q2, [x0, #:lo12:.LC0]
ldp x29, x30, [sp], 32
tbl v0.16b, {v0.16b - v1.16b}, v2.16b
For GCC 11+, aarch64 produces:
ldr q1, [sp, 16]
ins v1.s[0], v0.s[1]
mov v0.16b, v1.16b


Which means for aarch64, this was changed in GCC 10 and fixed fully for GCC 11
(by r11-2192-gc9c87e6f9c795b aka PR 93720 which was my patch in fact).

For x86_64, the trunk produces:

movaps  (%rsp), %xmm1
addq    $24, %rsp
shufps  $85, %xmm1, %xmm0
shufps  $232, %xmm1, %xmm0

While GCC 12 produces:

movaps  (%rsp), %xmm1
addq    $24, %rsp
shufps  $85, %xmm0, %xmm0
movaps  %xmm1, %xmm2
shufps  $85, %xmm1, %xmm2
movaps  %xmm2, %xmm3
movaps  %xmm1, %xmm2
unpckhps %xmm1, %xmm2
unpcklps %xmm3, %xmm0
shufps  $255, %xmm1, %xmm1
unpcklps %xmm1, %xmm2
movlhps %xmm2, %xmm0

This was changed with r13-2843-g3db8e9c2422d92 (aka PR 53346).

For powerpc64le, it looks ok for GCC 11:
addis 9,2,.LC0@toc@ha
addi 1,1,48
addi 9,9,.LC0@toc@l
li 0,-16
lvx 0,0,9
vperm 2,31,2,0

Both the x86_64 and the PowerPC PERM implementations could be improved to
support the insertion like the aarch64 backend does too.
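Working backwards from the assembly, the aarch64 sequences and the x86_64 trunk sequence above appear to build { a[1], b[1], b[2], b[3] } from two 4 x 32-bit vectors. A sketch of that permutation using GCC/Clang vector extensions; the element type (float), the argument order, and the function name are assumptions, since the report's source is not quoted here.

```c
typedef float v4sf __attribute__((vector_size(16)));
typedef int v4si __attribute__((vector_size(16)));

/* Result lanes: a[1], b[1], b[2], b[3]. In two-input shuffle indices,
   0-3 select from a and 4-7 select from b. */
v4sf shuffle_1567(v4sf a, v4sf b)
{
#if defined(__clang__)
    return __builtin_shufflevector(a, b, 1, 5, 6, 7);
#else
    return __builtin_shuffle(a, b, (v4si){1, 5, 6, 7});
#endif
}
```

The `#if` is only there because Clang spells the builtin differently; with GCC the `__builtin_shuffle` branch is used.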

[Bug tree-optimization/94899] Failure to optimize out add before compare with INT_MIN

2023-02-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94899

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |13.0
 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #8 from Andrew Pinski  ---
Fixed.

[Bug middle-end/19987] [meta-bug] fold missing optimizations in general

2023-02-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19987
Bug 19987 depends on bug 94899, which changed state.

Bug 94899 Summary: Failure to optimize out add before compare with INT_MIN
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94899

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/94908] Failure to optimally optimize certain shuffle patterns

2023-02-17 Thread gabravier at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908

--- Comment #3 from Gabriel Ravier  ---
Looks like this gives much better output now.

[Bug tree-optimization/94899] Failure to optimize out add before compare with INT_MIN

2023-02-17 Thread gabravier at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94899

--- Comment #7 from Gabriel Ravier  ---
I don't know if I've missed something obvious but this still appears to be
fixed.

[Bug c/108375] [10/11/12/13 Regression] Some variably modified types not detected as such

2023-02-17 Thread muecker at gwdg dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108375

--- Comment #8 from Martin Uecker  ---
PATCH: https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612245.html

[Bug c++/108243] [10/11/12/13 Regression] Missed optimization for static const std::string_view(const char*)

2023-02-17 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108243

--- Comment #7 from CVS Commits  ---
The master branch has been updated by Patrick Palka :

https://gcc.gnu.org/g:5fea1be820508e1fbc610d1a54b61c1add33c36f

commit r13-6120-g5fea1be820508e1fbc610d1a54b61c1add33c36f
Author: Patrick Palka 
Date:   Fri Feb 17 15:18:10 2023 -0500

c++: speculative constexpr and is_constant_evaluated [PR108243]

This PR illustrates that __builtin_is_constant_evaluated currently acts
as an optimization barrier for our speculative constexpr evaluation,
since we don't want to prematurely fold the builtin to false before the
expression in question undergoes manifestly constant evaluation if
appropriate (in which case the builtin must instead be folded to true).

This patch fixes this by permitting __builtin_is_constant_evaluated to
get folded as false at appropiate points, namely during cp_fold_function
and cp_fully_fold_init where we know we're done with manifestly constant
evaluation.  The function cp_fold gets a flags parameter that controls
whether we pass mce_false or mce_unknown to maybe_constant_value when
folding a CALL_EXPR.

PR c++/108243
PR c++/97553

gcc/cp/ChangeLog:

* cp-gimplify.cc (enum fold_flags): Define.
(fold_flags_t): Declare.
(cp_fold_data::genericize): Replace this data member with ...
(cp_fold_data::fold_flags): ... this.
(cp_fold_r): Adjust use of cp_fold_data and calls to cp_fold.
(cp_fold_function): Likewise.
(cp_fold_maybe_rvalue): Add an internal overload that
additionally takes and propagates a fold_flags_t parameter, and
define the existing public overload in terms of it.
(cp_fold_rvalue): Likewise.
(cp_fully_fold_init): Adjust use of cp_fold_data.
(fold_cache): Replace with ...
(fold_caches): ... this 2-element array of caches.
(get_fold_cache): Define.
(clear_fold_cache): Adjust.
(cp_fold): Add fold_flags_t parameter.  Use get_fold_cache.
Pass flags to calls to cp_fold, cp_fold_rvalue and
cp_fold_maybe_rvalue.
: If ff_mce_false is set, fold
__builtin_is_constant_evaluated to false and pass mce_false to
maybe_constant_value.

gcc/testsuite/ChangeLog:

* g++.dg/opt/is_constant_evaluated1.C: New test.
* g++.dg/opt/is_constant_evaluated2.C: New test.

[Bug c++/97553] [missed optimization] constexprness not noticed when UBsan enabled

2023-02-17 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97553

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Patrick Palka :

https://gcc.gnu.org/g:5fea1be820508e1fbc610d1a54b61c1add33c36f

commit r13-6120-g5fea1be820508e1fbc610d1a54b61c1add33c36f
Author: Patrick Palka 
Date:   Fri Feb 17 15:18:10 2023 -0500

c++: speculative constexpr and is_constant_evaluated [PR108243]

This PR illustrates that __builtin_is_constant_evaluated currently acts
as an optimization barrier for our speculative constexpr evaluation,
since we don't want to prematurely fold the builtin to false before the
expression in question undergoes manifestly constant evaluation if
appropriate (in which case the builtin must instead be folded to true).

This patch fixes this by permitting __builtin_is_constant_evaluated to
get folded as false at appropiate points, namely during cp_fold_function
and cp_fully_fold_init where we know we're done with manifestly constant
evaluation.  The function cp_fold gets a flags parameter that controls
whether we pass mce_false or mce_unknown to maybe_constant_value when
folding a CALL_EXPR.

PR c++/108243
PR c++/97553

gcc/cp/ChangeLog:

* cp-gimplify.cc (enum fold_flags): Define.
(fold_flags_t): Declare.
(cp_fold_data::genericize): Replace this data member with ...
(cp_fold_data::fold_flags): ... this.
(cp_fold_r): Adjust use of cp_fold_data and calls to cp_fold.
(cp_fold_function): Likewise.
(cp_fold_maybe_rvalue): Add an internal overload that
additionally takes and propagates a fold_flags_t parameter, and
define the existing public overload in terms of it.
(cp_fold_rvalue): Likewise.
(cp_fully_fold_init): Adjust use of cp_fold_data.
(fold_cache): Replace with ...
(fold_caches): ... this 2-element array of caches.
(get_fold_cache): Define.
(clear_fold_cache): Adjust.
(cp_fold): Add fold_flags_t parameter.  Use get_fold_cache.
Pass flags to calls to cp_fold, cp_fold_rvalue and
cp_fold_maybe_rvalue.
: If ff_mce_false is set, fold
__builtin_is_constant_evaluated to false and pass mce_false to
maybe_constant_value.

gcc/testsuite/ChangeLog:

* g++.dg/opt/is_constant_evaluated1.C: New test.
* g++.dg/opt/is_constant_evaluated2.C: New test.

[Bug c++/108833] [11/12 Regression] internal compiler error: Segmentation fault (GCC 12.1.1)

2023-02-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108833

--- Comment #9 from Jakub Jelinek  ---
Better reduced testcase that doesn't emit a warning:
struct input_t {
  template  struct range_t {
friend int >>(int &, range_t &);
range_t(char);
  };
  int read_s;
  void read() {
range_t range(':');
read_s >> range;
  }
};
int >>(int &, input_t::range_t &);

[Bug c++/108833] [11/12 Regression] internal compiler error: Segmentation fault (GCC 12.1.1)

2023-02-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108833

--- Comment #8 from Jakub Jelinek  ---
Therefore, likely dup of PR106740, but I think we want reduced testcases from
both PRs on the trunk and on branches eventually when it is fixed there too.

[Bug c++/108833] [11/12 Regression] internal compiler error: Segmentation fault (GCC 12.1.1)

2023-02-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108833

Jakub Jelinek  changed:

   What|Removed |Added

Summary|[12 Regression] internal|[11/12 Regression] internal
   |compiler error: |compiler error:
   |Segmentation fault (GCC |Segmentation fault (GCC
   |12.1.1) |12.1.1)
   Target Milestone|12.3|11.4

--- Comment #7 from Jakub Jelinek  ---
Ah, on the trunk r13-1017 also started to ICE on this testcase (r13-1016 was
fine), but r13-1018 already fixed that.  While r13-1017 was backported to the
12 and 11 branches (current 11.x ICEs too), r13-1018 was not.

[Bug rtl-optimization/98334] Failure to optimally optimize add loop to mul

2023-02-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98334

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |13.0

--- Comment #7 from Andrew Pinski  ---
(In reply to Jakub Jelinek from comment #5)
> Fixed at the RTL level, keeping open for the GIMPLE optimization.

For the testcase in comment #1 this is recorded as PR 94782.

For the original testcase in comment #0, I don't know when it was fixed on the
trunk but in sccp we get now:

final value replacement:
  result_2 = PHI 
 with expr: (int) n_4(D) * i_6(D)
 final stmt:
  result_2 = _1 * i_6(D);

instead of:

final value replacement:
  result_2 = PHI 
 with expr: (int) (n_3(D) + 4294967295) * i_6(D) + i_6(D)
 final stmt:
  result_2 = _9 + i_6(D);
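For intuition about what the sccp dump above records (a simplified stand-in with hypothetical names, not the testcase from comment #0): a loop that adds i to an accumulator n times has the final value n * i, which is the closed form final value replacement aims for. Unsigned arithmetic is used so that wraparound stays well defined.

```c
/* Adds i to result n times; the loop's final value is n * i. */
unsigned add_loop(unsigned n, unsigned i)
{
    unsigned result = 0;
    for (unsigned k = 0; k < n; k++)
        result += i;
    return result;
}

/* The closed form the optimization aims to substitute for add_loop. */
unsigned add_loop_closed_form(unsigned n, unsigned i)
{
    return n * i;
}
```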

[Bug tree-optimization/94782] Simple multiplication-related arithmetic not optimized to direct multiplication

2023-02-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94782

--- Comment #4 from Andrew Pinski  ---
The RTL level for x86_64 was fixed with
r11-6456-g4615cde5d7ef281d4b554df411f82ad707f0a54d (aka PR 98334).

[Bug tree-optimization/94782] Simple multiplication-related arithmetic not optimized to direct multiplication

2023-02-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94782

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||TREE

--- Comment #3 from Andrew Pinski  ---
It is fixed on the RTL level but not the gimple level.


Combine does it for x86_64:
Trying 7, 8 -> 9:
7: {r91:SI=r93:SI-0x1;clobber flags:CC;}
  REG_DEAD r93:SI
  REG_UNUSED flags:CC
8: {r92:SI=r91:SI*r94:SI;clobber flags:CC;}
  REG_UNUSED flags:CC
  REG_DEAD r91:SI
9: r90:SI=r92:SI+r94:SI
  REG_DEAD r94:SI
  REG_DEAD r92:SI
Successfully matched this instruction:
(set (reg:SI 90)
(mult:SI (reg:SI 93)
(reg:SI 94)))
allowing combination of insns 7, 8 and 9
original costs 4 + 12 + 4 = 20
replacement cost 12

But it fails to do it on aarch64:

Trying 7 -> 14:
7: r101:SI=r103:SI-0x1
  REG_DEAD r103:SI
   14: x0:SI=r101:SI*r104:SI+r104:SI
  REG_DEAD r101:SI
  REG_DEAD r104:SI
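The x86_64 combine dump above merges (r93 - 1) * r94 + r94 into r93 * r94. The same identity at the C level, as a sketch with hypothetical names; unsigned types are used so that every input, including n == 0, is well defined.

```c
/* Form seen before combine: subtract 1, multiply, add the multiplier back. */
unsigned mul_minus_one_plus(unsigned n, unsigned i)
{
    return (n - 1) * i + i;
}

/* Single multiplication that combine produces on x86_64. */
unsigned mul_direct(unsigned n, unsigned i)
{
    return n * i;
}
```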

[Bug tree-optimization/94782] Simple multiplication-related arithmetic not optimized to direct multiplication

2023-02-17 Thread gabravier at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94782

--- Comment #2 from Gabriel Ravier  ---
Appears to be fixed on trunk.

[Bug c++/108833] [12 Regression] internal compiler error: Segmentation fault (GCC 12.1.1)

2023-02-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108833

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jason at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek  ---
So, to be exact, this ICE has been introduced by
r12-8467-ge057d454db4dcf48c22f75e57599f797d8e55baf
on the 12 branch.

[Bug target/108840] Aarch64 doesn't optimize away shift counter masking

2023-02-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108840

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2023-02-17
   See Also|https://gcc.gnu.org/bugzill |
   |a/show_bug.cgi?id=91202 |
 Status|UNCONFIRMED |NEW
   Keywords||missed-optimization

--- Comment #1 from Andrew Pinski  ---
Confirmed:

Trying 8 -> 10:
8: r93:SI=r108:SI&0x1f
  REG_DEAD r108:SI
   10: r101:SI=r102:SI<

[Bug target/108840] New: Aarch64 doesn't optimize away shift counter masking

2023-02-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108840

Bug ID: 108840
   Summary: Aarch64 doesn't optimize away shift counter masking
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jakub at gcc dot gnu.org
  Target Milestone: ---

As mentioned in 
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612214.html
aarch64 doesn't optimize away AND instructions masking the shift count if there
is more than one shift with the same count.  Consider -O2 -fno-tree-vectorize:
int
foo (int x, int y)
{
  return x << (y & 31);
}

void
bar (int x[3], int y)
{
  x[0] <<= (y & 31);
  x[1] <<= (y & 31);
  x[2] <<= (y & 31);
}

void
baz (int x[3], int y)
{
  y &= 31;
  x[0] <<= y;
  x[1] <<= y;
  x[2] <<= y;
}

void corge (int, int, int);

void
qux (int x, int y, int z, int n)
{
  n &= 31;
  corge (x << n, y << n, z >> n);
}

foo is optimized correctly: combine matches the shift with masking.  In the
rest of the cases, the desirable combination is rejected due to costs.  A shift
with embedded masking of the count should have the same rtx_cost as a normal
shift, since under the hood it is just the shift itself.
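Why the AND is redundant on aarch64: the variable-shift instructions (LSLV and friends) use the count modulo the register width, i.e. only its low 5 bits for 32-bit registers, so masking with 31 in the source is what the hardware does anyway. A C model of that behaviour (the helper names are hypothetical; unsigned types avoid undefined behaviour):

```c
#include <stdint.h>

/* Model of a 32-bit AArch64 register shift: the count is taken modulo 32. */
static uint32_t lslv_w(uint32_t x, uint32_t count)
{
    return x << (count % 32);
}

/* Source-level masking, as in foo above. */
uint32_t shift_masked(uint32_t x, uint32_t y)
{
    return lslv_w(x, y & 31);
}

/* Same result without the AND, since the instruction masks the count itself. */
uint32_t shift_unmasked(uint32_t x, uint32_t y)
{
    return lslv_w(x, y);
}
```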

[Bug c++/108833] [12 Regression] internal compiler error: Segmentation fault (GCC 12.1.1)

2023-02-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108833

--- Comment #5 from Jakub Jelinek  ---
20220507 works (Fedora 12.1.1-1), 20230210 ICEs, on godbolt 12.2 ICEs
(supposedly 20220819), 20220628 ICEs (Fedora 12.1.1-2).

[Bug c++/108833] [12 Regression] internal compiler error: Segmentation fault (GCC 12.1.1)

2023-02-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108833

Jakub Jelinek  changed:

   What|Removed |Added

   Keywords|needs-reduction |

--- Comment #4 from Jakub Jelinek  ---
Note, even latest 12 branch ICEs on this.  Tried various trunk snapshots from
13-1 up to latest in steps of ~ 500 and nothing ICEs though, so we'd need
bisection on the branch.

[Bug rtl-optimization/107949] PPC: Unnecessary rlwinm after lbzx

2023-02-17 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107949

--- Comment #6 from Segher Boessenkool  ---
We generate loads into QImode regs, so we need to explicitly convert it to
whatever larger mode is wanted later.  We also have define_insns to do a
zero-extended load directly into a bigger pseudo, but that isn't used
apparently.

This is one instance of a much more generic problem; on rs6000 this is
usually observed as SImode being extended to DImode more often than
needed / wanted.

[Bug c++/108833] [12 Regression] internal compiler error: Segmentation fault (GCC 12.1.1)

2023-02-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108833

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
Reduced testcase:
struct input_t {
  template  struct range_t {
friend int >>(int &, range_t &);
range_t(char);
  };
  int read_s;
  void read() {
range_t range(':');
read_s >> range;
  }
};
int >>(int &, input_t::range_t &);

[Bug c++/108833] [12 Regression] internal compiler error: Segmentation fault (GCC 12.1.1)

2023-02-17 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108833

Marek Polacek  changed:

   What|Removed |Added

   Keywords|needs-bisection |
 CC||mpolacek at gcc dot gnu.org

--- Comment #2 from Marek Polacek  ---
Seems to have been fixed by

commit e8ed26c2ac38ab1f6ed5a627d9089a9243e06a0c
Author: Jason Merrill 
Date:   Tue Jun 7 15:52:30 2022 -0400

c++: non-templated friends [PR105852]

[Bug rtl-optimization/107949] PPC: Unnecessary rlwinm after lbzx

2023-02-17 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107949

Peter Bergner  changed:

   What|Removed |Added

 CC||bergner at gcc dot gnu.org

--- Comment #5 from Peter Bergner  ---
(In reply to Segher Boessenkool from comment #4)
> How would GCC know no extension is needed?  The asm template is not parsed
> at all, by design.  Making h1 an unsigned char might solve it here?

The version with the inline asm isn't what Jens is worried about; that gives
the generated code he wants (i.e., no rlwinm).  He is asking why the fully C
version of the test case adds the unneeded rlwinm.

[Bug ipa/107925] ICE in update_specialized_profile at gcc/ipa-cp.cc:5082 for 531.deepsjeng_r benchmark

2023-02-17 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107925

Martin Jambor  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |jamborm at gcc dot 
gnu.org
 Status|NEW |ASSIGNED

--- Comment #6 from Martin Jambor  ---
The assert is bogus: with the "new" division of unexplained counts in the
case of recursive functions, it can easily happen that what is left
is less than what we're trying to take away.  Having said that, there
are a few more issues with the function, chief among them not dropping
potentially guessed profiles to ipa.

I'm going to test the following:

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 4b8dedc0c51..5a6b41cf2d6 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -5093,22 +5093,24 @@ update_specialized_profile (struct cgraph_node *new_node,
profile_count redirected_sum)
 { 
   struct cgraph_edge *cs;
-  profile_count new_node_count, orig_node_count = orig_node->count;
+  profile_count new_node_count, orig_node_count = orig_node->count.ipa ();

   if (dump_file)
 { 
   fprintf (dump_file, "the sum of counts of redirected  edges is ");
   redirected_sum.dump (dump_file);
+  fprintf (dump_file, "\nold ipa count of the original node is ");
+  orig_node_count.dump (dump_file);
   fprintf (dump_file, "\n");
 }
   if (!(orig_node_count > profile_count::zero ()))
 return;

-  gcc_assert (orig_node_count >= redirected_sum);
-  
   new_node_count = new_node->count;
   new_node->count += redirected_sum;
-  orig_node->count -= redirected_sum;
+  orig_node->count
+    = lenient_count_portion_handling (orig_node->count - redirected_sum,
+                                      orig_node);

   for (cs = new_node->callees; cs; cs = cs->next_callee)
 cs->count += cs->count.apply_scale (redirected_sum, new_node_count);
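A simplified model of the reasoning behind dropping the assert (plain 64-bit integers stand in for GCC's profile_count, and the function name is hypothetical; this is not how lenient_count_portion_handling is implemented): with recursion, the redirected sum can exceed the count left in the original node, so the subtraction has to tolerate that instead of asserting orig >= redirected.

```c
#include <stdint.h>

/* Count remaining in the original node after redirecting edges to the
   clone; clamps at zero in the case where the removed assert would fire. */
uint64_t remaining_count(uint64_t orig_count, uint64_t redirected_sum)
{
    return orig_count > redirected_sum ? orig_count - redirected_sum : 0;
}
```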

[Bug tree-optimization/108839] Option for rerolling loops

2023-02-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108839

--- Comment #1 from Andrew Pinski  ---
Note the SLP vectorizer should kick in for most cases of manually unrolled
loops.

[Bug tree-optimization/108839] New: Option for rerolling loops

2023-02-17 Thread tkoenig at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108839

Bug ID: 108839
   Summary: Option for rerolling loops
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tkoenig at gcc dot gnu.org
  Target Milestone: ---

Code sometimes contains manual unrolling.  For example, the BLAS
reference implementation, subroutine DSCAL, has

  IF (INCX.EQ.1) THEN
*
*code for increment equal to 1
*
*
*clean-up loop
*
 M = MOD(N,5)
 IF (M.NE.0) THEN
DO I = 1,M
   DX(I) = DA*DX(I)
END DO
IF (N.LT.5) RETURN
 END IF
 MP1 = M + 1
 DO I = MP1,N,5
DX(I) = DA*DX(I)
DX(I+1) = DA*DX(I+1)
DX(I+2) = DA*DX(I+2)
DX(I+3) = DA*DX(I+3)
DX(I+4) = DA*DX(I+4)
 END DO
  ELSE

While such code may have been beneficial on old architectures, by
now this disturbs the compiler's own unrolling and vectorization,
and it increases code size.

It could be beneficial to have a -freroll-loops option that would
undo manual unrolling like the code above. This could be
stand-alone, or included in options such as -Os.
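To make the request concrete, here is a C transcription of the unit-stride DSCAL path above (0-based indices; the function names are hypothetical) next to the plain loop that a rerolling pass would aim to recover. Both perform the same multiplications on the same elements, so their results are identical.

```c
/* Manually unrolled form: a clean-up loop for n % 5 elements, then a
   body unrolled by 5 (mirrors the Fortran DSCAL code above). */
void dscal_unrolled(int n, double da, double *dx)
{
    int m = n % 5;
    for (int i = 0; i < m; i++)
        dx[i] = da * dx[i];
    if (n < 5)
        return;
    for (int i = m; i < n; i += 5) {
        dx[i]     = da * dx[i];
        dx[i + 1] = da * dx[i + 1];
        dx[i + 2] = da * dx[i + 2];
        dx[i + 3] = da * dx[i + 3];
        dx[i + 4] = da * dx[i + 4];
    }
}

/* Rerolled form: what the compiler's own unroller and vectorizer prefer. */
void dscal_rolled(int n, double da, double *dx)
{
    for (int i = 0; i < n; i++)
        dx[i] = da * dx[i];
}
```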

[Bug fortran/108838] New: [OpenMP] Array section of allocatable deferred-string has the wrong offset for the data component

2023-02-17 Thread burnus at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108838

Bug ID: 108838
   Summary: [OpenMP] Array section of allocatable deferred-string
has the wrong offset for the data component
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Keywords: openmp, wrong-code
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: burnus at gcc dot gnu.org
  Target Milestone: ---

The following program fails. Looking at the address passed to 
GOMP_target_enter_exit_data, it has the address of 'astr(3)' not of 'astr(4)'
as expected.

(Looks vaguely related to issue PR 108837, but internally different.)

The dump shows:
  parm.8.data = (void *) &(*(character(kind=1)[0:][1:.astr] * restrict)
  astr.data)[4 - astr.dim[0].lbound];

which is identical to:
  parm.8.data = (void *) &(*(character(kind=1)[0:][1:.astr] * restrict)
   astr.data)[1];
and it seems as if the size of '1:.astr' is effectively 0, as that would
explain the offset of 0. Probably, we want to use the array syntax only if
the UNIT_SIZE is either a constant or a SAVE_EXPR, but not if it is some other
expression.

Or in other words: We probably do not want to use the array syntax for
deferred strings.

I wonder whether it will work with CLASS, which has the
same issue (in the case where the dynamic type has more components than the
declared one).
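For reference, the expected address follows from ordinary descriptor arithmetic: with lower bound 3 and an element length of 6 characters, 'astr(4)' lies (4 - 3) * 6 = 6 bytes past the data pointer, while an element size that is effectively 0 yields offset 0, i.e. 'astr(3)'. The sketch below assumes 1-byte default-kind characters; `element_offset` is a hypothetical helper, not libgfortran or gfortran code.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of element `index` in a rank-1 array whose lower bound is
   `lbound` and whose elements are `elem_size` bytes each. */
static size_t element_offset(ptrdiff_t index, ptrdiff_t lbound, size_t elem_size)
{
    assert(index >= lbound);
    return (size_t)(index - lbound) * elem_size;
}
```

With `elem_size` = 6 the call for index 4 gives 6; with `elem_size` = 0 it gives 0, which matches the wrong address observed above.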

* * *


character(len=:), allocatable :: astr(:)
allocate(character(len=6) :: astr(3:5))

print '(z16,a)', loc(astr), ' astr'
print '(z16,a)', loc(astr(4)), ' astr4'
!$omp target enter data map(alloc: astr(4:5))
astr(3) = "01db45"
!$omp target map(alloc: astr(4:5))
  if (.not. allocated(astr)) error stop
  if (len(astr) /= 6) error stop
  if (size(astr) /= 3) error stop
  if (lbound(astr, 1) /= 3) error stop
  if (ubound(astr, 1) /= 5) error stop
  astr(4:5) = ["jk$D%S", "zutg47"]
!$omp end target
!$omp target exit data map(from: astr(4:5))
if (.not. allocated(astr)) error stop
if (len(astr) /= 6) error stop
if (size(astr) /= 3) error stop
if (lbound(astr, 1) /= 3) error stop
if (ubound(astr, 1) /= 5) error stop
print '(">",a,"<")', astr
if (any (astr /= ["01db45", "jk$D%S", "zutg47"])) error stop
end

[Bug target/108831] QImode binary ops with one zero-extracted argument can be optimized

2023-02-17 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108831

Uroš Bizjak  changed:

   What|Removed |Added

 Resolution|--- |FIXED
   Target Milestone|--- |13.0
 Status|ASSIGNED|RESOLVED

--- Comment #4 from Uroš Bizjak  ---
Implemented for gcc-13.

[Bug target/108831] QImode binary ops with one zero-extracted argument can be optimized

2023-02-17 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108831

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:6245441e124846d0c3551f312d2feef598fe251c

commit r13-6118-g6245441e124846d0c3551f312d2feef598fe251c
Author: Uros Bizjak 
Date:   Fri Feb 17 17:00:12 2023 +0100

i386: Generate QImode binary ops with high-part input register [PR108831]

Following testcase:

--cut here--
struct S
{
  unsigned char pad1;
  unsigned char val;
  unsigned short pad2;
};

unsigned char
test_add (unsigned char a, struct S b)
{
  a += b.val;

  return a;
}
--cut here--

should be compiled to something like:

addb %dh, %al

but is currently compiled to:

movzbl  %dh, %edx
addl    %edx, %eax

The patch implements insn patterns that model QImode binary ops with
high-part QImode input register.  These ops can not be encoded with
REX prefix, so only Q registers and constant memory output operands
are allowed on x86_64 targets.

2023-02-17  Uroš Bizjak  

gcc/ChangeLog:

PR target/108831
* config/i386/predicates.md
(nonimm_x64constmem_operand): New predicate.
* config/i386/i386.md (*addqi_ext_0): New insn pattern.
(*subqi_ext_0): Ditto.
(*andqi_ext_0): Ditto.
(*<code>qi_ext_0): Ditto.

gcc/testsuite/ChangeLog:

PR target/108831
* gcc.target/i386/pr108831-1.c: New test.
* gcc.target/i386/pr108831-2.c: Ditto.

[Bug tree-optimization/108819] [12/13 Regression] ICE on valid code at -O1 with "-fno-tree-ccp -fno-tree-forwprop" on x86_64-linux-gnu: tree check: expected ssa_name, have integer_cst in number_of_ite

2023-02-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108819

--- Comment #4 from Jakub Jelinek  ---
(In reply to Richard Biener from comment #3)
> --- a/gcc/tree-ssa-reassoc.cc
> +++ b/gcc/tree-ssa-reassoc.cc
> @@ -2950,6 +2950,9 @@ update_range_test (struct range_entry *range, struct
> range_entry *otherrange,
>  }
>if (stmt == NULL)
>  gcc_checking_assert (tem == op);
> +  /* When range->exp is a constant, we can use it as-is.  */
> +  else if (is_gimple_min_invariant (tem))
> +;
>/* In rare cases range->exp can be equal to lhs of stmt.
>   In that case we have to insert after the stmt rather then before
>   it.  If stmt is a PHI, insert it at the start of the basic block.  */

That would make things worse, not better (i.e. constants could appear more
often and we could trigger these problems more often), no?
forwprop/ccp etc. should optimize it later...

I wonder if we just can't do:
--- gcc/tree-ssa-reassoc.cc.jj  2023-02-16 10:41:11.0 +0100
+++ gcc/tree-ssa-reassoc.cc 2023-02-17 17:43:52.169452832 +0100
@@ -4687,6 +4687,8 @@ update_ops (tree var, enum tree_code cod
   gimple_set_uid (g, gimple_uid (stmt));
   gimple_set_visited (g, true);
   gsi_insert_before (&gsi, g, GSI_SAME_STMT);
+  gimple_stmt_iterator gsi2 = gsi_for_stmt (g);
+  fold_stmt_inplace (&gsi2);
 }
   return var;
 }
or if the in-place folding wouldn't be appropriate, at least fold it by hand if
both arguments are constants.  Though, there is also the case of commutative
ops and just the first one turned into constant etc.

[Bug tree-optimization/90838] Detect table-based ctz implementation

2023-02-17 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90838

--- Comment #21 from Wilco  ---
(In reply to Gabriel Ravier from comment #19)

> If the original code being branchless makes it faster, wouldn't that imply
> that we should use the table-based implementation when generating code for
> `__builtin_ctz` ?

__builtin_ctz is 3-4 times faster than the table implementation, so this
optimization is always worth it. This is why I believe the current situation is
not ideal since various targets still set CTZ_DEFINED_VALUE_AT_ZERO to 0 or 1.
One option would be to always allow it in Gimple (perhaps add an extra argument
for the value to return for a zero input), and at expand time check whether the
backend supports the requested value. If it doesn't, emit branches.
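At the C level, the operation proposed here (ctz with an explicit value to return for a zero input) behaves like the hypothetical helper below. It is a source-level model only, not the proposed Gimple interface: on a backend whose ctz instruction already produces `zero_val` for 0 the check can be folded away, otherwise it remains as a branch (or a conditional move).

```c
#include <assert.h>

/* Hypothetical helper: count trailing zeros, returning zero_val for x == 0.
   __builtin_ctz itself is undefined for 0, so the zero case is handled
   explicitly here. */
static inline int ctz_or(unsigned int x, int zero_val)
{
    return x == 0 ? zero_val : __builtin_ctz(x);
}
```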

[Bug libstdc++/108836] std::mutex disappears in single-threaded libstdc++ builds

2023-02-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108836

--- Comment #3 from Jonathan Wakely  ---
Whatever we do, it won't help them, as they're using libstdc++ headers from gcc
6.3.

It's possible to have a gcc build that has limited support for threading, but
not enough to support C++11 std::mutex, std::thread, etc., and in such cases it
would be odd to have a no-op mutex. Another option might be to provide a very
dumb mutex using atomic_flag as a spin lock.
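The atomic_flag spin-lock idea can be sketched with C11 `<stdatomic.h>` (libstdc++ would use the C++ `std::atomic_flag` instead). All names below are hypothetical; this is not libstdc++ code. A spin lock busy-waits while contended, which is acceptable only as a fallback where no real threading primitives exist.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal mutex built on a single atomic_flag. */
typedef struct {
    atomic_flag locked;
} spin_mutex;

#define SPIN_MUTEX_INIT { ATOMIC_FLAG_INIT }

static void spin_mutex_lock(spin_mutex *m)
{
    /* test_and_set returns the previous value: keep spinning while
       someone else already holds the flag. */
    while (atomic_flag_test_and_set_explicit(&m->locked, memory_order_acquire))
        ; /* busy-wait */
}

static bool spin_mutex_try_lock(spin_mutex *m)
{
    /* Succeeds only if the flag was previously clear. */
    return !atomic_flag_test_and_set_explicit(&m->locked, memory_order_acquire);
}

static void spin_mutex_unlock(spin_mutex *m)
{
    atomic_flag_clear_explicit(&m->locked, memory_order_release);
}
```

Acquire ordering on lock and release ordering on unlock give the usual mutex guarantees for data protected by the lock.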

[Bug libstdc++/108836] std::mutex disappears in single-threaded libstdc++ builds

2023-02-17 Thread pdimov at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108836

--- Comment #2 from Peter Dimov  ---
That's good to hear, but I don't think the issue is specific to mingw32. The
other report, https://github.com/boostorg/system/issues/92, was about "B
PLC", whatever this means. :-)

[Bug libstdc++/108836] std::mutex disappears in single-threaded libstdc++ builds

2023-02-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108836

--- Comment #1 from Andrew Pinski  ---
mingw32 no longer defaults to single-threaded, and std::mutex is supported
there on trunk for GCC 13.

[Bug fortran/108837] New: Deferred string length component of DT + array section passes the wrong array elements

2023-02-17 Thread burnus at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108837

Bug ID: 108837
   Summary: Deferred string length component of DT + array section
passes the wrong array elements
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: burnus at gcc dot gnu.org
  Target Milestone: ---

Created attachment 54482
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54482&action=edit
Draft patch - fixes the issue but does not regtest

The string handling for components still seems rather broken. Having
the string length saved twice, in a hidden string-length component and as
(elem_len / char-kind), also looks wrong, and it does not quite work, as
'allocate' does not seem to update that len component for arrays.

Additionally, it feels very wrong to have ts->u.cl.backend_decl =
int_zero_cst(...) in gfc_get_derived_type.

 * * *

However, the issue here is that, with the attached testcase, the called
function sees
   ['1a', '2b', '3c']
instead of the expected
   ['7g', '8h', '9i']
because the offset is 0, as TYPE_SIZE is 0.

 * * *

The attached patch now uses elem_len of the array descriptor which should work
and also fixes the attached testcase.

However, there are now the following failures. I have not debugged them and it is
likely that the bug there is that elem_len is not properly set.

gfortran.dg/PR100120.f90 (execution)
gfortran.dg/array_reference_3.f90 (scan dump)
gfortran.dg/class_to_type_1.f03 (execution)
gfortran.dg/class_array_21.f03 (execution)
gfortran.dg/finalize_13.f90 (execution)

Additionally, it may well be that more callers should pass the elemsz.

[Bug libstdc++/108836] New: std::mutex disappears in single-threaded libstdc++ builds

2023-02-17 Thread pdimov at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108836

Bug ID: 108836
   Summary: std::mutex disappears in single-threaded libstdc++
builds
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pdimov at gmail dot com
  Target Milestone: ---

We've been getting reports in Boost that our uses of <mutex> and std::mutex
don't work in a single-threaded build of libstdc++, so we had to add
configuration macros to avoid these issues. One example is
https://github.com/boostorg/system/commit/53c00841fc0d892bf43cda60e3ea2f05c4362b32,
another https://github.com/boostorg/url/issues/684.

Is there a reason not to make std::mutex available in single threaded builds,
with its operations being no-ops?

[Bug rtl-optimization/108805] [13 Regression] ICE: in simplify_subreg, at simplify-rtx.cc:7400 at -O and above

2023-02-17 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108805

Uroš Bizjak  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Uroš Bizjak  ---
Fixed.

[Bug rtl-optimization/108805] [13 Regression] ICE: in simplify_subreg, at simplify-rtx.cc:7400 at -O and above

2023-02-17 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108805

Uroš Bizjak  changed:

   What|Removed |Added

 CC||avieira at gcc dot gnu.org

--- Comment #4 from Uroš Bizjak  ---
Caused by g:d45ec8a732f4

[Bug rtl-optimization/108805] [13 Regression] ICE: in simplify_subreg, at simplify-rtx.cc:7400 at -O and above

2023-02-17 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108805

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:6ac3ebed5ffbac0d81c5a1d0cb1e345cfad202a8

commit r13-6117-g6ac3ebed5ffbac0d81c5a1d0cb1e345cfad202a8
Author: Uros Bizjak 
Date:   Fri Feb 17 15:58:12 2023 +0100

simplify-rtx: Fix VOIDmode operand handling in simplify_subreg [PR108805]

simplify_subreg can return VOIDmode const_int operand and will
cause ICE in simplify_gen_subreg when this operand is passed to it.

The patch uses int_outermode instead of GET_MODE of temporary as the
innermode argument of simplify_gen_subreg.

2023-02-17  Uroš Bizjak  

gcc/ChangeLog:

PR target/108805
* simplify-rtx.cc (simplify_context::simplify_subreg): Use
int_outermode instead of GET_MODE (tem) to prevent
VOIDmode from entering simplify_gen_subreg.

gcc/testsuite/ChangeLog:

PR target/108805
* gcc.dg/pr108805.c: New test.

[Bug sanitizer/108834] LTO: ltrans temporary file is used as module name in ASAN

2023-02-17 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108834

--- Comment #5 from Martin Liška  ---
Thank you, Jakub, for the investigation. I agree: using symbol names from
debuginfo seems like a nice improvement. Let me take a look at it.

[Bug tree-optimization/90838] Detect table-based ctz implementation

2023-02-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90838

--- Comment #20 from Jakub Jelinek  ---
No, because __builtin_ctz is branchless too, it just has UB when the argument
is 0.

[Bug sanitizer/108834] LTO: ltrans temporary file is used as module name in ASAN

2023-02-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108834

--- Comment #4 from Jakub Jelinek  ---
https://reviews.llvm.org/D127552
So I guess we also need to check whether (and if not, why not) we get the same
symbolization from debug info, and drop the location stuff there.

[Bug tree-optimization/90838] Detect table-based ctz implementation

2023-02-17 Thread gabravier at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90838

--- Comment #19 from Gabriel Ravier  ---
(In reply to Jakub Jelinek from comment #14)
> The patch does:
> +  bool zero_ok = CTZ_DEFINED_VALUE_AT_ZERO (TYPE_MODE (type), ctzval) == 2;
> +
> +  /* Skip if there is no value defined at zero, or if we can't easily
> +     return the correct value for zero.  */
> +  if (!zero_ok)
> +    return false;
> +  if (zero_val != ctzval && !(zero_val == 0 && ctzval == type_size))
> +    return false;
> For CTZ_DEFINED_VALUE_AT_ZERO == 1 we could support it the same way but we'd
> need to emit into the IL an equivalent of val == 0 ? zero_val : .CTZ (val)
> (with GIMPLE_COND and a separate bb - not sure if anything in forwprop creates
> new basic blocks right now), where there is a high chance that RTL opts would
> turn it back into unconditional ctz.
> That still wouldn't help non--mbmi x86, because CTZ_DEFINED_VALUE_AT_ZERO is
> 0 there.
> We could handle even that case by doing the branches around, but those would
> stay there in the generated code, at which point I wonder whether it would be
> a win.  The original code is branchless...

If the original code being branchless makes it faster, wouldn't that imply that
we should use the table-based implementation when generating code for
`__builtin_ctz` ?
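For reference, the kind of table-based ctz this PR is about typically looks like the well-known de Bruijn multiply-and-lookup idiom below (a sketch; the exact table in the PR's testcase may differ). It is branchless, and for an input of 0 it simply returns the table's first entry (0 here), which is the `zero_val` the quoted patch has to match against the target's defined ctz value at zero.

```c
#include <assert.h>
#include <stdint.h>

/* Classic de Bruijn lookup table for 32-bit count-trailing-zeros. */
static const int debruijn_ctz_table[32] = {
    0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
    31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};

static int table_ctz(uint32_t v)
{
    /* v & -v isolates the lowest set bit; multiplying it by the de Bruijn
       constant leaves a unique 5-bit pattern in the top bits, which then
       indexes the table. For v == 0 this yields debruijn_ctz_table[0]. */
    return debruijn_ctz_table[(uint32_t)((v & -v) * 0x077CB531u) >> 27];
}
```

For every nonzero input this agrees with `__builtin_ctz`, which is what lets the optimization replace the table lookup with a single ctz instruction.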

[Bug sanitizer/108834] LTO: ltrans temporary file is used as module name in ASAN

2023-02-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108834

--- Comment #3 from Jakub Jelinek  ---
This is in the arrays passed to __asan_{,un}register_globals.
Now, we emit
/* Build
   struct __asan_global
   {
 const void *__beg;
 uptr __size;
 uptr __size_with_redzone;
 const void *__name;
 const void *__module_name;
 uptr __has_dynamic_init;
 __asan_global_source_location *__location;
 char *__odr_indicator;
   } type.  */
so __module_name should be the filename the global appeared in (so for LTO
DECL_NAME of corresponding TRANSLATION_UNIT_DECL?), while __location has more
details.
But, looking on the libsanitizer side, it has
  // This structure describes an instrumented global variable.
  struct __asan_global {
    uptr beg;                // The address of the global.
    uptr size;               // The original size of the global.
    uptr size_with_redzone;  // The size with the redzone.
    const char *name;        // Name as a C string.
    const char *module_name; // Module name as a C string. This pointer is a
                             // unique identifier of a module.
    uptr has_dynamic_init;   // Non-zero if the global has dynamic initializer.
    uptr windows_padding;    // TODO: Figure out how to remove this padding
                             // that's simply here to make the MSVC incremental
                             // linker happy...
    uptr odr_indicator;      // The address of the ODR indicator symbol.
  };
so I wonder whether emitting the locations is just wasted .rodata if libasan
considers that field to be windows_padding.  In GCC 12 libsanitizer it was
still location:
--- gcc-12/libsanitizer/asan/asan_interface_internal.h  2022-04-28 15:56:17.730640966 +0200
+++ gcc/libsanitizer/asan/asan_interface_internal.h 2022-11-15 22:57:18.450207911 +0100
@@ -53,8 +53,9 @@ extern "C" {
     const char *module_name; // Module name as a C string. This pointer is a
                              // unique identifier of a module.
     uptr has_dynamic_init;   // Non-zero if the global has dynamic initializer.
-    __asan_global_source_location *location;  // Source location of a global,
-                                              // or NULL if it is unknown.
+    uptr windows_padding;    // TODO: Figure out how to remove this padding
+                             // that's simply here to make the MSVC incremental
+                             // linker happy...
     uptr odr_indicator;      // The address of the ODR indicator symbol.
   };

So I wonder what kind of mess upstream introduced again.
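The two declarations quoted above stay layout-compatible because a pointer and a `uptr` occupy the same slot, so the location pointer GCC still emits simply lands in what libasan now treats as padding. The C11 sketch below models both views to check that; it is simplified and hypothetical: `uptr` is modeled as `uintptr_t`, and `location` is typed `const void *` because `__asan_global_source_location` is not reproduced here. It assumes a target where pointers and `uintptr_t` have the same size, which the static assertions verify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t uptr; /* assumption: models libsanitizer's uptr */

/* Simplified model of the record GCC emits. */
struct gcc_asan_global {
    const void *beg;
    uptr size;
    uptr size_with_redzone;
    const void *name;
    const void *module_name;
    uptr has_dynamic_init;
    const void *location;   /* __asan_global_source_location * in GCC */
    char *odr_indicator;
};

/* Simplified model of the current libsanitizer view. */
struct libasan_global {
    uptr beg;
    uptr size;
    uptr size_with_redzone;
    const char *name;
    const char *module_name;
    uptr has_dynamic_init;
    uptr windows_padding;
    uptr odr_indicator;
};

/* The views are interchangeable only if every field lines up. */
_Static_assert(sizeof(struct gcc_asan_global) == sizeof(struct libasan_global),
               "record size mismatch");
_Static_assert(offsetof(struct gcc_asan_global, odr_indicator) ==
               offsetof(struct libasan_global, odr_indicator),
               "odr_indicator offset mismatch");
```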

[Bug tree-optimization/90838] Detect table-based ctz implementation

2023-02-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90838

--- Comment #18 from Jakub Jelinek  ---
It is generally a win for cases where the condition can't be predicted, while
if it can, jumps are much better.  We have dozens or hundreds of PRs about this
in either direction on x86.

[Bug sanitizer/108817] ASAN at -O3 failed to detect a global-buffer-overflow

2023-02-17 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108817

Martin Liška  changed:

   What|Removed |Added

 Resolution|--- |INVALID
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Martin Liška  ---
Here we prove that 'return c' only depends on the last execution of
'c = b[a+1];', which happens with a == 0, and thus we optimize the other loads away.

$ gcc pr108817.C -fsanitize=address -O3 -fdump-tree-optimized=/dev/stdout

int main ()
{
  signed char _4;
  unsigned long _8;
  int _12;
  unsigned long _15;
  bool _16;
  unsigned long _18;
  int _19;
  char _20;
  bool _21;
  signed char _22;
  signed char _23;
  unsigned long _25;
  signed char * _27;
  bool _38;

   [local count: 26541933]:
  a = 2;
  _15 = (unsigned long)   [(void *) + 4B];
(checking if  + 4 is valid in shadow memory) 
  if (_16 != 0)
goto ; [0.05%]
  else
goto ; [99.95%]

   [local count: 13271]:
  __builtin___asan_report_load4 (_15);

   [local count: 26541933]:
  _19 = b[1]; <- here we use it as the future value
  _20 = (char) _19;
  c = _20;
  a = -1;
  _12 = (int) _20;
  return _12;
}

Final note: clang does not report the issue even at -O1.

[Bug tree-optimization/90838] Detect table-based ctz implementation

2023-02-17 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90838

--- Comment #17 from Wilco  ---
(In reply to Jakub Jelinek from comment #16)
> (In reply to Wilco from comment #15)
> > It would make more sense to move x86 backends to CTZ_DEFINED_VALUE_AT_ZERO
> > == 2 so that you always get the same result even when you don't have tzcnt.
> > A conditional move would be possible, so it adds an extra 2 instructions at
> > worst (ie. still significantly faster than doing the table lookup, multiply
> > etc). And it could be optimized when you know CLZ/CTZ input is non-zero.
> 
> Conditional moves are a lottery on x86, in many cases very bad idea.  And
> when people actually use __builtin_clz*, they state that they don't care
> about the 0 value, so emitting terribly performing code for it just in case
> would be wrong.
> If forwprop emits the conditional in separate blocks for the CTZ_DVAZ!=2
> case, on targets where conditional moves are beneficial for it it can also
> emit them, or emit the jump which say on x86 will be most likely faster than
> cmov.

Well GCC emits a cmov for this (-O2 -march=x86-64-v2):

int ctz(long a)
{
  return (a == 0) ? 64 : __builtin_ctzl (a);
}

ctz:
        xor     edx, edx
        mov     eax, 64
        rep bsf rdx, rdi
        test    rdi, rdi
        cmovne  eax, edx
        ret

Note the extra 'test' seems redundant since IIRC bsf sets Z=1 if the input is
zero.

On Zen 2 this has identical performance to the plain builtin when you loop it
as res = ctz (res) + 1; (i.e. measuring the latency of the non-zero case). So I
find it hard to believe cmov is expensive on modern cores.

[Bug ipa/108226] __restrict on inlined function parameters does not function as expected

2023-02-17 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108226

--- Comment #2 from Martin Jambor  ---
(In reply to Richard Biener from comment #1)
> 
> so somehow the restrict qualification pessimizes IPA-CP?!  Martin?
> 

Well, funny thing.  Without restrict, IPA-CP sees (from release_ssa dump):

void Func3 (char * p1, int * p2)
{
  <bb 2> [local count: 1073741824]:
  *p1_3(D) = 123;
  *p2_2(D) = 1;
  Func1 (p1_3(D), p2_2(D));
  return;
}

But with restrict in Func2 parameters, Func3 becomes:

void Func3 (char * p1, int * p2)
{
  <bb 2> [local count: 1073741824]:
  *p2_2(D) = 1;
  *p1_4(D) = 123;
  Func1 (p1_4(D), p2_2(D));
  return;
}

And the different ordering of the two stores is the problem, even when
p1 is not a char pointer, because we don't trust the types of the
actual/formal parameters for TBAA (we would need to know in what types
they are read in Func1).

[Bug target/108832] [13 Regression] ICE in replace_rtx, at rtlanal.cc:3358

2023-02-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108832

--- Comment #5 from Jakub Jelinek  ---
Created attachment 54481
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54481&action=edit
gcc13-pr108832.patch

Untested fix.

[Bug sanitizer/108834] LTO: ltrans temporary file is used as module name in ASAN

2023-02-17 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108834

--- Comment #2 from Martin Liška  ---
So the module name is a string that is displayed when an ASAN error happens and
I see a discrepancy between GCC and Clang (with LTO):

$ cat jhead.i
int x;
int *p;
int main() {
  p = &x;
  *(p + 1) = 123;

  return 0;
}

$ clang jhead.i -fsanitize=address -flto && ./a.out
...
0x55fc34e4 is located 28 bytes to the left of the global variable 'p'
defined in 'jhead.i' (0x55fc3500) of size 8

$ gcc jhead.i -fsanitize=address -flto && ./a.out
...
0x00404104 is located 60 bytes before global variable 'p' defined in
'/tmp/cci9oq4s.ltrans0.o' (0x404140) of size 8

$ gcc jhead.i -fsanitize=address && ./a.out
...
0x00404104 is located 0 bytes after global variable 'x' defined in
'jhead.i' (0x404100) of size 4

So, yes, we should follow the DECL_CONTEXT.

[Bug c++/100295] Internal compiler error from generic lambda capturing parameter pack and expanding it in if constexpr

2023-02-17 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100295

Patrick Palka  changed:

   What|Removed |Added

  Known to work||12.2.1, 13.0
 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from Patrick Palka  ---
Fixed for GCC 12.3/13

[Bug target/108832] [13 Regression] ICE in replace_rtx, at rtlanal.cc:3358

2023-02-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108832

Jakub Jelinek  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
 Status|NEW |ASSIGNED

[Bug tree-optimization/108819] [12/13 Regression] ICE on valid code at -O1 with "-fno-tree-ccp -fno-tree-forwprop" on x86_64-linux-gnu: tree check: expected ssa_name, have integer_cst in number_of_ite

2023-02-17 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108819

Richard Biener  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #3 from Richard Biener  ---
I have a patch to make niter analysis more defensive, the 1 & 1 is introduced
by reassoc:

@@ -30,8 +54,8 @@
[local count: 114863530]:
   _20 = a.0_1 == 0;
   _21 = a.0_1 > 0;
-  _22 = _20 & _21;
-  if (_22 != 0)
+  _7 = 1 & 1;
+  if (_7 != 0)

where update_range_test gets a '1' as result and forces that to an SSA name
and things go downhill from that.  With

diff --git a/gcc/tree-ssa-reassoc.cc b/gcc/tree-ssa-reassoc.cc
index f163612f140..c2b30a03a9d 100644
--- a/gcc/tree-ssa-reassoc.cc
+++ b/gcc/tree-ssa-reassoc.cc
@@ -2950,6 +2950,9 @@ update_range_test (struct range_entry *range, struct
range_entry *otherrange,
 }
   if (stmt == NULL)
 gcc_checking_assert (tem == op);
+  /* When range->exp is a constant, we can use it as-is.  */
+  else if (is_gimple_min_invariant (tem))
+;
   /* In rare cases range->exp can be equal to lhs of stmt.
  In that case we have to insert after the stmt rather then before
  it.  If stmt is a PHI, insert it at the start of the basic block.  */

this is resolved (but we still get the intermediate 1 & 1 created).  Jakub,
you know this code better(?); can you see whether there's a better place to
handle this?

I'm testing the niter fortification.

[Bug target/108832] [13 Regression] ICE in replace_rtx, at rtlanal.cc:3358

2023-02-17 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108832

--- Comment #4 from Uroš Bizjak  ---
(In reply to Jakub Jelinek from comment #1)

> and so ICEs if we see the same REGNO as from in a different mode.
> I think we actually don't need most of what replace_rtx is doing, we don't
> need to simplify anything etc. because we are just changing one register to
> another and can do it in place.
> So, I think we need a different function for what the backend wants.
> It can avoid all the simplify stuff because replace_rtx was destructive, so
> could be implemented say using FOR_EACH_SUBRTX_PTR.  When seeing *loc == from,
> it obviously should set *loc = to, if it sees REG_P (*loc) && REGNO (*loc) ==
> REGNO (from), then if the mode is the same, it can also just *loc = to, but if
> it is a different mode, I'd say for narrower mode it should
> *loc = gen_rtx_REG (GET_MODE (*loc), REGNO (to));
> and for wider mode (especially if say a multi-register reg) punt.
> Not sure if such a case can occur though, but the punting would be hard if
> we have made some changes already...

There are no multi-registers in flags-setting integer instructions, we only
have instructions with implicit ZERO_EXTEND from SI to DImode in case of x86_64
target.

So, a FOR_EACH_RTX loop that blindly changes REGNOs of the RTX should do the
trick. Perhaps do it on a copied RTX, to avoid nasty surprises.

[Bug tree-optimization/90838] Detect table-based ctz implementation

2023-02-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90838

--- Comment #16 from Jakub Jelinek  ---
(In reply to Wilco from comment #15)
> It would make more sense to move x86 backends to CTZ_DEFINED_VALUE_AT_ZERO
> == 2 so that you always get the same result even when you don't have tzcnt.
> A conditional move would be possible, so it adds an extra 2 instructions at
> worst (ie. still significantly faster than doing the table lookup, multiply
> etc). And it could be optimized when you know CLZ/CTZ input is non-zero.

Conditional moves are a lottery on x86, in many cases very bad idea.  And when
people actually use __builtin_clz*, they state that they don't care about the 0
value, so emitting terribly performing code for it just in case would be wrong.
If forwprop emits the conditional in separate blocks for the CTZ_DVAZ!=2 case,
on targets where conditional moves are beneficial for it it can also emit them,
or emit the jump which say on x86 will be most likely faster than cmov.

[Bug tree-optimization/90838] Detect table-based ctz implementation

2023-02-17 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90838

--- Comment #15 from Wilco  ---
(In reply to Jakub Jelinek from comment #14)
> The patch does:
> +  bool zero_ok = CTZ_DEFINED_VALUE_AT_ZERO (TYPE_MODE (type), ctzval) == 2;
> +
> +  /* Skip if there is no value defined at zero, or if we can't easily
> +     return the correct value for zero.  */
> +  if (!zero_ok)
> +    return false;
> +  if (zero_val != ctzval && !(zero_val == 0 && ctzval == type_size))
> +    return false;
> For CTZ_DEFINED_VALUE_AT_ZERO == 1 we could support it the same way but we'd
> need to emit into the IL an equivalent of val == 0 ? zero_val : .CTZ (val)
> (with GIMPLE_COND and a separate bb - not sure if anything in forwprop creates
> new basic blocks right now), where there is a high chance that RTL opts would
> turn it back into unconditional ctz.
> That still wouldn't help non--mbmi x86, because CTZ_DEFINED_VALUE_AT_ZERO is
> 0 there.
> We could handle even that case by doing the branches around, but those would
> stay there in the generated code, at which point I wonder whether it would be
> a win.  The original code is branchless...

It would make more sense to move x86 backends to CTZ_DEFINED_VALUE_AT_ZERO == 2
so that you always get the same result even when you don't have tzcnt. A
conditional move would be possible, so it adds an extra 2 instructions at worst
(ie. still significantly faster than doing the table lookup, multiply etc). And
it could be optimized when you know CLZ/CTZ input is non-zero.

[Bug tree-optimization/108351] [13 Regression] Dead Code Elimination Regression at -O3 since r13-4240-gfeeb0d68f1c708

2023-02-17 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108351

--- Comment #5 from Martin Jambor  ---
If you rename main to something else, like bar, and so the calls to f
outside of the loop are not considered cold, you get the GCC 12
behavior.  Is this reduced from a real-world problem?

Because on the testcase IPA-CP actually does what I would like it to
do: it ignores the first parameter, because IPA-SRA is really better
placed to deal with it, and then does not duplicate f for the cold
calls.

That the GCC 12 heuristics first cloned for a constant in a
useless parameter in the loop and then, when removing it in the other
calls, happened to find out that those two share the same constant
in the second parameter, which happened to make the function shorter,
is basically luck rather than design.

[Bug testsuite/108835] New: gm2 tests at large -jNN numbers do not return

2023-02-17 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108835

Bug ID: 108835
   Summary: gm2 tests at large -jNN numbers do not return
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: aldyh at gcc dot gnu.org
  Target Milestone: ---

Running make check -j55 sometimes yields tests in the gm2/ directory that fail
to terminate, for example coroutine.x5 and testtransfer.x5.

At the very least, there should be a timeout for these tests.

[Bug target/108832] [13 Regression] ICE in replace_rtx, at rtlanal.cc:3358

2023-02-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108832

--- Comment #3 from Jakub Jelinek  ---
ICEs since r13-4224-g826c22dff6455ba32 , latent before.

[Bug target/108832] [13 Regression] ICE in replace_rtx, at rtlanal.cc:3358

2023-02-17 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108832

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Priority|P3  |P1
   Last reconfirmed||2023-02-17
   Target Milestone|--- |13.0

--- Comment #2 from Richard Biener  ---
Confirmed.

[Bug c++/108833] [12 Regression] internal compiler error: Segmentation fault (GCC 12.1.1)

2023-02-17 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108833

Richard Biener  changed:

   What|Removed |Added

Summary|internal compiler error:|[12 Regression] internal
   |Segmentation fault (GCC |compiler error:
   |12.1.1) |Segmentation fault (GCC
   ||12.1.1)
 Status|UNCONFIRMED |NEW
   Target Milestone|--- |12.3
   Last reconfirmed||2023-02-17
   Keywords||ice-on-valid-code,
   ||needs-bisection,
   ||needs-reduction
 Ever confirmed|0   |1
  Known to work||12.1.0, 13.0
  Known to fail||12.2.0

--- Comment #1 from Richard Biener  ---
Confirmed.

[Bug sanitizer/108834] LTO: ltrans temporary file is used as module name in ASAN

2023-02-17 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108834

--- Comment #1 from Richard Biener  ---
But shouldn't those all be generated early?  If not, the filename could be
derived by walking DECL_CONTEXT up to the TRANSLATION_UNIT_DECL and taking its
location.
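A minimal sketch of the suggested fallback: follow the context chain up to the translation unit and read the filename from it. The `struct decl` type and the `tu_filename` helper are hypothetical stand-ins written only for illustration; GCC's real tree nodes are reached through DECL_CONTEXT and the location machinery, not through these fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a declaration node: only the fields needed to
   show the walk.  This is not GCC's tree layout.  */
struct decl
{
  const struct decl *context;  /* enclosing decl, NULL at the top */
  int is_translation_unit;     /* nonzero for the TRANSLATION_UNIT_DECL */
  const char *file;            /* file name taken from the node's location */
};

/* Follow the context chain up to the translation unit and return the file
   recorded for it, or NULL if no translation unit is reached.  */
static const char *
tu_filename (const struct decl *d)
{
  while (d != NULL && !d->is_translation_unit)
    d = d->context;
  return d != NULL ? d->file : NULL;
}
```

Because the name comes from the original source file rather than from main_input_filename, it would not depend on the random ltrans temporary name.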

[Bug target/108832] [13 Regression] ICE in replace_rtx, at rtlanal.cc:3358

2023-02-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108832

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org,
   ||uros at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek  ---
So, this is on
(insn 263 304 17 2 (set (reg:DI 2 cx [98])
        (reg:DI 1 dx [98])) "pr108832.c":15:9 discrim 1 82 {*movdi_internal}
     (expr_list:REG_DEAD (reg:DI 1 dx [98])
        (nil)))
(insn 17 263 261 2 (parallel [
            (set (reg:CCZ 17 flags)
                (compare:CCZ (and:SI (reg:SI 2 cx [98])
                        (const_int -2 [0xfffe]))
                    (const_int 0 [0])))
            (set (reg:DI 2 cx)
                (zero_extend:DI (and:SI (reg:SI 2 cx [98])
                        (const_int -2 [0xfffe]))))
        ]) "pr108832.c":16:11 565 {*andsi_2_zext}
     (nil))
(insn 261 17 259 2 (set (reg:DI 0 ax)
        (const_int 1 [0x1])) 82 {*movdi_internal}
     (expr_list:REG_EQUIV (const_int 1 [0x1])
        (nil)))
(insn 259 261 4 2 (set (reg:DI 2 cx)
        (if_then_else:DI (ne (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (reg:DI 2 cx)
            (reg:DI 0 ax))) 1304 {*movdicc_noc}
     (expr_list:REG_DEAD (reg:CCZ 17 flags)
        (expr_list:REG_DEAD (reg:DI 0 ax)
            (nil))))
on which
;; Eliminate a reg-reg mov by inverting the condition of a cmov (#1).
;; mov r0,r1; dec r0; mov r2,r3; cmov r0,r2 -> dec r1; mov r0,r3; cmov r0, r1
(define_peephole2
 [(set (match_operand:SWI248 0 "general_reg_operand")
       (match_operand:SWI248 1 "general_reg_operand"))
  (parallel [(set (reg FLAGS_REG) (match_operand 5))
             (set (match_dup 0) (match_operand:SWI248 6))])
  (set (match_operand:SWI248 2 "general_reg_operand")
       (match_operand:SWI248 3 "general_gr_operand"))
  (set (match_dup 0)
       (if_then_else:SWI248 (match_operator 4 "ix86_comparison_operator"
                              [(reg FLAGS_REG) (const_int 0)])
         (match_dup 0)
         (match_dup 2)))]
 "TARGET_CMOVE
  && REGNO (operands[2]) != REGNO (operands[0])
  && REGNO (operands[2]) != REGNO (operands[1])
  && peep2_reg_dead_p (1, operands[1])
  && peep2_reg_dead_p (4, operands[2])
  && !reg_overlap_mentioned_p (operands[0], operands[3])"
 [(parallel [(set (match_dup 7) (match_dup 8))
             (set (match_dup 1) (match_dup 9))])
  (set (match_dup 0) (match_dup 3))
  (set (match_dup 0) (if_then_else:SWI248 (match_dup 4)
                                          (match_dup 1)
                                          (match_dup 0)))]
{
  operands[7] = SET_DEST (XVECEXP (PATTERN (peep2_next_insn (1)), 0, 0));
  operands[8] = replace_rtx (operands[5], operands[0], operands[1], true);
  operands[9] = replace_rtx (operands[6], operands[0], operands[1], true);
})

applies.  replace_rtx has two modes: !all_regs, in which it replaces only
x == from with to, and all_regs, in which case it does:
  if (all_regs
      && REG_P (x)
      && REG_P (from)
      && REGNO (x) == REGNO (from))
    {
      gcc_assert (GET_MODE (x) == GET_MODE (from));
      return to;
    }
and so it ICEs if it sees the same REGNO as from in a different mode.
I think we actually don't need most of what replace_rtx does: we don't need
to simplify anything, because we are just changing one register to another
and can do it in place.
So I think we need a different function for what the backend wants.
It can avoid all the simplify stuff because replace_rtx was already
destructive, so it could be implemented, say, using FOR_EACH_SUBRTX_PTR.
When it sees *loc == from, it obviously should set *loc = to; if it sees
REG_P (*loc) && REGNO (*loc) == REGNO (from), then if the mode is the same it
can also just set *loc = to, but if it is a different mode, I'd say for a
narrower mode it should set *loc = gen_rtx_REG (GET_MODE (*loc), REGNO (to)),
and for a wider mode (especially, say, a multi-register reg) punt.
Not sure if such a case can occur, though, but punting would be hard if we
have already made some changes...
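A minimal sketch of the in-place replacement described above, under stated assumptions: `struct rtx`, `can_replace_regno` and `replace_regno_in_place` are hypothetical, a mode is modelled only by its size in bytes, and plain recursion stands in for FOR_EACH_SUBRTX_PTR and gen_rtx_REG. Running the wider-mode check as a separate first pass is one possible way to avoid having to punt after part of the expression was already rewritten.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified RTL model (not GCC's rtx).  A mode is
   represented by its size in bytes: 4 for SImode, 8 for DImode.  */
enum rtx_kind { REG, OTHER };

struct rtx
{
  enum rtx_kind kind;
  int mode;             /* size in bytes */
  int regno;            /* meaningful only for REG */
  struct rtx *ops[2];   /* sub-expressions, NULL if absent */
};

/* Pass 1: refuse if FROM's register number appears in a wider mode than
   FROM, the case the proposal punts on.  Checking up front means we never
   punt after part of the expression was already rewritten.  */
static bool
can_replace_regno (const struct rtx *x, const struct rtx *from)
{
  if (x == NULL)
    return true;
  if (x->kind == REG && x->regno == from->regno && x->mode > from->mode)
    return false;
  return can_replace_regno (x->ops[0], from)
         && can_replace_regno (x->ops[1], from);
}

/* Pass 2 (only after pass 1 succeeded): rewrite in place.  FROM itself, or
   the same register in the same mode, becomes TO; the same register in a
   narrower mode becomes a new REG of that mode with TO's number, allocated
   from POOL as a stand-in for gen_rtx_REG.  */
static void
replace_regno_in_place (struct rtx **loc, const struct rtx *from,
                        struct rtx *to, struct rtx *pool, size_t cap,
                        size_t *used)
{
  struct rtx *x = *loc;
  if (x == NULL)
    return;
  if (x == from
      || (x->kind == REG && x->regno == from->regno && x->mode == from->mode))
    {
      *loc = to;
      return;
    }
  if (x->kind == REG && x->regno == from->regno)
    {
      assert (x->mode < from->mode);  /* wider modes rejected by pass 1 */
      assert (*used < cap);
      struct rtx *r = &pool[(*used)++];
      *r = (struct rtx) { REG, x->mode, to->regno, { NULL, NULL } };
      *loc = r;
      return;
    }
  replace_regno_in_place (&x->ops[0], from, to, pool, cap, used);
  replace_regno_in_place (&x->ops[1], from, to, pool, cap, used);
}
```

With from = (reg:DI 2 cx) and to = (reg:DI 1 dx), the (reg:SI 2 cx) inside insn 17's and:SI becomes an SImode register 1 in this model, instead of tripping the mode assertion.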

[Bug tree-optimization/108821] [11/12 Regression] LIM reissuing a violatile store when it cannot/should not

2023-02-17 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108821

Richard Biener  changed:

   What|Removed |Added

Summary|[11/12/13 Regression] LIM   |[11/12 Regression] LIM
   |reissuing a violatile store |reissuing a violatile store
   |when it cannot/should not   |when it cannot/should not
  Known to work||13.0

--- Comment #5 from Richard Biener  ---
Fixed on trunk so far.

[Bug tree-optimization/108821] [11/12/13 Regression] LIM reissuing a violatile store when it cannot/should not

2023-02-17 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108821

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:4c4f0f7acd6b96ee744ef598cbea5c7046a33654

commit r13-6114-g4c4f0f7acd6b96ee744ef598cbea5c7046a33654
Author: Richard Biener 
Date:   Fri Feb 17 12:36:44 2023 +0100

tree-optimization/108821 - store motion and volatiles

The following fixes store motion to not re-issue volatile stores
to preserve TBAA behavior since that will result in the number
of volatile accesses changing.

PR tree-optimization/108821
* tree-ssa-loop-im.cc (sm_seq_valid_bb): We can also not
move volatile accesses.

* gcc.dg/tree-ssa/ssa-lim-24.c: New testcase.
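A simplified model of why re-issuing a volatile store is wrong, assuming volatile stores are counted through a hypothetical `volatile_store` helper. The functions below are illustrative only; they are neither the actual LIM output nor the ssa-lim-24.c testcase. Store motion keeps *p in a register, and re-emitting the last store to *q on the loop exit (to keep the order of the stores) performs one more volatile access than the source does.

```c
#include <assert.h>

/* Counter standing in for observable volatile accesses (hypothetical).  */
static int volatile_stores;

static void
volatile_store (int *q, int v)
{
  *q = v;
  volatile_stores++;
}

/* Source loop: one volatile store to *q and one plain store to *p per
   iteration.  */
static void
original (int *p, int *q, int n)
{
  for (int i = 0; i < n; i++)
    {
      volatile_store (q, i);
      *p = i;
    }
}

/* Store motion applied to *p, with the store to *q re-issued on exit to
   preserve store order: the loop still does its n volatile stores and the
   exit adds one more.  */
static void
store_motion_reissued (int *p, int *q, int n)
{
  int tmp = *p;
  int last_q = *q;
  for (int i = 0; i < n; i++)
    {
      volatile_store (q, i);
      last_q = i;
      tmp = i;
    }
  volatile_store (q, last_q);   /* the re-issued volatile store */
  *p = tmp;
}
```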

[Bug sanitizer/108834] LTO: ltrans temporary file is used as module name in ASAN

2023-02-17 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108834

Martin Liška  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Target Milestone|--- |13.0
   Assignee|unassigned at gcc dot gnu.org  |marxin at gcc dot gnu.org
 Ever confirmed|0   |1
   Last reconfirmed||2023-02-17

[Bug sanitizer/108834] New: LTO: ltrans temporary file is used as module name in ASAN

2023-02-17 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108834

Bug ID: 108834
   Summary: LTO: ltrans temporary file is used as module name in
ASAN
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: sanitizer
  Assignee: unassigned at gcc dot gnu.org
  Reporter: marxin at gcc dot gnu.org
CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org,
jakub at gcc dot gnu.org, kcc at gcc dot gnu.org, marxin at gcc dot gnu.org
  Target Milestone: ---

We originally noticed this by accident in the openSUSE bugzilla:
https://bugzilla.opensuse.org/show_bug.cgi?id=1208386

$ cat jhead.i
int foo;

$ gcc -flto -fsanitize=address jhead.i -shared -fPIC -o jhead && md5sum jhead
4e0fb88f928272b4962c6dcd8b845d71  jhead
$ gcc -flto -fsanitize=address jhead.i -shared -fPIC -o jhead && md5sum jhead
e3c77e7ce9d54afb812add5b87a254d1  jhead

$ strings jhead | grep ltrans
/tmp/ccIzP3oh.ltrans0.o

it comes from ASAN module name:

...
.LC2:
        .string "./jhead.ltrans0.o"
        .section        .data.rel,"aw"
        .align 32
        .type   .LASAN0.2, @object
        .size   .LASAN0.2, 64
.LASAN0.2:
        .quad   .LASAN1.0
        .quad   4
        .quad   64
        .quad   .LC1
        .quad   .LC2

which is main_input_filename:
gcc/asan.cc:3290

Anyway, I can fix it.

[Bug c++/108833] New: internal compiler error: Segmentation fault (GCC 12.1.1)

2023-02-17 Thread liavonlida at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108833

Bug ID: 108833
   Summary: internal compiler error: Segmentation fault (GCC
12.1.1)
   Product: gcc
   Version: 12.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: liavonlida at gmail dot com
  Target Milestone: ---

Created attachment 54480
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54480=edit
I tried to make the sample minimal.

Compilation fails with a segmentation fault.


:250:8: internal compiler error: Segmentation fault
  250 |   s >> range;
      |        ^
0x1bb069e internal_error(char const*, ...)
        ???:0
0x87ce73 instantiate_decl(tree_node*, bool, bool)
        ???:0
0x898f3b instantiate_pending_templates(int)
        ???:0
0x7a4928 c_parse_final_cleanups()
        ???:0


I checked other versions at godbolt.org:

gcc 12.1.0          -> ok
gcc 12.1.1          -> fail  <<- my case, with gcc-toolset-12 on Rocky Linux 8.7
gcc 12.2.0          -> fail
gcc trunk (13.0.1)  -> ok

Minimal code is attached

[Bug tree-optimization/90838] Detect table-based ctz implementation

2023-02-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90838

--- Comment #14 from Jakub Jelinek  ---
The patch does:
+  bool zero_ok = CTZ_DEFINED_VALUE_AT_ZERO (TYPE_MODE (type), ctzval) == 2;
+
+  /* Skip if there is no value defined at zero, or if we can't easily
+     return the correct value for zero.  */
+  if (!zero_ok)
+    return false;
+  if (zero_val != ctzval && !(zero_val == 0 && ctzval == type_size))
+    return false;
For CTZ_DEFINED_VALUE_AT_ZERO == 1 we could support it the same way, but we'd
need to emit into the IL an equivalent of val == 0 ? zero_val : .CTZ (val)
(with a GIMPLE_COND and a separate bb; not sure if anything in forwprop creates
new basic blocks right now), where there is a high chance that RTL opts would
turn it back into an unconditional ctz.
That still wouldn't help x86 without -mbmi, because CTZ_DEFINED_VALUE_AT_ZERO
is 0 there.
We could handle even that case by emitting the branches around, but those
would stay in the generated code, at which point I wonder whether it would be
a win.  The original code is branchless...
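For reference, a minimal sketch of a table-based ctz of the kind this PR wants to detect, next to the branchy val == 0 ? zero_val : ctz (val) form described above. The de Bruijn constant 0x077CB531 and its table are the classic ones and are assumed here for illustration, not taken from the patch; `__builtin_ctz` is the GCC builtin, whose result is undefined for a zero argument.

```c
#include <assert.h>
#include <stdint.h>

/* Classic de Bruijn multiply-and-lookup table for 32-bit ctz.  */
static const unsigned char debruijn_table[32] = {
  0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
  31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};

/* Branchless: v & -v isolates the lowest set bit, the multiply puts a
   unique 5-bit pattern in the top bits.  For v == 0 the index is 0, so the
   result is debruijn_table[0] == 0.  */
static int
table_ctz (uint32_t v)
{
  return debruijn_table[(uint32_t) ((v & -v) * 0x077CB531U) >> 27];
}

/* Equivalent form with the zero case made explicit, since __builtin_ctz (0)
   is undefined; this is the conditional that would have to survive when the
   target does not define ctz at zero.  */
static int
guarded_ctz (uint32_t v)
{
  const int zero_val = 0;   /* what table_ctz returns for v == 0 */
  return v == 0 ? zero_val : __builtin_ctz (v);
}
```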

[Bug target/108832] New: [13 Regression] ICE in replace_rtx, at rtlanal.cc:3358

2023-02-17 Thread asolokha at gmx dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108832

Bug ID: 108832
   Summary: [13 Regression] ICE in replace_rtx, at rtlanal.cc:3358
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: asolokha at gmx dot com
  Target Milestone: ---
Target: x86_64-pc-linux-gnu

gcc 13.0.1 20230212 snapshot (g:06ca0c9abb260266b688e2c2154c72214bb47076) ICEs
when compiling the following testcase w/ -O2 -funroll-loops:

unsigned int m;
short int n;

long int
bar (unsigned int x)
{
  return x ? x : 1;
}

__attribute__ ((simd)) void
foo (void)
{
  int a;

  a = m / bar (3);
  n = 1 % bar (a << 1);
}

% x86_64-pc-linux-gnu-gcc-13 -O2 -funroll-loops -c rxkpgn0d.c
during RTL pass: peephole2
rxkpgn0d.c: In function 'foo.simdclone.7':
rxkpgn0d.c:17:1: internal compiler error: in replace_rtx, at rtlanal.cc:3358
   17 | }
      | ^
0x73a043 replace_rtx(rtx_def*, rtx_def*, rtx_def*, bool)
        /var/tmp/portage/sys-devel/gcc-13.0.1_p20230212/work/gcc-13-20230212/gcc/rtlanal.cc:3358
0xe97803 replace_rtx(rtx_def*, rtx_def*, rtx_def*, bool)
        /var/tmp/portage/sys-devel/gcc-13.0.1_p20230212/work/gcc-13-20230212/gcc/rtlanal.cc:3397
0xe97803 replace_rtx(rtx_def*, rtx_def*, rtx_def*, bool)
        /var/tmp/portage/sys-devel/gcc-13.0.1_p20230212/work/gcc-13-20230212/gcc/rtlanal.cc:3397
0x19035d6 gen_peephole2_125(rtx_insn*, rtx_def**)
        /var/tmp/portage/sys-devel/gcc-13.0.1_p20230212/work/gcc-13-20230212/gcc/config/i386/i386.md:22054
0x1c816cc peephole2_17
        /var/tmp/portage/sys-devel/gcc-13.0.1_p20230212/work/gcc-13-20230212/gcc/config/i386/i386.md:1077
0x1c816cc peephole2_19
        /var/tmp/portage/sys-devel/gcc-13.0.1_p20230212/work/gcc-13-20230212/gcc/config/i386/i386.md:1094
0x1cbb55d peephole2_46
        /var/tmp/portage/sys-devel/gcc-13.0.1_p20230212/work/gcc-13-20230212/gcc/config/i386/i386.md:5200
0xe62447 peephole2_optimize
        /var/tmp/portage/sys-devel/gcc-13.0.1_p20230212/work/gcc-13-20230212/gcc/recog.cc:4180
0xe62447 rest_of_handle_peephole2
        /var/tmp/portage/sys-devel/gcc-13.0.1_p20230212/work/gcc-13-20230212/gcc/recog.cc:4331
0xe62447 execute
        /var/tmp/portage/sys-devel/gcc-13.0.1_p20230212/work/gcc-13-20230212/gcc/recog.cc:4368

[Bug ipa/107931] [12/13 Regression] -Og causes always_inline to fail since r12-6677-gc952126870c92cf2

2023-02-17 Thread sam at gentoo dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107931

--- Comment #9 from Sam James  ---
For completeness, this originated from
https://github.com/Perl/perl5/issues/19776, I believe.

[Bug target/108831] QImode binary ops with one zero-extracted argument can be optimized

2023-02-17 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108831

--- Comment #2 from Uroš Bizjak  ---
(In reply to Uroš Bizjak from comment #1)
> The patch also handles constant memory operands on x86_64.

--cut here--
struct S
{
  unsigned char pad1;
  unsigned char val;
  unsigned short pad2;
};

unsigned char a;

void
test_and (struct S b)
{
  a &= b.val;
}
--cut here--

compiles to:

andb    %ah, a(%rip)

[Bug target/108831] QImode binary ops with one zero-extracted argument can be optimized

2023-02-17 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108831

Uroš Bizjak  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |ubizjak at gmail dot com
 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2023-02-17

--- Comment #1 from Uroš Bizjak  ---
Created attachment 54479
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54479=edit
Proposed patch

The patch also handles constant memory operands on x86_64.

[Bug target/108831] New: QImode binary ops with one zero-extracted argument can be optimized

2023-02-17 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108831

Bug ID: 108831
   Summary: QImode binary ops with one zero-extracted argument can
be optimized
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ubizjak at gmail dot com
  Target Milestone: ---

Following testcase:

--cut here--
struct S
{
  unsigned char pad1;
  unsigned char val;
  unsigned short pad2;
};

unsigned char
test_add (unsigned char a, struct S b)
{
  a += b.val;

  return a;
}
--cut here--

should be compiled to something like:

addb %dh, %al

but is currently compiled to:

movzbl  %dh, %edx
addl    %edx, %eax
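A small C model of why the single byte add is enough, assuming the struct sits in a 32-bit register `w` with `val` in bits 8..15 (the %dh byte); both helpers are hypothetical. Since only the low 8 bits of the sum are returned, zero-extending before a 32-bit add and adding the high byte directly in QImode give the same result.

```c
#include <assert.h>
#include <stdint.h>

/* Current code: zero-extend the extracted byte (movzbl %dh, %edx), then a
   32-bit add (addl); the result is truncated to 8 bits on return.  */
static uint8_t
add_zext (uint8_t a, uint32_t w)
{
  uint32_t val = (w >> 8) & 0xff;   /* zero_extract of bits 8..15 */
  return (uint8_t) ((uint32_t) a + val);
}

/* Desired code: one byte add of the high byte (addb %dh, %al).  Carries out
   of bit 7 are discarded in both versions.  */
static uint8_t
add_qi (uint8_t a, uint32_t w)
{
  uint8_t hi = (uint8_t) (w >> 8);
  return (uint8_t) (a + hi);
}
```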

[Bug testsuite/108813] gcc.target/powerpc/pr101384-2.c fails on power 9 BE

2023-02-17 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108813

Kewen Lin  changed:

   What|Removed |Added

   Last reconfirmed||2023-02-17
 CC||linkw at gcc dot gnu.org
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
  Component|target  |testsuite

--- Comment #1 from Kewen Lin  ---
This is a testsuite issue: GCC generates xxspltib rather than vspltis[whb] for
the constant vector.  The patch below fixes it:

-/* { dg-final { scan-assembler-times {\mvspltis[whb] [^\n\r]*,-1\M} 9 } } */
+/* { dg-final { scan-assembler-times {\mvspltis[whb] [^\n\r]*,-1\M|\mxxspltib [^\n\r]*,255\M} 9 } } */
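Why accepting either instruction in the scan is safe: splatting the byte 255 and splatting the word -1 both yield an all-ones vector. A minimal byte-level model, where the hypothetical `splat_byte` and `splat_word` helpers stand in for xxspltib and vspltisw on a 16-byte register:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte splat, as xxspltib does: every one of the 16 bytes gets B.  */
static void
splat_byte (uint8_t out[16], uint8_t b)
{
  memset (out, b, 16);
}

/* Word splat, as vspltisw does: each of the four 32-bit lanes gets W.  */
static void
splat_word (uint8_t out[16], int32_t w)
{
  for (int i = 0; i < 4; i++)
    memcpy (out + 4 * i, &w, sizeof w);
}
```

Because all bits are set, the two results match regardless of endianness.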

[Bug tree-optimization/108828] ivopts silencing gcc.dg/Wuse-after-free-2.c:115

2023-02-17 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108828

Richard Biener  changed:

   What|Removed |Added

   Keywords||diagnostic, testsuite-fail

--- Comment #1 from Richard Biener  ---
I would suggest adding -fno-ivopts to the testcases.  The diagnostic runs very
late and, like all static analysis we do on optimized code, is prone to IL
changes.

[Bug tree-optimization/108825] [13 Regression] error during GIMPLE pass: unrolljam

2023-02-17 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108825

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

--- Comment #10 from Richard Biener  ---
I will have a look.

[Bug tree-optimization/108821] [11/12/13 Regression] LIM reissuing a violatile store when it cannot/should not

2023-02-17 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108821

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
   Priority|P3  |P2

--- Comment #3 from Richard Biener  ---
Mine.

[Bug tree-optimization/108819] [12/13 Regression] ICE on valid code at -O1 with "-fno-tree-ccp -fno-tree-forwprop" on x86_64-linux-gnu: tree check: expected ssa_name, have integer_cst in number_of_ite

2023-02-17 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108819

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
   Priority|P3  |P2

--- Comment #2 from Richard Biener  ---
Not exactly "wrong", but yes, passes don't expect this.  I will have a look.