[Bug target/114323] [14 Regression] MVE vector load intrinsic miscompiled since r14-5622-g4d7647edfd7d98

2024-03-14 Thread prathamesh3492 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114323

prathamesh3492 at gcc dot gnu.org changed:

   What|Removed |Added

 CC||prathamesh3492 at gcc dot gnu.org

--- Comment #3 from prathamesh3492 at gcc dot gnu.org ---
Just to expand on the previous comments:
Before the patch, the input to dse is:

  uint32x4_t D.13560;
  const uint32_t D.13545[4];
  uint32x4_t V0;
  __simd128_uint32_t _7;

   :
  # .MEM_2 = VDEF <.MEM_1(D)>
  D.13545 = *.LC0;
  # .MEM_8 = VDEF <.MEM_2>
  _7 = __builtin_mve_vld1q_uv4si ();
  # .MEM_6 = VDEF <.MEM_8>
  D.13545 ={v} {CLOBBER(eos)};
  # VUSE <.MEM_6>
  return _7;

In this case, we have the following virtual def-use chain:
.MEM_1(D) -> .MEM_2 -> .MEM_8 -> .MEM_6


However, after the patch, the input to dse is:
  const uint32_t D.13539[4];
  uint32x4_t V0;

   :
  # .MEM_2 = VDEF <.MEM_1(D)>
  D.13539 = *.LC0;
  V0_3 = vld1q_u32 ();
  # .MEM_5 = VDEF <.MEM_2>
  D.13539 ={v} {CLOBBER(eos)};
  # VUSE <.MEM_5>
  return V0_3;

The call to vld1q_u32 is missing a use of .MEM_2, and since the only
remaining use of .MEM_2 is in the clobber statement, dse considers the
store to D.13539 dead and simplifies the function to:

   :
  V0_3 = vld1q_u32 ();
  # .MEM_5 = VDEF <.MEM_1(D)>
  D.13539 ={v} {CLOBBER(eos)};
  # VUSE <.MEM_5>
  return V0_3;

thus passing a pointer to uninitialized memory to vld1q_u32.

Thanks,
Prathamesh

[Bug middle-end/114347] wrong constant folding when casting __bf16 to int

2024-03-14 Thread eggert at cs dot ucla.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114347

--- Comment #2 from Paul Eggert  ---
(In reply to Andrew Pinski from comment #1)
> I am not so sure that 257.0bf16 gets rounded to 256.

It should get rounded to 256, since 257 has no exact representation in __bf16
and 256 is the closest representable value.

And GCC does this correctly in my experiments. If I compile this:

  __bf16 w = 256.0bf16;
  __bf16 x = 257.0bf16;

with "gcc -O2 -S", the assembly code says ".value 17280" for both constants.
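
The encoding can be checked by hand. What follows is a minimal sketch, not
GCC's actual conversion code: it assumes `float` is IEEE binary32, that the
constant is rounded to nearest with ties to even, and the helper name
`float_to_bf16_bits` is made up for illustration. At this magnitude bfloat16
values are 2 apart, so 257 sits halfway between 256 and 258 and the tie goes
to the even encoding, which is 256.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert a binary32 value to bfloat16 bits using
   round-to-nearest, ties-to-even.  NaN inputs are not handled here.  */
uint16_t
float_to_bf16_bits (float f)
{
  uint32_t u;
  assert (f == f);              /* reject NaN, which this sketch ignores */
  memcpy (&u, &f, sizeof u);    /* reinterpret the float's bit pattern */
  /* Add 0x7fff plus the lowest kept bit so that exact ties go to even.  */
  u += 0x7fffu + ((u >> 16) & 1u);
  return (uint16_t) (u >> 16); /* bfloat16 keeps the top 16 bits */
}
```

Calling the helper with 256.0f and with 257.0f gives the same 16-bit pattern,
0x4380, which is the 17280 seen in the assembly output.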

[Bug middle-end/114347] wrong constant folding when casting __bf16 to int

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114347

--- Comment #1 from Andrew Pinski  ---
Hmm, I am not so sure that 257.0bf16 gets rounded to 256.

[Bug c/114347] New: wrong constant folding when casting __bf16 to int

2024-03-14 Thread eggert at cs dot ucla.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114347

Bug ID: 114347
   Summary: wrong constant folding when casting __bf16 to int
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: eggert at cs dot ucla.edu
  Target Milestone: ---

This is gcc (GCC) 13.2.1 20231205 (Red Hat 13.2.1-6) on Fedora 39. I found this
bug when looking into a GNU coreutils report 
originally reported against Clang (Clang has a different bug).

Compile and run this program:

  __bf16 x = 257.0bf16;
  int
  main (void)
  {
    return (int) x != (int) 257.0bf16;
  }

with "gcc -O2 v.c; ./a.out; echo $?". This prints "1"; it should print "0".

The problem is that GCC constant-folds '(int) 257.0bf16' to 257. This is
incorrect, as 257.0bf16 is exactly equal to 256.0bf16, due to rounding when the
constant is parsed. The expression '(int) x' correctly yields 256 at runtime,
and 256 is not equal to the 257 incorrectly yielded by the constant folding.

[Bug libstdc++/81482] by-value lambda capture in remove_if

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81482

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |WONTFIX
 Status|UNCONFIRMED |RESOLVED

--- Comment #5 from Andrew Pinski  ---
Won't fix, as this is the correct behavior.

[Bug target/81759] Improve data tracking for _pext_u64

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81759

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed|2017-08-08 00:00:00 |2024-3-14

--- Comment #4 from Andrew Pinski  ---
_pext_u64 is still opaque to gimple.

[Bug target/81759] Improve data tracking for _pext_u64 and __builtin_ffsll

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81759

--- Comment #3 from Andrew Pinski  ---
(In reply to Daniel Fruzynski from comment #2)
> Looks that __builtin_ffs does not check if input value is nonzero at all.
> Assembler code for following code also has unnecessary instructions:
> 
> [code]
> unsigned int test(unsigned int n)
> {
>   if (n == 0)
>     __builtin_unreachable();
>   return __builtin_ffs(n) - 1;
> }
> [/code]

The above is now handled since GCC 11.
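
For reference, a small sketch of why the zero check is redundant once the
compiler knows n is nonzero: in that case `__builtin_ffs (n) - 1` is simply
the index of the lowest set bit, the same value `__builtin_ctz (n)` returns.
The function name `lowest_set_bit` is made up for illustration, and the
`assert` stands in for the `__builtin_unreachable ()` precondition of the
quoted test.

```c
#include <assert.h>

/* Hypothetical helper: index of the lowest set bit of a nonzero value.  */
unsigned int
lowest_set_bit (unsigned int n)
{
  assert (n != 0);               /* same precondition as the quoted test */
  return __builtin_ffs (n) - 1;  /* ffs is 1-based; 0 means no bit is set */
}
```

For any nonzero n the result matches `__builtin_ctz (n)`, so no separate
handling of the zero case is needed in the generated code.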

[Bug tree-optimization/114346] New: vectorizer generates the same IV twice

2024-03-14 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114346

Bug ID: 114346
   Summary: vectorizer generates the same IV twice
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---

The following example:

---
double f(int n, double *data, double b) {
double res = b;

for (int i=0;i
  _48 = vect_vec_iv_.7_45 + { POLY_INT_CST [2, 2], ... };
  _71 = VIEW_CONVERT_EXPR(vect_vec_iv_.7_45);
  _72 = VIEW_CONVERT_EXPR({ POLY_INT_CST [4, 4],
... });
  _73 = _71 + _72;
  _49 = VIEW_CONVERT_EXPR(_73);

so it looks like _48 and _49 are the same value, except that _48 is done as
32-bit IV and _49 is calculated as a 64-bit one and truncated to 32?

[Bug tree-optimization/114345] FRE missing knowledge of semantics of IFN loads

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114345

Andrew Pinski  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106365

--- Comment #2 from Andrew Pinski  ---
Oh, VN does have some knowledge of MASK_STORE and LEN_STORE, just not
LOAD_LANES.


See PR 106365 for MASK_STORE and LEN_STORE implementation. Shouldn't be hard to
add LOAD_LANES/STORE_LANES there ...

[Bug tree-optimization/114345] FRE missing knowledge of semantics of IFN loads

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114345

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2024-03-15
 CC||pinskia at gcc dot gnu.org
 Status|UNCONFIRMED |NEW
   Severity|normal  |enhancement

--- Comment #1 from Andrew Pinski  ---
Confirmed.  I thought there was already a bug recording this but I can't find
it.

[Bug c++/83777] Invalid dependent initialization of a static data member.

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83777

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed|2018-01-11 00:00:00 |2024-3-14
 Status|WAITING |NEW
   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86565

--- Comment #3 from Andrew Pinski  ---
I am not 100% sure if rejecting this is required if C:BlockSize is not
used.

[Bug tree-optimization/114345] New: FRE missing knowledge of semantics of IFN loads

2024-03-14 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114345

Bug ID: 114345
   Summary: FRE missing knowledge of semantics of IFN loads
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---

The following testcase:

---
long tdiff = 10412095;

int main() {
  struct {
    long maximum;
    int nonprimary_delay;
  } delays[] = {{}, {}, {}, {9223372036854775807, 36 * 60 * 60}};

  for (unsigned i = 0; i < sizeof(delays) / sizeof(delays[0]); ++i)
    if (tdiff <= delays[i].maximum)
      return delays[i].nonprimary_delay;

  __builtin_abort();
}
---

compiled with -O2 -fno-vect-cost-model

generates on AArch64:

  vect_cst__45 = {tdiff.0_2, tdiff.0_2};
  vect_array.11 = .LOAD_LANES (MEM  [(long int *)]);
  vect__1.12_40 = vect_array.11[0];
  vect_array.11 ={v} {CLOBBER};
  vect_array.14 = .LOAD_LANES (MEM  [(long int *) + 32B]);
  vect__1.15_43 = vect_array.14[0];
  vect_array.14 ={v} {CLOBBER};
  mask_patt_15.17_46 = vect__1.12_40 >= vect_cst__45;
  mask_patt_15.17_47 = vect__1.15_43 >= vect_cst__45;
  vexit_reduc_51 = mask_patt_15.17_46 | mask_patt_15.17_47;

and on x86_64:

  vect_cst__53 = {tdiff.0_2, tdiff.0_2};
  _37 = { 0, 4294967295, 4294967294, 4294967293 };
  _32 = { 4, 5, 6, 7 };
  vect__1.11_42 = MEM  [(long int *)];
  vectp_delays.9_43 =  + 16;
  vect__1.12_44 = MEM  [(long int *)vectp_delays.9_43];
  vect_perm_even_45 = VEC_PERM_EXPR ;
  vectp_delays.9_47 =  + 32;
  vect__1.13_48 = MEM  [(long int *)vectp_delays.9_47];
  vectp_delays.9_49 =  + 48;
  vect__1.14_50 = MEM  [(long int *)vectp_delays.9_49];
  vect_perm_even_51 = VEC_PERM_EXPR ;
  mask_patt_17.15_54 = vect_perm_even_45 >= vect_cst__53;
  mask_patt_17.15_55 = vect_perm_even_51 >= vect_cst__53;
  vexit_reduc_59 = mask_patt_17.15_54 | mask_patt_17.15_55;

which is eventually simplified by FRE into:

  vect_cst__53 = {tdiff.0_2, tdiff.0_2};
  mask_patt_17.15_54 = vect_cst__53 <= { 0, 0 };
  mask_patt_17.15_55 = vect_cst__53 <= { 0, 9223372036854775807 };
  vexit_reduc_59 = mask_patt_17.15_54 | mask_patt_17.15_55;

and realizing that the loads aren't needed.

It looks like the reason is that FRE doesn't understand LOAD_LANES and
MASKED_LOAD_LANES or the other load IFNs.

We thus end up with a spill to the stack and a load of the constants.

[Bug other/70268] add option -ffile-prefix-map to map one directory name (old) to another (new) in __FILE__, __BASE_FILE__and __builtin_FILE()

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70268

Andrew Pinski  changed:

   What|Removed |Added

 CC||joerg at netbsd dot org

--- Comment #19 from Andrew Pinski  ---
*** Bug 47047 has been marked as a duplicate of this bug. ***

[Bug preprocessor/47047] Support for path translation in __FILE__

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47047

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #8 from Andrew Pinski  ---
Yes this is a dup.

*** This bug has been marked as a duplicate of bug 70268 ***

[Bug tree-optimization/88823] ivopts introduces -1(OVF)

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88823

--- Comment #2 from Andrew Pinski  ---
Looks to be fixed on the trunk.

[Bug tree-optimization/88926] ivopts with some NOP conversions

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88926

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed|2019-01-21 00:00:00 |2024-3-14

--- Comment #3 from Andrew Pinski  ---
Note you need to do `s/*p = 0;/*p = 1;/` otherwise you end up with just memset.

[Bug ipa/89567] [missed-optimization] Should not be initializing unused struct parameter members

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89567

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

--- Comment #6 from Andrew Pinski  ---
IPA-SRA does handle this if the function is static.

Also mod-ref handles this if the function takes a pointer instead of a struct.
Aka:
```
  struct two_ints { int x, y; };

  __attribute__((noinline)) int foo2(struct two_ints *s)
  {
    return s->x;
  }

  int bar2(int* a)
  {
    struct two_ints ti = { a[5], a[10] };
    int b = foo2(&ti);
    return b * b;
  }
```

Only the store to ti.x is there now.

I think this can be closed as fixed ...

[Bug rtl-optimization/43473] hword size destination variable induces suboptimal code generation compared to full word size var

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43473

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||missed-optimization

--- Comment #3 from Andrew Pinski  ---
The difference is:
for ma:
```
(insn 11 10 13 2 (set (reg:SI 122)
(ior:SI (subreg:SI (reg:HI 119 [ a ]) 0)
(const_int -16384 [0xc000]))) "/app/example.cpp":9:11
110 {*iorsi3_insn}
 (expr_list:REG_DEAD (reg:HI 119 [ a ])
(nil)))
```

vs for mb:
```
(insn 9 8 10 2 (set (reg:SI 121 [ _3 ])
(ior:SI (reg:SI 120 [ b ])
(const_int 49152 [0xc000]))) "/app/example.cpp":16:22 110
{*iorsi3_insn}
 (expr_list:REG_DEAD (reg:SI 120 [ b ])
(nil)))
```

[Bug rtl-optimization/29860] comment / code incosistency in cfgcleanup.c:flow_find_cross_jump

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29860

Andrew Pinski  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=20070
 CC|gcc-bugs at gcc dot gnu.org|

--- Comment #1 from Andrew Pinski  ---
r0-127486-ga0cbe71e87398b changed the code slightly.

It was originally added with r0-39385-g08f7f057cc4762 and then moved slightly
by r0-39553-gd1ee6d9bb7d372.

I don't know if Honza would remember any of this code though since it was done
over 20 years ago.

[Bug target/111555] [AArch64] __ARM_FEATURE_UNALIGNED should be undefined with -mstrict-align

2024-03-14 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111555

Fangrui Song  changed:

   What|Removed |Added

 CC||i at maskray dot me

--- Comment #5 from Fangrui Song  ---
It seems that newer ports prefer -mstrict-align/-mno-strict-align to
-mno-unaligned-access/-munaligned-access.
Clang handling these options as aliases is unfortunate. I'll fix this issue in 
https://github.com/llvm/llvm-project/pull/85350 (hopefully milestone: 19.1.0)

[Bug target/114334] [14 Regression] ICE: in extract_insn, at recog.cc:2812 (unrecognizable insn and:HF?) with lroundf16() and -ffast-math -mavx512fp16

2024-03-14 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114334

Hongtao Liu  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2024-03-15
 CC||liuhongt at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Hongtao Liu  ---
Mine

[Bug target/110027] [11/12/13/14 regression] Misaligned vector store on detect_stack_use_after_return

2024-03-14 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110027

--- Comment #15 from Hongtao Liu  ---
A patch is posted at
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647604.html

[Bug testsuite/114343] [13 regression] many erratic errors starting with r13-8433-g1277f69b9b0206

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114343

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-03-15
   Keywords||testsuite-fail
 Ever confirmed|0   |1

--- Comment #2 from Andrew Pinski  ---
.

[Bug target/114344] [arm/mips] __alignof__ report a member packed struct as 1, while normal load/store instruction is used

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114344

--- Comment #2 from Andrew Pinski  ---
Note __alignof__ might say 1, but what alignof reports and what GCC knows
about the alignment of the decl are 2 different things.

[Bug target/114344] [arm/mips] __alignof__ report a member packed struct as 1, while normal load/store instruction is used

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114344

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #1 from Andrew Pinski  ---
(insn 5 4 6 (set (reg/f:SI 115)
(symbol_ref:SI ("*.LANCHOR0") [flags 0x182])) "/app/example.cpp":13:13
-1
 (nil))

(insn 6 5 7 (set (reg:SI 116)
(const_int 305419896 [0x12345678])) "/app/example.cpp":13:13 -1
 (nil))

(insn 7 6 0 (set (mem/v/c:SI (plus:SI (reg/f:SI 115)
(const_int 4 [0x4])) [2 sD.6114.iD.6110+0 S4 A32])
(reg:SI 116)) "/app/example.cpp":13:13 -1
 (nil))

.align  2
.set.LANCHOR0,. + 0
.type   s, %object
.size   s, 28
s:

The s variable is still aligned to 4 bytes so s.i is still aligned.

Doing:
`s __attribute__((aligned(1)));`

makes the alignment of the variable s unknown, and we then get the multiple stores.
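
To make the distinction concrete, here is a minimal sketch using the same
layout as the struct in the report (without the volatile qualifier). The
helper names `member_align` and `member_offset` are made up for illustration,
and the size check assumes the usual 4-byte int and float and 8-byte long long
and double.

```c
#include <assert.h>
#include <stddef.h>

/* Same member layout as the struct in the report.  */
struct s
{
  char c[4];
  int i;
  long long l;
  float f;
  double d;
} __attribute__ ((packed));

/* 4 + 4 + 8 + 4 + 8 bytes, matching ".size s, 28" in the dump above.  */
static_assert (sizeof (struct s) == 28, "packed layout has no padding");

/* Alignment guaranteed by the type for the packed member.  */
size_t
member_align (void)
{
  return __alignof__ (((struct s *) 0)->i);
}

/* Byte offset of the member inside the struct.  */
size_t
member_offset (void)
{
  return offsetof (struct s, i);
}
```

`member_align ()` is 1 because the packed type only guarantees byte
alignment, while `member_offset ()` is 4. When the compiler can see that a
particular object is itself 4-byte aligned, an offset of 4 puts s.i on a
4-byte boundary, which is why a single word store can be used for that
object.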

[Bug tree-optimization/106119] [12 Regression] Bogus use-after-free warning triggered by optimizer

2024-03-14 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106119

Jeffrey A. Law  changed:

   What|Removed |Added

Summary|[12/13/14 Regression] Bogus |[12 Regression] Bogus
   |use-after-free warning  |use-after-free warning
   |triggered by optimizer  |triggered by optimizer
 CC||law at gcc dot gnu.org

--- Comment #7 from Jeffrey A. Law  ---
Works with gcc-13 and the trunk.  Adjusting regression markers.

[Bug tree-optimization/106238] [12 regression] Inline optimization causes dangling pointer warning on "include/c++/12.1.0/bits/stl_tree.h"

2024-03-14 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106238

Jeffrey A. Law  changed:

   What|Removed |Added

Summary|[12/13/14 regression]   |[12 regression] Inline
   |Inline optimization causes  |optimization causes
   |dangling pointer warning on |dangling pointer warning on
   |"include/c++/12.1.0/bits/st |"include/c++/12.1.0/bits/st
   |l_tree.h"   |l_tree.h"
 CC||law at gcc dot gnu.org

--- Comment #11 from Jeffrey A. Law  ---
Adjusting regression markers per c#10.

[Bug target/106342] [12/13/14 Regression] internal compiler error: in extract_insn, at recog.cc:2791 since r12-4240-g2b8453c401b699

2024-03-14 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106342

Jeffrey A. Law  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 CC||law at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #12 from Jeffrey A. Law  ---
Per c#8 and c#10.

[Bug target/114344] New: [arm/mips] __alignof__ report a member packed struct as 1, while normal load/store instruction is used

2024-03-14 Thread syq at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114344

Bug ID: 114344
   Summary: [arm/mips] __alignof__ report a member packed struct
as 1, while normal load/store instruction is used
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: syq at gcc dot gnu.org
  Target Milestone: ---

#include <stdio.h>

volatile
struct s
{
  char c[4];
  int i;
  long long l;
  float f;
  double d;
} __attribute__ ((packed)) s;

int main() {
s.i = 0x12345678;
printf ("%zd\n", __alignof__ (s.i));
}


For this code, `1` is printed as the alignment of s.i, while on MIPS and ARM
normal (aligned) load/store instructions are emitted.


$ mipsel-linux-gnu-gcc -Wall -mabi=32 -c -O3 xx.c -mips32r2
$ objdump -d xx.o
  ...
  34:   ac620004sw  v0,4(v1)
  ...
If `__alignof__ (s.i)` is reported correctly, `swl/swr` should be used instead
of `sw`.

And for 
$ mipsel-linux-gnu-gcc -c -O3 xx.c -mips32r6 -mno-unaligned-access
$ arm-linux-gnueabihf-gcc -c -O3 xx.c -mno-unaligned-access

four store-byte instructions should be used, while in fact `sw/str` is used here.

[Bug tree-optimization/106757] [12/13 Regression] Incorrect "writing 1 byte into a region of size 0" on a vectorized loop

2024-03-14 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106757

Jeffrey A. Law  changed:

   What|Removed |Added

Summary|[12/13/14 Regression]   |[12/13 Regression]
   |Incorrect "writing 1 byte   |Incorrect "writing 1 byte
   |into a region of size 0" on |into a region of size 0" on
   |a vectorized loop   |a vectorized loop
 CC||law at gcc dot gnu.org

--- Comment #6 from Jeffrey A. Law  ---
Works correctly on the trunk.  Adjusting regression markers.

[Bug sanitizer/113430] [11/12/13 only] Trivial program segfaults intermittently with ASAN with large CONFIG_ARCH_MMAP_RND_BITS in kernel configuration

2024-03-14 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113430

--- Comment #10 from Sam James  ---
I don't plan on pursuing it myself, leaving it to someone else, as I can't
reproduce on my main workstation and I don't want to faff w/ kernel config.

[Bug sanitizer/113430] [11/12/13 only] Trivial program segfaults intermittently with ASAN with large CONFIG_ARCH_MMAP_RND_BITS in kernel configuration

2024-03-14 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113430

--- Comment #9 from Sam James  ---
Created attachment 57708
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57708&action=edit
0001-libsanitizer-fix-ASAN-with-aggressive-CONFIG_ARCH_MM.patch

Untested patch for 13.

[Bug tree-optimization/106842] [12 Regression] misleading warning : iteration X invokes undefined behavior

2024-03-14 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106842

Jeffrey A. Law  changed:

   What|Removed |Added

Summary|[12/13/14 Regression]   |[12 Regression] misleading
   |misleading warning :|warning : iteration X
   |iteration X invokes |invokes undefined behavior
   |undefined behavior  |
 CC||law at gcc dot gnu.org

--- Comment #8 from Jeffrey A. Law  ---
Fixed in gcc-13.

[Bug tree-optimization/106931] [12 Regression] -Wstringop-overflow false positive -O3 -fno-tree-vectorize with loop unrolling since r12-3300-gece28da924ddda8b

2024-03-14 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106931

Jeffrey A. Law  changed:

   What|Removed |Added

Summary|[12/13/14 Regression]   |[12 Regression]
   |-Wstringop-overflow false   |-Wstringop-overflow false
   |positive  -O3   |positive  -O3
   |-fno-tree-vectorize with|-fno-tree-vectorize with
   |loop unrolling since|loop unrolling since
   |r12-3300-gece28da924ddda8b  |r12-3300-gece28da924ddda8b
 CC||law at gcc dot gnu.org

--- Comment #4 from Jeffrey A. Law  ---
False positive is fixed w/ gcc-13 and the trunk.

[Bug c++/107138] [12 regression] std::variant triggers false-positive 'may be used uninitialized' warning

2024-03-14 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107138

Jeffrey A. Law  changed:

   What|Removed |Added

Summary|[12/13/14 regression]   |[12 regression]
   |std::variant triggers   |...> triggers
   |false-positive 'may be used |false-positive 'may be used
   |uninitialized' warning  |uninitialized' warning
 CC||law at gcc dot gnu.org

--- Comment #14 from Jeffrey A. Law  ---
Works with gcc-13 and gcc-14.  Adjusting regression markers.

[Bug sanitizer/89323] Asan memory leak detection on 32bit x86 linux platform

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89323

--- Comment #7 from Andrew Pinski  ---
This might work but I can't test it with x32:
```
diff --git a/libsanitizer/configure.tgt b/libsanitizer/configure.tgt
index 77a0e68222b..eb99edefbd3 100644
--- a/libsanitizer/configure.tgt
+++ b/libsanitizer/configure.tgt
@@ -25,9 +25,9 @@ case "${target}" in
   x86_64-*-freebsd* | i?86-*-freebsd*)
;;
   x86_64-*-linux* | i?86-*-linux*)
+   LSAN_SUPPORTED=yes
if test x$ac_cv_sizeof_void_p = x8; then
TSAN_SUPPORTED=yes
-   LSAN_SUPPORTED=yes
TSAN_TARGET_DEPENDENT_OBJECTS=tsan_rtl_amd64.lo
HWASAN_SUPPORTED=yes
fi

```

[Bug sanitizer/113430] [12/13 only] Trivial program segfaults intermittently with ASAN with large CONFIG_ARCH_MMAP_RND_BITS in kernel configuration

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113430

Andrew Pinski  changed:

   What|Removed |Added

Summary|Trivial program segfaults   |[12/13 only] Trivial
   |intermittently with ASAN|program segfaults
   |with large  |intermittently with ASAN
   |CONFIG_ARCH_MMAP_RND_BITS   |with large
   |in kernel configuration |CONFIG_ARCH_MMAP_RND_BITS
   ||in kernel configuration
   Target Milestone|--- |12.4

[Bug sanitizer/113430] Trivial program segfaults intermittently with ASAN with large CONFIG_ARCH_MMAP_RND_BITS in kernel configuration

2024-03-14 Thread dmjpp at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113430

Dimitrij Mijoski  changed:

   What|Removed |Added

 CC||dmjpp at hotmail dot com

--- Comment #8 from Dimitrij Mijoski  ---
This bug manifested widely on the GitHub Actions CI/CD system in the last few
days most likely because Ubuntu's kernel also got updated to use 32 random
bits. Here is the bug report
https://github.com/actions/runner-images/issues/9491 . It would be a good idea
to backport the fix.

[Bug testsuite/114343] [13 regression] many erratic errors starting with r13-8433-g1277f69b9b0206

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114343

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |13.3
   Assignee|dmalcolm at gcc dot gnu.org|unassigned at gcc dot gnu.org
  Component|analyzer|testsuite
   Severity|normal  |blocker

[Bug analyzer/114343] [13 regression] many erratic errors starting with r13-8433-g1277f69b9b0206

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114343

--- Comment #1 from Andrew Pinski  ---
No, there is a missing `}` in the line that was added to
testsuite/gcc.dg/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c :


/* { dg-bogus "may result in an unaligned pointer value" "Fixed in
r14-6517-gb7e4a4c626e" { xfail short_enums } */

There is a missing `}` in that comment.
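
A quick way to see the imbalance is to count the braces in the directive. The
checker below is only an illustrative sketch (the name `brace_balance` is made
up, and it is not how DejaGnu parses directives); it ignores quoting, which is
enough here because the quoted strings contain no braces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the number of '{' minus the number of '}'
   in a directive line.  Zero means balanced.  Braces inside quoted
   strings are not treated specially.  */
int
brace_balance (const char *line)
{
  int depth = 0;
  assert (line != NULL);
  for (; *line != '\0'; line++)
    {
      if (*line == '{')
        depth++;
      else if (*line == '}')
        depth--;
    }
  return depth;
}
```

Applied to the directive quoted above, the helper returns 1 (two opening
braces, one closing brace), which is consistent with the "unmatched open brace
in list" error; with one more ` }` before the closing `*/` it returns 0.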

[Bug analyzer/114343] New: [13 regression] many erratic errors starting with r13-8433-g1277f69b9b0206

2024-03-14 Thread seurer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114343

Bug ID: 114343
   Summary: [13 regression] many erratic errors starting with
r13-8433-g1277f69b9b0206
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: analyzer
  Assignee: dmalcolm at gcc dot gnu.org
  Reporter: seurer at gcc dot gnu.org
  Target Milestone: ---

g:1277f69b9b020688618bd034d3ceb03395e84326, r13-8433-g1277f69b9b0206

I am seeing many erratic errors starting with this revision.  Examples from two
different runs on the same revision:

FAIL: gcc.dg/20040622-2.c (test for excess errors)
FAIL: gcc.dg/cpp/defined.c  (test for errors, line 24)
FAIL: gcc.dg/cpp/defined.c  (test for errors, line 34)
FAIL: gcc.dg/cpp/defined.c  (test for errors, line 56)
FAIL: gcc.dg/cpp/defined.c  (test for errors, line 66)
FAIL: gcc.dg/cpp/defined.c  (test for errors, line 73)
FAIL: gcc.dg/cpp/defined.c  (test for errors, line 77)
FAIL: gcc.dg/cpp/defined.c  (test for errors, line 82)
FAIL: gcc.dg/cpp/include2.c  (test for errors, line 10)
FAIL: gcc.dg/cpp/include2.c (test for excess errors)
FAIL: gcc.dg/cpp/line2.c line # too high at line 13 (test for errors, line 2)
FAIL: gcc.dg/cpp/line2.c line # too low at line 12 (test for errors, line 1)
FAIL: gcc.dg/cpp/mac-dir-2.c  (test for errors, line 12)
FAIL: gcc.dg/cpp/multiline-2.c (test for excess errors)
FAIL: gcc.dg/cpp/skipping2.c (test for excess errors)
FAIL: gcc.dg/cpp/skipping2.c tokens after #endif (test for errors, line 13)
ERROR: tcl error sourcing
/home/seurer/gcc/git/gcc-13/gcc/testsuite/gcc.dg/analyzer/analyzer.exp.
ERROR: unmatched open brace in list


FAIL: gcc.dg/cpp/include2.c  (test for errors, line 10)
FAIL: gcc.dg/cpp/include2.c (test for excess errors)
FAIL: gcc.dg/cpp/include2a.c  (test for errors, line 10)
FAIL: gcc.dg/cpp/include2a.c (test for excess errors)
FAIL: gcc.dg/cpp/include2a.c missing at line 15 (test for errors, line )
FAIL: gcc.dg/cpp/multiline-2.c (test for excess errors)
FAIL: gcc.dg/tree-ssa/20030920-1.c (test for excess errors)
ERROR: tcl error sourcing
/home/seurer/gcc/git/gcc-13-test/gcc/testsuite/gcc.dg/analyzer/analyzer.exp.
ERROR: unmatched open brace in list


The errors are like this:

Excess errors:
/home/seurer/gcc/git/gcc-13-test/gcc/testsuite/gcc.dg/cpp/include2.c:10:18:
warning: extra tokens at end of #include directive

Line 10 has a comment after the #include:

#include >  /* { dg-error "extra tokens" } */

or maybe it is now mishandling the \>.

Excess errors:
/home/seurer/gcc/git/gcc-13-test/gcc/testsuite/gcc.dg/cpp/include2a.c:10:18:
warning: extra tokens at end of #include directive
/home/seurer/gcc/git/gcc-13-test/gcc/testsuite/gcc.dg/cpp/include2a.c:10:48:
warning: missing terminating " character


The specific ones that fail vary from run to run but that last ERROR one is
always there.

commit 1277f69b9b020688618bd034d3ceb03395e84326 (HEAD)
Author: Torbjörn SVENSSON 
Date:   Sat Mar 9 09:40:07 2024 +0100

testsuite: xfail test for short_enums

[Bug sanitizer/89323] Asan memory leak detection on x86 platform

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89323

Andrew Pinski  changed:

   What|Removed |Added

 Target||i?86
 Status|WAITING |NEW

--- Comment #6 from Andrew Pinski  ---
  x86_64-*-linux* | i?86-*-linux*)
if test x$ac_cv_sizeof_void_p = x8; then
TSAN_SUPPORTED=yes
LSAN_SUPPORTED=yes
TSAN_TARGET_DEPENDENT_OBJECTS=tsan_rtl_amd64.lo
HWASAN_SUPPORTED=yes
fi

[Bug target/113934] Switch avr to LRA

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113934

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2024-03-15

--- Comment #3 from Andrew Pinski  ---
.

[Bug middle-end/59863] const array in function is placed on stack

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59863

Andrew Pinski  changed:

   What|Removed |Added

 CC||barry.revzin at gmail dot com

--- Comment #8 from Andrew Pinski  ---
*** Bug 99091 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/99091] local array not prompted to static

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99091

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #4 from Andrew Pinski  ---
Dup.

*** This bug has been marked as a duplicate of bug 59863 ***

[Bug target/95943] arc -mbig-endian "inappropriate arguments" error from assembler

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95943

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |11.0

--- Comment #1 from Andrew Pinski  ---
Fixed in GCC 11 by r11-5940-gf7ad4446274831.

[Bug target/111555] [AArch64] __ARM_FEATURE_UNALIGNED should be undefined with -mstrict-align

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111555

--- Comment #4 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #3)
> (In reply to YunQiang Su from comment #2)
> > For AArch64, clang supports `-mno-unaligned-access`, while gcc doesn't,
> > should we add it as an alias of -mstrict-align?
> 
> -mno-unaligned-access is the arm option here rather than the aarch64 option
> :).
> I suspect clang folks decided to have the same option for both targets. I
> don't think we should support -mno-unaligned-access for aarch64 GCC.

See PR 99890 where -mstrict-align was rejected for arm.

[Bug target/111555] [AArch64] __ARM_FEATURE_UNALIGNED should be undefined with -mstrict-align

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111555

--- Comment #3 from Andrew Pinski  ---
(In reply to YunQiang Su from comment #2)
> For AArch64, clang supports `-mno-unaligned-access`, while gcc doesn't,
> should we add it as an alias of -mstrict-align?

-mno-unaligned-access is the arm option here rather than the aarch64 option :).
I suspect clang folks decided to have the same option for both targets. I don't
think we should support -mno-unaligned-access for aarch64 GCC.

[Bug target/111555] [AArch64] __ARM_FEATURE_UNALIGNED should be undefined with -mstrict-align

2024-03-14 Thread syq at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111555

YunQiang Su  changed:

   What|Removed |Added

 CC||syq at gcc dot gnu.org

--- Comment #2 from YunQiang Su  ---
`-mno-unaligned-access` is also supported by MIPSr6.

I guess we should add a more generic macro for this case?
Is __UNALIGN_ACCESS_DISABLED__ OK?

For AArch64, clang supports `-mno-unaligned-access`, while gcc doesn't,
should we add it as an alias of -mstrict-align?

[Bug middle-end/114342] suboptimal codegen of vector::vector(range)

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114342

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Severity|normal  |enhancement
   Last reconfirmed||2024-03-14

--- Comment #2 from Andrew Pinski  ---
If you change `arr` to be `static const int arr[]`, then GCC can do it with
only one memcpy.

So basically GCC does not know it can remove arr from being a stack variable.

clang/LLVM is able to figure that out but it definitely requires inlining to do
that.

```
  arr = *.LC0;
...
  __builtin_memcpy (_21, &arr, 444);
```

Basically GCC does not realize it can "remove" the local variable arr here.

Note there are duplicates of this bug report already too.
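
As a rough C analogue of the suggestion above (the function names and the shortened table are hypothetical, and whether a given GCC version removes the stack copy in the first variant is not guaranteed), the two forms differ only in the storage class of the array:

```c
#include <stdlib.h>
#include <string.h>

/* Automatic array: the initializer may first be materialized on the
   stack and is then copied again into the heap buffer. */
int *copy_from_local(size_t *count)
{
  int arr[] = { -5, 10, 15, -5, 10, 15 };
  int *out = malloc(sizeof arr);
  if (out != NULL)
    memcpy(out, arr, sizeof arr);
  *count = sizeof arr / sizeof arr[0];
  return out;
}

/* Static const array: the data can stay in read-only storage, so only
   the copy into the heap buffer is required. */
int *copy_from_static(size_t *count)
{
  static const int arr[] = { -5, 10, 15, -5, 10, 15 };
  int *out = malloc(sizeof arr);
  if (out != NULL)
    memcpy(out, arr, sizeof arr);
  *count = sizeof arr / sizeof arr[0];
  return out;
}
```

Both functions return the same contents; only the number of copies the compiler has to emit may differ.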

[Bug middle-end/114342] suboptimal codegen of vector::vector(range)

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114342

--- Comment #1 from Andrew Pinski  ---
The first memcpy (rep movsq) is for:
```
  int arr[]{-5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10,
15, -5, 10, 15,-5, 10, 15 -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5,
10, 15, -5, 10, 15, -5, 10,-5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5,
10, 15, -5, 10,-5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5,
10,-5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10,-5, 10,
15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10,};

```


The second memcpy is copying arr into the vector.
```
  _25 = operator new (444);

   [local count: 1073741824]:
  dd_2(D)->D.81462._M_impl.D.80768._M_start = _25;
  _16 = _25 + 444;
  dd_2(D)->D.81462._M_impl.D.80768._M_end_of_storage = _16;
  __builtin_memcpy (_25, &arr, 444);
```

[Bug c++/114342] New: suboptimal codegen of vector::vector(range)

2024-03-14 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114342

Bug ID: 114342
   Summary: suboptimal codegen of vector::vector(range)
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

#include <ranges>
#include <vector>

std::vector<int> td() {
  int arr[]{-5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10,
15, -5, 10, 15,-5, 10, 15 -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5,
10, 15, -5, 10, 15, -5, 10,-5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5,
10, 15, -5, 10,-5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5,
10,-5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10,-5, 10,
15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10,};
  auto b = std::ranges::begin(arr);
  auto e = std::ranges::end(arr);
  std::vector dd(b, e);
  return dd;
}

What is the reason for calling `rep movsq` twice?

$ gcc -O3 -std=c++23
```
td():
push rbp
mov esi, OFFSET FLAT:.LC0
mov ecx, 55
pxor xmm0, xmm0
push rbx
mov rbx, rdi
sub rsp, 456
mov QWORD PTR [rbx+16], 0
mov rbp, rsp
movups  XMMWORD PTR [rbx], xmm0
mov rdi, rbp
rep movsq
mov eax, DWORD PTR [rsi]
mov DWORD PTR [rdi], eax
mov edi, 444
call operator new(unsigned long)
lea rdx, [rax+444]
mov QWORD PTR [rbx], rax
lea rdi, [rax+8]
mov rsi, rbp
mov QWORD PTR [rbx+16], rdx
mov rcx, QWORD PTR [rsp]
and rdi, -8
mov QWORD PTR [rax], rcx
mov rcx, QWORD PTR [rsp+436]
mov QWORD PTR [rax+436], rcx
sub rax, rdi
sub rsi, rax
add eax, 444
shr eax, 3
mov ecx, eax
mov rax, rbx
rep movsq
mov QWORD PTR [rbx+8], rdx
add rsp, 456
pop rbx
pop rbp
ret
mov rbp, rax
jmp .L2
td() [clone .cold]:
.L2:
mov rdi, QWORD PTR [rbx]
mov rsi, QWORD PTR [rbx+16]
sub rsi, rdi
test rdi, rdi
je  .L3
call operator delete(void*, unsigned long)
.L3:
mov rdi, rbp
call _Unwind_Resume
```

https://godbolt.org/z/5333db8Px

[Bug tree-optimization/114341] Optimization opportunity with {mul,div} "(x & -x)" and {<<,>>} "ctz(x)"

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114341

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2024-03-14
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Keywords||missed-optimization
 CC||pinskia at gcc dot gnu.org
   Severity|normal  |enhancement

--- Comment #2 from Andrew Pinski  ---
Confirmed for the multiply; as I mentioned, the divide is already recorded.

[Bug tree-optimization/114341] Optimization opportunity with {mul,div} "(x & -x)" and {<<,>>} "ctz(x)"

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114341

--- Comment #1 from Andrew Pinski  ---
x/(y&-y) is already recorded as PR 97738 .

[Bug tree-optimization/114341] New: Optimization opportunity with {mul,div} "(x & -x)" and {<<,>>} "ctz(x)"

2024-03-14 Thread Explorer09 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114341

Bug ID: 114341
   Summary: Optimization opportunity with {mul,div} "(x & -x)" and
{<<,>>} "ctz(x)"
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: Explorer09 at gmail dot com
  Target Milestone: ---

I'm not sure this optimization opportunity is worth implementing in gcc,
since I have only used the (x / (x & -x)) pattern on compile-time
constants.

When x and y are unsigned integers and the value of y is non-zero,
then (x / (y & -y)) and (x >> __builtin_ctz(y)) are equivalent.

Similarly, (x * (y & -y)) and (x << __builtin_ctz(y)) are equivalent.

One reason for using the (x / (y & -y)) pattern is that it's more
portable among C compilers before C23 made a standard "CTZ" API
(stdc_trailing_zeros) for everyone. Even though we have
stdc_trailing_zeros() now, the (x / (y & -y)) pattern is still useful
for constant expressions when stdc_trailing_zeros() might not be a
compiler built in.

Processors that support CTZ instructions would optimize (x / (y & -y))
to (x >> __builtin_ctz(y)); processors that do not support CTZ would
optimize the other way around. (I know RISC-V might need the latter
way of optimization.)

```c
unsigned int func1a(unsigned int x, unsigned int y) {
  if (y == 0)
    return -1; /* placeholder value to indicate error */

  return x / (y & -y);
}

unsigned int func1b(unsigned int x, unsigned int y) {
  if (y == 0)
    return -1; /* placeholder value to indicate error */

  return x >> __builtin_ctz(y);
}

unsigned int func2a(unsigned int x, unsigned int y) {
  if (y == 0)
    return -1; /* placeholder value to indicate error */

  return x * (y & -y);
}

unsigned int func2b(unsigned int x, unsigned int y) {
  if (y == 0)
    return -1; /* placeholder value to indicate error */

  return x << __builtin_ctz(y);
}
```
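
The claimed equivalences can be spot-checked without any compiler builtin. The sketch below is only an illustration: `ctz_portable` and `patterns_agree` are hypothetical helper names, and the loop-based trailing-zero count stands in for `__builtin_ctz`.

```c
/* Portable count of trailing zero bits; y must be non-zero. */
static unsigned ctz_portable(unsigned int y)
{
  unsigned n = 0;
  while ((y & 1u) == 0) {
    y >>= 1;
    ++n;
  }
  return n;
}

/* Returns 1 when the divide/shift-right and multiply/shift-left pairs
   agree for the given x and non-zero y. */
static int patterns_agree(unsigned int x, unsigned int y)
{
  unsigned int low = y & -y;      /* isolates the lowest set bit of y */
  unsigned n = ctz_portable(y);   /* low == 1u << n */
  return ((x / low) == (x >> n)) && ((x * low) == (x << n));
}
```

Because `y & -y` is always a power of two for non-zero `y`, the division is exact bit shifting and the multiplication wraps the same way the left shift does.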

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-14 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548

--- Comment #25 from Jeffrey A. Law  ---
Well, at least in theory SPEC isn't supposed to be changing the sources or
validation criteria on us.  So while our copy may be old, I would expect it's
still the same as Filip's.

That doesn't resolve any issues here though.  It's not clear how best to
proceed.

[Bug c/82599] Assignments from statically initialized flexible arrays copy too much

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82599

Andrew Pinski  changed:

   What|Removed |Added

  Component|middle-end  |c

--- Comment #2 from Andrew Pinski  ---
Both the C and C++ front-end now produce the wrong code.

[Bug target/114339] [13/14 regression] Tor miscompiled with -O2 -mavx -fno-vect-cost-model since r14-6822

2024-03-14 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339

--- Comment #15 from Jakub Jelinek  ---
Created attachment 57707
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57707&action=edit
gcc14-pr114339.patch

Untested fix.

[Bug tree-optimization/114340] New: ` X / CST < X` -> `X > 0`

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114340

Bug ID: 114340
   Summary: ` X / CST < X` -> `X > 0`
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---

int f(int x)
{
return x / 100 < x;
}

Can be transformed into just:
int f(int x)
{
return x > 0;
}
For signed types.

For unsigned types, it is `x != 0` (but that is also `x > 0` really).
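
A brute-force comparison over a range of inputs is one way to gain confidence in the signed case before writing a match pattern. This is a minimal sketch with hypothetical helper names (`div_form`, `simplified`, `forms_agree`); it checks only the constant 100 from the example, not arbitrary CST values.

```c
#include <limits.h>

/* The form from the report. */
static int div_form(int x) { return x / 100 < x; }

/* The proposed simplification. */
static int simplified(int x) { return x > 0; }

/* Returns 1 if both forms agree for every x in [lo, hi]. */
static int forms_agree(int lo, int hi)
{
  /* A wider loop counter avoids overflow when hi == INT_MAX. */
  for (long long x = lo; x <= hi; ++x)
    if (div_form((int) x) != simplified((int) x))
      return 0;
  return 1;
}
```

For negative `x`, truncating division moves the quotient toward zero, so `x / 100` is never smaller than `x`; for `x == 0` both sides are equal.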

[Bug target/114339] [13/14 regression] Tor miscompiled with -O2 -mavx -fno-vect-cost-model since r14-6822

2024-03-14 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339

Jakub Jelinek  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org

--- Comment #14 from Jakub Jelinek  ---
Indeed, r13-3803-gfa271afb584230, so mine.

[Bug target/114339] [13/14 regression] Tor miscompiled with -O2 -mavx -fno-vect-cost-model since r14-6822

2024-03-14 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339

--- Comment #13 from Jakub Jelinek  ---
Nice, further cleaned up:
/* PR target/114339 */
/* { dg-do run } */
/* { dg-options "-O2 -Wno-psabi" } */
/* { dg-additional-options "-mavx" { target avx_runtime } } */

typedef long V __attribute__((vector_size (16)));

__attribute__((noipa)) V
foo (V a)
{
  return a <= (V) {0, __LONG_MAX__ };
}

int
main ()
{
  V t = foo ((V) { 0, 0 });
  if (t[0] != -1L || t[1] != -1L)
    __builtin_abort ();
}

[Bug target/114339] [13/14 regression] Tor miscompiled with -O2 -mavx -fno-vect-cost-model since r14-6822

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339

--- Comment #12 from Andrew Pinski  ---
I suspect r13-3803-gfa271afb584230, which missed the border case of
INT_MAX/INT_MIN .
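
A scalar sketch of that border case, assuming the transformation rewrites `a <= C` as `C + 1 > a` (which is what the RTL in comment #7 suggests, with the constant vector becoming `{1, LONG_MIN}`). The helper names are hypothetical, and the wraparound is modelled with unsigned arithmetic because signed overflow is undefined in C:

```c
#include <limits.h>

/* Reference semantics of the comparison. */
static int le_reference(long a, long c)
{
  return a <= c;
}

/* Naive rewrite of "a <= c" as "c + 1 > a". The increment is done in
   unsigned arithmetic and converted back; the conversion is
   implementation-defined and wraps on GCC, so LONG_MAX + 1 becomes
   LONG_MIN. */
static int le_rewritten(long a, long c)
{
  long c_plus_1 = (long) ((unsigned long) c + 1UL);
  return c_plus_1 > a;
}
```

For `c == 0` both helpers agree, but for `c == LONG_MAX` the rewritten form compares against `LONG_MIN` and no longer matches the reference.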

[Bug target/114339] [13/14 regression] Tor miscompiled with -O2 -mavx -fno-vect-cost-model since r14-6822

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339

Andrew Pinski  changed:

   What|Removed |Added

  Known to work||10.1.0, 12.1.0, 12.3.0,
   ||7.1.0
Summary|[14 regression] Tor |[13/14 regression] Tor
   |miscompiled with -O2 -mavx  |miscompiled with -O2 -mavx
   |-fno-vect-cost-model since  |-fno-vect-cost-model since
   |r14-6822|r14-6822
  Known to fail||13.1.0
   Target Milestone|14.0|13.3

--- Comment #11 from Andrew Pinski  ---
My reduced testcase started to fail in GCC 13.

[Bug target/114339] [14 regression] Tor miscompiled with -O2 -mavx -fno-vect-cost-model since r14-6822

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339

Andrew Pinski  changed:

   What|Removed |Added

 Target||x86_64-linux-gnu
  Component|tree-optimization   |target

--- Comment #10 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #9)
> Reduced testcase:

This works without -mavx and fails with -mavx even.

[Bug tree-optimization/114339] [14 regression] Tor miscompiled with -O2 -mavx -fno-vect-cost-model since r14-6822

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339

--- Comment #9 from Andrew Pinski  ---
Reduced testcase:
```
#define vect128 __attribute__((vector_size(16)))

[[gnu::noinline]]
vect128 long f(vect128 long a)
{
return a <= (vect128 long){0, 9223372036854775807};
}

int main()
{
  vect128 long t = (vect128 long){0, 0};
  t = f(t);
  if (!t[1])
  __builtin_abort();

}
```

[Bug tree-optimization/114339] [14 regression] Tor miscompiled with -O2 -mavx -fno-vect-cost-model since r14-6822

2024-03-14 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339

--- Comment #8 from Jakub Jelinek  ---
Slightly simplified/cleaned up testcase:
/* { dg-do run } */
/* { dg-options "-O2 -fno-vect-cost-model" } */
/* { dg-additional-options "-mavx" { target avx_runtime } } */

struct S { int a; long b; int c; } s;

__attribute__((noipa)) struct S *
foo (void)
{
  return &s;
}

int
main ()
{
  struct S r = *foo ();
  long t = 10412095L - r.b;
  struct { long d; int e; } f[4] = { {}, {}, {}, { __LONG_MAX__, 0 } };
  for (unsigned i = 0; i < 4; ++i)
    if (t <= f[i].d)
      return f[i].e;
  __builtin_abort ();
}

[Bug tree-optimization/114339] [14 regression] Tor miscompiled with -O2 -mavx -fno-vect-cost-model since r14-6822

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339

--- Comment #7 from Andrew Pinski  ---

This looks wrong:
```
;; mask_patt_17.15_55 = vect_cst__53 <= { 0, 9223372036854775807 };

(insn 21 20 22 (set (reg:V2DI 111)
(mem/u/c:V2DI (symbol_ref/u:DI ("*.LC1") [flags 0x2]) [0  S16 A128]))
-1
 (expr_list:REG_EQUAL (const_vector:V2DI [
(const_int 1 [0x1])
(const_int -9223372036854775808 [0x8000000000000000])
])
(nil)))

(insn 22 21 23 (set (reg:V2DI 112)
(gt:V2DI (reg:V2DI 111)
(reg:V2DI 100 [ vect_cst__53 ]))) -1
 (nil))

(insn 23 22 0 (set (reg:V2DI 102 [ mask_patt_17.15D.3121 ])
(reg:V2DI 112)) -1
 (nil))
```

we go from `a <= INT_MAX` to `INT_MIN > a` which is basically going from `true`
to `a != INT_MIN`.

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-14 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548

--- Comment #24 from Robin Dapp  ---
I rebuilt GCC from scratch with your options but still have the same problem. 
Could our sources differ?  My SPEC version might not be the most recent but I'm
not aware that mcf changed at some point.

Just to be sure: I'm using r14-5075-gc05f748218a0d5 as the "before" commit.

[Bug tree-optimization/114339] [14 regression] Tor miscompiled with -O2 -mavx -fno-vect-cost-model since r14-6822

2024-03-14 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339

--- Comment #6 from Tamar Christina  ---
vectorizer generates:

  mask_patt_21.19_58 = vect_perm_even_49 >= vect_cst__57;
  mask_patt_21.19_59 = vect_perm_even_55 >= vect_cst__57;
  vexit_reduc_63 = mask_patt_21.19_58 | mask_patt_21.19_59;
  if (vexit_reduc_63 != { 0, 0 })
goto ; [20.00%]
  else
goto ; [80.00%]

This is changed at loopdone into:

  delays[3].nonprimary_delay = 129600;
  vect_cst__57 = {tdiff_6, tdiff_6};
  mask_patt_21.19_58 = vect_cst__57 <= { 0, 0 };
  mask_patt_21.19_59 = vect_cst__57 <= { 0, 0x7FFFFFFFFFFFFFFF };
  vexit_reduc_63 = mask_patt_21.19_58 | mask_patt_21.19_59;
  if (vexit_reduc_63 != { 0, 0 })
goto ; [20.00%]
  else
goto ; [80.00%]

or in other words, if there's any value where the compare succeeds, find it and
return.
This looks correct to me.

It could be that my AVX is rusty, but this generates:

   vmovdqa 0xf9c(%rip),%xmm1# 0x402010 
   mov$0x1,%eax
   vmovq  %rcx,%xmm3
   vmovdqa %xmm0,(%rsp)
   vpunpcklqdq %xmm3,%xmm3,%xmm2
   vmovdqa %xmm0,0x10(%rsp)
   vpcmpgtq %xmm2,%xmm1,%xmm1#
   vmovdqa %xmm0,0x20(%rsp)
   vmovq  %rax,%xmm0
   vpunpcklqdq %xmm0,%xmm0,%xmm0
   movl   $0x1fa40,0x38(%rsp)
   vpcmpgtq %xmm2,%xmm0,%xmm0
   vpor   %xmm1,%xmm0,%xmm0
   vptest %xmm0,%xmm0

which looks off. In particular, the second compare does not appear to load its
constant operand but instead just duplicates the constant 1.
gdb seems to confirm this. At the first compare:

(gdb) p $xmm2.v2_int64
$4 = {10412095, 10412095}
(gdb) p $xmm0.v2_int64
$5 = {0, 0}

which is what's expected, but at the second compare:

(gdb) p $xmm2.v2_int64
$7 = {10412095, 10412095}
(gdb) p $xmm0.v2_int64
$6 = {1, 1}

at the second it's comparing {1, 1} instead of {0, 0x7FFFFFFFFFFFFFFF}.

on AArch64 where it doesn't fail the comparison is:

   movi v29.4s, 0
   add x1, sp, 16
   ldr x5, [x0, 8]
   mov w0, 64064
   movk w0, 0x1, lsl 16
   add x3, sp, 48
   str q29, [sp, 64]
   mov x2, 57407
   mov x4, 9223372036854775807
   str x4, [sp, 64]
   movk x2, 0x9e, lsl 16
   str w0, [sp, 72]
   sub x2, x2, x5
   stp q29, q29, [x1]
   dup v27.2d, x2
   ld2 {v30.2d - v31.2d}, [x1]
   str q29, [sp, 48]
   ld2 {v28.2d - v29.2d}, [x3]
   cmge v30.2d, v30.2d, v27.2d
   cmge v28.2d, v28.2d, v27.2d
   orr v30.16b, v30.16b, v28.16b
   umaxp   v30.4s, v30.4s, v30.4s
   fmov x0, d30
   cbnz x0, .L12

which has v30.2d being {0, 0} and v28.2d being {0, 0x7FFFFFFFFFFFFFFF} as
expected...

On AArch64 we don't inline the constants because whatever is propagating the
constants can't understand the LOAD_LANES:

  mask_patt_19.21_50 = vect__2.16_44 >= vect_cst__49;
  mask_patt_19.21_51 = vect__2.19_47 >= vect_cst__49;
  vexit_reduc_55 = mask_patt_19.21_50 | mask_patt_19.21_51;
  if (vexit_reduc_55 != { 0, 0 })
goto ; [20.00%]
  else
goto ; [80.00%]

so could this be another expansion bug?

Note that a simpler reproducer is this:

---
long tdiff = 10412095;

int main() {
  struct {
    long maximum;
    int nonprimary_delay;
  } delays[] = {{}, {}, {}, {9223372036854775807, 36 * 60 * 60}};

  for (unsigned i = 0; i < sizeof(delays) / sizeof(delays[0]); ++i)
    if (tdiff <= delays[i].maximum)
      return delays[i].nonprimary_delay;

  __builtin_abort();
}
---

the key point is that we're not allowed to constprop tdiff at GIMPLE. If we do,
e.g:

int main() {
  struct {
    long maximum;
    int nonprimary_delay;
  } delays[] = {{}, {}, {}, {9223372036854775807, 36 * 60 * 60}};
  long tdiff = 10412095;

  for (unsigned i = 0; i < sizeof(delays) / sizeof(delays[0]); ++i)
    if (tdiff <= delays[i].maximum)
      return delays[i].nonprimary_delay;

  __builtin_abort();
}

then after vectorization and const prop the entire expression is evaluated at
GIMPLE and it gets the right result.

This makes me believe it's a target expansion bug.

[Bug target/91861] invalid vectorization of isless, islessequal, etc. (with default of -ftrapping-math)

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91861

--- Comment #3 from Andrew Pinski  ---
*** Bug 94413 has been marked as a duplicate of this bug. ***

[Bug target/94413] auto-vectorization of isfinite raises FP exception

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94413

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #4 from Andrew Pinski  ---
Dup.

*** This bug has been marked as a duplicate of bug 91861 ***

[Bug target/91861] invalid vectorization of isless, islessequal, etc. (with default of -ftrapping-math)

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91861

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
Summary|invalid vectorization of|invalid vectorization of
   |isless, islessequal, etc.   |isless, islessequal, etc.
   ||(with default of
   ||-ftrapping-math)
   Last reconfirmed||2024-03-14
 Status|UNCONFIRMED |NEW

--- Comment #2 from Andrew Pinski  ---
Confirmed.


(insn:TI 11 21 18 2 (set (reg:V4SF 20 xmm0 [108])
(unge:V4SF (reg:V4SF 20 xmm0 [orig:106 MEM  [(float
*)] ] [106])
(mem/c:V4SF (symbol_ref:DI ("y") [flags 0x2]  ) [1 MEM  [(float *)]+0 S16 A128]))) 2796
{sse_maskcmpv4sf3}
 (nil))

It should have been NLT_UQ (0x15) rather than NLT_US (NLT) (0x5)

But NLT_UQ does not exist for the non-VEX encoding.
From
https://www.intel.com/content/dam/develop/external/us/en/documents/319433-024-697869.pdf

[Bug c++/109753] [13/14 Regression] pragma GCC target causes std::vector not to compile (always_inline on constructor)

2024-03-14 Thread jason at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109753

--- Comment #15 from Jason Merrill  ---
Created attachment 57706
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57706&action=edit
one approach

I tried just making implicit functions respect #pragma target, but that
regresses pr105306 due to seeming internal confusion over whether -Ofast or
#pragma optimize apply to the implicit ~C.  I haven't tracked that down yet.

[Bug target/108866] Allow to pass Windows resource file (.rc) as input to gcc

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108866

--- Comment #6 from Andrew Pinski  ---
(In reply to Pali Rohár from comment #5)
> There is one problem with it. I had to "hardcode" x86_64-w64-mingw32-windres
> name instead of just "windres". How to declare cross compile prefix? Because
> gcc somehow for "as" automatically adds it as in spec file is just "as", not
> "x86_64-w64-mingw32-as".

See find_a_program in gcc.cc which detects special program names and then does
the transformation for them.
I suspect you should be able to add windres there in a similar way as dsymutil
(which IIRC is only used for darwin currently).

[Bug c/54454] gcc violates c99 specification w.r.t. flexible arrays

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54454

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|INVALID |DUPLICATE

--- Comment #4 from Andrew Pinski  ---
Dup.

*** This bug has been marked as a duplicate of bug 9058 ***

[Bug c/9058] structure with flexible array member: offsetof() != sizeof()

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=9058

Andrew Pinski  changed:

   What|Removed |Added

 CC||mikulas at artax dot 
karlin.mff.cu
   ||ni.cz

--- Comment #8 from Andrew Pinski  ---
*** Bug 54454 has been marked as a duplicate of this bug. ***

[Bug target/108866] Allow to pass Windows resource file (.rc) as input to gcc

2024-03-14 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108866

--- Comment #5 from Pali Rohár  ---
Thank you for the info. I read that blog post and, based on those details,
adjusted the spec file

$ x86_64-w64-mingw32-gcc -dumpspecs > test.spec

by adding additional lines to test.spec:

.rc:
x86_64-w64-mingw32-windres -J rc -O coff -i %i %{c:%W{o*}%{!o*:-o
%w%b%O}}%{!c:-o %d%w%u%O}

.res:
x86_64-w64-mingw32-windres -J res -O coff -i %i %{c:%W{o*}%{!o*:-o
%w%b%O}}%{!c:-o %d%w%u%O}


rc files contain resources in text format and res files in binary format.

With these changes x86_64-w64-mingw32-gcc was able to take both a .c and a .rc
file as input and produce an .exe file with the resource.

$ cat test.c
int main() { return 0; }

$ cat test.rc
1 VERSIONINFO
BEGIN
END

$ x86_64-w64-mingw32-gcc -specs=test.spec test.c test.rc -o test.exe


Now show resource stored in test.exe:

$ x86_64-w64-mingw32-windres -O rc test.exe /dev/stdout

/* Type: version

   Name: 1.  */
LANGUAGE 9, 1

1 VERSIONINFO
BEGIN
END


Also replacing text test.rc file by binary test.res file works.


There is one problem with it. I had to "hardcode" x86_64-w64-mingw32-windres
name instead of just "windres". How to declare cross compile prefix? Because
gcc somehow for "as" automatically adds it as in spec file is just "as", not
"x86_64-w64-mingw32-as".

[Bug c/91672] wrong amount of storage allocated for initialized structs with flexible array members

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91672

Andrew Pinski  changed:

   What|Removed |Added

 CC||pascal_cuoq at hotmail dot com

--- Comment #3 from Andrew Pinski  ---
*** Bug 109956 has been marked as a duplicate of this bug. ***

[Bug c/109956] GCC reserves 9 bytes for struct s { int a; char b; char t[]; } x = {1, 2, 3};

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109956

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #16 from Andrew Pinski  ---
Dup.

*** This bug has been marked as a duplicate of bug 91672 ***

[Bug c/91672] wrong amount of storage allocated for initialized structs with flexible array members

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91672

--- Comment #2 from Andrew Pinski  ---
Note the .size does match up with what GCC outputs though:
e.g. a1:
.size   a1, 18
a1:
.xword  1
.hword  1
.hword  1
.zero   6

that is size of 18.
Basically gcc's padding is always 6 in size and not changing based on the size
that is needed there.

This is also normally how you get the size when allocating dynamically too:

sizeof(struct A) + N*sizeof(__INT16_TYPE__).
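
The dynamic-allocation idiom mentioned above can be sketched as follows. The layout of `struct A` is an assumption inferred from the `.xword`/`.hword` directives in the assembly (a 64-bit member, a 16-bit member, then a flexible array of 16-bit elements); the helper names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout, inferred from the emitted directives. */
struct A {
  int64_t i;
  int16_t j;
  int16_t a[];   /* flexible array member */
};

/* Size computed the usual way: the full struct, padding included,
   plus n trailing elements. */
static size_t size_with_elems(size_t n)
{
  return sizeof(struct A) + n * sizeof(int16_t);
}

/* Zero-initialized allocation with room for n trailing elements. */
static struct A *alloc_with_elems(size_t n)
{
  return calloc(1, size_with_elems(n));
}
```

With this idiom the trailing padding of `struct A` is always counted in full, which matches the fixed-size padding described above.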

[Bug c/91672] wrong amount of storage allocated for initialized structs with flexible array members

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91672

Andrew Pinski  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=102295

--- Comment #1 from Andrew Pinski  ---
Note the C++ size issue was recorded as PR 102295 .

[Bug target/114288] [14 regression] ICE when building binutils-2.41 on hppa (extract_constrain_insn, at recog.cc:2713)

2024-03-14 Thread danglin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114288

John David Anglin  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #15 from John David Anglin  ---
Should be fixed.

[Bug modula2/114294] expression causes ICE

2024-03-14 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114294

Gaius Mulley  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Gaius Mulley  ---
Closing now that the patch has been applied.

[Bug tree-optimization/114339] [14 regression] Tor miscompiled with -O2 -mavx -fno-vect-cost-model since r14-6822

2024-03-14 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339

Jakub Jelinek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
Summary|[14 regression] Tor |[14 regression] Tor
   |miscompiled with -O2 -mavx  |miscompiled with -O2 -mavx
   |-fno-vect-cost-model|-fno-vect-cost-model since
   ||r14-6822
 Ever confirmed|0   |1
   Priority|P3  |P1
   Target Milestone|--- |14.0
   Last reconfirmed||2024-03-14
 CC||jakub at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek  ---
Started with r14-6822-g01f4251b8775c832a92d55e2df57c9ac72eaceef

[Bug modula2/114294] expression causes ICE

2024-03-14 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114294

--- Comment #3 from GCC Commits  ---
The master branch has been updated by Gaius Mulley :

https://gcc.gnu.org/g:6dbf0d252f69ab2924256e6778ba7dc55d5b6915

commit r14-9483-g6dbf0d252f69ab2924256e6778ba7dc55d5b6915
Author: Gaius Mulley 
Date:   Thu Mar 14 19:09:34 2024 +

PR modula2/114294 expression causes ICE

This patch fixes an ICE when encountering an expression:
1 + HIGH (a[0]).  The fix was to assign a type to the constant
created by BuildConstHighFromSym in M2Quads.mod.

gcc/m2/ChangeLog:

PR modula2/114294
* gm2-compiler/M2Quads.mod (BuildConstHighFromSym):
Call PutConst to assign the type Cardinal in the result
constant.

gcc/testsuite/ChangeLog:

PR modula2/114294
* gm2/pim/pass/log: Removed.
* gm2/pim/pass/highexp.mod: New test.

Signed-off-by: Gaius Mulley 

[Bug tree-optimization/114339] [14 regression] Tor miscompiled with -O2 -mavx -fno-vect-cost-model

2024-03-14 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339

--- Comment #4 from Sam James  ---
(In reply to Sam James from comment #3)
> Created attachment 57705 [details]
> larger.i
> 
> Ah, wait, that might be a bad reduction. Let me attach a larger one, then I
> can give the original if needed too.

OK, no, I think https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339#c2 is fine.

[Bug tree-optimization/114339] [14 regression] Tor miscompiled with -O2 -mavx -fno-vect-cost-model

2024-03-14 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339

--- Comment #3 from Sam James  ---
Created attachment 57705
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57705&action=edit
larger.i

Ah, wait, that might be a bad reduction. Let me attach a larger one, then I can
give the original if needed too.

[Bug tree-optimization/114331] Missed optimization: indicate knownbits from dominating condition switch(trunc(a))

2024-03-14 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114331

--- Comment #12 from Jakub Jelinek  ---
Yeah.  So the cases where we should do it is when we are reversing a narrowing
cast, or also something for the other PRs Andrew mentioned, like when reversing
BIT_AND_EXPR (but maybe also BIT_IOR_EXPR/BIT_XOR_EXPR, haven't thought that
out; maybe only if BIT_AND_EXPR has constant second argument?).
For that
  if ((i & 7) == 6)
in there aka
  _1 = i_2(D) & 7;
  if (_1 == 6)
...
we get [6, 6] range on that edge (with irange_bitmask again implicit), but if
we want to ask what the range of i_2(D) is we can ask for irange_bitmask to be
computed (MASK 0x0 VALUE 0x6) and for i_2(D) reverse the mask, i.e. MASK
0xfff8 VALUE 0x6;
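
The mask reversal described above can be sketched with a small stand-alone model. This is not GCC's `irange_bitmask` API: `struct known_bits`, `reverse_and` and `consistent` are hypothetical names, a 32-bit operand is assumed, and a set bit in `mask` is taken to mean "unknown", as in the MASK/VALUE notation used in the comment.

```c
#include <stdint.h>

/* Hypothetical known-bits pair: a set bit in mask means the bit is
   unknown; for known bits, value holds the bit. */
struct known_bits {
  uint32_t mask;
  uint32_t value;
};

/* Given that (i & c) == r holds on an edge, derive what is known
   about i itself: bits selected by c are known and equal r, all
   other bits are unknown. */
static struct known_bits reverse_and(uint32_t c, uint32_t r)
{
  struct known_bits kb;
  kb.mask = ~c;
  kb.value = r & c;
  return kb;
}

/* Returns 1 if i is compatible with the known bits. */
static int consistent(struct known_bits kb, uint32_t i)
{
  return (i & ~kb.mask) == kb.value;
}
```

For `(i & 7) == 6` this yields a mask with all but the low three bits set and a value of 6.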

[Bug tree-optimization/114339] [14 regression] Tor miscompiled with -O2 -mavx -fno-vect-cost-model

2024-03-14 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339

Sam James  changed:

   What|Removed |Added

 CC||tnfchris at gcc dot gnu.org

--- Comment #2 from Sam James  ---
I've minimised it to:
```
struct entry_guard_t {
  int is_reachable;
  long failing_since;
  int is_primary;
} *create_guard() {
  struct entry_guard_t *guard = __builtin_malloc(sizeof *guard);
  guard->is_reachable = guard->failing_since = guard->is_primary = 0;
  return guard;
}
int main() {
  struct entry_guard_t guard = *create_guard();
  long tdiff = 10412095 - guard.failing_since;
  struct {
    long maximum;
    int nonprimary_delay;
  } delays[] = {{}, {}, {}, {9223372036854775807, 36 * 60 * 60}};
  unsigned i = 0;
  for (; i < sizeof(delays) / sizeof(delays[0]); ++i)
    if (tdiff <= delays[i].maximum)
      return delays[i].nonprimary_delay;
  __builtin_abort();
}
```

This fails for me with `-O2 -mavx -fno-vect-cost-model`.

[Bug tree-optimization/114339] [14 regression] Tor miscompiled with -O2 -mavx -fno-vect-cost-model

2024-03-14 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339

--- Comment #1 from Sam James  ---
The assert is at
https://gitlab.torproject.org/tpo/core/tor/-/blob/tor-0.4.8.10/src/feature/client/entrynodes.c#L2072

```
(gdb) p delays
$3 = {{
    maximum = 21600,
    primary_delay = 600,
    nonprimary_delay = 3600
  }, {
    maximum = 345600,
    primary_delay = 5400,
    nonprimary_delay = 14400
  }, {
    maximum = 604800,
    primary_delay = 14400,
    nonprimary_delay = 64800
  }, {
    maximum = 9223372036854775807,
    primary_delay = 32400,
    nonprimary_delay = 129600
  }}
(gdb)
```

The bad loop (confirmed w/ novector pragma) is:
  for (i = 0; i < ARRAY_LENGTH(delays); ++i) {
    if (tdiff <= delays[i].maximum) {
      return is_primary ? delays[i].primary_delay : delays[i].nonprimary_delay;
    }
  }
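
One way to check that the vectorizer is responsible, sketched here under
assumptions: mark only the suspect loop with `#pragma GCC novector` (accepted
by GCC 14; other compilers are expected to ignore or warn about an unknown
pragma) and see whether the failure goes away.  The names `pick_delay` and
`delay_entry` are hypothetical, the table values are the ones from the gdb dump
above, and returning -1 replaces the assertion in the real code.

```c
#include <assert.h>
#include <stddef.h>

struct delay_entry {
  long long maximum;
  int primary_delay;
  int nonprimary_delay;
};

static const struct delay_entry delays[] = {
  { 21600, 600, 3600 },
  { 345600, 5400, 14400 },
  { 604800, 14400, 64800 },
  { 9223372036854775807LL, 32400, 129600 },
};

/* Return the retry delay for `tdiff`, or -1 if no entry matches.  */
static int pick_delay(long long tdiff, int is_primary)
{
  size_t i;
  assert(is_primary == 0 || is_primary == 1);
  /* Ask GCC not to vectorize just this loop.  */
#pragma GCC novector
  for (i = 0; i < sizeof(delays) / sizeof(delays[0]); ++i) {
    if (tdiff <= delays[i].maximum)
      return is_primary ? delays[i].primary_delay
                        : delays[i].nonprimary_delay;
  }
  return -1;
}
```

Because the last entry's maximum is the largest 64-bit signed value, a correct
compilation can never reach the `return -1` path.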

[Bug tree-optimization/114339] New: [14 regression] Tor miscompiled with -O2 -mavx -fno-vect-cost-model

2024-03-14 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339

Bug ID: 114339
   Summary: [14 regression] Tor miscompiled with -O2 -mavx
-fno-vect-cost-model
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sjames at gcc dot gnu.org
  Target Milestone: ---

Tor fails its test suite with -O3 -march=znver2 -fno-vect-cost-model like:
```
$ ./configure CFLAGS="-O3 -march=znver2 -ggdb3 -fno-vect-cost-model"
--enable-all-bugs-are-fatal --disable-html-manual --disable-manpage
--disable-asciidoc --disable-memory-sentinels --disable-linker-hardening
--disable-seccomp --disable-libscrypt --disable-module-relay
--disable-module-dirauth --disable-module-pow && make -j$(nproc)
[...]
$ src/test/test --verbose entrynodes/node_filter --no-fork
assert(num_reachable_filtered_guards(gs, NULL) OP_EQ NUM): 7 vs 7Mar 14
09:50:56.182 [err] tor_assertion_failed_(): Bug:
src/feature/client/entrynodes.c:2072: get_retry_schedule: Assertion 0 failed;
aborting. (on Tor 0.4.8.10 )
Mar 14 09:50:56.183 [err] Bug: Tor 0.4.8.10: Assertion 0 failed in
get_retry_schedule at src/feature/client/entrynodes.c:2072: . Stack trace: (on
Tor 0.4.8.10 )
Mar 14 09:50:56.183 [err] Bug: ./test(log_backtrace_impl+0x58)
[0x55d7124029b8] (on Tor 0.4.8.10 )
Mar 14 09:50:56.183 [err] Bug: ./test(tor_assertion_failed_+0x14f)
[0x55d71241323f] (on Tor 0.4.8.10 )
Mar 14 09:50:56.183 [err] Bug: ./test(+0x4d80e7) [0x55d7122b80e7] (on Tor
0.4.8.10 )
Mar 14 09:50:56.183 [err] Bug:
./test(entry_guards_update_filtered_sets+0x2c8) [0x55d7122bb048] (on Tor
0.4.8.10 )
Mar 14 09:50:56.183 [err] Bug: ./test(+0x2810be) [0x55d7120610be] (on Tor
0.4.8.10 )
Mar 14 09:50:56.183 [err] Bug: ./test(testcase_run_one+0x2f4)
[0x55d712239d74] (on Tor 0.4.8.10 )
Mar 14 09:50:56.183 [err] Bug: ./test(tinytest_main+0x218) [0x55d71223a5f8]
(on Tor 0.4.8.10 )
Mar 14 09:50:56.183 [err] Bug: ./test(main+0x492) [0x55d711e77482] (on Tor
0.4.8.10 )
Mar 14 09:50:56.183 [err] Bug: /usr/lib64/libc.so.6(+0x25e6a)
[0x7f1b5816ce6a] (on Tor 0.4.8.10 )
Mar 14 09:50:56.183 [err] Bug: /usr/lib64/libc.so.6(__libc_start_main+0x85)
[0x7f1b5816cf25] (on Tor 0.4.8.10 )
Mar 14 09:50:56.183 [err] Bug: ./test(_start+0x21) [0x55d711e775a1] (on Tor
0.4.8.10 )
```

```
Program received signal SIGABRT, Aborted.
0x77489cdc in ?? () from /usr/lib64/libc.so.6
(gdb) bt
#0  0x77489cdc in ?? () from /usr/lib64/libc.so.6
#1  0x77434032 in raise () from /usr/lib64/libc.so.6
#2  0x7741c4f2 in abort () from /usr/lib64/libc.so.6
#3  0x55b39010 in tor_raw_abort_ () at src/lib/err/torerr.c:225
#4  0x55b46e30 in tor_abort_ () at src/lib/log/util_bug.c:174
#5  0x55a0fcde in get_retry_schedule (failing_since=,
now=1710410426, is_primary=) at
src/feature/client/entrynodes.c:2072
#6  entry_guard_consider_retry (guard=guard@entry=0x55e6be60) at
src/feature/client/entrynodes.c:2089
#7  0x55a0fff0 in entry_guard_consider_retry (guard=0x55e6be60) at
src/feature/client/entrynodes.c:2084
#8  entry_guard_set_filtered_flags (options=options@entry=0x55e1a4b0,
gs=gs@entry=0x55e6b500, guard=0x55e6be60) at
src/feature/client/entrynodes.c:1737
#9  0x55a118aa in entry_guards_update_filtered_sets
(gs=gs@entry=0x55e6b500) at src/feature/client/entrynodes.c:1758
#10 0x557cded7 in test_entry_guard_node_filter (arg=) at
src/test/test_entrynodes.c:1037
#11 0x5599ece5 in testcase_run_bare_
(testcase=testcase@entry=0x55da13d8 ) at
src/ext/tinytest.c:107
#12 0x5599edb3 in testcase_run_one (group=group@entry=0x55d99380
, testcase=0x55da13d8 ) at
src/ext/tinytest.c:272
#13 0x5599f60c in tinytest_main (c=c@entry=4, v=v@entry=0x7fffd928,
groups=groups@entry=0x55d99020 ) at src/ext/tinytest.c:454
#14 0x555eb47b in main (c=4, v=) at
src/test/testing_common.c:424
(gdb)
```

```
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/14/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with:
/var/tmp/portage/sys-devel/gcc-14.0./work/gcc-14.0./configure
--host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --prefix=/usr
--bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/14
--includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/14/include
--datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/14
--mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/14/man
--infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/14/info
--with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/14/include/g++-v14
--disable-silent-rules --disable-dependency-tracking
--with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/14/python
--enable-languages=c,c++,fortran,rust --enable-obsolete --enable-secureplt
--disable-werror --with-system-zlib --enable-nls
```

[Bug target/114288] [14 regression] ICE when building binutils-2.41 on hppa (extract_constrain_insn, at recog.cc:2713)

2024-03-14 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114288

--- Comment #14 from GCC Commits  ---
The master branch has been updated by John David Anglin :

https://gcc.gnu.org/g:53fd0f5b1fd737a208c12909fa1188281cb370a3

commit r14-9482-g53fd0f5b1fd737a208c12909fa1188281cb370a3
Author: John David Anglin 
Date:   Thu Mar 14 18:32:56 2024 +

hppa: Fix REG+D address support before reload

When generating PA 1.x code or code for GNU ld, floating-point
accesses only support 5-bit displacements but integer accesses
support 14-bit displacements.  I mistakenly assumed reload
could fix an invalid 14-bit displacement in a floating-point
access but this is not the case.

2024-03-14  John David Anglin  

gcc/ChangeLog:

PR target/114288
* config/pa/pa.cc (pa_legitimate_address_p): Don't allow
14-bit displacements before reload for modes that may use
a floating-point load or store.
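
To make the size difference concrete, the sketch below checks whether a
displacement fits a signed N-bit field.  It is only an illustration, not the
code in `pa_legitimate_address_p`; `fits_signed_bits` and
`disp_ok_before_reload` are hypothetical helpers, and treating both fields as
signed two's-complement ranges is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* True if `disp` is representable in a signed two's-complement
   field of `bits` bits (1 <= bits <= 31).  */
static bool fits_signed_bits(long disp, int bits)
{
  long lo, hi;
  assert(bits >= 1 && bits <= 31);
  lo = -(1L << (bits - 1));
  hi = (1L << (bits - 1)) - 1;
  return disp >= lo && disp <= hi;
}

/* Hypothetical policy mirroring the commit message: if the mode may
   end up in a floating-point load or store, accept only what a
   5-bit displacement can encode; otherwise allow 14 bits.  */
static bool disp_ok_before_reload(long disp, bool may_use_fp_access)
{
  return fits_signed_bits(disp, may_use_fp_access ? 5 : 14);
}
```

Under these assumptions the floating-point case covers -16..15 and the integer
case -8192..8191, which shows how easily an offset valid for an integer access
falls outside what a floating-point access can encode.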

[Bug tree-optimization/114331] Missed optimization: indicate knownbits from dominating condition switch(trunc(a))

2024-03-14 Thread amacleod at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114331

--- Comment #11 from Andrew Macleod  ---
(In reply to Jakub Jelinek from comment #10)
> I really don't know how GORI etc. works.
> But, if when the switch handling determines that _1 (the switch controlling
> expression) has [irange] [111, 111] MASK 0x0 VALUE 0x6f (does it actually?
> i.e. for a singleton range all the bits here are known and equal to the
> value), then when trying to derive a range for related num_9(D) which is int
> rather than _1's short and
>   _1 = (short int) num_5(D);
> for the MASK/VALUE we should just use the same VALUE and or in 0x
> into MASK because we then don't know anything about the upper bits.
> Though, looking at the evrp detailed dump there is
> 2->3  _1 :  [irange] short int [111, 111]
> 2->3  _2 :  [irange] int [111, 111]
> 2->3  num_5(D) :[irange] int [-INF, -65537][-65425, -65425][111,
> 111][65536, +INF]
> and so no MASK/VALUE for any of those ranges.

Right. No mask is needed for _1 and _2 as the range fully represents the known
bits, and operator_cast::op1_range hasn't been taught to add a bitmask when
calculating num_5 yet.  It could have a mask and value added to it because it's
implied by the result of the cast being short int [111, 111].  The routine Aldy
provided should create that mask when asked, I think.



> Now, from comments it seems that irange_bitmask is only computed on demand
> to speed things up, unless it has been explicitly set.
> Now, say for _1 or _2 above, we don't have anything recorded but we can
> always compute it on demand from the value range.  But when adding the

I think we are both on the same page so far.

> num_5(D) range based on the related _1 range, the on-demand irange_bitmask
> is no longer as precise as it would be if we when deriving that [-INF,
> -65537][-65425, -65425][111, 111][65536, +INF] range

We aren't deriving it from that range, though.  We are solving for num_5 in the
equation
[111,111] = (short int) num_5

The range you list is the best we can currently produce with *just* ranges.  If
we also add that bitmask along with the range (generated from the range on the
LHS), we can adjust that to what you specify.

> from the [111, 111] range also derived from the in that case on-demand asked
> MASK 0x0 VALUE 0x6f to MASK 0x VALUE 0x6f.

Right, so the LHS 16 bits produce MASK 0x VALUE 0x6f, which means those
bits should apply to the RHS as well.  Since we're extending that to 32 bits,
we'd have to make the upper ones unknown, so we should be able to create

num_5 : [irange] int [-INF, -65537][-65425, -65425][111, 111][65536, +INF] MASK
0x VALUE 0x6f

And then in bb 3 when we see
  _8 = num_5(D) & 65534;

instead of producing 
_8 : [irange] int [0, 65534] MASK 0xfffe VALUE 0x0

operator_bitwise_and::fold_range ought to be able to combine the known bits and
come up with
_8 : [irange] int [0, 65534] MASK 0x VALUE 0x6e,

which if Aldy's bitmask code is working right (:-) should turn into 
_8 : [irange] int [110, 110]

It should be fairly straightforward if operator_cast::op1_range creates the
mask for the range it produces.
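
The two steps discussed above can be sketched in plain C, purely as an
illustration and not as the ranger code: first reverse the truncating cast so
the 32-bit operand inherits the low 16 known bits, then fold the bitwise AND on
known bits.  The `known_bits` struct and both function names are hypothetical,
and a set MASK bit is taken to mean "unknown", as in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical known-bits pair: a set bit in `mask` means "unknown";
   `value` holds the known bits and is zero where `mask` is set.  */
struct known_bits {
  uint32_t mask;
  uint32_t value;
};

/* Reverse a truncating cast: the 16-bit result has known bits `lhs`;
   the 32-bit operand keeps them in its low half and knows nothing
   about its upper half.  */
static struct known_bits reverse_trunc16(struct known_bits lhs)
{
  struct known_bits kb;
  assert((lhs.mask >> 16) == 0 && (lhs.value >> 16) == 0);
  kb.mask = lhs.mask | 0xffff0000u;
  kb.value = lhs.value;
  return kb;
}

/* Fold a bitwise AND: a result bit is known 0 if it is known 0 in
   either operand, known 1 if it is known 1 in both, else unknown.  */
static struct known_bits fold_and(struct known_bits a, struct known_bits b)
{
  uint32_t a0 = ~a.mask & ~a.value, b0 = ~b.mask & ~b.value; /* known 0s */
  uint32_t a1 = ~a.mask & a.value, b1 = ~b.mask & b.value;   /* known 1s */
  uint32_t zeros = a0 | b0;
  uint32_t ones = a1 & b1;
  struct known_bits kb;
  kb.mask = ~(zeros | ones);
  kb.value = ones;
  return kb;
}
```

Feeding in a fully known `0x6f` for the cast result and the constant `65534`
for the second AND operand leaves no unknown bits and a value of `0x6e`, which
corresponds to the singleton [110, 110] mentioned above.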

[Bug c++/113141] [13/14 Regression] ICE on conversion to reference in aggregate initialization

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113141

--- Comment #7 from Andrew Pinski  ---
Note I noticed the testcase in PR 90390 ICEs starting in GCC 13 and it seems
similar to the testcase in comment #0 here.

[Bug c++/86385] calling wrong constructors?

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86385

--- Comment #6 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #5)
> Fixed for GCC 13+ by r13-2964-gbbdb5612f6661f2c64b0c0f1d2291cb59fde2b40 .

Or by r13-2963-g32b2eb59fb9049 .

Anyways both together are needed IIRC.

[Bug c++/86385] calling wrong constructors?

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86385

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |FIXED
   Target Milestone|--- |13.0
 Status|NEW |RESOLVED

--- Comment #5 from Andrew Pinski  ---
Fixed for GCC 13+ by r13-2964-gbbdb5612f6661f2c64b0c0f1d2291cb59fde2b40 .

[Bug tree-optimization/114332] wrong code with _Atomic _BitInt(5) at -O -fwrapv

2024-03-14 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114332

--- Comment #1 from Jakub Jelinek  ---
Given that the x86-64 psABI says:
  \item The value of the unused bits beyond the width of the
   \texttt{_BitInt(N)} value but within the size of the
   \texttt{_BitInt(N)} are unspecified when stored in memory or register.
and that applies not just to N > 64 but also to smaller N, while internally
GCC sign or zero extends only when reading from memory or VAR_DECLs etc., I
think we also need to EXTEND_BITINT the call return values.
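
To show what such an extension has to achieve, here is a minimal sketch in
portable C.  It is not what EXTEND_BITINT expands to; `sext_low_bits` is a
hypothetical helper that treats the bits above the N-bit value as garbage and
recomputes them from the value's sign bit.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `n` bits of `raw` (1 <= n <= 31), ignoring
   whatever the upper bits happen to contain.  */
static int32_t sext_low_bits(uint32_t raw, int n)
{
  uint32_t field, sign;
  assert(n >= 1 && n <= 31);
  field = raw & ((1u << n) - 1u);   /* drop the unspecified bits */
  sign = 1u << (n - 1);             /* sign bit of the n-bit value */
  return (int32_t)(field ^ sign) - (int32_t)sign;
}
```

With n = 5, a register holding `0xabcd0010` and one holding `0x00000010` both
decode to -16, which is the property a caller needs if the callee is allowed
to leave the upper bits unspecified.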

[Bug modula2/114294] expression causes ICE

2024-03-14 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114294

--- Comment #2 from Gaius Mulley  ---
Created attachment 57704
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57704&action=edit
Proposed fix

The proposed fix was to assign a type to the result constant created by HIGH.
The call to PutConst was missing.

[Bug tree-optimization/114331] Missed optimization: indicate knownbits from dominating condition switch(trunc(a))

2024-03-14 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114331

--- Comment #10 from Jakub Jelinek  ---
I really don't know how GORI etc. works.
But, if when the switch handling determines that _1 (the switch controlling
expression) has [irange] [111, 111] MASK 0x0 VALUE 0x6f (does it actually? i.e.
for a singleton range all the bits here are known and equal to the value), then
when trying to derive a range for related num_9(D) which is int rather than
_1's short and
  _1 = (short int) num_5(D);
for the MASK/VALUE we should just use the same VALUE and or in 0x into
MASK because we then don't know anything about the upper bits.
Though, looking at the evrp detailed dump there is
2->3  _1 :  [irange] short int [111, 111]
2->3  _2 :  [irange] int [111, 111]
2->3  num_5(D) :[irange] int [-INF, -65537][-65425, -65425][111,
111][65536, +INF]
and so no MASK/VALUE for any of those ranges.
Now, from comments it seems that irange_bitmask is only computed on demand to
speed things up, unless it has been explicitly set.
Now, say for _1 or _2 above, we don't have anything recorded but we can always
compute it on demand from the value range.  But when adding the num_5(D) range
based on the related _1 range, the on-demand irange_bitmask is no longer as
precise as it would be if we when deriving that [-INF, -65537][-65425,
-65425][111, 111][65536, +INF] range
from the [111, 111] range also derived from the in that case on-demand asked
MASK 0x0 VALUE 0x6f to MASK 0x VALUE 0x6f.
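
For reference, deriving known bits from nothing but a value range can be
sketched as follows.  This is an assumption-laden illustration rather than the
irange_bitmask code: it handles a single unsigned 32-bit subrange, and
`known_bits` and `bits_from_range` are hypothetical names.  All values in
[lo, hi] share the bits above the highest position where lo and hi differ;
that position and everything below it is treated as unknown.

```c
#include <assert.h>
#include <stdint.h>

/* A set bit in `mask` means "unknown"; `value` holds the known bits.  */
struct known_bits {
  uint32_t mask;
  uint32_t value;
};

/* Derive known bits from the unsigned range [lo, hi].  */
static struct known_bits bits_from_range(uint32_t lo, uint32_t hi)
{
  struct known_bits kb;
  uint32_t diff;
  assert(lo <= hi);
  diff = lo ^ hi;
  /* Smear the highest set bit of diff into all lower positions.  */
  diff |= diff >> 1;
  diff |= diff >> 2;
  diff |= diff >> 4;
  diff |= diff >> 8;
  diff |= diff >> 16;
  kb.mask = diff;
  kb.value = lo & ~diff;
  return kb;
}
```

A singleton such as [111, 111] comes out with no unknown bits and value `0x6f`.
A multi-part range like the one for num_5(D) loses that precision when
reduced to bits this way, which is why carrying the mask alongside the range
helps.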

[Bug rtl-optimization/114338] (x & (-1 << y)) should be optimized to ((x >> y) << y) or vice versa

2024-03-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114338

--- Comment #4 from Andrew Pinski  ---
Note I added this to the list of Canonicalization issues in gimple on the wiki:
https://gcc.gnu.org/wiki/GimpleCanonical
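
The two forms named in the bug title compute the same result for unsigned
operands, as the small sketch below illustrates.  The function names are
hypothetical, and the all-ones constant is written as an unsigned value because
left-shifting a negative signed value such as -1 is undefined in ISO C.

```c
#include <assert.h>
#include <stdint.h>

/* Clear the low y bits of x by masking (0 <= y <= 31).  */
static uint32_t clear_low_mask(uint32_t x, unsigned y)
{
  assert(y < 32);
  return x & (~UINT32_C(0) << y);
}

/* Clear the low y bits of x by shifting right, then left.  */
static uint32_t clear_low_shift(uint32_t x, unsigned y)
{
  assert(y < 32);
  return (x >> y) << y;
}
```

Which of the two is cheaper depends on the target, which is presumably why the
choice is treated as a canonicalization question.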
