[Bug tree-optimization/113900] [14 regression] Hang and then ICE in vect_transform_loops, at tree-vectorizer.cc:1031 when building slang-2.3.3 since r14-8925

2024-02-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113900

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #9 from Richard Biener  ---
It's an odd duplicate.  I confirm the fix for PR113902 fixes both the original
and the reduced testcase.

*** This bug has been marked as a duplicate of bug 113902 ***

[Bug tree-optimization/113902] [14 regression] ICE in find_uses_to_rename_use since r14-8925

2024-02-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113902

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Richard Biener  ---
Fixed.

[Bug tree-optimization/113898] [14 regression] ICE in copy_reference_ops_from_ref, at tree-ssa-sccvn.cc:1156 since r14-8929-g938a419182f

2024-02-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113898

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Richard Biener  ---
This one is fixed now.

[Bug tree-optimization/113902] [14 regression] ICE in find_uses_to_rename_use since r14-8925

2024-02-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113902

--- Comment #3 from Richard Biener  ---
*** Bug 113901 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/113901] [14 regression] ICE when building nodejs-20.11.0 (crash in find_uses_to_rename_use)

2024-02-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113901

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #4 from Richard Biener  ---
Duplicate.  The fix for PR113902 works here, too.

*** This bug has been marked as a duplicate of bug 113902 ***

[Bug tree-optimization/113902] [14 regression] ICE in find_uses_to_rename_use since r14-8925

2024-02-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113902

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #2 from Richard Biener  ---
Mine.

[Bug tree-optimization/113900] [14 regression] Hang and then ICE in vect_transform_loops, at tree-vectorizer.cc:1031 when building slang-2.3.3

2024-02-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113900

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2024-02-13
   Keywords|compile-time-hog|
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #5 from Richard Biener  ---
What does -march=native resolve to?  I suppose znver2?  I can confirm the
compile-time-hog even with a release checking GCC 13 compiler, but nothing
really stands out here besides maybe RTL combine and load CSE after reload
(that's a usual suspect).

> gcc-13 slarith.i -S -m32 -mfpmath=sse -O3 -fPIC -march=znver2 
> -fno-strict-aliasing -Waddress -Warray-bounds -Wfree-nonheap-object 
> -Wint-to-pointer-cast -Wmain -Wnonnull -Wodr -Wreturn-type 
> -Wsizeof-pointer-memaccess -Wstrict-aliasing -Wstring-compare -Wuninitialized 
> -Wvarargs -ftime-report

Time variable                                   usr           sys          wall           GGC
 phase setup                    :   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)  2042k (  0%)
 phase parsing                  :   0.13 (  0%)   0.40 ( 20%)   0.53 (  1%)    25M (  1%)
 phase lang. deferred           :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    96  (  0%)
 phase opt and generate         :  46.65 (100%)   1.61 ( 80%)  48.27 ( 99%)  2563M ( 99%)
 garbage collection             :   0.12 (  0%)   0.01 (  0%)   0.12 (  0%)     0  (  0%)
 dump files                     :   0.03 (  0%)   0.00 (  0%)   0.05 (  0%)     0  (  0%)
 callgraph construction         :   0.05 (  0%)   0.00 (  0%)   0.01 (  0%)   552k (  0%)
 callgraph optimization         :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)  2952  (  0%)
 callgraph functions expansion  :  45.66 ( 98%)   1.46 ( 73%)  47.13 ( 97%)  2459M ( 95%)
 callgraph ipa passes           :   0.90 (  2%)   0.15 (  7%)   1.06 (  2%)    60M (  2%)
 ipa function summary           :   0.09 (  0%)   0.00 (  0%)   0.09 (  0%)  9208k (  0%)
 ipa cp                         :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)   175k (  0%)
 ipa inlining heuristics        :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    68k (  0%)
 ipa function splitting         :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)  8528  (  0%)
 ipa pure const                 :   0.02 (  0%)   0.00 (  0%)   0.00 (  0%)  3504  (  0%)
 ipa icf                        :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)    30k (  0%)
 ipa SRA                        :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    37k (  0%)
 ipa modref                     :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)   325k (  0%)
 cfg construction               :   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)  3443k (  0%)
 cfg cleanup                    :   0.52 (  1%)   0.01 (  0%)   0.44 (  1%)    37M (  1%)
 trivially dead code            :   0.11 (  0%)   0.00 (  0%)   0.15 (  0%)     0  (  0%)
 df scan insns                  :   0.07 (  0%)   0.00 (  0%)   0.10 (  0%)    12k (  0%)
 df reaching defs               :   0.37 (  1%)   0.01 (  0%)   0.29 (  1%)     0  (  0%)
 df live regs                   :   1.22 (  3%)   0.01 (  0%)   1.15 (  2%)     0  (  0%)
 df live regs                   :   0.53 (  1%)   0.00 (  0%)   0.65 (  1%)     0  (  0%)
 df must-initialized regs       :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 df use-def / def-use chains    :   0.07 (  0%)   0.00 (  0%)   0.09 (  0%)     0  (  0%)
 df live reg subwords           :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 df reg dead/unused notes       :   0.55 (  1%)   0.00 (  0%)   0.51 (  1%)    24M (  1%)
 register information           :   0.09 (  0%)   0.00 (  0%)   0.09 (  0%)     0  (  0%)
 alias analysis                 :   0.51 (  1%)   0.00 (  0%)   0.48 (  1%)   125M (  5%)
 alias stmt walking             :   0.91 (  2%)   0.22 ( 11%)   0.95 (  2%)    45M (  2%)
 register scan                  :   0.06 (  0%)   0.00 (  0%)   0.04 (  0%)  1524k (  0%)
 rebuild jump labels            :   0.09 (  0%)   0.00 (  0%)   0.04 (  0%)   264  (  0%)
 preprocessing                  :   0.03 (  0%)   0.10 (  5%)   0.12 (  0%)   500k (  0%)
 lexical analysis               :   0.06 (  0%)   0.19 (  9%)   0.20 (  0%)     0  (  0%)
 parser (global)                :   0.00 (  0%)   0.01 (  0%)   0.01 (  0%)  3313k (  0%)
 parser struct body             :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)   165k (  0%)
 parser function body           :   0.04 (  0%)   0.10 (  5%)   0.18 (  0%)    20M (  1%)
 parser inl. func. body         :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)   374k (  0%)
 inline parameters              :   0.04 (  0%)   0.02 (  1%)   0.09 (  0%)   779k (  0%)
 integration:   

[Bug tree-optimization/113895] [14 Regression] ice in in copy_reference_ops_from_ref, at tree-ssa-sccvn.cc:1144

2024-02-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113895

--- Comment #5 from Richard Biener  ---
For the first testcase the issue is bitfields and 'off' being tracked in bytes.
ao_ref_init_from_vn_reference handles this by not using 'off'.
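A minimal sketch of the mismatch, assuming positions are measured in bits as get_ref_base_and_extent does: a byte offset can only represent byte-aligned positions, so a bitfield starting at, say, bit 3 has no exact byte offset. The helper below is hypothetical and not part of GCC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not a GCC function): convert a bit position to a
   byte offset, failing when the position is not a multiple of 8, as is
   typical for bitfield members. */
static bool bit_pos_to_byte_off(int64_t bit_pos, int64_t *byte_off)
{
  assert(bit_pos >= 0);
  if (bit_pos % 8 != 0)
    return false; /* a byte offset would silently drop the low bits */
  *byte_off = bit_pos / 8;
  return true;
}
```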

[Bug tree-optimization/113895] [14 Regression] ice in in copy_reference_ops_from_ref, at tree-ssa-sccvn.cc:1144

2024-02-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113895

--- Comment #4 from Richard Biener  ---
_1 = a[b.1_14][7];

we "correctly" resolve b.1_14 to 1 based on range info which is
[-INF,-1] [1, +INF].  The thing is, the get_ref_base_and_extent code
cannot do anything with this range except adjust max_size to 32
by taking [7] and the overall size of a[] (8 elements) into account.

The reverse-engineering of a constant array index falls apart when faced
with this kind of undefined behavior - and it's the checking code trying
to verify both implementations against each other that fails.

That said, it's

tree asize = TYPE_SIZE (TREE_TYPE (TREE_OPERAND (exp, 0)));
/* We need to adjust maxsize to the whole array bitsize.
   But we can subtract any constant offset seen so far,
   because that would get us outside of the array otherwise.  */
if (known_size_p (maxsize)
    && asize
    && poly_int_tree_p (asize))
  maxsize = wi::to_poly_offset (asize) - bit_offset;

that ends up constraining the access, but the resulting offset is
to a[1][3], and VN comes up with a[1][7].
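To make the arithmetic concrete, here is a small C model of the non-constant-index branch quoted above. It assumes `a` is declared as `int a[2][4]` with a 32-bit `int`, which is one declaration consistent with the numbers in the comment (8 elements, max_size 32); the helper names are hypothetical. Only the constant `[7]` contributes to `bit_offset`, so `maxsize` shrinks to exactly one element and pins the access to flat element 7, i.e. `a[1][3]`, whereas substituting `b = 1` into `a[b][7]` lands somewhere else entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed declaration: int a[ROWS][COLS] with 32-bit int. */
enum { ROWS = 2, COLS = 4, ELT_BITS = 32 };

/* Row-major bit offset of a[i][j], ignoring bounds (as VN's view does). */
static int64_t flat_bit_offset(int64_t i, int64_t j)
{
  return (i * COLS + j) * ELT_BITS;
}

/* Model of the variable-outer-index case: bit_offset holds only the
   constant inner index, and maxsize is what is left of the whole array. */
static int64_t constrained_maxsize(int64_t const_inner_index)
{
  assert(const_inner_index >= 0);
  int64_t asize = (int64_t)ROWS * COLS * ELT_BITS; /* whole array, in bits */
  int64_t bit_offset = const_inner_index * ELT_BITS;
  return asize - bit_offset;
}
```

Here `constrained_maxsize(7)` is 32, equal to one element, so the access starts at bit 224, which is `flat_bit_offset(1, 3)`; the VN form `flat_bit_offset(1, 7)` is 352, past the end of the 256-bit array.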

[Bug tree-optimization/113900] [14 regression] Hang and then ICE in vect_transform_loops, at tree-vectorizer.cc:1031 when building slang-2.3.3

2024-02-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113900

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |14.0
   Keywords||needs-bisection, needs-reduction

[Bug tree-optimization/113898] [14 regression] ICE in copy_reference_ops_from_ref, at tree-ssa-sccvn.cc:1156 since r14-8929-g938a419182f

2024-02-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113898

--- Comment #2 from Richard Biener  ---
 [local count: 101363582]:
# RANGE [irange] int [1, 2]
h_24 = 1;
ivtmp_25 = 1;
e[h_24][_9] = c.5_10;

so there's a missed CCP (this is late FRE).  We massaged it to e[1][1] but
it should have been e[1][0] instead.

Oops.  Testing fix.

[Bug tree-optimization/113898] [14 regression] ICE in copy_reference_ops_from_ref, at tree-ssa-sccvn.cc:1156 since r14-8929-g938a419182f

2024-02-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113898

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2024-02-13
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Ever confirmed|0   |1
   Target Milestone|--- |14.0

--- Comment #1 from Richard Biener  ---
Looking.

[Bug tree-optimization/113896] [12 Regression] Assigning array elements in the wrong order after floating point optimization since r12-8841

2024-02-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113896

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2024-02-13
   Keywords|needs-bisection |
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Priority|P3  |P2

--- Comment #3 from Richard Biener  ---
Hmm, OK, it was a backport..  I'll see.

[Bug tree-optimization/113896] [12 Regression] Assigning array elements in the wrong order after floating point optimization since r12-8841

2024-02-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113896

Richard Biener  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org
   Keywords||needs-bisection

--- Comment #2 from Richard Biener  ---
what fixed it?

[Bug tree-optimization/113895] [14 Regression] ice in in copy_reference_ops_from_ref, at tree-ssa-sccvn.cc:1144

2024-02-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113895

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2024-02-13
 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #3 from Richard Biener  ---
I will have a look.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #39 from Richard Biener  ---
(In reply to H.J. Lu from comment #32)
> (In reply to Michael Matz from comment #31)
> > (In reply to H.J. Lu from comment #30)
> > > (In reply to Michael Matz from comment #29)
> > > > It not only can call malloc.  As the backtrace of H.J. shows, it quite
> > > > clearly _does_ so :-)
> > > 
> > > ld.so can only call the malloc implementation internal to ld.so.
> > 
> > (And string functions for initializing that memory)  If that's ensured
> > already
> > everywhere: super.  Because I agree, that this is the best thing to do here.
> > From my perspective this is pure internal implementation details and hence
> > setting up thread-local areas should not be expected to be interposable by
> > users.
> > (a custom allocator that isn't malloc or doesn't interact with it also would
> > work)
> 
> Since ia32 ld.so in glibc is compiled with:
> 
> Makefile:rtld-CFLAGS += -mno-sse -mno-mmx -mfpmath=387
> 
> ia32 _dl_tlsdesc_dynamic is OK.

Maybe also use -minline-all-stringops to avoid using IFUNC accelerated
memset/memcpy?

[Bug target/113847] [14 Regression] 10% slowdown of 462.libquantum on AMD Ryzen 7700X and Ryzen 7900X

2024-02-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113847

Richard Biener  changed:

   What|Removed |Added

 CC||jamborm at gcc dot gnu.org

--- Comment #5 from Richard Biener  ---
Also CCing Martin, who should know how/why IPA SRA doesn't reconstruct the
component ref chain here or why it chooses the dynamic type as it does
(possibly local SRA when fully scalarizing an aggregate copy does the same).

[Bug target/113847] [14 Regression] 10% slowdown of 462.libquantum on AMD Ryzen 7700X and Ryzen 7900X

2024-02-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113847

Richard Biener  changed:

   What|Removed |Added

 CC||hubicka at gcc dot gnu.org

--- Comment #4 from Richard Biener  ---
Hmm, the important one is actually MEM[ptr + CST] vs MEM[ptr].component.  But
those are not semantically equivalent, even when the same TBAA type is in
effect.

  _31 = MEM  [(struct quantum_reg *)reg_3(D)];
  _33 = MEM  [(struct quantum_reg *)reg_3(D) + 8B];
  _34 = MEM  [(struct quantum_reg *)reg_3(D) + 16B];
  _35 = MEM  [(struct quantum_reg *)reg_3(D) + 24B];
  out = quantum_state_collapse.isra (pos_1(D), result_22, _31, _32, _33, _34,
_35); [return slot optimization]

this is from inlined quantum_state_collapse where IPA SRA is eventually
applied producing the above.

That we do produce those might hint that we can't really assume the
dynamic type quantum_reg is at offset 8, but that was the original intent.
What we are left with is the special-case where typeof (MEM[ptr + CST])
== typeof (alias-pointed-to-type) (with CST == 0).  For any other case
what we know is only that the access MEM[ptr + CST] is to somewhere
inside an object of dynamic type quantum_reg?

I'm not sure that's not less than we make use of in the alias-oracle,
esp. aliasing_component_refs_walk and friends?  We might be fine in
practice for "bare" MEM_REFs like the above, but if we ever fold only
part of the access path into the constant offset funny things may happen?

So I think IPA SRA does wrong here (and maybe GCC in other places as well),
possibly only pessimizing and possibly creating latent wrong-code.
Note quantum_state_collapse has

  reg$size_62 = reg.size;
  reg$node_75 = reg.node;
...

pre-IPA.

Honza, any opinion?

[Bug tree-optimization/113831] [11/12/13 Regression] Wrong VN with structurally identical ref since r9-398

2024-02-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113831

Richard Biener  changed:

   What|Removed |Added

  Known to work||14.0
    Summary|[11/12/13/14 Regression] Wrong VN with structurally identical ref since r9-398|[11/12/13 Regression] Wrong VN with structurally identical ref since r9-398

--- Comment #7 from Richard Biener  ---
Fixed on trunk so far.

[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7

2024-02-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734

--- Comment #21 from Richard Biener  ---
loop->nb_iterations_upper_bound is exactly an upper bound on the number of
latch executions, so maybe I'm missing the point here.  When we update it, it
also has to reflect an upper bound on that, whether the last exit (the one
before the latch) is the IV exit or a vectorized early exit.

But yes, if the last exit is an early one that last iteration might be partial
(so we drop the -1), but that's what we already do?

[Bug target/113847] [14 Regression] 10% slowdown of 462.libquantum on AMD Ryzen 7700X and Ryzen 7900X

2024-02-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113847

--- Comment #3 from Richard Biener  ---
I can't confirm a regression (testing r14-8925-g1e3f78dbb328a2 with the
offending rev reverted vs bare).

462.libquantum  20720   61.9335 S   20720   62.6331 *
462.libquantum  20720   62.2333 *   20720   61.9335 S
462.libquantum  20720   62.4332 S   20720   62.7330 S

so the "best" run with the change is faster than the best run with it reverted
while the worst runs are the same.

There are only code-gen changes in quantum_bmeasure.part.0, and we can see
it's likely

{component_ref,mem_ref<0B>,reg_3(D)}@.MEM_166 (0030)

vs

{component_ref,mem_ref<0B>,reg_3(D)}@.MEM_9 (0022)

where once the size is 256 and once 64.  The types are

  constant 256>
unit-size  constant 32>

vs.

 
unit-size 

the former is subsetted by a COMPONENT_REF to eventually

 >
unsigned DI

so we have basically MEM vs. MEM.member-with-off.

That's indeed a case where we might like to avoid applying this fix, but
maybe only when strict-aliasing is in effect.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #17 from Richard Biener  ---
(In reply to Richard Biener from comment #16)
> I do wonder why __tls_get_addr would have to call the overloaded malloc, can
> we just not force-bind it to the glibc local malloc (and make sure that's
> compiled with -mgeneral-regs-only)?

I realize we end up calling memset (but __mempcpy?) as well, that might
end up in an ifunc and thus using non-general regs as well (and be
overloaded of course).  So the whole __tls_get_addr path would need to
make sure it never goes out of glibc controlled sources.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #16 from Richard Biener  ---
I do wonder why __tls_get_addr would have to call the overloaded malloc, can
we just not force-bind it to the glibc local malloc (and make sure that's
compiled with -mgeneral-regs-only)?

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

Richard Biener  changed:

   What|Removed |Added

 CC||matz at gcc dot gnu.org

--- Comment #14 from Richard Biener  ---
True.  Maybe the kernel VDSO should have a _save_all_regs (fnptr) and
"indirector" ...

[Bug tree-optimization/113863] [14 Regression] ICE verify_ssa failed with -O3 -msse4.1 since r14-8768

2024-02-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113863

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #4 from Richard Biener  ---
Fixed.

[Bug target/113847] [14 Regression] 10% slowdown of 462.libquantum on AMD Ryzen 7700X and Ryzen 7900X

2024-02-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113847

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
   Last reconfirmed||2024-02-12
 Ever confirmed|0   |1

--- Comment #2 from Richard Biener  ---
I will try to investigate.  Note this was a correctness fix, it could be
relaxed a tiny bit but behavior will then depend on the order of processing of
blocks not ordered by RPO.

[Bug target/113882] V4SF->V4HI could be implemented using V4SF->V4SI and then truncation to V4HI

2024-02-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113882

Richard Biener  changed:

   What|Removed |Added

 Blocks||53947

--- Comment #1 from Richard Biener  ---
The vectorizer has some of these tricks, but the intermediate conversion allowed
is somewhat hard-coded.  I think the C standard says SF -> HI invokes undefined
behavior on overflow, so the conversion should be valid.
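A minimal sketch of the suggested lowering using GCC's generic vector extensions and `__builtin_convertvector` (available in GCC 9 and later, and in Clang); the type and function names are illustrative. For inputs whose truncated value fits in `short`, converting directly and converting via `int` lanes give the same result; for out-of-range inputs the direct float-to-short conversion is already undefined in C, which is what makes the two-step form a valid replacement.

```c
#include <assert.h>

typedef float v4sf __attribute__((vector_size(16)));
typedef int   v4si __attribute__((vector_size(16)));
typedef short v4hi __attribute__((vector_size(8)));

/* Direct V4SF -> V4HI conversion (each lane truncates toward zero). */
static v4hi cvt_direct(v4sf x)
{
  return __builtin_convertvector(x, v4hi);
}

/* Two-step lowering: V4SF -> V4SI, then integer truncation to V4HI. */
static v4hi cvt_via_v4si(v4sf x)
{
  v4si wide = __builtin_convertvector(x, v4si);
  return __builtin_convertvector(wide, v4hi);
}
```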


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations

[Bug tree-optimization/113879] missed optimization - not exploiting known range of integers

2024-02-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113879

Richard Biener  changed:

   What|Removed |Added

 Blocks||85316
 CC||amacleod at redhat dot com

--- Comment #1 from Richard Biener  ---
VRP has difficulties with cycles.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85316
[Bug 85316] [meta-bug] VRP range propagation missed cases

[Bug sanitizer/113878] missed optimization with sanitizer and signed integer overflow

2024-02-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113878

--- Comment #9 from Richard Biener  ---
I'd very much appreciate getting rid of TYPE_OVERFLOW_SANITIZED checks by doing
instrumentation in the frontends.

Note we do

#define TYPE_OVERFLOW_UNDEFINED(TYPE)   \
  (POINTER_TYPE_P (TYPE)\
   ? !flag_wrapv_pointer\
   : (!ANY_INTEGRAL_TYPE_CHECK(TYPE)->base.u.bits.unsigned_flag \
  && !flag_wrapv && !flag_trapv))

it might be tempting to do && !flag_trapv && !(flag_sanitize &
SANITIZE_SI_OVERFLOW) instead to get more complete coverage of disabling
foldings.

_Maybe_ we could clear SANITIZE_SI_OVERFLOW once instrumentation is complete?
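The proposed extra condition can be modeled as a plain predicate. This is a hypothetical sketch: the struct fields stand in for the `flag_wrapv`, `flag_wrapv_pointer` and `flag_trapv` globals and for the `SANITIZE_SI_OVERFLOW` bit of `flag_sanitize`, and the functions only mirror the logic of the macro quoted above rather than use GCC's tree accessors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's option globals. */
struct overflow_opts {
  bool wrapv;                /* flag_wrapv (-fwrapv) */
  bool wrapv_pointer;        /* flag_wrapv_pointer (-fwrapv-pointer) */
  bool trapv;                /* flag_trapv (-ftrapv) */
  bool sanitize_si_overflow; /* flag_sanitize & SANITIZE_SI_OVERFLOW */
};

/* Mirrors the current TYPE_OVERFLOW_UNDEFINED logic. */
static bool overflow_undefined(const struct overflow_opts *o,
                               bool is_pointer, bool is_unsigned)
{
  assert(o != NULL);
  if (is_pointer)
    return !o->wrapv_pointer;
  return !is_unsigned && !o->wrapv && !o->trapv;
}

/* Proposed variant: while signed-integer-overflow instrumentation is
   requested, signed overflow is not treated as undefined, so foldings
   that rely on it are disabled. */
static bool overflow_undefined_proposed(const struct overflow_opts *o,
                                        bool is_pointer, bool is_unsigned)
{
  assert(o != NULL);
  if (is_pointer)
    return !o->wrapv_pointer;
  return !is_unsigned && !o->wrapv && !o->trapv && !o->sanitize_si_overflow;
}
```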

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

Richard Biener  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org

--- Comment #10 from Richard Biener  ---
I think a glibc fix would be very much preferred.  Is -mtls-dialect=gnu2
supposed to work on a per-TU basis, or are all parts of an executable + loaded
shlibs required to have the same setting?

[Bug target/113871] psrlq is not used for PERM

2024-02-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113871

Richard Biener  changed:

   What|Removed |Added

 Target|x86_64  |x86_64-*-* i?86-*-*
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-02-12

--- Comment #3 from Richard Biener  ---
Confirmed.

[Bug middle-end/113867] [14 Regression][OpenMP] Wrong code with mapping pointers in structs

2024-02-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113867

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |14.0

[Bug tree-optimization/113863] [14 Regression] ICE verify_ssa failed with -O3 -msse4.1 since r14-8768

2024-02-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113863

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #2 from Richard Biener  ---
It looks OK after peeling.  LOOP_VINFO_EARLY_BRK_VUSES is empty, but we have
a stray virtual PHI in the body we fail to update:

 [local count: 446046556]:
# .MEM_164 = PHI <.MEM_163(166)>
if (f_8(D) < l_162)
  goto ; [88.31%]
else
  goto ; [11.69%]

things go downhill from here.

[Bug c++/113852] -Wsign-compare doesn't warn on unsigned result types

2024-02-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113852

Richard Biener  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org

--- Comment #6 from Richard Biener  ---
Well, given that a1 * a2 is carried out in 'int', you are invoking undefined
behavior if it overflows.  GCC assumes that doesn't happen, so it's correct
to elide the diagnostic, unless you make overflow well-defined with -fwrapv.

I think that errs on the right side for the purpose of -Wsign-compare.
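The exact operand types from the bug are not quoted here. Assuming, for illustration, that `a1` and `a2` are `unsigned short`, integer promotion turns both into `int`, so the product is a signed `int` multiplication whose overflow is undefined. The sketch checks the type with C11 `_Generic` and shows a cast that makes the arithmetic well-defined without relying on `-fwrapv`; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* C11 _Generic: true when the (unevaluated) expression has type int. */
#define HAS_TYPE_INT(expr) _Generic((expr), int: true, default: false)

/* Assumed operand types: unsigned short promotes to int on targets where
   int is wider than short, so this multiplication is done in int. */
static bool product_is_int(unsigned short a1, unsigned short a2)
{
  return HAS_TYPE_INT(a1 * a2);
}

/* Casting one operand to unsigned int makes the whole product unsigned,
   so wraparound is well-defined. */
static unsigned int product_unsigned(unsigned short a1, unsigned short a2)
{
  return (unsigned int)a1 * a2;
}
```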

[Bug middle-end/108410] x264 averaging loop not optimized well for avx512

2024-02-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410

--- Comment #10 from Richard Biener  ---
So this is now fixed if you use --param vect-partial-vector-usage=2.  At the
moment there is no way to get masking and not masking costed against each
other.  In theory vect_analyze_loop_costing and vect_estimate_min_profitable_iters
could do both, and we could delay vect_determine_partial_vectors_and_peeling.
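For reference, a loop of the general shape at issue, a rounding byte average, is sketched below. It is a hypothetical reduction, not the x264 source. Building it at -O3 with AVX-512 enabled, once with and once without `--param vect-partial-vector-usage=2`, is one way to compare the masked and unmasked code the comment refers to.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical loop in the spirit of the x264 averaging kernel: the
   rounding average of two byte rows.  With an unknown n the vectorizer
   may emit a separate epilogue for the tail or, with partial vectors
   enabled, mask the final iteration instead. */
static void avg_rows(uint8_t *restrict dst, const uint8_t *restrict a,
                     const uint8_t *restrict b, int n)
{
  assert(n >= 0);
  for (int i = 0; i < n; i++)
    dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}
```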

[Bug middle-end/108376] TSVC s1279 runs 40% faster with aocc than gcc at zen4

2024-02-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108376

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |WONTFIX
 Status|NEW |RESOLVED

--- Comment #4 from Richard Biener  ---
So I'd say INVALID or WONTFIX.

[Bug rust/113499] crab1 fails to link when configuring with --disable-plugin

2024-02-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113499

--- Comment #3 from Richard Biener  ---
(In reply to Richard Biener from comment #2)
> Re-confirmed.  Can be reproduced both on a glibc 2.31 and glibc 2.38 system
> with

It does work with glibc 2.38, so only glibc 2.31 fails this (and possibly other
OS).

[Bug rust/113499] crab1 fails to link when configuring with --disable-plugin

2024-02-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113499

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2024-02-09
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #2 from Richard Biener  ---
Re-confirmed.  Can be reproduced both on a glibc 2.31 and glibc 2.38 system
with

../src/configure --enable-languages=rust --disable-bootstrap --disable-plugin

See GCC_ENABLE_PLUGIN, which adjusts 'pluginlibs' but also causes symbols to
be exported from the executable.  You need to figure out what you need.  For
example, the 'jit' frontend also requires this (--enable-host-shared), but
IIRC it doesn't require -ldl.

Some hosts may not support dynamically loading things.

[Bug target/113615] internal compiler error: in extract_insn, at recog.cc:2812

2024-02-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113615

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from Richard Biener  ---
This seems fixed now.

[Bug rtl-optimization/101188] [11/12/13 Regression] [postreload] Uses content of a clobbered register

2024-02-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101188

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |law at gcc dot gnu.org
   Target Milestone|--- |11.5
 Status|REOPENED|ASSIGNED

[Bug target/113847] [14 Regression] 10% slowdown of 462.libquantum on AMD Ryzen 7700X and Ryzen 7900X

2024-02-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113847

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |14.0

[Bug modula2/113848] modula2 doesn't build with clang

2024-02-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113848

--- Comment #1 from Richard Biener  ---
void * arithmetic is a GCC extension; I suggest changing that to char *.
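A minimal illustration of the suggested change, written as a standalone C helper because the affected sources are not quoted here; the helper name is hypothetical. ISO C and C++ do not define arithmetic on `void *` (GCC accepts it as an extension and can warn with `-Wpointer-arith`), so the pointer is converted to `char *` first, where a step of one is one byte.

```c
#include <assert.h>
#include <stddef.h>

/* Portable replacement for the GNU-only `(void *)p + n`: do the byte
   arithmetic on char * and let the result convert back to void *. */
static void *advance_bytes(void *p, size_t n)
{
  return (char *)p + n;
}
```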

[Bug tree-optimization/113849] wrong code with _BitInt() arithmetics at -O1

2024-02-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113849

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2024-02-09

--- Comment #1 from Richard Biener  ---
Confirmed.

[Bug tree-optimization/113831] [11/12/13/14 Regression] Wrong VN with structurally identical ref since r9-398

2024-02-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113831

--- Comment #5 from Richard Biener  ---
So we have equal vn_reference but with different ao_ref.  Note the recorded
vn_reference has value-numbers in operands (not sanitized via AVAIL to a
specific location) but the ao_ref is eventually initialized from
get_ref_base_and_extent on the original ref which can use context sensitive
info.  That doesn't actually compute a constant array index from a variable
one but instead it constrains the extent of the access which eventually
gets to max_size == size.

To apply the same logic consistently to the VN representation (which is
eventually valueized) we can only look at ranges on names either from the
original ref (during copy_reference_ops_from_ref) or when valueizing with
AVAIL in mind.  For consistency operating from copy_reference_ops_from_ref
would be preferred.

It's going to be quite sophisticated to reverse-engineer all constant
array indexes from the overall [offset, offset + size] computed by
get_ref_base_and_extent (we definitely want to do that only once per
copy_reference_ops_from_ref).  For PRE we do need all the components,
so we have to somehow post-process the vn_reference ops.

The other possibility for a fix would be to try to fend off ranges being
used by get_ref_base_and_extent (but only for the calls on the refs
we're going to insert into the expression hash table).  get_range_query
cannot be tricked so it would be an extra arg to get_ref_base_and_extent
and possibly ao_ref_init.  That sounds a bit ugly.

I will try to implement the post-processing.
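The post-processing idea can be sketched in isolation. Once get_ref_base_and_extent has pinned an access to a constant bit offset with max_size == size, the row-major indices follow from repeated division. This is a hypothetical helper for a plain multi-dimensional array with known dimensions, not GCC code; real references also involve lower bounds, component refs and variable-sized types, all of which it ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: recover row-major indices idx[0..ndims-1] from a constant
   bit offset.  Returns false if the offset is not on an element boundary
   or lies outside the array. */
static bool indices_from_bit_offset(int64_t bit_offset, const int64_t *dims,
                                    int ndims, int64_t elt_bits,
                                    int64_t *idx)
{
  assert(ndims > 0 && elt_bits > 0);
  if (bit_offset < 0 || bit_offset % elt_bits != 0)
    return false;
  int64_t flat = bit_offset / elt_bits;
  /* Peel indices from the innermost dimension outwards. */
  for (int d = ndims - 1; d >= 0; d--) {
    idx[d] = flat % dims[d];
    flat /= dims[d];
  }
  return flat == 0; /* a nonzero remainder means we ran past the end */
}
```

For example, with dimensions {2, 4} and 32-bit elements, bit offset 224 yields the indices {1, 3}.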

[Bug middle-end/113205] [14 Regression] internal compiler error: in backward_pass, at tree-vect-slp.cc:5346 since r14-3220

2024-02-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113205

--- Comment #10 from Richard Biener  ---
Btw, I was hoping Richard would chime in here ...

[Bug libstdc++/113835] [13/14 Regression] compiling std::vector with const size in C++20 is slow

2024-02-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113835

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2024-02-09
   Target Milestone|--- |13.3
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
  Known to fail||13.2.1, 14.0
  Component|c++ |libstdc++
  Known to work||12.2.1
    Summary|compiling std::vector with const size in C++20 is slow|[13/14 Regression] compiling std::vector with const size in C++20 is slow

--- Comment #1 from Richard Biener  ---
Confirmed with -std=c++20 -fsyntax-only

 constant expression evaluation :   1.80 ( 85%)   0.03 ( 14%)   1.84 ( 78%)   220M ( 88%)
 TOTAL                          :   2.13          0.22          2.36           250M


Samples: 8K of event 'cycles', Event count (approx.): 9294971478
Overhead   Samples  Command  Shared Object  Symbol
  16.33%      1385  cc1plus  cc1plus        [.] cxx_eval_constant_expression
   4.35%       369  cc1plus  cc1plus        [.] cxx_eval_call_expression
   3.90%       331  cc1plus  cc1plus        [.] cxx_eval_store_expression
   3.16%       268  cc1plus  cc1plus        [.] hash_table::find_s
   1.98%       168  cc1plus  cc1plus        [.] tree_operand_check

GCC 12 was fast (possibly std::vector wasn't constexpr there?)

[Bug tree-optimization/113833] 435.gromacs fails verification on with -Ofast -march={cascadelake,icelake-server} and PGO after r14-7272-g57f611604e8bab

2024-02-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113833

--- Comment #3 from Richard Biener  ---
I suspect the issue would pop up with -Ofast -fno-vect-cost-model for any
sub-architecture.  The patch referenced just adjusts costs for doing BB
vectorization (and there are reductions there as well).  It might be interesting
to offer more high-level knobs to tune for vectorization, say
-fno-vect-bb-reduction or -fforce-in-order-bb-reduction-vectorization.

Comparing the -fopt-info-vec output from before and after the patch might show
the few cases that are affected by it.

[Bug tree-optimization/113831] [11/12/13/14 Regression] Wrong VN with structurally identical ref since r9-398

2024-02-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113831

Richard Biener  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108355

--- Comment #4 from Richard Biener  ---
The related bug might also be fixed then.

[Bug tree-optimization/113831] [11/12/13/14 Regression] Wrong VN with structurally identical ref since r9-398-g6b9fc1782effc67dd9f6def16207653d79647553

2024-02-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113831

--- Comment #2 from Richard Biener  ---
I think the issue is that we're using range info for get_ref_base_and_extent
but we fail to do so when valueizing refs.

[Bug tree-optimization/113831] [11/12/13/14 Regression] Wrong VN with structurally identical ref since r9-398-g6b9fc1782effc67dd9f6def16207653d79647553

2024-02-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113831

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED
   Keywords||wrong-code
   Target Milestone|--- |11.5
   Last reconfirmed||2024-02-08

--- Comment #1 from Richard Biener  ---
Mine.

[Bug tree-optimization/113774] wrong code with _BitInt() arithmetics at -O2

2024-02-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113774

--- Comment #7 from Richard Biener  ---
(In reply to Jakub Jelinek from comment #6)
> Thanks.
> The #c5 reduced testcase started to be miscompiled with
> r9-398-g6b9fc1782effc67dd9f6def16207653d79647553
> Perhaps we should move that to a separate bug so that it can be marked
> [11/12/13/14 Regression] and leave this just for the bitint lowering
> enhancements not to emit clearly always true or always false conditions if
> possible.

PR113831

[Bug tree-optimization/113831] New: [11/12/13/14 Regression] Wrong VN with structurally identical ref since r9-398-g6b9fc1782effc67dd9f6def16207653d79647553

2024-02-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113831

Bug ID: 113831
   Summary: [11/12/13/14 Regression] Wrong VN with structurally
identical ref since
r9-398-g6b9fc1782effc67dd9f6def16207653d79647553
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

The following is miscompiled by FRE with -O2

int a[3];
int __attribute__((noipa))
foo(int i, int x)
{
  int tem = 0;
  a[2] = x;
  if (i < 1)
++i;
  else
{
  ++i;
  tem = a[i];
}
  tem += a[i];
  return tem;
}

int main() { if (foo (0, 7) != 0) __builtin_abort(); }

[Bug tree-optimization/113774] wrong code with _BitInt() arithmetics at -O2

2024-02-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113774

--- Comment #5 from Richard Biener  ---
This must go wrong during alias disambiguation, somehow figuring we can ignore
the backedge?!  The ref we hoist is

  _68 = VIEW_CONVERT_EXPR(b)[_146];

where _146 is _49 + 1, but _49 is an IV:

  _134 = _105 & 1;
  MEM  [(unsigned _BitInt(257) *) + 32B] = _134;

   [local count: 1073741824]:
  # _49 = PHI <0(4), _50(28)>

it's also odd that we seem to arrive at b + 32B.

Value numbering stmt = _146 = PHI <_145(8), _140(31)>
Setting value number of _146 to _140 (changed)
Making available beyond BB10 _146 for value _140
...
Value numbering stmt = .MEM_150 = PHI <.MEM_149(8), .MEM_139(31)>
Setting value number of .MEM_150 to .MEM_150 (changed)
Value numbering stmt = _68 = VIEW_CONVERT_EXPR(b)[_146];
Setting value number of _68 to _134 (changed)

huh.

Hmm.  But we have

  # RANGE [irange] sizetype [4, 4][6, +INF] MASK 0xfffe VALUE 0x1
  _140 = _49 + 1;

  # RANGE [irange] sizetype [1, 2][4, 4][6, +INF] MASK 0xfffe VALUE 0x1
  # _146 = PHI <_145(8), _140(6)>

we should look at the range of _146

Hmm, I _think_ I know what happens.  We have

 [local count: 1073741824]:
# _49 = PHI <0(4), _50(28)>
# _55 = PHI <0(4), _56(28)>
_51 = VIEW_CONVERT_EXPR(b)[_49];
if (_49 <= 2)
  goto ; [80.00%]
else
  goto ; [20.00%]

 [local count: 214748360]:
_135 = .USUBC (0, _51, _55);
_136 = IMAGPART_EXPR <_135>;
_137 = REALPART_EXPR <_135>;
_138 = _51 | _137;
bitint.6[_49] = _138;
_140 = _49 + 1;
_141 = VIEW_CONVERT_EXPR(b)[_140];

and this is the "same" valueized ref (what gets recorded in the hashtable),
but here we can see that _140 >= 4, which makes it known to be 4 based on the
array extent.  This matches it up with the store of _134:

Value numbering stmt = _141 = VIEW_CONVERT_EXPR(b)[_140];
Setting value number of _141 to _134 (changed)
_134 is available for _134

We record the expression with the VUSE of the definition.  Later, when we
look up the same expression from the later block (where _140 isn't known
to be 4), looking with the VUSE of the definition finds the very same
expression, so we take the entry already in the hashtable, which has been
assigned the value _134, and then boom.
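
The crux above is that range info plus the array extent turns a variable index into a constant one. A minimal sketch of that deduction, assuming a single-interval range (GCC's ranges, like [4, 4][6, +INF] above, can have several intervals, but clamping [4, +INF] to b viewed as five 64-bit limbs, which matches the store at + 32B, gives the same answer); index_from_extent is a hypothetical helper, not GCC's get_ref_base_and_extent:

```c
/* Hypothetical helper (not a GCC function): given a value range
   [lo, hi] for an array index and the array length n, return 1 and
   store the index in *idx when exactly one index in the range is in
   bounds.  Any other index would make the access undefined, so the
   reference can be treated as using that constant index.  */
static int
index_from_extent (long lo, long hi, long n, long *idx)
{
  long in_lo = lo < 0 ? 0 : lo;          /* clamp the range to [0, n - 1] */
  long in_hi = hi > n - 1 ? n - 1 : hi;
  if (in_lo != in_hi)
    return 0;                            /* empty, or several candidates */
  *idx = in_lo;
  return 1;
}
```

For the PR113831 testcase, index_from_extent (2, LONG_MAX, 3, &idx) pins a[i] to a[2] in the else-branch, while after the join (i unconstrained) nothing is pinned. Because the result depends on flow-sensitive range info, a reference valueized this way is only valid where that range holds; recording it in a hashtable that is also consulted from other blocks is what yields the wrong value here.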

Something like the following is miscompiled at -O2 by FRE.

int a[3];
int __attribute__((noipa))
foo(int i, int x)
{
  int tem = 0;
  a[2] = x;
  if (i < 1)
++i;
  else
{
  ++i;
  tem = a[i];
}
  tem += a[i];
  return tem;
}

int main() { if (foo (0, 7) != 0) __builtin_abort(); }

[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7

2024-02-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734

--- Comment #11 from Richard Biener  ---
(In reply to Tamar Christina from comment #10)
> (In reply to Richard Biener from comment #9)
> > Another bug in the dependence checking code is
> > 
> > if (dr_may_alias_p (dr_ref, dr_read, loop_nest))
> > 
> > which will end up using TBAA - dr_may_alias_p doesn't think you are ever
> > going to move stores down across loads.  To verify if that's possible
> > you need to use
> > 
> > if (dr_may_alias_p (dr_read, dr_ref, loop_nest))
> > 
> > instead.
> > 
> > Note there's still my very original review consideration that you move
> > stmts out-of-order but the main dependence checking the vectorizer does
> > assumes the stores and loads appear in their original order.  I'm not
> > sure whether with the above we prove this doesn't matter.
> 
> But in the original review I had it that way and you said:
> 
> > + for (auto dr_read : bases)
> > +   if (dr_may_alias_p (dr_read, dr_ref, loop_nest))
> 
> I think you need to swap dr_read and dr_ref operands, since you
> are walking stmts backwards and thus all reads from 'bases' are
> after the write.
> 
> so I'm somewhat confused..

I was confused.

[Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c

2024-02-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #42 from Richard Biener  ---
And the do_store_flag part:

diff --git a/gcc/expr.cc b/gcc/expr.cc
index fc5e998e329..44d64274071 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -13693,6 +13693,19 @@ do_store_flag (sepops ops, rtx target, machine_mode mode)
     subtarget = 0;

   expand_operands (arg0, arg1, subtarget, &op0, &op1, EXPAND_NORMAL);
+  unsigned HOST_WIDE_INT nunits;
+  if (VECTOR_BOOLEAN_TYPE_P (type)
+      && operand_mode == QImode
+      && TYPE_VECTOR_SUBPARTS (type).is_constant (&nunits)
+      && nunits < BITS_PER_UNIT)
+    {
+      op0 = expand_binop (mode, and_optab, op0,
+                          GEN_INT ((1 << nunits) - 1), NULL_RTX,
+                          true, OPTAB_WIDEN);
+      op1 = expand_binop (mode, and_optab, op1,
+                          GEN_INT ((1 << nunits) - 1), NULL_RTX,
+                          true, OPTAB_WIDEN);
+    }

   if (target == 0)
     target = gen_reg_rtx (mode);


for the testcase

typedef long v4si __attribute__((vector_size(4*sizeof(long))));
typedef v4si v4sib __attribute__((vector_mask));
typedef _Bool sbool1 __attribute__((signed_bool_precision(1)));
_Bool x;
void __GIMPLE (ssa) foo (v4sib v1, v4sib v2)
{
  v4sib tem;
  _Bool _7;

__BB(2):
  tem_5 = ~v2_2(D);
  tem_3 = v1_1(D) | tem_5;
  tem_4 = _Literal (v4sib) { _Literal (sbool1) -1, _Literal (sbool1) -1,
                             _Literal (sbool1) -1, _Literal (sbool1) -1 };
  _7 = tem_3 == tem_4;
  x = _7;
  return;
}

[Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c

2024-02-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #41 from Richard Biener  ---
(In reply to Hongtao Liu from comment #38)
> > I think we should also mask off the upper bits of variable mask?
> > 
> > notl    %esi
> > orl     %esi, %edi
> > notl    %edi
> > andl    $15, %edi
> > je      .L3
> 
> with -mbmi, it's 
> 
> andn    %esi, %edi, %edi
> andl    $15, %edi
> je      .L3

Well, yes, the discussion in this bug was whether to do this at consumers
(that's something new) or with all mask operations (that's how we handle
bit-precision integer operations, so it might be relatively easy to
do that; specifically, spotting the places that eventually need adjustment).

There's do_store_flag to fix up for uses not in branches, and
do_compare_and_jump for conditional jumps.

Note the AND is removed by combine if I add it:

Successfully matched this instruction:
(set (reg:CCZ 17 flags)
(compare:CCZ (and:HI (not:HI (subreg:HI (reg:QI 102 [ tem_3 ]) 0))
(const_int 15 [0xf]))
(const_int 0 [0])))

(*testhi_not)

-9: {r103:QI=r102:QI&0xf;clobber flags:CC;}
+  REG_DEAD r99:QI
+9: NOTE_INSN_DELETED
+   12: flags:CCZ=cmp(~r102:QI#0&0xf,0)
   REG_DEAD r102:QI
-  REG_UNUSED flags:CC
-   12: flags:CCZ=cmp(r103:QI,0xf)
-  REG_DEAD r103:QI

and we get

foo:
.LFB0:
        .cfi_startproc
        notl    %esi
        orl     %esi, %edi
        notl    %edi
        testb   $15, %dil
        je      .L6
        ret

which I'm not sure is OK?

diff --git a/gcc/dojump.cc b/gcc/dojump.cc
index e2d2b3cb111..784707c1e55 100644
--- a/gcc/dojump.cc
+++ b/gcc/dojump.cc
@@ -1266,6 +1266,7 @@ do_compare_and_jump (tree treeop0, tree treeop1, enum rtx_code signed_code,
   machine_mode mode;
   int unsignedp;
   enum rtx_code code;
+  unsigned HOST_WIDE_INT nunits;

   /* Don't crash if the comparison was erroneous.  */
   op0 = expand_normal (treeop0);
@@ -1308,6 +1309,18 @@ do_compare_and_jump (tree treeop0, tree treeop1, enum rtx_code signed_code,
       emit_insn (targetm.gen_canonicalize_funcptr_for_compare (new_op1, op1));
       op1 = new_op1;
     }
+  else if (VECTOR_BOOLEAN_TYPE_P (type)
+           && mode == QImode
+           && TYPE_VECTOR_SUBPARTS (type).is_constant (&nunits)
+           && nunits < BITS_PER_UNIT)
+    {
+      op0 = expand_binop (mode, and_optab, op0,
+                          GEN_INT ((1 << nunits) - 1), NULL_RTX,
+                          true, OPTAB_WIDEN);
+      op1 = expand_binop (mode, and_optab, op1,
+                          GEN_INT ((1 << nunits) - 1), NULL_RTX,
+                          true, OPTAB_WIDEN);
+    }

   do_compare_rtx_and_jump (op0, op1, code, unsignedp, treeop0, mode,
                            ((mode == BLKmode)

[Bug libstdc++/113811] std::rotate does 64-bit signed division

2024-02-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113811

--- Comment #1 from Richard Biener  ---
In case __n is the minimum signed integer and __k is -1 the division would also
trap ;)  So yes, they should be unsigned.

[Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c

2024-02-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #36 from Richard Biener  ---
For example with AVX512VL and the following, using -O -fgimple -mavx512vl
we get simply

        notl    %esi
        orl     %esi, %edi
        cmpb    $15, %dil
        je      .L6

typedef long v4si __attribute__((vector_size(4*sizeof(long))));
typedef v4si v4sib __attribute__((vector_mask));
typedef _Bool sbool1 __attribute__((signed_bool_precision(1)));

void __GIMPLE (ssa) foo (v4sib v1, v4sib v2)
{
  v4sib tem;

__BB(2):
  tem_5 = ~v2_2(D);
  tem_3 = v1_1(D) | tem_5;
  tem_4 = _Literal (v4sib) { _Literal (sbool1) -1, _Literal (sbool1) -1,
                             _Literal (sbool1) -1, _Literal (sbool1) -1 };
  if (tem_3 == tem_4)
goto __BB3;
  else
goto __BB4;

__BB(3):
  __builtin_abort ();

__BB(4):
  return;
}


The question is whether that matches the semantics of GIMPLE (the padding
is inverted, too), whether it invokes undefined behavior (don't do it; for
people using intrinsics that seems to be what it is?) or whether we
should avoid affecting padding.

Note after the patch I proposed on the mailing list the constant mask is
now expanded with zero padding.
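
To make the padding question concrete, here is a sketch that models the 4-lane mask as an 8-bit value with lane bits 0..3 and padding bits 4..7 (an assumption matching the QImode mask in the asm above). Comparing the whole byte against the zero-padded all-true constant, as cmpb $15, %dil does, gives the wrong answer once ~ has flipped the padding bits; masking both sides to (1 << nunits) - 1 first, as in the do_store_flag and do_compare_and_jump changes proposed in this bug, does not. The function names are illustrative only.

```c
#include <stdint.h>

/* Assumed layout: one bit per lane in bits 0..3, bits 4..7 are padding.  */
enum { NUNITS = 4 };

/* Compare only the lane bits, like the AND with (1 << nunits) - 1.  */
static int
mask_eq (uint8_t a, uint8_t b)
{
  uint8_t lanes = (uint8_t) ((1u << NUNITS) - 1);   /* 0x0f */
  return (a & lanes) == (b & lanes);
}

/* tem_3 = v1 | ~v2, compared as a full byte against the all-true
   constant with zero padding (0x0f): ~v2 also sets the padding bits.  */
static int
all_true_naive (uint8_t v1, uint8_t v2)
{
  uint8_t tem = (uint8_t) (v1 | ~v2);
  return tem == 0x0f;
}

/* Same test, but ignoring the padding bits on both sides.  */
static int
all_true_masked (uint8_t v1, uint8_t v2)
{
  return mask_eq ((uint8_t) (v1 | ~v2), 0x0f);
}
```

With v1 = v2 = 0x0f every lane of v1 | ~v2 is true, yet all_true_naive returns 0 because the byte is 0xff; all_true_masked returns 1.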

[Bug tree-optimization/113796] [14 Regression] ifcvt does not remove range info before folding: Runtime mismatch at -O2

2024-02-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113796

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from Richard Biener  ---
Fixed (but possibly latent on branches of course).

[Bug tree-optimization/113808] [14 Regression] FAIL: libgomp.fortran/non-rectangular-loop-1.f90 since r14-8768

2024-02-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113808

--- Comment #8 from Richard Biener  ---
It's surely a bug in the vectorizer early exit handling.  I just don't know
what exactly is wrong right now ;)

[Bug tree-optimization/113808] [14 Regression] FAIL: libgomp.fortran/non-rectangular-loop-1.f90 since r14-8768

2024-02-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113808

--- Comment #6 from Richard Biener  ---
With the following I don't see things going wrong, but we end up with the loop
having the STOP exit last instead and thus a PEELED case.

function bar (n) result (k)
  integer :: n, k
  !$omp simd lastprivate(k)
  do k = 1, n + 41
if (k > 11 + 41 .or. k < 1) error stop
  end do
end

program main
  integer :: n, i,k
  integer :: bar
  n = 11
  k = bar (n)
  if (k /= 53) then
print *, k, 53
error stop
  endif
end

[Bug tree-optimization/113808] [14 Regression] FAIL: libgomp.fortran/non-rectangular-loop-1.f90 since r14-8768

2024-02-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113808

--- Comment #5 from Richard Biener  ---
(In reply to Jakub Jelinek from comment #3)
> Started with r14-8768-g85094e2aa6dba7908f053046f02dd443e8f65d72
> The regression status is unclear because we emitted sorry on this
> before r14-2634-g85da0b40538fb0d17d89de1e7905984668e3dfef

I think r14-8768 just exposed this.

We are picking the last exit in the loop; it's not a PEELED case.
It's the exit towards the if (k /= 53), not towards STOP.

[Bug tree-optimization/113808] [14 Regression] FAIL: libgomp.fortran/non-rectangular-loop-1.f90 since r14-8768

2024-02-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113808

--- Comment #4 from Richard Biener  ---
Reduced a bit, w/o collapse:

program main
  integer :: n, i,k
  n = 11
  do i = 1, n,2
!$omp simd lastprivate(k)
do k = 1, i + 41
  if (k > 11 + 41 .or. k < 1) error stop
end do
  end do
  if (k /= 53) then
print *, k, 53
error stop
  endif
end

[Bug tree-optimization/113808] [14 Regression] FAIL: libgomp.fortran/non-rectangular-loop-1.f90

2024-02-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113808

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |14.0
   Keywords||wrong-code
 CC||tnfchris at gcc dot gnu.org

--- Comment #1 from Richard Biener  ---
The error must be for the continuation of 'k' to the scalar loop where we have

   [local count: 829590381]:
  MEM  [(integer(kind=4) *)] = vect_vec_iv_.27_95;
  vect_k.32_118 = vect_vec_iv_.27_95 + { 1, 1, 1, 1 };
  k.4_23 = k.4_55 + 1; 
  ivtmp_120 = ivtmp_119 + 1;
  if (ivtmp_120 < bnd.23_89)
goto ; [85.44%]
  else
goto ; [14.56%]

   [local count: 136777259]:
  # k.4_45 = PHI 
  # ivtmp_76 = PHI 
  # vect_vec_iv_.27_99 = PHI 
  # vect__19.29_108 = PHI <{ 0, 1, 2, 3 }(5)>
  _109 = BIT_FIELD_REF ;
  _48 = _109;
  _100 = BIT_FIELD_REF ;
  k.4_43 = _100;
  niters_vector_mult_vf.24_90 = bnd.23_89 << 2;
  tmp.26_93 = 53 - niters_vector_mult_vf.24_90;
  _92 = (integer(kind=4)) niters_vector_mult_vf.24_90;
  tmp.25_91 = _92 + 1;
  if (niters.22_12 == niters_vector_mult_vf.24_90)
goto ; [25.00%]
  else
goto ; [75.00%]

   [local count: 136777259]:
  # k.4_74 = PHI 
  # ivtmp_77 = PHI 

but I can't really see anything wrong here (besides redundant code).

It's possible to elide the middle loop, but I failed to reproduce the
inner loop the way it's presented here without -fopenmp-simd.

[Bug tree-optimization/113808] New: [14 Regression] FAIL: libgomp.fortran/non-rectangular-loop-1.f90

2024-02-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113808

Bug ID: 113808
   Summary: [14 Regression] FAIL:
libgomp.fortran/non-rectangular-loop-1.f90
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

The following reduced testcase from libgomp.fortran/non-rectangular-loop-1.f90
fails execution:

program main
  integer :: n,m,p, i,j,k,ll
  n = 11
  m = 23
  p = 27
  !$omp simd collapse(3) lastprivate(k)
  do i = 1, n,2
do j = 1, m
  do k = 1, i + 41
if (k > 11 + 41 .or. k < 1) error stop
  end do
end do
  end do
  if (k /= 53) then
print *, k, 53
error stop
  endif
end

when built with -O -msse4.1 -fopenmp-simd

> ./a.out 
  50  53
ERROR STOP 

Error termination. Backtrace:
#0  0x4008ec in ???
#1  0x400909 in ???
#2  0x7f873306f24c in ???
#3  0x400679 in _start
at ../sysdeps/x86_64/start.S:120
#4  0x in ???
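
The expected value 53 follows from Fortran DO semantics: after do k = 1, last completes normally, k holds last + 1, and the sequentially last iteration has i = 11, so k ends at 11 + 41 + 1 = 53. A sequential C transcription of the testcase, assuming those semantics and that lastprivate(k) should reproduce the sequential value (final_k is a hypothetical name):

```c
/* Sequential transcription of the reduced testcase.  Fortran
   "do k = 1, last" leaves k == last + 1, which the C for loop mirrors.  */
static int
final_k (int n, int m)
{
  int k = 0;
  for (int i = 1; i <= n; i += 2)            /* do i = 1, n, 2 */
    for (int j = 1; j <= m; j++)             /* do j = 1, m */
      for (k = 1; k <= i + 41; k++)          /* do k = 1, i + 41 */
        if (k > 11 + 41 || k < 1)
          return -1;                         /* error stop */
  return k;
}
```

final_k (11, 23) yields 53, the value the program checks for, whereas the miscompiled binary printed 50.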

[Bug libgcc/113803] libgcc unwinder stops at calls to null function pointer on some targets

2024-02-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113803

--- Comment #4 from Richard Biener  ---
The return address should still be on the stack for most archs, unless we run
into zero by "overflowing" the IP, of course.

[Bug tree-optimization/113796] [14 Regression] ifcvt does not remove range info before folding: Runtime mismatch at -O2

2024-02-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113796

--- Comment #7 from Richard Biener  ---
We're removing flow-sensitive info in combine_blocks, but only after inserting
and folding the stmts comprising the PHI replacements.  There are possibly
latent issues when building up the predicates themselves, since that uses
maybe_fold_or_comparisons without the workaround added for if-combine.

I have a patch resetting flow-sensitive info earlier (and also covering
PHIs).

[Bug tree-optimization/113801] Missed optimization of loop invariant elimination

2024-02-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113801

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Keywords||missed-optimization
   Last reconfirmed||2024-02-07

--- Comment #1 from Richard Biener  ---
I think there's a duplicate bug with the same loop-carried "zero", where
final value replacement computes the overall update to 'a'; this one is just
a bit more complicated.

[Bug tree-optimization/111478] [12 Regression] aarch64 SVE ICE: in compute_live_loop_exits, at tree-ssa-loop-manip.cc:250

2024-02-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111478

--- Comment #9 from Richard Biener  ---
(In reply to Saurabh Jha from comment #8)
> Hi Richard,
> 
> Are you also planning to backport it to gcc-12?

Yes.

[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7

2024-02-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734

--- Comment #9 from Richard Biener  ---
Another bug in the dependence checking code is

if (dr_may_alias_p (dr_ref, dr_read, loop_nest))

which will end up using TBAA - dr_may_alias_p doesn't think you are ever
going to move stores down across loads.  To verify if that's possible
you need to use

if (dr_may_alias_p (dr_read, dr_ref, loop_nest))

instead.

Note there's still my very original review consideration that you move
stmts out-of-order but the main dependence checking the vectorizer does
assumes the stores and loads appear in their original order.  I'm not
sure whether with the above we prove this doesn't matter.

[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7

2024-02-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734

--- Comment #8 from Richard Biener  ---
(In reply to Tamar Christina from comment #6)
> The reason for the miscompile popping up is this change from the previous
> patch
> 
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index 109d4ce5192..df3eab2e8d5 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -725,8 +725,7 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
>  bounded by VF so accesses are within range.  We only need to check the
>  reads since writes are moved to a safe place where if we get there we
>  know they are safe to perform.  */
> - if (DR_IS_READ (dr_ref)
> - && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
> + if (!ref_within_array_bound (stmt, DR_REF (dr_ref)))

I think it can even be relaxed to

 if ((DR_IS_READ (dr_ref) && check_deps))

since for non-peeled the IV exit block will be only executed with a fully
enabled vector.

[Bug tree-optimization/113796] [14 Regression] ifcvt does not remove range info before folding: Runtime mismatch at -O2

2024-02-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113796

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #6 from Richard Biener  ---
Let me take this.

[Bug target/113790] [14 Regression][riscv64] ICE in curr_insn_transform, at lra-constraints.cc:4294 since r14-4944-gf55cdce3f8dd85

2024-02-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113790

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |14.0

[Bug tree-optimization/113787] [12/13/14 Regression] Wrong code at -O with ipa-modref on aarch64

2024-02-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113787

--- Comment #11 from Richard Biener  ---
Btw, there are related IPA modref wrong-code issues where IPA and late summaries
are merged incorrectly (also receiving no attention).

[Bug tree-optimization/113787] [12/13/14 Regression] Wrong code at -O with ipa-modref on aarch64

2024-02-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113787

Richard Biener  changed:

   What|Removed |Added

 Target||aarch64

--- Comment #10 from Richard Biener  ---
I think it's ipa-modref analyze_store bailing for

  if (a.parm_index == MODREF_LOCAL_MEMORY_PARM)
return false;

no idea how it arrives at that.

[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7

2024-02-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734

--- Comment #7 from Richard Biener  ---
(In reply to Tamar Christina from comment #6)
> The reason for the miscompile popping up is this change from the previous
> patch
> 
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index 109d4ce5192..df3eab2e8d5 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -725,8 +725,7 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
>  bounded by VF so accesses are within range.  We only need to check the
>  reads since writes are moved to a safe place where if we get there we
>  know they are safe to perform.  */
> - if (DR_IS_READ (dr_ref)
> - && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
> + if (!ref_within_array_bound (stmt, DR_REF (dr_ref)))
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> 
> but this should have been safe, as the stores shouldn't be done until the
> point we know for sure they would be safe to do.
> 
> the code out of the vectorizer looks ok to me.  Valgrind is saying we're
> reading uninitialized values.  But those values I think come from a previous
> loop which sets them to 0.  Or is supposed to.  So working my way up this
> giant function.

Hmm, but there isn't really a "safe" place, is there?  If there's a safe
place then it would be safe for reads as well, no?

So I guess when you manage to massage the testcase to be based on decls
then you instead (with the above suggested change) get spurious stores?

[Bug tree-optimization/111478] [12 Regression] aarch64 SVE ICE: in compute_live_loop_exits, at tree-ssa-loop-manip.cc:250

2024-02-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111478

Richard Biener  changed:

   What|Removed |Added

Summary|[12/13 Regression] aarch64  |[12 Regression] aarch64 SVE
   |SVE ICE: in |ICE: in
   |compute_live_loop_exits, at |compute_live_loop_exits, at
   |tree-ssa-loop-manip.cc:250  |tree-ssa-loop-manip.cc:250
  Known to work||13.2.1

--- Comment #7 from Richard Biener  ---
Backported to GCC 13.

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2024-02-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 112618, which changed state.

Bug 112618 Summary: [13 Regression] internal compiler error: in expand_MASK_CALL, at internal-fn.cc:4529
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112618

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/112618] [13 Regression] internal compiler error: in expand_MASK_CALL, at internal-fn.cc:4529

2024-02-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112618

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
  Known to work||13.2.1
 Status|ASSIGNED|RESOLVED

--- Comment #5 from Richard Biener  ---
Fixed.

[Bug tree-optimization/110243] [12/13 Regression] Wrong code at -O3 on x86_64-linux-gnu since r13-3875-g9e11ceef165

2024-02-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110243

--- Comment #16 from Richard Biener  ---
Backporting to GCC 13 causes gcc.dg/tree-ssa/ldist-17.c to FAIL.

[Bug target/113779] Very inefficient m68k code generated for simple copy loop

2024-02-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113779

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-02-06
 Ever confirmed|0   |1

--- Comment #6 from Richard Biener  ---
It's already visible with a simple

void f(const long* src, long* dst)
{
  *dst++ = *src++;
  *dst = *src;
}

where we expand to RTL from

  _1 = *src_3(D);
  *dst_4(D) = _1;
  _2 = MEM[(const long int *)src_3(D) + 4B];
  MEM[(long int *)dst_4(D) + 4B] = _2;

There's nothing on GIMPLE that would split the add, and RTL's auto-inc-dec
pass doesn't do anything either.  We'd need a form of "strength reduction",
or maybe targets preferring auto-inc/dec should not legitimize constant
offsets before reload ...

Note with one more copy you then see

  _1 = *src_4(D);
  *dst_5(D) = _1;
  _2 = MEM[(const long int *)src_4(D) + 4B];
  MEM[(long int *)dst_5(D) + 4B] = _2;
  _3 = MEM[(const long int *)src_4(D) + 8B];
  MEM[(long int *)dst_5(D) + 8B] = _3;

and naively splitting gives you

  src_6 = src_4(D) + 4;
  src_7 = src_4(D) + 8;

That said, it's really something for RTL, since which form is more efficient
is going to be highly target dependent.  The auto-inc pass is well
structured, so it should be possible to extend it.
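
For readers less familiar with auto-increment addressing, the two shapes under discussion can be written in C as below: the constant-offset form matches the GIMPLE above, while the post-increment form is what an auto-inc target such as m68k can map onto (An)+ addressing. This only illustrates the source-level shapes, not what GCC emits; the function names are hypothetical.

```c
/* Constant-offset form: every access is base + constant, as in the
   GIMPLE MEM[(long int *)dst_5(D) + 4B] etc.  */
static void
copy3_offsets (const long *src, long *dst)
{
  dst[0] = src[0];
  dst[1] = src[1];
  dst[2] = src[2];
}

/* Post-increment form: one pointer bumped after each access, the
   shape an auto-inc-dec style transformation would aim for.  */
static void
copy3_postinc (const long *src, long *dst)
{
  *dst++ = *src++;
  *dst++ = *src++;
  *dst = *src;
}
```

Both copy the same three words; which one leads to better code is exactly the target-dependent question raised above.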

[Bug tree-optimization/113703] ivopts miscompiles loop

2024-02-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113703

--- Comment #5 from Richard Biener  ---
It's going wrong in iv_elimination_compare_lt which tries to exactly handle
this kind of loop:

   We aim to handle the following situation:

   sometype *base, *p;
   int a, b, i;

   i = a;
   p = p_0 = base + a;

   do
 {
   bla (*p);
   p++;
   i++;
 }
   while (i < b);

   Here, the number of iterations of the loop is (a + 1 > b) ? 0 : b - a - 1.
   We aim to optimize this to

   p = p_0 = base + a;
   do
 {
   bla (*p);
   p++;
 }
   while (p < p_0 - a + b);

   This preserves the correctness, since the pointer arithmetics does not
   overflow.  More precisely:

   1) if a + 1 <= b, then p_0 - a + b is the final value of p, hence there is no
      overflow in computing it or the values of p.
   2) if a + 1 > b, then we need to verify that the expression p_0 - a does not
  overflow.  To prove this, we use the fact that p_0 = base + a.

there's either a hole in that logic or the implementation is off.

  /* Finally, check that CAND->IV->BASE - CAND->IV->STEP * A does not
 overflow.  */
  offset = fold_build2 (MULT_EXPR, TREE_TYPE (cand->iv->step),
cand->iv->step,
fold_convert (TREE_TYPE (cand->iv->step), a));
  if (!difference_cannot_overflow_p (data, cand->iv->base, offset))
return false;

where 'A' is 'i', CAND->IV->BASE is 'p + i' and CAND->IV->STEP is 1
as 'sizetype'.

That just checks that (p + i) - i doesn't overflow.

Somehow it fails to prove that p + b doesn't overflow, since we end up with
p' < (p + i) + (n - i), aka p' < p + n.
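
A minimal sketch of the rewrite the iv_elimination_compare_lt comment describes, using integer offsets from base in place of pointers. It checks the iteration-count formula from the comment and that testing p < p_0 - a + b gives the same trip count as testing i < b. Because plain integers are used, pointer overflow, which is exactly where this bug lies, is deliberately not modelled; the function names are hypothetical.

```c
/* Original loop: the exit test is on the counter i.  Returns how many
   times the body runs.  */
static long
body_count_original (long a, long b)
{
  long i = a, count = 0;
  do
    {
      count++;                    /* bla (*p); p++; */
      i++;
    }
  while (i < b);
  return count;
}

/* After IV elimination: i is gone and the exit test uses p, modelled
   as an offset from base, so p_0 = base + a becomes p0 = a.  */
static long
body_count_eliminated (long a, long b)
{
  long p0 = a;
  long p = p0, count = 0;
  do
    {
      count++;                    /* bla (*p); */
      p++;
    }
  while (p < p0 - a + b);
  return count;
}
```

In this integer model both versions run the body (a + 1 > b) ? 1 : b - a times, i.e. the comment's iteration count plus one; the correctness argument therefore rests entirely on the bound p_0 - a + b being computable without overflow.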

[Bug middle-end/24639] [meta-bug] bug to track all Wuninitialized issues

2024-02-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=24639
Bug 24639 depends on bug 109559, which changed state.

Bug 109559 Summary: [12/13/14 Regression] Unexpected -Wmaybe-uninitialized warning when inlining with system header
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109559

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |INVALID

[Bug middle-end/109559] [12/13/14 Regression] Unexpected -Wmaybe-uninitialized warning when inlining with system header

2024-02-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109559

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |INVALID

--- Comment #9 from Richard Biener  ---
So invalid.

[Bug gcov-profile/113765] [14 Regression] ICE: autofdo: val-profiler-threads-1.c compilation, error: probability of edge from entry block not initialized

2024-02-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113765

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |14.0

[Bug ipa/113359] [13 Regression] LTO miscompilation of ceph on aarch64

2024-02-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359

--- Comment #10 from Richard Biener  ---
I see the 'pair' type is marked TYPE_CXX_ODR_P; I'd say you should see an
ODR type violation diagnostic, and if you don't, does that mean we force different
alias sets for both?  Not sure - Honza added this stuff.

It only affects TYPE_CANONICAL though; regular type merging shouldn't merge
them, but it's likely that you get to see another type because COMDATs
and symbol merging choose a different prevailing function which has that
other type?

Btw, can you dump the mangled name of the type?  It should be
type_with_linkage_p (), I think; of course 'pair' itself is a template,
so only a specific instantiation should be subject to ODR.  (Of course,
there might be ODR functions that use differently instantiated pairs in
their signatures ...)

[Bug target/113779] Very inefficient m68k code generated for simple copy loop

2024-02-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113779

--- Comment #2 from Richard Biener  ---
I don't think IVOPTs would use postinc for the intermediate increments.  It's
constant propagation/forwarding that accumulates the increments to a constant
offset which removes dependences on the instructions and thus would allow the
loads/stores to be executed in parallel (well, not that m68k uarchs likely can
do any of that ...).

I wonder if the code we emit is measurably slower though?  It's possibly
a little bit larger due to the two IV increments.
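The bug's testcase is not reproduced here, so the following is only a hypothetical source-level analogy of the two shapes discussed above, assuming a copy loop unrolled by four. In the post-increment form every access depends on the pointer update made by the previous statement; in the forwarded form the increments are accumulated into constant offsets from one base. The function names are invented for illustration, and this is C source, not the GIMPLE or RTL GCC actually produces.

```c
#include <assert.h>
#include <stddef.h>

/* Post-increment form: each access uses pointers produced by the previous
   statement, so the four copies form one serial dependence chain.  */
static void
copy4_postinc (int *dst, const int *src)
{
  *dst++ = *src++;
  *dst++ = *src++;
  *dst++ = *src++;
  *dst++ = *src++;
}

/* Forwarded form: the increments are folded into constant offsets from the
   incoming pointers, so the four load/store pairs do not depend on each
   other's address computations.  */
static void
copy4_offsets (int *dst, const int *src)
{
  dst[0] = src[0];
  dst[1] = src[1];
  dst[2] = src[2];
  dst[3] = src[3];
}

/* Copy N ints (N must be a multiple of 4) in blocks of four, advancing each
   pointer once per block; USE_OFFSETS selects the block-copy form.  */
static void
copy_loop (int *dst, const int *src, size_t n, int use_offsets)
{
  assert (n % 4 == 0);
  for (size_t k = 0; k < n; k += 4, dst += 4, src += 4)
    {
      if (use_offsets)
        copy4_offsets (dst, src);
      else
        copy4_postinc (dst, src);
    }
}
```

Both forms copy the same data; the only difference is the shape of the address dependences, which is the property in question when asking whether the emitted m68k code is measurably slower.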

[Bug tree-optimization/113775] Bogus Wstringop-overflow in __atomic_load_n combined with sanitizer flags

2024-02-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113775

Richard Biener  changed:

   What|Removed |Added

   Keywords||diagnostic

--- Comment #2 from Richard Biener  ---
Yeah, the

'cc1plus: note: destination object is likely at address zero'

message hints that we likely diagnose a threaded path where the pointer
is zero.  We were likely prompted to perform the threading by the dynamic
checks inserted by the sanitizer.

[Bug target/113763] [14 Regression] build fails with clang++ host compiler because aarch64.cc uses C++14 constexpr.

2024-02-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113763

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |14.0

[Bug gcov-profile/113765] ICE: autofdo: val-profiler-threads-1.c compilation, error: probability of edge from entry block not initialized

2024-02-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113765

Richard Biener  changed:

   What|Removed |Added

 CC||hubicka at gcc dot gnu.org
   Keywords||ice-checking
Version|unknown |14.0

--- Comment #2 from Richard Biener  ---
Honza added extra checking for this for gcc14.

[Bug middle-end/109559] [12/13/14 Regression] Unexpected -Wmaybe-uninitialized warning when inlining with system header

2024-02-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109559

--- Comment #8 from Richard Biener  ---
Created attachment 57325
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57325=edit
patch

Patch.  Breaks expected diagnostics for inlines from system headers.

[Bug middle-end/109559] [12/13/14 Regression] Unexpected -Wmaybe-uninitialized warning when inlining with system header

2024-02-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109559

--- Comment #7 from Richard Biener  ---
So the 2nd hunk tests OK, but the first, for example, runs into

FAIL: gcc.dg/Wfree-nonheap-object-4.c  (test for warnings, line 19)

where we seem to explicitly expect the warning when the system header code
is inlined into non-system-header context.

That's, btw, the same thing that happens for the testcase in this bug: we inline
the has_trivial_copy_and_destroy into integrate (), which isn't in a
system header.

So it seems this was a deliberate choice ... which would mean the bug at
hand is INVALID.  (-Wno-system-headers has no effect)

[Bug middle-end/109559] [12/13/14 Regression] Unexpected -Wmaybe-uninitialized warning when inlining with system header

2024-02-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109559

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 CC||msebor at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

--- Comment #6 from Richard Biener  ---
Note the diagnostic is "valid" and for

FilonIntegral::integrate ()


function_base::has_trivial_copy_and_destroy ([(struct function1
*)].D.2804);

and we're using the stmts location to diagnose this which expands to

(gdb) p IS_ADHOC_LOC (location)
$8 = true
(gdb) p get_location_from_adhoc_loc (line_table, location)
$9 = 268224
(gdb) p expand_location ($9)
$10 = {file = 0x4d895d0 "", line = 5, column = 49, data = 0x0, sysp = true}

so the system header flag is correct.  There's

  if ((was_warning || diagnostic->kind == DK_WARNING)
      && ((!m_warn_system_headers
           && diagnostic->m_iinfo.m_allsyslocs)
          || m_inhibit_warnings))
    /* Bail if the warning is not to be reported because all locations in the
       inlining stack (if there is one) are in system headers.  */
    return false;

I've added -Wno-system-headers, and

(gdb) p m_warn_system_headers
$14 = false
(gdb) p diagnostic->m_iinfo.m_allsyslocs
$15 = false
(gdb) p was_warning
$16 = true
(gdb) p m_inhibit_warnings
$17 = false

so the issue seems to be that the active m_set_locations_cb
tree-diagnostic.cc:set_inlining_locations computes that "wrongly".

The operator= associated inline block location isn't in a system header
(the abstract origin, the operator= FUNCTION_DECL does have a
DECL_SOURCE_LOCATION that's in a system header though).

_Note_ we're assigning that BLOCK the location of the _call_ (it's for
the parameter setup), _not_ the location of the callee!

  /* Build a block containing code to initialize the arguments, the
     actual inline expansion of the body, and a label for the return
     statements within the function to jump to.  The type of the
     statement expression is the return type of the function call.
     ???  If the call does not have an associated block then we will
     remap all callee blocks to NULL, effectively dropping most of
     its debug information.  This should only happen for calls to
     artificial decls inserted by the compiler itself.  We need to
     either link the inlined blocks into the caller block tree or
     not refer to them in any way to not break GC for locations.  */
  if (tree block = gimple_block (stmt))
    {
      /* We do want to assign a not UNKNOWN_LOCATION BLOCK_SOURCE_LOCATION
         to make inlined_function_outer_scope_p return true on this BLOCK.  */
      location_t loc = LOCATION_LOCUS (gimple_location (stmt));
      if (loc == UNKNOWN_LOCATION)
        loc = LOCATION_LOCUS (DECL_SOURCE_LOCATION (fn));
      if (loc == UNKNOWN_LOCATION)
        loc = BUILTINS_LOCATION;
      id->block = make_node (BLOCK);
      BLOCK_ABSTRACT_ORIGIN (id->block) = DECL_ORIGIN (fn);
      BLOCK_SOURCE_LOCATION (id->block) = loc;
      prepend_lexical_block (block, id->block);

Since this particular hook implementation was added by Martin S., I don't
have high hopes of that being a conscious decision.

  while (block && TREE_CODE (block) == BLOCK
         && BLOCK_ABSTRACT_ORIGIN (block))
    {
      tree ao = BLOCK_ABSTRACT_ORIGIN (block);
      if (TREE_CODE (ao) == FUNCTION_DECL)
        {
          if (!diagnostic->m_iinfo.m_ao)
            diagnostic->m_iinfo.m_ao = block;

          location_t bsloc = BLOCK_SOURCE_LOCATION (block);
          ilocs.safe_push (bsloc);
          if (in_system_header_at (bsloc))

I think this should either look at DECL_SOURCE_LOCATION (ao) or
at the location of the block nested in 'block'.

Note we then still warn because

  if (ilocs.length ())
    {
      /* When there is an inlining context use the macro expansion
         location for the original location and bump up NSYSLOCS if
         it's in a system header since it's not counted above.  */
      location_t sysloc = expansion_point_location_if_in_system_header (loc);
      if (sysloc != loc)

gets us the same location, failing to do

          loc = sysloc;
          ++nsyslocs;
        }

and then

  ilocs.safe_push (loc);

makes

  /* Set if all locations are in a system header.  */
  diagnostic->m_iinfo.m_allsyslocs = nsyslocs == ilocs.length ();

fail.  The logic is odd, though: if the location was not macro-expanded, the count is off.
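The counting described above can be captured in a simplified, hypothetical C model (not the actual tree-diagnostic.cc code; struct inline_frame and all_syslocs are invented names). Each inlining frame records whether its BLOCK_SOURCE_LOCATION (the call site) and the DECL_SOURCE_LOCATION of its abstract origin (the callee) are in a system header. The diagnostic's own location only bumps the count when it comes from a macro expansion in a system header, mirroring the expansion_point_location_if_in_system_header test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one BLOCK on the inlining stack.  */
struct inline_frame
{
  bool block_loc_in_system_header;   /* BLOCK_SOURCE_LOCATION (call site) */
  bool decl_loc_in_system_header;    /* DECL_SOURCE_LOCATION (ao) (callee) */
};

/* Model of m_allsyslocs = nsyslocs == ilocs.length ().  USE_DECL_LOC selects
   the alternative of looking at the callee's location instead of the
   BLOCK's location.  */
static bool
all_syslocs (const struct inline_frame *frames, size_t nframes,
             bool loc_in_system_header, bool loc_from_system_macro,
             bool use_decl_loc)
{
  size_t nlocs = 0, nsyslocs = 0;

  for (size_t k = 0; k < nframes; k++)
    {
      bool sys = (use_decl_loc ? frames[k].decl_loc_in_system_header
                               : frames[k].block_loc_in_system_header);
      nlocs++;                       /* ilocs.safe_push (bsloc) */
      if (sys)
        nsyslocs++;
    }

  /* LOC is only counted when it was expanded from a system-header macro,
     even if LOC itself is in a system header.  */
  if (nlocs > 0 && loc_in_system_header && loc_from_system_macro)
    nsyslocs++;

  nlocs++;                           /* ilocs.safe_push (loc) */
  return nsyslocs == nlocs;
}
```

For one inlined frame whose call site is outside a system header but whose callee is inside one, with the statement location in a system header but not macro-expanded, the model yields false whether or not the callee location is used, matching the observation that we still warn. Only when the location is additionally counted does it become true.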

The following fixes it for me.

diff --git a/gcc/tree-diagnostic.cc b/gcc/tree-diagnostic.cc
index a660c7d0785..a49f8939ce7 100644
--- a/gcc/tree-diagnostic.cc
+++ b/gcc/tree-diagnostic.cc
@@ -328,7 +328,7 @@ set_inlining_locations (diagnostic_context *,
  if (!diagnostic->m_iinfo.m_ao)
 

[Bug tree-optimization/113707] [14 Regression] ICE on valid code at -O1 on x86_64-linux-gnu: Segmentation fault since r14-8683

2024-02-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113707

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Richard Biener  ---
Fixed.

[Bug tree-optimization/113703] ivopts miscompiles loop

2024-02-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113703

Richard Biener  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Keywords||needs-bisection
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-02-05

--- Comment #4 from Richard Biener  ---
Confirmed.

[Bug middle-end/113762] TYPE_ADDR_SPACE requirements on tcc_reference trees not documented/checked

2024-02-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113762

Richard Biener  changed:

   What|Removed |Added

   Keywords||documentation
 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
   Last reconfirmed||2024-02-05
 Status|UNCONFIRMED |ASSIGNED

[Bug middle-end/113762] New: TYPE_ADDR_SPACE requirements on tcc_reference trees not documented/checked

2024-02-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113762

Bug ID: 113762
   Summary: TYPE_ADDR_SPACE requirements on tcc_reference trees
not documented/checked
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

There's not much documentation on what part of a tcc_reference chain
(handled_component_p + base) needs to reflect the TYPE_ADDR_SPACE in effect.
RTL expansion looks at the base of the chain but for example
build_fold_addr_expr_loc simply looks at the outermost object.

There's also no IL checking in place to verify consistency within such a chain.

And test coverage isn't too great for address-spaces in general.

[Bug tree-optimization/113736] ICE: verify_gimple failed: incompatible types in 'PHI' argument 0 with _BitInt() struct copy to __seg_fs/gs

2024-02-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113736

--- Comment #4 from Richard Biener  ---
(In reply to rguent...@suse.de from comment #3)
> On Sat, 3 Feb 2024, jakub at gcc dot gnu.org wrote:
> > Bitint lowering changes here
> >   MEM < _BitInt(768)> [( struct T 
> > *)p_2(D)] =
> > s_4(D);
> > to
> >   VIEW_CONVERT_EXPR(MEM < _BitInt(768)>
> > [( struct T *)p_2(D)])[_5] = s_7(D);
> > accesses in a loop.  Is that invalid and should have  also 
> > in
> > the VCE type?  Or is this just a vectorizer bug?
> 
> I think that's OK, I will have a look.

I stand corrected - it isn't correct.  The address-space needs to be on
all types involved in a memory reference (RTL expansion is later quite
forgiving though).

This needs better documentation and maybe even IL checking I guess.

[Bug tree-optimization/113756] [14 regression] Wrong code at -O2 on x86_64-linux-gnu since r14-2780-g39f117d6c87

2024-02-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113756

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1

[Bug modula2/113749] [14 Regression] m2 enabled build times out on i686-gnu (GNU Hurd)

2024-02-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113749

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |14.0
