Re: [PATCH] tree-optimization/114151 - handle POLY_INT_CST in get_range_pos_neg

2024-02-29 Thread Richard Biener
On Thu, 29 Feb 2024, Jakub Jelinek wrote: > On Thu, Feb 29, 2024 at 09:21:02AM +0100, Richard Biener wrote: > > The following switches the logic in chrec_fold_multiply to > > get_range_pos_neg since handling POLY_INT_CST possibly mixed with > > non-poly ranges will make th

Re: [PATCH] middle-end/114070 - VEC_COND_EXPR folding

2024-02-29 Thread Richard Biener
On Thu, 29 Feb 2024, Richard Biener wrote: > The following amends the PR114070 fix to optimistically allow > the folding when we cannot expand the current vec_cond using > vcond_mask and we're still before vector lowering. This leaves > a small window between vectorization and lower

[PATCH] middle-end/114070 - VEC_COND_EXPR folding

2024-02-29 Thread Richard Biener
The following amends the PR114070 fix to optimistically allow the folding when we cannot expand the current vec_cond using vcond_mask and we're still before vector lowering. This leaves a small window between vectorization and lowering where we could break vec_conds that can be expanded via

[PATCH] tree-optimization/114151 - handle POLY_INT_CST in get_range_pos_neg

2024-02-29 Thread Richard Biener
The following switches the logic in chrec_fold_multiply to get_range_pos_neg since handling POLY_INT_CST possibly mixed with non-poly ranges will make the open-coding awkward and while not a perfect fit it should work. In turn the following makes get_range_pos_neg aware of POLY_INT_CSTs. I

Re: [PATCH] developer option: -fdump-generic-nodes; initial incorporation

2024-02-28 Thread Richard Biener
On Wed, Feb 28, 2024 at 4:14 PM David Malcolm wrote: > > On Wed, 2024-02-28 at 08:58 +0100, Richard Biener wrote: > > On Tue, Feb 27, 2024 at 10:20 PM Robert Dubner > > wrote: > > > > > > Richard, > > > > > > Thank you very much f

Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE

2024-02-28 Thread Richard Biener
On Wed, 28 Feb 2024, Andre Vieira (lists) wrote: > > > On 27/02/2024 08:47, Richard Biener wrote: > > On Mon, 26 Feb 2024, Andre Vieira (lists) wrote: > > > >> > >> > >> On 05/02/2024 09:56, Richard Biener wrote: >

Re: [r14-9173 Regression] FAIL: gcc.dg/tree-ssa/andnot-2.c scan-tree-dump-not forwprop3 "_expr" on Linux/x86_64

2024-02-28 Thread Richard Biener
> On 28.02.2024 at 16:05, Jeff Law wrote: > > >> On 2/28/24 03:05, Richard Biener wrote: >> >> Untested fix for targets that cannot handle the original IL below. >> I'm not convinced that's the way to go here, is it? Or scrap >> the testcase

[PATCH 2/2] tree-optimization/113831 - revert original fix

2024-02-28 Thread Richard Biener
This reverts the original fix for PR113831 which is better fixed by the PR114121 fix. I've XFAILed instead of removing the PR108355 testcase again. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. PR tree-optimization/113831 PR tree-optimization/108355 *

[PATCH 1/2] tree-optimization/114121 - wrong VN with context sensitive range info

2024-02-28 Thread Richard Biener
When VN ends up exploiting range-info specifying the ao_ref offset and max_size we have to make sure to reflect this in the hashtable entry for the recorded expression. The PR113831 fix handled the case where we can encode this in the operands themselves but this bug shows the issue is more

Re: [r14-9173 Regression] FAIL: gcc.dg/tree-ssa/andnot-2.c scan-tree-dump-not forwprop3 "_expr" on Linux/x86_64

2024-02-28 Thread Richard Biener
On Tue, 27 Feb 2024, Richard Biener wrote: > On Tue, 27 Feb 2024, Jeff Law wrote: > > > > > > > On 2/27/24 06:53, Richard Biener wrote: > > > On Tue, 27 Feb 2024, Jeff Law wrote: > > > > > >> > > >> > > >> On 2

Re: [PATCH] developer option: -fdump-generic-nodes; initial incorporation

2024-02-28 Thread Richard Biener
On Wed, Feb 28, 2024 at 9:25 AM Jakub Jelinek wrote: > > On Wed, Feb 28, 2024 at 08:58:08AM +0100, Richard Biener wrote: > > Incidentally this looks like something fit for a google summer of code > > project. > > Ideally it would hook into print-tree.cc providing an

Re: [PATCH] graphite: Fix non-INTEGER_TYPE integral comparison handling [PR114041]

2024-02-28 Thread Richard Biener
*/ > + > +unsigned a[24], b[24]; > +enum E { E0 = 0, E1 = 1, E42 = 42, E56 = 56 }; > + > +__attribute__((noipa)) unsigned > +foo (enum E x) > +{ > + for (int i = 0; i < 24; ++i) > +a[i] = i; > + unsigned e; > + if (x >= E42) > +e = __builtin_clz ((un

Re: [PATCH] gimple-fold: Use bitwise vector types rather than barely supported huge integral types in memcpy etc. folding [PR113988]

2024-02-28 Thread Richard Biener
= 256 > +void > +foo (void *p, _BitInt(256) x) > +{ > + __builtin_memcpy (p, &x, sizeof x); > +} > + > +_BitInt(256) > +bar (void *p, _BitInt(256) x) > +{ > + _BitInt(246) y = x + 1; > + __builtin_memcpy (p, &y, sizeof y); > + return x; > +} > +#endif

Re: [PATCH] developer option: -fdump-generic-nodes; initial incorporation

2024-02-27 Thread Richard Biener
From a maintenance point of view I think it's important to have "dump a tree node" once, so when fields are added or deemed useful for presenting in a dump you don't have to chase down more than one place. Maintenance is also the reason to not simply accept your contribution as-is. I do hope th

Re: [r14-9173 Regression] FAIL: gcc.dg/tree-ssa/andnot-2.c scan-tree-dump-not forwprop3 "_expr" on Linux/x86_64

2024-02-27 Thread Richard Biener
On Tue, 27 Feb 2024, Jeff Law wrote: > > > On 2/27/24 06:53, Richard Biener wrote: > > On Tue, 27 Feb 2024, Jeff Law wrote: > > > >> > >> > >> On 2/27/24 00:43, Richard Biener wrote: > >>> On Tue, 27

Re: [r14-9173 Regression] FAIL: gcc.dg/tree-ssa/andnot-2.c scan-tree-dump-not forwprop3 "_expr" on Linux/x86_64

2024-02-27 Thread Richard Biener
On Tue, 27 Feb 2024, Jeff Law wrote: > > > On 2/27/24 00:43, Richard Biener wrote: > > On Tue, 27 Feb 2024, haochen.jiang wrote: > > > >> On Linux/x86_64, > >> > >> af66ad89e8169f44db723813662917cf4cbb78fc is the first bad commit > >> co

Re: [PATCH v2] Draft|Internal-fn: Introduce internal fn saturation US_PLUS

2024-02-27 Thread Richard Biener
to see the bigger picture to be kept in mind before altering the GIMPLE IL. Adding an internal function for an already present optab is a no-brainer. Adding a vectorizer and/or if-conversion pattern to make use of this during vectorization is existing practice. Adding pattern recognition to

Re: [PATCH] Fix internal error in GIMPLE DSE

2024-02-27 Thread Richard Biener
On Tue, Feb 27, 2024 at 1:50 PM Eric Botcazou wrote: > > Hi, > > this is a regression present on the mainline, 13 and 12 branches. For the > attached Ada case, it's a tree checking failure on the mainline at -O: > > +===GNAT BUG DETECTED==+ > |

[PATCH][v2] tree-optimization/114074 - CHREC multiplication and undefined overflow

2024-02-27 Thread Richard Biener
When folding a multiply CHRECs are handled like {a, +, b} * c is {a*c, +, b*c} but that isn't generally correct when overflow invokes undefined behavior. The following uses unsigned arithmetic unless either a is zero or a and b have the same sign. I've used simple early outs for INTEGER_CSTs and
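The overflow hazard described above can be illustrated with a small sketch. It is not code from the patch: the helper names (`fits_int32`, `chrec_times_c_at`, `folded_unsigned_at`) and the 32-bit operand width are assumptions made only for illustration. With a and b of opposite sign, every value of (a + b*i) * c can fit in the signed type while the folded step b*c does not; evaluating the folded form in unsigned arithmetic still yields the right bit pattern.

```c
#include <stdint.h>

/* Hypothetical helper: does a 64-bit value fit in a signed 32-bit int? */
static int fits_int32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* (a + b*i) * c computed exactly in 64 bits, i.e. the unfolded
   expression {a, +, b} * c at iteration i. */
static int64_t chrec_times_c_at(int32_t a, int32_t b, int32_t c, int32_t i)
{
    return ((int64_t)a + (int64_t)b * i) * c;
}

/* The folded form {a*c, +, b*c} at iteration i, evaluated with wrapping
   unsigned arithmetic so that no signed overflow can occur. */
static uint32_t folded_unsigned_at(int32_t a, int32_t b, int32_t c, uint32_t i)
{
    uint32_t base = (uint32_t)a * (uint32_t)c;
    uint32_t step = (uint32_t)b * (uint32_t)c;
    return base + step * i;
}
```

For a = 2^30 - 1, b = -(2^30 + 1) and c = 2, the unfolded expression is 2^31 - 2 at i = 0 and -4 at i = 1, both representable, yet the step b*c = -(2^31 + 2) is not. That is why a signed {a*c, +, b*c} would introduce an overflow the original code never had.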

Re: [PATCH] i386: For noreturn functions save at least the bp register if it is used [PR114116]

2024-02-27 Thread Richard Biener
On Tue, Feb 27, 2024 at 10:13 AM Jakub Jelinek wrote: > > On Tue, Feb 27, 2024 at 10:04:06AM +0100, Jakub Jelinek wrote: > > > I hope we at least avoid that at -O0, possibly also with -Og? > > > > r14-8495 fixed at least that. > > > > Of course, it can break debugging experience even when the

Re: [PATCH v2] DSE: Bugfix ICE after allow vector type in get_stored_val

2024-02-27 Thread Richard Biener
On Mon, Feb 26, 2024 at 3:22 PM wrote: > > From: Pan Li > > We allowed vector type for get_stored_val when read is less than or > equal to store in previous. Unfortunately, we missed to adjust the > validate_subreg part accordingly. When the vector type's size is > less than vector register,

Re: [PATCH v2] Draft|Internal-fn: Introduce internal fn saturation US_PLUS

2024-02-27 Thread Richard Biener
On Sun, Feb 25, 2024 at 10:01 AM Tamar Christina wrote: > > Hi Pan, > > > From: Pan Li > > > > Hi Richard & Tamar, > > > > Try the DEF_INTERNAL_INT_EXT_FN as your suggestion. By mapping > > us_plus$a3 to the RTL representation (us_plus:m x y) in optabs.def. > > And then expand_US_PLUS in

Re: [PATCH] developer option: -fdump-generic-nodes; initial incorporation

2024-02-27 Thread Richard Biener
On Thu, Feb 22, 2024 at 5:46 PM Robert Dubner wrote: > > As part of an effort to learn how to create a GENERIC tree in order to > implement a > COBOL front end, I created the dump_generic_nodes(), which accepts a > function_decl at the point it is provided to the middle end. The routine > generates

Re: [PATCH] i386: For noreturn functions save at least the bp register if it is used [PR114116]

2024-02-27 Thread Richard Biener
On Tue, Feb 27, 2024 at 9:42 AM Jakub Jelinek wrote: > > Hi! > > As mentioned in the PR, on x86_64 currently a lot of ICEs end up > with crashes in the unwinder like: > during RTL pass: expand > pr114044-2.c: In function ‘foo’: > pr114044-2.c:5:3: internal compiler error: in expand_fn_using_insn,

Re: [PATCH] expand: Add trivial folding for bit query builtins at expansion time [PR114044]

2024-02-27 Thread Richard Biener
On Tue, 27 Feb 2024, Jakub Jelinek wrote: > On Tue, Feb 27, 2024 at 09:35:43AM +0100, Richard Biener wrote: > > I do wonder whether we can handle the missing LHS case generically > > in the direct optab expander for fns that are PURE or CONST? > > Maybe the 2 operand expan

Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE

2024-02-27 Thread Richard Biener
On Mon, 26 Feb 2024, Andre Vieira (lists) wrote: > > > On 05/02/2024 09:56, Richard Biener wrote: > > On Thu, 1 Feb 2024, Andre Vieira (lists) wrote: > > > >> > >> > >> On 01/02/2024 07:19, Richard Biener wrote: > >>> On Wed, 31 Jan 2

Re: [PATCH] expand: Add trivial folding for bit query builtins at expansion time [PR114044]

2024-02-27 Thread Richard Biener
024-02-26 14:19:30.079824133 +0100 > @@ -0,0 +1,45 @@ > +/* PR rtl-optimization/114044 */ > +/* { dg-do compile { target bitint575 } } */ > +/* { dg-options "-O -fno-tree-dce" } */ > + > +void > +foo (void) > +{ > + unsigned _BitInt (575) a = 3; > + __builtin_clzg (a)

Re: [r14-9173 Regression] FAIL: gcc.dg/tree-ssa/andnot-2.c scan-tree-dump-not forwprop3 "_expr" on Linux/x86_64

2024-02-26 Thread Richard Biener
On Tue, 27 Feb 2024, haochen.jiang wrote: > On Linux/x86_64, > > af66ad89e8169f44db723813662917cf4cbb78fc is the first bad commit > commit af66ad89e8169f44db723813662917cf4cbb78fc > Author: Richard Biener > Date: Fri Feb 23 16:06:05 2024 +0100 > > middle-end/1

[PATCH] tree-optimization/114081 - dominator update for prologue peeling

2024-02-26 Thread Richard Biener
The following implements manual update for multi-exit loop prologue peeling during vectorization. Bootstrap / regtest running on x86_64-unknown-linux-gnu. I think the amount of coverage for prologue peeling with early exits is very low, so my testing success might not mean much. Richard.

Re: [PATCH] tree-optimization/114074 - CHREC multiplication and undefined overflow

2024-02-26 Thread Richard Biener
On Mon, 26 Feb 2024, Jakub Jelinek wrote: > On Mon, Feb 26, 2024 at 03:15:02PM +0100, Richard Biener wrote: > > When folding a multiply CHRECs are handled like {a, +, b} * c > > is {a*c, +, b*c} but that isn't generally correct when overflow > > invokes undefined behavior

[PATCH] tree-optimization/114074 - CHREC multiplication and undefined overflow

2024-02-26 Thread Richard Biener
When folding a multiply CHRECs are handled like {a, +, b} * c is {a*c, +, b*c} but that isn't generally correct when overflow invokes undefined behavior. The following uses unsigned arithmetic unless either a is zero or a and b have the same sign. I've used simple early outs for INTEGER_CSTs and

RE: [PATCH]middle-end: delay updating of dominators until later during vectorization. [PR114081]

2024-02-26 Thread Richard Biener
t-loop.cc b/gcc/tree-vect-loop.cc > > > index > > 35f1f8c7d4245135ace740ff9be548919587..ab19ad6a6be516e3ee1f0fbeaae > > effeae1fb900f 100644 > > > --- a/gcc/tree-vect-loop.cc > > > +++ b/gcc/tree-vect-loop.cc > > > @@ -11987,7 +11987,12 @@ vect_tra

[PATCH 2/2] tree-optimization/114099 - virtual LC PHIs and early exit vect

2024-02-26 Thread Richard Biener
In some cases exits can lack LC PHI nodes for the virtual operand. We have to create them when the epilog loop requires them which also allows us to remove some only halfway correct fixups. This is the variant triggering for alternate exits. Bootstrap and regtest pending on

[PATCH 1/2] tree-optimization/114068 - missed virtual LC PHI after vect peeling

2024-02-26 Thread Richard Biener
When we choose the IV exit to be one leading to no virtual use we fail to have a virtual LC PHI even though we need it for the epilog entry. The following makes sure to create it so that later updating works. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. PR

Re: [PATCH]middle-end: delay updating of dominators until later during vectorization. [PR114081]

2024-02-26 Thread Richard Biener
torizer.h > +++ b/gcc/tree-vectorizer.h > @@ -961,6 +961,10 @@ public: >/* Statements whose VUSES need updating if early break vectorization is to > happen. */ >auto_vec early_break_vuses; > + > + /* Dominators that need to be recalculated that have been deferred un

Re: [PATCH]middle-end: update vuses out of loop which use a vdef that's moved [PR114068]

2024-02-26 Thread Richard Biener
latch_edge (loop)); > + FOR_EACH_IMM_USE_STMT (use_stmt, iter, last_seen_vuse) > + { > + if (flow_bb_inside_loop_p (loop, use_stmt->bb)) > + continue; > + FOR_EACH_IMM_USE_ON_STMT (use_p, iter) > + SET_USE (use_p, vuse); > + } > +} > + >/* And update the LC PHIs on exits. */ >for (edge e : get_loop_exit_edges (LOOP_VINFO_LOOP (loop_vinfo))) > if (!dominated_by_p (CDI_DOMINATORS, e->src, dest_bb)) > > > > > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH v2] Do not emulate vectors containing floats.

2024-02-26 Thread Richard Biener
On Mon, 26 Feb 2024, Jakub Jelinek wrote: > On Mon, Feb 26, 2024 at 09:53:41AM +0100, Richard Biener wrote: > > On Mon, 26 Feb 2024, Jakub Jelinek wrote: > > > > > On Mon, Feb 26, 2024 at 09:00:58AM +0100, Richard Biener wrote: > > > > > > @@ -6756,7 +

Re: [PATCH v2] Do not emulate vectors containing floats.

2024-02-26 Thread Richard Biener
On Mon, 26 Feb 2024, Jakub Jelinek wrote: > On Mon, Feb 26, 2024 at 09:00:58AM +0100, Richard Biener wrote: > > > > @@ -6756,7 +6756,8 @@ vectorizable_operation (vec_info *vinfo, > > > > those through even when the mode isn't word_mode. For > >

Re: [PATCH] match.pd: Guard 2 simplifications on integral TYPE_OVERFLOW_UNDEFINED [PR114090]

2024-02-26 Thread Richard Biener
__attribute__((noipa)) int > +bar (int x) > +{ > + int w = (x >= 0 ? x : 0); > + int z = (x <= 0 ? -x : 0); > + return w + z; > +} > + > +__attribute__((noipa)) int > +baz (int x) > +{ > + return x <= 0 ? -x : 0; > +} > + > +int > +main () > +{

Re: [PATCH] fold-const: Avoid infinite recursion in +-*&|^minmax reassociation [PR114084]

2024-02-26 Thread Richard Biener
> return > fold_convert_loc (loc, type, associate_trees (loc, var0, con0, > code, atype)); > --- gcc/testsuite/gcc.dg/bitint-94.c.jj 2024-02-24 11:18:32.607018363 > +0100 > +++ gcc/testsuite/gcc.dg/bitint-94.c 2024-02-24 11:19:09.023500121 +0100 > @@ -0,0 +1,12 @@ > +/* PR middle-end/114084 */ > +/* { dg-do compile { target bitint } } */ > +/* { dg-options "-std=c23 -pedantic-errors" } */ > + > +typedef unsigned _BitInt(31) T; > +T a, b; > + > +void > +foo (void) > +{ > + b = (T) ((a | (-1U >> 1)) >> 1 | (a | 5) << 4); > +} > > Jakub > >

Re: [PATCH v2] Do not emulate vectors containing floats.

2024-02-26 Thread Richard Biener
ed branches - the effective check should be the same in GCC 13 at least, but with some added ad-hoc costing which might make this not trigger (maybe_lt (nunits_out, 4U)) - so we'd need a word_mode that can cover 4 FP elements. Possibly triggerable with HFmode? Thanks, Richard. > LGTM, but please wait until Monday evening so that Richi or Richard > have a chance to chime in. > > Jakub > >

[PATCH] middle-end/114070 - folding breaking VEC_COND expansion

2024-02-25 Thread Richard Biener
The following properly guards the simplifications that move operations into VEC_CONDs, in particular when that changes the type constraints on this operation. This needed a genmatch fix which was recording spurious implicit fors when tcc_comparison is used in a C expression. Bootstrapped and

Re: [PATCH v1] RTL: Bugfix ICE after allow vector type in DSE

2024-02-25 Thread Richard Biener
On Mon, Feb 26, 2024 at 4:26 AM wrote: > > From: Pan Li > > We allowed vector type for get_stored_val when read is less than or > equal to store in previous. Unfortunately, we missed to adjust the > validate_subreg part accordingly. For vector type, we don't need to > restrict the mode size is

Re: [PATCH] Use HOST_WIDE_INT_{C,UC,0,0U,1,1U} macros some more

2024-02-24 Thread Richard Biener
> On 24.02.2024 at 08:44, Jakub Jelinek wrote: > > Hi! > > I've searched for some uses of (HOST_WIDE_INT) constant or (unsigned > HOST_WIDE_INT) constant and turned them into uses of the appropriate > macros. > There are quite a few cases in non-i386 backends but I've left that out > for

Re: [PATCH] bitint: Handle VIEW_CONVERT_EXPRs between large/huge BITINT_TYPEs and VECTOR/COMPLEX_TYPE etc. [PR114073]

2024-02-24 Thread Richard Biener
> On 24.02.2024 at 08:40, Jakub Jelinek wrote: > > Hi! > > The following patch implements support for VIEW_CONVERT_EXPRs from/to > large/huge _BitInt to/from vector or complex types or anything else but > integral/pointer types which doesn't need to live in memory. > >

Re: [PATCH] vect: Tighten check for impossible SLP layouts [PR113205]

2024-02-24 Thread Richard Biener
> On 24.02.2024 at 11:06, Richard Sandiford wrote: > > During its forward pass, the SLP layout code tries to calculate > the cost of a layout change on an incoming edge. This is taken > as the minimum of two costs: one in which the source partition > keeps its current layout (chosen

Re: [PATCH] vect: Fix integer overflow calculating mask

2024-02-23 Thread Richard Biener
On Fri, 23 Feb 2024, Jakub Jelinek wrote: > On Fri, Feb 23, 2024 at 02:22:19PM +, Andrew Stubbs wrote: > > On 23/02/2024 13:02, Jakub Jelinek wrote: > > > On Fri, Feb 23, 2024 at 12:58:53PM +, Andrew Stubbs wrote: > > > > This is a follow-up to the previous patch to ensure that integer

Re: [PATCH] vect: Fix integer overflow calculating mask

2024-02-23 Thread Richard Biener
> On 23.02.2024 at 14:03, Jakub Jelinek wrote: > > On Fri, Feb 23, 2024 at 12:58:53PM +, Andrew Stubbs wrote: >> This is a follow-up to the previous patch to ensure that integer vector >> bit-masks do not have excess bits set. It fixes a bug, observed on >> amdgcn, in which the mask

Re: [PATCH] expr: Fix REDUCE_BIT_FIELD in multiplication expansion [PR114054]

2024-02-23 Thread Richard Biener
29.464277919 +0100 > @@ -0,0 +1,17 @@ > +/* PR rtl-optimization/114054 */ > +/* { dg-do compile { target bitint } } */ > +/* { dg-options "-Og -fwhole-program -fno-tree-ccp -fprofile-use > -fno-tree-copy-prop -w" } */ > + > +int x; > + > +void > +foo (int i, u

Re: [PATCH] bitintlower: Fix .{ADD,SUB}_OVERFLOW lowering [PR114040]

2024-02-23 Thread Richard Biener
"-flto" } { "" } } */ > + > +unsigned a; > +signed char b; > +short c; > +long d; > +__int128 e; > +int f; > + > +#if __BITINT_MAXWIDTH__ >= 511 > +__attribute__((noinline)) void > +foo (_BitInt(3) x, unsigned _BitInt(511) y, unsigned *z) > +

[PATCH][www] Document ia64*-*-* obsolescence

2024-02-23 Thread Richard Biener
The following documents obsoleting of ia64*-*-*. Pushed. * gcc-14/changes.html: Document ia64*-*-* obsoleting. --- htdocs/gcc-14/changes.html | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html index

[PATCH] Add ia64*-*-* to the list of obsolete targets

2024-02-23 Thread Richard Biener
The following deprecates ia64*-*-* for GCC 14. Since we plan to force LRA for GCC 15 and the target only has slim chances of getting updated this notifies people in advance. Given both Linux and glibc have axed the target further development is also made difficult. "Tested" for ia64-elf and

[PATCH] tree-optimization/114048 - ICE in copy_reference_ops_from_ref

2024-02-22 Thread Richard Biener
The following adds another omission to the assert verifying we're not running into spurious off == -1. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. PR tree-optimization/114048 * tree-ssa-sccvn.cc (copy_reference_ops_from_ref): MEM_REF can also produce -1

[PATCH] tree-optimization/114027 - conditional reduction chain

2024-02-22 Thread Richard Biener
When we classify a conditional reduction chain as CONST_COND_REDUCTION we fail to verify all involved conditionals have the same constant. That's a quite unlikely situation so the following simply disables such classification when there's more than one reduction statement. Bootstrapped and tested

Re: [PATCH] profile-count: Don't dump through a temporary buffer [PR111960]

2024-02-22 Thread Richard Biener
On Thu, Feb 22, 2024 at 10:07 AM Jakub Jelinek wrote: > > Hi! > > The profile_count::dump (char *, struct function * = NULL) const; > method has a single caller, the > profile_count::dump (FILE *f, struct function *fun) const; > method and for that going through a temporary buffer is just slower

Re: [PATCH] call-cdce: Add missing BUILT_IN_*F{32,64}X handling and improve BUILT_IN_*L [PR113993]

2024-02-22 Thread Richard Biener
FLT128_MANT_DIG__ > +void > +flt128 (_Float128 f1, _Float128 f2, _Float128 f3, _Float128 f4, _Float128 f5, > + _Float128 f6, _Float128 f7, _Float128 f8, _Float128 f9) > +{ > + if (!(f1 >= -1.0f128 && f1 <= 1.0f128)) __builtin_unreachable (); > + __builtin_acosf

Re: [PATCH] libcpp: Improve location for macro names [PR66290]

2024-02-22 Thread Richard Biener
On Tue, Feb 20, 2024 at 3:33 PM Lewis Hyatt wrote: > > On Mon, Feb 19, 2024 at 11:36 PM Alexandre Oliva wrote: > > > > This backport for gcc-13 is the first of two required for the > > g++.dg/pch/line-map-3.C test to stop hitting a variant of the known > > problem mentioned in that testcase: on

Re: [PATCH] bitintlower: Fix .MUL_OVERFLOW overflow checking [PR114038]

2024-02-22 Thread Richard Biener
run_expensive_tests } { "*" } { "-O0" "-O2" } } */ > +/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */ > + > +#if __BITINT_MAXWIDTH__ >= 129 > +int > +foo (unsigned _BitInt(63) x, unsigned _BitInt

Re: Stabilizing flaky libgomp GCN target/offloading testing (was: libgomp GCN gfx1030/gfx1100 offloading status)

2024-02-21 Thread Richard Biener
> On 21.02.2024 at 13:34, Thomas Schwinge wrote: > > Hi! > >> On 2024-02-01T15:49:02+0100, Richard Biener wrote: >>> On Thu, 1 Feb 2024, Thomas Schwinge wrote: >>> On 2024-01-26T10:45:10+0100, Richard Biener wrote: >>>> On Fri, 26 Jan 2024,

Re: [PATCH] aarch64: Allow aarch64-linux-muscl for heap trampolines [PR113971].

2024-02-20 Thread Richard Biener
On Tue, Feb 20, 2024 at 11:27 AM Iain Sandoe wrote: > > Tested on aarch64-linux-gnu, aarch64-darwin by me and on aarch64-linux-musl > by Sam James (thanks!). OK for trunk? OK > thanks > Iain > > --- 8< --- > > > This allows the same trampoline pattern to be used on all linux variants > rather

Re: [PATCH] c-family, c++, v2: Fix up handling of types which may have padding in __atomic_{compare_}exchange

2024-02-20 Thread Richard Biener
On Tue, 20 Feb 2024, Jakub Jelinek wrote: > On Tue, Feb 20, 2024 at 09:01:10AM +0100, Richard Biener wrote: > > I'm not sure those would be really equivalent (MEM_REF vs. V_C_E > > as well as combined vs. split). It really depends how RTL expansion > > handles this (as yo

Re: GCN RDNA2+ vs. GCC SLP vectorizer

2024-02-20 Thread Richard Biener
On Tue, 20 Feb 2024, Thomas Schwinge wrote: > Hi Richard! > > On 2024-02-20T08:44:35+0100, Richard Biener wrote: > > On Mon, 19 Feb 2024, Thomas Schwinge wrote: > >> On 2024-02-19T17:31:20+0100, I wrote: > >> > On 2024-02-19T11:52:55+0100, Richard Biener

Re: [PATCH] c-family, c++: Fix up handling of types which may have padding in __atomic_{compare_}exchange

2024-02-20 Thread Richard Biener
On Tue, 20 Feb 2024, Jakub Jelinek wrote: > On Tue, Feb 20, 2024 at 12:12:11AM +, Jason Merrill wrote: > > On 2/19/24 02:55, Jakub Jelinek wrote: > > > On Fri, Feb 16, 2024 at 01:51:54PM +, Jonathan Wakely wrote: > > > > Ah, although __atomic_compare_exchange only takes pointers, the > >

Re: GCN RDNA2+ vs. GCC SLP vectorizer

2024-02-19 Thread Richard Biener
On Mon, 19 Feb 2024, Thomas Schwinge wrote: > Hi! > > On 2024-02-19T17:31:20+0100, I wrote: > > On 2024-02-19T11:52:55+0100, Richard Biener wrote: > >> On Mon, 19 Feb 2024, Thomas Schwinge wrote: > >>> On 2024-02-16T14:53:04+0100, I wrote: > >>

Re: [PATCH] ipa: Convert lattices from pure array to vector (PR 113476)

2024-02-19 Thread Richard Biener
a-prop.h > index 9c78dc9f486..ee3c0006add 100644 > --- a/gcc/ipa-prop.h > +++ b/gcc/ipa-prop.h > @@ -627,7 +627,7 @@ public: >vec *descriptors; >/* Pointer to an array of structures describing individual formal > parameters. */ > - class ipcp_param_lattices * G

Re: veclower: improve selection of vector mode when lowering [PR 112787]

2024-02-19 Thread Richard Biener
a+sve conflicts > with -mcpu=neoverse-n2 in previous gcc versions. Yes. Thanks, Richard. > Kind Regards, > Andre > > On 20/12/2023 14:30, Richard Biener wrote: > > On Wed, 20 Dec 2023, Andre Vieira (lists) wrote: > > > >> Thanks, fully agree with all comm

Re: [PATCH] rtl-optimization/54052 - RTL SSA PHI insertion compile-time hog

2024-02-19 Thread Richard Biener
On Mon, 19 Feb 2024, Richard Sandiford wrote: > Richard Biener writes: > >> I suppose that's better than the first version when a block has a > >> large number of dominance frontiers. But I can't remember whether > >> that was the case in PR98863. I have a

Re: [PATCH] rtl-optimization/54052 - RTL SSA PHI insertion compile-time hog

2024-02-19 Thread Richard Biener
On Mon, 19 Feb 2024, Richard Sandiford wrote: > Richard Biener writes: > > On Mon, 19 Feb 2024, Richard Sandiford wrote: > > > >> Richard Biener writes: > >> > The following tries to address the PHI insertion compile-time hog in > >> > RTL

Re: [PATCH] rtl-optimization/54052 - RTL SSA PHI insertion compile-time hog

2024-02-19 Thread Richard Biener
On Mon, 19 Feb 2024, Richard Sandiford wrote: > Richard Biener writes: > > The following tries to address the PHI insertion compile-time hog in > > RTL fwprop observed with the PR54052 testcase where the loop computing > > the "unfiltered" set of variables poss

Re: GCN RDNA2+ vs. GCC SLP vectorizer

2024-02-19 Thread Richard Biener
On Mon, 19 Feb 2024, Thomas Schwinge wrote: > Hi! > > On 2024-02-16T14:53:04+0100, I wrote: > > On 2024-02-16T12:41:06+, Andrew Stubbs wrote: > >> On 16/02/2024 12:26, Richard Biener wrote: > >>> On Fri, 16 Feb 2024, Andrew Stubbs wrote: > >>

[PATCH] rtl-optimization/54052 - RTL SSA PHI insertion compile-time hog

2024-02-19 Thread Richard Biener
The following tries to address the PHI insertion compile-time hog in RTL fwprop observed with the PR54052 testcase where the loop computing the "unfiltered" set of variables possibly needing PHI nodes for each block exhibits quadratic compile-time and memory-use. Instead of only pruning the set

Re: [PATCH] match.pd: Fix ICE on BIT_INSERT_EXPR of BIT_FIELD_REF folding [PR113967]

2024-02-19 Thread Richard Biener
+/* PR tree-optimization/113967 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +typedef unsigned short W __attribute__((vector_size (4 * sizeof (short)))); > + > +void > +foo (W *p) > +{ > + W x = *p; > + W y = {}; > + __builtin_memc

Re: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU

2024-02-18 Thread Richard Biener
On Sat, Feb 17, 2024 at 11:30 AM wrote: > > From: Pan Li > > This patch would like to add the middle-end presentation for the > unsigned saturation add. Aka set the result of add to the max > when overflow. It will take the pattern similar as below. > > SAT_ADDU (x, y) => (x + y) |
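As a reading aid for the semantics described above ("set the result of add to the max when overflow"), here is a minimal sketch of an unsigned saturating add in plain C. It is an assumption-laden illustration, not the pattern from the patch, whose text is cut off in this preview: the function name `sat_addu32` and the 32-bit width are hypothetical, and the explicit comparison form was chosen only for readability.

```c
#include <stdint.h>

/* Hypothetical illustration of unsigned saturating addition:
   return x + y, clamped to UINT32_MAX if the sum wraps around. */
static uint32_t sat_addu32(uint32_t x, uint32_t y)
{
    uint32_t sum = x + y;   /* unsigned addition wraps modulo 2^32 */

    /* A wrapped sum is always smaller than either operand. */
    return sum < x ? UINT32_MAX : sum;
}
```

For example, `sat_addu32(1, 2)` yields 3, while `sat_addu32(UINT32_MAX, 1)` stays at `UINT32_MAX` instead of wrapping to 0.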

[PATCH][RFC] tree-optimization/113910 - bitmap_hash is weak, improve iterative_hash_*

2024-02-16 Thread Richard Biener
The following addresses the weak bitmap_hash function which results in points-to analysis taking a long time because of a high collision rate in one of its bitmap hash tables. Using a better hash function like in the bitmap.cc hunk below doesn't help unless one also replaces the hash function in

Re: GCN RDNA2+ vs. GCC SLP vectorizer

2024-02-16 Thread Richard Biener
On Fri, 16 Feb 2024, Andrew Stubbs wrote: > On 16/02/2024 10:17, Richard Biener wrote: > > On Fri, 16 Feb 2024, Thomas Schwinge wrote: > > > >> Hi! > >> > >> On 2023-10-20T12:51:03+0100, Andrew Stubbs wrote: > >>

[PATCH] tree-optimization/113895 - consistency check fails in copy_reference_ops_from_ref

2024-02-16 Thread Richard Biener
The following addresses consistency check fails in copy_reference_ops_from_ref when we are handling out-of-bound array accesses (it's almost impossible to identically mimic the get_ref_base_and_extent behavior). It also addresses the case where an out-of-bound constant offset computes to a -1 off

Re: GCN RDNA2+ vs. GCC SLP vectorizer (was: [committed] amdgcn: add -march=gfx1030 EXPERIMENTAL)

2024-02-16 Thread Richard Biener
On Fri, 16 Feb 2024, Thomas Schwinge wrote: > Hi! > > On 2023-10-20T12:51:03+0100, Andrew Stubbs wrote: > > I've committed this patch > > ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691 > "amdgcn: add -march=gfx1030 EXPERIMENTAL", which the later RDNA3/gfx1100 > support builds on top

Re: [PATCH] c++/modules: optimize tree flag streaming

2024-02-16 Thread Richard Biener
On Thu, Feb 15, 2024 at 7:38 PM Patrick Palka wrote: > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look > OK for trunk? Btw, there's the "bitpack" streaming support in data-streamer.h also added for exactly the same reason, it's likely not easily re-usable but this kind of

Re: [PATCH] tree-optimization/113910 - huge compile time during PTA

2024-02-15 Thread Richard Biener
> On 15.02.2024 at 18:06, Richard Sandiford wrote: > > Richard Biener writes: >>> On Wed, 14 Feb 2024, Richard Biener wrote: >>> >>> For the testcase in PR113910 we spend a lot of time in PTA comparing >>> bitmaps for looking up equivalence cla

Re: [PATCH] expand: Fix handling of asm goto outputs vs. PHI argument adjustments [PR113921]

2024-02-15 Thread Richard Biener
} > } > } > --- gcc/testsuite/gcc.target/i386/pr113921.c.jj 2024-02-14 > 21:21:15.194178515 +0100 > +++ gcc/testsuite/gcc.target/i386/pr113921.c 2024-02-14 21:20:52.745476040 > +0100 > @@ -0,0 +1,20 @@ > +/* PR middle-end/113921 */ > +/* { dg-do run } */

[PATCH] tree-optimization/111156 - properly dissolve SLP only groups

2024-02-15 Thread Richard Biener
The following fixes the omission of failing to look at pattern stmts when we need to dissolve SLP only groups. Bootstrapped and tested on x86-64-unknown-linux-gnu, pushed. PR tree-optimization/56 * tree-vect-loop.cc (vect_dissolve_slp_only_groups): Look at the pattern

Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-15 Thread Richard Biener
On Thu, 15 Feb 2024, Andrew Stubbs wrote: > On 15/02/2024 10:21, Richard Biener wrote: > [snip] > >>> I suppose if RDNA really only has 32 lane vectors (it sounds like it, > >>> even if it can "simulate" 64 lane ones?) then it might make sense to

Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-15 Thread Richard Biener
On Thu, 15 Feb 2024, Andrew Stubbs wrote: > On 15/02/2024 07:49, Richard Biener wrote: > > On Wed, 14 Feb 2024, Andrew Stubbs wrote: > > > >> On 14/02/2024 13:43, Richard Biener wrote: > >>> On Wed, 14 Feb 2024, Andrew Stubbs wrote: > >>>

Re: [PATCH] middle-end/113576 - avoid out-of-bound vector element access

2024-02-15 Thread Richard Biener
On Thu, 15 Feb 2024, Richard Sandiford wrote: > Richard Biener writes: > > On Wed, 14 Feb 2024, Richard Sandiford wrote: > > > >> Richard Biener writes: > >> > On Wed, 14 Feb 2024, Richard Sandiford wrote: > >> > > >> >> Richa

Re: [PATCH 2/2] doc: Add documentation of which operand matches the mode of the standard pattern name [PR113508]

2024-02-15 Thread Richard Biener
On Thu, Feb 15, 2024 at 12:16 AM Andrew Pinski wrote: > > In some of the standard pattern names, it is not obvious which mode is being > used in the pattern > name. Is it operand 0, 1, or 2? Is it the wider mode or the narrower mode? > This fixes that so there is no confusion by adding a

Re: [PATCH 1/2] doc: Fix some standard named pattern documentation modes

2024-02-15 Thread Richard Biener
On Thu, Feb 15, 2024 at 12:16 AM Andrew Pinski wrote: > > Currently these use `@var{m3}` but the 3 here is a literal 3 > and not part of the mode itself so it should not be inside > the var. Fixed as such. > > Built the documentation to make sure it looks correct now. OK > gcc/ChangeLog: > >

[PATCH] Do not record dependences from debug stmts in tail merging

2024-02-15 Thread Richard Biener
The following avoids recording BB dependences for debug stmt uses. Bootstrap and regtest running on x86_64-unknown-linux-gnu. It's unlikely a dependence is just because of debug stmts so actual compare-debug issues are very unlikely. Still spotted while investigating a CI regression mail (for

Re: [PATCH] lower-bitint: Ensure we don't get coalescing ICEs for (ab) SSA_NAMEs used in mul/div/mod [PR113567]

2024-02-15 Thread Richard Biener
> @@ -0,0 +1,23 @@ > +/* PR tree-optimization/113567 */ > +/* { dg-do compile { target bitint } } */ > +/* { dg-options "-O2" } */ > + > +#if __BITINT_MAXWIDTH__ >= 129 > +_BitInt(129) v; > + > +void > +foo (_BitInt(129) a, int i) > +{ > + __label__

Re: [PATCH] middle-end/113576 - avoid out-of-bound vector element access

2024-02-15 Thread Richard Biener
On Wed, 14 Feb 2024, Richard Sandiford wrote: > Richard Biener writes: > > On Wed, 14 Feb 2024, Richard Sandiford wrote: > > > >> Richard Biener writes: > >> > The following avoids accessing out-of-bound vector elements when > >> > native

Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-14 Thread Richard Biener
On Wed, 14 Feb 2024, Andrew Stubbs wrote: > On 14/02/2024 13:43, Richard Biener wrote: > > On Wed, 14 Feb 2024, Andrew Stubbs wrote: > > > >> On 14/02/2024 13:27, Richard Biener wrote: > >>> On Wed, 14 Feb 2024, Andrew Stubbs wrote: > >>>

Re: [PATCH] [libiberty] remove TBAA violation in iterative_hash, improve code-gen

2024-02-14 Thread Richard Biener
> On 14.02.2024 at 16:22, Jakub Jelinek wrote: > > On Wed, Feb 14, 2024 at 04:13:51PM +0100, Richard Biener wrote: >> The following removes the TBAA violation present in iterative_hash. >> As we eventually LTO that it's important to fix. This also improves >> co

Re: [PATCH]middle-end: inspect all exits for additional annotations for loop.

2024-02-14 Thread Richard Biener
> On 14.02.2024 at 16:16, Tamar Christina wrote: > > >> >> >> I think this isn't entirely good. For simple cases for >> do {} while the condition ends up in the latch while for while () {} >> loops it ends up in the header. In your case the latch isn't empty >> so it doesn't end up

[PATCH][RFC] tree-optimization/113910 - improve bitmap_hash

2024-02-14 Thread Richard Biener
The following tries to improve the actual hash function for hashing bitmaps. We're still getting collision rates as high as 23 for the testcase in the PR. The following improves this by properly mixing in the bitmap element starting bit number. This brings down the collision rate below 1.4,
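
As a rough illustration of why mixing in an element's position helps, here is a minimal sketch. It is not the code from the patch: the `sparse_elt` layout, the function names, and the mixer (constants borrowed from the splitmix64 generator) are all assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sparse-bitmap element; this is NOT GCC's bitmap_element. */
struct sparse_elt {
  uint32_t index;  /* which 64-bit chunk of the bit space this element covers */
  uint64_t bits;   /* the bits set within that chunk */
};

/* Invertible 64-bit mixer (shift/multiply constants from splitmix64). */
static uint64_t mix64(uint64_t x) {
  x ^= x >> 30;
  x *= 0xbf58476d1ce4e5b9ULL;
  x ^= x >> 27;
  x *= 0x94d049bb133111ebULL;
  x ^= x >> 31;
  return x;
}

/* Weak variant: only the words are combined, so identical bit patterns
   stored at different positions hash to the same value. */
static uint64_t hash_bits_only(const struct sparse_elt *e, size_t n) {
  uint64_t h = 0;
  for (size_t i = 0; i < n; i++)
    h ^= e[i].bits;
  return h;
}

/* Stronger variant: the element index is folded into each step, so the
   position of the bits contributes to the hash as well. */
static uint64_t hash_with_index(const struct sparse_elt *e, size_t n) {
  uint64_t h = 0;
  for (size_t i = 0; i < n; i++) {
    uint64_t keyed = e[i].bits
                     + 0x9e3779b97f4a7c15ULL * ((uint64_t)e[i].index + 1);
    h = mix64(h ^ mix64(keyed));
  }
  return h;
}
```

Two single-element bitmaps that hold the same word at different indices collide under `hash_bits_only` but are separated by `hash_with_index`.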

[PATCH] [libiberty] remove TBAA violation in iterative_hash, improve code-gen

2024-02-14 Thread Richard Biener
The following removes the TBAA violation present in iterative_hash. As we eventually LTO that it's important to fix. This also improves code generation for the >= 12 bytes loop by using | to compose the 4 byte words as at least GCC 7 and up can recognize that pattern and perform a 4 byte load
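
The general idea of composing a word from bytes with `|` can be sketched as below. This is an illustrative helper, not the libiberty code; the name `load_le32` and the little-endian byte order are assumptions for the example.

```c
#include <stdint.h>

/* Hypothetical helper: read a 32-bit little-endian value from a byte buffer.
   Every access goes through unsigned char, so no uint32_t lvalue is ever used
   to read memory of another type (the pattern TBAA rules forbid). */
static uint32_t load_le32(const unsigned char *p) {
  return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Because the four byte reads are combined with shifts and `|`, a compiler that recognizes the idiom can emit a single 4-byte load on a suitable little-endian target, while the source stays well-defined for any alignment.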

Re: [PATCH] tree-optimization/113910 - huge compile time during PTA

2024-02-14 Thread Richard Biener
On Wed, 14 Feb 2024, Richard Biener wrote: > For the testcase in PR113910 we spend a lot of time in PTA comparing > bitmaps for looking up equivalence class members. This points to > the very weak bitmap_hash function which effectively hashes set > and a subset of not set bits. T

Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-14 Thread Richard Biener
On Wed, 14 Feb 2024, Andrew Stubbs wrote: > On 14/02/2024 13:27, Richard Biener wrote: > > On Wed, 14 Feb 2024, Andrew Stubbs wrote: > > > >> On 13/02/2024 08:26, Richard Biener wrote: > >>> On Mon, 12 Feb 2024, Thomas Schwinge wrote: > >>> >

Re: [PATCH]middle-end: inspect all exits for additional annotations for loop.

2024-02-14 Thread Richard Biener
e7bc33654ffa027b493f23d278ac..a29681bffb902d2d05e3f18764ab519aacb3c5bc > 100644 > --- a/gcc/tree-cfg.cc > +++ b/gcc/tree-cfg.cc > @@ -327,6 +327,10 @@ replace_loop_annotate (void) >if (loop->latch) > replace_loop_annotate_in_block (loop->latch, loop); > &g

Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-14 Thread Richard Biener
On Wed, 14 Feb 2024, Andrew Stubbs wrote: > On 13/02/2024 08:26, Richard Biener wrote: > > On Mon, 12 Feb 2024, Thomas Schwinge wrote: > > > >> Hi! > >> > >> On 2023-10-20T12:51:03+0100, Andrew Stubbs wrote: > >>

Re: [PATCH] middle-end/113576 - avoid out-of-bound vector element access

2024-02-14 Thread Richard Biener
On Wed, 14 Feb 2024, Richard Sandiford wrote: > Richard Biener writes: > > The following avoids accessing out-of-bound vector elements when > > native encoding a boolean vector with sub-BITS_PER_UNIT precision > > elements. The error was basing the number o

Re: [PATCH] middle-end/113576 - zero padding of vector bools when expanding compares

2024-02-14 Thread Richard Biener
On Wed, 14 Feb 2024, Richard Sandiford wrote: > Richard Biener writes: > > The following zeros paddings of vector bools when expanding compares > > and the mode used for the compare is an integer mode. In that case > > targets cannot distinguish between a 4 element
