On Thu, 29 Feb 2024, Jakub Jelinek wrote:
> On Thu, Feb 29, 2024 at 09:21:02AM +0100, Richard Biener wrote:
> > The following switches the logic in chrec_fold_multiply to
> > get_range_pos_neg since handling POLY_INT_CST possibly mixed with
> > non-poly ranges will make th
On Thu, 29 Feb 2024, Richard Biener wrote:
> The following amends the PR114070 fix to optimistically allow
> the folding when we cannot expand the current vec_cond using
> vcond_mask and we're still before vector lowering. This leaves
> a small window between vectorization and lower
The following amends the PR114070 fix to optimistically allow
the folding when we cannot expand the current vec_cond using
vcond_mask and we're still before vector lowering. This leaves
a small window between vectorization and lowering where we could
break vec_conds that can be expanded via
The following switches the logic in chrec_fold_multiply to
get_range_pos_neg since handling POLY_INT_CST possibly mixed with
non-poly ranges will make the open-coding awkward and while not
a perfect fit it should work.
In turn the following makes get_range_pos_neg aware of POLY_INT_CSTs.
I
On Wed, Feb 28, 2024 at 4:14 PM David Malcolm wrote:
>
> On Wed, 2024-02-28 at 08:58 +0100, Richard Biener wrote:
> > On Tue, Feb 27, 2024 at 10:20 PM Robert Dubner
> > wrote:
> > >
> > > Richard,
> > >
> > > Thank you very much f
On Wed, 28 Feb 2024, Andre Vieira (lists) wrote:
>
>
> On 27/02/2024 08:47, Richard Biener wrote:
> > On Mon, 26 Feb 2024, Andre Vieira (lists) wrote:
> >
> >>
> >>
> >> On 05/02/2024 09:56, Richard Biener wrote:
>
> On 28.02.2024 at 16:05, Jeff Law wrote:
>
>
>
>> On 2/28/24 03:05, Richard Biener wrote:
>>
>> Untested fix for targets that cannot handle the original IL below.
>> I'm not convinced that's the way to go here, is it? Or scrap
>> the testcase
This reverts the original fix for PR113831 which is better fixed by
the PR114121 fix. I've XFAILed instead of removing the PR108355
testcase again.
Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
PR tree-optimization/113831
PR tree-optimization/108355
*
When VN ends up exploiting range-info specifying the ao_ref offset
and max_size we have to make sure to reflect this in the hashtable
entry for the recorded expression. The PR113831 fix handled the
case where we can encode this in the operands themselves but this
bug shows the issue is more
On Tue, 27 Feb 2024, Richard Biener wrote:
> On Tue, 27 Feb 2024, Jeff Law wrote:
>
> >
> >
> > On 2/27/24 06:53, Richard Biener wrote:
> > > On Tue, 27 Feb 2024, Jeff Law wrote:
> > >
> > >>
> > >>
> > >> On 2
On Wed, Feb 28, 2024 at 9:25 AM Jakub Jelinek wrote:
>
> On Wed, Feb 28, 2024 at 08:58:08AM +0100, Richard Biener wrote:
> > Incidentally, this looks like something fit for a Google Summer of Code
> > project.
> > Ideally it would hook into print-tree.cc providing an
*/
> +
> +unsigned a[24], b[24];
> +enum E { E0 = 0, E1 = 1, E42 = 42, E56 = 56 };
> +
> +__attribute__((noipa)) unsigned
> +foo (enum E x)
> +{
> + for (int i = 0; i < 24; ++i)
> +a[i] = i;
> + unsigned e;
> + if (x >= E42)
> +e = __builtin_clz ((un
= 256
> +void
> +foo (void *p, _BitInt(256) x)
> +{
> + __builtin_memcpy (p, &x, sizeof x);
> +}
> +
> +_BitInt(256)
> +bar (void *p, _BitInt(256) x)
> +{
> + _BitInt(246) y = x + 1;
> + __builtin_memcpy (p, &y, sizeof y);
> + return x;
> +}
> +#endif
>
From a maintenance point of view I think it's important to have "dump a tree node"
once, so when fields are added or deemed useful for presenting in a dump
you don't have to chase down more than one place. Maintenance is also
the reason to not simply accept your contribution as-is.
I do hope th
On Tue, 27 Feb 2024, Jeff Law wrote:
>
>
> On 2/27/24 06:53, Richard Biener wrote:
> > On Tue, 27 Feb 2024, Jeff Law wrote:
> >
> >>
> >>
> >> On 2/27/24 00:43, Richard Biener wrote:
> >>> On Tue, 27
On Tue, 27 Feb 2024, Jeff Law wrote:
>
>
> On 2/27/24 00:43, Richard Biener wrote:
> > On Tue, 27 Feb 2024, haochen.jiang wrote:
> >
> >> On Linux/x86_64,
> >>
> >> af66ad89e8169f44db723813662917cf4cbb78fc is the first bad commit
> >> co
to see the bigger picture to be kept in mind
before altering the GIMPLE IL.
Adding an internal function for an already present optab is a
no-brainer. Adding a vectorizer
and/or if-conversion pattern to make use of this during vectorization
is existing practice.
Adding pattern recognition to
On Tue, Feb 27, 2024 at 1:50 PM Eric Botcazou wrote:
>
> Hi,
>
> this is a regression present on the mainline, 13 and 12 branches. For the
> attached Ada case, it's a tree checking failure on the mainline at -O:
>
> +===GNAT BUG DETECTED==+
> |
When folding a multiply, CHRECs are handled as if {a, +, b} * c
were {a*c, +, b*c}, but that isn't generally correct when overflow
invokes undefined behavior. The following uses unsigned arithmetic
unless either a is zero or a and b have the same sign.
I've used simple early outs for INTEGER_CSTs and
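To illustrate why the distributed form is not always safe, here is a minimal
sketch in plain C. It is not GCC's chrec_fold_multiply; the helper names are
hypothetical and a 32-bit int is assumed. With a = -2^30, b = 2^30 and c = 2,
every value of (a + i*b) * c for i in {0, 1} fits in int, yet the folded step
b*c does not, so only the unsigned evaluation avoids signed overflow.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if x * y does not fit in int.  */
int
mul_overflows (int x, int y)
{
  int64_t r = (int64_t) x * (int64_t) y;
  return r > INT_MAX || r < INT_MIN;
}

/* Value of {a*c, +, b*c} at iteration i, evaluated in unsigned
   arithmetic so that wrap-around is well defined.  */
unsigned
folded_unsigned (int a, int b, int c, unsigned i)
{
  unsigned init = (unsigned) a * (unsigned) c;
  unsigned step = (unsigned) b * (unsigned) c;
  return init + i * step;
}

/* Value of {a, +, b} * c at iteration i, computed directly.  The caller
   must pick inputs for which no intermediate result overflows.  */
int
direct_signed (int a, int b, int c, int i)
{
  return (a + i * b) * c;
}
```

Comparing folded_unsigned against (unsigned) direct_signed for i = 0 and
i = 1 shows the two agree modulo 2^32, while mul_overflows (b, c) reports
that the signed step would overflow.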
On Tue, Feb 27, 2024 at 10:13 AM Jakub Jelinek wrote:
>
> On Tue, Feb 27, 2024 at 10:04:06AM +0100, Jakub Jelinek wrote:
> > > I hope we at least avoid that at -O0, possibly also with -Og?
> >
> > r14-8495 fixed at least that.
> >
> > Of course, it can break debugging experience even when the
On Mon, Feb 26, 2024 at 3:22 PM wrote:
>
> From: Pan Li
>
> We allowed vector type for get_stored_val when read is less than or
> equal to store in previous. Unfortunately, we failed to adjust the
> validate_subreg part accordingly. When the vector type's size is
> less than vector register,
On Sun, Feb 25, 2024 at 10:01 AM Tamar Christina
wrote:
>
> Hi Pan,
>
> > From: Pan Li
> >
> > Hi Richard & Tamar,
> >
> > Try the DEF_INTERNAL_INT_EXT_FN as your suggestion. By mapping
> > us_plus$a3 to the RTL representation (us_plus:m x y) in optabs.def.
> > And then expand_US_PLUS in
On Thu, Feb 22, 2024 at 5:46 PM Robert Dubner wrote:
>
> As part of an effort to learn how to create a GENERIC tree in order to
> implement a
> COBOL front end, I created the dump_generic_nodes(), which accepts a
> function_decl at the point it is provided to the middle end. The routine
> generates
On Tue, Feb 27, 2024 at 9:42 AM Jakub Jelinek wrote:
>
> Hi!
>
> As mentioned in the PR, on x86_64 currently a lot of ICEs end up
> with crashes in the unwinder like:
> during RTL pass: expand
> pr114044-2.c: In function ‘foo’:
> pr114044-2.c:5:3: internal compiler error: in expand_fn_using_insn,
On Tue, 27 Feb 2024, Jakub Jelinek wrote:
> On Tue, Feb 27, 2024 at 09:35:43AM +0100, Richard Biener wrote:
> > I do wonder whether we can handle the missing LHS case generically
> > in the direct optab expander for fns that are PURE or CONST?
>
> Maybe the 2 operand expan
On Mon, 26 Feb 2024, Andre Vieira (lists) wrote:
>
>
> On 05/02/2024 09:56, Richard Biener wrote:
> > On Thu, 1 Feb 2024, Andre Vieira (lists) wrote:
> >
> >>
> >>
> >> On 01/02/2024 07:19, Richard Biener wrote:
> >>> On Wed, 31 Jan 2
024-02-26 14:19:30.079824133 +0100
> @@ -0,0 +1,45 @@
> +/* PR rtl-optimization/114044 */
> +/* { dg-do compile { target bitint575 } } */
> +/* { dg-options "-O -fno-tree-dce" } */
> +
> +void
> +foo (void)
> +{
> + unsigned _BitInt (575) a = 3;
> + __builtin_clzg (a)
On Tue, 27 Feb 2024, haochen.jiang wrote:
> On Linux/x86_64,
>
> af66ad89e8169f44db723813662917cf4cbb78fc is the first bad commit
> commit af66ad89e8169f44db723813662917cf4cbb78fc
> Author: Richard Biener
> Date: Fri Feb 23 16:06:05 2024 +0100
>
> middle-end/1
The following implements manual update for multi-exit loop prologue
peeling during vectorization.
Bootstrap / regtest running on x86_64-unknown-linux-gnu.
I think the amount of coverage for prologue peeling with early exits
is very low, so my testing success might not mean much.
Richard.
On Mon, 26 Feb 2024, Jakub Jelinek wrote:
> On Mon, Feb 26, 2024 at 03:15:02PM +0100, Richard Biener wrote:
> > When folding a multiply, CHRECs are handled as if {a, +, b} * c
> > were {a*c, +, b*c}, but that isn't generally correct when overflow
> > invokes undefined behavior
When folding a multiply, CHRECs are handled as if {a, +, b} * c
were {a*c, +, b*c}, but that isn't generally correct when overflow
invokes undefined behavior. The following uses unsigned arithmetic
unless either a is zero or a and b have the same sign.
I've used simple early outs for INTEGER_CSTs and
t-loop.cc b/gcc/tree-vect-loop.cc
> > > index
> > 35f1f8c7d4245135ace740ff9be548919587..ab19ad6a6be516e3ee1f0fbeaae
> > effeae1fb900f 100644
> > > --- a/gcc/tree-vect-loop.cc
> > > +++ b/gcc/tree-vect-loop.cc
> > > @@ -11987,7 +11987,12 @@ vect_tra
In some cases exits can lack LC PHI nodes for the virtual operand.
We have to create them when the epilog loop requires them which also
allows us to remove some only halfway correct fixups. This is the
variant triggering for alternate exits.
Bootstrap and regtest pending on
When we choose the IV exit to be one leading to no virtual use we
fail to have a virtual LC PHI even though we need it for the epilog
entry. The following makes sure to create it so that later updating
works.
Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
PR
torizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -961,6 +961,10 @@ public:
>/* Statements whose VUSES need updating if early break vectorization is to
> happen. */
>auto_vec early_break_vuses;
> +
> + /* Dominators that need to be recalculated that have been deferred un
latch_edge (loop));
> + FOR_EACH_IMM_USE_STMT (use_stmt, iter, last_seen_vuse)
> + {
> + if (flow_bb_inside_loop_p (loop, use_stmt->bb))
> + continue;
> + FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
> + SET_USE (use_p, vuse);
> + }
> +}
> +
>/* And update the LC PHIs on exits. */
>for (edge e : get_loop_exit_edges (LOOP_VINFO_LOOP (loop_vinfo)))
> if (!dominated_by_p (CDI_DOMINATORS, e->src, dest_bb))
>
>
>
>
>
--
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
On Mon, 26 Feb 2024, Jakub Jelinek wrote:
> On Mon, Feb 26, 2024 at 09:53:41AM +0100, Richard Biener wrote:
> > On Mon, 26 Feb 2024, Jakub Jelinek wrote:
> >
> > > On Mon, Feb 26, 2024 at 09:00:58AM +0100, Richard Biener wrote:
> > > > > > @@ -6756,7 +
On Mon, 26 Feb 2024, Jakub Jelinek wrote:
> On Mon, Feb 26, 2024 at 09:00:58AM +0100, Richard Biener wrote:
> > > > @@ -6756,7 +6756,8 @@ vectorizable_operation (vec_info *vinfo,
> > > > those through even when the mode isn't word_mode. For
> >
_attribute__((noipa)) int
> +bar (int x)
> +{
> + int w = (x >= 0 ? x : 0);
> + int z = (x <= 0 ? -x : 0);
> + return w + z;
> +}
> +
> +__attribute__((noipa)) int
> +baz (int x)
> +{
> + return x <= 0 ? -x : 0;
> +}
> +
> +int
> +main ()
> +{
>
> return
> fold_convert_loc (loc, type, associate_trees (loc, var0, con0,
> code, atype));
> --- gcc/testsuite/gcc.dg/bitint-94.c.jj 2024-02-24 11:18:32.607018363
> +0100
> +++ gcc/testsuite/gcc.dg/bitint-94.c 2024-02-24 11:19:09.023500121 +0100
> @@ -0,0 +1,12 @@
> +/* PR middle-end/114084 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-options "-std=c23 -pedantic-errors" } */
> +
> +typedef unsigned _BitInt(31) T;
> +T a, b;
> +
> +void
> +foo (void)
> +{
> + b = (T) ((a | (-1U >> 1)) >> 1 | (a | 5) << 4);
> +}
>
> Jakub
>
>
ed branches - the effective check should be the
same in GCC 13 at least, but with some added ad-hoc costing which might
make this not trigger (maybe_lt (nunits_out, 4U)) - so we'd need a
word_mode that can cover 4 FP elements. Possibly triggerable with
HFmode?
Thanks,
Richard.
> LGTM, but please wait until Monday evening so that Richi or Richard
> have a chance to chime in.
>
> Jakub
>
>
The following properly guards the simplifications that move
operations into VEC_CONDs, in particular when that changes the
type constraints on this operation.
This needed a genmatch fix which was recording spurious implicit fors
when tcc_comparison is used in a C expression.
Bootstrapped and
On Mon, Feb 26, 2024 at 4:26 AM wrote:
>
> From: Pan Li
>
> We allowed vector type for get_stored_val when read is less than or
> equal to store in previous. Unfortunately, we failed to adjust the
> validate_subreg part accordingly. For vector type, we don't need to
> restrict the mode size is
> On 24.02.2024 at 08:44, Jakub Jelinek wrote:
>
> Hi!
>
> I've searched for some uses of (HOST_WIDE_INT) constant or (unsigned
> HOST_WIDE_INT) constant and turned them into uses of the appropriate
> macros.
> There are quite a few cases in non-i386 backends but I've left that out
> for
> On 24.02.2024 at 08:40, Jakub Jelinek wrote:
>
> Hi!
>
> The following patch implements support for VIEW_CONVERT_EXPRs from/to
> large/huge _BitInt to/from vector or complex types or anything else but
> integral/pointer types which doesn't need to live in memory.
>
>
> On 24.02.2024 at 11:06, Richard Sandiford wrote:
>
> During its forward pass, the SLP layout code tries to calculate
> the cost of a layout change on an incoming edge. This is taken
> as the minimum of two costs: one in which the source partition
> keeps its current layout (chosen
On Fri, 23 Feb 2024, Jakub Jelinek wrote:
> On Fri, Feb 23, 2024 at 02:22:19PM +, Andrew Stubbs wrote:
> > On 23/02/2024 13:02, Jakub Jelinek wrote:
> > > On Fri, Feb 23, 2024 at 12:58:53PM +, Andrew Stubbs wrote:
> > > > This is a follow-up to the previous patch to ensure that integer
> On 23.02.2024 at 14:03, Jakub Jelinek wrote:
>
> On Fri, Feb 23, 2024 at 12:58:53PM +, Andrew Stubbs wrote:
>> This is a follow-up to the previous patch to ensure that integer vector
>> bit-masks do not have excess bits set. It fixes a bug, observed on
>> amdgcn, in which the mask
29.464277919 +0100
> @@ -0,0 +1,17 @@
> +/* PR rtl-optimization/114054 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-options "-Og -fwhole-program -fno-tree-ccp -fprofile-use
> -fno-tree-copy-prop -w" } */
> +
> +int x;
> +
> +void
> +foo (int i, u
"-flto" } { "" } } */
> +
> +unsigned a;
> +signed char b;
> +short c;
> +long d;
> +__int128 e;
> +int f;
> +
> +#if __BITINT_MAXWIDTH__ >= 511
> +__attribute__((noinline)) void
> +foo (_BitInt(3) x, unsigned _BitInt(511) y, unsigned *z)
> +
The following documents obsoleting of ia64*-*-*.
Pushed.
* gcc-14/changes.html: Document ia64*-*-* obsoleting.
---
htdocs/gcc-14/changes.html | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index
The following deprecates ia64*-*-* for GCC 14. Since we plan to
force LRA for GCC 15 and the target only has slim chances of getting
updated, this notifies people in advance. Given both Linux and
glibc have axed the target, further development is also made difficult.
"Tested" for ia64-elf and
The following adds another omission to the assert verifying we're
not running into spurious off == -1.
Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
PR tree-optimization/114048
* tree-ssa-sccvn.cc (copy_reference_ops_from_ref): MEM_REF
can also produce -1
When we classify a conditional reduction chain as CONST_COND_REDUCTION
we fail to verify all involved conditionals have the same constant.
That's a quite unlikely situation so the following simply disables
such classification when there's more than one reduction statement.
Bootstrapped and tested
On Thu, Feb 22, 2024 at 10:07 AM Jakub Jelinek wrote:
>
> Hi!
>
> The profile_count::dump (char *, struct function * = NULL) const;
> method has a single caller, the
> profile_count::dump (FILE *f, struct function *fun) const;
> method and for that going through a temporary buffer is just slower
FLT128_MANT_DIG__
> +void
> +flt128 (_Float128 f1, _Float128 f2, _Float128 f3, _Float128 f4, _Float128 f5,
> + _Float128 f6, _Float128 f7, _Float128 f8, _Float128 f9)
> +{
> + if (!(f1 >= -1.0f128 && f1 <= 1.0f128)) __builtin_unreachable ();
> + __builtin_acosf
On Tue, Feb 20, 2024 at 3:33 PM Lewis Hyatt wrote:
>
> On Mon, Feb 19, 2024 at 11:36 PM Alexandre Oliva wrote:
> >
> > This backport for gcc-13 is the first of two required for the
> > g++.dg/pch/line-map-3.C test to stop hitting a variant of the known
> > problem mentioned in that testcase: on
run_expensive_tests } { "*" } { "-O0" "-O2" } } */
> +/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
> +
> +#if __BITINT_MAXWIDTH__ >= 129
> +int
> +foo (unsigned _BitInt(63) x, unsigned _BitInt
> On 21.02.2024 at 13:34, Thomas Schwinge wrote:
>
> Hi!
>
>> On 2024-02-01T15:49:02+0100, Richard Biener wrote:
>>> On Thu, 1 Feb 2024, Thomas Schwinge wrote:
>>> On 2024-01-26T10:45:10+0100, Richard Biener wrote:
>>>> On Fri, 26 Jan 2024,
On Tue, Feb 20, 2024 at 11:27 AM Iain Sandoe wrote:
>
> Tested on aarch64-linux-gnu, aarch64-darwin by me and on aarch64-linux-musl
> by Sam James (thanks!). OK for trunk?
OK
> thanks
> Iain
>
> --- 8< ---
>
>
> This allows the same trampoline pattern to be used on all linux variants
> rather
On Tue, 20 Feb 2024, Jakub Jelinek wrote:
> On Tue, Feb 20, 2024 at 09:01:10AM +0100, Richard Biener wrote:
> > I'm not sure those would be really equivalent (MEM_REF vs. V_C_E
> > as well as combined vs. split). It really depends how RTL expansion
> > handles this (as yo
On Tue, 20 Feb 2024, Thomas Schwinge wrote:
> Hi Richard!
>
> On 2024-02-20T08:44:35+0100, Richard Biener wrote:
> > On Mon, 19 Feb 2024, Thomas Schwinge wrote:
> >> On 2024-02-19T17:31:20+0100, I wrote:
> >> > On 2024-02-19T11:52:55+0100, Richard Biener
On Tue, 20 Feb 2024, Jakub Jelinek wrote:
> On Tue, Feb 20, 2024 at 12:12:11AM +, Jason Merrill wrote:
> > On 2/19/24 02:55, Jakub Jelinek wrote:
> > > On Fri, Feb 16, 2024 at 01:51:54PM +, Jonathan Wakely wrote:
> > > > Ah, although __atomic_compare_exchange only takes pointers, the
> >
On Mon, 19 Feb 2024, Thomas Schwinge wrote:
> Hi!
>
> On 2024-02-19T17:31:20+0100, I wrote:
> > On 2024-02-19T11:52:55+0100, Richard Biener wrote:
> >> On Mon, 19 Feb 2024, Thomas Schwinge wrote:
> >>> On 2024-02-16T14:53:04+0100, I wrote:
> >>>
a-prop.h
> index 9c78dc9f486..ee3c0006add 100644
> --- a/gcc/ipa-prop.h
> +++ b/gcc/ipa-prop.h
> @@ -627,7 +627,7 @@ public:
>vec *descriptors;
>/* Pointer to an array of structures describing individual formal
> parameters. */
> - class ipcp_param_lattices * G
a+sve conflicts
> with -mcpu=neoverse-n2 in previous gcc versions.
Yes.
Thanks,
Richard.
> Kind Regards,
> Andre
>
> On 20/12/2023 14:30, Richard Biener wrote:
> > On Wed, 20 Dec 2023, Andre Vieira (lists) wrote:
> >
> >> Thanks, fully agree with all comm
On Mon, 19 Feb 2024, Richard Sandiford wrote:
> Richard Biener writes:
> >> I suppose that's better than the first version when a block has a
> >> large number of dominance frontiers. But I can't remember whether
> >> that was the case in PR98863. I have a
On Mon, 19 Feb 2024, Richard Sandiford wrote:
> Richard Biener writes:
> > On Mon, 19 Feb 2024, Richard Sandiford wrote:
> >
> >> Richard Biener writes:
> >> > The following tries to address the PHI insertion compile-time hog in
> >> > RTL
On Mon, 19 Feb 2024, Richard Sandiford wrote:
> Richard Biener writes:
> > The following tries to address the PHI insertion compile-time hog in
> > RTL fwprop observed with the PR54052 testcase where the loop computing
> > the "unfiltered" set of variables poss
On Mon, 19 Feb 2024, Thomas Schwinge wrote:
> Hi!
>
> On 2024-02-16T14:53:04+0100, I wrote:
> > On 2024-02-16T12:41:06+, Andrew Stubbs wrote:
> >> On 16/02/2024 12:26, Richard Biener wrote:
> >>> On Fri, 16 Feb 2024, Andrew Stubbs wrote:
> >>
The following tries to address the PHI insertion compile-time hog in
RTL fwprop observed with the PR54052 testcase where the loop computing
the "unfiltered" set of variables possibly needing PHI nodes for each
block exhibits quadratic compile-time and memory-use.
Instead of only pruning the set
+/* PR tree-optimization/113967 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +typedef unsigned short W __attribute__((vector_size (4 * sizeof (short))));
> +
> +void
> +foo (W *p)
> +{
> + W x = *p;
> + W y = {};
> + __builtin_memc
On Sat, Feb 17, 2024 at 11:30 AM wrote:
>
> From: Pan Li
>
> This patch adds the middle-end representation for
> unsigned saturating add, i.e. the result of the add is set to the
> maximum value on overflow. It matches a pattern similar to the one below.
>
> SAT_ADDU (x, y) => (x + y) |
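For readers unfamiliar with the idiom, a branchless unsigned saturating add
can be sketched in C as below. This is a generic illustration and not the
patch's code: the function name sat_addu32 and the 32-bit width are
assumptions, and the right-hand operand of the | in the truncated line above
is not reproduced from the patch.

```c
#include <stdint.h>

/* Unsigned saturating add: returns x + y, or UINT32_MAX if the sum
   wraps around.  */
uint32_t
sat_addu32 (uint32_t x, uint32_t y)
{
  uint32_t sum = x + y;           /* Wraps modulo 2^32 on overflow.  */
  uint32_t wrapped = sum < x;     /* 1 exactly when the add overflowed.  */
  /* -wrapped is all-ones on overflow and zero otherwise, so the OR
     either forces the result to the maximum or leaves sum unchanged.  */
  return sum | -wrapped;
}
```

The comparison sum < x works because an unsigned add overflows exactly when
the wrapped result is smaller than either operand.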
The following addresses the weak bitmap_hash function which results
in points-to analysis taking a long time because of a high collision
rate in one of its bitmap hash tables. Using a better hash function
like in the bitmap.cc hunk below doesn't help unless one also replaces
the hash function in
On Fri, 16 Feb 2024, Andrew Stubbs wrote:
> On 16/02/2024 10:17, Richard Biener wrote:
> > On Fri, 16 Feb 2024, Thomas Schwinge wrote:
> >
> >> Hi!
> >>
> >> On 2023-10-20T12:51:03+0100, Andrew Stubbs wrote:
> >>
The following addresses consistency check fails in copy_reference_ops_from_ref
when we are handling out-of-bound array accesses (it's almost impossible
to identically mimic the get_ref_base_and_extent behavior). It also
addresses the case where an out-of-bound constant offset computes to a
-1 off
On Fri, 16 Feb 2024, Thomas Schwinge wrote:
> Hi!
>
> On 2023-10-20T12:51:03+0100, Andrew Stubbs wrote:
> > I've committed this patch
>
> ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
> "amdgcn: add -march=gfx1030 EXPERIMENTAL", which the later RDNA3/gfx1100
> support builds on top
On Thu, Feb 15, 2024 at 7:38 PM Patrick Palka wrote:
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> OK for trunk?
Btw, there's the "bitpack" streaming support in data-streamer.h also
added for exactly the same reason, it's likely not easily re-usable
but this kind of
> On 15.02.2024 at 18:06, Richard Sandiford wrote:
>
> Richard Biener writes:
>>> On Wed, 14 Feb 2024, Richard Biener wrote:
>>>
>>> For the testcase in PR113910 we spend a lot of time in PTA comparing
>>> bitmaps for looking up equivalence cla
}
> }
> }
> --- gcc/testsuite/gcc.target/i386/pr113921.c.jj 2024-02-14
> 21:21:15.194178515 +0100
> +++ gcc/testsuite/gcc.target/i386/pr113921.c 2024-02-14 21:20:52.745476040
> +0100
> @@ -0,0 +1,20 @@
> +/* PR middle-end/113921 */
> +/* { dg-do run } */
>
The following fixes the failure to look at pattern
stmts when we need to dissolve SLP-only groups.
Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
PR tree-optimization/56
* tree-vect-loop.cc (vect_dissolve_slp_only_groups): Look
at the pattern
On Thu, 15 Feb 2024, Andrew Stubbs wrote:
> On 15/02/2024 10:21, Richard Biener wrote:
> [snip]
> >>> I suppose if RDNA really only has 32 lane vectors (it sounds like it,
> >>> even if it can "simulate" 64 lane ones?) then it might make sense to
On Thu, 15 Feb 2024, Andrew Stubbs wrote:
> On 15/02/2024 07:49, Richard Biener wrote:
> > On Wed, 14 Feb 2024, Andrew Stubbs wrote:
> >
> >> On 14/02/2024 13:43, Richard Biener wrote:
> >>> On Wed, 14 Feb 2024, Andrew Stubbs wrote:
> >>>
On Thu, 15 Feb 2024, Richard Sandiford wrote:
> Richard Biener writes:
> > On Wed, 14 Feb 2024, Richard Sandiford wrote:
> >
> >> Richard Biener writes:
> >> > On Wed, 14 Feb 2024, Richard Sandiford wrote:
> >> >
> >> >> Richa
On Thu, Feb 15, 2024 at 12:16 AM Andrew Pinski wrote:
>
> In some of the standard pattern names, it is not obvious which mode is being
> used in the pattern
> name. Is it operand 0, 1, or 2? Is it the wider mode or the narrower mode?
> This fixes that so there is no confusion by adding a
On Thu, Feb 15, 2024 at 12:16 AM Andrew Pinski wrote:
>
> Currently these use `@var{m3}` but the 3 here is a literal 3
> and not part of the mode itself so it should not be inside
> the var. Fixed as such.
>
> Built the documentation to make sure it looks correct now.
OK
> gcc/ChangeLog:
>
>
The following avoids recording BB dependences for debug stmt uses.
Bootstrap and regtest running on x86_64-unknown-linux-gnu.
It's unlikely a dependence is just because of debug stmts so
actual compare-debug issues are very unlikely. Still spotted
while investigating a CI regression mail (for
> @@ -0,0 +1,23 @@
> +/* PR tree-optimization/113567 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-options "-O2" } */
> +
> +#if __BITINT_MAXWIDTH__ >= 129
> +_BitInt(129) v;
> +
> +void
> +foo (_BitInt(129) a, int i)
> +{
> + __label__
On Wed, 14 Feb 2024, Richard Sandiford wrote:
> Richard Biener writes:
> > On Wed, 14 Feb 2024, Richard Sandiford wrote:
> >
> >> Richard Biener writes:
> >> > The following avoids accessing out-of-bound vector elements when
> >> > native
On Wed, 14 Feb 2024, Andrew Stubbs wrote:
> On 14/02/2024 13:43, Richard Biener wrote:
> > On Wed, 14 Feb 2024, Andrew Stubbs wrote:
> >
> >> On 14/02/2024 13:27, Richard Biener wrote:
> >>> On Wed, 14 Feb 2024, Andrew Stubbs wrote:
> >>>
> On 14.02.2024 at 16:22, Jakub Jelinek wrote:
>
> On Wed, Feb 14, 2024 at 04:13:51PM +0100, Richard Biener wrote:
>> The following removes the TBAA violation present in iterative_hash.
>> Since that code eventually gets built with LTO, it's important to fix. This also improves
>> co
> On 14.02.2024 at 16:16, Tamar Christina wrote:
>
>
>>
>>
>> I think this isn't entirely good. For simple cases for
>> do {} while the condition ends up in the latch while for while () {}
>> loops it ends up in the header. In your case the latch isn't empty
>> so it doesn't end up
The following tries to improve the actual hash function for hashing
bitmaps. We're still getting collision rates as high as 23 for the
testcase in the PR. The following improves this by properly mixing
in the bitmap element starting bit number. This brings down the
collision rate below 1.4,
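The effect of mixing in an element's position can be sketched in C as follows.
This is a hypothetical illustration, not GCC's bitmap_hash: the struct elt
layout, the function names, and the choice of the splitmix64 finalizer as the
mixer are all assumptions, and the element index stands in for the starting
bit number.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sparse-bitmap element: a block index plus its bit words.  */
struct elt
{
  unsigned indx;
  uint64_t bits[2];
};

/* splitmix64 finalizer, used here as a generic 64-bit mixer.  */
uint64_t
mix64 (uint64_t h)
{
  h ^= h >> 30;
  h *= 0xbf58476d1ce4e5b9ULL;
  h ^= h >> 27;
  h *= 0x94d049bb133111ebULL;
  h ^= h >> 31;
  return h;
}

/* Weak hash: XORs the bit words only, so identical words at different
   positions collide.  */
uint64_t
weak_hash (const struct elt *e, size_t n)
{
  uint64_t h = 0;
  for (size_t i = 0; i < n; i++)
    h ^= e[i].bits[0] ^ e[i].bits[1];
  return h;
}

/* Stronger hash: mixes in each element's index before its words.  */
uint64_t
mixed_hash (const struct elt *e, size_t n)
{
  uint64_t h = 0;
  for (size_t i = 0; i < n; i++)
    {
      h = mix64 (h ^ e[i].indx);
      h = mix64 (h ^ e[i].bits[0]);
      h = mix64 (h ^ e[i].bits[1]);
    }
  return h;
}
```

Two single-element bitmaps holding the same word at indices 0 and 1 hash
identically under weak_hash but differently under mixed_hash.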
The following removes the TBAA violation present in iterative_hash.
Since that code eventually gets built with LTO, it's important to fix. This also improves
code generation for the >= 12 bytes loop by using | to compose the
4 byte words as at least GCC 7 and up can recognize that pattern
and perform a 4 byte load
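As a sketch of the idiom being described, a 4-byte word can be composed from
individual bytes with shifts and |, which avoids reading the buffer through a
pointer of a different type. The function name load_le32 and the little-endian
byte order are assumptions made for illustration; this is not the
iterative_hash code itself.

```c
#include <stdint.h>

/* Compose a 32-bit word from four bytes, least significant byte first.
   Only unsigned char accesses are made, so no aliasing rule is
   violated regardless of how the buffer was originally declared.  */
uint32_t
load_le32 (const unsigned char *p)
{
  return (uint32_t) p[0]
         | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16)
         | ((uint32_t) p[3] << 24);
}
```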
On Wed, 14 Feb 2024, Richard Biener wrote:
> For the testcase in PR113910 we spend a lot of time in PTA comparing
> bitmaps for looking up equivalence class members. This points to
> the very weak bitmap_hash function which effectively hashes set
> and a subset of not set bits. T
On Wed, 14 Feb 2024, Andrew Stubbs wrote:
> On 14/02/2024 13:27, Richard Biener wrote:
> > On Wed, 14 Feb 2024, Andrew Stubbs wrote:
> >
> >> On 13/02/2024 08:26, Richard Biener wrote:
> >>> On Mon, 12 Feb 2024, Thomas Schwinge wrote:
> >>>
>
e7bc33654ffa027b493f23d278ac..a29681bffb902d2d05e3f18764ab519aacb3c5bc
> 100644
> --- a/gcc/tree-cfg.cc
> +++ b/gcc/tree-cfg.cc
> @@ -327,6 +327,10 @@ replace_loop_annotate (void)
>if (loop->latch)
> replace_loop_annotate_in_block (loop->latch, loop);
>
&g
On Wed, 14 Feb 2024, Andrew Stubbs wrote:
> On 13/02/2024 08:26, Richard Biener wrote:
> > On Mon, 12 Feb 2024, Thomas Schwinge wrote:
> >
> >> Hi!
> >>
> >> On 2023-10-20T12:51:03+0100, Andrew Stubbs wrote:
> >>
On Wed, 14 Feb 2024, Richard Sandiford wrote:
> Richard Biener writes:
> > The following avoids accessing out-of-bound vector elements when
> > native encoding a boolean vector with sub-BITS_PER_UNIT precision
> > elements. The error was basing the number o
On Wed, 14 Feb 2024, Richard Sandiford wrote:
> Richard Biener writes:
> > The following zeros the padding of vector bools when expanding compares
> > and the mode used for the compare is an integer mode. In that case
> > targets cannot distinguish between a 4 element
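The idea of clearing padding in an integer-mode mask can be sketched in C as
below. This is a generic illustration under assumed names (zero_mask_padding,
a 64-bit container), not the expander change itself: bits at or above the
element count are cleared so that only the low nelts bits carry meaning.

```c
#include <stdint.h>

/* Keep the low NELTS bits of MASK and clear everything above them.  */
uint64_t
zero_mask_padding (uint64_t mask, unsigned nelts)
{
  if (nelts >= 64)
    return mask;                  /* No padding bits to clear.  */
  return mask & ((UINT64_C (1) << nelts) - 1);
}
```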