[Bug tree-optimization/111882] [13 Regression] : internal compiler error: in get_expr_operand in ifcvt with Variable length arrays and bitfields inside a struct

2024-04-30 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111882

avieira at gcc dot gnu.org changed:

               What|Removed                     |Added

            Summary|[13/14/15 Regression] :     |[13 Regression] : internal
                   |internal compiler error: in |compiler error: in
                   |get_expr_operand in ifcvt   |get_expr_operand in ifcvt
                   |with Variable length arrays |with Variable length arrays
                   |and bitfields inside a      |and bitfields inside a
                   |struct                      |struct
      Known to work|                            |14.0

--- Comment #5 from avieira at gcc dot gnu.org ---
Fixed on gcc-14 (when it was trunk), so removing the 14 and 15 tags. Still needs a
backport to gcc-13.

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-29 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #18 from avieira at gcc dot gnu.org ---
Sorry, to be clear, the 'here' in the last sentence refers to supporting masks
as 0x to control the writing of the output register as the ISA allows,
rather than interpret 0x and 0x as the same mask.

I'll also see if I can propose a change to the ACLE specs to make this clearer.

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-29 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 CC||avieira at gcc dot gnu.org

--- Comment #17 from avieira at gcc dot gnu.org ---
Before anything, it might be worth redefining the testcase to something where
the predicate would have an effect on the result, for instance:

#include 
uint32x4_t test_9() {
  return vdupq_m_n_u32(vdupq_n_u32(0x), 0, 0x);
}

Next, it might be worth pointing out that the ISA does specify what happens
when a predicate mask does not have all bits set for a specific element.
Basically, the predicate mask operates on a per byte basis. Hence 16-bits in
the mask, controlling all 16-bytes in a vector register.

So for the above, the expected output would be {0x, 0x,
0x, 0x}.
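
The per-byte behaviour described above can be modelled in plain C. This is only a sketch under stated assumptions: model_vdupq_m_n_u32 is a hypothetical scalar helper (not the real intrinsic), and it assumes that mask bit 4*lane + byte controls byte 'byte' of 32-bit lane 'lane', with byte 0 being the least significant one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a merging, predicated 32-bit "dup":
   each of the 16 mask bits enables one byte of the 16-byte vector.
   Enabled bytes come from VALUE, disabled bytes from INACTIVE.  */
static void
model_vdupq_m_n_u32 (uint32_t out[4], const uint32_t inactive[4],
                     uint32_t value, uint16_t mask)
{
  assert (out != NULL && inactive != NULL);
  for (int lane = 0; lane < 4; lane++)
    {
      uint32_t result = inactive[lane];
      for (int byte = 0; byte < 4; byte++)
        if (mask & (1u << (4 * lane + byte)))
          {
            /* Replace just this byte of the lane.  */
            uint32_t byte_bits = (uint32_t) 0xff << (8 * byte);
            result = (result & ~byte_bits) | (value & byte_bits);
          }
      out[lane] = result;
    }
}
```

In this model a mask with only some of a lane's four bits set produces a lane that mixes bytes of the new value with bytes of the inactive vector, rather than being treated as fully true or fully false.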

Having said that I can see how you'd interpret the ACLE specs as defining such
a mask to be 'UB', but I believe the intent was to make clear that all bits
needed to be set if you wanted to true-predicate the full {32,16}-bit element.
This is the most common use; I can't imagine many users will be manipulating
the mask in such ways.

clang seems to follow this behavior generating an assembly sequence that leads
to the expected output, though they use vpsel probably due to some
canonicalization. And I'd prefer to be consistent with clang here.

[Bug target/112787] Codegen regression of large GCC vector extensions when enabling SVE

2024-03-26 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112787

--- Comment #13 from avieira at gcc dot gnu.org ---
They have both been backported, @Eric the tests should be passing again now.

[Bug target/112787] Codegen regression of large GCC vector extensions when enabling SVE

2024-03-19 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112787

--- Comment #12 from avieira at gcc dot gnu.org ---
Sorry, missed that comment, thanks! I'll test backporting both.

[Bug target/112787] Codegen regression of large GCC vector extensions when enabling SVE

2024-03-19 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112787

--- Comment #10 from avieira at gcc dot gnu.org ---
First of all, apologies for this! I don't know why I didn't test this on x86_64
too, I usually do for such backports.

Anyway I checked locally and backporting: 
r14-2821-gd1c072a1c3411a6fe29900750b38210af8451eeb seems to be enough for
gcc-12, I'm testing it on gcc-13 and running full regression tests on both
x86_64 and aarch64 and will get back to you.

@Andrew what made you think we also needed r14-2985-g04aa0edcace22a ? Not to
say we may not want to backport it, but just trying to figure out why it's
needed for this particular case.

[Bug ipa/113359] [13/14 Regression] LTO miscompilation of ceph on aarch64

2024-03-15 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 CC||avieira at gcc dot gnu.org

--- Comment #19 from avieira at gcc dot gnu.org ---
Should we update target and summary to also include x86_64?

[Bug tree-optimization/111478] [12 Regression] aarch64 SVE ICE: in compute_live_loop_exits, at tree-ssa-loop-manip.cc:250

2024-03-01 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111478

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 CC||avieira at gcc dot gnu.org

--- Comment #10 from avieira at gcc dot gnu.org ---
This has now been backported to gcc-13 and gcc-12, so I think we should close,
will leave that to Richard.

[Bug target/113229] [14 Regression] gcc.dg/torture/pr70083.c ICEs when compiled with -march=armv9-a+sve2

2024-01-05 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113229

--- Comment #6 from avieira at gcc dot gnu.org ---
Oh forgot to mention, this is triggering because of the div optimization in:
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=c69db3ef7f7d82a50f46038aa5457b7c8cc2d643

But I suspect that too is just an enabler and not the root cause? Unless we
aren't supposed to use subregs for sve modes...

[Bug target/113229] [14 Regression] gcc.dg/torture/pr70083.c ICEs when compiled with -march=armv9-a+sve2

2024-01-05 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113229

avieira at gcc dot gnu.org changed:

   What|Removed |Added

   Last reconfirmed||2024-01-05
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #4 from avieira at gcc dot gnu.org ---
So I can confirm this ICE and it was exposed rather than caused by my patch.

The problem arises because it seems we have never tried to simplify a:
(subreg: (subreg:<...> () N) M)

This makes simplify_subreg enter the 'if (GET_CODE (op) == SUBREG)' branch, which calls:
'paradoxical_subreg_p (VNx4SImode, OImode)'
Which seems to assume these are ordered with an assert.

I am not sure what the right fix is here. I did check, and changing
paradoxical_subreg_p to return false if the mode sizes are not ordered leads to
a bizarre failure: it looks like simplify_gen_subreg then just returns 0
rather than the original nested subregs.
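
To illustrate why the two mode sizes are not ordered, here is a minimal C sketch. It is not GCC's poly_int code: struct poly_size and the model_* helpers are hypothetical, and the sizes used in the note below (16 + 16x bytes for VNx4SImode, 32 bytes for OImode) are my assumption about those modes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a poly_int-style size: c0 + c1 * x bytes,
   where x >= 0 is only known at run time (c1 == 0 for fixed sizes).  */
struct poly_size { unsigned c0, c1; };

/* True if A <= B for every possible x >= 0.  */
static bool
model_known_le (struct poly_size a, struct poly_size b)
{
  return a.c0 <= b.c0 && a.c1 <= b.c1;
}

/* True if the two sizes compare the same way for every x.  */
static bool
model_ordered_p (struct poly_size a, struct poly_size b)
{
  return model_known_le (a, b) || model_known_le (b, a);
}

/* Same shape as the check described above: the answer is only
   meaningful when the sizes are ordered, hence the assert.  */
static bool
model_paradoxical_p (struct poly_size outer, struct poly_size inner)
{
  assert (model_ordered_p (outer, inner));
  return !model_known_le (outer, inner);
}
```

With a size of {16, 16} against {32, 0}, the first is smaller when x is 0 and larger when x is 2 or more, so model_ordered_p returns false and the assert in model_paradoxical_p would fire.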

Before I dig deeper I'll get richi and Richard S to comment.

[Bug target/113040] [14 Regression] libmvec test failures

2023-12-19 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113040

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 CC||avieira at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |avieira at gcc dot gnu.org

--- Comment #4 from avieira at gcc dot gnu.org ---
Yeah my bad. For cases where we don't expose the definition the new code
sequence doesn't add multiple vector parameters for cases where the vector
length of the single parameter is less than that of the simdclone's simdlen.

Testing a patch now.

[Bug tree-optimization/113026] Bogus -Wstringop-overflow warning on simple memcpy type loop

2023-12-15 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113026

--- Comment #4 from avieira at gcc dot gnu.org ---
Drive by comments as it's been a while since I looked at this. I'm also
surprised we didn't adjust the bounds. But why do we only subtract VF? Like you
say, if there's no loop around edge, can't we guarantee the epilogue will only
need to iterate at most VF-1?  This is assuming we didn't take an early exit,
if we do then we can't assume anything as the iterations 'reset'.
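
The VF-1 bound can be seen with a small C sketch. It is deliberately simplified: epilogue_iterations is a hypothetical helper, and it assumes no early exit and ignores any peeling for gaps or alignment.

```c
#include <assert.h>

/* Scalar iterations left for the epilogue after the main vector loop has
   consumed as many full groups of VF iterations as possible.  */
static unsigned
epilogue_iterations (unsigned niters, unsigned vf)
{
  assert (vf > 0);
  return niters % vf;
}
```

Since the result is a remainder modulo VF, it can never exceed VF - 1 under these assumptions.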

[Bug target/112787] Codegen regression of large GCC vector extensions when enabling SVE

2023-11-30 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112787

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |avieira at gcc dot gnu.org
   Last reconfirmed||2023-11-30
 Target||aarch64

--- Comment #1 from avieira at gcc dot gnu.org ---
The problem is that veclower tries to find the largest vector type it can use for a
particular element type, which, when SVE is enabled without a specified vector
length, will always be a VLA type.  This then fails the check that it has fewer
elements than the type being used to do the computation, given that a VLA
element count is never 'known_lt' a constant one.

I am currently testing a patch that makes sure the mode selected does not have
more elements than the type we are trying to compute, given that it wouldn't be
used anyway.
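
A rough C model of that check, for illustration only: struct poly_count, model_known_lt and candidate_usable are hypothetical names, not GCC's own, and element counts are modelled as c0 + c1 * x with x unknown at compile time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical element count of the form c0 + c1 * x, with x >= 0 unknown
   at compile time; fixed-length vectors have c1 == 0.  */
struct poly_count { unsigned c0, c1; };

/* True only if A < B for every x >= 0.  */
static bool
model_known_lt (struct poly_count a, struct poly_count b)
{
  return a.c0 < b.c0 && a.c1 <= b.c1;
}

/* The check described above: a candidate vector type is only used when it
   is known to have fewer elements than the fixed-length type WANTED.  */
static bool
candidate_usable (struct poly_count candidate, struct poly_count wanted)
{
  assert (wanted.c1 == 0);
  return model_known_lt (candidate, wanted);
}
```

For a 16-element fixed-length type, a 4-element fixed candidate {4, 0} passes, while a VLA candidate {4, 4} does not, because for large enough x it has more than 16 elements.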

[Bug target/112787] New: Codegen regression of large GCC vector extensions when enabling SVE

2023-11-30 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112787

Bug ID: 112787
   Summary: Codegen regression of large GCC vector extensions when
enabling SVE
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: avieira at gcc dot gnu.org
  Target Milestone: ---

When compiling:

typedef int __attribute__((__vector_size__ (64))) vec;

vec fn (vec a, vec b)
{
  return a + b;
}

with '-O2 -march=armv8-a' vs '-O2 -march=armv8-a+sve' the codegen defaults to
scalar rather than using Advanced SIMD vectors.

[Bug tree-optimization/112282] [14 Regression] wrong code (generated code hangs) at -O3 on x86_64-linux-gnu since r14-4777-g88c27070c25309

2023-10-31 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112282

--- Comment #11 from avieira at gcc dot gnu.org ---
So I had a look at that u_lsm.72_510 variable and it's only undefined if we
don't loop, but if we don't loop then u_lsm_flag is set to 0 and we don't use
u_lsm. So it's OK. I also checked and the early exits are covered by the same
mechanism.
So really the question is: why does irange think the range is [-21, 0]? Does
anyone have an idea of how to debug this?
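
For readers unfamiliar with the u_lsm / u_lsm_flag pairing, a hand-written C analogue of the pattern might look like the sketch below. It is an assumption-laden illustration, not the GIMPLE from the testcase: the function name, its parameters and the loop body are made up.

```c
#include <assert.h>
#include <stdbool.h>

static int u;  /* Global that the loop conceptually stores to.  */

/* Hypothetical hand-written version of the flag-guarded store motion
   described above: the temporary is only read when the flag says the
   loop body ran, so leaving it uninitialised is harmless.  */
static void
loop_with_store_motion (int start, int end)
{
  int u_lsm;                /* Deliberately left uninitialised.  */
  bool u_lsm_flag = false;

  for (int i = start; i < end; ++i)
    {
      u_lsm = i + 1;        /* Store kept in a temporary ...  */
      u_lsm_flag = true;    /* ... and recorded as having happened.  */
    }

  /* The flag can only be set if the loop body ran at least once.  */
  assert (u_lsm_flag == (start < end));
  if (u_lsm_flag)
    u = u_lsm;              /* Single store after the loop.  */
}
```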

[Bug tree-optimization/112282] [14 Regression] wrong code (generated code hangs) at -O3 on x86_64-linux-gnu since r14-4777-g88c27070c25309

2023-10-30 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112282

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 CC||avieira at gcc dot gnu.org

--- Comment #9 from avieira at gcc dot gnu.org ---
So I had a look at this and this is as far as I got.
It seems to get stuck in the 'for (u = -22; u < 2; ++u)' loop. It looks like
the loop IV never gets updated and it keeps looping.

Looking at the codegen it seems that cunroll decides to remove A LOT of code
and there is now:
bb 4:
..
# ivtmp_1055 = PHI 
..
bb 24:
...
ivtmp_1056 = ivtmp_1055 - 1;
  goto ; [100.00%]

I've not yet been able to figure out why this happens, the dumps weren't very
helpful. So I tried -fdisable-tree-cunroll, it was still failing. So I looked
at the dumps to try and see what was turning this loop into an infinite loop
and vrp2 shows me:
Global Exported: _19 = [irange] int [-21, 0]
Folding predicate _19 != 2 to 1

and in the dump before vrp2 we see:
  [local count: 7354175]:
  # u.13_485 = PHI <_19(105), -22(3)>
  # u_lsm.72_510 = PHI <_19(105), _497(D)(3)>
  # u_lsm_flag.73_235 = PHI <1(105), 0(3)>
...
   [local count: 6634488]:
  al ={v} {CLOBBER(eol)};
  _19 = u.13_485 + 1;
  if (_19 != 2)
goto ; [96.34%]
  else
goto ; [3.66%]

   [local count: 6391666]:
  goto ; [100.00%]

Something to point out here: that u_lsm.72_510 seems odd. It is used to set
global 'u', but it's initialized with _497(D), which is undefined... So that
itself seems wrong to me too... I'll try to find out what's causing that
codegen next. Maybe that can explain why the irange for _19 is so wrong here.

[Bug tree-optimization/111882] [13/14 Regression] : internal compiler error: in get_expr_operand in ifcvt with Variable length arrays and bitfields inside a struct

2023-10-20 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111882

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 CC||avieira at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |avieira at gcc dot gnu.org

--- Comment #3 from avieira at gcc dot gnu.org ---
Taking this, first time I see a SAVE_EXPR. It looks like it indicates
side-effects, I'm gonna see if I can detect the presence of side-effects and
reject lowering if so. Does that sound OK?

[Bug plugins/110610] [14 Regression] File insn-opinit.h not installed ?

2023-07-17 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110610

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #11 from avieira at gcc dot gnu.org ---
This should fix it. David please reopen if the problem still persists.

[Bug plugins/110610] File insn-opinit.h not installed ?

2023-07-10 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110610

avieira at gcc dot gnu.org changed:

   What|Removed |Added

   Last reconfirmed||2023-07-10
 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |avieira at gcc dot gnu.org
 Status|UNCONFIRMED |ASSIGNED

[Bug plugins/110610] File insn-opinit.h not installed ?

2023-07-10 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110610

--- Comment #8 from avieira at gcc dot gnu.org ---
I'll try adding to one of the header file lists in gcc's makefile. Probably the
INTERNAL_FN_H one.

[Bug plugins/110610] File insn-opinit.h not installed ?

2023-07-10 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110610

--- Comment #7 from avieira at gcc dot gnu.org ---
> I guess you mean insn-opinit.h, not internal-fn.h.  internal-fn.h is in the 
> GCC Git repo.

Yeah sorry! I did mean insn-opinit.h

> We are already installing insn-{addr,attr-common,attr,codes,...}.h anyway.

Fair!

[Bug plugins/110610] File insn-opinit.h not installed ?

2023-07-10 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110610

--- Comment #5 from avieira at gcc dot gnu.org ---
internal-fn.h is generated at gcc build-time. I'm not sure we want to 'install'
it with a gcc install. It might make more sense to trigger the generation of it
when building this gcc-plugin. But I'm not sure... I'll ask around the
community to see what people think.

[Bug plugins/110610] File insn-opinit.h not installed ?

2023-07-10 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110610

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 CC||avieira at gcc dot gnu.org

--- Comment #2 from avieira at gcc dot gnu.org ---
I can't reproduce this but it seems like the modula2 build also suffers from
the same issue, see PR110284.

David, what exactly are you trying to build? Can you give us the configure
command?

[Bug tree-optimization/110557] [13/14 Regression] Wrong code for x86_64-linux-gnu with -O3 -mavx2: vectorized loop mishandles signed bit-fields

2023-07-06 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110557

--- Comment #5 from avieira at gcc dot gnu.org ---
Hi Xi,

Feel free to test your patch and submit it to the list for review. I had a look
over and it looks correct to me.

I feel like it also addresses the cases where the bitfield is 'sandwiched'
like:
int x : 7;
ptrdiff_t y : 56;
long long z: 1;

since you left-shift it. It also addresses the case where you have both
sign-extension and have to widen it, because you still transform the type into
signed.
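
The shift-based idea can be sketched in scalar C as follows. This is not the patch itself: extract_signed_field is a hypothetical helper, the bit layout in the note below is assumed, and the code relies on the compiler doing an arithmetic right shift on signed values (as GCC and clang do).

```c
#include <assert.h>
#include <stdint.h>

/* Extract a signed bit-field of WIDTH bits starting at bit POS of a
   64-bit container.  Shifting left first discards any fields above it;
   the arithmetic right shift then discards the fields below it and
   sign-extends in one go.  */
static int64_t
extract_signed_field (uint64_t container, unsigned pos, unsigned width)
{
  assert (width >= 1 && pos + width <= 64);
  uint64_t shifted = container << (64 - pos - width);
  return (int64_t) shifted >> (64 - width);
}
```

Assuming x occupies bits 0-6, y bits 7-62 and z bit 63, the 'sandwiched' field y would be read with extract_signed_field (container, 7, 56).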

But it might be nice to add tests to cover those two, just in case someone
changes this.

In the future, if you do plan to work on something it would be nice to let
people know on the bugzilla ticket (preferably by assigning it to yourself) so
that multiple people don't end up working on the same thing, I had started to
write a patch, but wasn't as far as you and I like your approach :)

[Bug tree-optimization/110557] [13/14 Regression] Wrong code for x86_64-linux-gnu with -O3 -mavx2: vectorized loop mishandles signed bit-fields

2023-07-06 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110557

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 CC||avieira at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |avieira at gcc dot gnu.org

--- Comment #2 from avieira at gcc dot gnu.org ---
I'll have a look.

[Bug tree-optimization/110436] [14 Regression] ICE in vectorizable_live_operation, at tree-vect-loop.cc:10170

2023-06-27 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110436

--- Comment #4 from avieira at gcc dot gnu.org ---
Meant to say I'll look at it ;)

[Bug tree-optimization/110436] [14 Regression] ICE in vectorizable_live_operation, at tree-vect-loop.cc:10170

2023-06-27 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110436

avieira at gcc dot gnu.org changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |avieira at gcc dot gnu.org

--- Comment #3 from avieira at gcc dot gnu.org ---
I

[Bug tree-optimization/110310] vector epilogue handling is inefficient

2023-06-22 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110310

--- Comment #4 from avieira at gcc dot gnu.org ---
> OK, so I take away from this that you don't think this is done the way
> it is on purpose.

I don't think so, I think I just found a place where it was safe to do so, i.e.
where we knew the vectorization factor would not change after. 

I have a vague recollection that vect_analyze_loop used to be somewhat more
complex, but given the now clear separation between main loop and epilogue
vinfo selection we have now, we could probably do this as we analyze
loop_vinfos for epilogue?

Assuming that during analysis we have already determined vf, peeling and use of
masks, which I'm pretty sure we have.

Might be worth asking Richard Sandiford if he can think of anything that we
might not be 'fixing' during analysis.

[Bug tree-optimization/110310] vector epilogue handling is inefficient

2023-06-22 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110310

--- Comment #2 from avieira at gcc dot gnu.org ---
I can't remember the exact reason either, though I do vaguely remember niter
updating being something that we felt 'needed more future work' at the time.

Just a side question, AVX512 has predication right? So how come you are
expecting an epilogue?

I'm also curious about the condition on that snippet of code, 'known_eq (vf,
lowest_vf)' seems odd.. lowest_vf is by definition constant, so known_eq only
succeeds if vf is constant and the same as lowest_vf, but lowest_vf is the
constant lower bound of vf, i.e. that seems like a very convoluted way of doing
vf.is_constant (_vf)? Maybe this helper function wasn't around back
then. Either way, it feels like we shouldn't be doing this if loop_vinfo is
predicated? But I also agree that we probably want to be doing all of this
during analysis, seems odd to be ruling out loop_vinfo's during transformation.

[Bug middle-end/110142] [14 Regression] x264 from SPECCPU 2017 miscompares from g:2f482a07365d9f4a94a56edd13b7f01b8f78b5a0

2023-06-13 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110142

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from avieira at gcc dot gnu.org ---
I believe that fixes the issue.

[Bug middle-end/110142] [14 Regression] x264 from SPECCPU 2017 miscompares from g:2f482a07365d9f4a94a56edd13b7f01b8f78b5a0

2023-06-07 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110142

--- Comment #1 from avieira at gcc dot gnu.org ---
Found the issue to be with passing a subtype to vect_recog_widen_op_pattern in
vect_recog_widen_{plus,minus}_pattern where we didn't before. Removing those
and letting it default to a NULL pointer seems to fix the codegen issue.  Will
test patches locally and send in patch when done.

[Bug tree-optimization/109543] Avoid using BLKmode for unions with a non-BLKmode member when possible

2023-04-24 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109543

--- Comment #3 from avieira at gcc dot gnu.org ---
Err that should be 'double d[4];' so:
typedef struct 
{
float __attribute__ ((vector_size(16))) v[2];
} STRUCT;

#ifdef GOOD
typedef STRUCT TYPE;
#else
typedef union
{
STRUCT s;
double d[4];
} TYPE;
#endif

[Bug tree-optimization/109543] Avoid using BLKmode for unions with a non-BLKmode member when possible

2023-04-24 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109543

--- Comment #2 from avieira at gcc dot gnu.org ---
Sorry for the delay. Here's the typedefs with GNU vectors.  

typedef struct 
{
float __attribute__ ((vector_size(16))) v[2];
} STRUCT;

#ifdef GOOD
typedef STRUCT TYPE;
#else
typedef union
{
STRUCT s;
double d[2];
} TYPE;
#endif

To be fair I suspect you could see similar behaviour with just 16-byte vectors,
but with aarch64 the backend will know to use 64-bit scalar moves for 128-bit
BLKmodes, though even then, picking the vector mode would result in more
optimal (single vector move) code.

[Bug tree-optimization/109543] New: Avoid using BLKmode for unions with a non-BLKmode member when possible

2023-04-18 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109543

Bug ID: 109543
   Summary: Avoid using BLKmode for unions with a non-BLKmode
member when possible
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: avieira at gcc dot gnu.org
  Target Milestone: ---

Hi,

So with the following C-code:
$ cat t.c
#include 
#ifdef GOOD
typedef float64x2x2_t TYPE;
#else
typedef union
{
  float64x2x2_t v;
  double d[4];
} TYPE;
#endif


void foo (TYPE *a, TYPE *b, TYPE *c, unsigned  n)
{
 TYPE X = a[0];
 TYPE Y = b[0];
 TYPE Z = c[0];
 for (unsigned  i = 0; i < n; ++i)
 {
  TYPE temp = X;
  X = Y;
  Y = Z;
  Z = temp;
 }
 a[0] = X;
 b[0] = Y;
 c[0] = Z;
}

If compiled for aarch64 with -DGOOD the compiler will use vector register moves
in the loop, whereas without -DGOOD it will use the stack with memmoves.

The reason for this is that, when picking the mode to address a UNION with,
gcc will always choose BLKmode as soon as any member of the UNION is BLKmode. In
such cases I think it would be safe to go with the non-BLKmode of a member that has
the same size as the entire UNION?
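
As a rough illustration of the suggested rule (not of GCC's actual layout code), the selection could be sketched like this. The enum values, struct member and pick_union_mode are all hypothetical, and the machine modes are heavily simplified.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified machine modes.  */
enum mode { BLK_MODE, SCALAR_DOUBLE_MODE, VECTOR_TUPLE_MODE };

struct member { enum mode mode; size_t size; };

/* Sketch of the proposed rule: prefer the mode of a non-BLKmode member
   that spans the whole union, and fall back to BLKmode otherwise.  */
static enum mode
pick_union_mode (const struct member *members, size_t n, size_t union_size)
{
  assert (members != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (members[i].mode != BLK_MODE && members[i].size == union_size)
      return members[i].mode;
  return BLK_MODE;
}
```

For the union above, modelled as a 32-byte vector-tuple member plus a 32-byte BLKmode array, this sketch would pick the vector-tuple mode instead of BLKmode.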

[Bug tree-optimization/108888] [13 Regression] error: definition in block 26 follows the use

2023-04-03 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 CC||avieira at gcc dot gnu.org

--- Comment #6 from avieira at gcc dot gnu.org ---
After this patch, Andrew Stubbs' patch
(https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=3da77f217c8b2089ecba3eb201e727c3fcdcd19d)
to use in-branch simd-clones no longer works for cases like
gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c.

I believe this is because this patch changes the 'if (gimple_call ..)' into an
'else if (...is_gimple_call (stmt))', which doesn't work because stmt will be 0
(it's a dyn_cast of gassign).
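
The failure mode can be mimicked in a few lines of C. Everything here is hypothetical scaffolding (struct stmt, as_assign, is_call and both handles_call_* functions); the 'fixed' variant only shows one possible shape of a repair, not necessarily the patch that was tested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of a statement hierarchy.  */
enum stmt_kind { STMT_ASSIGN, STMT_CALL };
struct stmt { enum stmt_kind kind; };

/* Plays the role of a checked downcast to "assignment": returns NULL
   when S is not an assignment.  */
static const struct stmt *
as_assign (const struct stmt *s)
{
  assert (s != NULL);
  return s->kind == STMT_ASSIGN ? s : NULL;
}

static bool
is_call (const struct stmt *s)
{
  return s != NULL && s->kind == STMT_CALL;
}

/* Buggy shape: the else-branch tests the downcast result, which is
   always NULL there, so calls are never recognised.  */
static bool
handles_call_buggy (const struct stmt *orig)
{
  const struct stmt *stmt = as_assign (orig);
  if (stmt != NULL)
    return false;           /* Assignment path.  */
  else if (is_call (stmt))  /* stmt is NULL here.  */
    return true;
  return false;
}

/* One possible repair: test the original statement instead.  */
static bool
handles_call_fixed (const struct stmt *orig)
{
  if (as_assign (orig) != NULL)
    return false;
  return is_call (orig);
}
```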

I'm testing a patch locally to fix this.

[Bug target/98850] ICE in expand_debug_locations, at cfgexpand.c:5458

2023-03-23 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98850

--- Comment #2 from avieira at gcc dot gnu.org ---
I failed to reproduce it with a trunk build of arm-none-linux-gnueabihf.

[Bug tree-optimization/109154] [13 regression] jump threading de-optimizes nested floating point comparisons

2023-03-22 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154

--- Comment #5 from avieira at gcc dot gnu.org ---
I'm slightly confused here: on entry to BB 5 we know the opposite of _1 < 0.0,
no? If we branch to BB 5 we know !(_1 < 0.0), so we can't fold _1 <= 1.0; we
just know that the range of _1 is >= 0.0. Or am I misreading? I've not tried
compiling this myself, just going off the code both of you posted here.

[Bug tree-optimization/109230] [13 Regression] Maybe wrong code for opus package on aarch64 since r13-4122-g1bc7efa948f751

2023-03-21 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109230

--- Comment #9 from avieira at gcc dot gnu.org ---
Hmm I was seeing the change in opus_ifft but that does look like different
codegen :/ I might not be looking at the right thing.

That transformation looks definitely wrong though as the selection selects 3
values from the first vector (which is the result of the plus), and the fneg
would negate 2 values right?

[Bug tree-optimization/109230] [13 Regression] Maybe wrong code for opus package on aarch64 since r13-4122-g1bc7efa948f751

2023-03-21 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109230

--- Comment #6 from avieira at gcc dot gnu.org ---
Thanks!

My initial investigation has led me to think the change is being caused at
vrp2, which is the only time the pattern gets triggered with -O2. The tree
before the pass (at the place where the transformation happens):

  vect__83.466_787 = VEC_PERM_EXPR ;
  vect__87.467_786 = vect__81.462_791 * vect__83.466_787;
  vect__91.469_784 = vect__84.458_794 - vect__87.467_786;
  vect__88.468_785 = vect__84.458_794 + vect__87.467_786;
  _783 = VEC_PERM_EXPR ;
 ...
  vect__96.470_782 = vect__95.450_800 - _783;

after the pass:
  vect__83.466_787 = VEC_PERM_EXPR ;
  vect__87.467_786 = vect__83.466_787 * vect__81.462_791;
  vect__91.469_784 = vect__84.458_794 - vect__87.467_786;
  vect__88.468_785 = vect__87.467_786 + vect__84.458_794;
  _756 = VIEW_CONVERT_EXPR(vect__87.467_786);
  _755 = -_756;
  _739 = VIEW_CONVERT_EXPR(_755);
  _783 = _739 + vect__84.458_794;
...
  vect__96.470_782 = vect__95.450_800 - _783;

So before we had:
_783 = the first element of vect_88 and the second element of vect__91
these are respectively
vect__88 = vect__84 + vect__87
vect__91 = vect__84 - vect__87
so _783 = {vect__84[0] + vect__87[0], vect__84[1] - vect__87[1]}

after the pass
_783 = _739 + vect__84
This is where I don't know if I'm reading the optimization correctly, but it
says all 'even' lanes are negated. Does that mean we end up with:
_739 = { -vect__87[0] , vect__87[1]}
If so, then that's why we have a wrong result, as you want to negate lane 1, not
0.  Otherwise, if lane 1 is the one that gets negated, then it should be OK as
you'd get:
_783 = { vect__87[0] + vect__84[0], -vect__87[1] + vect__84[1] }
Now obviously that's assuming -a + b == b - a (not sure if that's true with
floating point errors etc).
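
The two readings can be compared with a two-lane scalar model. This is only a sketch: the function names and the two-lane simplification are assumptions, and it says nothing about which lane the real pattern negates.

```c
#include <assert.h>
#include <stddef.h>

/* Reference result: lane 0 takes the sum, lane 1 the difference.  */
static void
addsub_reference (const float v84[2], const float v87[2], float out[2])
{
  assert (v84 != NULL && v87 != NULL && out != NULL);
  out[0] = v84[0] + v87[0];
  out[1] = v84[1] - v87[1];
}

/* Rewritten form: negate one lane of v87, then do a plain vector add.
   NEGATED_LANE selects which lane gets the sign flip.  */
static void
addsub_via_negate (const float v84[2], const float v87[2], float out[2],
                   int negated_lane)
{
  assert (negated_lane == 0 || negated_lane == 1);
  for (int lane = 0; lane < 2; lane++)
    {
      float t = lane == negated_lane ? -v87[lane] : v87[lane];
      out[lane] = t + v84[lane];
    }
}
```

With v84 = {10, 20} and v87 = {1, 2}, negating lane 1 reproduces the reference {11, 18}, whereas negating lane 0 gives {9, 22}.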

[Bug tree-optimization/109230] [13 Regression] Maybe wrong code for opus package on aarch64 since r13-4122-g1bc7efa948f751

2023-03-21 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109230

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 CC||avieira at gcc dot gnu.org

--- Comment #3 from avieira at gcc dot gnu.org ---
Hi Martin, what options do you build these tests with?

[Bug tree-optimization/109005] [13 Regression] ICE during GIMPLE pass: ifcvt

2023-03-07 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109005

--- Comment #21 from avieira at gcc dot gnu.org ---
Something else that might be obvious, how do I create a minimal ifcvt_demo.adb
file that uses the .ads, so that I can add it as a testcase to gcc, as the
testsuite seems to pick up .adb files only.

[Bug tree-optimization/109005] [13 Regression] ICE during GIMPLE pass: ifcvt

2023-03-07 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109005

--- Comment #20 from avieira at gcc dot gnu.org ---
It's probably obvious to people that know Ada, so I just have to apologize for
my ignorance in that area :)

[Bug tree-optimization/109005] [13 Regression] ICE during GIMPLE pass: ifcvt

2023-03-07 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109005

--- Comment #15 from avieira at gcc dot gnu.org ---
@richi: Yeah and as I mentioned on IRC I can confirm it fixes the issue, I also
bootstrapped and regression tested the change on aarch64-unknown-linux-gnu.

Simon, I can't compile your minimal reproducer. First it complains about
missing the body keyword, so I added that, but then it complains about missing
an ifcvt_demo.ads; I tried adding an empty one but that didn't work.

[Bug tree-optimization/109005] [13 Regression] ICE during GIMPLE pass: ifcvt

2023-03-06 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109005

--- Comment #8 from avieira at gcc dot gnu.org ---
Oh nvm... you did.

[Bug tree-optimization/109005] [13 Regression] ICE during GIMPLE pass: ifcvt

2023-03-06 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109005

--- Comment #7 from avieira at gcc dot gnu.org ---
I'm still trying to build ADA to reproduce this.

Could you try 'p debug_tree (var)'

If var is an SSA_NAME, debug won't print anything. If it comes back as not 0,
could you also do p debug_tree (TREE_TYPE (var))?

Thank you! I'll keep trying to build ADA locally to see if I can debug this
too.

[Bug target/96342] [SVE] Add support for "omp declare simd"

2023-02-08 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342

--- Comment #10 from avieira at gcc dot gnu.org ---
yang I assume you are no longer working on this?

[Bug target/107987] [12 Regression] MVE vcmpq vector-scalar can trigger ICE

2023-01-27 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107987

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from avieira at gcc dot gnu.org ---
Fixed in GCC-13 and backported to GCC-12, closing.

[Bug target/108443] New: arm: MVE wrongly re-interprets predicate constants

2023-01-18 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108443

Bug ID: 108443
   Summary: arm: MVE wrongly re-interprets predicate constants
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: avieira at gcc dot gnu.org
  Target Milestone: ---

compiling:
$ cat t.c
#include <arm_mve.h>

uint32x4_t foo (uint32_t *a)
{
  mve_pred16_t p = 0x00cc;
  return vldrwq_z_u32 (a, p);
}

with:

$ arm-none-eabi-gcc -march=armv8.1-m.main+mve -mfloat-abi=hard -O2 -S 
will yield:
foo:
mov r3, #-4   @ movhi
vmsr p0, r3 @ movhi
vpst
vldrwt.32   q0, [r0]
bx  lr

That leads to a P0 mask of 0xFFFC and not 0x00CC as it should be.
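
To make the mismatch concrete, here is a small host-side sketch (plain C, not GCC or ACLE code). It assumes one predicate bit per byte of the 128-bit vector, so each 32-bit lane is governed by one nibble of the 16-bit predicate. The helper names truncate_to_pred16 and lane_enable are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Value that ends up in the 16-bit predicate when a 32-bit register is
   moved into it: only the low 16 bits are kept.  */
static uint16_t truncate_to_pred16 (int32_t reg)
{
  return (uint16_t) reg;
}

/* Byte-enable nibble for 32-bit lane LANE (0..3), assuming one predicate
   bit per byte of the vector.  */
static unsigned lane_enable (uint16_t pred, unsigned lane)
{
  assert (lane < 4);
  return (pred >> (4 * lane)) & 0xfu;
}
```

Under that assumption, the 'mov r3, #-4' above yields a predicate of 0xFFFC, which fully enables lanes 1 to 3, whereas the requested 0x00cc leaves lanes 2 and 3 fully disabled.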

[Bug target/108442] arm: MVE's vld1* and vst1* do not work when __ARM_MVE_PRESERVE_USER_NAMESPACE is defined

2023-01-18 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108442

--- Comment #1 from avieira at gcc dot gnu.org ---
This fails equally for any vld1* or vst1* intrinsic.

[Bug target/108442] New: arm: MVE's vld1* and vst1* do not work when __ARM_MVE_PRESERVE_USER_NAMESPACE is defined

2023-01-18 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108442

Bug ID: 108442
   Summary: arm: MVE's vld1* and vst1* do not work when
__ARM_MVE_PRESERVE_USER_NAMESPACE is defined
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: avieira at gcc dot gnu.org
  Target Milestone: ---

When compiling:
$ cat t.c
#include <arm_mve.h>

uint32x4_t foo (uint32_t *p)
{
return __arm_vld1q_u32 (p);
}

with:
$ arm-none-eabi-gcc -march=armv8.1-m.main+mve -mfloat-abi=hard
-D__ARM_MVE_PRESERVE_USER_NAMESPACE

it will fail to compile as __arm_vld1q_u32 is defined in arm_mve.h as calling
vldrwq_u32 which will not exist when __ARM_MVE_PRESERVE_USER_NAMESPACE is
defined.
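
The failure mode can be illustrated with a mock header layout. This is only a sketch: the mock_/arm_mock_ names and the MOCK_PRESERVE_USER_NAMESPACE macro are hypothetical stand-ins, not the contents of arm_mve.h. The point is that a helper which is always visible must only call names that are also always visible.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 4 x 32-bit vector; the real type comes from arm_mve.h.  */
typedef struct { uint32_t lane[4]; } mock_uint32x4_t;

/* Stand-in for defining __ARM_MVE_PRESERVE_USER_NAMESPACE.  */
#define MOCK_PRESERVE_USER_NAMESPACE 1

/* The prefixed name is always available.  */
static mock_uint32x4_t arm_mock_vldrwq_u32 (const uint32_t *p)
{
  mock_uint32x4_t v;
  assert (p != 0);
  for (int i = 0; i < 4; i++)
    v.lane[i] = p[i];
  return v;
}

#ifndef MOCK_PRESERVE_USER_NAMESPACE
/* Unprefixed alias, only visible when the user namespace is not preserved.  */
#define mock_vldrwq_u32(p) arm_mock_vldrwq_u32 (p)
#endif

/* Calling the prefixed helper keeps this definition valid whether or not
   the unprefixed alias exists.  Calling mock_vldrwq_u32 here instead would
   fail to compile while MOCK_PRESERVE_USER_NAMESPACE is defined.  */
static mock_uint32x4_t arm_mock_vld1q_u32 (const uint32_t *p)
{
  return arm_mock_vldrwq_u32 (p);
}
```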

[Bug target/108177] MVE predicated stores to same address get optimized away

2022-12-19 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108177

--- Comment #3 from avieira at gcc dot gnu.org ---
The architecture describes it as only writing the true-predicated bytes and
leaving the others untouched. I guess reading and writing to the same memory
is the best we can do to 'mimic' that in RTL. SVE does the same as x86, so I'll
try that approach over unspec_volatile.

[Bug target/108177] MVE predicated stores to same address get optimized away

2022-12-19 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108177

--- Comment #1 from avieira at gcc dot gnu.org ---
I noticed that SVE stores seem to be marked as volatile memory accesses. I
suspect it's because they are represented using masked stores, which are
probably volatile by definition (for this reason?).

A fix for this for now, before MVE starts using maskedstore patterns, would be
to use unspec_volatile for such stores.

[Bug target/108177] New: MVE predicated stores to same address get optimized away

2022-12-19 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108177

Bug ID: 108177
   Summary: MVE predicated stores to same address get optimized
away
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: avieira at gcc dot gnu.org
  Target Milestone: ---

GCC currently generates wrong code for predicated MVE stores to the same
address. For example:

#include <arm_mve.h>

uint8x16_t foo (uint8x16_t a, uint8_t *pa, mve_pred16_t p1, mve_pred16_t p2)
{
vstrbq_p_u8 (pa, a, p1);
vstrbq_p_u8 (pa, a, p2);
}

with 'gcc -mcpu=cortex-m55 -mfloat-abi=hard -O3' it will only generate the
second MVE store. Though if (p2 | p1) != p2 then the second store will not
fully overwrite the first.
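
The condition can be checked with a host-side model of a byte-predicated 16-byte store. This is a sketch only: model_store_p and first_store_is_observable are hypothetical names, and the model assumes one predicate bit per stored byte.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of a byte-predicated 16-byte store: byte i of SRC is written to
   DST only when bit i of PRED is set.  */
static void model_store_p (uint8_t *dst, const uint8_t *src, uint16_t pred)
{
  for (int i = 0; i < 16; i++)
    if (pred & (1u << i))
      dst[i] = src[i];
}

/* Returns 1 if dropping the first of two predicated stores to the same
   address changes the final memory contents, 0 otherwise.  */
static int first_store_is_observable (uint16_t p1, uint16_t p2)
{
  uint8_t src[16], both[16], second_only[16];
  memset (src, 0xAA, sizeof src);
  memset (both, 0, sizeof both);
  memset (second_only, 0, sizeof second_only);

  model_store_p (both, src, p1);
  model_store_p (both, src, p2);
  model_store_p (second_only, src, p2);

  int differs = memcmp (both, second_only, sizeof both) != 0;
  /* Matches the condition in the report: the first store matters exactly
     when it writes a byte that the second one does not.  */
  assert (differs == ((p1 | p2) != p2));
  return differs;
}
```

For instance, with p1 = 0x00ff and p2 = 0xff00 the first store writes the low eight bytes and the second the high eight, so removing the first store loses data.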

[Bug target/107987] New: [12/13 Regression] MVE vcmpq vector-scalar can trigger ICE

2022-12-06 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107987

Bug ID: 107987
   Summary: [12/13 Regression] MVE vcmpq vector-scalar can trigger
ICE
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: avieira at gcc dot gnu.org
  Target Milestone: ---

Using the following testcase
$ cat t.c
#include <arm_mve.h>

uint32x4_t foo (uint32x4_t a, uint32x4_t b)
{
  mve_pred16_t p = vcmpneq_n_u32 (vandq_u32 (a, b), 0);
  return vaddq_x_u32 (a, b, p);
}

and compiling with arm-none-eabi-gcc -mcpu=cortex-m55 -mfloat-abi=hard -O2 will
trigger an ICE in combine.

This was caused by g:d083fbf72d4533d2009c725524983e1184981e74: removing the
unspecs around the vcmps exposed the compiler to a comparison operator with a
vector and a scalar operand.

[Bug tree-optimization/107808] gcc.dg/vect/vect-bitfield-write-2.c etc.FAIL

2022-11-22 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107808

--- Comment #2 from avieira at gcc dot gnu.org ---
Hi Rainer,

I suspect this means SPARC should be added to the list of targets that fail
check_effective_target_vect_long_long. From the dump it looks like the target
doesn't support a long long vectype.

[Bug tree-optimization/107326] [13 Regression] ICE: verify_gimple failed (error: type mismatch in binary expression) since r13-3219-g25413fdb2ac249

2022-11-15 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107326

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #7 from avieira at gcc dot gnu.org ---
Closing this then.

[Bug libgcc/107678] New: [13 Regression] Segfault in aarch64_fallback_frame_state

2022-11-14 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107678

Bug ID: 107678
   Summary: [13 Regression] Segfault in
aarch64_fallback_frame_state
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgcc
  Assignee: unassigned at gcc dot gnu.org
  Reporter: avieira at gcc dot gnu.org
  Target Milestone: ---

Hi,

We ran into a segfault when running SPEC 2017 Parest for aarch64-none-linux-gnu
on a Neoverse V1 target after g:146e45914032

These are the relevant frames of the segfault:
#0  0x8bd2dd04 in aarch64_fallback_frame_state (context=0xe11f6e10,
fs=0xe11f71d0)
at ./md-unwind-support.h:74
#1  uw_frame_state_for (context=context@entry=0xe11f6e10,
fs=fs@entry=0xe11f71d0)
at .../libgcc/unwind-dw2.c:1275
#2  0x8bd2f0ec in _Unwind_RaiseException (exc=0x36b105d0)
at .../libgcc/unwind.inc:104
#3  0x8be8d6b4 in __cxxabiv1::__cxa_throw (obj=,
tinfo=0x56bf58 ,
dest=0x468c00 )
at .../libstdc++-v3/libsupc++/eh_throw.cc:93

We do not see the same failure for a NEON only run, so the size of the vectors
could be a hint? But I haven't confirmed this.

[Bug tree-optimization/107326] [13 Regression] ICE: verify_gimple failed (error: type mismatch in binary expression) since r13-3219-g25413fdb2ac249

2022-11-14 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107326

--- Comment #5 from avieira at gcc dot gnu.org ---
It looks that way on my end, but I'll let Arseny confirm.

[Bug tree-optimization/107346] [13 Regression] gnat.dg/loop_optimization23_pkg.ad failure afer r13-3413-ge10ca9544632db

2022-10-23 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107346

--- Comment #9 from avieira at gcc dot gnu.org ---
Hi Eric,

I realised the same, got a patch pending here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604139.html

[Bug tree-optimization/107346] [13 Regression] gnat.dg/loop_optimization23_pkg.ad failure afer r13-3413-ge10ca9544632db

2022-10-21 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107346

--- Comment #6 from avieira at gcc dot gnu.org ---
> There are no differences between gnat1 and cc1/cc1plus as far as dumps are 
> concerned, e.g. -fdump-tree-optimized creates the .optimized dump.

This was my bad, I'm not used to using cc1 directly, usually go through the
driver, so didn't realize it was putting the dumps in the same place as the
source file.

[Bug tree-optimization/107346] [13 Regression] gnat.dg/loop_optimization23_pkg.ad failure afer r13-3413-ge10ca9544632db

2022-10-21 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107346

--- Comment #4 from avieira at gcc dot gnu.org ---
Funnily enough, if I transform the Int24 into a 32-bit integer in the testcase
and disable all bitfield lowering just to make sure, I get the same failure. I
tried using __attribute__((packed)) in C to reproduce this, but I keep getting
a 32-bit offset... Either way, I will test a patch where
vect_check_gather_scatter bails out if pbitpos isn't a multiple of
BITS_PER_UNIT.

[Bug tree-optimization/107346] [13 Regression] gnat.dg/loop_optimization23_pkg.ad failure afer r13-3413-ge10ca9544632db

2022-10-21 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107346

--- Comment #3 from avieira at gcc dot gnu.org ---
I am wondering whether I should try to support this, or bail out of
vect_check_gather_scatter if pbitpos is not a multiple of BITS_PER_UNIT. The
latter obviously feels safer.
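
A minimal sketch of the "bail out" option, written as standalone C rather than GCC internals. The name model_bitpos_to_bytepos and the use of plain int64_t are assumptions for illustration; in GCC pbitpos is a poly_int64, so the real check would need the poly_int equivalent of the modulo test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_BITS_PER_UNIT 8   /* assumed 8-bit bytes */

/* Model of the proposed guard: succeed and set *PBYTEPOS only when the
   bit position is byte-aligned; otherwise report failure so the caller
   can give up on the gather/scatter analysis instead of hitting an
   assert in an exact division.  */
static bool model_bitpos_to_bytepos (int64_t pbitpos, int64_t *pbytepos)
{
  assert (pbytepos != 0);
  if (pbitpos % MODEL_BITS_PER_UNIT != 0)
    return false;
  *pbytepos = pbitpos / MODEL_BITS_PER_UNIT;
  return true;
}
```

With pbitpos equal to 4, as in the failing statement, the model returns false and leaves the output untouched.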

[Bug testsuite/107338] new test case gcc.dg/vect/vect-bitfield-read-7.c in r13-3413-ge10ca9544632db fails

2022-10-21 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107338

--- Comment #3 from avieira at gcc dot gnu.org ---
Hi Kewen,

I believe you are right. I was waiting for a powerpc machine in the board farm,
but I suspect I can reproduce this with an aarch64 BE target and I should be
able to confirm.

But your reasoning seems valid to me. Because of the widening the shift_n
becomes 32-shift_n-mask_width, but the start of the bitfield didn't move by
widening the container, so it is still 16 - shift_n - mask_width bits away from
the start of the container.

Moving the calculation before the widening seems like the neatest solution to
me, there's no point in keeping the old type around I think.

Do you want to produce a patch for this, seeing you solved it?
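
The arithmetic can be illustrated with a standalone model. It assumes big-endian bit numbering (bit position counted from the most significant end of a 16-bit container) and zero-extension to 32 bits; model_extract and its parameters are hypothetical and not taken from the vectorizer.

```c
#include <assert.h>
#include <stdint.h>

/* Extract a WIDTH-bit field that starts BITPOS bits from the most
   significant end of a 16-bit container, after the container has been
   zero-extended to 32 bits.  CONTAINER_BITS is the container size used
   to compute the right shift.  */
static uint32_t model_extract (uint16_t container, unsigned bitpos,
                               unsigned width, unsigned container_bits)
{
  uint32_t widened = container;   /* zero-extension to 32 bits */
  unsigned shift = container_bits - bitpos - width;
  assert (width < 32 && shift < 32);
  return (widened >> shift) & ((1u << width) - 1u);
}
```

For container 0x1234, bitpos 4 and width 4, computing the shift from the original 16-bit size gives 8 and extracts the nibble 0x2. Computing it from the widened 32-bit size gives 24 and extracts 0, because widening did not move the field relative to the low end of the value.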

[Bug tree-optimization/107346] gnat.dg/loop_optimization23_pkg.ad failure afer r13-3413-ge10ca9544632db

2022-10-21 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107346

avieira at gcc dot gnu.org changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |avieira at gcc dot gnu.org
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2022-10-21

--- Comment #1 from avieira at gcc dot gnu.org ---
I've tracked this down to 'vect_check_gather_scatter's pbytepos calculation:

poly_int64 pbytepos = exact_div (pbitpos, BITS_PER_UNIT);

Where pbitpos is 4 and that triggers an assert in exact_div. I am not sure what
the best fix would be here. The stmt this fails on is:
_ifc__23 = (*x_7(D))[_1].b.D.3707;

But I am having trouble debugging this as I can't seem to break on
vect_recog_bit_insert_pattern and I haven't figured out how to get gnat1 to
create dumps :(

[Bug tree-optimization/107346] New: gnat.dg/loop_optimization23_pkg.ad failure afer r13-3413-ge10ca9544632db

2022-10-21 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107346

Bug ID: 107346
   Summary: gnat.dg/loop_optimization23_pkg.ad failure afer
r13-3413-ge10ca9544632db
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: avieira at gcc dot gnu.org
  Target Milestone: ---

As reported by Eric in
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603356.html

[Bug tree-optimization/107326] [13 Regression] ICE: verify_gimple failed (error: type mismatch in binary expression) since r13-3219-g25413fdb2ac249

2022-10-20 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107326

--- Comment #2 from avieira at gcc dot gnu.org ---
Hi Arseny,

Apologies for this, I thought I had caught this with testing, but seems I had
not. I am testing a fix right now.

[Bug tree-optimization/107275] [13 Regression] Recent ifcvt changes resulting in references to SSA_NAME on free list

2022-10-17 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107275

--- Comment #3 from avieira at gcc dot gnu.org ---
The prodding helped! The problem is that dce was indeed removing the ASM as it
wasn't recognizing it as a stmt that was live. This is because ifcvt would have
normally bailed out when encountering such an asm stmt when doing
'find_data_references_in_loop'.

I have a patch that fixes this, will test it and post it upstream. My plan is
to bring forward the references check, as we do not need to lower bitfields if
that fails, given loop-vectorization will fail altogether anyway.

[Bug tree-optimization/107275] [13 Regression] Recent ifcvt changes resulting in references to SSA_NAME on free list

2022-10-17 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107275

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2022-10-17
 Status|UNCONFIRMED |NEW

--- Comment #2 from avieira at gcc dot gnu.org ---
ifcvt's dce seems to be removing the asm, which is rather odd...

Moving the 'struct device_link *link;' outside of the function, making it a
global, seems to give a different ICE too, related to vdefs. So I suspect my
vdef/vuse update is confusing things. I've never quite understood what and how
the vdef/vuse update is supposed to happen; update_stmt used to be my go-to
fix-all, but that doesn't seem to be helping. As a side note, I also noticed
that doing the gimple_move_vops after inserting seems to yield different
results as well...

Just to say I am nowhere yet, if anyone has an idea of what might be going
wrong I welcome the suggestion, in the meantime I'll continue prodding this.

[Bug tree-optimization/107275] [13 Regression] Recent ifcvt changes resulting in references to SSA_NAME on free list

2022-10-17 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107275

avieira at gcc dot gnu.org changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |avieira at gcc dot gnu.org

--- Comment #1 from avieira at gcc dot gnu.org ---
I'll have a look, thank you for the reduced testcase!

[Bug testsuite/107240] [13 Regression] FAIL: gcc.dg/vect/vect-bitfield-write-2.c since r13-3219-g25413fdb2ac249

2022-10-14 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107240

--- Comment #4 from avieira at gcc dot gnu.org ---
It might be worth posting the output of -fdump-tree-vect-all; it might be
failing to vectorize due to some specific missing feature that we can test for.

[Bug testsuite/107240] [13 Regression] FAIL: gcc.dg/vect/vect-bitfield-write-2.c since r13-3219-g25413fdb2ac249

2022-10-13 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107240

--- Comment #2 from avieira at gcc dot gnu.org ---
Hi Seurer, Peter,

Adding something like: { xfail { powerpc*-*-* && { ! powerpc_vsx_ok } } } }
should xfail all powerpc architectures that don't support this no?

[Bug tree-optimization/107226] [13 regression] r13-3219-g25413fdb2ac249 caused a lot of testcase failures

2022-10-12 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107226

avieira at gcc dot gnu.org changed:

   What|Removed |Added

   Last reconfirmed||2022-10-12
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #1 from avieira at gcc dot gnu.org ---
So this is a regression because SLP is using the new patterns for
BITFIELD_REFs of vectors. Seeing that I never actually found a good use for
supporting non-integral container types, I will just remove that, which will
cause the pattern to not match BITFIELD_REFs of vectors.

I'll go test those changes.

[Bug tree-optimization/107229] [13 Regression] ICE at -O1 and -Os with "-ftree-vectorize": verify_gimple failed since r13-3219-g25413fdb2ac24933

2022-10-12 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107229

--- Comment #2 from avieira at gcc dot gnu.org ---
So it seems I should have taken DECL_FIELD_OFFSET into account when computing
the bitpos in get_bitfield_rep (tree-if-conv.cc).

I am testing a patch for this whilst I also look at the failures in PR107226
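
As a rough model of the missing term (standalone C, not the tree-if-conv.cc code): GCC splits a field's position into a byte part (DECL_FIELD_OFFSET) and a bit part (DECL_FIELD_BIT_OFFSET), and both have to be combined before two fields can be compared. The function names below are hypothetical, and 8-bit bytes are assumed.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_BITS_PER_UNIT 8   /* assumed 8-bit bytes */

/* Bit position of a field from the start of its record: the byte offset
   scaled to bits plus the residual bit offset.  */
static uint64_t model_bit_position (uint64_t byte_offset, uint64_t bit_offset)
{
  return byte_offset * MODEL_BITS_PER_UNIT + bit_offset;
}

/* Position of a bit-field relative to its representative container.  */
static uint64_t model_bitpos_in_rep (uint64_t field_byte_off,
                                     uint64_t field_bit_off,
                                     uint64_t rep_byte_off,
                                     uint64_t rep_bit_off)
{
  uint64_t field = model_bit_position (field_byte_off, field_bit_off);
  uint64_t rep = model_bit_position (rep_byte_off, rep_bit_off);
  assert (field >= rep);
  return field - rep;
}
```

If the field and its representative have different byte offsets, say 20 and 16, comparing only the bit offsets would give 3 for a field at bit 3, while the combined calculation gives 35.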

[Bug tree-optimization/105219] [12 Regression] SVE: Wrong code with -O3 -msve-vector-bits=128 -mtune=thunderx

2022-04-27 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105219

--- Comment #18 from avieira at gcc dot gnu.org ---
(In reply to Richard Biener from comment #16)
> (In reply to rsand...@gcc.gnu.org from comment #15)
> > (In reply to Richard Biener from comment #14)
> > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > index d7bc34636bd..3b63ab7b669 100644
> > > --- a/gcc/tree-vect-loop.cc
> > > +++ b/gcc/tree-vect-loop.cc
> > > @@ -9977,7 +9981,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> > > lowest_vf) - 1
> > >: wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
> > >  lowest_vf) - 1);
> > > -  if (main_vinfo)
> > > +  if (main_vinfo && !main_vinfo->peeling_for_alignment)
> > > {
> > >   unsigned int bound;
> > >   poly_uint64 main_iters
> > It might be better to add the maximum peeling amount to main_iters.
> > Maybe you'd prefer this anyway for GCC 12 though.
> > 
> > I wonder if there's a similar problem for peeling for gaps,
> > in cases where the epilogue doesn't need the same peeling.
> 
> I don't quite understand the code in if (main_vinfo) but the point is
> that for our case main_iters is zero (and so is prologue_iters if that
> would exist).  I'm not sure how the code can be adjusted with that
> given it computes upper bounds and uses min() for the upper bound
> of the epilogue - we'd need to adjust that with a max (2*vf-2,
> old-upper-bound)
> when there's prologue peeling and the short cut exists (I don't actually
> compute that).
> 
> peeling for gaps means we run the epilogue for main VF more iterations,
> but that would just mean the vectorized epilogue executes one more time
> and has peeling for gaps applied as well, so the scalar epilogue runs
> for epilogue VF more iterations.
> 
> I'm not sure what conditions prevent epilogue vectorization but I think
> there were some at least.


I think disabling this for peeling makes sense for now, but let me explain how
the code works.

The perhaps misnamed 'main_iters' represents the maximum number of iterations
left to do after the main loop, whether it was entered or not. That maximum is
the largest of the following three:
 - the main loop's VF: if we enter the main loop there are at most VF-1
iterations left (I see I didn't add a -1 there);
 - LOOP_VINFO_COST_MODEL_THRESHOLD or LOOP_VINFO_VERSIONING_THRESHOLD: if
we don't enter the main loop because we don't have enough iterations to meet
these (but do still have enough for the epilogue).

Our problem is that this didn't take peeling into account: skipping the main
loop also means skipping the peeling, so the number of iterations we could be
left with after skipping main is actually main_iters + to-peel.

So I think the approach should be to add 'to_peel' to main_iters where
'to_peel' is either:
VF - 1 if PEELING_FOR_GAPS or PEELING_FOR_ALIGNMENT = -1
PEELING_FOR_ALIGNMENT otherwise.

But like I said first, disabling is probably the safest and easiest for gcc 12
and given the niche of this, I'm not even sure it's worth tightening it for gcc
13?
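
The proposed rule could be sketched as follows. This is a standalone model, not vectorizer code: model_to_peel and model_max_iters_before_epilogue are hypothetical names, and a peeling_for_alignment of -1 stands for an amount that is not known at compile time.

```c
#include <assert.h>
#include <stdbool.h>

/* Upper bound on the iterations the skipped prologue would have peeled,
   following the rule in the comment above.  */
static unsigned model_to_peel (unsigned vf, bool peeling_for_gaps,
                               int peeling_for_alignment)
{
  assert (vf >= 1 && peeling_for_alignment >= -1);
  if (peeling_for_gaps || peeling_for_alignment == -1)
    return vf - 1;
  return (unsigned) peeling_for_alignment;
}

/* Iterations that may be left for the epilogue when the main loop (and
   with it the peeling) is skipped.  */
static unsigned model_max_iters_before_epilogue (unsigned main_iters,
                                                 unsigned to_peel)
{
  return main_iters + to_peel;
}
```

In the case discussed in this bug, where main_iters is zero, the bound would then be driven entirely by the peeling amount.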

[Bug target/105157] [12 Regression] compile-time regressions with generic tuning since r12-7756-g27d8748df59fe6

2022-04-08 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105157

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #11 from avieira at gcc dot gnu.org ---
The commit above should have fixed the issue. Let me know if you still observe
the higher compile-time in your nightlies.

[Bug target/105157] [12 Regression] compile-time regressions with generic tuning since r12-7756-g27d8748df59fe6

2022-04-06 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105157

--- Comment #9 from avieira at gcc dot gnu.org ---
Found the issue. It's due to the way we encode TARGET_CPU_DEFAULT in aarch64:
it is only able to support 64 cores and we have 65 now.

Testing a workaround for now, and we have plans to fix this properly in GCC 13.
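
To illustrate how a 65th entry breaks such an encoding, here is a standalone sketch. It assumes the core index is packed into the low 6 bits of a value with other flags stored above it; the 6-bit width and the model_* names are assumptions made for illustration and are not taken from the aarch64 sources.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_CPU_BITS 6   /* assumption: 6 bits hold the core index */
#define MODEL_CPU_MASK ((1u << MODEL_CPU_BITS) - 1u)

/* Pack a core index and a set of flags into one value.  */
static uint64_t model_pack (unsigned cpu_index, uint64_t flags)
{
  return (uint64_t) cpu_index | (flags << MODEL_CPU_BITS);
}

/* Recover the core index from the low bits.  */
static unsigned model_unpack_cpu (uint64_t packed)
{
  return (unsigned) (packed & MODEL_CPU_MASK);
}

/* Recover the flags from the bits above the core index.  */
static uint64_t model_unpack_flags (uint64_t packed)
{
  return packed >> MODEL_CPU_BITS;
}
```

Indices 0 to 63 round-trip cleanly. Index 64 does not fit: it decodes as core 0 and its top bit leaks into the flags.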

[Bug rtl-optimization/104498] Alias attribute being ignored by scheduler

2022-02-21 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104498

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #10 from avieira at gcc dot gnu.org ---
Should be fixed with latest patch.

[Bug rtl-optimization/104498] Alias attribute being ignored by scheduler

2022-02-11 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104498

--- Comment #7 from avieira at gcc dot gnu.org ---
And I was thinking it didn't know how to handle anchor + offset...

Anyway if I just record the swap and use it to invert the distance calculation
that seems to 'work' for the testcase. I'm happy to go bootstrap it, or would
you rather fix this some other way?

[Bug rtl-optimization/104498] Alias attribute being ignored by scheduler

2022-02-11 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104498

--- Comment #5 from avieira at gcc dot gnu.org ---
You mean this: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92294

It only works for direct symbols; I think it never enters
the block under: if (GET_CODE (x) == SYMBOL_REF && GET_CODE (y) == SYMBOL_REF)

which is where he made his changes. I'll go try to understand his changes
better, I only had a quick look over them.

[Bug rtl-optimization/104498] Alias attribute being ignored by scheduler

2022-02-11 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104498

--- Comment #3 from avieira at gcc dot gnu.org ---
Sorry some confusion there, I thought it was base_alias_check bailing out
early, but that seems to return true, it is the memrefs_conflict_p that returns
0.

I suspect rtx_equal_for_memref_p should have returned 1 for:
x:
(plus:DI (mult:DI (reg:DI 99 [ off.0_1 ])
(const_int 4 [0x4]))
(const:DI (plus:DI (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
(const_int 16 [0x10]

and y:
(plus:DI (mult:DI (reg:DI 99 [ off.0_1 ])
(const_int 4 [0x4]))
(symbol_ref:DI ("b") [flags 0x2] ))

But it does not... must be because of that trailing (equivalence notes? that's
what I assume they are?)

[Bug rtl-optimization/104498] Alias attribute being ignored by scheduler

2022-02-11 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104498

--- Comment #1 from avieira at gcc dot gnu.org ---
Forgot to mention, this happens during the sched1 pass.

[Bug rtl-optimization/104498] New: Alias attribute being ignored by scheduler

2022-02-11 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104498

Bug ID: 104498
   Summary: Alias attribute being ignored by scheduler
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: avieira at gcc dot gnu.org
  Target Milestone: ---

Whilst working on a tuning structure I saw a correctness regression that I
believe is a result of the alias attribute not working properly.

You can reproduce it using an existing tuning for AArch64 using:
gcc -O2 src/gcc/gcc/testsuite/gcc.c-torture/execute/alias-2.c -S
-mtune=cortex-a34

This will lead to the 'a[off] = 2' store being moved after the b load in
'b[off] != 2'.

In RTL:

(insn 23 18 19 2 (set (reg:SI 110 [ b[off.0_1] ])
(mem:SI (plus:DI (mult:DI (reg:DI 99 [ off.0_1 ])
(const_int 4 [0x4]))
(reg/f:DI 97)) [1 b[off.0_1]+0 S4 A32]))
"gcc/gcc/testsuite/gcc.c-torture/execute/alias-2.c":10:6 52 {*movsi_aarch64}
 (expr_list:REG_DEAD (reg:DI 99 [ off.0_1 ])
(expr_list:REG_DEAD (reg/f:DI 97)
(nil
(insn 19 23 24 2 (set (mem:SI (plus:DI (mult:DI (reg:DI 99 [ off.0_1 ])
(const_int 4 [0x4]))
(reg/f:DI 104)) [1 a[off.0_1]+0 S4 A32])
(reg:SI 106)) "gcc/gcc/testsuite/gcc.c-torture/execute/alias-2.c":9:9
52 {*movsi_aarch64}
 (expr_list:REG_DEAD (reg:SI 106)
(expr_list:REG_DEAD (reg/f:DI 104)
(nil

After some debugging I found that true_dependence returns false for these two
memory accesses because base_alias_check sees they have different base objects
('a' and 'b') and deduces they can't alias based on that, without realising 'b'
isn't an actual base object but an alias to 'a'. I think we should make it so
that at expand pointers to 'b' get 'a' as a base object.
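
For readers without the testsuite at hand, the pattern at stake looks roughly like this. It is a sketch in the spirit of the testcase, not a verbatim copy of alias-2.c; the model_* names are hypothetical, and the GCC alias attribute needs a target with alias support (for example ELF).

```c
#include <assert.h>

/* model_b is a second name for the storage of model_a.  */
int model_a[10] = { 0 };
extern int model_b[10] __attribute__ ((alias ("model_a")));
int model_off;

/* Store through one name, then the other, then read back through the
   first.  Both names refer to the same memory, so a correct compiler
   must keep the stores and the load in order and return 2.  */
int model_store_then_load (void)
{
  assert (model_off >= 0 && model_off < 10);
  model_b[model_off] = 1;
  model_a[model_off] = 2;
  return model_b[model_off];
}
```

If the scheduler treats the two names as distinct base objects, it may move the load ahead of the second store, which is the reordering described above.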

[Bug regression/103997] [12 Regression] gcc.target/i386/pr88531-??.c scan-assembler-times FAILs

2022-01-25 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103997

--- Comment #12 from avieira at gcc dot gnu.org ---
Right and did you happen to see a perf increase on these benchmarks with any of
the patches I mentioned the hash of in the previous comment?

Just to explain a bit further what I think is going on. Before my initial
patches the epilogue loop analysis would start at the mode_i + 1 of the first
loop, in other words, the next mode in the list of modes.

After the patch (1) we started this from mode_i = 1, so the first mode after
VOIDmode, this caused some ICEs if the target didn't add any, not sure about
your targets, but that was fixed in (2).

In patch (3) Kewen added a fix to my check for potential use of partial
vectors, to check the param_vect_partial_vector_usage since that can disable
partial vector even if the target supports them.

So I suspect that one of these 3 patches inadvertently changed the
vectorization strategy for the epilogue of some loop(s) in these benchmarks. So
when I committed patch (4) f4ca0a53be18dfc7162fd5dcc1e73c4203805e14, the
vectorization strategy went back to what it was previously. If this is indeed
what happened then the regression you are seeing is just an indication that the
original vectorization strategy was sub-optimal. This is something that should
be looked at separately, as an optimization, probably by
improving the cost modelling of the vectorizer for your target.


Patch 1)
commit d3ff7420e941931d32ce2e332e7968fe67ba20af
Author: Andre Vieira 
Date:   Thu Dec 2 14:34:15 2021 +

[vect] Re-analyze all modes for epilogues

Patch 2)
commit 016bd7523131b645bca5b5530c81ab5149922743
Author: Andre Vieira 
Date:   Tue Jan 11 15:52:59 2022 +

[vect] PR103971, PR103977: Fix epilogue mode selection for autodetect only

Patch 3)
commit 6d51a9c6447bace21f860e70aed13c6cd90971bd
Author: Kewen Lin 
Date:   Fri Jan 14 07:02:10 2022 -0600

vect: Check partial vector param for supports_partial_vectors [PR104015]

Patch 4)
commit f4ca0a53be18dfc7162fd5dcc1e73c4203805e14
Author: Andre Vieira 
Date:   Wed Jan 19 14:11:32 2022 +

vect: Fix epilogue mode skipping

[Bug regression/103997] [12 Regression] gcc.target/i386/pr88531-??.c scan-assembler-times FAILs

2022-01-24 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103997

--- Comment #10 from avieira at gcc dot gnu.org ---
Hi Levy,

I did a quick experiment, compiled exchange2_r with trunk and with trunk + all
my epilogue and unroll vector patches reverted, with '-march=alderlake -Ofast
-flto -funroll-loops' and the codegen is pretty much the same.

Could it be that picking a different mode than we did before all of my patches,
was a better choice? If this is the case then this is something that should be
fixed by an appropriate cost-model, picking the best mode for the specific
loop's epilogue.

The patches I reverted were:
f4ca0a53be18dfc7162fd5dcc1e73c4203805e14
7ca1582ca60dc84cc3fc46b9cda620e2a0bed1bb
016bd7523131b645bca5b5530c81ab5149922743
d3ff7420e941931d32ce2e332e7968fe67ba20af

What were you using as a baseline for that last regression?

[Bug target/104015] [12 regression] gcc.dg/vect/slp-perm-9.c fails on power 9 (only)

2022-01-14 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104015

--- Comment #7 from avieira at gcc dot gnu.org ---
Yeah I'm with Richard on this one, I just checked and the generated assembly is
the same for before and after my patch, so this looks like a testism.


And yeah I agree, if we were to decide to unroll this for instance then you'd
likely see it being printed more too, since you would likely end up with the
epilogue using the same mode.

I'll suggest changing it to just testing the existence of that string, rather
than requiring it N times.

Having said that, the fail will go away for this particular case with the param
change.

[Bug target/104015] [12 regression] gcc.dg/vect/slp-perm-9.c fails on power 9 (only)

2022-01-14 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104015

--- Comment #5 from avieira at gcc dot gnu.org ---
Thanks Kewen, that seems worrying, I'll have a look.

[Bug target/104015] [12 regression] gcc.dg/vect/slp-perm-9.c fails on power 9 (only)

2022-01-14 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104015

--- Comment #3 from avieira at gcc dot gnu.org ---
Hi Kewen,

Thanks for the analysis. The param_vect_partial_vector_usage suggestion seems
valid, but that shouldn't be the root cause. 

 I would expect an unpredicated V8HI epilogue to fail for a V8HI main loop
(unless the main loop was unrolled).

That is what the following code in vect_analyze_loop_2 is responsible for:
  /* If we're vectorizing an epilogue loop, the vectorized loop either needs
     to be able to handle fewer than VF scalars, or needs to have a lower VF
     than the main loop.  */
  if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
      && !LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
      && maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
                   LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)))
    return opt_result::failure_at (vect_location,
                                   "Vectorization factor too high for"
                                   " epilogue loop.\n");

So PR103997 is looking at fixing the skipping, because we skip too much now.
You seem to be describing a case where it doesn't skip enough, but like I said
that should be dealt with by the code above, so I have a feeling there may be
some confusion here.

I have a patch for the earlier bug at
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588330.html 
This is still under review whilst we work out a better way of dealing with the
issue. Could you maybe check whether that fixes your failures? I'll start a
cross build for powerpc in the meantime to see if I can check out these tests. 

As for why I don't use LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P on the first loop
vinfo to skip epilogue modes, that's because it is possible to have a
non-predicated main loop with a predicated epilogue. The test I added for
aarch64 with that patch is a motivating case.

On another note, unfortunately LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P only
'forces' the use of partial vectors; AFAIU it doesn't tell us whether that is
possible or not. That is why I introduced the new function, which really only
checks whether the target is at all capable of partial vector generation: if we
know it's not possible at all, we can skip more modes and avoid unnecessary
analysis.

[Bug regression/103997] [12 Regression] gcc.target/i386/pr88531-??.c scan-assembler-times FAILs

2022-01-13 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103997

--- Comment #7 from avieira at gcc dot gnu.org ---
Hmm, thinking out loud here. As vector sizes (or ISAs) change, vectorization
strategies could indeed change. The best example I can think of is rounding,
where you might need to do operations in higher precision, and some targets
could potentially support instructions that widen, round and narrow again in a
single instruction at some size + ISA combinations and not at others, which
means some would have a 'higher' element size mode in there where others
don't. But that assumes the vectorizer would represent such 'widen + round +
narrow' instructions in a single pattern, hiding the 'higher precision'
elements, and as far as I know no such patterns exist right now.

There may be other cases I can't think of ofc. We could always be even more
conservative and only skip if the highest possible element size for the current
vector size + ISA would lead to a mode with NUNITS greater than or equal to
that of the current vector mode. Or ... just never skip a mode; I don't have a
good feeling for how much that would cost compile-time wise though.

[Bug regression/103997] [12 Regression] gcc.target/i386/pr88531-??.c scan-assembler-times FAILs

2022-01-13 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103997

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 CC||avieira at gcc dot gnu.org

--- Comment #5 from avieira at gcc dot gnu.org ---
Yeah I made a mistake there using the vector_mode like that, since that vector
mode really only determines vector size (and vector ISA for aarch64).

I am almost finished testing a patch that instead goes through the
'used_vector_modes' to find the largest element for all used vector modes, then
uses related_vector_mode to get the vector mode for that element with the same
size as the current vector_mode[mode_i]. That would give us the lowest possible
VF for that loop and vector size.
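
A rough sketch of that idea, outside of GCC, could look like the following.
VecMode, largest_element_bytes and lowest_possible_vf are hypothetical names,
and the related_vector_mode lookup is modelled as a plain division of the
vector size by the element size, which assumes the target has a mode for
every such combination.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model of a vector mode: total size and element size in bytes.
struct VecMode {
  unsigned vector_bytes;
  unsigned element_bytes;
};

// Largest element size among the vector modes the loop analysis used.
unsigned largest_element_bytes(const std::vector<VecMode> &used_modes) {
  assert(!used_modes.empty());
  unsigned largest = 0;
  for (const VecMode &mode : used_modes)
    largest = std::max(largest, mode.element_bytes);
  return largest;
}

// Number of lanes a vector of candidate_vector_bytes has for the largest
// element used by the loop. This stands in for the related_vector_mode
// lookup and is the lowest VF the loop could have at that vector size.
unsigned lowest_possible_vf(const std::vector<VecMode> &used_modes,
                            unsigned candidate_vector_bytes) {
  unsigned element = largest_element_bytes(used_modes);
  assert(element > 0 && candidate_vector_bytes % element == 0);
  return candidate_vector_bytes / element;
}
```

For a loop that used 16-byte vectors of 1-byte and 4-byte elements, this gives
a lowest possible VF of 4 for a 16-byte candidate and 2 for an 8-byte one.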

Should be posting the fix soon.

[Bug tree-optimization/103977] [12 Regression] ice in try_vectorize_loop_1 since r12-6420-gd3ff7420e941931d32ce2e332e7968fe67ba20af

2022-01-12 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103977

--- Comment #8 from avieira at gcc dot gnu.org ---
The patch Jeff mentioned is this:
[vect] PR103971, PR103977: Fix epilogue mode selection for autodetect only

gcc/ChangeLog:

* tree-vect-loop.c (vect-analyze-loop): Handle scenario where target
does not add autovectorize_vector_modes.

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=016bd7523131b645bca5b5530c81ab5149922743

Should be OK to close this now?

[Bug tree-optimization/103971] [12 regression] build fails after r12-6420, ICE at libgfortran/generated/matmul_i1.c:2450

2022-01-12 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103971

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #2 from avieira at gcc dot gnu.org ---
Have been told powerpc is working again after:
[vect] PR103971, PR103977: Fix epilogue mode selection for autodetect only

gcc/ChangeLog:

* tree-vect-loop.c (vect-analyze-loop): Handle scenario where target
does not add autovectorize_vector_modes.

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=016bd7523131b645bca5b5530c81ab5149922743



Closing this PR.

[Bug tree-optimization/103977] [12 Regression] ice in try_vectorize_loop_1 since r12-6420-gd3ff7420e941931d32ce2e332e7968fe67ba20af

2022-01-12 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103977

--- Comment #7 from avieira at gcc dot gnu.org ---
Thanks for confirming that Jeff :)

[Bug tree-optimization/103971] [12 regression] build fails after r12-6420, ICE at libgfortran/generated/matmul_i1.c:2450

2022-01-12 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103971

--- Comment #1 from avieira at gcc dot gnu.org ---
seurer could you check whether
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588237.html fixes this?
I don't have easy access to a powerpc target for bootstrap.

[Bug tree-optimization/103977] [12 Regression] ice in try_vectorize_loop_1 since r12-6420-gd3ff7420e941931d32ce2e332e7968fe67ba20af

2022-01-12 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103977

--- Comment #5 from avieira at gcc dot gnu.org ---
Posted a fix on ML:
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588237.html

Sorry for the breakage, wrong assumption on my part :(

[Bug tree-optimization/100981] ICE in info_for_reduction, at tree-vect-loop.c:4897

2021-06-09 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100981

--- Comment #6 from avieira at gcc dot gnu.org ---
FYI Tamar asked me to make sure the instructions were being generated. I
checked and they were, but they are not being used because it decides to inline
MAIN__, and inlining seems to break the complex optimization (as in, it is not
applied: a missed opportunity).

So for this specific test I'd use -fno-inline, it executes the fcmla
instructions that way and it runs fine.

[Bug tree-optimization/100981] ICE in info_for_reduction, at tree-vect-loop.c:4897

2021-06-09 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100981

--- Comment #5 from avieira at gcc dot gnu.org ---
Yeah that works. Ran it as is, no abort, ran it with s/ne/eq/ and it aborts.

[Bug rtl-optimization/98791] [10 Regression] ICE in paradoxical_subreg_p (in ira) with SVE

2021-03-15 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98791

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED

--- Comment #10 from avieira at gcc dot gnu.org ---
Closing now as backport is done.

[Bug rtl-optimization/98791] [10 Regression] ICE in paradoxical_subreg_p (in ira) with SVE

2021-03-08 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98791

--- Comment #8 from avieira at gcc dot gnu.org ---
Aye, my bad there, thanks for the change.
