from:"stammark at gcc dot gnu.org"

[Bug c/112702] New: C23, C++23: Extended characters not valid in an identifier with -pedantic

2023-11-24 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112702

Bug ID: 112702
   Summary: C23, C++23: Extended characters not valid in an
identifier with -pedantic
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: stammark at gcc dot gnu.org
  Target Milestone: ---

Hi all,

This is likely a symptom of the WIP-ness of C23 and C++23 support in the
frontends, but see here:

https://godbolt.org/z/78KeK1fnG

The use of extended characters in identifiers with -pedantic stopped working:

* For C, in GCC13
* For C++ in GCC12

Removing -pedantic makes the compilation succeed.

Is this expected behaviour with -pedantic or a bug?

Thanks,

[Bug target/112337] arm: ICE in arm_effective_regno when compiling for MVE

2023-11-02 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112337

--- Comment #4 from Stam Markianos-Wright  ---
Bisected to f55cdce3f8dd8503e080e35be59c5f5390f6d95e

Attached preprocessed source and a creduced-reproducer of it

[Bug target/112337] arm: ICE in arm_effective_regno when compiling for MVE

2023-11-02 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112337

--- Comment #3 from Stam Markianos-Wright  ---
Created attachment 56493
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56493&action=edit
Full preprocessor reproducer

[Bug target/112337] arm: ICE in arm_effective_regno when compiling for MVE

2023-11-02 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112337

--- Comment #2 from Stam Markianos-Wright  ---
Created attachment 56492
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56492&action=edit
creduced reproducer

[Bug target/112337] New: arm: ICE in arm_effective_regno

2023-11-01 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112337

Bug ID: 112337
   Summary: arm: ICE in arm_effective_regno
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: stammark at gcc dot gnu.org
  Target Milestone: ---

Hi all,

I found this ICE when compiling CMSIS-NN with latest trunk:

./build-arm-eabi-armv8.1-m.main+mve.fp+fp.dp/install/bin/arm-eabi-gcc
-mcpu=cortex-m55
~/gnu/CMSIS-NN/Source/NNSupportFunctions/arm_nn_depthwise_conv_nt_t_padded_s8.c
-I ~/gnu/CMSIS-NN/Include/ -O3 -S
during RTL pass: ira
/home/stamar01/gnu/CMSIS-NN/Source/NNSupportFunctions/arm_nn_depthwise_conv_nt_t_padded_s8.c:
In function 'arm_nn_depthwise_conv_nt_t_padded_s8':
/home/stamar01/gnu/CMSIS-NN/Source/NNSupportFunctions/arm_nn_depthwise_conv_nt_t_padded_s8.c:172:1:
internal compiler error: in arm_effective_regno, at config/arm/arm.cc:13671
  172 | }
  | ^
0x1b590f2 arm_effective_regno
/home/stamar01/gnu/v8.X-M/src/gcc/gcc/config/arm/arm.cc:13671
0x1b5923a mve_vector_mem_operand(machine_mode, rtx_def*, bool)
/home/stamar01/gnu/v8.X-M/src/gcc/gcc/config/arm/arm.cc:13701
0x23015c2 mve_memory_operand(rtx_def*, machine_mode)
/home/stamar01/gnu/v8.X-M/src/gcc/gcc/config/arm/predicates.md:39
0x23d79fa recog_235
/home/stamar01/gnu/v8.X-M/src/gcc/gcc/config/arm/mve.md:3636
0x241db9c recog_287
/home/stamar01/gnu/v8.X-M/src/gcc/gcc/config/arm/neon.md:6161
0x24540af recog_344
/home/stamar01/gnu/v8.X-M/src/gcc/gcc/config/arm/mve.md:6390
0x2459355 recog(rtx_def*, rtx_insn*, int*)
/home/stamar01/gnu/v8.X-M/src/gcc/gcc/config/arm/sync.md:462
0x15663f5 insn_invalid_p(rtx_insn*, bool)
/home/stamar01/gnu/v8.X-M/src/gcc/gcc/recog.cc:358
0x15667ad verify_changes(int)
/home/stamar01/gnu/v8.X-M/src/gcc/gcc/recog.cc:469
0x1350f89 equiv_can_be_consumed_p
/home/stamar01/gnu/v8.X-M/src/gcc/gcc/ira-costs.cc:1767
0x13518f0 calculate_equiv_gains
/home/stamar01/gnu/v8.X-M/src/gcc/gcc/ira-costs.cc:1887
0x1351fbe find_costs_and_classes
/home/stamar01/gnu/v8.X-M/src/gcc/gcc/ira-costs.cc:2007
0x135404c ira_costs()
/home/stamar01/gnu/v8.X-M/src/gcc/gcc/ira-costs.cc:2564
0x1347e82 ira_build()
/home/stamar01/gnu/v8.X-M/src/gcc/gcc/ira-build.cc:3481
0x133d895 ira
/home/stamar01/gnu/v8.X-M/src/gcc/gcc/ira.cc:5793
0x133e215 execute
/home/stamar01/gnu/v8.X-M/src/gcc/gcc/ira.cc:6117
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

Opening this up in GDB I see that:

#1  0x01b590f3 in arm_effective_regno (op=0x76e130a8, strict=false)
at /home/stamar01/gnu/v8.X-M/src/gcc/gcc/config/arm/arm.cc:13671
13671 gcc_assert (REG_P (op));
(gdb) p debug_rtx (op)
(mem/f/c:SI (plus:SI (reg/f:SI 103 afp)
(const_int 28 [0x1c])) [2 output_bias+0 S4 A32])


And slightly further up:

#3  0x023015c3 in mve_memory_operand (op=0x76e24b70,
mode=E_V4SImode) at
/home/stamar01/gnu/v8.X-M/src/gcc/gcc/config/arm/predicates.md:39
39  && mve_vector_mem_operand (GET_MODE (op), XEXP (op,
0),
(gdb) p debug_rtx (op)
(mem:V4SI (post_inc:SI (mem/f/c:SI (plus:SI (reg/f:SI 103 afp)
(const_int 28 [0x1c])) [2 output_bias+0 S4 A32])) [0
MEM[(int[4] *)bias_176]+0 S16 A32])
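
The two dumps above suggest that the assert fires because the operand of the `post_inc` is itself a `mem` rather than a `reg`. The following is a toy model in plain C, not GCC's RTL API: `struct node`, `enum kind`, `toy_effective_regno` and `post_inc_base_is_reg` are hypothetical names invented for illustration, and the guard shown is only a sketch of the kind of check that would avoid reaching the assert, not a proposed fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for RTL codes; these are NOT GCC's definitions.  */
enum kind { K_REG, K_MEM, K_PLUS, K_CONST_INT, K_POST_INC };

struct node {
  enum kind kind;
  const struct node *op0;   /* first operand, or NULL */
  const struct node *op1;   /* second operand, or NULL */
  int value;                /* register number or constant */
};

/* Mirrors the shape of the failing assert: only valid for registers.  */
static int
toy_effective_regno (const struct node *op)
{
  assert (op->kind == K_REG);
  return op->value;
}

/* Return true when ADDR is a post-increment whose base is a plain
   register, so that toy_effective_regno may be called on that base.  */
static bool
post_inc_base_is_reg (const struct node *addr)
{
  return addr != NULL
         && addr->kind == K_POST_INC
         && addr->op0 != NULL
         && addr->op0->kind == K_REG;
}

/* (post_inc (mem (plus (reg 103) (const_int 28)))), as in the dump.  */
static const struct node afp = { K_REG, NULL, NULL, 103 };
static const struct node off = { K_CONST_INT, NULL, NULL, 28 };
static const struct node sum = { K_PLUS, &afp, &off, 0 };
static const struct node slot = { K_MEM, &sum, NULL, 0 };
static const struct node bad_addr = { K_POST_INC, &slot, NULL, 0 };

/* (post_inc (reg 3)), the shape the predicate appears to expect.  */
static const struct node r3 = { K_REG, NULL, NULL, 3 };
static const struct node good_addr = { K_POST_INC, &r3, NULL, 0 };
```

In this model `post_inc_base_is_reg (&bad_addr)` is false, so a caller that checks it first never hands the `mem` to `toy_effective_regno`.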



I've started a bisect.

[Bug target/110255] arm: MVE intrinsics C++ polymorphism with -flax-vector-conversions

2023-06-14 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110255

Stam Markianos-Wright  changed:

   What|Removed |Added

 Resolution|--- |INVALID
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from Stam Markianos-Wright  ---
Aha! Thanks, Andrew, that makes sense. I'll go back to the original authors for
this and check if there's any good reason why they are using
-flax-vector-conversions and if they can just change their code :)

Also whoops, the godbolt link I gave above was in the middle of me messing
around with casts. Here is a clean one: https://godbolt.org/z/c9vaas6P8 . And
indeed, casting to the "correct" scalar type for the intrinsic (in this case
uint16_t) makes this work.

However, this is sounding like this bugzilla should also go to RESOLVED
INVALID. Sorry for the false alarm!

[Bug target/110255] New: arm: MVE intrinsics C++ polymorphism with -flax-vector-conversions

2023-06-14 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110255

Bug ID: 110255
   Summary: arm: MVE intrinsics C++ polymorphism with
-flax-vector-conversions
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: stammark at gcc dot gnu.org
  Target Milestone: ---

Hi all,
See: https://godbolt.org/z/53ME1fGfM

The compiler with the error is the one that is using -flax-vector-conversions
through the C++ frontend.
Unsure if this is something to do with the C++ front-end or something in the
target backend (and how the builtins are registered with the front-end).

This seems to happen regardless of whether the vaddq intrinsic has been
"restructured" by Christophe's
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615997.html (so going back
to older GCC12,13 still gives the error), but another intrinsic, like vbicq,
doesn't give the error at all (although that has a different context of the
`int` immediate having to be a compile-time constant).
(clang handles all this fine FWIW)

Has anyone seen this kind of thing before, have any ideas on workarounds, or
have any insight on whether this is invalid C++ to begin with?

[Bug target/107515] MVE: Generic functions do not accept _Float16 scalars

2023-05-18 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107515

Stam Markianos-Wright  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #13 from Stam Markianos-Wright  ---
Fixed on GCC12 onwards.

[Bug target/109697] New: arm: lack of MVE instruction costing causing worse codegen on a vec_duplicate

2023-05-02 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109697

Bug ID: 109697
   Summary: arm: lack of MVE instruction costing causing worse
codegen on a vec_duplicate
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: stammark at gcc dot gnu.org
  Target Milestone: ---

Hi all,

In the arm backend, for MVE targets we previously had this bug on the vcmp
patterns: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107987

The fix is fine, but it resulted in some failing tests:
* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u16.c
* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u32.c
* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u8.c
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u16.c
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u32.c
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u8.c
* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c
* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c
* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c
* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c
* gcc.target/arm/mve/intrinsics/vcmphiq_n_u16.c
* gcc.target/arm/mve/intrinsics/vcmphiq_n_u32.c
* gcc.target/arm/mve/intrinsics/vcmphiq_n_u8.c
* gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c
* gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c
* gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c
* gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c
* gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c
* gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c
* gcc.target/arm/mve/intrinsics/vcmpneq_n_u16.c
* gcc.target/arm/mve/intrinsics/vcmpneq_n_u32.c
* gcc.target/arm/mve/intrinsics/vcmpneq_n_u8.c
(after Andrea improved these tests in GCC13)

The testcases that are failing are the ones that compare against a scalar
immediate (e.g. "vcmpeqq (a, 1.1)"), because the compiler prefers to do:
```
vldr.64 d6, .L5
vldr.64 d7, .L5+8
vcmp.f16 eq, q0, q3
```
Whereas previously we would, much more simply, emit:
```
movs r3, #1
vcmp.u16 cs, q0, r3
```

The underlying reason for this change is a known deficiency of the MVE
implementation: the lack of proper instruction costing.
The compiler falls back to calculating costs based on the operands and the new
vec_duplicate in the patterns (mve_vcmpq_n_, etc) gets given
a cost of 32 (when instead it should know that the vec duplicate is free and
this is all just one instruction...), so the "literal load + vector-vector
compare" wins out against the "put the immediate in a GP reg + vector-scalar
compare".
For now, I plan on simply XFAIL-ing the tests.

[Bug target/107674] [11/12/13 Regressions] arm: MVE codegen regressions on VCTP and vector LDR/STR instructions

2023-04-18 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107674

Stam Markianos-Wright  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #5 from Stam Markianos-Wright  ---
Thanks Richard! I believe this is now fixed. This is likely not applicable for
backporting as Andre's changes included some mid-end additions and it's only a
missed-optimization regression -- hence closing this ticket.

[Bug target/108177] MVE predicated stores to same address get optimized away

2023-04-06 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108177

--- Comment #5 from Stam Markianos-Wright  ---
With the fix to MVE auto_inc having gone in as
ddc9b5ee13cd686c8674f92d46045563c06a23ea, I have found that the fix for this
bug keeps auto-inc broken on these predicated stores.

It seems to fail in auto_inc_dec at this condition:

```
  /* Make sure this reg appears only once in this insn.  */
  if (count_occurrences (PATTERN (mem_insn.insn), mem_insn.reg0, 1) != 1)
    {
      if (dump_file)
        fprintf (dump_file, "mem count failure\n");
      return false;
    }
```
(which makes sense with the pattern now having the MEM appear twice)
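
To illustrate why a doubled MEM trips that condition, here is a toy model in plain C. It is not GCC's `count_occurrences` or its `rtx` type: `struct node`, `count_reg` and the two example patterns are hypothetical, and the exact shape of the real predicated-store pattern is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression node; NOT GCC's rtx.  */
enum kind { K_REG, K_MEM, K_SET, K_OTHER };

struct node {
  enum kind kind;
  int regno;                 /* meaningful for K_REG only */
  const struct node *op[2];  /* operands, NULL when absent */
};

/* Count how many times register REGNO appears below X.  */
static int
count_reg (const struct node *x, int regno)
{
  if (x == NULL)
    return 0;
  int n = (x->kind == K_REG && x->regno == regno) ? 1 : 0;
  for (size_t i = 0; i < 2; i++)
    n += count_reg (x->op[i], regno);
  return n;
}

/* Plain store: (set (mem (reg 2)) (other)).  */
static const struct node base = { K_REG, 2, { NULL, NULL } };
static const struct node dest = { K_MEM, 0, { &base, NULL } };
static const struct node value = { K_OTHER, 0, { NULL, NULL } };
static const struct node plain_store = { K_SET, 0, { &dest, &value } };

/* Store in which the same MEM appears twice:
   (set (mem (reg 2)) (other (mem (reg 2)))).  */
static const struct node merged = { K_OTHER, 0, { &dest, NULL } };
static const struct node pred_store = { K_SET, 0, { &dest, &merged } };

/* Usage check: the plain store passes an "exactly once" test,
   the doubled one does not.  */
static void
check_examples (void)
{
  assert (count_reg (&plain_store, 2) == 1);
  assert (count_reg (&pred_store, 2) == 2);
}
```

With the address register counted twice, an "appears only once" test of this kind rejects the instruction, which matches the "mem count failure" dump message.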


I guess this is not urgent since this is only a performance impact on one
instruction. Also if the change needs to be in the auto-inc pass instead of the
backend, then likely something for GCC14, but I thought this would be a good
place to record this ;)

Does anyone have any ideas on this? I also wonder what the AVX case does for this.

[Bug target/107674] [11/12/13 Regressions] arm: MVE codegen regressions on VCTP and vector LDR/STR instructions

2023-04-04 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107674

--- Comment #3 from Stam Markianos-Wright  ---
Thank you, Andre for fixing the Part 1 in this ticket :)

The part 2 we've found to be a regression since r13-416-g485a0ae0982abe and is
also the reason why the mve_*_memory_nodes tests are currently failing.

[Bug target/108443] arm: MVE wrongly re-interprets predicate constants

2023-03-20 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108443

Stam Markianos-Wright  changed:

   What|Removed |Added

 CC||stammark at gcc dot gnu.org

--- Comment #2 from Stam Markianos-Wright  ---
What are people's thoughts on getting this (and the rest of the patch series
the fix was part of) backported to GCC12?

One of the changes in the series is arguably a mid-end optimisation (the change
to simplify-rtx), but it is a pre-requisite to getting this wrong-code bug
resolved.

[Bug target/109158] New: arm: errors when mixing __attribute__((pcs("aapcs-vfp"))) with +nofp

2023-03-16 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109158

Bug ID: 109158
   Summary: arm: errors when mixing
__attribute__((pcs("aapcs-vfp"))) with +nofp
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: minor
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: stammark at gcc dot gnu.org
  Target Milestone: ---

I've detected a couple of minor issues when using
__attribute__((pcs("aapcs-vfp"))) in no-fp architectures, resulting in wrong
code or an ICE.


reproducer code:

__attribute__((pcs("aapcs-vfp"))) a(__fp16);
void foo() {
  a(0.0);
}

Issue 1: when compiled as `-march=armv8.1-m.main+mve+nofp -mfloat-abi=softfp
-mfp16-format=ieee` we emit an invalid FP instruction `vmov.f16` to put the
immediate into `s0` before returning from the function.

I believe this is a simple case of the `*mov_vfp_16` pattern assuming that
all variants are valid with any `TARGET_VFP_FP16INST || TARGET_HAVE_MVE` when
actually the `vmov.f16` cases are only valid with TARGET_VFP_FP16INST, but not
with MVE.


Issue 2: when compiled as `-march=armv8.1-m.main+nofp -mfloat-abi=softfp` (i.e.
no MVE, no FP at all) we ICE with `maximum number of generated reload insns per
insn achieved` -- this class of error also happens with `float` and `double`
types.

IMO this is an invalid configuration and we should be emitting a user error
instead of the ICE, similar to what we do if the user requests -mfloat-abi=hard
without any MVE or FP present:

```
  else if (TARGET_HARD_FLOAT_ABI)
    {
      arm_pcs_default = ARM_PCS_AAPCS_VFP;
      if (!bitmap_bit_p (arm_active_target.isa, isa_bit_vfpv2)
          && !bitmap_bit_p (arm_active_target.isa, isa_bit_mve))
        error ("%<-mfloat-abi=hard%>: selected architecture lacks an FPU");
    }
```

[Bug target/100000] non-leaf epilogue/prologue used if MVE v4sf is used for load/return

2023-03-09 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100000

Stam Markianos-Wright  changed:

   What|Removed |Added

 CC||stammark at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #4 from Stam Markianos-Wright  ---
I tried out Richard's suggestion in arm_vector_mode_supported_p (allowing V8HF,
V4SF and V2DF unconditionally) and it seems to have worked! After a testsuite
run I found a few ICEs due to a number of patterns that needed enabling:

@mve_vpselq_ which was only enabled for mve.fp

And then all the patterns that were conditional on:
-  "((TARGET_HAVE_MVE && VALID_MVE_SI_MODE (mode))
-|| (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (mode)))

mve_vec_extract
*mve_vec_extract_sext_internal
*mve_vec_extract_zext_internal
mve_vec_set_internal
*movmisalign_mve_store
*movmisalign_mve_load

These weren't causing any ICEs but also made sense to enable:
mve_vst2q
mve_vld2q
mve_vld4q

No regressions after that, but I think I will hold off until GCC14 Stage 1 to
post this patch, just to be safe.

[Bug target/107714] MVE: Invalid addressing mode generated for VLD2

2023-01-16 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107714

Stam Markianos-Wright  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #9 from Stam Markianos-Wright  ---
This should now be resolved on all active branches (11, 12, 13)

[Bug target/107714] MVE: Invalid addressing mode generated for VLD2

2022-12-09 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107714

Stam Markianos-Wright  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
Version|12.2.0  |13.0
  Known to fail||11.3.1, 12.2.1
 Ever confirmed|0   |1
   Last reconfirmed|2022-11-17 00:00:00 |2022-12-09

[Bug target/107515] MVE: Generic functions do not accept _Float16 scalars

2022-12-01 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107515

--- Comment #9 from Stam Markianos-Wright  ---
> Clearly Helium+Linux on Godbolt is a bit confused 

Yea, I agree -- it still shouldn't be an unintuitive front-end type clash
error, though!

I've posted another patch:
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607675.html (see there
for what the error was --- interestingly it was coming from `__ARM_mve_coerce3`
and it didn't directly have anything to do with the float types)

[Bug target/107515] MVE: Generic functions do not accept _Float16 scalars

2022-11-29 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107515

--- Comment #7 from Stam Markianos-Wright  ---
(In reply to Kevin Bracey from comment #6)
> Retesting the Godbolt on trunk, it's now worse - every line produces
> multiple not-very-informative errors:
> 
> <source>:7:9: error: '_Generic' specifies two compatible types
> 7 | x = vmulq(x, 0.5); // ok
>   | ^
> <source>:7:9: note: compatible type is here
> 7 | x = vmulq(x, 0.5); // ok
>   | ^
> 
> (repeated 6 times per source line)

Interesting... Thanks for spotting this, I didn't see this in my testing,
because it doesn't seem to happen on baremetal `arm-none-eabi` (and I still
can't replicate it there), but I do see this on the linux target (let me know
if you are seeing anything different). I am investigating further!

[Bug target/107515] MVE: Generic functions do not accept _Float16 scalars

2022-11-17 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107515

Stam Markianos-Wright  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED

--- Comment #3 from Stam Markianos-Wright  ---
(In reply to Kevin Bracey from comment #2)
> I've just spotted another apparent generic selection problem in my
> reproducer for  bug 107714 - should I create a new issue for it?

Our patch series has gone up to the mailing list! For _Float16 see:

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606587.html

For `vmulq` this:

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606575.html

works on my side, so if it gets merged and works for your case, too, then I
guess we don't have to worry about a new bug report :)

[Bug target/107714] MVE: Invalid addressing mode generated for VLD2

2022-11-17 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107714

Stam Markianos-Wright  changed:

   What|Removed |Added

   Last reconfirmed||2022-11-17

--- Comment #3 from Stam Markianos-Wright  ---
Thanks for finding this! Confirmed the `vld2` bug on latest trunk -- my guess
would be either a GCC backend or a gas bug (I'm not familiar with these
instructions).

For the `vmulq` issue I'll reply under the other thread, but I believe that
very soon we'll be able to put that down as "already fixed", so we can keep
this thread for `vld2` only.

[Bug target/107674] New: [11/12/13 Regressions] arm: MVE codegen regressions on VCTP and vector LDR/STR instructions

2022-11-14 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107674

Bug ID: 107674
   Summary: [11/12/13 Regressions] arm: MVE codegen regressions on
VCTP and vector LDR/STR instructions
   Product: gcc
   Version: 12.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: stammark at gcc dot gnu.org
  Target Milestone: ---

We've found a couple of performance regressions with Arm MVE.  These can be
seen here:
https://godbolt.org/z/onPjfW4zj

* Between GCC 11 and 12 we seem to have started emitting a strange
vmrs/sxth/vmsr instruction sequence after the vctp instruction.  I suspect this
is something to do with the introduction of MODE_VECTOR_BOOL during that
period.

* Between GCC 12 and 13 we are no longer merging the pointer increments by #16 
into the ldr/strs and we have some random movs that aren't needed either.  This
also happened in GCC 11, but we want to keep the improved codegen of GCC 12
here ;)

  This looks like a change in register allocation:

  Choosing alt 0 in insn 24:  (0) =w  (1) Ux  (2) Up {mve_vldrhq_z_sv8hi}
  Creating newreg=149, assigning class CORE_REGS to INC/DEC result r149
  Creating newreg=150 from oldreg=134, assigning class VPR_REG to r150
bad vs good
  Choosing alt 0 in insn 24:  (0) =w  (1) Ux  (2) Up {mve_vldrhq_z_sv8hi}
  Creating newreg=149 from oldreg=134, assigning class VPR_REG to r149

Does anyone have any further ideas on why these may have changed or how to fix
them?

Thanks!

[Bug target/107515] MVE: Generic functions do not accept _Float16 scalars

2022-11-10 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107515

Stam Markianos-Wright  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |stammark at gcc dot gnu.org
 CC||stammark at gcc dot gnu.org
   Last reconfirmed||2022-11-10
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #1 from Stam Markianos-Wright  ---
Also confirmed on trunk and assigning to myself, because I believe I've found a
fix: 
```
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 2d213e12304..ce20a6fcd24 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -35582,6 +35582,9 @@ enum {
short: __ARM_mve_type_int_n, \
int: __ARM_mve_type_int_n, \
long: __ARM_mve_type_int_n, \
+   _Float16: __ARM_mve_type_fp_n, \
+   __fp16: __ARM_mve_type_fp_n, \
+   float: __ARM_mve_type_fp_n, \
double: __ARM_mve_type_fp_n, \
long long: __ARM_mve_type_int_n, \
unsigned char: __ARM_mve_type_int_n, \
```

Still untested, though, and I'll likely send it to the mailing list within the
next week or two (we've got a couple more arm_mve.h changes in the pipeline, so
I'll test them all together).
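
As a rough illustration of what the extra associations change, here is a minimal standalone sketch. It is not the `arm_mve.h` code: `KIND_BEFORE`, `KIND_AFTER` and `enum scalar_kind` are hypothetical names, a `default:` branch is added so that every case compiles, and only `float` is shown because `_Float16` and `__fp16` are not available on every target.

```c
#include <assert.h>

enum scalar_kind { KIND_INT, KIND_FP, KIND_UNKNOWN };

/* Only `double` is listed as a floating-point scalar, analogous to the
   association list before the diff.  */
#define KIND_BEFORE(x) _Generic ((x), \
    short: KIND_INT, \
    int: KIND_INT, \
    long: KIND_INT, \
    double: KIND_FP, \
    default: KIND_UNKNOWN)

/* `float` added, analogous to the extra lines in the diff.  */
#define KIND_AFTER(x) _Generic ((x), \
    short: KIND_INT, \
    int: KIND_INT, \
    long: KIND_INT, \
    float: KIND_FP, \
    double: KIND_FP, \
    default: KIND_UNKNOWN)

/* _Generic matches on the exact type: a float argument does not select
   the `double` association.  */
static_assert (KIND_BEFORE (0.5) == KIND_FP, "double is classified");
static_assert (KIND_BEFORE (0.5f) == KIND_UNKNOWN, "float is missed");
static_assert (KIND_AFTER (0.5f) == KIND_FP, "float is classified");
```

Because `_Generic` performs no implicit conversion of the controlling expression to `double`, each scalar floating-point type that should be accepted needs its own association.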

[Bug libstdc++/100017] [11 regression] error: 'fenv_t' has not been declared in '::' -- canadian compilation fails

2022-01-10 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100017

Stam Markianos-Wright  changed:

   What|Removed |Added

 CC||stammark at gcc dot gnu.org

--- Comment #78 from Stam Markianos-Wright  ---
Thank you, Jonathan! Is that commit OK to be backported to gcc-11, as well?

[Bug tree-optimization/103247] New: graphite: Wrong code when at -O1 or higher and -floop-nest-optimize is given without an earlier tree-cunrolli pass

2021-11-15 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103247

Bug ID: 103247
   Summary: graphite: Wrong code when at -O1 or higher and
-floop-nest-optimize is given without an earlier
tree-cunrolli pass
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: stammark at gcc dot gnu.org
  Target Milestone: ---

Hi all,

I'm coming across a strange optimisation bug that happens if I try to combine
-O1 with the Graphite -floop-nest-optimize on test interchange-8.
I initially spotted the symptom when compiling as:

./bin/aarch64-none-elf-gcc
../src/gcc/gcc/testsuite/gcc.dg/graphite/interchange-8.c -march=armv8-a
-specs=aem-validation.specs -o test.out -O1 -floop-nest-optimize
And then running on a simulator resulted in the abort () function being called.

and then replicated on x86 as:
gcc ../src/gcc/gcc/testsuite/gcc.dg/graphite/interchange-8.c -o test.out -O1
-floop-nest-optimize ; ./test.out
Which gives an `Aborted (core dumped)`

Some further digging showed that the execution failure seems to be something in
the graphite/loop-nest-optimize pass that requires a tree-cunrolli pass to have
been done earlier, for this test's source code, so:

# The test always runs with -floop-nest-optimize
# O2 and O3 contain tree-cunrolli
# O0 and O1 don't contain tree-cunrolli
-O0 -floop-nest-optimize: works (no abort () was called)
-O0 -fenable-tree-cunrolli -floop-nest-optimize: works
-O1 -floop-nest-optimize: broken
-O1 -fenable-tree-cunrolli -floop-nest-optimize: works
-O2 -floop-nest-optimize: works
-O2 -fdisable-tree-cunrolli -floop-nest-optimize: broken
-O3 -floop-nest-optimize: works
-O3 -fdisable-tree-cunrolli -floop-nest-optimize: broken

Is anyone aware of such a dependency between these passes and/or is this a real
bug or am I missing something?

Replicated on latest trunk and GCC-11.

Thanks!

[Bug tree-optimization/96974] [10/11 Regression] ICE in vect_get_vector_types_for_stmt compiling for SVE

2021-01-28 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974

Stam Markianos-Wright  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |stammark at gcc dot gnu.org
 Status|NEW |ASSIGNED

[Bug tree-optimization/96974] [10/11 Regression] ICE in vect_get_vector_types_for_stmt compiling for SVE

2021-01-28 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974

--- Comment #8 from Stam Markianos-Wright  ---
I have a liiitle bit more progress here, but I have a question about
vect_get_smallest_scalar_type.

If we look at the comment before the function:

>/* Return the smallest scalar part of STMT_INFO.
>   This is used to determine the vectype of the stmt.  We generally set the
>   vectype according to the type of the result (lhs).  For stmts whose
>   result-type is different than the type of the arguments (e.g., demotion,
>   promotion), vectype will be reset appropriately (later).  Note that we have
>   to visit the smallest datatype in this function, because that determines the
>   VF.  If the smallest datatype in the loop is present only as the rhs of a
>   promotion operation - we'd miss it.

Would this be "smallest datatype in all cases", or is this more like "the
smallest datatype within the same promotion/demotion chain"?

i.e. how should we react if we detect a smallest datatype on the rhs of "float"
when everything else in the stmt has been in the integer chain (int or,
like in this case, long int)?

>   Such a case, where a variable of this datatype does not appear in the lhs
>   anywhere in the loop, can only occur if it's an invariant: e.g.:
>   'int_x = (int) short_inv', which we'd expect to have been optimized away by
>   invariant motion.  However, we cannot rely on invariant motion to always
>   take invariants out of the loop, and so in the case of promotion we also
>   have to check the rhs.
>   LHS_SIZE_UNIT and RHS_SIZE_UNIT contain the sizes of the corresponding
>   types.  */

I have found that this is why we end up with a smaller number in:
TYPE_VECTOR_SUBPARTS (nunits_vectype) == 4
than in:
TYPE_VECTOR_SUBPARTS (*stmt_vectype_out) == 8

So I'm thinking that either A) We shouldn't allow this, and add in some check
maybe for "GET_MODE_CLASS (x) == GET_MODE_CLASS (y)"
or B) Some of the logic that generates stmt_vectype_out is deficient and it
should also be detecting the existence of a "float" type to get the VF.

[Bug tree-optimization/96974] [10/11 Regression] ICE in vect_get_vector_types_for_stmt compiling for SVE

2021-01-18 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974

--- Comment #7 from Stam Markianos-Wright  ---
(In reply to rsand...@gcc.gnu.org from comment #6)
> (In reply to Stam Markianos-Wright from comment #5)
> > I'm tempted to try and add a reverse:
> > 
> > || multiple_p (*stmt_vectype_out, nunits_vectype)
> > 
> > And then regtest, but I probably need to do more reading around to figure
> > out
> > what we really should be expecting each case!
> I don't think that's right.  If nunits_vectype is not a multiple
> of stmt_vectype then the stmt_vectype contains (or might contain)
> unused elements.  The vectoriser isn't set up to work like that:
> all operations are currently supposed to be full-vector operations
> (possibly predicated, on SVE and AVX).
> 
> AFAICT the assert is correct and it's showing up a problem elsewhere.

Cool, thank you for the info and the confirmation! I will carry on
investigating to try and find the actual problem

[Bug tree-optimization/96974] [10/11 Regression] ICE in vect_get_vector_types_for_stmt compiling for SVE

2021-01-18 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974

--- Comment #5 from Stam Markianos-Wright  ---
(In reply to rsand...@gcc.gnu.org from comment #4)
> (In reply to Stam Markianos-Wright from comment #3)
> > Just started looking at this. I've narrowed it as the bug appearing with
> > commit 9b75f56d4b7951c60a6563964a65787b95bc.
> > 
> > I have yet to fire this up in gdb to see what's happening, but one test I
> > did do was to try commenting out the assert that is causing the ICE and it
> > then ran to completion. 
> > 
> > So one _total speculation_ would be that with these latest changes that
> > enable groups of different sizes, this condition in the assert is now too
> > strict:
> > 
> > 
> > multiple_p (TYPE_VECTOR_SUBPARTS (nunits_vectype),
> > TYPE_VECTOR_SUBPARTS (*stmt_vectype_out)))
> What are nunits_vectype and *stmt_vectype_out at the point
> that the assert fails?

Hmm so looking at this on commit 9b75f56d4b7951c60a6563964a65787b95bc,
we have:



nunits_vectype is a: vector(SUBPARTS {coeffs = {4, 0}}) float

*stmt_vectype_out is a: vector(SUBPARTS {coeffs = {8, 0}}) long int

In this case we are checking multiple_p (4, 8) == false
(and also group_size == 9 here which is expected)



Before the commit we'd get here with:



nunits_vectype is a: vector(SUBPARTS {coeffs = {4, 0}}) float

*stmt_vectype_out is a: vector(SUBPARTS {coeffs = {2, 0}}) long int

And here we were checking multiple_p (4, 2) == true



I'm tempted to try and add a reverse:

|| multiple_p (*stmt_vectype_out, nunits_vectype)

And then regtest, but I probably need to do more reading around to figure out
what we really should be expecting each case!
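
For the constant subpart counts quoted above, the two checks reduce to simple divisibility. The sketch below assumes plain unsigned values; `multiple_p_const` is a hypothetical helper, whereas GCC's own `multiple_p` also handles `poly_int` values.

```c
#include <assert.h>
#include <stdbool.h>

/* Constant-only stand-in for the check: true when A is a whole
   multiple of B.  */
static bool
multiple_p_const (unsigned a, unsigned b)
{
  assert (b != 0);   /* divisibility by zero is not meaningful here */
  return a % b == 0;
}
```

With these values, `multiple_p_const (4, 8)` is false (the failing case after the commit), `multiple_p_const (4, 2)` is true (the case before it), and the reversed check `multiple_p_const (8, 4)` is true.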

[Bug tree-optimization/96974] [10/11 Regression] ICE in vect_get_vector_types_for_stmt compiling for SVE

2021-01-18 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974

--- Comment #3 from Stam Markianos-Wright  ---
Just started looking at this. I've narrowed it down: the bug first appears with
commit 9b75f56d4b7951c60a6563964a65787b95bc.

I have yet to fire this up in gdb to see what's happening, but one test I did
do was to comment out the assert that is causing the ICE, and compilation then
ran to completion.

So one _total speculation_ would be that, with these latest changes that enable
groups of different sizes, this condition in the assert is now too strict:

multiple_p (TYPE_VECTOR_SUBPARTS (nunits_vectype),
            TYPE_VECTOR_SUBPARTS (*stmt_vectype_out))

[Bug target/91816] [8/9 Regression] Arm generates out of range conditional branches in Thumb2

2020-11-30 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91816

Stam Markianos-Wright  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #9 from Stam Markianos-Wright  ---
Patch is now on all active branches, so moving to RESOLVED. Thanks to all for
their reviews!

[Bug rtl-optimization/90249] [9/10/11 Regression] Code size regression on thumb2 due to sub-optimal register allocation starting with r265398

2020-11-25 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90249

--- Comment #14 from Stam Markianos-Wright  ---
(In reply to Stam Markianos-Wright from comment #12)
> Was reminded that this was still open after many months. Have fixed the
> commit and am in the process of backporting to gcc-8,9.

Excuse that, am an idiot and commented in entirely the wrong place!

[Bug rtl-optimization/90249] [9/10/11 Regression] Code size regression on thumb2 due to sub-optimal register allocation starting with r265398

2020-11-25 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90249

Stam Markianos-Wright  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|stammark at gcc dot gnu.org |unassigned at gcc dot gnu.org

--- Comment #13 from Stam Markianos-Wright  ---
(In reply to Stam Markianos-Wright from comment #12)
> Was reminded that this was still open after many months. Have fixed the
> commit and am in the process of backporting to gcc-8,9.

Excuse that, am an idiot and commented in entirely the wrong place!

[Bug rtl-optimization/90249] [9/10/11 Regression] Code size regression on thumb2 due to sub-optimal register allocation starting with r265398

2020-11-25 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90249

Stam Markianos-Wright  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |stammark at gcc dot gnu.org
           Assignee|unassigned at gcc dot gnu.org|stammark at gcc dot gnu.org

--- Comment #12 from Stam Markianos-Wright  ---
Was reminded that this was still open after many months. Have fixed the commit
and am in the process of backporting to gcc-8,9.

[Bug tree-optimization/96974] [10/11 Regression] ICE in vect_get_vector_types_for_stmt compiling for SVE

2020-10-15 Thread stammark at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974

Stam Markianos-Wright  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
               Host|                            |x86_64-linux-gnu
                 CC|                            |stammark at gcc dot gnu.org
   Last reconfirmed|                            |2020-10-15

--- Comment #1 from Stam Markianos-Wright  ---
Also confirmed on today's trunk for aarch64-none-elf.

It seems to relate to the use of that specific builtin or family of builtins (I
tested some other math builtins and they were OK, but e.g. __builtin_llrintf
produced the same ICE), and only inside the `d ? b() : 0` condition (i.e.
changing that to `coeffs[e] = b()` makes the ICE go away).

[Bug target/93300] ICE in convert_mode_scalar, at expr.c:325

2020-02-12 Thread stammark at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93300

Stam Markianos-Wright  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #2 from Stam Markianos-Wright  ---
Have not seen any further instances of this issue since my patch, so moving to
RESOLVED.

[Bug target/93300] ICE in convert_mode_scalar, at expr.c:325

2020-02-06 Thread stammark at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93300

Stam Markianos-Wright  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
