[Bug target/106253] [13 Regression] ICE in vect_transform_loops, at tree-vectorizer.cc:1032

2022-07-18 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106253

rsandifo at gcc dot gnu.org  changed:

           What|Removed                     |Added

         Status|ASSIGNED                    |RESOLVED
     Resolution|---                         |FIXED

--- Comment #13 from rsandifo at gcc dot gnu.org ---
Fixed for arm and aarch64.

[Bug target/106253] [13 Regression] ICE in vect_transform_loops, at tree-vectorizer.cc:1032

2022-07-18 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106253

--- Comment #12 from CVS Commits  ---
The trunk branch has been updated by Richard Sandiford :

https://gcc.gnu.org/g:7313381d2ce44b72b4c9f70bd5670e5d78d1f631

commit r13-1730-g7313381d2ce44b72b4c9f70bd5670e5d78d1f631
Author: Richard Sandiford 
Date:   Mon Jul 18 12:57:10 2022 +0100

arm: Replace arm_builtin_vectorized_function [PR106253]

This patch extends the fix for PR106253 to AArch32.  As with AArch64,
we were using ACLE intrinsics to vectorise scalar built-ins, even
though the two sometimes have different ECF_* flags.  (That in turn
is because the ACLE intrinsics should follow the instruction semantics
as closely as possible, whereas the scalar built-ins follow language
specs.)

The patch also removes the copysignf built-in, which only existed
for this purpose and wasn't a “real” arm_neon.h built-in.

Doing this also has the side-effect of enabling vectorisation of
rint and roundeven.  Logically that should be a separate patch,
but making it one would have meant adding a new int iterator
for the original set of instructions and then removing it again
when including new functions.

I've restricted the bswap tests to little-endian because we end
up with excessive spilling on big-endian.  E.g.:

sub     sp, sp, #8
vstr    d1, [sp]
vldr    d16, [sp]
vrev16.8 d16, d16
vstr    d16, [sp]
vldr    d0, [sp]
add     sp, sp, #8
@ sp needed
bx      lr

Similarly, the copysign tests require little-endian because on
big-endian we unnecessarily load the constant from the constant pool:

vldr.32 s15, .L3
vdup.32 d0, d7[1]
vbsl    d0, d2, d1
bx      lr
.L3:
.word   -2147483648

gcc/
PR target/106253
* config/arm/arm-builtins.cc (arm_builtin_vectorized_function):
Delete.
* config/arm/arm-protos.h (arm_builtin_vectorized_function):
Delete.
* config/arm/arm.cc (TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION):
Delete.
* config/arm/arm_neon_builtins.def (copysignf): Delete.
* config/arm/iterators.md (nvrint_pattern): New attribute.
* config/arm/neon.md (2):
New pattern.
(l2):
Likewise.
(neon_copysignf): Rename to...
(copysign3): ...this.

gcc/testsuite/
PR target/106253
* gcc.target/arm/vect_unary_1.c: New test.
* gcc.target/arm/vect_binary_1.c: Likewise.
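
For readers of the copysign sequence in the log above: the .word -2147483648
constant is the 32-bit sign-bit mask 0x80000000, and vbsl is a bitwise select.
A minimal C sketch of that formulation follows, assuming IEEE binary32 floats.
The name copysign_bitselect is made up for illustration and is not part of
the patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copysign written as a bitwise select.
   Takes the sign bit from SIGN and every other bit from MAGNITUDE.
   Assumes float is IEEE binary32.  */
float
copysign_bitselect (float magnitude, float sign)
{
  const uint32_t sign_mask = UINT32_C (0x80000000); /* == (uint32_t) -2147483648 */
  uint32_t m, s, r;
  float out;

  /* memcpy is the portable way to reinterpret the bits.  */
  memcpy (&m, &magnitude, sizeof m);
  memcpy (&s, &sign, sizeof s);

  /* Select: where the mask is set use S, elsewhere use M.  */
  r = (s & sign_mask) | (m & ~sign_mask);

  memcpy (&out, &r, sizeof out);
  return out;
}
```

For example, copysign_bitselect (1.5f, -2.0f) yields -1.5f.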

[Bug target/106253] [13 Regression] ICE in vect_transform_loops, at tree-vectorizer.cc:1032

2022-07-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106253

--- Comment #11 from Richard Biener  ---
(In reply to Tamar Christina from comment #10)
> For completeness, I reduced the Armhf failure and that seems to happen on
> bswap.
> 
> #include 
> #include 
> 
> void
> __sha256_process_block (uint32_t *buffer, size_t len, uint32_t *W)
> {
>  for (unsigned int t = 0; t < 16; ++t)
>  {
>    W[t] = __bswap_32 (*buffer);
>    ++buffer;
>  }
> }
> 
> will ICE at -O3

So similar issue for __builtin_neon_bswapv4si_uu then which is thought to
clobber memory (it hopefully doesn't).

[Bug target/106253] [13 Regression] ICE in vect_transform_loops, at tree-vectorizer.cc:1032

2022-07-13 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106253

--- Comment #10 from Tamar Christina  ---
For completeness, I reduced the Armhf failure and that seems to happen on
bswap.

#include 
#include 

void
__sha256_process_block (uint32_t *buffer, size_t len, uint32_t *W)
{
 for (unsigned int t = 0; t < 16; ++t)
 {
   W[t] = __bswap_32 (*buffer);
   ++buffer;
 }
}

will ICE at -O3
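
A self-contained variant of the testcase, as a sketch only: it uses GCC's
__builtin_bswap32 in place of glibc's __bswap_32 so that no byteswap header is
needed, and the function name bswap_block is made up. It has not been checked
that this exact form triggers the ICE on an affected compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the reduced loop: byte-swap 16 words.
   LEN is unused, as in the reduced testcase.  */
void
bswap_block (const uint32_t *buffer, size_t len, uint32_t *W)
{
  (void) len;
  for (unsigned int t = 0; t < 16; ++t)
    {
      /* __builtin_bswap32 is the scalar built-in the vectoriser sees.  */
      W[t] = __builtin_bswap32 (*buffer);
      ++buffer;
    }
}
```

The loop has a fixed trip count of 16 and a scalar bswap call, which is the
shape the vectoriser tries to turn into a vector byte swap.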

[Bug target/106253] [13 Regression] ICE in vect_transform_loops, at tree-vectorizer.cc:1032

2022-07-12 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106253

rsandifo at gcc dot gnu.org  changed:

           What|Removed                       |Added

         Status|NEW                           |ASSIGNED
       Assignee|unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org

--- Comment #9 from rsandifo at gcc dot gnu.org ---
Fixed for aarch64.  I'll do the same thing for arm.

[Bug target/106253] [13 Regression] ICE in vect_transform_loops, at tree-vectorizer.cc:1032

2022-07-12 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106253

--- Comment #8 from CVS Commits  ---
The trunk branch has been updated by Richard Sandiford :

https://gcc.gnu.org/g:00eab0c654e09c8a0f1b1a3b1c7bff8764e64991

commit r13-1647-g00eab0c654e09c8a0f1b1a3b1c7bff8764e64991
Author: Richard Sandiford 
Date:   Tue Jul 12 14:09:44 2022 +0100

Add internal functions for iround etc. [PR106253]

The PR is about the aarch64 port using an ACLE built-in function
to vectorise a scalar function call, even though the ECF_* flags for
the ACLE function didn't match the ECF_* flags for the scalar call.

To some extent that kind of difference is inevitable, since the
ACLE intrinsics are supposed to follow the behaviour of the
underlying instruction as closely as possible.  Also, using
target-specific builtins has the drawback of limiting further
gimple optimisation, since the gimple optimisers won't know what
the function does.

We handle several other maths functions, including round, floor
and ceil, by defining directly-mapped internal functions that
are linked to the associated built-in functions.  This has two
main advantages:

- it means that, internally, we are not restricted to the set of
  scalar types that happen to have associated C/C++ functions

- the functions (and thus the underlying optabs) extend naturally
  to vectors

This patch takes the same approach for the remaining functions
handled by aarch64_builtin_vectorized_function.

gcc/
PR target/106253
* predict.h (insn_optimization_type): Declare.
* predict.cc (insn_optimization_type): New function.
* internal-fn.def (IFN_ICEIL, IFN_IFLOOR, IFN_IRINT, IFN_IROUND)
(IFN_LCEIL, IFN_LFLOOR, IFN_LRINT, IFN_LROUND, IFN_LLCEIL)
(IFN_LLFLOOR, IFN_LLRINT, IFN_LLROUND): New internal functions.
* internal-fn.cc (unary_convert_direct): New macro.
(expand_convert_optab_fn): New function.
(expand_unary_convert_optab_fn): New macro.
(direct_unary_convert_optab_supported_p): Likewise.
* optabs.cc (expand_sfix_optab): Pass insn_optimization_type to
convert_optab_handler.
* config/aarch64/aarch64-protos.h
(aarch64_builtin_vectorized_function): Delete.
* config/aarch64/aarch64-builtins.cc
(aarch64_builtin_vectorized_function): Delete.
* config/aarch64/aarch64.cc
(TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION): Delete.
* config/i386/i386.cc (ix86_optab_supported_p): Handle
lround_optab.
* config/i386/i386.md (lround2):
Remove optimize_insn_for_size_p test.

gcc/testsuite/
PR target/106253
* gcc.target/aarch64/vect_unary_1.c: Add tests for iroundf,
llround, iceilf, llceil, ifloorf, llfloor, irintf and llrint.
* gfortran.dg/vect/pr106253.f: New test.

[Bug target/106253] [13 Regression] ICE in vect_transform_loops, at tree-vectorizer.cc:1032

2022-07-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106253

--- Comment #7 from Richard Biener  ---
Btw, I can see

FAIL: gcc.dg/vect/vect-rounding-lceil.c (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.dg/vect/vect-rounding-lfloor.c (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)

FAIL: gfortran.dg/g77/20010430.f   -O2  (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)

(NINT user)

when testing aarch64-linux.  Those are all this aarch64 builtin issue (and
probably also reproduce on the arm backend side).

[Bug target/106253] [13 Regression] ICE in vect_transform_loops, at tree-vectorizer.cc:1032

2022-07-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106253

Tamar Christina  changed:

           What|Removed                     |Added

             CC|                            |tnfchris at gcc dot gnu.org
         Target|aarch64-linux-gnu           |aarch64-linux-gnu,
               |                            |arm-none-linux-gnueabihf

--- Comment #6 from Tamar Christina  ---
Same problem happens with Armhf when building libc.

during GIMPLE pass: vect
In file included from sha256.c:213:
./sha256-block.c: In function ‘__sha256_process_block’:
./sha256-block.c:6:1: internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032
    6 | __sha256_process_block (const void *buffer, size_t len, struct sha256_ctx *ctx)
      | ^~

[Bug target/106253] [13 Regression] ICE in vect_transform_loops, at tree-vectorizer.cc:1032

2022-07-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106253

--- Comment #5 from Andrew Pinski  ---
(In reply to rsand...@gcc.gnu.org from comment #4)
> I think for those we have to honour the prevailing
> flush-to-zero mode, which makes the
> functions at best pure rather than const.

GCC does not handle changing of the rounding mode or change of the denormalized
flush to zero mode even in general floating point code. So the question becomes
should we wait for the infrastructure to be added to fix that case or add another
builtin?

[Bug target/106253] [13 Regression] ICE in vect_transform_loops, at tree-vectorizer.cc:1032

2022-07-11 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106253

--- Comment #4 from rsandifo at gcc dot gnu.org ---
I guess we'll need different patterns in that case.  These builtins
are also used to expand ACLE intrinsics, and I think for those we
have to honour the prevailing flush-to-zero mode, which makes the
functions at best pure rather than const.
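
To illustrate the const/pure distinction at the C level, here is a rough
sketch using GCC's function attributes. The global flush_to_zero is a
hypothetical software stand-in for the hardware flush-to-zero control bit, and
both function names are invented; none of this is the actual built-in code.

```c
#include <float.h>

/* Hypothetical stand-in for the prevailing flush-to-zero mode.  */
int flush_to_zero;

/* Result depends only on the argument, so "const" is valid:
   calls may be merged or moved freely.  */
__attribute__ ((const)) float
halve_const (float x)
{
  return x * 0.5f;
}

/* Result also depends on global state, so the function is at best
   "pure": it reads memory but has no side effects.  Marking it
   "const" would let the compiler ignore changes to the mode.  */
__attribute__ ((pure)) float
halve_mode_aware (float x)
{
  float r = x * 0.5f;

  /* In flush-to-zero mode, replace subnormal results with a zero of
     the same sign.  */
  if (flush_to_zero && r > -FLT_MIN && r < FLT_MIN)
    return r < 0.0f ? -0.0f : 0.0f;
  return r;
}
```

With flush_to_zero clear, halve_mode_aware (FLT_MIN) returns a positive
subnormal; with it set, the same call returns zero, which is why the two calls
cannot be treated as interchangeable.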

[Bug target/106253] [13 Regression] ICE in vect_transform_loops, at tree-vectorizer.cc:1032

2022-07-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106253

--- Comment #3 from Richard Biener  ---
aarch64_builtin_vectorized_function is what returns these, but the decls
seem to be generated elsewhere.

[Bug target/106253] [13 Regression] ICE in vect_transform_loops, at tree-vectorizer.cc:1032

2022-07-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106253

Richard Biener  changed:

           What|Removed                     |Added

         Status|ASSIGNED                    |NEW
       Assignee|rguenth at gcc dot gnu.org  |unassigned at gcc dot gnu.org
      Component|tree-optimization           |target
             CC|                            |rsandifo at gcc dot gnu.org

--- Comment #2 from Richard Biener  ---
# .MEM = VDEF <.MEM>
vect__19.49_45 = __builtin_aarch64_lroundv4sfv4si (vect__18.48_44);

aarch64 lround vectorized builtin does not have the same guarantees
as the builtin.def one.  It's good that those cases are now uncovered
(I've already fixed a similar case in the x86 backend).

Can a target maintainer please take this?