Re: [PATCH] tree-optimization/110221 - SLP and loop mask/len

2024-03-01 Thread Andre Vieira (lists)
Hi, Bootstrapped and tested the gcc-13 backport of this on gcc-12 for aarch64-unknown-linux-gnu and x86_64-pc-linux-gnu and no regressions. OK to push to gcc-12 branch? Kind regards, Andre Vieira On 10/11/2023 13:16, Richard Biener wrote: The following fixes the issue that when SLP stmts

Re: [PATCH] testsuite: Fix fallout of turning warnings into errors on 32-bit Arm

2024-03-01 Thread Andre Vieira (lists)
Hi Thiago, Thanks for this, LGTM but I can't approve this, CC'ing Richard. Do have a nitpick, in the gcc/testsuite/ChangeLog: remove 'gcc/testsuite' from bullet points 2-4. Kind regards, Andre On 13/01/2024 00:55, Thiago Jung Bauermann wrote: Since commits 2c3db94d9fd ("c: Turn

Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE

2024-02-28 Thread Andre Vieira (lists)
On 27/02/2024 08:47, Richard Biener wrote: On Mon, 26 Feb 2024, Andre Vieira (lists) wrote: On 05/02/2024 09:56, Richard Biener wrote: On Thu, 1 Feb 2024, Andre Vieira (lists) wrote: On 01/02/2024 07:19, Richard Biener wrote: On Wed, 31 Jan 2024, Andre Vieira (lists) wrote

Backport PR91838 and PR110838

2024-03-25 Thread Andre Vieira (lists)
Hi, After the backport of PR target/112787 a failure was reported against x86_64; this would be fixed by backporting: * tree-optimization/91838 - fix FAIL of g++.dg/opt/pr91838.C (d1c072a1c3411a6fe29900750b38210af8451eeb) * tree-optimization/110838 - less aggressively fold out-of-bound shifts

[PATCHv2 0/2] aarch64, bitint: Add support for _BitInt for AArch64 Little Endian

2024-03-27 Thread Andre Vieira (lists)
Hi, Introduced a new patch to disable diagnostics for ABI breaks involving _BitInt(N) given the type didn't exist, let me know what you think of that. Also added further testing to replicate the ABI diagnostic tests to use _BitInt(N). Andre Vieira (2) aarch64: Do not give ABI change

[PATCHv2 1/2] aarch64: Do not give ABI change diagnostics for _BitInt(N)

2024-03-27 Thread Andre Vieira (lists)
This patch makes sure we do not give ABI change diagnostics for the ABI breaks of GCC 9, 13 and 14 for any type involving _BitInt(N), since that type did not exist before this GCC version. ChangeLog: * config/aarch64/aarch64.cc (bitint_or_aggr_of_bitint_p): New function.

[PATCHv2 2/2] aarch64: Add support for _BitInt

2024-03-27 Thread Andre Vieira (lists)
This patch adds support for C23's _BitInt for the AArch64 port when compiling for little endianness. Big Endianness requires further target-agnostic support and we therefore disable it for now. The tests expose some suboptimal codegen for which I'll create PRs for optimizations after this

[PATCH] aarch64: Fix _BitInt testcases

2024-04-11 Thread Andre Vieira (lists)
This patch fixes some testisms introduced by: commit 5aa3fec38cc6f52285168b161bab1a869d864b44 Author: Andre Vieira Date: Wed Apr 10 16:29:46 2024 +0100 aarch64: Add support for _BitInt The testcases were relying on an unnecessary sign-extend that is no longer generated. The tested

Re: [PATCHv2 1/2] aarch64: Do not give ABI change diagnostics for _BitInt(N)

2024-04-10 Thread Andre Vieira (lists)
regards, Andre On 28/03/2024 12:54, Richard Sandiford wrote: "Andre Vieira (lists)" writes: This patch makes sure we do not give ABI change diagnostics for the ABI breaks of GCC 9, 13 and 14 for any type involving _BitInt(N), since that type did not exist before this GCC version.

Re: [PATCHv3 2/2] aarch64: Add support for _BitInt

2024-04-10 Thread Andre Vieira (lists)
Added the target check, also had to change some of the assembly checking due to changes upstream, the assembly is still valid, but we do extend where not necessary, I do believe that's a general issue though. The _BitInt(N > 64) codegen for non-powers of 2 did get worse, we see similar

[PATCH][wwwdocs] gcc-14/changes.html: Update _BitInt to include AArch64 (little-endian)

2024-04-10 Thread Andre Vieira (lists)
Hi, Patch to add AArch64 to the list of supported _BitInt(N) in gcc-14/changes.html. OK? diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html index a7ba957110183f906938d935bfa17aaed2ba20c8..55ab8c14c6d0b54e05a5f266f25c8ef1a4f959bf 100644 --- a/htdocs/gcc-14/changes.html +++

Re: [PATCH 1/7] arm: Auto-vectorization for MVE: vand

2020-11-27 Thread Andre Vieira (lists) via Gcc-patches
Hi Christophe, On 26/11/2020 15:31, Christophe Lyon wrote: Hi Andre, Thanks for the quick feedback. On Wed, 25 Nov 2020 at 18:17, Andre Simoes Dias Vieira wrote: Hi Christophe, Thanks for these! See some inline comments. On 25/11/2020 13:54, Christophe Lyon via Gcc-patches wrote: This

Re: [PATCH 3/7] arm: Auto-vectorization for MVE: veor

2020-11-26 Thread Andre Vieira (lists) via Gcc-patches
LGTM,  but please wait for maintainer review. On 25/11/2020 13:54, Christophe Lyon via Gcc-patches wrote: This patch enables MVE veorq instructions for auto-vectorization. MVE veorq insns in mve.md are modified to use xor instead of unspec expression to support xor3. The xor3 expander is

Re: RFC: ARM MVE and Neon auto-vectorization

2020-12-09 Thread Andre Vieira (lists) via Gcc-patches
On 08/12/2020 13:50, Christophe Lyon via Gcc-patches wrote: Hi, My 'vand' patch changes the definition of VDQ so that the relevant modes are enabled only when !TARGET_HAVE_MVE (V8QI, ...), and this helps writing a simpler expander. However, vneg is used by vshr (right-shifts by register are

[AArch64] Fix vector multiplication costs

2021-02-03 Thread Andre Vieira (lists) via Gcc-patches
This patch introduces a vect.mul RTX cost and decouples the vector multiplication costing from the scalar one. After Wilco's "AArch64: Add cost table for Cortex-A76" patch we saw a regression in vector codegen. Reproducible with the small test added in this patch. Upon further investigation

Re: [PATCH] arm: Fix up neon_vector_mem_operand [PR97528]

2021-02-03 Thread Andre Vieira (lists) via Gcc-patches
Same patch applies cleanly on gcc-8, bootstrapped arm-none-linux-gnueabihf and ran regressions also clean. Can I also commit it to gcc-8? Thanks, Andre On 02/02/2021 17:36, Kyrylo Tkachov wrote: -Original Message- From: Andre Vieira (lists) Sent: 02 February 2021 17:27 To: gcc

Re: [PATCH] arm: Fix up neon_vector_mem_operand [PR97528]

2021-02-02 Thread Andre Vieira (lists) via Gcc-patches
Hi, This is a gcc-9 backport of the PR97528 fix that has been applied to trunk and gcc-10. Bootstrapped on arm-linux-gnueabihf and regression tested. OK for gcc-9 branch? 2021-02-02  Andre Vieira      Backport from mainline     2020-11-20  Jakub Jelinek      PR target/97528     *

Re: [RFC] Using main loop's updated IV as base_address for epilogue vectorization

2021-06-16 Thread Andre Vieira (lists) via Gcc-patches
On 14/06/2021 11:57, Richard Biener wrote: On Mon, 14 Jun 2021, Richard Biener wrote: Indeed. For example a simple int a[1024], b[1024], c[1024]; void foo(int n) { for (int i = 0; i < n; ++i) a[i+1] += c[i+i] ? b[i+1] : 0; } should usually see peeling for alignment (though on x86

[RFC][ivopts] Generate better code for IVs with uses outside the loop

2021-06-10 Thread Andre Vieira (lists) via Gcc-patches
On 08/06/2021 16:00, Andre Simoes Dias Vieira via Gcc-patches wrote: Hi Bin, Thank you for the reply, I have some questions, see below. On 07/06/2021 12:28, Bin.Cheng wrote: On Fri, Jun 4, 2021 at 12:35 AM Andre Vieira (lists) via Gcc-patches wrote: Hi Andre, I didn't look

Re: [RFC] Using main loop's updated IV as base_address for epilogue vectorization

2021-06-14 Thread Andre Vieira (lists) via Gcc-patches
Hi, On 20/05/2021 11:22, Richard Biener wrote: On Mon, 17 May 2021, Andre Vieira (lists) wrote: Hi, So this is my second attempt at finding a way to improve how we generate the vector IV's and teach the vectorizer to share them between main loop and epilogues. On IRC we discussed my idea

Re: [PATCH][AArch64]: Use UNSPEC_LD1_SVE for all LD1 loads

2021-05-18 Thread Andre Vieira (lists) via Gcc-patches
Hi, Using aarch64_pred_mov for these was tricky as it did both store and load. Furthermore there was some concern it might allow for a predicated mov to end up as a mem -> mem and a predicated load being wrongfully reloaded to a full-load to register. So instead we decided to let the

Re: [RFC] Using main loop's updated IV as base_address for epilogue vectorization

2021-05-17 Thread Andre Vieira (lists) via Gcc-patches
Hi, So this is my second attempt at finding a way to improve how we generate the vector IV's and teach the vectorizer to share them between main loop and epilogues. On IRC we discussed my idea to use the loop's control_iv, but that was a terrible idea and I quickly threw it in the bin. The

[PATCH][AArch64]: Use UNSPEC_LD1_SVE for all LD1 loads

2021-05-14 Thread Andre Vieira (lists) via Gcc-patches
Hi, I noticed we were missing out on LD1 + UXT combinations in some cases and found it was because of inconsistent use of the unspec enum UNSPEC_LD1_SVE. The combine pattern for LD1[S][BHWD] uses UNSPEC_LD1_SVE whereas one of the LD1 expanders was using UNSPEC_PRED_X. I wasn't sure whether

[PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue

2021-05-24 Thread Andre Vieira (lists) via Gcc-patches
Hi, When vectorizing with --param vect-partial-vector-usage=1 the vectorizer uses an unpredicated (all-true predicate for SVE) main loop and a predicated tail loop. The way this was implemented seems to mean it re-uses the same vector-mode for both loops, which means the tail loop isn't an
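The loop structure described in this message (an unpredicated main loop followed by a predicated tail loop) can be pictured with a scalar model. This is only a sketch, not vectorizer output: the function name `add_arrays`, the constant `VF` and the element-wise add are assumptions chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical vectorization factor for this model; not taken from the patch.  */
enum { VF = 4 };

/* Scalar model of c[i] = a[i] + b[i], split into an unpredicated main
   loop and a single predicated tail step.  */
void
add_arrays (int *c, const int *a, const int *b, size_t n)
{
  size_t i = 0;

  /* Main loop: every lane is active, so no mask is needed.  */
  for (; i + VF <= n; i += VF)
    for (size_t lane = 0; lane < VF; ++lane)
      c[i + lane] = a[i + lane] + b[i + lane];

  /* Tail: one more vector-sized step in which a per-lane mask
     disables the lanes that would run past n.  */
  if (i < n)
    for (size_t lane = 0; lane < VF; ++lane)
      {
        int active = i + lane < n;
        if (active)
          c[i + lane] = a[i + lane] + b[i + lane];
      }
}
```

With n = 7 and VF = 4, the main loop handles elements 0 to 3 and the tail handles elements 4 to 6 with its last lane masked off, so no scalar epilogue is needed.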

[RFC] Implementing detection of saturation and rounding arithmetic

2021-06-03 Thread Andre Vieira (lists) via Gcc-patches
Hi, This RFC is motivated by the IV sharing RFC in https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569502.html and the need to have the IVOPTS pass be able to clean up IV's shared between multiple loops. When creating a similar problem with C code I noticed IVOPTs treated IV's with uses

Re: [PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue

2021-06-03 Thread Andre Vieira (lists) via Gcc-patches
Thank you Kewen!! I will apply this now. BR, Andre On 25/05/2021 09:42, Kewen.Lin wrote: on 2021/5/24 3:21 PM, Kewen.Lin via Gcc-patches wrote: Hi Andre, on 2021/5/24 2:17 PM, Andre Vieira (lists) via Gcc-patches wrote: Hi, When vectorizing with --param vect-partial-vector-usage=1

[RFC][ivopts] Generate better code for IVs with uses outside the loop (was Re: [RFC] Implementing detection of saturation and rounding arithmetic)

2021-06-03 Thread Andre Vieira (lists) via Gcc-patches
Streams got crossed there and used the wrong subject ... On 03/06/2021 17:34, Andre Vieira (lists) via Gcc-patches wrote: Hi, This RFC is motivated by the IV sharing RFC in https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569502.html and the need to have the IVOPTS pass be able to clean up

[RFC] Using main loop's updated IV as base_address for epilogue vectorization

2021-04-30 Thread Andre Vieira (lists) via Gcc-patches
Hi, The aim of this RFC is to explore a way of cleaning up the codegen around data_references.  To be specific, I'd like to reuse the main-loop's updated data_reference as the base_address for the epilogue's corresponding data_reference, rather than use the niters.  We have found this leads

Re: [RFC] Using main loop's updated IV as base_address for epilogue vectorization

2021-05-05 Thread Andre Vieira (lists) via Gcc-patches
? On 04/05/2021 10:56, Richard Biener wrote: On Fri, 30 Apr 2021, Andre Vieira (lists) wrote: Hi, The aim of this RFC is to explore a way of cleaning up the codegen around data_references.  To be specific, I'd like to reuse the main-loop's updated data_reference as the base_address for the e

Re: [RFC] Using main loop's updated IV as base_address for epilogue vectorization

2021-05-05 Thread Andre Vieira (lists) via Gcc-patches
On 05/05/2021 13:34, Richard Biener wrote: On Wed, 5 May 2021, Andre Vieira (lists) wrote: I tried to see what IVOPTs would make of this and it is able to analyze the IVs but it doesn't realize (not even sure it tries) that one IV's end (loop 1) could be used as the base for the other (loop

Re: [PATCH 6/9] arm: Auto-vectorization for MVE: vcmp

2021-05-04 Thread Andre Vieira (lists) via Gcc-patches
Hi Christophe, On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote: Since MVE has a different set of vector comparison operators from Neon, we have to update the expansion to take into account the new ones, for instance 'NE' for which MVE does not require to use 'EQ' with the inverted

Re: [PATCH 7/9] arm: Auto-vectorization for MVE: add __fp16 support to VCMP

2021-05-04 Thread Andre Vieira (lists) via Gcc-patches
It would be good to also add tests for NEON as you also enable auto-vec for it. I checked and I do think the necessary 'neon_vc' patterns exist for 'VH', so we should be OK there. On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote: This patch adds __fp16 support to the previous patch

Re: [PATCH 9/9] arm: Auto-vectorization for MVE: vld4/vst4

2021-05-04 Thread Andre Vieira (lists) via Gcc-patches
Hi Christophe, The series LGTM but you'll need the approval of an arm port maintainer before committing. I only did code-review, did not try to build/run tests. Kind regards, Andre On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote: This patch enables MVE vld4/vst4 instructions for

PR98974: Fix vectorizable_condition after STMT_VINFO_VEC_STMTS

2021-02-05 Thread Andre Vieira (lists) via Gcc-patches
Hi, As mentioned in the PR, this patch fixes up the nvectors parameter passed to vect_get_loop_mask in vectorizable_condition. Before the STMT_VINFO_VEC_STMTS rework we used to handle each ncopy separately, now we gather them all at the same time and don't need to multiply vec_num with

Re: PR98974: Fix vectorizable_condition after STMT_VINFO_VEC_STMTS

2021-02-05 Thread Andre Vieira (lists) via Gcc-patches
On 05/02/2021 12:47, Richard Sandiford wrote: "Andre Vieira (lists)" writes: Hi, As mentioned in the PR, this patch fixes up the nvectors parameter passed to vect_get_loop_mask in vectorizable_condition. Before the STMT_VINFO_VEC_STMTS rework we used to handle each ncopy separa

Re: [PATCH][PR98791]: IRA: Make sure allocno copy mode's are ordered

2021-03-10 Thread Andre Vieira (lists) via Gcc-patches
On 19/02/2021 15:05, Vladimir Makarov wrote: On 2021-02-19 5:53 a.m., Andre Vieira (lists) wrote: Hi, This patch makes sure that allocno copies are not created for unordered modes. The testcases in the PR highlighted a case where an allocno copy was being created for: (insn 121 120 123

[AArch64] PR98657: Fix vec_duplicate creation in SVE's 3

2021-02-17 Thread Andre Vieira (lists) via Gcc-patches
Hi, This patch prevents generating a vec_duplicate with illegal predicate. Regression tested on aarch64-linux-gnu. OK for trunk? gcc/ChangeLog: 2021-02-17  Andre Vieira      PR target/98657     * config/aarch64/aarch64-sve.md: Use 'expand_vector_broadcast' to emit vec_duplicate's  

[PATCH][PR98791]: IRA: Make sure allocno copy mode's are ordered

2021-02-19 Thread Andre Vieira (lists) via Gcc-patches
Hi, This patch makes sure that allocno copies are not created for unordered modes. The testcases in the PR highlighted a case where an allocno copy was being created for: (insn 121 120 123 11 (parallel [     (set (reg:VNx2QI 217)     (vec_duplicate:VNx2QI (subreg/s/v:QI

Re: [PATCH][PR98791]: IRA: Make sure allocno copy mode's are ordered

2021-02-22 Thread Andre Vieira (lists) via Gcc-patches
Hi Alex, On 22/02/2021 10:20, Alex Coplan wrote: For the testcase, you might want to use the one I posted most recently: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98791#c3 which avoids the dependency on the aarch64-autovec-preference param (which is in GCC 11 only) as this will simplify

[PATCH 1/3][vect] Add main vectorized loop unrolling

2021-09-17 Thread Andre Vieira (lists) via Gcc-patches
Hi all, This patch adds the ability to define a target hook to unroll the main vectorized loop. It also introduces --param's vect-unroll and vect-unroll-reductions to control this through a command-line. I found this useful to experiment and believe can help when tuning, so I decided to

[PATCH 0/3][vect] Enable vector unrolling of main loop

2021-09-17 Thread Andre Vieira (lists) via Gcc-patches
Hi all, This patch series enables unrolling of an unpredicated main vectorized loop based on a target hook. The epilogue loop will have (at least) half the VF of the main loop and can be predicated. Andre Vieira (3): [vect] Add main vectorized loop unrolling [vect] Consider outside costs

[PATCH 2/3][vect] Consider outside costs earlier for epilogue loops

2021-09-17 Thread Andre Vieira (lists) via Gcc-patches
Hi, This patch changes the order in which we check outside and inside costs for epilogue loops, this is to ensure that a predicated epilogue is more likely to be picked over an unpredicated one, since it saves having to enter a scalar epilogue loop. gcc/ChangeLog:     *

Re: [arm] Fix MVE addressing modes for VLDR[BHW] and VSTR[BHW]

2021-10-13 Thread Andre Vieira (lists) via Gcc-patches
On 13/10/2021 13:37, Kyrylo Tkachov wrote: Hi Andre, @@ -24276,7 +24271,7 @@ arm_print_operand (FILE *stream, rtx x, int code) else if (code == POST_MODIFY || code == PRE_MODIFY) { asm_fprintf (stream, "[%r", REGNO (XEXP (addr, 0))); - postinc_reg =

Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling

2021-10-12 Thread Andre Vieira (lists) via Gcc-patches
.     * tree-vect-slp.c (vect_bb_vectorization_profitable_p): Adjust call to finish_cost.     * tree-vectorizer.h (finish_cost): Change to pass new class vec_info parameter. On 01/10/2021 09:19, Richard Biener wrote: On Thu, 30 Sep 2021, Andre Vieira (lists) wrote: Hi, That just forces

[arm] Fix MVE addressing modes for VLDR[BHW] and VSTR[BHW]

2021-10-12 Thread Andre Vieira (lists) via Gcc-patches
Hi, The way we were previously dealing with addressing modes for MVE was preventing the use of pre, post and offset addressing modes for the normal loads and stores, including widening and narrowing.  This patch fixes that and adds tests to ensure we are capable of using all the available

Re: [PATCH 2/3][vect] Consider outside costs earlier for epilogue loops

2021-10-14 Thread Andre Vieira (lists) via Gcc-patches
Hi, I completely forgot I still had this patch out as well, I grouped it together with the unrolling because it was what motivated the change, but it is actually wider applicable and can be reviewed separately. On 17/09/2021 16:32, Andre Vieira (lists) via Gcc-patches wrote: Hi, This patch

Re: FW: [PING] Re: [Patch][GCC][middle-end] - Generate FRINTZ for (double)(int) under -ffast-math on aarch64

2021-10-20 Thread Andre Vieira (lists) via Gcc-patches
On 19/10/2021 00:22, Joseph Myers wrote: On Fri, 15 Oct 2021, Richard Biener via Gcc-patches wrote: On Fri, Sep 24, 2021 at 2:59 PM Jirui Wu via Gcc-patches wrote: Hi, Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577846.html The patch is attached as text for ease of use. Is

Re: [Patch][GCC][middle-end] - Lower store and load neon builtins to gimple

2021-10-20 Thread Andre Vieira (lists) via Gcc-patches
On 27/09/2021 12:54, Richard Biener via Gcc-patches wrote: On Mon, 27 Sep 2021, Jirui Wu wrote: Hi all, I now use the type based on the specification of the intrinsic instead of type based on formal argument. I use signed Int vector types because the outputs of the neon builtins that I am

Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling

2021-10-20 Thread Andre Vieira (lists) via Gcc-patches
On 15/10/2021 09:48, Richard Biener wrote: On Tue, 12 Oct 2021, Andre Vieira (lists) wrote: Hi Richi, I think this is what you meant, I now hide all the unrolling cost calculations in the existing target hooks for costs. I did need to adjust 'finish_cost' to take the loop_vinfo so

Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling

2021-09-30 Thread Andre Vieira (lists) via Gcc-patches
Hi, That just forces trying the vector modes we've tried before. Though I might need to revisit this now I think about it. I'm afraid it might be possible for this to generate an epilogue with a vf that is not lower than that of the main loop, but I'd need to think about this again. Either

Re: [PATCH 1/3][vect] Add main vectorized loop unrolling

2021-09-21 Thread Andre Vieira (lists) via Gcc-patches
Hi Richi, Thanks for the review, see below some questions. On 21/09/2021 13:30, Richard Biener wrote: On Fri, 17 Sep 2021, Andre Vieira (lists) wrote: Hi all, This patch adds the ability to define a target hook to unroll the main vectorized loop. It also introduces --param's vect-unroll

Re: [vect] Re-analyze all modes for epilogues

2021-12-17 Thread Andre Vieira (lists) via Gcc-patches
Made the suggested changes. Regarding the name change to partial vectors, I agree with the name change since that is the terminology we are using in the loop_vinfo members too, but is there an actual difference between predication/masking and partial vectors that I am missing? OK for trunk?

Re: [AArch64] Enable generation of FRINTNZ instructions

2021-12-29 Thread Andre Vieira (lists) via Gcc-patches
Hi Richard, Thank you for the review, I've adopted all above suggestions downstream, I am still surprised how many style things I still miss after years of gcc development :( On 17/12/2021 12:44, Richard Sandiford wrote: @@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo,

Re: [AArch64] Enable generation of FRINTNZ instructions

2021-11-17 Thread Andre Vieira (lists) via Gcc-patches
On 16/11/2021 12:10, Richard Biener wrote: On Fri, 12 Nov 2021, Andre Simoes Dias Vieira wrote: On 12/11/2021 10:56, Richard Biener wrote: On Thu, 11 Nov 2021, Andre Vieira (lists) wrote: Hi, This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding optabs and mappings

Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling

2021-11-11 Thread Andre Vieira (lists) via Gcc-patches
Hi, This is the rebased and reworked version of the unroll patch.  I wasn't entirely sure whether I should compare the costs of the unrolled loop_vinfo with the original loop_vinfo it was unrolled of. I did now, but I wasn't too sure whether it was a good idea to... Any thoughts on this?

[AArch64] Enable generation of FRINTNZ instructions

2021-11-11 Thread Andre Vieira (lists) via Gcc-patches
Hi, This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding optabs and mappings. It also creates a backend pattern to implement them for aarch64 and a match.pd pattern to idiom recognize these. These IFN's (and optabs) represent a truncation towards zero, as if performed by
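As a rough illustration of "truncation towards zero", one C idiom with that behaviour is a float-to-integer-to-float round trip. The sketch below assumes that idiom; the function names are hypothetical, and whether this is exactly the form the match.pd pattern recognises is not stated in the visible part of the message.

```c
#include <stdint.h>

/* Round trip through a 32-bit integer: truncates towards zero for
   inputs that fit in int32_t (out-of-range inputs are undefined in C).  */
double
trunc_via_int32 (double x)
{
  return (double) (int32_t) x;
}

/* Same idea with a 64-bit intermediate integer.  */
double
trunc_via_int64 (double x)
{
  return (double) (int64_t) x;
}
```

For example, `trunc_via_int32 (2.7)` yields 2.0 and `trunc_via_int32 (-2.7)` yields -2.0, since the conversion to integer discards the fractional part.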

[committed][AArch64] Fix bootstrap failure due to missing ATTRIBUTE_UNUSED

2021-11-10 Thread Andre Vieira (lists) via Gcc-patches
Hi, Committed this as obvious. My earlier patch removed the need for the GSI to be used. gcc/ChangeLog:     * config/aarch64/aarch64-builtins.c     (aarch64_general_gimple_fold_builtin): Mark argument as unused. diff --git a/gcc/config/aarch64/aarch64-builtins.c

Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling

2021-11-24 Thread Andre Vieira (lists) via Gcc-patches
On 22/11/2021 12:39, Richard Biener wrote: + if (first_loop_vinfo->suggested_unroll_factor > 1) +{ + if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, +

Re: [AArch64] Enable generation of FRINTNZ instructions

2021-11-29 Thread Andre Vieira (lists) via Gcc-patches
On 18/11/2021 11:05, Richard Biener wrote: + (if (!flag_trapping_math + && direct_internal_fn_supported_p (IFN_TRUNC, type, +OPTIMIZE_FOR_BOTH)) + (IFN_TRUNC @0) #endif does IFN_FTRUNC_INT preserve the same exceptions as

Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling

2021-11-25 Thread Andre Vieira (lists) via Gcc-patches
On 24/11/2021 11:00, Richard Biener wrote: On Wed, 24 Nov 2021, Andre Vieira (lists) wrote: On 22/11/2021 12:39, Richard Biener wrote: + if (first_loop_vinfo->suggested_unroll_factor > 1) +{ + if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop

Re: [AArch64] Enable generation of FRINTNZ instructions

2021-11-25 Thread Andre Vieira (lists) via Gcc-patches
On 22/11/2021 11:41, Richard Biener wrote: On 18/11/2021 11:05, Richard Biener wrote: This is a good shout and made me think about something I hadn't before... I thought I could handle the vector forms later, but the problem is if I add support for the scalar, it will stop the vectorizer. It

Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling

2021-11-22 Thread Andre Vieira (lists) via Gcc-patches
On 12/11/2021 13:12, Richard Biener wrote: On Thu, 11 Nov 2021, Andre Vieira (lists) wrote: Hi, This is the rebased and reworked version of the unroll patch.  I wasn't entirely sure whether I should compare the costs of the unrolled loop_vinfo with the original loop_vinfo it was unrolled

Re: [AArch64] Enable generation of FRINTNZ instructions

2021-11-22 Thread Andre Vieira (lists) via Gcc-patches
On 18/11/2021 11:05, Richard Biener wrote: @@ -3713,12 +3713,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) trapping behaviour, so require !flag_trapping_math. */ #if GIMPLE (simplify - (float (fix_trunc @0)) - (if (!flag_trapping_math - && types_match (type, TREE_TYPE (@0)) -

[Aarch64] Fix alignment of neon loads & stores in gimple

2021-10-25 Thread Andre Vieira (lists) via Gcc-patches
Hi, This fixes the alignment on the memory access type for neon loads & stores in the gimple lowering. Bootstrap ubsan on aarch64 builds again with this change. 2021-10-25  Andre Vieira  gcc/ChangeLog:     * config/aarch64/aarch64-builtins.c (aarch64_general_gimple_fold_builtin):

Re: [AArch64] Fix NEON load/store gimple lowering and big-endian testisms

2021-11-09 Thread Andre Vieira (lists) via Gcc-patches
Thank you both! Here is a reworked version, this OK for trunk? diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index a815e4cfbccab692ca688ba87c71b06c304abbfb..e06131a7c61d31c1be3278dcdccc49c3053c78cb 100644 --- a/gcc/config/aarch64/aarch64-builtins.c

[AArch64] Fix big-endian testisms introduced by NEON gimple lowering patch

2021-11-09 Thread Andre Vieira (lists) via Gcc-patches
Decided to split the patches up to make it clear that the testisms fixes had nothing to do with the TBAA fix. I'll be committing these two separately First: [AArch64] Fix big-endian testisms introduced by NEON gimple lowering patch This patch reverts the tests for big-endian after the NEON

[AArch64] Fix TBAA information when lowering NEON loads and stores to gimple

2021-11-09 Thread Andre Vieira (lists) via Gcc-patches
And second (also added a test): [AArch64] Fix TBAA information when lowering NEON loads and stores to gimple This patch fixes the wrong TBAA information when lowering NEON loads and stores to gimple that showed up when bootstrapping with UBSAN. gcc/ChangeLog:     *

[AArch64] Fix NEON load/store gimple lowering and big-endian testisms

2021-11-04 Thread Andre Vieira (lists) via Gcc-patches
Hi, This should address the ubsan bootstrap build and big-endian testisms reported against the last NEON load/store gimple lowering patch. I also fixed a follow-up issue where the alias information was leading to a bad codegen transformation. The NEON intrinsics specifications do not forbid

Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling

2021-11-30 Thread Andre Vieira (lists) via Gcc-patches
On 25/11/2021 12:46, Richard Biener wrote: Oops, my fault, yes, it does. I would suggest to refactor things so that the mode_i = first_loop_i case is there only once. I also wonder if all the argument about starting at 0 doesn't apply to the not unrolled

Re: [vect] Re-analyze all modes for epilogues

2021-12-07 Thread Andre Vieira (lists) via Gcc-patches
On 07/12/2021 11:45, Richard Biener wrote: Can you check whether, given we know the main VF, the epilogue analysis does not start with an autodetected vector mode that needs a too large VF? Hmm struggling to see how we could check this here. AFAIU before we analyze the loop for a given

Re: [vect] Re-analyze all modes for epilogues

2021-12-13 Thread Andre Vieira (lists) via Gcc-patches
Hi, Added an extra step to skip unusable epilogue modes when we know the target does not support predication. This uses a new function 'support_predication_p' that is generated at build time and checks whether the target supports at least one optab that can be used for predicated

[vect] Re-analyze all modes for epilogues

2021-12-07 Thread Andre Vieira (lists) via Gcc-patches
(vect_better_loop_vinfo_p): Round factors up for epilogue costing.     (vect_analyze_loop): Re-analyze all modes for epilogues. gcc/testsuite/ChangeLog:     * gcc.target/aarch64/masked_epilogue.c: New test. On 30/11/2021 13:56, Richard Biener wrote: On Tue, 30 Nov 2021, Andre Vieira (lists

Re: [vect] Re-analyze all modes for epilogues

2021-12-07 Thread Andre Vieira (lists) via Gcc-patches
): Add new member m_suggested_unroll_factor.     (vector_costs::suggested_unroll_factor): New getter function.     (finish_cost): Set return argument suggested_unroll_factor. Regards, Andre On 07/12/2021 11:27, Andre Vieira (lists) via Gcc-patches wrote: Hi, I've split this particular

Re: [AArch64] Enable generation of FRINTNZ instructions

2021-12-07 Thread Andre Vieira (lists) via Gcc-patches
ping On 25/11/2021 13:53, Andre Vieira (lists) via Gcc-patches wrote: On 22/11/2021 11:41, Richard Biener wrote: On 18/11/2021 11:05, Richard Biener wrote: This is a good shout and made me think about something I hadn't before... I thought I could handle the vector forms later

Re: [vect] PR103971, PR103977: Fix epilogue mode selection for autodetect only

2022-01-12 Thread Andre Vieira (lists) via Gcc-patches
On 12/01/2022 11:59, Richard Biener wrote: On Wed, 12 Jan 2022, Andre Vieira (lists) wrote: On 12/01/2022 11:44, Richard Sandiford wrote: Another alternative would be to push autodetected_vector_mode when the length is 1 and keep 1 as the starting point. Richard I'm guessing we would still

[vect] PR103971, PR103977: Fix epilogue mode selection for autodetect only

2022-01-12 Thread Andre Vieira (lists) via Gcc-patches
Hi, This is a fix for the regression caused by '[vect] Re-analyze all modes for epilogues'. The earlier patch assumed there was always at least one other mode than VOIDmode, but that does not need to be the case. If we are dealing with a target that does not define more modes for

Re: [vect] PR103971, PR103977: Fix epilogue mode selection for autodetect only

2022-01-12 Thread Andre Vieira (lists) via Gcc-patches
On 12/01/2022 11:44, Richard Sandiford wrote: Another alternative would be to push autodetected_vector_mode when the length is 1 and keep 1 as the starting point. Richard I'm guessing we would still want to skip epilogue vectorization if !VECTOR_MODE_P (autodetected_vector_mode) in that

Re: [vect] PR103997: Fix epilogue mode skipping

2022-01-13 Thread Andre Vieira (lists) via Gcc-patches
On 13/01/2022 12:36, Richard Biener wrote: On Thu, 13 Jan 2022, Andre Vieira (lists) wrote: This time to the list too (sorry for double email) Hi, The original patch '[vect] Re-analyze all modes for epilogues', skipped modes that should not be skipped since it used the vector mode provided

Re: [vect] PR103997: Fix epilogue mode skipping

2022-01-13 Thread Andre Vieira (lists) via Gcc-patches
On 13/01/2022 14:25, Richard Biener wrote: On Thu, 13 Jan 2022, Andre Vieira (lists) wrote: On 13/01/2022 12:36, Richard Biener wrote: On Thu, 13 Jan 2022, Andre Vieira (lists) wrote: This time to the list too (sorry for double email) Hi, The original patch '[vect] Re-analyze all modes

Re: [vect] PR103997: Fix epilogue mode skipping

2022-01-14 Thread Andre Vieira (lists) via Gcc-patches
On 14/01/2022 07:08, Richard Biener wrote: On Thu, 13 Jan 2022, Andre Vieira (lists) wrote: On 13/01/2022 14:25, Richard Biener wrote: On Thu, 13 Jan 2022, Andre Vieira (lists) wrote: On 13/01/2022 12:36, Richard Biener wrote: On Thu, 13 Jan 2022, Andre Vieira (lists) wrote: This time

Re: [vect] PR103971, PR103977: Fix epilogue mode selection for autodetect only

2022-01-12 Thread Andre Vieira (lists) via Gcc-patches
On 12/01/2022 12:57, Richard Biener wrote: On Wed, 12 Jan 2022, Andre Vieira (lists) wrote: On 12/01/2022 11:59, Richard Biener wrote: On Wed, 12 Jan 2022, Andre Vieira (lists) wrote: On 12/01/2022 11:44, Richard Sandiford wrote: Another alternative would be to push

Re: [AArch64] Enable generation of FRINTNZ instructions

2022-01-10 Thread Andre Vieira (lists) via Gcc-patches
.     * gcc.target/aarch64/frintnz_vec.c: New test. On 03/01/2022 12:18, Richard Biener wrote: On Wed, 29 Dec 2021, Andre Vieira (lists) wrote: Hi Richard, Thank you for the review, I've adopted all above suggestions downstream, I am still surprised how many style things I still miss after years of gcc

Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling

2022-01-10 Thread Andre Vieira (lists) via Gcc-patches
.     (finish_cost): Set return argument suggested_unroll_factor. Regards, Andre On 30/11/2021 13:56, Richard Biener wrote: On Tue, 30 Nov 2021, Andre Vieira (lists) wrote: On 25/11/2021 12:46, Richard Biener wrote: Oops, my fault, yes, it does. I would suggest to refactor things so that the mode_i

Re: [arm] MVE: Relax addressing modes for full loads and stores

2022-03-07 Thread Andre Vieira (lists) via Gcc-patches
On 17/01/2022 07:48, Christophe Lyon wrote: Hi André, On Fri, Jan 14, 2022 at 6:03 PM Andre Vieira (lists) via Gcc-patches wrote: Hi Christophe, This patch relaxes the addressing modes for the mve full load and stores (by full loads and stores I mean non-widening

[aarch64] Enable FP16 feature by default for Armv9

2022-03-08 Thread Andre Vieira (lists) via Gcc-patches
Hi all, This patch adds the feature bit for FP16 to the feature set for Armv9 since Armv9 requires SVE to be implemented and SVE requires FP16 to be implemented. 2022-03-04  Andre Vieira      * config/aarch64/aarch64.h (AARCH64_FL_FOR_ARCH9): Add FP16 feature bit. diff --git

[aarch64] Update reg-costs to differentiate between memmove costs

2022-03-16 Thread Andre Vieira (lists) via Gcc-patches
This patch introduces a struct to differentiate between different memmove costs to enable a better modeling of memory operations. These have been modelled for -mcpu/-mtune=neoverse-v1/neoverse-n1/neoverse-n2/neoverse-512tvb, for all other tunings all entries are equal to the old single memmove

[aarch64] Add Neoverse N2 tuning structs

2022-03-16 Thread Andre Vieira (lists) via Gcc-patches
Hi, This patch adds tuning structures for Neoverse N2. 2022-03-16  Tamar Christina                 Andre Vieira     * config/aarch64/aarch64.cc (neoversen2_addrcost_table, neoversen2_regmove_cost,     neoversen2_advsimd_vector_cost, neoversen2_sve_vector_cost,

[aarch64] Update Neoverse N2 core definition

2022-03-16 Thread Andre Vieira (lists) via Gcc-patches
Hi, As requested, I updated the Neoverse N2 entry to use the AARCH64_FL_FOR_ARCH9 feature set, removed duplicate entries, updated the ARCH_INDENT to 9A and moved it under the Armv9 cores. gcc/ChangeLog:     * config/aarch64/aarch64-cores.def: Update Neoverse N2 core entry. diff --git

[aarch64] Add Demeter tuning structs

2022-03-16 Thread Andre Vieira (lists) via Gcc-patches
Hi, This patch adds tuning structs for -mcpu/-mtune=demeter. 2022-03-16  Tamar Christina     Andre Vieira     * config/aarch64/aarch64.cc (demeter_addrcost_table, demeter_regmove_cost,     demeter_advsimd_vector_cost, demeter_sve_vector_cost,

[aarch64] Implement determine_suggested_unroll_factor

2022-03-16 Thread Andre Vieira (lists) via Gcc-patches
Hi, This patch implements the costing function determine_suggested_unroll_factor for aarch64. It determines the unrolling factor by dividing the number of X operations we can do per cycle by the number of X operations in the loop body, taking this information from the vec_ops analysis during
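
The ratio described above can be sketched as follows for a single kind of operation. This is only an illustration under assumptions: the function and parameter names are hypothetical and are not taken from the patch, and how the per-kind results are combined is not shown because the message is cut off here.

```cpp
#include <algorithm>

// Hypothetical illustration only; none of these names come from the patch.
// ops_per_cycle: operations of one kind the core can issue per cycle.
// ops_in_body:   operations of that kind in the vectorized loop body.
static unsigned
suggested_unroll_for_kind (unsigned ops_per_cycle, unsigned ops_in_body)
{
  // No operations of this kind in the loop: it gives no reason to unroll.
  if (ops_in_body == 0)
    return 1;

  // Integer division rounds down; a factor of 1 means "do not unroll",
  // so never return less than that.
  return std::max (1u, ops_per_cycle / ops_in_body);
}
```

For example, a core assumed to issue 4 such operations per cycle running a loop body that contains 2 of them would get a factor of 2 from this sketch.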

[aarch64] Update regmove costs for neoverse-v1 and neoverse-512tvb tunings

2022-03-16 Thread Andre Vieira (lists) via Gcc-patches
Hi, This patch updates the register move tunings for -mcpu/-mtune={neoverse-v1,neoverse-512tvb}. 2022-03-16  Tamar Christina     Andre Vieira     * config/aarch64/aarch64.cc (neoversev1_regmove_cost): New tuning struct.     (neoversev1_tunings): Use

[aarch64] update reg-costs to include predicate move costs

2022-03-08 Thread Andre Vieira (lists) via Gcc-patches
Hi, This patch adds predicate move costs to several SVE enabled cores. 2022-02-25  Tamar Christina     Andre Vieira gcc/ChangeLog:     * config/aarch64/aarch64-protos.h (struct cpu_regmove_cost): Add PR2PR member.     * config/aarch64/aarch64.cc

[PATCH][gcc][middle-end] PR104498: Fix comparing symbol reference

2022-02-16 Thread Andre Vieira (lists) via Gcc-patches
Hi, As reported on PR104498, the issue here is that compare_base_symbol_refs swaps x and y but doesn't take that into account when computing the distance. This patch makes sure that if x and y are swapped, we correct the distance computation by multiplying it by -1 to end up with the
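
The sign correction described above can be sketched in isolation. This is a toy example on plain integers, not the GCC code: the function name and the ordering rule that triggers the swap are assumptions made for illustration.

```cpp
#include <utility>

// Hypothetical stand-in for a routine that puts its operands into a
// canonical order before measuring the distance between them.
static long
signed_distance (long x, long y)
{
  bool swapped = false;

  // Canonicalize so that x <= y, remembering whether we swapped.
  if (x > y)
    {
      std::swap (x, y);
      swapped = true;
    }

  // Computed on the (possibly swapped) operands, so always >= 0 here.
  long distance = y - x;

  // Without this correction the caller would see the wrong sign
  // whenever the operands had been swapped.
  return swapped ? -distance : distance;
}
```

With the correction, calling the sketch with the operands in either order gives results that differ only in sign, which is what a caller that did not ask for the swap expects.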

Re: [aarch64] Implement determine_suggested_unroll_factor

2022-03-25 Thread Andre Vieira (lists) via Gcc-patches
"Andre Vieira (lists)" writes: Hi, This patch implements the costing function determine_suggested_unroll_factor for aarch64. It determines the unrolling factor by dividing the number of X operations we can do per cycle by the number of X operations in the loop body, taking this inform

Re: [aarch64] Update Neoverse N2 core definition

2022-03-24 Thread Andre Vieira (lists) via Gcc-patches
Ping. On 16/03/2022 15:00, Andre Vieira (lists) via Gcc-patches wrote: Hi, As requested, I updated the Neoverse N2 entry to use the AARCH64_FL_FOR_ARCH9 feature set, removed duplicate entries, updated the ARCH_INDENT to 9A and moved it under the Armv9 cores. gcc/ChangeLog

Re: [aarch64] Implement determine_suggested_unroll_factor

2022-03-31 Thread Andre Vieira (lists) via Gcc-patches
On 28/03/2022 15:59, Richard Sandiford wrote: "Andre Vieira (lists)" writes: Hi, Addressed all of your comments bar the pred ops one. Is this OK? gcc/ChangeLog:     * config/aarch64/aarch64.cc (aarch64_vector_costs): Define determine_suggested_unroll_factor and m_nos

[arm] MVE: Relax addressing modes for full loads and stores

2022-01-14 Thread Andre Vieira (lists) via Gcc-patches
Hi Christophe, This patch relaxes the addressing modes for the mve full load and stores (by full loads and stores I mean non-widening or narrowing loads and stores resp). The code before was requiring a LO_REGNUM for these, where this is only a requirement if the load is widening or the store

Re: [vect] PR103997: Fix epilogue mode skipping

2022-01-18 Thread Andre Vieira (lists) via Gcc-patches
On 14/01/2022 09:57, Richard Biener wrote: The 'used_vector_modes' is also a heuristic by itself since it registers every vector type we query, not only those that are used in the end ... So it's really all heuristics that can eventually go bad. IMHO remembering the VF that we ended up with

Re: [PATCH v3 07/15] arm: Implement MVE predicates as vectors of booleans

2022-01-21 Thread Andre Vieira (lists) via Gcc-patches
Hi Christophe, On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote: diff --git a/gcc/config/arm/arm-simd-builtin-types.def b/gcc/config/arm/arm-simd-builtin-types.def index 6ba6f211531..920c2a68e4c 100644 --- a/gcc/config/arm/arm-simd-builtin-types.def +++

Re: [PATCH v3 04/15] arm: Add GENERAL_AND_VPR_REGS regclass

2022-01-19 Thread Andre Vieira (lists) via Gcc-patches
Hi Christophe, On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote: At some point during the development of this patch series, it appeared that in some cases the register allocator wants “VPR or general” rather than “VPR or general or FP” (which is the same thing as ALL_REGS). The
