[PATCH]middle-end convert negate + right shift into compare greater.

2021-10-05 Thread Tamar Christina via Gcc-patches
Hi All, This turns a negation + arithmetic right shift into a comparison with 0. i.e. void fun1(int32_t *x, int n) { for (int i = 0; i < (n & -16); i++) x[i] = (-x[i]) >> 31; } now generates: .L3: ldr q0, [x0] cmgt v0.4s, v0.4s, #0
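A minimal scalar C sketch of the identity behind this fold, modelling one 32-bit lane. It assumes GCC's arithmetic right shift for negative signed values, which is implementation-defined in ISO C. It also assumes x != INT32_MIN, because negating INT32_MIN overflows and that overflow is undefined behaviour the compiler may assume away. The function names are hypothetical and only illustrate the lane-wise semantics, not GCC's implementation.

```c
#include <stdint.h>

/* Original form: negate, then arithmetic shift right by 31.
   Assumes x != INT32_MIN (negating it is undefined behaviour) and
   GCC's arithmetic right shift of negative signed values. */
int32_t neg_then_sshr31(int32_t x)
{
  return (-x) >> 31;   /* -1 when -x < 0, i.e. when x > 0; otherwise 0 */
}

/* Folded form: the per-lane result of cmgt #0,
   all ones when x > 0 and zero otherwise. */
int32_t cmgt_zero(int32_t x)
{
  return -(int32_t)(x > 0);
}
```

For every input other than INT32_MIN both functions agree. That is why the negate and shift pair can be replaced by a single compare against zero.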

RE: [PATCH 5/7]middle-end Convert bitclear + cmp #0 into cm

2021-09-30 Thread Tamar Christina via Gcc-patches
> -Original Message- > From: Richard Biener > Sent: Thursday, September 30, 2021 7:18 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd > Subject: Re: [PATCH 5/7]middle-end Convert bitclear + cmp #0 > into cm > > On Wed, 29 Sep 2021, Tamar Chr

[PATCH 7/7]AArch64 Combine cmeq 0 + not into cmtst

2021-09-29 Thread Tamar Christina via Gcc-patches
Hi All, This turns a bitwise inverse of an equality comparison with 0 into a compare of bitwise nonzero (cmtst). We already have one pattern for cmtst; this adds an additional one which does not require an additional bitwise and. i.e. #include uint8x8_t bar(int16x8_t abs_row0, int16x8_t
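A scalar C model of one 16-bit lane. It shows why `cmeq #0` followed by `not` gives the same mask as `cmtst` of a register with itself. The helper names are hypothetical, and the patch itself works on RTL combine patterns rather than C.

```c
#include <stdint.h>

/* cmeq #0 followed by not: per lane, ~(x == 0 ? all-ones : 0). */
int16_t cmeq0_then_not(int16_t x)
{
  int16_t eq = (x == 0) ? (int16_t)-1 : 0;   /* cmeq vN, vM, #0 */
  return (int16_t)~eq;                          /* not */
}

/* cmtst x, x: all ones when (x & x) != 0, i.e. when x != 0. */
int16_t cmtst_self(int16_t x)
{
  return ((x & x) != 0) ? (int16_t)-1 : 0;
}
```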

[PATCH 6/7]AArch64 Add neg + cmle into cmgt

2021-09-29 Thread Tamar Christina via Gcc-patches
Hi All, This turns a negation + arithmetic right shift into a comparison with 0. i.e. void fun1(int32_t *x, int n) { for (int i = 0; i < (n & -16); i++) x[i] = (-x[i]) >> 31; } now generates: .L3: ldr q0, [x0] cmgt v0.4s, v0.4s, #0

[PATCH 5/7]middle-end Convert bitclear + cmp #0 into cm

2021-09-29 Thread Tamar Christina via Gcc-patches
Hi All, This optimizes the case where a mask Y, which fulfills ~Y + 1 == pow2, is used to clear some bits and the result is then compared against 0. The masking is dropped and the value is instead compared against a different immediate. We can do this for all unsigned compares and for signed we can do it for comparisons
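A minimal scalar sketch of the unsigned case. It assumes a 32-bit value and a mask Y that clears the low k bits (0 <= k < 32). Because ~Y + 1 == 2^k, testing (x & Y) == 0 is the same as x < 2^k, so the masking disappears and only the compare immediate changes. The helper names are hypothetical, and the signed case has extra conditions that this sketch does not cover.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask Y clears the low k bits, so ~Y + 1 == 2^k.  Assumes k < 32. */
bool masked_is_zero(uint32_t x, unsigned k)
{
  uint32_t y = ~((UINT32_C(1) << k) - 1);   /* e.g. k = 4 -> 0xFFFFFFF0 */
  return (x & y) == 0;
}

/* Equivalent unsigned compare with no masking. */
bool below_pow2(uint32_t x, unsigned k)
{
  return x < (UINT32_C(1) << k);
}
```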

[PATCH 4/7]AArch64 Add pattern xtn+xtn2 to uzp2

2021-09-29 Thread Tamar Christina via Gcc-patches
Hi All, This turns truncate operations with a hi/lo pair into a single permute of half the bit size of the input and just ignoring the top bits (which are truncated out). i.e. void d2 (short * restrict a, int *b, int n) { for (int i = 0; i < n; i++) a[i] = b[i]; } now generates:
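A scalar sketch of why a truncating hi/lo pair can become a single permute. Truncating each 32-bit lane keeps its low 16 bits. On a little-endian target those are the even-indexed 16-bit halves when the concatenated input is viewed as 16-bit lanes. The function names, the 4-lane "vectors" and the little-endian host are assumptions of this sketch. The casts to int16_t rely on GCC's modulo conversion of out-of-range values.

```c
#include <stdint.h>
#include <string.h>

/* Truncate two 4-lane int32 "vectors" into 8 int16 lanes,
   the way an xtn + xtn2 pair does. */
void narrow_xtn(const int32_t lo[4], const int32_t hi[4], int16_t out[8])
{
  for (int i = 0; i < 4; i++)
    {
      out[i] = (int16_t)lo[i];       /* xtn: low halves of lo */
      out[i + 4] = (int16_t)hi[i];   /* xtn2: low halves of hi */
    }
}

/* Same result as one permute: view the 8 int32 lanes as 16 int16 lanes
   and keep the even-indexed ones (the low halves on little-endian). */
void narrow_even_halves(const int32_t lo[4], const int32_t hi[4], int16_t out[8])
{
  int32_t both[8];
  int16_t halves[16];
  memcpy(both, lo, sizeof(int32_t) * 4);
  memcpy(both + 4, hi, sizeof(int32_t) * 4);
  memcpy(halves, both, sizeof both);
  for (int i = 0; i < 8; i++)
    out[i] = halves[2 * i];
}
```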

[PATCH 3/7]AArch64 Add pattern for sshr to cmlt

2021-09-29 Thread Tamar Christina via Gcc-patches
Hi All, This optimizes signed right shift by BITSIZE-1 into a cmlt operation, which is faster because compares generally have a higher throughput than shifts. On AArch64 the result of the shift would have been either -1 or 0, which are exactly the results of the compare. i.e. void e (int * restrict
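A scalar C sketch of one 32-bit lane, showing that an arithmetic shift right by BITSIZE-1 produces the same value as `cmlt #0`. It assumes GCC's arithmetic right shift of negative signed values, and the function names are hypothetical.

```c
#include <stdint.h>

/* Arithmetic shift right by BITSIZE-1 smears the sign bit:
   -1 for negative x, 0 otherwise. */
int32_t sshr31(int32_t x)
{
  return x >> 31;
}

/* cmlt #0 per lane: all ones when x < 0, else zero. */
int32_t cmlt_zero(int32_t x)
{
  return -(int32_t)(x < 0);
}
```

Unlike the negate-then-shift case, this identity holds for every input, including INT32_MIN.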

[PATCH 2/7]AArch64 Add combine patterns for narrowing shift of half top bits (shuffle)

2021-09-29 Thread Tamar Christina via Gcc-patches
Hi All, When doing a (narrowing) right shift by half the width of the original type, we are essentially shuffling the top bits from the first number down. If we have a hi/lo pair we can just use a single shuffle instead of needing two shifts. i.e. typedef short int16_t; typedef unsigned
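A scalar sketch of the observation above. A narrowing right shift by half the element width keeps only the top half of each element. On a little-endian target that is the odd-indexed 16-bit half when the input is viewed as 16-bit lanes, so a shuffle gives the same result. The function names and the little-endian host are assumptions of this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Narrowing right shift by half the element width: keeps the top 16 bits. */
void shift_narrow_top(const uint32_t in[8], uint16_t out[8])
{
  for (int i = 0; i < 8; i++)
    out[i] = (uint16_t)(in[i] >> 16);
}

/* Same result as a shuffle: view the uint32 lanes as uint16 lanes and keep
   the odd-indexed ones (the top halves on a little-endian target). */
void shuffle_odd_halves(const uint32_t in[8], uint16_t out[8])
{
  uint16_t halves[16];
  memcpy(halves, in, sizeof halves);
  for (int i = 0; i < 8; i++)
    out[i] = halves[2 * i + 1];
}
```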

[PATCH 1/7]AArch64 Add combine patterns for right shift and narrow

2021-09-29 Thread Tamar Christina via Gcc-patches
Hi All, This adds a simple pattern for combining right shifts and narrows into shifted narrows. i.e. typedef short int16_t; typedef unsigned short uint16_t; void foo (uint16_t * restrict a, int16_t * restrict d, int n) { for( int i = 0; i < n; i++ ) d[i] = (a[i] * a[i]) >> 10; } now
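A scalar model of the loop body in this example, under stated assumptions. The product is widened, shifted right by 10 and then narrowed to 16 bits. A shifted narrow performs the last two steps in one instruction. The sketch computes the product in uint32_t to sidestep the signed overflow that int promotion allows for large inputs. The function name is hypothetical, and the final int16_t conversion relies on GCC's modulo behaviour for out-of-range values.

```c
#include <stdint.h>

/* Widen, square, shift right by 10, then keep the low 16 bits. */
int16_t square_shift_narrow(uint16_t a)
{
  uint32_t wide = (uint32_t)a * a;          /* widened product, no overflow */
  return (int16_t)(uint16_t)(wide >> 10);   /* shift, then narrow */
}
```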

[PATCH 0/7]AArch64 Optimize truncation, shifts and bitmask comparisons

2021-09-29 Thread Tamar Christina via Gcc-patches
Hi All, This patch series optimizes AArch64 codegen for narrowing operations, shift and narrow, and some comparisons with bitmasks. There are more to come, but this is the first batch. This series shows a 2% gain on x264 in SPECCPU2017 and 0.05% size reduction and shows 5-10% perf gain on

RE: [PATCH 1/5]AArch64 sve: combine inverted masks into NOTs

2021-09-22 Thread Tamar Christina via Gcc-patches
Hi, Sending a new version of the patch because I noticed the pattern was overriding the nor pattern. A second pattern is needed to capture the nor case as combine will match the longest sequence first. So without this pattern we end up de-optimizing nor and instead emit two nots. I did not

RE: [PATCH 2/5]AArch64 sve: combine nested if predicates

2021-09-21 Thread Tamar Christina via Gcc-patches
Hi honored reviewer, Thanks for the feedback, I hereby submit the new patch: > > Note: This patch series is working incrementally towards generating the > most > > efficient code for this and other loops in small steps. > > It looks like this could be done in the vectoriser via an

RE: [PATCH 1/5]AArch64 sve: combine inverted masks into NOTs

2021-09-16 Thread Tamar Christina via Gcc-patches
Hi esteemed reviewer! > -Original Message- > From: Richard Sandiford > Sent: Tuesday, August 31, 2021 4:46 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw > ; Marcus Shawcroft > ; Kyrylo Tkachov > Subject: Re: [PATCH 1/5]AArch64 sve

RE: [PATCH 2/2]AArch64: Add better costing for vector constants and operations

2021-09-08 Thread Tamar Christina via Gcc-patches
> -Original Message- > From: Richard Sandiford > Sent: Tuesday, August 31, 2021 7:38 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw > ; Marcus Shawcroft > ; Kyrylo Tkachov > Subject: Re: [PATCH 2/2]AArch64: Add better cost

RE: [PATCH]AArch64 Make use of FADDP in simple reductions.

2021-09-08 Thread Tamar Christina via Gcc-patches
, *aarch64_faddp_scalar2, *aarch64_addp_scalar2v2di): New. gcc/testsuite/ChangeLog: * gcc.target/aarch64/simd/scalar_faddp.c: New test. * gcc.target/aarch64/simd/scalar_faddp2.c: New test. * gcc.target/aarch64/simd/scalar_addp.c: New test. Co-authored-by: Tamar Christina

Re: [PATCH 1/2]middle-end Teach CSE to be able to do vector extracts.

2021-09-08 Thread Tamar Christina via Gcc-patches
Hi Jeff & Richard, > If you can turn that example into a test, even if it's just in the > aarch64 directory, that would be helpful The second patch 2/2 has various tests for this as the cost model had to be made more accurate for it to work. > > As mentioned in the 2/2 thread, I think we

[PATCH]AArch64 Make use of FADDP in simple reductions.

2021-09-01 Thread Tamar Christina via Gcc-patches
, *aarch64_addp_scalar2v2di): New. gcc/testsuite/ChangeLog: * gcc.target/aarch64/simd/scalar_faddp.c: New test. * gcc.target/aarch64/simd/scalar_faddp2.c: New test. * gcc.target/aarch64/simd/scalar_addp.c: New test. Co-authored-by: Tamar Christina --- inline copy

RE: [PATCH 2/2]AArch64: Add better costing for vector constants and operations

2021-08-31 Thread Tamar Christina via Gcc-patches
> -Original Message- > From: Richard Sandiford > Sent: Tuesday, August 31, 2021 5:07 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw > ; Marcus Shawcroft > ; Kyrylo Tkachov > Subject: Re: [PATCH 2/2]AArch64: Add better cost

RE: [PATCH 2/2]AArch64: Add better costing for vector constants and operations

2021-08-31 Thread Tamar Christina via Gcc-patches
> -Original Message- > From: Richard Sandiford > Sent: Tuesday, August 31, 2021 4:14 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw > ; Marcus Shawcroft > ; Kyrylo Tkachov > Subject: Re: [PATCH 2/2]AArch64: Add better cost

[PATCH 4/5]AArch64 sve: optimize add reduction patterns

2021-08-31 Thread Tamar Christina via Gcc-patches
Hi All, The following loop does a conditional reduction using an add: #include int32_t f (int32_t *restrict array, int len, int min) { int32_t iSum = 0; for (int i=0; i<len; i++) { if (array[i] >= min) iSum += array[i]; } return iSum; } for this we currently generate: mov z1.b, #0
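A scalar sketch of the conditional add reduction, written out in full on the assumption that the loop filters on `array[i] >= min`. The second function shows one way such a loop can be if-converted: inactive elements contribute 0 and the add is performed unconditionally. Both function names are hypothetical, and this is not the SVE sequence the patch emits.

```c
#include <stdint.h>

/* Branchy reference version of the conditional reduction. */
int32_t f_branchy(const int32_t *restrict array, int len, int min)
{
  int32_t iSum = 0;
  for (int i = 0; i < len; i++)
    if (array[i] >= min)
      iSum += array[i];
  return iSum;
}

/* If-converted form: select 0 for elements that fail the test,
   then add unconditionally. */
int32_t f_select(const int32_t *restrict array, int len, int min)
{
  int32_t iSum = 0;
  for (int i = 0; i < len; i++)
    iSum += (array[i] >= min) ? array[i] : 0;
  return iSum;
}
```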

[PATCH 3/5]AArch64 sve: do not keep negated mask and inverse mask live at the same time

2021-08-31 Thread Tamar Christina via Gcc-patches
Hi All, The following example: void f11(double * restrict z, double * restrict w, double * restrict x, double * restrict y, int n) { for (int i = 0; i < n; i++) { z[i] = (w[i] > 0) ? w[i] : y[i]; } } currently generates: ptrue p2.b, all ld1d z0.d,

[PATCH 2/5]AArch64 sve: combine nested if predicates

2021-08-31 Thread Tamar Christina via Gcc-patches
Hi All, The following example void f5(float * restrict z0, float * restrict z1, float *restrict x, float * restrict y, float c, int n) { for (int i = 0; i < n; i++) { float a = x[i]; float b = y[i]; if (a > b) { z0[i] = a + b; if (a >

[PATCH 1/5]AArch64 sve: combine inverted masks into NOTs

2021-08-31 Thread Tamar Christina via Gcc-patches
Hi All, The following example void f10(double * restrict z, double * restrict w, double * restrict x, double * restrict y, int n) { for (int i = 0; i < n; i++) { z[i] = (w[i] > 0) ? x[i] + w[i] : y[i] - w[i]; } } currently generates: ld1d z1.d, p1/z, [x1,
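A scalar model of one lane of f10, assuming a governing loop predicate `pg` for the lane. The "then" lanes are `pg && m` and the "else" lanes are the inverted mask `pg && !m`, which is a NOT of the comparison under the loop predicate. The function name and parameters are hypothetical.

```c
#include <stdbool.h>

/* One lane of f10: pg is the governing loop predicate for the lane. */
double f10_lane(bool pg, double w, double x, double y, double old_z)
{
  bool m = w > 0;            /* comparison mask */
  bool then_lane = pg && m;  /* lanes computing x + w */
  bool else_lane = pg && !m; /* inverted mask: NOT of m under pg */
  if (then_lane)
    return x + w;
  if (else_lane)
    return y - w;
  return old_z;              /* inactive lane: left unchanged */
}
```

Note that !(w > 0) is not the same as w <= 0 when w is a NaN. The inverted mask therefore cannot in general be replaced by the opposite floating-point comparison.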

[PATCH 2/2]AArch64: Add better costing for vector constants and operations

2021-08-31 Thread Tamar Christina via Gcc-patches
Hi All, This patch adds extended costing to cost the creation of constants and the manipulation of constants. The default values provided are based on architectural expectations and each cost model can be individually tweaked as needed. The changes in this patch cover: * Construction of

[PATCH 1/2]middle-end Teach CSE to be able to do vector extracts.

2021-08-31 Thread Tamar Christina via Gcc-patches
Hi All, This patch gets CSE to re-use constants already inside a vector rather than re-materializing the constant again. Basically consider the following case: #include #include uint64_t test (uint64_t a, uint64x2_t b, uint64x2_t* rt) { uint64_t arr[2] = { 0x0942430810234076UL,

[PATCH]AArch64 RFC: Don't cost all scalar operations during vectorization if scalar will fuse

2021-08-31 Thread Tamar Christina via Gcc-patches
Hi All, As the vectorizer has improved over time in capabilities it has started over-vectorizing. This has caused regressions on the order of 1-7x on libraries that Arm produces. The vector costs actually do make a lot of sense and I don't think that they are wrong. I think that the costs for

[PATCH]AArch64[RFC] Force complicated constant to memory when beneficial

2021-08-31 Thread Tamar Christina via Gcc-patches
Hi All, Consider the following case #include uint64_t test4 (uint8x16_t input) { uint8x16_t bool_input = vshrq_n_u8(input, 7); poly64x2_t mask = vdupq_n_p64(0x0102040810204080UL); poly64_t prodL = vmull_p64((poly64_t)vgetq_lane_p64((poly64x2_t)bool_input, 0),

[PATCH] middle-end/AArch64 Fix bootstrap after vec changes

2021-08-06 Thread Tamar Christina via Gcc-patches
Hi All, The build has been broken since a3d3e8c362c2, which deleted the ability to pass vec<> by value; it must now be passed by reference. However, some language hooks used by AArch64 were not updated, which breaks the build on AArch64. This patch updates these hooks. However most of the changes are

[PATCH]middle-end Fix trapping access in test PR101750

2021-08-03 Thread Tamar Christina via Gcc-patches
Hi All, I believe PR101750 to be a testism. The reduced case accesses h[0] but h is uninitialized, so the changes added in r12-2523 make the compiler realize this and replace the code with a trap. This fixes the case by just making the variable static. regtested on aarch64-none-linux-gnu
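A minimal sketch of why making the variable static helps, assuming a reduced shape similar to the one described. The actual PR101750 testcase is not reproduced here, and the function name is hypothetical. Objects with static storage duration are zero-initialized, so reading h[0] is well defined. Reading an uninitialized automatic array gives an indeterminate value that the optimizer may exploit.

```c
/* With static storage duration, h starts out zero-initialized,
   so the read below is well defined. */
int read_first_static(void)
{
  static int h[4];
  return h[0];
}
```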

RE: [PATCH] Add emulated gather capability to the vectorizer

2021-07-30 Thread Tamar Christina via Gcc-patches
> -Original Message- > From: Gcc-patches bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Richard > Biener > Sent: Friday, July 30, 2021 12:34 PM > To: gcc-patches@gcc.gnu.org > Cc: Richard Sandiford > Subject: [PATCH] Add emulated gather capability to the vectorizer > > This

RE: [PATCH 3/4]AArch64: correct dot-product RTL patterns for aarch64.

2021-07-23 Thread Tamar Christina via Gcc-patches
Hi, Sorry, it looks like I forgot to ask if this is OK for backport to GCC 9, 10, 11 after some stew. Thanks, Tamar > -Original Message- > From: Richard Sandiford > Sent: Thursday, July 22, 2021 7:11 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earn

RE: [PATCH 3/4]AArch64: correct dot-product RTL patterns for aarch64.

2021-07-22 Thread Tamar Christina via Gcc-patches
sion__ extern __inline uint32x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vdotq_u32 (uint32x4_t __r, uint8x16_t __a, uint8x16_t __b) { - return __builtin_aarch64_udotv16qi_ (__r, __a, __b); + return __builtin_aarch64_udot_prodv16qi_uuuu (__a, __b, __r); } __extension__ exter

RE: [PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics optabs

2021-07-22 Thread Tamar Christina via Gcc-patches
rn __builtin_aarch64_usdot_prodv8qi_ssus (__r, __a, __b); + return __builtin_aarch64_usdot_prodv8qi_suss (__a, __b, __r); } __extension__ extern __inline int32x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vusdotq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b) { - return __buil

RE: [PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics optabs

2021-07-20 Thread Tamar Christina via Gcc-patches
> -Original Message- > From: Richard Sandiford > Sent: Thursday, July 15, 2021 8:35 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw > ; Marcus Shawcroft > ; Kyrylo Tkachov > Subject: Re: [PATCH 2/4]AArch64: correct usdot vectorizer

[PATCH][committed] testsuite: fix IL32 issues with usdot tests.

2021-07-16 Thread Tamar Christina via Gcc-patches
Hi All, Fix tests when int == long by using long long instead. Regtested on aarch64-none-linux-gnu and no issues. Committed under the obvious rule. Thanks, Tamar gcc/testsuite/ChangeLog: PR middle-end/101457 * gcc.dg/vect/vect-reduc-dot-19.c: Use long long. *

RE: [PATCH 1/4][committed] testsuite: Fix testisms in scalar tests PR101457

2021-07-16 Thread Tamar Christina via Gcc-patches
> -Original Message- > From: H.J. Lu > Sent: Friday, July 16, 2021 3:21 AM > To: Tamar Christina > Cc: GCC Patches ; Richard Sandiford > ; nd > Subject: Re: [PATCH 1/4][committed] testsuite: Fix testisms in scalar tests > PR101457 > > On Thu, Jul 15, 2021

[PATCH 4/4][AArch32]: correct dot-product RTL patterns.

2021-07-15 Thread Tamar Christina via Gcc-patches
Hi All, The previous fix for this problem was wrong due to a subtle difference between where NEON expects the RMW values and where the intrinsics expect them. The insn pattern is modeled after the intrinsics and so needs an expand for the vectorizer optab to switch the RTL. However operand[3] is

[PATCH 3/4]AArch64: correct dot-product RTL patterns for aarch64.

2021-07-15 Thread Tamar Christina via Gcc-patches
Hi All, The previous fix for this problem was wrong due to a subtle difference between where NEON expects the RMW values and where the intrinsics expect them. The insn pattern is modeled after the intrinsics and so needs an expand for the vectorizer optab to switch the RTL. However operand[3] is

[PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics optabs

2021-07-15 Thread Tamar Christina via Gcc-patches
Hi All, There's a slight mismatch between the vectorizer optabs and the intrinsics patterns for NEON. The vectorizer expects operands[3] and operands[0] to be the same but the aarch64 intrinsics expanders expect operands[0] and operands[1] to be the same. This means we need different patterns

[PATCH 1/4][committed] testsuite: Fix testisms in scalar tests PR101457

2021-07-15 Thread Tamar Christina via Gcc-patches
Hi All, These testcases accidentally contain the wrong signs for the expected values for the scalar code. The vector code however is correct. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Committed as a trivial fix. Thanks, Tamar gcc/testsuite/ChangeLog: PR

RE: [PATCH][AArch32]: Correct sdot RTL on aarch32

2021-07-15 Thread Tamar Christina via Gcc-patches
Christina Cc: GCC Patches ; Richard Earnshaw ; nd ; Ramana Radhakrishnan Subject: Re: [PATCH][AArch32]: Correct sdot RTL on aarch32 Hi Tamar, On Tue, May 25, 2021 at 5:41 PM Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: Hi All, The RTL Generated from dot_prod is i

[PATCH][committed] middle-end Vect: correct rebase issue

2021-07-14 Thread Tamar Christina via Gcc-patches
Hi All, The lines being removed have been updated and merged into a new condition. But when resolving some conflicts I accidentally reintroduced them, causing some test failures. This removes them. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Committed as the changes were

RE: [PATCH] Generate gimple-match.c and generic-match.c earlier

2021-07-14 Thread Tamar Christina via Gcc-patches
> -Original Message- > From: Bernd Edlinger > Sent: Wednesday, July 14, 2021 4:56 PM > To: Tamar Christina ; Michael Matz > > Cc: gcc-patches@gcc.gnu.org; Richard Biener > Subject: Re: [PATCH] Generate gimple-match.c and generic-match.c earlier > > On 7/14/

RE: [PATCH] Generate gimple-match.c and generic-match.c earlier

2021-07-14 Thread Tamar Christina via Gcc-patches
> -Original Message- > From: Richard Biener > Sent: Wednesday, July 14, 2021 2:19 PM > To: Tamar Christina > Cc: Michael Matz ; Bernd Edlinger > ; Richard Biener ; gcc- > patc...@gcc.gnu.org > Subject: Re: [PATCH] Generate gimple-match.c and generic-match.c ear

RE: [PATCH] Generate gimple-match.c and generic-match.c earlier

2021-07-14 Thread Tamar Christina via Gcc-patches
Hi, Ever since this commit commit c9114f2804b91690e030383de15a24e0b738e856 Author: Bernd Edlinger Date: Fri May 28 06:27:27 2021 +0200 Various tools have been having trouble with cross compilation resulting in make[2]: *** No rule to make target

RE: [PATCH] Port GCC documentation to Sphinx

2021-07-13 Thread Tamar Christina via Gcc-patches
Hi Martin, > -Original Message- > From: Gcc-patches bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Martin Liška > Sent: Tuesday, June 29, 2021 11:09 AM > To: Joseph Myers > Cc: GCC Development ; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] Port GCC documentation to Sphinx >

RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

2021-07-12 Thread Tamar Christina via Gcc-patches
> -Original Message- > From: Richard Sandiford > Sent: Monday, July 12, 2021 11:26 AM > To: Tamar Christina > Cc: Richard Biener ; nd ; gcc- > patc...@gcc.gnu.org > Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product > where the sign for

RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

2021-07-12 Thread Tamar Christina via Gcc-patches
> -Original Message- > From: Richard Sandiford > Sent: Monday, July 12, 2021 10:39 AM > To: Tamar Christina > Cc: Richard Biener ; nd ; gcc- > patc...@gcc.gnu.org > Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product > where the sign for

RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

2021-07-12 Thread Tamar Christina via Gcc-patches
Hi, > Richard Sandiford writes: > >> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info > *vinfo, > >>/* FORNOW. Can continue analyzing the def-use chain when this stmt in > a phi > >> inside the loop (in case we are analyzing an outer-loop). */ > >>

RE: [PATCH][RFC]AArch64 SVE: Fix multiple comparison masks on inverted operands

2021-07-01 Thread Tamar Christina via Gcc-patches
> -Original Message- > From: Richard Sandiford > Sent: Wednesday, June 30, 2021 6:55 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw > ; Marcus Shawcroft > ; Kyrylo Tkachov > Subject: Re: [PATCH][RFC]AArch64 SVE: Fix multiple compar

RE: [PATCH][RFC]AArch64 SVE: Fix multiple comparison masks on inverted operands

2021-06-30 Thread Tamar Christina via Gcc-patches
> -Original Message- > From: Richard Sandiford > Sent: Monday, June 14, 2021 4:55 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw > ; Marcus Shawcroft > ; Kyrylo Tkachov > Subject: Re: [PATCH][RFC]AArch64 SVE: Fix multiple compar

RE: [PATCH]middle-end[RFC] slp: new implementation of complex numbers

2021-06-22 Thread Tamar Christina via Gcc-patches
> -Original Message- > From: Richard Biener > Sent: Tuesday, June 22, 2021 1:08 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd > Subject: Re: [PATCH]middle-end[RFC] slp: new implementation of complex > numbers > > On Mon, 21 Jun 2021, Tamar

[PATCH]middle-end[RFC] slp: new implementation of complex numbers

2021-06-21 Thread Tamar Christina via Gcc-patches
Hi Richi, This patch is still very much incomplete and I do know that it is missing things, but it's complete enough that examples are working and it allows me to show what I'm working towards. Note that this approach will remove a lot of code in tree-vect-slp-patterns but to keep the diff

RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

2021-06-21 Thread Tamar Christina via Gcc-patches
Ping > -Original Message- > From: Gcc-patches bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Tamar > Christina via Gcc-patches > Sent: Monday, June 14, 2021 1:06 PM > To: Richard Sandiford > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Biener > > Sub

RE: [PATCH][RFC] Add x86 subadd SLP pattern

2021-06-17 Thread Tamar Christina via Gcc-patches
> -Original Message- > From: Richard Biener > Sent: Thursday, June 17, 2021 10:45 AM > To: gcc-patches@gcc.gnu.org > Cc: hongtao@intel.com; ubiz...@gmail.com; Tamar Christina > > Subject: [PATCH][RFC] Add x86 subadd SLP pattern > > This adds SLP pattern r

RE: [PATCH][RFC]AArch64 SVE: Fix multiple comparison masks on inverted operands

2021-06-14 Thread Tamar Christina via Gcc-patches
Hi Richard, > -Original Message- > From: Richard Sandiford > Sent: Monday, June 14, 2021 3:50 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw > ; Marcus Shawcroft > ; Kyrylo Tkachov > Subject: Re: [PATCH][RFC]AArch64 SVE: Fix m

[PATCH][RFC]AArch64 SVE: Fix multiple comparison masks on inverted operands

2021-06-14 Thread Tamar Christina via Gcc-patches
Hi All, This RFC is trying to address the following inefficiency when vectorizing conditional statements with SVE. Consider the case void f10(double * restrict z, double * restrict w, double * restrict x, double * restrict y, int n) { for (int i = 0; i < n; i++) { z[i] =

RE: [PATCH 4/4]middle-end: Add tests middle end generic tests for sign differing dotproduct.

2021-06-14 Thread Tamar Christina via Gcc-patches
Hi, Just adding 7 more tests, I will assume the OK still stands as it's more of the same. Thanks, Tamar > -Original Message- > From: Richard Biener > Sent: Wednesday, May 26, 2021 9:41 AM > To: Tamar Christina > Cc: nd > Subject: RE: [PATCH 4/4]middle-end: Ad

RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

2021-06-14 Thread Tamar Christina via Gcc-patches
Hi Richard, I've attached a new version of the patch with the changes. I have also added 7 new tests in the testsuite to check the cases you mentioned. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * optabs.def

RE: [PATCH] tree-optimization/100981 - fix SLP patterns involving reductions

2021-06-09 Thread Tamar Christina via Gcc-patches
Hi Richi, > -Original Message- > From: Gcc-patches bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Richard > Biener > Sent: Wednesday, June 9, 2021 1:53 PM > To: gcc-patches@gcc.gnu.org > Cc: Richard Sandiford > Subject: [PATCH] tree-optimization/100981 - fix SLP patterns

RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

2021-06-04 Thread Tamar Christina via Gcc-patches
type, subtype)) return NULL; /* Get the inputs in the appropriate types. */ tree mult_oprnd[2]; vect_convert_inputs (vinfo, stmt_vinfo, 2, mult_oprnd, half_type, - unprom0, half_vectype); + unprom0, half_vectype, true); var = vect_recog_temp_ssa_var

[PATCH][committed]AArch64 Fix failing testcase for native cpu detection

2021-06-03 Thread Tamar Christina via Gcc-patches
Hi All, A late change in the patch changed the implementer ID to one that hasn't been used yet to avoid any ambiguity. Unfortunately the chosen value of 0xFF matches the value of -1, which is used as an invalid implementer, so the test started failing. This patch changes it to 0xFE which is the

RE: [PATCH] ARM: reset arm_fp16_format

2021-06-02 Thread Tamar Christina via Gcc-patches
> Sent: Tuesday, June 1, 2021 3:06 PM > To: gcc-patches@gcc.gnu.org > Cc: Christophe Lyon ; Tamar Christina > ; Kyrylo Tkachov > Subject: [PATCH] ARM: reset arm_fp16_format > > Hello. > > The patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98636#c20 > where

RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

2021-06-02 Thread Tamar Christina via Gcc-patches
Ping, Did you have any comments Richard S? Otherwise I'll proceed with respinning according to Richi's comments. Regards, Tamar > -Original Message- > From: Richard Biener > Sent: Wednesday, May 26, 2021 9:57 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.or

Re: [PATCH 4/4]middle-end: Add tests middle end generic tests for sign differing dotproduct.

2021-05-26 Thread Tamar Christina via Gcc-patches
Think list got dropped on my last reply. Forwarding to archive the OK. From: Richard Biener Sent: Wednesday, May 26, 2021 9:40 AM To: Tamar Christina Cc: nd Subject: RE: [PATCH 4/4]middle-end: Add tests middle end generic tests for sign differing dotproduct

RE: [PATCH 3/4][AArch32]: Add support for sign differing dot-product usdot for NEON.

2021-05-25 Thread Tamar Christina via Gcc-patches
Forgot to include the list > -Original Message- > From: Tamar Christina > Sent: Tuesday, May 25, 2021 3:57 PM > To: Tamar Christina > Cc: Richard Earnshaw ; nd ; > Ramana Radhakrishnan ; Kyrylo Tkachov > > Subject: RE: [PATCH 3/4][AArch32]: Add support for sign

FW: [PATCH 4/4]middle-end: Add tests middle end generic tests for sign differing dotproduct.

2021-05-25 Thread Tamar Christina via Gcc-patches
Forgot the list... -Original Message- From: Tamar Christina Sent: Tuesday, May 25, 2021 3:58 PM To: Tamar Christina Cc: nd ; rguent...@suse.de Subject: RE: [PATCH 4/4]middle-end: Add tests middle end generic tests for sign differing dotproduct. Hi All, Adding a few more tests

[PATCH]AArch64: Correct dot-product auto-vect optab RTL

2021-05-25 Thread Tamar Christina via Gcc-patches
Hi All, The current RTL for the vectorizer patterns for dot-product is incorrect. Operand3 isn't an output parameter, so we can't write to it. This fixes the issue and reduces the amount of RTL. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? And backport to GCC

[PATCH][AArch32]: Correct sdot RTL on aarch32

2021-05-25 Thread Tamar Christina via Gcc-patches
Hi All, The RTL generated from dot_prod is invalid as operand3 cannot be written to; it's a normal input. For the expand it's just another operand, but the caller does not expect it to be written to. Bootstrapped Regtested on arm-none-linux-gnueabihf and no issues. Ok for master? and backport

RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

2021-05-25 Thread Tamar Christina via Gcc-patches
. (vect_recog_dot_prod_pattern): Support usdot_prod_optab. > -Original Message- > From: Richard Biener > Sent: Monday, May 10, 2021 2:29 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product > where the sign for the

RE: [PATCH 2/4]AArch64: Add support for sign differing dot-product usdot for NEON and SVE.

2021-05-25 Thread Tamar Christina via Gcc-patches
Hi Richard, > -Original Message- > From: Richard Sandiford > Sent: Monday, May 10, 2021 5:49 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw > ; Marcus Shawcroft > ; Kyrylo Tkachov > Subject: Re: [PATCH 2/4]AArch64: Add suppor

[PATCH][GCC-9][committed][libsanitizer]: Remove cyclades from libsanitizer

2021-05-21 Thread Tamar Christina via Gcc-patches
Hi All, [rebased patch for GCC-9 but the same as the others] The Linux kernel has removed the interface to cyclades from the latest kernel headers[1] due to them being orphaned for the past 13 years. libsanitizer uses this header when compiling against glibc, but glibc itself doesn't seem to

[PATCH][libsanitizer]: Guard cyclades inclusion in sanitizer

2021-05-20 Thread Tamar Christina via Gcc-patches
Hi All, libsanitizer: Guard cyclades inclusion in sanitizer The Linux kernel has removed the interface to cyclades from the latest kernel headers[1] due to them being orphaned for the past 13 years. libsanitizer uses this header when compiling against glibc, but glibc itself doesn't seem to

RE: [PATCH]AArch64: Have -mcpu=native and -march=native enable extensions when CPU is unknown

2021-05-10 Thread Tamar Christina via Gcc-patches
> -Original Message- > From: Richard Sandiford > Sent: Monday, May 10, 2021 5:31 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw > ; Marcus Shawcroft > ; Kyrylo Tkachov > Subject: Re: [PATCH]AArch64: Have -mcpu=native and -march=na

RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

2021-05-10 Thread Tamar Christina via Gcc-patches
> -Original Message- > From: Richard Biener > Sent: Monday, May 10, 2021 12:40 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product > where the sign for the multiplicant changes. &g

RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

2021-05-07 Thread Tamar Christina via Gcc-patches
Hi Richi, > -Original Message- > From: Richard Biener > Sent: Friday, May 7, 2021 12:46 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd > Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product > where the sign for the multiplicant change

RE: [PATCH 3/4][AArch32]: Add support for sign differing dot-product usdot for NEON.

2021-05-06 Thread Tamar Christina via Gcc-patches
> -Original Message- > From: Christophe Lyon > Sent: Thursday, May 6, 2021 10:23 AM > To: Tamar Christina > Cc: gcc Patches ; nd > Subject: Re: [PATCH 3/4][AArch32]: Add support for sign differing dot- > product usdot for NEON. > > On Wed, 5 May 2021 at 19:

FW: [PATCH 3/4][AArch32]: Add support for sign differing dot-product usdot for NEON.

2021-05-05 Thread Tamar Christina via Gcc-patches
Forgot to CC maintainers.. -Original Message- From: Tamar Christina Sent: Wednesday, May 5, 2021 6:39 PM To: gcc-patches@gcc.gnu.org Cc: nd Subject: [PATCH 3/4][AArch32]: Add support for sign differing dot-product usdot for NEON. Hi All, This adds optabs implementing usdot_prod

[PATCH 4/4]middle-end: Add tests middle end generic tests for sign differing dotproduct.

2021-05-05 Thread Tamar Christina via Gcc-patches
Hi All, This adds testcases to test for auto-vect detection of the new sign differing dot product. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * doc/sourcebuild.texi (arm_v8_2a_i8mm_neon_hw): Document.

[PATCH 2/4]AArch64: Add support for sign differing dot-product usdot for NEON and SVE.

2021-05-05 Thread Tamar Christina via Gcc-patches
Hi All, This adds optabs implementing usdot_prod. The following testcase: #define N 480 #define SIGNEDNESS_1 unsigned #define SIGNEDNESS_2 signed #define SIGNEDNESS_3 signed #define SIGNEDNESS_4 unsigned SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char

[PATCH 3/4][AArch32]: Add support for sign differing dot-product usdot for NEON.

2021-05-05 Thread Tamar Christina via Gcc-patches
Hi All, This adds optabs implementing usdot_prod. The following testcase: #define N 480 #define SIGNEDNESS_1 unsigned #define SIGNEDNESS_2 signed #define SIGNEDNESS_3 signed #define SIGNEDNESS_4 unsigned SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char

[PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

2021-05-05 Thread Tamar Christina via Gcc-patches
Hi All, This patch adds support for a dot product where the signs of the multiplication arguments differ, i.e. one is signed and one is unsigned but the precisions are the same. #define N 480 #define SIGNEDNESS_1 unsigned #define SIGNEDNESS_2 signed #define SIGNEDNESS_3 signed #define
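The excerpt is cut off before the testcase body, so the following is not that testcase. It is a hypothetical scalar reference (the function name `usdot_ref` is ours) for the kind of computation described: an unsigned 8-bit operand multiplied by a signed 8-bit operand of the same precision, with the products summed into a 32-bit `int`. Whether a given GCC version vectorizes this exact loop into `usdot` depends on the target and flags.

```c
#include <stddef.h>

/* Hypothetical scalar reference for a "sign differing" dot product:
   A is unsigned, B is signed, both 8 bits wide.  Each product is
   formed in int and added to a 32-bit accumulator.  */
int usdot_ref (int acc, const unsigned char *a, const signed char *b,
               size_t n)
{
  for (size_t i = 0; i < n; i++)
    acc += (int) a[i] * (int) b[i];
  return acc;
}
```

For example, with `a = {255, 1, 2}` and `b = {-1, 127, -128}` the products are -255, 127 and -256, so starting from an accumulator of 10 the result is -374.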

[PATCH] Vect: Remove restrictions on dotprod signedness

2021-05-05 Thread Tamar Christina via Gcc-patches
Hi All, There's no reason that the operands of a dot-product all have to have the same sign. The only real restriction is that the multiplicands have the same sign; the sign of the multiplier and that of the accumulator need not be the same. The type of the overall operations
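As an illustration of the case this message describes (multiplicands sharing a sign, accumulator of a different sign), here is a hypothetical loop of our own, not taken from the patch: both inputs are `signed char`, while the running sum is an `unsigned int`. Because the accumulator is unsigned, each signed product is converted on addition and the sum wraps modulo 2^32, which is the usual C behavior and is assumed here rather than stated in the excerpt.

```c
#include <stddef.h>

/* Hypothetical example: both multiplicands are signed chars, but the
   accumulator is unsigned.  Products are computed in int, converted
   to unsigned int, and summed with wrap-around modulo 2^32.  */
unsigned int sdot_unsigned_acc (unsigned int acc, const signed char *a,
                                const signed char *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    acc += (unsigned int) ((int) a[i] * (int) b[i]);
  return acc;
}
```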

[PATCH]AArch64: Have -mcpu=native and -march=native enable extensions when CPU is unknown

2021-05-05 Thread Tamar Christina via Gcc-patches
Hi All, Currently when using -mcpu=native or -march=native on a CPU that is unknown to the compiler, the compiler just uses -march=armv8-a and enables none of the extensions. To make this a bit more useful this patch changes it to still use -march=armv8-a but to enable the extensions.

Re: [PATCH] AArch64 SVE: Fix wrong sve predicate split (PR100048)

2021-04-16 Thread Tamar Christina via Gcc-patches
Hi Richard, The 04/16/2021 12:23, Richard Sandiford wrote: > Tamar Christina writes: > > diff --git a/gcc/config/aarch64/aarch64-sve.md > > b/gcc/config/aarch64/aarch64-sve.md > > index > > 7db2938bb84e04d066a7b07574e5cf344a3a8fb6..2cdc6338902216760622a39b1

[PATCH] AArch64 SVE: Fix wrong sve predicate split (PR100048)

2021-04-16 Thread Tamar Christina via Gcc-patches
Hi All, The attached testcase generates the following paradoxical subregs when creating the predicates. (insn 22 21 23 2 (set (reg:VNx8BI 100) (subreg:VNx8BI (reg:VNx2BI 103) 0)) (expr_list:REG_EQUAL (const_vector:VNx8BI [ (const_int 1 [0x1])

[PATCH] slp: reject non-multiple of 2 laned SLP trees (PR99825)

2021-03-30 Thread Tamar Christina via Gcc-patches
Hi Richi, TWO_OPERANDS allows any order or number of combinations of + and - operations but the pattern matcher only supports pairs of operations. This patch has the pattern matcher for complex numbers reject SLP trees where the number of lanes is not a multiple of 2. Bootstrapped and regtested on
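To show what "pairs of operations" means, the loop below is our own example (not the PR99825 testcase, and `cadd_rot90` is a hypothetical name). It computes `c = a + i*b` on arrays of interleaved (real, imaginary) floats. Each element yields two lanes, one subtraction and one addition, which form exactly one +/- pair; a group with an odd number of lanes cannot be split into such pairs, which is the case the patch rejects.

```c
#include <stddef.h>

/* Illustrative only: c = a + i*b for n complex values stored as
   interleaved (real, imag) floats.  The two statements per element
   form one subtract/add lane pair.  */
void cadd_rot90 (float *restrict c, const float *restrict a,
                 const float *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      c[2 * i]     = a[2 * i]     - b[2 * i + 1]; /* real: ar - bi */
      c[2 * i + 1] = a[2 * i + 1] + b[2 * i];     /* imag: ai + br */
    }
}
```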

[PATCH] slp: remove unneeded permute calculation (PR99656)

2021-03-19 Thread Tamar Christina via Gcc-patches
Hi Richi, The attached testcase ICEs because, as you showed on the PR, we have one child which is an internal with a PERM of EVENEVEN and one with TOP. The problem is that while we can conceptually merge the permute itself into EVENEVEN, merging the lanes doesn't really make sense. That said, we no longer

[PATCH][committed]AArch64 Fix -Werror issue in aarch64_simd_clone_compute_vecsize_and_simdlen

2021-03-17 Thread Tamar Christina via Gcc-patches
Hi All, g:fcefc59befd396267b824c170b6a37acaf10874e introduced a new variable named arg_type which shadows the function scoped one. The function scoped one is now unused and so causes bootstrap to fail due to -Werror. This patch removes the unused variable. Bootstrapped aarch64-none-linux-gnu

[PATCH][committed]middle-end slp: Don't traverse tree on (nil) nodes.

2021-02-25 Thread Tamar Christina via Gcc-patches
Hi All, The given testcase shows that one of the children of the complex MUL contains a PHI node. This results in the vectorizer having a child that's (nil). The pattern matcher handles this correctly, but optimize_load_redistribution_1 needs to not traverse/inspect the NULL nodes. This
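The fix described is the usual guard for recursive traversal over a tree whose child slots may be empty. A minimal sketch, assuming a hypothetical `struct node` (this is not GCC's `slp_tree` type or the actual `optimize_load_redistribution_1` code): the walk checks each child for NULL before recursing instead of dereferencing it.

```c
#include <stddef.h>

/* Hypothetical tree node standing in for an SLP node; not GCC's
   slp_tree.  Entries in CHILDREN may be NULL.  */
struct node
{
  int value;
  size_t n_children;
  struct node **children;
};

/* Sum the values of all reachable nodes, skipping NULL children
   (like the (nil) child described above) rather than
   dereferencing them.  */
int sum_tree (const struct node *n)
{
  if (n == NULL)
    return 0;
  int total = n->value;
  for (size_t i = 0; i < n->n_children; i++)
    {
      const struct node *child = n->children[i];
      if (child == NULL)
        continue;
      total += sum_tree (child);
    }
  return total;
}
```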

[PATCH][comitted] Testsuite: Disable PR99149 test on big-endian

2021-02-24 Thread Tamar Christina via Gcc-patches
Hi All, This patch disables the test for PR99149 on big-endian, where for standard AArch64 the patterns are disabled. Regtested on aarch64-none-linux-gnu and no issues. Committed under the obvious rule. Thanks, Tamar gcc/testsuite/ChangeLog: PR tree-optimization/99149 *

RE: [PATCH v2] middle-end slp: fix sharing of SLP only patterns.

2021-02-24 Thread Tamar Christina via Gcc-patches
> -Original Message- > From: Christophe Lyon > Sent: Wednesday, February 24, 2021 2:17 PM > To: Richard Biener > Cc: Tamar Christina ; nd ; gcc > Patches > Subject: Re: [PATCH v2] middle-end slp: fix sharing of SLP only patterns. > > On Wed, 24 Feb 2021 at 0

Re: [PATCH]middle-end slp: fix accidental resource re-use of slp_tree (PR99220)

2021-02-24 Thread Tamar Christina via Gcc-patches
when it's about to be deleted. gcc/testsuite/ChangeLog: PR tree-optimization/99220 * g++.dg/vect/pr99220.cc: New test. The 02/24/2021 08:52, Richard Biener wrote: > On Tue, 23 Feb 2021, Tamar Christina wrote: > > > Hi Richi, > > > > The attached testca

[PATCH]middle-end slp: fix accidental resource re-use of slp_tree (PR99220)

2021-02-23 Thread Tamar Christina via Gcc-patches
Hi Richi, The attached testcase shows a bug where two nodes end up with the same pointer. During the loop that analyzes all the instances in optimize_load_redistribution_1 we do if (value) { SLP_TREE_REF_COUNT (value)++; SLP_TREE_CHILDREN (root)[i] = value;
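The excerpt shows a child slot being overwritten after bumping the new value's reference count. As a generic sketch of that pattern only (it does not claim to be the fix in this patch, and `struct rc_node`, `rc_new`, `rc_release` and `rc_replace_child` are hypothetical names), replacing a refcounted child safely means taking the new reference before releasing the old one, so a node is never freed, and its memory reused, while a parent still points at it.

```c
#include <stdlib.h>

/* Hypothetical refcounted node; not GCC's slp_tree.  */
struct rc_node
{
  int ref_count;
};

struct rc_node *rc_new (void)
{
  struct rc_node *n = malloc (sizeof *n);
  if (n)
    n->ref_count = 1;
  return n;
}

/* Drop one reference and free the node when none remain.
   Returns 1 if the node was freed.  */
int rc_release (struct rc_node *n)
{
  if (n && --n->ref_count == 0)
    {
      free (n);
      return 1;
    }
  return 0;
}

/* Replace *SLOT with VALUE: reference the new node first, then
   release the old one, so replacing a child with itself is safe.  */
void rc_replace_child (struct rc_node **slot, struct rc_node *value)
{
  struct rc_node *old = *slot;
  value->ref_count++;
  *slot = value;
  rc_release (old);
}
```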

RE: [PATCH] slp: fix sharing of SLP only patterns. (PR99149)

2021-02-19 Thread Tamar Christina via Gcc-patches
P.S. The code in the comment is needed, but was commented out due to the early free issue described in the patch. Thanks, Tamar > -Original Message- > From: Tamar Christina > Sent: Friday, February 19, 2021 2:42 PM > To: gcc-patches@gcc.gnu.org > Cc: nd ; rguent...@suse.de

[PATCH] slp: fix sharing of SLP only patterns. (PR99149)

2021-02-19 Thread Tamar Christina via Gcc-patches
Hi Richi, The attached testcase ICEs due to a couple of issues. In the testcase you have two SLP instances that share the majority of their definition with each other. One tree defines a COMPLEX_MUL sequence and the other tree a COMPLEX_FMA. The ice happens because: 1. the refcounts are wrong,

[PATCH][GCC] IPA: Optionally allow double costing, restoring GCC 10 behavior

2021-02-10 Thread Tamar Christina via Gcc-patches
Hi Honza and Martin, As explained in PR98782 the problem is that the behavior of IRA is quite tightly bound to the frequencies that IPA puts out. With the change introduced in g:1118a3ff9d3ad6a64bba25dc01e7703325e23d92 some, but not all edge predictions changed. This introduces a local problem

[PATCH]middle-end slp: Split out patterns away from using SLP_ONLY into their own flag

2021-02-02 Thread Tamar Christina via Gcc-patches
Hi All, Previously the SLP pattern matcher was using STMT_VINFO_SLP_VECT_ONLY as a way to dissolve the SLP only patterns during SLP cancellation. However it seems like the semantics for STMT_VINFO_SLP_VECT_ONLY are slightly different than what I expected. Namely that the non-SLP path can still

[PATCH]AArch64 Change canonization of smlal and smlsl in order to be able to optimize the vec_dup

2021-02-01 Thread Tamar Christina via Gcc-patches
Hi All, g:87301e3956d44ad45e384a8eb16c79029d20213a and g:ee4c4fe289e768d3c6b6651c8bfa3fdf458934f4 changed the intrinsics to be proper RTL but accidentally ended up creating a regression because of the ordering in the RTL pattern. The existing RTL that combine should try to match to remove the
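For readers unfamiliar with the instructions involved: SMLAL is a signed widening multiply-accumulate and SMLSL its subtracting counterpart, and the vec_dup case is the one where the second operand is a single scalar broadcast to every lane (roughly what intrinsics such as `vmlal_n_s16` compute). The scalar model below is our own illustration of the lower-half 16-to-32-bit form, not the RTL pattern from the patch; `smlal_dup_model` and `smlsl_dup_model` are hypothetical names.

```c
#include <stdint.h>

/* Scalar model of a signed widening multiply-accumulate with a
   broadcast (vec_dup) scalar: four 16-bit lanes are widened to
   32 bits, multiplied by S, and added to the accumulator.  */
void smlal_dup_model (int32_t acc[4], const int16_t a[4], int16_t s)
{
  for (int i = 0; i < 4; i++)
    acc[i] += (int32_t) a[i] * (int32_t) s;
}

/* The subtracting form: the widened products are subtracted.  */
void smlsl_dup_model (int32_t acc[4], const int16_t a[4], int16_t s)
{
  for (int i = 0; i < 4; i++)
    acc[i] -= (int32_t) a[i] * (int32_t) s;
}
```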

[PATCH]Arm: Add NEON and MVE complex mul, mla and mls patterns.

2021-01-21 Thread Tamar Christina via Gcc-patches
Hi All, This adds implementations of the optabs for complex operations. With this the following C code: void g (float complex a[restrict N], float complex b[restrict N], float complex c[restrict N]) { for (int i=0; i < N; i++) c[i] = a[i] * b[i]; } generates NEON:
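To make the arithmetic behind the complex multiply explicit, here is a sketch of our own (`cmul_interleaved` is a hypothetical name) that writes the same per-element product on interleaved (real, imag) float pairs, showing the four multiplies a vector complex-multiply sequence has to cover. It matches `a[i] * b[i]` for finite inputs; without `-fcx-limited-range` (or `-ffast-math`) GCC's `float complex` multiply may also add the NaN/infinity recovery that C99 Annex G requires, which this plain formula omits.

```c
#include <stddef.h>

/* c[i] = a[i] * b[i] for n complex values stored as interleaved
   (real, imag) floats, using the textbook formula.  */
void cmul_interleaved (float *restrict c, const float *restrict a,
                       const float *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      float ar = a[2 * i], ai = a[2 * i + 1];
      float br = b[2 * i], bi = b[2 * i + 1];
      c[2 * i]     = ar * br - ai * bi; /* real part */
      c[2 * i + 1] = ar * bi + ai * br; /* imaginary part */
    }
}
```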
