Hi All,
This turns an inversion of the sign bit + arithmetic right shift into a
comparison with 0.
i.e.
void fun1(int32_t *x, int n)
{
for (int i = 0; i < (n & -16); i++)
x[i] = (-x[i]) >> 31;
}
now generates:
.L3:
ldr q0, [x0]
cmgt v0.4s, v0.4s, #0
> -Original Message-
> From: Richard Biener
> Sent: Thursday, September 30, 2021 7:18 AM
> To: Tamar Christina
> Cc: gcc-patches@gcc.gnu.org; nd
> Subject: Re: [PATCH 5/7]middle-end Convert bitclear + cmp #0
> into cm
>
> On Wed, 29 Sep 2021, Tamar Chr
Hi All,
This turns a bitwise inverse of an equality comparison with 0 into a compare of
bitwise nonzero (cmtst).
We already have one pattern for cmtst; this adds an additional one which does
not require an additional bitwise AND.
i.e.
#include <arm_neon.h>
uint8x8_t bar(int16x8_t abs_row0, int16x8_t
Hi All,
This optimizes the case where a mask Y which fulfills ~Y + 1 == pow2 is used to
clear some bits and is then compared against 0, into a version without the
masking and a compare against a different immediate.
We can do this for all unsigned compares and for signed we can do it for
comparisons
Hi All,
This turns truncate operations with a hi/lo pair into a single permute of half
the bit size of the input, simply ignoring the top bits (which are truncated
away).
i.e.
void d2 (short * restrict a, int *b, int n)
{
for (int i = 0; i < n; i++)
a[i] = b[i];
}
now generates:
Hi All,
This optimizes a signed right shift by BITSIZE-1 into a cmlt operation, which
is preferable because compares generally have a higher throughput than shifts.
On AArch64 the result of the shift would have been either -1 or 0, which is
exactly the result of the compare.
i.e.
void e (int * restrict
Hi All,
When doing a (narrowing) right shift by half the width of the original type,
we are essentially shuffling the top bits of the first number down.
If we have a hi/lo pair we can just use a single shuffle instead of needing two
shifts.
i.e.
typedef short int16_t;
typedef unsigned
Hi All,
This adds a simple pattern for combining right shifts and narrows into
shifted narrows.
i.e.
typedef short int16_t;
typedef unsigned short uint16_t;
void foo (uint16_t * restrict a, int16_t * restrict d, int n)
{
for( int i = 0; i < n; i++ )
d[i] = (a[i] * a[i]) >> 10;
}
now
Hi All,
This patch series is optimizing AArch64 codegen for narrowing operations,
shift and narrow, and some comparisons with bitmasks.
There are more to come but this is the first batch.
This series shows a 2% gain on x264 in SPECCPU2017 and 0.05% size reduction
and shows 5-10% perf gain on
Hi,
Sending a new version of the patch because I noticed the pattern was overriding
the nor pattern.
A second pattern is needed to capture the nor case as combine will match the
longest sequence first. So without this pattern we end up de-optimizing nor
and instead emit two nots. I did not
Hi,
Thanks for the feedback, here is the new patch:
> > Note: This patch series is working incrementally towards generating the
> most
> > efficient code for this and other loops in small steps.
>
> It looks like this could be done in the vectoriser via an
Hi,
> -Original Message-
> From: Richard Sandiford
> Sent: Tuesday, August 31, 2021 4:46 PM
> To: Tamar Christina
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov
> Subject: Re: [PATCH 1/5]AArch64 sve
> -Original Message-
> From: Richard Sandiford
> Sent: Tuesday, August 31, 2021 7:38 PM
> To: Tamar Christina
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov
> Subject: Re: [PATCH 2/2]AArch64: Add better cost
, *aarch64_faddp_scalar2,
*aarch64_addp_scalar2v2di): New.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/simd/scalar_faddp.c: New test.
* gcc.target/aarch64/simd/scalar_faddp2.c: New test.
* gcc.target/aarch64/simd/scalar_addp.c: New test.
Co-authored-by: Tamar Christina
Hi Jeff & Richard,
> If you can turn that example into a test, even if it's just in the
> aarch64 directory, that would be helpful
The second patch 2/2 has various tests for this as the cost model had to
be made more accurate for it to work.
>
> As mentioned in the 2/2 thread, I think we
,
*aarch64_addp_scalar2v2di): New.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/simd/scalar_faddp.c: New test.
* gcc.target/aarch64/simd/scalar_faddp2.c: New test.
* gcc.target/aarch64/simd/scalar_addp.c: New test.
Co-authored-by: Tamar Christina
--- inline copy
> -Original Message-
> From: Richard Sandiford
> Sent: Tuesday, August 31, 2021 5:07 PM
> To: Tamar Christina
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov
> Subject: Re: [PATCH 2/2]AArch64: Add better cost
> -Original Message-
> From: Richard Sandiford
> Sent: Tuesday, August 31, 2021 4:14 PM
> To: Tamar Christina
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov
> Subject: Re: [PATCH 2/2]AArch64: Add better cost
Hi All,
The following loop does a conditional reduction using an add:
#include <stdint.h>
int32_t f (int32_t *restrict array, int len, int min)
{
int32_t iSum = 0;
for (int i=0; i<len; i++)
{
if (array[i] >= min)
iSum += array[i];
}
return iSum;
}
for this we currently generate:
mov z1.b, #0
Hi All,
The following example:
void f11(double * restrict z, double * restrict w, double * restrict x,
double * restrict y, int n)
{
for (int i = 0; i < n; i++) {
z[i] = (w[i] > 0) ? w[i] : y[i];
}
}
Generates currently:
ptrue p2.b, all
ld1d z0.d,
Hi All,
The following example
void f5(float * restrict z0, float * restrict z1, float *restrict x,
float * restrict y, float c, int n)
{
for (int i = 0; i < n; i++) {
float a = x[i];
float b = y[i];
if (a > b) {
z0[i] = a + b;
if (a >
Hi All,
The following example
void f10(double * restrict z, double * restrict w, double * restrict x,
double * restrict y, int n)
{
for (int i = 0; i < n; i++) {
z[i] = (w[i] > 0) ? x[i] + w[i] : y[i] - w[i];
}
}
generates currently:
ld1d z1.d, p1/z, [x1,
Hi All,
This patch adds extended costing to cost the creation of constants and the
manipulation of constants. The default values provided are based on
architectural expectations and each cost model can be individually tweaked as
needed.
The changes in this patch covers:
* Construction of
Hi All,
This patch gets CSE to re-use constants already inside a vector rather than
re-materializing the constant again.
Basically consider the following case:
#include <stdint.h>
#include <arm_neon.h>
uint64_t
test (uint64_t a, uint64x2_t b, uint64x2_t* rt)
{
uint64_t arr[2] = { 0x0942430810234076UL,
Hi All,
As the vectorizer has improved in capabilities over time it has started
over-vectorizing. This has caused regressions on the order of 1-7x on libraries
that Arm produces.
The vector costs actually do make a lot of sense and I don't think that they are
wrong. I think that the costs for
Hi All,
Consider the following case
#include <arm_neon.h>
uint64_t
test4 (uint8x16_t input)
{
uint8x16_t bool_input = vshrq_n_u8(input, 7);
poly64x2_t mask = vdupq_n_p64(0x0102040810204080UL);
poly64_t prodL = vmull_p64((poly64_t)vgetq_lane_p64((poly64x2_t)bool_input,
0),
Hi All,
The build has been broken since a3d3e8c362c2, which removed the ability to pass
vec<> by value; such values must now be passed by reference.
However some language hooks used by AArch64 were not updated, which breaks the
build on AArch64. This patch updates these hooks.
However most of the changes are
Hi All,
I believe PR101750 to be a testism. The reduced case accesses h[0] but h is
uninitialized, and so the changes added in r12-2523 make the compiler realize
this and replace the code with a trap.
This fixes the case by just making the variable static.
regtested on aarch64-none-linux-gnu
> -Original Message-
> From: Gcc-patches bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Richard
> Biener
> Sent: Friday, July 30, 2021 12:34 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford
> Subject: [PATCH] Add emulated gather capability to the vectorizer
>
> This
Hi,
Sorry, it looks like I forgot to ask if this is OK to backport to GCC 9, 10, 11 after
some stew.
Thanks,
Tamar
> -Original Message-
> From: Richard Sandiford
> Sent: Thursday, July 22, 2021 7:11 PM
> To: Tamar Christina
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earn
sion__ extern __inline uint32x4_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vdotq_u32 (uint32x4_t __r, uint8x16_t __a, uint8x16_t __b)
{
- return __builtin_aarch64_udotv16qi_ (__r, __a, __b);
+ return __builtin_aarch64_udot_prodv16qi_uuuu (__a, __b, __r);
}
__extension__ exter
rn __builtin_aarch64_usdot_prodv8qi_ssus (__r, __a, __b);
+ return __builtin_aarch64_usdot_prodv8qi_suss (__a, __b, __r);
}
__extension__ extern __inline int32x4_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vusdotq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b)
{
- return __buil
> -Original Message-
> From: Richard Sandiford
> Sent: Thursday, July 15, 2021 8:35 PM
> To: Tamar Christina
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov
> Subject: Re: [PATCH 2/4]AArch64: correct usdot vectorizer
Hi All,
Fix tests when int == long by using long long instead.
Regtested on aarch64-none-linux-gnu and no issues.
Committed under the obvious rule.
Thanks,
Tamar
gcc/testsuite/ChangeLog:
PR middle-end/101457
* gcc.dg/vect/vect-reduc-dot-19.c: Use long long.
*
> -Original Message-
> From: H.J. Lu
> Sent: Friday, July 16, 2021 3:21 AM
> To: Tamar Christina
> Cc: GCC Patches ; Richard Sandiford
> ; nd
> Subject: Re: [PATCH 1/4][committed] testsuite: Fix testisms in scalar tests
> PR101457
>
> On Thu, Jul 15, 2021
Hi All,
The previous fix for this problem was wrong due to a subtle difference between
where NEON expects the RMW values and where intrinsics expects them.
The insn pattern is modeled after the intrinsics and so needs an expand for
the vectorizer optab to switch the RTL.
However operand[3] is
Hi All,
There's a slight mismatch between the vectorizer optabs and the intrinsics
patterns for NEON. The vectorizer expects operands[3] and operands[0] to be
the same but the aarch64 intrinsics expanders expect operands[0] and
operands[1] to be the same.
This means we need different patterns
Hi All,
These testcases accidentally contain the wrong signs for the expected values
for the scalar code. The vector code however is correct.
Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
Committed as a trivial fix.
Thanks,
Tamar
gcc/testsuite/ChangeLog:
PR
Christina
Cc: GCC Patches ; Richard Earnshaw
; nd ; Ramana Radhakrishnan
Subject: Re: [PATCH][AArch32]: Correct sdot RTL on aarch32
Hi Tamar,
On Tue, May 25, 2021 at 5:41 PM Tamar Christina via Gcc-patches
mailto:gcc-patches@gcc.gnu.org>> wrote:
Hi All,
The RTL Generated from dot_prod is i
Hi All,
The lines being removed have been updated and merged into a new
condition. But when resolving some conflicts I accidentally
reintroduced them, causing some test failures.
This removes them.
Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
Committed as the changes were
> -Original Message-
> From: Bernd Edlinger
> Sent: Wednesday, July 14, 2021 4:56 PM
> To: Tamar Christina ; Michael Matz
>
> Cc: gcc-patches@gcc.gnu.org; Richard Biener
> Subject: Re: [PATCH] Generate gimple-match.c and generic-match.c earlier
>
> On 7/14/
> -Original Message-
> From: Richard Biener
> Sent: Wednesday, July 14, 2021 2:19 PM
> To: Tamar Christina
> Cc: Michael Matz ; Bernd Edlinger
> ; Richard Biener ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: [PATCH] Generate gimple-match.c and generic-match.c ear
Hi,
Ever since this commit
commit c9114f2804b91690e030383de15a24e0b738e856
Author: Bernd Edlinger
Date: Fri May 28 06:27:27 2021 +0200
Various tools have been having trouble with cross compilation resulting in
make[2]: *** No rule to make target
Hi Martin,
> -Original Message-
> From: Gcc-patches bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Martin Liška
> Sent: Tuesday, June 29, 2021 11:09 AM
> To: Joseph Myers
> Cc: GCC Development ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Port GCC documentation to Sphinx
>
> -Original Message-
> From: Richard Sandiford
> Sent: Monday, July 12, 2021 11:26 AM
> To: Tamar Christina
> Cc: Richard Biener ; nd ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for
> -Original Message-
> From: Richard Sandiford
> Sent: Monday, July 12, 2021 10:39 AM
> To: Tamar Christina
> Cc: Richard Biener ; nd ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for
Hi,
> Richard Sandiford writes:
> >> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info
> *vinfo,
> >>/* FORNOW. Can continue analyzing the def-use chain when this stmt in
> a phi
> >> inside the loop (in case we are analyzing an outer-loop). */
> >>
> -Original Message-
> From: Richard Sandiford
> Sent: Wednesday, June 30, 2021 6:55 PM
> To: Tamar Christina
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov
> Subject: Re: [PATCH][RFC]AArch64 SVE: Fix multiple compar
> -Original Message-
> From: Richard Sandiford
> Sent: Monday, June 14, 2021 4:55 PM
> To: Tamar Christina
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov
> Subject: Re: [PATCH][RFC]AArch64 SVE: Fix multiple compar
> -Original Message-
> From: Richard Biener
> Sent: Tuesday, June 22, 2021 1:08 PM
> To: Tamar Christina
> Cc: gcc-patches@gcc.gnu.org; nd
> Subject: Re: [PATCH]middle-end[RFC] slp: new implementation of complex
> numbers
>
> On Mon, 21 Jun 2021, Tamar
Hi Richi,
This patch is still very much incomplete and I do know that it is missing things
but it's complete enough such that examples are working and allows me to show
what I'm working towards.
Note that this approach will remove a lot of code in tree-vect-slp-patterns but
to keep the diff
Ping
> -Original Message-
> From: Gcc-patches bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Tamar
> Christina via Gcc-patches
> Sent: Monday, June 14, 2021 1:06 PM
> To: Richard Sandiford
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Biener
>
> Sub
> -Original Message-
> From: Richard Biener
> Sent: Thursday, June 17, 2021 10:45 AM
> To: gcc-patches@gcc.gnu.org
> Cc: hongtao@intel.com; ubiz...@gmail.com; Tamar Christina
>
> Subject: [PATCH][RFC] Add x86 subadd SLP pattern
>
> This addds SLP pattern r
Hi Richard,
> -Original Message-
> From: Richard Sandiford
> Sent: Monday, June 14, 2021 3:50 PM
> To: Tamar Christina
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov
> Subject: Re: [PATCH][RFC]AArch64 SVE: Fix m
Hi All,
This RFC is trying to address the following inefficiency when vectorizing
conditional statements with SVE.
Consider the case
void f10(double * restrict z, double * restrict w, double * restrict x,
double * restrict y, int n)
{
for (int i = 0; i < n; i++) {
z[i] =
Hi,
Just adding 7 more tests; I will assume the OK still stands as it's more of the
same.
Thanks,
Tamar
> -Original Message-
> From: Richard Biener
> Sent: Wednesday, May 26, 2021 9:41 AM
> To: Tamar Christina
> Cc: nd
> Subject: RE: [PATCH 4/4]middle-end: Ad
Hi Richard,
I've attached a new version of the patch with the changes.
I have also added 7 new tests in the testsuite to check the cases you mentioned.
Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
Ok for master?
Thanks,
Tamar
gcc/ChangeLog:
* optabs.def
Hi Richi,
> -Original Message-
> From: Gcc-patches bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Richard
> Biener
> Sent: Wednesday, June 9, 2021 1:53 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford
> Subject: [PATCH] tree-optimization/100981 - fix SLP patterns
type, subtype))
return NULL;
/* Get the inputs in the appropriate types. */
tree mult_oprnd[2];
vect_convert_inputs (vinfo, stmt_vinfo, 2, mult_oprnd, half_type,
- unprom0, half_vectype);
+ unprom0, half_vectype, true);
var = vect_recog_temp_ssa_var
Hi All,
A late change in the patch changed the implemented ID to one that
hasn't been used yet to avoid any ambiguity. Unfortunately the
chosen value of 0xFF matches the value of -1 which is used as an
invalid implementer so the test started failing.
This patch changes it to 0xFE which is the
> Sent: Tuesday, June 1, 2021 3:06 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Christophe Lyon ; Tamar Christina
> ; Kyrylo Tkachov
> Subject: [PATCH] ARM: reset arm_fp16_format
>
> Hello.
>
> The patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98636#c20
> where
Ping,
Did you have any comments Richard S?
Otherwise I'll proceed with respining according to Richi's comments.
Regards,
Tamar
> -Original Message-
> From: Richard Biener
> Sent: Wednesday, May 26, 2021 9:57 AM
> To: Tamar Christina
> Cc: gcc-patches@gcc.gnu.or
I think the list got dropped on my last reply.
Forwarding to archive the OK.
From: Richard Biener
Sent: Wednesday, May 26, 2021 9:40 AM
To: Tamar Christina
Cc: nd
Subject: RE: [PATCH 4/4]middle-end: Add tests middle end generic tests for sign
differing dotproduct
Forgot to include the list
> -Original Message-
> From: Tamar Christina
> Sent: Tuesday, May 25, 2021 3:57 PM
> To: Tamar Christina
> Cc: Richard Earnshaw ; nd ;
> Ramana Radhakrishnan ; Kyrylo Tkachov
>
> Subject: RE: [PATCH 3/4][AArch32]: Add support for sign
Forgot the list...
-Original Message-
From: Tamar Christina
Sent: Tuesday, May 25, 2021 3:58 PM
To: Tamar Christina
Cc: nd ; rguent...@suse.de
Subject: RE: [PATCH 4/4]middle-end: Add tests middle end generic tests for sign
differing dotproduct.
Hi All,
Adding a few more tests
Hi All,
The current RTL for the vectorizer dot-product patterns is incorrect.
Operand3 isn't an output parameter, so we can't write to it.
This fixes the issue and reduces the amount of RTL.
Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
Ok for master? And backport to GCC
Hi All,
The RTL generated from dot_prod is invalid, as operand3 cannot be
written to; it's a normal input. For the expand it's just another operand
but the caller does not expect it to be written to.
Bootstrapped Regtested on arm-none-linux-gnueabihf and no issues.
Ok for master? and backport
.
(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> -Original Message-
> From: Richard Biener
> Sent: Monday, May 10, 2021 2:29 PM
> To: Tamar Christina
> Cc: gcc-patches@gcc.gnu.org; nd
> Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for the
Hi Richard,
> -Original Message-
> From: Richard Sandiford
> Sent: Monday, May 10, 2021 5:49 PM
> To: Tamar Christina
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov
> Subject: Re: [PATCH 2/4]AArch64: Add suppor
Hi All,
[rebased patch for GCC-9 but the same as the others]
The Linux kernel has removed the interface to cyclades from
the latest kernel headers[1] due to them being orphaned for the
past 13 years.
libsanitizer uses this header when compiling against glibc, but
glibcs itself doesn't seem to
Hi All,
libsanitizer: Guard cyclades inclusion in sanitizer
The Linux kernel has removed the interface to cyclades from
the latest kernel headers[1] due to them being orphaned for the
past 13 years.
libsanitizer uses this header when compiling against glibc, but
glibcs itself doesn't seem to
> -Original Message-
> From: Richard Sandiford
> Sent: Monday, May 10, 2021 5:31 PM
> To: Tamar Christina
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov
> Subject: Re: [PATCH]AArch64: Have -mcpu=native and -march=na
> -Original Message-
> From: Richard Biener
> Sent: Monday, May 10, 2021 12:40 PM
> To: Tamar Christina
> Cc: gcc-patches@gcc.gnu.org; nd
> Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for the multiplicant changes.
&g
Hi Richi,
> -Original Message-
> From: Richard Biener
> Sent: Friday, May 7, 2021 12:46 PM
> To: Tamar Christina
> Cc: gcc-patches@gcc.gnu.org; nd
> Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for the multiplicant change
> -Original Message-
> From: Christophe Lyon
> Sent: Thursday, May 6, 2021 10:23 AM
> To: Tamar Christina
> Cc: gcc Patches ; nd
> Subject: Re: [PATCH 3/4][AArch32]: Add support for sign differing dot-
> product usdot for NEON.
>
> On Wed, 5 May 2021 at 19:
Forgot to CC maintainers..
-Original Message-
From: Tamar Christina
Sent: Wednesday, May 5, 2021 6:39 PM
To: gcc-patches@gcc.gnu.org
Cc: nd
Subject: [PATCH 3/4][AArch32]: Add support for sign differing dot-product usdot
for NEON.
Hi All,
This adds optabs implementing usdot_prod
Hi All,
This adds testcases to test for auto-vect detection of the new sign differing
dot product.
Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
Ok for master?
Thanks,
Tamar
gcc/ChangeLog:
* doc/sourcebuild.texi (arm_v8_2a_i8mm_neon_hw): Document.
Hi All,
This adds optabs implementing usdot_prod.
The following testcase:
#define N 480
#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 signed
#define SIGNEDNESS_3 signed
#define SIGNEDNESS_4 unsigned
SIGNEDNESS_1 int __attribute__ ((noipa))
f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char
Hi All,
This patch adds support for a dot product where the sign of the multiplication
arguments differ. i.e. one is signed and one is unsigned but the precisions are
the same.
#define N 480
#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 signed
#define SIGNEDNESS_3 signed
#define
Hi All,
There's no reason that the sign of the operands of dot-product have to all be
the same. The only restriction really is that the sign of the multiplicands
are the same, however the sign between the multiplier and the accumulator need
not be the same.
The type of the overall operations
Hi All,
Currently when using -mcpu=native or -march=native on a CPU that is unknown to
the compiler, the compiler just uses -march=armv8-a and enables none of the
extensions.
To make this a bit more useful this patch changes it to still use -march=armv8-a
but to enable the extensions.
Hi Richard,
The 04/16/2021 12:23, Richard Sandiford wrote:
> Tamar Christina writes:
> > diff --git a/gcc/config/aarch64/aarch64-sve.md
> > b/gcc/config/aarch64/aarch64-sve.md
> > index
> > 7db2938bb84e04d066a7b07574e5cf344a3a8fb6..2cdc6338902216760622a39b1
Hi All,
The attached testcase generates the following paradoxical subregs when creating
the predicates.
(insn 22 21 23 2 (set (reg:VNx8BI 100)
(subreg:VNx8BI (reg:VNx2BI 103) 0))
(expr_list:REG_EQUAL (const_vector:VNx8BI [
(const_int 1 [0x1])
Hi Richi,
TWO_OPERANDS allows any order or number of combinations of + and - operations
but the pattern matcher only supports pairs of operations.
This patch has the pattern matcher for complex numbers reject SLP trees where
the lanes are not a multiple of 2.
Bootstrapped Regtested on
Hi Richi,
The attach testcase ICEs because as you showed on the PR we have one child
which is an internal with a PERM of EVENEVEN and one with TOP.
The problem is that while we can conceptually merge the permute itself into
EVENEVEN, merging the lanes doesn't really make sense.
That said, we no longer
Hi All,
g:fcefc59befd396267b824c170b6a37acaf10874e introduced a new variable named
arg_type which shadows the function scoped one.
The function scoped one is now unused and so causes bootstrap to fail due to
-Werror.
This patch removes the unused variable.
Bootstrapped aarch64-none-linux-gnu
Hi All,
The given testcase shows that one of the children of the complex MUL contains a
PHI node. This results in the vectorizer having a child that's (nil).
The pattern matcher handles this correctly, but optimize_load_redistribution_1
needs to not traverse/inspect the NULL nodes.
This
Hi All,
This patch disables the test for PR99149 on Big-endian
where for standard AArch64 the patterns are disabled.
Regtested on aarch64-none-linux-gnu and no issues.
Committed under the obvious rule.
Thanks,
Tamar
gcc/testsuite/ChangeLog:
PR tree-optimization/99149
*
> -Original Message-
> From: Christophe Lyon
> Sent: Wednesday, February 24, 2021 2:17 PM
> To: Richard Biener
> Cc: Tamar Christina ; nd ; gcc
> Patches
> Subject: Re: [PATCH v2] middle-end slp: fix sharing of SLP only patterns.
>
> On Wed, 24 Feb 2021 at 0
when it's about to be deleted.
gcc/testsuite/ChangeLog:
PR tree-optimization/99220
* g++.dg/vect/pr99220.cc: New test.
The 02/24/2021 08:52, Richard Biener wrote:
> On Tue, 23 Feb 2021, Tamar Christina wrote:
>
> > Hi Richi,
> >
> > The attached testca
Hi Richi,
The attached testcase shows a bug where two nodes end up with the same pointer.
During the loop that analyzes all the instances
in optimize_load_redistribution_1 we do
if (value)
{
SLP_TREE_REF_COUNT (value)++;
SLP_TREE_CHILDREN (root)[i] = value;
Ps. The code in comment is needed, but was commented out due to the early free
issue described in the patch.
Thanks,
Tamar
> -Original Message-
> From: Tamar Christina
> Sent: Friday, February 19, 2021 2:42 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; rguent...@suse.de
&g
Hi Richi,
The attached testcase ICEs due to a couple of issues.
In the testcase you have two SLP instances that share the majority of their
definition with each other. One tree defines a COMPLEX_MUL sequence and the
other tree a COMPLEX_FMA.
The ice happens because:
1. the refcounts are wrong,
Hi Honza and Martin,
As explained in PR98782 the problem is that the behavior of IRA is quite tightly
bound to the frequencies that IPA puts out. With the change introduced in
g:1118a3ff9d3ad6a64bba25dc01e7703325e23d92 some, but not all edge predictions
changed. This introduces a local problem
Hi All,
Previously the SLP pattern matcher was using STMT_VINFO_SLP_VECT_ONLY as a way
to dissolve the SLP only patterns during SLP cancellation. However it seems
like the semantics of STMT_VINFO_SLP_VECT_ONLY are slightly different from what
I expected.
Namely that the non-SLP path can still
Hi All,
g:87301e3956d44ad45e384a8eb16c79029d20213a and
g:ee4c4fe289e768d3c6b6651c8bfa3fdf458934f4 changed the intrinsics to be
proper RTL but accidentally ended up creating a regression because of the
ordering in the RTL pattern.
The existing RTL that combine should try to match to remove the
Hi All,
This adds implementation for the optabs for complex operations. With this the
following C code:
void g (float complex a[restrict N], float complex b[restrict N],
float complex c[restrict N])
{
for (int i=0; i < N; i++)
c[i] = a[i] * b[i];
}
generates
NEON: