[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures

2024-02-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #23 from Richard Biener  ---
(In reply to Richard Biener from comment #18)
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 7cf9504398c..8deeecfd4aa 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -1280,8 +1280,11 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> && rhs_code.is_tree_code ()
> && (TREE_CODE_CLASS (tree_code (first_stmt_code))
> == tcc_comparison)
> -   && (swap_tree_comparison (tree_code (first_stmt_code))
> -   == tree_code (rhs_code)))
> +   && ((swap_tree_comparison (tree_code (first_stmt_code))
> +== tree_code (rhs_code))
> +   || ((TREE_CODE_CLASS (tree_code (alt_stmt_code))
> +== tcc_comparison)
> +   && rhs_code == alt_stmt_code)))
>&& !(STMT_VINFO_GROUPED_ACCESS (stmt_info)
> && (first_stmt_code == ARRAY_REF
> || first_stmt_code == BIT_FIELD_REF
> 


diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 7cf9504398c..e35a3fa 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1519,7 +1522,9 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
   if (alt_stmt_code != ERROR_MARK
       && (!alt_stmt_code.is_tree_code ()
          || (TREE_CODE_CLASS (tree_code (alt_stmt_code)) != tcc_reference
-             && TREE_CODE_CLASS (tree_code (alt_stmt_code)) != tcc_comparison)))
+             && (TREE_CODE_CLASS (tree_code (alt_stmt_code)) != tcc_comparison
+                 || (swap_tree_comparison (tree_code (first_stmt_code))
+                     != tree_code (alt_stmt_code))))))
     {
       *two_operators = true;
     }

is also needed btw. to avoid wrong-code.  I see

t.c:8:26: note:   ==> examining statement: pretmp_29 = *_28;
t.c:8:26: missed:   unsupported load permutation
t.c:10:30: missed:   not vectorized: relevant stmt not supported: pretmp_29 = *_28;
t.c:8:26: note:   removing SLP instance operations starting from: .MASK_STORE (_5, 8B, patt_12, pretmp_29);

using -O3 -march=armv8.3-a+sve - it then does

t.c:8:26: missed:  unsupported SLP instances
t.c:8:26: note:  re-trying with SLP disabled

and _that_ fails then with

t.c:8:26: missed:   Not using elementwise accesses due to variable vectorization factor.
t.c:6:1: missed:   not vectorized: relevant stmt not supported: .MASK_STORE (_5, 8B, patt_12, pretmp_29);

but the interesting bit is why it fails to handle the SLP case.

That's possibly because the load isn't a grouped access, we get
dr_group_size == 1 and group_size == 2 and nunits is {16, 16}
(!repeating_p) and so

  /* We need to construct a separate mask for each vector statement.  */
  unsigned HOST_WIDE_INT const_nunits, const_vf;
  if (!nunits.is_constant (&const_nunits)
      || !vf.is_constant (&const_vf))
    return false;

I'm not sure what that comment means, but presumably we simply fail to handle
another special case here that we could handle?  Possibly dr_group_size == 1?
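
To illustrate why the `is_constant` check above bails out for SVE, here is a minimal sketch of a two-coefficient value in the style of GCC's poly_int. The struct `poly2` and the helper `poly2_is_constant` are hypothetical names made up for this example and are not GCC's actual API. The assumption is that a dump value such as {16, 16} means 16 + 16 * x, where x is only known at run time.

```c
#include <stdbool.h>

/* Hypothetical model of a poly_int-style value:
   value = coeff0 + coeff1 * x, where x is a run-time quantity
   (for SVE it depends on the hardware vector length).  */
struct poly2
{
  unsigned long coeff0;
  unsigned long coeff1;
};

/* Return true and store the value in *out only when P does not
   depend on x; otherwise leave *out untouched and return false.  */
static bool
poly2_is_constant (struct poly2 p, unsigned long *out)
{
  if (p.coeff1 != 0)
    return false;
  *out = p.coeff0;
  return true;
}
```

Under this model a value of {16, 16} is rejected while {16, 0} yields the constant 16, which is consistent with the early `return false` being taken for variable-length vectors.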

[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures

2024-02-15 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #22 from rguenther at suse dot de  ---
> On 15.02.2024 at 19:53, tnfchris at gcc dot gnu.org wrote:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156
> 
> --- Comment #21 from Tamar Christina  ---
> (In reply to Richard Biener from comment #18)
>> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
>> index 7cf9504398c..8deeecfd4aa 100644
>> --- a/gcc/tree-vect-slp.cc
>> +++ b/gcc/tree-vect-slp.cc
>> @@ -1280,8 +1280,11 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
>>&& rhs_code.is_tree_code ()
>>&& (TREE_CODE_CLASS (tree_code (first_stmt_code))
>>== tcc_comparison)
>> -   && (swap_tree_comparison (tree_code (first_stmt_code))
>> -   == tree_code (rhs_code)))
>> +   && ((swap_tree_comparison (tree_code (first_stmt_code))
>> +== tree_code (rhs_code))
>> +   || ((TREE_CODE_CLASS (tree_code (alt_stmt_code))
>> +== tcc_comparison)
>> +   && rhs_code == alt_stmt_code)))
>>   && !(STMT_VINFO_GROUPED_ACCESS (stmt_info)
>>&& (first_stmt_code == ARRAY_REF
>>|| first_stmt_code == BIT_FIELD_REF
>> 
>> should get you SLP but:
>> 
>> t.c:8:26: note:   === vect_slp_analyze_operations ===
>> t.c:8:26: note:   ==> examining statement: pretmp_29 = *_28;
>> t.c:8:26: missed:   unsupported load permutation
>> t.c:10:30: missed:   not vectorized: relevant stmt not supported: pretmp_29
>> = *_28;
>> 
>> t.c:8:26: note:   op template: pretmp_29 = *_28;
>> t.c:8:26: note: stmt 0 pretmp_29 = *_28;
>> t.c:8:26: note: stmt 1 pretmp_29 = *_28;
>> t.c:8:26: note: load permutation { 0 0 }
> 
> hmm with that applied I get:
> 
> sve-mis.c:8:26: note:   ==> examining statement: pretmp_29 = *_28;
> sve-mis.c:8:26: note:   Vectorizing an unaligned access.
> sve-mis.c:8:26: note:   vect_model_load_cost: unaligned supported by hardware.
> sve-mis.c:8:26: note:   vect_model_load_cost: inside_cost = 1, prologue_cost =
> 0 .
> 
> but it bails out at:
> 
> sve-mis.c:8:26: missed:   Not using elementwise accesses due to variable
> vectorization factor.
> sve-mis.c:10:25: missed:   not vectorized: relevant stmt not supported:
> .MASK_STORE (_5, 8B, _27, pretmp_29);
> sve-mis.c:8:26: missed:  bad operation or unsupported loop bound.
> 
> for me

I’ve used -fno-cost-model and looked at the SVE variant only.


[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures

2024-02-15 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #21 from Tamar Christina  ---
(In reply to Richard Biener from comment #18)
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 7cf9504398c..8deeecfd4aa 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -1280,8 +1280,11 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> && rhs_code.is_tree_code ()
> && (TREE_CODE_CLASS (tree_code (first_stmt_code))
> == tcc_comparison)
> -   && (swap_tree_comparison (tree_code (first_stmt_code))
> -   == tree_code (rhs_code)))
> +   && ((swap_tree_comparison (tree_code (first_stmt_code))
> +== tree_code (rhs_code))
> +   || ((TREE_CODE_CLASS (tree_code (alt_stmt_code))
> +== tcc_comparison)
> +   && rhs_code == alt_stmt_code)))
>&& !(STMT_VINFO_GROUPED_ACCESS (stmt_info)
> && (first_stmt_code == ARRAY_REF
> || first_stmt_code == BIT_FIELD_REF
> 
> should get you SLP but:
> 
> t.c:8:26: note:   === vect_slp_analyze_operations ===
> t.c:8:26: note:   ==> examining statement: pretmp_29 = *_28;
> t.c:8:26: missed:   unsupported load permutation
> t.c:10:30: missed:   not vectorized: relevant stmt not supported: pretmp_29
> = *_28;
> 
> t.c:8:26: note:   op template: pretmp_29 = *_28;
> t.c:8:26: note: stmt 0 pretmp_29 = *_28;
> t.c:8:26: note: stmt 1 pretmp_29 = *_28;
> t.c:8:26: note: load permutation { 0 0 }

hmm with that applied I get:

sve-mis.c:8:26: note:   ==> examining statement: pretmp_29 = *_28;
sve-mis.c:8:26: note:   Vectorizing an unaligned access.
sve-mis.c:8:26: note:   vect_model_load_cost: unaligned supported by hardware.
sve-mis.c:8:26: note:   vect_model_load_cost: inside_cost = 1, prologue_cost = 0 .

but it bails out at:

sve-mis.c:8:26: missed:   Not using elementwise accesses due to variable vectorization factor.
sve-mis.c:10:25: missed:   not vectorized: relevant stmt not supported: .MASK_STORE (_5, 8B, _27, pretmp_29);
sve-mis.c:8:26: missed:  bad operation or unsupported loop bound.

for me

[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures

2024-02-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #20 from Richard Biener  ---
fixed.

[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures

2024-02-15 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #19 from GCC Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:b312cf21afd62b43fbc5034703e2796b0c3c416d

commit r14-9011-gb312cf21afd62b43fbc5034703e2796b0c3c416d
Author: Richard Biener 
Date:   Thu Feb 15 13:41:25 2024 +0100

tree-optimization/111156 - properly dissolve SLP only groups

The following fixes the omission of failing to look at pattern
stmts when we need to dissolve SLP only groups.

PR tree-optimization/111156
* tree-vect-loop.cc (vect_dissolve_slp_only_groups): Look
at the pattern stmt if any.

[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures

2024-02-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #18 from Richard Biener  ---
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 7cf9504398c..8deeecfd4aa 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1280,8 +1280,11 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
&& rhs_code.is_tree_code ()
&& (TREE_CODE_CLASS (tree_code (first_stmt_code))
== tcc_comparison)
-   && (swap_tree_comparison (tree_code (first_stmt_code))
-   == tree_code (rhs_code)))
+   && ((swap_tree_comparison (tree_code (first_stmt_code))
+== tree_code (rhs_code))
+   || ((TREE_CODE_CLASS (tree_code (alt_stmt_code))
+== tcc_comparison)
+   && rhs_code == alt_stmt_code)))
   && !(STMT_VINFO_GROUPED_ACCESS (stmt_info)
&& (first_stmt_code == ARRAY_REF
|| first_stmt_code == BIT_FIELD_REF

should get you SLP but:

t.c:8:26: note:   === vect_slp_analyze_operations ===
t.c:8:26: note:   ==> examining statement: pretmp_29 = *_28;
t.c:8:26: missed:   unsupported load permutation
t.c:10:30: missed:   not vectorized: relevant stmt not supported: pretmp_29 = *_28;

t.c:8:26: note:   op template: pretmp_29 = *_28;
t.c:8:26: note: stmt 0 pretmp_29 = *_28;
t.c:8:26: note: stmt 1 pretmp_29 = *_28;
t.c:8:26: note: load permutation { 0 0 }
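
As a reading aid for the dump above, a load permutation of { 0 0 } can be understood as "both SLP lanes read element 0 of the load group", which matches the two identical statements `pretmp_29 = *_28`. The following scalar sketch models that interpretation. The function name `apply_load_permutation` and its parameters are hypothetical and only serve this illustration; this is not how the vectorizer itself is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model: lane i of the SLP node reads element
   perm[i] of the load group.  */
static void
apply_load_permutation (const signed char *group, size_t group_size,
                        const unsigned *perm, size_t lanes,
                        signed char *out)
{
  for (size_t i = 0; i < lanes; ++i)
    {
      /* Each permutation index must refer to an existing group element.  */
      assert (perm[i] < group_size);
      out[i] = group[perm[i]];
    }
}
```

With a group of size 1 and the permutation { 0, 0 }, both output lanes receive the same loaded value, i.e. the single load is duplicated across the two store lanes.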

[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures

2024-02-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #17 from Richard Biener  ---
I think the following fixes it; can you verify at runtime?  (The IL looks sane,
but it uses masked scatter stores.)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 9e26b09504d..5a5865c42fc 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2551,7 +2551,8 @@ vect_dissolve_slp_only_groups (loop_vec_info loop_vinfo)
   FOR_EACH_VEC_ELT (datarefs, i, dr)
 {
   gcc_assert (DR_REF (dr));
-  stmt_vec_info stmt_info = loop_vinfo->lookup_stmt (DR_STMT (dr));
+  stmt_vec_info stmt_info
+   = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (DR_STMT (dr)));

   /* Check if the load is a part of an interleaving chain.  */
   if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
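
The effect of wrapping the lookup in vect_stmt_to_vectorize can be sketched with a much simplified model. The struct `stmt_info` and the function `stmt_to_vectorize` below are hypothetical stand-ins, not GCC's real data structures. The assumption, based on the dumps in this thread where the .MASK_STORE calls are reported as pattern statements, is that the grouping flag of interest sits on the pattern statement rather than on the original one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for stmt_vec_info.  */
struct stmt_info
{
  bool replaced_by_pattern;       /* original stmt was pattern-recognized */
  struct stmt_info *pattern_stmt; /* the replacement statement, if any */
  bool grouped_access;            /* models STMT_VINFO_GROUPED_ACCESS */
};

/* Return the statement that would actually be vectorized: the pattern
   statement when there is one, otherwise the statement itself.  */
static struct stmt_info *
stmt_to_vectorize (struct stmt_info *info)
{
  if (info && info->replaced_by_pattern && info->pattern_stmt)
    return info->pattern_stmt;
  return info;
}
```

In this model, checking `grouped_access` on the original statement misses a group that was formed on its pattern replacement, whereas checking it on the result of `stmt_to_vectorize` finds it.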

[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures

2024-02-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

Richard Biener  changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org

--- Comment #16 from Richard Biener  ---
OK, so the missed SLP is a known one:

t.c:8:26: note:   starting SLP discovery for node 0x5d42840
t.c:8:26: note:   Build SLP for _27 = _3 <= 7;
t.c:8:26: note:   precomputed vectype: vector([8,8]) 
t.c:8:26: note:   nunits = [8,8]
t.c:8:26: note:   Build SLP for _14 = _3 > 2;
t.c:8:26: note:   precomputed vectype: vector([8,8]) 
t.c:8:26: note:   nunits = [8,8]
t.c:8:26: missed:   Build SLP failed: different operation in stmt _14 = _3 > 2;
t.c:8:26: missed:   original stmt _27 = _3 <= 7;

I'm not sure we can do this with a single vector stmt but of course using
'two_operator' support might be possible here (do both > and <= and then
blend the result).
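
The 'two_operator' idea mentioned above (compute both comparisons for every lane, then blend) can be sketched in scalar C as follows. The function name `blend_two_compares` and the layout of the output mask are assumptions made for this illustration; it only models the intended result, not how the vectorizer would emit it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a two-operator SLP node with two lanes per iteration:
   both comparisons are evaluated, then each lane picks the result of
   "its" operation (lane 0: <= 7, lane 1: > 2).  */
static void
blend_two_compares (const uint16_t *cond, size_t n, bool *mask)
{
  for (size_t i = 0; i < n; ++i)
    {
      bool le = cond[i] <= 7; /* first operation */
      bool gt = cond[i] > 2;  /* second operation */
      for (size_t lane = 0; lane < 2; ++lane)
        mask[2 * i + lane] = (lane == 0) ? le : gt; /* blend */
    }
}
```

For cond = { 0, 5, 9 } this yields the mask { 1, 0, 1, 1, 0, 1 }, so the two stores in each pair remain guarded by different conditions.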

I see we end up using .MASK_STORE_LANES in the end but we're not using
load-lanes.

t.c:8:26: note:   ==> examining pattern statement: .MASK_STORE (_5, 8B, patt_12, pretmp_29);
t.c:8:26: note:   vect_is_simple_use: operand () _27, type of def: internal
t.c:8:26: note:   vect_is_simple_use: vectype vector([16,16])

t.c:8:26: note:   vect_is_simple_use: operand *_28, type of def: internal
t.c:8:26: note:   vect_is_simple_use: vectype vector([16,16]) signed char
t.c:8:26: missed:   cannot use vec_mask_len_store_lanes
t.c:8:26: note:   can use vec_mask_store_lanes
t.c:8:26: note:   vect_is_simple_use: operand *_28, type of def: internal
t.c:8:26: missed:   cannot use vec_mask_len_store_lanes
t.c:8:26: note:   can use vec_mask_store_lanes
...
t.c:8:26: note:   ==> examining pattern statement: .MASK_STORE (_9, 8B, patt_4, pretmp_29);
t.c:8:26: note:   vect_is_simple_use: operand () _14, type of def: internal
t.c:8:26: note:   vect_is_simple_use: vectype vector([16,16])

t.c:8:26: note:   vect_is_simple_use: operand *_28, type of def: internal
t.c:8:26: note:   vect_is_simple_use: vectype vector([16,16]) signed char
t.c:8:26: missed:   cannot use vec_mask_len_store_lanes
t.c:8:26: note:   can use vec_mask_store_lanes
t.c:8:26: missed:   cannot use vec_mask_len_store_lanes
t.c:8:26: note:   can use vec_mask_store_lanes

and somehow the transform phase decides to put the two stores together again,
probably without verifying that the masks are the same.

I'll dig a bit more after lunch.

[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures

2024-02-15 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #15 from Tamar Christina  ---
and just -O3 -march=armv8-a+sve

[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures

2024-02-15 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #14 from Tamar Christina  ---
(In reply to Richard Biener from comment #13)
> I didn't add STMT_VINFO_SLP_VECT_ONLY, I'm quite sure we can now do both SLP
> of masked loads and stores, so yes, STMT_VINFO_SLP_VECT_ONLY (when we formed
> a DR group of stmts we cannot combine without SLP as the masks are not equal)
> should be set for both loads and stores.
> 
> The can_group_stmts_p checks as present seem correct here (but the dump
> should not say "Load" but maybe "Access")

I guess I'm wondering because of this usage:

  /* Check that the data-refs have same first location (except init)
     and they are both either store or load (not load and store,
     not masked loads or stores).  */
  if (DR_IS_READ (dra) != DR_IS_READ (drb)
      || data_ref_compare_tree (DR_BASE_ADDRESS (dra),
                                DR_BASE_ADDRESS (drb)) != 0
      || data_ref_compare_tree (DR_OFFSET (dra), DR_OFFSET (drb)) != 0
      || !can_group_stmts_p (stmtinfo_a, stmtinfo_b, true))
    break;

We don't exit there now for non-SLP.

> 
> So what's the testcase comment#9 talks about?

You should be able to reproduce it with:

---
typedef __SIZE_TYPE__ size_t;
typedef signed char int8_t;
typedef unsigned short uint16_t;

void __attribute__((noinline, noclone))
test_i8_i8_i16_2(int8_t *__restrict dest, int8_t *__restrict src,
                 uint16_t *__restrict cond, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        if (cond[i] < 8)
            dest[i * 2] = src[i];
        if (cond[i] > 2)
            dest[i * 2 + 1] = src[i];
    }
}
void __attribute__((noinline, noclone))
test_i8_i8_i16_2_1(volatile int8_t * dest, volatile int8_t * src,
                   volatile uint16_t * cond, size_t n) {
#pragma GCC novector
    for (size_t i = 0; i < n; ++i) {
        if (cond[i] < 8)
            dest[i * 2] = src[i];
        if (cond[i] > 2)
            dest[i * 2 + 1] = src[i];
    }
}

#define size 16

int8_t srcarray[size];
uint16_t maskarray[size];
int8_t destarray[size*2];
int8_t destarray1[size*2];

int main()
{
#pragma GCC novector
  for(int i = 0; i < size; i++)
  {
    maskarray[i] = i == 10 ? 0 : (i == 5 ? 9 : (2*i) & 0xff);
    srcarray[i] = i;
  }
#pragma GCC novector
  for(int i = 0; i < size*2; i++)
  {
    destarray[i] = i;
    destarray1[i] = i;
  }
  test_i8_i8_i16_2(destarray, srcarray, maskarray, size);
  test_i8_i8_i16_2_1(destarray1, srcarray, maskarray, size);

#pragma GCC novector
  for(int i = 0; i < size*2; i++)
  {
    if (destarray[i] != destarray1[i])
      __builtin_abort();
  }
}

---

since really only one of the functions needs to vectorize.

[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures

2024-02-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #13 from Richard Biener  ---
I didn't add STMT_VINFO_SLP_VECT_ONLY, I'm quite sure we can now do both SLP of
masked loads and stores, so yes, STMT_VINFO_SLP_VECT_ONLY (when we formed
a DR group of stmts we cannot combine without SLP as the masks are not equal)
should be set for both loads and stores.

The can_group_stmts_p checks as present seem correct here (but the dump
should not say "Load" but maybe "Access")

So it looks like the issue is that we decide "late" that we can't actually do
the masked SLP store (why?), then run the odd "vect_dissolve_slp_only_groups",
and then somehow botch strided store code-gen, which likely doesn't expect
masks or should have disabled full masking?  I'll note that we don't
support single element interleaving for stores, so vect_analyze_group_access_1
would have fallen back to STMT_VINFO_STRIDED_P.  But as said, maybe that
somehow fails to disable loop masking during vect_analyze_loop_operations?

So what's the testcase comment#9 talks about?

[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures

2024-02-14 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

Tamar Christina  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #12 from Tamar Christina  ---
The commit that caused it is:

commit g:a1558e9ad856938f165f838733955b331ebbec09
Author: Richard Biener 
Date:   Wed Aug 23 14:28:26 2023 +0200

tree-optimization/111115 - SLP of masked stores

The following adds the capability to do SLP on .MASK_STORE, I do not
plan to add interleaving support.

specifically this change:

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 3e9a284666c..a2caf6cb1c7 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -3048,8 +3048,7 @@ can_group_stmts_p (stmt_vec_info stmt1_info, stmt_vec_info stmt2_info,
 like those created by build_mask_conversion.  */
   tree mask1 = gimple_call_arg (call1, 2);
   tree mask2 = gimple_call_arg (call2, 2);
-  if (!operand_equal_p (mask1, mask2, 0)
-  && (ifn == IFN_MASK_STORE || !allow_slp_p))
+  if (!operand_equal_p (mask1, mask2, 0) && !allow_slp_p)
{
  mask1 = strip_conversion (mask1);
  if (!mask1)

With the change it now incorrectly thinks that the two masks (a <= 7, a > 2) are
the same, which is why one of the masks goes missing.

Part of it is that the boolean is used in a weird way.  During
vect_analyze_data_ref_accesses, where this difference is important, we pass true
in the initial check, but the || before made it so that we still checked the
MASK_STOREs.  Now it means that during analysis we never check.

Later on in the same method we check it again, but with false as the argument,
for determining STMT_VINFO_SLP_VECT_ONLY.
The debug statement there is weird btw, as it says:

  if (dump_enabled_p () && STMT_VINFO_SLP_VECT_ONLY (stmtinfo_a))
dump_printf_loc (MSG_NOTE, vect_location,
 "Load suitable for SLP vectorization only.\n");

but as far as I can see, stmtinfo_a can be a store too, based on the checks for
DR_IS_READ (dra) just slightly higher up.

The patch that added this check (g:997636716c5dde7d59d026726a6f58918069f122)
says it's because the vectorizer doesn't support SLP of masked loads, and I
can't tell if we do now.

If we do, the boolean should be dropped; if we don't, we probably need the
check back to allow the check for stores.  It looks like this check is being
used to disable STMT_VINFO_SLP_VECT_ONLY for loads, which is a bit
counterintuitive and feels like a hack rather than just doing:

  STMT_VINFO_SLP_VECT_ONLY (stmtinfo_a)
-   = !can_group_stmts_p (stmtinfo_a, stmtinfo_b, false);
+   = !can_group_stmts_p (stmtinfo_a, stmtinfo_b)
+ && DR_IS_WRITE (dra);

So I think the boolean should be dropped and just reject loads for
STMT_VINFO_SLP_VECT_ONLY...
This also seems to give much better codegen... in any case, richi?
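
The condition change in can_group_stmts_p discussed in this comment can be modelled with two small predicates. The function names and boolean parameters below are hypothetical. They only model when the guarded block of the `if` (the stricter mask comparison) is entered before and after the quoted hunk, not what that block does.

```c
#include <stdbool.h>

/* Before the change: unequal masks are examined more closely when the
   call is a masked store, or when SLP-only grouping is not allowed.  */
static bool
enters_mask_check_before (bool masks_equal, bool is_mask_store,
                          bool allow_slp_p)
{
  return !masks_equal && (is_mask_store || !allow_slp_p);
}

/* After the change: the kind of call is no longer consulted.  */
static bool
enters_mask_check_after (bool masks_equal, bool is_mask_store,
                         bool allow_slp_p)
{
  (void) is_mask_store; /* unused in the new condition */
  return !masks_equal && !allow_slp_p;
}
```

For a masked store with unequal masks and allow_slp_p set to true, the first predicate holds and the second does not. That is the case described above where the masks are never checked during analysis.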

[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures

2024-02-02 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #11 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #10)
> Note I think gcc.target/aarch64/sve/mask_struct_load_3_run.c is the runtime
> failure I mentioned.

And gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c (when tested with
-march=armv9-a which I added to my testing recently).

[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures

2024-02-02 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #10 from Andrew Pinski  ---
Note I think gcc.target/aarch64/sve/mask_struct_load_3_run.c is the runtime
failure I mentioned.

[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures

2024-02-01 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #9 from Andrew Pinski  ---
Note I think GCC should be able to vectorize this loop but it goes wrong.


On SVE the <= 7 part gets lost:

```
  vect__3.12_54 = .MASK_LOAD (_48, 16B, loop_mask_52);
  _32 = cond_17(D) + POLY_INT_CST [16, 16];
  _25 =   [(uint16_t *)_32 + ivtmp_77 * 2];
  vect__3.13_56 = .MASK_LOAD (_25, 16B, loop_mask_53);
  _1 =   [(int8_t *)src_18(D) + ivtmp_77 * 1];
  vect_pretmp_29.16_60 = .MASK_LOAD (_1, 8B, loop_mask_59);
  mask__14.19_66 = vect__3.12_54 > { 2, ... };
  mask__14.19_67 = vect__3.13_56 > { 2, ... };
  mask_patt_4.20_68 = VEC_PACK_TRUNC_EXPR ;
  vect_array.23 ={v} {CLOBBER};
  vect_array.23[0] = vect_pretmp_29.16_60;
  vect_array.23[1] = vect_pretmp_29.16_60;
  vec_mask_and_74 = loop_mask_59 & mask_patt_4.20_68;
  _2 = ivtmp_77 * 2;
  _3 =   [(int8_t *)dest_19(D) + _2 * 1];
```


But RISCV is able to vectorize it correctly:
```
  vect__3.12_52 = .MASK_LEN_LOAD (vectp_cond.10_13, 16B, { -1, ... }, _72, 0);
  vect_pretmp_29.15_56 = .MASK_LEN_LOAD (vectp_src.13_54, 8B, { -1, ... }, _72, 0);
  mask__27.16_58 = vect__3.12_52 <= { 7, ... };
  .MASK_LEN_SCATTER_STORE (vectp_dest.17_60, { 0, 2, 4, ... }, 1, vect_pretmp_29.15_56, mask__27.16_58, _72, 0);
  mask__14.19_64 = vect__3.12_52 > { 2, ... };
  .MASK_LEN_SCATTER_STORE (vectp_dest.20_67, { 0, 2, 4, ... }, 1, vect_pretmp_29.15_56, mask__14.19_64, _72, 0);
```

By using 2 stores and scatter here.
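
The difference between the two dumps can be modelled in scalar C. The first function below follows the RISC-V dump (two strided stores, each with its own mask). The second is an assumed model of the SVE result, where only the `> 2` mask survives and guards both lanes of the pair; the SVE dump above is cut off before the store itself, so this is an interpretation rather than a transcription. Both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Intended semantics: each lane of the pair has its own condition.  */
static void
store_pair_two_masks (int8_t *dest, const int8_t *src,
                      const uint16_t *cond, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      if (cond[i] <= 7)
        dest[i * 2] = src[i];
      if (cond[i] > 2)
        dest[i * 2 + 1] = src[i];
    }
}

/* Assumed model of the miscompile: the single mask (cond > 2)
   guards both lanes of the pair.  */
static void
store_pair_one_mask (int8_t *dest, const int8_t *src,
                     const uint16_t *cond, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    if (cond[i] > 2)
      {
        dest[i * 2] = src[i];
        dest[i * 2 + 1] = src[i];
      }
}
```

The two functions disagree whenever exactly one of the conditions holds, for example for cond[i] == 0 (only the first store should happen) or cond[i] == 9 (only the second store should happen).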

[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures

2024-02-01 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #8 from Andrew Pinski  ---
Created attachment 57286
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57286&action=edit
Testcase that shows this is wrong code

I reduced the testcase into something which shows it is wrong code too.

[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures

2024-02-01 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

Tamar Christina  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tnfchris at gcc dot gnu.org

--- Comment #7 from Tamar Christina  ---
Yeah, I know it started between g:4e27ba6e2dd85a5ad4751c35270dbd8f277302dd and
g:721f7e2c4e5eed645593258624dd91e6c39f3bd2 but the bisect is hard because some
of the commits produce an ICE instead.

The bisect lands at

commit a739bac402ea5a583e43dbd01c14ebaff317c885 (refs/bisect/bad)
Author: Richard Biener 
Date:   Fri Aug 25 09:42:16 2023 +0200

tree-optimization/111136 - STMT_VINFO_SLP_VECT_ONLY and stores

vect_dissolve_slp_only_groups currently only expects loads, for stores
we have to make sure to mark the dissolved "groups" strided.

PR tree-optimization/111136
* tree-vect-loop.cc (vect_dissolve_slp_only_groups): For
stores force STMT_VINFO_STRIDED_P and also duplicate that
to all elements.

but the previous commit seems to ICE, so I guess this one will have to be
done the hard way.

[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures

2024-01-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |needs-bisection
           Priority|P3                          |P1

[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures

2023-11-23 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
                 CC|                            |pinskia at gcc dot gnu.org
   Last reconfirmed|                            |2023-11-24

--- Comment #6 from Andrew Pinski  ---
Confirmed.

I was going through all of the failures on aarch64 today and noticed this one
still fails.

[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures

2023-08-31 Thread adhemerval.zanella at linaro dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

Adhemerval Zanella  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|DUPLICATE                   |---
             Status|RESOLVED                    |UNCONFIRMED

--- Comment #5 from Adhemerval Zanella  ---
Reopening since this is not a duplicate of bug 111136.  The issue is that
mask_struct_store_4.c generates the very instructions that the test is
checking for:

$ ./gcc/xgcc -Bgcc -march=armv8.2-a+sve -O2 -ftree-vectorize -ffast-math [..]/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_store_4.c -S -o - | grep st2b
        st2b    {z30.b - z31.b}, p7, [x0, x5]
        st2b    {z28.b - z29.b}, p6, [x0, x5]
        st2b    {z28.b - z29.b}, p6, [x0, x5]
        st2b    {z28.b - z29.b}, p7, [x0, x5]
        st2b    {z26.b - z27.b}, p6, [x0, x5]
        st2b    {z26.b - z27.b}, p6, [x0, x5]
        st2b    {z28.b - z29.b}, p7, [x0, x5]
        st2b    {z26.b - z27.b}, p6, [x0, x5]
        st2b    {z26.b - z27.b}, p6, [x0, x5]
        st2b    {z28.b - z29.b}, p7, [x0, x5]
        st2b    {z26.b - z27.b}, p6, [x0, x5]
        st2b    {z26.b - z27.b}, p6, [x0, x5]

[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures

2023-08-28 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
   Target Milestone|---                         |14.0
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #4 from Richard Biener  ---
Dup.

*** This bug has been marked as a duplicate of bug 111136 ***

[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures

2023-08-25 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #3 from David Binderman  ---
Reduced C code seems to be:

struct median_estimator {
  long median;
  long step
} median_diff_ts[];
median_estimator_update_data, median_estimator_update_diff,
median_estimator_update_median, mm_profile_print_i;
median_estimator_update(struct median_estimator *me) {
  if (__builtin_expect(me->step, 0))
me->median = median_estimator_update_data;
  if (median_estimator_update_diff)
me->step = median_estimator_update_median;
}
mm_profile_print() {
  mm_profile_print_i = 1;
  for (; mm_profile_print_i; mm_profile_print_i++)
median_estimator_update(&median_diff_ts[mm_profile_print_i]);
}

[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures

2023-08-25 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #2 from David Binderman  ---
The bug first seems to appear sometime between g:93f803d53b5ccaab
and g:68f7cb6cf9e8b9f2, some 39 commits.

[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures

2023-08-25 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

David Binderman  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dcb314 at hotmail dot com

--- Comment #1 from David Binderman  ---
I see this also, on x86_64, with -O2 -march=znver1.

I will reduce the code.