[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #23 from Richard Biener ---
(In reply to Richard Biener from comment #18)
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 7cf9504398c..8deeecfd4aa 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -1280,8 +1280,11 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
>            && rhs_code.is_tree_code ()
>            && (TREE_CODE_CLASS (tree_code (first_stmt_code))
>                == tcc_comparison)
> -          && (swap_tree_comparison (tree_code (first_stmt_code))
> -              == tree_code (rhs_code)))
> +          && ((swap_tree_comparison (tree_code (first_stmt_code))
> +               == tree_code (rhs_code))
> +              || ((TREE_CODE_CLASS (tree_code (alt_stmt_code))
> +                   == tcc_comparison)
> +                  && rhs_code == alt_stmt_code)))
>          && !(STMT_VINFO_GROUPED_ACCESS (stmt_info)
>               && (first_stmt_code == ARRAY_REF
>                   || first_stmt_code == BIT_FIELD_REF

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 7cf9504398c..e35a3fa 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1519,7 +1522,9 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
   if (alt_stmt_code != ERROR_MARK
       && (!alt_stmt_code.is_tree_code ()
          || (TREE_CODE_CLASS (tree_code (alt_stmt_code)) != tcc_reference
-             && TREE_CODE_CLASS (tree_code (alt_stmt_code)) != tcc_comparison)))
+             && (TREE_CODE_CLASS (tree_code (alt_stmt_code)) != tcc_comparison
+                 || (swap_tree_comparison (tree_code (first_stmt_code))
+                     != tree_code (alt_stmt_code))))))
     {
       *two_operators = true;
     }

is also needed btw. to avoid wrong-code.
I see

t.c:8:26: note: ==> examining statement: pretmp_29 = *_28;
t.c:8:26: missed: unsupported load permutation
t.c:10:30: missed: not vectorized: relevant stmt not supported: pretmp_29 = *_28;
t.c:8:26: note: removing SLP instance operations starting from: .MASK_STORE (_5, 8B, patt_12, pretmp_29);

using -O3 -march=armv8.3-a+sve - it then does

t.c:8:26: missed: unsupported SLP instances
t.c:8:26: note: re-trying with SLP disabled

and _that_ fails then with

t.c:8:26: missed: Not using elementwise accesses due to variable vectorization factor.
t.c:6:1: missed: not vectorized: relevant stmt not supported: .MASK_STORE (_5, 8B, patt_12, pretmp_29);

but the interesting bit is why it fails to handle the SLP case.  That's
possibly because the load isn't a grouped access, we get dr_group_size == 1
and group_size == 2 and nunits is {16, 16} (!repeating_p) and so

      /* We need to construct a separate mask for each vector statement.  */
      unsigned HOST_WIDE_INT const_nunits, const_vf;
      if (!nunits.is_constant (&const_nunits)
          || !vf.is_constant (&const_vf))
        return false;

I'm not sure what that comment means, but supposedly we simply fail to handle
another special case that we could here?  Possibly dr_group_size == 1?
[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #22 from rguenther at suse dot de ---
> On 15.02.2024 at 19:53, tnfchris at gcc dot gnu.org wrote:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156
>
> --- Comment #21 from Tamar Christina ---
> (In reply to Richard Biener from comment #18)
>> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
>> index 7cf9504398c..8deeecfd4aa 100644
>> --- a/gcc/tree-vect-slp.cc
>> +++ b/gcc/tree-vect-slp.cc
>> @@ -1280,8 +1280,11 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
>>            && rhs_code.is_tree_code ()
>>            && (TREE_CODE_CLASS (tree_code (first_stmt_code))
>>                == tcc_comparison)
>> -          && (swap_tree_comparison (tree_code (first_stmt_code))
>> -              == tree_code (rhs_code)))
>> +          && ((swap_tree_comparison (tree_code (first_stmt_code))
>> +               == tree_code (rhs_code))
>> +              || ((TREE_CODE_CLASS (tree_code (alt_stmt_code))
>> +                   == tcc_comparison)
>> +                  && rhs_code == alt_stmt_code)))
>>          && !(STMT_VINFO_GROUPED_ACCESS (stmt_info)
>>               && (first_stmt_code == ARRAY_REF
>>                   || first_stmt_code == BIT_FIELD_REF
>>
>> should get you SLP but:
>>
>> t.c:8:26: note: === vect_slp_analyze_operations ===
>> t.c:8:26: note: ==> examining statement: pretmp_29 = *_28;
>> t.c:8:26: missed: unsupported load permutation
>> t.c:10:30: missed: not vectorized: relevant stmt not supported: pretmp_29 = *_28;
>>
>> t.c:8:26: note: op template: pretmp_29 = *_28;
>> t.c:8:26: note: stmt 0 pretmp_29 = *_28;
>> t.c:8:26: note: stmt 1 pretmp_29 = *_28;
>> t.c:8:26: note: load permutation { 0 0 }
>
> hmm with that applied I get:
>
> sve-mis.c:8:26: note: ==> examining statement: pretmp_29 = *_28;
> sve-mis.c:8:26: note: Vectorizing an unaligned access.
> sve-mis.c:8:26: note: vect_model_load_cost: unaligned supported by hardware.
> sve-mis.c:8:26: note: vect_model_load_cost: inside_cost = 1, prologue_cost = 0.
>
> but it bails out at:
>
> sve-mis.c:8:26: missed: Not using elementwise accesses due to variable vectorization factor.
> sve-mis.c:10:25: missed: not vectorized: relevant stmt not supported: .MASK_STORE (_5, 8B, _27, pretmp_29);
> sve-mis.c:8:26: missed: bad operation or unsupported loop bound.
>
> for me I've used -fno-cost-model and looked at the SVE variant only.
[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #21 from Tamar Christina ---
(In reply to Richard Biener from comment #18)
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 7cf9504398c..8deeecfd4aa 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -1280,8 +1280,11 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
>            && rhs_code.is_tree_code ()
>            && (TREE_CODE_CLASS (tree_code (first_stmt_code))
>                == tcc_comparison)
> -          && (swap_tree_comparison (tree_code (first_stmt_code))
> -              == tree_code (rhs_code)))
> +          && ((swap_tree_comparison (tree_code (first_stmt_code))
> +               == tree_code (rhs_code))
> +              || ((TREE_CODE_CLASS (tree_code (alt_stmt_code))
> +                   == tcc_comparison)
> +                  && rhs_code == alt_stmt_code)))
>          && !(STMT_VINFO_GROUPED_ACCESS (stmt_info)
>               && (first_stmt_code == ARRAY_REF
>                   || first_stmt_code == BIT_FIELD_REF
>
> should get you SLP but:
>
> t.c:8:26: note: === vect_slp_analyze_operations ===
> t.c:8:26: note: ==> examining statement: pretmp_29 = *_28;
> t.c:8:26: missed: unsupported load permutation
> t.c:10:30: missed: not vectorized: relevant stmt not supported: pretmp_29 = *_28;
>
> t.c:8:26: note: op template: pretmp_29 = *_28;
> t.c:8:26: note: stmt 0 pretmp_29 = *_28;
> t.c:8:26: note: stmt 1 pretmp_29 = *_28;
> t.c:8:26: note: load permutation { 0 0 }

hmm with that applied I get:

sve-mis.c:8:26: note: ==> examining statement: pretmp_29 = *_28;
sve-mis.c:8:26: note: Vectorizing an unaligned access.
sve-mis.c:8:26: note: vect_model_load_cost: unaligned supported by hardware.
sve-mis.c:8:26: note: vect_model_load_cost: inside_cost = 1, prologue_cost = 0.

but it bails out at:

sve-mis.c:8:26: missed: Not using elementwise accesses due to variable vectorization factor.
sve-mis.c:10:25: missed: not vectorized: relevant stmt not supported: .MASK_STORE (_5, 8B, _27, pretmp_29);
sve-mis.c:8:26: missed: bad operation or unsupported loop bound.

for me I've used -fno-cost-model and looked at the SVE variant only.
[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

Richard Biener changed:

           What    |Removed  |Added
------------------------------------
         Resolution|---      |FIXED
             Status|ASSIGNED |RESOLVED

--- Comment #20 from Richard Biener ---
fixed.
[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #19 from GCC Commits ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:b312cf21afd62b43fbc5034703e2796b0c3c416d

commit r14-9011-gb312cf21afd62b43fbc5034703e2796b0c3c416d
Author: Richard Biener
Date:   Thu Feb 15 13:41:25 2024 +0100

    tree-optimization/111156 - properly dissolve SLP only groups

    The following fixes the omission of failing to look at pattern stmts
    when we need to dissolve SLP only groups.

            PR tree-optimization/111156
            * tree-vect-loop.cc (vect_dissolve_slp_only_groups): Look at
            the pattern stmt if any.
[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #18 from Richard Biener ---
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 7cf9504398c..8deeecfd4aa 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1280,8 +1280,11 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
           && rhs_code.is_tree_code ()
           && (TREE_CODE_CLASS (tree_code (first_stmt_code))
               == tcc_comparison)
-          && (swap_tree_comparison (tree_code (first_stmt_code))
-              == tree_code (rhs_code)))
+          && ((swap_tree_comparison (tree_code (first_stmt_code))
+               == tree_code (rhs_code))
+              || ((TREE_CODE_CLASS (tree_code (alt_stmt_code))
+                   == tcc_comparison)
+                  && rhs_code == alt_stmt_code)))
         && !(STMT_VINFO_GROUPED_ACCESS (stmt_info)
              && (first_stmt_code == ARRAY_REF
                  || first_stmt_code == BIT_FIELD_REF

should get you SLP but:

t.c:8:26: note: === vect_slp_analyze_operations ===
t.c:8:26: note: ==> examining statement: pretmp_29 = *_28;
t.c:8:26: missed: unsupported load permutation
t.c:10:30: missed: not vectorized: relevant stmt not supported: pretmp_29 = *_28;

t.c:8:26: note: op template: pretmp_29 = *_28;
t.c:8:26: note: stmt 0 pretmp_29 = *_28;
t.c:8:26: note: stmt 1 pretmp_29 = *_28;
t.c:8:26: note: load permutation { 0 0 }
[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #17 from Richard Biener ---
I think the following fixes it, can you verify the runtime (IL looks sane,
but it uses masked scatter stores).

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 9e26b09504d..5a5865c42fc 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2551,7 +2551,8 @@ vect_dissolve_slp_only_groups (loop_vec_info loop_vinfo)
   FOR_EACH_VEC_ELT (datarefs, i, dr)
     {
       gcc_assert (DR_REF (dr));
-      stmt_vec_info stmt_info = loop_vinfo->lookup_stmt (DR_STMT (dr));
+      stmt_vec_info stmt_info
+       = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (DR_STMT (dr)));

       /* Check if the load is a part of an interleaving chain.  */
       if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

Richard Biener changed:

           What    |Removed                       |Added
---------------------------------------------------------------------------
             Status|NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org

--- Comment #16 from Richard Biener ---
OK, so the missed SLP is a known one:

t.c:8:26: note: starting SLP discovery for node 0x5d42840
t.c:8:26: note: Build SLP for _27 = _3 <= 7;
t.c:8:26: note: precomputed vectype: vector([8,8])
t.c:8:26: note: nunits = [8,8]
t.c:8:26: note: Build SLP for _14 = _3 > 2;
t.c:8:26: note: precomputed vectype: vector([8,8])
t.c:8:26: note: nunits = [8,8]
t.c:8:26: missed: Build SLP failed: different operation in stmt _14 = _3 > 2;
t.c:8:26: missed: original stmt _27 = _3 <= 7;

I'm not sure we can do this with a single vector stmt but of course using
'two_operator' support might be possible here (do both > and <= and then
blend the result).

I see we end up using .MASK_STORE_LANES in the end but we're not using
load-lanes.

t.c:8:26: note: ==> examining pattern statement: .MASK_STORE (_5, 8B, patt_12, pretmp_29);
t.c:8:26: note: vect_is_simple_use: operand () _27, type of def: internal
t.c:8:26: note: vect_is_simple_use: vectype vector([16,16])
t.c:8:26: note: vect_is_simple_use: operand *_28, type of def: internal
t.c:8:26: note: vect_is_simple_use: vectype vector([16,16]) signed char
t.c:8:26: missed: cannot use vec_mask_len_store_lanes
t.c:8:26: note: can use vec_mask_store_lanes
t.c:8:26: note: vect_is_simple_use: operand *_28, type of def: internal
t.c:8:26: missed: cannot use vec_mask_len_store_lanes
t.c:8:26: note: can use vec_mask_store_lanes
...

t.c:8:26: note: ==> examining pattern statement: .MASK_STORE (_9, 8B, patt_4, pretmp_29);
t.c:8:26: note: vect_is_simple_use: operand () _14, type of def: internal
t.c:8:26: note: vect_is_simple_use: vectype vector([16,16])
t.c:8:26: note: vect_is_simple_use: operand *_28, type of def: internal
t.c:8:26: note: vect_is_simple_use: vectype vector([16,16]) signed char
t.c:8:26: missed: cannot use vec_mask_len_store_lanes
t.c:8:26: note: can use vec_mask_store_lanes
t.c:8:26: missed: cannot use vec_mask_len_store_lanes
t.c:8:26: note: can use vec_mask_store_lanes

and somehow transform decides to put the two stores together again, probably
missing to verify the masks are the same.  I'll dig a bit more after lunch.
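[Editorial note: the 'two_operator' blend described in the comment above can be modeled in scalar C. This is a sketch of the idea only; the function name, lane layout, and types are illustrative assumptions, not GCC internals or the vectorizer's actual implementation.]

```c
#include <assert.h>

/* Scalar model of the "two_operator" idea: rather than requiring one
   comparison kind per SLP lane, compute BOTH x <= 7 and x > 2 for every
   element and blend the results by lane parity, matching the interleaved
   even/odd store pattern in the testcase.  */
static void
blend_two_operators (const int *x, unsigned char *mask, int nlanes)
{
  for (int i = 0; i < nlanes; ++i)
    {
      unsigned char le7 = x[i / 2] <= 7;  /* first operation  */
      unsigned char gt2 = x[i / 2] > 2;   /* second operation */
      /* Even lanes take the <= 7 result, odd lanes the > 2 result.  */
      mask[i] = (i & 1) ? gt2 : le7;
    }
}
```

A real SLP two_operator node does the same thing with two vector comparisons and a vector permute/blend instead of this per-lane select.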
[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #15 from Tamar Christina ---
and just -O3 -march=armv8-a+sve
[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #14 from Tamar Christina ---
(In reply to Richard Biener from comment #13)
> I didn't add STMT_VINFO_SLP_VECT_ONLY, I'm quite sure we can now do both SLP
> of masked loads and stores, so yes, STMT_VINFO_SLP_VECT_ONLY (when we formed
> a DR group of stmts we cannot combine without SLP as the masks are not equal)
> should be set for both loads and stores.
>
> The can_group_stmts_p checks as present seem correct here (but the dump
> should not say "Load" but maybe "Access")

I guess I'm wondering because of this usage:

      /* Check that the data-refs have same first location (except init)
         and they are both either store or load (not load and store,
         not masked loads or stores).  */
      if (DR_IS_READ (dra) != DR_IS_READ (drb)
          || data_ref_compare_tree (DR_BASE_ADDRESS (dra),
                                    DR_BASE_ADDRESS (drb)) != 0
          || data_ref_compare_tree (DR_OFFSET (dra), DR_OFFSET (drb)) != 0
          || !can_group_stmts_p (stmtinfo_a, stmtinfo_b, true))
        break;

We don't exit there now for non-SLP.

> So what's the testcase comment#9 talks about?

You should be able to reproduce it with:

---
typedef __SIZE_TYPE__ size_t;
typedef signed char int8_t;
typedef unsigned short uint16_t;

void __attribute__((noinline, noclone))
test_i8_i8_i16_2(int8_t *__restrict dest, int8_t *__restrict src,
                 uint16_t *__restrict cond, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      if (cond[i] < 8)
        dest[i * 2] = src[i];
      if (cond[i] > 2)
        dest[i * 2 + 1] = src[i];
    }
}

void __attribute__((noinline, noclone))
test_i8_i8_i16_2_1(volatile int8_t * dest, volatile int8_t * src,
                   volatile uint16_t * cond, size_t n)
{
#pragma GCC novector
  for (size_t i = 0; i < n; ++i)
    {
      if (cond[i] < 8)
        dest[i * 2] = src[i];
      if (cond[i] > 2)
        dest[i * 2 + 1] = src[i];
    }
}

#define size 16

int8_t srcarray[size];
uint16_t maskarray[size];
int8_t destarray[size*2];
int8_t destarray1[size*2];

int main()
{
#pragma GCC novector
  for(int i = 0; i < size; i++)
    {
      maskarray[i] = i == 10 ? 0 : (i == 5 ? 9 : (2*i) & 0xff);
      srcarray[i] = i;
    }

#pragma GCC novector
  for(int i = 0; i < size*2; i++)
    {
      destarray[i] = i;
      destarray1[i] = i;
    }

  test_i8_i8_i16_2(destarray, srcarray, maskarray, size);
  test_i8_i8_i16_2_1(destarray1, srcarray, maskarray, size);

#pragma GCC novector
  for(int i = 0; i < size*2; i++)
    {
      if (destarray[i] != destarray1[i])
        __builtin_abort();
    }
}
---

since really only one of the functions needs to vectorize.
[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #13 from Richard Biener ---
I didn't add STMT_VINFO_SLP_VECT_ONLY, I'm quite sure we can now do both SLP
of masked loads and stores, so yes, STMT_VINFO_SLP_VECT_ONLY (when we formed
a DR group of stmts we cannot combine without SLP as the masks are not equal)
should be set for both loads and stores.

The can_group_stmts_p checks as present seem correct here (but the dump
should not say "Load" but maybe "Access").

So it looks like the issue is with "late" deciding we can't actually do the
masked SLP store (why?) and the odd "vect_dissolve_slp_only_groups" and then
somehow botching strided store code-gen, which likely doesn't expect masks or
should have disabled full masking?

I'll note that we don't support single-element interleaving for stores, so
vect_analyze_group_access_1 would have fallen back to STMT_VINFO_STRIDED_P.
But as said, maybe that somehow misses to disable loop masking then in
vect_analyze_loop_operations?

So what's the testcase comment #9 talks about?
[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

Tamar Christina changed:

           What    |Removed |Added
------------------------------------
                 CC|        |rguenth at gcc dot gnu.org

--- Comment #12 from Tamar Christina ---
The commit that caused it is:

commit g:a1558e9ad856938f165f838733955b331ebbec09
Author: Richard Biener
Date:   Wed Aug 23 14:28:26 2023 +0200

    tree-optimization/111115 - SLP of masked stores

    The following adds the capability to do SLP on .MASK_STORE, I do not
    plan to add interleaving support.

specifically this change:

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 3e9a284666c..a2caf6cb1c7 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -3048,8 +3048,7 @@ can_group_stmts_p (stmt_vec_info stmt1_info, stmt_vec_info stmt2_info,
         like those created by build_mask_conversion.  */
       tree mask1 = gimple_call_arg (call1, 2);
       tree mask2 = gimple_call_arg (call2, 2);
-      if (!operand_equal_p (mask1, mask2, 0)
-         && (ifn == IFN_MASK_STORE || !allow_slp_p))
+      if (!operand_equal_p (mask1, mask2, 0) && !allow_slp_p)
        {
          mask1 = strip_conversion (mask1);
          if (!mask1)

With the change it now incorrectly thinks that the two masks (a <= 7, a > 2)
are the same, which is why one of the masks goes missing.

Part of it is that the boolean is used in a weird way.  During
vect_analyze_data_ref_accesses, where this difference is important, we pass
true in the initial check, but the || before made it so that we checked the
MASK_STOREs still.  Now it means during analysis we never check.

Later on in the same method we check it again but with false as the argument
for determining STMT_VINFO_SLP_VECT_ONLY.

The debug statement there is weird btw, as it says:

          if (dump_enabled_p () && STMT_VINFO_SLP_VECT_ONLY (stmtinfo_a))
            dump_printf_loc (MSG_NOTE, vect_location,
                             "Load suitable for SLP vectorization only.\n");

but as far as I can see, stmtinfo_a can be a store too, based on the checks
for DR_IS_READ (dra) just slightly higher up.

The patch that added this check (g:997636716c5dde7d59d026726a6f58918069f122)
says it's because the vectorizer doesn't support SLP of masked loads, and I
can't tell if we do now.  If we do, the boolean should be dropped; if we
don't, we probably need the check back to allow the check for stores.

It looks like this check is being used to disable STMT_VINFO_SLP_VECT_ONLY
for loads, which is a bit counter-intuitive and feels like a hack rather
than just doing:

       STMT_VINFO_SLP_VECT_ONLY (stmtinfo_a)
-        = !can_group_stmts_p (stmtinfo_a, stmtinfo_b, false);
+        = !can_group_stmts_p (stmtinfo_a, stmtinfo_b)
+          && DR_IS_WRITE (dra);

So I think the boolean should be dropped and we should just reject loads for
STMT_VINFO_SLP_VECT_ONLY...  This also seems to give much better codegen...
in any case, richi?
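[Editorial note: the boolean structure being debated here can be sketched as a toy model. All names and the struct below are hypothetical stand-ins for GCC's stmt_vec_info/data-reference machinery, which actually inspects gimple statements; only the decision logic is modeled.]

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the grouping predicate: two masked accesses of the same
   kind can always be grouped when SLP is allowed; without SLP their
   masks must be equal.  */
struct access_model
{
  bool is_write;  /* models DR_IS_WRITE */
  int mask_id;    /* stands in for operand_equal_p on the mask operands */
};

static bool
can_group_model (struct access_model a, struct access_model b,
                 bool allow_slp_p)
{
  if (a.is_write != b.is_write)
    return false;
  return allow_slp_p || a.mask_id == b.mask_id;
}

/* The suggestion above: flag a group as SLP-vect-only for writes only.  */
static bool
slp_vect_only_model (struct access_model a, struct access_model b)
{
  return !can_group_model (a, b, false) && a.is_write;
}
```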
[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #11 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #10)
> Note I think gcc.target/aarch64/sve/mask_struct_load_3_run.c is the runtime
> failure I mentioned.

And gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c (when tested with
-march=armv9-a, which I added to my testing recently).
[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #10 from Andrew Pinski ---
Note I think gcc.target/aarch64/sve/mask_struct_load_3_run.c is the runtime
failure I mentioned.
[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #9 from Andrew Pinski ---
Note I think GCC should be able to vectorize this loop but it goes wrong.
On SVE the <= 7 part gets lost:
```
vect__3.12_54 = .MASK_LOAD (_48, 16B, loop_mask_52);
_32 = cond_17(D) + POLY_INT_CST [16, 16];
_25 = [(uint16_t *)_32 + ivtmp_77 * 2];
vect__3.13_56 = .MASK_LOAD (_25, 16B, loop_mask_53);
_1 = [(int8_t *)src_18(D) + ivtmp_77 * 1];
vect_pretmp_29.16_60 = .MASK_LOAD (_1, 8B, loop_mask_59);
mask__14.19_66 = vect__3.12_54 > { 2, ... };
mask__14.19_67 = vect__3.13_56 > { 2, ... };
mask_patt_4.20_68 = VEC_PACK_TRUNC_EXPR ;
vect_array.23 ={v} {CLOBBER};
vect_array.23[0] = vect_pretmp_29.16_60;
vect_array.23[1] = vect_pretmp_29.16_60;
vec_mask_and_74 = loop_mask_59 & mask_patt_4.20_68;
_2 = ivtmp_77 * 2;
_3 = [(int8_t *)dest_19(D) + _2 * 1];
```
But RISCV is able to vectorize it correctly:
```
vect__3.12_52 = .MASK_LEN_LOAD (vectp_cond.10_13, 16B, { -1, ... }, _72, 0);
vect_pretmp_29.15_56 = .MASK_LEN_LOAD (vectp_src.13_54, 8B, { -1, ... }, _72, 0);
mask__27.16_58 = vect__3.12_52 <= { 7, ... };
.MASK_LEN_SCATTER_STORE (vectp_dest.17_60, { 0, 2, 4, ... }, 1, vect_pretmp_29.15_56, mask__27.16_58, _72, 0);
mask__14.19_64 = vect__3.12_52 > { 2, ... };
.MASK_LEN_SCATTER_STORE (vectp_dest.20_67, { 0, 2, 4, ... }, 1, vect_pretmp_29.15_56, mask__14.19_64, _72, 0);
```
By using 2 stores and scatter here.
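[Editorial note: semantically the RISC-V code-gen above is just two independent masked stride-2 stores. A scalar model of what the two scatter stores compute, using the conditions from the original testcase (the function name is made up for illustration):]

```c
#include <assert.h>

/* Scalar model of the two masked scatter stores in the RISC-V IL above:
   one conditional store to the even dest elements, one to the odd ones,
   each under its own mask.  Illustrative only.  */
static void
scatter_store_model (signed char *dest, const signed char *src,
                     const unsigned short *cond, int n)
{
  for (int i = 0; i < n; ++i)
    if (cond[i] < 8)            /* mask for the stride-2 store at dest     */
      dest[i * 2] = src[i];
  for (int i = 0; i < n; ++i)
    if (cond[i] > 2)            /* mask for the stride-2 store at dest + 1 */
      dest[i * 2 + 1] = src[i];
}
```

This split into two independently-masked stores is exactly what the lost <= 7 mask breaks in the SVE version.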
[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #8 from Andrew Pinski ---
Created attachment 57286
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57286&action=edit
Testcase that shows this is wrong code

I reduced the testcase into something which shows it is wrong code too.
[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

Tamar Christina changed:

           What    |Removed |Added
------------------------------------
                 CC|        |tnfchris at gcc dot gnu.org

--- Comment #7 from Tamar Christina ---
Yeah, I know it started between g:4e27ba6e2dd85a5ad4751c35270dbd8f277302dd
and g:721f7e2c4e5eed645593258624dd91e6c39f3bd2, but the bisect is hard
because some of the commits produce an ICE instead.

The bisect lands at

commit a739bac402ea5a583e43dbd01c14ebaff317c885 (refs/bisect/bad)
Author: Richard Biener
Date:   Fri Aug 25 09:42:16 2023 +0200

    tree-optimization/111136 - STMT_VINFO_SLP_VECT_ONLY and stores

    vect_dissolve_slp_only_groups currently only expects loads, for stores
    we have to make sure to mark the dissolved "groups" strided.

            PR tree-optimization/111136
            * tree-vect-loop.cc (vect_dissolve_slp_only_groups): For stores
            force STMT_VINFO_STRIDED_P and also duplicate that to all
            elements.

but the previous commit seems to be an ICE? so I guess this one will have to
be done the hard way.
[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

Richard Biener changed:

           What    |Removed |Added
------------------------------------
           Keywords|        |needs-bisection
           Priority|P3      |P1
[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

Andrew Pinski changed:

           What          |Removed     |Added
----------------------------------------------------------------
             Status      |UNCONFIRMED |NEW
     Ever confirmed      |0           |1
                 CC      |            |pinskia at gcc dot gnu.org
   Last reconfirmed      |            |2023-11-24

--- Comment #6 from Andrew Pinski ---
Confirmed.  I was going through all of the failures on aarch64 today and
noticed this one still fails.
[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

Adhemerval Zanella changed:

           What    |Removed   |Added
--------------------------------------
         Resolution|DUPLICATE |---
             Status|RESOLVED  |UNCONFIRMED

--- Comment #5 from Adhemerval Zanella ---
Reopening since this is not a duplicate of bug 111136.  The issue is
mask_struct_store_4.c generates the very instructions that the test is
checking:

$ ./gcc/xgcc -Bgcc -march=armv8.2-a+sve -O2 -ftree-vectorize -ffast-math \
    [..]/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_store_4.c -S -o - | grep st2b
        st2b    {z30.b - z31.b}, p7, [x0, x5]
        st2b    {z28.b - z29.b}, p6, [x0, x5]
        st2b    {z28.b - z29.b}, p6, [x0, x5]
        st2b    {z28.b - z29.b}, p7, [x0, x5]
        st2b    {z26.b - z27.b}, p6, [x0, x5]
        st2b    {z26.b - z27.b}, p6, [x0, x5]
        st2b    {z28.b - z29.b}, p7, [x0, x5]
        st2b    {z26.b - z27.b}, p6, [x0, x5]
        st2b    {z26.b - z27.b}, p6, [x0, x5]
        st2b    {z28.b - z29.b}, p7, [x0, x5]
        st2b    {z26.b - z27.b}, p6, [x0, x5]
        st2b    {z26.b - z27.b}, p6, [x0, x5]
[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

Richard Biener changed:

           What        |Removed     |Added
--------------------------------------------
         Resolution    |---         |DUPLICATE
   Target Milestone    |---         |14.0
             Status    |UNCONFIRMED |RESOLVED

--- Comment #4 from Richard Biener ---
Dup.

*** This bug has been marked as a duplicate of bug 111136 ***
[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #3 from David Binderman ---
Reduced C code seems to be:

struct median_estimator {
  long median;
  long step
} median_diff_ts[];
median_estimator_update_data, median_estimator_update_diff,
    median_estimator_update_median, mm_profile_print_i;
median_estimator_update(struct median_estimator *me) {
  if (__builtin_expect(me->step, 0))
    me->median = median_estimator_update_data;
  if (median_estimator_update_diff)
    me->step = median_estimator_update_median;
}
mm_profile_print() {
  mm_profile_print_i = 1;
  for (; mm_profile_print_i; mm_profile_print_i++)
    median_estimator_update(&median_diff_ts[mm_profile_print_i]);
}
[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #2 from David Binderman ---
The bug first seems to appear sometime between g:93f803d53b5ccaab and
g:68f7cb6cf9e8b9f2, some 39 commits.
[Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

David Binderman changed:

           What    |Removed |Added
------------------------------------
                 CC|        |dcb314 at hotmail dot com

--- Comment #1 from David Binderman ---
I see this also, on x86_64, with -O2 -march=znver1.  I will reduce the code.