[Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754 --- Comment #15 from prathamesh3492 at gcc dot gnu.org --- Sorry for the regression, and thanks for the prompt fix!
[Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754 --- Comment #14 from GCC Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:e6c01334ccfca8bc748c8de90ba2a636d1662490 commit r14-5902-ge6c01334ccfca8bc748c8de90ba2a636d1662490 Author: Jakub Jelinek Date: Tue Nov 28 10:16:47 2023 +0100 testsuite: Fix up pr111754.c test On Tue, Nov 28, 2023 at 03:56:47PM +0800, juzhe.zh...@rivai.ai wrote: > Hi, there is a regression in RISC-V caused by this patch: > > FAIL: gcc.dg/vect/pr111754.c -flto -ffat-lto-objects scan-tree-dump optimized "return { 0.0, 9.0e\\+0, 0.0, 0.0 }" > FAIL: gcc.dg/vect/pr111754.c scan-tree-dump optimized "return { 0.0, 9.0e\\+0, 0.0, 0.0 }" > > I have checked the dump is : > F foo (F a, F b) > { >[local count: 1073741824]: >= { 0.0, 9.0e+0, 0.0, 0.0 }; > return ; > > } > > The dump IR seems reasonable to me. > I wonder whether we should walk around in RISC-V backend to generate the same IR as ARM SVE ? > Or we should adjust the test ? Note, the test also FAILs on i686-linux (but not e.g. on x86_64-linux): /home/jakub/src/gcc/obj67/gcc/xgcc -B/home/jakub/src/gcc/obj67/gcc/ /home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/pr111754.c -fdiagnostics-plain-output -O2 -fdump-tree-optimized -S +-o pr111754.s /home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/pr111754.c: In function 'foo': /home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/pr111754.c:7:1: warning: SSE vector return without SSE enabled changes the ABI [-Wpsabi] /home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/pr111754.c:6:3: note: the ABI for passing parameters with 16-byte alignment has changed in GCC 4.6 /home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/pr111754.c:6:3: warning: SSE vector argument without SSE enabled changes the ABI [-Wpsabi] FAIL: gcc.dg/vect/pr111754.c (test for excess errors) Excess errors: /home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/pr111754.c:7:1: warning: SSE vector return without SSE enabled changes the ABI [-Wpsabi] /home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/pr111754.c:6:3: warning: SSE vector argument without SSE enabled changes the ABI [-Wpsabi] PASS: gcc.dg/vect/pr111754.c scan-tree-dump-not optimized "VEC_PERM_EXPR" FAIL: gcc.dg/vect/pr111754.c scan-tree-dump optimized "return { 0.0, 9.0e\\+0, 0.0, 0.0 }" So, I think it is wrong to specify /* { dg-options "-O2 -fdump-tree-optimized" } */ in the test, should be dg-additional-options instead, so that it gets the implied vector compilation options e.g. for i686-linux (-msse2 in that case at least), question is if -Wno-psabi should be added as well or not, and certainly the scan-tree-dump needs to be guarded by appropriate vect_* effective target (but dunno which, one which asserts support for V4SFmode and returning it). Alternatively, perhaps don't check optimized dump but some earlier one before generic vector lowering, then hopefully it could match on all targets? Maybe with the = ... vs. return ... variants. 2023-11-28 Jakub Jelinek PR middle-end/111754 * gcc.dg/vect/pr111754.c: Use dg-additional-options rather than dg-options, add -Wno-psabi and use -fdump-tree-forwprop1 rather than -fdump-tree-optimized. Scan forwprop1 dump rather than optimized and scan for either direct return or setting of to the vector.
[Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754 prathamesh3492 at gcc dot gnu.org changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #13 from prathamesh3492 at gcc dot gnu.org --- Fixed.
[Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754 --- Comment #12 from GCC Commits --- The master branch has been updated by Prathamesh Kulkarni : https://gcc.gnu.org/g:2065438db4ac13af7e0de2f934959413647f74a7 commit r14-5890-g2065438db4ac13af7e0de2f934959413647f74a7 Author: Prathamesh Kulkarni Date: Mon Nov 27 22:40:49 2023 +0530 PR111754: Rework encoding of result for VEC_PERM_EXPR with constant input vectors. gcc/ChangeLog: PR middle-end/111754 * fold-const.cc (fold_vec_perm_cst): Set result's encoding to sel's encoding, and set res_nelts_per_pattern to 2 if sel contains stepped sequence but input vectors do not. (test_nunits_min_2): New test Case 8. (test_nunits_min_4): New tests Case 8 and Case 9. gcc/testsuite/ChangeLog: PR middle-end/111754 * gcc.target/aarch64/sve/slp_3.c: Adjust code-gen. * gcc.target/aarch64/sve/slp_4.c: Likewise. * gcc.dg/vect/pr111754.c: New test. Co-authored-by: Richard Sandiford
[Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754 Sam James changed: What|Removed |Added CC||sjames at gcc dot gnu.org Keywords|needs-bisection | --- Comment #11 from Sam James --- (In reply to Richard Sandiford from comment #10) > Yeah, the problem went latent after the fix for PR111648, [...] OK, removing needs-bisection then.
[Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754 --- Comment #10 from Richard Sandiford --- Yeah, the problem went latent after the fix for PR111648, but the underlying problem is still there. Prathamesh is working on a fix for that. See the thread starting at https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633784.html for discussion.
[Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754 --- Comment #9 from Andrew Pinski --- The ICE seems to be gone, and the generated code looks correct.
[Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754 --- Comment #8 from rguenther at suse dot de --- > Am 10.10.2023 um 16:07 schrieb prathamesh3492 at gcc dot gnu.org > : > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754 > > --- Comment #6 from prathamesh3492 at gcc dot gnu.org --- > (In reply to rguent...@suse.de from comment #4) >>> On Tue, 10 Oct 2023, prathamesh3492 at gcc dot gnu.org wrote: >>> >>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754 >>> >>> --- Comment #3 from prathamesh3492 at gcc dot gnu.org --- >>> The issue is that we only support integral vector types in >>> fold_vec_perm_cst, >>> but fail to check for the same before calling it from fold_vec_perm. >>> The following tweak fixes the ICE: >>> >>> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc >>> index 4f8561509ff..a29a8af6d2f 100644 >>> --- a/gcc/fold-const.cc >>> +++ b/gcc/fold-const.cc >>> @@ -10801,7 +10801,8 @@ fold_vec_perm (tree type, tree arg0, tree arg1, >>> const >>> vec_perm_indices ) >>> return NULL_TREE; >>> >>> if (TREE_CODE (arg0) == VECTOR_CST >>> - && TREE_CODE (arg1) == VECTOR_CST) >>> + && TREE_CODE (arg1) == VECTOR_CST >>> + && INTEGRAL_TYPE_P (TREE_TYPE (type))) >>> return fold_vec_perm_cst (type, arg0, arg1, sel); >> >> Huh, that looks wrong. I fail to see how the element type matters >> at all. > > IIUC, the element type matters for VLA folding when sel has a stepped sequence > because in that case we need to derive elements from the encoding using > vector_cst_elt / vector_cst_int_elt, and it gets enforced for VLS vectors too > because they are handled in unified manner in fold_vec_perm_cst. > > Another possible approach is to use "VLS exception" in fold_vec_perm_cst to > encode all the elements: > res_npatterns = res_nelts; > res_nelts_per_patterns = 1 > just like we do if valid_mask_for_fold_vec_perm_cst_p returns false. > > Does the following fix look OK instead ? I think so, the important part is to fold for VLS types. I defer to Richard S. For whether we can do better for VLA here > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > index 4f8561509ff..356eb052fbc 100644 > --- a/gcc/fold-const.cc > +++ b/gcc/fold-const.cc > @@ -10642,6 +10642,11 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree > arg1, > if (sel_nelts_per_pattern < 3) > return true; > > + /* If SEL contains stepped sequence, ensure that we are dealing with > + integral vector_cst. */ > + if (!INTEGRAL_TYPE_P (TREE_TYPE (arg0))) > +return false; > + > for (unsigned pattern = 0; pattern < sel_npatterns; pattern++) > { > poly_uint64 a1 = sel[pattern + sel_npatterns]; > > -- > You are receiving this mail because: > You are on the CC list for the bug.
[Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754 --- Comment #7 from prathamesh3492 at gcc dot gnu.org --- (In reply to Richard Biener from comment #5) > It seems we have VECTOR_CST_NELTS_PER_PATTERN ({ 9.0e+0, 0.0, 0.0, 0.0 }) > 2 and VECTOR_CST_NPATTERNS == 1. And the selector { 1, 0, 1, 2 } has > npatterns == 1 and nelts-per-pattern == 3. > > /* (1) If SEL is a suitable mask as determined by > valid_mask_for_fold_vec_perm_cst_p, then: > res_npatterns = max of npatterns between ARG0, ARG1, and SEL > res_nelts_per_pattern = max of nelts_per_pattern between > ARG0, ARG1 and SEL. > (2) If SEL is not a suitable mask, and TYPE is VLS then: > res_npatterns = nelts in result vector. > res_nelts_per_pattern = 1. > This exception is made so that VLS ARG0, ARG1 and SEL work as before. > */ > if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason)) > { > res_npatterns > = std::max (VECTOR_CST_NPATTERNS (arg0), > std::max (VECTOR_CST_NPATTERNS (arg1), > sel.encoding ().npatterns ())); > > res_nelts_per_pattern > = std::max (VECTOR_CST_NELTS_PER_PATTERN (arg0), > std::max (VECTOR_CST_NELTS_PER_PATTERN (arg1), > sel.encoding ().nelts_per_pattern ())); > > res_nelts = res_npatterns * res_nelts_per_pattern; > > this seems to be a case that doesn't fit, so the fix needs to be to > valid_mask_for_fold_vec_perm_cst_p which really looks a bit > unwieldly. valid_mask_for_fold_vec_perm_cst_p returns incorrectly true here, which is being addressed in PR111648 patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631926.html Even if the vectors had integral element type: arg0 = arg1 = (v4si){ 9, 0, 0, 0 } // encoded as {9, 0, ...} and sel = { 1, 0, 1, 2 } // encoded as {1, 0, 1, ...} The pattern in sel {1, 0, 1, ...} would choose elements from arg0, and res would have incorrect encoding with step = -9: res = { arg0[1], arg0[0], arg0[1], ... } = { 0, 9, 0, ... } And res[3] will be incorrectly computed as -9 instead of arg0[2]. However, for floating element types, even if encoding is correct, I assume it will still ICE when trying to derive elements not present in encoding since poly_int_cst can only deal with integral elements ? > > An assert that res_nelts is power-of-two would be nice to add. Sorry, I don't understand. res_nelts for VLA need not be power of 2, since res_nelts_per_pattern can be 3. The encoding for res is chosen to be max of npatterns and max of nelts_per_pattern between arg0, arg1, and sel. Thanks, Prathamesh
[Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754 --- Comment #6 from prathamesh3492 at gcc dot gnu.org --- (In reply to rguent...@suse.de from comment #4) > On Tue, 10 Oct 2023, prathamesh3492 at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754 > > > > --- Comment #3 from prathamesh3492 at gcc dot gnu.org --- > > The issue is that we only support integral vector types in > > fold_vec_perm_cst, > > but fail to check for the same before calling it from fold_vec_perm. > > The following tweak fixes the ICE: > > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > > index 4f8561509ff..a29a8af6d2f 100644 > > --- a/gcc/fold-const.cc > > +++ b/gcc/fold-const.cc > > @@ -10801,7 +10801,8 @@ fold_vec_perm (tree type, tree arg0, tree arg1, > > const > > vec_perm_indices ) > > return NULL_TREE; > > > >if (TREE_CODE (arg0) == VECTOR_CST > > - && TREE_CODE (arg1) == VECTOR_CST) > > + && TREE_CODE (arg1) == VECTOR_CST > > + && INTEGRAL_TYPE_P (TREE_TYPE (type))) > > return fold_vec_perm_cst (type, arg0, arg1, sel); > > Huh, that looks wrong. I fail to see how the element type matters > at all. IIUC, the element type matters for VLA folding when sel has a stepped sequence because in that case we need to derive elements from the encoding using vector_cst_elt / vector_cst_int_elt, and it gets enforced for VLS vectors too because they are handled in unified manner in fold_vec_perm_cst. Another possible approach is to use "VLS exception" in fold_vec_perm_cst to encode all the elements: res_npatterns = res_nelts; res_nelts_per_patterns = 1 just like we do if valid_mask_for_fold_vec_perm_cst_p returns false. Does the following fix look OK instead ? diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc index 4f8561509ff..356eb052fbc 100644 --- a/gcc/fold-const.cc +++ b/gcc/fold-const.cc @@ -10642,6 +10642,11 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, if (sel_nelts_per_pattern < 3) return true; + /* If SEL contains stepped sequence, ensure that we are dealing with + integral vector_cst. */ + if (!INTEGRAL_TYPE_P (TREE_TYPE (arg0))) +return false; + for (unsigned pattern = 0; pattern < sel_npatterns; pattern++) { poly_uint64 a1 = sel[pattern + sel_npatterns];
[Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754 Richard Biener changed: What|Removed |Added CC||rsandifo at gcc dot gnu.org --- Comment #5 from Richard Biener --- It seems we have VECTOR_CST_NELTS_PER_PATTERN ({ 9.0e+0, 0.0, 0.0, 0.0 }) 2 and VECTOR_CST_NPATTERNS == 1. And the selector { 1, 0, 1, 2 } has npatterns == 1 and nelts-per-pattern == 3. /* (1) If SEL is a suitable mask as determined by valid_mask_for_fold_vec_perm_cst_p, then: res_npatterns = max of npatterns between ARG0, ARG1, and SEL res_nelts_per_pattern = max of nelts_per_pattern between ARG0, ARG1 and SEL. (2) If SEL is not a suitable mask, and TYPE is VLS then: res_npatterns = nelts in result vector. res_nelts_per_pattern = 1. This exception is made so that VLS ARG0, ARG1 and SEL work as before. */ if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason)) { res_npatterns = std::max (VECTOR_CST_NPATTERNS (arg0), std::max (VECTOR_CST_NPATTERNS (arg1), sel.encoding ().npatterns ())); res_nelts_per_pattern = std::max (VECTOR_CST_NELTS_PER_PATTERN (arg0), std::max (VECTOR_CST_NELTS_PER_PATTERN (arg1), sel.encoding ().nelts_per_pattern ())); res_nelts = res_npatterns * res_nelts_per_pattern; this seems to be a case that doesn't fit, so the fix needs to be to valid_mask_for_fold_vec_perm_cst_p which really looks a bit unwieldly. An assert that res_nelts is power-of-two would be nice to add.
[Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754 --- Comment #4 from rguenther at suse dot de --- On Tue, 10 Oct 2023, prathamesh3492 at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754 > > --- Comment #3 from prathamesh3492 at gcc dot gnu.org --- > The issue is that we only support integral vector types in fold_vec_perm_cst, > but fail to check for the same before calling it from fold_vec_perm. > The following tweak fixes the ICE: > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > index 4f8561509ff..a29a8af6d2f 100644 > --- a/gcc/fold-const.cc > +++ b/gcc/fold-const.cc > @@ -10801,7 +10801,8 @@ fold_vec_perm (tree type, tree arg0, tree arg1, const > vec_perm_indices ) > return NULL_TREE; > >if (TREE_CODE (arg0) == VECTOR_CST > - && TREE_CODE (arg1) == VECTOR_CST) > + && TREE_CODE (arg1) == VECTOR_CST > + && INTEGRAL_TYPE_P (TREE_TYPE (type))) > return fold_vec_perm_cst (type, arg0, arg1, sel); Huh, that looks wrong. I fail to see how the element type matters at all.
[Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754 --- Comment #3 from prathamesh3492 at gcc dot gnu.org --- The issue is that we only support integral vector types in fold_vec_perm_cst, but fail to check for the same before calling it from fold_vec_perm. The following tweak fixes the ICE: diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc index 4f8561509ff..a29a8af6d2f 100644 --- a/gcc/fold-const.cc +++ b/gcc/fold-const.cc @@ -10801,7 +10801,8 @@ fold_vec_perm (tree type, tree arg0, tree arg1, const vec_perm_indices ) return NULL_TREE; if (TREE_CODE (arg0) == VECTOR_CST - && TREE_CODE (arg1) == VECTOR_CST) + && TREE_CODE (arg1) == VECTOR_CST + && INTEGRAL_TYPE_P (TREE_TYPE (type))) return fold_vec_perm_cst (type, arg0, arg1, sel); /* For fall back case, we want to ensure we have VLS vectors and results in the following .optimized dump: F bar (F a, F b) { F c; [local count: 1073741824]: c_2 = VEC_PERM_EXPR ; __builtin_logbl (0.0); return c_2; } F foo () { [local count: 1073741824]: __builtin_logbl (0.0); return { 0.0, 9.0e+0, 0.0, 0.0 }; }
[Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754 --- Comment #2 from prathamesh3492 at gcc dot gnu.org --- Hi, Sorry for the breakage, will take a look. Thanks, Prathamesh
[Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2023-10-10 Component|rtl-optimization|middle-end Ever confirmed|0 |1 Target Milestone|--- |14.0 Priority|P3 |P1 Keywords||needs-bisection --- Comment #1 from Richard Biener --- (gdb) p debug_tree (exp) unit-size align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x76d672a0 precision:32 pointer_to_this > sizes-gimplified V4SF size unit-size align:128 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x76e99b28 nunits:4> constant npatterns:1 nelts-per-pattern:3 elt:0: constant 0.0> elt:1: constant 9.0e+0> elt:2: > Confirmed. That's an odd vector mode 'F'!? And we have three elements? Happens after inlining bar () into foo (): F bar (F a, F b) { F c; [local count: 1073741824]: c_2 = VEC_PERM_EXPR ; __builtin_logbl (0.0); return c_2; F foo () { F _3; [local count: 1073741824]: _3 = bar ({ 9.0e+0, 0.0, 0.0, 0.0 }, { 0.0, 0.0, 0.0, 0.0 }); and the constant folding of the VEC_PERM_EXPR causes this. ICEs way earlier (during inlining) when you do -fdump-ipa-inline