[Bug tree-optimization/98365] Miss vectoization for signed char ifcvt
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98365

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |12.0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98365

Hongtao.liu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #8 from Hongtao.liu ---
Fixed in GCC 12.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98365

--- Comment #7 from CVS Commits ---
The master branch has been updated by hongtao Liu:

https://gcc.gnu.org/g:28daadc98094501175c9dfe4a985871fa6aa4f94

commit r12-1138-g28daadc98094501175c9dfe4a985871fa6aa4f94
Author: liuhongt
Date:   Wed Jan 6 16:33:27 2021 +0800

    Extend is_cond_scalar_reduction to handle nop_expr after/before scalar reduction.[PR98365]

    gcc/ChangeLog:

            PR tree-optimization/98365
            * tree-if-conv.c (strip_nop_cond_scalar_reduction): New function.
            (is_cond_scalar_reduction): Handle nop_expr in cond scalar reduction.
            (convert_scalar_cond_reduction): Ditto.
            (predicate_scalar_phi): Ditto.

    gcc/testsuite/ChangeLog:

            PR tree-optimization/98365
            * gcc.target/i386/pr98365.c: New test.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98365

--- Comment #6 from Hongtao.liu ---
Created attachment 49897
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49897&action=edit

Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. Waiting for GCC 12 stage1.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98365

--- Comment #5 from Hongtao.liu ---
>
> And successfully vectorized.
>

The loop is also vectorized with cnt defined as signed short, i.e.

int foo (short a[64], short c[64])
{
  int i;
  short cnt=0;
  for (int i = 0;i != 64; i++)
    if (a[i] == c[i])
      cnt++;
  return cnt;
}

Since signed integer overflow is undefined in C and C++, GCC always performs
the narrowed signed char/short addition in unsigned char/short to avoid UB,
and ifcvt should be able to handle those nop conversions properly.
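
The nop conversions mentioned above can be written out by hand. The following is only a sketch: foo_explicit is a hypothetical name, and the cast/add/cast pattern is modelled on the char GIMPLE quoted elsewhere in this thread rather than on an actual dump of the short case. Converting an out-of-range value back to short is implementation-defined in ISO C; GCC documents it as reduction modulo 2^N.

```c
/* Hypothetical hand-written equivalent of the short loop: the increment is
   done in unsigned short and wrapped in two conversions, so no signed
   arithmetic in the narrow type can overflow. */
int foo_explicit(short a[64], short c[64])
{
    short cnt = 0;
    for (int i = 0; i != 64; i++)
        if (a[i] == c[i]) {
            unsigned short tmp = (unsigned short) cnt; /* (unsigned type) cnt */
            tmp = (unsigned short) (tmp + 1);          /* add in the unsigned type */
            cnt = (short) tmp;                         /* (signed type) tmp */
        }
    return cnt;
}
```

With at most 64 matches the counter never wraps, so this returns the same value as foo for any input.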
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98365

--- Comment #4 from Hongtao.liu ---
> I hope vectorizer reduction can handle the upper sequence.

After hacking ifcvt, I got

.165.cvt

 [local count: 1057206201]:
  # cnt_21 = PHI
  # i_22 = PHI
  # ivtmp_19 = PHI
  _1 = (sizetype) i_22;
  _2 = a_14(D) + _1;
  _3 = *_2;
  _5 = c_15(D) + _1;
  _6 = *_5;
  cnt.1_7 = (unsigned char) cnt_21;
  _ifc__35 = _3 == _6 ? 1 : 0;
  _nop__36 = cnt.1_7 + _ifc__35;
  cnt_9 = (char) _nop__36;
  i_17 = i_22 + 1;
  ivtmp_18 = ivtmp_19 - 1;
  if (ivtmp_18 != 0)
    goto ; [98.44%]

---

And it is successfully vectorized.

.166t.vect
--
 [local count: 33071249]:
  # cnt_21 = PHI
  # i_22 = PHI
  # ivtmp_19 = PHI
  # vect_cnt_21.6_38 = PHI
  # vectp_a.7_39 = PHI
  # vectp_c.10_42 = PHI
  # ivtmp_56 = PHI
  _1 = (sizetype) i_22;
  _2 = a_14(D) + _1;
  vect__3.9_41 = MEM [(char *)vectp_a.7_39];
  _3 = *_2;
  _5 = c_15(D) + _1;
  vect__6.12_44 = MEM [(char *)vectp_c.10_42];
  _6 = *_5;
  vect_cnt.13_45 = VIEW_CONVERT_EXPR(vect_cnt_21.6_38);
  cnt.1_7 = (unsigned char) cnt_21;
  _48 = vect__3.9_41 == vect__6.12_44;
  vect__ifc__35.14_49 = VEC_COND_EXPR <_48, vect_cst__46, vect_cst__47>;
  _ifc__35 = _3 == _6 ? 1 : 0;
  vect__nop__36.15_50 = vect_cnt.13_45 + vect__ifc__35.14_49;
  _nop__36 = cnt.1_7 + _ifc__35;
  vect_cnt_9.16_51 = VIEW_CONVERT_EXPR(vect__nop__36.15_50);
  cnt_9 = (char) _nop__36;
  i_17 = i_22 + 1;
  ivtmp_18 = ivtmp_19 - 1;
  vectp_a.7_40 = vectp_a.7_39 + 32;
  vectp_c.10_43 = vectp_c.10_42 + 32;
  ivtmp_57 = ivtmp_56 + 1;
  if (ivtmp_57 < 2)
    goto ; [50.00%]
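
For readers less used to vectorizer dumps, the vectorized loop can be emulated lane by lane in plain C. This is a rough sketch only: count_eq_lanes and LANES are hypothetical names, the width of 32 is inferred from the +32 pointer steps in the dump, and the final horizontal sum is an assumption, since the loop epilogue is not part of the dump shown here.

```c
#include <stddef.h>

#define LANES 32  /* assumed vector width in bytes, from the +32 pointer steps */

/* Lane-wise emulation of the vectorized conditional count.
   n is assumed to be a multiple of LANES (64 in the test case). */
signed char count_eq_lanes(const signed char *a, const signed char *c, size_t n)
{
    unsigned char acc[LANES] = {0};   /* plays the role of the vector accumulator */

    for (size_t i = 0; i < n; i += LANES)
        for (size_t l = 0; l < LANES; l++)
            /* VEC_COND_EXPR <a == c, 1, 0>, then the add in unsigned char */
            acc[l] = (unsigned char) (acc[l] + (a[i + l] == c[i + l] ? 1 : 0));

    /* Assumed epilogue: sum the lanes, then convert back to the signed type. */
    unsigned char sum = 0;
    for (size_t l = 0; l < LANES; l++)
        sum = (unsigned char) (sum + acc[l]);
    return (signed char) sum;
}
```

Because every lane accumulates in unsigned char, partial sums wrap modulo 256 just as the scalar _nop__36 value does, which is why the reduction can be reordered across lanes.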
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98365

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-01-05
             Blocks|                            |53947
     Ever confirmed|0                           |1
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW

--- Comment #3 from Richard Biener ---
The issue is that we hit

  /* If this isn't a nested cycle or if the nested cycle reduction value
     is used outside of the inner loop we cannot handle uses of the reduction
     value.  */
  if (nlatch_def_loop_uses > 1 || nphi_def_loop_uses > 1)
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "reduction used in loop.\n");
      return NULL;
    }

because cnt_21 is used in both the update and the COND_EXPR.  The reduction
doesn't fit the cond reductions we support but is a blend of a cond and
regular reduction.  Making the COND-reduction support handle this case
should be possible though.  Using 'int' we arrive at handled IL:

  # cnt_19 = PHI
  _ifc__32 = _4 == _7 ? 1 : 0;
  cnt_8 = cnt_19 + _ifc__32;

so adjusting if-conversion can indeed help.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
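
The difference between the two IL shapes can be illustrated at source level. This is a sketch with hypothetical function names; it mirrors the shapes of the IL only, and a compiler may well canonicalize both functions to the same code.

```c
#include <stddef.h>

/* Shape of the rejected IL: cnt is used by the update and again by the
   select, so the reduction value has two uses inside the loop. */
int count_select_form(const char *a, const char *c, size_t n)
{
    int cnt = 0;
    for (size_t i = 0; i != n; i++) {
        int updated = cnt + 1;                 /* first use of cnt */
        cnt = (a[i] == c[i]) ? updated : cnt;  /* second use of cnt */
    }
    return cnt;
}

/* Shape of the handled IL: the condition selects the addend instead, so
   cnt is used exactly once per iteration. */
int count_add_form(const char *a, const char *c, size_t n)
{
    int cnt = 0;
    for (size_t i = 0; i != n; i++) {
        int ifc = (a[i] == c[i]) ? 1 : 0;      /* _ifc_ = cond ? 1 : 0 */
        cnt = cnt + ifc;                       /* plain PLUS reduction */
    }
    return cnt;
}
```

Both functions compute the same count; only the second has the single-use form that the check quoted above accepts.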
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98365

--- Comment #2 from Hongtao.liu ---
> cnt.1_7 = (unsigned char) cnt_21;
> _8 = cnt.1_7 + 1;
> cnt_16 = (char) _8;
> cnt_9 = _3 == _6 ? cnt_16 : cnt_21;
>

In tree_if_conversion there is is_cond_scalar_reduction. I'm thinking of
extending the current implementation to reduce the sequence below:

loop-header:
  cnt_21 = PHI <0, cnt_9>
  ...
  if (cond_expr)
    tmp1 = (unsigned type) cnt_21
    tmp2 = tmp1 +/- rhs2
    cnt_16 = (signed type) tmp2
  cnt_9 = PHI

to

  cnt_9 = PHI <0, cnt_21>
  tmp1 = (unsigned type)cnt_9;
  ifcvt = cond_expr ? rhs2 : 0
  tmp2 = tmp1 +/- ifcvt;
  cnt_21 = (signed type)tmp2;

I hope the vectorizer's reduction handling can deal with the sequence above.
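
The proposed rewrite can be sketched in C for the "+" case. The function names are hypothetical, rhs2 is assumed to be loop-invariant, and the condition is assumed to be the a[i] == c[i] compare from this bug's test case.

```c
#include <stddef.h>

/* Before: the conversions and the add sit under the condition. */
signed char cond_sum_branchy(const signed char *a, const signed char *c,
                             unsigned char rhs2, size_t n)
{
    signed char cnt = 0;
    for (size_t i = 0; i != n; i++)
        if (a[i] == c[i]) {
            unsigned char tmp1 = (unsigned char) cnt;
            unsigned char tmp2 = (unsigned char) (tmp1 + rhs2);
            cnt = (signed char) tmp2;
        }
    return cnt;
}

/* After: the condition only selects the addend (rhs2 or 0); the conversions
   and the add run unconditionally on every iteration. */
signed char cond_sum_ifcvt(const signed char *a, const signed char *c,
                           unsigned char rhs2, size_t n)
{
    signed char cnt = 0;
    for (size_t i = 0; i != n; i++) {
        unsigned char tmp1 = (unsigned char) cnt;
        unsigned char ifcvt = (a[i] == c[i]) ? rhs2 : 0;
        unsigned char tmp2 = (unsigned char) (tmp1 + ifcvt);
        cnt = (signed char) tmp2;
    }
    return cnt;
}
```

The two agree because adding 0 in the unsigned type and converting back leaves cnt unchanged, which matches the "else" path of the original PHI.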
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98365

--- Comment #1 from Hongtao.liu ---
> Shouldn't cnt_21 = PHI , stmt relevant?
>

For the stmt cnt.1_7 = (unsigned char) cnt_21, the operand is defined by a
previous iteration of the loop, which is assumed to be handled in
induction/reduction. But vect_analyze_scalar_cycles can't recognize the
reduction of cnt (cnt_9 = _3 == _6 ? cnt_16 : cnt_21;) since scalar
evolution only handles

   - an SSA_NAME,
   - an INTEGER_CST,
   - a PLUS_EXPR,
   - a POINTER_PLUS_EXPR,
   - a MINUS_EXPR,
   - an ASSERT_EXPR,
   - other cases are not yet handled.  */