[Bug tree-optimization/98365] Miss vectoization for signed char ifcvt

2021-09-16 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98365

Andrew Pinski  changed:

                What|Removed     |Added

    Target Milestone|---         |12.0

2021-05-31 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98365

Hongtao.liu  changed:

                What|Removed     |Added

          Resolution|---         |FIXED
              Status|NEW         |RESOLVED

--- Comment #8 from Hongtao.liu  ---
Fixed in GCC 12.

2021-05-31 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98365

--- Comment #7 from CVS Commits  ---
The master branch has been updated by hongtao Liu :

https://gcc.gnu.org/g:28daadc98094501175c9dfe4a985871fa6aa4f94

commit r12-1138-g28daadc98094501175c9dfe4a985871fa6aa4f94
Author: liuhongt 
Date:   Wed Jan 6 16:33:27 2021 +0800

Extend is_cond_scalar_reduction to handle nop_expr after/before scalar
reduction.[PR98365]

gcc/ChangeLog:

PR tree-optimization/98365
* tree-if-conv.c (strip_nop_cond_scalar_reduction): New function.
(is_cond_scalar_reduction): Handle nop_expr in cond scalar
reduction.
(convert_scalar_cond_reduction): Ditto.
(predicate_scalar_phi): Ditto.

gcc/testsuite/ChangeLog:

PR tree-optimization/98365
* gcc.target/i386/pr98365.c: New test.

2021-01-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98365

--- Comment #6 from Hongtao.liu  ---
Created attachment 49897
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49897&action=edit
Bootstrapped and regtested on x86_64-linux-gnu{-m32,}

Waiting for GCC 12 stage 1.

2021-01-05 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98365

--- Comment #5 from Hongtao.liu  ---
> 
> And successfully vectorized.
> 

The loop is also vectorized when cnt is defined as signed short, i.e.
int foo (short a[64], short c[64])
{
  short cnt = 0;
  for (int i = 0; i != 64; i++)
    if (a[i] == c[i])
      cnt++;
  return cnt;
}

Since signed integer overflow is undefined in C and C++, GCC always performs
this kind of signed char/short arithmetic in the corresponding unsigned type
to avoid UB, and ifcvt should be able to "properly" handle those nop
conversions.
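
The nop conversions mentioned above can be illustrated at the source level.
This is only a sketch, not GCC's internal representation: the function names
inc_promoted and inc_narrowed are hypothetical, and the second function spells
out by hand the unsigned round trip that shows up in the GIMPLE dumps.

```c
/* What "cnt++" means for a short in C: promote to int, add 1, then
   convert the result back to short.  */
short inc_promoted (short cnt)
{
  return (short) (cnt + 1);
}

/* Narrowed form: the addition is done in unsigned short, where
   wraparound is well defined, wrapped in two conversions that do not
   change the bit pattern.  Converting an out-of-range value back to
   short is implementation-defined in ISO C; GCC documents it as
   reduction modulo 2^N.  */
short inc_narrowed (short cnt)
{
  unsigned short tmp = (unsigned short) cnt;
  tmp = (unsigned short) (tmp + 1u);
  return (short) tmp;
}
```

Both functions should return the same value for every input on GCC; the
second form just makes the two conversions visible, which is what ifcvt has to
look through.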

2021-01-05 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98365

--- Comment #4 from Hongtao.liu  ---
> I hope vectorizer reduction can handle the sequence above.

After hacking ifcvt, I got

.165t.ifcvt

   [local count: 1057206201]:
  # cnt_21 = PHI 
  # i_22 = PHI 
  # ivtmp_19 = PHI 
  _1 = (sizetype) i_22;
  _2 = a_14(D) + _1;
  _3 = *_2;
  _5 = c_15(D) + _1;
  _6 = *_5;
  cnt.1_7 = (unsigned char) cnt_21;
  _ifc__35 = _3 == _6 ? 1 : 0;
  _nop__36 = cnt.1_7 + _ifc__35;
  cnt_9 = (char) _nop__36;
  i_17 = i_22 + 1;
  ivtmp_18 = ivtmp_19 - 1;
  if (ivtmp_18 != 0)
    goto ; [98.44%]
---

And successfully vectorized.

.166t.vect
--
   [local count: 33071249]:
  # cnt_21 = PHI 
  # i_22 = PHI 
  # ivtmp_19 = PHI 
  # vect_cnt_21.6_38 = PHI 
  # vectp_a.7_39 = PHI 
  # vectp_c.10_42 = PHI 
  # ivtmp_56 = PHI 
  _1 = (sizetype) i_22;
  _2 = a_14(D) + _1;
  vect__3.9_41 = MEM  [(char *)vectp_a.7_39];
  _3 = *_2;
  _5 = c_15(D) + _1;
  vect__6.12_44 = MEM  [(char *)vectp_c.10_42];
  _6 = *_5;
  vect_cnt.13_45 = VIEW_CONVERT_EXPR(vect_cnt_21.6_38);
  cnt.1_7 = (unsigned char) cnt_21;
  _48 = vect__3.9_41 == vect__6.12_44;
  vect__ifc__35.14_49 = VEC_COND_EXPR <_48, vect_cst__46, vect_cst__47>;
  _ifc__35 = _3 == _6 ? 1 : 0;
  vect__nop__36.15_50 = vect_cnt.13_45 + vect__ifc__35.14_49;
  _nop__36 = cnt.1_7 + _ifc__35;
  vect_cnt_9.16_51 = VIEW_CONVERT_EXPR(vect__nop__36.15_50);
  cnt_9 = (char) _nop__36;
  i_17 = i_22 + 1;
  ivtmp_18 = ivtmp_19 - 1;
  vectp_a.7_40 = vectp_a.7_39 + 32;
  vectp_c.10_43 = vectp_c.10_42 + 32;
  ivtmp_57 = ivtmp_56 + 1;
  if (ivtmp_57 < 2)
    goto ; [50.00%]

2021-01-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98365

Richard Biener  changed:

                What|Removed     |Added

    Last reconfirmed|            |2021-01-05
              Blocks|            |53947
      Ever confirmed|0           |1
            Keywords|            |missed-optimization
              Status|UNCONFIRMED |NEW

--- Comment #3 from Richard Biener  ---
The issue is that we hit

  /* If this isn't a nested cycle or if the nested cycle reduction value
     is used outside of the inner loop we cannot handle uses of the reduction
     value.  */
  if (nlatch_def_loop_uses > 1 || nphi_def_loop_uses > 1)
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "reduction used in loop.\n");
      return NULL;
    }

because cnt_21 is used in both the update and the COND_EXPR.  The reduction
doesn't fit the cond reductions we support but is a blend of a cond and
regular reduction.  Making the COND-reduction support handle this case should
be possible though.

Using 'int' we arrive at handled IL:

  # cnt_19 = PHI 
  _ifc__32 = _4 == _7 ? 1 : 0;
  cnt_8 = cnt_19 + _ifc__32;

so adjusting if-conversion can indeed help.
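
For reference, a minimal sketch of what the 'int' variant might look like.
It assumes the test case has the same shape as the signed short version in
comment #5, with char arrays as the dumps suggest; the name foo_int and the
exact parameter types are assumptions, not taken from the report.

```c
/* Assumed variant of the test case with an 'int' counter.  The
   increment is a plain int addition, so no narrowing conversions
   surround it and if-conversion can emit the simple
   "cnt + (cond ? 1 : 0)" form shown in the IL above.  */
int foo_int (const signed char a[64], const signed char c[64])
{
  int cnt = 0;
  for (int i = 0; i != 64; i++)
    if (a[i] == c[i])
      cnt++;
  return cnt;
}
```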


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations

2021-01-04 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98365

--- Comment #2 from Hongtao.liu  ---

>   cnt.1_7 = (unsigned char) cnt_21;
>   _8 = cnt.1_7 + 1;
>   cnt_16 = (char) _8;
>   cnt_9 = _3 == _6 ? cnt_16 : cnt_21;
>  

In tree_if_conversion there is is_cond_scalar_reduction; I'm thinking of
extending the current implementation to reduce the following

  loop-header:
    cnt_21 = PHI <0, cnt_9>
    ...
    if (cond_expr)
      tmp1 = (unsigned type) cnt_21
      tmp2 = tmp1 +/- rhs2
      cnt_16 = (signed type) tmp2
    cnt_9 = PHI 

to

  cnt_9 = PHI <0, cnt_21>
  tmp1 = (unsigned type) cnt_9;
  ifcvt = cond_expr ? rhs2 : 0
  tmp2 = tmp1 +/- ifcvt;
  cnt_21 = (signed type) tmp2;

I hope the vectorizer's reduction support can handle the sequence above.
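
The proposed rewrite can be sketched at the source level as follows. This is
an illustration only, not the code in tree-if-conv.c: the function names
reduce_branchy and reduce_select are hypothetical, rhs2 is assumed to be
loop-invariant, and only the "+" case is shown (the "-" case is analogous).

```c
#include <stddef.h>

/* Before: the reduction is only updated under the condition, and the
   update is wrapped in conversions to and from the unsigned type.  */
signed char reduce_branchy (const signed char *a, const signed char *c,
                            unsigned char rhs2, size_t n)
{
  signed char cnt = 0;
  for (size_t i = 0; i != n; i++)
    if (a[i] == c[i])
      {
        unsigned char tmp1 = (unsigned char) cnt;
        unsigned char tmp2 = (unsigned char) (tmp1 + rhs2);
        cnt = (signed char) tmp2;
      }
  return cnt;
}

/* After: the condition only selects the addend (rhs2 or 0); the
   conversions and the addition run unconditionally.  */
signed char reduce_select (const signed char *a, const signed char *c,
                           unsigned char rhs2, size_t n)
{
  signed char cnt = 0;
  for (size_t i = 0; i != n; i++)
    {
      unsigned char tmp1 = (unsigned char) cnt;
      unsigned char ifcvt = a[i] == c[i] ? rhs2 : 0;
      unsigned char tmp2 = (unsigned char) (tmp1 + ifcvt);
      cnt = (signed char) tmp2;
    }
  return cnt;
}
```

Adding 0 on the "else" path leaves cnt unchanged, so both loops compute the
same value, but the second has a single straight-line body.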

2021-01-04 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98365

--- Comment #1 from Hongtao.liu  ---

> Shouldn't cnt_21 = PHI , stmt relevant?
> 

For the stmt cnt.1_7 = (unsigned char) cnt_21, the operand is defined by a
previous iteration of the loop, which is assumed to be handled as an
induction/reduction.

But vect_analyze_scalar_cycles can't detect the reduction of cnt in
(cnt_9 = _3 == _6 ? cnt_16 : cnt_21;) since scalar evolution only handles
 - an SSA_NAME,
 - an INTEGER_CST,
 - a PLUS_EXPR,
 - a POINTER_PLUS_EXPR,
 - a MINUS_EXPR,
 - an ASSERT_EXPR,
 - other cases are not yet handled.