[Bug tree-optimization/69489] missed vectorization for boolean loop, missed if-conversion

2023-08-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |needs-bisection
      Known to work|                            |13.1.0

--- Comment #21 from Andrew Pinski  ---
Looks like this was fixed in GCC 13.

IR in GCC 13:
```
  pretmp_55 = *_54;
  _33 = _3 != 0;
  _35 = pretmp_55 != 0;
  _42 = (long int) _35;
  _40 = pretmp_55 == 0;
  _39 = _33 & _40;
  _ifc__11 = _39 ? 1 : 0;
  _26 = ntf_29 + _ifc__11;
  _ifc__12 = _33 ? 0 : _42;
```

IR in GCC 12:
```
  pretmp_55 = *_54;
  _35 = pretmp_55 != 0;
  _42 = (long int) _35;
  _32 = _3 != 0;
  _41 = pretmp_55 == 0;
  _40 = _32 & _41;
  _ifc__11 = _40 ? 1 : 0;
  _26 = ntf_33 + _ifc__11;
  _ifc__12 = _3 != 0 ? 0 : _42;
```

I don't know whether replacing `_3 != 0` with `_33` in the last statement is
what fixed the issue, or whether it was something else.

[Bug tree-optimization/69489] missed vectorization for boolean loop, missed if-conversion

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2016-01-26 00:00:00         |2023-8-4

--- Comment #20 from Andrew Pinski  ---
One thing I noticed: the IR differs slightly between the C and C++ front ends.
The C front end produces an extra cast.

[Bug tree-optimization/69489] missed vectorization for boolean loop, missed if-conversion

2021-07-20 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pinskia at gcc dot gnu.org
           Severity|normal                      |enhancement

[Bug tree-optimization/69489] missed vectorization for boolean loop, missed if-conversion

2016-04-28 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #19 from amker at gcc dot gnu.org ---

  :
  # i_27 = PHI <0(3), i_21(5)>
  # n1_29 = PHI <0(3), n1_20(5)>
  # n2_28 = PHI <0(3), n2_34(5)>
  i.1_7 = (sizetype) i_27;
  _9 = u_8(D) + i.1_7;
  _11 = *_9;
  _13 = v_12(D) + i.1_7;
  _14 = *_13;
  _17 = v_12(D) + i.1_7;
  _18 = *_17;
  _31 = _18 != 0;
  _36 = (int) _31;
  _48 = (long int) _36;
  _45 = _11 != 0;
  _44 = _14 == 0;
  _43 = _44 & _45;
  _ifc__40 = _43 ? 1 : 0;
  n2_34 = n2_28 + _ifc__40;
  prephitmp_49 = _11 != 0 ? 0 : _48;
  n1_20 = n1_29 + prephitmp_49;
  i_21 = i_27 + 1;
  if (n_6(D) > i_21)
goto ;
  else
goto ;

  :
  goto ;

The difficult part is "prephitmp_49 = _11 != 0 ? 0 : _48;", which cannot be
simplified into:
  _a = _18 != 0;
  _b = _11 == 0;
  _c = _a && _b;
  prephitmp_49 = (long int)_c;

I don't know if there is an easy fix to ifcvt so that this case can be
vectorized without vcond_mask...
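
At the value level the select and the and-then-widen form agree for every input, which is what would make such a rewrite valid. The following is a minimal sketch that checks this exhaustively over all char pairs. The function names are hypothetical, `u` stands in for `_11` and `v` for `_18`, and it says nothing about how ifcvt would have to perform the rewrite.

```c
#include <assert.h>

/* Mirrors "prephitmp_49 = _11 != 0 ? 0 : _48", with _48 = (long) (_18 != 0).
   u models _11 and v models _18; both names are illustrative. */
long select_form(char u, char v)
{
    long widened = (long) (v != 0);
    return u != 0 ? 0 : widened;
}

/* Mirrors the and-then-cast form: _c = (_18 != 0) & (_11 == 0). */
long and_form(char u, char v)
{
    int a = v != 0;
    int b = u == 0;
    return (long) (a & b);
}

/* Asserts that both forms agree on every (u, v) pair of char values. */
void check_forms_agree(void)
{
    for (int u = -128; u <= 127; u++)
        for (int v = -128; v <= 127; v++)
            assert(select_form((char) u, (char) v)
                   == and_form((char) u, (char) v));
}
```

The second form needs only a comparison, a bitwise AND and a widening conversion, so it avoids the conditional select on a 64-bit element type.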

[Bug tree-optimization/69489] missed vectorization for boolean loop, missed if-conversion

2016-04-28 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #18 from amker at gcc dot gnu.org ---
So the question is why if-conversion generates:
  _43 = _44 & _45;
  _ifc__40 = _43 ? 1 : 0;
  n2_34 = n2_28 + _ifc__40;

Not:
  _43 = _44 & _45;
  _XXX = (long int) _43;
  n2_34 = n2_28 + _XXX;

As in the good test.
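
For illustration, the two shapes correspond to the following source-level loops. This is a sketch with hypothetical function names; a compiler may well fold both into the same IR, so it only shows that the two accumulations are interchangeable in value.

```c
#include <assert.h>
#include <stdbool.h>

/* Select form: the boolean feeds a conditional, as in "_ifc__40 = _43 ? 1 : 0". */
long count_select(const char *u, const char *v, long n)
{
    long total = 0;
    for (long i = 0; i < n; i++) {
        bool both = (v[i] == 0) & (u[i] != 0);
        total += both ? 1 : 0;
    }
    return total;
}

/* Cast form: the boolean is widened directly, as in "_XXX = (long int) _43". */
long count_cast(const char *u, const char *v, long n)
{
    long total = 0;
    for (long i = 0; i < n; i++) {
        bool both = (v[i] == 0) & (u[i] != 0);
        total += (long) both;
    }
    return total;
}

/* Both loops count the positions where u[i] is set and v[i] is clear. */
void check_counts_agree(void)
{
    const char u[] = {1, 0, 1, 0, 7, 0};
    const char v[] = {0, 0, 1, 3, 0, 1};
    assert(count_select(u, v, 6) == 2);
    assert(count_cast(u, v, 6) == 2);
}
```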

[Bug tree-optimization/69489] missed vectorization for boolean loop, missed if-conversion

2016-04-28 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #17 from amker at gcc dot gnu.org ---
The if-converted loop of the reported test is as follows:

  :
  # i_27 = PHI <0(3), i_21(5)>
  # n1_29 = PHI <0(3), n1_20(5)>
  # n2_28 = PHI <0(3), n2_34(5)>
  i.1_7 = (sizetype) i_27;
  _9 = u_8(D) + i.1_7;
  _11 = *_9;
  _13 = v_12(D) + i.1_7;
  _14 = *_13;
  _17 = v_12(D) + i.1_7;
  _18 = *_17;
  _31 = _18 != 0;
  _36 = (int) _31;
  _48 = (long int) _36;
  _45 = _11 != 0;
  _44 = _14 == 0;
  _43 = _44 & _45;
  _ifc__40 = _43 ? 1 : 0;
  n2_34 = n2_28 + _ifc__40;
  prephitmp_49 = _11 != 0 ? 0 : _48;
  n1_20 = n1_29 + prephitmp_49;
  i_21 = i_27 + 1;
  if (n_6(D) > i_21)
goto ;
  else
goto ;

  :
  goto ;

For the stmt "_ifc__40 = _43 ? 1 : 0;", the pattern_stmt is
"patt_37 = patt_38 ? 1 : 0;".
The function vectorizable_condition calls expand_vec_cond_expr_p (vectype,
comp_vectype), for which we have:

(gdb) call debug_tree(vectype)
 
unit size 
align 64 symtab 0 alias set -1 canonical type 0x7695f930 precision
64 min  max 
pointer_to_this >
V2DI
size  constant 128>
unit size  constant 16>
align 128 symtab 0 alias set -1 canonical type 0x76a5fd20 nunits 2>

(gdb) call debug_tree(comp_vectype)
 
unit size 
align 64 symtab 0 alias set -1 canonical type 0x76a5fc78 precision
64 min  max >
V2DI
size  constant 128>
unit size  constant 16>
align 128 symtab 0 alias set -1 canonical type 0x76a5fe70 nunits 2>


In the end, GCC checks HAVE_vcond_mask_v2div2di which is defined as below on
i386:

#define HAVE_vcond_mask_v2div2di (TARGET_SSE4_2)
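
As a rough scalar model of what is being asked of the target here: a masked select on V2DI picks, per 64-bit lane, one of two values according to a mask whose lanes are all ones or all zeros. The sketch below assumes those semantics; `select_v2di` and `LANES` are hypothetical names, not GCC identifiers.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 2  /* V2DI: two 64-bit lanes */

/* Scalar model of a masked select: out[i] = mask[i] ? a[i] : b[i],
   where each mask lane is either all ones or all zeros. */
void select_v2di(const uint64_t mask[LANES], const int64_t a[LANES],
                 const int64_t b[LANES], int64_t out[LANES])
{
    for (int i = 0; i < LANES; i++)
        out[i] = (int64_t) ((mask[i] & (uint64_t) a[i])
                            | (~mask[i] & (uint64_t) b[i]));
}

/* Models "_ifc__40 = _43 ? 1 : 0" for two iterations at once. */
void check_select(void)
{
    const uint64_t mask[LANES] = { UINT64_MAX, 0 };
    const int64_t ones[LANES] = { 1, 1 };
    const int64_t zeros[LANES] = { 0, 0 };
    int64_t out[LANES];

    select_v2di(mask, ones, zeros, out);
    assert(out[0] == 1 && out[1] == 0);
}
```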

[Bug tree-optimization/69489] missed vectorization for boolean loop, missed if-conversion

2016-04-27 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #16 from amker at gcc dot gnu.org ---
(In reply to amker from comment #15)
> Also the case is reported as not vectorized on Solaris/SPARC in PR70803.  I
> will investigate and follow up on it in that ticket.
> 
> Thanks.

Hmm, that one is about PR56625.c not being vectorized on sparc, so it is a
different issue.

[Bug tree-optimization/69489] missed vectorization for boolean loop, missed if-conversion

2016-04-26 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #15 from amker at gcc dot gnu.org ---
Also the case is reported as not vectorized on Solaris/SPARC in PR70803.  I
will investigate and follow up on it in that ticket.

Thanks.

[Bug tree-optimization/69489] missed vectorization for boolean loop, missed if-conversion

2016-04-21 Thread jtaylor.debian at googlemail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #14 from Julian Taylor  ---
I am on x86_64. It actually does vectorize with -mavx but not with -msse2.
The other variant of the loop I posted does vectorize with sse2.


$ gcc --version
gcc (GCC) 7.0.0 20160421 (experimental)
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


$ cat test.c

double
yule_bool_distance_char2(const char *u, const char *v, long n)
{
    long i;
    long ntt = 0l, nff = 0l, nft = 0l, ntf = 0l;

    for (i = 0l; i < n; i++) {
        ntf += (u[i] && !v[i]);
        nft += (!u[i] && v[i]);
    }
    return (2.0 * ntf * nft);
}


$ gcc -O2 -ftree-vectorize test.c -c
#same with O3
$ objdump -d test.o

test.o: file format elf64-x86-64


Disassembly of section .text:

 :
   0:   48 85 d2                test   %rdx,%rdx
   3:   7e 69                   jle    6e
   5:   55                      push   %rbp
   6:   53                      push   %rbx
   7:   45 31 d2                xor    %r10d,%r10d
   a:   45 31 db                xor    %r11d,%r11d
   d:   31 c0                   xor    %eax,%eax
   f:   31 ed                   xor    %ebp,%ebp
  11:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
  18:   44 0f b6 0c 06          movzbl (%rsi,%rax,1),%r9d
  1d:   44 0f b6 04 07          movzbl (%rdi,%rax,1),%r8d
  22:   45 84 c9                test   %r9b,%r9b
  25:   0f 94 c3                sete   %bl
  28:   31 c9                   xor    %ecx,%ecx
  2a:   45 84 c0                test   %r8b,%r8b
  2d:   0f 95 c1                setne  %cl
  30:   48 21 d9                and    %rbx,%rcx
  33:   49 01 ca                add    %rcx,%r10
  36:   31 c9                   xor    %ecx,%ecx
  38:   45 84 c9                test   %r9b,%r9b
  3b:   0f 95 c1                setne  %cl
  3e:   45 84 c0                test   %r8b,%r8b
  41:   48 0f 45 cd             cmovne %rbp,%rcx
  45:   48 83 c0 01             add    $0x1,%rax
  49:   49 01 cb                add    %rcx,%r11
  4c:   48 39 c2                cmp    %rax,%rdx
  4f:   75 c7                   jne    18
  51:   66 0f ef c0             pxor   %xmm0,%xmm0
  55:   66 0f ef c9             pxor   %xmm1,%xmm1
  59:   5b                      pop    %rbx
  5a:   f2 49 0f 2a c2          cvtsi2sd %r10,%xmm0
  5f:   f2 49 0f 2a cb          cvtsi2sd %r11,%xmm1
  64:   5d                      pop    %rbp
  65:   f2 0f 58 c0             addsd  %xmm0,%xmm0
  69:   f2 0f 59 c1             mulsd  %xmm1,%xmm0
  6d:   c3                      retq
  6e:   66 0f ef c0             pxor   %xmm0,%xmm0
  72:   c3                      retq

[Bug tree-optimization/69489] missed vectorization for boolean loop, missed if-conversion

2016-04-21 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #13 from amker at gcc dot gnu.org ---
(In reply to Julian Taylor from comment #12)
> the testcase in this ticket is not yet vectorized with gcc 20160421 (r235341)

Hi Julian, may I ask which target?  It can be vectorized on x86_64 and AArch64
now.  Thanks.

[Bug tree-optimization/69489] missed vectorization for boolean loop, missed if-conversion

2016-04-21 Thread jtaylor.debian at googlemail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #12 from Julian Taylor  ---
the testcase in this ticket is not yet vectorized with gcc 20160421 (r235341)

[Bug tree-optimization/69489] missed vectorization for boolean loop, missed if-conversion

2016-04-21 Thread sch...@linux-m68k.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #11 from Andreas Schwab  ---
> Isn't that what was reported in PR70725 for its fix?  Does r235341 fix it?

Yes and yes, but r235252 didn't trigger it.

[Bug tree-optimization/69489] missed vectorization for boolean loop, missed if-conversion

2016-04-21 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #10 from amker at gcc dot gnu.org ---
(In reply to rguent...@suse.de from comment #9)
> On Thu, 21 Apr 2016, sch...@linux-m68k.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489
> > 
> > --- Comment #7 from Andreas Schwab  ---
> > The second commit triggers this ICE on ia64:
> > 
> > $ gcc/xgcc -Bgcc/ ../../gcc/gcc/testsuite/gcc.dg/pr70725.c -O3 -S
> > ../../gcc/gcc/testsuite/gcc.dg/pr70725.c: In function ‘fn1’:
> > ../../gcc/gcc/testsuite/gcc.dg/pr70725.c:13:1: internal compiler error: in
> > phi_convertible_by_degenerating_args, at tree-if-conv.c:605
> >  fn1 ()
> >  ^~~
> > 0x41c26b3f phi_convertible_by_degenerating_args
> > ../../gcc/tree-if-conv.c:605
> > 0x41c2727f if_convertible_phi_p
> > ../../gcc/tree-if-conv.c:662
> > 0x41c3675f if_convertible_loop_p_1
> > ../../gcc/tree-if-conv.c:1408
> > 0x41c3700f if_convertible_loop_p
> > ../../gcc/tree-if-conv.c:1466
> > 0x41c374cf tree_if_conversion
> > ../../gcc/tree-if-conv.c:2774
> > 0x41c37d9f execute
> > ../../gcc/tree-if-conv.c:2875
> 
> Isn't that what was reported in PR70725 for its fix?  Does r235341 fix it?

I will check this.  I also have a follow-up patch that handles the general
case in which PHIs can be degenerated and have more than one argument.

[Bug tree-optimization/69489] missed vectorization for boolean loop, missed if-conversion

2016-04-21 Thread sch...@linux-m68k.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #8 from Andreas Schwab  ---
The same ICE also occurs on m68k and aarch64.

[Bug tree-optimization/69489] missed vectorization for boolean loop, missed if-conversion

2016-04-21 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #9 from rguenther at suse dot de  ---
On Thu, 21 Apr 2016, sch...@linux-m68k.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489
> 
> --- Comment #7 from Andreas Schwab  ---
> The second commit triggers this ICE on ia64:
> 
> $ gcc/xgcc -Bgcc/ ../../gcc/gcc/testsuite/gcc.dg/pr70725.c -O3 -S
> ../../gcc/gcc/testsuite/gcc.dg/pr70725.c: In function ‘fn1’:
> ../../gcc/gcc/testsuite/gcc.dg/pr70725.c:13:1: internal compiler error: in
> phi_convertible_by_degenerating_args, at tree-if-conv.c:605
>  fn1 ()
>  ^~~
> 0x41c26b3f phi_convertible_by_degenerating_args
> ../../gcc/tree-if-conv.c:605
> 0x41c2727f if_convertible_phi_p
> ../../gcc/tree-if-conv.c:662
> 0x41c3675f if_convertible_loop_p_1
> ../../gcc/tree-if-conv.c:1408
> 0x41c3700f if_convertible_loop_p
> ../../gcc/tree-if-conv.c:1466
> 0x41c374cf tree_if_conversion
> ../../gcc/tree-if-conv.c:2774
> 0x41c37d9f execute
> ../../gcc/tree-if-conv.c:2875

Isn't that what was reported in PR70725 for its fix?  Does r235341 fix it?

[Bug tree-optimization/69489] missed vectorization for boolean loop, missed if-conversion

2016-04-21 Thread sch...@linux-m68k.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #7 from Andreas Schwab  ---
The second commit triggers this ICE on ia64:

$ gcc/xgcc -Bgcc/ ../../gcc/gcc/testsuite/gcc.dg/pr70725.c -O3 -S
../../gcc/gcc/testsuite/gcc.dg/pr70725.c: In function ‘fn1’:
../../gcc/gcc/testsuite/gcc.dg/pr70725.c:13:1: internal compiler error: in
phi_convertible_by_degenerating_args, at tree-if-conv.c:605
 fn1 ()
 ^~~
0x41c26b3f phi_convertible_by_degenerating_args
../../gcc/tree-if-conv.c:605
0x41c2727f if_convertible_phi_p
../../gcc/tree-if-conv.c:662
0x41c3675f if_convertible_loop_p_1
../../gcc/tree-if-conv.c:1408
0x41c3700f if_convertible_loop_p
../../gcc/tree-if-conv.c:1466
0x41c374cf tree_if_conversion
../../gcc/tree-if-conv.c:2774
0x41c37d9f execute
../../gcc/tree-if-conv.c:2875

[Bug tree-optimization/69489] missed vectorization for boolean loop, missed if-conversion

2016-04-20 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #6 from amker at gcc dot gnu.org ---
Author: amker
Date: Wed Apr 20 15:57:01 2016
New Revision: 235292

URL: https://gcc.gnu.org/viewcvs?rev=235292&root=gcc&view=rev
Log:
PR tree-optimization/69489
* tree-if-conv.c (phi_convertible_by_degenerating_args): New.
(if_convertible_phi_p): Call phi_convertible_by_degenerating_args.
Revise dump message.
(if_convertible_bb_p): Remove check on edge count of basic block's
predecessors.

gcc/testsuite/ChangeLog
PR tree-optimization/69489
* gcc.dg/tree-ssa/ifc-pr69489-2.c: New test.


Added:
trunk/gcc/testsuite/gcc.dg/tree-ssa/ifc-pr69489-2.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-if-conv.c

[Bug tree-optimization/69489] missed vectorization for boolean loop, missed if-conversion

2016-04-20 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #5 from amker at gcc dot gnu.org ---
Author: amker
Date: Wed Apr 20 15:41:45 2016
New Revision: 235289

URL: https://gcc.gnu.org/viewcvs?rev=235289&root=gcc&view=rev
Log:
PR tree-optimization/56625
PR tree-optimization/69489
* tree-data-ref.h (DR_INNERMOST): New macro.
* tree-if-conv.c (innermost_loop_behavior_hash): New class for
hashing struct innermost_loop_behavior.
(ref_DR_map): Remove.
(innermost_DR_map): New map.
(baseref_DR_map): Revise comment.
(hash_memrefs_baserefs_and_store_DRs_read_written_info): Store DR
to innermost_DR_map according to its innermost loop behavior.
(ifcvt_memrefs_wont_trap): Get DR from innermost_DR_map according
to its innermost loop behavior.
(if_convertible_loop_p_1): Remove initialization for ref_DR_map.
Add initialization for innermost_DR_map.  Record memory reference
in DR_BASE_ADDRESS if the reference is compound one or it doesn't
have innermost loop behavior.
(if_convertible_loop_p): Remove release for ref_DR_map.  Release
innermost_DR_map.

gcc/testsuite/ChangeLog
PR tree-optimization/56625
PR tree-optimization/69489
* gcc.dg/vect/pr56625.c: New test.
* gcc.dg/tree-ssa/ifc-pr69489-1.c: New test.


Added:
trunk/gcc/testsuite/gcc.dg/tree-ssa/ifc-pr69489-1.c
trunk/gcc/testsuite/gcc.dg/vect/pr56625.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-data-ref.h
trunk/gcc/tree-if-conv.c

[Bug tree-optimization/69489] missed vectorization for boolean loop, missed if-conversion

2016-03-10 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

amker at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amker at gcc dot gnu.org

--- Comment #4 from amker at gcc dot gnu.org ---
(In reply to Richard Biener from comment #3)
..
> simply trigger versioning if that happens.  We still run into
> 
> _14 = *_13;
> tree could trap...
> 
> then of course.  The fix for that is for ref_DR_map to hash/compare
> DR_BASE_ADDRESS, DR_OFFSET, DR_INIT and DR_STEP instead of a somewhat
> stripped DR_REF.

Yes, I encountered another case showing the exact same issue.  Testing a patch
to fix this one.

[Bug tree-optimization/69489] missed vectorization for boolean loop, missed if-conversion

2016-01-26 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-01-26
             Blocks|                            |53947
            Summary|missed vectorization for    |missed vectorization for
                   |boolean loop                |boolean loop, missed
                   |                            |if-conversion
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener  ---
Ok, so the trick here is to see that v[i] is always loaded (and thus it may
not be NULL or otherwise trapping) because either u[i] or !u[i] will be true.

This is a missed if-conversion opportunity.
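
To make the observation concrete, here is a sketch of the reported loop next to a version with the short-circuit control flow written out. The function names and the out-parameters are illustrative and not part of the original test case.

```c
#include <assert.h>

/* The two accumulations as written in the report. */
void count_short_circuit(const char *u, const char *v, long n,
                         long *ntf, long *nft)
{
    *ntf = *nft = 0;
    for (long i = 0; i < n; i++) {
        *ntf += (u[i] && !v[i]);
        *nft += (!u[i] && v[i]);
    }
}

/* Same computation with the branches made explicit: whichever way the
   test on u[i] goes, v[i] is read on that path. */
void count_explicit_branches(const char *u, const char *v, long n,
                             long *ntf, long *nft)
{
    *ntf = *nft = 0;
    for (long i = 0; i < n; i++) {
        if (u[i]) {
            if (!v[i])
                (*ntf)++;
        } else {
            if (v[i])
                (*nft)++;
        }
    }
}

/* Both versions produce the same counts on a small sample. */
void check_branch_forms(void)
{
    const char u[] = {1, 0, 1, 0, 7, 0};
    const char v[] = {0, 0, 1, 3, 0, 1};
    long tf_a, ft_a, tf_b, ft_b;

    count_short_circuit(u, v, 6, &tf_a, &ft_a);
    count_explicit_branches(u, v, 6, &tf_b, &ft_b);
    assert(tf_a == 2 && ft_a == 2);
    assert(tf_b == 2 && ft_b == 2);
}
```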


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations

[Bug tree-optimization/69489] missed vectorization for boolean loop, missed if-conversion

2016-01-26 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #2 from Richard Biener  ---
And in the end it is also related to PR23286, the inability to hoist the v[i]
load out of

  if (u[i])
    ... = v[i];
  else
    ... = v[i];

which would also enable the if-conversion.
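
A hand-written sketch of what the loop looks like once the load is hoisted, assuming the same inputs as the reported test case. `count_hoisted` is a hypothetical name, and this is a source-level illustration rather than what any GCC pass emits.

```c
#include <assert.h>

/* v[i] is loaded once, unconditionally, and the body becomes branch-free. */
void count_hoisted(const char *u, const char *v, long n,
                   long *ntf, long *nft)
{
    long tf = 0, ft = 0;
    for (long i = 0; i < n; i++) {
        char ui = u[i];
        char vi = v[i];  /* single load, no longer guarded by u[i] */
        tf += (ui != 0) & (vi == 0);
        ft += (ui == 0) & (vi != 0);
    }
    *ntf = tf;
    *nft = ft;
}

/* One position has u set and v clear, one has u clear and v set. */
void check_hoisted(void)
{
    const char u[] = {0, 0, 5, 1};
    const char v[] = {2, 0, 0, 9};
    long tf, ft;

    count_hoisted(u, v, 4, &tf, &ft);
    assert(tf == 1 && ft == 1);
}
```

With no conditional load left in the body, the question of whether the access could trap no longer arises for if-conversion.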

[Bug tree-optimization/69489] missed vectorization for boolean loop, missed if-conversion

2016-01-26 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #3 from Richard Biener  ---
For if-conversion this triggers the three-argument PHI case which is only
handled for force_vectorize loops.  Similar to the masked-load-store case we
should simply trigger versioning if that happens.  We still run into

_14 = *_13;
tree could trap...

then of course.  The fix for that is for ref_DR_map to hash/compare
DR_BASE_ADDRESS, DR_OFFSET, DR_INIT and DR_STEP instead of a somewhat
stripped DR_REF.
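
The idea of keying on the four components can be modelled as below. This is a simplified sketch and not GCC code: in GCC these components are tree expressions, whereas here strings and integers stand in for them, and the struct, function names and sample values are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the four components named above. */
struct access_key {
    const char *base_address;  /* models DR_BASE_ADDRESS */
    const char *offset;        /* models DR_OFFSET */
    long init;                 /* models DR_INIT */
    long step;                 /* models DR_STEP */
};

/* Two accesses match when all four components match. */
bool access_key_equal(const struct access_key *a, const struct access_key *b)
{
    return strcmp(a->base_address, b->base_address) == 0
        && strcmp(a->offset, b->offset) == 0
        && a->init == b->init
        && a->step == b->step;
}

/* Folds a byte range into a running hash (FNV-1a style mixing). */
static size_t mix_bytes(size_t h, const void *data, size_t len)
{
    const unsigned char *bytes = data;
    for (size_t i = 0; i < len; i++)
        h = (h ^ bytes[i]) * (size_t) 16777619u;
    return h;
}

/* Hashes the same four components that access_key_equal compares. */
size_t access_key_hash(const struct access_key *k)
{
    size_t h = (size_t) 2166136261u;
    h = mix_bytes(h, k->base_address, strlen(k->base_address));
    h = mix_bytes(h, k->offset, strlen(k->offset));
    h = mix_bytes(h, &k->init, sizeof k->init);
    h = mix_bytes(h, &k->step, sizeof k->step);
    return h;
}

/* "*_13" and "*_17" are distinct references to the same address v + i,
   so they share a key; the load from u does not. */
void check_access_keys(void)
{
    struct access_key load_13 = { "v_12(D)", "0", 0, 1 };
    struct access_key load_17 = { "v_12(D)", "0", 0, 1 };
    struct access_key load_9 = { "u_8(D)", "0", 0, 1 };

    assert(access_key_equal(&load_13, &load_17));
    assert(access_key_hash(&load_13) == access_key_hash(&load_17));
    assert(!access_key_equal(&load_13, &load_9));
}
```

Under such a key, an unconditional read of an address can vouch for a conditional read of the same address even when the two go through different SSA names.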