[Bug tree-optimization/69489] missed vectorization for boolean loop, missed if-conversion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

Andrew Pinski changed:

What          |Removed |Added
Keywords      |        |needs-bisection
Known to work |        |13.1.0

--- Comment #21 from Andrew Pinski ---
Looks like this was fixed in GCC 13.

IR in GCC 13:
```
pretmp_55 = *_54;
_33 = _3 != 0;
_35 = pretmp_55 != 0;
_42 = (long int) _35;
_40 = pretmp_55 == 0;
_39 = _33 & _40;
_ifc__11 = _39 ? 1 : 0;
_26 = ntf_29 + _ifc__11;
_ifc__12 = _33 ? 0 : _42;
```

IR in GCC 12:
```
pretmp_55 = *_54;
_35 = pretmp_55 != 0;
_42 = (long int) _35;
_32 = _3 != 0;
_41 = pretmp_55 == 0;
_40 = _32 & _41;
_ifc__11 = _40 ? 1 : 0;
_26 = ntf_33 + _ifc__11;
_ifc__12 = _3 != 0 ? 0 : _42;
```

I don't know whether replacing the embedded condition `_3 != 0` with the SSA name `_33` in the last statement fixed the issue (GCC 12 repeats the comparison even though `_32` already holds it), or whether something else did.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

Andrew Pinski changed:

What             |Removed             |Added
Last reconfirmed |2016-01-26 00:00:00 |2023-8-4

--- Comment #20 from Andrew Pinski ---
Note: one thing I noticed is that the IR differs slightly between the C and C++ front ends; the C front end produces an extra cast.
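For context, a small illustration of one language difference that is consistent with the extra cast (an assumption, not confirmed in this thread): in C, `&&` and `!` yield a value of type `int`, whereas in C++ they yield `bool`. The C IR in comments #17 and #19 shows `_36 = (int) _31;` followed by `_48 = (long int) _36;`, which matches an `int` intermediate. The C11 `_Generic` selection below reports the type of such an expression; the macro and function names are hypothetical.

```c
#include <assert.h>

/* Evaluates to 1 if the expression has type int, 0 otherwise (C11 _Generic).
   The operand is not evaluated. */
#define IS_INT(expr) _Generic((expr), int: 1, default: 0)

/* Same shape as one term of the reported loop: ntf += (u[i] && !v[i]); */
static int and_not_is_int(const char *u, const char *v, long i)
{
    return IS_INT(u[i] && !v[i]);
}

static void check_logical_result_type(void)
{
    const char u[] = {1, 0};
    const char v[] = {0, 1};
    /* In C, both && and ! produce an int, not a _Bool. */
    assert(and_not_is_int(u, v, 0) == 1);
    assert(IS_INT(!v[0]) == 1);
    /* A plain char operand is not an int: _Generic does not apply
       integer promotion to its controlling expression. */
    assert(IS_INT(u[0]) == 0);
}
```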
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

Andrew Pinski changed:

What     |Removed |Added
CC       |        |pinskia at gcc dot gnu.org
Severity |normal  |enhancement
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #19 from amker at gcc dot gnu.org ---
```
:
  # i_27 = PHI <0(3), i_21(5)>
  # n1_29 = PHI <0(3), n1_20(5)>
  # n2_28 = PHI <0(3), n2_34(5)>
  i.1_7 = (sizetype) i_27;
  _9 = u_8(D) + i.1_7;
  _11 = *_9;
  _13 = v_12(D) + i.1_7;
  _14 = *_13;
  _17 = v_12(D) + i.1_7;
  _18 = *_17;
  _31 = _18 != 0;
  _36 = (int) _31;
  _48 = (long int) _36;
  _45 = _11 != 0;
  _44 = _14 == 0;
  _43 = _44 & _45;
  _ifc__40 = _43 ? 1 : 0;
  n2_34 = n2_28 + _ifc__40;
  prephitmp_49 = _11 != 0 ? 0 : _48;
  n1_20 = n1_29 + prephitmp_49;
  i_21 = i_27 + 1;
  if (n_6(D) > i_21)
    goto ;
  else
    goto ;

:
  goto ;
```

The difficult part is "prephitmp_49 = _11 != 0 ? 0 : _48;", which is currently not simplified into:

```
_a = _18 != 0;
_b = _11 == 0;
_c = _a && _b;
prephitmp_49 = (long int) _c;
```

I don't know if there is an easy fix in ifcvt so that this case can be vectorized without vcond_mask...
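At the scalar level the requested simplification is valid: when `_11 != 0` the select yields 0, and otherwise it yields `(long int) (_18 != 0)`, which is exactly `(long int) ((_18 != 0) & (_11 == 0))`. A small exhaustive check over all `char` values, written as plain C rather than GIMPLE (`_11` is `u[i]` and `_18` is `v[i]`; the function names are illustrative only):

```c
#include <assert.h>
#include <limits.h>

/* prephitmp_49 = _11 != 0 ? 0 : _48;  with _48 = (long int) (_18 != 0) */
static long select_form(char u, char v)
{
    long v_nonzero = (long) (v != 0);
    return u != 0 ? 0 : v_nonzero;
}

/* _a = _18 != 0; _b = _11 == 0; _c = _a & _b; prephitmp_49 = (long int) _c; */
static long mask_form(char u, char v)
{
    _Bool a = v != 0;
    _Bool b = u == 0;
    return (long) (a & b);
}

/* Exhaustively compare both forms over every pair of char values. */
static void check_select_equals_mask(void)
{
    for (int u = CHAR_MIN; u <= CHAR_MAX; u++)
        for (int v = CHAR_MIN; v <= CHAR_MAX; v++)
            assert(select_form((char) u, (char) v) == mask_form((char) u, (char) v));
}
```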
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #18 from amker at gcc dot gnu.org ---
So the question is why if-conversion generates:

```
_43 = _44 & _45;
_ifc__40 = _43 ? 1 : 0;
n2_34 = n2_28 + _ifc__40;
```

rather than:

```
_43 = _44 & _45;
_XXX = (long int) _43;
n2_34 = n2_28 + _XXX;
```

as in the good test case.
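Both sequences compute the same scalar value, since a boolean converted to `long int` is 0 or 1; the difference only matters for how the vectorizer handles the statement. A minimal C check of that equivalence, with `_Bool` standing in for the GIMPLE boolean `_43` (function names are hypothetical):

```c
#include <assert.h>

/* _ifc__40 = _43 ? 1 : 0;  n2_34 = n2_28 + _ifc__40; */
static long add_via_cond(long n2, _Bool c)
{
    long ifc = c ? 1 : 0;
    return n2 + ifc;
}

/* _XXX = (long int) _43;  n2_34 = n2_28 + _XXX; */
static long add_via_convert(long n2, _Bool c)
{
    long xxx = (long) c;
    return n2 + xxx;
}

/* Compare both forms for both boolean values and a few accumulator values. */
static void check_cond_equals_convert(void)
{
    for (long n2 = -3; n2 <= 3; n2++) {
        assert(add_via_cond(n2, 0) == add_via_convert(n2, 0));
        assert(add_via_cond(n2, 1) == add_via_convert(n2, 1));
    }
}
```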
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #17 from amker at gcc dot gnu.org ---
The if-converted loop of the reported test case is:

```
:
  # i_27 = PHI <0(3), i_21(5)>
  # n1_29 = PHI <0(3), n1_20(5)>
  # n2_28 = PHI <0(3), n2_34(5)>
  i.1_7 = (sizetype) i_27;
  _9 = u_8(D) + i.1_7;
  _11 = *_9;
  _13 = v_12(D) + i.1_7;
  _14 = *_13;
  _17 = v_12(D) + i.1_7;
  _18 = *_17;
  _31 = _18 != 0;
  _36 = (int) _31;
  _48 = (long int) _36;
  _45 = _11 != 0;
  _44 = _14 == 0;
  _43 = _44 & _45;
  _ifc__40 = _43 ? 1 : 0;
  n2_34 = n2_28 + _ifc__40;
  prephitmp_49 = _11 != 0 ? 0 : _48;
  n1_20 = n1_29 + prephitmp_49;
  i_21 = i_27 + 1;
  if (n_6(D) > i_21)
    goto ;
  else
    goto ;

:
  goto ;
```

For the statement:

```
_ifc__40 = _43 ? 1 : 0;
```

its pattern_stmt is:

```
patt_37 = patt_38 ? 1 : 0;
```

Function vectorizable_condition calls expand_vec_cond_expr_p (vectype, comp_vectype), where:

```
(gdb) call debug_tree(vectype)
unit size align 64 symtab 0 alias set -1 canonical type 0x7695f930 precision 64 min max pointer_to_this > V2DI size constant 128> unit size constant 16> align 128 symtab 0 alias set -1 canonical type 0x76a5fd20 nunits 2>
(gdb) call debug_tree(comp_vectype)
unit size align 64 symtab 0 alias set -1 canonical type 0x76a5fc78 precision 64 min max > V2DI size constant 128> unit size constant 16> align 128 symtab 0 alias set -1 canonical type 0x76a5fe70 nunits 2>
```

In the end, GCC checks HAVE_vcond_mask_v2div2di, which is defined on i386 as:

```
#define HAVE_vcond_mask_v2div2di (TARGET_SSE4_2)
```
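As a rough scalar model (not GCC's implementation), a V2DI `vcond_mask` selects, per 64-bit lane, between a "true" vector and a "false" vector under a mask whose lanes are all-ones or all-zeros. The same select can also be written with AND, AND-NOT and OR, which is what the second function shows. The `v2di` struct and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a two-lane 64-bit vector (V2DI). */
typedef struct { int64_t lane[2]; } v2di;

/* Lane-wise select: where a mask lane is nonzero take t, else take f. */
static v2di select_v2di(v2di t, v2di f, v2di mask)
{
    v2di r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = mask.lane[i] ? t.lane[i] : f.lane[i];
    return r;
}

/* The same select written as (mask & t) | (~mask & f); only equivalent when
   every mask lane is either all-ones (-1) or all-zeros (0). */
static v2di blend_v2di(v2di t, v2di f, v2di mask)
{
    v2di r;
    for (int i = 0; i < 2; i++) {
        uint64_t m = (uint64_t) mask.lane[i];
        uint64_t bits = (m & (uint64_t) t.lane[i]) | (~m & (uint64_t) f.lane[i]);
        r.lane[i] = (int64_t) bits;
    }
    return r;
}

/* Check both forms agree for every combination of all-ones/all-zeros lanes. */
static void check_blend_matches_select(void)
{
    const v2di t = {{1, 1}};
    const v2di f = {{0, 0}};
    const int64_t masks[4][2] = {{0, 0}, {-1, 0}, {0, -1}, {-1, -1}};
    for (int k = 0; k < 4; k++) {
        v2di m = {{masks[k][0], masks[k][1]}};
        v2di a = select_v2di(t, f, m);
        v2di b = blend_v2di(t, f, m);
        assert(a.lane[0] == b.lane[0] && a.lane[1] == b.lane[1]);
    }
}
```

In terms of the loop above, `_ifc__40 = _43 ? 1 : 0` corresponds to `t = {1, 1}`, `f = {0, 0}` and a mask computed from `_43`.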
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #16 from amker at gcc dot gnu.org ---
(In reply to amker from comment #15)
> Also the case is reported not vectorized on Solaris/SPARC in PR70803. I
> will investigate and follow up it in that ticket.
>
> Thanks.

Hmm, that one is about pr56625.c not being vectorized on SPARC, so it is a different issue after all.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #15 from amker at gcc dot gnu.org ---
This case is also reported as not vectorized on Solaris/SPARC in PR70803. I will investigate and follow up in that ticket.

Thanks.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #14 from Julian Taylor ---
I am on x86_64. It actually does vectorize with -mavx but not with -msse2. The other variant of the loop I posted does vectorize with SSE2.

```
$ gcc --version
gcc (GCC) 7.0.0 20160421 (experimental)
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ cat test.c
double yule_bool_distance_char2(const char *u, const char *v, long n)
{
    long i;
    long ntt = 0l, nff = 0l, nft = 0l, ntf = 0l;
    for (i = 0l; i < n; i++) {
        ntf += (u[i] && !v[i]);
        nft += (!u[i] && v[i]);
    }
    return (2.0 * ntf * nft);
}

$ gcc -O2 -ftree-vectorize test.c -c   # same with -O3
$ objdump -d test.o

test.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <yule_bool_distance_char2>:
   0:   48 85 d2                test   %rdx,%rdx
   3:   7e 69                   jle    6e
   5:   55                      push   %rbp
   6:   53                      push   %rbx
   7:   45 31 d2                xor    %r10d,%r10d
   a:   45 31 db                xor    %r11d,%r11d
   d:   31 c0                   xor    %eax,%eax
   f:   31 ed                   xor    %ebp,%ebp
  11:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
  18:   44 0f b6 0c 06          movzbl (%rsi,%rax,1),%r9d
  1d:   44 0f b6 04 07          movzbl (%rdi,%rax,1),%r8d
  22:   45 84 c9                test   %r9b,%r9b
  25:   0f 94 c3                sete   %bl
  28:   31 c9                   xor    %ecx,%ecx
  2a:   45 84 c0                test   %r8b,%r8b
  2d:   0f 95 c1                setne  %cl
  30:   48 21 d9                and    %rbx,%rcx
  33:   49 01 ca                add    %rcx,%r10
  36:   31 c9                   xor    %ecx,%ecx
  38:   45 84 c9                test   %r9b,%r9b
  3b:   0f 95 c1                setne  %cl
  3e:   45 84 c0                test   %r8b,%r8b
  41:   48 0f 45 cd             cmovne %rbp,%rcx
  45:   48 83 c0 01             add    $0x1,%rax
  49:   49 01 cb                add    %rcx,%r11
  4c:   48 39 c2                cmp    %rax,%rdx
  4f:   75 c7                   jne    18
  51:   66 0f ef c0             pxor   %xmm0,%xmm0
  55:   66 0f ef c9             pxor   %xmm1,%xmm1
  59:   5b                      pop    %rbx
  5a:   f2 49 0f 2a c2          cvtsi2sd %r10,%xmm0
  5f:   f2 49 0f 2a cb          cvtsi2sd %r11,%xmm1
  64:   5d                      pop    %rbp
  65:   f2 0f 58 c0             addsd  %xmm0,%xmm0
  69:   f2 0f 59 c1             mulsd  %xmm1,%xmm0
  6d:   c3                      retq
  6e:   66 0f ef c0             pxor   %xmm0,%xmm0
  72:   c3                      retq
```
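When comparing builds or target flags, GCC's `-fopt-info-vec-missed` option reports why a loop was not vectorized, e.g. `gcc -O3 -msse2 -fopt-info-vec-missed -c test.c` versus the same command with `-mavx`. A small reference check of the function's result can confirm that different builds agree; the function body is copied unchanged from the comment, while the check harness is a hypothetical addition.

```c
#include <assert.h>

/* Test case from comment #14, unchanged (ntt and nff are unused). */
double yule_bool_distance_char2(const char *u, const char *v, long n)
{
    long i;
    long ntt = 0l, nff = 0l, nft = 0l, ntf = 0l;
    for (i = 0l; i < n; i++) {
        ntf += (u[i] && !v[i]);
        nft += (!u[i] && v[i]);
    }
    return (2.0 * ntf * nft);
}

/* Hypothetical sanity check on a four-element input. */
static void check_yule(void)
{
    const char u[] = {1, 0, 1, 0};
    const char v[] = {0, 1, 1, 0};
    /* ntf = 1 (index 0), nft = 1 (index 1), so the result is 2.0 * 1 * 1. */
    assert(yule_bool_distance_char2(u, v, 4) == 2.0);
    /* An empty range gives 0.0. */
    assert(yule_bool_distance_char2(u, v, 0) == 0.0);
}
```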
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #13 from amker at gcc dot gnu.org ---
(In reply to Julian Taylor from comment #12)
> the testcase in this ticket is not yet vectorized with gcc 20160421 (r235341)

Hi Julian, may I ask which target? It can be vectorized on x86_64 and AArch64 now.

Thanks.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #12 from Julian Taylor ---
The test case in this ticket is still not vectorized with GCC 20160421 (r235341).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #11 from Andreas Schwab ---
> Isn't that what was reported in PR70725 for its fix? Does r235341 fix it?

Yes and yes, but r235252 didn't trigger it.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #10 from amker at gcc dot gnu.org ---
(In reply to Andreas Schwab from comment #7)
> The second commit triggers this ICE on ia64:
> [ICE backtrace snipped; see comment #7]

(In reply to rguent...@suse.de from comment #9)
> Isn't that what was reported in PR70725 for its fix? Does r235341 fix it?

I will check this.

I also have a follow-up patch handling the general case in which PHIs can be degenerated and have more than one argument.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #8 from Andreas Schwab ---
The same ICE also occurs on m68k and aarch64.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #9 from rguenther at suse dot de ---
On Thu, 21 Apr 2016, sch...@linux-m68k.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489
>
> --- Comment #7 from Andreas Schwab ---
> The second commit triggers this ICE on ia64:
> [ICE backtrace snipped; see comment #7]

Isn't that what was reported in PR70725 for its fix? Does r235341 fix it?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #7 from Andreas Schwab ---
The second commit triggers this ICE on ia64:

```
$ gcc/xgcc -Bgcc/ ../../gcc/gcc/testsuite/gcc.dg/pr70725.c -O3 -S
../../gcc/gcc/testsuite/gcc.dg/pr70725.c: In function ‘fn1’:
../../gcc/gcc/testsuite/gcc.dg/pr70725.c:13:1: internal compiler error: in phi_convertible_by_degenerating_args, at tree-if-conv.c:605
 fn1 ()
 ^~~
0x41c26b3f phi_convertible_by_degenerating_args
        ../../gcc/tree-if-conv.c:605
0x41c2727f if_convertible_phi_p
        ../../gcc/tree-if-conv.c:662
0x41c3675f if_convertible_loop_p_1
        ../../gcc/tree-if-conv.c:1408
0x41c3700f if_convertible_loop_p
        ../../gcc/tree-if-conv.c:1466
0x41c374cf tree_if_conversion
        ../../gcc/tree-if-conv.c:2774
0x41c37d9f execute
        ../../gcc/tree-if-conv.c:2875
```
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #6 from amker at gcc dot gnu.org ---
Author: amker
Date: Wed Apr 20 15:57:01 2016
New Revision: 235292

URL: https://gcc.gnu.org/viewcvs?rev=235292=gcc=rev
Log:
    PR tree-optimization/69489
    * tree-if-conv.c (phi_convertible_by_degenerating_args): New.
    (if_convertible_phi_p): Call phi_convertible_by_degenerating_args.
    Revise dump message.
    (if_convertible_bb_p): Remove check on edge count of basic block's
    predecessors.

    gcc/testsuite/ChangeLog
    PR tree-optimization/69489
    * gcc.dg/tree-ssa/ifc-pr69489-2.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ifc-pr69489-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-if-conv.c
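A simplified sketch of the idea behind degenerating PHI arguments, under the assumption (not taken from the patch) that a PHI whose incoming values take only two distinct values can be if-converted like a two-argument PHI, with a single COND_EXPR choosing between those two values. A merge such as `PHI <0, 0, 1>` (a hypothetical example) is of this kind. SSA values are modelled as plain ints; this is not GCC's phi_convertible_by_degenerating_args, and the real check may impose further conditions, for example on the incoming edges.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: each PHI argument is an SSA value identified by an int.
   Returns 1 if the arguments take at most two distinct values, storing them
   in *first and *second (second == first if only one value occurs). */
static int phi_args_degenerate_to_two(const int *args, size_t nargs,
                                      int *first, int *second)
{
    size_t n_distinct = 0;
    int t1 = 0, t2 = 0;
    for (size_t i = 0; i < nargs; i++) {
        if (n_distinct >= 1 && args[i] == t1)
            continue;
        if (n_distinct >= 2 && args[i] == t2)
            continue;
        if (n_distinct == 0)
            t1 = args[i];
        else if (n_distinct == 1)
            t2 = args[i];
        else
            return 0;  /* a third distinct value: cannot degenerate */
        n_distinct++;
    }
    if (n_distinct == 0)
        return 0;      /* no arguments at all */
    *first = t1;
    *second = (n_distinct == 2) ? t2 : t1;
    return 1;
}

static void check_degenerate(void)
{
    int a, b;
    const int two_values[] = {0, 0, 1};    /* e.g. PHI <0, 0, 1> */
    const int three_values[] = {0, 1, 2};  /* needs a real multi-way select */
    assert(phi_args_degenerate_to_two(two_values, 3, &a, &b) && a == 0 && b == 1);
    assert(!phi_args_degenerate_to_two(three_values, 3, &a, &b));
}
```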
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #5 from amker at gcc dot gnu.org ---
Author: amker
Date: Wed Apr 20 15:41:45 2016
New Revision: 235289

URL: https://gcc.gnu.org/viewcvs?rev=235289=gcc=rev
Log:
    PR tree-optimization/56625
    PR tree-optimization/69489
    * tree-data-ref.h (DR_INNERMOST): New macro.
    * tree-if-conv.c (innermost_loop_behavior_hash): New class for
    hashing struct innermost_loop_behavior.
    (ref_DR_map): Remove.
    (innermost_DR_map): New map.
    (baseref_DR_map): Revise comment.
    (hash_memrefs_baserefs_and_store_DRs_read_written_info): Store DR
    to innermost_DR_map according to its innermost loop behavior.
    (ifcvt_memrefs_wont_trap): Get DR from innermost_DR_map according
    to its innermost loop behavior.
    (if_convertible_loop_p_1): Remove initialization for ref_DR_map.
    Add initialization for innermost_DR_map.  Record memory reference
    in DR_BASE_ADDRESS if the reference is compound one or it doesn't
    have innermost loop behavior.
    (if_convertible_loop_p): Remove release for ref_DR_map.  Release
    innermost_DR_map.

    gcc/testsuite/ChangeLog
    PR tree-optimization/56625
    PR tree-optimization/69489
    * gcc.dg/vect/pr56625.c: New test.
    * gcc.dg/tree-ssa/ifc-pr69489-1.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ifc-pr69489-1.c
    trunk/gcc/testsuite/gcc.dg/vect/pr56625.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-data-ref.h
    trunk/gcc/tree-if-conv.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

amker at gcc dot gnu.org changed:

What |Removed |Added
CC   |        |amker at gcc dot gnu.org

--- Comment #4 from amker at gcc dot gnu.org ---
(In reply to Richard Biener from comment #3)
..
> simply trigger versioning if that happens.  We still run into
>
>   _14 = *_13;
> tree could trap...
>
> then of course.  The fix for that is for ref_DR_map to hash/compare
> DR_BASE_ADDRESS, DR_OFFSET, DR_INIT and DR_STEP instead of a somewhat
> stripped DR_REF.

Yes, I encountered another case showing exactly the same issue. I am testing a patch to fix it.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

Richard Biener changed:

What             |Removed                  |Added
Status           |UNCONFIRMED              |NEW
Last reconfirmed |                         |2016-01-26
Blocks           |                         |53947
Summary          |missed vectorization for |missed vectorization for
                 |boolean loop             |boolean loop, missed
                 |                         |if-conversion
Ever confirmed   |0                        |1

--- Comment #1 from Richard Biener ---
Ok, so the trick here is to see that v[i] is always loaded (so the access cannot be NULL or otherwise trap), because either u[i] or !u[i] will be true. This is a missed if-conversion opportunity.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
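Comment #1's observation can be made concrete: v[i] is read on every path (through `!v[i]` when u[i] is true, through `v[i]` when it is false), so loading it unconditionally cannot introduce a trap the original loop would not already hit, and the loop can be written without branches. The sketch below is a hand-written illustration of that if-converted shape, not GCC output; the function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical branch-free rewrite: both u[i] and v[i] are loaded on every
   iteration and the two counters are updated with bitwise operations. */
static double yule_branchless(const char *u, const char *v, long n)
{
    long ntf = 0, nft = 0;
    for (long i = 0; i < n; i++) {
        long a = u[i] != 0;   /* 0 or 1 */
        long b = v[i] != 0;   /* 0 or 1, loaded unconditionally */
        ntf += a & (b ^ 1);   /* u[i] && !v[i] */
        nft += (a ^ 1) & b;   /* !u[i] && v[i] */
    }
    return 2.0 * ntf * nft;
}

/* Reference: the short-circuit form from the report. */
static double yule_reference(const char *u, const char *v, long n)
{
    long ntf = 0, nft = 0;
    for (long i = 0; i < n; i++) {
        ntf += (u[i] && !v[i]);
        nft += (!u[i] && v[i]);
    }
    return 2.0 * ntf * nft;
}

/* Compare both forms on every prefix of a small mixed input. */
static void check_branchless(void)
{
    const char u[] = {1, 0, 1, 0, 7, 0, -3};
    const char v[] = {0, 1, 1, 0, 0, 9, 0};
    for (long n = 0; n <= 7; n++)
        assert(yule_branchless(u, v, n) == yule_reference(u, v, n));
}
```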
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

--- Comment #2 from Richard Biener ---
And in the end it is also related to PR23286, the inability to hoist the v[i] load out of

  if (u[i])
    ... = v[i];
  else
    ... = v[i];

which would also enable the if-conversion.
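The hoisting mentioned here can be illustrated at the source level: a load that appears in both arms of an if/else can be moved above the branch, after which the branch only chooses between already-computed values and can become selects. A before/after sketch, where the `... =` parts are filled in with the two counters from the report; the struct and function names are hypothetical.

```c
#include <assert.h>

typedef struct { long ntf, nft; } counts;

/* Before: v[i] is loaded separately in each arm of the branch. */
static counts count_branchy(const char *u, const char *v, long n)
{
    counts c = {0, 0};
    for (long i = 0; i < n; i++) {
        if (u[i])
            c.ntf += !v[i];
        else
            c.nft += v[i] != 0;
    }
    return c;
}

/* After hoisting: one unconditional load of v[i]; the condition now only
   selects which counter receives which value. */
static counts count_hoisted(const char *u, const char *v, long n)
{
    counts c = {0, 0};
    for (long i = 0; i < n; i++) {
        char vi = v[i];
        int take = u[i] != 0;
        c.ntf += take ? !vi : 0;
        c.nft += take ? 0 : (vi != 0);
    }
    return c;
}

/* Both versions must produce the same counts on every prefix. */
static void check_hoisting(void)
{
    const char u[] = {1, 0, 1, 0, 4, 0};
    const char v[] = {0, 1, 1, 0, 0, -2};
    for (long n = 0; n <= 6; n++) {
        counts a = count_branchy(u, v, n);
        counts b = count_hoisted(u, v, n);
        assert(a.ntf == b.ntf && a.nft == b.nft);
    }
}
```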
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

Richard Biener changed:

What     |Removed |Added
Keywords |        |missed-optimization
CC       |        |rguenth at gcc dot gnu.org

--- Comment #3 from Richard Biener ---
For if-conversion this triggers the three-argument PHI case, which is only handled for force_vectorize loops. Similar to the masked-load/store case, we should simply trigger versioning if that happens. We still run into

  _14 = *_13;
tree could trap...

then, of course. The fix for that is for ref_DR_map to hash/compare DR_BASE_ADDRESS, DR_OFFSET, DR_INIT and DR_STEP instead of a somewhat stripped DR_REF.
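A minimal sketch of keying memory references by their innermost loop behaviour instead of by the reference expression. The struct and its fields are hypothetical stand-ins for DR_BASE_ADDRESS, DR_OFFSET, DR_INIT and DR_STEP, with symbolic parts modelled as integer ids; they are not GCC's types. In the dumps of comments #17 and #19, `*_13` and `*_17` both address `v_12(D) + i.1_7`, so they get the same key even though the pointer SSA names differ. Roughly, the idea is that the conditions under which each of them executes can then be combined for one entry, and here those conditions cover both outcomes of the u[i] test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a data reference's innermost loop behaviour:
   address = base + offset + init + step * iteration. */
typedef struct {
    int base_id;    /* stands in for DR_BASE_ADDRESS, e.g. an id for v_12(D) */
    int offset_id;  /* stands in for DR_OFFSET (symbolic, 0 if none) */
    long init;      /* stands in for DR_INIT, a constant byte offset */
    long step;      /* stands in for DR_STEP, bytes per iteration */
} innermost_key;

static size_t innermost_hash(const innermost_key *k)
{
    /* Simple multiplicative combine of all four components. */
    size_t h = (size_t) k->base_id;
    h = h * 31u + (size_t) k->offset_id;
    h = h * 31u + (size_t) k->init;
    h = h * 31u + (size_t) k->step;
    return h;
}

static int innermost_equal(const innermost_key *a, const innermost_key *b)
{
    return a->base_id == b->base_id && a->offset_id == b->offset_id
           && a->init == b->init && a->step == b->step;
}

static void check_innermost_keys(void)
{
    /* *_13 and *_17: both v_12(D) + i.1_7, one byte per iteration. */
    const innermost_key load_13 = {12, 0, 0, 1};
    const innermost_key load_17 = {12, 0, 0, 1};
    /* *_9: u_8(D) + i.1_7, a different base. */
    const innermost_key load_9 = {8, 0, 0, 1};
    assert(innermost_equal(&load_13, &load_17));
    assert(innermost_hash(&load_13) == innermost_hash(&load_17));
    assert(!innermost_equal(&load_13, &load_9));
}
```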