[Bug rtl-optimization/113682] Branches in branchless binary search rather than cmov/csel/csinc

2024-04-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113682 --- Comment #9 from Tamar Christina --- (In reply to Andrew Pinski from comment #8) > This might be the path splitting running on the gimple level causing issues > too; see PR 112402 . Ah that's a good shout. It looks like Richi already

[Bug target/114577] New: Inefficient codegen for SVE/NEON bridge

2024-04-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114577 Bug ID: 114577 Summary: Inefficient codegen for SVE/NEON bridge Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal

[Bug target/114510] [14 Regression] missed proping of multiply by 2 into address of load/stores

2024-04-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114510 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org ---

[Bug tree-optimization/114403] [14 regression] LLVM miscompiled with -O3 -march=znver2 -fno-vect-cost-model since r14-6822-g01f4251b8775c8

2024-04-02 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403 --- Comment #20 from Tamar Christina --- This is a bad interaction with early break and peeling for gaps. when peeling for gaps we set bias_for_lowest to 0, which then negates the ceil for the upper bound calculation when the div is exact. We

[Bug tree-optimization/114403] [14 regression] LLVM miscompiled with -O3 -march=znver2 -fno-vect-cost-model since r14-6822-g01f4251b8775c8

2024-04-02 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403 Tamar Christina changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed|

[Bug rtl-optimization/113682] Branches in branchless binary search rather than cmov/csel/csinc

2024-04-02 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113682 Tamar Christina changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-02-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 Tamar Christina changed: What|Removed |Added Ever confirmed|0 |1 Summary|[14 Regression]

[Bug tree-optimization/114061] New: GCC fails vectorization when using __builtin_prefetch

2024-02-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114061 Bug ID: 114061 Summary: GCC fails vectorization when using __builtin_prefetch Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity:

[Bug target/113257] -march=native or -mcpu=native are ineffective, but -march=native -mcpu=native works on arm64 M2 Ultra

2024-02-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113257 --- Comment #5 from Tamar Christina --- (In reply to Sam James from comment #3) > (In reply to Richard Earnshaw from comment #2) > I'm missing why the combination then works though? So we've made several changes here over time. -mcpu=native

[Bug tree-optimization/114061] GCC fails vectorization when using __builtin_prefetch

2024-02-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114061 --- Comment #2 from Tamar Christina --- (In reply to Andrew Pinski from comment #1) > I thought there was already one recorded about this. I could only find https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103938 about an ICE when prefetching a

[Bug tree-optimization/114061] GCC fails vectorization when using __builtin_prefetch

2024-02-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114061 --- Comment #4 from Tamar Christina --- (In reply to Andrew Pinski from comment #3) > Confirmed. > > Though maybe we should drop them in the vectorized version of the loop. HW > prefetchers usually do a decent job and sometimes (maybe most) SW

[Bug target/114063] New: Use IFN_CHECK_RAW_PTRS/IFN_CHECK_WAR_PTRS for Advanced. SIMD

2024-02-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114063 Bug ID: 114063 Summary: Use IFN_CHECK_RAW_PTRS/IFN_CHECK_WAR_PTRS for Advanced. SIMD Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords:

[Bug tree-optimization/114068] [14 regression] ICE when building darktable-4.6.1 (error: PHI node with wrong VUSE on edge from BB 25) since r14-8768

2024-02-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114068 --- Comment #12 from Tamar Christina --- looks like the moving of the store didn't update a stray out of block use of the MEM. working on patch.

[Bug target/102171] vget_low_*/vget_high_* intrinsics should become BIT_FIELD_REF during gimple

2024-02-27 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102171 --- Comment #3 from Tamar Christina --- (In reply to Andrew Pinski from comment #2) > I think I am going to implement this (or assign it internally to someone else > to implement). If you do, please also remove them from arm_neon.h and use the

[Bug tree-optimization/86530] Vectorization failure for a simple loop

2024-02-26 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86530 --- Comment #8 from Tamar Christina --- (In reply to Andrew Pinski from comment #6) > With my patch for V4QI, we still don't get the best code: > vect_perm_even_271 = VEC_PERM_EXPR 4, 6 }>; > vect_perm_even_273 = VEC_PERM_EXPR 4, 6 }>; >

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-02-27 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 Tamar Christina changed: What|Removed |Added CC||rsandifo at gcc dot gnu.org ---

[Bug tree-optimization/114099] [14 regression] ICE in find_uses_to_rename_use when building darktable-4.6.1

2024-02-25 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114099 --- Comment #8 from Tamar Christina --- Created attachment 57537 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57537&action=edit uses.patch new code seems sensitive to visitation order as get_virtual_phi returns NULL for blocks which don't

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-02-26 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #27 from Tamar Christina --- Created attachment 57538 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57538&action=edit proposed1.patch proposed patch, this gets the gathers and scatters back. doing regression run.

[Bug tree-optimization/114192] scalar code left around following early break vectorization of reduction

2024-03-01 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114192 Tamar Christina changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED

[Bug target/98877] [AArch64] Inefficient code generated for tbl NEON intrinsics

2024-02-28 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98877 --- Comment #9 from Tamar Christina --- While RA should be able to deal with this, shouldn't we also just lower TBLs in gimple? There's no reason why this can't be a VEC_PERM_EXPR which would also get the copies removed at the gimple level and

[Bug tree-optimization/114151] New: [14 Regression] weird and inefficient codegen and addressing modes since g:a0b1798042d033fd2cc2c806afbb77875dd2909b

2024-02-28 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151 Bug ID: 114151 Summary: [14 Regression] weird and inefficient codegen and addressing modes since g:a0b1798042d033fd2cc2c806afbb77875dd2909b Product: gcc

[Bug tree-optimization/114151] [14 Regression] weird and inefficient codegen and addressing modes since g:a0b1798042d033fd2cc2c806afbb77875dd2909b

2024-02-28 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151 --- Comment #3 from Tamar Christina --- > > This was a correctness fix btw, so I'm not sure we can easily recover - we > could try using niter information for CHREC_VARIABLE but then there's > variable niter here so I don't see a chance. >

[Bug target/98877] [AArch64] Inefficient code generated for tbl NEON intrinsics

2024-02-28 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98877 --- Comment #11 from Tamar Christina --- (In reply to Andrew Pinski from comment #10) > (In reply to Tamar Christina from comment #9) > > While RA should be able to deal with this, > > shouldn't we also just lower TBLs in gimple? > > > > This

[Bug target/98877] [AArch64] Inefficient code generated for tbl NEON intrinsics

2024-02-28 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98877 --- Comment #12 from Tamar Christina --- and it's not the first time we have conditional lowering. We already do so for e.g. shifts, where shifting by an amount >= bitsize of a vector element is defined behavior on AArch64.

[Bug tree-optimization/114234] [14 Regression] verify_ssa failure with early-break vectorisation

2024-03-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114234 Tamar Christina changed: What|Removed |Added Last reconfirmed||2024-03-05

[Bug tree-optimization/114068] [14 regression] ICE when building darktable-4.6.1 (error: PHI node with wrong VUSE on edge from BB 25) since r14-8768

2024-02-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114068 Tamar Christina changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org

[Bug tree-optimization/114068] [14 regression] ICE when building darktable-4.6.1 (error: PHI node with wrong VUSE on edge from BB 25) since r14-8768

2024-02-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114068 --- Comment #13 from Tamar Christina --- Created attachment 57510 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57510&action=edit candidate-patch1.patch candidate patch being tested. I was hoping to correct it during peeling itself when the

[Bug tree-optimization/115120] Bad interaction between ivcanon and early break vectorization

2024-05-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115120 --- Comment #3 from Tamar Christina --- That makes sense, though I also wonder how it works for scalar multi exit loops, IVops has various checks on single exits. I guess one problem is that the code in IVops that does this uses the exit to

[Bug tree-optimization/115130] New: (early-break) [meta-bug] early break vectorization

2024-05-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115130 Bug ID: 115130 Summary: (early-break) [meta-bug] early break vectorization Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: meta-bug, missed-optimization

[Bug tree-optimization/115130] (early-break) [meta-bug] early break vectorization

2024-05-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115130 Tamar Christina changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed|

[Bug tree-optimization/114635] OpenMP reductions fail dependency analysis

2024-04-08 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635 --- Comment #6 from Tamar Christina --- (In reply to Jakub Jelinek from comment #4) > Now, with SVE/RISCV vectors the actual vectorization factor is a poly_int > rather than constant. One possibility would be to use VLA arrays in those >

[Bug tree-optimization/114635] New: OpenMP reductions fail dependency analysis

2024-04-08 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635 Bug ID: 114635 Summary: OpenMP reductions fail dependency analysis Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal

[Bug target/114860] [14/15 regression] [aarch64] 511.povray regresses by ~5.5% with -O3 -flto -march=native -mcpu=neoverse-v2 since r14-10014-ga2f4be3dae04fa

2024-05-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114860 --- Comment #9 from Tamar Christina --- (In reply to prathamesh3492 from comment #8) > Hi Tamar, > Using -falign-loops=5 indeed brings back the performance. > The adrp instruction has same address (0x4ae784) by setting -falign-loops=5 > (which

[Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.

2024-06-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 --- Comment #11 from Tamar Christina --- (In reply to Richard Biener from comment #10) > I think the question is why IVOPTs ends up using both the signed and > unsigned variant of the same IV instead of expressing all uses of both with > one

[Bug tree-optimization/54013] Loop with control flow not vectorized

2024-06-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54013 Tamar Christina changed: What|Removed |Added Blocks||115130 --- Comment #4 from Tamar

[Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.

2024-06-06 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 --- Comment #13 from Tamar Christina --- (In reply to rguent...@suse.de from comment #12) > > since we don't care about overflow here, it looks like the stripping should > > be recursive as long as it's a NOP expression between two integral

[Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.

2024-06-06 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 --- Comment #15 from Tamar Christina --- (In reply to rguent...@suse.de from comment #14) > On Thu, 6 Jun 2024, tnfchris at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 > > > > --- Comment #13 from Tamar

[Bug target/115464] [14 Backport] ICE when building libaom on arm64 (neon sve bridge usage with tbl/perm)

2024-06-13 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115464 --- Comment #10 from Tamar Christina --- Thanks for the fix, but I don't think it's sufficient. what I meant with the earlier comment was that the subregs are broken in general, so not just the one generated by the undef fast path. i.e.

[Bug tree-optimization/115531] vectorizer generates inefficient code for masked conditional update loops

2024-06-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115531 --- Comment #3 from Tamar Christina --- (In reply to Andrew Pinski from comment #1) > I suspect PR 20999 would fix this ... > but we have to be careful since without masked stores, you could still > vectorize this unlike the transformed

[Bug tree-optimization/115531] New: vectorizer generates inefficient code for masked conditional update loops

2024-06-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115531 Bug ID: 115531 Summary: vectorizer generates inefficient code for masked conditional update loops Product: gcc Version: 15.0 Status: UNCONFIRMED Keywords:

[Bug tree-optimization/115534] New: intermediate stack use not eliminated

2024-06-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115534 Bug ID: 115534 Summary: intermediate stack use not eliminated Product: gcc Version: 15.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal

[Bug middle-end/115534] intermediate stack use not eliminated

2024-06-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115534 --- Comment #2 from Tamar Christina --- (In reply to Andrew Pinski from comment #1) > I suspect there is a dup of this already. See the bug which I made this one > blocking for a list of related bugs. Most of the other bugs relate to the

[Bug tree-optimization/115537] New: [15 Regression] vectorizable_reduction ICEs after g:d66b820f392aa9a7c34d3cddaf3d7c73bf23f82d

2024-06-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115537 Bug ID: 115537 Summary: [15 Regression] vectorizable_reduction ICEs after g:d66b820f392aa9a7c34d3cddaf3d7c73bf23f82d Product: gcc Version: 15.0 Status: UNCONFIRMED

[Bug tree-optimization/115537] [15 Regression] vectorizable_reduction ICEs after g:d66b820f392aa9a7c34d3cddaf3d7c73bf23f82d

2024-06-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115537 --- Comment #5 from Tamar Christina --- Thanks for the fix! I think the testcase needs SVE enabled to ICE, no? Shouldn't that be -mcpu=neoverse-v1 and not -mcpu=neoverse-n1?

[Bug middle-end/115534] intermediate stack use not eliminated

2024-06-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115534 --- Comment #5 from Tamar Christina --- (In reply to Andrew Pinski from comment #4) > This might be improved by > https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654819.html . Or it > might be the case the vectorizer case needs to be

[Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.

2024-06-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 --- Comment #9 from Tamar Christina --- It's taken me a bit of time to track down all the reasons for the speedup with the earlier patch. This comes from two parts: 1. Signed IVs don't get simplified. Due to possible UB with signed overflows

[Bug target/115464] ICE when building libaom on arm64 (neon sve bridge usage with tbl/perm)

2024-06-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115464 Tamar Christina changed: What|Removed |Added CC||rsandifo at gcc dot gnu.org ---

[Bug target/115464] ICE when building libaom on arm64 (neon sve bridge usage with tbl/perm)

2024-06-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115464 Tamar Christina changed: What|Removed |Added Last reconfirmed||2024-06-12 CC|

[Bug target/115464] ICE when building libaom on arm64 (neon sve bridge usage with tbl/perm)

2024-06-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115464 --- Comment #7 from Tamar Christina --- (In reply to Tamar Christina from comment #6) > (In reply to Richard Sandiford from comment #5) > > In this kind of situation, we should go through a fresh pseudo rather than > > try to take the subreg

[Bug target/115464] ICE when building libaom on arm64 (neon sve bridge usage with tbl/perm)

2024-06-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115464 --- Comment #6 from Tamar Christina --- (In reply to Richard Sandiford from comment #5) > In this kind of situation, we should go through a fresh pseudo rather than > try to take the subreg directly. I did try that but fwprop pushed it back
