[Bug tree-optimization/114322] New: [14 Regression] SCEV analysis failed for bases like A[(i+x)*stride] since r14-9193-ga0b1798042d033

2024-03-13 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114322 Bug ID: 114322 Summary: [14 Regression] SCEV analysis failed for bases like A[(i+x)*stride] since r14-9193-ga0b1798042d033 Product: gcc Version: 14.0 Status:

[Bug testsuite/113446] [14 Regression] gcc.dg/tree-ssa/scev-16.c FAILs

2024-01-18 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113446 --- Comment #6 from Hao Liu --- Hi Jakub, That's great. Thanks for the fix.

[Bug target/110625] [14 Regression][AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large

2023-12-30 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625 --- Comment #26 from Hao Liu --- But for now, the patch should fix the regression. (In reply to Tamar Christina from comment #25) > Is still pretty inefficient due to all the extends. If we generate better > code here this may tip the scale

[Bug target/113089] New: [14 Regression][aarch64] ICE in process_uses_of_deleted_def, at rtl-ssa/changes.cc:252 since r14-6605-gc0911c6b357ba9

2023-12-19 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113089 Bug ID: 113089 Summary: [14 Regression][aarch64] ICE in process_uses_of_deleted_def, at rtl-ssa/changes.cc:252 since r14-6605-gc0911c6b357ba9 Product: gcc

[Bug tree-optimization/112774] New: Vectorize the loop by inferring nonwrapping information from arrays

2023-11-30 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112774 Bug ID: 112774 Summary: Vectorize the loop by inferring nonwrapping information from arrays Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal

[Bug target/110625] [AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large

2023-08-01 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625 --- Comment #19 from Hao Liu --- > Hi, here's the reduced case Hi Tamar, thanks for the case. I've modified it to reproduce the ICE without LTO and have updated the patch.

[Bug target/110625] [AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large

2023-08-01 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625 --- Comment #17 from Hao Liu --- > Thanks! I can reduce a testcase for you if you want :) That will be very helpful. Thanks.

[Bug target/110625] [AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large

2023-08-01 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625 --- Comment #15 from Hao Liu --- Ah, I see. I've sent out a quick fix patch for code review. I'll investigate more about this and find out the root cause.

[Bug target/110625] [AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large

2023-07-30 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625 --- Comment #11 from Hao Liu --- Hi Richard, That's great! Glad to hear the status. Waiting for the patches to be ready and upstreamed to trunk.

[Bug target/110625] [AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large

2023-07-19 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625 --- Comment #8 from Hao Liu --- Thanks for the explanation. I understand the root cause, and it's reasonable. So, do you have a plan to fix this (i.e. to separate the FP and integer types)? I want to enable the new costs for Ampere1, which is

[Bug target/110625] [AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large

2023-07-18 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625 --- Comment #6 from Hao Liu --- Thanks for the confirmation about the reduction latency. I'll create a simple patch to fix this. > Discounting the loads, we do have 15 general operations. That's true, and there are indeed 8 general

[Bug target/110625] [AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large

2023-07-14 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625 --- Comment #3 from Hao Liu --- Sorry, it seems this case cannot be fixed by only adjusting the calculation of "reduction latency". Even if it becomes smaller, the case still cannot be vectorized as the "general operations" count is still too

[Bug target/110649] [14 Regression] 25% sphinx3 spec2006 regression on Ice Lake and zen between g:acaa441a98bebc52 (2023-07-06 11:36) and g:55900189ab517906 (2023-07-07 00:23)

2023-07-14 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110649 --- Comment #2 from Hao Liu --- Hi, I bisected the following 3 commits (sequential): [v3] 3a61ca1b925 - Improve profile updates after loop-ch and cunroll (2023-07-06) [v2] d4c2e34deef - Improve scale_loop_profile (2023-07-06) [v1]

[Bug target/110625] [AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large

2023-07-11 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625 --- Comment #2 from Hao Liu --- To my understanding, "reduction latency" is the minimum number of cycles needed to do the reduction calculation for one iteration of the loop. It is calculated by the extra instruction issue-info of the new cost models

[Bug target/110625] New: [AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large

2023-07-11 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625 Bug ID: 110625 Summary: [AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large Product: gcc Version: 14.0 Status:

[Bug tree-optimization/110474] Vect: the epilog vect loop should have small VF if the loop is unrolled during vectorization

2023-07-05 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110474 Hao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug tree-optimization/110531] Vect: slp_done_for_suggested_uf is not initialized in tree-vect-loop.cc

2023-07-04 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110531 Hao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug tree-optimization/110531] Vect: slp_done_for_suggested_uf is not initialized in tree-vect-loop.cc

2023-07-04 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110531 --- Comment #10 from Hao Liu --- > foo is just an example for not getting inlined, the point here is extra cost > paid. My point is that the case is different from the original case in tree-vect-loop.cc. For example, change the case as

[Bug tree-optimization/110531] Vect: slp_done_for_suggested_uf is not initialized in tree-vect-loop.cc

2023-07-04 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110531 --- Comment #7 from Hao Liu --- > int foo() { > bool a = true; > bool b; > if (a || b) > return 1; > b = true; > return 0; > } > > still has the warning, it looks something can be improved (guess we prefer > not to emit

[Bug tree-optimization/110531] Vect: slp_done_for_suggested_uf is not initialized in tree-vect-loop.cc

2023-07-03 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110531 --- Comment #5 from Hao Liu --- BTW, there is probably no warning because the original code is too complicated and not inlined. Compile the simple case by "g++ -O3 -S -Wall hello.c": int foo(bool a) { bool b; if (a || b) return 1;

[Bug tree-optimization/110531] Vect: slp_done_for_suggested_uf is not initialized in tree-vect-loop.cc

2023-07-03 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110531 --- Comment #4 from Hao Liu --- > IMHO, the initialization with false is unnecessary and very likely it isn't > able to get optimized, it seems worse from this point of view. Sorry. I don't think so. See more at

[Bug tree-optimization/110531] Vect: slp_done_for_suggested_uf is not initialized in tree-vect-loop.cc

2023-07-03 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110531 --- Comment #2 from Hao Liu --- > Is the warning from some static analyzer? No. I just found that it may be a bug while looking at the code. > slp should be true always (always do analyze slp), it doesn't care what's in > slp_done_for_suggested_uf.

[Bug tree-optimization/110531] New: Vect: slp_done_for_suggested_uf is not initialized in tree-vect-loop.cc

2023-07-03 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110531 Bug ID: 110531 Summary: Vect: slp_done_for_suggested_uf is not initialized in tree-vect-loop.cc Product: gcc Version: 14.0 Status: UNCONFIRMED Severity:

[Bug tree-optimization/110474] New: Vect: the epilog vect loop should have small VF if the loop is unrolled during vectorization

2023-06-28 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110474 Bug ID: 110474 Summary: Vect: the epilog vect loop should have small VF if the loop is unrolled during vectorization Product: gcc Version: 14.0 Status: UNCONFIRMED

[Bug tree-optimization/110449] Vect: use a small step to calculate the loop induction if the loop is unrolled during loop vectorization

2023-06-28 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110449 --- Comment #2 from Hao Liu --- That looks better than the currently generated code (it saves one "MOV" instruction). Yes, it has the loop-carried dependency advantage. But it still uses one more register for "8*step" (There may be a register

[Bug tree-optimization/110449] New: Vect: use a small step to calculate the loop induction if the loop is unrolled during loop vectorization

2023-06-28 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110449 Bug ID: 110449 Summary: Vect: use a small step to calculate the loop induction if the loop is unrolled during loop vectorization Product: gcc Version: 14.0 Status:

[Bug tree-optimization/98598] New: Missed opportunity to optimize dependent loads in loops

2021-01-08 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98598 Bug ID: 98598 Summary: Missed opportunity to optimize dependent loads in loops Product: gcc Version: tree-ssa Status: UNCONFIRMED Severity: normal

[Bug bootstrap/98318] libcody breaks DragonFly bootstrap

2020-12-22 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98318 --- Comment #8 from Hao Liu --- Hi Nathan, The problem is related to using another make binary, which is 4.2.0 and built by ourselves. Maybe there is a strange bug. Anyway, after using the system installed make (which is 4.2.1 and under

[Bug bootstrap/98318] libcody breaks DragonFly bootstrap

2020-12-22 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98318 --- Comment #7 from Hao Liu --- I found that: 1. "make -j1" can pass, but "make -j8" always fails. It seems something is wrong with the parallel build. 2. When "make -j8" failed, if I try "make -j8" again, it can pass. > What happens if you cd into

[Bug bootstrap/98318] libcody breaks DragonFly bootstrap

2020-12-21 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98318 --- Comment #5 from Hao Liu --- Hi Nathan, We can still reproduce this problem on CentOS 7 (X86) and CentOS 8.2 (AArch64). We use yesterday's latest GCC version: 108beb75da. The configure and build commands are (Bash is used): $

[Bug bootstrap/98318] libcody breaks DragonFly bootstrap

2020-12-21 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98318 Hao Liu changed: What|Removed |Added CC||hliu at amperecomputing dot com --- Comment