[Bug tree-optimization/102440] New: Uinteger Opt/Param but the underlying type is signed

2021-09-21 Thread linkw at gcc dot gnu.org via Gcc-bugs
Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: linkw at gcc dot gnu.org Target Milestone: --- The UInteger type in an Opt/Param declaration can easily mislead people into thinking that the variable for this option/parameter is unsigned. But actually the internal

[Bug target/102347] "fatal error: target specific builtin not available" with MMA and LTO

2021-09-17 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102347 --- Comment #4 from Kewen Lin --- I found that the i386 port doesn't seem to have this issue. #include #include typedef union { __m128 x; float a[4]; } union128; #pragma GCC target("sse") int main() { union128 u; __m128 a = _mm_set_ps (24.43,

[Bug tree-optimization/102383] Missing optimization for PRE after enable O2 vectorization

2021-09-17 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102383 Kewen Lin changed: What|Removed |Added CC||linkw at gcc dot gnu.org --- Comment #4

[Bug target/102347] "fatal error: target specific builtin not available" with MMA and LTO

2021-09-16 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102347 --- Comment #3 from Kewen Lin --- This doesn't seem to be a target-specific issue. I noticed that the target_option tree node is created as expected when the target pragma is seen, which explains why it works well without LTO. When LTO does streaming out, it does stre

[Bug lto/102347] "fatal error: target specific builtin not available" with MMA and LTO

2021-09-15 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102347 Kewen Lin changed: What|Removed |Added CC||linkw at gcc dot gnu.org

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-09-15 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #23 from Kewen Lin --- (In reply to Chip Kerchner from comment #22) > (In reply to Chip Kerchner from comment #21) - Forgot one line of code > > -- > > #pragma GCC target "cpu=power10" > > int main() { > > float

[Bug tree-optimization/102054] slightly worse code as PRE on some code got disabled for loop vectorization

2021-09-12 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102054 --- Comment #2 from Kewen Lin --- Yet another reduced test case from 526.blender_r. #include typedef struct QMCSampler { struct QMCSampler *next, *prev; int type; int tot; int used; double *samp2d; double offs[1][2]; } QMCSampler;

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-09-01 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #20 from Kewen Lin --- Thanks for the detailed explanation, Mike! The fusion related flags have been considered in the posted patch: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578552.html. One RFC/Patch https://gcc.g

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-08-26 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #18 from Kewen Lin --- (In reply to Martin Liška from comment #16) > > > > Thanks for the example, it looks useful! Now the field fp_expressions is > > generic, one target specific summary class seems required then. And not sure > >

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-08-26 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #17 from Kewen Lin --- Created attachment 51357 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51357&action=edit Fix some issues in rs6000_can_inline_p As Martin pointed out, currently function rs6000_can_inline_p just returns

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-08-26 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #15 from Kewen Lin --- (In reply to Florian Weimer from comment #12) > (In reply to Richard Biener from comment #10) > > As of HTM it would make the testcase a user error - when using -mcpu=power10 > > it would require building with

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-08-26 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #14 from Kewen Lin --- (In reply to Richard Biener from comment #11) > Note that x86 uses for example > > else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath >/* If the calle doesn't use FP expressions di

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-08-26 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #13 from Kewen Lin --- (In reply to Richard Biener from comment #10) > OPTION_MASK_P8_FUSION is purely optimization and shouldn't prevent inlining, > no? > > As of HTM it would make the testcase a user error - when using -mcpu=power

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-08-25 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #9 from Kewen Lin --- One more reduced test case: fail cmd: gcc -c -O2 -flto -mcpu=power8 pass cmd: gcc -c -O2 -flto -mcpu=power8 -mno-htm -mno-power8-fusion -- __attribute__((always_inline)) int foo(int *b) {

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-08-25 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 Kewen Lin changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |linkw at gcc dot gnu.org

[Bug c/102062] powerpc suboptimal unrolling simple array sum

2021-08-25 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102062 Kewen Lin changed: What|Removed |Added CC||linkw at gcc dot gnu.org --- Comment #8

[Bug tree-optimization/102054] slightly worse code as PRE on some code got disabled for loop vectorization

2021-08-25 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102054 Kewen Lin changed: What|Removed |Added CC||crazylht at gmail dot com,

[Bug tree-optimization/102054] New: slightly worse code as PRE on some code got disabled for loop vectorization

2021-08-25 Thread linkw at gcc dot gnu.org via Gcc-bugs
: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: linkw at gcc dot gnu.org Target Milestone: --- This is a test case reduced from SPEC2017 bmk 541.leela_r source FastBoard.cpp, when I was investigating the O2

[Bug tree-optimization/101944] suboptimal SLP for reduced case from namd_r

2021-08-17 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101944 --- Comment #5 from Kewen Lin --- (In reply to Richard Biener from comment #3) > On x86 we even have > > Vector cost: 136 > Scalar cost: 196 > > note that we seem to vectorize the reduction but that only happens with > -ffast-math, not -O2

[Bug tree-optimization/101944] suboptimal SLP for reduced case from namd_r

2021-08-17 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101944 --- Comment #2 from Kewen Lin --- Back to the optimized IR, I thought the problem is that the vectorized version has longer critical path for the reduc_plus result (latency in total). For vectorized version, _51 = diffa_41(D) * 1.

[Bug tree-optimization/101944] suboptimal SLP for reduced case from namd_r

2021-08-17 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101944 --- Comment #1 from Kewen Lin --- The original costing shows that the vectorized version wins. Checking the costings shows that it fails to model the cost of lane extraction; the patch was posted at: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/57

[Bug tree-optimization/101944] New: suboptimal SLP for reduced case from namd_r

2021-08-17 Thread linkw at gcc dot gnu.org via Gcc-bugs
: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: linkw at gcc dot gnu.org Target Milestone: --- For SPEC2017 bmk 508.namd_r, it's observed that it degraded by -3.73% at -O2 -ftree-slp-vectorize vs baseline -O2 on Power9 with either default cost model or

[Bug middle-end/101596] vect_recog_mulhs_pattern could use incorrect precision to check shift count

2021-07-27 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101596 Kewen Lin changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug middle-end/101596] vect_recog_mulhs_pattern could use incorrect precision to check shift count

2021-07-26 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101596 --- Comment #3 from Kewen Lin --- Formal patch has been posted at https://gcc.gnu.org/pipermail/gcc-patches/2021-July/576071.html

[Bug middle-end/101596] vect_recog_mulhs_pattern could use incorrect precision to check shift count

2021-07-23 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101596 --- Comment #2 from Kewen Lin --- Created attachment 51200 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51200&action=edit Untested patch Still need test cases to be added.

[Bug middle-end/101596] vect_recog_mulhs_pattern could use incorrect precision to check shift count

2021-07-23 Thread linkw at gcc dot gnu.org via Gcc-bugs
at gcc dot gnu.org |linkw at gcc dot gnu.org Status|UNCONFIRMED |ASSIGNED Ever confirmed|0 |1 --- Comment #1 from Kewen Lin --- I have an untested patch.

[Bug middle-end/101596] New: vect_recog_mulhs_pattern could use incorrect precision to check shift count

2021-07-23 Thread linkw at gcc dot gnu.org via Gcc-bugs
: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: linkw at gcc dot gnu.org Target Milestone: --- I happened to spot this when I was working to add one new pattern for Power10 divide extended. Now

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2021-07-19 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 100696, which changed state. Bug 100696 Summary: mult_higpart is not vectorized https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100696 What|Removed |Added --

[Bug tree-optimization/100696] mult_higpart is not vectorized

2021-07-19 Thread linkw at gcc dot gnu.org via Gcc-bugs
|--- |FIXED Assignee|unassigned at gcc dot gnu.org |linkw at gcc dot gnu.org CC||linkw at gcc dot gnu.org --- Comment #4 from Kewen Lin --- Should be fixed on trunk.

[Bug rtl-optimization/100328] IRA doesn't model matching constraint well

2021-07-06 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328 Kewen Lin changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED

[Bug tree-optimization/101291] turns infinite loop into finite

2021-07-02 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101291 --- Comment #2 from Kewen Lin --- (In reply to Kewen Lin from comment #1) > Hi Jeff, what's the option and stanza? The reason why I asked is that I can't simply reproduce it locally at O2, with C compiler it likely runs forever. I guess what y

[Bug tree-optimization/101291] turns infinite loop into finite

2021-07-02 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101291 Kewen Lin changed: What|Removed |Added CC||linkw at gcc dot gnu.org --- Comment #1

[Bug rtl-optimization/100328] IRA doesn't model matching constraint well

2021-07-01 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328 --- Comment #8 from Kewen Lin --- (In reply to rsand...@gcc.gnu.org from comment #7) > (In reply to Kewen Lin from comment #6) > > Created attachment 51066 [details] > > aarch64 XPASS failure list > > > > The patch v3 bootstrapped and regressio

[Bug target/101235] [11/12 Regression] Fails to bootstrap with binutils 2.32

2021-06-28 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101235 Kewen Lin changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED

[Bug target/101235] [11/12 Regression] Fails to bootstrap with binutils 2.32

2021-06-28 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101235 --- Comment #3 from Kewen Lin --- Will backport the fix after July 7th, 2021 (two weeks after it went into trunk) if this isn't urgent, provided the backport is approved in the meantime.

[Bug target/101235] [11/12 Regression] Fails to bootstrap with binutils 2.32

2021-06-27 Thread linkw at gcc dot gnu.org via Gcc-bugs
|1 CC||linkw at gcc dot gnu.org, ||segher at gcc dot gnu.org Last reconfirmed||2021-06-28 --- Comment #2 from Kewen Lin --- Fixed with r12-1738 on trunk, need

[Bug rtl-optimization/100328] IRA doesn't model matching constraint well

2021-06-27 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328 --- Comment #6 from Kewen Lin --- Created attachment 51066 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51066&action=edit aarch64 XPASS failure list The patch v3 bootstrapped and regression-tested on x86_64-redhat-linux and powerpc64le-

[Bug rtl-optimization/100328] IRA doesn't model matching constraint well

2021-06-27 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328 --- Comment #5 from Kewen Lin --- Created attachment 51065 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51065&action=edit ira: Consider matching constraint heavily with some parameter v3 The mentioned only one aarch64-linux-gnu "PASS->F

[Bug rtl-optimization/100328] IRA doesn't model matching constraint well

2021-06-24 Thread linkw at gcc dot gnu.org via Gcc-bugs
gcc dot gnu.org |linkw at gcc dot gnu.org Last reconfirmed||2021-06-24 Ever confirmed|0 |1 --- Comment #4 from Kewen Lin --- Created attachment 51059 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51059&action=e

[Bug rtl-optimization/100328] IRA doesn't model matching constraint well

2021-06-24 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328 --- Comment #3 from Kewen Lin --- (In reply to Vladimir Makarov from comment #2) > (In reply to Kewen Lin from comment #1) > > Created attachment 50715 [details] > > ira:consider matching cstr in all alternatives > > > > With little understandi

[Bug tree-optimization/100794] suboptimal code due to missing pre2 when vectorization fails

2021-06-08 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794 Kewen Lin changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug tree-optimization/100794] suboptimal code due to missing pre2 when vectorization fails

2021-05-30 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794 Kewen Lin changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |linkw at gcc dot gnu.org Ever

[Bug tree-optimization/100794] suboptimal code due to missing pre2 when vectorization fails

2021-05-30 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794 --- Comment #9 from Kewen Lin --- (In reply to rguent...@suse.de from comment #5) > On Fri, 28 May 2021, linkw at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794 > > > > --- C

[Bug tree-optimization/100794] suboptimal code due to missing pre2 when vectorization fails

2021-05-30 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794 --- Comment #8 from Kewen Lin --- Created attachment 50896 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50896&action=edit M1 M2 SPEC2017 P9 eval result

[Bug tree-optimization/100794] suboptimal code due to missing pre2 when vectorization fails

2021-05-30 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794 --- Comment #7 from Kewen Lin --- Created attachment 50895 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50895&action=edit Method 2, let pre generate loop carried dependence for very cheap and cheap cost model.

[Bug tree-optimization/100794] suboptimal code due to missing pre2 when vectorization fails

2021-05-30 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794 --- Comment #6 from Kewen Lin --- Created attachment 50894 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50894&action=edit Method 1, implicitly enable pcom without unrolling once loop vectorization is enabled but pcom isn't set explicitly

[Bug tree-optimization/100794] suboptimal code due to missing pre2 when vectorization fails

2021-05-28 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794 --- Comment #4 from Kewen Lin --- (In reply to rguent...@suse.de from comment #3) > On Fri, 28 May 2021, linkw at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794 > > > > --- C

[Bug tree-optimization/100794] suboptimal code due to missing pre2 when vectorization fails

2021-05-28 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794 --- Comment #2 from Kewen Lin --- (In reply to Richard Biener from comment #1) Thanks for the comments! > There's predictive commoning which can do similar transforms and runs after > vectorization. It might be it doesn't handle these "simple

[Bug tree-optimization/99398] Miss to optimize vector permutation fed by CTOR and CTOR/CST

2021-05-27 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99398 Kewen Lin changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug tree-optimization/100794] New: suboptimal code due to missing pre2 when vectorization fails

2021-05-27 Thread linkw at gcc dot gnu.org via Gcc-bugs
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: linkw at gcc dot gnu.org Target Milestone: --- I was investigating one degradation from SPEC2017 554.roms_r on Power9, the baseline is -O2 -mcpu=power9 -ffast-math while the

[Bug rtl-optimization/100328] IRA doesn't model dup num constraint well

2021-04-30 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328 --- Comment #1 from Kewen Lin --- Created attachment 50715 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50715&action=edit ira:consider matching cstr in all alternatives With little understanding on ira, I am not quite sure this patch is

[Bug rtl-optimization/100328] New: IRA doesn't model dup num constraint well

2021-04-28 Thread linkw at gcc dot gnu.org via Gcc-bugs
: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: linkw at gcc dot gnu.org Target Milestone: --- source: function LBM_performStreamCollideTRT in SPEC2017 519.lbm_r This issue was exposed by O2 vectorization enablement evaluation on 519.lbm_r. baseline option

[Bug tree-optimization/99398] Miss to optimize vector permutation fed by CTOR and CTOR/CST

2021-03-07 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99398 --- Comment #2 from Kewen Lin --- Created attachment 50329 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50329&action=edit tested patch

[Bug tree-optimization/99398] Miss to optimize vector permutation fed by CTOR and CTOR/CST

2021-03-04 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99398 Kewen Lin changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Ever confirmed|0

[Bug tree-optimization/99398] New: Miss to optimize vector permutation fed by CTOR and CTOR/CST

2021-03-04 Thread linkw at gcc dot gnu.org via Gcc-bugs
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: linkw at gcc dot gnu.org Target Milestone: --- #include "altivec.h" vector long long foo(long long a, long long b) { vector long long v1 = {a, 0}; vector long lo

[Bug tree-optimization/98138] BB vect fail to SLP one case

2021-01-11 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138 --- Comment #8 from Kewen Lin --- Created attachment 49942 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49942&action=edit vectorized with altivec built-in functions

[Bug tree-optimization/98138] BB vect fail to SLP one case

2021-01-11 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138 --- Comment #7 from Kewen Lin --- (In reply to Richard Biener from comment #6) > Starting from the loads is not how SLP discovery works so there will be > zero re-use of code. Sure - the only important thing is you end up > with a valid SLP grap

[Bug tree-optimization/98138] BB vect fail to SLP one case

2021-01-05 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138 --- Comment #5 from Kewen Lin --- (In reply to Kewen Lin from comment #4) > One rough idea seems: > 1) Relax this condition all_uniform_p somehow to get SLP instance building > to go deeper and get those p1/p2 loads as SLP nodes. > 2) Introdu

[Bug tree-optimization/98138] BB vect fail to SLP one case

2021-01-05 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138 --- Comment #4 from Kewen Lin --- (In reply to Kewen Lin from comment #3) > > IIUC, in current implementation, we get four grouped stores: > { tmp[i][0], tmp[i][1], tmp[i][2], tmp[i][3] } /i=0,1,2,3/ independently > > When all these tryings

[Bug c/89126] missing -Wtype-limits for int variables

2021-01-04 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89126 Kewen Lin changed: What|Removed |Added CC||linkw at gcc dot gnu.org --- Comment #4

[Bug tree-optimization/98464] [11 Regression] ICE: tree check: expected class 'type', have 'exceptional' (error_mark) in tree_nop_conversion_p, at tree.c:12825 by r11-4637

2021-01-04 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98464 --- Comment #10 from Kewen Lin --- (In reply to rguent...@suse.de from comment #9) > On Mon, 4 Jan 2021, linkw at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98464 > > > > --- C

[Bug tree-optimization/98464] [11 Regression] ICE: tree check: expected class 'type', have 'exceptional' (error_mark) in tree_nop_conversion_p, at tree.c:12825 by r11-4637

2021-01-04 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98464 --- Comment #8 from Kewen Lin --- (In reply to Richard Biener from comment #5) > But this > > sprime = eliminate_avail (gimple_bb (SSA_NAME_DEF_STMT (use)), use); > > should make it more conservative (compared to the more desirable use

[Bug tree-optimization/98464] [11 Regression] ICE: tree check: expected class 'type', have 'exceptional' (error_mark) in tree_nop_conversion_p, at tree.c:12825 by r11-4637

2021-01-03 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98464 Kewen Lin changed: What|Removed |Added Assignee|linkw at gcc dot gnu.org |rguenth at gcc dot gnu.org

[Bug tree-optimization/98464] [11 Regression] ICE: tree check: expected class 'type', have 'exceptional' (error_mark) in tree_nop_conversion_p, at tree.c:12825 by r11-4637

2020-12-29 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98464 Kewen Lin changed: What|Removed |Added Status|NEW |ASSIGNED CC|

[Bug tree-optimization/98464] [11 Regression] ICE: tree check: expected class 'type', have 'exceptional' (error_mark) in tree_nop_conversion_p, at tree.c:12825 by r11-4637

2020-12-28 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98464 Kewen Lin changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |linkw at gcc dot gnu.org --- Comment

[Bug other/98437] New: confusing wording in the description of option -fsanitize=address

2020-12-24 Thread linkw at gcc dot gnu.org via Gcc-bugs
Priority: P3 Component: other Assignee: unassigned at gcc dot gnu.org Reporter: linkw at gcc dot gnu.org Target Milestone: --- Following Qingnan's question[1] on the gcc-help mailing list, the last part in the current description of option -fsanitize=address looks conf

[Bug tree-optimization/98138] BB vect fail to SLP one case

2020-12-06 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138 --- Comment #3 from Kewen Lin --- (In reply to Richard Biener from comment #2) > So the expected vectorization builds vectors > > { tmp[0][0], tmp[1][0], tmp[2][0], tmp[3][0] } > > that's not SLP, SLP tries to build the > > { tmp[i][0], tmp[

[Bug tree-optimization/98138] BB vect fail to SLP one case

2020-12-04 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138 --- Comment #1 from Kewen Lin --- Similar case is x264_pixel_satd_8x4 in x264 https://github.com/mirror/x264/blob/4121277b40a667665d4eea1726aefdc55d12d110/common/pixel.c#L288

[Bug tree-optimization/98138] New: BB vect fail to SLP one case

2020-12-04 Thread linkw at gcc dot gnu.org via Gcc-bugs
Assignee: unassigned at gcc dot gnu.org Reporter: linkw at gcc dot gnu.org Target Milestone: --- Test case: extern void test(unsigned int t[4][4]); void foo(unsigned char *p1, int i1, unsigned char *p2, int i2) { unsigned int tmp[4][4]; unsigned int a0, a1, a2

[Bug tree-optimization/98113] [11 Regression] popcnt is not vectorized on s390 since f5e18dd9c7da

2020-12-02 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98113 --- Comment #2 from Kewen Lin --- (In reply to Kewen Lin from comment #1) > (In reply to Ilya Leoshkevich from comment #0) > > s390's vxe/popcount-1.c began to fail after PR96789 fix. > > Sorry to see this regression. > > ... > > > > > that i

[Bug tree-optimization/98113] [11 Regression] popcnt is not vectorized on s390 since f5e18dd9c7da

2020-12-02 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98113 Kewen Lin changed: What|Removed |Added CC||rguenther at suse dot de Last reconfirmed|

[Bug tree-optimization/97744] [11 regression] 32 bit floating point result errors after r11-4637

2020-11-17 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97744 Kewen Lin changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug tree-optimization/97744] [11 regression] 32 bit floating point result errors after r11-4637

2020-11-16 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97744 --- Comment #5 from Kewen Lin --- btw, this is power7 specific, I found it can pass with -mcpu=power8.

[Bug tree-optimization/97744] [11 regression] 32 bit floating point result errors after r11-4637

2020-11-16 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97744 --- Comment #4 from Kewen Lin --- The additional fre4 pass run triggers this: disabling fre4 makes it pass (disabling dse3 alone does not, so dse3 is unrelated). Further narrowing down shows that fre4 on the function MG3XDEMO is responsible. B

[Bug rtl-optimization/97705] [11 regression] cc.c-torture/unsorted/dump-noaddr.c.*r.ira fails after r11-4637

2020-11-08 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97705 Kewen Lin changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED

[Bug tree-optimization/97744] [11 regression] 32 bit floating point result errors after r11-4637

2020-11-06 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97744 Kewen Lin changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |linkw at gcc dot gnu.org Last

[Bug gcov-profile/97594] [11 Regression] new test case gcc.dg/tree-prof/pr97461.c execution failure

2020-11-05 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97594 --- Comment #3 from Kewen Lin --- (In reply to Martin Liška from comment #2) > (In reply to Martin Liška from comment #1) > > Mine, I see a strange error: > > > > $ Program received signal SIGBUS, Bus error. > > 0x3fffb7ceddbc in __GI__IO_

[Bug target/96933] rs6000: inefficient code for char/short vec CTOR

2020-11-05 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933 Kewen Lin changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug testsuite/97705] [11 regression] cc.c-torture/unsorted/dump-noaddr.c.*r.ira fails after r11-4637

2020-11-04 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97705 --- Comment #4 from Kewen Lin --- I think my commit just exposed one bug in ira. The newly introduced function remove_scratches can bump the max_regno, then the data structures regstat_n_sets_and_refs and reg_info_p which are allocated according

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2020-11-04 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 96789, which changed state. Bug 96789 Summary: x264: sub4x4_dct() improves when vectorization is disabled https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 What|Removed |Added -

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-11-04 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 Kewen Lin changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug testsuite/97705] [11 regression] cc.c-torture/unsorted/dump-noaddr.c.*r.ira fails after r11-4637

2020-11-04 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97705 --- Comment #3 from Kewen Lin --- The "-DMASK=2" dumping has more lines for register 282, which is introduced in ira. Something weird causes ira to dump more contexts. $ diff dump1/dump-noaddr.c.289r.ira dump2/dump-noaddr.c.289r.ira 107a108 >

[Bug other/97705] [11 regression] cc.c-torture/unsorted/dump-noaddr.c.*r.ira fails after r11-4637

2020-11-03 Thread linkw at gcc dot gnu.org via Gcc-bugs
|1 Assignee|unassigned at gcc dot gnu.org |linkw at gcc dot gnu.org Last reconfirmed||2020-11-04 --- Comment #2 from Kewen Lin --- Thanks for reporting and sorry for the failure. I did run the regression testing on P8 LE, but thought it

[Bug tree-optimization/96376] [11 regression] vect/vect-alias-check.c and vect/vect-live-5.c fail on armeb

2020-10-21 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96376 Kewen Lin changed: What|Removed |Added CC||linkw at gcc dot gnu.org --- Comment #5

[Bug tree-optimization/96129] [11 regression] gcc.dg/vect/vect-alias-check.c etc. FAIL

2020-10-21 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96129 Kewen Lin changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug tree-optimization/96376] [11 regression] vect/vect-alias-check.c and vect/vect-live-5.c fail on armeb

2020-10-21 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96376 Kewen Lin changed: What|Removed |Added CC||ro at gcc dot gnu.org --- Comment #4 from Ke

[Bug tree-optimization/96129] [11 regression] gcc.dg/vect/vect-alias-check.c etc. FAIL

2020-10-20 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96129 --- Comment #4 from Kewen Lin --- Given the regressed failures, this is highly suspected to be a duplicate of PR96376.

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-28 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #32 from Kewen Lin --- (In reply to Richard Biener from comment #31) > (In reply to Kewen Lin from comment #29) > > (In reply to Hongtao.liu from comment #28) > > > > Probably you can try to tweak it in ix86_add_stmt_cost? when the >

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-27 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #29 from Kewen Lin --- (In reply to Hongtao.liu from comment #28) > > Probably you can try to tweak it in ix86_add_stmt_cost? when the statement > > Yes, it's the place. > > > is UB to UH conversion statement, further check if the d

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-27 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #27 from Kewen Lin --- (In reply to Hongtao.liu from comment #22) > >One of my workmates found that if we disable vectorization for SPEC2017 > >525.x264_r function sub4x4_dct in source file x264_src/common/dct.c with > >explicit

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-27 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #26 from Kewen Lin --- > > By following this idea, to release the restriction on loop_outer > > (loop_father) when setting the father_bbs, I can see FRE works as > > expectedly. But it actually does the rpo_vn from cfun's entry to it

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-27 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #25 from Kewen Lin --- > > > > Got it! For > > > > else if (vect_nop_conversion_p (stmt_info)) > > continue; > > > > Is it a good idea to change it to call record_stmt_cost like the others? > > 1) introduce one ve

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-25 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #19 from Kewen Lin --- (In reply to rguent...@suse.de from comment #17) > On Fri, 18 Sep 2020, linkw at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 > > > > --- Co

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-25 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #18 from Kewen Lin --- (In reply to Richard Biener from comment #10) > (In reply to Kewen Lin from comment #9) > > (In reply to Richard Biener from comment #8) > > > (In reply to Kewen Lin from comment #7) > > > > Two questions in min

[Bug tree-optimization/97075] [11 regression] powerpc64 vector tests fails after r11-3230

2020-09-23 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97075 Kewen Lin changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-18 Thread linkw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #15 from Kewen Lin --- (In reply to rguent...@suse.de from comment #14) > On Fri, 18 Sep 2020, linkw at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 > > > > --- Co

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-18 Thread linkw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #13 from Kewen Lin --- > 2) on Power, the conversion from unsigned char to unsigned short is nop > conversion, when we counting scalar cost, it's counted, then add costs 32 > totally onto scalar cost. Meanwhile, the conversion from

[Bug tree-optimization/97075] [11 regression] powerpc64 vector tests fails after r11-3230

2020-09-17 Thread linkw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97075 --- Comment #4 from Kewen Lin --- > gcc.target/powerpc/p9-vec-length-full-6.c This is a test case issue, 64bit/32bit pairs will use full vectors instead of partial vectors after Andrea's improvement. > gcc.target/powerpc/p9-vec-length-epil-7.c It e

[Bug tree-optimization/97075] [11 regression] powerpc64 vector tests fails after r11-3230

2020-09-17 Thread linkw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97075 --- Comment #3 from Kewen Lin --- (In reply to akrl from comment #2) > Thanks Kewen, unfortunately I've no Power setup. Sorry for the > inconvenience. My pleasure! If you are interested in running on Power machines, you can apply for and use some Power

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-16 Thread linkw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #12 from Kewen Lin --- > Thanks for the explanation! I'll look at it after checking 2). IIUC, the > advantage to eliminate stores here looks able to get those things which is > fed to stores and stores' consumers bundled, then get mo
