Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: linkw at gcc dot gnu.org
Target Milestone: ---
The UInteger type in Opt/Param declarations can easily mislead people into thinking that the
variable for this option/parameter is unsigned. But actually, the internal
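The following is a minimal C sketch of the concern. It assumes that the internal variable behind a UInteger option or parameter is a plain signed `int`, so "UInteger" only constrains the accepted argument to be non-negative. The names `param_example` and `set_param_example` are hypothetical and do not come from GCC.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical parameter: declared "UInteger" in an .opt-style file,
   but assumed here to be stored as a signed int.  */
int param_example = 0;

/* Accept only non-negative decimal values that also fit in int.
   Returns true and updates param_example on success.  */
bool set_param_example (const char *arg)
{
  assert (arg != NULL);
  if (*arg == '-' || *arg == '\0')
    return false;                 /* "UInteger": no negative values.  */

  char *end;
  errno = 0;
  unsigned long v = strtoul (arg, &end, 10);
  if (errno != 0 || *end != '\0')
    return false;                 /* Not a clean decimal number.  */
  if (v > (unsigned long) INT_MAX)
    return false;                 /* Would not fit the int storage.  */

  param_example = (int) v;
  return true;
}
```

Code that reads such a variable should treat it as an `int`. For example, it should not assume that the variable can hold values up to `UINT_MAX`, and it should keep in mind that mixing it with unsigned operands follows the usual signed/unsigned conversion rules.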
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102347
--- Comment #4 from Kewen Lin ---
I found that the i386 port doesn't seem to have this issue.
#include
#include
typedef union
{
__m128 x;
float a[4];
} union128;
#pragma GCC target("sse")
int main() {
union128 u;
__m128 a = _mm_set_ps (24.43,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102383
Kewen Lin changed:
What|Removed |Added
CC||linkw at gcc dot gnu.org
--- Comment #4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102347
--- Comment #3 from Kewen Lin ---
This doesn't seem to be a target-specific issue. I noticed that the target_option tree node
is created as expected when the target pragma is seen, which explains why it works well
without LTO. When LTO streams out, it does stre
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102347
Kewen Lin changed:
What|Removed |Added
CC||linkw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059
--- Comment #23 from Kewen Lin ---
(In reply to Chip Kerchner from comment #22)
> (In reply to Chip Kerchner from comment #21) - Forgot one line of code
> > --
> > #pragma GCC target "cpu=power10"
> > int main() {
> > float
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102054
--- Comment #2 from Kewen Lin ---
Yet another reduced test case from 526.blender_r.
#include
typedef struct QMCSampler {
struct QMCSampler *next, *prev;
int type;
int tot;
int used;
double *samp2d;
double offs[1][2];
} QMCSampler;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059
--- Comment #20 from Kewen Lin ---
Thanks for the detailed explanation, Mike!
The fusion related flags have been considered in the posted patch:
https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578552.html.
One RFC/Patch
https://gcc.g
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059
--- Comment #18 from Kewen Lin ---
(In reply to Martin Liška from comment #16)
> >
> > Thanks for the example, it looks useful! Now the field fp_expressions is
> > generic, one target specific summary class seems required then. And not sure
> >
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059
--- Comment #17 from Kewen Lin ---
Created attachment 51357
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51357&action=edit
Fix some issues in rs6000_can_inline_p
As Martin pointed out, currently function rs6000_can_inline_p just returns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059
--- Comment #15 from Kewen Lin ---
(In reply to Florian Weimer from comment #12)
> (In reply to Richard Biener from comment #10)
> > As of HTM it would make the testcase a user error - when using -mcpu=power10
> > it would require building with
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059
--- Comment #14 from Kewen Lin ---
(In reply to Richard Biener from comment #11)
> Note that x86 uses for example
>
> else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath
>/* If the calle doesn't use FP expressions di
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059
--- Comment #13 from Kewen Lin ---
(In reply to Richard Biener from comment #10)
> OPTION_MASK_P8_FUSION is purely optimization and shouldn't prevent inlining,
> no?
>
> As of HTM it would make the testcase a user error - when using -mcpu=power
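The inlining question discussed in these comments can be modeled as an ISA-subset check. In this model, flags that only affect optimization (fusion is the example raised in comment #10) are masked out before the comparison. This is a simplified sketch, not the actual `rs6000_can_inline_p` code. The `FLAG_*` masks and `can_inline_p` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical target flag bits, loosely modeled on ISA/tuning masks.  */
#define FLAG_ALTIVEC   (UINT64_C (1) << 0)
#define FLAG_VSX       (UINT64_C (1) << 1)
#define FLAG_HTM       (UINT64_C (1) << 2)
#define FLAG_P8_FUSION (UINT64_C (1) << 3)   /* optimization-only */

/* Flags that change code quality but not the set of instructions
   the callee is allowed to use.  */
#define OPT_ONLY_FLAGS FLAG_P8_FUSION

static_assert ((OPT_ONLY_FLAGS & (FLAG_ALTIVEC | FLAG_VSX | FLAG_HTM)) == 0,
               "ISA feature flags must not be masked out");

/* The callee may be inlined if every ISA feature it was compiled with
   is also enabled in the caller, ignoring optimization-only flags.  */
bool can_inline_p (uint64_t caller_flags, uint64_t callee_flags)
{
  uint64_t caller_isa = caller_flags & ~OPT_ONLY_FLAGS;
  uint64_t callee_isa = callee_flags & ~OPT_ONLY_FLAGS;
  return (caller_isa & callee_isa) == callee_isa;
}
```

With this model, a callee built with fusion enabled can still be inlined into a caller without it. A callee that uses HTM cannot be inlined into a caller built without HTM, which matches the distinction drawn in the quoted comments.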
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059
--- Comment #9 from Kewen Lin ---
One more reduced test case:
fail cmd: gcc -c -O2 -flto -mcpu=power8
pass cmd: gcc -c -O2 -flto -mcpu=power8 -mno-htm -mno-power8-fusion
--
__attribute__((always_inline)) int foo(int *b) {
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059
Kewen Lin changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |linkw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102062
Kewen Lin changed:
What|Removed |Added
CC||linkw at gcc dot gnu.org
--- Comment #8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102054
Kewen Lin changed:
What|Removed |Added
CC||crazylht at gmail dot com,
: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: linkw at gcc dot gnu.org
Target Milestone: ---
This is a test case reduced from SPEC2017 bmk 541.leela_r source FastBoard.cpp,
when I was investigating the O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101944
--- Comment #5 from Kewen Lin ---
(In reply to Richard Biener from comment #3)
> On x86 we even have
>
> Vector cost: 136
> Scalar cost: 196
>
> note that we seem to vectorize the reduction but that only happens with
> -ffast-math, not -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101944
--- Comment #2 from Kewen Lin ---
Looking back at the optimized IR, I think the problem is that the vectorized
version has a longer critical path (total latency) for the reduc_plus result.
For vectorized version,
_51 = diffa_41(D) *
1.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101944
--- Comment #1 from Kewen Lin ---
The original costing shows the vectorized version winning. Checking the
costs, I found that it failed to model the cost of lane extraction; the
patch was posted at:
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/57
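The two observations above (a longer critical path for the reduc_plus result, and an unmodeled lane-extraction cost) can be illustrated with a rough latency calculation. The latencies below are made-up round numbers, not measured Power9 or x86 figures. The four-product sum is a hypothetical stand-in for the statements in the bug. The sketch also assumes that reduc_plus is expanded as log2(lanes) rounds of permute plus add.

```c
#include <assert.h>

/* Hypothetical latencies in cycles; not taken from any real core.  */
enum { LAT_MUL = 4, LAT_ADD = 3, LAT_PERM = 2, LAT_EXTRACT = 2 };

/* Scalar form: p0 + p1 + p2 + p3 with four independent products.
   The multiplies overlap, so the critical path is one multiply
   followed by three dependent additions.  */
int scalar_critical_path (void)
{
  return LAT_MUL + 3 * LAT_ADD;
}

/* Vector form: one vector multiply, a reduc_plus over `lanes` lanes
   done as log2(lanes) rounds of permute + add, then one lane
   extraction to obtain the scalar result.  */
int vector_critical_path (int lanes)
{
  assert (lanes > 0 && (lanes & (lanes - 1)) == 0);  /* power of two */
  int rounds = 0;
  for (int n = lanes; n > 1; n /= 2)
    rounds++;
  return LAT_MUL + rounds * (LAT_PERM + LAT_ADD) + LAT_EXTRACT;
}
```

Under these assumptions, the 4-lane vector version has fewer instructions but a longer dependency chain (16 cycles vs. 13). Leaving the extraction term out of the model would make it look 2 cycles cheaper than it is.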
: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: linkw at gcc dot gnu.org
Target Milestone: ---
For SPEC2017 bmk 508.namd_r, a 3.73% degradation is observed
at -O2 -ftree-slp-vectorize vs. the baseline -O2 on Power9 with either the default cost
model or
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101596
Kewen Lin changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101596
--- Comment #3 from Kewen Lin ---
Formal patch has been posted at
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/576071.html
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101596
--- Comment #2 from Kewen Lin ---
Created attachment 51200
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51200&action=edit
Untested patch
Test cases still need to be added.
at gcc dot gnu.org |linkw at gcc dot gnu.org
Status|UNCONFIRMED |ASSIGNED
Ever confirmed|0 |1
--- Comment #1 from Kewen Lin ---
I have an untested patch.
: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: linkw at gcc dot gnu.org
Target Milestone: ---
I happened to spot this while working on adding a new pattern for Power10
divide extended. Now
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 100696, which changed state.
Bug 100696 Summary: mult_higpart is not vectorized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100696
What|Removed |Added
--
|--- |FIXED
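For reference, "mult_highpart" is the upper half of a widening multiplication. A loop of the following shape is the kind of code such a pattern covers. This is an illustrative example with a hypothetical function name, not the test case from bug 100696.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* r[i] = high 32 bits of the 64-bit product a[i] * b[i].
   With mult_highpart support, a vectorizer could map the widening
   multiply plus shift onto a single vector "multiply high" operation.  */
void mulhi_u32 (uint32_t *restrict r, const uint32_t *restrict a,
                const uint32_t *restrict b, size_t n)
{
  assert (n == 0 || (r != NULL && a != NULL && b != NULL));
  for (size_t i = 0; i < n; i++)
    r[i] = (uint32_t) (((uint64_t) a[i] * b[i]) >> 32);
}
```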
Assignee|unassigned at gcc dot gnu.org |linkw at gcc dot gnu.org
CC||linkw at gcc dot gnu.org
--- Comment #4 from Kewen Lin ---
Should be fixed on trunk.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328
Kewen Lin changed:
What|Removed |Added
Resolution|--- |FIXED
Status|ASSIGNED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101291
--- Comment #2 from Kewen Lin ---
(In reply to Kewen Lin from comment #1)
> Hi Jeff, what's the option and stanza?
I asked because I can't easily reproduce it locally at O2; with the
C compiler it likely runs forever. I guess what y
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101291
Kewen Lin changed:
What|Removed |Added
CC||linkw at gcc dot gnu.org
--- Comment #1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328
--- Comment #8 from Kewen Lin ---
(In reply to rsand...@gcc.gnu.org from comment #7)
> (In reply to Kewen Lin from comment #6)
> > Created attachment 51066 [details]
> > aarch64 XPASS failure list
> >
> > The patch v3 bootstrapped and regressio
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101235
Kewen Lin changed:
What|Removed |Added
Resolution|--- |FIXED
Status|ASSIGNED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101235
--- Comment #3 from Kewen Lin ---
I will backport the fix after July 7th, 2021 (two weeks after it landed on trunk), if
this isn't urgent and the backport gets approved in the meantime.
|1
CC||linkw at gcc dot gnu.org,
||segher at gcc dot gnu.org
Last reconfirmed||2021-06-28
--- Comment #2 from Kewen Lin ---
Fixed with r12-1738 on trunk, need
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328
--- Comment #6 from Kewen Lin ---
Created attachment 51066
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51066&action=edit
aarch64 XPASS failure list
The patch v3 bootstrapped and regression-tested on x86_64-redhat-linux and
powerpc64le-
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328
--- Comment #5 from Kewen Lin ---
Created attachment 51065
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51065&action=edit
ira: Consider matching constraint heavily with some parameter v3
The mentioned only one aarch64-linux-gnu "PASS->F
gcc dot gnu.org |linkw at gcc dot gnu.org
Last reconfirmed||2021-06-24
Ever confirmed|0 |1
--- Comment #4 from Kewen Lin ---
Created attachment 51059
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51059&action=e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328
--- Comment #3 from Kewen Lin ---
(In reply to Vladimir Makarov from comment #2)
> (In reply to Kewen Lin from comment #1)
> > Created attachment 50715 [details]
> > ira:consider matching cstr in all alternatives
> >
> > With little understandi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794
Kewen Lin changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794
Kewen Lin changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |linkw at gcc dot gnu.org
Ever
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794
--- Comment #9 from Kewen Lin ---
(In reply to rguent...@suse.de from comment #5)
> On Fri, 28 May 2021, linkw at gcc dot gnu.org wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794
> >
> > --- C
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794
--- Comment #8 from Kewen Lin ---
Created attachment 50896
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50896&action=edit
M1 M2 SPEC2017 P9 eval result
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794
--- Comment #7 from Kewen Lin ---
Created attachment 50895
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50895&action=edit
Method 2: let PRE generate loop-carried dependences for the very cheap and cheap
cost models.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794
--- Comment #6 from Kewen Lin ---
Created attachment 50894
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50894&action=edit
Method 1: implicitly enable pcom (without unrolling) once loop vectorization is
enabled but pcom isn't set explicitly
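Predictive commoning ("pcom" above) reuses a value loaded in one iteration in the next iteration, turning a repeated load into a loop-carried register value. Below is a minimal before/after sketch of that transformation, written by hand rather than taken from GCC's output. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Before: each iteration loads both a[i] and a[i + 1], so every
   interior element is loaded twice.  */
void sum_adjacent (double *restrict out, const double *restrict a, size_t n)
{
  for (size_t i = 0; i + 1 < n; i++)
    out[i] = a[i] + a[i + 1];
}

/* After (hand-written equivalent of predictive commoning): the value
   loaded as a[i + 1] is kept in a register and reused as a[i] in the
   next iteration, so each element is loaded once.  */
void sum_adjacent_pcom (double *restrict out, const double *restrict a,
                        size_t n)
{
  if (n < 2)
    return;
  assert (out != NULL && a != NULL);
  double prev = a[0];
  for (size_t i = 0; i + 1 < n; i++)
    {
      double next = a[i + 1];
      out[i] = prev + next;
      prev = next;   /* loop-carried value replaces the reload */
    }
}
```

The reuse distance here is one iteration. The cost is an extra live register carried around the loop, which is the kind of trade-off a cheap cost model has to weigh.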
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794
--- Comment #4 from Kewen Lin ---
(In reply to rguent...@suse.de from comment #3)
> On Fri, 28 May 2021, linkw at gcc dot gnu.org wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794
> >
> > --- C
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794
--- Comment #2 from Kewen Lin ---
(In reply to Richard Biener from comment #1)
Thanks for the comments!
> There's predictive commoning which can do similar transforms and runs after
> vectorization. It might be it doesn't handle these "simple
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99398
Kewen Lin changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: linkw at gcc dot gnu.org
Target Milestone: ---
I was investigating a degradation in SPEC2017 554.roms_r on Power9; the
baseline is -O2 -mcpu=power9 -ffast-math while the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328
--- Comment #1 from Kewen Lin ---
Created attachment 50715
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50715&action=edit
ira:consider matching cstr in all alternatives
With my limited understanding of IRA, I am not quite sure whether this patch is
: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: linkw at gcc dot gnu.org
Target Milestone: ---
source: function LBM_performStreamCollideTRT in SPEC2017 519.lbm_r
This issue was exposed by the evaluation of enabling vectorization at O2 on 519.lbm_r.
baseline option
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99398
--- Comment #2 from Kewen Lin ---
Created attachment 50329
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50329&action=edit
tested patch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99398
Kewen Lin changed:
What|Removed |Added
Status|UNCONFIRMED |ASSIGNED
Ever confirmed|0
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: linkw at gcc dot gnu.org
Target Milestone: ---
#include "altivec.h"
vector long long foo(long long a, long long b) {
vector long long v1 = {a, 0};
vector long lo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #8 from Kewen Lin ---
Created attachment 49942
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49942&action=edit
vectorized with altivec built-in functions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #7 from Kewen Lin ---
(In reply to Richard Biener from comment #6)
> Starting from the loads is not how SLP discovery works so there will be
> zero re-use of code. Sure - the only important thing is you end up
> with a valid SLP grap
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #5 from Kewen Lin ---
(In reply to Kewen Lin from comment #4)
> One rough idea seems:
> 1) Relax this condition all_uniform_p somehow to get SLP instance building
> to go deeper and get those p1/p2 loads as SLP nodes.
> 2) Introdu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #4 from Kewen Lin ---
(In reply to Kewen Lin from comment #3)
>
> IIUC, in current implementation, we get four grouped stores:
> { tmp[i][0], tmp[i][1], tmp[i][2], tmp[i][3] } /i=0,1,2,3/ independently
>
> When all these tryings
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89126
Kewen Lin changed:
What|Removed |Added
CC||linkw at gcc dot gnu.org
--- Comment #4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98464
--- Comment #10 from Kewen Lin ---
(In reply to rguent...@suse.de from comment #9)
> On Mon, 4 Jan 2021, linkw at gcc dot gnu.org wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98464
> >
> > --- C
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98464
--- Comment #8 from Kewen Lin ---
(In reply to Richard Biener from comment #5)
> But this
>
> sprime = eliminate_avail (gimple_bb (SSA_NAME_DEF_STMT (use)), use);
>
> should make it more conservative (compared to the more desirable use
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98464
Kewen Lin changed:
What|Removed |Added
Assignee|linkw at gcc dot gnu.org |rguenth at gcc dot
gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98464
Kewen Lin changed:
What|Removed |Added
Status|NEW |ASSIGNED
CC|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98464
Kewen Lin changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |linkw at gcc dot gnu.org
--- Comment
Priority: P3
Component: other
Assignee: unassigned at gcc dot gnu.org
Reporter: linkw at gcc dot gnu.org
Target Milestone: ---
As raised by Qingnan's question[1] on the gcc-help mailing list, the last part of the
current description of the -fsanitize=address option looks conf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #3 from Kewen Lin ---
(In reply to Richard Biener from comment #2)
> So the expected vectorization builds vectors
>
> { tmp[0][0], tmp[1][0], tmp[2][0], tmp[3][0] }
>
> that's not SLP, SLP tries to build the
>
> { tmp[i][0], tmp[
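The difference between the two vector shapes quoted above is a 4x4 transpose. SLP groups each row `{ tmp[i][0], ..., tmp[i][3] }`, while the expected vectorization works on columns `{ tmp[0][j], ..., tmp[3][j] }`. Here is a scalar C sketch of that relationship, using a hypothetical helper name.

```c
#include <assert.h>

/* Build the column "vectors" col[j] = { tmp[0][j], tmp[1][j], tmp[2][j],
   tmp[3][j] } from the row "vectors" tmp[i] = { tmp[i][0], ..., tmp[i][3] }.
   In vector code, this corresponds to the permutations needed to go
   from row-wise groups to column-wise groups.  */
void rows_to_columns (unsigned int col[4][4], unsigned int tmp[4][4])
{
  assert ((void *) col != (void *) tmp);   /* not an in-place transpose */
  for (int j = 0; j < 4; j++)
    for (int i = 0; i < 4; i++)
      col[j][i] = tmp[i][j];
}
```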
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #1 from Kewen Lin ---
A similar case is x264_pixel_satd_8x4 in x264:
https://github.com/mirror/x264/blob/4121277b40a667665d4eea1726aefdc55d12d110/common/pixel.c#L288
Assignee: unassigned at gcc dot gnu.org
Reporter: linkw at gcc dot gnu.org
Target Milestone: ---
Test case:
extern void test(unsigned int t[4][4]);
void foo(unsigned char *p1, int i1, unsigned char *p2, int i2)
{
unsigned int tmp[4][4];
unsigned int a0, a1, a2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98113
--- Comment #2 from Kewen Lin ---
(In reply to Kewen Lin from comment #1)
> (In reply to Ilya Leoshkevich from comment #0)
> > s390's vxe/popcount-1.c began to fail after PR96789 fix.
>
> Sorry to see this regression.
>
> ...
>
> >
> > that i
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98113
Kewen Lin changed:
What|Removed |Added
CC||rguenther at suse dot de
Last reconfirmed|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97744
Kewen Lin changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97744
--- Comment #5 from Kewen Lin ---
BTW, this is Power7-specific; I found that it passes with -mcpu=power8.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97744
--- Comment #4 from Kewen Lin ---
The additional fre4 pass run triggers this: disabling fre4 makes it pass
(while disabling dse3 alone doesn't, so dse3 is unrelated). Further narrowing
down shows that fre4 on the function MG3XDEMO is responsible. B
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97705
Kewen Lin changed:
What|Removed |Added
Resolution|--- |FIXED
Status|ASSIGNED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97744
Kewen Lin changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |linkw at gcc dot gnu.org
Last
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97594
--- Comment #3 from Kewen Lin ---
(In reply to Martin Liška from comment #2)
> (In reply to Martin Liška from comment #1)
> > Mine, I see a strange error:
> >
> > $ Program received signal SIGBUS, Bus error.
> > 0x3fffb7ceddbc in __GI__IO_
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933
Kewen Lin changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97705
--- Comment #4 from Kewen Lin ---
I think my commit just exposed a bug in IRA. The newly introduced function
remove_scratches can bump max_regno; then the data structures
regstat_n_sets_and_refs and reg_info_p, which are allocated according
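The failure mode described here can be sketched in C: per-register arrays are sized from the register count before a step creates new registers. The `reg_table` type and its helpers are hypothetical and are not IRA's real data structures. The point is that any table indexed by register number has to be grown after the register count increases.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-register info table, sized by the register count
   at allocation time.  */
struct reg_table
{
  int *n_refs;   /* one entry per register number */
  int size;      /* number of valid entries */
};

int reg_table_init (struct reg_table *t, int max_regno)
{
  t->n_refs = calloc ((size_t) max_regno, sizeof *t->n_refs);
  t->size = t->n_refs ? max_regno : 0;
  return t->n_refs != NULL;
}

/* To be called after anything creates new registers; otherwise,
   indexing with the new register numbers runs past the array end.  */
int reg_table_grow (struct reg_table *t, int new_max_regno)
{
  if (new_max_regno <= t->size)
    return 1;
  int *p = realloc (t->n_refs, (size_t) new_max_regno * sizeof *p);
  if (!p)
    return 0;
  memset (p + t->size, 0, (size_t) (new_max_regno - t->size) * sizeof *p);
  t->n_refs = p;
  t->size = new_max_regno;
  return 1;
}

int *reg_table_at (struct reg_table *t, int regno)
{
  assert (regno >= 0 && regno < t->size);   /* catches stale sizing */
  return &t->n_refs[regno];
}
```

In this model, skipping the `reg_table_grow` call after new registers appear is exactly what the assertion in `reg_table_at` would catch.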
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 96789, which changed state.
Bug 96789 Summary: x264: sub4x4_dct() improves when vectorization is disabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789
What|Removed |Added
-
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789
Kewen Lin changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97705
--- Comment #3 from Kewen Lin ---
The "-DMASK=2" dump has more lines for register 282, which is introduced in
IRA. Something odd causes IRA to dump more contexts.
$ diff dump1/dump-noaddr.c.289r.ira dump2/dump-noaddr.c.289r.ira
107a108
>
|1
Assignee|unassigned at gcc dot gnu.org |linkw at gcc dot gnu.org
Last reconfirmed||2020-11-04
--- Comment #2 from Kewen Lin ---
Thanks for reporting, and sorry for the failure. I did run regression
testing on P8 LE, but thought it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96376
Kewen Lin changed:
What|Removed |Added
CC||linkw at gcc dot gnu.org
--- Comment #5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96129
Kewen Lin changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96376
Kewen Lin changed:
What|Removed |Added
CC||ro at gcc dot gnu.org
--- Comment #4 from Ke
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96129
--- Comment #4 from Kewen Lin ---
Given the regressed failures, this is very likely a duplicate of PR96376.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789
--- Comment #32 from Kewen Lin ---
(In reply to Richard Biener from comment #31)
> (In reply to Kewen Lin from comment #29)
> > (In reply to Hongtao.liu from comment #28)
> > > > Probably you can try to tweak it in ix86_add_stmt_cost? when the
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789
--- Comment #29 from Kewen Lin ---
(In reply to Hongtao.liu from comment #28)
> > Probably you can try to tweak it in ix86_add_stmt_cost? when the statement
>
> Yes, it's the place.
>
> > is UB to UH conversion statement, further check if the d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789
--- Comment #27 from Kewen Lin ---
(In reply to Hongtao.liu from comment #22)
> >One of my workmates found that if we disable vectorization for SPEC2017
> >>525.x264_r function sub4x4_dct in source file x264_src/common/dct.c with
> >?>explicit
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789
--- Comment #26 from Kewen Lin ---
> > By following this idea, to release the restriction on loop_outer
> > (loop_father) when setting the father_bbs, I can see FRE works as
> > expectedly. But it actually does the rpo_vn from cfun's entry to it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789
--- Comment #25 from Kewen Lin ---
> >
> > Got it! For
> >
> > else if (vect_nop_conversion_p (stmt_info))
> > continue;
> >
> > Is it a good idea to change it to call record_stmt_cost like the others?
> > 1) introduce one ve
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789
--- Comment #19 from Kewen Lin ---
(In reply to rguent...@suse.de from comment #17)
> On Fri, 18 Sep 2020, linkw at gcc dot gnu.org wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789
> >
> > --- Co
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789
--- Comment #18 from Kewen Lin ---
(In reply to Richard Biener from comment #10)
> (In reply to Kewen Lin from comment #9)
> > (In reply to Richard Biener from comment #8)
> > > (In reply to Kewen Lin from comment #7)
> > > > Two questions in min
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97075
Kewen Lin changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789
--- Comment #15 from Kewen Lin ---
(In reply to rguent...@suse.de from comment #14)
> On Fri, 18 Sep 2020, linkw at gcc dot gnu.org wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789
> >
> > --- Co
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789
--- Comment #13 from Kewen Lin ---
> 2) on Power, the conversion from unsigned char to unsigned short is nop
> conversion, when we counting scalar cost, it's counted, then add costs 32
> totally onto scalar cost. Meanwhile, the conversion from
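The costing concern quoted above can be shown with a toy model. If the scalar side charges for each unsigned char to unsigned short conversion, even though the target gets it for free (assumed here, for example, to be folded into a zero-extending byte load), the scalar cost is inflated. That inflation can flip the vectorization decision. Apart from the 32 conversions mentioned above, all statement counts and unit costs are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy cost model; all unit costs are hypothetical.  */
enum { COST_PER_STMT = 1 };

/* Scalar cost of a block with `other_stmts` ordinary statements and
   `conversions` char -> short conversions.  When
   `count_free_conversions` is false, conversions that the target is
   assumed to get for free cost nothing.  */
int scalar_cost (int other_stmts, int conversions,
                 bool count_free_conversions)
{
  assert (other_stmts >= 0 && conversions >= 0);
  int cost = other_stmts * COST_PER_STMT;
  if (count_free_conversions)
    cost += conversions * COST_PER_STMT;
  return cost;
}

/* Vectorize only if the vector version is strictly cheaper.  */
bool vectorize_p (int vector_cost, int scalar_cost_value)
{
  return vector_cost < scalar_cost_value;
}
```

With a made-up vector cost of 120 and 100 other scalar statements, counting the 32 free conversions raises the scalar cost from 100 to 132. That makes vectorization look profitable even though it would not be without those extra 32 units.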
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97075
--- Comment #4 from Kewen Lin ---
> gcc.target/powerpc/p9-vec-length-full-6.c
This is a test case issue: with Andrea's improvement, 64-bit/32-bit pairs now use full
vectors instead of partial vectors.
> gcc.target/powerpc/p9-vec-length-epil-7.c
It e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97075
--- Comment #3 from Kewen Lin ---
(In reply to akrl from comment #2)
> Thanks Kewen, unfortunately I've no Power setup. Sorry for the
> inconvenience.
My pleasure! If you are interested in running on Power machines, you can apply for and
use some Power
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789
--- Comment #12 from Kewen Lin ---
> Thanks for the explanation! I'll look at it after checking 2). IIUC, the
> advantage to eliminate stores here looks able to get those things which is
> fed to stores and stores' consumers bundled, then get mo