https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101480
--- Comment #21 from hubicka at kam dot mff.cuni.cz ---
Hi,
note that also tree-ssa-structalias has:
/* If the call is to a replaceable operator delete and results
from a delete expression as opposed to a direct call to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102646
--- Comment #2 from hubicka at kam dot mff.cuni.cz ---
> I think most of the regressions are fixed, we get even better numbers now.
Because we enabled vectorization. I would say they should still
reproduce with -fno-tree-vectorize, right?
Honza
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103592
--- Comment #1 from hubicka at kam dot mff.cuni.cz ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
> [Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
note that fatigue2 is polyhedron, not spec...
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103766
--- Comment #12 from hubicka at kam dot mff.cuni.cz ---
> Even trying to find a Fortran testsuite friendly version is hard because the
> issue can only happen with print, I tried even doing internal write to a
> string
> it is passing without
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103766
--- Comment #8 from hubicka at kam dot mff.cuni.cz ---
> > I would welcome a testuite friendly version of the fortran testcase
>
> Both Andrew and I failed to make a C reproducer - what about just taking the
> -fdump-tree-gimple, as input would
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103797
--- Comment #9 from hubicka at kam dot mff.cuni.cz ---
> recip pass happens after vectorization
> I don't know/understand why though.
Yep, I suppose we want to either special case this in vectorizer or make
it earlier... I also wonder why
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102943
--- Comment #33 from hubicka at kam dot mff.cuni.cz ---
With the inliner tweaks (which I hope to get a bit more aggressive this
week) we "solved" the wrf compile time with LTO by simply not building
the gigantic functions. However we still have
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103797
--- Comment #16 from hubicka at kam dot mff.cuni.cz ---
> >
> > It could be done, but I was under impression that the sequence to load 1.0f
> > into topmost elements nullifies the benefit of operation to divide two
>
> Sure, so perhaps we
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103797
--- Comment #2 from hubicka at kam dot mff.cuni.cz ---
> Can you please attach a reduced test-case?
Do you know how to produce one with a reasonable effort? The
declarations are quite convoluted, but the function is well isolated and
easy to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103797
--- Comment #4 from hubicka at kam dot mff.cuni.cz ---
> -E and remove not needed code.
>
> > The
> > declarations are quite convoluted, but the function is well isolated and
> > easy to inspect from full one...
>
> Do we speak about:
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103797
--- Comment #5 from hubicka at kam dot mff.cuni.cz ---
Created attachment 52042
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52042&action=edit
b.slp1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103168
--- Comment #14 from hubicka at kam dot mff.cuni.cz ---
This is a slightly modified patch I am testing. I added pre-computation of the
number of accesses, enabled the path for const functions (in case they
have memory operand), initialized alias sets
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100937
--- Comment #12 from hubicka at kam dot mff.cuni.cz ---
> (The -fno-semantic-interposition thing is probably the biggest performance gap
> between gcc -fpic and clang -fpic.)
Yep, it is often confusing to users (who do not understand what ELF
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103168
--- Comment #15 from hubicka at kam dot mff.cuni.cz ---
The patch passed testing on x86_64-linux.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103300
--- Comment #2 from hubicka at kam dot mff.cuni.cz ---
Needs -O2 -floop-unroll-and-jam --param early-inlining-insns=14
to fail, so I guess it may be an issue with unroll-and-jam.
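
For readers unfamiliar with the transformation that -floop-unroll-and-jam enables, here is a minimal hand-written sketch of the idea: the outer loop is unrolled and the resulting copies of the inner loop are fused ("jammed") into one. The function names, array sizes and the unroll factor of 2 are assumptions made purely for illustration. This is not the testcase from the PR, and it is not necessarily the exact code GCC would produce.

```c
#include <stddef.h>

/* Hypothetical sizes; ROWS is even so the unrolled loop needs no remainder. */
#define ROWS 4
#define COLS 8

/* Original loop nest: one row is summed per outer iteration. */
void row_sums_plain(int a[ROWS][COLS], int out[ROWS])
{
  for (size_t i = 0; i < ROWS; i++)
    {
      out[i] = 0;
      for (size_t j = 0; j < COLS; j++)
        out[i] += a[i][j];
    }
}

/* Outer loop unrolled by 2, with the two inner-loop copies fused
   into a single inner loop that handles rows i and i + 1 together. */
void row_sums_jammed(int a[ROWS][COLS], int out[ROWS])
{
  for (size_t i = 0; i < ROWS; i += 2)
    {
      out[i] = 0;
      out[i + 1] = 0;
      for (size_t j = 0; j < COLS; j++)
        {
          out[i] += a[i][j];
          out[i + 1] += a[i + 1][j];
        }
    }
}
```

Both functions compute the same result. The jammed form simply does more work per inner iteration, which is the shape of code the pass aims for.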
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103246
--- Comment #14 from hubicka at kam dot mff.cuni.cz ---
> Thanks! Great you found it so quickly.
It is a bit stupid code since everything is duplicated (for LTO and
non-LTO). I have to refactor it: we could have common base of the two
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103227
--- Comment #5 from hubicka at kam dot mff.cuni.cz ---
> I like the idea of transformation phases better than putting
> everything into tree-inline (and by extension ipa-param-manipulation)
> but perhaps we have to do aggregate constant
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97403
--- Comment #4 from hubicka at kam dot mff.cuni.cz ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97403
>
> --- Comment #3 from Martin Jambor ---
> (In reply to Jan Hubicka from comment #2)
> > Martin,
> > I think we can close this
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103175
--- Comment #5 from hubicka at kam dot mff.cuni.cz ---
The sanity check verifies that functions accessing a parameter indirectly
also read the parameter (otherwise the indirect reference cannot
happen). This patch moves the check earlier and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103164
--- Comment #2 from hubicka at kam dot mff.cuni.cz ---
Yep, it only shows that we want to run ipa-pta and local oracle in
parallel since ipa-pta can not be realistically assumed to subsume all
of local PTA (for example due to being necessarily
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102997
--- Comment #31 from hubicka at kam dot mff.cuni.cz ---
> It likely was the loop header copying missing on cold loops then.
Yep. It is good we worked that out.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103223
--- Comment #8 from hubicka at kam dot mff.cuni.cz ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103223
>
> --- Comment #5 from Martin Sebor ---
> (In reply to Martin Jambor from comment #4)
> > (In reply to Jan Hubicka from comment #0)
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103264
--- Comment #1 from hubicka at kam dot mff.cuni.cz ---
What breaks in the testcase is updating profile after complete loop
unrolling. I suspect the unrolling is enabled by the extra DSE.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103267
--- Comment #5 from hubicka at kam dot mff.cuni.cz ---
Works for me even with the 3 warnings.
hubicka@lomikamen:/aux/hubicka/trunk/build-lto2/gcc$ cat >tt.c
__attribute__ ((noinline,const))
infinite (int p)
{
if (p)
while (1);
return p;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103267
--- Comment #6 from hubicka at kam dot mff.cuni.cz ---
Aha, but here is better example (reproduces same way).
In the former one I forgot const attribute which makes it invalid.
The testcase tests that ipa-sra is missing ECF_LOOPING_CONST_OR_PURE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103267
--- Comment #9 from hubicka at kam dot mff.cuni.cz ---
> @@ -1,4 +1,3 @@
> -static int
> __attribute__ ((noinline,const))
> infinite (int p)
> {
Just for the record, it crashes with or without static int here for me :)
I run across it because
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103230
--- Comment #1 from hubicka at kam dot mff.cuni.cz ---
> Happens with UBSAN compiler for:
>
> $ gcc gcc/testsuite/gcc.c-torture/execute/pr71494.c -O1 -flto
> ...
> /home/marxin/Programming/gcc/gcc/ipa-modref-tree.h:550:33: runtime error: load
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103230
--- Comment #3 from hubicka at kam dot mff.cuni.cz ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103230
>
> --- Comment #2 from Martin Liška ---
> > How do you build ubsan compiler?
>
> F="-O0 -g -fsanitize=undefined" ; make -j16
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103231
--- Comment #1 from hubicka at kam dot mff.cuni.cz ---
> [659] %
> [659] % gcctk -O0 -w small.c
> [660] %
> [660] % gcctk -O1 -w small.c
> [661] % gcctk -O1 -w small.c
> [662] % gcctk -O1 -w small.c
> gcctk: internal compiler error:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101941
--- Comment #19 from hubicka at kam dot mff.cuni.cz ---
> > * special case function splitting such that a BB that contains a function
> > call which has either warning or error attribute on it; not to split out to
> > a different function.
>
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103266
--- Comment #5 from hubicka at kam dot mff.cuni.cz ---
> I think 'X' means simply not dereferenced or escaping since this was all
> PTA based. 'S' would still eventually allow escaping. But yes, PTA
> simply takes '1' literally. So the patch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103195
--- Comment #3 from hubicka at kam dot mff.cuni.cz ---
> > threader stuff would be my bet, but we need to bisect this (tfft2 is also
> > quite small)
>
> Bad bet ;) It's caused by r12-5113-gd70ef65692fced7a.
Hehe, that was my guess yesterday.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103423
--- Comment #3 from hubicka at kam dot mff.cuni.cz ---
> Oh, you are right, then it started with r12-2353-g8da8ed435e9f01b3.
OK so mine, (as I sort of suspected :)
If it is easy for you to get -ftime-report of before and after
build, it would be
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103432
--- Comment #3 from hubicka at kam dot mff.cuni.cz ---
Caused by stupid thinko (also present in gcc11). I compute right
min_flags but then use wrong value (without dereference applied).
I am testing the following.
diff --git a/gcc/ipa-modref.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103441
--- Comment #3 from hubicka at kam dot mff.cuni.cz ---
> #0 gimple_set_bb (stmt=0x3fffb01a2be0, bb=0x0) at ../../gcc/gimple.c:1772
> #1 0x107209b0 in gsi_remove (i=0x3fffd7c8,
> remove_permanently=) at
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103409
--- Comment #2 from hubicka at kam dot mff.cuni.cz ---
> The two main changes during that time period was jump threading and modref.
> modref seems might be more likely with wrf being fortran code and even using
> nested functions and such.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103223
--- Comment #11 from hubicka at kam dot mff.cuni.cz ---
> Xeon(R) Platinum 8358 (IceLake) (64C 128T 512G):
> BenchMarks Copies RunTime1 RunTime2 Rate1 Rate2 Compare
> 548.exchange2_r 128 479 913 700 367
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103168
--- Comment #9 from hubicka at kam dot mff.cuni.cz ---
> so indeed that's an issue. So it's a bug fixed, not an optimization
> regression.
I know, but the bug was fixed in an unnecessarily generous way, preventing a
lot of valid transforms
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103168
--- Comment #12 from hubicka at kam dot mff.cuni.cz ---
> unsigned p;
> unsigned __attribute__((noinline)) test (void)
> {
> return p;
> }
>
> modref analyzing 'test' (ipa=0) (pure)
> - Analyzing load: p
>- Recording base_set=0 ref_set=0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103227
--- Comment #9 from hubicka at kam dot mff.cuni.cz ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103227
> ... fixing this problem properly.
> I just looked into this again and we already have code that preserves
> propagates bits on pointer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103227
--- Comment #8 from hubicka at kam dot mff.cuni.cz ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103227
>
> --- Comment #7 from Martin Jambor ---
> (In reply to hubicka from comment #5)
> > > I like the idea of transformation phases
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103211
--- Comment #3 from hubicka at kam dot mff.cuni.cz ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103211
>
> --- Comment #2 from Martin Liška ---
> Optimized dump differs for couple of functions in the same way:
>
> diff -u good bad
> ---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103277
--- Comment #4 from hubicka at kam dot mff.cuni.cz ---
> Btw. started with r12-5236-g5aa91072e24c1e16.
Yep, I know - it is modref based DSE that lets us to enable that call as
dead. So the bug is technically mine if Richi decides to pass it to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103168
--- Comment #5 from hubicka at kam dot mff.cuni.cz ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103168
>
> --- Comment #4 from Richard Biener ---
> (In reply to Jan Hubicka from comment #3)
> > This is simple (and fairly common) case we
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103168
--- Comment #6 from hubicka at kam dot mff.cuni.cz ---
> bool unknown_memory_access = false;
> if (summary = get_modref_function_summary (stmt, NULL))
> {
> /* First search if we can do something useful.
> Like for dse it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103423
--- Comment #1 from hubicka at kam dot mff.cuni.cz ---
Martin,
My original report here was on regression at July 17 2021 (range
g:0b7a11874d4eb428 and g:704e8a825c78b9a8)
which seems unrelated to g:r12-3903-g0288527f47cec669
which is in Sep 21
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103409
--- Comment #6 from hubicka at kam dot mff.cuni.cz ---
> Started with r12-3903-g0288527f47cec669.
This is September change (for which we have PR102943) however the
regression range was g:1ae8edf5f73ca5c3 (or g:264f061997c0a534 on second
plot)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103040
--- Comment #13 from hubicka at kam dot mff.cuni.cz ---
> See above comments from Iain, even if that pre-initialization is removed it is
> still miscompiled. And, the testcase fails not because of the padding bits
> not
> being zero, but
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103040
--- Comment #16 from hubicka at kam dot mff.cuni.cz ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103040
>
> --- Comment #15 from Iain Buclaw ---
> Got it. The difference between D and C++ is a matter of early inlining.
>
> The C++
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103040
--- Comment #17 from hubicka at kam dot mff.cuni.cz ---
> Great, I will take a look now (I was travelling, that is why I did not
> start earlier)
Found it - there is a thinko in the way the NOT_RETURNED flag is handled in the
call statement analysis.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102997
--- Comment #4 from hubicka at kam dot mff.cuni.cz ---
> Not seen on Haswell (but w/o PGO). Is this PGO specific? There's another
> large jump visible end of 2019.
This is kabylake LTO+PGO+march=native
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102997
--- Comment #5 from hubicka at kam dot mff.cuni.cz ---
> Not seen on Haswell (but w/o PGO). Is this PGO specific? There's another
> large jump visible end of 2019.
It is between 2019-11-15 and 18 but the revisions do not exist in git
-
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #16 from hubicka at kam dot mff.cuni.cz ---
> It will only help for V2DF I think, so no, not really. But an IPA idea of
> whether there's cross-call STLF issues might be nice.
>
> Generally doing wider stores is fine but of course
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #12 from hubicka at kam dot mff.cuni.cz ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
>
> --- Comment #11 from Richard Biener ---
> -mtune-ctrl=^sse_unaligned_load_optimal fixes the observed regression.
Interesting. I
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #9 from hubicka at kam dot mff.cuni.cz ---
> Not inlining ray_sphere at -O2 is of course what makes it overall slow.
ray_sphere is not at all that small a function. We already play tricks
at -O3 to inline it by detecting that some
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102982
--- Comment #4 from hubicka at kam dot mff.cuni.cz ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102982
>
> Richard Biener changed:
>
>What|Removed |Added
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102982
--- Comment #6 from hubicka at kam dot mff.cuni.cz ---
>
> fixup_cfg already removes write-only stores so that seems fit for that
> purpose.
>
> Btw,
>
> static int x = 1;
>
> int main()
> {
> x = 1;
> }
>
> should ideally be handled as
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #10 from hubicka at kam dot mff.cuni.cz ---
>| b = 2.0 * ray.dir.x * (ray.orig.x - sph->pos.x) +
> #
>| movupd (%rdi),%xmm5
> #
>| 2.0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103073
--- Comment #8 from hubicka at kam dot mff.cuni.cz ---
> Well, the usual thing to do is to check max_size_known_p () and
> if maybe_ne (max_size, size) then use [offset, max_size] for
> disambiguation. I think for modref you can do the same -
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103073
--- Comment #12 from hubicka at kam dot mff.cuni.cz ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103073
>
> --- Comment #10 from Martin Liška ---
> > This bootstraps/regtests and fixes the testcase. Does it look sane to
> > you?
>
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103073
--- Comment #13 from hubicka at kam dot mff.cuni.cz ---
> > diff --git a/gcc/ipa-modref-tree.h b/gcc/ipa-modref-tree.h
> > index 9976e489697..1b51323175b 100644
> > --- a/gcc/ipa-modref-tree.h
> > +++ b/gcc/ipa-modref-tree.h
> > @@ -813,6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103058
--- Comment #10 from hubicka at kam dot mff.cuni.cz ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103058
>
> --- Comment #9 from Martin Liška ---
> And WPA cgraph dump tells:
>
> quick_sort_1.1/213 (quick_sort_1) @0x774c2550
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102943
--- Comment #19 from hubicka at kam dot mff.cuni.cz ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102943
>
> Aldy Hernandez changed:
>
>What|Removed |Added
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103058
--- Comment #1 from hubicka at kam dot mff.cuni.cz ---
> One can see it with -O2 -flto=auto -march=znver2:
>
> radsw.fppized.f90:39:19: internal compiler error: in
> gimple_call_static_chain_flags, at gimple.c:1669
>39 | subroutine
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103055
--- Comment #2 from hubicka at kam dot mff.cuni.cz ---
> Confirmed, started with r12-4852-g18f0873d1e595dc2.
Depth=0 means that we do no analysis at all and the assert tests that
some analysis was done. I suppose we could ignore depth 0 and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103058
--- Comment #4 from hubicka at kam dot mff.cuni.cz ---
Hi,
I am testing the following to unbreak fortran.
However the real bug is that binds_to_current_def should work on whole
WPA and be independent of partitioning. I remember I had patch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103080
--- Comment #1 from hubicka at kam dot mff.cuni.cz ---
The cdtor merging code is predating LTO - it is also used for collect2
path on targets w/o cdtor sections.
I guess the DECL_UID compare is not a very safe thing to do since it
depends on the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102943
--- Comment #27 from hubicka at kam dot mff.cuni.cz ---
>
> This PR is still open, at least for slowdown in the threader with LTO. The
> issue is ranger wide, so it may also cause slowdowns on non-LTO builds for
> WRF, though I haven't
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102997
--- Comment #19 from hubicka at kam dot mff.cuni.cz ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102997
>
> --- Comment #18 from Aldy Hernandez ---
>
> > If I read it correctly, for a path that enters the loop and later leaves
> > it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102997
--- Comment #16 from hubicka at kam dot mff.cuni.cz ---
Note that it still seems to me that the crossed_loop_header handling is
overly conservative. We have:
@ -2771,6 +2771,7 @@ jt_path_registry::cancel_invalid_paths
(vec )
bool seen_latch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103117
--- Comment #2 from hubicka at kam dot mff.cuni.cz ---
> I suppose modref could (for pointer returns) use ranger to query its range
> and see if it ever is non-NULL? I'm not sure if we reliably propagate
> null pointer constants everywhere.
I
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103117
--- Comment #4 from hubicka at kam dot mff.cuni.cz ---
> > I don't know - this way we have separate dumps etc. I think mistake was
> > scheduling pure-const and later modref too late.
>
> Maybe. If you move them please put a comment before
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102997
--- Comment #21 from hubicka at kam dot mff.cuni.cz ---
> to also allow to thread through a loop path not crossing the latch but
> at least for the issue of "breaking loops" the loops_crossed stuff shouldn't
> be necessary. It might still
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102997
--- Comment #23 from hubicka at kam dot mff.cuni.cz ---
> We verify that by simply looking at the loop depth relation of
> the entry and exit of the path.
Which seems wrong for a path leaving one loop and entering another...
>
> > It seems to me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102997
--- Comment #10 from hubicka at kam dot mff.cuni.cz ---
>
> Hmmm, this commit disables problematic threads we've agreed are detrimental to
> loop form. So it's not something the threader did, but something it's not
> allowed to do. This PR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103409
--- Comment #13 from hubicka at kam dot mff.cuni.cz ---
> I've fixed the threading slowdown. Can someone verify and close this PR if
> all
> the slowdown has been accounted for? If not, then someone needs to explore
> any
> slowdown
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103636
--- Comment #7 from hubicka at kam dot mff.cuni.cz ---
I use
cmake -G "Unix Makefiles" /home/jh/llvm-project/llvm
-DCLANG_TABLEGEN=/home/jh/llvm-project/llvm/out/stage1/bin/clang-tblgen
-DCMAKE_BUILD_TYPE=Release
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103601
--- Comment #5 from hubicka at kam dot mff.cuni.cz ---
Thanks Roger and Andrew! It was on my TODO for weekend and I am very
happy you beat me :)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103652
--- Comment #2 from hubicka at kam dot mff.cuni.cz ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103652
>
> --- Comment #1 from Martin Liška ---
> (In reply to Jan Hubicka from comment #0)
> > Building clang in the funny way (training
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103652
--- Comment #4 from hubicka at kam dot mff.cuni.cz ---
>
> Well, I'm specifically speaking about:
> error: the control flow of function ‘BZ2_compressBlock’ does not match its
> profile data (counter ‘arcs’)
>
> this type of errors should not
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103662
--- Comment #4 from hubicka at kam dot mff.cuni.cz ---
> Can you explain in simple words why adding
>
> if (ptr1%k .ne. 42) print *
>
> before the line
>
> if (ptr1%k .ne. 42) STOP 2
>
> makes the test succeed, but adding it after that
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782
--- Comment #24 from hubicka at kam dot mff.cuni.cz ---
> Awesome! thanks!
>
> I wonder if we can get rid of the final magic parameter too, we run with
> --param ipa-cp-unit-growth=80 too which seems to have no more effect on
> exchange, though
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782
--- Comment #26 from hubicka at kam dot mff.cuni.cz ---
> It's with LTO, I'll see if non-LTO has the same benefit. In terms of
> code-size
> it looks like it accounts for a 20% increase for binary size, but the hot
> function shrinks approx 6x.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103734
--- Comment #1 from hubicka at kam dot mff.cuni.cz ---
I think ipa-cp heuristics still needs some work. It is nice that we got
it to do something, but I just checked and with LTO+PGO build of clang
it produces cca 30 clones that are not "for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103830
--- Comment #5 from hubicka at kam dot mff.cuni.cz ---
> I think the recent modref change made the function const.
>
> And no, we shouldn't DSE any volatile store and generally we don't. It's
> probably some side-effect of modref that we do.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103989
--- Comment #10 from hubicka at kam dot mff.cuni.cz ---
> And I'm intentionally not doing this because -Og should still remove
> abstraction during early inlining (for functions marked 'inline'), we
> just don't want to spend the extra compile
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103989
--- Comment #7 from hubicka at kam dot mff.cuni.cz ---
> --- Comment #6 from Richard Biener ---
> Honza, -Og was supposed to not do so much work, I intended to disable IPA
> inlining but there's no knob for that. I wonder where to best put
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103989
--- Comment #8 from hubicka at kam dot mff.cuni.cz ---
> You can not disable an IPA pass because then we will mishandle
> optimize attributes. I think you simply want to set
>
> flag_inline_small_functions = 0
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103989
--- Comment #12 from hubicka at kam dot mff.cuni.cz ---
> Yeah, and since we inline all always inline and also flatten during
> early inline the IPA inliner should really do nothing.
OK, can_inline_edge_p will do that but we will still walk the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103989
--- Comment #14 from hubicka at kam dot mff.cuni.cz ---
>
> Sure - I just remember (falsely?) that we finally decided to do it :)
I do not recall this, but I may have forgotten :))
> If we don't run IPA inline we don't figure we failed to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782
--- Comment #42 from hubicka at kam dot mff.cuni.cz ---
On zen2 and 3 with -flto the speedup seems to be cca 12% for both -O2
and -Ofast -march=native which is both very nice!
Zen1 for some reason sees less improvement, about 6%.
With PGO it is
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103662
--- Comment #9 from hubicka at kam dot mff.cuni.cz ---
> I'm inclined to make this P1 even though it is gfortran only. As a last
> resort
> it should work to make the receiver side a ref-all pointer.
Yes, I also think this is an important bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95558
--- Comment #8 from hubicka at kam dot mff.cuni.cz ---
> > Do weak aliases fall under some implicit ODR here?
>
> The whole definition of "weak" is that it entitles you to make a definition
> that will be exempt from ODR, where a non-weak
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95558
--- Comment #13 from hubicka at kam dot mff.cuni.cz ---
> Result pure looping 0
> Function found to be pure: foo/4
This is good - we are supposed to find it to be pure and walk all
aliases and update noninterposable ones
> Declaration updated to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102943
--- Comment #50 from hubicka at kam dot mff.cuni.cz ---
> It helps quite a bit, the worst case is now
>
> tree VRP : 5.14 ( 7%) 0.02 ( 3%) 5.15 (
> 7%)
>2
> 9M ( 3%)
> backwards jump threading
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
--- Comment #13 from hubicka at kam dot mff.cuni.cz ---
> > According to znver2_cost
> >
> > Cost of sse_to_integer is a little bit less than fp_store, maybe increase
> > sse_to_integer cost(more than fp_store) can helps RA to choose memory
> >
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
--- Comment #16 from hubicka at kam dot mff.cuni.cz ---
>
> Yep, we also have code like
>
> - movabsq $0x3ff03db8fde2ef4e, %r8
> ...
> - vmovq %r8, %xmm11
It is loading a random constant to xmm11. Since reg<->xmm moves are
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
--- Comment #21 from hubicka at kam dot mff.cuni.cz ---
> I would say so. It saves code size and also uop space unless the two
> can magically fuse to a immediate to %xmm move (I doubt that).
I made a simple benchmark
double a=10;
int
main()
{
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103423
--- Comment #6 from hubicka at kam dot mff.cuni.cz ---
> Fixed, the links now show better than ever numbers.
It is only fixed by not inlining enough (since I added
--param max-inline-functions-called-once). Without LTO we still have
quite
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103195
--- Comment #6 from hubicka at kam dot mff.cuni.cz ---
> So nothing to see? I guess our unit growth limit doesn't trigger because it's
> a small (benchmark) unit?
Yep, unit growths do not apply for very small units. ipa-cp heuristics
still IMO