periodic testers I am silencing the ICE for now (at expense of missed
optimization)
Honza
gcc/ChangeLog:
2021-11-04 Jan Hubicka
PR ipa/103058
* gimple.c (gimple_call_static_chain_flags): Handle case when
nested function does not bind locally.
diff --git a/gcc/gimple.c b
Hi,
this patch implements the (long promised) intraprocedural dataflow for
propagating eaf flags, so we can handle parameters that participate
in loops in SSA graphs. A typical example is an accessor that walks a linked
list.
I implemented dataflow using the standard iteration over BBs in
> Hello.
>
> The renaming patch fixes a -Wodr warning seen and reported in the PR.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> PR bootstrap/102828
>
> gcc/ChangeLog:
>
> * ipa-fnsummary.c (edge_predicate_pool): Rename predicate class to
>
Hi,
this patch fixes a (quite nasty) thinko in how I propagate EAF flags from callee
to caller. In this case some flags need to be changed. In particular
- EAF_NOT_RETURNED in callee does not really mean EAF_NOT_RETURNED in caller
since we speak of different return values
- if callee
> It broke GCC bootstrap:
>
> https://gcc.gnu.org/pipermail/gcc-regression/2021-November/075676.html
>
> In file included from ../../src-master/gcc/coretypes.h:474,
> from ../../src-master/gcc/expmed.c:26:
> In function ‘poly_uint16 mode_to_bytes(machine_mode)’,
> inlined
Hi,
this patch is a small refactoring of ipa-modref to make it a bit more
C++y by moving logic analyzing ssa name flags to a class
and I also moved the anonymous namespace markers so we do not
export unnecessary stuff. There are no functional changes.
Bootstrapped/regtested x86_64-linux, will
Hi,
this patch adds EAF_NOT_RETURNED_DIRECTLY which works similarly to
EAF_NODIRECTESCAPE. Values pointed to by a given argument may be returned but
not the argument itself. This helps PTA quite noticeably because we mostly
care about tracking points to which given memory location can escape.
I
Hi,
this patch teaches ipa-modref about the static chain, which is, like
retslot, a hidden argument. The patch is pretty much symmetric to what
was done for retslot handling and I verified it does the intended job
for Ada LTO bootstrap.
Bootstrapped/regtested x86_64-linux, OK?
Honza
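For readers unfamiliar with the static chain: a nested function (as in Ada
or GNU C) reaches its parent's locals through a pointer to the parent's
frame, passed as an extra argument that the source never spells out. The
sketch below does that lowering by hand. The names outer_frame, bump and
outer are made up for illustration and this is not GCC's actual lowering; it
only shows roughly why the chain can be described with the same kind of
flags as an ordinary pointer parameter.

```cpp
#include <cassert>

// Hypothetical hand-lowered nested function; all names are illustrative.
// Locals of the enclosing function that the nested function touches live
// in a frame struct.
struct outer_frame
{
  int counter;
};

// The nested function after lowering: CHAIN plays the role of the static
// chain, an argument the programmer never wrote.
static void
bump (outer_frame *chain, int by)
{
  assert (chain != nullptr);
  chain->counter += by;
}

// The enclosing function passes the address of its own frame on each call.
int
outer (int n)
{
  outer_frame frame = { 0 };
  for (int i = 0; i < n; i++)
    bump (&frame, i);
  return frame.counter;
}
```

Here bump only reads and writes through chain and never stores or returns
the pointer itself, which is the kind of fact per-argument flags can record.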
Hi,
this patch extends modref and tree-ssa-structalias to handle retslot flags.
Since retslot is essentially a hidden argument that is known to be write-only
we can do pretty much the same stuff as we do for regular parameters.
I plan to add static chain handling in a similar way.
We do not handle IPA
> Hi,
>
> On 2021/9/28 20:09, Richard Biener wrote:
> > On Fri, Sep 24, 2021 at 8:29 AM Xionghu Luo wrote:
> >>
> >> Update the patch to v3, not sure whether you prefer the paste style
> >> and continue to link the previous thread as Segher dislikes this...
> >>
> >>
> >> [PATCH v3] Don't move
> On Wed, 27 Oct 2021, Jan Hubicka wrote:
>
> > >
> > > gcc/ChangeLog:
> > >
> > > * tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
> > > (do_split_loop_on_cond): Likewise.
> > > ---
> > > gcc/tree-ssa-
> As discussed yesterday, for loop of form
>
> for (...)
> if (cond)
> cond = something();
> else
> something2
>
> Split as
>
Say "if (cond)" has probability p, then individual statements scale as
follows:
loop1:
p  for (...)
p    if (true)
1      cond = something();
1
>
> gcc/ChangeLog:
>
> * tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
> (do_split_loop_on_cond): Likewise.
> ---
> gcc/tree-ssa-loop-split.c | 25 -
> 1 file changed, 16 insertions(+), 9 deletions(-)
>
> diff --git
> >
> That said, likely the profile update cannot be done uniformly
> for all blocks of a loop?
For the loop:
for (i = 0; i < n; i = inc (i))
{
if (ga)
ga = do_something ();
}
to:
for (i = 0; i < x; i = inc (i))
{
if (true)
ga = do_something ();
if
> On Tue, 26 Oct 2021, Xionghu Luo wrote:
>
> >
> >
> > On 2021/10/21 18:55, Richard Biener wrote:
> > > On Thu, 21 Oct 2021, Xionghu Luo wrote:
> > >
> > >>
> > >>
> > >> On 2021/10/15 13:51, Xionghu Luo via Gcc-patches wrote:
> > >>>
> > >>>
> > >>> On 2021/9/23 20:17, Richard Biener wrote:
Hi,
this patch fixes two issues I noticed while proofreading the code.
First is that I have added conditional around setting of nonlocal and
escaped flags (since they may be set from solver) while keeping the
variable in assignment that is confusing.
Second is that we still do not set pt in the
>
> Fortran has for a long time 'character(len=5), allocatable" or
> "character(len=*)". In the first case, the "5" can be ignored as both
> caller and callee know the length. In the second case, the length is
> determined by the argument, but it cannot be changed.
>
> Since a not-that-short
Hi,
>
> FAIL: gfortran.dg/deferred_type_param_6.f90 -O1 execution test
> FAIL: gfortran.dg/deferred_type_param_6.f90 -Os execution test
Sorry for the breakage. This time it seems like a bug in the Fortran FE
which was previously latent:
__attribute__((fn spec (". . R ")))
void subfunc
Hi,
while updating compute_points_to_sets I missed that the code not only
sets the nonlocal/escaped flags but also initializes pt. With my
previous change if uses_global_memory is false pt is not updated
correctly which may lead to wrong code.
This is fixed by the following patch I committed to
Hi,
I managed to commit an unrelated change that was sitting in my tree that
breaks bootstrap. I have reverted it now and checked bootstrap gets
past the failing point (still waiting for full bootstrap to
finish at x86_64-linux).
Honza
gcc/ChangeLog:
* ipa-modref-tree.h (struct
> For non-local nodes which can have unknown callers, the algorithm just
> takes half of the counts - we may decide that taking just a third or
> some other portion is more reasonable, but I do not think we can
> attempt anything more clever.
Can't you just sum the calling edges and subtract it
Hi,
this patch commonizes the three paths to produce constraints for function call
and makes it more flexible, so we can implement new features more easily. Main
idea is to not special case pure and const since we can now describe all of
pure/const via their EAF flags (implicit_const_eaf_flags
> Hi,
> >
> > If you boost every self fed value by factor of 6, I wonder how quickly
> > we run into exponential explosion of the cost (since the frequency
> > should be close to 1 and 6^9=10077696
>
> The factor of six is applied once for an entire SCC, so we'd reach this
> huge number only
Hi,
this patch fixes an omitted case in contains_p which later triggers a sanity
check since merging is not symmetric.
Bootstrapped/regtested x86_64-linux, committed.
Honza
gcc/ChangeLog:
2021-10-07 Jan Hubicka
PR ipa/102581
* ipa-modref-tree.h (modref_access_node::contains_p
> Recursive call graph edges, even when they are hot and important for
> the compiled program, can never have frequency bigger than one, even
> when the actual time savings in the next recursion call are not
> realized just once but depend on the depth of recursion. The current
> IPA-CP effect
> 2021-08-23 Martin Jambor
>
> * params.opt (param_ipa_cp_profile_count_base): New parameter.
> * ipa-cp.c (max_count): Replace with base_count, replace all
> occurrences too, unless otherwise stated.
> (ipcp_cloning_candidate_p): identify mostly-directly called
>
Hi,
this should be the final bit of the fancy access merging. We limit the number of
accesses to 16 and on overflow we currently just throw away the whole
table. This patch instead looks for the closest pair of entries in the table and
merges them (losing some precision). This is not very often during
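A minimal sketch of the idea, assuming each access is just a half-open byte
range and that "closest" means the smallest gap between two ranges. The
type access_range and the helpers hull, gap and insert_with_limit are
hypothetical; the real modref access nodes and the merging heuristics in
ipa-modref-tree.h carry more information than this.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified access record: a half-open byte range [start, end).
struct access_range
{
  long start;
  long end;
};

// Smallest range covering both A and B.
static access_range
hull (const access_range &a, const access_range &b)
{
  return { a.start < b.start ? a.start : b.start,
           a.end > b.end ? a.end : b.end };
}

// Number of bytes separating A and B; 0 if they touch or overlap.
static long
gap (const access_range &a, const access_range &b)
{
  long lo = a.end < b.end ? a.end : b.end;
  long hi = a.start > b.start ? a.start : b.start;
  return hi > lo ? hi - lo : 0;
}

// Add R to TABLE.  If that exceeds MAX_ENTRIES, replace the two closest
// entries by their hull instead of discarding the whole table.
void
insert_with_limit (std::vector<access_range> &table, access_range r,
                   std::size_t max_entries)
{
  assert (max_entries >= 1);
  table.push_back (r);
  if (table.size () <= max_entries)
    return;

  std::size_t best_i = 0, best_j = 1;
  for (std::size_t i = 0; i < table.size (); i++)
    for (std::size_t j = i + 1; j < table.size (); j++)
      if (gap (table[i], table[j]) < gap (table[best_i], table[best_j]))
        {
          best_i = i;
          best_j = j;
        }

  table[best_i] = hull (table[best_i], table[best_j]);
  table.erase (table.begin () + static_cast<std::ptrdiff_t> (best_j));
}
```

The merged entry covers bytes that neither original access touched, which
is the precision loss mentioned above; the table stays conservative because
the hull contains both original ranges.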
> On Aug 22, 2021, Jan Hubicka wrote:
>
> > OK, thanks for looking into this issue!
>
> Thanks, I've finally installed it in the trunk.
>
> > It seems that analyze_stmt indeed does not skip debug stmts. It is very
> > odd we got so far without hitting build d
>
> commit f075b8c5adcf9cb6336563c472c8d624c54184db
> Author: Jan Hubicka
> Date: Thu Aug 26 15:33:56 2021 +0200
>
> Fix off-by-one error in try_merge_with
>
> gcc/ChangeLog:
>
> * ipa-modref-tree.h (modref_ref_node::verify): New
Hi,
this patch makes insertion to modref access tree smarter when --param
modref-max-bases and modref-max-refs are hit. Instead of giving up
we either give up on base alias set (make it equal to ref) or turn the
alias set to 0. This lets us track useful info on quite large
functions, such
> On 8/26/21 10:33, Christophe Lyon via Gcc-patches wrote:
> > Can you have a look?
>
> Please create a PR for it.
I have a fix, so perhaps there is no need for a PR :)
I am testing the following - the problem was that try_merge_with missed
some merges because of how unordered_remove handles the
>
> This patch is causing ICEs on arm:
> FAIL: g++.dg/torture/pr89303.C -O1 (internal compiler error)
> FAIL: g++.dg/torture/pr89303.C -O1 (test for excess errors)
It happens on 32bit arches only it seems. For some reason we end up
merging
access: Parm 0 param offset:12 offset:0
Hi,
this patch adds logic needed to merge neighbouring accesses in ipa-modref
summaries. This helps analyzing array initializers and similar code. It is
a bit of work, since it breaks the fact that modref tree makes a good lattice for
dataflow: the access ranges can be extended indefinitely. For
> >
> > I noticed loop-doloop.c use _int version and likely_max, maybe you want
> > that here?
> >
> > est_niter = get_estimated_loop_iterations_int (loop);
> > if (est_niter == -1)
> > est_niter = get_likely_max_loop_iterations_int (loop)
>
> I think that are two different things -
(release checking), however code that does a lot of array/fields
initialization may hit the limit easily.
gcc/ChangeLog:
2021-08-23 Jan Hubicka
* ipa-modref-tree.h (modref_access_node::range_info_useful_p):
Improve range compare.
(modref_access_node::contains): New member
>
> Any strong opinions?
>
> Richard.
>
> 2021-08-23 Richard Biener
>
> * doc/invoke.texi (vect-inner-loop-cost-factor): Remove
> documentation.
> * params.opt (--param vect-inner-loop-cost-factor): Remove.
> * tree-vect-loop.c (_loop_vec_info::_loop_vec_info):
>
> Looks like the existing check using has_gimple_body_p isn't enough
> at LTRANS time but I need to check in_other_partition as well.
>
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>
> OK?
>
> Thanks,
> Richard.
>
> 2021-08-23 Richard Biener
>
> PR ipa/97565
>
Hi,
>
> Why does it "punish" -fno-ipa-pta? It merely "punishes" modref of
> no longer claiming that we do not alter the instruction stream pointed
> to by a->foo, sth that shouldn't be very common.
For example
struct a {
void (*foo)();
void *bar;
};
void fn(struct a *a)
{
a->foo();
}
With
> Hello.
>
> Thanks for working on that. But have really run the test-cases as the newly
> added test still aborts as it used to before you installed this patch?
Eh, sorry, I had earlier version of patch that did
if (gimple_call_fn (use_stmt) == name)
lattice[index].merge
Hi,
while looking at Martin's patch I also noticed that return slots are
handled a bit overactively. We only care if the SSA name we analyze is
the base of the return slot.
Bootstrapped/regtested x86_64-linux, committed.
Honza
gcc/ChangeLog:
* ipa-modref.c (analyze_ssa_name_flags): Improve
ortant (or
can be implemented by special casing in unified code).
Honza
gcc/ChangeLog:
2021-08-22 Jan Hubicka
Martin Liska
* ipa-modref.c (analyze_ssa_name_flags): Indirect call implies
~EAF_NOCLOBBER.
gcc/testsuite/ChangeLog:
2021-08-22 Jan Hubicka
>
> for gcc/ChangeLog
>
> * ipa-modref.c (analyze_function): Skip debug stmts.
> * tree-inline.c (estimate_num_insn): Consider builtins even
> without a cgraph_node.
OK, thanks for looking into this issue!
(for mainline and release branches a bit later)
> ---
> gcc/ipa-modref.c
> Good hint. I added hash based on object file name (I don't want to handle
> proper string escaping) and -frandom-seed.
>
> What do you think about the patch?
Sorry for taking so long - I remember I was sending reply earlier but it
seems I only wrote it and never sent.
> Thanks,
> Martin
> From
> Hello.
>
> As showed in the PR, returning (EAF_NOCLOBBER | EAF_NOESCAPE) for an argument
> that is a function pointer is problematic. Doing such a function call is a
> clobber.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?
> Thanks,
>
> Hi.
>
> We already have a IPA modref debug counter, but it's only used in
> tree-ssa-alias,
> which is only a part of what IPA modref does. I used the dbg counter in
> isolation
> of PR101949.
>
> Ready for master?
OK,
thanks!
Honza
>
> gcc/ChangeLog:
>
> * dbgcnt.def
> Hi.
>
> This is a first part fixing the PR. It makes sense making "naked" functions
> "noipa".
> What's missing is IPA MOD pass support where the pass should not optimize fns
> with "noipa" attributes.
>
> @Honza: Can you please implement that?
Hmm, I had patch for that somewhere, will do
> Hello.
>
> Right now, target_clone pass complains when a target_clone function is an
> alias.
> That happens when localalias is created by callgraph. I think we should not
> create
> such aliases as we won't benefit much from it in case of target_clones.
>
> Patch can bootstrap on
02 queries
pt_solutions_intersect: 1391917 disambiguations, 14665265 queries
I think it is mostly due to better handling of EAF_NODIRECTESCAPE.
Honza
gcc/ChangeLog:
2021-08-12 Jan Hubicka
* ipa-modref.c (dump_eaf_flags): Dump EAF_NOREAD.
(implicit_const_eaf_flags, implicit_pure
x86_64-linux. Committed.
gcc/ChangeLog:
2021-08-12 Jan Hubicka
* ipa-split.c (consider_split): Fix condition testing void functions.
diff --git a/gcc/ipa-split.c b/gcc/ipa-split.c
index 5e918ee3fbf..c68577d04a9 100644
--- a/gcc/ipa-split.c
+++ b/gcc/ipa-split.c
@@ -546,8 +546,9
atch I have bootstrapped/regtested x86_64-linux and I
am collecting stats for (it should have minimal effect on the overall
effectiveness of modref).
Honza
gcc/ChangeLog:
2021-08-11 Jan Hubicka
Alexandre Oliva
* ipa-modref.c (modref_lattice::dump): Fix escape_point's min_fla
Jan Hubicka
* ipa-modref.c (struct escape_entry): Use eaf_flags_t.
(dump_eaf_flags): Dump EAF_NOT_RETURNED
(eaf_flags_useful_p): Use eaf_flags_t; handle const functions
and EAF_NOT_RETURNED.
(modref_summary::useful_p): Likewise.
(modref_summary_lto
> Hi,
>
> gcc/ChangeLog:
>
> 2021-06-29 Martin Jambor
>
> * cgraph.h (ipa_replace_map): New field force_load_ref.
> * ipa-prop.h (ipa_param_descriptor): Reduce precision of move_cost,
> added new flag load_dereferenced, adjusted comments.
>
Hi,
> 2021-06-16 Martin Jambor
>
> PR ipa/101066
> * ipa-sra.c (class isra_call_summary): New member
> m_before_any_store, initialize it in the constructor.
> (isra_call_summary::dump): Dump the new field.
> (ipa_sra_call_summaries::duplicate): Copy it.
>
>
> I was asked by Richi to split my fix for PR 93385 for easier review
> into IPA-SRA materialization refactoring and the actual DCE addition.
> Fortunately it was mostly natural except for a temporary weird
> condition in ipa_param_body_adjustments::modify_call_stmt.
> Additionally. In
> Hello.
>
> Similarly to e.g. sanitizer attributes, we should prevent inlining when one
> function
> is marked as not instrumented. We should do that with -fprofile-generate only.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?
> Thanks,
>
>
>
> On 5/6/2021 8:36 PM, guojiufu via Gcc-patches wrote:
> > Gentle ping.
> >
> > Original message:
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555871.html
> I think you need a more aggressive ping :-)
>
> OK for the trunk. Sorry for the long delay. I kept hoping someone
> On Thu, May 20, 2021 at 3:16 PM Richard Biener
> wrote:
> >
> > On Thu, May 20, 2021 at 3:06 PM Martin Liška wrote:
> > >
> > > On 5/20/21 2:54 PM, Richard Biener wrote:
> > > > So why did you go from applying this per-file to multiple files?
> > >
> > > When I did per-file for
> Hi,
>
> the node and edge summaries defined in ipa-prop.h are probably the
> oldest in GCC and so it happened that they are the only ones using
> macros to look them up and create them. With Honza and Martin we
> agreed it is ugly and the macros should be removed and the ipa-prop
> summaries
> note that if you wrap foo () into another noinline
> wrap_foo () { foo (); return 1; } function then we need to make
> sure to not DCE this call either even though it only throws
> externally. Now the question is whether this testcase is valid
> (it should not abort). The documentation of
Hi,
I have noticed that some entries were incorrectly added to the C family
while they are general improvements and I also think the option renaming
should go to caveats since renaming is hardly an improvement per se.
Since the change is rather obvious I plan to commit it after lunch so we
do not
> > That needs to be combined with the generated auto-host.h header file.
> > From which locations do you want to build the hash? Any other $objdir
> > files except auto-host.h?
>
> In fact for PCH just summing the gengtype generated files would be
> good enough I guess ...
I think one can, for
Hi,
this patch adds a changes entry for IPA/LTO and FDO.
Honza
diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html
index 6f58cfe8..bba16ead 100644
--- a/htdocs/gcc-11/changes.html
+++ b/htdocs/gcc-11/changes.html
@@ -170,6 +170,37 @@ a work-in-progress.
use -g together with
prefer it over your fix - I
think both are fine in general for release branches.
lto-bootstrapped/regtested x86_64-linux.
Honza
2021-04-15 Jan Hubicka
PR lto/98599
* lto.c (lto_wpa_write_files): Fix handling of clones.
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index ceb61bb300b
Hi,
stepping through the streaming process, there turns out to be a funny
difference between gimple_has_body and node->has_gimple_body_p.
While the first tests whether gimple body really exists in memory (by
looking for DECL_STRUCT_FUNCTION) the second tests if gimple body can be
made available via
Hello
> > Thanks.
> >
> > I think my earlier analysis was wrong.
Sorry for the late reply. I was looking into it again yesterday but was a bit
confused about what is going on here.
> >
> > With the caveat that I'm not as familiar with the IPA code as other
> > parts of the compiler, what I think is
declare_variant_alt has references.
It would be more systematic to make this also a definition. I plan
to clean this up next stage1 and also add a verifier that
non-definitions do not have references.
Honza
gcc/ChangeLog:
2021-04-10 Jan Hubicka
PR lto/99857
* tree.c (free_lang_data_in_decl
> > Do you know what of the three changes (preferring reps/stosb,
> > CLEAR_RATIO and algorithm choice changes) cause the two speedups
> > on eebmc?
>
> A extracted testcase from nnet_test in https://godbolt.org/z/c8KdsohTP
>
> This loop is transformed to builtin_memcpy and builtin_memset with
> > /* skylake_cost should produce code tuned for Skylake familly of CPUs. */
> > static stringop_algs skylake_memcpy[2] = {
> > - {libcall, {{1024, rep_prefix_4_byte, true}, {-1, libcall, false}}},
> > - {libcall, {{16, loop, false}, {512, unrolled_loop, false},
> > - {-1,
> This breaks bootstrap on riscv64:
>
> In function ‘alloca_type_and_limit alloca_call_type(range_query&, gimple*,
> bool)’,
> inlined from ‘virtual unsigned int pass_walloca::execute(function*)’ at
> ../../gcc/gimple-ssa-warn-alloca.c:295:25:
> ../../gcc/gimple-ssa-warn-alloca.c:206:13:
> On Thu, Apr 1, 2021 at 6:54 PM H.J. Lu wrote:
> >
> > Since uiret should be used only for user interrupt handler return, don't
> > generate uiret in interrupt handler return with -mcmodel=kernel even if
> > UINTR is enabled.
>
> NAK, -mcmodel should not affect ISAs, in the same way it doesn't
> This patch is causing ICEs on arm and aarch64, and others according to
> gcc-testresults:
> on aarch64:
> g++.dg/ipa/devirt-7.C -std=gnu++14 (internal compiler error)
> g++.dg/ipa/devirt-7.C -std=gnu++17 (internal compiler error)
> g++.dg/ipa/devirt-7.C -std=gnu++2a (internal
> On Linux/x86_64,
>
> d7145b4bb6c8729a1e782373cb6256c06ed60465 is the first bad commit
> commit d7145b4bb6c8729a1e782373cb6256c06ed60465
> Author: Jan Hubicka
> Date: Wed Mar 31 11:35:29 2021 +0200
>
> Small refactoring of cgraph_node::release_body
>
>
> > Reading through the optimization manual it seems that movsb is fast for
> > small block no matter if the size is hard wired. In that case you
> > probably want to check whether max_size or expected_size is known to be
> > small rather than max_size == min_size and both being small.
> >
> > But
> > >
> > > Patch is OK now. I was wondering about using avx256 for moves of known
> >
> > Done. X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB is in now. Can
> > you take a look at the patch for Skylake:
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567096.html
>
> I was wondering,
> >
> > Patch is OK now. I was wondering about using avx256 for moves of known
>
> Done. X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB is in now. Can
> you take a look at the patch for Skylake:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567096.html
I was wondering, if the CPU prefers
> On 3/31/21 1:08 PM, Jan Hubicka wrote:
> > >
> > > 2021-03-15 Jan Hubicka
> > >
> > > * config/i386/i386-options.c (processor_cost_table): Add znver3_cost.
> > > * config/i386/x86-tune-costs.h (znver3_cost): New global variable; copy
>
> 2021-03-15 Jan Hubicka
>
> * config/i386/i386-options.c (processor_cost_table): Add znver3_cost.
> * config/i386/x86-tune-costs.h (znver3_cost): New global variable; copy
> of znver2_cost.
I have backported the pat
Hi,
in the discussion on PR 99447 there was some confusion about release_body
being used in context where call edges/references survive. This is not
a valid use because it would leave stale pointers to ggc_freed memory
location. By auditing code I did not find any however this patch moves
the
> > It looks like X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB is quite obviously
> > beneficial and independent of the rest of changes. I think we will need
> > to discuss bit more the move ratio and the code size/uop cache pollution
> > issues - one option would be to use increased limits for -O3 only.
> [AMD Public Use]
>
> Hi Honza,
>
> > -----Original Message-----
> > From: Jan Hubicka
> > Sent: Wednesday, March 31, 2021 1:15 AM
> > To: Kumar, Venkataramanan
> > Cc: Uros Bizjak; gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH] [X86_64]: Ena
.
On top of that I plan to backport the tuning patches with the exception of
gather, which seems a bit controversial and can wait for gcc11.
Honza
2021-03-30 Jan Hubicka
Backport
Venkataramanan Kumar
Sharavan Kumar
* common/config/i386/cpuinfo.h (get_amd_cpu) recognize
> > >gcc/testsuite/ChangeLog:
> > >
> > >2021-03-29 Jan Hubicka
> > >
> > > * gcc.c-torture/compile/pr99751.c: New test.
> >
> > Why compile torture?
>
> Ah, sorry, it was meant to be execute. I
> >gcc/testsuite/ChangeLog:
> >
> >2021-03-29 Jan Hubicka
> >
> > * gcc.c-torture/compile/pr99751.c: New test.
>
> Why compile torture?
Ah, sorry, it was meant to be execute. I will move the test.
Honza
disambiguations, 13770732 queries
gcc/ChangeLog:
2021-03-29 Jan Hubicka
* ipa-modref.c (merge_call_lhs_flags): Correct handling of deref.
(analyze_ssa_name_flags): Fix typo in comment.
gcc/testsuite/ChangeLog:
2021-03-29 Jan Hubicka
* gcc.c-torture/compile
> It started with g:3e2ae3ee285a57455d5a23bd352a68c289130186 where
> new entry was added to processor_alias_table after generic node:
>
> + {"amdfam19h", PROCESSOR_GENERIC, CPU_GENERIC, 0,
> +M_CPU_TYPE (AMDFAM19H), P_NONE},
>
> and then the following is violated:
>
> /* NB:
>
> gcc/
>
> * config/i386/i386-expand.c (expand_set_or_cpymem_via_rep):
> For TARGET_PREFER_KNOWN_REP_MOVSB_STOSB, don't convert QImode
> to SImode.
> (decide_alg): For TARGET_PREFER_KNOWN_REP_MOVSB_STOSB, use
> "rep movsb/stosb" only for known sizes.
> *
> > Hi,
> > I plan to commit some retuning of znver3 codegen that is based on real
> > hardware benchmarks. It turns out that there are not too many changes
> > necessary since Zen3 is quite a smooth upgrade to Zen2. In summary:
> >
> > - some instructions (like idiv) have shorter latencies.
such examples.
However in general it is better to have actual latencies than random numbers.
Bootstrapped/regtested x86_64-linux, committed.
Honza
gcc/ChangeLog:
2021-03-18 Jan Hubicka
* config/i386/x86-tune-costs.h (struct processor_costs): Fix costs of
integer divides1.
diff
Hi,
this patch enables gather on zen3 hardware. For TSVC it gets used by 6
benchmarks with the following runtime improvements:
s4114: 1.424 -> 1.209 (84.9017%)
s4115: 2.021 -> 1.065 (52.6967%)
s4116: 1.549 -> 0.854 (55.1323%)
s4117: 1.386 -> 1.193 (86.075%)
vag: 2.741 -> 1.940 (70.7771%)
and
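As a reading aid for the table above: the percentage in parentheses appears
to be the new runtime expressed as a share of the old one, so lower is
better. A small sketch of that arithmetic follows; the function name
relative_runtime_percent is made up for illustration.

```cpp
#include <cassert>

// New runtime as a percentage of the old runtime (hypothetical helper).
// Values below 100 mean the benchmark got faster.
double
relative_runtime_percent (double before, double after)
{
  assert (before > 0.0);
  return 100.0 * after / before;
}
```

For s4115, relative_runtime_percent (2.021, 1.065) comes out at roughly
52.7, matching the figure quoted in the list.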
.
Honza
2021-03-15 Jan Hubicka
* config/i386/i386-options.c (processor_cost_table): Add znver3_cost.
* config/i386/x86-tune-costs.h (znver3_cost): New global variable; copy
of znver2_cost.
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
> On Mon, Mar 8, 2021 at 6:19 PM Eric Botcazou wrote:
> >
> > Hi,
> >
> > this is a regression present on the mainline and 10 branch for architectures
> > that pass all structure types by reference, e.g. 32-bit PowerPC or SPARC.
> >
> > Jakub posted a detailed analysis in the audit trail and this
> .../gcc.dg/tree-prof/indir-call-prof-malloc.c | 2 +-
> gcc/testsuite/gcc.dg/tree-prof/pr97461.c | 2 +-
> libgcc/libgcov-driver.c | 56 ---
> 3 files changed, 50 insertions(+), 10 deletions(-)
>
> diff --git
>
> libgcc/ChangeLog:
>
> PR gcov-profile/99105
> * libgcov-driver.c (write_top_counters): Rename to ...
> (write_topn_counters): ... this.
> (write_one_data): Pre-allocate buffer for number of items
> in the corresponding linked lists.
> * libgcov-merge.c
> Hello.
>
> AS mentioned here, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97461#c25, I
> like
> what Richard suggested. So instead of usage of malloc, we should use mmap
> memory
> chunks that serve as a memory pool for struct gcov_kvp.
>
> Malloc is used as a fallback when mmap is not
reference_vars_to_consider before re-allocating it.
> (ipa_reference_write_optimization_summary): Use vec_free
> and NULL reference_vars_to_consider.
Hi,
this is the version I committed after discussion on the PR
(it makes it more explicit that reference_vars_to_consider are used
durin
> From: Sergei Trofimovich
>
> Before the change RVO gimple statements were treated as local
> stores by modres analysis. But in practice RVO escapes target.
>
> 2021-01-30 Sergei Trofimovich
>
> gcc/ChangeLog:
>
> PR tree-optimization/98499
> * ipa-modref.c: treat RVO
> On Fri, 29 Jan 2021, Jan Hubicka wrote:
>
> > > This removes adding very expensive DF problems which we do not
> > > use and which somehow cause 5GB of memory to leak.
Reading through the logs, isn't the leak just caused by things going to
memory pool that we
> This removes adding very expensive DF problems which we do not
> use and which somehow cause 5GB of memory to leak.
Impressive :)
>
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>
> 2021-01-29 Richard Biener
>
> PR rtl-optimization/98863
> *
> On Tue, Jan 26, 2021 at 10:55:35AM +0100, Jan Hubicka wrote:
> > > On Tue, Jan 26, 2021 at 10:03:16AM +0100, Richard Biener wrote:
> > > > > In 4.8 and earlier we used to fold the following to 0 during GENERIC
> > > > > folding,
> > > > &
> On Tue, Jan 26, 2021 at 10:03:16AM +0100, Richard Biener wrote:
> > > In 4.8 and earlier we used to fold the following to 0 during GENERIC
> > > folding,
> > > but we don't do that anymore because ctor_for_folding etc. has been
> > > turned into a
> > > GIMPLE centric API, but as the testcase