[Bug tree-optimization/90303] [9/10 Regression] ICE in hash_odr_name with fastcall attribute starting with r267359

2019-05-02 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90303

--- Comment #5 from Jan Hubicka  ---
I see, i suppose we may lose some optimizations in early opts because of
this but your patch is safe and I don't think the missed optimizations
are very important (if they are we should avoid having structural
equality on such types)

Honza

[Bug tree-optimization/90303] [9/10 Regression] ICE in hash_odr_name with fastcall attribute starting with r267359

2019-05-02 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90303

--- Comment #3 from Jan Hubicka  ---
> This comes from build_type_attribute_qual_variant:
> 1159  if (ntype != dtype)
> 1160/* This variant was already in the hash table, don't mess with
> 1161   TYPE_CANONICAL.  */;
> 1162  else if (TYPE_STRUCTURAL_EQUALITY_P (ttype)
> 1163   || !comp_type_attributes (ntype, ttype))
> 1164/* If the target-dependent attributes make NTYPE different 
> from
> 1165   its canonical type, we will need to use structural equality
> 1166   checks for this type.
> 1167
> 1168   We shouldn't get here for stripping attributes from a type;
> 1169   the no-attribute type might not need structural 
> comparison. 
> But
> 1170   we can if was discarded from type_hash_table.  */
> 1171SET_TYPE_STRUCTURAL_EQUALITY (ntype);

Hmm, bit odd code.
> 
> --- gcc/ipa-devirt.c.jj 2019-04-15 19:45:28.796340266 +0200
> +++ gcc/ipa-devirt.c2019-05-02 10:46:03.077896176 +0200
> @@ -2020,7 +2020,7 @@ obj_type_ref_class (const_tree ref)
>ref = TREE_VALUE (TYPE_ARG_TYPES (ref));
>gcc_checking_assert (TREE_CODE (ref) == POINTER_TYPE);
>tree ret = TREE_TYPE (ref);
> -  if (!in_lto_p)
> +  if (!in_lto_p && !TYPE_STRUCTURAL_EQUALITY_P (ret))

This code is here to go from incomplete type to complete type after
free_lang_data was run.  By punting here we will lose optimization.
Perhaps it is safe to go to main variant first and then to type
canonical since the main variant will not have type_canonical cleared by
hunk above?

Honza
>  ret = TYPE_CANONICAL (ret);
>else
>  ret = get_odr_type (ret)->type;
> @@ -2042,7 +2042,7 @@ get_odr_type (tree type, bool insert)
>int base_id = -1;
> 
>type = TYPE_MAIN_VARIANT (type);
> -  if (!in_lto_p)
> +  if (!in_lto_p && !TYPE_STRUCTURAL_EQUALITY_P (type))
>  type = TYPE_CANONICAL (type);
> 
>gcc_checking_assert (can_be_name_hashed_p (type)
> 
> -- 
> You are receiving this mail because:
> You are on the CC list for the bug.

[Bug debug/90273] [9/10 Regression] GCC runs out of memory building Firefox

2019-04-30 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90273

--- Comment #28 from Jan Hubicka  ---
> The recent regression is we no longer throw them away plentiful during CFG
> cleanup and now they pile up during inlining. 
> 
> I agree full DCE with liveness will be expensive for usually little gain. Not
> sure if vector resets will improve things much.

One thing to keep in mind that after early opts and in late opts after
the initial cleanups post ipa-inline-transforms we likely have a lot of
new debug statements brought in by inliner.  It would make sense to do
something more expensive twice in queue to get rid of them. Especially
in early opts we do not want to make too many useless debug statements
to hit the LTO stream or get duplicated by subsequent inlining.

Honza

[Bug debug/90273] [9/10 Regression] GCC runs out of memory building Firefox

2019-04-29 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90273

--- Comment #10 from Jan Hubicka  ---
Hi,
the file was too large for bugzilla
so I uploaded it to
http://www.ucw.cz/~hubicka/Unified_cpp_dom_events0-8.ii.xz
and posted link in comment #2 :)

The agressive variant helps (I did not try to other one) but we still
get huge BBs containing:

  # DEBUG thisD.1400042 => NULL
  # DEBUG thisD.1400332 => NULL
  # DEBUG aFirstD.1400331 => NULL
  # DEBUG thisD.1400330 => NULL
  # DEBUG aFirstD.1400329 => NULL
  # DEBUG aArgs#0D.1400328 => NULL
  # DEBUG thisD.1400327 => NULL
  # DEBUG aFirstD.1400326 => NULL
  # DEBUG aArgs#0D.1400325 => NULL
  # DEBUG aArgs#1D.1400324 => NULL
  # DEBUG thisD.1400323 => NULL
  # DEBUG aFirstD.1400322 => NULL
  # DEBUG aArgs#0D.1400321 => NULL
  # DEBUG aArgs#1D.1400320 => NULL
  # DEBUG aArgs#2D.1400319 => NULL
  # DEBUG thisD.1400318 => NULL
  # DEBUG aFirstD.1400317 => NULL
  # DEBUG aArgs#0D.1400316 => NULL
  # DEBUG aArgs#1D.1400315 => NULL
  # DEBUG aArgs#2D.1400314 => NULL
  # DEBUG aArgs#3D.1400313 => NULL
  # DEBUG thisD.1400312 => NULL
  # DEBUG aFirstD.1400311 => NULL
  # DEBUG aArgs#0D.1400310 => NULL
  # DEBUG aArgs#1D.1400309 => NULL
  # DEBUG aArgs#2D.1400308 => NULL
  # DEBUG aArgs#3D.1400307 => NULL
  # DEBUG aArgs#4D.1400306 => NULL
  # DEBUG thisD.1400305 => NULL
  # DEBUG aFirstD.1400304 => NULL
  # DEBUG aArgs#0D.1400303 => NULL
  # DEBUG aArgs#1D.1400302 => NULL
  # DEBUG aArgs#2D.1400301 => NULL
  # DEBUG aArgs#3D.1400300 => NULL
  # DEBUG aArgs#4D.1400299 => NULL
  # DEBUG aArgs#5D.1400298 => NULL
  # DEBUG thisD.1400297 => NULL
  # DEBUG aFirstD.1400296 => NULL
  # DEBUG aArgs#0D.1400295 => NULL
  # DEBUG aArgs#1D.1400294 => NULL
  

I will be back after breakfast :)
Especialy for LTO we realy want to avoid too many of those in early
opts.

Honza

[Bug c++/87554] [8 Regression] internal compiler error: in record_reference, at cgraphbuild.c:64

2019-04-18 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87554

--- Comment #11 from Jan Hubicka  ---
> > The constructor indeed looks broken to me: it should not have naked
> > var_decl. So I am changing component to C++
> 
> I agree that the C++ front end is wrong here, but I also wonder why cgraph is
> looking at the DECL_INITIAL of a DECL_EXTERNAL variable.

Well, at least for constant variables those are useful for contant
folding.  But I suppose we may take care to get rid of decl initials
of non-readonly externals somewhere soon (we do it eventually as part
of unreachable code removal)

Honza
> 
> -- 
> You are receiving this mail because:
> You are on the CC list for the bug.

[Bug ipa/89924] [missed-optimization] Function not de-virtualized within the same TU

2019-04-04 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89924

--- Comment #4 from Jan Hubicka  ---
And to answer the question about why GCC produces more code, it is
actually speculative devirtualization of the call.  GCC determines
the most likely target and inlines it.

foo_virtual(Aint*):

# this tests whether the dynamic type uses
# Aint::operator+=

mov rax, QWORD PTR [rdi]
mov rax, QWORD PTR [rax]
cmp rax, OFFSET FLAT:Aint::operator+=(A const&)
jne .L19

# if so, this is inlined copy of operator +=

pushrbx
xor ecx, ecx
mov edx, OFFSET FLAT:typeinfo for Aint
mov esi, OFFSET FLAT:typeinfo for A
mov rbx, rdi
call__dynamic_cast
testrax, rax
je  .L20
mov eax, DWORD PTR [rax+8]
add DWORD PTR [rbx+8], eax
pop rbx
ret

# this is the fallback code in case devirtualizaiton
# failed.

.L19:
mov rsi, rdi
jmp rax

-fno-devirtualize-speculatively will lead to same code as Clang does.
Honza

[Bug lto/89692] [9 Regression] ICE in streamer_write_chain, at tree-streamer-out.c:506

2019-03-20 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89692

--- Comment #6 from Jan Hubicka  ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89692
> 
> --- Comment #5 from Jakub Jelinek  ---
> There is:
>   fld_worklist_push (TYPE_MAIN_VARIANT (t), fld);
>   /* Do not walk TYPE_NEXT_VARIANT.  We do not stream it and thus
>  do not and want not to reach unused variants this way.  */
> I believe this is the bug, we do want to walk it even if it is not streamed. 
> Walking it should be just more work during fld, true, but I don't see what 
> harm
> it would do.  Honza?
> 
> --- gcc/tree.c.jj   2019-03-19 17:10:04.248125906 +0100
> +++ gcc/tree.c  2019-03-20 11:50:19.166755015 +0100
> @@ -5875,8 +5875,7 @@ find_decls_types_r (tree *tp, int *ws, v
>if (!RECORD_OR_UNION_TYPE_P (t))
> fld_worklist_push (TYPE_MAX_VALUE_RAW (t), fld);
>fld_worklist_push (TYPE_MAIN_VARIANT (t), fld);
> -  /* Do not walk TYPE_NEXT_VARIANT.  We do not stream it and thus
> - do not and want not to reach unused variants this way.  */
> +  fld_worklist_push (TYPE_NEXT_VARIANT (t), fld);
>if (TYPE_CONTEXT (t))
> {
>   tree ctx = TYPE_CONTEXT (t);
> 
> certainly fixes the ICE.

Yes, I tried this some time ago but it ICEs with chekcing enabled
because C++ FE produces types that do not pass verify_type and puts them
into variant lists. Also Richi did not like reaching more types than
necessary. For this reason I went for a variant of patch that prunes the
lists but Richi did not like it either.  I will check if I still
reproduce the ICE.

https://patchwork.ozlabs.org/patch/967595/
I do not fully follow Richi's concern about duplicate types: the types
not visible by fld should be invisitable to middle-end and thus not used
later.

Honza

[Bug ipa/89330] IPA inliner touches released cgraph_edges

2019-02-14 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89330

--- Comment #5 from Jan Hubicka  ---
> Let me see if I can add the respective usefulness test to the code
> deciding to speculate.

I see, it is mine, sorry for blaming you :)

One alternative would be also to put the indirect part of pseculative
edge to the vector and lookup the direct one if speculation survives.
But checking prior creation should work too.

Thanks for looking into this!

[Bug gcov-profile/89307] -fprofile-generate binary may be too slow in multithreaded environment due to cache-line conflicts on counters

2019-02-14 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307

--- Comment #6 from Jan Hubicka  ---
> Ah now, it's really doing sampling. I guess it can lead to quite some profile
> inconsistencies..
Yep, it is not coolest solution. I would not worry too much about
precision loss unless you get some weird interference between the
sampling counter and actual program behaviour.  Adding conditionals
everywhere is not very good and I am not sure how well CPU will predict
such branches.

Honza

[Bug lto/87525] [7/8/9 Regression] infinite loop generated for fread() if enabling -flto and -D_FORTIFY_SOURCE=2

2019-02-11 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87525

--- Comment #17 from Jan Hubicka  ---
> GNU extern inline is an extension, so is covered by whatever we define (or
> should have defined).  We've never required that the out of line and inline
> definitions are the same or in any way similar (prototypes have to match
> obviously), though without always_inline it is a compiler's choice what it 
> will
> pick, whether it decides to inline it or not.  AFAIK __fortify_function are
> extern inline __attribute__((always_inline, gnu_inline, aritifical)), so at
> that point the compiler needs to do what it is told, if it is possible to
> inline at all, inline, period.

The problem here is situation where indirect calls are present to the
fortified wraper.  This is usual issue with always_inline not really
having the meaning "this function will always be inlined" becuase
it may be called indirectly but also directly if indirect call was
optimized to direct call too late in the optimization queue.  We never
promised anything in this direction.

The harder problem here is the situation where both fortify wrapper and
the real function iplementation ends up in one translation unit (at LTO
time) that is something not really used before though it could be done
using asm("symbol_name") trickery which would not work very reliably.
The question is how invasiave changes we want to deploy to solve this
scenario and to what degree we want to commit supporting this in future.

We may also simply drop all those functions after early inlining and not
even consider fortifying calls turned form indirect to direct calls at
LTO time. The question would be whether to do that for all gnu_inlines
(and thus reduce their ability to be inlined through indirect calls) or
be bit more fine grained.

[Bug other/49194] Trivially stupid inlining decisions

2019-02-10 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194

--- Comment #11 from Jan Hubicka  ---
Well, I am working on gradual improvements in the inlining decisions,
but since the PR is not very specific, we never will be perfect :)

[Bug ipa/88711] [9 Regression] scan-ipa-dump inline "Inlined tp_sum/

2019-02-09 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88711

--- Comment #8 from Jan Hubicka  ---
> "I committed a patch that causes a regression in the Fortran
>  testsuite.  Clearly, the problem is with Fortran not my patch.
>  I don't care about Fortran or respect those that work on the
>  Fortran FE.  Therefore, I just XFAIL the regressing test
>  case and call it Good(tm)."
> 
> Thanks.
> 
> How about reverting your patch?

The patch fixes obvious bug. That bug made inliner heuristics lucky for
this particular testcase (which was introduced by me because I also do
care about Fortran perfomrance) but also it caused problems elsewhere (I
noticed it while analyzing 10% slowdown of Firefox HTML parser).

There is no way to make inline heuristics perfect for everyone (it would
not be called heuristics then). Sadly I do not see way to improve this
particular testcase without simply increasing inline limits (would
increase code size and make others unhappy), adding some logic telling
inliner that Fortran functions should be inlined more (which was
discussed before but was not considered coolest direction ofthe attack)
or teching inliner to recognize quite complex interaction between
individual functions inlined that is hard to do.

Bumping up --param inline-insns-auto limit helps this particular
testcase.

Honza

[Bug ipa/87957] [9 Regression] ICE tree check: expected tree that contains ‘decl minimal’ structure, have ‘identifier_node’ in warn_odr, at ipa-devirt.c:1051 since r265519

2019-02-08 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87957

--- Comment #33 from Jan Hubicka  ---
Hi,
I am testing the following fix: since we already decided about mangling
we are in fact safe to remove everything that does not have assembler
name on it.

Honza

Index: tree.c
===
--- tree.c  (revision 268579)
+++ tree.c  (working copy)
@@ -5152,7 +5152,8 @@ fld_simplified_type_name (tree type)
   /* Drop TYPE_DECLs in TYPE_NAME in favor of the identifier in the
  TYPE_DECL if the type doesn't have linkage.
  this must match fld_  */
-  if (type != TYPE_MAIN_VARIANT (type) || ! type_with_linkage_p (type))
+  if (type != TYPE_MAIN_VARIANT (type)
+  || !DECL_ASSEMBLER_NAME_SET_P (TYPE_NAME (type)))
 return DECL_NAME (TYPE_NAME (type));
   return TYPE_NAME (type);
 }

[Bug ipa/87957] [9 Regression] ICE tree check: expected tree that contains ‘decl minimal’ structure, have ‘identifier_node’ in warn_odr, at ipa-devirt.c:1051 since r265519

2019-02-04 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87957

--- Comment #32 from Jan Hubicka  ---
> I guess we might end up streaming stuff we don't need.  Can't we simply
> remove the assert?  We do build the copy using the main variant type
> so this seems to be just a consistency check.
The consistency check prevents code from creating duplicated copies of
TYPE_DECL (if multiple types could reffer to one TYPE_DECL we would copy
it each time we copy a type).
We really should rewrite it to IDENTIFIER_TYPE for middle-end purposes
because TYPE_DECLs are useful only for C++ ODR types.  It just appeared
that tings works otherwise (i.e. we do not try to produce mangled name
for Ada types) because need_assembler_name_p return false or Ada
frontend makes assembler names of TYPE_DECLs NULL anyway. I will try
adding that check.

Honza

[Bug debug/87295] [8 Regression][early debug] ICE with -ffat-lto-objects -fdebug-types-section -g

2019-02-01 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87295

--- Comment #9 from Jan Hubicka  ---
Note that Mark also got an crash in the wrapper
#0  0x0046527f in simple_object_elf_copy_lto_debug_sections
(sobj=, dobj=, pfn=,
err=) at 
+/home/engshare/third-party2/gcc/7.x/src/gcc-8.x/libiberty/simple-object-elf.c:1481
 
#1  0x00462c05 in simple_object_copy_lto_debug_sections
(sobj=sobj@entry=0x71be50, dest=dest@entry=0x7532620   
  
+"/data/users/mwilliams/hphp-2/fbcode/buck-out/tmp/ld/tmp-lj3w0kif/ccGHeQIUdebugobjtem",
err=err@entry=0x7ffe2189b758) at   
 
+/home/engshare/third-party2/gcc/7.x/src/gcc-8.x/libiberty/simple-object.c:320  
#2  0x00450073 in debug_objcopy (infile=) at
/home/engshare/third-party2/gcc/7.x/src/gcc-8.x/gcc/lto-wrapper.c:1009  
#3  0x00450bbc in run_gcc (argc=, argv=)
at /home/engshare/third-party2/gcc/7.x/src/gcc-8.x/gcc/lto-wrapper.c:1436   
#4  0x00438a47 in main (argc=, argv=) at
/home/engshare/third-party2/gcc/7.x/src/gcc-8.x/gcc/lto-wrapper.c:1716  

This does not seem to reproduce simply to me.
Honza

[Bug c++/88049] [7/8/9 Regression] ICE in lto_symtab_prevailing_virtual_decl at gcc/lto/lto-symtab.c:1075 since r231671

2019-01-29 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88049

--- Comment #3 from Jan Hubicka  ---
> > We ICE on the fact that _ZTV1aIN12_GLOBAL__N_11fEE which is vtable for
> > anonymous namespace type but it has EXTERNAL flag set.
> > 
> > Jason, why this happens? I am changing type to C++: if there is indeed legal
> > reason to have exported vtables for anonymous types, then we can simply drop
> > the sanity check.
> 
> It isn't exported; it has DECL_EXTERNAL set because it isn't defined, and it
> isn't defined because nothing uses it, so it isn't needed.  Note that it isn't
> TREE_PUBLIC.

Hmm, so perhaps just adjusting sanity check to also check ||
!TREE_PUBLIC?

Honza

[Bug ipa/88933] ICE: verify_cgraph_node failed (Error: caller edge count does not match BB count)

2019-01-23 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88933

--- Comment #16 from Jan Hubicka  ---
Looks OK. I would move delete_unreachable_blocks_update_callgraph
to tree-cfgcleanup since it is no longer inliner specific.  We probably
also can sanity check that TODO_cfgcleanup is not done by ipa-transform
passes.

Honza

[Bug ipa/88933] ICE: verify_cgraph_node failed (Error: caller edge count does not match BB count)

2019-01-23 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88933

--- Comment #14 from Jan Hubicka  ---
> I'm currently testing this fix.
Cleanup_cfg does other transformations that makes profile to change and
statements move within bbs.  Just use the unreachable block removal
infrastructure we already have and keep it up to fixup_cfg to clean up
remaining cases.

[Bug ipa/88933] ICE: verify_cgraph_node failed (Error: caller edge count does not match BB count)

2019-01-23 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88933

--- Comment #11 from Jan Hubicka  ---
Actually, looking at Martin's patch, I guess ipcp transfrom should do
the same as inliner - do not cleanup cfg but call
delete_unreachable_blocks_update_callgraph
and then go with SSA update via TODO.

Honza

[Bug ipa/88933] ICE: verify_cgraph_node failed (Error: caller edge count does not match BB count)

2019-01-23 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88933

--- Comment #10 from Jan Hubicka  ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88933
> 
> Martin Jambor  changed:
> 
>What|Removed |Added
> 
>  CC||jamborm at gcc dot gnu.org
> 
> --- Comment #9 from Martin Jambor  ---
> (In reply to Martin Liška from comment #7)
> > Created attachment 45504 [details]
> > Untested patch candidate
> > 
> > @Martin: Can you please take a look at the patch?
> 
> I am afraid rebuilding call-graph edges would mean forgetting all
> decisions to inline callees, because those are stored in
> e->inline_failed, would it not?
> 
> So it seems we have to somehow update the counts (or teach the
> validator to ignore these mismatches during IPA funtion transformation
> and pass manager to rebuild them after all transformations).

Yep, while producing the clones it is up to tree clonning to keep cgraph
up to date because edges can not be removed until inline decisions and
other transformations are applied.  The cfg cleanup there is mostly to
prevent dead basic blocks which would later kill dominance calculation
needed for SSA update.
This is done by delete_unreachable_blocks_update_callgraph is there.
Who calls merge_blocks?
I am currently travelling but I can try to take a look tonight or
tomorrow.

Honza
> 
> -- 
> You are receiving this mail because:
> You are on the CC list for the bug.

[Bug ipa/88936] [7/8/9 Regression] -fipa-pta breaks bash (incorrect optimisation of recursive static function)

2019-01-21 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88936

--- Comment #7 from Jan Hubicka  ---
Hi,
there is ipa_reduced_postorder that will compute SCCs and store scc
index.

[Bug ipa/88900] [9 Regression] 502.gcc_r SPEC benchmark miscompiles with LTO and PGO

2019-01-18 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88900

--- Comment #2 from Jan Hubicka  ---
> What a surprise, started with r267883. I'll carry on bisection with --param
> inline-unit-growth=40.

Well, I guess I can't claim that this is not gcc bug but it is the
benchmark that is broken :)

Honza

[Bug lto/84995] Documentation gcc-ar and gcc-ranlib vs {libdir}/bfd-plugins

2019-01-15 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84995

--- Comment #16 from Jan Hubicka  ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84995
> 
> Richard Biener  changed:
> 
>What|Removed |Added
> 
>  CC||hubicka at gcc dot gnu.org
> 
> --- Comment #14 from Richard Biener  ---
> (In reply to Дилян Палаузов from comment #13)
> > At https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70345#c4 is written that
> > “Right now the plugin from any gcc can be used with any gcc.”  This is not
> > the same as the last comment.  Please clarify again, if any gcc plugin can
> > be used with any gcc.
> 
> Well, liblto_plugin.so is only half of the story.  Yes, using any version
> will probably "work" but you might get better experience with using a newer
> version (though that newer version might present older GCC with resolution
> files they do not understand).

I think we only extended resolution file for new resolution types passed
by newer linkers, so it should be backward compatible to original
implementation. Notable changes was made to pass -flinker-output.

Both however matters only for the cases where linker is invoked from gcc
command, not for ar/ranlib/nm.

We may want to update symbol table eventually but I do not think it
would be big deal to keep plugin understand old format becuse it is
simple.
> 
> > If several plugins can be installed simultaneously and the first one that
> > claims the .o file wins, why aren’t plugins for both GCC7 and GCC8 installed
> > at the same time?  Just for gcc8 files, the gcc7 plugin will not claim the
> > responsibility.
> 
> It doesn't work that way since any liblto_plugin.so version will accept
> any GCC LTO files.  It is the lto-wrapper binary that is found to the
> one matching the GCC driver version used to invocate the link that
> ultimately determines your luck - here strict version matching is required.
> Thus LTO linking objects from mixed GCC version is doomed to fail
> fatally.
> 
> Note plugin auto-loading will only work reliably for ar/nm/ranlib because
> of this and indeed for those the version of the plugin doesn't really matter.

Plugin interface should be good enough to allow LTO optimizing one
binary with multiple compilers (multiple versions of GCC or GCC+LLVM
combination) where obviously no cross-module optimization between
compilers will happen but still each compiler will get acurate
resolution data that will let it to optimize well within the portion of
binary it understands.

We are missing way to specify multiple plugins to linker and some other
supporting bits I believe.

Honza

[Bug lto/85574] [8/9 Regression] LTO bootstapped binaries differ

2019-01-15 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85574

--- Comment #30 from Jan Hubicka  ---
We may still want to backport to gcc 7 branch. The ICF bug at least
exists there as well.

[Bug ipa/85103] [8/9 Regression] Performance regressions on SPEC with r257582

2019-01-08 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85103

--- Comment #20 from Jan Hubicka  ---
> Looking at our nightly spec runs, the bzip2 degradation has indeed been 
> cleaned
> up. But it looks like 175.vpr degraded another 2% or so over the last couple
> days.
Knowing what inline decision matters for VPR, I can try to fix it too.
It seems that the time accounting bug I fixed this weekend cured a lot
of mysteries I did not understand.  I have one additional patch to
adjust way we compute badness metric which I plan to commit todar or
tomorrow (there was downtime with LNT spec testers). It may affect these
two benchmarks again - I am still waiting for runs to finish.

Honza

[Bug lto/88677] [9 Regression] Divergence in -O2 and -O2 -flto early opts

2019-01-08 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88677

--- Comment #11 from Jan Hubicka  ---
> 
> I guess the FE needs the info.  But yes, dropping TREE_READONLY
> for TYPE_NEEDS_CONSTRUCTION decls at some point may make sense.
> 
> Generally it would be nice to know (for alias analysis) that
> an object is not written to via a pointer.  For const globals
> with a constructor such writes can only happen inside the global
> ctor function(s).  Not sure if IPA reference could be of help

If I have global ctor of C++ readonly structure, I would expect to be
able to call external function to construct its parts that may call back
to the unit, so it seems to me that we can not track much?

If the ctor is leaf we probably can track down the info, but won't
currently. It is soemthing that ipa-ref could try to do if it was not
lame.

[Bug ipa/87957] [9 Regression] ICE tree check: expected tree that contains ‘decl minimal’ structure, have ‘identifier_node’ in warn_odr, at ipa-devirt.c:1051 since r265519

2019-01-07 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87957

--- Comment #28 from Jan Hubicka  ---
> > I already did printf debugging and indeed we only need to decide how to
> > adjust type_with_linkage_p so it returns false for all Ada types.  Maybe
> > cleanest would be to add flag to TYPE_DECL which C++ FE will set and
> > other frontends won't, but perhaps there is easier way around. Jason?
> 
> What does it mean exactly for a type to have or not have a linkage?

Types with linkage are C++ ODR types. They have associated mangled name
(which is after free lang data available by DECL_ASSEMBLER_NAME of
the TYPE_DECL of TYPE_NAME) and if the names match in different units,
one knows that they are same types even if tree representation diverges.

In C++ those are records, unions and enums.

Honza

[Bug c++/81668] LTO ODR warnings are not helpful

2019-01-07 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81668

--- Comment #15 from Jan Hubicka  ---
> I tested with a GCC snapshot (at r267505). I can now build all mysqld with LTO
> and get exactly one LTO warning, and it's a true positive (two Bison parsers
> that we haven't managed to untangle yet).
> 
> [1/1] Linking CXX executable runtime_output_directory/mysqld
> ../sql/sql_lex.h:2067:7: warning: type 'union YYSTYPE' violates the C++ One
> Definition Rule [-Wodr]
>  2067 | union YYSTYPE {
>   |   ^
> ../storage/innobase/include/fts0pars.h:50: note: a different type is defined 
> in
> another translation unit
>50 | typedef union YYSTYPE
>   | 
> ../sql/sql_lex.h:2071:18: note: the first difference of corresponding
> definitions is field 'hint_type'
>  2071 |   opt_hints_enum hint_type;
>   |  ^
> fts0pars.y:62: note: a field with different name is defined in another
> translation unit

Looks good to me. I think it is reported corectly and should be easy for
developer to understand what is going on.  So we can close the PR. Any
additional testcases would be great!

Honza

[Bug lto/88677] [9 Regression] Divergence in -O2 and -O2 -flto early opts

2019-01-07 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88677

--- Comment #9 from Jan Hubicka  ---
> https://gcc.gnu 
> -> You can't simply remove this flag.  You could set it to true 
> conservatively.
> Or we could stop marking globals that need constructing TREE_READONLY.

I see, I was under impression that we already drop TREE_READONLY for
them. But it is bit more subtle and thus I was indeed to quick to drop
streaming here.  In:

struct c { int a; c() { a=1; } };
const c cc;
const int dd=1;
extern const c ee;
int foo() { return ee.a; }

I get:
const c ee/8 (const c ee) @0x7f68d301ee80
  Type: variable
  Visibility: external public
  References: 
  Referring: _Z3foov/5 (read)
  Availability: not-ready
  Varpool flags: read-only
const int dd/4 (const int dd) @0x7f68d301ec80
  Type: variable definition analyzed
  Visibility: force_output no_reorder
  Aux: @0x7f68d316c450
  References: 
  Referring: 
  Availability: not-ready
  Varpool flags: initialized read-only const-value-known
const c cc/3 (const c cc) @0x7f68d301e900
  Type: variable definition analyzed
  Visibility: force_output no_reorder
  Aux: @0x7f68d301ec80
  References: 
  Referring: _Z41__static_initialization_and_destruction_0ii/6 (addr)
  Availability: not-ready
  Varpool flags:

So EE is indeed read-only, but CC is not.
I guess it is because constructor to CC is output locally and we clear
the flag, while constructor of EE is output externally. What is utility
of this information? Can we realy in some context that value of EE did
not change?

It seems to me that it is completely useless info, so I would be in
favor of dropping TREE_READONLY flag probably at cgraph
construction time (because I suppose it is relevant to front-end)

Honza

[Bug ipa/87957] [9 Regression] ICE tree check: expected tree that contains ‘decl minimal’ structure, have ‘identifier_node’ in warn_odr, at ipa-devirt.c:1051 since r265519

2019-01-07 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87957

--- Comment #26 from Jan Hubicka  ---
> Run the gnat.dg testsuite and copy-and-paste the command line from the log 
> file
> without the -q option in the middle.  You'll get:
> 
> /home/eric/build/gcc/native/gcc/xgcc -c
> -I/home/eric/svn/gcc/gcc/testsuite/gnat.dg/ -B/home/eric/build/gcc/native/gcc
> --RTS=/home/eric/build/gcc/native/x86_64-suse-linux/./libada
> -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
> -fdiagnostics-color=never -gnatws -flto -lm -I-
> /home/eric/svn/gcc/gcc/testsuite/gnat.dg/lto8.adb
> +===GNAT BUG DETECTED==+
> | 9.0.0 20190104 (experimental) [trunk revision 267574] (x86_64-suse-linux) 
> GCC
> error:|
> | in fld_incomplete_type_of, at tree.c:5348|

Aha, i did that but ignored the command line. It was long bughunting
week, sorry for that :)

I already did printf debugging and indeed we only need to decide how to
adjust type_with_linkage_p so it returns false for all Ada types.  Maybe
cleanest would be to add flag to TYPE_DECL which C++ FE will set and
other frontends won't, but perhaps there is easier way around. Jason?

Honza

[Bug ipa/88702] [7/8/9 regression] We do terrible job optimizing IsHTMLWhitespace from Firefox

2019-01-07 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88702

--- Comment #6 from Jan Hubicka  ---
> You could also add match.pd rules merging stuff pair-wise...
> 
> How's this a regression btw?

I was comparing with GCC 6 build where inlining apparently happened.
I believe inlining happens again - I am waiting for benchmarks to
finish.

Honza

[Bug ipa/88702] [6/7/8 regression] We do terrible job optimizing IsHTMLWhitespace from Firefox

2019-01-05 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88702

--- Comment #4 from Jan Hubicka  ---
> The only pass that can do about this (at least right now) is reassoc (both 1
> and 2), which is too late for inlining.  So, either teach fnsplit not to
> separate multiple if comparisons of the same variable against constants, or
> schedule reasoc or just the maybe_optimize_range_tests part thereof in some
> early pass.

Yep, I also found out about reassoc.
Teaching fnsplit to pattern match this is just a partial solution - we
would still miscalculate size of function body for functions like this
(which indeed look quite common). I will experiment with early reassoc.

I kind of debugged what happens later. Because code is compiled with -O2
and growth gets positive for both inlines and functions are not inline,
we won't inline.

[Bug ipa/88702] [6/7/8 regression] We do terrible job optimizing IsHTMLWhitespace from Firefox

2019-01-04 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88702

--- Comment #2 from Jan Hubicka  ---
> With what options?  I'm getting 3 bit tests both with -O2 and -O3, both when
> using C and C++.  And get that also if I rewrite the function to use a switch
> instead.

-O2 -flto and then look into release_ssa dump.
In Firefox we then fail to inline things back together. I am debugging
that but already producing that at firstplace is bad.

[Bug lto/88677] [9 Regression] Divergence in -O2 and -O2 -flto early opts

2019-01-03 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88677

--- Comment #4 from Jan Hubicka  ---
This drops TYPE_NEEDS_CONSTRUCTING.  I checked the uses
jan@skylake:~/trunk/gcc> grep TYPE_NEEDS_CONSTRU *.c
gimplify.c:   || TYPE_NEEDS_CONSTRUCTING (TREE_TYPE (decl
print-tree.c:  if (TYPE_NEEDS_CONSTRUCTING (node))
tree.c:  TYPE_NEEDS_CONSTRUCTING (type) = 0;
tree.c:  verify_variant_match (TYPE_NEEDS_CONSTRUCTING);
tree-inline.c:  if (TYPE_NEEDS_CONSTRUCTING (TREE_TYPE (p)))

I convinced myself that gimply and tree-inline uses are there only for
inlining/gimplifying within frontend. Perhaps they can also be dropped?

Honza

[Bug lto/85574] [8/9 Regression] LTO bootstapped binaries differ

2019-01-02 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85574

--- Comment #24 from Jan Hubicka  ---
> Patch needs ">>>" to be repaced by "> > >" to bootstrap, but then I get 756
> mismatches building firefox, while previously it was 1300 and before the
> rebuild_type_inheritance_graph 6273, so we are improving.  I will look into 
> the
> remaining cases.

Actually I was wrong - grepped wrong file. Still 1300 warnings.  I am
looking into that - perhaps these are same problems.

Still I think the patch for uncprop makes sense.

Honza

[Bug ipa/88626] __builtin_constant_p should be as cheap as dead code for inlining purposes

2019-01-02 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88626

--- Comment #6 from Jan Hubicka  ---
> Maybe a heuristic that gives a "bonus" for inlining a function that contains
> __builtin_constant_p (only if the argument looks like it depends on the
> function parameters?) would be easier? I don't know if we have something like
> that yet.

Yes, we have inline bonuses (called inline_hint) and I guess adding
bonus for builtin_constant_p calls which depends on function parameter
should be doable and indeed makes sense to me.

Still for serious builtin_constant_p performance I think we need to do
something about the double accounting problem I explained above that is
bit harder to fix (the time/size vector themselves do not have any
if-then-else semantics and predicates have no NOT operation, but merging
the outcomes is possible, just bit hard to implement)

Honza

[Bug c++/88600] GCC rejects attributes on type aliases, while clang accepts them

2018-12-26 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88600

--- Comment #2 from Jan Hubicka  ---
> Applying the attribute to V rather than T works:
> 
> template 
> using V __attribute__ ((__vector_size (8))) = T;

Very cool, I did managed to work that out.  Should it work the clang way
too?

[Bug lto/85574] [8/9 Regression] LTO bootstapped binaries differ

2018-12-21 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85574

--- Comment #14 from Jan Hubicka  ---
> Yeah.  Note that the debug stmt differences _might_ be explained by
> ICF in case ICF ignores debug stmts when merging.  But since the
> ICF dumps are ordered differently it's hard to see actual decision
> differences...

Yep, ICF ignores debug statements and choice of the leading function
from the group depends on memory layout it will diverge on that. Which
of course is a bug...

Honza

[Bug lto/85574] [9 Regression] LTO bootstapped binaries differ

2018-12-21 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85574

--- Comment #11 from Jan Hubicka  ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85574
> 
> --- Comment #10 from Richard Biener  ---
> Hmm, IPA ICF dumps show differences like
> 
> -  false returned: 'references to virtual tables can not be merged'
> (compare_referenced_symbol_properties:371)
> +  false returned: 'references to virtual tables can not be merged'
> (compare_referenced_symbol_properties:370)
> 
> so somehow
> 
> return return_false_with_msg
>  ("references to virtual tables can not be merged");
> 
> yields a different line!?  OK, maybe that's host compiler vs. trunk
> behavior changes.

Libcpp sometimes overflows in locations and behaves funnily.  It would
be sad if that happened on gcc, but may be the root of this problem.

ipa-cf is Martin's code, so hope he will chime in :)

Honza

[Bug lto/88550] A compiler error when use lto: internal compiler error: in add_symbol_to_partition_1, at lto/lto-partition.c:155

2018-12-19 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88550

--- Comment #2 from Jan Hubicka  ---
Dump file produced by the linker with -fdump-ipa-cgraph --save-temps (it
may end up in /tmp) would help to at least have clue what kind of symbol
caused the crash.

[Bug ipa/87957] [9 Regression] ICE tree check: expected tree that contains ‘decl minimal’ structure, have ‘identifier_node’ in warn_odr, at ipa-devirt.c:1051 since r265519

2018-12-11 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87957

--- Comment #19 from Jan Hubicka  ---
Yeap, the warnings was written at the time all C++ types had TYPE_NAMEs
and other types used IDENTIFIER_NODE. I have chnaged it for memory use
reaosns so only main variants have IDENTIFIER_NODE.  Usually we want to
just look for main variant and complain about that.  I will look into
it.

Honza

[Bug lto/86004] [9 regression] Several lto test cases begin failing with r260963

2018-12-10 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86004

--- Comment #12 from Jan Hubicka  ---
Thanks a lot for looking into this.  Indeed disabling the tests is
probably good idea, so the patch looks good to me. Somewhere we should
document minimal binutils release supporting incremental link...

Honza

[Bug lto/88297] [9 Regression] Assembler Error: symbol `_Z41__static_initialization_and_destruction_0ii.constprop.0' is already defined

2018-12-03 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88297

--- Comment #6 from Jan Hubicka  ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88297
> 
> --- Comment #5 from michael.ploujnikov at oracle dot com ---
> (In reply to Richard Biener from comment #3)
> > So before the patch we were just lucky, right?  When seeing the patches I
> > wondered whether we instead want to add a clone_count member to cgraph_node
> > (which we could stream) and use that for the .NUM suffix.  We alread have
> > it (sort-of) if we walk the clones list and do counting, right?
> 
> But the root of the problem is that multiple different cgraph_nodes share the
> same name, so even if two or more nodes like that have counters == 0 we would
> get the same conflict. Unless it's always the case that the additional

They are linked together as "transparent aliases". So one node has no
transparent_alias set and other sets it and node->alias_target will get you to
the "master" node.

Honza

[Bug middle-end/88302] __gcov_indirect_call_profiler_v2 and first_run profiling can be cheaper with LTO

2018-12-03 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88302

--- Comment #3 from Jan Hubicka  ---
> The way GCC exports internal API for plugins, all functions are externally
> visible. Unless plugins are disabled, the linker receives -rdynamic and so all
> non-static functions appear in the ELF dynamic table.
> 
> It's possible but complicated to solve this anyway by having two entrypoints
> for direct and indirect calls.

Yep, good point.
It would be good to get plugin api sorted out, but it is kind of
orthogonal issue.  For now I am testing with --disable-plugin.
Implementing this would speed up other programs too.

Honza

[Bug tree-optimization/88272] warning: iteration 9223372036854775807 invokes undefined behavior

2018-11-30 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88272

--- Comment #2 from Jan Hubicka  ---
> Where do we set the cut-off? ;)
-fuser-patience=

I would cut it off for things that are obviously derived from sign of
64bit value :)) Those are most common.

Honza

[Bug middle-end/87157] [9 regression] gcc.dg/vect/costmodel/ppc/costmodel-vect-33.c fails starting with r263981

2018-11-27 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87157

--- Comment #9 from Jan Hubicka  ---
> @@ -11,7 +11,7 @@ struct test {
> 
>  extern struct test s;
> 
> -int main1 ()
> +__attribute__((noipa)) int main1 ()
>  {  
>int i;
> 
> We want to test the vectorizer behavior, not depend on how many times the
> function has been inlined or versioned.

This looks like a good idea :)

[Bug lto/88140] [9 Regression] ICE: verify_gimple failed since r266325

2018-11-27 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88140

--- Comment #5 from Jan Hubicka  ---
> diff --git a/gcc/tree.c b/gcc/tree.c
> index 39a92464414..a39e611292a 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -5201,6 +5201,15 @@ fld_process_array_type (tree t, tree t2, hash_map tree> *map,
>array = build_array_type_1 (t2, TYPE_DOMAIN (t),
>   TYPE_TYPELESS_STORAGE (t), false);
>TYPE_CANONICAL (array) = TYPE_CANONICAL (t);
> +  /* Re-building the array via build_array_type_1 causes the C FE
> + special-handling of zero-length arrays to be dropped.  So
> +we copy back TYPE_SIZE[_UNIT] from the original type here
> +if layout_type decided the type is incomplete.  */
> +  if (!TYPE_SIZE (array))
> +   {
> + TYPE_SIZE (array) = TYPE_SIZE (t);
> + TYPE_SIZE_UNIT (array) = TYPE_SIZE_UNIT (t);

Makes sense to me, here.
To get types merged, we want to build same array when structure was
originally complete or incomplete. That means we should not copy real
size of the element because it is unknown in the incomplete case.

Honza
> +   }
>add_tree_to_fld_list (array, fld);
>  }
>return array;
> 
> -- 
> You are receiving this mail because:
> You are the assignee for the bug.

[Bug lto/87988] [9 regression] Streaming of ABSTRACT_ORIGIN is expensive

2018-11-26 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87988

--- Comment #6 from Jan Hubicka  ---
> 
> Honza - can you test the effect of this patch please?
Thanks! I am just redoing the tests (rebuilding firefoxes with updated
tree), so i will do that today or tomorrow.

Honza

[Bug lto/88112] [9 regression] ICE in lto1: TYPE_FIELDS defined in incomplete type

2018-11-20 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88112

--- Comment #3 from Jan Hubicka  ---
> 
> Honza?  Eric?

I am not sure I fully understand the problem here, but
why we end up streaming ungimplified type at first place?

[Bug ipa/87957] [9 Regression] ICE tree check: expected tree that contains ‘decl minimal’ structure, have ‘identifier_node’ in warn_odr, at ipa-devirt.c:1051 since r265519

2018-11-19 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87957

--- Comment #6 from Jan Hubicka  ---
I think we have separate PR for this ICE.  It is the ODR violation
confusing the walk of duplicates. It needs to stop assuming that all
duplicates are same TREE_CODE of type.

I will cook up patch.

Honza

[Bug ipa/65502] pure-const should play well with clobbers.

2018-11-19 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65502

--- Comment #7 from Jan Hubicka  ---
> Can the bug be marked as resolved?
I think to fully resolve we still want to teach DCE to replace
pure/const destructor by clobber when removing it.  This should
not be too hard to do because destructors are marked. Or do we output
clobber after every destructor call as part of lifetime-DSE?

Honza

[Bug lto/65536] LTO line number information garbled

2018-11-19 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65536

--- Comment #55 from Jan Hubicka  ---
> Can the bug be marked as resolved?

I think with the location cache we only made this problem less visible
and for really large programs linemap still can overflow and behave
funy, right?

[Bug ipa/87957] [9 Regression] ICE tree check: expected tree that contains ‘decl minimal’ structure, have ‘identifier_node’ in warn_odr, at ipa-devirt.c:1051 since r265519

2018-11-17 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87957

--- Comment #4 from Jan Hubicka  ---
ICE fixed, but lets keep the PR open to track the fact that warning is
quite confused.

[Bug ipa/87843] [9 Regression] SPEC miscompilation of 403.gcc and 502.gcc_r benchmarks

2018-11-15 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87843

--- Comment #17 from Jan Hubicka  ---
> I don't see the miscompilation any longer, may I close it?
Yes, it was fixed by 

* tree.c (fld_type_variant): Copy canonical type.   
(fld_incomplete_type_of): Check that canonical types looks sane;
copy canonical type.
(verify_type): Accept when incomplete type has complete canonical type.

[Bug middle-end/88010] noinline function alias unexpectedly inlined

2018-11-14 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88010

--- Comment #4 from Jan Hubicka  ---
> Yep, GCC considers attributes to be part of the definition of a function for
> IPA passes.  We are not consitent here (i.e. warning attributes on aliases
> counts), so it makes sense to support this (and is not too difficult to do in
> this case).

Actually siplicity was bit overrated - currently all function
redirection code eliminates aliases, so we will need to carefully update
those so attributes are not lost.  Still implementable.

[Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3

2018-11-06 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #82 from Jan Hubicka  ---
> > Yep, this is because they used to be arrays indexed by symbol UIDs which
> > Martin converted to hash tables.  Inliner happily calls summary_get each
> > time it needs the summary.  I have some patches to speed this up which I
> > will push out after the type changes (while they add bit of extra
> > functionality by teaching ipa-predicates abou value range I hope they
> > are OK for early stage3).
> 
> Awww.  I guess with no longer re-using UIDs we then get bad hashing
> behavior as well :/  I hope the hash function is _not_ simply the UID?

Yep, UID reuse was there precisely to make the arrays reasonably
compact.  hash is int_hash which IMO returns simply uid.

[Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3

2018-11-06 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #80 from Jan Hubicka  ---
> 
> flat perf profile:
> 
> Samples: 510K of event 'instructions:p', Event count (approx.): 715615147320  
>   
> Overhead   Samples  Command  Shared Object Symbol 
>   
>8.08% 95243  f951 f951  [.] bitmap_ior_into
>7.21% 25966  f951 f951  [.] sreal::operator*
>5.43% 19353  f951 f951  [.]
> hash_table5.20% 23167  f951 f951  [.] get_ref_base_and_extent
>4.93% 17947  f951 f951  [.]
> profile_count::to_sreal_s
>4.37% 15865  f951 f951  [.] sreal::operator/
>3.45% 30532  f951 f951  [.] bitmap_set_bit
>3.41% 12159  f951 f951  [.]
> hash_table3.08% 11034  f951 f951  [.] default_binds_local_p_3
>3.08% 11146  f951 f951  [.]
> hash_table2.21%  7877  f951 f951  [.]
> want_inline_small_functio
>1.93%  6874  f951 f951  [.] edge_badness
>1.87%  6675  f951 f951  [.]
> compute_inlined_call_time
> 
> the ipa_fn_summary hash and edge_growth_cache / call_summary hashes are
> oddly on top of the profile...

Yep, this is because they used to be arrays indexed by symbol UIDs which
Martin converted to hash tables.  Inliner happily calls summary_get each
time it needs the summary.  I have some patches to speed this up which I
will push out after the type changes (while they add bit of extra
functionality by teaching ipa-predicates abou value range I hope they
are OK for early stage3).

[Bug tree-optimization/87885] ICE in release_ssa_name_fn with -fprofile-report

2018-11-06 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87885

--- Comment #3 from Jan Hubicka  ---
OK, I now recall. The intend was really to have three values 
- profile before pass was run (which you can see from stats of previous
  pass)
- profile after pass was run
- profile after cleanups

This is somewhat useful because, say for CCP one can see how much code
sped up just by removing some calculation and how much it was affected
by subsequent unreacable code removal.  If we can't calculate the middle
value safely, we can just drop it.

Honza

[Bug c/87868] testsuite/c-c++-common/pr60101.c with -O3 and ubsan

2018-11-05 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87868

--- Comment #2 from Jan Hubicka  ---
I am attaching testcase and patch I am lto-botstrapping now.  It copies
canonical from original type which is important for us to not lose TBAA
during ealry opts as well. 

typedef struct rtx_def *rtx;
typedef struct cselib_val_struct
{
  union
  {
  } u;
  struct elt_loc_list *locs;
}
cselib_val;
struct elt_loc_list
{
  struct elt_loc_list *next;
  rtx loc;
};
static int n_useless_values;
unchain_one_elt_loc_list (pl)
 struct elt_loc_list **pl;
{
  struct elt_loc_list *l = *pl;
  *pl = l->next;
}

discard_useless_locs (x, info)
 void **x;
{
  cselib_val *v = (cselib_val *) * x;
  struct elt_loc_list **p = >locs;
  int had_locs = v->locs != 0;
  while (*p)
{
  unchain_one_elt_loc_list (p);
  p = &(*p)->next;
}
  if (had_locs && v->locs == 0)
{
  n_useless_values++;
}
}
* tree.c (fld_incomplete_type_of): Copy type_canonical.
Index: tree.c
===
--- tree.c  (revision 265712)
+++ tree.c  (working copy)
@@ -5146,6 +5146,7 @@ fld_incomplete_type_of (tree t, struct f
  else
first = build_reference_type_for_mode (t2, TYPE_MODE (t),
TYPE_REF_CAN_ALIAS_ALL (t));
+ TYPE_CANONICAL (first) = TYPE_CANONICAL (TYPE_MAIN_VARIANT (t));
  add_tree_to_fld_list (first, fld);
  return fld_type_variant (first, t, fld);
}

[Bug ipa/87843] [9 Regression] SPEC miscompilation of 403.gcc and 502.gcc_r benchmarks

2018-11-02 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87843

--- Comment #7 from Jan Hubicka  ---
> If we have less MEM_REFs then we probably strip them because we think they
> reference equal types.
> 
> I think I already told you that given that MEM_REFs use pointer types
> to carry alignment info _those_ may not become incomplete!  But I didn't
> expect that to cause wrong-code but missed optimizations.

We do not make them incomplete.  The problem actually seems to be in
early optimization where we optimize out the if conditional above.
Not sure why -ffat-lto-objects worked in this context.

[Bug lto/87754] [9 regression] ICE in odr_types_equivalent_p, at ipa-devirt.c:1250

2018-10-26 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87754

--- Comment #6 from Jan Hubicka  ---
Hi,
this patch fixes the ICE as well as the template. I will commit it after LTO
bootstrap converges.

Honza

* ipa-devirt.c (odr_subtypes_equivalent_p): Fix recursion.
(warn_types_mismatch): Fix walk of DECL_NAME.
(odr_types_equivalent_p): Fix overactive assert.
* lto/lto-symtab.c (lto_symtab_merge_decls_2): Fix extra space.

* g++.dg/lto/ldr-1_0.C: Fix template.
* g++.dg/lto/ldr-1_1.C: Fix template.
Index: ipa-devirt.c
===
--- ipa-devirt.c(revision 265519)
+++ ipa-devirt.c(working copy)
@@ -719,9 +719,10 @@ odr_subtypes_equivalent_p (tree t1, tree
 }
   if (visited->add (pair))
 return true;
-  if (odr_types_equivalent_p (TYPE_MAIN_VARIANT (t1), TYPE_MAIN_VARIANT (t2),
- false, NULL, visited, loc1, loc2)
-  && !type_variants_equivalent_p (t1, t2, warn, warned))
+  if (!odr_types_equivalent_p (TYPE_MAIN_VARIANT (t1), TYPE_MAIN_VARIANT (t2),
+ false, NULL, visited, loc1, loc2))
+return false;
+  if (!type_variants_equivalent_p (t1, t2, warn, warned))
 return false;
   return true;
 }
@@ -1138,7 +1139,7 @@ warn_types_mismatch (tree t1, tree t2, l
   if (TREE_CODE (n1) == TYPE_DECL)
n1 = DECL_NAME (n1);
   if (TREE_CODE (n2) == TYPE_DECL)
-   n1 = DECL_NAME (n2);
+   n2 = DECL_NAME (n2);
   /* Most of the time, the type names will match, do not be unnecesarily
  verbose.  */
   if (IDENTIFIER_POINTER (n1) != IDENTIFIER_POINTER (n2))
@@ -1292,10 +1293,6 @@ odr_types_equivalent_p (tree t1, tree t2
   /* Check first for the obvious case of pointer identity.  */
   if (t1 == t2)
 return true;
-  gcc_assert (!type_with_linkage_p (TYPE_MAIN_VARIANT (t1))
- || !type_in_anonymous_namespace_p (TYPE_MAIN_VARIANT (t1)));
-  gcc_assert (!type_with_linkage_p (TYPE_MAIN_VARIANT (t2))
- || !type_in_anonymous_namespace_p (TYPE_MAIN_VARIANT (t2)));

   /* Can't be the same type if the types don't have the same code.  */
   if (TREE_CODE (t1) != TREE_CODE (t2))
Index: lto/lto-symtab.c
===
--- lto/lto-symtab.c(revision 265517)
+++ lto/lto-symtab.c(working copy)
@@ -698,7 +698,7 @@ lto_symtab_merge_decls_2 (symtab_node *f
  if (level & 2)
diag = warning_at (DECL_SOURCE_LOCATION (decl),
   OPT_Wodr,
-  "%qD violates the C++ One Definition Rule ",
+  "%qD violates the C++ One Definition Rule",
   decl);
  if (!diag && (level & 1))
diag = warning_at (DECL_SOURCE_LOCATION (decl),
Index: testsuite/g++.dg/lto/odr-1_0.C
===
--- testsuite/g++.dg/lto/odr-1_0.C  (revision 265517)
+++ testsuite/g++.dg/lto/odr-1_0.C  (working copy)
@@ -3,6 +3,6 @@
 struct a { // { dg-lto-warning "8: type 'struct a' violates the C\\+\\+ One
Definition Rule" }
   struct b *ptr; // { dg-lto-message "13: the first difference of
corresponding definitions is field 'ptr'" }
 };
-void test(struct a *) // { dg-lto-warning "6: warning: 'test' violates the C++
One Definition Rule" }
+void test(struct a *)
 {
 }
Index: testsuite/g++.dg/lto/odr-1_1.C
===
--- testsuite/g++.dg/lto/odr-1_1.C  (revision 265517)
+++ testsuite/g++.dg/lto/odr-1_1.C  (working copy)
@@ -4,7 +4,7 @@ namespace {
 struct a {
   struct b *ptr;
 };
-void test(struct a *);
+void test(struct a *); // { dg-lto-warning "6: 'test' violates the C\\+\\+ One
Definition Rule" }
 int
 main(void)
 {

[Bug testsuite/86158] [9 regression] gcc.c-torture/unsorted/dump-noaddr.c.*i.lto-stream-out fails starting with 261546

2018-10-25 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86158

--- Comment #5 from Jan Hubicka  ---
> 
> Which is caused by --param ggc-min-heapsize=1 which is used by first 
> invocation
> of the compilation. Honza, do you call ggc_collect before streaming out?

So we have IL representation diverging because of gabage collection? That looks
like a bug...

Honza

[Bug c/87615] Possible excessive compile time with -O2

2018-10-16 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87615

--- Comment #7 from Jan Hubicka  ---
> Looks like the IPA-CP stmt walking is still unbound?
There is also walking in ipa-fnsummary that is unbound, perhaps that is the
problem...

Honza

[Bug lto/83375] partitioner partitions static arrays with label references

2018-10-11 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83375

--- Comment #8 from Jan Hubicka  ---
> > This breaks Linux kernel LTO builds. I currently have a workaround
> > (disabling LTO for that file), but I don't think your "is not common"
> > argument is valid.
> 
> Well, I guess pushing LTO into Linux kernel would be difficult task to 
> achieve.
> Can you please point me to a source file where it's used?
> Note that I did experiments with openSUSE distribution and I haven't seen it
> for any core package when using -flto.

In most cases we get lucky as we home variables into same partition as
function,
so the issue does not trigger that often.  Still will try to find time to fix
it
this stage1.

Honza

[Bug middle-end/87157] [9 regression] gcc.dg/vect/costmodel/ppc/costmodel-vect-33.c fails starting with r263981

2018-09-02 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87157

--- Comment #6 from Jan Hubicka  ---
> But this change to sreal seems very unlikely to cause that. 
> Are we sure about the bisection to r263981?

Sreals are used to estimate profile which in turn may affect decision
of function splitting & inliner.  If something was really off we however
would likely see it in performance results which seems OK.
I will try to take look tomorrow as well.

Honza

[Bug lto/86517] relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a shared object with LTO

2018-07-17 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86517

--- Comment #7 from Jan Hubicka  ---
Hi,
I am attaching patch I am testing and also table generated by a script that
walks through individual combinations of options.  The combination rules are as
follows.

I tried to take into account that targets may default to some form of PIC or
PIE. That is why, for example combining -fpic and empty options results in
empty options and no no-pic.

Note that in addition to lto-wrapper option merging we now have logic that
disables PIC/PIE when final binary is built based on lto-plugion output (it
knows if it builds binary, relocatable binary, library or incrementally links).
Still we rely on the merging to choose particular form of PIC/PIE and we need
it right in incremental link where linker does not help us.

   -fpic +-fpic =>  -fpic 
   -fpic +-fPIC =>  -fpic 
   -fpic +-fpie =>  -fpie 
   -fpic +-fPIE =>  -fpie 
   -fpic + -fno-pic =>  -fno-pic 
   -fpic + -fno-pie =>  
   -fpic +  =>  
   -fPIC +-fpic =>  -fpic 
   -fPIC +-fPIC =>  -fPIC 
   -fPIC +-fpie =>  -fpie 
   -fPIC +-fPIE =>  -fPIE 
   -fPIC + -fno-pic =>  -fno-pic 
   -fPIC + -fno-pie =>  
   -fPIC +  =>  
   -fpie +-fpic =>  -fpie 
   -fpie +-fPIC =>  -fpie 
   -fpie +-fpie =>  -fpie 
   -fpie +-fPIE =>  -fpie 
   -fpie + -fno-pic =>  
   -fpie + -fno-pie =>  
   -fpie +  =>  
   -fPIE +-fpic =>  -fpie 
   -fPIE +-fPIC =>  -fPIE 
   -fPIE +-fpie =>  -fpie 
   -fPIE +-fPIE =>  -fPIE 
   -fPIE + -fno-pic =>  
   -fPIE + -fno-pie =>  
   -fPIE +  =>  
-fno-pic +-fpic =>  -fno-pic 
-fno-pic +-fPIC =>  -fno-pic 
-fno-pic +-fpie =>  -fno-pic 
-fno-pic +-fPIE =>  -fno-pic 
-fno-pic + -fno-pic =>  -fno-pic 
-fno-pic + -fno-pie =>  -fno-pic 
-fno-pic +  =>  -fno-pic 
-fno-pie +-fpic =>  -fno-pie 
-fno-pie +-fPIC =>  -fno-pie 
-fno-pie +-fpie =>  -fno-pie 
-fno-pie +-fPIE =>  -fno-pie 
-fno-pie + -fno-pic =>  -fno-pie 
-fno-pie + -fno-pie =>  -fno-pie 
-fno-pie +  =>  -fno-pie 
   -fpic +  =>  
   -fPIC +  =>  
   -fpie +  =>  
   -fPIE +  =>  
-fno-pic +  =>  
-fno-pie +  =>  
 +  =>  
Index: lto-wrapper.c
===
--- lto-wrapper.c   (revision 262682)
+++ lto-wrapper.c   (working copy)
@@ -408,6 +408,11 @@ merge_and_complain (struct cl_decoded_op
  It is a common mistake to mix few -fPIC compiled objects into otherwise
  non-PIC code.  We do not want to build everything with PIC then.

+ Similarly we merge PIE options, however in addition we keep
+  -fPIC + -fPIE = -fPIE
+  -fpic + -fPIE = -fpie
+  -fPIC/-fpic + -fpie = -fpie
+
  It would be good to warn on mismatches, but it is bit hard to do as
  we do not know what nothing translates to.  */

@@ -415,11 +420,34 @@ merge_and_complain (struct cl_decoded_op
 if ((*decoded_options)[j].opt_index == OPT_fPIC
 || (*decoded_options)[j].opt_index == OPT_fpic)
   {
-   if (!pic_option
-   || (pic_option->value > 0) != ((*decoded_options)[j].value > 0))
- remove_option (decoded_options, j, decoded_options_count);
-   else if (pic_option->opt_index == OPT_fPIC
-&& (*decoded_options)[j].opt_index == OPT_fpic)
+   /* -fno-pic in one unit implies -fno-pic everywhere.  */
+   if ((*decoded_options)[j].value == 0)
+ j++;
+   /* If we have no pic option or merge in -fno-pic, we still may turn
+  existing pic/PIC mode into pie/PIE if -fpie/-fPIE is present.  */
+   else if ((pic_option && pic_option->value == 0)
+|| !pic_option)
+ {
+   if (pie_option && pie_option->value > 0)
+ {
+   bool big = (*decoded_options)[j].opt_index == OPT_fPIC
+  && pie_option->opt_index == OPT_fPIE;
+   (*decoded_options)[j].opt_index = big ? OPT_fPIE : OPT_fpie;
+   (*decoded_options)[j].canonical_option[0] = big ? "-fPIE" :
"-fpie";
+   j++;
+ }
+   else if (pic_option)
+ {
+   (*decoded_options)[j] = *pic_option;
+   j++;
+ }
+   /* We do not know if target defaults to pic or not, so just remove
+  option if it is missing in one unit but enabled in other.  */
+   else
+ remove_option (decoded_options, j, decoded_options_count);
+ }
+   else if (pic_option->opt_index == OPT_fpic
+&& (*decoded_options)[j].opt_index == OPT_fPIC)
  {
(*decoded_options)[j] = *pic_option;
j++;
@@ -430,11 +458,37 @@ merge_and_complain (struct cl_decoded_op
else if ((*decoded_options)[j].opt_index == OPT_fPIE
 || (*decoded_options)[j].opt_index == OPT_fpie)
   {
-   if 

[Bug lto/86517] relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a shared object with LTO

2018-07-16 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86517

--- Comment #6 from Jan Hubicka  ---
The problem is logic in lto-wrapper (which is mine)
  /* Merge PIC options: 
  -fPIC + -fpic = -fpic 
  -fPIC + -fno-pic = -fno-pic   
  -fpic/-fPIC + nothin = nothing.   
 It is a common mistake to mix few -fPIC compiled objects into otherwise
 non-PIC code.  We do not want to build everything with PIC then.   

 It would be good to warn on mismatches, but it is bit hard to do as
 we do not know what nothing translates to.  */ 

  for (unsigned int j = 0; j < *decoded_options_count;) 
if ((*decoded_options)[j].opt_index == OPT_fPIC 
|| (*decoded_options)[j].opt_index == OPT_fpic) 
  { 
if (!pic_option 
|| (pic_option->value > 0) != ((*decoded_options)[j].value > 0))
  remove_option (decoded_options, j, decoded_options_count);
else if (pic_option->opt_index == OPT_fPIC  
 && (*decoded_options)[j].opt_index == OPT_fpic)
  { 
(*decoded_options)[j] = *pic_option;
j++;
  } 
else
  j++;  
  } 
   else if ((*decoded_options)[j].opt_index == OPT_fPIE 
|| (*decoded_options)[j].opt_index == OPT_fpie) 
  { 
if (!pie_option 
|| pie_option->value != (*decoded_options)[j].value)
  remove_option (decoded_options, j, decoded_options_count);
else if (pie_option->opt_index == OPT_fPIE  
 && (*decoded_options)[j].opt_index == OPT_fpie)
  { 
(*decoded_options)[j] = *pie_option;
j++;
  } 
else
  j++;  
  } 

PIC merging is OK, but PIE merging should not remove PIE if PIC is provided in
other units.
I am looking into fix.

Honza

[Bug lto/86517] relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a shared object with LTO

2018-07-14 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86517

--- Comment #2 from Jan Hubicka  ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86517
> 
> H.J. Lu  changed:
> 
>What|Removed |Added
> 
>  Status|UNCONFIRMED |RESOLVED
>  CC||hjl.tools at gmail dot com
>  Resolution|--- |INVALID
> 
> --- Comment #1 from H.J. Lu  ---
> I(In reply to Martin Liška from comment #0)
> 
> > 
> > $ gcc -flto -c -fPIE -O2 1.i 2.i && gcc -fPIC -c -O2 lib.i -flto && ar rv
> > x.a lib.o && gcc -pie -O2 -pthread -ldl -lxml2 1.o 2.o x.a -rdynamic -flto=9
> > -shared
> > r - lib.o
> 
> I don't believe you can build a shared object with -fPIE and linker tells
> you to recompile with -fPIC.

I think the problem here is that you can compile PIE and PIC object into pie
binary
at least on x86-64, but the way we merge options in lto-wrapper, we disable
both PIE and
PIC at LTO linktime.
I think we ought to consider PIE as lower variant of PIC and resolve such funny
combination as -fPIE.

Honza

[Bug lto/86442] Wrong error: global register variable follows a function definition when using LTO

2018-07-09 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86442

--- Comment #1 from Jan Hubicka  ---
> Following wrong error is printed with LTO:
> 
> $ cat global.cpp
> register int a __asm__("r12");
> 
> class b {
> public:
>   b();
> };
> 
> b c;
> 
> int main() { a = 3; }
> 
> $ g++ global.cpp -O2  -flto
> global.cpp: In function ‘main’:
> global.cpp:1:14: error: global register variable follows a function definition
>  register int a __asm__("r12");
>   ^
> lto-wrapper: fatal error: g++ returned 1 exit status
> compilation terminated.
> /usr/lib64/gcc/x86_64-suse-linux/8/../../../../x86_64-suse-linux/bin/ld: 
> error:
> lto-wrapper failed
> collect2: error: ld returned 1 exit status

Global register variables are not really working with LTO (because they should
be part of the function context since they may differ across units).

We could try to fix them for gcc 9.  I was thinking a bit about it, but it
would
need to attach the information somewhere into optimization node and make
inliner
to not inline across different globally allocated registers.

It is overall questionable how global registers should interact with the symbol
table.

Honza

[Bug target/84481] [8/9 Regression] 429.mcf with -O2 regresses by ~6% and ~4%, depending on tuning, on Zen compared to GCC 7.2

2018-06-29 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84481

--- Comment #4 from Jan Hubicka  ---
Ahoj,
jeste jedna vec je to, ze by asi slo udelat konzervativni slejvani pro VPT -
testovat
jen jestli stejna hodnota vyhraje ve vsech runech co maji nenulovy count. To by
melo
byt stabilni vuci poradi.

Honza

[Bug rtl-optimization/86108] [8/9 Regression] crash during unwinding with -O2

2018-06-15 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86108

--- Comment #7 from Jan Hubicka  ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86108
> 
> --- Comment #6 from Richard Biener  ---
> (In reply to Jakub Jelinek from comment #5)
> > The problem is in cross-jumping, we have a landing pad with one
> > EDGE_CROSSING and some EH predecessor edges.  The DF code treats the
> > bb_has_eh_pred specially and creates artificial generation of the
> > EH_RETURN_DATA_REGNO blocks at the start of those blocks, rather than making
> > those regs visible on the in edge.
> > 
> > I've tried:
> > --- gcc/bb-reorder.c.jj 2018-05-31 21:51:18.508292965 +0200
> > +++ gcc/bb-reorder.c2018-06-15 12:57:34.501095317 +0200
> > @@ -1507,8 +1507,11 @@ dw2_fix_up_crossing_landing_pad (eh_land
> >new_lp->landing_pad = gen_label_rtx ();
> >LABEL_PRESERVE_P (new_lp->landing_pad) = 1;
> >  
> > +  e = split_block_after_labels (old_bb);
> > +  old_bb = e->src;
> > +
> >/* Create the forwarder block.  */
> > -  basic_block new_bb = create_forwarder_block (new_lp->landing_pad, 
> > old_bb);
> > +  basic_block new_bb = create_forwarder_block (new_lp->landing_pad,
> > e->dest);
> >  
> >/* Fix up the edges.  */
> >for (ei = ei_start (old_bb->preds); (e = ei_safe_edge (ei)) != NULL; )
> > to make sure we don't have bbs with both EDGE_CROSSING and EH incoming 
> > edges.
> > Another possibility is to disallow cross-jumping of the bb_has_eh_pred basic
> > blocks, like:
> > --- gcc/cfgcleanup.c.jj 2018-04-25 09:41:37.753686037 +0200
> > +++ gcc/cfgcleanup.c2018-06-15 13:18:43.173257421 +0200
> > @@ -1976,6 +1976,12 @@ try_crossjump_to_edge (int mode, edge e1
> >if (!outgoing_edges_match (mode, src1, src2))
> >  return false;
> >  
> > +  /* The DF uses magic for EH landing pads where the EH_RETURN_DATA_REGNO
> > + regs are artificially defined at the start.  Avoid cross-jumping in
> > + between the EH landing pads and other bbs.  */
> > +  if (bb_has_eh_pred (src1) != bb_has_eh_pred (src2))
> > +return false;
> > +
> >/* ... and part the second.  */
> >nmatch = flow_find_cross_jump (src1, src2, , , );
> >  
> > Either of the patches fix the testcase, there is some code growth though,
> > but maybe it is mainly about adding the missing register moves that are
> > incorrectly missing without them.  Without the patches main has 3259 bytes
> > and main.cold 1276 bytes, with the first path it is 3337/1371 bytes and with
> > the second patch instead 3337/1374 bytes.
> > Conceptually, I think the first patch is better, the DF info isn't incorrect
> > then (if we have bbs with both crossing and EH predecessors, we don't
> > mention the EH regs in lr/live in on the EH pad nor in lr/live out on the EH
> > pad in the other partition that just branches to it.
> 
> You mean without splitting the block the DF info is incorrect?  If so should
> we add sth to verify-flow-info that makes sure we do not have EDGE_CROSSING
> and EH incoming edges?  Can't an EH edge be EDGE_CROSSING itself?

With dwarf debug info EH edges needs to be non-crossing because they are
represented
by relative relocations. AFAIK

Honza
> 
> -- 
> You are receiving this mail because:
> You are on the CC list for the bug.

[Bug testsuite/86158] [9 regression] gcc.c-torture/unsorted/dump-noaddr.c.*i.lto-stream-out fails starting with 261546

2018-06-15 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86158

--- Comment #3 from Jan Hubicka  ---
The dumps contains addresses of tree nodes streamed.  I will work out how to
silence them.

Honza

[Bug ipa/60243] IPA is slow on large cgraph tree

2018-06-05 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

--- Comment #22 from Jan Hubicka  ---
> The IPA SRA time is all spent in compute_fn_summary via convert_callers.
> Not sure why that's necessary here?  Martin, in r152368 you reduced those
> to once-per-caller but obviously if each function calls each other function
> as in this testcase this is still O(n^2).  Why's the summary not simply
> recomputed when we process the caller next?  Thus at most N times?

This is because summary needs to be ready for early inliner to decide whether
caller is good for inlning or not.  I think we can simply mark it as dirty and
compute on demand from the inliner.

I also have finally working patches for incremental update of inline summary in
the IPA inliner.

Honza

[Bug c++/85535] bogus code in decl2.c:decl_needed_p

2018-05-28 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85535

--- Comment #9 from Jan Hubicka  ---
The code is intended to avoid specializations that are done only to possibly
inline the function. When not optimizing this only happens for always inlines
and doing so is just waste of effort.

In this case you have method keyed to other compilation unit. I will check what
flags we get in this case.

[Bug target/85829] [8/9 Regression] PARTIAL_REG_DEPENDENCY and MOVX were disabled for Haswell and newer processors

2018-05-18 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85829

--- Comment #3 from Jan Hubicka  ---
> Haswell tuning was done many years ago.  We really shouldn't change it.
> For newer processors, we need to investigate PARTIAL_REG_DEPENDENCY vs
> PARTIAL_REG_STALL.
I have revisited the tunning options primarily to define more reasonable
generic.
For that I have revisited some flags which seems to have been set incorrectly.
We run regular benchmarks on Haswell at
https://gcc.opensuse.org/gcc-old/index.html
(Czerny) and especially specfp2000 has improved noticeably past release cycle.
https://gcc.opensuse.org/gcc-old/SPEC/CFP/sb-czerny-head-64/mean-fp_big.png

There are quite few haswell chips around so I do not see why we should stop
trying to improve code generated there plus it would be good to have fewer
combinations enabled for differnt generations.

So I would suggest to revisit PARTIAL_REG_DEPENDENCY wrt PARTIAL_REG_STALL
for Haswell+

Honza

[Bug c++/85799] __builin_expect doesn't propagate through inlined functions

2018-05-16 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85799

--- Comment #4 from Jan Hubicka  ---
> I believe inlining is happening too late for it to have an effect - we are
> early inlining a body which has the __builtin_expect replaced by nothing.
> 
> Iff the expected outcome is "constant" a new function attribute would work.
> For outcome dependent on context that wouldn't work.
> 
> Not sure if we can simply introduce a return-value predictor that survives
> inlining?  Honza?

with martin we added strip_predict_hints to the end of early optimization queue
at https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01573.html
this removes the hint and it is not used by the branch prediction of the outer
function.  The statement predictors are somewhat problematic here becuase they
may associate with wrong conditional if the inner function body is optimized.

I guess we have two options
 1) do not re-do branch prediction on inlined code.
 2) strip just selected hints and not others (like builtin_expect) that is
safe even after inlning.
Not sure which one is better...

Honza
> 
> -- 
> You are receiving this mail because:
> You are on the CC list for the bug.

[Bug tree-optimization/84011] Optimize switch table with run-time relocation

2018-05-01 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011

--- Comment #11 from Jan Hubicka  ---
> 
> ??  That is the task for the linker SHF_MERGE|SHF_STRINGS handling.
> Why should gcc duplicate that?
I suppose there would be small room for improvements where GCC could use the
fact that one string's address is actually address of another string + offset.
That may save some relocations.

Honza

[Bug bootstrap/85571] [9 Regression] non-bootstrap-debug miscompare with trunk

2018-04-30 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85571

--- Comment #9 from Jan Hubicka  ---
> 
> so different BB re-ordering / partitioning?

That would be probably best visible from bb-reorder dumps. However...
> 
> For example in the case of gengtype from stage2/stage3 .text has the same size
> but .debug_info size differs by one byte.  But gentype code differs as well:
> 
> @@ -2884,6 +2884,7 @@
>xx:  xx xx xx xx xx  xxllq  xx <_ZLxxpxxk_state_tokeni>
>xx:  xx xx xx xx xx  mov$0x1,%xxi
>xx:  xx xx xxmov%rax,%rxx
> +  xx:  xx xx xx xx mov%rax,(%rsp)

This is something else and such changes may make partitioning to diverge. So
perhaps
first debug this one?
>xx:  xx xx xx xx xx  xxllq  xx <_ZLxxpxxk_state_tokeni>
>xx:  xx xx xx xx xx  mov$0x2,%xxi
>xx:  xx xx xxmov%rax,%rbp
> ...
> 
> -- 
> You are receiving this mail because:
> You reported the bug.

[Bug target/71991] Inconsistency for __attribute__ ((__always_inline__)) among LTO and non-LTO compilation

2018-04-10 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71991

--- Comment #10 from Jan Hubicka  ---
> > Well, I have tried to discuss this on IRC couple times but we got no
> > conclusion what to do here. I think I will simply go with the proposed patch
> > + additional hack to ignore arch mismatches when callee has no explicit
> > target attribute?
> 
> Or we can reject that and fix the affected packages if you believe it's just
> hack. Eventually one can remove the problematic always_inline attribute.

Well, in theory, it is all lazyness on the side of gcc - of course one can in
theory track ISA attributes at instruction granuality and make all those
inlines safe. But of course this is becuase optimize/target attributes are
hacks by themselves.

The proposed patch is consistent with hacks we already have in ipa-inline

  /* When user added an attribute to the callee honor it.  */   
  else if (lookup_attribute ("optimize", DECL_ATTRIBUTES (callee->decl))
   && opts_for_fn (caller->decl) != opts_for_fn (callee->decl)) 

It may not be that easy to fix these at user side - for example current logic
will block you from using xmmintrin.h functions from any function with custom
target attribute specifying ISA flags, which I think was one of original
motivations
for those.

[Bug lto/81004] [7 Regression] linking failed with -flto and static libboost_program_options

2018-03-08 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81004

--- Comment #34 from Jan Hubicka  ---
Hi,
I do not get any assertion at lto.c:840, so please also check if that still
reproduce
with current branch and let us know the backtrace.

Honza

[Bug lto/81004] [7 Regression] linking failed with -flto and static libboost_program_options

2018-03-08 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81004

--- Comment #31 from Jan Hubicka  ---
I will take a look.

Honza

[Bug c/84229] A valid code rejected with -flto

2018-02-26 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84229

--- Comment #9 from Jan Hubicka  ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84229
> 
> --- Comment #8 from rguenther at suse dot de  ---
> On Mon, 26 Feb 2018, hubicka at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84229
> > 
> > --- Comment #7 from Jan Hubicka  ---
> > I am not sure it is really fixed.  We no longer ICE, howeverw we need
> > backporting to release branches and also I think we miss fortification 
> > whenever
> > we fail to inline (that is with -Os). I have some patches to inline more of
> > these fortify wrappers, but still not all of them.
> 
> The fortify wrappers are all extern inline __attribute__((gnu_inline)).
> The glibc ones, that is.

Well, the bug did reproduce for me on firefox build where we did not early
inline fortify wrapper for open and later we tried to be smart to clone it
for common parameters.

Honza

[Bug tree-optimization/83043] [8 Regression] FAIL: libgomp.graphite/force-parallel-1.c scan-tree-dump-times graphite "2 loops carried no dependency" 1 (found 0 times)

2018-02-15 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83043

--- Comment #9 from Jan Hubicka  ---
> This is just graphite, shouldn't it be P4?

I would say so. Indeed it is most probably some loop becoming cold and thus
quite harmless bug.  I will look into it.

[Bug c/84229] A valid code rejected with -flto

2018-02-07 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84229

--- Comment #2 from Jan Hubicka  ---
> Huh, __builtin_va_arg_pack_len is only valid within varargs functions.  The
> difference seems to be that without LTO we simply emit a library call and
> not emit aa.i:open (which is invalid) but with LTO we inline the call
> and thus end up trying to expand the builtin.
> 
> So it's rather a accepts-invalid without LTO ;)
> 
> Or LTO misbehaving in not throwing away extern inline functions that are not
> used within TU boundaries...  Honza - any opinion on that?

Well, i would say that the code is invalid.
Extern functions and LTO are fishy as at least for C semantics they can
differ in implementation between units.  We never supported that correctly
though we can do so now via syntactic aliases.  Perhaps something for next
stage1 (we could also use this to behave more consistently for comdats
built with different optimization flags in different units).
I think this is more QOI issue though.

Honza

[Bug lto/81004] [7/8 Regression] linking failed with -flto and static libboost_program_options

2018-02-05 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81004

--- Comment #21 from Jan Hubicka  ---
> It's really fixed on trunk since r257023.

Seems like it just went latent. I do not see how that change can fix the
problem.

Honza

[Bug lto/84044] Spurious -Wodr warning with -flto

2018-01-26 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84044

--- Comment #8 from Jan Hubicka  ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84044
> 
> --- Comment #7 from rguenther at suse dot de  ---
> On Fri, 26 Jan 2018, marxin at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84044
> > 
> > --- Comment #5 from Martin Liška  ---
> > (In reply to Richard Biener from comment #4)
> > > (In reply to Martin Liška from comment #3)
> > > > Fixed on trunk by Richi's r256685. Is it intentional Richi that the 
> > > > revision
> > > > should fix such situations?
> > > 
> > > Not really.  It means that the following hunk removes too many TYPE_BINFOs
> > > I guess?
> > > 
> > > @@ -5150,15 +5145,9 @@ free_lang_data_in_type (tree type)
> > > {
> > >   free_lang_data_in_binfo (TYPE_BINFO (type));
> > >   /* We need to preserve link to bases and virtual table for all
> > > -polymorphic types to make devirtualization machinery working.
> > > -Debug output cares only about bases, but output also
> > > -virtual table pointers so merging of -fdevirtualize and
> > > --fno-devirtualize units is easier.  */
> > > - if ((!BINFO_VTABLE (TYPE_BINFO (type))
> > > -  || !flag_devirtualize)
> > > - && ((!BINFO_N_BASE_BINFOS (TYPE_BINFO (type))
> > > -  && !BINFO_VTABLE (TYPE_BINFO (type)))
> > > - || debug_info_level != DINFO_LEVEL_NONE))
> > > +polymorphic types to make devirtualization machinery 
> > > working. 
> > > */
> > > + if (!BINFO_VTABLE (TYPE_BINFO (type))
> > > + || !flag_devirtualize)
> > > TYPE_BINFO (type) = NULL;
> > > }
> > >  }
> > 
> > Reverting this hunk makes the warning visible again.
> 
> Interesting.  This means that with -g it didn't warn before either?
> Which means that maybe unconditionally
> 
>   && !BINFO_N_BASE_BINFOS (TYPE_BINFO (type))
> 
> was intented?  Honza?

Yep that was my tought exactly, but I want to take deeper look what we really
use and what not.

Honza
> 
> -- 
> You are receiving this mail because:
> You are on the CC list for the bug.

[Bug lto/84044] Spurious -Wodr warning with -flto

2018-01-26 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84044

--- Comment #6 from Jan Hubicka  ---
> > @@ -5150,15 +5145,9 @@ free_lang_data_in_type (tree type)
> > {
> >   free_lang_data_in_binfo (TYPE_BINFO (type));
> >   /* We need to preserve link to bases and virtual table for all
> > -polymorphic types to make devirtualization machinery working.
> > -Debug output cares only about bases, but output also
> > -virtual table pointers so merging of -fdevirtualize and
> > --fno-devirtualize units is easier.  */
> > - if ((!BINFO_VTABLE (TYPE_BINFO (type))
> > -  || !flag_devirtualize)
> > - && ((!BINFO_N_BASE_BINFOS (TYPE_BINFO (type))
> > -  && !BINFO_VTABLE (TYPE_BINFO (type)))
> > - || debug_info_level != DINFO_LEVEL_NONE))
> > +polymorphic types to make devirtualization machinery working. 
> > */
> > + if (!BINFO_VTABLE (TYPE_BINFO (type))
> > + || !flag_devirtualize)
> > TYPE_BINFO (type) = NULL;
> > }
> >  }
> 
> Reverting this hunk makes the warning visible again.

Hmm, I will take a look. BIFOs are bit of a magic.

Thanks!
Honza

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2018-01-22 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #49 from Jan Hubicka  ---
> matrix.c is still needing additional options to get the best out of the Ryzen
> processor. But is better than before (223029 clocks vs 371978 originally), 
> but 122677 is achievable with the right options. However the same can also be

Aha, for ryzen we would still benefit from 256 vectorization. It is not a win
overall and it will need bigger surgey to vectorizer to implement properly, so
that will wait for next stage1 unfortunately.

This is the gap between -march=znver1 -mtune=generic and -march=znver1, so
about
17%

Concerning your options -mprefer-vector-width=none -mno-fma -mno-avx2 -O3
With Martin's patch in -mno-fma should no longer have effect here.  Not sure
why -mno-avx2 would be a win either. We originally introduced it to disable
scatter/gather in the other benchmark but that one is solved too.
Do those two option still improve the scores for you.

It is alaso mystery to me why -march=ivybridge would benefit anything as the
isa is more or less supperset of znver. I will try to find more to check more.

Honza

[Bug target/83358] [8 Regression] division not converted with Intel tuning since r253934

2017-12-11 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83358

--- Comment #3 from Jan Hubicka  ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83358
> 
> --- Comment #2 from Markus Trippelsdorf  ---
> The following fixes this particular issue:
> 
> diff --git a/gcc/config/i386/x86-tune-costs.h
> b/gcc/config/i386/x86-tune-costs.h 
> index 312467d9788..00f1dae9085 100644   
> --- a/gcc/config/i386/x86-tune-costs.h  
> +++ b/gcc/config/i386/x86-tune-costs.h  
> @@ -2345,7 +2345,7 @@ struct processor_costs core_cost = {
>   
>{COSTS_N_INSNS (8),  /* cost of a divide/mod for QI */ 
>   
> COSTS_N_INSNS (8),  /*  HI */ 
>   
> /* 8-11 */  
> -   COSTS_N_INSNS (11), /*  SI */ 
>   
> +   COSTS_N_INSNS (13), /*  SI */ 
>   
> /* 24-81 */ 
> COSTS_N_INSNS (81), /*  DI */ 
>   
> COSTS_N_INSNS (81)},/* 
> other */ 
> 
> Perhaps the div costs are a bit too tight in general?

The main problem here is that the algorithm expading div/mod into shift/add/lea
does not consider at all the parallelism and thus the cost model is not
realistic.
I meant to write bencmark that will try different constatns and see if they are
after by idiv or by expanded sequence.

The original costs was still based on pentium4, so I brought them to be
consistently
basedd on latencies.
Increasing values bit over the estimated latencies (with comment on why that
was done)
is perhaps easiest short term solution.

Honza

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-29 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #34 from Jan Hubicka  ---
> So gcc loses on mt19937ar.c without -mno-avx2
> But gcc wins big on matrix.c, especially with -mprefer-vector-width=none
> -mno-fma

It is because llvm does not use vgather at all unless avx512 is present.  I
will
look into the vgather cost model tomorrow.

Honza

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-29 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #30 from Jan Hubicka  ---
Sorry, with -mno-avx2 I was speaking of the other mt benchmark.  There is no
need for gathers
in matrix multiplication...

Honza

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-28 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #27 from Jan Hubicka  ---
Hi,
one of problem here is use of vgather instruction.  It is hardly a win on Zen
architecture.
It is also on my TODO to adjust the code model to disable it for most loops.  I
only want
to benchmark if it is a win at all in some cases or not at all to set proper
weights.
You can disable it with -mno-avx2

Still the code is bit worse than for -march=amdfam10 -mtune=k8 which is bit
funny.
I will take a look at that.

Honza

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-28 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #26 from Jan Hubicka  ---
On you matrix benchmarks I get:

  Vector inside of loop cost: 44
  Vector prologue cost: 12
  Vector epilogue cost: 0
  Scalar iteration cost: 40
  Scalar outside cost: 0
  Vector outside cost: 12
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 1
mult.c:15:7: note:   Runtime profitability threshold = 4
mult.c:15:7: note:   Static estimate profitability threshold = 4

  Vector inside of loop cost: 2428
  Vector prologue cost: 4
  Vector epilogue cost: 0
  Scalar iteration cost: 2428
  Scalar outside cost: 0
  Vector outside cost: 4
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 1
mult.c:30:7: note:   Runtime profitability threshold = 4
mult.c:30:7: note:   Static estimate profitability threshold = 4


for 128bit vectorization and for 256bit

  Vector inside of loop cost: 88
  Vector prologue cost: 24
  Vector epilogue cost: 0
  Scalar iteration cost: 40
  Scalar outside cost: 0
  Vector outside cost: 24
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 1
mult.c:15:7: note:   Runtime profitability threshold = 8
mult.c:15:7: note:   Static estimate profitability threshold = 8

  Vector inside of loop cost: 6472
  Vector prologue cost: 8
  Vector epilogue cost: 0
  Scalar iteration cost: 2428
  Scalar outside cost: 0
  Vector outside cost: 8
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 1
mult.c:30:7: note:   Runtime profitability threshold = 8
mult.c:30:7: note:   Static estimate profitability threshold = 8

So if vectorizer knew to preffer bigger vector sizes when cost is about double,
it would vectoriye first loop to
256 as expected.

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-28 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #25 from Jan Hubicka  ---
Hi,
I agree that the matric multiplication fma issue is important and hopefully it
will be fixed for GCC 8.  See
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00437.html

The irregularity of tune/arch is probably originating from enabling/disabling
fma
and avx256 preferrence.  I get
jh@d136:~> /home/jh/trunk-install-new3/bin/gcc -Ofast -march=native -mno-fma
mult.c
jh@d136:~> ./a.out
mult took 193593 clocks
jh@d136:~> /home/jh/trunk-install-new3/bin/gcc -Ofast -march=native -mno-fma
-mprefer-vector-width=256 mult.c
jh@d136:~> ./a.out
mult took 104745 clocks
jh@d136:~> /home/jh/trunk-install-new3/bin/gcc -Ofast -march=haswell
-mprefer-vector-width=256 mult.c
jh@d136:~> ./a.out
mult took 160123 clocks
jh@d136:~> /home/jh/trunk-install-new3/bin/gcc -Ofast -march=haswell
-mprefer-vector-width=256 -mno-fma mult.c
jh@d136:~> ./a.out
mult took 102048 clocks

90% difference on a common loop is quite noticeable.

Continuing my benchmarkings on spec2k.
This is -Ofast -march=native -mprefer-vector-width=none compared to 
-Ofast -march=native -mtune=haswell -mprefer-vector-width=128.
So neither of those are win compared to -mtune=native.

   164.gzip  140058.22407* 140057.92419*
   175.vpr   140037.53731* 140037.83704*
   176.gcc   110020.05494* 110020.05497*
   181.mcf   180021.68324* 180020.88660*
   186.crafty100020.94790* 100021.24722*
   197.parser180051.43499* 180051.83472*
   252.eon   130019.36749* 130018.27143*
   253.perlbmk   X X
   254.gap   X X
   255.vortexX X
   256.bzip2 150043.13483* 150043.53444*
   300.twolf 300056.65302* 300057.05267*
   Est. SPECint_base2000 4563
   Est. SPECint20004591

   168.wupwise   160030.95179* 160029.75387*
   171.swim  310027.411309* 310026.411739*
   172.mgrid 180031.05814* 180026.16895*
   173.applu 210025.78175* 210025.98096*
   177.mesa  140023.36006* 140023.36001*
   178.galgelX X
   179.art   260011.023702* 260011.023718*
   183.equake130013.010033* 130013.19944*
   187.facerec   190024.07931* 190017.211040*
   188.ammp  220034.46394* 220035.26249*
   189.lucas 200020.39864* 200020.89603*
   191.fma3d 210031.46686* 210030.07011*
   200.sixtrack  110041.72641* 110038.52856*
   301.apsi  260034.17630* 260034.27612*
   Est. SPECfp_base2000  7570
   Est. SPECfp2000 7947

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-28 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #22 from Jan Hubicka  ---
Hi,
this is same base (so you can see there is some noise) compared to haswell
tuning
   164.gzip  140057.12452* 140058.72384*
   175.vpr   140037.13776* 140038.33659*
   176.gcc   110020.05500* 110020.15464*
   181.mcf   180021.68327* 180020.98617*
   186.crafty100020.44905* 100021.04760*
   197.parser180051.33506* 180051.93466*
   252.eon   130018.27162* 130019.26781*
   253.perlbmk   X X
   254.gap   X X
   255.vortexX X
   256.bzip2 150042.43537* 150044.13401*
   300.twolf 300056.45317* 300056.35328*
   Est. SPECint_base2000 4632
   Est. SPECint20004548

   168.wupwise   160028.25667* 160028.75580*
   171.swim  310026.311807* 310027.411304*
   172.mgrid 180026.06930* 180031.05810*
   173.applu 210025.58239* 210025.68193*
   177.mesa  140023.45970* 140022.96116*
   178.galgelX X
   179.art   260010.923807* 260010.425014*
   183.equake130012.910039* 130012.910060*
   187.facerec   190017.311009* 190020.89135*
   188.ammp  220034.26441* 220034.26428*
   189.lucas 200020.79683* 200020.79679*
   191.fma3d 210029.77060* 210031.56660*
   200.sixtrack  110038.62847* 110040.92687*
   301.apsi  260033.17866* 260032.77952*
   Est. SPECfp_base2000  8045
   Est. SPECfp2000 7766

So mes, arta and mcf sems to benefit from Haswell tunning.
Mesa is vectorization problem (we vectorize cold loop and introduce too much
of register pressure)

What is however interesting is that zen tuning with 256bit vectorization seems
to be worse than haswell tuning.  I will run haswell with 128bit vector size.

What your  matrix multiplication benchmark runs into is issue with multiply
and add instruction.  Once machine is free I will try it, but disabling fmadd
may solve the regression.

Honza

Honza

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-28 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #21 from Jan Hubicka  ---
Hi,
this is comparing SPEC2000 -Ofast -march=native -mprefer-vector-width=128
to -Ofast -march=native -mprefer-vector-width=256 on Ryzen.

   168.wupwise   160028.25669* 160030.85187*
   171.swim  310026.411763* 310027.511261*
   172.mgrid 180026.16907* 180030.95827*
   173.applu 210025.58234* 210025.78161*
   177.mesa  140023.45971* 140023.26030*
   178.galgelX X
   179.art   260010.923752* 260010.923777*
   183.equake130012.910047* 130012.910063*
   187.facerec   190017.211025* 190024.07921*
   188.ammp  220034.26431* 220034.46397*
   189.lucas 200020.39859* 200020.49807*
   191.fma3d 210029.77061* 210031.46694*
   200.sixtrack  110038.82834* 110041.52648*
   301.apsi  260033.07873* 260033.17856*
   Est. SPECfp_base2000  8049
   Est. SPECfp2000 7590

   164.gzip  140057.12450* 140058.02413*
   175.vpr   140037.43746* 140037.53733*
   176.gcc   110020.25450* 110020.05489*
   181.mcf   180021.78310* 180021.48402*
   186.crafty100020.54874* 100020.94794*
   197.parser180051.73481* 180051.53498*
   252.eon   130018.27154* 130019.26759*
   253.perlbmk   X X
   254.gap   X X
   255.vortexX X
   256.bzip2 150042.63522* 150042.93496*
   300.twolf 300056.55313* 300056.35330*
   Est. SPECint_base2000 4612
   Est. SPECint20004575

So it does not seem to be win in general.  I will compare with -mtune=haswell
now

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-27 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #15 from Jan Hubicka  ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
> 
> --- Comment #14 from Andrew Roberts  ---
> It would be nice if znver1 for -march and -mtune could be improved before the
> gcc 8 release. At present -march=znver1 -mtune=znver1 looks be to about the
> worst thing you could do, and not just on this vectorizable code. And given we
> tell people to use -march=native which gives this, it would be nice to 
> improve.

We benchmarked znver1 tuning quite thoroughly with spec2000, spec2006 and 2017
and istuation is not that bad. 
In August, with -O2 native tuning is about 0.3% (for both in and fp) better
than generic (this does not include vectorization becuase of -O2 and keep in
mind that spec is often bound by memory, 0.3% difference is quite noticable).
All regressions in individual benchmarks were under 2% and some fixed since
then.

For -Ofast the difference is about 0.5% for integer with two notable
regressions
wich have WIP solutions for.

Integer/core tuning went worse than generic so things was as indtended.

I will quickly re-test 256bit vectorization with specfp2k (that is fast).
Please attach regressing testcases you have and I will take a look, too.

Honza

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-27 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #13 from Jan Hubicka  ---
> So is this option still helping with the latest microcode? Not in this case at
> least.

It is on my TODO list to re-benchmark 256bit vectorization for Zen.  I do not
think microcode is a big difference here.  Using 256 bit vectors has advantage
of exposing more of parallelism but also disadvantage of requiring more
involved setup.  So for loops that vectorize naturally (like matrix
multiplication) it can be win, while for loops that are difficult to vectorize
it is a loss. So I think the early benchmarks did not look consistent and it is
why 128bit mode was introduced.

It is not that different form vectorizing for K8 which had split SSE registers
in a similar fashion or for kabylake which splits 512 bit operations.

While rewriting the cost-model I tried to keep this in mind and more acurately
model the split operations, so it may be possible to switch to 256 by default.

Ideally vectorizer should make a deicsion whether 128 or 256 is win for
partiuclar loop but it doesn't seem to have infrastructure to do so.
My plan is to split current flag into two - preffer 128bit and assume
that registers are internally split and see if that is enough to get consistent
win for 256 bit vectorization.

Richi may know better.

Honza

[Bug bootstrap/83015] [8 regression] bootstrap comparison failure on ia64

2017-11-24 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83015

--- Comment #18 from Jan Hubicka  ---
With the release checking in stage1 it reproduces on x86-64, too.
I am testing
Index: ipa-inline.c
===
--- ipa-inline.c(revision 255103)
+++ ipa-inline.c(working copy)
@@ -1865,6 +1865,8 @@ inline_small_functions (void)
  gcc_assert (cached_badness == current_badness);
  gcc_assert (current_badness >= badness);
}
+  else
+current_badness = edge_badness (edge, false);
 #else
   current_badness = edge_badness (edge, false);
 #endif

[Bug bootstrap/82831] [8 Regression] Broken PGO bootstrap on i586-linux-gnu after r254379

2017-11-13 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82831

--- Comment #17 from Jan Hubicka  ---
> > This seems reasonable things to do. Only what BB reorder misses is that it
> > may do the partitining fixup after the duplication. I am not sure if that 
> > is 
> > desirable as that would affect existing trace that may need to be updated, 
> > too.
> 
> The result is suboptimal though, since you end up with a (cold) block in the
> hot partition whose only predecessors are in the cold partition.  What happens
> in this case if copy_bb_p returns false for the problematic block, i.e. if you
> move the test I added lines 579-584 into the copy_bb_p predicate itself?  Does
> this result in a better reordered sequence of blocks?

I have only dumps. Martin, would it be easy for you to rebuild it with the
change?

I would say that the duplication here is desriable optimization. Disabling it
would only pesmize code in hot path that we don't want to do. So we would paper
around missed optimization ICE by disabling another optimization.

Right thing to do would be to move that BB into cold partition. In this case
it won't be hard as it is not partitioned yet, so one would only need to check
that partitioning fixes are OK.

Other option may be to make partitining fixes to adjust the BB ordering.  I.e.
move all blocks promoted to be cold into beggining of cold section.  That would
still result in not completely optimal BB placement though.

Honza

  1   2   3   4   5   6   7   8   9   10   >