[PATCH] Account for prologue spills in reg_pressure scheduling

2014-10-19 Thread Maxim Kuvyrkov
Hi,

This patch improves register pressure scheduling (both SCHED_PRESSURE_WEIGHTED 
and SCHED_PRESSURE_MODEL) to better estimate number of available registers.

At the moment the scheduler does not account for spills in the prologues and 
restores in the epilogue, which occur from use of call-used registers.  The 
current state is, essentially, optimized for case when there is a hot loop 
inside the function, and the loop executes significantly more often than the 
prologue/epilogue.  However, on the opposite end, we have a case when the 
function is just a single non-cyclic basic block, which executes just as often 
as prologue / epilogue, so spills in the prologue hurt performance as much as 
spills in the basic block itself.  In such a case the scheduler should 
throttle-down on the number of available registers and try to not go beyond 
call-clobbered registers.

The patch uses basic block frequencies to balance the cost of using call-used 
registers for intermediate cases between the two above extremes.

The motivation for this patch was a floating-point testcase on 
arm-linux-gnueabihf (ARM is one of the few targets that use register pressure 
scheduling by default).

A "thanks" goes to Richard good discussion of the problem and suggestions on 
the approach to fix it.

The patch was bootstrapped on x86_64-linux-gnu (which doesn't really exercises 
the patch), and cross-tested on arm-linux-gnueabihf and aarch64-linux-gnu.

OK to apply?

--
Maxim Kuvyrkov
www.linaro.org




0001-sched_class_reg_num.ChangeLog
Description: Binary data


0001-sched_class_reg_num.patch
Description: Binary data


Re: ping x 7: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork

2014-10-19 Thread Nathaniel Smith
Hi Jakub,

Thanks for your feedback! See below.

On Thu, Oct 16, 2014 at 4:52 PM, Jakub Jelinek  wrote:
> On Mon, Oct 13, 2014 at 10:16:19PM +0100, Nathaniel Smith wrote:
>> Got total silence the last 4 times I posted this, and users have been
>> bugging me about it offline, so trying again.
>>
>> This patch fixes a showstopper problem preventing the transparent use
>> of OpenMP in scientific libraries, esp. with Python. Specifically, it
>> is currently not possible to use GNU OpenMP -- even in a limited,
>> temporary manner -- in any program that uses (or might use) fork() for
>> parallelism, even if the fork() and the use of OpenMP occur at totally
>> different times. This limitation is unique to GNU OpenMP -- every
>> competing OpenMP implementation already contains something like this
>> patch. While technically not fully POSIX-compliant (because POSIX
>> gives much much weaker guarantees around fork() than any real Unix),
>> the approach used in this patch (a) performs only POSIX-compliant
>> operations when the host program is itself fully POSIX-compliant, and
>> (b) actually works perfectly reliably in practice on all commonly used
>> platforms I'm aware of.
>
> 1) gomp_we_are_forked in your patch will attempt to free the pool
>of the thread that encounters it, which is racy; consider a program
>after fork calling pthread_create several times, each thread
>thusly created then ~ at the same time doing #pragma omp parallel
>and the initial thread too.  You really should clean up the pool
>data structure only in the initial thread and nowhere else;
>for native TLS (non-emulated, IE model) the best would be to have a flag
>in the gomp_thread_pool structure,
>struct gomp_thread *thr = gomp_thread ();
>if (thr && thr->thread_pool)
>  thr->thread_pool->after_fork = true;
>should in that case be safe in the atfork child handler.
>For !HAVE_TLS or emulated TLS not sure if it is completely safe,
>it would call pthread_getspecific.  Perhaps just don't register
>atfork handler on those targets at all?

Good point. The updated patch below takes a slightly different
approach. I moved we_are_forked to the per-thread struct, and then I
moved the setting of it into the *parent* process's fork handlers --
the before-fork handler toggles it to true, then the child spawns off
and inherits this setting, and then the parent after-fork handler
toggles it back again. (Since it's per-thread, there's no race
condition here.) This lets us remove the child after-fork handler
entirely, and -- since the parent handlers aren't subject to any
restrictions on what they can call -- it works on all platforms
regardless of the TLS implementation.

> 2) can you explain why are you removing the cleanups from
>gomp_free_pool_helper ?

They aren't removed, but rather moved from the helper function (which
runs in the helper threads) into gomp_free_thread_pool (which runs in
the main thread) -- which makes it easier to run the appropriate
cleanups even in the case where the helper threads aren't running.
(But see below -- we might prefer to drop this part of the patch
entirely.)

> 3) you can call pthread_atfork many times (once for each pthread
>that creates a thread pool), that is undesirable, you want to do that
>only if the initial thread creates thread pool

Good point. I've moved the pthread_atfork call to initialize_team,
which is an __attribute__((constructor)).

I am a little uncertain whether this is the best approach, though,
because of the comment in team_destructor about wanting to correctly
handle dlopen/dlclose. One of pthread_atfork's many (many) limitations
is that there's no way to unregister handlers, so if dlopen/dlclose is
important (is it?) then we can't call pthread_atfork from
initialize_team.

If this is a problem, then we could delay the pthread_atfork until
e.g. the first thread pool is spawned -- would this be preferred?

> 4) the testcase is clearly not portable enough, should be probably limited
>to *-*-linux* only, fork etc. will likely not work on many targets.

I think it should work on pretty much any target that has fork(); we
definitely care about having this functionality on e.g. OS X. I've
added some genericish target specifications.

> In any case, even with the patch, are you aware that you'll leak megabytes
> of thread stacks etc.?

Well, err, I wasn't, no :-). Thanks for pointing that out.

To me this does clinch the argument that a better approach would be
the one I suggested in
   https://gcc.gnu.org/ml/gcc-patches/2014-02/msg00979.html
i.e., of tracking whether any threadprivate variables were present,
and if not then simply shutting down the thread pools before forking.
But this would be a much more invasive change to gomp (I wouldn't know
where to start).

In the mean time, the current patch is still worthwhile. The cost is
not that bad: I wouldn't think of it as "leaking" so much as "overhead
of supporting OMP->fork->OMP".

Re: [GOOGLE] Increase max-early-inliner-iterations to 2 for profile-gen and use

2014-10-19 Thread Xinliang David Li
On Sat, Oct 18, 2014 at 4:19 PM, Xinliang David Li  wrote:
> On Sat, Oct 18, 2014 at 3:27 PM, Jan Hubicka  wrote:
>>> The difference in instrumentation runtime is huge -- as topn profiler
>>> is pretty expensive to run.
>>>
>>> With FDO, it is probably better to make early inlining more aggressive
>>> in order to get more context sensitive profiling.
>>
>> I agree with that, I just would like to understand where increasing the 
>> iterations
>> helps and if we can handle it without iterating (because Richi originally 
>> requested to
>> drop the iteration for correcness issues)
>> Do you have some examples?
>
> We can do FDO experiment by shutting down einline. (Note that
> increasing iteration to 2 did not actually improve performance with
> our benchmarks).

Early inlining itself has large performance impact for FDO (the
runtime of the profile-use build). With it disabled, the FDO
performance drops by >2% on average. The degradation is seen across
all benchmarks except for one.

David


>
> David
>
>> Honza
>>>
>>> David
>>>
>>> On Sat, Oct 18, 2014 at 10:05 AM, Jan Hubicka  wrote:
>>> >> Increasing the number of early inliner iterations from 1 to 2 enables 
>>> >> more
>>> >> indirect calls to be promoted/inlined before instrumentation. This in 
>>> >> turn
>>> >> reduces the instrumentation overhead, particularly for more expensive 
>>> >> indirect
>>> >> call topn profiling.
>>> >
>>> > How much difference you get here? One posibility would be also to run 
>>> > specialized
>>> > ipa-cp before profile instrumentation.
>>> >
>>> > Honza
>>> >>
>>> >> Passes internal testing and regression tests. Ok for google/4_9?
>>> >>
>>> >> 2014-10-18  Teresa Johnson  
>>> >>
>>> >> Google ref b/17934523
>>> >> * opts.c (finish_options): Increase max-early-inliner-iterations 
>>> >> to 2
>>> >> for profile-gen and profile-use builds.
>>> >>
>>> >> Index: opts.c
>>> >> ===
>>> >> --- opts.c  (revision 216286)
>>> >> +++ opts.c  (working copy)
>>> >> @@ -870,6 +869,14 @@ finish_options (struct gcc_options *opts, struct g
>>> >>  opts->x_param_values, opts_set->x_param_values);
>>> >>  }
>>> >>
>>> >> +  if (opts->x_profile_arc_flag
>>> >> +  || opts->x_flag_branch_probabilities)
>>> >> +{
>>> >> +  maybe_set_param_value
>>> >> +   (PARAM_EARLY_INLINER_MAX_ITERATIONS, 2,
>>> >> +opts->x_param_values, opts_set->x_param_values);
>>> >> +}
>>> >> +
>>> >>if (!(opts->x_flag_auto_profile
>>> >>  || (opts->x_profile_arc_flag || 
>>> >> opts->x_flag_branch_probabilities)))
>>> >>  {
>>> >>
>>> >>
>>> >> --
>>> >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-19 Thread Xinliang David Li
No, Google branch is not yet synced to 4.9 head.

David

On Sun, Oct 19, 2014 at 1:11 PM, Jan Hubicka  wrote:
>> It was in upstream 4.9 branch, but got reverted in r215061 to fix
>> PR/61214 and PR/62224.
>
> Was your tests done with this change or without?
>
> Honza


Re: [fortran,patch] Handle infinities and NaNs in intrinsics code generation

2014-10-19 Thread FX
> Looks good to me. Thanks for taking care of F2003's IEEE support.

Committed as rev. 216443, thanks for the review.


> PS: You might want to browse through the current (F2008 + corrigenda
> + first F2015 additions) draft at 
> http://j3-fortran.org/doc/year/14/14-007r2.pdf
> 
> See especially the list at the beginning under the item
> "Changes to the intrinsic modules IEEE_ARITHMETIC, IEEE_EXCEPTIONS, and
> IEEE_FEATURES for conformance with ISO/IEC/IEEE 60559:2011: [...]"
> and then later in that file.

Thanks for the link.
I’d rather wait until later in the process, and let the existing F2003 / F2008 
parts mature & be tested for now.

FX

Re: [PATCH] Fix for PR63583

2014-10-19 Thread Jan Hubicka
> Hello.
> 
> I added missing gimple_asm_string comparison for a function with an asm 
> statement.
> Bootstrap and regression tests still running, ready for trunk after it 
> finishes?

OK.
(I remember pointing this out at review :))

Honza
> 
> Thank you,
> Martin

> gcc/ChangeLog:
> 
> 2014-10-19  Martin Liska  
> 
>   * ipa-icf-gimple.c (func_checker::compare_gimple_asm):
>   Gimple tempate string is compared.
> 
> gcc/testsuite/ChangeLog:
> 
> 2014-10-19  Martin Liska  
> 
>   * gcc.dg/ipa/pr63595.c: New test.

> diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c
> index 792a3e4..1369b74 100644
> --- a/gcc/ipa-icf-gimple.c
> +++ b/gcc/ipa-icf-gimple.c
> @@ -863,6 +863,9 @@ func_checker::compare_gimple_asm (gimple g1, gimple g2)
>if (gimple_asm_nclobbers (g1) != gimple_asm_nclobbers (g2))
>  return false;
>  
> +  if (strcmp (gimple_asm_string (g1), gimple_asm_string (g2)) != 0)
> +return return_false_with_msg ("ASM strings are different");
> +
>for (unsigned i = 0; i < gimple_asm_ninputs (g1); i++)
>  {
>tree input1 = gimple_asm_input_op (g1, i);
> diff --git a/gcc/testsuite/gcc.dg/ipa/pr63595.c 
> b/gcc/testsuite/gcc.dg/ipa/pr63595.c
> new file mode 100644
> index 000..9c9f3bf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/ipa/pr63595.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-ipa-icf-details"  } */
> +
> +static int f(int t) __attribute__((noinline));
> +
> +static int g(int t) __attribute__((noinline));
> +static int g(int t)
> +{
> +asm("addl %0, 1": "+r"(t));  
> +  return t;
> +}
> +static int f(int t)
> +{
> +asm("addq %0, -1": "+r"(t));
> +  return t;
> +}
> +
> +
> +int h(int t)
> +{
> +return f(t) + g(t);
> +}
> +
> +/* { dg-final { scan-ipa-dump "ASM strings are different" "icf"  } } */
> +/* { dg-final { scan-ipa-dump "Equal symbols: 0" "icf"  } } */
> +/* { dg-final { cleanup-ipa-dump "icf" } } */



[PATCH] Fix for PR63583

2014-10-19 Thread Martin Liška
Hello.

I added missing gimple_asm_string comparison for a function with an asm 
statement.
Bootstrap and regression tests still running, ready for trunk after it finishes?

Thank you,
Martin
gcc/ChangeLog:

2014-10-19  Martin Liska  

* ipa-icf-gimple.c (func_checker::compare_gimple_asm):
Gimple tempate string is compared.

gcc/testsuite/ChangeLog:

2014-10-19  Martin Liska  

* gcc.dg/ipa/pr63595.c: New test.
diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c
index 792a3e4..1369b74 100644
--- a/gcc/ipa-icf-gimple.c
+++ b/gcc/ipa-icf-gimple.c
@@ -863,6 +863,9 @@ func_checker::compare_gimple_asm (gimple g1, gimple g2)
   if (gimple_asm_nclobbers (g1) != gimple_asm_nclobbers (g2))
 return false;
 
+  if (strcmp (gimple_asm_string (g1), gimple_asm_string (g2)) != 0)
+return return_false_with_msg ("ASM strings are different");
+
   for (unsigned i = 0; i < gimple_asm_ninputs (g1); i++)
 {
   tree input1 = gimple_asm_input_op (g1, i);
diff --git a/gcc/testsuite/gcc.dg/ipa/pr63595.c b/gcc/testsuite/gcc.dg/ipa/pr63595.c
new file mode 100644
index 000..9c9f3bf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr63595.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-icf-details"  } */
+
+static int f(int t) __attribute__((noinline));
+
+static int g(int t) __attribute__((noinline));
+static int g(int t)
+{
+asm("addl %0, 1": "+r"(t));  
+  return t;
+}
+static int f(int t)
+{
+asm("addq %0, -1": "+r"(t));
+  return t;
+}
+
+
+int h(int t)
+{
+return f(t) + g(t);
+}
+
+/* { dg-final { scan-ipa-dump "ASM strings are different" "icf"  } } */
+/* { dg-final { scan-ipa-dump "Equal symbols: 0" "icf"  } } */
+/* { dg-final { cleanup-ipa-dump "icf" } } */


Re: [AArch64] Add --enable-fix-cortex-a53-835769 configure-time option

2014-10-19 Thread Gerald Pfeifer

On Friday 2014-10-10 11:53, Kyrill Tkachov wrote:

This adds a new configure-time option --enable-fix-cortex-a53-835769 that
will enable the Cortex-A53 erratum fix by default
so you don't have to specify -mfix-cortex-a53-835769 every time.

Documentation in install.texi is added.


Thank you.  Can you please also update gcc-5/changes.html on the
web side of things?

Gerald


Re: [wwwdocs] Add recent C++ changes to gcc-5/changes.html

2014-10-19 Thread Gerald Pfeifer

On Friday 2014-10-17 13:34, Jonathan Wakely wrote:

Index: htdocs/gcc-5/changes.html
===
@@ -128,12 +164,13 @@
  
 Class std::experimental::any; 
 Function template std::experimental::apply; 
+ Variable templates for type traits; 

The trailing semi-colon feels a bit odd, doesn't it?

(Also, when using a semi-colon, wouldn't the next word -- Function
and Variable here -- be lower-case?)

Gerald


Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-19 Thread Jan Hubicka
> It was in upstream 4.9 branch, but got reverted in r215061 to fix
> PR/61214 and PR/62224.

Was your tests done with this change or without?

Honza


[PATCH, committed] Set SECTION_EXCLUDE flag for LTO sections.

2014-10-19 Thread Ilya Verbin
Hello,

This patch sets SECTION_EXCLUDE flag for LTO sections in lhd_begin_section and
fixes the corresponding gcc_GAS_CHECK_FEATURE.

Approved here: https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01535.html
Bootstrapped and regtested on i686-linux and x86_64-linux, committed to trunk.

  -- Ilya


2014-10-19  Ilya Verbin  

gcc/
* configure: Regenerate.
* configure.ac: Move the test for section attribute specifier "e" in GAS
out to all i[34567]86-*-* | x86_64-*-* targets and add --fatal-warnings.
* langhooks.c (lhd_begin_section): Set SECTION_EXCLUDE flag.
* varasm.c (default_elf_asm_named_section): Guard SECTION_EXCLUDE with
ifdef HAVE_GAS_SECTION_EXCLUDE.

---

diff --git a/gcc/configure b/gcc/configure
index bd1215d..16f128f 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -24676,9 +24676,12 @@ $as_echo "$as_me: WARNING: LTO for $target requires 
binutils >= 2.20.1, but vers
  ;;
  esac
fi
-   # Test if the assembler supports the section flag 'e' for specifying
-   # an excluded section.
-   { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for 
.section with e" >&5
+   ;;
+esac
+
+# Test if the assembler supports the section flag 'e' for specifying
+# an excluded section.
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for .section 
with e" >&5
 $as_echo_n "checking assembler for .section with e... " >&6; }
 if test "${gcc_cv_as_section_has_e+set}" = set; then :
   $as_echo_n "(cached) " >&6
@@ -24691,7 +24694,7 @@ fi
   elif test x$gcc_cv_as != x; then
 $as_echo '.section foo1,"e"
 .byte 0,0,0,0' > conftest.s
-if { ac_try='$gcc_cv_as $gcc_cv_as_flags  -o conftest.o conftest.s >&5'
+if { ac_try='$gcc_cv_as $gcc_cv_as_flags --fatal-warnings -o conftest.o 
conftest.s >&5'
   { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
   (eval $ac_try) 2>&5
   ac_status=$?
@@ -24714,8 +24717,6 @@ cat >>confdefs.h <<_ACEOF
 #define HAVE_GAS_SECTION_EXCLUDE `if test $gcc_cv_as_section_has_e = yes; then 
echo 1; else echo 0; fi`
 _ACEOF
 
-   ;;
-esac
 
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for filds and 
fists mnemonics" >&5
 $as_echo_n "checking assembler for filds and fists mnemonics... " >&6; }
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 8f7c814..35ce9ee 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -3799,18 +3799,19 @@ foo:nop
  ;;
  esac
fi
-   # Test if the assembler supports the section flag 'e' for specifying
-   # an excluded section.
-   gcc_GAS_CHECK_FEATURE([.section with e], gcc_cv_as_section_has_e,
- [2,22,51],,
-[.section foo1,"e"
-.byte 0,0,0,0])
-   AC_DEFINE_UNQUOTED(HAVE_GAS_SECTION_EXCLUDE,
- [`if test $gcc_cv_as_section_has_e = yes; then echo 1; else echo 0; 
fi`],
-  [Define if your assembler supports specifying the section flag e.])
;;
 esac
 
+# Test if the assembler supports the section flag 'e' for specifying
+# an excluded section.
+gcc_GAS_CHECK_FEATURE([.section with e], gcc_cv_as_section_has_e,
+  [2,22,51], [--fatal-warnings],
+[.section foo1,"e"
+.byte 0,0,0,0])
+AC_DEFINE_UNQUOTED(HAVE_GAS_SECTION_EXCLUDE,
+  [`if test $gcc_cv_as_section_has_e = yes; then echo 1; else echo 0; fi`],
+  [Define if your assembler supports specifying the section flag e.])
+
 gcc_GAS_CHECK_FEATURE([filds and fists mnemonics],
gcc_cv_as_ix86_filds,,,
[filds mem; fists mem],,
diff --git a/gcc/langhooks.c b/gcc/langhooks.c
index 7d4c294..4bdeaa0 100644
--- a/gcc/langhooks.c
+++ b/gcc/langhooks.c
@@ -660,7 +660,7 @@ lhd_begin_section (const char *name)
 saved_section = text_section;
 
   /* Create a new section and switch to it.  */
-  section = get_section (name, SECTION_DEBUG, NULL);
+  section = get_section (name, SECTION_DEBUG | SECTION_EXCLUDE, NULL);
   switch_to_section (section);
 }
 
diff --git a/gcc/varasm.c b/gcc/varasm.c
index abb743b..4ae9d58 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -6141,8 +6141,10 @@ default_elf_asm_named_section (const char *name, 
unsigned int flags,
 
   if (!(flags & SECTION_DEBUG))
 *f++ = 'a';
+#if defined (HAVE_GAS_SECTION_EXCLUDE) && HAVE_GAS_SECTION_EXCLUDE == 1
   if (flags & SECTION_EXCLUDE)
 *f++ = 'e';
+#endif
   if (flags & SECTION_WRITE)
 *f++ = 'w';
   if (flags & SECTION_CODE)
-- 
1.7.1


[PATCH, rtl-optimization]: Remove const_alias_set

2014-10-19 Thread Uros Bizjak
Hello!

The fix that fixed scheduler issues with AND addresses (the fix
prevented early exit for MEM_READONLY_P addresses when AND alignment
addresses were involved) caused some fall-out for libgo testsuite.
These tests triggered an assert in mems_in_disjoint_alias_sets_p,
which checks for zero alias set when flag_strict_aliasing is false. We
have had some off-list discussion with Ian Lance Taylor about this
issue.

The problem was, that Go dynamically switches off flag_strict_aliasing
after compilation started and when "unsafe" package is imported
(similar to when__attribute__ ((optimize ("-fno-strict-aliasing"))) is
used in c). To mitigate this issue, the Go frontend called
varasm_init_once again to recalculated (= cleared) const_alias_set in
this case.

As observed in [1], the fix for canon_true_depence [2] that introduced
quick exit for a MEM_READONLY_P operands made const_alias_set
redundant, it is no longer user for anything.

The patch that fixed scheduling of AND operands removed early
MEM_READONLY_P exit for memory operands with AND realignment, so
operands could reach more complex code later in the function that was
able to determine dependence of memory operands. This code includes
the call to mems_in_disjoint_alias_sets_p, and the assert triggered
again for some MEM_READONLY_P operands that have had non-zero alias
set, set from the value, cached in const_alias_set from before
flag_strict_aliasing flag was cleared.

The proposed solution is to remove const_alias_set altogether. The
MEM_READONLY_P successfully supersedes const_alias_set functionality,
and this is also confirmed by the removal of the second
varasm_init_once call in the Go frontend. In an off-list discussion,
Ian agrees that attached patch should also fix the problem.

2014-10-19  Uros Bizjak  

* varasm.c (const_alias_set): Remove.
(init_varasm_once): Remove initialization of const_alias_set.
(build_constant_desc): Do not set alias set to const_alias_set.

The patch was tested on alpha-linux-gnu [3], alphaev68-linux-gnu and
x86_64-linux-gnu {,-m32} for all default languages plus Go and
obj-c++.

The patch fixes all mentioned libgo failures on alpha.

OK for mainline?

[1] https://gcc.gnu.org/ml/gcc-patches/2013-07/msg01033.html
[2] https://gcc.gnu.org/ml/gcc-patches/2010-07/msg01758.html
[3] https://gcc.gnu.org/ml/gcc-testresults/2014-10/msg02041.html

Uros.
Index: varasm.c
===
--- varasm.c(revision 216362)
+++ varasm.c(working copy)
@@ -98,11 +98,6 @@ tree last_assemble_variable_decl;
 
 bool first_function_block_is_cold;
 
-/* We give all constants their own alias set.  Perhaps redundant with
-   MEM_READONLY_P, but pre-dates it.  */
-
-static alias_set_type const_alias_set;
-
 /* Whether we saw any functions with no_split_stack.  */
 
 static bool saw_no_split_stack;
@@ -3231,7 +3226,6 @@ build_constant_desc (tree exp)
   rtl = gen_const_mem (TYPE_MODE (TREE_TYPE (exp)), symbol);
   set_mem_attributes (rtl, exp, 1);
   set_mem_alias_set (rtl, 0);
-  set_mem_alias_set (rtl, const_alias_set);
 
   /* We cannot share RTX'es in pool entries.
  Mark this piece of RTL as required for unsharing.  */
@@ -5928,7 +5922,6 @@ init_varasm_once (void)
   object_block_htab = hash_table::create_ggc (31);
   const_desc_htab = hash_table::create_ggc (1009);
 
-  const_alias_set = new_alias_set ();
   shared_constant_pool = create_constant_pool ();
 
 #ifdef TEXT_SECTION_ASM_OP


Re: [PATCH][3/n] Merge from match-and-simplify, first patterns and questions

2014-10-19 Thread Marc Glisse

Hello,

looking though the patterns on the branch (not specifically the ones 
attached here), I am surprised to see so few calls to has_single_use. In 
RTL-land, we don't even valueize if there are several uses, so the 
question doesn't occur. In generic, we assume everything is single use 
(CSE could later disagree, but that's the user's fault for writing his 
code that way). In tree-ssa-forwprop.c, helpers like get_prop_source_stmt 
do test for single use. Since has_single_use is a bit painful to use in 
.pd files (separate test for generic and constants), it might deserve 
another helper function, or a special syntax.


--
Marc Glisse


Re: Is there an error in GCC Internals Manual, chapter 16.4?

2014-10-19 Thread Andreas Schwab
Panu-Kristian Poiksalo  writes:

> Chapter 16.4, "RTL Template" in GCC Internals Manual, found at
> https://gcc.gnu.org/onlinedocs/gccint/RTL-Template.html#RTL-Template
> has this to say about match_scratch:
>
> "
> (match_scratch:m n constraint)
> This expression is also a placeholder for operand number n and
> indicates that operand must be a scratch or reg expression.
>
> When matching patterns, this is equivalent to
>
>   (match_operand:m n "scratch_operand" pred)
> "
> I'm wondering what is "pred" in this context?

I think it is supposed to be "constraint".  This error is already
present in the first revision of the file.  I have installed this patch
in trunk as obvious.  Thanks for the report.

Andreas.

* doc/md.texi (RTL Template) [match_scratch]: Correct equivalent
match_operand expression.

Index: doc/md.texi
===
--- doc/md.texi (revision 216440)
+++ doc/md.texi (working copy)
@@ -297,7 +297,7 @@
 When matching patterns, this is equivalent to
 
 @smallexample
-(match_operand:@var{m} @var{n} "scratch_operand" @var{pred})
+(match_operand:@var{m} @var{n} "scratch_operand" @var{constraint})
 @end smallexample
 
 but, when generating RTL, it produces a (@code{scratch}:@var{m})
-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-19 Thread Xinliang David Li
It was in upstream 4.9 branch, but got reverted in r215061 to fix
PR/61214 and PR/62224.

David

On Sun, Oct 19, 2014 at 1:21 AM, Jan Hubicka  wrote:
>> Hi Honza,
>>
>> As David says, we will do some more experiments with the change you
>> suggest and speculative devirtualization, but we needed to make this
>> change in part to get an internal release out. One of the issues was a
>> recent change to cp/decl2.c to make virtual function decls needed
>> under flag_devirtualize.
>
> I think that one is not in current 4.9 branch (only in mainline).
>
> Honza
>>
>> Thanks!
>> Teresa
>>
>> On Sat, Oct 18, 2014 at 4:22 PM, Jan Hubicka  wrote:
>> >>
>> >> The virtual functions will be emitted in some modules, right? If they
>> >> are hot, they will be included with the auxiliary modules. Note that
>> >> LIPO indirect call profile will point to the comdat copy that is
>> >> picked by the linker in the instrumentation build, so it guarantees to
>> >> exist.
>> >
>> > If you have COMDAT virtual inline function that is used by module FOO and
>> > LIPO's indirect inlining will work out that it goes to comdat copy of 
>> > module BAR
>> > (that won the merging).  It will make C++ parser to also parse BAR to get
>> > the function, but callgarph builder will ignore it, too.
>> > (this is new logic in analyze_function that with -fno-devirtualize will 
>> > just
>> > prevent all virtual functions not directly reachable to appear in symbol 
>> > table)
>> >
>> > I am surprised you hit the size limits with 4.9 only - for quite some time
>> > we keep all virtual functions in callgarph until inlining. In fact 4.9 is 
>> > first
>> > that works harder to drop them early (because I hit the problem with LTO
>> > where they artifically bloat the size of LTO object files)
>> >
>> > Honza
>> >>
>> >> David
>> >>
>> >>
>> >> >
>> >> > Honza
>> >> >>
>> >> >> David
>> >> >>
>> >> >>
>> >> >> On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka  wrote:
>> >> >> >> Disabling devirtualization reduces code size, both for 
>> >> >> >> instrumentation (because
>> >> >> >> many more virtual functions are kept longer and therefore 
>> >> >> >> instrumented) and for
>> >> >> >> normal optimization.
>> >> >> >
>> >> >> > OK, with profile instrumentation (that you seem to try to minimize) 
>> >> >> > i can see
>> >> >> > how you get noticeably more counters because virtual functions are 
>> >> >> > kept longer.
>> >> >> > (note that 4.9 is a lot more agressive on removing unreacable 
>> >> >> > virtual functions
>> >> >> > than earlier compilers).
>> >> >> >
>> >> >> > Instead of disabling -fdevirtualize completely (that will get you 
>> >> >> > more indirect
>> >> >> > calls and thus more topn profiling) you may consider just hacking
>> >> >> > ipa.c:walk_polymorphic_call_targets to not make the possible targets 
>> >> >> > as
>> >> >> > reachable. (see the conditional on before_inlining_p).
>> >> >> >
>> >> >> > Of course this will get you less devirtualization (but with LTO the 
>> >> >> > difference
>> >> >> > should not be big - perhaps I could make switch for that for 
>> >> >> > mainline) and less
>> >> >> > accurate profiles when you get speculative devirtualization via topn.
>> >> >> >
>> >> >> > I would be very interested to see how much difference this makes.
>> >> >> >
>> >> >> > Honza
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-19 Thread Jan Hubicka
> > In opts.c I have:
> > case OPT_fprofile_use:
> >   if (!opts_set->x_flag_branch_probabilities)
> > ...
> >   /* Indirect call profiling should do all useful transformations
> >  speculative devirtualization does.  */
> >   if (!opts_set->x_flag_devirtualize_speculatively
> >   && opts->x_flag_value_profile_transformations)
> > opts->x_flag_devirtualize_speculatively = false;
> >
> > so perhaps this hunk is somehow skipped with LIPO?
> 
> The 1.68% size reduction I measured is with plain O2 compilation, not LIPO.

OK, in that case I may look into trimming down the speculation with non-LTO 
compilation.
The code makes assumptions about completeness of type hiearchy that without LTO 
are
quite far reached. I tested it on firefox and there it seemed to do mostly
good job, so I decided to try to enable it for non-LTO, too.

It would be nice to understand what really happens in your case and if there
are lot of wrong speculations or if the speculated calls simply happens to not
be on hot paths in your benchmark.
> 
> >
> > Speculative devirtualization is somehwat less useful (may have more falce
> > positives) without LTO depending on how your headers are constructed.
> > It would be interesting to see if it does a lot of mistakes on your 
> > codebase.
> > (this can be easily done by forcing it to run with profile feedback, too and
> > it will tell you when its speculation differs from speculation already 
> > there).
> >>
> >> By the way, you mentioned 'hacking the
> >> ipa.c:walk_polymorphic_call_targets to not make the possible targets
> >> as
> >> reachable' -- is that something worth doing in trunk?   With that, we
> >> can probably just turn off speculative devirtualization.
> >
> > Well, the check is there to enable inlining. Disabling it for
> > -fprofile-generate will result in lost profile samples for virtual 
> > functions.
> > Disabling it by default will prevent inlining of devirtualized calls making
> > devirtualization not really useful.
> > Perhaps with LIPO situation is bit different because you bring in the other
> > module just to inline the call as you describe.
> 
> What I meant is whether it is suitable for plain build ? I have not
> looked at the details.

Yes, dropping the reachability walk should just work, only result in less
inlining.

Honza
> 
> >
> > One thing I can imagine doing is to make inliner consider the reachable
> > (in post-inlining sense, that is after removing extern inlines and virtual
> > functions) calls with priority and account only those to unit growth model.
> > This would make it more consistent over -fdevirtualize and more realistic
> > about resulting code size.
> >
> > I sort of considered this option but did not have any good data suggesting
> > I should implement it.
> >
> > In general it would be nice to understand this problem.  Also I plan to do
> > some retunning for 5.0 so it would be nice to know if you have other issues
> > with 4.9? (I did not closely followed Google branch changes, so if you can
> > point out those that are relevant for IPA tuning, I would be very interested
> > to see what problems you hit).
> 
> Ok I will collect a list of inlining related changes and let you know latter.
> 
> thanks,
> 
> David
> 
> >
> > Honza
> >>
> >> David
> >>
> >>
> >>
> >> >
> >> > Honza
> >> >>
> >> >> David
> >> >>
> >> >>
> >> >> >
> >> >> > Honza
> >> >> >>
> >> >> >> David
> >> >> >>
> >> >> >>
> >> >> >> On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka  wrote:
> >> >> >> >> Disabling devirtualization reduces code size, both for 
> >> >> >> >> instrumentation (because
> >> >> >> >> many more virtual functions are kept longer and therefore 
> >> >> >> >> instrumented) and for
> >> >> >> >> normal optimization.
> >> >> >> >
> >> >> >> > OK, with profile instrumentation (that you seem to try to 
> >> >> >> > minimize) i can see
> >> >> >> > how you get noticeably more counters because virtual functions are 
> >> >> >> > kept longer.
> >> >> >> > (note that 4.9 is a lot more agressive on removing unreacable 
> >> >> >> > virtual functions
> >> >> >> > than earlier compilers).
> >> >> >> >
> >> >> >> > Instead of disabling -fdevirtualize completely (that will get you 
> >> >> >> > more indirect
> >> >> >> > calls and thus more topn profiling) you may consider just hacking
> >> >> >> > ipa.c:walk_polymorphic_call_targets to not make the possible 
> >> >> >> > targets as
> >> >> >> > reachable. (see the conditional on before_inlining_p).
> >> >> >> >
> >> >> >> > Of course this will get you less devirtualization (but with LTO 
> >> >> >> > the difference
> >> >> >> > should not be big - perhaps I could make switch for that for 
> >> >> >> > mainline) and less
> >> >> >> > accurate profiles when you get speculative devirtualization via 
> >> >> >> > topn.
> >> >> >> >
> >> >> >> > I would be very interested to see how much difference this makes.
> >> >> >> >
> >> >> >> > Honza


Re: [C PATCH] Another initialization fix (PR c/63567)

2014-10-19 Thread Joseph S. Myers
On Sun, 19 Oct 2014, Marek Polacek wrote:

> It turned out that there is another spot where we need to allow
> initializing objects with static storage duration with compound
> literals even in C99 -- when the compound literal is inside the
> initializer.  Fixed in the same way as previously.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2014-10-18  Marek Polacek  
> 
>   PR c/63567
>   * c-typeck.c (output_init_element): Allow initializing objects with
>   static storage duration with compound literals even in C99 and add
>   pedwarn for it.
> 
>   * gcc.dg/pr63567-3.c: New test.
>   * gcc.dg/pr63567-4.c: New test.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Fix PR preprocessor/42014

2014-10-19 Thread Joseph S. Myers
On Sat, 18 Oct 2014, Krzesimir Nowak wrote:

> +  pp_verbatim (context->printer,
> +"%s from %r%s:%d%R", prefix, "locus",

> +   diagnostic_report_from (context, map, "In file included");

We don't want to split up diagnostic text like that, because for 
translation it may be necessary to translate the whole "In file included 
from" text together rather than expecting two fragments to go together in 
the same way they do in English.  Now, right now this message isn't marked 
for translation anyway (an independent bug there's no need for you to 
fix), but still as a design principle things should be structured to avoid 
splitting up English fragments like that.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH doc] Explain options precedence and difference between -pedantic-errors and -Werror=pedantic

2014-10-19 Thread Joseph S. Myers
On Sat, 18 Oct 2014, Manuel López-Ibáñez wrote:

> What about this version?
> 
> Give an error whenever the @dfn{base standard} (see @option{-Wpedantic})
> requires a diagnostic, in cases where there is undefined behavior at
> compile-time

Only in *some* such cases of compile-time undefined behavior.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-19 Thread Xinliang David Li
On Sun, Oct 19, 2014 at 1:19 AM, Jan Hubicka  wrote:
>> >
>> > I am surprised you hit the size limits with 4.9 only - for quite some time
>> > we keep all virtual functions in callgarph until inlining. In fact 4.9 is 
>> > first
>> > that works harder to drop them early (because I hit the problem with LTO
>> > where they artifically bloat the size of LTO object files)
>>
>> We can dig it more to later understand why only 4.9 hits the problem.
>
> This would be very interesting, because 4.9 ought to be better here
> (removing more virtuals early) than previous compilers.
> There was number of changes in 4.9 that may affect this - some fixes at
> C++ side giving middle end more inline candidates and also this change
> https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00834.html
> So perhaps most of your virtual functions are !COMDAT&&!EXTERNAL, but that
> does not seem to make much sense to me either :(
>>
>> My size results with -fno-devirtualize-speculatively is out. It
>> shrinks size by 1.68% -- slightly more than -fdevirtualize can do in
>> O2 compile.
>
> Hmm, this is interesting, too. 1.68% is definitly a lot more than I would
> expect or have I seen on other testcases.  You can take a look at summaries in
> -fdump-ipa-devirt pass.
>
> In opts.c I have:
> case OPT_fprofile_use:
>   if (!opts_set->x_flag_branch_probabilities)
> ...
>   /* Indirect call profiling should do all useful transformations
>  speculative devirtualization does.  */
>   if (!opts_set->x_flag_devirtualize_speculatively
>   && opts->x_flag_value_profile_transformations)
> opts->x_flag_devirtualize_speculatively = false;
>
> so perhaps this hunk is somehow skipped with LIPO?

The 1.68% size reduction I measured is with plain O2 compilation, not LIPO.

>
> Speculative devirtualization is somehwat less useful (may have more falce
> positives) without LTO depending on how your headers are constructed.
> It would be interesting to see if it does a lot of mistakes on your codebase.
> (this can be easily done by forcing it to run with profile feedback, too and
> it will tell you when its speculation differs from speculation already there).
>>
>> By the way, you mentioned 'hacking the
>> ipa.c:walk_polymorphic_call_targets to not make the possible targets
>> as
>> reachable' -- is that something worth doing in trunk?   With that, we
>> can probably just turn off speculative devirtualization.
>
> Well, the check is there to enable inlining. Disabling it for
> -fprofile-generate will result in lost profile samples for virtual functions.
> Disabling it by default will prevent inlining of devirtualized calls making
> devirtualization not really useful.
> Perhaps with LIPO situation is bit different because you bring in the other
> module just to inline the call as you describe.

What I meant is whether it is suitable for plain build ? I have not
looked at the details.

>
> One thing I can imagine doing is to make inliner consider the reachable
> (in post-inlining sense, that is after removing extern inlines and virtual
> functions) calls with priority and account only those to unit growth model.
> This would make it more consistent over -fdevirtualize and more realistic
> about resulting code size.
>
> I sort of considered this option but did not have any good data suggesting
> I should implement it.
>
> In general it would be nice to understand this problem.  Also I plan to do
> some retunning for 5.0 so it would be nice to know if you have other issues
> with 4.9? (I did not closely followed Google branch changes, so if you can
> point out those that are relevant for IPA tuning, I would be very interested
> to see what problems you hit).

Ok I will collect a list of inlining related changes and let you know latter.

thanks,

David

>
> Honza
>>
>> David
>>
>>
>>
>> >
>> > Honza
>> >>
>> >> David
>> >>
>> >>
>> >> >
>> >> > Honza
>> >> >>
>> >> >> David
>> >> >>
>> >> >>
>> >> >> On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka  wrote:
>> >> >> >> Disabling devirtualization reduces code size, both for 
>> >> >> >> instrumentation (because
>> >> >> >> many more virtual functions are kept longer and therefore 
>> >> >> >> instrumented) and for
>> >> >> >> normal optimization.
>> >> >> >
>> >> >> > OK, with profile instrumentation (that you seem to try to minimize) 
>> >> >> > i can see
>> >> >> > how you get noticeably more counters because virtual functions are 
>> >> >> > kept longer.
>> >> >> > (note that 4.9 is a lot more agressive on removing unreacable 
>> >> >> > virtual functions
>> >> >> > than earlier compilers).
>> >> >> >
>> >> >> > Instead of disabling -fdevirtualize completely (that will get you 
>> >> >> > more indirect
>> >> >> > calls and thus more topn profiling) you may consider just hacking
>> >> >> > ipa.c:walk_polymorphic_call_targets to not make the possible targets 
>> >> >> > as
>> >> >> > reachable. (see the conditional on before_inlining_p).
>> >> >> >
>> >> >> > Of course this will g

[C PATCH] Another initialization fix (PR c/63567)

2014-10-19 Thread Marek Polacek
It turned out that there is another spot where we need to allow
initializing objects with static storage duration with compound
literals even in C99 -- when the compound literal is inside the
initializer.  Fixed in the same way as previously.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2014-10-18  Marek Polacek  

PR c/63567
* c-typeck.c (output_init_element): Allow initializing objects with
static storage duration with compound literals even in C99 and add
pedwarn for it.

* gcc.dg/pr63567-3.c: New test.
* gcc.dg/pr63567-4.c: New test.

diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
index 0dd3366..ee874da 100644
--- gcc/c/c-typeck.c
+++ gcc/c/c-typeck.c
@@ -8251,11 +8251,14 @@ output_init_element (location_t loc, tree value, tree 
origtype,
 value = array_to_pointer_conversion (input_location, value);
 
   if (TREE_CODE (value) == COMPOUND_LITERAL_EXPR
-  && require_constant_value && !flag_isoc99 && pending)
+  && require_constant_value && pending)
 {
   /* As an extension, allow initializing objects with static storage
 duration with compound literals (which are then treated just as
 the brace enclosed list they contain).  */
+  if (flag_isoc99)
+   pedwarn_init (loc, OPT_Wpedantic, "initializer element is not "
+ "constant");
   tree decl = COMPOUND_LITERAL_EXPR_DECL (value);
   value = DECL_INITIAL (decl);
 }
diff --git gcc/testsuite/gcc.dg/pr63567-3.c gcc/testsuite/gcc.dg/pr63567-3.c
index e69de29..d626406 100644
--- gcc/testsuite/gcc.dg/pr63567-3.c
+++ gcc/testsuite/gcc.dg/pr63567-3.c
@@ -0,0 +1,7 @@
+/* PR c/63567 */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+struct T { int i; };
+struct S { struct T t; };
+struct S s = { .t = { (int) { 1 } } };
diff --git gcc/testsuite/gcc.dg/pr63567-4.c gcc/testsuite/gcc.dg/pr63567-4.c
index e69de29..0ca6c45 100644
--- gcc/testsuite/gcc.dg/pr63567-4.c
+++ gcc/testsuite/gcc.dg/pr63567-4.c
@@ -0,0 +1,7 @@
+/* PR c/63567 */
+/* { dg-do compile } */
+/* { dg-options "-Wpedantic" } */
+
+struct T { int i; };
+struct S { struct T t; };
+struct S s = { .t = { (int) { 1 } } }; /* { dg-warning "initializer element is 
not constant" } */

Marek


Re: [committed] Remove hppa -mjump-in-delay option

2014-10-19 Thread John David Anglin

On 19-Oct-14, at 10:13 AM, Segher Boessenkool wrote:


You can set it to "Ignored" in pa.opt now?  And remove the mask?


Yes, will update.

Dave
--
John David Anglin   dave.ang...@bell.net





Re: [committed] Remove hppa -mjump-in-delay option

2014-10-19 Thread Segher Boessenkool
On Sat, Oct 18, 2014 at 12:04:36PM -0400, John David Anglin wrote:
> The attached change removes support for the hppa -mjump-in-delay  
> option.

You can set it to "Ignored" in pa.opt now?  And remove the mask?


Segher


Re: -fuse-caller-save - Collect register usage information

2014-10-19 Thread Tom de Vries

On 17-10-14 21:24, Eric Botcazou wrote:

Let's look at the effect of the option (after the recent fix for PR61605) on
gcc.target/i386/fuse-calller-save.c:
...
   foo:
   .LFB1:
.cfi_startproc
-   pushq   %rbx
-   .cfi_def_cfa_offset 16
-   .cfi_offset 3, -16
-   movl%edi, %ebx
callbar
-   addl%ebx, %eax
-   popq%rbx
-   .cfi_def_cfa_offset 8
+   addl%edi, %eax
ret
.cfi_endproc
   .LFE1:
...
So, the effect is: instead of using a callee-save register, we use a
caller-save register to store a value that's live over a call, without
needing to add a caller-save, as would be normally the case.

If I see an option -foptimize-caller-saves, I'd expect the effect to be that
without, there are some caller-saves and with, there are less. This is not
the case in the diff above.


To me it is, "movl %edi, %ebx"/"addl %ebx, %eax" is a caller-save/restore.



I agree that it can look like that. But the insn 'movl %edi, %ebx' is generated 
by assign_parm_setup_reg at expand. AFAIU, the purpose is to decouple the value 
of the argument and its uses from the register it's passed in.


The definition of -fcaller-saves below explains why insn 'movl %edi, %ebx' is 
not a caller-save: because it's not generated before a call, but rather at the 
start of a function. This seems to be confirmed by the fact that the insn 'movl 
%edi, %ebx' is still generated with -fno-caller-saves.



I'm starting to lean towards -foptimize-call-clobbers or similar.


Yes, that's also a good name and was my initial preference.  But you pointed
out the existing -fcaller-saves:

`-fcaller-saves'
  Enable allocation of values to registers that are clobbered by
  function calls, by emitting extra instructions to save and restore
  the registers around such calls.  Such allocation is done only
  when it seems to result in better code.

so -foptimize-caller-saves can be understood as optimizing out the "extra
instructions to save and restore  the registers around such calls"  and, thus,
as having a direct relationship with -fcaller-saves.


Agree.

But, given the preference of a number of others for fipa-ra, could you live with 
that?


Thanks,
- Tom


Re: [Patch] Fix PR61889 for the w64-mingw32 case

2014-10-19 Thread Jan Hubicka
> Honza, not sure if this patch is idea, but this will unblock mingw
> build problems. Can this one get in?

Hmm, the patch is somewhat ugly and I do not know why MingW32 defines mkdir
macro and how. If Kai Tietz or other MingW32 maintainer is OK about it, the
patch is OK.

Honza
> 
> thanks,
> 
> David
> 
> On Wed, Sep 24, 2014 at 8:22 AM, Rainer Emrich
>  wrote:
> > The following patch fixes PR61889 for x86_64-w64-mingw32. Details can be 
> > found
> > on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61889
> >
> > The patch was bootstrapped on x86_64-w64-mingw32.
> >
> > If patch the patch is ok, Kai would you apply, please?
> >
> > Rainer
> >
> > 2014-09-24  Rainer Emrich  
> >
> > PR gcov-profile/61889
> > * gcc/gcov-tool.c: Remove wrong #if !defined(_WIN32)
> > * libgcc/libgcov-driver-system.c: undefine clashing macro for mkdir
> >
> >
> > Index: gcc/gcov-tool.c
> > ===
> > --- gcc/gcov-tool.c (Revision 215554)
> > +++ gcc/gcov-tool.c (Arbeitskopie)
> > @@ -89,11 +89,7 @@ gcov_output_files (const char *out, stru
> >/* Try to make directory if it doesn't already exist.  */
> >if (access (out, F_OK) == -1)
> >  {
> > -#if !defined(_WIN32)
> >if (mkdir (out, S_IRWXU | S_IRWXG | S_IRWXO) == -1 && errno != 
> > EEXIST)
> > -#else
> > -  if (mkdir (out) == -1 && errno != EEXIST)
> > -#endif
> >  fatal_error ("Cannot make directory %s", out);
> >  } else
> >unlink_profile_dir (out);
> > Index: libgcc/libgcov-driver-system.c
> > ===
> > --- libgcc/libgcov-driver-system.c  (Revision 215554)
> > +++ libgcc/libgcov-driver-system.c  (Arbeitskopie)
> > @@ -66,6 +66,9 @@ create_file_directory (char *filename)
> >  #ifdef TARGET_POSIX_IO
> >  && mkdir (filename, 0755) == -1
> >  #else
> > +#ifdef mkdir
> > +#undef mkdir
> > +#endif
> >  && mkdir (filename) == -1
> >  #endif
> >  /* The directory might have been made by another process.  */


Re: [PATCH] AutoFDO patch for trunk

2014-10-19 Thread Jan Hubicka
> >> +/* Member functions for string_table.  */
> >> +
> >> +string_table *
> >> +string_table::create ()
> >
> > Why this is not a constructor?
> 
> We use static initializer because it's not suggested to put too much
> logic in constructor.

Why not? :)
> >> +}
> >
> > The two hunks probably can be unified.
> > Why get_index_by_decl does not already use dwarf name and why do you need 
> > to consider both?
> 
> get_index_by_decl is actually a wrapper of get_index. It tolerates
> error when assembler name is not emitted in the debug binary (mostly
> for functions that are fully inlined). In this case, we will first try
> to find the index of the assembler name, if not found, then bfd name.
> If still not found, then we try to find name from its abstract
> location.

Yep, I am just somewhat concerned that you need to try different variations of 
the name.
I would expect the assembler name (minus the random seed mangling) to be 
sufficient.
> 
> >
> >> +  if (DECL_ABSTRACT_ORIGIN (decl))
> >> +return get_function_instance_by_decl (lineno, DECL_ABSTRACT_ORIGIN 
> >> (decl));
> >
> > What really happens for ipa-cp function clones and for split functions?
> 
> Their name suffix will be stripped before matching names.

I see and profile merged, that seems resonable.
> > I am not sure it is a win to have the things represented as VPT histograms 
> > when you shoot
> > for quite special purpose problem.  But lets deal with these incrementally, 
> > too.
> > (in general I do not see why you try to share so much with the gcov based 
> > VTP - it seems
> > easier to do the speculation by hand)
> 
> Basically there are 2 problem:
> 
> * before annotation. I agree this is a special purpose problem
> * after annotation. This should be the same problem as VPT.

Yep, I think adding VPT histograms to get devirtualization happen 
inter-procedurally
with LTO is a good idea.
You will need (incrementally I guess) to translate your indexes into the 
profile-id used
or you will need to initialize profile-ids accordingly, so the IPA pass is able 
to identify
the target cross-module.
> >> +{
> >> +  if (gcov_open (auto_profile_file, 1) == 0)
> >> +error ("Cannot open profile file %s.", auto_profile_file);
> >> +
> >> +  if (gcov_read_unsigned () != GCOV_DATA_MAGIC)
> >> +error ("AutoFDO profile magic number does not mathch.");
> >> +
> >> +  /* Skip the version number.  */
> >> +  gcov_read_unsigned ();
> >> +
> >> +  /* Skip the empty integer.  */
> >> +  gcov_read_unsigned ();
> >
> > Perhaps we can just check the values?
> 
> What do you mean? Check the value against 0?

Yes, if you have version in there it seems to make sense to check it ;)
I guess it is there so you can add extra stuff into the file format that will 
make
it incompatible with current one, so probably we should reject all versions 
greater
than 0. (especially because the tool is off-tree)
> >
> > Does this have chance to work inter-module?
> 
> Yes, it works fine with LIPO.

Important (for me) is to make it work with LTO, because LIPO is apparently not 
landing to 5.0.
> 
> Yes, this could be done.
> 
> After a second thought, I think we actually need to expose einliner
> interface to autofdo (instead of doing vpt in early inliner). This is
> because we need to mark icall as "promoted", so that later vpt passes
> will not try to promote it again. Currently in AutoFDO, we use a
> self-contained way (use a "set" to mark all promoted stmts). If we do
> vpt in einline, then we will need some flag attached to gimple to
> indicate whether it's already promoted.

If we manage to remove the iteration loop that repeativly applies the inline 
plan, you only
need way to mark edge as promoted.  I guess you can just test the speculative 
flag on the edge
to skip ones you dealt with earlier.

Lets do that incrementally though.
> +
> +@item -fauto-profile
> +@itemx -fauto-profile=@var{path}
> +@opindex fauto-profile
> +Enable sampling based feedback directed optimizations, and optimizations
> +generally profitable only with profile feedback available.
> +
> +The following options are enabled: @code{-fbranch-probabilities}, 
> @code{-fvpt},
> +@code{-funroll-loops}, @code{-fpeel-loops}, @code{-ftracer}, 
> @code{-ftree-vectorize},
> +@code{ftree-loop-distribute-patterns}

I wonder about loop peeling - we do not enale it by default because we do not 
believe
our static branch predictor can well identify loops that have low expected trip 
count
(except ones that can be fully inlined).
I am not sure auto-FDO is much better here - how reliable the info about number 
of iteratoins
is?

But this is again something that can be handled separately.

Documentation seems out of date though - it seems to enable also
-finline, -fipa-cp and cloning, predictive commoning, unswitching, 
gcse-after-reload,

You do not want to enable reorder-functions because autoFDO does not include 
the time profiler.
Also I think speculative devirtualization should not be disabled.
> +
> +I

Re: [PATCH] PR36312

2014-10-19 Thread Manuel López-Ibáñez
On 18 October 2014 14:43, Anthony Brandon  wrote:
> Never mind about functions.texi. I figured out how to do it.
> Here is the new diff and changelog.
>
> libiberty/ChangeLog:
>
> 2014-10-18  Anthony Brandon  
>
> * filename_cmp.c (filename_eq): No change.

Unfortunately mklog is not 100% perfect (actually, it is 'diff -p'
which is far from perfect). You need to revise that the ChageLog makes
sense (or make mklog smarter). Thus, drop (filename_eq)...

Also, the changelogs should say PR driver/36312
(https://gcc.gnu.org/codingconventions.html#ChangeLogs).

I think you need to explain the difference between using fatal_error()
and fatal_error(UNKNOWN_LOCATION). Yes, I should know because I wrote
this part of the patch. But to be honest, I don't remember why I did
this change.

+This function first normalizes the file names so that different file names
+pointing to the same underlying file are treated as being identical.

I would suggest: "This function compares the canonical versions of the
filenames as returned by @code{lrealpath()}, so that ..."

I cannot approve the patch, but it looks fine to me. If you don't get
a reply in a few days, you should ping the relevant maintainers:
https://gcc.gnu.org/wiki/Community#ping

Great first contribution! What are your plans next?

Cheers,

Manuel.


Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-19 Thread Jan Hubicka
> Hi Honza,
> 
> As David says, we will do some more experiments with the change you
> suggest and speculative devirtualization, but we needed to make this
> change in part to get an internal release out. One of the issues was a
> recent change to cp/decl2.c to make virtual function decls needed
> under flag_devirtualize.

I think that one is not in current 4.9 branch (only in mainline). 

Honza
> 
> Thanks!
> Teresa
> 
> On Sat, Oct 18, 2014 at 4:22 PM, Jan Hubicka  wrote:
> >>
> >> The virtual functions will be emitted in some modules, right? If they
> >> are hot, they will be included with the auxiliary modules. Note that
> >> LIPO indirect call profile will point to the comdat copy that is
> >> picked by the linker in the instrumentation build, so it guarantees to
> >> exist.
> >
> > If you have COMDAT virtual inline function that is used by module FOO and
> > LIPO's indirect inlining will work out that it goes to comdat copy of 
> > module BAR
> > (that won the merging).  It will make C++ parser to also parse BAR to get
> > the function, but callgarph builder will ignore it, too.
> > (this is new logic in analyze_function that with -fno-devirtualize will just
> > prevent all virtual functions not directly reachable to appear in symbol 
> > table)
> >
> > I am surprised you hit the size limits with 4.9 only - for quite some time
> > we keep all virtual functions in callgarph until inlining. In fact 4.9 is 
> > first
> > that works harder to drop them early (because I hit the problem with LTO
> > where they artifically bloat the size of LTO object files)
> >
> > Honza
> >>
> >> David
> >>
> >>
> >> >
> >> > Honza
> >> >>
> >> >> David
> >> >>
> >> >>
> >> >> On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka  wrote:
> >> >> >> Disabling devirtualization reduces code size, both for 
> >> >> >> instrumentation (because
> >> >> >> many more virtual functions are kept longer and therefore 
> >> >> >> instrumented) and for
> >> >> >> normal optimization.
> >> >> >
> >> >> > OK, with profile instrumentation (that you seem to try to minimize) i 
> >> >> > can see
> >> >> > how you get noticeably more counters because virtual functions are 
> >> >> > kept longer.
> >> >> > (note that 4.9 is a lot more agressive on removing unreacable virtual 
> >> >> > functions
> >> >> > than earlier compilers).
> >> >> >
> >> >> > Instead of disabling -fdevirtualize completely (that will get you 
> >> >> > more indirect
> >> >> > calls and thus more topn profiling) you may consider just hacking
> >> >> > ipa.c:walk_polymorphic_call_targets to not make the possible targets 
> >> >> > as
> >> >> > reachable. (see the conditional on before_inlining_p).
> >> >> >
> >> >> > Of course this will get you less devirtualization (but with LTO the 
> >> >> > difference
> >> >> > should not be big - perhaps I could make switch for that for 
> >> >> > mainline) and less
> >> >> > accurate profiles when you get speculative devirtualization via topn.
> >> >> >
> >> >> > I would be very interested to see how much difference this makes.
> >> >> >
> >> >> > Honza
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-19 Thread Jan Hubicka
> >
> > I am surprised you hit the size limits with 4.9 only - for quite some time
> > we keep all virtual functions in callgarph until inlining. In fact 4.9 is 
> > first
> > that works harder to drop them early (because I hit the problem with LTO
> > where they artifically bloat the size of LTO object files)
> 
> We can dig it more to later understand why only 4.9 hits the problem.

This would be very interesting, because 4.9 ought to be better here
(removing more virtuals early) than previous compilers.
There was number of changes in 4.9 that may affect this - some fixes at
C++ side giving middle end more inline candidates and also this change
https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00834.html
So perhaps most of your virtual functions are !COMDAT&&!EXTERNAL, but that
does not seem to make much sense to me either :(
> 
> My size results with -fno-devirtualize-speculatively is out. It
> shrinks size by 1.68% -- slightly more than -fdevirtualize can do in
> O2 compile.

Hmm, this is interesting, too. 1.68% is definitly a lot more than I would
expect or have I seen on other testcases.  You can take a look at summaries in
-fdump-ipa-devirt pass.

In opts.c I have:
case OPT_fprofile_use:
  if (!opts_set->x_flag_branch_probabilities)
...
  /* Indirect call profiling should do all useful transformations
 speculative devirtualization does.  */
  if (!opts_set->x_flag_devirtualize_speculatively
  && opts->x_flag_value_profile_transformations)
opts->x_flag_devirtualize_speculatively = false;

so perhaps this hunk is somehow skipped with LIPO?

Speculative devirtualization is somehwat less useful (may have more falce
positives) without LTO depending on how your headers are constructed.
It would be interesting to see if it does a lot of mistakes on your codebase.
(this can be easily done by forcing it to run with profile feedback, too and
it will tell you when its speculation differs from speculation already there).
> 
> By the way, you mentioned 'hacking the
> ipa.c:walk_polymorphic_call_targets to not make the possible targets
> as
> reachable' -- is that something worth doing in trunk?   With that, we
> can probably just turn off speculative devirtualization.

Well, the check is there to enable inlining. Disabling it for
-fprofile-generate will result in lost profile samples for virtual functions.
Disabling it by default will prevent inlining of devirtualized calls making
devirtualization not really useful.
Perhaps with LIPO situation is bit different because you bring in the other
module just to inline the call as you describe.

One thing I can imagine doing is to make inliner consider the reachable
(in post-inlining sense, that is after removing extern inlines and virtual
functions) calls with priority and account only those to unit growth model.
This would make it more consistent over -fdevirtualize and more realistic
about resulting code size.

I sort of considered this option but did not have any good data suggesting
I should implement it.

In general it would be nice to understand this problem.  Also I plan to do
some retunning for 5.0 so it would be nice to know if you have other issues
with 4.9? (I did not closely followed Google branch changes, so if you can
point out those that are relevant for IPA tuning, I would be very interested
to see what problems you hit).

Honza
> 
> David
> 
> 
> 
> >
> > Honza
> >>
> >> David
> >>
> >>
> >> >
> >> > Honza
> >> >>
> >> >> David
> >> >>
> >> >>
> >> >> On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka  wrote:
> >> >> >> Disabling devirtualization reduces code size, both for 
> >> >> >> instrumentation (because
> >> >> >> many more virtual functions are kept longer and therefore 
> >> >> >> instrumented) and for
> >> >> >> normal optimization.
> >> >> >
> >> >> > OK, with profile instrumentation (that you seem to try to minimize) i 
> >> >> > can see
> >> >> > how you get noticeably more counters because virtual functions are 
> >> >> > kept longer.
> >> >> > (note that 4.9 is a lot more agressive on removing unreacable virtual 
> >> >> > functions
> >> >> > than earlier compilers).
> >> >> >
> >> >> > Instead of disabling -fdevirtualize completely (that will get you 
> >> >> > more indirect
> >> >> > calls and thus more topn profiling) you may consider just hacking
> >> >> > ipa.c:walk_polymorphic_call_targets to not make the possible targets 
> >> >> > as
> >> >> > reachable. (see the conditional on before_inlining_p).
> >> >> >
> >> >> > Of course this will get you less devirtualization (but with LTO the 
> >> >> > difference
> >> >> > should not be big - perhaps I could make switch for that for 
> >> >> > mainline) and less
> >> >> > accurate profiles when you get speculative devirtualization via topn.
> >> >> >
> >> >> > I would be very interested to see how much difference this makes.
> >> >> >
> >> >> > Honza


Re: [PATCH 5/5] New tests introduction

2014-10-19 Thread Andreas Schwab
Martin Liška  writes:

> diff --git a/gcc/testsuite/gcc.dg/ipa/ipa-icf-21.c 
> b/gcc/testsuite/gcc.dg/ipa/ipa-icf-21.c
> new file mode 100644
> index 000..7358e43
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/ipa/ipa-icf-21.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-ipa-icf"  } */
> +
> +#include 
> +
> +__attribute__ ((noinline))
> +void foo()
> +{
> +  float x = 1.2345f;
> +  __m128 v =_mm_load1_ps(&x);
> +}
> +
> +__attribute__ ((noinline))
> +void bar()
> +{
> +  float x = 1.2345f;
> +  __m128 v =_mm_load1_ps(&x);
> +}
> +
> +int main()
> +{
> +  return 2;
> +}
> +
> +/* { dg-final { scan-ipa-dump "Semantic equality hit:bar->foo" "icf"  } } */
> +/* { dg-final { scan-ipa-dump "Equal symbols: 1" "icf"  } } */
> +/* { dg-final { cleanup-ipa-dump "icf" } } */

FAIL: gcc.dg/ipa/ipa-icf-21.c (test for excess errors)
Excess errors:
/usr/local/gcc/gcc-20141019/gcc/testsuite/gcc.dg/ipa/ipa-icf-21.c:4:23: fatal e\
rror: xmmintrin.h: No such file or directory
compilation terminated.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: IPA ICF fallout: fix for two ipa-icf-*.C tests

2014-10-19 Thread Andreas Schwab
Martin Liška  writes:

> diff --git a/gcc/testsuite/g++.dg/ipa/ipa-icf-4.C 
> b/gcc/testsuite/g++.dg/ipa/ipa-icf-4.C
> index 9d17889..9434289 100644
> --- a/gcc/testsuite/g++.dg/ipa/ipa-icf-4.C
> +++ b/gcc/testsuite/g++.dg/ipa/ipa-icf-4.C
> @@ -44,5 +44,5 @@ int main()
>  }
>  
>  /* { dg-final { scan-ipa-dump "Varpool alias has been created" "icf"  } } */
> -/* { dg-final { scan-ipa-dump "Equal symbols: 2" "icf"  } } */
> +/* { dg-final { scan-ipa-dump "Equal symbols: 6" "icf"  } } */
>  /* { dg-final { cleanup-ipa-dump "icf" } } */

On ia64:

FAIL: g++.dg/ipa/ipa-icf-4.C  -std=gnu++98  scan-ipa-dump icf "Varpool alias 
has been created"
FAIL: g++.dg/ipa/ipa-icf-4.C  -std=gnu++98  scan-ipa-dump icf "Equal symbols: 6"

$ grep -e Equal -e Varpool ipa-icf-4.C.045i.icf
Equal symbols: 5

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."