[PATCH, rtl-optimization] Fix PR63475, Postreload CSE propagates aliased memory operand

2014-10-14 Thread Uros Bizjak
Hello!

The attached patch fixes PR63475, where postreload CSE propagates an
aliased memory operand.

The core of the problem was with the call to base_alias_check when
VALUE RTXes are involved. Before the call, find_base_term is used to
extract the base of x_addr and mem_addr. Please note that
find_base_term is able to extract the bases from VALUE RTXes. These
extracted bases were passed to base_alias_check, together with
original VALUE RTXes x_addr and mem_addr.

The problem begins here. base_alias_check doesn't handle VALUE RTXes,
and uses e.g. canon_rtx on VALUEs and various GET_CODE accessors to
determine various properties of the passed x_addr and mem_addr. One of
these checks looks for AND alignment addresses in order to prevent the
early exit here:

  /* Differing symbols not accessed via AND never alias.  */
  if (GET_CODE (x_base) != ADDRESS && GET_CODE (y_base) != ADDRESS)
    return 0;

However, when x and y are passed as VALUE RTXes (which correspond to
and hide addresses with AND), and the preceding calls to
find_base_term are nevertheless able to extract the bases of x and y,
this condition fires erroneously and an invalid value is returned
(with 0 meaning that the addresses X and Y are known to point to
different objects).

The solution is to always extract values for x_addr and mem_addr and
use them in the calls to find_base_term and base_alias_check.

[It can happen that get_addr is not able to match VALUE RTX with some
address, so it is not possible to simply add a bunch of GET_CODE (x)
!= VALUE asserts in base_alias_check. But in this case find_base_term
returns ADDRESS RTX, so we stay in sync as far as base_alias_check is
concerned (see the quoted code above).]
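
To make the failure mode concrete, here is a toy model of it. This is only a
sketch and not GCC code: Code, Addr, resolve, base_of, may_alias and the two
check_* functions are all made-up names, and the alias test is reduced to the
single early exit quoted above. It only tries to show why testing for AND on
an unresolved VALUE gives a different answer than testing on the address the
VALUE stands for.

```cpp
#include <cassert>

// Toy stand-ins for RTL codes; these are NOT GCC's real data structures.
enum class Code { Symbol, And, Value };

struct Addr
{
  Code code;
  int base;            // identity of the underlying symbol
  const Addr *hidden;  // for Value: the address it stands for, if known
};

// Hypothetical analogue of get_addr: unwrap a Value when possible.
inline const Addr &
resolve (const Addr &a)
{
  return (a.code == Code::Value && a.hidden) ? *a.hidden : a;
}

// Base extraction sees through Values, as find_base_term is described to do.
inline int
base_of (const Addr &a)
{
  return resolve (a).base;
}

// Simplified shape of the early exit: differing bases never alias,
// unless one of the addresses is visibly an alignment AND.
inline bool
may_alias (const Addr &x, const Addr &y, int x_base, int y_base)
{
  if (x_base == y_base)
    return true;
  if (x.code == Code::And || y.code == Code::And)
    return true;  // stay conservative
  return false;
}

// Before: bases come from the resolved addresses, but the AND test
// looks at the unresolved Values.
inline bool
check_unresolved (const Addr &x, const Addr &y)
{
  return may_alias (x, y, base_of (x), base_of (y));
}

// After: both the bases and the AND test use the resolved addresses.
inline bool
check_resolved (const Addr &x, const Addr &y)
{
  return may_alias (resolve (x), resolve (y), base_of (x), base_of (y));
}

// Small self-check: a Value hiding an AND address versus a plain symbol.
inline void
demo ()
{
  Addr and_addr = { Code::And, 1, nullptr };
  Addr value = { Code::Value, 0, &and_addr };
  Addr symbol = { Code::Symbol, 2, nullptr };
  assert (!check_unresolved (value, symbol));  // wrongly "never alias"
  assert (check_resolved (value, symbol));     // conservative answer
}
```

In this model the unresolved check reports "never alias" for a VALUE that
hides an AND address, while the resolved check stays conservative, which is
the behaviour the patch description argues for.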

An added benefit of the patch is that canon_rtx now works as expected;
canon_rtx does NOT handle VALUE RTXes.

A small optimization is also present. If the address is already
canonicalized, we pass original address to memrefs_conflict_p, but we
have to extract original address for preceding functions nevertheless.
Also, we use extracted original address in recently added check for
AND aligned addresses when checking for MEM_READONLY_P.

The patch also removes a couple of unneeded and unused calls to
canon_rtx, also to show the level of bitrot in this area ...

2014-10-14  Uros Bizjak  ubiz...@gmail.com

PR rtl-optimization/63475
* alias.c (true_dependence_1): Always use get_addr to extract
true address operands from x_addr and mem_addr.  Use extracted
address operands to check for references with alignment ANDs.
Use extracted address operands with find_base_term and
base_alias_check. For noncanonicalized operands call canon_rtx with
extracted address operand.
(write_dependence_1): Ditto.
(may_alias_p): Ditto.  Remove unused calls to canon_rtx.

The patch was thoroughly tested on x86_64-linux-gnu {,-m32} and
alpha-linux-gnu for all default languages plus obj-c++ and go. While
there were no differences on x86_64-linux-gnu (as expected),
alpha-linux-gnu improved the result [1] for some hundreds of PASSes in
the gfortran testsuite [2].

OK for mainline?

[1] https://gcc.gnu.org/ml/gcc-testresults/2014-10/msg01151.html
[2] https://gcc.gnu.org/ml/gcc-testresults/2014-10/msg01478.html

Uros.
Index: alias.c
===
--- alias.c (revision 216149)
+++ alias.c (working copy)
@@ -2439,6 +2439,7 @@ static int
 true_dependence_1 (const_rtx mem, enum machine_mode mem_mode, rtx mem_addr,
   const_rtx x, rtx x_addr, bool mem_canonicalized)
 {
+  rtx true_mem_addr;
   rtx base;
   int ret;
 
@@ -2458,6 +2459,10 @@ true_dependence_1 (const_rtx mem, enum machine_mod
   || MEM_ALIAS_SET (mem) == ALIAS_SET_MEMORY_BARRIER)
 return 1;
 
+  if (! x_addr)
+x_addr = XEXP (x, 0);
+  x_addr = get_addr (x_addr);
+
   if (! mem_addr)
 {
   mem_addr = XEXP (mem, 0);
@@ -2464,23 +2469,8 @@ true_dependence_1 (const_rtx mem, enum machine_mod
   if (mem_mode == VOIDmode)
mem_mode = GET_MODE (mem);
 }
+  true_mem_addr = get_addr (mem_addr);
 
-  if (! x_addr)
-    {
-      x_addr = XEXP (x, 0);
-      if (!((GET_CODE (x_addr) == VALUE
-             && GET_CODE (mem_addr) != VALUE
-             && reg_mentioned_p (x_addr, mem_addr))
-            || (GET_CODE (x_addr) != VALUE
-                && GET_CODE (mem_addr) == VALUE
-                && reg_mentioned_p (mem_addr, x_addr))))
-        {
-          x_addr = get_addr (x_addr);
-          if (! mem_canonicalized)
-            mem_addr = get_addr (mem_addr);
-        }
-    }
-
   /* Read-only memory is by definition never modified, and therefore can't
  conflict with anything.  However, don't assume anything when AND
  addresses are involved and leave to the code below to determine
@@ -2488,7 +2478,7 @@ true_dependence_1 (const_rtx mem, enum machine_mod
  stupid user tricks can produce them, so don't die.  */
   if (MEM_READONLY_P (x)
       && GET_CODE (x_addr) != AND
-      && GET_CODE (mem_addr) != AND)
+ 

Re: [PATCH i386 AVX512] [56/n] Add plus/minus/abs/neg/andnot insn patterns.

2014-10-14 Thread Kirill Yukhin
Hello Uroš,
It seems I forgot to post the updated patch.
On 25 Sep 20:11, Uros Bizjak wrote:
 I'd rather go with the second approach, it is less confusing from the
 maintainer POV. All other patterns with masking use some consistent
 template, so I'd suggest using the same approach for everything. If it
 is indeed too many patterns, then please split the patch to smaller
 pieces.
The goal was not to decrease the size of the patch; I wanted to make the
patterns look simpler by hiding the masking stuff behind `subst'.
Anyway, I've updated the patch.

Here it is (bootstrapped and regtested).

Is it ok for trunk?

gcc/
* config/i386/sse.md (define_mode_iterator VI_AVX2): Extend
to support AVX-512BW.
(define_mode_iterator VI124_AVX2_48_AVX512F): Remove.
(define_expand <plusminus_insn><mode>3): Remove masking support.
(define_insn *<plusminus_insn><mode>3): Ditto.
(define_expand <plusminus_insn><VI48_AVX512VL:mode>3_mask): New.
(define_expand <plusminus_insn><VI12_AVX512VL:mode>3_mask): Ditto.
(define_insn *<plusminus_insn><VI48_AVX512VL:mode>3_mask): Ditto.
(define_insn *<plusminus_insn><VI12_AVX512VL:mode>3_mask): Ditto.
(define_expand <sse2_avx2>_andnot<mode>3): Remove masking support.
(define_insn *andnot<mode>3): Ditto.
(define_expand <sse2_avx2>_andnot<VI48_AVX512VL:mode>3_mask): New.
(define_expand <sse2_avx2>_andnot<VI12_AVX512VL:mode>3_mask): Ditto.
(define_insn *andnot<VI48_AVX512VL:mode>3<mask_name>): Ditto.
(define_insn *andnot<VI12_AVX512VL:mode>3<mask_name>): Ditto.
(define_insn *abs<mode>2): Remove masking support.
(define_insn abs<VI48_AVX512VL:mode>2_mask): New.
(define_insn abs<VI12_AVX512VL:mode>2_mask): Ditto.
(define_expand abs<mode>2): Use VI_AVX2 mode iterator.

--
Thanks, K

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index ffc831f..9edfebc 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -268,8 +268,8 @@
(V4DI TARGET_AVX) V2DI])
 
 (define_mode_iterator VI_AVX2
-  [(V32QI TARGET_AVX2) V16QI
-   (V16HI TARGET_AVX2) V8HI
+  [(V64QI TARGET_AVX512BW) (V32QI TARGET_AVX2) V16QI
+   (V32HI TARGET_AVX512BW) (V16HI TARGET_AVX2) V8HI
(V16SI TARGET_AVX512F) (V8SI TARGET_AVX2) V4SI
(V8DI TARGET_AVX512F) (V4DI TARGET_AVX2) V2DI])
 
@@ -359,12 +359,6 @@
   [(V16HI TARGET_AVX2) V8HI
(V8SI TARGET_AVX2) V4SI])
 
-(define_mode_iterator VI124_AVX2_48_AVX512F
-  [(V32QI TARGET_AVX2) V16QI
-   (V16HI TARGET_AVX2) V8HI
-   (V16SI TARGET_AVX512F) (V8SI TARGET_AVX2) V4SI
-   (V8DI TARGET_AVX512F)])
-
 (define_mode_iterator VI124_AVX512F
   [(V32QI TARGET_AVX2) V16QI
(V32HI TARGET_AVX512F) (V16HI TARGET_AVX2) V8HI
@@ -9051,20 +9045,43 @@
   "TARGET_SSE2"
   "operands[2] = force_reg (<MODE>mode, CONST0_RTX (<MODE>mode));")
 
-(define_expand "<plusminus_insn><mode>3<mask_name>"
+(define_expand "<plusminus_insn><mode>3"
   [(set (match_operand:VI_AVX2 0 "register_operand")
        (plusminus:VI_AVX2
          (match_operand:VI_AVX2 1 "nonimmediate_operand")
          (match_operand:VI_AVX2 2 "nonimmediate_operand")))]
-  "TARGET_SSE2 && <mask_mode512bit_condition>"
+  "TARGET_SSE2"
+  "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
+
+(define_expand "<plusminus_insn><mode>3_mask"
+  [(set (match_operand:VI48_AVX512VL 0 "register_operand")
+       (vec_merge:VI48_AVX512VL
+         (plusminus:VI48_AVX512VL
+           (match_operand:VI48_AVX512VL 1 "nonimmediate_operand")
+           (match_operand:VI48_AVX512VL 2 "nonimmediate_operand"))
+         (match_operand:VI48_AVX512VL 3 "vector_move_operand")
+         (match_operand:<avx512fmaskmode> 4 "register_operand")))]
+  "TARGET_AVX512F"
+  "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
+
+(define_expand "<plusminus_insn><mode>3_mask"
+  [(set (match_operand:VI12_AVX512VL 0 "register_operand")
+       (vec_merge:VI12_AVX512VL
+         (plusminus:VI12_AVX512VL
+           (match_operand:VI12_AVX512VL 1 "nonimmediate_operand")
+           (match_operand:VI12_AVX512VL 2 "nonimmediate_operand"))
+         (match_operand:VI12_AVX512VL 3 "vector_move_operand")
+         (match_operand:<avx512fmaskmode> 4 "register_operand")))]
+  "TARGET_AVX512BW"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
-(define_insn "*<plusminus_insn><mode>3<mask_name>"
+(define_insn "*<plusminus_insn><mode>3"
   [(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
        (plusminus:VI_AVX2
          (match_operand:VI_AVX2 1 "nonimmediate_operand" "<comm>0,v")
          (match_operand:VI_AVX2 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands) && <mask_mode512bit_condition>"
+  "TARGET_SSE2
+   && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
   "@
    p<plusminus_mnemonic><ssemodesuffix>\t{%2, %0|%0, %2}
    vp<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
@@ -9074,6 +9091,35 @@
    (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_insn *plusminus_insnmode3_mask
+  [(set 

Move loop peeling from RTL to gimple

2014-10-14 Thread Jan Hubicka
Hi,
this is an update of my 2013 update to the 2012 patch to move RTL loop
peeling to tree level. This is to expose optimization opportunities earlier.
Incrementally I think I can also improve profiling to provide a histogram
of loop iterations and get more sensible peeling decisions.
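
For readers less familiar with the transformation, the following is a
hand-written source-level illustration of what simple peeling does. It is only
a sketch: the patch works on GIMPLE, not on source code, and sum_rolled,
sum_peeled and demo are made-up names. Two initial iterations are copied in
front of the loop, so a loop that usually runs at most twice never reaches the
loop proper on the hot path.

```cpp
#include <cassert>
#include <cstddef>

// Original loop.
inline long
sum_rolled (const int *a, std::size_t n)
{
  long s = 0;
  for (std::size_t i = 0; i < n; ++i)
    s += a[i];
  return s;
}

// Same computation with two iterations peeled off in front of the loop.
inline long
sum_peeled (const int *a, std::size_t n)
{
  long s = 0;
  std::size_t i = 0;

  // Peeled copy of iteration 0, including its exit test.
  if (!(i < n))
    return s;
  s += a[i];
  ++i;

  // Peeled copy of iteration 1, including its exit test.
  if (!(i < n))
    return s;
  s += a[i];
  ++i;

  // Remaining iterations, only reached when n > 2.
  for (; i < n; ++i)
    s += a[i];
  return s;
}

// Self-check: both versions agree for every trip count from 0 to 5.
inline void
demo ()
{
  int a[] = { 1, 2, 3, 4, 5 };
  for (std::size_t n = 0; n <= 5; ++n)
    assert (sum_peeled (a, n) == sum_rolled (a, n));
}
```

Each peeled copy keeps its own exit test, which is why the peeled version
stays correct for trip counts smaller than the number of peeled iterations.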

profiled-bootstrapped/regtested x86_64-linux, OK?
Honza

* loop-unroll.c: (decide_unrolling_and_peeling): Rename to
(decide_unrolling): ... this one.
(peel_loops_completely): Remove.
(decide_peel_simple): Remove.
(decide_peel_once_rolling): Remove.
(decide_peel_completely): Remove.
(peel_loop_simple): Remove.
(peel_loop_completely): Remove.
(unroll_and_peel_loops): Rename to ...
(unroll_loops): ... this one; handle only unrolling.
* cfgloop.h (lpt_dec): Remove LPT_PEEL_COMPLETELY and
LPT_PEEL_SIMPLE.
(UAP_PEEL): Remove.
(unroll_and_peel_loops): Remove.
(unroll_loops): New.
* passes.def: Replace
pass_rtl_unroll_and_peel_loops by pass_rtl_unroll_loops.
* loop-init.c (gate_rtl_unroll_and_peel_loops,
rtl_unroll_and_peel_loops): Rename to ...
(gate_rtl_unroll_loops, rtl_unroll_loops): ... these; update.
(pass_rtl_unroll_and_peel_loops): Rename to ...
(pass_rtl_unroll_loops): ... this one.
* tree-pass.h (make_pass_rtl_unroll_and_peel_loops): Remove.
(make_pass_rtl_unroll_loops): New.
* tree-ssa-loop-ivcanon.c: (estimated_peeled_sequence_size, 
try_peel_loop): New.
(canonicalize_loop_induction_variables): Update.

* gcc.dg/tree-prof/peel-1.c: Update.
* gcc.dg/tree-prof/unroll-1.c: Update.
* gcc.dg/gcc.dg/unroll_1.c: Update.
* gcc.dg/gcc.dg/unroll_2.c: Update.
* gcc.dg/gcc.dg/unroll_3.c: Update.
* gcc.dg/gcc.dg/unroll_4.c: Update.
Index: tree-pass.h
===
--- tree-pass.h (revision 216145)
+++ tree-pass.h (working copy)
@@ -504,7 +504,7 @@ extern rtl_opt_pass *make_pass_outof_cfg
 extern rtl_opt_pass *make_pass_loop2 (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_rtl_loop_init (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_rtl_move_loop_invariants (gcc::context *ctxt);
-extern rtl_opt_pass *make_pass_rtl_unroll_and_peel_loops (gcc::context *ctxt);
+extern rtl_opt_pass *make_pass_rtl_unroll_loops (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_rtl_doloop (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_rtl_loop_done (gcc::context *ctxt);
 
Index: tree-ssa-loop-ivcanon.c
===
--- tree-ssa-loop-ivcanon.c (revision 216145)
+++ tree-ssa-loop-ivcanon.c (working copy)
@@ -28,9 +28,12 @@ along with GCC; see the file COPYING3.
variables.  In that case the created optimization possibilities are likely
to pay up.
 
-   Additionally in case we detect that it is beneficial to unroll the
-   loop completely, we do it right here to expose the optimization
-   possibilities to the following passes.  */
+   We also perform
+ - complette unrolling (or peeling) when the loops is rolling few enough
+   times
+ - simple peeling (i.e. copying few initial iterations prior the loop)
+   when number of iteration estimate is known (typically by the profile
+   info).  */
 
 #include "config.h"
 #include "system.h"
@@ -657,11 +660,12 @@ try_unroll_loop_completely (struct loop
HOST_WIDE_INT maxiter,
location_t locus)
 {
-  unsigned HOST_WIDE_INT n_unroll, ninsns, max_unroll, unr_insns;
+  unsigned HOST_WIDE_INT n_unroll = 0, ninsns, max_unroll, unr_insns;
   gimple cond;
   struct loop_size size;
   bool n_unroll_found = false;
   edge edge_to_cancel = NULL;
+  int report_flags = MSG_OPTIMIZED_LOCATIONS | TDF_RTL | TDF_DETAILS;
 
   /* See if we proved number of iterations to be low constant.
 
@@ -821,6 +825,8 @@ try_unroll_loop_completely (struct loop
 loop->num);
   return false;
 }
+  dump_printf_loc (report_flags, locus,
+                   "loop turned into non-loop; it never loops.\n");
 
   initialize_original_copy_tables ();
   wont_exit = sbitmap_alloc (n_unroll + 1);
@@ -902,6 +908,133 @@ try_unroll_loop_completely (struct loop
   return true;
 }
 
+/* Return number of instructions after peeling.  */
+static unsigned HOST_WIDE_INT
+estimated_peeled_sequence_size (struct loop_size *size,
+   unsigned HOST_WIDE_INT npeel)
+{
+  return MAX (npeel * (HOST_WIDE_INT) (size->overall
+                                       - size->eliminated_by_peeling), 1);
+}
+
+/* If the loop is expected to iterate N times and is
+   small enough, duplicate the loop body N+1 times before
+   the loop itself.  This way the hot path will never
+   enter the loop.  
+   Parameters are the same as for 

Re: Towards GNU11

2014-10-14 Thread Marek Polacek
On Tue, Oct 07, 2014 at 11:07:56PM +0200, Marek Polacek wrote:
 I'd like to kick off a discussion about moving the default standard
 for C from gnu89 to gnu11.

The consensus seems to be to go forward with this change.  I will
commit the patch in 24 hours unless I hear objections.

Thanks,

Marek


Re: [PATCH 6/n] OpenMP 4.0 offloading infrastructure: option handling

2014-10-14 Thread Richard Biener
On Mon, 13 Oct 2014, Bernd Schmidt wrote:

 On 10/13/2014 12:33 PM, Ilya Verbin wrote:
  On 13 Oct 12:19, Jakub Jelinek wrote:
   But I'd like to understand why is this one needed.
   Why should the compilers care?  Aggregates layout and alignment of
   integral/floating types must match between host and offload compilers,
   sure,
   but isn't that something streamed already in the LTO bytecode?
   Or is LTO streamer not streaming some types like long_type_node?
 
 It isn't, see the preload_common_nodes code.

Something I'd like to get rid of at some point (but it's not 100%
easy as backends for example compare va_list_type_node by pointer).

 Also, the backend needs to choose
 the right Pmode (and in the case of ptx, emit a directive about address
 sizes).

Surely that will only be one problem with going the LTO way to handle
the offloading ;)

Richard.


[PATCH] Fix PR63512

2014-10-14 Thread Richard Biener

I forgot to mark stmts as modified.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2014-10-14  Richard Biener  rguent...@suse.de

PR tree-optimization/63512
* tree-ssa-pre.c (create_expression_by_pieces): Mark stmts
modified.

* g++.dg/torture/pr63512.C: New testcase.

Index: gcc/tree-ssa-pre.c
===
--- gcc/tree-ssa-pre.c  (revision 216146)
+++ gcc/tree-ssa-pre.c  (working copy)
@@ -2897,6 +2897,7 @@ create_expression_by_pieces (basic_block
}
 
  gimple_set_vuse (stmt, BB_LIVE_VOP_ON_EXIT (block));
+ gimple_set_modified (stmt, true);
}
   gimple_seq_add_seq (stmts, forced_stmts);
 }
@@ -2904,6 +2905,7 @@ create_expression_by_pieces (basic_block
   name = make_temp_ssa_name (exprtype, NULL, pretmp);
   newstmt = gimple_build_assign (name, folded);
   gimple_set_vuse (newstmt, BB_LIVE_VOP_ON_EXIT (block));
+  gimple_set_modified (newstmt, true);
   gimple_set_plf (newstmt, NECESSARY, false);
 
   gimple_seq_add_stmt (stmts, newstmt);
Index: gcc/testsuite/g++.dg/torture/pr63512.C
===
--- gcc/testsuite/g++.dg/torture/pr63512.C  (revision 0)
+++ gcc/testsuite/g++.dg/torture/pr63512.C  (working copy)
@@ -0,0 +1,46 @@
+// { dg-do compile }
+
+extern "C" {
+void __assert_fail ();
+unsigned long strlen (const char *);
+}
+class A
+{
+  int Data;
+  int Length;
+
+public:
+  A (const char *p1) : Data ()
+  {
+p1 ? void() : __assert_fail ();
+Length = strlen (p1);
+  }
+};
+enum TokenKind
+{
+  semi
+};
+class B
+{
+public:
+  void m_fn1 ();
+};
+class C
+{
+  void m_fn2 (TokenKind, int, A);
+  struct D
+  {
+D (int);
+B Range;
+  };
+  int *m_fn3 (const int &, int &, int **);
+};
+int a, b;
+int *
+C::m_fn3 (const int &, int &, int **)
+{
+  D c (0);
+  if (a)
+c.Range.m_fn1 ();
+  m_fn2 (semi, 0, b ? "" : a ? "alias declaration" : "using declaration");
+}


Re: Move loop peeling from RTL to gimple

2014-10-14 Thread Richard Biener
On Tue, 14 Oct 2014, Jan Hubicka wrote:

 Hi,
 this is an update of my 2013 update to the 2012 patch to move RTL loop
 peeling to tree level. This is to expose optimization opportunities earlier.
 Incrementally I think I can also improve profiling to provide a histogram
 on loop iterations and get more sensible peeling decisions.
 
 profiled-bootstrapped/regtested x86_64-linux, OK?

Ok.

Thanks,
Richard.

 Honza
 
   * loop-unroll.c: (decide_unrolling_and_peeling): Rename to
   (decide_unrolling): ... this one.
   (peel_loops_completely): Remove.
   (decide_peel_simple): Remove.
   (decide_peel_once_rolling): Remove.
   (decide_peel_completely): Remove.
   (peel_loop_simple): Remove.
   (peel_loop_completely): Remove.
   (unroll_and_peel_loops): Rename to ...
   (unroll_loops): ... this one; handle only unrolling.
   * cfgloop.h (lpt_dec): Remove LPT_PEEL_COMPLETELY and
   LPT_PEEL_SIMPLE.
   (UAP_PEEL): Remove.
   (unroll_and_peel_loops): Remove.
   (unroll_loops): New.
   * passes.def: Replace
   pass_rtl_unroll_and_peel_loops by pass_rtl_unroll_loops.
   * loop-init.c (gate_rtl_unroll_and_peel_loops,
   rtl_unroll_and_peel_loops): Rename to ...
   (gate_rtl_unroll_loops, rtl_unroll_loops): ... these; update.
   (pass_rtl_unroll_and_peel_loops): Rename to ...
   (pass_rtl_unroll_loops): ... this one.
   * tree-pass.h (make_pass_rtl_unroll_and_peel_loops): Remove.
   (make_pass_rtl_unroll_loops): New.
   * tree-ssa-loop-ivcanon.c: (estimated_peeled_sequence_size, 
 try_peel_loop): New.
   (canonicalize_loop_induction_variables): Update.
 
   * gcc.dg/tree-prof/peel-1.c: Update.
   * gcc.dg/tree-prof/unroll-1.c: Update.
   * gcc.dg/gcc.dg/unroll_1.c: Update.
   * gcc.dg/gcc.dg/unroll_2.c: Update.
   * gcc.dg/gcc.dg/unroll_3.c: Update.
   * gcc.dg/gcc.dg/unroll_4.c: Update.
 Index: tree-pass.h
 ===
 --- tree-pass.h   (revision 216145)
 +++ tree-pass.h   (working copy)
 @@ -504,7 +504,7 @@ extern rtl_opt_pass *make_pass_outof_cfg
  extern rtl_opt_pass *make_pass_loop2 (gcc::context *ctxt);
  extern rtl_opt_pass *make_pass_rtl_loop_init (gcc::context *ctxt);
  extern rtl_opt_pass *make_pass_rtl_move_loop_invariants (gcc::context *ctxt);
 -extern rtl_opt_pass *make_pass_rtl_unroll_and_peel_loops (gcc::context 
 *ctxt);
 +extern rtl_opt_pass *make_pass_rtl_unroll_loops (gcc::context *ctxt);
  extern rtl_opt_pass *make_pass_rtl_doloop (gcc::context *ctxt);
  extern rtl_opt_pass *make_pass_rtl_loop_done (gcc::context *ctxt);
  
 Index: tree-ssa-loop-ivcanon.c
 ===
 --- tree-ssa-loop-ivcanon.c   (revision 216145)
 +++ tree-ssa-loop-ivcanon.c   (working copy)
 @@ -28,9 +28,12 @@ along with GCC; see the file COPYING3.
 variables.  In that case the created optimization possibilities are likely
 to pay up.
  
 -   Additionally in case we detect that it is beneficial to unroll the
 -   loop completely, we do it right here to expose the optimization
 -   possibilities to the following passes.  */
 +   We also perform
 + - complette unrolling (or peeling) when the loops is rolling few enough
 +   times
 + - simple peeling (i.e. copying few initial iterations prior the loop)
 +   when number of iteration estimate is known (typically by the profile
 +   info).  */
  
  #include "config.h"
  #include "system.h"
 @@ -657,11 +660,12 @@ try_unroll_loop_completely (struct loop
   HOST_WIDE_INT maxiter,
   location_t locus)
  {
 -  unsigned HOST_WIDE_INT n_unroll, ninsns, max_unroll, unr_insns;
 +  unsigned HOST_WIDE_INT n_unroll = 0, ninsns, max_unroll, unr_insns;
gimple cond;
struct loop_size size;
bool n_unroll_found = false;
edge edge_to_cancel = NULL;
 +  int report_flags = MSG_OPTIMIZED_LOCATIONS | TDF_RTL | TDF_DETAILS;
  
/* See if we proved number of iterations to be low constant.
  
 @@ -821,6 +825,8 @@ try_unroll_loop_completely (struct loop
 loop->num);
  return false;
   }
 +  dump_printf_loc (report_flags, locus,
 +                   "loop turned into non-loop; it never loops.\n");
  
initialize_original_copy_tables ();
wont_exit = sbitmap_alloc (n_unroll + 1);
 @@ -902,6 +908,133 @@ try_unroll_loop_completely (struct loop
return true;
  }
  
 +/* Return number of instructions after peeling.  */
 +static unsigned HOST_WIDE_INT
 +estimated_peeled_sequence_size (struct loop_size *size,
 + unsigned HOST_WIDE_INT npeel)
 +{
 +  return MAX (npeel * (HOST_WIDE_INT) (size->overall
 +                                       - size->eliminated_by_peeling), 1);
 +}
 +
 +/* If the loop is expected to iterate N times and is
 +   small enough, duplicate the loop body N+1 times before
 +   the loop itself.  

[PATCH][match-and-simplify] Change back default behavior of fold_stmt

2014-10-14 Thread Richard Biener

This changes default behavior of fold_stmt back to _not_ following
SSA use-def chains when trying to simplify things.  I had to force
that already for one caller and for the merge to trunk I'd rather
not track down issues in every other existing caller.

This means that fold_stmt will not become more powerful, at least for now.
I still hope to get rid of its use of fold() during the merge process.
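
As a rough illustration of the idea, the sketch below models a folder that
consults a valueization callback before it looks through an SSA name. It is a
toy model, not the gimple-fold.c API: Defs, Valueize, follow_ssa_edges and
fold_plus are invented for the example, and only the name no_follow_ssa_edges
and its "always return nothing" behaviour come from the patch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model: an SSA name maps to the constant its definition computes.
typedef std::map<std::string, int> Defs;

// A valueization callback returns the known value of NAME, or null
// when the folder must not (or cannot) look through the name.
typedef const int *(*Valueize) (const Defs &, const std::string &);

// Hypothetical callback that does follow use-def chains.
inline const int *
follow_ssa_edges (const Defs &defs, const std::string &name)
{
  Defs::const_iterator it = defs.find (name);
  return it == defs.end () ? nullptr : &it->second;
}

// Mirrors the callback in the patch: never follow an SSA edge.
inline const int *
no_follow_ssa_edges (const Defs &, const std::string &)
{
  return nullptr;
}

// Try to fold "NAME + ADDEND" to a constant.  Returns true and stores
// the result only if the callback supplied a value for NAME.
inline bool
fold_plus (const Defs &defs, const std::string &name, int addend,
           Valueize valueize, int *result)
{
  const int *value = valueize (defs, name);
  if (!value)
    return false;
  *result = *value + addend;
  return true;
}

// Self-check: the same statement folds or not depending on the callback.
inline void
demo ()
{
  Defs defs;
  defs["name_1"] = 4;
  int result = 0;
  assert (fold_plus (defs, "name_1", -1, follow_ssa_edges, &result));
  assert (result == 3);
  assert (!fold_plus (defs, "name_1", -1, no_follow_ssa_edges, &result));
}
```

With the "never follow" callback the folder only sees the statement itself,
which corresponds to the old, less powerful fold_stmt behaviour described
above.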

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.

(yeah, I'm preparing a first batch of changes to merge from the
branch)

Richard.

2014-10-14  Richard Biener  rguent...@suse.de

* gimple-fold.c (fold_stmt): Make old API never follow SSA edges
when simplifying.
(no_follow_ssa_edges): New function.
* tree-cfg.c (no_follow_ssa_edges): Remove.
(replace_uses_by): Use plain fold_stmt again.

Index: gcc/gimple-fold.c
===
--- gcc/gimple-fold.c   (revision 216146)
+++ gcc/gimple-fold.c   (working copy)
@@ -3136,6 +3136,14 @@ fail:
   return changed;
 }
 
+/* Valueziation callback that ends up not following SSA edges.  */
+
+static tree
+no_follow_ssa_edges (tree)
+{
+  return NULL_TREE;
+}
+
 /* Fold the statement pointed to by GSI.  In some cases, this function may
replace the whole statement with a new one.  Returns true iff folding
makes any changes.
@@ -3146,7 +3154,7 @@ fail:
 bool
 fold_stmt (gimple_stmt_iterator *gsi)
 {
-  return fold_stmt_1 (gsi, false, NULL);
+  return fold_stmt_1 (gsi, false, no_follow_ssa_edges);
 }
 
 bool
@@ -3167,7 +3175,7 @@ bool
 fold_stmt_inplace (gimple_stmt_iterator *gsi)
 {
   gimple stmt = gsi_stmt (*gsi);
-  bool changed = fold_stmt_1 (gsi, true, NULL);
+  bool changed = fold_stmt_1 (gsi, true, no_follow_ssa_edges);
   gcc_assert (gsi_stmt (*gsi) == stmt);
   return changed;
 }
Index: gcc/tree-cfg.c
===
--- gcc/tree-cfg.c  (revision 216146)
+++ gcc/tree-cfg.c  (working copy)
@@ -1709,14 +1709,6 @@ gimple_can_merge_blocks_p (basic_block a
   return true;
 }
 
-/* ???  Maybe this should be a generic overload of fold_stmt.  */
-
-static tree
-no_follow_ssa_edges (tree)
-{
-  return NULL_TREE;
-}
-
 /* Replaces all uses of NAME by VAL.  */
 
 void
@@ -1773,17 +1765,7 @@ replace_uses_by (tree name, tree val)
  recompute_tree_invariant_for_addr_expr (op);
  }
 
- /* If we have sth like
-  neighbor_29 = name + -1;
-  _33 = name + neighbor_29;
-and substitute 1 for name then when visiting
-_33 first then folding will simplify the stmt
-to _33 = name; and the new immediate use will
-be inserted before the stmt iterator marker and
-thus we fail to visit it again, ICEing within the
-has_zero_uses assert.
-Avoid that by never following SSA edges.  */
- if (fold_stmt (&gsi, no_follow_ssa_edges))
+ if (fold_stmt (&gsi))
stmt = gsi_stmt (gsi);
 
  if (maybe_clean_or_replace_eh_stmt (orig_stmt, stmt))


[v3] Rename a few testcases

2014-10-14 Thread Paolo Carlini

Hi,

I'm renaming a few testcases which actually are about alias declarations 
not typedefs.


Thanks,
Paolo.


2014-10-14  Paolo Carlini  paolo.carl...@oracle.com

* testsuite/20_util/add_lvalue_reference/requirements/typedefs.cc:
Rename to alias_decl.cc.
* testsuite/20_util/add_rvalue_reference/requirements/typedefs.cc:
Likewise.
* testsuite/20_util/common_type/requirements/typedefs-3.cc: Likewise.
* testsuite/20_util/conditional/requirements/typedefs-2.cc: Likewise.
* testsuite/20_util/decay/requirements/typedefs-2.cc: Likewise.
* testsuite/20_util/enable_if/requirements/typedefs-2.cc: Likewise.
* testsuite/20_util/make_signed/requirements/typedefs-3.cc: Likewise.
* testsuite/20_util/make_unsigned/requirements/typedefs-3.cc:
Likewise.
* testsuite/20_util/remove_reference/requirements/typedefs.cc:
Likewise.
* testsuite/20_util/result_of/requirements/typedefs.cc: Likewise.
* testsuite/20_util/underlying_type/requirements/typedefs-3.cc:
Likewise.


Re: [PATCH 3/5] timevar.h: Add an auto_timevar class

2014-10-14 Thread Richard Biener
On Mon, Oct 13, 2014 at 7:45 PM, David Malcolm dmalc...@redhat.com wrote:
 This is used in a couple of places in jit/jit-playback.c to ensure
 that we pop the timevar on every exit path from a function.

 I could rewrite them if need be, but it does simplify things.

Sorry to be bikeshedding but auto_timevar sounds odd - this is
just a one-element timevar stack.

Don't have a real better name though :/  Maybe timevar_pushpop ?

Otherwise this looks ok.

Thanks,
Richard.

 Written by Tom Tromey.

 gcc/ChangeLog:
 * timevar.h (class auto_timevar): New class.
 ---
  gcc/timevar.h | 24 
  1 file changed, 24 insertions(+)

 diff --git a/gcc/timevar.h b/gcc/timevar.h
 index 6703cc9..f018e39 100644
 --- a/gcc/timevar.h
 +++ b/gcc/timevar.h
 @@ -110,6 +110,30 @@ timevar_pop (timevar_id_t tv)
  timevar_pop_1 (tv);
  }

 +// This is a simple timevar wrapper class that pushes a timevar in its
 +// constructor and pops the timevar in its destructor.
 +class auto_timevar
 +{
 + public:
 +  auto_timevar (timevar_id_t tv)
 +: m_tv (tv)
 +  {
 +timevar_push (m_tv);
 +  }
 +
 +  ~auto_timevar ()
 +  {
 +timevar_pop (m_tv);
 +  }
 +
 + private:
 +
 +  // Private to disallow copies.
 +  auto_timevar (const auto_timevar &);
 +
 +  timevar_id_t m_tv;
 +};
 +
  extern void print_time (const char *, long);

  #endif /* ! GCC_TIMEVAR_H */
 --
 1.8.5.3
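
For context, the pattern under discussion is a plain RAII guard. The sketch
below shows the same shape against a toy stack so that it can be compiled on
its own; timer_stack, scoped_timer and work are hypothetical stand-ins and not
part of timevar.h. Unlike the quoted patch, which declares the copy
constructor private, the sketch uses C++11 deleted functions to forbid copies.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the timevar stack.
static std::vector<int> timer_stack;

// Pushes an id in the constructor and pops it in the destructor, so
// the pop happens on every exit path of the enclosing scope.
class scoped_timer
{
 public:
  explicit scoped_timer (int id)
    : m_id (id)
  {
    timer_stack.push_back (m_id);
  }

  ~scoped_timer ()
  {
    assert (!timer_stack.empty () && timer_stack.back () == m_id);
    timer_stack.pop_back ();
  }

  // Copies would pop twice, so forbid them.
  scoped_timer (const scoped_timer &) = delete;
  scoped_timer &operator= (const scoped_timer &) = delete;

 private:
  int m_id;
};

// A function with two exit paths; neither needs an explicit pop.
inline int
work (bool bail_early)
{
  scoped_timer timer (42);
  if (bail_early)
    return -1;
  return static_cast<int> (timer_stack.size ());
}
```

Calling work (true) and work (false) both leave timer_stack empty afterwards,
which is the property the class is meant to guarantee.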



Re: [PATCH 1/2] Revert PR49721's patch

2014-10-14 Thread Richard Biener
On Tue, Oct 14, 2014 at 12:35 AM, Andrew Pinski pins...@gmail.com wrote:
 On Fri, Aug 8, 2014 at 8:51 PM, Andrew Pinski apin...@cavium.com wrote:
 OK? When the second patch is approved?

 Ping?

Ok if the second patch was approved.

Richard.


 Thanks,
 Andrew Pinski

 ChangeLog:
 Revert:
 2011-08-19  H.J. Lu  hongjiu...@intel.com

 PR middle-end/49721
 * explow.c (convert_memory_address_addr_space): Also permute the
 conversion and addition of constant for zero-extend.

 ---
  gcc/explow.c |   19 +++
  1 files changed, 7 insertions(+), 12 deletions(-)

 diff --git a/gcc/explow.c b/gcc/explow.c
 index 92c4e57..eb7dc85 100644
 --- a/gcc/explow.c
 +++ b/gcc/explow.c
 @@ -376,23 +376,18 @@ convert_memory_address_addr_space (enum machine_mode 
 to_mode ATTRIBUTE_UNUSED,

  case PLUS:
  case MULT:
 -  /* FIXME: For addition, we used to permute the conversion and
 -addition operation only if one operand is a constant and
 -converting the constant does not change it or if one operand
 -is a constant and we are using a ptr_extend instruction
 -(POINTERS_EXTEND_UNSIGNED < 0) even if the resulting address
 -may overflow/underflow.  We relax the condition to include
 -zero-extend (POINTERS_EXTEND_UNSIGNED > 0) since the other
 -parts of the compiler depend on it.  See PR 49721.
 -
 +  /* For addition we can safely permute the conversion and addition
 +operation if one operand is a constant and converting the constant
 +does not change it or if one operand is a constant and we are
 +using a ptr_extend instruction  (POINTERS_EXTEND_UNSIGNED < 0).
  We can always safely permute them if we are making the address
  narrower.  */
if (GET_MODE_SIZE (to_mode) < GET_MODE_SIZE (from_mode)
   || (GET_CODE (x) == PLUS
       && CONST_INT_P (XEXP (x, 1))
 -  && (POINTERS_EXTEND_UNSIGNED != 0
 - || XEXP (x, 1) == convert_memory_address_addr_space
 -   (to_mode, XEXP (x, 1), as))))
 +  && (XEXP (x, 1) == convert_memory_address_addr_space
 +  (to_mode, XEXP (x, 1), as)
 + || POINTERS_EXTEND_UNSIGNED < 0)))
 return gen_rtx_fmt_ee (GET_CODE (x), to_mode,
convert_memory_address_addr_space
  (to_mode, XEXP (x, 0), as),
 --
 1.7.2.5



Re: [PATCH 4/n] OpenMP 4.0 offloading infrastructure: lto-wrapper

2014-10-14 Thread Jakub Jelinek
On Tue, Oct 14, 2014 at 02:42:47AM +0400, Ilya Verbin wrote:
  For that I guess
  lhd_begin_section
  would need to replace:
section = get_section (name, SECTION_DEBUG, NULL);
  with:
section = get_section (name, SECTION_DEBUG | SECTION_EXCLUDE, NULL);
  either just for the .gnu.offload_lto prefixed section, or all.
  The question is what will old assemblers and/or linkers do with that, and
  if there are any that support linker plugins, but not SHF_EXCLUDE.
 
 I've tried to set SECTION_EXCLUDE bit with as+ld version 2.20.51 and got a lot
 of warnings like:
 
 /tmp/ccg7P7iS.s:2: Warning: entity size for SHF_MERGE not specified
 /tmp/ccg7P7iS.s:2: Warning: group name for SHF_GROUP not specified
 as: /tmp/ccKFKXfc.o: warning: sh_link not set for section 
 `.gnu.lto_main.11d9780ff2ebf166'
 /usr/bin/ld: /tmp/ccKFKXfc.o: warning: sh_link not set for section 
 `.gnu.lto_main.11d9780ff2ebf166'
 
 I think, it can be placed under such ifdef:
 
 #if defined (HAVE_SECTION_EXCLUDE) && HAVE_SECTION_EXCLUDE == 1
   section = get_section (name, SECTION_DEBUG | SECTION_EXCLUDE, NULL);
 #else
   section = get_section (name, SECTION_DEBUG, NULL);
 #endif
 
 Currently there is HAVE_GAS_SECTION_EXCLUDE implemented in gcc/configure.ac, 
 and
 HAVE_SECTION_EXCLUDE can use it + check a version of the linker.

My preference would be to add the | SECTION_EXCLUDE unconditionally, and
instead guard the
  if (flags & SECTION_EXCLUDE)
    *f++ = 'e';
in varasm.c (default_elf_asm_named_section).  The only other user of
SECTION_EXCLUDE seems to be -gsplit-dwarf right now, Cary, is such a change
ok with you?

If you have new gas and old linker, I'd expect it would just ignore
SHF_EXCLUDE.
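
A minimal sketch of that guard follows, assuming a boolean that says whether
the assembler understands the 'e' flag. This is not the varasm.c code: the
TOY_* constants, their values and section_flag_chars are invented for
illustration. Only the 'e' character for the exclude bit is taken from the
snippet above; 'w' is just one more example flag.

```cpp
#include <cassert>
#include <string>

// Toy flag bits; the real SECTION_* values in GCC differ.
enum
{
  TOY_SECTION_WRITE = 1,
  TOY_SECTION_DEBUG = 2,
  TOY_SECTION_EXCLUDE = 4
};

// Build the flag characters for a .section directive.  The exclude
// bit is always accepted from callers, but 'e' is only emitted when
// the assembler is known to support it.
inline std::string
section_flag_chars (unsigned flags, bool gas_supports_exclude)
{
  std::string chars;
  if (flags & TOY_SECTION_WRITE)
    chars += 'w';
  if ((flags & TOY_SECTION_EXCLUDE) && gas_supports_exclude)
    chars += 'e';
  return chars;
}

// Self-check: the same request degrades gracefully on an old assembler.
inline void
demo ()
{
  unsigned flags = TOY_SECTION_DEBUG | TOY_SECTION_EXCLUDE;
  assert (section_flag_chars (flags, true) == "e");
  assert (section_flag_chars (flags, false) == "");
}
```

The point of guarding at emission time is that callers can request the
exclude bit unconditionally and need no #ifdef of their own.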

Jakub


[PING] [PATCH] Fix PR ipa/61190, 2nd edition

2014-10-14 Thread Bernd Edlinger
Ping...

see: https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00536.html

 Hi Honza,


 as you know, we have a wrong-code bug when a pure or const method is called
 via a virtual thunk.
 I had some more ideas on how to fix that, but all of them had some serious
 drawbacks, so I leave the details out...


 But now I have a new insight into why the obvious fix for this serious code
 generation bug did not work in the first place.


 The reason was that if ipa-pure-const.c calls set_const_flag or
 set_pure_flag for a thunk, it calls the same function later for the called
 method, and this overwrites the flags of _all_ associated thunks and aliases.
 However, that should at least not be done for virtual thunks, as these need
 to be IPA_NEITHER even if the method itself has different attributes. That is
 because the assembler thunk accesses the vtable, while other thunks do not.


 So I refactored set_const_flag and set_pure_flag to exclude the virtual
 thunks, taking care that other users of call_for_symbol_thunks_and_aliases
 do not get different behavior than before this patch.
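
A rough sketch of the idea, not the attached patch: when a flag is propagated from a method to its thunks and aliases, virtual thunks are skipped so they stay IPA_NEITHER. The Node type, its fields and the function name are hypothetical simplifications of the cgraph structures.

```cpp
#include <vector>

enum PureConstState { IPA_CONST, IPA_PURE, IPA_NEITHER };

// Hypothetical, much simplified stand-in for a call graph node.
struct Node
{
  bool is_thunk = false;
  bool is_virtual_thunk = false;   // thunk that reads the vtable
  PureConstState state = IPA_NEITHER;
  std::vector<Node *> thunks_and_aliases;
};

// Set STATE on NODE and on its thunks and aliases, but leave virtual
// thunks alone: they access the vtable and must remain IPA_NEITHER.
void
set_state_skipping_virtual_thunks (Node &node, PureConstState state)
{
  node.state = state;
  for (Node *t : node.thunks_and_aliases)
    {
      if (t->is_thunk && t->is_virtual_thunk)
        continue;
      set_state_skipping_virtual_thunks (*t, state);
    }
}
```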


 The attached patch was bootstrapped and
 regression-tested on x86_64-linux-gnu.
 Ok for trunk?


 PS: As a side note, there are two identical functions named
 call_for_symbol_and_aliases, in class symtab_node and in class cgraph_node,
 which inherits from symtab_node.  Neither function is declared virtual.
 Is that what's intended?  Usually this could lead to errors, or at least
 some serious compiler warnings.
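
The concern in the PS can be illustrated with a small self-contained example. The class and function names below are made up; only the language rule (a non-virtual function in a derived class hides the base one and is chosen by static type) is the point.

```cpp
// Two classes with the same non-virtual member function, loosely
// mirroring the symtab_node / cgraph_node situation (names are made up).
struct base_node
{
  int call_for_aliases () { return 1; }
};

struct derived_node : base_node
{
  // Hides base_node::call_for_aliases; it does not override it.
  int call_for_aliases () { return 2; }
};

// The static type of the reference decides which function runs.
int
call_through_base (base_node &n)
{
  return n.call_for_aliases ();
}
```

Calling through a base reference runs the base version even when the object is a derived_node, which is the kind of silent mismatch the question is about.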


 Thanks
 Bernd.

  

[PATCH][match-and-simplify] More TLC to genmatch

2014-10-14 Thread Richard Biener

This applies more comment / whitespace TLC to genmatch and does
minor refactoring on-the-fly.

Bootstrap running on x86_64-unknown-linux-gnu.

Richard.

2014-10-14  Richard Biener  rguent...@suse.de

* genmatch.c: Whitespace and comment fixes, some minor
refactoring.

Index: gcc/genmatch.c
===
--- gcc/genmatch.c  (revision 216146)
+++ gcc/genmatch.c  (working copy)
@@ -390,7 +390,8 @@ struct expr : public operand
   /* Whether the operation is to be applied commutatively.  This is
  later lowered to two separate patterns.  */
   bool is_commutative;
-  virtual void gen_transform (FILE *f, const char *, bool, int, const char *, dt_operand ** = 0);
+  virtual void gen_transform (FILE *f, const char *, bool, int,
+                              const char *, dt_operand ** = 0);
 };
 
 /* An operator that is represented by native C code.  This is always
@@ -419,7 +420,8 @@ struct c_expr : public operand
   unsigned nr_stmts;
   /* The identifier replacement vector.  */
   vec<id_tab> ids;
-  virtual void gen_transform (FILE *f, const char *, bool, int, const char *, dt_operand **);
+  virtual void gen_transform (FILE *f, const char *, bool, int,
+                              const char *, dt_operand **);
 };
 
 /* A wrapper around another operand that captures its value.  */
@@ -432,7 +434,8 @@ struct capture : public operand
   unsigned where;
   /* The captured value.  */
   operand *what;
-  virtual void gen_transform (FILE *f, const char *, bool, int, const char *, dt_operand ** = 0);
+  virtual void gen_transform (FILE *f, const char *, bool, int,
+                              const char *, dt_operand ** = 0);
 };
 
 template<>
@@ -569,7 +572,8 @@ print_matches (struct simplify *s, FILE
 /* Lowering of commutative operators.  */
 
 static void
-cartesian_product (const vec< vec<operand *> >& ops_vector, vec< vec<operand *> >& result, vec<operand *>& v, unsigned n)
+cartesian_product (const vec< vec<operand *> >& ops_vector,
+                   vec< vec<operand *> >& result, vec<operand *>& v, unsigned n)
 {
   if (n == ops_vector.length ())
 {
@@ -584,14 +588,8 @@ cartesian_product (const vec vecoperan
   cartesian_product (ops_vector, result, v, n + 1);
 }
 }
- 
-static void
-cartesian_product (const vec< vec<operand *> >& ops_vector, vec< vec<operand *> >& result, unsigned n_ops)
-{
-  vec<operand *> v = vNULL;
-  v.safe_grow_cleared (n_ops);
-  cartesian_product (ops_vector, result, v, 0);
-}
+
+/* Lower OP to two operands in case it is marked as commutative.  */
 
 static vec<operand *>
 commutate (operand *op)
@@ -625,8 +623,11 @@ commutate (operand *op)
   for (unsigned i = 0; i < e->ops.length (); ++i)
     ops_vector.safe_push (commutate (e->ops[i]));
 
-  vec< vec<operand *> > result = vNULL;
-  cartesian_product (ops_vector, result, e->ops.length ());
+  auto_vec< vec<operand *> > result;
+  auto_vec<operand *> v (e->ops.length ());
+  v.quick_grow_cleared (e->ops.length ());
+  cartesian_product (ops_vector, result, v, 0);
+
 
   for (unsigned i = 0; i < result.length (); ++i)
 {
@@ -651,6 +652,9 @@ commutate (operand *op)
   return ret;
 }
 
+/* Lower operations marked as commutative in the AST of S and push
+   the resulting patterns to SIMPLIFIERS.  */
+
 static void
 lower_commutative (simplify *s, vec<simplify *>& simplifiers)
 {
@@ -664,15 +668,16 @@ lower_commutative (simplify *s, vecsimp
 }
 }
 
-/* Lowering of conditional converts.  */
+/* Strip conditional conversions using operator OPER from O and its
+   children if STRIP, else replace them with an unconditional convert.  */
 
-static operand *
-lower_opt_convert (operand *o, enum tree_code oper)
+operand *
+lower_opt_convert (operand *o, enum tree_code oper, bool strip)
 {
-  if (capture *c = dyn_cast<capture *> (o))  
+  if (capture *c = dyn_cast<capture *> (o))
     {
       if (c->what)
-        return new capture (c->where, lower_opt_convert (c->what, oper));
+        return new capture (c->where, lower_opt_convert (c->what, oper, strip));
       else
         return c;
     }
@@ -683,42 +688,23 @@ lower_opt_convert (operand *o, enum tree
 
   if (*e->operation == oper)
     {
+      if (strip)
+        return lower_opt_convert (e->ops[0], oper, strip);
+
       expr *ne = new expr (get_operator (CONVERT_EXPR));
-      ne->append_op (lower_opt_convert (e->ops[0], oper));
+      ne->append_op (lower_opt_convert (e->ops[0], oper, strip));
       return ne; 
     }
 
   expr *ne = new expr (e->operation, e->is_commutative);
   for (unsigned i = 0; i < e->ops.length (); ++i)
-    ne->append_op (lower_opt_convert (e->ops[i], oper));
+    ne->append_op (lower_opt_convert (e->ops[i], oper, strip));
 
   return ne;
 }
 
-operand *
-remove_opt_convert (operand *o, enum tree_code oper)
-{
-  if (capture *c = dyn_cast<capture *> (o))
-    {
-      if (c->what)
-        return new capture (c->where, remove_opt_convert (c->what, oper));
-      else
-        return c;
-    }
-
-  expr *e = as_a<expr *> (o);
-  if (!e)
-
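
For readers unfamiliar with the cartesian_product helper touched by the patch above, here is a standalone sketch of the same recursion using std::vector instead of GCC's vec. It is an illustration under that assumption, not the genmatch.c code: each inner vector holds the alternatives for one operand, and the result lists every combination.

```cpp
#include <cstddef>
#include <vector>

// Enumerate every way of picking one element from each inner vector of
// OPS_VECTOR.  V is scratch space with one slot per inner vector and N is
// the index of the slot being filled.
template <typename T>
void
cartesian_product (const std::vector<std::vector<T>> &ops_vector,
                   std::vector<std::vector<T>> &result,
                   std::vector<T> &v, std::size_t n)
{
  if (n == ops_vector.size ())
    {
      result.push_back (v);
      return;
    }

  for (std::size_t i = 0; i < ops_vector[n].size (); ++i)
    {
      v[n] = ops_vector[n][i];
      cartesian_product (ops_vector, result, v, n + 1);
    }
}
```

As in the patched commutate, the caller sizes the scratch vector to the number of operands and starts the recursion at index 0.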

Re: [PATCH, Pointer Bounds Checker 14/x] Passes [4/n] Memory accesses instrumentation

2014-10-14 Thread Ilya Enkovich
On 13 Oct 14:52, Jeff Law wrote:
 On 10/08/14 13:01, Ilya Enkovich wrote:
 Hi,
 
 This is the main chunk of instrumentation codes.  This patch introduces 
 instrumentation pass which instruments memory accesses.
 
 Thanks,
 Ilya
 --
 2014-10-08  Ilya Enkovich  ilya.enkov...@intel.com
 
  * tree-chkp.c (chkp_may_complete_phi_bounds): New.
  (chkp_may_finish_incomplete_bounds): New.
  (chkp_recompute_phi_bounds): New.
  (chkp_find_valid_phi_bounds): New.
  (chkp_finish_incomplete_bounds): New.
  (chkp_maybe_copy_and_register_bounds): New.
  (chkp_build_returned_bound): New.
  (chkp_get_bound_for_parm): New.
  (chkp_compute_bounds_for_assignment): New.
  (chkp_get_bounds_by_definition): New.
  (chkp_get_bounds_for_decl_addr): New.
  (chkp_get_bounds_for_string_cst): New.
  (chkp_parse_array_and_component_ref): New.
  (chkp_make_addressed_object_bounds): New.
  (chkp_find_bounds_1): New.
  (chkp_find_bounds): New.
  (chkp_find_bounds_loaded): New.
  (chkp_copy_bounds_for_elem): New.
  (chkp_process_stmt): New.
  (chkp_fix_cfg): New.
  (chkp_instrument_function): New.
  (chkp_fini): New.
  (chkp_execute): New.
  (chkp_gate): New.
  (pass_data_chkp): New.
  (pass_chkp): New.
  (make_pass_chkp): New.
 
 
 @@ -491,6 +910,129 @@ chkp_get_bounds_var (tree ptr_var)
 return bnd_var;
   }
 
 +
 +
 +/* Register bounds BND for object PTR in global bounds table.
 +   A copy of bounds may be created for abnormal ssa names.
 +   Returns bounds to use for PTR.  */
 +static tree
 +chkp_maybe_copy_and_register_bounds (tree ptr, tree bnd)
 +{
 +  bool abnormal_ptr;
 +
 +  if (!chkp_reg_bounds)
 +return bnd;
 +
 +  /* Do nothing if bounds are incomplete_bounds
 + because it means bounds will be recomputed.  */
 +  if (bnd == incomplete_bounds)
 +return bnd;
 +
 +  abnormal_ptr = (TREE_CODE (ptr) == SSA_NAME
 +                  && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (ptr)
 +                  && gimple_code (SSA_NAME_DEF_STMT (ptr)) != GIMPLE_PHI);
 +
 +  /* A single bounds value may be reused multiple times for
 + different pointer values.  It may cause coalescing issues
 + for abnormal SSA names.  To avoid it we create a bounds
 + copy in case it is copmputed for abnormal SSA name.
 s/copmputed/computed/
 
  +  if (!bounds)
 +{
 +      tree orig_decl = cgraph_node::get (cfun->decl)->orig_decl;
 +
 +      /* For static chain param we return zero bounds
 +         because currently we do not check dereferences
 +         of this pointer.  */
 +      /* ?? Is it a correct way to identify such parm?  */
 +      if (cfun->decl && DECL_STATIC_CHAIN (cfun->decl)
 +          && DECL_ARTIFICIAL (decl))
 +        bounds = chkp_get_zero_bounds ();
 Are you just looking for the parameter in which we pass the static
 chain?   Look at get_chain_decl for how we set it up.  You may
 actually have to peek at more fields.  I don't think there's a
 single magic bit that says this is the static chain.  Though it
 may always appear in the same location on the parameter list.
 Nested functions aren't something I'd poked with much.  Richard
 Henderson might know more since he wrote tree-nested a while back.

Looking through tree-nested.c, I found that the function structure has a
static_chain_decl field holding the created decl.

 
 @@ -1107,6 +1821,323 @@ chkp_build_bndstx (tree addr, tree ptr, tree bounds,
   }
   }
 
 +/* Compute bounds for pointer NODE which was assigned in
 +   assignment statement ASSIGN.  Return computed bounds.  */
 +static tree
 +chkp_compute_bounds_for_assignment (tree node, gimple assign)
 Ugh.  Note how this introduces another place that anyone who might
 add a new RHS gimple statement needs to edit.  We need a pointer
 back to this code so that folks will know it needs updating.  The
 question is where to put it.
 
 Basically we want a place where anyone adding a new code that can
 appear on the RHS of an assignment must change already.  Thoughts on
 a good location?
 
 I realize there's probably many other places that probably need
 these kinds of documentation back links, I'm not asking you to
 address all of them.

Actually, it shouldn't be critical if some new RHS code reaches this switch.
We can always say that we cannot find proper bounds and use the default ones.
I replaced gcc_unreachable with a warning about lost bounds and added a
comment to tree.def.  Would that be enough?
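
A minimal sketch of that fallback, with made-up enumerators rather than real tree codes: known RHS codes get proper bounds, and anything else produces a warning and default bounds instead of hitting gcc_unreachable.

```cpp
#include <cstdio>

// Hypothetical stand-ins for RHS codes and for the kinds of bounds.
enum rhs_code { RHS_SSA_NAME, RHS_POINTER_PLUS, RHS_SOME_NEW_CODE };
enum bounds_kind { BOUNDS_FROM_OPERAND, BOUNDS_DEFAULT };

// Pick bounds for an assignment by its RHS code.  Unknown codes are not
// fatal: warn that bounds are lost and fall back to the default ones.
bounds_kind
bounds_for_rhs (rhs_code code)
{
  switch (code)
    {
    case RHS_SSA_NAME:
    case RHS_POINTER_PLUS:
      return BOUNDS_FROM_OPERAND;

    default:
      std::fprintf (stderr, "warning: bounds lost for unhandled RHS code %d\n",
                    static_cast<int> (code));
      return BOUNDS_DEFAULT;
    }
}
```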

 
 
 
 +/* Compute and returne bounds for address of OBJ.  */
 s/returne/return
 
 
 +
 +/* Some code transformation made during instrumentation pass
 +   may put code into inconsistent state.  Here we find and fix
 +   such flaws.  */
 +static void
 +chkp_fix_cfg ()
 Presumably none of the code you're inserting that causes these
 problems is ever supposed to be executed on the non-fallthru edge?
 Else your creative method of hiding the abnormal nature of the
 edge for a period of time, then recreating it won't work.
 
 I'm a bit worried by this 

[PATCH][match-and-simplify] Update texi documentation

2014-10-14 Thread Richard Biener

This updates it with changed/added features.

pdf-build checked and inspected, applied.

Richard.

2014-10-14  Richard Biener  rguent...@suse.de

* doc/match-and-simplify.texi: Update.

Index: gcc/doc/match-and-simplify.texi
===
--- gcc/doc/match-and-simplify.texi (revision 216146)
+++ gcc/doc/match-and-simplify.texi (working copy)
@@ -38,6 +38,8 @@ APIs are introduced.
 @deftypefnx {GIMPLE function} tree gimple_simplify (enum tree_code, tree, tree, tree, gimple_seq *, tree (*)(tree))
 @deftypefnx {GIMPLE function} tree gimple_simplify (enum tree_code, tree, tree, tree, tree, gimple_seq *, tree (*)(tree))
 @deftypefnx {GIMPLE function} tree gimple_simplify (enum built_in_function, tree, tree, gimple_seq *, tree (*)(tree))
+@deftypefnx {GIMPLE function} tree gimple_simplify (enum built_in_function, tree, tree, tree, gimple_seq *, tree (*)(tree))
+@deftypefnx {GIMPLE function} tree gimple_simplify (enum built_in_function, tree, tree, tree, gimple_seq *, tree (*)(tree))
 The main GIMPLE API entry to the expression simplifications mimicing
 that of the GENERIC fold_@{unary,binary,ternary@} functions.
 @end deftypefn
@@ -48,22 +50,27 @@ inserted on (if @code{NULL} then simplif
 are not performed) and a valueization hook that can be used to
 tie simplifications to a SSA lattice.
 
-In addition to those APIs a fold_stmt-like interface is provided with
+In addition to those APIs @code{fold_stmt} is overloaded with
+a valueization hook:
 
-@deftypefn bool gimple_simplify (gimple_stmt_iterator *, tree (*)(tree));
+@deftypefn bool fold_stmt (gimple_stmt_iterator *, tree (*)(tree));
 @end deftypefn
 
-which also has the additional valueization hook.
 
 Ontop of these a @code{fold_buildN}-like API for GIMPLE is introduced:
 
-@deftypefn tree gimple_build (gimple_seq *, location_t, enum tree_code, tree, tree, tree (*valueize) (tree) = NULL);
-@deftypefnx tree gimple_build (gimple_seq *, location_t, enum tree_code, tree, tree, tree, tree (*valueize) (tree) = NULL);
-@deftypefnx tree gimple_build (gimple_seq *, location_t, enum tree_code, tree, tree, tree, tree, tree (*valueize) (tree) = NULL);
-@deftypefnx tree gimple_build (gimple_seq *, location_t, enum built_in_function, tree, tree, tree (*valueize) (tree) = NULL);
+@deftypefn {GIMPLE function} tree gimple_build (gimple_seq *, location_t, enum tree_code, tree, tree, tree (*valueize) (tree) = NULL);
+@deftypefnx {GIMPLE function} tree gimple_build (gimple_seq *, location_t, enum tree_code, tree, tree, tree, tree (*valueize) (tree) = NULL);
+@deftypefnx {GIMPLE function} tree gimple_build (gimple_seq *, location_t, enum tree_code, tree, tree, tree, tree, tree (*valueize) (tree) = NULL);
+@deftypefnx {GIMPLE function} tree gimple_build (gimple_seq *, location_t, enum built_in_function, tree, tree, tree (*valueize) (tree) = NULL);
+@deftypefnx {GIMPLE function} tree gimple_build (gimple_seq *, location_t, enum built_in_function, tree, tree, tree, tree (*valueize) (tree) = NULL);
+@deftypefnx {GIMPLE function} tree gimple_convert (gimple_seq *, location_t, tree, tree);
 @end deftypefn
 
-which is supposed to replace @code{force_gimple_operand (fold_buildN (...), ...)}.
+which is supposed to replace @code{force_gimple_operand (fold_buildN (...), ...)}
+and calls to @code{fold_convert}.  Overloads without the @code{location_t}
+argument exist.  Built statements are inserted on the provided sequence
+and simplification is performed using the optional valueization hook.
 
 
 @node The Language
@@ -72,7 +79,7 @@ which is supposed to replace @code{force
 
 The language to write expression simplifications in resembles other
 domain-specific languages GCC uses.  Thus it is lispy.  Lets start
-with an example from the match.pd file on the branch:
+with an example from the match.pd file:
 
 @smallexample
 (simplify
@@ -86,13 +93,14 @@ That contains at least two operands - an
 with the GIMPLE or GENERIC IL and a replacement expression that is
 returned if the match was successful.
 
-Expressions have an ID, @code{bit_and} in this case.  Expressions can
+Expressions have an operator ID, @code{bit_and} in this case.  Expressions can
 be lower-case tree codes with @code{_expr} stripped off or builtin
 function code names in all-caps, like @code{BUILT_IN_SQRT}.
 
 @code{@@n} denotes a so-called capture.  It captures the operand and lets
 you refer to it in other places of the match-and-simplify.  In the
-above example it is refered to in the replacement expression.
+above example it is refered to in the replacement expression.  Captures
+are @code{@@} followed by a number or an identifier.
 
 @smallexample
 (simplify
@@ -103,7 +111,8 @@ above example it is refered to in the re
 In this example @code{@@0} is mentioned twice which constrains the matched
 expression to have two equal operands.  This example also introduces
 operands written in C code.  These can be used in the expression
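
To make the "captured twice" rule from the documentation text above concrete, here is a toy matcher. It is only a sketch: the Expr type is invented, and the pattern (simplify (bit_and @0 @0) @0) is a hypothetical example chosen for illustration, not necessarily one quoted from match.pd.

```cpp
// Invented miniature expression type; real GIMPLE/GENERIC trees differ.
struct Expr
{
  enum Kind { SSA_NAME, CONSTANT, BIT_AND } kind;
  int id;                  // name index or constant value for leaves
  const Expr *op0, *op1;   // operands of BIT_AND, nullptr for leaves
};

// Structural equality, standing in for the check that a capture used
// twice must match two equal operands.
bool
operands_equal (const Expr *a, const Expr *b)
{
  if (a == b)
    return true;
  if (!a || !b || a->kind != b->kind)
    return false;
  if (a->kind == Expr::BIT_AND)
    return operands_equal (a->op0, b->op0) && operands_equal (a->op1, b->op1);
  return a->id == b->id;
}

// Toy version of the hypothetical pattern (simplify (bit_and @0 @0) @0):
// return the replacement on a match, nullptr otherwise.
const Expr *
simplify_bit_and_self (const Expr *e)
{
  if (e->kind == Expr::BIT_AND && operands_equal (e->op0, e->op1))
    return e->op0;
  return nullptr;
}
```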

Re: [PATCH, Pointer Bounds Checker 14/x] Passes [6/n] Instrument calls and returns

2014-10-14 Thread Ilya Enkovich
On 13 Oct 14:49, Ilya Enkovich wrote:
 On 10 Oct 12:50, Jeff Law wrote:
  On 10/08/14 13:04, Ilya Enkovich wrote:
  Hi,
  
  This patch adds instrumentation of calls and returns into instrumentation 
  pass.
  
  Thanks,
  Ilya
  --
  2014-10-08  Ilya Enkovich  ilya.enkov...@intel.com
  
 * tree-chkp.c (chkp_add_bounds_to_ret_stmt): New.
 (chkp_replace_address_check_builtin): New.
 (chkp_replace_extract_builtin): New.
 (chkp_find_bounds_for_elem): New.
 (chkp_add_bounds_to_call_stmt): New.
 (chkp_instrument_function): Instrument rets and calls.
  
  
  [ snip ]
  
  +/* Additionall we need to add bounds
  s/Additionall/Additionally/
  
  OK with that nit fixed.
  
  jeff
 
 Here is a fixed version.
 
 Thanks,
 Ilya

Here is a slightly modified version with no chkp_can_be_shared check before 
unshare_expr calls.

Thanks,
Ilya
--
diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c
index c546d97..2ddd25f 100644
--- a/gcc/tree-chkp.c
+++ b/gcc/tree-chkp.c
@@ -1042,6 +1042,29 @@ chkp_get_registered_bounds (tree ptr)
   return slot ? *slot : NULL_TREE;
 }
 
+/* Add bound retvals to return statement pointed by GSI.  */
+
+static void
+chkp_add_bounds_to_ret_stmt (gimple_stmt_iterator *gsi)
+{
+  gimple ret = gsi_stmt (*gsi);
+  tree retval = gimple_return_retval (ret);
+  tree ret_decl = DECL_RESULT (cfun->decl);
+  tree bounds;
+
+  if (!retval)
+return;
+
+  if (BOUNDED_P (ret_decl))
+{
+  bounds = chkp_find_bounds (retval, gsi);
+  bounds = chkp_maybe_copy_and_register_bounds (ret_decl, bounds);
+  gimple_return_set_retbnd (ret, bounds);
+}
+
+  update_stmt (ret);
+}
+
 /* Force OP to be suitable for using as an argument for call.
New statements (if any) go to SEQ.  */
 static tree
@@ -1166,6 +1189,64 @@ chkp_check_mem_access (tree first, tree last, tree bounds,
   chkp_check_upper (last, bounds, iter, location, dirflag);
 }
 
+/* Replace call to _bnd_chk_* pointed by GSI with
+   bndcu and bndcl calls.  DIRFLAG determines whether
+   check is for read or write.  */
+
+void
+chkp_replace_address_check_builtin (gimple_stmt_iterator *gsi,
+   tree dirflag)
+{
+  gimple_stmt_iterator call_iter = *gsi;
+  gimple call = gsi_stmt (*gsi);
+  tree fndecl = gimple_call_fndecl (call);
+  tree addr = gimple_call_arg (call, 0);
+  tree bounds = chkp_find_bounds (addr, gsi);
+
+  if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_CHKP_CHECK_PTR_LBOUNDS
+  || DECL_FUNCTION_CODE (fndecl) == BUILT_IN_CHKP_CHECK_PTR_BOUNDS)
+chkp_check_lower (addr, bounds, *gsi, gimple_location (call), dirflag);
+
+  if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_CHKP_CHECK_PTR_UBOUNDS)
+chkp_check_upper (addr, bounds, *gsi, gimple_location (call), dirflag);
+
+  if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_CHKP_CHECK_PTR_BOUNDS)
+{
+  tree size = gimple_call_arg (call, 1);
+  addr = fold_build_pointer_plus (addr, size);
+  addr = fold_build_pointer_plus_hwi (addr, -1);
+  chkp_check_upper (addr, bounds, *gsi, gimple_location (call), dirflag);
+}
+
+  gsi_remove (&call_iter, true);
+}
+
+/* Replace call to _bnd_get_ptr_* pointed by GSI with
+   corresponding bounds extract call.  */
+
+void
+chkp_replace_extract_builtin (gimple_stmt_iterator *gsi)
+{
+  gimple call = gsi_stmt (*gsi);
+  tree fndecl = gimple_call_fndecl (call);
+  tree addr = gimple_call_arg (call, 0);
+  tree bounds = chkp_find_bounds (addr, gsi);
+  gimple extract;
+
+  if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_CHKP_GET_PTR_LBOUND)
+fndecl = chkp_extract_lower_fndecl;
+  else if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_CHKP_GET_PTR_UBOUND)
+fndecl = chkp_extract_upper_fndecl;
+  else
+gcc_unreachable ();
+
+  extract = gimple_build_call (fndecl, 1, bounds);
+  gimple_call_set_lhs (extract, gimple_call_lhs (call));
+  chkp_mark_stmt (extract);
+
+  gsi_replace (gsi, extract, false);
+}
+
 /* Return COMPONENT_REF accessing FIELD in OBJ.  */
 static tree
 chkp_build_component_ref (tree obj, tree field)
@@ -1227,6 +1308,78 @@ chkp_build_array_ref (tree arr, tree etype, tree esize,
   return res;
 }
 
+/* Helper function for chkp_add_bounds_to_call_stmt.
+   Fill ALL_BOUNDS output array with created bounds.
+
+   OFFS is used for recursive calls and holds basic
+   offset of TYPE in outer structure in bits.
+
+   ITER points a position where bounds are searched.
+
+   ALL_BOUNDS[i] is filled with elem bounds if there
+   is a field in TYPE which has pointer type and offset
+   equal to i * POINTER_SIZE in bits.  */
+static void
+chkp_find_bounds_for_elem (tree elem, tree *all_bounds,
+  HOST_WIDE_INT offs,
+  gimple_stmt_iterator *iter)
+{
+  tree type = TREE_TYPE (elem);
+
+  if (BOUNDED_TYPE_P (type))
+{
+  if (!all_bounds[offs / POINTER_SIZE])
+   {
+ tree temp = make_temp_ssa_name (type, gimple_build_nop (), "");
+ gimple assign = gimple_build_assign (temp, elem);
+ 
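
The range check built in chkp_replace_address_check_builtin above (lower check on the address, upper check on address + size - 1) can be sketched in plain C++. This is an illustration only: the Bounds struct and function names are hypothetical, and an inclusive upper bound is assumed.

```cpp
#include <cstdint>

// Hypothetical bounds representation: [lb, ub], both ends inclusive.
struct Bounds
{
  std::uintptr_t lb;
  std::uintptr_t ub;
};

bool
check_lower (std::uintptr_t addr, const Bounds &b)
{
  return addr >= b.lb;
}

bool
check_upper (std::uintptr_t addr, const Bounds &b)
{
  return addr <= b.ub;
}

// Check an access of SIZE bytes starting at ADDR: the first byte is
// checked against the lower bound and the last byte, ADDR + SIZE - 1,
// against the upper bound.  SIZE is assumed to be nonzero.
bool
check_ptr_bounds (std::uintptr_t addr, std::uintptr_t size, const Bounds &b)
{
  return check_lower (addr, b) && check_upper (addr + size - 1, b);
}
```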

Re: [PATCH, Pointer Bounds Checker 14/x] Passes [3/n] Helper functions

2014-10-14 Thread Ilya Enkovich
On 14 Oct 01:13, Ilya Enkovich wrote:
 2014-10-14 1:05 GMT+04:00 Jeff Law l...@redhat.com:
 
  Where does chkp_can_be_shared get used?  Normally the thing to do would
  just be to call unshare_expr.  It'll create copies as needed.   If it's
  something that is supposed to be shared then it'll leave it alone.  If you
  need to do something different than unshare_expr, then that needs deeper
  investigation as you're mucking around in the structure sharing assumptions
  and that's not to be done lightly.
 
 All its uses look like the following:
 
 if (!chkp_can_be_shared (rhs1))
   rhs1 = unshare_expr (rhs1);
 
 If unshare_expr avoids copies by itself then this check is useless and
 I should remove all its uses.
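
The behavior described above, where the unsharing routine itself decides what may stay shared, can be modeled with a toy. Everything here (Node, Arena, the shareable flag) is invented for illustration and is not GCC's unshare_expr.

```cpp
#include <memory>
#include <vector>

// Invented node type: shareable nodes (think declarations or constants)
// may be referenced from many places; others must be copied.
struct Node
{
  bool shareable;
  int value;
  Node *operand;
};

// Simple owner for all nodes so the sketch does not leak.
struct Arena
{
  std::vector<std::unique_ptr<Node>> nodes;

  Node *
  make (bool shareable, int value, Node *operand = nullptr)
  {
    nodes.push_back (std::unique_ptr<Node> (new Node{shareable, value, operand}));
    return nodes.back ().get ();
  }
};

// Return N itself if it may be shared, otherwise a copy whose operand
// has been unshared the same way.  Callers need no extra check.
Node *
unshare (Arena &arena, Node *n)
{
  if (!n || n->shareable)
    return n;
  return arena.make (false, n->value, unshare (arena, n->operand));
}
```

Under this model a guard like the one quoted above adds nothing, since the routine already returns shareable nodes unchanged.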
 
 Thanks,
 Ilya
 
 
  jeff
 
 

Here is a version with no chkp_can_be_shared function.  The patches that used
it were updated.

Thanks,
Ilya
--
2014-10-08  Ilya Enkovich  ilya.enkov...@intel.com

* tree-chkp.c (assign_handler): New.
(chkp_get_zero_bounds): New.
(chkp_uintptr_type): New.
(chkp_none_bounds_var): New.
(entry_block): New.
(zero_bounds): New.
(none_bounds): New.
(incomplete_bounds): New.
(tmp_var): New.
(size_tmp_var): New.
(chkp_abnormal_copies): New.
(chkp_invalid_bounds): New.
(chkp_completed_bounds_set): New.
(chkp_reg_bounds): New.
(chkp_bound_vars): New.
(chkp_reg_addr_bounds): New.
(chkp_incomplete_bounds_map): New.
(chkp_static_var_bounds): New.
(in_chkp_pass): New.
(CHKP_BOUND_TMP_NAME): New.
(CHKP_SIZE_TMP_NAME): New.
(CHKP_BOUNDS_OF_SYMBOL_PREFIX): New.
(CHKP_STRING_BOUNDS_PREFIX): New.
(CHKP_VAR_BOUNDS_PREFIX): New.
(CHKP_NONE_BOUNDS_VAR_NAME): New.
(chkp_get_tmp_var): New.
(chkp_get_tmp_reg): New.
(chkp_get_size_tmp_var): New.
(chkp_register_addr_bounds): New.
(chkp_get_registered_addr_bounds): New.
(chkp_mark_completed_bounds): New.
(chkp_completed_bounds): New.
(chkp_erase_completed_bounds): New.
(chkp_register_incomplete_bounds): New.
(chkp_incomplete_bounds): New.
(chkp_erase_incomplete_bounds): New.
(chkp_mark_invalid_bounds): New.
(chkp_valid_bounds): New.
(chkp_mark_invalid_bounds_walker): New.
(chkp_build_addr_expr): New.
(chkp_get_entry_block): New.
(chkp_get_bounds_var): New.
(chkp_get_registered_bounds): New.
(chkp_check_lower): New.
(chkp_check_upper): New.
(chkp_check_mem_access): New.
(chkp_build_component_ref): New.
(chkp_build_array_ref): New.
(chkp_make_bounds): New.
(chkp_get_none_bounds_var): New.
(chkp_get_zero_bounds): New.
(chkp_get_none_bounds): New.
(chkp_get_invalid_op_bounds): New.
(chkp_get_nonpointer_load_bounds): New.
(chkp_get_next_bounds_parm): New.
(chkp_build_bndldx): New.
(chkp_make_static_bounds): New.
(chkp_generate_extern_var_bounds): New.
(chkp_intersect_bounds): New.
(chkp_may_narrow_to_field): New.
(chkp_narrow_bounds_for_field): New.
(chkp_narrow_bounds_to_field): New.
(chkp_walk_pointer_assignments): New.
(chkp_init): New.
* tree-chkp.h (chkp_get_none_bounds_var): New.
(chkp_check_mem_access): New.


diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c
index eb7a8df..9245fa7 100644
--- a/gcc/tree-chkp.c
+++ b/gcc/tree-chkp.c
@@ -60,6 +60,10 @@ along with GCC; see the file COPYING3.  If not see
 #include "rtl.h" /* For MEM_P, assign_temp.  */
 #include "tree-dfa.h"
 
+typedef void (*assign_handler)(tree, tree, void *);
+
+static tree chkp_get_zero_bounds ();
+
 #define chkp_bndldx_fndecl \
   (targetm.builtin_chkp_function (BUILT_IN_CHKP_BNDLDX))
 #define chkp_bndstx_fndecl \
@@ -83,11 +87,37 @@ along with GCC; see the file COPYING3.  If not see
 #define chkp_extract_upper_fndecl \
   (targetm.builtin_chkp_function (BUILT_IN_CHKP_EXTRACT_UPPER))
 
-static GTY (()) tree chkp_zero_bounds_var;
+static GTY (()) tree chkp_uintptr_type;
 
+static GTY (()) tree chkp_zero_bounds_var;
+static GTY (()) tree chkp_none_bounds_var;
+
+static GTY (()) basic_block entry_block;
+static GTY (()) tree zero_bounds;
+static GTY (()) tree none_bounds;
+static GTY (()) tree incomplete_bounds;
+static GTY (()) tree tmp_var;
+static GTY (()) tree size_tmp_var;
+static GTY (()) bitmap chkp_abnormal_copies;
+
+struct hash_set<tree> *chkp_invalid_bounds;
+struct hash_set<tree> *chkp_completed_bounds_set;
+struct hash_map<tree, tree> *chkp_reg_bounds;
+struct hash_map<tree, tree> *chkp_bound_vars;
+struct hash_map<tree, tree> *chkp_reg_addr_bounds;
+struct hash_map<tree, tree> *chkp_incomplete_bounds_map;
 struct hash_map<tree, tree> *chkp_bounds_map;
+struct hash_map<tree, tree> *chkp_static_var_bounds;
+
+static bool in_chkp_pass;
 
+#define CHKP_BOUND_TMP_NAME 

Re: [PATCH 6/n] OpenMP 4.0 offloading infrastructure: option handling

2014-10-14 Thread Bernd Schmidt

On 10/14/2014 09:25 AM, Richard Biener wrote:

On Mon, 13 Oct 2014, Bernd Schmidt wrote:


On 10/13/2014 12:33 PM, Ilya Verbin wrote:

On 13 Oct 12:19, Jakub Jelinek wrote:

But I'd like to understand why is this one needed.
Why should the compilers care?  Aggregates layout and alignment of
integral/floating types must match between host and offload compilers,
sure,
but isn't that something streamed already in the LTO bytecode?
Or is LTO streamer not streaming some types like long_type_node?


It isn't, see the preload_common_nodes code.


Something I'd like to get rid of at some point (but it's not 100%
easy as backends for example compare va_list_type_node by pointer).


Hmm, this is unfortunate - I was about to submit a patch not to stream 
that one since it can differ between host and offload target.


I see one such comparison in i386.c - any others you are aware of? 
Should it be sufficient to just compare the TYPE_MAIN_VARIANT instead?



Also, the backend needs to choose
the right Pmode (and in the case of ptx, emit a directive about address
sizes).


Surely that will only be one problem with going the LTO way to handle
the offloading ;)


Another problem I mentioned above, beyond that I have a patch to use the 
$host-modes.def file to define machine modes - and that's essentially it.


I'll be submitting these additional offloading patches for the case of 
different host and target once Ilya has committed the others.



Bernd



[PATCH][match-and-simplify] Fix ICE

2014-10-14 Thread Richard Biener

This fixes an ICE that occurs when valueization returns NULL and
we are looking at single-rhs REALPART_EXPR.  We should check
for is_gimple_min_invariant before valueization.

Bootstrapped on x86_64-unknown-linux-gnu, applied.

Richard.

2014-10-14  Richard Biener  rguent...@suse.de

* genmatch.c (dt_operand::gen_gimple_expr): Check for an
invariant operand before valueizing it.

Index: gcc/genmatch.c
===
--- gcc/genmatch.c  (revision 216197)
+++ gcc/genmatch.c  (working copy)
@@ -1566,8 +1566,8 @@ dt_operand::gen_gimple_expr (FILE *f)
           fprintf (f, "tree %s = TREE_OPERAND (gimple_assign_rhs1 (def_stmt), %i);\n",
                    child_opname, i);
           fprintf (f, "if ((TREE_CODE (%s) == SSA_NAME\n"
-                   "&& (%s = do_valueize (valueize, %s)))\n"
                    "|| is_gimple_min_invariant (%s))\n"
+                   "&& (%s = do_valueize (valueize, %s)))\n"
                    "{\n", child_opname, child_opname, child_opname,
                    child_opname);
  continue;
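
The reordering matters because of short-circuit evaluation: in the old order a failed valueization could overwrite the operand with NULL before the invariant check looked at it. A toy model of the new order follows; Operand, do_valueize and the helper names are simplified assumptions, not the generated code.

```cpp
// Invented operand type with the two properties the condition tests.
struct Operand
{
  bool is_ssa_name;
  bool is_min_invariant;
};

using valueize_fn = const Operand *(*) (const Operand *);

// Assumed behavior: only SSA names go through the hook; anything else
// is returned unchanged.
const Operand *
do_valueize (valueize_fn valueize, const Operand *op)
{
  if (valueize && op->is_ssa_name)
    return valueize (op);
  return op;
}

// A hook that rejects every SSA name, to exercise the NULL path.
const Operand *
reject_all (const Operand *)
{
  return nullptr;
}

// New order: classify the operand first, then valueize.  OP is never
// inspected after it may have become NULL.
bool
usable_operand (valueize_fn valueize, const Operand *&op)
{
  return (op->is_ssa_name || op->is_min_invariant)
         && (op = do_valueize (valueize, op)) != nullptr;
}
```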


Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-14 Thread Jakub Jelinek
On Fri, Oct 10, 2014 at 10:03:38AM -0600, Jeff Law wrote:
 Can you add PR markers to your changelog
 
   PR target/8340
   PR middle-end/47602
   PR rtl-optimization/55458
 
 Actually I think there is an additional test in 47602.  Can you please add
 it to the suite?  You'll also want to change the state of 47602 to
 RESOLVED/FIXED.

Unfortunately this broke bootstrap on x86_64/i686-linux,
see http://gcc.gnu.org/PR63534
- pretty much everything with -m32 -fsplit-stack -fpic ICEs, -m32 -fpic -p
results in wrong-code, and I see significant code quality regressions even
on simple testcases.

For the first two, I think (and said it before already) that the current
model of emitting set_got from a target hook during RA can't work, as there
can be calls in the prologue, and the prologue is inserted before the
set_got in that case.  I really think the RA should in that case just tell
the backend whether, and in which register, it wants the PIC register
loaded at the start of the function, and it should be the emit-prologue pass
that arranges for that.

As for the code quality, either some RA improvements are needed, or
postreload must be able to fix it up, or hardreg propagation (though,
cprop_hardreg is forward propagation rather than backwards, right?).
Better before prologue is emitted though, because that will save/restore
the badly chosen hard reg too.

Jakub


Re: [PATCH][match-and-simplify] Change back default behavior of fold_stmt

2014-10-14 Thread Richard Biener
On Tue, 14 Oct 2014, Richard Biener wrote:

 
 This changes default behavior of fold_stmt back to _not_ following
 SSA use-def chains when trying to simplify things.  I had to force
 that already for one caller and for the merge to trunk I'd rather
 not track down issues in every other existing caller.
 
 This means that fold_stmt will not become more powerful, at least for now.
 I still hope to get rid of its use of fold() during the merge process.
 
 Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.
 
 (yeah, I'm preparing a first batch of changes to merge from the
 branch)

Unfortunately this exposes an issue with combining our SSA propagators
with pattern matching which makes us miscompile tree-vect-generic.c
from VRP.  Consider

Visiting PHI node: i_137 = PHI <0(51), i_48(63)>
Argument #0 (51 -> 52 executable)
0: [0, 0]
Argument #1 (63 -> 52 not executable)
Found new range for i_137: [0, 0]
...
i_48 = delta_25 + i_137;
Found new range for i_48: VARYING
_67 = (unsigned int) delta_25;
Found new range for _67: [0, +INF]
_78 = (unsigned int) i_48;
Found new range for _78: [0, +INF]
_257 = _78 - _67;
(unsigned int) (delta_25 + i_137) - (unsigned int) delta_25
Match-and-simplified _78 - _67 to 0
Found new range for _257: [0, 0]

now after i_137 is revisited and it becomes VARYING the SSA propagator
stops at i_48 because its value does not change.  Thus it fails to
re-visit _257 where a pattern was applied that used the optimistic
value of i_137 to its advantage.
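
The arithmetic behind the dump can be checked in isolation. This small function mirrors the expression shown above with ordinary ints (names are placeholders): the result is 0 only while i is 0, which is exactly the optimistic assumption that later turns out to be wrong.

```cpp
// (unsigned int) (delta + i) - (unsigned int) delta, as in the dump.
// Inputs are assumed small enough that delta + i does not overflow.
unsigned int
difference (int delta, int i)
{
  return (unsigned int) (delta + i) - (unsigned int) delta;
}
```

With i fixed at 0 the expression folds to 0; once i becomes VARYING the value is simply (unsigned int) i, so a result of [0, 0] recorded for _257 is stale unless that statement is revisited.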

The following patch makes sure SSA propagators (CCP and VRP) do
not get any benefit during their propagation phase from
match-and-simplify by disabling the following of SSA use-def edges.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.


[PATCH, DWARF] re-init dw_frame_pointer_regnum between functions

2014-10-14 Thread Christian Bruel
Hello,

ARM and Thumb modes use different hard_frame_pointer_regnum ABIs. The
problem is that dwarf2cfi.c:dw_frame_pointer_regnum cache is initialized
only once per file, when creating the CIE. 
While testing the ARM target attribute to switch modes between
functions, I got a few assertion failures with -g, because this value becomes
inconsistent with the respective FDEs that have different
hard_frame_pointer_rtx...

The snippet from dwarf2cfi.c illustrates the potential issue with the
mismatch between hard_frame_pointer_rtx and a badly set CFA register :

 if (dest == hard_frame_pointer_rtx)
   ...
  cur_cfa->reg = dw_frame_pointer_regnum;
  ...

I'm not aware of other targets that make it possible to change the
frame_pointer_regnum ABI within a file, so the issue will only show up
with the ARM target attribute. However, I'd very much like your feedback
on this change before I send the remaining ARM parts.
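
The difference between caching the register number once per file and recomputing it per function can be modeled as below. This is a sketch with invented types; the register numbers 11 and 7 are used purely as illustrative values for the two modes.

```cpp
enum Mode { ARM_MODE, THUMB_MODE };

// Illustrative only: a different hard frame pointer register per mode.
int
hard_frame_pointer_regnum (Mode m)
{
  return m == ARM_MODE ? 11 : 7;
}

struct FrameState
{
  int dw_frame_pointer_regnum = -1;
  bool cie_initialized = false;

  // Old behavior: the value is computed only when the CIE data is
  // created, i.e. for the first function in the file.
  int
  execute_cached_once (Mode m)
  {
    if (!cie_initialized)
      {
        dw_frame_pointer_regnum = hard_frame_pointer_regnum (m);
        cie_initialized = true;
      }
    return dw_frame_pointer_regnum;
  }

  // Patched behavior: recompute for every function, then do the
  // one-time CIE setup as before.
  int
  execute_reinit (Mode m)
  {
    dw_frame_pointer_regnum = hard_frame_pointer_regnum (m);
    if (!cie_initialized)
      cie_initialized = true;
    return dw_frame_pointer_regnum;
  }
};
```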

Tested manually for arm-none-eabi with gdb, unwinding and frame access
seem OK when mixing modes.
x86 bootstrapped and regressions tests are running.

Many thanks,

Christian





2014-09-23  Christian Bruel  christian.br...@st.com

	* execute_dwarf2_frame (dw_frame_pointer_regnum): Reinitialize for each function.

Index: dwarf2cfi.c
===
--- dwarf2cfi.c	(revision 216146)
+++ dwarf2cfi.c	(working copy)
@@ -2860,7 +2860,6 @@
   dw_trace_info cie_trace;
 
   dw_stack_pointer_regnum = DWARF_FRAME_REGNUM (STACK_POINTER_REGNUM);
-  dw_frame_pointer_regnum = DWARF_FRAME_REGNUM (HARD_FRAME_POINTER_REGNUM);
 
   memset (&cie_trace, 0, sizeof (cie_trace));
   cur_trace = &cie_trace;
@@ -2913,6 +2912,9 @@
 static unsigned int
 execute_dwarf2_frame (void)
 {
+  /* Different HARD_FRAME_POINTER_REGNUM might coexist in the same file.  */
+  dw_frame_pointer_regnum = DWARF_FRAME_REGNUM (HARD_FRAME_POINTER_REGNUM);
+
   /* The first time we're called, compute the incoming frame state.  */
   if (cie_cfi_vec == NULL)
     create_cie_data ();


[patch libstdc++]: Fix PR/59807

2014-10-14 Thread Kai Tietz
Hi,

this patch fixes PR/59807: a mutex is missing its destructor if
non-function-call initialization is used.  So far this issue has only been
reported for mingw-w64, as this is the only venture providing
POSIX-threading-enabled toolchains (C++11).  Nevertheless this issue
could happen for other native Windows toolchains, too.  Therefore I
adjusted the default mingw32 case, too.

ChangeLog

2014-10-14  Kai Tietz  kti...@redhat.com

PR libstdc++/59807
* config/os/mingw32/os_defines.h (_GTHREAD_USE_MUTEX_INIT_FUNC):
Define to avoid leak.
* config/os/mingw32-w64/os_defines.h: Likewise.

I am just testing bootstrap for it, and if successful, I will commit.

Thanks,
Kai

Index: config/os/mingw32/os_defines.h
===
--- config/os/mingw32/os_defines.h	(revision 216199)
+++ config/os/mingw32/os_defines.h	(working copy)
@@ -75,4 +75,7 @@
 #define _GLIBCXX_LLP64 1
 #endif

+// See libstdc++/59807
+#define _GTHREAD_USE_MUTEX_INIT_FUNC 1
+
 #endif
Index: config/os/mingw32-w64/os_defines.h
===
--- config/os/mingw32-w64/os_defines.h	(revision 216199)
+++ config/os/mingw32-w64/os_defines.h	(working copy)
@@ -83,4 +83,7 @@
 // their dtors are called
 #define _GLIBCXX_THREAD_ATEXIT_WIN32 1

+// See libstdc++/59807
+#define _GTHREAD_USE_MUTEX_INIT_FUNC 1
+
 #endif


Re: [PATCH][match-and-simplify] Change back default behavior of fold_stmt

2014-10-14 Thread Richard Biener
On Tue, 14 Oct 2014, Richard Biener wrote:

 On Tue, 14 Oct 2014, Richard Biener wrote:
 
  
  This changes default behavior of fold_stmt back to _not_ following
  SSA use-def chains when trying to simplify things.  I had to force
  that already for one caller and for the merge to trunk I'd rather
  not track down issues in every other existing caller.
  
  This means that fold_stmt will not become more powerful, at least for now.
  I still hope to get rid of its use of fold() during the merge process.
  
  Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.
  
  (yeah, I'm preparing a first batch of changes to merge from the
  branch)
 
 Unfortunately this exposes an issue with combining our SSA propagators
 with pattern matching which makes us miscompile tree-vect-generic.c
 from VRP.  Consider
 
 Visiting PHI node: i_137 = PHI <0(51), i_48(63)>
 Argument #0 (51 -> 52 executable)
 0: [0, 0]
 Argument #1 (63 -> 52 not executable)
 Found new range for i_137: [0, 0]
 ...
 i_48 = delta_25 + i_137;
 Found new range for i_48: VARYING
 _67 = (unsigned int) delta_25;
 Found new range for _67: [0, +INF]
 _78 = (unsigned int) i_48;
 Found new range for _78: [0, +INF]
 _257 = _78 - _67;
 (unsigned int) (delta_25 + i_137) - (unsigned int) delta_25
 Match-and-simplified _78 - _67 to 0
 Found new range for _257: [0, 0]
 
 now after i_137 is revisited and it becomes VARYING the SSA propagator
 stops at i_48 because its value does not change.  Thus it fails to
 re-visit _257 where a pattern was applied that used the optimistic
 value of i_137 to its advantage.
 
 The following patch makes sure SSA propagators (CCP and VRP) do
 not get any benefit during their propagation phase from
 match-and-simplify by disabling the following of SSA use-def edges.
 
 Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
 
 Richard.

And here is the patch.

Richard.

2014-10-14  Richard Biener  rguent...@suse.de

* gimple-fold.h (no_follow_ssa_edges): Declare.
(gimple_fold_stmt_to_constant_1): Add separate valueize hook for
gimple_simplify, defaulted to no_follow_ssa_edges.
* gimple-fold.c (fold_stmt): Make old API never follow SSA edges
when simplifying.
(no_follow_ssa_edges): New function.
(gimple_fold_stmt_to_constant_1): Adjust.
* tree-cfg.c (no_follow_ssa_edges): Remove.
(replace_uses_by): Use plain fold_stmt again.
* gimple-match-head.c (gimple_simplify): When simplifying
a statement do not stop when valueizing its operands yields NULL.

Index: gcc/gimple-fold.h
===
--- gcc/gimple-fold.h   (revision 216146)
+++ gcc/gimple-fold.h   (working copy)
@@ -32,7 +32,9 @@ extern tree maybe_fold_and_comparisons (
enum tree_code, tree, tree);
 extern tree maybe_fold_or_comparisons (enum tree_code, tree, tree,
   enum tree_code, tree, tree);
-extern tree gimple_fold_stmt_to_constant_1 (gimple, tree (*) (tree));
+extern tree no_follow_ssa_edges (tree);
+extern tree gimple_fold_stmt_to_constant_1 (gimple, tree (*) (tree),
+   tree (*) (tree) = no_follow_ssa_edges);
 extern tree gimple_fold_stmt_to_constant (gimple, tree (*) (tree));
 extern tree fold_const_aggregate_ref_1 (tree, tree (*) (tree));
 extern tree fold_const_aggregate_ref (tree);
Index: gcc/gimple-fold.c
===
--- gcc/gimple-fold.c   (revision 216146)
+++ gcc/gimple-fold.c   (working copy)
@@ -3136,6 +3136,14 @@ fail:
   return changed;
 }
 
+/* Valueization callback that ends up not following SSA edges.  */
+
+tree
+no_follow_ssa_edges (tree)
+{
+  return NULL_TREE;
+}
+
 /* Fold the statement pointed to by GSI.  In some cases, this function may
replace the whole statement with a new one.  Returns true iff folding
makes any changes.
@@ -3146,7 +3154,7 @@ fail:
 bool
 fold_stmt (gimple_stmt_iterator *gsi)
 {
-  return fold_stmt_1 (gsi, false, NULL);
+  return fold_stmt_1 (gsi, false, no_follow_ssa_edges);
 }
 
 bool
@@ -3167,7 +3175,7 @@ bool
 fold_stmt_inplace (gimple_stmt_iterator *gsi)
 {
   gimple stmt = gsi_stmt (*gsi);
-  bool changed = fold_stmt_1 (gsi, true, NULL);
+  bool changed = fold_stmt_1 (gsi, true, no_follow_ssa_edges);
   gcc_assert (gsi_stmt (*gsi) == stmt);
   return changed;
 }
@@ -4527,12 +4535,19 @@ gimple_fold_stmt_to_constant_2 (gimple s
 }
 }
 
+/* ???  The SSA propagators do not correctly deal with following SSA use-def
+   edges if there are intermediate VARYING defs.  For this reason
+   there are two valueization hooks here, one for the legacy code
+   in gimple_fold_stmt_to_constant_2 and one for gimple_simplify
+   which is defaulted to no_follow_ssa_edges.  */
+
 tree
-gimple_fold_stmt_to_constant_1 (gimple stmt, tree (*valueize) (tree))
+gimple_fold_stmt_to_constant_1 (gimple 

Re: [C++] Handle || ! for simd vectors

2014-10-14 Thread Jason Merrill

On 10/13/2014 03:45 PM, Marc Glisse wrote:

Ping https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00361.html
(sorry that my message looked like I had committed as obvious)


Indeed.  OK. :)


On Sat, 4 Oct 2014, Marc Glisse wrote:


On Thu, 2 Oct 2014, Jason Merrill wrote:


OK.


Thanks. While committing, I noticed that I restricted ! to integer
vectors, whereas it seems to work just fine with scalar floats, so it
would make sense to extend it to float vectors. Tested on
x86_64-linux-gnu.

2014-10-04  Marc Glisse  marc.gli...@inria.fr

gcc/cp/
* typeck.c (cp_build_unary_op) [TRUTH_NOT_EXPR]: Accept float
vectors.
gcc/testsuite/
* g++.dg/ext/vector9.C: Test ! with float vectors.






Re: __intN patch 3/5: main __int128 - __intN conversion.

2014-10-14 Thread Jason Merrill

On 10/13/2014 04:54 PM, DJ Delorie wrote:

This is what I ended up with for the test case.  It was a bit tricky
since it only works with msp430x (not msp430) and requires the gnu
extensions.  Is this OK?  If so, is there anything else, or can I
check the whole mess in yet?


Go ahead.

Jason




Re: [PATCH, i386, Pointer Bounds Checker 31/x] Pointer Bounds Checker builtins for i386 target

2014-10-14 Thread Ilya Enkovich
On 10 Oct 21:20, Ilya Enkovich wrote:
 2014-10-10 20:45 GMT+04:00 Jeff Law l...@redhat.com:
  On 10/09/14 10:54, Uros Bizjak wrote:
 
  On Thu, Oct 9, 2014 at 4:07 PM, Ilya Enkovich enkovich@gmail.com
  wrote:
 
  It appears I changed the semantics of the BNDMK expansion when I replaced
  tree operations with RTL ones.
 
  Original code:
 
  +  op1 = expand_normal (fold_build2 (PLUS_EXPR, TREE_TYPE (arg1),
  +   arg1, integer_minus_one_node));
  +  op1 = force_reg (Pmode, op1);
 
  Modified code:
 
  +  op1 = expand_normal (arg1);
  +
  +  if (!register_operand (op1, Pmode))
  +   op1 = ix86_zero_extend_to_Pmode (op1);
  +
  +  /* Builtin arg1 is size of block but instruction op1 should
  +be (size - 1).  */
  +  op1 = expand_simple_binop (Pmode, PLUS, op1, constm1_rtx,
  +op1, 1, OPTAB_DIRECT);
 
  The problem is that in the modified version we may change the value of the
  pseudo register into which arg1 is expanded, which means an incorrect value
  for all following uses of arg1.
 
  I didn't catch it earlier because programs surprisingly rarely hit this bug.
  I made the following change to fix it:
 
  op1 = expand_simple_binop (Pmode, PLUS, op1, constm1_rtx,
  -op1, 1, OPTAB_DIRECT);
  +NULL, 1, OPTAB_DIRECT);
 
  Similar problem was also fixed for BNDNARROW.  Does it look OK?
 
 
  I'm not aware of this type of limitation, and there are quite some
  similar constructs in i386.c. It is hard to say without the testcase
  what happens, perhaps one of RTX experts (CC'd) can advise what is
  recommended here.
 
  The problem is the call to expand_simple_binop.
 
  The source (op1) and the destination (op1) are obviously the same, so it's
  going to clobber whatever value is in there.  If there are other uses of the
  original value of op1, then things aren't going to work. But I'm a little
  unclear how there'd be other later uses of that value.  Perhaps Ilya could
  comment on that.
 
 op1 is a result of expand_normal called for SSA name.  Other uses of
 op1 come from expand of uses of this SSA name in GIMPLE code.
 
 
  Regardless, unless there's a strong reason to do so, I'd generally recommend
  passing a NULL_RTX as the target for expansions so that you always get a new
  pseudo.  Lots of optimizers in the RTL world work better if we don't have
  pseudos with multiple assignments.  By passing NULL_RTX for the target we
  get that property more often.  So a change like Ilya suggests (though I'd
  use NULL_RTX rather than NULL) makes sense.
 
 Will replace it with NULL_RTX.
 
 Thanks,
 Ilya
 
 
 
 
  Jeff
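
A minimal sketch of the clobbering discussed above, as a toy model rather than
GCC code.  The toy_ names are hypothetical; a negative target plays the role of
passing NULL_RTX, meaning "allocate a fresh pseudo for the result".

```cpp
#include <cassert>

// Toy model, not GCC code: a "pseudo register" is a slot in an array.
struct toy_regs
{
  long value[8];
  int next_free;
};

// Emit "target = src + addend" and return the register holding the result.
// A negative TARGET asks for a fresh pseudo, which leaves SRC untouched.
int
toy_expand_plus (toy_regs &regs, int src, long addend, int target)
{
  assert (src >= 0 && src < 8);
  if (target < 0)
    {
      assert (regs.next_free < 8);
      target = regs.next_free++;
    }
  regs.value[target] = regs.value[src] + addend;
  return target;
}
```

If register 0 holds the block size and is passed as both source and target,
the size itself becomes size - 1 for every later use; with a negative target
the result lands in a new register and the original value survives.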


Here is a version with NULL_RTX used instead of NULL.

Thanks,
Ilya
--
2014-10-14  Ilya Enkovich  ilya.enkov...@intel.com

* config/i386/i386-builtin-types.def (BND): New.
(ULONG): New.
(BND_FTYPE_PCVOID_ULONG): New.
(VOID_FTYPE_BND_PCVOID): New.
(VOID_FTYPE_PCVOID_PCVOID_BND): New.
(BND_FTYPE_PCVOID_PCVOID): New.
(BND_FTYPE_PCVOID): New.
(BND_FTYPE_BND_BND): New.
(PVOID_FTYPE_PVOID_PVOID_ULONG): New.
(PVOID_FTYPE_PCVOID_BND_ULONG): New.
(ULONG_FTYPE_VOID): New.
(PVOID_FTYPE_BND): New.
* config/i386/i386.c: Include tree-chkp.h, rtl-chkp.h.
(ix86_builtins): Add
IX86_BUILTIN_BNDMK, IX86_BUILTIN_BNDSTX,
IX86_BUILTIN_BNDLDX, IX86_BUILTIN_BNDCL,
IX86_BUILTIN_BNDCU, IX86_BUILTIN_BNDRET,
IX86_BUILTIN_BNDNARROW, IX86_BUILTIN_BNDINT,
IX86_BUILTIN_SIZEOF, IX86_BUILTIN_BNDLOWER,
IX86_BUILTIN_BNDUPPER.
(builtin_isa): Add leaf_p and nothrow_p fields.
(def_builtin): Initialize leaf_p and nothrow_p.
(ix86_add_new_builtins): Handle leaf_p and nothrow_p
flags.
(bdesc_mpx): New.
(bdesc_mpx_const): New.
(ix86_init_mpx_builtins): New.
(ix86_init_builtins): Call ix86_init_mpx_builtins.
(ix86_emit_cmove): New.
(ix86_emit_move_max): New.
(ix86_expand_builtin): Expand IX86_BUILTIN_BNDMK,
IX86_BUILTIN_BNDSTX, IX86_BUILTIN_BNDLDX,
IX86_BUILTIN_BNDCL, IX86_BUILTIN_BNDCU,
IX86_BUILTIN_BNDRET, IX86_BUILTIN_BNDNARROW,
IX86_BUILTIN_BNDINT, IX86_BUILTIN_SIZEOF,
IX86_BUILTIN_BNDLOWER, IX86_BUILTIN_BNDUPPER.


diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index 9161287..5421ba9 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -47,6 +47,7 @@ DEF_PRIMITIVE_TYPE (UCHAR, unsigned_char_type_node)
 DEF_PRIMITIVE_TYPE (QI, char_type_node)
 DEF_PRIMITIVE_TYPE (HI, intHI_type_node)
 DEF_PRIMITIVE_TYPE (SI, intSI_type_node)
+DEF_PRIMITIVE_TYPE (BND, pointer_bounds_type_node)
 # ??? Logically this should be intDI_type_node, but that maps to long
 # with 64-bit, and that's not how the emmintrin.h is written.  Again, 
 

Re: [PATCH] support ggc hash_map and hash_set

2014-10-14 Thread Richard Biener
On Tue, Sep 2, 2014 at 3:56 AM,  tsaund...@mozilla.com wrote:
 From: Trevor Saunders tsaund...@mozilla.com

 Hi,

 There are still some issues to make this work really nicely, but this part is
 probably good enough that it's worth reviewing.

 For one thing you can't use ggc hash_map or set in front ends with some types
 or gengtype will decide to put the overloads of the marking routines it
 provides in a front end file instead of the one it chose before, breaking
 other front ends.  However that seems to be an unrelated issue; you can
 trigger it without using hash_map/set, so we might as well solve it separately.

 I had to have the entry marking functions for set delegate to the traits class
 because gcc  4.9.1 issues clearly bogus errors if you inline the code from the
 traits implementation.  We may well want to make map work the same way at some
 point to enable some of the special GTY attributes like if_marked, but it
 doesn't seem to be necessary right now.

 bootstrapped + regtested without regressions on x86_64-unknown-linux-gnu, ok?

I have just noticed that this (ggc support for hash-table.h) makes it no longer
suitable for use from generator programs (trying to merge from trunk on
match-and-simplify).  If you look at vec.h it has sophisticated guards
to block out GGC support if GENERATOR_FILE is defined.

Can you try to fix this please?

Thanks,
Richard.

 Trev

 gcc/ChangeLog:

 2014-09-01  Trevor Saunders  tsaund...@mozilla.com

 * alloc-pool.c: Include coretypes.h.
 * cgraph.h, dbxout.c, dwarf2out.c, except.c, except.h, function.c,
 function.h, symtab.c, tree-cfg.c, tree-eh.c: Use hash_map and
 hash_set instead of htab.
 * ggc-page.c (in_gc): New variable.
 (ggc_free): Do nothing if a collection is taking place.
 (ggc_collect): Set in_gc appropriately.
 * ggc.h (gt_ggc_mx(const char *)): New function.
 (gt_pch_nx(const char *)): Likewise.
 (gt_ggc_mx(int)): Likewise.
 (gt_pch_nx(int)): Likewise.
 * hash-map.h (hash_map::hash_entry::ggc_mx): Likewise.
 (hash_map::hash_entry::pch_nx): Likewise.
 (hash_map::hash_entry::pch_nx_helper): Likewise.
 (hash_map::hash_map): Adjust.
 (hash_map::create_ggc): New function.
 (gt_ggc_mx): Likewise.
 (gt_pch_nx): Likewise.
 * hash-set.h (default_hashset_traits::ggc_mx): Likewise.
 (default_hashset_traits::pch_nx): Likewise.
 (hash_set::hash_entry::ggc_mx): Likewise.
 (hash_set::hash_entry::pch_nx): Likewise.
 (hash_set::hash_entry::pch_nx_helper): Likewise.
 (hash_set::hash_set): Adjust.
 (hash_set::create_ggc): New function.
 (hash_set::elements): Likewise.
 (gt_ggc_mx): Likewise.
 (gt_pch_nx): Likewise.
 * hash-table.h (hash_table::hash_table): Adjust.
 (hash_table::m_ggc): New member.
 (hash_table::~hash_table): Adjust.
 (hash_table::expand): Likewise.
 (hash_table::empty): Likewise.
 (gt_ggc_mx): New function.
 (hashtab_entry_note_pointers): Likewise.
 (gt_pch_nx): Likewise.


 diff --git a/gcc/alloc-pool.c b/gcc/alloc-pool.c
 index 0d31835..bfaa0e4 100644
 --- a/gcc/alloc-pool.c
 +++ b/gcc/alloc-pool.c
 @@ -20,6 +20,7 @@ along with GCC; see the file COPYING3.  If not see

  #include "config.h"
  #include "system.h"
 +#include "coretypes.h"
  #include "alloc-pool.h"
  #include "hash-table.h"
  #include "hash-map.h"
 diff --git a/gcc/cgraph.h b/gcc/cgraph.h
 index 879899c..030a1c7 100644
 --- a/gcc/cgraph.h
 +++ b/gcc/cgraph.h
 @@ -1604,7 +1604,6 @@ struct cgraph_2node_hook_list;

  /* Map from a symbol to initialization/finalization priorities.  */
  struct GTY(()) symbol_priority_map {
 -  symtab_node *symbol;
priority_type init;
priority_type fini;
  };
 @@ -1872,7 +1871,7 @@ public:
htab_t GTY((param_is (symtab_node))) assembler_name_hash;

/* Hash table used to hold init priorities.  */
 -  htab_t GTY ((param_is (symbol_priority_map))) init_priority_hash;
 +  hash_map<symtab_node *, symbol_priority_map> *init_priority_hash;

FILE* GTY ((skip)) dump_file;

 diff --git a/gcc/dbxout.c b/gcc/dbxout.c
 index 946f1d1..d856bdd 100644
 --- a/gcc/dbxout.c
 +++ b/gcc/dbxout.c
 @@ -2484,12 +2484,9 @@ dbxout_expand_expr (tree expr)
  /* Helper function for output_used_types.  Queue one entry from the
 used types hash to be output.  */

 -static int
 -output_used_types_helper (void **slot, void *data)
 +bool
 +output_used_types_helper (tree const &type, vec<tree> *types_p)
  {
 -  tree type = (tree) *slot;
 -  vec<tree> *types_p = (vec<tree> *) data;
 -
if ((TREE_CODE (type) == RECORD_TYPE
 || TREE_CODE (type) == UNION_TYPE
 || TREE_CODE (type) == QUAL_UNION_TYPE
 @@ -2502,7 +2499,7 @@ output_used_types_helper (void **slot, void *data)
          && TREE_CODE (TYPE_NAME (type)) == TYPE_DECL)
   types_p->quick_push (TYPE_NAME (type));

 -  return 1;
 +  return true;
  }

  /* This is a qsort callback which sorts types and declarations into a
 @@ -2544,8 +2541,9 @@ output_used_types 

Re: [PATCH] Add D demangling support to libiberty

2014-10-14 Thread Joel Brobecker
Hello Ian,

 libiberty/ChangeLog
 2014-08-05  Iain Buclaw  ibuc...@gdcproject.org
 
 * Makefile.in (CFILES): Add d-demangle.c.
 (REQUIRED_OFILES): Add d-demangle.o.
 * cplus-dem.c (libiberty_demanglers): Add dlang_demangling case.
 (cplus_demangle): Likewise.
 * d-demangle.c: New file.
 * testsuite/Makefile.in (really-check): Add check-d-demangle.
 * testsuite/d-demangle-expected: New file.

As hinted on gdb-patches, this patch causes a GDB build failure
on Solaris 2.9, because it uses strtold which is not available.
According to gnulib's documentation, it should also break on
the following systems:

NetBSD 3.0, OpenBSD 3.8, Minix 3.1.8, IRIX 6.5, OSF/1 4.0,
Solaris 9, Cygwin, MSVC 9, Interix 3.5, BeOS.

This patch attempts to fix the issue by adding a configure check
for strtold and adjusts the code to use strtod if strtold does not
exist.
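
A minimal sketch of the fallback, under stated assumptions: HAVE_STRTOLD would
normally come from the configure-generated config.h and is left undefined
here, so the strtod path is taken.  TOY_STRTOLD and toy_parse_real are
hypothetical names; the patch itself defines strtold directly, whereas the
sketch maps a private macro so that no standard name is redefined.

```cpp
#include <cassert>
#include <cstdlib>

// Pick the widest conversion function that is known to be available.
// HAVE_STRTOLD is assumed to be provided by a configure check.
#if defined(HAVE_STRTOLD)
#define TOY_STRTOLD strtold
#else
#define TOY_STRTOLD strtod
#endif

// Hypothetical helper: parse a real number at STR and, if END is not
// null, store a pointer to the first character that was not consumed.
long double
toy_parse_real (const char *str, const char **end)
{
  assert (str != nullptr);
  char *stop = nullptr;
  long double value = TOY_STRTOLD (str, &stop);
  if (end != nullptr)
    *end = stop;
  return value;
}
```

On the fallback path the value is parsed as a double and then widened, so
inputs that need more than double precision or range lose it.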

Does this look OK to you? If yes, can one of the GCC maintainers
please review?

libiberty/ChangeLog:

* configure.ac: Add check for strtold's availability.
* configure, config.in: Regenerate.
* d-demangle.c [!HAVE_STRTOLD]: #define strtold to strtod.

Thank you!

-- 
Joel
From 9e4d74607075ef857dfa4e118f43641494aaff90 Mon Sep 17 00:00:00 2001
From: Joel Brobecker brobec...@adacore.com
Date: Tue, 14 Oct 2014 09:54:05 -0400
Subject: [PATCH] libiberty: fallback on strtod if strtold is not available.

This patch fixes a build failure on Solaris 2.9, and all other
systems that do not provide strtold.

libiberty/ChangeLog:

* configure.ac: Add check for strtold's availability.
* configure, config.in: Regenerate.
* d-demangle.c [!HAVE_STRTOLD]: #define strtold to strtod.
---
 libiberty/config.in| 3 +++
 libiberty/configure| 2 +-
 libiberty/configure.ac | 2 +-
 libiberty/d-demangle.c | 3 +++
 4 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/libiberty/configure.ac b/libiberty/configure.ac
index 3380819..da20a5f 100644
--- a/libiberty/configure.ac
+++ b/libiberty/configure.ac
@@ -401,7 +401,7 @@ if test x = y; then
 sbrk setenv setproctitle setrlimit sigsetmask snprintf spawnve spawnvpe \
  stpcpy stpncpy strcasecmp strchr strdup \
  strerror strncasecmp strndup strnlen strrchr strsignal strstr strtod \
- strtol strtoul strverscmp sysconf sysctl sysmp \
+ strtol strtold strtoul strverscmp sysconf sysctl sysmp \
 table times tmpnam \
 vasprintf vfprintf vprintf vsprintf \
 wait3 wait4 waitpid)
diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c
index d31bf94..59de083 100644
--- a/libiberty/d-demangle.c
+++ b/libiberty/d-demangle.c
@@ -46,6 +46,9 @@ If not, see http://www.gnu.org/licenses/.  */
 extern long strtol (const char *nptr, char **endptr, int base);
 extern long double strtold (const char *nptr, char **endptr);
 #endif
+#if !defined(HAVE_STRTOLD)
+#define strtold strtod
+#endif
 
 #include "demangle.h"
 #include "libiberty.h"
diff --git a/libiberty/config.in b/libiberty/config.in
index 1cf9c11..8c5f0b6 100644
--- a/libiberty/config.in
+++ b/libiberty/config.in
@@ -280,6 +280,9 @@
 /* Define to 1 if you have the `strtol' function. */
 #undef HAVE_STRTOL
 
+/* Define to 1 if you have the `strtold' function. */
+#undef HAVE_STRTOLD
+
 /* Define to 1 if you have the `strtoul' function. */
 #undef HAVE_STRTOUL
 
diff --git a/libiberty/configure b/libiberty/configure
index 96feaed..072b03b 100755
--- a/libiberty/configure
+++ b/libiberty/configure
@@ -5423,7 +5423,7 @@ if test x = y; then
 sbrk setenv setproctitle setrlimit sigsetmask snprintf spawnve spawnvpe \
  stpcpy stpncpy strcasecmp strchr strdup \
  strerror strncasecmp strndup strnlen strrchr strsignal strstr strtod \
- strtol strtoul strverscmp sysconf sysctl sysmp \
+ strtol strtold strtoul strverscmp sysconf sysctl sysmp \
 table times tmpnam \
 vasprintf vfprintf vprintf vsprintf \
 wait3 wait4 waitpid
-- 
1.9.1



Re: New rematerialization sub-pass in LRA

2014-10-14 Thread Vladimir Makarov
On 10/13/2014 12:24 PM, Wilco Dijkstra wrote:
   Here is a new rematerialization sub-pass of LRA.

   I've tested and benchmarked the sub-pass on x86-64 and ARM.  The
 sub-pass generates smaller code on average on both
 architectures (although the improvement is not significant), adds  0.4%
 additional compilation time in -O2 mode of release GCC (according to the user
 time of compiling a 500K-line Fortran program and the valgrind lackey
 instruction count for compiling combine.i) and about 0.7% in -O0 mode.  As for
 performance, the best result I found is a 1% SPECFP2000 improvement on
 ARM Exynos 5410 (973 vs 963), but for Intel Haswell the performance
 results are practically the same (Haswell has a very good,
 sophisticated memory sub-system).
 I ran SPEC2k on AArch64, and EON fails to run correctly with -fno-caller-saves
 -mcpu=cortex-a57 -fomit-frame-pointer -Ofast. I'm not sure whether this is
 AArch64 specific, but previously non-optimal register allocation choices
 triggered a latent bug in ree (it's unclear why GCC still allocates FP
 registers in high-pressure integer code, as I set the costs for int-FP moves
 high).

 On SPECINT2k performance is ~0.5% worse (5.5% regression on perlbmk), and 
 SPECFP is ~0.2% faster.
Thanks for reporting this.  It is important for me as I have no aarch64
machine for benchmarking.

Perlbmk performance degradation is too big and I'll definitely look at
this problem.

 Generally I think it is good to have a specific pass for rematerialization.
 However should this not also affect the costs of instructions that can be
 cheaply rematerialized? Similarly for the choice whether to caller save or
 spill (today the caller-save code doesn't care at all about rematerialization,
 so it aggressively caller-saves values which could be rematerialized - see eg.
 https://gcc.gnu.org/ml/gcc/2014-09/msg00071.html).
I wanted to address the cost issues later but I guess the perlbmk
performance problem might be solved by this.  So I'll start
working on this.

The rematerialization pass can fix caller-saves code if we add
processing move insns too.  So it could be another project to improve
the rematerialization.  Thanks for pointing this out.
 

 Also I am confused by the claim memory reads are not profitable to
 rematerialize.  Surely rematerializing a memory read from const-data or
 literal pool is cheaper than spilling as you avoid a store to the stack?

Most such cases are covered by cfg-insensitive rematerialization but I
guess there are cfg-sensitive cases.  I should try this too.

Wilco, thanks for the very informative email with three ideas to improve the
rematerialization.  As I wrote, the patch is an initial implementation of
the rematerialization and the infrastructure with modifications will be
able to handle these and other improvements.  Most important we have the
infrastructure in the right place now,




[PATCH][match-and-simplify] Remove/revert unneeded changes

2014-10-14 Thread Richard Biener

This removes duplicate/not needed code from generic-match-head.c
and removes integral_op_p (if needed these new predicates should
go to tree.h).  It also reverts one unnecessary Makefile.in change.

Applied.

Richard.

2014-10-14  Richard Biener  rguent...@suse.de

* Makefile.in (BUILD_RTL): Revert not needed change.
* match.pd (integral_op_p): Remove predicate and use.
* generic-match-head.c: Include gimple-match.h and remove
all code.
* gimple-match-head.c (integral_op_p): Remove.

Index: gcc/Makefile.in
===
--- gcc/Makefile.in (revision 216146)
+++ gcc/Makefile.in (working copy)
@@ -1032,7 +1032,7 @@ BUILD_LIBS = $(BUILD_LIBIBERTY)
 
 BUILD_RTL = build/rtl.o build/read-rtl.o build/ggc-none.o \
build/vec.o build/min-insn-modes.o build/gensupport.o \
-   build/print-rtl.o build/hash-table.o
+   build/print-rtl.o
 BUILD_MD = build/read-md.o
 BUILD_ERRORS = build/errors.o
 
Index: gcc/match.pd
===
--- gcc/match.pd(revision 216146)
+++ gcc/match.pd(working copy)
@@ -24,7 +24,6 @@ along with GCC; see the file COPYING3.
 
 /* Generic tree predicates we inherit.  */
 (define_predicates
-   integral_op_p
integer_onep integer_zerop integer_all_onesp
real_zerop real_onep
CONSTANT_CLASS_P)
@@ -132,8 +131,9 @@ (define_predicates
 
 /* fold_negate_exprs convert - (~A) to A + 1.  */
 (simplify
-  (negate (bit_not integral_op_p@0))
-  (plus @0 { build_int_cst (TREE_TYPE (@0), 1); } ))
+  (negate (bit_not @0))
+  (if (INTEGRAL_TYPE_P (type))
+   (plus @0 { build_int_cst (TREE_TYPE (@0), 1); } )))
 
 /* One ternary pattern.  */
 
Index: gcc/generic-match-head.c
===
--- gcc/generic-match-head.c(revision 216146)
+++ gcc/generic-match-head.c(working copy)
@@ -41,37 +41,6 @@ along with GCC; see the file COPYING3.
 #include "tree-phinodes.h"
 #include "ssa-iterators.h"
 #include "dumpfile.h"
+#include "gimple-match.h"
 
-#define INTEGER_CST_P(node) (TREE_CODE(node) == INTEGER_CST)
-#define integral_op_p(node) INTEGRAL_TYPE_P(TREE_TYPE(node))
-#define REAL_CST_P(node) (TREE_CODE(node) == REAL_CST)
 
-
-/* Helper to transparently allow tree codes and builtin function codes
-   exist in one storage entity.  */
-class code_helper
-{
-public:
-  code_helper () {}
-  code_helper (tree_code code) : rep ((int) code) {}
-  code_helper (built_in_function fn) : rep (-(int) fn) {}
-  operator tree_code () const { return (tree_code) rep; }
-  operator built_in_function () const { return (built_in_function) -rep; }
-  bool is_tree_code () const { return rep > 0; }
-  bool is_fn_code () const { return rep < 0; }
-private:
-  int rep;
-};
-
-
-/* Return whether T is a constant that we'll dispatch to fold to
-   evaluate fully constant expressions.  */
-
-static inline bool
-constant_for_folding (tree t)
-{
-  return (CONSTANT_CLASS_P (t)
- /* The following is only interesting to string builtins.  */
- || (TREE_CODE (t) == ADDR_EXPR
-      && TREE_CODE (TREE_OPERAND (t, 0)) == STRING_CST));
-}
Index: gcc/gimple-match-head.c
===
--- gcc/gimple-match-head.c (revision 216146)
+++ gcc/gimple-match-head.c (working copy)
@@ -43,8 +43,6 @@ along with GCC; see the file COPYING3.
 #include "dumpfile.h"
 #include "gimple-match.h"
 
-#define integral_op_p(node) INTEGRAL_TYPE_P(TREE_TYPE(node))
-
 
 /* Forward declarations of the private auto-generated matchers.
They expect valueized operands in canonical order and do not
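
The match.pd pattern in the patch above rewrites - (~A) to A + 1 for integral
types.  In two's complement ~A equals -A - 1, so negating it gives A + 1.  A
minimal sketch that checks the identity in wrapping 32-bit unsigned
arithmetic; the toy_ function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Return true if -(~a) == a + 1 holds for A, computed modulo 2^32.
bool
toy_negate_bit_not_is_plus_one (std::uint32_t a)
{
  std::uint32_t lhs = 0u - ~a;   // -(~a)
  std::uint32_t rhs = a + 1u;    // a + 1
  return lhs == rhs;
}

// Check a few representative values, including both ends of the range.
void
toy_check_identity ()
{
  const std::uint32_t samples[] = { 0u, 1u, 41u, 0x7fffffffu, 0xffffffffu };
  for (std::uint32_t a : samples)
    assert (toy_negate_bit_not_is_plus_one (a));
}
```

Unsigned arithmetic is used so that the wrap-around at the ends of the range
is well defined in the sketch.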



Re: [PATCH] Add D demangling support to libiberty

2014-10-14 Thread Ian Lance Taylor
On Tue, Oct 14, 2014 at 7:12 AM, Joel Brobecker brobec...@adacore.com wrote:

 libiberty/ChangeLog
 2014-08-05  Iain Buclaw  ibuc...@gdcproject.org

 * Makefile.in (CFILES): Add d-demangle.c.
 (REQUIRED_OFILES): Add d-demangle.o.
 * cplus-dem.c (libiberty_demanglers): Add dlang_demangling case.
 (cplus_demangle): Likewise.
 * d-demangle.c: New file.
 * testsuite/Makefile.in (really-check): Add check-d-demangle.
 * testsuite/d-demangle-expected: New file.

 As hinted on gdb-patches, this patch causes a GDB build failure
 on Solaris 2.9, because it uses strtold which is not available.
 According to gnulib's documentation, it should also break on
 the following systems:

 NetBSD 3.0, OpenBSD 3.8, Minix 3.1.8, IRIX 6.5, OSF/1 4.0,
 Solaris 9, Cygwin, MSVC 9, Interix 3.5, BeOS.

 This patch attempts to fix the issue by adding a configure check
 for strtold and adjusts the code to use strtod if strtold does not
 exist.

 Does this look OK to you? If yes, can one of the GCC maintainers
 please review?

It doesn't make sense to me to use strtod if strtold is required.  And
if strtold is not required, then it seems to me that we should always
use strtod.  It seems to me that the right options are either 1) use
strtod unconditionally; 2) add strtold to libiberty

Since option 1 is simpler, what bad things would happen if we use
strtod unconditionally?

Ian


Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-14 Thread H.J. Lu
On Mon, Oct 13, 2014 at 11:49 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Oct 13, 2014 at 9:32 AM, Evgeny Stupachenko evstu...@gmail.com 
 wrote:
 Reattached.

 On Mon, Oct 13, 2014 at 8:22 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Mon, Oct 13, 2014 at 4:53 PM, Evgeny Stupachenko evstu...@gmail.com 
 wrote:

 ChangeLog for testsuite:

 2014-10-13  Evgeny Stupachenko  evstu...@gmail.com

 PR target/8340
 PR middle-end/47602
 PR rtl-optimization/55458
 * gcc.target/i386/pic-1.c: Remove dg-error as test should pass now.
 * gcc.target/i386/pr55458.c: Likewise.
 * gcc.target/i386/pr47602.c: New.
 * gcc.target/i386/pr23098.c: Move to XFAIL.

 Reversed patch was attached. Please repost.

 Uros.

 This caused a regression:

 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63527

Another bootstrap failure:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63536

-- 
H.J.


Re: [PATCH] AutoFDO patch for trunk

2014-10-14 Thread Jan Hubicka
 Index: gcc/cgraphclones.c
 ===
 --- gcc/cgraphclones.c(revision 215826)
 +++ gcc/cgraphclones.c(working copy)
 @@ -453,6 +453,11 @@
  }
else
  count_scale = 0;
 +  /* In AutoFDO, if edge count is larger than callee's entry block
 + count, we will not update the original callee because it may
 + mistakenly mark some hot function as cold.  */
 +  if (flag_auto_profile  gcov_count = count)
 +update_original = false;

Let's drop this from the initial patch.

 Index: gcc/bb-reorder.c
 ===
 --- gcc/bb-reorder.c  (revision 215826)
 +++ gcc/bb-reorder.c  (working copy)
 @@ -1569,15 +1569,14 @@
/* Mark which partition (hot/cold) each basic block belongs in.  */
FOR_EACH_BB_FN (bb, cfun)
  {
 -  bool cold_bb = false;
 +  bool cold_bb = probably_never_executed_bb_p (cfun, bb);

and this too
(basically all the tweaks should IMO go in independently and ideally in
a way that does not need flag_auto_profile test).
 +/* Return true if BB contains indirect call.  */
 +
 +static bool
 +has_indirect_call (basic_block bb)
 +{
 +  gimple_stmt_iterator gsi;
 +
 +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi))
 +{
 +  gimple stmt = gsi_stmt (gsi);
 +  if (gimple_code (stmt) == GIMPLE_CALL
 +      && (gimple_call_fn (stmt) == NULL
 +          || TREE_CODE (gimple_call_fn (stmt)) != FUNCTION_DECL))

You probably want to skip gimple_call_internal_p calls here.
 +
 +/* From AutoFDO profiles, find values inside STMT for that we want to measure
 +   histograms for indirect-call optimization.  */
 +
 +static void
 +afdo_indirect_call (gimple_stmt_iterator *gsi, const icall_target_map map,
 + bool transform)
 +{
 +  gimple stmt = gsi_stmt (*gsi);
 +  tree callee;
 +
 +  if (map.size() == 0 || gimple_code (stmt) != GIMPLE_CALL
 +  || gimple_call_fndecl (stmt) != NULL_TREE)
 +return;
 +
 +  callee = gimple_call_fn (stmt);
 +
 +  histogram_value hist = gimple_alloc_histogram_value (
 +  cfun, HIST_TYPE_INDIR_CALL, stmt, callee);
 +  hist->n_counters = 3;
 +  hist->hvalue.counters =  XNEWVEC (gcov_type, hist->n_counters);
 +  gimple_add_histogram_value (cfun, stmt, hist);
 +
 +  gcov_type total = 0;
 +  icall_target_map::const_iterator max_iter = map.end();
 +
 +  for (icall_target_map::const_iterator iter = map.begin();
 +   iter != map.end(); ++iter)
 +{
 +  total += iter->second;
 +  if (max_iter == map.end() || max_iter->second < iter->second)
 + max_iter = iter;
 +}
 +
 +  hist->hvalue.counters[0] = (unsigned long long)
 +  afdo_string_table->get_name (max_iter->first);
 +  hist->hvalue.counters[1] = max_iter->second;
 +  hist->hvalue.counters[2] = total;
 +
 +  if (!transform)
 +return;
 +
 +  if (gimple_ic_transform (gsi))
 +{
 +  struct cgraph_edge *indirect_edge =
 +   cgraph_node::get (current_function_decl)->get_edge (stmt);
 +  struct cgraph_node *direct_call =
 +   find_func_by_profile_id ((int)hist->hvalue.counters[0]);
 +  if (DECL_STRUCT_FUNCTION (direct_call->decl) == NULL)
 + return;
 +  struct cgraph_edge *new_edge =
 +   indirect_edge->make_speculative (direct_call, 0, 0);
 +  new_edge->redirect_call_stmt_to_callee ();
 +  gimple_remove_histogram_value (cfun, stmt, hist);
 +  inline_call (new_edge, true, NULL, NULL, false);
 +  return;
 +}
 +  return;

Is it necessary to go via histogram and gimple_ic_transform here?  I would
expect that all you need is to make the speculative edge and inline it
(bypassing the work of producing a fake histogram value and calling
gimple_ic_transform on it).

Also it seems to me that you want to set the direct_count and frequency
arguments of make_speculative so the resulting function profile is not off.
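
As an aside, the target selection that the quoted afdo_indirect_call loop
performs can be sketched on its own, without any histogram machinery.  The
following is only an illustration: target_counts, hottest_target and
find_hottest_target are hypothetical names, not GCC APIs, and the map stands
in for icall_target_map.  The count and total it computes are the kind of
numbers one might use to derive the direct_count and frequency arguments
mentioned above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical profile data: callee name -> sampled call count.
typedef std::map<std::string, long long> target_counts;

// Result of scanning the profile of one indirect call site.
struct hottest_target
{
  std::string name;   // most frequently sampled callee
  long long count;    // samples attributed to that callee
  long long total;    // samples summed over all callees
};

// Scan all targets once, accumulating the total and remembering the
// entry with the largest count.  Returns false for an empty profile.
// On ties the entry seen first in map order is kept, because the
// comparison is strict.
static bool
find_hottest_target (const target_counts &targets, hottest_target *out)
{
  if (targets.empty ())
    return false;

  target_counts::const_iterator best = targets.end ();
  long long total = 0;
  for (target_counts::const_iterator it = targets.begin ();
       it != targets.end (); ++it)
    {
      total += it->second;
      if (best == targets.end () || best->second < it->second)
        best = it;
    }

  out->name = best->first;
  out->count = best->second;
  out->total = total;
  return true;
}
```

With such a result in hand, the ratio count/total gives a rough estimate of
how often the speculated direct call would be taken.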

The rest of the interfaces seem quite sane now.  Can you please look into
using speculative edges directly instead of hooking into the vpt infrastructure
and fixing the formatting issues of the new pass?

I will try to make another pass over the actual streaming logic, which I find
a bit difficult to read, but I quite trust you that it does the right thing ;)

Honza


[PATCH x86, pr63534] Fix go bootstrap

2014-10-14 Thread Evgeny Stupachenko
Hi,

Bootstrapped with --enable-languages=c,c++,fortran,lto,go passed.
Make check in progress.

Is it ok?

ChangeLog

2014-10-14  Evgeny Stupachenko  evstu...@gmail.com

* config/i386/i386.c (ix86_expand_split_stack_prologue): Make
__morestack calls local.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a3ca2ed..5117572 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -11999,7 +11999,10 @@ ix86_expand_split_stack_prologue (void)
REG_BR_PROB_BASE - REG_BR_PROB_BASE / 100);

   if (split_stack_fn == NULL_RTX)
-split_stack_fn = gen_rtx_SYMBOL_REF (Pmode, "__morestack");
+{
+  split_stack_fn = gen_rtx_SYMBOL_REF (Pmode, "__morestack");
+  SYMBOL_REF_FLAGS (split_stack_fn) |= SYMBOL_FLAG_LOCAL;
+}
   fn = split_stack_fn;

   /* Get more stack space.  We pass in the desired stack space and the
@@ -12044,9 +12047,11 @@ ix86_expand_split_stack_prologue (void)
  gcc_assert ((args_size & 0xffffffff) == args_size);

  if (split_stack_fn_large == NULL_RTX)
-   split_stack_fn_large =
- gen_rtx_SYMBOL_REF (Pmode, "__morestack_large_model");
-
+   {
+ split_stack_fn_large =
+   gen_rtx_SYMBOL_REF (Pmode, "__morestack_large_model");
+ SYMBOL_REF_FLAGS (split_stack_fn_large) |= SYMBOL_FLAG_LOCAL;
+   }
  if (ix86_cmodel == CM_LARGE_PIC)
{
  rtx_code_label *label;


Patches 5-10 of jit merger (was: Re: [PATCH 0/5] Merger of jit branch (v2))

2014-10-14 Thread David Malcolm
On Mon, 2014-10-13 at 13:45 -0400, David Malcolm wrote:
 I'd like to merge the JIT branch into trunk:
   https://gcc.gnu.org/wiki/JIT
 
 This is v2 since it incorporates fixes for the various issues
 identified by Joseph in an earlier submission:
   https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02056.html
 
 I've split up the current diff between trunk and the branch into 5
 areas for ease of review (and to allow for early merger of the
 supporting work, if it's deemed ready):
 
 patch 1: exposes an entrypoint in libiberty that I need
 patch 2: configure and Makefile changes in gcc
 patch 3: timevar.h: Add an auto_timevar class
 patch 4: State cleanups in gcc
 patch 5: Add the jit code itself
 
 [this is a diff of trunk r215958 aka
 e012cdc775868e9922f5fef9068a764546876d93 which is from 2014-10-06,
 vs jit branch version 75b3ee7acdc6de55354d65bb7d619386463e50a1].
 
 I've successfully bootstrapped and regression-tested the cumulative
 result of all of the patches against a control build, building them
 both with --enable-host-shared, and with
   --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto
 adding ,jit to the test build (both on x86_64-unknown-linux-gnu;
 Fedora 20).
 
 There were no regressions vs the control build, and the patched build
 gains a jit.sum, with 4663 passes (and no failures).
 
 OK for trunk?

Patch 5 seems to have been too large, even compressed, so I'm breaking
it up into separate pieces and compressing, giving 10 patches in total

Patches 1-4 are as above.

Patch 5: remaining JIT-related changes outside of the gcc/jit/ subdir

Patch 6: the core of the JIT implementation: the gcc/jit subdir

Patch 7: the testsuite: gcc/testsuite/jit.dg

Patch 8: sphinx-based documentation: the gcc/jit/docs subdir

Patch 9: texinfo documentation autogenerated from the sphinx sources.

Patch 10: the ChangeLog.jit logs from the branch.



[PATCH 05/10] JIT-related changes outside of jit subdir

2014-10-14 Thread David Malcolm
ChangeLog:
* MAINTAINERS (Various Maintainers): Add myself as jit maintainer.

contrib/ChangeLog:
* jit-coverage-report.py: New file: a script to print crude
code-coverage information for the libgccjit API.

gcc/ChangeLog:
* doc/install.texi (--enable-host-shared): Specify that this is
required when building libgccjit.
* timevar.def (TV_JIT_REPLAY): New.
(TV_ASSEMBLE): New.
(TV_LINK): New.
(TV_LOAD): New.
---
 MAINTAINERS|  1 +
 contrib/jit-coverage-report.py | 67 ++
 gcc/doc/install.texi   |  2 +-
 gcc/timevar.def|  6 
 4 files changed, 75 insertions(+), 1 deletion(-)
 create mode 100644 contrib/jit-coverage-report.py

diff --git a/MAINTAINERS b/MAINTAINERS
index 5dca84e..1fa679e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -260,6 +260,7 @@ testsuite   Janis Johnson   
jani...@codesourcery.com
 register allocationVladimir Makarovvmaka...@redhat.com
 gdbhooks.pyDavid Malcolm   dmalc...@redhat.com
 SLSR   Bill Schmidtwschm...@linux.vnet.ibm.com
+jitDavid Malcolm   dmalc...@redhat.com
 
 Note that individuals who maintain parts of the compiler need approval to
 check in changes outside of the parts of the compiler they maintain.
diff --git a/contrib/jit-coverage-report.py b/contrib/jit-coverage-report.py
new file mode 100644
index 000..529336f
--- /dev/null
+++ b/contrib/jit-coverage-report.py
@@ -0,0 +1,67 @@
+#! /usr/bin/python
+#
+# Print a report on which libgccjit.so symbols are used in which test
+# cases, and which lack test coverage.  Tested with Python 2.7 and 3.2
+# To be run from the root directory of the source tree.
+#
+# Copyright (C) 2014 Free Software Foundation, Inc.
+# Written by David Malcolm dmalc...@redhat.com.
+#
+# This script is Free Software, and it can be copied, distributed and
+# modified as defined in the GNU General Public License.  A copy of
+# its license can be downloaded from http://www.gnu.org/copyleft/gpl.html
+
+from collections import Counter
+import glob
+import re
+import sys
+
+def parse_map_file(path):
+    """
+    Parse libgccjit.map, returning the symbols in the API as a list of str.
+    """
+    syms = []
+    with open(path) as f:
+        for line in f:
+            m = re.match('^\s+([a-z_]+);$', line)
+            if m:
+                syms.append(m.group(1))
+    return syms
+
+def parse_test_case(path):
+    """
+    Locate all symbol-like things in a C test case, yielding
+    them as a sequence of str.
+    """
+    with open(path) as f:
+        for line in f:
+            for m in re.finditer('([_A-Za-z][_A-Za-z0-9]*)', line):
+                yield m.group(1)
+
+def find_test_cases():
+    for path in glob.glob('gcc/testsuite/jit.dg/*.[ch]'):
+        yield path
+
+api_syms = parse_map_file('gcc/jit/libgccjit.map')
+
+syms_in_test_cases = {}
+for path in find_test_cases():
+    syms_in_test_cases[path] = list(parse_test_case(path))
+
+uses = Counter()
+for sym in sorted(api_syms):
+    print('symbol: %s' % sym)
+    uses[sym] = 0
+    for path in syms_in_test_cases:
+        count = syms_in_test_cases[path].count(sym)
+        uses[sym] += count
+        if count:
+            print('  uses in %s: %i' % (path, count))
+    if uses[sym] == 0:
+        print('  NEVER USED')
+    sys.stdout.write('\n')
+
+layout = '%40s  %5s  %s'
+print(layout % ('SYMBOL', 'USES', 'HISTOGRAM'))
+for sym, count in uses.most_common():
+    print(layout % (sym, count, '*' * count if count else 'UNUSED'))
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 75ac9a6..c92de28 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -954,7 +954,7 @@ Specify that the @emph{host} code should be built into 
position-independent
 machine code (with -fPIC), allowing it to be used within shared libraries,
 but yielding a slightly slower compiler.
 
-Currently this option is only of use to people developing GCC itself.
+This option is required when building the libgccjit.so library.
 
 Contrast with @option{--enable-shared}, which affects @emph{target}
 libraries.
diff --git a/gcc/timevar.def b/gcc/timevar.def
index a04d05c..b406c16 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -277,3 +277,9 @@ DEFTIMEVAR (TV_VERIFY_LOOP_CLOSED, "verify loop closed")
 DEFTIMEVAR (TV_VERIFY_RTL_SHARING, "verify RTL sharing")
 DEFTIMEVAR (TV_REBUILD_FREQUENCIES   , "rebuild frequencies")
 DEFTIMEVAR (TV_REPAIR_LOOPS , "repair loop structures")
+
+/* Stuff used by libgccjit.so.  */
+DEFTIMEVAR (TV_JIT_REPLAY   , "replay of JIT client activity")
+DEFTIMEVAR (TV_ASSEMBLE , "assemble JIT code")
+DEFTIMEVAR (TV_LINK , "link JIT code")
+DEFTIMEVAR (TV_LOAD , "load JIT result")
-- 
1.8.5.3



[PATCH 06/10] Heart of the JIT implementation (was: Re: [PATCH 0/5] Merger of jit branch (v2))

2014-10-14 Thread David Malcolm
This commit adds the gcc/jit subdirectory, implementing the library,
which looks like a frontend named jit from the POV of the rest of the
gcc code.

gcc/jit/ChangeLog:

* Make-lang.in: New.
* TODO.rst: New.
* config-lang.in: New.
* dummy-frontend.c: New.
* jit-builtins.c: New.
* jit-builtins.h: New.
* jit-common.h: New.
* jit-playback.c: New.
* jit-playback.h: New.
* jit-recording.c: New.
* jit-recording.h: New.
* libgccjit++.h: New.
* libgccjit.c: New.
* libgccjit.h: New.
* libgccjit.map: New.
* libgccjit.pc.in: New.
* notes.txt: New.



0006-Heart-of-the-JIT-implementation.patch.gz
Description: GNU Zip compressed data


[PATCH 07/10] Testsuite for the JIT (Re: Patches 5-10 of jit merger (was: Re: [PATCH 0/5] Merger of jit branch (v2)))

2014-10-14 Thread David Malcolm
Here's patch 7, the testsuite.



0007-Testsuite-for-the-JIT.patch.gz
Description: GNU Zip compressed data


Re: [PATCH, Pointer Bounds Checker 14/x] Passes [16/n] Reduce bounds lifetime

2014-10-14 Thread Ilya Enkovich
On 09 Oct 11:32, Jeff Law wrote:
 On 10/08/14 13:24, Ilya Enkovich wrote:
 Hi,
 
 This patch adds a bounds lifetime reduction into checker optimization.
 
 Thanks,
 Ilya
 --
 2014-10-08  Ilya Enkovich  ilya.enkov...@intel.com
 
  * tree-chkp.c (chkp_reduce_bounds_lifetime): New.
  (chkp_opt_execute): Run bounds lifetime reduction
  algorithm.
 Basic tests  pull into a file with the other optimization work.
 
 How expensive is nearest_common_dominator?  Would it make more sense
 to use something like the concept of an anticipated expression from
 LCM?

nearest_common_dominator searches for the nearest common ancestor in a tree,
so I expect it to cost no more than O(h), where h is the height of the tree.
I suppose LCM would be more efficient when many bounds with many uses are
processed.  But this optimization applies only to INIT bounds, NULL bounds and
bounds for statically allocated objects, so their usage is quite limited.
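
To make the O(h) estimate concrete, here is a minimal sketch of a nearest
common ancestor query on a tree stored as parent pointers.  It is not GCC's
nearest_common_dominator implementation; dom_tree and its members are
hypothetical names, and blocks are plain integer indices.  Each query walks
at most h steps up from each node.

```cpp
#include <cassert>
#include <vector>

// Hypothetical dominator-tree model: parent[i] is the immediate dominator
// of block i, and the root (entry block) is its own parent.
struct dom_tree
{
  std::vector<int> parent;
  std::vector<int> depth;

  explicit dom_tree (const std::vector<int> &idom)
    : parent (idom), depth (idom.size (), -1)
  {
    for (int i = 0; i < static_cast<int> (parent.size ()); ++i)
      compute_depth (i);
  }

  // Depth of node N below the root, memoized in DEPTH.
  int compute_depth (int n)
  {
    if (depth[n] >= 0)
      return depth[n];
    depth[n] = parent[n] == n ? 0 : compute_depth (parent[n]) + 1;
    return depth[n];
  }

  // Lift the deeper node until both are at the same depth, then lift
  // both together until they meet.  At most h steps per node.
  int nearest_common_ancestor (int a, int b) const
  {
    while (depth[a] > depth[b])
      a = parent[a];
    while (depth[b] > depth[a])
      b = parent[b];
    while (a != b)
      {
        a = parent[a];
        b = parent[b];
      }
    return a;
  }
};
```

Folding this query over all uses of one bounds value costs O(h) per use,
which matches the informal estimate above.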

 
 
 
 
 +  /* Check we do not increase other values lifetime.  */
 +  FOR_EACH_PHI_OR_STMT_USE (use_p, stmt, iter, SSA_OP_USE)
 +{
 +  op = USE_FROM_PTR (use_p);
 +
 +  if (TREE_CODE (op) == SSA_NAME
 +   && gimple_code (SSA_NAME_DEF_STMT (op)) != GIMPLE_NOP)
 +deps = true;
 Might as well break out of the FOR_EACH_PHI_OR_STMT_USE loop here.
 Note that some of our iterators have special mechanisms to break out
 of the loop, but my recollection is those are for the immediate use
 iterators to ensure the marker is removed.
 
 Code is probably OK if LCM/anticipated isn't reasonable and the
 above issues are dealt with.
 
 jeff
 

Here is an updated version with break and testcase added.

Thanks,
Ilya
--
gcc/

2014-10-14  Ilya Enkovich  ilya.enkov...@intel.com

* tree-chkp-opt.c (chkp_reduce_bounds_lifetime): New.
(chkp_opt_execute): Run bounds lifetime reduction
algorithm.

gcc/testsuite/

2014-10-14  Ilya Enkovich  ilya.enkov...@intel.com

* gcc.target/i386/chkp-lifetime-1.c: New.


diff --git a/gcc/testsuite/gcc.target/i386/chkp-lifetime-1.c 
b/gcc/testsuite/gcc.target/i386/chkp-lifetime-1.c
new file mode 100644
index 000..bcecdd0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/chkp-lifetime-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-fcheck-pointer-bounds -mmpx -O2 -fdump-tree-chkpopt-details" } */
+/* { dg-final { scan-tree-dump "Moving creation of \[^ \]+ down to its use" "chkpopt" } } */
+
+extern int arr[];
+
+int test (int i)
+{
+  int res;
+  if (i >= 0)
+res = arr[i];
+  else
+res = -i;
+  return res;
+}
diff --git a/gcc/tree-chkp-opt.c b/gcc/tree-chkp-opt.c
index b3ff433..37da035 100644
--- a/gcc/tree-chkp-opt.c
+++ b/gcc/tree-chkp-opt.c
@@ -1277,6 +1277,158 @@ chkp_optimize_string_function_calls (void)
 }
 }
 
+/* Instrumentation pass inserts most of bounds creation code
+   in the header of the function.  We want to move bounds
+   creation closer to bounds usage to reduce bounds lifetime.
+   We also try to avoid bounds creation code on paths where
+   bounds are not used.  */
+static void
+chkp_reduce_bounds_lifetime (void)
+{
+  basic_block bb = FALLTHRU_EDGE (ENTRY_BLOCK_PTR_FOR_FN (cfun))->dest;
+  gimple_stmt_iterator i;
+
+  for (i = gsi_start_bb (bb); !gsi_end_p (i); )
+{
+  gimple dom_use, use_stmt, stmt = gsi_stmt (i);
+  basic_block dom_bb;
+  ssa_op_iter iter;
+  imm_use_iterator use_iter;
+  use_operand_p use_p;
+  tree op;
+  bool want_move = false;
+  bool deps = false;
+
+  if (gimple_code (stmt) == GIMPLE_CALL
+  && gimple_call_fndecl (stmt) == chkp_bndmk_fndecl)
+   want_move = true;
+
+  if (gimple_code (stmt) == GIMPLE_ASSIGN
+  && POINTER_BOUNDS_P (gimple_assign_lhs (stmt))
+  && gimple_assign_rhs_code (stmt) == VAR_DECL)
+   want_move = true;
+
+  if (!want_move)
+   {
+ gsi_next (&i);
+ continue;
+   }
+
+  /* Check we do not increase other values lifetime.  */
+  FOR_EACH_PHI_OR_STMT_USE (use_p, stmt, iter, SSA_OP_USE)
+   {
+ op = USE_FROM_PTR (use_p);
+
+ if (TREE_CODE (op) == SSA_NAME
+  && gimple_code (SSA_NAME_DEF_STMT (op)) != GIMPLE_NOP)
+   {
+ deps = true;
+ break;
+   }
+   }
+
+  if (deps)
+   {
+ gsi_next (&i);
+ continue;
+   }
+
+  /* Check all usages of bounds.  */
+  if (gimple_code (stmt) == GIMPLE_CALL)
+   op = gimple_call_lhs (stmt);
+  else
+   {
+ gcc_assert (gimple_code (stmt) == GIMPLE_ASSIGN);
+ op = gimple_assign_lhs (stmt);
+   }
+
+  dom_use = NULL;
+  dom_bb = NULL;
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op)
+   {
+ if (dom_bb &&
+ dominated_by_p (CDI_DOMINATORS,
+ dom_bb, gimple_bb (use_stmt)))
+   {
+ dom_use = use_stmt;
+ dom_bb = NULL;
+   }
+ else if 

[PATCH 10/10] ChangeLog files (Re: Patches 5-10 of jit merger (was: Re: [PATCH 0/5] Merger of jit branch (v2)))

2014-10-14 Thread David Malcolm
Finally, patch 10, the ChangeLog files.




0010-ChangeLog-files.patch.gz
Description: GNU Zip compressed data


Re: [PATCH] Add zero-overhead looping for xtensa backend

2014-10-14 Thread Felix Yang
PING?
Cheers,
Felix


On Tue, Oct 14, 2014 at 12:30 AM, Felix Yang fei.yang0...@gmail.com wrote:
 Thanks for the comments.

 The patch checked the usage of the trip count register, making sure
 that it is not used in the loop body other than the doloop_end or
 lives past the doloop_end instruction, as the following code snippet
 shows:

 +  /* Scan all the blocks to make sure they don't use iter_reg.  */
 +  if (loop->iter_reg_used || loop->iter_reg_used_outside)
 +{
 +  if (dump_file)
 +fprintf (dump_file, ";; loop %d uses iterator\n",
 + loop->loop_no);
 +  return false;
 +}

 For the spill issue, I think we need to handle it. The reason is
 that currently we are not telling GCC about the existence of the
 LCOUNT register. Instead, we keep the trip count in a general register
 and it's possible that this register can be spilled when register
 pressure is high.
 It's a good idea to post another patch to describe the LCOUNT
 register in GCC in order to free this general register. But I want
 this patch applied as a first step, OK?

 Cheers,
 Felix


 On Tue, Oct 14, 2014 at 12:09 AM, augustine.sterl...@gmail.com
 augustine.sterl...@gmail.com wrote:
 On Fri, Oct 10, 2014 at 6:59 AM, Felix Yang fei.yang0...@gmail.com wrote:
 Hi Sterling,

 I made some improvement to the patch. Two changes:
 1. TARGET_LOOPS is now used as a condition of the doloop related
 patterns, which is more elegant.

 Fine.

 2. As the trip count register of the zero-cost loop may
 potentially be spilled, we need to change the patterns in order to handle
 this issue.

 Actually, for xtensa you don't. The trip count is copied into LCOUNT
 at the execution of the loop instruction, and therefore a spill or
 whatever doesn't matter--it won't affect the result. So as long as you
 have the trip count at the start of the loop, you are fine.

 This does bring up an issue of whether or not the trip count can be
 modified during the loop. (Note that this is different from early
 exit.) If it can, you can't use a zero-overhead loop. Does your patch
 address this case?

 The solution is similar to that adopted by the c6x backend.
 If that happens, just turn the zero-cost loop into a regular loop
 once reload is completed.
 Attached please find version 4 of the patch. Make check regression
 tested with xtensa-elf-gcc/simulator.
 OK for trunk?


Re: [PATCH x86, pr63534] Fix go bootstrap

2014-10-14 Thread Richard Henderson
On 10/14/2014 08:08 AM, Evgeny Stupachenko wrote:
 Hi,
 
 Bootstrapped with --enable-languages=c,c++,fortran,lto,go passed.
 Make check in progress.
 
 Is it ok?
 
 ChangeLog
 
 2014-10-14  Evgeny Stupachenko  evstu...@gmail.com
 
 * config/i386/i386.c (ix86_expand_split_stack_prologue): Make
 __morestack calls local.

Ok.


r~


Re: [PATCH x86, pr63534] Fix go bootstrap

2014-10-14 Thread Jakub Jelinek
On Tue, Oct 14, 2014 at 08:43:39AM -0700, Richard Henderson wrote:
 On 10/14/2014 08:08 AM, Evgeny Stupachenko wrote:
  Hi,
  
  Bootstrapped with --enable-languages=c,c++,fortran,lto,go passed.
  Make check in progress.
  
  Is it ok?
  
  ChangeLog
  
  2014-10-14  Evgeny Stupachenko  evstu...@gmail.com
  
  * config/i386/i386.c (ix86_expand_split_stack_prologue): Make
  __morestack calls local.
 
 Ok.

Please mention 
PR target/63534
in the ChangeLog.

Jakub


Re: [PATCH 3/5] timevar.h: Add an auto_timevar class

2014-10-14 Thread David Malcolm
On Tue, 2014-10-14 at 11:03 +0200, Richard Biener wrote:
 On Mon, Oct 13, 2014 at 7:45 PM, David Malcolm dmalc...@redhat.com wrote:
  This is used in a couple of places in jit/jit-playback.c to ensure
  that we pop the timevar on every exit path from a function.
 
  I could rewrite them if need be, but it does simplify things.
 
 Sorry to be bikeshedding but auto_timevar sounds odd - this is
 just a one-element timevar stack.

Sorry that the usage examples didn't make it through in my original
email; these are in patch 06/10 in gcc/jit/jit-playback.c and look like
this:

playback::context::
compile ()
{
  ... lots of code...

  {
    auto_timevar assemble_timevar (TV_ASSEMBLE);

    ... lots of code, with multiple return paths...

  }

}

the idea being that the timevar_pop happens implicitly at the exit from
the scope (e.g. via one of the error-handling returns).

FWIW I rather like the current name: I think of it as an RAII-style way
of not having to manually call timevar_pop.

The auto_ prefix to me evokes both such RAII types as auto_ptr and
auto_vec, and the fact that it's intended to be on the stack i.e. have
auto storage class.
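
For readers without the patch at hand, the pattern can be sketched
standalone.  This is an illustration only: timer_push, timer_pop and
scoped_timer are hypothetical stand-ins that merely track a stack of active
timer ids; they are not the GCC timevar API, and the copy prevention uses
C++11 "= delete" rather than the private declaration in the quoted patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for timevar_push/timevar_pop: they only record
// which timer ids are currently active.
enum timer_id { TIMER_ASSEMBLE, TIMER_LINK };

static std::vector<timer_id> active_timers;

static void
timer_push (timer_id id)
{
  active_timers.push_back (id);
}

static void
timer_pop (timer_id id)
{
  // Pops must mirror pushes in strict stack order.
  assert (!active_timers.empty () && active_timers.back () == id);
  active_timers.pop_back ();
}

// RAII wrapper: push in the constructor, pop in the destructor.
class scoped_timer
{
 public:
  explicit scoped_timer (timer_id id) : m_id (id) { timer_push (m_id); }
  ~scoped_timer () { timer_pop (m_id); }

  // A copy would pop the same timer twice, so copying is disallowed.
  scoped_timer (const scoped_timer &) = delete;
  scoped_timer &operator= (const scoped_timer &) = delete;

 private:
  timer_id m_id;
};

// A function with more than one return path; the timer is popped on
// every one of them without an explicit call.
static bool
assemble (bool fail_early)
{
  scoped_timer t (TIMER_ASSEMBLE);
  if (fail_early)
    return false;
  return true;
}
```

Whichever return is taken in assemble, the destructor runs and the stack of
active timers is balanced again on exit.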

 Don't have a real better name though :/  Maybe timevar_pushpop ?
 
 Otherwise this looks ok.
 
 Thanks,
 Richard.
 
  Written by Tom Tromey.
 
  gcc/ChangeLog:
  * timevar.h (class auto_timevar): New class.
  ---
   gcc/timevar.h | 24 
   1 file changed, 24 insertions(+)
 
  diff --git a/gcc/timevar.h b/gcc/timevar.h
  index 6703cc9..f018e39 100644
  --- a/gcc/timevar.h
  +++ b/gcc/timevar.h
  @@ -110,6 +110,30 @@ timevar_pop (timevar_id_t tv)
   timevar_pop_1 (tv);
   }
 
  +// This is a simple timevar wrapper class that pushes a timevar in its
  +// constructor and pops the timevar in its destructor.
  +class auto_timevar
  +{
  + public:
  +  auto_timevar (timevar_id_t tv)
  +: m_tv (tv)
  +  {
  +timevar_push (m_tv);
  +  }
  +
  +  ~auto_timevar ()
  +  {
  +timevar_pop (m_tv);
  +  }
  +
  + private:
  +
  +  // Private to disallow copies.
  +  auto_timevar (const auto_timevar &);
  +
  +  timevar_id_t m_tv;
  +};
  +
   extern void print_time (const char *, long);
 
   #endif /* ! GCC_TIMEVAR_H */
  --
  1.8.5.3
 




Re: [PATCH] Implement -fsanitize=object-size

2014-10-14 Thread Jakub Jelinek
On Fri, Oct 10, 2014 at 12:26:44PM +0200, Jakub Jelinek wrote:
 2014-10-10  Jakub Jelinek  ja...@redhat.com
 
   * ubsan/Makefile.am (DEFS): Add -DPIC.
   * ubsan/Makefile.in: Regenerated.

I've now bootstrapped/regtested this on x86_64-linux and i686-linux
and committed as obvious.

2014-10-14  Jakub Jelinek  ja...@redhat.com

* ubsan/Makefile.am (DEFS): Add -DPIC.
* ubsan/Makefile.in: Regenerated.

--- libsanitizer/ubsan/Makefile.am  2014-09-24 11:08:04.183026156 +0200
+++ libsanitizer/ubsan/Makefile.am  2014-10-10 12:15:19.124247283 +0200
@@ -3,7 +3,7 @@ AM_CPPFLAGS = -I $(top_srcdir) -I $(top_
 # May be used by toolexeclibdir.
 gcc_version := $(shell cat $(top_srcdir)/../gcc/BASE-VER)
 
-DEFS = -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS 
-D__STDC_LIMIT_MACROS 
+DEFS = -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS 
-D__STDC_LIMIT_MACROS -DPIC
 AM_CXXFLAGS = -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic 
-Wno-long-long  -fPIC -fno-builtin -fno-exceptions -fno-rtti 
-fomit-frame-pointer -funwind-tables -fvisibility=hidden -Wno-variadic-macros
 AM_CXXFLAGS += $(LIBSTDCXX_RAW_CXX_CXXFLAGS)
 ACLOCAL_AMFLAGS = -I m4

--- libsanitizer/ubsan/Makefile.in  2014-09-25 15:01:25.448109866 +0200
+++ libsanitizer/ubsan/Makefile.in  2014-10-14 11:26:17.772201307 +0200
@@ -132,7 +132,7 @@ CXXCPP = @CXXCPP@
 CXXDEPMODE = @CXXDEPMODE@
 CXXFLAGS = @CXXFLAGS@
 CYGPATH_W = @CYGPATH_W@
-DEFS = -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS 
-D__STDC_LIMIT_MACROS 
+DEFS = -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS 
-D__STDC_LIMIT_MACROS -DPIC
 DEPDIR = @DEPDIR@
 DSYMUTIL = @DSYMUTIL@
 DUMPBIN = @DUMPBIN@


Jakub


RE: New rematerialization sub-pass in LRA

2014-10-14 Thread Wilco Dijkstra
 Vladimir Makarov wrote:
  On SPECINT2k performance is ~0.5% worse (5.5% regression on perlbmk), and
  SPECFP is ~0.2% faster.
 Thanks for reporting this.  It is important for me as I have no aarch64
 machine for benchmarking.
 
 Perlbmk performance degradation is too big and I'll definitely look at
 this problem.

Looking at the diffs in regexec.c which has the hot function regmatch(), 
nothing obvious stands out that could cause a serious regression.
I did notice this around line 2300:

.L802:
ldr x1, [x23, 48]
adrp x5, PL_savestack_ix
ldr w0, [x23]
str x5, [sp, 104]
str x1, [x24, #:lo12:PL_regcc]
ldr w27, [x1, 4]
bl  regcppush
-   ldr x5, [sp, 104]
str w0, [sp, 112]
ldr x0, [x23, 32]
+   adrp x5, PL_savestack_ix
ldr w28, [x5, #:lo12:PL_savestack_ix]
+   str x5, [sp, 104]
bl  regmatch
ldr x5, [sp, 104]
mov w19, w0
ldr w1, [sp, 112]
ldr w0, [x5, #:lo12:PL_savestack_ix]

So it rematerializes one instance, but fails to rematerialize the second use.
An extra store is inserted, and the first adrp and store are not removed as
dead.

Wilco




[PATCH] Fix optimize_range_tests_diff

2014-10-14 Thread Jakub Jelinek
Hi!

When hacking on range reassoc opt, I've noticed we can emit
code with undefined behavior even when there wasn't one originally,
in particular for:
   (X - 43U) <= 3U || (X - 75U) <= 3U
   and this loop can transform that into
   ((X - 43U) & ~(75U - 43U)) <= 3U.  */
we actually don't transform it to what the comment says, but
   ((X - 43) & ~(75U - 43U)) <= 3U
i.e. the initial subtraction can be performed in a signed type.
If X here is e.g. INT_MIN or INT_MIN + 42, the subtraction
at gimple level would be UB (not caught by -fsanitize=undefined,
because that is handled much earlier).
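
The intended transformation can be checked in isolation with a small
sketch.  The function names below are hypothetical and this is not the
reassoc code; it only models the constants from the comment.  The key point
is that the value is converted to unsigned before the subtraction, so the
wraparound is well defined even for INT_MIN.

```cpp
#include <cassert>
#include <climits>

// Original form: two range tests, each relying on unsigned wraparound.
static bool
in_either_range (int x)
{
  unsigned ux = static_cast<unsigned> (x);
  return (ux - 43U) <= 3U || (ux - 75U) <= 3U;
}

// Merged form: clear the single bit in which the two lower bounds differ
// (75 - 43 == 32), then do one comparison.  The subtraction is done on
// the unsigned value, so it cannot overflow.
static bool
in_either_range_merged (int x)
{
  unsigned ux = static_cast<unsigned> (x);
  return ((ux - 43U) & ~(75U - 43U)) <= 3U;
}
```

Both functions accept exactly the values 43..46 and 75..78; doing the
subtraction as a signed int instead would overflow for inputs such as
INT_MIN, which is the case the patch guards against.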

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-10-14  Jakub Jelinek  ja...@redhat.com

* tree-ssa-reassoc.c (optimize_range_tests_diff): Perform
MINUS_EXPR in unsigned type to avoid undefined behavior.

--- gcc/tree-ssa-reassoc.c.jj   2014-10-13 17:54:33.0 +0200
+++ gcc/tree-ssa-reassoc.c  2014-10-13 17:58:07.312705218 +0200
@@ -2250,8 +2250,13 @@ optimize_range_tests_diff (enum tree_cod
   if (tree_log2 (tem1) < 0)
 return false;
 
+  type = unsigned_type_for (type);
+  tem1 = fold_convert (type, tem1);
+  tem2 = fold_convert (type, tem2);
+  lowi = fold_convert (type, lowi);
   mask = fold_build1 (BIT_NOT_EXPR, type, tem1);
-  tem1 = fold_binary (MINUS_EXPR, type, rangei->exp, lowi);
+  tem1 = fold_binary (MINUS_EXPR, type,
+ fold_convert (type, rangei->exp), lowi);
   tem1 = fold_build2 (BIT_AND_EXPR, type, tem1, mask);
   lowj = build_int_cst (type, 0);
   if (update_range_test (rangei, rangej, 1, opcode, ops, tem1,

Jakub


Re: [PATCH 3/5] IPA ICF pass

2014-10-14 Thread Jan Hubicka
 diff --git a/gcc/cgraph.h b/gcc/cgraph.h
 index fb41b01..2de98b4 100644
 --- a/gcc/cgraph.h
 +++ b/gcc/cgraph.h
 @@ -172,6 +172,12 @@ public:
/* Dump referring in list to FILE.  */
void dump_referring (FILE *);
  
 +  /* Get number of references for this node.  */
 +  inline unsigned get_references_count (void)
 +  {
 +return ref_list.references ? ref_list.references->length () : 0;
 +  }

Probably better called num_references() (like we have num_edge in basic-block.h)
 @@ -8068,6 +8069,19 @@ it may significantly increase code size
  (see @option{--param ipcp-unit-growth=@var{value}}).
  This flag is enabled by default at @option{-O3}.
  
 +@item -fipa-icf
 +@opindex fipa-icf
 +Perform Identical Code Folding for functions and read-only variables.
 +The optimization reduces code size and may disturb unwind stacks by replacing
 +a function by equivalent one with a different name. The optimization works
 +more effectively with link time optimization enabled.
 +
 +Nevertheless the behavior is similar to Gold Linker ICF optimization, GCC ICF
 +works on different levels and thus the optimizations are not same - there are
 +equivalences that are found only by GCC and equivalences found only by Gold.
 +
 +This flag is enabled by default at @option{-O2}.
... and -Os?
 +case ARRAY_REF:
 +case ARRAY_RANGE_REF:
 +  {
 + x1 = TREE_OPERAND (t1, 0);
 + x2 = TREE_OPERAND (t2, 0);
 + y1 = TREE_OPERAND (t1, 1);
 + y2 = TREE_OPERAND (t2, 1);
 +
 + if (!compare_operand (array_ref_low_bound (t1),
 +   array_ref_low_bound (t2)))
 +   return return_false_with_msg ();
 + if (!compare_operand (array_ref_element_size (t1),
 +   array_ref_element_size (t2)))
 +   return return_false_with_msg ();
 + if (!compare_operand (x1, x2))
 +   return return_false_with_msg ();
 + return compare_operand (y1, y2);
 +  }

No need for {...} if there are no local vars.
 +bool
 +func_checker::compare_function_decl (tree t1, tree t2)
 +{
 +  bool ret = false;
 +
 +  if (t1 == t2)
 +return true;
 +
 +  symtab_node *n1 = symtab_node::get (t1);
 +  symtab_node *n2 = symtab_node::get (t2);
 +
 +  if (m_ignored_source_nodes != NULL && m_ignored_target_nodes != NULL)
 +{
 +  ret = m_ignored_source_nodes->contains (n1)
 +  && m_ignored_target_nodes->contains (n2);
 +
 +  if (ret)
 + return true;
 +}
 +
 +  /* If function decl is WEAKREF, we compare targets.  */
 +  cgraph_node *f1 = cgraph_node::get (t1);
 +  cgraph_node *f2 = cgraph_node::get (t2);
 +
 +  if(f1 && f2 && f1->weakref && f2->weakref)
 +ret = f1->alias_target == f2->alias_target;
 +
 +  return ret;

Comparing aliases is a bit more complicated than just handling weakrefs. I have
a patch for symtab_node::equivalent_address_p somewhere in the queue.  Let's just drop
the fancy stuff for the moment and compare f1 and f2 for equivalence.
 +  ret = compare_decl (t1, t2);

Why are functions not compared with compare_decl while variables are?
 +
 +  return return_with_debug (ret);
 +}
 +
 +void
 +func_checker::parse_labels (sem_bb *bb)
 +{
 +  for (gimple_stmt_iterator gsi = gsi_start_bb (bb->bb); !gsi_end_p (gsi);
 +   gsi_next (&gsi))
 +{
 +  gimple stmt = gsi_stmt (gsi);
 +
 +  if (gimple_code (stmt) == GIMPLE_LABEL)
 + {
 +   tree t = gimple_label_label (stmt);
 +   gcc_assert (TREE_CODE (t) == LABEL_DECL);
 +
 +   m_label_bb_map.put (t, bb->bb->index);
 + }
 +}
 +}
 +
 +/* Basic block equivalence comparison function that returns true if
 +   basic blocks BB1 and BB2 (from functions FUNC1 and FUNC2) correspond.
 +
 +   In general, a collection of equivalence dictionaries is built for types
 +   like SSA names, declarations (VAR_DECL, PARM_DECL, ..). This infrastructure
 +   is utilized by every statement-by-stament comparison function.  */
 +
 +bool
 +func_checker::compare_bb (sem_bb *bb1, sem_bb *bb2)
 +{
 +  unsigned i;
 +  gimple_stmt_iterator gsi1, gsi2;
 +  gimple s1, s2;
 +
 +  if (bb1->nondbg_stmt_count != bb2->nondbg_stmt_count
 +  || bb1->edge_count != bb2->edge_count)
 +return return_false ();
 +
 +  gsi1 = gsi_start_bb (bb1->bb);
 +  gsi2 = gsi_start_bb (bb2->bb);
 +
 +  for (i = 0; i < bb1->nondbg_stmt_count; i++)
 +{
 +  if (is_gimple_debug (gsi_stmt (gsi1)))
 + gsi_next_nondebug (&gsi1);
 +
 +  if (is_gimple_debug (gsi_stmt (gsi2)))
 + gsi_next_nondebug (&gsi2);
 +
 +  s1 = gsi_stmt (gsi1);
 +  s2 = gsi_stmt (gsi2);
 +
 +  int eh1 = lookup_stmt_eh_lp_fn
 + (DECL_STRUCT_FUNCTION (m_source_func_decl), s1);
 +  int eh2 = lookup_stmt_eh_lp_fn
 + (DECL_STRUCT_FUNCTION (m_target_func_decl), s2);
 +
 +  if (eh1 != eh2)
 + return return_false_with_msg ("EH regions are different");
 +
 +  if (gimple_code (s1) != gimple_code (s2))
 + return return_false_with_msg ("gimple codes are different");
 +
 +  switch (gimple_code (s1))
 + {
 + case GIMPLE_CALL:
 + 

[gomp] [3/3] OpenACC 2.0 support for libgomp - documentation

2014-10-14 Thread Julian Brown
This is a version of the patch:

https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02024.html

against gomp4 branch instead of mainline.

OK to apply?

Thanks,

Julian

-xx-xx  Thomas Schwinge  tho...@codesourcery.com
James Norris  jnor...@codesourcery.com

libgomp/
* libgomp.texi: Outline documentation for OpenACC.
From c58006a7ade2a9556bd73bac9ef45b3bbd62ca37 Mon Sep 17 00:00:00 2001
From: Julian Brown jul...@codesourcery.com
Date: Wed, 17 Sep 2014 10:26:56 -0700
Subject: [PATCH 2/3] OpenACC documentation

---
 libgomp/libgomp.texi |  661 --
 1 file changed, 636 insertions(+), 25 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 254be57..9530a2b 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -31,10 +31,12 @@ texts being (a) (see below), and with the Back-Cover Texts being (b)
 @ifinfo
 @dircategory GNU Libraries
 @direntry
-* libgomp: (libgomp).GNU OpenMP runtime library
+* libgomp: (libgomp).GNU OpenACC and OpenMP runtime library
 @end direntry
 
-This manual documents the GNU implementation of the OpenMP API for 
+This manual documents the GNU implementation of the OpenACC API for 
+offloading of code to accelerator devices in C/C++ and Fortran and
+the GNU implementation of the OpenMP API for 
 multi-platform shared-memory parallel programming in C/C++ and Fortran.
 
 Published by the Free Software Foundation
@@ -48,7 +50,7 @@ Boston, MA 02110-1301 USA
 @setchapternewpage odd
 
 @titlepage
-@title The GNU OpenMP Implementation
+@title The GNU OpenACC and OpenMP Implementation
 @page
 @vskip 0pt plus 1filll
 @comment For the @value{version-GCC} Version*
@@ -69,7 +71,10 @@ Boston, MA 02110-1301, USA@*
 @top Introduction
 @cindex Introduction
 
-This manual documents the usage of libgomp, the GNU implementation of the 
+This manual documents the usage of libgomp, the GNU implementation of the
+@uref{http://www.openacc.org/, OpenACC} Application Programming Interface (API)
+for offloading of code to accelerator devices in C/C++ and Fortran, and
+the GNU implementation of the 
 @uref{http://www.openmp.org, OpenMP} Application Programming Interface (API)
 for multi-platform shared-memory parallel programming in C/C++ and Fortran.
 
@@ -81,23 +86,619 @@ for multi-platform shared-memory parallel programming in C/C++ and Fortran.
 @comment  better formatting.
 @comment
 @menu
-* Enabling OpenMP::How to enable OpenMP for your applications.
-* Runtime Library Routines::   The OpenMP runtime application programming 
-   interface.
-* Environment Variables::  Influencing runtime behavior with environment 
-   variables.
-* The libgomp ABI::Notes on the external ABI presented by libgomp.
-* Reporting Bugs:: How to report bugs in GNU OpenMP.
-* Copying::GNU general public license says
-   how you can copy and share libgomp.
-* GNU Free Documentation License::
-   How you can copy and share this manual.
-* Funding::How to help assure continued work for free 
-   software.
-* Library Index::  Index of this documentation.
+* Enabling OpenACC:: How to enable OpenACC for your
+ applications.
+* OpenACC Runtime Library Routines:: The OpenACC runtime application
+  programming interface.
+* OpenACC Environment Variables::Influencing OpenACC runtime behavior with
+ environment variables.
+* OpenACC Library Interoperability:: OpenACC library interoperability with the
+ NVIDIA CUBLAS library.
+* Enabling OpenMP::  How to enable OpenMP for your
+ applications.
+* OpenMP Runtime Library Routines: Runtime Library Routines.
+ The OpenMP runtime application programming
+ interface.
+* OpenMP Environment Variables: Environment Variables.
+ Influencing OpenMP runtime behavior with
+ environment variables.
+* The libgomp ABI::  Notes on the external libgomp ABI.
+* Reporting Bugs::   How to report bugs.
+* Copying::  GNU general public license says how you
+ can copy and share libgomp.
+* GNU Free Documentation License::   How you can copy and share this manual.
+* Funding::  How to help assure continued work for free
+ software.
+* Library Index::Index of this documentation.
 @end menu
 
 
+
+@c 

Re: [PATCH] Add D demangling support to libiberty

2014-10-14 Thread Iain Buclaw
On 14 October 2014 15:28, Ian Lance Taylor i...@google.com wrote:
 On Tue, Oct 14, 2014 at 7:12 AM, Joel Brobecker brobec...@adacore.com wrote:

 libiberty/ChangeLog
 2014-08-05  Iain Buclaw  ibuc...@gdcproject.org

 * Makefile.in (CFILES): Add d-demangle.c.
 (REQUIRED_OFILES): Add d-demangle.o.
 * cplus-dem.c (libiberty_demanglers): Add dlang_demangling case.
 (cplus_demangle): Likewise.
 * d-demangle.c: New file.
 * testsuite/Makefile.in (really-check): Add check-d-demangle.
 * testsuite/d-demangle-expected: New file.

 As hinted on gdb-patches, this patch causes a GDB build failure
 on Solaris 2.9, because it uses strtold which is not available.
 According to gnulib's documentation, it should also break on
 the following systems:

 NetBSD 3.0, OpenBSD 3.8, Minix 3.1.8, IRIX 6.5, OSF/1 4.0,
 Solaris 9, Cygwin, MSVC 9, Interix 3.5, BeOS.

 This patch attempts to fix the issue by adding a configure check
 for strtold and adjusts the code to use strtod if strtold does not
 exist.

 Does this look OK to you? If yes, can one of the GCC maintainers
 please review?

 It doesn't make sense to me to use strtod if strtold is required.  And
 if strtold is not required, then it seems to me that we should always
 use strtod.  It seems to me that the right options are either 1) use
 strtod unconditionally; 2) add strtold to libiberty

 Since option 1 is simpler, what bad things would happen if we use
 strtod unconditionally?

 Ian

I've just seen this, so I'll repeat what I've said in gdb patches too.

The call to strtold is only needed to decode templates which have a
floating point value encoded inside. This value may or may not have a
greater than double precision.

Replacing long double with double will be fine with me.  I'll accept
that I didn't consider legacy in hindsight, and in reality it would be
rather rare to stumble upon the need for strtold.
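
For reference, a rough sketch of what a strtod-based conversion looks like.
This is not the libiberty code: format_hex_real is a made-up helper that only
mirrors the two steps under discussion (parse a hexadecimal floating-point
string, then print it with %#g), and it assumes a C99-conforming strtod that
accepts hex floats.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: parse TEXT as a (hexadecimal) floating-point
// literal with strtod and format the result into OUT using "%#g".
// Returns the snprintf result, or -1 if TEXT was not consumed entirely.
int format_hex_real(const char *text, char *out, std::size_t out_size)
{
  assert(text != nullptr && out != nullptr);

  char *endptr = nullptr;
  double value = std::strtod(text, &endptr);

  // Reject empty input and trailing garbage.
  if (endptr == text || endptr != text + std::strlen(text))
    return -1;

  return std::snprintf(out, out_size, "%#g", value);
}
```

With a double, "0x1.8p+1" comes out as "3.00000"; only values that need more
than double precision would print differently from the long double version.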

Regards
Iain


RE: New rematerialization sub-pass in LRA

2014-10-14 Thread Wilco Dijkstra
 Wilco Dijkstra wrote:
  Vladimir Makarov wrote:
   On SPECINT2k performance is ~0.5% worse (5.5% regression on perlbmk), and
   SPECFP is ~0.2% faster.
  Thanks for reporting this.  It is important for me as I have no aarch64
  machine for benchmarking.
 
  Perlbmk performance degradation is too big and I'll definitely look at
  this problem.
 
 Looking at the diffs in regexec.c which has the hot function regmatch(),
 nothing obvious stands out that could cause a serious regression.
 I did notice this around line 2300:
 
 .L802:
 ldr x1, [x23, 48]
  adrp x5, PL_savestack_ix
 ldr w0, [x23]
 str x5, [sp, 104]
 str x1, [x24, #:lo12:PL_regcc]
 ldr w27, [x1, 4]
 bl  regcppush
 -   ldr x5, [sp, 104]
 str w0, [sp, 112]
 ldr x0, [x23, 32]
 +   adrp x5, PL_savestack_ix
 ldr w28, [x5, #:lo12:PL_savestack_ix]
 +   str x5, [sp, 104]
 bl  regmatch
 ldr x5, [sp, 104]
 mov w19, w0
 ldr w1, [sp, 112]
 ldr w0, [x5, #:lo12:PL_savestack_ix]
 
 So it rematerializes one instance, but fails to rematerialize the second use.
 An extra store is inserted, and the first adrp and store are not removed as
 dead.

A simple example that reproduces the issue (-mcpu=cortex-a57 -O2
-fomit-frame-pointer -ffixed-x19 -ffixed-x20 -ffixed-x21 -ffixed-x22
-ffixed-x23 -ffixed-x24 -ffixed-x25 -ffixed-x26 -ffixed-x27 -ffixed-x28
-ffixed-x29 -ffixed-x30). It looks like an odd interaction between
-fcaller-saves and rematerialization.

void g(void);
int x;
int f3b(int y)
{
   y += x;
   g();
   y += x;
   g();
   y += x;
   return y;
}

f3b:
adrp x2, x   -- DEAD
sub sp, sp, #16
ldr w1, [x2, #:lo12:x]
str x2, [sp]  -- DEAD
add w0, w0, w1
str w0, [sp]  -- reuse of stackslot!!!
bl  g
adrp x2, x
ldr w0, [sp]
ldr w1, [x2, #:lo12:x]
str x2, [sp, 8]
add w0, w0, w1
str w0, [sp]  -- REMOVE
bl  g
ldr x2, [sp, 8] -- rematerialize adrp
ldr w0, [sp]
add sp, sp, 16
ldr w1, [x2, #:lo12:x]
add w0, w0, w1
ret

Wilco




[PATCH] PR lto/61048 Define missed builtins on demand

2014-10-14 Thread Ilya Palachev

Hi all,

Attached patch fixes PR lto/61048 - 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61048


The reason for the failure was that the builtin information structure was not
initialized properly at the link stage. The failed assertion was caused
by a missing builtin declaration (BUILT_IN_ASAN_AFTER_DYNAMIC_INIT),
which was requested from this structure.
Normally this information is initialized in function
lto_define_builtins, which is called from the LTO lang hook function
lto_init. But in the given testcase the initialization did not happen,
since the declaration is initialized only if the following condition holds:


(flag_sanitize & (SANITIZE_ADDRESS | SANITIZE_THREAD \
                  | SANITIZE_UNDEFINED | SANITIZE_NONDEFAULT))


But if the user compiles (without linking) a file in LTO mode with the
-fsanitize=address option, and then tries to link the executable from the
*.o file without specifying -fsanitize=address, the variable
flag_sanitize will be 0, the sanitizer builtins info will not be
initialized, and an ICE will happen.


Commands to reproduce the problem:
g++ test.cpp -c -o test.o -fsanitize=address -flto
g++ test.o -o test -Wl,-flto  # At this stage flag_sanitize is 0,
                              # and sanitizer builtins are not defined.


The simplest way to fix this seems to be to add initialization of the sanitizer
builtins using function initialize_sanitizer_builtins, and this helps
to avoid the ICE:


diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index bc53632..f5ca849 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -55,6 +55,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-inline.h"
 #include "params.h"
 #include "ipa-utils.h"
+#include "asan.h"


 /* Number of parallel tasks to run, -1 if we want to use GNU Make jobserver.  */
@@ -1856,6 +1857,9 @@ lto_read_decls (struct lto_file_decl_data *decl_data, const void *data,
   data_in = lto_data_in_create (decl_data, (const char *) data + string_offset,
                                 header->string_size, resolutions);

+  /* Initialize sanitizer builtins if necessary.  */
+  initialize_sanitizer_builtins();
+
   /* We do not uniquify the pre-loaded cache entries, those are middle-end
  internal types that should not be merged.  */


But this approach means that asan-specific functions must be called from 
lto.


The suggested patch proposes another approach: add definitions of
builtins during the final stage, when they are requested from the
builtin_info structure.
I have tried to do it by adding an LTO-specific lang hook, so as to reuse
the existing code for builtins initialization (currently builtins are
initialized in the lto_init hook).
In the attached patch such a hook is added, and it is used in
streamer_get_builtin_tree.
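
To illustrate the idea only, here is a small standalone sketch of "define on
demand". Every name in it (builtin_code, decl, builtin_table, define_one,
get_builtin) is a hypothetical stand-in and not GCC's actual API; the only
point is that a lookup which finds an empty slot calls an optional hook
before giving up, and that a null hook keeps the old behaviour.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for builtin codes and declarations.
enum builtin_code { BUILTIN_A, BUILTIN_B, BUILTIN_COUNT };
struct decl { builtin_code code; };

// Table of known declarations; a null slot means "not defined yet".
decl builtin_storage[BUILTIN_COUNT];
decl *builtin_table[BUILTIN_COUNT];

// Optional hook type; a front end that has nothing to offer passes null
// (comparable to a lang hook whose default value is 0).
typedef void (*define_on_demand_fn) (builtin_code);

// Example hook: define exactly the requested builtin.
void define_one(builtin_code code)
{
  builtin_storage[code].code = code;
  builtin_table[code] = &builtin_storage[code];
}

// Lookup that falls back to the hook when the declaration is missing.
decl *get_builtin(builtin_code code, define_on_demand_fn hook)
{
  assert(code < BUILTIN_COUNT);
  if (builtin_table[code] == nullptr && hook != nullptr)
    hook(code);
  return builtin_table[code];  // may still be null if there was no hook
}
```

The design choice here is that the caller does not need to know which
command-line option would normally have triggered the definition.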


It seems that the discussed issue can happen not only for flag 
-fsanitize, but also for all options that cause the definition of 
builtins, so the proposed patch is independent from sanitizers.


The patch was bootstrapped and regtested on x86_64-unknown-linux-gnu.

Ok for trunk?

Best regards,
Ilya Palachev



From 926a8b84a52f3120c3f71cd28e0d782c719b7791 Mon Sep 17 00:00:00 2001
From: Ilya Palachev i.palac...@samsung.com
Date: Tue, 14 Oct 2014 19:22:32 +0400
Subject: [PATCH] Define missed builtins on demand

gcc/

2014-10-14  Ilya Palachev  i.palac...@samsung.com

	* langhooks.h (define_builtin_on_demand): New function.
	* langhooks-def.h (LANG_HOOKS_DEFINE_BUILTIN_ON_DEMAND): New
	macro.
	* lto/lto-lang.c (lto_define_builtin_on_demand): New function.
	* tree-streamer-in.c (streamer_get_builtin_tree): Use
	define_builtin_on_demand in case when the declaration of builtin
	is missing.

gcc/testsuite/

2014-10-14  Ilya Palachev  i.palac...@samsung.com

	* g++.dg/lto/pr61048_0.C: New test from bugzilla.
---
 gcc/langhooks-def.h  |  4 +++-
 gcc/langhooks.h  |  3 +++
 gcc/lto/lto-lang.c   | 16 
 gcc/testsuite/g++.dg/lto/pr61048_0.C | 10 ++
 gcc/tree-streamer-in.c   |  4 
 5 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/lto/pr61048_0.C

diff --git a/gcc/langhooks-def.h b/gcc/langhooks-def.h
index e5ae3e3..2ddccbc 100644
--- a/gcc/langhooks-def.h
+++ b/gcc/langhooks-def.h
@@ -254,11 +254,13 @@ extern void lhd_end_section (void);
 #define LANG_HOOKS_BEGIN_SECTION lhd_begin_section
 #define LANG_HOOKS_APPEND_DATA lhd_append_data
 #define LANG_HOOKS_END_SECTION lhd_end_section
+#define LANG_HOOKS_DEFINE_BUILTIN_ON_DEMAND 0
 
 #define LANG_HOOKS_LTO { \
   LANG_HOOKS_BEGIN_SECTION, \
   LANG_HOOKS_APPEND_DATA, \
-  LANG_HOOKS_END_SECTION \
+  LANG_HOOKS_END_SECTION, \
+  LANG_HOOKS_DEFINE_BUILTIN_ON_DEMAND \
 }
 
 /* The whole thing.  The structure is defined in langhooks.h.  */
diff --git a/gcc/langhooks.h b/gcc/langhooks.h
index 32e76f9..a0cbe5f 100644
--- a/gcc/langhooks.h
+++ b/gcc/langhooks.h
@@ -255,6 +255,9 @@ struct lang_hooks_for_lto
 
   /* End the previously begun LTO section.  

Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-14 Thread Jeff Law

On 10/14/14 07:00, Jakub Jelinek wrote:


For the first two, I think (and said it before already) that the current
model of emitting set_got from a target hook during RA can't work, as there
can be calls in the prologue, and the prologue is inserted before the
set_got in that case.  I really think the RA should in that case just tell
the backend whether and in which register it wants to have the PIC register
loaded upon start of the function, and it should be emit prologue pass
that should arrange for that.
That works for me -- I've been encouraging Intel to push emitting the 
PIC setup further and further back in the RTL pipeline.  Their early 
patches had it very early in the RTL pipeline and naturally there was 
fallout/bleedout in various places in the optimizers.


I don't see much value in emitting the PIC setup prior to allocation, 
all I see is problems.





As for the code quality, either some RA improvements are needed, or
postreload must be able to fix it up, or hardreg propagation (though,
cprop_hardreg is forward propagation rather than backwards, right?).
Better before prologue is emitted though, because that will save/restore
the badly chosen hard reg too.
RA improvements are the way to go -- however, my understanding is that 
overall the code is better now than it was before Intel's changes, so I 
don't consider the performance side as a blocker for this code.


The biggest performance issue identified so far is rematerialization. 
The initial patch Intel sent to me was totally unacceptable as it just 
hacked off optimizers rather than digging into the guts of why IRA/LRA 
was unable to sanely rematerialize the PIC register value.


jeff


Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-14 Thread H.J. Lu
On Tue, Oct 14, 2014 at 9:43 AM, Jeff Law l...@redhat.com wrote:

 RA improvements are the way to go -- however, my understanding is that
 overall the code is better now than it was before Intel's changes, so I
 don't consider the performance side as a blocker for this code.


The new approach improves PIC code quality in functions where there is
no frequent GOT access and an extra register helps.

For ld.so and libc.so from a glibc build, we use 2 registers to access the GOT
instead of one register, which may lead to lower performance in shared
libraries.


-- 
H.J.


Re: [PATCH 2/3] libstdc++: Add put_time support.

2014-10-14 Thread Jonathan Wakely

On 13/10/14 16:28 +0100, Jonathan Wakely wrote:

On 13/10/14 13:08 +0100, Jonathan Wakely wrote:

On 15/04/14 23:20 +0200, Rüdiger Sonderfeld wrote:

Described in [ext.manip].

* libstdc++-v3/include/std/iomanip (_Put_time): New struct.
(put_time): New manipulator.
(operator<<): New overloaded function.
* libstdc++-v3/testsuite/27_io/manipulators/extended/put_time/char/1.cc:
* libstdc++-v3/testsuite/27_io/manipulators/extended/put_time/char/2.cc:
* libstdc++-v3/testsuite/27_io/manipulators/extended/put_time/wchar_t/1.cc:
* libstdc++-v3/testsuite/27_io/manipulators/extended/put_time/wchar_t/2.cc:
New file.


The 27_io/manipulators/extended/put_time/char/2.cc and
27_io/manipulators/extended/put_time/wchar_t/2.cc tests fail for me.

i2.exe: 
/home/jwakely/src/gcc/gcc/libstdc++-v3/testsuite/27_io/manipulators/extended/put_time/char/2.cc:41:
 void test01(): Assertion `oss.str() == "Son 1971"' failed.
FAIL: 27_io/manipulators/extended/put_time/char/2.cc execution test


With my de_DE.utf8 locale the output is "So 1971" not "Son 1971".

$ LANG=de_DE.utf8 date +%a
Mo


So let's just test the full name and not worry about how it's
abbreviated.
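
As an illustration of testing the full name rather than the abbreviation,
here is a small sketch. It is not the committed test: format_weekday_year is
a made-up helper, and it uses the classic "C" locale so that it does not
depend on which locales are installed.

```cpp
#include <cassert>
#include <ctime>
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: format the weekday (full name, %A) and the year
// of TMB through std::put_time, using locale LOC for the stream.
std::string format_weekday_year(const std::tm &tmb, const std::locale &loc)
{
  assert(tmb.tm_wday >= 0 && tmb.tm_wday <= 6);

  std::ostringstream oss;
  oss.imbue(loc);
  oss << std::put_time(&tmb, "%A %Y");
  return oss.str();
}
```

With tm_wday = 0 and tm_year = 71 in the classic locale this yields
"Sunday 1971". The abbreviated form (%a) is the part that differs between
locale data versions, which is why the full name is the safer thing to test.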

Tested x86_64-linux, committed to trunk.
commit 4ae8f20e4924754d7fb7809730f5491dc6a74944
Author: Jonathan Wakely jwak...@redhat.com
Date:   Tue Oct 14 17:48:44 2014 +0100

2014-10-14  Rüdiger Sonderfeld  ruedi...@c-plusplus.de

	PR libstdc++/54354
	* include/std/iomanip (_Put_time): New struct.
	(put_time): New manipulator.
	(operator<<): New overloaded function.
	* testsuite/27_io/manipulators/extended/put_time/char/1.cc: New.
	* testsuite/27_io/manipulators/extended/put_time/char/2.cc: New.
	* testsuite/27_io/manipulators/extended/put_time/wchar_t/1.cc: New.
	* testsuite/27_io/manipulators/extended/put_time/wchar_t/2.cc: New.

diff --git a/libstdc++-v3/include/std/iomanip b/libstdc++-v3/include/std/iomanip
index 9625d43..fce74c9 100644
--- a/libstdc++-v3/include/std/iomanip
+++ b/libstdc++-v3/include/std/iomanip
@@ -337,6 +337,61 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return __os; 
 }
 
+  template<typename _CharT>
+    struct _Put_time
+    {
+      const std::tm* _M_tmb;
+      const _CharT* _M_fmt;
+    };
+
+  /**
+   *  @brief  Extended manipulator for formatting time.
+   *
+   *  This manipulator uses time_put::put to format time.
+   *  [ext.manip]
+   *
+   *  @param __tmb  struct tm time data to format.
+   *  @param __fmt  format string.
+   */
+  template<typename _CharT>
+    inline _Put_time<_CharT>
+    put_time(const std::tm* __tmb, const _CharT* __fmt)
+    { return { __tmb, __fmt }; }
+
+  template<typename _CharT, typename _Traits>
+    basic_ostream<_CharT, _Traits>&
+    operator<<(basic_ostream<_CharT, _Traits>& __os, _Put_time<_CharT> __f)
+    {
+      typename basic_ostream<_CharT, _Traits>::sentry __cerb(__os);
+      if (__cerb)
+        {
+          ios_base::iostate __err = ios_base::goodbit;
+          __try
+            {
+              typedef ostreambuf_iterator<_CharT, _Traits>   _Iter;
+              typedef time_put<_CharT, _Iter>                _TimePut;
+
+              const _CharT* const __fmt_end = __f._M_fmt +
+                _Traits::length(__f._M_fmt);
+
+              const _TimePut& __mp = use_facet<_TimePut>(__os.getloc());
+              if (__mp.put(_Iter(__os.rdbuf()), __os, __os.fill(),
+                           __f._M_tmb, __f._M_fmt, __fmt_end).failed())
+                __err |= ios_base::badbit;
+            }
+          __catch(__cxxabiv1::__forced_unwind&)
+            {
+              __os._M_setstate(ios_base::badbit);
+              __throw_exception_again;
+            }
+          __catch(...)
+            { __os._M_setstate(ios_base::badbit); }
+          if (__err)
+            __os.setstate(__err);
+        }
+      return __os;
+    }
+
 #if __cplusplus > 201103L
 
 #define __cpp_lib_quoted_string_io 201304
diff --git a/libstdc++-v3/testsuite/27_io/manipulators/extended/put_time/char/1.cc b/libstdc++-v3/testsuite/27_io/manipulators/extended/put_time/char/1.cc
new file mode 100644
index 000..76e64ea
--- /dev/null
+++ b/libstdc++-v3/testsuite/27_io/manipulators/extended/put_time/char/1.cc
@@ -0,0 +1,44 @@
+// { dg-options "-std=gnu++11" }
+
+// 2014-04-14 Rüdiger Sonderfeld  ruedi...@c-plusplus.de
+
+// Copyright (C) 2014 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; 

Re: [PATCH] Add D demangling support to libiberty

2014-10-14 Thread Joel Brobecker
 I've just seen this, so I'll repeat what I've said in gdb patches too.
 
 The call to strtold is only needed to decode templates which have a
 floating point value encoded inside. This value may or may not have a
 greater than double precision.
 
 Replacing long double with double will be fine with me.  I'll accept
 that I didn't consider legacy in hindsight, and in reality it would be
 rather rare to stumble upon the need for strtold.

Attached is a patch that switches it to strtod. Do you have any
test that could quickly verify it? That seems to be the best
approach, at least short-term. Later on, if we do want to use
higher precision, we can indeed add strtold in libiberty.

libiberty/ChangeLog:

* d-demangle.c: Replace strtold with strtod in global comment.
(strtold): Remove declaration.
(strtod): New declaration.
(dlang_parse_real): Declare value as double instead of long
double.  Replace call to strtold by call to strtod.
Update format in call to snprintf.

I verified that the patch allows GDB to build on both sparc-solaris
and x86_64-linux.

Thanks,
-- 
Joel
From 99f9794c6d2f4dabed0bbcf2cf362b1eb25ee2a7 Mon Sep 17 00:00:00 2001
From: Joel Brobecker brobec...@adacore.com
Date: Tue, 14 Oct 2014 12:47:43 -0400
Subject: [PATCH] Use strtod instead of strtold in libiberty/d-demangle.c

strtold is currently used to decode templates which have a floating-point
value encoded inside; but this routine is not available on some systems,
such as Solaris 2.9 for instance.

This patch fixes the issue by replacing the use of strtold with strtod.
It reduces the precision a bit, but it should still remain acceptable
in most cases.

libiberty/ChangeLog:

* d-demangle.c: Replace strtold with strtod in global comment.
(strtold): Remove declaration.
(strtod): New declaration.
(dlang_parse_real): Declare value as double instead of long
double.  Replace call to strtold by call to strtod.
Update format in call to snprintf.
---
 libiberty/d-demangle.c |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c
index d31bf94..bb481c0 100644
--- a/libiberty/d-demangle.c
+++ b/libiberty/d-demangle.c
@@ -28,7 +28,7 @@ If not, see http://www.gnu.org/licenses/.  */
 
 /* This file exports one function; dlang_demangle.
 
-   This file imports strtol and strtold for decoding mangled literals.  */
+   This file imports strtol and strtod for decoding mangled literals.  */
 
 #ifdef HAVE_CONFIG_H
 #include "config.h"
@@ -44,7 +44,7 @@ If not, see http://www.gnu.org/licenses/.  */
 #include <stdlib.h>
 #else
 extern long strtol (const char *nptr, char **endptr, int base);
-extern long double strtold (const char *nptr, char **endptr);
+extern double strtod (const char *nptr, char **endptr);
 #endif
 
 #include "demangle.h"
@@ -810,7 +810,7 @@ dlang_parse_real (string *decl, const char *mangled)
 {
   char buffer[64];
   int len = 0;
-  long double value;
+  double value;
   char *endptr;
 
   /* Handle NAN and +-INF.  */
@@ -877,12 +877,12 @@ dlang_parse_real (string *decl, const char *mangled)
 
   /* Convert buffer from hexadecimal to floating-point.  */
   buffer[len] = '\0';
-  value = strtold (buffer, &endptr);
+  value = strtod (buffer, &endptr);
 
   if (endptr == NULL || endptr != (buffer + len))
     return NULL;
 
-  len = snprintf (buffer, sizeof(buffer), "%#Lg", value);
+  len = snprintf (buffer, sizeof(buffer), "%#g", value);
   string_appendn (decl, buffer, len);
   return mangled;
 }
-- 
1.7.9.5



[patch] Make std::align tests depend on stdint.h

2014-10-14 Thread Jonathan Wakely

Tested x86_64-linux, committed to trunk.



Re: [PATCH] Fix optimize_range_tests_diff

2014-10-14 Thread Richard Biener
On October 14, 2014 6:02:19 PM CEST, Jakub Jelinek ja...@redhat.com wrote:
Hi!

When hacking on range reassoc opt, I've noticed we can emit
code with undefined behavior even when there wasn't one originally,
in particular for:
   (X - 43U) <= 3U || (X - 75U) <= 3U
   and this loop can transform that into
   ((X - 43U) & ~(75U - 43U)) <= 3U.  */
we actually don't transform it to what the comment says, but
   ((X - 43) & ~(75U - 43U)) <= 3U
i.e. the initial subtraction can be performed in signed type,
if in here X is e.g. INT_MIN or INT_MIN + 42, the subtraction
at gimple level would be UB (not caught by -fsanitize=undefined,
because that is handled much earlier).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

2014-10-14  Jakub Jelinek  ja...@redhat.com

   * tree-ssa-reassoc.c (optimize_range_tests_diff): Perform
   MINUS_EXPR in unsigned type to avoid undefined behavior.

--- gcc/tree-ssa-reassoc.c.jj  2014-10-13 17:54:33.0 +0200
+++ gcc/tree-ssa-reassoc.c 2014-10-13 17:58:07.312705218 +0200
@@ -2250,8 +2250,13 @@ optimize_range_tests_diff (enum tree_cod
   if (tree_log2 (tem1) < 0)
 return false;
 
+  type = unsigned_type_for (type);
+  tem1 = fold_convert (type, tem1);
+  tem2 = fold_convert (type, tem2);
+  lowi = fold_convert (type, lowi);
   mask = fold_build1 (BIT_NOT_EXPR, type, tem1);
-  tem1 = fold_binary (MINUS_EXPR, type, rangei->exp, lowi);
+  tem1 = fold_binary (MINUS_EXPR, type,
+                      fold_convert (type, rangei->exp), lowi);
   tem1 = fold_build2 (BIT_AND_EXPR, type, tem1, mask);
   lowj = build_int_cst (type, 0);
   if (update_range_test (rangei, rangej, 1, opcode, ops, tem1,

   Jakub




Re: [PATCH] Fix optimize_range_tests_diff

2014-10-14 Thread Jeff Law

On 10/14/14 10:02, Jakub Jelinek wrote:

Hi!

When hacking on range reassoc opt, I've noticed we can emit
code with undefined behavior even when there wasn't one originally,
in particular for:
(X - 43U) <= 3U || (X - 75U) <= 3U
and this loop can transform that into
((X - 43U) & ~(75U - 43U)) <= 3U.  */
we actually don't transform it to what the comment says, but
((X - 43) & ~(75U - 43U)) <= 3U
i.e. the initial subtraction can be performed in signed type,
if in here X is e.g. INT_MIN or INT_MIN + 42, the subtraction
at gimple level would be UB (not caught by -fsanitize=undefined,
because that is handled much earlier).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-10-14  Jakub Jelinek  ja...@redhat.com

* tree-ssa-reassoc.c (optimize_range_tests_diff): Perform
MINUS_EXPR in unsigned type to avoid undefined behavior.

Any chance this fixes:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63302


Jeff


[patch] Update libstdc++ status docs

2014-10-14 Thread Jonathan Wakely

Committed to trunk.
commit a94516a841a0588c6c7bf95248c2eaefd5e406f1
Author: Jonathan Wakely jwak...@redhat.com
Date:   Tue Oct 14 18:21:03 2014 +0100

	* doc/xml/manual/intro.xml: Update.
	* doc/xml/manual/status_cxx2011.xml: Update.
	* doc/html/manual/*: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/intro.xml b/libstdc++-v3/doc/xml/manual/intro.xml
index a71a9f9..2dd833d 100644
--- a/libstdc++-v3/doc/xml/manual/intro.xml
+++ b/libstdc++-v3/doc/xml/manual/intro.xml
@@ -803,6 +803,13 @@ requirements of the license of GCC.
 <listitem><para>The traditional HP / SGI return type and value is blessed
 		by the resolution of the DR.
 </para></listitem></varlistentry>
+
+<varlistentry><term><link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="../ext/lwg-defects.html#1339">1339</link>:
+   <emphasis>uninitialized_fill_n should return the end of its range</emphasis>
+</term>
+<listitem><para>Return the end of the filled range.
+</para></listitem></varlistentry>
+
   </variablelist>
 
  </section>
diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
index c4b4457..a553adf 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
@@ -539,10 +539,9 @@ particular release.
   <entry/>
 </row>
 <row>
-  <?dbhtml bgcolor="#C8B0B0" ?>
   <entry>20.6.5</entry>
   <entry>Align</entry>
-  <entry>N</entry>
+  <entry>Y</entry>
   <entry/>
 </row>
 <row>
@@ -2139,7 +2138,7 @@ particular release.
   <entry>Formatting and manipulators</entry>
   <entry>Partial</entry>
   <entry>
-Missing <code>get_time</code> and <code>put_time</code> manipulators.
+Missing <code>get_time</code> manipulator.
   </entry>
 </row>
 <row>


Re: [PATCH] Add D demangling support to libiberty

2014-10-14 Thread Ian Lance Taylor
On Tue, Oct 14, 2014 at 10:07 AM, Joel Brobecker brobec...@adacore.com wrote:

 libiberty/ChangeLog:

 * d-demangle.c: Replace strtold with strtod in global comment.
 (strtold): Remove declaration.
 (strtod): New declaration.
 (dlang_parse_real): Declare value as double instead of long
 double.  Replace call to strtold by call to strtod.
 Update format in call to snprintf.

This is OK.

Thanks.

Ian


Re: [PATCH] Add D demangling support to libiberty

2014-10-14 Thread Iain Buclaw
On 14 October 2014 18:07, Joel Brobecker brobec...@adacore.com wrote:
 I've just seen this, so I'll repeat what I've said in gdb patches too.

 The call to strtold is only needed to decode templates which have a
 floating point value encoded inside. This value may or may not have a
 greater than double precision.

 Replacing long double with double will be fine with me.  I'll accept
 that I didn't consider legacy in hindsight, and in reality it would be
 rather rare to stumble upon the need for strtold.

 Attached is a patch that switches it to strtod. Do you have any
 test that could quickly verify it? That seems to be the best
 approach, at least short-term. Later on, if we do want to use
 higher precision, we can indeed add strtold in libiberty.


See d-demangle-expected in the libiberty testsuite, in particular:

_D8demangle17__T4testVde0A8P6Zv
demangle.test!(42.)

_D8demangle16__T4testVdeA8P2Zv
demangle.test!(42.)

_D8demangle18__T4testVdeN0A8P6Zv
demangle.test!(-42.)

_D8demangle31__T4testVde0F6E978D4FDF3B646P7Zv
demangle.test!(123.456)


I doubt they would need adjusting.
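
For anyone wanting a quick check outside the testsuite, here is a rough sketch of why strtod is enough for these entries. It assumes the mangled real is a hex mantissa followed by P and a decimal exponent, with N marking a negative sign, and that it can be rewritten into C99 hex-float syntax before parsing. parse_mangled_real is a made-up helper for illustration, not the code in d-demangle.c.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: turn a mangled real such as "0A8P6" or "N0A8P6"
   into C99 hex-float text ("0x0.A8p6", "-0x0.A8p6") and parse it with
   strtod.  Returns 0.0 for an empty string.  */
double
parse_mangled_real (const char *m)
{
  char buf[64];
  size_t n = 0;

  if (*m == 'N')                /* Leading N marks a negative value.  */
    {
      buf[n++] = '-';
      m++;
    }
  if (*m == '\0')
    return 0.0;

  buf[n++] = '0';
  buf[n++] = 'x';
  buf[n++] = *m++;              /* First hex digit, then the radix point.  */
  buf[n++] = '.';
  while (*m != '\0' && *m != 'P' && n < sizeof buf - 8)
    buf[n++] = *m++;

  if (*m == 'P')                /* Binary exponent, N means negative.  */
    {
      buf[n++] = 'p';
      m++;
      while (*m != '\0' && n < sizeof buf - 1)
        {
          buf[n++] = (*m == 'N') ? '-' : *m;
          m++;
        }
    }
  buf[n] = '\0';
  return strtod (buf, NULL);
}
```

The first three values quoted above are exactly representable as double, so nothing is lost for them; only the last one is rounded, and only in digits that the expected output does not show.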

Regards
Iain


Re: [PATCH] Fix optimize_range_tests_diff

2014-10-14 Thread Jakub Jelinek
On Tue, Oct 14, 2014 at 11:23:22AM -0600, Jeff Law wrote:
 On 10/14/14 10:02, Jakub Jelinek wrote:
 When hacking on range reassoc opt, I've noticed we can emit
 code with undefined behavior even when there wasn't one originally,
 in particular for:
 (X - 43U) <= 3U || (X - 75U) <= 3U
 and this loop can transform that into
 ((X - 43U) & ~(75U - 43U)) <= 3U.  */
 we actually don't transform it to what the comment says, but
 ((X - 43) & ~(75U - 43U)) <= 3U
 i.e. the initial subtraction can be performed in signed type,
 if in here X is e.g. INT_MIN or INT_MIN + 42, the subtraction
 at gimple level would be UB (not caught by -fsanitize=undefined,
 because that is handled much earlier).
 
 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
 
 2014-10-14  Jakub Jelinek  ja...@redhat.com
 
  * tree-ssa-reassoc.c (optimize_range_tests_diff): Perform
  MINUS_EXPR in unsigned type to avoid undefined behavior.
 Any chance this fixes:
 
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63302

No.  For that I have right now:
-  if (tree_log2 (lowxor) < 0)
+  if (wi::popcount (wi::to_widest (lowxor)) != 1)
in my tree, though supposedly:
   if (wi::popcount (wi::zext (wi::to_widest (lowxor), TYPE_PRECISION
(TREE_TYPE (lowxor)))) != 1)
might be better, as without zext it will supposedly
not say popcount is 1 for smaller precision signed minimum values.
My wide-int-fu is limited, so if there is a better way to do this, I'm all
ears.
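
To illustrate the concern with plain integers instead of wide-int: the sketch below models a signed 8-bit minimum value widened to 64 bits. The helper names are invented and this is only an analogy for what to_widest and zext are expected to do, not GCC code.

```c
#include <stdint.h>

/* Count set bits; a portable stand-in for a popcount primitive.  */
int
popcount64 (uint64_t v)
{
  int n = 0;
  while (v != 0)
    {
      v &= v - 1;               /* Clear the lowest set bit.  */
      n++;
    }
  return n;
}

/* Keep only the low PRECISION bits, i.e. zero-extend from PRECISION.  */
uint64_t
zext64 (uint64_t v, unsigned precision)
{
  return precision >= 64 ? v : v & ((UINT64_C (1) << precision) - 1);
}

/* Sign-extend an 8-bit value to 64 bits, modelling the widening of a
   value that has a signed type.  */
uint64_t
widen_signed8 (int8_t v)
{
  return (uint64_t) (int64_t) v;
}
```

Here popcount64 (widen_signed8 (INT8_MIN)) counts all the copies of the sign bit, so it is not 1, while popcount64 (zext64 (widen_signed8 (INT8_MIN), 8)) sees only the single bit 0x80.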

Jakub


Re: [PATCH] Fix optimize_range_tests_diff

2014-10-14 Thread Jeff Law

On 10/14/14 11:40, Jakub Jelinek wrote:

On Tue, Oct 14, 2014 at 11:23:22AM -0600, Jeff Law wrote:

On 10/14/14 10:02, Jakub Jelinek wrote:

When hacking on range reassoc opt, I've noticed we can emit
code with undefined behavior even when there wasn't one originally,
in particular for:
(X - 43U) <= 3U || (X - 75U) <= 3U
and this loop can transform that into
((X - 43U) & ~(75U - 43U)) <= 3U.  */
we actually don't transform it to what the comment says, but
((X - 43) & ~(75U - 43U)) <= 3U
i.e. the initial subtraction can be performed in signed type,
if in here X is e.g. INT_MIN or INT_MIN + 42, the subtraction
at gimple level would be UB (not caught by -fsanitize=undefined,
because that is handled much earlier).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-10-14  Jakub Jelinek  ja...@redhat.com

* tree-ssa-reassoc.c (optimize_range_tests_diff): Perform
MINUS_EXPR in unsigned type to avoid undefined behavior.

Any chance this fixes:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63302


No.  For that I have right now:
-  if (tree_log2 (lowxor) < 0)
+  if (wi::popcount (wi::to_widest (lowxor)) != 1)
in my tree, though supposedly:
if (wi::popcount (wi::zext (wi::to_widest (lowxor), TYPE_PRECISION
(TREE_TYPE (lowxor)))) != 1)
might be better, as without zext it will supposedly
not say popcount is 1 for smaller precision signed minimum values.
My wide-int-fu is limited, so if there is a better way to do this, I'm all
ears.

Ok.  Thanks for checking.

jeff


Re: [PATCH, Pointer Bounds Checker 14/x] Passes [4/n] Memory accesses instrumentation

2014-10-14 Thread Jeff Law

On 10/14/14 04:08, Ilya Enkovich wrote:

Are you just looking for the parameter in which we pass the static
chain?   Look at get_chain_decl for how we set it up.  You may
actually have to peek at more fields.  I don't think there's a
single magic bit that says this is the static chain.  Though it
may always appear in the same location on the parameter list.
Nested functions aren't something I'd poked with much.  Richard
Henderson might know more since he wrote tree-nested a while back.


Looking through tree-nested.c I found there is a static_chain_decl in
function structure holding created decl.

Perfect.



Ugh.  Note how this introduces another place that anyone who might
add a new RHS gimple statement needs to edit.  We need a pointer
back to this code so that folks will know it needs updating.  The
question is where to put it.

Basically we want a place where anyone adding a new code that can
appear on the RHS of an assignment must change already.  Thoughts
on a good location?

I realize there's probably many other places that probably need
these kinds of documentation back links, I'm not asking you to
address all of them.


Actually it shouldn't be so critical to meet some new RHS code in
this switch.  We may always say that we cannot find proper bounds and
use default ones.  I replaced gcc_unreachable with a warning about
lost bounds and added a comment into tree.def.  Would it be enough?

It'd be better than hitting the gcc_unreachable :-)  It's not perfect,
but probably good enough.


Jeff


Re: [PATCH] Add D demangling support to libiberty

2014-10-14 Thread Joel Brobecker
  libiberty/ChangeLog:
 
  * d-demangle.c: Replace strtold with strtod in global comment.
  (strtold): Remove declaration.
  (strtod): New declaration.
  (dlang_parse_real): Declare value as double instead of long
  double.  Replace call to strtold by call to strtod.
  Update format in call to snprintf.
 
 This is OK.

Thanks, Ian. As suggested by Iain, I re-ran the libiberty
testsuite on x86_64-linux before committing the patch.

Thank you both!
-- 
Joel


Re: [PATCH 2/3] libstdc++: Add put_time support.

2014-10-14 Thread Rüdiger Sonderfeld
On Tuesday 14 October 2014 18:01:59 Jonathan Wakely wrote:
 So let's just test the full name and not worry about how it's
 abbreviated.
 
 Tested x86_64-linux, committed to trunk.

Sorry for causing the trouble.  I had it tested on my local machine.  Maybe 
the de_DE.utf8 locale is different.  Anyway testing for the full name is 
probably a better idea.  Thanks.

Regards,
Rüdiger.



Re: [PATCH, DWARF] re-init dw_frame_pointer_regnum between functions

2014-10-14 Thread Richard Henderson
On 10/14/2014 06:02 AM, Christian Bruel wrote:
 2014-09-23  Christian Bruel  christian.br...@st.com
 
   * execute_dwarf2_frame (dw_frame_pointer_regnum): Reinitialize for each 
 function.

It's tempting to make this a local variable within dwarf2out_frame_debug_expr
and not try to cache it at all.

But this is ok.


r~


Re: [PATCH, DWARF] re-init dw_frame_pointer_regnum between functions

2014-10-14 Thread Richard Henderson
On 10/14/2014 11:25 AM, Richard Henderson wrote:
 On 10/14/2014 06:02 AM, Christian Bruel wrote:
 2014-09-23  Christian Bruel  christian.br...@st.com

  * execute_dwarf2_frame (dw_frame_pointer_regnum): Reinitialize for each 
 function.
 
 It's tempting to make this a local variable within dwarf2out_frame_debug_expr
 and not try to cache it at all.
 
 But this is ok.

For the record, this also points out that the arm backend ought to be weaned
away from using dwarf2out_frame_debug_expr and use the REG_CFA_* notes
exclusively.  That would also fix an apparent error in arm_expand_prologue:

  if (IS_INTERRUPT (func_type))
{
  /* Interrupt functions must not corrupt any registers.
 Creating a frame pointer however, corrupts the IP
 register, so we must push it first.  */
  emit_multi_reg_push (1 << IP_REGNUM, 1 << IP_REGNUM);

  /* Do not set RTX_FRAME_RELATED_P on this insn.
 The dwarf stack unwinding code only wants to see one
 stack decrement per function, and this is not it.  If
 this instruction is labeled as being part of the frame
 creation sequence then dwarf2out_frame_debug_expr will
 die when it encounters the assignment of IP to FP
 later on, since the use of SP here establishes SP as
 the CFA register and not IP.

 Anyway this instruction is not really part of the stack
 frame creation although it is part of the prologue.  */

Certainly dwarf2cfi can handle arbitrary REG_CFA_ADJUST_CFA notes;
it's just the frame_debug_expr state machine that gets confused.


r~


Re: [PATCH, rtl-optimization] Fix PR63475, Postreload CSE propagates aliased memory operand

2014-10-14 Thread Jeff Law

On 10/14/14 01:11, Uros Bizjak wrote:


2014-10-14  Uros Bizjak  ubiz...@gmail.com

 PR rtl-optimization/63475
 * alias.c (true_dependence_1): Always use get_addr to extract
 true address operands from x_addr and mem_addr.  Use extracted
 address operands to check for references with alignment ANDs.
 Use extracted address operands with find_base_term and
 base_alis_check. For noncanonicalized operands call canon_rtx with
 extracted address operand.
 (write_dependence_1): Ditto.
 (may_alias_p): Ditto.  Remove unused calls to canon_rtx.

s/alis/alias in the ChangeLog




Patch was thoroughly tested on x86_64-linux-gnu {,-m32} and
alpha-linux-gnu for all default languages plus obj-c++ and go. While
there was no differences on x86_64-linux-gnu (as expected),
alpha-linux-gnu improved the result [1] for some hundred of PASSes in
gfortran testsuite [2].

OK for mainline?

OK.  No additional tests needed since this is covered by the existing suite.

jeff



Re: [PATCH i386 AVX512] [56/n] Add plus/minus/abs/neg/andnot insn patterns.

2014-10-14 Thread Uros Bizjak
On Tue, Oct 14, 2014 at 9:18 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello Uroš,
 It seems like I forgot to post the updated patch.
 On 25 Sep 20:11, Uros Bizjak wrote:
 I'd rather go with the second approach, it is less confusing from the
 maintainer POV. All other patterns with masking use some consistent
 template, so I'd suggest using the same approach for everything. If it
 is indeed too many patterns, then please split the patch to smaller
 pieces.
 Goal was not to decrease size of the patch, I wanted to make pattern look
 simpler by hiding masking stuff beyond `subst'.
 Anyway, I've updated the patch.

 Here it is (bootstrapped and regtested).

 Is it ok for trunk?

 gcc/
 * config/i386/sse.md (define_mode_iterator VI_AVX2): Extend
 to support AVX-512BW.
 (define_mode_iterator VI124_AVX2_48_AVX512F): Remove.
 (define_expand <plusminus_insn><mode>3): Remove masking support.
 (define_insn *<plusminus_insn><mode>3): Ditto.
 (define_expand <plusminus_insn><VI48_AVX512VL:mode>3_mask): New.
 (define_expand <plusminus_insn><VI12_AVX512VL:mode>3_mask): Ditto.
 (define_insn *<plusminus_insn><VI48_AVX512VL:mode>3_mask): Ditto.
 (define_insn *<plusminus_insn><VI12_AVX512VL:mode>3_mask): Ditto.
 (define_expand <sse2_avx2>_andnot<mode>3): Remove masking support.
 (define_insn *andnot<mode>3): Ditto.
 (define_expand <sse2_avx2>_andnot<VI48_AVX512VL:mode>3_mask): New.
 (define_expand <sse2_avx2>_andnot<VI12_AVX512VL:mode>3_mask): Ditto.
 (define_insn *andnot<VI48_AVX512VL:mode>3<mask_name>): Ditto.
 (define_insn *andnot<VI12_AVX512VL:mode>3<mask_name>): Ditto.
 (define_insn *abs<mode>2): Remove masking support.
 (define_insn abs<VI48_AVX512VL:mode>2_mask): New.
 (define_insn abs<VI12_AVX512VL:mode>2_mask): Ditto.
 (define_expand abs<mode>2): Use VI_AVX2 mode iterator.

IMO, it seems much more readable this way.

OK for mainline.

Thanks,
Uros.


Re: [PATCH 2/2] Fix ILP32 ld.so.

2014-10-14 Thread Richard Henderson
On 08/08/2014 08:51 PM, Andrew Pinski wrote:
 ChangeLog:
   * explow.c (convert_memory_address_addr_space): Rename to ...
   (convert_memory_address_addr_space_1): This.  Add in_const argument.
   Inside a CONST RTL, permute the conversion and addition of constant
   for zero and sign extended pointers.
   (convert_memory_address_addr_space): New function.

Ok, with one nit...

 +((in_const && POINTERS_EXTEND_UNSIGNED !=0)

Missing space after !=


r~


[PATCH v2 03/13] Allow the static chain to be set from C

2014-10-14 Thread Richard Henderson
Replacing the hacky v1 with the proposed syntax relayed by PCC,
and changing the name to __builtin_call_with_static_chain.  Which
is kinda long, but at least it's more properly descriptive.

Adds documentation and an errors test case.
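
As a rough usage sketch of the proposed syntax (not taken from the patch's testcases): the callee below is an ordinary function that ignores the static chain, so the only visible effect is that the builtin yields the result of the call. add_one and call_with_chain are names invented for this example, and it needs a compiler that provides the builtin.

```c
/* An ordinary function; it never looks at the static chain.  */
int
add_one (int x)
{
  return x + 1;
}

/* Call FN (ARG) with CHAIN placed in the target's static chain location.
   The first argument of the builtin must be a call expression and the
   second must be a pointer.  */
int
call_with_chain (int (*fn) (int), void *chain, int arg)
{
  return __builtin_call_with_static_chain (fn (arg), chain);
}
```

A Go closure called this way would read its context from the chain; here the pointer is simply passed along and left unused.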


r~
From 7e31234f2e112bad576b748b2ff6cc615194c0f7 Mon Sep 17 00:00:00 2001
From: Richard Henderson r...@redhat.com
Date: Tue, 7 Oct 2014 12:17:28 -0700
Subject: [PATCH 03/13] Allow the static chain to be set from C

We need to be able to set the static chain on a few calls within the
Go runtime, so expose this with __builtin_call_with_static_chain.
---
 gcc/c-family/c-common.c  |  2 ++
 gcc/c-family/c-common.h  |  2 +-
 gcc/c/c-parser.c | 40 
 gcc/doc/extend.texi  | 13 +
 gcc/testsuite/gcc.dg/cwsc0.c | 18 ++
 gcc/testsuite/gcc.dg/cwsc1.c | 31 +++
 6 files changed, 105 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/cwsc0.c
 create mode 100644 gcc/testsuite/gcc.dg/cwsc1.c

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 23163f5..f1bf47b 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -442,6 +442,8 @@ const struct c_common_resword c_common_reswords[] =
   { "__attribute__",	RID_ATTRIBUTE,	0 },
   { "__auto_type",	RID_AUTO_TYPE,	D_CONLY },
   { "__bases",  RID_BASES, D_CXXONLY },
+  { "__builtin_call_with_static_chain",
+    RID_BUILTIN_CALL_WITH_STATIC_CHAIN, D_CONLY },
   { "__builtin_choose_expr", RID_CHOOSE_EXPR, D_CONLY },
   { "__builtin_complex", RID_BUILTIN_COMPLEX, D_CONLY },
   { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, 0 },
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 1e3477f..da1c12e 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -102,7 +102,7 @@ enum rid
   RID_EXTENSION, RID_IMAGPART, RID_REALPART, RID_LABEL,  RID_CHOOSE_EXPR,
   RID_TYPES_COMPATIBLE_P,  RID_BUILTIN_COMPLEX,	 RID_BUILTIN_SHUFFLE,
   RID_DFLOAT32, RID_DFLOAT64, RID_DFLOAT128,
-  RID_FRACT, RID_ACCUM, RID_AUTO_TYPE,
+  RID_FRACT, RID_ACCUM, RID_AUTO_TYPE, RID_BUILTIN_CALL_WITH_STATIC_CHAIN,
 
   /* C11 */
   RID_ALIGNAS, RID_GENERIC,
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 346448a..708a125 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -7372,6 +7372,46 @@ c_parser_postfix_expression (c_parser *parser)
 	  = comptypes (e1, e2) ? integer_one_node : integer_zero_node;
 	  }
 	  break;
+	case RID_BUILTIN_CALL_WITH_STATIC_CHAIN:
+	  {
+	    vec<c_expr_t, va_gc> *cexpr_list;
+	    c_expr_t *e2_p;
+	    tree chain_value;
+
+	    c_parser_consume_token (parser);
+	    if (!c_parser_get_builtin_args (parser,
+					    "__builtin_call_with_static_chain",
+					    &cexpr_list, false))
+	      {
+		expr.value = error_mark_node;
+		break;
+	      }
+	    if (vec_safe_length (cexpr_list) != 2)
+	      {
+		error_at (loc, "wrong number of arguments to "
+			       "%<__builtin_call_with_static_chain%>");
+		expr.value = error_mark_node;
+		break;
+	      }
+
+	    expr = (*cexpr_list)[0];
+	    e2_p = &(*cexpr_list)[1];
+	    *e2_p = convert_lvalue_to_rvalue (loc, *e2_p, true, true);
+	    chain_value = e2_p->value;
+	    mark_exp_read (chain_value);
+
+	    if (TREE_CODE (expr.value) != CALL_EXPR)
+	      error_at (loc, "first argument to "
+			"%<__builtin_call_with_static_chain%> "
+			"must be a call expression");
+	    else if (TREE_CODE (TREE_TYPE (chain_value)) != POINTER_TYPE)
+	      error_at (loc, "second argument to "
+			"%<__builtin_call_with_static_chain%> "
+			"must be a pointer type");
+	    else
+	      CALL_EXPR_STATIC_CHAIN (expr.value) = chain_value;
+	    break;
+	  }
 	case RID_BUILTIN_COMPLEX:
 	  {
 	    vec<c_expr_t, va_gc> *cexpr_list;
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 6db142e..f092ea1 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8639,6 +8639,7 @@ in the Cilk Plus language manual which can be found at
 @node Other Builtins
 @section Other Built-in Functions Provided by GCC
 @cindex built-in functions
+@findex __builtin_call_with_static_chain
 @findex __builtin_fpclassify
 @findex __builtin_isfinite
 @findex __builtin_isnormal
@@ -9227,6 +9228,18 @@ depending on the arguments' types.  For example:
 
 @end deftypefn
 
+@deftypefn {Built-in Function} @var{type} __builtin_call_with_static_chain (@var{call_exp}, @var{pointer_exp})
+
+The @var{call_exp} expression must be a function call, and the
+@var{pointer_exp} expression must be a pointer.  The @var{pointer_exp}
+is passed to the function call in the target's static chain location.
+The result of builtin is the result of the function call.
+
+@emph{Note:} This builtin is only available for C@.
+This builtin can be used to call Go closures from C.
+
+@end deftypefn
+
 @deftypefn {Built-in Function} @var{type} __builtin_choose_expr (@var{const_exp}, @var{exp1}, @var{exp2})
 
 You can use the built-in function @code{__builtin_choose_expr} to
diff --git 

Re: [PATCH] Fix typo in comment for IRA

2014-10-14 Thread Jeff Law

On 10/13/14 20:49, Kito Cheng wrote:

Hi Marc:

- -1 if it is not a cost classe.  */
+ -1 if it is not a cost classes.  */


a cost class, no plural here.


Thank you for correcting me :)


Hi Jeff:

Thanks, and updated patch in attachment,

However I don't have commit right yet, can you help me to commit it? thanks.

Done.  Thanks.
jeff




Re: RFA: fix mode confusion in caller-save.c:replace_reg_with_saved_mem

2014-10-14 Thread Jeff Law

On 10/13/14 18:16, Joern Rennecke wrote:

On 13 October 2014 20:43, Jeff Law l...@redhat.com wrote: ...

I think you want smode in the mode_for_size call rather than
mode, right (both instances)?


No, nregs is the number of hard registers of regno in mode.  Hence
we must use the size of mode.

OK.  My bad.



To get some case where there's a difference, I was thinking of an
architecture that has partial integer mode registers that can be
grouped together as integral integer mode registers (e.g. one reg is
HImode or PSImode, save_mode would be PSImode, two regs form SImode).
In that case, you'd want something so that you can piece together
mode, i.e. either GET_MODE_CLASS (mode) or MODE_INT (which happen
to be again the same), but not GET_MODE_CLASS(smode), which would be
MODE_PARTIAL_INT

You're right.  We definitely don't want MODE_PARTIAL_INT here.

So if your patch resolves your issue, passes the usual 
bootstrap/regression test, then let's go with it.


jeff


Re: __intN patch 3/5: main __int128 - __intN conversion.

2014-10-14 Thread DJ Delorie

  extensions.  Is this OK?  If so, is there anything else, or can I
  check the whole mess in yet?
 
 Go ahead.

Thanks!  Committed.


Re: __intN patch 3/5: main __int128 - __intN conversion.

2014-10-14 Thread Markus Trippelsdorf
On 2014.08.25 at 23:03 -0400, DJ Delorie wrote:
 
  I'd like to see the updated version of the whole of patch 3 (tested
  to be actually independent of the other patches) for review, though
  I won't be reviewing the C++ parts.
 
 Here it is.  Tested on x86_64.  I include the msp430-modes.def patch
 for demonstration purposes although obviously msp430's __int20 won't
 work without the other patches.

This patch breaks ppc64:

../../gcc/gcc/config/rs6000/rs6000-c.c: In function ‘cpp_hashnode* 
rs6000_macro_to_expand(cpp_reader*, const cpp_token*)’:
../../gcc/gcc/config/rs6000/rs6000-c.c:237:24: error: ‘RID_INT128’ was not 
declared in this scope
make[3]: *** [rs6000-c.o] Error 1

-- 
Markus


Re: New rematerialization sub-pass in LRA

2014-10-14 Thread Peter Bergner
On Fri, 2014-10-10 at 11:02 -0400, Vladimir Makarov wrote:
 Here is a new rematerialization sub-pass of LRA.

When Mike and I build with this patch along with the patch that
enables LRA by default on powerpc64*-linux (attached below), we're
seeing the following error message.  I'm not sure how your patch
can cause this error, but it does go away if we remove your patch
and build again.

Peter


# Enable LRA by default
Index: gcc/config/rs6000/rs6000.opt
===
--- gcc/config/rs6000/rs6000.opt(revision 216216)
+++ gcc/config/rs6000/rs6000.opt(working copy)
@@ -466,7 +466,7 @@ Target RejectNegative Joined UInteger Va
 -mlong-double-n  Specify size of long double (64 or 128 bits)
 
 mlra
-Target Report Var(rs6000_lra_flag) Init(0) Save
+Target Report Var(rs6000_lra_flag) Init(1) Save
 Use LRA instead of reload
 
 msched-costly-dep=


Error message caused by LRA Rematerialization patch:

make[5]: Entering directory
`/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include'
mkdir -p ./powerpc64-linux/bits/stdc++.h.gch
/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/./gcc/xgcc
-shared-libgcc
-B/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/./gcc -nostdinc++
-L/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/src
 
-L/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/src/.libs
 
-L/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/libsupc++/.libs
 -B/home/bergner/gcc/install/gcc-fsf-mainline-lra-remat/powerpc64-linux/bin/ 
-B/home/bergner/gcc/install/gcc-fsf-mainline-lra-remat/powerpc64-linux/lib/ 
-isystem 
/home/bergner/gcc/install/gcc-fsf-mainline-lra-remat/powerpc64-linux/include 
-isystem 
/home/bergner/gcc/install/gcc-fsf-mainline-lra-remat/powerpc64-linux/sys-include
-x c++-header -nostdinc++ -g -O2 -D_GNU_SOURCE  
-I/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/powerpc64-linux
 
-I/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include
 -I/home/bergner/gcc/gcc-fsf-mainline-bootstrap/libstdc++-v3/libsupc++ -O2 -g 
-std=gnu++0x 
/home/bergner/gcc/gcc-fsf-mainline-bootstrap/libstdc++-v3/include/precompiled/stdc++.h
 \
-o powerpc64-linux/bits/stdc++.h.gch/O2ggnu++0x.gch
In file included
from 
/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/bits/move.h:57:0,

from 
/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/bits/stl_pair.h:59,

from 
/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/bits/stl_algobase.h:64,

from 
/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/bits/char_traits.h:39,

from 
/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/ios:40,

from 
/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/istream:38,

from 
/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/sstream:38,

from 
/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/complex:45,

from 
/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/ccomplex:38,

from /home/bergner/gcc/gcc-fsf-mainline-bootstrap/libstdc
++-v3/include/precompiled/stdc++.h:52:
/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/type_traits:251:12:
 error: redefinition of ‘struct std::__is_integral_helper<unsigned int>’
 struct __is_integral_helper<unsigned __int128>
^
/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/type_traits:226:12:
 error: previous definition of ‘struct std::__is_integral_helper<unsigned int>’
 struct __is_integral_helper<unsigned int>
^
/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/type_traits:1763:12:
 error: redefinition of ‘struct std::__make_signed<unsigned int>’
 struct __make_signed<unsigned __int128>
^
/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/type_traits:1735:12:
 error: previous definition of ‘struct std::__make_signed<unsigned int>’
 struct __make_signed<unsigned int>
^
In file included
from 
/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/random:42:0,

from 
/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/bits/stl_algo.h:66,

from 
/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/algorithm:62,

from /home/bergner/gcc/gcc-fsf-mainline-bootstrap/libstdc
++-v3/include/precompiled/stdc++.h:64:

Fix PR ada/62019

2014-10-14 Thread Eric Botcazou
Someone broke again weak external symbols in Ada in exactly the same way as:
  https://gcc.gnu.org/ml/gcc-patches/2014-01/msg00431.html
probably during the ongoing C++ reshuffling:

FAIL: gnat.dg/weak2.adb (test for excess errors)

Tested on x86_64-suse-linux, applied on the mainline as obvious.


2014-10-14  Eric Botcazou  ebotca...@adacore.com

PR ada/62019
* tree-eh.c (tree_could_trap_p) <FUNCTION_DECL>: Revamp and really
do not choke on null node.
<VAR_DECL>: Likewise.
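
In outline, the revamped check behaves like the following stand-alone model. The structs are simplified stand-ins invented for this sketch (the real code works on trees and on cgraph_node / varpool_node); the point is only that a declaration with no symbol table node is handled without dereferencing a null pointer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a symbol table node.  */
struct sym_node
{
  struct sym_node *target;      /* What the symbol ultimately resolves to.  */
  bool in_other_partition;      /* Known to be defined in another partition.  */
};

/* Simplified stand-in for a declaration.  */
struct decl
{
  bool weak;
  bool comdat;
  bool external;
  struct sym_node *node;        /* May be NULL, as for the Ada testcase.  */
};

/* An access to a weak external symbol may trap unless it is known to be
   defined in some other partition.  */
bool
weak_decl_could_trap (const struct decl *d)
{
  if (d->weak && !d->comdat && d->external)
    {
      const struct sym_node *node = d->node;
      if (node)
        node = node->target;    /* Only resolve when a node exists.  */
      return !(node && node->in_other_partition);
    }
  return false;
}
```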


-- 
Eric Botcazou

Index: tree-eh.c
===
--- tree-eh.c	(revision 216193)
+++ tree-eh.c	(working copy)
@@ -2657,15 +2657,12 @@ tree_could_trap_p (tree expr)
   /* Assume that accesses to weak functions may trap, unless we know
 	 they are certainly defined in current TU or in some other
 	 LTO partition.  */
-  if (DECL_WEAK (expr) && !DECL_COMDAT (expr))
+  if (DECL_WEAK (expr) && !DECL_COMDAT (expr) && DECL_EXTERNAL (expr))
 	{
-	  struct cgraph_node *node;
-	  if (!DECL_EXTERNAL (expr))
-	    return false;
-	  node = cgraph_node::get (expr)->function_symbol ();
-	  if (node && node->in_other_partition)
-	    return false;
-	  return true;
+	  cgraph_node *node = cgraph_node::get (expr);
+	  if (node)
+	    node = node->function_symbol ();
+	  return !(node && node->in_other_partition);
 	}
   return false;
 
@@ -2673,15 +2670,12 @@ tree_could_trap_p (tree expr)
   /* Assume that accesses to weak vars may trap, unless we know
 	 they are certainly defined in current TU or in some other
 	 LTO partition.  */
-  if (DECL_WEAK (expr) && !DECL_COMDAT (expr))
+  if (DECL_WEAK (expr) && !DECL_COMDAT (expr) && DECL_EXTERNAL (expr))
 	{
-	  varpool_node *node;
-	  if (!DECL_EXTERNAL (expr))
-	    return false;
-	  node = varpool_node::get (expr)->ultimate_alias_target ();
-	  if (node && node->in_other_partition)
-	    return false;
-	  return true;
+	  varpool_node *node = varpool_node::get (expr);
+	  if (node)
+	    node = node->ultimate_alias_target ();
+	  return !(node && node->in_other_partition);
 	}
   return false;
 


Re: [PATCH] AutoFDO patch for trunk

2014-10-14 Thread Dehao Chen
On Tue, Oct 14, 2014 at 8:02 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Index: gcc/cgraphclones.c
 ===
 --- gcc/cgraphclones.c(revision 215826)
 +++ gcc/cgraphclones.c(working copy)
 @@ -453,6 +453,11 @@
  }
else
  count_scale = 0;
 +  /* In AutoFDO, if edge count is larger than callee's entry block
 + count, we will not update the original callee because it may
 + mistakenly mark some hot function as cold.  */
 +  if (flag_auto_profile && gcov_count >= count)
 +update_original = false;

 lets drop this from initial patch.

Done

 Index: gcc/bb-reorder.c
 ===
 --- gcc/bb-reorder.c  (revision 215826)
 +++ gcc/bb-reorder.c  (working copy)
 @@ -1569,15 +1569,14 @@
/* Mark which partition (hot/cold) each basic block belongs in.  */
FOR_EACH_BB_FN (bb, cfun)
  {
 -  bool cold_bb = false;
 +  bool cold_bb = probably_never_executed_bb_p (cfun, bb);

 and this too
 (basically all the tweaks should IMO go in independently and ideally in
 a way that does not need flag_auto_profile test).

Done.

 +/* Return true if BB contains indirect call.  */
 +
 +static bool
 +has_indirect_call (basic_block bb)
 +{
 +  gimple_stmt_iterator gsi;
 +
 +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
 +    {
 +      gimple stmt = gsi_stmt (gsi);
 +      if (gimple_code (stmt) == GIMPLE_CALL
 +          && (gimple_call_fn (stmt) == NULL
 +              || TREE_CODE (gimple_call_fn (stmt)) != FUNCTION_DECL))

 You probably want to skip gimple_call_internal_p calls here.

Done

 +
 +/* From AutoFDO profiles, find values inside STMT for that we want to 
 measure
 +   histograms for indirect-call optimization.  */
 +
 +static void
 +afdo_indirect_call (gimple_stmt_iterator *gsi, const icall_target_map &map,
 + bool transform)
 +{
 +  gimple stmt = gsi_stmt (*gsi);
 +  tree callee;
 +
 +  if (map.size() == 0 || gimple_code (stmt) != GIMPLE_CALL
 +  || gimple_call_fndecl (stmt) != NULL_TREE)
 +return;
 +
 +  callee = gimple_call_fn (stmt);
 +
 +  histogram_value hist = gimple_alloc_histogram_value (
 +  cfun, HIST_TYPE_INDIR_CALL, stmt, callee);
 +  hist->n_counters = 3;
 +  hist->hvalue.counters =  XNEWVEC (gcov_type, hist->n_counters);
 +  gimple_add_histogram_value (cfun, stmt, hist);
 +
 +  gcov_type total = 0;
 +  icall_target_map::const_iterator max_iter = map.end();
 +
 +  for (icall_target_map::const_iterator iter = map.begin();
 +   iter != map.end(); ++iter)
 +{
 +  total += iter->second;
 +  if (max_iter == map.end() || max_iter->second < iter->second)
 + max_iter = iter;
 +}
 +
 +  hist->hvalue.counters[0] = (unsigned long long)
 +  afdo_string_table->get_name (max_iter->first);
 +  hist->hvalue.counters[1] = max_iter->second;
 +  hist-hvalue.counters[2] = total;
 +
 +  if (!transform)
 +return;
 +
 +  if (gimple_ic_transform (gsi))
 +{
 +  struct cgraph_edge *indirect_edge =
 +   cgraph_node::get (current_function_decl)->get_edge (stmt);
 +  struct cgraph_node *direct_call =
 +   find_func_by_profile_id ((int)hist->hvalue.counters[0]);
 +  if (DECL_STRUCT_FUNCTION (direct_call->decl) == NULL)
 + return;
 +  struct cgraph_edge *new_edge =
 +   indirect_edge->make_speculative (direct_call, 0, 0);
 +  new_edge->redirect_call_stmt_to_callee ();
 +  gimple_remove_histogram_value (cfun, stmt, hist);
 +  inline_call (new_edge, true, NULL, NULL, false);
 +  return;
 +}
 +  return;

 Is it necessary to go via histogram and gimple_ic_transform here?  I would
 expect that all you need is to make the speculative edge and inline it
 (bypassing the work of producing a fake histogram value and calling
 gimple_ic_transform on it).

 Also it seems to me that you want to set the direct_count and frequency
 arguments of make_speculative so the resulting function profile is not off.

This function actually serves two purposes:

* before annotation, we need to mark the histogram, promote and inline
* after annotation, we just need to mark, and let the follow-up logic
decide whether it needs to promote and inline.

And you are right, for the before-annotation case, we can simply
mark the edge speculative and inline it. But we still need the logic to
fake the histogram for the after-annotation case. As a result, I unified the
two cases into one function to reuse as much code as possible. Shall I
separate it into two functions instead?
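
For reference, the "mark" step that both cases share boils down to scanning the per-call-site target map for the hottest callee and the total count. The following is only a minimal, self-contained sketch of that scan: the typedefs, the icall_summary struct and summarize_icall_targets are illustrative stand-ins (assumptions), not code from the patch.

```cpp
#include <cstdint>
#include <map>

// Assumption: simplified stand-ins for the patch's types.  In the patch the
// map keys are string-table indices and gcov_type is GCC's counter type.
typedef int64_t gcov_type;
typedef std::map<unsigned, gcov_type> icall_target_map;

// Hypothetical summary of one indirect call site.
struct icall_summary
{
  unsigned hottest_target;  // key of the target with the largest count
  gcov_type hottest_count;  // count recorded for that target
  gcov_type total;          // sum of the counts of all targets
};

// Fill *OUT from MAP.  Return false and leave *OUT untouched if MAP is
// empty, mirroring the early return in the quoted function.
static bool
summarize_icall_targets (const icall_target_map &map, icall_summary *out)
{
  if (map.empty ())
    return false;

  gcov_type total = 0;
  icall_target_map::const_iterator max_iter = map.end ();
  for (icall_target_map::const_iterator iter = map.begin ();
       iter != map.end (); ++iter)
    {
      total += iter->second;
      // Strict '<' keeps the first target seen when counts tie.
      if (max_iter == map.end () || max_iter->second < iter->second)
        max_iter = iter;
    }

  out->hottest_target = max_iter->first;
  out->hottest_count = max_iter->second;
  out->total = total;
  return true;
}
```

With such a helper, the after-annotation case would only record the three values, while the before-annotation case would go on to promote and inline the hottest target.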


 The rest of interfaces seems quite sane now.  Can you please look into
 using speculative edges directly instead of hooking into the vpt 
 infrastructure
 and fixing the formatting issues of the new pass?

I'll work on the formatting issues now (need to learn the format first
;-). The attached patch is up-to-date except for formatting changes.
I'll upload the patch again once the format change is in.

Thanks,
Dehao


 I will try to make another pass over 

Re: __intN patch 3/5: main __int128 -> __intN conversion.

2014-10-14 Thread DJ Delorie

 ../../gcc/gcc/config/rs6000/rs6000-c.c:237:24: error: ‘RID_INT128’ was not 
 declared in this scope

Two options:

1. If you know the RS6000 will never have any __intN other than
   __int128, just use RID_INT_N_0, although this is a hack it will
   work as long as there *is* an __int128 for RS6000.

2. Alternately, you need to check all entries in the __intN array
   for proper size, which is more correct but more complex.

Would you like me to work on the second option, or would you prefer to
tackle this yourself?
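
To make the second option concrete, here is a rough, self-contained sketch of such a scan. Everything in it is a stand-in: int_n_entry, RID_FIRST_INT_N_SKETCH and find_int_n_rid are hypothetical names, and the real code would consult GCC's own __intN tables and keyword ids instead.

```cpp
#include <cstddef>

// Assumption: a simplified stand-in for one row of the __intN table.
struct int_n_entry
{
  unsigned bitsize;  // width of this __intN type in bits
  bool enabled;      // whether the target enables this entry
};

// Assumption: made-up base value for the first __intN keyword id.
enum { RID_FIRST_INT_N_SKETCH = 100 };

// Return the keyword id of the enabled entry that is BITSIZE bits wide,
// or -1 if TABLE (of N entries) has no such entry.
static int
find_int_n_rid (const int_n_entry *table, std::size_t n, unsigned bitsize)
{
  for (std::size_t i = 0; i < n; i++)
    if (table[i].enabled && table[i].bitsize == bitsize)
      return RID_FIRST_INT_N_SKETCH + static_cast<int> (i);
  return -1;
}
```

The point of the scan is that the RS6000 code would no longer depend on __int128 being entry 0, which is what the first option silently assumes.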


Re: __intN patch 5/5: msp430-specific changes

2014-10-14 Thread DJ Delorie

 This is the MSP430-specific use of the new intN framework to enable
 true 20-bit pointers.  Since I'm one of the MSP430 maintainers, this
 patch is being posted for reference, not for approval.

Now that the other parts are committed, I'm checking this one in too.

 gcc/config/msp430
   * config/msp430/msp430-modes.def (PSI): Add.
 
   * config/msp430/msp430-protos.h (msp430_hard_regno_nregs_has_padding): 
 New.
   (msp430_hard_regno_nregs_with_padding): New.
   * config/msp430/msp430.c (msp430_scalar_mode_supported_p): New.
   (msp430_hard_regno_nregs_has_padding): New.
   (msp430_hard_regno_nregs_with_padding): New.
   (msp430_unwind_word_mode): Use PSImode instead of SImode.
   (msp430_addr_space_legitimate_address_p): New.
   (msp430_asm_integer): New.
   (msp430_init_dwarf_reg_sizes_extra): New.
   (msp430_print_operand): Use X suffix for PSImode even in small model.
   * config/msp430/msp430.h (POINTER_SIZE): Use 20 bits, not 32.
   (PTR_SIZE): ...but 4 bytes for EH.
   (SIZE_TYPE): Use __int20.
   (PTRDIFF_TYPE): Likewise.
   (INCOMING_FRAME_SP_OFFSET): Adjust.
   * config/msp430/msp430.md (movqi_topbyte): New.
   (movpsi): Use fixed suffixes.
   (movsipsi2): Enable for 430X, not large model.
   (extendhipsi2): Likewise.
   (zero_extendhisi2): Likewise.
   (zero_extendhisipsi2): Likewise.
   (extend_and_shift1_hipsi2): Likewise.
   (extendpsisi2): Likewise.
   (*bitbranch<mode>4_z): Fix suffix logic.
 
 
 Index: gcc/config/msp430/msp430-protos.h
 ===
 --- gcc/config/msp430/msp430-protos.h (revision 213886)
 +++ gcc/config/msp430/msp430-protos.h (working copy)
 @@ -27,12 +27,15 @@ void  msp430_expand_epilogue (int);
  void msp430_expand_helper (rtx *operands, const char *, bool);
  void msp430_expand_prologue (void);
  const char * msp430x_extendhisi (rtx *);
  void msp430_fixup_compare_operands (enum machine_mode, rtx *);
  int  msp430_hard_regno_mode_ok (int, enum machine_mode);
  int  msp430_hard_regno_nregs (int, enum machine_mode);
 +int  msp430_hard_regno_nregs_has_padding (int, enum machine_mode);
 +int  msp430_hard_regno_nregs_with_padding (int, enum machine_mode);
 +bool msp430_hwmult_enabled (void);
  rtx  msp430_incoming_return_addr_rtx (void);
  void msp430_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree, int);
  int  msp430_initial_elimination_offset (int, int);
  bool msp430_is_interrupt_func (void);
  const char * msp430x_logical_shift_right (rtx);
  const char * msp430_mcu_name (void);
 Index: gcc/config/msp430/msp430.md
 ===
 --- gcc/config/msp430/msp430.md   (revision 213886)
 +++ gcc/config/msp430/msp430.md   (working copy)
 @@ -176,12 +176,19 @@
    ""
    "@
    MOV.B\t%1, %0
    MOV%X1.B\t%1, %0"
  )
  
 +(define_insn "movqi_topbyte"
 +  [(set (match_operand:QI 0 "msp_nonimmediate_operand" "=r")
 + (subreg:QI (match_operand:PSI 1 "msp_general_operand" "r") 2))]
 +  "msp430x"
 +  "PUSHM.A\t#1,%1 { POPM.W\t#1,%0 { POPM.W\t#1,%0"
 +)
 +
  (define_insn "movqi"
    [(set (match_operand:QI 0 "msp_nonimmediate_operand" "=rYs,rm")
    (match_operand:QI 1 "msp_general_operand" "riYs,rmi"))]
    ""
    "@
    MOV.B\t%1, %0
 @@ -220,27 +227,27 @@
  ;; Some MOVX.A cases can be done with MOVA, this is only a few of them.
  (define_insn "movpsi"
    [(set (match_operand:PSI 0 "msp_nonimmediate_operand" "=r,Ya,rm")
    (match_operand:PSI 1 "msp_general_operand" "riYa,r,rmi"))]
    ""
    "@
 -  MOV%Q0\t%1, %0
 -  MOV%Q0\t%1, %0
 -  MOV%X0.%Q0\t%1, %0")
 +  MOVA\t%1, %0
 +  MOVA\t%1, %0
 +  MOVX.A\t%1, %0")
  
  ; This pattern is identical to the truncsipsi2 pattern except
  ; that it uses a SUBREG instead of a TRUNC.  It is needed in
  ; order to prevent reload from converting (set:SI (SUBREG:PSI (SI)))
  ; into (SET:PSI (PSI)).
  ;
  ; Note: using POPM.A #1 is two bytes smaller than using POPX.A
  
  (define_insn "movsipsi2"
    [(set (match_operand:PSI 0 "register_operand" "=r")
    (subreg:PSI (match_operand:SI 1 "register_operand" "r") 0))]
 -  "TARGET_LARGE"
 +  "msp430x"
    "PUSH.W\t%H1 { PUSH.W\t%L1 { POPM.A #1, %0 ; Move reg-pair %L1:%H1 into pointer %0"
  )
  
  ;;
  ;; Math
  
 @@ -564,49 +571,49 @@
{ return msp430x_extendhisi (operands); }
  )
  
  (define_insn "extendhipsi2"
    [(set (match_operand:PSI 0 "nonimmediate_operand" "=r")
    (subreg:PSI (sign_extend:SI (match_operand:HI 1 "nonimmediate_operand" "0")) 0))]
 -  "TARGET_LARGE"
 +  "msp430x"
    "RLAM #4, %0 { RRAM #4, %0"
  )
  
  ;; Look for cases where integer/pointer conversions are suboptimal due
  ;; to missing patterns, despite us not having opcodes for these
  ;; patterns.  Doing these manually allows for alternate optimization
  ;; paths.
  (define_insn "zero_extendhisi2"
    [(set (match_operand:SI 0 "nonimmediate_operand" "=rm")
   

Re: [PATCH] AutoFDO patch for trunk

2014-10-14 Thread Dehao Chen
The new patch is attached. I used clang-format to format auto-profile.{c|h}.

Thanks,
Dehao

On Tue, Oct 14, 2014 at 2:05 PM, Dehao Chen de...@google.com wrote:
 On Tue, Oct 14, 2014 at 8:02 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Index: gcc/cgraphclones.c
 ===
 --- gcc/cgraphclones.c(revision 215826)
 +++ gcc/cgraphclones.c(working copy)
 @@ -453,6 +453,11 @@
  }
else
  count_scale = 0;
 +  /* In AutoFDO, if edge count is larger than callee's entry block
 + count, we will not update the original callee because it may
 + mistakenly mark some hot function as cold.  */
 +  if (flag_auto_profile && gcov_count >= count)
 +update_original = false;

 lets drop this from initial patch.

 Done

 Index: gcc/bb-reorder.c
 ===
 --- gcc/bb-reorder.c  (revision 215826)
 +++ gcc/bb-reorder.c  (working copy)
 @@ -1569,15 +1569,14 @@
/* Mark which partition (hot/cold) each basic block belongs in.  */
FOR_EACH_BB_FN (bb, cfun)
  {
 -  bool cold_bb = false;
 +  bool cold_bb = probably_never_executed_bb_p (cfun, bb);

 and this too
 (basically all the tweaks should IMO go in independently and ideally in
 a way that does not need flag_auto_profile test).

 Done.

 +/* Return true if BB contains indirect call.  */
 +
 +static bool
 +has_indirect_call (basic_block bb)
 +{
 +  gimple_stmt_iterator gsi;
 +
 +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
 +{
 +  gimple stmt = gsi_stmt (gsi);
 +  if (gimple_code (stmt) == GIMPLE_CALL
 +&& (gimple_call_fn (stmt) == NULL
 +   || TREE_CODE (gimple_call_fn (stmt)) != FUNCTION_DECL))

 You probably want to skip gimple_call_internal_p calls here.

 Done

 +
 +/* From AutoFDO profiles, find values inside STMT for that we want to 
 measure
 +   histograms for indirect-call optimization.  */
 +
 +static void
 +afdo_indirect_call (gimple_stmt_iterator *gsi, const icall_target_map &map,
 + bool transform)
 +{
 +  gimple stmt = gsi_stmt (*gsi);
 +  tree callee;
 +
 +  if (map.size() == 0 || gimple_code (stmt) != GIMPLE_CALL
 +  || gimple_call_fndecl (stmt) != NULL_TREE)
 +return;
 +
 +  callee = gimple_call_fn (stmt);
 +
 +  histogram_value hist = gimple_alloc_histogram_value (
 +  cfun, HIST_TYPE_INDIR_CALL, stmt, callee);
 +  hist->n_counters = 3;
 +  hist->hvalue.counters =  XNEWVEC (gcov_type, hist->n_counters);
 +  gimple_add_histogram_value (cfun, stmt, hist);
 +
 +  gcov_type total = 0;
 +  icall_target_map::const_iterator max_iter = map.end();
 +
 +  for (icall_target_map::const_iterator iter = map.begin();
 +   iter != map.end(); ++iter)
 +{
 +  total += iter->second;
 +  if (max_iter == map.end() || max_iter->second < iter->second)
 + max_iter = iter;
 +}
 +
 +  hist->hvalue.counters[0] = (unsigned long long)
 +  afdo_string_table->get_name (max_iter->first);
 +  hist->hvalue.counters[1] = max_iter->second;
 +  hist->hvalue.counters[2] = total;
 +
 +  if (!transform)
 +return;
 +
 +  if (gimple_ic_transform (gsi))
 +{
 +  struct cgraph_edge *indirect_edge =
 +   cgraph_node::get (current_function_decl)->get_edge (stmt);
 +  struct cgraph_node *direct_call =
 +   find_func_by_profile_id ((int)hist->hvalue.counters[0]);
 +  if (DECL_STRUCT_FUNCTION (direct_call->decl) == NULL)
 + return;
 +  struct cgraph_edge *new_edge =
 +   indirect_edge->make_speculative (direct_call, 0, 0);
 +  new_edge->redirect_call_stmt_to_callee ();
 +  gimple_remove_histogram_value (cfun, stmt, hist);
 +  inline_call (new_edge, true, NULL, NULL, false);
 +  return;
 +}
 +  return;

 Is it necessary to go via histogram and gimple_ic_transform here?  I would 
 expect that all you
 need is to make the speculative edge and inline it. (bypassing the work of 
 producing fake
 histogram value and calling gimple_ic_transform on it)

 Also it seems to me that you want to set the direct_count and frequency arguments 
 of
 make_speculative so the resulting function profile is not off.

 This function actually serves two purposes:

 * before annotation, we need to mark the histogram, promote and inline
 * after annotation, we just need to mark, and let the follow-up logic
 decide whether it needs to promote and inline.

 And you are right, for the before-annotation case, we can simply
 mark the edge speculative and inline it. But we still need the logic to
 fake the histogram for the after-annotation case. As a result, I unified the
 two cases into one function to reuse as much code as possible. Shall I
 separate it into two functions instead?


 The rest of interfaces seems quite sane now.  Can you please look into
 using speculative edges directly instead of hooking into the vpt 
 infrastructure
 and fixing the formatting issues of the new pass?

 I'll work on the formatting issues now (need to learn the format first
 ;-). The 

[Bug libstdc++/63500] [4.9/5 Regression] bug in debug version of std::make_move_iterator?

2014-10-14 Thread François Dumont

Hi

Here is a proposal to fix the issue with iterators which do not 
expose lvalue references when dereferenced. I simply chose to detect 
such an issue in c++11 mode thanks to the is_lvalue_reference meta function.
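
As a rough illustration of the detection idea (not the patch itself): in C++11 the iterator's reference type can be queried through std::iterator_traits and tested with std::is_lvalue_reference. The deref_is_lvalue trait below is a hypothetical name introduced only for this sketch.

```cpp
#include <iterator>
#include <type_traits>

// Hypothetical helper: derives from std::true_type when dereferencing It
// yields an lvalue reference, and from std::false_type otherwise (for
// example for std::move_iterator, whose reference type is an rvalue
// reference).
template<typename It>
  struct deref_is_lvalue
  : std::is_lvalue_reference<typename std::iterator_traits<It>::reference>
  { };

// Plain pointers expose lvalue references...
static_assert(deref_is_lvalue<int*>::value,
              "int* dereferences to int&");
// ...while a move iterator over the same range does not.
static_assert(!deref_is_lvalue<std::move_iterator<int*> >::value,
              "move_iterator<int*> dereferences to int&&");
```

When the trait is false, taking the address of the dereferenced value is not meaningful for the foreign-iterator check, so that check has to be skipped.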


2014-10-15  François Dumont  fdum...@gcc.gnu.org

PR libstdc++/63500
* include/bits/cpp_type_traits.h (__true_type): Add __value constant.
(__false_type): Likewise.
* include/debug/functions.h (__foreign_iterator_aux2): Do not check for
foreign iterators if input iterators returns rvalue reference.
* testsuite/23_containers/vector/63500.cc: New.

Tested under Linux x86_64.

François

Index: include/bits/cpp_type_traits.h
===
--- include/bits/cpp_type_traits.h	(revision 216158)
+++ include/bits/cpp_type_traits.h	(working copy)
@@ -79,9 +79,12 @@
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
-  struct __true_type { };
-  struct __false_type { };
+  struct __true_type
+  { enum { __value = 1 }; };
 
+  struct __false_type
+  { enum { __value = 0 }; };
+
   template<bool>
 struct __truth_type
 { typedef __false_type __type; };
Index: include/debug/functions.h
===
--- include/debug/functions.h	(revision 216158)
+++ include/debug/functions.h	(working copy)
@@ -34,7 +34,7 @@
 	  // _Iter_base
 #include <bits/cpp_type_traits.h>	  // for __is_integer
 #include <bits/move.h>                    // for __addressof and addressof
-# include <bits/stl_function.h>		  // for less
+#include <bits/stl_function.h>		  // for less
 #if __cplusplus >= 201103L
 # include <type_traits>			  // for is_lvalue_reference and __and_
 #endif
@@ -252,8 +252,21 @@
 			const _InputIterator& __other,
 			const _InputIterator& __other_end)
 {
+#if __cplusplus >= 201103L
+  typedef std::iterator_traits<_InputIterator> _InputIteTraits;
+  typedef typename _InputIteTraits::reference _InputIteRefType;
+#endif
   return __foreign_iterator_aux3(__it, __other, __other_end,
+#if __cplusplus < 201103L
  _Is_contiguous_sequence<_Sequence>());
+#else
+  typename std::conditional<
+	std::__and_<std::integral_constant<
+	  bool, _Is_contiguous_sequence<_Sequence>::__value>,
+		std::is_lvalue_reference<_InputIteRefType> >::value,
+	std::__true_type,
+	std::__false_type>::type());
+#endif
 }
 
   /* Handle the case where we aren't really inserting a range after all */
Index: testsuite/23_containers/vector/63500.cc
===
--- testsuite/23_containers/vector/63500.cc	(revision 0)
+++ testsuite/23_containers/vector/63500.cc	(working copy)
@@ -0,0 +1,39 @@
+// -*- C++ -*-
+
+// Copyright (C) 2014 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-std=gnu++11" }
+// { dg-do compile }
+
+#include <memory>
+#include <iterator>
+#include <debug/vector>
+
+class Foo
+{};
+
+void
+test01()
+{
+  __gnu_debug::vector<std::unique_ptr<Foo>> v;
+  __gnu_debug::vector<std::unique_ptr<Foo>> w;
+
+  v.insert(end(v),
+	   make_move_iterator(begin(w)),
+	   make_move_iterator(end(w)));
+}


Re: New rematerialization sub-pass in LRA

2014-10-14 Thread Vladimir Makarov

On 2014-10-14 4:17 PM, Peter Bergner wrote:

On Fri, 2014-10-10 at 11:02 -0400, Vladimir Makarov wrote:

Here is a new rematerialization sub-pass of LRA.


When Mike and I build with this patch along with the patch that
enables LRA by default on powerpc64*-linux (attached below), we're
seeing the following error message.  I'm not sure how your patch
can cause this error, but it does go away if we remove your patch
and build again.


Peter, thanks for checking the patch and reporting this.  I had several 
wrong code generation problems with rematerialization on x86 and arm. 
I've solved them before posting the patch but I did not check ppc64.


As a lot of people started to try the patch, several problems were 
reported.  I'll address them and make some modifications to the patch.  Now I 
think that I'll commit the patch into the trunk no earlier than next 
week.  And I'll check on ppc64 too, to be sure that we have no wrong 
code generation problems on this target either.





Re: [PATCH] AutoFDO patch for trunk

2014-10-14 Thread Andi Kleen
Dehao Chen de...@google.com writes:
 +
 +@item -fauto-profile
 +@itemx -fauto-profile=@var{path}
 +@opindex fauto-profile
 +Enable sampling based feedback directed optimizations, and optimizations
 +generally profitable only with profile feedback available.
 +
 +The following options are enabled: @code{-fbranch-probabilities}, 
 @code{-fvpt},
 +@code{-funroll-loops}, @code{-fpeel-loops}, @code{-ftracer}, 
 @code{-ftree-vectorize},
 +@code{ftree-loop-distribute-patterns}

This needs more description aimed at end users: what it is good for and why,
a pointer to the needed utilities, and a short summary of
the steps they need to take.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: [PATCH] support ggc hash_map and hash_set

2014-10-14 Thread Trevor Saunders
On Tue, Oct 14, 2014 at 04:05:25PM +0200, Richard Biener wrote:
 On Tue, Sep 2, 2014 at 3:56 AM,  tsaund...@mozilla.com wrote:
  From: Trevor Saunders tsaund...@mozilla.com
 
  Hi,
 
  There are still some issues to make this work really nicely, but this part 
  is
  probably good enough that it's worth reviewing.
 
  For one thing you can't use ggc hash_map or set in front ends with some 
  types
  or gengtype will decide to put the overloads of the marking routines it
  provides in a front end file instead of the one it chose before, breaking 
  other
  front ends.  However that seems to be an unrelated issue, as you can trigger it
  without using hash_map/set, so we might as well solve it separately.
 
  I had to have the entry marking functions for set delegate to the traits 
  class
  because gcc  4.9.1 issues clearly bogus errors if you inline the code from 
  the
  traits implementation.  We may well want to make map work the same way at 
  some
  point to enable some of the special GTY attributes like if_marked, but it
  doesn't seem to be necessary right now.
 
  bootstrapped + regtested without regressions on x86_64-unknown-linux-gnu, 
  ok?
 
 I have just noticed that this (ggc support for hash-table.h) makes it no 
 longer
 suitable for use from generator programs (trying to merge from trunk on
 match-and-simplify).  If you look at vec.h it has sophisticated guards
 to block out GGC support if GENERATOR_FILE is defined.

Yeah, it works, but it's kind of messy since some of the generator
programs include ggc.h.

 Can you try to fix this please?

I expect it's doable; I can try to get it done later this week or at the
weekend, but the next few days are busy for me.

Trev

 
 Thanks,
 Richard.
 
  Trev
 
  gcc/ChangeLog:
 
  2014-09-01  Trevor Saunders  tsaund...@mozilla.com
 
  * alloc-pool.c: Include coretypes.h.
  * cgraph.h, dbxout.c, dwarf2out.c, except.c, except.h, function.c,
  function.h, symtab.c, tree-cfg.c, tree-eh.c: Use hash_map and
  hash_set instead of htab.
  * ggc-page.c (in_gc): New variable.
  (ggc_free): Do nothing if a collection is taking place.
  (ggc_collect): Set in_gc appropriately.
  * ggc.h (gt_ggc_mx(const char *)): New function.
  (gt_pch_nx(const char *)): Likewise.
  (gt_ggc_mx(int)): Likewise.
  (gt_pch_nx(int)): Likewise.
  * hash-map.h (hash_map::hash_entry::ggc_mx): Likewise.
  (hash_map::hash_entry::pch_nx): Likewise.
  (hash_map::hash_entry::pch_nx_helper): Likewise.
  (hash_map::hash_map): Adjust.
  (hash_map::create_ggc): New function.
  (gt_ggc_mx): Likewise.
  (gt_pch_nx): Likewise.
  * hash-set.h (default_hashset_traits::ggc_mx): Likewise.
  (default_hashset_traits::pch_nx): Likewise.
  (hash_set::hash_entry::ggc_mx): Likewise.
  (hash_set::hash_entry::pch_nx): Likewise.
  (hash_set::hash_entry::pch_nx_helper): Likewise.
  (hash_set::hash_set): Adjust.
  (hash_set::create_ggc): New function.
  (hash_set::elements): Likewise.
  (gt_ggc_mx): Likewise.
  (gt_pch_nx): Likewise.
  * hash-table.h (hash_table::hash_table): Adjust.
  (hash_table::m_ggc): New member.
  (hash_table::~hash_table): Adjust.
  (hash_table::expand): Likewise.
  (hash_table::empty): Likewise.
  (gt_ggc_mx): New function.
  (hashtab_entry_note_pointers): Likewise.
  (gt_pch_nx): Likewise.
 
 
  diff --git a/gcc/alloc-pool.c b/gcc/alloc-pool.c
  index 0d31835..bfaa0e4 100644
  --- a/gcc/alloc-pool.c
  +++ b/gcc/alloc-pool.c
  @@ -20,6 +20,7 @@ along with GCC; see the file COPYING3.  If not see
 
   #include "config.h"
   #include "system.h"
  +#include "coretypes.h"
   #include "alloc-pool.h"
   #include "hash-table.h"
   #include "hash-map.h"
  diff --git a/gcc/cgraph.h b/gcc/cgraph.h
  index 879899c..030a1c7 100644
  --- a/gcc/cgraph.h
  +++ b/gcc/cgraph.h
  @@ -1604,7 +1604,6 @@ struct cgraph_2node_hook_list;
 
   /* Map from a symbol to initialization/finalization priorities.  */
   struct GTY(()) symbol_priority_map {
  -  symtab_node *symbol;
 priority_type init;
 priority_type fini;
   };
  @@ -1872,7 +1871,7 @@ public:
 htab_t GTY((param_is (symtab_node))) assembler_name_hash;
 
 /* Hash table used to hold init priorities.  */
  -  htab_t GTY ((param_is (symbol_priority_map))) init_priority_hash;
  +  hash_map<symtab_node *, symbol_priority_map> *init_priority_hash;
 
 FILE* GTY ((skip)) dump_file;
 
  diff --git a/gcc/dbxout.c b/gcc/dbxout.c
  index 946f1d1..d856bdd 100644
  --- a/gcc/dbxout.c
  +++ b/gcc/dbxout.c
  @@ -2484,12 +2484,9 @@ dbxout_expand_expr (tree expr)
   /* Helper function for output_used_types.  Queue one entry from the
  used types hash to be output.  */
 
  -static int
  -output_used_types_helper (void **slot, void *data)
  +bool
  +output_used_types_helper (tree const &type, vec<tree> *types_p)
   {
  -  tree type = (tree) *slot;
  -  vec<tree> *types_p = (vec<tree> *) data;
  -
 if ((TREE_CODE (type) == RECORD_TYPE
  

Re: [PATCH 1/2] xtensa: drop unimplemented floating point operations

2014-10-14 Thread Max Filippov
On Mon, Oct 13, 2014 at 8:05 PM, augustine.sterl...@gmail.com
augustine.sterl...@gmail.com wrote:
 On Sun, Oct 12, 2014 at 3:46 PM, Max Filippov jcmvb...@gmail.com wrote:
 xtensa ISA never implemented FP division, reciprocal, square root and
 inverse square root as single opcode. Remove patterns that can emit
 them.

 2014-10-09  Max Filippov  jcmvb...@gmail.com

 gcc/
 * config/xtensa/xtensa.md (divsf3, *recipsf2, sqrtsf2, *rsqrtsf2):
 remove.

 Approved.

Applied to trunk. Thanks!

-- Max


Re: [PATCH 2/2] xtensa: use pre- and postincrement FP load/store when available

2014-10-14 Thread Max Filippov
On Mon, Oct 13, 2014 at 8:04 PM, augustine.sterl...@gmail.com
augustine.sterl...@gmail.com wrote:
 On Sun, Oct 12, 2014 at 3:46 PM, Max Filippov jcmvb...@gmail.com wrote:
 2014-10-10  Max Filippov  jcmvb...@gmail.com

 gcc/
 * config/xtensa/xtensa.h (TARGET_HARD_FLOAT_POSTINC): new macro.
 * config/xtensa/xtensa.md (*lsiu, *ssiu): add dependency on
 !TARGET_HARD_FLOAT_POSTINC.
 (*lsip, *ssip): new instructions.

 Approved. Do you have write privileges?

Applied to trunk. Thanks!

-- Max


[committed] MAINTAINERS: add myself to write-after-approval list.

2014-10-14 Thread Max Filippov
2014-10-15  Max Filippov  jcmvb...@gmail.com

* MAINTAINERS (write-after-approval): Add myself.

Index: MAINTAINERS
===
--- MAINTAINERS (revision 216231)
+++ MAINTAINERS (revision 216232)
@@ -380,6 +380,7 @@
 Chris Fairles  cfair...@gcc.gnu.org
 Changpeng Fang changpeng.f...@amd.com
 Li Feng                nemoking...@gmail.com
+Max Filippov   jcmvb...@gmail.com
 Thomas Fitzsimmons fitz...@redhat.com
 Brian Ford f...@vss.fsi.com
 John Freeman   jfreema...@gmail.com


[PATCH] Better tolerance of incoming profile insanities in jump threading

2014-10-14 Thread Teresa Johnson
The below patch fixes the overflow detection when recomputing
probabilities after jump threading, in case of incoming profile
insanities. It detects more cases where the computation will overflow
not only the max probability but the max int and possibly wrap around.
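
The intent can be sketched outside of GCC as follows. This is only an illustration under stated assumptions: PROB_BASE stands in for REG_BR_PROB_BASE (assumed to be 10000), recompute_edge_probability is a hypothetical helper, and the round-to-nearest division is how I understand the scaling macro to behave.

```cpp
#include <cstdint>

typedef int64_t gcov_type;

// Assumption: stand-in for GCC's REG_BR_PROB_BASE.
const int PROB_BASE = 10000;

// Hypothetical helper.  Return the probability of an edge with count
// EDGE_COUNT leaving a block with count BB_COUNT.  OLD_PROB is returned
// unchanged when the block has no count.  Counts are assumed to be
// non-negative and small enough that EDGE_COUNT * PROB_BASE fits in 64 bits.
static int
recompute_edge_probability (gcov_type edge_count, gcov_type bb_count,
                            int old_prob)
{
  if (!bb_count)
    return old_prob;

  // Compare the counts first: the scaled value is only computed when it
  // is known to stay below PROB_BASE, so it cannot exceed the maximum
  // probability or wrap around.
  if (edge_count < bb_count)
    return static_cast<int> ((edge_count * PROB_BASE + bb_count / 2)
                             / bb_count);

  // Insane profile: more flows along the edge than through the block.
  return PROB_BASE;
}
```

The difference from the old code is the order of operations: clamping after scaling comes too late if the scaled value has already overflowed.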

LTO profilebootstrapped and tested on x86_64-unknown-linux-gnu.

Ok for trunk?

Thanks,
Teresa


2014-10-14  Teresa Johnson  tejohn...@google.com

PR bootstrap/63432
* tree-ssa-threadupdate.c (recompute_probabilities): Better
overflow checking.

Index: tree-ssa-threadupdate.c
===
--- tree-ssa-threadupdate.c (revision 216150)
+++ tree-ssa-threadupdate.c (working copy)
@@ -871,21 +871,23 @@ recompute_probabilities (basic_block bb)
   edge_iterator ei;
   FOR_EACH_EDGE (esucc, ei, bb->succs)
 {
-  if (bb->count)
+  if (!bb->count)
+continue;
+
+  /* Prevent overflow computation due to insane profiles.  */
+  if (esucc->count < bb->count)
 esucc->probability = GCOV_COMPUTE_SCALE (esucc->count,
  bb->count);
-  if (esucc->probability > REG_BR_PROB_BASE)
-{
- /* Can happen with missing/guessed probabilities, since we
-may determine that more is flowing along duplicated
-path than joiner succ probabilities allowed.
-Counts and freqs will be insane after jump threading,
-at least make sure probability is sane or we will
-get a flow verification error.
-Not much we can do to make counts/freqs sane without
-redoing the profile estimation.  */
- esucc->probability = REG_BR_PROB_BASE;
-   }
+  else
+/* Can happen with missing/guessed probabilities, since we
+   may determine that more is flowing along duplicated
+   path than joiner succ probabilities allowed.
+   Counts and freqs will be insane after jump threading,
+   at least make sure probability is sane or we will
+   get a flow verification error.
+   Not much we can do to make counts/freqs sane without
+   redoing the profile estimation.  */
+esucc->probability = REG_BR_PROB_BASE;
 }
 }

-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: libffi patch RFA: Pass -Qunused-arguments for asm files

2014-10-14 Thread Paolo Bonzini
On 30/09/2014 02:12, Ian Lance Taylor wrote:
 Similar to a recent patch to libgo, this patch to the libffi configure
 script checks whether the compiler supports -Qunused-arguments.  If it
 does, it passes -Qunused-arguments when invoking the compiler on .s
 files.  This is because the clang driver complains by default when given
 useless arguments, such as -I options when compiling a .s file.  This
 somewhat annoying behaviour works poorly with configure scripts.  The
 -Qunused-arguments option disables it.  Bootstrapped and ran libffi and
 libgo tests on x86_64-unknown-linux-gnu.
 
 OK for mainline?
 
 Ian
 
 
 2014-09-29  Ian Lance Taylor  i...@google.com
 
   * configure.ac: If the compiler supports -Qunused-arguments, use
   it when running the compiler on .s files.
   * configure: Regenerated.

Ok.

Paolo